Automated Cassandra Metrics Cluster for VCD

VMware Cloud Director (VCD) can use an Apache Cassandra database to store VM performance metrics and make them available so that tenants can view the historic performance of their VMs.

Installing a Cassandra cluster for use by VCD is reasonably straightforward and requires a minimum of 4 servers, of which 2 are configured as ‘seed’ nodes. There are several guides available online on how to do this, but many of them don’t cover configuring the cluster with encryption between the nodes themselves or between clients (VCD) and the servers. In addition, since the 4.1.x release of Cassandra (which is now supported by the latest versions of VCD), ‘standard’ PEM certificates are supported. Previous versions required the use of Java JKS keystores, so this simplifies the configuration steps required.

As I find myself deploying VCD in lab environments reasonably frequently, I decided it would be an interesting challenge to fully automate the build and configuration of a Cassandra cluster with SSL enabled, suitable for use with VCD. My main goals were:

  • Fully automated Cassandra cluster build from ‘scratch’ using PowerShell and PowerCLI
  • Optionally configure SSL encryption between cluster nodes and between clients and the cluster
  • Use the standard (cloud-init) way of customising the cluster nodes
  • Be able to use user/password and/or SSH certificate authentication to log in to the cluster nodes
  • Allow customisation of all relevant cluster node parameters (node networking, sizing, storage etc.)
  • Allow use of an existing/external Certificate Authority (CA) to sign the SSL certificates
  • Publish the resulting scripts so that others could use this in their own labs/environments

Since this post is quite long, I’ve provided links to each part of the process below. If you just want to use the resulting scripts to build a cluster, I’ve published a GitHub repo which documents the steps required to deploy one. The repo contains two versions of the deploy script: one which configures SSL and one which allows deployment without SSL for testing.

Components Used & Environment

The specific environment used to develop and test the scripts is my homelab which uses the following components and versions – other versions/combinations may work, but have not been tested:

  • A VMware Cloud Director 10.5 environment
  • VMware vSphere 8.01 (vCenter & Hosts)
  • PowerShell Core v7.3.6 and VMware PowerCLI v13.1
  • An OpenSSL client (only required for the SSL script version); the one included in the Git desktop client for Windows systems works fine
  • An Ubuntu 22.04 LTS OVA Cloud Image for the Node OS
  • A Certificate Authority (CA) to sign the node SSL certificates (if using the SSL version of the deploy script)
  • The scripts from https://github.com/jondwaite/vcd-cassandra which can be downloaded via:
    git clone https://github.com/jondwaite/vcd-cassandra (or via the download link from github)

The following files are included in the GitHub repo:

| File | Purpose |
| --- | --- |
| gencsrs.ps1 | Uses openssl to create private key files for each node and then generates Certificate Signing Request (CSR) files for each node in the ‘certs’ folder. Uses cluster.ps1 to obtain node parameters and openssl.cnf for openssl configuration |
| openssl.cnf | Certificate configuration for the CSRs generated by gencsrs.ps1 |
| cluster.ps1 | Defines the vCenter and Cassandra node cluster parameters, used by both gencsrs.ps1 and the deploy.ps1/deploy-nossl.ps1 scripts |
| deploy.ps1 | Builds a Cassandra cluster using signed SSL certificates and the parameters from cluster.ps1 |
| deploy-nossl.ps1 | Builds the Cassandra cluster using the parameters from cluster.ps1 without using SSL certificates (does not require gencsrs.ps1 to have been run or certificates to have been generated or signed) |

Configuration Files

I’ve attempted to comment the scripts throughout, so hopefully the settings will be mostly obvious to anyone reading through them.

cluster.ps1

The bulk of the cluster configuration is performed in the cluster.ps1 file. This file is then used by the other scripts to define the VM deployment target and the Cassandra cluster configuration. The fields which need to be populated in this file should be reasonably clear.

Note that the $CassNodes hash which defines the names & IP addresses for the cassandra nodes is critical and the $CassSeeds list must reference IP addresses from $CassNodes to determine which nodes are created as ‘seeds’. VCD requires a minimum cluster of 4 nodes with at least 2 of these being created as ‘seed’ nodes. There is no checking in the script to ensure these conditions are met so pay particular attention to these two variables.
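Since the scripts don’t perform these checks themselves, a small pre-flight test can catch mistakes before any VMs are built. The following is only a sketch: it assumes $CassNodes is a hashtable of node name to IP address and $CassSeeds is an array of IP addresses (check cluster.ps1 for the actual shapes), and the function name and the example addresses are hypothetical.

```powershell
# Hypothetical example values; the real ones are defined in cluster.ps1.
$CassNodes = [ordered]@{
    'cass01' = '10.0.10.11'
    'cass02' = '10.0.10.12'
    'cass03' = '10.0.10.13'
    'cass04' = '10.0.10.14'
}
$CassSeeds = @('10.0.10.11', '10.0.10.12')

# Returns a list of problems found; an empty result means the basic
# VCD requirements (4+ nodes, 2+ seeds, seeds drawn from the nodes) are met.
function Test-CassClusterConfig {
    param(
        [System.Collections.IDictionary] $Nodes,
        [string[]] $Seeds
    )
    $problems = @()
    $nodeIPs  = @($Nodes.Values)

    if ($nodeIPs.Count -lt 4) {
        $problems += "Only $($nodeIPs.Count) node(s) defined; VCD requires at least 4."
    }
    if (@($Seeds).Count -lt 2) {
        $problems += "Only $(@($Seeds).Count) seed(s) defined; VCD requires at least 2."
    }
    foreach ($seed in $Seeds) {
        if ($nodeIPs -notcontains $seed) {
            $problems += "Seed '$seed' is not the IP address of any defined node."
        }
    }
    if (@($nodeIPs | Select-Object -Unique).Count -ne $nodeIPs.Count) {
        $problems += 'Two or more nodes share the same IP address.'
    }
    return $problems
}

$issues = @(Test-CassClusterConfig -Nodes $CassNodes -Seeds $CassSeeds)
if ($issues.Count -gt 0) { $issues | ForEach-Object { Write-Warning $_ } }
```

Running something like this after dot-sourcing cluster.ps1 and before deploy.ps1 would flag a mistyped seed address straight away rather than after a failed cluster build.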

The network specified and IP addressing defined for the nodes must also be accessible to the VCD cell servers in order for VCD to be able to access the nodes in the created cassandra cluster.

deploy.ps1 / deploy-nossl.ps1

These are 2 versions of the same deployment script. The deploy-nossl.ps1 version ignores certificates altogether and deploys a Cassandra cluster which doesn’t use SSL between client and nodes or between the nodes. Edit whichever one you intend to use; the other can be ignored.

The $nodeHD, $nodeCPUs and $nodeMem settings in the file determine the resources allocated to each provisioned node VM. The $OVAFile setting needs to point to an Ubuntu 22.04.3 LTS cloud image OVA file (other releases & versions may work, but I’ve only tested against 22.04.3 LTS). This is unlikely to work with other Linux distributions, but the scripts should be (reasonably) straightforward to update to cope with other distributions.

There are some other ‘hard-coded’ settings in these files (e.g. the timezone to be assigned to the node VMs) which can also be adjusted as necessary directly in these files. Note that the GPG signing key for installing the cassandra package is specified directly and may change/need to be updated over time – the keyid: provided in the file at the time of writing (Sep 2023) is correct.

gencsrs.ps1 & openssl.cnf

These files are only used with the SSL deployment script (deploy.ps1 rather than deploy-nossl.ps1). In gencsrs.ps1, the path to a working openssl executable and the $SNRoot value should be updated to suit your environment, as should the openssl.cnf entries for certificate Country/State/City etc.

Generate Certificates for the Nodes

To deploy a Cassandra cluster where the client-to-node and node-to-node communication is encrypted with SSL, each Cassandra node needs a certificate signed by a trusted CA. To provide these, the gencsrs.ps1 script generates a private key file <nodename>.key and a Certificate Signing Request (CSR) file <nodename>.csr in the ‘certs’ folder (by default). These CSRs should then be submitted to a Certificate Authority (CA) to generate the actual certificate files. The signed certificate files should be saved as <nodename>.crt in the same folder as the key and CSR files.
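To illustrate the kind of openssl calls involved, here is a minimal sketch which builds the argument lists for a key and a CSR for one node. It is not the code from gencsrs.ps1: the function name, the 2048-bit key size and the use of -subj (which overrides the subject taken from openssl.cnf) are assumptions made for the example.

```powershell
# Builds openssl argument lists for one node's private key and CSR.
# Hypothetical helper; gencsrs.ps1 may construct its commands differently.
function Get-NodeCsrCommands {
    param(
        [string] $NodeName,
        [string] $CertDir       = 'certs',
        [string] $OpenSslConfig = 'openssl.cnf'
    )
    $key = Join-Path $CertDir "$NodeName.key"
    $csr = Join-Path $CertDir "$NodeName.csr"
    [pscustomobject]@{
        # openssl genrsa -out <key> 2048
        KeyArgs = @('genrsa', '-out', $key, '2048')
        # openssl req -new -key <key> -out <csr> -config <cnf> -subj /CN=<node>
        CsrArgs = @('req', '-new', '-key', $key, '-out', $csr,
                    '-config', $OpenSslConfig, '-subj', "/CN=$NodeName")
    }
}

$cmds = Get-NodeCsrCommands -NodeName 'cass01'
# To actually run them, point $openssl at a working executable, for example:
#   $openssl = 'C:\Program Files\Git\usr\bin\openssl.exe'   # assumed path
#   & $openssl @($cmds.KeyArgs)
#   & $openssl @($cmds.CsrArgs)
```

Keeping the arguments as arrays avoids quoting problems when paths contain spaces, which is common on Windows.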

A chain.crt file also needs to be created in the same location as the other certificate files. It contains the public certificate chain of the signing CA (and any subordinate CAs) in standard Base64-encoded (PEM) format, ordered with any intermediate/signing certificates first, followed by the root CA certificate. This is necessary so that the trusted chain can be added to the Cassandra node configurations to automatically trust the certificates presented by each node.
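If the CA certificates are available as separate PEM files, the chain file can be assembled by concatenating them in that order. This is a sketch only; the function name and the example file names are hypothetical.

```powershell
# Concatenates PEM certificate files into a single chain file.
# Pass the paths in chain order: intermediate/signing CA(s) first, root CA last.
function New-ChainFile {
    param(
        [string[]] $CertPaths,
        [string]   $OutFile
    )
    $pem = foreach ($path in $CertPaths) {
        # Trim so each certificate ends up separated by exactly one newline
        (Get-Content -Path $path -Raw).Trim()
    }
    Set-Content -Path $OutFile -Value ($pem -join "`n") -Encoding ascii
}

# Example usage (file names are assumptions):
#   New-ChainFile -CertPaths 'certs\issuing-ca.crt', 'certs\root-ca.crt' -OutFile 'certs\chain.crt'
```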

Deploy the Cluster

Once the files have been configured as described, open a PowerShell prompt and log in to the target platform vCenter environment using Connect-VIServer and a user account with appropriate permissions to create the VMs.

Next, run either deploy.ps1 or deploy-nossl.ps1 as required and the script should create each node VM in turn and configure it. The VM configuration steps are provided by the $CloudConfig string, which is built for each node during script execution. $CloudConfig is simply a cloud-init configuration which is converted to Base64 and added to the user_data configuration supplied to the VM deployment by Import-VApp.
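The Base64 step can be sketched as follows. The cloud-config content here is a heavily shortened stand-in for the real $CloudConfig, the helper name is hypothetical, and the commented-out PowerCLI lines assume that the Ubuntu cloud image exposes its user-data property as Common.user_data in the object returned by Get-OvfConfiguration; check the object for your OVA before relying on that.

```powershell
# Shortened, illustrative cloud-config; the real one in deploy.ps1 is much longer.
$CloudConfig = @'
#cloud-config
hostname: cass01
timezone: Etc/UTC
package_update: true
package_upgrade: true
'@

# Encodes cloud-init text as UTF-8 and then Base64, as expected by the OVF property.
function ConvertTo-Base64UserData {
    param([string] $Text)
    [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($Text))
}

$userData = ConvertTo-Base64UserData -Text $CloudConfig

# With a vCenter connection the value could then be passed to the deployment
# ($vmHost and $datastore are assumed to have been looked up already):
#   $ovfConfig = Get-OvfConfiguration -Ovf $OVAFile
#   $ovfConfig.Common.user_data.Value = $userData      # property name is an assumption
#   Import-VApp -Source $OVAFile -OvfConfiguration $ovfConfig -Name 'cass01' `
#       -VMHost $vmHost -Datastore $datastore
```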

This performs the following configuration for each node server:

  • Updates all packages to the latest versions (apt update & upgrade)
  • Sets the VM hostname
  • Sets the VM timezone
  • Sets the VM ‘ubuntu’ user password and sets this to not be expired (no forced password change on first login)
  • Specifies the repository source and GPG key to install the Apache Cassandra package
  • Installs Java ‘default-jdk’ (required for Cassandra), the Cassandra package itself and the net-tools package
  • Creates networking configuration files to assign the provided static IP address and networking information to the VM
  • Creates certificate files for cassandra in the /etc/cassandra/certs folder (SSL script only)
  • Creates a configuration file cassandra.yaml for cassandra (including the appropriate Cassandra SSL parameters in the SSL script)
  • For the last node to be installed (only), creates a ‘firstboot.sh’ file and sets it to run on first boot; this changes the cassandra user database password to the one specified in cluster.ps1 and then removes itself
  • Creates a module blacklist entry for the floppy drive to prevent console Ubuntu errors
  • Enables SSH password authentication (disabled by default in Ubuntu cloud images)
  • Disables IPv6 networking
  • Enables the Ubuntu firewall and creates rules to allow Cassandra and ssh traffic to pass
  • Finally, restarts the node to ensure all updates and changes are applied

That’s quite a list, but it should be easy to identify each of these activities in the deploy.ps1 script.

The script then configures the destination VM parameters and changes the number of CPU cores, RAM and disk allocated to the VM as necessary.

Finally the VM is started and the script waits 5 minutes for the node configuration to be completed before deploying the next node. You can change this interval, but due to the large number of VM configurations made by cloud-init (and the following reboot) I’ve found this is generally a realistic value. Note that deploying all nodes without this pause can be done, but Cassandra has issues properly forming the cluster if multiple nodes attempt to join the cluster simultaneously so this pause provides a ‘safer’ way to ensure the cluster forms successfully.
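The deploy-then-wait pattern can be sketched as a simple loop. This is an illustration rather than the code in deploy.ps1: the function name is hypothetical and the per-node work is passed in as a script block so that the pause logic stands on its own.

```powershell
# Deploys nodes strictly one at a time, pausing after each so that the node
# can finish its cloud-init run and reboot before the next one joins.
function Invoke-SequentialDeploy {
    param(
        [string[]]    $NodeNames,
        [scriptblock] $DeployAction,        # receives the node name as its argument
        [int]         $PauseSeconds = 300   # 5 minutes, as used in this post
    )
    foreach ($name in $NodeNames) {
        & $DeployAction $name
        Write-Host "Waiting $PauseSeconds seconds for '$name' to finish configuring"
        Start-Sleep -Seconds $PauseSeconds
    }
}

# Example usage with a placeholder action:
#   Invoke-SequentialDeploy -NodeNames 'cass01','cass02','cass03','cass04' `
#       -DeployAction { param($n) Write-Host "Deploying $n" }
```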

Note that the firstboot.sh script which runs on the last node to set the cassandra database user password also waits an additional 30 seconds before running to allow the cluster to ‘settle’. This password change only needs to be done on a single node since the ‘cassandra’ database user exists for the entire cluster.

Once the cluster is deployed, operation can be checked using nodetool status to confirm that all nodes have been successfully deployed and joined to the cluster (‘UN’ status for each node):

If any of the nodes have a status other than ‘UN’, the cluster hasn’t formed correctly. In that case, consider allowing more time between the node deployments (change the Start-Sleep timer towards the end of deploy.ps1/deploy-nossl.ps1) and redeploying.
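If you capture the nodetool status text (for example over SSH), the check can be scripted. The sketch below assumes the usual layout where each node line starts with a two-letter status code followed by the node address; the function name is hypothetical and the sample output is made up for illustration, not taken from a real cluster.

```powershell
# Extracts status code and address from the node lines of 'nodetool status' output.
function Get-CassNodeStatus {
    param([string[]] $NodetoolOutput)
    foreach ($line in $NodetoolOutput) {
        # First letter: U(p)/D(own); second: N(ormal)/L(eaving)/J(oining)/M(oving)
        if ($line -cmatch '^\s*([UD][NLJM])\s+(\S+)') {
            [pscustomobject]@{ Status = $Matches[1]; Address = $Matches[2] }
        }
    }
}

# Illustrative sample only (values are invented).
$sample = @'
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.10.11  104.5 KiB  16      50.1%             (host-id)  rack1
UN  10.0.10.12  98.2 KiB   16      49.7%             (host-id)  rack1
DN  10.0.10.13  101.9 KiB  16      50.3%             (host-id)  rack1
UN  10.0.10.14  99.4 KiB   16      49.9%             (host-id)  rack1
'@ -split '\r?\n'

$nodes     = @(Get-CassNodeStatus -NodetoolOutput $sample)
$unhealthy = @($nodes | Where-Object { $_.Status -ne 'UN' })
if ($unhealthy.Count -gt 0) {
    Write-Warning "Nodes not in UN state: $($unhealthy.Address -join ', ')"
}
```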

Configure VCD to use the Cluster

When the deploy.ps1 (or deploy-nossl.ps1) script completes successfully it will output the command which needs to be run on the VCD cell to configure VCD to use the deployed cassandra cluster:
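The exact command is whatever your run of the script prints, so use that. Purely as an illustration of its general shape, the sketch below assembles a cell-management-tool cassandra command line from a list of node addresses. The function name, the default port of 9042, the 15-day TTL and the example password are assumptions for the example, not values taken from deploy.ps1.

```powershell
# Builds an example 'cell-management-tool cassandra' command line for the VCD cell.
# Illustrative only; prefer the command printed by deploy.ps1/deploy-nossl.ps1.
function Get-VcdCassandraCommand {
    param(
        [string[]] $NodeIPs,
        [string]   $Username = 'cassandra',
        [string]   $Password,
        [int]      $Port     = 9042,   # default CQL native transport port
        [int]      $TtlDays  = 15      # metrics retention in days (assumed value)
    )
    $nodes = $NodeIPs -join ','
    "cell-management-tool cassandra --configure --create-schema " +
    "--cluster-nodes $nodes --username $Username --password '$Password' " +
    "--port $Port --ttl $TtlDays"
}

$cmd = Get-VcdCassandraCommand -NodeIPs '10.0.10.11', '10.0.10.12' -Password 'ExamplePassword'
Write-Host $cmd
```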

Note that if you are using the deploy-nossl.ps1 script you’ll also need to set ‘cassandra.use.ssl=0’ in the VCD global.properties file or VCD won’t connect to the cluster.

If everything has been successful, pasting the cell-management-tool command line into VCD should give the following:

Once VCD has been successfully configured, each cell server will need the VCD service to be restarted (service vmware-vcd restart).

Final Thoughts / Conclusion

In this post I’ve detailed a method and code to deploy a functional Cassandra cluster with SSL enabled which is suitable for use as the metrics store in a VCD environment. The scripts I’ve provided in the GitHub repo should also be easy to understand and adjust to suit other use-cases. In particular, I’ve been impressed at how well a cloud-init user_data code block or script can be integrated into a deployment workflow by setting the OVF parameters available with the PowerCLI Import-VApp cmdlet.

Hopefully this post will prove useful to those needing to deploy Cassandra clusters with SSL encryption enabled, or serve as a starting point for such deployments.

One possible enhancement, which I may look at in future, would be to automate the signing of the generated CSRs and the placement of the resulting certificates.

As always, comments and feedback appreciated,

Jon.
