vCD Edge SSL Certificate Management via PowerShell for tenants

Tom Fojta has a couple of really good blog posts about using Let’s Encrypt certificates on an NSX Edge Load Balancer. The method described in these posts relies on connectivity with the NSX management API, which is fine in an enterprise environment, but won’t work for tenants of a Service Provider vCloud Director environment, where the NSX API is not directly accessible and must instead be accessed via a vCloud Director proxy.

VMware documents the NSX proxy functionality, which allows vCloud API clients to make requests to the NSX API.

So I decided to see if I could get a script working that performed the same ‘certificate refresh’ as in Tom’s posts, but running as a tenant in a vCloud Director environment. The basic functionality I wanted was to:

  • Upload a new/replacement SSL certificate to the NSX Edge.
  • Change a Load Balancer application profile to use the new SSL certificate.
  • (Optionally) remove the ‘old’ certificate from the NSX Edge.

Ideally this would all be easily scriptable so that it could be triggered regularly (e.g. monthly) to continually extend the certificate life when using short-duration certificates such as those from Let’s Encrypt.
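
The three steps above can be sketched against a simple in-memory model of an Edge. This is only an illustration of the ordering of the steps and is not the module's code or the NSX API: the `edge` dictionary layout, the `refresh_certificate` function and the certificate/profile names are all hypothetical.

```python
def refresh_certificate(edge, profile_name, new_cert_id, new_cert_pem, remove_old=False):
    """Apply the three refresh steps to an in-memory model of an Edge.

    `edge` is a hypothetical structure, not an NSX API object:
    {"certificates": {id: pem}, "profiles": {name: {"certificate": id}}}
    Returns the id of the certificate the profile used previously.
    """
    # Step 1: upload the new/replacement certificate.
    edge["certificates"][new_cert_id] = new_cert_pem

    # Step 2: point the load balancer application profile at the new certificate.
    profile = edge["profiles"][profile_name]
    old_cert_id = profile.get("certificate")
    profile["certificate"] = new_cert_id

    # Step 3 (optional): remove the old certificate, but only when no other
    # application profile still references it.
    if remove_old and old_cert_id and old_cert_id != new_cert_id:
        still_used = any(
            p.get("certificate") == old_cert_id for p in edge["profiles"].values()
        )
        if not still_used:
            edge["certificates"].pop(old_cert_id, None)
    return old_cert_id


edge = {
    "certificates": {"cert-1": "OLD PEM"},
    "profiles": {"web-https": {"certificate": "cert-1"}},
}
previous = refresh_certificate(edge, "web-https", "cert-2", "NEW PEM", remove_old=True)
print(previous, sorted(edge["certificates"]))
```

Uploading before switching the profile means the profile never points at a certificate that does not exist yet, and checking for other references before deleting is a cautious choice made for this sketch.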

I found it was reasonably straightforward to get the script working via the vCloud proxy API to NSX, so I extended it a bit and turned it into a proper PowerShell module which I’ve also published to PS Gallery.

The project is hosted on my GitHub, and I’ve also included full documentation of the cmdlets made available from the module and examples of how these can be used.

There are still a couple of issues with the module which mean it won’t currently work on PowerShell Core, but I hope to upload a new version which fixes this – will update this post once done.

As always, I appreciate any comments/feedback. I think this module could be great for anyone wanting to use short-duration SSL certificates on services published via an NSX Edge Gateway as a tenant in a service provider environment.


vCloud Availability 3.0 – Working with the vCAV API

This entry is part 6 of 6 in the series vCloud Availability 3.0

One of the most welcome additions in vCloud Availability (vCAV) 3.0 is a public API which exposes much of the platform capability to automation and orchestration. In particular for Service Providers it is possible to relatively easily extract statistics on the number of replicated VMs, the type of replication (ongoing protection or one-off migrations), the storage consumption occupied by Point-In-Time instances and replication status.

It is also possible (I have yet to test this) to enable full configuration and re-configuration of replication via the vCAV API and this is something definitely on my list to test further.

VMware provides 2 Python scripts in the vCAV appliances which provide some good base information, but having to log in to the appliances and run these locally under the vCAV appliance root user account isn’t ideal – as a Service Provider I’d like to be able to remotely interrogate the API and retrieve information on configured replications for billing and service monitoring purposes.

VMware has published the vCAV public API specification, but at this stage I’m unsure of the exact status of this – some conversations with VMware staff have indicated that this is not ‘officially released or supported’. Undeterred, I decided to see what could be done to consume this API and write a small PowerShell module to make it easier to use with vCloud Director session credentials rather than relying on ‘root’ user access to the appliances themselves.

Note: I have definitely noticed some inconsistencies between the API usage in the VMware scripts and what is currently documented in the published specification. In some cases this prevented API calls from working and I had to reverse-engineer the calls from the appliances to get the correct syntax. This may also explain why the public vCAV API is not yet officially supported.

The results of my experimentation and development have been published to GitHub, and I’ve also made the PowerVCAV module available in PowerShell Gallery so that it can be easily installed using the PowerShell Install-Module cmdlet. Note that PowerVCAV relies on connection information from PowerCLI and the Connect-CIServer cmdlet, so PowerCLI is required.

PowerVCAV consists of 6 cmdlets to assist in managing vCAV session connections and allow easy querying and consumption of the vCAV API.

I’ve included documentation for each of the cmdlets in the github repo readme, together with some examples of the connection process and syntax.

Hopefully this module will prove useful to others who need to work with the vCAV API and provide a foundation for being able to build queries against this.

As always, comments and feedback appreciated, and if you have any suggestions for improvements feel free to log a request against the github repo.


vCloud Availability 3.0 – Protecting & Migrating VMs

This entry is part 5 of 6 in the series vCloud Availability 3.0

In the 5th part in this series of posts on vCloud Availability 3.0 I wanted to try something new, so I’ve made a short (~10 min) video showing the configuration of VM replication between an on-premise vSphere environment and a Cloud Provider service using the vSphere client plugin and the vCloud Director tenant portal.

This is a bit of an experiment for me as an alternative to long pages of screen grabs so I’d love to know what you think and if I should do more in this format. The video was captured/uploaded at 1080p so probably best viewed in that quality to be able to read details properly.

I’m working on some followup videos showing how VM failover and failback works, and how to protect/migrate VMs between 2 Cloud Provider platforms – let me know what else you’d like to see too.


vCloud Availability 3.0 – On-Premise Deployment & Configuration

This entry is part 4 of 6 in the series vCloud Availability 3.0

In the first 3 parts of this series I detailed the configuration and deployment of vCAv 3.0 into a service provider site. In this 4th part I show the deployment and configuration of vCAv into a tenant on-premise infrastructure. This allows appropriately configured tenants to protect on-premise VMs to a cloud provider as well as protect cloud-hosted VMs back to their own on-premise infrastructure.

In this configuration I will use the already-configured Christchurch lab cloud environment as the endpoint and configure vCAv into an on-premise infrastructure at my test ‘Tyrell’ tenant environment which is configured with its own small vSphere SDDC based on vCenter, ESXi and VSAN storage. Note that NSX networking is not required in the tenant site, and only outbound tcp/443 (https) connectivity is required between the tenant and cloud provider site which makes vCAv almost trivially easy for customers to deploy into their datacenters.

The VMware documentation includes a page on deploying vCAv at a tenant/on-premise site. The process below shows the deployment of the vCAv tenant appliance and its configuration once deployed to connect to a cloud provider site.

Download the OnPrem version of the vCAv appliance. Note that this is different from the image used to deploy a Cloud Provider site and is listed under the ‘Drivers & Tools’ tab on the download page.

Log in to the on-premise vCenter, select the option to deploy an OVF Template and select the downloaded OnPrem appliance:

Specify the VM name to be assigned to the vCAv OnPrem appliance and select the datacenter where it will be deployed:

Next select the vSphere cluster where the vCAv appliance will run:

The next screen allows you to review the template details prior to deployment:

You must agree to the VMware EULA before proceeding:

Choose the datastore and storage policy for the vCAv OnPrem appliance deployment (in this example I only have a single VSAN datastore to select from):

Select the network for the vCAv appliance to connect on and the IP assignment type:

Assign the initial ‘root’ user password (as with the Cloud appliance, this will be forced to change on first login to the appliance UI) and configure appropriate networking settings – IP address, subnet mask, default gateway, DNS servers, DNS domain and NTP server:

Review the summary screen and click ‘Finish’ to initiate the appliance deployment:

Once the appliance deployment has completed, you can power it on and access the configuration UI at https://<appliance-deployed-ip> to continue with configuration. Once you have logged in and changed the ‘root’ user password you will see the screen below:

Click to run the initial setup wizard and enter a name for the on-premise site:

Enter the Lookup service address and authentication details for the local (on-prem) vCenter/PSC infrastructure (in this example I’m once again using vCenter with an embedded PSC):

Next specify the URI for the public API endpoint for the cloud-provider instance of vCloud Availability (not the vCloud Director public endpoint) and provide login credentials to the tenant organization within that cloud environment. You will need to confirm the provided SSL certificate as the connection is made.

Note: If you select the ‘Allow Access from Cloud’ option, then administrators in the Cloud Provider will gain capability in the local vCenter/vSphere environment – here’s what the VMware documentation says on this:

“By selecting this option you allow the cloud provider and the organization administrators to execute the following operations from the vCloud Availability Portal without authenticating to the on-premises site.

  • Discover on-premises workloads and replicate them to the cloud.
  • Reverse existing replications to the on-premises site.
  • Replicate cloud workloads to the on-premises site.

By leaving this option deselected, only users authenticated to the on-premises vCloud Availability Portal can configure new replications and existing replications cannot be reversed from the vCloud Availability Portal.”

Make sure you understand the implications of selecting this option (or not selecting this option) when configuring the appliance and set it appropriately. In my lab environment I enabled the setting.

Confirm / deny participation in VMware CEIP in the next screen:

In the confirmation screen, you can use the slider to continue to the ‘Configure local placement’ dialogs on completion of the initial OnPrem appliance configuration. If you choose, you can complete this process separately later, but in this example I chose to continue and configure local VM placement immediately:

The placement configuration allows you to select the environment that will be used by VMs replicated from the cloud to the OnPrem environment, in the first screen select the VM folder destination where the VMs will appear:

Next select the compute cluster where the VMs will be registered:

Next select the default network to which the replicated VMs will be attached:

Select the vSphere Datastore where the replicated VM disks will be stored:

Finally review the supplied details and complete the placement configuration:

The ‘Configuration’ tab in the appliance should now show the configured values as shown below:

Clicking the ‘System Monitoring’ tab should show connectivity to all services and to the remote Service Provider cloud:

Signing into the local vCenter environment should now display the banner (shown below) as the vCAv plugin is registered into the local vCenter:

Clicking the ‘Refresh Browser’ button and going to the ‘Home’ link in vCenter will now show a new menu entry for vCloud Availability:

vCloud Availability tab in vCenter HTML5 UI post-installation & Configuration of the OnPrem vCAv Appliance

Selecting the ‘vCloud Availability’ link in vCenter will open a new panel showing the vCloud Availability interface:

That completes the configuration of an on-premise connection to vCloud Availability – at this point we can now replicate VMs both to and from the Cloud Service Provider infrastructure to our own vSphere cluster. Although there are quite a few steps in the process, I hope you can see that the configuration of the OnPrem appliance is actually very straightforward and easy. A particular advantage compared to previous deployments with other products such as vCloud Availability and vCloud Extender is that no inbound firewall or NAT rules are required in the vCAv OnPrem configuration with v3.0.

This concludes the 4th part of this series looking at vCloud Availability 3.0. Now that we have both Cloud-to-Cloud (Parts 2 & 3) and OnPrem-to-Cloud (this part) configurations completed, in the next part I’ll look at configuring VM replication protection and failover/migration of replicated VMs.

As always, corrections, comments and feedback welcome!


vCloud Availability 3.0 – Site Pairing & vCAv Policies

This entry is part 3 of 6 in the series vCloud Availability 3.0

The first 2 parts of this series covered the overall vCloud Availability (vCAv) architecture and the deployment and configuration of the vCAv appliances into a Cloud Provider site. Before continuing with pairing sites and configuring VM replication policies, first check that all services are online and showing as healthy.

The easiest place to do this is the ‘System Monitoring’ screen in the vApp Replication Manager portal for each site (Auckland and Christchurch in my lab). The resulting panels look like this (the ‘Local replicators’ tab has been expanded in both sites):

System Monitoring screens for both sites prior to site pairing

Note: If you have changed the SSL certificate for the vApp Replication Manager portal as mentioned at the end of my previous post, you may see the ‘Tunnel connectivity’ item showing red with a ‘requires authentication’ error. If so, simply access the configuration tab, click ‘Edit’ next to the ‘Tunnel address’ and provide the appliance password when prompted as shown in the screen below:

Re-authenticate Tunnel Service after SSL certificate change

Note: I had a number of instances in my lab setup where the ‘Network’ entry (arrowed & boxed in green above) for the vApp Replication Manager had changed to be the public URL for vCAv. If this occurs you will not be able to pair sites as the tunnel appliance will redirect the ‘management’ traffic back to itself. To fix this, edit the entry and point this back to the internal name/IP of the vApp Replication Manager appliance and re-enter the appliance password.

Based on my experiences in testing, I strongly suggest at this stage that you do not continue attempting to pair vCAv cloud sites until you have resolved any issues and have all System Monitoring links showing as Green/Ok. I had a number of issues in my early lab attempts to configure vCAv which would likely have been avoided if I’d done this…

You also at this point need to make sure that your public API endpoint link is added to your firewall and NAT configuration so that internet access to your vCAv public API address is passed to port 8048 on the Tunnel appliance. Once configured properly, accessing the public vCAv API address from a browser should show the vCAv portal login screen:

Checking that your public vCAv URI is accessible

Pairing vCloud Availability Cloud Provider Sites

To pair the vCAv sites, first log out of any vCAv portals and go to the user login at https://<IP-address-of-vApp-Replication-Manager>/ui/login (as opposed to /ui/admin). The username field will show ‘user@org’ instead of the ‘root’ Appliance login presented by the /ui/admin portal. Log in with your vCloud Director provider credentials (e.g. ‘administrator@system’).

Once logged in, selecting the ‘Sites’ option should show a screen similar to the following:

Site Pairing – Check that the endpoint URL matches your vCAv public URL

Click ‘New Pairing’ and provide the details for the 2nd vCAv site (Since I’m configuring the site pair on my Auckland site I enter the details for the Christchurch site to pair):

Use the vCAv public URL from the partner site

Accept the SSL certificate from the paired site and you should be able to successfully pair the sites, the Sites window should now look like this:

Site Pairing Completed

Note: It is only necessary to perform this pairing on one site provided you specify the remote appliance credentials; vCAv will automatically associate the sites in both directions.

If we select the 2nd (C00-Christchurch) site, a login button appears allowing us to authenticate as our vCloud Director provider admin user (administrator@system) to the 2nd site. After successful authentication the Sites tab will show us with a management session to both sites:

If you now check the ‘System Monitoring’ tab, you should see that both the local (to the site you are logged in to) and remote vCAv Replicators are shown and connected:

Note: The odd ‘Address’ shown for the Remote replicator (boxed in red above) is correct – this is an internal address used to reach the remote replicator appliance via the vCAv Tunnel.

vCloud Availability Policies

Now that we have 2 paired vCAv sites, we can configure policies and assign these to vCloud Director tenants to allow these tenants to configure protection for their VMs.

Again working in the vApp Replication Manager portal and signed in as our vCloud Director provider account, we can see the vCAv policies under the ‘Policies’ tab. The default vCAv policy, which is assigned to all vCloud tenants by default, forbids any replications:

Creating a new vCAv Policy

Selecting the ‘New’ button allows us to configure a new policy:

Configuring a new vCAv Policy

Replication can be allowed in either direction and the maximum number of retained instances (snapshots) per VM replication can also be set between 1 and 24. The minimum allowed RPO (set to 4 hours in this example, but configurable from 5 minutes to 24 hours) prevents users of the policy configuring smaller RPOs than defined in the policy.
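
The limits described above can be sketched as a small validation routine. This is a minimal illustration that assumes only the ranges quoted in this post (1 to 24 retained instances, minimum RPO between 5 minutes and 24 hours); the `ReplicationPolicy` class and `check_request` function are hypothetical names and are not part of vCAv or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical model of the policy limits described in the post."""
    min_rpo_minutes: int   # smallest RPO a tenant may configure
    max_instances: int     # retained instances (snapshots) per VM replication

    def __post_init__(self):
        # Ranges taken from the post: RPO 5 minutes to 24 hours, instances 1 to 24.
        if not 5 <= self.min_rpo_minutes <= 24 * 60:
            raise ValueError("minimum RPO must be between 5 minutes and 24 hours")
        if not 1 <= self.max_instances <= 24:
            raise ValueError("retained instances must be between 1 and 24")


def check_request(policy, rpo_minutes, instances):
    """Return a list of reasons a requested replication breaks the policy."""
    problems = []
    if rpo_minutes < policy.min_rpo_minutes:
        problems.append(
            f"RPO of {rpo_minutes} min is below the policy minimum "
            f"of {policy.min_rpo_minutes} min"
        )
    if not 1 <= instances <= policy.max_instances:
        problems.append(
            f"{instances} retained instances is outside 1..{policy.max_instances}"
        )
    return problems


# The example policy in the post uses a 4 hour minimum RPO; the instance
# count of 4 here is an arbitrary value chosen for illustration.
policy = ReplicationPolicy(min_rpo_minutes=4 * 60, max_instances=4)
print(check_request(policy, rpo_minutes=240, instances=4))
print(check_request(policy, rpo_minutes=15, instances=30))
```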

Note: You will need to configure and assign a policy in both sites (Auckland and Christchurch in my lab environment) to permit configuration of VM replication. Policies do not automatically replicate between cloud sites, neither do assignments of policies to organizations/tenants.

With the policy just created selected, we can now select the ‘assign’ link to add tenant organizations to the policy as shown below:

Assigning vCAv policy to Organization

The popup dialog that appears on clicking ‘Assign’ allows one (or many) vCloud tenant Organizations to be associated with the specified policy.

Note: A vCloud Organization can only ever be assigned to one vCAv policy at a time. If a tenant is assigned to a new policy, any previous assignment for that tenant will be removed. In the screen below I assign the new policy to the ‘Tyrell’ tenant organization in vCloud Director:

Assigning vCAv Policy to a vCloud Tenant Organization

In order for the tenant to be able to configure VM replication, a policy must be defined and assigned in both sites to the organization (in order to have resources at both sites the tenant must also be defined and have virtual datacenter resources assigned to them in both sites). In my lab setup the ‘Tyrell’ organization has VDCs assigned in both sites and I have also configured an identical vCAv policy in the 2nd site and assigned it.
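
The assignment rules can be modelled in a few lines. This is a sketch under stated assumptions rather than anything from the vCAv API: the `sites` structure, the function names, the policy name `4-hour-rpo` and the name `default` for the built-in policy that forbids replication are all hypothetical.

```python
DEFAULT_POLICY = "default"  # hypothetical name for the built-in 'no replication' policy


def assign_policy(site, org, policy):
    """Assign an organization to a policy at one site.

    An organization holds exactly one policy at a time, so the new
    assignment simply replaces the old one. Returns the previous policy.
    """
    previous = site["assignments"].get(org, DEFAULT_POLICY)
    site["assignments"][org] = policy
    return previous


def can_configure_replication(org, sites):
    """True only if the org has a VDC and a non-default policy at every site."""
    for site in sites.values():
        if org not in site["orgs_with_vdc"]:
            return False
        if site["assignments"].get(org, DEFAULT_POLICY) == DEFAULT_POLICY:
            return False
    return True


sites = {
    "Auckland": {"assignments": {}, "orgs_with_vdc": {"Tyrell"}},
    "Christchurch": {"assignments": {}, "orgs_with_vdc": {"Tyrell"}},
}

# Policies and assignments do not replicate between sites, so assigning
# in Auckland alone is not enough.
assign_policy(sites["Auckland"], "Tyrell", "4-hour-rpo")
only_one_site = can_configure_replication("Tyrell", sites)

assign_policy(sites["Christchurch"], "Tyrell", "4-hour-rpo")
both_sites = can_configure_replication("Tyrell", sites)
print(only_one_site, both_sites)
```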

That is all that is required in order for tenants to be able to configure VM replication in their vCloud Director portal and perform migration and VM failover between sites.

In the next post in this series I will detail the configuration steps to link an on-premise vSphere environment to connect to a vCloud Service Provider site using vCAv.

vCloud Availability 3.0 – Cloud Deployment & Configuration

This entry is part 2 of 6 in the series vCloud Availability 3.0

Deployment Configuration

In my lab environment I have two SP datacenter locations (Auckland and Christchurch since I’m in New Zealand) and a complete vCloud infrastructure running in each location. I have defined the appliance names and IP addresses prior to deploying vCAv and registered these in DNS prior to starting deployment as this will simplify the configuration later. My lab sites happen to have network connectivity via a VPN, but this is not important for vCAv as all network communication between the sites will be via the Tunnel appliances and the external (public) network.

Note: This was one of the first issues that I encountered when building the environment – I assumed that replication traffic would be capable of using internal networking between the replicator appliances, but this is not the case in the current release of vCAv and all communication must use the Tunnel appliances’ public network.

In order to deploy vCAv into a production-like configuration, 3 appliances are required in each vCloud site. Since my lab configuration spans 2 sites I will need a total of 6 appliances. While the vCloud Availability documentation covers deploying appliances in the vCenter UI well, I found it much easier (and more reproducible when testing) to use a DOS batch file to deploy the appliances using VMware OVFTool. In my lab environment I defined the following names and IP addresses for the appliances:

Site 1 (Auckland):

Appliance Name     Deployment Type   IP Address     Administration URI(s)
vdev-a03-vcam      cloud             10.207.0.44    vCA Replication Manager:
                                                    vCA vApp Replication Manager:
vdev-a03-vcar01    replicator        10.207.0.45    vCA Replicator:
vdev-a03-vcat      tunnel            10.207.0.46    vCA Tunnel:

Site 2 (Christchurch):

Appliance Name     Deployment Type   IP Address     Administration URI(s)
vdev-c00-vcam      cloud             10.200.0.44    vCA Replication Manager:
                                                    vCA vApp Replication Manager:
vdev-c00-vcar01    replicator        10.200.0.45    vCA Replicator:
vdev-c00-vcat      tunnel            10.200.0.46    vCA Tunnel:

I then used 6 copies of the following file (saved with a .cmd extension on a Windows admin machine) to deploy the appliances, changing the variable assignments as appropriate – the example below deploys the ‘cloud’ appliance in the Christchurch site. If you use this, change the relevant parameters to suit your environment, as well as the file locations of the ovftool.exe file and the vCloud Availability deployment .OVA file.

Note: OVFTOOL is extremely sensitive to syntax, so make sure you carefully check the entries provided. Also note that if any passwords contain certain special characters this can cause OVFTOOL issues (single and double quotation marks in particular) and you may need to use an alternative administrative account that does not have these characters in its password.

Note: If the appliances deploy but their consoles show that no networking is configured this most likely means that one or more of the parameters supplied are not in the correct format (in particular, don’t use single-quote marks around values as shown in the example deployment for Linux in the VMware documentation).

The script will create a log file ‘<VM name>-deploy.log’ in the folder it is run from showing the results of the ovftool command for troubleshooting.

@echo off

::Appliance deployment details:
SET DEPLOYTYPE=<One of 'cloud', 'replicator' or 'tunnel' (without ' marks) depending on appliance function>
SET VMNAME=<name for the VM>
SET VMIP=<IP address for the VM>
SET ROOTPASS=<Initial root password on the appliance - will be forced to change on first login>

::File locations for vCAv and OVFTOOL.EXE:
SET VCAIMAGE="%HOMEPATH%\Downloads\VMware-vCloud-Availability-"
SET OVFTOOL="C:\Program Files\VMware\VMware OVF Tool\ovftool.exe"

::Target vCenter:
SET VIHOST=<vCenter host name>
SET VIUSER=<vCenter admin user - e.g. administrator@vsphere.local>
SET VIPASS=<vSphere Password>
SET VILOCATOR=<vCenter Locator - e.g. C00/host/DEVCLU-C00>

::Storage & Networking for Appliance:
SET VMDS=<vCenter Datastore for appliance>
SET VMNET=<vCenter Network name for appliance>
SET NTPSERV=<NTP Server IP address for appliance>
SET DNSSERV=<DNS Server(s) for appliance - comma separated>
SET DNSDOMAIN=<DNS Domain Name for appliance>
SET IPGATEWAY=<Default IP Gateway for appliance>
SET IPNETMASK=<Subnet Mask for appliance network>

%OVFTOOL% --name="%VMNAME%" --datastore="%VMDS%" --acceptAllEulas^
 --powerOn --X:enableHiddenProperties --X:injectOvfEnv --X:waitForIp^
 --ipAllocationPolicy=fixedPolicy --deploymentOption=%DEPLOYTYPE% --machineOutput^
 --noSSLVerify --overwrite --powerOffTarget "--net:VM Network=%VMNET%"^
 --diskMode=thin --X:logFile=%VMNAME%-deploy.log --X:logLevel=verbose^

As the syntax is so fiddly, I’ve included a (working) example of the script used to deploy the ‘cloud’ appliance in the Christchurch site below unedited apart from password redaction:

@echo off

::Appliance deployment details:
SET VMNAME=vdev-c00-vcam

::File locations for vCAv and OVFTOOL.EXE:
SET VCAIMAGE="%HOMEPATH%\Downloads\VMware-vCloud-Availability-"
SET OVFTOOL="C:\Program Files\VMware\VMware OVF Tool\ovftool.exe"

::Target vCenter:
SET VIHOST=vdev-c00-vc01.vdev.local
SET VIUSER=administrator@vsphere.local
SET VIPASS=<Redacted>

::Storage & Networking for Appliance:
SET DNSDOMAIN=vdev.local

%OVFTOOL% --name="%VMNAME%" --datastore="%VMDS%" --acceptAllEulas^
 --powerOn --X:enableHiddenProperties --X:injectOvfEnv --X:waitForIp^
 --ipAllocationPolicy=fixedPolicy --deploymentOption=%DEPLOYTYPE% --machineOutput^
 --noSSLVerify --overwrite --powerOffTarget "--net:VM Network=%VMNET%"^
 --diskMode=thin --X:logFile=%VMNAME%-deploy.log --X:logLevel=verbose^
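
Since six near-identical copies of the script are needed, the per-appliance ‘Appliance deployment details’ lines could be generated rather than edited by hand. The sketch below is an assumption-laden convenience and not part of the original deployment process: the `deployment_header` and `log_file_name` helpers are hypothetical, and only the names, deployment types and IP addresses from the tables above are used.

```python
import ipaddress

# Names, deployment types and IP addresses from the lab tables.
APPLIANCES = [
    ("vdev-a03-vcam", "cloud", "10.207.0.44"),
    ("vdev-a03-vcar01", "replicator", "10.207.0.45"),
    ("vdev-a03-vcat", "tunnel", "10.207.0.46"),
    ("vdev-c00-vcam", "cloud", "10.200.0.44"),
    ("vdev-c00-vcar01", "replicator", "10.200.0.45"),
    ("vdev-c00-vcat", "tunnel", "10.200.0.46"),
]
VALID_TYPES = {"cloud", "replicator", "tunnel"}


def deployment_header(name, deploy_type, ip):
    """Build the appliance-specific SET lines for one copy of the .cmd file."""
    if deploy_type not in VALID_TYPES:
        raise ValueError(f"unknown deployment type: {deploy_type}")
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    # Values are written without quote marks, as the notes above advise.
    return "\n".join([
        "::Appliance deployment details:",
        f"SET DEPLOYTYPE={deploy_type}",
        f"SET VMNAME={name}",
        f"SET VMIP={ip}",
    ])


def log_file_name(name):
    """Name of the log file the script writes: '<VM name>-deploy.log'."""
    return f"{name}-deploy.log"


for appliance in APPLIANCES:
    print(deployment_header(*appliance))
    print("::Log file:", log_file_name(appliance[0]))
    print()
```

Validating the deployment type and IP address before writing the file catches the kind of formatting mistake that otherwise only shows up as an appliance with no networking configured.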

Once the appliances are deployed and started, signing into the admin URI listed in the table above first forces a password change for the root appliance user which must be completed on each appliance.

Note: The ‘root’ account is common between the 2 sites which run on the ‘cloud’ (vApp Replication Manager) appliance so only needs to be changed once here:

Changing appliance root password

The VMware documentation has very good guides for configuring the appliances once deployed, and I’ve included screenshots below showing the relevant settings at each step. I’ve shown the generic (documentation) URI and the specific URI in my lab for the Auckland site for each step as it can get confusing which administrative console you should actually be using in each step. I’ve also linked each step to the relevant section of the VMware documentation to make it easier to follow.

Step 1 – Configure vCloud Availability Replication Manager
Admin Link: https://<vApp-Replication-Manager-IP-address>:8441/ui/admin
Lab Link: (vdev-a03-vcam)

Since my lab uses vCenter servers with embedded Platform Services Controllers (PSC), the Lookup Service address is actually on the vCenter server. You will need to confirm the Lookup Service certificate to configure this setting.

Step 1 – Configured Lookup service address in vCAv Replication Manager

Step 2 – Configure a vCloud Availability vApp Replication Manager
Admin Link: https://<vApp-Replication-Manager-IP-address>/ui/admin
Lab Link: (vdev-a03-vcam)

Step 2 – Run initial setup wizard
Step 2 – Enter site name and public API endpoint

Note: The Public API endpoint in this dialog should be set to the public DNS name which will eventually be used to access vCAv from the internet by your tenants. This should be different to the URI used to access the vCloud Director portal.

Step 2 – Configure connection to Lookup service (accept certificate)
Step 2 – Configure vCloud Director API endpoint (accept certificate)
Step 2 – Enter vCAv 3.x License key
Step 2 – Confirm participation in CEIP
Step 2 – Completion / summary screen

After completing the wizard, clicking the ‘System Monitoring’ tab should show a screen similar to the one shown below. At this stage the two warnings for Tunnel connectivity and Configured replicators are normal/expected as we haven’t completed these steps yet.

Step 2 – Post initial setup wizard

Step 3 – Configure vCloud Availability Replicator Appliance
Admin Link: https://<vApp-Replicator-Appliance-IP-address>/ui/admin
Lab Link: (vdev-a03-vcar01)

Step 3 – Configure Replicator Lookup Service

Once configured (and the certificate accepted), you should see the Replicator appliance System Monitoring screen similar to below:

Step 3 – Replicator appliance with Lookup service configured

Step 4 – Register a vCloud Availability Replicator with a vCloud Availability Replication Manager in the Same Site
Admin Link: https://<vApp-Replication-Manager-IP-address>:8441/ui/admin
Lab Link: (vdev-a03-vcam)

Step 4 – Select ‘Replicators’ option then ‘New’
Step 4 – Completing the New Replicator settings

Note: Configure port 8043 on the replicator appliance – the VMware documentation shows port 8440 for this (presumably from a ‘combined’ appliance deployment). When you click ‘Add’ you will need to accept the certificate from the Replicator appliance.

Step 4 – Replication Manager with Replicator appliance added

Step 5 – Configure vCloud Availability Tunnel
Admin Link: https://<Tunnel-Appliance-IP-address>/ui/admin
Lab Link: (vdev-a03-vcat)

Step 5 – Configure Lookup service on Tunnel Appliance

After configuring the Lookup Service, check that the System Monitoring tab shows connectivity:

Step 5 – Checking Tunnel Appliance Lookup service connectivity

Step 6 – Enable vCloud Availability Tunnel
Admin Link: https://<vApp-Replication-Manager-IP-address>/ui/admin
Lab Link: (vdev-a03-vcam)

Step 6 – vApp Replication Manager console

Selecting the ‘Configuration’ tab brings up the following screen:

Note: If you are placing the Tunnel appliance behind a NAT firewall (recommended) and using DNAT port-translation from tcp/443 (externally) to 8048 (internally on the Tunnel appliance), you should click ‘Edit’ on the ‘Public API endpoint’ and update this to reflect the external port (443) at this stage. This configuration allows tenants/users to see the vCAv portal externally on port 443 and prevents them needing to open any additional outbound firewall ports.
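
The relationship between the external port and the Tunnel appliance port can be written down as data. This is a hedged sketch: the `dnat_rule` and `public_endpoint` helpers are hypothetical, the host name `vcav.example.com` and public address `203.0.113.10` are placeholder values, and the rule dictionary is not the syntax of any particular firewall.

```python
TUNNEL_INTERNAL_PORT = 8048  # port the Tunnel appliance listens on internally


def dnat_rule(public_ip, tunnel_ip, external_port=443):
    """Describe a DNAT port translation from the public address to the Tunnel."""
    return {
        "protocol": "tcp",
        "original": (public_ip, external_port),
        "translated": (tunnel_ip, TUNNEL_INTERNAL_PORT),
    }


def public_endpoint(host, external_port=443):
    """Endpoint to enter as the 'Public API endpoint': the external port, not 8048."""
    return f"https://{host}:{external_port}"


rule = dnat_rule("203.0.113.10", "10.207.0.46")
endpoint = public_endpoint("vcav.example.com")
print(rule)
print(endpoint)
```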

Step 6 – Edit Tunnel settings
Step 6 – Configuring the Tunneling settings

Accept the certificate when prompted to save the tunnel configuration.

Step 7 – Restart Services
Admin Link: https://<vApp-Replication-Manager-IP-address>/ui/admin
Lab Link: (vdev-a03-vcam)
Admin Link: https://<vApp-Replicator-Appliance-IP-address>/ui/admin
Lab Link: (vdev-a03-vcar01)

As mentioned in the VMware documentation and in the warning on the tunnel configuration dialog shown above, you must now restart all vCAv services on the local site vApp Replication Manager and Replicator appliances – simply log in to each appliance and under ‘System Monitoring’ click the ‘Restart Service’ button:

Step 7 – Restart Services

When accessing vCloud Availability inside the vCloud Director portal, the SSL certificate used to render the plugin data will originate from the vCloud Availability vApp Replication Manager portal. For this reason, it is a good idea at this stage to replace the self-signed certificate generated when the appliance is deployed with a ‘proper’ SSL certificate which is registered to the public URI that vCAv is using.

For example, the vApp Replication Manager portal should be reconfigured to use an SSL certificate which is valid for the host name used in the vCloud Availability Public API endpoint.

The process to reconfigure the SSL certificate in the vApp Replication Manager portal is described in the VMware documentation.

Important Note for Wildcard SSL Certificates: If you are using wildcard SSL certificates, you CANNOT use the same certificate when configuring the vApp Replication Manager portals in multiple Service Provider sites. This is because the site-pairing operation checks the SSL certificate thumbprint being used in each site and will refuse to pair sites if the same thumbprint is detected at both sites. Use dedicated SSL certificates at each site when configuring multiple vCAv cloud endpoints.
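
A quick way to check for this before attempting to pair is to compare the certificate thumbprints yourself. The sketch below assumes a SHA-256 digest of the DER-encoded certificate; which digest vCAv actually compares is not stated here, and the `thumbprint` and `can_pair` helpers are hypothetical. The byte strings are stand-ins for real certificate data.

```python
import hashlib


def thumbprint(der_bytes):
    """Colon-separated SHA-256 thumbprint of a DER-encoded certificate."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def can_pair(local_der, remote_der):
    """Pairing is refused when both sites present the same certificate."""
    return thumbprint(local_der) != thumbprint(remote_der)


# Stand-in bytes; real input would be each site's certificate in DER form.
auckland_cert = b"certificate used at the Auckland site"
christchurch_cert = b"certificate used at the Christchurch site"

print(can_pair(auckland_cert, christchurch_cert))
print(can_pair(auckland_cert, auckland_cert))
```

If the certificates are held as PEM text, Python's `ssl.PEM_cert_to_DER_cert` converts them to the DER bytes this sketch expects.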

The next part of this series will detail pairing the 2 deployed Service Provider instances and how VM replication policies can be defined and assigned to cloud tenants to allow them to start protecting their VMs.

vCloud Availability 3.0 – Introduction

This entry is part 1 of 6 in the series vCloud Availability 3.0

VMware has recently released version 3.0 of vCloud Availability (vCAv) (Release Notes) which allows vCloud Service Providers to offer a variety of VM protection and migration services to their tenant customers. vCAv 3.0 combines features previously available in 3 separate VMware products (vCloud Availability Cloud-to-Cloud DR, vCloud Availability for vCloud Director and vCloud Extender) and allows providers to:

  • Protect/replicate and failover VMs to/from on-premise vSphere environments to a vCloud Service Provider.
  • Protect/replicate and failover VMs between 2 virtual datacenters provided by a vCloud Service Provider (these would generally be in 2 distinct geographic locations).
  • Migrate VMs to/from on-premise vSphere environments and a vCloud Service Provider.
vCloud Availability 3.0 Functions (Image is (c)VMware 2019)

vCloud Availability 3.0 (vCAv) also supports advanced functionality usually reserved for products such as VMware Site Recovery Manager (SRM) such as allowing VM network information to be changed during failover to ensure VMs can connect to the destination network when failed-over or migrated. The tenant administrative portal is tightly integrated into VMware vCloud Director allowing full control of VM replication tasks in the same interface used by tenants to administer their virtual machines.

Service Providers can define policies and apply these on a per-tenant basis to control items such as:

  • How many customer VMs can be replicated (a fixed number of VMs or ‘unlimited’).
  • What the minimum configurable RPO interval is for VM replication (as low as 5 minutes for vSphere 6.5+ environments and up to 24 hours).
  • How many snapshots of each VM can be retained (from 1 to 24).
vCloud Availability Policy Definition

Since the release of vCAv 3.0 I’ve been deploying and testing the solution components. This is the first part in a series of posts designed to emulate a complete ‘real-world’ deployment consisting of 2 distinct cloud provider sites and a ‘customer’ on-premises infrastructure, so I can detail all of the deployment, configuration and end-user usage scenarios across these.

To configure a production-realistic environment, I have deployed separate vCAv appliances for the ‘cloud’, ‘replicator’ and ‘tunnel’ functions. A typical service provider network layout, with the ports used by vCAv for communication, is shown in the diagram below. Note that in an actual production implementation the ‘tunnel’ appliance would generally be deployed into a DMZ network with the ‘cloud’ (Replication Manager and vApp Replication Manager) and ‘replicator’ appliances deployed into the Service Provider management network.

vCloud Availability 3.0 Network Architecture & Ports

This concludes the first post in this series. In future posts I aim to cover:

  • Deployment and configuration of vCAv appliances into a Cloud Service Provider
  • Pairing Cloud Provider Sites, Defining VM replication policies and assigning these to tenants
  • On-premise deployment and configuration into a customer vSphere cluster
  • Protecting / replicating VMs from Cloud to Cloud, On-Premise to Cloud and Cloud to On-Premise (migration, failover and failback)
  • Monitoring and Troubleshooting vCloud Availability services
  • Conclusions, References and further reading

As always, corrections, comments and feedback are appreciated.


vCloud Director 9.7 Portal Customization

One of the nicest additions to the new VMware vCloud Director 9.7 release is the ability to more fully customize the tenant portal. This now includes the capability to define custom links (together with section groupings / separators) and also the capability to customize the portal (and links) on a per-tenant basis:


To help take advantage of this, I’ve updated my vcd-h5-themes module on Github to understand the new capabilities in vCloud Director v9.7 (API version 32.0) to allow easier manipulation of the portal branding configuration options.

In particular the ‘Set-Branding’ cmdlet can now take a PSObject parameter with the customization links to be overridden or added to the portal (more about what this means later), it will also now take an optional parameter to limit the scope to a single vCD tenant organization (rather than applying changes to the system-default branding).

The ‘Get-Branding’ cmdlet has also been updated and can now retrieve either the global default branding, or the branding from a specific tenant organization.

There are actually 2 types of links that can be specified:

1) The ‘Help’ and the ‘About’ links under the circled ? icon can be redirected to other sites/pages (rather than showing the default VMware pages)

2) The menu under the current username (highlighted in red above) can be extended with any number of new sections, separators and links to other pages.

The way these are performed is slightly different, but both are placed into the customLinks object passed to Set-Branding.

Worked example

Let’s say that we want to make the following changes to the portal links:

– The ‘About’ link under the ? icon should redirect to our company about page at instead of the default VMware ‘About’ page.

– The Extensible menu under the username drop-down should have the following structure:

+– Help Desk (redirecting to
+– Contact Us (redirecting to mailto with a subject line of ‘Web Support’)
—– (Separator)
+– Other services (redirecting to
—– (Separator)
Terms & Conditions (redirecting to

To create these changes, we need to build a customLink object in PowerShell that reflects this arrangement, the code to do this is shown below. Running this code will create a PowerShell object variable ‘$mylinks’ which can then be passed to the Set-Branding cmdlet:

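### Create $mylinks variable with our branding menu structure
$mylinks = [PSCustomObject]@(
    # Override the default 'about' link to redirect to
    @{ name="about"; menuItemType="override"; url="" },
    # Add the section name 'Support':
    @{ name="Support"; menuItemType="section" },
    # Add the 'Help Desk' link:
    @{ name="Help Desk"; menuItemType="link"; url="" },
    # Add the 'Contact Us' link:
    @{ name="Contact Us"; menuItemType="link"; url=" Support" },
    # Add the Separator:
    @{ menuItemType="separator" },
    # Add the 'Services' group:
    @{ name="Services"; menuItemType="section" },
    # Add the 'Other services' link:
    @{ name="Other services"; menuItemType="link"; url="" },
    # Add the 2nd Separator:
    @{ menuItemType="separator" },
    # Add the 'Terms & Conditions' link:
    @{ name="Terms & Conditions"; menuItemType="link"; url="" }
)
### End of File ###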

The syntax is a bit fiddly here – in particular make sure that you place quote marks around each value as shown above – it may be easier to copy this script and edit the values rather than creating from scratch.

To test the object has been created successfully prior to configuring the portal, you can do ‘ConvertTo-Json $mylinks’ which should show a well-formatted JSON object if everything is correct:

[
    {
        "menuItemType": "override",
        "url": "",
        "name": "about"
    },
    {
        "menuItemType": "section",
        "name": "Support"
    },
    {
        "menuItemType": "link",
        "url": "",
        "name": "Help Desk"
    },
    {
        "menuItemType": "link",
        "url": " Support",
        "name": "Contact Us"
    },
    {
        "menuItemType": "separator"
    },
    {
        "menuItemType": "section",
        "name": "Services"
    },
    {
        "menuItemType": "link",
        "url": "",
        "name": "Other services"
    },
    {
        "menuItemType": "separator"
    },
    {
        "menuItemType": "link",
        "url": "",
        "name": "Terms & Conditions"
    }
]

To set our branding, first use Connect-CIServer to connect to the appropriate cloud in the ‘System’ context, then run:

Set-Branding -customLinks $mylinks
Branding configuration sent successfully.

You can also use the ‘-Tenant’ switch to apply the changes to a specific tenant organization only.

When we look in the vCD HTML5 portal clicking on our username in the top-right of the portal we can now see our new link structure in place:


In addition, the ‘About’ option under the menu obtained by clicking the circled ? will now redirect to our own site:


Dynamic Persistent Volumes with CSE Kubernetes and Ceph


Application containerization with Docker is fast becoming the default deployment pattern for many business applications and Kubernetes (k8s) the method of managing these workloads. While containers generally should be stateless and ephemeral (able to be deployed, scaled and deleted at will) almost all business applications require data persistence of some form. In some cases it is appropriate to offload this to an external system (a database, file store or object store in public cloud environments are common for example).

This doesn’t cover all storage requirements though, and if you are running k8s in your own environment or in a hosted service provider environment you may not have access to compatible or appropriate storage. One solution for this is to build a storage platform alongside a Kubernetes cluster which can provide storage persistence while operating in a similar deployment pattern to the k8s cluster itself (scalable, clustered, highly available and no single points of failure).

VMware Container Service Extension (CSE) for vCloud Director (vCD) is an automated way for customers of vCloud powered service providers to easily deploy, scale and manage k8s clusters. However, CSE currently only provides a limited storage option (an NFS storage server added to the cluster) and k8s persistent volumes (PVs) have to be pre-provisioned in NFS and assigned to containers/pods rather than being generated on-demand. This can also lead to availability, scale and performance issues because the pod storage is located on a single server VM.

There is certainly no ‘right’ answer to the question of persistent storage for k8s clusters – often the choice will be driven by what is available in the platform you are deploying to and the security, availability and performance requirements for this storage.

In this post I will detail a deployment using a ceph storage cluster to provide a highly available and scalable storage platform and the configuration required to enable a CSE deployed k8s cluster to use dynamic persistent volumes (DPVs) in this environment.

Due to the large number of servers/VMs involved, and the possibility of confusion / working on the wrong server console – I’ve added buttons like this prior to each section to show which system(s) the commands should be used on.


I am not an expert in Kubernetes or ceph and have figured out most of the contents in this post from documentation, forums, google and (sometimes) trial and error. Refer to the documentation and support resources at the links at the end of this post if you need the ‘proper’ documentation on these components. Please do not use anything shown in this post in a production environment without appropriate due diligence and making sure you understand what you are doing (and why!).

Solution Overview

Our solution is based on a minimal viable installation of ceph alongside a CSE-deployed Kubernetes cluster: 4 ceph nodes (1 admin and 3 combined OSD/mon/mgr nodes) and a 4 node Kubernetes cluster (1 master and 3 worker nodes). There is no requirement for the OS in the ceph cluster and the kubernetes cluster to be the same, but it is easier if the ceph packages are at the same version on both, which is simpler to achieve when both clusters use the same base OS. Since CSE currently only has templates for Ubuntu 16.04 and PhotonOS, and due to the lack of packages for the ‘mimic’ release of ceph on PhotonOS, this example will use Ubuntu 16.04 LTS as the base OS for all servers.

The diagram below shows the components required to be deployed – in the lab environment I’m using the DNS and NTP servers already exist:

solution overview

Note: In production ceph clusters, the monitor (mon) service should run on separate machines from the nodes providing storage (OSD nodes), but for a test/dev environment there is no issue running both services on the same nodes.


You should ensure that you have the following enabled and configured in your environment before proceeding:

Configuration Item | Requirement
DNS | Have a DNS server available and add host (‘A’) records for each of the ceph servers. Alternatively it should be possible to add /etc/hosts records on each node to avoid the need to configure DNS. Note that this is only required for the ceph nodes to talk to each other, the kubernetes cluster uses direct IP addresses to contact the ceph cluster.
NTP | Have an available NTP time source on your network, or access to external ntp servers
Static IP Pool | Container Service Extension (CSE) requires a vCloud OrgVDC network with sufficient addresses available from a static IP pool for the number of kubernetes nodes being deployed
SSH Key Pair | Generated SSH key pair to be used to administer the deployed CSE servers. This could (optionally) also be used to administer the ceph servers
VDC Capacity | Ensure you have sufficient resources (Memory, CPU, Storage and number of VMs) in your vCD VDC to support the desired cluster sizes
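
If you take the /etc/hosts route instead of DNS, the entries can be added without creating duplicates on repeat runs. This is only a sketch: the function name add_host_entry and the 192.0.2.x addresses are illustrative assumptions, not values from this lab.

```shell
# Append "IP<TAB>NAME" to a hosts file unless NAME is already listed (sketch).
# add_host_entry is a hypothetical helper name; run it as root for /etc/hosts.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -w matches NAME as a whole word so ceph01 does not match ceph011
    if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage (as root, with your own addresses in place of the example ones):
#   add_host_entry /etc/hosts 192.0.2.11 ceph01
#   add_host_entry /etc/hosts 192.0.2.12 ceph02
#   add_host_entry /etc/hosts 192.0.2.13 ceph03
```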

Ceph Storage Cluster

The process below describes installing and configuring a ceph cluster on virtualised hardware. If you have an existing ceph cluster available or are building on physical hardware it’s best to follow the ceph official documentation at this link for your circumstances.

Ceph Server Builds

The 4 ceph servers can be built using any available hardware or virtualisation platform, in this exercise I’ve built them from an Ubuntu 16.04 LTS server template with 2 vCPUs and 4GB RAM for each in the same vCloud Director environment which will be used for deployment of the CSE kubernetes cluster. There are no special requirements for installing/configuring the base Operating System for the ceph cluster. If you are using a different Linux distribution then check the ceph documentation for the appropriate steps for your distribution.

On the 3 storage nodes (ceph01, ceph02 and ceph03) add a hard disk to the server which will act as the storage for the ceph Object Storage Daemon (OSD) – the storage pool which will eventually be useable in Kubernetes. In this example I’ve added a 50GB disk to each of these VMs.

Once the servers are deployed, the following commands are run on each server to update the package repositories and upgrade installed packages to current security levels. We will also upgrade the Linux kernel to a more up-to-date version by enabling the Ubuntu Hardware Enablement (HWE) kernel, which resolves some compatibility issues between ceph and older Linux kernel versions.

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install --install-recommends linux-generic-hwe-16.04 -y

Each server should now be restarted to ensure the new Linux kernel is loaded and any added storage disks are recognised.

Ceph Admin Account

We need a user account configured on each of the ceph servers to allow ceph-deploy to work and to co-ordinate access, this account must NOT be named ‘ceph’ due to potential conflicts in the ceph-deploy scripts, but can be called just about anything else. In this lab environment I’ve used ‘cephadmin’. First we create the account on each server and set the password, the 3rd line permits the cephadmin user to use ‘sudo’ without a password which is required for the ceph-deploy script:

$ sudo useradd -d /home/cephadmin -m cephadmin -s /bin/bash
$ sudo passwd cephadmin
$ echo "cephadmin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephadmin
$ sudo chmod 0440 /etc/sudoers.d/cephadmin

From now on, (unless specified) use the new cephadmin login to perform each step. Next we need to generate an SSH key pair for the ceph admin user and copy this to the authorized-keys file on each of the ceph nodes.

Execute the following on the ceph admin node (as cephadmin):

$ ssh-keygen -t rsa

Accept the default path (/home/cephadmin/.ssh/id_rsa) and don’t set a key passphrase. You should copy the generated .ssh/id_rsa (private key) file to your admin workstation so you can use it to authenticate to the ceph servers.

Next, enable password logins (temporarily) on the storage nodes (ceph01,2 & 3) by running the following on each node:

$ sudo sed -i "s/.*PasswordAuthentication.*/PasswordAuthentication yes/g" /etc/ssh/sshd_config
$ sudo systemctl restart sshd

Now copy the cephadmin public key to each of the other ceph nodes by running the following (again only on the admin node):

$ ssh-keyscan -t rsa ceph01 >> ~/.ssh/known_hosts
$ ssh-keyscan -t rsa ceph02 >> ~/.ssh/known_hosts
$ ssh-keyscan -t rsa ceph03 >> ~/.ssh/known_hosts
$ ssh-copy-id cephadmin@ceph01
$ ssh-copy-id cephadmin@ceph02
$ ssh-copy-id cephadmin@ceph03

You should now confirm you can ssh to each storage node as the cephadmin user from the admin node without being prompted for a password:

$ ssh cephadmin@ceph01 sudo hostname
$ ssh cephadmin@ceph02 sudo hostname
$ ssh cephadmin@ceph03 sudo hostname

If everything is working correctly then each command will return the appropriate hostname for each storage node without any password prompts.
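
The same check can be run as a loop that fails loudly instead of prompting. This is a minimal sketch assuming the node names and cephadmin account used in this post; the check_nodes function and the SSH_CMD override variable are illustrative names of my own, not part of ceph-deploy. The BatchMode=yes option makes ssh return an error rather than fall back to a password prompt.

```shell
# Verify passwordless ssh + sudo on each storage node (sketch).
# SSH_CMD is a hypothetical override so the loop can be dry-run; it defaults to ssh.
check_nodes() {
    failed=0
    for node in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key is an error
        if ! ${SSH_CMD:-ssh} -o BatchMode=yes "cephadmin@${node}" sudo hostname; then
            echo "FAILED: ${node}" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage (on the admin node, as cephadmin):
#   check_nodes ceph01 ceph02 ceph03
```

A non-zero return value means at least one node still needs its key or sudoers entry fixed before running ceph-deploy.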

Optional: It is now safe to re-disable password authentication on the ceph servers if required (since public key authentication will be used from now on) by:

$ sudo sed -i "s/.*PasswordAuthentication.*/PasswordAuthentication no/g" /etc/ssh/sshd_config
$ sudo systemctl restart sshd

You’ll need to resolve any authentication issues before proceeding as the ceph-deploy script relies on being able to obtain sudo-level remote access to all of the storage nodes to install ceph successfully.

You should also confirm at this stage that time is synchronised to an external source on each ceph node so that the server clocks agree. By default on Ubuntu 16.04 timesyncd is configured automatically, so nothing needs to be done here in our case. You can check this on Ubuntu 16.04 by running timedatectl:

Checking time/date settings using timedatectl
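
To script this check rather than reading the output by eye, something like the following could be used. It is a sketch that assumes the status line reads "NTP synchronized: yes" (as I understand Ubuntu 16.04 prints it) or "System clock synchronized: yes" (newer systemd releases); clock_is_synced is a hypothetical helper name.

```shell
# Succeeds if timedatectl-style output on stdin reports a synchronised clock.
# Both the older "NTP synchronized" and newer "System clock synchronized"
# spellings are accepted.
clock_is_synced() {
    grep -Eq '(NTP|System clock) synchronized: yes'
}

# Usage on each ceph node:
#   timedatectl | clock_is_synced && echo "clock OK" || echo "clock NOT synchronised"
```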

For some Linux distributions you may need to create firewall rules at this stage for ceph to function, generally port 6789/tcp (for mon) and the range 6800 to 7300 tcp (for OSD communication) need to be open between the cluster nodes. The default firewall settings in Ubuntu 16.04 allow all network traffic so this is not required (however, do not use this in a production environment without configuring appropriate firewalling).
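
If you do enable a firewall, the rules for the ports above might look like the following with ufw (the default firewall front-end on Ubuntu). This is a sketch only: the function name ceph_ufw_rules is my own, and the subnet is an input you must supply for your environment. It prints the commands so they can be reviewed before being applied.

```shell
# Print (not run) ufw commands opening the ceph ports to one subnet.
# 6789/tcp is the mon port and 6800:7300/tcp the OSD range mentioned above.
ceph_ufw_rules() {
    subnet=$1
    echo "sudo ufw allow proto tcp from ${subnet} to any port 6789"
    echo "sudo ufw allow proto tcp from ${subnet} to any port 6800:7300"
}

# Usage (192.0.2.0/24 is a documentation-only example subnet):
#   ceph_ufw_rules 192.0.2.0/24          # review the output
#   ceph_ufw_rules 192.0.2.0/24 | sh     # apply it
```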

Ceph Installation

On all nodes and signed-in as the cephadmin user (important!)
Add the release key:

$ wget -q -O- '' | sudo apt-key add -

Add ceph packages to your repository:

$ echo deb $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list

On the admin node only, update and install ceph-deploy:

$ sudo apt update; sudo apt install ceph-deploy -y

On all nodes, update and install ceph-common:

$ sudo apt update; sudo apt install ceph-common -y

Note: Installing ceph-common on the storage nodes isn’t strictly required as the ceph-deploy script can do this during cluster initiation, but pre-installing it in this way pulls in several dependencies (e.g. python v2 and associated modules) which can prevent ceph-deploy from running if not present so it is easier to do this way.

Next again working on the admin node logged in as cephadmin, make a directory to store the ceph cluster configuration files and change to that directory. Note that ceph-deploy will use and write files to the current directory so make sure you are in this folder whenever making changes to the ceph configuration.

$ mkdir ~/mycluster
$ cd ~/mycluster

Now we can create the initial ceph cluster from the admin node, use ceph-deploy with the ‘new’ switch and supply the monitor nodes (in our case all 3 nodes will be both monitors and OSD nodes). Make sure you do NOT use sudo for this command and only run on the admin node:

$ ceph-deploy new ceph01 ceph02 ceph03

If everything has run correctly you’ll see output similar to the following:

ceph-deploy output

Checking the contents of the ~/mycluster/ folder should show the cluster configuration files have been added:

$ ls -al ~/mycluster
total 24
drwxrwxr-x 2 cephadmin cephadmin 4096 Jan 25 01:03 .
drwxr-xr-x 5 cephadmin cephadmin 4096 Jan 25 00:57 ..
-rw-rw-r-- 1 cephadmin cephadmin  247 Jan 25 01:03 ceph.conf
-rw-rw-r-- 1 cephadmin cephadmin 7468 Jan 25 01:03 ceph-deploy-ceph.log
-rw------- 1 cephadmin cephadmin   73 Jan 25 01:03 ceph.mon.keyring

The ceph.conf file will look something like this:

$ cat ~/mycluster/ceph.conf
[global]
fsid = 98ca274e-f79b-4092-898a-c12f4ed04544
mon_initial_members = ceph01, ceph02, ceph03
mon_host =,,
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

Run the ceph installation for the nodes (again from the admin node only):

$ ceph-deploy install ceph01 ceph02 ceph03

This will run through the installation of ceph and pre-requisite packages on each node, you can check the ceph-deploy-ceph.log file after deployment for any issues or errors.

Ceph Configuration

Once you’ve successfully installed ceph on each node, use the following (again from only the admin node) to deploy the initial ceph monitor services:

$ ceph-deploy mon create-initial

If all goes well you’ll get some messages at the completion of this process showing the keyring files being stored in your ‘mycluster’ folder, you can check these exist:

$ ls -al ~/mycluster
total 168
drwxrwxr-x 2 cephadmin cephadmin   4096 Jan 25 01:17 .
drwxr-xr-x 5 cephadmin cephadmin   4096 Jan 25 00:57 ..
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-mds.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-mgr.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-osd.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-rgw.keyring
-rw------- 1 cephadmin cephadmin    151 Jan 25 01:17 ceph.client.admin.keyring
-rw-rw-r-- 1 cephadmin cephadmin    247 Jan 25 01:03 ceph.conf
-rw-rw-r-- 1 cephadmin cephadmin 128136 Jan 25 01:17 ceph-deploy-ceph.log
-rw------- 1 cephadmin cephadmin     73 Jan 25 01:03 ceph.mon.keyring

To avoid having to specify the monitor node address and ceph.client.admin.keyring path in every command, we can now deploy these to each node so they are available automatically. Again working from the ‘mycluster’ folder on the admin node:

$ ceph-deploy admin cephadmin ceph01 ceph02 ceph03

This should give the following:


Next we need to deploy the manager (‘mgr’) service to the OSD nodes, again working from the ‘mycluster’ folder on the admin node:

$ ceph-deploy mgr create ceph01 ceph02 ceph03

At this stage we can check that all of the mon and mgr services are started and ok by running (on the admin node):

$ sudo ceph -s
  cluster:
    id:     98ca274e-f79b-4092-898a-c12f4ed04544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail

As you can see, the manager (‘mgr’) service is installed on all 3 nodes but only active on the first and in standby mode on the other 2 – this is normal and correct. The monitor (‘mon’) service is running on all of the storage nodes.

Next we can configure the disks attached to our storage nodes for use by ceph. Ensure that you know and use the correct identifier for your disk devices (in this case, we are using the 2nd SCSI disk attached to the storage node VMs which is at /dev/sdb so that’s what we’ll use in the commands below). As before, run the following only on the admin node:

$ ceph-deploy osd create --data /dev/sdb ceph01
$ ceph-deploy osd create --data /dev/sdb ceph02
$ ceph-deploy osd create --data /dev/sdb ceph03

For each command the last line of the logs shown when run should be similar to ‘Host ceph01 is now ready for osd use.’

We can now check the overall cluster health with:

$ ssh ceph01 sudo ceph health
$ ssh ceph01 sudo ceph -s
  cluster:
    id:     98ca274e-f79b-4092-898a-c12f4ed04544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 147 GiB / 150 GiB avail

As you can see, the 3 x 50GB disks have now been added and the total (150 GiB) capacity is available under the data: section.
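
Bear in mind that 150 GiB is raw capacity. As a rough estimate only, and assuming a replicated pool with 3 copies (which I believe is ceph's default, though this post does not set it explicitly), usable space is about a third of that. The usable_gib helper below is a hypothetical name for the arithmetic:

```shell
# Rough usable capacity for a replicated pool: raw capacity / number of copies.
# Integer division; ignores on-disk overhead and the near-full ratio.
usable_gib() {
    raw_gib=$1
    replicas=$2
    echo $(( raw_gib / replicas ))
}

usable_gib 150 3    # prints 50
```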

Now we need to create a ceph storage pool ready for Kubernetes to consume from – the default name of this pool is ‘rbd’ (if not specified), but it is strongly recommended to name it differently from the default when using for k8s so I’ve created a storage pool called ‘kube’ in this example (again running from the mycluster folder on the admin node):

$ sudo ceph osd pool create kube 30 30
pool 'kube' created

The two ’30’s are important – you should review the ceph documentation here for Pool, PG and CRUSH configuration to establish values for PG and PGP appropriate to your environment.
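
As a starting point for choosing these values, the rule of thumb I have seen in the ceph placement group documentation is roughly (number of OSDs x 100) / replica count, rounded up to the next power of two. The sketch below just does that arithmetic; suggest_pg_num is a hypothetical helper, and the result is a total across all pools, so treat it as a guide rather than a recommendation for your cluster.

```shell
# Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of two.
suggest_pg_num() {
    osds=$1
    replicas=$2
    target=$(( osds * 100 / replicas ))
    pg=1
    # Double until we reach or pass the target
    while [ "$pg" -lt "$target" ]; do
        pg=$(( pg * 2 ))
    done
    echo "$pg"
}

suggest_pg_num 3 3    # prints 128
```

For a small lab cluster like this one the exact value matters much less than in production, where PG counts affect data distribution and recovery behaviour.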

We now associate this pool with the rbd (RADOS block device) application so it is available to be used as a RADOS block device:

$ sudo ceph osd pool application enable kube rbd
enabled application 'rbd' on pool 'kube'

Testing Ceph Storage

The easiest way to test our ceph cluster is working correctly and can provide storage is to attempt creating and using a new RADOS Block Device (rbd) volume from our admin node.

Before this will work we need to tune the rbd features map by editing ceph.conf on our client to disable rbd features that aren’t available in our Linux kernel (on admin/client node):

$ echo "rbd_default_features = 7" | sudo tee -a /etc/ceph/ceph.conf
rbd_default_features = 7
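
The value 7 is a bitmask. To my knowledge the rbd feature bits are layering=1, striping=2, exclusive-lock=4, object-map=8, fast-diff=16, deep-flatten=32 and journaling=64, so 7 enables only the first three. The decoder below is an illustrative sketch (decode_rbd_features is not a ceph command) showing how a value breaks down:

```shell
# Decode an rbd feature bitmask into feature names (sketch).
decode_rbd_features() {
    mask=$1
    out=""
    for pair in 1:layering 2:striping 4:exclusive-lock 8:object-map \
                16:fast-diff 32:deep-flatten 64:journaling; do
        bit=${pair%%:*}
        name=${pair#*:}
        # Bitwise AND tests whether this feature's bit is set in the mask
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out}${out:+ }${name}"
        fi
    done
    echo "$out"
}

decode_rbd_features 7     # prints: layering striping exclusive-lock
decode_rbd_features 61    # prints: layering exclusive-lock object-map fast-diff deep-flatten
```

The second call decodes 61, which I believe is the default in recent ceph releases; the extra features it includes are the ones older kernel clients tend not to support.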

Now we can test creating a volume:

$ sudo rbd create --size 1G kube/testvol01

Confirm that the volume exists:

$ sudo rbd ls kube

Get information on our volume:

$ sudo rbd info kube/testvol01
rbd image 'testvol01':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 10e96b8b4567
        block_name_prefix: rbd_data.10e96b8b4567
        format: 2
        features: layering, exclusive-lock
        create_timestamp: Sun Jan 27 08:50:45 2019

Map the volume to our admin host (which creates the block device /dev/rbd0):

$ sudo rbd map kube/testvol01

Now we can create a temporary mount folder, make a filesystem on our volume and mount it to our temporary mount:

$ sudo mkdir /testmnt
$ sudo mkfs.xfs /dev/rbd0
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/rbd0 /testmnt
$ df -vh
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  5.7M  389M   2% /run
/dev/sda1       9.6G  2.2G  7.4G  24% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda15      105M  3.4M  101M   4% /boot/efi
tmpfs           395M     0  395M   0% /run/user/1001
/dev/rbd0      1014M   34M  981M   4% /testmnt

We can see our volume has been mounted successfully and can now be used as any other disk.

To tidy up and remove our test volume:

$ sudo umount /dev/rbd0
$ sudo rbd unmap kube/testvol01
$ sudo rbd remove kube/testvol01
Removing image: 100% complete...done.
$ sudo rmdir /testmnt

Kubernetes CSE Cluster

Using VMware Container Service Extension (CSE) makes it easy to deploy and configure a base Kubernetes cluster into our vCloud Director platform. I previously wrote a post here with a step-by-step guide to using CSE.

First we need an ssh key pair to provide to the CSE nodes as they are deployed to allow us to access them. You could re-use the cephadmin key-pair created in the previous section, or generate a new set. As I’m using Windows as my client OS I used the puttygen utility included in the PuTTY package to generate a new keypair and save them to a .ssh directory in my home folder.

Important Note: Check your public key file in a text editor prior to deploying the cluster, if it looks like this:

Public key as generated by PuTTYGen (incorrect)

You will need to change it to be all on one line starting ‘ssh-rsa’ and with none of the extra text as follows:


If you do not make this change you won’t be able to authenticate to your cluster nodes once deployed.
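
The conversion can also be scripted. The sketch below assumes the file saved by PuTTYgen is in the multi-line "SSH2 PUBLIC KEY" layout (BEGIN/END marker lines, a single-line Comment header, then the base64 data) and that the key is RSA; putty_pub_to_openssh is a hypothetical helper name. On a machine with OpenSSH installed, 'ssh-keygen -i -f <file>' should perform the same import.

```shell
# Convert a PuTTYgen "Save public key" file into the single-line OpenSSH form.
# Assumes an RSA key (hence the fixed "ssh-rsa" prefix) and a one-line Comment.
putty_pub_to_openssh() {
    # Drop the marker and Comment lines, then join the base64 data into one line
    body=$(grep -v -e '^----' -e '^Comment:' "$1" | tr -d ' \r\n')
    echo "ssh-rsa ${body}"
}

# Usage (file names are examples only):
#   putty_pub_to_openssh putty_key.pub > id_rsa.pub
```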

Next we login to vCD using the vcd-cli (see my post linked above if you need to install/configure vcd-cli and the CSE extension):

Logging in to vcd-cli

Now we can see what virtual Datacenters (VDCs) are available to us:

Showing available VDCs

If we had multiple VDCs available, we need to select which one is ‘in_use’ (active) for deployment of our cluster using ‘vcd vdc use “<VDC Name>”‘. In this case we only have a single VDC and it’s already active/in use.

We can get the information of our VDC which will help us fill out the required properties when creating our k8s cluster:

VDC Properties returned by vdc info

We will be using the ‘Tyrell Servers A03’ network (where our ceph cluster exists) and the ‘A03 VSAN Performance’ storage profile for our cluster.

To get the options available when creating a cluster we can see the cluster creation help:

CSE cluster create options

Now we can go ahead and create our Kubernetes cluster with CSE:

Looking in vCloud Director we can see the new vApp and VMs deployed:

We obtain the kubectl config of our cluster and store this for later use (make the .kube folder first if it doesn’t already exist):

C:\Users\admin>vcd cse cluster config k8sceph > .kube\config

And get the details of our k8s nodes from vcd-cli:

Next we need to update and install the ceph client on each cluster node – run the following on each node (including the master). To do this we can connect via ssh as root using the key pair we specified when creating the cluster.

# wget -q -O- '' | sudo apt-key add -
# echo deb $(lsb_release -sc) main > /etc/apt/sources.list.d/ceph.list
# apt-get update
# apt-get install --install-recommends linux-generic-hwe-16.04 -y
# apt-get install ceph-common -y
# reboot

You should now be able to connect from an admin workstation and get the nodes in the kubernetes cluster from kubectl (if you do not already have kubectl installed on your admin workstation, see here for instructions).

Note: if you expand the CSE cluster at any point (add nodes), you will need to repeat this series of commands on each new node in order for it to be able to mount rbd volumes from the ceph cluster.

You should also be able to verify that the core kubernetes services are running in your cluster:

The ceph configuration files from the ceph cluster nodes need to be added to all nodes in the kubernetes cluster. Depending on which ssh keys you have configured for access, you may be able to do this directly from the ceph admin node as follows:

$ sudo scp /etc/ceph/ceph.* root@
$ sudo scp /etc/ceph/ceph.* root@
$ sudo scp /etc/ceph/ceph.* root@
$ sudo scp /etc/ceph/ceph.* root@

If not, manually copy the /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring files to each of the kubernetes nodes using copy/paste or scp from your admin workstation (copy the files from the ceph admin node to ensure that the rbd_default_features line is included).

To confirm everything is configured correctly, we should now be able to create and mount a test rbd volume on any of the kubernetes nodes as we did for the ceph admin node previously:

root@mstr-x4nb:~# rbd create --size 1G kube/testvol02
root@mstr-x4nb:~# rbd ls kube
root@mstr-x4nb:~# rbd info kube/testvol02
rbd image 'testvol02':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 10f36b8b4567
        block_name_prefix: rbd_data.10f36b8b4567
        format: 2
        features: layering, exclusive-lock
        create_timestamp: Sun Jan 27 21:56:59 2019
root@mstr-x4nb:~# rbd map kube/testvol02
root@mstr-x4nb:~# mkdir /testmnt
root@mstr-x4nb:~# mkfs.xfs /dev/rbd0
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@mstr-x4nb:~# mount /dev/rbd0 /testmnt
root@mstr-x4nb:~# df -vh
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  5.7M  389M   2% /run
/dev/sda1       9.6G  4.0G  5.6G  42% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda15      105M  3.4M  101M   4% /boot/efi
tmpfs           395M     0  395M   0% /run/user/0
/dev/rbd0      1014M   34M  981M   4% /testmnt
root@mstr-x4nb:~# umount /testmnt
root@mstr-x4nb:~# rbd unmap kube/testvol02
root@mstr-x4nb:~# rmdir /testmnt/
root@mstr-x4nb:~# rbd remove kube/testvol02
Removing image: 100% complete...done.

Note: If the rbd map command hangs you may still be running the stock Linux kernel on the kubernetes nodes – make sure you have restarted them.

We now have a functional ceph storage cluster capable of serving block storage devices over the network, and a Kubernetes cluster able to mount and use rbd devices. In the next section we will configure kubernetes and ceph together with the rbd-provisioner container to enable dynamic persistent storage for pods deployed into our infrastructure.

Putting it all together

Kubernetes secrets

We first need to give Kubernetes the account information it will use to connect to the ceph cluster. To do this we create a ‘secret’ for the ceph admin user, and also create a client user to be used by k8s provisioning. Working on the kubernetes master node is easiest for this as it already has ceph and kubectl configured from our previous steps:

# ceph auth get-key client.admin

This will return a key like ‘AQCLY0pcFXBYIxAAhmTCXWwfSIZxJ3WhHnqK/w==’ which is used in the next command (Note: the ‘=’ sign between --from-literal and key is not a typo – it actually needs to be like this).

# kubectl create secret generic ceph-secret --type="" \
--from-literal=key='AQCLY0pcFXBYIxAAhmTCXWwfSIZxJ3WhHnqK/w==' --namespace=kube-system
secret "ceph-secret" created

We can now create a new ceph user ‘kube’ and register the secret from this user in kubernetes as ‘ceph-secret-kube’:

# ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
        key = AQDqZU5c0ahCOBAA7oe+pmoLIXV/8OkX7cNBlw==
# kubectl create secret generic ceph-secret-kube --type="" \
--from-literal=key='AQDqZU5c0ahCOBAA7oe+pmoLIXV/8OkX7cNBlw==' --namespace=kube-system
secret "ceph-secret-kube" created
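
To avoid pasting keys by hand, the two steps (read the key from ceph, create the secret) can be chained. The following is a minimal sketch rather than anything shipped with ceph or kubernetes: create_ceph_secret is a hypothetical helper name, and the secret type is passed in as an argument so you can supply the same --type value used in the commands above.

```shell
# create_ceph_secret is a hypothetical helper: fetch the key for a ceph user
# and store it as a kubernetes secret in the kube-system namespace.
create_ceph_secret() {
    secret_name=$1    # name of the kubernetes secret, e.g. ceph-secret-kube
    ceph_client=$2    # ceph user to read the key from, e.g. client.kube
    secret_type=$3    # value for --type, as used in the manual commands
    # Stop here if the ceph user does not exist or ceph is not reachable
    key=$(ceph auth get-key "$ceph_client") || return 1
    kubectl create secret generic "$secret_name" --type="$secret_type" \
        --from-literal=key="$key" --namespace=kube-system
}
```

The ceph user must already exist (for example after the ‘ceph auth get-or-create’ step) before the helper is called for it.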


Kubernetes is in the process of moving storage provisioners (such as the rbd one we will be using) out of its main packages and into separate projects and packages. There’s also an issue that the kubernetes-controller-manager container no longer has access to an ‘rbd’ binary in order to be able to connect to a ceph cluster directly. We therefore need to deploy a small ‘rbd-provisioner’ to act as the go-between from the kubernetes cluster to the ceph storage cluster. This project is available under this link and the steps below show how to get a kubernetes pod running the rbd-provisioner service (again working from the k8s cluster ‘master’ node):

# git clone
Cloning into 'external-storage'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 63661 (delta 0), reused 1 (delta 0), pack-reused 63659
Receiving objects: 100% (63661/63661), 113.96 MiB | 8.97 MiB/s, done.
Resolving deltas: 100% (29075/29075), done.
Checking connectivity... done.
# cd external-storage/ceph/rbd/deploy
# sed -r -i "s/namespace: [^ ]+/namespace: kube-system/g" ./rbac/clusterrolebinding.yaml ./rbac/rolebinding.yaml
# kubectl -n kube-system apply -f ./rbac
"rbd-provisioner" created
"rbd-provisioner" created
deployment.extensions "rbd-provisioner" created
"rbd-provisioner" created
"rbd-provisioner" created
serviceaccount "rbd-provisioner" created
# cd

You should now be able to see the ‘rbd-provisioner’ container starting and then running in kubernetes:

Testing it out

Now we can create our kubernetes StorageClass using this storage, ready for a pod to make a persistent volume claim (PVC) against. Create the following as a new file (I’ve named mine ‘rbd-storageclass.yaml’). Change the ‘monitors’ line to reflect the IP addresses of the ‘mon’ nodes in your ceph cluster (in our case these are on the ceph01, ceph02 and ceph03 nodes on the IP addresses shown in the file).

kind: StorageClass
metadata:
  name: rbd
parameters:
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering

You can then add this StorageClass to kubernetes using:

# kubectl create -f ./rbd-storageclass.yaml
"rbd" created
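
If the monitor addresses differ between environments, the manifest can be generated instead of edited by hand. This is a minimal sketch under a few assumptions: the API version (storage.k8s.io/v1) and the provisioner name (ceph.com/rbd) are my assumptions and should be checked against your rbd-provisioner deployment, and render_rbd_storageclass is a hypothetical helper name.

```shell
# render_rbd_storageclass is a hypothetical helper: print a StorageClass
# manifest for the monitor list given as the first argument
# (comma-separated host:port pairs).
render_rbd_storageclass() {
    monitors=$1
    cat <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1   # assumed API version
metadata:
  name: rbd
provisioner: ceph.com/rbd       # assumed provisioner name, verify for your deployment
parameters:
  monitors: ${monitors}
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
EOF
}
```

The output can be redirected to ‘rbd-storageclass.yaml’, or piped straight to ‘kubectl create -f -’, which reads the manifest from standard input.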

Next we can create a test PVC and make sure that storage is created in our ceph cluster and assigned to the pod. Create a new file ‘pvc-test.yaml’ as:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: testclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rbd

We can now submit the PVC to kubernetes and check it has been successfully created:

# kubectl create -f ./pvc-test.yaml
persistentvolumeclaim "testclaim" created
# kubectl get pvc testclaim
NAME        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
testclaim   Bound     pvc-1e9bdbfd-22a8-11e9-ba77-005056340036   1Gi        RWO            rbd            21s
# kubectl describe pvc testclaim
Name:          testclaim
Namespace:     default
StorageClass:  rbd
Status:        Bound
Volume:        pvc-1e9bdbfd-22a8-11e9-ba77-005056340036
Labels:        <none>
Finalizers:    []
Capacity:      1Gi
Access Modes:  RWO
Events:
  Type    Reason                 Age   From                                                                               Message
  ----    ------                 ----  ----                                                                               -------
  Normal  ExternalProvisioning   3m    persistentvolume-controller                                                        waiting for a volume to be created, either by external provisioner "" or manually created by system administrator
  Normal  Provisioning           3m  External provisioner is provisioning volume for claim "default/testclaim"
  Normal  ProvisioningSucceeded  3m  Successfully provisioned volume pvc-1e9bdbfd-22a8-11e9-ba77-005056340036
# rbd list kube
# rbd info kube/kubernetes-dynamic-pvc-25e94cb6-22a8-11e9-aa61-7620ed8d4293
rbd image 'kubernetes-dynamic-pvc-25e94cb6-22a8-11e9-aa61-7620ed8d4293':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 11616b8b4567
        block_name_prefix: rbd_data.11616b8b4567
        format: 2
        features: layering
        create_timestamp: Mon Jan 28 02:55:19 2019

As we can see, our test claim has successfully requested and bound a persistent storage volume from the ceph cluster.
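
When scripting this check, it can help to wait for the claim to reach the ‘Bound’ state instead of reading the output by eye. The following is a minimal sketch, assuming kubectl is already configured as above; wait_for_pvc_bound is a hypothetical helper name and the retry defaults are arbitrary.

```shell
# wait_for_pvc_bound is a hypothetical helper: poll a PVC until its phase is
# "Bound", giving up after a number of attempts.
wait_for_pvc_bound() {
    claim=$1
    attempts=${2:-30}     # how many times to poll before giving up
    delay=${3:-2}         # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # jsonpath extracts just the phase (Pending, Bound or Lost)
        phase=$(kubectl get pvc "$claim" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
        if [ "$phase" = "Bound" ]; then
            echo "$claim is Bound"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$claim not Bound after $attempts attempts (last phase: ${phase:-unknown})" >&2
    return 1
}
```

For the test claim above this would be called as ‘wait_for_pvc_bound testclaim’. A claim that stays in ‘Pending’ usually means the rbd-provisioner pod is worth checking first.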



Wow, this post ended up way longer than I was anticipating when I started writing it. Hopefully there’s something useful for you in amongst all of that.

I’d like to thank members of the vExpert community for their encouragement and advice in getting this post written up and as always, if you have any feedback please leave a comment.

Time permitting, there will be a follow-up to this post which details how to deploy containers to this platform using the persistent storage made available, both directly in Kubernetes and using Helm charts. I’d also like to cover some of the more advanced issues that using persistent storage in containers raises – in particular backup/recovery and replication/high availability of data stored in this manner.


Installing and Using the vRealize Orchestrator (vRO) CLI

Something I was not aware of until recently was that vRealize Orchestrator (vRO) has its own Command Line Interface (CLI) environment. This can be an invaluable tool when developing new vRO workflows and actions as it allows you to easily test expressions and code snippets in your environment. Once developed, these scripts and actions can be reused easily from vRO workflows in the vRO Workflow Designer.

Downloading vRO-CLI

Unfortunately vRO-CLI is not particularly well publicised (originating as a VMware ‘Fling’) and does not have official support, but it can still be a valuable tool. Due to the support status though I would only recommend installing and using this in a Test/Dev environment rather than in Production.

Locating the tool is currently a bit problematic: a Google search for ‘vRO CLI’ will usually take you to this page:


Unfortunately this is an old version (from September 2015) and isn’t compatible with the latest (7.x) releases of vRO. To get the latest version you will need to visit this page in the VMware community forums, which has the build 4693774 downloads that work with the latest vRO versions:


You will need to download at least 2 of the .zip files: the vRO plugin itself and the client application for whichever OS you will be using on your development machine (Linux or Windows).

Installing vRO-CLI

Installation is in 2 parts, the first installs the plugin into your vRO appliance to provide the endpoint for the vRO-CLI. The second installs the vRO-CLI client on your workstation to be able to use the service.

1) Installing the vRO-CLI plugin on the vRO appliance

Once you have downloaded both packages, the first step is to install the vRO-CLI plugin into your vRO instance.

Open the web page for your vRO appliance and select the ‘Open Control Center’ link:


Once signed in to the Control Center, select the ‘Manage Plug-Ins’ icon:


Unzip the plugin file you previously downloaded and use the Browse button to select the extracted file:


Click the ‘Upload’ button and, when prompted, accept the EULA and click ‘Install’:


You will get the following if everything has been successfully configured:


You need to let the Orchestrator service restart (just leave the configuration appliance and wait a couple of minutes) before the service will be available for client connections.

2) Installing the vRO-CLI Client

Since I’m using a Windows workstation for administration the following details the setup for a Windows vRO-CLI client. You will need to have Java installed and configured in your Windows OS prior to being able to run the vRO-CLI client.

Simply extract the client .zip file to your machine. I used ‘C:\Program Files\vRO-CLI-2.0.0’ as the destination directory:


You can now start the vRO-CLI client using either the GUI or the command line. To use the GUI, start the vcocli-gui.bat script from the o11nplugin-vcocli-dist-2.0.0\bin folder. You should see the vCO CLI login dialog:


Use the hostname of your vRO appliance as the ‘vCO address’ (without any port specification) and supply a valid vRO user name and password.

Note that if you have a firewall or other network security between your workstation and the vRO server you will need to permit tcp port 8265 between them to allow connection.
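
If you are using the Linux client, a quick way to confirm the port is reachable before starting the CLI is to try opening a TCP connection to it. This is only a minimal sketch, assuming bash and the coreutils ‘timeout’ command are available on the workstation; check_vro_cli_port is a hypothetical helper name and the 3 second timeout is arbitrary.

```shell
# check_vro_cli_port is a hypothetical helper: return 0 if a TCP connection to
# the given host and port (default 8265) can be opened within 3 seconds.
check_vro_cli_port() {
    host=$1
    port=${2:-8265}
    # bash's /dev/tcp pseudo-device opens a TCP connection without needing
    # nc or telnet; host and port are passed as arguments, not interpolated
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

A non-zero exit status means the connection was refused, filtered or timed out, which points at a firewall rule or at the Orchestrator service not having restarted yet.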

The ‘Session name’ can be anything you like and can be used to reconnect to sessions which have already been started and suspended. Click ‘New’ to create a new session. If everything has gone well you should see the initial vRO-CLI screen:


Also note that the ‘Quit Session’ button will terminate your current session and you will not be able to reconnect to it, but closing the window using the close (X) icon in the top-left will keep the session running and allow re-attachment to the same session later.

To use the CLI version, you can start the vcocli.bat file from a Windows command prompt and use the --vco and --username switches to specify the vRO server and user name:


If you want to see what sessions already exist and can be reconnected, you can open the vRO client and browse under ‘vCO CLI / Start Session’ in the tree. Each running ‘Start Session’ token shows, in the ‘Variables’ tab, the session name for sessions which can be reconnected (using ‘Attach’ in the GUI or the --resume switch on the command line):


Using the vRO-CLI

Once you have the client installed and connecting successfully to your vRO server, how do you actually use it? One of the early challenges I faced was working out how to actually reference an object in the vRO object browser in my script fragments. The ‘Help’ documentation provides some useful content, but doesn’t address how to obtain object references like this.

In this example, let’s assume that we’re trying to write a script to perform an action on the ‘test03’ VM in our environment owned by the vCD tenant ‘Tyrell’ in the VDC ‘Tyrell A03 Allocated’ and in the vApp ‘test03’. We can browse down the objects to locate this VM in the vRO-CLI UI:


Now the Server.findForType function can be used when supplied with an object type and the dunesId reference for our object:

var myTest03VM = Server.findForType('vCloud:VM','602d4ed77a4d20e9f854214a808ffcc2e878185e973d38446ad2bac2a623081////https://<my vCD server>/api/vApp/vm-98ae8d30-1090-4958-9061-f1a86590dc7b');

You can copy the value of the object by highlighting the ‘dunesId’ line in the object properties window, then copying (Ctrl + C) and pasting (Ctrl + V) it into your command. (Note that this will also include the ‘dunesId’ text which will need to be removed).

This allows us to extract the output of the ‘toXml()’ method for our VM as follows:


All of the usual vRO methods and functions for the object are also available to us. If you need to initiate existing vRO workflows or actions the online help has details on how this can be initiated too.

Note: If you don’t highlight any text in the upper Input area, the entire script will be executed. If you highlight a block of code or a single line, only that code will be executed when you click the ‘Execute’ icon (or press F5), which can be extremely useful for trying out fragments of code and checking that the results are as expected.

Hopefully this will be useful to those of you developing workflows and actions in vRO and provide you with another method to write and debug your actions.

As always, comments and feedback are welcome.