Dynamic Persistent Volumes with CSE Kubernetes and Ceph

Introduction

Application containerization with Docker is fast becoming the default deployment pattern for many business applications, and Kubernetes (k8s) the method of managing these workloads. While containers generally should be stateless and ephemeral (able to be deployed, scaled and deleted at will), almost all business applications require data persistence of some form. In some cases it is appropriate to offload this to an external system (for example, a database, file store or object store in a public cloud environment).

This doesn’t cover all storage requirements though, and if you are running k8s in your own environment or in a hosted service provider environment you may not have access to compatible or appropriate storage. One solution for this is to build a storage platform alongside a Kubernetes cluster which can provide storage persistence while operating in a similar deployment pattern to the k8s cluster itself (scalable, clustered, highly available and no single points of failure).

VMware Container Service Extension (CSE) for vCloud Director (vCD) is an automated way for customers of vCloud powered service providers to easily deploy, scale and manage k8s clusters, however CSE currently only provides a limited storage option (an NFS storage server added to the cluster) and k8s persistent volumes (PVs) have to be pre-provisioned in NFS and assigned to containers/pods rather than being generated on-demand. This approach can also lead to availability, scale and performance issues since all of the pod storage is located on a single server VM.

There is certainly no ‘right’ answer to the question of persistent storage for k8s clusters – often the choice will be driven by what is available in the platform you are deploying to and the security, availability and performance requirements for this storage.

In this post I will detail a deployment using a ceph storage cluster to provide a highly available and scalable storage platform and the configuration required to enable a CSE deployed k8s cluster to use dynamic persistent volumes (DPVs) in this environment.

Due to the large number of servers/VMs involved, and the possibility of confusion or working on the wrong server console, I've noted before each block of commands which system(s) they should be run on.

Disclaimer

I am not an expert in Kubernetes or ceph and have figured out most of the contents in this post from documentation, forums, google and (sometimes) trial and error. Refer to the documentation and support resources at the links at the end of this post if you need the ‘proper’ documentation on these components. Please do not use anything shown in this post in a production environment without appropriate due diligence and making sure you understand what you are doing (and why!).

Solution Overview

Our solution is going to be based on a minimal viable installation of ceph alongside a CSE-deployed Kubernetes cluster: 4 ceph nodes (1 admin and 3 combined OSD/mon/mgr nodes) and a 4 node Kubernetes cluster (1 master and 3 worker nodes). There is no requirement for the OS in the ceph cluster and the kubernetes cluster to be the same, however it helps if the ceph packages used in both are at the same version, which is simplest to achieve by using the same base OS for both clusters. Since CSE currently only has templates for Ubuntu 16.04 and PhotonOS, and due to the lack of packages for the ‘mimic’ release of ceph on PhotonOS, this example will use Ubuntu 16.04 LTS as the base OS for all servers.

The diagram below shows the components required to be deployed – in the lab environment I’m using the DNS and NTP servers already exist:

(Diagram: solution overview)


Note: In production ceph clusters, the monitor (mon) service should run on separate machines from the nodes providing storage (OSD nodes), but for a test/dev environment there is no issue running both services on the same nodes.

Pre-requisites

You should ensure that you have the following enabled and configured in your environment before proceeding:

  • DNS: Have a DNS server available and add host (‘A’) records for each of the ceph servers. Alternatively it should be possible to add /etc/hosts records on each node to avoid the need to configure DNS. Note that this is only required for the ceph nodes to talk to each other, the kubernetes cluster uses direct IP addresses to contact the ceph cluster.
  • NTP: Have an available NTP time source on your network, or access to external ntp servers.
  • Static IP Pool: Container Service Extension (CSE) requires a vCloud OrgVDC network with sufficient addresses available from a static IP pool for the number of kubernetes nodes being deployed.
  • SSH Key Pair: Generated SSH key pair to be used to administer the deployed CSE servers. This could (optionally) also be used to administer the ceph servers.
  • VDC Capacity: Ensure you have sufficient resources (Memory, CPU, Storage and number of VMs) in your vCD VDC to support the desired cluster sizes.

Ceph Storage Cluster

The process below describes installing and configuring a ceph cluster on virtualised hardware. If you have an existing ceph cluster available or are building on physical hardware it’s best to follow the ceph official documentation at this link for your circumstances.

Ceph Server Builds

The 4 ceph servers can be built using any available hardware or virtualisation platform; in this exercise I’ve built them from an Ubuntu 16.04 LTS server template with 2 vCPUs and 4GB RAM each, in the same vCloud Director environment which will be used for deployment of the CSE kubernetes cluster. There are no special requirements for installing/configuring the base Operating System for the ceph cluster. If you are using a different Linux distribution then check the ceph documentation for the appropriate steps for your distribution.

On the 3 storage nodes (ceph01, ceph02 and ceph03) add a hard disk to the server which will act as the storage for the ceph Object Storage Daemon (OSD) – the storage pool which will eventually be useable in Kubernetes. In this example I’ve added a 50GB disk to each of these VMs.

Once the servers are deployed, the following commands are run on each server to update its repositories and upgrade any packages to current security levels. We will also upgrade the Linux kernel to a more up-to-date version by enabling the Ubuntu Hardware Enablement (HWE) kernel, which resolves some compatibility issues between ceph and older Linux kernel versions.

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install --install-recommends linux-generic-hwe-16.04 -y

Each server should now be restarted to ensure the new Linux kernel is loaded and any added storage disks are recognised.

Ceph Admin Account

We need a user account configured on each of the ceph servers to allow ceph-deploy to work and to co-ordinate access. This account must NOT be named ‘ceph’ due to potential conflicts in the ceph-deploy scripts, but can be called just about anything else; in this lab environment I’ve used ‘cephadmin’. First we create the account on each server and set the password; the third command permits the cephadmin user to use ‘sudo’ without a password, which is required for the ceph-deploy script:

$ sudo useradd -d /home/cephadmin -m cephadmin -s /bin/bash
$ sudo passwd cephadmin
$ echo "cephadmin ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/cephadmin

From now on (unless specified otherwise), use the new cephadmin login to perform each step. Next we need to generate an SSH key pair for the cephadmin user and copy the public key to the authorized_keys file on each of the ceph nodes.

Execute the following on the ceph admin node (as cephadmin):

$ ssh-keygen -t rsa

Accept the default path (/home/cephadmin/.ssh/id_rsa) and don’t set a key passphrase. You should copy the generated .ssh/id_rsa (private key) file to your admin workstation so you can use it to authenticate to the ceph servers.

Next, enable password logins (temporarily) on the storage nodes (ceph01,2 & 3) by running the following on each node:

$ sudo sed -i "s/.*PasswordAuthentication.*/PasswordAuthentication yes/g" /etc/ssh/sshd_config
$ sudo systemctl restart sshd

Now copy the cephadmin public key to each of the other ceph nodes by running the following (again only on the admin node):

$ ssh-keyscan -t rsa ceph01 >> ~/.ssh/known_hosts
$ ssh-keyscan -t rsa ceph02 >> ~/.ssh/known_hosts
$ ssh-keyscan -t rsa ceph03 >> ~/.ssh/known_hosts
$ ssh-copy-id cephadmin@ceph01
$ ssh-copy-id cephadmin@ceph02
$ ssh-copy-id cephadmin@ceph03

You should now confirm you can ssh to each storage node as the cephadmin user from the admin node without being prompted for a password:

$ ssh cephadmin@ceph01 sudo hostname
ceph01
$ ssh cephadmin@ceph02 sudo hostname
ceph02
$ ssh cephadmin@ceph03 sudo hostname
ceph03

If everything is working correctly then each command will return the appropriate hostname for each storage node without any password prompts.

Optional: It is now safe to re-disable password authentication on the ceph servers if required (since public key authentication will be used from now on) by:

$ sudo sed -i "s/.*PasswordAuthentication.*/PasswordAuthentication no/g" /etc/ssh/sshd_config
$ sudo systemctl restart sshd

You’ll need to resolve any authentication issues before proceeding as the ceph-deploy script relies on being able to obtain sudo-level remote access to all of the storage nodes to install ceph successfully.

You should also at this stage confirm that you have time synchronised to an external source on each ceph node so that the server clocks agree. By default on Ubuntu 16.04 timesyncd is configured automatically, so nothing needs to be done here in our case. You can check this on Ubuntu 16.04 by running timedatectl:

(Screenshot: checking time/date settings using timedatectl)

For some Linux distributions you may need to create firewall rules at this stage for ceph to function, generally port 6789/tcp (for mon) and the range 6800 to 7300 tcp (for OSD communication) need to be open between the cluster nodes. The default firewall settings in Ubuntu 16.04 allow all network traffic so this is not required (however, do not use this in a production environment without configuring appropriate firewalling).
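
As an illustration only (assuming Ubuntu's ufw firewall front-end is in use and the default ceph ports), the rules would look something like this on each cluster node:

$ sudo ufw allow 6789/tcp
$ sudo ufw allow 6800:7300/tcp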

Ceph Installation

On all nodes and signed-in as the cephadmin user (important!)
Add the release key:

$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

Add ceph packages to your repository:

$ echo deb https://download.ceph.com/debian-mimic/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list

On the admin node only, update and install ceph-deploy:

$ sudo apt update; sudo apt install ceph-deploy -y

On all nodes, update and install ceph-common:

$ sudo apt update; sudo apt install ceph-common -y

Note: Installing ceph-common on the storage nodes isn’t strictly required as the ceph-deploy script can do this during cluster initiation, but pre-installing it in this way pulls in several dependencies (e.g. python v2 and associated modules) whose absence can prevent ceph-deploy from running, so it is easier to do it this way.

Next, again working on the admin node logged in as cephadmin, make a directory to store the ceph cluster configuration files and change to that directory. Note that ceph-deploy uses and writes files in the current directory, so make sure you are in this folder whenever making changes to the ceph configuration.

$ mkdir ~/mycluster
$ cd ~/mycluster

Now we can create the initial ceph cluster from the admin node: use ceph-deploy with the ‘new’ switch and supply the monitor nodes (in our case all 3 nodes will be both monitors and OSD nodes). Make sure you do NOT use sudo for this command and only run it on the admin node:

$ ceph-deploy new ceph01 ceph02 ceph03

If everything has run correctly you’ll see output similar to the following:

(Screenshot: ceph-deploy output)

Checking the contents of the ~/mycluster/ folder should show the cluster configuration files have been added:

$ ls -al ~/mycluster
total 24
drwxrwxr-x 2 cephadmin cephadmin 4096 Jan 25 01:03 .
drwxr-xr-x 5 cephadmin cephadmin 4096 Jan 25 00:57 ..
-rw-rw-r-- 1 cephadmin cephadmin  247 Jan 25 01:03 ceph.conf
-rw-rw-r-- 1 cephadmin cephadmin 7468 Jan 25 01:03 ceph-deploy-ceph.log
-rw------- 1 cephadmin cephadmin   73 Jan 25 01:03 ceph.mon.keyring

The ceph.conf file will look something like this:

$ cat ~/mycluster/ceph.conf
fsid = 98ca274e-f79b-4092-898a-c12f4ed04544
mon_initial_members = ceph01, ceph02, ceph03
mon_host = 192.168.207.201,192.168.207.202,192.168.207.203
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

Run the ceph installation for the nodes (again from the admin node only):

$ ceph-deploy install ceph01 ceph02 ceph03

This will run through the installation of ceph and pre-requisite packages on each node, you can check the ceph-deploy-ceph.log file after deployment for any issues or errors.

Ceph Configuration

Once you’ve successfully installed ceph on each node, use the following (again from only the admin node) to deploy the initial ceph monitor services:

$ ceph-deploy mon create-initial

If all goes well you’ll get some messages at the completion of this process showing the keyring files being stored in your ‘mycluster’ folder; you can check these exist:

$ ls -al ~/mycluster
total 168
drwxrwxr-x 2 cephadmin cephadmin   4096 Jan 25 01:17 .
drwxr-xr-x 5 cephadmin cephadmin   4096 Jan 25 00:57 ..
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-mds.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-mgr.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-osd.keyring
-rw------- 1 cephadmin cephadmin    113 Jan 25 01:17 ceph.bootstrap-rgw.keyring
-rw------- 1 cephadmin cephadmin    151 Jan 25 01:17 ceph.client.admin.keyring
-rw-rw-r-- 1 cephadmin cephadmin    247 Jan 25 01:03 ceph.conf
-rw-rw-r-- 1 cephadmin cephadmin 128136 Jan 25 01:17 ceph-deploy-ceph.log
-rw------- 1 cephadmin cephadmin     73 Jan 25 01:03 ceph.mon.keyring

To avoid having to specify the monitor node address and ceph.client.admin.keyring path in every command, we can now deploy these to each node so they are available automatically. Again working from the ‘mycluster’ folder on the admin node:

$ ceph-deploy admin cephadmin ceph01 ceph02 ceph03

This should give output showing the configuration and admin keyring being pushed to each of the nodes.


Next we need to deploy the manager (‘mgr’) service to the OSD nodes, again working from the ‘mycluster’ folder on the admin node:

$ ceph-deploy mgr create ceph01 ceph02 ceph03

At this stage we can check that all of the mon and mgr services are started and ok by running (on the admin node):

$ sudo ceph -s
  cluster:
    id:     98ca274e-f79b-4092-898a-c12f4ed04544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

As you can see, the manager (‘mgr’) service is installed on all 3 nodes but only active on the first and in standby mode on the other 2 – this is normal and correct. The monitor (‘mon’) service is running on all of the storage nodes.

Next we can configure the disks attached to our storage nodes for use by ceph. Ensure that you know and use the correct identifier for your disk devices (in this case, we are using the 2nd SCSI disk attached to the storage node VMs which is at /dev/sdb so that’s what we’ll use in the commands below). As before, run the following only on the admin node:

$ ceph-deploy osd create --data /dev/sdb ceph01
$ ceph-deploy osd create --data /dev/sdb ceph02
$ ceph-deploy osd create --data /dev/sdb ceph03

For each command the last line of the logs shown when run should be similar to ‘Host ceph01 is now ready for osd use.’

We can now check the overall cluster health with:

$ ssh ceph01 sudo ceph health
HEALTH_OK
$ ssh ceph01 sudo ceph -s
  cluster:
    id:     98ca274e-f79b-4092-898a-c12f4ed04544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03
    mgr: ceph01(active), standbys: ceph02, ceph03
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 147 GiB / 150 GiB avail
    pgs:

As you can see, the 3 x 50GB disks have now been added and the total (150 GiB) capacity is available under the data: section.

Now we need to create a ceph storage pool ready for Kubernetes to consume – the default name of this pool is ‘rbd’ (if not specified), but it is strongly recommended to name it differently from the default when using it for k8s, so I’ve created a storage pool called ‘kube’ in this example (again running from the mycluster folder on the admin node):

$ sudo ceph osd pool create kube 30 30
pool 'kube' created

The two ’30’s are important – these are the placement group (PG) and PGP numbers for the pool. You should review the ceph documentation for Pool, PG and CRUSH configuration to establish values appropriate to your environment (a commonly quoted starting point is roughly 100 PGs per OSD divided by the pool’s replica count, rounded to a power of two).

We now associate this pool with the rbd (RADOS block device) application so it is available to be used as a RADOS block device:

$ sudo ceph osd pool application enable kube rbd
enabled application 'rbd' on pool 'kube'

Testing Ceph Storage

The easiest way to test our ceph cluster is working correctly and can provide storage is to attempt creating and using a new RADOS Block Device (rbd) volume from our admin node.

Before this will work we need to tune the rbd features map by editing ceph.conf on our client to disable rbd features that aren’t available in our Linux kernel – a value of 7 corresponds to the layering, striping and exclusive-lock feature bits, which older kernels can support (on admin/client node):

$ echo "rbd_default_features = 7" | sudo tee -a /etc/ceph/ceph.conf
rbd_default_features = 7

Now we can test creating a volume:

$ sudo rbd create --size 1G kube/testvol01

Confirm that the volume exists:

$ sudo rbd ls kube
testvol01

Get information on our volume:

$ sudo rbd info kube/testvol01
rbd image 'testvol01':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 10e96b8b4567
        block_name_prefix: rbd_data.10e96b8b4567
        format: 2
        features: layering, exclusive-lock
        op_features:
        flags:
        create_timestamp: Sun Jan 27 08:50:45 2019

Map the volume to our admin host (which creates the block device /dev/rbd0):

$ sudo rbd map kube/testvol01
/dev/rbd0

Now we can create a temporary mount folder, make a filesystem on our volume and mount it to our temporary mount:

$ sudo mkdir /testmnt
$ sudo mkfs.xfs /dev/rbd0
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/rbd0 /testmnt
$ df -vh
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  5.7M  389M   2% /run
/dev/sda1       9.6G  2.2G  7.4G  24% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda15      105M  3.4M  101M   4% /boot/efi
tmpfs           395M     0  395M   0% /run/user/1001
/dev/rbd0      1014M   34M  981M   4% /testmnt

We can see our volume has been mounted successfully and can now be used as any other disk.

To tidy up and remove our test volume:

$ sudo umount /dev/rbd0
$ sudo rbd unmap kube/testvol01
$ sudo rbd remove kube/testvol01
Removing image: 100% complete...done.
$ sudo rmdir /testmnt

Kubernetes CSE Cluster

Using VMware Container Service Extension (CSE) makes it easy to deploy and configure a base Kubernetes cluster into our vCloud Director platform. I previously wrote a post here with a step-by-step guide to using CSE.

First we need an ssh key pair to provide to the CSE nodes as they are deployed to allow us to access them. You could re-use the cephadmin key-pair created in the previous section, or generate a new set. As I’m using Windows as my client OS I used the puttygen utility included in the PuTTY package to generate a new keypair and save them to a .ssh directory in my home folder.

Important Note: Check your public key file in a text editor prior to deploying the cluster; if it is in the multi-line format produced by PuTTYGen:

(Screenshot: public key as generated by PuTTYGen – incorrect format)

You will need to change it to be a single line starting ‘ssh-rsa’ followed by the key data, with none of the extra header/footer text.


If you do not make this change you won’t be able to authenticate to your cluster nodes once deployed.

Next we login to vCD using the vcd-cli (see my post linked above if you need to install/configure vcd-cli and the CSE extension):

Logging in to vcd-cli
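
The exact values are specific to your provider, but a vcd-cli login takes the cloud endpoint, organisation and user name – with hypothetical values it looks something like this:

C:\Users\admin>vcd login mycloudprovider.com myorg administrator -p mypassword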

Now we can see what virtual Datacenters (VDCs) are available to us:

Showing available VDCs

If we had multiple VDCs available, we need to select which one is ‘in_use’ (active) for deployment of our cluster using ‘vcd vdc use “<VDC Name>”‘. In this case we only have a single VDC and it’s already active/in use.

We can get the information of our VDC which will help us fill out the required properties when creating our k8s cluster:

VDC Properties returned by vdc info

We will be using the ‘Tyrell Servers A03’ network (where our ceph cluster exists) and the ‘A03 VSAN Performance’ storage profile for our cluster.

To get the options available when creating a cluster we can see the cluster creation help:

CSE cluster create options

Now we can go ahead and create our Kubernetes cluster with CSE:
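
A cluster create command using the network, storage profile and SSH public key identified above would look something like the following – the exact switch names can vary between CSE versions, so check ‘vcd cse cluster create --help’ in your environment:

C:\Users\admin>vcd cse cluster create k8sceph --network "Tyrell Servers A03" --storage-profile "A03 VSAN Performance" --nodes 3 --ssh-key .ssh\id_rsa.pub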

Looking in vCloud Director we can see the new vApp and VMs deployed:

We obtain the kubectl config of our cluster and store this for later use (make the .kube folder first if it doesn’t already exist):

C:\Users\admin>vcd cse cluster config k8sceph > .kube\config

And get the details of our k8s nodes from vcd-cli:
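
For example, to list the node names and IP addresses of the new cluster:

C:\Users\admin>vcd cse node list k8sceph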

Next we need to update and install the ceph client on each cluster node – run the following on each node (including the master). To do this we can connect via ssh as root using the key pair we specified when creating the cluster.

# wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
OK
# echo deb https://download.ceph.com/debian-mimic/ $(lsb_release -sc) main > /etc/apt/sources.list.d/ceph.list
# apt-get update
# apt-get install --install-recommends linux-generic-hwe-16.04 -y
# apt-get install ceph-common -y
# reboot

You should now be able to connect from an admin workstation and get the nodes in the kubernetes cluster from kubectl (if you do not already have kubectl installed on your admin workstation, see here for instructions).
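
As a quick check from the admin workstation (using the config file saved above), listing the nodes should show the master and all three workers in a ‘Ready’ state:

C:\Users\admin>kubectl get nodes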

Note: if you expand the CSE cluster at any point (add nodes), you will need to repeat this series of commands on each new node in order for it to be able to mount rbd volumes from the ceph cluster.

You should also be able to verify that the core kubernetes services are running in your cluster:
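
For example, listing the pods in the kube-system namespace should show the core components (DNS, kube-proxy, the pod network and so on) in a ‘Running’ state:

C:\Users\admin>kubectl get pods --namespace=kube-system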

The ceph configuration files from the ceph cluster nodes need to be added to all nodes in the kubernetes cluster. Depending on which ssh keys you have configured for access, you may be able to do this directly from the ceph admin node as follows:

$ sudo scp /etc/ceph/ceph.* root@192.168.207.102:/etc/ceph/
$ sudo scp /etc/ceph/ceph.* root@192.168.207.103:/etc/ceph/
$ sudo scp /etc/ceph/ceph.* root@192.168.207.104:/etc/ceph/
$ sudo scp /etc/ceph/ceph.* root@192.168.207.105:/etc/ceph/

If not, manually copy the /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring files to each of the kubernetes nodes using copy/paste or scp from your admin workstation (copy the files from the ceph admin node to ensure that the rbd_default_features line is included).

To confirm everything is configured correctly, we should now be able to create and mount a test rbd volume on any of the kubernetes nodes as we did for the ceph admin node previously:

root@mstr-x4nb:~# rbd create --size 1G kube/testvol02
root@mstr-x4nb:~# rbd ls kube
root@mstr-x4nb:~# rbd info kube/testvol02
rbd image 'testvol02':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 10f36b8b4567
        block_name_prefix: rbd_data.10f36b8b4567
        format: 2
        features: layering, exclusive-lock
        op_features:
        flags:
        create_timestamp: Sun Jan 27 21:56:59 2019
root@mstr-x4nb:~# rbd map kube/testvol02
/dev/rbd0
root@mstr-x4nb:~# mkdir /testmnt
root@mstr-x4nb:~# mkfs.xfs /dev/rbd0
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@mstr-x4nb:~# mount /dev/rbd0 /testmnt
root@mstr-x4nb:~# df -vh
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  5.7M  389M   2% /run
/dev/sda1       9.6G  4.0G  5.6G  42% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda15      105M  3.4M  101M   4% /boot/efi
tmpfs           395M     0  395M   0% /run/user/0
/dev/rbd0      1014M   34M  981M   4% /testmnt
root@mstr-x4nb:~# umount /testmnt
root@mstr-x4nb:~# rbd unmap kube/testvol02
root@mstr-x4nb:~# rmdir /testmnt/
root@mstr-x4nb:~# rbd remove kube/testvol02
Removing image: 100% complete...done.

Note: If the rbd map command hangs you may still be running the stock Linux kernel on the kubernetes nodes – make sure you have restarted them.

Now we have a functional ceph storage cluster capable of serving block storage devices over the network, and a Kubernetes cluster configured to mount and use rbd devices. In the next section we will configure kubernetes and ceph together with the rbd-provisioner container to enable dynamic persistent storage for pods deployed into our infrastructure.

Putting it all together

Kubernetes secrets

We first need to give Kubernetes the account information used to connect to the ceph cluster. To do this we create a ‘secret’ for the ceph admin user, and also create a client user to be used by k8s provisioning. Working on the kubernetes master node is easiest for this as it has ceph and kubectl already configured from our previous steps:

# ceph auth get-key client.admin

This will return a key like ‘AQCLY0pcFXBYIxAAhmTCXWwfSIZxJ3WhHnqK/w==’ which is used in the next command (Note: the ‘=’ sign between --from-literal and key is not a typo – it actually needs to be like this).

# kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" \
--from-literal=key='AQCLY0pcFXBYIxAAhmTCXWwfSIZxJ3WhHnqK/w==' --namespace=kube-system
secret "ceph-secret" created

We can now create a new ceph user ‘kube’ and register the secret from this user in kubernetes as ‘ceph-secret-kube’:

# ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
[client.kube]
        key = AQDqZU5c0ahCOBAA7oe+pmoLIXV/8OkX7cNBlw==
# kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" \
--from-literal=key='AQDqZU5c0ahCOBAA7oe+pmoLIXV/8OkX7cNBlw==' --namespace=kube-system
secret "ceph-secret-kube" created

rbd-provisioner

Kubernetes is in the process of moving storage provisioners (such as the rbd one we will be using) out of its main packages and into separate projects and packages. There’s also an issue that the kubernetes-controller-manager container no longer has access to an ‘rbd’ binary in order to be able to connect to a ceph cluster directly. We therefore need to deploy a small ‘rbd-provisioner’ to act as the go-between from the kubernetes cluster to the ceph storage cluster. This project is available under this link and the steps below show how to get a kubernetes pod running the rbd-provisioner service (again working from the k8s cluster ‘master’ node):

# git clone https://github.com/kubernetes-incubator/external-storage
Cloning into 'external-storage'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 63661 (delta 0), reused 1 (delta 0), pack-reused 63659
Receiving objects: 100% (63661/63661), 113.96 MiB | 8.97 MiB/s, done.
Resolving deltas: 100% (29075/29075), done.
Checking connectivity... done.
# cd external-storage/ceph/rbd/deploy
# sed -r -i "s/namespace: [^ ]+/namespace: kube-system/g" ./rbac/clusterrolebinding.yaml ./rbac/rolebinding.yaml
# kubectl -n kube-system apply -f ./rbac
clusterrole.rbac.authorization.k8s.io "rbd-provisioner" created
clusterrolebinding.rbac.authorization.k8s.io "rbd-provisioner" created
deployment.extensions "rbd-provisioner" created
role.rbac.authorization.k8s.io "rbd-provisioner" created
rolebinding.rbac.authorization.k8s.io "rbd-provisioner" created
serviceaccount "rbd-provisioner" created
# cd

You should now be able to see the ‘rbd-provisioner’ container starting and then running in kubernetes:
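
For example, the provisioner pod can be checked with (the generated pod name suffix will differ in your environment):

# kubectl -n kube-system get pods | grep rbd-provisioner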

Testing it out

Now we can create our kubernetes StorageClass backed by this storage, ready for a pod to make a persistent volume claim (PVC) against. Create the following as a new file (I’ve named mine ‘rbd-storageclass.yaml’). Change the ‘monitors’ line to reflect the IP addresses of the ‘mon’ nodes in your ceph cluster (in our case these are on the ceph01, ceph02 and ceph03 nodes on the IP addresses shown in the file).

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 192.168.207.201:6789, 192.168.207.202:6789, 192.168.207.203:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering

You can then add this StorageClass to kubernetes using:

# kubectl create -f ./rbd-storageclass.yaml
storageclass.storage.k8s.io "rbd" created
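
Optionally (purely a convenience and not required for the rest of this post), the ‘rbd’ class can be made the default StorageClass so that PVCs which don’t specify a storageClassName also use the ceph cluster:

# kubectl patch storageclass rbd -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'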

Next we can create a test PVC and make sure that storage is created in our ceph cluster and assigned to the pod. Create a new file ‘pvc-test.yaml’ as:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: testclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rbd

We can now submit the PVC to kubernetes and check it has been successfully created:

# kubectl create -f ./pvc-test.yaml
persistentvolumeclaim "testclaim" created
# kubectl get pvc testclaim
NAME        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
testclaim   Bound     pvc-1e9bdbfd-22a8-11e9-ba77-005056340036   1Gi        RWO            rbd            21s
# kubectl describe pvc testclaim
Name:          testclaim
Namespace:     default
StorageClass:  rbd
Status:        Bound
Volume:        pvc-1e9bdbfd-22a8-11e9-ba77-005056340036
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
               volume.beta.kubernetes.io/storage-provisioner=ceph.com/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:
  Type    Reason                 Age   From                                                                               Message
  ----    ------                 ----  ----                                                                               -------
  Normal  ExternalProvisioning   3m    persistentvolume-controller                                                        waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
  Normal  Provisioning           3m    ceph.com/rbd_rbd-provisioner-bc956f5b4-g6rc2_1f37a6c3-22a6-11e9-aa61-7620ed8d4293  External provisioner is provisioning volume for claim "default/testclaim"
  Normal  ProvisioningSucceeded  3m    ceph.com/rbd_rbd-provisioner-bc956f5b4-g6rc2_1f37a6c3-22a6-11e9-aa61-7620ed8d4293  Successfully provisioned volume pvc-1e9bdbfd-22a8-11e9-ba77-005056340036
# rbd list kube
kubernetes-dynamic-pvc-25e94cb6-22a8-11e9-aa61-7620ed8d4293
# rbd info kube/kubernetes-dynamic-pvc-25e94cb6-22a8-11e9-aa61-7620ed8d4293
rbd image 'kubernetes-dynamic-pvc-25e94cb6-22a8-11e9-aa61-7620ed8d4293':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 11616b8b4567
        block_name_prefix: rbd_data.11616b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Mon Jan 28 02:55:19 2019

As we can see, our test claim has successfully requested and bound a persistent storage volume from the ceph cluster.
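
To prove a pod can actually consume the claim, a minimal test pod can mount it – the pod and volume names below are arbitrary and busybox is just a convenient small test image. Save as (for example) ‘pod-test.yaml’, apply with ‘kubectl create -f ./pod-test.yaml’ and remove afterwards with ‘kubectl delete pod testpod’:

apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - name: testpod
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: testvol
      mountPath: /data
  volumes:
  - name: testvol
    persistentVolumeClaim:
      claimName: testclaim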

References

Ceph: https://ceph.com/
Docker: https://www.docker.com/
Kubernetes: https://kubernetes.io/
VMware Container Service Extension: https://vmware.github.io/container-service-extension/
VMware vCloud Director for Service Providers: https://docs.vmware.com/en/vCloud-Director/index.html

Wow, this post ended up way longer than I was anticipating when I started writing it. Hopefully there’s something useful for you in amongst all of that.

I’d like to thank members of the vExpert community for their encouragement and advice in getting this post written up and as always, if you have any feedback please leave a comment.

Time-permitting, there will be a followup to this post which details how to deploy containers to this platform using the persistent storage made available, both directly in Kubernetes and using Helm charts. I’d also like to cover some of the more advanced issues using persistent storage in containers raises – in particular backup/recovery and replication/high availability of data stored in this manner.

Jon

VM Guest Customization in vCloud Director via PowerCLI

Bit of a quick post this, but hopefully useful to others.

I got asked recently if there was an easy way to set Guest Customization options for VMs hosted in vCloud Director via Powershell/PowerCLI. It turns out there is an extremely simple way, but the syntax is a bit awkward so figured it would make a good/quick blog post.

The Guest Customization settings are available as one of the ‘Section’ entries returned by accessing the ExtensionData properties on a CIVM object. Once connected (Connect-CIServer) you can see this from PowerCLI:


The ‘trick’ is that there are typically 5 sections (one each for OvfVSSD, OvfMsg, network connections, guest Customization and VMware tools). I’ve seen some approaches that rely on the ‘guest Customization’ setting always being found at the Section[3] index in the ExtensionData collection, but this could easily change in future and break any functionality relying on this. A much more reliable way of finding the guest Customization section values is:

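As a rough sketch (the VM name here is hypothetical), the section can be located by filtering on its type name rather than relying on its position in the collection:

$vm = Get-CIVM -Name 'MyVM'
$custSection = $vm.ExtensionData.Section | Where-Object { $_.GetType().Name -eq 'GuestCustomizationSection' }
$custSection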

But what if you need to change or update a setting? Luckily there is a method provided (UpdateServerData) which does exactly this. So if we want to (for example) change the ‘CustomizationScript’ setting to ‘echo "Hello World!"’ we can:

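Continuing the sketch above, the change is simply assigning the property and then writing it back:

$custSection.CustomizationScript = 'echo "Hello World!"'
$custSection.UpdateServerData()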

You can change other settings using the same method (e.g. ComputerName or Domain join settings).

Note that for many changes the VM must be powered off, and you may need to ‘Power On and Force Recustomization’ too.

As always, comments & feedback appreciated.

Jon.

Getting detailed VM Disk Properties from the vCloud API

Since vCloud Director 8.10 VMware have allowed VMs to be created which have multiple disks using different storage policies. This can be very useful – for example, a database VM might have its database on fast storage but another disk containing backups or logs on slower/cheaper disk.

When trying to find out what storage is in use for a VM though this can create issues, the PowerCLI Get-CIVM cmdlet (and the Get-CIView cmdlet used to get extra information) aren’t able to properly report storage for VMs that consume multiple storage policies. This in turn can create problems for Service Providers when they need to report on overall VM disk usage divided by storage policy used.

As an example I’ve created a VM named ‘test01’ in a customer vDC which has 3 disks attached, the 2nd of these is on ‘Capacity’ tier storage while disks 1 and 3 are on ‘Performance’ storage. When we look at the VM details we see the following:


Digging into the ExtensionData shows


The StorageProfile element looks like it may contain what we need, but unfortunately this only shows the ‘home’ Storage for the VM and doesn’t indicate that at least one of the VMs disks is on a different storage profile:


After a lot of mucking around trying to find an easy way to discover the information, I ‘gave up’ and wrote a PowerShell module which accesses the vCD API directly to get the VM storage information (including storage tiers in use by each disk). The module isn’t overly efficient since it queries the storage profile reference for every disk on every VM (and so will result in a lot of calls if run for a large number of VMs), but otherwise works fine.

The module takes VM objects or a VM name as input and returns details on each disk attached to the VM including which storage profile they use. Save the script (e.g. as ‘Get-CIVMStorageProfile.psm1’) and then use ‘Import-Module .\Get-CIVMStorageProfile.psm1’ to import the function.

<#
  .Synopsis
   Gets detailed storage information from a vCloud VM.

.Description
   This function returns detailed disk information for a vCloud VM. Specifically
   it shows the number of disks attached and the storage policy assigned to each
   disk which is useful when VMs consume storage from multiple policies.

.Parameter CIVM
   The VM object (from Get-CIVM) to report storage for.

.Example
   Get-CIVM -Name 'test01' | Get-CIVMStorageDetail
#>

Function Get-CIVMStorageDetail
{
     [CmdletBinding()]
     Param(
         [Parameter(ValueFromPipeline)]
         $CIVM
     )
     begin {}
     process
     {

        # API version to use when communicating with vCloud Director - API 27.0 is vCloud Director 8.20:
         $vCDAPIVersion = "27.0"
                 
         # Check if we've been passed a VM name or an actual VM object and handle appropriately
         if ($CIVM.GetType() -eq [String]) {
             try {
                 $VMObj = Get-CIVM -Name $CIVM -ErrorAction Stop
             } catch {
                 Write-Host -ForegroundColor Red "Error: Could not find a VM with the name $CIVM."
                 Break
             }
         }
         else {
             $VMObj = $CIVM
         }

        # Find our vCloud SessionId that matches the URI of the VM object:
         $SessionId = $global:DefaultCIServers.SessionId | Where-Object { $VMObj.href -match $_.ServiceUri }

        try {
             $vmxml = Invoke-RestMethod -Method Get -Uri "$($VMObj.href)/virtualHardwareSection/disks" -Headers @{'x-vcloud-authorization'=$SessionId; 'Accept'="application/*+xml;version=$($vCDAPIVersion)"} -ErrorAction Stop
         } catch {
             Write-Host -ForegroundColor Red "Error attempting to get VM details from API:"
             Write-Host -ForegroundColor Red "Status Code: $($_.Exception.Response.StatusCode.value__)"
             Write-Host -ForegroundColor Red "Status Description: $($_.Exception.Response.StatusDescription)"
             Break
         }

        # Build an empty object for the VM disk details:
         $vmdisks = @()

        # RASD resource type 17 is a hard disk attached to a VM:
         foreach($disk in ($vmxml.RasdItemsList.Item | Where-Object -Property ResourceType -eq 17)) {
             
             # Dereference the StorageProfileHref for each disk to get the Storage Profile Name:
             try {
                 $sprof = Invoke-RestMethod -Method Get -Uri "$($disk.HostResource.storageProfileHref)" -Headers @{'x-vcloud-authorization'=$SessionId; 'Accept'="application/*+xml;version=$($vCDAPIVersion)"} -ErrorAction Stop
             } catch {
                 Write-Host -ForegroundColor Red "Error attempting to get Storage Profile Name:"
                 Write-Host -ForegroundColor Red "Status Code: $($_.Exception.Response.StatusCode.value__)"
                 Write-Host -ForegroundColor Red "Status Description: $($_.Exception.Response.StatusDescription)"
                 Break
             }
             
             $diskprops = @{
                 VMName         = [string]$VMObj.Name
                 InstanceID     = [string]$disk.InstanceID
                 StorageProfile = [string]$sprof.VdcStorageProfile.Name
                 CapacityGB     = [float][math]::Round(($disk.VirtualQuantity / 1024 / 1024 / 1024),3)
                 ElementName    = [string]$disk.ElementName
             }

            $diskobj = New-Object PSObject -Property $diskprops
             $vmdisks += $diskobj
         }

        return $vmdisks
     } # end process

}
Export-ModuleMember -Function Get-CIVMStorageDetail

And here is example output from the script for our test VM:


Hope this is useful to some of you and as always, appreciate any comments/feedback.

I’d also love to know if there’s an easier way of generating this information.

Jon.

Using vCloud Director PowerCLI and vcd-cli with Federated User Accounts

One of the issues that vCloud Director user can run into is user authentication when using the PowerCLI and vcd-cli tools to manage their cloud deployments. For ‘Local’ user accounts defined in the vCloud Director portal this isn’t an issue as username/password are stored in the vCD database and can be directly authenticated. However, many customers want to federate their vCloud users with an external directory service (often Microsoft AD FS or other similar service). Typically this is done so that security groups in the external directory can be used to control access levels, and so that additional authentication mechanisms like 2-Factor Authentication (2FA) can be applied to accounts.

If you attempt to use CLI tools like vcd-cli or PowerCLI to authenticate with a federated user account you will get a ‘Login Failed’ or ‘Unauthorized’ failure and won’t be able to connect to the service.

Fortunately, both vcd-cli and PowerCLI allow you to use an existing browser vCloud session ID to connect to the vCD API. To use this you connect to your vCloud portal in a web browser and then use your browser’s tools to find the session ID for your connection. Once you have the session ID you can create a PowerCLI or vcd-cli session using that token.

It can sometimes be easier to use a browser plugin or extension to help find the session ID, ones which show session cookies and/or HTTP headers work best, but even without these it is possible.

In Google Chrome for example, use <ctrl + shift + I> (or Menu / More Tools / Developer Tools) to open the developer interface. Next click on the ‘Network’ heading at the top of the developer panel and refresh the vCloud Director portal. Scroll down to one of the ‘amfsecure’ document lines and select the ‘Headers’ tab, you should see a panel similar to this:


You can simply copy the value from the highlighted entry (87489f6a17044d66bc36704ce5c4e45c in this example) and use that to establish a vcd-cli or PowerCLI session:

For vcd-cli:

vcd login <cloud endpoint> <org name> <user name> -d <session ID string>

e.g.

vcd login mycloudprovider.com myorg joebloggs -d 87489f6a17044d66bc36704ce5c4e45c

For PowerCLI:

Connect-CIServer -Server <cloud endpoint> -SessionID <session ID string>

e.g.

Connect-CIServer -Server mycloudprovider.com -SessionID 87489f6a17044d66bc36704ce5c4e45c

You will then be connected as the same user from your browser session and able to run all the PowerCLI or vcd-cli commands with that user account.

An easier way?

Rather than digging around for HTTP headers and cookies in a browser, vcd-cli has a built-in module which is meant to retrieve the sessionID from a browser session automatically and use this to authenticate, the syntax is:

vcd login session list chrome

Which should return the session ID from an instance of Chrome, but in my initial testing this was not returning any output at all.

Reading through the vcd-cli sources it appears that this option relies on a Python extension ‘browsercookie’ which can be installed using pip install --user browsercookie. Browsercookie has a dependency on the ‘pycrypto’ module which must also be installed. However, even with both pycrypto and browsercookie installed I couldn’t get this option to work.

I did manage to get this working by installing the browser_cookie3 module from https://pypi.python.org/pypi/browser-cookie3/0.6.0 by using pip install --user browser-cookie3 and then making the following changes in the vcd-cli\login.py file:

Line 24: Change:

from vcd_cli import browsercookie
to:
import browser_cookie3

On both lines 126 and 148: Change:

cookies = browsercookie.chrome()
to:
cookies = browser_cookie3.chrome()

Once these changes are complete the ‘vcd login session list chrome’ command can be used to obtain the current session ID from Chrome automatically:


And this can be used directly to login automatically once a Chrome session exists using the --use-browser-session switch.

Also note that you can obtain the session ID like this from vcd-cli and use it to authenticate a PowerCLI session with no issues at all.

Jon.

Using VMware Container Service Extension (CSE)

Yesterday I wrote a post showing the currently available container hosting options from VMware. As we’ve recently deployed one of these options – CSE – in our environment, I thought it would be useful to show a sample workflow on how the service functions and how customers can use this to deploy and manage both CSE clusters, and also micro-service applications onto those clusters.

There are a few requirements on the tenant side which must be completed prior to any of this working:

  • An Organizational Administrator login to the vCloud platform where CSE is deployed.
  • Access to a virtual datacenter (VDC) with sufficient CPU, Memory and Storage resources for the cluster to be deployed into.
  • An Org VDC network which can be used by the cluster and has sufficient free IP addresses in a Static Pool to allocate to the cluster nodes (clusters take 1 IP address for the ‘master’ node and an additional address for each ‘worker’ node deployed).
  • A client prepared with Python v3 installed and the vcd-cli and container-service-extension packages installed on it.
  • The {$HOMEDIR}\.vcd-cli\profiles.yaml file edited to add the CSE extension to vcd-cli.
  • The kubectl utility installed to administer the Kubernetes cluster once deployed and working. kubectl can be obtained most easily from here.

Detailed instructions for the client setup can be found in the CSE documentation at https://vmware.github.io/container-service-extension/#tenant-installation. Note that on a Windows platform the .vcd-cli folder and profiles.yaml file will not be automatically created, but you can do this manually by

mkdir %HOMEPATH%\.vcd-cli

from a DOS prompt and then using vcd-cli to log in and out of your cloud provider. This will cause profiles.yaml to be generated in the .vcd-cli folder. The profiles.yaml file can then be edited in your favourite text editor to add the required CSE extension lines.

Deploying a Cluster with CSE

When deploying a cluster, you will need to know the storage profile and network names which the cluster will use, the easiest way of obtaining these is either from the vCloud portal, or using the vcd vdc info command when logged in to your environment:


If you have multiple VDCs available to you, use the ‘vcd vdc use <VDC Name>’ command to set which one to work with.

In this example we will be using the highlighted entries (the ‘Tyrell-Servers’ network and the ‘CHC Performance’ storage profile).

To retrieve a list of available cluster deployment templates that the Service Provider has made available to us we can use the vcd cse template list command:


In this example only the Photon OS template is available and is also the default template. CSE actually comes with 2 templates (Photon OS v2 and Ubuntu Linux 16.04), but I’ve only installed the Photon OS v2 template in my lab environment. The default template will be used if you do not specify the ‘--template’ switch when creating a cluster.

The cluster create command takes a number of parameters which are documented in the CSE page:


Be careful with the memory specification as it is in MB and not GB.

I chose to generate a public/private key to access the cluster nodes without needing a password, but this is optional. If you want to use key authentication you will need to generate a key pair and specify the public key filename in the cluster creation command using the --ssh-key switch.

To deploy a cluster with 3 worker nodes into our VDC where each node has 4GB of RAM and 2 CPUs using my public key and the network and storage profile identified above:
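
The create command will look something like the following (the switch names can vary slightly between CSE versions, so check ‘vcd cse cluster create --help’):

C:\Users\jon>vcd cse cluster create myCluster --network Tyrell-Servers --storage-profile "CHC Performance" --nodes 3 --cpu 2 --memory 4096 --ssh-key id_rsa.pub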


The deployment process will take several minutes to complete as the cluster VMs are deployed and started.

In the vCloud Director portal, we can see the new vApp that has been deployed with our master and worker nodes inside it; we can also see that all 4 VMs are connected to the network we specified:


To see the details of the nodes deployed we can use ‘vcd cse node list <cluster name>’:


To manage the cluster with kubectl, we need a configuration file for Kubernetes containing our authentication certificates. kubectl by default looks for a file named ‘config’ in a folder called ‘.kube’ under the current user’s home directory. The config file itself can be downloaded using CSE. To create the folder and write the config file:
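
From a Windows client this looks something like the following, using the cluster name from earlier:

C:\Users\jon>mkdir %HOMEPATH%\.kube
C:\Users\jon>vcd cse cluster config myCluster > %HOMEPATH%\.kube\config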


If you have multiple deployed clusters you can create separate config files for each one (with different file names) and use the --kubeconfig= switch to kubectl to select which one to use.

To test kubectl we can ask for a list of all containers (‘pods’ in Kubernetes) from the cluster; the ‘--all-namespaces’ switch shows system pods as well as any user created pods (which we don’t have yet). This must be run from a machine that has network connectivity with the deployed nodes (the ‘Tyrell-Servers’ network in this example):


 

Cluster Scaling

Adding Nodes to Clusters

If we need to add worker nodes to a cluster this is accomplished with the ‘vcd cse node create’ command. For example, we can add a 4th worker node to our ‘myCluster’ cluster as follows:
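
For example, to add one node to ‘myCluster’ (again, the exact switches may differ slightly between CSE versions):

C:\Users\jon>vcd cse node create myCluster --nodes 1 --network Tyrell-Servers --storage-profile "CHC Performance" --ssh-key id_rsa.pub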


The node list now shows our cluster with 4 worker nodes including our new one:


Removing Nodes from Clusters

To remove a cluster member is just as easy using the ‘vcd cse node delete’ command:
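
For example (substitute the node name reported by ‘vcd cse node list myCluster’):

C:\Users\jon>vcd cse node delete myCluster <node name>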


You will be prompted to confirm the node deletion, and if you have deployed container applications you should ensure that the node is properly drained and/or replica sets and deployments configured correctly so that the node deletion will not impact your applications.

 

Cluster Host Affinity

One item that CSE does not deal with yet is creating vCloud Anti-Affinity rules to ensure that your worker nodes are spread across different physical hosts. This means that with appropriately configured applications a host failure will not impact on the availability of your deployed services. It is reasonably straightforward to add anti-affinity rules in the vCloud portal though.

Our test cluster is back to 3 nodes following the deletion example:


In the vCloud portal we can go to ‘Administration’ and select our virtual datacenter in the left pane, we will then see an ‘Affinity Rules’ tab:


Clicking the ‘+’ icon under Anti-Affinity Rules allows us to create a new rule to keep our worker nodes on separate hosts:


Provided the VDC has sufficient backing physical hosts, the screen will update to show the new rule and that it has successfully been applied and separated the worker nodes to different hosts:


Of course if the host running the master node experiences a failure then this will be unavailable until the VMware platform restarts the VM on another host.

 

Application Deployment using kubectl

Of course now that our cluster is up and running, it would be nice to actually deploy a workload to it. The ‘sock shop’ example mentioned in the CSE documentation is a good example application to try as it consists of several pods running in a separate namespace.

First we use kubectl to create the namespace:
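
The namespace is created with a single kubectl command:

C:\Users\jon>kubectl create namespace sock-shop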


Now we can deploy the application into our name space from the microservices-demo project on github. You can read more about the sock-shop demo app at https://github.com/microservices-demo/microservices-demo.

C:\Users\jon>kubectl apply -n sock-shop -f "https://github.com/microservices-demo/microservices-demo/blob/master/deploy/kubernetes/complete-demo.yaml?raw=true"
deployment "carts-db" created
service "carts-db" created
deployment "carts" created
service "carts" created
deployment "catalogue-db" created
service "catalogue-db" created
deployment "catalogue" created
service "catalogue" created
deployment "front-end" created
service "front-end" created
deployment "orders-db" created
service "orders-db" created
deployment "orders" created
service "orders" created
deployment "payment" created
service "payment" created
deployment "queue-master" created
service "queue-master" created
deployment "rabbitmq" created
service "rabbitmq" created
deployment "shipping" created
service "shipping" created
deployment "user-db" created
service "user-db" created
deployment "user" created
service "user" created

We can see deployment status by getting the pod status in our namespace:
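
For example, the pod status in the new namespace can be checked with:

C:\Users\jon>kubectl get pods -n sock-shop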


After a short while all the pods should have been created and show a status of ‘Running’:


The ‘sock-shop’ demo creates a service which listens on port 30001 on all nodes (including the master node) for http traffic, so we can get our master node IP address from ‘vcd cse node list myCluster’ and open this page in a browser:


And here’s our deployed application running!


Summary / Further Reading

Of course there’s much more that can be done with Docker and Kubernetes, but hopefully I’ve been able to demonstrate how easily a cluster can be deployed using CSE and how micro-services applications can be run in this platform.

For further reading on kubectl and all the available functionality I can recommend the Kubernetes kubectl documentation at https://kubernetes.io/docs/reference/kubectl/overview/. In fact the entire Kubernetes site is well worth a read for those considering deployment of these architectures.

As always, comments, feedback, suggestions and corrections always welcome.

Jon.

vCloud Director Extender – Part 5 – Stretch Networking (L2VPN)

In this 5th part of my look into vCloud Director Extender (CX), I deal with the extension of a customer vCenter network into a cloud provider network using the L2VPN network extension functionality. Apologies that this post has been a bit delayed, turned out that I needed a VMware support request and a code update to vCloud Director 9.0.0.1 before I could get this functionality working. (I also had an issue with my lab environment which runs as a nested platform inside a vCloud Director environment and it turned out that the networking environment I had wasn’t quite flexible enough to get this working).

Update: an earlier version of this article didn’t include the steps to configure the L2 appliance settings in the vCloud Director Extender web interface – I’ve now added these to provide a more complete guide.

Links to the other parts of this series:
Part 1 – Overview
Part 2 – Cloud Provider / Service Provider installation and configuration (MyCloud)
Part 3 – Customer / Tenant installation and configuration (Tyrell)
Part 4 – Customer / Tenant connecting to a Cloud Provider and Virtual Machine migration (Tyrell)

I won’t deal with the use-case here that the customer already has NSX networking installed and configured, since in most cases you can simply create L2VPN networks directly between the customer and provider NSX Edge appliances and don’t really need to use the CX L2VPN functionality.

In order to be able to use the standalone L2VPN connectivity, the following pre-requisites are required:

  • A tenant vSphere environment with the vCloud Director Extender appliance deployed (it does not appear to be necessary to deploy the replication appliance if you only wish to use the L2VPN functionality, but obviously if you are intending to migrate VMs too you will need this deployed and configured as described in Part 3 of this series). In either case you will still need to register the cloud provider in the CX interface.
  • A configured vCloud Director VDC for the tenant to connect to. This environment must also have an Advanced Edge Gateway deployed with at least one uplink having a publicly accessible (internet) IP address. Note that you do not need to configure the L2VPN service on this gateway – the CX wizard completes this for you.
  • At least one OrgVDC network created as a subinterface on this edge gateway. The steps to create a suitable new OrgVDC network are detailed below.
  • Outbound internet connectivity to allow the standalone edge deployed in the tenant vCenter to communicate with the cloud-hosted edge gateway – only port 443/tcp is required for this.
  • Administrative credentials to connect to both the tenant vCenter and the cloud tenancy/VDC (Organization Administrator role is required).

Opening the tenant vCenter environment and selecting the ‘Home’ page shows the following:

Selecting the vCloud Director Extender icon opens the CX interface:

If you have not yet configured the L2 appliance settings, selecting the ‘DC Extensions’ tab will show the following error:

To fix this, open the vCloud Director Extender web interface in a browser at https://<ip address of deployed cx appliance>/, log in, and select the ‘DC Extensions’ tab:

Select the ‘Add Appliance Configuration’ option and complete the form to provide the deployment parameters where the standalone NSX edge appliance will be deployed:

The ‘Uplink Network Pool IP’ setting is a bit strange – it appears to be asking for a network pool or IP range, but the ‘help text’ in the field asks for a single IP address. I also found that the validation on this field is odd: it will accept almost any input (even random strings) without complaining, but deployment obviously won’t work with invalid values. What you actually need to do is add individual IPv4 addresses and click the ‘Add’ button for each; you will need one address for each stretched network you will be extending to your cloud platform. In this example I am only extending a single network so have added a single IPv4 address (192.168.0.201).

Once you click the ‘Create’ button you will be returned to the ‘DC Extensions’ tab and shown a summary of the L2 appliance configuration:

Note that there doesn’t appear to be any way to edit an existing L2 Appliance configuration, so if you need to change settings (e.g. to add additional uplink IP pool addresses) you will likely need to delete and recreate the entire entry.

Next we need to add a new ‘subinterface’ network to our hosted Edge gateway appliance. Logging in to our cloud provider portal, we select the ‘Administration’ tab and the ‘Org VDC Networks’ sub-option; clicking the ‘Add’ button shows the dialog to create a new Org VDC Network. We need to select ‘Create a routed network by connecting to an existing edge gateway’ and then check the ‘Create as subinterface’ check box:

Next we configure the standard network information (Gateway, Network mask, DNS etc.). Since this network will be bridged to our on-premises network we can use the same details. Optionally, a new Static IP pool can also be created so that new VMs provisioned in the cloud service can use this pool for their IP addresses; this won’t be an issue for VMs being migrated as they will carry across whatever IP addresses are already assigned to them. Note that the gateway address is set to be the same address as the existing (on-premises) gateway, which means that re-configuring the default gateway setting in the guest OS isn’t required either:

Now we supply a name for the new Org VDC network and optionally a description. The check box can also be used if the customer has multiple VDCs and wishes to share the new network across them:

Finally the summary screen allows us to check the information provided and go back and make any changes required if not correct. The most important setting is to make sure the network is attached to the edge gateway as a subinterface:

Once finished creating, the Org VDC network will be shown in the list with a type of ‘Routed’ and an interface type of ‘Subinterface’:

Next we access the vCloud Extender interface from within the customer vCenter plugin, selecting the ‘DC Extensions’ tab takes us to the following dialog:

Selecting ‘New Extension’ shows the dialog to create a new L2 extension; the fields are mostly populated for you. The ‘Enable egress’ option allows you to select which gateway(s) will be allowed to forward traffic outside of the extended network. In this example I’ve only configured egress on the Source (on-premises) side through the existing gateway:

When you click ‘Start’, the status will go to ‘Connecting’ and a number of activities will take place in the customer vCenter:

Reading from the bottom (oldest) upwards: a new port group is created, an NSX Edge Standalone appliance is deployed and powered on, and the new port group is reconfigured once this has completed (ignore the VM migration task, which just happened to occur during the same time window in my lab). In this case the new NSX standalone edge was named ‘mcloudext-edge-4’ and the port group ‘mcxt-tpg-l2vpn-vlan-Tyrell-VDC15’.

Once deployment has completed (takes a few minutes) the vCloud Extender client interface shows the new DC extension network with a status of ‘Connected’:

In the tenant vCloud Director portal you can also see the status of the tunnel under ‘Statistics’ and ‘L2 VPN’ within the edge gateway interface:

You will now find that any VMs connected to the stretched network (OrgVDC network) in your cloud environment have L2 connectivity with the on-premises network and will continue to function as if they were still located in the customer’s own datacenter.

As I mentioned at the start of this post, I hit a number of issues when configuring this environment, and getting it working took several attempts and a couple of rebuilds of my lab. The main issue was that the initial release of vCloud Director v9.0.0.0 prevents the details required to deploy the standalone NSX edge from being returned by the API. This blocks the deployment of the customer edge entirely and resulted in my VMware support call. The specific issue is referenced in the vCloud Director 9.0.0.1 release notes as ‘Resolves an issue where the vCloud Director API does not return a tunnelID parameter in response to a GET /vdcnetworks request sent against a routed Organization VCD network that has a subinterface enabled.’ As far as I can work out, it is impossible to successfully use L2VPN in CX without upgrading the provider to vCloud Director 9.0.0.1 to resolve this issue.

The other issue I hit in my lab was that my hosted ‘Tenant Edge’ was NAT’d behind another NSX Edge gateway which was also performing NAT translation (Double-NAT). This was due to the way my lab is built in a nested environment inside vCloud Director. Unfortunately this meant the external interface of my hosted ‘Tenant Edge’ was actually an internal network address, so when the customer/on-premise edge tried to establish contact it was using an internal network address which obviously wasn’t going to work. I solved this by connecting a ‘real’ external internet network to my hosted Tenant Edge.

As always, comments and feedback always appreciated.

Jon.

vCloud Director Extender – Part 4 – Connect to Provider & VM Migration

In the first 3 parts of this series I covered an overview of vCloud Director Extender (CX), the installation and configuration of CX at the Cloud Provider site and the installation and configuration of CX at the customer/tenant site. In this 4th part I will be covering the configuration of the tenant environment to connect to the provider cloud and then migrate VM workloads to the provider.

This part follows on from the configuration completed in part 3 of this series and assumes that Tyrell (the customer site) have an existing virtual datacenter (VDC) environment available from MyCloud (the provider) and an appropriate Organization Administrator login to this environment. I’ve also created local DNS entries in the Tyrell network for the ‘chc.mycloud.local’ and ‘vcde.mycloud.local’ DNS names which resolve to the public IP addresses for the MyCloud vCloud Director instance and the provider CX endpoint respectively. Obviously in the real world these would be registered Internet DNS names.
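If you just want a quick lab substitute for proper DNS records, hosts-file entries on the machines that need to resolve the provider names achieve the same result; the addresses below are placeholders for the provider’s real public IPs (on Windows the equivalent file is C:\Windows\System32\drivers\etc\hosts):

# lab-only substitute for registered DNS names (placeholder addresses shown)
echo "203.0.113.10  chc.mycloud.local"  >> /etc/hosts
echo "203.0.113.11  vcde.mycloud.local" >> /etc/hosts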

In the Tyrell vCenter server when we select the ‘vCloud Director Extender’ icon we are shown an initial view of the CX plugin interface:

Selecting the ‘New Provider Cloud’ button opens a wizard to configure the connection to the Cloud Provider endpoints:

The ‘Provider Cloud URL’ needs to include the appropriate path for the vCloud Director Organisation being connected to (the /cloud/org/Tyrell part in this example). The user specified must hold the Organization Administrator role within this cloud organisation.
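Using the DNS names from my lab, the full URL would therefore look something like the line below; your provider will be able to confirm the organisation-specific path for your own tenancy:

https://chc.mycloud.local/cloud/org/Tyrell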

When clicking ‘Add’ you will be presented with a certificate warning if the cloud provider is not using trusted/signed certificates; you can optionally select to trust these certificates if this is the case (very handy for a lab environment).

You can use the ‘Test’ button to confirm the settings are valid – you will see a status update at the bottom of the dialog showing the status of this test:

Note that even if the ‘Test’ succeeds, there are still some circumstances to do with network connectivity that can result in the enablement process failing – this is shown in the following capture from the ‘Provider Clouds’ tab where you can see the ‘Status’ shows ‘Enable Failed’:

This is usually caused by incorrect firewall rules, NAT rules or Public Endpoint URLs set incorrectly when the CX appliances were deployed. I’m intending to cover this in a future ‘Troubleshooting’ part to this series of posts.

Once the networking and URLs are configured correctly you will see the new provider cloud registered under the ‘Provider Clouds’ tab with a status of ‘Running’, and you will also see any virtual datacenters (vDCs) to which you have access:

Now that our provider cloud is properly registered, we can submit a migration request using the ‘Migrations’ tab in the CX interface. First we are asked whether we wish to perform a ‘Cold’ or ‘Warm’ migration; the differences between these are well explained in the dialog. Note that a ‘Warm’ migration is not a vMotion, but does involve a period of network disconnection as the VM is cut over to the Cloud Provider. For this example we’ll select a ‘Warm’ migration:

Clicking ‘Next’ takes us to an inventory view where we can select the source VM(s) to be migrated. The grey panel below the ‘Inventory Browser’ dynamically expands to show candidate VMs from the vCenter environment. When a VM is selected, the status and disk sizes are updated in the right-side panel. For this example we’ve selected the ‘deckard’ VM:

Clicking ‘Next’ takes us on to the Target selection – here we can select the Cloud Provider, vDC, VM storage profile for the remote copy and the network to be connected to the VM in the Cloud Provider. Note that we are not L2-extending our on-premises network in this example and are relying on our Cloud Provider (MyCloud) having already defined an Org vDC network for us (in this case called ‘Tyrell Servers’). All of the values are populated automatically from the vCloud Director environment and drop-downs allow easy selection of other options. Finally, when migrating multiple VMs together, we have the option to group these into a single vApp rather than creating a new vApp for each VM:

In the final migration configuration step we can specify when the VM synchronisation should start, what our target Recovery Point Objective (RPO) is in minutes and whether to provision the destination disks as ‘Thin’ provisioned or ‘Thick’ provisioned. Finally we can add an optional tag to reference against this job later:

If everything has worked, you’ll now see a progress indicator against the VM in the Migrations tab. Initially the status will be ‘Created’:

Once data synchronisation begins this status will be updated to show the synchronised percentage for the migration. If you get an ‘Error’ prior to the sync percentage moving from 0% this is almost certainly a network configuration issue (and one which I encountered frequently when first building my lab environment). I’ll cover the common causes and remedies for this more in my vCloud Extender Troubleshooting post.

Once the initial synchronisation process has completed you will see the VM listed as ‘Cutover ready’ which means it’s staged and ready to be migrated:

Logging in to the Tyrell vCloud Director portal at this point shows that nothing actually has been provisioned into the Tyrell VDC:

Looking at the ‘Home’ page for the CX environment in vCenter shows our VM as in a ‘Transition’ state:

In the Migrations tab we can now select the ‘Start Cutover’ button to actually cutover the VM to the Cloud Provider environment which opens the Cutover dialog:

Clicking ‘Start’ asks for confirmation and then performs the actual cutover to running the VM in the Cloud Provider datacenter, progress is updated during the cutover procedure:

When the cutover process is complete you will see the Status update:

Looking in vCenter at this point shows the original VM still in place, but now powered off. You should probably take steps to ensure that this VM cannot be accidentally started, or you risk having two running instances of the same VM (potentially on the same network if your network is extended to the Cloud Provider):

Refreshing the Tyrell vCloud Director portal shows the migrated VM now running in the Tyrell Cloud Provider VDC:

The status in the vCloud Extender vCenter plugin also now shows the completed migration total:

In the next part of this series of articles I look at the options to extend L2 networking directly from a customer site into vCloud Director using CX and the changes this introduces into the migration workflow.

Link back to Part 3 || Link to Part 5

As always, corrections, comments and feedback are always appreciated.

Jon.

vCloud Director Extender – Part 3 – Tenant Setup

In part 1 and part 2 of this series I detailed an overview of VMware vCloud Director Extender (CX) and the configuration from a Service Provider perspective to configure their platform to support CX.

This third article in the series details the configuration steps required for a tenant/customer environment to deploy and configure CX into their environment.

Once a service provider configuration is complete, any customers of that provider with sufficient allocated resources in a Virtual Datacenter (VDC) can configure the tenant CX environment and connect this to their vCenter environment. Once complete they will be able to migrate and replicate vSphere VMs between their own vCenter and the service provider datacenter extremely easily. Optionally they can use L2VPN functionality to stretch their networks into the Cloud Provider’s datacenter removing the requirement to have a pre-configured network in place. Of course many customers will wish to move to dedicated networking later, but having the initial ability to quickly provision their networks into a Cloud provider can dramatically shorten migration timeframes.

The initial deployment steps for customers deploying CX are exactly the same as for a Service Provider – download (or have provided to them by their Cloud Provider) the ova appliance for vCloud Director Extender and deploy this into their vCenter environment.

Right-clicking on the desired location and selecting ‘Deploy OVF Template…’ allows the local CX .ova file to be selected

The appliance name and folder are selected next:

Followed by the vCenter Cluster which will run the deployed appliance:

Check the template details and then click ‘Next’ to continue:

Read and accept the VMware license agreement:

Next select the Datastore storage on which the appliance will be deployed:

Select the required network for the appliance:

Make sure that ‘cx-connector’ (default) is selected for the ‘Deployment Type’ and fill out the IP addressing information for the appliance:

Check the summary information carefully and click ‘Finish’ to begin the deployment operation:

Once the appliance deployment task has completed, power on the deployed VM in vCenter and wait for it to initialise. When it is running you can open a web browser to the IP address you configured for the appliance and log in using the password configured. Note that you have to add ‘/ui/mgmt’ to the login URL for the appliance, so the full URL will be ‘https://<IP address of appliance>/ui/mgmt’:

The initial CX dialog when logged in allows you to start the Setup Wizard, note that in contrast to the Service Provider UI, there is no ‘Replication Managers’ tab in the cx-connector configuration:

The first step of the wizard is to link to the existing on-premise vCenter environment, note that if you are using an external Platform Services Controller (PSC) you will need to specify the PSC URL for the Lookup Service URL (although this is optional). The user specified needs to have administrative permissions within the vCenter environment:

Once the vCenter details and credentials are accepted, CX will provide a success notification, click ‘Next’ to continue:

The next page asks you to register the CX plugin with vCenter, this will likely become important in future as CX is updated, but for now leave the Version as 1.0.0 and click ‘Next’:

Once the plugin has registered into vCenter you will see a success notification. In testing I found that if the CX plugin had previously been registered with the vCenter (and not manually removed), this step would generate an error notification, but it was still possible to continue with the wizard and everything appeared to function fine afterwards:

Next you need to provide the configuration for the ‘Replicator’ appliance that will be deployed into the on-premise vCenter. The VMware documentation advises not to use DHCP for this and to manually specify a static IP configuration:

The ‘Replicator’ appliance is now deployed into vCenter and powered on. Once it has established network communication with the CX environment you will see a success notification:

The next step is to activate the Replicator appliance by providing a root password and authentication details for the on-premise vCenter environment. Note that you will need to set the Public Endpoint URL correctly in order for the appliance to be reachable by your cloud provider. If the on-premise Replicator appliance is behind a corporate firewall (as most will be), you will need to configure inbound firewall and translation rules and make sure this field is set correctly.

In my lab setup I configured the replicator public URL to be on port 443 on the public (Internet) address of the outside of the Tyrell firewall and used NAT port translation (see the networking configuration information below).

If everything is accepted you’ll receive a success notification in the wizard (note that I blanked the Public Endpoint URL field in this capture which is why it doesn’t show in the grab below):

The wizard is now complete; click ‘Finish’ to return to the main CX interface:

The ‘vCenter Management’ tab should now show the on-premise vCenter details

The ‘Replicators’ tab should show the details for the replicator appliance deployed in the wizard:

Once vCenter has been closed and restarted you should now see a new ‘vCloud Director Extender’ item in the UI:

The networking configuration for a customer environment is a little simpler than for the cloud provider side, you will need to permit 2 inbound ports through the firewall, both of which need to communicate directly with the ‘Replicator’ appliance.

Assuming that you configured the ‘Public Endpoint URL’ with port 443, you will need to use NAT translation to divert this to port 8043 on the appliance:

Source Address | Destination | Destination Port/Protocol | Translated Port/Protocol | Translated Internal Address
External (Internet) | Public IP Address | 443/tcp | 8043/tcp | Replicator appliance internal address
External (Internet) | Public IP Address | 44045/tcp | 44045/tcp | Replicator appliance internal address
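Purely as an illustration of the translation involved: if the tenant perimeter happened to be a Linux router rather than an NSX Edge or other firewall appliance, the equivalent DNAT rules would look roughly like this, where 203.0.113.5 and 192.168.0.10 are placeholder public and internal addresses:

# translate the public endpoint port 443 to the Replicator management port 8043
iptables -t nat -A PREROUTING -d 203.0.113.5 -p tcp --dport 443 -j DNAT --to-destination 192.168.0.10:8043
# pass the replication data port 44045 straight through to the Replicator
iptables -t nat -A PREROUTING -d 203.0.113.5 -p tcp --dport 44045 -j DNAT --to-destination 192.168.0.10:44045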

You can (and should) limit the public/external addresses permitted to communicate with your Replicator appliance to just those public IP addresses used by your Cloud Provider – they should be able to provide you with this information.

Also note that if you restrict outbound internet traffic from your CX network you will also need to permit the following traffic in an Outbound direction:

Source | Destination | Source Port/Protocol | Destination Port/Protocol | Description
CX Server Network | Cloud Provider Public CX Address | Any | 443/tcp | Required for communications with the provider CX appliance
CX Server Network | Cloud Provider Public CX Address | Any | 8044/tcp | Required for communications with the provider Replication Manager appliance
CX Server Network | Cloud Provider Public CX Address | Any | 44045/tcp | Required for communications with the provider Replicator appliance

Of course if your provider has configured different ports for these components you will need to allow access to these instead of the defaults listed.

In the next part of this series I’ll continue with configuring the customer environment to connect to a cloud provider CX environment and to migrate some VMs.

Link back to Part 2 || Link to Part 4

As always, corrections, comments and feedback are always appreciated.

Jon.

vCloud Director Extender – Part 2 – Cloud Provider Setup

In the first part of this series of articles I described the new vCloud Director Extender (CX) software released by VMware. In this article I will show the steps required to install and configure the software from a Cloud Provider perspective. Included in this will be the necessary network and firewall configuration required.

vCloud Director Extender is supplied as a single .ova appliance from the VMware download site (login required). The download is located in the ‘Drivers & Tools’ section of the vCloud Director for Service Providers v9.0 page:

The ova file will generate the 3 different server components required to create a functional deployment:

CX Cloud Service: The main vCloud Director Extender appliance; this is used to provide the UI for setup/configuration. This is the appliance initially deployed from the vCloud Director Extender appliance download package.
Cloud Continuity Manager (CCM): This component (also known as the ‘Replication Manager’) is the operational manager of the deployment. CCM only runs in provider deployments and manages the replicator (CCE) appliances. CCM appliances are deployed and managed by the CX appliance (no additional download is required).
Cloud Continuity Engine (CCE): This component (also known as the ‘Replicator’) is the transfer engine that deals with data transfers between the customer and provider environments. CCE runs in both the provider and client environments. CCE appliances are deployed and managed by the CX appliance (no additional download is required).

The downloaded CX appliance is deployed from vCenter; the first selection allows you to specify the VM name and the datacenter/folder location to deploy to. For most service providers this would likely be in the management cluster for their environment (as opposed to the resource vCenters used for customer workloads).

Next you select which cluster/resource pool to deploy the CX appliance into:

A Review screen is presented which allows you to confirm the ova details:

And of course we have to read/accept the license agreement:

Next we select the datastore location for deployment:

And the internal network which the appliance will be connected to:

Make sure in the ‘Customize template’ screen (below) you change the ‘Deployment Type’ to ‘cx-cloud-service’ and don’t leave the default selection (cx-connector) selected as this will install the customer/tenant environment instead of the service provider configuration! The rest of the configuration options on this page are straightforward:

A summary screen is displayed showing a summary of the customization options selected, check these carefully as if they are wrong you’ll probably have to re-deploy from scratch:

Once the appliance is deployed, you will need to manually power it on from the vSphere client (or I did anyway – not sure if this is by design or not). Once it has booted and configured itself it will show the browser link to use to begin the environment configuration:

Note that if you open a page to just the hostname/IP address you’ll get an error; you must include the ‘/ui/mgmt’ suffix in the URL. You can now log in with the ‘initial root login’ password you configured during the ova deployment. As you can see from the screen grab below, I pre-configured DNS entries for the 3 provider components and used these wherever possible to avoid IP address confusion:

The main screen opens to the Setup Wizard, the tabs at the top of the screen allow you to easily navigate between sections, but these won’t show much until you complete the wizard:

Clicking on the ‘Setup Wizard’ opens a series of dialogs to provide the initial system configuration, first we have to specify the management vCenter authentication details. Note that the ‘Lookup Service URL’ as well as being optional also requires the path to the Platform Services Controller (PSC) if you are using external PSCs. The full path is truncated in this grab but should be https://<psc or vcenter with embedded psc address>/lookupservice/sdk:

The wizard includes very useful feedback at each step to show you if the previous actions have been successful or not, just click ‘Next’ through if everything is ok, or go back and fix the issue if not:

Now we need to provide a ‘system’ (administrator) level login to vCloud Director, you don’t need to specify the @system part of the user name here:

Again we get confirmation that we’ve successfully linked to vCloud Director and can continue with ‘Next’:

Next we can add the resource vCenters (where customer workloads actually run). In my lab environment this is the same vCenter that supports the management environment so the details are the same, but in production environments this will almost certainly be different. The setup wizard is intelligent enough to retrieve the names of any vCenter servers being used in Provider VDCs (pVDCs) in vCloud Director so for these you only need to ‘Update’.

When you click update you’ll be asked to provide administrator credentials to the resource vCenter environment. Be careful here as the default ‘Lookup Service URL’ will be set to the vCenter name, even if the vCenter is using an external Platform Services Controller (PSC) as mine was and will need to be manually edited to point to the PSC. This caught me out initially and I couldn’t work out why authentication to the resource vCenter was failing.

Once the resource vCenter(s) are authenticated they’ll show as ‘Registered’ in the wizard:

Next we need to configure the 2nd appliance – this will be the ‘Replication Manager’ (also called the Cloud Continuity Manager / CCM in the documentation). We need to specify the parameters shown (the dialog scrolls down and also asks for the default gateway address, DNS server address and netmask).

The wizard will now deploy and start up the replication manager appliance on the vCenter specified. If the networking information is incorrect the process will stall at this point as the wizard relies on establishing network connectivity with the replication manager before continuing. A status update is given at the top of the dialog as the appliance is deployed and started up. Once the replication manager appliance is running and seen on the network you’ll see the success message:

Next the replication manager appliance must be ‘activated’ by setting the password for the root user and the ‘Public Endpoint URL’. Make sure you set this to the correct external (public) IP address that your customers will be using to connect to your CX environment. I haven’t found any way to alter this setting after deployment if it is specified incorrectly, short of deleting the entire CX environment and starting over (the xx’s in this grab are simply to hide the real internet addressing I was using – I’m also pretty sure I eventually used the default port of 8044 for this public URL):

If everything has gone ok, you’ll get the screen below showing that the replication manager deployment has succeeded and you can move on to the replicator configuration:

The deployment details for the Replicator are specified next – the wizard helpfully copies across some of the settings from the Replication Manager deployment, but you still need to specify the (unique) IP and Netmask details:

The Replicator appliance will now be deployed in vCenter in exactly the same way as the Replication Manager was previously. Once it becomes available on the network the wizard will detect this and show the screen below:

Next we have to ‘Activate’ the Replicator appliance by completing the settings shown below to authenticate to the resource vCenter which this Replicator will be responsible for.

If everything worked ok you’ll get a ‘Successfully Activated’ message:

Clicking ‘Next’ takes you to the ‘Complete’ screen and shows that if you have additional Resource vCenters you’ll need to deploy additional Replicator appliances for these (1 per vCenter):

Clicking through the tabs in the management UI should now show that all the required CX components are deployed and registered. The ‘Cloud Resources’ tab shows linked vCloud Director instances and resource vCenters:

The ‘Replication Manager’ tab shows the deployed Replication Manager appliance:

The ‘Replicators’ tab shows the deployed Replicator appliance(s) – 1 per resource vCenter if you have multiples of these.

That completes the appliance installation and initial configuration, next you will need to configure appropriate NAT/firewall rules so that customers on the internet can connect to your new CX service!

Assuming that you wish to use a single external (public) Internet IP address for the entire CX service, the configuration is a little tricky since traffic will need to be directed to the CX, Replication Manager or Replicator appliance depending on which port it is attempting to access. The NAT/firewall rules that I worked out from the documentation and found to work are:

Source Address | Destination | Destination Port/Protocol | Translated Port/Protocol | Translated Internal Address
External (Internet) | CX Service Public IP Address | 443/tcp | 443/tcp | CX (vCD Extender) appliance internal address
External (Internet) | CX Service Public IP Address | 8044/tcp | 8044/tcp | Replication Manager appliance internal address
External (Internet) | CX Service Public IP Address | 44045/tcp | 44045/tcp | Replicator appliance internal address
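A quick way to sanity-check the inbound rules once they are in place is to test each published port from a machine out on the internet, for example with netcat; cx.mycloud.example below is simply a placeholder for your real public CX address:

# each port should report as open if the NAT rules are directing traffic correctly
nc -zv cx.mycloud.example 443
nc -zv cx.mycloud.example 8044
nc -zv cx.mycloud.example 44045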

Also note that if you restrict outbound internet traffic from your CX network you will also need to permit the following traffic in an Outbound direction:

Source | Destination | Source Port/Protocol | Destination Port/Protocol | Description
CX Server Network | External (Internet) | Any | 443/tcp | Required for CX to be able to communicate with customer Replicator management interface
CX Server Network | External (Internet) | Any | 44045/tcp | Required for CX to be able to communicate with customer Replicator data interface

In the next part of this series of articles I’ll continue with the installation and configuration of the CX components required on the customer / tenant site.

Link back to Part 1 || Link to Part 3

As always, corrections, comments and feedback are always appreciated.

Jon.

vCloud Director Extender – Part 1 Overview

Last week VMware released version 1.0 of the new vCloud Director Extender (CX) (link to documentation set). This provides some extremely flexible options for customers to migrate servers to/from a vCloud service provider cloud platform, including the use of L2VPN to transparently stretch their on-premise networks to the cloud provider. Together with a ‘warm’ cutover feature, this enables any customer with an appropriately configured vCloud tenancy and resources to safely and easily move their virtual servers to the most suitable hosting location with minimal application downtime.

As always, there are a few pre-requisites:

– The customer site must be running vSphere 6 Update 3 or later (6.5.0 and 6.5 Update 1 are also both supported).
– If the customer wishes to use L2VPN network extension and is already running VMware NSX, this must be v6.2.8 or v6.3.2.
– The cloud provider must be running vCloud Director v8.20 or v9.0.

Deployment of the replication environment is different for the Cloud Provider and tenant (as you would expect) and firewall rules and address translation need to be appropriately configured to permit the required traffic flows at both the provider and customer end.

This series of articles will detail the installation and configuration of vCloud Director Extender and is intended to be useful for both Cloud Providers needing to configure their own environments to support CX and for customers wishing to configure their environments to allow migration to/from a CX-enabled provider.

The environment that I will be describing and building through this series is shown in the graphic below, Tyrell Corporation is the client organisation and MyCloud is the Cloud Provider which Tyrell wish to use to host 3 of their production VMs (‘Deckard’, ‘Rachael’ and ‘Roy’). In this example Tyrell and MyCloud happen to use different internal IP network ranges, but that is not a requirement to use CX since NAT firewalls are in place at both organisations.

Since I built this environment using ‘real’ public Internet addresses and VMware NSX edge gateways as the firewalls for both Tyrell and MyCloud, I have stripped the public IP addresses from the configurations shown in these articles, but it should be easy to see where these are substituted.

I’m expecting this series to consist of 6 parts eventually including this introduction:

Part 1 – This overview
Part 2 – Cloud Provider / Service Provider installation and configuration (MyCloud)
Part 3 – Customer / Tenant installation and configuration (Tyrell)
Part 4 – Customer / Tenant connecting to a Cloud Provider and Virtual Machine migration (Tyrell)
Part 5 – Stretched networking (L2VPN) configurations
Part 6 – Troubleshooting

I’m still working on the later parts of this series so check back if I haven’t published all of them yet.

Link to Part 2

As always, corrections, comments and feedback are always appreciated.

Jon.