Tenant Portal Displays ‘No Datacenters are available’ in vCloud Director 9.1

We had an issue recently when updating our vCloud Director environment to v9.1 where the new tenant portal would show ‘No Datacenters are available’ for every tenant even though the remainder of the site worked correctly (and other tabbed options like the Service Library & catalogs worked fine). Initially we suspected that our SSL certificate chain or public URI’s were set incorrectly.

Adrian Begg has a great blog post here: http://www.pigeonnuggets.com/2018/03/vcloud-director-9-1-tenant-portal-displays-no-datacenters-available-after-upgrade/ which details this issue and how to ensure the correct settings are applied, however in our case this didn’t resolve our issue.

Eventually an offhand remark in a slack channel by Tom Fojta put me on the right track to solving the issue, I’ve written this post up in case anyone else comes across the same issue. If you’re impatient and want to know the solution – it’s DNS (isn’t it always DNS?), but that’s jumping ahead a bit.

In our environment we have 3 vCloud Director cell servers behind a load balancer, we also load-balance internally so that our management environment can talk to the vCD API and we can conduct testing of the environment without necessarily having it open to the public internet. The arrangement looks logically like this:

 

vCloud Director Load Balancing

Users from the internet accessing ‘portal.cloud.com’ get redirected to one of the vCD cell servers (and if one of them is unavailable the monitoring on the Load Balancer doesn’t direct requests there). The same happens for internal users, but in this case the ‘portal.cloud.com’ DNS entry has been overridden to point at the internal (192.168.0.10) address to allow connectivity to the cells even if the external LB or internet link is unavailable.

The issue in our environment was that the cell servers themselves use DNS to access the vCloud API – and they use the public URL specified in the vCloud Director configuration.

The cell servers were configured with our internal DNS servers, so when they attempted to access the public URL (‘portal.cloud.com’) were being given the internal Load Balancer address (192.168.0.10). For reasons we’re still exploring, this didn’t allow them to get a response from the vCD API and resulted in the ‘No Datacenters are available’ error in the tenant portal.

The fix turned out to be reasonably simple – on each cell server we added an entry to the /etc/hosts file to resolve the public URL to the cell’s own IP address, so on cell 01:

192.168.0.11    portal.cloud.com

On cell02:

192.168.0.12   portal.cloud.com

And on cell03:

192.168.0.13    portal.cloud.com

Once we’d made this change the tenant portals began functioning correctly (note that no restart of the cell servers or vCloud Director services was required).

What I assume is happening is that when the internal load balancer responds the the request it gives out a different cell server’s address (since the ‘source’ of the request will be a cell server) and that cell server has no knowledge of the session being used by the original cell and so responds incorrectly (either with nothing, or with an error). Not sure if this is actually a bug, or just something to be aware of, but either way overriding name resolution in this way fixes the issue. Note that simply using ‘localhost’ or 127.0.0.1 for the hosts file entry doesn’t work since the vCloud web server doesn’t respond on the loopback interface in the default configuration.

Just posting this here in the hope it will save someone else any frustration caused by this issue.

Jon.

Installing Microsoft Azure Stack TP2 on VMware ESXi

This week Microsoft released Technical Preview 2 (TP2) of their ‘Cloud in a box’ Azure Stack product. This is scheduled for release in mid-2017 to allow enterprises and service providers to run Azure consistent services from their own datacenters.

TP2 has a number of additional features over TP1 released earlier this year, but doesn’t support installation as a virtual machine. The hardware requirements are detailed here but basically you’re going to need a reasonably good spec server with enough local hard disks to be able to install it.

As I had good success running the previous TP1 release of Azure Stack in a virtual machine I thought I’d see if the same could be done with TP2. As with TP1, installation is only supported for a single machine node (clustered multi-node deployments are likely to come with TP3).

Of course installing TP2 as a VM is completely unsupported by both Microsoft and VMware so please don’t bug them with any issues – since TP2 is definitely not for production use this shouldn’t be a huge concern.

After several failed attempts I finally worked out a method to allow installation of TP2 as a virtual machine using VMware ESXi 6.0 Update 2 as the hypervisor platform. The key is in building the host virtual machine correctly and in modifying a couple of places in the installation PowerShell scripts to bypass the checks for physical hardware.

To start, create a new virtual hardware v11 Windows VM with appropriate sizing (I used a 200GB system disk, 128GB of RAM and 12 CPU cores configured as a single socket / 12 cores arrangement).

I then made the following changes:

  • Add a new SCSI host bus adapter and set the ‘bus sharing’ for this adapter to ‘Physical’ – this is required to allow the Storage Spaces Direct (S2D) configuration in the Azure Stack installer to correctly configure clustered storage.
  • Make sure that ‘Expose hardware assisted virtualization to the guest OS’ option is enabled to allow the VM to run the Hyper-V role and nested VMs.
  • Add 4 new virtual hard disks of at least 150GB size each (I used 200GB for each disk again) and configure these as ‘Thick provisioned eager zeroed’ and make sure they are attached to the new (physical bus sharing) SCSI adapter.
  • Use a single VMXNET3 network adapter connected to a network that has a DHCP server available on it.
  • Change portgroup security for the network to which the VM is attached to allow ‘Promiscuous Mode’, ‘MAC address changes’ and ‘Forged Transmits’.
  • Set the VM to boot to BIOS on next power up and when it boots make sure to set the BIOS date/time to match your current timezone date/time.

Next power on and install a base operating system on the VM (I used Server 2012 R2, but it really doesn’t matter as this environment is only used to bootstrap the installation process).

Once the server is running, download  and unpack Azure Pack TP2 and move the extracted ‘CloudBuilder.vhdx’ file to the root of the C:\ drive. Following the Microsoft instructions to download and run the ‘PrepareBootFromVHD.ps1’ script which will reconfigure the VM to boot from the CloudBuilder.vhdx file and restart the VM.

Note: Depending on disk speed It can take a considerable time to extract CloudBuilder.vhdx from the TP2 archive – you might want to keep a copy of it elsewhere on your network (or on the VM disk if you have space) in case you need to restart the installation from scratch.

Once the VM is up and running from the TP2 CloudBuilder vhdx image, make the following changes:

  • Install VMware Tools (required to add the VMXNET3 network driver) and restart when prompted. – See note below, E1000 network adapter may be a better choice.
  • (Optional) Rename the computer and restart when prompted.
  • (Optional) Change the VM’s IP address to a static IPv4 address (rather than just using DHCP) so you can easily locate it on the network later – note that DHCP is still a requirement for the other VMs unless you use the Microsoft documented installer switches to allow use of a static addresses.
  • Make sure that the date/time and timezone are set correctly and match the VM BIOS setting (Can’t stress this enough, I had at least 3 failed installation attempts due to date/time problems).
  • Make the following changes to the C:\CloudDeployment\Roles\PhysicalMachines\Tests\BareMetal.Tests.ps1 file:

Line 376:
Change:
$physicalMachine.IsVirtualMachine | Should Be $false
To:
$false | Should Be $false

Line 453:
Change:
($physicalMachine.Processors.NumberOfEnabledCores | Measure-Object -Sum).Sum | Should Not BeLessThan $minimumNumberOfCoresPerMachine
To:
12 | Should Not BeLessThan $minimumNumberOfCoresPerMachine

Then save the file. (The second change should not be necessary if you’ve built the VM with at least 12 cores, but the installation script appears to detect the number of physical cores as ‘0’ in a VM so this is required).

You should now be able to run the CloudDeployment\Configuration\InstallAzureStackPOC.ps1 script and everything should work…..

The installation process will take a considerable time, but hopefully if you’ve configured everything correctly you’ll have a working Azure Stack TP2 installation at the end with all of the required infrastructure servers running as Hyper-V guests within the VM.

NOTE: I hit an issue with installation failing at step 60.61.93 and thought this was related to installing in a VM, but it appears this is a more general issue with TP2 installation – see this MSDN thread for possible solutions if you encounter this error. If you encounter any issues with the installation I also recommend following the troubleshooting advice here.

Best of luck trying this out for yourselves!

Update 7th Oct 2016: If you’re having issues with guest (nested Hyper-V) VMs crashing, try using the E1000 network adapter for the host instead of VMXNET3, I’ve been doing some testing with this and E1000 may be a better option and prevent this occurring.

Jon