Connecting VMware Cloud on AWS to Amazon EC2

This post demonstrates the connectivity between VMware Cloud (VMC) on AWS and native AWS services. In the example below we will be using Amazon Elastic Compute Cloud (EC2) to provision a virtual instance backed by Amazon Elastic Block Store (EBS) storage. To complete the use case we will install Veeam and use the EC2 instance to back up virtual machines hosted in the VMware Cloud Software-Defined Data Centre (SDDC).

Connectivity Overview

  • VMware Cloud on AWS links with your existing AWS account to provide access to native services. During provisioning a CloudFormation template will grant AWS permissions using the Identity and Access Management (IAM) service. This allows your VMC account to create and manage Elastic Network Interfaces (ENIs) as well as auto-populate Virtual Private Cloud (VPC) route tables.
  • An Elastic Network Interface (ENI) dedicated to each physical host connects the VMware Cloud to the corresponding Availability Zone in the native AWS VPC. There is no charge for data crossing the 25 Gbps ENI between the VMC VPC and the native AWS VPC. However, it is worth remembering that data crossing Availability Zones is charged at $0.01 per GB (at the time of writing).
  • The example architecture we will be using is shown below. For more information see VMware Cloud on AWS Migration Planning.

VMC_Connectivity
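
As a rough illustration of the cross-AZ charge noted above, the sketch below multiplies a data volume by the quoted rate. The function name and the 500 GB figure are hypothetical, and the rate is the one quoted at the time of writing, so check current AWS pricing before relying on it.

```python
from decimal import Decimal

# Rate quoted in the post at the time of writing; an assumption, not live pricing.
CROSS_AZ_RATE_PER_GB = Decimal("0.01")


def cross_az_cost(gigabytes):
    """Return the estimated USD charge for whole gigabytes crossing Availability Zones."""
    return Decimal(gigabytes) * CROSS_AZ_RATE_PER_GB


if __name__ == "__main__":
    # Hypothetical example: 500 GB of backup traffic sent to an instance in another AZ.
    print(cross_az_cost(500))
```

Traffic that stays within the same Availability Zone over the ENI is not charged, which is why the placement of the EC2 instance matters.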

Security Group Configuration

AWS Security Groups will be attached to your EC2 instances and ENIs, so it is vital that you fully understand the concepts and configuration you are implementing. Please review Understanding AWS Security Groups with VMware Cloud on AWS by Brian Graf.

In the AWS console, Security Groups can be accessed from the EC2 service. In this example I have created a security group allowing all protocols (any port) inbound from the source CIDR blocks used in VMC for both my compute and management subnets. In other words, this allows connectivity into the EC2 instance from VMs in my VMC SDDC. You may want to lock this down to specific IP addresses or ports to provide a more secure operating model. Outbound access from the EC2 instance is defined as any IPv4 destination (0.0.0.0/0) on any port.

Veeam_SG
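
To make the inbound rule concrete, here is a minimal sketch of the CIDR matching that a source-based rule performs. It only models the logic; it does not call any AWS API, and the CIDR blocks and function name are hypothetical stand-ins for your own VMC management and compute subnets.

```python
import ipaddress

# Hypothetical CIDR blocks standing in for the VMC management and compute subnets.
ALLOWED_SOURCES = ["10.2.0.0/16", "192.168.1.0/24"]


def is_source_allowed(source_ip, allowed_cidrs=ALLOWED_SOURCES):
    """Return True if source_ip falls inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

Narrowing the list to a /32 for a single VM is the equivalent of locking the rule down to a specific IP address.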

I have also changed the default security group associated with the ENIs used by VMC to a custom security group. The security group allows inbound access on the ENI (which is inbound access to VMC as explained in the article below) on all ports from the source CIDR block of my native AWS VPC. Outbound access which is from VMC into AWS is defined as any IPv4 destination (0.0.0.0/0) on any port.

ENI_SG

EC2 Deployment

Log into the VMware Cloud on AWS console. From the SDDCs tab locate the appropriate SDDC and click View Details. Select the Networking & Security tab. Under System click Connected VPC. Make a note of the AWS Account ID and the VPC ID. You will need to deploy an EC2 instance into this account and VPC.

Log into the AWS Console and navigate to the EC2 service. Launch an EC2 instance that meets the System Requirements for Veeam. In this example I have used the t2.medium instance and the Microsoft Windows Server 2019 Base AMI. When configuring the network settings, the EC2 instance must be placed in the VPC connected to VMC. I have added an additional EBS volume for the backup repository using volume type General Purpose SSD (gp2). Ensure the security group selected or created allows the relevant access.

Gateway Firewall

In addition to security group settings inbound access also needs allowing on the VMC Gateway Firewall. In this instance as we are connecting the EC2 instance to the vCenter we define the rule on the Management Gateway. If we were connecting to a workload in one of the compute subnets the rule would be defined on the Compute Gateway. You may have noticed that although I allowed any port in the AWS Security Groups, the actual ports allowed can also be defined on the Gateway Firewall.

In this example I have added a new user-defined group which contains the private IPv4 address for the EC2 instance and added it as a source in the vCenter Inbound Rule. The allowed port is set to HTTPS (TCP 443), and I have also allowed ICMP. I have added the same source group to the ESXi Inbound Rule which allows Provisioning (TCP 902). Both these rules are needed to allow Veeam to back up virtual machines in VMC.

VMC_GW_FW
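
Once the security group and gateway firewall rules are in place, a quick TCP check from the EC2 instance can confirm that ports 443 and 902 are reachable. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and you would supply your own vCenter and ESXi private addresses.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, calling port_is_open with the vCenter private IP and 443, then with an ESXi host address and 902, should both return True before you start the Veeam setup. A False result points to either the security group or the gateway firewall rule.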

Veeam Setup

Now that connectivity between the EC2 instance and the VMC vCenter has been configured I can hop onto the EC2 instance and begin the setup of Veeam. I will, of course, need an inbound rule for RDP (TCP 3389) adding to the security group of the EC2 instance, specifying the source I am connecting from.

Follow the installation steps outlined in the Veeam Backup & Replication 9.5 Update 4 User Guide for VMware vSphere.

Veeam_1

In the VMC console navigate to the Settings tab of the SDDC and make a note of the password for the cloudadmin@vmc.local account. Open the Veeam Backup & Replication console and add the vCenter using its private IP address and the cloudadmin credentials.

Veeam_2

Add the backup repository using the EBS volume and create a backup job as normal. Refer to the Veeam Backup Guide if you need assistance with Veeam.

Veeam_3

To make use of AWS S3 object storage you will need an IAM Role granting S3 access, and an S3 VPC Endpoint. In the case of VMC, as an alternative design, you can host the Veeam B&R server inside your VMC SDDC to make use of the built-in S3 endpoint. In testing we found backup speeds to be faster with this design, but you will likely still need an EBS-backed EC2 instance for your backup repository. It goes without saying you should make sure backup data is not held solely on the same physical site as the servers you are backing up. See Veeam KB2414: VMware Cloud on AWS Support for further details.

Add a new Scale-Out Backup Repository and follow the steps to add account and bucket details.

Set an appropriate policy for moving backups to object-based storage. Once this threshold is met you will start to see Veeam files populating the S3 bucket.

S3_repo

VMware Cloud on AWS Migration Planning

This post pulls together the notes I have made during the planning of a VMware Cloud (VMC) on AWS (Amazon Web Services) deployment, and migration planning of virtual machines from traditional on-premises vSphere infrastructure. It is intended as a list of considerations and not a comprehensive guide. For more information on VMware Cloud on AWS review the following resources:

VMware Cloud on AWS Demo | VMware Cloud on AWS Videos | VMware Cloud on AWS Operations Docs | YouTube Playlists | Roadmap | VMworld 2018 Recorded Sessions | AWS FAQs

Capacity Planning

  • At the time of writing up to 10 SDDCs can be deployed per organisation, each SDDC supporting up to 10 vSphere clusters and each cluster up to 16 physical nodes.
  • The standard I3 bare metal instance currently offers 2 sockets, 36 cores, 512 GiB RAM, and 10.7 TB vSAN storage. A 16-node cluster therefore provides 32 sockets, 576 cores, 8192 GiB RAM, and 171.2 TB.
  • New R5 bare metal instances are deployed with 2.5 GHz Intel Platinum 8000 series (Skylake-SP) processors; 2 sockets, 48 cores, 768 GiB RAM and AWS Elastic Block Store (EBS) backed capacity scaling up to 105 TB for 3-node resources and 560 TB for 16-node resources.
  • When deciding the number of hosts to deploy in the SDDC consider the pay-as-you-go pricing model and the ability to scale out later on demand, either manually or using Elastic DRS, which can be optimised for performance or cost.
  • A really useful tool for VMC planning is the VMware Cloud on AWS Sizer and TCO calculator.
  • The What-If analysis in both vRealize Business and vRealize Operations can also help with capacity planning and cost comparisons for migrations to VMware Cloud on AWS. Use Network Insight to understand network egress costs and application topology in your current environment, see Calculate AWS Egress Fees Proactively for VMware Cloud on AWS for more information.
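
The I3 cluster totals in the list above are a straight multiplication of the per-host figures. A minimal sketch of that arithmetic follows; the dictionary keys and function name are hypothetical, and the result is a raw total that ignores any overhead, so treat it as a starting point alongside the Sizer rather than a sizing tool.

```python
# Per-host figures for the I3 bare metal instance, as quoted in the post.
I3_HOST = {"sockets": 2, "cores": 36, "ram_gib": 512, "vsan_tb": 10.7}


def cluster_capacity(node_count, host=I3_HOST):
    """Multiply each per-host resource by the number of nodes in the cluster."""
    return {name: round(value * node_count, 1) for name, value in host.items()}


if __name__ == "__main__":
    # A full 16-node cluster, matching the figures quoted in the list above.
    print(cluster_capacity(16))
```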

Highly Available Deployments

  • An SDDC can be deployed to a single Availability Zone (AZ) or across multiple AZs, otherwise known as a stretched cluster. For either configuration, if a problem is identified with a host in the cluster, High Availability (HA) evacuation takes place as normal. An additional host is then automatically provisioned and added as a replacement.
  • The recommendation for workload availability is to use a stretched cluster which distributes workloads across 2 Availability Zones with a third hosting a witness node. In this setup data is written to both Availability Zones (synchronous write replication) in an active-active setup; in the event of an outage to an entire Availability Zone vSphere HA brings virtual machines back online in the alternative AZ.
  • Stretched clusters provide a Recovery Point Objective (RPO) of zero by using synchronous data replication. Note that there may be additional cross-AZ charges for stretched clusters.
  • The decision on whether to use single or multiple Availability Zones needs to be taken at the time of deployment. An existing SDDC cannot be upgraded to multi-AZ or downgraded to a single AZ.

Placement Planning

  • VMware Cloud on AWS links with your existing AWS account to provide access to native services. During provisioning a CloudFormation template will grant AWS permissions using the Identity and Access Management (IAM) service. This allows your VMC account to create and manage Elastic Network Interfaces (ENIs) as well as auto-populate Virtual Private Cloud (VPC) route tables when NSX subnets are created. It is good practice to enable Multi-Factor Authentication (MFA) for your accounts in both VMC and AWS.
  • CloudFormation can also be used to deploy your SDDC if desired. Review VMware Cloud on AWS Integrations with CloudFormation and the VMware Cloud on AWS Dev Center for more information.
  • An Elastic Network Interface (ENI) dedicated to each physical host connects the VMware Cloud to the corresponding Availability Zone in the native AWS VPC. There is no charge for data crossing the 25 Gbps ENI between the VMware Cloud VPC and the native AWS VPC.
  • Data that crosses Availability Zones, however, is charged at $0.01 per GB (at the time of writing). It is therefore good practice to deploy the SDDC to the same region and AZ as your current or planned native AWS services.
  • Microsoft SQL Server Workloads and VMware Cloud on AWS: Design, Migration, and Configuration is aimed at migrating SQL into VMC but also contains some useful architectural and operational guidelines so is worth a read.
  • Compute policies can be used to control the placement of virtual machines, see VMWARE CLOUD ON AWS – COMPUTE POLICIES – THE START OF SOMETHING GREAT! for more information.
  • An example architecture of a stretched cluster SDDC is shown below.

vmc_aws_part

Connectivity Planning

Migration Planning

  • If possible your migration team should be made up of the following: Infrastructure administrators for compute, storage, network, and data protection. Networking and Security teams for security and compliance. Application owners for applications, development, and lifecycle management. Support and Operations for automation, lifecycle, and change management.
  • Group services together based on downtime tolerance, as this could determine how the workload is moved: prolonged downtime, minimal downtime, and zero downtime.
  • Consider migration paths for any physical workloads, whether that be P2V, AWS Bare Metal instances, or co-locating equipment.
  • Consider any load balancing and edge security requirements. The AWS Elastic Load Balancer (ELB) can be used or alternative third party options can be deployed through virtual appliances. NSX load balancing as a service in VMC is planned for future releases.
  • You will likely still need Active Directory, DNS, DHCP, time synchronisation, so use native cloud services where possible, or migrate these services as VMs to VMC on AWS.
  • Remember Disaster Recovery (DR) still needs to be factored in. DR as a Service (DRaaS) is offered through Site Recovery Manager (SRM) between regions in the cloud or on-premises.
  • Make sure any existing monitoring tools are compatible with the new environment and think about integrating cloud monitoring and management with new or existing external tools.
  • Move backup tooling to the cloud and perform full backups initially to create a new baseline. Consider native cloud backup products that will back up straight to S3, or traditional backup methods that connect into vCenter. The reference architecture below has been updated to include Elastic Block Store (EBS) backed Elastic Compute Cloud (EC2) instances running Veeam:

vmc_aws.png

For up to date configuration maximums and the latest features and information visit the VMware Cloud on AWS FAQs page. Up to date pricing for AWS services can be found at AWS Pricing. Most of the major compliance certification has been achieved at VMC on AWS data centres, see the VMware Cloud on AWS Meets Industry-Standard Security and Compliance Standards blog post for more information.

In addition, if you are working towards the VMware Cloud on AWS Management exam then review 5V0-31.19: VMware Cloud on AWS Management Exam 2019 – Study tips.

VMware Site Recovery Manager 8.x Upgrade Guide

This post will walk through an in-place upgrade of VMware Site Recovery Manager (SRM) to version 8.1, which introduces support for the vSphere HTML5 client and recovery / migration to VMware Cloud on AWS. Read more about what’s new in this blog post. The upgrade is relatively simple but we need to cross-check compatibility and perform validation tests after running the upgrade installer.

SRM81

Planning

  • The Site Recovery Manager upgrade retains configuration and information such as recovery plans and history but does not preserve any advanced settings
  • Protection groups and recovery plans also need to be in a valid state to be retained, any invalid configurations are not migrated
  • Check the upgrade path here, for Site Recovery Manager 8.1 we can upgrade from 6.1.2 and later
  • If vSphere Replication is in use then upgrade vSphere Replication first, following the steps outlined here
  • Site Recovery Manager 8.1 is compatible with vSphere 6.0 U3 onwards, and VMware Tools 10.1 and onwards, see the compatibility matrices page here for full details
  • Ensure the vCenter and Platform Services Controller are running and available
  • In Site Recovery Manager 8.1 the version number is decoupled from vSphere, however check that you do not need to perform an upgrade for compatibility
  • For other VMware products check the product interoperability site here
  • If you are unsure of the upgrade order for VMware components see the Order of Upgrading vSphere and Site Recovery Manager Components page here
  • Make a note of any advanced settings you may have configured under Sites > Site > Manage > Advanced Settings
  • Confirm you have Platform Services Controller details, the administrator@vsphere.local password, and the database details and password

Download the VMware Site Recovery Manager 8.1.0.4 self-extracting installer here to the server and, if you are using storage replication, the updated Storage Replication Adapter (SRA). Review the release notes here, and the SRM upgrade documentation centre here.

Database Backup

Before starting the upgrade make sure you take a backup of the embedded vPostgres database, or the external database. Full instructions can be found here, in summary:

  • Log into the SRM Windows server and stop the VMware Site Recovery Manager service
  • From command prompt run the following commands, replacing the db_username and srm_backup_name parameters, and the install path and port if they were changed from the default settings
cd C:\Program Files\VMware\VMware vCenter Site Recovery Manager Embedded Database\bin
pg_dump -Fc --host 127.0.0.1 --port 5678 --username=db_username srm_db > srm_backup_name
  • If you need to restore the vPostgres database follow the instructions here
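
If you want to script the backup step, the sketch below builds the same pg_dump call with a timestamped output file. It is an illustration only: the function name and file naming scheme are hypothetical, and it uses pg_dump's --file option in place of the shell redirection shown above.

```python
from datetime import datetime


def build_pg_dump_command(db_username, backup_dir, host="127.0.0.1", port=5678,
                          database="srm_db", now=None):
    """Build the pg_dump argument list with a timestamped output file name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Hypothetical naming scheme; any file name works.
    backup_file = f"{backup_dir}\\srm_backup_{stamp}.dump"
    return [
        "pg_dump", "-Fc",
        "--host", host,
        "--port", str(port),
        f"--username={db_username}",
        "--file", backup_file,
        database,
    ]
```

The returned list can be passed to subprocess.run with check=True, run from the embedded database bin directory and with the SRM service stopped as described above.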

In addition to backing up the database, check the health of the SRM servers and confirm there are no pending reboots. Log into the vSphere web client and navigate to the Site Recovery section. Verify there are no pending cleanup operations or configuration issues; all recovery plans and protection groups should be in a Ready state.

Process

As identified above, vSphere Replication should be upgraded before Site Recovery Manager. In this instance we are using Nimble storage replication, so the Storage Replication Adapter (SRA) should be upgraded first. Download and run the installer for the SRA upgrade, in most cases it is a simple next, install, finish.

We can now commence the Site Recovery Manager upgrade. It is advisable to take a snapshot of the server and ensure backups are in place. On the SRM server run the executable downloaded earlier.

  • Select the installer language and click Ok, then Next
  • Click Next on the patent screen, accept the EULA and click Next again
  • Double-check you have performed all pre-requisite tasks and click Next
  • Enter the FQDN of the Platform Services Controller and the SSO admin password, click Next
  • The vCenter Server address is auto-populated, click Next
  • The administrator email address and local host ports should again be auto-populated, click Next
  • Click Yes when prompted to overwrite registration
  • Select the appropriate certificate option, in this case keeping the existing certificate, click Next
  • Check the database details and enter the password for the database account, click Next
  • Configure the service account to run the SRM service, again this will retain the existing settings by default, click Next
  • Click Install and Finish once complete

Post-Upgrade

After Site Recovery Manager is upgraded log into the vSphere client. If the Site Recovery option does not appear immediately you may need to clear your browser cache, or restart the vSphere client service.

SRM_81

On the summary page confirm both sites are connected, you may need to reconfigure the site pair if you encounter connection problems.

SRM_81_1

Validate the recovery plan and run a test to confirm there are no configuration errors.

SRM_81_2

The test should complete successfully.

SRM_81_5

I can also check the replication status and Storage Replication Adapter status.

SRM_81_4

Configuring vCenter 6.7 High Availability

The vCenter Server Appliance has provided vCenter High Availability (HA) with vSphere 6.5 onwards. In the fully functioning HTML5 release of vCenter 6.7 Update 1 onwards the setup of vCenter HA was hugely simplified. Read more about the improvements made in vSphere 6.7U1 in this blog post. By implementing vCenter HA you can protect your vCenter from host and hardware failures, and significantly reduce down time during patching due to the active / standby nature of the vCenter cluster.

The vCenter HA architecture is made up of the components in the vSphere image below. The vCenter Server Appliance is cloned out to create passive and witness nodes. Updated data is replicated between the active and passive nodes. In the event of an outage to the active vCenter the passive vCenter automatically assumes the active role and identity. Management connections still route to the same IP address and FQDN, however they have now failed over to the replica node. When the outage is resolved and the vCenter that failed comes back online, it takes on the role of the passive node and receives replication data from the active vCenter Server.

vCenter_HA

Requirements

  • vCenter HA was introduced with the vCenter Server Appliance 6.5
  • The vCenter deployment size should be at least Small (4 vCPU, 16 GB RAM)
  • A minimum of three hosts
  • The hosts should be running at least ESXi 5.5
  • The management network should be configured with a static IP address and reachable FQDN
  • SSH should be enabled on the VCSA
  • A port group for the HA network is required on each ESXi host
  • The HA network must be on a different subnet to the management network
  • Network latency between the nodes must be less than 10ms
  • vCenter HA is compatible with both embedded deployment model and external PSC
  • For further information on vCenter HA performance and best practices see this post
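
Two of the requirements above lend themselves to a quick pre-check: the HA network must sit on a different subnet to the management network, and latency between nodes must be under 10 ms. The sketch below is a minimal illustration using the standard library; the function names and example subnets are hypothetical, and the latency value is assumed to come from your own measurement.

```python
import ipaddress


def ha_network_is_separate(management_cidr, ha_cidr):
    """Return True if the HA subnet does not overlap the management subnet."""
    management = ipaddress.ip_network(management_cidr)
    ha = ipaddress.ip_network(ha_cidr)
    return not management.overlaps(ha)


def latency_within_limit(latency_ms, limit_ms=10.0):
    """Return True if the measured latency is below the vCenter HA limit."""
    return latency_ms < limit_ms
```

For example, a management network of 192.168.10.0/24 with an HA network of 192.168.20.0/24 passes the first check, while reusing 192.168.10.0/24 for both does not.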

If you are configuring vCenter HA on a version of vCenter prior to 6.7 Update 1 then see this post. If you are configuring vCenter HA in a cluster with fewer than the required number of physical hosts, such as in a home lab, you can add a parameter to override the anti-affinity setting; see this post by William Lam.

Configuring vCenter HA

Log into the vSphere client and select the top level vCenter Server in the inventory. Click the Configure tab and vCenter HA. The vCenter HA summary page is displayed with a list of prerequisites, ensure these are met along with the requirements above. Click Setup vCenter HA.

vCenter_HA_1

Select the vCenter HA network by clicking Browse. Scroll down the vCenter HA resource settings, review the network and resource settings of the active node of the vCenter Server. Scroll down to the passive node and click Edit. Follow the on-screen prompts to select a folder location, compute and storage resources. Select the management and HA networks for the passive node, review the settings once complete and click Finish. Follow the same steps for the witness node.

vCenter_HA_2

On the IP settings page enter the HA network settings for the active, passive, and witness nodes. Click Finish.

vCenter_HA_3

The vCenter Server will now be cloned and the HA network settings applied. This can be monitored from the tasks pane. Once complete the vCenter HA state will show Healthy, and all nodes in the cluster will show Up.

vCenter_HA_4

You can edit the status of vCenter HA at any time by going back into the vCenter HA menu and clicking Edit. You also have the option of removing the vCenter HA configuration or manually initiating a failover.

vCenter_HA_Edit

For more information on vCenter 6.7 High Availability see the vCenter Documentation Centre here.

vRA Deployments with Terraform

This post covers notes made when using Terraform to deploy basic resources from VMware vRealize Automation (vRA). Read through the vRA provider plugin page here and the Terraform documentation here. There are a couple of other examples of Terraform configurations using the vRA provider here and here. If you’re looking for an introduction on why to use Terraform with vRA then this blog post gives a good overview. If you have worked with the vRA Terraform provider before feel free to add any additional pointers or best practices in the comments section, as this is very much a work in progress.

Terraform Setup

Before starting you will need to download and install Go and Git to the machine you are running Terraform from. Visual Studio Code with the Terraform extension is also a handy tool for editing config files but not a requirement. The steps below were validated with Windows 10 and vRA 7.3.

After installing Go the default GOROOT is set to C:\Go and GOPATH to %UserProfile%\Go. Go is the programming language we will use to rebuild the vRA provider plugin. GOPATH is the location of the directory containing source files for Go projects.

In this instance I have set GOPATH to D:\Terraform and will keep all files in this location. To change GOPATH manually open Control Panel, System, Advanced system settings, Advanced, Environment Variables. Alternatively GOROOT and GOPATH can be set from CLI:

set GOROOT=C:\Go
set GOPATH=D:\Terraform

Download Terraform for Windows and put the executable in the working directory for Terraform (D:\Terraform or whatever GOPATH was set to).

In AppData\Roaming create a new file terraform.rc (%UserProfile%\AppData\Roaming\terraform.rc) with the following contents, replacing D:\Terraform with your own Terraform working directory.

providers {
     vra7 = "D:\\Terraform\\bin\\terraform-provider-vra7.exe"
}

Open command prompt and navigate to the Terraform working directory. Run the following command to download the source repository:

go get github.com/vmware/terraform-provider-vra7

Open the Terraform working directory and confirm the repository source files have been downloaded.

The final step is to rebuild the Terraform provider using Go. Download the latest version of dep. Rename the executable to dep.exe and place in your Terraform working directory under \src\github.com\vmware\terraform-provider-vra7.

Back in command prompt navigate to D:\Terraform\src\github.com\vmware\terraform-provider-vra7 and run:

dep ensure
go build -o D:\Terraform\bin\terraform-provider-vra7.exe

Running dep ensure can take a while. Use the -v switch if you need to troubleshoot. The vRA Terraform provider is now ready to use.

Using Terraform

In the Terraform working directory a main.tf file is needed to describe the infrastructure and set variables. There are a number of example Terraform configuration files located in the source repository files under \src\github.com\vmware\terraform-provider-vra7\example.

A very basic example of a configuration file would first contain the vRA variables:

provider "vra7" {
     username = "username"
     password = "password"
     tenant = "vRAtenant"
     host = "https://vRA
}

Followed by the resource details:

resource "vra7_resource" "machine" {
   catalog_name = "BlueprintName"
}

Further syntax can be added to pass additional variables. For a full list see the resource section here. The configuration file I am using for the purposes of this example is as follows:

main_tf

Example config and variable files from source repo:

multi_machine_example

variables_example

Once your Terraform configuration file or files are ready go back to command prompt and navigate to the Terraform working directory. Type terraform and hit enter to see the available options, for a full list of commands see the Terraform CLI documentation here.

Start off with initialising the vRA provider plugin:

terraform init

terraform_init

Validate the Terraform configuration files:

terraform validate

If you’re ready then start the deployment:

terraform apply

terraform_apply_1

Monitor the progress from the CLI or from the task that is created in the Requests tab of the vRA portal.

terraform_apply_2

terraform_apply_3

Check the state of the active deployments using the available switches for:

terraform state

terraform_state

To destroy the resource use:

terraform destroy

terraform_destroy
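
If you prefer to drive the init, validate, and apply sequence from a script, the steps above can be sketched as a small wrapper. This is an illustration only: the function name is hypothetical, it assumes terraform is on the PATH, and the runner parameter exists purely so the sequence can be exercised without invoking Terraform.

```python
import subprocess

# The three commands used in the post, in the order they are run.
WORKFLOW = [
    ["terraform", "init"],
    ["terraform", "validate"],
    ["terraform", "apply"],
]


def run_workflow(working_dir, runner=subprocess.run):
    """Run each Terraform command in working_dir, stopping at the first failure."""
    for command in WORKFLOW:
        # check=True raises CalledProcessError if a command exits non-zero,
        # so apply is never reached when init or validate fails.
        runner(command, cwd=working_dir, check=True)
```

Calling run_workflow with D:\Terraform (or your own working directory) runs the same commands as above; terraform apply will still prompt for confirmation in the console.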

NSX 6.4.1 Upgrade Guide

This post will walk through upgrading to NSX 6.4.1. If upgrading from 6.4.0 then the new Upgrade Coordinator feature can be used, allowing simultaneous upgrade planning of multiple NSX objects, see the NSX 6.4.x Upgrade Coordinator post for more information. If upgrading from an earlier version than 6.4.0 then the steps outlined below are applicable. When performing an upgrade the NSX components must be upgraded in the following order: NSX Manager, NSX Controllers, Host Clusters, NSX Edge, Service Virtual Machines (such as Guest Introspection).

Review the operational impacts of NSX upgrades for each component here when planning your upgrade. It is best practice to limit all operations in the environment until the upgrade is complete. Make sure NSX Manager is backed up before starting an upgrade, and be aware that after a successful upgrade NSX cannot be downgraded. You should also review the VMware NSX for vSphere 6.4.1 Release Notes here and NSX for vSphere Documentation Center here.

Requirements

Requirements specific to NSX 6.4.1 are listed below. As we are doing an upgrade the assumption is that the vSphere and NSX environment is already set up and working; you can validate the existing NSX configuration here. You should also ensure the following are in place: an underlying network with IP connectivity and an MTU size of 1600 or above; FQDN resolution, connectivity, and time synchronisation between NSX and vSphere components; syslog, monitoring, and backups. In addition review the basic system requirements for NSX here and the full list of network port requirements here.

  • NSX 6.4.1 is compatible with vSphere versions 6.0 U2 and above. If you are using 6.0 then U3 is recommended, the minimum supported version for 6.5 is 6.5a, and support for 5.5 has been removed
  • Supported upgrade paths to NSX 6.4.1 are from 6.2.4 onwards, there is a workaround for upgrading from 6.2.0, 6.2.1, or 6.2.2 which can be found here
  • Review the VMware Upgrade Path page here and also fully review the NSX 6.4.1 Release Notes here, as there are a number of things to be aware of when upgrading from 6.2.x or 6.3.x
  • Check compatibility with VMware products using the VMware Interoperability page here
  • Check compatibility with other third party products such as partner services for Guest Introspection using the VMware Compatibility Guide here
  • Before starting the upgrade make sure existing appliances meet the recommended hardware requirements:
    • NSX Manager 16 GB RAM (24 GB for large deployments), 4 vCPU (8 vCPU for large deployments), and 60 GB disk, a large deployment is typically 256+ hosts or 2000+ VMs
    • NSX Controllers 4 GB RAM, 4 vCPU, and 28 GB disk
    • NSX Edge Compact: 512 MB RAM, 1 vCPU, 584 MB + 512 MB disks. Large: 1 GB RAM, 2 vCPU, 584 MB + 512 MB disks. Quad Large: 2 GB RAM, 4 vCPU, 584 MB + 512 MB disks. X-Large: 8 GB RAM, 6 vCPU, 584 MB + 2 GB + 256 MB disks.
  • Verify the existing NSX Manager has sufficient space by connecting to the CLI and running show filesystems (if using SSH, the SSH service may need starting from the summary page of the NSX Manager appliance)
  • Maximum latency between NSX components and NSX and vSphere components should be 150 ms RTT or below
  • NSX Data Security is no longer supported, it should be removed if installed prior to the upgrade
  • If you are using Cross-vCenter NSX then each component should be upgraded in the order listed here
  • Enabling DRS on the vSphere cluster allows running VMs to be automatically migrated when each host is placed into maintenance mode for the NSX VIB upgrades. This process can of course be undertaken manually if DRS is not in use
  • A completed upgrade can be validated following the steps listed here
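
The supported upgrade path in the list above (6.2.4 onwards) can be checked with a simple version comparison. The sketch below is a minimal illustration with hypothetical function names; it only encodes the version range stated in this post and does not model the release note caveats or the workaround for 6.2.0 to 6.2.2.

```python
def parse_version(version):
    """Turn a dotted version string such as '6.2.4' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def supports_direct_upgrade(current_version, minimum="6.2.4", target="6.4.1"):
    """Return True if current_version is at or above minimum and below target."""
    current = parse_version(current_version)
    return parse_version(minimum) <= current < parse_version(target)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "6.10.0" would sort before "6.2.4".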

Backups

Before we start take a backup of the vCenter Server and NSX Manager. NSX configuration can be backed up using FTP/SFTP, see this post for more information. From version 6.4.1 a configuration backup is automatically taken at the start of the upgrade process. This is intended as a fallback and you should still take your own backup before beginning. You can also take a snapshot of the NSX Manager in case you need to revert the NSX Manager upgrade. For extra peace of mind export the vSphere Distributed Switch configuration by following the instructions here.

In the event you do need to restore from an NSX backup a new appliance should be deployed and the configuration restored, click here for further details.

Upgrade Process

As noted above make sure you have read all the linked documentation, specifically the release notes and operational impacts for each component upgrade. The steps below will not list the operational impact for each step of the upgrade.

Download the NSX for vSphere 6.4.1 Upgrade Bundle from the download page here to a location accessible from the NSX Manager. Browse to the NSX Manager and log in as admin. From the home page click Upgrade.

Click Upload Bundle and browse to the upgrade bundle downloaded earlier, then click Continue. Once the bundle is uploaded you can optionally enable SSH and/or join the Customer Experience Improvement Program. Click Upgrade to start the upgrade.

NSX64_1

The installer will now upgrade NSX Manager, once complete you will be returned to the login page.

NSX64_2

Log back into NSX Manager and click Upgrade. Verify the upgrade state is complete and the version number is correct. Click Summary and verify the health of the NSX Manager.

NSX64_3

Log into the vSphere Client. If you were already logged in, log out and back in; you may also need to clear your browser cache. From the Menu drop-down select Networking and Security.

Before upgrading any other components we need to upgrade the NSX Controller Cluster. On the Dashboard tab confirm there are 3 controller nodes all connected, the upgrade cannot commence if any nodes are in a disconnected state.

NSX64_5
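The controller pre-check described above can be expressed as a small piece of logic. The sketch below is illustrative only: the ControllerNode type and the way node state is gathered are assumptions, not part of NSX, and in practice the values would come from whatever you see on the Dashboard tab.

```python
from dataclasses import dataclass


@dataclass
class ControllerNode:
    # Hypothetical record of one controller node as read from the Dashboard
    name: str
    connected: bool


def controller_cluster_ready(nodes, expected_count=3):
    """Return (ok, problems) for the pre-upgrade controller check."""
    problems = []
    if len(nodes) != expected_count:
        problems.append(
            f"expected {expected_count} controller nodes, found {len(nodes)}"
        )
    for node in nodes:
        if not node.connected:
            problems.append(f"{node.name} is disconnected")
    return (not problems, problems)


if __name__ == "__main__":
    cluster = [
        ControllerNode("controller-1", True),
        ControllerNode("controller-2", True),
        ControllerNode("controller-3", False),
    ]
    ok, problems = controller_cluster_ready(cluster)
    print("ready" if ok else "not ready", problems)
```

Returning the list of problems rather than a bare True/False makes it easier to see which node needs attention before retrying the upgrade.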

Click Installation and Upgrade and select the Management and NSX Managers tab. Check the NSX Manager version is correct, in the Controller Cluster Status column click Upgrade Available.

NSX641_1

Each controller is upgraded and rebooted one at a time. From NSX 6.3.3 onwards the underlying operating system of the controller nodes changed to Photon-OS. If you are upgrading from 6.3.3 onwards an in-place upgrade is applied. If you are upgrading from 6.3.2 or earlier then the controller nodes are redeployed; any DRS anti-affinity rules are lost and will need to be reapplied.
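As a quick way to reason about which behaviour applies to your environment, the version rule above can be written as a comparison. This is a minimal sketch; the function names are made up for illustration and the version string is assumed to be in dotted form such as 6.3.2.

```python
def parse_version(text):
    """Turn '6.3.2' into (6, 3, 2), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def controller_upgrade_method(current_version):
    """In-place from 6.3.3 onwards, otherwise the nodes are redeployed."""
    if parse_version(current_version) >= (6, 3, 3):
        return "in-place"
    return "redeploy"


if __name__ == "__main__":
    for version in ("6.2.4", "6.3.2", "6.3.3", "6.4.0"):
        method = controller_upgrade_method(version)
        note = " (reapply DRS anti-affinity rules)" if method == "redeploy" else ""
        print(f"{version}: {method}{note}")
```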

Click Yes to begin the Controller Cluster upgrade.

NSX641_2

Monitor the status in the NSX Controller Nodes tab. After all the controller nodes have been upgraded, validate that the Status, Peers, and Upgrade Status are all green. Confirm the Software Version is correct.

NSX641_3

Next we can upgrade the host NSX VIBs, click the Host Preparation tab. Clusters running NSX are displayed, upgrades are initiated on a per cluster basis. Select the cluster and click Upgrade to begin the upgrade.

Hosts running NSX 6.2.x require a reboot for the installation of new VIBs, hosts running NSX 6.3.0 and above do not need a reboot but must be placed into maintenance mode. You can either manually place hosts into maintenance mode and vMotion / power off VMs yourself, or allow DRS to live migrate VMs and remediate hosts one at a time.

NSX641_4
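The host behaviour described above depends on the NSX version currently on the host and on whether DRS is available. The sketch below captures that decision; the function and key names are hypothetical, and treating anything older than 6.3.0 as needing a reboot is an assumption that extends the 6.2.x rule.

```python
def parse_version(text):
    """Turn '6.2.4' into (6, 2, 4), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def host_upgrade_plan(current_nsx_version, drs_enabled):
    """Describe what a host needs for the VIB upgrade."""
    # Assumption: versions below 6.3.0 follow the 6.2.x reboot rule
    needs_reboot = parse_version(current_nsx_version) < (6, 3, 0)
    return {
        "host_action": "reboot" if needs_reboot else "maintenance-mode",
        # DRS can live migrate VMs; otherwise vMotion or power off manually
        "evacuation": "drs" if drs_enabled else "manual",
    }


if __name__ == "__main__":
    print(host_upgrade_plan("6.2.4", drs_enabled=True))
    print(host_upgrade_plan("6.3.5", drs_enabled=False))
```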

Click Yes to commence the cluster upgrade.

NSX641_5

At this stage if hosts are not in maintenance mode the NSX Installation will show Not Ready. If you have DRS enabled on the cluster click Actions and Resolve All, this will automatically vMotion running machines from a host, place into maintenance mode, update the VIBs, and exit maintenance mode, one host at a time. Alternatively you can select individual hosts and click Resolve if you want to control the order of the upgrades.

NSX641_6

Monitor the status of the NSX Installations in the Hosts table. You can also monitor Recent Tasks to make sure a host is not taking too long to enter maintenance mode. If a host cannot be evacuated because of DRS rules, or because a VM cannot be migrated, then manual intervention may be required (in this case see here).

If you are using stateless images with Auto Deploy you should also update your ESXi image with the latest NSX VIBs or they will be lost at next reboot, for guidance see this post.

NSX641_7

The next step is to upgrade NSX Edges. Before commencing, validate that the status of all NSX prepared hosts is green and they are showing as successfully upgraded to the correct version. During Edge upgrades a replacement appliance is deployed, which means 2 appliances (or 4 if running in HA mode) are powered on at the same time; ensure your cluster has sufficient compute resource.

NSX641_8
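To estimate the temporary headroom an Edge upgrade needs, you can multiply the appliance size by the number of replacement appliances deployed alongside the existing ones (1 without HA, 2 with HA). The sketch below uses the Edge form factor sizes from the NSX hardware requirements; the dictionary keys and function name are made up for illustration, and disk is left out for brevity.

```python
# (vCPU, RAM in MB) per Edge form factor
EDGE_SIZES = {
    "compact": (1, 512),
    "large": (2, 1024),
    "quadlarge": (4, 2048),
    "xlarge": (6, 8192),
}


def extra_capacity_during_upgrade(size, ha_enabled):
    """Return (vcpu, ram_mb) needed on top of the running Edge appliances."""
    vcpu, ram_mb = EDGE_SIZES[size]
    # One replacement per existing appliance: 1 standalone, 2 in HA mode
    replacements = 2 if ha_enabled else 1
    return replacements * vcpu, replacements * ram_mb


if __name__ == "__main__":
    vcpu, ram_mb = extra_capacity_during_upgrade("large", ha_enabled=True)
    print(f"temporary headroom: {vcpu} vCPU, {ram_mb} MB RAM")
```

If several Edges are upgraded at the same time, the headroom for each one adds up, so sum the results for every Edge in the batch.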

At the time of writing (v6.4.1) NSX Edges still need to be upgraded using the vSphere web client. Log into the vSphere web client and click Networking & Security, then NSX Edges; deployed Edges are displayed. If you have multiple NSX Managers ensure the correct NSX Manager is selected in the drop-down. Select the NSX Edge to upgrade and from the Actions menu click Upgrade Version.

NSX641_9

The upgraded version will be deployed from OVF, you can follow the progress in the Recent Tasks pane and also the Status column for the Edge. Repeat this process for each Edge Services Gateway (ESG) and Distributed Logical Router (DLR) you wish to upgrade.

NSX641_10

The final stage is to upgrade Guest Introspection. This can either be done in the vSphere web client or by going back into the HTML5 web client. From the Menu drop-down select Networking and Security, click Installation and Upgrade and the Service Deployment tab. Existing service deployments are listed, the Installation Status for Guest Introspection shows Upgrade Available. Select the Guest Introspection deployment and click Upgrade, once complete verify the Installation Status and Service Status are both green and the version number is correct.

NSX641_11

After all NSX components are upgraded if you want to follow additional verification steps then see the upgrade validation KB here, or the post upgrade tasks listed here. You should take a further backup of NSX Manager after completion of the upgrade. Any third party appliances for Guest Introspection or Network Introspection that require an update can now be upgraded.

NSX 6.4.x Upgrade Coordinator

This post will walk through an upgrade to NSX 6.4.1 using the new Upgrade Coordinator feature allowing simultaneous upgrade planning of multiple NSX components. If you are upgrading from an earlier version of NSX, see the NSX 6.4.1 Upgrade Guide post for details on upgrading individual components. From version 6.4 onwards upgrade plans can be used to upgrade host clusters, controller clusters, Edge Service Gateways (ESGs), Distributed Logical Routers – including Universal (DLRs and UDLRs), and Service Virtual Machines such as Guest Introspection. Upgrade plans consist of either a one click system managed upgrade, or planning your own upgrade where objects and options can be customised.

Review the operational impacts of NSX upgrades for each component here when planning your upgrade; it is best practice to limit all operations in the environment until the upgrade is complete. Make sure NSX Manager is backed up before starting an upgrade, and be aware that after a successful upgrade NSX cannot be downgraded. You should also review the VMware NSX for vSphere 6.4.1 Release Notes here and NSX for vSphere Documentation Center here.

Requirements

Requirements specific to NSX 6.4.1 are listed below. As we are doing an upgrade the assumption is that the vSphere and NSX environment is already setup and working, you can validate the existing NSX configuration here. You should also ensure an underlying network with IP connectivity and an MTU size of 1600 or above, FQDN resolution, connectivity, and time synchronisation between NSX and vSphere components, syslog, monitoring, and backups are all in place. In addition review the basic system requirements for NSX here and the full list of network port requirements here.

  • NSX 6.4.1 is compatible with vSphere versions 6.0 U2 and above. Also note: if you are using 6.0 then U3 is recommended, the minimum supported version for 6.5 is 6.5a, and support for 5.5 has now been removed
  • Supported upgrade paths to NSX 6.4.1 are from 6.2.4 onwards, there is a workaround for upgrading from 6.2.0, 6.2.1, or 6.2.2 which can be found here
  • Review the VMware Upgrade Path page here and also fully review the NSX 6.4.1 Release Notes here, as there are a number of things to be aware of when upgrading from 6.2.x or 6.3.x
  • Check compatibility with VMware products using the VMware Interoperability page here
  • Check compatibility with other third party products such as partner services for Guest Introspection using the VMware Compatibility Guide here
  • Before starting the upgrade make sure existing appliances meet the recommended hardware requirements:
    • NSX Manager 16 GB RAM (24 GB for large deployments), 4 vCPU (8 vCPU for large deployments), and 60 GB disk, a large deployment is typically 256+ hosts or 2000+ VMs
    • NSX Controllers 4 GB RAM, 4 vCPU, and 28 GB disk
    • NSX Edge Compact: 512 MB RAM, 1 vCPU, 584 MB + 512 MB disks. Large: 1 GB RAM, 2 vCPU, 584 MB + 512 MB disks. Quad Large: 2 GB RAM, 4 vCPU, 584 MB + 512 MB disks. X-Large: 8 GB RAM, 6 vCPU, 584 MB + 2 GB + 256 MB disks.
  • Verify the existing NSX Manager has sufficient space by connecting to the CLI and running show filesystems (if using SSH, the service may need starting from the Summary page of the NSX Manager appliance)
  • Maximum latency between NSX components, and between NSX and vSphere components, should be 150 ms RTT or below
  • NSX Data Security is no longer supported; if it is installed, remove it prior to the upgrade
  • If you are using Cross-vCenter NSX then each component should be upgraded in the order listed here
  • Enabling DRS on the vSphere cluster allows running VMs to be automatically migrated when each host is placed into maintenance mode for the NSX VIB upgrades. This process can of course be undertaken manually if DRS is not in use
  • A completed upgrade can be validated following the steps listed here
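Some of the requirements above are simple numeric checks that are easy to script as a pre-flight list. The sketch below is illustrative only: the function names and inputs are assumptions, and the values (current NSX version, MTU, measured latency) would need to be collected from your own environment. Versions not explicitly covered by the upgrade path statement are reported as unlisted rather than guessed at.

```python
def parse_version(text):
    """Turn '6.2.4' into (6, 2, 4), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


# Versions the requirements list says need a workaround before upgrading
WORKAROUND_VERSIONS = {(6, 2, 0), (6, 2, 1), (6, 2, 2)}


def upgrade_path_status(current_version):
    """Classify the upgrade path to 6.4.1 from the current NSX version."""
    version = parse_version(current_version)
    if version >= (6, 2, 4):
        return "supported"
    if version in WORKAROUND_VERSIONS:
        return "workaround"
    return "unlisted"  # not covered above, check the VMware Upgrade Path page


def check_prerequisites(current_version, mtu, latency_ms):
    """Return a list of issues; an empty list means these checks passed."""
    issues = []
    status = upgrade_path_status(current_version)
    if status != "supported":
        issues.append(f"upgrade path from {current_version} is {status}")
    if mtu < 1600:
        issues.append(f"MTU {mtu} is below 1600")
    if latency_ms > 150:
        issues.append(f"latency {latency_ms} ms RTT exceeds 150 ms")
    return issues


if __name__ == "__main__":
    print(check_prerequisites("6.3.5", mtu=1600, latency_ms=20))
    print(check_prerequisites("6.2.2", mtu=1500, latency_ms=200))
```

This only covers the checks that reduce to a number; compatibility with vSphere, other VMware products, and third party services still needs to be confirmed against the linked pages.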

Backups

Before we start, take a backup of the vCenter Server and NSX Manager. NSX configuration can be backed up using FTP/SFTP, see this post for more information. From version 6.4.1 a configuration backup is automatically taken at the start of the upgrade process; this is intended as a fallback and you should still take your own backup before beginning. You can also take a snapshot of the NSX Manager in case you need to roll back the NSX Manager upgrade. For extra peace of mind, export the vSphere Distributed Switch configuration by following the instructions here.

In the event you do need to restore from an NSX backup a new appliance should be deployed and the configuration restored, click here for further details.

Upgrade Process

As noted above make sure you have read all the linked documentation, specifically the release notes and operational impacts for each component upgrade. The steps below will not list the operational impact for each step of the upgrade.

Download the NSX for vSphere 6.4.1 Upgrade Bundle from the download page here to a location accessible from the NSX Manager. Browse to the NSX Manager and log in as admin. From the home page click Upgrade.

Click Upload Bundle and browse to the upgrade bundle downloaded earlier, then click Continue. Once the bundle is uploaded you can optionally enable SSH and/or join the Customer Experience Improvement Program. Click Upgrade to start the upgrade.

NSX64_1

The installer will now upgrade NSX Manager, once complete you will be returned to the login page.

NSX64_2

Log back into NSX Manager and click Upgrade. Verify the upgrade state is complete and the version number is correct. Click Summary and verify the health of the NSX Manager.

NSX64_3

Log into the vSphere Client. If you were already logged in, log out and back in; you may also need to clear your browser cache. From the Menu drop-down select Networking and Security.

For any upgrade plan the NSX Controller Cluster upgrade is mandatory and performed first. On the Dashboard tab confirm there are 3 controller nodes all connected, the upgrade cannot commence if any nodes are in a disconnected state.

NSX64_5

Click Installation and Upgrade and select the Upgrade tab. Review the components, any warnings, and current and target version details.

NSX64_4

To start an upgrade plan click Plan Upgrade.

Upgrade Coordinator puts objects of the same type in default upgrade groups when planning an upgrade. These groups and other settings can be modified by planning your own upgrade (controller upgrades are mandatory) or you can allow the system to upgrade everything using a one click upgrade. Select the desired upgrade plan and click Next.

NSX64_7

The default options for the one click upgrade are to upgrade Host Clusters and Service VMs individually (serial), and to upgrade NSX Edges all together (parallel). There is no pause between components or pause on error. If you are happy with these settings then click Start Upgrade to begin the upgrade process, otherwise go back to Plan Your Upgrade.

NSX64_8

Select your own upgrade to choose which components are upgraded, controller upgrades are mandatory and are done first. You can also pause the upgrade between components or pause the upgrade if an error is returned.

NSX64_9

The next 3 pages of the Upgrade Coordinator allow you to manage upgrade groups for Host Clusters, NSX Edges, and Service VMs. When planning your upgrade take into consideration the following:

  • Objects of the same type can be added to or removed from an upgrade group
  • The order of object upgrades within a group can be changed
  • All components included in an upgrade group must be upgraded before the next component type can be upgraded, e.g. all hosts included in an upgrade plan must be upgraded before moving onto Edges, and so on
  • Excluding an object within an upgrade group is useful for multiple maintenance windows, where you want to add an object to an upgrade plan but exclude it from this upgrade session
  • If the upgrade order within a group is set to Serial then each object is upgraded one at a time, if it is Parallel then multiple objects within that group are upgraded at the same time
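To help visualise how these rules combine, the sketch below models upgrade groups and works out the order in which objects would be processed. It is a simplified illustration rather than how Upgrade Coordinator is implemented: the class, field, and object names are made up, the mandatory controller upgrade that runs first is not modelled, and a Parallel group is treated as upgrading all of its objects in a single step.

```python
from dataclasses import dataclass, field

# Component types are completed in this order (controllers come before all of these)
COMPONENT_ORDER = ["host_cluster", "edge", "service_vm"]


@dataclass
class UpgradeGroup:
    component: str              # one of COMPONENT_ORDER
    objects: list               # object names, in upgrade order
    mode: str = "serial"        # "serial" or "parallel"
    excluded: set = field(default_factory=set)


def upgrade_waves(groups):
    """Return a list of waves; objects in the same wave upgrade together."""
    waves = []
    for component in COMPONENT_ORDER:
        for group in groups:
            if group.component != component:
                continue
            # Excluded objects stay in the plan but are skipped this session
            targets = [o for o in group.objects if o not in group.excluded]
            if not targets:
                continue
            if group.mode == "parallel":
                waves.append(targets)
            else:
                waves.extend([target] for target in targets)
    return waves


if __name__ == "__main__":
    plan = [
        UpgradeGroup("edge", ["esg-1", "dlr-1"], mode="parallel"),
        UpgradeGroup("host_cluster", ["cluster-a", "cluster-b"],
                     excluded={"cluster-b"}),
    ]
    for number, wave in enumerate(upgrade_waves(plan), start=1):
        print(number, wave)
```

In this example the host cluster group is processed before the Edge group even though it was defined second, because all hosts in the plan must be upgraded before moving on to Edges.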

Controller Upgrades: each controller is upgraded and rebooted one at a time. From NSX 6.3.3 onwards the underlying operating system of the controller nodes changed to Photon-OS. If you are upgrading from 6.3.3 onwards an in-place upgrade is applied. If you are upgrading from 6.3.2 or earlier then the controller nodes are redeployed; any DRS anti-affinity rules are lost and will need to be reapplied.

Host Upgrades: hosts running NSX 6.2.x require a reboot for the installation of new VIBs, hosts running NSX 6.3.0 and above do not need a reboot but must be placed into maintenance mode. You can either manually place hosts into maintenance mode and vMotion / power off VMs yourself, or allow DRS to live migrate VMs and remediate hosts one at a time. Monitor the status of the NSX Installations on the Upgrade tab. You can also monitor Recent Tasks to make sure a host is not taking too long to enter maintenance mode. If a host cannot be evacuated because of DRS rules, or because a VM cannot be migrated, then manual intervention may be required (in this case see here).

If you are using stateless images with Auto Deploy you should also update your ESXi image with the latest NSX VIBs or they will be lost at next reboot, for guidance see this post.

NSX64_10

Configure your upgrade plan based on the components you want to upgrade in this session, and review the final plan. When you’re ready click Start Upgrade to begin the upgrade process.

NSX64_13

Monitor the status of the upgrade on the Upgrade page. If any warnings or errors are displayed during the upgrade process see the Monitor and Troubleshoot Your Upgrade page here. If you selected Pause between components you must Resume or Replan after each stage of the upgrade.

nsx64_14

An in-progress upgrade plan can still be paused to make modifications; when paused the object currently being upgraded will continue and the upgrade plan pauses when this object upgrade succeeds or fails.

nsx64_15

After the upgrade is complete verify the Upgrade page shows the system upgrade status successful.

nsx64_16

Verify the NSX health from the Dashboard page. After all NSX components are upgraded if you want to follow additional verification steps then see the upgrade validation KB here, or the post upgrade tasks listed here. You should take a further backup of NSX Manager after completion of the upgrade. Any third party appliances for Guest Introspection or Network Introspection that require an update can now be upgraded.