This post pulls together the workload migration notes and lessons learned I have made during the evacuation of an on-premise data centre to VMware Cloud (VMC) on AWS (Amazon Web Services). The content is a work in progress and is intended as a generic list of considerations and useful links; it is not a comprehensive guide. Cloud, more so than traditional infrastructure, is constantly changing. Features are implemented regularly and transparently, so always validate against official documentation. This post was last updated on September 16th 2019.
Part 2: Migration Planning & Lessons Learned
1. Virtual Machine Migrations
The following points should help with the planning of Virtual Machine (VM) workload migrations. An assumption is made that the Software Defined Data Centre (SDDC) is stood up and operational with monitoring, backups, Anti-Virus, etc. in place. Review Part 1: SDDC Deployment for more information. I found the SDDC deployment and getting the environment available was the easy part. Internal processes and complexity of the existing environment are going to determine how quickly you can migrate workloads to the SDDC.
We started by exporting a list of Virtual Machines from each vCenter. From that list we identified the service each VM was running and its service owner or business owner. The biggest surprise here was the number of servers deployed by, or for, people who had left the organisation. These servers were still being hosted, maintained, and patched, but were no longer needed. We were able to decommission more workloads than expected due to years of VM sprawl. Whilst VMware Cloud on AWS isn’t directly responsible for this, the project forced us to evaluate each server we hosted. For the remaining workloads we put together a migration flow which identified the following criteria:
- CPU, RAM, storage requirements: identified a baseline to automatically accept and then anything above our baseline would require a manual check.
- Network dependencies: is there a large amount of data in transit, is IP retention required, is the VLAN stretched using Hybrid Cloud Extension (HCX), load balancer requirements.
- Data flows: used vRealize Network Insight to identify potential egress costs and additional service dependencies.
- Additional application or organisation specific considerations: e.g. data classification, tagging / charge-back model, backups, security, monitoring, DNS, authentication, licensing or support.
- Service Management considerations: is the service platinum/gold/silver/bronze or unclassified, do the platform Service Level Agreements (SLAs) fulfil the existing SLAs in place for each service, is the proposed migration type (i.e. amount of downtime) taking this into consideration. Involving Service Management right from the start was useful as they were able to advise on internal processes for Service Acceptance and Business Continuity.
- Service Owner considerations: if the technical criteria above were met then the next step was to meet with service owners and get their buy-in for the migration. We migrated internal services we owned first, and then used that as a success story to onboard other services. This process involved meeting with various departments, presenting the solution and its benefits over their existing hosting (in our case DR and performance improvements), and migrating dev or test workloads first to build confidence.
- Migration passport: one of our Senior Engineers came up with this concept as a one-pager for each service that was migrated. It consisted of migration details (change ID, date, status), migration scope (server names, locations, and notes), firewall rules, vRNI outputs, and other information such as associated documentation.
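The sizing criterion in the list above (accept automatically at or below a baseline, check manually above it) can be sketched in a few lines of Python. This is a minimal illustration only: the `Baseline` thresholds, the `VM` fields, and the `triage` function are hypothetical names and values chosen for the example, not the figures we used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Baseline:
    # Hypothetical thresholds; set these to whatever your own baseline is.
    max_vcpus: int = 8
    max_ram_gb: int = 32
    max_storage_gb: int = 1024


@dataclass(frozen=True)
class VM:
    name: str
    vcpus: int
    ram_gb: int
    storage_gb: int


def triage(vm: VM, baseline: Baseline = Baseline()) -> Tuple[str, List[str]]:
    """Return ("auto-accept", []) or ("manual-check", [reasons])."""
    reasons = []
    if vm.vcpus > baseline.max_vcpus:
        reasons.append(f"vCPUs {vm.vcpus} > {baseline.max_vcpus}")
    if vm.ram_gb > baseline.max_ram_gb:
        reasons.append(f"RAM {vm.ram_gb}GB > {baseline.max_ram_gb}GB")
    if vm.storage_gb > baseline.max_storage_gb:
        reasons.append(f"storage {vm.storage_gb}GB > {baseline.max_storage_gb}GB")
    return ("manual-check" if reasons else "auto-accept", reasons)


if __name__ == "__main__":
    # Example inventory rows, e.g. typed in from a vCenter export.
    for vm in (VM("web01", 2, 8, 100), VM("db01", 16, 128, 4096)):
        status, reasons = triage(vm)
        print(vm.name, status, "; ".join(reasons))
```

Returning the reasons alongside the decision means the output can be pasted straight into the notes section of a migration passport.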
Each environment is different so these are provided as example considerations only. Use resources such as those outlined below to develop your own migration strategy.
- Cloud Migration Technical Whitepaper
- Quick Reference for Cloud Migration
- VMware Cloud on AWS: Get your basics right: Part 1
- VMware Cloud on AWS: Get your basics right: Part 2: Cloud Migration
- VMware Cloud on AWS: Get your basics right Part 3: Extend on-premises infrastructure to the cloud
- VMware Validated Design for VMware Cloud on AWS: Blog, Documentation
- Ensure you check the Migrating Virtual Machines Documentation as there are considerations dependent on the migration type, for example vSphere version, hardware version, distributed switch version, maximum vmdk size, no shared vmdk files, no virtual media or ISO attached.
- Microsoft SQL Server Workloads and VMware Cloud on AWS: Design, Migration, and Configuration is aimed at migrating SQL into VMC but also contains some useful architectural and operational guidelines so is worth a read.
- Try to group services together based on your own requirements. This could be by application complexity, service tier or downtime tolerance, risk level, subnet or network requirements, etc. Some of these criteria may determine how the workload is moved: prolonged downtime, minimal downtime, or zero downtime. The links above go into detail about the different types of migration.
- If you are stretching networks using HCX make sure a plan for the full subnet is built into your migration timeline. This may require you to re-IP or move workloads elsewhere if they are not part of the migration plan.
- Consider migration paths for any physical workloads, whether that be P2V, AWS Bare Metal instances, or co-locating equipment.
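The point above about planning for the full subnet when stretching networks lends itself to a quick check: list every workload whose IP address sits in the subnet to be stretched but which is not in the migration plan. The sketch below uses Python's standard `ipaddress` module. The function name, the inventory dictionary, and the addresses are hypothetical; in practice the inventory would come from your own vCenter or IPAM export.

```python
import ipaddress
from typing import Dict, List, Set


def unplanned_in_subnet(subnet: str,
                        inventory: Dict[str, str],
                        planned: Set[str]) -> List[str]:
    """Names of workloads inside `subnet` that are not in the migration plan.

    These are the workloads that would need a re-IP or a move elsewhere
    before the gateway for the subnet is moved to the SDDC.
    """
    network = ipaddress.ip_network(subnet)
    return sorted(
        name
        for name, ip in inventory.items()
        if ipaddress.ip_address(ip) in network and name not in planned
    )


if __name__ == "__main__":
    # Hypothetical inventory: workload name -> IP address.
    inventory = {
        "app01": "10.0.10.5",
        "app02": "10.0.10.6",
        "legacy01": "10.0.10.200",
        "other01": "10.0.20.4",
    }
    planned = {"app01", "app02"}
    print(unplanned_in_subnet("10.0.10.0/24", inventory, planned))
```

Running this per subnet before scheduling a stretch gives an early list of the stragglers, including physical servers if they are in the inventory.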
2. Network Design
- Research the differences and limitations around the different connection types, especially under 1Gbps – Configuring AWS Direct Connect with VMware Cloud on AWS
Make sure you understand the terminology around a Virtual Interface (VIF) and the difference between a Standard VIF, Hosted VIF, and Hosted Connection: What’s the difference between a hosted virtual interface (VIF) and a hosted connection? It is important to consider that VMware Cloud on AWS requires a dedicated VIF, or a pair of VIFs for resilience. If you have a standard 1Gbps or 10Gbps connection direct from Amazon then you can create and allocate VIFs for this purpose. If you are using a hosted connection from an Amazon Partner Network (APN) partner for sub-1Gbps connectivity then you may need to procure additional VIFs, or a dedicated Direct Connect with the ability to have multiple VIFs on a single circuit. This is a discussion you should have with your APN partner.
- The Virtual Private Cloud (VPC) provided by the shadow AWS account cannot be used as a transit VPC. In other words if you want to connect to private IP addressing of native AWS services you cannot hop via VMware Cloud. In this instance a Transit Gateway can be used.
- At the time of writing a VPN attachment must be created to connect the SDDC to a Transit Gateway. If Direct Connect is in use then the minimum requirement is 1Gbps.
- If there is a requirement to connect multiple existing AWS VPCs, or multiple SDDCs, with on-premise networks then definitely check out VMware Cloud on AWS with Transit Gateway Demo.
- If a backup VPN is in use then you may be able to reduce failover time using Bidirectional Forwarding Detection (BFD), which is automatically enabled by AWS. In our case it was not supported by our third party provider.
- Use vRealize Network Insight to get an idea of dependencies and data flows that you can use to plan firewall rules and estimate egress or cross-AZ charges. In general my experience with these charges is that they have been minimal, but this depends entirely on your own environment.
- If you want to update your default route see How to Set the Default Route in VMware Cloud on AWS: Part 1 & Part 2.
- VMware Cloud on AWS: NSX Networking and Security eBook
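The egress estimate mentioned above is simple arithmetic once the data flows are known: sum the monthly volume of every flow that leaves the SDDC and multiply by the per-GB rate. The sketch below assumes the flows have already been summarised by hand from vRealize Network Insight. The `Flow` record, its field names, and the rate used in the example are hypothetical; the rate is a placeholder and not a real AWS price, so look up the current pricing for your region and connection type.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    gb_per_month: float
    source_in_sddc: bool        # will the source live in the SDDC after migration?
    destination_in_sddc: bool   # will the destination live in the SDDC?


def monthly_egress_gb(flows: Iterable[Flow]) -> float:
    """Total GB per month that leaves the SDDC (source inside, destination outside)."""
    return sum(
        f.gb_per_month
        for f in flows
        if f.source_in_sddc and not f.destination_in_sddc
    )


def monthly_egress_cost(flows: Iterable[Flow], rate_per_gb: float) -> float:
    """Estimated monthly charge; `rate_per_gb` must be supplied by the caller."""
    return monthly_egress_gb(flows) * rate_per_gb


if __name__ == "__main__":
    flows = [
        Flow("app01", "on-prem users", 300, True, False),   # leaves the SDDC
        Flow("app01", "db01", 900, True, True),             # stays inside
        Flow("on-prem backup", "app01", 200, False, True),  # ingress only
    ]
    placeholder_rate = 0.5  # NOT a real price, for illustration only
    print(monthly_egress_gb(flows), monthly_egress_cost(flows, placeholder_rate))
```

The same shape works for cross-AZ charges by swapping the two boolean fields for a source and destination Availability Zone and counting flows where they differ.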
3. Load Balancing & Security
- With the acquisition of Avi Networks we can expect Avi Networks services as a paid add-on for VMware Cloud: VMware Cloud on AWS: NSX and Avi Networks Load Balancing and Security.
- Third party load balancers such as virtual F5 can be deployed in virtual appliance format. If you are planning on using AWS Elastic Load Balancer (ELB) on a private IP address accessible on-premise ensure you have a connectivity method as outlined above.
- The NSX Distributed Firewall (DFW) feature is included in the price of VMware Cloud. The "paid for" message is removed from SDDC v1.8 onwards; this was announced at VMworld 2019.
- Another VMworld 2019 announcement was the inclusion of syslog forwarding with the free version of VMware Cloud Log Intelligence (SaaS offering for log analytics), although for troubleshooting NSX DFW logs you still need the paid-for version.
- If you are using HCX, this product uses its own IPSec tunnel and therefore we could not get it working with the private IP address over a backup VPN. It was assumed that HCX would also not work with the private IP address via Transit Gateway, due to the SDDC VPN requirement, and would need to be reconfigured to use the public IP address.
- Another HCX consideration is that when you are stretching a network all traffic goes via the HCX Interconnects. This means you are encapsulating everything in UDP port 4500, and essentially bypassing your on-premise firewall rules while the network is stretched. It is important to double check all rules are correct before eventually moving the gateway to VMC.
- Again if you are using HCX to migrate workloads, remember to remove stretched networks once complete. This involves shutting down the gateway on-premise, removing the L2 stretch, and changing the network in the SDDC to routed. For us the downtime was around 30 seconds. The deployment of HCX in our environment, although covered by vSphere High Availability (HA), didn’t have resilience built in. We therefore decided to minimise the amount of time it was in use by planning a migration strategy around each subnet.
- If you use NSX Service Deployments for Anti-Virus, i.e. Guest Introspection for agentless AV, then you will need to deploy an agent on each VM, as this feature is currently unavailable.
- The Cloud Services Portal (CSP) can be integrated with enterprise federation, allowing you to control access using your organisational policies. This should in turn allow you to enforce Multi-Factor Authentication (MFA) and remove access as part of a leavers process. Federation will only work with a tenant; it will not work with a master organisation.
- It is not possible at the time of writing to easily transfer an SDDC deployed in the root/master organisation into a tenant. The process currently is a redeploy and migrate.
- Druva offer a product that will back up Virtual Machines from VMware Cloud on AWS direct into an S3 bucket they manage. For a greenfield deployment, if you are not transferring any existing licenses, this could be a good option as you only pay for the capacity you use. Having a backup environment set up in AWS has many benefits but also adds a management overhead and the consideration of replicating between Availability Zones.
- In general internal support was good once teams were educated on the platform and the slightly different operating model we were implementing. In terms of external support we have not encountered any compatibility issues yet. There was one application vendor with a published KB article stating they support running the application on VMware Cloud on AWS, who then backtracked and said they wouldn’t support it as vSphere was a version not yet GA (6.8 at the time of writing).