This post pulls together notes made during a real life customer planning and deployment of VMware Cloud (VMC) on AWS (Amazon Web Services). Later on we move into the migrations of VMware Virtual Machines (VMs) from traditional on-premise vSphere infrastructure to VMware Cloud on AWS. The content is a work in progress and intended as a generic list of considerations and useful links for both VMware and AWS, and is not a comprehensive guide. Cloud, more-so than traditional infrastructure, is constantly changing. Features are implemented regularly and transparently so always validate against official documentation. This post was last updated on August 6th 2019.
Part 1: SDDC Deployment
1. Capacity Planning
You can still use existing tools or methods for basic capacity planning, you should also consult the VMware Cloud on AWS Sizer and TCO Calculator provided by VMware to help compare on-premise pricing with VMware Cloud on AWS pricing. There is a What-If Analysis built into both vRealize Business and vRealize Operations, which is similar to the sizer tool and can also help with cost comparisons. Additional key considerations are:
- Egress costs are now a thing! Use vRealize Network Insight to understand network egress costs and application topology in your current environment. Calculate AWS Egress Fees Proactively for VMware Cloud on AWS is a really useful resource and needs to be considered in your overall VMware on AWS pricing estimate.
- You do not need to factor in N+1 when planning capacity. If there is a host failure VMware will automatically add a new host to the cluster, allowing you to utilise more of the available resource.
- Export a list of Virtual Machines (VMs) from vCenter and review each VM. Contact service owners, application owners, or super users to understand if there is still a requirement for the machine and what it is used for. This ties in to the migration planning piece but crucially allows you to better understand capacity requirements. Most environments have VM sprawl and identifying services that are either obsolete, moved to managed services, or were simply test machines no longer required will clearly reduce capacity requirements.
- Consider you are now on a ‘metered’ charging model, so don’t set the meter going; in other words don’t deploy the SDDC, until you are ready to start using the platform. Common sense, but internal service reviews or service acceptance and approvals can take longer than expected.
- You can make savings using reserved instances, by committing to 1 or 3 years. Pay as you go pricing may be sufficient for evaluation or test workloads, but for production workloads it is much more cost effective to use reserved instances.
- At the time of writing up to 2 SDDC’s can be deployed per organisation (soft limit), each SDDC supporting up to 20 vSphere clusters and each cluster up to 16 physical nodes.
- The standard i3 bare metal instance currently offers 2 sockets, 36 cores, 512 GiB RAM, 10.7 TB vSAN storage, a 16-node cluster provides 32 sockets, 576 cores, 8192 GiB RAM, 171.2 TB.
- New R5 bare metal instances are deployed with 2.5 GHz Intel Platinum 8000 series (Skylake-SP) processors; 2 sockets, 48 cores, 768 GiB RAM and AWS Elastic Block Storage (EBS) backed capacity scaling up to 105 TB for 3-node resources and 560 TB for 16-node resources. For up to date configuration maximums see Configuration Maximums for VMware Cloud on AWS.
2. Placement and Availability
Ultimately placement of your SDDC is going to be driven by specific use cases, and any regulations for the data type you are hosting. How VMware is Accelerating NHS Cloud Adoption uses the UK National Health Service (NHS) and Information Governance as an example. Additional placement and availability considerations are:
- An SDDC can be deployed to a single Availability Zone (AZ) or across multiple AZ’s, otherwise known as a stretched cluster. For either configuration if a problem is identified with a host in the cluster High Availability (HA) evacuation takes place as normal, an additional host is then automatically provisioned and added as a replacement.
- The recommendation for workload availability is to use a stretched cluster which distributes workloads across 2 Availability Zones with a third hosting a witness node. In this setup data is written to both Availability Zones in an active active setup. In the event of an outage to an entire Availability Zone vSphere HA brings virtual machines back online in the alternative AZ: VMware Cloud on AWS Stretched Cluster Failover Demo.
- Stretched clusters have an SLA Availability Commitment of 99.99% (99.9% for single AZ), and provide a Recovery Point Objective (RPO ) of zero by using synchronous data replication. Note that there are additional cross-AZ charges for stretched clusters. The Recovery Time Objective (RTO) is a vSphere HA failover, usually sub 5 minutes.
- The decision on whether to use single or multiple Availability Zones needs to be taken at the time of deployment. An existing SDDC cannot be upgraded to multi-AZ or downgraded to a single AZ.
- An Elastic Network Interface (ENI) dedicated to each physical host connects the VMware Cloud to the corresponding Availability Zone in the native AWS Virtual Private Cloud (VPC). There is no charge for data crossing the 25 Gbps ENI between the VMware Cloud VPC and the native AWS VPC.
- Data that crosses Availability Zones is chargeable, therefore it is good practise to deploy the SDDC to the same region and AZ as your current or planned native AWS services.
3. Networks and Connectivity
- VMware Cloud on AWS links with your existing AWS account to provide access to native services. During provisioning a Cloud Formation template will grant AWS permissions using the Identity Access Management (IAM) service. This allows your VMC account to create and manage Elastic Network Interfaces (ENIs) as well as auto-populate Virtual Private Cloud (VPC) route tables when NSX subnets are created.
- It is good practise to enable Multi-Factor Authentication (MFA) for your accounts in both VMC and AWS. VMware Cloud can also use Federated Identity Management, for example with Azure AD. This currently needs to be facilitated by your VMware Customer Success team, but once setup means you can control accounts using Active Directory and enforce MFA or follow your existing user account policies.
- It is important to ensure proper planning of your IP addressing scheme, if the IP range used overlaps with anything on-premise or in AWS then routes will not be properly distributed and the SDDC needs destroying and reinstalling with an updated subnet to resolve.
- You will need to allocate a CIDR block for SDDC management, as well as network segments for your SDDC compute workloads to use. Review Selecting IP Subnets for your SDDC for assistance with selecting IP subnets for your VMC environment.
- Connectivity to the SDDC can be achieved using either AWS Direct Connect (DX) or VPN, see Connectivity Options for VMware Cloud on AWS Software Defined Data Centers. From SDDC v1.7 onwards it is possible to use DX with a backup VPN for resilience.
- Traffic between VMC and your native AWS VPC is handled by the 25 Gbps Elastic Network Interfaces (ENI) referenced in the section above. To connect to additional VPCs or accounts you can setup an IPsec VPN. The Amazon Transit Gateway feature is available for some regions and configurations, if you are using DX then the minimum requirement is 1Gbps.
- Access to native AWS services needs to be setup on the VMC Gateway Firewall, for example: Connecting VMware Cloud on AWS to EC2 Instances, as well as Amazon security groups; this is explained in How AWS Security Groups Work With VMware Cloud on AWS.
- To migrate virtual machines from your on-premise data centre review Hybrid Linked Mode Prerequisites and vMotion across hybrid cloud: performance and best practices. In addition you will need to know the Required Firewall Rules for vMotion and for Cold Migration.
- For virtual machines to keep the same IP addressing layer 2 networks can be stretched with HCX, review VMware HCX Migration Documentation. HCX is included with VMC licensing but is a separate product in its own right so should be planned accordingly and is not covered in this post. Review VMware Cloud on AWS Live Migration Demo to see a VMware HCX Migration in action.
- VMware Cloud on AWS: Internet Access and Design Deep Dive is a useful resource for considering virtual machines that may require internet access.
4. Operational Readiness
The SDDC is deployed but before you can start migrating virtual machines to VMware on AWS you need to make sure the platform is fully operational. There are some key aspects but in general make sure you cover everything you do currently on premise:
- You will likely still have a need for Active Directory, DNS, DHCP, and time synchronisation. Either use native cloud services, or build new Domain Controllers for example in VMC.
- If you have a stretched-cluster and build Domain Controllers, or other management servers, consider building these components in each Availability Zone, then using compute policies to control the virtual machine placement. This is similar to anti-affinity rules on-premise, see VMware Cloud on AWS Compute Policies for more information.
- Remember Disaster Recovery (DR) still needs to be factored in. DR as a Service (DRaaS) is offered through Site Recovery Manager (SRM) between regions in the cloud or on-premise. A stretched-cluster may be sufficient but again, this is dependent on the organisation or service requirements.
- Anti-Virus, monitoring, and patching (OS / application) solutions need to be implemented. Depending on your licensing model you should be able to continue using the same products and tool-set, and carry the license over, but check with the appropriate vendor. Also start thinking about integrating cloud monitoring and management where applicable.
- VMware Cloud Log Intelligence is a SaaS offering for log analytics, it can forward to an existing syslog solution or integrate with AWS CloudTrail.
- Backups are still a crucial part of VMware Cloud on AWS and it is entirely the customers responsibility to ensure backups are in place. Unless you have a specific use case to backup machines from VMware Cloud to on-premise, it probably makes sense to move or implement backup tooling in the cloud, for example using Veeam in Native AWS.
- Perform full backups initially to create a new baseline. Try native cloud backup products that will backup straight to S3, or continue with traditional backup methods that connect into vCenter. The reference architecture below uses Elastic Block Storage (EBS) backed Elastic Compute Cloud (EC2) instances running Veeam as a backup solution, then archiving out to Simple Storage Services (S3). Druva are able to backup straight to S3 from VMC. Veeam are also constantly updating functionality so as mentioned at the start of the post this setup may not stay up to date for long:
- Customers must be aware of the shared security model that exists between: VMware; delivering the service, Amazon Web Services (the IaaS provider); delivering the underlying infrastructure, and customers; consuming the service.
- VMware Cloud on AWS meets a number of security standards such as NIST, ISO, and CIS. You can review VMware’s security commitments in the VMware Cloud Services on AWS Security Overview.
- When using native AWS services you must always follow Secure by Design principals to make sure you are not leaving the environment open or vulnerable to attack.