
VMware Cloud on AWS Deployment Planning

This post pulls together the notes I have made during the planning of a VMware Cloud (VMC) on AWS (Amazon Web Services) deployment, and the migration of virtual machines from traditional on-premises vSphere infrastructure. It is intended as a generic list of considerations and useful links, not a comprehensive guide. Cloud, more so than traditional infrastructure, is constantly changing; features are implemented regularly and transparently, so always validate against official documentation. This post was last updated on August 6th 2019.

Part 1: SDDC Deployment

1. Capacity Planning

You can still use existing tools or methods for basic capacity planning, but you should also consult the VMware Cloud on AWS Sizer and TCO Calculator provided by VMware. There is a What-If Analysis built into both vRealize Business and vRealize Operations, which is similar to the sizer tool and can also help with cost comparisons. Additional key considerations are:

  • Egress costs are now a thing! Use vRealize Network Insight to understand network egress costs and application topology in your current environment. Calculate AWS Egress Fees Proactively for VMware Cloud on AWS is a really useful resource.
  • You do not need to factor in N+1 when planning capacity. If there is a host failure VMware will automatically add a new host to the cluster, allowing you to utilise more of the available resource.
  • Export a list of Virtual Machines (VMs) from vCenter and review each VM; a PowerCLI sketch for this is included after this list. Contact service owners, application owners, or super users to understand whether there is still a requirement for the machine and what it is used for. This ties in to the migration planning piece, but crucially allows you to better understand capacity requirements. Most environments have VM sprawl, and identifying services that are obsolete, have moved to managed services, or were simply test machines no longer required will clearly reduce capacity requirements.
  • Remember that you are now on a ‘metered’ charging model, so don’t set the meter running; in other words, don’t deploy the SDDC until you are ready to start using the platform. Common sense, but internal service reviews or service acceptance and approvals can take longer than expected.
  • You can make savings using reserved instances, by committing to 1 or 3 years. Pay as you go pricing may be sufficient for evaluation or test workloads, but for production workloads it is much more cost effective to use reserved instances.
  • At the time of writing, up to 2 SDDCs can be deployed per organisation (a soft limit), with each SDDC supporting up to 20 vSphere clusters and each cluster up to 16 physical nodes.
  • The standard i3 bare metal instance currently offers 2 sockets, 36 cores, 512 GiB RAM, and 10.7 TB vSAN storage; a 16-node cluster therefore provides 32 sockets, 576 cores, 8192 GiB RAM, and 171.2 TB of storage.
  • New R5 bare metal instances are deployed with 2.5 GHz Intel Xeon Platinum 8000 series (Skylake-SP) processors; 2 sockets, 48 cores, 768 GiB RAM and AWS Elastic Block Store (EBS) backed capacity scaling up to 105 TB for 3-node clusters and 560 TB for 16-node clusters. For up to date configuration maximums see Configuration Maximums for VMware Cloud on AWS.
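As mentioned in the VM review point above, the inventory export can be scripted with PowerCLI. The snippet below is a minimal sketch, assuming PowerCLI is installed and that the vCenter name and output path (vcenter.lab.local, vm-inventory.csv) are placeholders for your own environment.

```powershell
# Connect to the existing on-premises vCenter Server (hypothetical FQDN)
Connect-VIServer -Server vcenter.lab.local

# Export a basic inventory with sizing data for service and application owner review
Get-VM |
    Select-Object Name, PowerState, NumCpu, MemoryGB,
        @{N = 'ProvisionedGB'; E = { [math]::Round($_.ProvisionedSpaceGB, 1) } },
        @{N = 'UsedGB';        E = { [math]::Round($_.UsedSpaceGB, 1) } },
        @{N = 'GuestOS';       E = { $_.Guest.OSFullName } } |
    Sort-Object Name |
    Export-Csv -Path .\vm-inventory.csv -NoTypeInformation
```

The resulting CSV is a useful starting point for marking each VM as migrate, retire, or re-platform before sizing the SDDC.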

2. Placement and Availability

Ultimately placement of your SDDC is going to be driven by specific use cases, and any regulations for the data type you are hosting. How VMware is Accelerating NHS Cloud Adoption uses the UK National Health Service (NHS) and Information Governance as an example. Additional placement and availability considerations are:

  • An SDDC can be deployed to a single Availability Zone (AZ) or across multiple AZs, otherwise known as a stretched cluster. For either configuration, if a problem is identified with a host in the cluster, High Availability (HA) evacuation takes place as normal; an additional host is then automatically provisioned and added as a replacement.
  • The recommendation for workload availability is to use a stretched cluster, which distributes workloads across 2 Availability Zones with a third hosting a witness node. In this configuration data is written to both Availability Zones in an active-active setup. In the event of an outage of an entire Availability Zone, vSphere HA brings virtual machines back online in the alternative AZ: VMware Cloud on AWS Stretched Cluster Failover Demo.
  • Stretched clusters have an SLA Availability Commitment of 99.99% (99.9% for single AZ), and provide a Recovery Point Objective (RPO) of zero by using synchronous data replication. Note that there are additional cross-AZ charges for stretched clusters. The Recovery Time Objective (RTO) is a vSphere HA failover, usually sub 5 minutes.
  • The decision on whether to use single or multiple Availability Zones needs to be taken at the time of deployment. An existing SDDC cannot be upgraded to multi-AZ or downgraded to a single AZ.
  • An Elastic Network Interface (ENI) dedicated to each physical host connects the VMware Cloud to the corresponding Availability Zone in the native AWS Virtual Private Cloud (VPC). There is no charge for data crossing the 25 Gbps ENI between the VMware Cloud VPC and the native AWS VPC.
  • Data that crosses Availability Zones is chargeable; it is therefore good practice to deploy the SDDC to the same region and AZ as your current or planned native AWS services.

3. Networks and Connectivity

  • VMware Cloud on AWS links with your existing AWS account to provide access to native services. During provisioning a CloudFormation template will grant AWS permissions using the Identity and Access Management (IAM) service. This allows your VMC account to create and manage Elastic Network Interfaces (ENIs) as well as auto-populate Virtual Private Cloud (VPC) route tables when NSX subnets are created.
  • It is good practice to enable Multi-Factor Authentication (MFA) for your accounts in both VMC and AWS. VMware Cloud can also use Federated Identity Management, for example with Azure AD. This currently needs to be facilitated by your VMware Customer Success team, but once set up it means you can control accounts using Active Directory and enforce MFA or follow your existing user account policies.
  • It is important to plan your IP addressing scheme properly. If the IP range used overlaps with anything on-premises or in AWS, routes will not be properly distributed, and the SDDC will need to be destroyed and redeployed with an updated subnet to resolve the issue (a quick overlap check is sketched after this list).
  • You will need to allocate a CIDR block for SDDC management, as well as network segments for your SDDC compute workloads to use. Review Selecting IP Subnets for your SDDC for assistance with choosing suitable ranges for your VMC environment.
  • Connectivity to the SDDC can be achieved using either AWS Direct Connect (DX) or VPN, see Connectivity Options for VMware Cloud on AWS Software Defined Data Centers. From SDDC v1.7 onwards it is possible to use DX with a backup VPN for resilience.
  • Traffic between VMC and your native AWS VPC is handled by the 25 Gbps Elastic Network Interfaces (ENIs) referenced in the section above. To connect to additional VPCs or accounts you can set up an IPsec VPN. The Amazon Transit Gateway feature is available for some regions and configurations; if you are using DX, the minimum requirement is 1 Gbps.
  • Access to native AWS services needs to be setup on the VMC Gateway Firewall, for example: Connecting VMware Cloud on AWS to EC2 Instances, as well as Amazon security groups; this is explained in How AWS Security Groups Work With VMware Cloud on AWS.
  • To migrate virtual machines from your on-premises data centre review Hybrid Linked Mode Prerequisites and vMotion across hybrid cloud: performance and best practices. In addition you will need to know the Required Firewall Rules for vMotion and for Cold Migration.
  • For virtual machines to keep the same IP addressing, layer 2 networks can be stretched with HCX; review the VMware HCX Documentation. HCX is included with VMC licensing but is a separate product in its own right, so it should be planned accordingly and is not covered in this post. Review the VMware Cloud on AWS Live Migration Demo to see HCX in action.
  • VMware Cloud on AWS: Internet Access and Design Deep Dive is a useful resource for considering virtual machines that may require internet access.
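A quick way to sanity-check a proposed management CIDR against existing on-premises and AWS ranges (as mentioned in the IP addressing point above) is to script a simple overlap test. This is a rough PowerShell sketch; the functions and all of the ranges shown are hypothetical examples, not VMware tooling.

```powershell
# Convert an IPv4 CIDR block into its first and last addresses as integers
function Get-CidrRange {
    param([string]$Cidr)
    $ip, $prefix = $Cidr.Split('/')
    $bytes = ([System.Net.IPAddress]::Parse($ip)).GetAddressBytes()
    [array]::Reverse($bytes)                          # network order to host order
    $start = [BitConverter]::ToUInt32($bytes, 0)
    $end   = $start + [math]::Pow(2, 32 - [int]$prefix) - 1
    [pscustomobject]@{ Start = $start; End = $end }
}

# Returns $true if two CIDR blocks overlap
function Test-CidrOverlap {
    param([string]$CidrA, [string]$CidrB)
    $a = Get-CidrRange $CidrA
    $b = Get-CidrRange $CidrB
    ($a.Start -le $b.End) -and ($b.Start -le $a.End)
}

# Example: proposed SDDC management CIDR checked against existing ranges
$proposed = '10.2.0.0/16'
'10.0.0.0/16', '10.1.0.0/16', '172.16.0.0/12' | ForEach-Object {
    [pscustomobject]@{
        Existing = $_
        Proposed = $proposed
        Overlaps = (Test-CidrOverlap $proposed $_)
    }
}
```

Anything flagged as overlapping should be resolved before the SDDC is deployed, because changing the management CIDR later means destroying and redeploying the SDDC.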

4. Operational Readiness

The SDDC is deployed, but before you can start migrating virtual machines you need to make sure the platform is fully operational. There are some key aspects below, but in general make sure you cover everything you currently do on-premises:

  • You will likely still have a need for Active Directory, DNS, DHCP, and time synchronisation. Either use native cloud services or, for example, build new Domain Controllers in VMC.
  • If you have a stretched cluster and build Domain Controllers, or other management servers, consider building these components in each Availability Zone, then using compute policies to control the virtual machine placement. This is similar to anti-affinity rules on-premises; see VMware Cloud on AWS Compute Policies for more information, and the tagging sketch at the end of this section.
  • Remember Disaster Recovery (DR) still needs to be factored in. DR as a Service (DRaaS) is offered through Site Recovery Manager (SRM) between regions in the cloud or on-premise. A stretched-cluster may be sufficient but again, this is dependent on the organisation or service requirements.
  • Anti-Virus, monitoring, and patching (OS / application) solutions need to be implemented. Depending on your licensing model you should be able to continue using the same products and tool-set, and carry the license over, but check with the appropriate vendor. Also start thinking about integrating cloud monitoring and management where applicable.
  • VMware Cloud Log Intelligence is a SaaS offering for log analytics; it can forward to an existing syslog solution or integrate with AWS CloudTrail.
  • Backups are still a crucial part of VMware Cloud on AWS and it is entirely the customer's responsibility to ensure backups are in place. Unless you have a specific use case to back up machines from VMware Cloud to on-premises, it probably makes sense to move or implement backup tooling in the cloud, for example using Veeam in native AWS.
  • Perform full backups initially to create a new baseline. Try native cloud backup products that back up straight to S3, or continue with traditional backup methods that connect into vCenter. The reference architecture below uses Elastic Block Store (EBS) backed Elastic Compute Cloud (EC2) instances running Veeam as a backup solution, then archiving out to the Simple Storage Service (S3). Druva are able to back up straight to S3 from VMC. Veeam are also constantly updating functionality, so as mentioned at the start of the post this setup may not stay up to date for long:

vmc_aws.png

  • Customers must be aware of the shared security model that exists between VMware, delivering the service; Amazon Web Services (the IaaS provider), delivering the underlying infrastructure; and customers, consuming the service.
  • VMware Cloud on AWS meets a number of security standards such as NIST, ISO, and CIS. You can review VMware’s security commitments in the VMware Cloud Services on AWS Security Overview.
  • When using native AWS services you must always follow Secure by Design principles to make sure you are not leaving the environment open or vulnerable to attack.
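On the compute policies point above: policies in VMware Cloud on AWS are driven by vSphere tags assigned to the virtual machines they apply to. The policy itself is created in the vSphere Client, but the tagging can be scripted; below is a minimal PowerCLI sketch where the category, tag, and Domain Controller names are all hypothetical.

```powershell
# Create a tag category and tag to be referenced by the compute policy (hypothetical names)
$category = New-TagCategory -Name 'compute-policy' -Cardinality Multiple -EntityType VirtualMachine
$tag      = New-Tag -Name 'domain-controllers' -Category $category

# Assign the tag to the Domain Controllers so the policy can keep them apart
Get-VM -Name 'dc01', 'dc02' | ForEach-Object {
    New-TagAssignment -Tag $tag -Entity $_
}
```

With the tag in place, an anti-affinity compute policy referencing it can then be created from the Policies and Profiles view in the VMC vSphere Client.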

Part 2 of this post will cover the planning and migration of virtual machine workloads.

Additional resources: VMware Cloud On AWS On-Boarding Handbook | VMware Cloud on AWS Operating Principles | Resources | Documentation | Factbook

VMware Cloud on AWS Videos | YouTube Playlists | VMworld 2018 Recorded Sessions

vRealize Operations 6.4 Install Guide

The vRealize product suite is a complete enterprise cloud management and automation platform for private, public, and hybrid clouds. Specifically, vRealize Operations Manager provides intelligent operations management across heterogeneous physical, virtual, and cloud environments from a wide range of vendors. vRealize Operations Manager is able to deliver proactive and automated performance improvements by implementing resource reclamation, configuration standardisation, workload placement, planning, and forecasting techniques. By leveraging vRealize Operations Manager users can protect their environment from outages with preventative and predictive analytics and monitoring across the estate, utilising management packs to unify operations management. The image below is taken from the vRealize Operations Manager datasheet.

vro

vRealize Operations Manager can be deployed as a single node cluster, or a multiple node cluster. In single node cluster environments the master node is deployed with adapters installed which collect data and perform analysis. For larger environments additional data nodes can be added to scale out the solution, these are known as multiple node clusters. In a multiple node cluster the master node is responsible for the management of all other nodes. Data nodes handle data collection and analysis. High availability can be achieved by converting a data node into a replica of the master node. For distributed environments remote collector nodes are deployed to gather inventory objects and navigate firewalls in remote locations. These nodes do not store data or perform analytics; you can read more about remote collector nodes here. In this post we will deploy a single node cluster for small environments, proof of concept, test, or lab purposes, and link it to a vCenter Server instance. There will also be references to larger deployments and scaling out the application throughout the guide. If you have already deployed your vRealize cluster and want to add additional nodes or configure High Availability click here.

Licensing is split into 3 editions: standard, advanced, and enterprise. To view the full feature list of the different editions see the vRealize Operations page. There are a number of VMware product suites bundling vRealize Operations, or it can be purchased standalone. Licensing is allocated in portable license units (vCloud Suite and vRealize Suite only), per processor with unlimited VMs, or in packs of 25 VMs (or OS instances).

Design Considerations

  • Additional data nodes can be added at any time using the Expand an Existing Installation option.
  • When scaling out the cluster by 25% or more the cluster should be restarted to optimise performance.
  • The master node must be online before any other nodes are brought online (except for when adding nodes at first setup of the cluster).
  • When adding additional data nodes keep in mind the following:
    • All nodes must be running the same version
    • All nodes must use the same deployment type, i.e. virtual appliance, Windows, or Linux.
    • All nodes must be sized the same in terms of CPU, memory, and disk.
    • Nodes can be in different vSphere clusters, but must be in the same physical location and subnet.
    • Time must be synchronised across all nodes.
  • These rules also apply to replica nodes. Click here to see a full list of multiple node cluster requirements.
  • Remote collector nodes can be deployed to remote locations to gather objects for monitoring. These nodes do not store data or perform any analytics but connect remote data sources to the analytics cluster whilst reducing bandwidth and providing firewall navigation. Read more about remote collector nodes here.
  • When designing a larger vROps environment check the Environment Complexity guide to determine if you should engage VMware Professional Services. You should also review the following documentation:

Requirements

  • The vRealize Operations Manager virtual appliance can be deployed to hosts running ESXi 5.1 U3 or later, and requires vCenter Server 5.1 U3 or later (it is recommended that vSphere 5.5 or later is used).
  • The virtual appliance is the preferred deployment method; Windows and Linux installers are also available, however the Windows installer will no longer be offered after v6.4, and end of life for the Linux installer is also imminent.
  • A static IP address must be used for each node (to change the IP after deployment see this kb).
  • Review the list of Network Ports used by vRealize Operations Manager.
  • The following table is from the vRealize Operations Manager Sizing Guide and lists the hardware requirements, latency, and configuration maximums.

sizing

Installation

Download vRealize Operations Manager here, in virtual appliance, Windows, or Linux formats. Try for free with hands on labs or a 60 day trial here.

In this example we are going to deploy as an appliance. Navigate to the vSphere web client home page, click vRealize Operations Manager and select Deploy vRealize Operations Manager.

vro1

The OVF template wizard will open. Browse to the location of the OVA file we downloaded earlier and click Next.

vro2

Enter a name for the virtual appliance, and select a location. Click Next.

vro3

Select the host or cluster compute resources for the virtual appliance and click Next.

vro4

Review the details of the OVA, click Next.

vro5

Accept the EULA and click Next.

vro6

Select the configuration size based on the considerations listed above, then click Next.

vra7

Select the storage for the virtual appliance, click Next.

vra8

Select the network for the virtual appliance, click Next.

vra9

Configure the virtual appliance network settings, click Next.

vra10

Click Finish on the final screen to begin deploying the virtual appliance.

vra11
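As an alternative to stepping through the wizard, the same OVA can be deployed with PowerCLI. This is a hedged sketch: the OVA path, host, datastore, and appliance name are placeholders, and the configurable OVF properties vary by appliance version, so review them with ToHashTable() before setting any values.

```powershell
# Placeholders: adjust the OVA path, target host, and datastore for your environment
$ova    = 'C:\Downloads\vRealize-Operations-Manager-Appliance-6.4.0.ova'
$vmHost = Get-VMHost -Name 'esxi01.lab.local'
$ds     = Get-Datastore -Name 'Datastore1'

# Load the OVA's configurable properties (deployment size, network settings) and review them
$ovfConfig = Get-OvfConfiguration -Ovf $ova
$ovfConfig.ToHashTable()

# Deploy the appliance; thin provisioning is usually fine for lab or proof of concept use
Import-VApp -Source $ova -OvfConfiguration $ovfConfig -Name 'vrops-01' `
    -VMHost $vmHost -Datastore $ds -DiskStorageFormat Thin
```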

Setup

Once the virtual appliance has been deployed and is powered on, open a web browser to the FQDN or IP address configured during deployment. Select New Installation.

install1

Click Next to begin the setup wizard.

install2

Configure a password for the admin account and click Next.

install3

On the certificate page select either the default certificates or custom. For assistance with adding custom certificates click here.

install4

Enter the host name for the master node and an NTP server, click Next.

install5

Click Finish.

install6

If required you can add additional data nodes before starting the cluster, or add them at a later date. See the Design Considerations section of this post before scaling out. To add additional data nodes or configure High Availability follow the steps at vRealize Operations High Availability before starting the cluster. Alternatively, you can start the cluster as a single node cluster and add data nodes or High Availability at a later date.

Since we are deploying a single node cluster we will now click Start vRealize Operations Manager. Depending on the size of the cluster it may take 10-30 minutes to fully start up.

install7

Confirm that the cluster has adequate nodes for the environment and click Yes to start up the application.

install8

After the cluster has started you will be diverted to the user interface. Log in with the admin details configured earlier.

install9

The configuration wizard will automatically start, click Next.

install10

Accept the EULA and click Next.

install11

Enter the license key or use the 60 day product evaluation. Click Next.

install12

Select whether or not to join the VMware Customer Experience Improvement Program and click Next.

install13

Click Finish.

install14

The vRealize Operations Manager dashboard will be loaded. The installation process is now complete. The admin console can be accessed by browsing to https://<IP-or-FQDN>/admin, where <IP-or-FQDN> is the IP address or FQDN of your vRealize Operations Manager appliance or server.

install15

To add additional data nodes or configure High Availability see the vRealize Operations High Availability post.

Post Installation

After first setup we need to secure the console by creating a root account. Browse to the vROps appliance in vSphere and open the console. Press ALT + F1 and log in as root. You will be prompted to create a root password. All other work in this post is carried out using the vRealize Operations web interface.

The vRealize Operations web interface can be accessed by browsing to the IP address or FQDN of any node in the vRealize Operations management cluster (master node or replica node). During the installation process the admin interface is presented; after installation the IP address or FQDN resolves to the user interface. To access the admin interface browse to https://<IP-or-FQDN>/admin, where <IP-or-FQDN> is the IP address or FQDN of either node in the management cluster. For supported browsers see the vRealize Operations Manager 6.4 Release Notes.

The next step is to configure the vCenter Adapter to collect and analyse data. Select Administration from the left hand navigation pane. From the Solutions menu select VMware vSphere and click the Configure icon.

config1

Enter the vCenter Server details and credentials with administrator access.

config2

Click Test Connection to validate connectivity to the vCenter Server.

config3

Expand Advanced Settings and review the default settings; these can be changed if required. Click Define Monitoring Goals and review the default policy; again, this can be changed to suit your environment.

config4

When you’re ready click Save Settings and Close. The vCenter adapter will now begin collecting data. Collection cycles begin every 5 minutes; depending on the size of your environment, the initial collection may take more than one cycle.

config5

Once data has been collected from the vCenter Server go back to the Home page and browse the different tabs and dashboards.

dashboard

Customise your vRealize Operations Manager instance to suit your environment using the official VMware guides.
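Once the vCenter adapter has completed a few collection cycles, the collected data can also be queried outside the UI with the PowerCLI vRealize Operations cmdlets. A minimal sketch, assuming a hypothetical vROps FQDN and VM name, and the standard vSphere adapter CPU metric key:

```powershell
# Connect to the vRealize Operations node (hypothetical FQDN and account)
Connect-OMServer -Server vrops.lab.local -User admin

# Pull the last 24 hours of average CPU usage for a monitored virtual machine
$resource = Get-OMResource -Name 'app-server-01'
Get-OMStat -Resource $resource -Key 'cpu|usage_average' -From (Get-Date).AddDays(-1) |
    Select-Object Time, Value
```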

Windows 2016 Storage Spaces Direct

Storage Spaces Direct for Windows Server 2016 is a software defined storage solution providing pooled storage resources across industry standard servers with attached local drives. Storage Spaces Direct (S2D) is able to provide scalability, built-in fault tolerance, resource efficiency, high performance, simplified management, and cost savings.

Storage Spaces Direct is a feature included at no extra cost with the Datacentre edition of Windows Server 2016. S2D can be deployed across Windows clusters comprising between 2 and 16 physical servers, with over 400 drives, using the Software Storage Bus to establish a software-defined storage fabric spanning the cluster. Existing clusters can be scaled out by simply adding more drives, or more servers, to the cluster. Storage Spaces Direct will automatically detect additional resources and absorb these drives into the pool, redistributing existing volumes. Resiliency is provided not only across drives, components, and servers, but can also be configured for chassis, rack, and site fault tolerance by creating fault domains to which data placement will comply. The video below provided by Microsoft goes into more detail about fault domains and how they provide resiliency.

Furthermore, volumes can be configured to use mirror resiliency or parity resiliency to protect data. Using mirror resiliency provides resiliency to drive and server failures by storing a default of 3 copies across different drives in different servers. This is a simple deployment with minimal CPU overhead, but a relatively inefficient use of storage. Alternatively we can use parity resiliency, where parity symbols are spread across a larger set of data symbols to provide both drive and server resiliency, as well as a more efficient use of storage resources (this requires 4 physical servers). You can learn more about both these methods at the Volume Resiliency blog by Microsoft.

The main use case for Storage Spaces Direct is a private cloud (either on or off-premises) using one of two deployment models. The first is hyper-converged, where compute and storage reside on the same servers; in this use case virtual machines sit directly on top of the volumes provided by S2D. The second is a converged, or private cloud storage, deployment, where S2D is disaggregated from the hypervisor, providing a separate storage cluster for larger-scale deployments such as IaaS (Infrastructure as a Service). Here a SoFS (Scale-out File Server) is built on S2D to provide network-attached storage over SMB3 file shares.

Storage Spaces Direct is configured using a number of PowerShell cmdlets, and utilises Failover Clustering and Cluster Shared Volumes; a condensed sketch of the cmdlets involved follows the requirements below. For full instructions on enabling and configuring S2D see Configuring Storage Spaces Direct – Step by Step, Robert Keith, Argon Systems. The requirements are as follows:

  • Windows Server 2016 Datacentre Edition.
  • Minimum of 2 servers, maximum of 16, with local-attached SATA, SAS, or NVMe drives.
  • Each server must have at least 2 solid-state drives plus at least 4 additional drives; the read/write cache uses the fastest media present by default.
  • The SATA and SAS devices should be behind an HBA and SAS expander.
  • Storage Spaces Direct uses SMB3, including SMB Direct and SMB Multichannel, over Ethernet to communicate between servers. 10 GbE or above is recommended for optimum performance.
  • All hardware must support SMB (Server Message Block) and RDMA (Remote Direct Memory Access).

s2ddeployments
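As referenced above, S2D enablement is PowerShell driven. The condensed sketch below shows the typical flow on two hypothetical nodes (S2D-01 and S2D-02) and is not a substitute for the full step-by-step guide linked earlier.

```powershell
# Run from one of the cluster nodes (or remotely using -CimSession)

# Validate the nodes, including the Storage Spaces Direct test category
Test-Cluster -Node 'S2D-01', 'S2D-02' -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'

# Create the failover cluster with no clustered storage (hypothetical name and address)
New-Cluster -Name 'S2D-CLU01' -Node 'S2D-01', 'S2D-02' -NoStorage -StaticAddress 192.168.10.50

# Enable Storage Spaces Direct; eligible local drives are claimed into the pool and the cache is configured automatically
Enable-ClusterStorageSpacesDirect

# Create a mirrored, ReFS-formatted Cluster Shared Volume from the pool
New-Volume -StoragePoolFriendlyName 'S2D*' -FriendlyName 'Volume01' -FileSystem CSVFS_ReFS -Size 1TB
```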

Deploying EMC Unity VSA

The EMC Unity product line is a mid-range storage platform built completely from the ground up as an eventual replacement for most VNX and VNXe use cases. The Unity virtual storage appliance is a software-defined storage platform bringing the software intelligence of Unity arrays to your existing storage infrastructure.

The Unity VSA is ideal for remote office and branch office (ROBO) deployments, as well as hardware consolidation and IT staging and testing. It comes in a free 4 TB community edition and a subscription-based professional edition which seamlessly scales up from 10 TB to 20 or 50 TB. The virtual storage appliance includes all the features of the Unity range, such as replication, data protection snapshots, FAST VP auto-tiering, and more.

See also EMC Unity Setup Guide, which covers a walkthrough on the setup of a physical Unity array.

vsa

Key features

  • Affordable software defined solution
  • Deploy to your existing storage infrastructure
  • Quick and easy setup of CIFS, NFS and iSCSI
  • Unified block, file and VMware VVOLs support
  • Allows VMware administrators to manage storage from vCenter
  • HTML5-enabled Unisphere management
  • Manage virtual storage and physical arrays together

Requirements

  • ESXi 5.5 or later (must be ESXi 6.0 or later for VVOLs)
  • The use of VMware vCenter Server to manage ESXi is optional but recommended
  • The Unity VSA requires 2 vCPU, 12 GB RAM and 6 NICs (4 ports for I/O, 1 for Unisphere, 1 for system use)

If you are deploying the Unity VSA in a production environment then you should consider how the data is stored across your existing hardware, ensuring RAID and HA are configured appropriately. If you are presenting VMware datastores or virtual volumes then contact EMC support for best practices around the VMware vStorage APIs for Array Integration (VAAI) and vStorage APIs for Storage Awareness (VASA).

Deploying Unity VSA

Download the OVA file from https://www.emc.com/products-solutions/trial-software-download/unity-vsa.htm and deploy the OVA to vSphere. Accept the extra configuration options; this just disables time synchronisation of the virtual machine, as time is controlled from within the appliance.

ovf1

The only customisation settings required are the system name and network settings.

ovf2

Once the appliance has been deployed, right-click the virtual machine and select Edit Settings. Add the virtual hard disks required for the file systems on your virtual appliance; this can be done later, but you will not be able to create any storage pools until additional disks are added. Note that virtual hard disks 1 – 3 are for system use and should not be modified.
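Adding the extra virtual disks can also be scripted with PowerCLI. A minimal sketch, assuming the appliance VM is named UnityVSA and that three 100 GB disks are wanted for the first pool:

```powershell
# Hypothetical VM name and disk sizes; adjust for your environment
$vsa = Get-VM -Name 'UnityVSA'

# Add three 100 GB virtual disks for use in the first storage pool
1..3 | ForEach-Object {
    New-HardDisk -VM $vsa -CapacityGB 100 -StorageFormat Thick
}
```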

Power on the appliance; when it has fully booted, browse to the IP address configured during the OVF deployment process. Log in with the default user of admin and password Password123#.

vsa1

The Unisphere configuration wizard will auto start, click Next.

vsa2

Accept the license agreement and click Next.

vsa3

Configure the admin and service passwords, click Next.

vsa4

Obtain a license key from https://www.emc.com/auth/elmeval.htm and click Install License to upload the .lic file, click Next.

vsa5

Configure the DNS servers and click Next.

vsa6

Configure the NTP servers and click Next.

vsa7

You can create a pool now or later. To create a storage pool now click Create Pools. Unisphere scans for virtual disks available to the VM that can be used for a storage pool. Once the storage pool has been created click Next.

vsa8

Configure the SMTP server and recipients for email alerts, click Next.

vsa9

Add the network interfaces to use for iSCSI and click Next.

vsa10

Add a NAS server to store metadata, click Next.

vsa11

This concludes the Unisphere configuration wizard.

vsa12

You will be returned to the Unisphere dashboard.

vsa13

The virtual storage appliance has now been deployed and uses the same software and Unisphere interface as its hardware counterpart. From here you can go ahead and set up CIFS and NFS shares, or present iSCSI targets.