
This post walks through the example design below, building out the Amazon Web Services (AWS) framework that enables VMware Cloud on AWS customers to start using AWS at scale, alongside VMware workloads. The key focus is the control of multiple accounts, using AWS Organizations and Service Control Policies, and cross-account connectivity, with Transit Gateway and the role of the VMware Cloud-connected Virtual Private Cloud (VPC).

Example VMC AWS Setup


VMware Cloud on AWS Focus

This article assumes you already have a working knowledge of VMware Cloud on AWS and have either deployed, or are planning the deployment of, your Software-Defined Data Centre (SDDC). If you are unclear about the requirements for the connected AWS account and VPC, review the VMware Cloud documentation here.

In the example architecture we are working with, a Stretched Cluster has been deployed in the eu-west-2 (London) region. During the SDDC deployment, I connected an existing AWS account, aws.sddc@myorg.com, and now have a 25 Gbps cross-VPC link between VMware Cloud and my own VPC using the Elastic Network Interface (ENI). More information on how the connected VPC works can be found in AWS Native Services Integration With VMware Cloud on AWS.

VMCFocus

In this setup, I have also configured some Elastic Compute Cloud (EC2) instances to back up my Virtual Machines (VMs) to Simple Storage Service (S3). Great, so how do I start deploying AWS services at scale, and onboard the rest of my business that wants to begin creating their own AWS accounts?

AWS Organizations & Service Control Policies

AWS Organizations is a good starting point for those wanting to implement policies and governance across multiple accounts, for compliance, security, and standardised environments. Organizations can consolidate billing for your accounts, and automate the creation of new accounts as your environments grow. There is no additional charge for using AWS Organizations or Service Control Policies (SCP). An AWS Organization can be deployed manually, or as part of a Landing Zone which is discussed in the next section.

First, log into the AWS console with the account you will assign as the master. This account is used for account and policy management (it is itself exempt from the Service Control Policies we will cover shortly), and assumes the role of payer account for charges accrued by accounts within the organization hierarchy. Once the master account is set, it cannot be changed.

From the Services drop-down locate AWS Organizations and click Create Organization. Name your organization and select either consolidated billing features only, or all features. From the Accounts tab, you can create and manage AWS accounts, and add existing accounts to your organization. A member account, like the master account, can belong to only one organization. When you create an account with AWS Organizations, an Identity and Access Management (IAM) role is created with full administrative permissions in the new account. The master account can assume this IAM role if required to gain access.
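
For teams that want to automate account creation rather than use the console, the same step can be scripted against the Organizations API. The sketch below is a minimal boto3 example run with master account credentials; the email address and account name are placeholder values for illustration.

```python
import boto3

# Run with credentials from the master (management) account.
orgs = boto3.client("organizations")

# Create a new member account; the role name below is the default role
# Organizations creates, which the master account can later assume.
response = orgs.create_account(
    Email="aws.dataops@myorg.com",           # placeholder email
    AccountName="dataops-sandbox",           # placeholder account name
    RoleName="OrganizationAccountAccessRole",
    IamUserAccessToBilling="DENY",
)

# Account creation is asynchronous; poll the request status until it completes.
request_id = response["CreateAccountStatus"]["Id"]
status = orgs.describe_create_account_status(CreateAccountRequestId=request_id)
print(status["CreateAccountStatus"]["State"])
```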

The Organize Accounts tab is where you can start creating the hierarchy of Organizational Units (OU). The canvas starts with the top-most container, which is the administrative root. OUs, and nested OUs (up to 5 levels deep including root), are added for separate groupings of departments or business units allowing policies to be applied to groups of accounts. An OU will inherit policies from the parent OU in addition to any policies assigned directly to it.

Organization1

A Service Control Policy contains statements defining the controls that are applied to an account or group of accounts. SCPs can only be used for organizations created with all features enabled; they are not available with consolidated billing only. Multiple SCPs can be attached to, or inherited by, accounts in the hierarchical OU chain; however, an explicit deny will always override any allow. SCPs can be created and managed in the Policies tab.

A default FullAWSAccess policy exists and is attached to the organization root, allowing access to any operation. In this example, I have created a DenyInternet policy to be applied to my DataOps OU, which has a requirement to analyse sensitive data from data sets running in VMware Cloud. The SCP is a JSON policy that specifies the maximum available permissions for the accounts or grouping of accounts (OU) that the policy is attached to. You can write the JSON out yourself or use the statement filter on the left-hand side.
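
If you prefer to manage SCPs as code, the policy can also be created with boto3. The sketch below is a minimal example; the deny statement shown (blocking internet gateway creation and attachment) is an assumption of what a DenyInternet policy might contain, so adjust the actions to match your own requirements.

```python
import json
import boto3

orgs = boto3.client("organizations")

# Example deny statement: block creation and attachment of internet gateways.
deny_internet = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInternetGateways",
            "Effect": "Deny",
            "Action": [
                "ec2:CreateInternetGateway",
                "ec2:AttachInternetGateway",
            ],
            "Resource": "*",
        }
    ],
}

policy = orgs.create_policy(
    Name="DenyInternet",
    Description="Prevents member accounts from creating internet gateways",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(deny_internet),
)
print(policy["Policy"]["PolicySummary"]["Id"])
```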

Organization3

Once the policy is created, I attach it to the relevant OU, where it is instantly applied to any member accounts residing in that OU. Policies can be attached either from the Policies tab, or directly on the account, OU, or organization root.
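
The attachment can also be done programmatically. A minimal sketch, assuming the policy ID returned by the previous step; both IDs below are placeholders.

```python
import boto3

orgs = boto3.client("organizations")

# Attach the SCP to the DataOps OU; every member account in the OU
# (and any nested OUs) inherits the policy immediately.
orgs.attach_policy(
    PolicyId="p-examplepolicyid",   # placeholder policy ID from create_policy
    TargetId="ou-example-dataops",  # placeholder OU ID; can also be a root or account ID
)
```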

Organization2

Now, when logging in with the user account, I am unable to create an Internet Gateway, as defined in the SCP statement. For more information on Service Control Policies, review the Service Control Policies User Guide, which details example statements, the relationship with IAM permissions, and scenarios where SCPs would not apply, such as resource-based policies, and users or roles outside of the organization.

IGWDeny

Outside of the master account, my AWS hierarchy now looks like this, with a repeatable process in place for members of the DataOps team to create new accounts which do not have internet access. Furthermore, I may want to create some root-level policies to prevent tampering with AWS security tools such as CloudTrail, GuardDuty, and Config. You can read more about these services in the next section.

AWSOrg

Additional Baseline AWS Services

To protect the AWS Organization, I can look to implement a security baseline across all my accounts, using central management of the services outlined below. These tools can be implemented individually or automated as part of AWS Landing Zone. For VMware Cloud, the connected AWS account that I have full control over can fall into the remit of these services, my organizational hierarchy, and Service Control Policies. However, remember that the SDDC environment is deployed to a shadow AWS account that the customer does not have access to, and this means that we need to utilise Log Insight Cloud to capture and analyse any syslog output from vCenter, NSX-T, etc. Log Insight Cloud can also pull logs from AWS as a log source, from services like CloudTrail and CloudWatch. You can read more about VMware Cloud security measures in VMware Cloud on AWS Security One Stop Shop.

IAM is a mechanism by which we can manage, control, and govern authentication, authorisation, and access to resources within your AWS account. For administrators overseeing multiple accounts, IAM can help with enforcing password policies, Multi-Factor Authentication (MFA), and Identity Federation or Single Sign-On. IAM policies can be applied to users, groups, or roles.

CloudTrail records and tracks all Application Programming Interface (API) requests in an AWS account. Each API request is captured as an event, containing associated metadata such as caller identity, timestamp, and source IP. The event is recorded and stored as a log file in an S3 bucket, with custom retention periods, and optional delivery to CloudWatch Logs for metric monitoring and alerting. CloudTrail logs for multiple accounts can be stored in a central encrypted S3 bucket for effective auditing and security analysis.
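
As a sketch of what that baseline might look like in code, the boto3 calls below create a multi-region trail delivering to a central S3 bucket; the trail and bucket names are placeholders, and the bucket policy allowing CloudTrail to write to the bucket must already exist.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Create a multi-region trail delivering logs to a central S3 bucket.
# The bucket (and its CloudTrail bucket policy) must already exist.
cloudtrail.create_trail(
    Name="org-audit-trail",                  # placeholder trail name
    S3BucketName="central-cloudtrail-logs",  # placeholder bucket name
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,
)

# Trails are created in a stopped state; start logging explicitly.
cloudtrail.start_logging(Name="org-audit-trail")
```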

GuardDuty is a regional-based intelligent threat detection service that monitors for unusual behaviour from CloudTrail event logs, VPC flow logs, and DNS logs. Logs are assessed against multiple security feeds for anomalies and known malicious sources. GuardDuty can provide continuous security analysis, powered by machine learning, for your entire AWS environment across multiple accounts. GuardDuty findings are presented in a dashboard with priority level and severity score and integrate with other services such as CloudWatch and Lambda for remediation automation.
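
Enabling GuardDuty in an account is a single API call per region. A minimal boto3 sketch:

```python
import boto3

# GuardDuty is regional, so repeat this for each region you operate in.
guardduty = boto3.client("guardduty", region_name="eu-west-2")

detector = guardduty.create_detector(
    Enable=True,
    FindingPublishingFrequency="FIFTEEN_MINUTES",
)
print(detector["DetectorId"])
```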

Config can record and capture resource changes in your environment for Configuration Items, detail resource relationships, store configuration history, provide a snapshot of configurations, act as a resource inventory for AWS resources, and allow you to check the compliance of those resources against pre-defined and custom rules. Config can enable notifications of changes, as well as detailing who made the change and when, by integrating with CloudTrail. When coupled with rules like encryption checks, Config can become a powerful security analysis tool.
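
Enabling Config involves creating a configuration recorder, a delivery channel, and then starting the recorder. The sketch below assumes an existing S3 bucket and an IAM role that Config can assume; the role ARN and bucket name are placeholders.

```python
import boto3

config = boto3.client("config")

# Record configuration changes for all supported resource types.
config.put_configuration_recorder(
    ConfigurationRecorder={
        "name": "default",
        "roleARN": "arn:aws:iam::123456789012:role/config-role",  # placeholder role
        "recordingGroup": {
            "allSupported": True,
            "includeGlobalResourceTypes": True,
        },
    }
)

# Deliver configuration snapshots and history to a central S3 bucket.
config.put_delivery_channel(
    DeliveryChannel={
        "name": "default",
        "s3BucketName": "central-config-logs",  # placeholder bucket
    }
)

config.start_configuration_recorder(ConfigurationRecorderName="default")
```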

The Security Pillar White Paper of the AWS Well-Architected Framework is worth reviewing as a starting point to these services.

AWS Control Tower & Landing Zone

AWS Control Tower is a paid-for option for customers who want to quickly set up and govern new AWS environments based on AWS best practices, for those with multiple and distributed applications that will span many accounts. AWS Control Tower consists of established blueprints allowing for automated setup and configuration of your multi-account AWS environments and Identity Federation. Account Factory automates and standardises account provisioning from a configurable account template, with pre-approved network and region settings. Guardrails prevent resources from being deployed that do not conform to policies and detect and remediate non-compliant accounts and resources. Control Tower dashboards provide visual summaries to monitor security and compliance across the organization.

One of the components included with Control Tower is AWS Landing Zone. A Landing Zone can also be implemented yourself outside of Control Tower; it deploys a multi-account AWS environment based on AWS Well-Architected, security, and compliance best practices. The Landing Zone deployment is made up of 4 accounts for AWS Organization & Single Sign-On (SSO), shared infrastructure services, log archive, and security. The good thing about AWS Landing Zone is that it provides a security baseline for several security services and settings; you can see the full list here. Once again, you can create these accounts and services yourself manually if there is a need for greater customisation or granular control, however doing so is time-consuming.

SDDC Cross-Account AWS Connectivity

Not having a Landing Zone or Organization and account structure in place does not stop or delay the VMware Cloud on AWS deployment. For example, you can still create the connected AWS account, and your own central shared services or network account, if it is appropriate to your design, and retrospectively fit these accounts into the Organization hierarchy.

In the setup below, the connected AWS VPC has been reserved for SDDC operations only, in this case, VM backups. The SDDC router is connected to this VPC / account using the subnets defined in the ENI configuration at deployment, meaning backups will run over the 25 Gbps cross-VPC link with no additional data charges. Further services can be deployed to this account, but as the number of AWS services and environments (prod, dev, test, etc.) start to scale, it is good practice to use separate accounts. This is where the Transit Gateway and centralised shared network account can help.

SDDCConnection

The Transit Gateway (TGW) allows customers to connect many VPCs (including the SDDC) and on-premises networks to a single gateway. In the example architecture, we have the following components, which provide connectivity between VMware Cloud on AWS, multiple VPCs and accounts, and the on-premises data centre, using a central shared network services model (a boto3 sketch of the VPC attachment follows the list):

  • Direct Connect has been attached to the TGW using a Direct Connect Gateway, you can read how here.
  • VMware Cloud on AWS has been connected to the TGW using a VPN attachment. The VPN needs setting up in the Cloud Services Portal, you can read how here. Note that to my knowledge using this model in conjunction with HCX L2 extension may not be supported end to end.
  • Additional VPCs are connected to the TGW using VPC attachments, you can read how here.
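
To illustrate the VPC attachment piece, the sketch below creates a Transit Gateway and attaches an additional VPC to it with boto3; the ASN, VPC, and subnet IDs are placeholders, and the SDDC VPN attachment itself is still configured from the Cloud Services Portal as described above.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Create the Transit Gateway in the shared network account.
tgw = ec2.create_transit_gateway(
    Description="Shared network services TGW",
    Options={"AmazonSideAsn": 64512, "DefaultRouteTableAssociation": "enable"},
)
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# In practice, wait for the TGW state to become 'available' before attaching.
# Attach an additional VPC (shared via AWS Resource Access Manager if it
# lives in another account); the IDs below are placeholders.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
)
```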

Outside of VMware Cloud, VPC Peering has traditionally been used to provide one-to-one private network connectivity between VPCs, including VPCs in different accounts. VPC Peering cannot be configured in the SDDC as we do not have access to the underlying AWS account. If it is unlikely that the VMware Cloud customer will be a heavy user of native AWS services, then using a TGW may be overkill, and the SDDC connected VPC may suffice.

For small environments, a VPN connection between additional VPCs can be configured on a one-to-one basis from the VMware Cloud Services Portal. However, as the number of VPCs and accounts begins to scale, the VPN approach becomes harder to manage. VPC endpoints can also be used for targeted, service-by-service access to resources in other accounts; you can see examples of this at AWS Native Services Integration With VMware Cloud on AWS.

In any case, when connecting VPCs and networks together, it is essential to remember that you should not have overlapping IP address ranges. This is relatively easy to plan for greenfield AWS environments but may need further consideration when connecting your existing on-premises networks.

Now, when we pull the shared network and account management together, we have the basis for the DataOps team to deploy their own AWS services with cross-environment access, governed by organizational policy and control. This post was intended as a high-level view of account and network management for VMware Cloud on AWS design integration with native AWS. Allowing connectivity into your SDDC requires correct firewall configuration; you can view examples at Connecting VMware Cloud on AWS to Amazon EC2.

Example VMC AWS Setup

vRealize Operations 6.4 Install Guide

The vRealize product suite is a complete, enterprise, cloud management and automation platform for private, public, and hybrid clouds. Specifically, vRealize Operations Manager provides intelligent operations management across heterogeneous physical, virtual, and cloud environments from a wide range of vendors. vRealize Operations Manager is able to deliver proactive and automated performance improvements by implementing resource reclamation, configuration standardisation, workload placement, planning, and forecasting techniques. By leveraging vRealize Operations Manager, users can protect their environment from outages with preventative and predictive analytics and monitoring across the estate, utilising management packs to unify operations management. The image below is taken from the vRealize Operations Manager datasheet.

vro

vRealize Operations Manager can be deployed as a single node cluster, or a multiple node cluster. In single node cluster environments the master node is deployed with adapters installed which collect data and perform analysis. For larger environments additional data nodes can be added to scale out the solution; these are known as multiple node clusters. In a multiple node cluster the master node is responsible for the management of all other nodes, while data nodes handle data collection and analysis. High availability can be achieved by converting a data node into a replica of the master node. For distributed environments remote collector nodes are deployed to gather inventory objects and navigate firewalls in remote locations; these nodes do not store data or perform analytics, and you can read more about remote collector nodes here. In this post we will deploy a single node cluster for small environments, proof of concept, test, or lab purposes, and link it to a vCenter Server instance. There will also be references to larger deployments and scaling out the application throughout the guide. If you have already deployed your vRealize cluster and want to add additional nodes or configure High Availability click here.

Licensing is split into three editions: standard, advanced, and enterprise. To view the full feature list of the different editions see the vRealize Operations page. A number of VMware product suites bundle vRealize Operations, or it can be purchased standalone. Licensing is allocated in portable license units (vCloud Suite and vRealize Suite only), per processor with unlimited VMs, or in packs of 25 VMs (or OS instances).

Design Considerations

  • Additional data nodes can be added at any time using the Expand an Existing Installation option.
  • When scaling out the cluster by 25% or more, the cluster should be restarted to optimise performance.
  • The master node must be online before any other nodes are brought online (except for when adding nodes at first setup of the cluster).
  • When adding additional data nodes keep in mind the following:
    • All nodes must be running the same version.
    • All nodes must use the same deployment type, i.e. virtual appliance, Windows, or Linux.
    • All nodes must be sized the same in terms of CPU, memory, and disk.
    • Nodes can be in different vSphere clusters, but must be in the same physical location and subnet.
    • Time must be synchronised across all nodes.
  • These rules also apply to replica nodes. Click here to see a full list of multiple node cluster requirements.
  • Remote collector nodes can be deployed to remote locations to gather objects for monitoring. These nodes do not store data or perform any analytics but connect remote data sources to the analytics cluster whilst reducing bandwidth and providing firewall navigation. Read more about remote collector nodes here.
  • When designing a larger vROps environment, check the Environment Complexity guide to determine if you should engage VMware Professional Services. You should also review the official vRealize Operations documentation.

Requirements

  • The vRealize Operations Manager virtual appliance can be deployed to hosts running ESXi 5.1 U3 or later, and requires vCenter Server 5.1 U3 or later (it is recommended that vSphere 5.5 or later is used).
  • The virtual appliance is the preferred deployment method; a Windows and Linux installer is also available, however the Windows installer will no longer be offered after v6.4, and end of life for the Linux installer is also imminent.
  • A static IP address must be used for each node (to change the IP after deployment see this kb).
  • Review the list of Network Ports used by vRealize Operations Manager.
  • The following table is from the vRealize Operations Manager Sizing Guide and lists the hardware requirements, latency, and configuration maximums.

sizing

Installation

Download vRealize Operations Manager here, in virtual appliance, Windows, or Linux formats. Try it for free with hands-on labs or a 60-day trial here.

In this example we are going to deploy as an appliance. Navigate to the vSphere web client home page, click vRealize Operations Manager and select Deploy vRealize Operations Manager.

vro1

The OVF template wizard will open. Browse to the location of the OVA file we downloaded earlier and click Next.

vro2

Enter a name for the virtual appliance, and select a location. Click Next.

vro3

Select the host or cluster compute resources for the virtual appliance and click Next.

vro4

Review the details of the OVA, click Next.

vro5

Accept the EULA and click Next.

vro6

Select the configuration size based on the considerations listed above, then click Next.

vra7

Select the storage for the virtual appliance, click Next.

vra8

Select the network for the virtual appliance, click Next.

vra9

Configure the virtual appliance network settings, click Next.

vra10

Click Finish on the final screen to begin deploying the virtual appliance.

vra11

Setup

Once the virtual appliance has been deployed and is powered on, open a web browser to the FQDN or IP address configured during deployment. Select New Installation.

install1

Click Next to begin the setup wizard.

install2

Configure a password for the admin account and click Next.

install3

On the certificate page select either the default certificates or custom. For assistance with adding custom certificates click here.

install4

Enter the host name for the master node and an NTP server, click Next.

install5

Click Finish.

install6

If required you can add additional data nodes before starting the cluster, or add them at a later date. See the Design Considerations section of this post before scaling out. To add additional data nodes or configure High Availability follow the steps at vRealize Operations High Availability before starting the cluster. Alternatively, you can start the cluster as a single node cluster and add data nodes or High Availability at a later date.

Since we are deploying a single node cluster we will now click Start vRealize Operations Manager. Depending on the size of the cluster it may take 10-30 minutes to fully start up.

install7

Confirm that the cluster has adequate nodes for the environment and click Yes to start up the application.

install8

After the cluster has started you will be diverted to the user interface. Log in with the admin details configured earlier.

install9

The configuration wizard will automatically start, click Next.

install10

Accept the EULA and click Next.

install11

Enter the license key or use the 60 day product evaluation. Click Next.

install12

Select whether or not to join the VMware Customer Experience Improvement Program and click Next.

install13

Click Finish.

install14

The vRealize Operations Manager dashboard will be loaded. The installation process is now complete. The admin console can be accessed by browsing to https://<IP-or-FQDN>/admin, where <IP-or-FQDN> is the IP address or FQDN of your vRealize Operations Manager appliance or server.

install15

To add additional data nodes or configure High Availability see the vRealize Operations High Availability post.

Post Installation

After first setup we need to secure the console by creating a root account. Browse to the vROps appliance in vSphere and open the console. Press ALT + F1 and log in as root. You will be prompted to create a root password. All other work in this post is carried out using the vRealize Operations web interface.

The vRealize Operations web interface can be accessed by browsing to the IP address or FQDN of any node in the vRealize Operations management cluster (master node or replica node). During the installation process the admin interface is presented; after installation, the IP address or FQDN resolves to the user interface. To access the admin interface, browse to https://<IP-or-FQDN>/admin, where <IP-or-FQDN> is the IP address or FQDN of either node in the management cluster. For supported browsers see the vRealize Operations Manager 6.4 Release Notes.

The next step is to configure the vCenter Adapter to collect and analyse data. Select Administration from the left-hand navigation pane. From the Solutions menu select VMware vSphere and click the Configure icon.

config1

Enter the vCenter Server details and credentials with administrator access.

config2

Click Test Connection to validate connectivity to the vCenter Server.

config3

Expand Advanced Settings and review the default settings, these can be changed if required. Click Define Monitoring Goals and review the default policy, again this can be changed to suit your environment.

config4

When you’re ready click Save Settings and Close. The vCenter adapter will now begin collecting data. Collection cycles begin every 5 minutes; depending on the size of your environment, the initial collection may take more than one cycle.

config5

Once data has been collected from the vCenter Server go back to the Home page and browse the different tabs and dashboards.

dashboard

Customise your vRealize Operations Manager instance to suit your environment using the official VMware guides.

Windows 2016 Storage Spaces Direct

Storage Spaces Direct for Windows Server 2016 is a software-defined storage solution providing pooled storage resources across industry standard servers with attached local drives. Storage Spaces Direct (S2D) is able to provide scalability, built-in fault tolerance, resource efficiency, high performance, simplified management, and cost savings.

Storage Spaces Direct is a feature included at no extra cost with Datacentre editions of Windows Server 2016. S2D can be deployed across Windows clusters comprising between 2 and 16 physical servers, with over 400 drives, using the Software Storage Bus to establish a software-defined storage fabric spanning the cluster. Existing clusters can be scaled out by simply adding more drives, or more servers to the cluster. Storage Spaces Direct will automatically detect additional resources and absorb these drives into the pool, redistributing existing volumes. Resiliency is provided not only across drives, components, and servers, but can also be configured for chassis, rack, and site fault tolerance by creating fault domains with which the data placement will comply. The video below provided by Microsoft goes into more detail about fault domains and how they provide resiliency.

Furthermore, volumes can be configured to use mirror resiliency or parity resiliency to protect data. Using mirror resiliency provides resiliency to drive and server failures by storing a default of 3 copies across different drives in different servers. This is a simple deployment with minimal CPU overhead but a relatively inefficient use of storage. Alternatively, we can use parity resiliency, where parity symbols are spread across a larger set of data symbols, providing drive and server resiliency along with a more efficient use of storage resources (this requires 4 physical servers). You can learn more about both these methods at the Volume Resiliency blog by Microsoft.

The main use case for Storage Spaces Direct is a private cloud (either on or off-premises) using one of two deployment models. In the Hyper-Converged model, compute and storage reside on the same servers; in this use case virtual machines sit directly on top of the volumes provided by S2D. Using a Private Cloud Storage or Converged deployment method, S2D is disaggregated from the hypervisor, providing a separate storage cluster for larger-scale deployments such as IaaS (Infrastructure as a Service). A SoFS (Scale-out File Server) is built on S2D to provide network-attached storage over SMB3 file shares.

Storage Spaces Direct is configured using a number of PowerShell cmdlets, and utilises Failover Clustering and Cluster Shared Volumes. For instructions on enabling and configuring S2D see Configuring Storage Spaces Direct – Step by Step, Robert Keith, Argon Systems. The requirements are as follows:

  • Windows Server 2016 Datacentre Edition.
  • Minimum of 2 servers, maximum of 16, with local-attached SATA, SAS, or NVMe drives.
  • Each server must have at least 2 solid-state drives plus at least 4 additional drives; the read/write cache uses the fastest media present by default.
  • The SATA and SAS devices should be behind an HBA and SAS expander.
  • Storage Spaces Direct uses SMB3, including SMB Direct and SMB Multichannel, over Ethernet to communicate between servers. 10 GbE or above is recommended for optimum performance.
  • All hardware must support SMB (Server Message Block) and RDMA (Remote Direct Memory Access).

s2ddeployments