vRealize Operations Capacity Shows 100% Cluster Utilisation

Recently we were examining a vSphere cluster where vRealize Operations Manager was showing 100% CPU utilisation, with zero capacity remaining, even though actual usage of all resources in the cluster was generally low. Cluster capacity is based on demand rather than usage: CPU demand is the amount of CPU resources a virtual machine would use if there were no CPU contention or CPU limit. This can cause some confusion when we look at the utilisation metrics of the cluster.

This type of behaviour is actually expected because of how vRealize Operations interprets the data. When virtual machines have latency sensitivity set to high, all of the CPU is requested by the virtual machine in order to reserve it. Since vRealize Operations Manager cannot differentiate between latency sensitivity reservations and legitimate CPU requests, we see CPU and/or memory contention alerts. More information can be found in the KB article Virtual Machine(s) Workload badge reports constant 100+ score in VMware vRealize Operations Manager (2145552). The KB article suggests that if latency sensitivity cannot be set back to normal, then a custom group can be created to disable the alerts.
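
If you need to confirm whether any virtual machines have latency sensitivity set to high, one option is to query the vSphere API from the command line. Below is a minimal sketch using govc (the CLI from the govmomi project); it assumes govc is installed with the GOVC_* connection variables set, and the inventory path and VM name are placeholders:

rem Show the latency sensitivity level for a single VM
govc object.collect -s /Datacenter/vm/sql-vm-01 config.latencySensitivity.level

rem Alternatively, list the VM's extra config and look for sched.cpu.latencySensitivity
govc vm.info -e sql-vm-01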

This scenario is well documented. However, what if latency sensitivity is not enabled or configured beyond the default setting, but the symptoms are the same? In this case, the cluster is dedicated to running SQL workloads.

Using the metrics view of the cluster under the Environment tab, we can see high peaks in the CPU co-stop and CPU ready values every night. The discrepancy seems to be caused by the virtual machines claiming all available CPU resource at a specific time. Whilst this might sound environment-specific, there are a number of scenarios where this could be the case and a workaround is needed.
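
To cross-check these counters outside of vROps, the same statistics can be pulled straight from vCenter. A rough sketch using govc follows (the inventory path is a placeholder, and the flags assume recent real-time samples are available):

rem Sample the most recent CPU ready and co-stop values for the SQL VMs
govc metric.sample -n 12 /Datacenter/vm/sql-vm-* cpu.ready.summation cpu.costop.summation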

Beyond changing the behaviour of the virtual machines, some available options are as follows:

  • Action the rightsizing recommendations to ensure we are not over-allocating CPU resources
  • Follow the steps outlined in the KB article above to ignore/disable the alerts
  • Follow the steps outlined below to set a maintenance schedule, disregarding metrics where the peak is at a consistent time every day or night
  • In the capacity policy, change the setting for the time remaining calculations

Updating how time remaining is calculated may be a last resort, but it can provide a slightly different interpretation of the data. You can see the description of each setting, and how the associated projection graph changes, in the screenshots below. The default policy uses conservative capacity planning, which projects from the higher utilisation values, whereas aggressive uses the average values of resource utilisation.

To update this setting, either change the default policy or create a new policy to assign to specific objects like a cluster. Follow the policy-based steps outlined below, disregarding the maintenance schedule. You can find more information on how time remaining is calculated in the blog Rightsizing VMs with vRealize Operations.

vRealize Operations Conservative Capacity Policy
vRealize Operations Aggressive Capacity Policy

Setting a Maintenance Schedule

The following steps walk through creating a maintenance schedule with an associated capacity policy. You can also change the time remaining calculations from the capacity policy, with or without a maintenance schedule. The screenshots are from vROps 8.6, but previous 8.x versions follow a similar process.

  • First, create the maintenance schedule. From the left-hand navigation pane, expand Configure and select Maintenance Schedules.
  • Click Add. Enter the name, time zone, and time configuration of the schedule. Click Save.
  • Next, we need to create a policy. From the Configure menu again, select Policies.
  • Click Add. Enter the name, and select a policy to clone. Click Create Policy.
  • Select the policy from the list, and click Edit Policy.
  • Select the Capacity block, and then choose the object type.
vRealize Operations Capacity Policy

Here, if required, you can change the policy for the time remaining calculations mentioned above, as well as manually change the alert thresholds. The default conservative policy takes the highest resource utilisation and projects the time remaining before it crosses the usable capacity threshold. The aggressive policy uses the mean average resource utilisation to project the time remaining before that average crosses the usable capacity threshold. For example, a cluster averaging 40% CPU usage with nightly 90% peaks will show far less time remaining under the conservative policy than under the aggressive one. Both policies have their uses; aggressive may be better suited to smaller organisations wanting to sweat hardware assets.

  • Make any desired changes to the policy per the description above. Scroll down to Maintenance Schedule and select the schedule created earlier. Click Save.
  • Next, select Groups and Objects. Choose a custom group or object to apply the policy to, and click Save.
vRealize Operations Assigned Policy
  • Now that the policy is configured and assigned to an object, it is active and in use.
vRealize Operations Active Policy
  • When we check back on the maintenance schedule we can now see the linked policy.
vRealize Operations Maintenance Schedule

There are additional ways of setting maintenance schedules; the example above is relevant to the described use case of disregarding metrics during a certain time interval. You can also manually enter maintenance through both the vROps UI and API (see Maintenance Mode for vRealize Operations Objects, Part 1 by Thomas Kopton), or create dynamic groups containing hosts in maintenance mode (see Maintenance Mode for vRealize Operations Objects, Part 2).
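
As a pointer for the API route, the sketch below uses curl against the vROps Suite API to acquire a token and mark a resource as being maintained; the FQDN, credentials, resource ID, and duration are placeholders:

rem Acquire an authentication token
curl -k -X POST https://vrops.lab.local/suite-api/api/auth/token/acquire ^
  -H "Content-Type: application/json" -H "Accept: application/json" ^
  -d "{\"username\":\"admin\",\"password\":\"password\"}"

rem Put a resource into maintenance for 60 minutes using the returned token
curl -k -X PUT "https://vrops.lab.local/suite-api/api/resources/<resource-id>/maintained?duration=60" ^
  -H "Authorization: vRealizeOpsToken <token>" -H "Accept: application/json"

rem Take the resource back out of maintenance early if required
curl -k -X DELETE "https://vrops.lab.local/suite-api/api/resources/<resource-id>/maintained" ^
  -H "Authorization: vRealizeOpsToken <token>"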

How to Upgrade to vRealize Operations Manager 8.3

Introduction

Recently I installed vRealize Operations Manager 8.2 in my home lab environment. Less than a week later 8.3 was released – of course it was! The new version has some extra features like 20-second peak metrics and VMware Cloud on AWS objects, but what I'm most interested in is the new Cloud Management Assessment (CMA). The vSphere Optimisation Assessment (VOA) has been around for a while to show the value of vRealize Operations (vROps) and optimise vSphere environments. The CMA is the next logical step in extending that capability out into VMware Cloud and vRealize Cloud solutions. You can read more in the What's New in vRealize Operations 8.3 blog. This post walks through the steps required to upgrade vRealize Operations Manager from 8.2 to 8.3.

vRealize Operations Manager 8.3 Upgrade Guide

The upgrade process is really quick and easy for a single-node standard deployment. The upgrade may take longer if you have multiple distributed nodes that the software update needs to be pushed out to, or if you need to clone any custom content. If you are upgrading from vROps 8.1.1 or earlier you will need to upgrade End Point Operations Management agents using the steps detailed here. The agent builds for 8.3 and 8.2 are the same.

Before upgrading vRealize Operations Manager we'll run the Upgrade Assessment Tool, a non-intrusive, read-only software package that produces a report showing system validation checks and any removed or discontinued metrics. The latter point is important to make sure you don't lose any customisation, like dashboards or management packs, as part of the upgrade. Here are some additional points for the upgrade:

  • Take a snapshot or backup of the existing vRealize Operations Manager before starting (a scripted example follows this list)
  • Check the existing vRealize Operations Manager is running on ESXi 6.5 U1 or later, and managed by vCenter 6.5 or later
  • Check the existing vRealize Operations Manager is running at least hardware version 11
  • You can upgrade to vROps 8.3 from versions 7.0 and later, check the available upgrade paths here
  • If you are using any other VMware solutions check product interoperability here
  • If you need to backup and restore custom content review the Upgrade, Backup and Restore section of the vRealize Operations 8.3 documentation here
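
For the snapshot step, a minimal sketch using govc from the command prompt is below; it assumes govc is installed, the GOVC_* variables point at the vCenter managing the appliance, and vrops-01 is a placeholder VM name:

set GOVC_URL=https://vcenter.lab.local
set GOVC_USERNAME=administrator@vsphere.local
set GOVC_PASSWORD=password
set GOVC_INSECURE=1

rem Snapshot the appliance before starting the upgrade
govc snapshot.create -vm vrops-01 pre-8.3-upgrade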

vROps 8.3 Upgrade Checks

First, download the vRealize Operations Manager upgrade files; you'll need the Virtual Appliance Upgrade for 8.x or 7.x pak file, and the Upgrade Assessment Tool pak file. The vRealize Operations 8.3 release notes can be found here.

Browse to the FQDN or IP address of the vRealize Operations Manager master node followed by /admin, and log in with the admin credentials.

vRealize Operations Manager admin login

From the left-hand navigation pane browse to Software Update. Click Install Software Update and upload the Upgrade Assessment Tool pak file. Follow the steps to accept the End User License Agreement (EULA) and click Install.

Check the status of the software bundle from the Software Update tab. Once complete, click Support, then Support Bundles. Highlight the bundle and click the download icon to obtain a copy of the report.

vRealize Operations Manager support bundle

Extract the downloaded zip file and expand the apuat-data and report folders. Open index.html.
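
This can also be done from the command prompt; Windows 10 ships with bsdtar, which extracts zip archives. The bundle file name below is a placeholder and the exact folder layout may vary:

rem Extract the support bundle and open the report in the default browser
tar -xf vrops-support-bundle.zip
start apuat-data\report\index.html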

vRealize Operations Manager system validation

System validation checks and impacted components can be viewed. For any impacted components you can drill down into the deprecated metric and view any applicable replacements.

vRealize Operations Manager content validation

vROps 8.3 Upgrade Process

Following system and content validation checks the next step is to run the installer itself. Navigate back to the Software Update tab and click Install Software Update. Upload the vRealize Operations Manager 8.3 upgrade pak file.

vRealize Operations Manager software update

When the upload and staging are complete, click Next.

vRealize Operations Manager software upload

Accept the End User License Agreement (EULA) and click Next.

vRealize Operations Manager EULA

Review the update information and click Next.

vRealize Operations Manager update information

Click Install to begin the software update.

vRealize Operations Manager install software update

You can monitor the upgrade process from the Software Update page; however, after about 5 minutes you will be logged out.

vRealize Operations Manager update in progress

After logging back in, it takes around a further 15-20 minutes before the update is finalised and the cluster is brought back online. Refresh the System Status and Software Update pages when complete.

vRealize Operations Manager update complete

I can now log back into vROps. The Cloud Management Assessment can be accessed from the Quick Start page by expanding View More, selecting Run Assessments and clicking VMware vRealize Cloud Management Assessment.

vRealize Operations Manager Cloud Management Assessments

vRA Deployments with Terraform

This post covers notes made when using Terraform to deploy basic resources from VMware vRealize Automation (vRA). Read through the vRA provider plugin page here and the Terraform documentation here. There are a couple of other examples of Terraform configurations using the vRA provider here and here. If you're looking for an introduction on why Terraform and vRA then this blog post gives a good overview. If you have worked with the vRA Terraform provider before feel free to add any additional pointers or best practices in the comments section, as this is very much a work in progress.

Terraform Setup

Before starting you will need to download and install Go and Git on the machine you are running Terraform from. Visual Studio Code with the Terraform extension is also a handy tool for editing config files, but not a requirement. The steps below were validated with Windows 10 and vRA 7.3.

After installing Go, the default GOROOT is set to C:\Go and GOPATH to %UserProfile%\Go. Go is the programming language we will use to rebuild the vRA provider plugin. GOPATH will be the location of the directory containing source files for Go projects.

In this instance I have set GOPATH to D:\Terraform and will keep all files in this location. To change GOPATH manually, open Control Panel, System, Advanced system settings, Advanced, Environment Variables. Alternatively, GOROOT and GOPATH can be set from the CLI:

set GOROOT=C:\Go
set GOPATH=D:\Terraform

Download Terraform for Windows and put the executable in the working directory for Terraform (D:\Terraform, or whatever GOPATH was set to).

In AppData\Roaming create a new file terraform.rc (%UserProfile%\AppData\Roaming\terraform.rc) with the following contents, replacing D:\Terraform with your own Terraform working directory.

providers {
     vra7 = "D:\\Terraform\\bin\\terraform-provider-vra7.exe"
}

Open command prompt and navigate to the Terraform working directory. Run the following command to download the source repository:

go get github.com/vmware/terraform-provider-vra7

Open the Terraform working directory and confirm the repository source files have been downloaded.

The final step is to rebuild the Terraform provider using Go. Download the latest version of dep. Rename the executable to dep.exe and place it in your Terraform working directory under \src\github.com\vmware\terraform-provider-vra7.

Back in command prompt navigate to D:\Terraform\src\github.com\vmware\terraform-provider-vra7 and run:

dep ensure
go build -o D:\Terraform\bin\terraform-provider-vra7.exe

Running dep ensure can take a while; use the -v switch if you need to troubleshoot. The vRA Terraform provider is now ready to use.

Using Terraform

In the Terraform working directory a main.tf file is needed to describe the infrastructure and set variables. There are a number of example Terraform configuration files located in the source repository files under \src\github.com\vmware\terraform-provider-vra7\example.

A very basic example of a configuration file would first contain the vRA variables:

provider "vra7" {
     username = "username"
     password = "password"
     tenant = "vRAtenant"
     host = "https://vRA
}

Followed by the resource details:

resource "vra7_resource" "machine" {
   catalog_name = "BlueprintName"
}

Further syntax can be added to pass additional variables; for a full list see the resource section here. The configuration file I am using for the purposes of this example is as follows:

main_tf
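
Combined with the provider block shown earlier, a configuration along those lines might look something like the sketch below. The blueprint name and resource_configuration keys are assumptions; the keys depend on the component names within your own blueprint:

resource "vra7_resource" "machine" {
   count = 1                      # number of deployments to request
   catalog_name = "CentOS"        # published catalog item name
   resource_configuration = {
      "CentOS.cpu" = "2"          # key format: <component name>.<property>
      "CentOS.memory" = "4096"
   }
   wait_timeout = 20              # minutes to wait for the request to complete
}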

Example config and variable files from source repo:

multi_machine_example

variables_example
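
Along the same lines as the variables example above, credentials can be moved out of the provider block into input variables. A minimal sketch using the 0.11-era interpolation syntax this provider was written against:

variable "username" {}
variable "password" {}
variable "tenant" {}
variable "host" {}

provider "vra7" {
     username = "${var.username}"
     password = "${var.password}"
     tenant = "${var.tenant}"
     host = "${var.host}"
}

Values can then be supplied in a terraform.tfvars file or with the -var command line switch.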

Once your Terraform configuration file or files are ready, go back to the command prompt and navigate to the Terraform working directory. Type terraform and hit enter to see the available options; for a full list of commands see the Terraform CLI documentation here.

Start off by initialising the vRA provider plugin:

terraform init

terraform_init

Validate the Terraform configuration files:

terraform validate

If you’re ready then start the deployment:

terraform apply

terraform_apply_1

Monitor the progress from the CLI or from the task that is created in the Requests tab of the vRA portal.

terraform_apply_2

terraform_apply_3

Check the state of the active deployments using the available subcommands of:

terraform state

terraform_state
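
For example, to list the tracked resources and inspect the machine created earlier:

terraform state list
terraform state show vra7_resource.machine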

To destroy the resource use:

terraform destroy

terraform_destroy