Recently I installed vRealize Operations Manager 8.2 in my home lab environment. Less than a week later 8.3 was released – of course it was! The new version has some extra features like 20-second peak metrics and VMware Cloud on AWS objects, but what I’m interested in looking at is the new Cloud Management Assessment (CMA). The vSphere Optimisation Assessment (VOA) has been around for a while to show the value of vRealize Operations (vROps) and optimise vSphere environments. The CMA is the next logical step in extending that capability out into VMware Cloud and vRealize Cloud solutions. You can read more in the What’s New in vRealize Operations 8.3 blog. This post walks through the steps required to upgrade vRealize Operations Manager from 8.2 to 8.3.
vRealize Operations Manager 8.3 Upgrade Guide
The upgrade process is really quick and easy for a single node standard deployment. The upgrade may take longer if you have multiple distributed nodes that the software update needs pushing out to, or if you need to clone any custom content. If you are upgrading from vROps 8.1.1 or earlier you will need to upgrade End Point Operations Management agents using the steps detailed here. The agent builds for 8.3 and 8.2 are the same.
Before upgrading vRealize Operations Manager we’ll run the Upgrade Assessment Tool, a non-intrusive, read-only software package that produces a report showing system validation checks and any removed or discontinued metrics. The latter point is important to make sure you don’t lose any customisations such as dashboards or management packs as part of the upgrade. Here are some additional points for the upgrade:
Take a snapshot or backup of the existing vRealize Operations Manager before starting
Check the existing vRealize Operations Manager is running on ESXi 6.5 U1 or later, and managed by vCenter 6.5 or later
Check the existing vRealize Operations Manager is running at least hardware version 11
You can upgrade to vROps 8.3 from versions 7.0 and later; check the available upgrade paths here
If you are using any other VMware solutions check product interoperability here
If you need to backup and restore custom content review the Upgrade, Backup and Restore section of the vRealize Operations 8.3 documentation here
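The version and hardware checks above can be sketched as a simple validation script. This is purely illustrative: the threshold values mirror the checklist in this post, and the appliance facts are assumed to have been gathered separately (for example from vCenter via PowerCLI or the vSphere API).

```python
# Sketch of the pre-upgrade checks listed above. The data structure is
# illustrative; the thresholds come from the checklist in this post.

MIN_HW_VERSION = 11          # VM hardware version 11 or later required
MIN_SOURCE_VERSION = (7, 0)  # upgrade path supported from vROps 7.0+

def check_upgrade_readiness(appliance: dict) -> list:
    """Return a list of failed checks (an empty list means ready)."""
    failures = []
    if appliance["hardware_version"] < MIN_HW_VERSION:
        failures.append("VM hardware version must be 11 or later")
    if tuple(appliance["vrops_version"]) < MIN_SOURCE_VERSION:
        failures.append("vROps must be 7.0 or later to upgrade to 8.3")
    if not appliance["snapshot_or_backup_taken"]:
        failures.append("Take a snapshot or backup before starting")
    return failures

appliance = {
    "hardware_version": 14,
    "vrops_version": (8, 2, 0),
    "snapshot_or_backup_taken": True,
}
print(check_upgrade_readiness(appliance))  # → [] when all checks pass
```

The ESXi/vCenter version and product interoperability checks would slot into the same function once you have those facts to hand.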
vROps 8.3 Upgrade Checks
First, download the vRealize Operations Manager upgrade files: you’ll need the Virtual Appliance Upgrade for 8.x or 7.x pak file, and the Upgrade Assessment Tool pak file. The vRealize Operations 8.3 release notes can be found here.
Browse to the FQDN or IP address of the vRealize Operations Manager master node with /admin appended, and log in with the admin credentials.
From the left-hand navigation pane browse to Software Update. Click Install Software Update and upload the Upgrade Assessment Tool pak file. Follow the steps to accept the End User License Agreement (EULA) and click Install.
Check the status of the software bundle from the Software Update tab. Once complete, click Support and Support Bundles. Highlight the bundle and click the download icon to obtain a copy of the report.
Extract the downloaded zip file and expand the apuat-data and report folders. Open index.html.
System validation checks and impacted components can be viewed. For any impacted components you can drill down into the deprecated metrics and view any applicable replacements.
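If you want to triage the assessment findings programmatically rather than in index.html, the idea looks something like the sketch below. The JSON structure here is hypothetical (the real report data format inside the bundle isn’t documented in this post); the point is to flag deprecated metrics that have no replacement, since those are the ones that could silently break dashboards or management packs.

```python
import json

# Hypothetical extract of assessment report data: each entry is a
# deprecated metric, with a replacement where one exists.
sample = json.loads("""
[
  {"adapter": "VMware vSphere", "metric": "cpu|oldMetric",
   "replacement": "cpu|newMetric"},
  {"adapter": "Custom Pack", "metric": "custom|legacyMetric",
   "replacement": null}
]
""")

# Metrics with no replacement need manual attention before upgrading.
at_risk = [m["metric"] for m in sample if m["replacement"] is None]
print(at_risk)  # → ['custom|legacyMetric']
```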
vROps 8.3 Upgrade Process
Following system and content validation checks the next step is to run the installer itself. Navigate back to the Software Update tab and click Install Software Update. Upload the vRealize Operations Manager 8.3 upgrade pak file.
When upload and staging is complete click Next.
Accept the End User License Agreement (EULA) and click Next.
Review the update information and click Next.
Click Install to begin the software update.
You can monitor the upgrade process from the Software Update page; however, after about 5 minutes you will be logged out.
After logging back in, it takes around a further 15-20 minutes before the update is finalised and the cluster is brought back online. Refresh the System Status and System Update pages when complete.
I can now log back into vROps. The Cloud Management Assessment can be accessed from the Quick Start page by expanding View More, selecting Run Assessments and clicking VMware vRealize Cloud Management Assessment.
The following post reviews The Backup Bible, a 204 page ebook written by Eric Siron and published by Altaro Software. I’ve decided to break this down into a short discussion around each section, 3 in total, to help with the flow of the text. The Backup Bible can be downloaded for free here.
Initial thoughts on downloading the ebook and flicking through; clearly a lot of research and work has gone into this based on content applicable to both the business outcomes and objectives as well as low level technical detail. A substantial number of example process documents are included that mean practical outputs can be applied by the reader straight away. The graphics, colour scheme, font, and text size all look like they’ll make the ebook easy to follow, so let’s jump in.
Introduction and part 1: creating a backup & disaster recovery strategy
Chapter 1 is short and to the point, providing a checklist for the scope of disaster recovery. The checklist serves as a nice starting point for organisations to expand into sub-categories based on their own goals, requirements, or industry-specific regulations. An interesting observation made in the introduction is that human nature leans towards planning for best outcomes, rather than worst. As a result, organisations design systems with defences for known common failures (think disk, network adapter) but less often design with catastrophic failure in mind. Rather than designing systems on the principle that failure is expected, an assumption is made that they will operate as expected. We’ll discuss more points made in the ebook around shifting that mentality later.
A further key takeaway from the introduction is that disaster recovery planning is not a one-time event, but an ongoing process. Plans and requirements will need to adapt as new services are added and older services change. This is why the ebook focuses heavily on the implementation of a disaster recovery strategy that stays agile, as opposed to a singular process or policy. The disaster recovery strategy will be driven by the overall business strategy.
Some great talking points are introduced in chapter 2 around popular responses as to why disaster recovery fails, or is not properly implemented. In most cases expect a combination of many, if not all, of these reasons. Building disaster recovery into the initial design of a greenfield environment can be relatively easy, in comparison with retrospectively fitting it into brownfield environments. Older systems and infrastructure were generally deployed in silos yet interlinked with dependencies. New services have been deployed over time, bolted onto existing infrastructure, with changing topologies. Disaster recovery plans need to be agile enough to cater for different systems, and keep up with both technical and business changes.
The chapter moves on to talk about common risks, and determining key stakeholders. Again, useful examples that can be adapted and increased based on your industry and organisational structure, such as protecting physical assets as well as digital. Gaining insights into business priorities and current risks from different people within the organisation boosts your chances of building successful plans and processes.
Chapter 3 starts out with some easy to understand explanations and graphics on Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). It’s good to set out why these are important, since different systems will have different RTO and RPO values. Defining and setting these values helps with the trade-off between budget and availability.
Another tip here that really resonates with me is to translate business impact into monetary value. The suggested “How long can we operate without this system?” suffices in most cases, but to create urgency the ebook recommends asking “How much does this system cost us per hour when offline?”. Hearing that the business is losing £x an hour hits harder than saying part of the website is down. Establishing RPOs for different events is also something often overlooked; for example, recovery from a hardware failure will most likely differ from a site outage or ransomware attack.
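A minimal sketch of that suggestion, expressing downtime in money rather than time. All figures below are made-up examples, and real models would add factors like SLA penalties or reputational damage.

```python
# Translate "the system is down" into a per-hour cost figure.
def downtime_cost_per_hour(hourly_revenue, staff_count, hourly_wage,
                           productivity_loss=1.0):
    """Lost revenue plus the wages of staff left idle by the outage."""
    return hourly_revenue + staff_count * hourly_wage * productivity_loss

# Example only: "the web shop is down" becomes "we lose £5,450 an hour".
cost = downtime_cost_per_hour(hourly_revenue=5000, staff_count=30,
                              hourly_wage=15)
print(f"£{cost:,.0f} per hour")  # → £5,450 per hour
```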
The first 3 chapters lay the groundwork for capturing business requirements. Chapter 4 helps break business items down and map them to common technical solutions, applied at both the software and hardware level. Solid explanations of fault tolerance and high availability are followed up with in-depth descriptions of applying these terms to storage, memory, and other systems through clustering, replication, and load-balancing. Although fault tolerance and high availability contribute to IT operations, they absolutely do not replace backups. This point is positioned alongside an extensive list of how to assess potential backup solutions against your individual requirements when evaluating software.
Part 2: backup best practices in action
The second part of The Backup Bible begins by setting out some example architectures for backup software placement. Security principles are also now being worked into the disaster recovery strategy, alongside backup software and storage, whilst those not currently in a backup or system administrator role will learn about the differences between crash-consistent and application-consistent backups. Chapter 5 again allows for practical application, in the form of a two-phase backup software selection and testing plan. The example tracking table provided can be used for feature comparisons, and kept for audit or reporting purposes.
Options for backup storage targets follow in chapter 6; another section of comprehensive explanations detailing the advantages and disadvantages of different storage types. The information allows consumers to apply each scenario to their own requirements and make informed decisions when implementing their disaster recovery strategy. Most examples are naturally data centre storage types, but perhaps where cloud storage targets differ is that they should be factored into an organisation’s overall cloud strategy too. Factors like egress charges, connectivity, and data offshoring governance or regulations for cloud targets will likely need building into the wider cloud computing picture.
Chapter 7 moves on to a really important topic, and I’m glad this section looks pretty weighty. In many industries, securing and protecting backup data can be just as important as the availability of the data itself. The opening pages walk through securing backups using examples like encryption, account management, secret/key vaults, firewalls, network segmentation, and air-gapping, while keeping to the functionality within your existing backup, storage, network, and security tools so as not to over-engineer a solution that will be difficult to support in the future. The chapter moves on to talk about layering security practices, how these can be weighed up using risk assessments, and how they can then be built into policy if desired. These are well-made points that are realistic and applicable to real-world scenarios.
An example backup software deployment checklist is covered in chapter 8, before chapter 9 expands more on documentation. This is another step often missed in the rush to get solutions out of the door. In disaster recovery, documentation is especially important so that on-call staff and those not familiar with the setup can carry out any required actions quickly and correctly.
The next stage in the process is to implement your organisation’s data protection needs in the form of backup schedule(s) – usually multiple. Various examples of backup schedules are used in chapter 10; placing the most value on full backups without dependencies on other data sources, and filtering through incremental, differential, or delta backups for convenience. A recommendation is made to time backups with business events, such as month-end processes when high-value changes take place, or maintenance such as monthly operating system updates. Once again a real focus is placed firstly on building a technical strategy around business requirements, and secondly on communication and documentation to ensure that strategy is as effective as possible.
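The dependency trade-off chapter 10 describes can be sketched in a few lines: a full backup restores on its own, a differential needs the last full plus itself, and an incremental needs the last full plus every incremental since. The weekly schedule below is an example only, not a recommendation from the ebook.

```python
# Work out which backups are needed to restore to a given day.
def restore_chain(schedule, restore_day):
    """Return the (day, type) backups required for a restore."""
    chain = []
    for day, kind in enumerate(schedule[: restore_day + 1]):
        if kind == "full":
            chain = [(day, kind)]            # a full resets the chain
        elif kind == "incremental":
            chain.append((day, kind))        # depends on all backups since the full
        elif kind == "differential":
            chain = [chain[0], (day, kind)]  # depends only on the last full
    return chain

weekly = ["full"] + ["incremental"] * 6
print(restore_chain(weekly, 3))
# → [(0, 'full'), (1, 'incremental'), (2, 'incremental'), (3, 'incremental')]
```

The longer the chain, the more pieces have to be intact for the restore to succeed, which is why the ebook places the most value on fulls without dependencies.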
The final 2 chapters of this section describe maintaining the backup solution beyond its initial implementation. This includes monitoring and testing backups for data integrity, and keeping systems up to date and in good health. Security patches and upgrades will be in the form of scheduled or automatic regular updates, and less frequent one-off events like drivers, firmware, or hot patches for exposed vulnerabilities or fixes. Often newly implemented systems look great on day 1, but come back on day x in 6 or 12 months and things can look different. It’s refreshing to see the ebook call this out and plan for the ongoing lifecycle of a solution as well as the initial design and implementation.
Part 3: disaster recovery & business continuity blueprint
In part 3 the ebook dives into business continuity, focusing on areas beyond technology, such as making sure that office or physical space, people, and business dependencies are all taken care of to start recovery operations as quickly as possible. A theme that has remained consistent is asking questions and working with multiple subject-matter experts, in turn exposing more questions and considerations. Example configurations of hot, warm, and cold secondary sites are used, alongside the required actions to prepare each site should business continuity ever be invoked. These topics really get the reader thinking of the bigger picture, and how some of the technical planning earlier feeds into business priorities.
Chapter 15 moves on to replication with a thorough description of replication types, and when they are or are not applicable. Replication enables rapid failover to another site, and does not replace but rather complements backups where budgets allow. Clear direction is given here to ensure replication practices are understood, along with which tasks can be automated and which need to remain manual. The ebook points out that, if configured incorrectly, replication can also add overhead and put data at risk, so it’s good to see both sides of the coin. A helpful tip in this chapter is to use external monitoring where possible, rather than rely on built-in email alerting, which could fail along with the system it is supposed to be monitoring!
A full treatment of cloud services is probably beyond the scope of this ebook; that said, because cloud storage is becoming an option for backup targets, it is mentioned, currently positioned as more of a comfort blanket. Chapter 16 adds some further considerations, like not assuming that backup or disaster recovery is included in a service just because it is hosted in the cloud. Cloud services need to be architected for availability in the same way they do on-premises. Some Software-as-a-Service (SaaS) offerings may include this functionality, but where they do it will be explicitly stated. The scope of using cloud for disaster recovery obviously varies between organisations already heavily invested in the cloud, or multiple cloud providers, and those not using it at all. Despite there being more talking points, I think it’s right to keep the conversation on topic to avoid going down a rabbit hole.
We’re now at a stage where disaster recovery has been invoked, and chapter 17 runs through what some of the business processes may look like. There’s a lot of compassion shown in this chapter, and whilst flexibility and consideration for people are a given, going a step further and writing it into the business continuity documentation is something I must admit I hadn’t previously thought about. Having plans laid out ready as to how employees will be contacted in the event of a disaster will also save time and reduce confusion. There’s some good information here on implementing home working too, and this is particularly relevant as we still navigate through COVID-19 restrictions. Certain areas of industries like manufacturing and healthcare may need that secondary site mentioned earlier on, but a lot of jobs in the Internet era can be carried out from home or distributed locations.
As the ebook draws to a close, the final 2 chapters provide guidance on how the disaster recovery strategy can be tested and maintained. This is crucial for many of the reasons we’ve read through, and we learn that automating testing in a ‘set and forget’ way is usually not sufficient. Some processes need to be manually checked and validated on a schedule to protect against unexpected behaviour. Chapter 19 calls once again for working together with teams outside of IT, and this is more good advice since, rather than IT setting baseline values, for example on backup frequency and retention, it encourages lines of business to take responsibility for their own data and applications.
Overall The Backup Bible has been an enlightening read, containing some really useful guidance and personal narratives about situations the author has experienced. A substantial number of templates for documentation and process checklists are included at the end of the ebook, ranging from backup and restore documentation to data protection questionnaires to testing plans. The ebook does enough of the explanatory legwork, without assuming knowledge, to make sure that a reader from any area of the business can take something away from it.
There are of course references to Altaro software, but this is in no way a glorified advertisement. The points discussed are presented in a neutral manner with comprehensive detail for organisations to make informed decisions about the technologies and processes most suited to their own business. Rather than publishing a list of ‘must-haves’ for disaster recovery, the ebook acknowledges that businesses have different requirements and provides the tools for the reader to implement their own version based on what they have learnt through the course of the ebook.
From a small business with very little protection against a disaster, to an enterprise organisation with processes already in place, anybody interested in disaster recovery will be able to gain something from reading this ebook. The Backup Bible can be downloaded for free here.
In this post we take a look at a vRealize Operations (vROps) deployment for vSphere 7; building on the installation of vCenter 7.0 U1 and vSAN 7.0 U1. Shortly after installing vROps 8.2, vRealize Operations 8.3 was released. The install process is similar, you can read what’s new here and see the upgrade process here.
vRealize Operations is an IT operations management tool for monitoring full-stack physical, virtual, and cloud infrastructure, along with virtual machine, container, operating system, and application level insights. vROps provides performance and capacity optimisation, monitoring and alerting, troubleshooting and remediation, and dashboards and reporting. vROps also handles private cloud costings, showback, and what-if scenarios for VMware, VMware Cloud, and public cloud workloads. Many of these features were released with version 8.2, and now work more smoothly, fully integrated into the vROps user interface rather than as a standalone product. Previously vRealize Business catered for similar costing requirements, but it has since been declared end of life.
vRealize Operations can be deployed on-premises to an existing VMware environment, or consumed as Software-as-a-Service (SaaS). vRealize Operations Cloud has the same functionality, with the ongoing operational overhead of lifecycle management and maintenance taken care of by VMware. Multiple vCenter Servers or cloud accounts can be managed and monitored from a single vROps instance. For more information on vROps see the What is vRealize Operations product page.
vRealize Operations Manager 8.2 Install Guide
The vRealize Operations Manager installation for standalone instances is really straightforward, as is applying management packs for monitoring additional environments. Where the installation may get more complex is if multiple cluster nodes need to be deployed, along with remote collector nodes and/or multiple instances. If you think this may apply to you, review the complexity levels outlined in the vRealize Operations Manager 8.2 Deployment Guide.
The installation steps below walk through the process of installing vROps using the master node. All deployments start out with a master node, which in some cases is sufficient to manage itself and perform all data collection and analysis operations. Optional nodes can be added in the form of further data nodes for larger deployments, replica nodes for highly available deployments, and remote collector nodes for distributed deployments. Remote collector nodes, for example, can be used to compress and encrypt data collected at another site or another VMware Cloud platform. This could be an architecture where a solution like Azure VMware Solution is in use, with an on-premises installation of vROps. For more information on the different node types and availability setups see the deployment guide linked above.
When considering the deployment size and node design for vROps, review the VMware KB vRealize Operations Manager Sizing Guidelines, which is kept up to date with sizing requirements for the latest versions. The compute and storage allocations needed depend on your environment, the type of data collected, the data retention period, and the deployment type.
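The sizing decision can be sketched as a simple lookup. To be clear, the object-count thresholds below are placeholders I have invented for illustration, not VMware's published figures; always take the real numbers from the sizing guidelines KB for your target version. Only the small profile's 4 CPU / 16 GB matches the deployment used later in this post.

```python
# Illustrative sizing lookup. Thresholds are placeholders, NOT the
# figures from the VMware sizing KB -- consult the KB for real values.
SIZES = [  # (name, vCPU, RAM GB, illustrative max objects)
    ("extra_small", 2, 8, 250),
    ("small", 4, 16, 3500),
    ("medium", 8, 32, 8500),
    ("large", 16, 48, 15000),
]

def pick_size(object_count):
    """Return the smallest appliance profile that fits the object count."""
    for name, vcpu, ram_gb, max_objects in SIZES:
        if object_count <= max_objects:
            return name, vcpu, ram_gb
    raise ValueError("Beyond single-node sizing; consider additional data nodes")

print(pick_size(120))  # → ('extra_small', 2, 8)
```

Data retention period and the type of data collected also feed into storage sizing, which a real calculation would need to include.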
Before starting, ensure you have a static IP address ready for the master node, or ideally both a static IP address and a Fully Qualified Domain Name (FQDN) with forward and reverse DNS entries. For larger than single-node deployments check the Cluster Requirements section of the deployment guide.
The vRealize Operations Manager appliance can be downloaded in Open Virtualisation Format (OVF) here, and the release notes for v8.2.0 here. As with many VMware products a 60-day evaluation period applies. The vRealize Operations Manager OVF needs to be deployed for each vROps cluster node in the environment. Deployment and configuration of vRealize Operations Manager can also be automated using vRealize Suite Lifecycle Manager.
Log into the vSphere client and deploy the OVF (right click the data centre, cluster, or host object and select Deploy OVF Template).
The deployment interface prompts for the usual options like compute, storage, and IP address allocation, as well as the appliance size based on the sizing guidelines above. Do not include an underscore (_) in the hostname. The disk sizes (20 GB, 250 GB, 4 GB) are the same regardless of the appliance size configured. New disks can be added, but extending existing disks is not supported. Also be aware that snapshots can cause performance degradation and should not be used. For this deployment I have selected a small deployment; 4 CPU, 16 GB RAM.
Once deployed, browse to the appliance FQDN or IP address to complete the appliance setup. You can double-check the IP address from the virtual machine page in vSphere or the remote console. For larger environments, and additional settings like custom certificates, high availability, and multiple nodes, select New Installation. In this instance, since vROps will be managing only a single vCenter with 3 or 4 hosts, I select the Express Installation.
The vRealize Operations Manager appliance will be set as the master node; this configuration can be scaled out later on if needed. Click Next to continue.
Set an administrator password of at least 8 characters, with an uppercase and lowercase letter, a number, and a special character, then click Next. Note that the user name is admin, not administrator.
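The password rules above can be expressed as a quick validation sketch, handy if you generate appliance credentials as part of an automated build:

```python
import re

# Admin password rules from the setup wizard: at least 8 characters,
# with an uppercase letter, a lowercase letter, a digit, and a
# special character.
def valid_admin_password(pw: str) -> bool:
    return (len(pw) >= 8
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"[a-z]", pw) is not None
            and re.search(r"[0-9]", pw) is not None
            and re.search(r"[^A-Za-z0-9]", pw) is not None)

print(valid_admin_password("VMware1!"))   # → True
print(valid_admin_password("password1"))  # → False (no uppercase or special)
```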
Click Finish to apply the configuration. A loading bar preparing vRealize Operations Manager for first use will appear. This stage can take up to 15 minutes.
Log in with the username admin and the password set earlier.
There are a few final steps to configure before gaining access to the user interface. Click Next.
Accept the End User License Agreement (EULA) and click Next.
Enter the license information and click Next.
Select or deselect the Customer Experience Improvement Program (CEIP) option and click Next. Click Finish to progress to the vROps user interface.
Finally we’re into the vRealize Operations home page; take a look around, or go straight into Add Cloud Account.
Select the account type, in this case we’re adding a vCenter.
Enter a name for the account, and the vCenter Server FQDN or IP address. I’m using the default collector group since we are only monitoring a small lab environment. You can test using Validate Connection, then click Add.
Give the vCenter account a few minutes to sync up; the status should change to OK. A message in the right-hand corner will notify you that the vCenter collection is in progress.
Back at the home page a prompt is displayed to set the currency; configurable under Administration, Management, Global Settings, Currency. In this case I’ve set GBP(£). For accurate cost comparisons and environment specific optimisations you can also add your own costs for things like hardware, software, facilities, and labour. Cost data can be customised under Administration, Configuration, Cost Settings.
A common next step is to configure access using your corporate Identity Provider, such as Active Directory. Click Administration, Access, Authentication Sources, Add, and configure the relevant settings.
Multiple vCenter Servers can be managed from the vRealize Operations Manager interface. Individual vCenter Servers can also access vROps data from the vSphere client, from the Menu dropdown and vRealize Operations. A number of nested ESXi hosts are shut down in this environment which is generating the critical errors in the screenshot.