Disaster Recovery Strategy – The Backup Bible Review

The following post reviews The Backup Bible, a 204 page ebook written by Eric Siron and published by Altaro Software. I’ve decided to break this down into a short discussion around each section, 3 in total, to help with the flow of the text. The Backup Bible can be downloaded for free here.

Initial thoughts on downloading the ebook and flicking through: clearly a lot of research and work has gone into this, with content applicable to business outcomes and objectives as well as low level technical detail. A substantial number of example process documents are included, which means practical outputs can be applied by the reader straight away. The graphics, colour scheme, font, and text size all look like they’ll make the ebook easy to follow, so let’s jump in.

Introduction and part 1: creating a backup & disaster recovery strategy

Chapter 1 is short and to the point, providing a checklist for the scope of disaster recovery. The checklist serves as a nice starting point for organisations to expand into sub-categories based on their own goals, requirements, or industry specific regulations. An interesting observation made during the introduction is that human nature leans towards planning for best outcomes, rather than worst. As a result, organisations design systems with defences for known common failures (think disk, network adapter) but less often design with catastrophic failure in mind. Rather than designing systems around the principle of expecting failure, an assumption is made that they will operate as expected. We’ll discuss more points made in the ebook around shifting that mentality later.

A further key takeaway from the introduction is that disaster recovery planning is not a one-time event, but an ongoing process. Plans and requirements will need to adapt as new services are added and older services change. This is why the ebook focuses heavily on implementing a disaster recovery strategy that stays agile, as opposed to a singular process or policy. The disaster recovery strategy will be driven by the overall business strategy.

Some great talking points are introduced in chapter 2 around popular responses as to why disaster recovery fails, or is not properly implemented. In most cases expect a combination of many, if not all, of these reasons. Building disaster recovery into the initial design of a greenfield environment can be relatively easy, in comparison with retrospectively fitting it into brownfield environments. Older systems and infrastructure were generally deployed in silos yet interlinked with dependencies. New services have been deployed over time, bolted onto existing infrastructure, with changing topologies. Disaster recovery plans need to be agile enough to cater for different systems, and keep up with both technical and business changes.

The chapter moves on to talk about common risks, and determining key stakeholders. Again, useful examples that can be adapted and increased based on your industry and organisational structure, such as protecting physical assets as well as digital. Gaining insights into business priorities and current risks from different people within the organisation boosts your chances of building successful plans and processes.

Chapter 3 starts out with some easy to understand explanations and graphics on Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). It’s good to set out why these are important, since different systems will have different RTO and RPO values. Defining and setting these values helps with the trade-off between budget and availability.

The Backup Bible – Recovery Time Objective

Another tip here that really resonates with me is to translate business impact into monetary value. The suggested “How long can we operate without this system?” suffices in most cases, but to create urgency the ebook recommends asking “How much does this system cost us per hour when offline?”. Hearing that the business is losing £x an hour hits harder than saying part of the website is down. Establishing RPOs for different events is also something often overlooked; for example, recovery from a hardware failure will most likely differ from a site outage or ransomware attack.
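To make the cost-per-hour question concrete, here is a minimal Python sketch. The system name, the £2,500 hourly figure, and the RTO/RPO values are hypothetical placeholders of mine, not numbers from the ebook. It multiplies the hourly cost by the outage length, and checks whether a backup interval can satisfy an RPO, on the assumption that worst-case data loss equals the gap between backups.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    cost_per_hour: float  # estimated loss per hour offline, in GBP (hypothetical)
    rto_hours: float      # agreed Recovery Time Objective
    rpo_hours: float      # agreed Recovery Point Objective


def outage_cost(system: SystemImpact, hours_offline: float) -> float:
    """Monetary impact of an outage of the given length."""
    return system.cost_per_hour * hours_offline


def meets_rpo(system: SystemImpact, backup_interval_hours: float) -> bool:
    """Assumes worst-case data loss equals the gap between two backups."""
    return backup_interval_hours <= system.rpo_hours


# Hypothetical example system
web_shop = SystemImpact("web shop", cost_per_hour=2500.0, rto_hours=4, rpo_hours=1)

print(outage_cost(web_shop, web_shop.rto_hours))      # cost of an outage lasting the full RTO
print(meets_rpo(web_shop, backup_interval_hours=24))  # a nightly backup against a 1 hour RPO
```

Running the same calculation per event type (hardware failure, site outage, ransomware) is one way to show stakeholders why different events may justify different objectives.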

The first 3 chapters laid the groundwork for capturing business requirements. Chapter 4 helps break business items down and map them to common technical solutions, applied at both the software and hardware level. Solid explanations of fault tolerance and high availability are followed up with in-depth descriptions of applying these terms to storage, memory, and other systems through clustering, replication, and load-balancing. Although fault tolerance and high availability contribute to IT operations, they absolutely do not replace backups. This point is positioned alongside an extensive list of how to assess potential backup solutions against your individual requirements when evaluating software.

The Backup Bible – Hard Drive Fault Tolerance

Part 2: backup best practices in action

The second part of The Backup Bible begins by setting out some example architectures for backup software placement. Security principles are also now being worked into the disaster recovery strategy, alongside backup software and storage, whilst those not currently in a backup or system administrator role will learn about the differences between crash-consistent and application-consistent backups. Chapter 5 again allows for practical application, in the form of a 2-phase backup software selection and testing plan. The example tracking table provided can be used for feature comparisons, and kept for audit or reporting purposes.

Options for backup storage targets follow in chapter 6; another section of comprehensive explanations detailing the advantages and disadvantages of different storage types. The information allows consumers to apply each scenario to their own requirements and make informed decisions when implementing their disaster recovery strategy. Most examples are naturally data centre storage types, but perhaps where cloud storage targets differ is that they should be factored into an organisation’s overall cloud strategy too. Factors like egress charges, connectivity, and data offshoring governance or regulations for cloud targets will likely need building into the wider cloud computing picture.

Chapter 7 moves on to a really important topic, and I’m glad this section looks pretty weighty. In many industries, securing and protecting backup data can be just as important as the availability of the data itself. The opening pages walk through securing backups using examples like encryption, account management, secret/key vaults, firewalls, network segmentation and air-gapping. The examples also keep to the functionality within your existing backup, storage, network and security tools, so as not to over-engineer a solution that will be difficult to support in the future. The chapter moves on to talk about layering security practices, and how these can be weighed up using risk assessments, and then built into policy if desired. Very well made points that are realistic and applicable to real world scenarios.

An example backup software deployment checklist is covered in chapter 8, before chapter 9 expands more on documentation. This is another step often missed in the rush to get solutions out of the door. In disaster recovery, documentation is especially important so that on-call staff and those not familiar with the setup can carry out any required actions quickly and correctly.

The next stage in the process is to implement your organisation’s data protection needs in the form of backup schedule(s) – usually multiple. Various examples of backup schedules are used in chapter 10; placing the most value on full backups without dependencies on other data sources, and filtering through incremental, differential, or delta backups for convenience. A recommendation is made to time backups with business events, such as month end processes, when high value changes take place, or maintenance such as monthly operating system updates. Once again a real focus is placed firstly on building a technical strategy around business requirements, and secondly on communication and documentation to ensure that strategy is as effective as possible.
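The reason full backups carry the most value becomes clearer when you work out what a restore depends on. The Python sketch below is my own illustration, not taken from the ebook. It assumes a differential holds every change since the last full backup, and an incremental holds only the changes since the previous backup of any kind. Real products may define these terms differently, so treat the logic as a model only.

```python
from typing import List, Tuple

# (label, kind) where kind is "full", "differential" or "incremental"
Backup = Tuple[str, str]


def restore_chain(backups: List[Backup]) -> List[str]:
    """Return the labels needed to restore to the most recent point.

    `backups` must be in chronological order.
    """
    # Everything hangs off the most recent full backup
    last_full = max(
        (i for i, (_, kind) in enumerate(backups) if kind == "full"), default=None
    )
    if last_full is None:
        raise ValueError("no full backup available, nothing can be restored")

    chain = [backups[last_full][0]]
    later = backups[last_full + 1:]

    # Only the newest differential is needed, as it covers all changes since the full
    diff_positions = [i for i, (_, kind) in enumerate(later) if kind == "differential"]
    start = 0
    if diff_positions:
        newest = diff_positions[-1]
        chain.append(later[newest][0])
        start = newest + 1

    # Every incremental after that point is required, in order
    chain.extend(label for label, kind in later[start:] if kind == "incremental")
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

The incremental week needs every backup since Sunday to be intact, while the differential week needs only two. That trade-off between storage used and restore dependencies is worth writing into the schedule documentation.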

The final 2 chapters of this section describe maintaining the backup solution beyond its initial implementation. This includes monitoring and testing backups for data integrity, and keeping systems up to date and in good health. Security patches and upgrades will be in the form of scheduled or automatic regular updates, and less frequent one-off events like drivers, firmware, or hot patches for exposed vulnerabilities or fixes. Often newly implemented systems look great on day 1, but come back on day x in 6 or 12 months and things can look different. It’s refreshing to see the ebook call this out and plan for the ongoing lifecycle of a solution as well as the initial design and implementation.
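One simple form of integrity testing is to record a checksum when a backup is written and compare it again later. The sketch below uses SHA-256 from Python's standard library. The function names and sample data are mine, and a matching hash only shows the file has not changed; it does not prove the backup can actually be restored.

```python
import hashlib
import io
from typing import BinaryIO


def stream_sha256(fileobj: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file-like object in chunks so large backups never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(fileobj: BinaryIO, recorded_digest: str) -> bool:
    """Compare the current hash with the one recorded when the backup was taken."""
    return stream_sha256(fileobj) == recorded_digest.lower()


# Illustration with in-memory data; a real check would open the backup file in "rb" mode
recorded = stream_sha256(io.BytesIO(b"payroll-2021-01"))
print(backup_is_intact(io.BytesIO(b"payroll-2021-01"), recorded))  # unchanged copy
print(backup_is_intact(io.BytesIO(b"payroll-2021-0X"), recorded))  # altered copy
```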

The Backup Bible – Firewalls

Part 3: disaster recovery & business continuity blueprint

In part 3 the ebook dives into business continuity, focusing on areas beyond technology; like making sure that office or physical space, people, and business dependencies are all taken care of to start recovery operations as quickly as possible. A theme that has remained consistent is asking questions and working with multiple subject-matter experts, in turn exposing more questions and considerations. Example configurations of hot, warm, and cold secondary sites are used, alongside the required actions to prepare each site should business continuity ever be invoked. These topics really get the reader thinking of the bigger picture, and how some of the technical planning earlier feeds into business priorities.

Chapter 15 moves on to replication with a thorough description on replication types, and when they are or are not applicable. Replication enables rapid failover to another site, and does not replace but rather complements backups where budgets allow. Clear direction is given here to ensure replication practices are understood, along with which tasks can be automated but which need to remain manual. The ebook points out that if configured incorrectly replication can also add overhead, and put data at risk, so it’s good to see both sides of the coin. A helpful tip in this chapter is to use external monitoring where possible, rather than rely on built-in email alerting, which could fail along with the system it is supposed to be monitoring!

The use of cloud services is probably beyond the scope of this ebook. That said, because cloud storage is becoming an option for backup targets it is mentioned, with the implication that it currently serves as more of a comfort blanket. Chapter 16 adds some further considerations, like not making an assumption that backups or disaster recovery are included in a service just because it is hosted in the cloud. Cloud services need to be architected for availability in the same way they do on-premises. Some Software-as-a-Service (SaaS) offerings may include this functionality but it will be explicitly stated. The scope of using cloud for disaster recovery obviously varies between organisations already heavily invested in the cloud, or multiple cloud providers, and those not using it at all. Despite there being more talking points I think it’s right to keep the conversation on topic to avoid going down a rabbit hole.

We’re now at a stage where disaster recovery has been invoked and chapter 17 runs through what some of the business processes may look like. There’s a lot of compassion shown in this chapter and whilst flexibility and considerations for people is a given, going a step further and writing it into the business continuity documentation is something I must admit I hadn’t previously thought about. Having plans laid out ready as to how employees will be contacted in the event of a disaster will also save time and reduce confusion. There’s some good information here on implementing home working too, and this is particularly relevant as we still navigate through COVID-19 restrictions. Certain areas of industries like manufacturing and healthcare may need that secondary site mentioned earlier on, but a lot of jobs in the Internet-era can be carried out from home or distributed locations.

As the ebook begins to draw to a close, the latter 2 chapters produce guidance on how the disaster recovery strategy can be tested and maintained. This is crucial for many of the reasons we’ve read through, and we learn that automating testing in a ‘set and forget’ way is usually not sufficient. Some processes need to be manually checked and validated on a schedule to protect against unexpected behaviour. Chapter 19 calls once again on working together with teams outside of IT, and this is more good advice since rather than IT setting baseline values, for example on backup frequency and retention, it encourages line of business to take responsibility for their own data and applications.

The Backup Bible – Systems Fail

Summary

Overall The Backup Bible has been an enlightening read, containing some really useful guidance and personal narratives about situations the author has experienced. A substantial number of templates for documentation and process checklists are included at the end of the ebook, ranging from backup and restore documentation to data protection questionnaires to testing plans. The ebook does enough of the explanatory legwork, without assuming knowledge, to make sure that a reader from any area of the business can take something away from it.

There are of course references to Altaro software, but this is in no way a glorified advertisement. The points discussed are presented in a neutral manner with comprehensive detail for organisations to make informed decisions about technologies and processes most suited to their own business. Rather than publishing a list of ‘must-haves’ for disaster recovery, the ebook acknowledges that businesses have different requirements and provides the tools for the reader to go about implementing their own version based on what they have learnt through the course of the ebook.

From a small business with very little protection against a disaster, to an enterprise organisation with processes already in place, anybody interested in disaster recovery will be able to gain something from reading this ebook. The Backup Bible can be downloaded for free here.

How to Install vSphere 7.0 – vRealize Operations Manager 8.2

Introduction

In this post we take a look at a vRealize Operations (vROps) deployment for vSphere 7, building on the installation of vCenter 7.0 U1 and vSAN 7.0 U1. Shortly after installing vROps 8.2, vRealize Operations 8.3 was released. The install process is similar; you can read what’s new here and see the upgrade process here.

vRealize Operations is an IT operations management tool for monitoring full-stack physical, virtual, and cloud infrastructure, along with virtual machine, container, operating system, and application level insights. vROps provides performance and capacity optimisation, monitoring and alerting, troubleshooting and remediation, and dashboards and reporting. vROps also handles private costings, showback, and what-if scenarios for VMware, VMware Cloud, and public cloud workloads. Many of these features have been released with version 8.2, and now work more smoothly as a fully integrated part of the vROps user interface, rather than as a standalone product. Previously vRealize Business would cater for similar costing requirements, but has since been declared end of life.

vRealize Operations can be deployed on-premises to an existing VMware environment, or consumed Software-as-a-Service (SaaS). vRealize Operations Cloud has the same functionality, with the ongoing operational overhead of lifecycle management and maintenance taken care of by VMware. Multiple vCenter Servers or cloud accounts can be managed and monitored from a single vROps instance. For more information on vROps see the What is vRealize Operations product page.

vRealize Operations Manager 8.2 Install Guide

The vRealize Operations Manager installation for lone instances is really straightforward, as is applying management packs for monitoring additional environments. Where the installation may get more complex is if multiple cluster nodes need to be deployed, along with remote collector nodes, and/or multiple instances. If you think this may apply to you, review the complexity levels outlined in the vRealize Operations Manager 8.2 Deployment Guide.

The installation steps below walk through the process of installing vROps using the master node. All deployments start out with a master node, which in some cases is sufficient to manage itself and perform all data collection and analysis operations. Optional nodes can be added in the form of; further data nodes for larger deployments, replica nodes for highly available deployments, and remote collector nodes for distributed deployments. Remote collector nodes, for example, can be used to compress and encrypt data collected at another site or another VMware Cloud platform. This could be an architecture where a solution like Azure VMware Solution is in use, with an on-premises installation of vROps. For more information on the different node types and availability setups see the deployment guide linked above.

When considering the deployment size and node design for vROps, review the VMware KB vRealize Operations Manager Sizing Guidelines, which is kept up to date with sizing requirements for the latest versions. The compute and storage allocations needed depend on your environment, the type of data collected, the data retention period, and the deployment type.

Installation

Before starting ensure you have a static IP address ready for the master node, or (ideally and) a Fully Qualified Domain Name (FQDN) with forward and reverse DNS entries. For larger than single node deployments check the Cluster Requirements section of the deployment guide.
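If you want to confirm the forward and reverse DNS entries before deploying, a quick round-trip check can be scripted. The Python sketch below uses `socket.gethostbyname` and `socket.gethostbyaddr` from the standard library. The FQDN and IP address in the usage lines are hypothetical lab values, and the resolver functions are passed in as parameters purely so the logic can be tried without a live DNS server.

```python
import socket
from typing import Callable


def dns_round_trip(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> bool:
    """True when the FQDN resolves to an IP and that IP resolves back to the same FQDN."""
    try:
        ip = forward(fqdn)
        name, _aliases, _addresses = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return False
    return name.rstrip(".").lower() == fqdn.rstrip(".").lower()


# Offline illustration with stand-in resolvers (hypothetical name and address)
fake_forward = lambda name: "192.168.1.50"
fake_reverse = lambda ip: ("vrops.lab.local", [], [ip])
print(dns_round_trip("vrops.lab.local", fake_forward, fake_reverse))

# Against real DNS, call it with the defaults: dns_round_trip("vrops.lab.local")
```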

The vRealize Operations Manager appliance can be downloaded in Open Virtualisation Format (OVF) here, and the release notes for v8.2.0 here. As with many VMware products a 60 day evaluation period is applied. The vRealize Operations Manager OVF needs to be deployed for each vROps cluster node in the environment. Deployment and configuration of vRealize Operations Manager can also be automated using vRealize Suite Lifecycle Manager.

vRealize Operations Manager download

Log into the vSphere client and deploy the OVF (right click the data centre, cluster, or host object and select Deploy OVF Template).

The deployment interface prompts for the usual options like compute, storage, and IP address allocation, as well as the appliance size based on the sizing guidelines above. Do not include an underscore (_) in the hostname. The disk sizes (20 GB, 250 GB, 4 GB) are the same regardless of the appliance size configured. New disks can be added, but extending existing disks is not supported. Also be aware that snapshots can cause performance degradation and should not be used. For this deployment I have selected a small deployment; 4 CPU, 16 GB RAM.
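The underscore restriction is easy to trip over when naming appliances from a script or spreadsheet. As a rough pre-check, the Python sketch below validates a hostname against the usual DNS label rules (letters, digits, and hyphens, with no leading or trailing hyphen), which rejects underscores as a side effect. The regular expression and example names are my own assumptions, not the appliance's validation logic.

```python
import re

# One DNS label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def valid_hostname(name: str) -> bool:
    """Check each dot-separated label; underscores fail because they are not in the set."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 253 and all(_LABEL.match(label) for label in labels)


print(valid_hostname("vrops-01.lab.local"))  # hyphen is fine
print(valid_hostname("vrops_01.lab.local"))  # underscore is rejected
```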

Once deployed browse to the appliance FQDN or IP address to complete the appliance setup. You can double check the IP address from the virtual machine page in vSphere or the remote console. For larger environments and additional settings like custom certificates, high availability, and multiple nodes select New Installation. In this instance since vROps will be managing only a single vCenter with 3 or 4 hosts I select the Express Installation.

vRealize Operations Manager start page

The vRealize Operations Manager appliance will be set as the master node, this configuration can be scaled out later on if needed. Click Next to continue.

vRealize Operations Manager new cluster setup

Set an administrator password at least 8 characters long, with an uppercase and lowercase letter, number, and special character, then click Next. Note that the user name is admin, and not administrator.
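If the password is generated or stored by a script, the rules above can be checked before you reach the setup page. This is a minimal Python sketch of those rules as I have described them. Treating anything in `string.punctuation` as a special character is my assumption, as the appliance may accept a narrower or wider set.

```python
import string


def meets_password_rules(password: str) -> bool:
    """At least 8 characters with an uppercase, lowercase, digit, and special character."""
    return (
        len(password) >= 8
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)  # assumed special set
    )


print(meets_password_rules("VMware1!"))  # satisfies every rule
print(meets_password_rules("vmware1!"))  # no uppercase letter
```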

vRealize Operations Manager administrator credentials

Click Finish to apply the configuration. A loading bar preparing vRealize Operations Manager for first use will appear. This stage can take up to 15 minutes.

vRealize Operations Manager initial setup

Log in with the username admin and the password set earlier.

vRealize Operations Manager login page

There are a few final steps to configure before gaining access to the user interface. Click Next.

vRealize Operations Manager final setup

Accept the End User License Agreement (EULA) and click Next.

vRealize Operations Manager terms and conditions

Enter the license information and click Next.

vRealize Operations Manager license information

Select or deselect the Customer Experience Improvement Program (CEIP) option and click Next. Click Finish to progress to the vROps user interface.

vRealize Operations Manager final setup

Finally we’re into the vRealize Operations home page. Take a look around, or go straight into Add Cloud Account.

vRealize Operations Manager home page

Select the account type, in this case we’re adding a vCenter.

vRealize Operations Manager account types

Enter a name for the account, and the vCenter Server FQDN or IP address. I’m using the default collector group since we are only monitoring a small lab environment. You can test using Validate Connection, then click Add.

vRealize Operations Manager add vCenter Server

Give the vCenter account a few minutes to sync up, the status should change to OK. A message in the right-hand corner will notify that the vCenter collection is in progress.

vRealize Operations Manager vCenter collection

Back at the home page a prompt is displayed to set the currency; configurable under Administration, Management, Global Settings, Currency. In this case I’ve set GBP(£). For accurate cost comparisons and environment specific optimisations you can also add your own costs for things like hardware, software, facilities, and labour. Cost data can be customised under Administration, Configuration, Cost Settings.

vRealize Operations Manager quick start page

A common next step is to configure access using your corporate Identity Provider, such as Active Directory. Click Administration, Access, Authentication Sources, Add, and configure the relevant settings.

Multiple vCenter Servers can be managed from the vRealize Operations Manager interface. Individual vCenter Servers can also access vROps data from the vSphere client, from the Menu dropdown and vRealize Operations. A number of nested ESXi hosts are shut down in this environment which is generating the critical errors in the screenshot.

vRealize Operations Manager overview page

Featured image by Jonas Svidras on Unsplash

How to Install vSphere 7.0 – vSAN 7.0

Introduction

This second post in a new lab series provides a walkthrough for installing the latest iteration of vSAN 7. At the time of writing the latest version of vSAN is vSAN 7.0 Update 1. To read about what’s new see vSphere 7 and vSAN 7 Headline New Features.

VMware vSAN is a software-defined storage solution baked directly into the vSphere hypervisor. vSAN enables aggregation of local or directly-attached devices and pools them together across hosts in a vSphere cluster to provide a single shared storage pool. Functionality is abstracted from the underlying hardware and managed at a software level, within vCenter, to provide granular policy based availability and controls. Non-disruptive scale out can be achieved by adding more ESXi hosts, either in the same cluster or a new cluster, and scale up by adding more disks to the existing hardware. Multiple vSAN clusters can be created and managed within a single vCenter Server. Since vSAN is already implemented directly into ESXi, activating the functionality simply requires planning and enabling the configuration, along with the appropriate VMware vSAN licenses.

In this example vSAN will be configured in a lab environment using a 2 host cluster (Intel NUC Bean Canyon) running vSphere 7 U1C, with a third node acting as the vSAN witness. As of vSAN 7.0 U1 a single witness appliance can support up to 64 2-node clusters. If you’re looking for more information on running a vSphere lab on the Intel NUC range check out the VMware Homelab section of virten.net, which has some great guides and resources.

vSAN 7.0 Install Guide

vSAN can be configured in an all-flash or hybrid setup. In a hybrid setup, flash is used for the cache with spinning disks providing the capacity tier. Although all local capacity devices are pooled together and shared across hosts in the cluster, an optimal vSAN configuration will contain hosts with the same or similar physical storage configurations, balancing storage devices consistently across the cluster. That said, hosts without any contributing storage can also join the cluster and run virtual machines. In this type of setup, planning the deployment to cover fault tolerance and protection against loss of specific contributing nodes is of particular importance.

All hosts contributing storage devices to the cluster must include at least one flash device for local cache, alongside at least one capacity device. For hybrid configurations, the flash device must be a minimum of 10% of the anticipated consumed storage of the capacity tier, and this should account for future growth to prevent reduced performance over time as the consumed storage grows. The cache for each host in any setup does not count towards the overall size of the shared datastore. Cache and capacity devices in a host form one or more disk groups, outlined in the high level image below. For more information on capacity and sizing considerations when designing a vSAN deployment, review the VMware vSAN Design Guide and the Designing and Sizing a vSAN Cluster documentation.
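The 10% guideline for hybrid configurations is simple arithmetic, but it is worth writing down with growth included. The Python sketch below is my own illustration. The 2,000 GB consumed figure and the 1.5x growth factor are hypothetical, and the design guide remains the reference for real sizing.

```python
def min_cache_gb(
    anticipated_consumed_gb: float,
    growth_factor: float = 1.0,  # e.g. 1.5 to allow for 50% growth (assumption)
    ratio: float = 0.10,         # 10% guideline for hybrid configurations
) -> float:
    """Minimum flash cache, based on anticipated consumed capacity rather than raw capacity."""
    return anticipated_consumed_gb * growth_factor * ratio


# Hypothetical cluster expecting 2,000 GB consumed today
print(min_cache_gb(2000))                     # sized for today's consumption
print(min_cache_gb(2000, growth_factor=1.5))  # sized with headroom for growth
```

Remember that the result is cache only, so it adds nothing to the size of the shared datastore.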

VMware vSAN high level overview from the vSAN 7.0 Planning and Deployment documentation

VMware vSAN is an enterprise solution and supports all VMware features that rely on shared storage, like High Availability, Distributed Resource Scheduler, and Storage vMotion. vSAN also includes features like stretched clustering, and fault domain implementations. Hosts in a vSAN cluster can also mount other VMFS and NFS datastores, although vSAN itself does not require or rely on any kind of external storage or Storage Area Network (SAN). You can find more information in the vSAN Planning and Deployment – VMware vSphere 7.0 documentation, which should be studied before configuring vSAN, along with the relevant release notes – in this example I am using vSAN 7.0 Update 1.

System Requirements

  • VMware vSAN can be built on the following hardware:
    • vSAN ReadyNode – preconfigured solutions using hardware tested and certified for vSAN by the server OEM and VMware
    • Turn key deployments – fully packaged Hyper-Converged Infrastructure (HCI) solutions like Dell EMC VxRail
    • Custom solution – hardware components compiled by the user, all hardware used with vSphere 7 and vSAN 7 must be listed in the VMware Compatibility Guide
  • To check version compatibility with other VMware products, see also the VMware Product Interoperability Matrices.
  • A standard vSAN cluster needs at least 3 hosts, with a maximum of 64. At least 4 hosts are recommended for maximum availability due to limitations around maintenance and protection after a failure with 3-host clusters. The 2-host vSAN cluster with witness is also a separate configuration and exception.
  • Each physical host contributing capacity to the vSAN cluster requires:
    • 1 x SAS or SATA HBA, or RAID controller in passthrough mode
    • 1 x SAS or SATA SSD, or PCIe flash device, for the cache
    • At least 1 x (further) SAS or SATA SSD, or PCIe flash device, for capacity in an all-flash disk group, OR; at least 1 x SAS or NL-SAS magnetic disk, for capacity in a hybrid disk group, with no existing partition configuration in both cases
    • A minimum of 8 GB RAM, but in most cases it is preferable to have at least 32 GB RAM
    • Dedicated 1 Gbps bandwidth for hybrid configuration (10 Gbps recommended), OR; dedicated or shared 10 Gbps for all-flash configurations (25 Gb, 40 Gb, and 100 Gb are also supported) – for best results new environments should consider 25 Gbps connectivity using vSphere Distributed Switches with Network I/O Control (vSphere Standard Switches are also supported but do not offer QoS)
    • A configured VMkernel network adapter for vSAN traffic
    • A maximum network latency of 1 ms RTT for standard vSAN clusters (200 ms to a witness node, 5 ms for stretched clusters)
    • Layer 2 or Layer 3 network connectivity between hosts in the cluster (jumbo frames are supported but not required, if jumbo frames are already in use then the setting should be configured end-to-end across the environment)
    • A valid vSAN license, normally managed per CPU although per OSI licensing is available for branch office configurations
  • When sizing a vSAN cluster keep in mind the total capacity of all disks pooled together is only the raw capacity. True payload capacity can be calculated using the primary level of failures to tolerate, in conjunction with the failure tolerance method (RAID). For more information review the Designing and Sizing a vSAN Cluster documentation.
  • Prior to vSAN 7.0 U1, a general recommendation to keep the vSAN datastore below 70% usage was made. The latest release has made substantial improvements to the usage of free capacity, and as such the required free space can be calculated per cluster based on variables outlined in the Designing for Capacity section of the VMware vSAN Design Guide.
  • It is good practice to synchronise ESXi and vCenter versions, and run the latest release. Hosts should also be in the same L2 subnet for best networking performance.
  • If your environment has firewalls review the list of Required ports for vSAN.
  • For larger enterprise environments see also the vSAN Configuration Limits.
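To illustrate the raw versus payload capacity point from the list above, the Python sketch below divides raw capacity by the space multiplier commonly quoted for each failure tolerance method: 2x, 3x, and 4x for RAID-1 mirroring with 1, 2, and 3 failures to tolerate, about 1.33x for RAID-5, and 1.5x for RAID-6. It is a rough estimate under my own simplifications. It ignores slack space, metadata overhead, deduplication, and compression, so use the sizing documentation for real designs.

```python
# Space consumed per unit of payload, keyed by (method, failures to tolerate)
OVERHEAD = {
    ("RAID-1", 1): 2.0,    # two full copies
    ("RAID-1", 2): 3.0,    # three full copies
    ("RAID-1", 3): 4.0,    # four full copies
    ("RAID-5", 1): 4 / 3,  # 3 data + 1 parity
    ("RAID-6", 2): 1.5,    # 4 data + 2 parity
}


def usable_capacity_gb(raw_gb: float, method: str, failures_to_tolerate: int) -> float:
    """Estimate payload capacity from raw capacity (simplified, see assumptions above)."""
    try:
        overhead = OVERHEAD[(method, failures_to_tolerate)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {method} with FTT={failures_to_tolerate}"
        ) from None
    return raw_gb / overhead


print(usable_capacity_gb(1000, "RAID-1", 1))  # mirroring halves the raw capacity
print(usable_capacity_gb(1200, "RAID-6", 2))
```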

vSAN Activation

In this example we’ll use the vSphere Cluster Quickstart page to configure vSAN. Quickstart consolidates the storage and networking workflows required to activate vSAN. A new cluster has been created containing 2 ESXi hosts running 7.0 U1C. The hosts are in maintenance mode and have no existing datastores or partition information beyond the standard boot disk. Both hosts are using PCIe flash devices in passthrough mode.

A third host will act as the witness node. The witness for a 2-host vSAN cluster needs to have available disks for writing metadata; at least 10 GB cache and 15 GB capacity. All 3 hosts need a VMkernel port configured. Since this is a lab environment, with limited physical connections and bandwidth, I have configured the management vmk port to also be used for vSAN traffic. The vmk port is a virtual adapter used to handle VMware service traffic for various functionality. If you need guidance on setting up the VMkernel adapter for vSAN, see the How to Configure vSAN VMkernel Networking Knowledge Base page.

Shared vmk0 for management and vSAN traffic (lab only)

Now that the VMkernel ports are set up for vSAN traffic, and there is IP-reachability between the vSAN cluster hosts and witness node, we can start the vSAN configuration. Select the cluster in the vSphere client and click Configure > Quickstart. For stage 1 click Edit and select the vSAN service. After a couple of seconds the pre-requisite health checks in stage 2 are complete. Providing no issues arise, move on to stage 3 and click Configure.

vSAN Configuration Quickstart

Configure the network settings for the vSAN cluster. The Quickstart setup uses vSphere Distributed Switches, which are recommended, although vSphere Standard Switches are also supported. In my lab, since I already enabled vSAN traffic on the management port, I can skip the Distributed Switch setup, and click Next.

vSAN cluster deployment network configuration

Configure the vSAN cluster settings, like encryption, compression, and deduplication, as required. In this example I am using the Two node vSAN cluster deployment type. Click Next.

vSAN cluster deployment type

Select the disks and tier to be claimed for the vSAN cluster. Remember that vSAN can only use local or direct-attached storage, and not remote storage. In this example 2 x 500 GB flash devices have been allocated to the capacity tier, and 2 x 50 GB flash devices have been allocated to the cache tier. The total of the claimed disks is 1.07 TB. This does not provide any component failure protection and is only for lab purposes. I accept the recommended configuration and click Next.
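The 1.07 TB total can be reproduced with a couple of lines of Python, assuming the interface converts GB to TB using a factor of 1,024 (my assumption, since it matches the figure shown). The sketch also separates out the capacity tier, since the cache devices do not add to the size of the shared datastore.

```python
capacity_tier_gb = [500, 500]  # one flash capacity device per host
cache_tier_gb = [50, 50]       # one flash cache device per host

claimed_gb = sum(capacity_tier_gb) + sum(cache_tier_gb)
claimed_tb = claimed_gb / 1024  # assumes binary conversion, as the UI figure suggests

print(f"Claimed: {claimed_tb:.2f} TB")                          # Claimed: 1.07 TB
print(f"Capacity tier (raw): {sum(capacity_tier_gb)} GB")       # Capacity tier (raw): 1000 GB
```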

vSAN cluster deployment storage types

Since my vSAN cluster is only 2-nodes, I need to add a witness host. The witness host, with available disks for metadata, and vSAN enabled VMkernel adapter for communication, is selected and passes the compatibility checks. Click Next to continue.

vSAN cluster deployment with witness host

Claim the disks for the witness host to use, in this case I have allocated a 10 GB disk for the cache tier metadata, and 15 GB disk for the capacity tier metadata. Click Next to continue.

vSAN cluster deployments with witness host disks

Review the settings configured and click Finish to deploy the vSAN configuration. Although the Quickstart interface returns a message pretty quickly saying the cluster is configured, keep an eye on activity in the Recent Tasks pane as there is likely still configuration taking place.

vSAN cluster deployment review and finalise

The easiest way to check the vSAN status is to select the cluster, click Monitor, and scroll down to vSAN. Skyline Health will show the vSAN health checks associated with the cluster. You can also see physical and virtual object states, capacity, and performance.

vSAN capacity monitoring

To view or manually edit the cluster settings select the cluster, click Configure, and scroll down to vSAN. Services shows the available vSAN services and their configuration; in my lab environment most of these are disabled. Disk Management shows the configured disk groups and their health state. In this lab scenario I only have 2 fault domains configured.

vSAN disk group configurations

Fault domains allow grouping together of physical hosts to protect against common failures like chassis or racks. It is best practice to configure consistent fault domains with the same number of hosts across the environment. Consider the impact on placement of data and overall number of host failures to tolerate when configuring fault domains. Clearly for a lab environment or a 2-node cluster in a small branch office setup fault domains and data availability cannot be applied in the same way as larger deployments. The following resources will help with designing such environments:

Finally, if you want to create a new storage policy to apply to the vSAN datastore, or create multiple granular policies that can be applied at VM or VMDK level, this can be done from the Menu dropdown, Policies and Profiles, VM Storage Policies. If you need more information on the policy options available review the VM Storage Policy Design Considerations documentation.

Featured image by Jonas Svidras on Unsplash