Site Recovery Manager Configuration and Failover Guide

This post will walk through the configuration of Site Recovery Manager; we’ll protect some virtual machines with a Protection Group, and then fail over to the DR site using a Recovery Plan. The prerequisites for this post are that Site Recovery Manager (SRM) and the Storage Replication Adapter (SRA) are installed at both sites along with the corresponding vSphere infrastructure, and that replication is configured on the storage array. It is also possible to use vSphere Replication. For more information see the previous posts referenced below.

Part 1 – Nimble Storage Integration with SRM

Part 2 – Site Recovery Manager Install Guide

Part 3 – Site Recovery Manager Configuration and Failover Guide

Before creating a Recovery Plan ensure that you have read the documentation listed in the installation guide above and have the required components for each site. You should also make further design considerations around compute, storage, and network. In this post we will be using storage-based replication and stretched VLANs to ensure resources are available at both sites. If you want to assign a different VLAN at the failover site then you can use SRM to reconfigure the network settings; see this section of the documentation center.

SRM

Configuring SRM

Log in to the vSphere web client for the primary site as an administrator, and click the Site Recovery Manager icon.

config

The first step is to pair the sites together. When sites are paired either site can be configured as the protected site.

  • Click Sites; both installed sites should be listed. Select the primary site.
  • On the Summary tab, in the Guide to configuring SRM box, click 1. Pair sites.
  • The Pair Site Recovery Manager Servers wizard will open. Enter the IP address or FQDN of the Platform Services Controller for the recovery site, and click Next.
  • The wizard then checks the referenced PSC for a registered SRM install. Select the corresponding vCenter Server from the list and enter SSO administrator credentials.
  • Click Finish to pair the sites together.

Now that the sites are paired they should both show as connected. When we configure protection, one will become the protected site and the other the failover site.

config3

Next we will configure mappings to determine which resources, folders, and networks will be used at both sites.

  • Locate the Guide to configuring SRM box and the subheading 2. Configure inventory mappings.
  • Click 2.1 Create resource mappings.
  • Expand the vCenter servers and select the resources, then click Add mappings and Next.
  • On the next page you can choose to add reverse mappings too, using the tick box if required.
  • Click Finish to add the resource mappings.

config4

  • Click 2.2 Create folder mappings.
  • Select whether you want the system to automatically create matching folders in the failover site for storing virtual machines, or if you want to manually choose which folders at the protected site map to which folders at the failover site. Click Next.
  • Select the folders to map for both sites, including reverse mappings if required, and click Finish.

config5

  • Click 2.3 Create network mappings.
  • Select whether you want the system to automatically create networks, or if you want to manually choose which networks at the protected site map to which networks at the failover site. Click Next.
  • Select the networks to map for both sites and click Next.
  • Review the test networks; these are isolated networks used for SRM test failovers. It is best to leave these as the default settings unless you have a specific isolated test network you want to use. Click Next.
  • Include any reverse mappings if required, then click Finish.
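
As a rough illustration of what the reverse mapping tick box does conceptually, the Python sketch below inverts a set of protected-to-recovery mappings. The network names are hypothetical, and treating a many-to-one mapping as an error is an assumption of this sketch rather than documented SRM behaviour.

```python
def reverse_mappings(forward):
    """Invert protected-site -> recovery-site mappings.

    Raises ValueError when two protected objects map to the same recovery
    object, because the reverse direction would then be ambiguous. This
    rule is an assumption of the sketch, not documented SRM behaviour.
    """
    reverse = {}
    for protected, recovery in forward.items():
        if recovery in reverse:
            raise ValueError(
                f"{recovery!r} is the target of both "
                f"{reverse[recovery]!r} and {protected!r}"
            )
        reverse[recovery] = protected
    return reverse


# Hypothetical network names, for illustration only.
network_mappings = {"PROD-VLAN10": "DR-VLAN10", "PROD-VLAN20": "DR-VLAN20"}
print(reverse_mappings(network_mappings))
```

The same idea applies to resource and folder mappings: the reverse mappings are what allow the original recovery site to act as the protected site after a failover.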

Next we will configure a placeholder datastore. SRM creates placeholder virtual machines at the DR site; when a failover is initiated the placeholder virtual machines are replaced with the live VMs. A small datastore is required at each site for the placeholder data, as placeholder VMs are generally only a couple of KB in size.

  • Click 3. Configure placeholder datastore.
  • Select the datastore to be used for placeholder information and click OK.

The screenshot below shows the placeholder VMs in the failover site on the left, and the live VMs in the protected site on the right.

placeholder

Although we followed the wizard on the site summary page for the above tasks, it is also possible to configure or change these settings later by selecting the site and then the Manage tab, where all the different mappings are listed.

mappings

Site Protection

The following steps will configure site protection. We’ll start by adding the storage arrays.

  • Click 4. Add array manager and enable array pair.
  • Select whether to use a single array manager, or add a pair of arrays, depending on your environment, and click Next. I’m adding two separate arrays.

array1

  • Select the site pairing and click Next.
  • Select the installed Storage Replication Adapter and click Next.

array2

  • Enter the details for the two storage arrays where volumes are replicated and click Next.
  • Select the array pair to enable and click Next.
  • Confirm the details on the review page and click Finish.

An array pair can be managed by selecting the SRM site and clicking the Related Objects tab, then Array Based Replication. If you add new datastores to the datastore group, you can check they have appeared by selecting Array Based Replication from the Site Recovery Manager home page, selecting the array, and clicking the Manage tab. Array pairs and replicated datastores will be listed. Click the blue sync icon to discover new devices.

Now the storage arrays are added we can create a Protection Group.

  • Click 5. Create a Protection Group.
  • Enter a name for the protection group, select the site pairing, and click Next.

protection1

  • Select the direction of protection and the type of protection group. In this example I am using datastore groups provided by array based replication, so I’ll need to select the array pair configured above and click Next.

protection2

  • Select the datastore groups to protect; the datastores and virtual machines they contain will be listed. Click Next.
  • Review the configuration and click Finish.

The final step is to group our settings together in a Recovery Plan.

  • Click 6. Create a Recovery Plan.
  • Enter a name for the recovery plan, select the site pairing, and click Next.
  • From the sites detected select the recovery site and click Next.
  • Select the Protection Group we created above and click Next.
  • Review the test networks; these are isolated networks used for SRM test failovers. It is best to leave these as the default settings unless you have a specific isolated test network you want to use. Click Next.
  • Review the configuration and click Finish.

Now that we have green ticks against each item in the Guide to configuring SRM box, we can move on to testing site failover. The array based replication, Protection Group, and Recovery Plan settings can all be changed, or new ones created, using the menus on the left-hand side of the Site Recovery Manager home page.

complete.PNG

Site Failover

SRM allows us to do a test failover, as well as an actual failover in the event of a planned or unplanned site outage. The test failover brings online the replicated volumes and starts up the virtual machines, using VMware Tools to confirm the OS is responding. It does not connect the network or impact the production VMs.

  • Log in to the vSphere web client for the vCenter Server located at the DR site.
  • Click Site Recovery, click Recovery Plans and select the appropriate recovery plan.
    • To test the failover plan click the green start button (Test Recovery Plan).
    • Once the test has completed, click the cleanup icon (Cleanup Recovery Plan) to remove the test data. Previous results can still be viewed under History.
  • To initiate an actual fail over click the white start button inside a red circle (Run Recovery Plan).
  • Select the tick-box to confirm you understand the virtual machines will be moved to different infrastructure.
  • Select the recovery type. If the primary site is available then use Planned migration; datastores will be synced before failover. If the primary site is unavailable then use Disaster recovery; datastores will be brought online using the most recent replica on the storage array.
  • Click Next and then Finish.
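
The choice of recovery type in the steps above comes down to a single question: is the primary site still reachable? A minimal Python sketch of that decision is shown below. The function name is hypothetical and this is not an SRM API call, just the rule from the list written out.

```python
def choose_recovery_type(primary_site_available: bool) -> str:
    """Return the recovery type to select in the Run Recovery Plan wizard."""
    if primary_site_available:
        # Datastores are synced before failover, so no data is lost.
        return "Planned migration"
    # The primary site cannot be contacted, so the most recent replica
    # on the storage array is brought online instead.
    return "Disaster recovery"


print(choose_recovery_type(primary_site_available=True))
print(choose_recovery_type(primary_site_available=False))
```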

failover

During the failover you will see the various tasks taking place in vSphere. Once complete the placeholder virtual machines in the DR site are replaced with the live virtual machines. The virtual machines are brought online in the priority order defined in the Recovery Plan.
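
To picture how start-up priority plays out, the sketch below groups virtual machines into batches by priority, assuming SRM’s five priority groups where group 1 starts first. The VM names and the function are hypothetical; the real ordering is handled by SRM itself.

```python
from collections import defaultdict


def startup_batches(vm_priorities):
    """Group VMs into start-up batches, lowest priority number first.

    vm_priorities maps a VM name to a priority group between 1 and 5.
    """
    groups = defaultdict(list)
    for vm, priority in vm_priorities.items():
        if not 1 <= priority <= 5:
            raise ValueError(f"{vm}: priority must be between 1 and 5")
        groups[priority].append(vm)
    # Each batch is started only after the previous batch has been processed.
    return [sorted(groups[p]) for p in sorted(groups)]


# Hypothetical VMs: domain controller first, then database, then app tier.
vms = {"dc01": 1, "sql01": 2, "app01": 3, "web01": 3}
for batch in startup_batches(vms):
    print(batch)
```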

failover1

Ensure the virtual machines are protected again as soon as the primary site is available by following the re-protection steps below.

Site Re-Protection

When the primary site is available the virtual machines must be re-protected to allow failback. Likewise after failing back to the primary site the virtual machines must be re-protected to allow failover again to the DR site.

  • Log in to the vSphere web client for either site and click Site Recovery, Recovery Plans and select the appropriate Recovery Plan.
  • Under Monitor, Recovery Steps, the Plan status needs to show Recovery complete before we can re-protect.

reprotect1

If the status shows incomplete then you can troubleshoot which virtual machine(s) are causing the problem under Related Objects, Virtual Machines. VMware Tools must be running on the VMs to detect the full recovery process.

  • To re-protect virtual machines click Reprotect from the Actions menu at the top of the page.
  • Click the tick-box to confirm you understand the machines will be protected based on the sites specified.

reprotect2

  • Click Next and Finish. The re-protect job will now run; follow the status in the Monitor tab.

reprotect3

Once complete, the Plan Status and Recovery Status will show Complete, and the virtual machine Protection Status will show OK. The VMs are now protected and can be failed over to the recovery site. If you are failing back to the primary site follow the same steps as outlined in the Site Failover section above. Remember to then re-protect the VMs so they can fail over to the DR site again in the event of an outage. When a Recovery Plan is active the status will show Ready, meaning the plan is ready for test or recovery.
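
The failover, re-protect, failback, re-protect cycle can be thought of as a small state machine. The sketch below models it in Python using the plan statuses mentioned in this post (Ready and Recovery complete). The class and site names are hypothetical and only illustrate why a re-protect is needed after every failover.

```python
class RecoveryPlan:
    """Toy model of a recovery plan's direction and status."""

    def __init__(self, protected_site, recovery_site):
        self.protected_site = protected_site
        self.recovery_site = recovery_site
        self.status = "Ready"

    def run(self):
        """Fail over to the recovery site; only allowed when Ready."""
        if self.status != "Ready":
            raise RuntimeError(f"cannot run plan while status is {self.status!r}")
        self.status = "Recovery complete"

    def reprotect(self):
        """Reverse the protection direction after a completed recovery."""
        if self.status != "Recovery complete":
            raise RuntimeError(f"cannot re-protect while status is {self.status!r}")
        self.protected_site, self.recovery_site = (
            self.recovery_site,
            self.protected_site,
        )
        self.status = "Ready"


plan = RecoveryPlan("Primary", "DR")
plan.run()        # fail over to DR
plan.reprotect()  # DR is now the protected site, Primary the recovery site
plan.run()        # fail back to Primary
plan.reprotect()  # back to the original direction
print(plan.protected_site, plan.recovery_site, plan.status)
```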

reprotect4


Site Recovery Manager 6.x Install Guide

This post will walk through the installation of Site Recovery Manager (SRM) to protect virtual machines from site failure. SRM plugs into vCenter to protect virtual machines replicated to a failover site using array based replication or vSphere Replication. In the event of a site outage, or an outage of components within a site that means production virtual machines can no longer run there, SRM brings online the replicated datastores and VMs in vSphere, with a whole bunch of automated customisation options such as assigning new IP addresses, boot orders, dependencies, and running scripts. After a failover SRM can reverse the replication direction and protect virtual machines ready to fail back, all from within the vSphere web client.

Requirements

  • SRM is installed on a Windows machine at the protected site and the recovery site. SRM requires an absolute minimum of 2 vCPUs, 2 GB RAM and 5 GB of available disk; more is recommended for large environments and installations with an embedded database.
  • The Windows server should have User Account Control (UAC) disabled (in the registry, not just set to never notify) as this interferes with the install.
  • Each SRM installation requires its own database. This can be embedded for small deployments, or external for large deployments.
  • A vCenter Server must be in place at both the protected site and the recovery site.
  • SRM supports both embedded and external Platform Services Controller deployments. If the external deployment method is used ensure the vCenter at the failover site is able to connect to the Platform Services Controller (i.e. it isn’t in the primary site). For more information click here.
  • The vCenter Server, Platform Services Controller, and SRM versions must be the same on both sites.
  • You will need the credentials of the vCenter Server SSO administrator for both sites.
  • For vCenter Server 6.0 U2 use SRM v6.1.1, for vCenter Server 6.0 U3 use SRM v6.1.2, and for vCenter Server 6.5 and 6.5 U1 use SRM v6.5 or v6.5.1.
  • Check compatibility of other VMware products using the Product Interoperability Matrix.
  • If there are any firewalls between the management components review the ports required for SRM in this KB.
  • SRM can be licensed in packs of 25 virtual machines, or for unlimited virtual machines on a per CPU basis with vCloud Suite. Read more about SRM licensing here.
  • Array based replication or vSphere Replication should be in place before beginning the SRM install. If you are using array based replication contact your storage vendor for their best practices guide and the Storage Replication Adapter, which is installed on the same server as SRM.
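
As a quick sanity check before installing, the version pairings from the list above can be written as a lookup table. The Python sketch below simply transcribes the pairings given in this post; the function names are hypothetical, and the Product Interoperability Matrix remains the authoritative source.

```python
# vCenter Server version -> SRM versions, as listed in this post.
SUPPORTED_SRM = {
    "6.0 U2": {"6.1.1"},
    "6.0 U3": {"6.1.2"},
    "6.5": {"6.5", "6.5.1"},
    "6.5 U1": {"6.5", "6.5.1"},
}


def check_pairing(vcenter_version, srm_version):
    """True if this vCenter/SRM combination appears in the table above."""
    return srm_version in SUPPORTED_SRM.get(vcenter_version, set())


def check_sites(primary, recovery):
    """Each argument is a (vcenter_version, srm_version) tuple.

    Both sites must run the same versions, and the combination must be
    one of the listed pairings.
    """
    return primary == recovery and check_pairing(*primary)


print(check_sites(("6.5 U1", "6.5.1"), ("6.5 U1", "6.5.1")))
print(check_sites(("6.0 U2", "6.1.1"), ("6.0 U3", "6.1.2")))
```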

As well as the requirements listed above the following points are best practices which should also be taken into consideration:

  • Small environments can host the SRM installation on the same server as vCenter Server. For large environments SRM should be installed on a different system.
  • For vCenter Server, Platform Services Controller, Site Recovery Manager servers, and vSphere Replication (if applicable) use FQDN where possible rather than IP addresses.
  • Time synchronization should be in place across all management nodes and ESXi hosts.
  • It is best practice to have Active Directory and DNS servers already running at the failover site.
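
If you keep a list of the addresses you plan to enter during the install, a few lines of Python can flag any that are IP addresses rather than FQDNs. This is only a sketch using the standard library ipaddress module; the component names and addresses are hypothetical.

```python
import ipaddress


def is_ip_literal(address):
    """True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


def endpoints_using_ip(endpoints):
    """Return the names of components registered by IP instead of FQDN."""
    return sorted(name for name, addr in endpoints.items() if is_ip_literal(addr))


# Hypothetical management components, for illustration only.
planned = {
    "vCenter (primary)": "vcsa01.lab.example",
    "SRM (primary)": "192.0.2.20",
    "SRM (DR)": "srm02.lab.example",
}
print(endpoints_using_ip(planned))
```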

Installation

In this example we will be installing Site Recovery Manager using Nimble array based replication. There is a vCenter Server with embedded Platform Services Controller already installed at each site. The initial screenshots are from an SRM v6.1.1 install, but I have also validated the process with SRM v6.5.1 and vCenter 6.5 U1.

SRM

The virtual machines we want to protect are in datastores replicated by the Nimble array. For more information on the storage array pre-installation steps see the Nimble Storage Integration post referenced below. The Site Recovery Manager install, configuration, and failover guides have no further references to Nimble and are the same for all vendors and replication types.

Part 1 – Nimble Storage Integration with SRM

Part 2 – Site Recovery Manager Install Guide

Part 3 – Site Recovery Manager Configuration and Failover Guide

Installing SRM

The installation is pretty straightforward; download the SRM installer and follow the steps below for each site. We’ll install SRM on the Windows server for the primary / protected site first, and repeat the process for the DR / failover site. We can then pair the two sites together and create recovery plans.

SRM 6.5.1 (vSphere 6.5 U1) Download | Release Notes | Documentation

SRM 6.5 (vSphere 6.5) Download | Release Notes | Documentation

SRM 6.1.2 (vSphere 6.0 U3) Download | Release Notes | Documentation

SRM 6.1.1 (vSphere 6.0 U2) Download | Release Notes | Documentation

Log in to the Windows server where SRM will be installed as an administrator, right click the downloaded VMware-srm-version.exe file, and select Run as administrator. If you are planning on using an external database then the ODBC data source must be configured first. For SQL integrated Windows authentication, make sure you log in to the Windows server using the account that has database permissions, both to configure the ODBC data source and to run the SRM installer.

Select the installer language and click OK.

SRM1

Click Next to begin the install wizard.

SRM2

Review the patent information and click Next.

SRM3

Accept the EULA and click Next.

SRM4

Confirm you have read the prerequisites located at http://pubs.vmware.com/srm-61/index.jsp by clicking Next.

SRM5

Select the destination drive and folder, then click Next.

SRM6

Enter the IP address or FQDN of the Platform Services Controller that will be registered with this SRM instance, in this case the primary site. If possible use the FQDN to make IP address changes easier if required at a later date. Enter valid credentials to connect to the PSC and click Next. If your vCenter Server is using an embedded deployment model then enter your vCenter Server information.

SRM7

Accept the PSC certificate when prompted. The vCenter Server will be detected from the PSC information provided. Confirm this is correct and click Next. Accept the vCenter certificate when prompted.

SRM8

Enter the site name that will appear in the Site Recovery Manager interface, and the SRM administrator email address. Enter the IP address or FQDN of the local server, again use the FQDN if possible, and click Next.

SRM11

In this case as we are using a single protected site and recovery site we will use the Default Site Recovery Manager Plug-in Identifier. For environments with multiple protected sites create a custom identifier. Click Next.

SRM12

Select Automatically generate a certificate, or upload one of your own if required, and click Next.

SRM13

Select an embedded or external database server. If you are using an external database you will need a DSN entry configured in ODBC data sources on the local Windows server referencing the external data source. Click Next.

SRM14

If you opted for the embedded database you will be prompted to enter a new database name and create new database credentials. Click Next.

SRM15

Configure the account to run the SRM services, if applicable, and click Next.

SRM10

Click Install to begin the installation.

SRM9

Site Recovery Manager is now installed. Repeat the process to install SRM on the Windows server in the DR / recovery site, referencing the local PSC and changing the site names as appropriate. If you are using storage based replication you also need to install the Storage Replication Adapter (SRA) on the same server as Site Recovery Manager. In this example I have installed the Nimble SRA, available from InfoSight downloads, which is just a next and finish installer.

After each site installation of SRM you will see the Site Recovery Manager icon appear in the vSphere web client for the corresponding vCenter Server.

SRMvsphere

SRMvsphere2

Providing the datastores are replicated, either using vSphere replication or array based replication, we can now move on to pairing the sites and creating recovery plans in Part 3.


Nimble Storage Integration with SRM

This post will walk through the steps required to prepare Nimble Storage arrays at primary and secondary sites for VMware Site Recovery Manager (SRM) using array based replication. The following posts in this Site Recovery Manager series detail the end to end installation and configuration process.

Part 1 – Nimble Storage Integration with SRM

Part 2 – Site Recovery Manager Install Guide

Part 3 – Site Recovery Manager Configuration and Failover Guide

SRM

Before beginning ensure that time synchronization and DNS are in place across the Nimble arrays and vCenter / SRM servers. It is best practice to have Active Directory and DNS servers already running at the secondary site, and also recommended for virtual machine swap files to be stored in a dedicated datastore without replication. Make sure you review the Nimble Storage Best Practices for VMware Site Recovery Manager guide.

All Nimble Storage arrays are listed on the VMware Compatibility Guide, and Nimble has provided a VMware specific Storage Replication Adapter (SRA) since version 5.1 of SRM. The SRA is the main integration point between SRM and Nimble, allowing storage workflows to be initiated from SRM. In my environment I will only be using VMDKs for the virtual storage and can utilise the Nimble built-in vCenter synchronization to quiesce I/O during snapshots. This means the replica is in an application-consistent state and can be cleanly brought back online in the event of failover. Nimble arrays are supplied inclusive of all features, so there are no additional licensing costs for replication.

VMware Integration

If you’re using Nimble to present LUNs to VMware then it’s likely you configured VMware integration during the initial configuration. However, to check, log in to the web UI of both the replication source and target Nimble arrays by browsing to the IP address or FQDN. From the drop-down Administration menu select VMware Integration.

If the correct vCenter Server is already registered confirm the settings using the Test Status button. Otherwise, enter the required vCenter information to register with the Nimble Storage array.

vcenter

Furthermore, any ESXi hosts connected to Nimble volumes should have the Nimble Connection Manager installed, which includes the Nimble Path Selection Policy (PSP) for VMware. Installing the Nimble VIBs is not included in the scope of this article, however I have briefly outlined the process below.

  • Log in to InfoSight and select Software Downloads from the Resources drop down.
  • Click Connection Manager (NCM) for VMware and download the appropriate version.
  • The downloaded zip file contains the Nimble VIBs, you can install these using one of the following methods:

nmp

Configure Replication Partners

Log in to the Nimble web UI using the management IP address or FQDN of the desired replication source array. From the drop-down Manage menu, select Protection, and Replication Partners.

replication

Existing replication partners will be listed; at this stage we don’t have any. Click New Replication Partner.

replication1

The replication partner wizard will load.

  • Enter the group name of the replication target in the partner name field. The group name can be obtained by logging in to the web UI of the target Nimble array and clicking the Group referenced in the top right-hand corner, or by navigating to Manage, Arrays.
  • Enter a description if required. Enter the hostname or management IP address of the target Nimble array.
  • Enter a shared secret; the same value must be configured on both arrays.
  • Specify if replication traffic should use the existing management network, or specified data IPs.
  • Specify any folder assignments if required.

replication2

If you want to configure bandwidth limits this is done on the QoS Policy page. Click Finish once complete.

replication3

The replication partner will now be listed with a status of OK, and the Test function should come back with a success message. Repeat the process on the replication target Nimble array, adding a replication partner for the replication source array.
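
Because the partner has to be defined on both arrays, it is easy to end up with a mismatched partner name or shared secret. The Python sketch below checks two planned partner definitions for symmetry before you type them in. The dictionary keys and group names are hypothetical and are not part of any Nimble API.

```python
import hmac


def partners_consistent(source, target):
    """Check that two replication partner definitions mirror each other.

    Each dict holds the array's own group_name, the partner_name it will
    be configured with, and the shared_secret.
    """
    names_match = (
        source["partner_name"] == target["group_name"]
        and target["partner_name"] == source["group_name"]
    )
    # compare_digest avoids leaking where the secrets differ via timing.
    secrets_match = hmac.compare_digest(
        source["shared_secret"].encode(), target["shared_secret"].encode()
    )
    return names_match and secrets_match


# Hypothetical group names and secret, for illustration only.
primary = {"group_name": "nimble-prod", "partner_name": "nimble-dr", "shared_secret": "s3cret-value"}
dr = {"group_name": "nimble-dr", "partner_name": "nimble-prod", "shared_secret": "s3cret-value"}
print(partners_consistent(primary, dr))
```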

Configure Volume Replication

Now we have an available replication target we can configure replication of important volumes. In the Nimble web UI for the replication source array, navigate to Manage, Volumes. Replication is configured in the Protection tab, either during the New Volume wizard, or for an existing volume by clicking the volume and Edit, then selecting Protection. Select Create new volume collection and enter a name.

rep1

Something to be aware of – later in the SRM configuration stage we will create protection groups which use consistency groups to group datastores for protection. These SRM consistency groups map to the volume collection groups in Nimble, so if you want to configure different protection settings for different virtual machines they will need to be in volumes using separate volume collection groups. See here for more information.
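
To plan this out before creating volumes, it can help to list which volumes share a volume collection, since those are the volumes that will be grouped and protected together. The sketch below does this in Python; the volume and collection names are hypothetical.

```python
from collections import defaultdict


def group_by_collection(volume_collections):
    """Map each volume collection to the sorted list of volumes it contains.

    volume_collections maps a volume name to its volume collection name.
    """
    groups = defaultdict(list)
    for volume, collection in volume_collections.items():
        groups[collection].append(volume)
    return {collection: sorted(volumes) for collection, volumes in groups.items()}


# Hypothetical layout: SQL datastores replicate together, web tier separately.
layout = {
    "DS-SQL-01": "VC-Tier1",
    "DS-SQL-02": "VC-Tier1",
    "DS-Web-01": "VC-Tier2",
}
for collection, volumes in group_by_collection(layout).items():
    print(collection, volumes)
```

Virtual machines that need different protection settings should sit on volumes that land in different groups in this output.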

If application or hypervisor synchronization is required then enter the appropriate details. In this case since we are integrating with SRM we will select VMware vCenter and enter the vCenter details, ensuring application-consistent copies are replicated to the secondary site.

rep2

Configure the protection schedule and how many snapshots to keep locally and on the replication target array. Make sure you select the replication partner we created earlier from the Replicate to drop-down menu. When you have entered the required details click Save.
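
When choosing how many snapshots to keep, it helps to work out roughly how far back the retained copies will reach. The arithmetic is just the schedule interval multiplied by the number of snapshots kept, as the Python sketch below shows. The schedule values are made-up examples, not recommendations.

```python
from datetime import timedelta


def retention_window(interval, snapshots_kept):
    """Approximate span of time covered by the retained snapshots."""
    return interval * snapshots_kept


# Example: hourly snapshots, 48 kept locally and 24 kept on the replica.
local = retention_window(timedelta(hours=1), 48)
replica = retention_window(timedelta(hours=1), 24)
print("local:", local)
print("replica:", replica)
```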

rep3

Repeat this process for any other volumes requiring replication. Once a volume collection has been created you can use the same protection schedules by selecting the existing volume collection on the Protection tab of a volume.

replication5

To view or edit a volume collection navigate to Manage, Protection, Volume Collections in the Nimble web UI.

replication6

Existing replication snapshots are displayed on the Replication tab when selecting a volume, or in the volume collections page referenced above. On the replication target array replication volumes are displayed in the volumes page with a grey coupled LUN icon.

We can now move on to installing Site Recovery Manager in Part 2. The only other Nimble specific step is to install the Storage Replication Adapter (SRA) on the same Windows server as SRM, after SRM has been installed. The Nimble SRA can be downloaded from InfoSight, and is a simple next and finish installer. After SRM is installed you can confirm the SRA status in the vSphere web client by browsing to Site Recovery Manager, Sites, selecting the site, opening the Monitor tab, and clicking SRAs.

SRA
