Site Recovery Manager Configuration and Failover Guide

This post will walk through the configuration of Site Recovery Manager; we’ll protect some virtual machines with a Protection Group, and then fail over to the DR site using a Recovery Plan. The pre-requisites for this post are for Site Recovery Manager (SRM) and the Storage Replication Adapter (SRA) to be installed at both sites along with the corresponding vSphere infrastructure, and replication to be configured on the storage array. It is also possible to use vSphere Replication, for more information see the previous posts referenced below.

Part 1 – Nimble Storage Integration with SRM

Part 2 – Site Recovery Manager Install Guide

Part 3 – Site Recovery Manager Configuration and Failover Guide

Site Recovery Manager now has integration with the HTML5 vSphere client, see VMware Site Recovery Manager 8.x Upgrade Guide for more information.

Before creating a Recovery Plan ensure that you have read the documentation listed in the installation guide above and have the required components for each site. You should also make further design considerations around compute, storage, and network. In this post we will be using storage based replication and stretched VLANs to ensure resources are available at both sites. If you want to assign a different VLAN at the failover site then you can use SRM to reconfigure the network settings, see this section of the documentation center.

SRM

Configuring SRM

Log into the vSphere web client for the primary site as an administrator, and click the Site Recovery Manager icon.

config

The first step is to pair the sites together. When sites are paired either site can be configured as the protected site.

  • Click Sites, both installed sites should be listed, select the primary site.
  • On the Summary tab, in the Guide to configuring SRM box, click 1. Pair sites.
  • The Pair Site Recovery Manager Servers wizard will open. Enter the IP address or FQDN of the Platform Services Controller for the recovery site, and click Next.
  • The wizard then checks the referenced PSC for a registered SRM install. Select the corresponding vCenter Server from the list and enter SSO administrator credentials.
  • Click Finish to pair the sites together.

Now the sites are paired they should both show connected. When we configure protection one will be made the protected site and the other failover.

config3

Next we will configure mappings to determine which resources, folders, and networks will be used at both sites.

  • Locate the Guide to configuring SRM box and the subheading 2. Configure inventory mappings.
  • Click 2.1 Create resource mappings.
  • Expand the vCenter servers and select the resources, then click Add mappings and Next.
  • On the next page you can choose to add reverse mappings too, using the tick box if required.
  • Click Finish to add the resource mappings.

config4

  • Click 2.2 Create folder mappings.
  • Select whether you want the system to automatically create matching folders in the failover site for storing virtual machines, or if you want to manually choose which folders at the protected site map to which folders at the failover site. Click Next.
  • Select the folders to map for both sites, including reverse mappings if required, and click Finish.

config5

  • Click 2.3 Create network mappings.
  • Select whether you want the system to automatically create networks, or if you want to manually choose which networks at the protected site map to which networks at the failover site. Click Next.
  • Select the networks to map for both sites and click Next.
  • Review the test networks, these are isolated networks used for SRM test failovers. It is best to leave these as the default settings unless you have a specific isolated test network you want to use. Click Next.
  • Include any reverse mappings if required, then click Finish.

Next we will configure a placeholder datastore. SRM creates placeholder virtual machines at the DR site, when a failover is initiated the placeholder virtual machines are replaced with the live VMs. A small datastore is required at each site for the placeholder data, placeholder VMs are generally a couple of KBs in size.

  • Click 3. Configure placeholder datastore.
  • Select the datastore to be used for placeholder information and click Ok.

The screenshot below shows the placeholder VMs in the failover site on the left, and the live VMs in the protected site on the right.

placeholder

Although we followed the wizard on the site summary page for the above tasks, it is also possible to configure, or change the settings later, by selecting the site and then the Manage tab, all the different mappings are listed.

mappings

Site Protection

The following steps will configure site protection, we’ll start by adding the storage arrays.

  • Click 4. Add array manager and enable array pair.
  • Select whether to use a single array manager, or add a pair of arrays, depending on your environment, and click Next. I’m adding two separate arrays.

array1

  • Select the site pairing and click Next.
  • Select the installed Storage Replication Adapter and click Next.

array2

  • Enter the details for the two storage arrays where volumes are replicated and click Next.
  • Select the array pair to enable and click Next.
  • Confirm the details on the review page and click Finish.

An array pair can be managed by selecting the SRM site and clicking the Related Objects tab, then Array Based Replication. If you add new datastores to the datastore group, you can check they have appeared by selecting Array Based Replication from the Site Recovery Manager home page, select the array, and click the Manage tab. Array pairs and replicated datastores will be listed, click the blue sync icon to discover new devices.

Now the storage arrays are added we can create a Protection Group.

  • Click 5. Create a Protection Group.
  • Enter a name for the protection group and select the site pairing, click Next.

protection1

  • Select the direction of protection and the type of protection group. In this example I am using datastore groups provided by array based replication so I’ll need to select the array-pair configured above, and Next.

protection2

  • Select the datastore groups to protect, the datastores and virtual machines will be listed, click Next.
  • Review the configuration and click Finish.

The final step is to group our settings together in a Recovery Plan.

  • Click 6. Create a Recovery Plan.
  • Enter a name for the recovery plan and select the site pairing, click Next.
  • From the sites detected select the recovery site and click Next.
  • Select the Protection Group we created above and click Next.
  • Review the test networks, these are isolated networks used for SRM test failovers. It is best to leave these as the default settings unless you have a specific isolated test network you want to use. Click Next.
  • Review the configuration and click Finish.

Now we have green ticks against each item in the Guide to configuring SRM box, we can move on to testing site failover. The array based replication, Protection Groups, and Recovery Plans settings can all be changed, or new ones created, using the menus on the left handside of the Site Recovery Manager home page.

complete.PNG

Site Failover

SRM allows us to do a test failover, as well as an actual failover in the event of a planned or unplanned site outage. The test failover brings online the replicated volumes and starts up the virtual machines, using VMware Tools to confirm the OS is responding. It does not connect the network or impact the production VMs.

  • Log in to the vSphere web client for the vCenter Server located at the DR site.
  • Click Site Recovery, click Recovery Plans and select the appropriate recovery plan.
    • To test the failover plan click the green start button (Test Recovery Plan).
    • Once the test has completed click the cleanup icon (Cleanup Recovery Plan) to remove the test data, previous results can still be viewed under History.
  • To initiate an actual fail over click the white start button inside a red circle (Run Recovery Plan).
  • Select the tick-box to confirm you understand the virtual machines will be moved to different infrastructure.
  • Select the recovery type; if the primary site is available then use Planned migration, datastores will be synced before fail over. If the primary site is unavailable then use Disaster recovery, datastores will be brought online using the most recent replica on the storage array.
  • Click Next and then Finish.

failover

During the failover you will see the various tasks taking place in vSphere. Once complete the placeholder virtual machines in the DR site are replaced with the live virtual machines. The virtual machines are brought online in the priority specified when we created the Recovery Plan.

failover1

Ensure the virtual machines are protected again as soon as the primary site is available by following the re-protection steps below.

Site Re-Protection

When the primary site is available the virtual machines must be re-protected to allow failback. Likewise after failing back to the primary site the virtual machines must be re-protected to allow failover again to the DR site.

  • Log in to the vSphere web client for either site and click Site Recovery, Recovery Plans and select the appropriate Recovery Plan.
  • Under Monitor, Recovery Steps, the Plan status needs to show Recovery complete, before we can re-protect.

reprotect1

If the status shows incomplete then you can troubleshoot which virtual machine(s) are causing the problem under Related Objects, Virtual Machines. VMware Tools must be running on the VMs to detect the full recovery process.

  • To re-protect virtual machines click Reprotect from the Actions menu at the top of the page.
  • Click the tick-box to confirm you understand the machines will be protected based on the sites specified.

reprotect2

  • Click Next and Finish. The re-protect job will now run, follow the status in the Monitor tab.

reprotect3

Once complete the Plan Status, and Recovery Status, will show Complete. The virtual machine Protection Status will show Ok. The VMs are now protected and can be failed over to the recovery site. If you are failing back to the primary site follow the same steps as outlined in the SRM Failover section above. Remember to then re-protect the VMs so they can failover to the DR site again in the event of an outage. When a Protection Plan is active the status will show Ready, the plan is ready for test or recovery.

reprotect4

_______________

Part 1 – Nimble Storage Integration with SRM

Part 2 – Site Recovery Manager Install Guide

Part 3 – Site Recovery Manager Configuration and Failover Guide

One thought on “Site Recovery Manager Configuration and Failover Guide

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s