Custom Alerts for vSphere Events

Alarms in vSphere are notifications that are generated when selected events occur or when specified conditions or states are met. vCenter Server includes a number of predefined alarms that monitor datacentres, clusters, hosts, virtual machines, datastores, and networks. When an alarm is triggered, an action can be configured to respond automatically, for example by sending an email alert or running a script.

Alarms are automatically inherited from objects higher up in the vSphere hierarchy. For example, alarms configured at the datacentre level apply to all clusters, hosts and virtual machines within that datacentre. The top level is the vCenter Server itself; alarms configured there monitor all applicable objects in that vCenter.

In this post we will quickly go over how to set up an alarm for an event logged by vSphere for which no predefined alarm exists. This is done by creating a custom alarm that uses the event's EventTypeId as its trigger.

The first thing we need is the EventTypeId, which differs from the fully formatted message you see in vSphere. Open a PowerCLI window and type Connect-VIServer <server>, where <server> is the name of the vCenter Server to connect to. Then use the Get-VIEvent cmdlet with the parameters below to narrow the results.

  • -Entity <entity>, where <entity> is the virtual machine, host, resource pool, etc. you want to view events for.
  • -Start <date>, where <date> is the start date to retrieve events from, in dd/mm/yyyy format (or mm/dd/yyyy if using US regional settings).
  • -Finish <date>, where <date> is the end date to retrieve events up to, in dd/mm/yyyy format (or mm/dd/yyyy if using US regional settings).
  • -MaxSamples <number>, where <number> is the maximum number of events to retrieve. The default value is 100.
  • -Types <type>, where <type> is the type of event to list; valid values are error, info and warning.
  • -Username <user>, where <user> is the user who initiated the events you want to retrieve.

You don’t have to use all of the above. The example below lists the last 10 events on Host1 that were initiated by administrator@vsphere.local.

    Get-VIEvent -Entity Host1 -Username administrator@vsphere.local -MaxSamples 10

The output for each event should look something like this. Locate the EventTypeId and make a note of it.

[Screenshot: Get-VIEvent output for an event, showing its EventTypeId]
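When an entity has logged many events, scanning the full output for the right identifier can be tedious. The sketch below is a hypothetical helper (Get-EventTypeIdSummary is our own name, not a PowerCLI cmdlet) that groups piped events by EventTypeId and counts how often each appears. It assumes that events without an EventTypeId property, which is the case for the standard event classes, can be identified by their class name, such as VmPoweredOnEvent.

```powershell
# Hypothetical helper (not part of PowerCLI): group events by EventTypeId
# and count how often each identifier occurs.
function Get-EventTypeIdSummary {
    [CmdletBinding()]
    param(
        # Events piped in from Get-VIEvent.
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )
    begin {
        $ids = [System.Collections.Generic.List[string]]::new()
    }
    process {
        $property = $InputObject.PSObject.Properties['EventTypeId']
        if ($property -and $property.Value) {
            # EventEx and ExtendedEvent objects carry an explicit EventTypeId.
            $ids.Add([string]$property.Value)
        }
        else {
            # Assumption: other events are identified by their class name.
            $ids.Add($InputObject.GetType().Name)
        }
    }
    end {
        # Emit one row per identifier, most frequent first.
        $ids | Group-Object | Sort-Object -Property Count -Descending |
            Select-Object -Property @{ Name = 'EventTypeId'; Expression = { $_.Name } }, Count
    }
}
```

For example, the following would summarise up to 500 events from the past week. Passing a DateTime from Get-Date to -Start also avoids the dd/mm versus mm/dd ambiguity:

    Get-VIEvent -Entity Host1 -Start (Get-Date).AddDays(-7) -MaxSamples 500 | Get-EventTypeIdSummary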

Log in to the vSphere Web Client and browse to the level at which you want to apply the alarm, for example vCenter Server, host or virtual machine. Click the Manage tab and select Alarm Definitions.

Click the green plus symbol to add a new alarm. Enter a name and description, select the type of object to monitor, change the Monitor for option to specific events occurring on this object, and click Next.

Click the green plus symbol to add an event. In the Event field, enter the EventTypeId recorded earlier. The status of the new event is set to Alert by default; click Next.

On the Actions page, choose how to respond when the event is logged, for example by sending an email or running a script.

The alarm has now been created, and the configured action will be carried out whenever the event is logged.
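The same kind of alarm can also be created from PowerCLI through the vSphere API's AlarmManager. The following is a minimal sketch, not a tested tool: it assumes an existing Connect-VIServer session and an event that is logged as an EventEx object, and New-EventTypeIdAlarm is a hypothetical name. It adds no actions, so an email or script action would still be configured on the Actions page as above.

```powershell
# Hypothetical helper: create an event-based alarm that triggers on one EventTypeId.
# Assumes an existing Connect-VIServer session; no alarm actions are added.
function New-EventTypeIdAlarm {
    param(
        # Object to define the alarm on, e.g. (Get-VMHost Host1).
        [Parameter(Mandatory)] [object] $Entity,
        [Parameter(Mandatory)] [string] $Name,
        # The EventTypeId noted from Get-VIEvent.
        [Parameter(Mandatory)] [string] $EventTypeId,
        [string] $Description = '',
        # Type of object the event is logged against.
        [string] $ObjectType = 'HostSystem'
    )

    # Trigger: fire when an EventEx with this EventTypeId is logged.
    $trigger = New-Object -TypeName VMware.Vim.EventAlarmExpression
    $trigger.EventType = 'EventEx'
    $trigger.EventTypeId = $EventTypeId
    $trigger.ObjectType = $ObjectType
    $trigger.Status = 'red'   # shown as "Alert" in the Web Client

    # Alarm definition wrapping the single trigger.
    $spec = New-Object -TypeName VMware.Vim.AlarmSpec
    $spec.Name = $Name
    $spec.Description = $Description
    $spec.Enabled = $true
    $spec.Expression = New-Object -TypeName VMware.Vim.OrAlarmExpression
    $spec.Expression.Expression = @($trigger)
    $spec.Setting = New-Object -TypeName VMware.Vim.AlarmSetting
    $spec.Setting.ReportingFrequency = 0
    $spec.Setting.ToleranceRange = 0

    # Look up the AlarmManager and create the alarm on the chosen entity.
    $serviceInstance = Get-View ServiceInstance
    $alarmManager = Get-View -Id $serviceInstance.Content.AlarmManager
    $alarmManager.CreateAlarm($Entity.ExtensionData.MoRef, $spec)
}
```

A call might look like this, with <EventTypeId> replaced by the value noted earlier:

    New-EventTypeIdAlarm -Entity (Get-VMHost Host1) -Name 'Custom event alarm' -EventTypeId '<EventTypeId>'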

Storage Connectivity Loss with VMCP

This post looks at VM Component Protection (VMCP) and how it helps protect vSphere 6 environments from storage connectivity loss. When a host loses access to a storage device, it marks the device as being in one of the following states:

PDL (Permanent Device Loss)

A device is marked as permanently lost if the storage array responds with a SCSI sense code indicating that the device is unavailable. This could happen with a failed LUN, or with a LUN that has been unmapped at the storage array level while still in use by vSphere. Because the array and the host can still communicate, the array reports the state of the device through SCSI sense codes; at this point the host stops sending I/O requests to it and labels the device permanently unavailable.

APD (All Paths Down)

If a device becomes unreachable and no PDL SCSI sense code is returned, it is marked as All Paths Down (APD), and the ESXi host continues to send I/O requests until it receives a response. This could happen after a fibre channel switch or HBA failure. Because the ESXi host cannot determine whether the device loss is permanent (PDL) or transient (APD), it retries I/O indefinitely, including virtual machine I/O and I/O from the hostd management agent. vSphere 5.x introduced an APD timeout (140 seconds by default), after which non-virtual machine I/O to the device is failed rather than retried.
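Both states are recorded as host events, so the Get-VIEvent technique from the previous post can also help spot them. The sketch below is a hypothetical filter (Select-StorageLossEvent is our own name) that keeps only events whose EventTypeId appears in an assumed list of PDL and APD identifiers. The identifiers listed are the ones we believe ESXi 5.x/6.x hosts log for these states; confirm what your own hosts record before relying on them.

```powershell
# Assumed EventTypeIds for PDL and APD; verify against your own hosts' events.
$StorageLossEventTypeIds = @{
    'esx.problem.scsi.device.state.permanentloss' = 'PDL'
    'esx.problem.storage.apd.start'               = 'APD start'
    'esx.problem.storage.apd.timeout'             = 'APD timeout'
}

# Hypothetical helper: keep only storage-loss events and label each one.
function Select-StorageLossEvent {
    [CmdletBinding()]
    param(
        # Events piped in from Get-VIEvent.
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )
    process {
        $property = $InputObject.PSObject.Properties['EventTypeId']
        if (-not $property) { return }   # skip events without an EventTypeId
        $id = [string]$property.Value
        if ($StorageLossEventTypeIds.ContainsKey($id)) {
            [pscustomobject]@{
                Time    = $InputObject.CreatedTime
                State   = $StorageLossEventTypeIds[$id]
                Message = $InputObject.FullFormattedMessage
            }
        }
    }
}
```

For example:

    Get-VIEvent -Entity Host1 -MaxSamples 1000 | Select-StorageLossEvent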

VMCP (VM Component Protection)

VMCP is a high availability (HA) feature, introduced in vSphere 6.0, that detects and responds to PDL and APD events. If a device enters the permanent device loss state, vSphere can take the following actions:

  • Do nothing (disabled)
  • Issue an event to notify administrators
  • Restart the virtual machines on a host which still has access to the storage

If a device enters the all paths down state, vSphere can take the following actions:

  • Do nothing (disabled)
  • Issue an event to notify administrators
  • Restart the virtual machines on other hosts only if the HA master confirms that another host can run them (conservative)
  • Restart the virtual machines on other hosts even if the HA master cannot confirm that another host can run them (aggressive)

It is also possible to delay the VM failover for APD, and to automatically reset a virtual machine if the APD condition clears before the failover delay expires. This is useful for applications that may become unstable after a storage outage.
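To put the delay in perspective, here is a rough worked example. It assumes the default APD timeout of 140 seconds, a VM failover delay of 3 minutes (which we take to be the default; check the value set on your cluster), and that the failover delay only starts counting once the APD timeout has expired:

```latex
t_{\text{failover}} = t_{\text{APD timeout}} + t_{\text{delay}}
  = 140\,\text{s} + 180\,\text{s} = 320\,\text{s} = 5\,\text{min}\ 20\,\text{s}
```

If APD clears during the delay window, no failover takes place; this is the window in which the reset option applies.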

Speaking from personal experience, storage connectivity loss can be troublesome to identify and keep on top of, especially when it is intermittent. VMCP can’t fix any underlying issues with your storage array or at the fabric layer, but it can quickly do the legwork of determining which hosts still have access to the storage, automatically bringing virtual machines back online where possible.

Configuring VMCP

In a vSphere 6.x environment, VMCP is configured in the vSphere HA settings on the cluster’s Manage tab. As this is a new feature, it needs to be switched on in the vSphere Web Client: first tick the box to enable VM Component Protection, then configure the responses for datastores in the PDL and APD states.
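Once the settings are saved, it can be useful to confirm what each cluster is actually configured to do. This sketch assumes a connected PowerCLI session and vSphere 6.x clusters. It reads each cluster's HA configuration through ExtensionData and flattens the VMCP fields. Get-VmcpSetting is a hypothetical name, and the property names are taken from the vSphere API's ClusterVmComponentProtectionSettings type. In that API, the conservative and aggressive responses appear as restartConservative and restartAggressive, and issuing events only appears as warning.

```powershell
# Hypothetical helper: report the cluster-wide VMCP settings.
# Assumes vSphere 6.x cluster objects from Get-Cluster.
function Get-VmcpSetting {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $Cluster
    )
    process {
        $das = $Cluster.ExtensionData.ConfigurationEx.DasConfig
        # Cluster defaults only; per-VM overrides are not reported here.
        $vmcp = $das.DefaultVmSettings.VmComponentProtectionSettings
        [pscustomobject]@{
            Cluster              = $Cluster.Name
            VmcpEnabled          = ($das.VmComponentProtecting -eq 'enabled')
            ResponseForPDL       = $vmcp.VmStorageProtectionForPDL
            ResponseForAPD       = $vmcp.VmStorageProtectionForAPD
            APDFailoverDelaySec  = $vmcp.VmTerminateDelayForAPDSec
            ReactionOnAPDCleared = $vmcp.VmReactionOnAPDCleared
        }
    }
}
```

For example:

    Get-Cluster | Get-VmcpSetting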