VSS 101 and Design Considerations in VMware environment (Part II)

Nimble engineers (Jay Wang, Anagha Barve, Sathya Bhat, Scott Moreland) and tech marketing strike again with part II of VSS.  By now you should have a basic understanding of how the VSS framework and VMware quiesced snapshot integration work (if you haven't read the first post, click here).  Now let's jump into the subject of design considerations.  For those of you who read my blog regularly, you know this is my favorite subject: highlighting the areas you should watch out for when designing your virtualization infrastructure.  Here we go, for each disk attachment method available in an ESX environment:


RDM (Physical Compatibility Mode) for application disks, VMDK for OS disk

In this case, VMware will simply ignore the RDM disk during the snapshot operation, meaning ESX will only create a VMware snapshot for the O/S VMDK.  As for the application disk running as an RDM, the Nimble VSS hardware provider will be used for snapshots.  It is therefore imperative to ensure the Volume Collection containing the RDM volume has “Microsoft VSS” synchronization selected (a quick verification sketch follows the notes below).

NOTE

1)      With “Microsoft VSS” synchronization, there is NO VMware snapshot taken by the ESX servers.  The Nimble VSS hardware provider will be leveraged for taking the snapshot, after the VSS writer has successfully frozen incoming I/O requests.

2)      The Nimble Windows Toolkit (Nimble Protection Manager/NPM) needs to be installed on the VM that has RDM storage attached.
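
As a sanity check, here's a minimal sketch (Python, run inside the guest from an elevated prompt) that calls the built-in vssadmin CLI to confirm the hardware provider and the application's VSS writers are registered.  The "Nimble" string match is an assumption, as the exact provider name in your environment may differ:

```python
# Minimal sketch: verify the VSS hardware provider and application writers
# inside a Windows guest before relying on "Microsoft VSS" synchronization.
import subprocess

def run_vssadmin(subcommand: str) -> str:
    # vssadmin is the built-in Windows VSS administration CLI.
    result = subprocess.run(
        ["vssadmin", "list", subcommand],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

providers = run_vssadmin("providers")
writers = run_vssadmin("writers")

# "Nimble" appearing in the provider list suggests the hardware provider is
# registered (assumed name; check the actual output in your environment).
print("Nimble VSS provider registered:", "Nimble" in providers)

# A stable application writer (e.g. "Exchange Writer" or "SqlServerWriter")
# must be present and not in a failed state for quiescing to succeed.
for line in writers.splitlines():
    if "Writer name" in line or "State" in line:
        print(line.strip())
```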

 

VMDK for both application and OS disks

With this configuration, keep in mind that ALL VMs in a given VMFS volume need to be quiesced and VMware-snapshotted before the array takes a volume-level snapshot.  It is wise to limit the number of virtual machines you have in a given VMFS volume.  For example, if you have VMs running file share/print/web services, then you are basically wasting time taking a ‘quiesced’ snapshot, as those applications are stateless in nature.  Simply create another volume to host such VMs, and ensure the volume collection contains only VMs that require a VSS quiesced snapshot (with the appropriate VSS writer).
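
To keep an eye on VM density per datastore, here's a minimal sketch using pyVmomi (the vSphere Python bindings); the vCenter hostname and credentials are placeholders:

```python
# Minimal sketch: count VMs per datastore so a Volume Collection only
# quiesces VMs that actually need VSS.  Requires: pip install pyvmomi
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.com",            # placeholder
                  user="administrator@vsphere.local",    # placeholder
                  pwd="secret", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        # Every VM on this datastore gets stunned for a quiesced snapshot
        # before the array snapshot, so keep this count small.
        print(f"{ds.name}: {len(ds.vm)} VMs")
    view.Destroy()
finally:
    Disconnect(si)
```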

NOTE

The current VMware implementation of the software provider does NOT truncate logs for Exchange.  If you have an integrated backup application, such as CommVault, that can be invoked to truncate the logs, be sure to leverage that.  If not, you could 1) enable circular logging in Exchange, 2) consider in-guest/RDM mounted storage, or 3) build a custom script to invoke during backup to truncate the Exchange logs (a sketch of option 3 follows).
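
For option 3, one possible approach (not the only one) is to drive a full VSS backup from inside the guest with the built-in diskshadow utility; on a successful BackupComplete, the Exchange writer truncates its logs.  The E: drive letter is an assumption for the Exchange database volume:

```python
# Minimal sketch: a custom in-guest script that runs a full VSS backup via
# diskshadow so the Exchange writer truncates its logs afterwards.
import subprocess
import tempfile

DISKSHADOW_SCRIPT = """\
set verbose on
begin backup
add volume E:
create
end backup
"""

# diskshadow ships with Windows Server; "end backup" after a successful
# CREATE signals BackupComplete, which is what triggers log truncation.
with tempfile.NamedTemporaryFile("w", suffix=".dsh", delete=False) as f:
    f.write(DISKSHADOW_SCRIPT)
    script_path = f.name

subprocess.run(["diskshadow", "/s", script_path], check=True)
```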

 

Direct attached/in-guest mounted storage for application data, VMDK for OS disk

With this configuration, the in-guest mounted storage bypasses the ESX VMkernel storage stack and simply appears as network traffic to ESX.  Customers typically use this configuration 1) for MSCS over the iSCSI protocol, or 2) to get beyond the 2TB VMDK size limitation.  Just like any other method, there are design considerations/tradeoffs.

NOTE

1)      The “gotcha” with this configuration is interoperability with SRM (Site Recovery Manager) and other upper-level solutions (i.e., vCloud Director).  Let’s start with SRM: it does NOT recognize in-guest mounted storage as storage/VMDK, so you’d have to add extra steps to mount these volumes for each VM that uses this type of storage (a scripted example follows these notes).  Refer to my previous post on SRM “gotchas” for further details.  With vCloud Director, you will NOT be able to package the VM as a vApp template and deploy it without manual intervention to mount the in-guest storage.  Additionally, in terms of performance monitoring, esxtop will not display disk-level stats for the in-guest storage; you’d have to rely on the network stats (press ‘N’).  vCenter Operations will not interpret the in-guest attached storage stats as ‘storage’ info; it will get treated as network traffic.

2)      The Nimble Windows Toolkit (Nimble Protection Manager/NPM) needs to be installed on the VM that has in-guest mounted storage attached.
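
As an illustration of the extra mounting step, here's a minimal sketch that logs in to the in-guest iSCSI target from inside a Windows VM using the built-in iscsicli initiator CLI; the portal IP and target IQN are placeholders:

```python
# Minimal sketch: script the in-guest iSCSI mount that SRM/vCD will not
# perform for you during recovery or vApp deployment.
import subprocess

PORTAL_IP = "192.168.1.50"                                   # assumed discovery IP
TARGET_IQN = "iqn.2007-11.com.nimblestorage:example-target"  # placeholder IQN

def iscsicli(*args: str) -> None:
    # iscsicli is the built-in Windows iSCSI initiator CLI.
    subprocess.run(["iscsicli", *args], check=True)

# Register the target portal, then log in to the target.  QLoginTarget uses
# default initiator settings; persistent logins across reboots would use
# PersistentLoginTarget instead.
iscsicli("QAddTargetPortal", PORTAL_IP)
iscsicli("QLoginTarget", TARGET_IQN)
```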

Last but not least, with all three virtual machine storage connectivity methods above, the following “gotcha” applies:

*ONLY a scheduled snapshot operation in a Volume Collection will invoke VSS quiescing; a manual snapshot taken from the vCenter Server Nimble plugin and/or the Nimble UI will NOT trigger the VSS requestor to request application quiescing*

In summary, here’s a quick table reference on all three connectivity options and the considerations/caveats for each:

| Connectivity Method | Snapshot Synchronization | NPM Installation | Considerations |
| --- | --- | --- | --- |
| Raw Device Mapping (RDM) | Microsoft VSS | Inside the guest OS (NPM requestor & provider used for snapshot) | No VMware snapshot taken |
| VMDK | vCenter | None (VMware VSS requestor/provider used for snapshot) | No log truncation for MS Exchange; avoid too many VMs in the same datastore |
| Direct Attached (In-guest mounted) | Microsoft VSS | Inside the guest OS (NPM requestor & provider used for snapshot) | Manual work needed for SRM/vCD; no disk-level stats from esxtop/vC Ops; NPM requestor & provider used for quiescing |

That’s it for now – happy backing up your VMs!  Oh, don’t forget to restore them once in a while – what good is backup when you can’t restore from it, right?

 
