VSS 101 and Design Considerations in VMware environment (Part I)

We have received many customer inquiries about VMDK vs. RDM vs. in-guest attached iSCSI storage, and their implications for data protection with the Microsoft VSS framework. Instead of telling you what to do directly, I decided to work with our engineering gurus (Jay Wang, Anagha Barve, Sathya Bhat and Scott Moreland), who think about this day in and day out, on a joint two-part post covering how VSS works, followed by design considerations.


VSS 101
The Microsoft Volume Shadow Copy Service (VSS) framework provides application-consistent “shadow” copies of volumes hosting application data. There are three major components in the VSS framework:

  • Requestor (any backup application that asks the VSS framework to create a consistent shadow copy of the application volume(s))
  • Provider (manages the application data volume, and responds to requests from the Requestor to make a shadow copy of the managed volume)
  • Writer (an application that is VSS-aware, for example Exchange, SQL Server, SharePoint)

 

When it comes to “Provider”, there are three main types:

  1. Software-based Provider -> the shadow copy of the volume is created in software, at a layer above the NTFS file system.  Third-party backup vendors typically supply their own software-based Provider that is optimized for their own backup application or storage hardware.
  2. System-based Provider -> like #1 above, except that the provider is supplied by the operating system itself.  The system-based provider typically creates a copy-on-write shadow copy, and does not leverage the capabilities of the underlying hardware storage device.
  3. Hardware-based Provider -> the work of creating the shadow copy is performed by the storage controller (array)

So how does it work?  Below is an over-simplified diagram along with an explanation:

 

  1. The Requestor asks VSS to create a shadow copy of a specific volume
  2. VSS instructs the Writer to prepare its data for the shadow copy (i.e., complete all pending transactions).  When finished, the Writer informs VSS that it is done prepping the volume for an application-consistent shadow copy
  3. VSS, upon confirmation that all pending transactions have been completed, instructs the Writer to hold all new write requests (reads can still be served) for up to 60 seconds
  4. Upon acknowledgement from the Writer that the application has been quiesced, the NTFS file system buffer cache is flushed to disk
  5. VSS now directs the Provider to create a shadow copy of the volume (it has 10 seconds to finish the job)
  6. The Provider creates a shadow copy of the volume
  7. After the Provider informs VSS that the shadow copy creation is done, VSS tells the Writer to “un-quiesce” new writes (NOT shown in the diagram above); lastly, VSS informs the Requestor that the shadow copy request is completed

NOTE: the above steps are greatly simplified. Keep in mind that VSS does check back with the Writer to make sure step 4 completed successfully, meaning new writes are properly quiesced/paused.  If not, it simply fails the operation and starts over from step 1.
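To make the ordering of the steps above concrete, here is a minimal sketch in Python. It is a toy model only: the class name, field names and log messages are all made up for illustration, and it does not call the real VSS API. The only values taken from this post are the 60-second freeze window and the 10-second window given to the Provider.

```python
from dataclasses import dataclass, field

FREEZE_LIMIT_S = 60  # Writer holds new writes for at most this long (step 3)
COMMIT_LIMIT_S = 10  # Provider must create the shadow copy in this window (step 5)


@dataclass
class ShadowCopyRun:
    """Toy model of one shadow copy request (hypothetical, not the VSS API)."""
    prepare_ok: bool = True      # Writer completed pending transactions (step 2)
    freeze_ok: bool = True       # Writer confirmed new writes are held (steps 3-4)
    commit_seconds: float = 2.0  # assumed time the Provider needs (steps 5-6)
    log: list = field(default_factory=list)

    def run(self) -> bool:
        self.log.append("requestor: ask VSS for shadow copy")
        self.log.append("vss: tell writer to prepare")
        if not self.prepare_ok:
            self.log.append("vss: writer failed to prepare, abort")
            return False
        self.log.append(f"vss: tell writer to freeze (max {FREEZE_LIMIT_S}s)")
        if not self.freeze_ok:
            # VSS checks back with the Writer; a failed freeze fails the request
            self.log.append("vss: writes not quiesced, abort")
            return False
        self.log.append("vss: flush file system buffers")
        self.log.append(f"vss: tell provider to commit (max {COMMIT_LIMIT_S}s)")
        if self.commit_seconds > COMMIT_LIMIT_S:
            self.log.append("vss: provider too slow, thaw writer and abort")
            return False
        self.log.append("provider: shadow copy created")
        self.log.append("vss: tell writer to thaw")
        self.log.append("vss: report success to requestor")
        return True


if __name__ == "__main__":
    attempt = ShadowCopyRun()
    print("success" if attempt.run() else "failed")
    for entry in attempt.log:
        print("  " + entry)
```

Setting `commit_seconds` above 10, or `freeze_ok=False`, shows the failure paths: the request is aborted and the Requestor would have to start again from step 1.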

Now let’s dive a bit deeper into the “vssadmin” CLI.

In the Windows Command Prompt, you can use the “vssadmin” command to find out some native info about the system providers and writers, as well as the volumes available for shadow copy.

vssadmin list providers (this will only list the system provider)
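If you want to collect this information from a script, a small parser is enough. The sketch below is an assumption-laden example: the function names are mine, the sample text is illustrative (the GUID is a placeholder), and it assumes the usual “Provider name: '...'” layout followed by indented “key: value” lines. Running the real command requires an elevated prompt on Windows.

```python
import re
import subprocess


def list_providers_raw() -> str:
    """Run 'vssadmin list providers' (Windows, elevated prompt) and return its text."""
    result = subprocess.run(
        ["vssadmin", "list", "providers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_providers(text: str) -> list:
    """Turn the command's text output into a list of dicts, one per provider."""
    providers = []
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Provider name:\s*'(.*)'", line)
        if match:
            providers.append({"name": match.group(1)})
        elif providers and ":" in line:
            # Indented detail lines belong to the most recent provider
            key, _, value = line.partition(":")
            providers[-1][key.strip().lower()] = value.strip()
    return providers


# Illustrative sample only; the Provider Id below is a placeholder.
SAMPLE = """\
Provider name: 'Microsoft Software Shadow Copy provider 1.0'
   Provider type: System
   Provider Id: {00000000-0000-0000-0000-000000000000}
   Version: 1.0.0.7
"""

if __name__ == "__main__":
    for provider in parse_providers(SAMPLE):
        print(provider["name"], "->", provider.get("provider type"))
```

On a real system, pass `list_providers_raw()` to `parse_providers` instead of the sample text.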

What about the software providers then?  Typically, they are registered as a service that runs in the OS.  For example, VMware Tools provides a VSS requestor and provider; the provider service can be found in Services:

This service is stopped by default. It only gets started when the VMware Tools VSS requestor attempts to make a shadow copy call to VSS.  If/when a VMware snapshot is invoked, the service will start, and you will notice the “Volume Shadow Copy” service getting stopped.  That is to be expected, as we are not leveraging the System Provider to do the job for us.

vssadmin list volumes (this command returns the list of volumes that are available for app-quiesced shadow copy)


Above is the output from my SQL 2008 VM with three VMDKs: C:\ for the OS, E:\ for database files, and F:\ for transaction logs.  NOTE: if you created your Win2k8 VM prior to vSphere 4.1, then there’s an extra step you need to take to enable the UUID for the VM to register with the guest (more info can be found here).

vssadmin list writers (this command lists all the applications that are VSS-aware; a writer’s core function is to listen to the VSS service for shadow copy requests, so it can flush data from memory buffers, commit pending transaction logs, and freeze new writes).  This command is also a good tool for a quick spot check on whether the previous application quiescing was successful.  If you need to dig deeper into failures, VMware KB 1037376 and KB 1007696 list instructions for the tools log and VSS trace, in addition to this command.
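The spot check described above can be scripted as well. This is a rough sketch, assuming each writer block contains “Writer name:”, “State:” and “Last error:” lines; the function names are mine, and the second writer in the sample text is hypothetical. Treat the result as a hint only, since a writer can legitimately be in a non-stable state while a backup is in progress.

```python
def parse_writers(text: str) -> list:
    """Extract name, state and last error for each writer block in the output."""
    writers = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            writers.append({"name": name, "state": "", "last_error": ""})
        elif writers and line.startswith("State:"):
            writers[-1]["state"] = line.split(":", 1)[1].strip()
        elif writers and line.startswith("Last error:"):
            writers[-1]["last_error"] = line.split(":", 1)[1].strip()
    return writers


def writers_needing_attention(text: str) -> list:
    """Return names of writers that are not 'Stable' or that report an error."""
    return [
        w["name"]
        for w in parse_writers(text)
        if not w["state"].endswith("Stable") or w["last_error"] != "No error"
    ]


# Illustrative sample only; 'Example Writer' is a hypothetical failing writer.
SAMPLE = """\
Writer name: 'SqlServerWriter'
   State: [1] Stable
   Last error: No error

Writer name: 'Example Writer'
   State: [7] Failed
   Last error: Timed out
"""

if __name__ == "__main__":
    print(writers_needing_attention(SAMPLE))
```

To use it on a live system, capture the output of `vssadmin list writers` from an elevated prompt (for example by redirecting it to a file) and feed that text to `writers_needing_attention`.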

Below is the output from my SQL2008 VM:

Don’t bother with the “vssadmin list shadows” command unless you are leveraging the built-in system provider (in most cases, you will NOT be using it).

Here’s a diagram of the application-quiesced snapshot workflow with the VMware Tools VSS provider, working in conjunction with a Nimble Storage volume collection that uses a “vCenter Sync” snapshot schedule.

 

At this point, you might be curious to see what the workflow looks like when a hardware VSS provider is used. Let’s use Microsoft Exchange as our example:

 

  1. The NPM schedule triggers the snapshot process; the NPM agent requests a shadow copy through VSS
  2. VSS tells Exchange to “quiesce” its mail stores
  3. Exchange “quiesces” its mail stores and alerts VSS upon completion
  4. VSS tells NTFS to flush its buffer cache
  5. VSS tells the Nimble array to take a snapshot of the volume(s), which are either in-guest mounted or RDMs in passthru mode (since VMware ESX ignores all RDMs in PT mode during the VMware snapshot creation task)
  6. The Nimble array captures snapshots of all volumes in the collection; the hardware provider also truncates the Exchange logs (as an app-consistent, quiesced copy of the data is now captured)

That’s all there is to VSS!  And this wraps up our part I of the post for data protection with VSS.  Stay tuned for part II – we will take a deep dive into design considerations based on various disk connection methods (think RDM, direct attach/in-guest mounted, regular VMDK).

 

2 Comments

  1. Hey dude, thx…Hyper-V would have the same flow as the hardware provider through Nimble NPM. You’d have to install the Nimble NPM agent (which includes the VSS requestor & provider). When a scheduled snapshot is triggered, it would communicate w/ the VSS requestor to kick off the app quiescing. Also keep in mind that a manual snapshot from the array would NOT trigger this workflow; that snapshot is only a crash-consistent one.
