Storage Performance 102 – 100,000 IOPS with 0.1 ms latency?!?! PROVE IT!

No, this post is not about how to get 100,000 IOPS with sub-millisecond latency from your storage solution.  Instead, I will show you some tips on how to make sure your storage vendor lives up to their performance claims.  We discussed the fundamental concepts of storage performance in the part 1 post; now it’s time to dig deeper into storage IO performance monitoring and measurement.  Knowing this will help you make sure whatever you end up buying is actually better than the legacy crap you are replacing.  Isn’t that the whole point anyway? :) NOTE: Physical server measurement techniques are out of scope for this post; we are only focusing on VMware ESX servers here.

The million-dollar question is: how do I get a rough idea of how many IOPS my ESX server is running?  And what is the average latency?

It is always good practice to know how your storage is performing from the ESX host's standpoint.  Pick one ESX server that is running an active workload in your environment, ideally a busy one (and one whose storage you plan to replace).  Before getting into esxtop, find out which HBAs are serving I/O for your ESX host.  In my example, I have an ESX server connected to a base Nimble CS200 array using the software iSCSI initiator in the ESX host; vmhba34 is the name of the adapter of interest.

Now you have found the HBA that is responsible for I/O.  If you use FC, you will most likely have multiple HBAs for multipathing/failover; if not, you have a bigger problem to focus on, which is to add another HBA to the host or buy a dual-port FC HBA to avoid a single point of failure.  With iSCSI using the SW/iSCSI adapter in ESX, you will only see one vmhba, because the vmkernel port binding allows the logical vmhba to issue I/O across all of the vmnics that it is bound to.  Once you have identified the vmhba(s), start esxtop and remember the following sequence of commands:

Type ‘d’ to bring up ‘disk adapter stats’

Type ‘f’ to bring up list of available fields to choose from (type ‘h’ and ‘I’ to add in average read/write latency):

Press “Enter” to get back to the “disk adapter” stats screen.  The key here is to make sure the ESX host is under load from your real application or from the storage vendor's workload generation tool.  My example is an ESX host under a simple random write IOMeter test with a 4KB block size (I set my screen refresh interval to 2 seconds).  The fields to focus on are CMDS/s (commands per second = reads/second + writes/second), READS/s, WRITES/s, and DAVG/cmd (device average latency per command); try to match them up with what the storage vendor’s user interface shows.

If you are looking to replace your legacy storage with a new solution, get a baseline measurement of the average IOPS and latency during peak times, and then do a comparison with the new solution during the same peak period.  If you are too lazy to sit in front of the ssh session to monitor the IO stats in action, you could of course run esxtop in batch mode and work out the averages after the fact.  Here is the trick to manipulating results from esxtop batch mode: if you have used it before, you’d know that the CSV file generated contains a million entries (from helper vmworld to VM to CPU to memory to network to disk stats, for each and every damn object in the ESX host).  If all you care about is storage performance metrics, then follow the trick below (the trick was documented in my previous post here):

#touch esxtopstat

#vi esxtopstat

Inside the file, simply enter in “vmhba34” (or whatever your vmhbaX number might be), save and exit by pressing ESC, then “:x”

#esxtop -b -i ./esxtopstat -d 2 -n 30 >> ./esxtopstats.csv (where -b means batch mode, -i means import the relevant object for stats gathering, -d is the interval between updates in seconds, -n is the number of iterations to run, and we are directing the output to a CSV file named “esxtopstats.csv”)

Doing this reduced the size of the CSV file by about 90% in my case (I did a simple stat test on my ESX host for 1 minute, and the file size was 430KB as opposed to 4800KB with all the other crap).
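If you want to capture a whole peak period rather than one minute, the -d and -n values can be worked out ahead of time.  The sketch below is only an illustration: the function names and the one-hour window are my own assumptions, and the command string simply reuses the flags from the example above.

```python
import math


def iterations_for(duration_s, delay_s):
    """Number of esxtop samples (-n) needed to cover duration_s at a -d of delay_s."""
    if duration_s <= 0 or delay_s <= 0:
        raise ValueError("duration and delay must be positive")
    return math.ceil(duration_s / delay_s)


def batch_command(entity_file, out_file, duration_s, delay_s):
    """Build the batch-mode command line, reusing the flags shown in the post."""
    n = iterations_for(duration_s, delay_s)
    return f"esxtop -b -i {entity_file} -d {delay_s} -n {n} >> {out_file}"


# The 1-minute test from the post: 2-second interval, 30 iterations.
print(iterations_for(60, 2))

# A hypothetical 1-hour peak window at the same interval.
print(batch_command("./esxtopstat", "./esxtopstats.csv", 3600, 2))
```

Rounding up means the capture runs slightly past the window instead of stopping short of it.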

Now that you have gotten the csv file, you could open it in Excel or perfmon.

Excel: open up the CSV file, then search the top row for “adapter”; this search will take you straight to the vmhba that you care about (in my case, vmhba34).  From there I could easily calculate the average IOPS and the average latency for the duration of my sample collection using Excel formulas.  The columns that are most important are:

  • Physical Disk Adapter(vmhba34)\Commands/sec <-- total IO per second (IOPS)
  • Physical Disk Adapter(vmhba34)\Reads/sec <-- read IO per second
  • Physical Disk Adapter(vmhba34)\Writes/sec <-- write IO per second
  • Physical Disk Adapter(vmhba34)\Average Guest MilliSec/Command <-- latency as observed by the VM
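If you would rather script the averages than build Excel formulas, something along these lines should do the job.  Treat it as a rough sketch: the counter names come from the list above, but the host-name prefix in the header (“esx01” here), the timestamp column and the sample numbers are made up for illustration, so check them against your own CSV.

```python
import csv
import io
from statistics import mean

# Counter names as listed in the post.
COUNTERS = (
    "Commands/sec",
    "Reads/sec",
    "Writes/sec",
    "Average Guest MilliSec/Command",
)


def summarize_adapter(csv_file, adapter):
    """Return {counter: average} for one adapter from an esxtop batch CSV."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Find each counter's column by matching the end of the header name,
    # so the host-name prefix does not matter.
    wanted = {}
    for counter in COUNTERS:
        suffix = "Physical Disk Adapter(" + adapter + ")\\" + counter
        for idx, name in enumerate(header):
            if name.endswith(suffix):
                wanted[counter] = idx
                break
    samples = {counter: [] for counter in wanted}
    for row in reader:
        for counter, idx in wanted.items():
            if idx < len(row) and row[idx] != "":
                samples[counter].append(float(row[idx]))
    return {c: mean(v) for c, v in samples.items() if v}


def _col(counter):
    # Hypothetical header cell; "esx01" is an assumed host name.
    return '"\\\\esx01\\Physical Disk Adapter(vmhba34)\\' + counter + '"'


# Tiny made-up capture, only to show the shape of the input.
SAMPLE = "\n".join([
    ",".join(['"(PDH-CSV 4.0) (UTC)(0)"'] + [_col(c) for c in COUNTERS]),
    '"01/01/2013 10:00:00","1000","200","800","0.50"',
    '"01/01/2013 10:00:02","3000","600","2400","1.50"',
]) + "\n"

for counter, avg in summarize_adapter(io.StringIO(SAMPLE), "vmhba34").items():
    print(f"{counter}: {avg:.2f}")
```

For a real capture, pass an open file instead of the in-memory sample, for example open("esxtopstats.csv", newline="").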

If Excel is not your thing, then you could also use perfmon (given that we eliminated a bunch of objects, it is a lot faster to import the CSV file into perfmon):

Step 1: open up perfmon

Step 2: type “CTRL + SHIFT + L”

Step 3: select the CSV file you collected

Step 4: Select the vmhba (you should only see the one you care about if you use the “esxtop –i” option)

4a: Select the appropriate IOPS counters:

4b: In the Graph tab, enter the maximum value for the vertical scale (if the storage vendor claims X number of IOPS, then enter that number here, and see how well they live up to their claims :)).  After entering the maximum value for the graph, click OK to generate the graph.

After reviewing the IOPS graph, the next step is to select the appropriate counters for latency as observed by the VM (you can leave the read/write latency stats in if you run a mixed workload).

Enter the maximum value for latency in ms (if the vendor claims 0.1 ms, for example, then you could enter 1 ms and see how much IO actually lives in the bottom 10% of the graph :))
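Eyeballing the graph works, but you can also put a number on it.  Here is a minimal sketch, assuming you have already pulled the latency column out of the CSV into a list; the function name and the sample values are made up for illustration.

```python
def share_within(latencies_ms, limit_ms):
    """Fraction of latency samples at or below limit_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    hits = sum(1 for value in latencies_ms if value <= limit_ms)
    return hits / len(latencies_ms)


# Made-up samples in milliseconds, purely for illustration.
samples = [0.08, 0.1, 0.4, 0.9, 2.5]
claimed_ms = 0.1  # the vendor claim used as an example in the post

share = share_within(samples, claimed_ms)
print(f"{share:.0%} of samples at or below {claimed_ms} ms")
```

A share well below 100% during your real peak workload is a good conversation starter with the vendor.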

In summary, remember the following set of useful esxtop commands to get an idea of:

1)      How many IOPS your ESX environment is running (do you really need 100,000+ IOPS?)

2)      What is your current latency as seen from the host side, with your current storage solution?  What about your new storage solution?

Last but not least, not all hybrid array solutions are created equal, and performance is only one aspect (don’t forget other important evaluation criteria such as data protection, DR protection, integrations & post-sale support).  Happy test driving!
