I have recently spoken with a number of prospective customers who are evaluating new storage solutions (emerging storage vendors building flash-enabled storage arrays). Not to my surprise, a good number of these vendors are touting ambiguous or misleading marketing claims. If you are evaluating storage solutions out there, be sure to challenge and validate the vendor's claims, and equip yourself to smell the B.S. and push the bullshit button. This will be a two-part blog post on storage performance. Here are some examples of the ridiculous bullshit I have heard:
- “We could do 100,000 random IOPS with 512-byte block size”
- “We see average of 0.2 ms latency with queue depth of 1”
- “Well, you can test performance on our storage unit with 10-20GB volume, just to make it quick and simple”
I will tear apart these stupid claims towards the end of this post…or maybe you will be able to do that yourself after reading it.
What is an I/O?
I/O stands for Input/Output. It is a request for a block of data, and it can be either a read or a write operation.
What is Sequential I/O?
Blocks are read or written in sequential order. Examples include video streaming, VMware Storage vMotion, and database transaction log replay. Even mechanical disks perform sequential I/O relatively well, especially for the price you pay, so most storage arrays handle this workload well.
What is Random I/O?
Blocks are requested or overwritten at random locations. Example workloads that generate random I/O include OLTP databases, Exchange, and VDI. Most real enterprise workloads are random, so this is the area where you will most likely be interested in how the storage solution performs.
Basic Storage Performance Measurement
These measurements are analogous to 0-60mph time for a car. In the storage world, we care about Throughput & Latency.
Throughput: how many I/O transactions can the array complete per second, or how many MB of data blocks can the array process per second? The unit of measurement depends on the type of I/O.
- Always measure random I/O in IOPS (I/Os processed per second). Why? Because the application will request blocks of various sizes, and since MB/s throughput = block size x IOPS, variable block size requests would yield an inconsistent MB/s throughput number.
- Always measure sequential I/O in MB/s. Why? Because blocks are typically accessed in a sequential fashion, so the MB/s throughput number is more meaningful.
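The MB/s = block size x IOPS relationship is also exactly why a 512-byte IOPS figure is so flattering to a vendor. A quick sketch with hypothetical numbers (100,000 IOPS is an assumed figure for illustration):

```shell
# MB/s = (block size in bytes * IOPS) / 1,000,000
# The same IOPS number means very different throughput at different block sizes.
iops=100000
for bs in 512 4096 32768; do
  mbps=$(( bs * iops / 1000000 ))   # integer division, close enough here
  echo "${bs}-byte blocks @ ${iops} IOPS = ${mbps} MB/s"
done
# 512-byte blocks: 51 MB/s; 4k: 409 MB/s; 32k: 3276 MB/s
```

Notice that "100,000 IOPS" at 512 bytes is a mere 51 MB/s of actual data moved, while the same IOPS at a realistic 4k block size is roughly 8x the work.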
Latency: how long does the array take to finish an IO operation?
Most mainstream applications care about average latency. Of course, applications such as trading or medical systems may have a strict requirement for the maximum tolerable latency.
Variables that affect throughput & latency
I/O Request Size (a.k.a I/O Block Size)
I/O request size varies with the I/O access pattern. Sequential I/O typically involves large block sizes (we know you will likely need the next block, so we go ahead and fetch it). Random I/O, on the other hand, typically involves smaller block sizes. For example, an existing customer placing a new order for a set of products results in a small update to the database rows for that specific customer. Keep in mind that while random I/O typically operates at small block sizes (4k-32k), it usually does not go as small as 512 bytes! When in doubt, check with your application owner, or if you are running ESX, try the vscsiStats tool to generate a histogram of your I/O block sizes. My next post will cover this in greater detail. If you hear a high IOPS number quoted at a 512-byte (or smaller) block size, ask to see the performance numbers at a 4k or larger block size!
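For the ESX case, here is a minimal sketch of collecting a block-size histogram with vscsiStats on an ESXi host (the world-group ID 12345 is an illustrative placeholder; use the IDs the tool lists for your VMs):

```shell
# List running VMs and their world-group IDs
vscsiStats -l

# Start collecting stats for the VM's world group (ID is a placeholder)
vscsiStats -s -w 12345

# ...let the real workload run for a while, then print the
# I/O length (block size) histogram and stop collection
vscsiStats -p ioLength -w 12345
vscsiStats -x -w 12345
```

The ioLength histogram tells you what block sizes your workload actually issues, which is the number to hand the vendor when they quote IOPS figures.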
Queue Depth (a.k.a Outstanding I/O)
Enterprise workloads typically have many outstanding I/Os. A fast storage array that can process and complete those I/Os quickly will yield lower latency for the application, and that translates to faster application response time. By definition (this is Little's Law), average latency = queue depth / IOPS. If you set your host to a queue depth of 1, you would get a great latency number, but that is not a good representation of the real workload in your environment. Additionally, you are barely scratching the surface of the array's capability by sending only one outstanding I/O at a time. So beware of low-latency claims measured with a queue depth of 1 during performance testing. The key is to ask the storage vendor what the sweet-spot queue depth value is, for both random and sequential I/O.
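The queue-depth trick is easy to see from latency = queue depth / IOPS. A sketch with a hypothetical array doing 50,000 IOPS:

```shell
# Average latency (in microseconds) = queue_depth / IOPS * 1,000,000
# Same array, same IOPS: reported latency scales with queue depth.
iops=50000
for qd in 1 8 32; do
  lat_us=$(( qd * 1000000 / iops ))
  echo "QD=${qd} @ ${iops} IOPS -> avg latency ${lat_us} us"
done
# QD=1: 20 us; QD=8: 160 us; QD=32: 640 us
```

So a "0.2 ms at queue depth 1" claim says almost nothing about the latency you will see when your applications keep 32+ I/Os in flight.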
If you are doing a performance validation to prove a vendor's performance claim, be sure to use a volume large enough to generate a data set that exceeds the DRAM size of the storage controller. Otherwise, you are not fully testing the array's ability to service I/O that cannot be satisfied from controller DRAM; a 10-20GB test volume is "quick and simple" precisely because it fits in cache.
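As a sketch of such a validation run, here is an example fio invocation (the flags are standard fio options; the device path, 200G data set, and other values are illustrative assumptions for a controller with, say, 64GB of DRAM):

```shell
# Random-read test with a data set larger than controller DRAM.
# Replace /dev/sdX with the LUN under test; --direct=1 bypasses the
# host page cache so you measure the array, not the host.
fio --name=randread-validation \
    --filename=/dev/sdX \
    --rw=randread --bs=4k --iodepth=32 \
    --size=200G \
    --runtime=300 --time_based \
    --ioengine=libaio --direct=1
```

Note the choices line up with the points above: a 4k block size (not 512 bytes), a realistic queue depth of 32 (not 1), and a data set the controller cannot hide in DRAM.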
In summary, when evaluating an emerging storage vendor's solution (all-flash or hybrid), be sure to keep these questions in mind when facing claims of XXX IOPS at X ms latency: