Last week I presented a couple of sessions at the annual Sydney vForum, people sure are hungry for information on vSphere 5.5. There were around 400 people in my session on Architecting vSphere 5.5, it was standing room only! You can find my presentations here and here. Given this thirst for this information, I thought I would dive a little deeper into one of my favourite features in vSphere 5.5, vFRC (vFlash Read Cache). I would like to thank Mark Achtemichuk and Sankaran Sivathanu for their indirect assistance with this. I'd also HIGHLY recommend reading the vFRC Performance Whitepaper.
One of the most important considerations when implementing vFRC is the block size chosen for each VMDK cache. Since vFRC is configured on a per VMDK basis, each cache can have a different block size, ranging from 4KB to 1024KB with 8KB being the default. Why do we have an option for this configuration? Efficiency.
All applications are different, even the same app could have a different storage IO profile depending on the application use. There are some general rules that you could follow, such as Oracle using IO of 8KB or MS SQL using 64KB but unless you do some proper monitoring, there’s no real way to tell what your particular application's IO pattern predominantly uses.
Why does it matter? Well as I mentioned earlier, it’s around efficiency. When we configure vFRC for a certain block size, the application data block that is cached resides inside a vFRC block. If the cache block size does not match the application’s IO block size we get one of two things:
- If the cache block size is too small this increases cache misses.
- If the cache block size is too large there is internal fragmentation - wasted space in each cache block. This leads to a reduced overall cache capacity.
One more thing we need to consider is memory overhead. As we decrease the block size of the cache, the number of blocks increases. Keeping track of more blocks, takes more memory. You can find the details one the amount of overhead required in the whitepaper mentioned earlier.
So how do you ensure that the block size you set for each VMDK cache is correct? A handily little tool built into ESXi called vscsiStats. Now I’m not going to go over what this tool is in great detail, Duncan and Cormac have already done that. However I would like to explain the steps required to configure vFRC with the correct block size for your workloads.
- Log into the host that runs the vFRC candidate via SSH or via the console command line.
- List the currently running VMs and their VMDKs: vscsiStats -l
- Start capturing I/O for a particular candidate: vscsiStats -s -w YYYYY -i XXXX ( Where YYYYY is the Virtual Machine wouldGroupID and XXXX is the Virtual SCSI Disk handleID)
- Run the typical workload within the candidate VM
- After the workload has run for a bit of time, display the histogram for "IO Length": vscsiStats -p ioLength -c -w XXXXX -i YYYY
You output will be similar to the list below. Each line in the following output shows a different IO (block) size from 512-bytes to 524288-bytes (512-Kilobytes) The first number (before the comma) shows how many IOs, the second number (after the comma) shows the IO size
Histogram: IO lengths of commands,virtual machine worldGroupID,128782,virtual disk handleID,8192 (scsi0:0)
- Frequency,Histogram Bucket Limit
Finally, stop vscsiStats from collecting data: vscsiStats -x
As you can see from the captured data above, I/O size of 4KB has the largest count. Does this mean we should set the vFRC to 4KB? Not necessarily. If it was a higher count by a large margin, then yes this would be the correct setting. However we need to consider the entire I/O profile. For this example, I would start by setting the vFRC block size to 8KB or 16KB. This will ensure there is not too much cache fragmentation, but also covers most of the I/O range fairly well.
Once the block size is set, monitor your cache! Keep an eye out for cache hit percentage using:
esxcli storage vflash cache stats get –m <module > - c <cache file>
I’m going to be running some benchmarks to try and explain what kind of performance difference there is between a default vFRC configuration and one that has been rightsized. I'll share my results here when it’s done. In the mean time, make sure you size your vFRC correctly!