Hybrid Arrays aren’t bullshit, but the name is!

By Bill Plein | March 18, 2014

Why in the world has the storage industry settled on the name “Hybrid Array” for arrays that combine Flash (usually SSD) and HDD? At first glance it makes sense, but when you dig into it, it doesn’t.

Most of the enterprise tier 1 and tier 2 disk arrays sold in the past decade have combined HDDs with some sort of mirrored/protected DRAM for caching. The DRAM provides a low-latency data protection mechanism for I/O, allowing the array to acknowledge writes as soon as the data is protected in (usually) mirrored DRAM. Secondarily, the DRAM can also serve as a read cache, and variations on read/write caching have been implemented over the years.
Data at rest resides on disk, usually in some sort of RAID layout, whether conventional or proprietary.
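
A minimal sketch of that write path, assuming a two-controller design where each controller holds its own copy of the DRAM cache (the class and method names are hypothetical, not any vendor's API): the write is acknowledged once both DRAM copies hold the data, and destaging to the RAID-protected disks happens later.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """One storage controller with its own DRAM write cache (hypothetical model)."""
    name: str
    dram: dict = field(default_factory=dict)  # block address -> data


@dataclass
class MirroredWriteCache:
    primary: Controller
    partner: Controller
    disk: dict = field(default_factory=dict)  # stands in for the RAID-protected HDDs

    def write(self, lba: int, data: bytes) -> str:
        # Protect the write in both controllers' DRAM before acknowledging it.
        self.primary.dram[lba] = data
        self.partner.dram[lba] = data
        return "ACK"  # the host sees DRAM latency, not disk latency

    def read(self, lba: int) -> bytes:
        # Serve from DRAM if the block is still cached; otherwise read from disk.
        if lba in self.primary.dram:
            return self.primary.dram[lba]
        return self.disk[lba]

    def destage(self) -> int:
        # Later, flush cached blocks to disk and free both DRAM copies.
        flushed = 0
        for lba in list(self.primary.dram):
            self.disk[lba] = self.primary.dram.pop(lba)
            self.partner.dram.pop(lba, None)
            flushed += 1
        return flushed
```

The point of the mirror is that losing one controller before destage does not lose an acknowledged write; real arrays add battery or flash backup for the DRAM itself, which this toy model ignores.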

Over the years, vendors also offered tiered arrays. You could buy an array with some number of, say, 15k rpm Fibre Channel disks and some number of 7200 rpm Nearline SAS or SATA disks. Tiered arrays don’t always allow mobility between tiers. In the basic case, you create RAID groups and/or LUNs from one type of disk and RAID groups or LUNs from another type, and move data between them at the host level.

Maybe your array lets you migrate a LUN from one type to another manually. That’s still tiering, but the array handles the data movement instead of the host. Maybe your vendor has software that will do it for you on a schedule or under specific conditions.

With tiered arrays, if you wish to maintain data mobility between tiers, you must keep enough free capacity in each tier to allow data to move into it. The cost of mobility in a tiered array is capacity utilization.

Enter the Flash Hybrid array. What is a Flash Hybrid? Well, it seems that a Flash Hybrid can be one of many things.

  • It can be a caching array that uses Flash as a non-volatile (or sometimes volatile!) replacement for the DRAM read or write caches of old. The storage capacity of a caching array is limited to the capacity of the HDDs less RAID overhead (important).
  • It can be a traditional tiered array that has pools of Flash/SSD and pools of HDD, with manual or automated movement between tiers on a whole-LUN basis (a LUN is either on SSD or on HDD). The storage capacity of a tiered array is the capacity of the HDDs and the SSDs combined, less RAID overhead, but spare “slack” capacity is needed to allow data mobility between tiers. Any move that promotes data to SSD to increase its performance will require ejecting other data to HDD, reducing the performance of that ejected data.
  • It can be something in between, wherein data can live on either tier on a sub-LUN basis, as any LUN is highly virtualized and lives across many HDDs and SSDs. Importantly, the capacity of a Hybrid can be greater than the capacity of its HDD components. Under normal “at rest” circumstances, data is not “cached” in Flash while living on HDD.

Have I left anything out? Feel free to comment… this is an exercise in thought, not a bible of hard facts!
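
To make the capacity differences between the three designs concrete, here is a back-of-the-envelope sketch. It assumes a single flat RAID efficiency for every device and a fixed fraction of capacity held back as slack in the tiered case; both parameters, and the assumption that the in-between design reserves no tier-level slack, are simplifications for illustration only.

```python
def usable_capacity(arch: str, hdd_tb: float, ssd_tb: float,
                    raid_efficiency: float = 0.75,
                    slack_fraction: float = 0.10) -> float:
    """Rough usable capacity in TB for the three designs described above.

    raid_efficiency and slack_fraction are hypothetical, simplified parameters.
    """
    if arch == "caching":
        # Flash only holds copies of data that also lives on HDD: no added capacity.
        return hdd_tb * raid_efficiency
    if arch == "tiered":
        # Both tiers store data, but free slack is kept so whole LUNs can move.
        return (hdd_tb + ssd_tb) * raid_efficiency * (1 - slack_fraction)
    if arch == "hybrid":
        # Data lives on exactly one medium at a fine grain.
        # Assumption: no tier-level slack is reserved.
        return (hdd_tb + ssd_tb) * raid_efficiency
    raise ValueError(f"unknown architecture: {arch}")


for arch in ("caching", "tiered", "hybrid"):
    print(arch, round(usable_capacity(arch, hdd_tb=60, ssd_tb=20), 2))
```

With 60 TB of HDD and 20 TB of SSD this prints 45.0 for caching, 54.0 for tiered, and 60.0 for hybrid. Only the last exceeds what the HDDs alone could hold after RAID, which is the capacity distinction the bullets draw.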

Look at the bullets above. The first simply replaces DRAM with Flash for caching. There’s nothing “hybrid” about that, and if there is, then the CLARiiON FC4700 I used to run in my home lab was a DRAM Hybrid Array. Take that to your marketing person and sell it. The second bullet simply represents a traditional tiered storage array, using SSDs as a tier.

To me, only the last bullet describes something new, something that deserves the term “Hybrid Flash Array”. If the vendor is doing something new and unique with Flash and HDD, that’s a Hybrid.

Vendors that simply add SSD pools to their storage arrays and tack on the word “Hybrid” are playing a game with their user base and trying to ride the coattails of vendors that are doing new and unique things.

Vendors that use Flash as a larger-capacity, non-volatile replacement for DRAM caches are doing something a bit more interesting. Some current “Hybrid” players that I respect for their ability to compete in the market use this type of architecture. I still feel strongly that these arrays should get a different name to differentiate them.

My proposal:

  • Tiered arrays are just tiered arrays. Do you offer an SSD tier? Fine! Just don’t call it a hybrid.
  • Arrays using Flash as a caching layer should be called Flash-Enhanced Arrays. Or Flash Cache Arrays, though a few players already use the term “Flash Cache”.
  • Hybrid Arrays manage data movement to and from SSD and HDD in a non-volatile manner. They do so on the fly, dynamically, and on a fine-grained basis.

Conversely, here are things I want to call out and remove from the world of “hybrids”:

  • Moving data at the sub-LUN level, but many megabytes at a time, is not “hybrid”. That’s “sub-LUN tiering”.
  • Doing so on a schedule, based on statistics gathered over time, is not “hybrid”. That’s the kind of thing legacy array vendors do when they bolt sub-LUN tiering onto legacy technology.
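
To illustrate the distinction, here is a toy contrast between the two behaviors. The 1 GiB extent size, the 64 KiB block size, the access-count ranking, and the LRU demotion policy are all assumptions chosen for illustration, not any vendor's actual algorithm. Both classes track placement only: a block or extent lives on exactly one medium.

```python
from collections import Counter, OrderedDict

MB = 1024 * 1024


class ScheduledSubLunTiering:
    """Sub-LUN tiering: large extents, relocated only when a scheduled job runs."""

    def __init__(self, ssd_extents: int, extent_size: int = 1024 * MB):
        self.extent_size = extent_size    # hypothetical: 1 GiB extents
        self.ssd_extents = ssd_extents
        self.stats = Counter()            # access counts gathered between runs
        self.on_ssd = set()

    def access(self, offset: int) -> None:
        # Only record statistics; nothing moves until the scheduled job runs.
        self.stats[offset // self.extent_size] += 1

    def run_scheduled_job(self) -> None:
        # Place the most-accessed extents on SSD, then start a new window.
        self.on_ssd = {extent for extent, _ in self.stats.most_common(self.ssd_extents)}
        self.stats.clear()


class DynamicHybrid:
    """Fine-grained placement: small blocks promoted on the fly, coldest demoted."""

    def __init__(self, ssd_blocks: int, block_size: int = 64 * 1024):
        self.block_size = block_size      # hypothetical: 64 KiB blocks
        self.ssd_blocks = ssd_blocks
        self.on_ssd = OrderedDict()       # block -> None, kept in LRU order

    def access(self, offset: int) -> None:
        block = offset // self.block_size
        if block in self.on_ssd:
            self.on_ssd.move_to_end(block)       # refresh recency
            return
        self.on_ssd[block] = None                # promote immediately
        if len(self.on_ssd) > self.ssd_blocks:
            self.on_ssd.popitem(last=False)      # demote least recently used block
```

After a burst of I/O the scheduled tierer has moved nothing until `run_scheduled_job()` is called, and then it moves whole gigabyte extents; the dynamic model has already placed the touched 64 KiB blocks on SSD.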

Category: Rants Storage

About Bill Plein

I've been in the data storage industry since the 1990s, most recently with 3PAR, Fusion-io, and Pure Storage. I'm now with Diamanti, where we make Kubernetes easy and fast. I'm attracted to bright, shiny new objects.

4 thoughts on “Hybrid Arrays aren’t bullshit, but the name is!”

  1. Nile Gardner

    Whatever happened to HSM? I don’t hear anyone mentioning that term anymore.

  2. Bill Plein Post author

    Nile, long time no see!

    HSM, ILM…. sort of like “workflow software” became “business process automation” software 🙂
