FreeBSD 9.2 supports ZFS TRIM/UNMAP

Just playing around with this today, and it’s very cool. FreeBSD 9.2 now supports TRIM/UNMAP for ZFS.

To see if your disk is reporting the capability, look for kern.cam.da.<device>.delete_method, where <device> is the number of the da device; e.g., /dev/da5 is "5" in that position.

For example:

# sysctl -a | grep delete_method
kern.cam.da.0.delete_method: NONE
kern.cam.da.1.delete_method: NONE
kern.cam.da.2.delete_method: NONE
kern.cam.da.3.delete_method: NONE
kern.cam.da.4.delete_method: NONE
kern.cam.da.5.delete_method: UNMAP
kern.cam.da.6.delete_method: NONE
kern.cam.da.7.delete_method: NONE
kern.cam.da.8.delete_method: NONE
kern.cam.da.9.delete_method: NONE
kern.cam.da.10.delete_method: NONE
kern.cam.da.11.delete_method: NONE
kern.cam.da.12.delete_method: NONE

I’ve tested this with a Pure Storage FlashArray, using FreeBSD 9.2 running under VMware, and an RDM LUN passed through to FreeBSD to give it native SCSI access. That’s da5 above showing the UNMAP method.
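
If you want to confirm that ZFS is actually issuing TRIMs, rather than just the device advertising support, FreeBSD also exposes ZFS TRIM knobs and counters via sysctl. The exact OIDs below are from my memory of the 9.x/10.x TRIM code (vfs.zfs.trim.* and kstat.zfs.misc.zio_trim.*) and may vary by release:

  sysctl vfs.zfs.trim.enabled
  sysctl kstat.zfs.misc.zio_trim

If the zio_trim success counters climb after you delete data on the pool, the UNMAPs are making it down to the array.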

According to Alexander Motin, the following are supported and settable via sysctl:

  NONE - no provisioning support reported by the device;
  DISABLE - provisioning support was disabled because of errors;
  ZERO - use WRITE SAME (10) command to write zeroes;
  WS10 - use WRITE SAME (10) command with UNMAP bit set;
  WS16 - use WRITE SAME (16) command with UNMAP bit set;
  UNMAP - use UNMAP command.
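
Since the method is settable, you can also override whatever CAM auto-detected for a given device. A trivial example, using da5 from my setup above purely as an illustration:

  sysctl kern.cam.da.5.delete_method=UNMAP

Normally the auto-detected value is what you want; overriding it is mainly useful for experimenting or for working around a device that mis-reports its capabilities.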


Hybrid Arrays aren’t bullshit, but the name is!

Why in the world has the storage industry come up with this name “Hybrid Array” for arrays that are combinations of Flash (usually SSD) and HDD? At first it makes sense, but when you dig into it, it does not.

Most of the enterprise tier 1 and tier 2 disk arrays sold in the past decade have been a combination of HDD and some sort of mirrored/protected DRAM for caching. The DRAM gives a low-latency data protection mechanism for I/O, allowing the array to acknowledge writes as soon as the data is protected in (usually) mirrored DRAM. Secondarily, the DRAM can also provide a read cache, and variations on read/write caching have been implemented over the years.
Data at rest resides on disk, usually in some sort of RAID layout, be it conventional or unique or proprietary.

Over the years, vendors have also offered tiered arrays. You could buy an array with some number of, say, 15k Fibre Channel disks and some number of 7200rpm Nearline SAS or SATA disks. Tiered arrays don't always allow for mobility between tiers: in the basic case, you create RAID groups and/or LUNs from one type of disk and RAID groups or LUNs from the other, and move data between them at the host level.

Maybe your array allows you to migrate a LUN from one type to another, manually. That’s still tiered, but offloads the data movement. Maybe your vendor has software that will do it for you under a schedule or under specific conditions.

With tiered arrays, if you wish to maintain data mobility between tiers, you must keep enough free capacity in each tier to allow data to move into it. The cost of mobility in a tiered array is capacity utilization.

Enter the Flash Hybrid array. What is a Flash Hybrid?  Well, it seems that a Flash Hybrid can be one of many things.

  • It can be a caching array that uses Flash as a non-volatile (or sometimes volatile!) replacement for the DRAM read or write caches of old. The storage capacity of a caching array is limited to the capacity of the HDDs, less RAID overhead (this is important).
  • It can be a traditional tiered array that has pools of Flash/SSD and pools of HDD, with non-automated or automated movement between tiers on a whole-LUN basis (a LUN is either SSD or HDD). The storage capacity of a tiered array is the capacity of the HDDs and the SSDs combined, less RAID overhead, but spare "slack" capacity is needed to allow data mobility between tiers. Any move to improve performance by promoting data to SSD requires ejecting other data to HDD, reducing the performance of that ejected data.
  • It can be something in between, wherein data can live on either tier, on a sub-LUN basis, as any LUN is highly virtualized and lives across many HDDs and SSDs. Importantly, the capacity of a Hybrid can be greater than the capacity of the HDD components. Data is not “cached” in Flash while living on HDD, under normal “at rest” circumstances.

Have I left anything out? Feel free to comment… this is an exercise in thought, not a bible of hard facts!

Look at the bullets above: The first is simply replacing DRAM with Flash for caching. There’s nothing “hybrid” about this, and if there is, the CLARiiON FC4700 I used to run in my home lab was a DRAM Hybrid Array. Take that to your marketing person and sell it.  The second bullet simply represents a traditional tiered storage array, using SSDs as a tier.

To me, only the last bullet describes something new, something that is deserving of the term “Hybrid Flash Array”. If the vendor is doing something new and unique with Flash and HDD, that’s a Hybrid.

Vendors that are simply adding SSD pools to their storage and tacking on the word "Hybrid" are playing a game with their user base and trying to ride the coattails of vendors that are doing new and unique things.

Vendors that are using Flash as a larger-capacity, non-volatile replacement for DRAM caches are doing something a bit more interesting. There are current "Hybrid" players out there using this type of architecture whom I respect for their ability to compete in the market. I still feel strongly that a different name should be used for these, in order to differentiate them.

My proposal:

  • Tiered arrays are just tiered arrays. Do you offer an SSD tier?  Fine! Just don’t call it a hybrid.
  • Arrays using Flash as a caching layer should be Flash-Enhanced Arrays, or Flash Cache Arrays, though there are already at least a couple of players using the term "Flash Cache".
  • Hybrid Arrays manage data movement to and from SSD and HDD in a non-volatile manner. They do so on the fly, dynamically and on a fine grained basis.

Conversely, there are things I want to call out and remove from the world of "hybrids":

  • Doing sub-LUN data movement, but in chunks of many megabytes at a time, is not "hybrid". That's "sub-LUN tiering".
  • Doing so on a scheduled basis, driven by statistics gathered over time, is not "hybrid". That's the kind of thing legacy array vendors do by bolting sub-LUN tiering onto legacy technology.


Don’t trust your primary data to a Drobo

I was reminded today about the availability issues with storage. Of course, I live it every day on the "sell side" working for Pure Storage. But for my home lab and office, I tend to have "less than enterprise" requirements.

Even so, my primary use of my DroboElite has been for secondary copies (backups) of data from my VMs. Those backups cover my primary storage: VM boot images, as well as the file shares where I store all of my music, video and, of course, photos.

The iSCSI DroboElite works with VMware ESXi (vSphere 5.5) clusters, even though it is only qualified with ESXi 4.x. The DroboElite isn't very fast, but it has worked well for me for a few years. It has survived power outages despite the lack of a UPS, and in general I've been happy to have it, although I've been frustrated at times with its performance, as well as with its predilection for rebooting when touched by non-qualified OSes such as FreeBSD.

Today I deleted a 4TB volume from my DroboElite, removing it from my cluster. It had about 2TB of data on it, but it was a full ZFS send/receive copy of data on another DroboElite volume. I was in the process of moving data from a block-aligned FreeBSD (virtualized) ZFS pool to a 4k-aligned ZFS pool (using these instructions, which I’ve been successful with in the past). I wanted to start the process from scratch, so I deleted the old 4k-aligned pool from FreeBSD, removed it from the Drobo, created a new 4TB volume and exported it to the host and consumed it as an RDM in my guest VM.
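
For reference, the usual way to build a 4K-aligned (ashift=12) pool on FreeBSD at the time is the gnop trick: create a temporary 4K-sector provider on top of the new disk so ZFS creates the pool with 4K alignment, then replicate with zfs send/receive. I can't vouch that this matches the linked instructions exactly, but the general shape is below; da6, oldpool, newpool and the snapshot name are placeholders, not my actual layout:

  gnop create -S 4096 /dev/da6
  zpool create newpool /dev/da6.nop
  zpool export newpool
  gnop destroy /dev/da6.nop
  zpool import newpool
  zfs snapshot -r oldpool@migrate
  zfs send -R oldpool@migrate | zfs receive -F newpool

The export/destroy/import dance just removes the temporary .nop device; the pool keeps its 4K alignment once it has been created that way.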

I noticed that space on the Drobo, which is scavenged asynchronously as a background operation after a volume deletion, was very slow to free up. I still had plenty of space on the Drobo, so I started creating my new pool.

During this operation, which had never given me issues in the past, the Drobo rebooted (I could tell by the change in fan speed). It then went into an endless reboot cycle. The blue lights, which light up one by one, were my gauge for how far it was getting: the Drobo would get to light 8 and then reboot.

I followed the instructions at Drobo’s KB article #102 but that was no help.

The worst part about this was that I had broken my own rule and had two "necessary" VMs living on the Drobo: one of my routers (my primary pfSense instance) and the vCenter Appliance.

Lucky for me I was running CARP redundancy and my other router (another VM running under VMware Fusion on a Mac mini in my garage) took over.

I lost two pools on my FreeBSD file server, but the second was just a copy of the first, and the first held only backups of my all-SSD data pool, plus Time Machine backups from our desktops and laptops. So no primary data from that VM was lost. I have also been toying around with ZFS in another instance, using passthrough disks, as an NFS server to replace my iSCSI Drobo, so I had yet another copy of everything.

This was a good scare. Time for me to go back and rebuild my backup strategy.

Also time for me to look into getting a Synology or other dedicated hardware array for my office.  I don’t trust the Drobo, and am going to want something with a bit more performance.

Thoughts?

How to DOS (Denial of Service) an iSCSI Drobo

Here’s how you can reboot any iSCSI Drobo (DroboElite or B800i, anyhow) on your network:

  • Install FreeBSD 8, 9 or 10 on a system or VM
  • Fire up the iSCSI initiator (old version with FreeBSD 9 or earlier, or new version with FreeBSD 10 beta)
  • Connect to the Drobo
  • With the earlier version of the iSCSI initiator, you could enumerate LUNs and read from a LUN, but as soon as you wrote, BAM, the Drobo would reboot.
  • With the current FreeBSD 10-beta initiator, just doing a discovery of the Drobo will reboot it (rough commands below).

Nifty, huh?
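
For the curious, the commands involved are roughly as follows. This is a sketch from memory: the portal address is just a placeholder, and the flags come from my recollection of iscontrol(8) (old initiator) and iscsictl(8) (new initiator), so check the man pages on your release.

With the old initiator (FreeBSD 9.x and earlier), a discovery and then a session defined as a nickname in /etc/iscsi.conf:

  iscontrol -d -t 192.168.1.50
  iscontrol -n drobo

With the new initiator (FreeBSD 10 beta), a discovery session alone is enough:

  iscsictl -A -d 192.168.1.50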