Don’t trust your primary data to a Drobo

I was reminded today about the availability issues with storage. Of course, I live it every day on the “sell side” working for Pure Storage. But for my home lab and office, I tend to have “less than enterprise” requirements.

Even so, my primary use of my DroboElite has been for secondary copies (backups) of data on VMs. These backups cover my primary storage (VM boot images, as well as the file shares where I store all of my music, video and, of course, photos).

The iSCSI DroboElite works with VMware ESXi (vSphere 5.5) clusters, even though it is only qualified with ESXi 4.x. The DroboElite isn’t very fast, but it’s worked well for me for a few years. It’s survived power outages despite the lack of a UPS, and in general I’ve been happy to have it, although I’ve been frustrated at times with the performance as well as with its predilection for rebooting when touched by non-qualified OSes such as FreeBSD.

Today I deleted a 4TB volume from my DroboElite, removing it from my cluster. It had about 2TB of data on it, but it was a full ZFS send/receive copy of data on another DroboElite volume. I was in the process of moving data from a block-aligned FreeBSD (virtualized) ZFS pool to a 4k-aligned ZFS pool (using these instructions, which I’ve been successful with in the past). I wanted to start the process from scratch, so I deleted the old 4k-aligned pool from FreeBSD, removed it from the Drobo, created a new 4TB volume, exported it to the host, and consumed it as an RDM in my guest VM.

I noticed that space on the Drobo, which is reclaimed asynchronously in the background after a volume deletion, was very slow to free up. I still had plenty of space on the Drobo, so I started creating my new pool.

During this operation, which has never given me issues in the past, the Drobo rebooted (I could tell by the change in fan speed). It then went into an endless reboot cycle. The blue lights, which light up one by one, were my benchmark for how far it was getting. The Drobo would get to light 8 and then reboot.

I followed the instructions at Drobo’s KB article #102 but that was no help.

The worst part about this was that I had broken my own rule, and had two “necessary” VMs living on the Drobo: one of my routers (my primary pfSense router instance) and the vCenter Appliance.

Lucky for me, I was running CARP redundancy, and my other router (another VM running under VMware Fusion on a Mac mini in my garage) took over.

I lost two pools on my FreeBSD file server, but the second was just a copy of the first, and the first had only backups of my all-SSD data pool, plus Time Machine backups from our desktops and laptops. So no primary data from that VM was lost. I have also been toying around with ZFS in another instance, using passthrough disks, as an NFS server to replace my iSCSI Drobo, so I had yet another copy of everything.

This was a good scare. Time for me to go back and rebuild my backup strategy.

Also time for me to look into getting a Synology or other dedicated hardware array for my office.  I don’t trust the Drobo, and am going to want something with a bit more performance.

Thoughts?

3 thoughts on “Don’t trust your primary data to a Drobo”

  1. What type of drives are you using? There may be an undiagnosed failure causing the problem. I have a DroboElite, too, from 2010. This summer I had a problem similar to what you described: I deleted a volume with about 1.2TB of data. A few hours later I looked and it hadn’t reclaimed all the space; maybe 500-600MB had been recovered. The next morning I had a flashing red light: one of my 1.5TB Hitachi drives had failed. A few days later I replaced it with a 3TB drive from Seagate. About 2 days later the array rebuild completed, including the reclaim of the deleted volume’s space.

  2. I have 3 Seagate enterprise disks and 2 desktop disks in the system. Drobo only supports enterprise disks. They actually gave me a non-standard firmware for the Drobo to correct for a specific issue with one drive I used to have in there.

    My failure mode isn’t the same as yours… it never comes up to the point where I can determine if one of the disks is bad (single red light). It just keeps rebooting. I can remove all the drives and it will come up showing 5 missing drives. So it obviously is reading something on one or more of the drives that is throwing it for a loop.

    Right now I am going to bring it up multiple times, each time with one of the 5 drives removed, and see if it comes up in degraded mode. That will identify the bad disk or at least the corrupt disk.

    At this point I have little reason to bring it up fully, it’s more of an experiment to see if I can recover it.

  3. Well, apparently the Drobo does NOT want to come up with one drive (any of them) missing. It comes up all red, and says that I’ve removed one too many drives. It’s acting as if two or more drives are missing (I use single drive redundancy).

    Interestingly (I am still in this testing process), removal of one of the drives gives different results: Constant rebooting. So that drive doesn’t seem to help at all.

    My guess? That drive is partially alive, but not really. Removing any other drive results in a double fault. Removing this one doesn’t allow it to get far enough (other drives are corrupt in some fashion) to boot.

    I am going to double check that I’m happy that I’ve got all my data, and wipe this system. And, again, treat it as a backup target only. No primary data on this bugger.
