IBM XIV….

Oh Moshe – you disappoint me so….

When I was at EMC I used to look up at Moshe Yanai like he was a god.  The father of the Symmetrix, the man who would occasionally fly in and out of work in his own helicopter, the uber-engineer that we all aspired to be.

Oh how the mighty have fallen.

There was an engineering adage that my uncle taught me when I was very young.

He said: “You can have it cheaper, you can have it faster, you can have it smaller – now pick any two of those.”

It’s always held true.  Cheap/Fast is usually huge, Small/Fast is usually expensive as hell, and Cheap/Small is usually slow as molasses in January.

I got a chance to get a real close look at the XIV for the first time last week, and I have to say it’s got to be the biggest pile of garbage I’ve ever seen in my life.  In the above adage, it definitely falls into the “Cheap/Small/Slow” category.

From a “tiering” standpoint I’d put it somewhere on the low end, between the CLARiiON and the Centera, maybe like an Atmos without the universal namespace and NAS connectivity.

The idea that someone I work with THINKS they can run a transactional database on it is absolute nuttiness, and would be fun to watch if it wasn’t also just *SO* painful.

Here are the stats I’ve found on it.

22,000-25,000 IOPS peak.

Depending on the cache-friendliness of the application.  It must be said that IBM’s “testing” shows a much higher rate, but when you’re writing zeros to a system that assumes a zero state at the start and simply drops the write when it matches, it’s not a fair test now, is it?
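If you want to sanity-check that yourself, a fairer smoke test writes incompressible, non-zero data so the array can’t short-circuit matching-zero writes.  Here’s a minimal sketch of the idea; the device path and sizes are my own placeholder assumptions, and this is obviously not IBM’s benchmark tool:

```python
import os

# Minimal write-test sketch: fill buffers with random bytes instead of zeros,
# so an array that detects and drops all-zero writes can't skip the work.
# The device path and sizes below are assumptions for illustration only.
DEVICE = "/dev/sdx"      # hypothetical LUN presented by the array
BLOCK_SIZE = 1 << 20     # 1 MB per write
BLOCK_COUNT = 1024       # ~1 GB total

with open(DEVICE, "r+b", buffering=0) as dev:
    for _ in range(BLOCK_COUNT):
        dev.write(os.urandom(BLOCK_SIZE))   # incompressible, non-zero data
    os.fsync(dev.fileno())                  # force it out to the array
```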

Power/Heat:

Read these numbers:

Operating environment

Temperature:  10 to 35 degrees C
Relative humidity:  25 to 80  percent
Max wet bulb:  23 C
Thermal dissipation:  26K BTU/hour (Holy surfaceofthesun!)
Maximum power consumption:  8.4 kW
Sound Power, LwAu = 8.4 bels

8.4 kW running around the clock at US$0.15/kWh works out to over $11,000 per year just in power for a single frame.  That’s not including cooling, which, given that I almost got heat stroke after spending 5 minutes standing behind the thing, has got to add up to a pretty penny as well.  Barry Burke over at The Storage Anarchist put the total operating cost at between $20,000 and $22,000 per year.
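For anyone who wants to check my math, here’s the back-of-the-envelope calculation, assuming the frame actually draws its rated 8.4 kW continuously:

```python
# Annual power cost, assuming constant draw at the rated maximum.
power_kw = 8.4
rate_per_kwh = 0.15                            # US$ per kWh
hours_per_year = 24 * 365
annual_kwh = power_kw * hours_per_year         # ~73,584 kWh
annual_cost = annual_kwh * rate_per_kwh        # ~$11,038
print(f"{annual_kwh:,.0f} kWh/year -> ${annual_cost:,.0f}/year, before cooling")
```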

You can have any protection level you want, as long as it’s RAID-1 (and any remote mirroring you want, as long as it’s synchronous)

With a tip of the hat to Henry Ford.  Moshe *NEVER* liked anything but RAID-1.  He grudgingly added RAID-S to the Symm (and did it so half-heartedly that it *NEVER* worked right) because whiny customers demanded it.  For some reason he doesn’t like RAID-5, despite the fact that it has a place, especially in sequential read-intensive applications.  So customers start out being forced to buy twice the storage they actually need.

It works on a distributed-node system

Like the Atmos or Centera (more like the Centera in its methods, actually).  The XIV stores data in “BLOBs” (Binary Large Objects), 1 MB in size from what I’m led to understand, spread across the ENTIRE array.  So in theory, if you write a 200 MB file, a piece of that file is on every disk, and mirrored to at least one more disk.

The nodes presumably run a customized Linux OS (that’s as much as I could get out of the CE before he realized I was an EMC-o-phile and quit talking to me).  The downside, of course, is that if a node fails, you lose 12 disks in the system.
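To make the layout concrete, here’s a rough sketch of the kind of placement described above.  This is my reading of it, not IBM’s actual algorithm, and it assumes 15 nodes of 12 disks, 1 MB blobs, and mirrors always forced onto a different node:

```python
import random

# Toy model of distributed blob placement: 1 MB blobs scattered pseudo-randomly
# across 180 disks (15 nodes x 12 disks), each blob mirrored to a different node.
# Illustration of the concept only, not IBM's real algorithm.
NODES, DISKS_PER_NODE = 15, 12
BLOB_SIZE = 1 << 20  # 1 MB

def place_blobs(file_size_bytes, seed=0):
    rng = random.Random(seed)
    blob_count = -(-file_size_bytes // BLOB_SIZE)   # ceiling division
    layout = []
    for blob in range(blob_count):
        primary_node = rng.randrange(NODES)
        primary = (primary_node, rng.randrange(DISKS_PER_NODE))
        # Mirror must land in a *different* node than the primary copy.
        mirror_node = rng.choice([n for n in range(NODES) if n != primary_node])
        mirror = (mirror_node, rng.randrange(DISKS_PER_NODE))
        layout.append((blob, primary, mirror))
    return layout

if __name__ == "__main__":
    placements = place_blobs(200 * (1 << 20))       # a 200 MB file
    disks_touched = {d for _, p, m in placements for d in (p, m)}
    print(len(placements), "blobs landed on", len(disks_touched), "distinct disks")
```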

A dual-disk failure on a full system would almost certainly bring disaster

Yes, I said it.  This is the first system I’ve seen that, when full, would be incapable of surviving the loss of two drives without data loss.  (The only way it would survive is if both drives were in the same node, since presumably the algorithm that governs the writes is smart enough not to mirror blobs within the same node.)

If a disk fails, the array immediately starts rebuilding its data onto the other disks (presuming the free space exists; it has to reserve enough to know it can re-balance after a failure).  IBM says the time to rebuild is about 30 minutes (maybe true, but I haven’t seen it, so I won’t say for certain).  Now, if a second disk fails during that rebuild, because of the distributed nature of the writes it’s almost a 100% certainty that, if the disk is in another node, it will hold elements of the failed disk.  When that happens, even IBM’s own Redpaper says a restore from backup is necessary.  (I don’t know whether, like the Symm, it can read from the remote mirror to rebuild; if so, that may be how they get around it, but then in order to have REAL protection you have to buy FOUR TIMES the disk space you actually need.)
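Here’s the back-of-the-envelope math on why that second failure almost certainly overlaps the first.  The numbers are my assumptions, not IBM’s published figures: a full 1 TB disk carved into 1 MB blobs whose mirrors are spread uniformly across the 168 disks in the other 14 nodes.

```python
# Probability that a second failed disk (in a different node) holds at least
# one mirror of a blob from the first failed disk.  Assumptions: 1 TB disk full
# of 1 MB blobs, mirrors spread uniformly over the 168 disks in other nodes.
blobs_on_failed_disk = (1 * 10**12) // (1 << 20)             # ~953,674 blobs
other_disks = 14 * 12                                        # 168 candidate disks
p_one_blob_misses = 1 - 1 / other_disks                      # one mirror avoids a given disk
p_no_overlap = p_one_blob_misses ** blobs_on_failed_disk     # every mirror avoids it
print(f"P(no shared data): {p_no_overlap:.3e}")              # effectively zero
print(f"P(overlap, i.e. data loss): {1 - p_no_overlap:.6f}") # effectively 1.0
```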

And last, but not least:

The IBM XIV back-end consists of…..

(drumroll please)

…(gigabit ethernet)

Yeah…I said it again.  Gig-E.  And not DCE or anything fancy (and lossless), but plain old standard copper gigabit Ethernet.

First off, if you’re going to use Gig-E, fine, but use optical.  In a box that is rife with magnetic fields, even the best-shielded Cat-6 cable is easily penetrated.

The faults in this are too many to fathom.  If they had used DCE with Class 3 service (guaranteed delivery) they might have had a chance of making this anything but Tier-3 storage.

But the way it works is this: the Fibre Channel connections go to four of the 15 nodes.  Those nodes are in turn connected to the rest of the array via a dual-Ethernet setup.  (Probably round-robin; I don’t think the switches they used are layer-3 capable and as such support EtherChannel or LACP, but please correct me if I’m wrong.)

So *ALL* of your IO is now being processed by four systems, which then have to write the data out to the other eleven.

That means if you have a dual-connected 4 Gig host attach, you actually get only 4 Gig TOTAL instead of the 8 Gig the front-end connection suggests.  And since the back end is completely distributed, every host you add to the XIV takes a percentage of that bandwidth.

So let’s see: if you connect 30 hosts, each gets 1/30 of 8 Gig of bandwidth (four dual-attached FC nodes), or about 273 megabits/sec if they all happen to hit it at the same time.  (Now, we all know that’s not likely, and that in normal operation *MOST* IO will not get queued up behind other IO.)
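The arithmetic behind that per-host number, assuming the four interface nodes’ 8 Gig of front-end FC is the effective ceiling and every host is active at once:

```python
# Per-host share of the front-end bandwidth when all hosts hit simultaneously.
front_end_gbit = 8                              # four dual-attached FC interface nodes
hosts = 30
share_mbit = front_end_gbit * 1024 / hosts      # using 1 Gbit = 1024 Mbit, as above
print(f"~{share_mbit:.0f} Mbit/s per host")     # ~273 Mbit/s
```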

Then it depends on the switch; if they used one with a 12 Gig, 100% non-blocking backplane, they might pull it off, seeing as the most they’ll ever have running at any given time is 4 Gig.

But when you put that up against a CLARiiON CX4-960 (which this customer also has) and look at its 16 x 4 Gig dedicated Fibre Channel buses, you wonder what the hell they were thinking.

What does IBM say to do when an XIV system starts getting slow?

“Stop adding hosts/storage to it”

Really?  Are you really saying that if I’m at 70% capacity and I start seeing performance degradation and wait-for-disk in topas, I should just write off the remaining 24 terabytes of usable space?

Wow – that’s quite a marketing gimmick.  I bet you’d like me to come and buy another XIV when that happens too.

——————-

So yes, they bought one; and yes, they’re trying to put transactional databases on it; and yes, it’s going to fall flat on its face.

Not sure I want to be around when THAT happens, because I’m *SURE* they’ll try and blame me somehow.

Stay tuned – next week we’ll be evaluating the difference between VMware and Red Hat Enterprise(?) Virtualization…. (someone shoot me now…please)
