IBM XIV….

Oh Moshe – you disappoint me so….

When I was at EMC I used to look up at Moshe Yanai like he was a god.  The father of the Symmetrix, who used to fly in and out of work in his own helicopter on occasion, the uber-engineer that we all aspired to be.

Oh how the mighty have fallen.

There was an engineering adage that my uncle taught me when I was very young.

He said: “You can have it cheaper, you can have it faster, you can have it smaller – now pick any two of those.”

It’s always held true.  Cheap/Fast is usually huge, Small/Fast is usually expensive as hell, and Cheap/Small is usually slow as molasses in January.

I got a chance to get a real close look at the XIV for the first time last week, and I have to say it’s got to be the biggest pile of garbage I’ve ever seen in my life.  In the above adage, it definitely falls into the “Cheap/Small/Slow” category.

From a “Tiering” standpoint I put it somewhere on the low end, between the Clariion and Centerra – maybe like Atmos without the universal namespace and NAS connectivity.

The idea that someone I work with THINKS they can run a transactional database on it is absolute nuttiness, and would be fun to watch if it wasn’t also just *SO* painful.

Here are the stats I’ve found on it.

22,000-25,000 IOPS peak.

Depending on the cache-friendliness of the application.  It must be said that IBM’s “testing” shows a much higher rate, but when you’re writing zeros to a system that assumes a zero-state at the start and just drops the write when it matches, it’s not a fair test now, is it?

Power/Heat:

Read these numbers:

Operating environment

Temperature:  10 to 35 degrees C
Relative humidity:  25 to 80  percent
Max wet bulb:  23 C
Thermal dissipation:  26K BTU/hour (Holy surfaceofthesun!)
Maximum power consumption:  8.4 kW
Sound Power, LwAu = 8.4 bels

8.4 kW around the clock @ $0.15 US/kWh = over $11,000 per year just in power for a single frame. That’s not including cooling, which, given that I almost got heat-stroke spending 5 minutes standing behind the thing, has got to add up to a pretty penny as well.  Barry Burke over at The Storage Anarchist put the total operating cost at between $20,000 and $22,000 per year.
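
For anyone checking my math, the back-of-envelope (the $0.15/kWh rate is my assumption, and the frame is assumed to run flat-out 24×7):

    power_kw = 8.4                     # max draw per frame, from the spec sheet above
    rate_usd_per_kwh = 0.15            # assumed utility rate
    annual_kwh = power_kw * 24 * 365   # running around the clock
    print(f"${annual_kwh * rate_usd_per_kwh:,.0f} per year")   # -> $11,038 per year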

You can have any Protection level you want, as long as it’s Raid-1 (and any remote mirroring as long as it’s synchronous)

With a tip of the hat to Henry Ford.  Moshe *NEVER* liked anything but RAID-1.  He grudgingly added Raid-S to the Symm (and did it so half-heartedly that it *NEVER* worked right) because whiny customers demanded it.  For some reason he doesn’t like Raid-5, despite the fact that it has a place, especially in sequential read-intensive applications.  So customers start out being forced to buy twice the storage they actually need.

It works on a distributed-node system

Like the Atmos or Centerra.  (More like the Centerra in its methods, actually.)  The XIV stores data in “BLOBs” (binary large objects), 1 Meg in size from what I’m led to understand, spread across the ENTIRE array.  So in theory, if you write a 200 Meg file, a piece of that file is on every disk, and mirrored to at least one more disk.

The nodes presumably run a customized Linux OS (as much as I could get out of the CE before he realized I was an EMC-o-phile and quit talking to me).  So the downside, of course, is that if a node fails, you lose 12 disks in the system.

A dual-disk failure on a full system would almost certainly bring disaster

Yes, I said it.  This is the first system I’ve seen that, when full, would be incapable of losing two drives without data loss.  (The only way it would work would actually be if both drives were in the same node, since presumably the algorithm that governs the writes is smart enough not to mirror blobs within the same node.)

If a disk fails, the system immediately starts rebuilding the data to other disks (presuming the free space exists, but it must reserve enough to know it can re-balance after a failure).  Now IBM says the time to rebuild is about 30 minutes (maybe true, I haven’t seen it so I won’t say for certain).  If a second disk fails during that rebuild, because of the distributed nature of the writes, it’s almost a 100% certainty that if the disk is in another node, it will have elements of the failed disk on it.  When this happens, even IBM’s own Redpaper says a restore from backup is necessary.  (I don’t know if, like the Symm, it’s capable of reading from the remote mirror to rebuild.  If so, that may be how they get around it, but then in order to have REAL protection you have to buy FOUR TIMES the disk space you actually need.)
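
A quick back-of-envelope on why that overlap is a near-certainty (my numbers, assuming the 1 Meg blob size and a full 180-disk frame are right):

    # How much of a failed disk's data lands on any ONE other disk, if blobs
    # are spread evenly. Assumed figures, not IBM's.
    blobs = 1_000_000            # ~1 TB on the failed disk, in 1 Meg blobs
    eligible = 180 - 12          # mirror copies live outside the failed disk's node
    print(f"~{blobs / eligible:,.0f} blobs shared with any given second disk")
    # -> ~5,952 blobs (~6 Gig), so a second failure outside the node almost
    #    certainly takes unique data with it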

And last, but not least:

The IBM XIV back-end consists of…..

(drumroll please)

…(gigabit ethernet)

Yeah…I said it again.  Gig-E.  And not DCE or anything fancy (and lossless) but plain old standard copper gigabit ethernet.

First off, if you’re going to use Gig-E, fine, but use optical.  In a box that is rife with magnetic fields, even the most shielded Cat-6 cable is easily penetrated.

The faults in this are too many to fathom.  If they had used DCE with Class3 service (guaranteed delivery) they might have had a chance of making this anything but Tier3 storage.

But the way it works is this: the Fibre Channel connections go to four of the 15 nodes.  These nodes are in turn connected to the rest of the array via a dual-Ethernet setup.  (Probably round-robin; I don’t think the switches they used are layer-3 capable and as such support EtherChannel or LACP – please correct me if I’m wrong.)

So now *ALL* of your IO is being processed by 4 systems, which then have to write the data to the other 11.

That means, if you have a dual-connected 4 Gig host connection, it actually has only 4 GIG TOTAL instead of the 8 Gig front-end connection.  Since the back end is completely distributed, every host you add to the XIV takes a percentage of the bandwidth.

So let’s see: if you connect 30 hosts, you have 1/30 of 8 Gig of bandwidth (four dual-attached FC nodes), or 273 megabits/sec if they all happen to hit at the same time.  (Now we all know that’s not likely, and that given normal operation *MOST* IO will not get queued up behind other IO.)
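
The arithmetic, for the record (worst case, with every host hammering the array at once):

    frontend_mbit = 8 * 1024    # the 8 Gig front end, in megabits
    hosts = 30
    print(f"{frontend_mbit / hosts:.0f} Mbit/s per host")   # -> 273 Mbit/s per host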

Then it depends on the switch: if they used a switch with a 12G 100% non-blocking backplane, they might pull it off, seeing as the most they’ll have running at any given time is 4 Gig.

But when you pair that up against a Clariion CX4-960 (which this customer also has) and look at its 16x 4G dedicated Fibre Channel busses, you wonder what the hell they were thinking.

What does IBM say to do when an XIV system starts getting slow?

“Stop adding hosts/storage to it”

Really?  Are you really saying that if I’m at 70% capacity and I start seeing performance degradation and wait-for-disk in topas, I should just write off the remaining 24 terabytes of usable space?

Wow – that’s quite a marketing gimmick.  I bet you’d like me to come and buy another XIV when that happens too.

——————-

So yes, they bought one, and yes, they’re trying to put transactional databases on it, and yes, it’s going to fall flat on its face.

Not sure I want to be around when THAT happens, because I’m *SURE* they’ll try and blame me somehow.

Stay tuned – next week we’ll be evaluating the difference between VMWare and RedHat Enterprise(?) Virtualization…. (someone shoot me now…please)

40 comments


  1. I’m gobsmacked that something so bad would even be designed, let alone manufactured and bought by the likes of IBM. I just couldn’t believe it, and then stumbled across a reference to a case study of Bank Leumi using an XIV as tier 1 storage for Oracle.

    Of course, nobody asked you to evaluate the proposed storage solution before they purchased! If I were you, I’d decline the contract. Unless you enjoy the mud slinging that’s going to follow and as you say, you’ll probably cop an unfair share.

    1. It definitely shows a lack of judgment on their end, or a desperate desire to jump into a market that they really do not understand.

  2. The fibre channel connections go to 6 nodes out of the total 15, not 4 of the nodes (unless you’ve got something like a partial rack implementation, but then you’d have less than 15 nodes).

    The dual-disk failure risk is an issue, yes, but really it comes back to your original statement at the start of this post – you can have cheap, fast, and small, now pick 2.

    I’ve heard it as “cheap, fast, and reliable, now pick 2” – in this case, the XIV is cheap and fast, reliability is the element that’s potentially been compromised.

    For what it does, the XIV is cheap (especially when you consider all the software licences are included in the base price), and pretty fast, but as with all IT purchases, expectations are everything. If your colleagues implementing the transactional database are happy with 25K IOPS at the price they paid, then who are we to argue? Surely they’ll have asked EMC for a competitive bid price on a DMX/VMax which would have delivered equivalent performance and functionality?

    I’d be interested in an update in a few months on how the XIV is going, generally the customer references are very happy with them, but then they wouldn’t be reference sites if they weren’t..

    1. I stand corrected. 6 access nodes.

      Everything I’ve heard though points to “It’s Cheap” as the only reason for buying an XIV, the reliability is questionable, and the speed, where I’ve been able to find published numbers, is horrible once the array is beyond about 50% capacity. (Probably because THAT is close to the point where it is no longer writing to the access nodes, but instead is having to pass the data through the Gig-E backend to the data nodes.)

    • Dave on March 24, 2010 at 6:13 am

    Speaking as a SAN engineer at a big EMC shop myself..

    It’s not RAID-1 at all; you’re oversimplifying it. It’s not even a disk array, it’s a storage array. Think about the fundamental difference of an array that has no 1:1 relationship between disks. It does store 2 copies of all of your data blocks, which is why people who haven’t given it a lot of thought just assume it is RAID-1.

    It can rebuild in 30 minutes (or less) – I’ve seen it, and when you understand the point in the paragraph above you’ll understand why rebuilds can be so fast.

    Given that it can rebuild in 30 mins or less, which is more vulnerable to dual-disk data loss – the XIV, or a Symm VP pool on mirrored SATA? How long will the Symm rebuild window be? I’ve seen it take as long as a day with 1TB drives.

    Ewan covered a few of the other misconceptions/errors.. But what the XIV really comes down to is the first of the next-gen storage systems. It’s nowhere near perfect, but it is a return to simplicity. The Symmetrix-type array will end up imploding under the weight of its complexity sooner or later and they will have to start over from scratch. My bet is that it will conceptually look a lot like XIV does today.

    1. I do understand the quick rebuild, and it’s a nice feature; I just don’t think it’s worth the trade-off in performance. And the chance of a failure during that window is non-zero, so it must be accounted for.

      I traditionally use Raid-6 on SATA simply because of the long rebuild times. In fact, on our CX4 we universally use Raid-6 for SATA disks. (We also stripe vertically across busses and DAEs to avoid the potential of a bus or DAE failure, which given the built-in redundancy of the DAE is slim to none by itself.)

        • Dave on March 24, 2010 at 6:54 pm

        Granted, if the performance is as you claim, that’s not a good thing. But I do think this may be a prototype of future arrays. The VMax and such are trying to make things simpler (FAST, virtual LUNs, etc.) by adding more layers of complexity. The XIV was designed from scratch to be simple, and I have to give them credit for that, even if the first version isn’t perfect.

        Funny you make those comments about the CX4. The Clariion is clearly a mid-tier array at best. And I’d need more than one hand to count the number of DAE failures we’ve had on our Clariions in the last year or two. Which we find extremely interesting, since EMC claims the only difference between the modern CX DAEs and the DAEs in the DMX4 is firmware. And we’ve never had one of those fail (knock on wood).

        Oh, BTW, the XIV does do async replication. It’s a reasonably new addition, but it is there. I’m sure it does it a lot better than the DMX as well (not hard given how god-awful SRDF/A is).

          • Chris B on May 27, 2010 at 4:10 pm

          SRDF/A is god-awful? I’d like to see what your setup is, as I’ve been using it for years without issue.

            • Jesse on May 29, 2010 at 10:37 am (Author)

            SRDF/A is godawful to two groups of people.

            * Competitors who don’t have a similar product.

            * People who don’t know how to use it or how it works.

            For the rest of us it’s a lifesaver, especially when you consider that “Synchronous Replication Range” is usually shorter than “Minimum Safe Distance”.

  3. http://www.techmute.com/2010/03/xiv-recap.html – current post has details on disk configuration, and times when a dual disk failure faults the entire array. Also links to recent XIV posts by IBM and NetApp.

    1. Good recap. Especially the segregation of data (what’s stored on the access nodes and what isn’t)

      Is it safe to assume that this is done with the intention of servicing all writes off the access nodes directly, so it rarely has to hit the back end of the system? If so, aren’t you sacrificing the read benefit of mirrored data completely?

      It’s *STILL* not something I would recommend for anything remotely resembling Tier-1. In fact I put it only slightly above Centerra in terms of performance and usability when it comes down to it.

      1. My post on the 40:60 ratio of Interface to Data nodes is up here: http://bit.ly/drXBd9 Started it a few days ago, but the fact that you were thinking along the same lines confirmed that I might not be a kook.

        If you want the Wave sent to you, post here or DM me and I can either invite you or send you the PDF. If you’re on twitter, add me [@techmute] or let me know so I can at least follow you.

  4. I’ve had similar thoughts myself on interface vs active nodes… post will be coming tonight (I’ve sent it to a few people offline already).

    • SRJ on March 25, 2010 at 12:07 pm

    This is one of the least-informed posts on XIV that I’ve seen. As passionately against the XIV as you clearly are, it might help your credibility if you had at least the majority of your facts straight.

    1. Maybe – but instead of crying “liar” and turning and running, you could tell me exactly where I am wrong? Quote me numbers. (As I’ve been able to find precious few performance numbers published, please provide sources as well.)

      I am not an XIV hater, I just think it’s been horribly mispositioned in the storage arena. Right off the top of my head I can think of a number of good uses for them, not including paper-weights. A pair of XIV arrays would be a great place to store a large quantity of mission critical but non-performance sensitive data. Image archives, and the like.

      Would I ever use one without a DR site? No.
      Would I ever put a database of ANY kind on it? No.
      Do I think anyone who has done so is in for a nasty surprise?

      Absolutely.

      Feel free to prove me wrong. But I mean proof, because I’ve got one here I’m going to beat the crap out of and post ACTUAL numbers on.

        • SRJ on March 25, 2010 at 10:20 pm

        I’ve already posted details in numerous places, including the Google Wave that Techmute referenced above. It takes too much time to do it over and over again… I highly recommend you check it out. Feel free to ask the hard questions and I’ll do my best to provide the facts.

        A quick sampling of the incorrect statements in this post:

        1. It’s not RAID-1.
        2. It does sync and async replication.
        3. It *is* capable of losing 2 drives simultaneously in certain combinations.
        4. And no, they don’t need to be in the same node.
        5. Data loss is not “almost 100% certain” if second failure is simply “in another node” – it is a specific set of nodes…it depends.
        6. There are 6 interface nodes, not 4.
        7. Front-end host bandwidth is all wrong.
        8. Your performance characterization makes so many assumptions, it’s not even funny. Tell me what kind of performance that CX-960 can produce on a transactional db workload with a few hundred active snapshots…? Or is that even possible? =)
        9. Your claim that it can’t run any transactional db workload is awfully presumptuous. Tell that to all the customers doing it quite successfully. It might be fair to say that it can’t run the *same* size workload as some other systems, but your complete dismissal (even admittedly before testing) lacks any hint of credibility.

    • scottf on March 31, 2010 at 6:13 pm

    You might want to actually use one before you blast the XIV in arm-chair quarterback style. I’ve been working with EMC gear for years and the XIV can blow CX4’s out of the water with certain workloads. You don’t really understand how it works. And the interface makes the entire EMC lineup (and HDS, and most other IBM products) look like the stone age.

    1. Yeah, I saw the interface, very “Mac-like” (if you turn your head sideways you can see where the inspiration came from).

      As far as actually using one, I can’t wait to see it in action. As an EMC guy, I don’t usually get the opportunity to see other arrays in action, so this will definitely be a learning experience for me. That being said, I *SERIOUSLY* doubt I’m going to get any meaningful introduction to the hardware so I’m not holding my breath.

      If IBM wants to ship one my way I’d be happy to beat the daylights out of it and give my honest opinion. 😉 I’d have to clear the extra $1,000 a month in electricity with the boss (my wife) though. (I get enough grief over the $245 a month it costs me to spin my 13TB of CX500 disks. 🙂)

      As to the rest, you said it best: “with certain workloads.” Any array can be made to perform exactly the way the person running the test wants it to perform. But for an apples-to-apples test, you’d have to have a CX4-960 with 180 1TB SATA drives, pooled and thick-provisioned into Raid-1 volumes. That’s the closest you’d get. Even that wouldn’t allow for the granularity of the XiV’s storage, but it’s the closest. (The 960 would be to approximate the amount of cache and CPU the XIV has; I’m sure a CX4-240 configured this way could still out-perform the XiV, *ESPECIALLY* full. 😉)

      I don’t doubt that the ability to do a scatter-read off the access nodes provides for some great read performance, though I am curious as to whether reads are serviced off the data nodes, or only in the event that the access node is unavailable. (But no-one is posting performance numbers other than the basic tests; copying /dev/zero to a lun is hardly a valid test.)

      I’m also VERY curious as to the read/write performance when the array is over, say, 80% full. (That is about the point where the space on the access nodes is exhausted, and from that point *ALL* reads/writes go to the data nodes, right?)

      And yes, I’ve seen a CX4 perform HORRIBLY. As in “write-intensive applications on Raid-3 volumes” horribly. (As with any system that allows the users to control where the data goes, it allows the users to control where the data goes, which isn’t always a good thing.)

      The main difference between the CX4 and the XiV is that on the CX4, you can fix it if there’s a problem. Online lun migrations between differing raid-groups – from Raid-6 to Raid-5 to Raid-1 to Raid-1/0 – are easy, and transparent to the host. Is there a similar function on the XiV that allows for mitigation of performance problems should they arise? Everything I’ve read says there is zero user control of data placement.

      For the record, using Raid-6 was *NEVER* my idea. Its use was dictated entirely by the politics of the environment.

      IBM’s solution to performance degradation over the 80% mark is to “stop adding data to the system,” which, if I were a customer, would torque me out of shape to no end – knowing that over 12% of the (usable – 25% of the raw) array I paid good money for shouldn’t be used.

      I see great potential in the XiV, but again, as it sits, and as I understand it, it’s not something *I* would put a transactional database on given the choice. I can think of several applications in the building that would be MUCH better suited for XIV storage.

      Feel free to educate me – I actually welcome being wrong – it means I’m still learning and can still be taught.

    • william bishop on April 2, 2010 at 6:21 pm

    I have to say, it probably is faster than a CX…but then what isn’t? Kidding aside, I looked at one very closely when we were planning our new purchase, but compared to what we needed, it was WAY down the performance chain. We bought an HDS USP-V, and I don’t think I could be happier with an array. Beats the living hell out of my Symmetrix, and absolutely shreds our old IBM gear. Moshe was a genius, but the XIV, from the references and from the research, just wasn’t enough.

    1. I seriously doubt it, unless the CX4 in question is configured by a moron that is. (Or has a customer who has restricted them to the use of Raid-6 for no real reason)

      But I digress.

      Even the lowest end CX4 could run circles around the XIV, when appropriately configured.

      The difference is you get to actually optimize the CX, where the XIV is a true black-box.

    • william bishop on April 3, 2010 at 10:34 pm

    Jesse, you blind? I did say kidding aside, which means I was kidding. A CX might be faster (I never could get true IO numbers from anyone on the XIV)…don’t want you getting upset about it… 😉

  5. Ok…so Jesse’s uncle sez “You can have it cheaper, you can have it faster, you can have it smaller – now pick any two of those.”

    Apparently neither Jesse nor his uncle ever studied queueing theory or statistics.

    If they had, Jesse would know that two cheap disks that can each do about 120 IOPS are not merely *as* fast as a single disk that can do 240 – they are faster. Up to 40% faster, depending on the arrival process and utilization.

    So now let's take an XIV with 180 x 7,200 RPM HDDs, compare that to an EMC box with 90 x 15K RPM spindles, and demonstrate what you and your uncle missed by skipping statistics.

    1) At full load, XIV does more IOPS at the same response time, because Wq (queue wait time) dominates and XIV spindles offer twice as many queues. Or, XIV does the same IOPS at a faster response time. Yes, disk service time is slower, but under heavy loads typically 80% of I/O latency is Wq. Since Wq dominates at high utilization AND Wq is halved by doubling the number of disks, XIV wins.

    So much for having it faster.
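
    If you want to check the Wq claim yourself, here it is as a quick script – a standard M/M/c back-of-envelope, where the 200 IOPS offered load is purely illustrative, not a benchmark:

        from math import factorial

        def erlang_c(c, a):
            # Probability an arrival has to queue: c servers, offered load a
            top = a**c / factorial(c) / (1 - a / c)
            return top / (sum(a**k / factorial(k) for k in range(c)) + top)

        def wq_ms(iops, per_disk_iops, disks):
            # Mean queue wait Wq for an M/M/c queue, in milliseconds
            a = iops / per_disk_iops
            return 1000 * erlang_c(disks, a) / (disks * per_disk_iops - iops)

        print(wq_ms(200, 240, 1))   # one fast disk (240 IOPS):  ~20.8 ms queued
        print(wq_ms(200, 120, 2))   # two slow disks (120 each): ~18.9 ms queued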

    Obviously you already know that 3.5" SATA class disks deliver 2x – 3x the capacity of 15K enterprise class disks in the same number of cubic inches, right?

    So XIV is not only faster, but it is smaller (per capacity)

    Finally we get to cost. As even you (if not your uncle) must know, a 7,200 RPM SATA disk is about 1/3rd the cost of a 15K disk on a "per spindle" basis. It's about 1/6th the cost on a per-gigabyte basis.

    There's the "cheaper" part, and XIV is now three-for-three.

    But there is more. Disk drives consume exponentially more power as RPM increases — as a result a 7,200 RPM disk uses about 1/5th the energy per GByte and 1/2 the energy per IOP.

    So Jesse…you can call up your uncle and tell him that you can have it cheaper, smaller, faster — and use less power. Of course, in the case of XIV, they've gone and spent those power savings on massively parallel/concurrent caches — which is ironic, since it was Moshe's DRAM cache designs for Symmetrix that put EMC on the map.

    Here's the bottom line though.

    The numbers just out from IDC and Gartner show that EMC has lost more than 3 points of market share in mid-range and high-end block-level storage. IBM has picked up four points — three of them attributable solely to XIV.

    Looks like ol' Moshe and his crew are kicking your butts in the enterprise space, eh?

    Well…just think, you can always change the subject and talk about how EMC has taken the lead in the race-to-the-bottom. In 2009 EMC surpassed Dell to take the lead in the under-$5,000 "junk array" class!!

    1. So let’s *REALLY* do apples to apples.

      CX4-960, 180 1TB 7,200 RPM SATA disks. Put them down all 8 back-end busses and all in a single storage pool; you can even go Raid-1 (though having Raid-5 available is a nice option if you want to REALLY save some money, right?).

      And let’s go one step further, and actually fill the box with data. You know, past the point where the XIV is stuffing *ALL* writes over the crappy gigabit back end to the data nodes, bypassing the access nodes. The CX4 would be, in this situation, giving you 8 gigs of bandwidth for every 30 drives, right? How does that compare to 2G for the 12-drive data nodes, PLUS switch latency, PLUS data-node latency, PLUS access-node latency? (I didn’t count rotational speed and seek times since those will be roughly the same on both arrays at this point.)

      Hell – side by side, with full arrays, I’d put a single CX4-960 against the XIV any day of the week and twice on Sunday.

      What’s the switching latency of the POS low-end switches you used in the back end? They *look* like rebranded Cisco 3900s; those are departmental switches at best.

      What’s the switching latency of the access-nodes themselves? How fast can they take a write, break it up, and send it out over said crappy low-end switches?

      How well does the system respond to a midnight backup of all data? (Granted, people with brains tend to break their backups up – but then again, let’s ask that question again: when backing up a system that has *ALL* of its data on the data nodes, how long does it take for the back end to saturate?)

      Oh – and what happens when you lose a drive in an access node and a drive in a data node at the same time? Statistically the likelihood of this event is small, but not zero, and I believe that since the data is spread so far and wide on the back end, the odds of data loss are MUCH higher.

      Oh – and what if the IBM CE who installs the system doesn’t realize that the patch cables from the patch panel to the access nodes are 50-micron, and approves customer-installed 62.5-micron cables going to the front end? (IBM’s lucky I caught that one today – saved some embarrassment, except the customer was interested to note that the EMC rep was the one who had to point it out.)

      Finally, remember, the best always loses a little market share during a recession as short-sighted people start trying to cut corners hoping to save a buck or two. Once that blows up, they always come back.

      The smart ones realize that cutting corners *ALWAYS* ends up costing more money in the long run. The Long-Term ROI is the *ONLY* thing an intelligent person looks at.

    • william bishop on April 9, 2010 at 9:42 pm

    Aren’t they losing it to someone else besides IBM though? Market share, that is?

    1. Oh, probably. Some of the less discerning customers are probably trying their hands at “Do It Yourself” solutions like Open-E or OpenFiler NAS products. I actually saw a Xiotech SAN last week, which I thought was a riot, because the question was:

      “Ok, now we need to replicate it – wait, it doesn’t support replication? How much did you say a Clariion runs?”

      You can cheap out in the short-term, but when the chips are down, you always come running back, and your little diversion always ends up costing you more money in the long run.

      You know, like the extra $100,000 over five years the XiV costs just in power consumption… No-one figures that into their TCO computations.

  6. No…EMC pretty much lost it all to IBM, and the Symmetrix business for sure went to XIV.

    Below is a table with the biggest movers (share gainers and losers; external SAN-attached storage in IDC’s midrange and high-end segments).

    Both Vendors and individual products are listed.

    Among vendors, IBM was the biggest share-gainer and EMC was the biggest share loser.

    Among platforms, XIV was (far and away) the biggest share gainer, going from 0.8% share in 2008 to 5.4% market share in Q4-2009.

    Likewise, Symmetrix was the biggest share loser, dropping like a stone; minus 4.7 share pts.

    Given that Symmetrix lost 4.7% and Clariion lost 2.2%, and there were no other big movers, I’d say it’s pretty clear that XIV is having a huge impact on EMC.

    The frightening thing for EMC is that when you combine the massive share loss from Dell’s Clariion business and EMC’s own native share losses, it turns out that EMC and EMC-sourced products lost a combined 7.1% share in 2009. That’s almost unprecedented. Even Sun only lost 1.8 share points.

    Vendor/Product                 2008 Total   Q4-09   2009 Total   Gain/Loss
    IBM All Products                  15.9%     21.2%      20.0%       +4.1%
    NetApp All                         6.9%     10.0%       9.8%       +2.9%
    IBM XIV                            0.8%      5.4%       3.4%       +2.6%
    IBM DS5000                         0.4%      1.8%       2.0%       +1.6%
    HDS All                           10.6%     11.3%      12.2%       +1.6%
    EMC Data Domain                    0.0%      0.8%       0.4%       +0.4%
    HP All                             9.4%     10.0%       9.7%       +0.3%
    EMC Celerra NS                     1.5%      0.6%       0.8%       -0.7%
    Other Vendors All                  7.3%      5.2%       6.2%       -1.1%
    Sun STK All                        6.0%      4.8%       4.3%       -1.8%
    EMC Clariion + Dell Clariion      12.8%      8.6%      10.7%       -2.2%
    Dell (Clariion Only)               9.4%      5.0%       5.8%       -3.5%
    EMC All (w/o Dell Clariion)       30.1%     27.4%      26.5%       -3.6%
    EMC Symmetrix                     25.2%     22.4%      20.5%       -4.7%
    EMC All (w/Dell Clariion)         39.5%     32.4%      32.3%       -7.1%

    EMC did (as the previous poster mentions above) do well enough with the very low-end Data Domain and Iomega stuff to close the revenue gap. Clearly, though, EMC’s mid-range and high-end business is in trouble and suffering from XIV.

    1. No sources quoted? I’m curious to find out where you got your numbers.

      Again – you missed my point. An example:

      Since 1999 I’ve *ALWAYS* bought new cars… In 2009 I bought my first used car since I was 22. A 2002 Chevy Tahoe with 117,000 miles on it. Now – do we assume that because I bought this car over a more expensive one that the cheap car was somehow better? No. We know that during periods of economic hardship, people settle for crap because money is tight, and I’m no exception to the rule.

      Your numbers *ONLY* cover post-2008 data, and are therefore immediately suspect, because there is no measure of how many people made the decision on cost alone. I believe there will be a Tier-1 bounce once the economy starts moving up, and EMC has done well positioning itself for that bump with the VMax line, which allows users to buy only as much performance/bandwidth as they actually need instead of putting in a huge, monolithic beast when all they need is 20-30TB of Tier-1 storage. (And by huge, monolithic beast, I’m actually referring to the DMX-4.)

      If IBM maintains the trend through the recovery I’ll be suitably impressed, because I think once people realize the true limitations of the array they’ll come flocking back to EMC in droves.

      So no-one has answered the question yet: what happens when the array reaches 90% of capacity and all writes are going over the gigabit back end? Do we write off the last 20% of the storage due to the performance hit it’s going to take? What does that do to the total cost of ownership of the array? Is power/cooling figured into the TCO? I’ve stood behind an XiV, and after 5 minutes I feel like I’ve been licking the floor of the Mojave desert, because it’s sucked the moisture right out of me.

    • william bishop on April 12, 2010 at 4:08 pm

    Marketman…I’m going to point out the obvious and say that since it looks like HDS gained 1.6%, and I, like several other admins in the area, chose to buy Hitachi rather than continue with EMC (and I DEFINITELY chose not to continue down the DS route), and we only looked at XIV briefly before continuing our search…I might offer an alternative answer.

    • william bishop on April 12, 2010 at 4:11 pm

    Which is to say, your “obviously they are suffering from XIV” is at best not given from the current data, and at the least is an assumption. My assumption would be that since everything is branched out…the change is partially a lack of continuance, and the rest is split among 10 or 12 of the listed. I know plenty of IBM shops that, like me, dropped IBM and EMC, picking up Hitachi, NetApp, and others….

    • william bishop on April 12, 2010 at 4:12 pm

    I would ask for the number of installed XIV units; then we can easily address the numbers. It’s not that high, or at least it wasn’t a few months ago.

    • Tiger on April 24, 2010 at 9:48 am

    Anybody know the throughput of a full XIV?
    Thanks.

    1. The *ONLY* reference I found was 23,000-25,000 IOPS, but that was from an EMC site so there may be a bit of bias in there.

      But compared to the DMX4, even the high-end of that range isn’t even a measurable percentage.

  7. Jesse,
    It would be interesting to see how much your opinion changes after you have had a few months actually using the XIV.

    To address the double-drive failure non-issue FUD, see my post here:

    https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ddf-debunked-xiv-two-years-later?lang=en

    I suggest you compare energy figures in watts per usable TB. The XIV with 1TB drives is only 97 W/TB, and with 2TB drives only 41 W/TB. These numbers are in the XIV Introduction and Planning Guide if you need to review them in more detail. Compare that to solid-state drives (120 W/TB) and traditional disk systems with FC drives (380 to 435 W/TB) and you see that XIV is very GREEN storage.

    One of our clients lost their CRAC, causing the entire data center to go to 66 Celsius, and the only system to keep running was their XIV.

    — Tony

    1. Hey Tony – the blog’s been down for about two weeks – I kind of ran out of hours in the day and my kids miss me. In my attempt to find some free time, writing for this was the ‘low-hanging fruit’ I gave up to reclaim some of my life. However, because of one of the quirks of WordPress, the posts are still visible if you know what you’re looking for.

      I’m not arguing that it’s useless; I see it as a great fit for the archival market – mostly the same market that I see Atmos going into, though Atmos also has the global namespace and more options for data concurrency.

      I would love it, however, if you’d clarify the performance numbers for the XiV. From what I’ve read, the throughput for the entire box is put at 25,000-ish IOPS; real-world experience puts the DMX-4 as able to do that on one processor pair.

      Since the XiV has never, to my knowledge, been tested by a reputable third party, even my 25,000 number is suspect (since it did, point of fact, come from someone @ EMC).

      It’s my opinion that the only reason NOT to have Gartner or some third-party firm put a machine through its paces is if there is something to hide. (25k IOPS coming from a cached disk array would definitely qualify.)

      On that note, what does performance drop to when the box hits the ~75% capacity mark, when all IO goes THROUGH the access nodes straight to the data nodes? And yes, I understand that data is going from HBA –> Cache –> Cache –> Disk, but when the link between the access node and the data node is gigabit Ethernet, that is a pretty significant bottleneck.

      I understand – it’s your job to defend your product, and you will do so vigorously. I can’t fault you for that, your paycheck has the IBM logo on it. (And yes, indirectly, EMC pays mine, so we’re even)

      However, it’s not in my job description to defend EMC; I do it because I love their products. (As I’ve said before, I personally have a Clariion in my basement.) I’ve STILL yet to find a storage system superior to the DMX, in speed or resilience. (Yeah, it doesn’t have that Macintosh-like GUI to manage it with.)

      That and, for the most part, I don’t trust people, especially sales and marketing people. So as I’ve also said: tell IBM to put me in front of one and let me bang on it. If I learn differently I’ll be the first one to admit I am wrong, publicly. (You and I both know THAT isn’t going to happen.)

      If I see I am wrong, I will admit I’m wrong, but only then.

      Oh, and if ambient air temperature hits 66C and my system hasn’t shut itself down, there is something wrong. The idea is to shut down automatically to protect against damage and, more importantly, data loss – if it doesn’t shut down, it’s putting my data at risk, which is unforgivable.

      Also, my math puts a CX4-960 with 180 drives (the closest thing to a similarly configured XIV) at, worst-case scenario, about 64 watts per usable terabyte under heaviest load (all DAE power supplies maxed out, which never happens). And that’s just Raid-1; Raid-5 is 40.4 watts per usable terabyte. But you can’t do Raid-5, right?

      And THAT is just with the 1TB full-power drives; the 1 and 2TB low-power drives have 33% lower power consumption. (And sitting at idle the numbers are about half that.)
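
      Cross-checking the watts-per-usable-TB figures both of us have quoted (pure arithmetic on the numbers in this thread, nothing measured):

          xiv_max_w = 8400                    # spec-sheet max draw, from the post
          xiv_w_per_tb = 97                   # your figure for the 1TB model
          print(xiv_max_w / xiv_w_per_tb)     # -> ~86.6 usable TB implied

          cx4_usable_tb = 180 / 2             # 180 x 1TB drives mirrored (Raid-1)
          cx4_w_per_tb = 64                   # my worst-case estimate above
          print(cx4_usable_tb * cx4_w_per_tb) # -> 5760.0 W worst case for the CX4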

      Look forward to your response.

    2. Ok, maybe it’s not as down as it was supposed to be. Had a few people tell me I should leave it up, and who am I to argue.

      Tony –

      Read the article you referenced in your reply, and you do make some interesting points.

      With the “wide stripe” you MIGHT lose about 9.1G of data in the event of a dual-disk failure. (The MIGHT, of course, referring to how far you are through the rebuild when the second disk fails.)

      That’s not bad, depending on WHICH 9.1G you lose. If you’re running straight filesystems you might lose file-pointers and file-data, as you say, no big deal. You restore the affected files.

      However, if you are running LVM and lose parts of a logical volume, or the journal, it’s not as likely to recover as easily. It’s *MORE* likely you’d have to replace a whole filesystem. (As is the case when you lose one drive out of a striped volume group: *ALL* of the LVs are lost because there is no longer a consistency point with which to reference data.)

      And if you happen to lose the block containing part of the inode table for a filesystem, you’re even more dead in the water; the whole filesystem is irrecoverable.

      In any case, I notice you carefully use the words “has never happened in the field.” This leads me to believe you’ve been able to replicate it in the lab. It also leads me to believe that you are fully aware that the odds of such a failure are non-zero.

      And yes, with Raid-1, 3, 5, etc., a dual-disk failure is just as much a risk. It’s why for critical, non-performance data I will always go with Raid-6 on Clariion. Raid level on a Symmetrix with sync replication is irrelevant to protection levels: you’d have to lose the right two disks on two geographically dispersed frames at the same time to lose data. And while the odds of this happening are not zero, you are more likely to win the Powerball jackpot in five consecutive drawings.

      I agree, the quick recovery is nice, and does limit your exposure. I’d feel 100% better about it if lost blocks could be recalled from the remote mirror like the Symm does. (A Symmetrix with SRDF/S can lose an entire raid-group on the source side and will continue to service IO, albeit slowly, from the target Symm.)

      Again – I’m intrigued… Hopefully I’ll get to see it work and put it through its paces. More than likely, though, the IBM zealots won’t let me near it.

      ’Sokay though, I’m going to have my hands full with the new DMX4 and the 300+TB of new Clariion storage that’s on its way. 🙂

    • Adriaan on June 15, 2010 at 2:49 pm

    I keep seeing this 30 minute rebuild as a comparison against the hours it takes to rebuild a 1TB disk.

    Point is, when a 1TB disk has failed, that is still 1TB of data that now has only a single copy in the array.

    Break that into 1MB chunks, fair enough. But we now have 1 million chunks to restore, and 179 − 12 (the failed disk’s module) = 167 disks involved, so 167 disks busy with parallel rebuilds – that is why it takes less time. But it is also 167 times more likely that a disk will fail during that period – probability theory indicates possibly an even higher chance than a single disk failing in 167 x 30 min, i.e. 83.5 hours, or three and a half days!!!
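
    Put rough numbers on that and it’s still sobering (the 3% annual failure rate is an assumption; real drive stats vary):

        # Chance a second drive drops during the 30-minute rebuild window,
        # with 167 drives doing the parallel rebuild. AFR is assumed.
        afr = 0.03                          # assumed 3% annual failure rate per drive
        per_hour = afr / (24 * 365)
        print(1 - (1 - per_hour * 0.5) ** 167)   # ~2.9e-4, about 1 in 3,500 rebuilds

        # Versus one mirror partner exposed for a 24-hour conventional rebuild:
        print(1 - (1 - per_hour) ** 24)          # ~8.2e-5, about 1 in 12,000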

    And what really scares me is this magic list you get of which chunks were not recovered after a second disk failure – somebody in systems admin is going to run around, WHILE THEIR APPLICATIONS ARE LIVE, working out which files have holes in them?????

    Beam me up Scotty!

    • James Hansen on August 12, 2010 at 9:44 am

    On the market share numbers, no one mentioned that IBM has achieved the increase in market share with XIV by practically giving it away. I’ve seen it go for 90 points off list, bundled in with larger deals.

  8. LOL – yeah, well that would work, wouldn’t it. It’s all consumer-grade hardware anyway, so it’s not like it costs them much to build, and the development costs can’t amount to much using cheap off-shore labor…

    The one we got about…what…4 months ago? Still doesn’t have data on it. The one at the DR site still isn’t even connected to the SAN. (It is, however, plugged in, sucking up power and throwing off HEAT.)

    I haven’t been privy to why…but I suspect none of the internal customers want to risk their data on it.

    Bottom line. The *ONLY* way an inferior product can compete is based on price. And anything that is psychotically underpriced like that either has major hidden costs you’ll run into down the road, or it’s crap.

    Without exception.

    • sokay on August 26, 2010 at 11:21 am

    Interesting thread. I don’t know much about XIV (yet) but am interested in learning, as this technology (et al.) is positioned (not there yet) to disrupt the entire storage industry as we know it.

    Lots of talk about the dreaded double disk drive failure, as it is the perceived soft underbelly of the XIV system. But doesn’t RAID5 in a VP pool on a DMX pose just as much of an availability hazard? Doesn’t the same apply to an HDP pool with RAID5 on a USP-V? Yes, I would argue that FC or SATA in the same wide-striped pools with RAID5 protection has close to the same probability of data loss.

    Jesse, you said above, “I’d feel 100% better about it if lost blocks could be recalled from the remote mirror like the Symm does”. I’ll have to ask somebody from IBM again just to verify, but I’m pretty sure XIV does this now with both its sync and async solutions. In fact, there is zero intervention required to pull the 3rd or 4th copies of data from the remote target systems. So maybe losing data on a replicated XIV system is like winning Powerball 3 times consecutively 🙂

    Next thing to consider, which the guys from XIV shared with me: the double disk drive failure scenario only pertains to hard disk errors – it doesn’t apply to proactive disk sparing triggered by latent defects or disk scrubbing. Hard errors only account for 7% of all replaced disks in the system (XIV numbers); 93% are proactively spared. These numbers gel with stats I got from HDS when I was working with the USP-V and AMS. I’m guessing they would gel with EMC numbers too. Also, when a disk is proactively spared, a third copy of the data is created on the XIV frame before the spare process kicks off.

    Does this mean that a data loss event has zero probability? Certainly not! This will never be zero. I just don’t buy all the FUD talk. Every product on the market has value and merit, and it’s an exposed and scared manufacturer that puts out FUD on its competition rather than bragging on its own product. IMHO 🙂

    Anyway, I’m looking forward to seeing if anybody will post performance numbers seen in the field. I’ve heard numbers ranging from 20k IOPS to 100k IOPS, workload dependent – has anybody else heard anything on real-world performance?

    • sokay on August 26, 2010 at 11:29 am

    Oh yeah, I wanted to post this doc too, which a fella from NetApp posted. Great read on reliability – actually, I think it’s required reading for anybody working in the storage industry.

    http://media.netapp.com/documents/rp-0046.pdf
