IBM XiV – Real-Life impressions…

The Ethernet back-end on an XiV will still be its undoing

That's a lotta ethernet...

First impression of the XiV in “action”

The GUI is fancy.  Looks like a Mac turned on its side. The GUI is also NOT web-based; it’s an app install. I do believe, however, that it’s available for multiple platforms.

It really does seem to take all of the guesswork out of provisioning, since you don’t really have any say over what goes where in your array.

Our first use?  Backing up 6+ TB that was stored on Clariion and moving it to XiV…

Now first off, I’m glad it was decided to do it this way.  While a straight copy from one array to the other was possible, utilizing both arrays at the same time, it wouldn’t have provided any comparison as to performance.

The backup was done using Veritas NetBackup, over the network.  The data consisted of a pair of hosts running an extensive XML-type database used for indexing and categorization of unstructured content.  The backup and restore were both done to the same host, over the same network, and the storage was addressed over the same switches, just zoned to different arrays.  The only significant difference was that while the backup was done multiplexed, the restore had to be done single-threaded (because NBU multiplexed both backups to the same tape).

I have to get the final start/stop times out of NBU, but from the hallway conversation I had with the NBU guy, the backup took 6-8 hours (for both hosts) and the restore took 21+ hours…
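
Just to put those hallway numbers in perspective, here’s the back-of-envelope math, assuming the full 6+ TB moved in each direction (the real NBU job detail will have the exact byte counts and times):

    # Rough average throughput implied by the hallway numbers above.
    # Assumes ~6 TiB moved each way, for both hosts combined.
    TOTAL_TIB = 6
    for label, hours in (("backup (multiplexed)", 7), ("restore (single-threaded)", 21)):
        mib = TOTAL_TIB * 1024 * 1024                   # TiB -> MiB
        print(f"{label}: ~{mib / (hours * 3600):.0f} MiB/s average")

Call it very roughly 250 MB/s on the way to tape versus roughly 80 MB/s coming back, before you even split out the two hosts.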

The most interesting part of it was that the first restore took almost the same amount of time as the backup, which is kind of what we would expect.  The second host took dramatically longer to restore than to back up.

This would indicate to me that, as expected, the XiV didn’t handle the long, sequential write very well.  Since the host only connects to two of the six data nodes, virtually 100% of writes have to be destaged over the Gig-E backend.  My guess is we nailed the cache to the wall with the first restore, and then kept it pegged with the second one.

I like sequential write-tests on this scale because they show without a doubt whether the cache is masking a back-end issue or not.  If it is, this is exactly what you’ll see: an initial burst of writes followed by a sharp drop as cache is saturated.  This is even more pronounced on a busy array (rather than an idle one), because a certain percentage of cache will already be occupied by host reads/writes.
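
For what it’s worth, you don’t need NetBackup to reproduce that pattern.  Something along the lines of the sketch below will show it; the path, block size, and run time are placeholders I made up, and for a real test you’d reach for something like fio with direct IO so the host page cache doesn’t get in the way:

    #!/usr/bin/env python3
    # Minimal sequential-write sketch: stream 1 MiB writes at the array and
    # log throughput every few seconds. A cache-masked back-end shows a big
    # initial number, then a sharp, sustained drop once write cache saturates.
    import os, time

    PATH = "/mnt/xiv_lun/seqwrite.dat"   # hypothetical mount on the array under test
    BLOCK = 1 << 20                      # 1 MiB per write
    INTERVAL = 10                        # seconds between throughput samples
    RUNTIME = 1800                       # 30-minute run

    buf = b"\0" * BLOCK
    start = last = time.time()
    written = 0

    with open(PATH, "wb", buffering=0) as f:
        while time.time() - start < RUNTIME:
            n = f.write(buf)             # raw write; returns bytes actually written
            written += n
            now = time.time()
            if now - last >= INTERVAL:
                os.fsync(f.fileno())     # time the array, not the host cache
                print(f"{now - start:6.0f}s  {written / (now - last) / (1 << 20):8.1f} MiB/s")
                written, last = 0, now

If the throughput column starts high, falls off a cliff, and stays there, the cache is papering over the back-end, which is exactly what the restore behavior above suggests.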

This doesn’t bode well for an application that requires occasional complete reloads of the XML database…

I can’t wait to see it in action.

8 comments

  1. Hi Jesse.
    What firmware is your XIV on?
    The 10.2.4 code contains some improvements in write IO response times that may have helped with throughput.

    1. That I couldn’t tell you. But I will put that forward as a question first thing in the morning. I assumed it was latest/greatest as we have IBM on site working on the implementation, but I could be wrong.

      How disruptive is the firmware upgrade? Does it do a staged/staggered load to prevent data unavailability?

        • Al on March 28, 2011 at 8:14 pm

        Code upgrades are non-disruptive…

  2. What you have learned is that grid-based systems are not ideal for single-threaded workloads. No surprises there. Now load it up with a whole wild mix of all your apps and users and see how it goes : )

    1. I *AM* curious….the other thing I’m worried about with this particular app is this.

      When they do the (weekly?) reload of the XML database is it going to bring everything ELSE on the system to a screeching halt?

      My first assumption about the poor performance was that the host sees *ALL* access nodes. Finding out that you only zone a host to two of the six nodes makes matters MUCH MUCH worse, because it means that unless the read/write is happening on the access node you happen to be connected to, the IO has to traverse the back-end.

      Whoever thought using Gigabit Ethernet as a storage back-end was a good idea (I’m talking to you, Moshe) needs to have his head examined. Ethernet is way too ‘chatty’ to be reliable as a storage medium. (It’s why I would never use iSCSI in production either.)

    2. Oh, and don’t get me started on the psychotically complicated host-connect procedure.

      Fibre Channel is supposed to be plug & play (once masking is completed, of course). At WORST you have to install/configure a multipath driver, but it is supposed to just work.

      The “host connect procedure” I got from IBM made my eyes water.

    • Al on March 28, 2011 at 8:20 pm

    One thing to be careful of is that when the queue depth of an interface module is exhausted, the Fibre Channel interfaces can go offline. I’m hoping IBM fixed this in newer code, but we don’t put enough of a workload onto it now to have the same problem.

    • Nigel Poulton on April 1, 2011 at 3:31 pm

    Hi Jesse,

    Make sure you keep everyone up to date on your experiences as you use it more and more.

    Nigel
