Microsoft loses data, but no backups?

Microsoft, T-Mobile Apologize For Data Loss, Offer Month Credit

File this under whups.  Microsoft loses data.  That’s not a big surprise.

But I’m in a situation where Microsoft is recommending that a customer use almost exactly the same technology to protect their new Exchange environment, and there is a HUGE part of me that wants to stand up and scream that this is *NOT* a good idea.

Never mind that in the past I’ve tried a number of times to explain to them some of the shortcomings of their design.

1. That using DAS in an enterprise environment, when there is a multi-million-dollar replicated SAN already at their disposal, is foolish.

2. That replicating over a gigabit IP network that is already 50% saturated, when an 8Gb DWDM Fibre Channel connection is available, risks leaving their production and DR environments out of sync (see the back-of-the-envelope sketch after this list).

3. That setting all of this up on Hyper-V, when VMware offers load balancing, HA, and an amazingly scalable environment, is a bit short-sighted.
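
For what it’s worth, here’s a quick back-of-the-envelope sketch (in Python) of point 2. The link speeds come from the design above; the protocol efficiency, utilization, and daily change rate are purely my own illustrative assumptions.

    # Back-of-the-envelope replication throughput comparison.
    # Everything except the raw link speeds is an illustrative
    # assumption, not a measurement from the real environment.

    GIGABIT = 1_000_000_000  # bits per second

    def usable_mb_per_sec(link_bps, free_fraction, efficiency=0.9):
        """Usable payload throughput in MB/s over a given link."""
        return link_bps * free_fraction * efficiency / 8 / 1_000_000

    # Gigabit IP network that is already ~50% saturated.
    gige = usable_mb_per_sec(1 * GIGABIT, free_fraction=0.5)

    # 8Gb DWDM Fibre Channel link, assumed mostly idle.
    fc = usable_mb_per_sec(8 * GIGABIT, free_fraction=1.0)

    DAILY_CHANGE_GB = 200  # assumed daily delta that must replicate
    for name, rate in (("GigE, 50% busy", gige), ("8Gb FC", fc)):
        hours = DAILY_CHANGE_GB * 1000 / rate / 3600
        print(f"{name}: ~{rate:.0f} MB/s, ~{hours:.2f} h for {DAILY_CHANGE_GB} GB")

Even granting the GigE link its remaining headroom, the FC path moves the same delta more than an order of magnitude faster, and the GigE estimate falls apart the moment the other 50% of traffic spikes.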

It’s obvious to me that the genius who designed this cluster____ pulled the design directly from a Microsoft white paper.

But look at the Microsoft/T-Mobile debacle and ask yourself…  Is the Microsoft way always the right way?

My answer would be quite solidly…no.

4 comments

    • SANDiety on October 13, 2009 at 1:31 pm

    A comment on M$ and DAS…

    It seems M$ support really does not ‘get’ shared disk arrays at all (my opinion)… recently we had some heavy latency issues (>200ms across the board!) due to what ended up being a driver problem… anyway, check out this M$ response to our issue for a laugh:

    —begin snip
    “… The other cause is having all databases and all transaction logs going to a single LUN (each on their own S: and L: ). When we have all databases going to a single LUN, in your case 18 databases, we cause an extreme amount of I/O across those spindles happening or trying to happen simultaneously. My recommended action plan as follows and the items are listed in priority as to what I would take care of first if it were my environment, the corresponding data will follow the action plan.

    Plan of Action:

    1.) Work with your storage team to get 36 LUNS carved out (make sure these are not crossing spindles).
    2.) Move each database to its own LUN.
    3.) Move each transaction log set to its own LUN.
    Here is a link to the Mailbox Server Storage Design and a snippet from it below. Please understand that I know carving out this many LUNs may not be feasible in your environment however, it is best practice and thus it is what I need to recommend to you to resolve your issue. With that said there are a other approaches that I see corporate businesses take and if the above is not feasible we can discuss them if you like.” —end snip

    Anyway, the config is one LUN for the DBs and one for the logs. After getting the drivers updated: <5ms across the board, since the aggregate of spindles behind each LUN delivers way more IOPS than required. 36 LUNs. Not crossing spindles (come on now). For a single server. How old-school is that??? lol.
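
    To put rough numbers behind that, here’s a quick sketch; the spindle count and the per-disk and per-database IOPS figures are illustrative assumptions on my part, not the actual array layout:

        # Rough sanity check on the "one LUN per database" advice.
        # Spindle count and IOPS figures are illustrative assumptions,
        # not the actual array layout.

        SPINDLES_BEHIND_LUN = 60  # assumed disks striped behind the shared DB LUN
        IOPS_PER_SPINDLE = 180    # common planning figure for a 15k RPM disk
        DATABASES = 18            # from the support case above
        IOPS_PER_DATABASE = 300   # assumed peak load per database

        available = SPINDLES_BEHIND_LUN * IOPS_PER_SPINDLE  # 10,800 IOPS
        required = DATABASES * IOPS_PER_DATABASE            #  5,400 IOPS

        print(f"aggregate available: {available} IOPS")
        print(f"aggregate required:  {required} IOPS")
        print("plenty of headroom" if available > required else "undersized")

    The point being: whether those IOPS arrive via one LUN or 36 changes nothing about the spindles actually servicing them.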

    1. Um… you must be talking about 2007, because with 2003 at least, it’s 4 databases per storage group and 4 storage groups per server (so 16 databases max per server, not the 18 in that support case).

      I do understand that that’s out-of-date thinking (I haven’t had much chance to play with 2007 yet; my Dell 2650s won’t support 64-bit Wintel), but I’m not sure a single transaction-log volume is really the way to go.

      That, plus I love the “(make sure these are not crossing spindles)” comment. This is *DEFINITELY* an indicator of the thinking of someone who does not understand modern cached disk arrays.

      To clarify: when you’ve got 128GB of cache in a disk array, it *REALLY* doesn’t matter in any meaningful way whether or not you’re crossing spindles. All writes land directly in cache, and thanks to modern pre-fetch algorithms, most reads are served… well… from cache too (see the sketch below).

      DAS = no (or very little) cache.
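
      A quick illustration of why, with assumed (not measured) latencies and hit ratios:

          # Why spindle boundaries stop mattering behind a large array cache.
          # Latencies and hit ratios below are assumed, not measured.

          CACHE_MS = 0.5  # write ack or read hit served straight from cache
          DISK_MS = 8.0   # read miss that actually has to touch a spindle

          def effective_read_ms(hit_ratio):
              """Average read latency for a given cache hit ratio."""
              return hit_ratio * CACHE_MS + (1 - hit_ratio) * DISK_MS

          # Writes always ack from cache, so they see ~CACHE_MS no matter
          # which spindles the LUN happens to cross.
          for hit in (0.0, 0.5, 0.9, 0.99):
              print(f"hit ratio {hit:5.0%}: ~{effective_read_ms(hit):.2f} ms")

      At the hit ratios a 128GB cache typically delivers, the spindle layout underneath is noise.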

    • william bishop on October 13, 2009 at 7:24 pm

    No way on earth would I trust a Hyper-V installation with anywhere NEAR that level of critical service. I hate VMware’s support, but their product is 10,000X better than Hyper-V. Then again, Microsoft’s support pretty much sucks too… but I don’t even bother trying to call them.

    1. Both Microsoft’s and VMware’s support leave something to be desired.

      However, with VMware you’re about 90% less likely to need support in the first place, which mitigates that.

      VMware is also a *MUCH* more mature platform: more features, more functionality, and just plain easier to use.

      Again: the only person who would choose Hyper-V over VMware is someone who has drunk the Microsoft Kool-Aid.

      T-Mobile did that, didn’t they? Now look what happened.

      Actually, I found out later that this was a HITACHI problem… (not sure yet whether it was hardware or their contractors; still trying to find that out)

      http://www.channelregister.co.uk/2009/10/12/sidekick_hitachi/

      The article points to a failure during a “Storage Area Network Remediation done by Hitachi Data Systems”.

      A few other articles I’ve seen are pulling the ‘blame the contractor’ nonsense that they all do. We’re convenient scapegoats, especially when consulting on badly flawed technology.
