VMware disk problems…

Ok, I’ve been playing more and more with VMware lately.  All of it personal, because the work opportunities just haven’t really presented themselves.

In relocating one of my “production” servers to the Fibre array I purchased recently, I ran into a problem.  I realized that I was doing it wrong and tried to cancel out of a disk move.

Every VMware VMFS disk is made up of two parts.  The actual virtual disk data is contained in a file ending in “-flat.vmdk”, and there is a small header (descriptor) file with the same name, minus the “-flat”.

In my particular mistake, the -flat file somehow got moved but the header file didn’t.  So when I went to re-mount the disk under the VM, it was just gone.
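A quick way to spot this situation is to look for any “-flat.vmdk” file that has no matching header next to it. The following is only a minimal sketch, assuming both files normally live in the same directory; the function name is my own invention for illustration and not part of any VMware tool.

```python
from pathlib import Path

FLAT_SUFFIX = "-flat.vmdk"


def find_orphaned_flat_files(directory):
    """Return the names of -flat.vmdk files whose header file is missing.

    Hypothetical helper: it only compares file names in one directory.
    """
    orphans = []
    for flat in sorted(Path(directory).glob("*" + FLAT_SUFFIX)):
        # finance-flat.vmdk -> finance.vmdk
        header = flat.with_name(flat.name[: -len(FLAT_SUFFIX)] + ".vmdk")
        if not header.exists():
            orphans.append(flat.name)
    return orphans
```

Pointed at the directory holding the VM's files, it would have listed “finance-flat.vmdk” in my case, since the header had gone missing.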

To give you an idea of the level of panic that was going on, the name of the disk that was lost was “finance.vmdk”.  Yes, this is the root disk of the server that runs my accounting package for work.  Not a happy time for me.

I played with it, I scoured the VMFS volumes to make sure it hadn’t been redirected to the wrong LUN, I searched VMware’s knowledge base (a useless endeavor), and I was getting ready to rebuild the server when I had an idea.

I renamed the remaining flat file to “finance-temp-flat.vmdk”, then went into the console and created a new disk of exactly the same size under the original name.  That gave me a fresh header file plus a new, empty -flat file.  I then deleted the -flat file that was created, and renamed “finance-temp-flat.vmdk” back to “finance-flat.vmdk”.
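For anyone who would rather see the sequence spelled out, here is a rough sketch of the same rename dance. It assumes the VM is powered off and that the header is the only thing missing. The `create_disk` argument is a hypothetical callback standing in for whatever you use to create the new disk (I did that step by hand in the console), so nothing here is a real VMware API.

```python
from pathlib import Path


def rebuild_header(directory, disk_name, create_disk):
    """Regenerate a missing header file around an existing -flat file.

    create_disk(header_path, size_in_bytes) is a hypothetical callback that
    must create a new disk (header plus -flat file) of the given size.
    """
    directory = Path(directory)
    flat = directory / f"{disk_name}-flat.vmdk"
    temp = directory / f"{disk_name}-temp-flat.vmdk"
    header = directory / f"{disk_name}.vmdk"

    if header.exists():
        raise FileExistsError(f"{header} already exists, nothing to rebuild")

    size = flat.stat().st_size   # the new disk must be exactly this size
    flat.rename(temp)            # step 1: move the real data out of the way
    try:
        create_disk(header, size)    # step 2: new header plus empty -flat
        flat.unlink(missing_ok=True) # step 3: drop the empty -flat (Python 3.8+)
    finally:
        temp.replace(flat)       # step 4: put the real data back, even on error
    return header
```

The `finally` is there so the original data always ends up back under its real name, even if the disk creation step fails halfway. Take a copy of the -flat file first if you have the space.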

I restarted the virtual machine, and lo and behold, it booted without effort.

I then immediately shut it down and backed it up.

I then exhaled.




    • on February 15, 2008 at 5:43 pm

    You should consider NFS for your VMware environment. No cluster file system IO contentions when you run with several ESX hosts, dynamic storage resizing, and you could have restored your lost VMDK file directly from a Snapshot.

  1. Thanks – and Welcome. 🙂

    I’ve been considering that or iSCSI, but I have a crappy ethernet switch, and a good fibrechannel switch. 🙂

    I’ve got a generic AMD-based dual 2.8 GHz server sitting around, and the best part of it is that it has a terabyte (usable, RAID-5) of disks sitting in it. The problem is that something causes it to reset itself at random intervals.

    When I was at my previous employer (before they folded like a house of cards in a hurricane) we had a number of VMs running on a Celerra NFS mount. I didn’t have any problem with that because it was an enterprise-class environment, you know, real switches and a truly H/A NAS setup.

    My setup at home – not so much.

    Question for you: is it possible to use the VMware-based snapshot to free up the disk files for direct backup? If so, that might make it worth trying. 🙂 As I said, I don’t like the Fibre Channel setup, because the Dell PV224F I’m using for an array isn’t a RAID array (and if I lose data due to a disk failure I don’t think I’ll ever hear the end of it). I’ve considered looking up the upgrade for it (Dell’s website says that a 224F can be upgraded, but not how or for how much).

    • on February 15, 2008 at 8:59 pm

    We’ve all been there. My first time scared the crap out of me, but these things are fairly robust and it’s hard to screw one up so badly that you have to go to a backup… Yes, you can back up a running VM a number of ways; look for backup scripts on the forums. You can do some pretty nifty stuff with it.

    • on February 19, 2008 at 3:04 am

    The 224 was kind of ugly – I’d be snooping around for CLARiiON customers throwing out their CX400s or CX300s – or, hey, build one yerself 🙂
    With ESX, vmkfstools is your friend. It works better with vmdk files than cp or other more _traditional_ Unix tools, and you would have avoided the problem you had entirely …

  2. I’ve actually been looking at buying an AX150 for my production stuff, the one I can set up and forget about. Believe it or not you can get them fairly cheap, about the price of a top of the line laptop new with support.

    As far as ESX goes, I’m building up my comfort level. Once I’ve had a chance to play with things a bit, the underlying concepts become clear.

    • on May 1, 2008 at 8:22 am

    I assume you scrapped the AX150 purchase after cobbling together your home FC system?

    eBay listings for reasonably priced AX150 boxes have been few and far between. I was debating going with something simple like the SS4200-e but there are some severe limitations with it.

    ESX is a blast and once you start utilizing disk motion it becomes really fun. I wish I had the ability to bring even a CX3 box home to use as everything else just doesn’t get it done in the same way. VMware loves being on a SAN.

  3. Never got to it – AX100’s and AX150’s are indeed hard to come by.

    I ended up doing the NFS back-end with 4 Gigabit interfaces bound together. Works well so far. 🙂

    I actually tried to go with Fibre Channel, but the PowerVault 660F I had fell flat on its face the first time a drive failed.

    So I’m going to list all the SAN stuff on eBay and go with this for now. The great part is that, the way the drives are laid out, I can replace the 146G drives with 300G drives when I run out of space and just expand the volume group.

    What I really need is for the folks at VMware to figure out real binding, and not that routing-based-on-MAC or routing-based-on-IP garbage.

    • Angel on June 8, 2009 at 11:54 am

    You saved my job. Thanks!
