
Aug 29

Blades vs. Virtual

Well – I started conceiving this as a definitive “Blades vs VMWare” post. I fully expected that by the end of the week I would have enough information about blade servers, and the IBM BladeCenter in particular, to be able to say one way or the other.

I’m here to report that I’ve failed to come to any conclusion whatsoever.

Blades seem to have some advantages, the most obvious being the ability to shut down one node for a hardware problem or upgrade rather than having to take multiple VM hosts offline. But then VM also holds several major advantages: flexibility, scalability, etc.

Here are the pros/cons I’ve found:

Blades

PRO

  • Individual servers can be removed and replaced for hardware issues.
  • Hardware clock (I’ve had issues with clock-drift on VMWare systems)
  • Actual service console.

CON

  • Power Consumption – a blade takes almost as much power to run as a standard server.
  • Cooling (The fans on the back of a single bladecenter blow HOT air at 90 miles an hour)
  • System utilization – a blade has the same limitations as a PC: if it’s not utilized effectively, 70-90% of the processor can sit idle.

VMWare

PRO

  • Much smaller footprint – on a powerful enough server, 7-10 VM’s can reside on a single 2U server.
  • Power Consumption – A fully utilized Dell 2950 can run 8-10 VMWare systems with minimal increase in power consumption.
  • Cooling – see above
  • Management – Because of the virtual console, a VMWare system can be managed remotely much more easily than a blade-based system can.

CON

  • Clock Drift – as I’ve said, I’ve had a recurring problem with clock drift that not even the VMWare tools can compensate for. I think right now the system I have running this website is 13 hours ahead. This will be corrected on the reboot, but it doesn’t seem that the VMWare clock syncs with the hardware clock in the Dell at all.
  • Management – obviously I’ve had some VMWare experience, and I manage my own stuff. However, for corporate users, someone will have to be trained and gain some experience before they can really be proficient at managing a VMWare environment. Small missteps can bring down entire environments if you don’t know what you’re looking for.

Trust me – I know. 🙂
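
For a sense of scale on the clock-drift point above, here is a minimal sketch of how long it would take to absorb an offset by gradually slewing the clock instead of jumping it. The 500 ppm maximum slew rate and the 128 ms step threshold are assumptions (the commonly cited ntpd defaults), not something measured on this system, and the function names are made up for illustration. Real NTP behaviour has more rules than this.

```python
# Rough sketch: how long would gradual (slewed) correction take for an offset?
# ASSUMPTIONS (not from the post): 500 ppm max slew rate and a 128 ms step
# threshold, the commonly cited ntpd defaults. Function names are hypothetical.

MAX_SLEW_PPM = 500.0       # assumed: clock adjusted by at most 500 us per second
STEP_THRESHOLD_S = 0.128   # assumed: larger offsets get stepped, not slewed


def slew_duration_seconds(offset_seconds, max_slew_ppm=MAX_SLEW_PPM):
    """Seconds needed to absorb an offset when slewing at a fixed maximum rate."""
    if max_slew_ppm <= 0:
        raise ValueError("max_slew_ppm must be positive")
    return abs(offset_seconds) * 1_000_000 / max_slew_ppm


def correction_mode(offset_seconds, step_threshold_s=STEP_THRESHOLD_S):
    """Simplified rule: slew small offsets, step (jump) large ones."""
    return "slew" if abs(offset_seconds) <= step_threshold_s else "step"


if __name__ == "__main__":
    offset = 13 * 3600  # the 13-hour offset mentioned in the post, in seconds
    days = slew_duration_seconds(offset) / 86400
    print(f"offset={offset}s mode={correction_mode(offset)} "
          f"slew-only correction would take about {days:.0f} days")
```

Under those assumptions, slewing away a 13-hour offset would take roughly three years, so an offset that large realistically has to be stepped (or fixed with a reboot), and the real fix is stopping the drift in the first place.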

 

21 comments


  1. I don’t feel like blades or vmware are exclusive strategies. I used to think so but after talking to folks out there, if you need a lot of horsepower (IO and memory) then using blades with vmware running on them provides for some amazingly dense power.

    I was at a Sun event talking to an engineer about their M6000’s and someone asked where the power and footprint of pizza boxes vs. blades came in and the engineer’s response was about 5 units. One thing the engineer didn’t realize is that the blades in an M6000 are as powerful as probably 2-4 2U 2x2proc servers. They are about $16k per blade and can take 4x2core, 64gb of memory and ~160GB/s sustained IO. Most 2U servers can’t touch that.

    By putting vmware on blades you can do all the fancy vmotion/vmha/drs but get the advantage of quicker and denser hardware. That’s why I say they’re complementary.

  2. Jesse

    You know, I’d never thought of putting VMWare on a blade – but that makes total sense. Running VMWare and VMotion on a blade could provide the basis for what would essentially be a single-rack datacenter. (Provided the top half of the rack is occupied by a Clariion CX3 series for storage)

    Maybe add a second rack for disk/tape backup and you’re done.

  3. You mention as a pro for blades, “Individual servers can be removed and replaced for hardware issues.” Doesn’t VMotion make this a pro in the VMWare category as well, so the comparison is a wash on that point?

    I’ve also seen clock drift issues running Linux on VMWare. I went through a whole lot of clock= kernel options and also used the VMWare tools and didn’t manage to make it work properly. I ended up accepting that there would be some drift and used ntpdate via cron to keep it close. Not ideal…

  4. Jesse

    I’ve not had a chance to play with VMotion, much as I’d like to. Even my relationship with EMC doesn’t allow for those “extra benefits”. VMWare licenses are closely guarded, much more closely than any of the Symmetrix/SYMCLI licenses.

    Yeah – that’s kind of where I’ve been going with it. I’ve just accepted the drift as well, and reboot it on Sunday nights just on GP. I’m going to give the ntpdate via cron a try and see if I can eliminate the reboot if possible. Don’t some applications have issues when the clock moves backwards though?

    In a ‘production’ environment where transaction times are important, this bug alone makes VMWare unusable. I’m not a programmer but I don’t see how hard it can be to more directly tie the VM “Hardware” clock to the physical hardware clock on the server.

  5. I imagine there are many apps that wouldn’t like time to go in reverse. Doesn’t ntp do some sort of skewing to slow the clock gradually until the real time catches up? I don’t know, I’m no expert on NTP. The Linux box I was running ran syslogd, smtpd, and some backup-type scripts via cron, so nothing too fancy. I believe the problem has something to do with how Linux tracks time, there are a few methods to try. I don’t recall Windows VMs in the same environment having similar issues.

    If you’ve got some time to kill, check out the VMWare forums and search for time drift on Linux hosts. There are several posts out there with many options (none of which worked for me sometime last year).

  6. Jesse

    Windows does it when VMWare tools isn’t installed, but the tools seem to run constantly keeping the clocks in sync.

    My primary linux VM is actually this site, which is why it shows that this conversation is happening at around 2am tomorrow morning. 😉 If I reboot the box within the skew time of the last post, it actually rolls the post back and puts it into a “Pending” queue and re-posts it when the time rolls around again. Quite annoying, but not fatal. 🙂

    I run a mailscanner and several windows servers (Exchange, Blackberry, etc) on this Virtual as well, but the mailscanner isn’t time-critical and obviously the windows hosts are in sync.

    I’m curious as to how well VMotion works and if anyone has any direct experience. I keep requesting a non-expiring eval copy for training purposes, but the folks at VMWare are religiously ignoring my emails.

    Until such a time as they provide me with a copy, my official opinion is that the product sucks and I wouldn’t touch it with a 10 meter cattle prod. 😉

  7. We run a HUGE shop on IBM bladecenters (HS21 XM’s) and we’re typically running about 60 desktops (that’s actually lower than what it will do, but we need N+1 for failover).

    Power-wise, I’ve got actual numbers that show vm servers running on bladecenter host blades result in about a 72% power reduction on servers, and nearly 90% on desktop vm’s.

    We’ve been running this scenario for about a year and wouldn’t go back for anything. I’ve literally had servers failover without even knowing that the blade had gone down, it’s that smooth.

    We’ve got 4 chassis now, moving to 8. Yes, if you’re running a single server on a blade, you’re wasting time, energy and power…but put 15 servers or more onto a blade and we’re talking some serious savings. I can get 800 sessions on a single chassis, so most small to medium shops could do their whole infrastructure off one chassis. Mix a decent SAN into the thing (dmx in our case) and you’ve got a winning combination.

  8. I’ll send you the fix for the clock issue as well, we had it happen on a couple of linux boxes, most it doesn’t have any problems with. Email me out of band, you know the addy.

  9. Wow, Jesse you actually floored me today… you’re trash talking VI3 & VMotion without actually having any experience with it.

    I’m with edsai, VI3 on blades is awesome. Combine it with IBM BladeCenter blades and the density levels you get are… well… ridiculous (think 1200 servers in 42U – assuming SAN is separate). You can mix and match between physical blades and ESX hosts within the same chassis, deploy centrally, and using something like PlateSpin even allows you to have your physical blades back up as either VMs or new physical boxes in no time.

    And for the record for 99% of servers in my experience VMotion & DRS works like a charm. The only thing I wish worked better in the VMware world is Citrix boxes… even if you combine with Softricity and such to eliminate silos, you still end up with far too many blades dedicated to running 4GB Wintel servers (too many of the apps lack 64-bit capabilities).

  10. Just to be clear, in reference to PlateSpin, I was suggesting using PowerConvert to rapidly migrate existing servers to blades or VMs (or even between the two!), making the moves between the two form-factors point-and-click.

  11. Jesse

    LOL – Like I said – if the folks at VMWare would like to come up with a copy of VMotion for me to play with…. I’ll be more than happy to give it an objective review. 🙂

    Actually I did download the eval – I just need to make sure I can uninstall it as easily as it installs, because I don’t have “development” equipment. I only have one VMWare server now; I’ll build the second one this weekend with a Dell 2650 I picked up on eBay for $550, and maybe I’ll be able to test it out. 🙂

  12. Jesse

    William – please, I would appreciate the fix – that would be great. 🙂 You can email it to me at jg@sangod.com.

    Jesse

  13. Here is the document from VMware on timekeeping in virtual machines:

    http://www.vmware.com/pdf/vmware_timekeeping.pdf

  14. Well, it looks like Andrew beat me to it. If it still doesn’t work, let me know, I’ve got some other tweaks, that while they aren’t as elegant, will do the trick.

    Keep in mind, that while some vm’s on dual cores lose time to some extent, quads are far worse….but the problem seems to be less frequent, just your weird percentage. Dunno why.

  15. Jesse

    I don’t have any dual-cores, but my machines do support hyperthreading.

    I’ve been reading up on VMotion – it looks like fun but as I no longer have a SAN in my basement, getting to try this out is going to take some finagling.

    Does VMotion support NFS Storage as well? I’ve got about a TB of Raid-5 storage that I’m going to install in one of my CentOS boxes as an NFS server, that might enable play-time.

  16. Yes, you can use nfs, but it is…..unpredictable. And since it’s nas, performance will not be great, but since you only want to try it…It should be fine.

  17. Jesse

    When I was running the VMWare infrastructure at EduCap, I put most of their “Production” infrastructure on the NFS mount on the Celerra. The main reason was of course that they were too cheap to buy a real VMWare license and I couldn’t put the storage in shared SAN space.

    And since the difference between the basic and extended VMWare license was too much, you *KNOW* they weren’t going to go for VMotion as well.

    So I put everything on the NFS volumes so that if/when I needed to bring hardware down, I could relocate servers with a fairly minimal outage.

    Now granted, I had the 4 Gigabit connections on the celerra trunked together directly into a pair of 6509’s, and the 2 gigabit connections on the back of the 1850’s that were used as VMWare servers were configured as redundant, which I think helped a lot with the stability.

  18. I wasn’t implying ALL nfs(or nas) was a problem, but you described a small raid 5 device in your basement.

  19. Jesse

    Yeah, I know. I would trust it more if I had more than a crappy Dell Gig-E network switch in the mix. I mean yes, it’s Gigabit, but it’s also no Cisco and I’m not sure it can be relied on.

  20. Does it need to be relied on? For testing play with what you have.

  21. Jesse

    A little of both – but I’ve got an upgrade coming to the server. I’m going to play with dual bonded NIC links for performance, and I’ve got a Dual 2.4GHz AMD motherboard coming to replace the server.

    I started to install a basic Windoze server to it for a friend to play on from outside, and the install went incredibly slowly…

    I suspect it’s more likely system performance than network, because I’ve run VM’s over a gig-network link before without any real performance impact.

    Then again, I had a Celerra serving it out and dual 6509 switches as the network back-end.
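
The density figures in comment 7 (about 60 desktops per blade, with N+1 held back for failover) can be turned into a quick back-of-the-envelope calculation. This is only a sketch: the 14-blade chassis size is an assumption for illustration rather than a number given in the thread, and the function name is hypothetical.

```python
# Back-of-the-envelope N+1 capacity for one blade chassis.
# ASSUMPTION: 14 blades per chassis is illustrative, not stated in the thread.
# 60 VMs per blade is the figure quoted in comment 7.

def usable_vm_capacity(blades, vms_per_blade, spare_blades=1):
    """VMs a chassis can carry while keeping `spare_blades` free for failover."""
    if blades <= 0 or vms_per_blade < 0:
        raise ValueError("blades must be positive and vms_per_blade non-negative")
    if not 0 <= spare_blades < blades:
        raise ValueError("spare_blades must be between 0 and blades - 1")
    return (blades - spare_blades) * vms_per_blade


if __name__ == "__main__":
    blades_per_chassis = 14  # assumed chassis size
    print("N+1 capacity:", usable_vm_capacity(blades_per_chassis, 60))
    print("No spare    :", usable_vm_capacity(blades_per_chassis, 60, spare_blades=0))
```

Keeping one blade's worth of headroom is what lets a failed blade's VMs restart elsewhere without overcommitting the survivors. The price is one blade's worth of capacity per chassis.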
