I Have A Love/Hate Relationship with VMware Capacity Planner

by Denny Cherry
Sep 25, 2012

One tool that will be very familiar to anyone who has gone through a physical-to-virtual migration using VMware is the VMware Capacity Planner.  This tool runs on a machine within your network and gathers a wealth of performance monitor data from all the (Windows) servers running within the network.  Once it is installed, you simply tell the software which machines you want it to monitor, and a couple of weeks later it spits out a nice report that includes all the various metrics for your network.

This tool is used by just about every SAN/server/VMware reseller and VAR out there to size environments for new virtualization projects.  My problem with it is that it doesn’t work worth a damn for smaller environments.  The reason I say this is that the VMware Capacity Planner works on averages.  If the averages it comes up with are for servers that run at a consistent load throughout the day, then the capacity planner report will be just fine.  However, if a server only works for a small number of hours per week, but works really hard when it does, then the numbers that the capacity planner reports will be next to useless.

As an example, last week I was working with a client that has a brand new EMC, UCS, and VMware environment which was sized from the VMware Capacity Planner.  They took one of their existing production SQL Servers and created a copy of it on the new platform, and a process that used to run in one hour now took nine hours.  The process was so slow because the storage hadn’t been sized correctly.  It hadn’t been sized correctly because the VMware Capacity Planner showed that the SQL Server needed 14 IO/second and 0.08 MB/sec of data transfer, so the storage for this server was designed with that workload in mind.  However, the actual workload is that for one hour a night the server runs up to about 600 IO/second, and for the rest of the day the server is totally idle.  So on average the numbers break down to about 14 IO/second, but the actual workload when the server is running is WAY higher than that.
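To make the arithmetic concrete, here is a rough sketch in Python of how a daily average can hide a nightly burst.  The per-minute samples are invented for illustration (the only figures from the client are the 14 IO/second average and the roughly 600 IO/second peak), and the split of the busy hour into two load levels is purely an assumption chosen so the numbers line up.

```python
from statistics import mean

# Hypothetical per-minute IOPS samples for one day (not the client's raw data).
# Assumption: the nightly job spends 30 minutes at 600 IO/second and
# 30 minutes at 72 IO/second, and the server is idle for the other 23 hours.
busy_hour = [600] * 30 + [72] * 30
idle_hours = [0] * (23 * 60)
samples = busy_hour + idle_hours

daily_average = mean(samples)        # what an averages-based report would show
peak = max(samples)                  # what the storage has to survive
busy_hour_average = mean(busy_hour)  # the load while the job is actually running

print(f"daily average:     {daily_average} IO/second")
print(f"busy hour average: {busy_hour_average} IO/second")
print(f"peak:              {peak} IO/second")
```

With these made-up samples the daily average comes out at 14 IO/second, while the storage has to sustain hundreds of IO/second for the one hour that matters.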

Now if this had been a large company which had purchased a fully loaded VMAX from EMC, or even a pretty powerful midrange VNX, there wouldn’t have been any problem, as the system would have been powerful enough to handle the extra IO without issue.  However, this company was sold a smaller VNX 5300 which didn’t have enough IO capacity to handle the unexpected workload.

Now I may not be fair in placing all the blame on the VMware Capacity Planner.  There is plenty of blame to be placed on the VAR for not validating the numbers from the report against the raw data.  If they had looked at the peaks in the raw data, they would have seen that the peaks weren’t anywhere close to the output from the capacity planner, and they could have done something about it.  Instead I’ve got a client who trusted their reseller and vendors to sell them a solution which could handle their workload, and who is now pissed off about the situation they are in, and they’ve got me telling them that the brand new platform which they just purchased needs to be upgraded to handle their workload.
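The kind of sanity check I have in mind could look something like the sketch below.  It assumes you can export the raw per-interval IO samples for each server into plain lists; the server names, the sample data, and the 3x threshold are all hypothetical, and this is not a feature of the Capacity Planner itself.

```python
def peak_to_average(samples):
    """Return peak divided by average for a list of IO/second samples."""
    if not samples:
        return 0.0
    average = sum(samples) / len(samples)
    if average == 0:
        return 0.0  # a completely idle server has nothing to flag
    return max(samples) / average


def servers_needing_review(raw_data, threshold=3.0):
    """List servers whose peak is more than `threshold` times their average.

    raw_data maps a server name to its raw IO/second samples.
    The threshold is an arbitrary starting point, not a vendor recommendation.
    """
    return sorted(
        name
        for name, samples in raw_data.items()
        if peak_to_average(samples) > threshold
    )


# Hypothetical raw data: one bursty batch server and one steady server.
raw_data = {
    "sql-batch": [600] * 30 + [72] * 30 + [0] * (23 * 60),
    "web-steady": [100] * (24 * 60),
}

flagged = servers_needing_review(raw_data)
print(flagged)
```

Any server that shows up in the flagged list should be sized from its busy period, not from the average in the report.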

If you are working with a VAR or vendor and they are using the VMware Capacity Planner to size a new storage or server platform, be VERY sure that you double check the numbers they are using to size the platform, so that you don’t end up with a big surprise after you finish the purchase.  In fact, one of the services which I’m happy to offer as a consultant is helping you double check those numbers from the VMware Capacity Planner to ensure that the solution you are purchasing is the right size.

Discuss this Blog Entry 1

on Sep 27, 2012
I don't know where the love part of the love/hate comes in; from a SQL Server point of view it's just hate/hate. It simply doesn't work at all as a capacity planner at any level, due to the way it does its calculations and the assumption that 5 physical servers running at 20% on any performance metric can now be run on one host to equal 100%. That really doesn't work because of the overhead and latency within the VMware product. Let me say from the beginning that putting SQL Server on VMware in the first place is the worst possible thing you can do with SQL Server. If anyone really is just desperate to put SQL Server on VMware, please skip any capacity planners and do the same work you would normally do to size SQL Servers, but do it based on the size of the host and dedicate your resources. You will quickly realize that there is a very limited use case for SQL Server on VMware. Almost all initiatives for using VMware are about overallocating the host, or why would you be doing this in the first place? While that strategy works reasonably well on application servers, it does not work at all on database servers if you are in the business of performance and stability, which is the business you are in if you are a SQL Server DBA. There will be an ROI immediately after rollout which will drop off radically as you have to move back to physical to satisfy performance. I am a professional DBA with 14 years' experience on SQL Server and 5 years' experience with keeping it off of VMware. :)
