I am currently teaching a VMware Manage for Performance class at TCC. During this week’s class we had a great discussion (OK there was a little bit of soap box preaching involved also) on resource allocation in a virtual environment. The discussion started from a question that was asked based on the resolution to a network performance lab scenario.
In the lab a standard vSwitch with seven 1Gb physical uplinks had eight VMs attached to it. In the scenario a VM attached to the vSwitch had network throughput that was maxing out at 200Mb/s. A series of performance graphs were provided, and from the graphs you could see that only one of the seven physical uplinks (vmnic3) was passing network traffic. The issue was that all the other physical uplinks were in standby, and vmnic3 was the only active uplink passing network traffic for all VMs attached to the vSwitch. The solution was to make four of the remaining physical uplinks active and enable load balancing. This left two uplinks in standby to be used in the event of a failure.
This raised the question “Why not make all seven physical uplinks active?”
Just from the lab scenario we do not have any insight into specific application needs or service level agreements (SLAs), but my answer to the question was that the environment did not require more than the five active uplinks, so two uplinks are set to standby to ensure the production capacity remains available in the event of a failure of up to two of the active uplinks.
“Wouldn’t seven active 1Gb uplinks be better than five 1Gb uplinks?” Seven is definitely more, but not necessarily better. It really depends on the needs of the environment and any SLAs that may be in place. If five 1Gb uplinks support the applications and the service levels are being met, then adding more uplink capacity is more than likely unnecessary.
If all uplinks are active there is no failover capacity. If there is a failure, it will decrease the amount of available throughput and possibly degrade performance. An environment that had been delivering service exceeding the workloads’ requirements is no longer providing the same level of service. You may still be within acceptable service levels, but it is not what application users have come to expect. With five uplinks active and two on standby, a failure of a physical uplink (or two) will not have any impact on production; 5Gb/s of throughput will still be available.
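The trade-off above can be reduced to simple arithmetic. Here is a minimal sketch (illustrative only, not tied to any VMware API) comparing the two teaming configurations, assuming each uplink contributes 1Gb/s and a standby adapter takes over for a failed active one:

```python
UPLINK_GBPS = 1  # each physical uplink is 1Gb/s

def available_throughput(active, standby, failed):
    """Throughput (Gb/s) still delivered after `failed` uplinks die.

    Standby uplinks replace failed active ones until the standby
    pool is exhausted; only then does active capacity start to drop.
    """
    unreplaced = max(0, failed - standby)
    return max(0, active - unreplaced) * UPLINK_GBPS

# 5 active + 2 standby: steady state and post-failure are the same
print(available_throughput(active=5, standby=2, failed=0))  # -> 5
print(available_throughput(active=5, standby=2, failed=2))  # -> 5

# 7 active + 0 standby: users get used to 7Gb/s, then lose 2Gb/s
print(available_throughput(active=7, standby=0, failed=0))  # -> 7
print(available_throughput(active=7, standby=0, failed=2))  # -> 5
```

Both configurations end up at 5Gb/s after two failures; the difference is that the 5+2 design delivers a *consistent* service level, while the all-active design delivers a service level that visibly degrades.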
Some people took this to mean that by providing mediocre service (5Gb/s of throughput instead of 7Gb/s) users become accustomed to the mediocre service. It is not that at all. It is about managing resources to deliver acceptable service levels for the applications running in your environment, with a good plan in place to deal with failures and handle growth.
We had a pretty lengthy discussion on the pros and cons of this, and here are a few key things I hope the students took away from the discussion:
- High resource usage != bad.
This is one of the core benefits of virtualization – better use of available resources. If the environment is using an average of 70-75% of available resources, this is a good thing. It means the resources you have are being used. Idle resources can be a waste of those resources (resources reserved for failover should not be considered idle – they have a purpose).
- Allocate the resources needed.
Just because you can allocate 8GB of RAM and 4 CPUs to a virtual machine does not mean you should. Allocate resources to VMs based on the resources the workloads need. Unlike with physical servers, it is much easier to scale resources up if necessary. When allocating resources to a new VM, start at the lower end of the requirements and scale up as needed.
- Plan for failover.
It is important to plan for failover so that a minor failure (loss of a network adapter) does not impact service level agreements or the production performance users are expecting. Failover resources are often looked at as idle resources, but this is not the case, especially in an environment that needs to be highly available. Failover/standby resources maintain service levels in the event of a failure and are an important (and necessary) part of the environment’s design.
- Plan for the future.
With physical servers it was a lot harder to plan for future needs. Adding a CPU, memory, or even a network adapter more than likely meant downtime, so it is not uncommon to build out a physical server with more resources than needed to meet application requirements. In a virtual infrastructure, not only is it much easier to scale up resources on individual VMs, it is also much easier to scale out the available resources in your environment to handle more workloads. It is still important to plan for future needs, but it is much easier to add resources when they are needed.
These kinds of discussions are something I enjoy about teaching the class. They help me understand how other people see things and how I can hopefully pass on a bit of knowledge and experience that doesn’t come right from a textbook.
I know a few students of the Manage for Performance class are probably reading this. I would appreciate any comments you might have on this discussion or on the Manage for Performance class in general. Comments from others in the virtualization community are also very much welcome and appreciated.