This post was inspired by a conversation with @Emad_Younis (if you are not following him, you should be) at VMware PEX about ESXi network design using a limited number of NICs. Emad is working towards his VCDX, so as part of the discussion we worked through the solution design by applying the same design methodology used when creating a VCDX design.
Here is the information we had available:
The customer has an HP server with three physical NICs (one on-board and two on a dual-port card) to use as an ESXi host. The customer did not have the budget/time/whatever to purchase another dual-port NIC and wanted to get the host into production as soon as possible, so only three NICs were available. The customer wants the configuration to support the necessary connectivity (Mgmt, vMotion, iSCSI, and VM Network) and be highly available.
Not much to go on, but we were able to reach a solution using the conceptual, logical, physical design process.
The first thing we did was create a conceptual design by identifying the requirements and constraints, making assumptions to cover some missing information, and then identifying the potential risks.
Requirements
- Provide network connectivity for management, vMotion, iSCSI Storage, and virtual machine networks
- Separate traffic types (Management, vMotion, iSCSI Storage, and virtual machine networks)
- No single points of failure for Management, vMotion, iSCSI Storage, and virtual machine network connectivity
Constraints
- HP Server
- Three 1 GbE NICs available – 1 on-board, 2 on a dual NIC PCIe Card
Assumptions
- 2 upstream physical switches
- 1 GbE will provide sufficient throughput to support virtual machine traffic
- 1 GbE will provide sufficient throughput to support iSCSI traffic
- vSphere Licensing is not Enterprise Plus – using a virtual distributed switch is not an option
- VLANs will be used, and have been created, to logically separate Management, vMotion, iSCSI storage, and virtual machine traffic.
There are possibly some other assumptions that should be made, such as the server and NICs being on the HCL. Even though these are relevant to this discussion, we just made a blanket assumption that they are and that ESXi had already been installed. If not, this would introduce some other risks to the design, but we decided to keep things simple. In a full-blown design you would want to document these assumptions, since the hardware is a constraint.
Risks
- If the assumption of two physical switches is not validated, the physical switch will become a single point of failure.
- No information is available on the throughput requirements for virtual machine traffic; a single 1 GbE up-link may not be sufficient.
- No information is available on the throughput requirements for iSCSI traffic; a single 1 GbE up-link may not be sufficient.
- Up-links may become saturated, which could impact performance; this is more likely during a failure.
Logical and Physical Design
The logical design is incomplete because we do not have information on the throughput required for virtual machine and iSCSI traffic, so we had to assume that 1 GbE would provide sufficient throughput. Since vMotion is required, we know that at least 1 GbE needs to be available for the vMotion vmkernel. We also know that a vMotion can (and often does) saturate a 1 GbE link. ESXi management traffic requires minimal bandwidth, but availability is important.
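To put a rough number on the vMotion point, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the VM memory sizes are hypothetical examples (we had no sizing data), the function name `min_copy_seconds` is made up for illustration, and the calculation ignores protocol overhead and the re-copying of memory pages that change during the migration.

```python
# Back-of-the-envelope estimate; the VM memory sizes are hypothetical examples.

LINK_GBPS = 1.0  # 1 GbE up-link, line rate in gigabits per second


def min_copy_seconds(memory_gib, link_gbps=LINK_GBPS):
    """Lower bound on the time to copy memory_gib of RAM once at line rate.

    Ignores protocol overhead and the re-copying of pages dirtied during
    the migration, so real vMotion times will be longer.
    """
    bits = memory_gib * 2**30 * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    for gib in (4, 16, 64):
        print(f"{gib} GiB VM: at least {min_copy_seconds(gib):.0f} s on 1 GbE")
```

Even a 16 GiB VM would keep the link busy for more than two minutes at full line rate, which is why sharing the vMotion up-link with other traffic during normal operation is worth avoiding.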
Here is the host network connectivity design we came up with to satisfy the requirements:
We have the three vmnics which will be connected across two different physical switches (pSwitch1 and pSwitch2).
vmnic0 – onboard —— pSwitch1
vmnic1 – PCIe Card —– pSwitch2
vmnic2 – PCIe Card —– pSwitch2
Physical switch ports should be configured as trunk ports. The VLANs for Management, vMotion, iSCSI storage, and virtual machine network traffic will be allowed on the trunk ports.
A single vSwitch would be created using all three up-links. The vmkernel adapters would be created for Management, vMotion, and iSCSI storage connectivity, and a VM Network portgroup would be created for virtual machine network connectivity. Explicit failover order would be configured for each vmkernel and portgroup to separate traffic across vmnics. The explicit failover order configuration would also ensure availability by providing active and standby adapters for the Management, vMotion, and VM Network portgroups/vmkernels, eliminating single points of failure. The iSCSI vmkernels are the exception: each one is bound to a single vmnic, and availability comes from having two of them on different vmnics and switches.
Mgmt vmk: vmnic0 active, vmnic2 standby
VM Network(s): vmnic0 active, vmnic1 standby
vMotion vmk: vmnic1 active, vmnic0 standby
iSCSI_A vmk: vmnic2 active, all other vmnics unused
iSCSI_B vmk: vmnic0 active, all other vmnics unused
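The failover order above can be sanity-checked with a small model. The following Python sketch is only an illustration, not a VMware tool: it encodes the cabling and the active/standby lists from this post, then fails each vmnic and each physical switch in turn and reports which up-link each traffic type ends up on. The names `UPLINK_SWITCH`, `TEAMING`, `TRAFFIC`, and `carrier` are made up for this example, and the two iSCSI vmkernels are treated as two paths to the same storage.

```python
# Illustrative model only: names and structure are assumptions, not a VMware API.

# Physical cabling from the post: vmnic -> upstream switch.
UPLINK_SWITCH = {"vmnic0": "pSwitch1", "vmnic1": "pSwitch2", "vmnic2": "pSwitch2"}

# Explicit failover order per portgroup/vmkernel: active first, then standby.
TEAMING = {
    "Mgmt": ["vmnic0", "vmnic2"],
    "VM Network": ["vmnic0", "vmnic1"],
    "vMotion": ["vmnic1", "vmnic0"],
    "iSCSI_A": ["vmnic2"],  # all other vmnics unused
    "iSCSI_B": ["vmnic0"],  # all other vmnics unused
}

# Traffic types and the portgroups that can carry them, in order of preference.
TRAFFIC = {
    "Mgmt": ["Mgmt"],
    "VM Network": ["VM Network"],
    "vMotion": ["vMotion"],
    "iSCSI": ["iSCSI_A", "iSCSI_B"],  # iSCSI_A is the preferred path
}


def failed_uplinks(component):
    """Return the set of vmnics lost when a vmnic or a switch fails."""
    if component in UPLINK_SWITCH:
        return {component}
    return {nic for nic, sw in UPLINK_SWITCH.items() if sw == component}


def carrier(traffic, down):
    """Return the first surviving up-link for a traffic type, or None."""
    for portgroup in TRAFFIC[traffic]:
        for nic in TEAMING[portgroup]:
            if nic not in down:
                return nic
    return None


def simulate():
    """Map each single failure to the up-link used by every traffic type."""
    components = list(UPLINK_SWITCH) + sorted(set(UPLINK_SWITCH.values()))
    return {
        c: {t: carrier(t, failed_uplinks(c)) for t in TRAFFIC}
        for c in components
    }


if __name__ == "__main__":
    for component, placement in simulate().items():
        print(f"{component} fails: {placement}")
```

In this model every traffic type keeps an up-link after any single vmnic or switch failure. It also shows the cost: a pSwitch2 failure takes out both vmnic1 and vmnic2, which leaves all four traffic types on vmnic0.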
Another thing we talked about was the iSCSI path configuration. We decided to use Fixed for the multipathing policy and set iSCSI_A as the preferred path. This keeps storage traffic on its own vmnic during normal operation, so other traffic types will not affect, or be affected by, iSCSI traffic. If there is a failure in the iSCSI_A path (vmnic2, pSwitch2, etc.), the iSCSI_B path will be used; when the iSCSI_A path is restored, traffic will switch back to it.
The following image provides a visual representation of how all this is put together.
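The Fixed behaviour described here comes down to a simple selection rule. As a rough sketch (plain Python, not ESXi's actual multipathing code; `select_path` and the path-state dictionary are hypothetical names used for illustration), the preferred path is used whenever it is up, and another working path is used only while the preferred one is down:

```python
# Illustrative only: models the selection rule, not ESXi's actual implementation.

def select_path(path_up, preferred):
    """Pick the path a Fixed-style policy would use.

    path_up maps path name -> True if the path is currently working.
    Returns the preferred path when it is up, otherwise the first other
    working path, or None when no path is available.
    """
    if path_up.get(preferred):
        return preferred
    for name, up in path_up.items():
        if up and name != preferred:
            return name
    return None


if __name__ == "__main__":
    events = [
        {"iSCSI_A": True, "iSCSI_B": True},   # normal operation
        {"iSCSI_A": False, "iSCSI_B": True},  # vmnic2 or pSwitch2 fails
        {"iSCSI_A": True, "iSCSI_B": True},   # iSCSI_A restored: fail back
    ]
    for state in events:
        print(state, "->", select_path(state, preferred="iSCSI_A"))
```

Because the rule is re-evaluated against the current path state, traffic returns to iSCSI_A as soon as it is healthy again, which is the fail-back we wanted in order to get storage traffic off the shared vmnic0.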
Of course this is not an optimal configuration, but it does meet the customer requirements within the constraints. There are likely other ways to configure things which would achieve the same (or similar – or even better) results. The real purpose of this post, and the discussion, was to come up with a solution by applying the VMware design methodology of identifying the requirements and constraints, making assumptions, and documenting the risks.
It was great talking this through with @Emad_Younis (and again, follow him if you are not already; great guy). I look forward to many other discussions with him as he continues on his journey to VCDX.
Constructive comments are always welcome.