HPE SimpliVity Federation Design
I get a lot of questions about the SimpliVity Federation. Most of the questions are around the purpose and placement of the Arbiter, support for vCenter running in a SimpliVity Cluster, and the deployment topology to use for the vCenter environment.
This post provides an overview of HPE SimpliVity Federation design, focusing on the HPE SimpliVity Arbiter's function, requirements, and placement, along with some guidance on the design of the vCenter environment supporting an HPE SimpliVity Federation. This post also includes two HPE SimpliVity Federation design examples.
This post focuses on two of the key factors that influence HPE SimpliVity Federation design:
- HPE SimpliVity Arbiter Requirements and Placement
- Platform Services Controller (PSC) and vCenter Server Topology and Placement
Note that I refer to the HPE SimpliVity Federation as a SimpliVity Federation, or simply as a Federation. I also refer to the HPE SimpliVity Arbiter as the Arbiter.
The Arbiter and the vCenter Server are required components of an HPE SimpliVity Federation. When deploying a new SimpliVity Federation, an Arbiter and the vCenter environment must exist prior to the deployment of the SimpliVity nodes. The Arbiter can be deployed to a temporary resource and then redeployed to its permanent location once the SimpliVity Federation is deployed. The vCenter environment can also be deployed to a temporary location and migrated onto the SimpliVity Cluster after deployment. There are a number of options for migrating a vCenter environment into the SimpliVity Federation; I cover a couple of migration options in this post.
The Arbiter is a service which acts as a witness to maintain quorum for an HPE SimpliVity Cluster, ensuring data availability and data consistency should a SimpliVity node fail or become inaccessible. The Arbiter service runs on a 64-bit Windows Desktop or Server OS and requires minimal CPU (1 GHz), memory (1 GB), and disk (16 GB) resources. Check the latest SimpliVity Administrator Guide for the most up-to-date resource requirements, as they can change from version to version.
The Arbiter service can be installed on a physical or virtual machine, and there is no requirement to install it on a dedicated machine. The Arbiter service does not consume a lot of CPU or memory, so it can be safely installed on a machine running other services (AD, DNS, etc.). The key constraint is that the Arbiter cannot run on an HPE SimpliVity Datastore in a SimpliVity Cluster which it is witnessing.
A single Arbiter can witness all SimpliVity Clusters in a SimpliVity Federation. In large, multisite Federations, multiple Arbiters can, and likely should, be deployed. SimpliVity nodes communicate with the Arbiter over the management network using both UDP and TCP on port 22122. Round-trip latency between the Arbiter and the SimpliVity nodes should be no more than 300 ms (there can be exceptions to this). All nodes within the same SimpliVity Cluster must communicate with the same Arbiter.
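As a rough pre-deployment sanity check of the port and latency guidance above, the following is a minimal Python sketch, not an HPE tool. It times a TCP connect to port 22122 as an approximation of one network round trip and compares it with the 300 ms figure. It assumes it is run from a machine on the same management network as the SimpliVity nodes, it does not exercise the UDP side, and the function names are my own.

```python
import socket
import time

ARBITER_PORT = 22122   # TCP/UDP port named in this post
MAX_RTT_MS = 300.0     # round-trip guidance named in this post


def tcp_connect_ms(host, port=ARBITER_PORT, timeout=2.0):
    """Return the time in milliseconds taken to open a TCP connection.

    A TCP connect completes after roughly one network round trip, so this
    is only an approximation of latency. Returns None if the connection
    fails (port closed, filtered, host unresolvable, or timed out).
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def within_latency_budget(rtt_ms, limit_ms=MAX_RTT_MS):
    """True only if a measurement exists and is at or under the limit."""
    return rtt_ms is not None and rtt_ms <= limit_ms
```

Calling `tcp_connect_ms("arbiter.example.local")` (a placeholder hostname) and passing the result to `within_latency_budget` gives a quick yes or no. A `None` result only says the TCP connect failed from where the script ran; it does not by itself say anything about Federation health.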
The Arbiter is a dependency for SimpliVity Clusters, but if an Arbiter fails all SimpliVity features continue to function and workloads remain available. If an Arbiter becomes inaccessible or fails, alarms will be triggered within the Federation but there will not be any impact on operations. If the Arbiter experiences a critical failure where it cannot be restarted, it must never be recovered from backup. Simply reinstall the Arbiter service. A failed Arbiter should be quickly returned to service, by restarting or reinstalling, to ensure availability and data consistency should a node in a witnessed SimpliVity Cluster experience an unplanned failure.
The following is an example of a SimpliVity Federation design for a multisite enterprise environment. In this type of environment there are likely production workloads running in both sites and the SimpliVity Backup Policies are being used to protect workloads between sites. Arbiter services are running in both the Production Datacenter and Recovery Datacenter.
The Arbiter services are running in virtual machines hosted on the SimpliVity Data Virtualization Platform (DVP), but each Arbiter witnesses SimpliVity Clusters other than the one it is running on (in this case, clusters in a different site) in the same Federation. The Arbiter in the Production Management Cluster is witnessing the SimpliVity Clusters, Management and Recovery, in the Recovery Datacenter. The Arbiter in the Recovery Management Cluster is witnessing the SimpliVity Clusters, Management and Recovery, in the Production Datacenter. This is a fully supported deployment configuration which is commonly used with larger deployments.
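The placement rules described so far can be checked against a planned design before anything is deployed. The following is a minimal Python sketch under stated assumptions: the `Arbiter` class, the `placement_problems` function, and the cluster names are all hypothetical and exist only for illustration. It encodes two rules from this post: an Arbiter must not be hosted on a cluster it witnesses, and all nodes in a cluster use the same Arbiter (modelled here as exactly one Arbiter per cluster, which is my reading of that rule).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Arbiter:
    name: str
    hosted_on: Optional[str]   # SimpliVity cluster hosting the Arbiter VM, None if external
    witnesses: FrozenSet[str]  # clusters this Arbiter acts as witness for


def placement_problems(arbiters: Iterable[Arbiter], clusters: Iterable[str]) -> List[str]:
    """Return human-readable rule violations; an empty list means none found."""
    arbiters = list(arbiters)
    problems = []
    # Rule 1: an Arbiter must not run on a cluster it is witnessing.
    for a in arbiters:
        if a.hosted_on is not None and a.hosted_on in a.witnesses:
            problems.append(f"{a.name} runs on {a.hosted_on}, which it also witnesses")
    # Rule 2 (assumed reading): each cluster is witnessed by exactly one Arbiter.
    for c in clusters:
        owners = [a.name for a in arbiters if c in a.witnesses]
        if len(owners) != 1:
            problems.append(f"{c} has {len(owners)} Arbiters, expected exactly 1")
    return problems


# Hypothetical cluster names for a cross-site witnessing layout.
clusters = ["Prod-Mgmt", "Prod-Workload", "DR-Mgmt", "DR-Workload"]
multisite = [
    Arbiter("arb-prod", "Prod-Mgmt", frozenset({"DR-Mgmt", "DR-Workload"})),
    Arbiter("arb-dr", "DR-Mgmt", frozenset({"Prod-Mgmt", "Prod-Workload"})),
]
print(placement_problems(multisite, clusters))
```

Because each Arbiter in this layout is hosted in one site and witnesses only the other site's clusters, neither rule is violated. Setting `hosted_on=None` models an Arbiter on a machine outside the SimpliVity DVP.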
vCenter Server determines the SimpliVity Federation boundary. Multiple vCenter Servers managing SimpliVity Clusters in the same Federation must be deployed in Enhanced Linked Mode. In this multisite conceptual design the Platform Services Controller (PSC) and vCenter Server are virtualized and running in the SimpliVity environment. SimpliVity supports both vCenter Server Appliance (VCSA) and Windows vCenter Server deployments. Virtualizing and running these components on SimpliVity has been supported for a couple of years now (ever since the first 3.x release; today the current release is 3.7). Virtualizing the PSC and vCenter Server allows vSphere HA to protect the vCenter environment. Running the vCenter environment on the SimpliVity Data Virtualization Platform (DVP) not only provides SimpliVity data efficiencies but also allows the vCenter components to be protected using SimpliVity Policy Based Backups.
As of vSphere 6.0, VMware has deprecated support for using the Embedded PSC when replicating the SSO Domain across Platform Services Controllers or when registering multiple vCenter Servers with a single PSC. This VMware KB provides details on the Supported Topologies for deploying the Platform Services Controller (PSC) and vCenter Server for vSphere 6.5.
I almost always recommend deploying an External PSC, as opposed to using the Embedded PSC, even in small environments. In smaller environments there are a number of benefits to using an External PSC, one of which is simply providing a supported topology should the environment expand. In larger environments other requirements, for example multiple vCenters in Enhanced Linked Mode, will likely require the use of an External PSC. The VCSA can be deployed as an External PSC, so no additional Guest OS licensing is required.
Below is a Federation design to support smaller environments. We commonly refer to this deployment as a Two Plus One, or 2+1: two SimpliVity nodes in the Production Datacenter and a single node in the Recovery Datacenter which is used as a SimpliVity Backup and Recovery target.
An External PSC and vCenter Server are running on the SimpliVity Production Cluster. This allows for availability of the vCenter environment using vSphere HA and for the vCenter components to be protected with SimpliVity Backup Policies. Using an External PSC allows the environment to easily scale, using a supported topology, if a second vCenter is added to support the Recovery site.
In this SimpliVity Federation design the Arbiter service is running on a physical (or virtual) Windows machine outside the SimpliVity DVP. If a second node is added to the Recovery site, or another cluster is added to the Production site, additional Arbiters could be added to witness across SimpliVity Clusters which would remove the need for this external resource.
I covered a lot of information in this post. Here are a few of the key points to consider for SimpliVity Federation design:
- The Arbiter service is a dependency, and is required, in a SimpliVity Federation.
- The Arbiter acts as a witness to maintain quorum for one or more SimpliVity Clusters.
- The Arbiter service must not be run on the same SimpliVity Cluster it is witnessing but it can run in another SimpliVity Cluster.
- NEVER recover a failed Arbiter from a backup. Reinstall the service.
- Running the vCenter components on the SimpliVity DVP and protecting them with SimpliVity Backup Policies is fully supported.
- Follow VMware Best Practices for supported PSC/vCenter Server Deployment Topologies.
- vCenter determines the SimpliVity Federation boundary. vCenters in Enhanced Linked Mode expand the Federation boundary.
- Multiple vCenter Servers managing SimpliVity Clusters in the same Federation must be linked using Enhanced Linked Mode.
Environments vary and so will the design. Hopefully the information here helps with understanding the key factors and the flexibility of a SimpliVity Federation design.
Questions or comments welcome. Just leave them below.
18 thoughts on “HPE SimpliVity Federation Design”
Nutanix can run their witness server inside the cluster. Compared to HPE putting the witness outside, customers may think they need to provide another physical server to run the Arbiter, which is (2+arbiter)+1 compared to Nutanix's 3 nodes. Are there any pros and cons to putting the Arbiter outside, compared to Nutanix?
I have the Arbiter installed on the same physical Windows server as vCenter. I want to reboot that server which is managing and arbitrating a 2 node cluster. Is this possible for Windows and other updates?
Yes you can reboot the server running vCenter and the Arbiter. This will not impact the availability of the VMs running on the SimpliVity nodes.
Thanks for stopping by.
If the Arbiter and one of the SimpliVity nodes fail at the same time, is the VM data still available?
In a 2 node SimpliVity cluster with an Arbiter on a different physical server, if the server with the Arbiter needs to be maintained for a period of time, how can another Arbiter be brought online on a different server? Is there a good process?
For general maintenance of the Arbiter server (upgrading the Arbiter, applying Windows updates, etc.) there is no need to switch to a different Arbiter. The process for Arbiter maintenance would be: ensure the Federation is healthy (if not, correct any issues), perform the maintenance on the Arbiter, then validate that the Federation is healthy (and Arbiter communication is re-established). Updating the Arbiter or performing maintenance on the machine the Arbiter is running on will have no impact on the availability of workloads or SimpliVity services in a healthy Federation.
In the second scenario, 2+1 (2 nodes in the production site + 1 node in the recovery site), would it be possible to run the production cluster Arbiter on the recovery node?
Hi Hersey, great post, thank you. Just a question if you had time.
In a situation where one needs to perform a complete shutdown of the entire data centre, including all vSphere/SimpliVity cluster nodes and the Arbiter server, could you confirm two things please:
1) Does the Arbiter service need to be available before the vSphere/SimpliVity cluster nodes are powered on and the storage cluster can reform?
2) What do you think would happen to the SimpliVity storage cluster if the Arbiter is not running? Would it simply not form the storage cluster?
Thank you for your help in advance.
Can a single vCenter manage multiple federations of 32 nodes? We have a customer (semiconductor) who is looking into HC and who buys large quantities of servers from HPE, and this was one of the concerns that came up.
Quick question: could we have 2 Arbiters for a single Federation (2 nodes)? If so, in which mode (Active-Active, Active-Standby)?
You mention following “Round trip latency between the Arbiter and the SimpliVity nodes should be no more than 300 ms (there can be exceptions to this)”, can you address the exceptions also?
We are trying to find out how much time we have to switch from a primary link to a secondary link (Spanning Tree failover) without impact to running VMs, in case one site dies while having the primary management connection to the Arbiter machine. We are talking about a stretched SimpliVity cluster solution with the Arbiter on a third site.
Thanks very much,
One question about Federation management under vSphere:
Can you manage two different HPE Federations with the same vSphere server?
Or do you need one vSphere server for each HPE Federation you wish to manage?
This is about the Two Plus One (2+1) sample deployment you show above: two SimpliVity nodes in the Production Datacenter and a single node which is used as a SimpliVity Backup and Recovery target in the Recovery Datacenter. I would like to ask, if the production site is down, how can we recover VMs in the Recovery Datacenter, given that the vCenter in Production is also down?
There is an emergency restore which can be performed from the CLI on the OVC at the surviving site. This can be used to restore the vCenter Server so that you then have access to other management or you can restore other VMs. This process restores them to the ESXi host the OVC is running on. The CLI commands for the emergency restore can be found in the admin guide. Hope that helps.
I have a 2+1 setup, but without a 10Gb switch because we do not have one.
The Prod site has 2 nodes which are connected directly on the 10Gb ports, with Storage and Federation on a separate, isolated network.
For the DR site, how can I join the Federation if the DR node is unable to communicate with the PRD Federation network?
Unfortunately I have deployed it as a new Federation. Is there still a way to join it to the existing Federation without redeploying?
The OVC management networks need to be able to communicate with each other. There is a way to join a node into an existing Federation after deployment; contact support and they should be able to help you with this. I believe it is a dsv command which requires elevated privileges. You will not be able to do this if the OVC management networks cannot communicate across sites.
Hope that helps…
In case you lose one node within the SimpliVity cluster containing the Arbiter VM,
this can impact production with regard to storage failover,
and this can be recovered, but with difficulty. ( dunfey )
If the Arbiter of a two node cluster is powered off for a period of weeks/months (putting the cluster at risk), can it be powered back on, or should a new Arbiter always be deployed?