We wanted to share our story of migrating from VMware to XCP.ng: the benefits we’ve realised and, more importantly, why our business is financially better off for having made the move.
You might ask why trust us and what we have to say? For the last 22 years, our core business has centred around providing specialist-managed hosting and cyber security services. It’s what we do day in and day out, 24/7 for our clients. For over sixteen years we’ve used VMware to provide our private cloud platform, and we also manage VMware on other private clouds as part of our managed services offering. Without wishing to sound arrogant, we know VMware extremely well indeed.
The acquisition of VMware by Broadcom came with a surprise sting in the tail when Broadcom announced the termination of all partner contracts and the imposition of new long-term contracts, along with some hefty and onerous commitments. Those topics have already been covered elsewhere; however, it’s worth mentioning that our VMware bill would have gone up to approximately 100x our current spend, which was unacceptable.
We were therefore faced with the challenge of migrating our private cloud platform to a new hypervisor under tight time constraints. We are by nature curious about all matters of technology, especially where our specialist services are concerned, so over the years we had already become familiar with a few alternative hypervisors, including Proxmox, Hyper-V, KVM, and Xen Server.
Why we chose XCP.ng (and came to understand the benefits it offers)
XCP.ng is the natural successor to Citrix Xen Server. It is open source, under active development, with a passionate and engaged user base. Its corporate support is provided by Vates, who are guardians of the XCP.ng project as well as the authors of the Xen Orchestra Appliance (XOA), which provides vCenter-like capabilities for clusters of XCP host servers.
For us, needing high availability and live host migration, Proxmox wasn’t quite ready. Hyper-V, despite Microsoft’s protestations, is a Windows product and half our estate is Linux. Azure copes with Linux extremely well; Hyper-V not so much.
Incidentally, we considered Public Cloud migration, but the costs didn’t stack up and our customers chose us because we’re not a faceless hyper-scaler. It didn’t suit what we wanted to offer or reflect our ethos as a business.
Differences between XCP.ng and VMware
RAM usage
Unlike VMware, which has a highly memory-efficient hypervisor, XCP requires a control plane on each host, known as dom0, which needs plenty of RAM if you’re running big guests.
XCP.ng is also not as good at RAM sharing as VMware, so if you’re used to economies of scale when hosting, say, a farm of similar Windows VMs, you’ll find you need a lot more physical RAM.
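As a rough planning aid, the two points above can be turned into back-of-envelope arithmetic. The sketch below is only an illustration, not a sizing tool: the function name, the dom0 allowance, and the `sharing_saving` fraction are all assumptions made up for the example, and real figures should come from measurements on your own estate.

```python
def physical_ram_needed_gib(guest_ram_gib, dom0_gib, sharing_saving=0.0):
    """Estimate the host RAM needed for a set of guests.

    guest_ram_gib  -- list of RAM allocations per guest, in GiB
    dom0_gib       -- RAM reserved for the control domain (assumed value)
    sharing_saving -- fraction of guest RAM recovered by memory sharing
                      (0.0 means no sharing at all; an assumed figure)
    """
    if not 0.0 <= sharing_saving < 1.0:
        raise ValueError("sharing_saving must be in [0, 1)")
    guests_total = sum(guest_ram_gib)
    return guests_total * (1.0 - sharing_saving) + dom0_gib


# Hypothetical farm: twenty similar Windows VMs with 8 GiB each.
farm = [8] * 20

# Illustrative inputs only: the 25% saving and the 16 GiB dom0 are made up.
with_sharing = physical_ram_needed_gib(farm, dom0_gib=0, sharing_saving=0.25)
without_sharing = physical_ram_needed_gib(farm, dom0_gib=16)
print(with_sharing, without_sharing)
```

Running the same guest list through both sets of assumptions makes the gap in physical RAM visible before any hardware is ordered.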
Virtual Disks
There are some real trip hazards when it comes to virtual disks:
- XCP.ng has a 2TiB limit on disk size in its native format. To exceed this size, the disk must live on an iSCSI device attached directly within the guest VM.
- Persistent independent disks don’t exist in XCP.ng. These are useful if you’re using a snapshot-based backup system on large or busy disks, as they are excluded from the snapshot, making the process much faster. We used them on SQL Server data disks: since the data is already backed up by SQL Server, there was no need to snapshot those disks. The same effect can be achieved with iSCSI disks mounted directly in the VM, as they are ignored by the XOA backup.
- Direct-attached iSCSI disks are finicky to get running with good performance. You need to pay close attention to disk and network settings. Even then we experienced problems with disk read performance during backups.
- You can’t resize a disk whilst the machine is on. If you have LVM running on a Linux VM, it is trivial to add a new disk and extend the logical volume, but without that, or on a Windows guest, you need to power the VM off to resize the underlying disks.
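Before migrating, it can help to check an inventory for disks that would hit the size limit described above. The following is a minimal sketch under stated assumptions: the inventory is a plain list you have exported yourself, the function and VM names are hypothetical, and nothing here calls an XCP.ng or XOA API. It treats the limit as exactly 2TiB, as stated above; the usable maximum in practice may be slightly lower, so check the documentation for your version.

```python
TIB = 1024 ** 4
NATIVE_LIMIT_BYTES = 2 * TIB  # the 2TiB native-format limit described above


def disks_needing_iscsi(disks):
    """Return the names of disks too large for the native virtual disk format.

    disks -- iterable of (name, size_in_bytes) pairs from your own inventory
    """
    return [name for name, size in disks if size > NATIVE_LIMIT_BYTES]


# Hypothetical inventory exported from the old platform.
inventory = [
    ("web01-os", 100 * 1024 ** 3),   # 100 GiB
    ("sql01-data", 3 * TIB),         # over the limit
    ("files01-data", 2 * TIB),       # exactly at the limit
]
print(disks_needing_iscsi(inventory))
```

Any disk the check flags needs a plan (splitting the data, or a direct-attached iSCSI device) before the VM is moved.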
Management Agent
In VMware, a guest can be vMotioned across hosts without any installed agent. Unfortunately, XCP.ng does require a functioning management agent in each VM for hot migration.
Xentools is available for all modern operating systems, and many older versions too. But there is a limit, and we had some difficulty getting a Windows Server 2008 (don’t ask, it’s a client requirement) agent to install properly. Vates were very helpful in this regard, suggesting several possibilities, one of which worked a treat.
We also have some black-box appliances on which we cannot install the xentools agent. High Availability still works: if a host fails, the guests are restarted elsewhere. To conduct maintenance on the host, however, the guest must be shut down and manually moved. Not a huge inconvenience, but worth bearing in mind if you have many such devices in your estate.
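When planning host maintenance, the distinction above amounts to splitting the guests on a host into two groups. Here is a minimal sketch, assuming you keep your own record of which guests have a working agent; the `Guest` class, the function, and the guest names are hypothetical and do not come from XOA.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    has_agent: bool  # True if the management agent is installed and working


def maintenance_plan(guests):
    """Split the guests on a host into those that can be hot-migrated and
    those that must be shut down and moved manually."""
    live = [g.name for g in guests if g.has_agent]
    manual = [g.name for g in guests if not g.has_agent]
    return {"live_migrate": live, "shutdown_and_move": manual}


# Hypothetical host with one black-box appliance.
host_guests = [
    Guest("app01", True),
    Guest("fw-appliance", False),
    Guest("db01", True),
]
plan = maintenance_plan(host_guests)
print(plan)
```

The `shutdown_and_move` list tells you which guests need a downtime window agreed in advance.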
Backup and Disaster Recovery
VMware provides DR via Site Recovery Manager (SRM), which is a paid-for bolt-on and doesn’t handle basic backup and restore at all. We, like many, used Veeam for backup and restore in VMware, but unfortunately, Veeam currently doesn’t support XCP.ng.
Thankfully, XOA has native backup and continuous replication built in at no extra cost, which is a great bonus. It is controlled very simply by adding tags to the VMs you want to just back up, back up and verify, or back up and continuously replicate.
There is, of course, a but – XOA lacks many of the features of SRM that make the latter so powerful. It doesn’t have the concept of recovery plans, or protection groups, so recovery of VMs in the event of a DR invocation is a much more manual process than with SRM. We’ve long been spoiled by SRM’s push-button failover capabilities. That said, we know from conversations with Vates that this is something they are working on.
UI Quirks
The UI is not as complicated as vCenter, and in many ways benefits from the simplicity. It does, however, have quirks of its own.
There’s no concept of a folder structure; all VMs are just in a long list, so it is vital to define a tagging taxonomy at the start and use it. We understand the next major version of XOA includes a folder view based on tags, but it is still vital to get this right.
Tagging is also used to control some of the XOA behaviours, principally the backup system. To back up a VM you only need to tag it Backup.
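To show how tag-driven backup selection works in principle, here is a minimal sketch. It is not XOA’s implementation or API: only the `Backup` tag is named in this article, the other two tag names are hypothetical stand-ins for the verify and replication modes, and the VM names are invented.

```python
# Only "Backup" is named in the article; the other two tag names are hypothetical.
TAG_TO_JOB = {
    "Backup": "backup",
    "BackupVerify": "backup and verify",
    "BackupReplicate": "backup and continuous replication",
}


def jobs_for(vm_tags):
    """Map each VM name to the backup jobs its tags select.

    vm_tags -- dict of VM name -> set of tags
    VMs with no recognised tag are omitted, i.e. not backed up.
    """
    selected = {}
    for vm, tags in vm_tags.items():
        jobs = sorted(TAG_TO_JOB[t] for t in tags if t in TAG_TO_JOB)
        if jobs:
            selected[vm] = jobs
    return selected


def untagged(vm_tags):
    """List the VMs that no backup job would pick up, as a safety check."""
    return sorted(vm for vm, tags in vm_tags.items()
                  if not any(t in TAG_TO_JOB for t in tags))


# Hypothetical estate.
estate = {
    "web01": {"Backup", "prod"},
    "sql01": {"BackupReplicate"},
    "test01": {"dev"},
}
print(jobs_for(estate))
print(untagged(estate))
```

The second function is the useful one day to day: with tag-driven backups, a VM that was never tagged is silently left out, so a regular report of untagged VMs is a sensible safeguard.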
In the current version of XOA, lists are sorted “asciibetically”, i.e. abcABC, which is extremely annoying if you have a mix of Linux servers, which prefer lowercase, and Windows servers, which favour uppercase. We hope this is fixed in a future release.
XOA is a ReactJS application talking to a Node.js backend. It loads up a lot of data to get started and is very responsive. Sometimes, though, the connection between the two is closed, either by a timeout or by poor network connectivity. The app pops up a little message at the bottom of the menu list saying “Disconnected”, but on an ordinary 1080p display that message is lost below the fold. The app carries on working with its out-of-date data until you try to do something, at which point it often fails silently. It’s only minor, and we’re told it will be fixed in a forthcoming release, but it can niggle a bit every time you get caught out.
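The effect is easy to reproduce outside XOA. The sketch below uses hypothetical server names and says nothing about how XOA implements its sorting; it only contrasts a case-sensitive sort with a case-insensitive one. Python’s default ordering happens to put uppercase names first, but whichever case leads, the two groups are kept apart rather than interleaved.

```python
# Hypothetical mix of lowercase Linux names and uppercase Windows names.
names = ["web01", "SQL01", "app02", "DC01"]

# Default string comparison is by code point, so the cases are kept apart.
by_code_point = sorted(names)

# Case-insensitive ordering interleaves the two groups.
case_insensitive = sorted(names, key=str.casefold)

print(by_code_point)
print(case_insensitive)
```

Until the list order changes, adopting a single naming case for new VMs, or exporting the list and sorting it yourself as above, sidesteps the problem.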
Technical Support
“Open source has no support” – categorically untrue in this case. Vates, as noted above, offer a range of support contracts, one of which we selected as we knew that we would need some help in getting the project done in the timeframe.
I have no hesitation in recommending Vates’ support. They’ve been enthusiastic, knowledgeable, quick to respond, and above all, ready and able to fix any problems we’ve encountered.
They’ve also been responsive to our requests and comments (see the Disconnected UI issue above).
It all feels a lot more personal than VMware, never mind the horror stories that are coming out about the problems VMware by Broadcom are having transitioning to Broadcom systems. We were concerned about the sudden growth caused by people migrating away from VMware but I’m pleased to report Vates continues to provide excellent service.
Conclusion
We were happy enough with VMware and had no great desire to change such a core part of our infrastructure, that is, until Broadcom forced us to look elsewhere. Having found an alternative that we are pleased with, we are even more delighted to report a great return on investment.
Benefits of migrating from VMware to XCP.ng
Even after buying new hardware to run the two platforms in parallel, we expect the migration to pay for itself within 12 months, and across three years we expect savings in excess of £500k, which has made our finance director smile, and that doesn’t happen very often. The final point to make here is the peace of mind our team now has: we are free of contracts being terminated with little regard for the impact on us, free of huge (and unjustified) cost increases, and free of a company that seemed to have little if any regard for its many customers.
It has become a win-win for our business.
In summary, we recommend the combination of XCP.ng, XOA, and support from Vates as a highly capable replacement for VMware ESXi, vCenter, and SRM.
If you would like to talk to our team about migrating to XCP.ng, the best way to contact us is via email: [email protected] or call us directly on 020 3745 7706.