Hyper-V backup and restore changed completely in Windows Server 2016. This document follows the journey of Hyper-V backup from Virtual PC and Virtual Server to Windows Server 2016 with Hyper-V, and should give you a better understanding of how Hyper-V backup has evolved over the years.
In 2004, Microsoft released the original Microsoft Virtual Server 2005. Virtual Server 2005 had no support for clustering, no support for checkpoints, and no support for backup at all, and it ran on a 64-bit host but supported 32-bit guests only.
At that time, Microsoft started working on the follow-up release, Virtual Server 2005 R2. Halfway through, the leaders at Microsoft started contemplating building a brand-new, hypervisor-based virtualization solution for Windows. There were intense architectural meetings and long discussions about a code name for the project, and eventually Hyper-V was born.
In 2005, the Data Protection Manager (DPM) team started working with the Virtual Server engineering team on the first implementation of agentless backup for virtual machines, based on Virtual Server 2005 R2. Back then, the average system on the market had one dual-core processor and could run up to six or seven virtual machines, although the average deployment was three to four virtual machines. The original backup implementation for Virtual Server 2005 R2 was built while the Hyper-V team was busy getting Hyper-V up and running. Then the Hyper-V technical preview was released, and customers started asking where the backup support was.
The Hyper-V team took the backup architecture that was built for Virtual Server 2005 R2 and reused it for Hyper-V V1.
To understand how backup initially worked in Hyper-V, I will first explain the basic concepts of the Volume Shadow Copy Service (VSS). VSS provides the system infrastructure for running VSS applications on Windows-based systems. A VSS requester is any application that uses the VSS API to ask the service to create and manage shadow copies and shadow copy sets of one or more volumes. VSS writers are applications (such as Hyper-V) or services that store persistent information in files on disk and that provide the names and locations of those files to requesters through the shadow copy interface. VSS providers manage running volumes and create shadow copies of them on demand (this is the storage layer). In the background, VSS does the following:
- Coordinates activities of providers, writers, and requesters in the creation and use of shadow copies.
- Furnishes the default system provider.
- Implements low-level driver functionality necessary for any provider to work.
On a physical computer, a backup application acts as the VSS requester: it goes to VSS and requests a backup of the system. VSS then talks to all the writers on the system, which are the various installed server applications and Windows components, and tells them to get ready for backup. Once they are ready, VSS talks to the provider from the storage infrastructure and tells it to take the snapshot. Finally, the result goes back to the original VSS requester (the backup application). That is the basic workflow in the physical world.
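The requester, writer, and provider hand-off described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Writer`, `Provider`, and `ShadowCopyService` classes and their methods are hypothetical names invented for illustration, and they are not the real VSS API.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Hypothetical application that owns data on disk."""
    name: str
    ready: bool = False

    def prepare_for_backup(self) -> None:
        # A real writer would flush buffers and hold writes here.
        self.ready = True

    def resume(self) -> None:
        self.ready = False


@dataclass
class Provider:
    """Hypothetical storage provider that creates shadow copies."""
    snapshots: list = field(default_factory=list)

    def create_shadow_copy(self, volume: str) -> str:
        snapshot_id = f"{volume}-shadow-{len(self.snapshots) + 1}"
        self.snapshots.append(snapshot_id)
        return snapshot_id


class ShadowCopyService:
    """Coordinates writers and the provider on behalf of a requester."""

    def __init__(self, writers, provider):
        self.writers = writers
        self.provider = provider
        self.log = []

    def request_backup(self, volume: str) -> str:
        # 1. Tell every writer to get ready for backup.
        for writer in self.writers:
            writer.prepare_for_backup()
            self.log.append(f"prepared:{writer.name}")
        # 2. Only when all writers are ready, ask the provider for a snapshot.
        if not all(writer.ready for writer in self.writers):
            raise RuntimeError("a writer is not ready")
        snapshot_id = self.provider.create_shadow_copy(volume)
        self.log.append(f"snapshot:{snapshot_id}")
        # 3. Let the writers resume and hand the result back to the requester.
        for writer in self.writers:
            writer.resume()
        return snapshot_id


# The "backup application" (requester) side of the conversation.
writers = [Writer("SQL"), Writer("Registry")]
service = ShadowCopyService(writers, Provider())
snapshot = service.request_backup("C")
print(snapshot)
print(service.log)
```

The point of the sketch is the ordering: writers are prepared first, the provider snapshots second, and the requester only sees the result at the end.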
This becomes really tricky once you get virtual machines into the picture, because now you have multiple operating systems inside of operating systems.
In Virtual Server 2005 R2 and in Windows Server 2008 R2 and Windows Server 2012 Hyper-V, the backup workflow is shown in Figure 1.
As you can see, the Hyper-V host sits at the bottom. The backup application comes in, asks to back up the system, and talks to VSS. VSS detects that there is a virtualization writer on the system (Virtual Server or Hyper-V) and asks it to get ready for backup. Hyper-V then uses the Integration Components to reach into the guest OS. Inside the guest is the VSS Integration Component for Hyper-V, which you can think of as a lightweight backup application. It talks to VSS inside the guest and asks for a backup of the guest system. Guest VSS then talks to all the writers inside the guest and tells them to get ready for backup. When they are ready, the guest snapshot is taken and guest VSS confirms that it is done. Next, the VSS Integration Component inside the guest tells the Hyper-V writer on the host that it is done, and the Hyper-V writer reports the same to VSS on the host. VSS on the host then takes either a software snapshot or, if you have a VSS provider for a SAN, a hardware snapshot. At this stage the physical backup takes place.
However, there is a problem here. Time has elapsed between the guest snapshot and the moment the Hyper-V writer reports back and VSS takes the snapshot on the host, and the virtual machine has kept running in the meantime. By the time VSS takes the host snapshot, the virtual hard disks of the virtual machine are no longer data consistent.
At this stage, whether a software or a hardware provider is used, the backup set contains a collection of VHDs, each of which holds its own guest snapshot. The final step is called auto-recovery: Hyper-V takes the host snapshot, finds all the VHDs stored in it, mounts those VHDs in the host operating system as disks, and then uses VSS on the host to roll each one back to the guest snapshot that was taken earlier. This produces a clean, data-consistent snapshot.
Figure 1. Hyper-V Backup architecture diagram for Virtual Server 2005 R2, Windows Server 2008 R2 and 2012
This was the architecture used in Virtual Server 2005 R2, Windows Server 2008 R2 and Windows Server 2012. The first issue is that it does not scale: it worked reasonably well only as long as you had a small number of virtual machines. The second issue was the mount/revert operation on the VHDs at the end of the backup process. As the number of virtual machines scales up, the backup takes much longer, because every VHD has to be mounted through Plug and Play and rolled back each and every time, which became a very expensive operation.
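To make the consistency gap and the auto-recovery step of the Figure 1 workflow concrete, here is a minimal Python sketch. It is a toy model and not Hyper-V code: `VirtualDisk`, `host_snapshot`, and `auto_recover` are hypothetical names, and a disk is reduced to a dictionary of blocks that carries its guest snapshot along with it.

```python
import copy


class VirtualDisk:
    """Toy VHD: current blocks plus the guest snapshot stored inside it."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.guest_snapshot = None

    def take_guest_snapshot(self):
        # Data-consistent point created by VSS inside the guest.
        self.guest_snapshot = dict(self.blocks)

    def write(self, block, data):
        self.blocks[block] = data


def host_snapshot(disk):
    """Copy the VHD exactly as it looks on the host at this moment."""
    return copy.deepcopy(disk)


def auto_recover(snapshot_disk):
    """Mount the copied VHD on the host and roll it back to the guest snapshot."""
    snapshot_disk.blocks = dict(snapshot_disk.guest_snapshot)
    return snapshot_disk


disk = VirtualDisk({0: "A", 1: "B"})
disk.take_guest_snapshot()           # consistent point inside the guest
disk.write(1, "B-modified")          # the VM keeps running in the meantime
backup_copy = host_snapshot(disk)    # the host snapshot happens later
blocks_before_recovery = dict(backup_copy.blocks)
auto_recover(backup_copy)            # repeated per VHD in the real workflow

print(blocks_before_recovery)
print(backup_copy.blocks)
```

Before recovery the copied disk contains the later write, so it does not match the guest snapshot; after the roll-back it does. The roll-back has to be repeated for every VHD in the backup set, which is the part that became expensive as the number of virtual machines grew.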
Windows Server 2012 R2 Hyper-V
In Windows Server 2012 R2, Microsoft made a substantial change to the architecture with two primary goals: the first was to set the scene for Shielded virtual machines, and the second was to increase the reliability of backup. In Windows Server 2012 R2 Hyper-V, the backup workflow is shown in Figure 2.
As you can see, the Hyper-V host sits at the bottom. The backup application comes in, asks to back up the system, and talks to VSS. VSS detects that there is a virtualization writer on the system, in this case Hyper-V, and asks it to get ready for backup. Hyper-V then uses the Integration Components to reach into the guest OS. The VSS Integration Component for Hyper-V inside the guest talks to guest VSS and asks for a backup of the guest system, and guest VSS tells all the writers inside the guest to get ready for backup. This is where Figure 2 differs from Figure 1: in Windows Server 2012 R2, Microsoft implemented the Hyper-V VSS provider, which makes the virtual hard disks look as if they support hardware snapshots.
VSS inside the guest gets the system ready and sends the snapshot request down to the storage stack, where an .avhdx file is created at exactly the moment the guest snapshot is taken. VSS on the host then gets confirmation that the guest is done and takes the host snapshot. The host snapshot now includes both the .vhdx and the .avhdx files, and the .vhdx is the data-consistent point. In this new architecture the whole mount/auto-revert operation is removed, which increases the scalability and reliability of the entire backup process.
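The role of the .avhdx file can be illustrated with a small Python sketch. It assumes a deliberately simplified disk model: `VirtualMachineDisk` and its methods are hypothetical, the parent dictionary stands in for the .vhdx file, and the differencing dictionary stands in for the .avhdx file.

```python
class VirtualMachineDisk:
    """Toy model of a parent disk with an optional differencing disk."""

    def __init__(self, blocks):
        self.parent = dict(blocks)   # stands in for the .vhdx file
        self.differencing = None     # stands in for the .avhdx file

    def guest_snapshot(self):
        # From this moment on, new writes land in the differencing disk,
        # so the parent stays frozen at the data-consistent point.
        self.differencing = {}

    def write(self, block, data):
        target = self.parent if self.differencing is None else self.differencing
        target[block] = data

    def read(self, block):
        if self.differencing is not None and block in self.differencing:
            return self.differencing[block]
        return self.parent[block]


disk = VirtualMachineDisk({0: "A", 1: "B"})
disk.guest_snapshot()          # .avhdx created when the guest is snapped
disk.write(1, "B-modified")    # the VM keeps running and writing

# The later host snapshot captures both files as they are.
host_copy = {"vhdx": dict(disk.parent), "avhdx": dict(disk.differencing)}
print(host_copy)
```

Because later writes never touch the parent, the host snapshot can happen at any later time and the copied .vhdx still holds the consistent state, with no mount and roll-back needed on the host.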
Figure 2. Hyper-V Backup architecture diagram for Windows Server 2012 R2
The first interesting challenge with this architecture comes at the beginning of the backup operation, when VSS on the host calls the Hyper-V writer and asks for the metadata file (basically a blob listing everything that should be backed up). At that point some of the files to be listed do not exist yet on the other side. The workaround was to generate the VHDX GUIDs at an early stage and send them to the other side, to make sure the file names line up.
The big change here is that reverting the virtual hard disk to the data-consistent VSS snapshot now takes place inside the virtual machine instead of in the host operating system, as it did in Windows Server 2008 R2 and 2012 Hyper-V (mount/auto-recovery). This has many benefits, one of which is that it scales much better. It does, however, have one (minor) drawback: for this method to work, Hyper-V needs to be able to hot-add and remove virtual hard disks to and from the virtual machine, and this is supported only on the SCSI controller (not on the IDE controller). You may have noticed that when you create a virtual machine in Hyper-V Manager, in System Center Virtual Machine Manager or in PowerShell, Hyper-V always creates it with a SCSI controller connected (even if no disks are attached). However, if you have manually removed all SCSI controllers from a virtual machine, Hyper-V backup will now fail on that virtual machine. So if you have a PowerShell script that removes all SCSI controllers from Gen 1 VMs, make sure it leaves or adds at least one SCSI controller. For more information, see the great post by Mr. Hyper-V (@VirtualPCGuy), How Hyper-V Backup Got Better in 2012 R2.
The second challenge appears if you are running a Hyper-V cluster with lots of virtual machines. In Windows Server 2012 R2 there are two particular problems. The first is still an architectural problem of scale: in the VSS architecture of that release there is no way to back up virtual machines without triggering a snapshot of the underlying storage. Take an 8-node Hyper-V cluster with 800 virtual machines on it and trigger a backup of all VMs: Hyper-V will generate 800 snapshots on your storage backend, and you can imagine what happens to your storage once you start hammering it like that. The workaround that backup vendors implemented was to limit the number of virtual machines backed up in each batch. The second problem is also around clustering. Cluster Shared Volumes (CSV) add another layer of complexity, because all levels of coordination are needed to make sure that the node taking the snapshot is the owner node of that CSV. Microsoft therefore released a lot of hotfixes for Windows Server 2012 R2-based failover clusters to make this work.
Windows Server 2016 – Evolving Hyper-V Backup
In Windows Server 2016 the game changed completely: Microsoft made significant changes to the backup architecture.
They took the middle piece of the backup architecture from Windows Server 2012 R2 and completely decoupled it from the rest of the system. Hyper-V now lets anyone (for example, backup partners) call into Hyper-V WMI and ask for a VSS snapshot of a virtual machine, and Hyper-V then goes through the whole backup process for that virtual machine independently.
In Windows Server 2016 Hyper-V, the backup workflow is shown in Figure 3.
As you can see, the Hyper-V host sits at the bottom. The backup application first calls into Hyper-V WMI to get all the virtual machines that it wants in a backup set ready for backup. It then calls into VSS/VDS to orchestrate a single hardware snapshot on the storage backend. The goal is a model that needs only that one hardware snapshot no matter how many virtual machines you have and no matter what scale you are running at. In previous releases, two snapshots happen, the virtual machine snapshot and the underlying hardware snapshot, and the two operations are tightly coupled, so you cannot do one without the other. In Windows Server 2016, the backup application can take as long as it needs to get the set of virtual machines into a data-consistent state and then take the hardware snapshot as a separate operation. That is the key architectural change in Windows Server 2016 Hyper-V.
Figure 3. Hyper-V Backup Architecture diagram for Windows Server 2016
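The effect of decoupling the two operations on the number of storage snapshots can be shown with simple arithmetic in Python. This is an illustrative sketch only: the function names are hypothetical, and the decoupled case assumes, as in the description above, that a single hardware snapshot covers every prepared virtual machine.

```python
def coupled_backup(vm_names):
    """Earlier model: every VM backup also triggers a storage snapshot."""
    storage_snapshots = 0
    for _ in vm_names:
        # The VM snapshot and the storage snapshot cannot be separated.
        storage_snapshots += 1
    return storage_snapshots


def decoupled_backup(vm_names):
    """2016 model: prepare each VM first, then take one storage snapshot."""
    # Step 1: ask Hyper-V to get every VM in the set ready for backup.
    prepared = list(vm_names)
    # Step 2: one hardware snapshot for the whole set (assumed here).
    storage_snapshots = 1 if prepared else 0
    return len(prepared), storage_snapshots


vms = [f"vm{i:03d}" for i in range(800)]
print(coupled_backup(vms))
print(decoupled_backup(vms))
```

For the 800-VM cluster used as an example earlier, the coupled model produces 800 storage snapshots, while the decoupled model prepares 800 virtual machines and then takes one.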
The second improvement that Microsoft worked on is a new technology called Resilient Change Tracking (RCT). There were two motivations for RCT. First, in all the previous architectures (Figure 1 and Figure 2) the result was a full backup of the virtual hard disks of the virtual machines. Every time you run a backup (daily, hourly or whatever), all the data is sent over the network, and that does not scale. To avoid sending all the data over the network, from Windows Server 2008 R2 up to Windows Server 2012 R2 every backup partner implemented its own file system filter to track changed blocks on the storage. But a third-party file system filter in the kernel of the host OS is a potential source of system crashes, and it can also affect storage performance. So in Windows Server 2016 Microsoft built a system in which no file system filter is needed anymore.
The second motivation comes from the new architecture shown in Figure 3, where the backup application snapshots the virtual machines independently and then takes the hardware snapshot as a separate operation. In all the previous architectures there is an extended period of time during which virtual machines run on .avhdx (differencing) files, and Microsoft wanted to mitigate the performance impact of that.
In Windows Server 2016, Microsoft provides native changed block tracking as part of the platform. RCT builds on the block allocation table that exists in every VHDX file to record which blocks have changed, but it does not store the changed data itself: the backup application already has a copy of the original data somewhere else, so this avoids copying the data twice.
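The idea of tracking which blocks changed, without keeping a second copy of the data, can be sketched in Python as follows. This is a generic changed-block-tracking illustration and not the RCT implementation: `TrackedDisk` and `incremental_backup` are hypothetical names, and the disk is reduced to a list of blocks.

```python
class TrackedDisk:
    """Toy disk that remembers which blocks changed since the last backup."""

    def __init__(self, block_count):
        self.blocks = [b""] * block_count
        self.changed = set()

    def write(self, index, data):
        self.blocks[index] = data
        # Record only *which* block changed, not its previous content.
        self.changed.add(index)

    def changed_blocks(self):
        return sorted(self.changed)

    def reset_tracking(self):
        self.changed.clear()


def incremental_backup(disk, backup_store):
    """Copy only the blocks changed since the previous backup."""
    indexes = disk.changed_blocks()
    for index in indexes:
        backup_store[index] = disk.blocks[index]
    disk.reset_tracking()
    return indexes


disk = TrackedDisk(8)
backup = {}

disk.write(2, b"x")
disk.write(5, b"y")
first = incremental_backup(disk, backup)

disk.write(5, b"z")
second = incremental_backup(disk, backup)

print(first, second, backup)
```

The second backup transfers a single block instead of the whole disk, which is the saving that matters when backups run daily or hourly over the network.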
The great thing about the RCT infrastructure is that it is tied to the VHDX file, so wherever the file goes, the tracking information goes with it, which makes it very flexible in all VM mobility scenarios. As a side note, there are two important points to be aware of. If you run virtual machines with VHD files instead of VHDX in Windows Server 2016 (please use VHDX), or if your virtual machine is still at version 5.0, you will not benefit from RCT. A virtual machine at version 5.0 might be moved back to a host running Windows Server 2012 R2, and Windows Server 2012 R2 does not understand RCT. In either of those situations you will see a performance impact during backup, because Hyper-V will not use RCT and will use differencing disks instead (the old style).
Microsoft also supports backup for guest clusters (groups of virtual machines with shared virtual hard disks) using the RCT infrastructure. To do that, Microsoft introduced a new file format called VHDS (VHD Set). The VHDS is a very small file that has a set of .avhdx files along with it.
With the introduction of the VHD Set file, Microsoft can take advantage of the storage snapshot and then lazily update the virtual machine configurations to reference the right files. The VHDS file is a reference (pointer) file that includes checkpoint metadata; no user data is stored in it. You can think of the VHDS as an external configuration file shared between the virtual machines of a guest cluster. In Windows Server 2012 R2, if two virtual machines use a shared VHDX file, each VM has its own configuration file, which makes updating metadata and resynchronizing all the changes a real problem. In Windows Server 2016, the two virtual machines still have their own configurations, but both depend on the shared VHDS file, which gives one place to update when the underlying storage changes. The VHD Set file solves the problems associated with coordinated updates to all the VM configurations by centralizing the VHD file paths in a single file. It also provides a stable file name to use in the UI or PowerShell. The VHD Set file can be used like any other VHD file: it can be queried, migrated, and mounted. The primary reason for VHDS is to support checkpoints on guest clusters.
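The "one place to update" idea behind the VHD Set file can be sketched in Python. This is an assumption-laden illustration rather than the actual file format: the `VhdSet` class, its fields, and all file names are hypothetical.

```python
class VhdSet:
    """Toy pointer file: a stable name that references the active disk file."""

    def __init__(self, name, active_disk):
        self.name = name
        self.active_disk = active_disk
        self.checkpoints = []   # checkpoint metadata only, no user data

    def checkpoint(self, new_active_disk):
        # Remember the previous disk and point at the new one.
        self.checkpoints.append(self.active_disk)
        self.active_disk = new_active_disk


shared = VhdSet("data.vhds", "data.avhdx")

# Each guest-cluster node stores only the stable VHD Set name.
vm_configs = {"node1": shared.name, "node2": shared.name}
configs_before = dict(vm_configs)

# A checkpoint changes the underlying file; only the VHD Set is updated.
shared.checkpoint("data_checkpoint1.avhdx")

print(vm_configs == configs_before)
print(shared.active_disk, shared.checkpoints)
```

Neither virtual machine configuration has to change when the underlying disk file changes, which is the coordination problem that a shared VHDX with per-VM configuration files could not avoid.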
Conclusion
If you manage Hyper-V backup, knowing the innovations that Windows Server 2016 enables for backup applications will help you make informed decisions about architectures, solutions and backups. Many thanks to Mr. Hyper-V, Ben Armstrong, for the information.