Checkpoint and Restore: are we there yet?
Project: CRIU / OpenVZ
Checkpoint/restore is a feature that freezes a set of running processes and saves their complete state to disk. This state can later be restored, so that the processes resume exactly as they were running before. The feature opens up a whole set of possibilities, such as live migration, fast startup of a huge application, or kernel updates without service interruption. While such functionality exists in, for example, the OpenVZ kernel, many attempts to merge it upstream have failed miserably, mostly because of code complexity.
We found a way to overcome this by implementing most of the required pieces in userspace, using the existing kernel APIs and extending them where necessary. This is what the Checkpoint and Restore in Userspace (CRIU) project is about.
The talk outlines the current state of the project, including:
- recent CRIU-related changes merged into the upstream kernel, and their uses beyond CRIU;
- current capabilities of the CRIU userspace tool, including a demo of Apache+MySQL checkpoint/restore;
- plans for the future.
The talk is of interest to system and distro developers, advanced users, and anyone interested in containers, virtualization, HA and HPC.
Pavel Emelyanov is a principal engineer at Parallels, leading the OpenVZ kernel and CRIU teams. He is a prolific upstream kernel contributor with 7 years of kernel development experience. Pavel holds a PhD in Applied Mathematics from the Moscow Institute of Physics and Technology.