Rethinking CPU hotplug for elegance and efficiency
Project: The Linux Kernel
Linux supports "hotplugging" CPUs, a term used to denote the ability to logically online or offline processors on the fly (i.e., on a running system). This feature is provided by the CPU hotplug infrastructure in the Linux kernel.
CPU hotplug now serves many more use cases than it was originally designed for. Some of the important ones are RAS (offlining malfunctioning CPUs on a running system), power management (hotplugging CPUs based on load/utilization for aggressive power savings), RT (isolating RT workloads), suspend/resume (which uses CPU hotplug during both suspend and resume), SMP booting, and dynamic partition resizing on the powerpc and s390 architectures.
The current design and implementation of CPU hotplug in Linux has run into significant challenges owing to these newer and wider use cases, which were not foreseen earlier. Among them are less-than-expected performance in several new use cases, race conditions and deadlock possibilities in new scenarios, and unacceptable latency for RT workloads. In addition, the CPU hotplug design and code have slowly grown into a convoluted mess that is hard to maintain and improve. This has become a significant problem: it has hindered progress on several occasions [1][2].
To summarize, there is a need to redesign CPU hotplug for both elegance and efficiency, which can help us reap benefits such as faster booting, faster suspend/resume, and more efficient power management. This talk will present an overview of the current use cases and challenges with CPU hotplug and discuss some of the new designs being explored in the community for improving the CPU hotplug framework.
----
[1]. There is a lot of duplicated CPU hotplug code in almost all architectures that support SMP, and worse, these implementations have plenty of bugs. This caused boot failures on several architectures when a simple change was made to how the scheduler dealt with CPU hotplug.
[2]. Recently, a patch was posted in the community to speed up booting by bringing up CPUs asynchronously with the rest of the kernel initialization. This caused boot failures on some architectures because the CPU hotplug code was simply not ready to handle such a scenario.
Srivatsa Bhat
Srivatsa S. Bhat joined the IBM Linux Technology Center, Bangalore, in July 2011 as a fresher, after graduating from NITK Surathkal the same year. As a kernel developer, he has been working on energy management features in the Linux kernel such as Suspend-to-RAM (suspend/resume), and has worked on improving the stability, reliability, and performance of several related kernel subsystems, including the freezer, CPU hotplug, and the scheduler.
Currently, he is working on redesigning the CPU hotplug infrastructure and is developing a generic SMP booting framework to consolidate common code across various architectures.


