Critical flaw in NVIDIA Container Toolkit allows full host takeover
A critical vulnerability in NVIDIA Container Toolkit impacts all AI applications in a cloud or on-premise environment that rely on it to access GPU resources.
The security issue is tracked as CVE-2024-0132 and allows an adversary to perform container escape attacks and gain full access to the host system, where they could execute commands or exfiltrate sensitive information.
The particular library comes pre-installed in many AI-focused platforms and virtual machine images and is the standard tool for GPU access when NVIDIA hardware is involved.
According to Wiz Research, more than 35% of cloud environments are at risk of attacks exploiting the vulnerability.
Container escape flaw
The security issue CVE-2024-0132 received a critical-severity score of 9.0. It is a container escape problem that affects NVIDIA Container Toolkit 1.16.1 and earlier, and GPU Operator 24.6.1 and older.
The problem is a lack of secure isolation of the containerized GPU from the host, allowing containers to mount sensitive parts of the host filesystem or access runtime resources like Unix sockets for inter-process communication.
While most filesystems are mounted with “read-only” permissions, certain Unix sockets such as ‘docker.sock’ and ‘containerd.sock’ remain writable, allowing direct interactions with the host, including command execution.
An attacker can take advantage of this omission via a specially crafted container image and reach the host when executed.
Wiz says that such an attack could be carried out either directly, via shared GPU resources, or indirectly, when the target runs an image downloaded from a bad source.
Wiz researchers discovered the vulnerability and reported it to NVIDIA on September 1st. The GPU maker acknowledged the report a couple of days later, and released a fix on September 26th.
Impacted users are recommended to upgrade to NVIDIA Container Toolkit version 1.16.2 and NVIDIA GPU Operator 24.6.2.
Technical details for the exploiting the security issue remain private for now, to give impacted organizations time to mitigate the issue in their environments. However, the researchers are planning to release more technical information.