What Nvidia Device Plugin Can Do for Machine Learning
Sciences et technologies

What Nvidia Device Plugin Can Do for Machine Learning

Generative AI is currently on the rise. For the Cloud Native Computing Foundation, Kubernetes is a natural choice. You still need to add the latest version of the NVIDIA Kubernetes device plugin. Explanations.

GPUs have become the hardware of choice for accelerating machine learning tasks. In this area, the NVIDIA CUDA platform has established itself as the dominant framework for GPU computing.

However, the Cloud Native Computing Foundation recognizes that running GPU-accelerated workloads in Kubernetes environments poses many challenges. This is where the NVIDIA Device Plugin comes into play.

Using the NVIDIA Device plugin for Kubernetes, available at open source under Apache licenseDevelopers and data scientists can focus on building and deploying their models without worrying about the underlying infrastructure. The latter integrates easily with Kubernetes.

How it works ?

The NVIDIA Device Plugin is a Kubernetes daemonset that makes it easy to manage GPU resources in a cluster. Its main function is Automatically display the number of GPUs on each nodemaking them visible and distributed by the Kubernetes scheduler.

This allows modules to request and use GPU resources in the same way as CPU and memory. Under the hood, the plugin communicates with the kubelet on each node, providing information about available GPUs and their capabilities. It also monitors the health of GPUs, ensuring they are running optimally and reporting any issues to Kubernetes.

Prerequisites and installation

Some prerequisites must be met: your GPU nodes must have the required NVIDIA drivers installed (version ~= 384.81), you need to install nvidia-container-toolkit (version >= 1.7.0) on each GPU node and configure nvidia- container. – Runtime as the default runtime for Docker or container, all with a minimum version of Kubernetes 1.10. For example, if you are using AWS EKS, these items will be processed by default when using GPU nodes.

The complete installation is described in this article (in English).

Hi, I’m laayouni2023