Using Docker with GPU Support for Machine Learning Applications

Mihir Popat
4 min read · Jan 15, 2025


GPUs (Graphics Processing Units) have transformed machine learning by making it possible to train complex models quickly and efficiently. Docker, a cornerstone of modern software development, lets developers encapsulate their machine learning environments in portable, reproducible containers. Combined with GPU support, it becomes an essential tool for deploying, training, and running machine learning applications at scale.

In this article, we’ll take a deep dive into how to use Docker with GPU support for machine learning applications. We’ll explore the prerequisites, setup, and best practices for leveraging the power of Docker and GPUs in a seamless manner.


Why Use Docker with GPUs for Machine Learning?

Using Docker with GPU support offers several advantages for machine learning workflows:

  1. Reproducibility: Docker ensures that your code runs consistently, regardless of the underlying system, by packaging dependencies and configurations into a container.
  2. Scalability: Containers make it easy to scale workloads across multiple GPUs or machines, especially in production environments.
  3. Portability: Docker containers can be shared across teams or environments without worrying about compatibility issues.
  4. Simplified Setup: Docker eliminates the need for complex installations of GPU drivers and libraries by providing a pre-configured environment.

Prerequisites

Before you start using Docker with GPU support, make sure your system meets the following requirements:

1. Hardware

  • A system with an NVIDIA GPU. Docker’s GPU support is built around NVIDIA GPUs and the CUDA platform, enabled through the NVIDIA Container Toolkit.

2. Software

  • Operating System: Linux (Ubuntu or CentOS are commonly used). GPU support in Docker is primarily designed for Linux-based systems.
  • Docker: Install Docker Engine (version 19.03 or later).
  • NVIDIA Driver: Install the latest NVIDIA GPU driver for your system.
  • NVIDIA Container Toolkit: Install the NVIDIA Container Toolkit to enable GPU support in Docker containers.

3. CUDA and cuDNN

  • CUDA and cuDNN are required by most deep learning frameworks. When you use NVIDIA’s Docker images, these libraries ship inside the container, so only the GPU driver needs to be installed on the host.

Step-by-Step Guide: Setting Up Docker with GPU Support

Step 1: Install the NVIDIA GPU Driver

Install the correct NVIDIA GPU driver for your system. You can download the driver from the NVIDIA Driver Downloads page.

sudo apt update
sudo apt install nvidia-driver-<version>
sudo reboot
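
If you are unsure which driver version to choose, Ubuntu can recommend one for you. This sketch assumes the ubuntu-drivers-common package, which ships with most Ubuntu installs:

sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

The first command lists detected hardware with the recommended driver; the second installs that driver automatically.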

Verify the installation:

nvidia-smi

This command should display information about your GPU.

Step 2: Install Docker

Follow the official instructions to install Docker on your system. For Ubuntu, use the following commands:

sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io

Verify Docker is installed:

docker --version
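
Optionally, add your user to the docker group so you can run Docker commands without sudo (log out and back in for the change to take effect):

sudo usermod -aG docker $USER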

Step 3: Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit enables Docker to use GPUs.

  1. Add the NVIDIA package repository (note that apt-key is deprecated on newer Ubuntu releases; NVIDIA’s current documentation describes a signed-by keyring variant):

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

  2. Install the NVIDIA Container Toolkit and restart Docker:

sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker

  3. Verify GPU support in Docker (the CUDA tag below is one current example; old tags such as 11.0-base have been removed from Docker Hub):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If the container prints the same GPU table as nvidia-smi on the host, GPU support is working.
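
If the verification fails, recent toolkit versions include an nvidia-ctk helper that writes the Docker runtime configuration for you; it is worth running before debugging further:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker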

Step 4: Pull a Docker Image with GPU Support

NVIDIA provides Docker images for various machine learning frameworks like TensorFlow and PyTorch. These images come pre-installed with CUDA and cuDNN, making it easier to start using GPUs right away.

Example: Pull the TensorFlow image with GPU support:

docker pull tensorflow/tensorflow:latest-gpu
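
PyTorch images are published similarly. The tag below is illustrative; check Docker Hub for the current CUDA/cuDNN variants:

docker pull pytorch/pytorch:latest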

Step 5: Run a Docker Container with GPU Support

To run a container with GPU access, use the --gpus flag. For example:

docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

Inside the container, verify that TensorFlow detects the GPU:

import tensorflow as tf
# Should report at least one GPU if the container can see the device
print("GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Best Practices for Using Docker with GPUs

  1. Use Pre-Built Images: Save time by using official Docker images from NVIDIA, TensorFlow, or PyTorch, which come pre-configured for GPU support.
  2. Leverage Multi-Stage Builds: Use multi-stage builds in your Dockerfile to keep your final image lightweight (see the sketch after this list).
  3. Monitor GPU Usage: Tools like nvidia-smi can help monitor GPU usage inside a container.
  4. Isolate GPU Resources: Use the --gpus flag to allocate specific GPUs to a container (e.g., --gpus '"device=0,1"' to use GPU 0 and 1).
  5. Keep CUDA Versions Consistent: Ensure the CUDA version in your container is supported by the GPU driver on the host. Drivers are backward compatible with older CUDA versions, so the driver must be at least as new as the container’s CUDA release.
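
To illustrate point 2, here is a minimal multi-stage Dockerfile sketch: dependencies are installed in NVIDIA’s full devel image, and only the installed packages are copied into the slimmer runtime image. The image tags, requirements.txt, and train.py are placeholders for your own project:

# Stage 1: install Python dependencies using the full CUDA toolchain
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip
COPY requirements.txt .
RUN pip3 install --user -r requirements.txt

# Stage 2: copy only the installed packages into the lighter runtime image
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
WORKDIR /app
COPY train.py .
CMD ["python3", "train.py"]

Because both stages are based on Ubuntu 22.04, the Python version (and therefore the package path under /root/.local) matches between them.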

Use Cases

Here are some common use cases for running machine learning applications with Docker and GPUs:

  • Training Large Models: Train complex neural networks on multiple GPUs to accelerate training time.
  • Inference at Scale: Deploy Docker containers with GPU support to serve machine learning models in production for real-time predictions (see the Compose sketch after this list).
  • Experimentation: Use Docker containers to test and compare different machine learning frameworks without modifying the host system.
  • CI/CD Pipelines: Integrate GPU-enabled Docker containers into CI/CD pipelines for automated testing and training of models.
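
For the inference use case, Docker Compose (v1.28 and later) can reserve GPUs declaratively. A minimal sketch, assuming your own GPU-enabled serving image in place of the TensorFlow one:

services:
  model-server:
    image: tensorflow/tensorflow:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]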

Conclusion

Docker with GPU support is a game-changer for machine learning developers. It simplifies the deployment of complex environments, ensures reproducibility, and accelerates workflows by leveraging the power of GPUs. By following the steps outlined in this guide, you can set up Docker with GPU support and unlock new possibilities for training and deploying machine learning applications. Start experimenting today and take your machine learning projects to the next level!

Connect with Me on LinkedIn

Thank you for reading! If you found these DevOps insights helpful and would like to stay connected, feel free to follow me on LinkedIn. I regularly share content on DevOps best practices, interview preparation, and career development. Let’s connect and grow together in the world of DevOps!
