Dockerized AI on Large Models With NLP and Transformers

19 May 2021, CPOL, 5 min read
In this article we run an inference model for NLP using models persisted on a Docker volume.
Here we’ll continue to tackle large models, this time for Natural Language Processing (NLP) tasks with PyTorch and Transformers.

Introduction

Container technologies, such as Docker, simplify dependency management and improve portability of your software. In this series of articles, we explore Docker usage in Machine Learning (ML) scenarios.

This series assumes that you are familiar with AI/ML, containerization in general, and Docker in particular.

In the previous article of this series, we ran inference on sample images with TensorFlow using a containerized Object Detection API environment. In this one, we'll continue to tackle large models, this time for Natural Language Processing (NLP) tasks with PyTorch and Transformers. You are welcome to download the code used in this article.

In subsequent articles of this series, we'll serve the inference model via a REST API. Next, we'll debug the REST API service running in a container. Finally, we'll publish the created container in the cloud using Azure Container Instances.

Containers and Large Models

Using containers for Machine Learning is rather straightforward as long as you work with relatively small models. The simplest approach is to treat models as part of your code and add them to the container image during build. This gets more complicated as the models grow larger. Quite often, models measure hundreds of megabytes or more. Adding so much weight to an image would increase its size and build time. Also, if you run your code in a horizontally scaled cluster, this additional weight is multiplied by the number of running containers.

As a general rule, you should not include in your image anything that can be easily shared between container instances. During development, shared files can be handled by mapping a local folder to a container volume. That’s what we did in the previous article for the Object Detection API model. In production, this may not be possible though, especially if you plan to run your containers in the cloud.

The preferred solution in such cases is to rely on Docker volumes. This approach has some drawbacks though. Persisting and sharing anything between containers introduces additional dependencies: the same container may behave differently depending on what is already stored in the persisted volume. Besides, you cannot store anything in the volume until the container is running (volumes are not available during the build).
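For illustration, creating a named volume with the plain Docker CLI and mounting it at run time could look like this (the volume, path, and image names below are placeholders, not the ones used later in this article):

$ docker volume create model_cache
$ docker run --rm -v model_cache:/home/appuser/.cache my_inference_image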

Dockerfile for Transformers

Hugging Face's Transformers is a very popular library for handling Natural Language Processing (NLP) tasks. It supports many modern NLP model architectures and ML frameworks, including both TensorFlow and PyTorch. We have already used TensorFlow a few times, so let's go with the PyTorch version for a change.

Our Dockerfile will be fairly simple:

FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime

ENV DEBIAN_FRONTEND=noninteractive

ARG USERNAME=mluser
ARG USERID=1000
RUN useradd --system --create-home --shell /bin/bash --uid $USERID $USERNAME \
 && mkdir /home/$USERNAME/.cache && chown -R $USERNAME /home/$USERNAME/.cache 

COPY requirements.txt /tmp/requirements.txt 
RUN pip install -r /tmp/requirements.txt \
 && rm /tmp/requirements.txt

USER $USERNAME
COPY --chown=$USERNAME ./app /home/$USERNAME/app
WORKDIR /home/$USERNAME/app

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

ENTRYPOINT ["python", "nlp.py"]

Apart from the base image, not much is different here from the previous article. The official PyTorch base image provides us with everything we need, and it is still smaller than the TensorFlow GPU one.

This time, we don’t need any additional system dependencies, so we can avoid the apt-get command. Note the mkdir /home/$USERNAME/.cache command in the first RUN statement. During the container execution, we’ll mount a Docker volume to this path for the downloaded PyTorch models.

We create this folder explicitly to enable setting access permissions when we build the image. These permissions will carry over to the volume mounted during the container execution.

The requirements.txt file is extremely compact as well. It contains a single line:

transformers==4.3.2

The most complex part of our solution lives in the nlp.py script. It is the entrypoint of our container and is available in the code download archive.
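The actual script ships with the download archive, but as a rough illustration, a question-answering entrypoint built on the Transformers pipeline API could look like the sketch below. The flag names match the run command used later in this article; everything else (model choice, supported tasks, error handling) is a simplified assumption, not the exact contents of nlp.py.

import argparse
from transformers import pipeline

def main():
    parser = argparse.ArgumentParser(description="Run a simple NLP task with Transformers")
    parser.add_argument("--task", default="qa", help="NLP task to run, for example 'qa'")
    parser.add_argument("--document", required=True, help="Text for the model to work on")
    parser.add_argument("--question", help="Question to ask about the document (qa task only)")
    args = parser.parse_args()

    if args.task == "qa":
        # The default question-answering model is downloaded on first use
        # and cached under ~/.cache, which we map to the mluser_cache volume.
        qa = pipeline("question-answering")
        result = qa(question=args.question, context=args.document)
        print("Answer: {} (score: {:.4f})".format(result["answer"], result["score"]))
    else:
        print("Unsupported task: {}".format(args.task))

if __name__ == "__main__":
    main()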

Composing Containers

To simplify volume creation and mounting, this time we'll use docker-compose to build and run our container. If you are using Docker Desktop, you already have it installed. With Docker Server on Linux, you may need to install it separately.

Note that, in recent Docker versions, you can use the docker compose command instead of docker-compose. However, to our surprise, we've noticed that these two commands didn't always produce the same results, so we'll stick with docker-compose in this article series.

Now, let’s create a short docker-compose.yml file in the same directory where our Dockerfile resides:

version: '3.7'
volumes:
  mluser_cache:
    name: mluser_cache
services:
  mld07_transformers: 
    build:
      context:  '.'
      dockerfile: 'Dockerfile'
    image: 'mld07_transformers'
    volumes:
      - mluser_cache:/home/mluser/.cache

We instruct docker-compose to do two things here: to create the mluser_cache volume and to mount it to the /home/mluser/.cache path when the container is running. Apart from that, the file defines the image name and the build context.
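For reference, the same volume mount expressed as a plain docker run call would look roughly like this (the arguments after the image name go to the container's entrypoint; the ellipses are placeholders):

$ docker run --rm -v mluser_cache:/home/mluser/.cache mld07_transformers --task="qa" --document="..." --question="..."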

Building Container

With docker-compose, the command to build the container will be slightly different from the one in the previous article:

$ docker-compose build --build-arg USERID=$(id -u)

With all the basic parameters included in the docker-compose.yml configuration, all we need to pass here is the USERID value. Because this time we won't mount a local folder as the container volume, the --build-arg USERID argument could safely be omitted. In fact, when running on Windows, you can always omit it. We only keep it here to ensure consistent permissions in the subsequent articles.
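On Windows, for example, the build boils down to:

$ docker-compose build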

Running Container

Now that the container is built, let’s check if it works. We’ll start with a question-answering task:

$ docker-compose run --user $(id -u) mld07_transformers \
--task="qa" \
--document="We have considered many names for my dog, such as: Small, Black or Buster. Finally we have called him Ratchet. It is quite an unusual name, but the dog's owner named Michael picked it." \
--question="What is the name of my dog?"

We were deliberately tricky here, scattering several candidate names throughout the document. The model handled it pretty well nonetheless:

(Image: container output showing the model's answer to the question)

The first time the script was executed, the required model was downloaded and stored in the .cache folder mapped to our mluser_cache volume.

To inspect the volume size, we can use the following command:

$ docker system df -v

The Local Volumes space usage section should display:

(Image: docker system df -v output showing the size of the mluser_cache volume)
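If you only want to see where Docker keeps this volume on the host, docker volume inspect prints its mount point:

$ docker volume inspect mluser_cache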

To make sure that the model data is persisted, we can run the same docker-compose run command as we did previously, with a slightly different result:

(Image: output of the second run, with no model download this time)

This time the model was already in the .cache folder, so it wasn’t downloaded again.

Feel free to experiment with other models for the different NLP tasks included in the downloaded nlp.py script. Note that each of these models needs to be downloaded the first time it is used, and some of them exceed 1 GB in size.
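For example, assuming nlp.py maps a task name such as "summarization" to the corresponding Transformers pipeline (check the script for the exact task names it supports), a run could look like this:

$ docker-compose run --user $(id -u) mld07_transformers \
--task="summarization" \
--document="<a longer text to summarize>"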

Summary

Our inference model for NLP tasks is working, and the downloaded models are saved to the persisted volume. In the next article, we'll modify our code to expose the same logic via a REST API service. Stay tuned!

This article is part of the series "Containerized AI and Machine Learning".

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

