PyTorch Docker Image

Last updated in the early hours of April 8, 2024.

This post records how I build a PyTorch Docker image and also serves as my Dockerfile notes.

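1 Building the image

1.1 Finding a base image

First, go to the PyTorch release page on Docker Hub and find a base image. Each image tag states the PyTorch, CUDA, and cuDNN versions, and the images come in two variants: devel and runtime.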

runtime: Builds on the base and includes the CUDA math libraries, and NCCL. A runtime image that also includes cuDNN is available.
devel: Builds on the runtime and includes headers, development tools for building CUDA images. These images are particularly useful for multi-stage builds.

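In short, the runtime variant suits users who only need to execute CUDA code at run time, while the devel variant suits users who need to compile CUDA code during the build.

Pick one to match your setup. I use PyTorch 2.1.0, CUDA 11.8, and cuDNN 8, so I chose the following image: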

[2.1.0-cuda11.8-cudnn8-devel](https://hub.docker.com/layers/pytorch/pytorch/2.1.0-cuda11.8-cudnn8-devel/images/sha256-558b78b9a624969d54af2f13bf03fbad27907dbb6f09973ef4415d6ea24c80d9?context=explore)
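To make the tag naming concrete, here is a minimal Python sketch that splits a tag into its parts. The regular expression is inferred only from the one tag quoted above, and `ImageTag` and `parse_image_tag` are hypothetical helper names, so tags with a different shape will simply not match.

```python
import re
from typing import NamedTuple, Optional


class ImageTag(NamedTuple):
    pytorch: str
    cuda: str
    cudnn: str
    variant: str  # "devel" or "runtime"


# Pattern inferred from the single tag quoted in this post (an assumption);
# other tags on Docker Hub may use a different layout.
_TAG_RE = re.compile(
    r"^(?P<pytorch>\d+(?:\.\d+)*)"
    r"-cuda(?P<cuda>\d+(?:\.\d+)*)"
    r"-cudnn(?P<cudnn>\d+)"
    r"-(?P<variant>devel|runtime)$"
)


def parse_image_tag(tag: str) -> Optional[ImageTag]:
    """Return the parsed components, or None if the tag has another shape."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    return ImageTag(**match.groupdict())


if __name__ == "__main__":
    print(parse_image_tag("2.1.0-cuda11.8-cudnn8-devel"))
```

A helper like this is handy when a script has to check that the image matches the CUDA version installed on the host.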

1.2 Setting up the environment

Install Docker.

Install the nvidia-container-toolkit package. It gives containers access to NVIDIA GPUs, so GPU-accelerated applications can run inside them.

# Assumes NVIDIA's apt repository is already configured on the host
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify that nvidia-container-toolkit is installed correctly
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

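Start a container and look around inside. The --rm flag deletes the container automatically when it exits, which is what we want for a quick look:

`sudo docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel /bin/bash`

If you get this error:

`docker: Error response from daemon: Unknown runtime specified nvidia.`

then install nvidia-docker2:

`sudo apt-get install -y nvidia-docker2`

and restart Docker:

`sudo systemctl daemon-reload`
`sudo systemctl restart docker`

Running nvidia-smi inside the container shows GPU 0, the one assigned by the docker run command. The image already ships with Python 3.10.13, so start the Python interpreter and run a quick test: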

>>> import torch
>>> torch.cuda.is_available()
True
>>>
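The interactive check above can also be wrapped in a small script, which is easier to rerun after changing the image. This is a sketch rather than part of the original setup: `gpu_report` is a hypothetical helper name, and it only uses standard `torch` attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`).

```python
def gpu_report() -> dict:
    """Collect basic PyTorch/CUDA facts; still returns a dict if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        # List the name of every GPU that is visible inside the container.
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

With `NVIDIA_VISIBLE_DEVICES=0`, the `devices` list should contain a single entry.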

1.3 Writing the Dockerfile

1.3.1 Common instructions

Set the base image:

`FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel`

Install base packages. Choose what you need; this example installs only wget:

RUN apt update && \
    apt install -y \
        wget && \
    apt clean && \
    rm -rf /var/lib/apt/lists/*

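WORKDIR changes the working directory and sets the default location inside the image. You can switch to different directories to run different commands:

`WORKDIR /workspace`

Expose a port:

`EXPOSE 6000`

Install packages with pip; --no-cache-dir keeps pip from caching the downloaded packages:

`RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/ fastapi peft pydantic PyYAML sse_starlette transformers uvicorn tiktoken einops`

Copy a file into the image:

`COPY ./requirements.txt /workspace/requirements.txt`

Set the command that runs when the container starts:

`CMD ["gunicorn", "server:app", "-b", "0.0.0.0:6000"]`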
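The CMD line refers to `server:app`, meaning a module `server.py` that exposes a callable named `app`. The post does not show that file, so the following is a hypothetical stand-in: a minimal WSGI application, which is what gunicorn's default worker expects. A FastAPI (ASGI) app would instead need an ASGI-capable worker or uvicorn.

```python
# server.py (hypothetical; the original post does not include this file)


def app(environ, start_response):
    """Minimal WSGI callable that answers every request with "ok"."""
    body = b"ok\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With this file in the working directory, `gunicorn server:app -b 0.0.0.0:6000` would serve it on the port exposed above.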

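1.3.2 Example Dockerfile for the Qwen model

The Qwen repository already includes a Dockerfile example, but this one is different: it uses the PyTorch image, and both the model and the code are mounted from local directories instead of being downloaded into the image.

Download the Qwen model to /home/server/AI/models/Qwen-7B-Chat-231009
Clone the Qwen code to /home/server/AI/code/Qwen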
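Because the model and the code live on the host, a missing directory only shows up once the container starts. As a rough pre-flight check, one could verify both locations first. The helper name `missing_paths` is hypothetical; the only file it looks for is `openai_api.py`, which is the script the container runs.

```python
from pathlib import Path
from typing import List


def missing_paths(model_dir: str, code_dir: str) -> List[str]:
    """Return the required host paths that do not exist (empty list means OK)."""
    required = [
        Path(model_dir),                   # mounted into the container as the model directory
        Path(code_dir) / "openai_api.py",  # entry point started by the container
    ]
    return [str(path) for path in required if not path.exists()]


if __name__ == "__main__":
    problems = missing_paths(
        "/home/server/AI/models/Qwen-7B-Chat-231009",
        "/home/server/AI/code/Qwen",
    )
    print("all paths present" if not problems else f"missing: {problems}")
```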

FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel

RUN apt update -y && apt upgrade -y && apt install -y --no-install-recommends \
    git \
    git-lfs \
    python3 \
    python3-pip \
    python3-dev \
    wget \
    vim \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/ transformers==4.32.0 accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy

RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/ fastapi uvicorn openai pydantic sse_starlette

EXPOSE 23231

WORKDIR /workspace/code

CMD ["python", "openai_api.py", "--server-port", "23231", "--server-name", "0.0.0.0", "-c", "../model/"]

Build the image:

`docker build -t qwen-api .`

2 Using the image

Create a container from the image built with the Qwen Dockerfile example:

sudo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 -p 1212:23231 -v /home/server/AI/models/Qwen-7B-Chat-231009:/workspace/model -v /home/server/AI/code/Qwen:/workspace/code --name qwen-api qwen-api
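The command is long, and the port and volume mappings are easy to get backwards (host side first, container side second). As an illustration, the sketch below assembles the same command from named parameters. `build_run_command` is a hypothetical helper; it only builds a string and does not call Docker.

```python
import shlex
from typing import Dict


def build_run_command(
    image: str,
    name: str,
    gpu_ids: str,
    ports: Dict[int, int],    # host port -> container port
    volumes: Dict[str, str],  # host directory -> container directory
    detach: bool = False,
) -> str:
    """Assemble a `docker run` command line equivalent to the one above."""
    args = ["sudo", "docker", "run"]
    if detach:
        args.append("-d")  # run in the background
    args += ["--runtime=nvidia", "-e", f"NVIDIA_VISIBLE_DEVICES={gpu_ids}"]
    for host_port, container_port in ports.items():
        args += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes.items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    args += ["--name", name, image]
    return shlex.join(args)  # quotes arguments only where the shell needs it


if __name__ == "__main__":
    print(
        build_run_command(
            image="qwen-api",
            name="qwen-api",
            gpu_ids="0",
            ports={1212: 23231},
            volumes={
                "/home/server/AI/models/Qwen-7B-Chat-231009": "/workspace/model",
                "/home/server/AI/code/Qwen": "/workspace/code",
            },
        )
    )
```

Here host port 1212 maps to the container's port 23231, the port the server inside the container listens on.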

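To run the container in the background, add the -d flag; I left it out here so the logs are easy to see. When the container runs in the background, view the logs with:

`sudo docker logs qwen-api`

Then test the API with the following code: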

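from openai import OpenAI

# The client raises an error if no API key is set (argument or OPENAI_API_KEY).
# A placeholder is assumed to be enough here, since no authentication was configured.
client = OpenAI(base_url="http://127.0.0.1:1212/v1", api_key="none")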

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "你好"}
  ]
)

print(completion.choices[0].message)

The container prints the following log:


==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码，尤其如果你在9月25日前已 经开始使用Qwen-7B，千万注意不要使用错误代码和模型。
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:14<00:00,  1.85s/it]
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23231 (Press CTRL+C to quit)
<chat>
[('你好', '你好！有什么我能帮助你的吗？')]
你好
<!-- *** -->
你好！有什么我能帮助你的吗？
</chat>
INFO:     172.19.180.11:63288 - "POST /v1/chat/completions HTTP/1.1" 200 OK
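The last log line shows that the client sent a `POST` to `/v1/chat/completions`. For environments where the `openai` package is not installed, a standard-library sketch of the same request might look as follows. The payload mirrors the test code above; `build_chat_request` and `send` are hypothetical helper names, and the shape of the JSON response is not assumed.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, content: str, model: str = "gpt-3.5-turbo"
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request("http://127.0.0.1:1212/v1", "你好")
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment once the container is running
```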

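3 Uploading the image

Build the image with a name in the format <your-dockerhub-username>/<image-name>:<tag>:

`sudo docker build -t lissettecarlr/qwen-api-torch210-cuda118:v0.1 .`

Log in to Docker Hub, using sudo consistently so the credentials are stored for the same user that pushes:

`sudo docker login`

Push the image:

`sudo docker push lissettecarlr/qwen-api-torch210-cuda118:v0.1`

For later iterations: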

sudo docker build -t lissettecarlr/qwen-api-torch210-cuda118:v0.2 .
# docker tag is only needed to give an already-built image a new name without rebuilding;
# running it here with v0.1 as the source would point v0.2 back at the old image.
sudo docker push lissettecarlr/qwen-api-torch210-cuda118:v0.2
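If releases follow the `vMAJOR.MINOR` pattern used above, the next tag can be computed instead of typed by hand. This is a small sketch with a hypothetical helper name, `bump_minor`; it only manipulates the string and assumes that tag format.

```python
import re


def bump_minor(image_ref: str) -> str:
    """Return the image reference with the minor version increased by one."""
    match = re.fullmatch(r"(?P<repo>.+):v(?P<major>\d+)\.(?P<minor>\d+)", image_ref)
    if match is None:
        raise ValueError(f"expected <repo>:v<major>.<minor>, got {image_ref!r}")
    next_minor = int(match["minor"]) + 1
    return f"{match['repo']}:v{match['major']}.{next_minor}"


if __name__ == "__main__":
    print(bump_minor("lissettecarlr/qwen-api-torch210-cuda118:v0.1"))
```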

https://blog.kala.love/posts/e6563228/

Author: 久远·卡拉

Published: December 7, 2023
