Configuring the runtime environment on an NVIDIA GPU host
Date: 2023-07-08

I have set up the CUDA/cuDNN runtime environment on Linux hosts many times, so I am recording the script commands here for reuse.

Important: first identify the exact GPU model, then confirm which driver, CUDA, and cuDNN releases are compatible with the host's Linux kernel.
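A minimal sketch of how the GPU model and kernel version could be collected before choosing releases. It assumes `lspci` (from pciutils) is installed; the helper name `nvidia_devices` is made up for this example.

```shell
# nvidia_devices: filter `lspci` output on stdin down to NVIDIA display devices.
nvidia_devices() {
  grep -i 'nvidia' | grep -iE 'VGA compatible controller|3D controller'
}

# The kernel version is what the driver's kernel module has to build against.
echo "kernel: $(uname -r)"

if command -v lspci >/dev/null 2>&1; then
  lspci | nvidia_devices || echo "no NVIDIA GPU found by lspci" >&2
else
  echo "lspci not found; install pciutils to list PCI devices" >&2
fi
```

The model name printed by `lspci` is what to look up in NVIDIA's driver and CUDA compatibility notes.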

Run all bash commands in this article as root.
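If these commands are collected into a script, a guard like the following could stop it early when it is not run as root. This is only a sketch; the function names `is_root` and `require_root` are made up for this example.

```shell
# is_root [uid]: succeeds when the given uid (default: current effective uid) is 0.
is_root() {
  local uid="${1:-$(id -u)}"
  [ "$uid" -eq 0 ]
}

# require_root: print an error and return non-zero unless running as root.
require_root() {
  if ! is_root; then
    echo "this script must be run as root" >&2
    return 1
  fi
}
```

Putting `require_root || exit 1` at the top of a script makes it fail fast instead of halfway through an installation.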

1. Installing the NVIDIA driver

# WIKI: https://download.nvidia.com/XFree86/Linux-x86_64/375.20/README/installdriver.html
wget http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run && \
chmod u+x NVIDIA-Linux-x86_64-384.145.run && \
./NVIDIA-Linux-x86_64-384.145.run --silent --dkms --accept-license
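After the installer finishes, it may be worth confirming that the driver actually loaded and is recent enough for the CUDA release you plan to install. The sketch below assumes GNU `sort -V` is available. The 384.81 floor is an assumption read off the CUDA 9.0 runfile name used later in this post. `version_ge` is a hypothetical helper name.

```shell
# version_ge A B: succeeds when dotted version A >= B.
# Relies on `sort -V` (GNU coreutils version sort), so 384.145 > 384.81.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_DRIVER="384.81"   # assumption: taken from the CUDA runfile name in this post

if command -v nvidia-smi >/dev/null 2>&1; then
  # The query prints one line per GPU; the driver version is the same on all.
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "driver $installed is new enough (>= $MIN_DRIVER)"
  else
    echo "driver $installed is older than $MIN_DRIVER" >&2
  fi
else
  echo "nvidia-smi not found; the driver is probably not installed" >&2
fi
```

A plain string or decimal comparison would rank 384.81 above 384.145, which is why the version sort is used here.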

2. Enabling persistence mode

nvidia-smi -pm ENABLED # WIKI https://docs.nvidia.com/deploy/driver-persistence/index.html
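A setting made with `nvidia-smi -pm` does not survive a reboot, so a check like the one below could be run at boot or before a job starts. It is a sketch built on the `persistence_mode` query field of `nvidia-smi`; `persistence_disabled` is a hypothetical helper name.

```shell
# persistence_disabled: read "index, persistence_mode" CSV rows on stdin and
# print the index of every GPU whose persistence mode is not "Enabled".
persistence_disabled() {
  awk -F', *' 'NF >= 2 && $2 != "Enabled" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  off="$(nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
         | persistence_disabled)"
  if [ -n "$off" ]; then
    echo "persistence mode is off on GPU index(es):" $off
  else
    echo "persistence mode is enabled on all GPUs"
  fi
fi
```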

3. Inspecting GPU device information

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE…    Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   34C    P0    37W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE…    Off  | 00000000:1F:00.0 Off |                    0 |
| N/A   36C    P0    36W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

nvidia-smi topo --matrix # show topology information

        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  mlx5_1  mlx5_0  CPU Affinity
GPU0    X     PIX   PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU1    PIX   X     PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU2    PIX   PIX   X     PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU3    PIX   PIX   PIX   X     SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU4    SYS   SYS   SYS   SYS   X     PIX   PIX   PIX   NODE    NODE    16-31,48-63
GPU5    SYS   SYS   SYS   SYS   PIX   X     PIX   PIX   NODE    NODE    16-31,48-63
GPU6    SYS   SYS   SYS   SYS   PIX   PIX   X     PIX   NODE    NODE    16-31,48-63
GPU7    SYS   SYS   SYS   SYS   PIX   PIX   PIX   X     NODE    NODE    16-31,48-63
mlx5_1  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  X       PIX
mlx5_0  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  PIX     X
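In this matrix, going by the legend that `nvidia-smi` prints with it, PIX means the two devices are separated by at most one PCIe bridge, NODE means the path crosses PCIe host bridges within one NUMA node, and SYS means it also crosses the interconnect between NUMA nodes. The sketch below lists the closest (PIX) peers of one device. It assumes plain-text output laid out like the matrix above; `pix_peers` is a hypothetical helper name.

```shell
# pix_peers NAME: read `nvidia-smi topo --matrix` output on stdin and print the
# devices that NAME reaches through a PIX link, space separated.
pix_peers() {
  awk -v dev="$1" '
    NF == 0 { next }                 # skip blank lines
    !seen_header {                   # first non-blank line holds the column names
      for (i = 1; i <= NF; i++) col[i] = $i
      seen_header = 1
      next
    }
    $1 == dev {                      # data rows carry the row name in field 1
      out = ""
      for (i = 2; i <= NF; i++)
        if ($i == "PIX") out = out (out == "" ? "" : " ") col[i - 1]
      print out
    }'
}
```

For example, `nvidia-smi topo --matrix | pix_peers GPU4` on the host above would list GPU5, GPU6 and GPU7, which is useful when deciding which GPUs to group in one multi-GPU job.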

nvidia-smi --id=0 --format=csv --query-gpu=utilization.gpu,memory.used

utilization.gpu [%], memory.used [MiB]
0 %, 0 MiB
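The same query interface is handy in scripts. As a sketch, the helper below picks the GPU with the least memory in use (ties broken by utilization) from `csv,noheader,nounits` output; `least_loaded_gpu` is a hypothetical name.

```shell
# least_loaded_gpu: read "index, utilization, memory_used" rows (no header, no
# units) on stdin and print the index of the GPU using the least memory.
least_loaded_gpu() {
  sort -t, -k3,3n -k2,2n | head -n 1 | cut -d, -f1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  gpu="$(nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
                    --format=csv,noheader,nounits | least_loaded_gpu)"
  echo "least loaded GPU: $gpu"
fi
```

The result could be exported as `CUDA_VISIBLE_DEVICES`. Note that `nvidia-smi` indices follow PCI bus order, so setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` as well keeps the two numbering schemes aligned.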

4. Installing the CUDA Toolkit

wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run && \
chmod u+x cuda_9.0.176_384.81_linux-run && \
./cuda_9.0.176_384.81_linux-run --toolkit --silent --verbose
cat << EOF >> /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
EOF
ldconfig
cat << EOF >> /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:\$PATH
EOF
source /etc/profile
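A quick sanity check after these steps might look like the sketch below. It only confirms that `nvcc` is on `PATH` and that the linker cache contains the CUDA runtime library; `cuda_release` is a hypothetical helper name.

```shell
# cuda_release: read `nvcc --version` output on stdin and print the release,
# e.g. "9.0" from "Cuda compilation tools, release 9.0, V9.0.176".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc reports CUDA $(nvcc --version | cuda_release)"
else
  echo "nvcc is not on PATH; open a new login shell or source /etc/profile.d/cuda.sh" >&2
fi

# Confirm the dynamic linker cache knows about the CUDA runtime library.
ldconfig -p 2>/dev/null | grep libcudart \
  || echo "libcudart not found in the linker cache" >&2
```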

5. Installing cuDNN

# Downloading cuDNN requires an NVIDIA account. Visiting the URL below directly redirects to the login page.

https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/Ubuntu16_04-x64/libcudnn7_7.0.5.15-1+cuda9.0_amd64

dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb # installs into /usr/lib/x86_64-linux-gnu
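Because each cuDNN package is built for one CUDA release, it can help to confirm that the installed package matches the toolkit. The sketch below reads the `+cudaX.Y` tag from the Debian package version. It assumes the package name `libcudnn7` from the command above; `cudnn_pkg_cuda` is a hypothetical helper name.

```shell
# cudnn_pkg_cuda VERSION: print the CUDA release a libcudnn package version was
# built for, e.g. "7.0.5.15-1+cuda9.0" -> "9.0"; prints nothing without a tag.
cudnn_pkg_cuda() {
  case "$1" in
    *+cuda*) echo "${1##*+cuda}" ;;
  esac
}

if command -v dpkg-query >/dev/null 2>&1; then
  # Empty when the package is not installed.
  ver="$(dpkg-query -W -f='${Version}' libcudnn7 2>/dev/null)" || ver=""
  if [ -n "$ver" ]; then
    echo "libcudnn7 $ver (built for CUDA $(cudnn_pkg_cuda "$ver"))"
  else
    echo "libcudnn7 is not installed" >&2
  fi
fi
```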
