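I have set up CUDA/cuDNN runtime environments on Linux hosts many times, so I am recording the script commands here for reuse.
Important: first identify the exact GPU model, then confirm which driver, CUDA, and cuDNN releases are compatible with the host's Linux kernel.
Run all bash commands in this post with root privileges.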
1. NVIDIA driver installation
# WIKI: https://download.nvidia.com/XFree86/Linux-x86_64/375.20/README/installdriver.html
wget http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run && \
chmod u+x NVIDIA-Linux-x86_64-384.145.run && \
./NVIDIA-Linux-x86_64-384.145.run --silent --dkms --accept-license
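Once the installer finishes, it can be worth confirming that the loaded driver is new enough for the CUDA toolkit installed later in this post. The sketch below is only an illustration: `version_ge` and `check_driver` are hypothetical helpers, and the 384.81 threshold is an assumption taken from the driver version embedded in the CUDA 9.0.176 runfile name.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum: the driver version embedded in the CUDA runfile name.
REQUIRED_DRIVER="384.81"

# Prints a verdict for the driver version passed as $1.
check_driver() {
    if version_ge "$1" "$REQUIRED_DRIVER"; then
        echo "driver $1 is new enough (>= $REQUIRED_DRIVER)"
    else
        echo "driver $1 is older than $REQUIRED_DRIVER" >&2
        return 1
    fi
}

# Usage on a host where the driver is loaded:
#   check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```

Comparing with `sort -V` avoids treating 384.145 as smaller than 384.81, which a plain numeric or string comparison would do.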
2. Enable persistence mode
nvidia-smi -pm ENABLED # WIKI https://docs.nvidia.com/deploy/driver-persistence/index.html
3. Viewing GPU device information
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   34C    P0    37W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:1F:00.0 Off |                    0 |
| N/A   36C    P0    36W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
nvidia-smi topo --matrix # show the GPU/NIC topology
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  mlx5_1  mlx5_0  CPU Affinity
GPU0     X    PIX   PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU1    PIX    X    PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU2    PIX   PIX    X    PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU3    PIX   PIX   PIX    X    SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
GPU4    SYS   SYS   SYS   SYS    X    PIX   PIX   PIX   NODE    NODE    16-31,48-63
GPU5    SYS   SYS   SYS   SYS   PIX    X    PIX   PIX   NODE    NODE    16-31,48-63
GPU6    SYS   SYS   SYS   SYS   PIX   PIX    X    PIX   NODE    NODE    16-31,48-63
GPU7    SYS   SYS   SYS   SYS   PIX   PIX   PIX    X    NODE    NODE    16-31,48-63
mlx5_1  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE   X      PIX
mlx5_0  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  PIX      X
nvidia-smi --id=0 --format=csv --query-gpu=utilization.gpu,memory.used
utilization.gpu [%], memory.used [MiB]
0 %, 0 MiB
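The CSV query format is convenient for scripting. As a rough sketch, the hypothetical helper `idle_gpus` below reads rows of `index, utilization, memory used` from stdin and prints the indices of GPUs whose used memory is under a given limit in MiB. It assumes the query is run with `noheader,nounits` so that each row holds bare numbers.

```shell
# Hypothetical helper: reads "index, util, mem_used" rows on stdin and prints
# the index of every GPU whose used memory is below $1 MiB.
idle_gpus() {
    awk -F', *' -v limit="$1" '$3 + 0 < limit + 0 { print $1 }'
}

# Usage on a GPU host:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits | idle_gpus 100
```

Reading from stdin keeps the filter independent of `nvidia-smi`, so it can be tried on saved output as well.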
4. CUDA Toolkit installation
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run && \
chmod u+x cuda_9.0.176_384.81_linux-run && \
./cuda_9.0.176_384.81_linux-run --toolkit --silent --verbose
cat << EOF >> /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
EOF
ldconfig
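After `ldconfig` runs, one way to check that the CUDA libraries were picked up is to search the linker cache listing. The sketch below uses a hypothetical helper, `lib_in_cache`, which reads `ldconfig -p` output on stdin. The library names in the usage lines assume the default toolkit layout under /usr/local/cuda.

```shell
# Hypothetical helper: reads `ldconfig -p` output on stdin and succeeds when
# an entry whose name starts with $1 is present in the cache listing.
lib_in_cache() {
    grep -q "^[[:space:]]*$1"
}

# Usage:
#   ldconfig -p | lib_in_cache libcudart.so && echo "CUDA runtime is visible"
#   ldconfig -p | lib_in_cache libcupti.so  && echo "CUPTI is visible"
```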
cat << EOF >> /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:\$PATH
EOF
source /etc/profile
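To confirm that the profile script took effect in the current shell, a small check along these lines may help. `path_contains` is a hypothetical helper that tests whether a directory appears as a whole entry in a colon-separated path string.

```shell
# Hypothetical helper: succeeds when directory $1 is an exact entry of the
# colon-separated path list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   path_contains /usr/local/cuda/bin "$PATH" && nvcc --version
```

Wrapping both strings in colons keeps a longer directory such as /usr/local/cuda/bin2 from matching by accident.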
5. cuDNN installation
# Downloading cuDNN requires an NVIDIA account; accessing the download URL directly redirects to the login page.
dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb # installs to /usr/lib/x86_64-linux-gnu
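Because a cuDNN package is built against a specific CUDA release, it may be worth checking the package name against the installed toolkit before running `dpkg -i`. The sketch below is an illustration only: `cudnn_deb_cuda_version` is a hypothetical helper that assumes the file name contains a `+cuda<version>_` tag, as the package above does. The usage line assumes the toolkit wrote /usr/local/cuda/version.txt.

```shell
# Hypothetical helper: extracts the CUDA version tag from a cuDNN .deb name,
# e.g. libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb yields 9.0.
# Assumes the name contains "+cuda<version>_".
cudnn_deb_cuda_version() {
    name="${1##*/}"        # strip any leading directory
    tag="${name#*+cuda}"   # drop everything up to and including "+cuda"
    echo "${tag%%_*}"      # drop the architecture suffix, e.g. "_amd64.deb"
}

# Usage (assumes /usr/local/cuda/version.txt exists):
#   want="$(cudnn_deb_cuda_version libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb)"
#   grep -q "CUDA Version $want\." /usr/local/cuda/version.txt || echo "CUDA mismatch" >&2
```

After installation, `dpkg -s libcudnn7` reports the installed package version.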