0. Download the driver
The official NVIDIA driver pages, where you can choose the driver version to download:
- https://www.nvidia.com/download/index.aspx
- https://www.nvidia.com/en-gb/drivers/
GPU: Tesla T4, CUDA Toolkit 11.4
https://www.nvidia.com/download/driverResults.aspx/225022/en-us/
Version: 470.256.02
Release Date: 2024.6.4
Operating System: Linux 64-bit
CUDA Toolkit: 11.4
Language: English (US)
File Size: 260.21 MB
Download URL: https://us.download.nvidia.com/tesla/470.256.02/NVIDIA-Linux-x86_64-470.256.02.run
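The download URL above embeds the version number twice, so it can be built from the version string. This is a minimal sketch; the helper name driver_url is hypothetical, and it is an assumption that other Tesla driver releases follow the same URL pattern as 470.256.02.

```shell
# Build the download URL for a Tesla driver release.
# Assumption: other versions use the same URL layout as 470.256.02.
driver_url() {
    version="$1"
    printf 'https://us.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}
```

For example, wget "$(driver_url 470.256.02)" would fetch the installer listed above.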
1. Install the NVIDIA driver
1.1 Download the driver
Download page: https://www.nvidia.co.uk/Download/driverResults.aspx/188830/en-uk
Choose the driver version that matches your GPU model and operating system version.
Version: 510.73.08
Release Date: 2022.5.23
Operating System: Linux 64-bit
CUDA Toolkit: 11.6
Language: English (UK)
File Size: 314.18 MB
1.2 Disable the system's default driver (nouveau)
Create the file /etc/modprobe.d/blacklist-nouveau.conf and add the following lines to it:
blacklist nouveau
options nouveau modeset=0
Re-generate initramfs.
$ sudo dracut --force
Then reboot the system:
reboot
Reference: https://support.huawei.com/enterprise/en/doc/EDOC1100165479/93fe5683/how-to-disable-the-nouveau-driver-for-different-linux-systems
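The blacklist step in 1.2 can be sketched as a small helper. The function name write_nouveau_blacklist is hypothetical; the optional path argument exists only so the helper can be tried on a scratch file before touching /etc.

```shell
# Write the nouveau blacklist file described in section 1.2.
# With no argument it targets the real location and must run as root.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}
```

As root, the sequence would then be write_nouveau_blacklist, dracut --force, reboot. After the reboot, lsmod | grep nouveau should print nothing if the module is no longer loaded.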
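1.3 Install the new driver
- Install gcc
yum install -y gcc
- Install kernel-devel
yum install -y kernel-devel
- Install the driver. You may need to specify the location of the kernel source: on some systems the installer script finds it by itself, on others it cannot and the path must be given manually.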
sh NVIDIA-Linux-x86_64-510.73.08.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.66.1.el7.x86_64/
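Rather than typing the kernel release by hand, the path can be derived from uname -r. This is a minimal sketch; the helper name kernel_source_path is hypothetical, and it assumes kernel-devel installs its tree under /usr/src/kernels/<release>, as in the command above.

```shell
# Print the kernel source directory for a given kernel release.
# Defaults to the running kernel as reported by `uname -r`.
# Assumption: kernel-devel places its tree under /usr/src/kernels/<release>.
kernel_source_path() {
    release="${1:-$(uname -r)}"
    printf '/usr/src/kernels/%s\n' "$release"
}
```

The installer could then be run as sh NVIDIA-Linux-x86_64-510.73.08.run --kernel-source-path="$(kernel_source_path)". If that directory does not exist, the installed kernel-devel package probably does not match the running kernel; installing "kernel-devel-$(uname -r)" with yum is one way to align them, provided the repository still carries that version.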
1.4 Check the driver status
Run nvidia-smi to list the status of the GPUs in the system.
$ nvidia-smi
Wed Jun  8 09:26:14 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:65:00.0 Off |                    0 |
| N/A   68C    P0    33W /  70W |   4036MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:66:00.0 Off |                    0 |
| N/A   60C    P0    30W /  70W |      0MiB / 15360MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
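The table is meant for reading, not for scripts. A minimal sketch of a machine-readable check follows, assuming the --query-gpu and --format=csv,noheader,nounits options of nvidia-smi are available with this driver; the helper name gpus_in_use is hypothetical. It reads CSV lines of the form "index, name, memory.used, memory.total" from standard input and prints the index of every GPU with memory in use.

```shell
# Read "index, name, memory.used, memory.total" CSV lines on stdin
# and print the index of each GPU whose used memory is above zero.
gpus_in_use() {
    awk -F', ' '$3 > 0 { print $1 }'
}
```

On a live system it would be fed with nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv,noheader,nounits | gpus_in_use. Reading from standard input keeps the filter usable on saved output as well.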
2. Using the nvidia-smi command
nvidia-smi -L
- Lists the GPUs in the system together with their UUIDs
$ nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-b0a15bb5-8865-71d6-dcc5-5bd333c0e6ab)
GPU 1: Tesla T4 (UUID: GPU-685d0312-46ee-10cc-f380-78efdb9d78a1)
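To use the UUIDs in a script, the listing can be reduced to "index UUID" pairs. This is a minimal sketch that assumes the line layout shown above; the helper name gpu_uuids is hypothetical.

```shell
# Read `nvidia-smi -L` output on stdin and print "<index> <UUID>" per GPU.
# Lines that do not match the expected layout are skipped.
gpu_uuids() {
    sed -n 's/^GPU \([0-9]*\): .*(UUID: \(.*\))$/\1 \2/p'
}
```

Typical use is nvidia-smi -L | gpu_uuids. A UUID obtained this way can be placed in CUDA_VISIBLE_DEVICES to pin a process to one card regardless of enumeration order.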