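Notes:
(1) This post covers upgrading the NVIDIA graphics driver and installing CUDA 8.0 on Ubuntu 14.04.
(2) CUDA 7.5 was already installed, and nvidia-smi reported driver version 361.93. (Many blog posts say the old driver must be uninstalled first, but the upgrade worked for me without doing that. The same goes for disabling third-party drivers, e.g. blacklist nouveau: if an NVIDIA driver has been installed successfully before, nouveau should already be disabled.)
(3) The graphics cards are GeForce GTX TITAN X. To list them: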
$ lspci |grep VGA
02:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
06:00.0 VGA compatible controller: ASPEED Technology,Inc. ASPEED Graphics Family (rev 30)
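On a machine with several display adapters, like the one above with its onboard ASPEED chip, it can help to filter the lspci output down to the NVIDIA cards before upgrading. The following is a minimal sketch; the function name list_nvidia_gpus is made up for illustration, and it assumes nothing beyond the lspci line format shown above.

```shell
# Hypothetical helper: reads `lspci` output on stdin and prints only the
# VGA controllers whose description mentions NVIDIA.
list_nvidia_gpus() {
    # `|| true` keeps the exit status at 0 when no NVIDIA card is found.
    grep 'VGA compatible controller' | grep 'NVIDIA' || true
}

# Typical use on a real machine (lspci comes from the pciutils package):
#   lspci | list_nvidia_gpus
# Count the cards:
#   lspci | list_nvidia_gpus | wc -l
```

The function reads stdin rather than calling lspci itself, so it can also be tried on saved output from another machine.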
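1. Step one: install CUDA 8.0 with the commands below (taken from https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile). Once the NVIDIA repository has been added, installing CUDA also pulls in a matching graphics driver, which saves a lot of trouble.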
sudo apt-get update && sudo apt-get install wget -y --no-install-recommends
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_8.0.61-1_amd64.deb"
sudo dpkg -i cuda-repo-ubuntu1404_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
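In addition, once the NVIDIA repository is in use, the driver can be updated as follows (from http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation), which is more convenient than commands such as sudo apt-get install nvidia-375 nvidia-settings nvidia-prime: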
$ sudo apt-get install cuda-drivers # Ubuntu
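Once step one has finished, it may be worth confirming which toolkit release actually landed. The sketch below pulls the release number out of the `nvcc --version` banner; the helper name cuda_release is hypothetical, and the nvcc path assumes the default /usr/local/cuda install location.

```shell
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# the release number, e.g. "8.0" from a line such as
#   Cuda compilation tools, release 8.0, V8.0.61
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Typical use (nvcc is usually not on PATH right after the install):
#   /usr/local/cuda/bin/nvcc --version | cuda_release
```

If nothing is printed, either nvcc produced no banner or its format differs from what this sketch assumes.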
2. Step two: install cuDNN 5.1 (from https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile). Installing the newer cuDNN 6.0 requires registering and logging in on the NVIDIA site and downloading it manually.
CUDNN_URL="http://developer.download.nvidia.com/compute/redist/cudnn/v5.1/cudnn-8.0-linux-x64-v5.1.tgz"
wget ${CUDNN_URL}
sudo tar -xzf cudnn-8.0-linux-x64-v5.1.tgz -C /usr/local
rm cudnn-8.0-linux-x64-v5.1.tgz && sudo ldconfig
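Notes:
* Installing cuDNN really just adds the libcudnn.*.* library files under /usr/local/cuda-8.0/lib64/.
* When CUDA is installed, a symbolic link named cuda that points to cuda-8.0 is created automatically under /usr/local/, which is convenient.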
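Both notes can be checked from the shell. The sketch below reads the cuDNN version from the header that ships in the tarball; the helper name cudnn_version is hypothetical, and the default header path is an assumption based on unpacking into /usr/local as in step two.

```shell
# Hypothetical helper: prints MAJOR.MINOR.PATCHLEVEL read from a cudnn.h file.
# Pass a path as $1 to override the assumed default location.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$header"
}

# Typical checks on the installed machine:
#   readlink -f /usr/local/cuda          # should resolve to /usr/local/cuda-8.0
#   ls /usr/local/cuda/lib64/libcudnn*   # the library files added by step two
#   cudnn_version
```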
Troubleshooting: after CUDA and cuDNN had been installed, running nvidia-smi printed:
Failed to initialize NVML: Driver/library version mismatch
The versions do not match. Running dpkg -l | grep nvidia showed duplicate driver packages:
$ sudo dpkg -l |grep nvidia
ii  nvidia-352             375.88-0ubuntu1          amd64  Transitional package for nvidia-375
rc  nvidia-361             361.93.02-0ubuntu1       amd64  NVIDIA binary driver - version 361.93.02
ii  nvidia-375             384.90-0ubuntu0.14.04.1  amd64  Transitional package for nvidia-384
ii  nvidia-384             384.90-0ubuntu0.14.04.1  amd64  NVIDIA binary driver - version 384.90
ii  nvidia-384-dev         384.90-0ubuntu0.14.04.1  amd64  NVIDIA binary Xorg driver development files
ii  nvidia-modprobe        384.66-0ubuntu1          amd64  Load the NVIDIA kernel driver and create device files
rc  nvidia-opencl-icd-361  361.93.02-0ubuntu1       amd64  NVIDIA OpenCL ICD
ii  nvidia-opencl-icd-384  384.90-0ubuntu0.14.04.1  amd64  NVIDIA OpenCL ICD
ii  nvidia-prime           0.6.2                    amd64  Tools to enable NVIDIA's Prime
ii  nvidia-settings        384.66-0ubuntu1          amd64  Tool for configuring the NVIDIA graphics driver
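The first column of this listing matters: ii marks a package that is installed, while rc marks one that has been removed with only its configuration files left behind. As a rough sketch, the listing can be split by state like this; the helper name nvidia_pkgs_in_state is hypothetical.

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the names
# of nvidia* packages whose state column equals the given value (ii, rc, ...).
nvidia_pkgs_in_state() {
    awk -v st="$1" '$1 == st && $2 ~ /^nvidia/ { print $2 }'
}

# Typical use:
#   dpkg -l | nvidia_pkgs_in_state rc   # removed, config files remain
#   dpkg -l | nvidia_pkgs_in_state ii   # currently installed
```

Packages that show up only under rc no longer provide a driver, so they are unlikely to be the source of a version mismatch.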
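After removing the extra driver packages (without the --purge option; there is no need to wipe everything completely every time), nvidia-smi still reported the version mismatch: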
sudo apt-get remove nvidia-352
sudo apt-get remove nvidia-361
sudo apt-get remove nvidia-375
sudo apt-get remove nvidia-opencl-icd-361
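(* In hindsight, the cause was most likely the missing reboot rather than leftover old drivers. When I tried to remove nvidia-361, I was told that the driver was no longer installed, which matches its rc status in the dpkg listing: removed, with only configuration files left. Presumably installing CUDA 8.0 uninstalled the old driver and installed the new one automatically, but the new driver only takes effect after a reboot. Be very careful with autoremove when deleting packages!)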
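One way to test the reboot explanation before actually rebooting is to look at the kernel module that is currently loaded. The mismatch message generally appears when that module is older than the user-space libraries on disk. The sketch below extracts the loaded module version from /proc/driver/nvidia/version; the helper name loaded_nvidia_module_version is hypothetical, and the parsing assumes the usual "NVRM version: ... Kernel Module  <version>  <date>" line.

```shell
# Hypothetical helper: reads the contents of /proc/driver/nvidia/version on
# stdin and prints the version of the kernel module that is loaded right now.
loaded_nvidia_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Typical use:
#   loaded_nvidia_module_version < /proc/driver/nvidia/version
# Compare the result with the installed driver package:
#   dpkg -l | grep 'nvidia-384 '
```

If the loaded module still reports the 361 series while dpkg lists 384.90 as installed, a reboot is the straightforward way to bring the two back in line.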
Finally, after rebooting the system, nvidia-smi shows driver version 384.90; CUDA and cuDNN were updated successfully.
$ nvidia-smi
Fri Nov 10 11:07:39 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   51C    P0    75W / 250W |      0MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   58C    P0    67W / 250W |      0MiB / 12207MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
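For a scripted check instead of reading the table by eye, nvidia-smi can print selected fields as CSV. The sketch below verifies that every GPU reports the expected driver version; the helper name all_gpus_on_driver is hypothetical, and it assumes rows in the form "index, name, driver_version".

```shell
# Hypothetical helper: reads "index, name, driver_version" CSV rows on stdin
# and succeeds only if there is at least one row and every row reports the
# driver version given as $1.
all_gpus_on_driver() {
    awk -F', *' -v want="$1" '
        # Appending "" forces a string comparison instead of a numeric one.
        { rows++; if (($3 "") != (want "")) bad++ }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}

# Typical use:
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader \
#       | all_gpus_on_driver 384.90 && echo "driver OK"
```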