Nvidia (10.0) for TensorFlow & PyTorch on Fedora 30
Martin Andrews (@mdda123)
Fedora 30 now has Python 3.7, so let's rebuild our virtualenv :
sudo dnf install python3-virtualenv
virtualenv --system-site-packages -p python3.7 env37
. env37/bin/activate
pip3 install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install https://download.pytorch.org/whl/cu100/torchvision-0.3.0-cp37-cp37m-linux_x86_64.whl
pip install tf-nightly-gpu-2.0-preview # 8hrs old...
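As a quick sanity check that env37 is using the right interpreter and actually picked up the cp37/cu100 wheels (package names assumed from the installs above) :
python --version                           # Should report Python 3.7.x
pip list | grep -E -i 'torch|tf-nightly'   # torch 1.1.0, torchvision 0.3.0, tf-nightly-gpu-2.0-preview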
BUT ... TF hasn't got to cuda 10.1 yet
Unfortunately, once again, TensorFlow goes against the grain, whereas the negativo repo wants to keep me up-to-date (the latest cuda there is v10.1).
So: Reinstall cuda and cudnn (but leave negativo in charge of the Nvidia driver, since that process, including kernel recompilation, etc., works well).
First, get rid of the existing cuda files (all of the below should be done as root) :
dnf remove cuda
# Which should remove the following :
#/usr/lib64/libcuda.so.1
#/usr/lib64/libcuda.so.390.12
#/usr/lib64/pkgconfig/cuda.pc
#/usr/lib64/pkgconfig/cudart.pc
#/usr/lib64/libcudart.so.10.1.168
#/usr/include/cuda/cuda.h
Download the local installer, and use it:
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
sh cuda_10.0.130_410.48_linux --override # Ignore gcc version complaints
Which (following the prompts) gives:
Do you accept the previously read EULA?
accept/decline/quit: accept
You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: yes
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: no
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: yes
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: yes
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: no
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Not Selected
Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_6715.log
# The toolkit install should now have put the runtime libraries in place :
#/usr/local/cuda-10.0/lib64/libcudart.so
#/usr/local/cuda-10.0/lib64/libcudart.so.10.0
#/usr/local/cuda-10.0/lib64/libcudart_static.a
#/usr/local/cuda-10.0/lib64/libcudart.so.10.0.130
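As the installer summary says, the toolkit binaries live in /usr/local/cuda-10.0/bin, so it's worth putting them on the PATH (a bash shell and ~/.bashrc are assumed here; the library path gets handled further down via /etc/ld.so.conf rather than LD_LIBRARY_PATH) :
# As a regular user :
echo 'export PATH=/usr/local/cuda-10.0/bin:$PATH' >> ~/.bashrc
. ~/.bashrc
nvcc --version   # Should report release 10.0, V10.0.130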
Also obtain cudnn (which doesn't work via direct download, since Nvidia requires a login...) :
# After downloading in a browser as a regular user :
mv /home/username/Downloads/cudnn-10.0-linux-x64-v7.6.1.34.tgz .
tar -xzf cudnn-10.0-linux-x64-v7.6.1.34.tgz
# And move over the files parallel to the libcudart.so ones above :
cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/
cp cuda/lib64/libcudnn.so.7.6.1 /usr/local/cuda-10.0/lib64/
ln -s /usr/local/cuda-10.0/lib64/libcudnn.so.7.6.1 /usr/local/cuda-10.0/lib64/libcudnn.so.7
ln -s /usr/local/cuda-10.0/lib64/libcudnn.so.7 /usr/local/cuda-10.0/lib64/libcudnn.so
cp cuda/lib64/libcudnn_static.a /usr/local/cuda-10.0/lib64/
# So that :
ls -l /usr/local/cuda-10.0/lib64/ | grep cudnn
# Looks like :
#lrwxrwxrwx. 1 root root 13 Jun 20 06:59 libcudnn.so -> libcudnn.so.7
#lrwxrwxrwx. 1 root root 17 Jun 20 06:59 libcudnn.so.7 -> libcudnn.so.7.6.1
#-rwxrwxr-x. 1 root root 390137496 Jun 20 06:04 libcudnn.so.7.6.1
#-rw-rw-r--. 1 root root 389213742 Jun 20 06:04 libcudnn_static.a
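To double-check which version the header itself claims (the version macros are assumed to live in cudnn.h, as they do for the 7.x series) :
grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda-10.0/include/cudnn.h
# Expect : MAJOR 7, MINOR 6, PATCHLEVEL 1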
Finally, let the machine know that there's a new cuda in town :
echo '/usr/local/cuda-10.0/lib64' >> /etc/ld.so.conf
ldconfig
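A quick way to confirm that the linker now sees everything :
ldconfig -p | grep -E 'libcudart|libcudnn'
# Should list the 10.0 / 7.6.1 libraries installed above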
Test that it works
Go into the python within the v3.7 virtualenv already set up, and :
import torch
#dtype = torch.FloatTensor # Use this to run on CPU
dtype = torch.cuda.FloatTensor # Use this to run on GPU
a = torch.Tensor( [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]).type(dtype)
b = torch.Tensor( [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]).type(dtype)
c = a.mm(b)
# As usual ...
print(c) # matrix-multiply (should state : device='cuda:0')
print(c.device) # Hope for : 'cuda:0'
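# (Optional extra check) PyTorch can also report the GPU directly :
print(torch.cuda.is_available())      # Hope for : True
print(torch.cuda.get_device_name(0))  # Should name the installed card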
# And now tensorflow (has to be second, because it's "greedy")
# See : https://www.tensorflow.org/beta/guide/using_gpu
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], shape=[2, 3], name='a')
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Eager mode FTW!
print(c)
print(c.device) # Hope for : /job:localhost/replica:0/task:0/device:GPU:0
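# And ask TensorFlow directly which GPUs it can see
# (tf.config.experimental.* is the TF 2.0-preview API, so it may still change) :
print(tf.config.experimental.list_physical_devices('GPU'))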
Add a few more libraries (according to taste, etc.) :
pip install jupyter