Nvidia (11.3) for TensorFlow & PyTorch while upgrading Fedora 34
- Authors
- Name
- Martin Andrews
- @mdda123
Upgrading to Fedora 34 from Fedora 32 (or Fedora 33) (while being careful about cuda versions)
Suppose you already have a working Fedora 32 (or 33) machine, and want to upgrade: as always, the Nvidia drivers are going to be painful!
Once again, the negativo repo wants to keep Fedora too up-to-date (the latest cuda is v11.4) for current builds of TF and PyTorch.
Fortunately, cuda v11.3 works (confirmed) for TF and PyTorch, so we can use the negativo repo for everything, if we are careful about how it gets installed.
Install the negativo repo
Get the negativo Nvidia repo if you haven't already :
dnf config-manager --add-repo=http://negativo17.org/repos/fedora-nvidia.repo
And (only if you need it for display ahead of the upgrade) install the nvidia driver :
dnf install nvidia-driver
Remove cuda ahead of the upgrade
NB: Assuming we're still on Fedora 32 (or 33).
If you're currently below Fedora 32, you'll need to upgrade to Fedora 32 before going further: it seems that a jump of more than two releases is too big a gap to reach Fedora 34 in one go.
Remove the current cuda (since the upgrade process will try to upgrade it to the latest version, which is 'too far'):
dnf remove nvidia-driver-cuda nvidia-driver-cuda-libs cuda cuda-cudnn
Standard Fedora 34 upgrade steps
dnf install dnf-plugin-system-upgrade --best
# Large download (to latest Fedora 32) :
dnf --refresh upgrade
shutdown -r now
# Reboots back into latest Fedora 32
# Very Large download (to latest Fedora 34)
dnf system-upgrade download --refresh --releasever=34
# Hold-your-breath reboot (will install 0%-100% hands-free ~45mins) :
dnf system-upgrade reboot
# Reboots via Fedora 32 into 'upgrade counter'
# Then reboots into new Fedora 34
Add back the correct cuda after the upgrade
See the rpm package lists to see where the version number information came from.
Install the correct cuda packages:
dnf install nvidia-driver-cuda # includes nvidia-smi
dnf install cuda-11.3.0 cuda-devel-11.3.0
dnf install cuda-cudnn-devel-8.2.0.53-1.fc34.x86_64
Now you should be able to check :
rpm -qa | grep cuda
nvidia-smi
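Surprisingly, even though the rpms should all be 11.3, nvidia-smi mentions 11.4 at the top. This is normal: the nvidia-smi header reports the highest cuda version that the installed driver supports, not the version of the installed toolkit rpms.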
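The rpm check above can also be scripted. What follows is a minimal sketch, assuming that rpm -qa prints each package in the usual name-version-release.arch form. The PINS table and the helper names (parse_nvr, check_pins, query_rpm) are hypothetical, introduced here only for illustration; the version prefixes are the ones used in the install commands above.

```python
import subprocess

# Hypothetical pin table: package name -> version prefix we expect to see.
PINS = {"cuda": "11.3", "cuda-devel": "11.3", "cuda-cudnn-devel": "8.2"}


def parse_nvr(line):
    """Split 'name-version-release.arch' into (name, version)."""
    name, version, _release_arch = line.strip().rsplit("-", 2)
    return name, version


def check_pins(rpm_lines, pins=PINS):
    """Return {name: problem} for pinned packages that are missing or off-version."""
    installed = {}
    for line in rpm_lines:
        if line.count("-") >= 2:  # skip blank or unparseable lines
            name, version = parse_nvr(line)
            installed[name] = version
    problems = {}
    for name, prefix in pins.items():
        version = installed.get(name)
        if version is None:
            problems[name] = "not installed"
        elif not (version == prefix or version.startswith(prefix + ".")):
            problems[name] = "found " + version + ", wanted " + prefix + ".*"
    return problems


def query_rpm():
    """List installed packages (assumes an rpm-based system such as Fedora)."""
    result = subprocess.run(["rpm", "-qa"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

On the upgraded machine, check_pins(query_rpm()) should return an empty dict if the pinned versions went in as intended. Only the three packages named in the install commands are checked, since other cuda-prefixed rpms may legitimately carry their own version numbers.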
Stop cuda getting upgraded by mistake
To /etc/dnf/dnf.conf add:
exclude=cuda*
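To confirm that the exclusion is really in place (for instance after another upgrade rewrites config files), the file can be checked with a few lines of Python. This is a rough sketch, assuming that the exclude line sits in the [main] section and that simple glob matching on the package name is close enough to what dnf does. The function names are hypothetical, and the excludepkgs spelling is included on the assumption that dnf treats it as equivalent to exclude.

```python
import configparser
import fnmatch


def excluded_patterns(conf_text):
    """Return the package globs excluded in the [main] section of dnf.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    patterns = []
    # 'exclude' is the spelling used above; 'excludepkgs' is assumed equivalent.
    for option in ("exclude", "excludepkgs"):
        value = parser.get("main", option, fallback="")
        # Accept both comma-separated and space-separated lists.
        patterns.extend(value.replace(",", " ").split())
    return patterns


def is_excluded(package, conf_text):
    """True if any exclude glob in conf_text matches the package name."""
    return any(fnmatch.fnmatch(package, p) for p in excluded_patterns(conf_text))
```

Usage would be something like is_excluded("cuda-devel", open("/etc/dnf/dnf.conf").read()), which should be True once the line above has been added.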
Python installation using virtual environments
The basic virtual environment creation (NB: the old environment won't work any more, so move it away to ~/DELETE-ME_env38, for instance) :
python3.9 -m venv env39
. env39/bin/activate
pip install --upgrade pip
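Since the old environment is tied to the previous Python version, it can be worth confirming which interpreter is actually active before installing anything. A small sketch follows (the function names are made up for this example); it relies on the fact that inside an environment created by python -m venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_interpreter():
    """One-line summary: Python version, venv status and install prefix."""
    version = "%d.%d" % sys.version_info[:2]
    return "python %s, venv=%s, prefix=%s" % (version, in_virtualenv(), sys.prefix)
```

Running print(describe_interpreter()) after activating env39 should report a 3.9 interpreter with venv=True and a prefix ending in env39.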
Add TensorFlow (simple, for once!)
Still in env39:
pip install tensorflow
And test out the installation using a Python CLI instance:
import tensorflow as tf
tf.test.gpu_device_name() # Deprecated, sadly
# NVIDIA GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2
tf.config.list_physical_devices('GPU')
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.debugging.set_log_device_placement(True)
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], shape=[2, 3], name='a')
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], shape=[3, 2], name='b')
c = tf.matmul(a, b)
print(c.device) # Hope for : /job:localhost/replica:0/task:0/device:GPU:0
Add PyTorch (more intricate)
For PyTorch, we must resort to mixing cuda versions: the wheels below are built for cuda 11.1 rather than 11.3. This should work, since Nvidia talks about backward compatibility from 11.1 onwards (and this installation method does seem to work) :
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
And test out the installation using a Python CLI instance:
import torch
torch.cuda.is_available()
# True
#device = torch.device('cpu') # Use this to run on CPU
device = torch.device('cuda') # Use this to run on GPU
a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], device=device)
b = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], device=device)
c = a.mm(b)
print(c) # matrix-multiply (should state : device='cuda:0')
print(c.device) # Hope for : 'cuda:0'
And add a finicky extra, if needed:
pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu111_pyt190/download.html
Useful extra ML virtualenv installs
# Essentials
pip install jupyter matplotlib
# Good for removing cell outputs prior to git commits
pip install nbstripout
# Could be handy
pip install opencv-python ffmpeg-python
NB: Crypto changes during upgrade
Fedora strengthened its cryptographic key standards in the move to Fedora 33 (and beyond), which essentially means that the default key type previously generated (ssh-rsa) isn't acceptable any more.
See this explanation for details.
This means that any key exchange with an upstream server where you have authorized an ssh-rsa key (in authorized_keys) won't work any more unless you :
- Either downgrade the crypto policy on your local (upgraded) machine;
- Or:
  - Create a new (better) key on your local machine; and
  - Upload it to the server
The simplest way to do this is to first create a better key locally (once):
ssh-keygen -t ed25519 -a 64
Then temporarily 'backtrack' the local crypto policy, so that the new id can be copied up to each server that needs it (for the case where you don't want to, or cannot, type in the user password) :
# as root :
update-crypto-policies --set LEGACY
# as user :
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@example.com
#... more as required ...
# as root :
update-crypto-policies --set DEFAULT
This leaves the local system with 'modern' defaults again, while giving the server the option of a stronger key.
Later on, I guess, we should update the server(s) which accept older keys to exclude ssh-rsa keys too.
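To find out which servers (or local .pub files) still rely on ssh-rsa keys, the key type can be read off each line of an authorized_keys file. The following is a simplified sketch: it assumes one key per line, skips comments and blank lines, and does not try to handle quoted option strings containing spaces. KEY_TYPES, key_type and rsa_lines are names chosen for this example.

```python
# Key type names as they appear at the start of OpenSSH public key entries.
KEY_TYPES = (
    "ssh-rsa",
    "ssh-dss",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def key_type(line):
    """Return the key type named on one authorized_keys / .pub line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The type is normally the first field, but options may precede it.
    for token in line.split():
        if token in KEY_TYPES:
            return token
    return None


def rsa_lines(text):
    """Return the 1-based line numbers of entries that use an ssh-rsa key."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if key_type(line) == "ssh-rsa"
    ]
```

For example, rsa_lines(open("/home/user/.ssh/authorized_keys").read()) on a server (the path here is only a placeholder) lists the entries that would need replacing with an ed25519 key.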
End
All done!