Upgrade to Fedora 38 with cuda-12.4
Martin Andrews (@mdda123)
Upgrading to Fedora 38 from Fedora 36 with Nvidia cuda-12.4
NB: Here we're emphasising being careful about cuda versions - in particular we want to get cuda-12-4, since that works better for llama.cpp, and since the Fedora gcc versions are newer than those expected by earlier cuda versions.
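For context on the gcc side (just a quick check), you can see what Fedora ships:
gcc --version
# Fedora 38 ships gcc 13.x, which older cuda toolkits (11.x) typically reject
#   with an 'unsupported GNU version' error when building CUDA code (e.g. llama.cpp)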
Standard Fedora 36 → 38 upgrade steps
dnf install dnf-plugin-system-upgrade --best
# Large download (to latest Fedora 36) :
dnf upgrade --refresh
# Takes several minutes, depending on whether you update regularly
shutdown -r now
dnf system-upgrade download --releasever=38
# Takes 30mins?
dnf system-upgrade reboot
# Takes 30mins?
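After the system-upgrade reboot completes, a quick way to confirm you actually landed on the new release (just a check, not part of the upgrade itself):
cat /etc/fedora-release
# Should now say: Fedora release 38 (Thirty Eight)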
# Collect useful stats:
uname -r;
rpm -q --queryformat '%{name}\t: %{version}\n' xorg-x11-drv-nvidia;
rpm -q --queryformat '%{name}\t\t\t: %{version}\n' cuda;
rpm -q --queryformat 'cudnn\t\t\t: %{version}\n' libcudnn8;
rpm -q --queryformat '%{name}\t: %{version}\n' google-chrome-stable;
nvidia-smi
# If this gives nice output : Then we are done
# NB: It might say '12.2' as the cuda version at the top - that's just the
#   highest version the driver supports, not what's installed
# The installed cuda rpm (likely 11.8) is what *seems* to match reality
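If in doubt about which toolkit is really installed (assuming the usual /usr/local/cuda symlink that the NVIDIA rpms set up), ask nvcc directly:
/usr/local/cuda/bin/nvcc --version
# The 'release' line is the toolkit actually on disk, not the
#   driver-supported version that nvidia-smi prints at the top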
If there's no cuda library installed...
# NB: This is for a later version of Fedora (39) but is confirmed to work
REPO_BASE=https://developer.download.nvidia.com/compute/cuda/repos
dnf config-manager \
--add-repo ${REPO_BASE}/fedora39/x86_64/cuda-fedora39.repo
dnf install cuda # We get 12.4
# Clean up previous cuda versions according to taste:
dnf remove cuda-11-7
dnf remove cuda-11-8
dnf remove cuda-12-0
dnf remove cuda-12-1
dnf remove cuda-12-3
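If you're not sure which of those old versions are actually present, list them first (a quick check) and only run the relevant remove lines above:
rpm -qa | grep '^cuda-1'
# Shows the versioned cuda packages currently installed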
# The new cuda version should now show up in the stats lines:
rpm -q --queryformat '%{name}\t: %{version}\n' xorg-x11-drv-nvidia;
rpm -q --queryformat '%{name}\t\t\t: %{version}\n' cuda;
rpm -q --queryformat 'cudnn\t\t\t: %{version}\n' libcudnn8;
nvidia-smi
# If this gives nice output : Then we are done - look at PyTorch install next
Installing & Testing PyTorch
Do this once cuda is installed and working according to nvidia-smi. If the previous part didn't work, skip this section and continue below : Come back here later!
We also want PyTorch installed, which requires a PyTorch build with cu124:
python -mvenv env311
. env311/bin/activate
pip install -U pip
pip install --pre torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/nightly/cu124
# And now test it:
python
>>> import torch
>>> #... no error displayed
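The import only shows the wheel installed correctly; to confirm this PyTorch build can actually see the GPU, a couple of standard extra checks in the same session:
>>> torch.cuda.is_available()   # should print True
>>> torch.version.cuda          # should print something like '12.4'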
Supposing PyTorch now works : All done!
Relink the kernel module...
(continue here if the nvidia-smi output looks bad above)
# If nvidia-smi is failing :
lsmod | grep nv
# Probably empty (except for i2c_nvidia_gpu)
journalctl -b | grep nvidia
# Check the journal kernel lines mention stuff like : modprobe.blacklist=nouveau
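Another useful check here (standard module paths assumed) is whether an nvidia module was even built for the running kernel:
find /lib/modules/$(uname -r) -name 'nvidia*.ko*'
# Empty output means akmods never built one for this kernel - see 'Rebuild' below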
Apparently, this may be something that doesn't work quite right in what akmods produces:
depmod -a
shutdown -r now
nvidia-smi
# If this gives nice output : Then we are done - look at PyTorch install above
Rebuild the kernel module...
Apparently, normal updating does not give akmods enough time to complete, so let's do it manually here.
NV_KMOD=`rpm -qa | grep kmod | grep $(uname -r)`
echo $NV_KMOD
dnf remove $NV_KMOD
akmods --force --kernels $(uname -r)
# Takes a couple of minutes
shutdown -r now
nvidia-smi
# If this gives nice output : Then we are done - look at PyTorch install above
If the journal still mentions starting 'nvidia-fallback.service' ...
Perhaps the service is falling back to nouveau before the nvidia module loads properly.
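To confirm that's what is happening (an optional quick look), the unit's own journal entries should show it starting:
journalctl -b -u nvidia-fallback.service
If so, disable and mask it: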
systemctl disable nvidia-fallback.service
systemctl mask nvidia-fallback.service
shutdown -r now
nvidia-smi
# If this gives nice output : Then we are done - look at PyTorch install above
All done!