Nvidia (7.0) installation for Theano on Fedora 22
- Author : Martin Andrews (@mdda123)
Below is "Nvidia's way" (modified to actually work)
Even though Nvidia still hasn't provided RPMs for Fedora 22 (which launched a couple of months before this post, having been in Alpha for 3 months prior), we can fix up their code as it installs.
This write-up is simply a condensed version of Dr Donald Kinghorn's excellent write-up (with which it's probably best to follow along, opened in a separate tab) plus additional instructions concerning the building of Theano.
Set up a scratch directory
As root :
cd ~ # pwd = /root/
mkdir fedora22-cuda
cd fedora22-cuda/
Nvidia Driver download (for later)
Go to the Nvidia Driver download page, and grab the 76MB driver package, for installation later...
CUDA Driver download (installed first)
Download the 1GB CUDA local installer for RHEL 7 :
CUDA7=http://developer.download.nvidia.com/compute/cuda/7_0
RPMDEB=${CUDA7}/Prod/local_installers/rpmdeb
wget ${RPMDEB}/cuda-repo-rhel7-7-0-local-7.0-28.x86_64.rpm
Install CUDA using Nvidia's repo
cd ~/fedora22-cuda # pwd = /root/fedora22-cuda/
dnf install cuda-repo-rhel7-7-0-local-7.0-28.x86_64.rpm
dnf install cuda
Fix the path & library directories globally
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> /etc/profile.d/cuda.sh
ls -l /usr/local/cuda/lib64
echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf
ldconfig
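After the two `echo` lines above, the files should contain just the following (assuming the default `/usr/local/cuda` symlink that the installer creates) :

```shell
# /etc/profile.d/cuda.sh -- sourced by login shells, extends PATH
export PATH=$PATH:/usr/local/cuda/bin

# /etc/ld.so.conf.d/cuda.conf -- read by ldconfig, extends the library search path
/usr/local/cuda/lib64
```

Note that the `PATH` change only takes effect in new login shells, whereas `ldconfig` updates the library cache immediately.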
Now install the graphics drivers
To ensure the 'DKMS' part of the installer will run, make sure you have the kernel-module compilation tools available :
dnf install kernel-devel
Before this point, the Nvidia software has not actually checked for the presence of an Nvidia video card.
Now run the Nvidia installer (look at the notes in this section for answer-hints):
chmod 755 NVIDIA-Linux-x86_64-352.21.run
./NVIDIA-Linux-x86_64-352.21.run
Say "Yes" to the question about registering with DKMS
Say "Yes" to the question about 32-bit libs
It should now compile the NVIDIA kernel modules...
Say "No" to the question about running nvidia-xconfig!
Now reboot.
Test the installation
To see that your driver is installed and working properly, check that the kernel modules are there :
sudo lsmod | grep nv
# Output::
nvidia_uvm 77824 0
nvidia 8564736 1 nvidia_uvm
drm 331776 4 i915,drm_kms_helper,nvidia
Check on the CUDA compiler:
/usr/local/cuda/bin/nvcc --version
# Output::
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
And an actual check on the card itself :
sudo nvidia-smi -L
# Output::
GPU 0: GeForce GTX 760 (UUID: GPU-b8075eeb-56ff-4595-7901-eef770de8296)
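If you ever want to pick up the device name programmatically (for logging, say), the `nvidia-smi -L` line format is easy to parse. This is just a sketch, tested only against the sample output above :

```python
import re

# Matches lines like: "GPU 0: GeForce GTX 760 (UUID: GPU-b8075eeb-...)"
SMI_LINE = re.compile(r"GPU (\d+): (.+) \(UUID: (\S+)\)")

def parse_smi_line(line):
    """Return (index, name, uuid) from one `nvidia-smi -L` line, or None."""
    m = SMI_LINE.match(line.strip())
    if m is None:
        return None
    index, name, uuid = m.groups()
    return int(index), name, uuid

sample = "GPU 0: GeForce GTX 760 (UUID: GPU-b8075eeb-56ff-4595-7901-eef770de8296)"
print(parse_smi_line(sample))
# -> (0, 'GeForce GTX 760', 'GPU-b8075eeb-56ff-4595-7901-eef770de8296')
```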
Fix the CUDA headers to accept new gcc
Now, as root, fix up Nvidia's header file that disallows gcc greater than v4.9...
In file /usr/local/cuda-7.0/include/host_config.h, make the following replacement :
// #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9) // Old version commented out
// This is the updated line, which guards against gcc > 5.1.x instead
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1)
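The effect of the edit is simply to move the version cut-off: the original `#if` rejects any gcc newer than 4.9, while the patched one rejects anything newer than 5.1. In Python terms (a sketch of the preprocessor logic, not Nvidia's code) :

```python
def gcc_rejected(major, minor, max_major=5, max_minor=1):
    # Mirrors: #if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1)
    return major > max_major or (major == max_major and minor > max_minor)

# Fedora 22 ships gcc 5.1, which the original header (cut-off 4.9) rejected :
print(gcc_rejected(5, 1, max_major=4, max_minor=9))  # True  -> #error fired
# ...but the patched header (cut-off 5.1) accepts it :
print(gcc_rejected(5, 1))                            # False -> compiles fine
```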
Test the CUDA functionality
As a regular user, compile the CUDA samples from within a clean directory :
cd ~ # for instance
mkdir cuda # for instance
cd cuda
rsync -av /usr/local/cuda/samples .
cd samples/
make -j4
cd bin/x86_64/linux/release/
./deviceQuery
Cleaning up
If everything above tests out OK, then the /root/fedora22-cuda directory can be safely deleted.
The Theano Part
Installation of libgpuarray
To install the bleeding-edge libgpuarray into your virtualenv, first compile the .so and .a libraries that the module creates, and put them in a sensible place :
. env/bin/activate
cd env
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
This will likely complain about not finding clBLAS, which isn't a problem here. However, if you know you will require clBLAS in the future (and this is for advanced/experimental users only), see my OpenCL post, since clBLAS needs to be installed before running the cmake step above.
It may also complain about :
runtime library [libOpenCL.so.1] in /usr/lib64 may be hidden by files in:
/usr/local/cuda/lib64
This won't affect the CUDA functionality (its impact on OpenCL is still TBD).
Next, install the Python component (after going into the same virtualenv) :
cd env/libgpuarray/
python setup.py build
python setup.py install
And then test it from within a regular user directory (using the same virtualenv) :
python
import pygpu
pygpu.init('cuda0')
A good result is something along the lines of :
<pygpu.gpuarray.GpuContext object at 0x7f1547e79550>
# Errors seen :
#(A) 'cuda' ::
# pygpu.gpuarray.GpuArrayException: API not initialized = WEIRD
#(B) 'cuda0' ::
# pygpu.gpuarray.GpuArrayException: No CUDA devices available = GO BACK...
#(C) 'opencl0:0' ::
# RuntimeError: Unsupported kind: opencl (if OpenCL library not found)
Theano stuff
Store the following to a file gpu_check.py :
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
And then run, successively :
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cpu python gpu_check.py
""" output is ::
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.35117197037 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
"""
and
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu python gpu_check.py
""" output is ::
Using gpu device 0: GeForce GTX 760
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.339042901993 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Used the gpu
"""
but
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cuda0 python gpu_check.py
""" output is ::
*FAILURE...*
"""
Check on the usage of GPU / BLAS
TP=`python -c "import os, theano; print os.path.dirname(theano.__file__)"`
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu python ${TP}/misc/check_blas.py
# GPU : 0.14s (GeForce GTX 760)
# CPU : 5.72s (i7-4770 CPU @ 3.40GHz)
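For reference, the timings above work out to roughly a 10x speed-up on the small exp benchmark and about 40x on the BLAS check (your ratios will vary with card and CPU) :

```python
# Timings taken from the runs above
exp_cpu, exp_gpu = 3.35117197037, 0.339042901993   # gpu_check.py
blas_cpu, blas_gpu = 5.72, 0.14                    # check_blas.py

print("exp  speed-up: %.1fx" % (exp_cpu / exp_gpu))    # ~9.9x
print("BLAS speed-up: %.1fx" % (blas_cpu / blas_gpu))  # ~40.9x
```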
OpenCL stuff (for another day)
dnf -y install clinfo ocl-icd opencl-tools