Nvidia for Theano on Fedora 22 laptop

This write-up is for laptops with discrete ('separate') NVidia graphics cards, and makes use of the Bumblebee NVidia installation.

For more typical desktop instructions, see the write-up here.

Bumblebee RPMs

The basic RPM installation is the same as in the previous write-up.
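For reference, a minimal sketch of that step, assuming the managed Bumblebee repos from the previous write-up (the package names come from the Bumblebee project and may differ for your setup) :

# Recap only -- follow the previous write-up for the exact repo setup
sudo dnf install bumblebee-nvidia bbswitch primus kernel-devel
# Add your user to the bumblebee group, then reboot
sudo usermod -aG bumblebee ${USER}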

CUDA

The CUDA installation should be done from the Nvidia downloads site, as usual.

The end result should be the standard tree of folders :: /usr/local/cuda/{include,bin}.
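A quick sanity check that the toolkit landed in the expected place (paths assumed from a default install) :

ls -d /usr/local/cuda/bin /usr/local/cuda/include /usr/local/cuda/lib64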

Source fixes

Now, as root, work around Nvidia's check that disallows gcc versions greater than v4.9 (Fedora 22 ships gcc 5.x)...

In file /usr/local/cuda/include/host_config.h, look to make the following replacement :

// #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)  // Old version commented out
// This is the updated line, which guards against gcc > 5.1.x instead
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1)
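If you prefer to script the edit, a sed one-liner along these lines should work (check the file first -- the exact spacing in host_config.h may differ) :

sudo sed -i \
  's/__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)/__GNUC__ > 5 || (__GNUC__ == 5 \&\& __GNUC_MINOR__ > 1)/' \
  /usr/local/cuda/include/host_config.h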

Now, to set the right path for nvcc, in the user's ~/.bash_profile add ::

export PATH=$PATH:/usr/local/cuda/bin
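To pick this up in the current shell and confirm that nvcc now resolves :

source ~/.bash_profile
which nvcc   # should print /usr/local/cuda/bin/nvcc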

Test the installation

Check that the kernel modules are there :

sudo optirun lsmod | grep nv
nvidia_uvm             69632  0
nvidia               8380416  28 nvidia_uvm
drm                   331776  7 i915,drm_kms_helper,nvidia
# NB: With no 'optirun'
sudo lsmod | grep nv
# -Nothing-
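The card's power state can also be checked directly via bbswitch (assuming the bbswitch module from the Bumblebee install is loaded) :

cat /proc/acpi/bbswitch
# e.g. 0000:01:00.0 OFF  -- the card stays powered down until optirun wakes it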

Looking good:

/usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

This works better than it did on the previous desktop setup ...

optirun nvidia-smi -L
GPU 0: GeForce GT 750M (UUID: GPU-9cabfc96-3f6e-889d-29c5-57057738f794)

(and without the optirun it fails, since Bumblebee keeps the discrete card powered down and the driver libraries out of the default search path) :

nvidia-smi -L
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Installation of libgpuarray

Install the bleeding edge libgpuarray into your virtualenv - first compile the .so and .a libraries, and put them in a sensible place :

. env/bin/activate
cd env
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr \
  -DCUDA_CUDA_LIBRARY=/usr/lib64/nvidia-bumblebee/libcuda.so \
  -DCUDA_INCLUDE_DIRS=/usr/local/cuda/include \
  -DOPENCL_LIBRARIES=/usr/lib64/nvidia-bumblebee/libOpenCL.so \
  -DOPENCL_INCLUDE_DIRS=/usr/local/cuda/include/CL
make
sudo make install
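After the make install, it's worth refreshing the linker cache and checking that the libraries landed under the chosen prefix (the glob below is an assumption based on the -DCMAKE_INSTALL_PREFIX=/usr above) :

sudo ldconfig
ls /usr/lib*/libgpuarray*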

This will likely complain about not finding clBLAS, which isn't a problem here. However, if you know you will require clBLAS in the future (this is for advanced/experimental users only), see my OpenCL post, since clBLAS needs to be installed before running the cmake above.

Next, install the Python component (after going into the same virtualenv) :

cd env/libgpuarray/
python setup.py build
python setup.py install
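A quick check that the module is importable from the same virtualenv (printing __file__ just confirms which copy was picked up) :

python -c "import pygpu; print pygpu.__file__"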

And then test it from within a regular user directory (using the same virtualenv) :

optirun python
import pygpu
pygpu.init('cuda0')

A good result is something along the lines of :

<pygpu.gpuarray.GpuContext object at 0x7f1547e79550>
# Errors seen :
#  (A) 'cuda'      ::  pygpu.gpuarray.GpuArrayException: API not initialized  = WEIRD
#  (B) 'cuda0'     ::  pygpu.gpuarray.GpuArrayException: No CUDA devices available  = GO BACK...
#  (C) 'opencl0:0' ::  RuntimeError: Unsupported kind: opencl  (if OpenCL library not found)

Theano stuff

Save the following to a file gpu_check.py :

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

And then run, successively :

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cpu  optirun  python gpu_check.py
""" output is ::
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 5.44066691399 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761  1.62323284]
Used the cpu
"""

and

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu  optirun  python gpu_check.py
""" output is ::
Using gpu device 0: GeForce GT 750M
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.06558203697 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
"""

but

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cuda0  optirun  python gpu_check.py
""" output is ::
*FAILURE...*
"""

Check on the usage of GPU / BLAS

TP=`python -c "import os, theano; print os.path.dirname(theano.__file__)"`
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu optirun python ${TP}/misc/check_blas.py
# (run again with device=cpu to get the CPU timing for comparison)

Total execution time: 9.38s on CPU (with direct Theano binding to blas).
Total execution time: 0.44s on GPU.

# GPU : 0.44s (GeForce GT 750M)
# CPU : 9.38s (i5-4200U CPU @ 1.60GHz)