Published on

Nvidia (7.0) installation for Theano on Fedora 23


Extra Fix required for Nvidia NVCC to work under Fedora 23

Fedora 23 changed a default gcc ABI setting from that which was used in Fedora 22. The previous value was chosen so that the default compiler behaviour matched earlier versions of gcc as closely as possible, so this change actually broke systems that want the earlier behaviour : In our case, Nvidia's tool chain.

This has to be fixed by adding additional options into the NVCC invocations. Doing this is not so easy, since Theano (ideally) insulates the programmer from these kinds of details.

The following instructions build upon the previous Fedora 22 version.

Fix the CUDA headers to accept new gcc (5.3.1 for Fedora 23)

Now, as root, fix up Nvidia's header file that disallows gcc greater than v4.9...

In file /usr/local/cuda-7.0/include/host_config.h, look to make the following replacement :

// #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)  // Old version commented out
// This is the updated line, which guards again gcc > 5.4.x instead
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 4)

Prove that it's still broken

As a regular user, let's try out one of the CUDA samples from within a clean directory (and then fix it):

cd ~        # for instance
mkdir cuda  # for instance
cd cuda
rsync -av /usr/local/cuda/samples .
cd samples/
cd 0_Simple/asyncAPI/

... this will FAIL, with a nasty looking message like these ::

/usr/include/c++/5.3.1/bits/locale_classes.tcc:283:23: error: template argument required for ‘class collate_byname’
/usr/include/c++/5.3.1/bits/locale_classes.tcc:285:23: error: ‘collate’ is not a template function
/usr/include/c++/5.3.1/bits/locale_classes.tcc:285:30: error: expected ‘;’ before ‘<’ token
/usr/include/c++/5.3.1/bits/locale_classes.tcc:289:22: error: parse error in template argument list
/usr/include/c++/5.3.1/bits/locale_classes.tcc:289:22: error: template-id ‘has_facet<<expression error> >for ‘bool std::has_facet(const std::locale&)’ does not match any template declaration

Fix the CUDA functionality

Edit the Makefile within that directory, adding the -D_GLIBCXX_USE_CXX11_ABI=0 line :


# Add the following ::

ALL_CCFLAGS += $(addprefix -Xcompiler ,$(CCFLAGS))
ALL_CCFLAGS += $(addprefix -Xcompiler ,$(EXTRA_CCFLAGS))

then, running make again should work without complaint, and running ./asyncAPI should produce results like ::

[./asyncAPI] - Starting...
GPU Device 0: "GeForce GTX TITAN X" with compute capability 5.2

CUDA device [GeForce GTX TITAN X]
time spent executing by the GPU: 11.26
time spent by CPU in CUDA calls: 0.02
CPU executed 48864 iterations while waiting for GPU to finish

If that works, then we can move on to fixing the issue within Theano...

Theano stuff - command line

Using the same as given in the Fedora 22 instructions, the following command-line should FAIL :

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu   python
""" output is (among other stuff) ::

This should work, though, if we supply an additional flag related to NVCC (for when it is invoked deep inside Theano) :

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu,nvcc.flags='-D_GLIBCXX_USE_CXX11_ABI=0'   python
""" output is ::
Using gpu device 0: GeForce GTX 760
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.339042901993 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761 1.62323296]
Used the gpu

Theano stuff - within a program

To achieve the same effect via program code within a module that uses Theano, add an extra line after the standard Python import 'preamble' :

import theano
from theano import tensor

floatX = theano.config.floatX = 'float32'

# Required for fedora 23 compilation of NVCC code...
theano.config.nvcc.flags = '-D_GLIBCXX_USE_CXX11_ABI=0'

Theano stuff - in the ~/.theanorc file

To achieve the same effect using your .theanorc file, add the following section (the config file format doesn't seem to object to the second '='):