GPUEater/GPU driver/DeepLearning library installation manual

We support developers and researchers who want to change the world by using their cutting-edge A.I. technologies and ideas.

Notice

NVIDIA GPU driver version

Warning: We recommend that users of NVIDIA GPU-based instances use NVIDIA driver version 410.48.
We currently don't support other NVIDIA driver versions. If you have not installed the 410.48 driver, please run the following commands.

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt update
sudo apt install -f nvidia-410 nvidia-410-dev libcuda1-410 nvidia-modprobe nvidia-settings nvidia-opencl-icd-410
sudo mkdir -p /usr/lib/nvidia
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
sudo sh cuda_9.0.176_384.81_linux-run # In the installer, answer: Graphics Driver=No, CUDA 9.0 Toolkit=Yes, Samples=No

sudo apt install -y libcupti-dev

Getting started

Account registration

1. Go to the following URL: https://www.gpueater.com/signup

2. Fill in your email address and password. Then click the “Sign up” button.

3. Open your mailbox. You should see an email from us titled “GPUEater: Account Registration”. If you can’t find the email, check your spam folder just in case.

4. Account registration. Click the link in the email to complete your account registration.

5. Your registration has finished!

Launch an instance

1. Go to the following URL and log in: https://www.gpueater.com/login

2. Payment method. Add your credit card by entering your credit card information.

3. Billing details. Enter your billing details.

4. Register your SSH key. If you don't have one, generate a new key by following the instructions.

5. Launch an on-demand instance. Choose an instance, select the OS version and SSH key, then click the Launch button.

6. Completed. Congratulations! Your instance has been launched!

Access your GPU instance (MacOS/Linux)

Your private key must be stored in your ~/.ssh/ folder with its permissions set to 600.
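The permission requirement can be applied and checked from a script before connecting. The following is a minimal sketch, not part of the GPUEater tooling: the function name ensure_private_key_mode is hypothetical, and ~/.ssh/guest.pem is only the example key name used in this manual.

```python
import os
import stat


def ensure_private_key_mode(key_path):
    """Restrict a private key file to owner read/write (mode 600).

    Returns the permission bits of the file after the change.
    """
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)  # same effect as "chmod 600"
    return stat.S_IMODE(os.stat(key_path).st_mode)


if __name__ == "__main__":
    # Assumed example path; replace it with your own key file.
    key = os.path.expanduser("~/.ssh/guest.pem")
    if os.path.exists(key):
        print(oct(ensure_private_key_mode(key)))
    else:
        print("key not found:", key)
```

ssh refuses private keys that other users can read, so running a check like this once after copying the key avoids a failed first login.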

Go to https://www.gpueater.com/console/servers

1. Click the IP address that you’d like to connect to. An ssh command will then be copied to your clipboard automatically.

ssh root@xxx.xxx.xxx.xxx -p 22 -i ~/.ssh/guest.pem -o ServerAliveInterval=10

2. Launch a terminal, paste the command, and press the Return key.

3. Congratulations! Your terminal is now connected to the instance.

Instance operation

IPv4 Renew and Port forwarding

Go to https://www.gpueater.com/console/servers

If you need to change the network settings, click the “advanced” button at the far right. Then:

a. Change Port Forward setting: You can add and delete Port forward settings.

b. Renew IPv4: Click the “Renew” button. The new IPv4 address will be assigned within a few minutes.

Stop / Start / Restart / Terminate

Go to https://www.gpueater.com/console/servers

Instance Stop / Start / Restart

Select an instance and click the Start, Stop, or Restart button.

Instance Terminate

Select an instance and click the Terminate button.

*Instances in RUNNING or STOPPED status are charged. Instances in INITIALIZING, ERROR, or TERMINATED status are not charged.

GPUEater API Console

Version   Install
8.0+      npm install gpueater
2.7+      pip install gpueater
3.5+      pip3 install gpueater
2.0+      gem install gpueater
-         curl -sL http://install.aieater.com/setup_gpueater | bash -
-         curl -sL http://install.aieater.com/setup_gpueater | bash -

Documents : https://github.com/aieater

ROCm Docker (not recommended)

- Pull image
docker pull gpueater/rocm-tensorflow-1.8
- Run a container with GPU driver file descriptor
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video gpueater/rocm-tensorflow-1.8

Documents : https://hub.docker.com/r/gpueater/rocm-tensorflow-1.8/

Account operation

Credit card update

Go to https://www.gpueater.com/console/account

Change default credit card: Click the “Default” button on the credit card.

Add a new credit card: Enter the credit card number, expiration month, and CVC, then click the “Add Card” button.

Delete a card: Click the “Delete” button on the credit card.

Deep Learning

Overview

AMD GPUs cannot use CUDA and cuDNN, so we have to use HIP/MIOpen instead. AMD began providing an open-source compute driver to the OSS community in 2015. Thanks to this, you can use TensorFlow on Radeon processors.

The library stack is, from top to bottom:

Keras or a similar high-level library

TensorFlow

MIOpen (CUDA simulation layer)

ROCm (AMDGPU graphics driver)

OS

Native GPU

Reference: https://github.com/RadeonOpenCompute/ROCm

For now, MIOpen and ROCm support TensorFlow 1.14+, and almost all deep learning models work: AlexNet, VGG, GoogLeNet (Inception), RNN, LSTM, the YOLO series, M2Det, CenterNet, CycleGAN, FCN, ICNet, and so on. We have verified this on Ubuntu 16.04. If you want to use another Linux distribution, you will have to work out how to set it up yourself.

Simple installer for ROCm-TensorFlow

We provide a simple installer. This version supports Python 3.5 and 2.7.

For Python2.7
curl -sL http://install.aieater.com/setup_rocm_tensorflow_p2 | bash - 
For Python3.5
curl -sL http://install.aieater.com/setup_rocm_tensorflow_p3 | bash - 

This installs almost all required packages automatically.
Popular libraries such as OpenCV, video encoders, Cython, and the Pillow image library are also included.

How to get GPU memory usage on rocm-smi

If you want rocm-smi to show GPU memory usage, replace it with the patched rocm-smi.
Run the following commands:

johndoe@gpueater.local:~$ curl -O http://install.aieater.com/gpueater/rocm/gpueater-smi
	  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
					 Dload  Upload   Total   Spent    Left  Speed
	100 45447  100 45447    0     0  1643k      0 --:--:-- --:--:-- --:--:-- 1643k

	johndoe@gpueater.local:~$ ./gpueater-smi


	====================    ROCm System Management Interface    ====================
	================================================================================
	 GPU  Temp    AvgPwr   SCLK     MCLK     Fan      Perf    SCLK OD    MCLK OD  USED MEM
	  0   48c     4.0W     852Mhz   167Mhz   35.69%   auto      0%         0%       7619MB
	================================================================================
	====================           End of ROCm SMI Log          ====================

	johndoe@gpueater.local:~$ mv gpueater-smi `which rocm-smi`
	
You can see the GPU memory usage in the "USED MEM" column.
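If you want to read that column from a script, the table can be parsed line by line. The sketch below assumes the column layout shown in the sample output above (GPU index first, "USED MEM" value last, with an "MB" suffix); the function name parse_used_mem is hypothetical and not part of rocm-smi.

```python
def parse_used_mem(smi_output):
    """Return {gpu_index: used_memory_in_MB} from the patched rocm-smi table."""
    usage = {}
    for line in smi_output.splitlines():
        fields = line.split()
        # Data rows start with the GPU index and end with a value like "7619MB".
        if len(fields) >= 2 and fields[0].isdigit() and fields[-1].endswith("MB"):
            usage[int(fields[0])] = int(fields[-1][:-2])
    return usage


# Sample rows copied from the output shown in this manual.
sample = """
 GPU  Temp    AvgPwr   SCLK     MCLK     Fan      Perf    SCLK OD    MCLK OD  USED MEM
  0   48c     4.0W     852Mhz   167Mhz   35.69%   auto      0%         0%       7619MB
"""
print(parse_used_mem(sample))  # {0: 7619}
```

In practice the text would come from running the tool, for example with subprocess.check_output(["rocm-smi"]) decoded to a string.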

Python3 and basic package installation

Install Python 3 and the basic packages for machine learning.
The Chainer library requires 'libhdf5-dev', and the OpenCV3 package requires pkg-config and cmake.
If you want to install NumPy with OpenBLAS, see the SciPy reference.

sudo apt update
sudo apt install -y build-essential python3 python3-dev python3-pip pkg-config check cmake libhdf5-dev
sudo pip3 install --upgrade pip
sudo pip3 install setuptools scipy numpy six pillow h5py

CUDA SDK installation on Ubuntu16.04

1. First, you must stop the lightdm service. If lightdm is still running, the NVIDIA driver installation will fail.

echo "manual" | sudo tee -a /etc/init/lightdm.override
sudo systemctl stop lightdm
sudo systemctl disable lightdm
sudo service lightdm stop

2. Remove old packages.

sudo apt remove -y --purge 'nvidia-*'   # quoted so the shell does not expand the pattern
sudo apt remove -y --purge 'cuda*'
sudo apt autoremove -y
sudo apt --purge remove && sudo apt autoclean
sudo unlink /usr/local/cuda
sudo rm -rf /usr/local/cuda*

3. Install CUDA SDK 8 (not 9). *To install the SDK on another Linux distribution, see the CUDA reference.

mkdir -p ~/src
cd ~/src
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt update
sudo apt install -y cuda-8-0 # Most deep learning libraries do not support CUDA 9.x yet (as of Oct 2017).
sudo apt install -y libcupti-dev

4. Make a symbolic link for CUDA.

sudo unlink /usr/local/cuda
sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda

5. Set up the environment. Copy and paste the code below into "~/.profile" or "~/.bash_profile".

export PATH="$HOME/bin:$HOME/.local/bin:$PATH:/usr/local/bin:/usr/local/cuda/bin"
export CUDA_HOME=/usr/local/cuda
export CUDA_PATH=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_PATH/lib64:/usr/local/lib:/usr/lib/x86_64-linux-gnu
export LIBRARY_PATH=$LIBRARY_PATH:$CUDA_PATH/lib64:/usr/local/lib:/usr/lib/x86_64-linux-gnu
export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/cuda/targets/x86_64-linux/include
export CPATH=$CPATH:/usr/local/cuda/targets/x86_64-linux/include
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/cuda/targets/x86_64-linux/include
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
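After opening a new shell, it can help to confirm that these variables actually took effect. The following is a minimal sketch under the assumption that the exports above are used as written; the helper name missing_cuda_settings is hypothetical.

```python
import os


def missing_cuda_settings(env):
    """Return a list of problems found in an environment mapping."""
    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    problems = []
    # PATH should contain the CUDA bin directory so that nvcc can be found.
    if cuda_home + "/bin" not in env.get("PATH", "").split(":"):
        problems.append("PATH does not contain " + cuda_home + "/bin")
    # LD_LIBRARY_PATH should contain lib64 so that the CUDA libraries can be loaded.
    if cuda_home + "/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH does not contain " + cuda_home + "/lib64")
    return problems


print(missing_cuda_settings(os.environ) or "CUDA environment looks consistent")
```

The check only compares strings, so it does not prove that the directories exist; it just catches a profile that was not sourced or was edited incorrectly.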

6. Verify that the commands work.

johndoe@h001:~$ source ~/.profile
	johndoe@h001:~$ nvidia-smi
	Wed Oct 18 00:54:06 2017
	+-----------------------------------------------------------------------------+
	| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
	|-------------------------------+----------------------+----------------------+
	| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
	|===============================+======================+======================|
	|   0  Tesla V100-SXM2...  Off  | 00000000:01:00.0 Off |                  N/A |
	| N/A   39C    P0    35W / 300W |      0MiB / 16152MiB |      0%      Default |
	+-------------------------------+----------------------+----------------------+
	|   1  Tesla V100-SXM2...  Off  | 00000000:02:00.0 Off |                  N/A |
	| N/A   37C    P0    35W / 300W |      0MiB / 16152MiB |      0%      Default |
	+-------------------------------+----------------------+----------------------+

	+-----------------------------------------------------------------------------+
	| Processes:                                                       GPU Memory |
	|  GPU       PID   Type   Process name                             Usage      |
	|=============================================================================|
	|  No running processes found                                                 |
	+-----------------------------------------------------------------------------+

	johndoe@h001:~$ nvcc --version
	nvcc: NVIDIA (R) Cuda compiler driver
	Copyright (c) 2005-2016 NVIDIA Corporation
	Built on Tue_Jan_10_13:22:03_CST_2017
	Cuda compilation tools, release 8.0, V8.0.61
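Because the rest of this manual depends on CUDA 8, a script can check the release number reported by nvcc. This is a small sketch that parses the "release" field from text in the format shown above; the function name parse_nvcc_release is hypothetical.

```python
import re


def parse_nvcc_release(version_text):
    """Return (major, minor) from "nvcc --version" output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Last line of the nvcc output shown in this manual.
sample = "Cuda compilation tools, release 8.0, V8.0.61"
print(parse_nvcc_release(sample))  # (8, 0)
```

The input text could be obtained with subprocess.check_output(["nvcc", "--version"]) and decoded before parsing.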

	

How to set up cuDNN

Almost all deep learning libraries require cuDNN. However, NVIDIA does not provide an easy installation method via apt or yum because of its license.
So, unfortunately, we cannot provide an automated installation method for cuDNN.

Please set up cuDNN (libcudnn6-dev) yourself from the NVIDIA cuDNN site.

  1. Register an NVIDIA account.
  2. Download the cuDNN v6.0 binary or deb (for Ubuntu). *YOU MUST DOWNLOAD TWO CUDNN PACKAGES, "RUNTIME" AND "DEVELOPMENT"!
  3. Extract the binary or install the deb packages.

You have to download both the runtime and development versions of cuDNN, because the runtime archive does not contain "cudnn.h".
The commands below are an installation example.
*("libcudnn6_6.0.21-1+cuda8.0_amd64.deb" and "libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb" are cuDNN packages for Ubuntu that have already been downloaded.)

sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb   # Runtime
sudo dpkg -i libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb   # Development (for cudnn.h)

Verify that the cuDNN library is installed under /usr/.

root@instance:/usr# find /usr/* -name "cudnn.h"
	/usr/include/cudnn.h
root@instance:/usr# find /usr/* -name "libcudnn*"
	/usr/lib/x86_64-linux-gnu/libcudnn.so.6
	/usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
	/usr/lib/x86_64-linux-gnu/libcudnn.so
	/usr/lib/x86_64-linux-gnu/libcudnn_static_v6.a
	/usr/lib/x86_64-linux-gnu/libcudnn_static.a
	/usr/share/doc/libcudnn6-dev
	/usr/share/doc/libcudnn6
	/usr/share/lintian/overrides/libcudnn6-dev
	/usr/share/lintian/overrides/libcudnn6
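Finding the files does not tell you which cuDNN version they belong to. In cuDNN v6 the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL, so the version can be read from cudnn.h. The sketch below parses those macros from header text; the function name parse_cudnn_version is hypothetical, and the sample string is an assumed excerpt rather than the real file.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    version = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        version.append(int(match.group(1)))
    return tuple(version)


# Assumed excerpt of a cuDNN 6.0.21 header, used only for illustration.
sample = "#define CUDNN_MAJOR 6\n#define CUDNN_MINOR 0\n#define CUDNN_PATCHLEVEL 21\n"
print(parse_cudnn_version(sample))  # (6, 0, 21)
```

To check a real installation, read /usr/include/cudnn.h (the path found above) and pass its contents to the function.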

cuDNN old package installation

If you need an old version of cuDNN (v5.1), download the cuDNN v5.1 Linux tar.gz archive from the NVIDIA cuDNN site.

Uninstall the libcudnn packages.

sudo apt remove -y 'libcudnn*'

"cudnn-8.0-linux-x64-v5.1.tgz" is the cuDNN archive that has already been downloaded.

tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/* /usr/local/cuda/include/
sudo cp cuda/lib64/* /usr/local/cuda/lib64/

Verify the installed packages.

root@instance:~# find /usr/* -name "cudnn.h"
	/usr/local/cuda-8.0/targets/x86_64-linux/include/cudnn.h
root@instance:~# find /usr/* -name "libcudnn*"
	/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
	/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5
	/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
	/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so

TensorFlow installation

Before installing, make sure that your bash environment and the CUDA/cuDNN SDK are set up.
*Currently TensorFlow does not support cuDNN 5.1. You must prepare cuDNN 6.

TensorFlow installation is straightforward, and we recommend using Python 3. TensorFlow does not yet have stable support for CUDA SDK 9.x (as of Oct 2017), so you need to install the CUDA 8 library. If you have already installed SDK 9 with "apt install cuda", remove it and reinstall SDK 8 with "apt install cuda-8-0". Otherwise, you will get an error when you import TensorFlow in Python.

sudo pip3 install --upgrade tensorflow-gpu

Verify that the TensorFlow library works in python3.

python3 -c "import tensorflow"

Learn more ... https://www.tensorflow.org/tutorials/using_gpu

Theano installation

Before installing, make sure that your bash environment and the CUDA/cuDNN SDK are set up.

Theano installation is difficult for a beginner. This library has a longer history than the other libraries, and its stack is now very complicated.

1. Install basic packages.

sudo apt install -y check cmake mpich libopenblas-dev
sudo pip3 install mpi4py cython numpy scipy

2. Install the "NCCL 2.x product version" from NVIDIA for multi-node, multi-GPU communication.

Visit https://developer.nvidia.com/nccl, log in, and download NCCL2 for CUDA 8 + Ubuntu 16.04.

sudo dpkg -i nccl-repo-ubuntu1604-2.0.5-ga-cuda8.0_2-1_amd64.deb # downloaded binary
sudo apt update
sudo apt install -y libnccl2 libnccl-dev
	

Learn more NCCL2 ... http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html

3. Install "libgpuarray" from source. This version requires cuDNN v5.1 (old). If you have already installed cuDNN v6.0, uninstall it and install v5.1. Please see "cuDNN old package installation".

curl -O -L https://github.com/Theano/libgpuarray/archive/v0.6.9.tar.gz
tar zxvf v0.6.9.tar.gz
cd libgpuarray-0.6.9
mkdir build ; cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j
sudo make install
cd ..
python3 setup.py build
sudo python3 setup.py install
sudo ldconfig

4. Verify that it works. *Notice: If you are still in the libgpuarray build directory, you will get an import error.

cd ~/; DEVICE="cuda" python3 -c "import pygpu;pygpu.test()"
root@instance:~# cd ~/; DEVICE="cuda" python3 -c "import pygpu;pygpu.test()"
	pygpu is installed in /usr/local/lib/python3.5/dist-packages/pygpu-0.7.4+2.g6710220-py3.5-linux-x86_64.egg/pygpu
	NumPy version 1.13.3
	NumPy relaxed strides checking option: True
	NumPy is installed in /usr/local/lib/python3.5/dist-packages/numpy
	Python version 3.5.2 (default, Sep 14 2017, 22:51:06) [GCC 5.4.0 20160609]
	nose version 1.3.7
	*** Testing for Tesla V100-SXM2...
	mpi4py found: True
	.............................................................................................................................
	.............................................................................................................................
	................................*** Collectives testing for Tesla V100-SXM2..................................................
	.............................................................................................................................
	.............................................................................................................................
	.............................................................................................................................
	----------------------------------------------------------------------
	Ran 7253 tests in 127.257s

	OK
	

5. Finally, install Theano.

sudo pip3 install theano

6. Verify that it works. This code is the same as the theano_test.py sample from the official page.

print("import pygpu")
import pygpu
print("import theano")
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()

t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

	

And execute theano_test.py.

root@instance:~# THEANO_FLAGS=device=cuda0 python3 ./theano_test.py
	import pygpu
	import theano
	Using cuDNN version 5110 on context None
	Mapped name None to device cuda0: Tesla V100-SXM2... (0000:02:00.0)
	[GpuElemwise{exp,no_inplace}((float64, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
	Looping 1000 times took 0.385726 seconds
	Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
	  1.62323285]
	Used the gpu
	

If you create a Theano configuration file at ~/.theanorc, Theano reads its settings from that file. Values given in THEANO_FLAGS take precedence over the file.
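As an illustration of what such a file can contain, the sketch below renders a minimal ~/.theanorc with a [global] section. The helper name render_theanorc is hypothetical, and the chosen values are assumptions: "cuda0" mirrors the THEANO_FLAGS example above, and "float32" is just a common choice, not a setting taken from this manual.

```python
import configparser
import io


def render_theanorc(device="cuda0", floatx="float32"):
    """Return the text of a minimal .theanorc with a [global] section."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep the capital X in "floatX"
    config["global"] = {"device": device, "floatX": floatx}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_theanorc())
```

The printed text can be saved as ~/.theanorc. With the file in place, the THEANO_FLAGS prefix is only needed when you want to override a value for a single run.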

Learn more Theano installation documents. ... http://deeplearning.net/software/theano/install_ubuntu.html

Keras installation

Before installing, make sure that your bash environment and the CUDA/cuDNN SDK are set up.

Keras is a wrapper library for TensorFlow or Theano. TensorFlow and Theano are very low-level APIs for linear algebra. The relationship is similar to that between Unity and OpenGL/DirectX: most developers use Unity rather than a low-level API like OpenGL/DirectX. The same is true in deep learning, where many people use a high-level library such as Keras, Chainer, or MXNet.

Keras requires one low-level deep learning library as its backend: TensorFlow, CNTK, or Theano.

First, see "Theano installation" or "TensorFlow installation".

Run the following command to install.

sudo pip3 install keras

By default, Keras uses TensorFlow as its backend. To use Theano instead, change the backend in ~/.keras/keras.json. That JSON file is created automatically by Keras.

vi ~/.keras/keras.json

The default setting looks like this.

{
    "image_data_format": "channels_last",
    "backend": "tensorflow",
    "floatx": "float32",
    "epsilon": 1e-07
}

Change the backend setting like this.

{
    "image_data_format": "channels_last",
    "backend": "theano",
    "floatx": "float32",
    "epsilon": 1e-07
}
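The same change can be made without an editor. The following is a minimal sketch, assuming the file has the structure shown above; the function name set_keras_backend is hypothetical and not part of Keras. The demonstration works on a temporary copy so that the real ~/.keras/keras.json is not touched.

```python
import json
import os
import tempfile


def set_keras_backend(config_path, backend):
    """Set the "backend" field of a keras.json file and keep the other fields."""
    with open(config_path) as f:
        settings = json.load(f)
    settings["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(settings, f, indent=4)
    return settings


# Demonstration on a temporary copy of the default settings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keras.json")
    with open(path, "w") as f:
        json.dump({"image_data_format": "channels_last", "backend": "tensorflow",
                   "floatx": "float32", "epsilon": 1e-07}, f)
    print(set_keras_backend(path, "theano")["backend"])  # theano
```

To apply it to a real installation, pass os.path.expanduser("~/.keras/keras.json") as the path.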

Verify the backend.

root@instance:~# python3 -c "import keras"
	Using Theano backend.

Learn more ... https://keras.io/

Chainer installation

Before installing, make sure that your bash environment and the CUDA/cuDNN SDK are set up.

Chainer installation is also easy. However, installing cupy sometimes fails because of the pip cache. You should clear the cache or use the no-cache option. Take particular care when reinstalling.

Run the following commands to install.

sudo pip3 install cupy --no-cache-dir -vvvv
sudo pip3 install chainer --no-cache-dir -vvvv

If you need to save your model to storage and load it back, install h5py. Also install the Pillow image library if you want to recognize images.

sudo pip3 install h5py pillow

Finally, verify that it works with the GPU.

python3 -c "from chainer import cuda;cuda.check_cuda_available();"

Example: CIFAR10.

cd ~/
git clone https://github.com/chainer/chainer.git
cd ~/chainer/examples/cifar
python3 train_cifar.py
root@instance:~/chainer/examples/cifar# python3 train_cifar.py
	GPU: 0
	# Minibatch-size: 64
	# epoch: 300

	Using CIFAR10 dataset.
	epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
	1           2.32365     1.93552               0.177709       0.218352                  45.9733
	2           1.76847     1.67501               0.301476       0.345541                  85.4524
	3           1.44913     1.25012               0.455926       0.536425                  125.684
	4           1.19285     1.03128               0.579605       0.635052                  166.33
	5           1.02002     1.01944               0.647179       0.648587                  207.452
	6           0.915798    0.84697               0.692262       0.716162                  248.652
	

MXNet installation (with CUDA)

Before installing, make sure that your bash environment and the CUDA/cuDNN SDK are set up.

MXNet also supports SDK 8 and cuDNN 6. MXNet is among the faster deep learning libraries, and its API is similar to Keras. It's straightforward, and you can write code in a chained style.

Community activity and information are scarce, and the official documents are the only useful documentation.

*Currently, MXNet does not officially support SDK 9. (as of Oct 2017)

Prepare the required packages.

wget https://bootstrap.pypa.io/get-pip.py && sudo python3 get-pip.py

Install MXNet with CUDA 8.

sudo pip3 install mxnet-cu80==0.11.0

Verify that it works.

python3
	>>> import mxnet as mx
	>>> a = mx.nd.ones((2, 3))
	>>> b = a * 2 + 1
	>>> b.asnumpy()
	array([[ 3.,  3.,  3.],
	   [ 3.,  3.,  3.]], dtype=float32)
	

GET STARTED WITH GPU EATER

We offer a free trial for businesses,
so please contact info@pegara.com after making an account.