I am quite familiar with Dask distributed for CPUs. I'd like to explore a transition to running my code on GPU cores. When I submit a task to the LocalCUDACluster I get this error:
ValueError: tuple is not allowed for map key
This is my test case:
import cupy as cp
import numpy as np
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()
c = Client(cluster)
def test_function(x):
    return x + 1
sample_np = np.array([0,1])
sample_cp = cp.asarray(sample_np)
test_1 = test_function(sample_cp)
test_2 = c.submit(test_function,sample_cp)
test_2 = test_2.result()
test_1 output:
array([1, 2])
test_2 output:
distributed.protocol.core - CRITICAL - Failed to deserialize
.....
ValueError: tuple is not allowed for map key
How do I correctly distribute tasks on CUDA cores?
UPDATE:
I managed to get it working by first installing the Dask Distributed and Dask CUDA releases listed below.
However, I noticed that only 1 worker is available, even though I have 600 CUDA cores. How do I distribute individual tasks across these 600 CUDA cores? I'd like to parallelize tasks on them.
Versions:
dask 2.17.2
dask-cuda 0.13.0
cupy 7.5.0
cudf 0.13.0
msgpack-python 1.0.0
It looks like this question has an answer in the comments. I'm going to copy a response from Nick Becker:
Dask's distributed scheduler is single threaded (CPU and GPU), and Dask-CUDA uses a one worker per GPU model. This means that each task assigned to a given GPU will run serially, but that the task itself will use the GPU for parallelized computation. You may want to look at the Dask documentation and explore Dask.Array (which also supports GPU arrays).
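Following that suggestion, here is a minimal sketch of a CuPy-backed Dask array on a LocalCUDACluster (the array contents and chunk size are illustrative, not from the original question): each chunk task still runs serially on the single GPU worker, but the GPU parallelizes the arithmetic inside every chunk across its CUDA cores.
import cupy as cp
import dask.array as da
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()  # still one worker per visible GPU
client = Client(cluster)
# Chunks of this Dask array are CuPy arrays, so they stay on the GPU.
x = cp.arange(1_000_000)
dx = da.from_array(x, chunks=100_000, asarray=False)
result = (dx + 1).sum().compute()
print(result)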
Related
I recently bought a GPU (RTX 3060 Ti); before that I used to work on Google Colab (free version). I have downloaded YOLOv5 onto my local machine, set an environment variable for it, and downloaded the required dependency libraries. I ran training for 3 epochs to test my GPU with the same dataset I use on Colab, where it takes only around 30 seconds to complete (on a Tesla T4, which has around 2000 fewer CUDA cores than the RTX 3060 Ti). On my GPU, on the other hand, training kept running for around 3 hours and didn't stop, so I interrupted it.
Screenshot of Yolo in VS Code
The code I ran on my local machine is:
# !git clone https://github.com/ultralytics/yolov5 # clone
# %cd yolov5
%pip install -qr requirements.txt # install
import torch
import utils
display = utils.notebook_init() # checks
# Train YOLOv5s on COCO128 for 3 epochs
!python train.py --img 412 --batch 16 --epochs 3 --data train_data/data.yaml --weights yolov5s.pt
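As an aside, a quick sanity check before a long run (a minimal sketch, not part of the original post) is whether PyTorch actually sees the GPU; a silent fallback to the CPU would explain hours-long epochs:
import torch
# If this prints False, training runs on the CPU and will be very slow.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))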
I have a small development cluster on 3 AWS T2 machines. One machine serves as the client, one as the scheduler, and one as a worker. On all three I performed a git clone and manually installed NumPy version 1.21.0. However, when following the basic setup, the error below is produced when executing A = client.map(square, range(10)) in the Python 3.8 interpreter. How can this issue be fixed? It seems like an internal error; Dask was installed with pip on the client machine.
ubuntu#ip-172...:~$ python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask.distributed import Client
>>> client = Client('IPv4Addr:8786')
>>> client
<Client: 'tcp://172...:8786' processes=1 threads=4, memory=8.18 GB>
>>> def square(x):
... return x ** 2
...
>>> A = client.map(square, range(10))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/distributed-2021.2.0+19.g2c5d2cf8-py3.8.egg/distributed/client.py", line 1764, in map
futures = self._graph_to_futures(
File "/usr/local/lib/python3.8/dist-packages/distributed-2021.2.0+19.g2c5d2cf8-py3.8.egg/distributed/client.py", line 2542, in _graph_to_futures
dsk = dsk.__dask_distributed_pack__(self, keyset)
AttributeError: 'HighLevelGraph' object has no attribute '__dask_distributed_pack__'
To anyone having the same issue, a possible fix (the one that worked for us) would be to create a virtual environment where you will install Dask and all its dependencies.
1- Install Dask on the newly created venv
2- Produce a requirements specification with $ pip freeze > ~/requirements.txt
3- On the worker and client machines, create a venv and perform a $ pip install -r requirements.txt on said env.
This will guarantee identical environments, and hopefully prevent various issues such as the one detailed on the original question.
We had the same error message. It turned out, that we had a version mismatch between the packages dask and distributed. Somehow distributed got upgraded to 2021.3.0, while dask was still at 2020.12.0. Downgrading distributed to the old version fixed the problem.
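A quick way to catch this kind of mismatch up front is to let Dask compare package versions across the client, scheduler, and workers. A minimal sketch, assuming the cluster address from the question:
from dask.distributed import Client
client = Client('IPv4Addr:8786')
# Compares dask, distributed, msgpack, and other package versions on the
# client, scheduler, and workers, and complains if they differ.
client.get_versions(check=True)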
I built the GPU version of the docker image https://github.com/floydhub/dl-docker with Keras version 2.0.0 and TensorFlow version 0.12.1. I then ran the MNIST tutorial https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py but realized that Keras is not using the GPU. Below is the output that I have:
root@b79b8a57fb1f:~/sharedfolder# python test.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2017-09-06 16:26:54.866833: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866855: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866863: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-06 16:26:54.866876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Can anyone let me know if there are some settings that need to be made before Keras uses the GPU? I am very new to all this, so do let me know if I need to provide more information.
I have installed the prerequisites as mentioned on the page:
Install Docker following the installation guide for your platform: https://docs.docker.com/engine/installation/
I am able to launch the docker image
docker run -it -p 8888:8888 -p 6006:6006 -v /sharedfolder:/root/sharedfolder floydhub/dl-docker:cpu bash
GPU Version Only: Install Nvidia drivers on your machine either from Nvidia directly or follow the instructions here. Note that you don't have to install CUDA or cuDNN. These are included in the Docker container.
I am able to run the last step
cv@cv-P15SM:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon May 1 15:29:16 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
GPU Version Only: Install nvidia-docker: https://github.com/NVIDIA/nvidia-docker, following the instructions here. This will install a replacement for the docker CLI. It takes care of setting up the Nvidia host driver environment inside the Docker containers and a few other things.
I am able to run the step here
# Test nvidia-smi
cv@cv-P15SM:~$ nvidia-docker run --rm nvidia/cuda nvidia-smi
Thu Sep 7 00:33:06 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 780M Off | 0000:01:00.0 N/A | N/A |
| N/A 55C P0 N/A / N/A | 310MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
I am also able to run the nvidia-docker command to launch a GPU-supported image.
What I have tried
I have tried the following suggestions below
Check if you have completed step 9 of this tutorial ( https://github.com/ignaciorlando/skinner/wiki/Keras-and-TensorFlow-installation ). Note: Your file paths may be completely different inside that docker image, you'll have to locate them somehow.
I appended the suggested lines to my bashrc and have verified that the bashrc file is updated.
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda-8.0' >> ~/.bashrc
The other suggestion was to add the following commands to my Python file:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="0"
Both steps, done separately or together, unfortunately did not solve the issue. Keras is still running with the CPU version of TensorFlow as its backend. However, I might have found the possible issue. I checked the version of my TensorFlow via the following commands and found two installations.
This is the CPU version
root@08b5fff06800:~# pip show tensorflow
Name: tensorflow
Version: 1.3.0
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: tensorflow-tensorboard, six, protobuf, mock, numpy, backports.weakref, wheel
And this is the GPU version
root@08b5fff06800:~# pip show tensorflow-gpu
Name: tensorflow-gpu
Version: 0.12.1
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: mock, numpy, protobuf, wheel, six
Interestingly, the output shows that Keras is using TensorFlow version 1.3.0, which is the CPU version, and not 0.12.1, the GPU version.
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
import tensorflow as tf
print('Tensorflow: ', tf.__version__)
Output
root@08b5fff06800:~/sharedfolder# python test.py
Using TensorFlow backend.
Tensorflow: 1.3.0
I guess now I need to figure out how to have keras use the gpu version of tensorflow.
It is never a good idea to have both tensorflow and tensorflow-gpu packages installed side by side (the one single time it happened to me accidentally, Keras was using the CPU version).
I guess now I need to figure out how to have keras use the gpu version of tensorflow.
You should simply remove both packages from your system, and then re-install tensorflow-gpu [UPDATED after comment]:
pip uninstall tensorflow tensorflow-gpu
pip install tensorflow-gpu
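After reinstalling, a minimal sketch (assuming a TensorFlow 1.x backend, as in the question) to confirm the backend actually exposes a GPU:
from tensorflow.python.client import device_lib
# Keras uses whatever devices its TensorFlow backend exposes;
# a device with device_type 'GPU' should appear in this list.
print(device_lib.list_local_devices())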
Moreover, it is puzzling why you seem to use the floydhub/dl-docker:cpu container, while according to the instructions you should be using the floydhub/dl-docker:gpu one...
I had a similar kind of issue - Keras didn't use my GPU. I had tensorflow-gpu installed according to the instructions into conda, but after installing Keras it simply did not list the GPU as an available device. I realized that installing keras adds the tensorflow package! So I had both the tensorflow and tensorflow-gpu packages. I found that there is a keras-gpu package available. After completely uninstalling keras, tensorflow, and tensorflow-gpu, and then installing tensorflow-gpu and keras-gpu, the problem was solved.
In the future, you can try using virtual environments to separate tensorflow CPU and GPU, for example:
conda create --name tensorflow python=3.5
activate tensorflow
pip install tensorflow
AND
conda create --name tensorflow-gpu python=3.5
activate tensorflow-gpu
pip install tensorflow-gpu
This worked for me:
Install tensorflow v2.2.0
pip install tensorflow==2.2.0
Also remove tensorflow-gpu (if it's present)
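With TensorFlow 2.x the GPU support is included in the main package; a minimal sketch (not from the original answer) to confirm the GPU is visible:
import tensorflow as tf
# Should list at least one PhysicalDevice of type 'GPU' if the installed
# CUDA/cuDNN libraries match this TensorFlow build.
print(tf.config.list_physical_devices('GPU'))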
I'm testing a segmentation model on gcloud and the inference is incredibly slow. It takes 3 minutes to get a result (averaged over 5 runs). The same model runs in ~2.5 s on my laptop when served through tf-serving.
Is this normal? I didn't find any mention in the documentation of how to define the instance type, and it seems impossible to run inference on a GPU.
The steps I'm using are fairly straightforward and follow the examples and tutorials:
gcloud ml-engine models create "seg_model"
gcloud ml-engine versions create v1 \
--model "seg_model" \
--origin $DEPLOYMENT_SOURCE \
--runtime-version 1.2 \
--staging-bucket gs://$BUCKET_NAME
gcloud ml-engine predict --model ${MODEL_NAME} --version v1 --json-instances request.json
Update: after running more experiments I found that redirecting output to a file gets the inference time down to 27 s. The model output size is 512x512, which probably causes some delay on the client side. Although this is much lower than 3 minutes, it is still an order of magnitude slower than tf-serving.
I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change used memory?
With Spark 1.0.0+ and spark-shell or spark-submit, use the --executor-memory option. E.g.
spark-shell --executor-memory 8G ...
0.9.0 and under:
When you start a job or start the shell, change the memory. We had to modify the spark-shell script so that it would carry command line arguments through as arguments for the underlying Java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the Spark context and pass the value in as a command line argument (I can make it shorter than the long-winded system props then).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the default cluster wide, sorry, not entirely sure how to do it properly.
One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.
In Spark 1.1.1, to set the max memory of workers, write this in conf/spark-env.sh:
export SPARK_EXECUTOR_MEMORY=2G
If you have not used the config file yet, copy the template file
cp conf/spark-env.sh.template conf/spark-env.sh
Then make the change and don't forget to source it
source conf/spark-env.sh
In my case, I use an IPython notebook server to connect to Spark. I want to increase the memory for the executor.
This is what I do:
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster(CLUSTER_URL).setAppName('ipython-notebook').set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)
According to the Spark documentation you can change the memory per node with the command line argument --executor-memory while submitting your application. E.g.
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master.node:7077 \
--executor-memory 8G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
I've tested and it works.
The default configuration for a worker is to allocate host memory minus 1 GB for each worker. The configuration parameter to manually adjust that value is SPARK_WORKER_MEMORY, as in your question:
export SPARK_WORKER_MEMORY=6g