Has anyone had success in training a semantic segmentation model (FCN8s) using DIGITS (v5) in multi-GPU mode?
I can successfully train on a single GPU, but it's very slow.
When I select multiple GPUs (my workstation has 4 Titan Xp), the entire system shuts down and restarts. This seems to be a problem that others have reported in the NVIDIA DIGITS Google group.
Any insight is greatly appreciated.
Ubuntu 14.04
Caffe 0.15.13
DIGITS 5.0
Related
I'm using the latest version of ML.NET image classification in Visual Studio 2019 on a Windows 10 PC to detect inappropriate images. I was using a dataset of 3000 SFW and 3000 NSFW images to train it, but it got stuck while training. No errors are output; it just stops using the CPU and stops writing to the console.
It has often stopped randomly after a line such as:
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Bottleneck Computation, Dataset used: Train, Image Index: 1109
or
[Source=ImageClassificationTrainer; MultiClassClassifierScore; Cursor, Kind=Trace] Channel disposed
After it stops using the CPU, the training page in the Machine Learning Model Builder stays the same.
I have also tried this with a smaller dataset of 700 images for each type but ended up with similar results. What's causing this?
This may be related to the chosen training environment. Most likely you chose GPU training, but it is not supported on your setup. Choose CPU instead.
I want to train a YOLO model on my custom object dataset. I have read about it on various sites, and everybody says a GPU should be used to train and run a custom YOLO model.
But since I don't have a GPU, I am confused about what to do, because I cannot buy one for this. I also read about Google Colab, but I cannot use it, since I want to use my model on an offline system.
I am also worried after seeing the system utilization of the YOLO model used in this program from GitHub:
https://github.com/AhmadYahya97/Fully-Automated-red-light-Violation-Detection.git.
I was running this on my laptop with the following configuration:
RAM: 4 GB
Processor: Intel i3, 2.40 GHz
OS: Ubuntu 18.04 LTS
Although it will be a lot slower, yes, you can train and make predictions using only the CPU. If you are using the original Darknet framework, set the GPU flag in the Makefile to GPU=0 when building darknet.
How to install darknet: https://pjreddie.com/darknet/install/
Then you can start training or predicting by following this guide: https://pjreddie.com/darknet/yolo/
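If CPU-only prediction is the main worry (as with the linked red-light-violation project), OpenCV's dnn module can also run Darknet weights without a GPU. This is a separate route from the Darknet guide above; a minimal sketch, assuming OpenCV 4.x and placeholder cfg/weights/image paths:

    import cv2

    # Load the Darknet model and force the plain OpenCV CPU backend.
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # placeholder files
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    image = cv2.imread("frame.jpg")                                    # placeholder image
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())          # raw YOLO detections

The raw outputs still need the usual confidence filtering and non-maximum suppression before drawing boxes.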
I have a Keras model which is doing inference on a Raspberry Pi (with a camera). The Raspberry Pi has a really slow CPU (1.2 GHz) and no CUDA GPU, so the model.predict() stage is taking a long time (~20 seconds). I'm looking for ways to reduce that by as much as possible. I've tried:
Overclocking the CPU (+200 MHz), which gained a few seconds.
Using float16 instead of float32.
Reducing the image input size as much as possible.
Is there anything else I can do to increase the speed during inference? Is there a way to simplify a model.h5, accepting a drop in accuracy? I've had success with simpler models, but for this project I need to rely on an existing model, so I can't train from scratch.
The VGG16 / VGG19 architecture is very slow since it has a huge number of parameters. Check this answer.
Before any other optimization, try to use a simpler network architecture.
Google's MobileNet seems like a good candidate since it's implemented in Keras and it was designed for more limited devices.
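For illustration, a minimal sketch of instantiating a small MobileNet from tf.keras.applications; the width multiplier and input size below are example values, not something taken from the question:

    from tensorflow.keras.applications import MobileNet

    # alpha < 1.0 shrinks the network width; a smaller input size also cuts latency.
    model = MobileNet(
        weights="imagenet",         # or None, then fine-tune on your own data
        alpha=0.25,                 # width multiplier: roughly 4x fewer channels
        input_shape=(128, 128, 3),  # one of the resolutions with pretrained weights
    )
    model.summary()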
If you can't use a different network, you may compress the network with pruning. This blog post specifically covers pruning with Keras.
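As a rough sketch of that idea using the TensorFlow Model Optimization toolkit (not necessarily the exact recipe from the blog post; the sparsity target, step count, and stand-in data are placeholders):

    import numpy as np
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    model = tf.keras.models.load_model("model.h5")      # the existing model from the question

    pruning_params = {
        "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0, final_sparsity=0.5,    # prune half the weights (example value)
            begin_step=0, end_step=1000,                 # tune to the length of the fine-tuning run
        )
    }
    pruned = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
    pruned.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # Stand-in fine-tuning data; replace with real images and labels of the right shape.
    x_train = np.random.rand(32, 224, 224, 3).astype("float32")
    y_train = tf.keras.utils.to_categorical(np.random.randint(0, 2, 32), 2)

    pruned.fit(x_train, y_train, epochs=2,
               callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

    final_model = tfmot.sparsity.keras.strip_pruning(pruned)   # drop pruning wrappers before export
    final_model.save("model_pruned.h5")

Note that pruning needs a short fine-tuning pass, so at least some training data is still required even when starting from an existing model.h5.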
Maybe OpenVINO will help. OpenVINO is an open-source toolkit for network inference, and it optimizes inference performance by, e.g., graph pruning and fusing some operations. ARM support is provided by the contrib repository.
Here are the instructions on how to build an ARM plugin to run OpenVINO on Raspberry Pi.
Disclaimer: I work on OpenVINO.
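For reference, a minimal inference sketch with the OpenVINO Python runtime, assuming OpenVINO 2022.1 or newer and that the Keras .h5 has already been converted to IR with the Model Optimizer; the file names and input shape are placeholders:

    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("model.xml")           # IR converted from the Keras .h5 beforehand
    compiled = core.compile_model(model, "CPU")    # on the Pi this goes through the ARM CPU plugin

    input_tensor = np.random.rand(1, 224, 224, 3).astype(np.float32)   # stand-in input
    result = compiled([input_tensor])[compiled.output(0)]
    print(result.shape)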
I need to train a recurrent neural network as a language model, and I decided to use Keras with the Theano backend for that. Is it better to use an ordinary PC with some graphics card instead of a "cool" server machine that can't do GPU computing? Is there a boundary (given perhaps by the architecture of the NN and the amount of training data) that would separate "CPU-learnable" problems from those that can be done (in reasonable time) only by utilizing a GPU?
(I have access to an older production server in the company I work in. It has 16 cores and about 49 GB of available RAM, so I thought I was ready for training; now I am reading about the GPU optimization Theano does and I am thinking I am basically screwed without it.)
Edit
I have just come across this article, where Tomáš Mikolov states they managed to train a single-layer recurrent neural network with 1024 states in 10 days while using only 24 CPUs and no GPU.
Is there a boundary
One boundary that separates CPU from GPU suitability is memory access. If you are accessing the values from your neural network often, the CPU would do better, as it has faster access to RAM. If I'm not wrong, computing the updates (SGD, RMSProp, Adagrad, etc.) requires that the values be accessed.
A GPU would be advisable when the amount of computation is larger than the memory access, e.g. training a deep neural network.
that can be done (in reasonable time) only by utilizing gpu
Unfortunately, if you are trying to solve such a hard problem, Theano would be a bad choice, as you are constrained to running on a single machine. Try other frameworks that would allow running on multiple CPU and GPU across machines, such as Microsoft CNTK or Google TensorFlow.
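As a rough illustration with current TensorFlow and Keras (not the Theano setup from the question), data-parallel training across machines can look like this; the cluster addresses and model sizes are placeholders:

    import json
    import os

    import tensorflow as tf

    # TF_CONFIG must be set on every worker; the addresses below are placeholders.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {"worker": ["host1:12345", "host2:12345"]},
        "task": {"type": "worker", "index": 0},    # index 1 on the second machine
    })

    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():                         # variables are mirrored across workers
        model = tf.keras.Sequential([
            tf.keras.layers.Embedding(10000, 128),                  # example vocabulary size
            tf.keras.layers.LSTM(256),
            tf.keras.layers.Dense(10000, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(dataset) then runs data-parallel across the listed workers.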
thinking I am basically screwed
The difference (it may be a speed-up or a slow-down) won't be that big, depending on the neural network. Plus, running the neural network computation on your own machine can get in the way of your work. So you are probably better off using that extra server and making it useful.
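If the 16-core server does get used, it is worth making sure Theano actually uses those cores; a minimal sketch, with the flag values only as examples:

    import os

    # Set before importing theano; OMP_NUM_THREADS controls BLAS/OpenMP threads.
    os.environ["OMP_NUM_THREADS"] = "16"
    os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32,openmp=True"

    import theano
    print(theano.config.device, theano.config.openmp)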
I am using the GoogleNet model for binary classification of images. Earlier I was using a virtual machine, and now I am using Ubuntu 14.04. The two give me different results. I have tried hard to find where the problem is but couldn't pinpoint it.
I have trained two models separately, one on Ubuntu 14.04 and another in the virtual machine. Both models are trained on the CPU. cuDNN is not being used in either. Regarding the BLAS library, I am using the default ATLAS.
Any suggestions would be of great help.
Since you started your training from scratch in both cases and you did not explicitly fix the random_seed parameter in your solver.prototxt, it is very likely that caffe initialized your model with different random weights for each of the two training processes. Starting from different points is very likely to end with differently trained models.
If you are concerned about possible differences in caffe between the two architectures, try repeating the training but with the same random_seed parameter set in solver.prototxt.
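As an illustration, one way to set the seed without hand-editing the file is to rewrite the solver definition through pycaffe's protobuf messages; the file names and seed value here are arbitrary:

    from caffe.proto import caffe_pb2
    from google.protobuf import text_format

    solver = caffe_pb2.SolverParameter()
    with open("solver.prototxt") as f:              # the existing solver definition
        text_format.Merge(f.read(), solver)

    solver.random_seed = 831486                     # any fixed value; use the same one on both machines
    with open("solver_seeded.prototxt", "w") as f:
        f.write(text_format.MessageToString(solver))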