I'm using the latest version of ML.NET image classification in Visual Studio 2019 on a Windows 10 PC to detect inappropriate images. I was using a dataset of 3000 SFW and 3000 NSFW images to train it, but it got stuck while training. No errors are output; it just stops using the CPU and stops writing to the console.
It often stops at a random point after a line such as:
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Bottleneck Computation, Dataset used: Train, Image Index: 1109
or
[Source=ImageClassificationTrainer; MultiClassClassifierScore; Cursor, Kind=Trace] Channel disposed
After it stops using the CPU, the training page in the Machine Learning Model Builder remains unchanged.
I have also tried this with a smaller dataset of 700 images for each type but ended up with similar results. What's causing this?
This may be related to the chosen learning environment. Most likely you selected GPU training, but it is not supported in your setup; choose CPU instead.
I don't have GPU support, so my models often take hours to train. Can I train my model in stages? For example, if I want 100 epochs but a power cut stops training at the 50th epoch, I would like to resume from where it left off (the 50th epoch) when I retrain, rather than starting over.
It would be much appreciated if anyone could explain this with an example.
The weights are already updated; retraining the model with the updated weights, without reinitializing them, will carry on from where it left off.
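As a concrete illustration, here is a minimal sketch of checkpoint-and-resume training. It assumes Keras, which the question does not specify, and uses placeholder data and a tiny placeholder model:

    import numpy as np
    from tensorflow import keras

    # Placeholder data and model standing in for the real ones.
    x_train = np.random.rand(256, 10)
    y_train = np.random.rand(256, 1)
    model = keras.Sequential([keras.layers.Dense(32, activation="relu"),
                              keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    # Save the full model (weights + optimizer state) after every epoch.
    ckpt = keras.callbacks.ModelCheckpoint("checkpoint.h5")

    # First run: suppose this is interrupted around epoch 50 by a power cut.
    model.fit(x_train, y_train, epochs=100, callbacks=[ckpt])

    # After restarting: reload the checkpoint and continue from epoch 50
    # instead of reinitializing the weights.
    model = keras.models.load_model("checkpoint.h5")
    model.fit(x_train, y_train, epochs=100, initial_epoch=50, callbacks=[ckpt])

Because the saved checkpoint includes the optimizer state as well as the weights, the second fit call picks up training rather than starting from scratch.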
If you have resource problems, you can work with online notebooks such as Google Colab or Microsoft Azure Notebooks. They offer a good working environment; Colab, for example, has GPUs and TPUs enabled and about a 16 GB RAM limit.
Has anyone had success in training a semantic segmentation model (FCN8s) using DIGITS (v5) in multi-GPU mode?
I can successfully train on a single GPU, but it's very slow.
When I select multiple GPUs (my workstation has 4 Titan Xp), the entire system shuts down and restarts. This seems to be a problem that others have reported in the nVidia DIGITS Google group.
Any insight is greatly appreciated.
Ubuntu 14.04
Caffe 0.15.13
DIGITS 5.0
I am a newbie in deep learning with TensorFlow. I am trying out some seq2seq sample code.
I wanted to understand:
What are the minimum number of layers, layer size, and batch size I could start with to test the seq2seq model with satisfactory accuracy?
Also, what is the minimum infrastructure setup required, in terms of memory and CPU capability, to train this deep learning model within a few hours at most?
My experience so far: training a seq2seq network with 2 layers of size 900 and batch size 4 took around 3 days on a 4 GB RAM, 3 GHz Intel i5 single-core processor, and around 1 day on an 8 GB RAM, 3 GHz Intel i5 single-core processor.
Which helps the most for faster training: more RAM, multiple CPU cores, or a CPU + GPU combination?
Disclaimer: I'm also new, and could be wrong on a lot of this.
"I am a newbie in deep learning with TensorFlow. I am trying out some seq2seq sample code. I wanted to understand: what are the minimum number of layers, layer size, and batch size I could start with to test the seq2seq model with satisfactory accuracy?"
I think this will just have to come down to your own experimentation: find out what works for your data set. I have heard a few pieces of advice: don't design your own architecture if you can avoid it; find one that is tried and tested. Deeper networks seem to be better than wider ones if you have to choose between the two. Bigger batch sizes also seem better if you have the memory. I've also heard to maximize network size and then regularize so you don't overfit.
I have the impression these are big questions that no one really knows the answer to (could be very wrong about this!). We'd all love a smart way of choosing layer size / number of layers, but no one knows exactly how changing these things affects training.
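For a concrete starting point, here is a minimal sketch of a small encoder-decoder seq2seq model. It assumes the Keras API shipped with TensorFlow (the post does not say which sample code is being used), and the vocabulary sizes and latent dimension are made-up placeholders kept deliberately small:

    from tensorflow import keras
    from tensorflow.keras import layers

    num_encoder_tokens = 64   # placeholder vocabulary sizes
    num_decoder_tokens = 64
    latent_dim = 128          # start small; grow only if the model underfits

    # Encoder: read the source sequence and keep only its final LSTM state.
    encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
    _, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

    # Decoder: generate the target sequence conditioned on the encoder state.
    decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
    decoder_outputs = layers.LSTM(latent_dim, return_sequences=True)(
        decoder_inputs, initial_state=[state_h, state_c])
    decoder_outputs = layers.Dense(num_decoder_tokens,
                                   activation="softmax")(decoder_outputs)

    model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    model.summary()

A configuration around this size, with a modest batch size such as 32 or 64, should at least train end to end on a CPU so you can verify the pipeline before scaling up.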
"Also, what is the minimum infrastructure setup required, in terms of memory and CPU capability, to train this deep learning model within a few hours at most?"
Depending on your model, that could be an unreasonable request. Seems like some models train for hundreds if not thousands of hours (on GPUs).
"My experience so far: training a seq2seq network with 2 layers of size 900 and batch size 4 took around 3 days on a 4 GB RAM, 3 GHz Intel i5 single-core processor, and around 1 day on an 8 GB RAM, 3 GHz Intel i5 single-core processor. Which helps the most for faster training: more RAM, multiple CPU cores, or a CPU + GPU combination?"
I believe a GPU will help you the most. I have seen some work that relies on CPUs (asynchronous actor-critic methods, if I remember right, which don't use locking) where the CPU seemed to come out ahead, but I think a GPU will give you huge speedups.
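If you do get access to a GPU, it is worth verifying that your framework actually sees it before comparing training times. A quick check, assuming a TensorFlow 2.x install (the version in your setup may differ):

    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        print(f"{len(gpus)} GPU(s) visible to TensorFlow:", gpus)
    else:
        print("No GPU visible; training will run on the CPU.")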
I need to train a recurrent neural network as a language model and I decided to use keras with theano backend for that. Is it better to use an ordinary PC with some graphics card instead of a "cool" server machine that can't do gpu computing? Is there a boundary (given perhaps by the architecture of the NN and amount of the training data) that would separate "cpu-learnable" problems from those that can be done (in reasonable time) only by utilizing gpu?
(I have access to an older production server in the company I work for. It has 16 cores and about 49 GB of available RAM, so I thought I was ready for training. Now I am reading about the GPU optimization Theano does, and I am thinking I am basically screwed without it.)
Edit
I have just come across this article, where Tomáš Mikolov states they managed to train a single-layer recurrent neural network with 1024 states in 10 days while using only 24 CPUs and no GPU.
"Is there a boundary"
One boundary that separates CPU from GPU territory is memory access. If you access the values of your neural network often, the CPU does better, as it has faster access to RAM. If I'm not wrong, computing the updates (SGD, RMSProp, Adagrad, etc.) requires that those values be accessed.
A GPU is advisable when the amount of computation is large relative to the memory access, e.g. when training a deep neural network.
"that can be done (in reasonable time) only by utilizing gpu"
Unfortunately, if you are trying to solve such a hard problem, Theano would be a bad choice, as you are constrained to running on a single machine. Try frameworks that allow running on multiple CPUs and GPUs across machines, such as Microsoft CNTK or Google TensorFlow.
"thinking I am basically screwed"
The difference (whether a speed-up or a slow-down) won't be that big; it depends on the neural network. Plus, running the neural network computation on your own machine can get in the way of your work. So you are probably better off using that extra server and making it useful.
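If you do stick with that 16-core server, here is a minimal sketch of pointing Keras' Theano backend at the CPU and letting it use multiple cores. The flag names are taken from older Theano releases and should be treated as assumptions that may vary by version:

    import os

    # Theano reads THEANO_FLAGS when it is imported, so set everything first.
    os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32,openmp=True"
    # Let Theano's OpenMP-enabled ops use the server's 16 cores.
    os.environ["OMP_NUM_THREADS"] = "16"
    # Tell Keras to use the Theano backend rather than TensorFlow.
    os.environ["KERAS_BACKEND"] = "theano"

    import keras  # imported only after the environment is configured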
I have a large dataset and I'm trying to build a DAgger classifier for it.
As you know, at training time I need to run the initially learned classifier on the training instances (predict them), one instance at a time.
LibSVM is too slow even for the initial learning.
I'm using OLL, but it needs each instance to be written to a file, after which I run the test code on it to get the prediction; this involves a lot of disk I/O.
I have considered working with vowpal_wabbit (though I'm not sure it will help with the disk I/O), but I don't have permission to install it on the cluster I'm working on.
Liblinear is too slow and, I believe, also needs disk I/O.
What are the other alternatives I can use?
I recommend trying Vowpal Wabbit (VW). If Boost (and gcc or clang) is installed on the cluster, you can simply compile VW yourself (see the Tutorial). If Boost is not installed, you can compile Boost yourself as well.
VW contains more modern algorithms than OLL. Moreover, it includes several structured prediction algorithms (SEARN, DAgger) as well as C++ and Python interfaces. See the iPython notebook tutorial.
As for the disk I/O: for one-pass learning, you can pipe the input data directly to vw (cat data | vw) or run vw --daemon. For multi-pass learning, you must use a cache file (the input data in a binary, fast-to-load format), which takes some time to create during the first pass (unless it already exists), but subsequent passes are much faster because of the binary format.
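To illustrate the piping idea without the intermediate files that OLL required, here is a minimal sketch that streams training examples to vw from Python over a pipe. It assumes vw is on the PATH and uses a hypothetical generate_examples() iterator that yields lines already in VW's input format:

    import subprocess

    def train_streaming(model_path="model.vw"):
        # -f saves the final regressor so it can be reloaded later with -i.
        proc = subprocess.Popen(["vw", "-f", model_path],
                                stdin=subprocess.PIPE, text=True)
        for example in generate_examples():      # hypothetical data source
            proc.stdin.write(example + "\n")     # e.g. "1 | f1:0.23 f2:0.25"
        proc.stdin.close()
        proc.wait()

For the one-instance-at-a-time predictions DAgger needs, vw --daemon keeps a trained model loaded and answers prediction requests over a TCP socket, which avoids both process start-up cost and disk round trips.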