Using Python with ML - machine-learning

I don't have GPU support, so my model often takes hours to train. Can I train my model in stages? For example, if I want to train for 100 epochs but a power cut stops training at the 50th epoch, can I resume training from where it left off (the 50th epoch) when I restart?
It would be much appreciated if anyone could explain this with an example.

The weights are already updated; retraining the model with the updated weights, without reinitializing them, will carry on from where it left off.
If you have resource problems, you can also work with online notebooks such as Google's Colab or Microsoft Azure Notebooks. They offer a good working environment; Colab, for example, has GPU and TPU support and about a 16 GB RAM limit.
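If you want to survive interruptions like a power cut, the usual trick is to checkpoint after every epoch and resume from the last checkpoint. Below is a minimal sketch with tf.keras (the model, dummy data and file names are placeholders made up for illustration): the full model, including the optimizer state, is saved each epoch, and initial_epoch tells fit() to continue the epoch count from where training stopped.

    # Minimal checkpoint-and-resume sketch with tf.keras (placeholder model and data).
    import glob, os
    import numpy as np
    import tensorflow as tf

    os.makedirs("checkpoints", exist_ok=True)
    CKPT_PATTERN = "checkpoints/model_epoch_{epoch:03d}.keras"

    # Dummy data so the sketch runs; substitute your own dataset.
    x_train = np.random.rand(256, 20).astype("float32")
    y_train = np.random.rand(256, 1).astype("float32")

    def build_model():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(20,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        return model

    # Save the full model (weights + optimizer state) at the end of every epoch.
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(CKPT_PATTERN, save_freq="epoch")

    ckpts = sorted(glob.glob("checkpoints/model_epoch_*.keras"))
    if ckpts:
        # Interrupted at epoch 50? Reload the latest checkpoint and carry on from there.
        model = tf.keras.models.load_model(ckpts[-1])
        start_epoch = int(ckpts[-1].split("_")[-1].split(".")[0])
    else:
        model = build_model()
        start_epoch = 0

    # initial_epoch resumes the epoch counter instead of restarting at 0.
    model.fit(x_train, y_train, epochs=100, initial_epoch=start_epoch,
              callbacks=[checkpoint_cb])

If you train with plain PyTorch instead, the equivalent is to save a dict with the model and optimizer state_dicts plus the epoch number, and load it before continuing.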

Related

Why does Google Vision take so much time to train the model?

I have been learning GCP and how to use Google Vision to train, test, and evaluate a model on an image data set. I have been testing the model with just 60 images, split as shown in the image. The estimated time to train has been 60-90 minutes. I have been wondering what happens in the backend that it takes so much time to train.
As I understand it, if you do not provide enough images, the algorithm will tend to converge to a heuristic solution. This may take more time due to the low statistical significance of the ingested training data, especially per label, given your warning...

Multi GPU training for Transformers with different GPUs

I want to fine-tune a GPT-2 model using Hugging Face's Transformers, preferably the medium model but the large one if possible. Currently I have an RTX 2080 Ti with 11 GB of memory and I can train the small model just fine.
My question is: will I run into any issues if I add an old Tesla K80 (24 GB) to my machine and distribute the training across both? I cannot find information about using GPUs of different capacities during training or the issues I could run into.
Will my model size limit essentially be the sum of all available GPU memory (35 GB)?
I’m not interested in doing this in AWS.
You already solved your problem. That's great. I would like to point out a different approach and address a few questions.
Will my model size limit essentially be the sum of all available GPU memory? (35GB?)
This depends on the training technique you use. Standard data parallelism replicates the model, the gradients and the optimizer states to each of the GPUs, so each GPU must have enough memory to hold all of them. The data is split across the GPUs. However, the bottleneck is usually the optimizer states and the model, not the data.
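As a rough illustration of that standard setup (my own sketch, not something from the answer), here is a tiny PyTorch example using nn.DataParallel: each GPU gets a full replica of the model, and every batch is split across the cards, so each card still has to fit the whole model plus its optimizer state. The model and data are placeholders.

    # Standard data parallelism: full model replica per GPU, batch split across GPUs.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

    if torch.cuda.device_count() > 1:
        # e.g. an RTX 2080 Ti plus a K80; the smaller/slower card sets the limit
        model = nn.DataParallel(model)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(64, 512).to(device)       # dummy batch, scattered across the GPUs
    y = torch.randint(0, 10, (64,)).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                           # gradients are reduced onto the primary device
    optimizer.step()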
The state-of-the-art approach to training is ZeRO. Not only the dataset but also the model parameters, the gradients and the optimizer states are split across the GPUs. This allows you to train huge models without hitting OOM. There is a nice illustration in the paper: the baseline is the standard case mentioned above, and the authors progressively shard the optimizer states, the gradients and the model parameters across the GPUs and compare the memory usage per GPU.
The authors of the paper created a library called DeepSpeed, and it is very easy to integrate with Hugging Face. With that I was able to increase my model size from 260 million to 11 billion parameters :)
If you want to understand in detail how it works, here is the paper:
https://arxiv.org/pdf/1910.02054.pdf
More information on integrating DeepSpeed with Hugging Face can be found here:
https://huggingface.co/docs/transformers/main_classes/deepspeed
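To give an idea of what the integration looks like, here is a minimal sketch (my own, not from the docs above) using the transformers Trainer: a ds_config.json file, a name I chose for illustration, holds the ZeRO settings, and passing it through TrainingArguments hands the optimizer and gradient sharding over to DeepSpeed. It is normally launched with the DeepSpeed launcher, e.g. deepspeed train.py.

    # Sketch: fine-tuning GPT-2 medium with DeepSpeed ZeRO via the Trainer API.
    # Assumes transformers and deepspeed are installed, and that ds_config.json
    # contains the ZeRO settings, e.g. {"zero_optimization": {"stage": 2},
    # "train_micro_batch_size_per_gpu": "auto"}.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast, Trainer, TrainingArguments

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
    tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

    # Tiny toy corpus so the sketch is self-contained; use your real tokenized data.
    enc = tokenizer(["hello world"] * 32, return_tensors="pt", padding=True)

    class ToyDataset(torch.utils.data.Dataset):
        def __len__(self):
            return enc["input_ids"].size(0)
        def __getitem__(self, i):
            ids = enc["input_ids"][i]
            return {"input_ids": ids,
                    "attention_mask": enc["attention_mask"][i],
                    "labels": ids.clone()}           # language-modelling labels

    args = TrainingArguments(
        output_dir="gpt2-medium-finetuned",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        fp16=True,                   # mixed precision, usually paired with ZeRO
        deepspeed="ds_config.json",  # this one argument enables the DeepSpeed engine
    )

    Trainer(model=model, args=args, train_dataset=ToyDataset()).train()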
PS: There is also the model parallelism technique, in which each GPU trains different layers of the model, but it has lost popularity and is not widely used any more.

Get training data little by little

I am working on CIFAR-10 with Azure ML, but learning takes too much time because there is so much data. TensorFlow has a next_batch function to get training data little by little. I would like to do the same in Azure ML. How can I get data little by little and speed up learning per epoch?
Incremental training of DNNs is not supported in Azure ML Studio. I suggest taking a look at Azure ML Workbench, which gives you programmatic access to the algorithms to do minibatch training.
See here: https://learn.microsoft.com/en-us/azure/machine-learning/preview/how-to-use-gpu
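Outside of Azure-specific tooling, the generic TensorFlow way to feed data little by little is a tf.data input pipeline, roughly like the sketch below (the batch size and the tiny model are arbitrary choices for illustration).

    # Streaming CIFAR-10 in minibatches with tf.data instead of loading it all at once.
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
    x_train = x_train.astype("float32") / 255.0

    train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
                .shuffle(10_000)                 # shuffle a buffer of examples
                .batch(128)                      # hand the model 128 images at a time
                .prefetch(tf.data.AUTOTUNE))     # overlap data prep with training

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # fit() now pulls one batch at a time rather than the whole dataset.
    model.fit(train_ds, epochs=5)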

How intensive is training a machine learning algorithm?

I'd like to make an app using iOS's new Core ML framework that does image recognition. To do so I'd probably have to train my own model, and I'm wondering how much data and compute power that would require. Is it something I could feasibly accomplish on a dual-core i5 MacBook Pro using Google Images for source data, or would it be much more involved?
It depends on what sort of images you want to train your model to recognize.
What is often done is fine-tuning an existing model: you take a pretrained network such as Inception-v3, replace its final layer with your own, and train only that last layer on your own images.
You still need a fair number of training images (a few hundred per category, though more is better), but you can do this on your MacBook Pro in anywhere between 30 minutes and a few hours.
TensorFlow comes with a script that makes this really easy, and Keras has a great blog post on how to do it. I used the TensorFlow script to retrain Inception-v3 to tell apart my two cats from 50 or so images of each cat.
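In case it helps, here is a rough tf.keras sketch of that fine-tuning approach (my own illustration, not the script mentioned above): Inception-v3 is loaded with its ImageNet weights and frozen, and only a new final layer is trained on your images. NUM_CLASSES and the data directory are placeholders.

    # Transfer learning: frozen Inception-v3 feature extractor plus a new last layer.
    import tensorflow as tf

    NUM_CLASSES = 2  # e.g. two cats

    base = tf.keras.applications.InceptionV3(include_top=False,
                                             weights="imagenet",
                                             input_shape=(299, 299, 3))
    base.trainable = False  # freeze the pretrained layers

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(299, 299, 3)),
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Inception expects [-1, 1]
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # the new last layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Expects labelled images in subfolders, e.g. data/cat_a/ and data/cat_b/.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data", image_size=(299, 299), batch_size=32)
    model.fit(train_ds, epochs=5)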
If you want to train from scratch you probably want to do this in the cloud using AWS, Google's Cloud ML Engine, or something easy like FloydHub.

Choice of infrastructure for faster deep learning model training with tensorflow?

I am a newbie in deep learning with TensorFlow. I am trying out a seq2seq model sample code.
I wanted to understand:
What are the minimum values for the number of layers, layer size, and batch size that I could start off with to be able to test the seq2seq model with satisfactory accuracy?

Also, what is the minimum infrastructure setup required, in terms of memory and CPU capability, to train this deep learning model within a maximum of a few hours?
My experience has been that training a seq2seq model to build a neural network with 2 layers of size 900 and batch size 4:

took around 3 days to train on a 4 GB RAM, 3 GHz Intel i5 single-core processor;

took around 1 day to train on an 8 GB RAM, 3 GHz Intel i5 single-core processor.
Which helps the most for faster training: more RAM capacity, multiple CPU cores, or a CPU + GPU combination?
Disclaimer: I'm also new, and could be wrong on a lot of this.
I am a newbie in deep learning with TensorFlow. I am trying out a seq2seq model sample code. I wanted to understand: what are the minimum values for the number of layers, layer size, and batch size that I could start off with to be able to test the seq2seq model with satisfactory accuracy?
I think this will just have to come down to your own experimentation: find out what works for your data set. I have heard a few pieces of advice: don't pick your own architecture if you can avoid it; find someone else's that is tried and tested. Deeper networks seem to be better than wider ones if you have to choose between the two. I also think bigger batch sizes are better if you have the memory. I've also heard to maximize network size and then regularize so you don't overfit.
I have the impression these are big questions that no one really knows the answer to (I could be very wrong about this!). We'd all love a smart way of choosing layer size and the number of layers, but no one knows exactly how changing these things affects training.
Also, the minimum infrastructure setup required in terms of memory and CPU capability to train this deep learning model within a max time of a few hours.
Depending on your model, that could be an unreasonable request. Seems like some models train for hundreds if not thousands of hours (on GPUs).
My experience has been training a seq2seq model to build a neural network with 2 layers of size 900 and batch size 4: it took around 3 days to train on a 4 GB RAM, 3 GHz Intel i5 single-core processor, and around 1 day on an 8 GB RAM, 3 GHz Intel i5 single-core processor. Which helps the most for faster training: more RAM capacity, multiple CPU cores, or a CPU + GPU combination?
I believe a GPU will help you the most. I have seen some work that runs entirely on CPUs (asynchronous actor-critic, or something like that, without locking) where the CPU seemed better, but I think a GPU will give you huge speedups.
