Iterations in DL4J? - deeplearning4j

From here and other sources, TotalIterations = TotalExamples / BatchSize * Epochs. I have a CNN with 54 layers in total, 36000 examples, a batch size of 12, and 3 epochs. From the formula the total should be 9000 iterations, but the training UI reports 3 x 9000 = 27000 at the end.
I tried a smaller dataset with the formula again and, lo and behold, it's 3 times whatever the calculation gives for the total iterations.
I don't understand where the 3 is coming from. Is it because my images have 3 channels (RGB), so it's seeing each channel as an example? (I wonder why, if this is the case, since CNNs are supposed to consume data in volumes.)
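For reference, a quick sanity check of the formula quoted in the question, using the numbers from the post (plain Python, just the arithmetic):
# TotalIterations = TotalExamples / BatchSize * Epochs, as quoted above
total_examples = 36000
batch_size = 12
epochs = 3
iterations_per_epoch = total_examples // batch_size        # 3000
expected_total_iterations = iterations_per_epoch * epochs  # 9000
print(iterations_per_epoch, expected_total_iterations)
# The training UI reports 27000, i.e. 3x this expected value.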

Related

Why is max_batches independent of the size of the dataset?

I am wondering why the number of images has no influence on the number of iterations when training. Here is an example to make my question clearer:
Suppose we have 6400 images for a training run to recognize 4 classes. Based on AlexeyAB's explanations, we keep batch = 64, subdivisions = 16, and write max_batches = 8000 since max_batches is determined by #classes x 2000.
Since we have 6400 images, a complete epoch requires 100 iterations. Therefore this training ends after 80 epochs.
Now, suppose that we have 12800 images. In that case, an epoch needs 200 iterations. Therefore the training ends after 40 epochs.
Since an epoch refers to one cycle through the full training dataset, I'm wondering why we don't increase the number of iterations when our dataset increases, in order to keep the number of epochs constant.
Said differently, I'm asking for a simple explanation as to why the number of epochs seems to be irrelevant to the quality of the training. I feel that it's a consequence of Yolo's construction but I am not knowledgeable enough to understand how.
Why does the number of images have no influence on the number of iterations during training?
In darknet YOLO, the number of iterations depends on the max_batches parameter in the .cfg file. After running for max_batches iterations, darknet saves the final weights.
In each epoch, all the data samples are passed through the network, so if you have many images, the training time for one epoch (and iteration) will be higher; you can test that by increasing the number of images in your data.
The subdivisions parameter accounts for the number of mini-batches. Let's say you have 100 images in your dataset, your batch size is 10, subdivisions is 2, and max_batches is 20.
So, in each iteration, 10 images are passed to the network in two mini-batches (each having 5 samples); once you have done 20 batches (20*10 data samples), the training is complete. (The details can be a little different; I'm using a slightly modified darknet from the original author pjreddie.)
The instructions have been updated now: max_batches is equal to classes*2000, but not less than the number of training images and not less than 6000. Please find it at this link.
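As a rough illustration of that updated rule and the resulting epoch count (a minimal sketch; the function name is mine, not part of darknet):
def yolo_max_batches(num_classes, num_train_images):
    # Rule quoted above: classes * 2000, but not less than the number of
    # training images and not less than 6000.
    return max(num_classes * 2000, num_train_images, 6000)

# Example from the question: 4 classes, 6400 images, batch = 64
max_batches = yolo_max_batches(4, 6400)                  # 8000
iterations_per_epoch = 6400 // 64                        # 100
print(max_batches, max_batches / iterations_per_epoch)   # 8000 iterations, 80.0 epochs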

How to calculate the number of connections in a neural network

I have exactly this scenario and I need to know how many connections this network has. I searched in several places and I'm not sure of the answer; that is, I do not know how to calculate the number of connections in my network, and this is still unclear to me. What I have is exactly as follows:
(There is a bias in all layers except the input.)
Input: 784
First hidden layer - Output: 400
Second hidden layer - Output: 200
Output layer - Output: 10
I would calculate this as follows: ((784 * 400) + bias) + ((400 * 200) + bias) + ((200 * 10) + bias) = XXX
I do not know if this is correct. I need help figuring out how to solve this, and if it's not just something mathematical, what's the theory to do this calculation?
Thank you.
Your calculation is correct for the total number of weights. When you have n neurons connected to m neurons, the number of connections between them is n*m. You can see this by drawing a small graph, say 3 neurons connected to 4 neurons: you will see that there are 12 connections between the two layers. So if you want connections rather than weights, just drop the '+ bias' parts of your equation.
If you want total weights, then the number is simply (n*m+m) since you get 1 bias weight for each of the m neurons in the second layer.
Total connections in that neural network: (784*400)+(400*200)+(200*10)
Total weights in that neural network: (784*400+400)+(400*200+200)+(200*10+10)
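For completeness, the same arithmetic in a few lines of Python (a sketch, assuming the 784-400-200-10 layout from the question):
layer_sizes = [784, 400, 200, 10]
# Connections: n*m per pair of adjacent layers (no bias terms).
connections = sum(n * m for n, m in zip(layer_sizes, layer_sizes[1:]))
# Weights: n*m plus one bias weight per neuron in the receiving layer.
weights = sum(n * m + m for n, m in zip(layer_sizes, layer_sizes[1:]))
print(connections)  # 395600
print(weights)      # 396210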

Classification with an imbalanced dataset using multi-layer perceptrons

I am having trouble with a classification problem.
I have almost 400k vectors in the training data with two labels, and I'd like to train an MLP that classifies the data into two classes.
However, the dataset is very imbalanced: 95% of the examples have label 1, and the rest have label 0. The accuracy grows as training progresses, and stops after reaching 95%. I guess this is because the network predicts label 1 for all vectors.
So far, I have tried dropout layers with 0.5 probability, but the result is the same. Is there any way to improve the accuracy?
I think the best way to deal with unbalanced data is to use class weights. For example, you can weight your classes such that the sum of the weights for each class is equal.
import pandas as pd
# Toy dataset: 2 samples of class 0 and 5 samples of class 1.
df = pd.DataFrame({'x': range(7),
                   'y': [0] * 2 + [1] * 5})
# Weight each sample by len(df) / 2 / count(class), so that each class's weights sum to the same total.
df['weight'] = df['y'].map(len(df) / 2 / df['y'].value_counts())
print(df)
# Named aggregation replaces the dict form, which is removed in recent pandas.
print(df.groupby('y')['weight'].agg(samples='count', weight='sum'))
output:
x y weight
0 0 0 1.75
1 1 0 1.75
2 2 1 0.70
3 3 1 0.70
4 4 1 0.70
5 5 1 0.70
6 6 1 0.70
samples weight
y
0 2.0 3.5
1 5.0 3.5
You could try another classifier on a subset of the examples. SVMs may work well with small data, so you could take, say, only 10k examples, with a 5/1 class proportion.
You could also oversample the small class somehow and under-sample the other.
You can also simply weight your classes.
Think also about a proper metric. It's good that you noticed that your model predicts only one label; that is, however, not easily seen using accuracy.
Some nice ideas about unbalanced dataset here:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
Remember not to change your test set.
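If you prefer not to compute class weights by hand, scikit-learn has a helper for the "balanced" weighting scheme described above (a sketch, assuming scikit-learn is installed; the 95/5 split mirrors the question):
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels with the same 95% / 5% imbalance as in the question.
y = np.array([1] * 95 + [0] * 5)
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # {0: 10.0, 1: ~0.526}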
That's a common situation: the network learns a constant and can't get out of this local minimum.
When the data is very unbalanced, as in your case, one possible solution is a weighted cross-entropy loss function. For instance, in TensorFlow, you can apply the built-in tf.nn.weighted_cross_entropy_with_logits function. There is also a good discussion of this idea in this post.
But I should say that getting more data to balance both classes (if that's possible) will always help.
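A minimal sketch of that weighted-loss idea (TF 2.x; the tensors and the pos_weight value are illustrative, not from the post):
import tensorflow as tf

# Toy labels and logits for a small batch.
labels = tf.constant([[1.0], [0.0], [1.0], [1.0]])
logits = tf.constant([[2.0], [-1.0], [0.5], [3.0]])
# pos_weight multiplies the loss contribution of the positive (label 1) examples.
# In the question's setup the rare class is label 0, so one would either swap the
# labels or pick pos_weight < 1; the value 5.0 here is purely illustrative.
loss = tf.nn.weighted_cross_entropy_with_logits(labels=labels, logits=logits, pos_weight=5.0)
print(tf.reduce_mean(loss))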

What is the difference between steps and epochs in TensorFlow?

In most models, there is a steps parameter indicating the number of steps to run over the data. Yet in most practical usage, we also execute the fit function for N epochs.
What is the difference between running 1000 steps with 1 epoch and running 100 steps with 10 epochs? Which one is better in practice? Does any logic change between consecutive epochs? Data shuffling?
A training step is one gradient update. In one step batch_size examples are processed.
An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10 an epoch consists of:
2,000 images / (10 images / step) = 200 steps.
If you choose your training images randomly (and independently) in each step, you normally do not call it an epoch. [This is where my answer differs from the previous one. Also see my comment.]
An epoch usually means one iteration over all of the training data. For instance if you have 20,000 images and a batch size of 100 then the epoch should contain 20,000 / 100 = 200 steps. However I usually just set a fixed number of steps like 1000 per epoch even though I have a much larger data set. At the end of the epoch I check the average cost and if it improved I save a checkpoint. There is no difference between steps from one epoch to another. I just treat them as checkpoints.
People often shuffle around the data set between epochs. I prefer to use the random.sample function to choose the data to process in my epochs. So say I want to do 1000 steps with a batch size of 32. I will just randomly pick 32,000 samples from the pool of training data.
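A small sketch of that sampling scheme (plain Python; the index-based bookkeeping is my choice, not from the answer):
import random

# Suppose the training pool has 200,000 examples and we want a 1000-step
# "epoch" with batch_size = 32, i.e. 32,000 randomly chosen samples.
pool_size = 200_000
steps = 1000
batch_size = 32
indices = random.sample(range(pool_size), steps * batch_size)  # sample without replacement
batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
print(len(batches), len(batches[0]))  # 1000 32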
As I am currently experimenting with the tf.estimator API, I would like to add my new findings here, too. I don't know yet whether the usage of the steps and epochs parameters is consistent throughout TensorFlow, so for now I am only referring to tf.estimator (specifically tf.estimator.LinearRegressor).
Training steps defined by num_epochs: steps not explicitly defined
estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
train_input = tf.estimator.inputs.numpy_input_fn({'x':x_train},y_train,batch_size=4,num_epochs=1,shuffle=True)
estimator.train(input_fn=train_input)
Comment: I have set num_epochs=1 for the training input and the doc entry for numpy_input_fn tells me "num_epochs: Integer, number of epochs to iterate over data. If None will run forever.". With num_epochs=1 in the above example the training runs exactly x_train.size/batch_size times/steps (in my case this was 175000 steps as x_train had a size of 700000 and batch_size was 4).
Training steps defined by num_epochs: steps explicitly defined higher than number of steps implicitly defined by num_epochs=1
estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
train_input = tf.estimator.inputs.numpy_input_fn({'x':x_train},y_train,batch_size=4,num_epochs=1,shuffle=True)
estimator.train(input_fn=train_input, steps=200000)
Comment: num_epochs=1 in my case means 175,000 steps (x_train.size / batch_size with x_train.size = 700,000 and batch_size = 4), and this is exactly the number of steps estimator.train ran, even though the steps parameter was set to 200,000 in estimator.train(input_fn=train_input, steps=200000).
Training steps defined by steps
estimator = tf.estimator.LinearRegressor(feature_columns=ft_cols)
train_input = tf.estimator.inputs.numpy_input_fn({'x':x_train},y_train,batch_size=4,num_epochs=1,shuffle=True)
estimator.train(input_fn=train_input, steps=1000)
Comment: Although I have set num_epochs=1 when calling numpy_input_fn, the training stops after 1000 steps. This is because steps=1000 in estimator.train(input_fn=train_input, steps=1000) overrides the num_epochs=1 in tf.estimator.inputs.numpy_input_fn({'x':x_train},y_train,batch_size=4,num_epochs=1,shuffle=True).
Conclusion:
Whichever of the two parameters, num_epochs for tf.estimator.inputs.numpy_input_fn or steps for estimator.train, implies fewer steps determines the number of steps that will actually be run.
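In other words (a toy arithmetic sketch of that "lower bound wins" rule, using the numbers from the three cases above):
# 700,000 training rows, batch_size = 4, as in the examples above.
x_train_size = 700_000
batch_size = 4
steps_from_num_epochs = x_train_size // batch_size  # 175,000 steps when num_epochs=1
for explicit_steps in (None, 200_000, 1_000):
    effective = steps_from_num_epochs if explicit_steps is None else min(steps_from_num_epochs, explicit_steps)
    print(explicit_steps, "->", effective)
# None -> 175000, 200000 -> 175000, 1000 -> 1000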
In simple words
Epoch: an epoch is one pass over the entire dataset.
Steps: in TensorFlow, one step processes one batch of examples; the total number of steps is the number of epochs multiplied by the number of examples, divided by the batch size:
steps = (epochs * examples) / batch_size
For instance
epoch = 100, examples = 1000 and batch_size = 1000
steps = 100
Epoch: a training epoch represents a complete use of all the training data for gradient calculation and optimization (training the model).
Step: a training step means using one batch of training data to train the model.
Number of training steps per epoch: total_number_of_training_examples / batch_size.
Total number of training steps: number_of_epochs x Number of training steps per epoch.
According to Google's Machine Learning Glossary, an epoch is defined as
"A full training pass over the entire dataset such that each example has been seen once. Thus, an epoch represents N/batch_size training iterations, where N is the total number of examples."
If you are training a model for 10 epochs with batch size 6, given 12 samples in total, that means:
the model will see the whole dataset in 2 iterations (12 / 6 = 2), i.e. a single epoch.
overall, the model will have 2 x 10 = 20 iterations (iterations-per-epoch x number-of-epochs)
loss and model parameters are re-evaluated after each iteration!
Since there's no accepted answer yet:
By default an epoch runs over all your training data. In this case you have n steps, with n = training_length / batch_size.
If your training data is too big you can decide to limit the number of steps during an epoch. [https://www.tensorflow.org/tutorials/structured_data/time_series?_sm_byp=iVVF1rD6n2Q68VSN]
When the number of steps reaches the limit that you’ve set the process will start over, beginning the next epoch.
When working in TF, your data is usually transformed first into a list of batches that will be fed to the model for training. At each step you process one batch.
As to whether it’s better to set 1000 steps for 1 epoch or 100 steps with 10 epochs I don’t know if there’s a straight answer.
But here are the results of training a CNN with both approaches on the TensorFlow time-series data tutorial:
In this case, both approaches lead to very similar predictions; only the training profiles differ.
steps = 20 / epochs = 100
steps = 200 / epochs = 10
Divide the length of x_train by the batch size with
steps_per_epoch = x_train.shape[0] // batch_size
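For context, here is how that value is typically passed to Keras when the input pipeline repeats indefinitely (a minimal sketch with made-up toy data; the model and numbers are mine, not from the answer):
import numpy as np
import tensorflow as tf

# Hypothetical toy data, just to illustrate the arithmetic.
x_train = np.random.rand(2000, 8).astype("float32")
y_train = np.random.randint(0, 2, size=(2000,))
batch_size = 10
steps_per_epoch = x_train.shape[0] // batch_size  # 200

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# With a repeated tf.data pipeline, Keras needs steps_per_epoch to know where one epoch ends.
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(batch_size).repeat()
model.fit(ds, epochs=3, steps_per_epoch=steps_per_epoch)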
We split the training set into many batches. When we run the algorithm, it requires one epoch to analyze the full training set. An epoch is composed of many iterations (or batches).
Iterations: the number of batches needed to complete one Epoch.
Batch Size: The number of training samples used in one iteration.
Epoch: one full cycle through the training dataset. A cycle is composed of many iterations.
Number of Steps per Epoch = (Total Number of Training Samples) / (Batch Size)
Example
Training Set = 2,000 images
Batch Size = 10
Number of Steps per Epoch = 2,000 / 10 = 200 steps
Hope this helps for better understanding.

Torch - Large input and output sizes into neural network

I'm new to machine learning and seek some help.
I would like to train a network to predict the next values I expect, as follows:
reference: [val1 val2 ... val 15]
val = 0 if it doesn't exist, 1 if it does.
Input: [1 1 1 0 0 0 0 0 1 1 1 0 0 0 0]
Output: [1 1 1 0 0 0 0 0 1 1 1 0 0 1 1] (last two values appear)
So my neural network would have 15 inputs and 15 outputs.
I would like to know if there is a better way to do that kind of prediction. Would my data also need normalization?
Now the problem is, I don't have 15 values, but actually 600,000 of them. Can a neural network handle such big tensors? I've also heard I would need twice that number of hidden-layer units.
Thanks a lot for your help, you machine learning experts!
Best
This is not a problem for the concept of a neural network: the question is whether your computing configuration and framework implementation deliver the required memory. Since you haven't described your topology, there's not a lot we can do to help you scope this out. What do you have for parameter and weight counts? Each of those is at least a short float (4 bytes). For instance, a direct FC (fully-connected) layer would give you (6e5)^2 weights, or 3.6e11 * 4 bytes => 1.44e12 bytes. Yes, that's pushing 1.5 terabytes for that layer's weights.
You can get around some of this with the style of NN you choose. For instance, splitting into separate channels (say, 60 channels of 1000 features each) can give you significant memory savings, albeit at the cost of speed in training (more layers) and perhaps some accuracy (although crossover can fix a lot of that). Convolutions can also save you overall memory, again at the cost of training speed.
600K => 4 => 600K
That clarification takes care of my main worries: you have 600,000 * 4 weights in each of two places, i.e. 4.8M weights across roughly 1,200,004 units (600,000 + 4 + 600,000). That's about 6M total floats, which shouldn't stress the RAM of any modern general-purpose computer.
The channelling idea is for when you're trying to have a fatter connection between layers, such as 600K => 600K FC. In that case, you break up the data into smaller groups (usually just 2-12) and make a bunch of parallel fully-connected streams. For instance, you could take your input and make 10 streams, each of which is 60K => 60K FC. In your next layer, you swap the organization, "dealing out" each set of 60K so that 1/10 goes into each of the next channels.
This way, you have only 10 * 60K * 60K weights, only 10% as many as before ... but now there are 3 layers. Still, it's a 5x saving on memory required for weights, which is where you have the combinatorial explosion.
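The memory arithmetic from this answer, written out in plain Python (the 4-bytes-per-weight figure is the one stated above):
bytes_per_weight = 4  # "at least a short float (4 bytes)", as stated above
# Direct 600K => 600K fully-connected layer
full_fc_weights = 600_000 * 600_000                   # 3.6e11 weights
print(full_fc_weights * bytes_per_weight / 1e12)      # ~1.44 terabytes
# Ten parallel 60K => 60K channels, over the two channelled layers described above
channelled_weights = 2 * 10 * 60_000 * 60_000         # 7.2e10 weights
print(channelled_weights * bytes_per_weight / 1e9)    # ~288 gigabytes, the 5x saving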
