Weight initialisation in ANN [closed] - machine-learning

In linear regression, each feature column has one coefficient value, irrespective of the number of rows.
What about in a neural network?
In the case of a single-layer perceptron, does it work the same way as linear regression, or do the weights vary for each and every row in the feature column?

Let's see if I understood you correctly.
In an MLP, every neuron's output in a layer is a linear regression over all the neurons in the layer just before it; an activation is then, optionally, applied after that linear combination.
No parameter in an ANN is tied, value-wise, to any other parameter.
The number of rows in the data is the number of examples (or the batch size), and the number of columns is the number of features or inputs.
As for weight initialization, there are many techniques. The most common of them is Xavier initialization.
If you meant the shape of the matrix containing the weights of a layer, it should be of shape (n_features, layer_out_size).
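As a minimal NumPy sketch of the two points above (the layer size and batch size are just illustrative values): the weight matrix of a dense layer has shape (n_features, layer_out_size) regardless of how many rows the data has, and Xavier/Glorot initialization scales the random weights using the fan-in and fan-out.

```python
import numpy as np

n_features = 10       # number of feature columns
layer_out_size = 4    # number of neurons in the layer

# Xavier/Glorot initialization: weights drawn with variance 2 / (fan_in + fan_out)
rng = np.random.default_rng(0)
std = np.sqrt(2.0 / (n_features + layer_out_size))
W = rng.normal(0.0, std, size=(n_features, layer_out_size))
b = np.zeros(layer_out_size)

# One row per example; the weights do not depend on the number of rows,
# exactly as in linear regression.
X = rng.normal(size=(32, n_features))    # a batch of 32 examples
out = X @ W + b                          # shape: (32, layer_out_size)
```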


Pytorch - Should 'CenterCrop' be used on the test set? Does this count as cheating? [closed]

I'm learning image classification with Pytorch. I found that the code for some papers applies 'CenterCrop' to both the train set and the test set, e.g. resize to a larger size, then apply CenterCrop to obtain a smaller size. The smaller size is a standard size in this research direction.
In my experience, applying CenterCrop gives a significant improvement (e.g. 1% or 2%) on the test set, compared to not using CenterCrop on the test set.
Because it is used in top conference papers, this confused me. So, does using CenterCrop on the test set count as cheating? In addition, should I use any data augmentation on the test set other than 'Resize' and 'Normalization'?
Thank you for your answer.
That is not cheating. You can apply any augmentation as long as the label is not used.
In image classification, sometimes people use a FiveCrop+Reflection technique, which is to take five crops (Center, TopLeft, TopRight, BottomLeft, BottomRight) and their reflections as augmentations. They would then predict class probabilities for each crop and average the results, typically giving some performance boost at 10x the running time.
In segmentation, people also use a similar test-time augmentation, "multi-scale testing", which resizes the input image to different scales before feeding it to the network. The predictions are also averaged.
If you do use this kind of augmentation, do report it when you compare with other methods, for a fair comparison.
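As a rough PyTorch sketch of the crop-and-average idea (TenCrop gives the five crops plus their horizontal reflections; the sizes are placeholders and normalization is omitted for brevity):

```python
import torch
import torchvision.transforms as T

resize_size, crop_size = 256, 224   # illustrative sizes

tta_transform = T.Compose([
    T.Resize(resize_size),
    T.TenCrop(crop_size),  # 5 crops + their horizontal reflections
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])

def predict_tta(model, pil_image):
    model.eval()
    crops = tta_transform(pil_image)          # (10, C, H, W)
    with torch.no_grad():
        probs = model(crops).softmax(dim=1)   # (10, num_classes)
    return probs.mean(dim=0)                  # average over the 10 crops
```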

Identify time-series forecasting algorithm [closed]

I'm trying to build an algorithm in C# based on these videos (CLICK!). My question is not related to the coding part of these tasks.
I'm trying to gain a deeper understanding of this algorithm, since it is perfect for my assignment. However, the YouTuber doesn't identify it by name. I'd like to know any information that you can give me -- name, resources, etc.
Edit: It's a time-series decomposition model, specifically classical multiplicative decomposition.
Steps (a rough code sketch follows the list):
1. Calculate a moving average equal to the length of the season to identify the trend-cycle.
2. Center the moving average if the seasonal length is an even number.
3. Calculate the actual as a proportion of the centered moving average to obtain the seasonal index for each period.
4. Adjust the total of the seasonal indexes to equal the number of periods.
5. Deseasonalize the time series by dividing it by the seasonal index.
6. Estimate the trend-cycle regression using the deseasonalized data.
7. Multiply the fitted trend values by their appropriate seasonal factors to compute the fitted values.
8. Calculate the errors and measure the accuracy of the fit against the known actual series.
9. If cyclical factors are important, calculate cyclical indexes.
10. Check for outliers, adjust the actual series, and repeat steps 1 to 9 if necessary.
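Here is a rough pandas sketch of steps 1-7, assuming a monthly series (season length 12); the function name is illustrative and the trend is fitted as a simple linear regression, which is one common choice rather than the only one.

```python
import numpy as np
import pandas as pd

def multiplicative_decomposition_fit(y: pd.Series, season: int = 12) -> pd.Series:
    """Rough sketch of classical multiplicative decomposition (steps 1-7)."""
    # 1-2. Moving average over one season, centered when the length is even.
    ma = y.rolling(season, center=True).mean()
    if season % 2 == 0:
        ma = ma.rolling(2, center=True).mean()   # centering step

    # 3. Actual as a proportion of the centered moving average.
    ratios = y / ma

    # 4. Average the ratios per period and rescale so they sum to `season`.
    period = np.arange(len(y)) % season
    idx = ratios.groupby(period).mean()
    idx *= season / idx.sum()
    seasonal = pd.Series(idx.values[period], index=y.index)

    # 5. Deseasonalize, 6. fit a linear trend, 7. reseasonalize the fit.
    deseasonalized = y / seasonal
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, deseasonalized.values, 1)
    fitted = (intercept + slope * t) * seasonal.values
    return pd.Series(fitted, index=y.index)
```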
It is a well known, well documented, identifiable algorithm.
One of the comments to the video says "What you did is Moving Average, can you please show us how to do Auto Regressive (AR) and Auto Regressive Moving Average (ARMA) if it is possible in Excel?"
You can learn about MA, AR, and AR(I)MA from this book - https://otexts.com/fpp2/

Does the backpropagation for an FC network change when there's a dropout layer involved? [closed]

I have a fairly simple neural network - input layer, hidden layer, output layer. It's a fully connected (FC) neural network, and I'm using fairly standard gradient backpropagation. I'm wondering, if I add a dropout layer to make the network input layer -> dropout layer -> hidden layer -> output layer, whether I need to factor the dropout layer into my backpropagation algorithm.
I can see it in two different ways:
1) It's random, so don't touch it.
2) If I don't touch it, my final results will be exactly the same as if I didn't have a dropout layer.
So what's the proper way to handle a dropout layer when training a NN? Do I just not adjust the neurons that got dropped?
I found the answer here: https://wiseodd.github.io/techblog/2016/06/25/dropout/
" Dropout backprop
During the backprop, what we need to do is just to consider the Dropout. The killed neurons don’t contribute anything to the network, so we won’t flow the gradient through them.
dh1 *= u1
For full example, please refer to: https://github.com/wiseodd/hipsternet/blob/master/hipsternet/neuralnet.py."
In other words, dead neurons aren't contributing, so when we're back propagating, we don't adjust them.
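For a concrete picture, here is a tiny NumPy sketch of inverted dropout in the spirit of that post (the keep probability and variable names are illustrative): the same mask u1 that zeroed neurons on the forward pass also zeroes their gradients on the backward pass, which is exactly the `dh1 *= u1` line in the quote.

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.5                      # probability of keeping a neuron

def dropout_forward(h):
    # Sample a mask once per forward pass; scale by 1/p_keep ("inverted" dropout)
    u = (rng.random(h.shape) < p_keep) / p_keep
    return h * u, u

def dropout_backward(dh, u):
    # Dropped neurons contributed nothing, so no gradient flows through them.
    return dh * u

h1 = rng.normal(size=(4, 8))            # hidden activations
h1_drop, u1 = dropout_forward(h1)       # forward pass through dropout
dh1 = rng.normal(size=h1.shape)         # gradient arriving from the next layer
dh1 = dropout_backward(dh1, u1)         # equivalent to `dh1 *= u1`
```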

Where should Dropout be inserted? Fully Connected Layer? Convolutional Layer? Or both? [closed]

I would like to get your feedback on where Dropout should be inserted.
Should it be located in the fully connected (Dense) layers, the convolutional layers, or both?
Thank you for your feedback in advance.
Usually, dropout is placed on the fully connected layers only, because they are the ones with the greater number of parameters and are thus likely to excessively co-adapt, causing overfitting.
However, since it's a stochastic regularization technique, you can really place it anywhere. Usually it's placed on the layers with a great number of parameters, but nothing stops you from applying it to the convolutional layers instead (which have fewer parameters than the FC layers).
Moreover, the drop probability should be chosen according to how strong you want the regularization to be.
A rule of thumb is to set the keep probability (1 - drop probability) to 0.5 when dropout is applied to fully connected layers, whilst setting it to a greater value (0.8 or 0.9, usually) when applied to convolutional layers.
Just a note: since in every machine learning framework dropout is implemented in its "inverted" version, you may have to lower your learning rate to compensate for the "boost" that inverted dropout effectively gives to it.
For a more comprehensive analysis: https://pgaleone.eu/deep-learning/regularization/2017/01/10/anaysis-of-dropout/
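As an illustration of that rule of thumb, here is a small PyTorch sketch (the architecture and sizes are just an example, not a recommendation): a low drop probability after the convolutional block and p=0.5 (keep probability 0.5) on the fully connected part.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout2d(p=0.1),            # keep probability ~0.9 for conv features
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 256),   # assumes 32x32 input images
    nn.ReLU(),
    nn.Dropout(p=0.5),              # keep probability 0.5 for the FC layer
    nn.Linear(256, 10),
)
```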
Dropout is just a regularization technique for preventing overfitting in the network. It zeroes a node's output with a given probability during training, reducing the number of weights updated at each iteration. It can be applied to each layer of the network (regardless of whether it is fully connected or convolutional), or after selected layers. To which layers dropout is applied is really just a design decision for what results in the best performance.
You can choose where you want to put your Dropout. I usually use it after convolution layers, but you can use it with the FC layers as well. Try different combinations to get the best results.

Real-Time Multi-Object Tracking with Learning [closed]

My goal is to have a real-time multi-object tracker with learning. I used a Kalman filter to track an object, but I found errors in the estimation while tracking.
The object was not tracked continuously. I want to implement some learning mechanism along with the tracking.
One way I thought of doing this is:
1) Calculate the average HSV of a particular ROI, then store that HSV value in a vector (Scalar or Vec3b).
2) Compare the new HSV value (the average from some ROI) with all previous HSV values present in the vector collection.
3) If the new HSV value does not match any of the HSV values in the vector, track it as a new, separate object.
4) Else, if the new ROI matches HSV values in the vector, it is taken to be the same object present in the ROI; continue tracking the old object.
5) Do some regular, time-based checking to remove old HSV values from the vector.
I tried KCF, MIL, etc., but they are not real-time. Can you recommend any real-time learning mechanism, or ways to improve the one proposed above?
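For reference, a rough OpenCV/NumPy sketch of the proposed HSV bookkeeping (steps 1-4); the distance threshold and function names are illustrative assumptions, not a tested tracker.

```python
import cv2
import numpy as np

known_hsv = []            # step 1: stored mean-HSV "signatures", one per track
HSV_THRESHOLD = 30.0      # illustrative distance threshold

def mean_hsv(frame_bgr, roi):
    """Average HSV inside a region of interest given as (x, y, w, h)."""
    x, y, w, h = roi
    patch = frame_bgr[y:y + h, x:x + w]
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    return hsv.reshape(-1, 3).mean(axis=0)

def match_or_register(frame_bgr, roi):
    """Steps 2-4: compare a new ROI against the stored HSV values."""
    value = mean_hsv(frame_bgr, roi)
    for obj_id, stored in enumerate(known_hsv):
        if np.linalg.norm(value - stored) < HSV_THRESHOLD:
            known_hsv[obj_id] = value      # refresh the stored signature
            return obj_id                  # same object: keep its track
    known_hsv.append(value)                # new object: start a new track
    return len(known_hsv) - 1
```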
