Why are there no dropout layers in inception net? [closed] - machine-learning

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I was recently implementing InceptionNet and noticed that no dropout layers are used in the early or middle stages of the network at all. Is there any particular reason for this?

You can see the model as posted in the paper:

The inception architecture itself has a slight regularization effect that is similar to dropout.
Think of it like this: with dropout we keep every node in a layer with a certain probability, so we are effectively building our NN architecture out of many random variants. A similar situation holds here, except that this time all the variants (the parallel branches of each inception module) are applied at once. Hence the inception design helps prevent overfitting of the parameters, so learning generalizes. For a deeper understanding please check out the original paper, but note that this is just an observation, not a proof.
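To make the comparison concrete, here is a minimal NumPy sketch of (inverted) dropout, the mechanism the inception branches are being compared to. The keep probability of 0.8 is illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, keep_prob=0.8, train=True):
    """Inverted dropout: zero each activation with probability 1 - keep_prob
    during training and rescale the survivors by 1 / keep_prob, so the
    expected activation is unchanged and test time needs no correction."""
    if not train:
        return x  # dropout is a no-op at test time
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

x = np.ones((1000, 100))
y = dropout(x, keep_prob=0.8)
print((y == 0).mean())  # roughly 20% of activations are zeroed
print(y.mean())         # expectation stays close to 1.0
```

Each training step therefore sees a different random sub-network, which is the "choosing every node with a certain probability" idea the answer alludes to.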


End-to-end machine learning project processes [closed]

Closed 1 year ago.
I've read a book chapter that walks you through all the steps involved in an end-to-end machine learning project. After doing all the practical exercises I'm still not quite sure that my way of thinking about the whole process is right.
I've tried to depict it in the following flowchart:
Is this the right way of thinking about all the steps in an ML project? Is something missing?
Seems decent.
Just want to mention that the cross-validation and model selection in your short-listing step could also include tuning the pipelines, because different types of transformations may be suitable for different models.
For example, when there are sparse or categorical features, the pipelines may matter a lot.
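A minimal sketch of that idea, assuming scikit-learn: putting the transformation inside the pipeline lets one cross-validated search tune it together with the model hyperparameters. The dataset and parameter values here are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# The preprocessing step is itself a hyperparameter: different
# transformations may suit different models, so it belongs in the search.
grid = GridSearchCV(
    pipe,
    param_grid={
        "scale": [StandardScaler(), MinMaxScaler(), "passthrough"],
        "model__alpha": [0.1, 1.0, 10.0],
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

Setting a step to `"passthrough"` lets the search also consider skipping that transformation entirely.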

Why don't we need feature scaling in multiple linear regression? [closed]

Closed 2 years ago.
I am currently learning ML, and I noticed that in multiple linear regression we don't need scaling for our independent variables. I don't understand why.
Whether feature scaling is useful or not depends on the training algorithm you are using.
For example, to find the best parameter values of a linear regression model, there is a closed-form solution, called the Normal Equation. If your implementation makes use of that equation, there is no stepwise optimization process, so feature scaling is not necessary.
However, you could also find the best parameter values with a gradient descent algorithm. This could be a better choice in terms of speed if you have many training instances. If you use gradient descent, feature scaling is recommended, because otherwise the algorithm might take much longer to converge.
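A small NumPy sketch of the first point: because the Normal Equation solves the least-squares problem in closed form, the fitted predictions come out identical with or without feature scaling. The data here is synthetic, for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three features with wildly different scales
X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])
y = X @ np.array([2.0, -1.0, 5.0]) + rng.normal(scale=0.1, size=100)

def normal_equation(X, y):
    """Closed-form least squares: theta = (X'X)^-1 X'y, with an intercept."""
    Xb = np.c_[np.ones(len(X)), X]
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

theta_raw = normal_equation(X, y)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
theta_scaled = normal_equation(X_scaled, y)

# The coefficient values differ (the feature units changed), but the
# fitted predictions agree up to numerical precision.
pred_raw = np.c_[np.ones(len(X)), X] @ theta_raw
pred_scaled = np.c_[np.ones(len(X)), X_scaled] @ theta_scaled
print(np.allclose(pred_raw, pred_scaled))
```

With gradient descent, by contrast, those mismatched feature scales would stretch the loss surface and slow convergence, which is why scaling is recommended there.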

Training and Testing deep CNN with same piece of data [closed]

Closed 5 years ago.
I have created a deep CNN from a research paper (using TensorFlow) and am now curious whether I have done everything correctly. Eventually I want to train and test the CNN on many images, but at the moment I only have one image on hand. If I were to use this one image as both the training and the testing data, should the CNN always reach 100% accuracy?
Yes, with only one image to match, you will have a trivial case of perfect accuracy (1-for-1). However, this is only a "breath of life" test for your model. All you'll know is that you are functionally capable of running one image through that model; this will tell you nothing (or very little) about your topology's effectiveness with that type of image.
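As a toy illustration of why this is only a sanity check, here is a scikit-learn sketch with a nearest-neighbour classifier standing in for the CNN. Any model that can memorize one point scores perfectly when the test set is that same point:

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.1, 0.2, 0.3]]   # one "image", flattened to a feature vector
y_train = [1]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
# Testing on the training example itself: trivially perfect accuracy.
print(clf.score(X_train, y_train))
```

The perfect score proves the pipeline runs end to end, not that the model generalizes.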

Can we improve performance by using BatchNorm with PReLU? [closed]

Closed 4 years ago.
I have a network as follows
BN-Scale-ReLU
I want to replace ReLU by PReLU. Then, it will be
BN-Scale-PReLU
Could I obtain any gain with the second setting? Why? From what I have found, the second setting is not very popular. In some papers, BN-Scale-ReLU is replaced with plain PReLU. Is that right?
There is a paper evaluating these choices, which can be found here: https://arxiv.org/pdf/1606.02228.pdf. They do get better accuracy by using PReLU, but the gain is very minor. I am unsure whether the improvement offsets the extra computation PReLU requires compared to ReLU. The question is: are you already chasing that last percentage point of accuracy? If not, don't bother yet with choices that have only a minor impact on the model's performance.
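For reference, a tiny NumPy sketch of the two activations being compared. The slope `a = 0.25` is just an example value; in PReLU it is a parameter learned during training, and when `a == 0` PReLU reduces to ReLU exactly, which is one way to see why the gap between them is small:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def prelu(x, a=0.25):
    # Like ReLU, but negative inputs get a small (learnable) slope
    # instead of being zeroed out entirely.
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))   # [0.     0.     0.     1.5  ]
print(prelu(x))  # [-0.5   -0.125  0.     1.5  ]
```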

When to use regression trees/forests? [closed]

Closed 7 years ago.
While looking for a good regression algorithm for my problem, I found out that one can also do regression with simple decision trees, which are usually used for classification. The output would look something like this:
The red noise would be the prediction states of such a tree or forest.
Now my question is: why use this method at all when there are alternatives that really try to figure out the underlying equation (such as the famous support vector machine, SVM)? Are there any positive or unique aspects, or is a regression tree more of a nice-to-have algorithm?
The image you posted conveys a smooth function of y in x. A regression tree is certainly not the best technique to estimate such a function and I probably wouldn't use SVMs either. This looks like a good application for splines, e.g., by using a GAM (generalized additive model).
A regression tree on the other hand is a handy tool if you haven't got such smooth functions and if you don't know which explanatory variable will have which effect on the response. It will be particularly useful if there are jumps in the response or interactions - especially if the jump points and interaction patterns are not known in advance.
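A small scikit-learn sketch of that last point: a regression tree recovers a jump at a location it was never told about almost exactly, where a single smooth model would blur the discontinuity. The data is synthetic, for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 10, 200).reshape(-1, 1)
# Step function with a jump at x = 6.3 -- the jump point is not given
# to the model; the tree discovers it by choosing its split threshold.
y = np.where(X.ravel() < 6.3, 1.0, 5.0)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[2.0]])[0], tree.predict([[8.0]])[0])
```

A single split is enough to separate the two plateaus, so the tree's piecewise-constant prediction matches the step exactly on both sides.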
