About the ROC of a deep learning model [closed] - machine-learning

I have built a deep neural network that classifies data using a threshold of 0.5: if the output is greater than 0.5, the prediction is 1, otherwise it is 0.
My question: does the graph make sense, and is the choice of a 0.5 threshold correct?

The graph seems to show the true positives (TP) and false negatives (FN) for different thresholds, not just for 0.5. There's no way to tell from the graph whether 0.5 is a good threshold, because the threshold value corresponding to each point isn't shown. Even if it were, it still wouldn't be possible to say whether 0.5 is a good choice, because that depends on the TP/FN trade-off you want for your application.
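For illustration, here is a minimal sketch (with placeholder labels and scores, since the original model and data aren't shown) of how one could sweep thresholds with scikit-learn's roc_curve and see where a 0.5 cut-off lands on the curve:

```python
# Sketch with hypothetical data: sweep thresholds over a model's scores
# and compare the TPR/FPR trade-off to a fixed 0.5 cut-off.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                          # placeholder 0/1 labels
y_score = np.clip(y_true * 0.3 + rng.random(200) * 0.7, 0, 1)  # placeholder model scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Find the point on the curve closest to a 0.5 threshold and inspect it.
idx = np.argmin(np.abs(thresholds - 0.5))
print(f"threshold={thresholds[idx]:.2f}  TPR={tpr[idx]:.2f}  FPR={fpr[idx]:.2f}")
```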

Related

Batch normalization [closed]

Why does batch normalization operate on different samples of the same feature instead of different features of the same sample? Shouldn't it be a normalization across features? In the diagram, why do we use the first row and not the first column?
Could someone help me?
Because different features of the same object mean different things, it's not meaningful to compute statistics over them together. They can have different ranges, means, standard deviations, etc. For example, one of your features could be a person's age and another the person's height; the mean of those two values is not a meaningful number.
In classic machine learning (especially with linear models and KNN) you should normalize your features, i.e. compute the mean and std of each feature over the entire dataset and transform it to (X - mean(X)) / std(X). Batch normalization is the analogue of this applied to stochastic optimization methods like SGD (it's not practical to use global statistics on a mini-batch, and you typically want to apply batch norm more often than just before the first layer). More fundamental ideas can be found in the original paper.
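As a small illustration (a sketch with a made-up mini-batch, not from the original answer), the statistics are computed per feature across the samples in the batch, i.e. along axis 0:

```python
# Per-feature normalization over a mini-batch: one mean/std per column,
# computed across samples, as batch norm does (before scale/shift).
import numpy as np

X = np.array([[25.0, 1.80],    # hypothetical batch: age in years, height in metres
              [40.0, 1.65],
              [31.0, 1.72]])

mean = X.mean(axis=0)               # one mean per feature (per column)
std = X.std(axis=0)                 # one std per feature
X_norm = (X - mean) / (std + 1e-5)  # small epsilon for numerical stability

print(X_norm.mean(axis=0))          # approximately 0 for each feature
```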

In locally weighted regression, how to determine the distance from a query point with more than one dimension [closed]

If the query point in a locally weighted regression is multidimensional (with different features), how do we determine whether there are points close to the query point? This is especially tricky when the features have different units.
If x is a vector of the individual differences for each feature, one could use a few different norms to measure the "size" of x (and hence the distance between any two points). The most commonly used norm is the L2 norm. Different normalization schemes could be used, but if you scale each feature so that roughly 80% of the points fall between -10 and 10, you should be OK.
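A minimal sketch of that idea, assuming a small hypothetical design matrix X and query point q: each feature is scaled before the L2 distance is taken, and a Gaussian kernel (bandwidth tau) turns distance into the usual locally weighted regression weight:

```python
# Per-feature scaling before an L2 distance, then Gaussian weights.
import numpy as np

X = np.array([[30.0, 1.70, 65.0],     # hypothetical: age, height (m), weight (kg)
              [45.0, 1.60, 80.0],
              [25.0, 1.85, 75.0]])
q = np.array([35.0, 1.75, 70.0])      # query point

scale = X.std(axis=0)                 # make per-feature differences comparable
diff = (X - q) / scale                # vector of scaled differences per point
dist = np.linalg.norm(diff, axis=1)   # L2 norm = distance of each point to q

tau = 1.0                             # kernel bandwidth (assumed)
weights = np.exp(-dist**2 / (2 * tau**2))
print(dist, weights)
```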

When using RobustScaler, should you transform y_train? [closed]

I am using RobustScaler to fit and transform x_train and x_test. Should I also transform y_train and y_test? I am asking because my neural net gives a strange validation loss: sometimes the val loss is small and good, but sometimes it's high and bad. Maybe it's just the initialized weights of the neural net, but I want to make sure.
No, you shouldn't. You should scale your Xs because otherwise your neural network may start treating some features as more useful simply because their values are larger. y is the target; scaling it is unnecessary, since neural networks can produce large output values.
Actually, NNs can process large values as long as all features carry the same "weight"; using scalers is just good practice.
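The usual pattern looks like the sketch below (placeholder data; x_train and x_test are assumed to be 2-D arrays as in the question). The scaler is fitted on the training inputs only and then reused on the test inputs, while y is left untouched:

```python
# Fit the scaler on the training inputs only, reuse its statistics on the test inputs.
import numpy as np
from sklearn.preprocessing import RobustScaler

x_train = np.random.rand(100, 5) * 1000   # placeholder data with large values
x_test = np.random.rand(20, 5) * 1000

scaler = RobustScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn median/IQR on the training set
x_test_scaled = scaler.transform(x_test)        # apply the same transform to the test set
```

If you ever do decide to scale a regression target, remember to apply the target scaler's inverse_transform to the predictions before interpreting them.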

Davies-Bouldin Index: is a higher or lower score better? [closed]

I trained a KMEANS clustering model using Google BigQuery, and it gives me these metrics in the evaluation tab of my model. My question is: are we trying to maximize or minimize the Davies-Bouldin index and the mean squared distance?
The Davies-Bouldin index is a validation metric that is often used to evaluate the optimal number of clusters. It is defined from the ratio between within-cluster scatter and between-cluster separation, and a lower value means the clustering is better.
Regarding the second metric, the mean squared distance refers to the intra-cluster variance, which we also want to minimize: a lower WCSS (within-cluster sum of squares) means more compact clusters relative to the separation between them.
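For illustration outside of BigQuery (a sketch on placeholder data), both quantities can be computed locally with scikit-learn; lower is better for each:

```python
# Compute the Davies-Bouldin index and the within-cluster sum of squares for a k-means fit.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

X = np.random.rand(300, 4)                          # placeholder data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

db = davies_bouldin_score(X, km.labels_)            # lower is better
wcss = km.inertia_                                  # within-cluster sum of squares; lower is better
print(db, wcss)
```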

Does the order of terms in MSE matter in case of differentiation? [closed]

Mean squared error is a popular cost function used in machine learning:
(1/n) * sum(y - pred)**2
Basically the order of subtraction terms doesn't matter as the whole expression is squared.
But if we differentiate this function, it will no longer be squared:
2 * (y - pred)
Would the order make a difference for a neural network?
In most cases reversing the order of the terms y and pred changes the sign of the result. Since we use the result to compute the gradient with respect to the weights, would it influence the way the neural network converges?
Well, actually

\frac{\partial}{\partial y_i}\left(\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2\right) = \frac{2}{n}(y_i - \hat{y}_i)

and

\frac{\partial}{\partial y_i}\left(\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2\right) = -\frac{2}{n}(\hat{y}_i - y_i) = \frac{2}{n}(y_i - \hat{y}_i)

so they're the same.
(I took the derivative w.r.t. y_i, assuming those are the network outputs, but of course the same holds if you differentiate by \hat{y}_i.)
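A quick numerical check of this (a sketch, not part of the original answer), taking the derivative with respect to the predictions:

```python
# The gradient of MSE w.r.t. the predictions is identical whichever way the
# difference inside the square is written.
import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])
pred = np.array([0.8, 0.2, 0.6, 0.9])
n = len(y)

grad_a = -2.0 / n * (y - pred)      # d/d pred of (1/n) * sum((y - pred)**2)
grad_b = 2.0 / n * (pred - y)       # d/d pred of (1/n) * sum((pred - y)**2)

print(np.allclose(grad_a, grad_b))  # True: the order of the terms does not matter
```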
