Categorical accuracy - machine-learning

How does categorical accuract works? By definition
categorical_accuracy checks to see if the index of the maximal true
value is equal to the index of the maximal predicted value.
and
Calculates the mean accuracy rate across all predictions for
multiclass classification problems
What does it mean in practice? Lets say i am prediction bounding box of object
it has (xmin,ymin,xmax,ymax) does it check if xmin predicted is equal with xmin real? So if i xmin and xmax where same in prediction and real values, and ymin and ymax were different i would get 50%?
Please help me undestand this concept

Traditionally for multiclass classification, your labels will have some integer (or equivalently categorical) label; for example:
labels = [0, 1, 2]
The output of a multiclass classification prediction will typically be a probability distribution of confidences; for example:
preds = [0.25, 0.5, 0.25]
Normally the index associated with the most likely event will be the index of the label. In this case, the argmax(preds) is 1, which maps to label 1.
You can see the total accuracy of your predictions a la confusion matrices, where one axis is the "true" value, and the other axis is the "predicted" value. The values for each cell are the sums of the values of CM[y_true][y_pred]. The accuracy will be the sum of main diagonal of the matrix (y_true = y_pred) over the total number of training instances.

Related

Selecting an IoU and confidence threshold for evaluation of model performance

mAP is commonly used to evaluate the performance of object detection models. However, there are two variables that need to be set when calculating mAP:
confidence threshold
IoU threshold
Just to clarify, confidence threshold is the minimum score that the model will consider the prediction to be a true prediction (otherwise it will ignore this prediction entirely). IoU threshold is the minimum overlap between ground truth and prediction boxes for the prediction to be considered a true positive.
Setting both of these thresholds to be low would result in a greater mAP. However, the low thresholds would most likely be inconsistent with the mAP scores from other studies. How does one select, and justify, these threshold values?
In Yolov5, we do NMS on the outputs of network, then calculate mAP. So, there is a conf_thres and an iou_thres in NMS to filter some boxes, these are set to 0.001 and 0.6, see: https://github.com/ultralytics/yolov5/blob/2373d5470e386a0c63c6ab77fbee6d699665e27b/val.py#L103.
When calculating mAP, we set iou threshold to 0.5 for mAP#0.5, or 0.5 to 0.95 with step 0.05 for mAP#0.5:0.95.
I guess the way of calculating mAP in Yolov5 is aligned to other framework. If I'm wrong, please correct me.

how to calculate theta in univariate linear regression model?

I have hypothesis function h(x) = theta0 + theta1*x.
How can I select theta0 and theta1 value for the linear regression model?
The question is unclear whether you would like to do this by hand (with the underlying math), use a program like Excel, or solve in a language like MATLAB or Python.
To start, here is a website offering a summary of the math involved for a univariate calculation: http://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/
Here, there is some discussion of the matrix formulation of the multivariate problem (I know you asked for univariate but some people find the matrix formulation helps them conceptualize the problem): https://onlinecourses.science.psu.edu/stat501/node/382
We should start with a bit of an intuition, based on the level of the question. The goal of a linear regression is to find a set of variables, in your case thetas, that minimize the distance between the line formed and the data points observed (often, the square of this distance). You have two "free" variables in the equation you defined. First, theta0: this is the intercept. The intercept is the value of the response variable (h(x)) when the input variable (x) is 0. This visually is the point where the line will cross the y axis. The second variable you have defined is the slope (theta1), this variable expresses how much the response variable changes when the input changes. If theta1 = 0, h(x) does not change when x changes. If theta1 = 1, h(x) increases and decreases at the same rate as x. If theta1 = -1, h(x) responds in the opposite direction: if x increases, h(x) decreases by the same amount; if x decreases, h(x) increases by the quantity.
For more information, Mathworks provides a fairly comprehensive explanation: https://www.mathworks.com/help/symbolic/mupad_ug/univariate-linear-regression.html
So after getting a handle on what we are doing conceptually, lets take a stab at the math. We'll need to calculate the standard deviation of our two variables, x and h(x). WTo calculate the standard deviation, we will calculate the mean of each variable (sum up all the x's and then divide by the number of x's, do the same for h(x)). The standard deviation captures how much a variable differs from its mean. For each x, subtract the mean of x. Sum these differences up and then divide by the number of x's minus 1. Finally, take the square root. This is your standard deviation.
Using this, we can normalize both variables. For x, subtract the mean of x and divide by the standard deviation of x. Do this for h(x) as well. You will now have two lists of normalized numbers.
For each normalized number, multiply the value by its pair (the first normalized x value with its h(x) pair, for all values). Add these products together and divide by N. This gives you the correlation. To get the least squares estimate of theta1, calculate this correlation value times the standard deviation of h(x) divided by the standard deviation of x.
Given all this information, calculating the intercept (theta0) is easy, all we'll have to do is take the mean of h(x) and subtract the product (multiply!) of our calculated theta1 and the average of x.
Phew! All taken care of! We have our least squares solution for those two variables. Let me know if you have any questions! One last excellent resource: https://people.duke.edu/~rnau/mathreg.htm
If you are asking about the hypothesis function in linear regression, then those theta values are selected by an algorithm called gradient descent. This helps in finding the theta values to minimize the cost function.

normalization in image processing

What is the correct mean of normalization in image processing? I googled it but i had different definition. I'll try to explain in detail each definition.
Normalization of a kernel matrix
If normalization is referred to a matrix (such as a kernel matrix for convolution filter), usually each value of the matrix is divided by the sum of the values of the matrix in order to have the sum of the values of the matrix equal to one (if all values are greater than zero). This is useful because a convolution between an image matrix and our kernel matrix give an output image with values between 0 and the max value of the original image. But if we use a sobel matrix (that have some negative values) this is not true anymore and we have to stretch the output image in order to have all values between 0 and max value.
Normalization of an image
I basically find two definition of normalization. The first one is to "cut" values too high or too low. i.e. if the image matrix has negative values one set them to zero and if the image matrix has values higher than max value one set them to max values. The second one is to linear stretch all the values in order to fit them into the interval [0, max value].
I will extend a bit the answer from #metsburg. There are several ways of normalizing an image (in general, a data vector), which are used at convenience for different cases:
Data normalization or data (re-)scaling: the data is projected in to a predefined range (i.e. usually [0, 1] or [-1, 1]). This is useful when you have data from different formats (or datasets) and you want to normalize all of them so you can apply the same algorithms over them. Is usually performed as follows:
Inew = (I - I.min) * (newmax - newmin)/(I.max - I.min) + newmin
Data standarization is another way of normalizing the data (used a lot in machine learning), where the mean is substracted to the image and dividied by its standard deviation. It is specially useful if you are going to use the image as an input for some machine learning algorithm, as many of them perform better as they assume features to have a gaussian form with mean=0,std=1. It can be performed easyly as:
Inew = (I - I.mean) / I.std
Data stretching or (histogram stretching when you work with images), is refereed as your option 2. Usually the image is clamped to a minimum and maximum values, setting:
Inew = I
Inew[I < a] = a
Inew[I > b] = b
Here, image values that are lower than a are set to a, and the same happens inversely with b. Usually, values of a and b are calculated as percentage thresholds. a= the threshold that separates bottom 1% of the data and b=the thredhold that separates top 1% of the data. By doing this, you are removing outliers (noise) from the image.
This is similar (simpler) to histogram equalization, which is another used preprocessing step.
Data normalization, can also be refereed to a normalization of a vector respect to a norm (l1 norm or l2/euclidean norm). This, in practice, is translated as to:
Inew = I / ||I||
where ||I|| refeers to a norm of I.
If the norm is choosen to be the l1 norm, the image will be divided by the sum of its absolute values, making the sum of the whole image be equal to 1. If the norm is choosen to be l2 (or euclidean), then image is divided by the sum of the square values of I, making the sum of square values of I be equal to 1.
The first 3 are widely used with images (not the 3 of them, as scaling and standarization are incompatible, but 1 of them or scaling + streching or standarization + stretching), the last one is not that useful. It is usually applied as a preprocess for some statistical tools, but not if you plan to work with a single image.
Answer by #Imanol is great, i just want to add some examples:
Normalize the input either pixel wise or dataset wise. Three normalization schemes are often seen:
Normalizing the pixel values between 0 and 1:
img /= 255.0
Normalizing the pixel values between -1 and 1 (as Tensorflow does):
img /= 127.5
img -= 1.0
Normalizing according to the dataset mean & standard deviation (as Torch does):
img /= 255.0
mean = [0.485, 0.456, 0.406] # Here it's ImageNet statistics
std = [0.229, 0.224, 0.225]
for i in range(3): # Considering an ordering NCHW (batch, channel, height, width)
img[i, :, :] -= mean[i]
img[i, :, :] /= std[i]
In data science, there are two broadly used normalization types:
1) Where we try to shift the data so that there sum is a particular value, usually 1 (https://stats.stackexchange.com/questions/62353/what-does-it-mean-to-use-a-normalizing-factor-to-sum-to-unity)
2) Normalize data to fit it within a certain range (usually, 0 to 1): https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range

SVM : How to calculate the distance of a point to the margin?

Hello actually i'm developing a CBIR using opencv for features extraction and for the svm.
My issues is : I'm using a ONE_CLASS classifier with a RBF Kernel. I'm using the function predict of opencvSVM with the last parameter at true ( means that if the classifieur is binary then it return the signed distance to the margin ) in order to classify my data.
But even if this parameters is set to true it's return only the label ( so not very helpful in my case).
So my question is : What is the equation ( knowing the vector support) to calcul the distance of a data to the marge ?
Thanks
In general, distance from the separating hyperplane is the exact same thing SVM is using for classification. In other words classification equation is simply a sign of the signed distance you are asking for.
Given your kernel K, support vectors SV_i, and alpha coefficients alpha_i (Lagrange multipliers) and threshold (bias) b the equation is simply, given labels associated with each support vector y_i:
sgn_dist(x) = SUM_i alpha_i y_i K(x, SV_i) - b
This is signed distance, so you get positive value when it is on positive side and negative otherwise, if you want a "true" distance (without label) simply divide by label or take absolute value
dist(x) = |sgn_dist(x)| = |SUM_i alpha_i y_i K(x, SV_i) - b|

How to do a weighted sum for k-NN with multiple class?

I have a lots of data points (x, y) and I'm trying to use k-NN to predict future y'.
If y has only two possible values, then I can treat y = +1 or -1.
Each time I have a input x', find the nearest k elements, and multiple their y with inverse of distance(x,x').
If the sum is greater than 0, then I will predict y'=+1, otherwise y'=-1
However, right know my y has 10 different possible values.
How do I do similiar weighted sum under this situation?
You can keep different scores for each class that are always positive, with more positive being a stronger association. Then just take the class max score.

Resources