Probability calculation of a normally distributed continuous variable - machine-learning

I see a formula to calculate the probability for any value (x = x1) in the attached image. Shouldn't the probability of a continuous variable taking any particular value be zero? Probability is the area under the curve, right, computed between two values? So shouldn't the probability be 0 for any single continuous value? Please correct me if I am wrong!

You are correct. The probability of any particular value in a continuous distribution is zero. The equation you've posted isn't a formula for the probability; it's a formula for the probability density function (PDF).
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample. In other words, while the absolute likelihood of a continuous random variable taking on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would equal one sample compared to the other.
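To make the distinction concrete, here is a minimal sketch using scipy.stats (the standard normal is my illustrative choice): the PDF evaluated at a point is a density, not a probability, while actual probabilities come from integrating the PDF over an interval, i.e. from the CDF.

    from scipy.stats import norm

    # Standard normal: mu = 0, sigma = 1 (illustrative choice)
    x = 1.0
    print(norm.pdf(x))                  # density at x = 1, NOT a probability
    print(norm.cdf(x) - norm.cdf(0.5))  # P(0.5 < X <= 1): area under the PDF
    print(norm.cdf(x) - norm.cdf(x))    # P(X == 1) is exactly 0: an interval of width 0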

Related

How does Support Vector Regression work?

I'm trying to understand the SVR model.
To do that I looked at SVM, and it's pretty clear to me. But there aren't many explanations of SVR.
The first question is why it's called Support Vector Regression: how do we use vectors to predict numerical values?
I also don't understand some parameters, such as epsilon and gamma. How do they influence the predicted result?
An SVM learns a so-called decision function from your features, such that features from your positive class produce positive real numbers and features from the negative class produce negative numbers (at least most of the time, depending on your data).
For two features you can visualize this in a 2D plane. The function assigns a real value to each point in the plane, and this value can be depicted as a color. This plot shows the values as different shades of blue.
The feature values resulting in zero form the so-called decision boundary.
The function itself has two kinds of parameters:
kernel-dependent parameters. In your case, for the radial basis function, these parameters are epsilon and gamma, which you set before learning.
and the so-called support vectors, which are determined during learning. Support vectors are just parameters of your decision function.
Learning is nothing more than determining good support vectors (parameters!).
In this 2D example video the colors don't show the actual function value, but only its sign. You can see how gamma influences the smoothness of the decision function.
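To make this concrete, here is a minimal scikit-learn sketch (the data and parameter values are my own, for illustration) showing the sign behaviour of a learned decision function:

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D, two-class data (illustrative)
    X = np.array([[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="rbf", gamma=0.1).fit(X, y)

    # Negative on one side of the decision boundary, positive on the other
    print(clf.decision_function([[1, 1], [9, 9]]))
    # The support vectors are the learned parameters of that function
    print(clf.support_vectors_)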
To answer your question:
SVR builds such a function, but with a different goal. The function does not try to assign positive outcomes to your positive examples and negative outcomes to the negative examples.
Instead, the function is built to approximate the given numeric outcomes.
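As a hedged sketch (toy data and parameter values are my own, for illustration): scikit-learn's SVR exposes exactly these knobs, with gamma controlling the smoothness of the RBF function and epsilon the width of the tube within which errors are not penalized.

    import numpy as np
    from sklearn.svm import SVR

    # Toy 1-D regression data (illustrative)
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

    # gamma controls the smoothness of the RBF function,
    # epsilon the width of the tube in which errors are not penalized
    model = SVR(kernel="rbf", gamma=0.5, epsilon=0.1).fit(X, y)
    print(model.support_vectors_.shape)  # the support vectors learned as parameters
    print(model.predict([[2.5]]))        # approximated numeric outcome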

Cumulative distribution function for a variable whose probability should increase as the random variable's value increases

Let's say I have a variable and plot a histogram of it. It turns out the histogram looks like an exponentially decreasing curve.
The highest density will be around the values where the most frequent events happened. This is not what I'm looking for: where the PDF gives me a lower density, I actually need a higher probability than for the most frequent events.
If I fit a distribution and take the CDF, it looks like I'm getting what I want.
Now let's say I have one more feature with the same tendency (where the PDF gives a lower density, I need a higher probability), and I likewise fit a PDF and take the CDF.
From these two probability values I need to get one. What are the methods for calculating this single probability value? I thought of multiplying the two probabilities, or of a weighted sum (one of these two features has more relevance).
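To make the setup concrete, here is a minimal sketch of what I mean, using scipy (the exponential fit and the weight value are just illustrative assumptions):

    import numpy as np
    from scipy import stats

    # Illustrative data: two features with exponentially decaying histograms
    rng = np.random.default_rng(1)
    f1 = rng.exponential(scale=2.0, size=1000)
    f2 = rng.exponential(scale=5.0, size=1000)

    # Fit an exponential distribution to each feature and use its CDF,
    # so rarer (larger) values get probabilities closer to 1
    loc1, scale1 = stats.expon.fit(f1)
    loc2, scale2 = stats.expon.fit(f2)

    x1, x2 = 6.0, 12.0  # example observations
    p1 = stats.expon.cdf(x1, loc1, scale1)
    p2 = stats.expon.cdf(x2, loc2, scale2)

    # The two candidate combinations mentioned above
    print(p1 * p2)                # product of the two probabilities
    w = 0.7                       # assumed weight for the more relevant feature
    print(w * p1 + (1 - w) * p2)  # weighted sum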

How to make the labels of superpixels locally consistent in a gray-level map?

I have a bunch of gray-scale images decomposed into superpixels. Each superpixel in these images has a label in the range [0, 1]. You can see one sample image below.
Here is the challenge: I want the spatially (locally) neighboring superpixels to have consistent labels (close in value).
I'm interested in smoothing the local labels but do not want to apply Gaussian smoothing functions or the like, as some colleagues suggested. I have also heard about Conditional Random Fields (CRF). Is that helpful?
Any suggestion would be welcome.
I'm interested in smoothing the local labels but do not want to apply Gaussian smoothing functions or the like, as some colleagues suggested.
And why is that? Why not take the helpful advice of your colleagues, who are actually right? Applying a smoothing function is the most reasonable way to go.
I have also heard about Conditional Random Fields (CRF). Is that helpful?
This also suggests that you should go with your colleagues' advice, as a CRF has nothing to do with your problem. A CRF is a classifier (a sequence classifier, to be exact) that requires labeled examples to learn from, and it has nothing to do with the setting presented here.
What are typical approaches?
Exactly what your colleagues proposed: you should define a smoothing function and apply it to your function values and their neighbourhoods. (I will not use the term "labels", as it is misleading; you have continuous values in [0, 1], while "label" denotes a categorical variable in machine learning.)
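For instance, a minimal sketch of such a smoothing pass (the toy values, the neighbourhood structure, and the strength alpha are my own assumptions):

    import numpy as np

    # Hypothetical setup: values[i] is the value of superpixel i in [0, 1],
    # neighbours[i] lists the indices of spatially adjacent superpixels
    values = np.array([0.9, 0.8, 0.2, 0.85, 0.15])
    neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 4], 3: [0], 4: [2]}

    # One smoothing pass: pull each value toward the mean of its neighbourhood
    alpha = 0.5  # assumed smoothing strength
    smoothed = np.array([
        (1 - alpha) * values[i] + alpha * np.mean(values[neighbours[i]])
        for i in range(len(values))
    ])
    print(smoothed)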
Another approach would be to define an optimization problem in which fidelity to your current assignment of values is one goal and "closeness" of neighbouring values is the second, for example:
Let us assume that you have points with values {(x_i, y_i)}_{i=1}^N and that n(x) returns the indices of the points neighbouring x.
Consequently, you are trying to find {a_i}_{i=1}^N that minimize
SUM_{i=1}^N (y_i - a_i)^2 + C * SUM_{i=1}^N SUM_{j \in n(x_i)} (a_i - a_j)^2

where the first term measures closeness to the current values, the second term measures closeness to neighbouring values, and the constant C weights the two parts.
You can solve the above optimization problem with many techniques, for example with scipy.optimize.minimize.
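A minimal sketch of that route, reusing the toy setup from the smoothing example above (the data, the neighbourhood structure, and the value of C are illustrative assumptions):

    import numpy as np
    from scipy.optimize import minimize

    # Toy setup: y holds the current values, neighbours[i] the adjacent indices
    y = np.array([0.9, 0.8, 0.2, 0.85, 0.15])
    neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 4], 3: [0], 4: [2]}
    C = 1.0  # assumed weight trading off fidelity against smoothness

    def objective(a):
        fidelity = np.sum((y - a) ** 2)
        smoothness = sum((a[i] - a[j]) ** 2
                         for i, js in neighbours.items() for j in js)
        return fidelity + C * smoothness

    result = minimize(objective, x0=y.copy())
    print(result.x)  # close to y, but with neighbouring values pulled together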
I am not sure that your request makes any sense.
Having close label values for nearby superpixels is trivial: take some smooth function of (X, Y), such as a constant or affine function taking values in the range [0, 1], and assign the function value to the superpixel centered at (X, Y).
You could also take the distance function from any point in the plane.
But this is of no use as it is unrelated to the image content.

How to interpret the "soft" and "max" in softmax regression?

I know the form of softmax regression, but I am curious why it has such a name. Or is it just for historical reasons?
The maximum of two numbers, max(x, y), can have sharp corners / steep edges, which is sometimes an unwanted property (e.g. if you want to compute gradients).
To soften the edges of max(x, y), one can use a variant with softer edges: the softmax function. It's still a max function at its core (well, to be precise, it's an approximation of it), but smoothed out.
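For a quick numeric illustration, one common smooth relative of the max is log-sum-exp (my choice here; it is closely related to, though not identical with, the normalized softmax weights):

    from scipy.special import logsumexp

    x, y = 2.0, 5.0
    print(max(x, y))          # 5.0 -- exact max, with a sharp corner at x == y
    print(logsumexp([x, y]))  # ~5.05 -- a smoothed-out max, differentiable everywhere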
If it's still unclear, here's a good read.
Let's say you have a set of scalars xi and you want to calculate a weighted sum of them, giving a weight wi to each xi such that the weights sum up to 1 (like a discrete probability). One way to do it is to set wi = exp(a*xi) for some constant a >= 0 and then normalize the weights to one. If a = 0 you get just a regular sample average. On the other hand, for a very large value of a you get the max operator: the weighted sum will be just the largest xi. Therefore, varying the value of a gives you a "soft", or continuous, way to go from regular averaging to selecting the max. The functional form of this weighted average should look familiar to you if you already know what softmax regression is.
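A minimal numeric sketch of that interpolation (the values of x and a are illustrative):

    import numpy as np

    def soft_weighted_sum(x, a):
        """Weighted sum of x with weights proportional to exp(a * x_i)."""
        w = np.exp(a * (x - x.max()))  # subtract max for numerical stability
        w /= w.sum()                   # normalize the weights to sum to 1
        return np.dot(w, x)

    x = np.array([1.0, 2.0, 5.0])
    print(soft_weighted_sum(x, 0.0))    # ~2.67, the plain average
    print(soft_weighted_sum(x, 100.0))  # ~5.0, approaches max(x)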

Is the median or mean of a set of values in decibels (dB) taken directly, or is conversion to linear units required?

I need to take the median of a set of path-loss values (dB) in MATLAB. Does anyone know whether they should be converted to linear units like watts before the median is calculated? The result is different in the two cases, but I don't know which one is correct.
The median is just the middle number if you were to line the values up in ascending or descending order. Since the dB-to-linear conversion is monotonic, it preserves that ordering, so for an odd number of values the median is the same element whether you convert to a linear scale or keep the values in dB.
However, if there is an even number of values, the median is the average of the two middle values, and averaging does not commute with the nonlinear dB conversion, so it will make a difference; in that case you should probably stick with the linear values.
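To see the even-count effect numerically, here is a small sketch (in Python rather than MATLAB, with made-up path-loss values):

    import numpy as np

    # Hypothetical path-loss samples in dB (even count, so the two approaches differ)
    loss_db = np.array([90.0, 95.0, 100.0, 110.0])

    median_db = np.median(loss_db)                       # median taken directly in dB
    linear = 10 ** (loss_db / 10)                        # dB -> linear power ratio
    median_linear_db = 10 * np.log10(np.median(linear))  # median in linear, back to dB

    print(median_db)         # 97.5
    print(median_linear_db)  # ~98.2 -- averaging the two middle values does not
                             # commute with the nonlinear conversion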
