How to compute the 99% quantile of a normal variable: Scilab - normal-distribution

I am trying to write a function that computes the quantile of the normal distribution using the function cdfnor, for example:
alpha = cdfnor("PQ", x, 0, 1)
Could anyone help me derive the 99% quantile from this function? How should I define x?

I think the perctl function is what you are looking for...
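The 99% quantile is the value x for which P(X <= x) = 0.99, i.e. the inverse CDF evaluated at 0.99. As a quick cross-check of the number you should get, here is the same computation in Python with SciPy (an illustration only, not Scilab code):
import numpy as np
from scipy.stats import norm

# 99% quantile of the standard normal = inverse CDF evaluated at 0.99
q99 = norm.ppf(0.99)
print(q99)  # approximately 2.326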

Related

Find unknown parameters of the function, if f(x) and x are given

I need some help.
I have an equation:
f(x) = a*cos(x) + (b*sqrt(x) + c*tan(x))^2,
where a, b, and c are unknown parameters.
I also have a few (x, f(x)) pairs, as in a supervised ML problem.
How can I find the parameters? I'm thinking of some numerical method or linear regression, but I don't really know what to do.
The unknown parameters would minimize the sum of squared differences between computed function values and observed function values.
You could define such a sum of squared errors in Excel and use Excel Solver to minimize it.
From a Python program, you could use SciPy, e.g. scipy.optimize.curve_fit for the least-squares fit, or scipy.optimize.fsolve if you have exactly as many (x, f(x)) pairs as unknown parameters.
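A minimal sketch of the least-squares route in Python, assuming the model form as written in the question and using made-up data points in place of your observed pairs:
import numpy as np
from scipy.optimize import curve_fit

# model from the question: f(x) = a*cos(x) + (b*sqrt(x) + c*tan(x))^2
def f(x, a, b, c):
    return a * np.cos(x) + (b * np.sqrt(x) + c * np.tan(x)) ** 2

# hypothetical (x, f(x)) pairs; replace with your own data
x_data = np.array([0.1, 0.3, 0.5, 0.8, 1.0, 1.2])
y_data = f(x_data, 2.0, 1.0, 0.5)  # generated here from known a, b, c for illustration

params, _ = curve_fit(f, x_data, y_data, p0=[1.0, 1.0, 1.0])
print(params)  # least-squares estimates of a, b, c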

What statistic method to use in multivariate abundance data with random effects?

I am working with multivariate data with random effects.
My hypothesis is this: D has an effect on A1 and A2, where A1 and A2 are binary data, and D is a continuous variable.
I also have a random effect, R, that is a factor variable.
So my model would be something like this: cbind(A1, A2) ~ D, random = ~1|R
I tried to use the function manyglm in the mvabund package, but it cannot deal with random effects. Alternatively, I could use lme4, but it cannot deal with multivariate responses.
I could convert my multivariate data to a 4-level factor variable, but I didn't find any method that accepts a factor (rather than binary) response variable. I could also convert the continuous D into a factor variable.
Do you have any advice about what to use in that situation?
First, I know this should be a comment and not a complete answer but I can't comment yet and thought you might still appreciate the pointer.
You should be able to analyze your data with the MCMCglmm R package (see here for an intro), as it can handle mixed models with multivariate response data.

Different costs for underestimation and overestimation

I have a regression problem, but the cost function is asymmetric: the cost of an underestimate is higher than the cost of an overestimate. For example, if the predicted value < the true value, the cost is 3*(true - predicted)^2; if the predicted value > the true value, the cost is 1*(true - predicted)^2.
I'm thinking of using classical regression models such as linear regression, random forests, etc. What modifications should I make to adjust for this cost function?
As far as I know, ML APIs such as scikit-learn do not provide the functionality to directly modify the cost function. If I have to use these APIs, what can I do?
Any recommended reading?
You can use TensorFlow (or Theano) for custom cost functions. A common linear regression implementation is here.
To see how to implement your custom cost function, looking at a Huber loss implementation in TensorFlow might help. Here is your custom cost function, which you should substitute into the linked code. So instead of
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
in the linked code you'll have:
error = y_known - y_pred                 # positive error means an underestimate
condition = tf.less(error, 0)            # True where y_known < y_pred, i.e. an overestimate
overestimation_loss = 1 * tf.square(error)
underestimation_loss = 3 * tf.square(error)
cost = tf.reduce_mean(tf.where(condition, overestimation_loss, underestimation_loss))
Here, when condition is true, error is less than zero, which means y_known is smaller than y_pred, so you have an overestimate and tf.where selects overestimation_loss; otherwise it selects underestimation_loss.
The trick is that you compute both losses and let tf.where choose between them element-wise based on condition.
Update:
If you want to use other libraries, check whether a Huber loss is implemented and look at that code for ideas, because the Huber loss is a conditional loss function similar to yours.
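The same idea can also be packaged as a reusable loss function for a Keras model in TensorFlow 2 (a minimal sketch; the compiled model and the 3x/1x weights are assumptions mirroring the question):
import tensorflow as tf

def asymmetric_mse(y_true, y_pred):
    error = y_true - y_pred                    # positive error = underestimate
    overestimation_loss = tf.square(error)     # used where y_true < y_pred
    underestimation_loss = 3.0 * tf.square(error)
    return tf.reduce_mean(tf.where(error < 0, overestimation_loss, underestimation_loss))

# usage with a hypothetical compiled Keras model:
# model.compile(optimizer="adam", loss=asymmetric_mse)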
You can use an asymmetric cost function to make your model tend to overestimate or underestimate. You can replace the cost function in this implementation with:
def acost(a): return tf.pow(pred-Y, 2) * tf.pow(tf.sign(pred-Y) + a, 2)
For more detail, see this link.
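If you would rather stay in plain scientific Python without TensorFlow, another workaround is to minimize the asymmetric cost directly with scipy.optimize for a linear model (a minimal sketch with synthetic data; the 3x/1x weights mirror the question):
import numpy as np
from scipy.optimize import minimize

# synthetic data: X is (n_samples, n_features), y is (n_samples,)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

def asymmetric_cost(w, X, y):
    resid = y - X @ w                          # positive residual = underestimate
    weights = np.where(resid > 0, 3.0, 1.0)    # penalize underestimates 3x
    return np.sum(weights * resid ** 2)

w0 = np.zeros(X.shape[1])
result = minimize(asymmetric_cost, w0, args=(X, y))
print(result.x)  # fitted coefficients under the asymmetric loss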

Wouldn't setting the first derivative of the cost function J to 0 give the exact Theta values that minimize the cost?

I am currently doing Andrew Ng's ML course. From my calculus knowledge, the first derivative test of a function gives its critical points, if there are any. And given the convex nature of the linear / logistic regression cost function, there is guaranteed to be a global optimum. If that is the case, rather than going the long route of taking a minuscule baby step at a time to reach the global minimum, why don't we use the first derivative test to get the values of Theta that minimize the cost function J in a single attempt, and have a happy ending?
That being said, I do know that there is an alternative to gradient descent called the normal equation that does just that in one step.
On second thought, I suspect it is mainly because of the multiple unknown variables involved in the equation (which is why partial derivatives come into play?).
Let's take an example:
The gradient of the simple regression cost function:
∇ RSS(w) = ∇ [(y - Hw)^T (y - Hw)]
y : output
H : feature (design) matrix
w : weights
RSS: residual sum of squares
Setting this gradient to 0 to obtain the closed-form solution gives:
w = (H^T H)^(-1) H^T y
Now, assuming there are D features, inverting (or solving with) the D x D matrix H^T H costs around O(D^3). If there are a million features, that is computationally impossible to do within a reasonable amount of time.
We use gradient descent methods instead because they give reasonably good solutions in much less time.
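A small NumPy sketch of this trade-off, comparing the closed-form normal-equation solution with plain gradient descent on RSS (synthetic data; the learning rate and iteration count are illustrative choices):
import numpy as np

# synthetic design matrix H (n samples x D features) and targets y
rng = np.random.default_rng(0)
n, D = 1000, 5
H = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
y = H @ w_true + 0.01 * rng.normal(size=n)

# closed form: solve (H^T H) w = H^T y; the D x D solve costs O(D^3)
w_closed = np.linalg.solve(H.T @ H, H.T @ y)

# gradient descent on the mean squared residual; each step costs O(nD)
w = np.zeros(D)
for _ in range(5000):
    grad = 2 * H.T @ (H @ w - y) / n
    w -= 0.1 * grad
print(np.allclose(w_closed, w, atol=1e-4))  # both reach essentially the same minimizer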

Using my own kernel in libsvm

I am currently developing my own kernel to use for classification and want to include it in libsvm, replacing the standard kernels that libsvm offers.
However, I am not 100% sure how to do this, and obviously do not want to make any mistakes. Be warned that my C++ is not very good. I found the following on the libsvm FAQ page:
Q: I would like to use my own kernel. Any example? In svm.cpp, there
are two subroutines for kernel evaluations: k_function() and
kernel_function(). Which one should I modify ? An example is "LIBSVM
for string data" in LIBSVM Tools.
The reason why we have two functions is as follows. For the RBF kernel
exp(-g |xi - xj|^2), if we calculate xi - xj first and then the norm
square, there are 3n operations. Thus we consider exp(-g (|xi|^2 -
2dot(xi,xj) +|xj|^2)) and by calculating all |xi|^2 in the beginning,
the number of operations is reduced to 2n. This is for the training.
For prediction we cannot do this so a regular subroutine using that 3n
operations is needed. The easiest way to have your own kernel is to
put the same code in these two subroutines by replacing any kernel.
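As an aside, the expansion the FAQ describes is just |xi - xj|^2 = |xi|^2 - 2*dot(xi, xj) + |xj|^2, which lets the squared norms be precomputed once. A NumPy illustration of that algebra (not libsvm code, just to make the trick concrete):
import numpy as np

# toy data: 100 points with 5 features; gamma corresponds to g in the FAQ
X = np.random.rand(100, 5)
gamma = 0.5

# precompute |xi|^2 once, then build the full RBF kernel matrix
sq_norms = np.sum(X ** 2, axis=1)
K = np.exp(-gamma * (sq_norms[:, None] - 2 * X @ X.T + sq_norms[None, :]))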
Hence, I was trying to find the two subroutines k_function() and kernel_function(). The former I found with the following signature in svm.cpp:
double Kernel::k_function(const svm_node *x, const svm_node *y,
const svm_parameter& param)
Am I correct that x and y each store one observation (i.e., one row) of my feature matrix in an array, and that I need to return the kernel value k(x, y)?
The function kernel_function(), on the other hand, I was not able to find at all. There is a function pointer in the Kernel class with that name and the following declaration,
double (Kernel::*kernel_function)(int i, int j) const;
which is set in the Kernel constructor. What are i and j in that case? I suppose I need to set this pointer as well?
Once I have overwritten Kernel::k_function and the function pointed to by Kernel::kernel_function, am I done, and will libsvm use my kernel to compare two observations?
Thank you!
You don't have to dig into the LIBSVM code to use your own kernel; you can use the precomputed kernel option (i.e., -t 4 training_set_file).
That way, you can compute the kernel matrix externally however it suits you, store the values in a file, and load the precomputed kernel into LIBSVM. There is an explanation with an example of how to do this in the README file in the LIBSVM tarball (see the precomputed kernels section, around line 236).
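For completeness, here is a minimal sketch of the precomputed-kernel route in Python using scikit-learn's SVC, which wraps libsvm; my_kernel is a stand-in for your own kernel and the data are made up:
import numpy as np
from sklearn.svm import SVC

def my_kernel(a, b):
    # placeholder custom kernel (a simple polynomial kernel) for illustration
    return (a @ b.T + 1.0) ** 2

X_train = np.random.rand(50, 4)
y_train = np.random.randint(0, 2, size=50)
X_test = np.random.rand(10, 4)

K_train = my_kernel(X_train, X_train)   # (n_train, n_train) Gram matrix
K_test = my_kernel(X_test, X_train)     # (n_test, n_train) kernel against training points

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))
For the LIBSVM command-line tools themselves, the README's precomputed-kernel section describes the file format: each row starts with a serial-number feature 0:i, followed by the kernel values against all training points.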
