Bayesian optimization in machine learning

Thanks for reading this. I am currently studying the Bayesian optimization problem and following this tutorial: bayesian optimization tutorial.
My question is about the acquisition function on page 11. Before I raise it, I need to state my understanding of Bayesian optimization, to see if anything is wrong.
First we take some training points and assume they follow a multivariate Gaussian distribution. Then we use an acquisition function to find the next point we want to sample. So, for example, we use x1,...,x(t) as training points, then use the acquisition function to find x(t+1) and sample it. Then we assume x1,...,x(t),x(t+1) follow a multivariate Gaussian distribution, use the acquisition function to find x(t+2), sample it, and so on.
On page 11, it seems we need to find the x that maximizes the probability of improvement. f(x+) comes from the sampled training points (x1,...,xt) and is easy to get. But how do I get u(x) and the variance here? I don't know what the x in that equation is. It should be x(t+1), but the paper doesn't say so. And if it is indeed x(t+1), how could I get u(x(t+1))? You may say to use the equation at the bottom of page 8, but that equation only applies once we have already found x(t+1) and put it into the multivariate Gaussian distribution. Since we don't yet know the next point x(t+1), I have no way to calculate it, in my opinion.
I know this is a tough question. Thanks for answering!!
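(For reference, assuming this is the Brochu et al. tutorial, the probability-of-improvement acquisition on page 11 has the form
PI(x) = P(f(x) >= f(x+)) = Φ((u(x) − f(x+)) / σ(x)),
where Φ is the standard normal CDF and u(x), σ(x) are the Gaussian-process posterior mean and standard deviation at a candidate point x.)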

In fact I have got the answer.
Indeed it is x(t+1). The direct way is to compute u and the variance for every x outside the training data, put each into the acquisition function, and pick the one that gives the maximum.
This is time-consuming, so instead of trying candidates one by one we use a nonlinear optimizer such as DIRECT to find the x that maximizes the acquisition function.
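A minimal sketch of that direct procedure, assuming sklearn's GaussianProcessRegressor as the surrogate and a plain candidate grid in place of DIRECT (the grid argmax is the "one by one" version; a nonlinear optimizer would replace that last step). The objective f and all data here are illustrative:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):
    # hypothetical expensive black-box objective (illustrative only)
    return -(x - 2.0) ** 2

X_train = np.array([[0.0], [1.0], [3.0], [4.0]])        # x1 ... x(t)
y_train = f(X_train).ravel()

gp = GaussianProcessRegressor().fit(X_train, y_train)   # posterior over f

candidates = np.linspace(0.0, 4.0, 401).reshape(-1, 1)  # points outside the training data
mu, sigma = gp.predict(candidates, return_std=True)     # u(x) and std for every candidate

f_best = y_train.max()                                  # f(x+)
pi = norm.cdf((mu - f_best) / (sigma + 1e-9))           # probability of improvement

x_next = candidates[np.argmax(pi)]                      # this is x(t+1)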

Related

Search for the optimal value of x for a given y

Please help me find an approach to the following problem: let X be an m×n matrix X = (x1, …, xn), where each xi is a time series, and let Y be an m×1 vector. To predict the values of Y, we train some model, say linear regression, and get Y = f(X). Now we need to find the X that produces some given value of Y. The most naive approach is brute force, but what are the proper ways to solve such problems? Perhaps the scipy.optimize package applies here; please enlighten me.
I'd like an explanation or some material to read for understanding.
Most scipy.optimize algorithms use gradient methods. For optimization problems like this, we can apply them to re-engineering the data (e.g. finding the best date to invest in the stock market).
If you want to optimize the result, you should choose a good step size and a suitable optimization method.
However, we should not frame the problem as "predicting" xi, because what we are actually doing is finding a local/global maximum/minimum.
For example, with Newton-CG your data/equation must already contain all the information needed (it is a simulation); the method itself makes no prediction.
If you do want a prediction on "time", you could bucket the time data into "year, month, ..." and then use unsupervised learning to group the data. If a trend emerges, you can re-engineer the result to recover the time.
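A hedged sketch of the scipy.optimize route the question asks about: fit Y = f(X), then minimize (f(x) − y_target)² over x. The toy data and names (y_target, etc.) are illustrative assumptions; note also that for a linear model many different x can hit the same y, so the optimizer returns just one of them:

import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # toy stand-in for the time-series matrix
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.3    # synthetic target

model = LinearRegression().fit(X, Y)        # Y = f(X)

y_target = 1.5                              # the given value of Y

def objective(x):
    # squared gap between the model's prediction at x and the target
    return (model.predict(x.reshape(1, -1))[0] - y_target) ** 2

res = minimize(objective, x0=np.zeros(3), method="BFGS")
print(res.x, model.predict(res.x.reshape(1, -1))[0])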

How does multicollinearity affect the model?

I took 4 features that are all identical, X1 = X2 = X3 = X4, and the target is Y = X1.
I am wondering how multicollinearity affects the coefficients of the model. I trained an sklearn linear regression model on this data and it seems to have no effect on the coefficients. Please help me understand this.
To understand the problem with multicollinearity, we first need to understand what a slope is. The slope is how much y changes per unit change in x while the rest of the features are held constant. Suppose you want to predict y with two features:
y = m1*x1 + m2*x2 + b (the ideal equation of a line)
If this equation suffers from multicollinearity and we try to change x1, then x2 will also change, since they are correlated. This makes the individual coefficients hard to estimate and they may come out wrong.
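A small sketch of what the questioner observed, assuming sklearn's LinearRegression (whose least-squares solver returns a minimum-norm solution for perfectly collinear features, so the weight is split evenly rather than blowing up):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1, x1, x1])   # X1 = X2 = X3 = X4
y = x1                                  # Y = X1

model = LinearRegression().fit(X, y)
print(model.coef_)  # roughly [0.25, 0.25, 0.25, 0.25]: the unit effect is shared

The predictions are perfect either way; what multicollinearity damages is the interpretability and stability of the individual coefficients, since any split of the weight across identical columns predicts equally well.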

How can I implement a custom loss function that takes into account multiple predictions of the network?

I am currently implementing a CNN with a custom error function.
The problem I am trying to solve is physics-based, so I can calculate the maximal achievable precision, or to put it another way, I know the best possible (i.e. minimal) standard deviation I can achieve. Those best possible precisions are calculated during the generation of the training data using the Cramér-Rao lower bound (CRLB).
Right now, my error function looks something like this (in Keras):
from keras import backend as K

def customLoss(yTrue, yPred):
    # the target vector carries the CRLBs in its last 10 slots
    STD = yTrue[:, 10:20]
    yTrue = yTrue[:, 0:10]
    # penalize the gap between the absolute error and the best achievable std
    dev = K.mean(K.abs(K.abs(yTrue - yPred) - STD))
    return dev
In this case I have 10 parameters to estimate, and therefore 10 CRLBs. I put the CRLBs into the target vector just to be able to access them in the error function.
Now to my question. This method works, but it is not what I want. The problem is that the error is calculated from a single prediction of the network; to be correct, the network would have to predict the same dataset/batch multiple times. That would let me see the standard deviation of the predictions and use it to calculate the error (I'm using a Bayesian CNN).
Does someone have an idea how to implement such a function in Keras or TensorFlow (I would also not mind switching to PyTorch)?
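No answer was posted, but one common route to "the same batch predicted multiple times" with a Bayesian CNN is Monte Carlo dropout: keep the dropout layers active at inference and aggregate several stochastic forward passes. A rough, hypothetical sketch (it assumes a Keras model containing dropout layers; training=True is what keeps them stochastic):

import numpy as np

def mc_predict(model, x, n_samples=20):
    # n_samples stochastic forward passes through the same batch
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    # per-example mean prediction and the spread across passes
    return preds.mean(axis=0), preds.std(axis=0)

The std returned here could then be compared against the CRLBs; folding that comparison into a single Keras loss is the awkward part, since a loss function only sees one prediction per example.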

Find out the training error after fit()

I'm training a LinearSVC model and I want to get its training error. Is it possible to get it without evaluating manually?
Thanks
sklearn uses liblinear for this task.
You can take a quick glance at the sources here:
self.coef_, self.intercept_, self.n_iter_ = _fit_liblinear(
    X, y, self.C, self.fit_intercept, self.intercept_scaling,
    self.class_weight, self.penalty, self.dual, self.verbose,
    self.max_iter, self.tol, self.random_state, self.multi_class,
    self.loss, sample_weight=sample_weight)
This shows that only the coefficients, intercepts, and number of iterations are picked up by sklearn's Python API; whatever else liblinear outputs is not captured. You can't directly read out the training error without changing the internal code.
There might be a hack: turn on verbose mode, redirect the output, and parse whatever additional info appears there. But that assumes the info you want is actually printed, and it's fragile, so I won't recommend it.
Just use the score method; it won't be costly compared to the fitting itself.
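A minimal usage sketch of that suggestion (the toy data is illustrative). Note that score returns mean accuracy, so its complement is the 0/1 training error, not the hinge-plus-regularization objective liblinear actually minimizes:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X_train, y_train = make_classification(n_samples=200, random_state=0)
clf = LinearSVC().fit(X_train, y_train)

train_error = 1.0 - clf.score(X_train, y_train)  # complement of mean accuracy
print(train_error)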

How does optimizer.compute_gradients calculate the gradients programmatically?

I'm new to machine learning. I was going through TensorFlow and I have a doubt about a particular function:
grads_and_vars = optimizer.compute_gradients(loss)
Can someone explain how the gradients are calculated programmatically (i.e. what formula is used to compute them)?
Tensorflow uses an algorithm called reverse-mode automatic differentiation. It's too complex a topic to explain here, but the Wikipedia page is a good starting point:
https://en.wikipedia.org/wiki/Automatic_differentiation
Hope that helps!
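To make the idea concrete, here is a tiny hand-rolled illustration of reverse-mode differentiation for y = (x*w + b)**2. This is purely didactic and nothing like TensorFlow's actual implementation, which builds and traverses a computation graph:

# Forward pass: compute and remember the intermediate values.
x, w, b = 3.0, 2.0, 1.0
u = x * w + b   # u = 7.0
y = u ** 2      # y = 49.0

# Backward pass: apply the chain rule from the output back to the inputs.
dy_du = 2.0 * u       # d(u**2)/du            -> 14.0
dy_dw = dy_du * x     # du/dw = x, chain rule -> 42.0
dy_db = dy_du * 1.0   # du/db = 1, chain rule -> 14.0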
