RHO in Keras framework - machine-learning

I use the Keras framework for a recurrent neural network. I found the rho argument among the parameters of the RMSprop optimizer, and I know it is a hyperparameter, but I can't find a description of it.
What does RHO mean?
Link to the documentation:
https://keras.io/optimizers/

Rho is a hyperparameter which attenuates the influence of past gradients: it is the decay rate of the running average of squared gradients that RMSprop keeps.
See also this link (the same parameter is called gamma in that article).
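For intuition, here is a minimal sketch of the standard RMSprop accumulator update (a generic textbook formulation, not Keras' exact source); rho is the decay rate of the running average of squared gradients, so a larger rho lets past gradients influence the step size for longer:

import numpy as np

def rmsprop_step(param, grad, acc, lr=0.001, rho=0.9, eps=1e-7):
    # Exponential moving average of squared gradients; rho controls how quickly
    # the influence of past gradients decays.
    acc = rho * acc + (1.0 - rho) * grad ** 2
    # Scale the step by the root of the accumulated squared gradients.
    param = param - lr * grad / (np.sqrt(acc) + eps)
    return param, acc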

Related

Sklearn models: decision function vs predict_proba for roc curve

In sklearn, roc_curve requires (y_true, y_scores). Generally, for y_scores, I feed in the probabilities output by a classifier's predict_proba function. But in the sklearn example, I see both predict_proba and decision_function being used.
I wonder what the difference is in terms of real-life model evaluation?
The functional form of logistic regression is -
f(x) = 1 / (1 + e^(−(β0+β1x1+⋯+βkxk)))
This is what is returned by predict_proba.
The term inside the exponential i.e.
d(x)=β0+β1x1+⋯+βkxk
is what is returned by decision_function. The "hyperplane" referred to in the documentation is
β0+β1x1+⋯+βkxk=0
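As a quick sanity check of this relationship, applying the logistic sigmoid to decision_function reproduces predict_proba for the positive class (a small sketch on synthetic data; names are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

d = clf.decision_function(X)       # b0 + b1*x1 + ... + bk*xk
p = clf.predict_proba(X)[:, 1]     # 1 / (1 + exp(-d))
print(np.allclose(p, 1.0 / (1.0 + np.exp(-d))))   # True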
My understanding after reading a few resources:
Decision function: gives the signed distances from the hyperplane. These are therefore unbounded and cannot be read as probabilities. To obtain probabilities there are two solutions: Platt scaling, and Multi-Attribute Spaces, which calibrates outputs using Extreme Value Theory.
Predict proba: gives actual probabilities (0 to 1); however, the 'probability' attribute has to be set to True when fitting the model itself. It uses Platt scaling, which is known to have theoretical issues.
Refer to this in the documentation.
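For the ROC question specifically: because the sigmoid is monotonic, ranking by decision_function or by predict_proba yields the same ROC curve and AUC, so either can be passed as y_scores (again a sketch on synthetic data):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]     # bounded in [0, 1]
scores = clf.decision_function(X_te)      # unbounded signed distance to the hyperplane

fpr_p, tpr_p, _ = roc_curve(y_te, proba)
fpr_d, tpr_d, _ = roc_curve(y_te, scores)
print(np.isclose(roc_auc_score(y_te, proba), roc_auc_score(y_te, scores)))   # True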

How to set up loss function in PyTorch for Soft-Actor-Critic

I'm trying to implement a custom loss function for a soft Q-learning, actor-critic policy gradient algorithm in PyTorch. This comes from the following paper Learning from Imperfect Demonstrations. The structure of the algorithm is similar to deep q-learning, in that we are using a network to estimate Q-values, and we use a target network to stabilize results. Unlike DQN, however, we calculate V(s) from Q(s) by:
This is simple enough to calculate with PyTorch. My main question has to do with how to set up the loss function. Part of the update equation is expressed as:
Note that Q_hat comes from the target network. How can I go about putting something like this into a loss function? I can compute values for V and Q, but how can I handle the gradients in this case? If anyone can point me towards a similar example that would be much appreciated.
This turns out to be fairly simple, assuming you can calculate V, Q, and Q_hat. After discussing this with some people offline, I was able to get PyTorch to calculate this loss by setting it up as:
# Q and V come from the online network; Q_hat comes from the target network.
# detach() stops gradients from flowing through the (Q - Q_hat) factor, so only
# (Q - V) contributes to the gradient.
loss = (Q - V) * (Q - Q_hat).detach()
# If Q, V, Q_hat are batched tensors, reduce to a scalar first, e.g. loss = loss.mean().
optimizer.zero_grad()
loss.backward()
optimizer.step()
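For a slightly fuller, self-contained sketch (assuming discrete actions, Q-networks that output one value per action, and the usual soft-value form V(s) = alpha * logsumexp(Q(s, .) / alpha); check the paper for its exact definition), the pieces fit together like this:

import torch

def soft_value(q_values, alpha=1.0):
    # q_values: tensor of shape (batch, n_actions)
    # soft state value V(s) = alpha * logsumexp(Q(s, .) / alpha)
    return alpha * torch.logsumexp(q_values / alpha, dim=1)

def surrogate_loss(q, v, q_hat):
    # q and v come from the online network; q_hat from the target network.
    # Only (q - v) carries gradient; (q - q_hat) is detached into a constant weight.
    return ((q - v) * (q - q_hat).detach()).mean()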

Machine Learning, After training, how exactly does it get a prediction? opencv

So after you have a machine learning algorithm trained, with your layers, nodes, and weights, how exactly does it go about getting a prediction for an input vector? I am using a MultiLayer Perceptron (neural network).
From what I currently understand, you start with the input vector to be predicted. You send it to the hidden layer(s), where each node computes the weighted sum of the inputs (using the weights found in training), adds its bias term, and passes the result through the same activation function used in training. You repeat this for each hidden layer, then do the same for the output layer. Each node in the output layer then gives one of your prediction(s).
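Expressed as code, my understanding of the per-layer computation is roughly the following (a NumPy sketch; the names and the choice of activation are just illustrative):

import numpy as np

def forward(x, layers, activation=np.tanh):
    # layers: list of (W, b) pairs learned in training; W has shape (n_out, n_in)
    for W, b in layers:
        # weighted sum of the inputs plus the bias, then the activation from training
        x = activation(W @ x + b)
    return x   # the output-layer values are the prediction(s)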
Is this correct?
I got confused when using OpenCV to do this, because the guide says that when you use the predict function:
If you are using the default cvANN_MLP::SIGMOID_SYM activation
function with the default parameter values fparam1=0 and fparam2=0
then the function used is y = 1.7159*tanh(2/3 * x), so the output
will range from [-1.7159, 1.7159], instead of [0,1].
However, when training it is also stated in the documentation that SIGMOID_SYM uses the activation function:
f(x)= beta*(1-e^{-alpha x})/(1+e^{-alpha x} )
Where alpha and beta are user defined variables.
So, I'm not quite sure what this means. Where does the tanh function come into play? Can anyone clear this up please? Thanks for the time!
The documentation where this is found is here:
The reference to tanh is under the function description for predict.
The reference to the activation function is next to the S-shaped graph near the top of the page.
Since this is a general question, and not code specific, I did not post any code with it.
I would suggest that you read about the particular algorithm that you are using or plan to use. To be honest, there is no single algorithm that solves every problem; you have to look at what features you have and what you need.
How an algorithm performs prediction depends entirely on the choice of algorithm. A Support Vector Machine (SVM) fits hyperplanes in the feature space, using some metric such as distance for learning, and the learned model is then used for prediction. KNN, on the other hand, uses a simple nearest-neighbour measurement for prediction.
Please work out exactly what you need and read through the research papers to get a proper understanding. There is no magic involved in prediction, only mathematical formulations.
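On the tanh part of the question: the two formulas quoted above appear to be the same function, since (1 − e^(−αx)) / (1 + e^(−αx)) = tanh(αx/2); the documented default output y = 1.7159*tanh(2/3 * x) is consistent with β = 1.7159 and α = 4/3. A quick numeric check (sketch, assumed parameter values):

import numpy as np

x = np.linspace(-5, 5, 11)
alpha, beta = 4.0 / 3.0, 1.7159
sigmoid_sym = beta * (1 - np.exp(-alpha * x)) / (1 + np.exp(-alpha * x))
print(np.allclose(sigmoid_sym, beta * np.tanh(2.0 * x / 3.0)))   # True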

SVM vector of weights

I have a classification task, and I use the svm_perf application.
The question is: having trained the model, is it possible to get the weights of the features?
There is an -a parameter which outputs the alphas; honestly, I don't recall alphas in SVM - I thought the weights are always w.
If you are using a linear SVM, there is a Python script based on the model file output by svm_learn and svm_perf_learn. To be more specific, the weight vector is just w = SUM_i (y_i*alpha_i*sv_i), where sv_i is the i-th support vector and y_i is its category (label) from the training sample.
If you are using a non-linear SVM, I don't think the weight coefficients are directly related to the input space. You can still get the decision function:
f(x) = sgn( SUM_i (alpha_i*y_i*K(sv_i,x)) + b )
where K is your kernel function.
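To illustrate the w = SUM_i (y_i*alpha_i*sv_i) formula, here is a sketch using scikit-learn instead of svm_perf (its fitted model exposes the same quantities; dual_coef_ already stores y_i*alpha_i):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# dual_coef_ holds y_i * alpha_i for each support vector.
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))   # True: same weight vector the linear model exposes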

Beginning to code Logistic regression in Java

I want to code the logistic regression (classification) algorithm in Java.
The hypothesis is hθ(x) = 1 / (1 + e^(−θ^T x)).
Can anyone please tell me what θ^T (theta to the power T) is?
I was able to code linear regression (its hypothesis, hθ(x) = θ^T x, is relatively easy), but I cannot get started with logistic regression.
Θ^T is the transpose of the parameter vector Θ, and Θ^T x is the linear combination of the input features. If you know linear regression, you can think of Θ^T x as the output of linear regression.
The first part is linear regression: its output is Θ^T x = θ0 + θ1x1 + ⋯ + θkxk. Since logistic regression is not a regression but a classification problem, your output shouldn't be continuous; instead you want a binary output for any input. For this you need a function that maps its input to a value between 0 and 1, so that you can apply a threshold to the output to get the classification. The suitable function for this is the sigmoid function, as you mentioned.
Regarding your question, the output of linear regression can be written as Θ^T x, the vectorized form of θ0 + θ1x1 + ⋯ + θkxk. So Θ^T is nothing but the transpose of the parameter vector.
For details on logistic regression and coding, check this link.
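As a sketch of the hypothesis itself (shown in Python for brevity; the same arithmetic translates line by line to Java):

import numpy as np

def hypothesis(theta, x):
    # theta^T x: dot product of the parameter vector and the feature vector
    z = np.dot(theta, x)
    # the sigmoid squashes the linear output into (0, 1) so it can be thresholded
    return 1.0 / (1.0 + np.exp(-z))

# e.g. hypothesis(np.array([0.1, -0.5, 2.0]), np.array([1.0, 3.0, 0.2]))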
Θ^T represents the transpose of the theta matrix, where theta holds the model parameters (one per feature). When writing code for these algorithms, I strongly advise you to use MATLAB or Octave first for the matrix calculations. Then, when you are sure your algorithm is working correctly, implement it in Java.
Cheers,
Emil
