Difference between RidgeCV() and GridSearchCV() - machine-learning

RidgeCV() also searches over a set of hyperparameters. Given a similar kernel in GridSearchCV() along with similar parameters, would there be any difference in the results of the two?

RidgeCV implements cross-validation for ridge regression specifically, while GridSearchCV lets you optimize parameters for any estimator, including ridge regression.
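One practical difference worth noting: by default RidgeCV uses an efficient form of leave-one-out cross-validation, while GridSearchCV runs a k-fold search, so the selected alpha can differ unless you pass a matching cv. A minimal sketch (assuming scikit-learn, with a toy dataset) of the two approaches:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
alphas = [0.01, 0.1, 1.0, 10.0]

# RidgeCV: cross-validation built into the ridge estimator itself.
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("RidgeCV best alpha:", ridge_cv.alpha_)

# GridSearchCV: a generic search wrapper around any estimator.
grid = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5).fit(X, y)
print("GridSearchCV best alpha:", grid.best_params_["alpha"])
```

With the same cv and the same candidate alphas, the two should agree; the difference is convenience and generality, not the underlying model.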

Related

Sklearn models: decision function vs predict_proba for roc curve

In sklearn, roc_curve requires (y_true, y_scores). Generally, for y_scores, I feed in the probabilities output by a classifier's predict_proba function. But in the sklearn example, I see both predict_proba and decision_function are used.
I wonder what the difference is in terms of real-life model evaluation?
The functional form of logistic regression is
$$f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$
This is what is returned by predict_proba.
The term inside the exponential, i.e.
$$d(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,$$
is what is returned by decision_function. The "hyperplane" referred to in the documentation is
$$\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = 0.$$
My understanding after reading a few resources:
Decision Function: Gives the signed distances from the hyperplane. These are therefore unbounded and cannot be equated to probabilities. To get probabilities there are two solutions: Platt scaling, and Multi-Attribute Spaces, which calibrates outputs using Extreme Value Theory.
Predict Proba: Gives the actual probabilities (0 to 1); however, for SVC the attribute 'probability' has to be set to True while fitting the model itself. It uses Platt scaling, which is known to have theoretical issues.
Refer to this in the documentation.
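Since ROC analysis depends only on how the scores rank the examples, and the logistic function is monotonic, both outputs give the same ROC curve. A minimal sketch (assuming scikit-learn, with a toy dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]  # bounded in (0, 1)
scores = clf.decision_function(X_te)   # unbounded distances from the hyperplane

# Identical AUC: predict_proba is a monotonic transform of decision_function.
print(roc_auc_score(y_te, proba))
print(roc_auc_score(y_te, scores))
```

For real-life evaluation the distinction matters only when you need calibrated probabilities (e.g. for expected-cost decisions), not for ranking metrics like ROC AUC.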

Should I find regularization parameter or degree first for Support Vector Regression algorithm?

I am working on a problem to predict the revenue generated by a film. I am using sklearn's support vector regression algorithm with a polynomial kernel. I tried to find the degree that gives the best accuracy using the default value of the regularization parameter, but I got error percentages in the 7-digit range. So I decided to increase the model's variance by tuning the regularization parameter.
So should I first assume a degree and find the regularization parameter which gives the best result, or vice versa? Or is there something else that I should consider?
Generally, doing a grid search over the degree and the regularization parameter together is common practice. There is some information about this in sklearn here:
https://scikit-learn.org/stable/modules/grid_search.html
This lets you build a dictionary of the kernels you want to try (rbf, poly, etc.), their respective hyperparameters, and the regularization parameter, and then search for the best combination.
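As a concrete sketch of that advice (assuming scikit-learn; the dataset and parameter values are placeholders), searching degree and C jointly avoids committing to a degree chosen under an arbitrary default regularization:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Scaling matters a lot for SVR with a polynomial kernel.
pipe = make_pipeline(StandardScaler(), SVR(kernel="poly"))
param_grid = {
    "svr__degree": [2, 3, 4],
    "svr__C": [0.1, 1.0, 10.0, 100.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_)
```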

Ordinal logistic regression how it differs from logistic regression?

I am sure this question may not be in the brilliant category, but somehow, to learn machine learning, I may have to start with a stupid question. So, please bear with me.
I have understood the regression terminology partially.
Regression essentially gives an idea of the relationship between the dependent and independent variables.
If the dependent variable is continuous and you see a linear relation between the dependent and independent variables, then linear regression is the way to go.
Now a slight change: if the dependent variable is something like a binary value (Y/N), i.e. the output follows a binomial distribution, then logistic regression is the way to go, which models a nonlinear relationship between the dependent and independent variables.
So far, so good? Please correct me if I am wrong.
Now my question is with respect to ordinal logistic regression.
I have started looking at the below link for reference
https://statistics.laerd.com/spss-tutorials/ordinal-regression-using-spss-statistics.php
where it is mentioned that "It can be considered as either a generalisation of multiple linear regression or as a generalisation of binomial logistic regression".
Could someone help me understand this above statement with examples?
Logistic regression can be considered an extension of linear regression, but instead of predicting continuous variables it predicts discrete variables by introducing an activation function. So you are asked to produce a discriminant function f that, based on X, outputs a class in {1, 2, ..., k}, where k is the number of classes your problem presents. X can be composed of features that are continuous or discrete; it does not matter, just make sure you apply appropriate pre-processing to them.
The base case for logistic regression is finding the decision boundary that divides two classes. But in order to add more classes, you have to implement another approach. There are several: softmax (https://en.wikipedia.org/wiki/Softmax_function), one-vs-all (https://en.wikipedia.org/wiki/Multiclass_classification), etc.
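To make the softmax option concrete, here is a minimal numpy sketch (purely illustrative): it maps k real-valued class scores to a probability distribution over the k classes.

```python
import numpy as np

def softmax(scores):
    """Map a vector of k class scores to k probabilities summing to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659, 0.242, 0.099]
```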
Finally, to answer your question: ordinal logistic regression is an extension of logistic regression that takes into account the order of the output categories, such as grades on a test. Take a look online for examples.

What's the meaning of logistic regression dataset labels?

I've been learning logistic regression for some days, and I think the labels of a logistic regression dataset need to be 1 or 0. Is that right?
But when I look at the libSVM library's regression datasets, I see the label values are continuous numbers (e.g. 1.0086, 1.0089, ...). Did I miss something?
Note that the libSVM library can also be used for regression problems.
Thanks so much!
Contrary to its name, logistic regression is a classification algorithm, and it outputs the class probability conditioned on the data point. Therefore the training-set labels need to be either 0 or 1. For the dataset you mentioned, logistic regression is not a suitable algorithm.
SVM is a classification algorithm and uses the input labels -1 or 1. It is not a probabilistic algorithm and does not output class probabilities. It can, however, be adapted to regression (support vector regression), which is why the libSVM regression datasets you found have continuous labels.
Are you using a 3rd-party library or programming this yourself? Generally the labels are used as ground truth so you can see how effective your approach was.
For example, if your algorithm predicts -1 for a particular instance while the ground-truth label is +1, it means you did not successfully classify that particular instance.
Note that "regression" is a general term. To say someone will perform regression analysis doesn't necessarily tell you what algorithm they will be using, nor all of the nature of the data sets. All it really tells you is that you have a set of samples with features which you want to use to predict a single outcome value (a model for conditional probability).
One major difference between logistic regression and linear regression is that the former is usually trained on categorical, binary-labeled sample sets; while the latter is trained on real-labeled (ℝ) sample sets.
Any time your labels are real-valued, you're probably going to use linear regression or similar, or else convert those real-valued labels to categorical labels (e.g. via thresholds or bins) if you in fact want to use logistic regression. There is potentially a big difference in the quality and interpretation of your results, though, if you convert from one such problem setup to another.
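A minimal sketch of that conversion (assuming scikit-learn; the data is synthetic and the median threshold is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_real = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Real-valued labels: linear regression applies directly.
LinearRegression().fit(X, y_real)

# For logistic regression, first threshold the labels into two classes,
# e.g. "above the median" vs. "below the median".
y_binary = (y_real > np.median(y_real)).astype(int)
LogisticRegression().fit(X, y_binary)
```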
See also Regression Analysis.

Multinomial logistic regression steps in SPSS

I have data suited to multinomial logistic regression, but I don't know how to formulate the model for predicting my Y.
How do I perform Multinomial Logistic Regression using SPSS?
How does the stepwise method work?
There are plenty of examples of annotated output for SPSS multinomial logistic regression:
UCLA example
My own list of links and resources
The stepwise method provides a data-driven approach to selecting your predictor variables. In general, the decision between data-driven, direct entry, and hierarchical approaches comes down to whether you want to test theory (direct entry or hierarchical) or simply optimise prediction (stepwise and related methods).
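SPSS handles this through its own dialogs, but if it helps to see the stepwise idea in code, here is a rough Python analogue (assuming scikit-learn; forward selection via SequentialFeatureSelector, which is related to but not identical to SPSS's stepwise procedure):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# Forward selection: purely data-driven, optimises prediction rather than
# testing a theory-driven ordering of predictors.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask over the candidate predictors
```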
