I have data suited to multinomial logistic regression, but I don't know how to formulate the model to predict my Y.
How do I perform Multinomial Logistic Regression using SPSS?
How does the stepwise method work?
There are plenty of examples of annotated output for SPSS multinomial logistic regression:
UCLA example
My own list of links and resources
The stepwise method provides a data-driven approach to selecting your predictor variables. In general, the choice between data-driven, direct-entry, and hierarchical approaches depends on whether you want to test theory (i.e., direct entry or hierarchical) or simply optimise prediction (i.e., stepwise and related methods).
I am trying to understand the difference in the assumptions required for Naive Bayes and logistic regression.
As far as I know, both Naive Bayes and logistic regression require the features to be independent of each other, i.e., the predictors should not show any multicollinearity,
and only logistic regression requires linearity between the independent variables and the log-odds.
Correct me if I am wrong. Are there any other assumptions or differences between Naive Bayes and logistic regression?
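You are right, durga. The two also tend to perform similarly.
A difference is that Gaussian NB assumes each feature is normally distributed within each class (other variants, such as multinomial or Bernoulli NB, assume other distributions), whereas logistic regression makes no distributional assumption about the features. As for speed, NB is usually much faster to train.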
Logistic regression, according to this source:
1) Requires observations to be independent of each other. In other words, the observations should not come from repeated measurements or matched data.
2) Binary logistic regression requires the dependent variable to be binary, and ordinal logistic regression requires the dependent variable to be ordinal.
3) Requires little or no multicollinearity among the independent variables. This means that the independent variables should not be too highly correlated with each other.
4) Assumes a linear relationship between the independent variables and the log odds.
5) Typically requires a large sample size. A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model.
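Point 4 can be written out explicitly. As a sketch, with p standing for P(Y = 1 | x) and the coefficient names β_0, ..., β_m chosen here purely for illustration:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)}}
```

The model is linear in the log odds, not in the probability itself.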
tl;dr:
Naive Bayes requires conditional independence of the variables. The regression family needs the features not to be highly correlated in order to give an interpretable, well-fitted model.
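Naive Bayes requires the features to meet the "conditional independence" requirement, which means that, given the class label y, the features are independent of one another: p(x1, ..., xn | y) = p(x1 | y) * ... * p(xn | y).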
This is very different from the "regression family" requirement, which is only that the variables are not highly "correlated". Even if the features are correlated, the worst that usually happens is that the regression model overfits or becomes harder to interpret. So with proper regularization you can still get good predictions.
I am sure this question is not in the brilliant category, but to learn machine learning I may have to start with a stupid question, so please bear with me.
I partially understand the terminology of regression.
Regression essentially describes the relationship between the dependent and independent variables.
If the dependent variable is continuous and you see a linear relationship between the dependent and independent variables, then linear regression is the way to go.
Now a slight change. If the dependent variable is binary (Y/N), i.e., the output follows a binomial distribution, then logistic regression is the way to go, which requires a non-linear relationship between the dependent and independent variables.
That is my understanding so far. Please correct me if I am wrong.
Now my question is about ordinal logistic regression.
I have started looking at the link below for reference
https://statistics.laerd.com/spss-tutorials/ordinal-regression-using-spss-statistics.php
It mentions that "It can be considered as either a generalisation of multiple linear regression or as a generalisation of binomial logistic regression".
Could someone help me understand this statement with examples?
Logistic regression can be considered an extension of linear regression. Instead of predicting a continuous variable, it predicts a discrete one by passing the linear predictor through an activation function (the logistic sigmoid), which turns it into a class probability. So you are asked to learn a discriminant function f that maps the inputs X to one of the labels {1, 2, ..., k}, where k is the number of classes in your problem. X can contain both continuous and discrete features. It does not matter, as long as you pre-process them appropriately (for example, by encoding categorical features numerically).
The base case for logistic regression is finding the decision boundary that separates two classes. To handle more classes, you need an extension. There are several: softmax (https://en.wikipedia.org/wiki/Softmax_function), one-vs-all (https://en.wikipedia.org/wiki/Multiclass_classification), etc.
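As a minimal sketch of the two activation functions mentioned above, using only the Python standard library. The scores are arbitrary illustration values, not output from a fitted model:

```python
import math

def sigmoid(z):
    """Binary case: map a real-valued score to P(class 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multiclass case: map k real-valued scores to k probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most probable class

# With two classes, softmax over [z, 0] reduces to the sigmoid of z.
two_class = softmax([1.5, 0.0])
```

The last line shows why softmax regression is usually described as the multiclass generalisation of binary logistic regression.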
Finally, to answer your question: ordinal logistic regression is an extension of logistic regression that takes the ordering of the outcome categories into account, as with the grades on a test. Take a look online for examples.
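The quoted statement is easiest to see in the proportional-odds (cumulative logit) form of ordinal logistic regression. That is one common formulation and is assumed here; the symbols θ_j (thresholds) and β (coefficients) are notation introduced for this sketch. For an outcome with ordered categories 1, ..., k:

```latex
\log\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \theta_j - \beta^{\top} x,
\qquad j = 1, \dots, k-1,
\qquad \theta_1 < \theta_2 < \cdots < \theta_{k-1}
```

With k = 2 there is a single threshold and the model reduces to binomial logistic regression. The right-hand side is the same kind of linear predictor used in multiple linear regression, which is the sense in which the model generalises both.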
I am participating in the Kaggle San Francisco Crime competition and I am currently trying a number of different classifiers to get benchmark performances. I am using the LogisticRegression classifier from sklearn without any parameter tuning, and I noticed from sklearn.metrics.classification_report that it is only predicting the predominant classes, i.e., the classes that have the highest number of occurrences in my training set.
Intuition tells me that this has to do with parameter tuning, but I am not sure which parameters I have to tweak to make the classifier more aware of the less predominant classes (LogisticRegression has quite a few). At the moment it is predicting only 3 classes out of 38 or something like that, so it definitely needs improvement.
Any ideas?
If your model predicts only the predominant classes, then you are facing a class imbalance problem. Here are some good reads on how to tackle this in machine learning.
Plain logistic regression is a binary classifier and relies on a one-vs-all or one-vs-one scheme for multiclass classification, which does not work well when there are many output classes (around 38 in your case). Try another classifier. For a start, use a softmax classifier, which is an extension of logistic regression with native support for multiclass classification. In scikit-learn, set the multi_class parameter to "multinomial" to use softmax regression (note that newer scikit-learn releases deprecate this parameter and apply the multinomial formulation by default for multiclass targets).
Another way to improve your model is to tune its parameters with a grid search (GridSearchCV in scikit-learn).
On a side note, I would recommend trying other models as well.
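For the class-imbalance point above, one common remedy is to weight classes inversely to their frequency. The sketch below computes such weights in plain Python with the heuristic n_samples / (n_classes * class_count), which is the formula scikit-learn documents for class_weight="balanced". The label list is made-up illustration data, not the Kaggle set, and the helper name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical, heavily imbalanced labels (illustration only).
labels = ["THEFT"] * 80 + ["ASSAULT"] * 15 + ["ARSON"] * 5
weights = balanced_class_weights(labels)
```

Rare classes receive larger weights, so their misclassifications cost more during fitting. In scikit-learn, passing class_weight="balanced" to LogisticRegression should apply this weighting for you.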
I am trying to perform logistic regression for my data. I came to know about glm. What is the actual difference between glm and regular logistic regression?
What are the pros and cons of it?
Logistic Regression is a special case of Generalized Linear Models. GLMs are a class of models parametrized by a link function (together with a distribution for the response). If you choose the logit link with a binomial response, you get Logistic Regression.
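To make the role of the link function concrete, here is a small sketch of three common inverse links applied to the same linear predictor. The coefficients and inputs are arbitrary illustration values, and the function names are hypothetical:

```python
import math

def linear_predictor(coefs, intercept, x):
    """eta = intercept + sum_i coefs[i] * x[i], shared by every GLM."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

# Inverse link functions map eta to the mean of the response.
inverse_links = {
    "identity": lambda eta: eta,                        # linear regression
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression
    "log": lambda eta: math.exp(eta),                   # Poisson regression
}

eta = linear_predictor(coefs=[0.8, -0.5], intercept=0.2, x=[1.0, 2.0])
means = {name: inv(eta) for name, inv in inverse_links.items()}
```

Only the entry chosen from inverse_links changes between the three models; the linear predictor is the same.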
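Because logistic regression is itself a GLM, neither has an inherent advantage over the other in avoiding overfitting. The practical benefit of the GLM framework is generality: the same linear predictor and fitting procedure cover other response types (counts, continuous outcomes) simply by changing the link function and response distribution. Overfitting means very good performance on training data and poor performance on test data.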
I have been trying to figure out the correlation between the error rate and the number of features in both of these models. I watched some videos, and the creator of one said that a simple model can be better than a complicated model. So I figured that the more features I had, the greater the error rate would be. This did not prove to be true in my work: when I had fewer features, the error rate went up. I'm not sure if I'm doing this incorrectly, or if the guy in the video made a mistake. Would someone care to explain? I am also curious how the number of features relates to Logistic Regression's error rate.
Naive Bayes and Logistic Regression are a "generative-discriminative pair," meaning they have the same model form (a linear classifier), but they estimate parameters in different ways.
For feature x and label y, naive Bayes estimates a joint probability p(x,y) = p(y)*p(x|y) from the training data (that is, it builds a model that could "generate" the data), and uses Bayes' rule to predict p(y|x) for new test instances. Logistic regression, on the other hand, estimates p(y|x) directly from the training data by minimizing an error function (which is more "discriminative").
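The generative route can be sketched in a few lines: take estimates of p(y) and p(x|y), multiply them, and normalise with Bayes' rule. The probabilities below are invented illustration values for a single binary feature:

```python
def posterior(prior, likelihood):
    """Bayes' rule: p(y|x) = p(y) * p(x|y) / sum over y' of p(y') * p(x|y')."""
    joint = {y: prior[y] * likelihood[y] for y in prior}  # p(x, y)
    evidence = sum(joint.values())                        # p(x)
    return {y: joint[y] / evidence for y in joint}

# Hypothetical estimates, as if derived from training counts.
prior = {"spam": 0.3, "ham": 0.7}        # p(y)
likelihood = {"spam": 0.8, "ham": 0.1}   # p(x = word present | y)

post = posterior(prior, likelihood)
```

Logistic regression skips the joint dictionary entirely and fits p(y|x) as a function of x.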
These differences have implications for error rate:
When there are very few training instances, logistic regression might "overfit," because there isn't enough data to estimate p(y|x) reliably. Naive Bayes might do better because it models the entire joint distribution.
When the feature set is large (and sparse, like word features in text classification) naive Bayes might "double count" features that are correlated with each other, because it assumes that each p(x|y) event is independent, when they are not. Logistic regression can do a better job by naturally "splitting the difference" among these correlated features.
If the features really are (mostly) conditionally independent, both models might actually improve with more and more features, provided there are enough data instances. The problem comes when the training set size is small relative to the number of features. Priors on naive Bayes feature parameters, or regularization methods (like L1/Lasso or L2/Ridge) on logistic regression can help in these cases.
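The "double counting" effect described above for correlated features is easy to reproduce. In this sketch a naive Bayes posterior is computed once with a single feature and once with that same feature duplicated (a perfectly correlated copy). The numbers are invented for illustration:

```python
def nb_posterior(prior, feature_likelihoods):
    """Naive Bayes: p(y|x) is proportional to p(y) * product of p(x_i|y)."""
    joint = {}
    for y, p_y in prior.items():
        score = p_y
        for lik in feature_likelihoods:
            score *= lik[y]  # independence assumption: just multiply
        joint[y] = score
    total = sum(joint.values())
    return {y: s / total for y, s in joint.items()}

prior = {"pos": 0.5, "neg": 0.5}
word = {"pos": 0.6, "neg": 0.4}  # hypothetical p(word present | y)

once = nb_posterior(prior, [word])
twice = nb_posterior(prior, [word, word])  # same evidence counted twice
```

The duplicated feature adds no new information, yet the posterior for "pos" moves from 0.6 to about 0.69, which is the overconfidence that logistic regression avoids by sharing weight between correlated features.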