Regression, classification on Machine Learning


I have a classification question and a regression question about machine learning.
First question: consider the following dataset:
http://it.tinypic.com/view.php?pic=oh3gj7&s=8#.VIjhRDGG_lF
Can we say that the dataset is linearly separable?
In order to apply a linear model for classification, is a transformation of the input space not needed for this dataset, or is it not possible for this dataset?
My answer to the first is no, but I am not sure about the second: I do not know whether a transformation is possible for this dataset.
Second question, about a regression problem:
Given the following dataset for f: R -> R
http://it.tinypic.com/view.php?pic=madsmr&s=8#.VIjhVjGG_lE
Can we say that:
A linear model for regression can be used to learn the function associated with this dataset?
Given this dataset, it is not possible to determine an optimal configuration of the linear model?
I am reading Tom Mitchell's Machine Learning and Bishop's Pattern Recognition and Machine Learning, but I still have trouble giving the right answer.
Thanks in advance.

Neither of these datasets can be modeled using linear classification/regression.
As for the "input data transformation": as long as the dataset is consistent (there are no two identical points with different labels), there always exists a transformation after which the data is linearly separable. In particular, one can construct it as:
phi(x) = 1 iff the label of x is "1"
In other words, you map all positive samples to "1" and all negatives to "0", so your data is now trivially linearly separable. Or simply map your N points to N unit vectors in R^N, so that the i-th point is mapped to [0 0 0 ... 1 ... 0 0 0]^T, with the "1" in the i-th position. Such a dataset is trivially linearly separable for any labeling.
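The unit-vector construction can be checked with a short NumPy sketch (the labels are made up for illustration). Mapping point i to the basis vector e_i and using the weight +1 for positives and -1 for negatives gives point i a score of exactly w[i], so any labeling is reproduced by a linear classifier with zero bias. Note that this mapping only covers the training points and says nothing about how new points would be classified.

```python
import numpy as np

def unit_vector_embedding(n_points):
    # Row i is the standard basis vector e_i in R^N: point i -> [0 ... 1 ... 0]^T
    return np.eye(n_points)

def separating_weights(labels):
    # Labels in {0, 1}: weight +1 for positives, -1 for negatives (bias 0)
    return 2 * np.asarray(labels) - 1

labels = np.array([1, 0, 0, 1, 1, 0])      # arbitrary example labeling
phi = unit_vector_embedding(len(labels))
w = separating_weights(labels)
scores = phi @ w                           # score of point i equals w[i]
predicted = (scores > 0).astype(int)
print(predicted.tolist())                  # [1, 0, 0, 1, 1, 0]
```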

Related

Inverse prediction in Machine Learning

I have a question on inverse prediction in machine learning/data science. Here is an example to illustrate it: I have 20 input features X = (x0, x1, ... x19) and 3 output variables Y = (y0, y1, y2). The amount of training/test data is usually small, e.g. <1000 items or even <100 in the training set.
In general, by using a machine learning toolbox (such as scikit-learn), I can train models (such as random forests, linear/polynomial regression and neural networks) from X --> Y. But what I actually want to know is, for example, how I should set X so that y1 falls in a specific range (for example y1 > 100).
Does anyone know how to solve this kind of "inverse prediction"? I have two approaches in mind:
1. Train the model in the normal way, X --> Y, then set up a dense mesh in the high-dimensional X space (20 dimensions in this example). Feed every point of this mesh into the trained model as input and select all the input points where the predicted y1 > 100. Finally, use some method, such as clustering, to look for patterns in the selected data points.
2. Directly learn models from Y to X. Then set up a dense mesh in the high-dimensional Y space, restricted to y1 > 100, and use the trained models to calculate the corresponding X data points.
The second method might be OK when Y is also high-dimensional. But usually, in my application, Y is very low-dimensional and X is very high-dimensional, which makes me think method 2 is not very practical.
Does anyone have any new thoughts? I think this should be fairly common in industry, and maybe some people have faced a similar situation before.
Thank you!
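A minimal sketch of the first approach described in the question, assuming scikit-learn and entirely synthetic data (the formula generating Y is hypothetical). Because a dense mesh is infeasible in 20 dimensions (even 10 values per axis would be 10^20 points), this variant swaps the mesh for candidate inputs sampled uniformly at random within the training ranges.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: 200 samples, 20 features, 3 outputs
X_train = rng.uniform(0, 1, size=(200, 20))
Y_train = np.column_stack([
    X_train[:, 0] * 50,                          # y0
    X_train[:, 1] * 150 + X_train[:, 2] * 20,    # y1 (the output we want > 100)
    X_train.sum(axis=1),                         # y2
])

# Forward model X --> Y (random forests handle multi-output targets natively)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, Y_train)

# Random candidates inside the training ranges instead of a dense 20-D mesh
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
candidates = rng.uniform(lo, hi, size=(20_000, 20))

Y_pred = model.predict(candidates)
selected = candidates[Y_pred[:, 1] > 100]        # inputs predicted to give y1 > 100
print(selected.shape)
```

The rows of `selected` can then be clustered or summarized (for example per-feature ranges) to look for patterns, as the question proposes.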

From what I understand of your needs, #1 is an excellent fit for this problem. I recommend that you use a simple binary SVM classifier to discriminate good/bad X vectors. SVMs work well in high-dimensional spaces, and reading out the coefficients (for a linear kernel) is easy in most SVM interfaces.
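A sketch of this suggestion under stated assumptions: synthetic inputs, a hypothetical rule for y1, and scikit-learn's `LinearSVC` standing in for "a simple binary SVM". Vectors are labeled "good" when y1 > 100, and the magnitudes of the learned coefficients indicate which features drive the decision.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic inputs: 300 samples, 20 features in [0, 1]
X = rng.uniform(0, 1, size=(300, 20))
# Hypothetical target: y1 depends mostly on feature 1, a little on feature 2
y1 = 150 * X[:, 1] + 20 * X[:, 2]
good = (y1 > 100).astype(int)              # 1 = "good" X vector

clf = LinearSVC(C=1.0, dual=False)         # primal solver: n_samples > n_features
clf.fit(X, good)

coef = clf.coef_.ravel()                   # one weight per input feature
top = np.argsort(-np.abs(coef))[:3]        # features with the largest |weight|
print(top)
```

Comparing raw coefficient magnitudes only makes sense here because all features share the same [0, 1] scale; with differently scaled features you would standardize them first.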

Similar note that may be useful:
For inverse (backward) prediction, once you have the weights and intercepts of a forward X ---> Y model, you can predict backward (Y ---> X) with accuracy similar to the forward prediction simply by solving the system of equations relating X and Y. This usually works best for linear problems of the form AX = B. Note that Python code for a learned inverse model can have considerable error, whereas solving the (n*n) system of equations is usually the better choice and gives suitable accuracy.
Regards
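A sketch of the "solve the system of equations" idea for the square (n x n) case, assuming a noise-free linear forward model with made-up coefficients: fit Y = X W + c by least squares, then recover x for a target y by solving W^T x = y - c with `np.linalg.solve`. With fewer outputs than inputs (as in the 20-input, 3-output example in the question) the system is underdetermined and has infinitely many solutions, so this applies directly only when the dimensions match.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A_true = rng.normal(size=(n, n))   # made-up forward weights
b_true = rng.normal(size=n)        # made-up intercepts

# Noise-free forward data: y = A x + b for each sample (rows of X and Y)
X = rng.normal(size=(100, n))
Y = X @ A_true.T + b_true

# Fit the forward model Y = X W + c by least squares with an intercept column
X1 = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
W, c = coef[:-1], coef[-1]         # W has shape (n, n), c has shape (n,)

# Inverse prediction: for a target y, solve the n x n system W^T x = y - c
y_target = np.array([1.0, -2.0, 0.5])
x_hat = np.linalg.solve(W.T, y_target - c)
print(x_hat, x_hat @ W + c)        # the second vector reproduces y_target
```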

Can a machine learning model provide information about mean and standard deviation of data on which it was trained?

Consider a parametric binary classifier (such as logistic regression, SVM, etc.) trained on a dataset (say, containing two features, e.g. blood pressure and cholesterol level). The dataset is thrown away and the trained model can only be used as a black box (no tweaks, and no inside information can be gathered from the trained model). We can only provide a set of data points and get their predicted labels.
Is it possible to get information about the mean and/or standard deviation and/or range of the features of the dataset on which this model was trained? If yes, how? If not, why not?
Thank you for your response! :)

SVM does not provide any information about the data statistics. It is a maximum-margin classifier: it finds the best separating hyperplane between the two classes in the feature space, as a linear combination of "support vectors". If you use kernel functions, this combination lives in the kernel-induced space, not even in the original feature space. SVM does not have a straightforward probabilistic interpretation whatsoever.
Logistic regression is a discriminative classifier and models the conditional probability p(y|x,w), where y is your label, x is your data and w are the model weights. After maximum likelihood training you are left with w, which again defines a discriminator (hyperplane) in the feature space, so you cannot recover the feature statistics from it.
The following can be considered instead: use a Gaussian (generative) classifier. Assume that the class is produced by the prior class probability p(y), and that a class-conditional density p(x|y,w) then produces your data. By Bayes' rule, p(y|x,w) = p(y) p(x|y,w) / p(x). If you define the class-conditional density p(x|y,w) as Gaussian, its parameter set w consists of the mean vector m and covariance matrix C of x, assuming x is produced by class y. But remember that these statistics are conditioned on a specific class. Conditioned only on w, a better option for the mean vector is E[x|w], the expectation of x with respect to p(x|w). It comes down to a weighted average of the mean vectors for classes y=0 and y=1, weighted by their prior class probabilities. The same should work for the covariance as well, but it needs to be derived properly; I am not 100% sure right now.
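The overall mean and covariance of such a class mixture can be written down explicitly. This sketch assumes Gaussian class-conditional densities whose parameters (priors, class means and covariances) are available, which is not the black-box setting of the question. The mean is the prior-weighted average of the class means, as stated above, and the covariance follows from the law of total covariance: the prior-weighted class covariances plus the spread of the class means around the overall mean. The blood-pressure/cholesterol numbers are invented for illustration.

```python
import numpy as np

def mixture_mean_cov(priors, means, covs):
    priors = np.asarray(priors, dtype=float)   # shape (K,): p(y = k)
    means = np.asarray(means, dtype=float)     # shape (K, d): class means m_k
    covs = np.asarray(covs, dtype=float)       # shape (K, d, d): class covariances C_k
    # E[x] = sum_k p(y=k) m_k
    mean = priors @ means
    # Law of total covariance: Cov(x) = E[Cov(x | y)] + Cov(E[x | y])
    within = np.einsum("k,kij->ij", priors, covs)
    diffs = means - mean
    between = np.einsum("k,ki,kj->ij", priors, diffs, diffs)
    return mean, within + between

# Invented parameters for (blood pressure, cholesterol) in classes y=0 and y=1
priors = [0.7, 0.3]
means = [[120.0, 180.0], [150.0, 240.0]]
covs = [np.diag([100.0, 400.0]), np.diag([144.0, 900.0])]

m, C = mixture_mean_cov(priors, means, covs)
print(m)   # prior-weighted average of the class means
print(C)   # within-class plus between-class covariance
```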

Libsvm: SVM normalizing starts from 0 or 0.001

I am using libsvm for my document classification.
I use only svm.h and svm.cc in my project.
Its struct svm_problem requires an array of svm_node entries holding only the non-zero values, i.e. a sparse representation.
I get a vector of tf-idf weights with values in, let's say, the range [5, 10]. If I normalize it to [0, 1], all the 5's become 0.
Should I remove these zeroes when sending the data to svm_train?
Wouldn't removing them reduce the information and lead to poor results?
Should I start the normalization from 0.001 rather than 0?
More generally, for an SVM, doesn't normalizing to [0, 1] reduce information?

An SVM is not Naive Bayes: feature values are not counts but coordinates in a multidimensional real-valued space, so 0's carry exactly the same amount of information as 1's (which also answers your concern about removing 0 values: don't discard them as if they were uninformative). Note that in libsvm's sparse format an index that is left out is read as 0, so omitting zero entries from the svm_node array does not change the input. There is no reason to ever normalize data to [0.001, 1] for an SVM.
The only issue here is that column-wise normalization is not a good idea for tf-idf, as it degenerates your features to plain tf: for a particular i-th dimension, tf-idf is simply the tf value multiplied by a constant idf, and normalization multiplies it by idf^-1, cancelling the idf weighting. I would consider one of the alternative preprocessing methods:
normalizing each dimension so it has mean 0 and variance 1
decorrelation (whitening) via x = C^-1/2 * x, where C is the data covariance matrix
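A NumPy sketch of these points on made-up data (the `tf` and `idf` values are hypothetical): column-wise min-max scaling gives identical results for tf-idf and plain tf, and the two alternatives (standardization and whitening with C^-1/2) are computed directly. Be aware that per-column standardization is also invariant to a constant per-column factor, so it removes the idf weighting as well; the sketch only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
tf = rng.uniform(0.0, 1.0, size=(50, 4))   # hypothetical term frequencies (50 docs, 4 terms)
idf = np.array([0.5, 1.2, 2.0, 3.3])       # hypothetical idf value per term
tfidf = tf * idf                           # column i is tf[:, i] times the constant idf[i]

def minmax_columns(M):
    # Scale every column to [0, 1] independently
    lo, hi = M.min(axis=0), M.max(axis=0)
    return (M - lo) / (hi - lo)

# Column-wise min-max scaling cancels the constant idf factor in each column
print(np.allclose(minmax_columns(tfidf), minmax_columns(tf)))   # True

# Alternative 1: standardize each dimension to mean 0 and variance 1
Z = (tfidf - tfidf.mean(axis=0)) / tfidf.std(axis=0)

# Alternative 2: decorrelate (whiten) centered data with x <- C^-1/2 x
centered = tfidf - tfidf.mean(axis=0)
C = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
C_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # symmetric C^-1/2
white = centered @ C_inv_sqrt              # row-vector form of C^-1/2 x
print(np.allclose(np.cov(white, rowvar=False), np.eye(4)))      # True
```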

What is the difference between linear regression and logistic regression? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
When we have to predict the value of a categorical (or discrete) outcome, we use logistic regression. I believe we also use linear regression to predict the value of an outcome given the input values.
Then, what is the difference between the two methodologies?

Linear regression output as probabilities
It's tempting to use the linear regression output as probabilities, but it's a mistake because the output can be negative or greater than 1, whereas a probability cannot. Because linear regression might actually produce "probabilities" less than 0 or even bigger than 1, logistic regression was introduced.
Source: http://gerardnico.com/wiki/data_mining/simple_logistic_regression
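A small scikit-learn sketch of this point on hypothetical 1-D data where the label is 1 for x > 5: a linear regression fitted to the 0/1 labels returns values below 0 and above 1 away from the training range, while logistic regression's `predict_proba` stays within [0, 1].

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical 1-D data: x = 0..10, label 1 when x > 5
x = np.arange(11, dtype=float).reshape(-1, 1)
y = (x.ravel() > 5).astype(int)

lin = LinearRegression().fit(x, y)
log_reg = LogisticRegression().fit(x, y)

x_new = np.array([[-5.0], [20.0]])
lin_out = lin.predict(x_new)                  # can be < 0 or > 1
log_out = log_reg.predict_proba(x_new)[:, 1]  # always within [0, 1]
print(lin_out, log_out)
```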
Outcome
In linear regression, the outcome (dependent variable) is continuous.
It can have any one of an infinite number of possible values.
In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
The dependent variable
Logistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.
Linear regression is used when your response variable is continuous. For instance, weight, height, number of hours, etc.
Equation
Linear regression gives an equation of the form Y = mX + C, i.e. an equation of degree 1.
However, logistic regression gives an equation of the form P(Y = 1) = 1 / (1 + e^-(mX + C))
Coefficient interpretation
In linear regression, the interpretation of the coefficients of the independent variables is quite straightforward (i.e. holding all other variables constant, with a unit increase in this variable, the dependent variable is expected to increase/decrease by xxx).
However, in generalized linear models the interpretation differs depending on the family (binomial, Poisson, etc.) and link (log, logit, inverse-log, etc.) you use. For logistic regression (binomial family, logit link), a unit increase in a variable multiplies the odds of the outcome by e^coefficient, holding all other variables constant.
Error minimization technique
Linear regression uses the ordinary least squares method to minimize the errors and arrive at the best possible fit, while logistic regression uses the maximum likelihood method to arrive at the solution.
Linear regression is usually solved by minimizing the least squares error of the model to the data, so large errors are penalized quadratically.
Logistic regression is just the opposite. With the logistic loss function, outputs far on the correct side of the decision boundary incur a loss that tends asymptotically to a constant (zero), and even outputs far on the wrong side are penalized only linearly.
Consider linear regression on categorical {0, 1} outcomes to see why this is a problem. If your model predicts the outcome is 38 when the truth is 1, you've lost nothing. Linear regression would try to reduce that 38; logistic wouldn't (as much).
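The arithmetic for this example, as a sketch: the squared error of a raw output of 38 against a label of 1, compared with the logistic loss log(1 + e^-z) for the same raw score z = 38, and for a score of -38 on the wrong side.

```python
import numpy as np

y_true = 1.0
score = 38.0          # raw model output for a {0, 1} target
wrong_score = -38.0   # a raw output just as far away, on the wrong side

# Least squares: the error grows quadratically with the distance from the label
squared_error = (score - y_true) ** 2                     # 37**2 = 1369

# Logistic loss for label 1 and raw score z: log(1 + e^-z)
logistic_loss = np.log1p(np.exp(-score))                  # about 3e-17: essentially nothing lost
logistic_loss_wrong = np.log1p(np.exp(-wrong_score))      # about 38: grows only linearly
print(squared_error, logistic_loss, logistic_loss_wrong)
```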

In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values.
For instance, if X contains the area in square feet of houses, and Y contains the corresponding sale prices of those houses, you could use linear regression to predict the selling price as a function of house size. While the selling price cannot actually take any value, there are so many possible values that a linear regression model would be chosen.
If, instead, you wanted to predict, based on size, whether a house would sell for more than $200K, you would use logistic regression. The possible outputs are either Yes, the house will sell for more than $200K, or No, the house will not.

Just to add to the previous answers.
Linear regression
It is meant to solve the problem of predicting/estimating the output value for a given element x (say f(x)). The result of the prediction is a continuous function whose values may be positive or negative. In this case you normally have an input dataset with lots of examples and the output value for each one of them. The goal is to fit a model to this dataset so that you can predict the output for new, never-seen elements. The classical example is fitting a line to a set of points, but in general linear regression can be used to fit more complex models (e.g. using higher-degree polynomial features).
Resolving the problem
Linear regression can be solved in two different ways:
Normal equation (direct way to solve the problem)
Gradient descent (Iterative approach)
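A sketch of both approaches with NumPy on synthetic data (the true coefficients 3, 2 and -1 are made up): the normal equation theta = (X^T X)^-1 X^T y, solved as a linear system rather than by forming the inverse, and batch gradient descent on half the mean squared error. Both should arrive at essentially the same theta.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: y = 3 + 2*x1 - 1*x2 + small noise
X = rng.normal(size=(200, 2))
y = 3.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)
Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x_0 = 1 for theta_0

# 1) Normal equation: theta = (X^T X)^-1 X^T y, solved as a linear system
theta_ne = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# 2) Batch gradient descent on J(theta) = (1/2n) * ||X theta - y||^2
theta_gd = np.zeros(Xb.shape[1])
learning_rate = 0.1
for _ in range(2000):
    grad = Xb.T @ (Xb @ theta_gd - y) / len(y)
    theta_gd -= learning_rate * grad

print(theta_ne, theta_gd)  # both close to [3, 2, -1]
```

The normal equation is exact but needs to solve a d x d system; gradient descent trades that for many cheap iterations, which scales better when the number of features is large.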
Logistic regression
It is meant to solve classification problems where, given an element, you have to classify it into one of N categories. Typical examples are classifying a given email as spam or not, or determining which category a given vehicle belongs to (car, truck, van, etc.). Basically, the output is a finite set of discrete values.
Resolving the problem
Logistic regression has no closed-form (normal equation) solution, so it is solved iteratively, e.g. with gradient descent (Newton's method is also common). The formulation is in general very similar to linear regression; the only difference is the use of a different hypothesis function. In linear regression the hypothesis has the form:
h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + ...
where theta is the model we are trying to fit and [1, x_1, x_2, ...] is the input vector. In logistic regression the hypothesis applies the sigmoid (logistic) function g to that same linear combination, h(x) = g(theta^T x), where:
g(z) = 1 / (1 + e^-z)
This function has a nice property: it maps any real value into the range (0, 1), which is appropriate for handling probabilities during classification. For example, in binary classification h(x) can be interpreted as the probability of belonging to the positive class. In this case you normally have different classes separated by a decision boundary, which is basically a curve that separates the different classes.
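A minimal sketch of logistic regression trained by batch gradient descent in NumPy, using the hypothesis h(x) = g(theta^T x) described above. The two Gaussian blobs, the learning rate and the iteration count are arbitrary choices for illustration; the decision boundary is where theta^T x = 0, i.e. where h(x) = 0.5.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^-z) maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical two-class data: two Gaussian blobs in 2-D
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=2.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x_0 = 1 for theta_0

theta = np.zeros(Xb.shape[1])
learning_rate = 0.1
for _ in range(5000):
    h = sigmoid(Xb @ theta)                 # hypothesis h(x) = g(theta^T x)
    grad = Xb.T @ (h - y) / len(y)          # gradient of the mean log loss
    theta -= learning_rate * grad

p = sigmoid(Xb @ theta)                     # estimated P(y = 1 | x)
pred = (p >= 0.5).astype(int)               # boundary: theta^T x = 0
accuracy = (pred == y).mean()
print(theta, accuracy)
```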
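You can also fit a linear regression model with statsmodels and inspect its residuals with code like the following (details_df and q3_rechange_delay_df are your own feature and target data frames):
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Columns "2" to "5" are excluded from the model
q_df = details_df.drop(columns=["2", "3", "4", "5"])
# One-hot encode the remaining categorical columns; drop_first avoids the
# dummy-variable trap with the constant added below, and dtype=float gives
# statsmodels numeric (not boolean) columns
q_df = pd.get_dummies(q_df, columns=["1", "6", "7", "8", "9"], drop_first=True, dtype=float)

x = sm.add_constant(q_df)  # add the intercept column
train_x, test_x, train_y, test_y = train_test_split(
    x, q3_rechange_delay_df, test_size=0.2, random_state=123)

lmod = sm.OLS(train_y, train_x).fit()
print(lmod.summary())
print(lmod.predict()[:10])  # fitted values for the first 10 training rows
print(lmod.get_prediction().summary_frame()[:10])  # with confidence intervals

# Q-Q plot of the residuals against a normal distribution
sm.qqplot(lmod.resid, line="q")
plt.title("Q-Q plot of residuals")
plt.show()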

Simply put, linear regression is a regression algorithm, which outputs a continuous value that can in principle be any real number; logistic regression is considered a binary classification algorithm, which outputs the 'probability' of the input belonging to a label (0 or 1).

The basic difference:
Linear regression is basically a regression model, which means it gives a non-discrete/continuous output of a function. So this approach gives the value. For example: given x, what is f(x)?
For example, given a training set of different factors and the price of a property, after training we can provide the required factors to determine what the property price will be.
Logistic regression is basically a binary classification algorithm, which means that here the function has a discrete-valued output. For example: for a given x, if f(x) > threshold classify it as 1, else classify it as 0.
For example, given a set of brain tumour sizes as training data, we can use the size as input to determine whether a tumour is benign or malignant. Therefore, here the output is discrete: either 0 or 1.
*Here the function is basically the hypothesis function.

They are both quite similar in solving for the solution, but as others have said, one (Logistic Regression) is for predicting a category "fit" (Y/N or 1/0), and the other (Linear Regression) is for predicting a value.
So if you want to predict whether you have cancer, Y/N (or a probability), use logistic regression. If you want to know how many years you will live, use linear regression!

Regression means a continuous variable; linear means there is a linear relation between y and x.
Example: you are trying to predict salary from the number of years of experience. So here salary is the dependent variable (y) and years of experience is the independent variable (x).
y = b0 + b1*x1
We are trying to find the optimum values of the constants b0 and b1, which give us the best-fitting line for the observed data.
It is the equation of a line, which gives a continuous value from x = 0 up to very large values.
This line is called the linear regression model.
Logistic regression is a type of classification technique. Don't be misled by the term regression. Here we predict whether y = 0 or 1.
Here we first need to find p(y=1), the probability of y = 1 given x, from the formula:
p = 1 / (1 + e^-(b0 + b1*x1))
The probability p is related to the linear predictor through the log-odds (logit): log(p / (1 - p)) = b0 + b1*x1
Example: we can classify a tumour with more than 50% chance of being cancerous as 1, and a tumour with less than 50% chance as 0.
Here red point will be predicted as 0 whereas green point will be predicted as 1.

I couldn't agree more with the above comments.
Beyond that, there are some more differences:
In linear regression, the residuals are assumed to be normally distributed.
In logistic regression, the observations need to be independent, but the residuals are not assumed to be normally distributed.
Linear regression assumes that a constant change in the value of an explanatory variable results in a constant change in the response variable.
This assumption does not hold if the value of the response variable represents a probability (as in logistic regression).
A GLM (generalized linear model) does not assume a linear relationship between the dependent and independent variables. However, it does assume a linear relationship between the link function of the response's mean and the independent variables; in the logit model, the log-odds are linear in the predictors.

| Basis | Linear | Logistic |
|-----------------------------------------------------------------|--------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| Basic | The data is modelled using a straight line. | The log-odds of the event are modelled as a linear function of a combination of predictor variables. |
| Linear relationship between dependent and independent variables | Is required | Not required |
| Independent variables | Could be correlated with each other (especially in multiple linear regression). | Should not be correlated with each other (no multicollinearity). |

In short:
Linear regression gives continuous output, i.e. any value within a range of values.
Logistic regression gives discrete output, i.e. Yes/No, 0/1 kinds of outputs.

To put it simply, if more training examples arrive that lie far away from the threshold (say 0.5) used to predict y=1 versus y=0, the linear regression hypothesis changes and becomes worse. Therefore, a linear regression model is not used for classification problems.
Another problem is that if the classes are y=0 and y=1, h(x) can be > 1 or < 0. So we use logistic regression, where 0 <= h(x) <= 1.
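A sketch of the first problem with scikit-learn on hypothetical 1-D data (label 1 for x > 5): fit a linear regression to the 0/1 labels, find where its output crosses 0.5, then add a single positive example far to the right and refit. Because the least-squares line passes through the point (mean of x, mean of y), and the mean of y becomes exactly 0.5 with 6 positives out of 12 examples, the threshold moves to the mean of x (about 12.9), so the original positives at x = 6..10 are now classified as 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def threshold_x(model, level=0.5):
    # x at which the fitted line h(x) = b + m*x crosses the given level
    return (level - model.intercept_) / model.coef_[0]

# Hypothetical 1-D data: x = 0..10, label 1 when x > 5
x = np.arange(11, dtype=float).reshape(-1, 1)
y = (x.ravel() > 5).astype(float)
before = LinearRegression().fit(x, y)

# Add one positive example far away from the threshold and refit
x_more = np.vstack([x, [[100.0]]])
y_more = np.append(y, 1.0)
after = LinearRegression().fit(x_more, y_more)

thr_before = threshold_x(before)
thr_after = threshold_x(after)
print(thr_before, thr_after)
```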

Logistic regression is used for predicting categorical outputs like Yes/No, Low/Medium/High, etc. There are basically 2 types of logistic regression: binary logistic regression (Yes/No, Approved/Disapproved) and multi-class logistic regression (Low/Medium/High, digits from 0-9, etc.)
On the other hand, linear regression is used if your dependent variable (y) is continuous.
y = mx + c is a simple linear regression equation (m is the slope and c is the y-intercept). Multiple linear regression has more than 1 independent variable (x1, x2, x3, etc.)

In linear regression the outcome is continuous whereas in logistic regression, the outcome has only a limited number of possible values(discrete).
Example:
In a scenario where the given value of x is the size of a plot in square feet, predicting y, i.e. the rate of the plot, comes under linear regression.
If, instead, you wanted to predict, based on size, whether the plot would sell for more than 300000 Rs, you would use logistic regression. The possible outputs are either Yes, the plot will sell for more than 300000 Rs, or No.

In linear regression the outcome is continuous, while in logistic regression the outcome is discrete (not continuous).
To perform linear regression we require a linear relationship between the dependent and independent variables. But to perform logistic regression we do not require a linear relationship between the dependent and independent variables.
Linear regression is about fitting a straight line to the data, while logistic regression is about fitting a curve to the data.
Linear regression is a regression algorithm for machine learning, while logistic regression is a classification algorithm for machine learning.
Linear regression assumes a Gaussian (normal) distribution of the dependent variable given the inputs (i.e. normally distributed errors). Logistic regression assumes a binomial (Bernoulli) distribution of the dependent variable.

The basic difference between linear regression and logistic regression is:
Linear regression is used to predict a continuous or numerical value, but when we want to predict a value that is categorical, logistic regression comes into the picture.
Logistic regression is used for binary classification.

How to purposely overfit Weka tree classifiers?

I have a binary class dataset (0 / 1) with a large skew towards the "0" class (about 30000 vs 1500). There are 7 features for each instance, no missing values.
When I use the J48 or any other tree classifier, I get almost all of the "1" instances misclassified as "0".
Setting the classifier to "unpruned", setting minimum number of instances per leaf to 1, setting confidence factor to 1, adding a dummy attribute with instance ID number - all of this didn't help.
I just can't create a model that overfits my data!
I've also tried almost all of the other classifiers Weka provides, but got similar results.
Using IB1 gets 100% accuracy (evaluating on the training set itself), so it's not a problem of multiple instances with the same feature values and different classes.
How can I create a completely unpruned tree?
Or otherwise force Weka to overfit my data?
Thanks.
Update: Okay, this is absurd. I've used only about 3100 negative and 1200 positive examples, and this is the tree I got (unpruned!):
J48 unpruned tree
------------------
F <= 0.90747: 1 (201.0/54.0)
F > 0.90747: 0 (4153.0/1062.0)
Needless to say, IB1 still gives 100% precision.
Update 2: Don't know how I missed it - unpruned SimpleCart works and gives 100% accuracy train on train; pruned SimpleCart is not as biased as J48 and has a decent false positive and negative ratio.

Weka contains two meta-classifiers of interest:
weka.classifiers.meta.CostSensitiveClassifier
weka.classifiers.meta.MetaCost
They allow you to make any algorithm cost-sensitive (not restricted to SVM) and to specify a cost matrix (the penalty for the various errors); you would give a higher penalty for misclassifying 1 instances as 0 than for erroneously classifying 0 instances as 1.
As a result, the algorithm would then try to:
minimize the expected misclassification cost (rather than predict the most likely class)
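The decision rule behind "minimize the expected misclassification cost" can be sketched in a few lines of NumPy. This illustrates the principle only, not Weka's implementation; the cost values and probabilities are hypothetical, and the convention here is cost[i, j] = cost of predicting class j when the true class is i.

```python
import numpy as np

# cost[i, j] = cost of predicting class j when the true class is i (hypothetical values)
cost = np.array([[0.0, 1.0],     # true 0: predicting 1 costs 1
                 [20.0, 0.0]])   # true 1: predicting 0 costs 20

def min_expected_cost_class(probs, cost):
    # probs[s, i] = estimated P(true class = i | x_s)
    expected = probs @ cost      # expected[s, j] = sum_i P(i | x_s) * cost[i, j]
    return expected.argmin(axis=1)

probs = np.array([[0.95, 0.05],
                  [0.90, 0.10],
                  [0.99, 0.01]])
print(probs.argmax(axis=1))                   # most likely class: [0 0 0]
print(min_expected_cost_class(probs, cost))   # cheapest class:    [1 1 0]
```

Even though class 0 is the most likely class for every row, a 20:1 cost ratio makes predicting 1 cheaper whenever P(1 | x) exceeds 1/21.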

The quick and dirty solution is to resample. Throw away all but 1500 of your negative ("0") examples and train on a balanced dataset. I am pretty sure there is a resample component in Weka to do this.
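A minimal NumPy sketch of this random undersampling, assuming binary labels and a synthetic dataset shaped like the one in the question (30000 negatives, 1500 positives, 7 features); it is not Weka's resampling component, and `undersample_majority` is a hypothetical helper name.

```python
import numpy as np

def undersample_majority(X, y, majority_label=0, seed=0):
    # Hypothetical helper: keep all minority rows plus an equal-sized random
    # subset (without replacement) of the majority rows. Assumes binary labels.
    rng = np.random.default_rng(seed)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    idx = np.concatenate([kept_majority, minority_idx])
    rng.shuffle(idx)  # mix the classes back together
    return X[idx], y[idx]

# Synthetic data shaped like the question: 30000 "0" rows, 1500 "1" rows, 7 features
rng = np.random.default_rng(1)
X = rng.normal(size=(31500, 7))
y = np.array([0] * 30000 + [1] * 1500)

X_bal, y_bal = undersample_majority(X, y)
print(np.bincount(y_bal))  # [1500 1500]
```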
The other solution is to use a classifier with a variable cost for each class. I'm pretty sure libSVM allows you to do this and I know Weka can wrap libSVM. However I haven't used Weka in a while so I can't be of much practical help here.
