Beginning to code logistic regression in Java - machine-learning

I want to code the logistic regression (classification) algorithm in Java.
The hypothesis is h(x) = 1 / (1 + e^(-ΘTx)).
Can anyone please tell me what ΘT (θ to the power T) in the exponent is?
I was able to code linear regression, whose hypothesis h(x) = ΘTx is relatively easy, but I cannot get started with logistic regression.

ΘT is the transpose of the parameter vector Θ, and ΘTx is the linear combination of the input features. If you know linear regression, you can think of ΘTx as the output of linear regression. Look at the figure below.
The first part is the linear regression. The output of the linear regression is ΘTx. Since logistic regression is not a regression but a classification problem, your output shouldn't be continuous. Instead you require a binary output for any input. For this you need a function that maps the linear output to a value between 0 and 1, so that you can apply a threshold to the output to get the classification. A suitable function for this is the sigmoid function, as you mentioned.
Regarding your question, the output of linear regression can be written as θ0x0 + θ1x1 + ... + θnxn, with x0 = 1.
The term ΘTx is the vectorized form of this sum, so ΘT is nothing but the transpose of the parameter vector: multiplying the row vector ΘT = [θ0 θ1 ... θn] by the column vector x = [x0 x1 ... xn]T gives exactly that sum.
For details on logistic regression and coding, check this link.
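The hypothesis described above can be sketched as follows. The sketch is in Python for brevity (the same loops port directly to Java arrays); the function names `sigmoid`, `hypothesis`, and `predict`, the convention that x[0] is 1.0, and the 0.5 threshold are illustrative assumptions, not part of the original answer.

```python
import math

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    # Two branches so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """Logistic regression hypothesis: sigmoid(theta^T x).

    theta and x are equal-length lists; x[0] is assumed to be 1.0 (bias term).
    """
    z = sum(t * xi for t, xi in zip(theta, x))  # theta^T x, the linear-regression output
    return sigmoid(z)

def predict(theta, x, threshold=0.5):
    """Turn the probability into a binary class label by thresholding (assumed 0.5)."""
    return 1 if hypothesis(theta, x) >= threshold else 0
```

The only difference from linear regression is the final `sigmoid` call: `z` on its own is the linear-regression prediction.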

ΘT represents the transpose of the theta vector, where theta is the vector of parameters (one weight per feature). When writing code for these algorithms, I strongly advise you to first use MATLAB or Octave to work through the matrix calculations. Then, when you are sure that your algorithm is working correctly, implement it in Java.
Cheers,
Emil

Related

What is the difference between LinearRegression and SGDRegressor?

I understand that both the LinearRegression class and the SGDRegressor class from scikit-learn perform linear regression. However, only SGDRegressor uses gradient descent as the optimization algorithm.
Then what is the optimization algorithm used by LinearRegression, and what are the other significant differences between these two classes?
LinearRegression always uses the least-squares as a loss function.
For SGDRegressor you can specify a loss function and it uses Stochastic Gradient Descent (SGD) to fit. For SGD you run the training set one data point at a time and update the parameters according to the error gradient.
In simple words: you can train SGDRegressor on a training dataset that does not fit into RAM. Also, you can update the SGDRegressor model with a new batch of data without retraining on the whole dataset.
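The one-point-at-a-time update described above can be sketched in plain Python. This is not scikit-learn's implementation; the function name `sgd_fit`, the squared-error loss, the fixed learning rate, and the optional starting weights are assumptions made for illustration.

```python
def sgd_fit(samples, n_features, learning_rate=0.05, epochs=2000,
            weights=None, bias=0.0):
    """Fit y ~ bias + w . x by stochastic gradient descent on squared error.

    samples is a list of (x, y) pairs, where x is a list of n_features floats.
    Passing weights/bias from an earlier call continues training on new data.
    """
    weights = [0.0] * n_features if weights is None else list(weights)
    for _ in range(epochs):
        for x, y in samples:
            prediction = bias + sum(w * xi for w, xi in zip(weights, x))
            error = prediction - y
            # Gradient of 0.5 * error**2 with respect to each parameter.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical noise-free data generated from y = 2x + 1.
data = [([float(x)], 2.0 * x + 1.0) for x in range(5)]
fitted_weights, fitted_bias = sgd_fit(data, n_features=1)
```

Because each update touches a single sample, the loop could equally read samples from disk in chunks. In scikit-learn, incremental updates of this kind are exposed through `SGDRegressor.partial_fit`.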
To understand the algorithm used by LinearRegression, we must have in mind that there is (in favorable cases) an analytical solution (with a formula) to find the coefficients which minimize the least squares:
theta = (X'X)^(-1)X'Y (1)
where X' is the transpose of X.
In the case of non-invertibility, the inverse can be replaced by the Moore-Penrose pseudo-inverse, calculated using singular value decomposition (SVD). And even in the case of invertibility, the SVD method is numerically more stable than applying formula (1) directly.
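To make formula (1) concrete, here is a minimal sketch for the special case of one feature plus an intercept, where X'X is a 2x2 matrix that can be inverted by hand. The function name `normal_equation_1d` is hypothetical, and the sketch does not implement the SVD or pseudo-inverse route mentioned above.

```python
def normal_equation_1d(xs, ys):
    """Solve theta = (X'X)^(-1) X'Y where X has the columns [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X'X = [[n, sx], [sx, sxx]] and X'Y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        # Non-invertible case: a pseudo-inverse would be needed here.
        raise ValueError("X'X is singular")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

intercept, slope = normal_equation_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

For the sample data, which lies exactly on the line y = 2x + 1, this recovers an intercept of 1 and a slope of 2 with no iteration and no learning rate.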
PS - No LaTeX (MathJaX) in Stackoverflow ???
--
Pierre (from France)

Why should we use Lasso over Linear regression for feature selection in machine learning?

While selecting features in machine learning, one can use Lasso regression to find the least important features by looking for the smallest coefficients, but we can do the same using linear regression.
Linear regression:
Y = x0 + x1b1 + x2b2 + ... + xnbn
Here x1, x2, ..., xn are the coefficients. Using gradient descent we get the best coefficients, and we can remove the features that have the smallest coefficients. Now, when this is possible using linear regression, why should one use Lasso regression?
Am I missing something? Please help.
Lasso is a regularization technique for avoiding overfitting when you train your model. When you do not use any regularization technique, your loss function just tries to minimize the difference between the predicted value and the real value, min |y_pred - y|.
To minimize this loss function, gradient descent changes the coefficients of your model. This step may cause your model to overfit, because the optimization only wants to minimize the difference between the prediction and the real value. To solve this issue, regularization techniques add a penalty term to the loss function: the size of the coefficients (for Lasso, the sum of their absolute values). In this way, when your model tries to minimize the difference between the predicted and real values, it also tries not to increase the coefficients too much.
As you mentioned, you can select features in both ways; however, the Lasso technique also takes care of the overfitting problem.
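As a rough sketch of the penalized loss described above, the function below adds an L1 penalty to a mean squared error term. The name `lasso_loss`, the 1/(2n) scaling, and the penalty strength `alpha` are assumptions chosen for illustration; other scalings are also in use.

```python
def lasso_loss(weights, bias, samples, alpha):
    """Squared-error loss plus an L1 penalty on the coefficients.

    samples is a list of (x, y) pairs; alpha controls the penalty strength.
    """
    n = len(samples)
    squared_error = sum(
        (bias + sum(w * xi for w, xi in zip(weights, x)) - y) ** 2
        for x, y in samples
    )
    l1_penalty = alpha * sum(abs(w) for w in weights)
    return squared_error / (2 * n) + l1_penalty

# Hypothetical data that the weights [1.0, -2.0] fit perfectly.
points = [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0)]
unpenalized = lasso_loss([1.0, -2.0], 0.0, points, alpha=0.0)
penalized = lasso_loss([1.0, -2.0], 0.0, points, alpha=0.1)
```

With `alpha=0.0` the function reduces to the unregularized loss. With `alpha=0.1` the same perfect fit is charged for the size of its coefficients, which is the pressure that keeps them small.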

SVM vector of weights

I have a classification task, and I use the svm_perf application.
Having trained the model, I wonder whether it is possible to get the weights of the features.
There is an -a parameter which outputs the alphas. Honestly, I don't recall alphas in SVM; I think the weights are always w.
If you are implementing a linear SVM, there is a Python script based on the model file output by svm_learn and svm_perf_learn. To be more specific, the weight vector is just w = SUM_i (y_i*alpha_i*sv_i), where sv_i is the i-th support vector and y_i is the class label of the corresponding training sample.
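The sum above can be sketched directly. The function name `linear_svm_weights` is hypothetical, and the sketch assumes the support vectors, labels, and alphas have already been parsed into Python lists; it does not read the svm_perf model file format.

```python
def linear_svm_weights(support_vectors, labels, alphas):
    """Compute w = SUM_i (y_i * alpha_i * sv_i) for a linear SVM."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        for j, value in enumerate(sv):
            w[j] += y * alpha * value
    return w

# Hypothetical values, not taken from any real model file.
w = linear_svm_weights(
    support_vectors=[[1.0, 2.0], [3.0, 4.0]],
    labels=[1, -1],
    alphas=[0.5, 0.25],
)
```

Each entry of `w` is then the weight of the corresponding input feature.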
If you are using a non-linear SVM, I don't think the weight coefficients are directly related to the input space. Yet you can get the decision function:
f(x) = sgn( SUM_i (alpha_i*y_i*K(sv_i,x)) + b );
where K is your kernel function.
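A minimal sketch of this decision function follows. The RBF kernel, its `gamma` value, and the function names are assumptions for illustration; any kernel function K with the same signature could be passed in instead.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the choice of kernel and gamma is an assumption."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """f(x) = sgn(SUM_i (alpha_i * y_i * K(sv_i, x)) + b)."""
    score = b + sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1

# Hypothetical support vectors: one per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [1, -1]
alphas = [1.0, 1.0]
```

Here the prediction depends on the kernel similarity between `x` and each support vector, so there is no single per-feature weight to read off.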

Which Regression methods are suitable for binary valued features and continuous output?

I want to build a machine learning model for regression on a continuous output given binary-valued features (0, 1). The dimension of my problem is around 200.
Which of the following methods seems suitable for this kind of problem?
SVR with different Kernels
Regression random forest
MARS
Gradient boosting with regression tree
Kernel regression (Nadaraya-Watson kernel regression)
LSR and LARS
Stochastic gradient boosting
Intuitively speaking, anything requiring the calculation of a gradient is going to struggle on binary values. From your list, SVR and Forests would be the first place I'd look for a benchmark solution.
You can also look at expectation maximization for Bernoulli mixture models.
It deals with binary input sets. You can find the theory in the book:
Christopher M. Bishop. "Pattern Recognition and Machine Learning".

What's the meaning of logistic regression dataset labels?

I've been learning logistic regression for some days, and I think the labels of a logistic regression dataset need to be 1 or 0. Is that right?
But when I look at the libSVM library's regression datasets, I see that the label values are continuous numbers (e.g. 1.0086, 1.0089, ...). Did I miss something?
Note that the libSVM library can be used for regression problems.
Thanks so much!
Contrary to its name, logistic regression is a classification algorithm and it outputs class probability conditioned on the data point. Therefore the training set labels need to be either 0 or 1. For the dataset you mentioned, logistic regression is not a suitable algorithm.
SVM is a classification algorithm and it uses the input labels -1 or 1. It is not a probabilistic algorithm and it doesn't output class probabilities. It also can be adapted to regression.
Are you using a 3rd party library or programming this yourself? Generally the labels are used as ground truth so you can see how effective your approach was.
For example, if your algorithm is trying to predict the class of a particular instance, it might output -1 while the ground truth label is +1, which means you did not successfully classify that particular instance.
Note that "regression" is a general term. To say someone will perform regression analysis doesn't necessarily tell you what algorithm they will be using, nor all of the nature of the data sets. All it really tells you is that you have a set of samples with features which you want to use to predict a single outcome value (a model for conditional probability).
One major difference between logistic regression and linear regression is that the former is usually trained on categorical, binary-labeled sample sets; while the latter is trained on real-labeled (ℝ) sample sets.
Any time your labels are real valued, it means you're probably going to use linear regression or similar, or else convert those real valued labels to categorical labels (e.g. via thresholds or bins) if you want to in fact use logistic regression. There is potentially a big difference in the quality and interpretation of your results though, if you try to convert from one such problem setup to another.
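The conversion from real-valued to categorical labels mentioned above can be sketched as follows. The function names, the threshold of 1.0, and the bin edges are hypothetical choices; the right values depend entirely on the data.

```python
import bisect

def binarize_labels(values, threshold):
    """Map real-valued labels to 0/1 labels using a single threshold."""
    return [1 if v > threshold else 0 for v in values]

def bin_labels(values, edges):
    """Map real-valued labels to bin indices; edges must be sorted ascending."""
    return [bisect.bisect_right(edges, v) for v in values]

binary = binarize_labels([1.0086, 1.0089, 0.99], threshold=1.0)
binned = bin_labels([0.2, 1.5, 3.7], edges=[1.0, 3.0])
```

Only the first form gives labels usable for binary logistic regression; the binned form gives a multi-class problem. In both cases information is discarded, which is why the results can differ so much from a regression on the original labels.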
See also Regression Analysis.

Resources