How to convert a regression model into classification model - machine-learning

I have a regression model but I want to convert it into a classification model. Let's say between 0 to 4 "Bad" and 5 "Avg." and between 6 to 10 "Good". But I am not sure, how to do it? Can someone please help me out?

Basically what you mean is that you have ordinal classes that you want to change into categorical classes. Its a bit tricky but you can do this using two different ways:
In your dataframe, change the labels to Bad, Average and Good using your conditions, use a LabelEncoder to encode them and train the network using softmax activation in the final layer.
In case you do not want to train your network again, in the predict script, add an if-elif-else condition to set the output.

Related

Can logistic and lineair regression produce a prediction on a scale?

I currently have a dataset of drawings, each drawing being represented by some features. Each feature (independent variable) is a continuous number. None of the drawings have a label as of yet, which is why I am planning to start a sort of questionaire with people. However, before I can correctly setup such questionaire, I should have an idea of what kind of labels I should use for my training data.
At first thought, I was thinking about letting people rate the drawings on a scale, for example from 1 to 5 with 1 being bad, 3 being average and 5 being good. Alternatively, I could also reduce the question to a simple good or bad question. The latter would mean I lose some valuable information, but the dependent variable could then be considered 'binary'.
Using the training data I then composed, I would need to have a machine learning algorithm (model) which given a drawing, predicts if the drawing is good or not. Ideally, I would have some way of tuning the strictness in this prediction. For example, the model could instead of simply predicting 'good' or 'bad', predict the likelyhood of a painting being good on a scale of 0 to 1. I could then say "Well, let's say all paintings which are 70% likely to be good, are considered as good". Another example would be that the model predicts the goodness using the same categorical values the people used to rate the drawing initially. So it would either predict the drawing being a 1, 2, 3, 4 or 5. Similar to my first example, I could then say "Well, all paintings which are rated at least a 4, are considered good paintings" and tune this threshhold to my liking.
After doing some research, I came up with logistic and linear regression being good candidates. However, if which of the two would be the best for my scenario? Equally important, how would I need to format my labels? Just simple 0's and 1's or a scale?
You could use a 1 vs all representation if you wanted to use a multi-class categorical classification:
Essentially, you train 1 classifier for every category you have (you have 10 categories, so you have 10 classifiers) and then each classifier is just trained to predict whether or not the category belongs to each specific class.
There are alternative ways to make multi-class logistic regression work that only require training a single model, such as by using categorical cross entropy, but given that you'd like to use ordinal data, a linear regression used as a regression model is likely more ideal. You'd predict a value between 1 and 10 and then just round to the nearest integer. This way you aren't penalizing close guesses as much as far away guesses.
what keeps you from using a logistic regression model. Due to a lack of better dataset I used the standard diabetes data. The target variable is an integer between 50 and 200. I normalised the data between [-1,1] such that I can use sigmoid as activation function. For the loss I decided to use
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import MaxPooling2D, Input, Convolution2D
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
x_train=diabetes.data
y_train=2*(diabetes.target-min(diabetes.target))/(max(diabetes.target)-min(diabetes.target))-1
inputs = tf.keras.Input(shape=(x_train.shape[1],))
outputs = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(), # Optimizer
loss=tf.keras.losses.MSE,
metrics=['sparse_categorical_accuracy'])
history = model.fit(x_train, y_train,
batch_size=64,
epochs=300,
validation_data=(x_train, y_train))
You could also use a linear regression model. There you only need to replace the activation function by linear. However I think the squashing character, besides ensuring hat there is no rating larger 1 or smaller -1.
A last alternative would be to train pair-wise preference. The idea is to show the human two drawings and ask which one he likes more. Then build a binary model, e.g., logistic regression. This approach appears preferable to me as it is easier to answer for the human

Is it possible to train a binary classification neural network by only feeding it with input from only one class?

I'm basically trying to create a neural network that should tell me whether an input I'm giving it is valid or not. The problem is that I only have valid input with which I can train it.
Right now I am trying to come up with a working dense model that validates only mnist digits between 0 and 4. All other digits should be seen as invalid. First attempt was to train it with digits between 0 and 4 as valid and images with random pixels as invalid (with the same percent of black pixels as a normal image) but unfortunately it doesn't work. When I test it with digits between 5 and 9, they are seen as valid.
So I'm starting to think if it's even possible to train a neural network this way.
Also I realize there might be better ways to do this, maybe with an autoencoder or a different kind of network but right now I want to try this with only dense layers.
Thank you.
What you are looking for is one-class classification, also known as unary classification or class-modelling.
Quick google search suggests to train an autoencoder and define an object as in your class if the reconstruction error is below a specific threshold.
But if you start building up something like that i would suggest you to use something like One-Class K-Nearest Neighbor or One-Class SVM first to see if you get acceptable results. If so you can improve your results with the "extremly more complicated to develop"- solution using autoencoders

Machine learning multi-classification: Why use 'one-hot' encoding instead of a number

I'm currently working on a classification problem with tensorflow, and i'm new to the world of machine learning, but I don't get something.
I have successfully tried to train models that output the y tensor like this:
y = [0,0,1,0]
But I can't understand the principal behind it...
Why not just train the same model to output classes such as y = 3 or y = 4
This seems much more flexible, because I can imagine having a multi-classification problem with 2 million possible classes, and it would be much more efficient to output a number between 0-2,000,000 than to output a tensor of 2,000,000 items for every result.
What am I missing?
Ideally, you could train you model to classify input instances and producing a single output. Something like
y=1 means input=dog, y=2 means input=airplane. An approach like that, however, brings a lot of problems:
How do I interpret the output y=1.5?
Why I'm trying the regress a number like I'm working with continuous data while I'm, in reality, working with discrete data?
In fact, what are you doing is treating a multi-class classification problem like a regression problem.
This is locally wrong (unless you're doing binary classification, in that case, a positive and a negative output are everything you need).
To avoid these (and other) issues, we use a final layer of neurons and we associate an high-activation to the right class.
The one-hot encoding represents the fact that you want to force your network to have a single high-activation output when a certain input is present.
This, every input=dog will have 1, 0, 0 as output and so on.
In this way, you're correctly treating a discrete classification problem, producing a discrete output and well interpretable (in fact you'll always extract the output neuron with the highest activation using tf.argmax, even though your network hasn't learned to produce the perfect one-hot encoding you'll be able to extract without doubt the most likely correct output )
The answer is in how that final tensor, or single value, are calculated. In an NN, your y=3 would be build by a weighted sum over the values of the previous layer.
Trying to train towards single values would then imply a linear relationship between the category IDs where none exists: For the true value y=4, the output y=3 would be considered better than y=1 even though the categories are random, and may be 1: dogs, 3: cars, 4: cats
Neural networks use gradient descent to optimize a loss function. In turn, this loss function needs to be differentiable.
A discrete output would be (indeed is) a perfectly valid and valuable output for a classification network. Problem is, we don't know how to optimize this net efficiently.
Instead, we rely on a continuous loss function. This loss function is usually based on something that is more or less related to the probability of each label -- and for this, you need a network output that has one value per label.
Typically, the output that you describe is then deduced from this soft, continuous output by taking the argmax of these pseudo-probabilities.

Different weights for different classes in neural networks and how to use them after learning

I trained a neural network using the Backpropagation algorithm. I ran the network 30 times manually, each time changing the inputs and the desired output. The outcome is that of a traditional classifier.
I tried it out with 3 different classifications. Since I ran the network 30 times with 10 inputs for each class I ended up with 3 distinct weights but the same classification had very similar weights with a very small amount of error. The network has therefore proven itself to have learned successfully.
My question is, now that the learning is complete and I have 3 distinct type of weights (1 for each classification), how could I use these in a regular feed forward network so it can classify the input automatically. I searched around to check if you can somewhat average out the weights but it looks like this is not possible. Some people mentioned bootstrapping the data:
Have I done something wrong during the backpropagation learning process? Or is there an extra step which needs to be done post the learning process with these different weights for different classes?
One way how I am imaging this is by implementing a regular feed forward network which will have all of these 3 types of weights. There will be 3 outputs and for any given input, one of the output neurons will fire which will result that the given input is mapped to that particular class.
The network architecture is as follows:
3 inputs, 2 hidden neurons, 1 output neuron
Thanks in advance
It does not make sense if you only train one class in your neural network each time, since the hidden layer can make weight combinations to 'learn' which class the input data may belong to. Learn separately will make the weights independent. The network won't know which learned weight to use if a new test input is given.
Use a vector as the output to represent the three different classes, and train the data altogether.
EDIT
P.S, I don't think the link post you provide is relevant with your case. The question in that post arises from different weights initialization (randomly) in neural network training. Sometimes people apply some seed methods to make the weight learning reproducible to avoid such a problem.
In addition to response by nikie, another possibility is to represent output as one (unique) output unit with continuous values. For example, ann classify for first class if output is in the [0, 1) interval, for second if is in the [1, 2) interval and third classes in [2, 3). This architecture is declared in letterature (and verified in my experience) to be less efficient that discrete represetnation with 3 neurons.

percentage in classify of learning algorithem

I'm using weka, I have a training set, and the classify of the examples in the training set is boolean.
After I have the training set, I want to predict the percentage of new input to be true or false. I want to get a number between 0-1, and not only o or 1.
How can I do that, I have seen that in the prediection there are only the possibels classifes.
Thanks in advance.
You can only make the same kind of prediction with the learned classifier -- it learns to make the predictions you train it to make. The kind of prediction you want sounds more like regression. That is, you're don't want a strict classification, but a continuous value designating the membership probability.
The easiest way to achieve what you want is to replace the Booleans in your training set with 0/1 values and learn a regression model. This will give you numbers, although not necessarily only between 0 and 1.
To get real probabilities, you would need to use a classifier that calculates probabilities (such as Naive Bayes) and write some custom code (using the Weka library) to retrieve them. See the javadoc of the method that gives you access to the class probabilities.

Resources