Predicting inputs in neural network - machine-learning

Is it possible to predict the inputs of a Keras neural network for a particular output?
For example, I have a dataset with 28 inputs and 3 outputs. I have trained a model in Keras, and it works fine. Now I want to enter particular values for the outputs and predict what the inputs would be for that particular output.

I'm not 100% sure I understand the question correctly, but if you already have a model that takes inputs and predicts outputs and you want to go the other way, you will need to train a second model that predicts inputs from outputs. Swap the roles so that the outputs become the model's inputs and the original inputs become its targets. Although this might be annoying, you might have to build a separate network to predict each of your input variables.
To get around this problem, you can consider autoencoders if you're okay with getting a close approximation of the input. An autoencoder is an unsupervised artificial neural network that learns how to efficiently compress and encode data, and then learns how to reconstruct the data from the reduced encoded representation into a representation that is as close to the original input as possible (you can read more here: https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726).
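A minimal sketch of the swap-the-roles idea. It assumes NumPy and uses a linear least-squares fit as a stand-in for the second Keras network. The data is random and purely illustrative; only the shapes (28 inputs, 3 outputs) come from the question, and `fit_linear`/`predict_linear` are hypothetical helpers. Because 3 numbers cannot pin down 28, the inverse model can only return one plausible input for a given output, not a unique one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5000 samples, 28 input features, 3 outputs.
X = rng.normal(size=(5000, 28))
true_W = rng.normal(size=(28, 3))
Y = X @ true_W + 0.1 * rng.normal(size=(5000, 3))

def fit_linear(inputs, targets):
    """Least-squares fit of targets ~ [inputs, 1] @ W (stand-in for a network)."""
    A = np.hstack([inputs, np.ones((inputs.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return W

def predict_linear(W, inputs):
    A = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    return A @ W

# Forward model: inputs -> outputs (what the question already has).
forward_W = fit_linear(X, Y)
# Inverse model: swap the roles, so the outputs are features and the inputs are targets.
inverse_W = fit_linear(Y, X)

desired_output = np.array([[0.5, -1.0, 2.0]])
guessed_input = predict_linear(inverse_W, desired_output)
print(guessed_input.shape)  # (1, 28)
```

With a real network the equivalent step would be fitting a second Keras model with the 3 output columns passed as `x` and the 28 input columns passed as `y`.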

Yes, it is definitely possible to predict inputs from the output. In fact, what you're describing is essentially an autoencoder.
Let's say you have a NN trained on MNIST. If you then use the outputs of the classification layer to train the decoder of an autoencoder, you will get a rough indication of the input.
However, this is not the best way to do it. The best way is to treat the latent space as the "output" and feed this output into:
a) a one-layer classifier, which gives you the predicted output, and
b) the decoder.
This gives you both the predicted output and a reconstruction of the original image.
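The shared-latent layout described above can be sketched as a forward pass. This is a minimal NumPy illustration with randomly initialised, untrained weights; the sizes (784 pixels, a 32-dimensional latent space, 10 classes) and all names are assumptions, chosen only to show how one latent vector feeds both heads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: flattened 28x28 MNIST images, a 32-d latent space, 10 classes.
N_PIXELS, N_LATENT, N_CLASSES = 784, 32, 10

# Random (untrained) weights, only to show the data flow.
W_enc = rng.normal(scale=0.05, size=(N_PIXELS, N_LATENT))
W_cls = rng.normal(scale=0.05, size=(N_LATENT, N_CLASSES))
W_dec = rng.normal(scale=0.05, size=(N_LATENT, N_PIXELS))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images):
    latent = np.tanh(images @ W_enc)                      # shared latent "output"
    class_probs = softmax(latent @ W_cls)                 # a) one-layer classifier head
    reconstruction = 1 / (1 + np.exp(-(latent @ W_dec)))  # b) decoder head, pixels in (0, 1)
    return class_probs, reconstruction

batch = rng.random((5, N_PIXELS))  # stand-in for 5 MNIST images
probs, recon = forward(batch)
print(probs.shape, recon.shape)  # (5, 10) (5, 784)
```

One common way to train such a model is to minimise the sum of a classification loss on `probs` and a reconstruction loss on `recon`, so the latent space has to serve both heads.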

Related

How to classify images with Variational Autoencoder

I have trained a variational autoencoder on both labeled images (1,200) and unlabeled images (4,000), and I have saved the two models separately (vae_fake_img and vae_real_img). Now I am wondering what to do next. I know variational autoencoders are not meant for classification, but using them for feature extraction seems worth a try. Here are my attempts:
I labeled my unlabeled data using k-means clustering on the latent space of the labeled images.
My supervisor suggested training the VAE on the unlabeled images, then visualizing the latent space with t-SNE, then applying k-means clustering, and finally using an MLP for the final prediction.
I want to train a conditional VAE to create more labeled samples, retrain the VAE, and feed the (64,64,3) reconstruction output into the last three fully connected (FC) layers of the VGG16 architecture for final classification, as done in this paper: Encoder as feature extraction paper.
I have tried so many methods for my thesis, and I really need to achieve high accuracy if I want to get a job at my current internship, so any suggestion or guidance is highly appreciated. I've read many autoencoder papers, but the architecture for classification is not fully explained (or I'm not understanding it properly). I believe the latent space of the encoder holds more useful information for multiclass classification than the decoder's reconstruction, but I want to know which part of the autoencoder gives better feature extraction for a final classification.
With autoencoders you don't need labels to reconstruct the input data, so I think these approaches might give slight improvements:
Use a VAE (variational autoencoder) instead of a plain AE.
Use a conditional VAE (CVAE), combine all the data, and train the network on all of it.
Treat the batch (labeled vs. unlabeled data) as the condition, and use the one-hot encoding of each sample's batch as its condition.
Inject the condition into both the encoder and the decoder.
Then the latent space won't have any batch effect, and you can use KNN to give each unlabeled sample the label of its nearest labeled neighbors.
Alternatively, you can train a simple MLP to classify every sample in your latent space (in this approach, train the MLP only on labeled data and then apply it to the unlabeled data).
Don't forget batch normalization and dropout layers.
P.S. The most meaningful layer of an AE is the latent space.
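Two pieces of the recipe above can be sketched in NumPy: appending the one-hot batch condition to a CVAE input, and KNN label transfer in the latent space. This is a minimal sketch, assuming the latent vectors have already been produced by the trained encoder (here they are synthetic 2-d points), with batch id 0 for labeled and 1 for unlabeled data. `add_condition` and `knn_labels` are hypothetical helpers, and the KNN is written from scratch rather than taken from a library.

```python
import numpy as np

def add_condition(x, batch_ids, n_batches):
    """Concatenate a one-hot batch condition to each row (CVAE-style input)."""
    one_hot = np.eye(n_batches)[batch_ids]
    return np.concatenate([x, one_hot], axis=1)

def knn_labels(z_labeled, y_labeled, z_unlabeled, k=5):
    """Give each unlabeled latent vector the majority label of its k nearest labeled ones."""
    # Pairwise squared Euclidean distances, shape (n_unlabeled, n_labeled).
    d2 = ((z_unlabeled[:, None, :] - z_labeled[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_labeled[nearest]  # shape (n_unlabeled, k)
    return np.array([np.bincount(row).argmax() for row in votes])

# Synthetic latent space: three labeled clusters and three unlabeled points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
y_lab = np.repeat([0, 1, 2], 20)
z_lab = centers[y_lab] + rng.normal(scale=0.5, size=(60, 2))
z_unlab = np.array([[0.2, -0.1], [4.8, 5.3], [0.1, 4.9]])
print(knn_labels(z_lab, y_lab, z_unlab, k=5))  # [0 1 2]
```

The MLP alternative would replace `knn_labels` with a classifier fitted on `(z_lab, y_lab)` only and then applied to `z_unlab`.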

Using probabilities from a model as features to another

I'd like to use the probability output of one model as features for another model.
For instance, to determine what kind of bird is in a picture, I want to train a CNN and then feed its probability outputs, together with other data such as the bird's size and weight, into an SVM.
Do I need separate training and testing sets to extract these probabilities with the CNN? Should I divide my dataset into folds and extract the probabilities for each testing fold, or can I just train and test on all my data and save the probabilities?
A test set is intended to validate that your classifier reaches its goals, or alternatively to set hyperparameters. In this case, you're not interested in the output of the CNN on its own, as it's just an intermediate layer in the bigger picture.
Having said that, you're apparently not back-propagating SVM errors through its inputs. That's the consequence of a two-stage model: if you did, you'd be optimizing the CNN for use as input to that particular SVM.
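The fold-based option from the question (usually called out-of-fold predictions when stacking models) can be sketched as follows, so that each sample's probabilities come from a model that never saw that sample. This is a minimal NumPy sketch: a nearest-centroid classifier with a softmax over negative distances stands in for the CNN, the features and the size/weight columns are synthetic, all function names are hypothetical, and the SVM itself is left out.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Stand-in for training the CNN: one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Stand-in for the CNN's softmax: probabilities from negative squared distances."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def out_of_fold_probabilities(X, y, n_classes, n_folds=5, seed=0):
    """Each row is predicted by a model trained on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), n_classes))
    for test_idx in folds:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        model = fit_centroids(X[train_mask], y[train_mask], n_classes)
        oof[test_idx] = predict_proba(model, X[test_idx])
    return oof

rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 8)) + y[:, None]  # hypothetical image features
extra = rng.normal(size=(90, 2))           # hypothetical size and weight
oof = out_of_fold_probabilities(X, y, n_classes=3)
svm_features = np.hstack([oof, extra])     # input for the second-stage model
print(svm_features.shape)  # (90, 5)
```

If the probabilities were instead produced by a model trained on all the data, they would tend to be more confident on the training samples than on new ones, and the second-stage model would learn from that optimistic signal.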

Neural Networks normalizing output data

I have training data for a NN along with expected outputs. Each input is a 10-dimensional vector and has 1 expected output. I have normalised the training data using Gaussian (z-score) normalisation, but I don't know how to normalise the outputs since they have only a single dimension. Any ideas?
Example:
Raw input vector: -128.91, 71.076, -100.75, 4.2475, -98.811, 77.219, 4.4096, -15.382, -6.1477, -361.18
Normalised input vector: -0.6049, 1.0412, -0.3731, 0.4912, -0.3571, 1.0918, 0.4925, 0.3296, 0.4056, -2.5168
The raw expected output for the above input is 1183.6, but I don't know how to normalise it. Should I normalise the expected output as part of the input vector?
From the looks of your problem, you are trying to implement some sort of regression algorithm. For regression problems you don't normally normalize the outputs. The expected outputs you provide to a regression system for training should be in the range you expect, or simply be whatever data you have for the expected outputs.
Therefore, you can normalize the training inputs to make training go faster, but you typically don't normalize the target outputs. When it comes to testing or providing new inputs, make sure you normalize the data in the same way as during training. Specifically, apply exactly the same normalization parameters that you computed during training to any test inputs into the network.
One important remark is that you normalized the elements of a single input vector. With a one-dimensional output space, you could not normalize the output that way.
The correct way is, indeed, to take a complete batch of training data, say N input (and output) vectors, and normalize each dimension (variable) individually across the N samples. Thus, for a one-dimensional output, you will have N samples for normalization. This way, the vector space of your input will not be distorted.
Normalizing the output dimension is usually required when the scales of the output variables differ significantly. After training, you should use the same set of normalization parameters (e.g., for z-score, the "mean" and "std") that you obtained from the training data. This puts new (unseen) data into the same scale space as the training data.
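The per-dimension normalization described in the last answer can be sketched in NumPy: statistics are computed column by column over the N training samples, reused unchanged for new inputs, and inverted to bring network predictions back to the original output units. The data and helper names are hypothetical, and the zero-filled `pred_n` is only a placeholder for real network predictions.

```python
import numpy as np

def fit_zscore(data):
    """Per-column mean and std computed over the N training samples (rows)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return mean, std

def apply_zscore(data, mean, std):
    return (data - mean) / std

def invert_zscore(scaled, mean, std):
    return scaled * std + mean

rng = np.random.default_rng(0)
X_train = rng.normal(loc=-50, scale=120, size=(200, 10))  # hypothetical 10-d inputs
y_train = rng.normal(loc=1000, scale=300, size=(200, 1))  # hypothetical 1-d targets

x_mean, x_std = fit_zscore(X_train)
y_mean, y_std = fit_zscore(y_train)  # the single output column has N samples too
X_train_n = apply_zscore(X_train, x_mean, x_std)
y_train_n = apply_zscore(y_train, y_mean, y_std)

# ... train the network on (X_train_n, y_train_n) ...

X_new = rng.normal(loc=-50, scale=120, size=(3, 10))
X_new_n = apply_zscore(X_new, x_mean, x_std)  # reuse the training statistics
pred_n = np.zeros((3, 1))                     # placeholder for the network's predictions
pred = invert_zscore(pred_n, y_mean, y_std)   # back to the original output units
```

Whether to normalize the target at all is the judgment call the two answers disagree on. If you do, keep `y_mean` and `y_std` so predictions can be converted back.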

How to use stacked autoencoders for pretraining

Let's say I wish to use stacked autoencoders as a pretraining step.
Let's say my full autoencoder is 40-30-10-30-40.
My steps are:
1. Train a 40-30-40 autoencoder using the original 40-feature data set as both input and output.
2. Using only the trained encoder part of the above, i.e. the 40-30 encoder, derive a new 30-feature representation of the original 40 features.
3. Train a 30-10-30 autoencoder using the new 30-feature data set (derived in step 2) as both input and output.
4. Take the trained 40-30 encoder from step 1 and feed it into the 30-10 encoder from step 3, giving a 40-30-10 encoder.
5. Take the 40-30-10 encoder from step 4 and use it as the input to the NN.
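The five steps listed in the question can be sketched with a small NumPy autoencoder. This is a minimal sketch, assuming a tanh encoder, a linear decoder, and full-batch gradient descent on the mean squared reconstruction error; the synthetic data, learning rate, and epoch count are arbitrary choices, and `train_autoencoder`/`encode` are hypothetical helpers rather than a library API.

```python
import numpy as np

def train_autoencoder(X, n_hidden, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder X -> X (tanh encoder, linear decoder)
    with full-batch gradient descent on the mean squared reconstruction error.
    Returns the encoder parameters and the per-epoch loss history."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = rng.normal(scale=0.1, size=(d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(scale=0.1, size=(n_hidden, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder
        err = H @ W2 + b2 - X             # linear decoder output minus target
        losses.append(np.mean(err ** 2))
        dR = 2 * err / err.size           # gradient of the mean squared error
        dH = (dR @ W2.T) * (1 - H ** 2)   # backprop through tanh (before W2 changes)
        W2 -= lr * (H.T @ dR); b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    return (W1, b1), losses               # the decoder is discarded

def encode(X, enc):
    W, b = enc
    return np.tanh(X @ W + b)

# Hypothetical 40-feature data with low-dimensional structure, standardised.
rng = np.random.default_rng(42)
Z = rng.normal(size=(500, 5))
X = Z @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(500, 40))
X = (X - X.mean(axis=0)) / X.std(axis=0)

enc1, losses1 = train_autoencoder(X, 30)   # step 1: 40-30-40
H1 = encode(X, enc1)                       # step 2: 30-feature representation
enc2, losses2 = train_autoencoder(H1, 10)  # step 3: 30-10-30

def stacked_encoder(data):                 # step 4: 40-30-10 encoder
    return encode(encode(data, enc1), enc2)

features = stacked_encoder(X)              # step 5: input to the supervised NN
print(features.shape)  # (500, 10)
```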
a) Is that correct?
b) Do I freeze the weights in the 40-30-10 encoder when training the NN? That would be the same as pregenerating the 10-feature representation from the original 40-feature data set and training on the new 10-feature data set.
PS. I already have a question out asking whether I need to tie the weights of the encoder and decoder.
a) Is that correct?
This is one of the typical approaches. You could also try to fit the autoencoder directly, as a "raw" autoencoder with that many layers should be possible to fit right away. As an alternative, you might consider fitting stacked denoising autoencoders instead, which might benefit more from "stacked" training.
b) Do I freeze the weights in the 40-30-10 encoder when training the NN? That would be the same as pregenerating the 10-feature representation from the original 40-feature data set and training on the new 10-feature data set.
When you train the whole NN, you do not freeze anything. Pretraining is only a kind of preconditioning for the optimization process: you show your method where to start, but you do not want to limit the fitting procedure of the actual supervised learning.
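The difference between fine-tuning and freezing can be sketched as a single supervised gradient step. This is a minimal NumPy sketch, assuming one tanh layer as a stand-in for the pretrained encoder (random here), a linear regression head, and a mean squared error loss. `supervised_step` and the `freeze_encoder` flag are hypothetical names. In Keras the frozen variant is typically obtained by setting `trainable = False` on the encoder layers before compiling.

```python
import numpy as np

def supervised_step(X, y, enc, head, lr=0.1, freeze_encoder=False):
    """One gradient step on the mean squared error of a tanh encoder + linear head.
    With freeze_encoder=True only the head is updated (the frozen variant)."""
    W_e, b_e = enc
    w_h, b_h = head
    H = np.tanh(X @ W_e + b_e)       # encoder features
    pred = H @ w_h + b_h             # linear regression head
    d_pred = 2 * (pred - y) / len(y) # dMSE/dpred
    new_head = (w_h - lr * (H.T @ d_pred), b_h - lr * d_pred.sum())
    if freeze_encoder:
        return enc, new_head         # encoder weights left untouched
    dZ = np.outer(d_pred, w_h) * (1 - H ** 2)  # backprop through tanh
    new_enc = (W_e - lr * (X.T @ dZ), b_e - lr * dZ.sum(axis=0))
    return new_enc, new_head

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = rng.normal(size=100)
enc = (rng.normal(scale=0.1, size=(40, 10)), np.zeros(10))  # stand-in for a pretrained encoder
head = (rng.normal(scale=0.1, size=10), 0.0)

frozen_enc, _ = supervised_step(X, y, enc, head, freeze_encoder=True)
tuned_enc, _ = supervised_step(X, y, enc, head, freeze_encoder=False)
print(frozen_enc[0] is enc[0], np.allclose(tuned_enc[0], enc[0]))  # True False
```

Repeating the unfrozen step is the fine-tuning the answer recommends. The frozen variant is equivalent to training only the head on precomputed encoder features.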
PS. I already have a question out asking whether I need to tie the weights of the encoder and decoder.
No, you do not have to tie the weights, especially since you throw away your decoder anyway. Tying the weights is important for some more probabilistic models in order to make the minimization procedure possible (as in the case of RBMs), but for an autoencoder there is no point.
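To make "tying the weights" concrete, the sketch below shows a 40-30-40 autoencoder whose decoder reuses the transposed encoder matrix, and counts the parameters with and without tying. It is a minimal NumPy illustration with random weights; `reconstruct_tied` and `n_params` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(40, 30))  # encoder weights, 40 -> 30
b_enc, b_dec = np.zeros(30), np.zeros(40)

def reconstruct_tied(X):
    H = np.tanh(X @ W + b_enc)
    return H @ W.T + b_dec  # decoder reuses the transposed encoder matrix

def n_params(tied, d=40, h=30):
    enc = d * h + h                    # encoder weights + biases
    dec = d if tied else h * d + d     # tied: only the decoder biases are new
    return enc + dec

print(n_params(tied=False), n_params(tied=True))  # 2470 1270
```

Since the decoder is discarded after pretraining, the extra 1,200 untied decoder weights cost nothing in the final network.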

labelling of dataset in machine learning

I have a question about some basic concepts of machine learning. The examples I have seen give only a brief overview. To train the system, a feature vector is given as input. In supervised learning, the dataset is labelled, and I am confused about labelling. For example, if I have to distinguish between two types of pictures, I will provide a feature vector, and on the output side I'll provide 1 for type A and 2 for type B. But what if I want to extract a region of interest (ROI) from a dataset of images? How will I label my data to extract the ROI using an SVM? I hope I am able to convey my confusion. Thanks in anticipation.
In supervised learning, for example with SVMs, the dataset should be composed as follows:
<i-th feature vector><i-th label>
where i goes from 1 to the number of patterns (also called examples or observations) in your training set. Each such pair represents a single record in your training set that can be used to train the SVM classifier.
So you basically have a set composed of such tuples, and if you have just 2 labels (a binary classification problem), you can easily use an SVM. The SVM model is trained on the training set and the training labels, and once the training phase has finished, you can use another set (called the validation set or test set), structured in the same way as the training set, to test the accuracy of your SVM.
In other words, the SVM workflow should be structured as follows:
train the SVM using the training set and the training labels
predict the labels for the validation set using the model trained in the previous step
if you know the actual validation labels, compare the predicted labels with the actual labels and check how many have been correctly predicted. The ratio between the number of correctly predicted labels and the total number of labels in the validation set is a scalar in [0, 1], called the accuracy of your SVM model.
if you're interested in the ROI, you might want to check the trained SVM parameters (mainly the weights and bias) to reconstruct the separating hyperplane
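The workflow above can be sketched end to end. This is a minimal NumPy sketch, not a library SVM: a linear soft-margin SVM is trained from scratch with full-batch subgradient descent on the hinge loss, the 2-d feature vectors are synthetic, and the question's labels 1 (type A) and 2 (type B) are mapped to the -1/+1 convention. `train_linear_svm`, `predict`, and `make_set` are hypothetical helpers.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.1):
    """Subgradient descent on 0.5*||w||^2 + C * mean(hinge loss).
    y must contain -1/+1 labels. Returns the weights w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -C * y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

rng = np.random.default_rng(0)

def make_set(n):
    labels = rng.integers(1, 3, size=n)  # 1 = type A, 2 = type B (as in the question)
    centers = np.where(labels[:, None] == 1, -2.0, 2.0)
    X = centers + rng.normal(scale=0.5, size=(n, 2))  # hypothetical feature vectors
    y = np.where(labels == 1, -1, 1)     # SVM convention: -1 / +1
    return X, y

X_train, y_train = make_set(200)
X_val, y_val = make_set(100)
w, b = train_linear_svm(X_train, y_train)  # step 1: train
y_pred = predict(X_val, w, b)              # step 2: predict the validation labels
accuracy = np.mean(y_pred == y_val)        # step 3: correct / total, in [0, 1]
# Step 4: w and b define the separating hyperplane w . x + b = 0.
print(accuracy, w, b)
```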
It is also important that the training set records be correctly labelled a priori: if the training labels are not correct, the SVM will never be able to correctly predict the output for previously unseen patterns. You do not have to label your data according to the ROI you want to extract; the data must simply be correctly labelled a priori. The SVM will be given the entire set of type A pictures and the set of type B pictures and will learn the decision boundary that separates pictures of type A from pictures of type B. Do not trick the labels: if you do, you're not doing classification, machine learning, or pattern recognition. You're basically tricking the results.

Resources