Altrnative of "Weka: training and test set are not compatible"? - machine-learning

"Weka: training and test set are not compatible" can be solved using batch filtering but at the time of training a model I don't have test.arff. My problem caused in the command "stringToWord vector" (on CLI).
So my question is, can Caret package(R) or Scikit learn (Python) provides any alternative for this one.
Note:
1. Functionality provided by "stringToWord vector" is a must requirement.
2. I don't want to retrain my model while testing because it takes lot of time.

Given the requirements you mentioned, you can use Weka's Filtered Classifier option during training and testing. I am not re-iterating what I have recorded as a video cast here and here.
But the basic idea is not to use the StringToWord vector as a direct filter rather to use it as a filtering option in the FilteredClassifier option. The model you generate will be just once. And then you can apply the model directly on your unlabelled data without retraining them or without applying StringToWord vector again on the unlabelled data. FilteredClassifier will take care of these concerns for you.

Related

What is the use of train_on_batch() in keras?

How train_on_batch() is different from fit()? What are the cases when we should use train_on_batch()?
For this question, it's a simple answer from the primary author:
With fit_generator, you can use a generator for the validation data as
well. In general, I would recommend using fit_generator, but using
train_on_batch works fine too. These methods only exist for the sake of
convenience in different use cases, there is no "correct" method.
train_on_batch allows you to expressly update weights based on a collection of samples you provide, without regard to any fixed batch size. You would use this in cases when that is what you want: to train on an explicit collection of samples. You could use that approach to maintain your own iteration over multiple batches of a traditional training set but allowing fit or fit_generator to iterate batches for you is likely simpler.
One case when it might be nice to use train_on_batch is for updating a pre-trained model on a single new batch of samples. Suppose you've already trained and deployed a model, and sometime later you've received a new set of training samples previously never used. You could use train_on_batch to directly update the existing model only on those samples. Other methods can do this too, but it is rather explicit to use train_on_batch for this case.
Apart from special cases like this (either where you have some pedagogical reason to maintain your own cursor across different training batches, or else for some type of semi-online training update on a special batch), it is probably better to just always use fit (for data that fits in memory) or fit_generator (for streaming batches of data as a generator).
train_on_batch() gives you greater control of the state of the LSTM, for example, when using a stateful LSTM and controlling calls to model.reset_states() is needed. You may have multi-series data and need to reset the state after each series, which you can do with train_on_batch(), but if you used .fit() then the network would be trained on all the series of data without resetting the state. There's no right or wrong, it depends on what data you're using, and how you want the network to behave.
Train_on_batch will also see a performance increase over fit and fit generator if youre using large datasets and don't have easily serializable data (like high rank numpy arrays), to write to tfrecords.
In this case you can save the arrays as numpy files and load up smaller subsets of them (traina.npy, trainb.npy etc) in memory, when the whole set won't fit in memory. You can then use tf.data.Dataset.from_tensor_slices and then using train_on_batch with your subdataset, then loading up another dataset and calling train on batch again, etc, now you've trained on your entire set and can control exactly how much and what of your dataset trains your model. You can then define your own epochs, batch sizes, etc with simple loops and functions to grab from your dataset.
Indeed #nbro answer helps, just to add few more scenarios, lets say you are training some seq to seq model or a large network with one or more encoders. We can create custom training loops using train_on_batch and use a part of our data to validate on the encoder directly without using callbacks. Writing callbacks for a complex validation process could be difficult. There are several cases where we wish to train on batch.
Regards,
Karthick
From Keras - Model training APIs:
fit: Trains the model for a fixed number of epochs (iterations on a dataset).
train_on_batch: Runs a single gradient update on a single batch of data.
We can use it in GAN when we update the discriminator and generator using a batch of our training data set at a time. I saw Jason Brownlee used train_on_batch in on his tutorials (How to Develop a 1D Generative Adversarial Network From Scratch in Keras)
Tip for quick search: Type Control+F and type in the search box the term that you want to search (train_on_batch, for example).

Do generative adversarial networks require class labels?

I am trying to understand how a GAN is trained. I believe understand the Adversarial training process. What I can't seem to find information on is this: do GANs use class labels in the training process? My current understanding says no - because the discriminator is simply trying to discriminate between real or fake images, while the generator is trying to create real image (but not images of any specific class.)
If this is the case, then how do researchers propose to use the discriminator network for classification tasks? the network would only be able to perform two way classification between real or fake images. The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It completely depends on the network you are trying to build. If you are talking specifically about the basic GAN, then you are correct. Class labels are not needed as the discriminator network is only classifying real/fake images. There is a conditional variant of the GAN (cGAN) where you do make use of the class labels in both the generator and the discriminator. This allows you to produce examples for a specific class with the generator and classify them with the discriminator (along with the real/fake classification)
From the reading that I have done, the discriminator network is just used as a tool for training the generator, and the generator is the main network of concern. Why would you use the discriminator that you used to train the GAN for classification when you could just use a ResNet or VGG net for your classification tasks. These networks would work better anyway. You are right however that using the original GAN could cause difficulty because of the mode collapse and constantly producing the same image. That is why the conditional variant was introduced.
Hope this clears things up!
Do GANs use class labels in the training process?
The author suspected GANs doesn't require labels. This is correct. The discriminator is trained to classify real and fake images. Since we know which images are real and which are generated by the generator, we do not need labels to train the discriminator. The generator is trained to fool the discriminator, which also doesn't require labels.
This is one of the most attractive benefits of GANs [1]. Usually, we refer to methods that do not require labels as unsupervised learning. That said, if we had labels, maybe we could train a GAN that uses the labels to improve performance. This idea underlies the follow-up work by [2] who introduced the conditional GAN.
If this is the case, then how do researchers propose to use the discriminator network for classification tasks?
There seems to be a misunderstanding here. The purpose of the discriminator is NOT to act as a classifier on real data. The purpose of the discriminator is to "tell the generator how to improve its fakes". This is done by using the discriminator as a loss function, which we can backpropagate gradients through if it is a neural network. After training, we usually discard the discriminator.
The generator network would also be difficult to use, seeing as we don't know what setting of the input vector 'Z' will result in the required generated image.
It seems the underlying reason for posting the question lies here. The input vector 'Z' is chosen such that it follows some distribution, typically a normal distribution. But then what happens if we take 'Z', a random vector with normally distributed entries, and computes 'G(Z)'? We get a new vector which follows a very complicated distribution that depends on G. The entire idea of GANs is to change G such that this new complicated distribution is close to the distribution of our data. This idea is formalized with f-Divergences in [3].
[1] https://arxiv.org/abs/1406.2661
[2] https://arxiv.org/abs/1411.1784
[3] https://arxiv.org/abs/1606.00709

Do I still need to load word2vec model at model testing?

This may sound like a naive question, but i am quite new on this. Let's say I use the Google pre-trained word2vector model (https://github.com/dav/word2vec) to train a classification model. I save my classification model. Now I load back the classification model into memory for testing new instances. Do I need to load the Google word2vector model again? Or is it only used for training my model?
It depends on how your corpuses and test examples are structured and pre-processed.
You are probably using the pre-trained word-vectors to turn text into numerical features. At first, text examples are vectorized to train the classifier. Later, other (test/production) text examples will be vectorized in the same, and presented to get the classifier to get its judgements.
So you will need to use the same text-to-vectors process for test/production text examples as was used during training. Perhaps you've done that in a separate earlier bulk step, in which case you already have the features in the vector form the classifier uses. But often your classifier pipeline will itself take raw text, and vectorize it – in which case it will need the same pre-trained (word)->(vector) mappings available at test time as were available during training.

How to normalize test data in the same way as training data?

I have used weka.filters.unsupervised.instance.Normalize in order to normalize my training data before building my model.
But now I face the question of normalizing the instances that I need to classify once the model is built. weka.filters.unsupervised.instance.Normalize does not output parameters that I could use in order to normalize the attributes of my instance. Or am I mistaken?
I understand you intend to apply exactly the same normalization to training and test data. The way to do it is to make use of the batch option. You can find an explanation on how to do it in the command line at the Batch Filtering page.

Classification Algorithm which can take predefined weights for attributes as input

I have 20 attributes and one target feature. All the attributes are binary(present or not present) and the target feature is multinomial(5 classes).
But for each instance, apart from the presence of some attributes, I also have the information that how much effect(scale 1-5) did each present attribute have on the target feature.
How do I make use of this extra information that I have, and build a classification model that helps in better prediction for the test classes.
Why not just use the weights as the features, instead of binary presence indicator? You can code the lack of presence as a 0 on the continuous scale.
EDIT:
The classifier you choose to use will learn optimal weights on the features in training to separate the classes... thus I don't believe there's any better you can do if you do not have access to test weights. Essentially a linear classifier is learning a rule of the form:
c_i = sgn(w . x_i)
You're saying you have access to weights, but without an example of what the data look like, and an explanation of where the weights come from, I'd have to say I don't see how you'd use them (or even why you'd want to---is standard classification with binary features not working well enough?)
This clearly depends on the actual algorithms that you are using.
For decision trees, the information is useless. They are meant to learn which attributes have how much effect.
Similarly, support vector machines will learn the best linear split, so any kind of weight will disappear since the SVM already learns this automatically.
However, if you are doing NN classification, just scale the attributes as desired, to emphasize differences in the influential attributes.
Sorry, you need to look at other algorithms yourself. There are just too many.
Use the knowledge as prior over the weight of features. You can actually compute the posterior estimation out of the data and then have the final model

Resources