What kind of machine learning model should I use to combine two data sets with different features into one binary class prediction? I want to predict 0 or 1. I have one data set with 1000 samples and 2 features, and a second data set with the same class labels (0 or 1) but 3 different features.
Let's say I have a dataset of 2 million samples. At first, I used only 1 million of them, trained the model and saved it in h5 format as first.h5. Later I used the other 1 million, trained with the same algorithm and saved the result as second.h5. Training takes more than a day, hence I can't use all two million at once. Is there any way I can merge those two saved models, like first.h5 + second.h5 = merged.h5?
There is no way you can do that (merge models). Let me put it in simple terms: you train a child named first on 1 million images to identify whether an image shows a cat or a dog. Then you train a second child named second on the other 1 million images for the same task. What you are asking for is a way to combine first and second into a single child.
However, assuming the training data is IID (independent and identically distributed), what you can do is create an ensemble of both models for making predictions.
The simple ways to ensemble two models are:
Max Voting
Averaging
Weighted Averaging
Follow this link on how to build the ensemble.
Alternatively, a simple strategy is to average the final scores of both models and use the averaged score to make the predictions.
A more powerful strategy is to use the validation set to find the weights for the classes and then use these weights for making the final predictions on unseen data.
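As an illustration, here is a minimal sketch of the averaging strategy, assuming both models were saved with Keras, accept the same input shape, and output a score for the positive class (the file names and weights below are placeholders):

```python
# Minimal sketch: ensemble two saved Keras models by (weighted) averaging their scores.
from tensorflow.keras.models import load_model

first = load_model("first.h5")
second = load_model("second.h5")

def ensemble_predict(x, w1=0.5, w2=0.5):
    """Weighted average of the two models' scores (plain averaging when w1 == w2)."""
    return w1 * first.predict(x) + w2 * second.predict(x)

# Binary decision from the averaged score, e.g.:
# labels = (ensemble_predict(x_test) > 0.5).astype(int)
```

The weights w1 and w2 can be tuned on a validation set, which gives the weighted-averaging strategy mentioned above.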
You could merge the models by averaging their weights, but this will not be the same as training with the full dataset.
Usually training with more data leads to better results, i.e. a better model.
If you don't want to train with the full dataset, I would recommend not averaging the weights but using both models for inference and averaging their predictions.
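For reference, weight averaging would look roughly like the sketch below, assuming the two models share exactly the same architecture; as said above, the result is generally not equivalent to a model trained on the full dataset:

```python
# Rough sketch: average the weights of two Keras models with identical architectures.
from tensorflow.keras.models import load_model

first = load_model("first.h5")
second = load_model("second.h5")

averaged = [(w1 + w2) / 2.0
            for w1, w2 in zip(first.get_weights(), second.get_weights())]

merged = load_model("first.h5")   # reuse the architecture of the first model
merged.set_weights(averaged)
merged.save("merged.h5")
```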
I want to predict whether students play cricket or not (target variable).
Suppose I have 3 columns:
Gender, Class, Age
As we can see, I have 2 categorical attributes and one continuous attribute.
While deciding the root node, I know that both categorical attributes can be compared in the traditional way using the Gini criterion. How should I split the continuous attribute, and which criterion should I use for it so that it can compete with the two categorical attributes for being the root node?
You can split continuous variables into intervals. Let's suppose you have a continuous variable ranging from 1 to 10: you can put 1 to 5 in one category and 6 to 10 in a different category.
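To make this concrete, here is a small sketch of how a decision tree typically treats a continuous attribute: sort its values, try a threshold between each pair of adjacent values, and score each candidate split with the same Gini criterion used for the categorical attributes, so the best threshold competes directly for the root node (the data below is made up):

```python
# Sketch: search for the best Gini threshold on a continuous attribute (made-up data).
import numpy as np

age = np.array([5, 7, 8, 9, 10, 11, 12, 14])   # continuous attribute
plays = np.array([0, 0, 1, 1, 1, 1, 0, 1])     # target: plays cricket (1) or not (0)

def gini(y):
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p**2 - (1.0 - p)**2

best_threshold, best_score = None, float("inf")
values = np.sort(np.unique(age))
for left, right in zip(values[:-1], values[1:]):
    t = (left + right) / 2.0                    # candidate threshold
    mask = age <= t
    # Weighted Gini impurity of the two child nodes.
    score = mask.mean() * gini(plays[mask]) + (~mask).mean() * gini(plays[~mask])
    if score < best_score:
        best_threshold, best_score = t, score

print(best_threshold, best_score)  # compare this Gini against the categorical splits
```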
It really depends on which model (algorithm) you are using for the splitting. In general, however, the F-test is what is normally used when splitting on continuous variables. Have a look at what SAS uses for their implementation: SAS - splitting criteria. Also, here is a quite good explanation of decision trees: Decision tree. It begins here.
I'm creating character recognition software in Python using scikit-learn. I have a large dataset of images labelled [A-Za-z], and I'm using a linear SVM. Training the model on all the samples with 52 different labels is very, very slow.
Suppose I divide my training dataset into 13 sections such that each section has images of only 4 characters, no image is part of more than one section, and I then train 13 different models.
How can I combine those models into one more accurate model? Or, if I classify the test set with all 13 models and compare each sample's results on the basis of confidence score (selecting the one with the highest score), will it affect the accuracy of the overall model?
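For reference, the confidence-score comparison described here could look like the following sketch, assuming each of the 13 models is an SVC(kernel="linear", probability=True) fitted on its own 4-character subset (the names are hypothetical):

```python
# Sketch: pick, across 13 per-section SVMs, the class with the highest probability.
import numpy as np

def predict_by_max_confidence(models, x):
    """models: list of fitted SVC(probability=True); x: one flattened image."""
    best_label, best_conf = None, -np.inf
    for model in models:
        proba = model.predict_proba(x.reshape(1, -1))[0]
        idx = int(np.argmax(proba))
        if proba[idx] > best_conf:
            best_conf = proba[idx]
            best_label = model.classes_[idx]
    return best_label
```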
It seems what you need to do is some kind of order reduction (dimensionality reduction) of the data.
After the order reduction, classify the data into the 13 large groups and then apply a final classification step.
I would look into Linear Discriminant Analysis for the first step I mentioned.
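A minimal sketch of that two-step idea with scikit-learn, assuming the images are already flattened into feature vectors X with labels y; LinearDiscriminantAnalysis plays the role of the order-reduction step before the final SVM:

```python
# Sketch: LDA for dimensionality reduction, followed by a linear SVM.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# X: (n_samples, n_features) flattened images, y: labels in [A-Za-z]
clf = make_pipeline(
    LinearDiscriminantAnalysis(n_components=13),  # project to a lower-dimensional space
    LinearSVC(),
)
# clf.fit(X_train, y_train)
# clf.score(X_test, y_test)
```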
I'm looking to use Tensorflow to set up a neural network to score items based on various properties they have. The number of properties a given item can have is small (let's say 10 is the max) but the number of possible properties is in the hundreds. For example, imagine we were scoring different kinds of vehicles, each with various attributes ("wheels", "engine horsepower", "wings", etc.) and a numerical value for each attribute (2, 600, 4).
My question is: is there a way to model the neural network so that it has a relatively low number of inputs, on the order of the maximum number of properties an item can have (in this example, 10)? Or does each possible property need to be an input, resulting in hundreds of total inputs, most of which (>90%) would be blank for any given item?
Just have all of the possible properties as inputs, but set them to 0 when they are not present. Hundreds of inputs to a NN are not uncommon anyway.
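As a small sketch of that encoding, assuming a fixed, ordered list of all possible property names: each item becomes a dense vector with zeros for the properties it lacks, which then feeds a network with that many inputs (the property names below are made up):

```python
# Sketch: fixed-length input vector over all known properties, 0 for absent ones.
import numpy as np
import tensorflow as tf

ALL_PROPERTIES = ["wheels", "engine_horsepower", "wings"]  # in practice: hundreds

def encode(item):
    """item is a dict such as {"wheels": 2, "engine_horsepower": 600}."""
    return np.array([item.get(p, 0.0) for p in ALL_PROPERTIES], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(ALL_PROPERTIES),)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # the item's score
])
model.compile(optimizer="adam", loss="mse")

x = np.stack([encode({"wheels": 2, "engine_horsepower": 600})])
print(model.predict(x))
```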
I am trying to use linear SVMs for multi-class object category recognition. So far, what I have understood is that there are mainly two approaches: one-vs-all (OVA) and one-vs-one (OVO).
But I am having difficulty understanding their implementation. The steps that I think are used are:
First, the feature descriptors are prepared from, let's say, SIFT. So I have a 128×N matrix of descriptors.
Next, to prepare an SVM classifier model for a particular object category (say, car), I take 50 images of cars as the positive training set and a total of 50 images from the rest of the categories, taken randomly from each category, as the negative set (is this part correct?). I prepare such a model for each category (say 5 of them).
Next, when I have an input image, do I need to feed it into all 5 models and then check the value (+1/-1) each of them returns? I am having difficulty understanding this part.
In the one-vs-all approach, you have to check all 5 models. Then you can take the decision with the highest confidence value. LIBSVM gives probability estimates.
In the one-vs-one approach, you can take the majority vote. For example, you test 1 vs. 2, 1 vs. 3, 1 vs. 4 and 1 vs. 5, and class 1 wins in 3 of the cases. You do the same for the other 4 classes. Suppose for the other four classes the counts are [0, 1, 1, 2]. Class 1 was therefore obtained the most times, making it the final class. In this case, you could also sum the probability estimates and take the maximum. That would work unless the classification goes extremely wrong in one pair. For example, in 1 vs. 4 it classifies 4 (when the true class is 1) with a confidence of 0.7. Then, just because of this one decision, the sum of probability estimates may shoot up and give a wrong result. This issue can be examined experimentally.
LIBSVM uses one vs. one. You can check the reasoning here. You can also read this paper, where they defend the one-vs-all classification approach and conclude that it is not necessarily worse than one vs. one.
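If you are working from Python rather than calling LIBSVM directly, scikit-learn exposes both schemes as wrappers, which makes it easy to compare them on your descriptors. A sketch, assuming X holds one feature vector per image and y the 5 category labels:

```python
# Sketch: one-vs-rest vs. one-vs-one multi-class schemes around a linear SVM.
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

ovr = OneVsRestClassifier(LinearSVC())  # 5 classifiers: each class vs. the rest
ovo = OneVsOneClassifier(LinearSVC())   # 10 classifiers: one per pair of classes

# ovr.fit(X_train, y_train); ovr.predict(X_test)
# ovo.fit(X_train, y_train); ovo.predict(X_test)
```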
In short, your positive training samples are always the same. In one vs. one you train a separate classifier against each negative class, with its negative samples taken on their own. In one vs. all you lump all negative samples together and train a single classifier. The problem with the former approach is that you have to consider all of those outcomes to decide on the class. The problem with the latter approach is that lumping all negative object classes together may create a non-homogeneous class that is hard to process and analyse.