Need some advice here.
I am trying to build a model that predicts 3 different output features when 5 input features are given.
For example,
5 input features: size of the house, house floor, house condition, number of rooms, parking.
3 output features: price for selling, price for buying, price for renting
What confuses me right now is whether a single trained model can predict all 3 outputs. In the examples and tutorials I have found, the models mostly do only one thing.
Sorry if my explanations are bad, I am new to tensorflow and machine learning.
A neural network can definitely predict (approximate) several outputs at once. I have experience with a neural-network regulator in which the net produced control signals for two motors.
I don't have experience with TensorFlow myself, but the framework is from Google and is quite popular, so I'm almost sure it has multi-output functionality.
There is a nice example of such a thing.
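To make the multi-output idea concrete, here is a minimal sketch that fits one model with three outputs. It uses a plain linear least-squares fit with NumPy rather than TensorFlow, and the data, `true_W`, `true_b`, and the `predict` helper are all hypothetical and exist only for illustration. In a neural network the same idea corresponds to an output layer with three units.

```python
import numpy as np

# Hypothetical toy data: 5 input features per house
# (size, floor, condition, rooms, parking) and 3 targets
# (selling price, buying price, renting price). All numbers are synthetic.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 5))

# Assumed linear relation, used only to generate the toy targets.
true_W = rng.normal(size=(5, 3))
true_b = np.array([10.0, 12.0, 1.0])
Y = X @ true_W + true_b

# Append a column of ones so the bias is learned along with the weights.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# A single least-squares fit learns all three outputs at once:
# the solution has shape (6, 3), one column per output.
W_hat, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)

def predict(features):
    """Return [sell, buy, rent] predictions for one 5-feature house."""
    x = np.append(np.asarray(features, dtype=float), 1.0)
    return x @ W_hat

prediction = predict([0.5, 0.2, 0.8, 0.4, 1.0])
print(prediction.shape)  # (3,)
```

The three outputs share the same inputs and are fitted together, so no separate model per price is required in this sketch.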
In common practice, we build a model to predict only one output. In supervised learning we feed in certain input variables and find a relation between them and one desired output, and that relation generally does not carry over from the same inputs to a different desired output.
But there is a special technique that fits your problem:
Suppose we have four input variables, I1, I2, I3, I4, and we want three output labels (generally discrete), O1, O2, O3. We can create a new label O4 by merging the three original outputs. For example, if O1, O2, and O3 can only be 0 or 1, then O4 has 2^3 = 8 possible values. We can then build a prediction model between the four input variables and the output O4, and once the value of O4 is known, O1 through O3 are all known as well.
However, if the output variables are not all discrete, and especially if regression is used, the technique above will not work. In that case, to predict three outputs, we normally train three times and build three models.
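The label-merging trick for binary outputs can be sketched as follows. The helper names `merge_labels` and `split_label` are made up for this illustration; the encoding simply packs O1, O2, O3 into the bits of a single integer O4.

```python
from itertools import product

def merge_labels(o1, o2, o3):
    """Pack three binary labels into one class index in 0..7."""
    return (o1 << 2) | (o2 << 1) | o3

def split_label(o4):
    """Recover (O1, O2, O3) from the merged class index."""
    return (o4 >> 2) & 1, (o4 >> 1) & 1, o4 & 1

# Every combination of three binary outputs maps to a distinct class.
merged = [merge_labels(*bits) for bits in product((0, 1), repeat=3)]
print(merged)          # [0, 1, 2, 3, 4, 5, 6, 7]
print(split_label(5))  # (1, 0, 1)
```

A single 8-class classifier trained on the merged label then yields all three original outputs after decoding.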
I am using datasets from different projects with input features (depth of inheritance tree, number of children, number of methods), where these features have values per class in each project.
I have read many papers saying that a neural network or any other model can't work on datasets with different distributions.
My question is:
1. What is the meaning of "datasets with different distributions" (where a single dataset has a number of samples, each sample corresponding to a class in that project)?
2. Why can't a NN or any other algorithm work on 2 datasets with different distributions?
Thanks in advance.
One of the most common assumptions when formulating a statistical learning problem is that the samples are IID (independent and identically distributed), which means in particular that all the samples should come from the same distribution. When you say that you have two different datasets, it means that this assumption is not true, and the majority of theoretical guarantees no longer hold. Now, maybe your question is what "a data distribution" means: it is just the joint law p(x, y), where x are the features and y the labels. So two datasets having different distributions means that p_{1}(x, y) != p_{2}(x, y).
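A small synthetic illustration of p_{1}(x, y) != p_{2}(x, y) may help. In this sketch, two hypothetical "projects" share the same labels but their single feature is shifted, and a threshold fitted on the first is evaluated on both. The data generator, the shift values, and the one-feature threshold classifier are all assumptions made for the example.

```python
import random

def make_dataset(shift, n=2000, seed=0):
    """Draw (x, y) pairs: class 0 centred at `shift`, class 1 at `shift + 2`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(shift + 2.0 * y, 1.0)
        data.append((x, y))
    return data

def fit_threshold(data):
    """Place the decision threshold midway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0

def accuracy(threshold, data):
    """Fraction of samples where 'x above threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

project_a = make_dataset(shift=0.0, seed=1)  # samples from p_1(x, y)
project_b = make_dataset(shift=3.0, seed=2)  # samples from p_2(x, y), features shifted

threshold = fit_threshold(project_a)
acc_same = accuracy(threshold, project_a)
acc_shifted = accuracy(threshold, project_b)
print(f"accuracy on A: {acc_same:.2f}, on B: {acc_shifted:.2f}")
```

The model is not broken in itself; the rule it learned under p_1 simply no longer matches where the classes sit under p_2.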
I'm working on a question answering problem with limited data (~10,000s of data points) and very few features for both the context/question as well as the options/choices. Given:
a question Q and
options A, B, C, D, E (each characterized by some features, say, string similarity to Q or number of words in each option)
(while training) a single correct answer, say B.
I wish to predict exactly one of these as the correct answer. But I'm stuck because:
If I arrange ground truth as [0 1 0 0 0], and give the concatenation of QABCDE as input, then the model will behave as if classifying an image into dog, cat, rat, human, bird, i.e. each class will have a meaning, however that's not true here. If I switched the input to QBCDEA, the prediction should be [1 0 0 0 0].
If I split each data point into 5 data points, i.e. QA:0, QB:1, QC:0, QD:0, QE:0, then the model fails to learn that they're in fact interrelated, and only one of them must be predicted as 1.
One approach that seems viable is to make a custom loss function which penalizes multiple 1s for a single question, and which penalizes no 1s as well. But I think I might be missing something very obvious here :/
I'm also aware of how large models like BERT do this over SQuAD like datasets. They add positional embeddings to each option (eg. A gets 1, B gets 2), and then use a sort of concatenation over QA1 QB2 QC3 QD4 QE5 as input, and [0 1 0 0 0] as output. Unfortunately, I believe this will not work in my case given the very small dataset I have.
The problem you're having is that you removed all useful information from your "ground truth". The training target is not the ABCDE labels -- the target is the characteristics of the answers that those letters briefly represent.
Those five labels are merely array subscripts for classifications that are a shuffled subset of your training space: an ordered choice of 5 objects from n (nP5). Bottom line: there is no information in those labels.
Rather, extract the salient characteristics from those answers. Your training needs to find the answer (characteristic set) that sufficiently matches the question. As such, what you're doing is close to multi-label training.
Multi-label models should handle this situation. This will include those that label photos, identifying multiple classes represented in the input.
Does that get you moving?
Response to OP comment
You understand correctly: predicting 0/1 for five arbitrary responses is meaningless to the model; the single-letter variables are of only transitory meaning, and have no relation to anything trainable.
A short thought experiment will demonstrate this. Imagine that we sort the answers such that A is always the correct answer; this doesn't change the information in the inputs and outputs; it's a valid arrangement of the multiple-choice test.
Train the model; we'll get to 100% accuracy in short order. Now, consider the model weights: what has the model learned from the input? Nothing -- the weights will train to ignore the input and select A, or will have absolutely arbitrary values that come to the A conclusion.
You need to ignore the ABCDE designations entirely; the target information is in the answers themselves, not in those letters. Since you haven't posted any sample cases, we have little to guide us for an alternate approach.
If your paradigm is a typical multiple-choice examination, with few restrictions on the questions and answers, then the problem you're tackling is far larger than your project is likely to solve -- you're in "Watson" territory, requiring a large knowledge base and a strong NLP system to parse the inputs and available responses.
If you have a restricted paradigm for the answers, perhaps you can parse them into phrases and relations, yielding a finite set of classes to consider in your training. In this case, a multi-label model might well be able to solve your problem.
If your application is open-ended, i.e. open topic, then I expect that you need a different model class (such as BERT), but you'll still need to consider the five answers as text sequences, not as letters. You need a holistic match to the subject at hand. If this is a typical multiple-choice exam, then your model will still have classification troubles, as all five answers are likely to be on topic; finding the correct answer should depend on some level of semantic insight into question and answer, something stronger than "bag of words" processing.
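One way to act on the advice to ignore the letters, offered here as an assumption rather than something the answer spells out, is to score every option with the same function of its own features and then normalise the scores across the five options. The feature layout and the `weights` values below are hypothetical; in practice the weights would be learned, for example with a cross-entropy loss over the softmax output.

```python
import math

def score(weights, features):
    """Shared scoring function: the same weights are applied to every option."""
    return sum(w * f for w, f in zip(weights, features))

def softmax(scores):
    """Normalise scores into probabilities that sum to 1 across the options."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, options):
    """Return the index of the best option and the probability of each one."""
    probs = softmax([score(weights, feats) for feats in options])
    return max(range(len(options)), key=probs.__getitem__), probs

# Hypothetical per-option features: [string similarity to Q, word count / 10].
weights = [2.0, -0.5]  # assumed values standing in for learned weights
options = [[0.1, 0.3], [0.9, 0.4], [0.2, 0.8], [0.3, 0.2], [0.0, 0.5]]

best, probs = predict(weights, options)

# Rotating the options (QBCDEA instead of QABCDE) rotates the prediction too.
rotated = options[1:] + options[:1]
best_rotated, _ = predict(weights, rotated)
print(best, best_rotated)  # 1 0
```

Because the softmax forces the five probabilities to sum to 1, exactly one option wins per question, and because the scorer is shared, the position of an option carries no meaning.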
I have a set of items that are each described by 10 precise numbers n1, .., n10. I would like to learn the coefficients k1, .., k10 that should be associated with those numbers to rank the items according to my criteria.
For that purpose I created a web application (in PHP) that shows me two items and asks me which one should be ranked first (this provides the supervision for the machine learning).
My question: I can't find a way to learn the ten coefficients at the same time for each case. Do you have any idea what algorithm I could use? (A neural network with all 10 numbers as input seems to be a good option because it would learn all the coefficients, but I don't know what the output of this network would be, as I would like to train it by comparing the items two by two.)
A neural network is fine for this. Your output would be the 10 coefficients. Comparing them "two by two" is nothing that influences the net architecture. The standard neural net training procedure takes care of "comparing the items" (if you want to call it that) itself.
Finally, make sure you know whether your data is linear (a single-layer perceptron is enough) or non-linear (use a multilayer perceptron).
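For the linear case, one common formulation of learning from pairwise comparisons (swapped in here as an illustration, not necessarily what the answer has in mind) is a logistic model on the difference of the two items' numbers. In this sketch the ten coefficients are the model's weights rather than its output. The synthetic comparisons, `hidden_k`, and the helper names are hypothetical stand-ins for the answers collected by the web application.

```python
import math
import random

def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def rank_score(k, item):
    """Ranking score of one item: the weighted sum of its numbers."""
    return sum(w * x for w, x in zip(k, item))

def train_pairwise(pairs, n_features, epochs=50, lr=0.1, seed=0):
    """Learn coefficients k from (item_a, item_b, a_preferred) comparisons."""
    rng = random.Random(seed)
    k = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b, a_preferred in pairs:
            diff = [x - y for x, y in zip(a, b)]
            p = sigmoid(rank_score(k, diff))  # P(a ranked above b)
            error = (1.0 if a_preferred else 0.0) - p
            k = [w + lr * error * d for w, d in zip(k, diff)]
    return k

# Synthetic comparisons generated from a hidden set of "true" coefficients.
rng = random.Random(42)
hidden_k = [rng.uniform(-1, 1) for _ in range(10)]

def random_pair():
    a = [rng.random() for _ in range(10)]
    b = [rng.random() for _ in range(10)]
    return a, b, rank_score(hidden_k, a) > rank_score(hidden_k, b)

train = [random_pair() for _ in range(500)]
test = [random_pair() for _ in range(200)]

k = train_pairwise(train, n_features=10)
agreement = sum(
    (rank_score(k, a) > rank_score(k, b)) == pref for a, b, pref in test
) / len(test)
print(f"agreement with held-out comparisons: {agreement:.2f}")
```

Once trained, sorting all items by `rank_score(k, item)` gives the full ranking; only the relative sizes of the coefficients matter, not their absolute scale.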
I'm looking at the C4.5 classifier for a machine learning task. I have a large dataset containing city names, and need to differentiate between e.g. London Ontario, London England or even London in Burgundy in France, by looking at features from the surrounding text, e.g. zip codes and state names, even when "Canada" or "England" are not mentioned. I also have access to metadata such as dialing codes, which can help determine which country it is.
Subsequently once trained I want to run the classifier on the large dataset.
In all the examples I have found here there are only 2 states for the result (in this golf example play or don't play).
Can the c4.5 classifier handle London (Canada), London (England), London (France) as result classes or do I need to have different classifiers for London (Canada) True/False etc?
I see two options in your case.
The first approach is a straightforward extension to C4.5. In each leaf node, you keep all the labels instead of just the majority label. For example, as shown in the figure below, red labels are actually present in three different leaves. When you have a query at the data point indicated by the arrow, the outputs are 3 labels (green, red and blue) together with their corresponding conditional probabilities p(c|v) (given features x1 and x2, the probability that data point x belongs to class c).
The second approach is to generate multiple decision trees, hence a random forest. Randomness can be injected by randomly sampling the subset of training data made available to each individual tree. At classification time, you can aggregate the votes from all decision trees to get multi-class classification results.
The figures are borrowed from this excellent tutorial on multi-class classification by Andrew Zisserman.
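Both ideas above reduce to counting labels, as this minimal sketch shows: class probabilities kept in a leaf, and votes aggregated across trees. It does not build any trees; the label lists and the helper names `leaf_distribution` and `forest_vote` are hypothetical.

```python
from collections import Counter

def leaf_distribution(labels):
    """Class probabilities p(c | leaf) from the training labels that reached a leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def forest_vote(tree_predictions):
    """Aggregate one predicted label per tree into vote shares."""
    votes = Counter(tree_predictions)
    total = sum(votes.values())
    return {label: count / total for label, count in votes.most_common()}

# Hypothetical labels of the training samples that ended up in one leaf.
leaf = ["London (England)"] * 6 + ["London (Canada)"] * 3 + ["London (France)"]
distribution = leaf_distribution(leaf)
print(distribution)

# Hypothetical predictions from five trees for one query.
votes = forest_vote([
    "London (Canada)", "London (England)", "London (Canada)",
    "London (Canada)", "London (France)",
])
winner = max(votes, key=votes.get)
print(winner)  # London (Canada)
```

Nothing in either step assumes two classes, so the three London variants can be result classes of a single classifier.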
I'm attempting to make a classifier that chooses a rating (1-5) for an item i. For each item i, I have a vector x containing about 40 different quantities pertaining to i. I also have a gold standard rating for each item. Based on some function of x, I want to train a classifier to give me a rating 1-5 that closely matches the gold standard.
Most of the information I've seen on classifiers deals with just binary decisions, while I have a rating decision. Are there common techniques or code libraries out there to deal with this sort of problem?
I agree with you that ML problems in which the response variable is on an ordinal scale require special handling. 'Machine-mode' (i.e., returning a class label) seems insufficient because the class labels ignore the relationship among the labels ("1st, 2nd, 3rd"); likewise, 'regression-mode' (i.e., treating the ordinal labels as floats, {1, 2, 3}) seems insufficient because it assumes a metric distance between the response values that an ordinal scale does not guarantee (e.g., 3 - 2 != 1).
R has (at least) several packages directed to ordinal regression. One of these is actually called Ordinal, but I haven't used it. I have used the Design package in R for ordinal regression and I can certainly recommend it. Design contains a complete set of functions for solution, diagnostics, testing, and results presentation of ordinal regression problems via the Ordinal Logistic Model. (Both packages are available from CRAN.) A step-by-step solution of an ordinal regression problem using the Design package is presented on the UCLA Stats Site.
Also, I recently looked at a paper by a group at Yahoo working on ordinal classification using Support Vector Machines. I have not attempted to apply their technique.
Have you tried using Weka? It supports binary, numerical, and nominal attributes out of the box, the latter two of which might work well enough for your purposes.
Furthermore, it looks like one of the classifiers that's available is a meta-classifier called OrdinalClassClassifier.java, which is the result of this research:
Eibe Frank and Mark Hall, A simple approach to ordinal classification. In Proceedings of the 12th European Conference on Machine Learning, 2001, pp. 145-156.
If you don't need a pre-made approach, then these references (in addition to doug's note about the Yahoo SVM paper) might be useful:
W Chu and Z Ghahramani, Gaussian processes for ordinal regression. Journal of Machine Learning Research, 2006.
Wei Chu and S. Sathiya Keerthi, New approaches to support vector ordinal regression. In Proceedings of the 22nd international conference on Machine Learning, 2005, 145-152.
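The Frank and Hall approach cited above trains K-1 binary classifiers, one per threshold, each estimating P(y > k), and then combines their outputs into class probabilities. Only the combination step is sketched below. The `prob_greater` values are hypothetical classifier outputs, and clipping negative differences to zero is a choice made for this sketch.

```python
def ordinal_class_probs(prob_greater):
    """Turn [P(y > 1), ..., P(y > K-1)] into [P(y = 1), ..., P(y = K)]."""
    probs = [1.0 - prob_greater[0]]
    for prev, cur in zip(prob_greater, prob_greater[1:]):
        probs.append(prev - cur)
    probs.append(prob_greater[-1])
    # Independently trained binary models can give slightly inconsistent
    # estimates, so negative differences are clipped to zero here.
    return [max(0.0, p) for p in probs]

# Hypothetical outputs of four binary classifiers for one item on a 1-5 scale.
prob_greater = [0.95, 0.80, 0.30, 0.05]
probs = ordinal_class_probs(prob_greater)
rating = 1 + max(range(len(probs)), key=probs.__getitem__)
print(rating)  # 3
```

Any binary classifier that outputs probabilities can fill the K-1 slots, which is what makes this a meta-classifier.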
The problems that doug has raised are all valid. Let me add another one. You didn't say how you would like to measure the agreement between the classification and the "gold standard". You have to formulate the answer to that question as soon as possible, as this will have a huge impact on your next step. In my experience, the most problematic part of any (ok, not any, most) optimization task is the score function. Try asking yourself whether all errors are equal. Does misclassifying a "3" as a "4" have the same impact as classifying a "4" as a "3"? What about "1" vs "5"? Can mistakenly missing one case have disastrous consequences (missing an HIV diagnosis, activating pilot ejection in a plane)?
The simplest way to measure the agreement between categorical classifiers is Cohen's Kappa. More complicated methods are described in the following links here, here, here, and here.
Having said that, sometimes picking a solution that "just works", instead of "the right one", is faster and easier. If I were you I would pick a machine learning library (R, Weka, I personally love Orange) and see what I get. Only if you don't get reasonably good results with that should you look for more complex solutions.
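For reference, Cohen's Kappa as mentioned above can be computed in a few lines. This is a minimal sketch of the unweighted form on made-up labels; it treats every disagreement as equally bad, so confusing "3" with "4" costs the same as confusing "1" with "5".

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal frequencies, summed over classes.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

gold = [1, 1, 2, 2]
predicted = [1, 1, 2, 1]
kappa = cohens_kappa(gold, predicted)
print(kappa)  # 0.5
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; if errors should be penalised by how far apart the ratings are, a weighted variant is the usual next step.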
If you are not interested in fancy statistics, a back-propagation neural network with one hidden layer and 3 or 5 output nodes will probably do the trick if the training data is sufficiently large. Most NN classifiers try to minimize the mean squared error, which is not always desired. Support Vector Machines, mentioned earlier, are a good alternative.
FANN is a good library for back-propagation NNs; it also has some tools to assist in training the network.
There are two packages in R that might help tame ordinal data:
ordinalForest on CRAN
rpartScore on CRAN
I'm working on an OrdinalClassifier that is based on the sklearn framework (specifically the OVR multiclass classifier) and which works well with sklearn workflows such as pipelines, cross validation, and scoring.
Through testing, I'm finding that it performs very well vs. standard non-ordinal multiclass classification using SVC. It also gives much greater control over optimizing for precision and recall on the positive class (in my testing, I used sklearn's diabetes dataset and transformed the disease progression target (y) into a low, medium, high class label). Testing via cross validation is on my repo, along with attribution. Scoring is based on weighted F1.
https://github.com/leeprevost/OrdinalClassifier