I have a set of items that are each described by 10 precise numbers n1, .., n10. I would like to learn the coefficients k1, .., k10 that should be associated to those numbers to rank them according to my criteria.
In that purpose I created a web application (in php) that shows me two items and ask me which one should be ranked first (it gives the supervision to the machine learning).
My question : I can't find a way to learn the ten coefficients at the same time for each case. Do you have any idea on what algorithm I could use ? (neural networks with all the 10 numbers in entry seem to be a good option because it would learn all the coefficient, but I don't know what would be the output of this network as I would like to learn it by comparing the items two by two.)
Neural network is fine for this. Your output would be the 10 coefficients. Comparing them "two by two" is nothing that influences the net architecture. Standard neural net training procedure takes care of "comparing the items" (if you want to call it that) itself.
At last, make sure to know if you have linear (single layer perceptron is enough) or non-linear (multilayer perceptron) data.
Related
I’m very new to machine learning.
I have a dataset with data given me by a f1 race. User is playing this game and is giving me this dataset.
With machine learning, I have to work with this data and when a user (I know they are 10) plays a game I have to recognize who’s playing.
The data consists of datagram packet occurred in 1/10 second freq, the packets contains the following Time, laptime, lapdistance, totaldistance, speed, car position, traction control, last lap time, fuel, gear,..
I’ve thought to use a kmeans used in a supervised way.
Which algorithm could be better?
The task must be a multiclass classification. The very first step in any machine learning activity is to define a score metric (https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/). That allows you to compare models between themselves and decide which is better. Then build a base model with random forest or/and logistic regression as suggested in another answer - they perform well out-of-the-box. Then try to play with features and understand which of them are more informative. And don't forget about a visualizations - they give many hints for data wrangling, etc.
this is somewhat a broad question, so I'll try my best
kmeans is unsupervised algorithm meaning it will find the classes itself and it best used when you know there are multiple classes but you don't know what exactly they are... using it with labeled data just means you will compute the distance of new vector v to each vector in the dataset and pick the one (or ones using majority vote) which give the min distance , this is not considered as machine learning
in this case when you do have the labels, supervised approach will yield much better results
I suggest try random forest and logistic regression at first, those are the most basic and common algorithms and they give pretty good results
if you haven't achieve the desired accuracy you can use deep learning and build a neural network with input layer as big as your packet's values and output layer of the number of classes, in between you can use one or multiple hidden layers with various nodes, but this is advanced approach and you better pick up some experience in machine learning field before pursue it
Note: the data is a time series, meaning that every driver has it's own behaviour of driving a car, so data should be considered as bulks of points, with this you can apply pattern matching technics, also there are a several neural networks build exactly for this data (like RNN) but this is far far advanced and much more difficult to implement
I have been reading article about SRCNN and found that they are using "number of backprops" for evaluating how well network is performing, i.e. what network is able to learn after x backprops (as I understand). I would like to know what number of backprops actually means. Is this just the number of training data samples that there used during the training? Or maybe the number of mini-batches? Maybe it is one of the previous numbers multiplied by number of learnable parameters in the network? Or something completely different? Maybe there is some other more common name for this that I could loop up somewhere and read more about it because I was not able to find anything useful by searching "number of backprops" or "number of backpropagations"?
Bonus question: how widely this metric is used and how good is it?
I read their Paper from 2016:
author={C. Dong and C. C. Loy and K. He and X. Tang},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Image Super-Resolution Using Deep Convolutional Networks},
Since they don't even mention batches I assume they are doing a backpropagation to update their weights after each sample / image.
In other words their batchsize (mini-batchsize) is equal to 1 sample.
So number of backpropagations means amount of batches after all, which is quite a common metric, viz. in the paper PSNR (loss) over amount of batches (or usually loss over epochs).
Bonus question: I come to the conclusion they just didn't stick to the common thesaurus of machine learning, or deep learning.
BonusBonus question: They use the metric of loss after n batches to showcase how much the different network architectures could learn on trainigdatasets with different size.
I would assume that after it means how many the network has learned after back-propagating n times. Its more likely interchangeable with "after training over n samples..."
This maybe a bit different if they are using a recurrent network, as they could have more samples run in forward prop then in backwardprop. (For whatever reason I can't get the link to the paper to load, so unsure).
Based on your number of questions I think you might be overthinking this :)
Number of backprops is not a metric used commonly. Perhaps they use it here to showcase the speed of training based upon whatever optimization method's they are using. But for most common instances, it is not a relevant metric.
There is a dataset of a plant that makes certain numeric outputs based on numeric inputs. The dataset contains the input values and output value for several years every 15 minutes.
Since it would be too expensive to model the physical properties of the system in software, I would like to create a model with Machine learning, which behaves as the system. When entering inputs, the model should provide output.
For the solution I have tested Feedforward neural network. The results are ok, but in some cases too inaccurate.
What other methods would be available for this problem?
If it's a time series task you could use the NARX architecture of a neural network or an LSTM network. Later is like the NARX a recurrent neural network. Matlab offers an implementation of the first one.
https://en.m.wikipedia.org/wiki/Nonlinear_autoregressive_exogenous_model
https://en.m.wikipedia.org/wiki/Long_short-term_memory
If you "simply" want to fit a polynomial to your data you could use basic linear regression with polynomials of different degree to see which one works best.
Note: It's not called linear because it's only able to fit linear models.
https://en.m.wikipedia.org/wiki/Linear_regression
Some other possibilities are kernel methods such as kernel ridge regression or SVR. Later one is based on support vector machines which usually perform quite well (at least for classification from my personal experience).
If you want to try SVR you can use a small but great lib called libSVM. Matlab also offers this.
The following link shows a comparison of this algorithms:
http://scikit-learn.org/stable/auto_examples/plot_kernel_ridge_regression.html
Edit: If i understand this correctly, it's a time series task if you want to predict the outputs of a future time t+1 from a given time t. Try the NARX model or the LSTM net.
I Wrote a simple recurrent neural network (7 neurons, each one is initially connected to all the neurons) and trained it using a genetic algorithm to learn "complicated", non-linear functions like 1/(1+x^2). As the training set, I used 20 values within the range [-5,5] (I tried to use more than 20 but the results were not changed dramatically).
The network can learn this range pretty well, and when given examples of other points within this range, it can predict the value of the function. However, it can not extrapolate correctly and predicting the values of the function outside the range [-5,5]. What are the reasons for that and what can I do to improve its extrapolation abilities?
Thanks!
Neural networks are not extrapolation methods (no matter - recurrent or not), this is completely out of their capabilities. They are used to fit a function on the provided data, they are completely free to build model outside the subspace populated with training points. So in non very strict sense one should think about them as an interpolation method.
To make things clear, neural network should be capable of generalizing the function inside subspace spanned by the training samples, but not outside of it
Neural network is trained only in the sense of consistency with training samples, while extrapolation is something completely different. Simple example from "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8" shows how NN behave in this context
All of these networks are consistent with training data, but can do anything outside of this subspace.
You should rather reconsider your problem's formulation, and if it can be expressed as a regression or classification problem then you can use NN, otherwise you should think about some completely different approach.
The only thing, which can be done to somehow "correct" what is happening outside the training set is to:
add artificial training points in the desired subspace (but this simply grows the training set, and again - outside of this new set, network's behavious is "random")
add strong regularization, which will force network to create very simple model, but model's complexity will not guarantee any extrapolation strength, as two model's of exactly the same complexity can have for example completely different limits in -/+ infinity.
Combining above two steps can help building model which to some extent "extrapolates", but this, as stated before, is not a purpose of a neural network.
As far as I know this is only possible with networks which do have the echo property. See Echo State Networks on scholarpedia.org.
These networks are designed for arbitrary signal learning and are capable to remember their behavior.
You can also take a look at this tutorial.
The nature of your post(s) suggests that what you're referring to as "extrapolation" would be more accurately defined as "sequence recognition and reproduction." Training networks to recognize a data sequence with or without time-series (dt) is pretty much the purpose of Recurrent Neural Network (RNN).
The training function shown in your post has output limits governed by 0 and 1 (or -1, since x is effectively abs(x) in the context of that function). So, first things first, be certain your input layer can easily distinguish between negative and positive inputs (if it must).
Next, the number of neurons is not nearly as important as how they're layered and interconnected. How many of the 7 were used for the sequence inputs? What type of network was used and how was it configured? Network feedback will reveal the ratios, proportions, relationships, etc. and aid in the adjustment of network weight adjustments to match the sequence. Feedback can also take the form of a forward-feed depending on the type of network used to create the RNN.
Producing an 'observable' network for the exponential-decay function: 1/(1+x^2), should be a decent exercise to cut your teeth on RNNs. 'Observable', meaning the network is capable of producing results for any input value(s) even though its training data is (far) smaller than all possible inputs. I can only assume that this was your actual objective as opposed to "extrapolation."
I'm attempting to make a classifier that chooses a rating (1-5) for a item i. For each item i, I have a vector x containing about 40 different quantities pertaining to i. I also have a gold standard rating for each item. Based on some function of x, I want to train a classifier to give me a rating 1-5 that closely matches the gold standard.
Most of the information I've seen on classifiers deal with just binary decisions, while I have a rating decision. Are there common techniques or code libraries out there to deal with this sort of problem?
I agree with you that ML problems in which the response variable is on an ordinal scale
require special handling--'machine-mode' (i.e., returning a class label) seems insufficient
because the class labels ignore the relationship among the labels ("1st, 2nd, 3rd");
likewise, 'regression-mode' (i.e., treating the ordinal labels as floats, {1, 2, 3}) because
it ignores the metric distance between the response variables (e.g., 3 - 2 != 1).
R has (at least) several packages directed to ordinal regression. One of these is actually called Ordinal, but i haven't used it. I have used the Design Package in R for ordinal regression and i can certainly recommend it. Design contains a complete set of functions for solution, diagnostics, testing, and results presentation of ordinal regression problems via the Ordinal Logistic Model. Both Packages are available from CRAN) A step-by-step solution of an ordinal regression problem using the Design Package is presented on the UCLA Stats Site.
Also, i recently looked at a paper by a group at Yahoo working on ordinal classification using Support Vector Machines. I have not attempted to apply their technique.
Have you tried using Weka? It supports binary, numerical, and nominal attributes out of the box, the latter two of which might work well enough for your purposes.
Furthermore, it looks like one of the classifiers that's available is a meta-classifier called OrdinalClassClassifier.java, which is the result of this research:
Eibe Frank and Mark Hall, A simple approach to ordinal classification. In Proceedings of the 12th European Conference on Machine Learning, 2001, pp. 145-156.
If you don't need a pre-made approach, then these references (in addition to doug's note about the Yahoo SVM paper) might be useful:
W Chu and Z Ghahramani, Gaussian processes for ordinal regression. Journal of Machine Learning Research, 2006.
Wei Chu and S. Sathiya Keerthi, New approaches to support vector ordinal regression. In Proceedings of the 22nd international conference on Machine Learning, 2005, 145-152.
The problems that dough has raised are all valid. Let me add another one. You didn't say how you would like to measure the agreement between the classification and the "gold standard". You have to formulate the answer to that question as soon as possible, as this will have a huge impact on your next step. In my experience, the most problematic part of any (ok, not any, most) optimization task is the score function. Try asking yourself whether all errors equal? Does miss-classifying the "3" as being "4" has the same impact as classifying "4" as "3"? What about "1" vs "5". Can mistakenly missing one case have disastrous consequences (miss HIV diagnosis, activate pilot ejection in a plane)
The simplest way to measure the agreement between categorical classifiers is Cohen's Kappa. More complicated methods are described in the following links here, here, here, and here
Having said that, sometimes picking a solution that "just works", instead of "the right one" is faster and easier. If I were you I would pick a machine learning library (R, Weka, I personally love Orange) and see what I get. Only if you don't have reasonably good results with that, look for more complex solutions
If not interested in fancy statistics a one hidden layer back propagation neural network with 3 or 5 output nodes will probably do the trick if the training data is sufficiently large. Most NN classifiers try to minimize the mean squared error which is not always desired. Support Vector Machines mentioned earlier is a good alternative.
FANN is a good library for back propagation NNs, it also has some tools to assist in training of the network.
There are two packages in R that might help taming ordinal data
ordinalForest on CRAN
rpartScore on CRAN
I'm working on an OrdinalClassifier that is based on the sklearn framework (specifically the OVR multiclass classifier) and which works well with sklearn workflow such as pipelines, cross validation, and scoring.
Through testing, I'm finding that it performs very well vs. standard non-ordinal multiclass classification using SVC. And it gives much greater control over optimizing for precision and recall on the positive class (in my testing, I used sklearn's diabetes dataset and transformed the disease progression target(y) into a low, medium, high class label. Testing via cross validation is on my repo along with attribution. Scoring is based on weighted f1.
https://github.com/leeprevost/OrdinalClassifier