I'm searching for a method to evaluate regression models, similar to accuracy for clustering models. Could you please tell me how I can evaluate my regression model?
This is very simple: I assume your regression function spits out a number that is somewhat different from your target value. It might even be simpler than clustering; let me explain why...
Difference in evaluation
In clustering you have labels. An element can have the wrong or the correct label, and there are different cases (false positive, false negative, true positive, true negative). Your evaluation might need to consider all of these cases (accuracy looks at overall correctness, regardless of the case).
In regression your result is a number (like 2.123) and your target is another number (like 1.100). Your error is the difference (in this case 1.023). You can then apply various ways of calculating how big your error is across all results, also considering positive and negative errors.
Ways to calculate
There are plenty of ways, and you need to pick what is right for you. Here are probably the two most popular ones:
Sum of squared errors (SSE): https://en.wikipedia.org/wiki/Residual_sum_of_squares
Mean squared error (MSE): https://en.wikipedia.org/wiki/Mean_squared_error
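To make the formulas concrete, here is a minimal NumPy sketch (the prediction and target values are made up):

```python
import numpy as np

# Hypothetical predictions and targets, just to illustrate the formulas.
y_true = np.array([1.100, 2.500, 0.300])
y_pred = np.array([2.123, 2.400, 0.250])

errors = y_pred - y_true          # signed errors; can be positive or negative
sse = np.sum(errors ** 2)         # Sum of Squared Errors (residual sum of squares)
mse = np.mean(errors ** 2)        # Mean Squared Error = SSE / n

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```

Squaring the errors makes positive and negative deviations count the same, which is exactly the concern mentioned above.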
Related
Hi, I've been doing a machine learning project about predicting whether a given (query, answer) pair is a good match (label the pair with 1 if it is a good match, 0 otherwise). The problem is that in the training set all the items are labelled with 1, so I'm confused because I don't think the training set has strong discriminative power. To be more specific, I can extract some features like:
1. textual similarity between query and answer
2. some attributes like the posting date, who created it, which aspect it is about, etc.
Maybe I should try semi-supervised learning (I've never studied it, so I have no idea if it will work)? But with such a training set I cannot even do validation...
Actually, you can train on a data set of only positive examples; a one-class SVM does this. However, this presumes that anything "sufficiently outside" the original data set is negative data, with "sufficiently outside" controlled mainly by nu (the allowed error rate) and the kernel parameters (gamma for an RBF kernel, or the degree for a polynomial kernel).
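As an illustration only, here is a rough sketch of that idea using scikit-learn's OneClassSVM; the feature matrix and parameter values are placeholders, not part of the original question:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical feature matrix built only from positive (query, answer) pairs,
# e.g. [textual_similarity, recency, ...]; the values here are random placeholders.
X_pos = np.random.rand(500, 3)

# nu bounds the fraction of training points treated as outliers;
# gamma controls the RBF kernel width.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_pos)

X_new = np.random.rand(10, 3)
pred = clf.predict(X_new)   # +1 = looks like the positive class, -1 = "sufficiently outside"
```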
A solution to your problem depends on the data you have. You are quite correct that a model trains better when given representative negative examples, and the description you give strongly suggests that you know your training set lacks them.
Do you need a strict +/- scoring for the matches? Most applications simply rank them: the match strength is the score. This changes your problem from a classification to a prediction case. If you do need a strict +/- partition (classification), then I suggest that you slightly alter your training set: include only obvious examples: throw out anything scored near your comfort threshold for declaring a match.
With these inputs only, train your model. You'll have a clear "alley" between good and bad matches, and the model will "decide" which way to judge the in-between cases in testing and production.
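If it helps, here is a small, purely illustrative sketch of that filtering step; the scores, threshold, and margin are all invented for the example:

```python
import numpy as np

# Hypothetical match-strength scores in [0, 1], a comfort threshold of 0.5,
# and a margin that defines the "alley" around it.
scores = np.random.rand(1000)
labels = (scores >= 0.5).astype(int)

margin = 0.1
keep = (scores <= 0.5 - margin) | (scores >= 0.5 + margin)  # drop borderline cases

scores_train = scores[keep]
labels_train = labels[keep]
# Train only on these "obvious" examples; the model decides the in-between cases later.
```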
I have a set of one-vs-all SVMs. These are LibSVM binary SVMs, each trained on one genuine class versus all other classes. I want to report the FAR and FRR of the system, but I appear to be getting very large FRR values and very small FAR values. This is because I use the positive test set from the genuine class as positive test data and the positive data from all other classes as negative test data for each test. This means that I get an equal number of false acceptances and false rejections: if a genuine sample is falsely rejected, another SVM will falsely accept it in another test for another user.
So the counts of false acceptances and false rejections are the same, but the FAR and FRR percentages are extremely different, because the negative dataset can be up to 100 times bigger than the positive set. This means that, if we have n false rejections (and consequently n false acceptances), then FRR = n/pos_data_size while FAR = n/(pos_data_size*100)!
I would like to nicely represent the error rates. But this seems to be very difficult to do! Is there a way that would work in this case?
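For concreteness, a quick sketch of the arithmetic described in the question (all counts and sizes are hypothetical):

```python
# Each false rejection of a genuine sample shows up as a false acceptance
# in another user's test, so the raw counts are equal but the rates are not.
n_false_rejects = 5            # hypothetical count
pos_size = 100                 # genuine samples per user (hypothetical)
neg_size = pos_size * 100      # impostor samples can be ~100x larger

frr = n_false_rejects / pos_size   # 5 / 100    = 5%
far = n_false_rejects / neg_size   # 5 / 10000  = 0.05%

print(f"FRR = {frr:.2%}, FAR = {far:.2%}")
```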
Do you know how to interpret RAE and RSE values? I know a COD closer to 1 is a good sign. Does this indicate that boosted decision tree regression is best?
RAE and RSE closer to 0 are a good sign: you want the error to be as low as possible. See this article for more information on evaluating your model. From that page:
The term "error" here represents the difference between the predicted value and the true value. The absolute value or the square of this difference are usually computed to capture the total magnitude of error across all instances, as the difference between the predicted and true value could be negative in some cases. The error metrics measure the predictive performance of a regression model in terms of the mean deviation of its predictions from the true values. Lower error values mean the model is more accurate in making predictions. An overall error metric of 0 means that the model fits the data perfectly.
Yes, with your current results, the boosted decision tree performs best. I don't know the details of your work well enough to determine whether that is good enough; it honestly may be. But if you decide it's not, you can also tweak the input parameters of your "Boosted Decision Tree Regression" module to try to get even better results. The "ParameterSweep" module can help with that: it tries many different input parameters for you, and you specify the metric you want to optimize (such as the RAE, RSE, or COD referenced in your question). See this article for a brief description. Hope this helps.
P.S. I'm glad that you're looking into the black carbon levels in Westeros...I'm sure Cersei doesn't even care.
For my research I am using Weka to predict alpha values for different uses. The legal range of alpha is any real number between 0 and 1 inclusive. The model is currently performing well, but some of the predictions are greater than 1. I want to keep the prediction numeric since alpha is a real number, but I want to limit the range of the predictions to between 0 and 1. Any ideas on how to do this?
I think that @Lars-Kotthoff raises interesting points. I will offer my suggestions from a different perspective, completely ignoring the classification question:
Once you have a set of predictions in a range like [0, inf), you can try to normalise them using a function such as a logistic (sigmoid) squashing or min-max scaling, among others.
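For illustration, a minimal NumPy sketch of both options (the prediction values are made up):

```python
import numpy as np

# Hypothetical raw predictions, some of which fall outside [0, 1].
preds = np.array([-0.2, 0.1, 0.7, 1.3])

# Min-max scaling squashes the observed range onto [0, 1].
minmax = (preds - preds.min()) / (preds.max() - preds.min())

# A logistic (sigmoid) squashing maps any real value into (0, 1).
logistic = 1.0 / (1.0 + np.exp(-preds))
```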
You can't do this in Weka. Whether it will be possible at all will depend on the implementation of the regression algorithm -- I'm not aware of something like this being implemented in any of the algorithms in Weka (although I might be wrong).
Even if it was implemented, the most likely thing that would happen is that everything greater than 1 would simply be replaced by 1. You can do the same thing by checking each prediction and replacing all values greater than 1.
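For example, a one-line post-processing step along those lines (assuming NumPy; the values are made up):

```python
import numpy as np

# Replace anything above 1 with 1 (and, symmetrically, anything below 0 with 0).
preds = np.array([-0.05, 0.3, 0.8, 1.2])
clipped = np.clip(preds, 0.0, 1.0)
```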
Taking the possible output range into account when training the regression model is unlikely to improve performance.
I'm implementing a one-versus-rest classifier to discriminate between neural data corresponding (1) to moving a computer cursor up and (2) to moving it in any of the other seven cardinal directions or not moving it at all. I'm using an SVM classifier with an RBF kernel (created by LIBSVM), and I did a grid search to find the best possible gamma and cost parameters. I have tried training with 338 elements from each of the two classes (undersampling my large "rest" class) and with 338 elements from my first class and 7218 from my second class using a weighted SVM.
I have also used feature selection to bring the number of features I'm using down from 130 to 10. I tried using the ten "best" features and the ten "worst" features when training my classifier. I have also used the entire feature set.
Unfortunately, my results are not very good, and moreover, I cannot find an explanation why. I tested with 37759 data points, where 1687 of them came from the "one" (i.e. "up") class and the remaining 36072 came from the "rest" class. In all cases, my classifier is 95% accurate BUT the values that are predicted correctly all fall into the "rest" class (i.e. all my data points are predicted as "rest" and all the values that are incorrectly predicted fall in the "one"/"up" class). When I tried testing with 338 data points from each class (the same ones I used for training), I found that the number of support vectors was 666, which is ten less than the number of data points. In this case, the percent accuracy is only 71%, which is unusual since my training and testing data are the exact same.
Do you have any idea what could be going wrong? If you have any suggestions, please let me know.
Thanks!
If the test dataset is the same as the training data, then your training accuracy was 71%. There is nothing wrong with that; the data may simply not be well separable by the kernel you used.
However, one point of concern: the high number of support vectors suggests probable overfitting.
Not sure if this amounts to an answer - it would probably be hard to give one without actually seeing the data - but here are some ideas regarding the issue you describe:
In general, an SVM tries to find a hyperplane that best separates your classes. However, since you have opted for one-versus-rest classification, you have no choice but to mix all negative cases together (your 'rest' class). This may make the 'best' separation much less suited to your problem. I'm guessing this is a major issue here.
To verify if that's the case, I suggest trying to use only one other cardinal direction as the negative set, and see if that improves results. In case it does, you can train 7 classifiers, one for each direction. Another option might be to use the multiclass option of libSVM, or a tool like SVMLight, which is able to classify one against many.
One caveat of most SVM implementations is that they do not cope well with large imbalances between the positive and negative sets, even with weighting. In my experience, weighting factors over 4-5 are problematic in many cases. On the other hand, since the variety on your negative side is large, taking equal set sizes might also be less than optimal. Thus, I'd suggest using something like the 338 positive examples and around 1000-1200 random negative examples, with weighting.
A little off-topic from your question, but I would also consider other types of classifiers. To start with, I'd suggest thinking about kNN.
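As a purely illustrative sketch of that setup, here is what the subsampling plus moderate weighting might look like with scikit-learn's SVC rather than LIBSVM directly; the features, weights, and random seed are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data shaped like the problem: 338 "up" trials and a large "rest" pool.
# The feature values themselves are random placeholders.
rng = np.random.default_rng(0)
X_pos = rng.normal(size=(338, 10))
X_neg_all = rng.normal(size=(7218, 10))

# Subsample ~1000 negatives instead of using all of them or only 338.
idx = rng.choice(len(X_neg_all), size=1000, replace=False)
X_neg = X_neg_all[idx]

X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos), dtype=int), np.zeros(len(X_neg), dtype=int)])

# Moderate class weighting (well under the ~100x imbalance of the full data).
clf = SVC(kernel="rbf", gamma="scale", class_weight={1: 3.0, 0: 1.0}).fit(X, y)
```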
Hope it helps :)