For my research I am using Weka to predict alpha values for different uses. The legal range of alpha is any real number between 0 and 1 inclusive. It is currently performing well, but some of the predictions are greater than 1. I want to keep the classifier as numerical since it is a real number, but I want to limit the range of the prediction to between 0 and 1. Any ideas on how to do this?
I think that #Lars-Kotthoff raises interesting points. I would provide my suggestions from a different perspective, ignoring completely the classification problems:
Once you have a set of values within a range [0, inf), you can just try to normalised them using some function such as logit or min-max, among others.
You can't do this in Weka. Whether it will be possible at all will depend on the implementation of the regression algorithm -- I'm not aware of something like this being implemented in any of the algorithms in Weka (although I might be wrong).
Even if it was implemented, the most likely thing that would happen is that everything greater than 1 would simply be replaced by 1. You can do the same thing by checking each prediction and replacing all values greater than 1.
Taking the possible output range into account when training the regression model is unlikely to improve performance.
Related
I do have a question and I need your support. I have a data set which I am analyzing. I need to predict a target. To do this I did some data cleaning, among others drop highly (linear correlated feautes)
After preparing my data I applied random forest regressor (it is a regression problem). I am stucked a bit, since I really cannot catch the meaning and thus the value for max_features
I found the following page answer, where it is written
features=n_features for regression is a mistake on scikit's part. The original paper for RF gave max_features = n_features/3 for regression
I do get different results if I use max_features=sqrt(n) or max_features=n_features
Can any1 give me a good explanation how to approach this parameter?
That would be really great
max_features is a parameter that needs to be tuned. Values such as sqrt or n/3 are defaults and usually perform decently, but the parameter needs to be optimized for every dataset, as it will depend on the features you have, their correlations and importances.
Therefore, I suggest training the model many times with a grid of values for max_features, trying every possible value from 2 to the total number of your features. Train your RandomForestRegressor with oob_score=True and use oob_score_ to assess the performance of the Forest. Once you have looped over all possible values of max_features, keep the one that obtained the highest oob_score.
For safety, keep the n_estimators on the high end.
PS: this procedure is basically a grid search optimization for one parameter, and is usually done via Cross Validation. Since RFs give you OOB scores, you can use these instead of CV scores, as they are quicker to compute.
I am trying to build a classifier to predict breast cancer using the UCI dataset. I am using support vector machines. Despite my most sincere efforts to improve upon the accuracy of the classifier, I cannot get beyond 97.062%. I've tried the following:
1. Finding the most optimal C and gamma using grid search.
2. Finding the most discriminative feature using F-score.
Can someone suggest me techniques to improve upon the accuracy? I am aiming at at least 99%.
1.Data are already normalized to the ranger of [0,10]. Will normalizing it to [0,1] help?
2. Some other method to find the best C and gamma?
For SVM, it's important to have the same scaling for all features and normally it is done through scaling the values in each (column) feature such that the mean is 0 and variance is 1. Another way is to scale it such that the min and max are for example 0 and 1. However, there isn't any difference between [0, 1] and [0, 10]. Both will show the same performance.
If you insist on using SVM for classification, another way that may result in improvement is ensembling multiple SVM. In case you are using Python, you can try BaggingClassifier from sklearn.ensemble.
Also notice that you can't expect to get any performance from a real set of training data. I think 97% is a very good performance. It is possible that you overfit the data if you go higher than this.
some thoughts that have come to my mind when reading your question and the arguments you putting forward with this author claiming to have achieved acc=99.51%.
My first thought was OVERFITTING. I can be wrong, because it might depend on the dataset - But the first thought will be overfitting. Now my questions;
1- Has the author in his article stated whether the dataset was split into training and testing set?
2- Is this acc = 99.51% achieved with the training set or the testing one?
With the training set you can hit this acc = 99.51% when your model is overfitting.
Generally, in this case the performance of the SVM classifier on unknown dataset is poor.
Hello and thanks for helping,
My question is a long time problem that I try to tackle :
How do we train a neural network if the input is a probability rather than a value ?
To make it more intuitive :
Let's say we have 6 features and the value they may take is 1 or -1 for each.
Their value is determined probabilistically, such as the feature 1 can be 1 with 60% probability or -1 with 30% probability.
How do we train the network if in each trial, we may get a INPUT value in accordance with the probability distribution of each feature ?
Actually the answer is more straingthforward than you might expect, as many existing neural networks are actually trained exactly in this manner. You have to do ... nothing. Simply sample your batch in each iteration according to your distribution and that's all. Neural network does not require finite training set, thus you can efficiently train it on "potentialy ifinite" one (generator of samples). This is exactly what is being done in image processing with image augmentation - each batch consists random subsamples of the images (patches), which are sampled from very basic probability distributions.
#Nagabuhushan suggests solving different problem - where you know a priori probability of each sample, which, according to question is not the case:
we may get a INPUT value in accordance with the probability distribution of each feature
Plus, even if it would be the case, NNs are not good with multiplying thus one might need additional tweaking of architecture (log-transforms).
For the values you feed into the net, you should use the probabilities of each feature taking on the value 1. You could use the probabilities of them taking on -1, but be consistent. Also, determine some order of features and consistently order their probabilities, respectively.
Edit: I think I may have misunderstood the question. Do your inputs consist of probabilities, or 1's and -1's? If the latter, then a well-architected network should learn the distributions on its own. Just be sure to train it against the same input space that you'll be evaluating it against.
I'm doing some experiments with Weka Multilayer Perceptron, and I have some questions relating to its parameters. I've checked the help document but couldn't understand:
What is nominalToBinaryFilter? How to use?
normalizeAttribute: I think this is to scale value of features to [-1, 1] range. But how they do it in case the value is not numeric, for example with weather dataset.
reset: This will reset if the current training process diverges and start again with a lower learning rate. How much should we decrease the current learning rate? (how to identify the next learning rate)
Initial weights: This isn't a parameter, but how they initialize initial weights? Is it symmetric (something like values inside [-ε, +ε])?
It has been a while since i used WEKA, but here are my comments about bullets 2,3 and 4 which may seem useful to you:
Bullet 2: Normalization is non applicable to categorical (non numerical) attributes so you don't need to worry about this parameter.
Bullet 3: By default reset sets the learning rate to half. Adjustment of learning rate depends on many factors and I suggest searching scholarly articles in case you think you are not covered by the default approach. From my experiecnce,a rule of thumb is to alter learning rate in steps of 0,1
Bullet 4: Initial weights are small random numbers that are not identical
I know this is very old but wanted to add regarding bullet 4:
The seed paramter is used to seed a random number generator that is then used to generate the random initial weights. Therefore, if you wanted to explore sensitivity to initial weights, you can use different values here.
I am working on a classification problem, which has different sensors. Each sensor collect a sets of numeric values.
I think its a classification problem and want to use weka as a ML tool for this problem. But I am not sure how to use weka to deal with the input values? And which classifier will best fit for this problem( one instance of a feature is a sets of numeric value)?
For example, I have three sensors A ,B, C. Can I define 5 collected data from all sensors,as one instance? Such as, One instance of A is {1,2,3,4,5,6,7}, and one instance of B is{3,434,534,213,55,4,7). C{424,24,24,13,24,5,6}.
Thanks a lot for your time on reviewing my question.
Commonly the first classifier to try is Naive Bayes (you can find it under "Bayes" directory in Weka) because it's fast, parameter less and the classification accuracy is hard to beat whenever the training sample is small.
Random Forest (you can find it under "Tree" directory in Weka) is another pleasant classifier since it process almost any data. Just run it and see whether it gives better results. It can be just necessary to increase the number of trees from the default 10 to some higher value. Since you have 7 attributes 100 trees should be enough.
Then I would try k-NN (you can find it under "Lazy" directory in Weka and it's called "IBk") because it commonly ranks amount the best single classifiers for a wide range of datasets. The only issues with k-nn are that it scales badly for large datasets (> 1GB) and it needs to fine tune k, the number of neighbors. This value is by default set to 1 but with increasing number of training samples it's commonly better to set it up to some higher integer value in range from 2 to 60.
And finally for some datasets where both, Naive Bayes and k-nn performs poorly, it's best to use SVM (under "Functions", it's called "Lib SVM"). However, it can be hassle to set up all the parameters of the SVM to get competitive results. Hence I leave it to the end when I already know what classification accuracies to expect. This classifier may not be the most convenient if you have more than two classes to classify.