How to remove redundant features using weka - machine-learning

I have around 300 features and I want to find the best subset of features by using feature selection techniques in weka. Can someone please tell me what method to use to remove redundant features in weka :)

There are mainly two types of feature selection techniques that you can use in Weka:
Feature selection with wrapper method:
"Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared, evaluated and compared to other combinations. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy.
The search process may be methodical such as a best-first search, it may be stochastic such as a random hill-climbing algorithm, or it may use heuristics, like forward and backward passes to add and remove features.
An example of a wrapper method is the recursive feature elimination algorithm." [From http://machinelearningmastery.com/an-introduction-to-feature-selection/]
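In Weka itself the wrapper approach lives under the Select attributes tab (WrapperSubsetEval plus a search method). As a hedged illustration of the same idea outside Weka, here is a minimal recursive-feature-elimination sketch in scikit-learn; the synthetic dataset, the logistic-regression model, and the target of 5 features are assumptions for illustration, not part of the question:

```python
# Wrapper-method sketch: recursive feature elimination (RFE) repeatedly
# fits a model and drops the weakest feature until the target count remains.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the 5 selected features
print(selector.ranking_)   # rank 1 = kept; higher ranks were eliminated earlier
```

Because the model is refit at every elimination step, the score of each candidate set reflects the features acting together, which is exactly what distinguishes wrappers from per-feature filters.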
Feature selection with filter method:
"Filter feature selection methods apply a statistical measure to assign a scoring to each feature. The features are ranked by the score and either selected to be kept or removed from the dataset. The methods are often univariate and consider the feature independently, or with regard to the dependent variable.
Examples of some filter methods include the Chi squared test, information gain and correlation coefficient scores." [From http://machinelearningmastery.com/an-introduction-to-feature-selection/]
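For comparison, here is a minimal filter-method sketch, again in scikit-learn purely for illustration (in Weka the analogous setup is ChiSquaredAttributeEval or InfoGainAttributeEval with the Ranker search; the iris dataset and k=2 are arbitrary choices):

```python
# Filter-method sketch: score each feature independently with the
# chi-squared statistic and keep only the top k; no model is in the loop.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)   # one chi-squared score per feature
print(X_new.shape)        # only the 2 highest-scoring columns survive
```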
If you are using Weka GUI, then you can take a look at two of my video casts here and here.

Related

What is the motivation for cross validation on feature selection preprocessing?

I saw several articles and examples of feature selection (wrapper & embedded methods) where they split the samples data into train and test sets.
I understand why we need cross-validation (splitting the data into train & test sets) for building and testing the score of the models (the actual prediction of the proposed algorithm).
But I can't understand the motivation for doing so during feature selection.
There is no ground truth telling us which features we should choose, so how can it improve the process of feature selection?
What is the benefit?
Most feature selection methods, such as wrapper models, require comparing a model's performance under different feature combinations.
Cross-validation provides a more robust means of comparing performance when different feature subsets are used, and therefore a more robust feature selection process. For example, if K-fold cross-validation is used, the comparison is based on the average of the errors across the folds, so the subset selected is the one expected to yield the smallest generalization error.
Also, the optimal hyper-parameters are not necessarily the same for different feature combinations. Cross-validation helps with tuning and therefore allows a fairer comparison.
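The comparison described above can be sketched in a few lines; the dataset and the two candidate subsets here are hypothetical, chosen only to show the mechanics of scoring each subset by its average over the folds:

```python
# Compare two candidate feature subsets by cross-validated average score,
# rather than by a single (noisy) train/test split.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

subsets = [[0, 1], [2, 3]]          # two hypothetical column subsets
mean_scores = [cross_val_score(model, X[:, cols], y, cv=5).mean()
               for cols in subsets]

# The subset with the best fold-averaged score is the one we keep.
best = subsets[int(np.argmax(mean_scores))]
print(mean_scores, best)
```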
This is also an informative resource on this topic.

Feature Selection Techniques

I am completely new to statistical modelling. I want to know what the feature selection techniques are.
Say I have 10 variables; I need to know which are the actually important ones among them.
I have read about feature selection on the internet and came across a few of the techniques:
Correlation
Forward Selection
Backward Elimination
But I am not getting how I can use them. How can correlation be used in feature selection? How do I perform forward selection / backward elimination, etc.?
What models can I use for feature selection? I just want a high-level overview of it: when to use what.
Can someone help me get started?
Correlation - In this approach we look at how strongly the target variable is correlated with each predictor, choose the ones that are highly correlated, and ignore the others.
Forward Selection - In this we start with 0 predictors and check the model performance. Then at every stage we add the one predictor that gives the best model performance.
Backward Elimination - In this we start with all the predictors. Then at every stage we remove the one predictor whose removal gives the best model performance.
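The forward and backward procedures above can be sketched with scikit-learn's SequentialFeatureSelector (available since scikit-learn 0.24; the diabetes dataset and the target of 4 features are illustrative assumptions):

```python
# Greedy sequential selection in both directions, as described above.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Forward: start from 0 predictors, add the best one at every stage.
forward = SequentialFeatureSelector(LinearRegression(),
                                    n_features_to_select=4,
                                    direction='forward').fit(X, y)
# Backward: start from all predictors, drop the least useful at every stage.
backward = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=4,
                                     direction='backward').fit(X, y)

print(forward.get_support())    # mask of features chosen by forward selection
print(backward.get_support())   # may differ: greedy paths need not agree
```

Note that the two directions can end up with different subsets, since each makes a greedy choice at every stage rather than searching all combinations.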

How many and/or what criteria does CfsSubsetEvaluator use in selecting features in each step of cross-validation while doing feature selection?

I am quite new to WEKA, and I have a dataset of 111 cases with 109 attributes. I am using feature selection tab in WEKA with CfsSubsetEval and BestFirst search method for feature selection. I am using leave-one-out cross-validation.
So, how many features does WEKA pick, or what is the stopping criterion for the number of features this method selects in each step of cross-validation?
Thanks,
Gopi
The CfsSubsetEval algorithm searches for a subset of features that work well together (low correlation among the features and high correlation with the target label). The score of a subset is called its merit (you can see it in the output).
The BestFirst search won't let you fix the number of features to select. However, you can use other methods such as GreedyStepwise, or use the InformationGain/GainRatio evaluators with the Ranker search and define the size of the feature set.
Another option you can use to influence the size of the set is the direction of the search (forward, backward, ...).
Good luck

SKlearn (scikit-learn) multivariate feature selection for regression

I want to use a feature selection method where "combinations" of features or "between features" interactions are considered for a simple linear regression.
SelectKBest only looks at each feature against the target, one at a time, and ranks them by Pearson's R values. While this is quick, I'm afraid it ignores some important interactions between features.
Recursive Feature Elimination first uses ALL my features, fits a Linear Regression model, and then kicks out the feature with the smallest absolute coefficient. I'm not sure whether this accounts for "between features" interaction... I don't think so, since it simply kicks out the smallest coefficient one at a time until it reaches your designated number of features.
What I'm looking for is, for those seasoned feature selection scientists out there, a method to find the best subset or combination of features. I read through all Feature Selection documentation and can't find a method that describes what I have in mind.
Any tips will be greatly appreciated!!!!!!
I want to use a feature selection method where "combinations" of features or "between features" interactions are considered for a simple linear regression.
For this case, you might consider using Lasso (or, actually, its elastic net refinement). Lasso minimizes the linear least-squares objective, but with absolute-value penalties on the coefficients. Some results from convex-optimization theory (mainly on duality) show that this penalty takes "between feature" interactions into account and removes the more inferior of correlated features. Since Lasso is known to have some shortcomings (it is constrained in the number of features it can pick, for example), a newer variant, elastic net, penalizes both the absolute values and the squares of the coefficients.
In sklearn, sklearn.linear_model.ElasticNet implements this. Note that this algorithm requires you to tune the penalties, which you'd typically do using cross validation. Fortunately, sklearn also contains sklearn.linear_model.ElasticNetCV, which allows very efficient and convenient searching for the values of these penalty terms.
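A minimal sketch of that workflow (the synthetic dataset and the l1_ratio grid are assumptions for illustration): ElasticNetCV picks the penalty strengths by cross-validation, and the features whose coefficients are driven exactly to zero are the ones effectively dropped.

```python
# Embedded selection via ElasticNetCV: the L1 part of the penalty zeroes
# out coefficients, so nonzero coefficients mark the selected features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=200, n_features=30,
                       n_informative=5, noise=1.0, random_state=0)

model = ElasticNetCV(l1_ratio=[0.5, 0.9, 1.0], cv=5, random_state=0)
model.fit(X, y)

selected = np.flatnonzero(model.coef_)   # indices with nonzero weight
print(model.alpha_, model.l1_ratio_)     # penalty values chosen by CV
print(selected)
```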
I believe you have to generate your combinations first and only then apply the feature selection step. You may use http://scikit-learn.org/stable/modules/preprocessing.html#generating-polynomial-features for the feature combinations.

Individual feature evaluator

I have a question regarding individual feature evaluator in data mining.
Can OneRAttributeEval, InfoGainAttributeEval, GainRatioAttributeEval, ChiSquaredAttributeEval
be used on non-binary class classifiers?
Yes, these feature selection techniques can be used in the context of a multiclass classification problem. Usually, if something works for two classes, it can be extended to handle multiple classes (2 or more). If you look briefly at how these techniques work, you will see why.
OneR basically constructs a single rule for a feature and calculates the classification accuracy, and the feature selection selects the feature that provides the best performance. Using only a single rule to evaluate the usefulness of features in the context of a multiclass problem may not be the best way, but it can be done.
With regards to the other three techniques, the measures - information gain, gain ratio, chi-square measure - used to evaluate the usefulness of features already take into account a weighted score for each class. Therefore, these techniques can be used for selecting features in the context of multiclass classification.
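As a quick check outside Weka, scikit-learn's mutual_info_classif (an estimator of the same information-gain quantity that InfoGainAttributeEval scores) runs happily on a 3-class label; the iris dataset here is an illustrative assumption:

```python
# Information-gain-style filter scoring on a multiclass (3-class) target.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)   # y has 3 classes, not 2

scores = mutual_info_classif(X, y, random_state=0)
print(scores)   # one non-negative score per feature; higher = more informative
```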
Also, a quick search and I found the following links:
OneRAttributeEval
InfoGainAttributeEval
GainRatioAttributeEval
ChiSquaredAttributeEval
It seems to me that you are looking at these exact functions. If you look at each of these links, you should be able to find "Capabilities". Below that, you can see that each of these functions can handle a "Nominal class" (in the "Class" row). Nominal means the class may take any number of discrete values, i.e., multiclass.
