Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Can someone suggest which feature selection techniques I should use for gene classification?
The major problem when working with gene expression data is the large number of dimensions combined with a small sample size. Instead of standard feature extraction/selection algorithms, gene expression data is generally handled with kernel-based methods such as KBMTL (kernelized Bayesian multitask learning) and NDR (nonlinear dimensionality reduction), or with regularized linear methods such as LASSO and elastic net.
You can check these papers to learn more about how to make efficient feature selection on gene expression data.
paper1
paper2
paper3
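As a rough illustration of the regularized linear route mentioned above, here is a minimal sketch of LASSO-based selection with scikit-learn. The data is synthetic (random numbers standing in for expression values), and the sample count, gene count, and `alpha` are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic stand-in for expression data: few samples, many "genes".
n_samples, n_genes = 80, 500
X = rng.standard_normal((n_samples, n_genes))

# Assumption: only the first three genes drive a continuous phenotype.
true_coef = np.zeros(n_genes)
true_coef[:3] = 5.0
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

# The L1 penalty (alpha) shrinks most coefficients to exactly zero.
model = Lasso(alpha=1.0, max_iter=10_000).fit(X, y)

# Genes with a nonzero coefficient are the selected features.
selected = np.flatnonzero(model.coef_)
print(f"kept {selected.size} of {n_genes} genes:", selected)
```

On real data `alpha` would normally be tuned by cross-validation, and `ElasticNet` from the same module can be swapped in when correlated genes should be kept together.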
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am looking for the correct machine learning approach for predicting lottery numbers. It does not need to be the most accurate, as long as it produces some predicted output. I am implementing regression-based and neural network models for this. Is there any specific approach for this?
It is impossible. The lottery numbers are random; to be more specific, the system is chaotic. You would need to know the initial configuration (positions, etc.) to insane (possibly infinite) precision to be able to make any predictions. Basically, don't even try it.
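The statistical side of this can be checked with a small simulation. The sketch below assumes a simplified lottery (one number drawn uniformly from 1..49 per draw) and a deliberately naive "model" that predicts the most frequent past number; it illustrates only that past draws carry no signal, not the chaotic-dynamics argument.

```python
import random
from collections import Counter

rng = random.Random(42)

# Assumption: a single number drawn uniformly from 1..49 per draw.
draws = [rng.randint(1, 49) for _ in range(20_000)]
history, future = draws[:10_000], draws[10_000:]

# "Model": always predict the number seen most often in the history.
prediction = Counter(history).most_common(1)[0][0]

# Compare the hit rate on unseen draws with pure chance.
hit_rate = sum(d == prediction for d in future) / len(future)
chance = 1 / 49
print(f"hit rate {hit_rate:.4f} vs. chance {chance:.4f}")
```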
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Can feature selection algorithms in Scikit-Learn or in other Python machine learning modules be used with categorical values in a dataset?
Not directly. If your column is categorical, you need to encode it into a numerical representation. If your column consists of textual entries, you would first need to transform the text into numerical vectors; see bag of words or tf-idf, for example.
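A minimal sketch of the encode-then-select step, assuming scikit-learn's `OneHotEncoder` for the encoding and a chi-squared test for the selection. The toy columns (`color`, `size`) and the target are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns (color, size).
X = np.array([
    ["red", "S"], ["blue", "M"], ["red", "L"], ["blue", "S"],
    ["red", "M"], ["blue", "L"], ["red", "S"], ["blue", "M"],
])
# Target is 1 exactly when color == "red".
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Step 1: encode each category as its own 0/1 column.
encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(X)

# Readable names for the encoded columns, e.g. "color=red".
names = [
    f"{col}={cat}"
    for col, cats in zip(["color", "size"], encoder.categories_)
    for cat in cats
]

# Step 2: run feature selection on the numeric representation.
selector = SelectKBest(chi2, k=2).fit(X_encoded, y)
kept = [names[i] for i in selector.get_support(indices=True)]
print(kept)
```

Here the two `color` indicator columns score highest because the target was built from `color` alone.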
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I want to train an output vector (produced by a deep learning model) to match a fixed vector. Hence, I chose the cosine similarity between the two vectors as the objective function. However, I don't know if that is the correct approach for my need.
No. The cosine similarity is a measure of how similar two items (samples in your dataset) are.
In contrast, the objective function when training a neural network should be a definition of the current estimation error over the data - so they are different things.
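For reference, the measure under discussion is cos(a, b) = a·b / (‖a‖ ‖b‖), which ranges from -1 to 1. A minimal sketch in plain Python (the function name is just for illustration):

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

The value depends only on the direction of the vectors, not on their lengths.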
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
I would like to know the best available algorithms for text classification. I want to classify documents into categories such as sports, banking, technology, etc. Please suggest good algorithms to get the highest accuracy.
There is no best algorithm. See the 4th Law of Data Mining, "NFL-DM": http://khabaza.codimension.net/index_files/9laws.htm
You do want an algorithm that can handle many columns, with more columns than rows if need be. This rules out matrix-based algorithms.
Naive Bayes and SVM are popular choices for text classification.
Good accuracy depends not only on the machine learning algorithm. It also depends on feature selection.
Try to define task specific features or analyze your feature space.
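A minimal sketch of the two popular choices named above, using scikit-learn pipelines with tf-idf features. The six training sentences and their labels are made up for illustration; a real corpus would be far larger and the results would need proper evaluation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus with three categories.
train_docs = [
    "football match goal team",
    "team wins football league",
    "bank loan interest rate",
    "bank account interest deposit",
    "software update phone release",
    "new phone software chip",
]
train_labels = ["sports", "sports", "bank", "bank", "technology", "technology"]

new_docs = ["football team goal", "bank interest loan"]

# Same tf-idf features, two different classifiers.
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(train_docs, train_labels)
svm = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(train_docs, train_labels)

nb_pred = list(nb.predict(new_docs))
svm_pred = list(svm.predict(new_docs))
print("Naive Bayes:", nb_pred)
print("Linear SVM: ", svm_pred)
```

Keeping the vectorizer inside the pipeline means the same feature extraction is applied at training and prediction time, which makes it easy to compare classifiers on equal footing.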
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am trying to predict tags for Stack Overflow questions, and I cannot decide which machine learning algorithm is the correct approach for this.
Input: I have mined Stack Overflow questions as a dataset, tokenized it, and removed stopwords and punctuation.
Things I have tried:
TF-IDF
Trained Naive Bayes on the dataset and then gave user-defined input to predict tags, but it's not working correctly
Linear SVM
Which kind of ML algorithm should I use, supervised or unsupervised? If possible, please suggest a correct ML approach from scratch. PS: I have the list of all tags present on Stack Overflow; will this help in any way? Thanks
I would try an MLP (multilayer perceptron). To begin, I would choose a reasonably small set of input keywords, encode them (as 1..100, for example), and train on a reasonably small set of output tags.
PS: Unsupervised learning for this task is unfavorable in general because many questions that refer to different tags have very similar content and are very likely to get clustered together.
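A minimal sketch of the supervised setup, treating tagging as multi-label classification with scikit-learn's `MLPClassifier`. The questions and tags are invented, and tf-idf capped at 100 terms is swapped in for the keyword encoding as one possible way to keep the input small; the layer size and iteration count are arbitrary assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each question can carry several tags.
questions = [
    "how to merge two dataframes in pandas",
    "pandas groupby sum over column",
    "center a div with css flexbox",
    "css grid layout two columns",
    "plot pandas dataframe with matplotlib",
    "style html table with css",
]
tags = [
    ["python", "pandas"],
    ["python", "pandas"],
    ["css", "html"],
    ["css"],
    ["python", "pandas", "matplotlib"],
    ["css", "html"],
]

# Small input vocabulary (at most 100 terms).
vectorizer = TfidfVectorizer(max_features=100, stop_words="english")
X = vectorizer.fit_transform(questions)

# One 0/1 output column per tag.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, Y)

new_question = ["pandas dataframe groupby column"]
pred = clf.predict(vectorizer.transform(new_question))
print(mlb.inverse_transform(pred))
```

A known tag list helps here because it fixes the output columns up front; in this sketch `MultiLabelBinarizer(classes=...)` would be the place to pass it.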