Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am working on a practical machine learning problem as an exercise. I just need help formulating my problem.
I have the text of 20 books by a famous old author. There are 5 more books whose authorship has been debated throughout history: do they belong to the same author or not?
I am thinking about the best way to represent this problem. I am considering a bag-of-words approach to find the most significant words used by the author.
Should I treat it as a Naive Bayes (spam/ham) problem, or should I use KNN classification (author/non-author) to detect the class of each document? Is there another way of doing it?
I think Naive Bayes can give you useful insights. Another approach is to look for features that separate such books, for example:
1. Complexity of vocabulary. Some writers are easy to understand and use common words; IDF (inverse document frequency) is one way to measure this.
2. Anachronisms. Some words, such as "selfie" or "mobile", may not even have existed in the author's time.
Try to find many features like these; you can then also train a discriminative classifier on them.
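The two features above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the toy "books", the function names, and the choice of log(N + 1) as the score for unseen words are all made up here, and a real study would use the full texts and a proper tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def inverse_document_frequency(documents):
    """Map each word to log(N / df), where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def mean_idf(text, idf, unseen_idf):
    """Average IDF of the tokens in text; higher means rarer vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(idf.get(tok, unseen_idf) for tok in tokens) / len(tokens)


def unseen_words(text, idf):
    """Words that never occur in the reference corpus (candidate anachronisms)."""
    return sorted(set(tokenize(text)) - set(idf))


# Toy stand-ins for the undisputed books and one disputed book (hypothetical data).
known_books = [
    "the king rode to the castle",
    "the queen rode to the river",
    "the king and the queen",
]
disputed = "the king took a selfie at the castle"

idf = inverse_document_frequency(known_books)
unseen_idf = math.log(len(known_books) + 1)  # assumed score for unseen words

print(unseen_words(disputed, idf))
print(mean_idf(known_books[0], idf, unseen_idf), mean_idf(disputed, idf, unseen_idf))
```

Values like these could be computed per book and fed, together with other features, into whichever classifier you choose.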
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I already have 85% accuracy with my sklearn text classifier. What are the advantages and disadvantages of building a rule-based system instead? Could it save me from doing double the work? Maybe you can provide sources and evidence for each side, so that I can make the decision based on my circumstances. In short: when is a rule-based approach favorable, and when is an ML-based approach favorable? Thanks!
Here is an idea:
Instead of going one way or the other, you can set up a hybrid model. Look at the typical errors your machine learning classifier makes, and see if you can come up with a set of rules that capture those errors. Then run these rules on your input: if one of them applies, stop there; if not, pass the input on to the classifier.
In the past I did this with a probabilistic part-of-speech tagger. It's difficult to tune a probabilistic model, but it's easy to add a few pre- or post-processing rules to capture some consistent errors.
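A minimal sketch of that rules-first pipeline, assuming regular-expression rules and a classifier exposed as a plain function. The patterns, labels, and `dummy_model` are hypothetical placeholders, not part of any library.

```python
import re

# Hypothetical hand-written rules, each a (compiled pattern, label) pair,
# written after inspecting the classifier's recurring mistakes.
RULES = [
    (re.compile(r"\bunsubscribe\b", re.IGNORECASE), "spam"),
    (re.compile(r"\binvoice #\d+\b", re.IGNORECASE), "billing"),
]


def classify_with_rules_first(text, model_predict, rules=RULES):
    """Return (label, source): the first matching rule wins, else the model decides."""
    for pattern, label in rules:
        if pattern.search(text):
            return label, "rule"
    return model_predict(text), "model"


def dummy_model(text):
    """Stand-in for a trained classifier's predict function (assumption)."""
    return "other"


print(classify_with_rules_first("Click here to unsubscribe now", dummy_model))
print(classify_with_rules_first("Hello, how are you?", dummy_model))
```

Returning the source alongside the label makes it easy to measure how often the rules fire and whether they actually beat the classifier on those inputs.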
https://www.linkedin.com/feed/update/urn:li:activity:6674229787218776064?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6674229787218776064%2C6674239716663156736%29
Yoel Krupnik (CTO & co-founder | smrt - AI For Accounting) writes:
I think it really depends on the specific problem. Some problems can be completely solved with rule based logic, some require machine learning (often in combination with rule based logic before or after).
The advantages of a rule-based approach are that it doesn't require labeled training data, can quickly provide decent results to use as a benchmark, and helps you better understand the problem for the future labeling or text manipulation required by the ML algorithm.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a paragraph. The system has to understand it and answer all the questions the user asks about it. Please name the techniques and methodologies.
It all depends on the problem you are trying to solve, the data available to you, and the underlying domain. Let's go through these one by one:
Type of Problem
There are multiple types of question answering systems. Some extract an exact one-word answer from various sentences; others return the sentence most similar to the user's question from a list of sentences, using various similarity and embedding techniques. I think the paper "Teaching Machines to Read and Comprehend" is a good place to start getting an idea about such systems.
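As a rough illustration of the "return the most similar sentence" type of system, here is a sketch that uses bag-of-words cosine similarity in plain Python. It is a deliberately simple stand-in for the embedding techniques mentioned above; the function names and sample sentences are assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_sentence(question, sentences):
    """Return the sentence whose word counts are closest to the question's."""
    q = bag_of_words(question)
    return max(sentences, key=lambda s: cosine_similarity(q, bag_of_words(s)))


paragraph = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "France borders Spain and Italy.",
]
print(most_similar_sentence("What is the capital of France?", paragraph))
```

Word overlap ignores synonyms and word order, which is exactly where learned embeddings tend to help.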
Dataset
Next comes the dataset for such a system. There are various datasets available for question answering, such as:
SQuAD dataset
QA dataset based on Wikipedia Articles
Facebook bAbI dataset
AllenAI dataset based on elementary science questions
NewsQA dataset
Methodologies
There are multiple ways to go about solving this problem. It would be difficult to list all of them in one answer, but I can give you some references:
Deep Learning for Question Answering
Various Deep Learning models on Question answering
SQuAD dataset leaderboard
Question Answering based on Word Alignment
Attention Based Question Answering
Reasoning-based QA
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am trying to predict tags for Stack Overflow questions, and I cannot decide which machine learning algorithm is the right approach.
Input: As a dataset I have mined Stack Overflow questions. I have tokenized the dataset and removed stopwords and punctuation.
Things I have tried:
TF-IDF
Trained Naive Bayes on the dataset and then gave it user-defined input to predict tags, but it's not working correctly
Linear SVM
Which kind of ML algorithm should I use, supervised or unsupervised? If possible, please suggest a correct ML approach from scratch. PS: I have the list of all tags present on Stack Overflow; will this help in any way? Thanks
I would try an MLP (multilayer perceptron). To begin, I would choose a reasonably small set of keywords as input, encode them (as 1..100, for example), and train on a reasonably small set of output tags.
PS: Unsupervised learning for this task is unfavorable in general because many questions that refer to different tags have very similar content and are very likely to get clustered together.
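The encoding step suggested above could look roughly like this. The sketch assigns each keyword and each tag an integer ID starting at 1 and then builds fixed-length 0/1 vectors, which is one common way (an assumption here, not the only option) to present such IDs to an MLP. The keyword and tag lists are made-up examples.

```python
def build_index(items):
    """Assign each distinct item a stable integer ID starting at 1."""
    return {item: i for i, item in enumerate(sorted(set(items)), start=1)}


def multi_hot(tokens, index):
    """Fixed-length 0/1 vector with a 1 in the slot of every known token."""
    vector = [0] * len(index)
    for token in tokens:
        if token in index:
            vector[index[token] - 1] = 1
    return vector


# Hypothetical small keyword and tag sets.
keywords = ["python", "java", "list", "thread", "pandas"]
tags = ["python", "java", "multithreading"]
keyword_index = build_index(keywords)
tag_index = build_index(tags)

question_tokens = ["how", "to", "sort", "a", "list", "in", "python"]
x = multi_hot(question_tokens, keyword_index)  # network input
y = multi_hot(["python"], tag_index)           # network target (may contain several 1s)
print(x, y)
```

Because a question can carry several tags, the target vector may contain more than one 1, which makes this a multi-label problem.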
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have the name and description of an event, and I want to determine the category of the event (for example, is it an entertainment event, a political event, or something else).
I searched the web and looked at some natural language processing techniques such as Latent Dirichlet Allocation, but I cannot see a way to use them in my situation.
Is it a good idea to categorize by defining keywords for each category, then searching the text and deciding by the number of keywords matched from each category?
Can someone give me a clue about my problem? Many thanks
One approach you could take is to start simple and use a Bayesian classifier to analyze and classify your data.
I would approach this problem by splitting your dataset into a training set and a non-training set. Then manually review each event in the training set and label it with its event type. Use this labeled training set to train your classifier, and run the classifier against the remainder of your data.
This may not be ideal for a large number of event types, but it might be a way for you to get started on the problem.
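To make the workflow concrete, here is a small multinomial Naive Bayes classifier written from scratch with add-one (Laplace) smoothing. It is a sketch under assumptions: the four labeled events are invented, words outside the training vocabulary are simply ignored, and a real project would more likely use an existing library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_events):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_events:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Manually labeled training events (hypothetical examples).
training_events = [
    ("Rock concert with live bands", "entertainment"),
    ("Comedy night and live music", "entertainment"),
    ("Election debate between candidates", "politics"),
    ("Parliament vote on the new budget", "politics"),
]
model = train_naive_bayes(training_events)
print(predict("Live music festival", model))
print(predict("Budget debate in parliament", model))
```

Holding back part of the labeled events as a test set, rather than training on all of them, gives an estimate of how well the classifier generalizes.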
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
(homework problem)
Which of the following problems are best suited for the learning approach?
Classifying numbers into primes and non-primes.
Detecting potential fraud in credit card charges.
Determining the time it would take a falling object to hit the ground.
Determining the optimal cycle for traffic lights in a busy intersection.
I'm trying to answer your question without doing your homework.
Basically you can think of machine learning as a way to extract patterns from data where all other approaches fail.
So first clue here: If there is an analytic way to solve the problem then don't use machine learning! The analytic algorithm will likely be faster, more efficient, and 100% correct.
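To illustrate what "an analytic way" looks like in code, here are two generic closed-form or exact procedures written as ordinary functions, with no training data involved. They are textbook examples chosen for this sketch (free fall without air resistance, and trial division), and the function names are made up.

```python
import math


def fall_time_seconds(height_m, g=9.81):
    """Time for an object dropped from rest to fall height_m metres,
    ignoring air resistance: t = sqrt(2 * h / g)."""
    return math.sqrt(2 * height_m / g)


def is_prime(n):
    """Exact primality test by trial division up to the integer square root."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


print(fall_time_seconds(20.0))
print([n for n in range(20) if is_prime(n)])
```

When a formula or exact algorithm like this exists, a learned model can at best approximate it.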
The second clue is: there has to be a pattern in the data. If you as a human see a pattern, machine learning can probably find it too. If lots of smart humans who are experts in the respective domain don't see a pattern, then machine learning will most likely fail. Chaos cannot be learned, i.e. classified or predicted.
That should answer your question. Make sure to also read the summary on Wikipedia to get an idea of whether a problem can be solved using supervised, unsupervised, or reinforcement learning.