Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am working on a project that needs to determine whether a word is a fruit. I have tried several approaches but am not satisfied with any of the results. Any suggestions?
My training set looks like this:
Input: Apple is a fruit. Output: Apple.
Input: Guava is also a fruit Output: Guava.
Input: Pineapple is a seasonal fruit Output: Pineapple.
Example when running outside training data:
Input: I love all fruits but favorites are guava and apple. Output: Guava, Apple
This task is known as Named Entity Recognition. You can read about it on Wikipedia for starters.
A popular library for this is CoreNLP from Stanford. You can read about it on the Stanford Natural Language Processing Group's website.
To use it, you need to label each token (word) in your training data, indicating whether it is a fruit or not. Hope this helps.
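As a rough illustration of what per-token labeling looks like, here is a minimal Python sketch. The label names FRUIT and O, the helper names, and the tab-separated token/label layout are assumptions made for this example, not something the answer or the library prescribes. The lookup against a known fruit list is only a shortcut for bootstrapping; in practice the labels would be checked by hand.

```python
import re

# Hypothetical label names: "O" marks tokens that are not part of any entity.
FRUIT_LABEL = "FRUIT"
OTHER_LABEL = "O"


def label_tokens(sentence, known_fruits):
    """Split a sentence into tokens and pair each token with a label."""
    # Words become tokens; each punctuation mark becomes its own token.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    fruits = {fruit.lower() for fruit in known_fruits}
    return [
        (token, FRUIT_LABEL if token.lower() in fruits else OTHER_LABEL)
        for token in tokens
    ]


def to_training_lines(pairs):
    """Format (token, label) pairs as one 'token<TAB>label' line each."""
    return "\n".join(f"{token}\t{label}" for token, label in pairs)


pairs = label_tokens("Guava is also a fruit", {"apple", "guava", "pineapple"})
print(to_training_lines(pairs))
```

A trained sequence model would then predict these labels for unseen sentences, instead of relying on the lookup list.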
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a paragraph. The system has to understand it and answer all the questions the user asks about it. Please name the relevant techniques and methodologies.
It all depends on the problem you are trying to solve, the data available to you, and the underlying domain. Let's go through these one by one:
Type of Problem
There are multiple types of question answering systems. Some give one-word answers by extracting the exact answer from various sentences; others return the most similar sentence from a list of sentences based on the user's question, using various similarity and embedding techniques. I think the paper Teaching Machines to Read and Comprehend is a good place to start getting an idea of such systems.
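To make the "most similar sentence" variant concrete, here is a minimal Python sketch that ranks sentences by cosine similarity over raw word counts. Word counts are a simple stand-in for the embedding techniques mentioned above, and the function names are chosen for this example only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_sentence(question, sentences):
    """Return the sentence whose word counts best match the question."""
    q = bag_of_words(question)
    return max(sentences, key=lambda s: cosine_similarity(q, bag_of_words(s)))


sentences = [
    "The cat sat on the mat.",
    "Paris is the capital of France.",
    "Guava is a tropical fruit.",
]
print(most_similar_sentence("What is the capital of France?", sentences))
```

Swapping bag_of_words for a function that returns dense sentence embeddings would keep the rest of the ranking logic unchanged in spirit, although the similarity function would then operate on numeric vectors rather than counters.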
Dataset
Next comes the dataset. Various datasets are available for question answering systems, such as:
SQuAD dataset
QA dataset based on Wikipedia Articles
Facebook bAbI dataset
AllenAI dataset based on elementary science questions
NewsQA dataset
Methodologies
Well, there are multiple ways to solve this problem. It would be difficult to list all of them in one answer, but I can give you some references:
Deep Learning for Question Answering
Various Deep Learning models on Question answering
SQuAD dataset leaderboard
Question Answering based on Word Alignment
Attention Based Question Answering
Reasoning-based QA
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am working on a practical machine learning problem as an exercise. I just need help formulating my problem.
I have the text of 20 books by a famous old author. There are 5 more books whose authorship has been debated throughout history: do they belong to the same author or not?
I am thinking about the best way to represent this problem. I am considering a bag-of-words approach to find the most significant words used by the author.
Should I treat it as a Naive Bayes (spam/ham) problem, or should I use KNN classification (author/non-author) to detect the class of each document? Is there another way of doing it?
I think Naive Bayes can give you insights. Another way is to find features that separate such books, for example:
1. Complexity of words. Some writers are easy to understand and use common words. I am hinting at IDF (inverse document frequency).
2. Some words may not even have existed in his time, like "selfie", "mobile", etc.
Try to find many features like these; you can then also train a discriminative classifier.
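The IDF hint can be sketched in a few lines of Python. This is only one possible reading of it: the log(N / df) formula is the plain textbook form of IDF, and using the mean IDF of a text as a "word complexity" feature, as well as the unseen_idf fallback parameter, are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def inverse_document_frequency(documents):
    """Map each word to log(N / df), where df counts the documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def mean_idf(text, idf, unseen_idf):
    """Average IDF of the tokens in a text; higher suggests rarer vocabulary.

    unseen_idf is the value assigned to words absent from the reference corpus.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(idf.get(token, unseen_idf) for token in tokens) / len(tokens)


# Toy reference corpus, for illustration only.
corpus = ["the cat sat", "the dog ran", "the bird flew"]
idf = inverse_document_frequency(corpus)
print(mean_idf("the cat flew", idf, unseen_idf=math.log(len(corpus))))
```

A feature like this could be computed per book and fed to whichever classifier is chosen, alongside other stylistic features.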
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have the name and description of an event, and I want to find out its category (for example, whether it is an entertainment event, a political event, or something else).
I searched the web and looked at some natural language processing techniques such as Latent Dirichlet Allocation, but I cannot see a way to use them in my situation.
Is it a good idea to categorize by defining keywords for each category, then querying the text and deciding by the number of keywords found from each category?
Can someone give me a clue about my problem? Many thanks.
One approach is to start simple and use a Bayesian classifier to classify your data.
I would approach this problem by splitting your dataset into a training dataset and a non-training dataset. Then manually review each event in the training dataset and label it with an event type. Train your classifier on this labeled training dataset and run it against the remainder of your data.
This may not be ideal for a large number of event types, but it might be a way for you to get started on the problem.
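The train-then-classify workflow can be sketched with a small multinomial Naive Bayes classifier in plain Python. This is a minimal sketch, assuming add-one (Laplace) smoothing and a simple word tokenizer; the class name, the category names, and the toy events are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Manually labeled training events (toy data, for illustration only).
train_texts = [
    "Live concert with famous rock band",
    "Comedy show and movie night",
    "Election debate between candidates",
    "Parliament vote on new law",
]
train_labels = ["entertainment", "entertainment", "politics", "politics"]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(classifier.predict("Rock concert night"))
```

With real data, part of the labeled set would be held back to measure accuracy before trusting the predictions on the unlabeled remainder.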
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Hi, I want to predict the health level (high, medium, low) of a leaf using image processing and data mining. So far I have thought of extracting colors from the leaf and using a Bayes algorithm to predict its health. The data mining part is now complete, but I need extra features for prediction. We only used orchid leaves, so I can't use vein structure. Can anyone tell me what other features can be extracted from a leaf to identify its health level? Any ideas or comments will help me improve my project. Thanks.
There are many possible approaches to a problem like this. One common method is the bag-of-features model. Take a look at this example using the Computer Vision System Toolbox in MATLAB.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am new to data mining. I just want to know which feature selection method is easy and best for time series data.
My project is share market prediction. The following parameters are available, and I have to select the best 5 features for constructing my model. Can anyone help me decide which method to use and how to do it?
The available features are:
symbol, series, date, prev close, open price, high price, low price, last price, close price, average price, total traded quantity, turnover in lacs, deliverable qty, % deliv qty to traded qty
You can see those features at this link.
Click on it to see the data on the NSE website and how it is represented.
I will not do your homework and select the features (attributes) for you, but I can help you so that you can select them yourself:
You can use minimum-redundancy-maximum-relevance (mRMR).
Choose the features that are least redundant with one another and highly correlated with the output. Search for more information about the mRMR algorithm.
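The greedy idea behind mRMR can be sketched in plain Python. This is a simplified variant and not the reference algorithm: it uses absolute Pearson correlation in place of mutual information for both relevance and redundancy, and scores each candidate as relevance minus mean redundancy. The function names and the toy numbers are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedily pick k feature names from a dict of name -> list of values."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(sorted(remaining), key=score)  # sorted for a stable tie-break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy numbers, for illustration only: last_price duplicates open_price.
features = {
    "open_price": [1, 2, 3, 4, 1, 2, 3, 4],
    "last_price": [10, 20, 30, 40, 10, 20, 30, 40],
    "total_traded_qty": [1, 1, 1, 1, 3, 3, 3, 3],
}
target = [3, 5, 7, 9, 5, 7, 9, 11]
print(mrmr_select(features, target, k=2))
```

In this toy setup the second pick avoids the duplicated price column even though it is more correlated with the target, which is the redundancy penalty at work.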