Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
I have the name and description of an event, and I want to find out the category of the event (for example, is it an entertainment event, a political event, or something else).
I searched the web and looked at some natural language processing techniques such as Latent Dirichlet Allocation, but I cannot see a way to use it in my situation.
Is it a good idea to categorize by having predefined keywords for each category, and then to search the text and decide by the number of keywords matched from each category?
Can someone give me a clue about my problem? Many thanks
One approach you could take is to start simple and use a Bayesian classifier to analyze/classify your data.
I would approach this problem by taking your dataset and splitting it into a training dataset and a non-training dataset. Then manually review each event in the training dataset and label it with its event type. Train your classifier on this labeled training dataset, then run it against the remainder of your data.
This may not be ideal for a large number of event types, but it might be a way for you to get started addressing the problem.
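To make the train-then-classify idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The training events, the category names and the helper names (`tokenize`, `train_naive_bayes`, `classify`) are all made up for illustration; in practice you would probably use a library implementation and far more labeled data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = [w.strip(".,!?()\"'").lower() for w in text.split()]
    return [w for w in words if w]


def train_naive_bayes(labeled_events):
    """labeled_events: list of (text, category) pairs labeled by hand."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of events
    vocab = set()
    for text, category in labeled_events:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category with the highest log posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Made-up training examples, for illustration only.
training = [
    ("Rock concert in the city park", "entertainment"),
    ("Stand-up comedy night and live music", "entertainment"),
    ("Mayor election debate at town hall", "politics"),
    ("Parliament vote on the new tax law", "politics"),
]
model = train_naive_bayes(training)
print(classify("Live music festival in the park", model))
print(classify("Debate about the election law", model))
```

With only four training events this is a toy, but the structure is the same as for a real dataset: label a subset by hand, train on it, then classify the rest.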
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a paragraph. The system has to understand it and answer all the questions the user asks about it. Please name the techniques and methodologies.
It all depends on the problem that you are trying to solve, the data available to you and the underlying domain. Let's go through these one by one:
Type of Problem
There are multiple types of question answering systems. Some give one-word answers by extracting the exact answer from various sentences. Others return the sentence most similar to the user's question from a list of sentences, using various similarity and embedding techniques. I think this paper, Teaching Machines to Read and Comprehend, should be a good place to start getting an idea about such systems.
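As a rough illustration of the "return the most similar sentence" variety, here is a minimal sketch that uses bag-of-words vectors and cosine similarity. This is just one simple similarity measure picked for the example (real systems typically use learned embeddings), and the function names and sample paragraph are made up.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase words after stripping basic punctuation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_sentence(question, sentences):
    """Return the sentence whose word counts best match the question."""
    q = bag_of_words(question)
    return max(sentences, key=lambda s: cosine_similarity(q, bag_of_words(s)))


paragraph = [
    "The Eiffel Tower is located in Paris.",
    "It was completed in 1889.",
    "The tower is made of wrought iron.",
]
print(most_similar_sentence("Where is the Eiffel Tower located?", paragraph))
```

Word overlap alone fails on paraphrases (a question about "when it was built" shares no words with "completed in 1889"), which is one reason the embedding-based approaches exist.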
Dataset
Next comes the dataset for such systems. There are various datasets available for question answering systems, such as:
SQuAD dataset
QA dataset based on Wikipedia Articles
Facebook bAbI dataset
AllenAI dataset based on elementary science questions
NewsQA dataset
Methodologies
Well there are multiple ways to go about solving this problem. It would be difficult to list all of them in one answer, but I can provide you some references:
Deep Learning for Question Answering
Various Deep Learning models on Question answering
SQuAD dataset Leaderboard
Question Answering based on Word Alignment
Attention Based Question Answering
Reasoning-based QA
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm new to machine learning, so I'm looking for some direction on how to get started. The end goal is to be able to train a model to count the number of objects in an image using TensorFlow. My initial focus will be to train the model to count one specific type of object. So let's say I take coins. I will only train the model to count coins. I'm not worried about creating a generic counter for all different types of objects. I've only done Google's example of image classification of flowers and I understand the basics of that. So I'm looking for clues on how to get started. Is this an image classification problem, and can I use the same logic as for the flowers?
Probably the best performing solution for the coin problem would be to treat it as a regression. Annotate 5k images with the number of objects in the scene and train your model on them. Then your model just outputs the count directly. (Hopefully)
Another way is to train a classifier that decides whether an image shows a coin, and use a sliding window approach like this one: https://arxiv.org/pdf/1312.6229.pdf to classify, for each window, whether it shows a coin. Then you count the regions found. This one is easier to annotate and learn, and it is easier to extend. But you have the problem of choosing good windows and combining the results of those windows in a consistent way.
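Here is a toy sketch of the sliding-window idea, just to show the moving parts. The "image" is a small grid of 0/1 pixels, `looks_like_coin` is a hypothetical stand-in for a trained classifier, and overlapping detections are merged with a simple greedy rule. None of this is taken from the linked paper.

```python
def sliding_windows(height, width, size, stride):
    """Yield the (row, col) of the top-left corner of every window."""
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            yield row, col


def looks_like_coin(image, row, col, size):
    """Hypothetical stand-in for a trained classifier: a window counts
    as a coin if every pixel in it is bright (equal to 1)."""
    return all(image[r][c] == 1
               for r in range(row, row + size)
               for c in range(col, col + size))


def overlaps(a, b, size):
    """True if two size x size windows with these corners overlap."""
    return abs(a[0] - b[0]) < size and abs(a[1] - b[1]) < size


def count_coins(image, size=2, stride=1):
    kept = []
    for row, col in sliding_windows(len(image), len(image[0]), size, stride):
        if not looks_like_coin(image, row, col, size):
            continue
        # Keep a detection only if it does not overlap one already kept,
        # so a coin covered by several windows is counted once.
        if not any(overlaps((row, col), k, size) for k in kept):
            kept.append((row, col))
    return len(kept)


# Made-up image: a 2x2 "coin" top-left and a 3x3 "coin" bottom-right.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
print(count_coins(image))
```

The greedy merge shows the window-choice problem directly: a coin much larger than the window would produce several non-overlapping detections and be counted more than once, so window size and the merging rule need tuning for real images.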
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am trying to predict tags for Stack Overflow questions and I am not able to decide which machine learning algorithm would be the right approach for this.
Input: As a dataset I have mined Stack Overflow questions. I have tokenized the dataset and removed stopwords and punctuation from it.
Things I have tried:
TF-IDF
Trained Naive Bayes on the dataset and then gave user-defined input to predict tags, but it's not working correctly
Linear SVM
Which ML algorithm should I use, supervised or unsupervised? If possible, please suggest a correct ML approach from scratch. PS: I have the list of all tags present on Stack Overflow, so will this help in any way? Thanks
I would try an MLP (multilayer perceptron). To begin, I would choose a reasonably small set of keywords for the input, encode them (as 1..100, for example), and train for a reasonably small set of output tags.
PS: Unsupervised learning is generally a poor fit for this task, because many questions that belong to different tags have very similar content and are very likely to get clustered together.
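To illustrate only the encoding step suggested above (not the MLP itself), here is a minimal sketch that turns a tokenized question into a fixed-length keyword vector and its tags into a target vector. The keyword list, tag list and helper names are hypothetical; a one-position-per-keyword encoding is one possible reading of "encode them".

```python
def build_index(items):
    """Map each distinct item to a position 0..n-1."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def multi_hot(tokens, index):
    """Vector with 1.0 at the position of every known token, 0.0 elsewhere."""
    vector = [0.0] * len(index)
    for token in tokens:
        if token in index:
            vector[index[token]] = 1.0
    return vector


# Hypothetical small keyword and tag sets, chosen only for illustration.
keywords = ["python", "list", "pointer", "malloc", "dataframe", "segfault"]
tags = ["python", "c", "pandas"]
keyword_index = build_index(keywords)
tag_index = build_index(tags)

question_tokens = ["how", "to", "sort", "a", "python", "list"]
x = multi_hot(question_tokens, keyword_index)  # network input
y = multi_hot(["python"], tag_index)           # training target
print(x)
print(y)
```

Since a question can carry several tags, the target vector can have more than one position set, so the network would need an output per tag rather than a single class.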
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am working on a practical machine learning problem as an exercise. I just need help formulating my problem.
I have the text of 20 books by a famous old author. There are 5 more books whose authorship has been debated throughout history: do they belong to the same author or not?
I am thinking about the best way to represent this problem. I am thinking of using a bag-of-words approach to find the most significant words used by the author.
Should I treat it as a Naive Bayes (spam/ham) problem, or should I use KNN classification (author/non-author) to detect the class of each document? Is there another way of doing it?
I think Naive Bayes can give you insights. Another way is to find features that separate such books, for example:
1. Complexity of words: some writers are easy to understand and use common words. I am hinting at IDF (inverse document frequency).
2. Some words may not even have existed in his time, like "selfie", "mobile", etc.
Try to find a lot of features like that, and you can also train a discriminative classifier.
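A minimal sketch of how such features could be computed, assuming each book is available as a plain string. The particular features (average word length, type-token ratio, average IDF) and the function names are illustrative choices, not a definitive feature set.

```python
import math


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if w]


def inverse_document_frequency(documents):
    """IDF per word: log(N / number of documents containing the word)."""
    n = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(tokenize(doc)):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def style_features(text, idf):
    """A few simple style features for one document."""
    words = tokenize(text)
    if not words:
        return {"avg_word_length": 0.0, "type_token_ratio": 0.0, "avg_idf": 0.0}
    known = [idf[w] for w in words if w in idf]
    return {
        # longer words on average suggest a more complex vocabulary
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # share of distinct words, a rough measure of vocabulary richness
        "type_token_ratio": len(set(words)) / len(words),
        # high average IDF means the text leans on rarer words
        "avg_idf": sum(known) / len(known) if known else 0.0,
    }


# Tiny made-up corpus standing in for the books.
books = ["the cat sat", "the dog ran"]
idf = inverse_document_frequency(books)
print(style_features("the cat sat", idf))
```

Each book then becomes a small feature vector, and the 20 known books plus books by other authors can be used to train the discriminative classifier.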
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'd like to build a simple recommendation system. Let's say it is for an online shop, where I have events like purchases, likes and views.
Currently, I understand how to build a recommendation for each of those types of events separately. But, I can't figure out how to combine those results to provide a user a single list of the most relevant items.
It would be great if you could point me to the docs or briefly explain so I could google it.
Thanks in advance!
There are different ways to combine the recommendations.
One straightforward way is to build three types of recommenders (or as many as you need), put the recommendations from all of them into one list, and sort it by the estimated preference value. You can even have a wrapper recommender that combines your other recommenders underneath.
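A minimal sketch of that merge step, assuming each recommender returns (item, estimated preference) pairs and that the scores are on a comparable scale. The function name and sample data are made up, and keeping the highest score for a duplicated item is just one possible policy.

```python
def merge_recommendations(*recommendation_lists, top_n=5):
    """Each list holds (item, estimated_preference) pairs from one recommender."""
    best = {}
    for recs in recommendation_lists:
        for item, score in recs:
            # If several recommenders suggest the same item, keep its highest score.
            if item not in best or score > best[item]:
                best[item] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical output of three separate recommenders.
from_purchases = [("book", 0.9), ("lamp", 0.4)]
from_likes = [("lamp", 0.7), ("mug", 0.5)]
from_views = [("pen", 0.6)]
print(merge_recommendations(from_purchases, from_likes, from_views))
```

If the recommenders score on different scales (say, predicted ratings versus view probabilities), the values would need normalizing before they can be sorted together.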
Another way is to combine the similarity metrics, instead of the recommendations. Again, you will have a CustomSimilarity class that implements the User/ItemSimilarity, depending on what you need, and combine the outputs of your individual similarity metrics into one as a linear combination. You should actually be careful when combining similarities. They should all be either User similarity measures or Item similarity measures. Then you will use this CustomSimilarity measure for your recommender.
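The linear combination can be sketched in plain Python as below. This is not the API of any particular library; `combine_similarities`, `jaccard`, the weights and the sample data are all assumptions for illustration. Note that both inputs are item similarities, in line with the caution above about not mixing user and item measures.

```python
def combine_similarities(similarities, weights):
    """Return an item-item similarity that is a weighted sum of the given ones."""
    if len(similarities) != len(weights):
        raise ValueError("need exactly one weight per similarity function")

    def combined(item_a, item_b):
        return sum(w * sim(item_a, item_b)
                   for sim, w in zip(similarities, weights))

    return combined


def jaccard(users_by_item):
    """Item similarity based on the overlap of users who interacted with each item."""
    def similarity(a, b):
        users_a = users_by_item.get(a, set())
        users_b = users_by_item.get(b, set())
        union = users_a | users_b
        return len(users_a & users_b) / len(union) if union else 0.0
    return similarity


# Made-up interaction data: which users purchased or viewed each item.
purchases = {"book": {"ann", "bob"}, "lamp": {"bob"}}
views = {"book": {"ann", "bob", "cy"}, "lamp": {"ann", "bob", "cy"}}

# Hypothetical weights: purchases count more than views.
item_similarity = combine_similarities(
    [jaccard(purchases), jaccard(views)], [0.7, 0.3])
print(item_similarity("book", "lamp"))
```

The weights express how much you trust each event type and would normally be tuned against held-out data.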
You can read more about hybrid recommendation in this book.