how to combine mahout recommendations [closed] - mahout

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'd like to build a simple recommendation system, say for an online shop where I have events such as purchases, likes, and views.
Currently, I understand how to build a recommendation for each of those event types separately, but I can't figure out how to combine those results to give a user a single list of the most relevant items.
It would be great if you could point me to the docs or briefly explain so I could google it.
Thanks in advance!

There are several ways to combine the recommendations.
One straightforward way is to build three recommenders (or as many as you need), put the recommendations from all of them into one list, and sort it by the estimated preference value. You can even write a wrapper recommender that combines your other recommenders underneath.
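As a rough illustration of the merge-and-sort idea, here is a minimal Python sketch. It is not Mahout code: the function name `merge_recommendations`, the sample data, and the choice to keep the highest score when an item appears in several lists are all assumptions made for the example. It also assumes the estimated preferences from the different recommenders are on a comparable scale.

```python
from typing import Dict, List, Tuple

# (item_id, estimated preference); a hypothetical shape for this sketch
Recommendation = Tuple[str, float]


def merge_recommendations(
    per_source: Dict[str, List[Recommendation]], top_n: int = 10
) -> List[Recommendation]:
    """Merge several recommendation lists and sort by estimated preference."""
    best: Dict[str, float] = {}
    for recs in per_source.values():
        for item_id, score in recs:
            # Assumption: when an item is recommended more than once,
            # keep its highest estimated preference.
            if item_id not in best or score > best[item_id]:
                best[item_id] = score
    # Sort by score (descending), then by item id to make ties deterministic.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


per_source = {
    "purchases": [("A", 4.5), ("B", 3.0)],
    "likes": [("B", 4.0), ("C", 2.5)],
    "views": [("D", 3.5), ("A", 1.0)],
}
merged = merge_recommendations(per_source, top_n=3)
print(merged)
```

If the recommenders produce scores on different scales, you would probably want to normalize or weight them before merging.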
Another way is to combine the similarity metrics instead of the recommendations. Here you write a CustomSimilarity class that implements UserSimilarity or ItemSimilarity, depending on what you need, and combines the outputs of your individual similarity metrics into one value as a linear combination. Be careful when combining similarities: they should all be user similarity measures or all be item similarity measures. You then use this CustomSimilarity measure in your recommender.
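The linear combination itself can be sketched in a few lines of Python. This only illustrates the idea and is not the Mahout interface: `combine_similarities`, `lookup`, the weights, and the similarity tables are hypothetical, and the result is divided by the sum of the weights so that it stays in the same range as the inputs.

```python
from typing import Callable, Dict, Sequence, Tuple

Similarity = Callable[[str, str], float]


def combine_similarities(weighted: Sequence[Tuple[float, Similarity]]) -> Similarity:
    """Return a similarity that is a weighted average of the given ones."""
    total = sum(weight for weight, _ in weighted)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def combined(a: str, b: str) -> float:
        return sum(weight * sim(a, b) for weight, sim in weighted) / total

    return combined


def lookup(table: Dict[Tuple[str, str], float]) -> Similarity:
    """Wrap a precomputed, symmetric item-item table as a similarity function."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return sim


# Both tables are item similarities; never mix user and item measures.
purchase_sim = {("A", "B"): 0.8}
view_sim = {("A", "B"): 0.4}

# Assumed weights: purchases count three times as much as views.
item_similarity = combine_similarities(
    [(3.0, lookup(purchase_sim)), (1.0, lookup(view_sim))]
)
score = item_similarity("B", "A")
print(round(score, 3))
```

The weights are something you would have to tune, for example by evaluating the resulting recommender on held-out data.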
You can read more about hybrid recommendation in this book.

Related

Which technique is used in Auto Answering user queries in Artificial Intelligence? [closed]

Closed 5 years ago.
I have a paragraph. The system has to understand it and answer all the questions the user asks about it. Please name the techniques and methodologies.
It all depends on the problem you are trying to solve, the data available to you, and the underlying domain. Let's go through these one by one:
Type of Problem
There are multiple types of question answering systems. Some give one-word answers by extracting the exact answer from various sentences; others return the sentence most similar to the user's question from a list of sentences, using various similarity and embedding techniques. I think the paper Teaching Machines to Read and Comprehend is a good place to start getting an idea of such systems.
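To make the "return the most similar sentence" variant concrete, here is a small sketch that uses plain bag-of-words cosine similarity. That is an assumption chosen for brevity; real systems typically use embeddings or trained models. The function names and the sample paragraph are made up for the example.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sentence(paragraph: str, question: str) -> str:
    """Return the sentence of the paragraph most similar to the question."""
    # Naive sentence splitting on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    q_vec = Counter(tokenize(question))
    return max(sentences, key=lambda s: cosine(Counter(tokenize(s)), q_vec))


paragraph = (
    "The shop opens at nine. The cat sleeps on the sofa. "
    "Paris is the capital of France."
)
answer = best_sentence(paragraph, "Where does the cat sleep?")
print(answer)
```

Word overlap breaks down quickly on paraphrased questions, which is where the embedding techniques mentioned above come in.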
Dataset
Next comes the dataset for such systems. Various datasets are available for question answering, such as:
SQuAD dataset
QA dataset based on Wikipedia articles
Facebook bAbI dataset
AllenAI dataset based on elementary science questions
NewsQA dataset
Methodologies
There are multiple ways to go about solving this problem. It would be difficult to list all of them in one answer, but I can give you some references:
Deep Learning for Question Answering
Various Deep Learning models on Question answering
SQuAD dataset leaderboard
Question Answering based on Word Alignment
Attention Based Question Answering
Reasoning-based QA

use catboost for ranking task [closed]

Closed 2 years ago.
I'd like to know how to configure catboost for a ranking task. The catboost homepage suggests that it can be used for ranking tasks. However, documentation for ranking tasks seems scarce:
https://tech.yandex.com/catboost/doc/dg/concepts/cli-reference_train-model-docpage/
and all of the tutorials are focused on classifying individual instances:
https://github.com/catboost/catboost/tree/master/catboost/tutorials
Ideally there would be some documentation or examples similar to LightGBM for ranking: https://github.com/Microsoft/LightGBM/tree/master/examples/lambdarank
Has anyone used catboost for ranking?
Starting from version 0.9, CatBoost supports several ranking modes.
To use a ranking mode, you need to build a dataset that contains groups of objects (use group_id for that). The algorithm will try to find the best order within each group.
There are two pairwise modes in CatBoost, PairLogit and PairLogitPairwise. For a pairwise mode you need to provide pairs as a part of your dataset. PairLogit is much faster but PairLogitPairwise might give better quality on large datasets.
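To show what "groups" and "pairs" could look like in practice, here is a sketch in plain Python that derives (winner, loser) index pairs from per-object relevance labels, only ever pairing objects from the same group. The helper `build_pairs` and the sample data are hypothetical. As far as I know, the CatBoost Python `Pool` accepts `group_id` and `pairs` arguments for this kind of data, but check the current documentation for the exact format before relying on it.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_pairs(
    group_ids: Sequence[int], labels: Sequence[float]
) -> List[Tuple[int, int]]:
    """Return (winner_index, loser_index) pairs within each group."""
    by_group: Dict[int, List[int]] = defaultdict(list)
    for index, group in enumerate(group_ids):
        by_group[group].append(index)

    pairs: List[Tuple[int, int]] = []
    for members in by_group.values():
        for i in members:
            for j in members:
                # Object i should be ranked above object j in this group.
                if labels[i] > labels[j]:
                    pairs.append((i, j))
    return pairs


# Two queries (groups): objects 0-2 belong to group 0, objects 3-4 to group 1.
group_ids = [0, 0, 0, 1, 1]
labels = [2, 1, 0, 0, 1]
pairs = build_pairs(group_ids, labels)
print(pairs)
```

Objects from different groups are never compared, which matches the idea that the algorithm only orders objects within a group.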
There are two ranking modes, YetiRank and YetiRankPairwise. To use them you need labels in your dataset. The trade-off between them is the same as above: YetiRankPairwise is more computationally expensive but might give better results.
There is also a mix between ranking and regression (QueryRMSE), a mix between ranking and classification (QueryCrossEntropy), and a QuerySoftMax loss.

What is the difference between feature engineering and feature extraction? [closed]

Closed 5 years ago.
I am struggling to find the difference between the two concepts. From what I understand both refer to turning raw data into more comprehensive features to describe the problem at hand. Are they the same thing? If not could anyone please provide examples for both?
Feature extraction is usually used when the original data is in a very different form, in particular when you could not have used the raw data directly.
E.g. the original data were images. You extract the redness value, or a description of the shape of an object in the image. It's lossy, but at least you get some result now.
Feature engineering is the careful preprocessing of data into more meaningful features, even if you could have used the old data.
E.g. instead of using the variables x, y, z, you decide to use log(x)-sqrt(y)*z, because your engineering knowledge tells you that this derived quantity is more meaningful for solving your problem. You get better results than without it.
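The two examples above can be sketched side by side. The definition of "redness" used here (mean red-channel intensity scaled to 0..1) is only one possible choice and is an assumption, as are the function names and sample values.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def mean_redness(pixels: Sequence[Pixel]) -> float:
    """Feature extraction: reduce an image to a single number in 0..1."""
    if not pixels:
        raise ValueError("image has no pixels")
    return sum(red for red, _, _ in pixels) / (255 * len(pixels))


def engineered_feature(x: float, y: float, z: float) -> float:
    """Feature engineering: the derived quantity log(x) - sqrt(y) * z."""
    return math.log(x) - math.sqrt(y) * z


# Extraction: the raw pixels were not usable directly, the number is.
image = [(255, 0, 0), (0, 0, 255)]
redness = mean_redness(image)

# Engineering: x, y, z were usable already, the combination is more meaningful.
derived = engineered_feature(1.0, 4.0, 3.0)
print(redness, derived)
```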
Feature engineering is transforming raw data into features/attributes that better represent the underlying structure of your data; it is usually done by domain experts.
Feature extraction is transforming raw data into the desired form.

Authorship Attribution using Machine Learning [closed]

Closed 7 years ago.
I am working on a practical machine learning problem as an exercise. I just need help formulating my problem.
I have the text of 20 books by a famous old author. There are 5 more books whose authorship has been debated throughout history: do they belong to the same author or not?
I am thinking about the best way to represent this problem. I am considering a bag-of-words approach to find the most significant words used by the author.
Should I treat it as a Naive Bayes (Spam/Ham) problem, or should I use KNN classification (Author/non-author) to detect the class of each document? Is there another way of doing it?
I think Naive Bayes can give you insights. Another way is to find features that separate such books, for example:
1. Complexity of words. Some writers are easy to understand and use common words; I am hinting at IDF (inverse document frequency).
2. Some words may not even have existed in his time, like "selfie", "mobile", etc.
Try to find a lot of features like that; you can then also train a discriminative classifier on them.
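The two features listed above could be computed roughly like this. It is a sketch under assumptions: the IDF table is built from the known books, words never seen there are treated as maximally rare, and the anachronism list is supplied by hand. All names and the toy texts are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def idf_table(documents: Sequence[str]) -> Dict[str, float]:
    """Inverse document frequency of every word in the known documents."""
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(documents)
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


def style_features(text: str, idf: Dict[str, float], anachronisms: Set[str]) -> Dict[str, float]:
    """Mean word rarity and number of words that postdate the author."""
    tokens = tokenize(text)
    if not tokens:
        return {"mean_idf": 0.0, "anachronism_count": 0}
    # Assumption: a word never seen in the known books is as rare as the rarest one.
    unseen = max(idf.values(), default=0.0)
    mean_idf = sum(idf.get(t, unseen) for t in tokens) / len(tokens)
    count = sum(1 for t in tokens if t in anachronisms)
    return {"mean_idf": mean_idf, "anachronism_count": count}


known_books = ["the sea was calm", "the sea was wild"]
idf = idf_table(known_books)
features = style_features("she took a selfie by the sea", idf, {"selfie", "mobile"})
print(features)
```

Feature dictionaries like this one, computed per book, are what you would feed into the discriminative classifier.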

Categorize social events [closed]

Closed 4 years ago.
I have the name and description of an event, and I want to find out the category of the event (for example, is it an entertainment event, a political event, or something else).
I was searching the web and looked at some natural language processing techniques such as Latent Dirichlet Allocation, but I cannot see a way to use them in my situation.
Is it a good idea to try to categorize by having predefined keywords for each category, and then to query the text and decide by the number of keywords from each category?
Can someone give me a clue about my problem? Many thanks.
One approach you could take is to start simple and use a Bayesian classifier to analyze/classify your data.
I would approach this problem by splitting your dataset into a training dataset and a non-training dataset. Then manually review each event in the training dataset and label it with an event type. Use this labeled training dataset to train your classifier, and run the classifier against the remainder of your data.
This may not be ideal for a large number of event types, but it might be a way for you to get started on the problem.
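A minimal version of that workflow might look like the following: a small multinomial naive Bayes classifier with add-one smoothing, trained on a handful of manually labeled events and then applied to unlabeled ones. The class name, the sample events, and the categories are invented for illustration; in practice you would more likely use an existing library implementation.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "NaiveBayes":
        self.class_counts: Counter = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(word | class)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Manually labeled training events (name plus description).
train_texts = [
    "Rock concert in the park with live music",
    "Comedy night and film screening",
    "Election debate between the candidates",
    "Town hall meeting on the new tax policy",
]
train_labels = ["entertainment", "entertainment", "politics", "politics"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in ["Live music festival", "Debate on tax policy"]]
print(predictions)
```

With only four training events this is a toy; the quality depends almost entirely on how many events you label by hand.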
