What does 0 as output mean in AverageAbsoluteDifferenceRecommenderEvaluator in Mahout? - mahout

I am currently playing with Apache Mahout and reading the book Mahout in Action, and I am confused about the evaluator used when evaluating a recommender system. Specifically, I want to ask about AverageAbsoluteDifferenceRecommenderEvaluator, i.e. what it means when it results in 0.
Does it mean there is no error, or does it mean the recommendation system is very bad?
Thanks.

0 means a perfect match.
The RecommenderEvaluator returns the average absolute difference between the Recommender's estimated preferences and the real values; lower scores mean a better match, and 0 means no error at all.
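In other words, the score is the mean absolute difference between estimated and real preference values. A minimal sketch of the computation (an illustration, not Mahout's actual implementation):

```python
def average_absolute_difference(estimated, actual):
    """Mean absolute difference between estimated and real preferences.

    A score of 0.0 means every estimate matched the held-out real
    preference value exactly; larger scores mean worse estimates.
    """
    diffs = [abs(e - a) for e, a in zip(estimated, actual)]
    return sum(diffs) / len(diffs)

# A recommender that predicts the held-out ratings perfectly scores 0.0:
print(average_absolute_difference([4.0, 3.5, 2.0], [4.0, 3.5, 2.0]))  # 0.0
```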

Related

LDA: Coherence Values using u_mass vs. c_v

I am currently attempting to record and graph coherence scores for various topic-number values in order to determine the number of topics that would be best for my corpus. After several trials using u_mass, the data proved to be inconclusive, since the scores don't plateau around a specific topic number. I'm aware that the coherence value is said to range from -14 to 14 when using u_mass; however, my values range from -2 to -1, and selecting an accurate topic number is not possible. Due to these issues, I attempted to use c_v instead of u_mass, but I receive the following error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
This is my code for computing the coherence value:
from gensim.models import CoherenceModel

cm = CoherenceModel(model=ldamodel, texts=texts, dictionary=dictionary, coherence='c_v')
print("THIS IS THE COHERENCE VALUE")
coherence = cm.get_coherence()
print(coherence)
If anyone could provide assistance in resolving my issues for either c_v or u_mass, it would be greatly appreciated! Thank you!
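The traceback quoted above is Python's standard multiprocessing bootstrapping error: c_v coherence starts worker processes, so on platforms that use the spawn start method the entry point must be protected by the usual `if __name__ == '__main__':` idiom. A generic sketch of that idiom using only the standard library (gensim itself is not imported here):

```python
import multiprocessing as mp

def square(x):
    return x * x

def main():
    # Any code that (directly or, as with gensim's c_v coherence,
    # indirectly) starts worker processes belongs under the guard below;
    # otherwise each spawned child re-imports the module and tries to
    # start workers again, producing the bootstrapping error.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    print(main())
```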

Mahout: What is the value returned by AverageAbsoluteDifferenceEvaluator on TanimotoCoefficientSimilarity?

I'm trying to improve the Mahout recommendation implementation in a project, and I found out that my predecessor used TanimotoCoefficientSimilarity for a dataset with preference values 1-5. I changed it to UncenteredCosineSimilarity, and now I'm trying to test the improvement in performance.
I tried using AverageAbsoluteDifferenceEvaluator on both, but realised that this should not be used for Tanimoto, since it does not return the expected value of the preference.
However, the value seems odd, and I don't quite understand what the value this implementation returns represents. The average preference value of the dataset is 3.2, and if Tanimoto were to return a value in the range [0, 1], then the output of AverageAbsoluteDifferenceEvaluator should be in the range [2.2, 3.2], but it consistently returns a value in the range [0.8, 1.1].
Does anyone have an explanation for this?
Thank you.
TanimotoCoefficientSimilarity works without preference values (it only looks at whether a user interacted with an item), so AverageAbsoluteDifferenceEvaluator does not make sense for TanimotoCoefficientSimilarity.

Is this a correct implementation of Q-Learning for Checkers?

I am trying to understand Q-Learning.
My current algorithm operates as follows:
1. A lookup table is maintained that maps each state to information about its immediate reward and utility for each available action.
2. At each state, check whether it is contained in the lookup table, and initialise it if not (with a default utility of 0).
3. Choose an action to take with a probability of:
(ε, with 0 < ε < 1, is the probability of taking a random action)
1-ε: choose the state-action pair with the highest utility.
ε: choose a random move.
ε decreases over time.
4. Update the current state's utility based on:
Q(s_t, a_t) += α[r_{t+1} + γ·max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
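The four steps above can be sketched as a tabular update (the state and action encodings here are hypothetical placeholders):

```python
import random
from collections import defaultdict

ALPHA = 0.1   # learning rate (the α in the update formula)
GAMMA = 0.9   # discount factor (the γ in the update formula)

# Q maps (state, action) pairs to utilities; missing entries default to 0,
# which implements step 2 (initialise unseen states with utility 0).
Q = defaultdict(float)

def choose_action(state, actions, epsilon):
    """Step 3: epsilon-greedy action selection."""
    if random.random() < epsilon:
        return random.choice(actions)                     # explore
    return max(actions, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state, next_actions):
    """Step 4: Q(s,a) += α * (r + γ * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```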
I am currently playing my agent against a simple heuristic player, which always takes the move that gives it the best immediate reward.
The results are very poor: even after a couple of hundred games, the Q-Learning agent is losing a lot more than it is winning. Furthermore, the change in win rate is almost non-existent, especially after a couple of hundred games.
Am I missing something? I have implemented a couple of agents:
(Rote-Learning, TD(0), TD(Lambda), Q-Learning)
But they all seem to be yielding similar, disappointing results.
There are on the order of 10²⁰ different states in checkers, and you need to play a whole game for every update, so it will be a very, very long time until you get meaningful action values this way. Generally, you'd want a simplified state representation, like a neural network, to solve this kind of problem using reinforcement learning.
Also, a couple of caveats:
Ideally, you should update one value per game, because the moves in a single game are highly correlated.
You should initialise action values to small random values to avoid large policy changes resulting from small Q updates.

Multi-variable Recommender System

I went through tutorials on implementing a recommender system, and most of them take one variable (rank).
I want to implement an item-based recommender system that takes multiple variables.
E.g.: let's say an item (a bar) has the following variables (with values ranging from -10 to +10 to express opposite polarities):
- price (cheap to expensive)
- environment (casual to fine)
- age range (young to adults)
Now I want to recommend items (bars) by looking at the list of bars registered in the user's history.
Is this kind of "multi-dimensional recommender system" possible to implement using Mahout or any other framework?
You want the multi-modal, multi-indicator, multi-variable (however you want to describe it) Universal Recommender. It can handle all this data. We've tested it on real datasets and get a significant boost in the precision test because of what we call "secondary indicators".
Good intuition. Give the UR a look: blog.actionml.com; check out the slides in one post. Code here: https://github.com/actionml/template-scala-parallel-universal-recommendation/tree/v0.3.0 Built on the new Spark version of Mahout: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
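As a minimal illustration of the underlying idea (a toy sketch, not the Universal Recommender), item-based recommendation over multi-attribute vectors can compare each candidate to a profile built from the user's history:

```python
import math

# Hypothetical bars described by (price, environment, age_range),
# each attribute in [-10, 10] as in the question.
bars = {
    "dive_bar":   (-8, -7, -5),
    "wine_bar":   ( 6,  8,  7),
    "sports_bar": (-4, -6, -2),
}

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(history, candidates):
    """Rank candidates by similarity to the average of the user's history."""
    profile = tuple(sum(dim) / len(history)
                    for dim in zip(*(bars[b] for b in history)))
    return sorted(candidates, key=lambda b: cosine(profile, bars[b]), reverse=True)
```

A user whose history contains only the dive bar would be recommended the sports bar ahead of the wine bar, since their attribute vectors point the same way.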

Which Improvements can be done to AnyTime Weighted A* Algorithm?

Firstly, for those of you who don't know: an anytime algorithm is an algorithm that gets as input the amount of time it can run, and it should give the best solution it can find in that time.
Weighted A* is the same as A*, with one difference in the f function
(where g is the path cost up to the node, and h is the heuristic estimate from the node to a goal):
Original: f(node) = g(node) + h(node)
Weighted: f(node) = (1-w)·g(node) + h(node)
My anytime algorithm runs Weighted A* with the weight decreasing from 1 to 0.5 until it reaches the time limit.
My problem is that most of the time it takes a long time until it reaches a solution, and if given something like 10 seconds it usually doesn't find a solution, while other algorithms like anytime beam search find one in 0.0001 seconds.
Any ideas what to do?
If I were you, I'd throw the unbounded heuristic away. Admissible heuristics are much better in that, given a weight value for a solution you've found, you can say that it is at most 1/weight times the length of an optimal solution.
A big problem when implementing A* derivatives is the data structures. When I implemented a bidirectional search, just changing from array lists to a combination of hash-augmented priority queues and array lists on demand cut the runtime cost by three orders of magnitude - literally.
The main problem is that most of the papers only give pseudo-code for the algorithm using set logic; it's up to you to figure out how to represent the sets in your code. Don't be afraid of using multiple ADTs for a single list, e.g. your open list. I'm not 100% sure on Anytime Weighted A*; I've done other derivatives such as Anytime Dynamic A* and Anytime Repairing A*, but not AWA* itself.
Another issue is that when you set the weight on the g-value too low, it can sometimes take far longer to find any solution than it would with a higher weight. A common pitfall is forgetting to check your closed list for duplicate states, thus ending up in a loop (infinite if the weight on g gets reduced to 0). I'd try starting with something reasonably higher than 0 if you're getting quick results with a beam search.
Some pseudo-code would likely help here! Anyhow, these are just my thoughts on the matter; you may have solved it already - if so, good on you :)
Beam search is not complete, since it prunes unfavorable states, whereas A* search is complete. Depending on what problem you are solving, if incompleteness does not prevent you from finding a solution (usually many correct paths exist from origin to destination), then go for beam search; otherwise, stay with AWA*. However, you can always run both in parallel if there are sufficient hardware resources.
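To make the moving parts concrete, here is a minimal sketch of Weighted A* with the f function from the question and the closed-list duplicate check mentioned above (the graph interface is hypothetical):

```python
import heapq

def weighted_astar(start, goal, neighbors, h, w):
    """Weighted A* using f(node) = (1 - w) * g(node) + h(node).

    neighbors(n) yields (successor, step_cost) pairs; h(n) is the
    heuristic estimate to the goal; w is assumed to lie in [0.5, 1),
    matching the decreasing schedule described in the question.
    """
    open_heap = [(h(start), 0, start, [start])]  # entries: (f, g, node, path)
    closed = set()
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        if node in closed:
            continue                  # closed-list duplicate-state check
        closed.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in closed:
                g2 = g + cost
                f2 = (1 - w) * g2 + h(nxt)
                heapq.heappush(open_heap, (f2, g2, nxt, path + [nxt]))
    return None                       # no path exists
```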
