The use of PlusAnonymousUserDataModel - Mahout

What is wrong with the following code, and why does it produce no recommendations for the anonymous user?
I cannot figure out what's going wrong, but I can't get recommendations for the anonymous user with PlusAnonymousUserDataModel.
This is the example code. It shows no recommendations for the anonymous user, but it does give recommendations for a user in the model with exactly the same preferences:
public static void main(String[] args) throws Exception {
DataModel model = new GenericBooleanPrefDataModel(
GenericBooleanPrefDataModel.toDataMap(new FileDataModel(
new File(args[0]))));
PlusAnonymousUserDataModel plusAnonymousModel = new PlusAnonymousUserDataModel(model);
UserSimilarity similarity = new LogLikelihoodSimilarity(model);
UserNeighborhood neighborhood =
new NearestNUserNeighborhood(
Integer.parseInt(args[1]), similarity, model);
//new ThresholdUserNeighborhood(Float.parseFloat(args[1]), similarity, model);
System.out.println("Neighborhood=" + args[1]);
System.out.println("");
Recommender recommender = new GenericBooleanPrefUserBasedRecommender(model,
neighborhood, similarity);
PreferenceArray anonymousPrefs =
new BooleanUserPreferenceArray(12);
anonymousPrefs.setUserID(0,
PlusAnonymousUserDataModel.TEMP_USER_ID);
anonymousPrefs.setItemID(0, 1105L);
anonymousPrefs.setItemID(1, 1201L);
anonymousPrefs.setItemID(2, 1301L);
anonymousPrefs.setItemID(3, 1401L);
anonymousPrefs.setItemID(4, 1502L);
anonymousPrefs.setItemID(5, 1602L);
anonymousPrefs.setItemID(6, 1713L);
anonymousPrefs.setItemID(7, 1801L);
anonymousPrefs.setItemID(8, 1901L);
anonymousPrefs.setItemID(9, 2002L);
anonymousPrefs.setItemID(10, 9101L);
anonymousPrefs.setItemID(11, 9301L);
synchronized(anonymousPrefs){
plusAnonymousModel.setTempPrefs(anonymousPrefs);
List<RecommendedItem> recommendations1 = recommender.recommend(PlusAnonymousUserDataModel.TEMP_USER_ID, 20);
plusAnonymousModel.clearTempPrefs();
System.out.println("Recm for anonymous:");
for (RecommendedItem recommendation : recommendations1) {
System.out.println(recommendation);
}
System.out.println("");
}
List<RecommendedItem> recommendations = recommender.recommend(
Integer.parseInt(args[2]), 20);
System.out.println("Recomedation for user_id="
+ Integer.parseInt(args[2]) + ":");
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
System.out.println("");
The output produced by this code is as follows:
Neighborhood=100
Recm for anonymous:
Recomedation for user_id=1680604:
RecommendedItem[item:1701, value:24.363672]
...and so on. So there are no recommendations for the anonymous user! :(
It turns out that to get recommendations you must construct the similarity, neighborhood and recommender not with the "real" (file-based, in my case) persistent DataModel model, but with the PlusAnonymousUserDataModel plusAnonymousModel instead!
So the basic Mahout documentation ( https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/PlusAnonymousUserDataModel.html ) is wrong in stating ItemSimilarity similarity = new LogLikelihoodSimilarity(realModel); // not plusModel
Earlier, another person on SO had the same problem and didn't get an answer: Model creation for User User collanborative filtering
So I think I should go there and answer him. Sean Owen, thank you for your interest; can you confirm that the solution I found is the correct one?
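The self-answer's finding, that the temporary user is only visible to components built on plusAnonymousModel, can be illustrated without Mahout at all. The toy sketch below is plain Java, not Mahout's implementation: the class names, the TEMP_USER_ID value, and the Jaccard similarity (standing in for LogLikelihoodSimilarity) are all assumptions. A nearest-neighbor search that reads from the base model never sees the temporary user's items, so it finds no neighbor; the same search over the overlay model does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch, NOT Mahout's implementation: a base model plus an overlay that can hold
// preferences for one temporary user, mirroring the idea behind PlusAnonymousUserDataModel.
class AnonymousOverlayDemo {

    // Reserved ID for the temporary user (an assumption made for this sketch).
    static final long TEMP_USER_ID = Long.MIN_VALUE;

    interface Model {
        Set<Long> itemsOf(long userId);
        Set<Long> userIds();
    }

    // The "real" model: a fixed map of user ID -> liked item IDs (boolean preferences).
    static final class BaseModel implements Model {
        private final Map<Long, Set<Long>> prefs;
        BaseModel(Map<Long, Set<Long>> prefs) { this.prefs = prefs; }
        public Set<Long> itemsOf(long userId) {
            return prefs.getOrDefault(userId, Collections.emptySet());
        }
        public Set<Long> userIds() { return prefs.keySet(); }
    }

    // The overlay: answers for the temporary user itself and delegates everything else.
    static final class PlusTempModel implements Model {
        private final Model delegate;
        private Set<Long> tempItems; // null while no temporary user is set
        PlusTempModel(Model delegate) { this.delegate = delegate; }
        void setTempPrefs(Set<Long> items) { tempItems = items; }
        void clearTempPrefs() { tempItems = null; }
        public Set<Long> itemsOf(long userId) {
            if (userId == TEMP_USER_ID && tempItems != null) return tempItems;
            return delegate.itemsOf(userId);
        }
        public Set<Long> userIds() {
            Set<Long> ids = new HashSet<>(delegate.userIds());
            if (tempItems != null) ids.add(TEMP_USER_ID);
            return ids;
        }
    }

    // Jaccard similarity of two users' liked items (a stand-in for LogLikelihoodSimilarity).
    static double similarity(Model m, long a, long b) {
        Set<Long> common = new HashSet<>(m.itemsOf(a));
        common.retainAll(m.itemsOf(b));
        Set<Long> all = new HashSet<>(m.itemsOf(a));
        all.addAll(m.itemsOf(b));
        return all.isEmpty() ? 0.0 : (double) common.size() / all.size();
    }

    // Most similar other user according to m (similarity > 0), or null if there is none.
    static Long nearestNeighbor(Model m, long user) {
        Long best = null;
        double bestSim = 0.0;
        for (long other : m.userIds()) {
            if (other == user) continue;
            double s = similarity(m, user, other);
            if (s > bestSim) { bestSim = s; best = other; }
        }
        return best;
    }
}
```

In the question's code, the analogous change is passing plusAnonymousModel instead of model to the similarity, neighborhood, and recommender constructors, as the self-answer describes.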

Related

Three-Tier Architecture: Get All Data and Validations

The project I am working on is a 'University Management System', and it's a big one. Right now I am implementing the student registration section (a small portion of the project), which works fine. I've used a 'Three-Tier Architecture' and 'ORM - EF' in an ASP.NET MVC template. In the project, I need to do some validations for registering students depending on their year, department, etc. So there are layers: DAL, BLL, and finally the controller and view. I've done the validations in the controller, getting the data from the BLL, which in turn retrieves it from the DAL (this is the simple condition of 'Three-Tier Architecture'). So my questions are:
1) Is it OK to do the validations in the controller?
2) If not, and I need to do it in the BLL, will that be fine, and why? Or can I continue doing it in the controller?
Note: to me, doing the validations in the controller or in the BLL seems equally OK. Does the choice have any effect?
Right now, I've done the following:
DAL:
public List<Student> Add(int studentID, string studentName, string email, DateTime regDate)
{
List<Student> lst = null;
Student aStudent = new Student();
aStudent.StudentID = studentID;
aStudent.StudentName = studentName;
aStudent.Email = email;
aStudent.RegDate = regDate;
try
{
db.Students.Add(aStudent);
db.SaveChanges();
}
catch (Exception ex)
{
ex.ToString();
}
return lst;
}
BLL:
public List<Student> Add(int studentID, string studentName, string email, DateTime regDate)
{
return aStudentGateway.Add(studentID, studentName, email, regDate);
}
Controller:
/**Student Registration - Starts**/
[HttpPost]
public ActionResult AddStudent(Student aStudent)
{
List<Department> departments = aDepartmentManager.GetAllDepartments();
List<DepartmentViewModel> departmentsViewModel = aDepartmentManager.GetAllDepartmentViewModel();
DateTime yearInDateTime = Convert.ToDateTime(Request.Form["RegDate"]);
string extractYear = yearInDateTime.ToString();
var year = DateTime.Parse(extractYear).Year;
int department = Convert.ToInt32(Request.Form["Department"]);
List<Student> studentList = aStudentManager.GetAllStudents();
int count = 1;
var query = (from c in studentList
where c.Department == department && c.Year == year
select c).ToList();
foreach (var c in query)
{
if (query.Count() > 0)
{
int m = Convert.ToInt32(c.StudentID);
count = m + 1; //Incrementing the numbers by one with the table column
}
else
{
int m = 1;
count = m + 1; //Incrementing the numbers by one with the variable assigned one
}
}
Student student = new Student();
student.StudentName = Request.Form["StudentName"];
student.Email = Request.Form["Email"];
student.RegDate = Convert.ToDateTime(Request.Form["RegDate"]);
student.StudentID = count;
if (aStudentManager.ExistEmailAny(student.Email))
{
ViewBag.ErrorMessage = "Email already exists";
}
else
{
aStudentManager.Add(aStudent.StudentID, aStudent.StudentName, aStudent.Email, aStudent.RegDate);
ViewBag.Message = "Registration successful. See below to verify.";
/**This section used to show student details after registration**/
var result = (from c in departments
join d in departmentsViewModel on c.DepartmentID equals d.DepartmentId
where d.DepartmentId == department
select c);
foreach (var items in result)
{
if (count.ToString().Length > 1)
{
ViewBag.StudentID = items.Code + "-" + year + "-" + "0" + count;
}
else
{
ViewBag.StudentID = items.Code + "-" + year + "-" + "00" + count;
}
StudentViewModel.StudentID = student.StudentID;
StudentViewModel.StudentName = student.StudentName;
StudentViewModel.Email = student.Email;
StudentViewModel.RegDate = student.RegDate;
}
/**This section used to show student details after registration**/
}
return View();
}
/**Student Registration - Ends**/
I would provide multiple steps of validation in the different layers, depending on the context and the meaning of the layer.
First, it's a best practice to provide validation on both the client and the server side.
For the client side, you should provide field checks for required fields and other simple validations. If you are using MVC, you can use data annotations.
The same validation should be replicated in the controller. Here you should fail fast by applying some kind of contract to the parameters that have been passed. One good practice is using Code Contracts, which provide preconditions that must be satisfied before execution continues down your pipeline.
In the business layer, provide the checks required by the business logic.
Finally, in the data access layer, provide all the checks needed to persist your data. If you are using EF, a good practice is implementing IValidatableObject in your entity classes. In Scott Gu's blog you can find a post that explains this technique.
Even though this approach looks like it introduces repetition, it provides consistency in your data and separates concerns between your layers.
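The controller-level "fail fast" idea and the separate business-layer check can be sketched in Java (the question's code is C#, so treat this only as an outline of the structure, not of Code Contracts or data annotations). Everything here is hypothetical: the method names, the year range, and the in-memory set that stands in for the DAL's "email exists" query.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of validating in layers; names and rules are illustrative only.
final class LayeredValidation {

    // Controller boundary: cheap, fail-fast checks on the incoming parameters.
    static void checkArguments(String name, String email, int year) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(email, "email");
        if (name.trim().isEmpty()) throw new IllegalArgumentException("name must not be blank");
        if (email.indexOf('@') <= 0) throw new IllegalArgumentException("email looks malformed");
        if (year < 1900 || year > 2100) throw new IllegalArgumentException("year out of range");
    }

    // Business layer: rules that depend on existing data, such as a unique email.
    static final class StudentService {
        // Stands in for the DAL; a real system would query the database here.
        private final Set<String> knownEmails = new HashSet<>();

        // Returns false when the email is already registered ("Email already exists").
        boolean register(String name, String email, int year) {
            checkArguments(name, email, year); // shape checks first, before touching data
            // Set.add returns false if the normalized email was already present.
            return knownEmails.add(email.trim().toLowerCase(Locale.ROOT));
        }
    }
}
```

The shape checks throw because a malformed request is a caller error, while the duplicate email is an expected business outcome and is reported as a return value the controller can turn into a ViewBag message.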
1) Is it OK to do the validations in the controller?
Not at all. It would be better to use Data Annotation validator attributes and do the validation in your model class.
Second, you're doing DAL work in your controller, like:
List<Department> departments = aDepartmentManager.GetAllDepartments();
List<DepartmentViewModel> departmentsViewModel = aDepartmentManager.GetAllDepartmentViewModel();
var query = (from c in studentList
where c.Department == department && c.Year == year
select c).ToList();
All of these queries belong in the DAL, whose exact purpose is to interact with the database; that keeps your controller clean.
Third, if you pass a Student to the controller action, there is no need to read each attribute using Request.Form.
Hope this makes sense!
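As one example of moving logic out of the controller, the registration-number code (finding the previous sequence number for a department and year, then zero-padding it into a CODE-YEAR-NNN string) could live in a business-layer helper. This is a hypothetical Java sketch, not a translation of the question's C#: it takes the maximum existing sequence number rather than the last one in query order, and %03d pads to three digits, which matches the question's output for numbers below 100.

```java
import java.util.List;

// Hypothetical business-layer helper; names and the 3-digit width are assumptions.
final class RegistrationNumbers {

    // Next sequence number for one department and year, given the numbers already used.
    static int nextSequence(List<Integer> existingForDeptAndYear) {
        int max = 0;
        for (int n : existingForDeptAndYear) max = Math.max(max, n);
        return max + 1;
    }

    // Formats the display ID, e.g. department "CSE", year 2015, sequence 5 -> "CSE-2015-005".
    static String displayId(String departmentCode, int year, int sequence) {
        return String.format("%s-%d-%03d", departmentCode, year, sequence);
    }
}
```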

Updating database with Entity Framework, Comparing methods

I think these are essentially the same method, but the first one queries the DB first, so it performs worse because it hits the DB twice. I will only have 40 users at most, so performance isn't a big issue. Is there any other reason to use one over the other?
Grab the entity from the DB first, change it, then save it:
public void UpdateStudent(StudentModel model)
{
using (var _db = new AppEntities())
{
Student student = _db.Student.Find(model.StudentId);
student.FirstName = model.FirstName;
student.LastName = model.LastName;
student.DOB = model.DOB;
student.GradeId = model.GradeId;
_db.Entry(student).State = System.Data.Entity.EntityState.Modified;
_db.SaveChanges();
}
}
Change the entity and let EF find it in the DB and update:
public void UpdateStudent(StudentModel model)
{
using (var _db = new AppEntities())
{
Student student = new Student()
{
StudentId = model.StudentId,
FirstName = model.FirstName,
LastName = model.LastName,
DOB = model.DOB,
GradeId = model.GradeId
};
_db.Entry(student).State = System.Data.Entity.EntityState.Modified;
_db.SaveChanges();
}
}
In the first code snippet you take some version of the entity from the DB. If another thread or process modifies the same entity, I don't think EF would let you just do an update, since your base version of the entity differs from the one in the DB right before the update query.
In the second one, if some thread or process modifies this entity while you're processing this request, you could probably lose that change.
EDIT: I never tried that. I always get the entity, then modify and save it, but you could write a test to verify what happens.
In your first snippet, you don't have to mark the entity as Modified, because the change tracker takes care of that. This is important to note because it also defines the difference between the two methods. I'll explain.
Let's assume that of all assignments (student.FirstName = model.FirstName; etc.) only the first one is a real change. If so:
The first code fragment (but without marking the entity as Modified) triggers an update statement that only updates FirstName.
The second code fragment always updates all fields in Student.
This means that the first fragment is less likely to cause concurrency conflicts (someone else may change LastName in the meantime, and you don't overwrite this modification with stale data, as happens in the second scenario).
So it's about fine-grained changes vs. a sweeping update, roundtrips vs. redundancy:
the first scenario takes roundtrips but is more concurrency-safe.
the second scenario takes no roundtrips but is less concurrency-safe.
It's up to you to balance the trade-offs.
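The difference this answer describes can be illustrated with a toy model of snapshot change tracking: compare the values that were loaded with the current values and write only the columns that differ. This Java sketch is not Entity Framework's implementation, and the table and column names are made up for the example. Without a snapshot, as in the second snippet, the only safe option is to write every column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

// Toy sketch of snapshot-based change tracking (NOT Entity Framework's implementation).
final class SnapshotDiff {

    // Columns whose current value differs from the value originally loaded.
    static List<String> changedColumns(Map<String, Object> original, Map<String, Object> current) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(original.get(e.getKey()), e.getValue())) changed.add(e.getKey());
        }
        return changed;
    }

    // Parameterized UPDATE for just those columns; null when nothing changed (no statement).
    static String updateSql(String table, String keyColumn, List<String> columns) {
        if (columns.isEmpty()) return null;
        StringJoiner set = new StringJoiner(", ");
        for (String c : columns) set.add(c + " = ?");
        return "UPDATE " + table + " SET " + set + " WHERE " + keyColumn + " = ?";
    }
}
```

With a loaded snapshot where only FirstName changed, this yields UPDATE Student SET FirstName = ? WHERE StudentId = ?, so a concurrent change to LastName is left alone.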
To make this choice a little bit harder, there is a third option:
public void UpdateStudent(StudentModel model)
{
using (var _db = new AppEntities())
{
Student student = new Student()
{
StudentId = model.StudentId,
FirstName = model.FirstName,
LastName = model.LastName,
DOB = model.DOB,
GradeId = model.GradeId
};
_db.Students.Attach(student);
_db.Entry(student).Property(s => s.FirstName).IsModified = true;
_db.Entry(student).Property(s => s.LastName).IsModified = true;
_db.Entry(student).Property(s => s.DOB).IsModified = true;
_db.Entry(student).Property(s => s.GradeId).IsModified = true;
_db.SaveChanges();
}
}
No roundtrip, and now you only mark 4 properties as modified. You still update too many properties if only one actually changed, but four is better than all.
And there's more to this "roundtrips vs. redundancy" question, but I explained that elsewhere.

Mahout recommendation engine recommending products, with their quantities, to customers

I am working on a Mahout recommendation engine use case. I precomputed recommendations and stored them in a database. Now I am planning to expose them to .NET through Taste REST services. I have a limited number of customers and products; it is a distributor-level recommendation use case. My question is: if a new distributor comes in, how would I suggest recommendations to him? And how would I suggest the quantity of each recommended product to each distributor? Could you give me some guidance? Am I going to face performance issues?
One way is, when a new user comes, to recompute the recommendations from scratch, either for all users or only for this user. Note that this user might change the recommendations for the others too. How frequently you do the precomputation is up to your needs.
However, if you have a limited number of users and items, another way is to have an online recommender that computes the recommendations in real time. If you use the FileDataModel, there is a way to pick up the new user's data periodically (see the book Mahout in Action). If you use an in-memory data model, which is faster, you can override the methods setPreference(long userID, long itemID, float value) and removePreference(long userID, long itemID), and whenever a new user comes and likes or removes some items, you call these methods on your data model.
EDIT: Basically, you can take GenericDataModel and add the code below to its setPreference and removePreference methods. This will be your lower-level data model. You can then wrap it with ReloadFromJDBCDataModel by setting your data model in the reload() method like this:
DataModel newDelegateInMemory =
delegate.hasPreferenceValues()
? new MutableDataModel(delegate.exportWithPrefs())
: new MutableBooleanPrefDataModel(delegate.exportWithIDsOnly());
The overridden methods:
@Override
public void setPreference(long userID, long itemID, float value) {
userIDs.add(userID);
itemIDs.add(itemID);
setMinPreference(Math.min(getMinPreference(), value));
setMaxPreference(Math.max(getMaxPreference(), value));
Preference p = new GenericPreference(userID, itemID, value);
// User preferences
GenericUserPreferenceArray newUPref;
int existingPosition = -1;
if (preferenceFromUsers.containsKey(userID)) {
PreferenceArray oldPref = preferenceFromUsers.get(userID);
newUPref = new GenericUserPreferenceArray(oldPref.length() + 1);
for (int i = 0; i < oldPref.length(); i++) {
//If the item does not exist in the liked user items, add it!
if(oldPref.get(i).getItemID()!=itemID){
newUPref.set(i, oldPref.get(i));
}else{
//Otherwise remember the position
existingPosition = i;
}
}
if(existingPosition>-1){
//And change the preference value
oldPref.set(existingPosition, p);
}else{
newUPref.set(oldPref.length(), p);
}
} else {
newUPref = new GenericUserPreferenceArray(1);
newUPref.set(0, p);
}
if(existingPosition == -1){
preferenceFromUsers.put(userID, newUPref);
}
// Item preferences
GenericItemPreferenceArray newIPref;
existingPosition = -1;
if (preferenceForItems.containsKey(itemID)) {
PreferenceArray oldPref = preferenceForItems.get(itemID);
newIPref = new GenericItemPreferenceArray(oldPref.length() + 1);
for (int i = 0; i < oldPref.length(); i++) {
if(oldPref.get(i).getUserID()!=userID){
newIPref.set(i, oldPref.get(i));
}else{
existingPosition = i;
}
}
if(existingPosition>-1){
oldPref.set(existingPosition, p);
}else{
newIPref.set(oldPref.length(), p);
}
} else {
newIPref = new GenericItemPreferenceArray(1);
newIPref.set(0, p);
}
if(existingPosition == -1){
preferenceForItems.put(itemID, newIPref);
}
}
@Override
public void removePreference(long userID, long itemID) {
// User preferences
if (preferenceFromUsers.containsKey(userID)) {
List<Preference> newPu = new ArrayList<Preference>();
for (Preference p : preferenceFromUsers.get(userID)) {
if(p.getItemID()!=itemID){
newPu.add(p);
}
}
preferenceFromUsers.remove(userID);
preferenceFromUsers.put(userID, new GenericUserPreferenceArray(newPu));
}
if(preferenceFromUsers.containsKey(userID) && preferenceFromUsers.get(userID).length()==0){
preferenceFromUsers.remove(userID);
userIDs.remove(userID);
}
if (preferenceForItems.containsKey(itemID)) {
List<Preference> newPi = new ArrayList<Preference>();
for (Preference p : preferenceForItems.get(itemID)) {
if(p.getUserID() != userID){
newPi.add(p);
}
}
preferenceForItems.remove(itemID);
preferenceForItems.put(itemID, new GenericItemPreferenceArray(newPi));
}
if(preferenceForItems.containsKey(itemID) && preferenceForItems.get(itemID).length()==0){
//Not sure if this is needed, but it works without removing the item
//preferenceForItems.remove(itemID);
//itemIDs.remove(itemID);
}
}
If by "new distributor" you mean one for whom you have no data at all, no historical data, then you cannot make recommendations using Mahout's recommenders.
You can suggest other items once they choose one. Use Mahout's "itemsimilarity" driver to calculate similar items for everything in your catalog. Then, if they choose something, you can suggest similar items.
The items that come from the itemsimilarity driver can be stored in your DB as a column value containing the IDs of similar items, for every item. Then you can index the column with a search engine and use the user's first order as the query. This returns real-time personalized recommendations and is the most up-to-date method suggested by the Mahout people.
See this book by Ted Dunning, one of the leading Mahout data scientists, for a description of how to do this: http://www.mapr.com/practical-machine-learning
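The item-similarity approach in the second answer can be sketched as a simple lookup: for each item in the new distributor's first order, fetch its precomputed similar items and rank the candidates. This is a hypothetical in-memory stand-in for the stored column plus search-engine query the answer describes; scoring by one vote per matching list is an assumption, and a search engine would weight matches differently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: suggest items from precomputed "similar items" lists.
final class SimilarItemSuggester {

    private final Map<Long, List<Long>> similarItems; // itemId -> similar itemIds (precomputed)

    SimilarItemSuggester(Map<Long, List<Long>> similarItems) { this.similarItems = similarItems; }

    // Ranks candidates by how many ordered items list them as similar.
    List<Long> suggest(Collection<Long> ordered, int howMany) {
        Map<Long, Integer> votes = new HashMap<>();
        for (long item : ordered) {
            for (long candidate : similarItems.getOrDefault(item, Collections.emptyList())) {
                // Skip items the customer already ordered.
                if (!ordered.contains(candidate)) votes.merge(candidate, 1, Integer::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(votes.keySet());
        // Most votes first; ties broken by item ID so the output is deterministic.
        ranked.sort((a, b) -> {
            int byVotes = Integer.compare(votes.get(b), votes.get(a));
            return byVotes != 0 ? byVotes : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

For example, if item 1 is similar to items 2 and 3, and item 4 is similar to items 3 and 5, a first order of items 1 and 4 ranks item 3 first because both lists mention it.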

Mahout recommendation returns empty set

I am trying to run KnnItemBasedRecommender on the sample data "intro.csv" with the code below; however, I am getting an empty set as the result.
public static void main(String[] args) throws Exception {
DataModel model = NeuvidisData.convertToDataModel();
//RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) {
ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
Optimizer optimizer = new ConjugateGradientOptimizer();
return new KnnItemBasedRecommender(model, similarity, optimizer, 2);
}
};
Recommender rec = recommenderBuilder.buildRecommender(model);
List<RecommendedItem> rcList = rec.recommend(1, 2);
for(RecommendedItem item:rcList)
{
System.out.println("item:");
System.out.println(item);
}
}
Can anybody help me?
Presumably because your data is too small or sparse to make recommendations for user 1 using this algorithm. Without the data it's hard to say.
The following code worked for me.
ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
Optimizer optimizer = new ConjugateGradientOptimizer();
Recommender recommender = new KnnItemBasedRecommender(dataModel, similarity, optimizer, 5);
I used PearsonCorrelationSimilarity instead of LogLikelihoodSimilarity.
This may only work for a specific data set; whether it helps depends on your data.
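Both answers point at the data rather than the code. To see why tiny or sparse data hurts LogLikelihoodSimilarity in particular, here is a sketch of Dunning's log-likelihood ratio for a 2x2 co-occurrence table and the 1 - 1/(1 + LLR) mapping to a similarity. It is written from the standard formula; treat its exact correspondence to Mahout's internals as an assumption. The same perfectly correlated pattern scores much lower when it rests on one co-occurrence than on ten.

```java
// Sketch of Dunning's log-likelihood ratio (LLR) for a 2x2 co-occurrence table.
final class LogLikelihood {

    private static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

    // Unnormalized entropy: xLogX(sum) minus the sum of xLogX(element).
    private static double entropy(long... elements) {
        long sum = 0;
        double result = 0.0;
        for (long e : elements) { result += xLogX(e); sum += e; }
        return xLogX(sum) - result;
    }

    // k11 = both events, k12 and k21 = only one of them, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Maps an LLR in [0, infinity) to a similarity in [0, 1).
    static double similarity(long both, long onlyA, long onlyB, long neither) {
        double llr = logLikelihoodRatio(both, onlyA, onlyB, neither);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

For example, logLikelihoodRatio(10, 0, 0, 10) is 40 ln 2 (about 27.7), while logLikelihoodRatio(1, 1, 1, 1) is 0 because the two events are independent.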

Entity Framework - Disconnected Behavior in nTier

I am new to EF, but I will try my best to describe the scenario. I have 3 tables in my DB, namely RecommendationTopic, Recommendation and Question. Each RecommendationTopic can have multiple Recommendations, and each Recommendation may have multiple Questions. Assume that I already have predefined questions in my Question table.
I have one service that returns a list of questions, like below:
public List<Question> FetchQuestions(int categoryID)
{
using (Entities context = new Entities())
{
return context.Questions.Where(i => i.ID >= 0).ToList();
}
}
I have another service, used to create a RecommendationTopic and its Recommendations, whose code is something like this:
public void ManageRecommendation(RecommendationTopic recommendationTopic)
{
using (Entities context = new Entities())
{
context.AddObject("RecommendationTopics", recommendationTopic);
context.SaveChanges();
}
}
My client code looks like below:
List<Question> questions;
using (QuestionServiceClient client = new QuestionServiceClient())
{
questions = client.FetchQuestions();
}
using (RecommendationServiceClient client = new RecommendationServiceClient())
{
RecommendationTopic rTopic = new RecommendationTopic();
rTopic.CategoryID = 3;
rTopic.Name = "Topic From Client";
Recommendation rCom = new Recommendation();
rCom.Text = "Dont!";
rCom.RecommendationTopic = rTopic;
rCom.ConditionText = "Some condition";
rCom.Questions.Add(questions[0]);
rCom.Questions.Add(questions[1]);
client.ManageRecommendation(rTopic);
}
Since the client makes 2 separate service calls, the context is different for each call. When I run this and check the EF profiler, it generates inserts not only into RecommendationTopic and Recommendation but also into the Question table!
I am sure this is caused by the different contexts for the two calls, because when I execute similar code within a single context, it works as it's supposed to.
The question is: how do I make it work in a disconnected scenario?
My client could be a Silverlight client, where I need to fill a Question drop-down with one call and save the RecommendationTopic with a separate call. For this reason I am also using self-tracking entities.
Any input appreciated!
-Vinod
If you are using STEs (self-tracking entities), your ManageRecommendation should look like this:
public void ManageRecommendation(RecommendationTopic recommendationTopic)
{
using (Entities context = new Entities())
{
context.RecommendationTopics.ApplyChanges(recommendationTopic);
context.SaveChanges();
}
}
Calling AddObject bypasses the self-tracking behavior of your entities. If you are not using STEs, you must iterate through all questions and change their state to Unchanged:
public void ManageRecommendation(RecommendationTopic recommendationTopic)
{
using (Entities context = new Entities())
{
context.RecommendationTopics.AddObject(recommendationTopic);
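// Questions hang off each Recommendation (navigation property names assumed from the model described above)
foreach (var recommendation in recommendationTopic.Recommendations)
{
foreach (var question in recommendation.Questions)
{
// Existing questions must not be re-inserted, so mark them Unchanged
context.ObjectStateManager.ChangeObjectState(question, EntityState.Unchanged);
}
}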
context.SaveChanges();
}
}
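The non-STE fix relies on the fact that adding the root object marks every reachable object as Added, so pre-existing questions have to be switched back to Unchanged before saving. The toy Java model below mimics that state bookkeeping; it is not Entity Framework, and treating id > 0 as "already in the database" is an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT Entity Framework) of the state fix-up for a disconnected object graph.
final class GraphStates {

    enum State { ADDED, UNCHANGED }

    static final class Node {
        final int id; // assumption: id > 0 means the row already exists
        final List<Node> children = new ArrayList<>();
        Node(int id) { this.id = id; }
    }

    // Like adding the root: everything reachable from it becomes ADDED.
    static Map<Node, State> addGraph(Node root) {
        Map<Node, State> states = new IdentityHashMap<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (states.put(n, State.ADDED) == null) { // first visit only
                for (Node c : n.children) stack.push(c);
            }
        }
        return states;
    }

    // Like ChangeObjectState(..., Unchanged) for every entity that already exists.
    static void markExistingUnchanged(Map<Node, State> states) {
        for (Map.Entry<Node, State> e : states.entrySet()) {
            if (e.getKey().id > 0) e.setValue(State.UNCHANGED);
        }
    }
}
```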
