My recommender returns an empty list - mahout-recommender

Please look at the following code. I am using the MovieLens 100k dataset, but the recommender does not return a list of recommended items; it returns an empty list. Where did I go wrong?
package org.psneog;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
public class MyUserBasedRecommender {

    public void myrecommend(String filename) throws IOException, TasteException {
        System.out.println("BEGIN myrecommend");
        DataModel model = new FileDataModel(new File(filename));
        System.out.println("MARK1 " + model.getNumItems());
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        System.out.println("MARK2 " + similarity);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        System.out.println("MARK3 " + neighborhood);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        System.out.println("MARK4 " + recommender);
        List<RecommendedItem> recommendations = recommender.recommend(751, 10);
        System.out.println("MARK5 " + recommendations);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
            System.out.println("MARK6");
        }
        System.out.println("END myrecommend");
    }

    public static void main(String[] args) throws IOException, TasteException {
        System.out.println("BEGIN");
        MyUserBasedRecommender myreco = new MyUserBasedRecommender();
        myreco.myrecommend("ml-100k/u.csv");
        System.out.println("END");
    }
}
It shows an empty list at MARK5:
BEGIN
BEGIN myrecommend
MARK1 997
MARK2 PearsonCorrelationSimilarity[dataModel:FileDataModel[dataFile:C:\Users\Prabhakar\workspace_reco\MahoutEvaluation\ml-100k\u.csv],inferrer:null]
MARK3 NearestNUserNeighborhood
MARK4 GenericUserBasedRecommender[neighborhood:NearestNUserNeighborhood]
MARK5 []
END myrecommend
END
Please help me.
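One way to narrow this down (a diagnostic sketch of my own, not part of the original post; model and neighborhood refer to the variables in myrecommend above): check whether user 751 has preferences and neighbors at all, since with NearestNUserNeighborhood(2, ...) an empty result can simply mean the two nearest neighbors rated nothing the target user has not already rated.

// Hypothetical diagnostic lines to add inside myrecommend() after the
// neighborhood is built; the calls below are standard Mahout Taste methods.
long userID = 751L;
System.out.println("prefs for user " + userID + ": "
    + model.getPreferencesFromUser(userID).length());
long[] neighbors = neighborhood.getUserNeighborhood(userID);
System.out.println("neighbors found: " + neighbors.length);
for (long n : neighbors) {
    System.out.println("neighbor " + n + " rated "
        + model.getPreferencesFromUser(n).length() + " items");
}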

Related

Apache Beam pipeline cannot be updated on Dataflow

I am running a Beam pipeline on Google Cloud Dataflow; however, the pipeline cannot be updated with exactly the same code. The code is as follows:
import com.google.common.collect.Iterables;
import com.google.common.primitives.Ints;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class PipelineTest {

    private static final Logger logger = LoggerFactory.getLogger(PipelineTest.class);

    public static void main(String[] args) {
        int[] shit = new int[1000];
        for (int i = 0; i < shit.length; i++) {
            shit[i] = i * i;
        }
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline pipeline = Pipeline.create(options);
        PCollection<Iterable<Integer>> sideInput =
            pipeline.apply("Create", Create.<Iterable<Integer>>of(Ints.asList(shit)));
        PCollectionView<Iterable<Integer>> view =
            sideInput.apply("CreateSideInput", View.asSingleton());
        PCollection<String> done =
            pipeline
                .apply(
                    "FakeData",
                    GenerateSequence.from(0).to(50_000).withRate(10, Duration.standardSeconds(1)))
                .apply(
                    "Map1",
                    ParDo.of(
                            new DoFn<Long, String>() {
                                @ProcessElement
                                public void processElement(ProcessContext ctx) {
                                    Long element = ctx.element();
                                    Iterable<Integer> v = ctx.sideInput(view);
                                    String out = "element " + element + ", value " + Iterables.size(v);
                                    logger.info("MAP1: " + out);
                                    ctx.output(out);
                                }
                            })
                        .withSideInputs(view))
                .apply(
                    "Map2",
                    ParDo.of(
                            new DoFn<String, String>() {
                                @ProcessElement
                                public void processElement(ProcessContext ctx) {
                                    String element = ctx.element();
                                    Iterable<Integer> v = ctx.sideInput(view);
                                    String out = "element " + element + ", value " + Iterables.size(v);
                                    logger.info("MAP2: " + out);
                                    ctx.output(out);
                                }
                            })
                        .withSideInputs(view));
    }
}
I tried providing a default value for the view, and I also tried using two views; neither works. However, if the view is used in two independent transforms, the pipeline can be updated.
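For what it's worth, a sketch of that last observation (my reading of it, not code from the original post): derive one PCollectionView per ParDo so that no two steps share the same side input.

// Hypothetical variant of the pipeline above: each ParDo gets its own
// PCollectionView derived from the same side-input PCollection.
PCollectionView<Iterable<Integer>> view1 =
    sideInput.apply("CreateSideInput1", View.<Iterable<Integer>>asSingleton());
PCollectionView<Iterable<Integer>> view2 =
    sideInput.apply("CreateSideInput2", View.<Iterable<Integer>>asSingleton());
// Map1 then declares .withSideInputs(view1) and reads ctx.sideInput(view1);
// Map2 uses view2, so the updated job graph no longer shares one side input.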

The import org.apache.lucene.queryparser cannot be resolved

I am using Lucene 6.6 and I am having difficulty importing org.apache.lucene.queryparser; I checked the Lucene documentation and it no longer seems to exist.
I am using the code below. Is there an alternative to QueryParser in Lucene 6?
import java.io.IOException;
import java.text.ParseException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
public class HelloLucene {

    public static void main(String[] args) throws IOException, ParseException {
        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching.
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // 1. Create the index.
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "Lucene in Action", "193398817");
        addDoc(w, "Lucene for Dummies", "55320055Z");
        addDoc(w, "Managing Gigabytes", "55063554A");
        addDoc(w, "The Art of Computer Science", "9900333X");
        w.close();

        // 2. Query. The "title" arg specifies the default field to use
        //    when no field is explicitly specified in the query.
        String querystr = args.length > 0 ? args[0] : "lucene";
        Query q = null;
        try {
            // In Lucene 6.x the classic QueryParser constructor takes only
            // the default field and the analyzer (no Version argument).
            q = new QueryParser("title", analyzer).parse(querystr);
        } catch (org.apache.lucene.queryparser.classic.ParseException e) {
            e.printStackTrace();
        }

        // 3. Search. In Lucene 6.x TopScoreDocCollector.create takes just
        //    the number of hits to collect.
        int hitsPerPage = 10;
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Display results.
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
        }

        // The reader can only be closed when there is no further need
        // to access the documents.
        reader.close();
    }

    private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        // Use a StringField for the ISBN because we don't want it tokenized.
        doc.add(new StringField("isbn", isbn, Field.Store.YES));
        w.addDocument(doc);
    }
}
Thanks!
The problem got solved.
Initially, only lucene-core-6.6.0 was added to the build path, but lucene-queryparser-6.6.0 is a separate jar that needs to be added as well (Maven coordinates: org.apache.lucene:lucene-queryparser:6.6.0).

Twitter authentication credentials

I am new to social network analysis and the Twitter API. I want to collect tweets on a certain topic, so I have written the following code:
package com;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Tweet;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
public class TwitterSearchAdvance {

    public static void main(String[] vishal) throws TwitterException, IOException {
        // List<Tweet> Data = new ArrayList<Tweet>();
        StringBuffer stringbuffer = new StringBuffer();
        Twitter twitter = new TwitterFactory().getInstance();
        for (int page = 0; page <= 4; page++) {
            Query query = new Query("Airtel");
            // query.setRpp(100); // 100 results per page
            QueryResult qr = twitter.search(query);
            List<Status> qrTweets = qr.getTweets();
            System.out.println("-------------------" + qrTweets.size());
            // break out if there are no more tweets
            for (Status t : qrTweets) {
                System.out.println(t.getCreatedAt() + ": " + t.getText());
                stringbuffer.append(t);
                stringbuffer.append("\n");
                BufferedWriter bw = new BufferedWriter(
                    new FileWriter(new File("/home/vishal/FirstDocu.txt"), true));
                bw.write(stringbuffer.toString());
                bw.newLine();
                // bw.write(t.getCreatedAt() + ": " + t.getText());
                bw.close();
            }
        }
    }
}
But when I run the program, the following error appears:
Exception in thread "main" java.lang.IllegalStateException: Authentication credentials are missing. See http://twitter4j.org/configuration.html for the detail.
at twitter4j.TwitterBaseImpl.ensureAuthorizationEnabled(TwitterBaseImpl.java:200)
at twitter4j.TwitterImpl.get(TwitterImpl.java:1833)
at twitter4j.TwitterImpl.search(TwitterImpl.java:282)
at com.TwitterSearchAdvance.main(TwitterSearchAdvance.java:28)
Where do I need to provide credentials in my program?
Thanks
Have a look at the options here: http://twitter4j.org/en/configuration.html
There are a number of ways to provide credentials to your program:
Properties file
ConfigurationBuilder class
System properties
Environment variables
All details and instructions can be found at that link.
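A minimal sketch of the ConfigurationBuilder option (my illustration, not from the original answer; the four placeholder strings must be replaced with the keys and tokens from your own Twitter app):

import twitter4j.Twitter;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class TwitterAuthExample {
    public static void main(String[] args) {
        // Placeholders: substitute the credentials from your own Twitter app.
        ConfigurationBuilder cb = new ConfigurationBuilder()
            .setOAuthConsumerKey("YOUR_CONSUMER_KEY")
            .setOAuthConsumerSecret("YOUR_CONSUMER_SECRET")
            .setOAuthAccessToken("YOUR_ACCESS_TOKEN")
            .setOAuthAccessTokenSecret("YOUR_ACCESS_TOKEN_SECRET");
        // Build the factory from this configuration instead of the no-arg
        // factory, which relies on a twitter4j.properties file being present.
        Twitter twitter = new TwitterFactory(cb.build()).getInstance();
        System.out.println("Authenticated client created: " + twitter);
    }
}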

Run JesterRecommenderEvaluationRunner, but get no results of evaluation

I downloaded the Jester example code in Mahout and tried to run it on the Jester dataset to see the evaluation results. The run completes successfully, but the console shows only:
log4j:WARN No appenders could be found for logger (org.apache.mahout.cf.taste.impl.model.file.FileDataModel).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I expect to see an evaluation score in the range 0 to 10. Can anyone help me figure out how to get the score?
I am using mahout-core-0.6.jar, and the following is the code:
JesterDataModel.java:
package Jester;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.regex.Pattern;
import com.google.common.collect.Lists;
import org.apache.mahout.cf.taste.example.grouplens.GroupLensDataModel;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericPreference;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.common.iterator.FileLineIterator;
//import org.apache.mahout.cf.taste.impl.common.FileLineIterable;
public final class JesterDataModel extends FileDataModel {

    private static final Pattern COMMA_PATTERN = Pattern.compile(",");

    private long userBeingRead;

    public JesterDataModel() throws IOException {
        this(GroupLensDataModel.readResourceToTempFile("\\jester-data-1.csv"));
    }

    public JesterDataModel(File ratingsFile) throws IOException {
        super(ratingsFile);
    }

    @Override
    public void reload() {
        userBeingRead = 0;
        super.reload();
    }

    @Override
    protected DataModel buildModel() throws IOException {
        FastByIDMap<Collection<Preference>> data = new FastByIDMap<Collection<Preference>>();
        FileLineIterator iterator = new FileLineIterator(getDataFile(), false);
        FastByIDMap<FastByIDMap<Long>> timestamps = new FastByIDMap<FastByIDMap<Long>>();
        processFile(iterator, data, timestamps, false);
        return new GenericDataModel(GenericDataModel.toDataMap(data, true));
    }

    @Override
    protected void processLine(String line,
                               FastByIDMap<?> rawData,
                               FastByIDMap<FastByIDMap<Long>> timestamps,
                               boolean fromPriorData) {
        FastByIDMap<Collection<Preference>> data = (FastByIDMap<Collection<Preference>>) rawData;
        String[] jokePrefs = COMMA_PATTERN.split(line);
        int count = Integer.parseInt(jokePrefs[0]);
        Collection<Preference> prefs = Lists.newArrayListWithCapacity(count);
        for (int itemID = 1; itemID < jokePrefs.length; itemID++) { // skip the first entry; it is just a count
            String jokePref = jokePrefs[itemID];
            if (!"99".equals(jokePref)) {
                float jokePrefValue = Float.parseFloat(jokePref);
                prefs.add(new GenericPreference(userBeingRead, itemID, jokePrefValue));
            }
        }
        data.put(userBeingRead, prefs);
        userBeingRead++;
    }
}
JesterRecommenderEvaluatorRunner.java
package Jester;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
public final class JesterRecommenderEvaluatorRunner {

    private static final Logger log = LoggerFactory.getLogger(JesterRecommenderEvaluatorRunner.class);

    private JesterRecommenderEvaluatorRunner() {
        // do nothing
    }

    public static void main(String... args) throws IOException, TasteException {
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        DataModel model = new JesterDataModel();
        double evaluation = evaluator.evaluate(new JesterRecommenderBuilder(),
                                               null,
                                               model,
                                               0.9,
                                               1.0);
        log.info(String.valueOf(evaluation));
    }
}
Mahout 0.7 is old, and 0.6 is very old. Use at least 0.7, or better, a later build from SVN.
I think the problem is exactly what you identified: you don't have any slf4j bindings on your classpath. If you use the ".job" files in Mahout, you will have all dependencies packaged, and then you will actually see output.
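For illustration only (my addition, not from the original answer, and assuming the slf4j-log4j12 binding plus log4j 1.2 are on the classpath): calling log4j's BasicConfigurator at the top of main gives the root logger a console appender, which silences the "No appenders could be found" warning and makes the log.info(...) call in the runner visible.

import org.apache.log4j.BasicConfigurator;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class Log4jSetupExample {

    private static final Logger log = LoggerFactory.getLogger(Log4jSetupExample.class);

    public static void main(String[] args) {
        // Attach a simple console appender to the root logger; without this
        // (or a log4j.properties on the classpath) log output goes nowhere.
        BasicConfigurator.configure();
        log.info("logging is configured; evaluation scores will now be printed");
    }
}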

parsing with dom4j

I can successfully retrieve the response data using the XPath expression /abcde/response from this XML:
<abcde>
    <response>000</response>
</abcde>
But I couldn't retrieve the response data from the same XML once it carries additional attributes and a namespace:
<abcde version="8.1" xmlns="http://www.litle.com/schema"
       response="0" message="Valid Format">
    <response>000</response>
</abcde>
What am I doing wrong?
package stackoverflow;
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentFactory;
import org.dom4j.io.SAXReader;
public class MakejdomWork {

    public static void main(String[] args) {
        new MakejdomWork().run();
    }

    public void run() {
        ByteArrayInputStream bis = new ByteArrayInputStream(
            ("<abcde version=\"8.1\" xmlns=\"http://www.litle.com/schema\" "
                + "response=\"0\" message=\"Valid Format\"> <response>000</response></abcde>").getBytes());
        // ByteArrayInputStream bis = new ByteArrayInputStream("<abcde><response>000</response></abcde>".getBytes());
        Map<String, String> nsPrefixes = new HashMap<String, String>();
        nsPrefixes.put("x", "http://www.litle.com/schema");
        DocumentFactory factory = new DocumentFactory();
        factory.setXPathNamespaceURIs(nsPrefixes);
        SAXReader reader = new SAXReader();
        reader.setDocumentFactory(factory);
        Document doc;
        try {
            doc = reader.read(bis);
            // The root element is also in the default namespace, so it needs
            // the prefix too: /abcde/x:response matches nothing.
            Object value = doc.valueOf("/x:abcde/x:response");
            System.out.println(value);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}
Short answer: you need to use namespace prefixes if your parser is namespace aware (which dom4j is), and the prefix has to be applied to every element in the namespace, including the root.
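A self-contained sketch of that approach (my illustration; the "x" prefix is arbitrary, and the XML is trimmed to the essentials):

import java.util.HashMap;
import java.util.Map;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.XPath;
import org.dom4j.xpath.DefaultXPath;

public class NamespaceXPathExample {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentHelper.parseText(
            "<abcde xmlns=\"http://www.litle.com/schema\"><response>000</response></abcde>");
        // Bind an arbitrary prefix to the document's default namespace and
        // use it for every element step, including the root.
        Map<String, String> uris = new HashMap<String, String>();
        uris.put("x", "http://www.litle.com/schema");
        XPath xpath = new DefaultXPath("/x:abcde/x:response");
        xpath.setNamespaceURIs(uris);
        System.out.println(xpath.valueOf(doc)); // prints 000
    }
}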
