Mahout recommendation returns empty set

I am trying to run KnnItemBasedRecommender on the sample data "intro.csv" with the code below, but I get an empty set as the result.
public static void main(String[] args) throws Exception {
DataModel model = NeuvidisData.convertToDataModel();
//RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) {
ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
Optimizer optimizer = new ConjugateGradientOptimizer();
return new KnnItemBasedRecommender(model, similarity, optimizer, 2);
}
};
Recommender rec = recommenderBuilder.buildRecommender(model);
List<RecommendedItem> rcList = rec.recommend(1, 2);
for(RecommendedItem item:rcList)
{
System.out.println("item:");
System.out.println(item);
}
}
Can anybody help me?

Presumably because your data is too small or sparse to make recommendations for user 1 using this algorithm. Without the data it's hard to say.
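To make the "too small or sparse" point concrete, here is a toy sketch in plain Java. It is not Mahout's algorithm or API; the class and method names are hypothetical. It assumes a simplified rule: an item can only be a candidate for a user if some other user shares at least one item with them. When no histories overlap, the candidate list is empty no matter which similarity is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: not Mahout's algorithm or API. All names are hypothetical.
final class SparseDataSketch {

    // Items the user has not rated that were rated by a user sharing at least one item.
    static Set<Long> candidateItems(Map<Long, Set<Long>> itemsByUser, long userId) {
        Set<Long> own = itemsByUser.getOrDefault(userId, new HashSet<>());
        Set<Long> candidates = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            if (entry.getKey() == userId) {
                continue; // skip the user's own row
            }
            Set<Long> other = entry.getValue();
            boolean overlaps = false;
            for (Long item : own) {
                if (other.contains(item)) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                continue; // no shared item, so nothing links this user's items to ours
            }
            for (Long item : other) {
                if (!own.contains(item)) {
                    candidates.add(item);
                }
            }
        }
        return candidates;
    }

    private static Set<Long> items(Long... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    // Three users, no shared items: nothing can be recommended to user 1.
    static int sparseCandidateCount() {
        Map<Long, Set<Long>> data = new HashMap<>();
        data.put(1L, items(101L));
        data.put(2L, items(102L));
        data.put(3L, items(103L));
        return candidateItems(data, 1L).size();
    }

    // Overlapping histories: items 103 and 104 become candidates for user 1.
    static int denseCandidateCount() {
        Map<Long, Set<Long>> data = new HashMap<>();
        data.put(1L, items(101L, 102L));
        data.put(2L, items(101L, 103L));
        data.put(3L, items(102L, 103L, 104L));
        return candidateItems(data, 1L).size();
    }

    public static void main(String[] args) {
        System.out.println("sparse: " + sparseCandidateCount());
        System.out.println("dense: " + denseCandidateCount());
    }
}
```

If your real data for user 1 looks like the sparse case, an empty result is expected rather than a bug.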

The following code worked for me.
ItemSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
Optimizer optimizer = new ConjugateGradientOptimizer();
Recommender recommender = new KnnItemBasedRecommender(dataModel, similarity, optimizer, 5);
I used PearsonCorrelationSimilarity instead of LogLikelihoodSimilarity.
This may only work for particular data, so whether it helps depends on your data set.

Related

Trying to add a listener to a model (backed by a TDB2 dataset)

After a little research, org.apache.jena.sparql.core.DatasetGraphMonitor looked like the way to go.
To my understanding, I have to create a DatasetGraph wrapped by the DatasetGraphMonitor and use this graph to create a Model; all modifications to the model should then be reported to my DatasetChanges object.
So that's what I'm doing:
//create a Dataset backed by TDB2
Dataset dataset = TDB2Factory.connectDataset(location);
//wrap the dataset with a DatasetGraphMonitor and obtain a DatasetGraph
DatasetGraph datasetGraph = new DatasetGraphMonitor(dataset.asDatasetGraph(), new DatasetChanges() {
@Override
public void start() {
}
@Override
public void reset() {
}
@Override
public void finish() {
}
@Override
public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
LOG.info("Dataset change: "+qaction);
}
});
//create a model using the DatasetGraphMonitor as underlying graph
Model model = ModelFactory.createModelForGraph(datasetGraph.getDefaultGraph());
//run an insert sparql query to add new triples to the triplestore (this really is in a write transaction, maybe I'm oversimplifying here)
UpdateAction.parseExecute(sparqlQuery, model);
Well, you guessed it already: change never gets called.
Any idea about what I'm doing wrong here? Thanks.
DatasetGraphMonitor is for monitoring actions on the dataset. Getting the default graph and making it a model doesn't trigger that machinery. (If it did, you'd get a "not in transaction" exception.) The returned graph goes straight to the core database.
Instead, either:
1. Wrap the graph from datasetGraph.getDefaultGraph() with GraphWrapper and put the monitoring code on the various add/delete methods, or
2. Do the update (in a transaction) on the datasetGraph.
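The bypass can be reproduced with a small stand-in in plain Java. None of these classes are Jena types; Graph and MonitoredDataset are hypothetical names used only to show the shape of the problem. The wrapper reports writes that go through its own add method, but it hands out the inner graph unwrapped, so writes to that graph are never seen.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types only: none of these are Jena classes.
final class MonitorBypassSketch {

    // Minimal "graph": a list of triples kept as strings.
    static final class Graph {
        final List<String> triples = new ArrayList<>();

        void add(String triple) {
            triples.add(triple);
        }
    }

    // Dataset-level monitor: records an event only for writes made through it.
    static final class MonitoredDataset {
        private final Graph defaultGraph = new Graph();
        final List<String> events = new ArrayList<>();

        void add(String triple) {
            events.add("ADD " + triple);
            defaultGraph.add(triple);
        }

        // The inner graph is returned as-is, so its add() bypasses the monitor.
        Graph getDefaultGraph() {
            return defaultGraph;
        }
    }

    // Writing through the graph view: the triple is stored, no event is recorded.
    static int eventsAfterGraphWrite() {
        MonitoredDataset dataset = new MonitoredDataset();
        dataset.getDefaultGraph().add("s p o");
        return dataset.events.size();
    }

    // Writing through the dataset itself: one event is recorded.
    static int eventsAfterDatasetWrite() {
        MonitoredDataset dataset = new MonitoredDataset();
        dataset.add("s p o");
        return dataset.events.size();
    }

    public static void main(String[] args) {
        System.out.println("via graph: " + eventsAfterGraphWrite());
        System.out.println("via dataset: " + eventsAfterDatasetWrite());
    }
}
```

Both options in the answer amount to making the write path pass through something that does the reporting: either wrap the graph too, or send the update to the monitored dataset.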

Accord.NET Comparing two images to determine similarity

I would like your advice as to why the code might be becoming unresponsive and how to fix it.
I am using Accord.NET to compare images. The first stage of my project is to compare two images, an observed image and a model image, and determine how similar they are; the second, is to compare an observed image against my whole database to determine what the observed image most likely is based on how the models have been categorized. Right now I am focusing on the first. I initially tried using ExhaustiveTemplateMatching.ProcessImage() but it didn't fit my need. Now, I am using SURF. Here is my code as is:
public class ProcessImage
{
public static void Similarity(System.IO.Stream model, System.IO.Stream observed,
out float similPercent)
{
Bitmap bitModel = new Bitmap(model);
Bitmap bitObserved = new Bitmap(observed);
// For method Difference, see http://www.aforgenet.com/framework/docs/html/673023f7-799a-2ef6-7933-31ef09974dde.htm
// Inspiration for this process: https://www.youtube.com/watch?v=YHT46f2244E
// Greyscale class http://www.aforgenet.com/framework/docs/html/d7196dc6-8176-4344-a505-e7ade35c1741.htm
// Convert model and observed to greyscale
Grayscale filter = new Grayscale(0.2125, 0.7154, 0.0721);
// apply the filter to the model
Bitmap greyModel = filter.Apply(bitModel);
// Apply the filter to the observed image
Bitmap greyObserved = filter.Apply(bitObserved);
int modelPoints = 0, matchingPoints = 0;
/*
* This doesn't work. Images can have different sizes
// For an example, https://thecsharper.com/?p=94
// Match
var tm = new ExhaustiveTemplateMatching(similarityThreshold);
// Process the images
var results = tm.ProcessImage(greyModel, greyObserved);
*/
using (SpeededUpRobustFeaturesDetector detector = new SpeededUpRobustFeaturesDetector())
{
List<SpeededUpRobustFeaturePoint> surfModel = detector.ProcessImage(greyModel);
modelPoints = surfModel.Count();
List<SpeededUpRobustFeaturePoint> surfObserved = detector.ProcessImage(greyObserved);
KNearestNeighborMatching matcher = new KNearestNeighborMatching(5);
var results = matcher.Match(surfModel, surfObserved);
matchingPoints = results.Length;
}
// Determine if they represent the same points
// Obtain the pairs of associated points, we determine the homography matching all these pairs
// Compare the results, 0 indicates no match so return false
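if (matchingPoints <= 0 || modelPoints <= 0)
{
similPercent = 0.0f;
}
else
{
// Floating-point arithmetic so the percentage is not truncated to an integer
similPercent = (matchingPoints * 100f) / modelPoints;
}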
}
}
So far I can obtain the lists of points, but the code then seems to become unresponsive during matching.
I am calling the above code from an ASP.NET web page after the user posts a bitmap. Here is the code if it may help:
public ActionResult Compare(int id)
{
ViewData["SampleID"] = id;
return View();
}
[HttpPost]
public ActionResult Compare(int id, HttpPostedFileBase uploadFile)
{
Sample model = _db.Sample_Read(id);
System.IO.Stream modelStream = null;
float result = 0;
_db.Sample_Stream(model.FileId, out modelStream);
ImgProc.ProcessImage.Similarity(modelStream, uploadFile.InputStream,
out result);
ViewData["SampleID"] = id;
ViewData["match"] = result;
return View();
}
The page itself is rather simple: a hidden field, a file input and a submit button.
The problem was my PC. After some processing time the calculation finishes.
Thanks.
For KNearestNeighborMatching to resolve, you need to reference both Accord.Imaging and Accord.Vision.
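Separately from the speed issue, the last step of the Similarity method (turning the two point counts into a percentage) is worth guarding. Below is a minimal sketch in Java rather than C#, with a hypothetical helper name. It assumes the intended result is matches as a percentage of model points, returns 0 when either count is zero, and uses float arithmetic so the value is not truncated.

```java
// Hypothetical helper, not part of Accord.NET; mirrors the final step of Similarity().
final class SimilarityPercentSketch {

    // Matched points as a percentage of the model's points.
    static float similarityPercent(int matchingPoints, int modelPoints) {
        if (matchingPoints <= 0 || modelPoints <= 0) {
            return 0.0f; // nothing matched, or nothing to compare against
        }
        // 100.0f forces floating-point division instead of integer division
        return (matchingPoints * 100.0f) / modelPoints;
    }

    public static void main(String[] args) {
        System.out.println(similarityPercent(5, 10));
        System.out.println(similarityPercent(0, 0));
    }
}
```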

Save Jena property table to TDB

When I try to save my property table graph, I get
an org.apache.jena.shared.AddDeniedException.
This exception is thrown by the method performAdd in the GraphBase class:
/**
Add a triple to the triple store. The default implementation throws an
AddDeniedException; subclasses must override if they want to be able to add triples.
*/
@Override
public void performAdd( Triple t )
{ throw new AddDeniedException( "GraphBase::performAdd" ); }
This method is called because I create a GraphPropertyTable, which inherits
from GraphBase; however, it does not override performAdd as I expected it to.
I am unsure how to proceed now.
I suspect that I am doing something wrong; please help me find out what!
Here is a minimal example that reproduces the error:
PropertyTable propertytable = new PropertyTableArrayImpl(2, 2);
Column alpha = propertytable.createColumn(NodeFactory.createLiteral("alpha"));
Column beta = propertytable.createColumn(NodeFactory.createLiteral("beta"));
Row one = propertytable.createRow(NodeFactory.createLiteral("one"));
Row two = propertytable.createRow(NodeFactory.createLiteral("two"));
propertytable.getRow(one.getRowKey()).setValue(alpha,NodeFactory.createLiteral("alpha-one"));
propertytable.getRow(one.getRowKey()).setValue(beta,NodeFactory.createLiteral("beta-two"));
propertytable.getRow(two.getRowKey()).setValue(alpha,NodeFactory.createLiteral("alpha-one"));
propertytable.getRow(two.getRowKey()).setValue(beta,NodeFactory.createLiteral("beta-two"));
GraphPropertyTable graph = new GraphPropertyTable(propertytable);
Model model = ModelFactory.createModelForGraph(graph);
Dataset dataset = TDBFactory.createDataset("tdb/");
dataset.begin(ReadWrite.WRITE);
try {
dataset.addNamedModel("www.example.org/model", model);
dataset.commit();
} finally {
dataset.end();
}
How do I proceed in order to persist my property table on disk?

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system using Neo4j and the spatial plugin. We start from the OSM file, read it, and import nodes and relationships into our graph (a custom graph model).
Now, if we don't use the Neo4j batch inserter, importing a compressed OSM file (around 140MB compressed, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64bit, quad core, 8GB RAM. Please note that most of the time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, the file is read in around 7 minutes (I'm sure about this because our process first reads the file to store the correct OSM node ids and then reads it again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying to use the batchInserter. So far, so good (I still need to check how well it performs, but I guess it will be faster). The first thing I did was try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
private static File basePath = new File("/home/angelo/Scrivania/neo4j");
private static File dbPath = new File(basePath, "db");
private GraphDatabaseService graphDb;
private BatchInserter batchInserter;
// private BatchInserterIndexProvider batchIndexService;
private SpatialDatabaseService spatialDb;
private SimplePointLayer spl;
static
{
NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
NEO4J_CFG.put( "dump_configuration", "true" );
}
@Override
public void initialize(Map<String, Object> arg0)
{
batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
spatialDb = new SpatialDatabaseService(graphDb);
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
//batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
}
@Override
public void complete()
{
// TODO Auto-generated method stub
}
@Override
public void release()
{
// TODO Auto-generated method stub
}
@Override
public void process(EntityContainer ec)
{
Entity entity = ec.getEntity();
if (entity instanceof Node) {
Node osmNodo = (Node)entity;
org.neo4j.graphdb.Node graphNode = graphDb.createNode();
graphNode.setProperty("osmId", osmNodo.getId());
graphNode.setProperty("latitudine", osmNodo.getLatitude());
graphNode.setProperty("longitudine", osmNodo.getLongitude());
spl.add(graphNode);
} else if (entity instanceof Way) {
//do something with the way
} else if (entity instanceof Relation) {
//do something with the relation
}
}
}
Then I wrote the following test case:
public class BatchInserterTest
{
private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());
@Test
public void batchInserter()
{
File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
try
{
boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;
if (file.getName().endsWith(".pbf"))
{
pbf = true;
}
else if (file.getName().endsWith(".gz"))
{
compression = CompressionMethod.GZip;
}
else if (file.getName().endsWith(".bz2"))
{
compression = CompressionMethod.BZip2;
}
RunnableSource reader;
if (pbf)
{
reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
}
else
{
reader = new XmlReader(file, false, compression);
}
reader.setSink(new BatchInserterSinkTest());
Thread readerThread = new Thread(reader);
readerThread.start();
while (readerThread.isAlive())
{
try
{
readerThread.join();
}
catch (InterruptedException e)
{
/* do nothing */
}
}
}
catch (Exception e)
{
logger.error("Errore nella creazione di neo4j con batchInserter", e);
}
}
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the batchInserter for my case? I have to add the created nodes to the SimplePointLayer....so how can I create it by using the batchInserter graph db service?
Is there any little simple sample?
Any tip is really really appreciated
cheers
Angelo
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.
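The two-phase idea (bulk load first, index afterwards) can be sketched in plain Java. This does not use any Neo4j or Neo4j Spatial API; the class, the Point type, and the coarse grid index are all hypothetical stand-ins. The grid only stands in for whatever spatial index the layer maintains. The point is that phase 1 is append-only with no index upkeep, and phase 2 makes a single pass over the loaded nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch only: no Neo4j or Neo4j Spatial classes are used here.
final class TwoPhaseImportSketch {

    static final class Point {
        final long osmId;
        final double lat;
        final double lon;

        Point(long osmId, double lat, double lon) {
            this.osmId = osmId;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Phase 1: append-only bulk load, no index maintenance per insert.
    // Each row is {osmId, latitude, longitude}.
    static List<Point> bulkLoad(double[][] rows) {
        List<Point> points = new ArrayList<>(rows.length);
        for (double[] row : rows) {
            points.add(new Point((long) row[0], row[1], row[2]));
        }
        return points;
    }

    // Grid cell key for a coordinate, e.g. "45:9" for a one-degree cell size.
    static String cellKey(double lat, double lon, double cellSize) {
        long latCell = (long) Math.floor(lat / cellSize);
        long lonCell = (long) Math.floor(lon / cellSize);
        return latCell + ":" + lonCell;
    }

    // Phase 2: one pass over the loaded points builds the index.
    static Map<String, List<Point>> buildGridIndex(List<Point> points, double cellSize) {
        Map<String, List<Point>> index = new HashMap<>();
        for (Point p : points) {
            index.computeIfAbsent(cellKey(p.lat, p.lon, cellSize), k -> new ArrayList<>()).add(p);
        }
        return index;
    }

    // Two points near Milan share a cell, one near Rome gets its own.
    static int demoCellCount() {
        double[][] rows = {
            {1, 45.46, 9.19},
            {2, 45.47, 9.18},
            {3, 41.90, 12.49}
        };
        return buildGridIndex(bulkLoad(rows), 1.0).size();
    }

    public static void main(String[] args) {
        System.out.println("cells: " + demoCellCount());
    }
}
```

In the real import, phase 1 would be the batch inserter run and phase 2 would reopen the store with the embedded API and add each node to the layer.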

the use of PlusAnonymousUserDataModel

What is wrong with the following code, and why does it produce no recommendations for the anonymous user?
I cannot figure out what is going wrong: I can't get recommendations for an anonymous user with PlusAnonymousUserDataModel.
This example code shows no recommendations for the anonymous user, but gives recommendations for a user in the model with exactly the same preferences:
public static void main(String[] args) throws Exception {
DataModel model = new GenericBooleanPrefDataModel(
GenericBooleanPrefDataModel.toDataMap(new FileDataModel(
new File(args[0]))));
PlusAnonymousUserDataModel plusAnonymousModel = new PlusAnonymousUserDataModel(model);
UserSimilarity similarity = new LogLikelihoodSimilarity(model);
UserNeighborhood neighborhood =
new NearestNUserNeighborhood(
Integer.parseInt(args[1]), similarity, model);
//new ThresholdUserNeighborhood(Float.parseFloat(args[1]), similarity, model);
System.out.println("Neighborhood=" + args[1]);
System.out.println("");
Recommender recommender = new GenericBooleanPrefUserBasedRecommender(model,
neighborhood, similarity);
PreferenceArray anonymousPrefs =
new BooleanUserPreferenceArray(12);
anonymousPrefs.setUserID(0,
PlusAnonymousUserDataModel.TEMP_USER_ID);
anonymousPrefs.setItemID(0, 1105L);
anonymousPrefs.setItemID(1, 1201L);
anonymousPrefs.setItemID(2, 1301L);
anonymousPrefs.setItemID(3, 1401L);
anonymousPrefs.setItemID(4, 1502L);
anonymousPrefs.setItemID(5, 1602L);
anonymousPrefs.setItemID(6, 1713L);
anonymousPrefs.setItemID(7, 1801L);
anonymousPrefs.setItemID(8, 1901L);
anonymousPrefs.setItemID(9, 2002L);
anonymousPrefs.setItemID(10, 9101L);
anonymousPrefs.setItemID(11, 9301L);
synchronized(anonymousPrefs){
plusAnonymousModel.setTempPrefs(anonymousPrefs);
List<RecommendedItem> recommendations1 = recommender.recommend(PlusAnonymousUserDataModel.TEMP_USER_ID, 20);
plusAnonymousModel.clearTempPrefs();
System.out.println("Recm for anonymous:");
for (RecommendedItem recommendation : recommendations1) {
System.out.println(recommendation);
}
System.out.println("");
}
List<RecommendedItem> recommendations = recommender.recommend(
Integer.parseInt(args[2]), 20);
System.out.println("Recomedation for user_id="
+ Integer.parseInt(args[2]) + ":");
for (RecommendedItem recommendation : recommendations) {
System.out.println(recommendation);
}
System.out.println("");
The output produced by this code is as follows:
Neighborhood=100
Recm for anonymous:
Recomedation for user_id=1680604:
RecommendedItem[item:1701, value:24.363672]
... and so on. So there are no recommendations for the anonymous user! :(
It turns out that to get recommendations you must construct the similarity, neighbourhood and recommender not with the "real" persistent DataModel model (file-based in my case), but with the PlusAnonymousUserDataModel plusAnonymousModel instead!
So the basic Mahout documentation ( https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/PlusAnonymousUserDataModel.html ) is wrong in stating ItemSimilarity similarity = new LogLikelihoodSimilarity(realModel); // not plusModel
Earlier, another person on SO had the same problem and didn't get any answer here: Model creation for User User collanborative filtering
So I think I should go there and answer him. Sean Owen, thank you for your interest; can you confirm that the solution I found is the correct one?
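Why the choice of model matters can be shown with a stand-in in plain Java. These are not Mahout classes; Model, FileModel, PlusTempUserModel and PrefCounter are hypothetical names, and the temporary user ID constant is part of the sketch. A component built on the underlying model never sees the temporary user's preferences, while the same component built on the wrapper does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in types only: none of these are Mahout classes.
final class AnonymousModelSketch {

    interface Model {
        Set<Long> itemsFor(long userId); // null when the user is unknown
    }

    // Plays the role of the persistent, file-based model.
    static final class FileModel implements Model {
        private final Map<Long, Set<Long>> prefs = new HashMap<>();

        void put(long userId, Long... items) {
            prefs.put(userId, new HashSet<>(Arrays.asList(items)));
        }

        @Override
        public Set<Long> itemsFor(long userId) {
            return prefs.get(userId);
        }
    }

    // Wrapper that adds one temporary user on top of a delegate model.
    static final class PlusTempUserModel implements Model {
        static final long TEMP_USER_ID = Long.MIN_VALUE; // constant of this sketch
        private final Model delegate;
        private Set<Long> tempItems;

        PlusTempUserModel(Model delegate) {
            this.delegate = delegate;
        }

        void setTempPrefs(Long... items) {
            tempItems = new HashSet<>(Arrays.asList(items));
        }

        @Override
        public Set<Long> itemsFor(long userId) {
            if (userId == TEMP_USER_ID && tempItems != null) {
                return tempItems;
            }
            return delegate.itemsFor(userId);
        }
    }

    // Plays the role of a similarity/neighbourhood/recommender: it only sees its own model.
    static final class PrefCounter {
        private final Model model;

        PrefCounter(Model model) {
            this.model = model;
        }

        int knownPrefs(long userId) {
            Set<Long> items = model.itemsFor(userId);
            return items == null ? 0 : items.size();
        }
    }

    // Component built on the underlying model: the temporary user is invisible.
    static int builtOnUnderlyingModel() {
        FileModel real = new FileModel();
        real.put(1L, 1105L, 1201L);
        PlusTempUserModel plus = new PlusTempUserModel(real);
        plus.setTempPrefs(1105L, 1201L, 1301L);
        return new PrefCounter(real).knownPrefs(PlusTempUserModel.TEMP_USER_ID);
    }

    // Component built on the wrapper: the temporary preferences are visible.
    static int builtOnWrapperModel() {
        FileModel real = new FileModel();
        real.put(1L, 1105L, 1201L);
        PlusTempUserModel plus = new PlusTempUserModel(real);
        plus.setTempPrefs(1105L, 1201L, 1301L);
        return new PrefCounter(plus).knownPrefs(PlusTempUserModel.TEMP_USER_ID);
    }

    public static void main(String[] args) {
        System.out.println("on underlying model: " + builtOnUnderlyingModel());
        System.out.println("on wrapper model: " + builtOnWrapperModel());
    }
}
```

This matches what I observed: anything that has to look up the anonymous user's preferences must be handed the wrapper.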

Resources