Trying to add a listener to a model (backed by a TDB2 dataset) - jena

After a little research, org.apache.jena.sparql.core.DatasetGraphMonitor looked like the way to go.
As I understand it, I have to create a DatasetGraph wrapped by the DatasetGraphMonitor and use this graph to create a Model; all modifications to the model should then be reported to my DatasetChanges object.
So that's what I'm doing:
// create a Dataset backed by TDB2
Dataset dataset = TDB2Factory.connectDataset(location);
// wrap the dataset with a DatasetGraphMonitor and obtain a DatasetGraph
DatasetGraph datasetGraph = new DatasetGraphMonitor(dataset.asDatasetGraph(), new DatasetChanges() {
    @Override
    public void start() {
    }
    @Override
    public void reset() {
    }
    @Override
    public void finish() {
    }
    @Override
    public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
        LOG.info("Dataset change: " + qaction);
    }
});
// create a model using the DatasetGraphMonitor as underlying graph
Model model = ModelFactory.createModelForGraph(datasetGraph.getDefaultGraph());
// run a SPARQL insert to add new triples to the triplestore (this really is in a write transaction, maybe I'm oversimplifying here)
UpdateAction.parseExecute(sparqlQuery, model);
Well, as you may have guessed: change never gets called.
Any idea about what I'm doing wrong here? Thanks.

DatasetGraphMonitor is for monitoring actions performed on the dataset itself. Getting the default graph and making it a model doesn't trigger that machinery. (If it did, you'd get a "not in transaction" exception.) The returned graph goes straight to the core database.
Instead, either:
Wrap the graph from datasetGraph.getDefaultGraph() with GraphWrapper and put the monitoring code on the various add/delete methods.
Do the update (in a transaction) on the datasetGraph.
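The first option is a plain delegating wrapper. The sketch below uses only the JDK so that it runs anywhere: TripleStore, InMemoryStore and MonitoredStore are hypothetical stand-ins for Jena's Graph and GraphWrapper, not Jena API, and triples are simplified to strings. It only illustrates the pattern: a change is reported when it goes through the wrapper, and anything that talks to the underlying store directly bypasses the listener.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a graph API; not a Jena type.
interface TripleStore {
    void add(String triple);
    void delete(String triple);
    boolean contains(String triple);
}

// Hypothetical stand-in for the real storage graph.
class InMemoryStore implements TripleStore {
    private final Set<String> triples = new HashSet<>();
    public void add(String triple) { triples.add(triple); }
    public void delete(String triple) { triples.remove(triple); }
    public boolean contains(String triple) { return triples.contains(triple); }
}

// Wrapper that reports each change to a listener, then delegates.
class MonitoredStore implements TripleStore {
    private final TripleStore delegate;
    private final BiConsumer<String, String> listener; // (action, triple)

    MonitoredStore(TripleStore delegate, BiConsumer<String, String> listener) {
        this.delegate = delegate;
        this.listener = listener;
    }
    public void add(String triple) {
        listener.accept("ADD", triple);
        delegate.add(triple);
    }
    public void delete(String triple) {
        listener.accept("DELETE", triple);
        delegate.delete(triple);
    }
    public boolean contains(String triple) { return delegate.contains(triple); }
}

class MonitorDemo {
    // Runs one add and one delete through the wrapper and returns the log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        TripleStore store = new MonitoredStore(new InMemoryStore(),
                (action, triple) -> log.add(action + " " + triple));
        store.add("s p o");
        store.delete("s p o");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In the Jena case the equivalent would be to build the Model on top of the wrapped graph, so that every update made through the model passes through the overridden add/delete methods.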


Is it possible to have a map of map in MapState?

private MapState<String, EventsHistory> eventsMap = null;
public void processElement2(Event event,
Context context,
Collector<JoinedEvent> collector) throws Exception {
String name = event.getExperimentName();
if (eventsMap.get(name) == null) {
eventsMap.put(name, new EventsHistory());
}
eventsMap.get(name).put(event.getEventTime(), event);
}
class EventsHistory {
private final Map<Long, Event> events = new HashMap<>();
public Map<Long, Event> getEvents() {
return events;
}
public void put(final Long eventTime, final Event event) {
events.put(eventTime, event);
}
}
I have the above code and would like to use Flink's MapState to maintain a map of maps.
When I test this locally, I can see the state update fine. But when I run it in a cluster, the eventsMap is always empty.
Is it valid to use a map of maps in MapState? Is there a better way to achieve this?
As an alternative, I tried the version below, where I do the grouping myself. Strangely enough, this works.
private MapState<EventKey, Event> assignmentEventsMap = null;
public final class EventKey {
private String name;
private long eventTime;
}
public void processElement2(Event event,
Context context,
Collector<JoinedEvent> collector) throws Exception {
String name = event.getExperimentName();
assignmentEventsMap
.put(new EventKey(event.getName(), event.getEventTime()),
event);
}
The code you have shared is difficult to understand, but perhaps you have misunderstood what MapState is. ValueState provides a sharded key/value store, distributed across the cluster. MapState gives you a sharded key/value store, where the values themselves are nested Maps.
In other words, MapState is always a map of maps. You ended up trying to create a map of maps of maps, which is one level too far.
I'm assuming you are trying to build this structure, where you effectively have a map from experiment names to nested maps of timestamps to events:
name -> (time -> event)
Assuming that your stream of events has already been keyed by the experiment name, then rather than using MapState<String, EventsHistory> eventsMap, what you really want is MapState<Long, Event> eventsMap, and rather than
eventsMap.get(name).put(event.getEventTime(), event);
you should be doing
eventsMap.put(event.getEventTime(), event);
See the tutorial about ValueState and the example using MapState in the Flink docs for more background on how to work with these mechanisms.
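As a mental model only, keyed MapState behaves roughly like an outer map that the framework manages for you, indexed by the current key of the stream, with your MapState<Long, Event> being the inner map for that key. The sketch below is plain JDK code, not Flink API: KeyedMapStateModel and setCurrentKey are hypothetical names, and events are simplified to strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of keyed MapState<Long, String>; not Flink code.
class KeyedMapStateModel {
    // Outer map: maintained by the framework, one entry per stream key.
    private final Map<String, Map<Long, String>> perKey = new HashMap<>();
    private String currentKey;

    // The framework selects the key before each element is processed.
    void setCurrentKey(String key) {
        this.currentKey = key;
    }

    // User code only ever touches the inner map of the current key.
    void put(long eventTime, String event) {
        perKey.computeIfAbsent(currentKey, k -> new TreeMap<>()).put(eventTime, event);
    }

    String get(long eventTime) {
        Map<Long, String> inner = perKey.get(currentKey);
        return inner == null ? null : inner.get(eventTime);
    }
}

class KeyedMapStateDemo {
    // Stores one event with the same timestamp under two different keys.
    static KeyedMapStateModel build() {
        KeyedMapStateModel state = new KeyedMapStateModel();
        state.setCurrentKey("experimentA");
        state.put(100L, "a1");
        state.setCurrentKey("experimentB");
        state.put(100L, "b1");
        return state;
    }

    public static void main(String[] args) {
        KeyedMapStateModel state = build();
        state.setCurrentKey("experimentA");
        System.out.println(state.get(100L));
    }
}
```

The two events share a timestamp but do not collide, because each lives in the inner map of its own key. That is why the experiment name does not need to appear in the MapState key once the stream is keyed by it.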

How do I make View's asList() sortable in Google Dataflow SDK?

We have a problem making asList() method sortable.
We thought we could do this by just extending the View class and override the asList method but realized that View class has a private constructor so we could not do this.
Our other attempt was to fork the Google Dataflow code on GitHub and modify the PCollectionViews class to return a sorted list by using the Collections.sort method, as shown in the code snippet below:
@Override
protected List<T> fromElements(Iterable<WindowedValue<T>> contents) {
    Iterable<T> itr = Iterables.transform(
        contents,
        new Function<WindowedValue<T>, T>() {
            @SuppressWarnings("unchecked")
            @Override
            public T apply(WindowedValue<T> input) {
                return input.getValue();
            }
        });
    LOG.info("#### About to start sorting the list !");
    List<T> tempList = new ArrayList<T>();
    for (T element : itr) {
        tempList.add(element);
    }
    Collections.sort((List<? extends Comparable>) tempList);
    LOG.info("##### List should now be sorted !");
    return ImmutableList.copyOf(tempList);
}
Note that we are now sorting the list.
This seemed to work, when run with the DirectPipelineRunner but when we tried the BlockingDataflowPipelineRunner, it didn't seem like the code change was being executed.
Note: We actually recompiled the Dataflow SDK and used it in our project, but this did not work.
How can we achieve this (a sorted list from the asList method call)?
The classes in PCollectionViews are not intended for extension. Only the primitive view types provided by View.asSingleton, View.asIterable, View.asMap, and View.asMultimap are supported.
To obtain a sorted list from a PCollectionView, you'll need to sort it after you have read it. The following code demonstrates the pattern.
// Assume you have some PCollection
PCollection<MyComparable> myPC = ...;
// Prepare it for side input as a list
final PCollectionView<List<MyComparable>> myView = myPC.apply(View.asList());
// Side input the list and sort it
someOtherValue.apply(
    ParDo.withSideInputs(myView).of(
        new DoFn<A, B>() {
            @Override
            public void processElement(ProcessContext ctx) {
                List<MyComparable> tempList =
                    Lists.newArrayList(ctx.sideInput(myView));
                Collections.sort(tempList);
                // do whatever you want with sorted list
            }
        }));
Of course, you may not want to sort it repeatedly, depending on the cost of sorting vs the cost of materializing it as a new PCollection, so you can output this value and read it as a new side input without difficulty:
// Side input the list, sort it, and put it in a PCollection
PCollection<List<MyComparable>> sortedSingleton = Create.<Void>of(null).apply(
    ParDo.withSideInputs(myView).of(
        new DoFn<Void, List<MyComparable>>() {
            @Override
            public void processElement(ProcessContext ctx) {
                List<MyComparable> tempList =
                    Lists.newArrayList(ctx.sideInput(myView));
                Collections.sort(tempList);
                ctx.output(tempList);
            }
        }));
// Prepare it for side input as a singleton
final PCollectionView<List<MyComparable>> sortedView =
    sortedSingleton.apply(View.asSingleton());
someOtherValue.apply(
    ParDo.withSideInputs(sortedView).of(
        new DoFn<A, B>() {
            @Override
            public void processElement(ProcessContext ctx) {
                ... ctx.sideInput(sortedView) ...
                // do whatever you want with sorted list
            }
        }));
You may also be interested in the unsupported sorter contrib module for doing larger sorts using both memory and local disk.
We tried to do it the way Ken Knowles suggested. There's a problem for large datasets. If the tempList is large (so the sort takes measurable time, as it's proportional to O(n * log n)) and there are millions of elements in the "someOtherValue" PCollection, then we are unnecessarily re-sorting the same list millions of times. We should be able to sort ONCE and FIRST, before passing the list to the someOtherValue.apply's DoFn.

How to limit parsing depth using Tinkerpop Frames

Hi I have an interface and a corresponding implementation class like:
public interface IActor extends VertexFrame {
    @Property(ActorProps.nodeClass)
    public String getNodeClass();
    @Property(ActorProps.nodeClass)
    public void setNodeClass(String str);
    @Property(ActorProps.id)
    public String getId();
    @Property(ActorProps.id)
    public void setId(String id);
    @Property(ActorProps.name)
    public String getName();
    @Property(ActorProps.name)
    public void setText(String text);
    @Property(ActorProps.uuid)
    public String getUuid();
    @Property(ActorProps.uuid)
    public void setUuid(String uuid);
    @Adjacency(label = RelClasses.CoActors, direction = Direction.OUT)
    public Iterable<IActor> getCoactors();
}
I use it with OrientDB roughly as shown below. I had a similar implementation with Neo4j as well:
Graph graph = new OrientGraph("remote:localhost/actordb");
FramedGraph<Graph> manager = new FramedGraphFactory().create(graph);
IActor actor = manager.frame(((OrientGraph)graph).getVertexByKey("Actor.uuid",uuid), IActor.class);
The above works, but because there is a relationship between two vertices of class Actor, there could potentially be a loop in the graph. Is there a way, either by annotation or by some other means (e.g. through the manager), to stop after x steps for a specific @Adjacency so that this won't go on forever? If the @GremlinGroovy annotation (https://github.com/tinkerpop/frames/wiki/Gremlin-Groovy) is the answer, could you please give an example?
I'm not sure I understand the question/problem. (You say "potentially", but haven't actually proven that there's a problem!)
Is the problem that there is a loop in the Vertex/Frames, and (you think) loading the object will result in an infinite loop?
Have you been able to prove that there is a problem loading a Vertex/Frame with a loop? (show me the code/problem)
As I understand it, the Pipelines will lazy-load objects (only load them when required). The frames (I imagine) only load adjacent frames when requested. Basically, as far as I can tell, there's no problem.
Example (Groovy)
// create some framed vertices
Person nick = createPerson(name: 'Nick')
Person michail = createPerson(name: 'Michail')
// create a recursive loop
nick.addKnows(michail)
michail.addKnows(nick)
// handles recursion = true!
Person nick2 = framedGraph.getVertex(nick.asVertex().id, Person)
assert nick2.knows.knows.knows.knows.knows.name == 'Michail'
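If you do end up walking the relationships yourself (for example to copy them into another structure), the usual guard against cycles is a depth limit combined with a visited set. The sketch below is plain JDK code and does not use the Frames API: Actor, BoundedTraversal and collect are hypothetical names, the coactors list stands in for what getCoactors() would return, and names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical vertex type standing in for a framed IActor.
class Actor {
    final String name; // assumed unique in this sketch
    final List<Actor> coactors = new ArrayList<>();
    Actor(String name) { this.name = name; }
}

class BoundedTraversal {
    // Breadth-first walk that returns the names reachable from start
    // within maxDepth hops. The visited set keeps cycles from repeating.
    static Set<String> collect(Actor start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start.name);
        List<Actor> frontier = new ArrayList<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<Actor> next = new ArrayList<>();
            for (Actor actor : frontier) {
                for (Actor co : actor.coactors) {
                    if (visited.add(co.name)) {
                        next.add(co);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    // Nick <-> Michail form a cycle; Michail also points to Anna.
    static Actor sample() {
        Actor nick = new Actor("Nick");
        Actor michail = new Actor("Michail");
        Actor anna = new Actor("Anna");
        nick.coactors.add(michail);
        michail.coactors.add(nick);
        michail.coactors.add(anna);
        return nick;
    }

    public static void main(String[] args) {
        System.out.println(collect(sample(), 1));
        System.out.println(collect(sample(), 5));
    }
}
```

The walk terminates even though Nick and Michail reference each other, and the depth argument bounds how far from the start vertex it goes.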

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system by using neo4j and the spatial plugin. We start from the OSM file and we read this file and import nodes and relationships in our graph (a custom graph model)
Now, if we don't use the batch inserter of neo4j, importing a compressed OSM file (around 140MB compressed, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64bit, quad core, 8GB RAM. Please note that most of the time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with neo4j, the file is read in around 7 minutes (I'm sure about this because in our process we first read the file in order to store the correct OSM node ids, and then we read the file again in order to create the neo4j graph).
Obviously we need to improve the import process, so we are trying to use the batchInserter. So far, so good (I still need to check how well it performs with the batchInserter, but I guess it will be faster). So the first thing I did was to try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
private static File basePath = new File("/home/angelo/Scrivania/neo4j");
private static File dbPath = new File(basePath, "db");
private GraphDatabaseService graphDb;
private BatchInserter batchInserter;
// private BatchInserterIndexProvider batchIndexService;
private SpatialDatabaseService spatialDb;
private SimplePointLayer spl;
static
{
NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
NEO4J_CFG.put( "dump_configuration", "true" );
}
@Override
public void initialize(Map<String, Object> arg0)
{
batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
spatialDb = new SpatialDatabaseService(graphDb);
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
//batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
}
@Override
public void complete()
{
// TODO Auto-generated method stub
}
@Override
public void release()
{
// TODO Auto-generated method stub
}
@Override
public void process(EntityContainer ec)
{
Entity entity = ec.getEntity();
if (entity instanceof Node) {
Node osmNodo = (Node)entity;
org.neo4j.graphdb.Node graphNode = graphDb.createNode();
graphNode.setProperty("osmId", osmNodo.getId());
graphNode.setProperty("latitudine", osmNodo.getLatitude());
graphNode.setProperty("longitudine", osmNodo.getLongitude());
spl.add(graphNode);
} else if (entity instanceof Way) {
//do something with the way
} else if (entity instanceof Relation) {
//do something with the relation
}
}
}
Then I wrote the following test case:
public class BatchInserterTest
{
private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());
@Test
public void batchInserter()
{
File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
try
{
boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;
if (file.getName().endsWith(".pbf"))
{
pbf = true;
}
else if (file.getName().endsWith(".gz"))
{
compression = CompressionMethod.GZip;
}
else if (file.getName().endsWith(".bz2"))
{
compression = CompressionMethod.BZip2;
}
RunnableSource reader;
if (pbf)
{
reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
}
else
{
reader = new XmlReader(file, false, compression);
}
reader.setSink(new BatchInserterSinkTest());
Thread readerThread = new Thread(reader);
readerThread.start();
while (readerThread.isAlive())
{
try
{
readerThread.join();
}
catch (InterruptedException e)
{
/* do nothing */
}
}
}
catch (Exception e)
{
logger.error("Errore nella creazione di neo4j con batchInserter", e);
}
}
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the batchInserter in my case? I have to add the created nodes to the SimplePointLayer, so how can I create it using the batchInserter graph db service?
Is there any little simple sample?
Any tip is really really appreciated
cheers
Angelo
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.

Json and Circular Reference Exception

I have an object which has a circular reference to another object. Given the relationship between these objects this is the right design.
To Illustrate
Machine => Customer => Machine
As is expected I run into an issue when I try to use Json to serialize a machine or customer object. What I am unsure of is how to resolve this issue as I don't want to break the relationship between the Machine and Customer objects. What are the options for resolving this issue?
Edit
Presently I am using Json method provided by the Controller base class. So the serialization I am doing is as basic as:
Json(machineForm);
Update:
Do not try to use NonSerializedAttribute, as the JavaScriptSerializer apparently ignores it.
Instead, use the ScriptIgnoreAttribute in System.Web.Script.Serialization.
public class Machine
{
    public Customer Customer { get; set; }
    // Other members
    // ...
}
public class Customer
{
    [ScriptIgnore]
    public Machine Machine { get; set; } // Parent reference?
    // Other members
    // ...
}
This way, when you toss a Machine into the Json method, it will traverse the relationship from Machine to Customer but will not try to go back from Customer to Machine.
The relationship is still there for your code to do as it pleases with, but the JavaScriptSerializer (used by the Json method) will ignore it.
I'm answering this despite its age because it is currently the 3rd result on Google for "json.encode circular reference", and because I don't completely agree with the answers above: using the ScriptIgnoreAttribute assumes that nowhere in your code will you want to traverse the relationship in the other direction for some JSON. I don't believe in locking down your model because of one use case.
It did inspire me to use this simple solution.
Since you're working in a View in MVC, you have the Model and simply want to assign it to ViewData.Model within your controller. Go ahead and use a LINQ query within your View to flatten the data, removing the offending circular reference for the particular JSON you want, like this:
var jsonMachines = from m in machineForm
select new { m.X, m.Y, // other Machine properties you desire
Customer = new { m.Customer.Id, m.Customer.Name, // other Customer properties you desire
}};
return Json(jsonMachines);
Or if the Machine -> Customer relationship is 1..* -> * then try:
var jsonMachines = from m in machineForm
select new { m.X, m.Y, // other machine properties you desire
Customers = new List<Customer>(
(from c in m.Customers
select new Customer()
{
Id = c.Id,
Name = c.Name,
// Other Customer properties you desire
}).Cast<Customer>())
};
return Json(jsonMachines);
Based on txl's answer, you have to disable lazy loading and proxy creation; then you can use the normal methods to get your data.
Example:
//Retrieve Items with Json:
public JsonResult Search(string id = "")
{
db.Configuration.LazyLoadingEnabled = false;
db.Configuration.ProxyCreationEnabled = false;
var res = db.Table.Where(a => a.Name.Contains(id)).Take(8);
return Json(res, JsonRequestBehavior.AllowGet);
}
I used to have the same problem. I have created a simple extension method that "flattens" L2E objects into an IDictionary. An IDictionary is serialized correctly by the JavaScriptSerializer. The resulting Json is the same as directly serializing the object.
Since I limit the level of serialization, circular references are avoided. It also will not include 1->n linked tables (Entitysets).
private static IDictionary<string, object> JsonFlatten(object data, int maxLevel, int currLevel) {
var result = new Dictionary<string, object>();
var myType = data.GetType();
var myAssembly = myType.Assembly;
var props = myType.GetProperties();
foreach (var prop in props) {
// Remove EntityKey etc.
if (prop.Name.StartsWith("Entity")) {
continue;
}
if (prop.Name.EndsWith("Reference")) {
continue;
}
// Do not include lookups to linked tables
Type typeOfProp = prop.PropertyType;
if (typeOfProp.Name.StartsWith("EntityCollection")) {
continue;
}
// If the type is from my assembly == custom type
// include it, but flattened
if (typeOfProp.Assembly == myAssembly) {
if (currLevel < maxLevel) {
result.Add(prop.Name, JsonFlatten(prop.GetValue(data, null), maxLevel, currLevel + 1));
}
} else {
result.Add(prop.Name, prop.GetValue(data, null));
}
}
return result;
}
public static IDictionary<string, object> JsonFlatten(this Controller controller, object data, int maxLevel = 2) {
return JsonFlatten(data, maxLevel, 1);
}
My Action method looks like this:
public JsonResult AsJson(int id) {
var data = Find(id);
var result = this.JsonFlatten(data);
return Json(result, JsonRequestBehavior.AllowGet);
}
In the Entity Framework version 4, there is an option available: ObjectContextOptions.LazyLoadingEnabled
Setting it to false should avoid the 'circular reference' issue. However, you will have to explicitly load the navigation properties that you want to include.
see: http://msdn.microsoft.com/en-us/library/bb896272.aspx
Since, to my knowledge, you cannot serialize object references but only copies, you could try employing a bit of a dirty hack that goes something like this:
Customer should serialize its Machine reference as the machine's id
When you deserialize the json code you can then run a simple function on top of it that transforms those id's into proper references.
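A minimal sketch of that idea follows. It is written in plain Java rather than ASP.NET and builds the JSON by hand so that it needs no library: Machine, Customer, toJson and relink are hypothetical names, and the string building assumes names contain no characters that need escaping. The customer is written with a machineId instead of the Machine object, and the reference is restored after reading.

```java
import java.util.HashMap;
import java.util.Map;

class Machine {
    final int id;
    Customer customer;
    Machine(int id) { this.id = id; }
}

class Customer {
    final String name;
    Machine machine; // back-reference: this is what closes the cycle
    int machineId;   // what is actually written to JSON
    Customer(String name) { this.name = name; }
}

class IdReferenceDemo {
    // Writes the customer with the machine's id, never the Machine itself.
    static String toJson(Customer c) {
        return "{\"name\":\"" + c.name + "\",\"machineId\":" + c.machine.id + "}";
    }

    // The machine embeds its customer; the cycle is cut inside toJson(Customer).
    static String toJson(Machine m) {
        return "{\"id\":" + m.id + ",\"customer\":" + toJson(m.customer) + "}";
    }

    // After deserializing, turn the stored id back into an object reference.
    static void relink(Customer c, Map<Integer, Machine> machinesById) {
        c.machine = machinesById.get(c.machineId);
    }

    // Builds a Machine and a Customer that reference each other.
    static Machine sample() {
        Machine m = new Machine(7);
        Customer c = new Customer("Acme");
        m.customer = c;
        c.machine = m;
        c.machineId = m.id;
        return m;
    }

    public static void main(String[] args) {
        Machine m = sample();
        System.out.println(toJson(m));

        // Simulate a freshly deserialized customer that only knows the id.
        Customer copy = new Customer("Acme");
        copy.machineId = 7;
        Map<Integer, Machine> byId = new HashMap<>();
        byId.put(m.id, m);
        relink(copy, byId);
        System.out.println(copy.machine == m);
    }
}
```

With a real JSON library the same effect usually comes from a per-class serialization hook or a dedicated view model, and which one is available depends on the library.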
You need to decide which is the "root" object. Say the machine is the root, then the customer is a sub-object of machine. When you serialise machine, it will serialise the customer as a sub-object in the JSON, and when the customer is serialised, it will NOT serialise its back-reference to the machine. When your code deserialises the machine, it will deserialise the machine's customer sub-object and reinstate the back-reference from the customer to the machine.
Most serialisation libraries provide some kind of hook to modify how deserialisation is performed for each class. You'd need to use that hook to modify deserialisation for the machine class to reinstate the backreference in the machine's customer. Exactly what that hook is depends on the JSON library you are using.
I've had the same problem this week as well, and could not use anonymous types because I needed to implement an interface asking for a List<MyType>. After making a diagram showing all relationships with navigability, I found out that MyType had a bidirectional relationship with MyObject which caused this circular reference, since they both saved each other.
After deciding that MyObject did not really need to know MyType, and thereby making it a unidirectional relationship this problem was solved.
What I have done is a bit radical, but I don't need the properties that cause the circular-reference error, so I set them to null before serializing.
SessionTickets result = GetTicketsSession();
foreach(var r in result.Tickets)
{
r.TicketTypes = null; //those two were creating the problem
r.SelectedTicketType = null;
}
return Json(result);
If you really need your properties, you can create a viewmodel which does not hold circular references, but maybe keeps some Id of the important element, that you could use later for restoring the original value.
