JUNG - How to get the exact vertex in a graph?

I have to create a graph with a self-defined node type, and the nodes and connections are read from a txt file one line at a time.
The file format is: startNode attributes endNode.
Every time I read a line, I create two node objects, startNode and endNode, and add an edge between them.
However, the same startNode may appear on several lines,
e.g. V1 ... V2 ; V1 ... V3
Therefore, I have to check whether my graph already contains the node before I add edges, and I should reuse the vertex already in the graph instead of the newly created node.
Does JUNG have any built-in method to solve this problem?
Or any suggestions?

The short answer is: by contract, JUNG's graph implementations take care of this for you, as long as your custom node/edge objects' implementations of equals() and hashCode() do the Right Thing.
If you try to add a vertex to a graph and it's already present in the graph, the addVertex() method will return false (meaning 'nothing done') just as with the analogous add() method in Set.
Also note that the addEdge() methods will add the connected vertices to the graph for you if they're not already present.
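For illustration, here is a minimal sketch of that contract, assuming JUNG 2.x and a hypothetical MyNode class whose identity is its id string (the usage lines are a fragment, not a full program):

import edu.uci.ics.jung.graph.Graph;
import edu.uci.ics.jung.graph.SparseGraph;

// Hypothetical node type: equals()/hashCode() are based on the id, so two objects
// built from the same token in the file count as the same vertex.
public class MyNode {
    private final String id;
    public MyNode(String id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof MyNode && ((MyNode) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

Graph<MyNode, String> graph = new SparseGraph<>();
graph.addVertex(new MyNode("V1"));                    // true: vertex added
boolean added = graph.addVertex(new MyNode("V1"));    // false: already present
// addEdge() adds missing endpoints for you:
graph.addEdge("V1->V2", new MyNode("V1"), new MyNode("V2"));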

JUNG considers vertices (and edges) distinct as long as they are different objects. If you create two vertex objects with the same properties and your node class does not override equals()/hashCode(), they are treated as different vertices and you will be able to insert both of them into the graph. JUNG itself does not perform a property-based check of whether two vertex objects are "the same"; identity comes entirely from your node class. So if you keep the default identity-based equals(), you need to manually keep track of the vertices (and edges) already in your graph to avoid adding duplicates. You can easily do that with a HashMap (if your graph is not too big).
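As a rough sketch of that manual approach (the load method, the MyNode(String) constructor and the file layout are made up for illustration; no equals() override is needed here because the map guarantees one node object per id):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import edu.uci.ics.jung.graph.Graph;
import edu.uci.ics.jung.graph.SparseGraph;

static Graph<MyNode, String> load(Path file) throws IOException {
    Map<String, MyNode> nodeCache = new HashMap<>();   // one canonical node object per id
    Graph<MyNode, String> graph = new SparseGraph<>();
    for (String line : Files.readAllLines(file)) {
        String[] parts = line.trim().split("\\s+");    // startNode attributes endNode
        MyNode start = nodeCache.computeIfAbsent(parts[0], MyNode::new);
        MyNode end = nodeCache.computeIfAbsent(parts[parts.length - 1], MyNode::new);
        graph.addEdge(parts[0] + "->" + parts[parts.length - 1], start, end);
    }
    return graph;
}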

Related

Extracting subgraph in tensorflow

I have a pretrained network and I'm trying to get just a part of it (a subgraph of the tf graph), along with the variables and a saver object.
This is how I'm doing it:
subgraph = tf.graph_util.extract_sub_graph(default_graph, list of nodes to preserve)
tf.reset_default_graph()
tf.import_graph_def(subgraph)
This, however, removes all variables (when I call reset_default_graph), even if I explicitly add the variable operation nodes (only the "Variable" type operations) to the "list of nodes to preserve".
How can I extract a subgraph of a larger graph while preserving the values of its variables?
Is it a matter of adding some new nodes to the "preserve list"?
The relation between graph nodes and variables is still unclear to me, and the tutorial merely mentions that creating a variable adds some operations (nodes) to the graph.
I think what you are doing looks right. As you said, a Variable is simply an operation (a node in the graph) that outputs a tensor of certain values. You should be able to add the Variable nodes to the list to preserve them, as you have already been doing. Could you use print(sess.graph_def) to make sure the names you provided are correct?

how new traverse api works

Nowadays I am learning the new traversal API of Neo4j, and I followed the link below:
http://neo4j.com/docs/stable/tutorial-traversal-java-api.html
So now I know how to use uniqueness, evaluators, etc.
That is, I know how to change the behaviour of the API.
But what I want to know is how exactly it traverses.
For example, I am trying to find the neighbours of a node.
Does Neo4j use an index to find them?
Does Neo4j keep a hash to find neighbours?
More specifically, consider when I write the following code, for example:
TraversalDescription desc = database.traversalDescription().breadthFirst().evaluator(Evaluators.toDepth(3));
Node node = database.getNodeById(4601410);
Traverser traverser = desc.traverse(node);
In my description I used breadthFirst. So when I give a node to traverse, the code should first find its neighbours. How the API finds those first neighbours is what I want to know. Is there a pointer to the neighbours in the node? So when I say traverse to depth 3, does it find the first neighbours, then take each neighbour as the node in a recursive step, and so on? If so, could traversing to depth 10 be slow?
What I want to know, exactly, is how I can change the natural traversal behaviour of the API.
Simplified: Neo4j stores records representing nodes, relationships and so on in its store. Every node is represented by a node record on disk, and that record contains a pointer (a direct offset into the relationship store) to its first relationship (its first neighbour, if you will). Relationship records link to each other, so getting all neighbours of a node means reading the node record, following its relationship pointer to that first relationship record, and then continuing along the forward pointers until the end of the chain; no index or hash lookup is involved. Does that answer your question?
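In Java API terms, that chain is exactly what you walk when you ask a node for its relationships; a minimal sketch (reusing the database variable and node id from the question, inside a transaction):

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

try (Transaction tx = database.beginTx()) {
    Node node = database.getNodeById(4601410);
    // Follows the stored relationship chain; no index or hash lookup involved.
    for (Relationship rel : node.getRelationships(Direction.OUTGOING)) {
        Node neighbour = rel.getOtherNode(node);
        System.out.println("neighbour: " + neighbour.getId());
    }
    tx.success();
}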
TraversalDescription features the concept of a PathExpander - that is the component that decides which relationships will be used for the next step. Use TraversalDescription.expand() for this.
You can either provide your own PathExpander implementation or use one of the predefined ones in PathExpanders.
If you just want your traversal to follow specific relationship types, you can use TraversalDescription.relationships() to specify those, as in the sketch below.
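Here is a hedged sketch of both variants (the relationship type KNOWS is made up for illustration; database is the GraphDatabaseService from the question):

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.PathExpanders;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.traversal.Evaluators;
import org.neo4j.graphdb.traversal.TraversalDescription;

enum RelTypes implements RelationshipType { KNOWS }

// Variant 1: only follow a specific relationship type/direction.
TraversalDescription byType = database.traversalDescription()
        .breadthFirst()
        .relationships(RelTypes.KNOWS, Direction.OUTGOING)
        .evaluator(Evaluators.toDepth(3));

// Variant 2: plug in a predefined expander that decides the next step.
TraversalDescription byExpander = database.traversalDescription()
        .breadthFirst()
        .expand(PathExpanders.forDirection(Direction.OUTGOING))
        .evaluator(Evaluators.toDepth(3));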

neo4j spatial contain search

I'm trying to develop a web service that gives me back the name of the administrative area containing a given GPS position.
I have already developed a Java application that inserts some polygons (administrative areas of my country) into Neo4j using the spatial plugin and the Java API. Then, given a GPS position, I'm able to get the name of the polygon that contains it.
Now I'm trying to do the same using the REST API of Neo4j (instead of the Java API), but I'm not able to find any examples.
So my questions are:
1) Is it possible to insert polygons into Neo4j using the REST API (if I understood correctly, it is possible using the WKT format)?
2) Is it possible to execute a spatial query that finds all polygons containing a given GPS position?
Thanks, Enrico
The answer to both of your questions is yes. Here are example steps that use REST and Cypher.
1) Create your spatial layer and index (REST). In this example, my index is named 'test' (a layer of the same name and base spatial nodes will be created), and the name of the property on my nodes that will contain the wkt geometry information is 'wkt'.
POST http://localhost:7474/db/data/index/node {"name":"test", "config":{"provider":"spatial", "wkt":"wkt"}}
2) Create a node (Cypher). You can have labels and various properties. The only part that Neo4j Spatial cares about is the 'wkt' property. (You could do this step with REST.)
CREATE (n { name : "Fooville", wkt : "POLYGON((11.0 11.0, 11.0 12.0, 12.0 12.0, 12.0 11.0, 11.0 11.0))" })
3) Add the node to the layer. You can do this by adding the node to the index or to the layer, but there is an important difference. If you add it to the index, a copy node containing only the geometry data will be created, and that will be added to the layer. Querying via Cypher will return your original node, but querying via REST or Java will return the copy node. If you add the node directly to the layer, then you must take an extra step if you want to be able to query with Cypher later. In both cases you will need the URI of the node, the last element of which is the Neo4j node number. In the example below, I assume the node number is 4 (which it will be if you do this example on a fresh, empty database).
Method 1:
POST http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addNodeToLayer { "layer":"test", "node":"http://localhost:7474/db/data/node/4" }
To make this node searchable via Cypher, add the node number to the node as a user 'id' property. (You could do this with REST.)
START n = node(4) SET n.id = id(n)
Method 2: Using this method will double your node count, double your WKT storage, and produce differing results when querying via REST vs Cypher.
POST http://localhost:7474/db/data/index/node/test {"value":"dummy","key":"dummy","uri":"http://localhost:7474/db/data/node/4"}
4) Run your query. You can do a query in REST or Cypher (assuming you conditioned the nodes as described above). The Cypher queries available are: 'withinDistance', 'withinWKTGeometry', and 'bbox'. The REST queries available are: 'findGeometriesWithinDistance', 'findClosestGeometries', and 'findGeometriesInBBox'. It's interesting to note that only Cypher allows you to query for nodes within a WKT geometry. There's also a difference in REST between findGeometriesWithinDistance and findClosestGeometries that I don't yet understand, even though the arguments are the same. To see how to make the REST calls, you can issue these commands:
POST http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance
POST http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findClosestGeometries
POST http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesInBBox
The Cypher queries are: (replace text between '<>', including the '<>', with actual values)
START n = node:<layer>("withinDistance:[<y>, <x>, <max distance in km>]")
START n = node:<layer>("withinWKTGeometry:POLYGON((<x1> <y1>, ..., <xN> <yN>, <x1> <y1>))")
START n = node:<layer>("bbox:[<min x>, <max x>, <min y>, <max y>]")
I have assumed in all of this that you are using a longitude/latitude coordinate reference system (CRS), so x is longitude and y is latitude. (This preserves a right-handed coordinate system in which z is up.)
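If you want to drive these REST calls from Java rather than curl, here is a minimal sketch using java.net.HttpURLConnection (the URL and JSON body are taken from step 1 above; error handling and authentication are omitted):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateSpatialIndex {
    public static void main(String[] args) throws Exception {
        // POST the payload from step 1 to create the spatial index named 'test'.
        URL url = new URL("http://localhost:7474/db/data/index/node");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = "{\"name\":\"test\", \"config\":{\"provider\":\"spatial\", \"wkt\":\"wkt\"}}";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}

The same pattern applies to the addNodeToLayer and findGeometriesWithinDistance calls above; only the URL and JSON body change.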

Neo4j: Java API to compute intersection multiple properties

I'm very new to Neo4j and have a question regarding computing intersections of node sets.
Let's suppose I have three properties A, B, C and I want to select only the nodes that have all three properties.
I created an index for the properties, so I can get all nodes having one of the properties. However, afterwards I have to merge the IndexHits. Is there a way to directly select all nodes having all three properties?
My second idea was to create a node for each property and connect the other nodes to it with relationships. I can then iterate over the relationships and get, for each property, the list of connected nodes. But again, I have to compute the intersection afterwards.
Is there a function I am missing here? I suppose it's a standard problem.
Thanks a lot,
Benny
Do you also have the values you are looking for? You would start with the property that limits the number of found nodes the most.
MATCH (a:Label {property1:{value1}})
WHERE a.property2 = {value2} AND a.property3 = {value3}
RETURN a
For the Java API and Lucene indexes (note that Lucene's default operator is OR, so make the AND explicit):
gdb.index().forNodes("foo").query("p1:value1 AND p2:value2 AND p3:value3")
See the Lucene query syntax documentation for details.
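A rough Java sketch of that index lookup (the index name "foo" and the property/value pairs follow the snippet above; gdb is assumed to be your GraphDatabaseService):

import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.IndexHits;

try (Transaction tx = gdb.beginTx()) {
    // One Lucene query that requires all three property/value pairs.
    IndexHits<Node> hits = gdb.index().forNodes("foo")
            .query("p1:value1 AND p2:value2 AND p3:value3");
    for (Node n : hits) {
        System.out.println(n.getId());
    }
    hits.close();   // release the underlying index resources
    tx.success();
}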

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running using it; it's all working really well and Neo4j has been really good at solving this problem. But now I need to extend the app and have been trying to implement a key-value list of data in Neo4j, and I'm not sure of the best way to go about it.
I have a list around 7 million elements long, so it is a bit too long to just store whole in memory and manage myself. I tested this and it would consume 3 GB.
My choices are:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. I'm a little averse to doing this, as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j, creating a node per key-value pair with the key and value as properties on the node; there is no relationship, other than an index grouping these nodes together, with the key of the key-value pair exposed as the key in the index.
(c) Create a single node and hold all key-value pairs as properties. This feels wrong, because getting the node will load the whole thing into memory, and then I'd have to search the properties for the one I'm interested in; I might as well manage the list myself.
(d) The key is a two-part key that actually points to two nodes, so create a relationship between them and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it wasn't simple and fast, so I backed away from it.
Either option 'a' or 'b' feels like the way to go.
Any advice would be appreciated.
Example scenario
We have node A and node B, with a relationship between the two nodes.
All nodes have a property 'foo', with foo holding some value.
In this example, node A has foo=X and node B has foo=Y.
We then have this list of key-value pairs. One of those pairs is Key:X+Y = Value:Z.
So the original idea was to create another relationship between node A and node B and store a property on that relationship holding Z, then create an index on 'foo' and a relationship index on the new relationship.
When given key X+Y, get the value.
The lookup logic would be: get node A (from X) and node B (from Y), then walk through node A's relationships to node B looking for this new relationship type.
While this will work, I don't like that I have to scan through all relationships to/from the nodes looking for a specific type; this is inefficient, especially if there are many relationships of different types.
So the conclusion is to go with option 'a' or 'b', or else I'm trying to do something impractical with Neo4j.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the IDs of vertices a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If vertex a and b are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop
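For completeness, since the app itself is on the Neo4j Java API, here is a rough sketch of the edge-property lookup discussed in the question, restricted to a single relationship type so that only matching relationships are scanned (the index name 'foo' and the relationship type HAS_VALUE are made up for illustration; db is your GraphDatabaseService):

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

enum RelTypes implements RelationshipType { HAS_VALUE }

// Look up the value Z for key X+Y: find node A by foo=X and node B by foo=Y,
// then follow only HAS_VALUE relationships from A and stop at the one ending at B.
try (Transaction tx = db.beginTx()) {
    Node a = db.index().forNodes("foo").get("foo", "X").getSingle();
    Node b = db.index().forNodes("foo").get("foo", "Y").getSingle();
    Object value = null;
    for (Relationship rel : a.getRelationships(RelTypes.HAS_VALUE, Direction.OUTGOING)) {
        if (rel.getEndNode().equals(b)) {
            value = rel.getProperty("value");
            break;
        }
    }
    tx.success();
}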
