Using JUNG efficiently with TinkerPop3

I am using TinkerPop3 in Java and I have a weighted multi-graph on which I would like to run Dijkstra's algorithm to find the shortest weighted path between two vertices. Other questions suggest that the recommended way to do this is to use JUNG with TinkerPop, but they relate to TinkerPop2, which had JungGraph as part of Blueprints.
My question is whether there is any efficient way to use JUNG on TinkerPop3 graphs, as the only way I have found so far is to create a new JUNG graph and iteratively add all edges from my TinkerPop3 graph to it. Alternative suggestions to JUNG are also welcome.

I am not familiar with TinkerPop or its data model. In general you have two basic ways to provide an instance of B given an instance of A, assuming that B and A are reasonably compatible:
copy: create an instance of B, iterate over the elements of A and copy them into B (this is your current solution; a sketch of it appears below)
view: create a class that implements B's interface and forwards its method calls to the appropriate methods of A. You can do this by implementing the appropriate interface (or extending the appropriate abstract class).
Assuming you're using JUNG 2.x, you could extend the Abstract[Typed]Graph class. You may find it useful to look at the GraphDecorator class to see an example of this sort of delegation (in that case the class being delegated to is an instance of Graph, but it should be straightforward to adapt the model to delegate to TinkerPop if that model has appropriate methods).
Note: the JUNG data model used in v2.x is being replaced with the Guava common.graph data model in JUNG v3.x. The same basic ideas apply, however.
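As a point of reference, a minimal sketch of the copy approach against TinkerPop3 and JUNG 2.1 might look like the following; the "weight" edge property name and the use of Guava's Function for edge weights are assumptions, so adjust them to your schema and JUNG version:

import java.util.List;

import com.google.common.base.Function;

import org.apache.tinkerpop.gremlin.structure.Edge;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.Vertex;

import edu.uci.ics.jung.algorithms.shortestpath.DijkstraShortestPath;
import edu.uci.ics.jung.graph.DirectedSparseMultigraph;

public class TinkerPopToJung {

    // Copy every vertex and edge of the TinkerPop3 graph into a JUNG multigraph,
    // reusing the TinkerPop elements themselves as JUNG vertices/edges.
    static DirectedSparseMultigraph<Vertex, Edge> copy(Graph source) {
        DirectedSparseMultigraph<Vertex, Edge> jung = new DirectedSparseMultigraph<>();
        source.vertices().forEachRemaining(jung::addVertex);
        source.edges().forEachRemaining(e -> jung.addEdge(e, e.outVertex(), e.inVertex()));
        return jung;
    }

    // Run JUNG's Dijkstra, reading the weight from a "weight" edge property (assumed name).
    static List<Edge> shortestWeightedPath(DirectedSparseMultigraph<Vertex, Edge> jung,
                                           Vertex from, Vertex to) {
        Function<Edge, Double> weights = e -> ((Number) e.value("weight")).doubleValue();
        DijkstraShortestPath<Vertex, Edge> dijkstra = new DijkstraShortestPath<>(jung, weights);
        return dijkstra.getPath(from, to);
    }
}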

Related

Neo4j data modeling: correct way to specify a source for a statement?

I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to run operations in both directions, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had is to use the DOI of studies as unique identifiers in the relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.
So, since it is impossible to link a study directly to a relationship, I had the idea of making two links, one at each end of the relationship. The obvious con is that this gives no information about the nature of the relationship, like "support" or "contradict".
So I came to the conclusion that I need a dedicated node for the hypothesis itself.
However, this overloads the graph, and we are no longer in the classical (node)-[relationship]->(node) design that makes property graphs so easy to understand.
Using RDF it is possible to add properties to relationships using subgraphs, but that takes us into semantic graphs and quad stores, which are more complex tools.
So I'm wondering whether there is a "correct" design pattern in Neo4j for this kind of need that I may not have thought of.
Thanks
Based on your requirements, I think putting support_study as a property of the edge will do the job.
We could then write the queries as follows:
Find all statements supported by a study
MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement
Given the statement: "foo" is caused by "bar"
MATCH (v:disease{name: "foo"})-[e:has_cause]->(v1:sympton{name: "bar"})
RETURN DISTINCT e.support_study;
Note that this answer is mostly based on NebulaGraph, where:
It speaks Cypher DQL (together with nGQL)
It supports properties on edges
It uses a 4-tuple (src, dst, edge_type, rank), rather than a single key, to distinguish an edge, where rank is a design unique to NebulaGraph that enables multiple has_cause edge instances between one disease -> sympton pair; you could put a hash of the DOI or another number in the rank field (or omit it, in which case it defaults to 0)
It's distributed and open source (Apache 2.0)
Note:
In NebulaGraph, indexes should be created on has_cause(support_study) and disease(name); see https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But I think the same applies to Neo4j, too :)
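For Neo4j specifically, a minimal sketch of running the first query ("find all statements supported by a study") from Java with the 4.x driver could look like this; the connection details and the DOI value are placeholders:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class SupportedStatements {
    public static void main(String[] args) {
        // Placeholder connection details; adjust to your deployment.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // All has_cause relationships carrying the given study DOI.
            Result result = session.run(
                    "MATCH (d:disease)-[e:has_cause {support_study: $doi}]->(s:sympton) "
                            + "RETURN d.name AS disease, s.name AS sympton",
                    Values.parameters("doi", "doi_foo_bar"));
            result.forEachRemaining(record ->
                    System.out.println(record.get("disease").asString()
                            + " -> " + record.get("sympton").asString()));
        }
    }
}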

Using ontology to infer labels for process model

I'm trying to implement a specific type of process mining that has been presented in this thesis [link]. It is based on HMMs and generates a process model in the form of a directed graph, where:
Nodes are called intentions and correspond to hidden states
Edges are called strategies and consist of different activities
These activities correspond to the HMM's observable emissions
Intentions can be fulfilled using different strategies
A user event log consisting of user IDs, timestamps and activities is used as input. The image below is an example of such a process model; the highlighted nodes and edges represent the path predicted by the Viterbi algorithm.
You can see that the graph's nodes and edges only carry numeric labels, which merely distinguish the different strategies and intentions. In order to make these labels more meaningful to a human reader, I'd like to infer some suitable labels.
My idea is to use an ontology to obtain those labels. After some research I figured out that I probably needed to do something that is generally referred to as "ontology learning". For this I would need to create some axioms in RDF/OWL format and then use these as input for a reasoner, that would infer an ontology.
Is this approach correct and reasonable to achieve my goal?
If this is the way to go, I will need some tool to generate axioms in an automated way. So far I couldn't find any tool that would do that completely out of the box. Based on what I've seen so far, I conclude that I would need to define some kind of mapping between the original data and the desired axioms. I took a closer look at Protégé, which offers a plugin for spreadsheets. It seems to be based on the MappingMasterDSL project [link].
I've also found an interesting paper [link] on ontology learning where an RNN-based model is trained in an end-to-end fashion to translate definitory sentences into OWL formulae. BUT: my user event log data does not contain any natural-language sentences; its activities are defined by tokens derived from HTML elements of the user interface. Therefore the RNN-based approach does not seem applicable here. (For the interested reader, the related project can be found here [link].)
Isn't there really any easier way than hand-crafting the axioms' schema(ta)?
Assuming that I have created my axioms and inferred an ontology, I would like to use the strategies' (edges') observable activities (emissions) to infer a suitable label. I guess I would need to query my ontology somehow. I could use the activity names as parameters for my query and look for some related entities that reveal the desired label. I'm expecting something like:
"I have a strategy with ID=3, that strategy can be executed with
actions a, b and c, give me all entities of the ontology, that
have these actions as property value, and give me all related
labels for those entities"
But where would the data for the labels actually come from?
I think I'm missing some important step during the process of ontology learning. Where do I find an additional data source for the labels and how do I relate this data to my ontology's entities?
Also I'm wondering if there is a way to incorporate the inherent knowledge of the process model's topology into my ontology.
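For what it's worth, if the ontology ends up in RDF/OWL, the kind of lookup sketched in the quote above could be expressed as a SPARQL query run through Apache Jena; in the sketch below the ex:hasAction property, the example namespace and the ontology file name are pure assumptions for illustration:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class StrategyLabelLookup {
    public static void main(String[] args) {
        // Load the ontology; "ontology.ttl" is a placeholder file name.
        Model model = ModelFactory.createDefaultModel();
        model.read("ontology.ttl");

        // Find entities that have the observed activities as property values
        // and return their human-readable labels. ex:hasAction is an assumed property.
        String sparql =
            "PREFIX ex: <http://example.org/process#> " +
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT DISTINCT ?entity ?label WHERE { " +
            "  ?entity ex:hasAction ?action . " +
            "  FILTER(?action IN (\"a\", \"b\", \"c\")) " +
            "  ?entity rdfs:label ?label . " +
            "}";

        try (QueryExecution qexec = QueryExecutionFactory.create(sparql, model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("entity") + " -> " + row.get("label"));
            }
        }
    }
}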

Neo4j entity relationship diagram

How can I extract an entity relationship diagram from a graph database? I have all the required files that were created by my application.
You can use call db.schema for a graph representation of the graph data model. There are a few other procedures to get the properties, keys, indexes, etc., like call db.indexes, call db.propertykeys, etc.
The APOC procedure library has a few relevant functions that might help to get a tabular layout - or develop it yourself in Excel from the labels, property keys, etc.
You can also build a data model using the Arrows tool
Please reorient your thinking to use graph terms - the equivalent for the ER diagram would be a model built using the Arrows tool or the db.schema.
I used CALL db.schema.visualization to visualize the database schema. As https://stackoverflow.com/a/45357049/7924573 already said, in graph databases this is as close as you can get to an ER diagram. In the remote interface you can export it directly as, e.g., an .svg graphic.

Querying in Gremlin using multiple indices

I am trying to optimize requests in Gremlin on a Neo4J graph.
Here is the short version of the basic request I am using:
g.idx("myIndex")[[myId:5]].outE("HAS_PRODUCT").filter{it.shop_id:5}.inV
So I looked into indexation and got to create an index on "HAS_PRODUCT"-typed edges with key 'shop_id'.
Using the same request, I don't see a big difference.
My questions are:
Is my new index used when I query with filter{it.shop_id == 5}?
If not, how can I use this new index in my request?
More generally, if idx() is the graph method for using an index, is there a corresponding pipe method?
Thanks!
The short answer is that Gremlin won't make use of the secondary index when using Neo4j, but please consider reading the longer answer below in relation to TinkerPop, Gremlin and its philosophy.
The longer answer is: indices are not being used for your shop_id. When you call outE you are effectively iterating all the edges to find those with shop_id == 5. To make use of indices in Gremlin you should use a vertex query. So, rewriting your code a bit (to also use key indices) would look like:
g.V('myIndex',5).outE('HAS_PRODUCT').has('shop_id',5).inV
With Blueprints implementations that support vertex-centric indices, the use of has will utilize that index automatically. Unfortunately, Neo4j is not one of those databases yet. Blueprints implementations that do support it include Titan (see Vertex-centric indices) and OrientDB (as part of the yet-unreleased Blueprints 2.4.0; I believe they will have a partial implementation in that release).
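For reference, the same traversal written with the TinkerPop2 Java API (GremlinPipeline) might look roughly like the sketch below; it assumes a key index on the vertex property myId, and the property and label names are taken from the question:

import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.gremlin.java.GremlinPipeline;

public class ShopProducts {
    // Start from vertices found via the key index on "myId", follow HAS_PRODUCT
    // edges whose shop_id property equals 5, and end at the product vertices.
    public static Iterable<Vertex> productsForShop(Graph graph) {
        return new GremlinPipeline<Vertex, Vertex>(graph.getVertices("myId", 5))
                .outE("HAS_PRODUCT")
                .has("shop_id", 5)
                .inV();
    }
}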

How can I specify which relationship type to use as a function of the current node at every step of a traversal with neo4j?

I'd like to traverse my graph using the neo4j traversal API, but I need to be able to specify which relationship type to use at every step, and the relationship type to use needs to be a function of the current node. Is there a way to do this?
In the current Traverser API you can't choose the exact relationship to traverse. Instead, you take the more granular approach of node.getRelationships(), choose the one you want and the end node on it, and so on.
The algorithm gets a bit more verbose than using the Traverser, but it gives you more flexibility. For a tinkering approach, Gremlin supports the notion of functions for choosing edges to traverse, see here. This will soon be implemented using Blueprints Pipes for Java-level performance.
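As an illustration of that manual approach, here is a minimal sketch over the embedded Neo4j Java API (a recent version; the "kind" property and the relationship type names are invented for the example), where the relationship type to follow is recomputed from the current node at every step:

import java.util.ArrayList;
import java.util.List;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;

public class NodeDependentWalk {

    // Decide which relationship type to follow from this node.
    // The "kind" property is only an illustration of "a function of the current node".
    static RelationshipType typeFor(Node node) {
        String kind = (String) node.getProperty("kind", "default");
        return RelationshipType.withName("order".equals(kind) ? "CONTAINS" : "RELATES_TO");
    }

    // Walk outgoing relationships, re-deciding the type at every step.
    // Must be called inside an open transaction.
    static List<Node> walk(Node start, int maxDepth) {
        List<Node> path = new ArrayList<>();
        Node current = start;
        path.add(current);
        for (int depth = 0; depth < maxDepth; depth++) {
            RelationshipType type = typeFor(current);
            Relationship next = null;
            for (Relationship rel : current.getRelationships(Direction.OUTGOING, type)) {
                next = rel; // pick the first match; plug in your own selection logic here
                break;
            }
            if (next == null) {
                break;
            }
            current = next.getEndNode();
            path.add(current);
        }
        return path;
    }
}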
HTH
/peter neubauer

Resources