Coding a type of random walk in Neo4j using the Traversal Framework - neo4j

I'm currently working on a graph where nodes are connected via probabilistic edges. The weight on each edge is the probability that the edge exists.
Here is an example graph to get you started:
(A)-[0.5]->(B)
(A)-[0.5]->(C)
(B)-[0.5]->(C)
(B)-[0.3]->(D)
(C)-[1.0]->(E)
(C)-[0.3]->(D)
(E)-[0.3]->(D)
I would like to use the Neo4j Traversal Framework to traverse this graph starting from (A) and return the number of nodes that have been reached based on the probability of the edges found along the way.
Important:
Each node that is reached can only be counted once. If (A) reaches (B) and (C), then (C) need not try to reach (B). On the other hand, if (A) fails to reach (B) but reaches (C), then (C) will attempt to reach (B).
Likewise, if (B) reaches (C), (C) will not try to reach (B) again.
This is a discrete time-step process: a node only attempts to reach a given neighboring node once.
To test whether an edge exists (i.e. whether we traverse it), we can generate a random number and check whether it is smaller than the edge weight.
I have already coded part of the traversal description as follows. (Here it is possible to start from multiple nodes but that is not necessary to solve the problem.)
TraversalDescription traversal = db.traversalDescription()
        .breadthFirst()
        .relationships( Rels.INFLUENCES, Direction.OUTGOING )
        .uniqueness( Uniqueness.NODE_PATH )
        // Note: uniqueness() sets a single policy, so this call replaces NODE_PATH above
        .uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
        .evaluator(new Evaluator() {
            // One random number generator for the whole traversal
            private final Random rnd = new Random();

            @Override
            public Evaluation evaluate(Path path) {
                // Get current node
                Node curNode = path.endNode();
                // If the current node is a start node, it has no previous relationship:
                // just add it to the result and keep traversing
                if (startNodes.contains(curNode)) {
                    return Evaluation.INCLUDE_AND_CONTINUE;
                }
                // Otherwise, get the relationship we arrived through
                Relationship curRel = path.lastRelationship();
                // Get a random number in [0, 1)
                double rndNum = rnd.nextDouble();
                // Traverse only if the relationship weight "wc" is greater than the random number
                if (rndNum < (double) curRel.getProperty("wc")) {
                    Node prevNode = curRel.getOtherNode(curNode);
                    System.out.println("(" + prevNode.getProperty("name") + ")-[" + curRel.getProperty("wc") + "]->"
                            + "(" + curNode.getProperty("name") + ") :" + rndNum);
                    // Keep node and keep traversing
                    return Evaluation.INCLUDE_AND_CONTINUE;
                }
                // Don't save node in result and stop traversing this branch
                return Evaluation.EXCLUDE_AND_PRUNE;
            }
        });
I keep track of the number of nodes reached like so:
long score = 0;
for (Node currentNode : traversal.traverse( nodeList ).nodes())
{
System.out.print(" <" + currentNode.getProperty("name") + "> ");
score += 1;
}
The problem with this code is that, although NODE_PATH is set, there may still be cycles, which I don't want.
Therefore, I would like to know:
Is there a solution to avoid cycles and count exactly the number of nodes reached?
And ideally, is it possible (or better) to do the same thing using PathExpander, and if yes how can I go about coding that?
Thanks

This certainly isn't the best answer.
Instead of iterating on nodes() I iterate on the paths, and add the endNode() to a set and then simply get the size of the set as the number of unique nodes.
HashSet<String> nodes = new HashSet<>();
for (Path path : traversal.traverse(nodeList))
{
Node currNode = path.endNode();
String val = String.valueOf(currNode.getProperty("name"));
nodes.add(val);
System.out.println(path);
System.out.println("");
}
score = nodes.size();
Hopefully someone can suggest a more optimal solution.
I'm still surprised, though, that NODE_PATH did not prevent cycles from forming.
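For comparison, here is a framework-free sketch of the spreading rules from the question (close to what is often called the independent cascade model): each newly reached node tries each of its outgoing edges exactly once, an already-reached node is never tried again, and a node that one neighbour failed to reach can still be reached by another. It does not use the Traversal Framework or PathExpander; the class `ProbabilisticReach`, its methods, and the in-memory map standing in for the graph are hypothetical. The score is simply the size of the `reached` set, so no node can be counted twice.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical in-memory stand-in for the graph; this is not the Neo4j API.
class ProbabilisticReach {
    // from -> (to -> probability that the edge exists, like the "wc" property)
    private final Map<String, Map<String, Double>> out = new LinkedHashMap<>();

    void addEdge(String from, String to, double p) {
        out.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, p);
    }

    // Breadth-first spread in discrete steps: nodes reached at step t try
    // their edges before nodes reached at step t + 1.
    Set<String> reach(List<String> starts, Random rnd) {
        Set<String> reached = new LinkedHashSet<>(starts);
        Deque<String> frontier = new ArrayDeque<>(starts);
        while (!frontier.isEmpty()) {
            String cur = frontier.poll();
            Map<String, Double> edges = out.getOrDefault(cur, Map.of());
            for (Map.Entry<String, Double> e : edges.entrySet()) {
                String next = e.getKey();
                // Reached nodes are skipped (counted once, no cycles back into them);
                // otherwise the edge "exists" if the random draw is below its weight.
                if (!reached.contains(next) && rnd.nextDouble() < e.getValue()) {
                    reached.add(next);
                    frontier.add(next);
                }
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        ProbabilisticReach g = new ProbabilisticReach();
        g.addEdge("A", "B", 0.5);
        g.addEdge("A", "C", 0.5);
        g.addEdge("B", "C", 0.5);
        g.addEdge("B", "D", 0.3);
        g.addEdge("C", "E", 1.0);
        g.addEdge("C", "D", 0.3);
        g.addEdge("E", "D", 0.3);
        Set<String> reached = g.reach(List.of("A"), new Random());
        System.out.println(reached + " -> score " + reached.size());
    }
}
```

Passing `new Random(seed)` makes a run reproducible; averaging `reached.size()` over many runs gives an estimate of the expected score.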

Related

Merge two sorted linked lists: space complexity

I am looking at the following GeeksforGeeks problem:
Given two sorted linked lists consisting of N and M nodes respectively, the task is to merge both lists (in place) and return the head of the merged list.
Example 1
Input:
N = 4, M = 3
valueN[] = {5,10,15,40}
valueM[] = {2,3,20}
Output: 2 3 5 10 15 20 40
Explanation: After merging the two linked lists, the merged list is 2, 3, 5, 10, 15, 20, 40.
The answer below is the GFG solution. I don't understand how its space complexity is O(1): we are creating a new node, so shouldn't it be O(m+n)?
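Node* sortedMerge(Node* head1, Node* head2)
{
    // Temporary dummy node: its next pointer will hold the head of the merged list
    Node* dummy = new Node(0);
    Node* tail = dummy;
    while (true) {
        // One list is exhausted: append the rest of the other one
        if (head1 == NULL) {
            tail->next = head2;
            break;
        }
        else if (head2 == NULL) {
            tail->next = head1;
            break;
        }
        // Relink the smaller front node onto the tail (no new nodes are allocated)
        if (head1->data <= head2->data) {
            tail->next = head1;
            head1 = head1->next;
        }
        else {
            tail->next = head2;
            head2 = head2->next;
        }
        tail = tail->next;
    }
    Node* head = dummy->next;
    delete dummy;  // free the dummy node so it does not leak
    return head;
}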
Could someone explain how the space complexity is O(1) here?
Why should it be O(m+n) when it creates only one node? The size of that node is constant, so one node represents O(1) space. Creating that one node has nothing to do with the size of either input list; note that it is created outside the loop. Every other node in the merged list is an existing input node that is relinked through its next pointer, not copied.
It is actually done this way to keep the code simple, but the merge could be done even without that dummy node.
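As a sketch of that last remark, here is the same in-place merge without a dummy node, written in Java; the class `MergeNoDummy` and the helpers `fromArray` and `render` are hypothetical names chosen for the example. The head is picked before the loop, and the merge itself uses only a handful of references, so its extra space stays O(1) regardless of N and M.

```java
class MergeNoDummy {
    static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Merges two sorted lists by relinking their existing nodes (no copies).
    static Node sortedMerge(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        // Choose the head up front instead of hanging it off a dummy node
        Node head;
        if (a.data <= b.data) {
            head = a;
            a = a.next;
        } else {
            head = b;
            b = b.next;
        }
        Node tail = head;
        while (a != null && b != null) {
            if (a.data <= b.data) {
                tail.next = a;
                a = a.next;
            } else {
                tail.next = b;
                b = b.next;
            }
            tail = tail.next;
        }
        // One list is exhausted: append whatever remains of the other
        tail.next = (a != null) ? a : b;
        return head;
    }

    // Builds a list from values (the only place nodes are allocated)
    static Node fromArray(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            Node n = new Node(values[i]);
            n.next = head;
            head = n;
        }
        return head;
    }

    // Space-separated values, for printing
    static String render(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node merged = sortedMerge(fromArray(5, 10, 15, 40), fromArray(2, 3, 20));
        System.out.println(render(merged)); // 2 3 5 10 15 20 40
    }
}
```

The price of dropping the dummy node is the duplicated comparison before the loop; the dummy-node version avoids that special case.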

How to describe the Possible Routes between 2 Nodes using Pseudo code

I'm trying to develop an algorithm in pseudocode to display all possible routes between two nodes. I have looked at Dijkstra's algorithm, but I'm having some difficulty creating an algorithm using only pseudocode.
Possible Routes Between node 7 and 5
I have identified all the possible routes (without passing through the same node twice):
7 -> 4 -> 5
7 -> 6 -> 2 -> 1 -> 8 -> 5
7 -> 6 -> 4 -> 5
7 -> 6 -> 2 -> 1 -> 3 -> 5
Set of nodes: 1, 2, 3, 4, 5, 6, 7, 8
Links between nodes: 1+2, 1+3, 1+8, 2+6, 3+5, 4+5, 4+6, 4+7, 5+8, 6+7.
Using DFS: the idea is to do a depth-first traversal of the given directed graph. Start the traversal from the source. Keep storing the visited vertices in an array, say path[]. If we reach the destination vertex, print the contents of path[]. The important thing is to also mark the vertices in path[] as visited, so that the traversal doesn't go in a cycle.
Java Implementation:
// Prints all paths from 's' to 'd'.
// Assumes the fields int v (vertex count) and List<Integer>[] adjList
// (adjacency lists) are defined elsewhere in the class.
public void printAllPaths(int s, int d)
{
    boolean[] isVisited = new boolean[v];
    List<Integer> pathList = new ArrayList<>();
    // add source to path[]
    pathList.add(s);
    // call recursive utility
    printAllPathsUtil(s, d, isVisited, pathList);
}

// A recursive function to print all paths from 'u' to 'd'.
// isVisited[] keeps track of vertices in the current path;
// localPathList stores the actual vertices in the current path.
private void printAllPathsUtil(int u, int d,
                               boolean[] isVisited,
                               List<Integer> localPathList)
{
    if (u == d)
    {
        System.out.println(localPathList);
        // a simple path cannot continue past the destination
        return;
    }
    // mark the current node
    isVisited[u] = true;
    // recur for all the vertices adjacent to the current vertex
    for (int i : adjList[u])
    {
        if (!isVisited[i])
        {
            // store current node in path[]
            localPathList.add(i);
            printAllPathsUtil(i, d, isVisited, localPathList);
            // remove current node from path[] (by index, the last element)
            localPathList.remove(localPathList.size() - 1);
        }
    }
    // unmark the current node so other paths can use it
    isVisited[u] = false;
}
You can find other approaches here: https://efficientcodeblog.wordpress.com/2018/02/15/finding-all-paths-between-two-nodes-in-a-graph/
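Since the Java above relies on fields (v, adjList) defined elsewhere, here is a self-contained sketch of the same depth-first search with backtracking, run on the graph from the question with the links treated as undirected. The class `AllPaths` and its methods are hypothetical. Read as pseudocode, the core is: if the current node is the destination, record the path; otherwise mark it as on the path, recurse into every unmarked neighbour, then unmark it.

```java
import java.util.ArrayList;
import java.util.List;

class AllPaths {
    // Adjacency lists indexed by node number (index 0 unused for 1-based labels)
    private final List<List<Integer>> adj = new ArrayList<>();

    AllPaths(int maxNode) {
        for (int i = 0; i <= maxNode; i++) {
            adj.add(new ArrayList<>());
        }
    }

    // The links in the question have no direction, so store both ways
    void link(int a, int b) {
        adj.get(a).add(b);
        adj.get(b).add(a);
    }

    // Returns every simple path (no repeated node) from s to d
    List<List<Integer>> allPaths(int s, int d) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> path = new ArrayList<>();
        path.add(s);
        dfs(s, d, new boolean[adj.size()], path, result);
        return result;
    }

    private void dfs(int u, int d, boolean[] onPath,
                     List<Integer> path, List<List<Integer>> result) {
        if (u == d) {
            result.add(new ArrayList<>(path)); // copy, because path is reused
            return;
        }
        onPath[u] = true;
        for (int next : adj.get(u)) {
            if (!onPath[next]) {
                path.add(next);
                dfs(next, d, onPath, path, result);
                path.remove(path.size() - 1); // backtrack
            }
        }
        onPath[u] = false; // free u for other routes
    }

    public static void main(String[] args) {
        AllPaths g = new AllPaths(8);
        int[][] links = {{1, 2}, {1, 3}, {1, 8}, {2, 6}, {3, 5},
                         {4, 5}, {4, 6}, {4, 7}, {5, 8}, {6, 7}};
        for (int[] l : links) {
            g.link(l[0], l[1]);
        }
        for (List<Integer> p : g.allPaths(7, 5)) {
            System.out.println(p);
        }
    }
}
```

On this edge list the search finds six routes: the four listed in the question plus 7 -> 4 -> 6 -> 2 -> 1 -> 3 -> 5 and 7 -> 4 -> 6 -> 2 -> 1 -> 8 -> 5.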

Reactive way of implementing 'standard pagination'

I am just starting with Spring Reactor and want to implement something that I would call 'standard pagination'; I don't know if there is a technical term for this. Basically, no matter what start and end dates are passed to the method, I want to return the same amount of data, evenly distributed.
This will be used for some chart drawing in the future.
I figured out a rough draft of an algorithm that does exactly that; unfortunately, before I can filter the results I need to either count() or take the last index() and block to get this number.
This block is surely not the reactive way to do this, and it also makes the Flux query the DB twice (or am I missing something?).
Is there an operator that can help me pass the result of count() down the stream for further use? It would need to be computed anyway before the stream can be processed, but I want to avoid calling the DB twice.
I am using mongoDB reactive driver.
Flux<StandardEntity> results = Flux.from(
        mongoCollectionManager.getCollection(channel)
            .find( and(gte("lastUpdated", begin), lte("lastUpdated", end))))
        .map(d -> new StandardEntity(d.getString("price"), d.getString("lastUpdated")));
Long lastIndex = results
        .count()
        .block();
final double standardPage = 10.0D;
final double step = lastIndex / standardPage;
final double[] counter = {0.0D};
return
    results
        .take(1)
        .mergeWith(
            results
                .skip(1)
                .filter(e -> {
                    if (lastIndex > standardPage)
                        if (counter[0] >= step) {
                            counter[0] = counter[0] - step + 1;
                            return true;
                        } else {
                            counter[0] = counter[0] + 1;
                            return false;
                        }
                    else
                        return true;
                }));
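One possible direction, assuming the result for a date range fits in memory: collect it once (Reactor's collectList() turns a Flux into a Mono<List<T>>), so the count is just the list size and the database is queried only once. The selection step can then be a plain function over the list. Below is a Reactor-free sketch of that step with a hypothetical helper `evenlySpaced` that always keeps the first and last element; it is not the question's counter-based filter, just one way to pick a fixed number of evenly spread items.

```java
import java.util.ArrayList;
import java.util.List;

class EvenSampling {
    // Picks `target` elements spread evenly over `items`, always including the
    // first and last element. If there are at most `target` items, all are kept.
    static <T> List<T> evenlySpaced(List<T> items, int target) {
        int n = items.size();
        if (target <= 0) {
            return new ArrayList<>();
        }
        if (n <= target) {
            return new ArrayList<>(items);
        }
        List<T> picked = new ArrayList<>(target);
        if (target == 1) {
            picked.add(items.get(0));
            return picked;
        }
        // Distance between picked indices; greater than 1 because n > target,
        // so the rounded indices are strictly increasing (no duplicates).
        double step = (n - 1) / (double) (target - 1);
        for (int i = 0; i < target; i++) {
            picked.add(items.get((int) Math.round(i * step)));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        System.out.println(evenlySpaced(rows, 10));
    }
}
```

In the reactive pipeline this could be plugged in roughly as results.collectList().flatMapMany(list -> Flux.fromIterable(EvenSampling.evenlySpaced(list, 10))); that wiring is untested here and trades streaming for a single in-memory pass.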

Different result of Traversal

The Traversal API gives different results for what seem to be identical declarations. I took Method 1 from a sample on Neo4j's site and tried to restructure it as Method 2. Apparently there is no difference, yet the two methods produce different output: Method 2 completely skips the LIKES relationship. Even if I change the order of the calls in Method 1, e.g. putting depthFirst() last, the output changes.
Could someone help me understand why the output differs?
Method 1:
void depthFirst() {
    GraphDatabaseBuilder graphDbBuilder = new GraphDatabaseFactory()
            .newEmbeddedDatabaseBuilder(storeDir);
    GraphDatabaseService graphDb = graphDbBuilder.newGraphDatabase();
    String output = "";
    int i = 0;
    try (Transaction tx = graphDb.beginTx()) {
        Node node = graphDb.findNode(LabelTyeps.Person, "name", "Joe");
        for (Path position : graphDb.traversalDescription().depthFirst()
                .relationships(RelationshipTypes.KNOWS)
                .relationships(RelationshipTypes.LIKES, Direction.INCOMING)
                .evaluator(Evaluators.toDepth(5)).traverse(node)) {
            output += position.toString() + ":"
                    + (String) position.endNode().getProperty("name")
                    + "\n";
        }
        System.out.println(output);
    }
    graphDb.shutdown();
}
Output of method 1:
(3):Joe
(3)<--[LIKES,1]--(8):Lisa
(3)<--[LIKES,1]--(8)--[KNOWS,2]-->(4):Lars
(3)<--[LIKES,1]--(8)--[KNOWS,2]-->(4)--[KNOWS,4]-->(7):Dirk
(3)<--[LIKES,1]--(8)--[KNOWS,2]-->(4)--[KNOWS,4]-->(7)--[KNOWS,5]-->(6):Peter
(3)<--[LIKES,1]--(8)--[KNOWS,2]-->(4)--[KNOWS,4]-->(7)--[KNOWS,5]-->(6)--[KNOWS,7]-->(5):Sara
(3)<--[LIKES,1]--(8)--[KNOWS,2]-->(4)<--[KNOWS,3]--(9):Ed
Method 2 (only the way travDesc is built has changed):
try (Transaction tx = graphDb.beginTx()) {
    Node node = graphDb.findNode(LabelTyeps.Person, "name", "Joe");
    TraversalDescription travDesc = graphDb.traversalDescription();
    travDesc.depthFirst();
    travDesc.relationships(RelationshipTypes.KNOWS);
    travDesc.relationships(RelationshipTypes.LIKES, Direction.INCOMING);
    travDesc.evaluator(Evaluators.toDepth(5));
    for (Path position : travDesc.traverse(node)) {
        // System.out.println("Loop count: " + ++i);
        output += position.toString() + ":"
                + (String) position.endNode().getProperty("name")
                + "\n";
        // System.out.println(output);
    }
    System.out.println(output);
}
Output of method 2
(3):Joe
(3)--[KNOWS,6]-->(5):Sara
(3)--[KNOWS,6]-->(5)<--[KNOWS,7]--(6):Peter
(3)--[KNOWS,6]-->(5)<--[KNOWS,7]--(6)<--[KNOWS,5]--(7):Dirk
(3)--[KNOWS,6]-->(5)<--[KNOWS,7]--(6)<--[KNOWS,5]--(7)<--[KNOWS,4]--(4):Lars
(3)--[KNOWS,6]-->(5)<--[KNOWS,7]--(6)<--[KNOWS,5]--(7)<--[KNOWS,4]--(4)<--[KNOWS,3]--(9):Ed
(3)--[KNOWS,6]-->(5)<--[KNOWS,7]--(6)<--[KNOWS,5]--(7)<--[KNOWS,4]--(4)<--[KNOWS,2]--(8):Lisa
Sample data:
create (:Person {name:"Joe"})
,(:Person{name:"Lars"})
,(:Person{name:"Sara"})
,(:Person{name:"Peter"})
,(:Person{name:"Dirk"})
,(:Person{name:"Lisa"})
,(:Person{name:"Ed"})
match (a:Person{name:"Lisa"}), (b:Person{name:"Joe"}) create (a) - [:LIKES] -> (b)
match (a:Person{name:"Lisa"}), (b:Person{name:"Lars"}) create (a) - [:KNOWS] -> (b)
match (a:Person{name:"Ed"}), (b:Person{name:"Lars"}) create (a) - [:KNOWS] -> (b)
match (a:Person{name:"Lars"}), (b:Person{name:"Dirk"}) create (a) - [:KNOWS] -> (b)
match (a:Person{name:"Dirk"}), (b:Person{name:"Peter"}) create (a) - [:KNOWS] -> (b)
match (a:Person{name:"Joe"}), (b:Person{name:"Sara"}) create (a) - [:KNOWS] -> (b)
match (a:Person{name:"Peter"}), (b:Person{name:"Sara"}) create (a) - [:KNOWS] -> (b)
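TraversalDescription is an immutable fluent API. Quoting from the Javadoc (http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/traversal/TraversalDescription.html):
A traversal description is immutable and each method which adds or modifies the behavior returns a new instances that includes the new modification, leaving the instance which returns the new instance intact.
So in Method 2 the value returned by each call is discarded, and travDesc stays exactly as graphDb.traversalDescription() returned it. Either chain the calls as in Method 1, or reassign the result each time, e.g. travDesc = travDesc.depthFirst();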
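A small, database-free illustration of the quoted behaviour: `Description` and its methods below are hypothetical stand-ins that only mimic the immutable style of TraversalDescription; they are not Neo4j code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable fluent API such as TraversalDescription.
final class Description {
    private final boolean depthFirst;
    private final List<String> relTypes;

    Description() {
        this(false, List.of());
    }

    private Description(boolean depthFirst, List<String> relTypes) {
        this.depthFirst = depthFirst;
        this.relTypes = relTypes;
    }

    // Each "modifier" returns a NEW instance and leaves this one untouched.
    Description depthFirst() {
        return new Description(true, relTypes);
    }

    Description relationships(String type) {
        List<String> copy = new ArrayList<>(relTypes);
        copy.add(type);
        return new Description(depthFirst, Collections.unmodifiableList(copy));
    }

    boolean isDepthFirst() { return depthFirst; }
    List<String> relTypes() { return relTypes; }
}

class ImmutableDemo {
    public static void main(String[] args) {
        // Method 2 style: return values are ignored, so nothing changes.
        Description discarded = new Description();
        discarded.depthFirst();
        discarded.relationships("KNOWS");
        System.out.println(discarded.isDepthFirst() + " " + discarded.relTypes()); // false []

        // Keeping each returned instance preserves the modifications.
        Description kept = new Description();
        kept = kept.depthFirst();
        kept = kept.relationships("KNOWS");
        kept = kept.relationships("LIKES");
        System.out.println(kept.isDepthFirst() + " " + kept.relTypes()); // true [KNOWS, LIKES]
    }
}
```

Chaining works for the same reason: each call in Method 1 is made on the new instance returned by the previous one.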

pugixml number of child nodes

Does a pugixml node object have a number-of-child-nodes method? I cannot find it in the documentation and had to use an iterator as follows:
int n = 0;
for (pugi::xml_node ch_node = xMainNode.child("name"); ch_node; ch_node = ch_node.next_sibling("name")) n++;
There is no built-in function to compute that directly; one other approach is to use std::distance:
size_t n = std::distance(xMainNode.children("name").begin(), xMainNode.children("name").end());
Of course, this is linear in the number of child nodes; note that computing the number of all child nodes, std::distance(xMainNode.begin(), xMainNode.end()), is also linear - there is no constant-time access to child node count.
You could use an expression based on an XPath search (no efficiency guarantees, though):
xMainNode.select_nodes( "name" ).size()
Or wrap a range-based for loop in a helper function:
int children_count(pugi::xml_node node)
{
int n = 0;
for (pugi::xml_node child : node.children()) n++;
return n;
}
