Let's take, for example, the following travel routes in Neo4j:
Tom: New York --> Moscow --> Berlin --> Paris --> Mumbai --> Cairo --> Rio --> Amsterdam --> Nashville
Mary: Madrid --> New York --> Moscow --> Berlin --> Mumbai --> Cairo --> New York --> Nashville --> Toronto
Bob: New York --> Nashville --> Amsterdam --> Rio --> Cairo --> Mumbai --> Paris --> Berlin --> Moscow
Now I need to get the travel route that is most similar to Tom's. Similarity means the number of common cities visited in the same sequence; skipping cities in between is allowed.
Expected result:
Common route of Tom and Mary: New York -> Moscow -> Berlin -> Mumbai -> Cairo -> Nashville => Similarity = 6
Common route of Tom and Bob: New York -> Nashville (also New York -> Amsterdam and other paths) => Similarity = 2
Note: Even though Tom and Bob have 9 common cities, those cities are not in the same sequence, so they do not count toward similarity!
Is it possible with Neo4j to get this kind of similarity?
I would be able to read all the paths and then compare them outside of Neo4j, but it would be better to do this within Neo4j, because it would be more elegant and probably faster.
Added:
My current algorithm builds a similarity matrix, with one path on the x-axis and the other on the y-axis. Every match between x and y increments all values that lie to the right of and below the match. In other words, every value in the matrix represents the length of the common path before that cell.
The condition for incrementing a value is that the value is not already greater. This may happen when there are several common paths; in that case the longest path must win.
Two example matrices for the routes above:
Full Java code:
public static int getLongestCommonWordsInSequence(List<String> yList, List<String> xList) {
    // right = X, left = Y
    // Need dimension +1, because the match count is incremented not at the match point, but at the next indexes
    int[][] matrix = new int[xList.size() + 1][yList.size() + 1];
    // Iterate y
    for (int y = 0; y < yList.size(); y++) {
        // Iterate x
        for (int x = 0; x < xList.size(); x++) {
            // Check whether yList[y] equals xList[x]
            if (yList.get(y).equals(xList.get(x))) {
                // Increment the count
                int newCount = matrix[x][y] + 1;
                // Update all counts that are to the right of AND below the current match
                for (int _x = x + 1; _x <= xList.size(); _x++) {
                    for (int _y = y + 1; _y <= yList.size(); _y++) {
                        // Update only if the value is < newCount
                        if (matrix[_x][_y] < newCount) {
                            matrix[_x][_y] = newCount;
                        }
                    }
                }
            }
        }
    }
    return matrix[matrix.length - 1][matrix[matrix.length - 1].length - 1];
}
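As a quick sanity check (my own snippet, assuming java.util.List and java.util.Arrays are imported), calling the method with Tom's and Mary's routes from above should return 6:

List<String> tom = Arrays.asList("New York", "Moscow", "Berlin", "Paris",
        "Mumbai", "Cairo", "Rio", "Amsterdam", "Nashville");
List<String> mary = Arrays.asList("Madrid", "New York", "Moscow", "Berlin",
        "Mumbai", "Cairo", "New York", "Nashville", "Toronto");
// Longest common sequence: New York, Moscow, Berlin, Mumbai, Cairo, Nashville
System.out.println(getLongestCommonWordsInSequence(tom, mary)); // prints 6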
Added:
This is the Java code I would use to create the data for a single journey. It is only experimental; I will extend it or use a different structure if that is an advantage.
Node journeyNode = graphDb.createNode(Label.label("Journey"));
for (Destination destination : journey.destinations) {
    Node destinationNode = graphDb.createNode(Label.label("Destination"));
    destinationNode.setProperty("name", destination.name);
    journeyNode.createRelationshipTo(destinationNode, RelationshipType.withName("Destination"));
}
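Since the similarity depends on the visit order, it may be worth preserving the sequence in the graph itself. One possible variant (a sketch of mine, not the original code; NEXT is an assumed relationship name) chains the destinations:

Node journeyNode = graphDb.createNode(Label.label("Journey"));
Node previous = journeyNode;
for (Destination destination : journey.destinations) {
    Node destinationNode = graphDb.createNode(Label.label("Destination"));
    destinationNode.setProperty("name", destination.name);
    // Chain: Journey -> first destination -> second destination -> ...
    previous.createRelationshipTo(destinationNode, RelationshipType.withName("NEXT"));
    previous = destinationNode;
}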
Related
I have run into a roadblock when writing an algorithm to find:
a) the shortest bus route and
b) the route with the least switches.
I need to use BFS to solve it, but we are not given an adjacency list for each vertex, so I am confused about how to build the adjacency list from a 2D array of integers, where each number represents a bus stop and each row is its own route:
So, [1 2 3 4 5] is Route 1,
[6 7] is Route 2, etc.
Also, the first station of each route is the next station
the bus will reach from the last station of the route,
so each route goes in circles:
1 -> 2 -> 3 -> 4 -> 5 -> 1 -> 2 -> ...
1 2 3 4 5
6 7
8 7 4
2 6
So far what I have is this:
public void fillAdjacencyList(int[][] routes){
    for(int i=0; i<routes.length; i++){
        for(int j=0; j<routes[i].length; j++){
            if(this.stop==routes[i][routes[i].length-1]){
                BusStop temp=new BusStop(routes[i][0]);
                this.next.add(temp);
            }else{
                if(routes[i][j]==this.stop){
                    BusStop temp=new BusStop(routes[i][j+1]);
                    this.next.add(temp);
                }
            }
        }
    }
    this.visited='*';
}
The first vertex is initialized when I read the starting point from which I need to find the shortest bus route. Each vertex is defined as:
int stop;
ArrayList<BusStop> next;
char visited;
char created;
int[] line;
public BusStop(int st){
    this.stop=st;
    this.created='*';
}
The Python function below does the job. I wrote it in Python so it is less verbose and (hopefully) easier to understand:
def to_adj(routes):
    adj = {}
    for route in routes:
        for stop in route:
            if stop not in adj:
                adj[stop] = []
        for i in range(len(route)):
            adj[route[i]].append(route[(i + 1) % len(route)])
    return adj
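Since the question itself is in Java, here is a rough Java equivalent of the same idea (a sketch assuming java.util imports; toAdj is a made-up helper, not part of the BusStop class):

static Map<Integer, List<Integer>> toAdj(int[][] routes) {
    Map<Integer, List<Integer>> adj = new HashMap<>();
    for (int[] route : routes) {
        for (int i = 0; i < route.length; i++) {
            // (i + 1) % route.length wraps the last stop back to the first,
            // which models the circular routes
            adj.computeIfAbsent(route[i], k -> new ArrayList<>())
               .add(route[(i + 1) % route.length]);
        }
    }
    return adj;
}

Applied to the routes above, this yields 1 -> [2], 2 -> [3, 6], 3 -> [4], 4 -> [5, 8], 5 -> [1], 6 -> [7, 2], 7 -> [6, 4], 8 -> [7].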
Here's a question about Entity Framework that has been bugging me for a bit.
I have a table called prizes that has different prizes. Some with higher and some with lower monetary values. A simple representation of it would be as such:
+----+---------+--------+
| id | name | weight |
+----+---------+--------+
| 1 | Prize 1 | 80 |
| 2 | Prize 2 | 15 |
| 3 | Prize 3 | 5 |
+----+---------+--------+
Weight in this case is the likelihood that I would like this item to be randomly selected.
I select one random prize at a time like so:
var prize = db.Prizes.OrderBy(r => Guid.NewGuid()).Take(1).First();
What I would like to do is use the weight to determine the likelihood of a random item being returned, so Prize 1 would return 80% of the time, Prize 2 15% and so on.
I thought one way of doing this would be to store each prize in the database as many times as its weight. That way, having Prize 1 in there 80 times would give it a higher likelihood of being returned than Prize 3, but this is not necessarily exact.
There has to be a better way of doing this, so I was wondering if you could help me out with this.
Thanks in advance
Normally I would not do this in the database, but rather use code to solve the problem.
In your case, I would generate a random number from 1 to 100. If the number is between 1 and 80, the first prize wins; if it is between 81 and 95, the second one wins; and if it is between 96 and 100, the last one wins.
Because the random number can be any number from 1 to 100, each number has a 1% chance of being hit, so you can manage the winning chance by choosing the range the random number must fall into.
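A minimal sketch of that idea in Java (my own illustration, assuming java.util.Random is imported; pickPrize and the returned ids are made up, using the fixed 80/15/5 weights from the question):

// Returns the winning prize id: 1, 2, or 3
static int pickPrize(Random rnd) {
    int roll = rnd.nextInt(100) + 1; // uniform in 1..100
    if (roll <= 80) return 1;        // rolls 1..80   -> Prize 1 (80%)
    if (roll <= 95) return 2;        // rolls 81..95  -> Prize 2 (15%)
    return 3;                        // rolls 96..100 -> Prize 3 (5%)
}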
Hope this helps.
Henry
This can be done by creating bins for the three (generally, n) items and then checking which bin a randomly drawn number falls into.
There might be a statistical library that can do this for you, i.e. proportionately select a bin from n bins.
A solution that does not limit you to three prizes/weights could be implemented as below:
//All prizes in the database
var allRows = db.Prizes.ToList();

//All weight values
var weights = db.Prizes.Select(p => new { p.Weight });

//Sum of weights
var weightsSum = weights.AsEnumerable().Sum(w => w.Weight);

//Construct bins, e.g. [0, 80, 95, 100]
//The three bins are: (0-80], (80-95], (95-100]
int[] bins = new int[weights.Count() + 1];
int start = 0;
bins[start] = 0;
foreach (var weight in weights)
{
    start++;
    bins[start] = bins[start - 1] + weight.Weight;
}

//Generate a random number between 1 and weightsSum (inclusive)
Random rnd = new Random();
int selection = rnd.Next(1, weightsSum + 1);

//Find the bin the random number falls into
int chosenBin = 0;
for (chosenBin = 0; chosenBin < bins.Length - 1; chosenBin++)
{
    if (bins[chosenBin] < selection && selection <= bins[chosenBin + 1])
    {
        break;
    }
}

//Announce the prize
Console.WriteLine("You have won: " + allRows.ElementAt(chosenBin));
In one of my Java programs I am trying to read a number and then use the golden ratio (1.618034) to find the smallest Fibonacci number greater than it, along with its index. For example, if I enter 100000 I should get back "the smallest Fibonacci number which is greater than 100000 is the 26th and its value is 121393".
The program should also calculate a Fibonacci number by index (case 1 in the code below), which I have coded so far, but I can't figure out how to solve the problem described above (case 2). I have a horrible teacher and I don't really understand what I need to do. I am not asking for the code, just a kind of step-by-step of what I should do for case 2. I cannot use recursion. Thank you for any help. I seriously suck at wrapping my head around this.
import java.util.Scanner;

public class Fibonacci {
    public static void main(String args[]) {
        Scanner scan = new Scanner(System.in);
        System.out.println("This is a Fibonacci sequence generator");
        System.out.println("Choose what you would like to do");
        System.out.println("1. Find the nth Fibonacci number");
        System.out.println("2. Find the smallest Fibonacci number that exceeds a user-given value");
        System.out.println("3. Find the two Fibonacci numbers whose ratio is close enough to the golden number");
        System.out.print("Enter your choice: ");
        int choice = scan.nextInt();
        int xPre = 0;
        int xCurr = 1;
        int xNew = 0;
        switch (choice)
        {
            case 1:
                System.out.print("Enter the target index to generate (>1): ");
                int index = scan.nextInt();
                for (int i = 2; i <= index; i++)
                {
                    xNew = xPre + xCurr;
                    xPre = xCurr;
                    xCurr = xNew;
                }
                System.out.println("The " + index + "th Fibonacci number is " + xNew);
                break;
            case 2:
                System.out.print("Enter the target value (>1): ");
                int value = scan.nextInt();
        }
    }
}
First, you should understand what this golden ratio story is all about. The point is, Fibonacci numbers can be computed recursively, but there is also a closed formula for the nth Fibonacci number:
F(n) = [φ^n - (-φ)^(-n)] / √5
where φ = (√5 + 1)/2 is the golden ratio (approximately 1.61803). Now, |(-φ)^(-1)| < 1, which means that you can compute F(n) as the integer closest to φ^n/√5.
So: compute √5, compute φ, and learn how to round a real value to the nearest integer. Then compute F(n) using the φ^n/√5 formula (or just use the full [φ^n - (-φ)^(-n)]/√5 formula) in a loop, and in that loop compare F(n) with the number the user entered. When F(n) exceeds the user's number, remember n and F(n).
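A minimal sketch of that loop in Java (my own illustration, not the asker's case 2; smallestFibAbove is a made-up name):

// Finds the smallest Fibonacci number greater than the given value, using
// the fact that F(n) is the integer closest to phi^n / sqrt(5).
static void smallestFibAbove(int value) {
    double sqrt5 = Math.sqrt(5.0);
    double phi = (sqrt5 + 1.0) / 2.0;
    int n = 1;
    long fib = 1; // F(1) = 1
    while (fib <= value) {
        n++;
        fib = Math.round(Math.pow(phi, n) / sqrt5);
    }
    System.out.println("The smallest Fibonacci number which is greater than "
            + value + " is the " + n + "th and its value is " + fib);
}

For an input of 100000 this prints the 26th Fibonacci number, 121393, which matches the expected output from the question.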
I have tried using Dijkstra's algorithm on a cyclic weighted graph without using a priority queue (heap), and it worked.
Wikipedia states that the original implementation of this algorithm does not use a priority queue and runs in O(V²) time.
Now, if we just remove the priority queue and use a normal queue, the runtime is linear, i.e. O(V+E).
Can someone explain why we need the priority queue?
I had the exact same doubt and found a test case where the algorithm without a priority_queue would not work.
Let's say I have a Graph object g and a method addEdge(a,b,w) which adds an edge from vertex a to vertex b with weight w.
Now, let me define the following graph:
Graph g
g.addEdge(0,1,5) ;
g.addEdge(1,3,1) ;
g.addEdge(0,2,2) ;
g.addEdge(2,1,1) ;
g.addEdge(2,3,7) ;
Now, say our queue contains the nodes in the following order: {0, 1, 2, 3}.
So, node 0 is visited first, then node 1.
At this point the distance between 0 and 3 is computed as 6, using the path 0->1->3, and node 1 is marked as visited.
Now node 2 is visited and the distance between 0 and 1 is updated to 3 using the path 0->2->1. But since node 1 is already marked as visited, you cannot change the distance between 0 and 3, whose optimal path 0->2->1->3 has length 4.
So your algorithm fails without the priority_queue.
It reports the distance between 0 and 3 as 6 while in reality it should be 4.
Now, here is the code I used to implement the algorithm:
#include <vector>
#include <set>
#include <algorithm>
#include <utility>
using namespace std;

const int INF = 1000000000;

class Graph
{
public:
    vector<int> nodes;
    vector<vector<pair<int,int> > > edges;

    void addNode()
    {
        nodes.push_back(nodes.size());
        vector<pair<int,int> > temp;
        edges.push_back(temp);
    }

    void addEdge(int n1, int n2, int w)
    {
        edges[n1].push_back(make_pair(n2, w));
    }

    pair<vector<int>, vector<int> > shortest(int source) // Dijkstra's shortest path
    {
        vector<int> dist(nodes.size());
        fill(dist.begin(), dist.end(), INF);
        dist[source] = 0;
        vector<int> pred(nodes.size());
        fill(pred.begin(), pred.end(), -1);
        for (int i = 0; i < (int)edges[source].size(); i++)
        {
            dist[edges[source][i].first] = edges[source][i].second;
            pred[edges[source][i].first] = source;
        }
        set<pair<int,int> > pq;
        for (int i = 0; i < (int)nodes.size(); i++)
            pq.insert(make_pair(dist[i], i));
        while (!pq.empty())
        {
            pair<int,int> item = *pq.begin();
            pq.erase(pq.begin());
            int v = item.second;
            for (int i = 0; i < (int)edges[v].size(); i++)
            {
                int u = edges[v][i].first;
                int w = edges[v][i].second;
                if (dist[u] > dist[v] + w)
                {
                    // decrease-key: remove the old entry, insert the updated one
                    pq.erase(pq.find(make_pair(dist[u], u)));
                    pq.insert(make_pair(dist[v] + w, u));
                    dist[u] = dist[v] + w;
                    pred[u] = v;
                }
            }
        }
        return make_pair(dist, pred);
    }

    pair<vector<int>, vector<int> > shortestwpq(int source) // Dijkstra's without priority_queue
    {
        vector<int> dist(nodes.size());
        fill(dist.begin(), dist.end(), INF);
        dist[source] = 0;
        vector<int> pred(nodes.size());
        fill(pred.begin(), pred.end(), -1);
        for (int i = 0; i < (int)edges[source].size(); i++)
        {
            dist[edges[source][i].first] = edges[source][i].second;
            pred[edges[source][i].first] = source;
        }
        vector<pair<int,int> > pq;
        for (int i = 0; i < (int)nodes.size(); i++)
            pq.push_back(make_pair(dist[i], i));
        while (!pq.empty())
        {
            pair<int,int> item = *pq.begin();
            pq.erase(pq.begin());
            int v = item.second;
            for (int i = 0; i < (int)edges[v].size(); i++)
            {
                int u = edges[v][i].first;
                int w = edges[v][i].second;
                if (dist[u] > dist[v] + w)
                {
                    dist[u] = dist[v] + w;
                    pred[u] = v;
                }
            }
        }
        return make_pair(dist, pred);
    }
};
As expected, the results were as follows:
With priority_queue:
0
3
2
4
Without the priority queue:
0
3
2
6
As Moataz Elmasry said, the best you can expect is O(|E| + |V|·log|V|), and that with a Fibonacci heap. At least when it comes to big-O values.
The idea behind it is: for the vertex you are currently expanding, you have already found the shortest path to it. If that vertex is not the one with the smallest tentative (distance + edge weight), this is not necessarily true. That property is what allows you to stop the algorithm as soon as you have expanded every vertex that is reachable from your initial vertex. If you do not always expand the smallest vertex, you are not guaranteed to find the shortest path, so you would have to test every single path, not just one. So instead of going through every edge in just one path, you go through every edge in every path.
Your estimate of O(E + V) is probably correct; the path and cost you determined, on the other hand, are incorrect. If I'm not mistaken, the path would only be the shortest if, by chance, the first edge you travel from every vertex happened to be the smallest one.
So Dijkstra's shortest path algorithm without a priority queue is just Dijkstra's path algorithm ;)
For a sparse graph, an implementation with a binary min-heap runs in O(E·log V); with a Fibonacci heap, the runtime would be O(V·log V + E).
A heap is the best choice for this task, as it guarantees O(log n) both for adding entries to our queue and for removing the top element. Any other priority queue implementation would sacrifice either adding to the queue or removing from it to gain a performance boost somewhere else. Depending on how sparse the graph is, you might find better performance using a different priority queue implementation, but generally speaking a min-heap is best since it balances the two.
Not the greatest source but: http://en.wikipedia.org/wiki/Heap_(data_structure)
If I am given a floating point number but do not know beforehand what range the number will be in, is it possible to scale that number in some meaningful way to be in another range? I am thinking of checking to see if the number is in the range 0<=x<=1 and if not scale it to that range and then scale it to my final range. This previous post provides some good information, but it assumes the range of the original number is known beforehand.
You can't scale a number in a range if you don't know the range.
Maybe what you're looking for is the modulo operator. Modulo is basically the remainder of a division; the operator in most languages is %.
0 % 5 == 0
1 % 5 == 1
2 % 5 == 2
3 % 5 == 3
4 % 5 == 4
5 % 5 == 0
6 % 5 == 1
7 % 5 == 2
...
Of course it is not possible. You can define a range and ignore all values outside it. Or you can collect statistics to find the range at run time (e.g., via histogram analysis).
Is this really about image processing? There are lots of related problems in the image segmentation field.
You want to scale a single random floating point number to be between 0 and 1, but you don't know the range of the number?
What should 99.001 be scaled to? If the range of the random number was [99, 100], then the scaled number should be pretty close to 0. If the range of the random number was [0, 100], then the scaled number should be pretty close to 1.
In the real world, you always have some sort of information about the range (either the range itself, or how wide it is). Without further info, the answer is "No, it can't be done."
I think the best you can do is something like this:
double scale(double x) {
    if (x < -1) return -2 - 1 / x; // maps (-inf, -1) onto (-2, -1)
    if (x > 1) return 2 - 1 / x;   // maps (1, inf) onto (1, 2)
    return x;                      // [-1, 1] is left unchanged
}
This function is monotonic and has a range of -2 to 2, but it is not strictly a scaling.
I am assuming that you have the results of some 2-dimensional measurements and want to display them in color or grayscale. For that, I would first find the maximum and minimum and then scale between those two values.
static double[][] scale(double[][] in, double outMin, double outMax) {
    double inMin = Double.POSITIVE_INFINITY;
    double inMax = Double.NEGATIVE_INFINITY;
    for (double[] inRow : in) {
        for (double d : inRow) {
            if (d < inMin)
                inMin = d;
            if (d > inMax)
                inMax = d;
        }
    }
    double inRange = inMax - inMin;
    double outRange = outMax - outMin;
    double[][] out = new double[in.length][];
    for (int i = 0; i < in.length; i++) {
        double[] inRow = in[i];
        double[] outRow = new double[inRow.length];
        for (int j = 0; j < inRow.length; j++) {
            double normalized = (inRow[j] - inMin) / inRange; // 0 .. 1
            outRow[j] = outMin + normalized * outRange;
        }
        out[i] = outRow; // store the scaled row
    }
    return out;
}
This code is untested and just shows the general idea. It further assumes that all your input data is in a "reasonable" range, away from infinity and NaN.
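For example (my own illustration; the numbers are arbitrary), scaling a small 2x2 grid into a 0..255 grayscale range:

double[][] data = { { 10.0, 20.0 }, { 30.0, 40.0 } };
double[][] gray = scale(data, 0.0, 255.0);
// gray is approximately { { 0.0, 85.0 }, { 170.0, 255.0 } }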