I have two DiGraphs, say G and H, and I would like to count how many paths of G are part of H.
For any node pairs (src, dst) I can generate the paths between them using the 'all_simple_paths' function to get the generators:
G_gen = nx.all_simple_paths(G, src, dst)
H_gen = nx.all_simple_paths(H, src, dst)
Since the amount of paths is considerably high (the graphs have typically 100 nodes) I cannot resort to building lists etc.. (e.g. list(G_gen)) so I am wondering if there are smarter ways to deal with it. In addition, I would also like to distinguish based on the path lengths.
.. or maybe a better solution can be found with a different module ?
Thanks in advance for any help on this.
Thierry
I wonder if there is some reason why nx.intersection (see here) wouldn't work here? I'm not sure if it checks for direction under the hood but it doesn't seem to force outputs to standard Graph output either. Below might work:
# Create a couple of random preferential attachment graphs
G = nx.barabasi_albert_graph(100, 5)
H = nx.barabasi_albert_graph(100, 5)
# Convert to directed
G = G.to_directed()
H = H.to_directed()
# Get intersection
intersection = nx.intersection(G, H)
# Print info for each
print(nx.info(G))
print(nx.info(H))
print(nx.info(intersection))
which outputs:
>>> DiGraph with 100 nodes and 950 edges
>>> DiGraph with 100 nodes and 950 edges
>>> DiGraph with 100 nodes and 176 edges
The nodes are all shared in the example since the node ids are just simple integers and so they follow the same generation index. With real data I suppose your node sets might not be equivalent like here and you probably will see differences there too.
On the path lengths I'm not quite sure how you would go about that. The intersection just checks which nodes and edges are shared between two graphs and returns those that are in both, unaware of any other conditions I suspect. There might be a way to impose some additional constraints by adapting the source code with of the intersection function with some conditional checks.
I guess this doesn't check the number of paths but rather the number of edges, so I suppose you're looking for something more specific than this. But at the very least no path can exist outside of the intersection, since all shared paths must contain the same edges in both (since if an edge is missing from a path in either, it cannot exist as a path in the shared solution).
Hope this helps in some way shape or form, though I feel I've oversimplified your question quite a bit.
EDIT: Intuitively, the full solution to your question might be to simply enumerate all possible paths in the intersection.
I am struggling to find 1 efficient algorithm which will give me all possible paths between 2 nodes in a directed graph.
I found RGL gem, fastest so far in terms of calculations. I am able to find the shortest path using the Dijkstras Shortest Path Algorithm from the gem.
I googled, inspite of getting many solutions (ruby/non-ruby), either couldn't convert the code or the code is taking forever to calculate (inefficient).
I am here primarily if someone can suggest to find all paths using/tweaking various algorithms from RGL gem itself (if possible) or some other efficient way.
Input of directed graph can be an array of arrays..
[[1,2], [2,3], ..]
P.S. : Just to avoid negative votes/comments, unfortunately I don't have inefficient code snippet to show as I discarded it days ago and didn't save it anywhere for the record or reproduce here.
The main problem is that the number of paths between two nodes grows exponentially in the number of overall nodes. Thus any algorithm finding all paths between two nodes, will be very slow on larger graphs.
Example:
As an example imagine a grid of n x n nodes each connected to their 4 neighbors. Now you want to find all paths from the bottom left node to the top right node. Even when you only allow for moves to the right (r) and moves up (u) your resulting paths can be described by any string of length 2n with equal number of (r)'s and (u)'s. This will give you "2n choose n" number of possible paths (ignoring other moves and cycles)
I am new in neo4j and trying to understand how can I optimize routing queries.
I am working with OSM db.
and I am trying to calclulate the distance from one point to another.
START a=node(760119)
MATCH path=(a)-[:NEXT|NODE*1..30]-(c)
WHERE HAS(c.node_osm_id) AND c.node_osm_id=283103898
RETURN DISTINCT reduce(
distance = 0, n in filter(
x in path where has(x.length)
) | distance + n.length
) AS distance order by distance
My query returns a set of distances.
319.5609607071325
320.0901127819706
321.64043860878735
332.13372820085
334.21320610250484
How can i rewrite the query, to stop looking for new paths if the distance is longer than the shortest.
Thanks in advance.
Cypher doesn't have support for shortest path with cost evaluation yet (as of 2.0-RC1). If you need to use a more efficient shortest path algorithm, you'll need to implement an unmanaged extension.
However, I do see where you might be able to improve your performance... have you tried adding C as a start point query?
The teacher told us that every time you divide something by 2, the run-time is likely to be log n. For instance, if we divide an array into two, each time we traverse one of the array, the run-time would be log n. However, we may run into a case with LinkedList where we may be easily misled. For instance, we may have an algorithm to find the maximum of the list by starting from either the head or the tail in order to have a run-time of less than n. Logically, we may think that the run time would be log n, but it's not. Why is that? And how do you determine that?
Starting from the front or back (or alternating) does not change the basis of the search for the greatest value. All it does is reorder the search strategy.
If you have a sequential, ordered list and you do a binary search, each comparison reduces the possible locations for a match by 1/2.
If you look at one element of the linked list, each comparison reduces the possible locations for a match by 1 element.
That is a crucial difference.
I need to compare a query bit sequence with a database of up to a million bit sequences. All bit sequences are 100 bits long. I need the lookup to be as fast as possible. Are there any packages out there for fast determination of the similarity between two bit sequences? --Edit-- The bit sequences are position sensitive.
I have seen a possible algorithm on Bit Twiddling Hacks but if there is a ready made package that would be better.
If the database is rather static, you may want to build a tree data structure on it.
Search the tree recursively or in multiple threads and per search keep an actual difference variable. If the actual difference becomes greater than what you would consider 'similar', abort the search.
E.g. Suppose we have the following tree:
root
0 1
0 1 0 1
0 1 0 1 0 1 0 1
If you want to look for patterns similar to 011, and only want to allow 1 different bit at most, search like this (recursively or multi-threaded):
Start at the root
Take the left branch (0), this is similar, so difference is still 0
Take the left branch (0), this is different, so difference becomes 1, which is still acceptable
take the left branch (0), this is different, so difference becomes 2, which is too high. Abort looking in this branch.
take the right branch (1), this is equal, so difference remains 1, continue to search in this branch (not shown here)
Take the right branch (1), this is equal, so difference remains 0, go on
take the left branch (0), this is different, so difference becomes 1, which is still acceptable, go on.
This goes on until you have found your bit patterns.
If your bit patterns are more dynamic and being updated in your application, you will have to update the tree.
If memory is a problem, consider going to 64-bit.
If you want to look up the, let's say 50, most matching patterns, and we can assume that the input data set is rather static (or can be dynamically updated), you can repeat the initial phase of the previous comment, so:
For every bit pattern, count the bits.
Store the bit patterns in a multi_map (if you use STL, Java probably has something similar)
Then, use the following algorithm:
Make 2 collections: one for storing the found patterns, one for storing possibly good patterns (this second collection should probably be map, mapping 'distances' to patterns)
Take your own pattern and count the bits, assume this is N
Look in the multimap at index N, all these patterns will have the same sum, but not necessarily be completely identical
Compare all the patterns at index N. If they are equal store the result in the first collection. If they are not equal, store the result in the second collection/map, using the difference as key.
Look in the multimap at index N-1, all these patterns will have a distance of 1 or more
Compare all the patterns at index N-1. If they have a distance of 1, store them in the first collection. If they have a larger distance, store the result in the second collection/map, using the difference as key.
Repeat for index N+1
Now look in the second collection/map and see if there is something stored with distance 1. If it is, remove them from the second collection/map and store them in the first collection.
Repeat this for distance 2, distance 3, ... until you have enough patterns.
If the number of required patterns is not too big, and the average distance is also not too big, then the number of real compares between patterns is probably only a few %.
Unfortunately, since the patterns will be distributed using a Gaussian curve, there will still be quite some patterns to check. I didn't do a mathematical check on it, but in practice, if you don't want too many patterns out of the millions, and the average distance is not too far, you should be able to find the set of most-close patterns by checking only a few percent of the total bit patterns.
Please keep me updated of your results.
I came up with a second alternative.
For every bit pattern of the million ones count the number of bits and store the bit patterns in an STL multi_map (if you're writing in C++).
Then count the number of bits in your pattern. Suppose you have N bits set in your bit pattern.
If you now want to allow at most D differences, look up all the bit patterns in the multi_map having N-D, N-D+1, ..., N-1, N, N+1, ... N+D-1, N+D bits.
Unfortunately, the division of bit patterns in the multi_map will follow a Gaussian pattern, which means that in practice you will still have to compare quite some bit patterns.
(Originally I thought this could be solved by counting even 0's and uneven 1's but this isn't true.)
Assuming that you want to allow 1 difference, you have to look up 3 slots in the multi_map out of the 100 possible slots, leaving you with 3% of the actual bit patterns to do a full compare.