Computing classes of maximal path-equivalent nodes in a rooted DAG

I have a rooted directed acyclic graph with a single root node r.
I'm interested in computing the following equivalence:
"Nodes v and w are maximal path-equivalent iff every maximal path from r contains either both of v and w or none of them"
In particular, I want to find all equivalence classes w.r.t. the above condition, possibly in O(n+m) time (n nodes, m edges).
I feel like this problem is not unknown but I don't know what terms to search for.
If anyone knows what this problem is called or has any ideas on how to solve it, I would appreciate it.

Related

How node2vec works

I have been reading about the node2vec embedding algorithm and I am a little confused how it works.
For reference, node2vec is parametrised by p and q and works by simulating a bunch of random walks from nodes and then running word2vec embeddings on these walks as "sentences". By setting p and q in different ways, you can get more BFS-like or more DFS-like random walks in the simulation phase, capturing different network structure in the embedding.
Setting q > 1 gives us more BFS-like behaviour, in that the sampled walks consist of nodes within a small locality. The thing I am confused about is that the paper says this is equivalent to embedding nodes with similar structural properties close to each other.
I don't quite understand how that works. If I have two separate, say, star/hub-structured nodes in my network that are far apart, why would embedding based on the random walks from those two nodes put them close together in the embedding space?
This question has occupied my mind also after reading the article, and more so after empirically seeing that it indeed does that.
I assume you are referring to the part of the paper with the diagram of nodes u and s6, which states that their resulting embeddings will be quite similar in the embedding space.
To understand why this indeed happens, first we must understand how the skip-gram model embeds information, which is the mechanism that consumes the random walks.
The skip-gram model eventually generates similar embeddings for tokens that can appear in similar context - but what does that really mean from the skip-gram model perspective?
If we would like to embed structural equivalence, we would favor a DFS-like walk (and additionally we would have to use an adequate window size for the skip-gram model).
So random walks would look like
1. s1 > u > s4 > s5 > s6 > s8
2. s8 > s6 > s5 > s4 > u > s1
3. s1 > s3 > u > s2 > s5 > s6
4. s7 > s6 > s5 > s2 > u > s3
... (and so on, up to walk n)
What will happen is that there will be many walks in which u and s6 appear with the same surroundings. Since their surroundings are similar, their contexts are similar, and as stated, similar context == similar embeddings.
One might further ask: what about order? Order doesn't really matter, since the skip-gram model uses the window size to generate pairs out of every sentence; in the link I provided you can explore this concept further.
So, bottom line: if you can create walks that produce similar contexts for two nodes, their embeddings will be similar.
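For concreteness, here is a minimal sketch of that walks-to-embeddings step using gensim's Word2Vec. The tiny walk list is just the example walks above, and the parameter values are illustrative, not the paper's settings:

# Minimal sketch: feed random walks (as "sentences") into a skip-gram model.
# Assumes gensim >= 4.0; the walks are the example walks listed above.
from gensim.models import Word2Vec

walks = [
    ["s1", "u", "s4", "s5", "s6", "s8"],
    ["s8", "s6", "s5", "s4", "u", "s1"],
    ["s1", "s3", "u", "s2", "s5", "s6"],
    ["s7", "s6", "s5", "s2", "u", "s3"],
]

# sg=1 selects skip-gram; window controls how many neighbours on each side of a
# token count as its "context" when generating (centre, context) training pairs.
model = Word2Vec(sentences=walks, vector_size=16, window=3, min_count=1, sg=1)

# Nodes that appear in similar contexts across many walks (e.g. u and s6)
# end up with similar vectors; cosine similarity is one way to inspect this.
print(model.wv.similarity("u", "s6"))

With only four toy walks the similarity value is of course noisy; the point is the pipeline, not the number.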
My understanding of the two sampling strategies goes like this:
DFS: for each node (a) the walk explores a wide context, containing not just the immediate neighbors (b), but also nodes further away (c). When optimizing the embedding and trying to bring nodes with similar contexts closer, the optimizer has to consider not just the relation (a)-(b), but also (b)-(c), and so on. This is the same as trying to place nodes so that their distances in the network are conserved (each node trying to find its place based on a wide context).
BFS: for each node (a) the walk only explores the local context, but it does that extensively, so probably all neighbors (b1, b2, ...) will be included (and maybe some 2nd neighbors). Imagine trying to find a node's place in the embedding space while only having information on its neighbors. Nodes that have similarly embedded neighbors should be close, e.g. dangling nodes with only one neighbor (and thus a respective walk containing the source node many times), or nodes with two high-degree neighbors (i.e. a bridge connecting two hubs). So by only knowing local information, the embedding will not optimize for global distances; the result is thus not based on the actual graph structure, but rather on local patterns (called structural equivalence in the paper, just to make it confusing).
BUT!!! I tried reproducing the results for the Les Misérables network with the parameters used in the original paper (p=1, q=0.5 and p=1, q=2), and couldn't get node2vec to do this second type of structural embedding. There is something fishy going on, as others also struggle with getting node2vec to embed structurally; here is a paper on it. If someone was able to reproduce their results, please tell me how :)
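For reference, this is roughly what such a reproduction attempt looks like, assuming the third-party node2vec package and networkx's built-in Les Misérables graph; the exact constructor arguments may differ between package versions:

# Sketch of a reproduction attempt, assuming the PyPI "node2vec" package and
# networkx; parameter names and values are assumptions, not the paper's code.
import networkx as nx
from node2vec import Node2Vec

G = nx.les_miserables_graph()

# p=1, q=2 is the "BFS-like" setting quoted above; p=1, q=0.5 the "DFS-like" one.
n2v = Node2Vec(G, dimensions=16, walk_length=80, num_walks=10, p=1, q=2, workers=1)
model = n2v.fit(window=10, min_count=1)

# Inspect which characters end up closest to a given node in embedding space.
print(model.wv.most_similar("Valjean"))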

A new edge is inserted into a minimum spanning tree

I am trying to find an algorithm for the following question, with one difference: the edge weights are not distinct.
Give an efficient algorithm to test if T remains the minimum-cost spanning tree with the new edge added to G.
In this link there is a solution, but it does not cover the difference I wrote above: the edge weights are not necessarily distinct.
Updating a Minimum spanning tree when a new edge is inserted
Does anyone have an idea?
Well, the naive approach of just using Prim or Kruskal to find the min cost spanning tree of the new graph and then see which one has a lower total cost isn't too bad at O(|E|log|E|).
But we don't need to look at the whole graph.
Suppose your new edge connects vertices A and B, and let C be the parent of A in T. If B is not a descendant of A, then if A-B has lower cost than A-C, T is no longer the MST and B should become the new parent of the subtree rooted at A.
If B is a descendant of A, then if A-B is cheaper than some edge of T along the path from A to B, T is no longer the MST: the highest-cost edge along that path should be removed, B becomes the root of the newly disconnected component, and it should be added as a child of A.
I believe you may need to check these things a second time, reversing which vertices are A and B. The complexity of this is O(log|V|), where the base of the log is the average number of children per node of T. In the case of T being a straight line it is O(|V|), but otherwise I think you could say it is O(log|V|).
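For comparison, a minimal sketch of the naive baseline mentioned above, assuming networkx (the graph, tree, and edge names are placeholders):

# Naive check: recompute an MST on the graph with the new edge added and
# compare total weights against the old tree T. Runs in O(|E| log |E|).
import networkx as nx

def still_mst(G, T, u, v, w):
    """Return True if T remains a minimum spanning tree after adding edge (u, v) with weight w."""
    old_weight = T.size(weight="weight")
    H = G.copy()
    H.add_edge(u, v, weight=w)
    new_weight = nx.minimum_spanning_tree(H).size(weight="weight")
    # With non-distinct weights, T stays *a* minimum spanning tree exactly
    # when no strictly lighter spanning tree exists in the new graph.
    return old_weight <= new_weight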
First find an MST using one of the existing efficient algorithms.
Now adding an edge (v,w) creates a cycle in the MST. If the newly added edge has the maximum cost among the edges on the cycle then the MST remains as it is. If some other edge on the cycle has the maximum cost, then that's the edge to be removed to get a tree with lower cost.
So we need an efficient way to find the edge with the maximum value on the cycle. You can climb from v and w until you reach LCA(v, w) (the least common ancestor of v and w) to get the edge with the max cost. This takes linear time in the worst case.
If you are going to answer multiple such queries then pre-processing the MST is probably better. You can pre-process the MST to get a sparse table data structure in O(N lg N) time and then use this data structure to answer max queries in O(lg N) time in the worst case.
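If it helps, here is a sketch of the climb-to-LCA step, assuming the MST is stored with parent pointers, depths, and the weight of each node's parent edge (these helper names are just illustrative):

# Find the maximum edge weight on the tree path between v and w by climbing
# both nodes up to their lowest common ancestor.
# parent[x] is x's parent in the rooted MST (None at the root), depth[x] its
# depth, and up_weight[x] the weight of the edge (x, parent[x]).
def max_edge_on_path(v, w, parent, depth, up_weight):
    best = float("-inf")
    # Bring both endpoints to the same depth, tracking the heaviest edge seen.
    while depth[v] > depth[w]:
        best = max(best, up_weight[v]); v = parent[v]
    while depth[w] > depth[v]:
        best = max(best, up_weight[w]); w = parent[w]
    # Climb in lockstep until the two paths meet at LCA(v, w).
    while v != w:
        best = max(best, up_weight[v], up_weight[w])
        v, w = parent[v], parent[w]
    return best

# T remains an MST after adding edge (v, w) with cost c iff c >= max_edge_on_path(v, w, ...).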

Graph Algorithm: Similar to TSP

I want to solve a problem similar to the TSP (Travelling Salesman Problem).
I have N (0 < N < 20) nodes and I must visit all nodes.
The costs between nodes are all equal.
I can visit a node an unlimited number of times.
I want to find more than one path, and there is no restriction on the cost.
Can you tell me some effective algorithms for this problem?
Here is a solution that works with a weighted graph.
First, the naive solution, enumerating.
It works in O(n!) because there are (n-1)! Hamiltonian paths, and you need O(n) to check each one.
There is a better algorithm, using dynamic programming, in O(n^2 * 2^n).
Define the state as follows: for x a node, and S a set of nodes containing x:
w[S][x] = the weight of the shortest path that starts at node x, goes through all the nodes in the set S, and then finishes at node 0.
Note that 0 does not necessarily belong to S.
S = {x} is the base case: w[S][x] = weight(x, 0)
Then the recursion formula:
If S is larger than {x}, iterate over the possible next steps y:
w[S][x] = min(weight(x, y) + w[S\{x}][y] for all y in S\{x})
This algorithm will output just one optimal path.
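Here is a hedged sketch of this dynamic program, assuming nodes are numbered 0..n-1, node 0 is the fixed endpoint, and weight[i][j] is a complete cost matrix:

# Held-Karp style DP following the formulation above.
from itertools import combinations

def held_karp(weight):
    n = len(weight)
    # w[(S, x)] = cheapest path that starts at x, visits every node of S
    # (x itself included), and then ends at node 0. S is a frozenset.
    w = {}
    for x in range(1, n):
        w[(frozenset([x]), x)] = weight[x][0]          # base case S = {x}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for x in S:
                rest = S - {x}
                # recursion: try every possible next step y in S \ {x}
                w[(S, x)] = min(weight[x][y] + w[(rest, y)] for y in rest)
    full = frozenset(range(1, n))
    # Close the tour: leave node 0, cover all other nodes, return to 0.
    return min(weight[0][x] + w[(full, x)] for x in range(1, n))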

How to design an O(m) time algorithm to compute the shortest cycle of G (an undirected, unweighted graph) that contains s?

How to design an O(m) time algorithm to compute the shortest cycle of G (an undirected, unweighted graph) that contains s (s ∈ V)?
You can run a BFS from your node s as the starting point; this gives you a BFS tree. Afterwards you can build a lowest-common-ancestor (LCA) data structure on this BFS tree. This can be done, for example, with Tarjan's lowest-common-ancestor algorithm; I will not go into details here. Given two nodes v and w, the LCA structure lets you find the lowest node in a tree (the BFS tree in our case) that has both v and w as descendants. The idea is that when you consider two nodes joined by an edge, you want to check whether their paths to the root (s in this case), plus the edge that connects them, form a cycle through s. This is the case if their LCA is s.
Assuming you have built the LCA structure, you run a second BFS. When expanding the neighbours of a node v, you also take into consideration the nodes already marked as explored. Suppose x is a neighbour of v such that x has already been explored. If the LCA of v and x is s, then the path from x to s and the path from v to s in the BFS tree, plus the edge xv, form a cycle. The first such x and v that you encounter in your second BFS give you the desired result. If no such x exists, then s is not contained in any cycle.
The cycle found this way is also the shortest one containing s.
The two BFS runs take O(m), and the LCA construction can also be done in linear time, hence the whole procedure can be implemented in O(m).
This might be a bit overkill; there surely is a much simpler solution.
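For what it's worth, here is a simplified sketch of the same idea: instead of a full LCA structure, record for every node which child of s its BFS path passes through, since LCA(u, v) = s exactly when those labels differ. The adjacency-list format and function name are just for illustration.

# Shortest cycle through s in an unweighted, undirected graph, in O(n + m).
from collections import deque

def shortest_cycle_through(adj, s):
    """adj: dict mapping each node to an iterable of its neighbours."""
    dist, branch = {s: 0}, {s: s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                branch[v] = v if u == s else branch[u]   # child of s on the path to v
                queue.append(v)
    best = float("inf")
    for u in adj:
        if u not in dist:
            continue
        for v in adj[u]:
            # An edge between two different branches closes a cycle through s.
            if v in dist and u != s and v != s and branch[u] != branch[v]:
                best = min(best, dist[u] + dist[v] + 1)
    return best  # float("inf") if s lies on no cycle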

Decide Whether All Shortest Paths From s to t Contain The Edge e

Let G = (V, E) be a directed graph whose edges all have non-negative weights. Let s, t be two vertices in V, and let e be an edge in E.
Describe an algorithm that decides whether all shortest paths from s to t contain the edge e.
Well, this is how you can achieve Dijkstra's time complexity:
Simply run Dijkstra from s and calculate delta(s,t) (the weight of the shortest path from s to t).
Remove the edge e, and run Dijkstra again from s in the new graph.
If delta(s,t) in the new graph has increased, then all shortest paths from s to t contain the edge e; otherwise, not all of them do.
I was wondering whether there is a more efficient algorithm for solving this problem. Do you think that it's possible to beat Dijkstra's time complexity ?
Thanks in advance
Your approach sounds correct to me. You just calculate the shortest path with and without the possibility of taking edge e. That gives you 2 Dijkstra searches.
There is room for improvement if you use A*, bidirectional search, or reuse your Dijkstra search tree:
A* would speed up your Dijkstra query, but it might not be possible for your graph, as you need to be able to define a good bound on the remaining distance.
Bidirectional search could be done with both searches meeting around the edge. You can then examine all paths with and without the edge using only one fast bidirectional query plus some extra work for both cases, instead of running two full, very similar Dijkstra searches.
You could search once without the edge and maintain your search tree. Then you add e and update the shortest-path tree starting from the start point of e. If the label of the end point is greater than the label of the start point plus the length of e, the end point can be reached faster by using e. Recursively search the neighbours of the end point and only update distances that become smaller than before. This should save you some work.
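For completeness, a minimal sketch of the baseline two-Dijkstra check from the question, assuming networkx (graph and edge names are placeholders):

# All shortest s-t paths use edge e = (u, v) iff removing e strictly
# increases the s-t distance.
import networkx as nx

def all_shortest_paths_use_edge(G, s, t, u, v):
    d_with = nx.dijkstra_path_length(G, s, t)
    H = G.copy()
    H.remove_edge(u, v)
    try:
        d_without = nx.dijkstra_path_length(H, s, t)
    except nx.NetworkXNoPath:
        return True   # without e there is no s-t path at all
    return d_without > d_with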
