Removing one edge from a directed graph G - machine-learning

What happens if we remove one edge from a directed graph G, obtaining a new graph G′? For example, does a probability distribution P that factorizes over G also factorize over G′? And what happens if G and G′ are undirected graphs?
Any help will be appreciated!

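Think about minimal I-maps, which are part of the definition of a Bayesian network. A graph G is a minimal I-map for P if it is an I-map for P and removing any single edge makes it stop being one. So if G is a minimal I-map for P, then P does not factorize over G′; if G is not minimal, P may still factorize over G′ for some choices of the removed edge. That should lead you in the right direction.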
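To make the hint concrete, here is a minimal sketch with made-up numbers: two binary variables, where G is X -> Y and G′ has no edge at all. The probability tables are assumptions chosen only for illustration.

```python
from itertools import product

# Hypothetical joint distribution over binary X, Y that factorizes over G: X -> Y.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
joint = {(x, y): p_x[x] * p_y_given_x[x][y] for x, y in product((0, 1), repeat=2)}

# G' has no edge, so factorizing over G' would require P(x, y) = P(x) * P(y).
p_y = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}
factorizes_over_g_prime = all(
    abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for (x, y) in joint
)
print(factorizes_over_g_prime)  # False: Y really depends on X
```

Here G is a minimal I-map for P, so dropping its only edge breaks the factorization.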

Related

Can I use Breadth-First-Search on weighted graphs if I modify it?

I am having a discussion with a friend if the following will work:
We recently learned in a lecture about Breadth-First-Search. I know that it is a special case of Dijkstra where each edge weight is set to one. Assume now we are given a graph where the edges have integer weights of more than one. Then I would modify this graph by introducing additional vertices and connecting them by edges with weight one, e.g. assume we have an edge of weight 3 connecting the vertices u and v, then I would introduce dummy-vertices d1, d2, remove the edge connecting u and v and instead add edges {u, d1}, {d1, d2}, {d2,v} of weight one.
If I modify my whole graph this way and then apply breadth-first search starting from one of the original vertices, wouldn't this work as well?
Thank you very much in advance!
Since BFS is guaranteed to return an optimal path on unweighted graphs, and you've created the unweighted equivalent of your original graph, you'll be guaranteed to get the shortest path.
What you lose compared with Dijkstra's algorithm is runtime. BFS on the expanded graph takes time proportional to the sum of the edge weights, because every unit of weight becomes its own edge, whereas the running time of Dijkstra's algorithm depends only on the number of vertices and edges, not on the weights themselves.
This sort of thought experiment is a great way to understand how Dijkstra's algorithm works (e.g. how would you modify your algorithm so that it does not need to create a new graph, or does not take 100 steps for an edge of weight 100?). In fact, this is probably how Dijkstra discovered the algorithm to begin with.
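The construction from the question can be sketched as follows. This is only an illustration, assuming an undirected graph given as (u, v, weight) triples with positive integer weights; the function names and the dummy-vertex labels are made up for the example.

```python
from collections import deque

def expand_to_unit_edges(edges):
    """Replace each undirected edge (u, v, w) by a chain of w unit-weight edges."""
    adj = {}
    for index, (u, v, w) in enumerate(edges):
        prev = u
        for i in range(1, w):
            dummy = ("dummy", index, i)  # hypothetical label for an inserted vertex
            adj.setdefault(prev, []).append(dummy)
            adj.setdefault(dummy, []).append(prev)
            prev = dummy
        adj.setdefault(prev, []).append(v)
        adj.setdefault(v, []).append(prev)
    return adj

def bfs_distances(adj, source):
    """Plain BFS: number of unit edges from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# The direct edge u-v costs 3, but the detour through x costs only 2.
edges = [("u", "v", 3), ("u", "x", 1), ("x", "v", 1)]
adj = expand_to_unit_edges(edges)
dist = bfs_distances(adj, "u")
print(dist["v"])  # 2
```

Note that the expanded graph has one vertex per unit of weight, which is exactly where the dependence of the runtime on the weights comes from.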

How to minimize the maximum cost of a path in a vertex disjoint path cover?

Given a directed weighted graph G and a number n of paths that together must cover all the vertices of G, how can I minimize the cost of the most expensive path? (Assume that a solution always exists for this graph.)
For n = 1, a single path has to visit every vertex, so this becomes the path version of the Travelling Salesman Problem, which is NP-hard. Thus, I wouldn't look for exact algorithms in your case.
My guess is that a good approach for small n would be to use one of the abundant algorithms for the Travelling Salesman Problem (which usually approximate optimal solutions quite well) and then remove the (n-1) heaviest edges from the path found. That way you end up with n paths.
The Wikipedia Article on TSP actually lists some pretty easy algorithmic techniques which should give you a reasonably good approximation.
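The cutting step of this heuristic can be sketched as below. It assumes a path through all vertices has already been produced by some TSP heuristic (not shown here); the function names and the example weights are hypothetical. It is a heuristic only and does not guarantee that the maximum path cost is minimal.

```python
def split_path(path, weight, n):
    """Cut a vertex sequence into n paths by dropping its n - 1 heaviest edges."""
    edge_costs = [weight[(path[i], path[i + 1])] for i in range(len(path) - 1)]
    # Positions of the n - 1 heaviest edges along the path.
    by_cost = sorted(range(len(edge_costs)), key=lambda i: edge_costs[i], reverse=True)
    cut = set(by_cost[: n - 1])
    pieces, current = [], [path[0]]
    for i in range(len(edge_costs)):
        if i in cut:
            pieces.append(current)      # close the current path at a removed edge
            current = [path[i + 1]]
        else:
            current.append(path[i + 1])
    pieces.append(current)
    return pieces

def path_cost(piece, weight):
    """Total weight of the edges along one path."""
    return sum(weight[(a, b)] for a, b in zip(piece, piece[1:]))

# Hypothetical path through all vertices, with made-up edge weights.
path = ["a", "b", "c", "d", "e"]
weight = {("a", "b"): 2, ("b", "c"): 9, ("c", "d"): 1, ("d", "e"): 7}
pieces = split_path(path, weight, 3)
print(pieces)                                      # [['a', 'b'], ['c', 'd'], ['e']]
print(max(path_cost(p, weight) for p in pieces))   # 2
```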

How to find maximal eulerian subgraph?

How to find maximal eulerian subgraph of a given graph? By "maximal" I mean subgraph with maximal number of edges, vertices, or both. My idea is to find basis of cycle space and combine basis cycles in a proper way, but I don't know how to do it (and is it a good idea or not).
UPD. Source graph is connected.
Some thoughts. A graph is Eulerian iff it is connected (apart from possible isolated vertices) and all of its vertices have even degree.
It is 'easy' to satisfy the second criterion by removing (shortest) paths between pairs of odd-degree vertices.
Connectivity is the problematic part, since removing edges can disconnect the graph.
Here is an example showing that a 'simple' (greedy) solution is not easy to produce. Modify the complete graph K5 by splitting each edge into two (or more) edges. Take two copies of this modified K5 and pick two vertices from each (A, B from the first and C, D from the second). Connect A-C and B-D. A greedy approach would remove these added edges, since they are the shortest paths, and the graph would become disconnected. The solution is to remove the paths A-B and C-D instead.
It seems to me that the algorithm should take care of subgraph connectivity while removing edges. In particular, it should ensure that each subset of odd-degree vertices, none of which has yet been paired to remove a path, keeps connectivity larger than the cardinality of the subset.
I would try (as a test) a recursive brute-force solution with an optimization. O is the list of odd-degree vertices.
def remove_edges(O, G):
    if O is empty:
        return G    # every vertex now has even degree
    for f in O:
        for t in O\{f}:
            G2 = G without the edges of a path between f and t
            if G2 is disconnected:
                continue
            solution = remove_edges(O\{f,t}, G2)
            if solution is not None:
                return solution    # otherwise backtrack and try the next pair
    return None    # no pairing keeps the graph connected
An optimization is to order the sets O and O\{f} so that vertices with the shortest paths between them come first. This can be done by computing the shortest path lengths between all pairs of vertices in O before removing any edges, using a BFS from each vertex of O.
It was proved in 1979 that determining whether a given graph contains a spanning Eulerian subgraph is NP-complete.
Ref: W. R. Pulleyblank, A note on graphs spanned by Eulerian graphs, J. Graph Theory 3, 1979, pp. 309–310.
Please refer to this
Finding the maximum size (number of edges) of spanning Eulerian subgraph of a graph (if it exists) is an active research area.
Consider the following standard definitions. Given a graph G = (V, E):
A circuit is a sequence of adjacent vertices starting and ending at the same vertex. Circuits do not allow repeated edges, but they do allow repeated vertices.
A cycle is a special case of a circuit in which vertices also do not repeat.
Note that circuits and Eulerian subgraphs are the same thing. This means that finding the longest circuit in G is equivalent to finding a maximum Eulerian subgraph of G. As noted above, this problem is NP-hard. So, unless P=NP, an efficient (i.e. polynomial-time) algorithm for finding a maximum Eulerian subgraph in an arbitrary graph is impossible.
For undirected graphs, one way of randomly producing an Eulerian subgraph is to identify a cycle basis for G. A cycle basis is a set of cycles that, when combined using symmetric differences, can be used to form every Eulerian subgraph of the original graph G. Hence, we only need to take a random selection of cycles from this set and combine them to get our arbitrary Eulerian subgraph.
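As a small illustration of combining cycles by symmetric difference, the sketch below uses two hand-picked triangles that share an edge, rather than a computed cycle basis; the helper names are made up for the example. The shared edge cancels out and every vertex of the result has even degree.

```python
from collections import Counter

def edge(a, b):
    """Undirected edge as an unordered pair."""
    return frozenset((a, b))

def combine(cycles):
    """Symmetric difference of a collection of edge sets."""
    result = set()
    for cycle in cycles:
        result ^= cycle
    return result

def degrees(edges):
    """Degree of every vertex in the subgraph formed by the given edges."""
    count = Counter()
    for e in edges:
        for v in e:
            count[v] += 1
    return count

triangle_1 = {edge("a", "b"), edge("b", "c"), edge("c", "a")}
triangle_2 = {edge("b", "c"), edge("c", "d"), edge("d", "b")}
combined = combine([triangle_1, triangle_2])

print(edge("b", "c") in combined)                          # False: the shared edge cancels
print(all(d % 2 == 0 for d in degrees(combined).values()))  # True
```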
Given that an Eulerian subgraph is basically a collection of overlapping cycles, here is a greedy, polynomial-time algorithm that I'd like to suggest for finding large (but not necessarily maximum) Eulerian subgraphs. This works for both directed and undirected graphs and produces a set of edges (or arcs) E’ that define an Eulerian subgraph containing a user-defined source vertex s. The following steps are for directed graphs but can be easily modified for the undirected case.
Let U = {s} and E' = {}
while U is not empty
    Let u be a random element in U
    Form a cycle C from u in G
    if no such cycle C exists
        Remove u from U
    else
        Add the arcs of C to E'
        Remove the arcs of C from G
        Add the vertices of C to U
Here are a few points to note about this algorithm.
The set U holds the vertices that are yet to be fully considered by the algorithm.
To apply this method to undirected graphs, just replace the word "arcs" with "edges".
This method can be seen as a generalisation of Hierholzer's algorithm. Hence, if the input graph G is already an Eulerian graph, then the returned set E’ will contain all of the edges of G.
Various methods can be used to generate a cycle C from vertex u. For directed graphs, a simple method is to create an additional dummy vertex u' and temporarily redirect all of the arcs entering u to u'. Various algorithms can then be used to determine a u-u'-path (which represents a cycle), such as BFS, DFS, or Wilson's algorithm.
This algorithm can be said to produce a maximal Eulerian subgraph with respect to G and s. This is because, on termination, no further cycles can be added to the solution contained in E'. Note that we should not confuse the terms maximal and maximum here: finding a maximal Eulerian subgraph is easy (using the above method); finding a maximum Eulerian subgraph is NP-hard. Similar terminology is used with matchings.
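A runnable sketch of the greedy procedure for directed graphs is given below. It is one possible reading of the steps, not a reference implementation: it assumes a simple directed graph (no parallel arcs) given as a list of (tail, head) pairs with sortable vertex labels, and it finds cycles with a plain BFS back to u instead of the dummy-vertex trick. All function names are made up for the example.

```python
import random
from collections import deque

def find_cycle(adj, u):
    """Return the arcs of a shortest directed cycle through u, or None (BFS)."""
    parent = {}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == u:
                # Close the cycle, then walk the parent pointers back to u.
                arcs = [(node, u)]
                while node != u:
                    arcs.append((parent[node], node))
                    node = parent[node]
                return arcs[::-1]
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def greedy_eulerian_subgraph(arcs, s, seed=0):
    """Greedily collect arc-disjoint cycles reachable from s."""
    rng = random.Random(seed)
    adj = {}
    for a, b in arcs:
        adj.setdefault(a, set()).add(b)
    unprocessed, chosen = {s}, set()          # U and E' in the text
    while unprocessed:
        u = rng.choice(sorted(unprocessed))
        cycle = find_cycle(adj, u)
        if cycle is None:
            unprocessed.discard(u)
        else:
            for a, b in cycle:
                chosen.add((a, b))            # add the arc to E'
                adj[a].discard(b)             # remove it from G
                unprocessed.add(a)            # every vertex of C goes into U
    return chosen

# Two cycles sharing vertex 3, plus a dangling arc (4, 5) that cannot be used.
arcs = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3), (4, 5)]
chosen = greedy_eulerian_subgraph(arcs, 1)
print(sorted(chosen))  # [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
```

The loop terminates because each iteration either removes a vertex from U or removes at least one arc from G.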

TSP Where Vertices Can be Visited Multiple Times

I am looking to solve a problem where I have a weighted directed graph and I must start at the origin, visit all vertices at least once and return to the origin in the shortest path possible. Essentially this would be a classic example of TSP, except I DO NOT have the constraint that each vertex can only be visited once. In my case any vertex excluding the origin can be visited any number of times along the path, if this makes the path shorter. So for example in a graph containing the vertices V1, V2, V3 a path like this would be valid, given that it is the shortest path:
ORIGIN -> V1 -> V2 -> V1 -> V3 -> V1 -> ORIGIN
As a result, I am a bit stuck on what approach to take in order to solve this, as a classic dynamic programming algorithm approach which is usually used to solve TSP problems in exponential time is not suitable.
The typical approach is to create a distance matrix that gives the shortest-path distance between any two nodes, so d(i,j) = length of the shortest path (following the edges of the network) from i to j. This can be done by running Dijkstra's algorithm from every node, assuming the edge weights are non-negative.
Now just solve a classical TSP with distances d(i,j). Your TSP doesn't "know" that the actual route followed might involve visiting a node multiple times. At the same time, it will ensure that the vehicle stops at every node.
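The two steps can be sketched as follows. This is a toy illustration, assuming non-negative weights and a graph small enough that the classical TSP on d(i,j) can be solved by trying every ordering; a real instance would use a proper TSP solver instead. The function names and the example graph are made up.

```python
import heapq
from itertools import permutations

INF = float("inf")

def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, INF):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

def shortest_closed_walk(adj, nodes, origin):
    """Brute-force TSP on the shortest-path distances d(i, j)."""
    d = {u: dijkstra(adj, u) for u in nodes}
    others = [v for v in nodes if v != origin]
    best_cost, best_stops = INF, None
    for order in permutations(others):
        stops = (origin, *order, origin)
        cost = sum(d[a].get(b, INF) for a, b in zip(stops, stops[1:]))
        if cost < best_cost:
            best_cost, best_stops = cost, stops
    return best_cost, best_stops

# Star-shaped example: every route has to pass through V1 repeatedly.
adj = {
    "ORIGIN": [("V1", 1)],
    "V1": [("ORIGIN", 1), ("V2", 1), ("V3", 1)],
    "V2": [("V1", 1)],
    "V3": [("V1", 1)],
}
best_cost, best_stops = shortest_closed_walk(adj, list(adj), "ORIGIN")
print(best_cost)  # 6
```

The returned stops give the order in which nodes are served; the actual route is recovered by expanding each consecutive pair into its shortest path, which is where the repeated visits to V1 reappear.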
Now, as for efficiency: as @Codor points out, TSP is NP-hard and so is your variant of it, so you are not going to find a provably optimal, polynomial-time algorithm. However, there are still many, many good algorithms (both heuristic and exact) for TSP, and most of them should be suitable for your problem. (In general, DP is not the way to go for TSP.)
To answer the question in part, the problem described in the question does not admit a polynomial-time algorithm unless P=NP, by the following argument. Clearly, the proposed problem includes instances which are Euclidean. However, no optimal solution to a Euclidean instance needs repeated nodes, since by the triangle inequality skipping a repeated visit never makes the tour longer. According to the Wikipedia article on TSP, Euclidean TSP is still NP-hard. This means that any polynomial-time algorithm for the problem in the question would be able to solve Euclidean TSP to optimality in polynomial time, which is impossible unless P=NP.

how to decide p of ACF and q of PACF in AR, MA, ARMA and ARIMA?

I am confused about how to determine p and q from the ACF and PACF in AR, MA, ARMA and ARIMA models. For example, in R, we use acf or pacf to get the best p and q.
However, based on the information I have read, p is the order of AR and q is the order of MA. Let's say p=2; then AR(2) is supposed to be y_t = a*y_{t-1} + b*y_{t-2} + c. We can compute the acf function (in R) for lag=1,2,3,... to find which lag gives the largest acf value. The same goes for MA when deciding q. But does this mean that p and q have already been fixed?
I guess these are the steps, but I am not sure if I am right.
So, for R's functions acf and pacf, is this the real process:
1. For p=1, set lag=1,2,3,...max to see which lag has the biggest autocorrelation value.
2. For p=2,3,4..., do the same thing to find the lags.
3. Compare those values with each other. Let's say the biggest autocorrelation value comes when p=2 and lag=4; do we then say that the order of AR, i.e. p, is 2?
Could anyone please give me an example showing exactly how to estimate p and q?
This isn't a good Stack Overflow question; you want to be on the Math site for this. To answer your question, though, there isn't one single generally accepted method for finding the optimal p and q.
Generally, what most people tend to do is eyeball it using pacf visualizations (in which case, as you observe, you can't distinguish whether to put the lags into p or q) and set p == q.
An alternative is to fit your time series with different values of p and q in a grid search, and pick the combination that optimizes some criterion such as log-likelihood or out-of-sample error, or whatever makes sense for your dataset.
If I might suggest, however, you probably want to start by looking at the rather extensive body of research on arima models and see how others have done this - that really should be your first step for questions like this.
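Use the PACF plot to choose p for an AR(p) model and the ACF plot to choose q for an MA(q) model: for an AR(p) process the PACF cuts off after lag p, and for an MA(q) process the ACF cuts off after lag q.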
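To show what "the PACF cuts off after lag p" looks like in numbers, here is a pure-Python sketch that simulates an AR(1) series and computes the sample ACF and PACF (the latter via the Durbin-Levinson recursion). The simulated coefficient 0.7, the series length and the function names are assumptions for illustration; in practice you would read the same information off the plots produced by R's acf and pacf.

```python
import random

def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    c0 = sum(x * x for x in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def sample_pacf(acf):
    """Durbin-Levinson recursion: pacf[k] is the last coefficient of the AR(k) fit."""
    pacf = [1.0]
    phi_prev = []
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulate y_t = 0.7 * y_{t-1} + noise (an AR(1) process, so p = 1).
rng = random.Random(0)
y = [0.0]
for _ in range(2100):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))
y = y[100:]  # drop a short burn-in

acf = sample_acf(y, 10)
pacf = sample_pacf(acf)
threshold = 1.96 / len(y) ** 0.5  # rough 95% band for a zero partial autocorrelation
significant_lags = [k for k in range(1, 11) if abs(pacf[k]) > threshold]
print([round(v, 2) for v in pacf[1:4]], significant_lags)
```

For this series the PACF at lag 1 is close to 0.7 and the later lags are close to zero, which points to p = 1, while the ACF decays gradually instead of cutting off. An occasional higher lag may still cross the band by chance, so the threshold is a guide rather than a rule.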

Resources