How can I find max of multiplication of subtree nodes? - graph-algorithm

A tree with N nodes is given. Node weights are between -1000 and 1000. I am asked to find the maximum product of the nodes in a subtree. Do you have any idea or algorithm to solve this problem?
For example

I'd just traverse the tree, keeping track of a cumulative product until you reach a leaf node.
Select the leaf node with the highest product.
Optionally:
Along the way you can optimize by traversing only the highest-valued child of each node.
Note: this optimization assumes that no negative values exist.
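A minimal Python sketch of the traversal described above, assuming the tree is given as a hypothetical `children` dict (node to list of children) and a `weight` dict; neither name comes from the question. It tracks the cumulative product along every root-to-leaf path and keeps the largest product seen at a leaf. It visits all paths rather than only the highest child, so it also handles negative weights.

```python
def max_leaf_product(root, children, weight):
    """Return the largest cumulative product over all root-to-leaf paths."""
    best = None
    stack = [(root, weight[root])]  # (node, product of weights from root to node)
    while stack:
        node, product = stack.pop()
        kids = children.get(node, [])
        if not kids:  # leaf: the running product is a candidate answer
            if best is None or product > best:
                best = product
        for child in kids:
            stack.append((child, product * weight[child]))
    return best


# Hypothetical example tree: 1 -> (2, 3), 2 -> (4), 3 -> (5)
children = {1: [2, 3], 2: [4], 3: [5]}
weight = {1: 2, 2: -3, 3: 4, 4: -5, 5: 1}
print(max_leaf_product(1, children, weight))  # 30, via the path 1 -> 2 -> 4
```

In this example the best path goes through the child with the lower weight (-3), which is why the "highest child" shortcut is only safe without negative values.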

Related

Identifying nodes with many, but thin connections

I need to identify (and label) nodes with many relationships (say, 10 on a possible scale of 1 to 60) but weak weights for the relationships (say 1 or 2 on a possible scale of 1 to 100). I could write a Cypher query to find them. I don’t need help with that. What I want to ask is, is there a GDS metric for this?
You could use a combination of degree and weighted degree.
If you want to construct such a GDS graph, you could use the subgraph option, which lets you filter on mutated properties.

Can I use Breadth-First-Search on weighted graphs if I modify it?

I am having a discussion with a friend about whether the following will work:
We recently learned in a lecture about Breadth-First-Search. I know that it is a special case of Dijkstra where each edge weight is set to one. Assume now we are given a graph where the edges have integer weights of more than one. Then I would modify this graph by introducing additional vertices and connecting them by edges with weight one, e.g. assume we have an edge of weight 3 connecting the vertices u and v, then I would introduce dummy-vertices d1, d2, remove the edge connecting u and v and instead add edges {u, d1}, {d1, d2}, {d2,v} of weight one.
If I modify my whole graph this way and then apply breadth-first search starting from one of the original vertices, wouldn't this work as well?
Thank you very much in advance!
Since BFS is guaranteed to return an optimal path on unweighted graphs, and you've created the unweighted equivalent of your original graph, you'll be guaranteed to get the shortest path.
What you lose by doing this, compared with Dijkstra's algorithm, is runtime optimality. The runtime of your algorithm now depends on the edge weights, whereas Dijkstra's depends only on the number of vertices and edges.
This sort of thought experiment is a great way to understand how Dijkstra's algorithm works (e.g. how would you modify your algorithm so that it does not require creating a new graph, or does not take 100 steps for an edge with weight 100?). In fact, this is probably how Dijkstra discovered the algorithm to begin with.
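The construction from the question can be sketched in Python as follows. This is an illustration only, assuming positive integer weights, an undirected graph without parallel edges, and a hypothetical edge-list format `(u, v, w)`; the dummy-vertex labels are made up for the example.

```python
from collections import deque


def subdivide(edges):
    """Replace each weighted edge (u, v, w) by a chain of w unit edges."""
    adj = {}
    for u, v, w in edges:
        prev = u
        for i in range(1, w):
            dummy = ("dummy", u, v, i)  # hypothetical label for an inserted vertex
            adj.setdefault(prev, []).append(dummy)
            adj.setdefault(dummy, []).append(prev)
            prev = dummy
        adj.setdefault(prev, []).append(v)
        adj.setdefault(v, []).append(prev)
    return adj


def bfs_distances(adj, source):
    """Plain BFS: number of unit edges from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


edges = [("u", "v", 3), ("v", "w", 2), ("u", "w", 6)]
dist = bfs_distances(subdivide(edges), "u")
print(dist["v"], dist["w"])  # 3 5
```

The expanded graph has one extra vertex per unit of weight above one, which is exactly the weight-dependent cost mentioned in the answer.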

How to find maximal eulerian subgraph?

How to find maximal eulerian subgraph of a given graph? By "maximal" I mean subgraph with maximal number of edges, vertices, or both. My idea is to find basis of cycle space and combine basis cycles in a proper way, but I don't know how to do it (and is it a good idea or not).
UPD. Source graph is connected.
Some thoughts. A graph is Eulerian iff it is connected (ignoring isolated vertices) and all vertices have even degree.
It is 'easy' to satisfy the second criterion by removing (shortest) paths between pairs of odd-degree vertices.
Connectivity is problematic, since removing edges can disconnect the graph.
Here is an example showing that a 'simple' (greedy) solution is not easy to produce. Modify the complete graph K5 by splitting each edge into two (or more) edges. Take two of these modified K5 graphs and pick two vertices from each (A, B from the first and C, D from the second). Connect A-C and B-D. A greedy approach would remove these added edges, since they are the shortest paths, and the graph would become disconnected. The solution would be to remove the paths A-B and C-D instead.
It seems to me that the algorithm should take care of subgraph connectivity while removing edges. In particular, it should ensure that each subset of odd-degree vertices, none of whose pairs has been used to remove a path between them, has connectivity larger than the cardinality of the subset.
I would try (as a test) a recursive brute-force solution with optimization. O is the list of odd-degree vertices.
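def remove_edges(O, G):
    if O is empty:
        return G                      # all degrees are even: G is the solution
    for f in O:
        for t in O \ {f}:
            G2 = G without the edges of a path between f and t
            if G2 is not connected:
                continue
            result = remove_edges(O \ {f, t}, G2)
            if result is not None:    # otherwise backtrack and try another pair
                return result
    return None                       # no valid pairing from this state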
An optimization is to order the sets O and O \ {f} so that vertices with the shortest paths between them come first. This can be done by computing the shortest path lengths between all pairs of vertices in O before removing any edges, using a BFS from each vertex in O.
It was proved in 1979 that determining whether a given graph contains a spanning Eulerian subgraph is NP-complete.
Ref: W. R. Pulleyblank, A note on graphs spanned by Eulerian graphs, J. Graph Theory 3, 1979, pp. 309–310.
Please refer to this
Finding the maximum size (number of edges) of spanning Eulerian subgraph of a graph (if it exists) is an active research area.
Consider the following standard definitions. Given a graph G = (V, E):
A circuit is a sequence of adjacent vertices starting and ending at the same vertex. Circuits do not allow repeated edges but they do allow repeated vertices.
A cycle is a special case of a circuit in which vertices also do not repeat.
Note that circuits and Eulerian subgraphs are the same thing. This means that finding the longest circuit in G is equivalent to finding a maximum Eulerian subgraph of G. As noted above, this problem is NP-hard. So, unless P=NP, no efficient (i.e. polynomial-time) algorithm exists for finding a maximum Eulerian subgraph in an arbitrary graph.
For undirected graphs, one way of randomly producing an Eulerian subgraph is to identify a cycle basis for G. A cycle basis is a set of cycles that, when combined using symmetric differences, can be used to form every Eulerian subgraph of the original graph G. Hence, we only need to take a random selection of cycles from this set and combine them to get our arbitrary Eulerian subgraph.
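As a small illustration of combining cycles by symmetric difference, here is a Python sketch with two hand-picked cycles (hypothetical data, not a computed cycle basis). Edges are stored as `frozenset` pairs so that `^` on sets gives the symmetric difference. The result always has even degree at every vertex; if a connected subgraph is required, connectivity still has to be checked separately.

```python
from collections import Counter


def cycle_edges(vertices):
    """Edge set of the undirected cycle visiting `vertices` in order."""
    n = len(vertices)
    return {frozenset((vertices[i], vertices[(i + 1) % n])) for i in range(n)}


def degrees(edge_set):
    """Degree of every vertex that appears in the edge set."""
    deg = Counter()
    for edge in edge_set:
        for v in edge:
            deg[v] += 1
    return deg


# Two hypothetical triangles sharing the edge {b, c}
c1 = cycle_edges(["a", "b", "c"])
c2 = cycle_edges(["b", "c", "d"])
combined = c1 ^ c2  # symmetric difference drops the shared edge
print(sorted(tuple(sorted(e)) for e in combined))
print(all(d % 2 == 0 for d in degrees(combined).values()))  # True
```

Here the combined edge set is the 4-cycle a-b-d-c-a, which is itself Eulerian.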
Given that an Eulerian subgraph is basically a collection of overlapping cycles, here is a greedy, polynomial-time algorithm that I'd like to suggest for finding large (but not necessarily maximum) Eulerian subgraphs. This works for both directed and undirected graphs and produces a set of edges (or arcs) E’ that define an Eulerian subgraph containing a user-defined source vertex s. The following steps are for directed graphs but can be easily modified for the undirected case.
Let U = {s} and E' = {}
while U is not empty
    Let u be a random element in U
    Form a cycle C from u in G
    if no such cycle C exists
        Remove u from U
    else
        Add the arcs of C to E'
        Remove the arcs of C from G
        Add the vertices of C to U
Here are a few points to note about this algorithm.
The set U holds the vertices that are yet to be fully considered by the algorithm.
To apply this method to undirected graphs, just replace the word "arcs" with "edges".
This method can be seen as a generalisation of Hierholzer's algorithm. Hence, if the input graph G is already an Eulerian graph, then the returned set E' will contain all of the edges from G.
Various methods can be used to generate a cycle C from vertex u. For directed graphs, a simple method is to create an additional dummy vertex u' and temporarily redirect all of the incoming arcs from u to u'. Various algorithms can then be used to determine a u-u'-path (which represents a cycle), such as BFS, DFS, or Wilson's algorithm.
This algorithm can be said to produce a maximal Eulerian subgraph with respect to G and s. This is because, on termination, no further cycles can be added to the solution contained in E'. Note that we should not confuse the terms maximal and maximum here: finding a maximal Eulerian subgraph is easy (using the above method); finding a maximum Eulerian subgraph is NP-hard. Similar terminology is used with matchings.
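The greedy procedure above could be sketched in Python roughly as follows, for directed graphs without parallel arcs. This is one possible reading, not a reference implementation: the function names, the arc-list input format, and the `seed` parameter are assumptions, and the cycle search uses a plain BFS back to `u` instead of the dummy-vertex construction.

```python
import random
from collections import deque


def find_cycle(adj, u):
    """Return the arcs of a directed cycle through u found by BFS, or None."""
    parent = {}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y == u:  # closed a cycle: walk the BFS parents back to u
                arcs = [(x, u)]
                while x != u:
                    arcs.append((parent[x], x))
                    x = parent[x]
                return arcs[::-1]
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None


def greedy_eulerian_subgraph(arcs, s, seed=0):
    """Greedily collect arc-disjoint cycles reachable from s; return E'."""
    rng = random.Random(seed)
    adj = {}
    for a, b in arcs:
        adj.setdefault(a, set()).add(b)
    U, chosen = {s}, set()
    while U:
        u = rng.choice(sorted(U))  # random element of U (vertices must be sortable)
        cycle = find_cycle(adj, u)
        if cycle is None:
            U.remove(u)
        else:
            for a, b in cycle:
                chosen.add((a, b))  # add the arc to E'
                adj[a].remove(b)    # remove the arc from G
                U.add(a)            # every vertex of the cycle is the tail of one arc
    return chosen


arcs = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
print(sorted(greedy_eulerian_subgraph(arcs, 1)))
```

The example input is already Eulerian, so all five arcs are returned, matching the Hierholzer remark above. An extra arc such as (4, 5) lies on no cycle and would be left out.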

How can a tree be encoded as input to a neural network?

I have a tree, specifically a parse tree with tags at the nodes and strings/words at the leaves. I want to pass this tree as input into a neural network all the while preserving its structure.
Current approach
Assume we have some dictionary of words w1, w2, ..., wn.
Encode the words that appear in the parse tree as n-dimensional binary vectors, with a 1 in the i-th position whenever the word in the parse tree is wi.
Now how about the tree structure? There are about 2^n possible parent tags for n words that appear at the leaves, so we can't set a maximum number of input words and then just enumerate all trees by brute force.
Right now all I can think of is to approximate the tree by choosing the direct parent of each leaf. This can also be represented by a binary vector, with dimension equal to the number of different tag types (on the order of 100, I suppose).
My input is then two-dimensional. The first part is the vector representation of a word and the second is the vector representation of its parent tag.
However, this loses a lot of the structure in the sentence. Is there a standard or better way of solving this problem?
You need a recursive neural network. Please see this repository for an example implementation: https://github.com/erickrf/treernn
The principle of a recursive (not recurrent) neural network is shown in this picture.
It learns a representation of each leaf, and then goes up through the parents to finally construct the representation of the whole structure.
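To make the bottom-up composition concrete, here is a toy forward pass in plain Python. It is only a sketch under strong assumptions: binary trees given as nested pairs, random untrained weights, a single shared composition matrix `W`, and no use of the node tags. It is not the code from the linked repository.

```python
import math
import random

DIM = 4  # hypothetical embedding size
rng = random.Random(0)
# Hypothetical, untrained parameters: one DIM x (2 * DIM) composition matrix
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]
embeddings = {}  # word -> leaf vector, created on first use


def leaf_vector(word):
    """Look up (or randomly initialise) the vector for a leaf word."""
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-1, 1) for _ in range(DIM)]
    return embeddings[word]


def encode(tree):
    """tree is either a word (str) or a pair (left_subtree, right_subtree)."""
    if isinstance(tree, str):
        return leaf_vector(tree)
    left, right = encode(tree[0]), encode(tree[1])
    joined = left + right  # concatenate the two child representations
    return [math.tanh(sum(w * x for w, x in zip(row, joined))) for row in W]


sentence = (("the", "cat"), ("sat", "down"))
vector = encode(sentence)
print(len(vector))  # 4
```

The same `encode` function is applied at every inner node, so trees of different shapes all map to a vector of the same size.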
Encoding the tree structure: think of a recurrent neural network, where you have a single chain that can be constructed with a for loop. Here you have a tree instead, so you need some kind of loop with branching. A recursive function call might work, with some Python overhead.
I suggest you build the neural network with a 'define-by-run' framework (like Chainer or PyTorch) to reduce overhead, because your tree may have to be rebuilt differently for each data sample, which requires rebuilding the computation graph.
Read Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, with original Torch7 implementation here and PyTorch implementation; it may give you some ideas.
For encoding a tag at a node, I think the easiest way would be to encode it the same way you do with words.
For example, the node data is [word vector][tag vector]. If the node is a leaf, you have a word but may not have a tag (you did not say whether there is a tag at leaf nodes), so the leaf representation is [word vector][zero vector] (or [word vector][tag vector]). An inner node that does not have a word becomes [zero vector][tag vector]. Inner nodes and leaves then have data representations of the same dimension. You may treat them equally (or not :3)
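The [word vector][tag vector] layout could look like this in Python. The vocabulary and tag set are hypothetical placeholders; a real parser would supply its own.

```python
WORDS = ["the", "cat", "sat"]  # hypothetical vocabulary
TAGS = ["S", "NP", "VP"]       # hypothetical tag set


def one_hot(item, vocabulary):
    """Binary vector with a 1 at the item's index; all zeros if item is None."""
    return [1 if item == entry else 0 for entry in vocabulary]


def node_features(word=None, tag=None):
    """[word vector][tag vector], zero-filled for whichever part is missing."""
    return one_hot(word, WORDS) + one_hot(tag, TAGS)


print(node_features(word="cat"))  # leaf without a tag:   [0, 1, 0, 0, 0, 0]
print(node_features(tag="NP"))    # inner node, no word:  [0, 0, 0, 0, 1, 0]
```

Every node ends up with a vector of length `len(WORDS) + len(TAGS)`, so leaves and inner nodes can share the same network input layer.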
Encode each leaf node using (i) the sequence of nodes that connects it to the root node and (ii) the encoding of the leaf node that comes before it.
For (i), use a recurrent network whose input is tags. Feed this RNN the root tag, the second level tag, ..., and finally the parent tag (or their embeddings). Combine this with the leaf itself (the word or its embedding). Now, you have a feature that describes the leaf and its ancestors.
For (ii), also use a recurrent network! Simply start by computing the feature described above for the leftmost leaf and feed it to a second RNN. Keep doing this for each leaf, moving from left to right. At each step, the second RNN will give you a vector that represents the current leaf with its ancestors, plus the leaves that come before it and their ancestors.
Optionally, do (ii) bi-directionally and you will get a leaf feature that incorporates the whole tree!
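The input preparation for step (i) can be sketched as below: for each leaf, collect the tags on the path from the root down to its parent, in left-to-right leaf order. The tree format (a word string, or a `(tag, children)` pair) is an assumption made for this example, and the RNNs themselves are left out.

```python
def leaf_paths(tree, ancestors=()):
    """Yield (tags from root to parent, word) for each leaf, left to right.

    A tree is either a word (str) or a pair (tag, list_of_children).
    """
    if isinstance(tree, str):
        yield list(ancestors), tree
        return
    tag, kids = tree
    for kid in kids:
        yield from leaf_paths(kid, ancestors + (tag,))


parse = ("S", [("NP", ["the", "cat"]), ("VP", ["sat"])])
for tags, word in leaf_paths(parse):
    print(tags, word)
# ['S', 'NP'] the
# ['S', 'NP'] cat
# ['S', 'VP'] sat
```

Each tag sequence would be fed to the first RNN, and the resulting per-leaf features, in the order yielded here, to the second.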

Graph as output of a Classifier

It seems that I have a very common task, but I'm missing some keywords that would help me to find the information. So I state my task.
There are Persons. A set of variables is known about each person. A pair of persons P1 and P2 can be in one of the following relationships (which are the classes) :
Parent-child
siblings
partners (the significant ones)
other (some indirect relative or not a family member)
By selecting some variables of the pairs (Pi, Pk) with known relationships, I can train a Naive Bayes Classifier to predict the class. This is good.
Now. I have a set of persons P1, P2, ... Pm, and I need to build the most probable graph representing the family tree. I could use my Bayes Classifier pairwise, but in this case I wouldn't use a lot of information that is stored in the graph / in the combinations of several nodes.
For example, nodes P1, P2, P3 and P4 are given. My Bayes Classifier thinks with a good probability of 0.9 that P2 is the parent of P1, and that P4 is the parent of P3. As for the relationship between P1 and P3, it returns p=0.31 for siblings and p=0.34 for partners, so the result is quite unreliable. Now, if the classification of the relationship between P2 and P4 yields "partner" with a good probability of, say, 0.7, I could be more sure that P1 and P3 are in fact siblings. On the other hand, if P2 and P4 are "other" with a probability of 0.8, it is safer for me to conclude that P1 and P3 are partners.
I could code this logic by hand, but I think there are a lot more cases and logical dependencies, especially if we want to build a relationship graph for around 10 or 20 persons. Therefore I would like to use some kind of a classifier or classifier system.
But the output of this classifier system will not be a binary or scalar value, but a whole graph. What can I use or where can I start looking?
Thanks!
You want to do some kind of structure learning. Just as graphs are much more complicated than bits, structure learning is much more complicated than classification.
You probably want to find a maximum a posteriori (MAP) family tree, subject to your probabilistic knowledge of the individual relationships. The MAP is the single most likely assignment given all of your knowledge. The general problem of figuring out relationships between probabilistically related items is called probabilistic inference, or sometimes just inference.
I don't know if you can access the course materials of the recently completed probabilistic graphical models class, but that would be well worth studying.
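As a toy illustration of a MAP assignment, the sketch below enumerates joint labels for the two uncertain pairs from the question and scores each combination. Only the values 0.31, 0.34 and 0.7 come from the question; the remaining probabilities and the `compatibility` weights are made-up placeholders, and brute-force enumeration like this only scales to very small problems.

```python
from itertools import product

# Pairwise class probabilities. 0.31, 0.34 and 0.70 are from the question;
# every other number is a made-up placeholder.
pairwise = {
    ("P1", "P3"): {"siblings": 0.31, "partners": 0.34,
                   "parent-child": 0.10, "other": 0.25},
    ("P2", "P4"): {"partners": 0.70, "other": 0.30},
}


def compatibility(rel_13, rel_24):
    """Hypothetical soft constraint, given P2 is parent of P1 and P4 of P3."""
    if rel_24 == "partners" and rel_13 == "partners":
        return 0.05  # children of the same couple are unlikely to be partners
    if rel_24 == "other" and rel_13 == "siblings":
        return 0.05  # siblings usually have parents who are a couple
    return 1.0


def map_assignment():
    """Return the joint labelling with the highest (unnormalised) score."""
    best, best_score = None, -1.0
    options_13 = pairwise[("P1", "P3")]
    options_24 = pairwise[("P2", "P4")]
    for rel_13, rel_24 in product(options_13, options_24):
        score = options_13[rel_13] * options_24[rel_24] * compatibility(rel_13, rel_24)
        if score > best_score:
            best, best_score = (rel_13, rel_24), score
    return best


print(map_assignment())  # ('siblings', 'partners')
```

Taken pairwise, "partners" is the most likely label for P1 and P3, but the joint score prefers "siblings" once the P2-P4 relationship is taken into account, which is the effect described in the question.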
