Modifying a bipartite graph so that it would have a perfect matching - graph-algorithm

Given a bipartite graph with equal-sized sides X and Y, how can we efficiently find the minimum number of edges we have to add so that the graph will have a perfect matching? is there a better solution than iterating over all 2^(|X|) subsets and adding edges until Hall's theorem is satisfied?
Thanks.

If I understood the question correctly, it should be possible to generate a cardinality-maximal matching of the initial graph efficiently, by either using the so-called Hungarian method or modelization as a network flow problem. Once the cardinality-maximal matching has been found, there must be an equal number of unmatched nodes in either partition, which can be matched using additional edges at will.
In other words, if M is the cardinality of a cardinality-maximal matching in the original graph and |X|=|Y| holds, then at at least M-|X| edges have to be added in order to have a perfect matching contained in the graph.

Related

Get subgraph in Neo4j using top N edges based on edge property

I have a graph of Position nodes that are connected using direction :TO edges. Each node has a uuid property and may have many edges from it to other nodes and each edge has a property probability. I want to get the subgraph from a particular starting node using only the edges with the top N probabilities from each node. For example, if each node has ten edges, I might want to use the three edges with the highest probability.
On top of this I want to exclude all edges that end in an already visited node and, preferably, be able to parameterize the maximum number of levels (maxLevel in the apoc procedures, I believe).
The apoc path expansion procedures would probably work fine except for the last requirement; there's no apparent way to limit the number of edges, just the number of rows.
I've tried chaining MATCH queries together but can't figure out how to limit the number of edges on a per node basis, just the number of rows.
I think I have a few additional ideas that I'm going to work on, but I feel like this has to be a common enough use case that I'm missing something fundamental.
Try this:
MATCH (n:label1)-[e:type]->(m:label2)
WHERE n.name='xxx'
WITH n,e1,m
ORDER BY e1.prop DESC
LIMIT 3
MATCH (m)-[e2:type]->(k)
RETURN n,e1,m,e2,k

Cypher: Find any path between nodes

I have a neo4j graph that looks like this:
Nodes:
Blue Nodes: Account
Red Nodes: PhoneNumber
Green Nodes: Email
Graph design:
(:PhoneNumber) -[:PART_OF]->(:Account)
(:Email) -[:PART_OF]->(:Account)
The problem I am trying to solve is to
Find any path that exists between Account1 and Account2.
This is what I have tried so far with no success:
MATCH p=shortestPath((a1:Account {accId:'1234'})-[]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[:PART_OF]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=shortestPath((a1:Account {accId:'1234'})-[*]-(a2:Account {accId:'5678'})) RETURN p;
MATCH p=(a1:Account {accId:'1234'})<-[:PART_OF*1..100]-(n)-[:PART_OF]->(a2:Account {accId:'5678'}) RETURN p;
Same queries as above without the shortest path function call.
By looking at the graph I can see there is a path between these 2 nodes but none of my queries yield any result. I am sure this is a very simple query but being new to Cypher, I am having a hard time figuring out the right solution. Any help is appreciated.
Thanks.
All those queries are along the right lines, but need some tweaking to make work. In the longer term, though, to get a better system to easily search for connections between accounts, you'll probably want to refactor your graph.
Solution for Now: Making Your Query Work
The path between any two (n:Account) nodes in your graph is going to look something like this:
(a1:Account)<-[:PART_OF]-(:Email)-[:PART_OF]->(ai:Account)<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->(a2:Account)
Since you have only one type of relationship in your graph, the two nodes will thus be connected by an indeterminate number of patterns like the following:
<-[:PART_OF]-(:Email)-[:PART_OF]->
or
<-[:PART_OF]-(:PhoneNumber)-[:PART_OF]->
So, your two nodes will be connected through an indeterminate number of intermediate (:Account), (:Email), or (:PhoneNumber) nodes all connected by -[:PART_OF]- relationships of alternating direction. Unfortunately to my knowledge (and I'd love to be corrected here), using straight cypher you can't search for a repeated pattern like this in your current graph. So, you'll simply have to use an undirected search, to find nodes (a1:Account) and(a2:Account) connected through -[:PART_OF]- relationships. So, at first glance your query would look like this:
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*]-(a2:Account { accId: {a2_id} }))
RETURN *
(notice here I've used cypher parameters rather than the integers you put in the original post)
That's very similar to your query #3, but, like you said - it doesn't work. I'm guessing what happens is that it doesn't return a result, or returns an out of memory exception? The problem is that since your graph has circular paths in it, and that query will match a path of any length, the matching algorithm will literally go around in circles until it runs out of memory. So, you want to set a limit, like you have in query #4, but without the directions (which is why that query doesn't work).
So, let's set a limit. Your limit of 100 relationships is a little on the large side, especially in a cyclical graph (i.e., one with circles), and could potentially match in the region of 2^100 paths.
As a (very arbitrary) rule of thumb, any query with a potential undirected and unlabelled path length of more than 5 or 6 may begin to cause problems unless you're very careful with your graph design. In your example, it looks like these two nodes are connected via a path length of 8. We also know that for any two nodes, the given minimum path length will be two (i.e., two -[:PART_OF]- relationships, one into and one out of a node labelled either :Email or :PhoneNumber), and that any two accounts, if linked, will be linked via an even number of relationships.
So, ideally we'd set out our relationship length between 2 and 10. However, cypher's shortestPath() function only supports paths with a minimum length of either 0 or 1, so I've set it between 1 and 10 in the example below (even though we know that in reality, the shortest path have a length of at least two).
MATCH p=shortestPath((a1:Account { accId: {a1_id} })-[:PART_OF*1..10]-(a2:Account { accId: {a2_id} }))
RETURN *
Hopefully, this will work with your use case, but remember, it may still be very memory intensive to run on a large graph.
Longer Term Solution: Refactor Graph and/or Use APOC
Depending on your use case, a better or longer term solution would be to refactor your graph to be more specific about relationships to speed up query times when you want to find accounts linked only by email or phone number - i.e. -[:ACCOUNT_HAS_EMAIL]- and -[:ACCOUNT_HAS_PHONE]-. You may then also want to use APOC's shortest path algorithms or path finder functions, which will most likely return a faster result than using cypher, and allow you to be more specific about relationship types as your graph expands to take in more data.

Find Longest Path in Graph

I have been trying hard to find out longest path in a complex network. I have been through many questions in StackOverflow and Internet, but none could help me. I have written a CQL as
start n=node(*)
match p = (n)-[:LinkTo*1..]->(m)
with n,MAX(length(p)) as L
match p = (n)-[:LinkTo*1..]->(m)
where length(p) = L
return p,L
I don't get any solution. Neo4J would keep running for the answer, and I also tried executing it in Neo4J Cloud Hosting. I didn't any solution even there, but got an error
"Error undefined-undefined"
I am in dire need of a solution. The result for this answer will help me complete my project. So, anyone please help me in correcting the query.
Well for one you're doing a highly expensive operation twice when you only have to do it once.
Additionally, you are returning one path per every single node in your database, at least (as there may be multiple paths for a node that are the longest paths available for that node). But from your question it sounds like you want the single largest path in the graph, not one each for every single node.
We can also improve your match by only performing the longest-path match on nodes that are at the head of the path, and not somewhere in the middle.
Maybe try this one?
match (n)
where (n)-[:LinkTo]->() and not ()-[:LinkTo]->(n)
match p = (n)-[:LinkTo*1..]->(m)
return p, length(p) as L
order by L desc
limit 1
The problem you're trying to solve is NP-hard. On small sparse graphs a brute-force approach such as the one suggested by InverseFalcon may succeed in reasonable time, but on any reasonably large and/or densely connected graph, you will quickly run into both time and space problems.
If you have a weighted graph, you can find the longest path between 2 nodes by negating all the edge-weights, and running a shortest weighted path algorithm over the modified graph. However if you want to find the longest path in the entire graph, you are effectively trying to solve the Travelling Salesman Problem, but with -ve edge weights. You can't do that with Cypher.
If your graph is unweighted, I'd find an easier problem, or see if you can convert your graph to a weighted one and tackle it as described above. Alternatively, see if you can frame your requirements in such a way that you don't need to find a longest path.

Cypher query to find the hop depth length of particular relationships

I am trying to find the amount of relationships that stem originally from a parent node and I am not sure the syntax to use in order to gain access to this returned integer. I am can be sure in my code that each child node can only have one relationship of a particular type so this allows me to capture a "true" depth reading
My attempt is this but I am hoping there is a cleaner way:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
I am not sure this is the correct syntax because it returns an array of integers with the last index being the true tally length. I am hoping for something that just returns an int instead of this mentioned array.
I am very grateful for help that you may be able to offer.
As Nicole says, in general, finding the longest path between two nodes in a graph is not feasible in any reasonable time. If your graph is very small, it is possible that you will be able to find all paths, and select the one with the most edges but this won't scale to larger graphs.
However there is a trick that you can do in certain circumstances. If your graph contains no directed cycles, you can assign each edge a weight of -1, and then look for the shortest weighted path between the source and target nodes. Since the edge weights are negative a shortest weighted path must correspond to a path with a maximum number of edges between the desired nodes.
Unfortunately, Cypher doesn't yet support shortest weighted path algorithms, however the Neo4j database engine does. The docs give an example of how to do this. You will also need to implement your own algorithm, such as Bellman-Ford using the traversal API, because Dijkstra won't work with -ve edge weights.
However, please be aware that this trick won't work if your graph contains cycles - it must be a DAG.
Your query:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
is returning the length of ALL possible paths from n to c. You probably are only interested in the shortest path? You can use the shortestPath function to only consider the shortest path from n to c:
MATCH p = shortestPath((n {id:'123'})-[r:Foo*]->(c))
RETURN length(p)

Nearest nodes to a give node, assigning dynamically weight to relationship types

I need to find the N nodes "nearest" to a given node in a graph, meaning the ones with least combined weight of relationships along the path from given node.
Is is possible to do so with a pure Cypher only solution? I was looking about path functions but couldn't find a viable way to express my query.
Moreover, is it possible to assign a default weight to a relationship at query time, according to its type/label (or somehow else map the relationship type to the weight)? The idea is to experiment with different weights without having to change a property for every relationship.
Otherwise I would have to change the weight property's value to each relationship and re-do it to before each query, which is very time-consuming (my graph has around 10M relationships).
Again, a pure Cypher solution would be the best, or please point me in the right direction.
Please use variable length Cypher queries to find the nearest nodes from a single node.
MATCH (n:Start { id: 0 }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
Note that the syntax [CONNECTED*0..2] is a range parameter specifying the min and max relationship distance from a given node, with relationship type CONNECTED.
You can swap this relationship type for other types.
In the case you wanted to traverse variably from the start node to surrounding nodes but constrain via a stop criteria to a threshold, that is a bit more difficult. For these kinds of things it is useful to get acquainted with Neo4j's spatial plugin. A good starting point to learn more about Neo4j spatial can be found in this blog post: http://neo4j.com/blog/neo4j-spatial-part1-finding-things-close-to-other-things
The post is a little outdated but if you do some Google searching you can find more updated materials.
GitHub repository: https://github.com/neo4j-contrib/spatial

Resources