How to match a set against a set of sets, completely - mapping

This problem is similar to the "Exact Hitting Set" problem (http://en.wikipedia.org/wiki/Exact_cover#Exact_hitting_set) but with slightly different constraints.
I am looking for libraries, implementations, or papers that solve the following.
Say I have a set of sets S, initialized as follows:
S = {N, O, P, E}
N = {1, 2, 5}
O = {4, 5}
P = {1, 6, 7}
E = {2, 3, 8}
S has n sets, and each of them is of unknown size. In this example n = 4.
Now I have another set X of size n, which is initialized to:
X = {1, 2, 4, 6}
What I need to do is match each element in X to one and only one set in S.
So S should be completely satisfied, with all of its sets mapped to X and vice versa.
X[0] --> N
X[1] --> E
X[2] --> O
X[3] --> P
The main problem I am having is how to deal with the duplicated data within the sets of S. How do I deal with these collisions? And how can the algorithm be implemented in a way that is relatively scalable?
Any information that could point me in the right direction would be very much appreciated.

You could create a bipartite graph in the following manner:
For each element in the set X, create a node in the disjoint set U of the graph.
For each subset in the set S, create a node in the disjoint set V of the graph.
If an element of X is in a subset of S, create an edge between the corresponding nodes in U and V.
Then, given the bipartite graph, you can solve the problem with the Hopcroft-Karp algorithm, which produces a maximum-cardinality matching in O(|E| sqrt(|V|)) time. A complete mapping exists exactly when that matching has size n.
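The construction above can be sketched in Python. Note that this sketch does not implement Hopcroft-Karp; it uses the simpler augmenting-path method (often called Kuhn's algorithm), which runs in O(|V||E|) and is usually enough for small inputs. The function name match_elements_to_sets and the dictionary layout are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Optional, Set


def match_elements_to_sets(
    elements: Set[Hashable], named_sets: Dict[str, Set[Hashable]]
) -> Optional[Dict[Hashable, str]]:
    """Return {element: set name} covering every element and every set, or None."""
    # Edges of the bipartite graph: element -> names of the sets containing it.
    candidates = {
        x: [name for name, members in named_sets.items() if x in members]
        for x in elements
    }
    owner: Dict[str, Hashable] = {}  # set name -> element currently matched to it

    def try_assign(x, seen):
        # Search for an augmenting path that starts at element x.
        for name in candidates[x]:
            if name in seen:
                continue
            seen.add(name)
            # Take a free set, or re-route its current owner to another set.
            if name not in owner or try_assign(owner[name], seen):
                owner[name] = x
                return True
        return False

    matched = sum(try_assign(x, set()) for x in elements)
    if matched != len(elements) or matched != len(named_sets):
        return None  # no complete one-to-one mapping exists
    return {x: name for name, x in owner.items()}


# Data from the question.
S = {"N": {1, 2, 5}, "O": {4, 5}, "P": {1, 6, 7}, "E": {2, 3, 8}}
X = {1, 2, 4, 6}
mapping = match_elements_to_sets(X, S)
print(mapping)
```

The collisions mentioned in the question (an element appearing in several sets) are handled by the re-routing step: when a set is already taken, the search checks whether its current owner can move to a different set.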

How to declare/use Arrays and Quantifiers in Z3Py?

I'm new to Z3py and I found an exercise where it would ask to check if some verification conditions were true. Up to this moment, the exercises I've done were basically to transform simple propositional formulas into z3py clauses, something like:
Propositional Formula would be:
(n>=4) -> (x = y +2)
Which would become in Z3Py:
s = Solver()
n, x, y = Ints('n x y')
s.add(Implies(n>=4, x == y+2))
The conditions I'm presented with now introduce Arrays and Quantifiers, and after spending hours trying to figure them out from the documentation, I still can't get them right.
For example, how would I do the same process above but with the following conditions:
n ≥ 1 ∧ i = 1 ∧ m = A[0]
i <= n ∧ ∀j. 0 ≤ j < i → m ≥ A[j]
A small snippet of the part I think is done correctly:
i, n = Ints('i n')
s.add(And(n>=1, i == 1, ???))
s.add(And(i<=n,Implies(???)))
How should I replace the ??? so that the conditions would be correctly transformed into Z3Py?
Solution:
- The constraint
n ≥ 1 ∧ i = 1 ∧ m = A[0]
would become in Z3Py:
A = Array('A', IntSort(), IntSort())  # Array declaration
i, n, m, j = Ints('i n m j')  # Ints declaration for both constraints
And(n>=1, i==1, m==A[0])
- The constraint
i <= n ∧ ∀j. 0 ≤ j < i → m ≥ A[j]
would become:
And(i<=n,ForAll(j,Implies(And(j>=0,j<i),m>=A[j])))
Your question is quite ambiguous. Note that you have the constraints:
n ≥ 1 ∧ i = 1
and then
i <= n
but the latter is already implied by the former, and hence is redundant. That is, if you add them both to the solver as you did with your s.add lines, the combination won't really mean much of anything.
I'm guessing these two lines actually arise in different "subparts" of the problem, i.e., they aren't used together for the same verification goal. And, making an educated guess, you're trying to say something about the maximum element of an array; where i is some sort of a loop-counter. The first line is what happens before the loop starts, and the second is the invariant the loop-body ensures. If this is the case, you should be explicit about that.
Assuming this is the case, then these sorts of problems are usually modeled on the "body" of that loop, i.e., you need to show us exactly what sort of a "program" you're dealing with. That is, these constraints will only make sense in the presence of some sort of a transformation of your program variables.
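To make that guess concrete, here is a plain-Python sketch of the kind of program these conditions might describe. The program itself is an assumption (the question never shows it), and find_max is a hypothetical name. The asserts mirror the two conditions: the first holds before the loop starts, the second at the top of every iteration.

```python
from typing import List


def find_max(A: List[int]) -> int:
    """Guessed program: scan A and keep the largest element seen so far in m."""
    n = len(A)
    assert n >= 1                  # precondition: n >= 1
    i, m = 1, A[0]                 # initialization: i = 1 and m = A[0]
    while True:
        # Loop invariant: i <= n and, for all j with 0 <= j < i, m >= A[j].
        assert i <= n and all(m >= A[j] for j in range(i))
        if i == n:
            return m               # invariant plus i == n means m bounds all of A
        if A[i] > m:
            m = A[i]
        i += 1


print(find_max([3, 1, 4, 1, 5]))
```

Checking the invariant on concrete inputs like this is only a sanity test. Proving it for every array is what the quantified Z3 formulas are for, and that requires encoding how one loop iteration transforms i and m.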

Does using the same variable in pattern matching in Erlang mean the values are the same?

I have some code like this:
hdistance([H|T], [H1|T1], Distance) when H /= H1 ->
hdistance(T, T1, Distance + 1);
hdistance([H|T], [H1|T1], Distance) when H == H1 ->
hdistance(T, T1, Distance).
Can I get rid of the when clause by doing it like this:
hdistance([H|T], [H|T1], Distance) ->
hdistance(T, T1, Distance);
hdistance([H|T], [H1|T1], Distance) ->
hdistance(T, T1, Distance + 1).
If yes, why, and if no, why not?
Yes.
If yes why
Based on my experience with maps, I remember formulating a rule that the order of matching is not guaranteed, but once one of the lists matches and H is bound to a value, then the other list will only match if the head of the list is equal to H.
You may have experienced something similar in the shell when you wrote:
2> X = 10.
then sometime later, you wrote:
5> X = 20.
and you got an error that said, "no match of right hand side". For the first "match", X was bound to 10; then for the second match because 20 does not match 10, you get an error. It works the same way for your lists: H gets bound to a value for the first match, then for the second match the head of the list has to equal H.
You can actually write the second clause like this:
hdistance([_|T], [_|T1], Distance) ->
hdistance(T, T1, Distance + 1).
because execution will only get to the second clause if the heads of the lists are not equal.

Alphabetical ordering in BFS

I have trouble differentiating between BFS with alphabetical ordering and BFS without it.
For example, I want to find a spanning tree in this graph (starting from E).
[Figure: the starting graph G]
After adding {E,B} and {E,C}:
[Figure: the tree T after adding EB and EC]
I'm not sure whether to continue adding {B,F} or {C,F}.
Thank you very much.
"I'm not sure whether to continue adding {B,F} or {C,F}."
Well, the answer depends on the order in which you add the vertices B and C to the queue of the BFS algorithm. If you look at the algorithm:
BFS (G, s) //Where G is the graph and s is the Source Node
    let Q be queue.
    Q.enqueue( s ) //Inserting s in queue until all its neighbour vertices are marked.
    mark s as visited.
    while ( Q is not empty)
        //Removing that vertex from queue,whose neighbour will be visited now
        v = Q.dequeue( )
        //processing all the neighbours of v
        for all neighbours w of v in Graph G
            if w is not visited
                Q.enqueue( w ) //Stores w in Q to further visit its neighbours
                mark w as visited.
It's clear that the algorithm does not specify the order in which you enqueue the neighbours of a vertex.
So if you visit the neighbours of E in the order B, C, then, due to the FIFO property of the queue, node B will be dequeued (taken out of the queue) before C and you will get the edge B--F. If the order is C, B, the edge will be C--F for the same reason. "Alphabetical ordering" simply fixes that choice: neighbours are always enqueued in alphabetical order.
Once you work through the pseudocode, this becomes clear.
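As a rough illustration, the Python sketch below runs BFS with the neighbours sorted either alphabetically or in reverse. The graph is hypothetical: the original figure is not available, so only the edges E-B, E-C, B-F and C-F mentioned in the question are included, and bfs_tree is a name chosen for this example.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs_tree(
    graph: Dict[str, List[str]], source: str, alphabetical: bool = True
) -> List[Tuple[str, str]]:
    """Return the spanning-tree edges in the order BFS discovers them."""
    visited = {source}
    queue = deque([source])
    tree_edges = []
    while queue:
        v = queue.popleft()
        # The neighbour order is the only thing that differs between the two runs.
        for w in sorted(graph[v], reverse=not alphabetical):
            if w not in visited:
                visited.add(w)
                queue.append(w)
                tree_edges.append((v, w))
    return tree_edges


# Hypothetical graph containing only the edges named in the question.
graph = {"E": ["B", "C"], "B": ["E", "F"], "C": ["E", "F"], "F": ["B", "C"]}
alpha_edges = bfs_tree(graph, "E", alphabetical=True)
reverse_edges = bfs_tree(graph, "E", alphabetical=False)
print(alpha_edges)    # F is reached through B
print(reverse_edges)  # F is reached through C
```

Both results are valid BFS spanning trees; the alphabetical rule only makes the answer unique.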

Cypher query becomes very slow on a medium-size dataset (with loops)

This question further extends the idea on the question:
Cypher: how to find all the chains of single nodes not repeated?
For example, in a graph like this:
(a1:TestNode)-[:REL]->(r1:Route)-[:REL]->(a2:TestNode)-[:REL]->(s1:Route)-[:REL]->(a1:TestNode)
(a2:TestNode)-[:REL]->(r2:Route)-[:REL]->(a3:TestNode)-[:REL]->(s2:Route)-[:REL]->(a2:TestNode)
(a3:TestNode)-[:REL]->(r3:Route)-[:REL]->(a4:TestNode)-[:REL]->(s3:Route)-[:REL]->(a3:TestNode)
Graphically:
                s3 ← a4
               ↙    ↗
         s2 ← a3 → r3
        ↙    ↗
  s1 ← a2 → r2
 ↙    ↗
a1 → r1
Cypher code:
CREATE (a1:TestNode {name:'a1'})-[:REL]->(r1:Route {name:'r1'})-[:REL]->(a2:TestNode {name:'a2'})-[:REL]->(s1:Route {name:'s1'})-[:REL]->(a1),
(a2)-[:REL]->(r2:Route {name:'r2'})-[:REL]->(a3:TestNode {name:'a3'})-[:REL]->(s2:Route {name:'s2'})-[:REL]->(a2),
(a3)-[:REL]->(r3:Route {name:'r3'})-[:REL]->(a4:TestNode {name:'a4'})-[:REL]->(s3:Route {name:'s3'})-[:REL]->(a3)
Afterwards, we can find a route from a4 to a1 with this query:
MATCH p = (a4:TestNode {name: 'a4'})-[r:REL*]->(a1:TestNode {name: 'a1'})
WITH [a4] + nodes(p) AS ns, p
WHERE ALL (n IN ns
    WHERE 1=SIZE(FILTER(m IN TAIL(ns)
        WHERE m = n)))
RETURN p
Question:
1. If I extend the above create query to have 2,000 'a' nodes, i.e. up to
(a2000)-[:REL]->(r2000:Route {name:'r2000'})-[:REL]->(a2001:TestNode {name:'a2001'})-[:REL]->(s2000:Route {name:'s2000'})-[:REL]->(a2000),
my computer becomes very slow, and 2GB of memory is occupied by neo4j. Is that normal?
2. Then I want to find a route from a2001 to a1. The system cannot find the solution (which is obviously a2001->a2000->a1999->...->a1). I guess this is because of the loops in between. The query from the previous question mentioned above should have avoided loops, because duplicates are not allowed.
My purpose is to extend this idea so that possible routes between two locations can be identified on a connected graph. Many thanks.
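For comparison outside Neo4j, here is a small Python sketch that builds the same chain of loops in memory and looks for a single route with breadth-first search. It is not a Cypher solution and says nothing definite about Neo4j's internals; build_chain and shortest_route are hypothetical helper names. The point is only that the route exists and that a search which visits each node once stays linear in the graph size, whereas the pattern -[r:REL*]-> asks for every matching path.

```python
from collections import deque
from typing import Dict, List, Optional


def build_chain(levels: int) -> Dict[str, List[str]]:
    """Directed graph a_k -> r_k -> a_{k+1} -> s_k -> a_k for k = 1..levels."""
    graph: Dict[str, List[str]] = {}
    for k in range(1, levels + 1):
        graph.setdefault(f"a{k}", []).append(f"r{k}")
        graph.setdefault(f"r{k}", []).append(f"a{k + 1}")
        graph.setdefault(f"a{k + 1}", []).append(f"s{k}")
        graph.setdefault(f"s{k}", []).append(f"a{k}")
    return graph


def shortest_route(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search; each node is enqueued at most once."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


small_route = shortest_route(build_chain(3), "a4", "a1")
print(small_route)
big_route = shortest_route(build_chain(2000), "a2001", "a1")
print(len(big_route))
```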

Find similar images using Geometric Min Hash: How to calculate theoretical matching probabilities?

I'm trying to match images based on visual words (labeled key points within images). When comparing the simulated results to my theoretical results I get significant deviations, therefore I guess there must be a mistake in my theoretical probability calculation.
You can imagine two images as set of visual words (visual word names range from A to Z):
S1=SetImage1={A, B, C, D, E, F, G, H, I, J, L, M, N, O, Y, Z}
S2=SetImage2={A, L, M, O, T, U, V, W, X, Y, Z}
You can already see that some visual words occur in both sets (e.g. A, Z, Y,...). Now we separate the visual words into primary words and secondary words (see the provided image). Each primary word has a neighborhood of secondary words. You can see the primary words (red rectangles) and their secondary words (words within ellipse). For our example the primary word sets are as follows:
SP1=SetPrimaryWordsImage1={A, J, L}
SP2=SetPrimaryWordsImage2={A, L}
We now randomly select a visual word img1VAL1 from the set SP1 and one word from the neighborhood of img1VAL1, i.e. img1VAL2=SelFromNeighborhood(img1VAL1), resulting in a pair PairImage1={img1VAL1, img1VAL2}. We do the same with the second image and get PairImage2={img2VAL1, img2VAL2}.
Example:
from Image1 we select A as primary visual word and C as secondary word since C is within the neighborhood of A. We get the pair {A, C}
from Image2 we select also A as primary visual word and Z as secondary word. We get the pair {A, Z}
{A,C} != {A,Z} and therefore we have no match. But what is the probability that randomly selected pairs are equal?
The probability is this:
A={1, 2, 3, 4}, B={1, 2, 3}
intersection C = A ∩ B = {1, 2, 3}
number of possible pairs out of the intersection = |C|-choose-2 (binomial)
number of all possibilities = |A|-choose-2 * |B|-choose-2
therefore the probability is
|C|-choose-2 / (|A|-choose-2 * |B|-choose-2)
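The formula can be checked against an exhaustive enumeration for the toy sets above. The Python sketch below assumes that each image contributes one unordered pair drawn uniformly from all pairs of its set. That is a simplification of the primary/secondary selection described earlier, so agreement here only confirms the formula under that assumption.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

A = {1, 2, 3, 4}
B = {1, 2, 3}
C = A & B  # words present in both sets

# Closed-form probability from the formula above.
formula = Fraction(comb(len(C), 2), comb(len(A), 2) * comb(len(B), 2))

# Brute force: enumerate every (pair from A, pair from B) combination.
pairs_a = [frozenset(p) for p in combinations(sorted(A), 2)]
pairs_b = [frozenset(p) for p in combinations(sorted(B), 2)]
equal = sum(1 for pa in pairs_a for pb in pairs_b if pa == pb)
enumerated = Fraction(equal, len(pairs_a) * len(pairs_b))

print(formula, enumerated)
```

If a simulation of the real procedure deviates from this value, one place to look is the selection step: picking a primary word first and then a secondary word from its neighborhood does not generally make all pairs equally likely.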
