"for all" in datalog - declarative

Given a set of facts of the form is_member(country, organisation), I have the following query to write in datalog:
Return all countries that belong to all of the organisations of which Denmark is a member.
I want to do something like
member_all_Denmarks_organisations(Country):-
¬( is_member('Denmark', Organization),
¬is_member(Country, Organization)
).
In other words, 'for every organization that Denmark is a member of, Country is a member of it too'. But datalog does not allow negated predicates which contain non-instantiated variables, so this doesn't work.
How can I proceed? And in general, when I want to express a 'for all' statement, how do I do so in datalog?

We are going to take the following alternative equivalent definition:
Return all countries that do not fail to belong to any organisation of which Denmark is a member.
Of course, you can only express this in a dialect of Datalog with negation.
The following should do:
organisation_of_denmark(org) :- is_member('Denmark', org).
// a country c is disqualified if there is some organisation org
// of which Denmark is a member but c isn't
disqualified_country(c) :- organisation_of_denmark(org), country(c), ¬is_member(c, org).
// we are only interested in countries that are not excluded by the previous rule
member_all_Denmarks_organisations(c) :- country(c), ¬disqualified_country(c).
// in case there is no unary predicate identifying all countries
// the best we can do is the following (knowing well that then the above
// will only work for countries that are members of at least one organisation)
country(c) :- is_member(c, _).
This is precisely what you wrote also, only with intermediate relations included that
capture some of your sub-formulas and with the atom country(c) included to act as
a guard or a domain for the outer-most complementation.
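The double-negation rules above can be mirrored with plain set operations. The following is only a sketch in Python, not Datalog, and the facts in is_member are made up for illustration; each set corresponds to one of the rules.

```python
# Facts of the form is_member(country, organisation).
# These facts are hypothetical sample data, not taken from the question.
is_member = {
    ("Denmark", "EU"), ("Denmark", "NATO"),
    ("Norway", "NATO"),
    ("France", "EU"), ("France", "NATO"),
    ("Japan", "G7"),
}

# organisation_of_denmark(org) :- is_member('Denmark', org).
organisation_of_denmark = {org for (c, org) in is_member if c == "Denmark"}

# country(c) :- is_member(c, _).
country = {c for (c, _) in is_member}

# disqualified_country(c) :- organisation_of_denmark(org), country(c), not is_member(c, org).
disqualified_country = {
    c
    for c in country
    for org in organisation_of_denmark
    if (c, org) not in is_member
}

# member_all_Denmarks_organisations(c) :- country(c), not disqualified_country(c).
member_all_denmarks_organisations = country - disqualified_country

print(sorted(member_all_denmarks_organisations))  # ['Denmark', 'France']
```

Note that the set difference in the last step plays the role of the guarded negation: it is only well defined because country supplies the domain to subtract from.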

The problem is a case of expressing the following proposition P in Datalog:
P(x) := for all y, p(y) => q(x,y)
In Datalog, given database DB with, say, 2 columns and x in 1st column, this can be expressed as:
P(x):- DB(x,_), ¬disqualified(x).
disqualified(x):- DB(x,_), p(y), ¬q(x,y).
The trick is to create your own disqualified() predicate.
DB(x,_) is there just to instantiate x before it appears in a negated predicate.
In the specific Denmark case:
P(x) =: 'x is member of all Denmark's organisations'
p(y) =: is_member('Denmark', y)
q(x,y) =: is_member(x,y)
DB =: is_member()


How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j?

Using Cypher in Neo4j, I can query the Panama database for the countries of three types of identity holders (my own term): Entities (companies), Officers (shareholders) and Intermediaries (middle companies), returned as three attributes/columns. Each column holds one or two entries separated by a semicolon (e.g. British Virgin Islands;Russia). We want to combine the countries in these columns into a set of unique countries and obtain the number of countries as a new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant parts are the SET statement in the 7th line and the DISTINCT count in the last line. The query fails with an error that makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means I should use COLLECT, but we tried that as well and got the same error with 'u' and 'n' swapped. Please help us obtain the output we want; it would make our job a lot easier. Thanks in advance!
EDIT: Since I had not defined the variable, as @Cybersam pointed out, I tried CREATE as follows, but it shows the error "Invalid input 'R':" for the RETURN clause. This is unfathomable to me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the new ones and the ones in question.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity), (BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.
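One thing to check is whether you want duplicates counted: splitting the concatenated string keeps a country once for every column it appears in, so the size of that list is not the number of distinct countries. The sketch below shows the intended computation in Python rather than Cypher; the function name unique_countries and the sample values are hypothetical.

```python
def unique_countries(*fields, sep=";"):
    """Split each separator-delimited field and return the distinct, non-empty names."""
    names = set()
    for field in fields:
        if field:  # skip None or empty properties
            names.update(part.strip() for part in field.split(sep) if part.strip())
    return names


# Hypothetical property values for one Entity / Officer / Intermediary row.
entity = "British Virgin Islands;Russia"
officer = "Russia"
intermediary = "Belize;Russia"

combined = ";".join([entity, officer, intermediary]).split(";")
print(len(combined))      # 5 (duplicates are still counted)

countries = unique_countries(entity, officer, intermediary)
print(sorted(countries))  # ['Belize', 'British Virgin Islands', 'Russia']
print(len(countries))     # 3
```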

Mnesia Errors case_clause in QLC query without a case clause

I have the following function for a hacky project:
% The Record variable is some known record with an associated table.
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
ExistingFields = record_to_fields(Existing),
RecordFields = record_to_fields(Record),
ExistingFields == RecordFields
]).
The function record_to_fields/1 simply drops the record name and ID from the tuple so that I can compare the fields themselves. If anyone wants context, it's because I pre-generate a unique ID for a record before attempting to insert it into Mnesia, and I want to make sure that a record with identical fields (but different ID) does not exist.
This results in the following (redacted for clarity) stack trace:
{aborted, {{case_clause, {stuff}},
[{db, '-my_func/2-fun-1-',8, ...
Which points to the line where I declare Query, however there is no case clause in sight. What is causing this error?
(Will answer myself, but I appreciate a comment that could explain how I could achieve what I want)
EDIT: this wouldn't be necessary if I could simply mark certain fields as unique, and Mnesia had a dedicated insert/1 or create/1 function.
For your example, I think your solution is clearer anyway (although it seems you can pull the record_to_fields(Record) portion outside the comprehension so it isn't getting calculated over and over.)
Yes, list comprehensions can only have generators and filters. But you can cheat a little by writing an assignment as a one-element generator. For instance, you can re-write your expression as this:
RecordFields = record_to_fields(Record),
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
ExistingFields <- [record_to_fields(Existing)],
ExistingFields == RecordFields
]).
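The same one-element-generator trick exists in other languages with comprehensions. As an analogy only (this is Python, not Erlang or QLC), the sketch below binds an intermediate value by iterating over a single-element list; the record layout (name, id, fields...) and the sample data are assumptions made for illustration.

```python
def record_to_fields(record):
    """Drop the record name and ID (the first two elements), keeping only the data fields."""
    return record[2:]


# Hypothetical table contents and candidate record.
table = [
    ("user", 1, "alice", "alice@example.com"),
    ("user", 2, "bob", "bob@example.com"),
]
record = ("user", 99, "bob", "bob@example.com")

# Computed once, outside the comprehension.
record_fields = record_to_fields(record)

matches = [
    existing
    for existing in table
    # "Assignment" written as a generator over a one-element list.
    for existing_fields in [record_to_fields(existing)]
    if existing_fields == record_fields
]
print(matches)  # [('user', 2, 'bob', 'bob@example.com')]
```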
As it turns out, the QLC DSL does not allow assignments, only generators and filters; as per the documentation (emphasis mine):
Syntactically QLCs have the same parts as ordinary list
comprehensions:
[Expression || Qualifier1, Qualifier2, ...]
Expression (the template)
is any Erlang expression. Qualifiers are either filters or generators.
Filters are Erlang expressions returning boolean(). Generators have
the form Pattern <- ListExpression, where ListExpression is an
expression evaluating to a query handle or a list.
Which means we cannot perform variable assignments within a QLC query.
Thus my only option, insofar as I know, is to simply write out the query as:
Query = qlc:q([Existing ||
Existing <- mnesia:table(Table),
record_to_fields(Existing) == record_to_fields(Record)
]).

Determining if this data is really in 4th normal form?

I have some company, location and product details to store in a db.
sample data
company   location      product
--------------------------------
abc       hilltop       alpha
abc       hilltop       beta
abc       riverside     alpha
abc       riverside     beta
buggy     underbridge   gamma
buggy     underbridge   theta
buggy     underbridge   omega
The relationships are multi-valued, as I understand. And the data needs to be normalized as the MVD's are
not derived from a candidate key (company ->> location and company ->> product where company is not a candidate key)
or the union does not make the whole set (company U location < R and so with product).
But my colleague disagrees and insists that, for a relation to have a multi-valued dependency, at least four rows with the same value in the company column should exist for each company, i.e.
t1(company) = t2(company) = t3(company) = t4(company),
For company abc this is true. But for company "buggy", which makes three products in only one location, this is untrue.
For the formal definition and similar examples I referenced:
https://en.wikipedia.org/wiki/Multivalued_dependency
and the Fourth_normal_form example, also on Wikipedia.
I know my colleague is being pedantic, but I too started asking the same question after reading the formal definition. (After all, these are derived on a mathematical basis.)
update: I am not asking how to normalize this data into 4NF; I think I know that. (I need to break it into two tables: 1) company - location and 2) company - product,
which I have done already.)
Can someone explain how this relation still has an MVD even though it does not satisfy the formal definition?
Detailed explanations are very much welcome.
"There exist" says some values exist, and they don't have to be different. EXISTS followed by some name(s) says that there exist(s) some value(s) referred to by the name(s), for which a condition holds. Multiple names can refer to the same value. (FOR ALL can be expressed in terms of EXISTS.)
The notion of MVD can be applied to both variables and values. In fact the form of the linked definition is that a MVD holds in the variable sense when it holds in the value sense "in any legal relation". To know that a particular value is legal, you need business knowledge. You can then show whether that value satisfies an MVD. But to show whether its variable satisfies the MVD you have to show that the MVD is satisfied "in any legal relation" value that the variable can hold. One valid value can tell you that a MVD doesn't hold in (it and) its variable, but it can't tell you that a MVD does hold in its variable. That requires more business knowledge.
You can show that this value violates 4NF by using that definition of MVD. The definition says that a relation variable satisfies a MVD when a certain condition holds "for any valid relation" value:
for all pairs of tuples t1 & t2 in r such that t1[a] = t2[a] there exist tuples t3 & t4 [...]
For what MVD and values for t1 & t2 does your colleague claim there don't exist values for t3 & t4? There is no such combination of MVD and values for t1 & t2. Eg for {company} ↠ {product} and t1 & t2 both (buggy, underbridge, gamma), we can take (buggy, underbridge, gamma) as a value for both t3 & t4, and so on for all other choices for t1 & t2.
Another definition for F ↠ T holding is that binary JD (join dependency) *{F U T, F U (A - T)} holds, ie that the relation is equal to the join of its projections on F U T & F U (A - T). This definition might be more immediately helpful to you & your colleague in that it avoids the terminology that you & they are misinterpreting. Eg your example data is the join of these two of its projections:
company   location
----------------------
abc       hilltop
abc       riverside
buggy     underbridge

company   product
------------------
abc       alpha
abc       beta
buggy     gamma
buggy     theta
buggy     omega
So it satisfies the JD *{{company, location}, {company, product}}, so it satisfies the MVDs {company} ↠ {location} and {company} ↠ {product} (among others). (Maybe you will be able to think of examples of relations with zero, one, two, three etc tuples for which one or more (trivial and/or non-trivial) MVDs hold.)
Of course, the two definitions are two different ways of describing the same condition.
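The join-dependency form is easy to check mechanically for a given relation value. The following is a minimal Python sketch under the assumption that the relation is a set of (company, location, product) tuples; the helper name satisfies_company_mvd is hypothetical. It only tests one value, so it says nothing about whether the MVD holds in the variable.

```python
def satisfies_company_mvd(rows):
    """Return True if rows equals the join of its (company, location)
    and (company, product) projections."""
    company_location = {(c, l) for (c, l, p) in rows}
    company_product = {(c, p) for (c, l, p) in rows}
    joined = {
        (c, l, p)
        for (c, l) in company_location
        for (c2, p) in company_product
        if c == c2
    }
    return joined == rows


sample = {
    ("abc", "hilltop", "alpha"),
    ("abc", "hilltop", "beta"),
    ("abc", "riverside", "alpha"),
    ("abc", "riverside", "beta"),
    ("buggy", "underbridge", "gamma"),
    ("buggy", "underbridge", "theta"),
    ("buggy", "underbridge", "omega"),
}
print(satisfies_company_mvd(sample))  # True

# Removing one abc row breaks the dependency: the join would put the row back.
broken = sample - {("abc", "riverside", "beta")}
print(satisfies_company_mvd(broken))  # False
```

The three buggy rows pass the check even though there are fewer than four of them, which is the point of contention in the question.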
PS 1 Whenever a FD F → T holds, the MVD F ↠ T holds. For a relation in BCNF, the MVDs that violate 4NF & 5NF are those not so associated with FDs.
PS 2 A relation variable is meant to hold a tuple if and only if it makes a true statement in business terms when its values are substituted into a given statement template, or predicate. That plus the JD definition for MVD gives conditions for a relation variable satisfying a MVD in business terms. Here our predicate is of the form ...company...location...product.... (Eg company named company is located at location and makes product product.) It happens that this MVD holds for a variable when for all valid business situations, FOR ALL company, location, product,
EXISTS product [...company...location...product...]
AND EXISTS location [...company...location...product...]
IMPLIES ...company...location...product...

Is it possible to create a variable and make its assignment based on certain conditions in a cypher query?

I am trying to create an array of values that will be assigned based on the outcome of a case test. This test will be inside a query that I already know works with a preset value in the query.
The query I am trying to embed in the case test is something like this:
WITH SPLIT('07/28/2015', '/') AS cd
MATCH (nodeA:NodeTypeA)-[r:ARelation]->(nodeB:NodeTypeB)
WITH cd, SPLIT(nodeA.ADate, '/') AS dd, nodeA, nodeB, r
WHERE
(TOINT(cd[2]) > TOINT(dd[2])) OR (TOINT(cd[2]) = TOINT(dd[2]) AND ((TOINT(cd[0]) > TOINT(dd[0])) OR (TOINT(cd[0]) = TOINT(dd[0]) AND TOINT(cd[1]) >= TOINT(dd[1]))))
RETURN nodeA, nodeB, r
I want to replace the current date with whatever date will be 6 months from the current date, and I came up with something like this, though I am not sure where I would put it in my query or if it would even work (do I initialize the new variable for instance somehow?):
WHEN ((TOINT(cd[0])> 6))
THEN
TOINT(fd[2])=TOINT(cd[2])+1, TOINT(fd[0])=TOINT(cd[0])-6, TOINT(fd[1])=TOINT(cd[1])
ELSE
TOINT(fd[2])=TOINT(cd[2]), TOINT(fd[0])=TOINT(cd[0])+6, TOINT(fd[1])=TOINT(cd[1])
fd would then replace the cd in the original query's WHERE segment. Where would my case test go, is it correctly written (and if not, what is wrong), and would I need something else added to make it all work?
Just use a WITH block to do a computation and bind it to a new variable, like this:
WITH 2 + 2 as y RETURN y;
That basically assigns the value 4 to y.
In your query, you already have a big WITH block. Just put your computations in those, bound to new variables, and you can then refer to those variables in subsequent expressions.
Don't try to modify these variables, just create new ones (with new WITH blocks) as needed. If you need variables that can actually change, then...well hey you're working with a database, the ultimate way to store and update information. Create a new node, and then update it as you see fit. :)
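The "bind a new variable instead of modifying the old one" idea can be sketched outside Cypher as well. The Python below is only an illustration of the six-month rule from the question; the function name six_months_later is hypothetical, and it assumes the date is split into [month, day, year] as in the query. It does not clamp the day, so a date such as 08/31 would map to a nonexistent 02/31.

```python
def six_months_later(cd):
    """Return a new [month, day, year] list six months after cd.

    cd holds strings as produced by splitting 'MM/DD/YYYY' on '/'.
    The input list is left untouched; the result is a new value.
    """
    month, day, year = (int(part) for part in cd)
    if month > 6:
        # Wrap around into the next year.
        return [month - 6, day, year + 1]
    return [month + 6, day, year]


cd = "07/28/2015".split("/")
fd = six_months_later(cd)
print(fd)  # [1, 28, 2016]
print(cd)  # ['07', '28', '2015']
```

In Cypher terms, fd would be computed with a CASE expression inside a WITH clause and then used in place of cd in the WHERE comparison.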
This is my proposed solution.
Explanation: I have declared four variables in my query (name1, name2, ken and lana) and I use them to build the pattern in the MATCH clause and to filter the results in the WHERE clause.
WITH "Lau" AS name1,
"L" AS name2,
"Keanu Reeves" AS ken,
"Lana Wachowski" AS lana
MATCH(x:Person{ name: ken})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(y:Person),
(x1:Person{name: lana})-[:DIRECTED]->(m)<-[:DIRECTED]-(y1:Person)
WHERE y.name CONTAINS name1 OR
y.name CONTAINS name2 OR
(y.name CONTAINS name1 AND y.name CONTAINS name2)
RETURN x, m, y, x1;

How do query expression joins depend on the order of keys?

In the documentation for query expressions, I found:
Note that the order of the keys around the = sign in a join expression is significant.
I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.
Can anyone either explain or point me to some better documentation?
This is important for joins. For example, if you look at the sample for leftOuterJoin:
query {
for student in db.Student do
leftOuterJoin selection in db.CourseSelection on
(student.StudentID = selection.StudentID) into result
for selection in result.DefaultIfEmpty() do
select (student, selection)
}
The order determines what happens when "missing" values occur. The key is this line in the docs:
If any group is empty, a group with a single default value is used instead.
With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This would mean that, in the above, if you switched the order, any students without a course selection would have no representation in the results, where the current order always shows every student.
The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:
static member Join<'TOuter, 'TInner, 'TKey, 'TResult> :
outer:IQueryable<'TOuter> *
inner:IEnumerable<'TInner> *
outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
innerKeySelector:Expression<Func<'TInner, 'TKey>> *
resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>
So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.
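To see why each side of the = must refer to only one of the two sources, it can help to write a join with separate key selectors by hand. The sketch below is Python, not F# or LINQ, and it only imitates the shape of the signature above (outer, inner, outer key selector, inner key selector, result selector); the function join and the sample data are hypothetical.

```python
from collections import defaultdict


def join(outer, inner, outer_key, inner_key, result):
    """Equi-join two sequences using one key selector per side."""
    # Index the inner sequence by its key; inner_key only ever sees inner items.
    index = defaultdict(list)
    for item in inner:
        index[inner_key(item)].append(item)
    # Probe the index with the outer key; outer_key only ever sees outer items.
    return [
        result(o, i)
        for o in outer
        for i in index.get(outer_key(o), [])
    ]


students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
selections = [
    {"student_id": 1, "course": "Math"},
    {"student_id": 1, "course": "Art"},
]

pairs = join(
    students,
    selections,
    lambda s: s["id"],              # left of '=': built from the outer item
    lambda sel: sel["student_id"],  # right of '=': built from the inner item
    lambda s, sel: (s["name"], sel["course"]),
)
print(pairs)  # [('Ann', 'Math'), ('Ann', 'Art')]
```

Because each selector receives a single item from its own side, a condition mixing both sides (such as an inequality between columns of the two tables) cannot be expressed in this shape.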
