In ER model a domain is a set of values for an attribute (i.e., the properties or characteristics of entities). The value set conforms to a common definition for the domain (e.g., type, format, syntax, meaning).
My question are: Is a domain like a check constraint in database?
A domain, as well as defining a set of allowed values, also defines the operation(s) that can be performed on those values. For instance you may define the logical domains
EMPLOYEE_NUMBERS = { n∈Z ∧ 0 <= n <= 99999 }
NUMBER_OF_DAYS = { n∈Z ∧ 0 <= n <= 99999 }
For the EMPLOYEE_NUMBERS domain the mathematical operations of addition and subtraction do not make sense, and so are not defined for the domain. However, for the NUMBER_OF_DAYS domain these mathematical operations do make sense and are defined.
When coming to implement the logical domain on a physical (SQL) DBMS a check constraint may be used, in addition to the basic data-type definition. This however, only limits the values allowed in a column, not the operations that can be performed. Therefore, you could define columns with check constraints:
EMP_NO INTEGER(5) CHECK (EMP_NO >= 0 AND EMP_NO <= 99999)
NO_OF_DAYS INTEGER(5) CHECK (NO_OF_DAYS >= 0 AND NO_OF_DAYS <= 99999)
Although this limits the values that can be stored in these columns in the same way it does not prevent you from summing all the EMP_NOs, although the answer obtained has no meaning. Whereas, a sum of NO_OF_DAYS may produce a legitimate result.
Related
Using Neo4j.
I would like to add a integer number to values already existing in properties of several relationships that I call this way:
MATCH x=(()-[y]->(s:SOL{PRB:"Taking time"})) SET y.points=+2
But it doesn't add anything, just replace by 2 the value I want to incremente.
To achieve this use
SET y.points = y.points + 2
From your original question it looks like you were trying to use the Addition Assignment operator which exists in lots of languages (e.g. python, type/javascript, C#, etc.). However, in cypher += is a little different and is designed to do this in a way which allows you to add or update properties to or on entire nodes or relationships based on a mapping.
If you had a parameter like the below (copy this into the neo4j browser to create a param).
:param someMapping: {a:1, b:2}
The query below would create a property b on the node with value 2, and set the value of property a on that node to 1.
MATCH (n:SomeLabel) WHERE n.a = 0
SET n+= $someMapping
RETURN n
I got a few - company, location and product details to store in a db.
sample data
company location product
------------------------------
abc hilltop alpha
abc hilltop beta
abc riverside alpha
abc riverside beta
buggy underbridge gama
buggy underbridge theta
buggy underbridge omega
The relationships are multi-valued, as I understand. And the data needs to be normalized as the MVD's are
not derived from a candidate key (company ->> location and company ->> product where company is not a candidate key)
or the union does not make the whole set (company U location < R and so with product).
But my colleague disagrees with me, who insists that for a relation to have multi-valued dependency at least four same values in company column should exist for each company. i.e
t1(company) = t2(company) = t3(company) = t4(company),
for company abc this is true. But for company "buggy", which does only one product in three locations, this is untrue.
For the formal definition and similar examples I refernced:
https://en.wikipedia.org/wiki/Multivalued_dependency
and Fourth_normal_form example also on wiki.
I know my colleague is being pedagogy, but I too started seeing the same question after reading the formal definition. (After all these are derived on mathematical basis.)
update: I am not asking how to normalize this data in to 4NF, I think I know that. (I need to break it in to two tables 1) company - location and 2) company - product.
which I have done already.
Can some one explain how this relation is still a MVD even though it does not satisfy the formal definition?
Detailed explanations are very much welcome.
"There exist" says some values exist, and they don't have to be different. EXISTS followed by some name(s) says that there exist(s) some value(s) referred to by the name(s), for which a condition holds. Multiple names can refer to the same value. (FOR ALL can be expressed in terms of EXISTS.)
The notion of MVD can be applied to both variables and values. In fact the form of the linked definition is that a MVD holds in the variable sense when it holds in the value sense "in any legal relation". To know that a particular value is legal, you need business knowledge. You can then show whether that value satisfies an MVD. But to show whether its variable satisfies the MVD you have to show that the MVD is satisfied "in any legal relation" value that the variable can hold. One valid value can tell you that a MVD doesn't hold in (it and) its variable, but it can't tell you that a MVD does hold in its variable. That requires more business knowledge.
You can show that this value violates 4NF by using that definition of MVD. The definition says that a relation variable satisfies a MVD when a certain condition holds "for any valid relation" value:
for all pairs of tuples t1 & t2 in r such that t1[a] = t2[a] there exist tuples t3 & t4 [...]
For what MVD and values for t1 & t2 does your colleague claim there doesn't exist values for t3 & t4? There is no such combination of MVD and values for t1 & t2. Eg for {company} ↠ {product} and t1 & t2 both (buggy, underbridge, gamma), we can take (company, underbridge, gamma) as a value for both t3 & t4, and so on for all other choices for t1 & t2.
Another definition for F ↠ T holding is that binary JD (join dependency) *{F U T, F U (A - T)} holds, ie that the relation is equal to the join of its projections on F U T & F U (A - T). This definition might be more immediately helpful to you & your colleague in that it avoids the terminology that you & they are misinterpreting. Eg your example data is the join of these two of its projections:
company location
--------------------
abc hilltop
abc riverside
buggy underbridge
company product
----------------
abc alpha
abc beta
buggy gamma
buggy theta
buggy omega
So it satisfies the JD *{{company, location}, {company, product}}, so it satisfies the MVDs {company} ↠ {location} and {company} ↠ {product} (among others). (Maybe you will be able to think of examples of relations with zero, one, two, three etc tuples for which one or more (trivial and/or non-trivial) MVDs hold.)
Of course, the two definitions are two different ways of describing the same condition.
PS 1 Whenever a FD F → T holds, the MVD F ↠ T holds. For a relation in BCNF, the MVDs that violate 4NF & 5NF are those not so associated with FDs.
PS 2 A relation variable is meant to hold a tuple if and only if it makes a true statement in business terms when its values are substituted into a given statement template, or predicate. That plus the JD definition for MVD gives conditions for a relation variable satisfying a MVD in business terms. Here our predicate is of the form ...company...location...product.... (Eg company namedcompanyis located atlocationand makes productproduct.) It happens that this MVD holds for a variable when for all valid business situations, FOR ALL company, location, product,
EXISTS product [...company...location...product...]
AND EXISTS location [...company...location...product...]
IMPLIES ...company...location...product...
In the documentation for query expressions, I found:
Note that the order of the keys around the = sign in a join expression is significant.
I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.
Can anyone either explain or point me to some better documentation?
This is important for joins. For example, if you look at the sample for leftOuterJoin:
query {
for student in db.Student do
leftOuterJoin selection in db.CourseSelection on
(student.StudentID = selection.StudentID) into result
for selection in result.DefaultIfEmpty() do
select (student, selection)
}
The order determines what happens when "missing" values occur. The key is this line in the docs:
If any group is empty, a group with a single default value is used instead.
With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This would mean that, in the above, if you switched the order, any students without a course selection would have no representation in the results, where the current order always shows every student.
The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:
static member Join<'TOuter, 'TInner, 'TKey, 'TResult> :
outer:IQueryable<'TOuter> *
inner:IEnumerable<'TInner> *
outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
innerKeySelector:Expression<Func<'TInner, 'TKey>> *
resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>
So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.
I have an an entity containing two optional to-many relationships (childA <<-> parent <->> childB). Each of these two child entities contain an optional string that I am interested in querying on.
Using the same format, I get the results I expect for one, but not the other. I understand that means I don't understand the tools I'm working with; and hoped for some insight. This is what the two queries look like:
childA.#count != 0 AND (0 == SUBQUERY(childA, $a, $a.string != NIL).#count)
childB.#count != 0 AND (0 == SUBQUERY(childB, $a, $a.string != NIL).#count)
I would expect to get back results from non-nil instances of both childA and childB only if each entity instances' string is also nil. My question is, why would one give the results that I expect; while the other does not?
Clarification:
I'm trying to solve the general problem where I'm searching for one of two things. I'm either searching for a default value in an attribute. When the attribute is optional, I'm additionally searching for a nil attribute. The problem is further compounded when optional relationships' should only be considered when the are populated. Without the relationship count != 0, I get back all parents with a nil relationship. In one case, this is the desired behavior. In another case, this appears to diminish the returned parent count (to 0 results).
For the optional attribute case, the query might look like:
parent.#count != 0 AND (parent.gender == -1) OR (parent.gender == NIL)
Where there are optional relationships in the key-paths, the query takes the form exemplified in the first example.
Again, I have gotten the results I have expected with all but one case, where there doesn't seem to be anything unique to it's relationships nor attribute characteristics. Or I should say, there's nothing unique about this exception in data model structure or query format...
Maybe you mixed up == and != in the second clause, and it should be
childA.#count != 0 AND (SUBQUERY(childA, $a, $a.string != NIL).#count != 0)
It would be clearer what you want to achieve if you could formulate the query in plain English first.
BTW, you can use expressionForSubquery:usingIteratorVariable:predicate: of class NSExpression to build the subquery for you. You might get useful error reporting more easily then.
I understand the problem now.
In my case, it's usually logical correct to first filter out NSSets with 0 count. But, in the problematic case, it's logically correct to return the results of both NSSets with 0 count, and NSSets with > 0 count where the attribute is nil (when optional), or the attribute is set to its default value. In other words, in the problematic case, the left condition needs to be removed, resulting in the following format:
(0 == SUBQUERY(childA, $a, $a.string != NIL).#count)
It seems I'll need to have the managed objects indicate which scenario is appropriate on a case by case basis...yuk!
For simplicity say we have a sample set of possible scores {0, 1, 2}. Is there a way to calculate a mean based on the number of scores without getting into hairy lookup tables etc for a 95% confidence interval calculation?
dreeves posted a solution to this here: How can I calculate a fair overall game score based on a variable number of matches?
Now say we have 2 scenarios ...
Scenario A) 2 votes of value 2 result in SE=0 resulting in the mean to be 2
Scenario B) 10000 votes of value 2 result in SE=0 resulting in the mean to be 2
I wanted Scenario A to be some value less than 2 because of the low number of votes, but it doesn't seem like this solution handles that (dreeve's equations hold when you don't have all values in your set equal to each other). Am I missing something or is there another algorithm I can use to calculate a better score.
The data available to me is:
n (number of votes)
sum (sum of votes)
{set of votes} (all vote values)
Thanks!
You could just give it a weighted score when ranking results, as opposed to just displaying the average vote so far, by multiplying with some function of the number of votes.
An example in C# (because that's what I happen to know best...) that could easily be translated into your language of choice:
double avgScore = Math.Round(sum / n);
double rank = avgScore * Math.Log(n);
Here I've used the logarithm of n as the weighting function - but it will only work well if the number of votes is neither too small or too large. Exactly how large is "optimal" depends on how much you want the number of votes to matter.
If you like the logarithmic approach, but base 10 doesn't really work with your vote counts, you could easily use another base. For example, to do it in base 3 instead:
double rank = avgScore * Math.Log(n, 3);
Which function you should use for weighing is probably best decided by the order of magnitude of the number of votes you expect to reach.
You could also use a custom weighting function by defining
double rank = avgScore * w(n);
where w(n) returns the weight value depending on the number of votes. You then define w(n) as you wish, for example like this:
double w(int n) {
// caution! ugly example code ahead...
// if you even want this approach, at least use a switch... :P
if (n > 100) {
return 10;
} else if (n > 50) {
return 8;
} else if (n > 40) {
return 6;
} else if (n > 20) {
return 3;
} else if (n > 10) {
return 2;
} else {
return 1;
}
}
If you want to use the idea in my other referenced answer (thanks!) of using a pessimistic lower bound on the average then I think some additional assumptions/parameters are going to need to be injected.
To make sure I understand: With 10000 votes, every single one of which is "2", you're very sure the true average is 2. With 2 votes, each a "2", you're very unsure -- maybe some 0's and 1's will come in and bring down the average. But how to quantify that, I think is your question.
Here's an idea: Everyone starts with some "baggage": a single phantom vote of "1". The person with 2 true "2" votes will then have an average of (1+2+2)/3 = 1.67 where the person with 10000 true "2" votes will have an average of 1.9997. That alone may satisfy your criteria. Or to add the pessimistic lower bound idea, the person with 2 votes would have a pessimistic average score of 1.333 and the person with 10k votes would be 1.99948.
(To be absolutely sure you'll never have the problem of zero standard error, use two different phantom votes. Or perhaps use as many phantom votes as there are possible vote values, one vote with each value.)