What is the Correct Way to Use Crow's Foot Notation for Relationships? - entity-relationship

This has bothered me for some time. What assumptions are to be made about cardinality when a relationship does not, in my view, use Crow's Foot notation completely? For example, here is a one-to-many relationship from Wikipedia:
I would have thought that this is incorrect; children must have a mother, so I would put two lines on the left side (mandatory and exactly one) and a one-to-many marker (a line and a crow's foot) on the right to indicate that a mother must have at least one child but could have many. I would have expected this:
My question is: what assumptions are to be made in a "shortcut" like this? I see it everywhere in cardinality examples. Is there a known assumption or rule about what leaving those ends blank means?

Both are correct.
The difference between them is that Wikipedia's example isn't Crow's Foot, but a variation called Barker Notation. It looks so similar because Richard Barker modelled it on Crow's Foot and intended it as a refinement.
(For some reason, they taught us Barker Notation at college as opposed to Crow's Foot)


NEAT: how does crossover occur for species with only one member

So, I'm trying to implement the NEAT (NeuroEvolution of Augmenting Topologies) algorithm and have stumbled into a problem: how are networks in species with only one member crossed over?
One solution I came up with is to perform inter-species crossover. But I don't know if it would be effective.
In NEAT, there are four ways in which you can create candidate individuals for the next generation:
Pass an exact copy of an individual
Pass a mutated copy of an individual
Do crossover using two individuals from a given species
Do crossover with two individuals of different species (inter-species)
Of course, you can always do (1). This is often applied to "elites", which may be the best of all, or the best of each species.
You can also always do (2), again to a subset of all individuals or to a subset (random or sorted) within each species.
As you correctly anticipate, (4) is also always a possibility, as long as you do have at least two species (it seems things would be a bit broken otherwise).
What about (3) in the case where you have a species with only one individual? You can't really do it, right?
There are two things that can help in this situation. First, use a mix of options (1) to (4). The frequency of each option is normally determined by hyperparameters (as is the frequency of each type of mutation, and so on).
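This mix-with-fallback idea can be sketched as follows. The option names, probabilities, and fallback order here are illustrative assumptions, not from any particular NEAT implementation:

```python
import random

# Hypothetical hyperparameters: the probability of each reproduction option.
REPRO_PROBS = {
    "clone": 0.10,         # (1) exact copy
    "mutate": 0.40,        # (2) mutated copy
    "crossover": 0.40,     # (3) within-species crossover
    "interspecies": 0.10,  # (4) cross-species crossover
}

def choose_reproduction(species, all_species):
    """Pick a reproduction option, falling back when a species is too small."""
    options, weights = zip(*REPRO_PROBS.items())
    choice = random.choices(options, weights=weights, k=1)[0]
    # A one-member species cannot do within-species crossover: fall back.
    if choice == "crossover" and len(species) < 2:
        choice = "interspecies"
    # Inter-species crossover needs at least two species: fall back again.
    if choice == "interspecies" and len(all_species) < 2:
        choice = "mutate"
    return choice
```

The two guard clauses are the point: a singleton species simply shifts its crossover budget onto the other options instead of failing.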
But here I would actually reconsider your speciation algorithm. Speciation means separating your population into groups, where hopefully more similar individuals are grouped together. There are different ways in which you can do this, and you can re-examine your species with different frequencies as well (you can reset your species every generation!). It does not seem very efficient if your clustering algorithm (because speciation is a type of clustering) is returning species with one or even zero individuals. So this is where I would actually work!
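A minimal sketch of NEAT-style speciation, to make the clustering point concrete: each genome is compared to a representative of each existing species, and a new species is opened only when nothing is within the compatibility threshold. The `distance` function and the threshold value are placeholders you would tune; singleton species are usually a sign the threshold is too tight:

```python
def speciate(genomes, distance, threshold=3.0):
    """Group genomes into species by compatibility distance to a representative.

    `distance` is a user-supplied compatibility function; `threshold` is the
    compatibility-threshold hyperparameter (value here is arbitrary).
    """
    species = []  # each entry: (representative, [members])
    for g in genomes:
        for rep, members in species:
            if distance(g, rep) < threshold:
                members.append(g)
                break
        else:
            species.append((g, [g]))  # no match: open a new species
    return [members for _, members in species]
```

With a toy one-dimensional "genome" and absolute difference as the distance, `speciate([0.0, 0.1, 5.0, 5.2, 10.0], lambda a, b: abs(a - b), threshold=1.0)` yields three species, and raising the threshold merges them.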
As a final note, a full NEAT implementation is no basic project. I would recommend not trying to implement it on your own. I think it is a better use of your time to work with a well-established implementation, so you can focus on understanding how things work and how to adapt them to your needs, rather than on bugs and other implementation details.

What does the numeral in the POS tag "WHNP-1" mean?

I am trying to understand an already annotated sentence: When this happens, the two KIMs show a magnetism that causes the first KIM to move toward the second KIM.
What does the number 1 in the tag WHADVP-1 for When signify?
Similarly, what does the number 1 in the tag WHNP-1 for that signify?
I think I understand POS tags well, after reading http://web.mit.edu/6.863/www/PennTreebankTags.html and notes by Andrew McIntyre.
They are indices for coreference resolution. I think this guide from the University of Tübingen, Germany puts it quite nicely.
4.1.2 Indexing
Indices are used only when they can be used to indicate a relationship
that would otherwise not be unambiguously retrievable from the
bracketing. Indices are used to express such relationships as
coreference (as in the case of controlled PRO or pragmatic coreference
for arbitrary PRO), binding (as in the case of wh-movement), or close
association (as in the case of it-extraposition). These relationships
are shown only when some type of null element is involved, and only
when the relationship is intrasentential. One null element may be
associated with another, as in the case of the null wh-operator.
Coreference relations between overt pronouns and their antecedents are
not annotated.
I work quite a lot with POS taggers and I usually just ignore the appended numbers, unless I am debugging a parse error and want to know why a sentence is tagged wrong. They can be very useful for training sequence-labelling algorithms such as MEMMs, CRFs, etc.
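Since the trailing numeral is a coindexing id rather than part of the category, ignoring or extracting it is just a matter of splitting the label. A small helper (my own, not from any particular toolkit) might look like this:

```python
def split_label(tag):
    """Split a Treebank constituent label like 'WHNP-1' into (category, index).

    The trailing numeral is a coindexing id, not part of the category:
    'WHNP-1' and 'WHNP-2' are both ordinary WHNP nodes.
    """
    base, sep, suffix = tag.rpartition("-")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return tag, None  # no numeric suffix: plain category
```

`split_label("WHNP-1")` gives `("WHNP", 1)`, while function tags survive intact: `split_label("NP-SBJ-2")` gives `("NP-SBJ", 2)`.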

Does an algorithm exist to identify different queries/questions in sentence?

I want to identify the different queries in a sentence.
For example, Who is Bill Gates and where he was born? or Who is Bill Gates, where he was born? contains two queries:
Who is Bill Gates?
Where Bill Gates was born
I worked on Coreference resolution, so I can identify that he points to Bill Gates so resolved sentence is "Who is Bill Gates, where Bill Gates was born"
Likewise:
MGandhi is good guys, Where he was born?
single query
who is MGandhi and where was he born?
2 queries
who is MGandhi, where he was born and died?
3 queries
India won world cup against Australia, when?
1 query (when did India win the World Cup against Australia)
I can perform coreference resolution, but I don't see how to distinguish the individual queries.
How can I do this?
I checked various sentence parsers, but as this is a pure NLP problem, a sentence parser does not identify it.
I tried to find "sentence disambiguation" along the lines of "word sense disambiguation", but nothing like that seems to exist.
Any help or suggestions would be much appreciated.
Natural language is full of exceptions. Especially in English, it is often said that there are more exceptions than rules. So, it is almost impossible to get a completely accurate solution that works every single time, but using a parser, you can achieve reasonably good performance.
I like to use the Berkeley parser for such tasks. Their online demo includes a graphical representation of the parse tree, which is extremely helpful when trying to formulate heuristics.
For example, consider the question "Who is Bill Gates and where was he born?". The parse tree looks like this:
Clearly, you can split the tree at the central conjunction (CC) node to extract the individual queries. In general, this will be easy if the parsed sentence is simple (where there will be only one query) or compound (where the individual queries can be split by looking at conjunction nodes, as above).
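The split-at-CC idea can be sketched with plain nested tuples standing in for the parser's tree (the labels and structure below are illustrative, not actual Berkeley parser output):

```python
def split_at_conjunction(tree):
    """Split a top-level coordinated clause at its CC node.

    `tree` is a (label, children) tuple; leaves are (tag, word) pairs.
    Returns the conjunct subtrees, or [tree] if there is no CC child.
    """
    label, children = tree
    if any(child[0] == "CC" for child in children):
        # Keep the conjuncts, drop the conjunction and separating commas.
        return [c for c in children if c[0] not in ("CC", ",")]
    return [tree]

# Toy parse of "Who is Bill Gates and where was he born?"
q1 = ("SQ", [("WP", "Who"), ("VBZ", "is"),
             ("NP", [("NNP", "Bill"), ("NNP", "Gates")])])
q2 = ("SBARQ", [("WRB", "where"), ("VBD", "was"),
                ("PRP", "he"), ("VBN", "born")])
parse = ("SBARQ", [q1, ("CC", "and"), q2])
```

Here `split_at_conjunction(parse)` returns the two query subtrees, and a tree without a top-level CC comes back unchanged.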
Another more complex example in your question has three queries, such as "Who is Gandhi and where did he work and live?". The parse tree:
Again, you can see the conjunction node which splits "Who is Gandhi" and "where did he work and live". The parse does not, however, split the second query into two, as you would ideally want. And that brings us to the hardest part of what you are trying to do: dealing (computationally, of course) with what is known as right node raising. This is a linguistic construct in which common parts are shared.
For example, consider the question "When and how did he suffer a setback?". What it really asks is (a) when did he suffer a setback?, and (b) how did he suffer a setback? Right-node raising issues cannot be solved by just parse trees. It is, in fact, one of the harder problems in computational linguistics, and belongs to the domain of hardcore academic research.
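For the narrow "When and how did ...?" pattern, a crude string heuristic can at least expand coordinated wh-words, though it is nowhere near a general solution to right node raising:

```python
import re

def expand_coordinated_wh(question):
    """Expand "When and how did X?" into one question per wh-word.

    A deliberately narrow heuristic: it only matches two coordinated
    wh-words followed by a "did ..." clause.
    """
    m = re.match(r"^(\w+) and (\w+) (did .+)$", question, re.IGNORECASE)
    if not m:
        return [question]  # pattern not recognised: leave untouched
    wh1, wh2, rest = m.groups()
    return [f"{wh1} {rest}", f"{wh2} {rest}"]
```

So "When and how did he suffer a setback?" expands to "When did he suffer a setback?" and "how did he suffer a setback?", while anything outside the pattern passes through unchanged.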

ER diagrams: can two different notations be used in same diagram?

Within one ER diagram, is it possible to use the 1...* type of notation as well as the arrow notation to show cardinality constraints, or does it have to be one or the other?
CASE tools such as CA ERwin or IBM Data Architect often allow displaying both the relationship type (as IE crow's-foot symbols) and a textual description of the cardinality.
If a relationship is potentially one-to-many, it can be described with a broken crow's-foot symbol and a cardinality description of zero, one, or M.

What are recommended patterns to localize a dynamically built phrase?

Given a phrase that is dynamically constructed with portions present or removed based on parameters, what are some possible solutions for supporting localization? For example, consider the following two phrases with bold parts that represent dynamically inserted portions:
The dog is spotted, has a doghouse and is chasing a ball.
The dog is white, and is running in circles.
For English, this can be solved by simply concatenating the phrase portions or perhaps having a few token-filled strings in a resource file that can be selected based on parameters. But these solutions won't work or get ugly quickly once you need to localize for other languages or have more parameters. In the example above, assuming that the dog appearance is the only portion always present, a localized resource implementation might consist of the following resource strings:
AppearanceOnly: The dog is %appearance%.
ActivityOnly: The dog is %appearance% and is %activity%.
AccessoryOnly: The dog is %appearance% and has %accessory%.
AccessoryActivity: The dog is %appearance%, has %accessory% and is %activity%.
While this works, the number of required strings grows exponentially with the number of parameters.
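The growth is easy to see by enumerating the keys: every subset of the optional parts needs its own template, so k optional parts require 2**k strings. A small sketch (the key-naming convention just mirrors the example names above):

```python
from itertools import combinations

def key_name(combo):
    """Name a resource key the way the examples above do."""
    if not combo:
        return "AppearanceOnly"
    if len(combo) == 1:
        return combo[0] + "Only"
    return "".join(combo)

def resource_keys(optional_parts):
    """Every subset of optional parts needs its own template: 2**k keys."""
    return [key_name(c)
            for r in range(len(optional_parts) + 1)
            for c in combinations(optional_parts, r)]
```

Two optional parts already yield the four templates listed above; three would yield eight, and so on.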
I have been searching far and wide for best practices that might help with this challenge. The only solution I have found is to simply reword the phrase, but then you lose the natural sentence structure, which I really don't want to do:
Dog: spotted, doghouse, chasing ball
Suggestions, links, thoughts, examples, or "You're crazy, just reword it!" feedback is welcome :) Thanks!
The best approach is probably to divide the sentence into separate sentences, like “The dog is spotted. The dog has a doghouse. The dog is chasing a ball.” This may look boring, but if you replaced all occurrences of “the dog” except the first one with a pronoun, you would have a serious pronoun problem. In many languages, the pronoun to be used would depend on the noun it refers to. (Even in English, it is not quite clear whether a dog is he, she, or it.)
The reason for separation is that different languages have different verb systems. For example, in Russian, you cannot really combine the three sentences into one sentence that has three verbs sharing a subject. (In Russian, you don’t use the verb “to be” in present tense – instead, you would just say the equivalent of “Dog – spotted”, and there is no verb corresponding to “to have” – instead, you use the equivalent of “at dog doghouse”. Finnish is similar with respect to “to have”. Such issues are sometimes handled, in “forced” localizations, by using a word that corresponds to “to possess” or “to own”, but the result is odd-looking, to put it mildly.)
Moreover, languages have different natural orders for subject, verb, and object. Your initial approach implicitly postulates an SVO order. You should not assume that the normal, unmarked word order always starts with the subject. Instead of using sentence patterns like “%subject% %copula% %appearance%” (where %copula% is “is”, “are”, or “am” in English), you would need to call a function with two parameters, subject and appearance, returning a sentence that has a language-dependent copula, or no copula, and a word order determined by the rules of the language. Yes, it gets complicated; localization of generated statements gets rather complicated as soon as you deal with anything but structurally very similar languages.
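The function-per-language idea can be sketched like this. The language rules below are toy illustrations (and use English words even for the Russian pattern); a real implementation would draw on proper per-locale grammar data:

```python
def appearance_sentence(lang, subject, appearance):
    """Build a '<subject> is <appearance>' sentence per language's rules."""
    if lang == "en":
        return f"The {subject} is {appearance}."  # SVO with an overt copula
    if lang == "ru":
        # Russian drops the copula in the present tense: "Dog - spotted".
        return f"{subject} {appearance}."
    raise ValueError(f"no sentence builder for language {lang!r}")
```

The caller passes only the content words; word order, articles, and the copula (or its absence) live entirely inside each language's builder.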
