SQWRL Query to Select Difference of Grouped Sets - ontology

Let's say that I have this information:
Individual | Sex | HairColor
---------------------------------------
Joseph | Male | Black
Peter | Male | Black
Kevin | Male | Blonde
Andrew | Male | Brown
Boris | Male | Brown
Chistine | Female | Black
Julia | Female | Black
Julieth | Female | Brown
Judith | Female | Brown
Mary | Female | Blonde
My individuals are all different. Each one has the class Male or Female asserted, and each one also has the property hasHairColor asserted with its value.
The question is, how can I query all the males with hair color different from black (the ontology may have many other hair colors)?
So far, I have tried these queries, with faulty results:
1. Male(?x) ^ Male(?y) ^ hasHairColor(?y, "Black") ^ differentFrom(?x, ?y) -> sqwrl:select(?x)
2. Male(?x) ^ Male(?y) ^ hasHairColor(?y, "Black") . sqwrl:makeSet(?males, ?x) ^ sqwrl:groupBy(?males, ?x) ^ sqwrl:makeSet(?blacks, ?y) ^ sqwrl:groupBy(?blacks, ?y) . sqwrl:notEqual(?males, ?blacks) -> sqwrl:select(?x)
3. Male(?x) ^ Male(?y) ^ hasHairColor(?y, "Black") . sqwrl:makeSet(?males, ?x) ^ sqwrl:groupBy(?males, ?x) ^ sqwrl:makeSet(?blacks, ?y) ^ sqwrl:groupBy(?blacks, ?y) . sqwrl:difference(?diff, ?males, ?blacks) -> sqwrl:select(?x)
I'm missing something in the way the joins are made. The result is correct only when one of the sets has a single element (e.g. if I try to remove blondes).
I'm using Protege 5.2 with the SWRL and SQWRL Tab 2.0.5.
Thanks in advance

The issue is still pending, but the workaround was to use SPARQL to achieve what I wanted.
If you need more information about the open issue, you can find it here:
https://github.com/protegeproject/swrlapi/issues/43
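
For reference, the result the three queries above are aiming for is a plain set difference: all males minus the males whose hair is black. The following is only a minimal Python sketch of that intended semantics, not SQWRL or SPARQL; the data is copied from the table in the question, and the function name `males_without_hair_color` is a hypothetical helper introduced for illustration.

```python
# Data copied from the table in the question: (individual, sex, hair color).
people = [
    ("Joseph", "Male", "Black"),
    ("Peter", "Male", "Black"),
    ("Kevin", "Male", "Blonde"),
    ("Andrew", "Male", "Brown"),
    ("Boris", "Male", "Brown"),
    ("Chistine", "Female", "Black"),
    ("Julia", "Female", "Black"),
    ("Julieth", "Female", "Brown"),
    ("Judith", "Female", "Brown"),
    ("Mary", "Female", "Blonde"),
]


def males_without_hair_color(rows, excluded):
    """Return the set of males whose hair color is not `excluded`.

    Hypothetical helper: builds the two sets the queries try to build
    (all males, and males with the excluded color) and subtracts them.
    """
    males = {name for name, sex, _ in rows if sex == "Male"}
    excluded_males = {
        name for name, sex, hair in rows if sex == "Male" and hair == excluded
    }
    return males - excluded_males


result = males_without_hair_color(people, "Black")
print(sorted(result))
```

Whatever query language is used, this is the set the query should return for the sample data.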

Fill in blank cells from several columns in ={ARRAYFORMULA()}

I have a human-friendly sheet with sparse hierarchical data:
SEASON | FRUIT | LETTER
-----------------------------
Winter | |
| Lemons |
| | Delta
Summer | |
| | Alpha
| | Beta
| Pears |
| | Gamma
(Note how Alpha and Beta don't have a FRUIT entry.)
I want to generate a new column, using ARRAYFORMULA(), to contain a full "path" to the LETTER:
SEASON | FRUIT | LETTER | PATH
------------------------------------
Winter | | | Winter//
| Lemons | | Winter/Lemons/
| | Delta | Winter/Lemons/Delta
Summer | | | Summer//
| | Alpha | Summer//Alpha
| | Beta | Summer//Beta
| Pears | | Summer/Pears/
| | Gamma | Summer/Pears/Gamma
Please help me understand how to write such an ARRAYFORMULA().
I'm trying an approach based on the answers in Fill in blank cells in ={ARRAYFORMULA()}, but I'm stuck at resetting FRUIT to an empty string for a new SEASON. I.e. this naïve implementation would yield Summer/Lemons/Alpha instead of Summer//Alpha:
={ ARRAYFORMULA(
IFERROR(VLOOKUP(ROW(SEASON), IF(SEASON<>"", { ROW(SEASON), SEASON }), 2, 1), "")
& "/" & IFERROR(VLOOKUP(ROW(FRUIT), IF(FRUIT<>"", { ROW(FRUIT), FRUIT }), 2, 1), "")
& "/" & LETTER
) }
Here is a sample spreadsheet created specifically to answer this question.
You will find this formula in cell E1 on a tab called Possible Solution.
=ARRAYFORMULA(IF(LEN(A:A&B:B&C:C),VLOOKUP(ROW(A:A),FILTER({ROW(A:A),A:A},LEN(A:A)),2,1)&"/"&VLOOKUP(ROW(B:B),FILTER({ROW(B:B),B:B},LEN(A:A&B:B)),2,1)&"/"&C:C,))
It uses the VLOOKUP(ROW(),FILTER(),[index],TRUE) technique to append the relevant parts of the path to one another.
Note the portion of the formula highlighted in the image, which I believe was the crux of the trouble with the strategy you were trying.
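
The fill-down behaviour the question asks for can also be written out procedurally. The sketch below is plain Python rather than a Sheets formula, and `build_paths` is a hypothetical name; it only illustrates the rule that a new SEASON must reset the remembered FRUIT, using the sample rows from the question.

```python
# Sample rows from the question: (SEASON, FRUIT, LETTER), blanks as "".
rows = [
    ("Winter", "", ""),
    ("", "Lemons", ""),
    ("", "", "Delta"),
    ("Summer", "", ""),
    ("", "", "Alpha"),
    ("", "", "Beta"),
    ("", "Pears", ""),
    ("", "", "Gamma"),
]


def build_paths(rows):
    """Fill SEASON and FRUIT downwards and join them with LETTER."""
    paths = []
    season = ""
    fruit = ""
    for row_season, row_fruit, letter in rows:
        if row_season:
            season = row_season
            fruit = ""  # a new season forgets the previous fruit
        if row_fruit:
            fruit = row_fruit
        paths.append(f"{season}/{fruit}/{letter}")
    return paths


paths = build_paths(rows)
for path in paths:
    print(path)
```

Without the reset line, the fifth row would come out as Summer/Lemons/Alpha, which is the problem described in the question.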

Which Starspace training mode to use for multi-level embeddings

I am using the StarSpace embedding framework for the first time and am unclear on the "modes" that it provides for training and the differences between them.
The options are:
wordspace
sentencespace
articlespace
tagspace
docspace
pagespace
entityrelationspace/graphspace
Let's say I have a dataset that looks like this:
| Author | City | Tweet_ID | Tweet_contents |
|:-------|:-------|:----------|:-----------------------------------|
| A | NYC | 1 | "This is usually a short sentence" |
| A | LONDON | 2 | "Another short sentence" |
| B | PARIS | 3 | "Check out this cool track" |
| B | BERLIN | 4 | "I like turtles" |
| C | PARIS | 5 | "It was a dark and stormy night" |
| ... | ... | ... | ... |
(In reality, my dataset is not language data and looks nothing like this, but this example demonstrates the point well enough.)
I would like to simultaneously create embeddings from scratch (not using pre-existing embeddings at any point) for each of the following:
Authors
Cities
Tweets/Sentences/Documents (e.g. 1, 2, 3, 4, 5, etc.)
Words (e.g. 'This', 'is', 'usually', ..., 'stormy', 'night', etc.)
Even after reading the documentation, it isn't clear to me which 'mode' of StarSpace training I should be using.
If anyone could help me understand how to interpret the modes to help select the appropriate one, that would be much appreciated.
I would also like to know if there are conditions under which the embeddings generated using one of the modes above would in some way be equivalent to the embeddings built using a different mode (ignoring the fact that the embeddings would differ because of the non-deterministic nature of the process).
Thank you

Can I order a pivot table using a second condition?

I am working with a spreadsheet where I store the books I read. The format is as follows:
A | B | C | D | E | F
year | book | author | mark | language | country of the author
With entries like:
A | B | C | D | E | F
-------------------------------------------------------------
2004 | Hamlet | Shakespeare | 8 | ES | UK
2005 | Crimen y punishment | Dostoevsky | 9 | CAT | Russia
2007 | El mundo es ansí | Baroja | 8 | ES | Spain
2011 | Dersu Uzala | Arsenyev | 8 | EN | Russia
2015 | Brothers Karamazov | Dostoevsky | 8 | ES | Russia
2019 | ... Shanti Andía | Baroja | 7 | ES | Spain
I have several pivot tables to get different data, such as top countries, top books, etc. In one of them I want to group by author and order by the number of books I have read from each one.
So I defined:
ROWS
author (column C) with
order: Desc for COUNT of author
VALUES
author
summation by: COUNT
show as Default
mark
summation by: AVERAGE
show as Default
This way, the data above show like this:
author | COUNT of author | AVERAGE of mark
-------------------------------------------------------------
Baroja | 2 | 7,5
Dostoevsky | 2 | 8,5
Shakespeare | 1 | 8
Arsenyev | 1 | 8
This is fine, since it puts the most-read authors on top. However, I would also like to order by AVERAGE of mark. That way, when COUNT of author is tied, AVERAGE of mark would break the tie and put the author with the better average on top.
On my sample data, Dostoevsky would go above Baroja (8,5 > 7,5).
I have been looking for different options, but I could not find any without including an extra column in the pivot table.
How can I use a second option to solve the ties when the first option gives the same value?
You can achieve a customized sort order on a pivot table without any extra columns in the source range. However, you'd definitely need an extra field added to the pivot.
In the Pivot table editor go to Values and add a Calculated Field.
Use any formula that describes the sort order you want. E.g. let's multiply the counter by 100 to use as first criteria:
=COUNTA(author) * 100 + AVERAGE(mark)
Note that it is important to set Summarize by to Custom for this calculated field.
Now, just add this new calculated field as your row's Sort by field, and you're done!
Notice though, you do get an extra column added to the pivot.
Of course, you could hide it.
Translated from my answer to the cross-posted question on es.SO.
try:
=QUERY(A2:F,
"select C,count(C),avg(D)
where A is not null
group by C
order by count(C) desc, avg(D) desc
label C'author',count(C)'COUNT of author',avg(D)'AVERAGE of mark'")
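
The tie-breaking order the question asks for (count descending, then average mark descending) can be checked outside the spreadsheet. The following is a minimal Python sketch using the sample rows from the question; `author_stats` and `composite_key` are hypothetical names. It also shows that the composite key `count * 100 + average` gives the same order here, which relies on averages staying below 100.

```python
from collections import defaultdict

# (year, author, mark) taken from the sample entries in the question.
books = [
    (2004, "Shakespeare", 8),
    (2005, "Dostoevsky", 9),
    (2007, "Baroja", 8),
    (2011, "Arsenyev", 8),
    (2015, "Dostoevsky", 8),
    (2019, "Baroja", 7),
]


def author_stats(rows):
    """Return (author, count, average mark) sorted by count, then average."""
    marks = defaultdict(list)
    for _year, author, mark in rows:
        marks[author].append(mark)
    stats = [(a, len(m), sum(m) / len(m)) for a, m in marks.items()]
    # Sort on a tuple key: count descending, then average descending.
    return sorted(stats, key=lambda s: (-s[1], -s[2]))


def composite_key(count, average):
    """Single-number key from the calculated-field answer (assumes average < 100)."""
    return count * 100 + average


stats = author_stats(books)
for author, count, average in stats:
    print(author, count, average, composite_key(count, average))
```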

Relational Algebra Stanford Lagunitas Online Course Quiz

Problem: Compute the natural join of R and S. Which of the following tuples is in the result? Assume each tuple has schema (A,B,C,D).
Relation R
| A | C |
|---|---|
| 3 | 3 |
| 6 | 4 |
| 2 | 3 |
| 3 | 5 |
| 7 | 1 |
Relation S
| B | C | D |
|---|---|---|
| 5 | 1 | 6 |
| 1 | 5 | 8 |
| 4 | 3 | 9 |
I'm not quite sure what it means by "assume each tuple has a schema of A,B,C,D". Does this mean the R relation has a scheme of ABCD although it only lists A and C? I should assume there's also B and D but columns B and D are blank?
Operating under that assumption, I got the answer wrong. The explanation says there's no (7,5) in R which there clearly is under column A. Could someone explain to me what I'm doing wrong or if I'm missing something? Thank you!
The answer feedback is misleading and wrong. It is the feedback you would get if you had chosen (7,1,5,8).
Your answer is right.
For thoroughness: in a natural join you connect tuples on common attributes, in this case C is the attribute in common.
Your return tuples are:
R S
A,C B,C,D A,B,C,D
(7,1) & (5,1,6) = (7,5,1,6)
(3,5) & (1,5,8) = (3,1,5,8)
(2,3) & (4,3,9) = (2,4,3,9)
(3,3) & (4,3,9) = (3,4,3,9) --Your answer, correct
I even found a Stanford doc defining a natural join, just in case they lived in a different universe than the rest of us, but they don't. It's just a bug in the quiz.
The question doesn't say R has that scheme. It says the natural join of R & S has that scheme.
(There are many variations on what a relation is, what relational operators are available, how they work & what their symbols are. They are telling you to expect that the schema for the join of those two relations has columns A, B, C & D. You should already know that from the definitions in the course, but since they give it nobody should get that part wrong.)
You seem to be saying that your choice of a row in the natural join was 2. That's correct. The explanation says that a wrong choice can't be right because tuple (7,5) is not in R. They do not mean that (7,5) is a list of values "under column A". But that feedback is for choice 3, not choice 2. So the answer checking seems to have a bug. Let them know.
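
To double-check the tuples listed in the answers, the natural join can be computed mechanically. The following is a minimal Python sketch, with relations represented as lists of dictionaries and a hypothetical `natural_join` helper; it joins on whatever attribute names the two relations share, which here is only C.

```python
# Relations R(A, C) and S(B, C, D) from the quiz.
R = [
    {"A": 3, "C": 3},
    {"A": 6, "C": 4},
    {"A": 2, "C": 3},
    {"A": 3, "C": 5},
    {"A": 7, "C": 1},
]
S = [
    {"B": 5, "C": 1, "D": 6},
    {"B": 1, "C": 5, "D": 8},
    {"B": 4, "C": 3, "D": 9},
]


def natural_join(left, right):
    """Pair every two tuples that agree on all shared attribute names."""
    result = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[k] == r[k] for k in common):
                result.append({**l, **r})
    return result


joined = natural_join(R, S)
# Present each result tuple in the schema order (A, B, C, D).
tuples = sorted((t["A"], t["B"], t["C"], t["D"]) for t in joined)
print(tuples)
```

The tuple with A = 6 has no partner because no tuple in S has C = 4, so the result has four tuples.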

Entity Relation notation in text

Is there a standard (non-graphical) notation for Entity Relationships?
Right now I'm using my own janky notation:
User >> Photo , (1-many)
User > Profile , (1-1 hasOne)
Profile < User , (1-1 belongsTo)
Photo << User , (many-1 belongsTo)
Photo <> Tag , (many-many)
Almost 10 years later and I've also had a hard time finding plaintext standards. Here's what I've found so far (fair warning though, it's mostly graphical standards that happen to work well in text).
First, the common term for describing the cardinality of a relationship between objects is "multiplicity".
This association relationship indicates that (at least) one of the two related classes make reference to the other. This relationship is usually described as "A has a B" (a mother cat has kittens, kittens have a mother cat).
Wikipedia
Though a considerable number of sources also use the term "cardinality".
There are a few good answers about the difference on this SO question about Multiplicity vs Cardinality. I found this one to be pretty succinct:
...a multiplicity is made up of a lower and an upper cardinality. A cardinality is how many elements are in a set. Thus, a multiplicity tells you the minimum and maximum allowed members of the set.
Jim L.
UML's Multiplicity Notation
UML's multiplicity notation works well in text.
+--------------+--------+-----------------------------------------+
| Multiplicity | Option | Cardinality |
+--------------+--------+-----------------------------------------+
| 0..0 | 0 | Collection must be empty |
| 0..1 | | No instances or one instance |
| 1..1 | 1 | Exactly one instance |
| 0..* | * | Zero or more instances |
| 1..* | | At least one instance |
| 5..5 | 5 | Exactly 5 instances |
| m..n | | At least m but no more than n instances |
+--------------+--------+-----------------------------------------+
There seem to be a few variations of this:
Microsoft's Relational Notation
+---------------------------------+---------------------+
| Multiplicity | Cardinality |
+---------------------------------+---------------------+
| * | One to zero or more |
| 1..* | One to one or more |
| 0..1 | One to zero or one |
| 1 | Exactly one |
| Two numbers separated by a dash | a range |
+---------------------------------+---------------------+
IBM's
+------+--------------------+-------------------------------+
| Rose | Software Architect | Description |
+------+--------------------+-------------------------------+
| n | * | Unlimited number of instances |
| 1 | 1 | Exactly 1 instance |
| 0..n | * | 0 or more instances |
| 1..n | 1..* | 1 or more instances |
| 0..1 | 0..1 | 0 or 1 instances |
+------+--------------------+-------------------------------+
Smartdraw's Martin Style
Chen Style
From what I've read Chen style is the "original format". I commonly see this expressed in text as:
+----------+--------------+
| Notation | Description |
+----------+--------------+
| 1:1 | One to One |
| 1:N | One to Many |
| N:1 | Many to One |
| M:N | Many to Many |
+----------+--------------+
IDEF1X and Others
There's IDEF1x (a NIST standard):
IDEF1X is a method for designing relational databases with a syntax designed to support the semantic constructs necessary in developing a conceptual schema.
That seems to describe the Min-Max / ISO notation (the English link is currently broken but here's a German article) referenced by Wikipedia's Entity–relationship model article which also lists a few other styles of graphical notations, some of which are text-friendly.
The German language article on (min,max) notation also has a useful table comparing UML, Chen, (min,max) and MC (Modified Chen):
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+
| (min,max) [Entity 1] | [UML, Entity 1] | Chen-Notation | MC-Notation | [UML, Entity 2] | (min,max) [Entity 2] |
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+
| (0,1) | 0..1 | 1:1 | c:c | 0..1 | (0,1) |
| (0,N) | 0..1 | 1:N | c:mc | 0..* | (0,1) |
| (0,N) | 1..1 | 1:N + total participation | 1:mc | 0..* | (1,1) |
| (0,N) | 0..* | M:N | mc:mc | 0..* | (0,N) |
| (1,1) | 0..1 | total participation + 1:1 | c:1 | 1..1 | (0,1) |
| (1,N) | 0..1 | total participation + 1:N | c:m | 1..* | (0,1) |
| (1,1) | 1..1 | total part. + 1:1 + total part. | 1:1 | 1..1 | (1,1) |
| (1,N) | 1..1 | total part. + 1:N + total part. | 1:m | 1..* | (1,1) |
| (1,N) | 0..* | total participation + M:N | mc:m | 1..* | (0,N) |
| (1,N) | 1..* | total part. + M:N + total part. | m:m | 1..* | (1,N) |
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+
Why not use the same notation as in ER diagrams:
User 1-n Photos
User 1-1 Profile
Photo n-1 User
and so on. But I have never heard of an official plaintext standard.
There is software available that transforms plain text descriptions into visual ER diagrams.
For instance erd uses the following notation:
Cardinality | Syntax
----------------------
0 or 1      | ?
exactly 1   | 1
0 or more   | *
1 or more   | +
Examples:
Person *--1 `Birth Place`
Artist +--? PlatinumAlbums
Check also this list of similar tools.
However, none of these could be called a standard.
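
Since the erd symbols and the UML multiplicities in the tables above describe the same four cases, converting between them is mechanical. The following is only a rough Python sketch: the mapping `ERD_TO_UML` and the function `parse_relationship` are hypothetical names, and the regular expression covers just the shape of the two example lines, not the full syntax of the erd tool.

```python
import re

# erd cardinality symbol -> UML multiplicity, per the tables in this thread.
ERD_TO_UML = {"?": "0..1", "1": "1..1", "*": "0..*", "+": "1..*"}

# An entity name is either a backtick-quoted name or a single bare word.
RELATIONSHIP = re.compile(
    r"^\s*(`[^`]+`|\S+)\s+([?1*+])--([?1*+])\s+(`[^`]+`|\S+)\s*$"
)


def parse_relationship(line):
    """Parse a line like 'Person *--1 `Birth Place`' into a 4-tuple."""
    match = RELATIONSHIP.match(line)
    if match is None:
        raise ValueError(f"not a relationship line: {line!r}")
    left, left_card, right_card, right = match.groups()
    return (
        left.strip("`"),
        ERD_TO_UML[left_card],
        ERD_TO_UML[right_card],
        right.strip("`"),
    )


print(parse_relationship("Person *--1 `Birth Place`"))
print(parse_relationship("Artist +--? PlatinumAlbums"))
```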
