Entity Relation notation in text - entity-relationship

Is there a standard (non-graphical) notation for Entity Relationships?
right now I'm using my own janky notation:
User >> Photo , (1-many)
User > Profile , (1-1 hasOne)
Profile < User , (1-1 belongsTo)
Photo << User , (many-1 belongsTo)
Photo <> Tag , (many-many)

Almost 10 years later and I've also had a hard time finding plaintext standards. Here's what I've found so far (fair warning though, it's mostly graphical standards that happen to work well in text).
First, the common term for describing the cardinality of a relationship between objects is "multiplicity".
This association relationship indicates that (at least) one of the two related classes make reference to the other. This relationship is usually described as "A has a B" (a mother cat has kittens, kittens have a mother cat).
Wikipedia
Though a considerable number of sources also use the term "cardinality".
There's a few good answers about the difference on this SO question about Multiplicity vs Cardinality. I found this one to be pretty succinct:
...a multiplicity is made up of a lower and an upper cardinality. A cardinality is how many elements are in a set. Thus, a multiplicity tells you the minimum and maximum allowed members of the set.
Jim L.
UML's Multiplicity Notation
UML's multiplicity notation works well in text.
+--------------+--------+-----------------------------------------+
| Multiplicity | Option | Cardinality |
+--------------+--------+-----------------------------------------+
| 0..0 | 0 | Collection must be empty |
| 0..1 | | No instances or one instance |
| 1..1 | 1 | Exactly one instance |
| 0..* | * | Zero or more instances |
| 1..* | | At least one instance |
| 5..5 | 5 | Exactly 5 instances |
| m..n | | At least m but no more than n instances |
+--------------+--------+-----------------------------------------+
There seem to be a few variations of this:
Microsoft's Relational Notation
+---------------------------------+---------------------+
| Multiplicity | Cardinality |
+---------------------------------+---------------------+
| * | One to zero or more |
| 1..* | One to one or more |
| 0..1 | One to zero or one |
| 1 | Exactly one |
| Two numbers separated by a dash | a range |
+---------------------------------+---------------------+
IBM's
+------+--------------------+-------------------------------+
| Rose | Software Architect | Description |
+------+--------------------+-------------------------------+
| n | * | Unlimited number of instances |
| 1 | 1 | Exactly 1 instance |
| 0..n | * | 0 or more instances |
| 1..n | 1,,* | 1 or more instances |
| 0..1 | 0..1 | 0 or 1 instances |
+------+--------------------+-------------------------------+
Smartdraw's Martin Style
Chen Style
From what I've read Chen style is the "original format". I commonly see this expressed in text as:
+----------+--------------+
| Notation | Description |
+----------+--------------+
| 1:1 | One to One |
| 1:N | One to Many |
| N:1 | Many to One |
| M:N | Many to Many |
+----------+--------------+
IDEF1X and Others
There's IDEF1x (a NIST standard):
IDEF1X is a method for designing relational databases with a syntax designed to support the semantic constructs necessary in developing a conceptual schema.
That seems to describe the Min-Max / ISO notation (the English link is currently broken but here's a German article) referenced by Wikipedia's Entity–relationship model article which also lists a few other styles of graphical notations, some of which are text-friendly.
The German language article on (min,max) notation also has a useful table comparing UML, Chen, (min,max) and MC (Modified Chen):
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+
| (min,max) [Entity 1] | [UML, Entity 1] | Chen-Notation | MC-Notation | [UML, Entity 2] | (min,max) [Entity 2] |
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+
| (0,1) | 0..1 | 1:1 | c:c | 0..1 | (0,1) |
| (0,N) | 0..1 | 1:N | c:mc | 0..* | (0,1) |
| (0,N) | 1..1 | 1:N + total participation | 1:mc | 0..* | (1,1) |
| (0,N) | 0..* | M:N | mc:mc | 0..* | (0,N) |
| (1,1) | 0..1 | total participation + 1:1 | c:1 | 1..1 | (0,1) |
| (1,N) | 0..1 | total participation + 1:N | c:m | 1..* | (0,1) |
| (1,1) | 1..1 | total part. + 1:1 + total part. | 1:1 | 1..1 | (1,1) |
| (1,N) | 1..1 | total part. + 1:N + total part. | 1:m | 1..* | (1,1) |
| (1,N) | 0..* | total participation + M:N | mc:m | 1..* | (0,N) |
| (1,N) | 1..* | total part. + M:N + total part. | m:m | 1..* | (1,N) |
+----------------------+-----------------+---------------------------------+-------------+-----------------+----------------------+

Why not use the same than in ER-Diagramms:
User 1-n Photos
User 1-1 Profile
Photo n-1 User
and so on. But I never heard of an official plaintext standart.

There is software available that transforms plain text descriptions into visual ER diagrams.
For instance erd uses the following notation:
Cardinality Syntax
0 or 1 ?
exactly 1 1
0 or more *
1 or more +
Examples:
Person *--1 `Birth Place`
Artist +--? PlatinumAlbums
Check also this list of similar tools.
However, none of these could be called a standard.

Related

how to create relationship using cypher

I have been learning neo4j/cypher for the last week. I have finally been able to upload two csv files and create a relationship,"captured". However, I am not fully confident in my understanding of the code as I was following the tutorial on the neo4j site. Could you please help me confirm what I did is correct.
I have two csv files, a "cap.csv" and a "survey.csv". The survey table contains data of each unique survey conducted at the survey sites. the cap table contains data of each unique organisms captured. In the cap table I have a foreign key, "survey_id", which in the Postgres db you would join to the p.key in the survey table.
I want to create a relationship, "captured", showing each unique organsism that was captured based on the "date" column in the survey table.
Survey table
| lake_id | date |survey_id | duration |
| -------- | -------------- | --| --
| 1 | 05/27/14 |1 | 7 |
| 2 | 03/28/13 | 2|10 |
| 2 | 06/29/19 | 3|23 |
| 3 | 08/21/21 | 4|54 |
| 1 | 07/23/18 | 5|23 |
| 2 | 07/22/23 | 6|12 |
Capture table
| cap_id | species |capture_life_stage | weight | survey_id |
| -------- | -------------- | --| -----|---|
| 1 | a |adult | 10 | 1|
| 2 | a | adult|10 | 2 |
| 3 | b | juv|23 | 3 |
| 4 | a | adult|54 | 4 |
| 5 | b | juv|23 | 5 |
| 6 | c | juv |12 | 6 |
LOAD CSV WITH HEADERS FROM 'file:///cap.csv' AS row
WITH
row.id as id,
row.species as species,
row.capture_life_stage as capture_life_stage,
toInteger(row.weight) as weight,
row.survey_id as survey_id
MATCH (c:cap {id: id})
MERGE (s) - [rel:captured {survey_id: survey_id}] ->(c)
return count(rel)
I am struggling to understand the code I wrote above. I followed the neo4j tutorial exactly but used my data (https://neo4j.com/developer/desktop-csv-import/).
I am fairly confident from data checks, but did the above code create the "captured" relationship showing each unique organism captured on that unique survey date? Based on the visual I can see I believe it did but I don't fully understand each step in the code.
What is the purpose of the MATCH (c:cap {id: id}) in the code?
The code below
MATCH (c:cap {id: id})
is the same as
MATCH (c:cap)
Where c.id = id
It is a shorter way of finding Captured node based on id and then you are creating a relationship with Survey node.
Question: s is not defined in your query. Where is it?

Filter out all of user's entries if one of them was selected

I am trying to write a formula that will allow you to put up to 3 values into a pool (per person) and then when one of those values is selected for use all 3 potential values are removed from the pool.
My current formula is long but fairly simple in that it takes the values from each user and adds/subtracts them based on whether they are being used.
=(SUMIF($I$2:$I$50, $A24, $L$2:$L$50) + SUMIF($J$2:$J$50, $A24, $L$2:$L$50) + SUMIF($K$2:$K$50, $A24, $L$2:$L$50)) - (SUMIF($A$2:$A$20, $A24, $L$2:$L$20) + SUMIF($B$2:$B$20, $A24, $L$2:$L$20) + SUMIF($C$2:$C$20, $A24, $L$2:$L$20) + SUMIF($D$2:$D$20, $A24, $L$2:$L$20) + SUMIF($E$2:$E$20, $A24, $L$2:$L$20))
The limiting factor here is that each user is adding a potential of 3 values to the pool but when one is selected it is only subtracting that from the pool and then leaving the other 2 which invalidates the pool as a whole.
For reference I have copied the contents over to the following sheet
https://docs.google.com/spreadsheets/d/1tbL1WuaUoR8JM8cndTBiKnoaLm-UxCaddpSOJbCVmiM/edit?usp=sharing
Does anyone have an idea for how to properly remove all available options from the pool rather than just the one that was selected? I have a few ideas from how to possibly make it work with code but I am trying to make it an automated formula rather than something that needs to be specifically "run" to calculate it.
You need some marker against each user to indicate whether their entries are still eligible for taking. Here is an example: column Eligible (L) is True if we haven't chosen from that user yet. Column Selected (N) is filled manually, by choosing from what is available in column Available (O). The formulas are:
For Eligible column (second row shown):
=and(isna(match(I2, N$2:N, 0)), isna(match(J2, N$2:N, 0)), isna(match(K2, N$2:N, 0)))
which says that neither I2, J2, or K2 match anything in N.
And in column O, only one array formula is needed:
={filter(I2:I, L2:L); filter(J2:J, L2:L); filter(K2:K, L2:L)}
which filters out non-eligible users and stacks the foods in one column using array notation {}.
+--------+---------+--------+------------+-----------+--+----------+------------+
| User | Food 1 | Food 2 | Food 3 | Eligible | | Selected | Available |
+--------+---------+--------+------------+-----------+--+----------+------------+
| User A | Chicken | Pear | Watermelon | TRUE | | Grape | Chicken |
| User B | Garlic | Grape | Rice | FALSE | | Beef | Potato |
| User C | Beef | Corn | Salt | FALSE | | | Pear |
| User D | Potato | Pepper | Orange | TRUE | | | Pepper |
| | | | | | | | Watermelon |
| | | | | | | | Orange |
+--------+---------+--------+------------+-----------+--+----------+------------+

Rails Many to many relationships with connecting or cloning two table with references?

I'm a new with Rails and I'm having trouble with some types of associations that seem a bit more complex than the ones I've been exposed to so far.
Zombie_users Body_parts_status Body_parts
| id | name | | id | user_id | body_part_id | recovery | | id | name |
|-----------| --> |----------------------------------------| --> |---------------|
| 1 | Joe | | 1 | 1 | 2 | 10% | | 1 | left leg |
| 2 | Max | | 2 | 1 | 3 | 43% | | 2 | brain |
| 3 | hair |
| 4 | blue eye |
Zobmie_users Recovery_tools Body_parts_impacts
| id | name | | id |user_id| name | | id|recovery_tool_id| body_part_id | impact |
|-----------|-->|-------------------|-->|--------------------------------------------|
| 1 | Joe | | 1 | 1 |hammer| | 1 | 1 | 2 | 10% |
| 2 | Max | | 2 | 1 |magic | | 2 | 2 | 3 | 43% |
graphic illustration of the needed functionality
We have users and a list of body parts.
I need that the users will be able to create recovery tools with which they can through Body Parts impact recover their body parts status :)
and be able to check what part of the body still need to be fixed(compared to the list) and what body parts they have already corrected.
My problem is that I do not know how to implement such connections.
because I need to have some kind of clone of the body parts to body parts status for each user.
But how I reference it so it also works with Body Parts impacts
I do not have even a concept of where to start :)
body parts table is just a long listing of all the parts of the human body
and each user should have their own "copy" of all these parts.

Neo4j - count very slow

I am running this query (bisac_code is uniquely indexed).
Execution time is more than 2.5 minutes.
52 main codes are selected from almost 4000 in total.
The total number of wokas is very large, 19 million nodes.
Are there any possibilities to make it run faster?
neo4j-sh (?)$ MATCH (b:Bisac)-[r:INCLUDED_IN]-(w:Woka)
> WHERE (b.bisac_code =~ '.*000000')
> RETURN b.bisac_code as bisac_code, count(w) as wokas_count
> ORDER BY b.bisac_code
> ;
+---------------------------+
| bisac_code | wokas_count |
+---------------------------+
| "ANT000000" | 13865 |
| "ARC000000" | 32905 |
| "ART000000" | 79600 |
| "BIB000000" | 2043 |
| "BIO000000" | 256082 |
| "BUS000000" | 226173 |
| "CGN000000" | 16424 |
| "CKB000000" | 26410 |
| "COM000000" | 44922 |
| "CRA000000" | 18720 |
| "DES000000" | 2713 |
| "DRA000000" | 62610 |
| "EDU000000" | 228182 |
| "FAM000000" | 42951 |
| "FIC000000" | 474004 |
| "FOR000000" | 41999 |
| "GAM000000" | 8803 |
| "GAR000000" | 37844 |
| "HEA000000" | 36939 |
| "HIS000000" | 3908869 |
| "HOM000000" | 5123 |
| "HUM000000" | 29270 |
| "JNF000000" | 40396 |
| "JUV000000" | 200144 |
| "LAN000000" | 89059 |
| "LAW000000" | 153138 |
| "LCO000000" | 1528237 |
| "LIT000000" | 89611 |
| "MAT000000" | 58134 |
| "MED000000" | 80268 |
| "MUS000000" | 75997 |
| "NAT000000" | 35991 |
| "NON000000" | 107513 |
| "OCC000000" | 42134 |
| "PER000000" | 26989 |
| "PET000000" | 4980 |
| "PHI000000" | 72069 |
| "PHO000000" | 8546 |
| "POE000000" | 104609 |
| "POL000000" | 309153 |
| "PSY000000" | 55710 |
| "REF000000" | 96477 |
| "REL000000" | 133619 |
| "SCI000000" | 86017 |
| "SEL000000" | 40901 |
| "SOC000000" | 292713 |
| "SPO000000" | 172284 |
| "STU000000" | 10508 |
| "TEC000000" | 77459 |
| "TRA000000" | 9093 |
| "TRU000000" | 12041 |
| "TRV000000" | 27706 |
+---------------------------+
52 rows
198310 ms
And the response time is not consistent.
After a while drops to less than half of a minute.
52 rows
31207 ms
In Neo4j 2.3 there will be index support for prefix LIKE searches but probably not for postfix ones.
There are two ways of making #user2194039's solution faster:
Use path expression to count the Woka per Bisac:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code
Mark the Bisac's with that pattern with a label
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000') SET b:Main;
MATCH (b:Main:Bisac)
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code;
The slow speed is caused by your regular expression pattern matching (=~ ). Although your bisac_code is indexed, the regex match causes the index to be ineffective. The index only works when you are matching full bisac_code values.
Cypher does include some string manipulation facilities that might let you get by without using a regex =~, but I doubt it would make any difference, because the index will still be useless.
I might suggest considering if you can further categorize your bisac_codes so that you do not need to do a pattern match. Maybe an extra indexed property that somehow denotes those codes that end in 000000?
If you do not want to add properties, you may try matching only the Bisacs first, and then including the Wokas. Something like this:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b
MATCH (b)-[r:INCLUDED_IN]-(w:Woka)
RETURN b.bisac_code as bisac_code, count(w) as wokas_count
ORDER BY b.bisac_code
This may help Cypher stick to the 4000 Bisac nodes while doing the pattern match, before getting involved with all 19 million Woka nodes, but I am not sure if this will make a material difference. Even slogging through 4000 nodes (effectively without an index) is a slow process.
Hash Tables in Database Indexing
The reason that your index is ineffective for regex pattern matching is that Neo4j likely uses a hash table for indexing properties. This is common of many databases. Wikipedia has an article here.
The basics though are that the index is not storing all of the properties that you want to search through. It is storing values that represent the properties you want to search through, and the representation is only valid for the whole property. If you are searching for only a part of the property value, the hashes stored in the index are useless, and the database must search through the properties the old-fashioned way -- one by one.
Edit re: your edit
The improvement in response time after running this query multiple times is certainly due to caching. Neo4j is remembering that you access the Bisac nodes and bisac_code properties frequently, and is keeping them in memory. This makes future queries faster because the values do not need to be read off disk.
However, eventually, those nodes a properties will likely be dropped from the cache, as Neo4j finds you manipulating different nodes, which it will cache instead. There are only so many nodes Neo4j can cache before running out of memory, so it picks the most recent and/or frequently used data.

Ideal solution for the following case scenario in database

There are 50 exams to be written by around millions of students online, One person may or may not write more than one exam. A person can also write a single exam more than one time ( retries ) ..
So which of the below solution is better for this case, I am okay with a better solution than these two as well
Option 1. Store each exam in a single table :
Subject 1
+----------------+---------+
| student id | Marks |
+----------------+---------+
| 1 | 85 |
| 2 | 32 |
| 2 | 60 |
+----------------+---------+
Subject 2
+----------------+---------+
| student id | Marks |
+----------------+---------+
| 1 | 85 |
| 2 | 32 |
| 2 | 60 |
+----------------+---------+
Like above with each table will have the student id only if that particular person has taken that exam , and have multiple occurrences of the student id if he has taken it more than once.
Option 2 :
+----------------+---------+---------+
| student id | Subject | Marks |
+----------------+---------+---------+
| 1 | Subj1 | 85 |
| 2 | Subj1 | 32 |
| 2 | Subj1 | 60 |
| 1 | Subj2 | 80 |
| 3 | Subj2 | 90 |
+----------------+---------+---------+
with all the values in a single table.
Which is better in terms of performance and storage perspective.
My various que
I think the best here is following:
Table STUDENT with information about students
Table EXAM with information about exams
Table EXAM_TRY with reference to STUDENT and EXAM tables, and fields DATE_OF_EXAM and RESULT_OF_EXAM
2 indexes on foreign keys in table EXAM_TRY
Depending on situation - index on date field (for example, you would need it for planning work for examiners)

Resources