Select rows with "one of each" in relational algebra

Select rows with "one of each" in relational algebra - join

Say I have a Personstable with attributes {name, pet}. How do I select the names of people where they have one of each kind of pet (dog, cat, bird), but a person only has one of each kind of pet if they pet is in the table.
Example: Bob, Dog and Bob, Cat are the only rows in the table. Therefore, Bob has one of each kind of pet. But the moment Lynda, Bird are added, Bob doesn't have one of each type of pet anymore.
I think the first step to this is to π(pet). You get a list of all kinds of pets since relational algebra removes duplicates. Not sure what to do after this, but I have think I need to join π(pet) and Persons.
I've tried a few things like Natural Join and Cross products but I haven't arrived at a result yet and I'm out of ideas.

The answer to the question can be found with the Division operator:
Persons ÷ πpet(Persons)
This relational algebra expression returns a relation with only the column name, containing all the names of the persons that have all the different kind of pets currently present in the Persons table itself.
The division is an operator that, in some sense, is the inverse of the product operator (the name is derived exactly from this fact). It is a derived operator that can be defined in terms of projection, set difference and product (see for instance this answer).

Related

Relational Algebra Join - necessary to rename?

Let's assume i have some 2 simple tables:
IMPORTANT: This is about relational algebra, not SQL.
Band table:
band_name founded
Gambo 1975
John. 1342
Album table:
album_name band_name
Celsius. Gambo
Trambo Gambo
Now, since the Band and the Album table share the same column name "band_name", would it be necessary to rename it when i would join them?
As far as i know, the join eliminates the duplicate entry that is shared amongst the join. This example, where i simply pick all Bands that are existing in the Album table (obviously just 'Gambo' in this giving example)
Πfounded, band_name(Band ⋈ Album)
should therefore work fine, right? Can somebody confirm?

(Have to enter a caveat that there are many variants of Relational Algebra; that they differ in semantics; and they differ in syntax. Assuming you intend a variant similar to that in wikipedia ...)
Yes that expression should work fine. The natural join operator ⋈ matches same-named attributes between its two operands. So the subexpression Band ⋈ Album produces a result with attributes {band_name, founded, album_name}. Your expression projects two of those.
Note the attributes for a relation value are a set not a sequence; therefore any operation over relation operands with same-named attributes must match attributes.
In contrast, Cartesian Product × requires its operands to have disjoint attribute names. Then Band × Album is ill-formed and would be rejected. (So you'd need to Rename band_name in one of them, to get relations that could be operands.)
I'm not all that happy with your way of putting it "the join eliminates the duplicate entry that is shared amongst the join." Because only in SQL do you get a duplicate (from SELECT * FROM Band, Album ... -- which results in a table with four columns, of which two are named band_name). SQL FROM list of tables is a botch-up: neither join nor Cartesian Product, but something trying to be both, and succeeding only in being neither. RA's ⋈ never produces a "duplicate" so never does it "eliminate" anything.
Particularly if there's Keys declared and a Foreign Key constraint (from Album's band_name to Band's) I see those as identifying the same band, then the natural operation is to bring together that which has been taken apart, so the name 'Natural Join'.

Entity Relationship Diagram: How to create a Yelp-kind of app with not just one price-range?

Im new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)

Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price interval is represented by textual names, such as "$", "$$",
and so on,
and the same price range may have (numeric) values for different companies,
And the same price range (type) can be (or not) assigned to different
companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Company and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If using a surrogate key, you will permit/allow a company to have the same kind of price range more than once, which I believe is not the case.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and an company may have or not one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, so that is not allowed a company to have a price range more than once, set the foreign keys as composite primary key in Price Ranges from Companies.
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships will always become tables, no exception. Differently, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with these special names (Join/Link/Intermediate Tables) serves to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
At where? If you say at the Price Ranges entity, it is rather necessary. Must We not identify records in a table in some way? Here I set what is called a Surrogate Key. If on the other hand, you have a column with unique values for each record in the table, you can also use this column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you refer to Price Ranges from Companies entity, so the question is another ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. The two are foreign keys. You need it so you can relate rows from the two tables... I think you were not referring to that, is not it? If so, no problem, you can comment and ask more questions. I do not care to answer. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows / records. Each line has its own data. Even with this, each line / record needs to be differentiated and identified somehow. That is why we attach to each line an identifier, known as the primary key (this, and this). In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. By obligation, every record MUST have a primary key. Although you can choose which field is a primary key, you are allowed to choose one or more fields to serve as the primary key. When this happens, that is, when more than one field participates/serves as the primary key, we have a table with something called Composite Primary Key. Similarly, it has the ability to identify records. Note that, because of that, primary key values must be unique, otherwise you may have 2 identical records.
This is the basic concept so that we can relate tables to each other, in case, records/rows of tables together. If we have a Company identified by the ID 28 (a line/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store somewhere that relationship (28 <--> 40). This is where the role of intermediate/link/join tables comes in (but only to relationships N-N! For 1-N or N-1 relationships it works similarly, but not identical).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records which relates to other records (for their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have or not a X Price Range.
All price ranges that are given or not to companies.
...
Anyway, you get all this because of the established relationship.
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?

Matching students to courses with course limit (Hungarian, Max Flow, Min-Cost-Flow, ...)

I am currently writing a program which maps students to courses. Currently, I am using a SAT-Solver, but I am trying to implement a polynomial time / non greedy algorithm which solves the following sub-problem:
There are students (50-150)
There are subjects (10-20), e.g. 'math', 'biology', 'art'
There are courses per subject (at least one), e.g. 'math-1', 'math-2', 'biology-1', 'art-1', 'art-2', 'art-3'
A student selects some (fixed) subjects (10-12) and for each subject the student has to be assigned to exactly one of the existing courses (if possible). It does not matter which course 'math-1' or 'math-2' is being selected.
The courses have a maximum number of allowed students (20-34)
Each course is in a fixed block (= timeslot 1 to 13)
A student may not be assigned to courses being in the same block
I am now describing what I have done so far.
(1) Ignoring the course-student-limit
I was able to solve this with the hungarian algorithm / bipartite matching. Each student may be computed individually by modelling it as following:
left nodes represent the subjects 'math', 'biology', 'art' (of the student)
right nodes represent the blocks '1', '2', .... '13'
an edge is inserted for each course from 'subject' to 'block'
This way the student is assigned for every subject to a course while not attending courses which are in the same block. But course-limits are ignored.
(2) Ignoring the selected subjects of the student
I was able to solve this with a max-flow-algorithm. For each student the following is modelled:
Layer 1: From source to each student with a flow of 13
Layer 2: From each student to his/her personal block with a flow of 1
Layer 3: From each student-block to each course in that block with flow 1
Layer 4: From each course to the sink with 'max-student-limit'
This way the student selects arbitrary courses and the course-limit is fullfilled. But he/she may be unlucky and be assigned to 'math-1', 'math-2' and 'math-3' ignoring the subjects 'biology' and 'art'.
(3) Greedy Hungarian
Another idea I had was to match one student at a time with the hungarian algorithm and adjusting the weights so that 'more empty courses' are preferred. For example one could model:
left nodes are subjects of the student
right nodes are blocks
for each course insert an edge from subject to the block of the course with weight = number of free seats
And then computing a Maximum-Weight-Matching.
I would really appreciate any suggestions / help.
Thank you!

Relational Algebra "Only Once" or "Exists once"

So I have 2 relations
Student = {student id, name, address}
Course = {course no, title, subject}
Completed = {course no, student id, grade, semester}
and I want to display the name of students who have COMPLETED only one COURSE of "Physics" (Which is a subject)
I dont have problems joining the tables to get the data together, my problem is with how to get values that appear only once?
What I have so far
PICourse_no (σ Subject=´Physics´(COURSE))
That gets me all the course numbers that are Physics related
PIStudent_Id(σCourseNo= (PICourse_no (σ Subject=´Physics´(COURSE))))
And with that I think I'm getting the Id's of all the students who study a physics related course...but now here is my problem, how do I remove the students who have MORE than one physics related course?

"how do I remove the students who have MORE than one physics related course?"
That is done by the relational MINUS operator or one of its nephews (sometimes known as antijoin). As indicated in the comments, there are a major number of distinct sets of operators all called "relational algebra". You really have to look into which one you are supposed to be using.

A better example of ternary relationship

In SQL you can describe a binary relation with a table like
Husband | Wife
We know that an husband can have only one wife, and viceversa, so that's a 1:1 relationship, and you can specify costraints such that if you add an husband that is already in the table you get an error, right?
If you add a third column like this
Husband | Wife | Country
We know that in some country one husband can have many wives; now you cannot put easy costraints, you have to deal with the third column.
So from a binary relation we get a ternary relation with different behavior that depends on the third column.
This example is stupid and useless, do you know any other example?
(other example of ternary relationship such that one of the column changes the tuple behavior?)
Thank you.
EDIT:
Another point of view to see my problem:
You have any binary relationship, within a domain: do you know any binary relation that changes costraints (or behavior) as domain changes?

Another example might be that you can apply coupons towards an order, but for certain coupon types you can only apply one per order whereas other coupon types may be combined.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart