Matching students to courses with course limit (Hungarian, Max Flow, Min-Cost-Flow, ...) - graph-algorithm

I am currently writing a program which maps students to courses. Currently, I am using a SAT-Solver, but I am trying to implement a polynomial time / non greedy algorithm which solves the following sub-problem:
There are students (50-150)
There are subjects (10-20), e.g. 'math', 'biology', 'art'
There are courses per subject (at least one), e.g. 'math-1', 'math-2', 'biology-1', 'art-1', 'art-2', 'art-3'
A student selects some (fixed) subjects (10-12) and for each subject the student has to be assigned to exactly one of the existing courses (if possible). It does not matter which course 'math-1' or 'math-2' is being selected.
The courses have a maximum number of allowed students (20-34)
Each course is in a fixed block (= timeslot 1 to 13)
A student may not be assigned to courses being in the same block
I am now describing what I have done so far.
(1) Ignoring the course-student-limit
I was able to solve this with the hungarian algorithm / bipartite matching. Each student may be computed individually by modelling it as following:
left nodes represent the subjects 'math', 'biology', 'art' (of the student)
right nodes represent the blocks '1', '2', .... '13'
an edge is inserted for each course from 'subject' to 'block'
This way the student is assigned for every subject to a course while not attending courses which are in the same block. But course-limits are ignored.
(2) Ignoring the selected subjects of the student
I was able to solve this with a max-flow-algorithm. For each student the following is modelled:
Layer 1: From source to each student with a flow of 13
Layer 2: From each student to his/her personal block with a flow of 1
Layer 3: From each student-block to each course in that block with flow 1
Layer 4: From each course to the sink with 'max-student-limit'
This way the student selects arbitrary courses and the course-limit is fullfilled. But he/she may be unlucky and be assigned to 'math-1', 'math-2' and 'math-3' ignoring the subjects 'biology' and 'art'.
(3) Greedy Hungarian
Another idea I had was to match one student at a time with the hungarian algorithm and adjusting the weights so that 'more empty courses' are preferred. For example one could model:
left nodes are subjects of the student
right nodes are blocks
for each course insert an edge from subject to the block of the course with weight = number of free seats
And then computing a Maximum-Weight-Matching.
I would really appreciate any suggestions / help.
Thank you!

Related

Select rows with "one of each" in relational algebra

Say I have a Personstable with attributes {name, pet}. How do I select the names of people where they have one of each kind of pet (dog, cat, bird), but a person only has one of each kind of pet if they pet is in the table.
Example: Bob, Dog and Bob, Cat are the only rows in the table. Therefore, Bob has one of each kind of pet. But the moment Lynda, Bird are added, Bob doesn't have one of each type of pet anymore.
I think the first step to this is to π(pet). You get a list of all kinds of pets since relational algebra removes duplicates. Not sure what to do after this, but I have think I need to join π(pet) and Persons.
I've tried a few things like Natural Join and Cross products but I haven't arrived at a result yet and I'm out of ideas.
The answer to the question can be found with the Division operator:
Persons ÷ πpet(Persons)
This relational algebra expression returns a relation with only the column name, containing all the names of the persons that have all the different kind of pets currently present in the Persons table itself.
The division is an operator that, in some sense, is the inverse of the product operator (the name is derived exactly from this fact). It is a derived operator that can be defined in terms of projection, set difference and product (see for instance this answer).

Relational Algebra "Only Once" or "Exists once"

So I have 2 relations
Student = {student id, name, address}
Course = {course no, title, subject}
Completed = {course no, student id, grade, semester}
and I want to display the name of students who have COMPLETED only one COURSE of "Physics" (Which is a subject)
I dont have problems joining the tables to get the data together, my problem is with how to get values that appear only once?
What I have so far
PICourse_no (σ Subject=´Physics´(COURSE))
That gets me all the course numbers that are Physics related
PIStudent_Id(σCourseNo= (PICourse_no (σ Subject=´Physics´(COURSE))))
And with that I think I'm getting the Id's of all the students who study a physics related course...but now here is my problem, how do I remove the students who have MORE than one physics related course?
"how do I remove the students who have MORE than one physics related course?"
That is done by the relational MINUS operator or one of its nephews (sometimes known as antijoin). As indicated in the comments, there are a major number of distinct sets of operators all called "relational algebra". You really have to look into which one you are supposed to be using.

Neo4J Grandchild Relationships Linked to Nodes

I'm trying to model contractor relationships in Neo4J and I'm struggling with how to conceptualize subcontracts. I have nodes for Government Agencies (label: Agency) and Contractors (label:Company). Each of these have geospatial Office nodes with the HAS_OFFICE relationship. I'm thinking of creating a node that represents a Government Contract (label: Contract).
Here's what I'm struggling with: A Contract has a Government Agency (I'm thinking this is a "HAS CONTRACT" relationship) and one or more prime contractor(s) (I'm thinking this is a "PRIME" relationship). Here's where it gets complicated. Each of those primes contractors can have subcontractors under the prime contract only. Graphically, this is:
(Agency) -[HAS_CONTRACT]-> (Contract) -[PRIME]-> (Company 1) -[SUB]-> (Company 2)
The problem I'm struggling with is that the [SUB] relationship is only for certain contracts -- not all. For example:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract ABC -P-> Company 3 -S-> Company 4
Agency 2 -HAS-> Contract XYZ -P-> Company 1
Agency 2 -HAS-> Contract XYZ -P-> Company 4 -S-> Company 2
I want some way to search on that so I can ask cypher questions like "Find ways Agency 2 can put money on contract with Company 2." It should come back with the XYZ contract through Company 4, and NOT the XYZ contract through Company 1.
It seems like maybe storing and filtering on data within the relationship would work, but I'm struggling with how. Can I say Prima and Sub relationships have a property, "contract_id" that must match Contract['id']? If so, how?
Edit: I don't want to have to specify the contract name for the query. Based on #MarkM's reply, I'm thinking something like:
MATCH (a:Agency)-[:HAS]-(c:Contract)-[:PRIME {contract_id:c.id}]
-(p:Company)-[:SUB {contract_id:c.id}]-(s:Company)
RETURN s
I'd also like to be able to use things like shortestPath to find the shortest path between an agency and a contractor that follows a single contract ID.
I'd create the subcontractor by having two relationships, one to the contractor and one to the contract.
(:Agency)-[:ISSUES]->(con:Contract)-[:PRIMARY]->(contractor:Company)
(con:Contract)-[:SECONDARY]->(subContractor:Company)<-[:SUBCONTRACTS]-(contractor:Company)
Perhaps you can mode your use-case as a graph-gist, which is a good way of documenting and discussing modeling issues.
This seems pretty simple; I apologize if I've misunderstood the question.
If you want subcontractors you can simply query:
MATCH (a:Agency)-[:HAS]-(:Contract)-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
This will return all companies that are subcontractors. The query matches the whole pattern. So if you want XYZ contract subcontractors you simply give it the parameter:
MATCH (a:Agency)-[:HAS]-(:Contract {contractID: XYZ})-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
You'll only get company 2.
EDIT: based on your edit:
"Find ways Agency 2 can put money on contract with Company 2"
This seems to require some domain-specific knowledge which I don't have. I assume Agency 2 can only put money on subcontractors but not primes?? I might help if you reword so we know exactly what your trying to get from the graph. From my reading it looks like you want all companies that are subcontractors under Company 2's contracts. Is that right?
If that's what you want, again you just give Neo the path:
MATCH (a:Agency: {AgencyID: 2)-[:HAS]
-(c:Contract)-[:PRIME]-(:Company)-[:SUB]-(s:Company: {companyID: 2)
RETURN c, s
This will give you a list of all contracts under XYZ for which Company 2 is a subcontractor. With the current example, it will one row: [c:Contract XYZ, s:Company 2]. If Agency 2 had more contracts under which Company 2 subcontracted, you would get more rows.
You can't do this: [:PRIME {contract_id:c.id}] [:SUB {contract_id:c.id}] because Prime and Sub relationships shouldn't have contract_id properties. They don't need them — the very fact that they are connected to a contract is enough.
One thing that might make this a little more complicated is if the subcontractors also have subcontractors, but that's not evident.
Okay take 2:
So the problem isn't captured well in the original example data — sorry for missing it. A better example is:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract XYZ -P-> Company 1 -S-> Company 3
Now if I ask
MATCH (a:Agency)-[:HAS_CONTRACT]-(ABC:Contract {id:ABC})-[:PRIME]
-(c:Company)-[:SUBS]-(c2) RETURN c2
I'll get both Company 2 and 3 even though only 2 is on ABC, Right?
The problem here is the data model not the query. There's no way to distinguish a company's subs because they are all connected directly to the company node. You could put a property on the sub relationship with the prime ID, but a better way that really captures the information is to add another contract node under company. Whether you label this as a different type depends on your situation.
Company1 then [:HAS] a contract which the subs are connected to. The contract can then point back to the prime contract with a relationship of something like [:PARENT] or [:PRIME] or maybe from the prime to the sub with a [:SUBCONTRACT] relationship
Now everything becomes much easier. You can find all subcontracts under a particular contract, all subcontracts a particular company [:HAS], etc. To find all subcontractors under a particular contract you could query something like this:
MATCH (contract:Contract { id:"ContractABC" })-[:PRIME]-(c:Company)
-[:HAS]->(subcontract:Contract)-[:PARENT]-(contract)
WITH c, subcontract
MATCH (subcontract)-[:SUBS]-(subcontractor:Company)
RETURN c, subcontractor
This should give you a list of all companies and their subcontractors under contract ABC. In this case Company 1, Company 2 (but not company 3).
Here's a console example: http://console.neo4j.org/?id=flhv8e
I've left the original [:SUB] relationships but you might not need them.

Fetch entities with 0 junction tables

We have a few different entities. To explain a little better, here is example structure:
We have a lot of students.
We have a lot of homeworks.
Each homework has N (varies per homework) tasks.
There is a junction table connecting students and tasks.
We want to assign some tasks to certain students in a homework. Let's say one homework has 5 possible tasks, we want each student to get one or more tasks.
At the moment, interface lists all students with some properties (for this case let's say - average grade, hair colour, name, gender etc.) and has 5 checkboxes (there are 5 tasks in selected homework). We can use filters to show only students with average grade of 4, or just female students etc.
After a while, you would assign 98% of students, but you notice there are 2% students without any tasks for selected homework (some statistics shows you that). Instead of going through all few thousand students, we'd like to create filter which would use existing filters AND apply additional filter which would show all students which have 0 tasks where task.homeworkId = X
Right now, in my head there is a possible solution, but I'm not sure if this is possible using breeze:
from students
where (OLD_FILTERS)
and ((from junction_table
where task.homeworkId = selectedHomeworkId
and student.id = $parent_query.id).count() = 0)
I've been over this for a while now, can't come up with a nice clean solution. Only thing comming to my mind is pretty complex solution with manual filtering of all student properties and this new condition on server.
Thanks in advance for any help.
EDIT
Tables are related as follows:
student - junction table = 1 to many
task - junction table = 1 to many (basically, it's many-to-many student-task through junction entity)
task - homework = many to 1 (many tasks per one homework)
some perception on the model:
Student:
Id
Property1
Property2
...
Homework
Id
Property1...
Task
Id
HomeworkId // this is foreign key
Property....
JunctionTable
Id
TaskId // foreign key
StudentId // foreign key
Thanks to DenisK for help. We used older version of Breeze without the needed option. Solution is simple predicate:
var pred = breeze.Predicate.create("studentTasks", "any", "task.homeworkId", "==", homeworkId).not();

Should I flatten multiple customer into one row of dimension or using a bridge table

I'm new to datawarehousing and I have a star schema with a contract fact table. It holds basic contract information like Start date, end date, amount ...etc.
I have to link theses facts to a customer dimension. there's a maximum of 4 customers per contract. So I think that I have two options either I flatten the 4 customers into one row for ex:
DimCutomers
name1, lastName1, birthDate1, ... , name4, lastName4, birthDate4
the other option from what I've heard is to create a bridge table between the facts and the customer dimension. Thus complexifying the model.
What do you think I should do ? What are the advantages / drawbacks of each solution and is there a better solution ?
I would start by creating a customer dimension with all customers in it, and with only one customer per row. A customer dimension can be a useful tool by itself for CRM and other purposes and it means you'll have a single, reliable list of customers, which makes whatever design you then implement much easier.
After that it depends on the relationship between the customer(s) and the contract. The main scenarios I can think of are that a) one contract has 4 customer 'roles', b) one contract has 1-4 customers, all with the same role, and c) one contract has 1-n customers, all with the same role.
Scenario A would be that each contract has 4 customer roles, e.g. one customer who requested the contract, a second who signs it, a third who witnesses it and a fourth who pays for it. In that case your fact table will have one row per contract and 4 customer ID columns, each of which references the customer dimension:
...
RequesterCustomerID int,
SignatoryCustomerID int,
WitnessCustomerID int,
BillableCustomerID int,
...
Of course, if one customer is both a requester and a witness then you'll have the same ID in both RequesterCustomerID and WitnessCustomerID because you only have one row for him in your customer dimension. This is completely normal.
Scenario B is that all customers have the same role, e.g. each contract has 1-4 signatories. If the number of signatories can never be more than 4, and if you're very confident that this will 'always' be true, then the simple solution is also to have one row per contract in the fact table with 4 columns that reference the customer dimension:
...
SignatoryCustomer1 int,
SignatoryCustomer2 int,
SignatoryCustomer3 int,
SignatoryCustomer4 int,
...
Even if most contracts only have 1 or 2 signatories, it's not doing much harm to have 2 less frequently used columns in the table.
Scenario C is where one contract has 1-n customers, where n is a number that varies widely and can even be very large (class action lawsuit?). If you have 50 customers on one contract, then adding 50 columns to the fact table becomes difficult to manage. In this case I would add a bridge table called ContractCustomers or whatever that links the fact table with the customer dimension. This isn't as 'neat' as the other solutions, but a pure star schema isn't very good at handling n:m relationships like this anyway.
There may also be more complex cases, where you mix scenarios A and C: a contract has 3 requesters, 5 signatories, 2 witnesses and the bill is split 3 ways between the requesters. In this case you will have no choice but to create some kind of bridge table that contains the specific customer mix for each contract, because it simply can't be represented cleanly with just one fact and one dimension table.
Either way can work but each solution has different implications. Certainly you need customer and contract tables. A key question is: is it always a maximum of four or may it eventually increase beyond that? If it will stay at 4, then you can have a repeating group of 4 customer IDs in the contract. The disadvantage of this is that it is fixed. If a contract does not have four, there are some empty spaces. If, however, there might be more than 4, then the only viable solution is to use a bridge table because in a bridge table you add more customers by inserting new rows (with no change to the table structure). In the fixed solution, in this case you add more than 4 customers by altering the table. A bridge table is an example of what, for many decades now, ER modeling has called an associative entity. It is the more flexible of the two solutions. However, I worked on a margin system once wherein large margin amounts needed five levels of manager approval. It has been five and will always be five, they told me. Each approving manager represented a different organizational level. In this case, we used a repeating group of five manager IDs, one for each level, and included them in the trade. So it is important to understand the current business rules and the future outlook.

Resources