Neo4J Grandchild Relationships Linked to Nodes - neo4j

I'm trying to model contractor relationships in Neo4j and I'm struggling with how to conceptualize subcontracts. I have nodes for Government Agencies (label: Agency) and Contractors (label: Company). Each of these has geospatial Office nodes via the HAS_OFFICE relationship. I'm thinking of creating a node that represents a Government Contract (label: Contract).
Here's what I'm struggling with: a Contract has a Government Agency (I'm thinking this is a HAS_CONTRACT relationship) and one or more prime contractors (I'm thinking this is a PRIME relationship). Here's where it gets complicated: each of those prime contractors can have subcontractors under that prime contract only. Graphically, this is:
(Agency) -[HAS_CONTRACT]-> (Contract) -[PRIME]-> (Company 1) -[SUB]-> (Company 2)
The problem is that a [SUB] relationship applies only to certain contracts, not all of them. For example:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract ABC -P-> Company 3 -S-> Company 4
Agency 2 -HAS-> Contract XYZ -P-> Company 1
Agency 2 -HAS-> Contract XYZ -P-> Company 4 -S-> Company 2
I want some way to search on that so I can ask cypher questions like "Find ways Agency 2 can put money on contract with Company 2." It should come back with the XYZ contract through Company 4, and NOT the XYZ contract through Company 1.
It seems like storing and filtering on data within the relationship might work, but I'm not sure how. Can I say that PRIME and SUB relationships have a property, "contract_id", that must match Contract['id']? If so, how?
Edit: I don't want to have to specify the contract name in the query. Based on @MarkM's reply, I'm thinking of something like:
MATCH (a:Agency)-[:HAS]-(c:Contract)-[:PRIME {contract_id:c.id}]
-(p:Company)-[:SUB {contract_id:c.id}]-(s:Company)
RETURN s
I'd also like to be able to use things like shortestPath to find the shortest path between an agency and a contractor that follows a single contract ID.
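A minimal sketch of the contract_id idea, written in Python over plain tuples rather than against Neo4j (the EDGES list and the outgoing/routes helpers are hypothetical, built from the example data above). It follows a SUB relationship only when its contract_id equals the contract the path came through. In Cypher, a comparison between a relationship property and a node property would normally be written in a WHERE clause rather than inside the pattern's property map.

```python
# Relationships as (start, type, end, properties) tuples: a stand-in for the graph,
# not Neo4j's API. Data follows the example in the question.
EDGES = [
    ("Agency 1", "HAS_CONTRACT", "ABC", {}),
    ("Agency 2", "HAS_CONTRACT", "XYZ", {}),
    ("ABC", "PRIME", "Company 1", {}),
    ("ABC", "PRIME", "Company 3", {}),
    ("XYZ", "PRIME", "Company 1", {}),
    ("XYZ", "PRIME", "Company 4", {}),
    # Each SUB relationship carries the id of the contract it belongs to.
    ("Company 1", "SUB", "Company 2", {"contract_id": "ABC"}),
    ("Company 3", "SUB", "Company 4", {"contract_id": "ABC"}),
    ("Company 4", "SUB", "Company 2", {"contract_id": "XYZ"}),
]


def outgoing(node, rel_type):
    """(end node, properties) for each relationship of rel_type leaving node."""
    return [(end, props) for start, t, end, props in EDGES
            if start == node and t == rel_type]


def routes(agency, target):
    """List (contract, prime) pairs through which agency can pay target."""
    found = []
    for contract, _ in outgoing(agency, "HAS_CONTRACT"):
        for prime, _ in outgoing(contract, "PRIME"):
            if prime == target:
                found.append((contract, prime))  # target is the prime itself
            for sub, props in outgoing(prime, "SUB"):
                # Follow SUB only if it is tagged with the contract we came through.
                if sub == target and props.get("contract_id") == contract:
                    found.append((contract, prime))
    return found
```

Here routes("Agency 2", "Company 2") returns only [("XYZ", "Company 4")]: the Company 1 to Company 2 edge is skipped because it is tagged ABC.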

I'd create the subcontractor by having two relationships, one to the contractor and one to the contract.
(:Agency)-[:ISSUES]->(con:Contract)-[:PRIMARY]->(contractor:Company)
(con:Contract)-[:SECONDARY]->(subContractor:Company)<-[:SUBCONTRACTS]-(contractor:Company)
Perhaps you can model your use case as a GraphGist, which is a good way of documenting and discussing modeling issues.

This seems pretty simple; I apologize if I've misunderstood the question.
If you want subcontractors you can simply query:
MATCH (a:Agency)-[:HAS]-(:Contract)-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
This will return all companies that are subcontractors. The query matches the whole pattern, so if you want the subcontractors on contract XYZ, you simply give it the parameter:
MATCH (a:Agency)-[:HAS]-(:Contract {contractID: "XYZ"})-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
You'll only get Company 2.
EDIT: based on your edit:
"Find ways Agency 2 can put money on contract with Company 2"
This seems to require some domain-specific knowledge which I don't have. I assume Agency 2 can only put money on subcontractors but not primes? It might help if you reword the question so we know exactly what you're trying to get from the graph. From my reading, it looks like you want all companies that are subcontractors under Company 2's contracts. Is that right?
If that's what you want, again you just give Neo4j the path:
MATCH (a:Agency {AgencyID: 2})-[:HAS]
-(c:Contract)-[:PRIME]-(:Company)-[:SUB]-(s:Company {companyID: 2})
RETURN c, s
This will give you a list of all of Agency 2's contracts on which Company 2 is a subcontractor. With the current example, it will return one row: [c:Contract XYZ, s:Company 2]. If Agency 2 had more contracts under which Company 2 subcontracted, you would get more rows.
You can't do this: [:PRIME {contract_id:c.id}] [:SUB {contract_id:c.id}], because PRIME and SUB relationships shouldn't have contract_id properties. They don't need them: the very fact that they are connected to a contract is enough.
One thing that might make this a little more complicated is if the subcontractors also have subcontractors, but that isn't evident from your example.

Okay, take 2:
So the problem isn't captured well in the original example data (sorry for missing it). A better example is:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract XYZ -P-> Company 1 -S-> Company 3
Now if I ask
MATCH (a:Agency)-[:HAS_CONTRACT]-(ABC:Contract {id: "ABC"})-[:PRIME]
-(c:Company)-[:SUBS]-(c2) RETURN c2
I'll get both Company 2 and Company 3 even though only Company 2 is on ABC, right?
The problem here is the data model not the query. There's no way to distinguish a company's subs because they are all connected directly to the company node. You could put a property on the sub relationship with the prime ID, but a better way that really captures the information is to add another contract node under company. Whether you label this as a different type depends on your situation.
Company 1 then [:HAS] a contract that the subs are connected to. That contract can then point back to the prime contract with a relationship such as [:PARENT] or [:PRIME], or the prime contract can point to it with a [:SUBCONTRACT] relationship.
Now everything becomes much easier. You can find all subcontracts under a particular contract, all subcontracts a particular company [:HAS], etc. To find all subcontractors under a particular contract you could query something like this:
MATCH (contract:Contract { id:"ContractABC" })-[:PRIME]-(c:Company)
-[:HAS]->(subcontract:Contract)-[:PARENT]-(contract)
WITH c, subcontract
MATCH (subcontract)-[:SUBS]-(subcontractor:Company)
RETURN c, subcontractor
This should give you a list of all companies and their subcontractors under contract ABC: in this case, Company 1 with Company 2 (but not Company 3).
Here's a console example: http://console.neo4j.org/?id=flhv8e
I've left the original [:SUB] relationships but you might not need them.
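To make the difference between the two models concrete, here is a small Python sketch using the take-2 data (plain tuples, not Neo4j code; node names such as "ABC/Company 1" for the subcontract nodes are made up for illustration). Following SUB straight off the company picks up both subcontractors, while going through a subcontract node whose PARENT is ABC keeps only Company 2, as in the query above.

```python
# (start, type, end) relationships for the take-2 example.
EDGES = {
    ("Agency 1", "HAS_CONTRACT", "ABC"), ("Agency 1", "HAS_CONTRACT", "XYZ"),
    ("ABC", "PRIME", "Company 1"), ("XYZ", "PRIME", "Company 1"),
    # Old-style SUB edges hang directly off the company, so the contract is lost.
    ("Company 1", "SUB", "Company 2"), ("Company 1", "SUB", "Company 3"),
    # New-style: Company 1 HAS one subcontract node per prime contract.
    ("Company 1", "HAS", "ABC/Company 1"), ("ABC/Company 1", "PARENT", "ABC"),
    ("Company 1", "HAS", "XYZ/Company 1"), ("XYZ/Company 1", "PARENT", "XYZ"),
    ("ABC/Company 1", "SUBS", "Company 2"), ("XYZ/Company 1", "SUBS", "Company 3"),
}


def targets(node, rel):
    """End nodes of all `rel` relationships leaving `node`."""
    return {end for start, r, end in EDGES if start == node and r == rel}


def subs_direct(contract):
    """PRIME then SUB: cannot tell which contract a SUB edge belongs to."""
    return {(p, s) for p in targets(contract, "PRIME") for s in targets(p, "SUB")}


def subs_via_subcontract(contract):
    """PRIME, then HAS a subcontract whose PARENT is `contract`, then SUBS."""
    return {
        (p, s)
        for p in targets(contract, "PRIME")
        for sc in targets(p, "HAS")
        if contract in targets(sc, "PARENT")
        for s in targets(sc, "SUBS")
    }
```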

Related

In Google Sheets, how can I remove similar (but not duplicate) strings from the same column?

In Google Sheets, I have a column containing a list of company names. The problem is that some of them are repeated with slight differences. For example:
Company 1 Limited
Company 1 Ltd
Company 2
Company 3 Group
Company 3
Company 4
Company 5 p.l.c
Company 5 plc
Is there a way to delete the similar entries (e.g. Company 1 "Ltd" vs Company 1 "Limited") so that I end up with a list like this?
Company 1 Ltd
Company 2
Company 3 Group
Company 4
Company 5 plc
I don't have a preference between words like 'Ltd' or 'Limited', or whether 'Group' is present or not. I would just like to reduce these similar double entries as much as possible. I've come across Fuzzy Lookup, but my understanding is that it only works between two ranges.
The easiest thing for me would be to strip the company names down to a standard form by removing "Ltd" and "Limited" with find and replace, and then remove the duplicates, but I would rather not go down this path, as I would like to keep whatever follows the company name. Keep in mind that the company names vary in length; "Company X" is used here for demonstration purposes only.
Sample here: https://docs.google.com/spreadsheets/d/1u2XDzKR09Ri_hR9FXxs9OwRswEKDvhlXCaLb1w-rhSc/edit?usp=sharing
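Not an in-Sheets solution, but the idea of comparing names in a standardized form while keeping the original text can be sketched in Python: build a comparison key (lowercase, dots removed, noise words dropped) and keep the first spelling seen for each key. The NOISE_TOKENS set is an assumption to extend for real data; the same logic could be ported to a helper column or Apps Script.

```python
import re

# Words ignored when comparing names (an assumption; extend for real data).
NOISE_TOKENS = {"ltd", "limited", "plc", "group"}


def match_key(name: str) -> str:
    """Comparison key: lowercase, dots removed, noise words dropped."""
    cleaned = name.lower().replace(".", "")
    tokens = re.findall(r"[a-z0-9&]+", cleaned)
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)


def dedupe_similar(names: list[str]) -> list[str]:
    """Keep the first original spelling for each comparison key."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        key = match_key(name)
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result
```

The original strings are returned untouched, so "Company 3 Group" survives as written. A name that genuinely contains a noise word (e.g. "Group Holdings Ltd" vs "Holdings Ltd") would be merged too, so the result is worth a manual check.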

E-R diagram confusion

I am designing an E-R diagram for a shop, part of which is shown below (the rest is not relevant). Please see the link:
E-R diagram
The issue that I have is that the shop only sells two items, Socks and Shoes.
Have I correctly detailed this in my diagram? I'm not sure if my cardinalities and/or my design is correct. A customer has to buy at least one of these items for the order to exist (but has the liberty to buy any number).
The Shoe and Sock entities would each have their own ID attribute, and I am planning to translate the diagram into a relational schema like this:
(I forgot to show in my diagram that the ORDER_CONTAINS relationship has an attribute called "Quantity".)
Table: Order_Contains
ORDER_ID    | SHOEID            | SOCKID            | QTY
primary key | FK, could be null | FK, could be null | INT
This clearly won't work, since QTY would be meaningless. Is there a way to restrict the products to just these two and make all this work?
Having two one-to-many relationships combined into one with nullable fields is a poor design. How would you record an order containing both shoes and socks: a row per shoe with SOCKID set to NULL and vice versa for socks, or would you combine rows? In the former case the meaning of QTY is clear, though it depends on the contents of the SHOEID/SOCKID fields, but what would QTY mean in the latter case? How would you deal with rows where both SHOEID and SOCKID are NULL and QTY is positive? Keep in mind Murphy's law of databases: if it can be recorded, it will be. Worse, your primary key (ORDER_ID) allows only one row per order, so an order could contain only a single kind of shoe or sock.
A better design would be to have two separate relations:
Order_Socks (ORDER_ID PK/FK, SOCKID PK/FK, QTY)
Order_Shoes (ORDER_ID PK/FK, SHOEID PK/FK, QTY)
With this, there's only one way to record the contents of an order and it's unambiguous.
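A minimal sketch of the two-relation design using Python's built-in sqlite3 module; the Orders, Socks and Shoes tables are assumed here so the foreign keys have something to point at. The composite primary keys let one order hold both socks and shoes, while a second line for the same order and product is rejected instead of being recorded twice.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Orders (ORDER_ID INTEGER PRIMARY KEY);
CREATE TABLE Socks  (SOCKID   INTEGER PRIMARY KEY);
CREATE TABLE Shoes  (SHOEID   INTEGER PRIMARY KEY);
-- One row per (order, sock); the composite key forbids duplicate lines.
CREATE TABLE Order_Socks (
    ORDER_ID INTEGER NOT NULL REFERENCES Orders(ORDER_ID),
    SOCKID   INTEGER NOT NULL REFERENCES Socks(SOCKID),
    QTY      INTEGER NOT NULL CHECK (QTY > 0),
    PRIMARY KEY (ORDER_ID, SOCKID)
);
CREATE TABLE Order_Shoes (
    ORDER_ID INTEGER NOT NULL REFERENCES Orders(ORDER_ID),
    SHOEID   INTEGER NOT NULL REFERENCES Shoes(SHOEID),
    QTY      INTEGER NOT NULL CHECK (QTY > 0),
    PRIMARY KEY (ORDER_ID, SHOEID)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def order_contents(conn, order_id):
    """Return (kind, product_id, qty) rows for one order."""
    return conn.execute(
        """
        SELECT 'sock', SOCKID, QTY FROM Order_Socks WHERE ORDER_ID = ?
        UNION ALL
        SELECT 'shoe', SHOEID, QTY FROM Order_Shoes WHERE ORDER_ID = ?
        ORDER BY 1, 2
        """,
        (order_id, order_id),
    ).fetchall()
```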
You have not explained the context very well. I'll try to explain what I understand and give you some hints.
Does your shop only ever sell these 2 products? Do the details of these products (color, model, weight, width, etc.) need to be persisted in the database? If yes, then we have two entities in the model, SOCKS and SHOES, each with its own properties. A purchase or an order is usually seen as an event in the ERD. If your customers always buy (or order) socks with shoes, then there will always be a link between three entities:
CLIENTS --- SHOES --- SOCKS
This connection / association / relationship is an event, and this would be the purchase (or order).
If a customer can buy shoes and socks separately, then socks and shoes are subtypes of a super-entity called PRODUCTS, and a purchase is an event between CUSTOMERS and PRODUCTS. In this case we have a partitioning relationship.
If, however, your customers buy separate products, your store will not sell only 2 products forever, and the product details are not always the same and will not be stored as columns in a table, then the situation is different.
Shoes and socks are considered products, as well as other items that can be considered in future. Thus, we have records/rows in a PRODUCTS table.
When a customer places an order (or makes a purchase), they are acquiring products. There is a strong link between customers and products here, again usually an event, which would be the purchase (or order).
I don't know whether you already do this, but before starting a diagram, write out the problem context on paper or in a document, including all the details of the situation.
Entities reveal themselves through their properties. If you need to store a customer's name, eye color, e-mail, and so on, then you certainly have a CUSTOMER entity.
If you see entities relating in some way, then you have a relationship, and you should ask yourself what kind of relationship these entities form. In your case of products and customers, there is a purchasing relationship between them. The relationship established is a purchase (or an order, as you call it). One customer can buy various products, and one product (meaning the product type or model, not a single item on the shelf) can be purchased by several customers, so we have a many-to-many relationship.
The relationship changes according to the context. To illustrate, let's invent something deliberately crazy. Say we have customers and products, and you want to persist a situation where customers lick products (something really crazy, just so you can see how the context determines the relationship).
There would be an intimate connection between the customer and product entities (really close... I think...). In this case, the relationship represents a history of customers licking products. This would generate an EVENT. In this event you could store properties such as the date, the number of times a customer licked a particular product, the weather, the time, the color of the traffic light on the street, etc., but only what you need to persist according to your context and needs.
Remember that for every N-N relationship, you need to check whether new entities will emerge from the relationship itself. This usually happens when you decompose the conceptual model into the logical model. Product orders will probably generate not one but two entities: the ORDER and the products of each order. It is in the products of the order that you place the list of products each customer ordered, along with the quantity.
I would like to point you to various materials on ERD, but unfortunately they are all in Portuguese. I hope this helps. If you can be more specific about your problem, I think I can help you better. If anything is unclear, please ask.
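For contrast, the single PRODUCTS table with an ORDER and an order-products entity described above might look like this in Python's sqlite3 (table and column names are illustrative assumptions). Socks and shoes become rows rather than tables, and the quantity lives on the order-products row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, kind TEXT NOT NULL);  -- e.g. 'sock', 'shoe'
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
-- The products of each order: resolves the customer/product N-N relationship.
CREATE TABLE order_products (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def items_per_kind(conn, order_id):
    """Total quantity per product kind for one order."""
    return conn.execute(
        """SELECT p.kind, SUM(op.quantity)
           FROM order_products op JOIN products p USING (product_id)
           WHERE op.order_id = ?
           GROUP BY p.kind ORDER BY p.kind""",
        (order_id,),
    ).fetchall()
```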

Enforcing association uniqueness in ActiveRecord

There are three sets of entities: Players, Teams and Games. A Team may consist of one or two Players, and Teams are formed voluntarily for each particular Game, i.e.
players A, B, C, D can form 10 Teams (4 single-player teams plus 6 pairs), because Team AB is the same as Team BA.
In other words, a Team may contain only a unique set of players: BA is a duplicate of AB.
The most obvious way to model the relationship between Teams and Games is many-to-many, but is this the way to go? The real question is how to model these restrictions in a robust and scalable way, so that they can handle a team of not only 1-2 players but also, say, 1-20 players without (much) modification.
Here is the abstract interface I have in mind:
Team.find_or_create_by(player_ids: [p1.id, p2.id]) # find p1 and p2 team
Team.find_by(player_id: p1) # find all teams that p1 has participated in
P.S. I don't think the question title is very good, and this is probably a well-known problem with an established name, so if anyone can point it out I'll gladly change the title.
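One common way to enforce "a team is a set of players" is to store a canonical key (the sorted, de-duplicated player ids) under a unique constraint, so AB and BA map to the same row whatever the team size. A minimal sketch with Python's sqlite3 follows; the teams/team_players tables and the helper names are hypothetical. In ActiveRecord the same idea could be expressed as a unique index on such a column plus a uniqueness validation.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- roster_key holds the sorted player ids, e.g. '1,2'; UNIQUE forbids duplicate rosters.
        CREATE TABLE teams (id INTEGER PRIMARY KEY, roster_key TEXT NOT NULL UNIQUE);
        CREATE TABLE team_players (
            team_id   INTEGER NOT NULL REFERENCES teams(id),
            player_id INTEGER NOT NULL,
            PRIMARY KEY (team_id, player_id)
        );
    """)
    return conn


def roster_key(player_ids) -> str:
    """Order-independent key: [2, 1] and [1, 2] both give '1,2'."""
    ids = sorted(set(player_ids))
    if not ids:
        raise ValueError("a team needs at least one player")
    return ",".join(str(p) for p in ids)


def find_or_create_team(conn, player_ids) -> int:
    key = roster_key(player_ids)
    try:
        team_id = conn.execute(
            "INSERT INTO teams (roster_key) VALUES (?)", (key,)).lastrowid
    except sqlite3.IntegrityError:
        # The roster already exists: reuse the existing team.
        return conn.execute(
            "SELECT id FROM teams WHERE roster_key = ?", (key,)).fetchone()[0]
    conn.executemany(
        "INSERT INTO team_players (team_id, player_id) VALUES (?, ?)",
        [(team_id, p) for p in sorted(set(player_ids))],
    )
    return team_id


def teams_for_player(conn, player_id) -> list:
    rows = conn.execute(
        "SELECT team_id FROM team_players WHERE player_id = ? ORDER BY team_id",
        (player_id,))
    return [r[0] for r in rows]
```

Relying on the database constraint rather than a check-then-insert in application code is what keeps this safe when two requests try to create the same roster at once.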

Matching students to courses with course limit (Hungarian, Max Flow, Min-Cost-Flow, ...)

I am currently writing a program which maps students to courses. Currently, I am using a SAT-Solver, but I am trying to implement a polynomial time / non greedy algorithm which solves the following sub-problem:
There are students (50-150)
There are subjects (10-20), e.g. 'math', 'biology', 'art'
There are courses per subject (at least one), e.g. 'math-1', 'math-2', 'biology-1', 'art-1', 'art-2', 'art-3'
A student selects some (fixed) subjects (10-12), and for each subject the student has to be assigned to exactly one of the existing courses (if possible). It does not matter which course ('math-1' or 'math-2') is selected.
The courses have a maximum number of allowed students (20-34)
Each course is in a fixed block (= timeslot 1 to 13)
A student may not be assigned to courses being in the same block
Here is what I have done so far.
(1) Ignoring the course-student-limit
I was able to solve this with the Hungarian algorithm / bipartite matching. Each student can be handled individually by modelling the problem as follows:
left nodes represent the subjects 'math', 'biology', 'art' (of the student)
right nodes represent the blocks '1', '2', .... '13'
an edge is inserted for each course from 'subject' to 'block'
This way the student is assigned for every subject to a course while not attending courses which are in the same block. But course-limits are ignored.
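Model (1) for a single student can be sketched in Python with a simple augmenting-path bipartite matching (Kuhn's algorithm), which is enough here because all edges weigh the same; the full Hungarian algorithm is only needed once weights come in. The input format, a dict mapping each course to its (subject, block), is an assumption. Each course is an edge from its subject to its block, so matching subjects to distinct blocks gives one course per subject with no block clash, while seat limits are still ignored.

```python
def match_student(subjects, courses):
    """Match each subject to a distinct block via one of its courses.

    subjects: the student's subjects; courses: {course: (subject, block)}.
    Returns {subject: course} for a maximum matching (unmatched subjects omitted).
    """
    # Edge subject -> block for every course, remembering the course on the edge.
    edges = {s: [(blk, c) for c, (subj, blk) in courses.items() if subj == s]
             for s in subjects}
    owner = {}  # block -> (subject, course) currently using that block

    def augment(subject, seen):
        for blk, course in edges[subject]:
            if blk in seen:
                continue
            seen.add(blk)
            # Take a free block, or re-route the subject currently holding it.
            if blk not in owner or augment(owner[blk][0], seen):
                owner[blk] = (subject, course)
                return True
        return False

    for s in subjects:
        augment(s, set())
    return {subj: course for subj, course in owner.values()}
```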
(2) Ignoring the selected subjects of the student
I was able to solve this with a max-flow-algorithm. For each student the following is modelled:
Layer 1: from the source to each student, with capacity 13
Layer 2: from each student to each of his/her personal block nodes, with capacity 1
Layer 3: from each student-block node to each course in that block, with capacity 1
Layer 4: from each course to the sink, with capacity 'max-student-limit'
This way the student gets arbitrary courses and the course limit is respected. But he/she may be unlucky and be assigned to 'math-1', 'math-2' and 'math-3', ignoring the subjects 'biology' and 'art'.
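The layered network from (2) can be reproduced with a plain Edmonds-Karp max-flow (BFS augmenting paths) in Python. The node naming, the course format {course: (block, limit)} and the helper names are assumptions for illustration. Nothing in the network knows about subjects, so flow is free to land on any course in a block, which is exactly the weakness described above.

```python
from collections import defaultdict, deque


def max_flow(cap, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest (BFS) paths."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in cap.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Collect the path, push its bottleneck capacity, update residuals.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push


def build_network(students, courses, num_blocks=13):
    """Layers from (2): source -> student -> student-block -> course -> sink."""
    cap = defaultdict(dict)
    for s in students:
        cap["SRC"][("student", s)] = num_blocks                 # layer 1
        for b in range(1, num_blocks + 1):
            cap[("student", s)][("slot", s, b)] = 1             # layer 2
        for c, (block, _) in courses.items():
            cap[("slot", s, block)][("course", c)] = 1          # layer 3
    for c, (_, limit) in courses.items():
        cap[("course", c)]["SNK"] = limit                       # layer 4
    return cap


def assignments(residual, courses):
    """A slot->course edge carries flow iff its reverse residual is positive."""
    out = defaultdict(list)
    for c in courses:
        for node, back in residual[("course", c)].items():
            if node != "SNK" and back > 0:
                out[node[1]].append(c)  # node is ("slot", student, block)
    return {s: sorted(cs) for s, cs in out.items()}
```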
(3) Greedy Hungarian
Another idea I had was to match one student at a time with the Hungarian algorithm and adjust the weights so that emptier courses are preferred. For example, one could model:
left nodes are subjects of the student
right nodes are blocks
for each course insert an edge from subject to the block of the course with weight = number of free seats
Then compute a maximum-weight matching.
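Idea (3) can be sketched in Python with a textbook potentials-based Hungarian algorithm (O(n²m)) run once per student. The data format, the BIG penalty for subject/block pairs with no course that has a free seat, and the assumption that a student has no more subjects than there are blocks are choices made for this illustration. Minimising negated free-seat counts is the same as maximising them, and because BIG dwarfs any possible free-seat total, the solver assigns as many subjects as it can before comparing free seats.

```python
BIG = 10**6  # penalty for a subject/block pair with no course that has a free seat


def hungarian(cost):
    """Min-cost assignment of each row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    u, v = [0] * (n + 1), [0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row (1-based) matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [float("inf")] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], float("inf"), 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def greedy_assign(students, courses, blocks):
    """students: {name: [subjects]}; courses: {course: (subject, block, limit)}."""
    free = {c: limit for c, (_, _, limit) in courses.items()}
    result = {}
    for student, subjects in students.items():
        cost, pick = [], []
        for subj in subjects:
            row, row_pick = [], []
            for b in blocks:
                options = [c for c, (s, blk, _) in courses.items()
                           if s == subj and blk == b and free[c] > 0]
                best = max(options, key=lambda c: free[c], default=None)
                row.append(BIG if best is None else -free[best])  # prefer emptier courses
                row_pick.append(best)
            cost.append(row)
            pick.append(row_pick)
        cols = hungarian(cost)
        result[student] = {}
        for i, subj in enumerate(subjects):
            course = pick[i][cols[i]]  # None: no free, clash-free course left
            result[student][subj] = course
            if course is not None:
                free[course] -= 1
    return result
```

Because students are processed one at a time, the outcome depends on the order of the students, so this remains a greedy heuristic rather than an optimal assignment.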
I would really appreciate any suggestions / help.
Thank you!

Is it possible to have conditional OLAP dimension aggregators?

I have a set of OLAP cubes, in the form of a snowflake schema, each representing one factory.
I have three concepts that for some factories clearly behave as 3 dimensions, and for other factories clearly behave as 2 dimensions.
The concepts are always the same: "products", "sales agents" and "customers".
But in some cases, I'm unsure whether to model it as a pure 3-dimensional cube or to use some tweak or trick with a 2-dimensional cube.
Cases A and B are clear to me; Case C is the one I'm unsure about.
CASE A: Clearly a 3 dimensional cube
Any agent can sell any product to any company. Several agents are jointly responsible for the same set of customers.
I model this case as this:
CASE B: Clearly a 2 dimensional cube
Every agent is 'responsible' for a portfolio of customers, and can sell any product, but only to his own customers. The analysis is based on 'current responsibility for the portfolio', so if an agent leaves the company, all his customers are reassigned to a new agent and each customer then belongs uniquely to the new agent.
I model this case as this:
CASE C: My doubts
A customer may be assigned a single agent, or several agents each responsible for one ProductCategory.
For example:
Alice manages TablesAndWoods ltd and GreenForest ltd.
Bob manages Chairs ltd and FastWheels ltd.
Carol manages Forniture ltd ONLY for ProductType = 'machinery' and also manages FrozenBottles ltd for ANY type of product.
Dave also manages Forniture ltd but ONLY for ProductType = 'consumables' and also manages HighCeilings ltd for ANY type of product.
QUESTION:
In this example "Case C":
Are customer and agent independent dimensions, making it a 3D cube, because Forniture ltd is related to both Carol and Dave?
Or it is a 2D cube, where agent is not an independent dimension, but it is an aggregator of customer "conditioned" somehow by the ProductCategory product aggregator?
I would like to see how you would model this.
Thanks in advance.
Here is how I would model it:
Your fact table is Sales.
Your dimensions are (probably) Date, Product, Customer and Agent. This is closest to your Case A.
Collapse your snowflake (white entities) into the dimensions. The presence of these entities suggests that you should consider whether type-2 slowly changing dimensions are needed for point-in-time analysis.
Consider a Bridge table to capture the many-to-many relationship between Agent and Product.
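One possible shape for such a bridge, sketched with Python's sqlite3: here it is keyed on customer and product category (an assumption that goes slightly beyond the Agent/Product wording above), with a NULL category meaning "any category", which covers Case C. Table names and sales amounts are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
CREATE TABLE fact_sales  (customer TEXT NOT NULL, product_id INTEGER NOT NULL,
                          amount REAL NOT NULL);
-- Bridge: which agent covers which customer; category NULL means "any category".
CREATE TABLE bridge_customer_agent (customer TEXT NOT NULL, agent TEXT NOT NULL,
                                    category TEXT);
"""

SALES_BY_AGENT = """
SELECT b.agent, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN bridge_customer_agent b
  ON b.customer = f.customer
 AND (b.category IS NULL OR b.category = p.category)
GROUP BY b.agent
ORDER BY b.agent
"""


def load_case_c(conn):
    """Case C from the question, with made-up sales amounts."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "machinery"), (2, "consumables")])
    conn.executemany("INSERT INTO bridge_customer_agent VALUES (?, ?, ?)", [
        ("Forniture ltd", "Carol", "machinery"),
        ("Forniture ltd", "Dave", "consumables"),
        ("FrozenBottles ltd", "Carol", None),
        ("HighCeilings ltd", "Dave", None),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        ("Forniture ltd", 1, 100.0), ("Forniture ltd", 2, 40.0),
        ("FrozenBottles ltd", 2, 10.0), ("HighCeilings ltd", 1, 5.0),
    ])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_case_c(conn)
    print(conn.execute(SALES_BY_AGENT).fetchall())
```

With this data, Forniture ltd's machinery sales go to Carol and its consumables sales go to Dave. If one customer ever had both an "any category" agent and a category-specific agent, the same sale would be counted for both, so totals across agents are worth checking.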
