Graph Modelling a 'transitive' relationship - neo4j

This is a follow-up to an earlier question that I posted and for which I accepted an answer. After getting feedback I have a further question, which I am posting as a new question in the hope of getting an answer.
Having discussed it with users, I found the requirement is more complex. What they actually use is something like a table in the relational world with the following columns (it's denormalised, with a lot of repetitive data):
| PartnerName | Service   | Offered? | CurrentlyUsing  | WeCouldSellThese |
|-------------|-----------|----------|-----------------|------------------|
| XX          | Baking    | Yes      | Competitor A, B | Product A        |
| XX          | Baking    | Yes      | Competitor A, B | Product C        |
| XX          | Baking    | Yes      | Competitor A, B | Product D        |
| XX          | OnlyDough | Yes      | Product A       | Product C        |
| XX          | Packing   | No       |                 | Product E        |
Basically, they need to store what is currently being used and whether the service is currently offered by the partner. Whether it is offered or not, they still try to sell the partner products (Offered = Yes and Offered = No both still represent a market). There is also a many-to-many relationship between service and product, which means there is a "3-node" relationship: a particular partner, for a particular product, for a particular service. Here are the 2 options I'm thinking of. The trouble with Option 1 is that Product A would have many outgoing To_Build relationships, so I have no way to figure out which partner each one is for.
Here are the options after I introduce a new entity to split the relationship:

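You can use an extra node (say, labelled "Build") to "reify" the "3-node relationship": each Build node is connected to one partner, one service and one product, so every such combination gets its own node.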
By the way, you should also consider whether the Could_Offer relationship is redundant. For example, you could add an isOffered property to the Could_Build relationship and eliminate the Could_Offer relationship.
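A minimal in-memory sketch of the reification idea, written in plain Ruby rather than Cypher so it runs without a Neo4j server. The Build struct, its field names and the second partner "YY" are hypothetical; the other rows follow the table in the question. Because each Build record carries partner, service and product together, asking which partner Product A is meant for becomes a simple lookup, which is what Option 1 could not answer. The is_offered field stands in for the isOffered property suggested above.

```ruby
# Hypothetical stand-in for reified "Build" nodes: each record ties one partner,
# one service and one product together, plus an is_offered flag (the property
# suggested instead of a separate Could_Offer relationship).
Build = Struct.new(:partner, :service, :product, :is_offered, keyword_init: true)

BUILDS = [
  Build.new(partner: "XX", service: "Baking",    product: "Product A", is_offered: true),
  Build.new(partner: "XX", service: "Baking",    product: "Product C", is_offered: true),
  Build.new(partner: "XX", service: "Baking",    product: "Product D", is_offered: true),
  Build.new(partner: "XX", service: "OnlyDough", product: "Product C", is_offered: true),
  Build.new(partner: "XX", service: "Packing",   product: "Product E", is_offered: false),
  # Hypothetical second partner, added only to show the disambiguation.
  Build.new(partner: "YY", service: "Baking",    product: "Product A", is_offered: false)
].freeze

# Every (partner, service) pair a given product could be sold into.
def targets_for(product, builds = BUILDS)
  builds.select { |b| b.product == product }.map { |b| [b.partner, b.service] }
end

# Products that could be sold to one partner for one service.
def products_for(partner, service, builds = BUILDS)
  builds.select { |b| b.partner == partner && b.service == service }.map(&:product)
end

p targets_for("Product A")     # => [["XX", "Baking"], ["YY", "Baking"]]
p products_for("XX", "Baking") # => ["Product A", "Product C", "Product D"]
```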

Related

Best practices for fact table that depends on two processes

I am building a star schema for an online business. One of the key processes is email newsletter signup.
But the analysis depends on two processes and I can't figure out how to model it the best way.
Here's how the process works:
Person visits website
Person fills out web form and is recorded as a contact in our CRM
Person receives a link asking him to confirm if this is really his email
Person clicks the link and is considered confirmed
Person can now receive emails from us
The signup and confirmation processes take place at different times. Most people click the confirmation link on the same day, but we send two follow-up emails over the few days after the signup, so some people may confirm their email only after a few days.
On top of that, a person could sign up several times on the website. Most of our signups are people who exchange their email address for some sort of resource, like an eBook.
As long as the person's email is not marked confirmed in our system, we ask the person to confirm on each signup.
Since we have multiple offers it's not uncommon for a person to request eBook A, eBook B and eBook C and only confirm after several signups.
In the fact table, each signup for an email address that is not yet confirmed is marked as ConfirmationRequested -> True.
If the person clicks the confirmation link in ANY of the confirmation request emails, he should be considered confirmed for each of those signups.
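A sketch of this "any link confirms every pending signup" rule in plain Ruby, assuming the fact rows can be updated in place once the click is known. The Signup struct, the apply_confirmation! method and the sample addresses are hypothetical. Signups made after the click are left alone here, on the assumption that they would not have requested confirmation in the first place.

```ruby
require "date"

# Hypothetical signup fact row; confirmed_on stays nil until the contact confirms.
Signup = Struct.new(:id, :contact, :signed_up_on, :confirmation_requested, :confirmed_on)

# A click on ANY confirmation link confirms every signup of that contact that
# requested confirmation, is still unconfirmed, and happened on or before the click.
def apply_confirmation!(signups, contact, clicked_on)
  signups.each do |s|
    next unless s.contact == contact && s.confirmation_requested
    next unless s.confirmed_on.nil? && s.signed_up_on <= clicked_on
    s.confirmed_on = clicked_on
  end
end

signup_rows = [
  Signup.new(1, "ann@example.com", Date.new(2024, 1, 1), true, nil), # eBook A
  Signup.new(2, "ann@example.com", Date.new(2024, 1, 3), true, nil), # eBook B
  Signup.new(3, "bob@example.com", Date.new(2024, 1, 3), true, nil),
  Signup.new(4, "ann@example.com", Date.new(2024, 1, 4), true, nil)  # eBook C
]
apply_confirmation!(signup_rows, "ann@example.com", Date.new(2024, 1, 4))
p signup_rows.map { |s| s.confirmed_on&.iso8601 }
# => ["2024-01-04", "2024-01-04", nil, "2024-01-04"]
```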
How I want to analyse the data
See how many signups we had
See how many signups were re-signups and how many were new contacts in the CRM (new email address)
See how many new contacts have confirmed their email address (and become full subscribers)
See how many re-signups were asked to confirm their email and how many have done so
Analyse how long it takes for people to confirm their email address
Analyse the confirmation rate
Filter contacts by their confirmation status and analyse what people who have or have not confirmed have in common
I don't really care about confirmations in isolation from signups.
And for my purposes I would like to have a ConfirmationStatus dimension that is...
"Confirmed" if the person confirms within 7 days of sign up
"Pending" if the person hasn't confirmed, but 7 days haven't passed since signup yet
"Not Confirmed" if the person hasn't confirmed within 7 days (even if the person does confirm at some later point)
On top of that I usually look at this report on Mondays to analyse the previous week and compare it to other weeks. (I already have a working version of this report in a flat table, but I am trying to learn how to build proper star schemas.)
This has the additional challenge that contacts who signed up on Sunday, for example, had less than a day to confirm. They would drag down the confirmation rate, and the latest week would look bad compared to previous weeks, in which all contacts had the full 7 days to confirm.
So I calculate a "Confirmed within signup week" confirmation count and rate for all weeks to allow apples to apples comparisons.
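A sketch of the "Confirmed within signup week" rate in plain Ruby. It assumes ISO weeks (Monday to Sunday, matching a Monday review of the previous week) and takes "within signup week" to mean that the confirmation date falls in the same ISO week as the signup; the method names and sample rows are hypothetical. January 1, 2024 was a Monday, so the Sunday signup on the 7th that confirms on the 8th does not count toward week 1.

```ruby
require "date"

# Assumption: ISO weeks (Monday to Sunday); "within signup week" means the
# confirmation date falls in the same ISO week as the signup date.
def iso_week(date)
  [date.cwyear, date.cweek]
end

# Returns { [iso_year, iso_week] => rate } for signups that requested confirmation.
def confirmed_within_signup_week_rates(signups)
  requested = signups.select { |s| s[:confirmation_requested] }
  by_week = requested.group_by { |s| iso_week(s[:signed_up_on]) }
  by_week.transform_values do |rows|
    hits = rows.count { |r| r[:confirmed_on] && iso_week(r[:confirmed_on]) == iso_week(r[:signed_up_on]) }
    hits.fdiv(rows.size)
  end
end

d = ->(day) { Date.new(2024, 1, day) } # January 2024; the 1st was a Monday
signups = [
  { signed_up_on: d[1], confirmation_requested: true, confirmed_on: d[3] }, # same week
  { signed_up_on: d[3], confirmation_requested: true, confirmed_on: nil },
  { signed_up_on: d[7], confirmation_requested: true, confirmed_on: d[8] }, # Sunday signup, Monday click
  { signed_up_on: d[8], confirmation_requested: true, confirmed_on: d[9] }
]
rates = confirmed_within_signup_week_rates(signups)
p rates # week 1: 1 of 3 confirmed in-week; week 2: 1 of 1
```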
How to model this...
I have considered the following options...
Option #1: Separate fact tables
Since these are separate processes that happen at separate times I have learned that I should create separate fact tables and then drill across common dimensions.
I could calculate signups that requested confirmations from the signup table and then calculate confirmations within a week of the signup through the contact and date dimensions.
But that wouldn't allow me to filter the signups by confirmation status.
That's why I am considering...
Option 2: A fact table that combines both signups and confirmations
I am thinking of something like this:
| Dim Signup Info       |      |
|-----------------------|------|
| SignupInfoKey         | SK   |
| SignupType            | SCD1 |
| ConfirmationRequested | SCD1 |
| ConfirmationSucceeded | SCD1 |
| ConfirmationStatus    | SCD1 |

| Dim Contact | |
|-------------|------|
| ContactKey  | SK   |
| Name        | SCD1 |
| Email       | SCD1 |
| ...         |      |

| Fact Signups         |    |
|----------------------|----|
| SignupDateKey        | FK |
| ConfirmationDate     | FK |
| SignupInfoKey        | FK |
| ContactKey           | FK |
| SignupId             | DD |
| SignupDateTime       | DD |
| ConfirmationDateTime | DD |
| Signups              | M  |
| NewContacts          | M  |
| ConfirmationMin      | M  |
| ConfirmationDays     | M  |
I need the ConfirmationDate in the fact to calculate the "Confirmed Within Week" measures at report time (I am using Power BI, and it's easy there). I could of course also create a dimension "ConfirmedWithinWeek" and then filter based on that, but it won't be as flexible... What if I decide later on to look at the data on a daily or monthly basis, for example?
Another concern is that it will require reprocessing and updating the fact table for the past 7 days on each incremental load.
I know that's ok for dimensions, but is that ok for fact tables too?
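The three ConfirmationStatus rules listed earlier can be computed from the signup date, the confirmation date and the date the report (or load) is run. This is a sketch under two assumptions: "within 7 days" includes day 7 itself, and a confirmation only counts once it is known as of that date. The function name and its window keyword are hypothetical; the keyword is there so the same rule could be reused for other windows.

```ruby
require "date"

# Assumption: "within 7 days" includes day 7, i.e. confirmed_on <= signed_up_on + 7.
def confirmation_status(signed_up_on, confirmed_on, as_of, window: 7)
  deadline = signed_up_on + window
  # Only count a confirmation that has already happened as of the report date.
  confirmed = !confirmed_on.nil? && confirmed_on <= as_of
  if confirmed && confirmed_on <= deadline
    "Confirmed"
  elsif as_of <= deadline
    "Pending"
  else
    "Not Confirmed" # includes people who confirmed after the window
  end
end

signed = Date.new(2024, 1, 1)
p confirmation_status(signed, Date.new(2024, 1, 3), Date.new(2024, 1, 15))  # => "Confirmed"
p confirmation_status(signed, nil, Date.new(2024, 1, 5))                    # => "Pending"
p confirmation_status(signed, Date.new(2024, 1, 12), Date.new(2024, 1, 15)) # => "Not Confirmed"
```

Under these rules "Confirmed" and "Not Confirmed" are final, so only rows still "Pending" can change status on a later load.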
So my questions are
Is option #2 a good solution or is there a better way to do this?
Is it ok to update fact tables or is that discouraged?
Overall my question is: What am I missing?
This seems like a very common thing. One obvious example would be an order star that has fact table columns for AmountOrdered, AmountPaid, AmountRefunded and dimensions like "Order Status", "Paid Status" and "Refunded Status".
But none of my searches have resulted in answers to this common problem. Surely there must be a term for the problem and a pattern name for the solution where I can learn more about it?

Rails using Views instead of Tables

I need to create a Rails app that will show and use our current CRM system's data. The thing is: I could just take Rails and use the current DB as the backend, but the table names and column names are the exact opposite of what Rails uses.
Table names:
+-------------+----------------+--------------+
| Resource | Expected table | Actual table |
+-------------+----------------+--------------+
| Invoice | invoices | Invoice |
| InvoiceItem | invoice_items | InvItem |
+-------------+----------------+--------------+
Column names:
+-------------+-----------------+---------------+
| Property | Expected column | Actual column |
+-------------+-----------------+---------------+
| ID | id | IniId |
| Invoice ID | invoice_id | IniInvId |
+-------------+-----------------+---------------+
I figured I could use Views to:
Normalize all table names
Normalize all column names
Make it possible to not use column aliases
Make it possible to use scaffolding
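If the view route is taken, one option is to keep the Rails-style names and the legacy names in a single mapping and generate the view DDL from it, so the renaming lives in one place. This is a sketch, not a tested migration: the LEGACY hash and view_sql helper are hypothetical, and it assumes IniId and IniInvId are columns of InvItem (the question does not say which table they belong to). Quoting identifiers with double quotes follows the SQL standard; some databases expect a different quoting style.

```ruby
# Hypothetical mapping from Rails-style names to the legacy CRM names.
# Assumption: IniId and IniInvId are columns of the InvItem table.
LEGACY = {
  "invoice_items" => { table: "InvItem", columns: { "id" => "IniId", "invoice_id" => "IniInvId" } }
}.freeze

# Build one CREATE VIEW statement that renames every legacy column.
def view_sql(rails_table, spec)
  cols = spec[:columns].map { |rails_col, legacy_col| "\"#{legacy_col}\" AS #{rails_col}" }
  "CREATE VIEW #{rails_table} AS SELECT #{cols.join(', ')} FROM \"#{spec[:table]}\";"
end

LEGACY.each { |name, spec| puts view_sql(name, spec) }
```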
But there's a big but:
If I do it at the database level, Rails will probably not be able to build SQL properly
The app will probably be read-only, unless I don't use views and instead create a different DB and sync them eventually
Those disadvantages are probably even worse when you compare it to just plain aliasing.
And so I ask - is Rails able to somehow transparently know the id column is in fact id, but is InvId in the database and vice versa? I'm talking about complete abstraction - simple aliases just don't cut it when using joins etc. as you still need to use the actual DB name.

Ruby on Rails: Join Tables Concept

So I have been out of the coding game for a while and recently decided to pick up Rails. I have a question about the concept of join tables in Rails. Specifically:
1) why are these join tables needed in the database?
2) Why can't I just JOIN two tables on the fly like we do in SQL?
A join table cleanly links two independent tables. Join tables reduce data duplication while making it easy to find relationships in your data later on.
E.g. if you compare a table called users:
| id | name |
-----------------
| 1 | Sara |
| 2 | John |
| 3 | Anthony |
with a table called languages:
| id| title |
----------------
| 1 | English |
| 2 | French |
| 3 | German |
| 4 | Spanish |
You can see that both truly exist as separate concepts. Neither is subordinate to the other in the way that a single user may have many orders (where each order row might store a foreign key, user_id, referencing the user who made it).
When a language can have many users, and a user can have many languages -- we need a way to join them.
We can do that by creating a join table, such as user_languages, to store every link between a user and the language(s) that they speak, with one row for each user/language pair:
| id | user_id | language_id |
------------------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 4 |
| 4 | 2 | 1 |
| 5 | 3 | 1 |
With this data we can see that Sara (user_id: 1) is trilingual, while John (user_id: 2) and Anthony (user_id: 3) only speak English.
By creating a join table in-between both tables to store the linkage, we preserve our ability to make powerful queries in relation to data on other tables. For example, with a join table separating users and languages it would now be easy to find every User that speaks English or Spanish or both.
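To make these lookups concrete, here is a plain-Ruby sketch that follows the user_languages rows the way a SQL JOIN through the join table would. The constant and method names are hypothetical; the data is the three tables above.

```ruby
# The three tables from the example, held in memory.
USERS = { 1 => "Sara", 2 => "John", 3 => "Anthony" }.freeze
LANGUAGES = { 1 => "English", 2 => "French", 3 => "German", 4 => "Spanish" }.freeze
# user_languages rows: [id, user_id, language_id]
USER_LANGUAGES = [[1, 1, 1], [2, 1, 2], [3, 1, 4], [4, 2, 1], [5, 3, 1]].freeze

# Follow the join rows from one user_id to the titles of that user's languages.
def languages_of(user_id)
  USER_LANGUAGES.select { |_, u, _| u == user_id }.map { |_, _, l| LANGUAGES[l] }
end

# Users linked to at least one of the given language titles (an OR query), without duplicates.
def users_speaking_any(*titles)
  wanted = LANGUAGES.select { |_, title| titles.include?(title) }.keys
  USER_LANGUAGES.select { |_, _, l| wanted.include?(l) }.map { |_, u, _| USERS[u] }.uniq
end

p languages_of(1)                          # => ["English", "French", "Spanish"]
p users_speaking_any("English", "Spanish") # => ["Sara", "John", "Anthony"]
```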
But where join tables get even more powerful is when you add new tables. If in the future we wanted to link languages to a new table called schools, we could simply create a new join table called school_languages. Even better, we can add this join table without needing to make any changes to the languages SQL table itself.
As Rails models, the data relationship between these tables would look like this:
User --> user_languages <-- Language --> school_languages <-- School
By default, every school and user would be linked to Language using the same language_id values.
This is powerful: because the two join tables (user_languages and school_languages) reference the same unique language_id, it is now easy to write queries about how either relates to the other. For example, we could find all schools that speak the language(s) of a user, or all users who speak the language(s) of a school. As our tables expand, we can ride the joins to find relations about pretty much anything in our data.
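A sketch of riding both joins through the shared language_id, in plain Ruby. The schools and school_languages rows (North High, East High) are made up for illustration, and the helper names are hypothetical. The same two helpers work for either join table because both store rows of the form [id, owner_id, language_id].

```ruby
USER_NAMES = { 1 => "Sara", 2 => "John", 3 => "Anthony" }.freeze
# user_languages rows from the example: [id, user_id, language_id]
USER_LANGUAGE_ROWS = [[1, 1, 1], [2, 1, 2], [3, 1, 4], [4, 2, 1], [5, 3, 1]].freeze
# Hypothetical schools and school_languages rows: [id, school_id, language_id]
SCHOOL_NAMES = { 1 => "North High", 2 => "East High" }.freeze
SCHOOL_LANGUAGE_ROWS = [[1, 1, 1], [2, 1, 4], [3, 2, 3]].freeze # North: English, Spanish; East: German

# language_ids linked to one owner (a user or a school) through a join table.
def language_ids(join_rows, owner_id)
  join_rows.select { |_, owner, _| owner == owner_id }.map { |_, _, lang| lang }
end

# Names of owners linked through join_rows to any of the given language_ids.
def owners_with_languages(join_rows, names, lang_ids)
  join_rows.select { |_, _, lang| lang_ids.include?(lang) }.map { |_, owner, _| names[owner] }.uniq
end

# Users who speak a language of school 1, and schools that share a language with Sara.
p owners_with_languages(USER_LANGUAGE_ROWS, USER_NAMES, language_ids(SCHOOL_LANGUAGE_ROWS, 1))
# => ["Sara", "John", "Anthony"]
p owners_with_languages(SCHOOL_LANGUAGE_ROWS, SCHOOL_NAMES, language_ids(USER_LANGUAGE_ROWS, 1))
# => ["North High"]
```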
tl;dr: Join tables preserve relations between separate concepts, making it easy to make powerful relational queries as you add new tables.

Order by A then B using Ruby on Rails Model

This is not a homework question. I am trying to learn more.
I have the following entities with attributes
Manufacturer {name} //Store manufacturers
Model {manufacturer_id, name} //Store Models
Tint {manufacturer_id, model_id, front, side, rear} //Store measurements
I have the following data in my Tint entity. The letters stand for different manufacturer and model names.
Manufacturer | Model | Front | Side | Rear |
-------------+-------+-------+------+-------
A | AD | 10 | 10 | 10 |
B | AB | 10 | 10 | 10 |
A | AA | 10 | 10 | 10 |
A | AC | 10 | 10 | 10 |
B | AA | 10 | 10 | 10 |
A | AB | 10 | 10 | 10 |
When I print it out in the view, I would like it sorted by Manufacturer name and then by Model. So the result will be as below: manufacturer names sorted alphabetically, then model names.
Manufacturer | Model | Front | Side | Rear |
-------------+-------+-------+------+-------
A | AA | 10 | 10 | 10 |
A | AB | 10 | 10 | 10 |
A | AC | 10 | 10 | 10 |
A | AD | 10 | 10 | 10 |
B | AA | 10 | 10 | 10 |
B | AB | 10 | 10 | 10 |
I have set up the model to make sure Manufacturer and Model are a distinct pair of values.
My question is: since I am referencing manufacturer_id and model_id, how can I get the names of the Manufacturer and Model from the Manufacturer and Model tables?
In my tints_controller.rb, I have @tints = Tint.all.order(:manufacturer_id). However, it only sorts by manufacturer_id (as in numbers) instead of by the name of the manufacturer.
I know that I can do it the SQL way (SELECT, FROM, WHERE) in the RoR model. However, I would like to know whether it is possible to use ActiveRecord to sort the data by name.
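For comparison, the ordering itself is a two-key sort. This plain-Ruby sketch sorts the rows from the question once the names are known; the TINTS constant is hypothetical. In the app the sorting is better left to the database through joins and order, as in the answers below, and sorting by manufacturer_id and model_id orders by the numeric ids rather than by the names.

```ruby
# The rows from the question, with manufacturer and model names already resolved.
TINTS = [
  { manufacturer: "A", model: "AD", front: 10, side: 10, rear: 10 },
  { manufacturer: "B", model: "AB", front: 10, side: 10, rear: 10 },
  { manufacturer: "A", model: "AA", front: 10, side: 10, rear: 10 },
  { manufacturer: "A", model: "AC", front: 10, side: 10, rear: 10 },
  { manufacturer: "B", model: "AA", front: 10, side: 10, rear: 10 },
  { manufacturer: "A", model: "AB", front: 10, side: 10, rear: 10 }
].freeze

# Arrays compare element by element, so this sorts by manufacturer name first and
# breaks ties by model name, like ORDER BY manufacturers.name, models.name.
sorted = TINTS.sort_by { |t| [t[:manufacturer], t[:model]] }
sorted.each { |t| puts "#{t[:manufacturer]} | #{t[:model]}" }
```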
If I understand correctly, you have 3 models: Tint, Manufacturer and Model. I am assuming you have the appropriate has_many and belongs_to associations set up correctly.
Tint.rb
belongs_to :model
Manufacturer.rb
has_many :models
has_many :tints, through: :models
Model.rb:
belongs_to :manufacturer
has_many :tints
You need to first join the three models together, and then order by the names:
tints_controller.rb
@tints = Tint.joins(model: :manufacturer).order('manufacturers.name, models.name').pluck('manufacturers.name', 'models.name', 'tints.front', 'tints.side', 'tints.rear')
That will give you the values of every tint together with its model and manufacturer names. Note that pluck returns arrays of values, not Tint records.
Any time you have the id of an entity in Rails, you can easily retrieve other associated fields simply by instantiating that entity:
@manufacturer = Manufacturer.find(params[:manufacturer_id])
Then it's a simple matter to retrieve any of the other fields:
@manufacturer_name = @manufacturer.name
If you need a collection of manufacturers or manufacturer names, then it's advisable to build an ActiveRecord::Relation object immediately via a scoped query (as you already know). I don't know what your criteria are; otherwise I'd supply some sample code. I can tell you that your scoped query should include an .order clause at the end:
@manufacturers = Manufacturer.where("some_column = ?", some_criterion).order(:sort_field)
In the above example, :sort_field would be the field by which you want to sort your ActiveRecord::Relation. I'm guessing in your case it's :name.
All this having been said, if you want fancy sorted tables, you should look into the jQuery DataTables gem. DataTables can do a lot of the heavy lifting for you, and it's convenient for your users because they can then sort and re-sort by any column you present.
In your tints_controller.rb, instead of
@tints = Tint.all.order(:manufacturer_id)
please write:
@tints = Tint.all.order(:manufacturer_id, :model_id)
Answer to my question:
In tints_controller.rb, I wrote
@tints = Tint.joins(:manufacturer, :model).order("manufacturers.name ASC, models.name ASC") to join the tables and order them accordingly.
I tried the answer provided by @Goston above, and I had an issue when I was trying to edit the tints. It did not allow me to edit.
Note: the answer provided by @Goston will order them, but it broke the edit function in my case.

Retrieving data in Rails both ways from 2 column join table

I have a ChunkRelationship model with a table that looks like this:
+----+---------------+----------------+---------------------+---------------------+
| id | chunk_id | chunk_partner | created_at | updated_at |
+----+---------------+----------------+---------------------+---------------------+
| 1 | 1 | 2 | 2010-02-14 12:11:22 | 2010-02-14 12:11:22 |
| 2 | 2 | 1 | 2010-02-14 12:11:22 | 2010-02-14 12:11:22 |
+----+---------------+----------------+---------------------+---------------------+
Both columns are foreign keys to a Chunk model. Right now, the relationship is being saved twice, once in each direction (2 => 1 and 1 => 2). But the relationship could be saved once, because if one ID is known then the other can be found. (What is this type of table called?)
I am wondering what the Rails way of doing that would be. I was thinking of creating a before_validation callback on the ChunkRelationship model that takes the smaller of the two IDs and always saves it to the chunk_id column, which would make it easier to check for duplicates before saving. But from there I'm not sure how I would retrieve them.
The intended end result would be for chunk.partners to return all the rows paired with it, no matter which column either one is in.
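A minimal sketch of the "store each pair once" idea in plain Ruby, without ActiveRecord: add normalises every pair so the smaller id goes into chunk_id (what the proposed before_validation callback would do), and partners checks both columns, the in-memory equivalent of WHERE chunk_id = ? OR chunk_partner = ?. The ChunkRelationships class is hypothetical, and rejecting a chunk paired with itself is an added assumption.

```ruby
# Hypothetical in-memory version of the chunk relationship table.
class ChunkRelationships
  def initialize
    @rows = [] # each row: [chunk_id, chunk_partner] with chunk_id < chunk_partner
  end

  # Stores the pair once, smaller id first. Returns false for a self-link
  # (an assumption) or for a pair already stored in either direction.
  def add(a, b)
    return false if a == b
    pair = [a, b].minmax
    return false if @rows.include?(pair)
    @rows << pair
    true
  end

  # Partners of a chunk, whichever column it was stored in.
  def partners(id)
    @rows.select { |pair| pair.include?(id) }.map { |x, y| x == id ? y : x }
  end
end

rels = ChunkRelationships.new
rels.add(1, 2)
rels.add(2, 1) # same relationship in the other direction: ignored
rels.add(3, 1)
p rels.partners(1) # => [2, 3]
p rels.partners(3) # => [1]
```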
Perhaps you are looking for the has_and_belongs_to_many association: http://guides.rubyonrails.org/association_basics.html#the-has-and-belongs-to-many-association
This should create a many-to-many relationship which I believe you are describing.
