What normal form is a star schema model in a data warehouse?

In data warehouse dimensional modelling, what normal form is a star schema in?

It isn't in one. It is a denormalised model.

Logically, every star schema is in second normal form, because:
"A relation that is in First Normal Form, and in which every non-primary-key attribute is fully functionally dependent on the primary key, is in Second Normal Form."

Related

Rails - How to Model Dynamic Form Fields?

I have a request model. A request has one classification. What I want to set up is to store a bunch of form fields in the DB: their types, names, etc. Different classifications will have different form fields for the user to fill out on a request form. So ultimately a user creates a new request with classification C and is presented with a form with the appropriate fields for classification C.
I would like the values stored in a table with the request. My question is how should this be modeled?
Request has one classification.
Classification has_many requests.
I'm just not sure what to do with the dynamic form fields. I would like to be able to create the fields and attach them to the classification. So if first name and last name are needed fields, I wouldn't have to create them for every classification. I would just create them once and associate them with a classification through a join table.
Looking for advice on how to model this out and be able to easily reference them from a request.
Thanks! Any info or thoughts are appreciated.
I would say that you should first try to model it according to the relational model as far as possible.
# beware of potential conflicts with this name as it clashes with core method in controllers
class Request < ApplicationRecord
  has_many :classifications
end
class Classification < ApplicationRecord
  belongs_to :request
end
Model everything you know you can normalize. It's usually more than you think.
Data that doesn't adhere to a fixed schema can then be handled in a few ways:
Just define all the fields and live with a few nulls here and there.
The Entity–attribute–value (EAV) pattern. This classic approach consists of a separate table where each row represents a value for a classification, e.g. rails g model ClassificationAttribute classification:references attr_name attr_value. This is largely made obsolete by JSON data types.
A JSON/JSONB column. This additional column would be used to shove any unstructured data that cannot be normalized.
Serialized data columns. This is also made obsolete by JSON/JSONB.
All of these can be combined with the Single Table Inheritance pattern.
If classification can be broken down into a limited number of variants, you could consider Multiple Table Inheritance, where you store the base data in the classification table and then use separate tables for the more specific data. The Rails delegated_type feature can be used for this.
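To make the "define fields once, attach them to classifications, store values as JSON" idea concrete, here is a minimal plain-Ruby sketch rather than actual Rails code. The field names, the classification names ("support", "purchase") and both helper methods are made up for illustration; in an app the two hashes would be tables and the JSON string would live in a JSON/JSONB column on the request.

```ruby
require "json"

# Hypothetical field definitions, created once and shared between classifications.
fields = {
  first_name: { label: "First name", type: :string },
  last_name:  { label: "Last name",  type: :string },
  budget:     { label: "Budget",     type: :integer }
}

# Stand-in for a join table: which fields each classification uses.
classification_fields = {
  "support"  => [:first_name, :last_name],
  "purchase" => [:first_name, :last_name, :budget]
}

# Returns the field definitions to render on the form for one classification.
def fields_for(classification, fields, classification_fields)
  classification_fields.fetch(classification).map { |name| [name, fields.fetch(name)] }.to_h
end

# Keeps only the submitted values that belong to the classification and
# serializes them the way a JSON/JSONB column would store them.
def dump_values(classification, params, classification_fields)
  allowed = classification_fields.fetch(classification)
  JSON.generate(params.slice(*allowed))
end

form = fields_for("support", fields, classification_fields)
stored = dump_values("purchase", { first_name: "Arthur", budget: 500, admin: true }, classification_fields)
puts form.keys.inspect
puts stored
```

The point of the sketch is that unknown keys (admin here) are dropped before anything is stored, so the JSON column only ever holds values for fields that the classification actually declares.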
Your question is quite confusing, and it is hard to understand what you are trying to achieve. But a few remarks:
You say "Request has one classification. Classification has_many requests". But if a Request has one classification, then Classification should belong to Request. That way the Classification model holds a field called request_id (the foreign key) that lets ActiveRecord link the two models together. (The child model is the one holding the foreign key.)
If each is the parent of the other (has_one or has_many), then where is the foreign key?
Dynamic fields are not really possible. Your database is hard-coded: each field is declared in the relational database, and Rails' ActiveRecord lets you access and validate it easily. There is a workaround, though: have one of the models hold a JSON or JSONB field. Instead of one of the common types (string, text, integer, ...), the column is of JSON type and holds a value that Rails converts to a hash:
{
first_name: "Arthur",
last_name: "Smith",
age: "23"
}
This is pretty convenient for shopping carts, as you can save an actual list of items rather than an association. With an association you would need to version changes to your items (when the price of an item changes, for example), which takes some good engineering.
The question is: is this what you really want to do? This option doesn't fit every app or use case.
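The shopping-cart remark can be illustrated with a small plain-Ruby sketch. The product, its price and the key names are invented for the example; the JSON string stands in for what a JSON column on the cart would contain.

```ruby
require "json"

# Hypothetical catalogue entry; its price may change later.
product = { "name" => "Mug", "price_cents" => 900 }

# Copy the item data into the cart at save time, as a JSON column would hold it.
cart_json = JSON.generate({ "items" => [product.merge("quantity" => 2)] })

# A later price change in the catalogue...
product["price_cents"] = 1200

# ...does not touch the snapshot stored with the cart.
cart = JSON.parse(cart_json)
total = cart["items"].sum { |item| item["price_cents"] * item["quantity"] }
puts total
```

With an association instead of a snapshot, the cart would read the current catalogue price, which is why that route needs item versioning.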
Also, you say the request depends on the classification. I mentioned the foreign key problem above. But it seems odd that the behavior of one of your records is set by a direct relationship. Who creates the classification? Is it one of the app models, such as the User? An Admin? Or is it seeded by the app creator (in which case Classification is a standalone model)? In that case the classification exists before the Request, and maybe a has_and_belongs_to_many association (a join table) would fit better...
Maybe give us a clearer view of what you want to achieve, with real-life examples, so we can help further.

Should we put all the fields related to the `user` in a `dim_user` table in a data warehouse?

Consider a data warehouse that contains one fact table and three dimension tables.
Fact table:
fact_orders
Dimension tables:
dim_user
dim_product
dim_date
All the data of these tables are extracted from our business systems.
In the business system, the user has many attributes. Some of them can change over time (mobile, avatar_url, nick_name, status), while others won't change once the record is created (id, gender, register_channel).
So generally in the dim_user table, which fields should we use and why?
dim_user should have both the changeable and the unchangeable fields. In a denormalized model, it is preferable to keep all the related attributes of a dimension in a single table.
Also, it is preferable to keep all the available information about the user in the dimension table, as it might be used for reporting. If an attribute won't be needed for reporting, you can skip it.
If you want to keep the history of changes to the user, consider implementing slowly changing dimensions (SCD Type 2, for example, adds a new row for each change). Otherwise, you can simply overwrite the dimension attributes as and when they change; this is called SCD Type 1.
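As a rough illustration of the difference between overwriting (Type 1) and keeping history (Type 2), here is a plain-Ruby sketch over an in-memory dim_user. The user_key, valid_from and valid_to columns and both helper methods are assumptions made for the example, not something given in the question; a real warehouse would do this in its ETL tool or in SQL.

```ruby
require "date"

# SCD Type 1: overwrite the attribute in place; no history is kept.
def apply_type1(rows, user_id, changes)
  rows.map { |row| row[:user_id] == user_id ? row.merge(changes) : row }
end

# SCD Type 2: close the current row and append a new version, so facts
# loaded earlier still point at the attribute values valid at that time.
def apply_type2(rows, user_id, changes, on)
  current = rows.find { |row| row[:user_id] == user_id && row[:valid_to].nil? }
  closed = current.merge(valid_to: on)
  fresh = current.merge(changes).merge(
    user_key: rows.map { |row| row[:user_key] }.max + 1, # new surrogate key
    valid_from: on,
    valid_to: nil
  )
  rows.map { |row| row.equal?(current) ? closed : row } + [fresh]
end

dim_user = [
  { user_key: 1, user_id: 42, gender: "F", nick_name: "ann",
    valid_from: Date.new(2020, 1, 1), valid_to: nil }
]

type1 = apply_type1(dim_user, 42, { nick_name: "annie" })
type2 = apply_type2(dim_user, 42, { nick_name: "annie" }, Date.new(2021, 6, 1))
puts type1.length
puts type2.length
```

Type 1 leaves a single row with the new nick_name. Type 2 leaves two rows: the old one closed on the change date and a new one with its own surrogate key, which is what fact_orders rows loaded after the change would reference.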

Split fact table because of one missing foreign key?

Imagine that we have two different messages:
CarDataLog
CarStatusLog
CarDataLog contains data which has a direct relation to a car and the corresponding Person and contains data about the car.
CarStatusLog contains data about the same car as mentioned above, which had a customer included in the log. But this time the data is a status, for example a field like "CleaningState": "NotCleaned" or "Cleaned".
Both log messages contain a Car_ID. Should we create one fact table with foreign keys to Car and Person, and accept the risk that person_id is sometimes null because it is not given? Or would a better approach be to create two fact tables, at the risk of having the 'grain' spread out?
The use case would be: get data for a specific car, including the states it had and the Person first name.
I am new to data warehousing and I hope someone can assist me with this issue?
A standard practice in data warehousing is to make a dummy row for dimension tables that is used to match "UNKNOWN" data. This prevents NULLS in the foreign keys in the fact table.
Depending on your use case, you may have multiple types of "UNKNOWN" data. For example, you could use a key of -1 for "UNKNOWN" and -2 for "NOT APPLICABLE" dimensional data.
See also: https://www.kimballgroup.com/2010/10/design-tip-128-selecting-default-values-for-nulls/
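A minimal plain-Ruby sketch of the dummy-row idea follows. The keys -1 and -2 are the ones suggested above; the column names, the sample person and the person_key_for helper are invented for illustration.

```ruby
# dim_person with two reserved rows for missing data.
dim_person = [
  { person_key: -2, person_id: nil,     first_name: "NOT APPLICABLE" },
  { person_key: -1, person_id: nil,     first_name: "UNKNOWN" },
  { person_key: 1,  person_id: "P-100", first_name: "Ada" }
]

# Resolves a source person_id to a surrogate key, falling back to the
# UNKNOWN row (-1) instead of leaving a NULL in the fact table.
def person_key_for(dim_person, person_id)
  match = dim_person.find { |row| !person_id.nil? && row[:person_id] == person_id }
  match ? match[:person_key] : -1
end

log_messages = [
  { car_id: "C-1", person_id: "P-100" },
  { car_id: "C-1", person_id: nil } # e.g. a status log without a person
]

fact_rows = log_messages.map do |msg|
  { car_id: msg[:car_id], person_key: person_key_for(dim_person, msg[:person_id]) }
end
puts fact_rows.inspect
```

Every fact row now joins to some dim_person row, so an inner join on person_key does not silently drop the status messages that arrived without a person.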
You need dimensions such as Car_dim, Person_dim, Status_dim (with CleaningState values such as "NotCleaned" or "Cleaned"), and Date_dim. Person_dim can have a row with an "Unknown" person name for when you get a null person name.
Dim and Fact tables have a parent/child relationship, which means you have to load data into the Dim first (the Dim is the parent) and then load the Fact (child) table.
Load the dim IDs from the Dims above into your Fact table based on the data you get. Make sure the two logs have date fields in them, so you can join both logs on Car_id where the dates in both logs match for that Car_id.
If you get a scenario when a Car_id exists in CarDataLog but not in CarStatusLog, then you need to create a row of "Unknown Status" in the Status_dim so you can use it in the Fact table. Good Luck!

Ruby on Rails: Saving multiple values in a single database cell

How do I save multiple values in a single cell record in Ruby on Rails applications?
If I have a table named Exp with columns named: Education, Experience, and Skill, what is the best practice if I want users to store multiple values such as: education institutions or skills in a single row?
I'd like to have users use multiple text fields, but the values should go into the same cell.
For instance if user has multiple skills, those skills should be in one cell? Would this be best or would it be better if I created a new table for just skills?
Please advise,
Thanks
I would not recommend storing multiple values in the same database column. It would make querying very difficult. For example, if you wanted to look for all the users with a particular skill set, the query would be clumsy in both readability and performance.
However, there are still certain cases where it makes sense.
When you want to allow for a variable list of data points
When you are not going to query the data based on one of the values in the list
ActiveRecord has built-in support for this. You can store Hash or Array in a database column.
Just mark the columns as Text
rails g model Exp experience:text education:text skill:text
Next, serialize the columns in your Model code
class Exp < ActiveRecord::Base
  # serialize takes one attribute per call
  serialize :experience
  serialize :education
  serialize :skill
  # other model code
end
Now, you can just save the Hash or Array in the database field!
Exp.new(:skill => ['Cooking', 'Singing', 'Dancing'])
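For a feel of what ends up in the text column, here is a plain-Ruby sketch without ActiveRecord. It assumes the YAML coder, which serialize has historically used by default (newer Rails versions may ask you to name the coder explicitly).

```ruby
require "yaml"

skills = ["Cooking", "Singing", "Dancing"]

# What ends up in the text column: a YAML document, not a native array.
column_value = YAML.dump(skills)
puts column_value

# Reading the record parses the text back into a Ruby array.
restored = YAML.load(column_value)
puts restored.inspect

# Finding everyone with "Singing" means parsing every row's text (or doing a
# fragile substring match); the database cannot index an individual skill.
```

That last comment is the trade-off mentioned at the start of this answer: the round trip is convenient, but the values are opaque to SQL.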
You can do it using a serialized list in a single column (comma-separated), but it is a really bad idea. Read these answers for the reasoning:
Is storing a delimited list in a database column really that bad?
How to store a list in a column of a database table
I suggest changing your schema to have a one to many relationship between users and skills.
Rails 4 with PostgreSQL comes with hstore support out of the box. In Rails 3 you can use a gem to enable it.
It depends on what kind of functionality you want. If you want to bind the Exp model attributes to a form (for new and update operations) and put some validations on them, it is always better to keep them in a separate table. On the other hand, if these are just attributes that you only need stored in the database, keep them in a single column. There is a way to keep serialized objects such as arrays and hashes in database columns. Make them an array or a hash as per your need and save it like this:
http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html#method-i-serialize
Serialized attributes are automatically deserialized when they are read from the table and serialized again when they are saved.

Rails model to represent multiple fields

I'm developing a rails project where I have one data model with multiple fields that are collection selects. I'd like to create another model to represent all of these collection select fields. So, for instance, my main data model has three collection select fields -- one for county, one for category, and one for classification. I could separate these into three separate data models, but that seems redundant since they all share the same characteristics. They have a type and a value, like a county is a county and it has a value of let's say Sonoma, just as category has a type of category and a value of let's say Winery. If you've ever used Drupal, I'm basically looking for the behavior of the taxonomy functionality.
So you see my dilemma: I need to separate these fields into three separate fields but they have very similar data structures. Any suggestions would be greatly appreciated.
This is a perfect case for single-table inheritance. Your problem is screaming for it.
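The idea behind single-table inheritance is one shared table with a type column that tells the kinds apart. Here is a plain-Ruby sketch of just that idea, not of Rails STI itself: the terms name, the "Retail" value and the terms_of helper are made up, and in Rails the type column would be filled in by subclasses of one base model.

```ruby
# Rows of one shared table; the "type" column distinguishes the kinds.
terms = [
  { id: 1, type: "County",         value: "Sonoma" },
  { id: 2, type: "Category",       value: "Winery" },
  { id: 3, type: "Classification", value: "Retail" }
]

# Options for one collection select, e.g. the county dropdown.
def terms_of(terms, type)
  terms.select { |term| term[:type] == type }.map { |term| term[:value] }
end

puts terms_of(terms, "County").inspect
puts terms_of(terms, "Category").inspect
```

All three selects read from the same structure, so adding a fourth kind of taxonomy means adding rows with a new type rather than a new table.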
