Should I use a model archive in rails - ruby-on-rails

I have a model product with a has_many relation prices. The prices table is growing rapidly. Only a few current prices are normally needed, but I want to keep all of them as a history.
So I am thinking of "archiving" all old prices. What is the best way to do that?
Previously I had a column old and filtered those rows out whenever I only wanted the current prices. But now the prices table has 2.5 million rows, and only 200k are needed in most situations. That's why I thought I would create a new model price_archive, copy all "old" prices to price_archive and delete them from prices. All logic would be moved to a module used by both models, so I can use price and price_archive in the same way.
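A minimal sketch of the shared-module idea, assuming plain Ruby Structs in place of the two ActiveRecord models. The module name PriceLogic, the label method and the amount attribute are made up for illustration; only price, price_archive and time come from the question.

```ruby
# Hypothetical sketch: behaviour shared by Price and PriceArchive.
# Plain Structs stand in for the two ActiveRecord models.
module PriceLogic
  # Formats a price the same way whether it is current or archived.
  def label
    format("%.2f @ %s", amount, time.strftime("%Y-%m-%d"))
  end
end

Price        = Struct.new(:amount, :time) { include PriceLogic }
PriceArchive = Struct.new(:amount, :time) { include PriceLogic }

current  = [Price.new(12.5, Time.utc(2013, 3, 1))]
archived = [PriceArchive.new(11.0, Time.utc(2012, 7, 1)),
            PriceArchive.new(11.75, Time.utc(2013, 5, 1))]

# Full history across both collections, ordered by time.
history = (current + archived).sort_by(&:time)
history.each { |price| puts price.label }
```

Because both classes include the same module, the merged history can be treated as one list without checking which table a row came from.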
Pros for the archive approach:
Most of the queries are done on the smaller data set (200k rows, growing slowly)
Cons:
Displaying both ordered by time requires sorting the combined data set, because the times overlap. So it looks like (part.prices.to_a + part.prices_archive.to_a).sort_by(&:time). Not a big problem, because this will be needed very seldom. But:
I have other models (e.g. order) that use prices in a belongs_to relation, so those need both price_id and price_archive_id (with one id always being nil) so that they still reference a price.
Most queries are: show all prices for product (in a select box) and mark the price that is connected to this order (or add it to the select box, when it is archived)
So the code would be something like:
Order.where(*where*).includes(:price, :price_archive, :part => :prices)
The db will query: prices WHERE part_id = ? [on 200k] + prices WHERE id = ? [on 200k] + price_archives WHERE id = ? [on 2300k, but with primary_key]
instead of prices WHERE part_id = ? [on 2500k, with normal index]
Is there a better way or should I stay with the old column?

Related

ActiveRecord - Using a float field to 'group & order' items

I'm working on a project built on Rails 4, ActiveRecord and PostgreSQL and am faced with a performance dilemma:
For brevity, let's say I have Category & Item models. Category has_many items.
Let's take the example where category 'Furniture' has 'bed, large mattress, small mattress, armchair', etc. While displaying these items under the category, we would intuitively want to see all kinds of mattresses and bed frames together, instead of being lexicographically ordered. Also, let's assume the total number of items under any category is in the order of < 100 (mostly about ~10-15 per category) & so naturally, the order of items falling in the same 'group' under a category would be much lower than that.
To achieve this grouping, one way is to create a SubCategory model and associate items through them, so we can add items of a certain group later on and still be able to show them together by grouping on the category & sub category.
The other way I'm thinking of, since the order of total items is so small, is to add an order (float type) field to the Item model to still be able to group them together (Bed = 5.01, Mattress = 5.02, Chair = 6.01, Bed Cover = 5.03 & so on).
The only reason I'm considering the other option is that we're confident the number of items will not go beyond 100 in our application's scope, so the Sub Category route (creating a new model and persisting many columns vs one) seems like overkill for this particular case.
So my question (finally!) is this -
What kind of pitfalls might I run into if I went the second route? Moreover, is sorting on a float field with Postgres an overall better tradeoff on speed and memory vs adding a new model to simulate sub groupings such as mentioned in the above example?
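The float-based ordering described above can be sketched in plain Ruby. This is only an illustration: the Struct and the attribute name position are assumptions (the question calls the field order), and the values are the ones from the example.

```ruby
# Hypothetical Item with a float `position` field (the question calls it `order`).
Item = Struct.new(:name, :position)

items = [
  Item.new("Chair", 6.01),
  Item.new("Bed Cover", 5.03),
  Item.new("Bed", 5.01),
  Item.new("Mattress", 5.02)
]

# Sorting by the float keeps items of one group next to each other.
sorted = items.sort_by(&:position)
puts sorted.map(&:name).inspect

# The integer part of the float acts as an implicit group key.
groups = sorted.group_by { |item| item.position.floor }
groups.each { |group, members| puts "#{group}: #{members.map(&:name).join(', ')}" }
```

One thing the sketch makes visible: the group exists only as the integer part of the number, and 5.1 and 5.10 are the same float, so the number of distinct slots per group is fixed by how many decimal places you decide to use.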

How to store data in fact table with multiple products in an order in data warehouse

I am trying to design a dimensional model for a data warehouse for one of my projects (Sales Order). I'm new to this concept.
So far, I could understand that the product, customer and date can be stored in the dimension table and the order info will be in the fact table.
Date_dimension table structure will be
date_dim_id, date, week_number, month_number
Product_dimension table structure will be
product_dim_id, product_name, desc, sku
Order_fact table structure will be
order_id, product_dim_id(fk), date_dim_id(fk), order_quantity, order_total_price, etc
If an order is placed with 2 or more products, will there be repeated entries in the order_fact table for the same order_id and date_dim_id?
Please help on this. I'm confused here. I know that in a relational database, order table will have one entry per order and relation between the product and order will be maintained in a different table having the order_id and product_id as the foreign key.
Thanks in advance.
This is a classic case where you should (probably) have two fact tables:
FactOrderHeader and FactOrderDetail.
FactOrderHeader will have a record for each order, storing information regarding the value of the order and any order level discounts; though they could be expressed as an OrderDetail record in some cases.
FactOrderDetail will have a record for each order line, storing information regarding the product, product cost, product sale price, number of items, item discount, etc.
You may need to have a DimOrderHeader as well, if there are non-Fact pieces of information that you want to store, for example, date the order was taken, delivered, paid.
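As a rough illustration of the header/detail split, here is a plain Ruby sketch. The column names, ids and amounts are invented for the example; only the two table names come from the answer.

```ruby
# Hypothetical rows; column names are illustrative, not from the answer.
FactOrderDetail = Struct.new(:order_id, :product_dim_id, :date_dim_id, :quantity, :line_total)
FactOrderHeader = Struct.new(:order_id, :date_dim_id, :order_total)

# One order (id 1001) with two products produces two detail rows...
details = [
  FactOrderDetail.new(1001, 7, 20120220, 2, 40.0),
  FactOrderDetail.new(1001, 9, 20120220, 1, 15.5)
]

# ...and exactly one header row, whose total is derived from the lines.
headers = details.group_by(&:order_id).map do |order_id, lines|
  FactOrderHeader.new(order_id, lines.first.date_dim_id, lines.sum(&:line_total))
end

headers.each { |h| puts "order #{h.order_id}: total #{h.order_total}" }
```

So the order_id and date_dim_id do repeat in the detail table, once per order line, while the header table keeps one row per order.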

Granularity in Star Schema leads to multiple values in Fact Table?

I'm trying to understand star schema at the moment & struggling a lot with granularity.
Say I have a fact table that has session_id, user_id, order_id, product_id and I want to roll-up to sessions by user by week (keeping in mind that not every session would lead to an order or a product & the DW needs to track the sessions for non-purchasing users as well as those who purchase).
I can see no reason to track order_ids or session_ids in the fact table so it would become something like:
week_date, user_id, total_orders, total_sessions ...
But how would I then track product_ids if a user makes more than one purchase in a week? I assume I can't keep multiple product ids in an array (eg: "20/02/2012","5","3","PR01,PR32,PR22")?
I'm thinking it may have to be kept at 'every session' level but that could lead to a very large amount of data. How would you implement granularity for an example such as above?
Dimensional modelling requires Dimensions as well as Facts.
You need a Date/Calendar dimension, which includes columns like this:
calendar (id,cal_date,cal_year,cal_month,...)
The "grain" of your fact table is the key to data storage. If you have transactions, then the transaction should be the grain, and you store one row per transaction. Use proper (integer) surrogate keys to your dimensions, and your table won't be as large as you fear.
Now you can write a query like this, to sum sales of product by year:
select product_name,cal_year,sum(purchase_amount)
from fact_whatever
inner join calendar on calendar.id = fact_whatever.calendar_id
inner join product on product.id = fact_whatever.product_id
group by product_name,cal_year

Database - Should total price be stored in a column?

I have orders and items tables. Here's the association:
# order.rb
has_many :items
# item.rb
belongs_to :order
What I'm wondering is: Should I keep the total price in the order table? The total price is summed from each item.
I need to generate a chart and the only data needed is the total price.
My app is still in local development so I haven't had any performance issues. But I'm guessing it will start having an impact once there are a few hundred orders.
I suggest that you keep the total price in the orders table as well.
One common thing about shops is that prices change. And when they change they create a huge mess (in both accounting and stats) unless you keep them somewhere other than the product itself.
There is no need to worry about performance here.
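A small sketch of why a stored total helps when prices change. It assumes plain Ruby objects in place of the models and prices held as integer cents; the Product class and all values are hypothetical.

```ruby
# Hypothetical stand-ins for the models; prices are integer cents.
Product = Struct.new(:name, :price)
Item    = Struct.new(:product, :quantity)

class Order
  attr_reader :items, :total_price

  def initialize(items)
    @items = items
    # Computed once and kept, like a total_price column on orders.
    @total_price = items.sum { |item| item.product.price * item.quantity }
  end
end

widget = Product.new("Widget", 1000)
order  = Order.new([Item.new(widget, 3)])

# The shop raises the price after the order was placed.
widget.price = 1200

recomputed = order.items.sum { |item| item.product.price * item.quantity }
puts "stored: #{order.total_price}, recomputed: #{recomputed}"
```

The stored value still reflects what was charged at order time, while a total recomputed from the current product price no longer does.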

RoR and Relational databases: handling Model default values in database

I'm working on a RoR project for work, and I'm having trouble deciding about the design of my relational database tables.
Consider the following:
I've got a model Product, each product has a unique name.
I've also got a model called Shop, each shop has many products.
Finally, I have an Order model, Order is obviously connected to the shop which the order has been made from, and to the list of products which were ordered.
I would like to keep default values (e.g. default price) for each product, and I'd like each Shop to be able to overwrite those default values if needed, but can't really decide on the strategy of doing so.
What I have in mind is as follows:
Create a Product table, which will include the product name, and also, columns to keep the product's default values (e.g. price)
Create a Shop table, which will include everything which has to do with the shop.
Create a Product_To_Shop table, which will hold the product quantity for that exact shop, and will hold additional columns, which match the Product default values columns which will let the shop overwrite the default product related values.
Now when I'd like to get the price for a specific order, I'll first check the Product_To_Shop table for the related Product and Shop and look at the Price field of the matching row. In case it's not set to a value (nil), I'll go to the Product table and fetch the default price for the relevant product.
The whole thing looks a bit complex for a task that seems fairly trivial.
I was wondering if anyone has had to deal with keeping default values in the database like that and has a more elegant solution, since this one seems like overkill...
You can do the following:
Create a Products table, which will include the products data ( but no prices).
Create a Shops table, which will include the shops data.
Create a Prices table, which will include Product_id, Shop_id, Price.
Shop_id defaults to null; a row with a null Shop_id holds the default price for that product.
When you need the price, take the row matching the shop_id if there is one, otherwise the row where Shop_id is null.
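The lookup rule could be sketched like this in plain Ruby. It is only an illustration under assumptions: the rows live in an in-memory array instead of a Prices table, prices are integer cents, and price_for is a made-up helper name.

```ruby
# Hypothetical in-memory stand-in for the Prices table.
PriceRow = Struct.new(:product_id, :shop_id, :price)

PRICES = [
  PriceRow.new(1, nil, 1000), # default price for product 1
  PriceRow.new(1, 42, 900),   # shop 42 overrides it
  PriceRow.new(2, nil, 500)   # product 2 has only a default
]

# Prefer the shop-specific row; fall back to the default row (shop_id nil).
def price_for(product_id, shop_id, rows = PRICES)
  matching = rows.select { |row| row.product_id == product_id }
  row = matching.find { |r| r.shop_id == shop_id } ||
        matching.find { |r| r.shop_id.nil? }
  row&.price
end

puts price_for(1, 42).inspect # shop-specific price
puts price_for(1, 7).inspect  # falls back to the default
puts price_for(3, 42).inspect # unknown product, no row at all
```

With this layout there is a single place to look for a price, and the default is just another row rather than a column on the product.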
