Rating System Database Structure

I have two entity groups, Restaurants and Users. Restaurants can be rated (1-5) by users, and the rating from each user should be retrievable.
Restaurant (id, name, ..... , total_number_of_votes, total_voting_points)
User (id, name ...... )
Rating (id, restaurant_id, user_id, rating_value)
Do I need to store the average value so that it need not be calculated every time? Which table is the best place to store avg_rating, total_no_of_votes and total_voting_points?

Well, if you store the average value somewhere, it will only be accurate as of the last time you calculated it (i.e. you have 5 reviews and store the average somewhere; you then get 5 more reviews, and your saved average is now incorrect).
My opinion is that this sort of logic is perfectly suited to the middle tier. Calculating an average shouldn't be very resource intensive, and really shouldn't impact performance.
If you really want to store it in the database, I would probably keep those values in their own table and update them via triggers. However, this could be even more resource intensive than calculating the average in the middle tier.
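For illustration, here is a minimal PostgreSQL sketch of the trigger approach, assuming a ratings table shaped like the Rating entity above and a separate summary table (the table, column and function names are hypothetical, and it only handles inserted ratings, not updates or deletes):

-- Hypothetical summary table holding the trigger-maintained aggregates.
CREATE TABLE restaurant_rating_summary (
    restaurant_id integer PRIMARY KEY REFERENCES restaurants(id),
    total_votes   integer NOT NULL DEFAULT 0,
    total_points  integer NOT NULL DEFAULT 0,
    avg_rating    numeric(3,2)
);

CREATE OR REPLACE FUNCTION update_rating_summary() RETURNS trigger AS $$
BEGIN
    -- Upsert the aggregates for the rated restaurant.
    INSERT INTO restaurant_rating_summary (restaurant_id, total_votes, total_points, avg_rating)
    VALUES (NEW.restaurant_id, 1, NEW.rating_value, NEW.rating_value)
    ON CONFLICT (restaurant_id) DO UPDATE
        SET total_votes  = restaurant_rating_summary.total_votes + 1,
            total_points = restaurant_rating_summary.total_points + NEW.rating_value,
            avg_rating   = (restaurant_rating_summary.total_points + NEW.rating_value)::numeric
                           / (restaurant_rating_summary.total_votes + 1);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ratings_after_insert
AFTER INSERT ON ratings
FOR EACH ROW EXECUTE FUNCTION update_rating_summary();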

Some databases, for example PostgreSQL, allow you to store an array as part of a row, e.g.
create table restaurants (
...,
ratings integer[],
...
);
So you could, for example, keep the last 5 ratings in the same row as the restaurant. When you get a new rating, shuffle the old ratings left, and add the new rating at the end, then calculate the average.
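A minimal sketch of that shuffle, assuming the ratings integer[] column above holds at most the five most recent ratings (the restaurant id 123 and the incoming rating value 4 are just placeholders):

-- Drop the oldest rating and append the new one.
UPDATE restaurants
SET ratings = ratings[2:5] || 4
WHERE id = 123;

-- The average of the stored ratings can then be read back directly.
SELECT (SELECT avg(r) FROM unnest(ratings) AS r) AS recent_avg
FROM restaurants
WHERE id = 123;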

Related

Should I use a model archive in Rails

I have a model Product with a has_many relation to prices. The prices table is growing rapidly; only a few current prices are normally needed, but I want to keep all of them as a history.
So I am thinking of "archiving" all old prices. How do I do that best?
Before, I had a column old and filtered those rows out whenever I only wanted the current prices. But now the prices table has 2.5 million rows and only 200k are needed in most situations. That's why I thought I would just create a new model price_archive, copy all "old" prices to price_archive and delete them from prices. All logic will be moved to a module used by both models, so I can use price and price_archive in the same way.
Pros for the archive approach:
most of the queries are done on the smaller data set (200k rows, not growing much)
Cons:
displaying both ordered by time needs sorting over some kind of joined data set, because the time ranges overlap. So it looks like (part.prices.to_a + part.price_archives.to_a).sort_by(&:time). Not a big problem, because this will be used very seldom. But:
I have other models (e.g. Order) that use prices in a belongs_to relation, so those need both price_id and price_archive_id (with one id always being nil) so that they still reference a price.
Most queries are: show all prices for a product (in a select box) and mark the price that is connected to this order (or add it to the select box when it is archived).
So the code would be something like:
Order.where(*where*).includes(:price, :price_archive, :part => :prices)
The DB will then query: prices WHERE part_id = ? [on 200k] + prices WHERE id = ? [on 200k] + price_archives WHERE id = ? [on 2300k, but via the primary key]
instead of prices WHERE part_id = ? [on 2500k, with a normal index].
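Spelled out as SQL (illustrative only; the statements Rails actually generates will differ in detail), the two variants issue roughly:

-- Archive approach: three narrow lookups.
SELECT * FROM prices         WHERE part_id = ?;  -- ~200k-row table
SELECT * FROM prices         WHERE id = ?;       -- ~200k-row table, primary key
SELECT * FROM price_archives WHERE id = ?;       -- ~2300k-row table, primary key

-- Single-table approach: one lookup against the full table.
SELECT * FROM prices WHERE part_id = ?;          -- ~2500k-row table, normal index on part_id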
Is there a better way or should I stay with the old column?

Firestore Map field performance with large dataset

I have a list of products, and each user can buy multiple tickets for a product. When I fetch a product I need to know how many tickets the user has bought for it. I currently have products as a separate collection and fetch the list of tickets per product for the currently logged-in user. This worked great, but I was using an in query to fetch the tickets for multiple products at once, and Firestore only allows 10 items in an in query.
I'm considering storing the count of tickets on each product as a map. How does this affect the performance of the snapshotListener? Let's say I listen to 100 products and each has a Map with 100k key-values where the key is a string and the value is an Int.
Is it easy to update the map? Can I do an increment on the value of a specific field?
Security is not an issue; it's OK for the ticket count for all users to be public.
Product {
    name: String
    ....
    tickets: {
        "USERID": 5,
        // 100k more records
    }
}
I'm considering storing the count of tickets on each product as a map. How does this affect the performance of the snapshotListener?
It has no effect on the performance of anything other than the time it takes to download the document.
Is it easy to update the map? Can I do an increment on the value of a specific field?
Yes, it's easy. Use dot notation to locate the field to increment, and use FieldValue.increment() to increment it.

How to store data in a fact table with multiple products in an order in a data warehouse

I am trying to design a dimensional model for data warehousing for one of my projects (Sales Order). I'm new to this concept.
So far, I understand that the product, customer and date can be stored in dimension tables and the order info will be in the fact table.
Date_dimension table structure will be
date_dim_id, date, week_number, month_number
Product_dimension table structure will be
product_dim_id, product_name, desc, sku
Order_fact table structure will be
order_id, product_dim_id(fk), date_dim_id(fk), order_quantity, order_total_price, etc
If an order is placed with 2 or more products, will there be repeated entries in the order_fact table for the same order_id and date_dim_id?
Please help me with this; I'm confused here. I know that in a relational database the order table will have one entry per order, and the relation between product and order will be maintained in a separate table having order_id and product_id as foreign keys.
Thanks in advance.
This is a classic case where you should (probably) have two fact tables:
FactOrderHeader and FactOrderDetail.
FactOrderHeader will have a record for each order, storing information regarding the value of the order and any order-level discounts (though those could be expressed as an OrderDetail record in some cases).
FactOrderDetail will have a record for each order line, storing information regarding the product, product cost, product sale price, number of items, item discount, etc.
You may also need a DimOrderHeader if there are non-fact pieces of information that you want to store, for example the dates the order was taken, delivered and paid.
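As a rough sketch (the table and column names below are illustrative, not a prescribed design), the two fact tables might look like this:

-- Header-level fact: one row per order.
CREATE TABLE fact_order_header (
    order_id          integer PRIMARY KEY,
    date_dim_id       integer NOT NULL REFERENCES date_dimension(date_dim_id),
    customer_dim_id   integer NOT NULL REFERENCES customer_dimension(customer_dim_id),
    order_total_price numeric(12,2),
    order_discount    numeric(12,2)
);

-- Line-level fact: one row per product on an order.
CREATE TABLE fact_order_detail (
    order_id       integer NOT NULL REFERENCES fact_order_header(order_id),
    line_number    integer NOT NULL,
    product_dim_id integer NOT NULL REFERENCES product_dimension(product_dim_id),
    order_quantity integer NOT NULL,
    unit_price     numeric(12,2),
    line_discount  numeric(12,2),
    PRIMARY KEY (order_id, line_number)
);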

Granularity in Star Schema leads to multiple values in Fact Table?

I'm trying to understand star schemas at the moment and struggling a lot with granularity.
Say I have a fact table that has session_id, user_id, order_id, product_id, and I want to roll up to sessions by user by week (keeping in mind that not every session leads to an order or a product, and the DW needs to track sessions for non-purchasing users as well as those who purchase).
I can see no reason to track order_ids or session_ids in the fact table, so it would become something like:
week_date, user_id, total_orders, total_sessions ...
But how would I then track product_ids if a user makes more than one purchase in a week? I assume I can't keep multiple product ids in an array (e.g. "20/02/2012","5","3","PR01,PR32,PR22")?
I'm thinking it may have to be kept at 'every session' level but that could lead to a very large amount of data. How would you implement granularity for an example such as above?
Dimensional modelling requires dimensions as well as facts.
You need a Date/Calendar dimension, which includes columns like this:
calendar (id,cal_date,cal_year,cal_month,...)
The "grain" of your fact table is the key to data storage. If you have transactions, then the transaction should be the grain, and you store one row per transaction. Use proper (integer) surrogate keys to your dimensions, and your table won't be as large as you fear.
Now you can write a query like this, to sum sales of product by year:
select product_name, cal_year, sum(purchase_amount)
from fact_whatever
inner join calendar on calendar.id = fact_whatever.calendar_id
inner join product on product.id = fact_whatever.product_id
group by product_name, cal_year

Rails Database and Model for Income and Expenses

I'm new to Rails and am trying to figure out how to create models to track income and expenses in my app. Should I:
1) Create one model and database table called Finance, and then set a field called "type" to either income or expense, then continue with description, amount, date?
2) Or should I create two models and two tables called Income and Expenses, each with description, amount, and date?
I intend to use this data to allow photographers to track income and expenses related to their business. So for example when the photographer books an appointment they can associate income and expenses with that appointment. They can also see a report which shows monthly income, expenses, and profit.
I would say go with one table and use STI (i.e. use the type field).
Both income and expenses are inherently the same thing; just the "direction" of the operation is different. So to me it makes sense to use the same data model, with the exceptions hidden in specific subtypes.
Now as for the issues mentioned in the other answer:
Ordering both kinds of entries together is easy with one table and painful with two.
When the table is indexed properly, it doesn't matter whether it's one table or two. With an index on the type column, the cardinality of the records is the same as it would be with two tables, so the performance difference is not that big. Aggregation will be easier and faster with one table as well.
Table locking is not an issue unless you use a storage engine that locks whole tables (like MyISAM), which you should not be doing.
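A minimal sketch of the single-table layout with an index on the type column (the table and column names are illustrative; with Rails STI the type column would hold the subclass name):

CREATE TABLE finances (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    type        VARCHAR(20) NOT NULL,   -- e.g. 'Income' or 'Expense' (STI subclass name)
    description VARCHAR(255),
    amount      DECIMAL(12,2) NOT NULL,
    entry_date  DATE NOT NULL,
    KEY index_finances_on_type (type)
);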
It's basically just a question of preference. You can do all database queries with either one or two tables (using UNION). So I'd prefer two tables to have a cleaner model structure. Imagine you want to save an income entry:
One Table: You always have to set the type
Two Tables: You just have to choose the right model
But I can imagine one database query that could(!) be faster with only one table:
If you want to ORDER both types let's say by date.
And there's another point where one table is better, but that doesn't apply to your model:
If there's an infinite number of types. Or: If the number of types can change.
For everything else two separate tables are better. Concerning query performance:
If the tables get really huge and you for example want to retrieve all income entries, it's of course faster to look up those entries in a table with 300000 entries than in a table with 600000 entries.
With a deeper look at the DBMS there's another reason for using two tables:
Table locking. Some database engines lock whole tables for write operations. Thus only half of the data would get locked and the other half can still be accessed at the same time.
I will have a look at the ORDER thing with two tables. Maybe I'm wrong and the performance impact isn't even there.
Results:
I've created three simple tables (using MySQL):
inc: id (int, PK), money (int, not null)
exp: id (int, PK), money (int, not null)
combi: id (int, PK), type (tinyint, index, not null), money (int, not null)
And then filled the tables with random data:
money: from 1 to 10000
type: from 1 to 2
inc: 100000 entries
exp: 100000 entries
combi: 200000 entries
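Reconstructed as MySQL DDL (a sketch based on the descriptions above; the exact types the original benchmark used are not shown), the three tables would look roughly like this:

CREATE TABLE inc (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    money INT NOT NULL
);

CREATE TABLE exp (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    money INT NOT NULL
);

CREATE TABLE combi (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    type  TINYINT NOT NULL,
    money INT NOT NULL,
    KEY idx_type (type)
);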
Run these queries:
SELECT id, money
FROM combi
WHERE money > 5000
ORDER BY money
LIMIT 200000;
0.1 sec ... without index: 0.1 sec
SELECT * FROM (
SELECT id, money FROM inc WHERE money > 5000
UNION
SELECT id, money FROM exp WHERE money > 5000
) a
ORDER BY money LIMIT 200000;
0.16 sec
SELECT id, money
FROM combi
WHERE money > 5000 && type = 1
ORDER BY money
LIMIT 200000;
0.14 sec ... without index: 0.085 sec
SELECT id, money
FROM inc
WHERE money > 5000
ORDER BY money
LIMIT 200000;
0.04 sec
And you can see the expected results:
when you need income and expenses in one query, one table is faster
when you need only income OR expenses, two tables are faster
But what I don't understand: why is the query with type = 1 so much slower? I thought the index would make it nearly equally fast?
