Normalization of a healthcare database table that tracks surgeries

I have a table called Surgery_Record that tracks the surgeries in a hospital, shown below.
surgery_Record_ID | patient_ID | surgery_ID | theatre_ID | Surgery_Date
------------------+------------+------------+------------+-------------
1                 | 1          | 20         | 0          | 2000-05-10
2                 | 85         | 20         | 0          | 2000-01-15
3                 | 10         | 20         | 0          | 2000-01-29
4                 | 13         | 16         | 0          | 2000-11-19
5                 | 15         | 1          | 0          | 2000-05-28
My assumptions are that:
No patient ever revisits.
Every patient has only one surgery done.
A particular operating theatre is used only once per day.
I figured out the following Functional Dependencies:
Patient_ID, Theatre_ID -> Surgery_Date
Surgery_Record_ID -> Patient_ID
Patient_ID -> Surgery_ID, Surgery_Record_ID, Theatre_ID
Patient_ID, Surgery_ID -> Theatre_ID
Surgery_Record_ID, Patient_ID, Surgery_ID, Theatre_ID -> Surgery_Date
From the above dependencies, I found that the candidate keys are {Patient_ID, Theatre_ID}, {Patient_ID, Surgery_ID}, and {Surgery_Record_ID, Patient_ID, Surgery_ID, Theatre_ID}.
So does my table violate Second Normal Form? Please help me check whether my FDs are correct, because I am very new at this. Thanks a lot in advance.

Each of your FDs is quoted below, followed by my comments.
Patient_ID, Theatre_ID -> Surgery_Date
Based on the sample data, I'd have to say this one is wrong. The following FDs seem to be correct for these three attributes. They're derived mainly from your assumptions, which, as I pointed out in the comments, probably don't hold in the real world.
Patient_ID -> {Theatre_ID, Surgery_Date}
Surgery_Date -> Theatre_ID
Surgery_Record_ID -> Patient_ID
Each row gets a different value for Surgery_Record_ID, so Surgery_Record_ID determines every attribute.
surgery_Record_ID -> {patient_ID, surgery_ID, theatre_ID, Surgery_Date}
Patient_ID -> Surgery_ID, Surgery_Record_ID, Theatre_ID
Since patients can't revisit, and since patients can have only one surgery, Patient_ID will be globally unique, just like Surgery_Record_ID. Patient_ID will determine every attribute.
patient_ID -> {surgery_Record_ID, surgery_ID, theatre_ID, Surgery_Date}
Patient_ID, Surgery_ID -> Theatre_ID
I covered Patient_ID above. Surgery_ID doesn't determine anything on its own.
Surgery_Record_ID, Patient_ID, Surgery_ID, Theatre_ID -> Surgery_Date
This FD holds, but its determinant is far from minimal. Among these attributes you also have:
Surgery_Record_ID -> {Patient_ID, Surgery_Date}
Patient_ID -> {Surgery_Record_ID, Surgery_Date}
Surgery_Date -> {Surgery_Record_ID, Patient_ID}
Your assumptions also make Surgery_Date unique (your sample data has a single theatre, and a theatre is used at most once per day), so Surgery_Date determines every attribute.
Surgery_Date -> {patient_ID, surgery_ID, theatre_ID, surgery_Record_ID}
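Putting it together: under your assumptions, Surgery_Record_ID, Patient_ID, and Surgery_Date are each single-attribute candidate keys, so the composite candidate keys you listed are not minimal. Every determinant above is then a candidate key, so the table does not violate 2NF (it is in fact in BCNF) for as long as those assumptions hold. As a sketch only (the column types are my assumption, not from your post), the assumptions could be made explicit in the schema:

CREATE TABLE Surgery_Record (
    surgery_Record_ID INTEGER PRIMARY KEY,
    patient_ID        INTEGER NOT NULL UNIQUE, -- no revisits, one surgery per patient
    surgery_ID        INTEGER NOT NULL,
    theatre_ID        INTEGER NOT NULL,
    Surgery_Date      DATE    NOT NULL UNIQUE  -- single theatre, used at most once per day
);

Drop either UNIQUE constraint the moment the corresponding assumption is relaxed, and the candidate keys change with it.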

Related

Limit query by sum of attributes of objects in Rails and Postgresql

I have a Lead model that has a pre-calculated float column called "sms_price". I want to allow users to send text messages to those leads, and to assign a budget to their campaigns (something similar to what you can find on fb ads).
I need a scope that can limit the number of leads by the cumulative price of those leads. So for example, if the user has defined a total budget of 200:
id: 1 | sms_price: 0.5 | total_price: 0.5
id: 2 | sms_price: 1.2 | total_price: 1.7
id: 3 | sms_price: 0.9 | total_price: 2.6
...
id: 94 | sms_price: 0.8 | total_price: 199.4 <--- stop query here
id: 95 | sms_price: 0.7 | total_price: 200.1
So I need two things in my scope:
Calculate the running total price (a cumulative sum)
Get only the leads that have a total price lower than the desired budget
So far I have only managed to do the first task (calculating the running total) using this scope:
scope :limit_by_total_price, ->{select('"leads".*, sum(sms_price) OVER(ORDER BY id) AS total_price')}
This works, and if I do Lead.limit_by_total_price.last.total_price I get 38039.7499999615.
Now what I need is a way to retrieve only the leads that have a total price lower than the budget:
scope :limit_by_total_price, ->(budget){select('"leads".*, sum(sms_price) OVER(ORDER BY id) AS total_price').where('total_price < ?', budget)}
But it doesn't recognise the total_price attribute:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column "total_price" does not exist
Why does it recognise the total_price attribute on a single object but not in the scope?
The problem is that columns calculated in a SELECT clause are not available to the WHERE clause of the same query; SQL evaluates WHERE before the select list (and before window functions). To do what you want, you need a subquery.
You can do this and yet stay in the Rails universe using ActiveRecord's from method. The technique is nicely illustrated in this Hashrocket blog post.
In your case it might look something like this (because of the complexity, I would use a class method rather than a scope):
def self.limit_by_total_price(budget)
  subquery = select('leads.*, sum(leads.sms_price) over(order by leads.id) as total_price')
  from(subquery, :leads).where('total_price < ?', budget)
end
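Called as Lead.limit_by_total_price(200), this should generate SQL along these lines (a sketch; the exact quoting varies by Rails version):

SELECT leads.*
FROM (
  SELECT leads.*, sum(leads.sms_price) OVER (ORDER BY leads.id) AS total_price
  FROM leads
) leads
WHERE total_price < 200;

Because the window function is computed inside the subquery, the outer WHERE is free to filter on total_price.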

Sort a table using the total of other tables

2 tables: memberships and week_scores
Membership has_many :week_scores
WeekScore belongs_to :membership
Every membership has 16 week_scores
Each week_scores row has a score column holding an integer from 0 to 20.
So, to be clear: every membership has 16 week_scores, and I want to display a leaderboard of all members of the group, sorted by the total score of their 16 week_scores rows.
It should look something like this
Username | Score
David | 114
Rick | 97
Mike | 95
...
The score column should be the sum of all the week_scores one user has, so in David's case it was:
week_score.score 1: 15
week_score.score 2: 12
week_score.score 3: 14
...
week_score.score 16: 9
total: 114
If the title of this post is not good, let me know.
One way of doing this is with a subselect:
subselect = "SELECT SUM(score) FROM week_scores WHERE membership_id = memberships.id"
@memberships = Membership.
  select("memberships.*, (#{subselect}) AS total_score").
  order("total_score DESC")
The additional column specified in the select clause (total_score) will be available on the returned membership instances as if it were a “real” attribute, so calling something like @memberships.first.total_score will work.
(Note that I extracted the subselect into a separate variable only to make the code more readable, it is of course also possible to have it inline instead.)
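For reference, the SQL this builds is roughly the following (a sketch):

SELECT memberships.*,
       (SELECT SUM(score)
        FROM week_scores
        WHERE membership_id = memberships.id) AS total_score
FROM memberships
ORDER BY total_score DESC;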

How to make Rails/ActiveRecord return unique objects using join table's boolean column

I have a Rails 4 app using ActiveRecord and PostgreSQL with two tables: stores and open_hours. A store has many open_hours:
stores: id, name
open_hours: id, open_time, close_time, store_id
The open_time and close_time columns represent the number of seconds since midnight of Sunday (i.e. beginning of the week).
I would like to get list of store objects ordered by whether the store is open or not, so stores that are open will be ranked ahead of the stores that are closed. This is my query in Rails:
Store.joins(:open_hours).order("#{current_time} > open_time AND #{current_time} < close_time desc")
Note that current_time is the number of seconds since midnight on the previous Sunday.
This gives me a list of stores with the currently open stores ranked ahead of the closed ones. However, I'm getting a lot of duplicates in the result.
I tried using the distinct, uniq and group methods, but none of them work:
Store.joins(:open_hours).group("stores.id").group("open_hours.open_time").group("open_hours.close_time").order("#{current_time} > open_time AND #{current_time} < close_time desc")
I've read a lot of the related questions/answers on Stack Overflow, but most of them don't address the order method. This question seems to be the most relevant one, but the MAX aggregate function does not work on booleans.
Would appreciate any help! Thanks.
Here is what I did to solve the issue:
In Rails:
is_open = "bool_or(#{current_time} > open_time AND #{current_time} < close_time)"
Store.
  select("stores.*, CASE WHEN #{is_open} THEN 1 WHEN #{is_open} IS NULL THEN 2 ELSE 3 END AS open").
  group("stores.id").
  joins("LEFT JOIN open_hours ON open_hours.store_id = stores.id").
  uniq.
  order("open asc")
Explanation:
The is_open variable is just there to shorten the select statement.
The bool_or aggregate function is needed to collapse the grouped open_hours records. Otherwise there would likely be two results for each store (one open and one closed), which is why using the uniq method alone doesn't eliminate the duplicates.
LEFT JOIN is used instead of INNER JOIN so we can include the stores that don't have any open_hours rows.
A store can be open (true), closed (false), or undetermined (NULL, when it has no open_hours), so the CASE WHEN statement is needed: it maps open to 1, undetermined to 2, and closed to 3.
Ordering the results ASC then shows open stores first, then the undetermined ones, then the closed stores; the combined SQL is sketched below.
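Putting the pieces together, the generated SQL should look roughly like this (a sketch with current_time replaced by the literal 97200, i.e. 3 AM on Monday; not verified output):

SELECT DISTINCT stores.*,
       CASE WHEN bool_or(97200 > open_time AND 97200 < close_time) THEN 1
            WHEN bool_or(97200 > open_time AND 97200 < close_time) IS NULL THEN 2
            ELSE 3
       END AS open
FROM stores
LEFT JOIN open_hours ON open_hours.store_id = stores.id
GROUP BY stores.id
ORDER BY open ASC;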
This solution works but doesn't feel very elegant. Please post your answer if you have a better solution. Thanks a lot!
Have you tried the uniq method? Just append it at the end:
Store.joins(:open_hours).order("#{current_time} > open_time AND #{current_time} < close_time desc").uniq

Sorting by rank and total where multiple entries may exist

Ruby 2.1.5
Rails 4.2.1
My model is contributions, with the following fields:
event, contributor, date, amount
The table would have something like this:
earth_day, joe, 2014-04-14, 400
earth_day, joe, 2015-05-19, 400
lung_day, joe, 2015-05-20, 800
earth_day, john, 2015-05-19, 600
lung_day, john, 2014-04-18, 900
lung_day, john, 2015-05-21, 900
I have built an index view that shows all these fields, and I implemented code to sort (and reverse the order) by clicking the column titles in the index view.
What I would like to do is have the index view displayed like this:
Event | Contributor | Total | Rank
where each event is listed only once per contributor, total is the sum of all contributions to that event by the contributor, and rank is how the contributor ranks relative to everyone else for that particular event.
I am toying with having a separate table where a running tally is kept for each event/contributor pair, plus a piece of code to compute rank and re-insert it into the table, then using that table to drive the views.
Can you think of a better approach?
Keeping a running tally is a fine option. Writes will slow down, but reads will be fast.
Another way is to create a database view, if you are using PostgreSQL. Something like:
-- Your table structure and data
create table whatever_table (event text, contributor text, amount int);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e2', 'joe', 1);
insert into whatever_table values ('e1', 'jim', 0);
insert into whatever_table values ('e1', 'joe', 1);
insert into whatever_table values ('e1', 'bob', 1);
-- Your view
create view event_summary as (
select
event,
contributor,
sum(amount) as total,
rank() over (order by sum(amount) desc) as rank
from whatever_table
group by event, contributor
);
-- Using the view
select * from event_summary order by rank;
event | contributor | total | rank
-------+-------------+-------+------
e1 | joe | 2 | 1
e1 | bob | 1 | 2
e2 | joe | 1 | 2
e1 | jim | 0 | 4
(4 rows)
Then you have an ActiveRecord class like:
class EventSummary < ActiveRecord::Base
  self.table_name = :event_summary
end
and you can do stuff like EventSummary.order(rank: :desc) and so on. This won't slow down writes, but reads will be a little slower, depending on how much data you are working with.
PostgreSQL also has support for materialized views, which could give you the best of both worlds, assuming you can tolerate a little lag between when the data is entered and when the summary table is updated.
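The materialized variant is nearly identical to the view above; the extra step is refreshing it whenever contributions change (e.g. from a model callback or a cron job). A sketch, using the same tables (PostgreSQL 9.3+):

-- In place of the plain view:
create materialized view event_summary as (
  select
    event,
    contributor,
    sum(amount) as total,
    rank() over (order by sum(amount) desc) as rank
  from whatever_table
  group by event, contributor
);

-- Re-run whenever the underlying data changes:
refresh materialized view event_summary;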

Keeping historical database relations intact when data changes

I am hesitating between various alternatives when it comes to relations that have "historical" value.
For example, let's say a User has bought an item on a certain date... if I just store this the classic way, like:
transaction_id: 1
user_id: 2
item_id: 3
created_at: 01/02/2010
Then obviously the user might change their name, the item might change its price, and 3 years later when I try to create a report of what happened, I have false data.
I have two alternatives:
Keep it stupid-simple like I showed earlier, but use something like https://github.com/airblade/paper_trail and do something like:
t = Transaction.find(1)
u = t.user.version_at(t.created_at)
Create tables like transaction_users and transaction_items, and copy the users/items into those tables when a transaction is made. The structure would then become:
transaction_id: 1
transaction_user_id: 2
transaction_item_id: 3
created_at: 01/02/2010
Both approaches have their merits, though solution 1 looks much simpler... Do you see a problem with solution 1? How is this "historical data" problem usually solved? I have to solve this problem for 2-3 models like this in my project; what do you reckon would be the best solution?
Taking the example of Item price, you could also:
Store a copy of the price, as of the transaction time, in the transaction table
Create a temporal table for item prices
Storing a copy of the price in the transaction table:
TABLE Transaction(
     user_id     -- User buying the item
    ,trans_date  -- Date of transaction
    ,item_no     -- The item
    ,item_price  -- A copy of Price from the Item table as-of trans_date
)
Getting the price as of the time of transaction is then simply:
select item_price
from transaction;
Creating a temporal table for item prices:
TABLE item(
     item_no
    ,etcetera  -- All other information about the item, such as name, color
    ,PRIMARY KEY(item_no)
)

TABLE item_price(
     item_no
    ,from_date
    ,price
    ,PRIMARY KEY(item_no, from_date)
    ,FOREIGN KEY(item_no) REFERENCES item(item_no)
)
The data in the second table would look something like:
ITEM_NO FROM_DATE PRICE
======= ========== =====
A 2010-01-01 100
A 2011-01-01 90
A 2012-01-01 50
B 2013-03-01 60
This says that from the first of January 2010 the price of item A was 100. It changed to 90 on the first of January 2011, and then to 50 from the first of January 2012.
You will most likely add a TO_DATE to the table, even though it's a denormalization (each row's TO_DATE is simply the next row's FROM_DATE); the query below assumes that column exists.
Finding the price as of the transaction would be something along the lines of:
select t.item_no
,t.trans_date
,p.item_price
from transaction t
join item_price p on(
t.item_no = p.item_no
and t.trans_date between p.from_date and p.to_date
);
ITEM_NO TRANS_DATE PRICE
======= ========== =====
A 2010-12-31 100
A 2011-01-01 90
A 2011-05-01 90
A 2012-01-01 50
A 2012-05-01 50
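If you'd rather not maintain the denormalized TO_DATE column, an alternative (my sketch, PostgreSQL-flavored, not part of the original answer) is a correlated subquery that picks the latest price row dated on or before the transaction:

select t.item_no
      ,t.trans_date
      ,(select p.price
          from item_price p
         where p.item_no = t.item_no
           and p.from_date <= t.trans_date
         order by p.from_date desc
         limit 1) as price
  from transaction t;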
I went with PaperTrail; it keeps a history of all my models, even their destruction. I can always switch to option 2 later on if it doesn't scale.
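For reference, a minimal sketch of option 1 with paper_trail (assuming the gem's classic API, in which has_paper_trail adds version_at to model instances):

# Gemfile: gem 'paper_trail'
class User < ActiveRecord::Base
  has_paper_trail  # writes a row to the versions table on create/update/destroy
end

t = Transaction.find(1)
u = t.user.version_at(t.created_at)  # the user reified as of the purchase date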
