I am using Ruby on Rails with PostgreSQL as the database.
My users are subscribed to a plan, and based on that plan they are allowed a certain number of resources. I have a DB table that keeps track of their usage. I need to check their usage in multiple actions, and I would like to know if there is a best practice for working with this data (storing and retrieving).
My table:
UserStats: id(pk), projects_left, keys_left, user_id
Usually on create actions I retrieve the data and then update it in that UserStats table; there are also many places where I do just a SELECT on the table.
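A simplified sketch of that read-then-update pattern (the UserStat model name is assumed from the table above):
# check remaining quota, create the resource, then decrement the counter
stats = UserStat.find_by_user_id(current_user.id)
if stats.projects_left > 0
  Project.create!(user_id: current_user.id, name: params[:name])
  stats.update_attributes!(projects_left: stats.projects_left - 1)
end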
If the resources are also stored in database tables, you can consider creating a trigger on insert into those tables which will cause the insert to fail once a user exceeds their limit.
This way you never need to update UserStats; you just store each user's maximum allowed.
I believe it's less error-prone, handles deletes without extra code, and allows other apps to modify the DB.
e.g.:
CREATE OR REPLACE FUNCTION check_limits_projects() RETURNS TRIGGER AS $$
DECLARE
  my_projects_count INT;
  my_projects_limit INT;
BEGIN
  -- count the user's existing projects and look up their allowance
  SELECT count(*) INTO my_projects_count FROM projects WHERE user_id = NEW.user_id;
  SELECT projects_left INTO my_projects_limit FROM UserStats WHERE user_id = NEW.user_id;
  IF (my_projects_count >= my_projects_limit) THEN
    RAISE EXCEPTION 'project limit reached for user %', NEW.user_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER limit_check BEFORE INSERT ON projects
FOR EACH ROW EXECUTE PROCEDURE check_limits_projects();
I have the following tables (showing only the relevant fields):
lots
history_id
histories
initial_date
updated_date
r_doc_date
l_doc_date
datasheet_finalized_date
users
username
So I am rebuilding an existing application that deals with a rather large amount of bureaucracy and needs to keep track of five separate dates (as shown in the histories table). The problem I am having is that I don't know how best to model this in ActiveRecord; historically it's been done by representing the histories table like so:
histories
initial_date
updated_date
r_doc_date
l_doc_date
datasheet_finalized_date
username
Where only one of the five date fields could ever be filled at one time...which in my opinion is a terrible way to go about modeling this...
So basically I want to build a unique queryable connection between every date in the histories table and its specific relevant user. Is it possible to use every timestamp in the histories table as a foreign key to query the specific user?
I think that there's a simpler approach to what you're trying to accomplish. It sounds like you want to be able to query each lot and find the 'relevant user' (I am guessing that this refers to the user who did whatever action is necessary to update the specific column on the histories table). To do this I would first create a join table between users and histories, called user_histories:
user_histories
user_id
history_id
I would create a row on this table any time a lot's history is updated and one of the relevant dates changes. But that now brings up the issue of being able to differentiate which specific date-type the user actually changed (since there are five). Instead of using each one as a foreign key (since they wouldn't necessarily be unique) I would recommend creating a 'history_code' on the user_histories table to represent each one of the history date-types (much like how a polymorphic_type is used). Resulting in the user_histories table looking like this:
user_histories
user_id
history_id
history_code
And an example record looking like this:
UserHistory.sample = {
user_id: 1,
history_id: 1,
history_code: "Initial"
}
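For this to work, the associations would look something like the following sketch (model names are assumed from the table names):
#app/models/user_history.rb
class UserHistory < ActiveRecord::Base
  belongs_to :user
  belongs_to :history
end

#app/models/history.rb
class History < ActiveRecord::Base
  has_many :user_histories
  has_many :users, through: :user_histories
end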
Allowing you to query the specific user who changed a record in the histories table with the following:
history.user_histories.select { |uhist| uhist.history_code == "Initial" }
I would recommend building these longer queries out into model methods, allowing for a faster, cleaner query down the line, for example:
#app/models/history.rb
def initial_user
  user_histories.select { |uhist| uhist.history_code == "Initial" }
end
This should give you the results you want, but should get around the whole issue of the dates not being suitable for foreign keys, since you can't guarantee their uniqueness.
I'm trying to add a count to each link on a nav bar. Each link on the nav bar goes to a different model.
I want to avoid having to query and count records of multiple models every time a visitor navigates to another page.
How do I cache this information?
counter_cache seems to be only for associations. These are standalone models without the need for associations.
You could use a simple Model.count. It executes a MySQL COUNT query, which shouldn't cost much.
Then there's the ActiveRecord query cache, which caches the result of the query with the query string as the cache key; any following identical query within the same request hits the cache and isn't executed again, the result being returned directly from the query cache, so I don't think it would be expensive at all.
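If you want the counts cached across requests as well, here's a minimal sketch using the Rails cache store (the model names and the ten-minute expiry are just placeholders):
# cache all nav bar counts under one key, refreshed every 10 minutes
def nav_counts
  Rails.cache.fetch("nav_counts", expires_in: 10.minutes) do
    { posts: Post.count, photos: Photo.count, videos: Video.count }
  end
end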
EDIT:
About .size: it's a method on enumerables, like arrays. To use it you need to fetch a result set just to calculate its size, and that result set is extra data you don't need; you're only interested in the count, so you should just tell the database to fetch the count, hence Model.count. Here's an example from an app I already have:
User.count
(0.2ms) SELECT COUNT(*) FROM `users`
User.all.size
User Load (71.4ms) SELECT `users`.* FROM `users`
Notice the difference in the queries, and also the difference in response time; the first one is very fast because it's already cached.
About the index: in MySQL (and, I think, any respectable database) the primary key is unique and indexed, because it's the primary identifier of the record. You don't need to specify it in the migration; Rails creates the auto-increment unique primary key by itself. It's such a default that it doesn't even appear in the migration files. To disable the creation of the primary key you would need to add the extra option id: false, which is rarely needed.
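For reference, a sketch of a migration that opts out of the primary key (the table and column names are just examples):
# migration snippet: id: false skips the auto-increment primary key
create_table :page_counters, id: false do |t|
  t.string  :page
  t.integer :visits
end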
I have a join table in Rails which is just a 2-column table with ids.
In order to mass insert into this table, I use
ActiveRecord::Base.connection.execute("INSERT INTO myjointable (first_id,second_id) VALUES #{values}")
Unfortunately this gives me errors when there are duplicates. I don't need to update any values, simply move on to the next insert if a duplicate exists.
How would I do this?
As an FYI, I have searched Stack Overflow and most of the answers are a bit advanced for me to understand. I've also checked the PostgreSQL documents and played around in the Rails console, but still to no avail. I can't figure this one out, so I'm hoping someone else can tell me what I'm doing wrong.
The closest statement I've tried is:
INSERT INTO myjointable (first_id,second_id) SELECT 1,2
WHERE NOT EXISTS (
SELECT first_id FROM myjointable
WHERE first_id = 1 AND second_id IN (...))
Part of the problem with this statement is that I am only inserting 1 value at a time whereas I want a statement that mass inserts. Also the second_id IN (...) section of the statement can include up to 100 different values so I'm not sure how slow that will be.
Note that for the most part there should not be many duplicates so I am not sure if mass inserting to a temporary table and finding distinct values is a good idea.
Edit to add context:
The reason I need a mass insert is that I have a many-to-many relationship between 2 models, where 1 of the models is never populated by a form. I have stocks, and stock price histories. The stock price histories are never created in a form; rather, they are mass-inserted by pulling the data from Yahoo Finance with their finance API. I use the activerecord-import gem to mass insert the stock price histories (i.e. Model.import columns, values), but I can't type jointable.import columns, values because I get jointable is an undefined local variable.
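For example, the import that does work looks roughly like this (column names are assumed; only a real model class responds to import):
# activerecord-import: one multi-row INSERT instead of N single inserts
columns = [:stock_id, :date, :price]
values  = [[1, "2013-01-02", 10.50], [1, "2013-01-03", 10.75]]
StockPriceHistory.import columns, values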
I ended up using a WITH clause to select my values and give them a name. Then I inserted those values and used WHERE NOT EXISTS to effectively skip any pairs that are already in my database.
So far it looks like it is working...
WITH withqueryname(first_id, second_id) AS (
  VALUES (1,2), (3,4), (5,6) -- etc.
)
INSERT INTO jointablename (first_id, second_id)
SELECT w.first_id, w.second_id
FROM withqueryname w
WHERE NOT EXISTS (
  SELECT 1 FROM jointablename j
  WHERE j.first_id = w.first_id
    AND j.second_id = w.second_id)
Note that the subquery is correlated with the outer SELECT on both columns, so each pair is checked (and skipped) individually rather than all-or-nothing. You can interchange the VALUES list with a variable; mine was VALUES #{values}.
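From Rails, the same statement can be built and executed roughly like this (a sketch; the ids are cast with to_i because they are interpolated into raw SQL):
pairs  = [[1, 2], [3, 4], [5, 6]]
values = pairs.map { |f, s| "(#{f.to_i},#{s.to_i})" }.join(",")
ActiveRecord::Base.connection.execute(<<-SQL)
  WITH withqueryname(first_id, second_id) AS (VALUES #{values})
  INSERT INTO jointablename (first_id, second_id)
  SELECT w.first_id, w.second_id FROM withqueryname w
  WHERE NOT EXISTS (
    SELECT 1 FROM jointablename j
    WHERE j.first_id = w.first_id AND j.second_id = w.second_id)
SQL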
Here's how I'd tackle it: Create a temp table and populate it with your new values. Then lock the old join values table to prevent concurrent modification (important) and insert all value pairs that appear in the new table but not the old one.
One way to do this is by doing a left outer join of the old values onto the new ones and filtering for rows where the old join table values are null. Another approach is to use an EXISTS subquery. The two are highly likely to result in the same query plan once the query optimiser is done with them anyway.
Example, untested (since you didn't provide an SQLFiddle or sample data) but should work:
BEGIN;
CREATE TEMPORARY TABLE newjoinvalues(
first_id integer,
second_id integer,
primary key(first_id,second_id)
);
-- Now populate `newjoinvalues` with multi-valued inserts or COPY
COPY newjoinvalues(first_id, second_id) FROM stdin;
LOCK TABLE myjoinvalues IN EXCLUSIVE MODE;
INSERT INTO myjoinvalues
SELECT n.first_id, n.second_id
FROM newjoinvalues n
LEFT OUTER JOIN myjoinvalues m ON (n.first_id = m.first_id AND n.second_id = m.second_id)
WHERE m.first_id IS NULL AND m.second_id IS NULL;
COMMIT;
This won't update existing values, but you can do that fairly easily too by using a second query that does an UPDATE ... FROM while still holding the write table lock.
Note that the lock mode specified above will not block SELECTs, only writes like INSERT, UPDATE and DELETE, so queries can continue to be made against the table while the process is ongoing; you just can't update it.
If you can't accept that, an alternative is to run the update in SERIALIZABLE isolation (which only works properly for this purpose in Pg 9.1 and above). This will result in the query failing whenever a concurrent write occurs, so you have to be prepared to retry it over and over again. For that reason it's likely better to just live with locking the table for a while.
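A sketch of that retry loop from Rails (the isolation: option on transaction requires Rails 4+; on older versions you would set the isolation level with raw SQL first):
begin
  ActiveRecord::Base.transaction(isolation: :serializable) do
    # run the INSERT ... SELECT ... WHERE NOT EXISTS from above here
  end
rescue ActiveRecord::StatementInvalid => e
  # PostgreSQL reports serialization conflicts as "could not serialize access"
  retry if e.message.include?("could not serialize")
  raise
end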
I am using Ruby on Rails v3.2.2 and I would like to "protect" a class/instance attribute so that a database table column value can be updated only one way. That is, for example, given I have two database tables:
table1
- full_name_column
table2
- name_column
- surname_column
and I manage table1 so that the full_name_column is updated by a callback stated in the related table2 class/model, I would like to make sure that it is possible to update the full_name_column value only through that callback.
In other words, I should ensure that the table1.full_name_column value is always
"#{table2.name_column} #{table2.surname_column}"
and that it can't be any other value. So, for example, if I try to "directly" update table1.full_name_column, it should raise something like an error. Of course, that value must remain readable.
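For concreteness, the callback I have in mind looks roughly like this (model and column names are placeholders):
# backed by table2; keeps table1's full_name column in sync
class Person < ActiveRecord::Base
  belongs_to :profile   # backed by table1
  after_save :sync_full_name

  private

  # update_column writes directly, skipping validations and callbacks
  def sync_full_name
    profile.update_column(:full_name, "#{name} #{surname}")
  end
end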
Is it possible? What do you advise for handling this situation?
Reasons for this approach...
I want to use this approach because I am planning to perform database searches on table1 columns, where table1 contains other values related to a "profile"/"person" object... otherwise I would probably have to make some hack (maybe a complex one) to redirect those searches to table2 so as to look for "#{table2.name_column} #{table2.surname_column}" strings.
So I think a simple way is to denormalize the data as explained above, but it requires implementing an "uncommon" way of handling that data.
BTW: an answer should either aim to "solve" the related processes or to find a better approach for handling the search functionality.
Here are two approaches for maintaining the data at the database level...
Views and materialized tables.
If possible, table1 could be a VIEW or, for example, a MATERIALIZED QUERY TABLE (MQT). The terminology differs slightly depending on the RDBMS used; I think Oracle has MATERIALIZED VIEWs whereas DB2 has MATERIALIZED QUERY TABLEs.
A VIEW is simply an access path to data that physically lives in some different table, whereas a MATERIALIZED VIEW/QUERY TABLE is a physical copy of the data, and is therefore, for example, not in sync with the source data in real time.
Anyway, these approaches would provide read-only access to data that is owned by table2 but accessible through table1.
An example of a very simple view:
CREATE VIEW table1 AS
SELECT surname||', '||name AS full_name
FROM table2;
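On the Rails side such a view can be wrapped in an ordinary model and marked read-only; a sketch (the class name is assumed):
# backed by the table1 view defined above
class Profile < ActiveRecord::Base
  self.table_name = "table1"

  # make ActiveRecord refuse writes as well
  def readonly?
    true
  end
end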
Triggers
Sometimes views are not convenient, as you might actually want to have some data in table1 that is not available anywhere else. In those cases you could consider using database triggers, i.e. create a trigger so that when table2 is updated, table1 is also updated within the same database transaction.
With triggers the problem might be that you then have to give the client privileges to update table1 as well. Some RDBMSs might provide ways to tune the access control of triggers, i.e. the operations performed by the TRIGGER would be performed with different privileges from the operations that initiate the TRIGGER.
In this case the TRIGGER could look something like this:
CREATE TRIGGER UPDATE_NAME
AFTER UPDATE OF NAME, SURNAME ON TABLE2
REFERENCING NEW AS NEWNAME
FOR EACH ROW
BEGIN ATOMIC
  UPDATE TABLE1 SET FULL_NAME = NEWNAME.SURNAME||', '||NEWNAME.NAME
  WHERE SOME_KEY = NEWNAME.SOME_KEY;
END;
By replicating the data from table2 into table1 you've already de-normalized it. As with any de-normalization, you must be disciplined about maintaining sync. This means not updating things you're not supposed to.
Although you can wall things off with attr_accessible to prevent accidental assignment, the way Ruby works means there's no way to guarantee that the value will never be modified. If someone's determined enough, they will find a way. This is where the discipline comes in.
The best approach is to document that the column should not be modified directly, block mass-assignment with attr_accessible, and leave it at that. There's no real concept of a write-protected attribute, as far as I know.
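A sketch of that blocking in Rails 3.2 (names assumed; attr_accessible is a whitelist, so full_name is simply left off it):
# backed by table1; full_name can't be set through mass assignment
class Profile < ActiveRecord::Base
  attr_accessible :nickname, :bio   # deliberately omits :full_name
end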
I have an Order object which can be in an unpaid or paid state. When an order is paid, I want to set an order_number which should be an incrementing number.
Sounds easy enough, but I'm worried about collisions. I can imagine one order holding an order_number in memory and, just before it saves, another order saving itself using that same number; now the one in memory should be recalculated, but how?
You can create a database table that just contains an AUTO_INCREMENT primary key. When you need a new order_number, just insert a row into this table and read the value of the primary key for the created row.
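A sketch of that idea from Rails (the OrderNumber table/model name is assumed):
# a bare model over a one-column table whose only job is handing out numbers
class OrderNumber < ActiveRecord::Base; end

order.order_number = OrderNumber.create!.id
order.save!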
There are a lot of approaches. Essentially you need a lock to ensure that each request to the counter always returns a different value.
Memcache, Redis and some other key-value stores have this kind of counter feature. E.g., each time you want a new order_number, just call the INCR command of Redis; it will increment the counter and return the new value.
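With the redis gem this is a one-liner; a sketch (the key name is assumed):
require "redis"

redis = Redis.new
order_number = redis.incr("order_number") # atomically increments and returns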
A more complex solution can be implemented via the trigger/stored procedure/sequence features of an RDBMS (like MySQL). For MySQL, create a new table containing an AUTO_INCREMENT primary key. When you want a new order_number, insert a row into this table and get last_insert_id(). If you want ACID guarantees, just wrap the procedure in a transaction.