I'd like to use PostgreSQL's rank() function on one of my columns.
Character.select("*, rank() OVER (ORDER BY points DESC)")
But since I don't have a rank column in my table, Rails doesn't include it in the query. What would be the correct way to get the rank included in my ActiveRecord object?
Try this:
Character.find_by_sql("SELECT *, rank() OVER (ORDER BY points DESC) FROM characters")
It should return Character objects with a rank attribute, as documented here. However, this may not be database-agnostic and tends to get messy if you pass the objects around.
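For example, a rough sketch of how you might use it (the AS rank alias and the attribute access are my own illustration, not from the original query):

ranked = Character.find_by_sql(
  "SELECT *, rank() OVER (ORDER BY points DESC) AS rank FROM characters"
)
# The aliased window function shows up as an extra attribute on each object.
ranked.first.rank # => 1 (may come back as a string depending on your Rails version)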
Another (expensive) solution is to add a rank column to your table and have a callback recalculate every record's rank (using .order) whenever a record is saved or destroyed.
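A rough sketch of that approach (it assumes an integer rank column exists and that re-ranking every row on each save is acceptable):

class Character < ActiveRecord::Base
  after_save    :recalculate_ranks
  after_destroy :recalculate_ranks

  private

  def recalculate_ranks
    # Re-rank all characters by points; update_column skips callbacks,
    # so this does not trigger itself recursively.
    Character.order(points: :desc).each.with_index(1) do |character, rank|
      character.update_column(:rank, rank)
    end
  end
end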
Edit:
Another idea, suitable for single-record queries, can be seen here.
I have been using different methods to get specific fields from Active Record, but which one is faster and preferred, and how do they differ from one another?
User.all.collect(&:name)
User.all.pluck(:name)
User.all.select(:name)
User.all.map(&:name)
Thanks for your help in advance.
Each of these methods suits a different use case:
Both select and pluck issue a SQL SELECT of only the specified columns (SELECT "users"."name" FROM "users"). Hence, if you don't already have the users fetched and aren't going to, these methods will be more performant than map/collect.
The difference between select and pluck:
Performance: the difference is negligible on a reasonable number of records
Usage: select returns the list of models with the column specified, pluck returns the list of values of the column specified. Thus, again, the choice depends on the use case.
collect/map are aliases, so there's no difference between them. But to iterate over the models they fetch whole records (not just the specified column): they issue a SELECT "users".* FROM "users" request, convert the relation to an array, and map over it.
This can be useful when the relation has already been loaded. In that case it won't make an additional request, which may end up more performant than using pluck or select. But, again, this must be measured for the specific use case.
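A quick sketch of the differences (generated SQL shown in comments; it may vary slightly by Rails version, and the sample values are made up):

User.pluck(:name)      # SELECT "users"."name" FROM "users"  => ["Alice", "Bob"]
User.select(:name)     # SELECT "users"."name" FROM "users"  => relation of User objects
User.all.map(&:name)   # SELECT "users".* FROM "users"       => ["Alice", "Bob"]

users = User.all.to_a  # relation loaded once here
users.map(&:name)      # no additional query, plain Ruby iteration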
pluck: retrieves just the names from users, puts them in an array of strings (in this case), and gives that array to you.
select: retrieves all the users from the db with just the name column and returns a relation.
collect/map (aliases): retrieve all the users from the db with all columns, put them in an array of User objects with all the fields, then transform every object into just its name and give you that array of names.
I've listed these in order of performance, in my experience.
I have read the Ruby docs on the query method "group", but I am having a hard time understanding how to use it.
Let's say I have a table called users with the fields name, email, and gender.
I am able to type User.group(:name).count, which returns a hash of key/value pairs like {name: count}.
Why does User.group(:name) not work?
Is there a way of grouping similar names, and accessing those records?
ex. User.group(:name).first or User.group(:name).each
It seems to me that I am thinking of using "group" incorrectly.
Why does User.group(:name) not work?
When you use GROUP BY in SQL, every column in the SELECT list must either appear in the GROUP BY clause or be inside an aggregate function. User.group(:name) selects all columns (SELECT "users".* FROM "users" GROUP BY name), which is why it throws an error.
In your first case the query was SELECT COUNT(*) FROM users GROUP BY name, and this is the reason it worked.
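You can see the difference by inspecting the generated SQL (approximate output; it varies a little by Rails version, and the result hash is made up):

User.group(:name).to_sql
# => SELECT "users".* FROM "users" GROUP BY "users"."name"
# PostgreSQL rejects this because ungrouped columns appear in the SELECT list.

User.group(:name).count
# runs: SELECT COUNT(*) AS count_all, "users"."name" AS users_name FROM "users" GROUP BY "users"."name"
# => {"Alice"=>3, "Bob"=>1}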
As per your last sentence you need:
User.group(:name).select(:name).each do |record|
# work with record
end
I don't know which database you are using, but here is the idea from the PostgreSQL GROUP BY documentation.
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. expression can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value computed across all the selected rows). When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
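If what you actually want is the records in each group rather than an aggregate, SQL's GROUP BY isn't really the tool for that; an in-Ruby grouping with Enumerable#group_by (a different method from the query one above) is often simpler for small tables. A sketch:

User.all.group_by(&:name).each do |name, users|
  # users is an Array of User records sharing the same name
  puts "#{name}: #{users.map(&:email).join(', ')}"
end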
I've got a User model and a Card model. User has many Cards, so Card has a user_id attribute.
I want to fetch the newest single Card for each user. I've been able to do this:
Card.all.order(:user_id, :created_at)
# => gives me all the Cards, sorted by user_id then by created_at
This gets me half way there, and I could certainly iterate through these rows and grab the first one per user. But this smells really bad to me as I'd be doing a lot of this using Arrays in Ruby.
I can also do this:
Card.select('user_id, max(created_at)').group('user_id')
# => gives me user_id and created_at
...but I only get back user_ids and created_at timestamps. I can't select any other columns (including id), so what I'm getting back is worthless. I also don't understand why PG won't let me select more columns than above without putting them in the GROUP BY or an aggregate function.
I'd prefer to find a way to get what I want using only ActiveRecord. I'm also willing to write this query in raw SQL but that's if I can't get it done with AR. BTW, I'm using a Postgres DB, which limits some of my options.
Thanks guys.
We join the cards table on itself, ON
a) first.id != second.id
b) first.user_id = second.user_id
c) first.created_at < second.created_at
Card.joins("LEFT JOIN cards AS c ON cards.id != c.id AND c.user_id = cards.user_id AND cards.created_at < c.created_at").where('c.id IS NULL')
Because it's a LEFT JOIN, a card with no newer card from the same user comes back with c.id as NULL, so the WHERE clause keeps exactly the newest card per user.
This is a bit late, but I am working on the same problem, and I found this one works for me:
Card.all.group_by(&:user_id).map{|s| s.last.last}
What do you think?
I've found one solution that is suboptimal performance-wise but will work for very small datasets, when time is short or it's a hobby project:
Card.all.order(:user_id, :created_at).to_a.uniq(&:user_id)
This takes the AR::Relation results, casts them into a Ruby Array, then performs an Array#uniq on the results with a Proc. After some brief testing it appears #uniq preserves order, so as long as everything is ordered before calling uniq you should be good.
The feature is time sensitive so I'm going to use this for now, but I will be looking at something in raw SQL following #Gene's response and link.
I have this query in a project model:
report = self.reports.group(:key_id)
report.select('key_id, count(*) as count')
What do I need to add in order to get another column (level) from reports table?
I tried adding my column to select, but that means I have to group by it as well, and I only want to get the unique records by key_id.
Thank you
If you want to include information about another field, then you have to include that field in the group expression or as part of an aggregate field. That's a fundamental aspect of SQL.
For example, if you want to count the number of occurrences of various values of level associated with each key_id then you can add a count(level) column. The aggregation field can get arbitrarily "fancy", such as counting up the number of occurrences of level within various bands as you've mentioned in your comment.
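A sketch of both options (the level aggregates and the band threshold here are made-up illustrations):

report = self.reports.group(:key_id)
# count the level values seen for each key_id:
report.select('key_id, count(*) AS count, count(level) AS level_count')
# or count occurrences of level within a band:
report.select('key_id, count(*) AS count, sum(CASE WHEN level >= 10 THEN 1 ELSE 0 END) AS high_level_count')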
I have a Join table in Rails which is just a 2 column table with ids.
In order to mass insert into this table, I use
ActiveRecord::Base.connection.execute("INSERT INTO myjointable (first_id,second_id) VALUES #{values}")
Unfortunately this gives me errors when there are duplicates. I don't need to update any values, simply move on to the next insert if a duplicate exists.
How would I do this?
As an FYI, I have searched Stack Overflow and most of the answers are a bit advanced for me to understand. I've also checked the PostgreSQL documents and played around in the Rails console, but still to no avail. I can't figure this one out, so I'm hoping someone else can help tell me what I'm doing wrong.
The closest statement I've tried is:
INSERT INTO myjointable (first_id,second_id) SELECT 1,2
WHERE NOT EXISTS (
SELECT first_id FROM myjointable
WHERE first_id = 1 AND second_id IN (...))
Part of the problem with this statement is that I am only inserting 1 value at a time whereas I want a statement that mass inserts. Also the second_id IN (...) section of the statement can include up to 100 different values so I'm not sure how slow that will be.
Note that for the most part there should not be many duplicates so I am not sure if mass inserting to a temporary table and finding distinct values is a good idea.
Edit to add context:
The reason I need a mass insert is that I have a many-to-many relationship between two models, where one of the models is never populated by a form. I have stocks, and stock price histories. The stock price histories are never created in a form; they are mass inserted by pulling the data from Yahoo Finance with its finance API. I use the activerecord-import gem to mass insert stock price histories (i.e. Model.import columns, values), but I can't type jointable.import columns, values because I get an error that jointable is an undefined local variable.
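For what it's worth, activerecord-import works on model classes, so one way around that undefined-variable error is to give the join table a minimal model (a sketch; the class name is made up, and this alone doesn't solve the duplicates problem):

class JoinRecord < ActiveRecord::Base
  self.table_name = 'myjointable'
end

JoinRecord.import [:first_id, :second_id], [[1, 2], [3, 4]]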
I ended up using the WITH clause to select my values and give it a name. Then I inserted those values and used WHERE NOT EXISTS to effectively skip any items that are already in my database.
So far it looks like it is working...
WITH withqueryname(first_id,second_id) AS (VALUES(1,2),(3,4),(5,6)...etc)
INSERT INTO jointablename (first_id,second_id)
SELECT * FROM withqueryname
WHERE NOT EXISTS(
SELECT first_id FROM jointablename WHERE
first_id = 1 AND
second_id IN (1,2,3,4,5,6..etc))
You can replace the VALUES list with a variable; mine was VALUES#{values}.
You can also replace the second_id IN list with a variable; mine was second_id IN #{variable}.
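A rough Ruby sketch of building those interpolated pieces (the pair data is made up, and note this variant correlates the NOT EXISTS with each row of the WITH query, so duplicates are skipped row by row instead of skipping the whole insert):

pairs  = [[1, 2], [3, 4], [5, 6]]
values = pairs.map { |a, b| "(#{a.to_i},#{b.to_i})" }.join(',')

ActiveRecord::Base.connection.execute(<<-SQL)
  WITH withqueryname(first_id, second_id) AS (VALUES #{values})
  INSERT INTO jointablename (first_id, second_id)
  SELECT first_id, second_id FROM withqueryname w
  WHERE NOT EXISTS (
    SELECT 1 FROM jointablename j
    WHERE j.first_id = w.first_id AND j.second_id = w.second_id
  )
SQL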
Here's how I'd tackle it: Create a temp table and populate it with your new values. Then lock the old join values table to prevent concurrent modification (important) and insert all value pairs that appear in the new table but not the old one.
One way to do this is by doing a left outer join of the old values onto the new ones and filtering for rows where the old join table values are null. Another approach is to use an EXISTS subquery. The two are highly likely to result in the same query plan once the query optimiser is done with them anyway.
Example, untested (since you didn't provide an SQLFiddle or sample data) but should work:
BEGIN;
CREATE TEMPORARY TABLE newjoinvalues(
first_id integer,
second_id integer,
primary key(first_id,second_id)
);
-- Now populate `newjoinvalues` with multi-valued inserts or COPY
COPY newjoinvalues(first_id, second_id) FROM stdin;
LOCK TABLE myjoinvalues IN EXCLUSIVE MODE;
INSERT INTO myjoinvalues
SELECT n.first_id, n.second_id
FROM newjoinvalues n
LEFT OUTER JOIN myjoinvalues m ON (n.first_id = m.first_id AND n.second_id = m.second_id)
WHERE m.first_id IS NULL AND m.second_id IS NULL;
COMMIT;
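For reference, the EXISTS-based variant mentioned above would swap that INSERT for something like this (equally untested):

INSERT INTO myjoinvalues
SELECT n.first_id, n.second_id
FROM newjoinvalues n
WHERE NOT EXISTS (
  SELECT 1 FROM myjoinvalues m
  WHERE m.first_id = n.first_id AND m.second_id = n.second_id
);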
This won't update existing values, but you can do that fairly easily too with a second query that does an UPDATE ... FROM while still holding the write table lock.
Note that the lock mode specified above will not block SELECTs, only writes like INSERT, UPDATE and DELETE, so queries can continue to be made against the table while the process is ongoing; you just can't write to it.
If you can't accept that, an alternative is to run the update in SERIALIZABLE isolation (which only works properly for this purpose in Pg 9.1 and above). This will cause the query to fail whenever a concurrent write occurs, so you have to be prepared to retry it over and over again. For that reason it's likely better to just live with locking the table for a while.