Postgres group by day of month, ActiveRecord(Rails) returns array with nil ids while database query works fine [duplicate] - ruby-on-rails

Tag.joins(:quote_tags).group('quote_tags.tag_id').order('count desc').select('count(tags.id) AS count, tags.id, tags.name')
Build query:
SELECT count(tags.id) AS count, tags.id, tags.name FROM `tags` INNER JOIN `quote_tags` ON `quote_tags`.`tag_id` = `tags`.`id` GROUP BY quote_tags.tag_id ORDER BY count desc
Result:
[#<Tag id: 401, name: "different">, ... , #<Tag id: 4, name: "family">]
It does not return the count column for me. How can I get it?

Have you tried calling the count method on one of the returned Tag objects? Just because inspect doesn't mention the count doesn't mean that it isn't there. The inspect output:
[#<Tag id: 401, name: "different">, ... , #<Tag id: 4, name: "family">]
will only include things that the Tag class knows about and Tag will only know about the columns in the tags table: you only have id and name in the table so that's all you see.
If you do this:
tags = Tag.joins(:quote_tags).group('quote_tags.tag_id').order('count desc').select('count(tags.id) AS count, tags.id, tags.name')
and then look at the counts:
tags.map(&:count)
You'll see the array of counts that you're expecting.
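A rough plain-Ruby sketch of why this happens, with no Rails involved. FakeTag is a hypothetical stand-in and not ActiveRecord's actual implementation: it keeps every selected column, but its inspect only prints the columns the class knows about. The row values are made up for illustration.

```ruby
# Hypothetical stand-in for a model class; this is NOT how ActiveRecord is written.
class FakeTag
  KNOWN_COLUMNS = %w[id name].freeze

  def initialize(row)
    @attributes = row
  end

  # Print only the columns the class knows about, like the output in the question.
  def inspect
    shown = KNOWN_COLUMNS.map { |c| "#{c}: #{@attributes[c].inspect}" }.join(", ")
    "#<FakeTag #{shown}>"
  end

  # Every selected column, including aliases such as "count", is still readable.
  def method_missing(name, *args)
    key = name.to_s
    return @attributes[key] if args.empty? && @attributes.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

# Made-up result rows, shaped like the SELECT in the question.
rows = [
  { "count" => 12, "id" => 401, "name" => "different" },
  { "count" => 3,  "id" => 4,   "name" => "family" }
]
tags = rows.map { |row| FakeTag.new(row) }

p tags              # inspect hides the count column
p tags.map(&:count) # => [12, 3]
```

The point is only that the inspect output and the set of readable attributes are two different things.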

Update: The original version of this answer mistakenly characterized select, and subsequent versions ended up effectively repeating the current version of the other answer from @muistooshort. I'm leaving it in its current state because it has the information about using raw SQL. Thanks to @muistooshort for pointing out my error.
Although your query is in fact working as explained by the other answer, you can always execute raw SQL as an alternative.
There are a variety of select_... methods you can choose from, but I would think you'd want to use select_all. Assuming the build query that you implicitly generated was correct, you can just use that, as in:
ActiveRecord::Base.connection.select_all('
SELECT count(tags.id) AS count, tags.id, tags.name FROM `tags`
INNER JOIN `quote_tags` ON `quote_tags`.`tag_id` = `tags`.`id`
GROUP BY quote_tags.tag_id
ORDER BY count desc')
See http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html for information on the various methods you can choose from.

Related

Rails: AND operator in a has_many association

My relationship is a Client can have many ClientJobs. I want to be able to find clients that perform both Job a and Job b. I'm using 3 select boxes so I can pick a maximum of three jobs to select from. The select boxes are populated from the database.
I know how to test for 1 job with the query below. But I need a way to use an AND operator to test that both jobs exist for that client.
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: params[:job1]})
It's easy to do an IN operation like the one below, and I'm hoping the syntax for AND is similar.
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: [params[:job1], params[:job2]]})
EDIT: Posting the sql statement that hits the database from the answer below
Core Load (0.6ms) SELECT `clients`.* FROM `clients`
CoreStatistic Load (1.9ms) SELECT `client_jobs`.* FROM `client_jobs`
WHERE `client_jobs`.`client_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,........)
The second query runs through every client_job in the database. It is never filtered by params[:job1], params[:job2], etc. So @clients ends up nil, which crashes my view template
(undefined method `map' for nil:NilClass)
In my opinion, a better approach than self-joins is to simply join ClientJobs and then use GROUP BY and HAVING clauses to keep only those records that match all of the given associated records.
performed_jobs = %w(job1 job2 job3)
Client.joins(:client_jobs).
where(client_jobs: { job_name: performed_jobs }).
group("clients.id").
having("count(*) = #{performed_jobs.count}")
Let's walk through this query:
the first two clauses join the ClientJobs to Clients and keep only those rows that have any of the three jobs (this uses the IN clause)
next, we group these joined records by clients.id so that we get the clients back
finally, the having clause ensures we only return those clients that had exactly 3 ClientJob records joined in, i.e. only those that had all three client jobs defined.
It is the trick with HAVING(COUNT(*) = ...) that turns the IN clause (which is essentially an OR-ed list of options) into a "must have all these" clause.
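To make the trick concrete, here is a minimal plain-Ruby sketch that mimics the three SQL steps on an in-memory array. The rows and job names are hypothetical. One assumption to note: the query above uses count(*), which only works if a client never has the same job_name twice; the sketch counts distinct names instead, which would correspond to COUNT(DISTINCT client_jobs.job_name) in SQL.

```ruby
# Hypothetical rows standing in for the client_jobs table.
client_jobs = [
  { client_id: 1, job_name: "job1" },
  { client_id: 1, job_name: "job2" },
  { client_id: 2, job_name: "job1" },
  { client_id: 3, job_name: "job2" },
  { client_id: 3, job_name: "job1" },
  { client_id: 3, job_name: "job3" }
]

performed_jobs = %w[job1 job2]

client_ids = client_jobs
  .select { |row| performed_jobs.include?(row[:job_name]) } # WHERE job_name IN (...)
  .group_by { |row| row[:client_id] }                        # GROUP BY clients.id
  .select do |_id, rows|                                     # HAVING COUNT(DISTINCT job_name) = 2
    rows.map { |r| r[:job_name] }.uniq.size == performed_jobs.size
  end
  .keys

p client_ids # => [1, 3]
```

Client 2 passes the IN filter but is dropped by the count check, which is exactly what turns "any of these" into "all of these".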
To do this in a single SQL query try the following:
jobs_with_same_user = ClientJob.select(:user_id).where(job_name: "<job_name1>", user_id: ClientJob.select(:user_id).where(job_name: "<job_name2>"))
@clients = Client.where(id: jobs_with_same_user)
Here's what this query is doing:
1. Select the user_ids of all client jobs with [job_name2].
2. Select the user_ids of all client jobs with user_id IN the result set from (1) AND having [job_name1].
3. Select all clients, using (2) as a subquery.
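The same three steps can be traced in plain Ruby on a small in-memory array. This is only a sketch of the logic, not the SQL that Rails generates; the rows and job names are hypothetical, and the user_id column name simply follows the code above.

```ruby
# Hypothetical rows standing in for the client_jobs table.
client_jobs = [
  { user_id: 1, job_name: "painter" },
  { user_id: 1, job_name: "plumber" },
  { user_id: 2, job_name: "painter" },
  { user_id: 3, job_name: "plumber" }
]

# Step 1: user_ids of all client jobs with the second job name.
with_job2 = client_jobs
  .select { |r| r[:job_name] == "plumber" }
  .map { |r| r[:user_id] }

# Step 2: user_ids of jobs with the first job name whose user_id is in step 1.
with_both = client_jobs
  .select { |r| r[:job_name] == "painter" && with_job2.include?(r[:user_id]) }
  .map { |r| r[:user_id] }

# Step 3: in Rails this list would be the subquery passed to Client.where(id: ...).
p with_both # => [1]
```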
Not many know this but Rails 4+ supports subqueries. Basically this is a self join acting as subquery for the clients:
SELECT *
FROM clients
WHERE id IN <jobs_with_same_user>
Also, I'm not sure if you're referencing the client_jobs association in your view, but if you are, add the includes statement to avoid an N+1 query:
@clients = Client.includes(:client_jobs).where(id: jobs_with_same_user)
EDIT
If you prefer, the same result can be achieved with a self-referencing inner join:
jobs_with_same_user = ClientJob
.select("client_jobs.user_id AS user_id")
.joins("JOIN client_jobs inner_client_jobs ON inner_client_jobs.user_id=client_jobs.user_id")
.where(client_jobs: { job_name: "<first_job_name1>" }, inner_client_jobs: { job_name: "<job_name2>" })
@clients = Client.where(id: jobs_with_same_user)

Return duplicate records (activerecord, postgres)

I have the following query returning duplicate titles, but :id is nil:
Movie.select(:title).group(:title).having("count(*) > 1")
[#<Movie:0x007f81f7111c20 id: nil, title: "Fargo">,
#<Movie:0x007f81f7111ab8 id: nil, title: "Children of Men">,
#<Movie:0x007f81f7111950 id: nil, title: "The Martian">,
#<Movie:0x007f81f71117e8 id: nil, title: "Gravity">]
I tried adding :id to the select and group but it returns an empty array. How can I return the whole movie record, not just the titles?
A SQL-y Way
First, let's just solve the problem in SQL, so that the Rails-specific syntax doesn't trick us.
This SO question is a pretty clear parallel: Finding duplicate values in a SQL Table
The answer from KM (second from the top, non-checkmarked, at the moment) meets your criteria of returning all duplicated records along with their IDs. I've modified KM's SQL to match your table...
SELECT
m.id, m.title
FROM
movies m
INNER JOIN (
SELECT
title, COUNT(*) AS CountOf
FROM
movies
GROUP BY
title
HAVING COUNT(*)>1
) dupes
ON
m.title=dupes.title
The portion inside the INNER JOIN ( ) is essentially what you've generated already: a grouped table of duplicated titles and counts. The trick is JOINing it to the unmodified movies table, which will exclude any movies that don't have matches in the query of dupes.
Why is this so hard to generate in Rails? The trickiest part is that, because we're JOINing movies to movies, we have to create table aliases (m and dupes in my query above).
Sadly, Rails doesn't provide any clean way of declaring these aliases. Some references:
Rails GitHub issues mentioning "join" and "alias". Misery.
SO Question: ActiveRecord query with alias'd table names
Fortunately, since we've got the SQL in-hand, we can use the .find_by_sql method...
Movie.find_by_sql("SELECT m.id, m.title FROM movies m INNER JOIN (SELECT title, COUNT(*) FROM movies GROUP BY title HAVING COUNT(*)>1) dupes ON m.title=dupes.title")
Because we're calling Movie.find_by_sql, ActiveRecord assumes our hand-written SQL can be bundled into Movie objects. It doesn't massage or generate anything, which lets us do our aliases.
This approach has its shortcomings. It returns an array and not an ActiveRecord Relation, which means it can't be chained with other scopes. And, in the documentation for the find_by_sql method, we get extra discouragement...
This should be a last resort because using, for example, MySQL specific terms will lock you to using that particular database engine or require you to change your call if you switch engines.
A Rails-y Way
Really, what is the SQL doing above? It's getting a list of names that appear more than once. Then, it's matching that list against the original table. So, let's just do that using Rails.
titles_with_multiple = Movie.group(:title).having("count(title) > 1").count.keys
Movie.where(title: titles_with_multiple)
We call .keys because the first query returns a hash. The keys are our titles. The where() method can take an array, and we've handed it an array of titles. Winner.
You could argue one line of Ruby is more elegant than two. And if that one line of Ruby has an ungodly string of SQL embedded within it, how elegant is it really?
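For anyone who wants to see the two-step idea without a database, here is a plain-Ruby sketch on a hypothetical in-memory list of movies (the ids and the extra title "Alien" are made up). It mirrors the logic of the two Rails lines, not their generated SQL.

```ruby
# Hypothetical rows standing in for the movies table.
movies = [
  { id: 1, title: "Fargo" },
  { id: 2, title: "Gravity" },
  { id: 3, title: "Fargo" },
  { id: 4, title: "Alien" },
  { id: 5, title: "Gravity" }
]

# Like group(:title).count: a hash of title => number of rows.
counts = movies.each_with_object(Hash.new(0)) { |m, h| h[m[:title]] += 1 }

# Like having("count(title) > 1") followed by .keys.
titles_with_multiple = counts.select { |_title, n| n > 1 }.keys

# Like Movie.where(title: titles_with_multiple): whole records, ids included.
duplicates = movies.select { |m| titles_with_multiple.include?(m[:title]) }

p titles_with_multiple          # => ["Fargo", "Gravity"]
p duplicates.map { |m| m[:id] } # => [1, 2, 3, 5]
```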
Hope this helps!
You can try to add id in your select:
Movie.select([:id, :title]).group(:title).having("count(title) > 1")

Order with DISTINCT ids in rails with postgres

I have the following code to join two tables microposts and activities with micropost_id column and then order based on created_at of activities table with distinct micropost id.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return all micropost columns. This is not working in my development environment.
(PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list)
If I add activities.created_at to the SELECT DISTINCT, I get repeated micropost ids because they have distinct activities.created_at values. I have searched a lot to get this far, but the problem always persists because of this Postgres rule, which exists to avoid an arbitrary selection.
I want to select distinct micropost_id values, ordered by activities.created_at.
Please help.
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2);
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT bit in favor of using GROUP BY, since Postgres handles them much better. The MAX(activities.created_at) bit tells Postgres to, for each group of activities with the same micropost_id, sort by only the most recent.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
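To check what the GROUP BY plus MAX ordering computes, here is a plain-Ruby sketch that replays it on the three sample activities from the INSERT statement earlier in this answer. It uses an in-memory array instead of Postgres, so it only illustrates the logic.

```ruby
now = Time.now

# The three sample activities from the INSERT statement.
activities = [
  { id: 1, created_at: now,            micropost_id: 1 },
  { id: 2, created_at: now - 3 * 3600, micropost_id: 1 },
  { id: 3, created_at: now - 2 * 3600, micropost_id: 2 }
]

# GROUP BY micropost_id, keeping MAX(created_at) for each group.
latest = activities
  .group_by { |a| a[:micropost_id] }
  .transform_values { |rows| rows.map { |a| a[:created_at] }.max }

# ORDER BY MAX(created_at) DESC.
micropost_ids = latest.sort_by { |_id, newest| newest }.reverse.map(&:first)

p micropost_ids # => [1, 2]
```

Micropost 1 comes first because its most recent activity (id 1) is newer than the only activity on micropost 2, and each micropost appears exactly once.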
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq

No Method Error 'map' for #<Arel::Nodes::SqlLiteral>

I have the following example query:
source = "(SELECT DISTINCT source.* FROM (SELECT * FROM items) AS source) AS items"
items = Item.select("items.*").from(source).includes([:images])
p items # [#<Item id: 1>, #<Item id:2>]
However running:
p items.count
Results in NoMethodError: undefined method `map' for Arel::Nodes::SqlLiteral
I appreciate the query is silly; however, the non-simplified query is a bit too complicated to copy, and this was the smallest crashing version I could create. Any ideas?
Can you call all on that object to essentially cast it to an Array?
Item.select("items.*").from(source).includes([:images]).all.count
Or perhaps in that case, size would be more appropriate. In any case, this will execute the query and load all the objects into memory, which may not be desirable.
It looks like the problem is with your includes([:images]). On a similar application, I can execute this from the console:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').count
(0.5ms) SELECT COUNT(*) FROM (SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories
(Notice that the count overrides the SELECT clause, even though I explicitly specified categories.*. But they're still equivalent queries.)
As soon as I add an includes scope, it fails:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').includes(:projects).count
NoMethodError: undefined method `left' for #<Arel::Nodes::SqlLiteral:0x131d35248>
I tried a few different means of acquiring the count, like select('COUNT(categories.*)'), but they all failed in various ways. ActiveRecord seems to be falling back on a basic LEFT OUTER JOIN to perform the eager loading, possibly because it thinks you're using some kind of condition or external table to perform the join, and this seems to confuse its normal methods of performing the count. See the end of the section on Eager Loading in the ActiveRecord::Associations docs.
My Suggestion
If the join doesn't affect the number of rows returned in the outer query, I'd say your best bet is to execute one query to get the count and one query to get the actual results. We have to do something similar in our application for paging: one query returns the current page of results, and one returns the total number of records matching the filter criteria.
This is Rails issue #24193 (https://github.com/rails/rails/issues/24193), which concerns from combined with eager loading. The workaround is to use the form: Item.select("items.*").from([Arel.sql(source)]).includes([:images])
