Querying a JSONB array of objects in Rails 7 - ruby-on-rails

I'm on Ruby on Rails 7.
I have an ActiveRecord class called Thread.
There are 2 records:
id: 1,
subject: "Hello",
participants:
[{"name"=>"guillaume", "email"=>"guillaume@example.com"},
{"name"=>"Fabien", "email"=>"fabien@example.com"},]
id: 2,
subject: "World",
participants:
[{"name"=>"guillaume", "email"=>"guillaume@example.com"},
{"name"=>"hakim", "email"=>"hakim@example.com"},]
participants is a JSONB array column.
I want to find the records that have the email "hakim@example.com" in the participants column.

I don't think this is supported directly by AR.
You can "unwrap" the JSONB array using jsonb_array_elements, which, as the docs put it, "Expands the top-level JSON array into a set of JSON values."
See https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE
You can query the DB with SQL like this:
select * from threads
where exists(
select true
from jsonb_array_elements(participants) as users
where users->>'email' = 'fabien@example.com'
)
Which translates into something like this in AR
condition = <<~SQL
exists(
select true
from jsonb_array_elements(participants) as users
where users->>'email' = ?
)
SQL
Thread.where(condition, 'fabien@example.com')
(Written off the top of my head; I can't test this.)
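To make the semantics of the EXISTS query concrete without a database, here is a plain-Ruby sketch that applies the same predicate in memory: a row matches when at least one element of its participants array has the given email. The THREADS data and the threads_with_participant helper are hypothetical illustrations; in a real app the filtering should stay in Postgres, as in the SQL above.

```ruby
require "json"

# Hypothetical in-memory copy of the two rows from the question.
# participants is kept as a JSON string, the way it would be stored in a JSONB column.
THREADS = [
  { id: 1, subject: "Hello",
    participants: '[{"name":"guillaume","email":"guillaume@example.com"},' \
                  '{"name":"Fabien","email":"fabien@example.com"}]' },
  { id: 2, subject: "World",
    participants: '[{"name":"guillaume","email":"guillaume@example.com"},' \
                  '{"name":"hakim","email":"hakim@example.com"}]' }
].freeze

# Mirrors: WHERE EXISTS (SELECT true FROM jsonb_array_elements(participants) AS users
#                        WHERE users->>'email' = ?)
def threads_with_participant(threads, email)
  threads.select do |thread|
    # Expand the top-level array into its elements, then test each one.
    JSON.parse(thread[:participants]).any? { |user| user["email"] == email }
  end
end

p threads_with_participant(THREADS, "hakim@example.com").map { |t| t[:id] } # => [2]
```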
IMHO:
This is something I often see happen with JSONB data: it looks nice and simple at first, then turns out to be harder to query, validate, insert into, and keep consistent than a proper relationship (e.g. a participants table that links users and threads).

Related

Active Record .includes with where clause

I'm attempting to avoid an N+1 query by using includes, but I need to filter out some of the child records. Here's what I have so far:
Column.includes(:tickets).where(board_id: 1, tickets: {sprint_id: 10})
The problem is that only Columns containing Tickets with sprint_id of 10 are returned. I want to return all Columns with board_id of 1, and pre-fetch only the tickets with a sprint_id of 10, so that column.tickets is either an empty list or a list of Ticket objects with sprint_id 10.
This is how includes is intended to work: a where clause applies to the entire query, not just to the loading of the associated records.
One way around this is to flip the query around:
columns = Ticket.eager_load(:column)
.where(sprint_id: 10, columns: { board_id: 1 })
.map(&:column)
.uniq
In Column.includes(:tickets).where(board_id: 1, tickets: {sprint_id: 10}), the where clause references the tickets table through a hash condition, so Rails switches includes to eager loading with a single LEFT OUTER JOIN query. That is why the sprint_id condition filters which columns are returned, not just which tickets are loaded.
To get all the related columns without loading unwanted tickets, you can do this:
columns = Column.where(board_id: 1).to_a
tickets = Ticket.where(column_id: columns.map(&:id), sprint_id: 10).to_a
With this approach you can't call #tickets on each column (that would hit the database again and bring back the N+1 problem). To access a column's tickets without making any more queries, you can group the tickets yourself:
grouped_tickets = tickets.group_by(&:column_id)
columns.each do |column|
column_tickets = grouped_tickets[column.id]
# Do something with column_tickets if they're present.
end
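Here is a self-contained sketch of that grouping step, using plain Ruby structs as hypothetical stand-ins for the Column and Ticket records (no database involved). It also shows one small design choice: looking up with Hash#fetch and a default so that a column with no matching tickets gets an empty array rather than nil.

```ruby
# Hypothetical stand-ins for the Column and Ticket models; in the answer these
# records come from Column.where(board_id: 1).to_a and
# Ticket.where(column_id: columns.map(&:id), sprint_id: 10).to_a.
Column = Struct.new(:id, :board_id)
Ticket = Struct.new(:id, :column_id, :sprint_id)

columns = [Column.new(1, 1), Column.new(2, 1), Column.new(3, 1)]
tickets = [Ticket.new(10, 1, 10), Ticket.new(11, 1, 10), Ticket.new(12, 3, 10)]

# One pass over the tickets builds a column_id => [tickets] lookup table.
grouped_tickets = tickets.group_by(&:column_id)

# fetch with a default gives columns without matching tickets an empty
# array instead of nil (Array#to_h with a block needs Ruby 2.6+).
tickets_by_column = columns.to_h do |column|
  [column.id, grouped_tickets.fetch(column.id, [])]
end

columns.each do |column|
  puts "column #{column.id}: tickets #{tickets_by_column[column.id].map(&:id).inspect}"
end
```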

Populate an active record collection with different SQLs on the same model in rails

I'm trying to populate an ActiveRecord collection from several SQL queries on the same model. The only thing that differs between the queries is the where clause. My models have a type_id. As an example, I have:
models = Model.where("type_id = ?", 1)
logger.debug 'models.count ' + models.count.to_s
m = Model.where("type_id = ?", 2)
models << m
logger.debug 'models.count ' + models.count.to_s
From that, my logfile shows me
SELECT COUNT(*) FROM "models" WHERE (type_id = 1)
models.count 1
SELECT COUNT(*) FROM "models" WHERE (type_id = 1)
models.count 1
The second query is not what I need; I expected:
SELECT COUNT(*) FROM "models" WHERE (type_id = 2)
The only workaround I've found is to call Model.all, iterate over every record and add the ones I want. That would be very time-consuming for a large table. Is there a better way?
From the sounds of it, you're looking for any Model with a type_id of either 1 or 2. In SQL, you would express this as an IN subclause:
SELECT * FROM models WHERE type_id IN (1, 2);
In Rails, you can pass an array of acceptable values to a where call to generate the SQL IN statement:
Model.where(:type_id => [1, 2])
As @ArtOfCode said, what you want is to do the query in one pass. That said, what you are trying to do won't work: when you add the result of your second query to the first one with <<, you are just appending that object to the first collection in memory. The result is an ActiveRecord_Relation that happens to hold two instances of your model (in this case Model), but when you call count on it, that actually executes an ActiveRecord query.
How can you tell the difference? Run the code you posted and then call:
models.count
You'll see that SQL is executed for the conditions of the original models query. However, if you do this:
models.length
You'll notice the result is 2. That is the length of the in-memory collection held inside the ActiveRecord_Relation: << adds object instances to the relation, but that does not make them part of the query.
You could even do this:
models << Model.new
Calling models.length would then return 3, because that is the number of model instances contained in the relation, again not part of the query. As you can see, you can even add new instances that have never been saved to the database.
TL;DR: if you want to query objects stored in the database, do it in the query itself or chain the conditions together, but don't try to mix ActiveRecord relation collections.

Return duplicate records (activerecord, postgres)

I have the following query returning duplicate titles, but :id is nil:
Movie.select(:title).group(:title).having("count(*) > 1")
[#<Movie:0x007f81f7111c20 id: nil, title: "Fargo">,
#<Movie:0x007f81f7111ab8 id: nil, title: "Children of Men">,
#<Movie:0x007f81f7111950 id: nil, title: "The Martian">,
#<Movie:0x007f81f71117e8 id: nil, title: "Gravity">]
I tried adding :id to the select and group but it returns an empty array. How can I return the whole movie record, not just the titles?
A SQL-y Way
First, let's just solve the problem in SQL, so that the Rails-specific syntax doesn't trick us.
This SO question is a pretty clear parallel: Finding duplicate values in a SQL Table
The answer from KM (second from the top, non-checkmarked, at the moment) meets your criteria of returning all duplicated records along with their IDs. I've modified KM's SQL to match your table...
SELECT
  m.id, m.title
FROM
  movies m
INNER JOIN (
  SELECT
    title, COUNT(*) AS CountOf
  FROM
    movies
  GROUP BY
    title
  HAVING COUNT(*) > 1
) dupes
ON
  m.title = dupes.title
The portion inside the INNER JOIN ( ) is essentially what you've generated already. A grouped table of duplicated titles and counts. The trick is JOINing it to the unmodified movies table, which will exclude any movies that don't have matches in the query of dupes.
Why is this so hard to generate in Rails? The trickiest part is that, because we're JOINing movies to movies, we have to create table aliases (m and dupes in my query above).
Sadly, Rails doesn't provide any clean way of declaring these aliases. Some references:
Rails GitHub issues mentioning "join" and "alias". Misery.
SO Question: ActiveRecord query with alias'd table names
Fortunately, since we've got the SQL in-hand, we can use the .find_by_sql method...
Movie.find_by_sql("SELECT m.id, m.title FROM movies m INNER JOIN (SELECT title, COUNT(*) FROM movies GROUP BY title HAVING COUNT(*) > 1) dupes ON m.title = dupes.title")
Because we're calling Movie.find_by_sql, ActiveRecord assumes our hand-written SQL can be bundled into Movie objects. It doesn't massage or generate anything, which lets us do our aliases.
This approach has its shortcomings. It returns an array and not an ActiveRecord Relation, which means it can't be chained with other scopes. And, in the documentation for the find_by_sql method, we get extra discouragement...
This should be a last resort because using, for example, MySQL specific terms will lock you to using that particular database engine or require you to change your call if you switch engines.
A Rails-y Way
Really, what is the SQL doing above? It's getting a list of names that appear more than once. Then, it's matching that list against the original table. So, let's just do that using Rails.
titles_with_multiple = Movie.group(:title).having("count(title) > 1").count.keys
Movie.where(title: titles_with_multiple)
We call .keys because the first query returns a hash whose keys are our titles. The where() method can take an array, and we've handed it an array of titles. Winner.
You could argue one line of Ruby is more elegant than two. And if that one line of Ruby has an ungodly string of SQL embedded within it, how elegant is it really?
Hope this helps!
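The two-step Rails-y approach can be mirrored in plain Ruby to check the logic without a database. This is a sketch over hypothetical sample rows (the titles come from the question, the ids are made up); step 1 plays the role of the grouped count and step 2 the role of where(title: ...).

```ruby
# Hypothetical sample rows; titles borrowed from the question, ids made up.
Movie = Struct.new(:id, :title)

movies = [
  Movie.new(1, "Fargo"),
  Movie.new(2, "Gravity"),
  Movie.new(3, "Fargo"),
  Movie.new(4, "The Martian"),
  Movie.new(5, "Gravity")
]

# Step 1, like Movie.group(:title).having("count(title) > 1").count:
# a title => count hash restricted to titles seen more than once.
# (Array#tally needs Ruby 2.7+.)
counts = movies.map(&:title).tally.select { |_title, n| n > 1 }

# Step 2, like Movie.where(title: titles_with_multiple):
# keep every full record (id included) whose title is a duplicated key.
titles_with_multiple = counts.keys
duplicates = movies.select { |movie| titles_with_multiple.include?(movie.title) }

duplicates.each { |movie| puts "#{movie.id}: #{movie.title}" }
```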
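You can try adding id to your select (note that PostgreSQL rejects this unless movies.id is also in the GROUP BY or wrapped in an aggregate function):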
Movie.select([:id, :title]).group(:title).having("count(title) > 1")

Order with DISTINCT ids in rails with postgres

I have the following code to join two tables, microposts and activities, on the micropost_id column, and then order by the created_at column of the activities table while keeping micropost ids distinct.
Micropost.joins("INNER JOIN activities ON
(activities.micropost_id = microposts.id)").
where('activities.user_id= ?',id).order('activities.created_at DESC').
select("DISTINCT (microposts.id), *")
which should return all micropost columns. This is not working in my development environment:
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
If I add activities.created_at to the SELECT DISTINCT, I get repeated micropost ids because they have distinct activities.created_at values. I have done a lot of searching to get this far, but the problem persists because of this Postgres restriction, which exists to avoid an arbitrary selection.
I want to select distinct micropost_ids ordered by activities.created_at.
Please help.
To start with, we need to quickly cover what SELECT DISTINCT is actually doing. It looks like just a nice keyword to make sure you only get back distinct values, which shouldn't change anything, right? Except as you're finding out, behind the scenes, SELECT DISTINCT is actually acting more like a GROUP BY. If you want to select distinct values of something, you can only order that result set by the same values you're selecting -- otherwise, Postgres doesn't know what to do.
To explain where the ambiguity comes from, consider this simple set of data for your activities:
CREATE TABLE activities (
id INTEGER PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE,
micropost_id INTEGER REFERENCES microposts(id)
);
INSERT INTO activities (id, created_at, micropost_id)
VALUES (1, current_timestamp, 1),
(2, current_timestamp - interval '3 hours', 1),
(3, current_timestamp - interval '2 hours', 2);
You stated in your question that you want "distinct micropost_id" "based on order of activities.created_at". It's easy to order these activities by descending created_at (1, 3, 2), but both 1 and 2 have the same micropost_id of 1. So if you want the query to return just micropost IDs, should it return 1, 2 or 2, 1?
If you can answer the above question, you need to take your logic for doing so and move it into your query. Let's say that, and I think this is pretty likely, you want this to be a list of microposts which were most recently acted on. In that case, you want to sort the microposts in descending order of their most recent activity. Postgres can do that for you, in a number of ways, but the easiest way in my mind is this:
SELECT micropost_id
FROM activities
JOIN microposts ON activities.micropost_id = microposts.id
GROUP BY micropost_id
ORDER BY MAX(activities.created_at) DESC
Note that I've dropped the SELECT DISTINCT in favor of GROUP BY, which lets you order by an aggregate computed over each group. The MAX(activities.created_at) tells Postgres to sort each group of activities with the same micropost_id by its most recent activity only.
You can translate the above to Rails like so:
Micropost.select('microposts.*')
.joins("JOIN activities ON activities.micropost_id = microposts.id")
.where('activities.user_id' => id)
.group('microposts.id')
.order('MAX(activities.created_at) DESC')
Hope this helps! You can play around with this sqlFiddle if you want to understand more about how the query works.
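To see how GROUP BY plus MAX resolves the "1, 2 or 2, 1?" ambiguity, here is a plain-Ruby sketch (no database) that applies the same two steps to the three example activities. The Activity struct and the in-memory rows are hypothetical stand-ins for the table, not anything ActiveRecord provides.

```ruby
# Hypothetical in-memory copy of the three example activities.
Activity = Struct.new(:id, :created_at, :micropost_id)

now = Time.now
activities = [
  Activity.new(1, now, 1),            # current_timestamp
  Activity.new(2, now - 3 * 3600, 1), # current_timestamp - interval '3 hours'
  Activity.new(3, now - 2 * 3600, 2)  # current_timestamp - interval '2 hours'
]

# GROUP BY micropost_id, keeping MAX(created_at) for each group.
latest_by_micropost = activities
  .group_by(&:micropost_id)
  .transform_values { |acts| acts.map(&:created_at).max }

# ORDER BY MAX(created_at) DESC: most recently active micropost first.
ordered_micropost_ids = latest_by_micropost
  .sort_by { |_micropost_id, latest| latest }
  .reverse
  .map(&:first)

p ordered_micropost_ids # => [1, 2]
```

Because each micropost is reduced to a single timestamp before sorting, there is exactly one position per micropost and the ordering is no longer ambiguous.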
Try the below code
Micropost.select('microposts.*, activities.created_at')
.joins("INNER JOIN activities ON (activities.micropost_id = microposts.id)")
.where('activities.user_id= ?',id)
.order('activities.created_at DESC')
.uniq

Rails 3 Comparing foreign key to list of ids using activerecord

I have a relationship between two models, Registers and Competitions. I have a very complicated dynamic query that is being built, and if the conditions are right I need to limit Register records to only those whose Competition parent meets certain criteria. To do this without selecting from the Competition table, I was thinking of something along the lines of...
Register.where("competition_id in ?", Competition.where("...").collect {|i| i.id})
Which produces this SQL:
SELECT "registers".* FROM "registers" WHERE (competition_id in 1,2,3,4...)
I don't think PostgreSQL likes the fact that the IN parameters aren't surrounded by parentheses. How can I compare the Register foreign key to a list of competition ids?
You can make it a bit shorter and skip the collect (this worked for me in 3.2.3):
Register.where(competition_id: Competition.where("..."))
This will result in the following SQL:
SELECT "registers".* FROM "registers" WHERE "registers"."competition_id" IN (SELECT "competitions"."id" FROM "competitions" WHERE "...")
Try this instead:
competitions = Competition.where("...").collect {|i| i.id}
Register.where(:competition_id => competitions)
