Given a record and order conditions, find record(s) after or before - ruby-on-rails

For example, you have a list of items, sorted by priority. You have 10,000 items! If you are showing the user a single item, how do you provide buttons for the user to see the previous item or the next item (what are these items)?
You could pass the item's position to the item page and use OFFSET in your SQL query. The downside of this, apart from having to pass a number that may change, is that the database cannot jump to the offset; it has to read every record until it reaches, say, the 9001st record. This is slow. Having searched for a solution, I could not find one, so I wrote order_query.
order_query uses the same ORDER BY query, but also includes a WHERE clause that excludes records before (for next) or after (for prev) the current one.
Here is an example of what the criteria could look like (using the gem above):
p = Issue.find(31).relative_order_by_query(Issue.visible,
[[:priority, %w(high medium low)],
[:valid_votes_count, :desc, sql: '(votes - suspicious_votes)'],
[:updated_at, :desc],
[:id, :desc]])
p.before #=> ActiveRecord::Relation<...>
p.previous #=> Issue<...>
p.position #=> 5
p.next #=> Issue<...>
p.after #=> ActiveRecord::Relation<...>
Have I just reinvented the wheel here? I am very interested in other approaches of doing this on the backend.
Internally this gem builds a query that depends on the current record's order values and looks like:
SELECT ... WHERE
x0 OR
y0 AND (x1 OR
y1 AND (x2 OR
y2 AND ...))
ORDER BY ...
LIMIT 1
Here each x is a strict comparison (> or <) and each y is the matching equality (=) used to resolve ties, with one x/y pair per order criterion.
Example query from the test suite log:
-- Current record: priority='high' (votes - suspicious_votes)=4 updated_at='2014-03-19 10:23:18.671039' id=9
SELECT "issues".* FROM "issues" WHERE
("issues"."priority" IN ('medium','low') OR
"issues"."priority" = 'high' AND (
(votes - suspicious_votes) < 4 OR
(votes - suspicious_votes) = 4 AND (
"issues"."updated_at" < '2014-03-19 10:23:18.671039' OR
"issues"."updated_at" = '2014-03-19 10:23:18.671039' AND
"issues"."id" < 9)))
ORDER BY
"issues"."priority"='high' DESC,
"issues"."priority"='medium' DESC,
"issues"."priority"='low' DESC,
(votes - suspicious_votes) DESC,
"issues"."updated_at" DESC,
"issues"."id" DESC
LIMIT 1
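The nesting can be generated recursively. Here is a minimal sketch in plain Ruby; it is not order_query's actual implementation. after_condition is a hypothetical helper that takes [expression, direction, current value] triples and returns a SQL fragment with ? placeholders plus the bind values. It assumes plain :asc / :desc criteria only and does not handle enumerated orders such as the priority list.

```ruby
# Hypothetical helper (an assumption for illustration, not part of order_query).
# criteria: array of [sql_expression, :asc or :desc, current_record_value]
# Returns [sql_fragment, bind_values] selecting the rows that sort AFTER
# the current record.
def after_condition(criteria)
  (expr, dir, value), *rest = criteria
  op = dir == :desc ? '<' : '>'              # "after" means smaller when sorting DESC
  return ["#{expr} #{op} ?", [value]] if rest.empty?

  inner_sql, inner_binds = after_condition(rest)
  # x OR y AND (...): strictly past the current value, or tied and decided further in
  ["#{expr} #{op} ? OR #{expr} = ? AND (#{inner_sql})", [value, value, *inner_binds]]
end

sql, binds = after_condition([
  ['(votes - suspicious_votes)', :desc, 4],
  ['updated_at', :desc, '2014-03-19 10:23:18.671039'],
  ['id', :desc, 9]
])
puts sql
puts binds.inspect
```

Returning placeholders and binds separately leaves the quoting to the database driver.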

I found an alternative approach, and it uses a construct from the SQL '92 standard (Predicates 209), the row values constructor comparison predicate:
Let Rx and Ry be the two row value constructors of the comparison predicate and let RXi and RYi be the i-th row value constructor elements of Rx and Ry, respectively. "Rx comp op Ry" is true, false, or unknown as follows:
"Rx = Ry" is true if and only if RXi = RYi for all i.
"Rx <> Ry" is true if and only if RXi <> RYi for some i.
"Rx < Ry" is true if and only if RXi = RYi for all i < n and RXn < RYn for some n.
"Rx > Ry" is true if and only if RXi = RYi for all i < n and RXn > RYn for some n.
I found an example in this article by Markus Winand. A row value constructor comparison predicate can be used like this:
SELECT *
FROM sales
WHERE (sale_date, sale_id) < (?, ?)
ORDER BY sale_date DESC, sale_id DESC
This is roughly equivalent to this query:
SELECT *
FROM sales
WHERE sale_date < ? OR (sale_date = ? AND sale_id < ?)
ORDER BY sale_date DESC, sale_id DESC
The first caveat is that, to use this directly, all the order components have to sort in the same direction; otherwise more fiddling is required. The second is that, despite being standard, row value comparison predicates are not supported by every database (they do work on PostgreSQL).
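Ruby's Array#<=> happens to compare arrays element by element in the same lexicographic way, so the equivalence of the two WHERE clauses can be checked outside the database. A small sketch with made-up rows (the dates and ids are assumptions for illustration only):

```ruby
require 'date'

# Expanded form: sale_date < d OR (sale_date = d AND sale_id < i)
def before_expanded?(row, cursor)
  row[0] < cursor[0] || (row[0] == cursor[0] && row[1] < cursor[1])
end

# Row-value form: (sale_date, sale_id) < (d, i)
# Array#<=> compares element by element, left to right.
def before_row_value?(row, cursor)
  (row <=> cursor) < 0
end

cursor = [Date.new(2014, 3, 19), 9]
rows = [
  [Date.new(2014, 3, 18), 50], # earlier date: before the cursor
  [Date.new(2014, 3, 19), 8],  # same date, smaller id: before
  [Date.new(2014, 3, 19), 9],  # the cursor row itself: not before
  [Date.new(2014, 3, 20), 1]   # later date: not before
]

rows.each do |row|
  puts "#{row[0]} #{row[1]}: #{before_expanded?(row, cursor)} / #{before_row_value?(row, cursor)}"
end
```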

Related

Rails .where query chained to sql function, is there a way to call it on the results without converting them to an array?

I have a method that ranks user's response rates in our system called ranked_users
def ranked_users
User.joins(:responds).group(:id).select(
"users.*, SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END ) avg, RANK () OVER (
ORDER BY SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END ) DESC, CASE WHEN users.id = '#{
current_user.id
}' THEN 1 ELSE 0 END DESC
) rank"
)
.where('users.active = true')
.where('answers.created_at BETWEEN ? AND ?', Time.now - 12.months, Time.now)
end
result = ranked_users
I then take the top three with top_3 = ranked_users.limit(3)
If the user is not in the top 3, I want to append them with their rank to the list:
user_rank = result.find_by(id: current_user.id)
Whenever I call user_rank.rank it returns 1. I know this is because it's applying the find_by clause first and then ranking them. Is there a way to enforce the find_by clause happens only on the result of the first query? I tried doing result.load.find_by(...) but had the same issue. I could convert the entire result into an array but I want the solution to be highly scalable.
If you expect lots of users with lots of answers and a high load on your rating system, you can create a materialized view for the ranking query with (user_id, avg, rank, etc.) and refresh it periodically (say, a few times per day or even less often) instead of calculating the rank every time. The scenic gem supports this.
You can even have indexes on rank and user id on the view and your query will be two simple fast reads from it.
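To see why find_by returns rank 1 in the question, here is a plain-Ruby sketch of the two evaluation orders. Score and with_rank are hypothetical stand-ins for the joined rows and for RANK(), and the numbers are made up. Filtering first leaves a single row to rank, while ranking first (which is what reading from a precomputed view amounts to) keeps the true position.

```ruby
# Hypothetical stand-in for one aggregated row per user.
Score = Struct.new(:user_id, :responses)

# Mimics RANK() OVER (ORDER BY responses DESC): 1 + number of strictly better rows.
def with_rank(scores)
  scores.map { |s| [s.user_id, 1 + scores.count { |o| o.responses > s.responses }] }.to_h
end

scores = [Score.new(1, 40), Score.new(2, 35), Score.new(3, 35), Score.new(4, 10)]

# WHERE id = 4 applied before the window function: only one row gets ranked.
filtered_first = with_rank(scores.select { |s| s.user_id == 4 })[4]

# Rank all rows first, then look the user up in the ranked result.
ranked_first = with_rank(scores)[4]

puts "filtered first: #{filtered_first}, ranked first: #{ranked_first}"
```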

All off with join if includes if and only if query does not work with Rails and Postgres

What I want is to query for all media that only has "fra" language associated through the habtm association. Currently, this query returns Media instances that include "fra", so e.g. ["fra", "eng"]. I use Postgres.
So, (1) I want to find the media that has exactly the language 'x' and no other, (2) find the media that has exactly the languages 'x' and 'y', (3) find the media that has exactly the languages 'x', 'y' and 'z' etc.
class Media < ActiveRecord::Base
has_and_belongs_to_many :audio_language_records, :join_table => "audio_languages_media", :class_name => "Language"
end
class Language < ActiveRecord::Base
attr_accessor :code
end
I have tried this, but it returns all media that includes "fra", but not only "fra"
Media.joins(:audio_language_records).where("languages.code = ? AND languages.code != ?", "fra", "eng")
The following does not work as intended and returns the same result
Media.joins(:audio_language_records).where("languages.code IN (?) AND languages.code NOT IN (?)", ["fra"], ["eng"])
So you need to find the Media for which there exists a language.code of 'eng' but no entries in the join table for the same media_id but a different language_id.
SELECT * FROM media m1
JOIN audio_languages_media alm1 ON alm1.media_id = m1.id
JOIN languages l1 ON alm1.language_id = l1.id
WHERE NOT EXISTS(
SELECT 1 FROM audio_languages_media alm2
WHERE alm1.language_id != alm2.language_id
AND alm1.media_id = alm2.media_id
)
AND l1.code = 'eng';
Let us know if this is the right db query so we can help with the AREL.
Edit: Query for when you want to find a media that is in at least 'eng' and 'fra'
SELECT * FROM media m1
WHERE(
SELECT count(*) FROM audio_languages_media alm2
JOIN languages l2 ON alm2.language_id = l2.id
WHERE l2.code in ('eng','fra')
AND alm2.media_id = m1.id
) > 1;
Edit: Add @chinshr's query
If you want media that has only exactly 'eng' and 'fra'
SELECT * FROM media m1
WHERE (
  -- both wanted languages are attached to this media
  SELECT count(*) FROM audio_languages_media alm2
  JOIN languages l2 ON alm2.language_id = l2.id
  WHERE l2.code IN ('eng','fra')
  AND alm2.media_id = m1.id
) = 2
AND (
  -- and no other language is attached
  SELECT count(*) FROM audio_languages_media alm3
  WHERE alm3.media_id = m1.id
) = 2;
This query can be tweaked for more or fewer languages by adding to or removing from the IN list and adjusting both counts to equal the number of elements in the IN list.
For this to work reliably, you must have a unique index on audio_languages_media(media_id, language_id).
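The "exactly these languages" rule is a set comparison per media. A minimal plain-Ruby sketch of that logic over made-up join rows follows; the rows and the media_with_exactly helper are assumptions for illustration. In a real application the counting should stay in SQL as above.

```ruby
require 'set'

# Hypothetical join rows: [media_id, language_code]
rows = [
  [1, 'fra'],
  [2, 'fra'], [2, 'eng'],
  [3, 'eng'], [3, 'fra'], [3, 'deu']
]

# Keep the media whose full set of language codes equals the wanted set.
def media_with_exactly(rows, codes)
  wanted = codes.to_set
  rows.group_by(&:first)
      .select { |_media_id, pairs| pairs.map(&:last).to_set == wanted }
      .keys
end

puts media_with_exactly(rows, ['fra']).inspect
puts media_with_exactly(rows, ['eng', 'fra']).inspect
```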
You only have one value per row so every value that is "fra" can't be "eng" or any other thing. You're gonna have to check two rows. It can be done in SQL like this for example:
SELECT *
FROM languages t1
LEFT JOIN languages t2
ON t1.id=t2.id AND t2.code='eng'
WHERE t1.code='fra' AND t2.code IS NULL;

How to get records based on an offset around a particular record?

I'm building a search UI which searches for comments. When a user clicks on a search result (comment), I want to show the surrounding comments.
My model:
Group (id, title) - A Group has many comments
Comment (id, group_id, content)
For example:
When a user clicks on a comment with comment.id equal to 26. I would first find all the comments for that group:
comment = Comment.find(26)
comments = Comment.where(:group_id => comment.group_id)
I now have all of the group's comments. What I then want to do is show comment.id 26, with a max of 10 comments before and 10 comments after.
How can I modify comments to show that offset?
Sounds simple, but it's tricky to get the best performance for this. In any case, you must let the database do the work. That will be faster by an order of magnitude than fetching all rows and filtering or sorting them on the client side.
If by "before" and "after" you mean smaller / bigger comment.id, and we further assume that there can be gaps in the id space, this one query should do all:
WITH x AS (SELECT id, group_id FROM comment WHERE id = 26) -- enter value once
(
SELECT *
FROM x
JOIN comment c USING (group_id)
WHERE c.id > x.id
ORDER BY c.id
LIMIT 10
)
UNION ALL
(
SELECT *
FROM x
JOIN comment c USING (group_id)
WHERE c.id < x.id
ORDER BY c.id DESC
LIMIT 10
)
I'll leave paraphrasing that in Ruby syntax to you, that's not my area of expertise.
Returns 10 earlier comments and 10 later ones. Fewer if fewer exist. Use <= in the 2nd leg of the UNION ALL query to include the selected comment itself.
If you need the rows sorted, add another query level on top with ORDER BY.
Should be very fast in combination with these two indexes for the table comment:
one on (id) - probably covered automatically by the primary key.
one on (group_id, id)
For read-only data you could create a materialized view with a gap-less row-number that would make this even faster.
More explanation about parentheses, indexes, and performance in this closely related answer.
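The two legs of the UNION ALL can be mirrored in plain Ruby to check the expected result on a small example. This is only a sketch of the logic over an in-memory list of ids (the surrounding helper and the sample ids are hypothetical); unlike the query above it includes the selected comment and returns the rows already sorted.

```ruby
# ids: all comment ids of one group (gaps allowed); current: the selected id.
def surrounding(ids, current, limit = 10)
  later   = ids.select { |id| id > current }.sort.first(limit)          # ORDER BY id LIMIT n
  earlier = ids.select { |id| id < current }.sort.reverse.first(limit)  # ORDER BY id DESC LIMIT n
  earlier.reverse + [current] + later
end

puts surrounding([3, 7, 20, 26, 31, 40], 26, 2).inspect
```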
Something like:
comment = Comment.find(26)
before_comments = Comment.
where('created_at <= ?', comment.created_at).
where('id != ?', comment.id).
where(group_id: comment.group_id).
order('created_at DESC').limit(10)
after_comments = Comment.
where('created_at >= ?', comment.created_at).
where('id != ?', comment.id).
where(group_id: comment.group_id).
order('created_at ASC').limit(10)

Find overlapping seasons where seasons have_many date_ranges

I have the following setup:
class Season < AR::Base
has_many :date_ranges
end
class DateRange < AR::Base
# has a :starts_at & :ends_at
end
How would I find all overlapping seasons from a season instance? I have already tried a couple of different queries (below), but the problem I keep hitting is that the season I'm checking may itself have multiple date_ranges. I could solve it with a loop, but I'd rather use only a query.
This query looks up all the seasons that overlap but it only does that for 1 input date_range
Season.joins(:date_ranges).where("starts_at <= ? AND ends_at >= ?", ends_at, starts_at)
Maybe I need something to chain a couple of OR's together for each date_range on the instance but where() only uses AND.
So in short, finding the overlap is not the problem, but how do I find overlap of multiple date_ranges to the entire database?
The easiest way to do this is through straight SQL. Something like this:
DateRange.find_by_sql([%q{
  select a.*
  from date_ranges a
  join date_ranges b on
    a.id < b.id
    and (
      (a.ends_at >= b.starts_at and a.ends_at <= b.ends_at)
      or (a.starts_at >= b.starts_at and a.starts_at <= b.ends_at)
      or (a.starts_at <= b.starts_at and a.ends_at >= b.ends_at)
    )
  where a.season_id = ?
}, season_id])
The basic idea is to join the table to itself so that you can easily compare the ranges. The a.id < b.id is there to get unique results and filter out "ranges matches itself" cases. The inner or conditions check for both types of overlaps:
[as-----ae]          [as-----ae]
    [bs-----be]  [bs-----be]
and
[as--------------ae]      [as----ae]
    [bs----be]        [bs--------------be]
You might want to think about the end points though: that query considers two intervals to overlap if they only match at an endpoint, and that might not be what you want.
Presumably you already have a unique constraint on the (season_id, starts_at, ends_at) triples and presumably you're already ensuring that starts_at <= ends_at.
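Assuming starts_at <= ends_at always holds, the three OR branches collapse into a single two-sided test, the same one the question already uses for one range. A plain-Ruby sketch using Range objects as stand-ins for date_ranges rows; the method names are hypothetical:

```ruby
# The three cases from the query, spelled out.
def overlap_three_cases?(a, b)
  (a.end >= b.begin && a.end <= b.end) ||
    (a.begin >= b.begin && a.begin <= b.end) ||
    (a.begin <= b.begin && a.end >= b.end)
end

# Equivalent shorter test: each range starts before the other one ends.
def overlap?(a, b)
  a.begin <= b.end && b.begin <= a.end
end

# Two seasons overlap if any pair of their date ranges overlaps.
def seasons_overlap?(ranges_a, ranges_b)
  ranges_a.any? { |a| ranges_b.any? { |b| overlap?(a, b) } }
end

puts seasons_overlap?([(1..3), (10..12)], [(4..5), (12..15)])
puts seasons_overlap?([(1..3)], [(4..5)])
```

Like the SQL above, this counts ranges that only touch at an endpoint as overlapping; use < instead of <= to exclude them.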

Limit an array by the sum of a value within the records in rails3

So let's say I have a Post model in which each record has a user_id and a field "num" holding a random number.
So I make this:
@posts = Post.where(:user_id => 1)
Now let's say I want to limit the records in @posts so that their num values sum to 50 or more, with only the final record taking the total over the limit. So it would be adding post.num + post2.num + post3.num etc., until the total reaches at least 50.
Is there a way to do this?
I would say to just grab all of the records like you already are:
@posts = Post.where(:user_id => 1)
and then use Ruby to do the rest:
sum = 0
selected = []
@posts.each do |post|
  break if sum >= 50  # stop once the limit has been reached
  selected << post
  sum += post.num
end
There's probably a more elegant way, but this will work. It collects posts in order until the sum reaches or exceeds 50, so only the final post takes the total over the limit; selected then holds those records. Hopefully I understood your question.
You need to use the PostgreSQL Window functions
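This gives you the rows with a running sum lower than 50. The window function is computed in a subquery because its alias cannot be referenced in the WHERE clause of the same query level, and it is ordered by id so that each row gets its own running total:
SELECT s.id, s.num_sum
FROM (
  SELECT a.id, sum(a.num) OVER (ORDER BY a.id) AS num_sum
  FROM posts a
  WHERE a.user_id = 1
) s
WHERE s.num_sum < 50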
But your case is trickier, as you want to go over the limit by one row:
WITH running AS (
  SELECT a.id, sum(a.num) OVER (ORDER BY a.id) AS num_sum
  FROM posts a
  WHERE a.user_id = 1
)
SELECT r.id, r.num_sum
FROM running r
WHERE r.num_sum <= (
  SELECT MIN(c.num_sum)
  FROM running c
  WHERE c.num_sum >= 50
)
You have to convert this SQL to Arel.

Resources