Select n objects randomly with condition in Rails - ruby-on-rails

I have a model called Post with a column called vote, and the table holds a large number of posts.
I want to select n posts at random that have >= x votes. n is very small compared to the total number of posts.
What is the best way to do this? I've tried a couple of approaches that seem very inefficient. Thanks.

If you're on MySQL, you could order all the posts that meet the greater than criteria randomly and select the top n.
The actual query would look like
SELECT * FROM posts WHERE votes >= x ORDER BY rand() LIMIT n
Haven't tested this, but something like this should work in Rails:
Post.all(:conditions => ["votes >= ?", x], :order => "rand()", :limit => n)
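To sanity-check what that query is expected to return, here is a minimal plain-Ruby sketch of the same semantics: filter by the vote threshold, then pick n rows at random. PostRow and random_posts are hypothetical names, not part of Rails, and this in-memory version is only an illustration; on a big table the selection should stay in the database as shown above.

```ruby
# Hypothetical stand-in for a row of the posts table (not an ActiveRecord model).
PostRow = Struct.new(:id, :votes)

# Mirrors: SELECT * FROM posts WHERE votes >= x ORDER BY rand() LIMIT n
def random_posts(posts, min_votes, n, random: Random.new)
  eligible = posts.select { |post| post.votes >= min_votes }
  eligible.sample(n, random: random) # at most n distinct rows, in random order
end

posts = (1..20).map { |id| PostRow.new(id, id * 3) }
picked = random_posts(posts, 30, 3)
puts picked.map(&:id).inspect
```

If fewer than n rows meet the threshold, the sketch returns only those rows, which matches what LIMIT n does in SQL.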

Related

How to get records based on an offset around a particular record?

I'm building a search UI which searches for comments. When a user clicks on a search result (comment), I want to show the surrounding comments.
My model:
Group (id, title) - A Group has many comments
Comment (id, group_id, content)
For example:
When a user clicks on a comment with comment.id equal to 26. I would first find all the comments for that group:
comment = Comment.find(26)
comments = Comment.where(:group_id => comment.group_id)
I now have all of the group's comments. What I then want to do is show comment.id 26, with a max of 10 comments before and 10 comments after.
How can I modify comments to show that offset?
Sounds simple, but it's tricky to get the best performance for this. In any case, you should let the database do the work. That will typically be faster by an order of magnitude than fetching all rows and then filtering and sorting on the client side.
If by "before" and "after" you mean smaller / bigger comment.id, and we further assume that there can be gaps in the id space, this one query should do all:
WITH x AS (SELECT id, group_id FROM comment WHERE id = 26) -- enter value once
(
SELECT *
FROM x
JOIN comment c USING (group_id)
WHERE c.id > x.id
ORDER BY c.id
LIMIT 10
)
UNION ALL
(
SELECT *
FROM x
JOIN comment c USING (group_id)
WHERE c.id < x.id
ORDER BY c.id DESC
LIMIT 10
)
I'll leave paraphrasing that in Ruby syntax to you, that's not my area of expertise.
Returns 10 earlier comments and 10 later ones. Fewer if fewer exist. Use <= in the 2nd leg of the UNION ALL query to include the selected comment itself.
If you need the rows sorted, add another query level on top with ORDER BY.
Should be very fast in combination with these two indexes for the table comment:
one on (id) - probably covered automatically by the primary key.
one on (group_id, id)
For read-only data you could create a materialized view with a gap-less row-number that would make this even faster.
More explanation about parentheses, indexes, and performance in this closely related answer.
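As a way to check the result the query above should produce, here is a minimal plain-Ruby sketch of the same windowing logic. CommentRow and surrounding_comments are hypothetical names used only for illustration, and unlike the SQL as written this version includes the selected comment itself. It works on an in-memory array, so it is a model of the logic and not a replacement for doing the work in the database.

```ruby
# Hypothetical stand-in for a row of the comment table.
CommentRow = Struct.new(:id, :group_id)

# Returns up to `window` comments before and after the comment with `target_id`,
# restricted to the same group and ordered by id (ids may have gaps).
def surrounding_comments(comments, target_id, window = 10)
  target = comments.find { |c| c.id == target_id }
  return [] unless target

  same_group = comments.select { |c| c.group_id == target.group_id }.sort_by(&:id)
  before = same_group.select { |c| c.id < target.id }.last(window)
  after  = same_group.select { |c| c.id > target.id }.first(window)
  before + [target] + after
end

# Odd ids belong to group 1, even ids to group 2.
comments = (1..60).map { |id| CommentRow.new(id, id.odd? ? 1 : 2) }
puts surrounding_comments(comments, 26, 2).map(&:id).inspect
```

Fewer rows come back when fewer exist on either side, just as with the two LIMIT 10 legs of the UNION ALL.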
Something like:
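comment = Comment.find(26)
# The 10 comments closest to (and before) the selected one
before_comments = Comment.
  where('created_at <= ?', comment.created_at).
  where('id != ?', comment.id).
  where(group_id: comment.group_id).
  order('created_at DESC').limit(10)
# The 10 comments closest to (and after) the selected one; ascending order is
# needed here, otherwise the newest 10 comments of the group are returned
after_comments = Comment.
  where('created_at >= ?', comment.created_at).
  where('id != ?', comment.id).
  where(group_id: comment.group_id).
  order('created_at ASC').limit(10)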

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the users SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(:count > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
having(:count > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
left_join(:user_purchases, :user_id=>:id).
select_group(:id).
select_more{count(:purchase_date).as(:purchase_count)}.
having(:purchase_count > 1)
It generates this SQL
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
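To make the intended result concrete, here is a minimal plain-Ruby sketch of what "user_id plus date of the second purchase" means for a list of purchase rows. PurchaseRow and second_purchase_dates are hypothetical names, and the sketch assumes each user's purchase dates are distinct; it models the expected output and is not a Sequel translation of the SQL above.

```ruby
require 'date'

# Hypothetical stand-in for a row of user_purchases.
PurchaseRow = Struct.new(:user_id, :purchase_date)

# Maps each user_id with at least two purchases to the date of the second one.
def second_purchase_dates(purchases)
  purchases
    .group_by(&:user_id)
    .transform_values { |rows| rows.map(&:purchase_date).sort[1] }
    .compact # users with a single purchase have no second date and are dropped
end

purchases = [
  PurchaseRow.new(1, Date.new(2020, 1, 5)),
  PurchaseRow.new(1, Date.new(2020, 1, 1)),
  PurchaseRow.new(2, Date.new(2020, 6, 1))
]
puts second_purchase_dates(purchases).inspect
```

Users with fewer than two purchases do not appear in the result, which is the behaviour the question asks for.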
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
To access that, do a limit(2) to constrain the query to two results then take the [-1] (or last) one. So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(:user_purchases__purchase_date).limit(2).all[-1]
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"

Find overlapping seasons where seasons have_many date_ranges

I have the following setup:
class Season < AR::Base
has_many :date_ranges
end
class DateRange < AR::Base
# has a :starts_at & :ends_at
end
How would I find all overlapping seasons from a season instance? I have already tried a couple of different queries (below). But the problem I keep hitting is that the season I'm checking for may itself have multiple date_ranges. I could solve it with a loop, but I'd rather use only a query.
This query looks up all the seasons that overlap but it only does that for 1 input date_range
Season.joins(:date_ranges).where("starts_at <= ? AND ends_at >= ?", ends_at, starts_at)
Maybe I need something to chain a couple of OR's together for each date_range on the instance but where() only uses AND.
So in short, finding the overlap is not the problem, but how do I find overlap of multiple date_ranges to the entire database?
The easiest way to do this is through straight SQL. Something like this:
DateRange.find_by_sql([%q{
select a.*
from date_ranges a
join date_ranges b on
a.id < b.id
and (
(a.ends_at >= b.starts_at and a.ends_at <= b.ends_at)
or (a.starts_at >= b.starts_at and a.starts_at <= b.ends_at)
or (a.starts_at <= b.starts_at and a.ends_at >= b.ends_at)
)
where a.season_id = ?
}, season_id])
The basic idea is to join the table to itself so that you can easily compare the ranges. The a.id < b.id is there to get unique results and filter out "range matches itself" cases. The inner or conditions check for both types of overlap:
[as-----ae]              [as-----ae]
     [bs-----be]    [bs-----be]
and
[as--------------ae]         [as----ae]
     [bs----be]         [bs--------------be]
You might want to think about the end points though: that query considers two intervals to overlap if they only match at an endpoint, and that might not be what you want.
Presumably you already have a unique constraint on the (season_id, starts_at, ends_at) triples and presumably you're already ensuring that starts_at <= ends_at.
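The overlap test and the "multiple date_ranges per season" part of the question can be sketched in plain Ruby as follows. RangeRow, SeasonRow, ranges_overlap? and overlapping_seasons are hypothetical names, not ActiveRecord API. Assuming starts_at <= ends_at, the two-comparison check below accepts the same pairs as the three or conditions in the SQL, including ranges that only touch at an endpoint.

```ruby
require 'date'

# Hypothetical stand-ins for the DateRange and Season records.
RangeRow  = Struct.new(:starts_at, :ends_at)
SeasonRow = Struct.new(:name, :date_ranges)

# Two closed ranges overlap when each one starts no later than the other ends.
def ranges_overlap?(a, b)
  a.starts_at <= b.ends_at && b.starts_at <= a.ends_at
end

# A season overlaps another when any pair of their ranges overlaps.
def overlapping_seasons(season, all_seasons)
  all_seasons.select do |other|
    !other.equal?(season) &&
      season.date_ranges.any? { |a| other.date_ranges.any? { |b| ranges_overlap?(a, b) } }
  end
end

summer = SeasonRow.new("summer", [RangeRow.new(Date.new(2020, 6, 1), Date.new(2020, 8, 31))])
peak   = SeasonRow.new("peak",   [RangeRow.new(Date.new(2020, 7, 15), Date.new(2020, 7, 20))])
puts overlapping_seasons(peak, [summer, peak]).map(&:name).inspect
```

Use strict comparisons (< in place of <=) if ranges that merely share an endpoint should not count as overlapping.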

Limit an array by the sum of a value within the records in rails3

So let's say I have the following in a Post model: each record has a field "num" holding a random number, and a user_id.
So I make this:
@posts = Post.where(:user_id => 1)
Now let's say I want to limit the records in my @posts array so that their num values sum to 50 or more (with only the final record going over the limit). So it would be adding post.num + post2.num + post3.num etc., until the total reaches at least 50.
Is there a way to do this?
I would say to just grab all of the records like you already are:
@posts = Post.where(:user_id => 1)
and then use Ruby to do the rest:
sum = 0
selected = []
@posts.each do |post|
  break if sum >= 50   # stop once the total has reached the limit
  selected << post
  sum += post.num
end
There's probably a more elegant way but this will work. It takes posts in order until the sum is equal to or has exceeded 50. Then selected holds those records, with only the last one taking the total over the limit, and @posts itself is left untouched. Hopefully I understood your question.
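You can use PostgreSQL window functions for this.
This gives you the rows whose running sum is lower than 50. The result of a window function cannot be referenced in WHERE at the same query level, so it is computed in a subquery:
SELECT t.id, t.num_sum
FROM (
  SELECT a.id, sum(a.num) OVER (ORDER BY a.id) AS num_sum
  FROM posts a
  WHERE a.user_id = 1
) t
WHERE t.num_sum < 50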
But your case is trickier as you want to go over the limit by one row:
WITH running AS (
  SELECT a.id, sum(a.num) OVER (ORDER BY a.id) AS num_sum
  FROM posts a
  WHERE a.user_id = 1
)
SELECT r.id, r.num_sum
FROM running r
WHERE r.num_sum <= (
  SELECT MIN(c.num_sum)
  FROM running c
  WHERE c.num_sum >= 50
)
You have to convert this SQL to Arel.
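For comparison, here is a minimal plain-Ruby sketch of the running total that a window sum ordered by id is meant to produce, followed by the "go over the limit by one row" cut-off. NumRow, with_running_sum and take_until_sum are hypothetical names, and the sketch assumes num is never negative. Unlike the SQL, it returns every row when the total never reaches the limit.

```ruby
# Hypothetical stand-in for a row of the posts table.
NumRow = Struct.new(:id, :num)

# Pairs each row with the running total of num up to and including that row,
# with rows taken in id order.
def with_running_sum(rows)
  total = 0
  rows.sort_by(&:id).map { |row| [row, total += row.num] }
end

# Keeps rows while the total *before* a row is still below `limit`, so the
# last kept row is the one that reaches or crosses the limit.
def take_until_sum(rows, limit)
  with_running_sum(rows)
    .select { |row, num_sum| num_sum - row.num < limit }
    .map(&:first)
end

rows = [NumRow.new(1, 20), NumRow.new(2, 25), NumRow.new(3, 10), NumRow.new(4, 30)]
puts take_until_sum(rows, 50).map(&:id).inspect
```

Subtracting the row's own num from its running total gives the total before that row, which avoids the nested MIN lookup at the cost of needing num in the intermediate result.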

rails 3, active record: any way to tell how many unique values match a "x LIKE ?" query

I have a query to find all the phone numbers that match a partial expression such as "ends with 234"
@matchingphones = Calls.find :all,
:conditions => [ "(thephonenumber LIKE ?)", "%234"]
The same phone number might be in the database several times, and so might be returned multiple times by this query if it matches.
What I need to know is the UNIQUE phone numbers the query returns.
For example if the database contains
000-111-1234 *
000-111-3333
000-111-2234 *
000-111-1234 *
000-111-4444
the existing query will return the 3 records marked with * (i.e. it returns the phone number ending in -1234 twice, since it's in the database twice)
what I need is a query that returns just one instance of each match, in this case
000-111-1234 *
000-111-2234 *
Calls.find :all,
:select => "DISTINCT thephonenumber",
:conditions => [ "(thephonenumber LIKE ?)", "%234"]
In addition,
1. You are using Rails 2 query syntax; better to switch to the Rails 3 (Arel) syntax.
2. Better to name your class Call (not Calls).
Call.where("thephonenumber LIKE ?", "%234").group(:thephonenumber)
if you just want the count then you can do:
Call.where("thephonenumber LIKE ?", "%234").group(:thephonenumber).all.count
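The expected result can be checked against the sample data from the question with a minimal plain-Ruby sketch. unique_matches is a hypothetical helper, not part of ActiveRecord; it mirrors a LIKE '%234' filter followed by one row per distinct number.

```ruby
# Sample phone numbers taken from the question.
numbers = %w[000-111-1234 000-111-3333 000-111-2234 000-111-1234 000-111-4444]

# Mirrors: WHERE thephonenumber LIKE '%<suffix>', then one entry per distinct number.
def unique_matches(numbers, suffix)
  numbers.select { |n| n.end_with?(suffix) }.uniq
end

matches = unique_matches(numbers, "234")
puts matches.inspect # ["000-111-1234", "000-111-2234"]
puts matches.count   # 2
```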
