Restricting rows on a Solr subquery/join

I want to implement a search in Solr that takes the 500 bestselling products and then runs a search/filter on those 500 products only.
In SQL, I would do something like this:
SELECT * FROM Product
WHERE ProductID IN (SELECT TOP 500 ProductID FROM Product ORDER BY Sales DESC)
AND Manufacturer = 'Apple'
I know I can do joins/subqueries in Solr, but I can't work out how to sort and limit the rows of these subqueries before they are fed into the main query.
Is this possible in Solr?

You probably won't need any joins or subqueries.
Index the products together with their sales information.
Then you just need to execute a Solr query:
q=manufacturer:apple&sort=sales desc&rows=500
or
q=*:*&fq=manufacturer:apple&sort=sales desc&rows=500
This searches the manufacturer field for apple, returns the result set ordered by sales in descending order, and limits it to 500 rows.
Fetching 500 rows at once is probably not a good idea; you can paginate with start and rows instead.
The cap of 500 results in total can be enforced on the client side.
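As a rough illustration of the paging idea above, here is a minimal Ruby sketch that builds the query string for such a request. The helper name solr_query_string is hypothetical, and the field names manufacturer and sales are taken from the answer; start and rows are Solr's standard paging parameters.

```ruby
require "uri"

# Hypothetical helper: builds the query string for a Solr select request.
# `start` is the offset into the sorted result set, `rows` is the page size.
def solr_query_string(manufacturer:, start: 0, rows: 20)
  params = {
    "q"     => "*:*",                          # match all documents
    "fq"    => "manufacturer:#{manufacturer}", # filter query: restricts the matches
    "sort"  => "sales desc",                   # bestsellers first
    "start" => start,
    "rows"  => rows
  }
  URI.encode_www_form(params)
end

# Second page of 20 results.
puts solr_query_string(manufacturer: "apple", start: 20, rows: 20)
```

The client would stop requesting further pages once start reaches 500, which is one way to enforce the overall cap on the client side.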

Related

how to get subset of activerecord objects after performing .limit()?

I want to limit the ActiveRecord objects being returned to 20, then perform a where() that returns a subset of those limited objects. I currently know that only 10 of them will fulfil the second column's criteria.
e.g. of ideal behaviour:
o = Object.limit(20)
o.where(column: criteria).count
=> 10
But instead, ActiveRecord still looks for 20 objects that fulfil the where() criteria, and it looks outside of the original 20 objects that limit() would have returned on its own.
How can I get the desired response?
One way to decrease the search space is to use a nested query. You search the first N records rather than all records that match a specific condition. In SQL this would be done like this:
SELECT * FROM (SELECT * FROM table ORDER BY ORDERING_FIELD LIMIT 20) AS t WHERE column = value;
The query above only checks the condition against 20 rows from the table. Notice that I have added an ORDERING_FIELD; this is required because, without an explicit ordering, the database may return the rows in a different order each time you run the query.
To do something similar in Rails, you could try the following:
Object.where(id: Object.order(:id).limit(20).select(:id)).where(column: criteria)
This will execute a query similar to the following:
SELECT [objects].* FROM [objects] WHERE [objects].[id] IN (SELECT TOP (20) [objects].[id] FROM [objects] ORDER BY [objects].id ASC) AND [objects].[column] = criteria
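The difference between the two orderings of operations can be seen with plain Ruby arrays. This is only an in-memory sketch: the Row struct and its status values are hypothetical stand-ins for the table, not ActiveRecord itself.

```ruby
# Hypothetical rows standing in for the table; only id and status matter here.
Row = Struct.new(:id, :status)
rows = (1..40).map { |i| Row.new(i, i.even? ? "active" : "inactive") }

# Nested-query semantics: take the first 20 rows by id, then filter them.
limit_then_filter = rows.sort_by(&:id).first(20).select { |r| r.status == "active" }

# What chaining limit(20) with where(...) does: filter all rows, then take 20.
filter_then_limit = rows.select { |r| r.status == "active" }.sort_by(&:id).first(20)

puts limit_then_filter.size  # 10
puts filter_then_limit.size  # 20
```

The first result only contains matches found among ids 1 to 20, which is the behaviour the question asks for; the second keeps reading past id 20 until it has collected 20 matches.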

Optimize Bigquery query: Shuffle reached broadcast limit

I'm trying to process this query.
SELECT
r.src,r.dst, ROUND(r.price/50)*50 pb,COUNT(*) results
FROM [search.interesting_routes] ovr
LEFT JOIN [search.search_results2] r ON ovr.src=r.src AND ovr.dst=r.dst
WHERE DATE(r.saved_at) >= '2015-10-1' AND DATE(r.saved_at) <= '2015-10-01' AND r.price < 20000
GROUP BY pb, r.src, r.dst
ORDER BY pb
The table search_results2 contains a huge amount of search results with prices for routes (a route is defined by src and dst).
I need to count all records in search_results2 for each record in interesting_routes, grouped into price buckets.
The query works fine on a small sample of data, but once the data is huge it fails with
Error: Shuffle reached broadcast limit for table __I0 (broadcasted at
least 176120970 bytes). Consider using partitioned joins instead of
broadcast joins.
I'm having difficulty rewriting the SELECT to use the suggested partitioned join, or at least getting the result some other way.

How do I query on a subset of ActiveModel records?

I've rewritten this question as my previous explanation was causing confusion.
In the SQL world, you have an initial record set that you apply a query to. The output of this query is the result set. Generally, the initial record set is an entire table of records and the result set is the records from the initial record set that match the query ruleset.
I have a use case where I need my application to occasionally operate on only a subset of records in a table. If a table has 10,000 records in it, I'd like my application to behave like only the first 1,000 records exist. These should be the same 1,000 records each time. In other words, I want the initial record set to be the first 1,000 devices in a table (when ordered by primary key), and the result set the resulting records from these first 1,000 devices.
Some solutions have been proposed, and it's revealed that my initial description was not very clear. To be more explicit, I am not trying to implement pagination. I'm also not trying to limit the number of results I receive (which .limit(1,000) would indeed achieve).
Thanks!
This is the line in your question that I don't understand:
This causes issues though with both of the calls, as limit limits the results of the query, not the database rows that the query is performed on.
This is not a Rails thing, this is a SQL thing.
Device.limit(n) runs SELECT * FROM device LIMIT n
Limit always returns a subset of the queried result set.
Would first(n) accomplish what you want? It will both order the result set ascending by the PK and limit the number of results returned.
SQL Statements can be chained together. So if you have your subset, you can then perform additional queries with it.
my_subset = Device.where(family: "Phone")
# SQL: SELECT * FROM Device WHERE `family` = "Phone"
my_results = my_subset.where(style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
Which can also be written as:
my_results = Device.where(family: "Phone").where(style: "Touchscreen")
my_results = Device.where(family: "Phone", style: "Touchscreen")
# SQL: SELECT * FROM Device WHERE `family` = "Phone" AND `style` = "Touchscreen"
From your question, if you'd like to select the first 1,000 rows (ordered by primary key, pkey) and then query against that, you'll need to do:
my_results = Device.find_by_sql("SELECT *
FROM (SELECT * FROM devices ORDER BY pkey ASC LIMIT 1000) AS first_devices
WHERE `more_searching` = 'happens here'")
You could specifically ask for a set of IDs:
Device.where(id: (1..4).to_a)
That will construct a WHERE clause like:
WHERE id IN (1,2,3,4)

Ruby on Rails - Selecting a range from a query

I'm doing a query to get all the purchases from the db. For example
orders = PurchaseOrders.all
In the same query, how can I select only the first hundred orders (1-100), or just the next hundred (101-200), etc.?
Thank you
You can use limit and offset:
PurchaseOrders.limit(200).offset(100)
which means skip the first 100 records and take the next 200. More info here. Or with take:
PurchaseOrders.offset(100).take(400)
which takes 400 records after skipping the first 100.
For the first 100 records;
orders = PurchaseOrders.first(100)
and last 100 records;
orders = PurchaseOrders.last(100)
or by specific IDs (here, the records with IDs 100 and 201),
orders = PurchaseOrders.find([100, 201])
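To make the limit/offset arithmetic concrete, here is a small sketch in plain Ruby. The helper page_window is hypothetical, and an array of ids stands in for the table.

```ruby
# Hypothetical helper: converts a 1-based page number into offset/limit values.
def page_window(page, per_page = 100)
  raise ArgumentError, "page must be >= 1" if page < 1
  { offset: (page - 1) * per_page, limit: per_page }
end

# In-memory stand-in for the table: order ids 1..350.
order_ids = (1..350).to_a

window = page_window(2) # skip 100 rows, then take 100
page   = order_ids.drop(window[:offset]).first(window[:limit])

puts "#{page.first}-#{page.last}" # 101-200
```

With ActiveRecord, the same window would presumably be passed as PurchaseOrders.order(:id).limit(window[:limit]).offset(window[:offset]); the explicit order keeps the pages stable between requests.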

How to select data for defined page and total count of records?

I have a table with paginated data and this is the way I select data for each page:
@visitors = EventsVisitor
.select('visitors.*, events_visitors.checked_in, events_visitors.checkin_date, events_visitors.source, events_visitors.id AS ticket_id')
.joins(:visitor)
.order(order)
.where(:event_id => params[:event_id])
.where(filter_search)
.where(mode)
.limit(limit)
.offset(offset)
Also to build table pagination I need to know total count of records. Currently my solution for this is very rough:
total = EventsVisitor
.select('count(*) as count, events_visitors.*')
.joins(:visitor)
.order(order)
.where(:event_id => params[:event_id])
.where(filter_search)
.where(mode)
.first()
.count
So my question is: what is the optimal Ruby way to select the limited data for the current page together with the total count of records?
I noticed that if I call @visitors.count, an additional SQL query is generated:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM `events_visitors` INNER JOIN `visitors` ON `visitors`.`id` = `events_visitors`.`visitor_id` WHERE `events_visitors`.`event_id` = 1 LIMIT 15 OFFSET 0) subquery_for_count
First, I do not understand why an additional query is sent to count data that we already have; once the data has been loaded from the database into @visitors, we can count it in Ruby without sending additional queries to the DB.
Second, is there something like .total_count that would generate a similar count(*) query but without the useless limit/offset?
You should use except to remove limit and offset from the relation:
http://guides.rubyonrails.org/active_record_querying.html#except
See how kaminari does it:
https://github.com/kaminari/kaminari/blob/92052eedf047d65df71cc0021a9df9df1e2fc36e/lib/kaminari/models/active_record_relation_methods.rb#L11
So it might be something like:
total = @visitors.except(:offset, :limit, :order).count
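Once the unpaginated total is known, the remaining pagination values are simple arithmetic. The sketch below assumes a hypothetical helper named pagination_meta and made-up numbers; it is not part of kaminari or Rails.

```ruby
# Hypothetical helper: derives pagination metadata from the unpaginated total.
def pagination_meta(total:, limit:, offset:)
  {
    total:        total,
    total_pages:  (total.to_f / limit).ceil, # round up so a partial last page counts
    current_page: (offset / limit) + 1       # 1-based page index
  }
end

meta = pagination_meta(total: 47, limit: 15, offset: 30)
puts meta[:total_pages]  # 4
puts meta[:current_page] # 3
```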
