Unable to search on whole database with searchkick as it limits to 10000 records - ruby-on-rails

I can't search the whole Elasticsearch DB just by using:
SearchData.search('yamaha', match: :word_middle, load: false)
This limits the search to 10,000 records, but my DB has more than a hundred thousand. How can I search the whole DB, not just the first ten thousand records? I haven't been able to find anything, so any help would be appreciated.

I am not sure about Searchkick, but Elasticsearch always searches the entire index, not only 10,000 records. By default it returns only 10 documents in the response. You can change the response size with the size parameter, up to a maximum of 10,000 documents per request.
Also, if you have a large index, hits.total.value in the response is capped at 10,000. To get the actual count, set track_total_hits to true:
{
"track_total_hits": true,
"query": {
"match" : {
"user.id" : "elkbee"
}
}
}
If you need to get more than 10,000 documents from Elasticsearch, you can use the search_after or scroll API. You can refer to this documentation for more details.
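As a rough illustration of the search_after approach, here is a minimal Ruby sketch that only builds the request bodies. The helper name `search_after_body`, the sort fields `created_at` and `id`, and the sample sort values are assumptions made up for this example; the idea is that each request repeats the same query and sort, and passes the sort values of the last hit from the previous page.

```ruby
require "json"

# Builds the body for one page of a search_after query (hypothetical helper).
# `last_sort_values` is the "sort" array of the last hit on the previous page,
# or nil for the first page.
def search_after_body(query:, sort:, size: 1000, last_sort_values: nil)
  body = { size: size, query: query, sort: sort, track_total_hits: true }
  body[:search_after] = last_sort_values if last_sort_values
  body
end

query = { match: { "user.id" => "elkbee" } }
# Assumed field names; the last sort key should be unique so it acts as a tiebreaker.
sort = [{ "created_at" => "asc" }, { "id" => "asc" }]

first_page = search_after_body(query: query, sort: sort)
# Sample values standing in for the sort values of the last hit of page one.
next_page = search_after_body(
  query: query,
  sort: sort,
  last_sort_values: ["2021-05-20T05:30:04.832Z", 4294967298]
)

puts JSON.pretty_generate(next_page)
```

Each page stays within the 10,000 window because no offset is used; the cursor is carried by the sort values instead.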

Deep Paging
By default, Elasticsearch and OpenSearch limit paging to the first 10,000 results. Here’s why. We don’t recommend changing this, but if you really need all results, you can use:
class Product < ApplicationRecord
searchkick deep_paging: true
end
If you just need an accurate total count, you can instead use:
Product.search("pears", body_options: {track_total_hits: true})

Searchkick only limits the results to 10,000. It will still search on all records in your model, even if the number is well over 10,000.
If you aren't seeing the results you expect, there is something else wrong with your setup.
Have you included searchkick in your model and reindexed?
class YourModel < ApplicationRecord
searchkick
end
And in a new console: YourModel.reindex

Related

Sort Users by Number of Followers

I'm using a simple follower system in my application and I can get the number of any user's followers by running User.followers.count. However, when I try to sort all users by the number of followers they each have with @orderedUsers = User.all.order("followers.count DESC"), it returns the error "ActiveRecord::StatementInvalid: SQLite3::SQLException: no such column: followers.count". Obviously, this is because there is no such column. Is there a way to work around this to do what I wish to achieve?
Thanks.
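This code should work (assuming the DB table names are users and followers). The group call is needed so that the count is computed per user rather than over the whole join:
User.joins(:followers).group("users.id").order("count(followers.user_id) desc")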
How about something like:
@ordered_users = User.all.sort{|a,b| a.followers.count <=> b.followers.count}
For the reverse order, you can do:
@ordered_users = User.all.sort{|a,b| b.followers.count <=> a.followers.count}
Or use .reverse, as you say in the comments.
EDIT: @Alex Quach left a good alternative in a different post. I've modified it so that it does not include the current user in the list, which may be helpful:
User.all.where('id != ?', current_user.id).sort_by { |u| -u.followers.count }
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This would give a very small performance impact on adding or removing followers, and greatly increase performance when performing sorts:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or finding users with no followers.
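To make the trade-off concrete, here is a plain-Ruby sketch of what a counter cache does, with no Rails involved. The class `CachedUser` and its methods are hypothetical stand-ins, not the Rails API: every add or remove pays for one extra counter update, and sorting then reads a stored number instead of counting followers per user.

```ruby
# Plain-Ruby stand-in for a model with a counter cache (hypothetical class).
class CachedUser
  attr_reader :name, :followers_count

  def initialize(name)
    @name = name
    @followers = []
    @followers_count = 0
  end

  # Adding a follower pays a small extra cost: one counter update.
  def add_follower(user)
    @followers << user
    @followers_count += 1
  end

  # Removing keeps the stored number in sync as well.
  def remove_follower(user)
    @followers_count -= 1 if @followers.delete(user)
  end
end

ann, bob, cat = %w[ann bob cat].map { |n| CachedUser.new(n) }
ann.add_follower(bob)
ann.add_follower(cat)
cat.add_follower(ann)
ann.remove_follower(bob)
cat.add_follower(bob)

all_users = [ann, bob, cat]

# In-memory analogue of ordering by the cached column, descending.
ranked = all_users.sort_by { |u| -u.followers_count }
# Users with no followers become a simple comparison on the stored number.
no_followers = all_users.select { |u| u.followers_count.zero? }

puts ranked.map(&:name).inspect
puts no_followers.map(&:name).inspect
```

In a real application the database would do the ordering on the stored column, which is what lets it use an index.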

Mongoid::Criteria to array in Ruby

I have a Mongoid::Criteria object data which has 634 results:
#<Mongoid::Criteria
selector: {"search_id"=>155, "posted_time"=>{"$gte"=>2016-05-31 15:43:40 UTC, "$lte"=>2016-06-07 15:43:40 UTC}}
options: {:sort=>{"posted_time"=>-1}, :limit=>200}
class: MongoPost
embedded: false>
data.count
=> 634
data.to_a.count
=> 200
This is a problem, because when I map the object, the result is not what I expected:
data.map{|element| element.type}.count
=> 200
What is going on, and how can I resolve it?
Your criteria has limit: 200 on it. each (and, by extension, map) respects the limit. That's why you get 200 records.
Why count ignores limit - this is a good question. I don't know exactly, but I suspect it has something to do with pagination. That is, having one criteria object, you can fetch a page and know total number of matching records (from which you can infer total number of pages).
To get count of limited query you can do something like:
criteria.lazy.count
This count is not the one from Mongoid, it's from Enumerable and will respect the limit.
About unsetting the limit: quick googling didn't reveal a suitable api, but you can always override limit with a bigger value. This, perhaps:
criteria.limit(criteria.count).to_a # get all records, without limit
Note about efficiency
Don't be fooled by the lazy keyword. It's only there to turn the mongo relation into an enumerable. There's nothing lazy about that. It still runs the query and iterates the result set. Essentially, it's the same as
criteria.to_a.count
In the current version of mongoid (5.x), this is possible via options to .count method.
criteria.count(limit: 10)
Or, to reuse whatever limit is already set on the criteria
criteria.count(criteria.options.slice(:limit))
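The behaviour described in this answer can be reproduced without a database. The sketch below uses a hypothetical `FakeCriteria` class (not part of Mongoid) whose `each` honours the limit while its own `count` reports every matching document, which is the split the question ran into.

```ruby
# Hypothetical stand-in for a criteria object, only to illustrate the
# difference between a database-side count and an Enumerable count.
class FakeCriteria
  include Enumerable

  def initialize(docs, limit: nil)
    @docs = docs
    @limit = limit
  end

  # Mimics the database-side count, which ignores the limit.
  def count
    @docs.size
  end

  # Iteration respects the limit, so map, to_a and lazy do too.
  def each(&block)
    (@limit ? @docs.first(@limit) : @docs).each(&block)
  end

  # Returns a new criteria with a different limit.
  def limit(n)
    self.class.new(@docs, limit: n)
  end
end

data = FakeCriteria.new((1..634).to_a, limit: 200)

total        = data.count                        # database-style count
iterated     = data.to_a.count                   # Array#count after iteration
lazy_counted = data.lazy.count                   # Enumerable count via each
unlimited    = data.limit(data.count).to_a.count # limit raised to the total

puts [total, iterated, lazy_counted, unlimited].inspect
```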

Searchkick: how can we increase the number of records returned when searching all data?

I am using Searchkick.
It has a method to find all the records, like Model.search('*'), but it only returns 1000 records. Can anyone help with how I can increase the size so that it returns all records?
You're looking to change the import batch size, for example:
class Product < ActiveRecord::Base
searchkick batch_size: 5000 # defaults to 1000
end
You won't be able to specify all records; however, if you know the size of the largest data set, you can try to specify that number as the limit:
Product.search "2% Milk", limit: 5000, offset: 50
Unfortunately, I believe the max return is the same as Elasticsearch (10,000 documents), so that may be your hard ceiling.
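To show how that ceiling plays out when paging, here is a small sketch that walks pages until the results run out or the 10,000 window is reached. The helper `each_page` and the in-memory `records` array are assumptions for illustration; in an application the `fetch` lambda would wrap something like Product.search("*", page: page, per_page: per_page).

```ruby
MAX_WINDOW = 10_000 # default paging window discussed above

# Yields each page of results until the results run out or the next page
# would reach past the window. `fetch` is any callable taking
# (page, per_page) and returning an array (hypothetical helper).
def each_page(fetch, per_page: 1000, max_window: MAX_WINDOW)
  page = 1
  while page * per_page <= max_window
    results = fetch.call(page, per_page)
    break if results.empty?

    yield results
    break if results.size < per_page # short page means the last one

    page += 1
  end
end

# Fake data source standing in for a search call.
records = (1..25_000).to_a
fetch = ->(page, per_page) { records[(page - 1) * per_page, per_page] || [] }

seen = 0
each_page(fetch) { |results| seen += results.size }
puts seen
```

With 25,000 fake records the loop stops after ten pages of 1000, which mirrors why plain offset paging cannot reach records past the window.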

Grails 2.4.4: how to do pagination with a very large record set when the total is unknown

All the examples I can find do something like this:
<g:paginate controller="Book" action="list" total="${bookInstanceTotal}" />
and the total attribute is "required" according to the documentation.
This works fine for very simple examples with small record sets (e.g. a few hundred)
If there are, say, 100k rows returned because the user put in wide search criteria, then I certainly don't want to read them all to find the total to allow pagination, don't want to transfer all 100k rows from the db to the Grails server, and don't want to repeat this each time they hit the next page. I want to use the MySQL limit/offset or similar to bring back only the small number of required rows.
Is this possible, or do I really have to work out the total (by reading all the records, or doing a separate count) and then read the records?
I always prefer to use criteria for pagination.
An example of using criteria:
def c = Account.createCriteria()
def results = c.list (max: 10, offset: 10) {
like("holderFirstName", "Fred%")
and {
between("balance", 500, 1000)
eq("branch", "London")
}
order("holderLastName", "desc")
}
This example is taken from the Grails documentation, and you can read more about criteria there.
Using this criteria, you will get at most 10 results. But the important part is that you can get the total count for the same criteria by using:
results.totalCount
You don't read all records from the db and load them into Grails to get the total. You just load 10, or whatever number of records you display on each page, and you execute a count query to get the totalCount.
It works like this.
Let's say you display 10 records on each page and you have 100K records in the db.
Let's say the UI passes max and offset params. Default max to 10 and cap it at 100:
params.max = Math.min(params.int('max') ?: 10, 100)
params.offset = params.offset ?: 0
def list = Domain.list(params)
When the max option is specified, the Domain.list() method returns a PagedResultList, which has a getTotalCount() method that fires a count query and returns the total count.
And you render the view like this:
render(view: "list", model: [list: list, totalCount: list.totalCount])
So here you are not loading all the records from the database; you are loading just 10 records and executing a count query to get totalCount.

Range of IDs via ActiveRecord

I understand that I can use People.first(100) to retrieve the first 100 records, and the same goes for People.last(100).
What I don't know is how to retrieve all objects in the range of 200-400 when the total number is, let's say, 1000 records.
What you need is limit and offset - read this for more info.
Example:
People.limit(200).offset(200)
The above code takes 200 records starting from 201st record - that means it would be records 201-400.
Are you searching on a specific field? Your title suggests you're searching on id:
People.where('id BETWEEN ? AND ?', 200, 400)
or...
People.where(id: 200..400)
If you're not searching on a particular field, you would want to use Big_Bird's limit and offset methods.
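The two approaches answer different questions once ids have gaps. The sketch below uses a plain-Ruby array in place of the table; the `Person` struct and the deleted id range are made up for the example. Offset and limit select by position in the ordered result, while the id range selects by value.

```ruby
# Hypothetical stand-in for a table row.
Person = Struct.new(:id)

# 1000 rows in which ids 101..150 were deleted at some point, leaving a gap.
people = ((1..100).to_a + (151..1050).to_a).map { |id| Person.new(id) }

# Positional: the analogue of ordering by id, then limit(200).offset(200),
# i.e. rows 201-400 of the ordered result. Without an explicit order the
# database does not guarantee which rows those are.
by_position = people.sort_by(&:id).drop(200).first(200)

# By value: the analogue of where(id: 200..400).
by_id = people.select { |person| (200..400).cover?(person.id) }

puts "positional: #{by_position.size} rows, ids #{by_position.first.id}..#{by_position.last.id}"
puts "by id:      #{by_id.size} rows, ids #{by_id.first.id}..#{by_id.last.id}"
```

Here the positional slice holds 200 rows whose ids run from 251 to 450, while the id range holds 201 rows with ids 200 to 400.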
