Using scroll api via elasticsearch-model - ruby-on-rails

For the life of me I can't find any reference to using the Elasticsearch scroll API from within Ruby on Rails with the elasticsearch-model (or elasticsearch-rails or elasticsearch-dsl) gem.
The only thing the docs reference is calling scroll directly on the client, which kind of defeats the purpose. Also, that approach does not use the client or any client settings you've already set in your Rails app.
I want to do something like this.
Here is the ElasticSearch query that works from within the Kibana Dev Tools:
GET model_index/_search?scroll=1m
{
  "size": 100,
  "query": {
    "match": {
      "tenant_id": 3196
    }
  },
  "_source": "id"
}
I would have thought that I could call something like
MyModel.search scroll: '1m', ...
but instead it seems like I need to do:
# First create a client by hand
client = Elasticsearch::Client.new
result = client.search index: 'model_index',
                       scroll: '1m',
                       body: { query: { match: { tenant_id: 3196 } }, sort: '_id' }
Does anyone have any more user-friendly examples?

As per elasticsearch guide -
We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
Ref - https://www.elastic.co/guide/en/elasticsearch/reference/7.x/scroll-api.html
An addition to the question above:
To keep scrolling, pass the _scroll_id from each response to the next scroll call to get the next batch of results.
# Reuse one client for the whole scroll instead of building a new one per call
client = Elasticsearch::Client.new
body = { query: { match: { tenant_id: 3196 } }, sort: '_id' }
response = client.search(
  index: 'model_index',
  scroll: '1m',
  body: body,
  size: 3000
)
loop do
  hits = response.dig('hits', 'hits')
  break if hits.empty?
  hits.each do |hit|
    # do something
  end
  # Fetch the next batch using the scroll id from the latest response
  response = client.scroll(
    body: { scroll_id: response['_scroll_id'] },
    scroll: '1m'
  )
end
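The loop above can be wrapped in a small helper so the calling code only sees hits. This is just a sketch, not something the gems provide: each_scrolled_hit is a made-up name, and it only assumes the client responds to search, scroll and clear_scroll the way Elasticsearch::Client does.

```ruby
# Sketch: yields every hit of a scrolled search to the given block.
# `each_scrolled_hit` is a hypothetical helper, not part of any gem.
# `client` must respond to search, scroll and clear_scroll like
# Elasticsearch::Client does.
def each_scrolled_hit(client, index:, body:, scroll: '1m', size: 1000)
  response = client.search(index: index, scroll: scroll, size: size, body: body)
  scroll_id = response['_scroll_id']
  loop do
    hits = response.dig('hits', 'hits') || []
    break if hits.empty?

    hits.each { |hit| yield hit }
    # Ask for the next batch with the most recent scroll id
    response = client.scroll(body: { scroll_id: scroll_id }, scroll: scroll)
    scroll_id = response['_scroll_id'] || scroll_id
  end
ensure
  # Release the server-side scroll context instead of waiting for the timeout
  client.clear_scroll(body: { scroll_id: scroll_id }) if scroll_id
end
```

With elasticsearch-model you should be able to pass MyModel.__elasticsearch__.client as the client and MyModel.index_name as the index, so the settings already configured in the Rails app are reused rather than a hand-built client.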

Related

Mongoid Aggregate result into an instance of a rails model

Introduction
While fixing legacy code, I came across an index of LandingPage objects where most columns are supposed to be sortable, but aren't. This is mostly corrected now, but a few columns keep giving me trouble.
These columns are the ones that need an aggregation, because they are based on a count of other documents. To simplify the explanation, I will only talk about one of them, called Visit, as the code for the rest is just duplication.
The code fetches sorted and paginated data, then modifies each object using LandingPage methods before sending the JSON back. It was already like this and I can't modify it.
Because of that, I need to do an aggregation (to sort LandingPage by Visit counts), then get the results as LandingPage instances so the legacy code can work on them.
The problem is that I'm unable to turn the Mongoid::Document into a LandingPage instance.
Here is the error I got:
Mongoid::Errors::UnknownAttribute:
Message:
unknown_attribute : message
Summary:
unknown_attribute : summary
Resolution:
unknown_attribute : resolution
Here is my code:
def controller_function
  landing_pages = fetch_landing_page
  landing_page_hash[:data] = landing_pages.map do |landing_page|
    landing_page.do_something
    # Do other things
  end
  render json: landing_page_hash
end
def fetch_landing_page
  criteria = LandingPage.where(archived: false)
  columns_name = params[:columns_name]
  column_direction = params[:column_direction]
  case columns_name
  when 'visit'
    order_by_visits(criteria, column_direction)
  else
    criteria.order_by(columns_name => column_direction).paginate(
      per_page: params[:length],
      page: (params[:start].to_i / params[:length].to_i) + 1
    )
  end
end
def order_by_visits(landing_pages, column_direction)
  LandingPage.collection.aggregate([
    { '$match': landing_pages.selector },
    { '$lookup': {
      from: 'visits',
      localField: '_id',
      foreignField: 'landing_page_id',
      as: 'visits'
    }},
    { '$addFields': { 'visits_count': { '$size': '$visits' }}},
    { '$sort': { 'visits_count': column_direction == 'asc' ? 1 : -1 }},
    { '$unset': ['visits', 'visits_count'] },
    { '$skip': params[:start].to_i },
    { '$limit': params[:length].to_i }
  ]).map { |attrs| LandingPage.new(attrs) { |o| o.new_record = false } }
end
What I have tried
Copying and pasting the hash into the console as LandingPage.new(attributes): the instance was created and valid.
Changing the attribute keys from strings to symbols: it still didn't work.
Calling is_a?(Hash) on any element of the returned array returns true.
Converting it to JSON and then back to a hash: still got a Mongoid::Document.
How can I make the result of the aggregation a valid LandingPage instance?
Aggregation pipeline is implemented by the Ruby MongoDB driver, not by Mongoid, and as such does not return Mongoid model instances.
An example of how one might obtain Mongoid model instances is given in documentation.
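As a rough sketch of that idea (not copied from the documentation): Mongoid models expose an instantiate class method that builds an already-persisted document from raw database attributes, so the raw documents returned by the driver can be mapped through it instead of through LandingPage.new. The helper name to_model_instances is made up for illustration.

```ruby
# Sketch: turn raw aggregation documents into model instances.
# `to_model_instances` is a hypothetical helper name.
# Assumes `model_class` responds to `instantiate(attrs)`, as Mongoid
# models do; nothing here requires Mongoid to be loaded.
def to_model_instances(model_class, raw_docs)
  raw_docs.map { |doc| model_class.instantiate(doc) }
end
```

In the question's code this would be something like to_model_instances(LandingPage, LandingPage.collection.aggregate(pipeline)), where pipeline is the array of stages already shown. Fields the pipeline adds that are not declared on the model are best removed before instantiating, as the $unset stage already does.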

How to retrieve all the records from elasticsearch in rails

There is an upper limit on the number of docs you can get from Elasticsearch in one search (10,000). We can use "scroll" to retrieve all the records. Does anyone know how to do this in code?
There is this scroll method:
https://github.com/elastic/elasticsearch-ruby/blob/4608fd144277941003de71a0cdc24bd39f17a012/elasticsearch-api/lib/elasticsearch/api/actions/scroll.rb
But I don't know how to use it. Could you explain how to use it?
I have tried "scan", but it is no longer supported in Elasticsearch.
# Open the "view" of the index
response = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10
# Call `scroll` until results are empty
while response = client.scroll(scroll_id: response['_scroll_id'], scroll: '5m') and not
response['hits']['hits'].empty? do
puts response['hits']['hits'].map { |r| r['_source']['title'] }
end
Your code should work, but as you mentioned, the scan value for search_type is no longer needed. I just ran this locally with some test data and it worked:
# scroll.rb
require 'elasticsearch'
client = Elasticsearch::Client.new
# Open the scroll context; the first batch of hits comes back with this call
response = client.search(index: 'articles', scroll: '10m')
while response['hits']['hits'].size.positive?
  puts(response['hits']['hits'].map { |r| r['_source']['title'] })
  # Always pass the latest _scroll_id, since it can change between calls
  response = client.scroll(scroll: '5m', body: { scroll_id: response['_scroll_id'] })
end
Output:
$ ruby scroll.rb
Title 297
Title 298
Title 299
Title 300
...
You can fiddle around with the value for the scroll parameter, but something like this should work for you too.
A paragraph from the official Elastic docs:
We no longer recommend using the scroll API for deep pagination. If
you need to preserve the index state while paging through more than
10,000 hits, use the search_after parameter with a point in time
(PIT).
Scroll Official Doc Link
I recommend using pagination instead.
The limit on the number of hits exists for performance reasons, and paginating is much faster.
You can either set a starting offset with the from key, or use the search_after key together with sort and a PIT (point in time, which prevents inconsistent results between pages). You can also set the size key to a small value such as 10 for a faster query time.
Pagination Official Doc Link
To create a PIT ID:
POST /test/_pit?keep_alive=1m
To fetch the first page:
GET /test/_search
{
"size": 10,
"query": {
"match" : {
"user.id" : "elkbee"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
},
"sort": [
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" }}
]
}
To get the rest of the data:
Each hit in the result has a sort key. Put the sort values of the last hit into search_after:
GET /test/_search
{
"size": 10,
"query": {
"match" : {
"user.id" : "elkbee"
}
},
"pit": {
"id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
},
"sort": [
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
],
"search_after": [
"2021-05-20T05:30:04.832Z", #you can find this value from sort key in response
4294967298
],
"track_total_hits": false
}
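Since the question asks about Rails, here is a rough Ruby sketch of the same PIT plus search_after flow. The helper name each_hit_with_pit is made up, and it assumes a client that responds to open_point_in_time, search and close_point_in_time the way recent versions of Elasticsearch::Client do (the PIT API needs Elasticsearch 7.10 or later).

```ruby
# Sketch: page through all hits with a point in time (PIT) and search_after.
# `each_hit_with_pit` is a hypothetical helper, not part of any gem.
# `client` is assumed to behave like Elasticsearch::Client (7.10+).
def each_hit_with_pit(client, index:, query:, sort:, size: 10, keep_alive: '1m')
  pit_id = client.open_point_in_time(index: index, keep_alive: keep_alive)['id']
  search_after = nil
  loop do
    body = { size: size, query: query, sort: sort,
             pit: { id: pit_id, keep_alive: keep_alive },
             track_total_hits: false }
    body[:search_after] = search_after if search_after
    response = client.search(body: body)
    # The PIT id may change between requests, so keep the latest one
    pit_id = response['pit_id'] || pit_id
    hits = response.dig('hits', 'hits') || []
    break if hits.empty?

    hits.each { |hit| yield hit }
    # The sort values of the last hit are the cursor for the next page
    search_after = hits.last['sort']
  end
ensure
  client.close_point_in_time(body: { id: pit_id }) if pit_id
end
```

No index is passed to search here because the PIT already pins the target index.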

Multiple Aggregations on same level in Elasticsearch-rails

I am trying to perform multiple aggregations at the same level with ElasticSearch using the elasticsearch-rails and elasticsearch-model gems.
In the query hash that I am generating, I have the following -
def query_hash(params, current_person = nil, manager_id = nil)
aggs = {}
aggs[:home_country_id] = {terms: {field: "home_country_id"}}
aggs[:home_region_id] = {terms: {field: "home_region_id"}}
{
sort: [ { created_at: { order: "desc" } }, id: { order: "desc" } ],
aggs: aggs
}
end
I stored the response in an object, es_response.
When I search with both aggregations, I can only find the last one in the response.
es_response.response["aggregations"] only has the result of the last aggregation, home_region_id.
I couldn't find much documentation on the ES Reference on structuring multiple aggregations on the same level although there was a lot about nesting aggregations.
How can I fix this?
My ES version is 5.1
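For reference, sibling aggregations are simply separate keys under aggs, which is the shape the hash above already produces. A minimal sketch that builds it from a list of fields (terms_aggs is a hypothetical helper name, and the field names are the ones from the question):

```ruby
# Sketch: one terms aggregation per field, all siblings under :aggs.
# `terms_aggs` is a hypothetical helper name.
def terms_aggs(*fields)
  fields.each_with_object({}) do |field, aggs|
    aggs[field.to_sym] = { terms: { field: field.to_s } }
  end
end

query = {
  sort: [{ created_at: { order: 'desc' } }, { id: { order: 'desc' } }],
  aggs: terms_aggs(:home_country_id, :home_region_id)
}
```

In the raw response, each aggregation then appears under its own name inside the "aggregations" key, with its buckets under "buckets".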

How do I query array elements in elastic search

This is how a document of my model is indexed in Elasticsearch:
{
"_index":"cars",
"_type":"car",
"_id":"3275",
"_version":4,
"_score":1,
"_source":{
"category_id": 6,
"car_branches":[
{
"id":32,
"name":"Type1"
},
{
"id":33,
"name":"Type2"
},
{
"id":36,
"name":"Type3"
}
],
}
}
I can query category_id with
Car.__elasticsearch__.search query:{match:{category_id: 6}}
How do I query for car_branches? I tried this
response = Car.__elasticsearch__.search query:{match:{car_branches:[id: 32]}}
I am getting Elasticsearch::Transport::Transport::Errors::BadRequest: [400]
You first need to delete your index and recreate it. Before doing so, change your mapping to make the car_branches field nested, like this:
indexes :car_branches, type: 'nested' do
  indexes :id
  indexes :name
end
Then you'll be able to make the query you want like this:
response = Car.__elasticsearch__.search query: { nested: { path: 'car_branches', query: { term: { 'car_branches.id': 32 } } } }
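If you need to match any of several branch ids, a terms query inside the nested query should do it. A small sketch of building that body (nested_ids_query is a made-up helper name, and it assumes the nested objects keep their id under "<path>.id" as in the document above):

```ruby
# Sketch: nested query matching documents whose nested objects
# have any of the given ids. `nested_ids_query` is a hypothetical helper.
def nested_ids_query(path, ids)
  {
    query: {
      nested: {
        path: path,
        # `terms` accepts a list of values; `term` takes a single value
        query: { terms: { "#{path}.id" => Array(ids) } }
      }
    }
  }
end
```

Usage would look like Car.__elasticsearch__.search(nested_ids_query('car_branches', [32, 33])).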

ember-data: server side code for removing an associated object

I'm working with revision 12 of the ember-data RESTAdapter and using the rails-api gem.
I have these models:
App.TransportDocumentRow = DS.Model.extend
productName: DS.attr 'string'
transportDocument: DS.belongsTo('App.TransportDocument')
App.TransportDocument = DS.Model.extend
number: DS.attr 'string'
transportDocumentRows: DS.hasMany('App.TransportDocumentRow')
configured in this way:
DS.RESTAdapter.map 'App.TransportDocument', {
transportDocumentRows: { embedded: 'always' }
}
(I'm using embedded: 'always' because if I don't, my document rows are committed with document_id = 0, as asked here.)
Consider that I have already created a transport document (id: 1) with 2 rows. If I delete a row (with id: 1), the result is a PUT request to /transport_documents/1.
The JSON sent with this PUT would be something like this:
{"transport_document"=>
{"number"=>"1", "transport_document_rows"=>
[
{"id"=>2, "product_name"=>"aaaa", "transport_document_id"=>1}
]
}, "id"=>"1"
}
while Rails would expect something like this:
{"transport_document"=>
{"number"=>"1", "transport_document_rows"=>
[
{"id"=>1, "_destroy"=>1},
{"id"=>2, "product_name"=>"aaaa", "transport_document_id"=>1}
]
}, "id"=>"1"
}
Is there a way to do this in active_model_serializers?
Or should I make some manual transformations in my controller?
Or should I change the payload so that Ember produces the correct request?
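To illustrate the second option, here is a rough sketch of a controller-side transformation: compare the row ids already stored for the document with the ids in the submitted payload, and add a destroy marker for every row that is missing. with_destroy_markers is a made-up helper name, and the "_destroy" key assumes the model uses accepts_nested_attributes_for with allow_destroy: true, which looks for "_destroy" rather than "_delete".

```ruby
# Sketch: add destroy markers for rows that exist in the database
# but are absent from the submitted payload.
# `with_destroy_markers` is a hypothetical helper name.
# Assumes ids on both sides have the same type (e.g. all Integers).
def with_destroy_markers(existing_ids, submitted_rows)
  submitted_ids = submitted_rows.map { |row| row['id'] }.compact
  removed_ids = existing_ids - submitted_ids
  removed_ids.map { |id| { 'id' => id, '_destroy' => '1' } } + submitted_rows
end
```

In the controller, existing_ids would come from the document's current rows and submitted_rows from the transport_document_rows part of the params.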
