Elasticsearch getting top 5 results from an aggregation with a script - ruby-on-rails

I am trying to get the top 5 products sold, ordered by revenue, using Elasticsearch in Rails.
Here is my query:
query = {
bool: {
filter: {
bool: {
must: [
{ term: { store_id: store.id } } # Limiting the products by store
]
}
}
}
}
aggs = {
by_revenue: {
terms: {
size: 5,
order: {revenue: "desc"}
},
aggs: {
revenue: {
max: {
script: "doc['price_as_float'].value * doc['quantity'].value"
}
}
}
}
}
response = OrderItem.search(query: query, aggs: aggs, size: 0)
I get the error: could not find the appropriate value context to perform aggregation [by_revenue]
Thanks!

You need to aggregate the orders on the product reference, then sum price * quantity for each product with a nested sum aggregation (not max) to get its revenue:
aggs: {
products: {
terms: {
field: "product_ref",
order: { revenues: "desc" },
},
aggs: {
revenues: {
sum: { script: "doc['price_as_float'].value * doc['quantity'].value" }
}
}
}
}
Don't use the size option in the terms aggregation, because you can't be sure that all the orders for your top products are located in the same shard; take the top five buckets from the response instead.
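Taking the top entries from the response can be sketched as below. This is a minimal illustration, not the gem's API: `top_products` is a hypothetical helper, the sample hash is made up, and how you obtain the raw response hash depends on the client you use. It assumes the buckets arrive already sorted by the `revenues` sub-aggregation, as requested by `order: { revenues: "desc" }`.

```ruby
# Hypothetical helper: returns the first `limit` buckets of the "products"
# terms aggregation as [product_ref, revenue] pairs. It does not sort; it
# relies on Elasticsearch having ordered the buckets by "revenues" already.
def top_products(response, limit = 5)
  buckets = response.dig("aggregations", "products", "buckets") || []
  buckets.first(limit).map { |bucket| [bucket["key"], bucket.dig("revenues", "value")] }
end

# Made-up response fragment shaped like a terms aggregation result.
sample_response = {
  "aggregations" => {
    "products" => {
      "buckets" => [
        { "key" => "sku-3", "doc_count" => 4, "revenues" => { "value" => 900.0 } },
        { "key" => "sku-1", "doc_count" => 9, "revenues" => { "value" => 750.5 } },
        { "key" => "sku-7", "doc_count" => 2, "revenues" => { "value" => 600.0 } },
        { "key" => "sku-2", "doc_count" => 5, "revenues" => { "value" => 420.0 } },
        { "key" => "sku-9", "doc_count" => 1, "revenues" => { "value" => 300.0 } },
        { "key" => "sku-4", "doc_count" => 3, "revenues" => { "value" => 120.0 } }
      ]
    }
  }
}

top_five = top_products(sample_response)
puts top_five.inspect
```

The sixth bucket (`sku-4`) is dropped, and a response without the aggregation yields an empty array rather than an error.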

Related

Group a Searchkick result?

I have a basic Searchkick system set up. I want to take the results and then group them by one attribute to sum another attribute, etc.
This question is close to my issue:
Elasticsearch + searckick
and the only answer was to use aggregations. I could do that but then I would be building an active record call for each of the agg keys returned.
Here is what I have so far:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { "total": { "sum": { "field": "total" } } } } } } )
which results in:
"aggregations"=>{"cbs_item_id"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}}, {"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}}]}}}>
in my search_data I have a term 'cbs' which is a text value that relates to the 'cbs_item_id'. I am looking for this result:
"aggregations"=>
{"cbs_item_id"=>
{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"value"=>"MY CBS Related Field" }},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"value"=>"MY OTHER CBS Related Field" }}]}}}
Think of this as having an inventory of cars and a separate table of car_colors ([id = 1, color = red], [id = 3, color = blue]). I want to search for the cars of a given color, then group them, sum, etc.
I am sure I am perhaps missing something simple here.
UPDATE
Getting close:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { cbs: { terms: { field: "cbs" } }, "total": { "sum": { "field": "total" } } } } } } )
which results:
"buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"001", "doc_count"=>2}]}},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"002", "doc_count"=>2}]}}]}}
The second "key"s 001 and 002 are the data I am looking for.
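The nested buckets above can be flattened into one row per `cbs_item_id` without any extra ActiveRecord calls. The following is only a sketch: `rows_from_buckets` is a hypothetical helper, and it assumes each outer bucket holds exactly one inner `cbs` bucket, as in the output shown.

```ruby
# Hypothetical helper: turns the nested aggregation buckets into flat rows.
# Assumes every cbs_item_id maps to a single cbs value, so only the first
# inner bucket is read.
def rows_from_buckets(buckets)
  buckets.map do |bucket|
    {
      cbs_item_id: bucket["key"],
      cbs: bucket.dig("cbs", "buckets", 0, "key"),
      total: bucket.dig("total", "value")
    }
  end
end

# The two buckets from the output above, trimmed to the keys that are used.
buckets = [
  { "key" => 5, "doc_count" => 2, "total" => { "value" => 2956.0 },
    "cbs" => { "buckets" => [{ "key" => "001", "doc_count" => 2 }] } },
  { "key" => 6, "doc_count" => 2, "total" => { "value" => 7734.0 },
    "cbs" => { "buckets" => [{ "key" => "002", "doc_count" => 2 }] } }
]

rows_from_buckets(buckets).each { |row| puts row.inspect }
```

Each row then carries the id, the related `cbs` value, and the summed total together.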

Combining results of two tables in mongoid/mongo

Hi guys, what would be the best way to combine the results of two Mongoid queries?
My issue is that I would like to know the active users. A user can send a letter or a notification; the two are stored in separate tables, and a user who sends either one is considered active. What I want to know is how many active users there were per month.
Right now, what I can think of is doing this:
Letter.collection.aggregate([
{ '$match': {}.merge(opts) },
{ '$sort': { 'created_at': 1 } },
{
'$group': {
_id: '$customer_id',
first_notif_sent: {
'$first': {
'day': { '$dayOfMonth': '$created_at' },
'month': { '$month': '$created_at' },
'year': { '$year': '$created_at' }
}
}
}
}])
Notification.collection.aggregate([
{ '$match': {}.merge(opts) },
{ '$sort': { 'created_at': 1 } },
{
'$group': {
_id: '$customer_id',
first_notif_sent: {
'$first': {
'day': { '$dayOfMonth': '$created_at' },
'month': { '$month': '$created_at' },
'year': { '$year': '$created_at' }
}
}
}
}])
What I am looking for is to get the minimum of the dates and then combine the results and get the count. Right now I can get the results and loop over each of them and create a new list. But I wanted to know if there is a way to do it in mongo directly.
EDIT
For letters
def self.get_active(tenant_id)
map = %{
function() {
emit(this.customer_id, new Date(this.created_at))
}
}
reduce = %{
function(key, values) {
return new Date(Math.min.apply(null, values))
}
}
where(tenant_id: tenant_id).map_reduce(map, reduce).out(reduce: "#{tenant_id}_letter_notification")
end
Notifications
def self.get_active(tenant_id)
map = %{
function() {
emit(this.customer_id, new Date(this.updated_at))
}
}
reduce = %{
function(key, values) {
return new Date(Math.min.apply(null, values))
}
}
where(tenant_id: tenant_id, transferred: true).map_reduce(map, reduce).out(reduce: "#{tenant_id}_outgoing_letter_standing_order_balance")
end
This is what I am thinking of going with; one of the reasons is that $lookup does not work with my version of Mongo.
the customer created a new notification, or a new letter, and I would like to get the first created at of either.
Let's address this first as a foundation. Given examples of document schema as below:
Document schema in Letter collection:
{ _id: <ObjectId>,
customer_id: <integer>,
created_at: <date> }
And, document schema in Notification collection:
{ _id: <ObjectId>,
customer_id: <integer>,
created_at: <date> }
You can utilise aggregation pipeline $lookup to join the two collections. For example using mongo shell :
db.letter.aggregate([
{"$group":{"_id":"$customer_id", tmp1:{"$max":"$created_at"}}},
{"$lookup":{from:"notification",
localField:"_id",
foreignField:"customer_id",
as:"notifications"}},
{"$project":{customer_id:"$_id",
_id:0,
latest_letter:"$tmp1",
latest_notification: {"$max":"$notifications.created_at"}}},
{"$addFields":{"latest":
{"$cond":[{"$gt":["$latest_letter", "$latest_notification"]},
"$latest_letter",
"$latest_notification"]}}},
{"$sort":{latest:-1}}
], {cursor:{batchSize:100}})
The output of the above aggregation pipeline is a list of customers sorted by their latest created_at from either Letter or Notification, most recent first. Example output documents:
{
"customer_id": 0,
"latest_letter": ISODate("2017-12-19T07:00:08.818Z"),
"latest_notification": ISODate("2018-01-26T13:43:56.353Z"),
"latest": ISODate("2018-01-26T13:43:56.353Z")
},
{
"customer_id": 4,
"latest_letter": ISODate("2018-01-04T18:55:26.264Z"),
"latest_notification": ISODate("2018-01-25T02:05:19.035Z"),
"latest": ISODate("2018-01-25T02:05:19.035Z")
}, ...
What I want to know is how many active users were there per month
To achieve this, you can just replace the last stage ($sort) of the above aggregation pipeline with $group. For example:
db.letter.aggregate([
{"$group":{"_id":"$customer_id", tmp1:{$max:"$created_at"}}},
{"$lookup":{from:"notification",
localField:"_id",
foreignField:"customer_id",
as:"notifications"}},
{"$project":{customer_id:"$_id",
_id:0,
latest_letter:"$tmp1",
latest_notification: {"$max":"$notifications.created_at"}}},
{"$addFields":{"latest":
{"$cond":[{"$gt":["$latest_letter", "$latest_notification"]},
"$latest_letter",
"$latest_notification"]}}},
{"$group":{_id:{month:{"$month": "$latest"},
year:{"$year": "$latest"}},
active_users: {"$sum": 1} // count one per customer; summing "$customer_id" would add up the IDs
}
}
],{cursor:{batchSize:10}})
Where the example output would be as below:
{
"_id": {
"month": 10,
"year": 2017
},
"active_users": 9
},
{
"_id": {
"month": 1,
"year": 2018
},
"active_users": 18
},
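Where $lookup is not available, as mentioned in the question, the two per-customer results can also be merged in application code. The sketch below is one possible way to do that in plain Ruby, not part of the original answer: both method names are hypothetical, the input is assumed to be lists of [customer_id, date] pairs already fetched from each collection, and it keeps the earliest date per customer (what the question asks for) rather than the latest date used in the pipelines above.

```ruby
require "date"

# Hypothetical helper: earliest activity date per customer across both lists.
# Each list holds [customer_id, date] pairs.
def first_activity(letters, notifications)
  (letters + notifications)
    .group_by { |customer_id, _date| customer_id }
    .transform_values { |pairs| pairs.map { |_id, date| date }.min }
end

# Hypothetical helper: number of customers per [year, month] of that date.
def active_users_per_month(first_dates)
  first_dates.values
             .group_by { |date| [date.year, date.month] }
             .transform_values(&:size)
end

# Made-up sample data.
letters = [[1, Date.new(2017, 10, 3)], [2, Date.new(2018, 1, 5)]]
notifications = [[1, Date.new(2017, 12, 1)], [2, Date.new(2017, 10, 9)], [3, Date.new(2018, 1, 20)]]

firsts = first_activity(letters, notifications)
puts active_users_per_month(firsts).inspect
```

Each customer is counted once, in the month of their first letter or notification; swapping `min` for `max` would mirror the "latest" logic of the pipelines instead.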

Elasticsearch sort option not supported

I'm using Elasticsearch in Rails. I am trying to sort a list of customers by their total dollars spent, descending. This is my Ruby code:
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
}
}
sort = {
sort: { "total_spent": { order: "desc" }}
}
response = Contact.search(query: query, sort: sort)
This returns the error: sort option [total_spent] not supported. I've tried other fields to make sure it wasn't just something wrong with the total_spent field. Thanks.
I'm not really sure, but I think this may be related to incorrect usage of the ES::DSL.
What happens when you try this:
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
}
}
sort = {
sort: [{ "total_spent": { order: "desc" }}] #https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
}
response = Contact.search(query, sort)
We can sort on a specific field; see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html.
So we can use something like:
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
},
sort: { total_spent: { order: :desc }}
}
response = Contact.search(query)
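One plausible reading of the error is that the original call nests the sort one level too deep: `sort` is already a hash with a `sort:` key, so `Contact.search(query: query, sort: sort)` sends `sort: { sort: { total_spent: ... } }`, which Elasticsearch would interpret as a field named `sort` with an unknown option `total_spent`. The sketch below builds the body with `query` and `sort` as sibling keys. It assumes `Contact.search` forwards a hash as the request body, as elasticsearch-model does; `customer_search_body` is a hypothetical helper and `42` is a placeholder store id.

```ruby
# Hypothetical helper: builds the full request body, with `query` and `sort`
# as sibling top-level keys and no extra `sort:` wrapper.
def customer_search_body(store_id)
  {
    query: {
      bool: {
        filter: { term: { store_id: store_id } } # limit customers to one store
      }
    },
    sort: [{ total_spent: { order: "desc" } }]
  }
end

body = customer_search_body(42) # 42 is a placeholder store id
puts body.inspect

# Assumption: the model's search method accepts the body hash directly.
# response = Contact.search(body)
```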

Elasticsearch-rails, highlights query

I'm trying to get highlights from the Elasticsearch-rails gem, but I can't get it to work.
My search method:
query = {
query: {
filtered: {
query: {
match: {
_all: params[:q]
}
},
filter: {
term: {
active: true
}
}
},
},
highlight: {
fields: {
_all: {fragment_size: 150, number_of_fragments: 3}
}
}
}
@results = Elasticsearch::Model.search(query, [Market, Component]).results
When I map my results in the view to check if there are any highlights, I get an array of false:
= @results.map(&:highlight?)
I read through the Elasticsearch docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html and the gem's documentation here: https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model and my query seems to be correct. Not sure how to proceed.
Apparently, the solution was to use "*" instead of "_all":
query = {
query: {
filtered: {
query: {
match: {
_all: params[:q]
}
},
filter: {
term: {
active: true
}
}
},
},
highlight: {
tags_schema: "styled",
fields: {
:"*" => {}
}
}
}
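On Elasticsearch 5.0 and later the `filtered` query no longer exists, so the same search has to be expressed with a `bool` query. The following is a hedged sketch of that shape with the wildcard highlight from the answer: `highlight_search_body` and its `field:` keyword are hypothetical, and `_all` is kept as the default only to mirror the original (it is not available on every Elasticsearch version).

```ruby
# Hypothetical helper: same search as above, written with bool/filter instead
# of the `filtered` query, and highlighting every matching field via "*".
def highlight_search_body(text, field: "_all")
  {
    query: {
      bool: {
        must: { match: { field => text } },
        filter: { term: { active: true } }
      }
    },
    highlight: {
      tags_schema: "styled",
      fields: { "*" => {} }
    }
  }
end

body = highlight_search_body("coffee", field: "title") # "title" is a made-up field
puts body.inspect

# Assumption: passed to the gem the same way as the original query.
# @results = Elasticsearch::Model.search(body, [Market, Component]).results
```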

Faceting with Rubberband

I am implementing ElasticSearch in a Ruby-on-Rails 2.3 application with the RubberBand gem. I am trying to return facets but I can't seem to find methods that I can use for this purpose. I've looked through the documentation and source.
Does anyone know if it's possible with rubberband?
This issue might have what you are looking for:
https://github.com/grantr/rubberband/issues/4
q = {
"query"=> {
"filtered"=> {
"query"=> {
"match_all"=> {}
},
"filter"=> {
"term"=> {
"client_id"=> "717",
"product_id"=> "1"
}
}
}
},
"facets"=> {
"shipped_to_state_counts"=> {
"terms"=> {
"field"=> "state",
"size"=> "500"
}
}
}
}
EDIT: (simpler query, lucene syntax)
NOTE: These are not the same queries, per elasticsearch documentation:
There’s one important distinction to keep in mind. While search
queries restrict both the returned documents and facet counts, search
filters restrict only returned documents — but not facet counts.
q = {
"query"=> {
"query_string"=> {
"query"=> "client_id:717 AND product_id:1"
}
},
"facets"=> {
"shipped_to_state_counts"=> {
"terms"=> {
"field"=> "state",
"size"=> "500"
}
}
}
}
END EDIT
results = client.search(q)
facets = results.facets
=>
{
"shipped_to_state_counts"=> {
"_type"=> "terms",
"missing"=> 0,
"total"=> 1873274,
"other"=> 0,
"terms"=> [
{
"term"=> "MO",
"count"=> 187327
},
{
"term"=> "FL",
"count"=> 17327
}
]
}
}
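The facet structure above is easy to reduce to a simple term-to-count lookup. This is a minimal sketch, assuming `results.facets` is a plain Hash shaped like the output shown; `facet_counts` is a hypothetical helper, not a rubberband method.

```ruby
# Hypothetical helper: maps each term of a terms facet to its count.
def facet_counts(facets, name)
  terms = facets.dig(name, "terms") || []
  terms.each_with_object({}) { |entry, counts| counts[entry["term"]] = entry["count"] }
end

# The facet output from above.
facets = {
  "shipped_to_state_counts" => {
    "_type" => "terms",
    "missing" => 0,
    "total" => 1873274,
    "other" => 0,
    "terms" => [
      { "term" => "MO", "count" => 187327 },
      { "term" => "FL", "count" => 17327 }
    ]
  }
}

puts facet_counts(facets, "shipped_to_state_counts").inspect
# {"MO"=>187327, "FL"=>17327}
```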
