Group a Searchkick result? - ruby-on-rails

I have a basic Searchkick system set-up. I want to take the results and then group them by an attribute to sum a another attribute etc.
This question is close to my issue:
Elasticsearch + searckick
and the only answer was to use aggregations. I could do that but then I would be building an active record call for each of the agg keys returned.
Here is what I have so far:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { "total": { "sum": { "field": "total" } } } } } } )
which results in:
"aggregations"=>{"cbs_item_id"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}}, {"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}}]}}}>
in my search_data I have a term 'cbs' which is a text value that relates to the 'cbs_item_id'. I am looking for this result:
"aggregations"=>
{"cbs_item_id"=>
{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"value"=>"MY CBS Related Field" }},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"value"=>"MY OTHER CBS Related Field" }}]}}}
This of this where you have in inventory of cars and a separate table of car_colors ( [id = 1, color = red], [id = 3, color = blue ]. I want to search for the cars of a given color then group them and sum etc.
I am sure I am perhaps missing something simple here.
UPDATE
Getting close:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { cbs: { terms: { field: "cbs" } }, "total": { "sum": { "field": "total" } } } } } } )
which results:
"buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"001", "doc_count"=>2}]}},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"002", "doc_count"=>2}]}}]}}
The second "key"s 001 and 002 are the data I am looking for.

Related

Combining results of two tables in mongoid/mongo

Hi guys what would be the best way to combine results of two mongoid queries.
My issue is that I would like to know active users, A user can send a letter and a notification, both are separate table and a user if he sends either the letter or the notification is considered active. What I want to know is how many active users were there per month.
right now what I can think of is doing this
Letter.collection.aggregate([
{ '$match': {}.merge(opts) },
{ '$sort': { 'created_at': 1 } },
{
'$group': {
_id: '$customer_id',
first_notif_sent: {
'$first': {
'day': { '$dayOfMonth': '$created_at' },
'month': { '$month': '$created_at' },
'year': { '$year': '$created_at' }
}
}
}
}])
Notification.collection.aggregate([
{ '$match': {}.merge(opts) },
{ '$sort': { 'created_at': 1 } },
{
'$group': {
_id: '$customer_id',
first_notif_sent: {
'$first': {
'day': { '$dayOfMonth': '$created_at' },
'month': { '$month': '$created_at' },
'year': { '$year': '$created_at' }
}
}
}
}])
What I am looking for is to get the minimum of the dates and then combine the results and get the count. Right now I can get the results and loop over each of them and create a new list. But I wanted to know if there is a way to do it in mongo directly.
EDIT
For letters
def self.get_active(tenant_id)
map = %{
function() {
emit(this.customer_id, new Date(this.created_at))
}
}
reduce = %{
function(key, values) {
return new Date(Math.min.apply(null, values))
}
}
where(tenant_id: tenant_id).map_reduce(map, reduce).out(reduce: "#{tenant_id}_letter_notification")
end
Notifications
def self.get_active(tenant_id)
map = %{
function() {
emit(this.customer_id, new Date(this.updated_at))
}
}
reduce = %{
function(key, values) {
return new Date(Math.min.apply(null, values))
}
}
where(tenant_id: tenant_id, transferred: true).map_reduce(map, reduce).out(reduce: "#{tenant_id}_outgoing_letter_standing_order_balance")
end
This is what I am thinking of going with, one of the reason is that, lookup does not work with my version of mongo.
the customer created a new notification, or a new letter, and I would like to get the first created at of either.
Let's address this first as a foundation. Given examples of document schema as below:
Document schema in Letter collection:
{ _id: <ObjectId>,
customer_id: <integer>,
created_at: <date> }
And, document schema in Notification collection:
{ _id: <ObjectId>,
customer_id: <integer>,
created_at: <date> }
You can utilise aggregation pipeline $lookup to join the two collections. For example using mongo shell :
db.letter.aggregate([
{"$group":{"_id":"$customer_id", tmp1:{"$max":"$created_at"}}},
{"$lookup":{from:"notification",
localField:"_id",
foreignField:"customer_id",
as:"notifications"}},
{"$project":{customer_id:"$_id",
_id:0,
latest_letter:"$tmp1",
latest_notification: {"$max":"$notifications.created_at"}}},
{"$addFields":{"latest":
{"$cond":[{"$gt":["$latest_letter", "$latest_notification"]},
"$latest_letter",
"$latest_notification"]}}},
{"$sort":{latest:-1}}
], {cursor:{batchSize:100}})
The output of the above aggregation pipeline is a list of customers in sorted order of created_at field from either Letter or Notification. Example output documents:
{
"customer_id": 0,
"latest_letter": ISODate("2017-12-19T07:00:08.818Z"),
"latest_notification": ISODate("2018-01-26T13:43:56.353Z"),
"latest": ISODate("2018-01-26T13:43:56.353Z")
},
{
"customer_id": 4,
"latest_letter": ISODate("2018-01-04T18:55:26.264Z"),
"latest_notification": ISODate("2018-01-25T02:05:19.035Z"),
"latest": ISODate("2018-01-25T02:05:19.035Z")
}, ...
What I want to know is how many active users were there per month
To achieve this, you can just replace the last stage ($sort) of the above aggregation pipeline with $group. For example:
db.letter.aggregate([
{"$group":{"_id":"$customer_id", tmp1:{$max:"$created_at"}}},
{"$lookup":{from:"notification",
localField:"_id",
foreignField:"customer_id",
as:"notifications"}},
{"$project":{customer_id:"$_id",
_id:0,
latest_letter:"$tmp1",
latest_notification: {"$max":"$notifications.created_at"}}},
{"$addFields":{"latest":
{"$cond":[{"$gt":["$latest_letter", "$latest_notification"]},
"$latest_letter",
"$latest_notification"]}}},
{"$group":{_id:{month:{"$month": "$latest"},
year:{"$year": "$latest"}},
active_users: {"$sum": "$customer_id"}
}
}
],{cursor:{batchSize:10}})
Where the example output would be as below:
{
"_id": {
"month": 10,
"year": 2017
},
"active_users": 9
},
{
"_id": {
"month": 1,
"year": 2018
},
"active_users": 18
},

Elasicsearch getting top 5 results from an aggregation with a script

I am trying to get the top 5 products sold, ordered by revenue using elasticsearch in Rails.
Here is my query:
query = {
bool: {
filter: {
bool: {
must: [
{ term: { store_id: store.id } } # Limiting the products by store
]
}
}
}
}
aggs = {
by_revenue: {
terms: {
size: 5,
order: {revenue: "desc"}
},
aggs: {
revenue: {
max: {
script: "doc['price_as_float'].value * doc['quantity'].value"
}
}
}
}
}
response = OrderItem.search(query: query, aggs: aggs, size: 0)
I get the error could not find the appropriate value context to perform aggregation [by_revenue]
Thanks!
You need to aggregate orders on product reference, then summing the prices * quantity to get the revenues from one product with a nested sum aggregation, not max:
aggs: {
products: {
terms: {
field: "product_ref",
order: { revenues: "desc" },
},
aggs: {
revenues: {
sum: { script: "doc['price_as_float'].value * doc['quantity'].value" }
}
}
}
}
Don't use the size option in the terms aggregation, because you're not sure all the orders for your top products are located in the same shard; you should get them from the response instead.

Elasticsearch sort option not supported

I'm using elastic search in Rails. I am trying to sort a list of customers by their total dollars spent descending. This is my ruby code:
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
}
}
sort = {
sort: { "total_spent": { order: "desc" }}
}
response = Contact.search(query: query, sort: sort)
This returns with an error of sort option [total_spent] not supported I've tried with other fields to make sure it wasn't just something wrong with the total_spent field. Thanks.
I'm not really sure, but I think this may be related to incorrect usage of the ES::DSL.
What happens when you try this:
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
}
}
sort = {
sort: [{ "total_spent": { order: "desc" }}] #https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
}
response = Contact.search(query, sort)
We can sort specific to the field, refer https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html.
so we can use like,
query = {
bool: {
filter: {
term: { store_id: store.id } # Limits customers by current store
}
},
sort: { total_spent: { order: :desc }}
}
response = Contact.search(query)

Faceting with Rubberband

I am implementing ElasticSearch in a Ruby-on-Rails 2.3 application with the RubberBand gem. I am trying to return facets but I can't seem to find methods that I can use for this purpose. I've looked through the documentation and source.
Does anyone know if it's possible with rubberband?
This issue might have what you are looking for:
https://github.com/grantr/rubberband/issues/4
q = {
"query"=> {
"filtered"=> {
"query"=> {
"match_all"=> {}
},
"filter"=> {
"term"=> {
"client_id"=> "717",
"product_id"=> "1"
}
}
}
},
"facets"=> {
"shipped_to_state_counts"=> {
"terms"=> {
"field"=> "state",
"size"=> "500"
}
}
}
}
EDIT: (simpler query, lucene syntax)
NOTE: These are not the same queries, per elasticsearch documentation:
There’s one important distinction to keep in mind. While search
queries restrict both the returned documents and facet counts, search
filters restrict only returned documents — but not facet counts.
q = {
"query"=> {
"query_string"=> {
"query"=> "client_id:717 AND product_id:1"
}
},
"facets"=> {
"shipped_to_state_counts"=> {
"terms"=> {
"field"=> "state",
"size"=> "500"
}
}
}
}
END EDIT
results = client.search(q)
facets = results.facets
=>
{
"shipped_to_state_counts"=> {
"_type"=> "terms",
"missing"=> 0,
"total"=> 1873274,
"other"=> 0,
"terms"=> [
{
"term"=> "MO",
"count"=> 187327
},
{
"term"=> "FL",
"count"=> 17327
}
]
}
}

Filter result based on a count of inner data

I am building my search query for some listing data. As part of the search people can ask for multiple rooms which sleeps a min amount of people, ie two rooms which sleep 2 and 3 people.
Im not sure how I can perform that with a filter.
Here is a shortened search query so far.
{
"query":{
"filtered":{
"query":{
"match_all":{}
}
}
},
"filter":{
"and":
[
{
"term":{
"status":"live"
}
},
{
"geo_bounding_box":{
"location":{
"top_left":"60.856553, -8.64935719999994",
"bottom_right":"49.8669688, 1.76270959999999"
}
}
}
,{
"range":{
"bedrooms":{
"gte":"2"
}
}
}
]
}
,
"size":10
}
Test Data
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":1.0,
"hits":[
{
"_index":"listings",
"_type":"listing",
"_id":"1",
"_score":1.0,
"_source":{
"name:":"Listing One",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":1,
"refno":"FI451",
"user_id":1,
"rooms":[
{
"bathroom":"Shared bathroom with bath",
"double_standard":null,
"id":5,
"single":2,
"sleeps":2,
"title":"Twinny",
},
{
"bathroom":"Ensuite with bath",
"double_king_size":1,
"double_standard":1,
"id":1,
"single":null,
"sleeps":2,
"title":"Double Ensuite Room",
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"2",
"_score":1.0,
"_source":{
"name":"Listing Two",
"address1":"Some Street",
"bedrooms":2,
"city":"A City",
"id":2,
"refno":"BL932",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with bath",
"double_standard":1,
"id":4,
"single":1,
"sleeps":3,
"title":"Family Room",
},
{
"bathroom":"Ensuite with shower",
"double_standard":1,
"id":2,
"single":null,
"sleeps":2,
"title":"Single Room",
}
]
}
},
{
"_index":"listings",
"_type":"listing",
"_id":"3",
"_score":1.0,
"_source":{
"name":"Listing Three",
"address1":"Another Address",
"bedrooms":1,
"city":"Your City",
"id":3,
"refno":"TE2116",
"user_id":1,
"rooms":[
{
"bathroom":"Ensuite with shower",
"double_king_size":null,
"double_standard":1,
"id":3,
"single":1,
"sleeps":3,
"title":"Family Room",
}
]
}
}
]
}
}
If you look at my data I have 3 listings, two of them have multiple rooms (Listing One & Two) but only Listing Two would match my search, Reason it has one room with that sleeps two and the other sleeps three.
Is it possible to perform this query with elasticsearch?
If what you want is "Find all listings where a bedroom sleeps 2 AND another bedroom sleeps 3", this query will work. It makes one big assumptions: that you are using inner objects, and not the Nested data type.
This query is using the fact that inner objects are collapsed into a single field, causing "rooms.sleeps" to equal [2,3] for the desired field. Since the field is collapsed into a single array, a simple Terms query will match them. When you change the execution mode to And, it forces both 2 and 3 to be matched.
The caveat is that a room that has [2,3,4] will also be matched.
I've omitted the geo and status portion since that data wasn't provided in the source documents.
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"filter": {
"and": [
{
"range": {
"bedrooms": {
"gte": "2"
}
}
},
{
"terms": {
"rooms.sleeps": [2,3],
"execution": "and"
}
}
]
},
"size": 10
}
As far as I know the filter has to be a sibling of the query inside the filtered element. See: http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/
If you combine that with Zach's solution it should work.
{
"query":
{
"filtered":
{
"query":
{
"match_all":{}
},
"filter":
{
"put" : "your filter here"
}
}
}
}

Resources