Faceting with Rubberband - ruby-on-rails

I am implementing ElasticSearch in a Ruby on Rails 2.3 application with the rubberband gem. I am trying to return facets, but I can't find any methods for this purpose. I've looked through the documentation and the source.
Does anyone know if it's possible with rubberband?

This issue might have what you are looking for:
https://github.com/grantr/rubberband/issues/4
q = {
  "query" => {
    "filtered" => {
      "query" => {
        "match_all" => {}
      },
      "filter" => {
        "term" => {
          "client_id" => "717",
          "product_id" => "1"
        }
      }
    }
  },
  "facets" => {
    "shipped_to_state_counts" => {
      "terms" => {
        "field" => "state",
        "size" => "500"
      }
    }
  }
}
EDIT: (simpler query, lucene syntax)
NOTE: These are not the same queries, per elasticsearch documentation:
There’s one important distinction to keep in mind. While search
queries restrict both the returned documents and facet counts, search
filters restrict only returned documents — but not facet counts.
q = {
  "query" => {
    "query_string" => {
      "query" => "client_id:717 AND product_id:1"
    }
  },
  "facets" => {
    "shipped_to_state_counts" => {
      "terms" => {
        "field" => "state",
        "size" => "500"
      }
    }
  }
}
END EDIT
results = client.search(q)
facets = results.facets
=>
{
  "shipped_to_state_counts" => {
    "_type" => "terms",
    "missing" => 0,
    "total" => 1873274,
    "other" => 0,
    "terms" => [
      {
        "term" => "MO",
        "count" => 187327
      },
      {
        "term" => "FL",
        "count" => 17327
      }
    ]
  }
}
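If you then want the counts in a more convenient form, the "terms" array can be folded into a plain hash. This is only a minimal sketch, assuming the facets hash has the shape shown above; the helper name facet_counts is made up for the example and is not part of rubberband:

```ruby
# Sample facet payload, copied from the response shown above.
facets = {
  "shipped_to_state_counts" => {
    "_type"   => "terms",
    "missing" => 0,
    "total"   => 1873274,
    "other"   => 0,
    "terms"   => [
      { "term" => "MO", "count" => 187327 },
      { "term" => "FL", "count" => 17327 }
    ]
  }
}

# Hypothetical helper: turn one terms facet into { term => count }.
def facet_counts(facets, name)
  entries = facets.fetch(name).fetch("terms")
  entries.each_with_object({}) do |entry, counts|
    counts[entry["term"]] = entry["count"]
  end
end

state_counts = facet_counts(facets, "shipped_to_state_counts")
```

Using fetch instead of [] makes the helper raise a KeyError if the facet name is misspelled, rather than failing later on nil.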

Related

Group a Searchkick result?

I have a basic Searchkick system set up. I want to take the results, group them by one attribute, and sum another attribute, etc.
This question is close to my issue:
Elasticsearch + searckick
and the only answer was to use aggregations. I could do that but then I would be building an active record call for each of the agg keys returned.
Here is what I have so far:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { "total": { "sum": { "field": "total" } } } } } } )
which results in:
"aggregations"=>{"cbs_item_id"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}}, {"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}}]}}}>
in my search_data I have a term 'cbs' which is a text value that relates to the 'cbs_item_id'. I am looking for this result:
"aggregations"=>
{"cbs_item_id"=>
{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"value"=>"MY CBS Related Field" }},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"value"=>"MY OTHER CBS Related Field" }}]}}}
Think of this as an inventory of cars and a separate table of car_colors ([id = 1, color = red], [id = 3, color = blue]). I want to search for the cars of a given color, then group them, sum them, etc.
I am sure I am perhaps missing something simple here.
UPDATE
Getting close:
BudgetItem.all.search("*", body_options: { aggs: { cbs_item_id: { terms: { field: "cbs_item_id" }, aggs: { cbs: { terms: { field: "cbs" } }, "total": { "sum": { "field": "total" } } } } } } )
which results in:
"buckets"=>
[{"key"=>5, "doc_count"=>2, "total"=>{"value"=>2956.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"001", "doc_count"=>2}]}},
{"key"=>6, "doc_count"=>2, "total"=>{"value"=>7734.0}, "cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>0, "buckets"=>[{"key"=>"002", "doc_count"=>2}]}}]}}
The second "key"s 001 and 002 are the data I am looking for.
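To flatten that nested response into one row per cbs_item_id, something like the following might work. It is a sketch only, assuming each outer bucket has exactly the shape shown above and that the first inner "cbs" bucket is the one you want; the row keys (:cbs_item_id, :cbs, :total) are names chosen for the example:

```ruby
# Buckets copied from the aggregation response shown above.
buckets = [
  { "key" => 5, "doc_count" => 2, "total" => { "value" => 2956.0 },
    "cbs" => { "doc_count_error_upper_bound" => 0, "sum_other_doc_count" => 0,
               "buckets" => [{ "key" => "001", "doc_count" => 2 }] } },
  { "key" => 6, "doc_count" => 2, "total" => { "value" => 7734.0 },
    "cbs" => { "doc_count_error_upper_bound" => 0, "sum_other_doc_count" => 0,
               "buckets" => [{ "key" => "002", "doc_count" => 2 }] } }
]

# One row per outer bucket; dig returns nil if the inner bucket is missing.
rows = buckets.map do |bucket|
  {
    cbs_item_id: bucket["key"],
    cbs:         bucket.dig("cbs", "buckets", 0, "key"),
    total:       bucket.dig("total", "value")
  }
end
```

Hash#dig needs Ruby 2.3 or later. This avoids one ActiveRecord call per aggregation key, since the text value comes back in the same response.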

How to find IDs in an array created by the $facet operator

I have a Customer collection in MongoDB with a status field. Several documents can share the same ID fields.
I need to find the documents whose first status value is 'guest' and push their IDs into a sub-pipeline named 'guests'.
Customers with status 'member' should be pushed into another sub-pipeline named 'members', but only those whose IDs match the IDs collected by the 'guests' sub-pipeline.
The goal is to get the number of elements in 'guests' and 'members'.
Here is a member item:
{"_id"=>{"$oid"=>"5ce2ecb3ad71852e7fa9e73f"},
"status"=>"member",
"duration"=>nil,
"is_deleted"=>false,
"customer_id"=>"17601",
"customer_journal_item_id"=>"62769",
"customer_ids"=>"17601",
"customer_journal_item_ids"=>"62769",
"self_customer_status_id"=>"21078",
"self_customer_status_created_at"=>"2017-02-01T00:00:00.000Z",
"self_customer_status_updated_at"=>"2017-02-01T00:00:00.000Z",
"updated_at"=>"2019-05-20T18:06:43.655Z",
"created_at"=>"2019-05-20T18:06:43.655Z"}}
My aggregation
{
'$sort': {'self_customer_status_created_at': 1}
},
{'$match':
{
'self_customer_status_created_at':
{
"$gte": Time.parse('2017-01-17').beginning_of_month,
"$lte": Time.parse('2017-01-17').end_of_month
}
}
},
{
"$facet": {
"guests":
[
{
"$group": {
"_id": "$_id",
"data": {
'$first': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "guest"
}
}, {
"$group": {
"_id":nil,
"array":{
"$push": "$data.self_customer_status_id"
}
}
},
{
"$project":{
"array": 1,
"_id":0
}
}
], "members":
[
{
"$group": {
"_id": "$_id", "data": {
'$last': '$$ROOT'
}
}
},
{
"$match": {
"data.status": "member",
"data.self_customer_status_id": {
"$in": [
"$guests.array"
]
}
}
}
}
]
}
}, {
"$project":
{
"members": 1,
"guests.array": 1
}
}
]
).as_json
Instead of the "guests.array" array being used, I get this error:
Mongo::Error::OperationFailure: $in needs an array (2)
What am I doing wrong?
Sorry for my English!
The second sub-pipeline in $facet cannot see the output of the first one.
You need to delete this part:
,
"data.self_customer_status_id": {
"$in": {
"$arrayElemAt":
[
"$guests.array",
0
]
}
}
Then paste this line before $project:
{"$match": {"data.self_customer_status_id": { "$in": ["guests.array"] } } }
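Since the sub-pipelines inside $facet each run independently on the same input documents, another option (a different approach from the one above, shown only as a sketch) is to run the guests aggregation first and then pass the resulting IDs to a second query as a plain Ruby array, which is what $in expects. The helper name members_match_stage is hypothetical, the field names assume the ungrouped documents from the question, and the second ID is an illustrative value:

```ruby
# Hypothetical helper: build the $match stage for a separate members query
# from guest IDs that were collected by an earlier aggregation.
def members_match_stage(guest_ids)
  {
    "$match" => {
      "status" => "member",
      # Array() guarantees $in receives an array, even for nil or a single ID.
      "self_customer_status_id" => { "$in" => Array(guest_ids) }
    }
  }
end

# "21078" appears in the sample document; "21079" is made up.
guest_ids = ["21078", "21079"]
stage = members_match_stage(guest_ids)
```

The resulting stage can be placed in its own aggregate call, so the members count no longer depends on anything computed inside $facet.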

Elasticsearch getting top 5 results from an aggregation with a script

I am trying to get the top 5 products sold, ordered by revenue using elasticsearch in Rails.
Here is my query:
query = {
  bool: {
    filter: {
      bool: {
        must: [
          { term: { store_id: store.id } } # Limiting the products by store
        ]
      }
    }
  }
}
aggs = {
  by_revenue: {
    terms: {
      size: 5,
      order: {revenue: "desc"}
    },
    aggs: {
      revenue: {
        max: {
          script: "doc['price_as_float'].value * doc['quantity'].value"
        }
      }
    }
  }
}
response = OrderItem.search(query: query, aggs: aggs, size: 0)
I get the error: could not find the appropriate value context to perform aggregation [by_revenue]
Thanks!
You need to aggregate the order items on a product reference field (the terms aggregation in your query has no field), then sum price * quantity with a nested sum aggregation, not max, to get the revenue for each product:
aggs: {
  products: {
    terms: {
      field: "product_ref",
      order: { revenues: "desc" },
    },
    aggs: {
      revenues: {
        sum: { script: "doc['price_as_float'].value * doc['quantity'].value" }
      }
    }
  }
}
Don't use the size option in the terms aggregation, because you can't be sure that all the orders for your top products are located on the same shard; you should take the top 5 from the response instead.
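Taking the top 5 from the response could look roughly like this. It is a sketch that assumes the buckets come back in the usual terms-aggregation shape with a nested "revenues" value; the bucket keys and amounts below are invented for the example, and how you reach the buckets array depends on your client library:

```ruby
# Illustrative buckets; in practice these come from the search response.
buckets = [
  { "key" => "A", "doc_count" => 3, "revenues" => { "value" => 120.0 } },
  { "key" => "B", "doc_count" => 1, "revenues" => { "value" => 75.5 } },
  { "key" => "C", "doc_count" => 9, "revenues" => { "value" => 900.0 } },
  { "key" => "D", "doc_count" => 1, "revenues" => { "value" => 10.0 } },
  { "key" => "E", "doc_count" => 4, "revenues" => { "value" => 300.0 } },
  { "key" => "F", "doc_count" => 2, "revenues" => { "value" => 42.0 } },
  { "key" => "G", "doc_count" => 6, "revenues" => { "value" => 610.0 } }
]

# Sort by revenue (highest first) and keep the first five [key, revenue] pairs.
top_five = buckets
  .sort_by { |bucket| -bucket["revenues"]["value"] }
  .first(5)
  .map { |bucket| [bucket["key"], bucket["revenues"]["value"]] }
```

Sorting again on the client is redundant when the aggregation already orders by revenues, but it keeps the snippet correct even if the order clause is dropped.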

Filtered search with Authorization for Elasticsearch

I'm trying to do a search where I look for "test" in any field while filtering for a specific client in the client_id field. Can't seem to figure this one out. This is how far I got (but it's not working):
{
  query: {
    filtered: {
      query: "test",
      filter: {
        term: {client_id: @client.id}
      }
    }
  }
}
This is the right syntax:
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": "test"
        }
      },
      "filter": {
        "term": {
          "client_id": @client.id
        }
      }
    }
  }
}
From ES Docs: The _all field allows you to search for values in documents without knowing which field contains the value
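If you build this body in several places, wrapping it in a small method keeps the client filter from being forgotten. A minimal sketch, assuming the same filtered query and _all field as above (both belong to older Elasticsearch versions); the method name client_search_body is made up for the example:

```ruby
require "json"

# Hypothetical helper: full-text search on _all, restricted to one client.
def client_search_body(text, client_id)
  {
    "query" => {
      "filtered" => {
        "query"  => { "match" => { "_all" => text } },
        "filter" => { "term" => { "client_id" => client_id } }
      }
    }
  }
end

body = client_search_body("test", 717)
json = JSON.generate(body)
```

The client ID 717 is just a sample value; in a Rails controller you would pass the ID of the authorized client instead.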

ElasticSearch returns items that are too far away when using a geo_distance filter

When I search my ElasticSearch documents using an "and" filter that contains a geo_distance filter, I get back documents which are too far away (and which I don't want returned). You can see the query and a screenshot of the results below (raw results on the left and manually filtered results on the right).
Here's another copy of the query:
{
  "query":{
    "match_all":{
    }
  },
  "filter":{
    "and":[
      {
        "term":{
          "PropertySubType":"Single Family"
        }
      },
      {
        "term":{
          "City":"Los Angeles"
        }
      },
      {
        "geo_distance":{
          "distance":"2.25miles",
          "Location":[
            34.111583657,
            -118.324646099
          ]
        }
      },
      {
        "range":{
          "BedroomsTotal":{
            "gte":3
          }
        }
      },
      {
        "range":{
          "BuildingSize":{
            "gte":3000
          }
        }
      },
      {
        "range":{
          "YearBuilt":{
            "lte":2000
          }
        }
      },
      {
        "terms":{
          "ListingStatus":[
            "Active",
            "Pending",
            "Closed"
          ]
        }
      }
    ]
  },
  "size":100
}
Adding the option "distance_type" and setting it to "plane" fixed this issue. See "distance_type" here:
http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
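For reference, the geo_distance entry from the query with that option added might look like this when written as a Ruby hash. This is only a sketch: the distance and Location values are copied unchanged from the question, and only the "distance_type" key is new:

```ruby
# geo_distance filter from the question, with distance_type set to "plane".
geo_filter = {
  "geo_distance" => {
    "distance"      => "2.25miles",
    "distance_type" => "plane",
    # Copied as-is from the question. Elasticsearch documents the array
    # form of a geo point as [lon, lat], so the order is worth checking.
    "Location"      => [34.111583657, -118.324646099]
  }
}
```

This hash would replace the existing geo_distance element inside the "and" array; the other filters stay the same.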
