ElasticSearch Aggregator with sorting by text/keyword - ruby-on-rails

I have elasticsearch set up for searching across a products catalog's variants. Basically where:
Product has_many variants
Variant belongs_to product
And the variant index json / mapping contains the product name.
I am trying to search variants, grouped by product id, bucket size of 1. I am able to do it and sort by min price, max price, etc.
This works:
POST /variants/_search?size=0
{
"aggs" : {
"min_price" : { "min" : { "field" : "price" } }
}
}
This is (sort of) what I need next:
POST /variants/_search?size=0
{
"aggs" : {
"product_name" : { "sort by product_name asc / desc" }
}
}
My last task is to sort them alphabetically, but I don't seem to be able to sort by a keyword field (asc/desc) using an aggregation.

In ES 6.0, you can do this with a terms aggregation ordered by _key. This assumes product_name is mapped as a keyword field; if it is a text field, point the aggregation at its keyword sub-field, if it has one. Note that size limits how many buckets are returned, and the more you request, the more expensive the query is to execute. If you really need many thousands, you will probably want a different approach, such as a separate rolled-up index for products that you can search and sort directly instead of going through aggregations.
GET /variants/_search
{
  "size": 0,
  "aggs": {
    "product_name": {
      "terms": {
        "field": "product_name",
        "size": 1000,
        "order": { "_key": "asc" }
      }
    }
  }
}
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order
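A minimal sketch, in Python, of how the request body above could be assembled so that the sort direction becomes a parameter. The helper name terms_sorted_by_key is hypothetical, and actually sending the body to Elasticsearch is left out.

```python
import json


def terms_sorted_by_key(field, size=1000, direction="asc"):
    """Build a terms-aggregation body whose buckets are ordered by their key."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {
        "size": 0,  # return only aggregation buckets, no hits
        "aggs": {
            field: {
                "terms": {
                    "field": field,
                    "size": size,  # upper bound on the number of buckets
                    "order": {"_key": direction},
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body for a descending alphabetical sort.
    print(json.dumps(terms_sorted_by_key("product_name", direction="desc"), indent=2))
```

Passing direction="asc" reproduces the body shown in the answer.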

Related

Elastic search scroll aggregations

I am trying to get a unique document count in an index, based on an id property, via the Elasticsearch web API. The thing is that I have millions of entries. How can I scroll on an aggregation?
This is the URL:
http://my.servers.ip:9200/index_name/doc_type/_search?scroll=1m
And this is the body:
{
  "_source": "false",
  "aggs": {
    "Ids": {
      "terms": {
        "field": "somePropertyIWantToGoupBy",
        "size": 100
      },
      "aggs": {
        "unique": {
          "cardinality": {
            "field": "someCategoryIWantUniqueCount"
          }
        }
      }
    }
  },
  "size": 0
}
I get the scroll ID, but on the next call with that scroll ID I expected to get the next 100 aggregation buckets; instead I get an empty result set.
Is it possible to scroll on aggregations?
What am I doing wrong?
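There's no way to paginate a terms aggregation with scroll: scrolling only pages through hits, and aggregation results come back with the initial search response only.
You should use the Composite Aggregation, which can page through all buckets, but at the time of writing it's a beta aggregation and might be removed or changed in the future...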
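A rough sketch, in Python, of how paging through a composite aggregation could look. It assumes a caller-supplied search function (hypothetical here) that sends a request body and returns the parsed JSON response; the aggregation name "ids" and source name "id" are also just illustrative. Each page resumes from the key of the last bucket seen, passed back as "after".

```python
def iter_composite_buckets(search, field, sub_aggs=None, page_size=100):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is an assumption of this sketch: a function that takes a
    request body (dict) and returns the parsed JSON response (dict).
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"id": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last key seen
        agg = {"composite": composite}
        if sub_aggs:
            agg["aggs"] = sub_aggs  # e.g. the cardinality sub-aggregation
        body = {"size": 0, "aggs": {"ids": agg}}
        buckets = search(body)["aggregations"]["ids"]["buckets"]
        if not buckets:
            return  # an empty page means all buckets were consumed
        for bucket in buckets:
            yield bucket
        after = buckets[-1]["key"]  # composite keys are dicts, e.g. {"id": "a"}
```

For the question's case, sub_aggs would be {"unique": {"cardinality": {"field": "someCategoryIWantUniqueCount"}}}.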

ElasticSearch Stored Procedure

Suppose I have documents with structure:
{
name:"some_name",
salary:INT_VAL,
date:YYYY.dd.MMTHH:mm:sssZ,
num_of_months:INT_VAL
}
And now I want to run a query against Elasticsearch that selects the top 10 documents sorted by the criterion salary*num_of_months.
How can I do this?
And what if I want to sort by a criterion with some logic inside, something like
if (num_of_months < 5)
then criteria = salary*100;
else criteria = salary*200;
endif
sort_by_criteria()
Sorting with a script lets you perform calculations on the data and return the results in the right order:
GET /myindex/mytype/_search?pretty
{
  "sort": {
    "_script": {
      "type": "number",
      "lang": "expression",
      "script": "doc['num_of_months'].value < 5 ? doc['salary'].value*100 : doc['salary'].value*200",
      "order": "desc"
    }
  },
  "size": 10
}
Note the use of the ternary operator in place of the if statement.
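As a sanity check, the ordering the script sort is meant to produce can be mirrored in plain Python. This is only a sketch: the function names are hypothetical, the body builder copies the request layout from the answer (which may differ between Elasticsearch versions), and PRODUCT_EXPR is my guess at an expression for the simpler salary*num_of_months case from the question.

```python
# Assumed expression for the first variant in the question.
PRODUCT_EXPR = "doc['salary'].value * doc['num_of_months'].value"


def script_sort_body(expression, size=10):
    """Build a request body laid out like the one in the answer above."""
    return {
        "sort": {
            "_script": {
                "type": "number",
                "lang": "expression",
                "script": expression,
                "order": "desc",
            }
        },
        "size": size,
    }


def sort_criteria(salary, num_of_months):
    """Python mirror of the ternary expression used in the script sort."""
    return salary * 100 if num_of_months < 5 else salary * 200


def top_n(docs, n=10):
    """Sort documents by the criteria, highest first, and keep the first n."""
    key = lambda d: sort_criteria(d["salary"], d["num_of_months"])
    return sorted(docs, key=key, reverse=True)[:n]
```

Running top_n over a handful of sample documents gives the order to expect back from the search.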

Multiple Joins In CouchDB

I am currently trying to figure out if CouchDB is suitable for my use-case and if so, how. I have a situation similar to the following:
First set of documents (let's call them companies):
{
"_id" : 1,
"name" : "Foo"
}
{
"_id" : 2,
"name" : "Bar"
}
{
"_id" : 3,
"name" : "Baz"
}
Second set of documents (let's call them projects):
{
"_id" : 4,
"name" : "FooProject1",
"company" : 1
}
{
"_id" : 5,
"name" : "FooProject2",
"company" : 1
}
...
{
"_id" : 100,
"name" : "BazProject2",
"company" : 3
}
Third set of documents (let's call them incidents):
{
"_id" : "300",
"project" : 4,
"description" : "...",
"cost" : 200
}
{
"_id" : "301",
"project" : 4,
"description" : "...",
"cost" : 400
}
{
"_id" : "302",
"project" : 4,
"description" : "...",
"cost" : 500
}
...
So in short, every company has multiple projects, and every project can have multiple incidents. One reason I model the data this way is that I come mainly from a SQL background, so the modelling may be completely unsuitable. The second reason is that I would like to add new incidents very easily by just using the REST API provided by CouchDB. So the incidents have to be single documents.
However, I would now like to get a view that allows me to calculate the total cost for each company. I can easily define a view using map-reduce and linked documents which gets me the total amount per project. However, once I am at the project level I cannot get any further to the level of the company.
Is this possible at all using CouchDB? This kind of summarising data sounds like a perfect use case for map-reduce. In SQL I would just do a three-table join, but it seems like in CouchDB the best I can get is two-table joins.
As mentioned, you cannot do joins in CouchDB, but this isn't so much a limitation as an invitation to think about your problem and approach it differently. The way to do this in CouchDB is to denormalize: give each incident a small data structure, called for example IncidentReference, composed of:
The project id
And the company id
That way your data would look like:
{
  "_id" : "301",
  "project" : 4,
  "description" : "...",
  "cost" : 400,
  "reference" : {
    "projectId" : 4,
    "companyId" : 1
  }
}
This is just fine. Once you have that, you can play with Map/Reduce to achieve whatever you want easily. Generally speaking, you need to think about the way you are going to query your data.
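To make the Map/Reduce step concrete, here is a small Python emulation of what such a view would compute. It is a sketch only: a real CouchDB view is written as a JavaScript map function, and the reduce step here stands in for the built-in _sum reduce queried with group=true. The function names are hypothetical.

```python
from collections import defaultdict


def map_incident(doc):
    """Emit (companyId, cost) for documents that look like incidents."""
    ref = doc.get("reference")
    if ref is not None and "cost" in doc:
        yield ref["companyId"], doc["cost"]


def total_cost_per_company(docs):
    """Group the emitted values by key and sum them."""
    totals = defaultdict(int)
    for doc in docs:
        for company_id, cost in map_incident(doc):
            totals[company_id] += cost
    return dict(totals)
```

Documents without a reference (companies, projects) emit nothing, so they can live in the same database without affecting the totals.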

Mongoid max and embedded collections

I have a Report collection that embeds submissions:
class Report
  include Mongoid::Document
  embeds_many :submissions
end

class Submission
  include Mongoid::Document
  embedded_in :report
  field :date_submitted, type: ActiveSupport::TimeWithZone
  field :mistakes, type: Integer
end
I am trying to create a scope on Report with two parts: it should look at the latest submission (given by max date_submitted), and that submission must also have zero mistakes.
I can create a scope for the mistakes part, but cannot work out how to get the latest submission:
scope :my_scope, -> { where("submissions.mistakes" => 0) }
So this report would be returned, as its last entry in submissions has zero mistakes:
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 11
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 0
}
]
where this one wouldn't be returned
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 0
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 11
}
]
This is because you are not filtering the element of the embedded array but the document that contains that element.
An $elemMatch clause would let you combine conditions on a single array element, but find has no operation for getting the max value. This is not to be confused with the $max query modifier, which actually clips the index in use so that the search does not go beyond those bounds.
So here you use aggregate:
db.collection.aggregate([
  // Optionally query to match and filter your documents.
  //{ "$match": { /* Same conditions as find */ } },
  // Unwind the array
  { "$unwind": "$submissions" },
  // Filter all but 0 mistakes
  { "$match": { "submissions.mistakes": 0 } },
  // Group the results, taking the max entry and presuming by document `_id`
  { "$group": {
    "_id": "$_id",
    "date_submitted": { "$max": "$submissions.date_submitted" }
  }}
])
That is the general process for filtering the elements of an array. Check how your driver implements aggregate, but the pipeline is always passed as an array of documents (hashes) in this form. With Mongoid you can probably reach it through the underlying Moped collection, so something like:
Report.collection.aggregate([ /* stages */ ])
For more information on returning the original document form, if that is your requirement, see here.
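Note that the pipeline above keeps, per report, the latest submission among those with zero mistakes, so both example reports from the question would produce a result. If the requirement is that the most recent submission overall must have zero mistakes (which is how I read the question), the check looks like the following Python sketch over plain dicts. The function name is hypothetical, and this runs client-side rather than in MongoDB.

```python
from datetime import datetime


def latest_submission_is_clean(report):
    """True when the most recent submission (max date_submitted) has zero mistakes."""
    submissions = report.get("submissions", [])
    if not submissions:
        return False  # no submissions means nothing to match
    latest = max(submissions, key=lambda s: s["date_submitted"])
    return latest["mistakes"] == 0


if __name__ == "__main__":
    report = {
        "submissions": [
            {"date_submitted": datetime(2014, 1, 28, 13), "mistakes": 11},
            {"date_submitted": datetime(2014, 3, 8, 13), "mistakes": 0},
        ]
    }
    print(latest_submission_is_clean(report))
```

The same idea carries over to a pipeline: pick the latest submission per report first, and only then filter on mistakes.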

MongoDB/Mongoid: search for documents matching first item in array

I have a document that has an array:
{
_id: ObjectId("515e10784903724d72000003"),
association_chain: [
{
name: "Product",
id: ObjectId("4e1e2cdd9a86652647000003")
}
],
//...
}
I'm trying to search the collection for documents where the name of the first item in the association_chain array matches a given value.
How can I do this using Mongoid? Or if you only know how this can be done using MongoDB, if you post an example, then I could probably figure out how to do it with Mongoid.
Use dot notation with an array index (this is not the positional $ operator, which is something else). You can query the first element of an array with .0 (and the second with .1, etc).
> db.items.insert({association_chain: [{name: 'foo'}, {name: 'bar'}]})
> db.items.find({"association_chain.0.name": "foo"})
{ "_id" : ObjectId("516348865862b60b7b85d962"), "association_chain" : [ { "name" : "foo" }, { "name" : "bar" } ] }
You can see that the array index is taken into account, since searching for foo in the second element doesn't return a hit...
> db.items.find({"association_chain.1.name": "foo"})
>
...but searching for bar does.
> db.items.find({"association_chain.1.name": "bar"})
{ "_id" : ObjectId("516348865862b60b7b85d962"), "association_chain" : [ { "name" : "foo" }, { "name" : "bar" } ] }
You can even index this specific field without indexing all the names of all the association chain documents:
> db.items.ensureIndex({"association_chain.0.name": 1})
> db.items.find({"association_chain.0.name": "foo"}).explain()
{
"cursor" : "BtreeCursor association_chain.0.name_1",
"nscanned" : 1,
...
}
> db.items.find({"association_chain.1.name": "foo"}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 3,
...
}
Two ways to do this:
1) If you already know that you're only interested in the first product name appearing in "association_chain", then this is better:
db.items.find({"association_chain.0.name": "something"})
Please note that this does not return all items that mention the desired product, only those that mention it in the first position of the 'association_chain' array.
If you want to do this, then you should create an index:
db.items.ensureIndex({"association_chain.0.name":1},{background:1})
2) If you are looking for a specific product, but you are not sure in which position of the association_chain it appears, then do this:
With the MongoDB shell you can access any hash key inside a nested structure with the '.' dot operator! Please note that this is independent of how deeply that key is nested in the record (isn't that cool?)
You can do a find on an embedded array of hashes like this:
db.items.find({"association_chain.name": "something"})
This returns all records in the collection which contain the desired product mentioned anywhere in the association_chain array.
If you want to do this, you should make sure that you have an index:
db.items.ensureIndex({"association_chain.name":1},{background: 1})
See "Dot Notation" on this page: http://docs.mongodb.org/manual/core/document/
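The difference between the two query forms can be mirrored in a few lines of Python over plain dicts. This is a simplified sketch with hypothetical function names; it only models arrays of embedded documents, not every case MongoDB's matching handles.

```python
def matches_at_index(doc, index, name):
    """Mirror of {"association_chain.<index>.name": name}."""
    chain = doc.get("association_chain", [])
    return index < len(chain) and chain[index].get("name") == name


def matches_anywhere(doc, name):
    """Mirror of {"association_chain.name": name}."""
    return any(item.get("name") == name for item in doc.get("association_chain", []))


if __name__ == "__main__":
    doc = {"association_chain": [{"name": "foo"}, {"name": "bar"}]}
    print(matches_at_index(doc, 0, "foo"))  # first position only
    print(matches_anywhere(doc, "bar"))     # any position
```

The indexed form is stricter, which is why it fits the "first item in the array" requirement from the question.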
You can do this with the aggregation framework. In the mongo shell, run a pipeline that unwinds the documents so you have one document per array element (with the data in the other fields duplicated), then group by _id and any other field you want to include, keeping the first array element with the $first operator. Then add a $match stage to filter by name or by id.
Here's the query to match by the first product name:
db.foo.aggregate([
  { $unwind: "$association_chain" },
  { $group: {
      "_id": {
        "_id": "$_id",
        "other": "$other"
      },
      "association_chain": {
        $first: "$association_chain"
      }
  } },
  { $match: { "association_chain.name": "Product" } }
])
Here's how to query for the first product by its id (an ObjectId):
db.foo.aggregate([
  { $unwind: "$association_chain" },
  { $group: {
      "_id": {
        "_id": "$_id",
        "other": "$other"
      },
      "association_chain": {
        $first: "$association_chain"
      }
  } },
  { $match: { "association_chain.id": ObjectId("4e1e2cdd9a86652647000007") } }
])
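To see what each stage of these pipelines contributes, the same three steps can be traced in Python over plain dicts. This is an illustrative emulation only, with a hypothetical function name; it returns the matching _id values rather than full documents and ignores the extra "other" grouping field.

```python
def ids_with_first_element_matching(docs, field, value):
    """Return the _id of each doc whose first association_chain element has field == value."""
    # $unwind: one row per array element, in array order
    rows = [(doc["_id"], item) for doc in docs for item in doc.get("association_chain", [])]
    # $group with $first: keep the first element seen for each _id
    first_by_id = {}
    for doc_id, item in rows:
        first_by_id.setdefault(doc_id, item)
    # $match: filter on the retained first element
    return [doc_id for doc_id, item in first_by_id.items() if item.get(field) == value]
```

Documents with an empty or missing array produce no rows in the first step, so they drop out, as they do with $unwind by default.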
