Elastic search scroll aggregations - url

I am trying to get a unique document count in an index, based on an id property, via the Elasticsearch web API. The problem is that I have millions of entries. How can I scroll on an aggregation?
This is the URL:
http://my.servers.ip:9200/index_name/doc_type/_search?scroll=1m
And this is the body:
{
  "_source": "false",
  "aggs": {
    "Ids": {
      "terms": {
        "field": "somePropertyIWantToGoupBy",
        "size": 100
      },
      "aggs": {
        "unique": {
          "cardinality": {
            "field": "someCategoryIWantUniqueCount"
          }
        }
      }
    }
  },
  "size": 0
}
I get the scroll ID, and I expected the next call with that scroll ID to return the next 100 aggregation buckets. Instead I get an empty result set.
Is it possible to scroll on aggregations ?
What am I doing wrong ?

Scroll only pages through search hits, not aggregation buckets, so there's no way to paginate a terms aggregation this way.
You should use a composite aggregation instead, but note that it's a beta aggregation and might be removed or changed in the future...
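As a rough sketch of how paging with a composite aggregation could look: each response carries an after_key, which is passed back as after in the next request until no buckets come back. The `search` callable is hypothetical (for example, a thin wrapper around an HTTP POST to /index_name/_search), and the source name "group" and both function names are made up for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Body = Dict[str, Any]


def composite_body(group_field: str, count_field: str,
                   page_size: int = 100,
                   after: Optional[Dict[str, Any]] = None) -> Body:
    """Build one page request; `after` is the previous page's after_key."""
    composite: Dict[str, Any] = {
        "size": page_size,
        # "group" is an arbitrary source name chosen for this sketch.
        "sources": [{"group": {"terms": {"field": group_field}}}],
    }
    if after is not None:
        composite["after"] = after
    return {
        "size": 0,  # no hits, only aggregation buckets
        "aggs": {
            "Ids": {
                "composite": composite,
                "aggs": {"unique": {"cardinality": {"field": count_field}}},
            }
        },
    }


def iter_buckets(search: Callable[[Body], Body], group_field: str,
                 count_field: str, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, following after_key until the pages run out."""
    after = None
    while True:
        response = search(composite_body(group_field, count_field, page_size, after))
        agg = response["aggregations"]["Ids"]
        if not agg["buckets"]:
            return
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None:
            return
```

Passing the search function in keeps the paging logic independent of whichever HTTP client or Elasticsearch library is actually in use.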


ElasticSearch Aggregator with sorting by text/keyword

I have elasticsearch set up for searching across a products catalog's variants. Basically where:
Product has_many variants
Variant belongs_to product
And the variant index json / mapping contains the product name.
I am trying to search variants, grouped by product id, bucket size of 1. I am able to do it and sort by min price, max price, etc.
This works:
POST /variants/_search?size=0
{
"aggs" : {
"min_price" : { "min" : { "field" : "price" } }
}
}
This is (sort of) what I need next:
POST /variants/_search?size=0
{
"aggs" : {
"product_name" : { "sort by product_name asc / desc" }
}
}
My last task is sorting them alphabetically, but I don't seem to be able to sort by a keyword field (asc/desc) using an aggregator.
In ES 6.0, you can do this with a terms aggregation ordered by _key. Note that size limits how many buckets are returned, and the more you request, the more expensive the query will be to execute. So if you really need many thousands, you will probably want to try a different approach, such as creating a separate rolled-up index for products that you can search and sort directly instead of going through aggregations.
GET /variants/_search
{
  "size": 0,
  "aggs" : {
    "product_name" : {
      "terms" : {
        "field" : "product_name",
        "size": 1000,
        "order" : { "_key" : "asc" }
      }
    }
  }
}
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order
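For illustration only, the same request body can be built programmatically so the sort direction becomes a parameter. The function name is hypothetical, and the sketch assumes the field is mapped as a keyword so that it can be used in a terms aggregation.

```python
from typing import Any, Dict


def terms_sorted_by_key(field: str, size: int = 1000,
                        descending: bool = False) -> Dict[str, Any]:
    """Build a search body whose buckets are ordered alphabetically by key."""
    order = "desc" if descending else "asc"
    return {
        "size": 0,  # only the aggregation is wanted, not the hits
        "aggs": {
            "product_name": {
                "terms": {"field": field, "size": size, "order": {"_key": order}}
            }
        },
    }
```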

Querying nested elements in Firebase

I am trying to query a Firebase collection using Swift.
My collection looks like this:
{
  "networks" : {
    "-KF9zQYGA1r_JiFam8gd" : {
      "category" : "friends",
      "members" : {
        "facebook:10154023600052295" : true
      },
      "name" : "My friends",
      "owner" : "facebook:10154023600052295",
      "picture" : "https://my.img/img.png"
    },
    "-KF9zQYZX6p_ra34DTGh" : {
      "category" : "friends",
      "members" : {
        "tototata" : true
      },
      "name" : "My friends2",
      "owner" : "facebook:10154023600052295",
      "picture" : "https://my.img/img.png"
    }
  }
}
And my query looks like this:
let networksRef = ref.childByAppendingPath("networks")
.queryOrderedByChild("members")
.queryEqualToValue(true, childKey: uid)
Then I am populating a TableView using FirebaseUI.
The problem is that the query doesn't return anything. I also tried this kind of query:
let networksRef = ref.childByAppendingPath("networks")
.queryOrderedByChild("members")
.queryStartingAtValue(true, childKey: uid)
.queryEndingAtValue(true, childKey: uid)
To be clear, "uid" is the name of the node nested in the "members" node, with true as its value (as in the picture).
I can't fetch all the nodes without filtering because of the amount of data that would be downloaded.
I've been stuck on this for a day, and I'm going crazy. :-)
Can someone help?
The correct syntax would be:
let networksRef = ref.childByAppendingPath("networks")
.queryOrderedByChild("members/\(uid)")
.queryEqualToValue(true)
But note that you'll need an index for each uid for this to work efficiently:
{
  "rules": {
    "networks": {
      ".indexOn": ["facebook:10154023600052295", "tototata"]
    }
  }
}
Without these indexes, the Firebase client will download all data and do the filtering locally. The result will be the same, but it'll consume a lot of unneeded bandwidth.
A proper solution for this requires that you change your data structure around. In Firebase (and most other NoSQL databases) you often end up modeling the data in the way that your application wants to access it. Since you are looking for the networks that a user is part of, you should store the networks per user:
"networks_per_uid": {
  "facebook:10154023600052295": {
    "-KF9zQYGA1r_JiFam8gd": true
  },
  "tototata": {
    "-KF9zQYZX6p_ra34DTGh": true
  }
}
Adding this so-called index to your data structure is called denormalizing, and it is introduced in this blog post, the documentation on data structuring and this great article about NoSQL data modeling.
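To make the shape of that inverted index concrete, here is a small sketch that derives networks_per_uid from an in-memory copy of the networks node. It only illustrates the data transformation; the function name is made up, and in a real app both locations would be written together whenever membership changes.

```python
from typing import Any, Dict


def build_networks_per_uid(networks: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, bool]]:
    """Invert networks -> members into uid -> network ids."""
    index: Dict[str, Dict[str, bool]] = {}
    for network_id, network in networks.items():
        for uid, is_member in network.get("members", {}).items():
            if is_member:
                index.setdefault(uid, {})[network_id] = True
    return index
```

With this structure, finding a user's networks is a direct read of networks_per_uid/<uid> rather than a query across all networks.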

ElasticSearch writing query for priority search

I am new to Elasticsearch; I just set it up and tried the default search. I am using the elasticsearch rails gem. I need to write a custom query with priority search (some fields in the table are more important than others, e.g. title, or updated_at within the last 6 months). I tried to find an explanation or tutorial on how to do this, but nothing I found was understandable. Can anyone help me with this? The sooner the better.
Never having used the ruby/elasticsearch integration, it doesn't seem too hard... The docs here show that you'd want to do something like this:
client.search index: 'my-index', body: { query: { match: { title: 'test' } } }
That performs a basic search.
The ES documentation here shows how to do a field boosted query:
{
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^3", "message" ]
}
}
Putting it all together, you'd do something like this:
client.search index: 'my-index', body: { query: { multi_match: {
query: "this is a test",
fields: [ "subject^3", "message" ]
} } }
That will allow you to search and boost on fields. In the above case, matches on the subject field count 3 times as much toward the score as matches on the message field.
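The caret notation is just string formatting on the field name, so the fields list can be generated from a mapping of field to boost. The sketch below shows that idea in isolation; both function names are hypothetical and nothing here talks to Elasticsearch.

```python
from typing import Any, Dict, List


def boosted_fields(boosts: Dict[str, float]) -> List[str]:
    """Render {"subject": 3, "message": 1} as ["subject^3", "message"]."""
    return [name if boost == 1 else f"{name}^{boost}" for name, boost in boosts.items()]


def multi_match_query(text: str, boosts: Dict[str, float]) -> Dict[str, Any]:
    """Build a search body with a multi_match query over the boosted fields."""
    return {"query": {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}}
```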
There is a very good blog post about how to do advanced scoring. Part of it shows an example of adjusting the score based on a date:
...
"filter": {
"exists": {
"field": "date"
}
},
"script": "(0.08 / ((3.16*pow(10,-11)) * abs(now - doc['date'].date.getMillis()) + 0.05)) + 1.0"
...
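To get a feel for what that script expression does, the same formula can be evaluated outside Elasticsearch. This sketch assumes `now` in the script is the current time in milliseconds, so that the difference is the document's age in milliseconds; the function name is made up.

```python
YEAR_MS = 365 * 24 * 60 * 60 * 1000  # one year in milliseconds


def recency_boost(age_ms: float) -> float:
    """Same arithmetic as the script above, with age_ms = now - document date."""
    return 0.08 / (3.16e-11 * abs(age_ms) + 0.05) + 1.0


if __name__ == "__main__":
    for label, age in [("new", 0), ("1 month", YEAR_MS / 12), ("1 year", YEAR_MS)]:
        print(f"{label}: {recency_boost(age):.3f}")
```

The multiplier is about 2.6 for a brand-new document and decays toward 1.0 as the document ages, reaching roughly 1.08 after one year.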
I have done this in PHP, though I have never used the Ruby on Rails gem. You can give fields a priority using the caret (^) notation.
Example: suppose the table has the fields name, email, message and address, and priority should be given to name and message. Then you can write:
> { "multi_match" : {
> "query" : "this is a test",
> "fields" : [ "name^3", "message^2".... ] } }
Here name is boosted 3 times and message 2 times relative to the other fields.

Mongoid max and embedded collections

I have a Report collection that embeds submissions:
class Report
  include Mongoid::Document
  embeds_many :submissions
end
class Submission
  include Mongoid::Document
  embedded_in :report
  field :date_submitted, type: TimeWithZone
  field :mistakes, type: Integer
end
I am trying to create a scope on Report that combines two conditions: take the latest submission (the one with the maximum date_submitted), and require that it has zero mistakes.
I can create a scope for the mistakes part, but cannot work out how to get the latest submission:
scope :my_scope, where("submissions.mistakes" => 0)
So this report would be returned, as its last entry in submissions has zero mistakes:
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 11
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 0
}
]
whereas this one would not be returned:
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 0
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 11
}
]
This is because you are not filtering the element of the embedded array but the document that contains that element.
There could be an $elemMatch clause here which allows you to combine the conditions on a single element. But find does not have any operation for getting the max value as it were. This is not to be confused with the $max query modifier, which actually clips the index in use to not search beyond those bounds.
So here you use aggregate:
db.collection.aggregate([
  // Optionally query to match and filter your documents.
  //{ "$match": { /* Same conditions as find */ } },
  // Unwind the array
  { "$unwind": "$submissions" },
  // Filter all but 0 mistakes
  { "$match": { "submissions.mistakes": 0 } },
  // Group the results, taking the max entry and presuming by document `_id`
  { "$group": {
    "_id": "$_id",
    "date_submitted": { "$max": "$submissions.date_submitted" }
  }}
])
That is the general process for filtering the elements of an array. Check how your driver exposes aggregate, but the form is always a pipeline represented as an array of documents (hashes). With Mongoid you can probably reach it through the underlying Moped collection, so something like:
Report.collection.aggregate([ /* stages */ ])
For more information on returning the original document form if that is what your requirement is then see here.
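If the requirement is specifically "reports whose most recent submission has zero mistakes" (which is how the two example reports in the question read), one possible variation is to sort the unwound submissions by date, keep the first one per report, and only then filter on mistakes. The sketch below writes that pipeline as plain data, alongside an in-memory predicate stating the intended rule. The names are illustrative and this is not taken from the answer above.

```python
from datetime import datetime
from typing import Any, Dict, List

# Stages for aggregate(): keep reports whose latest submission has no mistakes.
LATEST_WITHOUT_MISTAKES: List[Dict[str, Any]] = [
    {"$unwind": "$submissions"},
    {"$sort": {"submissions.date_submitted": -1}},  # newest first
    {"$group": {"_id": "$_id", "latest": {"$first": "$submissions"}}},
    {"$match": {"latest.mistakes": 0}},
]


def latest_has_no_mistakes(report: Dict[str, Any]) -> bool:
    """In-memory statement of the same rule, useful for checking expectations."""
    submissions = report.get("submissions", [])
    if not submissions:
        return False
    latest = max(submissions, key=lambda s: s["date_submitted"])
    return latest["mistakes"] == 0
```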

MongoDB/Mongoid: search for documents matching first item in array

I have a document that has an array:
{
_id: ObjectId("515e10784903724d72000003"),
association_chain: [
{
name: "Product",
id: ObjectId("4e1e2cdd9a86652647000003")
}
],
//...
}
I'm trying to search the collection for documents where the name of the first item in the association_chain array matches a given value.
How can I do this using Mongoid? Or if you only know how this can be done using MongoDB, if you post an example, then I could probably figure out how to do it with Mongoid.
Use dot notation with the array index. You can query the first element of an array with .0 (and the second with .1, etc).
> db.items.insert({association_chain: [{name: 'foo'}, {name: 'bar'}]})
> db.items.find({"association_chain.0.name": "foo"})
{ "_id" : ObjectId("516348865862b60b7b85d962"), "association_chain" : [ { "name" : "foo" }, { "name" : "bar" } ] }
You can see that the array index is respected, since searching for foo in the second element doesn't return a hit...
> db.items.find({"association_chain.1.name": "foo"})
>
...but searching for bar does.
> db.items.find({"association_chain.1.name": "bar"})
{ "_id" : ObjectId("516348865862b60b7b85d962"), "association_chain" : [ { "name" : "foo" }, { "name" : "bar" } ] }
You can even index this specific field without indexing all the names of all the association chain documents:
> db.items.ensureIndex({"association_chain.0.name": 1})
> db.items.find({"association_chain.0.name": "foo"}).explain()
{
"cursor" : "BtreeCursor association_chain.0.name_1",
"nscanned" : 1,
...
}
> db.items.find({"association_chain.1.name": "foo"}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 3,
...
}
Two ways to do this:
1) if you already know that you're only interested in the first product name appearing in "association_chain", then this is better:
db.items.find({"association_chain.0.name": "something"})
Please note that this does not return all items, which mention the desired product, but only those which mention it in the first position of the 'association_chain' array.
If you want to do this, then you'll need an index:
db.items.ensureIndex({"association_chain.0.name":1},{background:1})
2) if you are looking for a specific product, but you are not sure in which position of the association_chain it appears, then do this:
With the MongoDB shell you can access any hash key inside a nested structure with the '.' dot operator! Please note that this is independent of how deeply that key is nested in the record (isn't that cool?)
You can do a find on an embedded array of hashes like this:
db.items.find({"association_chain.name": "something"})
This returns all records in the collection which contain the desired product mentioned anywhere in the association_chain array.
If you want to do this, you should make sure that you have an index:
db.items.ensureIndex({"association_chain.name":1},{background: 1})
See "Dot Notation" on this page: http://docs.mongodb.org/manual/core/document/
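The difference between the two query paths can be spelled out with a simplified in-memory model. This is not how MongoDB evaluates queries internally; it only mirrors which documents "association_chain.0.name" and "association_chain.name" would match for documents shaped like the ones above. Both function names are made up.

```python
from typing import Any, Dict


def matches_first(doc: Dict[str, Any], name: str) -> bool:
    """Mirrors {"association_chain.0.name": name}: only the first element counts."""
    chain = doc.get("association_chain", [])
    return bool(chain) and chain[0].get("name") == name


def matches_any(doc: Dict[str, Any], name: str) -> bool:
    """Mirrors {"association_chain.name": name}: any element may match."""
    return any(item.get("name") == name for item in doc.get("association_chain", []))
```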
You can do this with the aggregation framework. In the mongo shell run a query that unwinds the documents so you have a document per array element (with duplicated data in the other fields), then group by id and any other field you want to include, plus the array with the operator $first. Then just include the $match operator to filter by name or mongoid.
Here's the query to match by the first product name:
db.foo.aggregate([
  { $unwind: "$association_chain" },
  {
    $group: {
      "_id": {
        "_id": "$_id",
        "other": "$other"
      },
      "association_chain": {
        $first: "$association_chain"
      }
    }
  },
  { $match: { "association_chain.name": "Product" } }
])
Here's how to query for the first product by mongoid:
db.foo.aggregate([
  { $unwind: "$association_chain" },
  {
    $group: {
      "_id": {
        "_id": "$_id",
        "other": "$other"
      },
      "association_chain": {
        $first: "$association_chain"
      }
    }
  },
  { $match: { "association_chain.id": ObjectId("4e1e2cdd9a86652647000007") } }
])
