I have a collection, Report, that embeds submissions:
class Report
embeds_many :submissions
class Submission
embedded_in :report
field :date_submitted, type: TimeWithZone
field :mistakes, type: Integer
I am trying to create a scope on Report.
The scope query has two parts:
find the latest submission (the one with the maximum date_submitted) and require that it also has zero mistakes
I can create a scope for the mistakes part, but cannot work out how to get the latest submission
scope :my_scope, where("submissions.mistakes" => 0)
So this report would be returned, as its last entry in submissions has zero mistakes:
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 11
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 0
}
]
whereas this one wouldn't be returned:
Report
"submissions" : [
{
"date_submitted" : ISODate("2014-01-28T13:00:00Z"),
"mistakes" : 0
},
{
"date_submitted" : ISODate("2014-03-08T13:00:00Z"),
"mistakes" : 11
}
]
This is because you are not filtering the element of the embedded array but the document that contains that element.
There could be an $elemMatch clause here which allows you to combine the conditions on a single element. But find does not have any operation for getting the max value as it were. This is not to be confused with the $max query modifier, which actually clips the index in use to not search beyond those bounds.
So here you use aggregate:
db.collection.aggregate([
// Optionally query to match and filter your documents.
//{ "$match": { /* Same conditions as find */ } },
// Unwind the array
{ "$unwind": "$submissions" },
// Filter all but 0 mistakes
{ "$match": { "submissions.mistakes": 0 } },
// Group the results, taking the max entry and presuming by document `_id`
{ "$group": {
"_id": "$_id",
"date_submitted": { "$max": "$submissions.date_submitted" }
}}
])
That is the general process for filtering the elements of an array. Check how your driver implements aggregate, but the pipeline is always represented as an array of documents (hashes) in this form. With Mongoid you can probably reach the underlying Moped collection through the collection method. So something like:
Report.collection.aggregate([ /* stages */ ])
For more information on returning the original document form if that is what your requirement is then see here.
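The pipeline above returns the latest zero-mistake submission per report. If the requirement is instead that the report's most recent submission itself has zero mistakes, as in the two sample documents from the question, one possible variation is to sort before grouping. This is only a sketch: the constant and method names are made up for illustration, and the in-memory check is there so the rule can be tried without a database.

```ruby
# Hypothetical pipeline (not from the original answer): order the unwound
# submissions newest-first, keep the first one per report, then require
# that this latest submission has zero mistakes.
LATEST_CLEAN_PIPELINE = [
  { "$unwind" => "$submissions" },
  { "$sort"   => { "submissions.date_submitted" => -1 } },
  { "$group"  => {
    "_id"            => "$_id",
    "date_submitted" => { "$first" => "$submissions.date_submitted" },
    "mistakes"       => { "$first" => "$submissions.mistakes" }
  } },
  { "$match"  => { "mistakes" => 0 } }
].freeze

# In-memory equivalent of the same rule, using plain hashes.
def latest_submission_clean?(report)
  latest = Array(report["submissions"]).max_by { |s| s["date_submitted"] }
  !latest.nil? && latest["mistakes"].zero?
end

# The two sample reports from the question.
RETURNED = { "submissions" => [
  { "date_submitted" => Time.utc(2014, 1, 28, 13), "mistakes" => 11 },
  { "date_submitted" => Time.utc(2014, 3, 8, 13),  "mistakes" => 0 }
] }.freeze

NOT_RETURNED = { "submissions" => [
  { "date_submitted" => Time.utc(2014, 1, 28, 13), "mistakes" => 0 },
  { "date_submitted" => Time.utc(2014, 3, 8, 13),  "mistakes" => 11 }
] }.freeze

puts latest_submission_clean?(RETURNED)     # true
puts latest_submission_clean?(NOT_RETURNED) # false
```

Assuming the driver accepts an array of hashes, the pipeline constant would be passed as Report.collection.aggregate(LATEST_CLEAN_PIPELINE).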
Related
I am trying to get a unique document count in an index, based on an id property, via the Elasticsearch web API. The thing is that I have millions of entries. How can I scroll on an aggregation?
this is the url:
http://my.servers.ip:9200/index_name/doc_type/_search?scroll=1m
And this is the body:
{
"_source": "false",
"aggs" : {
"Ids" : {
"terms" : {
"field" : "somePropertyIWantToGoupBy",
"size" : 100
},
"aggs": {
"unique": {
"cardinality": {
"field": "someCategoryIWantUniqueCount"
}
}
}
}
},"size":0
}
I get the scroll ID, but on the next call with that scroll ID I expected to get the next 100 aggregation buckets; instead I get an empty result set.
Is it possible to scroll on aggregations?
What am I doing wrong?
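There's no way to paginate a terms aggregation; the scroll API only pages through hits, not aggregation buckets.
You should use a composite aggregation instead, but at the time of writing it is a beta aggregation and might be removed or changed in the future...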
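As a rough sketch of what paging with a composite aggregation could look like, the Ruby helper below builds the request body for one page. The helper name and its parameters are made up for illustration; the field names are the ones from the question. It assumes an Elasticsearch version that supports the composite aggregation, where each response carries an after_key for the aggregation that is sent back as after to fetch the next page.

```ruby
require "json"

# Builds the search body for one page of buckets.
# after_key: the "after_key" hash returned with the previous page,
#            or nil when requesting the first page.
def composite_page_body(after_key = nil, page_size: 100)
  composite = {
    size: page_size,
    sources: [
      { id: { terms: { field: "somePropertyIWantToGoupBy" } } }
    ]
  }
  composite[:after] = after_key if after_key

  {
    size: 0, # no hits, only aggregation buckets
    aggs: {
      Ids: {
        composite: composite,
        aggs: {
          unique: { cardinality: { field: "someCategoryIWantUniqueCount" } }
        }
      }
    }
  }
end

# First page, then a follow-up page using a made-up after_key value.
puts JSON.pretty_generate(composite_page_body)
puts JSON.pretty_generate(composite_page_body({ id: "some-last-seen-value" }))
```

The caller would repeat the request, feeding each response's after_key back in, until a page comes back with no buckets.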
I have elasticsearch set up for searching across a products catalog's variants. Basically where:
Product has_many variants
Variant belongs_to product
And the variant index json / mapping contains the product name.
I am trying to search variants, grouped by product id, bucket size of 1. I am able to do it and sort by min price, max price, etc.
This works:
POST /variants/_search?size=0
{
"aggs" : {
"min_price" : { "min" : { "field" : "price" } }
}
}
This is (sort of) what I need next:
POST /variants/_search?size=0
{
"aggs" : {
"product_name" : { "sort by product_name asc / desc" }
}
}
My last task is to sort them alphabetically, but I don't seem to be able to sort by a keyword field (asc/desc) using an aggregation.
In ES 6.0, you could do this. Note that size limits how many buckets are returned, and the more you request, the more expensive the query will be to execute. So if you really need many thousands, you will probably want to try a different approach, such as creating a separate rolled-up index for products that you can search and sort directly instead of going through aggregations.
GET /variants/_search
{
"size": 0,
"aggs" : {
"product_name" : {
"terms" : {
"field" : "product_name",
"size": 1000,
"order" : { "_key" : "asc" }
}
}
}
}
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order
I would like to filter an array inside a filter. First, I have a big array of Staff objects (self.bookingSettings.staffs). Inside this array I have multiple objects like this:
"staffs": [
{
"id": 1,
"name": "Brian",
"services": [
{
"id": 1
},
{
"id": 2
},
{
"id": 3
},
{
"id": 4
}
],
"pos": 1
},...
I would like to filter this array to keep only the staff whose services include id = 3.
I managed to do this for the case where the first service has id 3, with this code:
self.bookingSettings.staffs.filter({ $0.services.first?.id == self.bookingService.id })
but that takes only the first item.
I think I have to filter inside my filter function, something like this, to loop over all the objects inside services:
self.bookingSettings.staffs.filter({ $0.services.filter({ $0.id == self.bookingService.id }) })
but I get the following error: Cannot convert value of type [BookingService] to closure result type Bool.
Is this a good idea? How can I achieve this?
You could use filter, which would look something like this:
self.bookingSettings.staffs.filter {
!$0.services.filter{ $0.id == self.bookingService.id }.isEmpty
}
This code constructs an entire array of filtered results, only to check whether it's empty and immediately discard it. Since filter returns all items that match the predicate, it won't stop after it finds a match (which is really what you're looking for). So even if the first element out of a list of a million elements matches, it'll still go on to check 999,999 more elements. If the other 999,999 elements also match, they will all be copied into filter's result. That's wasteful, and can use far more CPU and RAM than necessary in this case.
You just need contains(where:):
self.bookingSettings.staffs.filter {
$0.services.contains(where: { $0.id == self.bookingService.id })
}
contains(where:) short-circuits, meaning that it won't keep checking elements after a match is found. It stops and returns true as soon as it finds a match. It also doesn't bother copying matching elements into a new list.
I am new to Elasticsearch; I just set it up and tried the default search. I am using the elasticsearch-rails gem. I need to write a custom query with prioritized search (some fields in the table are more important than others, e.g. title, or updated_at within the last 6 months). I tried to find an explanation or tutorial on how to do this, but nothing I found was understandable. Can anyone help me with this? The sooner the better.
I've never used the Ruby/Elasticsearch integration, but it doesn't seem too hard... The docs here show that you'd want to do something like this:
client.search index: 'my-index', body: { query: { match: { title: 'test' } } }
That performs a basic search.
The ES documentation here shows how to do a field boosted query:
{
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^3", "message" ]
}
}
Putting it all together, you'd do something like this:
client.search index: 'my-index', body: { query: { multi_match: {
  query: "this is a test",
  fields: [ "subject^3", "message" ]
} } }
That will allow you to search and boost on fields. In the above case, a match in the subject field counts three times as much toward the score as a match in the message field.
There is a very good blog post about how to do advanced scoring. Part of it shows an example of adjusting the score based on a date:
...
"filter": {
"exists": {
"field": "date"
}
},
"script": "(0.08 / ((3.16*pow(10,-11)) * abs(now - doc['date'].date.getMillis()) + 0.05)) + 1.0"
...
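To get a feel for what that script expression does, here is a small Ruby sketch that evaluates the same formula for a few document ages. The method name and the sample ages are made up for illustration; only the arithmetic comes from the snippet above, with the age in milliseconds standing in for abs(now - doc['date'].date.getMillis()). A document dated right now gets a factor of about 2.6, and the factor decays toward 1.0 as the document gets older.

```ruby
MS_PER_DAY = 24 * 60 * 60 * 1000

# Same arithmetic as the script above; age_ms is the distance between
# "now" and the document's date, in milliseconds.
def recency_boost(age_ms)
  (0.08 / (3.16e-11 * age_ms.abs + 0.05)) + 1.0
end

# Print the factor for a few hypothetical ages.
[0, 30, 180, 365].each do |days|
  puts format("%4d days old -> factor %.3f", days, recency_boost(days * MS_PER_DAY))
end
```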
I have done this in PHP but have never used the Ruby on Rails gem. You can set the priority of fields using the caret (^) notation.
Example: suppose the table has the fields name, email, message and address, and priority should be given to name and message. Then you can write the query as below:
{
"multi_match" : {
"query" : "this is a test",
"fields" : [ "name^3", "message^2".... ]
}
}
Here name is boosted by a factor of 3 and message by a factor of 2 relative to the unboosted fields.
I have a Record model with many dynamic attributes. I want to query the model and send the response as JSON to the client, but I want to exclude fields like _id and all foreign keys in this model.
I found an interesting answer on how to exclude the values of some keys: How do I exclude fields from an embedded document in Mongoid?, but the keys still exist in the response.
I got:
{
"_id": 1,
"name": "tom"
}
And the without method makes:
{
"_id": nil,
"name": "tom"
}
But I want:
{
"name": "tom"
}
Is it possible to remove or exclude some keys and the values from the result?
You don't want to remove fields from the Mongoid document; what you want to do is remove fields from the generated JSON.
In your controller, do
render :json => @model.to_json(:except => :_id)
Documentation for the to_json method http://apidock.com/rails/ActiveRecord/Serialization/to_json
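Since the question also asks about dropping all foreign keys, here is a minimal plain-Ruby sketch of filtering an attributes hash before serializing it. It assumes, purely for illustration, that foreign keys are exactly the keys ending in _id; the method name, the owner_id key, and the sample record are hypothetical.

```ruby
require "json"

# Drops "_id" and every key ending in "_id" (assumed to be foreign keys).
def public_attributes(attributes)
  attributes.reject { |key, _value| key.to_s.end_with?("_id") }
end

# Hypothetical attributes hash, shaped like the one in the question.
RECORD = { "_id" => 1, "owner_id" => 42, "name" => "tom" }.freeze

puts JSON.generate(public_attributes(RECORD)) # {"name":"tom"}
```

With dynamic attributes the key list isn't known up front, so filtering by naming convention may be easier than listing every key in :except, at the cost of also dropping any ordinary field whose name happens to end in _id.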
Taken from the MongoDB documentation at: http://docs.mongodb.org/manual/reference/method/db.collection.find/
Exclude Certain Fields from the Result Set
The following example selects documents that match a selection criteria and excludes a set of fields from the resulting documents:
db.products.find( { qty: { $gt: 25 } }, { _id: 0, qty: 0 } )
The query returns all the documents from the collection products where qty is greater than 25. The documents in the result set will contain all fields except the _id and qty fields, as in the following:
{ "item" : "pencil", "type" : "no.2" }
{ "item" : "bottle", "type" : "blue" }
{ "item" : "paper" }
I suppose Mongoid is setting the _id attribute to nil because Mongoid models have a defined set of attributes (even if they are dynamic, _id, _type, etc. are defined). Maybe you can try it with the MongoDB driver directly.
But I think RedXVII's answer is the more practical way to go.