How to filter documents by computed embedded field - ruby-on-rails

I have the following schema
class User
include Mongoid::Timestamps::CamelCaseCreated
include Mongoid::Timestamps::CamelCaseUpdated
embeds_many :intervals
field :name, type: String
# ...
end
class Interval
embedded_in :user
field :startTime, type: DateTime
field :endTime, type: DateTime
# ...
end
I need to get users where at least one interval has a duration greater than or equal to 4 hours.
So for the following data I should receive only Walter White:
users = [
{
name: 'Walter White',
intervals: [
{ startTime: '2021-01-27T08:00:00Z', endTime: '2021-01-27T16:00:00Z' }
]
},
{
name: 'Jesse Pinkman',
intervals: [
{ startTime: '2021-01-27T08:00:00Z', endTime: '2021-01-27T10:00:00Z' },
{ startTime: '2021-01-27T10:00:00Z', endTime: '2021-01-27T13:00:00Z' }
]
},
{
name: 'Saul Goodman',
intervals: []
}
]
I know that on a document itself I could filter by computed field like this
User.collection.aggregate([
{ '$project' => { 'durationInMilliseconds' => { '$subtract' => ['$updatedAt', '$createdAt'] } } },
{ '$match' => { 'durationInMilliseconds' => { '$gte' => four_hours } } }
])
# or
User.where('$expr' => { '$gte' => [{ '$subtract' => ['$updatedAt', '$createdAt'] }, four_hours] })
I was hoping that those would work with $elemMatch, but I receive the following errors
User.where(
intervals: {
'$elemMatch' => [
{ '$project' => { 'durationInMilliseconds' => { '$subtract' => ['$endTime', '$startTime'] } } },
{ '$match' => { 'durationInMilliseconds' => { '$gt' => four_hours } } }
]
}
)
=> $elemMatch needs an Object
User.where(
intervals: {
'$elemMatch' => {
'$expr' => { '$max' => some_logic_here }
}
}
)
=> $expr can only be applied to the top-level document
Is there a way to make it work with the embedded collection?
versions that I'm using:
gem mongo (2.10.2)
gem mongoid (6.1.1)
mongo --version - 4.2.9
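One possible direction, offered as an untested sketch rather than a confirmed fix: since $expr is only allowed at the top level, the per-interval comparison can be moved inside it by mapping over the embedded array with $map and reducing the result with $anyElementTrue. The helper name long_interval_filter and the constant FOUR_HOURS_MS are made up for illustration. The field names come from the schema above, and whether Mongoid 6.1.1 passes the hash through unchanged is an assumption.

```ruby
# Four hours in milliseconds; subtracting two dates in MongoDB yields milliseconds.
FOUR_HOURS_MS = 4 * 60 * 60 * 1000

# Hypothetical helper: builds a top-level $expr filter that is true when at
# least one embedded interval lasts min_ms milliseconds or longer.
def long_interval_filter(min_ms)
  duration = { '$subtract' => ['$$interval.endTime', '$$interval.startTime'] }
  {
    '$expr' => {
      '$anyElementTrue' => [
        {
          '$map' => {
            # Treat a missing intervals field as an empty array.
            'input' => { '$ifNull' => ['$intervals', []] },
            'as' => 'interval',
            'in' => { '$gte' => [duration, min_ms] }
          }
        }
      ]
    }
  }
end

filter = long_interval_filter(FOUR_HOURS_MS)
# Assumed usage, mirroring the $expr call earlier in the question:
#   User.where(filter)
```

With the sample data, only the 08:00 to 16:00 interval reaches four hours, so the intent is that only Walter White matches.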

Related

logstash change type format

I have a Rails application whose admin dashboard lets an admin see the location of each employee. I use the ELK stack to collect employee records containing latitude and longitude, which are sent to my map as the employee moves. Logstash creates a daily index based on a template, but I recently found that the fields I gave an explicit type are mapped as text when the index is created.
This is the JSON that Logstash reads:
{"driver_id": 31,"driver_email": "ankith.ravindran#mailinator.com","location": {"latitude": "-35.2824767","longitude": "149.1326453"},"created_at": "2021-06-29 14:28:47", "required_matches": 1, "type": "location"}
this is my logstash.conf file:
input {
file {
path => ["/usr/share/logstash/MPD_LOCATION/*",
"/usr/share/logstash/MPD_LOCATION/*/*",
"/usr/share/logstash/MPD_LOCATION/*/*/*",
"/usr/share/logstash/MPD_LOCATION/*/*/*/*",
"/usr/share/logstash/MPD_LOCATION/*/*/*/*/*"]
start_position => "beginning"
type => "json"
sincedb_path => "/dev/null"
}
}
filter {
mutate {
gsub => ["message","/}+({)/", "}::{"]
}
mutate {
gsub => ["message","/}+( )/", "}::"]
}
split {
field => "message"
terminator => "::"
}
json { source => "message" }
mutate {
add_field => { "uuid" => "D%{driver_id}T%{created_at}" }
rename => {
"[location][latitude]" => "[location][lat]"
"[location][longitude]" => "[location][lon]"
}
convert => {
"[location][lat]" => "float"
"[location][lon]" => "float"
}
}
}
output {
if ([type] == "location") {
elasticsearch {
hosts => "http://elasticsearch:9200"
index => "live_locations_%{+YYYY_MM_dd}"
# manage_template => true
template => "/usr/share/logstash/Template/live_locations.json"
template_name => "live_locations"
# template_overwrite => true
document_id => "%{uuid}"
}
} else if ([type] == "app_info") {
elasticsearch {
hosts => "http://elasticsearch:9200"
index => "app_info_%{+YYYY_MM_dd}"
document_id => "%{uuid}"
}
}
stdout { codec => rubydebug }
}
this is my template file:
{
"settings": {
"index": {
"number_of_shards": 5,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"driver_id": { "type": "integer" },
"email": { "type": "text" },
"location": { "type": "geo_point" },
"app-platform": { "type": "text" },
"app-version": { "type": "text" },
"created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
"required_matches": { "type": "integer" }
}
}
}
For example, I defined created_at as type date, but when the index is created this field comes back as text, and I can't understand why. Likewise, the location field comes back as float, so I can't use it as a geo_point. I should add that I use ELK version 7.13, running on Docker.
Update: I have two types of JSON. One contains only the location of the employee; the other contains only the app_version and app_platform that the employee used.
Update 2: I changed my input from Logstash to Filebeat, but I still have the same problem.

Failed to parse date field [0] with format [MMM, YY] with elastic search 5.0

I am trying to get the date formatted as a string with an abbreviated month and a two-digit year, like "JAN, 92". My query is as below:
{
"size" => 0,
"query" => {
"bool" => {
"must" => [
{
"term" => {
"checkin_progress_for" => {
"value" => "Goal"
}
}
},
{
"term" => {
"goal_owner_id" => {
"value" => "#{current_user.access_key}"
}
}
}
]
}
},
"aggregations" => {
"chekins_over_time" => {
"range" => {
"field" => "checkin_at",
"format" => "MMM, YY",
"ranges" => [
{
"from" => "now-6M",
"to" => "now"
}
]
},
"aggs" => {
"checkins_monthly" => {
"date_histogram" => {
"field" => "checkin_at",
"format" => "MMM, YY",
"interval" => "month",
"min_doc_count" => 0,
"missing" => 0,
"extended_bounds" => {
"min" => "now-6M",
"max" => "now"
}
}
}
}
}
}
}
It throws the following error:
elasticsearch.transport.RemoteTransportException: [captia-america][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: elasticsearch.ElasticsearchParseException: failed to parse date field [0] with format [MMM, YY]
If I remove the "MMM, YY" format and use a normal date format, it works.
What could the solution be? Help appreciated.
Your checkins_monthly aggregation is slightly wrong. The missing value is the date to use when the field is absent, so it must be written in the same format as the aggregation. A 0 is not a date in that format.
For example:
"aggs": {
"checkins_monthly": {
"date_histogram": {
"field": "checkin_at",
"format": "MMM, YY",
"interval": "month",
"min_doc_count": 0,
"missing": "Jan, 17",
"extended_bounds": {
"min": "now-6M",
"max": "now"
}
}
}
}
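If the placeholder for missing is generated in Ruby instead of being hard-coded, it needs the same shape as the "MMM, YY" format. A minimal sketch, assuming Ruby's strftime directives %b (abbreviated month name) and %y (two-digit year) line up with that pattern. The variable names and the fallback date are illustrative only.

```ruby
# Fallback date for documents that have no checkin_at value (arbitrary choice).
fallback = Time.utc(2017, 1, 1)

# %b => abbreviated month name, %y => two-digit year
missing_value = fallback.strftime('%b, %y')

# The date_histogram part of the aggregation, with the formatted placeholder.
date_histogram = {
  'field' => 'checkin_at',
  'format' => 'MMM, YY',
  'interval' => 'month',
  'min_doc_count' => 0,
  'missing' => missing_value
}

puts missing_value # prints "Jan, 17"
```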

MongoDB count by date AND to date

I have a User table with a "created" date field. Now I want to know how many new users were created on each day of the last week, as well as the total number of users at the end of each day (since the beginning of time).
For example:
[
{
day: 27,
new_users: 5,
total_users: 100
}, {
day: 28,
new_users: 7,
total_users: 107
}, {
day: 29,
new_users: 2,
total_users: 109
}
]
I already got the new_users part by using simple grouping and summing. Code below is mongoid/Ruby.
results = User.collection.aggregate(
[{
"$match" => {
created: { "$gte" => 1.week.ago.beginning_of_day, "$lte" => Time.now }
}
},
{
"$group" => {
_id: {
year_joined: { "$year" => "$created" },
month_joined: { "$month" => "$created" },
day_joined: { "$dayOfMonth" => "$created" }
},
count: { "$sum" => 1 }
}
},
{
"$sort" => {"_id.year_joined" => 1, "_id.month_joined" => 1, "_id.day_joined" => 1}
}]
)
How do I also get the total_users in the results?
You're getting close with your attempt. One thing is that since you're starting out by filtering down to only the last week, you don't need to care about the year or month.
Note: I don't know ruby, so I wrote it as I would in the mongo shell.
{
"$match" => {
created: { "$gte" => 1.week.ago.beginning_of_day, "$lte" => Time.now }
}
},
{
"$group" => {
# _id already holds the day of the month; other $group fields must use accumulators
_id: { "$dayOfMonth": "$created" },
newUsers: { "$sum": 1 }
}
},
{
"$sort" => { "_id" => 1 }
}
That will get you the day and newUsers, but it won't get you the total. I don't actually think that is possible in a single query. One alternative might be to give each user a unique sequential number and just take the max of the numbers for that day.
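Another way to fill in total_users is to count the users created before the window once and then add up the daily counts in Ruby. This is a sketch under assumptions: with_running_totals is a hypothetical helper, the rows are assumed to be sorted by day already, and the starting count of 95 is invented so that the output matches the example in the question.

```ruby
# Hypothetical helper: turns per-day counts into rows that also carry a
# cumulative total, starting from the number of users created before the window.
def with_running_totals(daily_counts, users_before_window)
  total = users_before_window
  daily_counts.map do |row|
    total += row[:new_users]
    { day: row[:day], new_users: row[:new_users], total_users: total }
  end
end

# Rows shaped like the aggregation output, already sorted by day.
daily_counts = [
  { day: 27, new_users: 5 },
  { day: 28, new_users: 7 },
  { day: 29, new_users: 2 }
]

# In the application this number might come from a count query such as
# User.where(:created.lt => window_start).count (assumed, not tested here).
users_before_window = 95

report = with_running_totals(daily_counts, users_before_window)
report.each { |row| puts row.inspect }
```

This costs one extra count query on top of the aggregation, which keeps the pipeline itself unchanged.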

Exclude nil values from ElasticSearch Aggregation

I was using this query to retrieve the most significant values:
keywords = Answer.search(
:size => 5,
:query => {
:match => {
:question_id => 32481
}
},
:aggregations => {
:keywords => {
:significant_terms => {
:field => 'text'
}
}
}
)
The field is :text, but it has nil values, so the answer is always:
2.1.2 :135 > keywords.map(&:text)
=> [nil, nil, nil, nil, nil]
I tried to add a filter, as the documentation suggests, but it gives me a parse error:
keywords = Answer.search(
:size => 5,
:query => {
:match => {
:question_id => 32481
},
:filtered => {
:filter => {
:exists => { :field => 'text' }
}
}
},
:aggregations => {
:keywords => {
:significant_terms => {
:field => 'text'
}
}
}
)
I've tried many combinations, with no success. How can I get only the valid text answers?
I believe your ES query should translate to something like this:
"size": 5,
"query": {
"filtered": {
"query": { "match": { "question_id" : 32481 } },
"filter": {
"exists": {
"field": "text"
}
}
}
},
"aggs": {
"keywords": {
"significant_terms": {
"field": "text"
}
}
}
meaning your "question_id" "match" should be enclosed in the "filtered" element.
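Since the question builds the request in Ruby, the same structure can be written as a hash. This is a sketch only: search_body is a made-up variable name, and the filtered query is kept because the answer uses it, although it belongs to older Elasticsearch releases (it was removed in 5.0 in favour of a bool query with a filter clause).

```ruby
# The match query and the exists filter both sit inside :filtered,
# instead of being siblings directly under :query.
search_body = {
  :size => 5,
  :query => {
    :filtered => {
      :query => { :match => { :question_id => 32481 } },
      :filter => { :exists => { :field => 'text' } }
    }
  },
  :aggregations => {
    :keywords => { :significant_terms => { :field => 'text' } }
  }
}

# Assumed usage, as in the question:
#   keywords = Answer.search(search_body)
```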

mongo-ruby-driver will not create a new document on upsert when there is a custom _id

I want to upsert a document with the mongo-ruby-driver using something like the following-
id = "#{params[:id]}:#{Time.now.strftime("%y%m%d")}"
# db.collection('metrics').insert({'_id' => id})
db.collection('metrics').update(
{ '_id' => id },
{ '$inc' => { "hits" => 1 } },
{ 'upsert' => true }
)
Right now this will only update existing documents, and not create one if it doesn't already exist. The only way it will perform both actions is if I uncomment the insert() command above it.
If I use the mongo console and try and do this upsert directly (without the need for the insert() ) it works as expected.
You should use a symbol instead of a string as the key in the options hash. This code works:
db.collection('metrics').update(
{ '_id' => id },
{ '$inc' => { "hits" => 1 } },
{ :upsert => true }
)
In fact, you can use symbols most everywhere. This also works:
db.collection(:metrics).update(
{ :_id => id },
{ :$inc => { :hits => 1 } },
{ :upsert => true }
)
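The likely reason the string key is ignored is plain Ruby: 'upsert' and :upsert are different hash keys, so code that looks up :upsert never sees the string version. The method below is a stand-in written for illustration, not the driver's actual source; that the driver reads its options by symbol key is an assumption based on the behaviour described above.

```ruby
# Hypothetical stand-in for a method that reads its options by symbol key.
def upsert_requested?(opts = {})
  opts[:upsert] == true
end

string_key = upsert_requested?({ 'upsert' => true })
symbol_key = upsert_requested?({ :upsert => true })

puts string_key # prints false
puts symbol_key # prints true
```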
