I have a User table with a "created" date field. Now I want to know how many new users were created on each day of the last week, as well as the total number of users at the end of each day (since the beginning of time).
For example:
[
{
day: 27,
new_users: 5,
total_users: 100
}, {
day: 28,
new_users: 7,
total_users: 107
}, {
day: 29,
new_users: 2,
total_users: 109
}
]
I already got the new_users part by using simple grouping and summing. Code below is mongoid/Ruby.
results = User.collection.aggregate(
[{
"$match" => {
created: { "$gte" => 1.week.ago.beginning_of_day, "$lte" => Time.now }
}
},
{
"$group" => {
_id: {
year_joined: { "$year" => "$created" },
month_joined: { "$month" => "$created" },
day_joined: { "$dayOfMonth" => "$created" }
},
count: { "$sum" => 1 }
}
},
{
"$sort" => {"_id.year_joined" => 1, "_id.month_joined" => 1, "_id.day_joined" => 1}
}]
)
How do I also get the total_users in the results?
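You're getting close with your attempt. One thing is that since you start by filtering down to only the last week, you don't need to care about the year or month, unless the week spans a month boundary, in which case sorting by day of month alone would misorder the days.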
Note: I don't know Ruby, so I wrote it as I would in the mongo shell.
{
$match: {
// weekAgo is assumed to be a Date for the start of the day one week ago
created: { $gte: weekAgo, $lte: new Date() }
}
},
{
$group: {
_id: { $dayOfMonth: "$created" },
newUsers: { $sum: 1 }
}
},
{
// expose the grouping key as "day"
$project: { _id: 0, day: "$_id", newUsers: 1 }
},
{
$sort: { day: 1 }
}
That will get you the day and newUsers, but it won't get you the total. I don't actually think that is possible in a single query. One alternative might be to give each user a unique sequential number and just take the max of the numbers for that day.
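Another option is to compute the running total on the client once the per-day counts are back. The following is only a sketch under stated assumptions: `with_running_totals` is a hypothetical helper, the rows are assumed to be sorted by day already, and `base_total` is assumed to come from a separate count of users created before the window (for example a Mongoid count with a "created before the window start" condition).

```ruby
# Hypothetical helper: adds a running total to per-day counts.
# `daily` must already be sorted by day; `base_total` is the number of
# users created before the first day in the window (fetched separately).
def with_running_totals(daily, base_total)
  total = base_total
  daily.map do |row|
    total += row[:new_users]
    row.merge(total_users: total)
  end
end

# Sample rows shaped like the aggregation output (hypothetical data).
daily = [
  { day: 27, new_users: 5 },
  { day: 28, new_users: 7 },
  { day: 29, new_users: 2 }
]

result = with_running_totals(daily, 95)
result.each { |row| puts row.inspect }
```

This costs one extra count query in addition to the aggregation, but keeps the pipeline itself unchanged.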
I have the following schema
class User
include Mongoid::Timestamps::CamelCaseCreated
include Mongoid::Timestamps::CamelCaseUpdated
embeds_many :intervals
field :name, type: String
# ...
end
class Interval
embedded_in :user
field :startTime, type: DateTime
field :endTime, type: DateTime
# ...
end
I need to get users where at least one interval has duration greater or equal to 4 hours.
So for the following data I should receive only Walter White
users = [
{
name: 'Walter White',
intervals: [
{ startTime: '2021-01-27T08:00:00Z', endTime: '2021-01-27T16:00:00Z' }
]
},
{
name: 'Jesse Pinkman',
intervals: [
{ startTime: '2021-01-27T08:00:00Z', endTime: '2021-01-27T10:00:00Z' },
{ startTime: '2021-01-27T10:00:00Z', endTime: '2021-01-27T13:00:00Z' }
]
},
{
name: 'Saul Goodman',
intervals: []
}
]
I know that on a document itself I could filter by computed field like this
User.collection.aggregate([
{ '$project' => { 'durationInMilliseconds' => { '$subtract' => ['$updatedAt', '$createdAt'] } } },
{ '$match' => { 'durationInMilliseconds' => { '$gte' => four_hours } } }
])
# or
User.where('$expr' => { '$gte' => [{ '$subtract' => ['$updatedAt', '$createdAt'] }, four_hours] })
I was hoping that those would work with $elemMatch, but I receive the following errors
User.where(
intervals: {
'$elemMatch' => [
{ '$project' => { 'durationInMilliseconds' => { '$subtract' => ['$endTime', '$startTime'] } } },
{ '$match' => { 'durationInMilliseconds' => { '$gt' => four_hours } } }
]
}
)
=> $elemMatch needs an Object
User.where(
intervals: {
'$elemMatch' => {
'$expr' => { '$max' => some_logic_here }
}
}
)
=> $expr can only be applied to the top-level document
Is there a way to make it work with the embedded collection?
versions that I'm using:
gem mongo (2.10.2)
gem mongoid (6.1.1)
mongo --version - 4.2.9
I am not sure whether this is a valid question. I have started working with MongoDB aggregation. I have to build a graph of the data on a daily, weekly, or monthly basis.
I am using "$dayOfMonth", "$week", or "$month" to group, depending on the dates provided. For example, if the difference between the from and to dates is less than or equal to 6 days, I group on a daily basis using "$dayOfMonth".
If the difference is greater than 6 and less than 30, I group by "$week", and if the difference is greater than 30, I group on a monthly basis by "$month".
I am passing the dates in my "$match". Is it possible to push 0 as the value for keys whose group is not present?
Example: from_date = "01/01/2018", to_date = "30/6/2018".
So grouping will be done by month. Suppose I don't have data for the 3rd, 4th, and 5th months; I want to push 0 as the value for those keys.
output = [
{"_id": "01/01/2018", "counter":12},
{"_id": "01/02/2018", "counter": 15},
{"_id": "01/06/2018", "counter": 10}
]
expected_output =
[
{"_id": "01/01/2018", "counter":12},
{"_id": "01/02/2018", "counter": 15},
{"_id": "01/03/2018", "counter": 0},
{"_id": "01/04/2018", "counter": 0},
{"_id": "01/05/2018", "counter": 0},
{"_id": "01/06/2018", "counter": 10}
]
I am using Rails and Mongoid Gem.
Query That I am using
converted = Analytics::Conversion::PharmacyPrescription.collection.aggregate([
{ "$match" => {
"organisation_id" => org_id.to_s,
"date" => {
"$gte" => from_date,
"$lte" => to_date
},
"role_ids" => {"$in" => [role_id, "$role_ids"]}
}
},{
"$project" => {
"total_count" => 1,
"converted_count" => 1,
"not_converted_count" => 1,
"total_invoice_amount" => 1,
"user_id" => 1,
"facility_id" => 1,
"organisation_id" => 1,
"date" => 1,
}
},{
"$group" => {
"_id" => { "#{groupby}" => "$date" },
"total_count" => {"$sum" => "$total_count"},
"converted_count" => { "$sum" => "$converted_count" },
"not_converted_count" => { "$sum" => "$not_converted_count"},
}
}
]).to_a
The Aggregation Framework can only aggregate the documents you have. You are actually asking it to add groups for documents that do not exist, but it has no way to "know" which groups to add.
What I would do is run the query as you have it, and afterwards "spread" the date units according to the chosen granularity (in your example that would be 01/01/2018, 01/02/2018, 01/03/2018, 01/04/2018, 01/05/2018, 01/06/2018), and then run a simple function that adds an entry for each missing unit.
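The "spread and fill" step described above can be sketched in Ruby for the monthly case. This is only an illustration under assumptions: `fill_missing_months` is a hypothetical helper, the rows are assumed to use "_id" strings in dd/mm/yyyy form for the first day of each month (as in the question's example output), and "counter" is the only value being filled.

```ruby
require 'date'

# Hypothetical helper: returns one row per month between `from` and `to`,
# using counter 0 for months the aggregation returned no group for.
def fill_missing_months(rows, from, to)
  counts = rows.map { |r| [r['_id'], r['counter']] }.to_h
  filled = []
  current = Date.new(from.year, from.month, 1)
  while current <= to
    key = current.strftime('%d/%m/%Y')
    filled << { '_id' => key, 'counter' => counts.fetch(key, 0) }
    current = current >> 1 # advance by one month
  end
  filled
end

# Rows shaped like the question's sample output.
rows = [
  { '_id' => '01/01/2018', 'counter' => 12 },
  { '_id' => '01/02/2018', 'counter' => 15 },
  { '_id' => '01/06/2018', 'counter' => 10 }
]

filled = fill_missing_months(rows, Date.new(2018, 1, 1), Date.new(2018, 6, 30))
filled.each { |row| puts row.inspect }
```

The same idea carries over to daily or weekly granularity by changing the step and the key format.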
I am trying to get the date formatted as a string with an abbreviated month and a two-digit year, like "JAN, 92". My query is as below:
{
"size" => 0,
"query" => {
"bool" => {
"must" => [
{
"term" => {
"checkin_progress_for" => {
"value" => "Goal"
}
}
},
{
"term" => {
"goal_owner_id" => {
"value" => "#{current_user.access_key}"
}
}
}
]
}
},
"aggregations" => {
"chekins_over_time" => {
"range" => {
"field" => "checkin_at",
"format" => "MMM, YY",
"ranges" => [
{
"from" => "now-6M",
"to" => "now"
}
]
},
"aggs" => {
"checkins_monthly" => {
"date_histogram" => {
"field" => "checkin_at",
"format" => "MMM, YY",
"interval" => "month",
"min_doc_count" => 0,
"missing" => 0,
"extended_bounds" => {
"min" => "now-6M",
"max" => "now"
}
}
}
}
}
}
}
It throws the following error:
elasticsearch.transport.RemoteTransportException: [captia-america][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: elasticsearch.ElasticsearchParseException: failed to parse date field [0] with format [MMM, YY]
If I remove the "MMM, YY" format and use the normal date format, it works.
What could the solution be to rectify this? Help appreciated.
Your checkins_monthly aggregation is a bit wrong. The missing value is the date to use when the field is absent from a document, so it has to be written in the same format as the aggregation's format. A 0 is not actually a date in the "MMM, YY" format.
For example:
"aggs": {
"checkins_monthly": {
"date_histogram": {
"field": "checkin_at",
"format": "MMM, YY",
"interval": "month",
"min_doc_count": 0,
"missing": "Jan, 17",
"extended_bounds": {
"min": "now-6M",
"max": "now"
}
}
}
}
I have the query below that I want to convert to MongoDB. How can I do that?
SELECT COUNT(*) AS count_all, DATE(created_at) AS date_created_at FROM
"TABLE" GROUP BY DATE(created_at)
Please also explain it, so that next time I can do it myself.
You can try the following, which uses the legacy group() helper (deprecated since MongoDB 3.4 and removed in 4.2). Note that it groups on the full created_at value rather than on the calendar date.
db.getCollection("TABLE").group({
"key": {
"created_at": true
},
"initial": {
"count_all": 0
},
"reduce": function(obj, prev) {
// COUNT(*): add one for every document in the group
prev.count_all++;
}
});
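In MongoDB you can use the following aggregation pipeline. $group plays the role of GROUP BY, { $sum: 1 } is the equivalent of COUNT(*), and $dateToString truncates the timestamp to a calendar day (in UTC by default), much like DATE(created_at):
db.collection.aggregate( [
{
$group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$created_at" } },
count_all: { $sum: 1 }
}
},
{
$project: {
_id: 0, date_created_at: "$_id", count_all: 1
}
}
])
which can then be converted to Ruby syntax as:
group = { "$group" =>
{
"_id" => { "$dateToString" => { "format" => "%Y-%m-%d", "date" => "$created_at" } },
"count_all" => { "$sum" => 1 }
}
}
project = {"$project" =>
{
"_id" => 0,
"date_created_at" => "$_id",
"count_all" => 1
}
}
Table.collection.aggregate([group, project])
For more examples, refer to the docs.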
I have an Element model that belongs to User. I am trying to calculate the following hash: how many users have element count of 1, 2, 3, etc. The approach I take is to first generate a hash of {user -> num elements}, then I sort-of invert it using a second map-reduce.
Here's what I have so far:
Element.map_reduce(%Q{
emit(this.user_id, 1);
}, %Q{
function(key, values) {
return Array.sum(values);
}
}).out(inline: true).map_reduce(%Q{
if (this.value > 1) {
emit(this.value, this._id);
}
}, %Q{
function(element_count, user_ids) {
return user_ids.length;
}
}).out(inline: true)
This gives me an "undefined method `map_reduce'" error. I couldn't find the answer in the docs. Any help would be great.
I calculated the hash using aggregate instead of map-reduce, first grouping by user, and then grouping again by element count:
Element.collection.aggregate([
{
"$group" => {
"_id" => "$user_id", "elements_count" => {"$sum" => 1}
}
},
{
"$group" => {
"_id" => "$elements_count", "users_count" => {"$sum" => 1}
}
},
{ "$project" => {
"_id" => 0,
"users_count" => '$users_count',
"elements_count" => '$_id',
}
}
])
This returns the following array:
[
{"users_count"=>3, "elements_count"=>2},
{"users_count"=>4, "elements_count"=>3},
...
]
If needed, it can also be sorted using the $sort operator.
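To see what the two $group stages and a trailing $sort compute, the same inversion can be mirrored in plain Ruby on an in-memory list. This is only an illustrative sketch: the `user_ids` sample is hypothetical, and the real work would still be done by the aggregation pipeline on the server.

```ruby
# Hypothetical sample: one entry per element, holding its owner's user_id.
user_ids = [1, 1, 2, 2, 3, 3, 3]

# First grouping: user_id -> number of elements (like the first $group).
elements_per_user = user_ids.group_by { |id| id }
                            .map { |id, ids| [id, ids.size] }
                            .to_h

# Second grouping: elements_count -> number of users (like the second $group).
histogram = elements_per_user.values
                             .group_by { |count| count }
                             .map do |count, counts|
  { 'users_count' => counts.size, 'elements_count' => count }
end

# Equivalent of appending { "$sort" => { "elements_count" => 1 } } to the pipeline.
sorted = histogram.sort_by { |row| row['elements_count'] }
sorted.each { |row| puts row.inspect }
```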