I would like to build a histogram from a time series stored as a time tree in Neo4j.
The data consists of events performed by users, each with a timestamp, for example a user purchase in a category.
What I need is the number of times each user browsed each category between a start and an end time, bucketed by an interval (anywhere from 1 second to several days).
My model fits a graph database very nicely, but after reading the Neo4j documentation I could not find a way to do this in one query, and I'm afraid that running a query per user would be very slow.
I am aware of Cypher's capabilities, but I have no idea how to write such a query.
I am looking for something like this (not working):
MATCH startPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(startLeaf),
endPath=(root)-[:`2011`]->()-[:`01`]->()-[:`03`]->(endLeaf),
valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
vals=(middle)-[:VALUE]->(event)
WHERE root.name = 'Root'
RETURN event.name, count(*)
ORDER BY event.name ASC
GROUP BY event.timestamp % 1000*60*10 // 10 minutes histogram bar
Then I'd like to have a report of how many users browse each site category per interval:
0-9: news 5, commerce 3; 10-19: news 6, commerce 19; 20-29: news 2, commerce 8
Any idea whether this is possible with the Neo4j time tree model?
If so, how? :-)
Does this work?
MATCH
startPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(startLeaf),
endPath=(root)-[:`2011`]->()-[:`01`]->()-[:`03`]->(endLeaf),
valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
vals=(middle)-[:VALUE]->(event)
WHERE root.name = 'Root'
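RETURN event.name, event.timestamp / (1000*60*10) AS slice, count(*)
ORDER BY slice ASC
Basically I just added the slice expression to the RETURN clause so that Neo4j uses it as a grouping criterion: Cypher has no GROUP BY, and every non-aggregated column in RETURN acts as a grouping key. Note that I wrote the slice as an integer division with parentheses, event.timestamp / (1000*60*10), rather than event.timestamp % 1000*60*10. The modulo form is evaluated left to right as (event.timestamp % 1000)*60*10 and does not number the 10-minute bars. This assumes event.timestamp is an integer number of milliseconds.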
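To make the bucketing arithmetic concrete, here is a minimal plain-Ruby sketch (not Cypher, and not tied to Neo4j) of grouping events into 10-minute slices. The sample events and the BUCKET_MS constant are made up for illustration, and timestamps are assumed to be integer milliseconds.

```ruby
# Hypothetical sample events: [category name, timestamp in milliseconds].
BUCKET_MS = 1000 * 60 * 10 # 10 minutes

events = [
  ["news",     0],
  ["news",     599_999],
  ["commerce", 600_000],
  ["news",     1_250_000],
  ["commerce", 1_799_999],
]

# Integer division maps every timestamp in the same 10-minute window
# to the same bucket index.
histogram = events
  .group_by { |name, ts| [ts / BUCKET_MS, name] }
  .transform_values(&:size)
  .sort
  .to_h

histogram.each do |(slice, name), count|
  puts "slice #{slice}: #{name} #{count}"
end

# Without parentheses, % and * share one precedence level and are
# evaluated left to right, which gives a different number entirely.
ts = 1_250_000
modulo_form = ts % 1000 * 60 * 10 # parsed as (ts % 1000) * 60 * 10
bucket_index = ts / BUCKET_MS
```

The same idea carries over to any interval: only the bucket width changes.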
Related
I have Users and Checkpoints tables; each user can make multiple checkpoints per day.
I want to aggregate how many checkpoints were taken each day over the past 6 months, counted from each user's own starting point, in order to create a graph showing the average number of checkpoints per user within their first x months.
For example:
If user1 started on January 1st, user2 started on March 15th, and user3 started on July 6th, each of those dates would be considered day 1. I want the data from each user's day 1 even though those days fall in different periods of time.
This is the query I came up with so far, but unfortunately it returns data based on a fixed time window for all users:
SELECT dates.date AS date,
checkpoints_count,
checkpoints_users
FROM (SELECT DATE_TRUNC('DAY', ('2000-01-01'::DATE - offs))::DATE AS date
-- 180 = 6 month in days
FROM GENERATE_SERIES(0, 180) AS offs) AS dates
LEFT OUTER JOIN (
SELECT
checkpoints_date::DATE AS date,
COUNT(id) AS checkpoints_count,
COUNT(user_id) AS checkpoints_users
FROM checkpoints
WHERE user_id in (1, 2, 3)
AND checkpoints_date::DATE
BETWEEN '2000-01-01'::DATE AND '2000-06-01'::DATE
GROUP BY checkpoints_date::DATE
) AS ck
ON dates.date = ck.date
ORDER BY dates.date;
EDIT
Here is a working SQL example (if I understand what you are looking for; your SQL seems really complicated for what you are asking, but I'm not an SQL expert):
SELECT t1.*
FROM checkpoints AS t1
WHERE t1.user_id IN (1, 2, 3)
AND t1.id = (SELECT t2.id
FROM checkpoints AS t2
WHERE t2.user_id = t1.user_id
ORDER BY t2.check_date ASC
LIMIT 1)
SQL FIDDLE here
Since this is tagged Ruby on Rails, I'll give you a Rails answer.
If you know your user IDs (or use some other query to get them), you have:
user_ids = [1, 2, 3]
first_checkpoints = []
user_ids.each do |id|
first_checkpoints << Checkpoint.where(user_id: id).order(:date).first
end
# first_checkpoints is now an array holding the first checkpoint of each user in the list
This assumes a column in the checkpoints table called date. You didn't give a table structure for the two tables so this answer is a bit general. There might be a pure ActiveRecord answer. But this will get you what you want.
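Once each user's first checkpoint date is known, the per-user "day 1" alignment from the question can be sketched in plain Ruby. This is only an in-memory illustration with made-up rows, not an ActiveRecord or SQL solution; the [user_id, date] row layout is an assumption.

```ruby
require "date"

# Hypothetical checkpoint rows: [user_id, date of the checkpoint].
checkpoints = [
  [1, Date.new(2000, 1, 1)],
  [1, Date.new(2000, 1, 1)],
  [1, Date.new(2000, 1, 2)],
  [2, Date.new(2000, 3, 15)],
  [2, Date.new(2000, 3, 17)],
  [3, Date.new(2000, 7, 6)],
]

# First checkpoint date per user: each user's personal "day 1".
first_day = checkpoints
  .group_by { |user_id, _| user_id }
  .transform_values { |rows| rows.map { |_, date| date }.min }

# Count checkpoints per relative day (1 = the user's first day).
counts = Hash.new(0)
checkpoints.each do |user_id, date|
  relative_day = (date - first_day[user_id]).to_i + 1
  counts[relative_day] += 1
end

counts.sort.each { |day, n| puts "day #{day}: #{n} checkpoints" }
```

Dividing each count by the number of users would give the average per user for that relative day.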
JIRA has an excellent ability to search for issues with worklog items created on specific dates and by a specific user. For example:
worklogDate > 2017-04-01 AND worklogDate < 2017-05-01 AND worklogAuthor = some-user
In this search result I can see a Time Spent column, which is the total time spent on a task. How can I aggregate the time from worklogs only for selected days? For example, say we worked on the task in March and April. How do I write JQL that counts only April's time?
Is it possible?
To get the time spent by the user by tasks, if you have access to the database, you can run this query:
select wl.timeworked, wl.worklogbody, wl.updateauthor, wl.updated,
u.display_name, ji.summary,
concat(concat(p.pkey,'-'),ji.issuenum) as IssueKey
from worklog wl
inner join cwd_user u
on wl.updateauthor = u.user_name
inner join jiraissue ji
on ji.id = wl.issueid
inner join project p
on (ji.project = p.id)
where issueid in (
select j.ID
from jiraissue j
inner join project p
on (j.project = p.id)
where u.user_name = 'userid')
Replace userid with the username of the person who submitted the worklog. Note that each JIRA ticket (issue) can have multiple worklog submissions by different users. The query returns every worklog submitted by that user and shows which ticket (issue) each one belongs to. You can add a date constraint to the where clause if you only want a specific timeframe. The timeworked column is recorded by JIRA in seconds.
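As a rough illustration of the aggregation step, the following plain-Ruby sketch sums timeworked per issue for one user within April only. The rows are invented, and the choice of which date column to filter on is an assumption; it simply mirrors what a date constraint in the where clause would do.

```ruby
require "date"

# Hypothetical worklog rows as such a query might return them:
# [issue key, author, date of the worklog, timeworked in seconds].
worklogs = [
  ["PROJ-1", "some-user",  Date.new(2017, 3, 30), 3600],
  ["PROJ-1", "some-user",  Date.new(2017, 4, 3),  7200],
  ["PROJ-1", "some-user",  Date.new(2017, 4, 20), 1800],
  ["PROJ-2", "some-user",  Date.new(2017, 4, 5),  5400],
  ["PROJ-2", "other-user", Date.new(2017, 4, 6),  9000],
]

# End-exclusive range covering every day in April 2017.
april = Date.new(2017, 4, 1)...Date.new(2017, 5, 1)

seconds_by_issue = Hash.new(0)
worklogs.each do |issue, author, date, seconds|
  next unless author == "some-user" && april.cover?(date)
  seconds_by_issue[issue] += seconds
end

seconds_by_issue.each do |issue, seconds|
  puts format("%s: %.2f h", issue, seconds / 3600.0)
end
```

The March worklog on PROJ-1 and the other user's worklog are left out of the totals.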
According to the Time Since Issues Report page of JIRA's documentation, Time Spent is the amount of time spent on the issue, that is, the aggregate amount of time that has been logged against it. Are you looking to aggregate the time of one person across different issues?
I have the following query, which calculates the average number of impressions across all teams for a given name and league:
@all_team_avg = NielsenData
.where('name = ? and league = ?', name, league)
.average('impressions')
.to_i
However, there can be multiple entries for each name/league/team combination. I need to modify the query to only average the most recent records by created_at.
With the help of this answer I came up with a query which gets the result that I need (I would replace the hard-coded WHERE clause with name and league in the application), but it seems excessively complicated and I have no idea how to translate it nicely into ActiveRecord:
SELECT avg(sub.impressions)
FROM (
WITH summary AS (
SELECT n.team,
n.name,
n.league,
n.impressions,
n.created_at,
ROW_NUMBER() OVER(PARTITION BY n.team
ORDER BY n.created_at DESC) AS rowcount
FROM nielsen_data n
WHERE n.name = 'Social Media - Twitter Followers'
AND n.league = 'National Football League'
)
SELECT s.*
FROM summary s
WHERE s.rowcount = 1) sub;
How can I rewrite this query using ActiveRecord or achieve the same result in a simpler way?
When all you have is a hammer, everything looks like a nail.
Sometimes, raw SQL is the best choice. You can do something like:
@all_team_avg = NielsenData.find_by_sql("...your_sql_statement_here...")
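For comparison, the "latest row per team, then average" logic can be sketched in plain Ruby over rows that are already loaded. This is an in-memory illustration with invented data, not an ActiveRecord translation; on a large table the SQL version is likely the better choice.

```ruby
# Hypothetical rows: [team, impressions, created_at],
# assumed to be already filtered by name and league.
rows = [
  ["Bears",   100, Time.utc(2015, 1, 1)],
  ["Bears",   140, Time.utc(2015, 2, 1)],
  ["Packers", 200, Time.utc(2015, 1, 15)],
  ["Packers", 180, Time.utc(2015, 1, 10)],
  ["Lions",    90, Time.utc(2015, 3, 1)],
]

# Keep only the most recent row for each team
# (the equivalent of ROW_NUMBER() ... = 1 per partition).
latest_per_team = rows
  .group_by { |team, _, _| team }
  .transform_values { |team_rows| team_rows.max_by { |_, _, created_at| created_at } }

# Average the impressions of those rows and truncate like .to_i does.
impressions = latest_per_team.values.map { |_, value, _| value }
all_team_avg = (impressions.sum.to_f / impressions.size).to_i

puts all_team_avg
```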
I have written a Rails 4 app that accepts and plots sensor data. Sometimes there are 10 points per hour (but this number is not fixed). I'm plotting the data and doing a simple query of Points.all to get all the data points.
In order to reduce the query size, I would like to only return one record per hour. It doesn't matter which record is returned. The first record each hour using the created_at field would be fine.
How do I construct a query to do this?
You can get the first one, but maybe an average value is better. All you need to do is group by hour. I am not 100% sure about the SQLite syntax, but something along these lines:
connection.execute("SELECT AVG(READING_VALUE) FROM POINTS GROUP BY STRFTIME('%Y%m%d%H0', CREATED_AT)")
Inspired by this answer, here is an alternative that retrieves the latest record in each hour (if you don't want to average):
Point.from(
Point.select("max(unix_timestamp(created_at)) as max_timestamp")
# group by date and hour; HOUR(created_at) alone would merge the same hour of different days
.group("DATE_FORMAT(created_at, '%Y-%m-%d %H')") # subquery
)
.joins("INNER JOIN points ON subquery.max_timestamp = unix_timestamp(created_at)")
This will result in the following query:
SELECT `points`.*
FROM (
SELECT max(unix_timestamp(created_at)) as max_timestamp
FROM `points`
GROUP BY DATE_FORMAT(created_at, '%Y-%m-%d %H')
) subquery
INNER JOIN points ON subquery.max_timestamp = unix_timestamp(created_at)
You can also use MIN instead of MAX to get the first record of each hour.
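The grouping itself can be illustrated in plain Ruby on in-memory data. This sketch uses made-up points and is not a replacement for doing the work in the database. It keys each point on date plus hour, so that the same hour of day on different dates stays in separate groups.

```ruby
# Hypothetical points: [reading value, created_at].
points = [
  [10.0, Time.utc(2014, 5, 1, 9, 5)],
  [12.0, Time.utc(2014, 5, 1, 9, 40)],
  [11.0, Time.utc(2014, 5, 1, 10, 15)],
  [20.0, Time.utc(2014, 5, 2, 9, 30)], # same hour of day, different date
]

# One group per calendar hour.
by_hour = points.group_by { |_, created_at| created_at.strftime("%Y-%m-%d %H") }

# Either keep the first record of each hour ...
first_per_hour = by_hour.transform_values { |rows| rows.min_by { |_, t| t } }

# ... or average the readings within each hour.
avg_per_hour = by_hour.transform_values { |rows| rows.sum { |v, _| v } / rows.size }

avg_per_hour.each { |hour, avg| puts "#{hour}: #{avg}" }
```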
I have a problem to which I can't seem to find a simple solution.
I want to achieve the following:
* I have a list of tasks, each with an owner and a due date
* I want to display a list of all tasks grouped by owner
* I want to sort the owners by due date: the owner with the earliest due date first, followed by the owner with the second earliest, and so on
* I want to be able to paginate the results, preferably with will_paginate
To illustrate, this is the result I am looking for:
Harry
- task 1, due date 1
- task 3, due date 4
Ben
- task 2, due date 2
Carol
- task 4, due date 3
So far, I can query for all owners with tasks, sort them on a virtual attribute holding their "earliest due date", and then display them and their tasks in a view.
There are multiple problems with this approach, in my opinion:
* I run multiple queries from the view (owner.tasks.each etc.). I was always taught that running queries from the view is bad.
* I'm doing the sorting by loading all records into memory (virtual attribute), which could become problematic with large record sets.
* I have no idea how to add pagination that is sensitive to the number of tasks displayed.
I can't seem to crack this nut, so any help or hints would be greatly appreciated!
Erwin
Try this query. You have not provided sample data (ideally as SQL) that we could experiment with ourselves:
SELECT
u.id as owner_id, u.name as owner_name, t.id, t.due_date
FROM users u
INNER JOIN tasks m ON u.id = m.owner_id
INNER JOIN tasks t ON u.id = t.owner_id
GROUP BY u.id, u.name, t.id, t.due_date
ORDER BY MIN(m.due_date), t.due_date
You should get all the data you need in the proper order, and you can paginate simply by applying LIMIT to it (or converting it to AR and submitting it to will_paginate).
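To show the intended ordering on the sample from the question, here is a small plain-Ruby sketch that groups tasks by owner and sorts owners by their earliest due date. It works on in-memory rows and uses integers as stand-in due dates; paginating by owner with each_slice is an illustrative choice, not what will_paginate does.

```ruby
# Hypothetical tasks: [task name, owner, due date as a sortable integer].
tasks = [
  ["task 1", "Harry", 1],
  ["task 2", "Ben",   2],
  ["task 3", "Harry", 4],
  ["task 4", "Carol", 3],
]

# Group by owner, sort each owner's tasks by due date,
# then sort owners by their earliest due date.
grouped = tasks
  .group_by { |_, owner, _| owner }
  .map { |owner, owner_tasks| [owner, owner_tasks.sort_by { |_, _, due| due }] }
  .sort_by { |_, owner_tasks| owner_tasks.first[2] }

# Paginate by owner rather than by task row, here two owners per page.
pages = grouped.each_slice(2).to_a

grouped.each do |owner, owner_tasks|
  puts owner
  owner_tasks.each { |name, _, due| puts "- #{name}, due date #{due}" }
end
```

Note that a plain LIMIT on the SQL result paginates task rows, so an owner's tasks can be split across two pages; slicing by owner avoids that at the cost of uneven page sizes.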