There are two students, a and b. Student a likes the subjects chem, phy, and bio; student b likes phy, math, and bio. They have a workshop every week, and I would like to know the count of each of their liked subjects, based on workshops attended, in descending order.
Currently I am using this query to get the subjects a student likes based on the workshops attended:
MATCH (s:student{id:"1",name:"a"} )-[:workshop_attended]-(b:workshop)-[y:subject_likes]-(c:subjects) RETURN c,count(c) as total
By the way, is the above query correct with respect to the count and the subjects?
Now I would like to know how many common subjects they have liked together by attending the workshops, and also the count of each of those subjects. How can I do it? I tried this, but I always got 0 rows.
MATCH (s:student{id:"1",name:"a"} )-[:workshop_attended]-(b:workshop)-[y:subject_likes]-(c:subjects),
(s2:student{id:"2",name:”b"} )-[:workshop_attended]-(b2:workshop)-[y2:subject_likes]-(c:subjects)
RETURN c,count(c) as total
Also I tried:
MATCH (s:student{id:"1",name:"a"} )-[:workshop_attended]-(b:workshop)-[y:subject_likes]-(c:subjects),
(s2:student{id:"2",name:”b"} )-[:workshop_attended]-(b2:workshop)-[y2:subject_likes]-(l:subjects)
RETURN c,count(c),l,count(l) as total
Even that is wrong, and I also get more rows for some reason. I would really appreciate any help.
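For what it's worth, a minimal sketch of one way to get the common subjects, assuming the model described above: bind both students' patterns to the same subject node, so only subjects liked by both survive, then aggregate per subject.
// Sketch: (c) is shared between the two MATCH clauses, so it only
// binds to subjects reachable from both students' workshops.
MATCH (s:student {id:"1", name:"a"})-[:workshop_attended]-(:workshop)-[:subject_likes]-(c:subjects)
MATCH (s2:student {id:"2", name:"b"})-[:workshop_attended]-(:workshop)-[:subject_likes]-(c)
RETURN c, count(c) AS total
ORDER BY total DESC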
I have a very simple Cypher query that gives me poor performance.
I have approx. 2 million users and 60 book categories, with around 28 million READ relationships from users to categories.
When I run this Cypher query:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN distinct(bc.id);
It returns 8.5k rows in 2 to 2.5 minutes (first run).
And when I run this Cypher query:
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN u.id, u.email, read.timestamp;
It returns 55k rows in 3 to 6 minutes (first run).
I already have indexes on User id and email, but I still don't think this performance is acceptable. Any idea how I can improve it?
First of all, you can profile your query to find out what happens under the hood.
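For example, prefixing the statement from the question with PROFILE makes Neo4j show the operators and row counts it uses:
// Same query as above, executed with an execution-plan profile.
PROFILE
MATCH (u:User)-[read:READ]->(bc:BookCategory)
WHERE read.timestamp >= timestamp() - (1000*60*60*24*30)
RETURN DISTINCT bc.id;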
Currently it looks like the query scans all nodes in the database to complete.
Reasons:
Neo4j supports indexes only for the '=' operation (or 'IN').
To complete the query, it traverses all nodes one by one, checking whether each has a valid timestamp.
There is no straightforward way to deal with this problem.
You should look into creating a proper graph structure to deal with time-specific queries more efficiently. There are several ways to represent time in graph databases.
You can take a look at the graphaware/neo4j-timetree library.
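To illustrate the idea (a sketch only, not the library's actual API; the :Day label, date property, :ReadEvent node, and ON_DAY/OF_CATEGORY relationships here are hypothetical): if each read is reified as a node hanging off a per-day node, the range predicate turns into an 'IN' lookup over day keys, which an index can serve.
// Hypothetical day-bucket model: one :Day node per calendar day.
// 'IN' over a small list of day keys can use an index, unlike a
// range predicate on a relationship property.
WITH [20150601, 20150602, 20150603] AS days
MATCH (d:Day)<-[:ON_DAY]-(r:ReadEvent)-[:OF_CATEGORY]->(bc:BookCategory)
WHERE d.date IN days
RETURN DISTINCT bc.id;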
Can you explain your model a bit?
Where are the books and the "reading"-Event in it?
Afaik, all you want to know is which book categories have been read recently (in the last month)?
You could create a second type of relationship, RECENTLY_READ, which expires (is deleted) by a batch job when it is older than 30 days. (That can be two simple Cypher statements which create and delete those relationships.)
// Batch job 1: create or refresh RECENTLY_READ with the latest read timestamp.
// (MERGE cannot be followed by WHERE, so the "only move forward" check
// is expressed with CASE instead.)
WITH (1000*60*60*24*30) AS month
MATCH (a:User)-[read:READ]->(b:BookCategory)
WHERE read.timestamp >= timestamp() - month
MERGE (a)-[rr:RECENTLY_READ]->(b)
SET rr.timestamp = CASE WHEN coalesce(rr.timestamp,0) < read.timestamp
                        THEN read.timestamp ELSE rr.timestamp END;
// Batch job 2: delete RECENTLY_READ relationships older than 30 days.
WITH (1000*60*60*24*30) AS month
MATCH (a:User)-[rr:RECENTLY_READ]->(b:BookCategory)
WHERE rr.timestamp < timestamp() - month
DELETE rr;
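With those two statements run periodically, the monthly report itself no longer needs a range predicate at all (a sketch):
// The report becomes a plain pattern match over RECENTLY_READ.
MATCH (u:User)-[rr:RECENTLY_READ]->(bc:BookCategory)
RETURN DISTINCT bc.id;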
There is another way to achieve exactly what you want here, but unfortunately it's not possible in Cypher.
With a relationship index on timestamp on your READ relationships, you can run a Lucene NumericRangeQuery via Neo4j's Java API.
But I wouldn't really recommend to go down this route.
Basically my question is: how do I sum relationship properties where there are related nodes with properties equal to Value A and Value B?
For example:
I have a simple DB with the following relationship:
(site)-[:HAS_MEMBER]->(user)-[:POSTED]->(status)-[:TAGGED_WITH]->(tag)
On [:TAGGED_WITH] I have a property called "TimeSpent". I can easily SUM up all the time spent for a particular day and user by using the following query:
MATCH (user)-[:POSTED]->(updates)-[r:TAGGED_WITH]->(tags)
WHERE user.name = "Josh Barker" AND updates.date = 20141120
RETURN tags.name, SUM(r.TimeSpent) as totalTimeSpent;
This returns a nice table of tags and the time spent on each (e.g. #Meeting 4.5). However, the question arises when I want to do some advanced searches, such as "show me all the meetings for ProjectA" (i.e. #Meeting #ProjectA). Basically, I am looking for a query with which I can get all of the relationships where a single status has BOTH tags (and only if it has both). Then I can SUM that number up to get a count of how much meeting time I spent on #ProjectA.
How do I do this?
// Use two distinct relationship variables; reusing r for both patterns
// would force them to bind the same relationship and match nothing.
MATCH (updates)-[r1:TAGGED_WITH]->(tag1 {name: 'Meeting'}),
      (updates)-[r2:TAGGED_WITH]->(tag2 {name: 'ProjectA'})
RETURN SUM(r1.TimeSpent) as totalTimeSpent, count(updates);
This should find all updates tagged with both of those things, and sum all time spent across all of those updates.
To create a generic solution where you may want one or more tags, you could use something like this, passing in the array of tags as a parameter (and using the length of the array instead of the hard-coded 2).
MATCH (user)-[:POSTED]->(update)-[r:TAGGED_WITH]->(tag)
WHERE user.name = "Josh Barker" AND update.date = 20141120 AND tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTimeSpent
As long as tag.name is indexed, this should be fast.
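A sketch of the fully parameterized variant, using the {param} syntax current at the time (pass tags as, e.g., ['Meeting', 'ProjectA']):
// Pass the tag list once and compare against its length, so the same
// query works for any number of required tags.
MATCH (user)-[:POSTED]->(update)-[r:TAGGED_WITH]->(tag)
WHERE user.name = "Josh Barker" AND tag.name IN {tags}
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS matched
WHERE LENGTH(matched) = LENGTH({tags})
RETURN update, totalTimeSpent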
Edit - Remove User constraint
MATCH (update)-[r:TAGGED_WITH]->(tag)
WHERE tag.name IN ['Meeting', 'ProjectA']
WITH update, SUM(r.TimeSpent) AS totalTimeSpent, COLLECT(tag) AS tags
WHERE LENGTH(tags) = 2
RETURN update, totalTimeSpent
BACKGROUND: Posts have many Communities through CommunityPosts. I understand the following query returns posts associated with ANY ONE of these community_ids.
Post.joins(:communities).where(communities: { id: [1,2,3] })
OBJECTIVE: I'd like to query for posts associated with ANY TWO community_ids in the array: posts having either communities 1 and 2, communities 1 and 3, or communities 2 and 3.
EDIT: Please assume that the length of the array is unknown. I used this array for explanation purposes; it will be current_user.community_ids instead of [1,2,3].
This will get you all posts having exactly two associations from among the current user's communities:
Post.select("posts.*, count(distinct(communities.id))").joins(:communities).where("communities.id in (?)", current_user.community_ids).group("posts.id").having("count(distinct(communities.id)) = 2")
To relax the restriction to two or more communities, change the condition in the having clause to >=.
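Put together, a sketch of the "any two or more" variant (assuming current_user.community_ids as in the question):
# Posts linked to at least two of the current user's communities.
Post.select("posts.*")
    .joins(:communities)
    .where(communities: { id: current_user.community_ids })
    .group("posts.id")
    .having("count(distinct communities.id) >= 2")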
So far I have a query whose result set (in a temp table) has several columns, but I am only concerned with four: a customer ID (varchar), a Date (smalldatetime), an Amount (money), and a Type (char). I have multiple rows with the same customer ID and want to evaluate them based on Date, Amount, and Type. For example:
Customer ID   Date      Amount   Type
A             1-1-10    200      blue
A             1-1-10    400      green
A             1-2-10    400      green
B             1-11-10   100      blue
B             1-11-10   100      red
For all occurrences of A, I want to compare them and identify only one: first by earliest date, then by greatest Amount, then, if still tied, by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assumed the HAVING would result in only one row per customer_id. That did not end up being the case, since there were some ties.
I am not sure I can do the OR (there are some NULL values here), and it does not account for stepping to the next comparison if the dates are all the same anyway. I am not seeing a way to avoid doing some row-by-row processing of the temp table with some kind of IF or WHILE loop.
As I write this, I am thinking maybe I should use #tempTable.date_cd in the HAVING clause instead of looking at the original table, but that should return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
Try the query below:
-- Keep each customer's earliest-dated rows (NULL dates sort as 1900-01-01).
SELECT * FROM #tempTable t
WHERE ISNULL(t.date_cd, '1900-01-01') =
      (SELECT MIN(ISNULL(date_cd, '1900-01-01'))
       FROM #tempTable t2 WHERE t2.customer_id = t.customer_id);
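That still ties on Amount and Type, though. A sketch that applies all three tiebreakers in one pass using ROW_NUMBER() (the amount and type column names are assumed from the description above):
-- One row per customer: earliest date, then greatest amount, then type.
-- NULL dates are treated as 1900-01-01 (earliest), as in the query above.
SELECT customer_id, date_cd, amount, type
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY customer_id
                                ORDER BY ISNULL(date_cd, '1900-01-01') ASC,
                                         amount DESC,
                                         type ASC) AS rn
      FROM #tempTable) ranked
WHERE rn = 1;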
I have a Rails 4 (Ruby 2) app that tracks time for employees against various companies. I need to get a sum of the minutes per company per date. My problem is that I'm not sure of the best way to pad date/company pairs with 0 if there are no time entries for that company on that day.
Tables
Companies:    id, name, ...
Time_Entries: id, created_at, company_id, minutes, ...
Current output, given only 2 companies and 2 days:
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500}]
The expected output pads unrecorded days with 0s by adding an item to the list; below, the last item is the new one.
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500},
{"company_id":2,"company_name":"Company B","date":"2013-06-25","minutes":0}]
Current Query (PostgreSQL)
#minutes = TimeEntry.where("created_at >= ?", 1.week.ago.utc)
.group('companies.id, date(created_at)')
.joins(:company)
.select("companies.id as company_id", "companies.name as company_name", "date(created_at)", "SUM(minutes) as minutes")
.order("date ASC")
I'm not sure of the best way to go about this. I can think of a couple of options:
A 3-deep loop: loop through days, then through companies, then through the found results, adding any day/company pairs that have not already been added.
Do a left join on a generate_series() for a date range in PostgreSQL and coalesce null sums to 0 (sketched below), but I don't think that will get me all the way.
Some better, more elegant option I don't know about.
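For reference, a raw-SQL sketch of option 2, assuming the table and column names above: cross-join companies with a generated series of days, left-join the time entries, and coalesce the missing sums to 0.
-- Every (company, day) pair appears once; days with no entries sum to 0.
SELECT c.id   AS company_id,
       c.name AS company_name,
       d.day::date AS date,
       COALESCE(SUM(te.minutes), 0) AS minutes
FROM companies c
CROSS JOIN generate_series(now() - interval '1 week', now(), interval '1 day') AS d(day)
LEFT JOIN time_entries te
       ON te.company_id = c.id AND date(te.created_at) = d.day::date
GROUP BY c.id, c.name, d.day
ORDER BY date ASC;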