I'm currently experimenting a bit with Cypher. I have a simple setup of components connected to a merchant by a "sells" relationship that has a "price" property:
(merchant-[:sells{price:10}]->component)
I wrote a Cypher query that calculates the total price per merchant, cheapest first, if you buy all products from the same merchant:
MATCH sup-[s:sells]->component
WITH SUM(s.price) AS total, sup
RETURN sup, total
ORDER BY total ASC
While this works, I have trouble finding the cheapest price(s) when two or more suppliers are tied. I'd like to get something like:
+-------+----------+
| price | supplier |
+-------+----------+
| 60    | conrad   |
|       | amazon   |
+-------+----------+
You can view my setup here:
http://console.neo4j.org/?id=wpz165
EDIT:
OK, I found a way, although it isn't pretty.
MATCH sup-[s:sells]->component
WITH SUM(s.price) AS minprice, sup
ORDER BY minprice
LIMIT 1
MATCH sup2-[s2:sells]->component2
WITH SUM(s2.price) AS total2, sup2, minprice
WHERE total2 = minprice
RETURN minprice, sup2
How does this work? The first part finds the lowest price (by ordering and returning only the first row). The second part runs the whole query again and filters out the suppliers that don't have the lowest price, so the whole query is run twice.
Any better ideas?
For my aesthetic this is less ugly though it does require three WITH clauses.
1. Sum price by supplier for all components
2. Find the minimum price
3. Return all suppliers with the minimum price
MATCH sup-[s:sells]->component
WITH sup, SUM(s.price) AS price_sum
WITH MIN(price_sum) AS price_min
MATCH sup2-[s2:sells]->component2
WITH sup2, SUM(s2.price) AS price_sum2, price_min
WHERE price_sum2 = price_min
RETURN sup2, price_sum2
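The three steps above (sum per supplier, find the minimum, keep every supplier that matches it) can also be sketched outside Cypher to make the tie handling explicit. This is only an illustration in plain Python: the `offers` data and the function name `cheapest_suppliers` are made up for the example and are not part of the original setup.

```python
from collections import defaultdict

# Hypothetical sample data: (supplier, component, price) triples standing in
# for the (merchant)-[:sells {price}]->(component) relationships.
offers = [
    ("conrad", "resistor", 10), ("conrad", "led", 50),
    ("amazon", "resistor", 20), ("amazon", "led", 40),
    ("reichelt", "resistor", 30), ("reichelt", "led", 45),
]

def cheapest_suppliers(offers):
    """Return (lowest total, all suppliers tied at that total)."""
    # Step 1: sum the price per supplier.
    totals = defaultdict(int)
    for supplier, _component, price in offers:
        totals[supplier] += price
    if not totals:
        return None, []
    # Step 2: find the minimum total.
    min_total = min(totals.values())
    # Step 3: keep every supplier whose total equals the minimum.
    return min_total, sorted(s for s, t in totals.items() if t == min_total)

print(cheapest_suppliers(offers))  # (60, ['amazon', 'conrad'])
```

The point of the sketch is that the minimum is computed once from the per-supplier totals and then compared against those same totals, which is what both Cypher versions do by re-matching.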
I am returning data that looks like this:
"Jonathan" | "Chicago" | 6 | ["Hot","Warm","Cold","Cold","Cold","Warm"]
Where the third column is a count of the values in column 4.
I want to extract values out of the collection in column 4 and create new columns based on the values. My expected output would be:
Hot | Cold | Warm with the values 1 | 3 | 2 representing the counts of each value.
My current query is match (p)-[]->(c)-[]->(w) return distinct p.name, c.name, count(w), collect (w.weather)
I'd imagine this is simple, but I can't figure it out for the life of me.
Cypher does not have a way to "pivot" data (as discussed here). That is in part because it does not support dynamically generating the names of return values (e.g., "Cold"), and it is these names that appear as "column" headers in the Text and Table visualizations provided by the Neo4j Browser.
However, if you know that you only have, say, 3 possible "weather" names, you can use a query like this, which hardcodes those names in the RETURN clause:
MATCH (c:City)-[:HAS_WEATHER]->(w:Weather)
WITH c, {weather: w.weather, count: COUNT(*)} AS weatherCount
WITH c, REDUCE(s = {Cold: 0, Warm: 0, Hot: 0}, x IN COLLECT(weatherCount) | apoc.map.setKey(s, x.weather, x.count)) AS counts
MATCH (p:Person)-[:LIVES_IN]->(c)
RETURN p.name AS pName, c.name AS cName, counts.Cold AS Cold, counts.Warm AS Warm, counts.Hot AS Hot
The above query efficiently gets the weather data for a city once (for all people in that city), instead of once per person.
The APOC function apoc.map.setKey is a convenient way to get a map with an updated key value.
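The idea behind the REDUCE step, starting from a map with every known key set to 0 and then overwriting the keys that actually occur, can be illustrated in a few lines of plain Python. This is only a sketch of the logic, not of how Cypher or APOC work internally; `WEATHER_KEYS` and `pivot_weather` are names invented for the example.

```python
from collections import Counter

# The possible values must be known in advance, just as they are
# hardcoded in the RETURN clause of the Cypher query.
WEATHER_KEYS = ("Hot", "Cold", "Warm")

def pivot_weather(values, keys=WEATHER_KEYS):
    """Count each known key, defaulting to 0 for keys that never occur."""
    counts = Counter(values)
    return {key: counts.get(key, 0) for key in keys}

# The collection from the question's sample row.
row = ["Hot", "Warm", "Cold", "Cold", "Cold", "Warm"]
print(pivot_weather(row))  # {'Hot': 1, 'Cold': 3, 'Warm': 2}
```

Values outside the known keys are silently ignored here, which mirrors the limitation of hardcoding the column names.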
I am experimenting with a graph representing (:Shopper)'s who -[:Make]->(:Purchase)'s and each purchase -[:Contains]->(:Item)'s. The challenge is that I want to compare the quantity of Item A each Shopper bought on their most recent purchase. Eliminating Items with only one :Contains relationship won't work, because the Item may have been bought in an earlier purchase as well.
I can get data on the set of all Items in all Shoppers' most recent Purchases with
MATCH (s:Shopper)-->(p:Purchase)
WITH s, max(p.Time) AS latest
MATCH (s)-->(p:Purchase)
WHERE p.Time = latest
MATCH (p)-[c:Contains]->(i:Item)
RETURN s.Name, p.Time, c.Quantity, i.Name
but now I want to replace the second MATCH clause with something like
MATCH (p:Purchase)-[c1:Contains]->(i:Item)<-[c2:Contains]-(p:Purchase)
and it doesn't return any results. I suspect that this looks for items that have two :Contains relationships to the SAME Purchase. I want to get the :Contains relationships on two DIFFERENT Purchases in the same filtered group. How can I do this efficiently? I really want to avoid having to redo the filtering process on the second Purchase node.
[UPDATED]
In your top query, you do not need to MATCH twice to get the latest Purchase for each Shopper (see below).
In your MATCH snippet, you are using the same p variable for both Purchase nodes, so of course they are forced to be the same node.
Here is a query that should return a set of data for each Item that was in the latest Purchases of multiple Shoppers:
MATCH (s:Shopper)-[:Make]->(pur:Purchase)
WITH s, pur
ORDER BY pur.Time DESC
WITH s, HEAD(COLLECT(pur)) AS p
MATCH (p)-[c:Contains]->(i:Item)
WITH i, COLLECT({shopper: s.Name, time: p.Time, quantity: c.Quantity}) AS set
WHERE SIZE(set) > 1
RETURN i.Name AS item, set;
Here is a console that demonstrates the query with your sample data (with corrections to label and type names). It produces this result:
+-------------------------------------------------------------------------------------------------------------------------------+
| item | set |
+-------------------------------------------------------------------------------------------------------------------------------+
| "Banana" | [{shopper=Mandy, time=213, quantity=12},{shopper=Joe, time=431, quantity=5},{shopper=Steve, time=320, quantity=1}] |
+-------------------------------------------------------------------------------------------------------------------------------+
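For readers who want to check the logic independently of Neo4j, here is a rough Python sketch of the same steps: pick each shopper's latest purchase, group the contained items, and keep only items that appear for more than one shopper. The `rows` data and the function name are hypothetical, and the sketch assumes a shopper never has two purchases with the same time.

```python
from collections import defaultdict

# Hypothetical rows: (shopper, purchase_time, item, quantity)
rows = [
    ("Joe", 100, "Banana", 2),
    ("Joe", 431, "Banana", 5),
    ("Joe", 431, "Apple", 1),
    ("Mandy", 213, "Banana", 12),
    ("Steve", 320, "Banana", 1),
    ("Steve", 150, "Apple", 4),
]

def shared_items_in_latest(rows):
    """Map each item found in the latest purchases of 2+ shoppers to its entries."""
    # Latest purchase time per shopper (the ORDER BY ... HEAD(COLLECT()) step).
    latest = {}
    for shopper, time, _item, _qty in rows:
        latest[shopper] = max(time, latest.get(shopper, time))
    # Group the contents of those latest purchases by item.
    by_item = defaultdict(list)
    for shopper, time, item, qty in rows:
        if time == latest[shopper]:
            by_item[item].append({"shopper": shopper, "time": time, "quantity": qty})
    # Keep only items bought by more than one shopper (SIZE(set) > 1).
    return {item: entries for item, entries in by_item.items() if len(entries) > 1}

result = shared_items_in_latest(rows)
print(result)
```

With this sample data only "Banana" survives: Joe's earlier Banana purchase and Steve's earlier Apple purchase are ignored because they are not in a latest purchase.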
I am struggling to write a Cypher query that is both efficient and allows pagination through skip and limit.
Here is the simple scenario: I have the related nodes (company)<-[A]-(set)<-[B]-(job) where there are multiple instances of (set) with distinct (job) instances related to them. The (job) nodes have a specific status property that can hold one of several states. We need to count the number of (job) nodes in a particular state per (set) and use skip and limit to paginate on the distinct (set) nodes.
So we can get a very efficient query for job.status counts using this.
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
return s.Description, j.Status, count(*) as StatusCount;
This gives us rows of Set.Description, Job.Status, and the status count, but we get multiple rows per Set, one for each Job.Status, which is not conducive to paging over distinct Sets. Something like:
s.Description      | j.Status     | StatusCount
-------------------+--------------+----------------
Set 1              | Unassigned   | 10
Set 1              | Completed    | 2
Set 2              | Unassigned   | 3
Set 1              | Reviewed     | 10
Set 3              | Completed    | 4
Set 2              | Reviewed     | 7
What we are trying to achieve with the proper cypher is result rows based on distinct Sets. Something like this:
s.Description      | Unassigned   | Completed   | Reviewed
-------------------+--------------+-------------+----------
Set 1              | 10           | 2           | 10
Set 2              | 3            | 0           | 7
Set 3              | 0            | 4           | 0
This would then allow us to paginate over Sets using skip and limit properly.
I have tried many different approaches and cannot seem to find the right combination for this type of result. Anyone have any ideas? Thanks!
** EDIT - Using the answer provided by Michael, here's how to get the status count values in Java **
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect({Status:Status,StatusCount:StatusCount}) as StatusCounts;
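// Each entry of "StatusCounts" is a map with the keys "Status" and "StatusCount".
List<Object> statusMaps = (List<Object>) row.get("StatusCounts");
for (Object statusEntry : statusMaps) {
    Map<String, Object> statusMap = (Map<String, Object>) statusEntry;
    String status = (String) statusMap.get("Status");
    // get() returns Object, so an explicit cast is required here.
    Number count = (Number) statusMap.get("StatusCount");
}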
You can use WITH and aggregation, and optionally a map result:
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect([Status,StatusCount]);
or
match (c:Company {id: 'MY.co'})<-[:type_of]-(s:Set)<-[:job_for]-(j:Job)
with s, j.Status as Status,count(*) as StatusCount
return s.Description, collect({Status:Status,StatusCount:StatusCount});
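Once the statuses are collected so that each Set occupies exactly one row, skip and limit apply to Sets rather than to (Set, Status) pairs. The following Python sketch illustrates that effect on the sample rows from the question. The function name `page_of_sets` and the fixed `STATUSES` tuple are assumptions made for the example, and sorting by description stands in for whatever ORDER BY the real query would use.

```python
from collections import defaultdict

# Rows shaped like the first query's output: (set description, status, count)
status_rows = [
    ("Set 1", "Unassigned", 10), ("Set 1", "Completed", 2),
    ("Set 2", "Unassigned", 3), ("Set 1", "Reviewed", 10),
    ("Set 3", "Completed", 4), ("Set 2", "Reviewed", 7),
]
STATUSES = ("Unassigned", "Completed", "Reviewed")

def page_of_sets(rows, skip, limit):
    """Collapse to one row per set, then page over the distinct sets."""
    per_set = defaultdict(dict)
    for description, status, count in rows:
        per_set[description][status] = count
    ordered = sorted(per_set)            # paging needs a stable order
    page = ordered[skip:skip + limit]    # SKIP / LIMIT over distinct sets
    # Missing statuses are reported as 0, as in the desired table.
    return [(d, *[per_set[d].get(s, 0) for s in STATUSES]) for d in page]

print(page_of_sets(status_rows, 0, 2))  # [('Set 1', 10, 2, 10), ('Set 2', 3, 0, 7)]
print(page_of_sets(status_rows, 2, 2))  # [('Set 3', 0, 4, 0)]
```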
Looking for a little assistance on a Cypher query. Given a set of customers peer who own book p, I am able to retrieve a set of customers target who own at least one book also owned by peer but who don't own p. This is accomplished using the following query:
match
(p:Book {isbn:"123456"})<-[:owns]-(peer:Customer)
-[:owns]->(other:Book)<-[o:owns]-(target:Customer)
WHERE NOT( (target)-[:owns]->(p))
return target.name
limit 10;
My next step is to determine how many other books each member of the target set owns, and order those members accordingly. I've attempted several variations based on the Neo4j documentation and SO answers, but am having no luck. For instance I tried using with:
match
(p:Book {isbn:"123456"})<-[:owns]-(peer:Customer)
-[:owns]->(other:Book)<-[o:owns]-(target:Customer)
WHERE NOT( (target)-[:owns]->(p))
WITH target, count(o) as co
WHERE co > 1
return target.name
limit 10;
I also tried what seemed, to my novice eye, the most reasonable query:
match
(p:Book {isbn:"123456"})<-[:owns]-(peer:Customer)
-[:owns]->(other:Book)<-[o:owns]-(target:Customer)
WHERE NOT( (target)-[:owns]->(p))
return target.name, count(o)
limit 10;
In both of these cases, the query just runs without end (upwards of 10 minutes before I stop execution). Any insight into what I'm doing wrong?
EDIT
As it turns out, this latter query does execute, but it takes 15 minutes to complete and reports incorrect numbers, as evidenced here:
+-------------------------------+
| target.name | count(o) |
+-------------------------------+
| "John Smith" | 12840 |
| "Mary Moore" | 11501 |
+-------------------------------+
I'm looking for the number of books each customer specifically owns, and I'm not sure where the numbers 12840 and 11501 are coming from. Any thoughts?
How about this one:
MATCH (p:Book {isbn:"123456"})<-[:owns]-(peer:Customer)
WITH distinct peer, p
MATCH (peer)-[:owns]->(other:Book)
WITH distinct other, p
MATCH (other)<-[o:owns]-(target:Customer)
WHERE NOT((target)-[:owns]->(p))
RETURN target.name, count(o)
LIMIT 10;
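A likely explanation for the inflated numbers in the question is that count(o) in the single-MATCH query counts every matched path, so a book shared with many peers is counted once per peer. The DISTINCT steps above collapse those duplicates before the final match. A small Python sketch of the difference, using made-up ownership data and hypothetical function names:

```python
# Hypothetical ownership data: customer -> set of books owned.
# "p" plays the role of the book with isbn "123456".
owns = {
    "peer1": {"p", "b1", "b2"},
    "peer2": {"p", "b1", "b2"},
    "peer3": {"p", "b1"},
    "target": {"b1", "b2"},
}

def path_count(owns, book, target):
    """Count (peer, other) paths, as the single-MATCH query effectively does."""
    peers = [c for c, books in owns.items() if book in books]
    return sum(
        1
        for peer in peers
        for other in owns[peer]
        if other != book and other in owns[target]
    )

def distinct_count(owns, book, target):
    """Count each shared book once, as the DISTINCT-based query does."""
    peers = [c for c, books in owns.items() if book in books]
    others = set().union(*(owns[peer] for peer in peers)) - {book}
    return len(others & owns[target])

print(path_count(owns, "p", "target"))      # 5
print(distinct_count(owns, "p", "target"))  # 2
```

Here the target shares only two books with the peers, but the path-based count reports 5 because b1 is reached through three peers and b2 through two. Deduplicating early also shrinks the intermediate result, which is probably why the rewritten query runs faster.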
I have a graph which has states following each other in time. Each of the states can have a number of actions that happened (0..n) and a number of recommendations (0..n) assigned by some software.
I can run a Cypher query like this:
start n=node:name(name="State")
match a<-[:hasAction]-s-[:isA]->n,
s-[l?:hasRecommendation]->r
where l.likelihood>0.2
return distinct s.name as state, collect(a.name) as actions,
r.name as recommendation, l.likelihood as likelihood
order by s.name asc, l.likelihood desc
which gives me a table like this
state | actions | recommendation | likelihood
--------------------------------------------------
State 1 | [a1,a2,a3] | a1 | 0.25
State 1 | [a1,a2,a3] | a4 | 0.05
State 2 | [a2,a3] | a3 | 0.56
State 2 | [a2,a3] | a2 | 0.34
State 2 | [a2,a3] | a1 | 0.15
If I process that table manually, I can filter these results and keep, for example, only the top 2 results for each state. This is time-consuming and very inelegant.
My problem is that I never know how many recommendations a state has, so I can't use limit/skip here. Ideally I'd like the query to return only a set number of states (e.g. 100) including their top recommendations, so it could return between 0 and 100*n lines.
Is there a better way to achieve this in cypher?
The easy way to achieve this is to first select the states that have recommendations and limit the result to 100, and then retrieve only the top 2 recommendations for these 100 states by dynamically computing the percentile for each state. Something like this:
start n=node:name(name="State")
Match s-[:isA]->n, s-[?:hasRecommendation]->r
With distinct s
Order by s.name
limit 100
Match s-[?:hasRecommendation]->r
With s, (count(r)-1.0) / count(r) as p
Match s-[l?:hasRecommendation]->r
With s, percentile_disc(l.likelihood, p) as m
start n=node:name(name="State")
match a<-[:hasAction]-s-[:isA]->n,
s-[l?:hasRecommendation]->r
where l.likelihood>= m
return distinct s.name as state, collect(a.name) as actions,
r.name as recommendation, l.likelihood as likelihood
order by s.name asc, l.likelihood desc
It's a bit verbose, but Cypher does not support nesting aggregation functions, so I have to get the "count" and the "percentile" in two separate steps.
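The intended result (at most 100 states, each with its top 2 recommendations) can be checked against a plain Python sketch. Note that this uses simple sorting and slicing rather than the percentile trick above, so it illustrates the expected output, not the Cypher technique. The function name and its parameters are hypothetical, and the rows are taken from the table in the question.

```python
from itertools import groupby

# Rows from the question's table: (state, recommendation, likelihood)
rows = [
    ("State 1", "a1", 0.25), ("State 1", "a4", 0.05),
    ("State 2", "a3", 0.56), ("State 2", "a2", 0.34), ("State 2", "a1", 0.15),
]

def top_n_per_state(rows, n=2, max_states=100):
    """Keep the n most likely recommendations for each of the first max_states states."""
    # Order by state name ascending, then likelihood descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    result = []
    for i, (_state, group) in enumerate(groupby(ordered, key=lambda r: r[0])):
        if i >= max_states:
            break
        result.extend(list(group)[:n])
    return result

for row in top_n_per_state(rows):
    print(row)
```

With the sample rows, State 2 loses its third recommendation (a1, 0.15) while State 1 keeps both of its rows.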