I have 5 columns of data I want to return from a query, plus a count of the first column.
I also want to include only listings that are active (marked by the tag "Include" in Column M), and I want the data to be randomized (I do this with a random number generator in Column P). Neither of these last 2 columns should be displayed. The data I want returned is in Columns Q, R, S, T, U.
My data looks like this:
M        N     O     P        Q         R     S       T        U
Active   Text  Text  RN       Phone#    ID    Name    Level    Location
Include  text  text  0.51423  10000001  1223  Bob     Level 2  Boston
Include  text  text  0.34342  10000005  2234  Dylan   Level 3  San Francisco
Exclude  text  text  0.56453  10000007  2311  Janet   Level 8  Des Moines
Include  text  text  0.23122  10000008  2312  Gina    Level 8  Houston
Include  text  text           10000001  1225  Ronda   Level 3  Boston
Include  text  text           10000001  1236  Nathan  Level 2  Boston
So, ideally, results would look like:
count Phone#  Phone#    ID    Name   Level    Location
3             10000001  1223  Bob    Level 2  Boston
1             10000005  2234  Dylan  Level 3  San Francisco
1             10000008  2312  Gina   Level 8  Houston
I don't care what ID or Name shows up behind the phone number so long as it's one of the numbers on the list.
I have been able to get each part (ORDER and COUNT) to work separately, but I can't get both to work in one function:
Worked:
=QUERY(Function!M:U, "SELECT count (Q), Q where O = 'Include' group by Q")
=QUERY(Function!M:U, "SELECT Q, R, S, T, U where O = 'Include' ORDER BY P DESC")
Did not work:
=QUERY(Function!M:U, "SELECT count (Q), Q group by Q, R, S, T, U where O = 'Include' group by Q ORDER BY P DESC, R, S, T, U")
=QUERY(Function!M:U, "SELECT count (Q), Q, R, S, T, U group by Q where O = 'Include' group by Q ORDER BY P DESC")
=QUERY(Function!M:U, "SELECT count (Q), Q group by Q where O = 'Include' group by Q ORDER BY P DESC, R, S, T, U")
Maybe someone has an idea of where I'm going wrong with combining the two different types of syntax? Help is much appreciated! :)
=ARRAYFORMULA({"count Phone#", "Phone#", "ID", "Name", "Level", "Location";
QUERY(Function!M3:U,
"select count(Q),Q where P is not null group by Q label count(Q)''", 0),
IFERROR(VLOOKUP(INDEX(QUERY(Function!M3:U,
"select Q,count(Q) where P is not null group by Q label count(Q)''", 0),,1),
QUERY(Function!M3:U,
"select Q,R,S,T,U where P is not null order by P desc", 0), {2, 3, 4, 5}, 0))})
cell P2:
=ARRAYFORMULA({"RN"; IF(M3:M="Include", RANDBETWEEN(ROW(A3:A),99^9), )})
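For readers who want to check the logic outside Sheets, here is a minimal Python sketch of what the question asks for: keep only the "Include" rows, put them in random order, count rows per phone number, and keep one arbitrary row per phone number. The tuple layout and the function name `summarize` are hypothetical, the sample values are copied from the question, and `random.shuffle` stands in for ordering by the random number column P.

```python
import random
from collections import Counter

# Sample rows from the question: (active, phone, id, name, level, location).
ROWS = [
    ("Include", 10000001, 1223, "Bob", "Level 2", "Boston"),
    ("Include", 10000005, 2234, "Dylan", "Level 3", "San Francisco"),
    ("Exclude", 10000007, 2311, "Janet", "Level 8", "Des Moines"),
    ("Include", 10000008, 2312, "Gina", "Level 8", "Houston"),
    ("Include", 10000001, 1225, "Ronda", "Level 3", "Boston"),
    ("Include", 10000001, 1236, "Nathan", "Level 2", "Boston"),
]


def summarize(rows):
    """Return one (count, phone, id, name, level, location) row per phone."""
    active = [r for r in rows if r[0] == "Include"]
    random.shuffle(active)  # stands in for ordering by the random column P
    counts = Counter(r[1] for r in active)
    seen = set()
    result = []
    for _, phone, *rest in active:
        if phone in seen:
            continue  # keep only the first (random) row per phone number
        seen.add(phone)
        result.append((counts[phone], phone, *rest))
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

Which ID and name appear next to 10000001 changes from run to run, which matches the "I don't care which one" requirement; the counts do not change.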
Title might be misleading as I wasn't sure how to properly summarize the problem.
I have a dataset of trips with two locations (source and destination) and also other attributes (about customer, cargo, equipment, etc).
Are there any algorithms that I could apply to cluster those trips, given that I want to use both spatial points (source and destination) for clustering, not just one?
Let's say if I have following trips:
A1 -> B1
A2 -> B2
A1 -> C1
A2 -> C2
I want to get clusters like:
A -> B
A -> C
A very simple solution I can think of is to cluster each location independently, and then use the cluster ids to group by both clusters.
Something like (this was tested in Google BigQuery):
with data as (
select st_geogpoint(100, 50) a, st_geogpoint(101, 51) b
union all
select st_geogpoint(100.01, 50) a, st_geogpoint(101.01, 51) b
union all
select st_geogpoint(100, 50.01) a, st_geogpoint(90, 51) b
union all
select st_geogpoint(100.01, 50.01) a, st_geogpoint(90, 51.01) b
),
clusters as (
select
a, b,
st_clusterdbscan(a, 1e4, 1) OVER() a_id,
st_clusterdbscan(b, 1e4, 1) OVER() b_id
from data
)
select
a_id, b_id,
st_centroid_agg(a) a_center,
st_centroid_agg(b) b_center
from clusters
group by a_id, b_id
a_id  b_id  a_center                           b_center
0     0     POINT(100.005 50.0000001074259)    POINT(101.005 51.0000001066994)
0     1     POINT(100.005 50.0100001074192)    POINT(90 51.005)
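The same idea (cluster sources and destinations independently, then group trips by the pair of cluster ids) can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, points are (longitude, latitude) tuples as in `st_geogpoint`, and the clustering is simple connected components of points within `eps_m` metres, which is what I would expect DBSCAN to reduce to when the minimum cluster size is 1. Centroids are left out to keep it short.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_ids(points, eps_m):
    """Label points by connected components of the 'within eps_m' graph."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= eps_m:
                parent[find(j)] = find(i)  # merge the two components

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def group_trips(trips, eps_m=1e4):
    """Group (source, destination) trips by (source cluster, destination cluster)."""
    a_ids = cluster_ids([a for a, _ in trips], eps_m)
    b_ids = cluster_ids([b for _, b in trips], eps_m)
    groups = defaultdict(list)
    for trip, key in zip(trips, zip(a_ids, b_ids)):
        groups[key].append(trip)
    return dict(groups)


# Same four trips as in the BigQuery example.
TRIPS = [
    ((100.0, 50.0), (101.0, 51.0)),
    ((100.01, 50.0), (101.01, 51.0)),
    ((100.0, 50.01), (90.0, 51.0)),
    ((100.01, 50.01), (90.0, 51.01)),
]

if __name__ == "__main__":
    for key, members in group_trips(TRIPS).items():
        print(key, len(members))
```

The pairwise loop is quadratic, so this only suits small datasets; for anything larger a spatial index or a library DBSCAN implementation would be the more practical choice.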
I'm a complete Neo4j beginner. I have a Cypher query similar to:
WITH ["123", "456", "789"] as ids
MATCH (p:user)-[:follower]->(m:user)
WHERE m.id in ids
WITH p, 2 as inputCnt, count(DISTINCT p) as cnt
WHERE cnt = inputCnt
RETURN p
123, 456 and 789 are user ids. List length is dynamic, can be larger than 3.
What I'm trying to find is 123, 456, 789, plus other nodes which are following at least 2 users in given list. If a node is following only 1 user, it's not needed.
I got the main idea from here but apparently the problem there is different so my query doesn't return any results. I'm sure there's exactly 1 node in my graph which satisfies my condition, so I should see a result with 4 nodes.
Let me give some examples to clarify:
When there are no users following at least 2 of them return:
When there are no users following at least 2 of them, but they are following among themselves, return:
When there's a single user, say (000), following 123 and 456 return:
When there are two users, say 000 and XXX, one following all 3 of them, one following 2 of them, return:
When you say "at least 2 users", the comparison should be >= rather than =. Then count the distinct users m that are followed by each of the other users p1, p2, ..., pn.
WITH ["123", "456", "789"] as ids
MATCH (p:user)-[:follower]->(m:user)
WHERE m.id in ids
WITH p, count(DISTINCT m) as cnt where cnt >= 2
RETURN p
If you want to return users m then do a collect and check the size.
WITH ["123", "456", "789"] as ids
MATCH (p:user)-[:follower]->(m:user)
WHERE m.id in ids
WITH p, collect(DISTINCT m) as m_users where size(m_users) >= 2
RETURN p, m_users
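To make the counting explicit, here is a small Python sketch of the same filter: for each follower, collect the distinct listed users it follows and keep it only if there are at least 2. The edge list and the function name `followers_of_at_least` are made up for illustration; they are not part of the original graph.

```python
from collections import defaultdict


def followers_of_at_least(edges, ids, minimum=2):
    """Map each follower to the set of listed ids it follows, if >= minimum."""
    wanted = set(ids)
    followed = defaultdict(set)
    for follower, target in edges:
        if target in wanted:
            followed[follower].add(target)  # a set gives DISTINCT semantics
    return {p: ms for p, ms in followed.items() if len(ms) >= minimum}


# Hypothetical (follower, followed) pairs.
EDGES = [("000", "123"), ("000", "456"), ("XXX", "123"), ("123", "456")]

if __name__ == "__main__":
    print(followers_of_at_least(EDGES, ["123", "456", "789"]))
```

The key point is that the count is taken over the followed users (m), grouped per follower (p); counting p itself, as in the question, always gives 1 per group.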
EDIT:
Do a match of users from the id list
Using OPTIONAL match, find all followers to m
Check if the count is >= 2 OR no connection to the list OR connection within the list
Return distinct followers p and users m
WITH [ "222" , "333", "789"] as ids
MATCH (m:user) WHERE m.ID in ids
WITH collect(m) as ms
OPTIONAL MATCH (p:user)-[:follower]->(m) WHERE m in ms
WITH p, ms, collect(DISTINCT m) as m_users
WHERE size(m_users) >= 2 OR p is null OR p in ms
WITH p, ms + m_users as allUsers
UNWIND allUsers as m
RETURN distinct p, m
I need help figuring out a MATCH statement.
My data model is as follows:
(a:musician {name}) //individual musicians
(b:jamSession {date, durationInHours}) //the date and length of jam sessions where 2 or more musicians participated together
and the relation
[r:PLAYED]
I've already figured out how to find all of the jam sessions a specific musician played at:
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession) RETURN a.name, b.date
and all of the musicians a specific musician played with
MATCH (a:musician {name:"Joe Smith"})-[r:PLAYED]->(b:jamSession)<-[r2:PLAYED]-(c:musician) RETURN c.name
But how do I get only the musicians that Joe Smith has played with where the total time of their common jam sessions was >= 100 hours, and the date on which the pair of musicians met the 100 hour milestone?
This may work for you (assuming b.date is suitable for sorting):
MATCH (a:musician)-[:PLAYED]->(b:jamSession)<-[:PLAYED]-(c:musician)
WHERE a.name = "Joe Smith"
WITH a, c, b ORDER BY b.date
WITH a, c,
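REDUCE(s = {sum: 0}, x IN COLLECT(b) |
  CASE
    // once the milestone date is recorded, leave the result unchanged
    WHEN s.date IS NOT NULL THEN s
    WHEN s.sum + x.durationInHours >= 100
      THEN {sum: s.sum + x.durationInHours, date: x.date}
    // below the threshold: keep accumulating the running sum
    ELSE {sum: s.sum + x.durationInHours}
  END
) AS data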
WHERE data.date IS NOT NULL
RETURN c.name AS name, data.date AS date
The aggregating function COLLECT gathers the date-ordered b nodes shared by each pair of a and c nodes. The REDUCE function then iterates through those ordered b nodes, keeping a running sum of durationInHours, to find the session at which the 100-hour threshold is met.
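As a cross-check of the running-sum idea described above, here is a minimal Python sketch: walk the shared sessions in date order, accumulate hours, and report the date on which the total first reaches the threshold. The function name `milestone_date` and the (date, hours) tuples are hypothetical, and the dates are assumed to be ISO strings so that sorting them gives chronological order.

```python
def milestone_date(sessions, threshold=100):
    """Return the date on which cumulative shared hours first reach threshold.

    `sessions` is an iterable of (iso_date, duration_in_hours) pairs.
    Returns None if the threshold is never reached.
    """
    total = 0
    for date, hours in sorted(sessions):  # ISO dates sort chronologically
        total += hours
        if total >= threshold:
            return date
    return None


if __name__ == "__main__":
    shared = [("2020-03-01", 40), ("2020-01-01", 30), ("2020-02-01", 35)]
    print(milestone_date(shared))
```

Running the same sessions through the Cypher query by hand is a quick way to confirm that the accumulator keeps growing before the threshold is hit.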
In this Cypher query, I want to sum all the weights over paths in a graph:
MATCH p=(n:person)-[r*2..3]->(m:person)
WHERE n.name = 'alice' and m.name = 'bob'
WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p
return n.name, m.name, weight_sum
LIMIT 10
In this query, I expect to receive a table with 3 columns: n.name, m.name (identical in all the rows), and weight_sum -- according to the weight sum in the specific path.
However, I get this error:
reduce(...) requires '| expression' (an accumulation expression) (line 3,
column 6 (offset: 89))
"WITH REDUCE(weights=0, rel IN r : weights + rel.weight) AS weight_sum, p"
I'm obviously missing something trivial. But what?
Shouldn't that be
REDUCE(weights=0, rel IN r | weights + rel.weight) AS weight_sum
(with a pipe instead of a colon) as per the documentation in http://neo4j.com/docs/developer-manual/current/cypher/functions/list/ ?
reduce(totalAge = 0, n IN nodes(p)| totalAge + n.age) AS reduction
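If it helps to see the shape of the expression, Cypher's reduce(accumulator = initial, x IN list | expression) behaves like a left fold. A rough Python equivalent, with a hypothetical helper name and plain numbers standing in for rel.weight:

```python
from functools import reduce


def path_weight(weights):
    """Sum weights, mirroring REDUCE(acc = 0, w IN weights | acc + w)."""
    return reduce(lambda acc, w: acc + w, weights, 0)


if __name__ == "__main__":
    print(path_weight([1, 2, 3.5]))
```

The part after the pipe corresponds to the lambda body: it is evaluated once per element with the current accumulator value.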
Hope this helps.
Regards,
Tom
In genetic genealogy, X-chromosome data is useful for linking to certain ancestors. This is well illustrated at: X-DNA Inheritance Chart
My Neo4j database has nodes for each Person and relationships connecting them of father and mother. Each node has a property sex (for the Person's gender; M or F). A female has two X-chromosomes, one from each parent. A male has one X-chromosome, always from the mother.
You can use reduce to see the genders involved in the inheritance from ancestors:
match p=(n:Person{RN:1})-[:father|mother*..20]->(m)
return m.fullname as FullName
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c
So, starting with a male (RN:1), the result for c is MM for his father and MF for his mother, MMM for the paternal grandfather and MFM for the maternal grandfather, etc. Whenever c contains MM (two Ms in sequence), the ancestors on that line are NOT contributing to the X-chromosome of the start Person.
I want to remove any node that has the MM pattern. It's easy to do this with external code, but I cannot figure out how to do it within the Cypher query.
This should work for you:
MATCH p=(n:Person { RN:1 })-[:father|mother*..20]->(m)
WITH m, NODES(p) AS a
WITH m, REDUCE(c = "", i IN RANGE(0, SIZE(a)-1)| CASE
WHEN c IS NULL OR (i > 0 AND (a[i-1]).sex = "M" AND (a[i]).sex = "M") THEN
NULL
ELSE
c + (a[i]).sex
END ) AS c
WHERE c IS NOT NULL
RETURN m.fullname AS fullName, c
ORDER BY LENGTH(c);
And here is a console that demonstrates the results.
A little late to the party and same thought process as @cybersam's solution.
match p=(n:Person { RN: 1 })-[:father|mother*..20]->(m)
with p, m, extract( g in nodes(p) | g.sex ) as genders
with p, m, genders, range(0,size(genders) -1,1) as gender_index
unwind gender_index as idx
with p, m, genders, collect([genders[idx], genders[idx+1]]) as pairs
where not ['M','M'] in pairs
return m.fullname
,reduce(status ='', q IN nodes(p)| status + q.sex) AS c
order by length(p), c
This query gets me only ancestors contributing an X-chromosome:
match p=(n:Person{RN:1})-[:father|mother*..20]->(m)
with m, reduce(status ='', q IN nodes(p)| status + q.sex) AS c
where c=replace(c,'MM','')
return m.RN,m.fullname as Name, c
The collection of genders adds a gender for each generation and is filtered to exclude any MM since a male cannot transmit his X to another male (e.g., son).
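Since the question mentions that this is easy to do in external code, here is a minimal Python sketch of the same rule for comparison: walk up the father/mother links, build the gender string for each line, and drop any line as soon as it contains MM. The dictionary layout, the names, and the function `x_ancestors` are all hypothetical. Pruning early is safe because every longer path through that ancestor would contain the same MM.

```python
def x_ancestors(people, start, max_depth=20):
    """Return (name, gender_path) for ancestors who can contribute X-DNA."""
    results = []
    stack = [(start, people[start]["sex"], 0)]
    while stack:
        person, path, depth = stack.pop()
        if depth >= max_depth:
            continue  # same limit as the *..20 variable-length pattern
        for role in ("father", "mother"):
            parent = people[person].get(role)
            if parent is None:
                continue
            new_path = path + people[parent]["sex"]
            if "MM" in new_path:
                continue  # a male does not pass his X to a son; prune this line
            results.append((parent, new_path))
            stack.append((parent, new_path, depth + 1))
    # Shortest paths first, then alphabetically, like ORDER BY length(p), c.
    return sorted(results, key=lambda r: (len(r[1]), r[1]))


# Hypothetical three-generation pedigree.
PEOPLE = {
    "me": {"sex": "M", "father": "dad", "mother": "mom"},
    "dad": {"sex": "M", "father": "pgf", "mother": "pgm"},
    "mom": {"sex": "F", "father": "mgf", "mother": "mgm"},
    "pgf": {"sex": "M"},
    "pgm": {"sex": "F"},
    "mgf": {"sex": "M"},
    "mgm": {"sex": "F"},
}

if __name__ == "__main__":
    for name, path in x_ancestors(PEOPLE, "me"):
        print(name, path)
```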