Getting Multiple results from different relationships with Cypher - neo4j

I am sure this question has been asked but I can't find it.
I have a social graph and I want to be able to show people suggestions based on 3 different relationships in one result.
I have 3 different nodes (Skill, Interest, Title)
Each person has a relationship of SKILL_OF, INTEREST_OF, and IS_TITLED respectively.
I would like a single (unique if possible) result set that matches the person and then finds people who have the same skills, interests, and job title.
I tried to start with two of the relationships (intending to add title afterwards); here is what I have.
MATCH (p:Person { username:'wkolcz' })-[INTEREST_OF]->(Interest)<-[i:INTEREST_OF]-(f:Person)
MATCH(p)-[SKILL_OF]->(s:Skill)<-[sk:SKILL_OF]-(sf:Person)
RETURN f.first_name,f.last_name, sf.first_name, sf.last_name, i, s
I tried to make the matching person the same variable but, as you experts know, that failed. I got a result set, but I can't see how I could then display it.
I would like a single list of first_name, last_name, username from the two matches, and bonus points if I could also get the matches themselves returned (i and s) so I could display what matched ("This person also has skill(s) in X" or "This person also has an interest in X").
Thanks and let me know!

[EDITED]
This turned out to be a very interesting problem.
I provide a solution that:
Only returns a single result row for every person.
Displays all the interests and skills shared by that person and wkolcz as separate collections. (I presume that people in the DB can have multiple interests and skills.)
The solution finds all the people with shared interests and/or skills in a single MATCH clause.
MATCH (p:Person { username:'wkolcz' })-[r1:INTEREST_OF|SKILL_OF]->(n)<-[r2:INTEREST_OF|SKILL_OF]-(f)
WHERE TYPE(r1) = TYPE(r2)
WITH f, COLLECT(TYPE(r1)) AS ts, COLLECT(n.name) AS names
RETURN f.first_name, f.last_name, f.username,
  REDUCE(s = { interests: [], skills: [] }, i IN RANGE(0, SIZE(ts)-1) |
    CASE
      WHEN ts[i] = "INTEREST_OF"
        THEN { interests: s.interests + names[i], skills: s.skills }
      ELSE { interests: s.interests, skills: s.skills + names[i] }
    END) AS shared;
Here is a console that shows these sample results:
+--------------+-------------+------------+---------------------------------------------------+
| f.first_name | f.last_name | f.username | shared                                            |
+--------------+-------------+------------+---------------------------------------------------+
| "Fred"       | "Smith"     | "fsmith"   | {interests=[Bird Watching], skills=[]}            |
| "Oscar"      | "Grouch"    | "ogrouch"  | {interests=[Bird Watching, Politics], skills=[]}  |
| "Wilma"      | "Jones"     | "wjones"   | {interests=[Bird Watching], skills=[Woodworking]} |
+--------------+-------------+------------+---------------------------------------------------+
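To make the REDUCE step easier to follow, here is a minimal Python sketch of the same partitioning logic outside the database. It assumes the two collections built in the WITH clause arrive as parallel lists; the function name partition_shared and the sample values are illustrative only and not part of the original answer.

```python
def partition_shared(types, names):
    """Split parallel lists of relationship types and node names into
    shared interests and shared skills, mirroring the REDUCE/CASE step."""
    shared = {"interests": [], "skills": []}
    for rel_type, name in zip(types, names):
        if rel_type == "INTEREST_OF":
            shared["interests"].append(name)
        else:
            # The MATCH only admits INTEREST_OF or SKILL_OF, so anything
            # that is not an interest is treated as a skill.
            shared["skills"].append(name)
    return shared


if __name__ == "__main__":
    # Hypothetical collections for one matched person.
    print(partition_shared(["INTEREST_OF", "SKILL_OF"],
                           ["Bird Watching", "Woodworking"]))
```

Because the two lists are built by the same aggregation, position i in one always corresponds to position i in the other, which is what lets the query index them with the same counter.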

Related

Cypher - loading results while optionally adding count of relationships

I'm using a graph database to establish a relationship between folders, their children and users (be it owners or sharers of the folder).
Here is an example of my structure, where orange nodes are folders and blue nodes are users.
What I want my query to achieve: It should return direct children of the folder under query, and while doing so determine if the child folder being returned is being shared.
My query
MATCH (:Folder { name: 'Nick Hamill' })-[:CHILD]->(children:Folder)
WITH children
OPTIONAL MATCH path = (children)<-[*]-(:User)
UNWIND RELATIONSHIPS(path) AS r WITH children, r
WHERE TYPE(r) = 'SHARES'
RETURN children AS model, COUNT(r) > 0 AS shared
So the query works brilliantly (perhaps a little optimisation needed?) when there is a related user (see below), however, the query fails to return any result if there is no user relationship. I personally can't see why this is because it's an optional match, and surely the count could just return empty?
╒══════════════════════════════════════════════════════════════════════╤════════╕
│"model" │"shared"│
╞══════════════════════════════════════════════════════════════════════╪════════╡
│{"name":"Dr. Denis Abshire","created_at":"2019-10-11 13:54:58","id":"c│true │
│f5e084f-d963-35d3-9c6f-fe29b86f6d43","updated_at":"2019-10-11 13:54:58│ │
│"} │ │
└──────────────────────────────────────────────────────────────────────┴────────┘
The query should be relatively self-explanatory, but for the sake of clarity here are some expected outputs:
| Query Folder | Returned Folder | Shared? |
|----------------------|------------------|---------|
| Miss Dessie Oritz II | Nick Hamill | TRUE |
| Nick Hamill | Dr Denis Abshire | TRUE |
| Samara Russell | Shemar Huels PhD | FALSE |
| Shemar Huels PhD | Hazle Ward | FALSE |
I'm running neo4j 3.5.11 community edition. I feel like this should be a fairly easy solution, I'm just meeting the limits of my extremely limited cypher knowledge.
Appreciate any help!
I don't understand why you are using this in your query:
OPTIONAL MATCH path = (children)<-[*]-(:User)
UNWIND RELATIONSHIPS(path) AS r WITH children, r
WHERE TYPE(r) = 'SHARES'
With (children)<-[*]-(:User) you are searching for every path, of any length, between the children and User nodes.
And with WHERE TYPE(r) = 'SHARES' you keep only the SHARES relationships.
So your query matches this kind of pattern: (children)<-[:CHILD]-(:Folder)<-[:CHILD]-(:Folder)<-[:SHARES]-(:User)
Is that what you want?
If so, can you try this query:
MATCH (:Folder { name: 'Nick Hamill' })-[:CHILD]->(children:Folder)
RETURN children AS model, size((children)<-[:CHILD*0..]-(:Folder)<-[:SHARES]-(:User)) > 0 AS shared
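As for why the original query returns nothing for unshared folders: when OPTIONAL MATCH finds no path, path is null, RELATIONSHIPS(path) is null as well, and UNWIND over null produces no rows, so the folder is dropped before COUNT ever runs. The Python sketch below only models that row flow; the function names and sample rows are hypothetical, with one row per child folder and a list of relationship types standing in for the path.

```python
def unwind(rows):
    """Model of UNWIND RELATIONSHIPS(path) AS r: one output row per
    relationship; a null (None) or empty path yields no rows at all."""
    for child, path in rows:
        for rel_type in (path or []):
            yield child, rel_type


def shared_via_unwind(rows):
    """Original approach: folders without a path vanish from the result."""
    result = {}
    for child, rel_type in unwind(rows):
        if rel_type == "SHARES":  # WHERE TYPE(r) = 'SHARES'
            result[child] = True
    return result


def shared_via_size(rows):
    """Rewritten approach: every folder keeps its row and the flag is
    computed per row, like size(pattern) > 0."""
    result = {}
    for child, path in rows:
        result[child] = result.get(child, False) or "SHARES" in (path or [])
    return result


if __name__ == "__main__":
    rows = [("Dr. Denis Abshire", ["CHILD", "SHARES"]), ("Hazle Ward", None)]
    print(shared_via_unwind(rows))
    print(shared_via_size(rows))
```

In this model the first function never mentions "Hazle Ward" at all, while the second reports it with a false flag, which is the behaviour the expected-output table asks for.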

How to show in the browser the aggregated value of a relationship property as a single relationship

Below are my nodes and their properties:
| Node    | Property |
|---------|----------|
| Person  | id, name |
| Product | id, name |
| Buys    | amount   |
This query sums the amount paid by each person for a given product:
MATCH (t:Person)-[r:Buys]->(p:Product)
return t,p, sum(r.amount) as Amount
Sadly, the browser continues to display multiple relationships between each Person and Product if there were multiple such purchases. I would only like to see one relationship between a given person and a given product, showing the sum -- rather than multiple relationships.
Any ideas on how to do that?

Find the nodes with the most mutual connecting nodes?

I'm working with a data set that contains customers, their purchases and the businesses they purchased from, and I'm trying to determine which businesses share the highest number of mutual customers. Ideally the output would be a table that lists the connected businesses and the number of mutual customers. I.e.:
| BUSINESS_1 - BUSINESS_2 | 4 |
| BUSINESS_1 - BUSINESS_5 | 3 |
| BUSINESS_3 - BUSINESS_7 | 2 |
| BUSINESS_4 - BUSINESS_9 | 2 |
I don't have much at this point, but the query I'm working with looks something like this:
MATCH (c:Customer)<-[:Trans_Cust]-(t:Transaction)-[:Trans_Business]->(b:Business)
RETURN c, t, b
Thanks in advance
I guess this should do the trick. It would help if you could provide a sample dataset on http://console.neo4j.org.
MATCH (b:Business)
MATCH (b)<-[:Trans_Business]-(t:Transaction)-[:Trans_Cust]->(c:Customer)
MATCH (c)<-[:Trans_Cust]-(:Transaction)-[:Trans_Business]->(other:Business)
WHERE b <> other
WITH b, other, collect(DISTINCT c) AS customers
RETURN b, other, size(customers) as sharedCustomers
ORDER BY sharedCustomers DESC
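For checking the query against a small hand-made dataset, the same counting can be sketched in plain Python. This is only an illustration under assumptions: each transaction is reduced to a hypothetical (customer, business) pair, and shared_customers is a made-up helper name. Unlike the Cypher above, which returns every pair in both orders (b, other) and (other, b), the sketch counts each unordered pair once.

```python
from collections import defaultdict
from itertools import combinations


def shared_customers(transactions):
    """Count distinct customers shared by each pair of businesses.

    `transactions` is an iterable of (customer, business) pairs, one per
    Transaction node in the graph model described in the question.
    """
    businesses_by_customer = defaultdict(set)
    for customer, business in transactions:
        businesses_by_customer[customer].add(business)

    counts = defaultdict(int)
    for businesses in businesses_by_customer.values():
        # Each unordered pair is counted once per distinct customer.
        for pair in combinations(sorted(businesses), 2):
            counts[pair] += 1

    # Highest number of mutual customers first, ties broken by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [("c1", "B1"), ("c1", "B2"), ("c2", "B1"),
              ("c2", "B2"), ("c2", "B3"), ("c1", "B1")]
    for pair, count in shared_customers(sample):
        print(pair, count)
```

Using a set per customer plays the role of DISTINCT: a customer who bought from the same business several times still counts once for each pair.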

Performance in Neo4j cypher query

I have the following cypher query:
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
WITH person
MATCH (skill:Skill { name: 'java' }) <- [:HAS_SKILL] - (person)
WITH person
OPTIONAL MATCH (skill:Skill { name: 'javascript' }) <- [rel:HAS_SKILL] - (person)
WITH person, CASE WHEN skill IS NOT NULL THEN 1 ELSE 0 END as matches
ORDER BY matches DESC
LIMIT 50
RETURN COLLECT(ID(person)) as personIDs
It seems to perform worse as more nodes are added. With only 5000 Person nodes (a Person node can have multiple HAS_SKILL relationships to Skill nodes), the query takes around 180 ms, and adding another 1000 Person nodes with relationships adds 30-40 ms. We are planning on having millions of Person nodes, so adding 40 ms for every 1000 Person nodes is a no-go.
I use parameters in my query instead of 'norway', 'java', 'javascript' in the above query. I have created indexes on :Country(name) and :Skill(name).
My goal with the query is to match every person who lives in a specified country (norway) and also has the skill 'java'. If the person also has the skill 'javascript', they should be ordered higher in the result.
How can I restructure the query to improve performance?
Edit:
There also seems to be an issue with the :Country nodes, if I switch out
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
with
MATCH (city:City { name: 'vancouver' }) <- [:LIVES_IN] - (person:Person)
the query time drops to around 15-50 ms, depending on which city I query for. There is still a noticeable increase in query time when adding more nodes.
Edit 2:
It seems like the query time increases by a huge amount when there are a lot of rows in the first match clause. If I switch the query to match on Skill nodes first, the query time decreases substantially. The query is part of an API and is created dynamically, so I do not know which of the match clauses will return the smallest number of rows. There will probably also be a lot more rows in every match clause when the database grows.
Edit 3
I have done some testing from the answers and I now have the following query:
MATCH (country:Country { name: 'norway'})
WITH country
MATCH (country) <- [:LIVES_IN] - (person:Person)
WITH person
MATCH (person) - [:HAS_SKILL] -> (skill:Skill) WHERE skill.name = 'java'
MATCH (person) - [:MEMBER_OF_GROUP] -> (group:Group) WHERE group.name = 'some_group_name'
RETURN DISTINCT ID(person) as id
LIMIT 50
This still has performance issues. Is it maybe better to first match all the skills etc., like with the Country node? The query can also grow bigger: I may have to add matching against multiple skills, groups, projects etc.
Edit 4
I modified the query slightly and it seems like this did the trick. I now match all the needed skills, company, groups, country etc first. Then use those later in the query. In the profiler this reduced the database hits from 700k to 188 or something. It is a slightly different query from my original query (different labeled nodes etc), but it solves the same problem. I guess this can be further improved by maybe matching on the node with the least relationships first etc, to start with a reduced number of nodes. I'll do some more testing later!
MATCH (company:Company { name: 'relinkgroup' })
WITH company
MATCH (skill:Skill { name: 'java' })
WITH company, skill
MATCH (skill2:Skill { name: 'ajax' })
WITH company, skill, skill2
MATCH (country:Country { name: 'canada' })
WITH company, skill, skill2, country
MATCH (company) <- [:WORKED_AT] - (person:Person)
, (person) - [:HAS_SKILL] -> (skill)
, (person) - [:HAS_SKILL] -> (skill2)
, (person) - [:LIVES_IN] -> (country)
RETURN DISTINCT ID(person) as id
LIMIT 50
For the first line of your query, the execution has to look for all possible paths between the country and person. By limiting your initial match (thus defining a more accurate starting point for the traversal), you should gain some performance.
So instead of
MATCH (country:Country { name: 'norway' }) <- [:LIVES_IN] - (person:Person)
Try doing it in two steps :
MATCH (country:Country { name: 'norway' })
WITH country
MATCH (country)<-[:LIVES_IN]-(person:Person)
WITH person
As an example, I'll use the simple movie app in the neo4j console : http://console.neo4j.org/
Here is a query equivalent to yours, finding people who know Cypher:
MATCH (n:Crew)-[r:KNOWS]-m WHERE n.name='Cypher' RETURN n, m
The execution plan will be :
Execution Plan
ColumnFilter
  |
  +Filter
    |
    +TraversalMatcher
+------------------+------+--------+-------------+----------------------------------------+
| Operator         | Rows | DbHits | Identifiers | Other                                  |
+------------------+------+--------+-------------+----------------------------------------+
| ColumnFilter     | 2    | 0      |             | keep columns n, m                      |
| Filter           | 2    | 14     |             | Property(n,name(0)) == { AUTOSTRING0}  |
| TraversalMatcher | 7    | 16     |             | m, r, m                                |
+------------------+------+--------+-------------+----------------------------------------+
Total database accesses: 30
And by defining an accurate starting point :
MATCH (n:Crew) WHERE n.name='Cypher' WITH n MATCH (n)-[:KNOWS]-(m) RETURN n,m
This results in the following execution plan:
Execution Plan
ColumnFilter
  |
  +SimplePatternMatcher
    |
    +Filter
      |
      +NodeByLabel
+----------------------+------+--------+-------------------+----------------------------------------+
| Operator             | Rows | DbHits | Identifiers       | Other                                  |
+----------------------+------+--------+-------------------+----------------------------------------+
| ColumnFilter         | 2    | 0      |                   | keep columns n, m                      |
| SimplePatternMatcher | 2    | 0      | m, n, UNNAMED53   |                                        |
| Filter               | 1    | 8      |                   | Property(n,name(0)) == { AUTOSTRING0}  |
| NodeByLabel          | 4    | 5      | n, n              | :Crew                                  |
+----------------------+------+--------+-------------------+----------------------------------------+
Total database accesses: 13
As you can see, the first method uses the traversal matcher, whose cost can grow exponentially with the number of nodes, and you're doing a global match on the graph.
The second uses an explicit starting point, found through the label index.
EDIT
For the skills part, I would do something like this. If you have some test data to provide, it would be more helpful for testing:
MATCH (country:Country { name: 'norway' })
WITH country
MATCH (country)<-[:LIVES_IN]-(person:Person)-[:HAS_SKILL]->(skill:Skill)
WHERE skill.name = 'java'
WITH person
OPTIONAL MATCH (person)-[:HAS_SKILL]->(skillb:Skill) WHERE skillb.name = 'javascript'
WITH person, skillb
There is no need for global lookups: since the query has already found the persons, it just follows the HAS_SKILL relationships and filters on the skill.name value.
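To pin down what the finished query is meant to return, here is a rough Python sketch of the intended semantics: filter by country and required skill, put people with the optional skill first, then cut off at the limit. It is only a model for comparing results on test data; the dict layout and the name rank_candidates are assumptions, and it says nothing about how Neo4j executes the query.

```python
def rank_candidates(people, country, required_skill, bonus_skill, limit=50):
    """Return ids of people in `country` who have `required_skill`,
    with holders of `bonus_skill` listed first.

    `people` is an iterable of dicts with 'id', 'country' and 'skills'
    (a set of skill names); this layout is hypothetical.
    """
    matches = [
        p for p in people
        if p["country"] == country and required_skill in p["skills"]
    ]
    # False sorts before True, so people with the bonus skill come first.
    # The sort is stable, so the original order is kept within each group.
    matches.sort(key=lambda p: bonus_skill not in p["skills"])
    return [p["id"] for p in matches[:limit]]


if __name__ == "__main__":
    people = [
        {"id": 1, "country": "norway", "skills": {"java"}},
        {"id": 2, "country": "norway", "skills": {"java", "javascript"}},
        {"id": 3, "country": "sweden", "skills": {"java", "javascript"}},
        {"id": 4, "country": "norway", "skills": {"python"}},
    ]
    print(rank_candidates(people, "norway", "java", "javascript"))
```

The optional skill only affects ordering, never membership, which is why the Cypher uses OPTIONAL MATCH for it rather than a plain MATCH.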
Edit 2:
Concerning your last edit, maybe this last part of the query :
MATCH (company) <- [:WORKED_AT] - (person:Person)
, (person) - [:HAS_SKILL] -> (skill)
, (person) - [:HAS_SKILL] -> (skill2)
, (person) - [:LIVES_IN] -> (country)
Could be better written as :
MATCH (person:Person)-[:WORKED_AT]->(company)
WHERE (person)-[:HAS_SKILL]->(skill)
AND (person)-[:HAS_SKILL]->(skill2)
AND (person)-[:LIVES_IN]->(country)

how can I return two collections with a cypher query in the neo4j .net client

I'd like to return two collections in one query "tags" and "items" where each tag can have 0..many items. It looks like if I use the projection, it will assume a single collection with two columns rather than two collections, is that correct? Is there a better way to run this search query?
I'm getting "the query response contains columns Tags, Items however ...anonymous type does not contain settable properties to receive this data"
var query = client
.Cypher
.StartWithNodeIndexLookup("tags", "tags_fulltext", keyword)
.Match("tags<-[:TaggedWith]-items")
.Return((items, tags) => new
{
Tags = tags.As<Tag>(),
Items = items.As<Item>()
});
var results = await query.ResultsAsync;
return new SearchResult
{
Items = results.Select(x => x.Items).ToList(),
Tags = results.Select(x => x.Tags).Distinct().ToList()
};
Option 1
Scenario: You want to retrieve all of the tags that match a keyword, then for each of those tags, retrieve each of the items (in a way that still links them to the tag).
First up, this line:
.StartWithNodeIndexLookup("tags", "tags_fulltext", keyword)
Should be:
.StartWithNodeIndexLookup("tag", "tags_fulltext", keyword)
That is, the identity should be tag not tags. That's because the START clause results in a set of nodes which are each a tag, not a set of nodes called tags. Semantics, but it makes things simpler in the next step.
Now that we're calling it tag instead of tags, we update our MATCH clause to:
.Match("tag<-[:TaggedWith]-item")
That says "for each tag in the set, go and find each item attached to it". Again, 'item' is singular.
Now let's return it:
.Return((tag, item) => new
{
Tag = tag.As<Tag>(),
Items = item.CollectAs<Item>()
});
Here, we take each 'item' and collect them into a set of 'items'. My usage of singular vs plural in that code is very specific.
The resulting Cypher table looks something like this:
-------------------
| tag   | items   |
-------------------
| red   | A, B, C |
| blue  | B, D    |
| green | E, F, G |
-------------------
Final code:
var query = client
.Cypher
.StartWithNodeIndexLookup("tag", "tags_fulltext", keyword)
.Match("tag<-[:TaggedWith]-item")
.Return((tag, item) => new
{
Tag = tag.As<Tag>(),
Items = item.CollectAs<Item>()
});
That's not what fits into your SearchResult though.
Option 2
Scenario: You want to retrieve all of the tags that match a keyword, then all of the items that match any of those tags, but you don't care about linking the two together.
Let's go back to the Cypher query:
START tag=node:tags_fulltext('keyword')
MATCH tag<-[:TaggedWith]-item
RETURN tag, item
That would produce a Cypher result table like this:
----------------
| tag   | item |
----------------
| red   | A    |
| red   | B    |
| red   | C    |
| blue  | B    |
| blue  | D    |
| green | E    |
| green | F    |
| green | G    |
----------------
You want to collapse each of these to a single, unrelated list of tags and items.
We can use collect to do that:
START tag=node:tags_fulltext('keyword')
MATCH tag<-[:TaggedWith]-item
RETURN collect(tag) AS Tags, collect(item) AS Items
---------------------------------------------------------------------------
| tags                                           | items                  |
---------------------------------------------------------------------------
| red, red, red, blue, blue, green, green, green | A, B, C, B, D, E, F, G |
---------------------------------------------------------------------------
We don't want all of those duplicates though, so let's just collect the distinct ones:
START tag=node:tags_fulltext('keyword')
MATCH tag<-[:TaggedWith]-item
RETURN collect(distinct tag) AS Tags, collect(distinct item) AS Items
------------------------------------------
| tags             | items               |
------------------------------------------
| red, blue, green | A, B, C, D, E, F, G |
------------------------------------------
With the Cypher working, turning it into .NET is an easy translation:
var query = client
.Cypher
.StartWithNodeIndexLookup("tag", "tags_fulltext", keyword)
.Match("tag<-[:TaggedWith]-item")
.Return((tag, item) => new
{
Tags = tag.CollectDistinct<Tag>(),
Items = item.CollectDistinct<Item>()
});
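The difference between the two options can also be seen outside the database. The following Python sketch takes the flat (tag, item) rows from the table in Option 2 and shapes them both ways; the function names are invented for illustration and do not correspond to anything in Neo4jClient. Note that the sketch keeps first-seen order, whereas Cypher's collect makes no such promise.

```python
def collect_per_tag(rows):
    """Option 1: keep items grouped under their tag, like collecting
    items while returning the tag itself."""
    grouped = {}
    for tag, item in rows:
        grouped.setdefault(tag, []).append(item)
    return grouped


def collect_distinct(rows):
    """Option 2: two independent, de-duplicated lists, like
    collect(distinct tag) and collect(distinct item)."""
    # dict.fromkeys drops duplicates while preserving first-seen order.
    tags = list(dict.fromkeys(tag for tag, _ in rows))
    items = list(dict.fromkeys(item for _, item in rows))
    return tags, items


if __name__ == "__main__":
    rows = [("red", "A"), ("red", "B"), ("red", "C"), ("blue", "B"),
            ("blue", "D"), ("green", "E"), ("green", "F"), ("green", "G")]
    print(collect_per_tag(rows))
    print(collect_distinct(rows))
```

Option 2 is the shape that maps directly onto a SearchResult with separate Tags and Items lists, at the cost of losing which item belongs to which tag.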
Summary
Always start with the Cypher
Always start with the Cypher
When you have working Cypher, the .NET implementation should be almost one-for-one
Problems?
I've typed all of this code out in a textbox with no VS support and I haven't tested any of it. If something crashes, please report the full exception text and query on our issues page. Tracking crashes here is hard. Tracking crashes without the full exception text, message, stack trace and so forth just consumes my time by making it harder to debug, and reducing how much time I can spend helping you otherwise.
