Neo4j Excluding nodes from result - neo4j

I am quite new to Neo4j and Graph.
I have a simple Graph database with just one relationship: table -[contains]-> column
I'm trying to get a list of tables and columns that contain a specific term in their name.
I want to show that as a list of tables with a count of columns for each table. If the table name does not contain the term but one of its columns does, then the table should be in the list. Also, the count of columns should only include columns that contain the term.
Here is an example:
Table: "Chicago", Columns: "Chi_Address", "Chi_Weather", "Latitude"
Table: "Miami" , Columns: "Mia_to_Chi", "Mia_Weather"
Table: "Dallas" , Columns: "Dal_to_Mia", "Dal_Weather"
If I search for the term "chi", the desired result would be:
Table -- Col Count
Chicago -- 2
Miami -- 1
This is my current query:
MATCH (t:table)-[r:contains]->(c:column)
where toLower(t.name) contains toLower('CHI') or toLower(c.name) contains toLower('CHI')
return t.name as Table_Name,count(c.name) as Column_Count
My problem is that if a table name contains the term, then I get a count of all its columns, not just the ones with the term. So in my example I would get
Chicago -- 3 //Instead of 2
Miami -- 1
I was thinking of doing something like:
count(c.name WHERE c.name contains('CHI')
But that doesn't seem to be a valid syntax.
Any help would be appreciated.
PS: Happy to take any advice on how to improve my current query. For example, I'm sure that having the search term twice is something that I should improve.

Since your approach isn't going to use an index lookup anyway, we might as well change the approach here.
We can start off by matching to all :table nodes, then OPTIONAL MATCH to all :column nodes with your CONTAINS predicate. When we count() the matches this will only include the column count where the CONTAINS check is true (in some cases it will be 0). So that gets our Column_Count correct.
Next we'll filter the results, only keeping rows where we found a positive Column_Count or where the CONTAINS check is true for the :table node.
MATCH (t:table)
OPTIONAL MATCH (t)-[:contains]->(c:column)
WHERE toLower(c.name) contains toLower('CHI')
WITH t, count(c) as Column_Count
WHERE Column_Count <> 0 OR toLower(t.name) contains toLower('CHI')
RETURN t.name as Table_Name, Column_Count
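To make the counting and filtering steps concrete, here is a minimal sketch in plain Python that mimics the query on the example data from the question. It is only a model of the logic, not Neo4j driver code; the `tables` dictionary and the `search` function are names made up for this illustration.

```python
# Plain-Python model of the Cypher query above (hypothetical names, no Neo4j involved).
tables = {
    "Chicago": ["Chi_Address", "Chi_Weather", "Latitude"],
    "Miami": ["Mia_to_Chi", "Mia_Weather"],
    "Dallas": ["Dal_to_Mia", "Dal_Weather"],
}

def search(tables, term):
    term = term.lower()
    rows = []
    for table, columns in tables.items():  # MATCH (t:table)
        # OPTIONAL MATCH ... WHERE: count only the matching columns (may be 0)
        column_count = sum(term in c.lower() for c in columns)
        # WITH ... WHERE: keep the row if a column matched or the table name matches
        if column_count != 0 or term in table.lower():
            rows.append((table, column_count))
    return rows

result = search(tables, "CHI")
print(result)  # [('Chicago', 2), ('Miami', 1)]
```

Because the column filter is applied before counting, a table whose name matches still reports only its matching columns, which is the behaviour asked for.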

Related

Neo4j Cypher Aggregating Value Counts

I am returning data that looks like this:
"Jonathan" | "Chicago" | 6 | ["Hot","Warm","Cold","Cold","Cold","Warm"]
Where the third column is a count of the values in column 4.
I want to extract values out of the collection in column 4 and create new columns based on the values. My expected output would be:
Hot | Cold | Warm with the values 1 | 3 | 2 representing the counts of each value.
My current query is match (p)-[]->(c)-[]->(w) return distinct p.name, c.name, count(w), collect (w.weather)
I'd imagine this is simple, but I can't figure it out for the life of me.
Cypher does not have a way to "pivot" data (as discussed here). That is in part because it does not support dynamically generating the names of return values (e.g., "Cold") -- and it is these names that appear as "column" headers in the Text and Table visualizations provided by the neo4j Browser.
However, if you know that you only have, say, 3 possible "weather" names, you can use a query like this, which hardcodes those names in the RETURN clause:
MATCH (c:City)-[:HAS_WEATHER]->(w:Weather)
WITH c, {weather: w.weather, count: COUNT(*)} AS weatherCount
WITH c, REDUCE(s = {Cold: 0, Warm: 0, Hot: 0}, x IN COLLECT(weatherCount) | apoc.map.setKey(s, x.weather, x.count)) AS counts
MATCH (p:Person)-[:LIVES_IN]->(c)
RETURN p.name AS pName, c.name AS cName, counts.Cold AS Cold, counts.Warm AS Warm, counts.Hot AS Hot
The above query efficiently gets the weather data for a city once (for all people in that city), instead of once per person.
The APOC function apoc.map.setKey is a convenient way to get a map with an updated key value.
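The idea of starting from a fixed map of known keys and overwriting each key with its count can be sketched in plain Python as follows. This only models the REDUCE step on the sample row from the question; `pivot` and its `keys` parameter are hypothetical names, and no APOC or Neo4j call is involved.

```python
from collections import Counter

# Sample collection from the question's fourth column.
weather = ["Hot", "Warm", "Cold", "Cold", "Cold", "Warm"]

def pivot(values, keys=("Cold", "Warm", "Hot")):
    # Start with every known key at 0, like the initial map in REDUCE.
    counts = dict.fromkeys(keys, 0)
    # Overwrite each key with its observed count, like apoc.map.setKey.
    for value, n in Counter(values).items():
        counts[value] = n
    return counts

counts = pivot(weather)
print(counts)  # {'Cold': 3, 'Warm': 2, 'Hot': 1}
```

Keys that never occur keep their initial 0, which is why the hardcoded starting map matters.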

I need to count the number of connection between two nodes with a certain property

My database contains information about the nominations for the Academy Awards.
I want to know how many directors have won an Oscar for "Best Director" more than once.
I can't quite get to the result that I want, which is a list of nominees.
The closest I've been is with this query:
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true
RETURN count(n1.win), n.name
ORDER BY n.name;
which returns the directors' names and the number of times they won an Oscar.
I tried to do something like
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true AND count(n1.win)>1
RETURN n.name;
but got an error that says
Invalid use of aggregating function count(...) in this context (line
2, column 50 (offset: 96)) "WHERE c.name="Best Director" AND
n1.win=true AND count(n1.win)>1"
Can someone help me with this?
Use WITH to aggregate the wins first. According to the docs:
[...] WITH is used to introduce aggregates which can then be used in predicates in WHERE. These aggregate expressions create new bindings in the results. WITH can also, like RETURN, alias expressions that are introduced into the results using the aliases as binding name.
So a query like this should work:
MATCH (n:Nominee)-[n1:NOMINATED]->(c:Category)
WHERE c.name="Best Director" AND n1.win=true
WITH n, count(n1.win) AS winCount
WHERE winCount > 1
RETURN n.name;
See also the docs on WHERE:
WHERE adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the results of a WITH clause.
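The aggregate-then-filter pattern (similar to HAVING in SQL) can be illustrated with a small plain-Python sketch. The nomination rows and the `repeat_winners` function below are invented for the illustration and do not come from the asker's database.

```python
from collections import Counter

# Hypothetical rows: (nominee name, category name, win flag).
nominations = [
    ("Director A", "Best Director", True),
    ("Director A", "Best Director", True),
    ("Director B", "Best Director", True),
    ("Director B", "Best Director", False),
    ("Director C", "Best Picture", True),
    ("Director C", "Best Picture", True),
]

def repeat_winners(nominations, category="Best Director"):
    # MATCH ... WHERE: keep winning nominations in the category.
    # WITH n, count(...): group by nominee and count the wins.
    win_count = Counter(
        name for name, cat, win in nominations if cat == category and win
    )
    # Second WHERE: filter on the aggregate, which is only possible after grouping.
    return sorted(name for name, n in win_count.items() if n > 1)

winners = repeat_winners(nominations)
print(winners)  # ['Director A']
```

The filter on the count runs as a separate step after the grouping, which is exactly what WITH makes possible in Cypher.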

Neo4j - Iterate for common nodes for a given list of nodes

I have no idea how to iterate over a list in Neo4j. Could someone please suggest an approach for the problem below?
Example:
I have some nodes in the graph.
Then I will supply a few keywords (the number varies, since they are user input) and search for the nodes that are common to all of these words. In my graph each word is a node.
Ex: Input: [Best sports car]
output: connected nodes for Best are [samsung,porshe,ambassdor,protein,puma]
connected nodes for sports are [cricket,racing,rugby,puma,porshe]
connected nodes for car are [porshe,ambassdor,benz,audi]
Common nodes to all words are : [porshe]
Result is : porshe
I don't know how to iterate over each word and store the match results. Could someone please suggest an idea?
In order to test the following working query, I'll make some assumptions :
The words nodes have the label :Word and the name property.
The porsche, puma, etc.. nodes have the label :Item and a name property.
Item nodes have an outgoing CONNECT relationships to Word nodes
Which will give the following graph :
The query is the following (in order to simulate the given words as parameters, I added a WITH containing the words list in the beginning of the query)
WITH ["car","best","sports"] as words
MATCH (n:Word)<-[:CONNECT]-(i:Item)
WHERE n.name IN words
WITH i, count(*) as c, words
WHERE c = size(words)
RETURN i
And will return only the porsche Item node.
Logic explanation
The logic of the query is that if an Item node matches all the given words, the first MATCH finds 3 paths to it, so count(*) has a value of 3 for the porsche node.
This value is compared to the size of the words list.
More explanations
In the WITH statement, there are two expressions : i and count(*).
i is not an aggregate function, so it will act as a grouping key.
count(*) is an aggregate function and will run on the i bucket, calculating the aggregate values.
For example, if you want to know how many words each Item is matching you can simply do :
WITH ["car","best","sports"] as words
MATCH (n:Word)<-[:CONNECT]-(i:Item) WHERE n.name IN words
RETURN i.name, count(*)
Which will return this :
You can see that porsche is matching 3 words, which is the size of the given words list, then you can simply compare the 3 from the count aggregation to this size.
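The count-and-compare logic can also be sketched in plain Python. This is only a model built on the example lists from the question (with the answer's spelling of the names); `connections` and `common_items` are hypothetical names, and the sketch assumes each item is connected to a given word at most once.

```python
from collections import Counter

# Word -> connected items, taken from the example in the question.
connections = {
    "best": ["samsung", "porsche", "ambassador", "protein", "puma"],
    "sports": ["cricket", "racing", "rugby", "puma", "porsche"],
    "car": ["porsche", "ambassador", "benz", "audi"],
}

def common_items(connections, words):
    # One count per (word, item) pair, grouped by item: WITH i, count(*) AS c
    counts = Counter(item for w in words for item in connections.get(w, []))
    # Keep the items matched by every word: WHERE c = size(words)
    return sorted(item for item, c in counts.items() if c == len(words))

common = common_items(connections, ["car", "best", "sports"])
print(common)  # ['porsche']
```

If the same item could be connected to one word several times, the count would have to be taken over distinct words instead.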
In order to fully understand how aggregation works, you can refer to the manual : http://neo4j.com/docs/stable/query-aggregation.html
You can test the query here :
http://console.neo4j.org/r/e6bee0
If you pass the words as parameters, this will then be the corresponding query :
MATCH (n:Word)<-[:CONNECT]-(i:Item)
WHERE n.name IN {words}
WITH i, count(*) as c
WHERE c = size({words})
RETURN i
assuming {words} is the name of the given query parameter
Is something like this what you are after?
Start with a collection of words from the requested search.
Match each word against the graph.
Collect the connected words in a list.
with ['Best', 'sports', 'car'] as word_coll
unwind word_coll as word
match (:Word {name: word})--(conn_word:Word)
return word,collect(conn_word)

Neo4j auto-index, legacy index and label schema: differences for a relative-to-a-node full-text search

this question is partially answered in
neo4j-legacy-indexes-and-auto-index-vs-new-label-bases-schema-indexes
and
the-difference-between-legacy-indexing-auto-indexing-and-the-new-indexing-approach
I can't comment on them yet and write a new thread here.
In my db, I have a legacy index 'topic' and label 'Topic'.
I know that:
a. pattern MATCH (n:Label) will scan the nodes;
b. pattern START (n:Index) will search on legacy index
c. auto-index is a sort of legacy index and should give me the same results as (b), but it does not in my case
d. START clause should be replaced by MATCH for "good practices".
I have inconsistent results between a. and b. (see below), and I cannot figure out the proper syntax for using MATCH to search on an index instead of on labels.
Here some examples:
1#
start n=node:topic('name:(keyword1 AND keyword2)') return n
6 rows, 3ms
start n=node:node_auto_index('name:(keyword1 AND keyword2)') return n;
0 rows
MATCH (n:Topic) where n.name =~ '(?i).*keyword1*.AND.*keyword2*.' return n;
0 rows, 10K ms
2#
start n=node:topic('name:(keyword1)') return n
212 rows, 122 ms [all coherent results containing substring keyword1]
start n=node:node_auto_index('name:(keyword1)') return n
0 rows
MATCH (n:Topic) where n.name =~ '(?i).*keyword1*.'return n
835 rows, 8K ms [also results not coherent, containing substring eyword]
MATCH (n:Topic) where n.name =~ 'keyword1' return n;
1 row, >6K ms [exact match]
MATCH (n:topic) where n.name =~ 'keyword1' return n;
no results (here I used an index 'topic' not a label 'Topic'!)
MATCH (node:topic) where node.name =~ 'keyword1' return node;
no results (attempt to use node "object" directly, as in auto-index syntax)
Could you help shed some light:
What's the difference between a legacy index and auto-index and why inconsistent results between the two?
How to use MATCH clause with Indexes rather than labels?
I want to reproduce results of full-text search.
Which syntax to do a full-text search applied to ONLY the neighbor of a node, not the full-db? MATCH ? START clause? legacy-index ? label? I am confused.
The auto index (there is only one) is a manual (aka legacy) index having the name node_auto_index. This special index tracks changes to the graph by hooking into the transaction processing. So if you declared name as part of your auto index for nodes in the config, any change to a node having a name property is reflected to that index.
Note that auto indexes do not automatically populate on an existing dataset when you add e.g. a new property for auto indexing.
Note further that manual or auto indexes are totally independent of labels.
The only way to query a manual or auto index is by using the START clause:
START n=node:<indexName>(<lucene query expression>) // index query
START n=node:<indexName>(key='<value>') // exact index lookup
Schema indexes are completely different and are used in MATCH when appropriate.
A blog post of mine covers all the index capabilities of neo4j.
In general you use an index in graph databases to identify the start points for traversals. Once you've got a reference inside the graph you just follow relationships and no longer do index lookups.
For full text indexing, see another blog post.
Updates based on comments below
In fact MATCH (p:Topic {name: 'DNA'}) RETURN p and MATCH (n:Topic) where n.name = 'DNA' return n are equivalent. Both result in the same query plan. If there is a schema index on label Topic and property name (created by CREATE INDEX ON :Topic(name)), Cypher will implicitly use the schema index to find the specified node(s).
At the moment you cannot use full text searches based on schema indexes. Full text is only available in manual / auto indexing.
All the examples you've provided with START n=node:topic(...) rely on a manual index. It's your responsibility to keep a manual index in sync with your graph contents, so I assume the differences are due to modifications in the graph that were not reflected in the manual index.
In any case, a query that uses START n=node:topic(....) will never use a schema index.

Querying multiple indexes not working if one condition fails in Neo4j

I am trying to search for a keyword across all the indexes I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I get nodes only if the keyword "Hello" is present in both indexes (Users and Location); I do not get any results if "Hello" is missing from either index.
Could you please let me know how to modify this Cypher query so that I get results if "Hello" is present in any of the index keys (Name or LocationName)?
In 2.0 you can use UNION and have two separate queries like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with the way you have the query written is the way it calculates a cartesian product of pairs between n and m, so if n or m aren't found, no results are found. If one n is found, and two ms are found, then you get 2 results (with a repeating n). Similar to how the FROM clause works in SQL. If you have an empty table called empty, and you do select * from x, empty; then you'll get 0 results, unless you do an outer join of some sort.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many iterations of things like WITH collect(n) as n, etc., but it boils down to the cartesian product thing at some point, no matter what.
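The difference between the cartesian product and the UNION can be shown with a short plain-Python sketch. The lists stand in for the two index lookups and their contents are made up; `both_indexes` and `either_index` are hypothetical helper names.

```python
from itertools import product

def both_indexes(users, locations):
    # start n=..., m=... : every n is paired with every m (cartesian product).
    return list(product(users, locations))

def either_index(users, locations):
    # UNION of two separate queries; duplicate rows are removed, order kept.
    return list(dict.fromkeys(users + locations))

users = []                      # lookup in the Users index found nothing
locations = ["location_hello"]  # lookup in the Location index found one node

print(both_indexes(users, locations))  # [] because one side is empty
print(either_index(users, locations))  # ['location_hello']
```

With one hit on the first side and two on the second, `both_indexes` returns two pairs that repeat the first node, matching the behaviour described above.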
