This is an extension of the following question:
I have a data lineage related graph in Neo4J with variable length path containing intermediate nodes (tables) as an array:
match p=(s)-[r:airflow_loads_to*]->(t)
where s.database_name='hive'
and s.schema_name='test'
and s.name="source_table"
return s.name, [n in nodes(p) | n.name] as arrayOfName,t.name
Now, this resultset contains loops that I want to omit. I can 'remove' these loops without the flattening ArrayOfName result by running:
match p=(s)-[r:airflow_loads_to*]->(t)
where s.database_name='hive'
and s.schema_name='test'
and s.name="source_table"
return s.name, min(size(r)) as pathlen, t.name
order by pathlen
However, when I add ArrayOfName back, the resultset contains all rows again.
I guess this means chaining the results somehow, using the pathlen as a filter or preventing that loops exist in the path p at all.
But I am stuck on how to accomplish this...
did you try
p=shortestPath((s)-[r:airflow_loads_to*]->(t))
because it seems that is what you need
Related
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.
I've got a list of id's and a list of values. I want to catch each node with the id and set a property by the value.
With just one Node that is super basic:
MATCH (n) WHERE n.id='node1' SET n.name='value1'
But i have a list of id's ['node1', 'node2', 'node3'] and same amount of values ['value1', 'value2', 'value3'] (For simplicity i used a pattern but values and id's vary a lot). My first approach was to use the query above and just call the database each time. But nowadays this isn't appropriate since i got thousand of id's which would result in thousand of requests.
I came up with this approach that I iterate over each entry in both lists and set the values. The first node from the node list has to get the first value from the value list and so on.
MATCH (n) WHERE n.id IN["node1", "node2"]
WITH n, COLLECT(n) as nodeList, COLLECT(["value1","value2"]) as valueList
UNWIND nodeList as nodes
UNWIND valueList as values
FOREACH (index IN RANGE(0, size(nodeList)) | SET nodes.name=values[index])
RETURN nodes, values
The problem with this query is that every node gets the same value (the last of the value list). The reason is in the last part SET nodes.name=values[index] I can't use the index on the left side nodes[index].name - doesn't work and the database throws error if i would do so. I tried to do it with the nodeList, node and n. Nothing worked out well. I'm not sure if this is the right way to achieve the goal maybe there is a more elegant way.
Create pairs from the ids and values first, then use UNWIND and simple MATCH .. SET query:
// THe first line will likely come from parameters instead
WITH ['node1', 'node2', 'node3'] AS ids,['value1', 'value2', 'value3'] AS values
WITH [i in range(0, size(ids)) | {id:ids[i], value:values[i]}] as pairs
UNWIND pairs AS pair
MATCH (n:Node) WHERE n.id = pair.id
SET n.value = pair.value
The line
WITH [i in range(0, size(ids)) | {id:ids[i], value:values[i]}] as pairs
combines two concepts - list comprehensions and maps. Using the list comprehension (with omitted WHERE clause) it converts list of indexes into a list of maps with id,value keys.
I have the following cypher code:
MATCH (n)
WHERE toLower(n.name) STARTS WITH toLower('ja')
RETURN n
This case-insensitive query returns all the nodes which their names start with the substring "ja". For example if I execute this in my db it will return ["Javier", "Jacinto", "Jasper", "Jacob"]
I need this script to also remove the unwanted nodes on this list, for example let's say that an array containing ["Jasper, Javier"] is sent to the data access layer indicating that those two nodes shouldn't be returned, leaving the final query result as follows: ["Jacinto", "Jacob"]
How can I perform this?
If you know before making the query which items should be excluded you can say:
MATCH (n)
WHERE toLower(n.name) STARTS WITH toLower('ja')
AND NOT (toLower(n.name) IN ['jasper', 'javier'])
RETURN n
New to Neo4j and I've tried various approaches with both Cypher and APOC based on similar questions asked on here but no luck since none are sort-tree combos. Let's say I have some hierarchical data that is just a bunch of root nodes with children and all nodes have the property 'name'. As of right now (with query below), when displaying the result by name, my data resembles:
zxcv
rtyu
asdf
qwer
wert
bbbb
fghj
yuio
What can I add to the following query that will sort by name such that the entire tree is sorted?
MATCH p=(n:Code)-[:HAS_CHILD*]->(m)
WHERE NOT ()-[:HAS_CHILD]->(n)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value
If I tack on SORT BY value.name it only sorts the root nodes since that is what is at the top of what is returned. Can't figure out how/where to sort the nodes before collecting, or when I did then I couldn't figure out how to make the collection afterward. Happy to use apoc.coll.sortNodes, apoc.coll.sort, etc., but can't quite figure out how to work them in.
Update
This gets a little closer but still only sorting the root
MATCH (n:Code)-[:HAS_CHILD*]->(m)
WHERE NOT ()-[:HAS_CHILD]->(n)
WITH n
ORDER BY n.name
MATCH p=(n:Code)-[:HAS_CHILD*]->(m)
WHERE NOT ()-[:HAS_CHILD]->(n)
WITH COLLECT (p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value
MATCH (n:Code)-[:HAS_CHILD*]->(m)
WHERE NOT (m)-[:HAS_CHILD]->()
WITH n, m
ORDER BY n.name, m.name
MATCH p=(n:Code)-[:HAS_CHILD*]->(m)
WHERE NOT ()-[:HAS_CHILD]->(n)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value
I am trying to search for a key word on all the indexes. I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I am getting the nodes and if keyword "Hello" is present in both the indexes (Users and Location), and I do not get any results if keyword Hello is not present in any one of index.
Could you please let me know how to modify this cypher query so that I get results if "Hello" is present in any of the index keys (Name or LocationName).
In 2.0 you can use UNION and have two separate queries like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with the way you have the query written is the way it calculates a cartesian product of pairs between n and m, so if n or m aren't found, no results are found. If one n is found, and two ms are found, then you get 2 results (with a repeating n). Similar to how the FROM clause works in SQL. If you have an empty table called empty, and you do select * from x, empty; then you'll get 0 results, unless you do an outer join of some sort.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many iterations of things like WITH collect(n) as n, etc., but it boils down to the cartesian product thing at some point, no matter what.