neo4j: how to efficiently search value in list?

I need to build a graph DB with a massive number of nodes and relationships. Every node should hold a list of string values, and I need to be able to query all the nodes connected to a starting node that have a given value in their list.
For example, I might have a node with the list ["dog", "cat", "bird"], and I might need to query all nodes that have the value "dog" in their list.
My question is: what would be the more efficient solution for that list in Neo4j?
Hold the values as an actual list, and search for the value inside that list during the query?
Or...
Instead of using a list property, implement the list as separate properties and use HAS(n.property) to find all the nodes with a given property?
Some other solution?
What would be the most efficient way (for lots of queries)?
Thanks!

Implement the list as separate properties; that is the efficient way to handle a large amount of data.
Then you can access the data like this, e.g.:
MATCH (n:node) WHERE n.property = {SearchValue} RETURN n;
Implementing it as a list is not a good idea.
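To see why the data layout matters for lots of queries, here is a minimal plain-Python sketch (not Neo4j code) that contrasts scanning every node's list with looking a value up in a prebuilt value-to-nodes index. The names `nodes`, `scan_for` and `build_value_index` are made up for illustration, and the sketch makes no claim about how Neo4j stores or indexes properties internally.

```python
from collections import defaultdict

# Hypothetical sample data: node id -> list of string values.
nodes = {
    1: ["dog", "cat", "bird"],
    2: ["cat"],
    3: ["dog", "fish"],
}


def scan_for(value, nodes):
    """Check every node's list; the work grows with the total number of values."""
    return sorted(node_id for node_id, values in nodes.items() if value in values)


def build_value_index(nodes):
    """Build value -> set of node ids once, so later lookups avoid a full scan."""
    index = defaultdict(set)
    for node_id, values in nodes.items():
        for value in values:
            index[value].add(node_id)
    return index


value_index = build_value_index(nodes)
dog_nodes_by_scan = scan_for("dog", nodes)
dog_nodes_by_index = sorted(value_index.get("dog", set()))
```

Both lookups find the same nodes; the difference is that the scan repeats its work on every query, while the index pays the cost once up front.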

Related

how to remove property duplicates inside Neo4j

I'm new to Neo4j and I'm trying to learn it in order to start a project for my thesis. I spent a lot of time manually reorganizing a large dataset from XML to CSV, then I imported part of this DB into Neo4j. I would like to avoid those duplicates but I cannot find a proper query. Can you help me?
Thanks!
Please tell me if you need to know anything about this database or other stuff to better understand the situation.
Here is one way to ensure that the dedicateePersonId list has distinct values:
MATCH (p:Person)
UNWIND p.dedicateePersonId AS id
WITH p, COLLECT(DISTINCT id) AS ids
SET p.dedicateePersonId = ids
[UPDATE]
The above approach is sufficient if only one property needs adjustment, but it is very inappropriate (too complex and inefficient) for multiple properties.
Here is a much better approach (using the procedure apoc.create.setProperty and the function apoc.coll.toSet) to update the values of any number of property keys. keys is assumed to be a parameter containing the names of the properties to be updated.
MATCH (p:Person)
UNWIND $keys AS k
CALL apoc.create.setProperty(p, k, apoc.coll.toSet(p[k])) YIELD node
RETURN DISTINCT node
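For readers who want to see the intended effect outside the database, here is a small plain-Python sketch of the same "make each list property distinct" step. The helper names and the sample `person` map are hypothetical. The sketch keeps the first occurrence of each value, which is an assumption made for readability; the Cypher above makes no such ordering promise.

```python
def distinct_values(values):
    """Drop repeated entries, keeping the first occurrence of each value."""
    return list(dict.fromkeys(values))


def dedupe_properties(properties, keys):
    """Return a copy of `properties` with each listed list-valued key deduplicated."""
    cleaned = dict(properties)
    for key in keys:
        if isinstance(cleaned.get(key), list):
            cleaned[key] = distinct_values(cleaned[key])
    return cleaned


# Hypothetical node properties containing duplicated ids.
person = {"name": "Ann", "dedicateePersonId": [7, 3, 7, 3, 9]}
cleaned_person = dedupe_properties(person, ["dedicateePersonId"])
```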

neo4j: CYPHER query all properties of a node

We are evaluating Neo4j for future projects. Currently we are just experimenting with learning Cypher and its capabilities. But one thing that I think should be very straightforward has so far eluded me. I want to be able to see all properties and their values for any given node. In SQL that would be something like:
select * from TableX where ID = 12345;
I have looked through the latest Neo4j docs and numerous Google searches but so far I am coming up empty. I did find the keys() function, which returns the property names in a string list, but that is marginally useful at best. What I want is a query that will return property names and the corresponding values, like:
name     :  "Lebron"
city     :  "Cleveland"
college  :  "St. Vincent–St. Mary High School"
You may want to reread the Neo4j docs.
Returning the node itself will include the properties map for the node, which is typically the way you will get all properties (keys and values) for the node.
MATCH (n)
WHERE id(n) = 12345
RETURN n
If you explicitly just want the properties but without the metadata related to the node itself, returning properties(n) (assuming n is a node variable) will return the node's properties.
MATCH (n)
WHERE id(n) = 12345
RETURN properties(n) as props
With regard to how columns (variables) work, these are always explicit, so there is no way to dynamically get columns corresponding to the properties of a node. Instead, use one of the approaches above, where the variable corresponds either to a node (whose properties map you can reach through the structure) or to a properties map.
The main difference between this approach and select * in SQL is that Neo4j has no table schema. You can use whatever properties you want on nodes of the same type, and those can differ between nodes of the same type. So there is no common structure to reference that will provide the properties for a node of a given label (to get one, you would need to scan all nodes of that label and accumulate the distinct properties).
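The "scan and accumulate" step mentioned above can be pictured with a short plain-Python sketch. It assumes each node's properties have already been fetched as a dictionary (for example from properties(n)); the function name and the second sample node are hypothetical.

```python
def distinct_property_keys(nodes):
    """Collect every property name that appears on at least one node."""
    keys = set()
    for properties in nodes:
        keys.update(properties)  # adds the dictionary's keys
    return sorted(keys)


# Nodes of the same label may carry different properties.
players = [
    {"name": "Lebron", "city": "Cleveland"},
    {"name": "Example Player", "college": "Example College"},  # made-up node
]
all_keys = distinct_property_keys(players)
```

The result is the union of the keys, which is the closest thing to a column list that a schema-free label can offer.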

Neo4j ordered tree

We are working with a hierarchy tree structure where a parent has zero or more children, and a child has either one or zero parents. When we query for a list of direct children for a given parent the query returns the children in random order. We need the children to return in the order we define when we create or update the children.
I have added a relationship between children -[:Sibling]-> so the 'top' sibling has only an incoming :Sibling relationship, and the 'bottom' sibling has only an outgoing relationship.
Given this, is there a Cypher query to return the children in sibling order?
I have a query that returns each child, and its sibling, but now I have to write some code to return the list in the correct order.
An alternative approach might be to add a sort number to each child node. This would need to be updated for all the children if one of them changes order. This approach seems slightly foreign to the graph database concept.
If this problem has been encountered before, is there a standard algorithm for solving it programmatically?
Update1
sample data as requested by Bruno
(parent1)
(child1)-[:ChildOf]->(parent1)
(child2)-[:ChildOf]->(parent1) (child2)-[:Sibling]->(child1)
(child3)-[:ChildOf]->(parent1) (child3)-[:Sibling]->(child2)
Is there a Cypher query to return child1, child2, child3 in that order?
If not, then the ordering can be done programmatically.
Using properties instead of relationships:
(parent1)
(child1)-[:ChildOf]->(parent1) (child1 {order:1})
(child2)-[:ChildOf]->(parent1) (child2 {order:2})
(child3)-[:ChildOf]->(parent1) (child3 {order:3})
`match (c)-[:ChildOf]->(parent1) return c order by c.order`
I do not expect that there is a Cypher query that can update the order of children.
Update2
I have now arrived at the following query which returns children in the right order
`match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling`
This query depends on adding a -[:FirstChildOf]->(parent) relationship.
If I don't hear otherwise I'll set this to the answer.
Shall I assume there is no cypher query for inserting a node into an ordered list?
When you create your graph model, you should concentrate on the questions you want answered with your data model.
If you want to get the children of a parent ordered by a "create or update" property, then you should store it, since a relationship in general does not represent an order.
It is not foreign to the graph database concept, since graph databases use properties. It is not always an easy task to decide whether to store something as a relationship or a property. It is all about proper modelling.
If you have the concept of '[:NEXT_SIBLING]' or similar, then it will be a pain when you have to remove a node. So I would use it only when this state is constant, for example in a time tree, where the days follow each other and that does not change.
In general, if the creation order should be used, I would use timestamps like this:
create (n:Node {created:timestamp()});
Then you can use the timestamp to order.
match (p:Parent)-[:HAS_CHILDREN]->(n:Child) where p.name='xy' return n order by n.created;
And you can use timestamps for relationships too.
Similar question is here:
Modeling an ordered tree with neo4j
Here are some tips I use for dealing with ordered lists of nodes in Neo4j.
Reverse the relation direction to (child1)<-[:HasChild]-(parent1). (This is mostly just logical reinforcement of the next items, since a "parent has a list of children".)
Add the property index to HasChild. This will let you do WITH child, hasChild.index AS sid ORDER BY sid ASC for sibling order. (This is how I maintain arbitrary order information on lists of children.) This is on the relationship because it assumes that a node can be part of more than one ordered list.
You can use LENGTH(shortestPath((root)-[*]->(child1))) AS depth to then order them by depth from a root.
Since this is for arbitrary order, you must update all the indexes to update the order. (You can do something like WITH COLLECT(child) AS children, FILTER(c IN COLLECT(child) WHERE c.index >= 3) AS subset FOREACH (c IN subset | SET c.index = c.index + 1) for a basic insert; otherwise you will have to just rewrite them all to arbitrarily change the order.)
If you don't actually care about the order, just that it is consistent, you can use WITH child, ID(child) AS sid ORDER BY sid ASC. This essentially is "order by node age".
One other option is to use a meta relationship. So :HasChild would act as a list of nodes, and then something like :NextMember would tell you what the next item from this one is. This is more flexible, but in my opinion harder to work with (you need to handle the case where there is no next item, to get the right order you have to do a 'trace' on the nodes, it doesn't work if you want to add this node to another ordered list later, etc.).
Of course, if the order isn't arbitrary (based on name or age or something), then it is much better to just sort on the non-arbitrary logic.
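The index-shifting insert described in the tips can be sketched in plain Python. This is only an illustration of the bookkeeping, assuming the index values have been read into a dictionary keyed by child; `insert_child` and `in_sibling_order` are hypothetical helper names, not Neo4j or Cypher functions.

```python
def insert_child(index_by_child, new_child, position):
    """Return a new mapping with `new_child` at `position`.

    Every existing child whose index is >= position moves down by one,
    mirroring the "shift the subset, then insert" step from the tips.
    """
    shifted = {
        child: (index + 1 if index >= position else index)
        for child, index in index_by_child.items()
    }
    shifted[new_child] = position
    return shifted


def in_sibling_order(index_by_child):
    """List the children sorted by their stored index."""
    return sorted(index_by_child, key=index_by_child.get)


children = {"child1": 1, "child2": 2, "child3": 3}
updated = insert_child(children, "childX", 2)
ordered_children = in_sibling_order(updated)
```

The cost is visible in the sketch: one insert touches every child at or after the insert position.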
The query for returning the children in the right order is
match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling
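If the ordering ends up being done in application code, as the question considers, one possible approach is to follow the :Sibling links from the first child. The plain-Python sketch below assumes the edges have been fetched as (later, earlier) pairs, matching the sample data where (child2)-[:Sibling]->(child1); the function name and input shape are assumptions, not part of any Neo4j API.

```python
def order_children(children, sibling_edges):
    """Order children by walking the sibling chain from the first child.

    `sibling_edges` holds (later, earlier) pairs, one per
    (later)-[:Sibling]->(earlier) relationship.
    """
    children = list(children)
    if not children:
        return []
    successor = {earlier: later for later, earlier in sibling_edges}
    has_predecessor = {later for later, _ in sibling_edges}
    # The first child is the only one with no outgoing :Sibling relationship.
    heads = [child for child in children if child not in has_predecessor]
    if len(heads) != 1:
        raise ValueError("expected exactly one first child")
    ordered = [heads[0]]
    while ordered[-1] in successor:
        ordered.append(successor[ordered[-1]])
    return ordered


edges = [("child2", "child1"), ("child3", "child2")]
result = order_children(["child3", "child1", "child2"], edges)
```

The walk visits each child once, so the cost is linear in the number of siblings.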

Organize alternative names (nicknames, aliases) in neo4j

Say you have some nodes in your model that may go by multiple alternative names, but all the names refer to the same object.
For example, you may want to be able to query the "World" node by using name "World" in one context, whereas in different context you want to find the same node quickly also by the name "Global".
Is it optimal to organize this information in the form of a string array property aliases like this?
If you add World to your aliases, you can use the legacy node_auto_index to index that aliases field, which will index each value individually, and then query it with
Start n=node:node_auto_index(aliases="Global")
return n
I think you could use Lucene for that.
You could index the same property several times with different names.
You can then query the index in the way you want through Java APIs or Cypher.
For instance:
START n = node:myIndex(myProperty="ALIAS_1"),
m = node:myIndex(myProperty="ALIAS_2")
[...]
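The idea behind indexing each alias individually can be pictured as a lookup table from every name to the node it refers to. The plain-Python sketch below is only an analogy for what such an index provides; the node id 42, the `build_alias_index` helper and the assumption that no alias is shared by two nodes are all made up for illustration.

```python
def build_alias_index(nodes):
    """Map the primary name and every alias to the id of its node."""
    index = {}
    for node_id, node in nodes.items():
        for name in [node["name"], *node.get("aliases", [])]:
            index[name] = node_id  # assumes names are unique across nodes
    return index


# Hypothetical node with one alias, as in the question.
nodes = {42: {"name": "World", "aliases": ["Global"]}}
alias_index = build_alias_index(nodes)
world_id = alias_index["World"]
global_id = alias_index["Global"]
```

Either name resolves to the same node in a single lookup, which is the behaviour the question asks for.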

Neo4j: Java API to compute intersection multiple properties

I'm very new to Neo4j and have a question regarding the computation of intersections of nodes.
Let's suppose I have the three properties A, B, C and I want to select only the nodes that have all three properties.
I created an index for the properties and thus I can get all nodes having one of the properties. However, afterwards I have to merge the IndexHits. Is there a way to directly select all nodes having the three properties?
My second idea was to create a node for each property and connect other nodes to it by relationships. I can then iterate over all relationships and get, for each property, a list of the connected nodes. But again, I have to compute the intersection afterwards.
Is there a function I am missing here? I suppose it's a standard problem.
Thanks a lot,
Benny
Do you also have the values you are looking for? You would start with the property that limits the number of found nodes the most.
MATCH (a:Label {property1:{value1}})
WHERE a.property2 = {value2} AND a.property3 = {value3}
RETURN a
For the Java API and Lucene indexes (Lucene's default operator is OR, so AND is spelled out to require all three):
gdb.index().forNodes("foo").query("p1:value1 AND p2:value2 AND p3:value3")
Lucene query syntax
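If the hits do have to be merged by hand, the advice to start with the most selective property carries over directly. The plain-Python sketch below assumes each index lookup has already been turned into a set of node ids; `intersect_hits` is a hypothetical helper, not part of the Neo4j Java API.

```python
def intersect_hits(hits_per_property):
    """Intersect sets of node ids, starting from the smallest set."""
    if not hits_per_property:
        return set()
    ordered = sorted(hits_per_property, key=len)
    result = set(ordered[0])  # copy, so the inputs stay untouched
    for hits in ordered[1:]:
        result.intersection_update(hits)
        if not result:
            break  # nothing can match any more
    return result


hits_a = {1, 2, 3, 4}
hits_b = {2, 3}
hits_c = {3, 5, 2}
matching = intersect_hits([hits_a, hits_b, hits_c])
```

Starting from the smallest set keeps the intermediate result small, which is the same reasoning as putting the most limiting property first in the MATCH.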
