identifying node types - neo4j

I have a number of nodes of different types; by that I mean nodes that have different properties on them. For instance, I have a number of nodes whose only properties are fileName and uploadDate. If I want to check against all file names, do I just need to do
START n=node(*) WHERE has(n.File) RETURN n;
Is this the best practice (i.e. querying a flattened-out database)? Thanks!

Your query scans all nodes, so it will become slower as your data set grows.
For identifying nodes of a certain type, there are two common approaches:
Type attribute
Set a property named 'type' (or '_type_', for example, if you want to mark it as a system property) with a value describing your type, e.g. 'File'.
Then you can look up nodes through an index like this:
start n=node:node_auto_index(type='File') return n;
Type nodes
Connect nodes of a certain type to a 'type' node and query over the relationships:
start type_node=node:node_auto_index(name='File')
match type_node<-[:IS_A]-file
return file;
(The Beer Graph on this page http://www.neo4j.org/learn/try is an example for this.)

Related

Get all nodes with a specific type of relationship to a root node

I have a rather large and complex graph in Neo4j (millions of nodes and relationships of various types). I want to get all child nodes (at all depths) of a specific root node, but only via a specific type of relationship.
I have tried: Match (n:NODE_TYPE)-[*:REL_TYPE]->(r:NODE_TYPE {id:SPECIFIC_ID}) return n
But I get a syntax error for specifying a type on the relationship.
Querying the whole graph without specifying the relationship type takes a really long time, and nodes could follow paths that eventually lead to the root node but use other types of relationships (which is not good for my use case).
You need to swap the order of the relationship type and the wildcard operator:
Match (n:NODE_TYPE)-[:REL_TYPE*]->(r:NODE_TYPE {id:SPECIFIC_ID})
return n

neo4j: CYPHER query all properties of a node

We are evaluating Neo4j for future projects. Currently we are just experimenting with learning Cypher and its capabilities. But one thing that I think should be very straightforward has so far eluded me. I want to be able to see all properties and their values for any given node. In SQL that would be something like:
select * from TableX where ID = 12345;
I have looked through the latest Neo4j docs and run numerous Google searches, but so far I am coming up empty. I did find the keys() function, which returns the property names as a list of strings, but that is marginally useful at best. What I want is a query that will return property names and the corresponding values, like:
name     :  "Lebron"
city     :  "Cleveland"
college  :  "St. Vincent–St. Mary High School"
You may want to reread the Neo4j docs.
Returning the node itself will include the properties map for the node, which is typically the way you will get all properties (keys and values) for the node.
MATCH (n)
WHERE id(n) = 12345
RETURN n
If you explicitly just want the properties but without the metadata related to the node itself, returning properties(n) (assuming n is a node variable) will return the node's properties.
MATCH (n)
WHERE id(n) = 12345
RETURN properties(n) as props
With regard to how columns (variables) work, these are always explicit, so you don't have a way to dynamically get the columns corresponding to the properties of a node. You will instead need to use the approaches above, where the variable corresponds to either a node (where you can get at the properties map through the structure) or a properties map.
The main difference between this approach and select * in SQL is that Neo4j has no table schema. You can use whatever properties you want on nodes of the same type, and those properties can differ from node to node, so there is no common structure to reference that lists the properties for a node of a given label. To build such a list you would need to scan all nodes of that label and accumulate the distinct property names.
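The accumulation step mentioned above (collecting the distinct property names across every node of a label) can be sketched outside the database. This is a minimal Python sketch, assuming the nodes have already been fetched as plain dictionaries; the function name distinct_property_keys and the sample data are hypothetical and are not part of any Neo4j API.

```python
def distinct_property_keys(nodes):
    """Return the sorted union of property names across all given nodes."""
    keys = set()
    for props in nodes:
        keys.update(props)  # iterating a dict yields its keys
    return sorted(keys)


# Hypothetical property maps for three nodes that share one label.
players = [
    {"name": "Lebron", "city": "Cleveland"},
    {"name": "Dave", "college": "Example College"},
    {"name": "Ann"},
]

print(distinct_property_keys(players))  # ['city', 'college', 'name']
```

Because each node may carry a different set of properties, the result is only known after every node has been visited, which is why this is a full scan.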

Do labels and properties have an id value in Neo4j?

I know that nodes and relationships have an integral value that identifies them. Is the same true of labels and/or properties? Is it sufficient to identify a node labeling by giving a node id and a string? That is, is it possible to assign the same label to the same node/relationship more than once?
All nodes and relationships have IDs and can be looked up by their IDs. You can do it like this:
MATCH (n) WHERE id(n) = 5 RETURN n;
IDs should be thought of as an internal implementation detail; don't rely on them to have any particular value or to be consistent or ordered, just rely on them to uniquely identify each node.
In general, IMHO it's good practice to assign your own meaningful identifier to nodes, probably indexed, that you can use to find your nodes, in the same way you'd assign a primary key to a relational database record.
It is possible to assign a label to more than one node; labels should be thought of more as classes of nodes, sort of like entities in an ERD. Typically labels would be things like Person, Company, Job, etc. Labels don't have much to do with identifiers though.
Do labels and properties have an id value in Neo4j?
No
Is it sufficient to identify a node labeling by giving a node id and a string?
If by this you mean that you have the node id and a label value in a String variable then yes, it is. It is also sufficient to just use the ID.
MATCH (n) WHERE ID(n) = 1234 RETURN n
Better:
MATCH (n:YourLabel) WHERE ID(n) = 1234 RETURN n
As FrobberOfBits intimated, it is also preferable to ignore (within reason) the internal IDs that Neo attaches to nodes and relationships in your external interactions. This is in part because Neo recycles its internal identifiers (if you create a node it gets assigned ID 1; delete that node, and the next created node could be assigned ID 1), and in part because using them exposes the internals of the system. Instead you should probably attach meaningful identifiers or UUIDs where required.
Using your own identifier:
MATCH (n:YourLabel{uid:1234}) RETURN n
To make lookups fast, index them:
CREATE INDEX ON :YourLabel(uid)
Is it possible to assign the same label to the same node/relationship more than once?
Nodes can have as many labels as you want to assign them (including none), which is handy when you want to maintain a hierarchy of node "types". Having a label makes them faster to look up, as Neo has a hint of where to start.
Relationships can only have a single type, and that type is immutable, i.e. if you want to change from type HAS_A to HAD_A you cannot just change the type; you must delete and re-add the relationship.
A node can be related to another node as many times as you want, using the same relationship type or different relationship types. Performance-wise it is better to have different relationship types than to use properties on relationships, as the lookup is faster.
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_PET{type:"Cat"}]->(m),
(p)-[:HAS_PET{type:"Dog"}]->(d)
Is fine, as is:
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_CAT]->(m),
(p)-[:HAS_DOG]->(d)
Which is now faster if you wanted to do a query for matching all dogs. If Dave's dog gets reassigned you cannot do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_DOG]->()
SET rel:HAS_CAT
But you could do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_PET{type:"Dog"}]->()
SET rel.type = "Cat"
But I've probably missed the point of the multiple-assignment question.
Not sure what you're aiming for:
Yes, property names, relationship types and labels have internal IDs; they are not stored as strings.
You can identify nodes by label + property + value: a single node if you have a uniqueness constraint, otherwise possibly several.
You can identify nodes by label (many nodes).

How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in neo4j/cypher and I came across this issue:
There are two types of graphs I want to search for:
Star graphs: A graph with one center node and multiple outgoing relationships.
n-length line graphs: A line graph with length n where none of the nodes are repeats (I have some bidirectional edges and cycles in my graph)
So the main problem is that when I do something such as:
MATCH a-->b, a-->c, a-->d
MATCH a-->b-->c-->d
Cypher doesn't guarantee (when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with
WHERE not(a=b) AND not(a=c) AND ...
But I'm trying to match graphs of size 10+, so checking equality between all pairs of nodes isn't a viable option. Afaik, RETURN DISTINCT does not work either, since it doesn't check equality among variables, only across different rows. Is there any simple way I can specify the query to make the differently named nodes distinct?
Old question, but look to APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API...which these procedures use).
Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that within each path returned a relationship must be unique: it cannot be used multiple times in a single path.
While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes or a subgraph or to prevent repeating nodes in a path.
For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path, no repeats in a path:
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
RETURN path
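To make the NODE_PATH rule concrete, here is a minimal Python sketch of the idea (not APOC's implementation): a depth-first expansion over outgoing edges that refuses to revisit a node already on the current path. The adjacency dictionary, the node names and the function name expand_node_path are hypothetical, and the sketch only yields paths with at least one relationship.

```python
def expand_node_path(graph, start, max_level):
    """Yield every outgoing path from start (as a list of nodes) with
    1..max_level relationships in which no node appears twice."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) - 1 == max_level:
            continue  # depth limit reached, do not expand further
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # NODE_PATH: no repeated node within a path
                stack.append(path + [nxt])


# Hypothetical directed graph containing cycles (a <-> b, d -> a).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": ["a"]}

for path in sorted(expand_node_path(graph, "a", 3)):
    print(" -> ".join(path))
```

The same node can still show up in several different paths (c is reached both directly and through b); the restriction applies only within a single path.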
If you want all reachable nodes up to a certain depth (or at any depth by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
NODE_GLOBAL uniqueness means that a node must be unique across all paths: it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior relationships that expand to an already visited node will not be traversed.
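The NODE_GLOBAL behavior described above is essentially a breadth-first search with a global visited set. A minimal Python sketch of that idea follows; it is an illustration under assumptions, not APOC's code, and the adjacency dictionary and the function name reachable_nodes are hypothetical.

```python
from collections import deque


def reachable_nodes(graph, start, max_level=None):
    """Breadth-first expansion over outgoing edges that visits each node at
    most once; max_level=None means no depth limit."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_level is not None and depth == max_level:
            continue  # depth limit reached for this branch
        for nxt in graph.get(node, []):
            if nxt not in visited:  # NODE_GLOBAL: skip nodes seen on any path
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical directed graph containing cycles.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": ["a"]}

print(sorted(reachable_nodes(graph, "a")))               # ['a', 'b', 'c', 'd']
print(sorted(reachable_nodes(graph, "a", max_level=1)))  # ['a', 'b', 'c']
```

In this sketch the edge b -> c is never followed, because c has already been reached directly from a. That is the trade-off noted above: far fewer paths to evaluate, but not every relationship gets traversed.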
You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, as we will only capture a single path to each node, not all possible paths to nodes). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
Richer options exist for label and relationship filtering, or filtering (whitelist, blacklist, endnode, terminator node) based on lists of pre-matched nodes.
We also support repeating sequences of relationships or node labels.
If you need filtering by node or relationship properties during expansion, then this won't be a good option, as that feature is not yet supported.

Find all sub-graphs containing at least one node having a certain property

My graph is composed of multiple "sub-graphs" that are disconnected from one another. These sub-graphs are composed of nodes that are connected by a given relationship type.
I would like to get (for example) the list of sub-graphs that contain at least one node whose "name" property equals "John".
It's equivalent to finding one node per subgraph having this property.
One solution would be to find all the nodes having this property and loop through this list to only pick the ones that are not connected to the previously picked ones. But that would be ugly and quite heavy. Is there an elegant way to do that with Cypher?
I'm trying with something along this direction but have no success so far:
START source=node:user('name:"John"')
MATCH source-[r?:KNOWS*]-target
WHERE r is null
RETURN source
Try this one; it may help:
START source=node:user('name:"John"')
MATCH source-[r:KNOWS]-()-[r2:KNOWS]-target
WHERE NOT(source-[r:KNOWS]-target)
RETURN target
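For comparison, the approach described in the question (pick one matching node per disconnected sub-graph) can be sketched outside Cypher. This is a minimal Python sketch, assuming the KNOWS relationships and the ids of the matching nodes have already been fetched; the function name one_match_per_component and the sample ids are hypothetical.

```python
from collections import deque


def one_match_per_component(edges, matches):
    """Return one node from matches per connected component, treating the
    edges as undirected."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = set()
    picked = []
    for node in matches:
        if node in seen:
            continue  # its component already has a representative
        picked.append(node)
        seen.add(node)
        queue = deque([node])
        while queue:  # flood-fill the rest of this component
            current = queue.popleft()
            for nxt in neighbours.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return picked


# Hypothetical data: three components {1, 2, 3}, {4, 5} and {6, 7}.
knows = [(1, 2), (2, 3), (4, 5), (6, 7)]
johns = [1, 3, 5]  # ids of the nodes whose name is "John"

print(one_match_per_component(knows, johns))  # [1, 5]
```

Each component is flooded at most once, so the work is linear in the number of nodes and relationships rather than requiring a comparison between every pair of matching nodes.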

Resources