How could I use this SQL query in Cypher (Neo4j)?

Hi, how can I transform this SQL query into a Cypher query?
SELECT n.enginetype, n.Rocket20, n.Yearlong, n.DistanceOn
FROM TIMETAB AS n
JOIN PLANEAIR AS p ON (n.tailnum = p.tailNum)
If something has to be done before using that query (creating a relationship or anything else), please explain that too. Thanks.

Here's a good guide for comparing SQL with Cypher and showing the equivalent Cypher for some SQL queries.
If we were to translate this directly, we'd use :PLANEAIR and :TIMETAB node labels (though I'd recommend using better names for these), and we'll need a relationship between them. Let's call it :RELATION.
Joins in SQL tend to be replaced with relationships between nodes, so we'll need to create these patterns in your graph:
(:PLANEAIR)-[:RELATION]->(:TIMETAB)
There are several ways to get your data into the graph, usually through LOAD CSV. The general approach is to MERGE your :PLANEAIR and :TIMETAB nodes on some id or unique property (maybe tailNum?), use ON CREATE SET ... after the MERGE to add the rest of the properties when the node is created, and then MERGE the relationship between the nodes.
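A minimal sketch of that LOAD CSV approach follows. It assumes two hypothetical CSV files (planeair.csv and timetab.csv) whose headers match the SQL column names, plus a hypothetical unique id column for timetable rows. It treats enginetype as a plane attribute and stores Yearlong and DistanceOn as year and distance, matching the property names in the query further down; adjust all of these to your actual data.

```cypher
// Sketch only: file names, column names and the `id` column are assumptions.
// LOAD CSV yields every value as a string, hence the toInteger() conversions.

// 1) One :PLANEAIR node per tail number.
LOAD CSV WITH HEADERS FROM 'file:///planeair.csv' AS row
MERGE (p:PLANEAIR {tailNum: row.tailNum})
ON CREATE SET p.enginetype = row.enginetype;

// 2) One :TIMETAB node per timetable row, then link it to its plane.
LOAD CSV WITH HEADERS FROM 'file:///timetab.csv' AS row
MERGE (n:TIMETAB {id: row.id})
ON CREATE SET n.Rocket20 = row.Rocket20,
              n.year     = toInteger(row.Yearlong),
              n.distance = toInteger(row.DistanceOn)
WITH n, row
MATCH (p:PLANEAIR {tailNum: row.tailnum})
MERGE (p)-[:RELATION]->(n);
```

Because both nodes and the relationship are MERGEd, re-running the import should not create duplicates.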
The MERGE section of the developers manual should be helpful here, though I'd recommend reading through the entire dev manual anyway.
With this in place, the Cypher equivalent query is:
MATCH (p:PLANEAIR)-[:RELATION]->(n:TIMETAB)
RETURN n.Rocket20, p.enginetype, n.year, n.distance
Now this is just a literal translation of your SQL query. You may want to reconsider your model, however, as I'm not sure how much value there is in keeping time-related data for a plane separate from its node. You may just want to have all of the :TIMETAB properties on the :PLANEAIR node and do away with the :TIMETAB nodes completely. Of course your queries and use cases should guide how to model that data best.
EDIT
As far as creating the relationship between :PLANEAIR and :TIMETAB nodes goes (and again, I recommend using better labels for these, and maybe even keeping all time-related properties on a :Plane node instead of a separate one), provided you already have those nodes created, you'll need to do a joining match. It will help to have unique constraints on :PLANEAIR(tailNum) and :TIMETAB(tailNum) (or an index, if this isn't supposed to be a unique property):
CREATE CONSTRAINT ON (p:PLANEAIR)
ASSERT p.tailNum IS UNIQUE;
CREATE CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE;
Now we're ready to create the relationships
MATCH (p:PLANEAIR)
MATCH (n:TIMETAB)
WHERE p.tailNum = n.tailNum
CREATE (p)-[:RELATION]->(n)
REMOVE n.tailNum
Now that the relationships are created and the :TIMETAB tailNum property is removed, we can drop the unique constraint on :TIMETAB(tailNum), since the relationship to :PLANEAIR is all we need.
DROP CONSTRAINT ON (n:TIMETAB)
ASSERT n.tailNum IS UNIQUE
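The CREATE CONSTRAINT ON ... ASSERT form above is the Neo4j 3.x/4.x syntax; it was deprecated in 4.4 and removed in Neo4j 5. A rough equivalent in the newer syntax is sketched below, with made-up constraint and index names. Since one plane usually has many timetable rows, tailNum is probably not unique on :TIMETAB, so this sketch uses a plain index there, as suggested above.

```cypher
// Neo4j 4.4+/5 syntax; constraint and index names are hypothetical.
CREATE CONSTRAINT planeair_tailnum IF NOT EXISTS
FOR (p:PLANEAIR) REQUIRE p.tailNum IS UNIQUE;

// tailNum repeats across timetable rows, so index it instead of constraining it.
CREATE INDEX timetab_tailnum IF NOT EXISTS
FOR (n:TIMETAB) ON (n.tailNum);

// Drive the join from :TIMETAB so each row does a lookup on :PLANEAIR(tailNum).
MATCH (n:TIMETAB)
MATCH (p:PLANEAIR {tailNum: n.tailNum})
CREATE (p)-[:RELATION]->(n)
REMOVE n.tailNum;

// With tailNum gone from :TIMETAB, its index is no longer needed.
DROP INDEX timetab_tailnum IF EXISTS;
```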

Related

How to move all neo4j relationships with all their labels and properties from one node to another?

Suppose you've got two nodes that represent the same thing, and you want to merge those two nodes. Both nodes can have any number of relations with other nodes.
The basics are fairly easy, and would look something like this:
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->()
CREATE (a)-[s]->()
SET s = PROPERTIES(r)
DELETE DETACH b
Only I can't create a relation without a type. And Cypher doesn't support variable labels either. I'd love to be able to do something like
CREATE (a)-[s:{LABELS(r)}]->(o)
but that doesn't work. To create the relation, you need to know the type of the relation, and in this case I really don't.
Is there a way to dynamically assign types to relationships, or am I going to have to query the types of the old relation, and then string concat new queries with the proper types? That's not impossible, but a lot slower and more complex. And this could potentially match a lot of elements and even more relationships, so having to generate a separate query for every instance is going to slow things down quite a lot.
Or is there a way to change the target of the old relationship? That would probably be the fastest, but I'm not aware of any way to do that.
I think you need to take a look at APOC, especially apoc.create.relationship, which enables creating relationships with a dynamic type.
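Adapting your example, you should end up with something along the lines of (not tested):
// id(a) < id(b) stops a node from pairing with itself (and being deleted) and keeps one node per duplicate pair
MATCH (a), (b) WHERE a.id = b.id AND id(a) < id(b)
MATCH (b)-[r]->(n)
// YIELD is required when a procedure is called inside a larger query
CALL apoc.create.relationship(a, type(r), properties(r), n) YIELD rel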
DETACH DELETE b
NB
relationships have a TYPE, not a label
the proper Cypher statement to delete a node together with its relationships is DETACH DELETE (not DELETE DETACH)
Related resource: https://markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/
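The query in this answer only moves b's outgoing relationships, and if b has none, the MATCH produces no rows, so b is never deleted. The sketch below (also untested) assumes Neo4j 4.x or later for CALL { } subqueries and an installed APOC. It copies incoming relationships as well and always deletes b, because the count() in each subquery returns exactly one row even when nothing matches. It also assumes at most two nodes share an id; for larger groups, apoc.refactor.mergeNodes is the safer choice.

```cypher
// Untested sketch: requires APOC and Neo4j 4.x+ (CALL { } subqueries).
// id(a) < id(b) keeps one node of each duplicate pair and never pairs a node with itself.
MATCH (a), (b)
WHERE a.id = b.id AND id(a) < id(b)
CALL {
  WITH a, b
  MATCH (b)-[r]->(n)
  CALL apoc.create.relationship(a, type(r), properties(r), n) YIELD rel
  RETURN count(rel) AS outgoingCopied
}
CALL {
  WITH a, b
  MATCH (b)<-[r]-(n)
  CALL apoc.create.relationship(n, type(r), properties(r), a) YIELD rel
  RETURN count(rel) AS incomingCopied
}
DETACH DELETE b;
```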
The APOC procedure apoc.refactor.mergeNodes should be very helpful. That procedure is very powerful, and you need to read the documentation to understand how to configure it to do what you want in your specific situation.
Here is a simple example that shows how to use the procedure's default configuration to merge nodes with the same id:
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {}) YIELD node
RETURN node
In this example, I specified an arbitrary Foo label to avoid accidentally merging unwanted nodes. Doing so also helps to speed up the query if you have a lot of nodes with other labels (since they will not need to be scanned for the id property).
The aggregating function COLLECT is used to collect a list of all the nodes with the same id. After checking the size of the list, it is passed to the procedure.
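The default configuration may not handle conflicting property values the way you want. A sketch that passes a configuration map instead, assuming the properties and mergeRels options described in the APOC documentation (check the docs for your APOC version, since options have changed over time):

```cypher
// Same grouping as above; only the config map differs.
// properties: 'combine' keeps differing values as a list instead of overwriting them;
// mergeRels: true merges relationships of the same type that end up between the same nodes.
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {properties: 'combine', mergeRels: true}) YIELD node
RETURN node;
```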

Neo4j ordered tree

We are working with a hierarchy tree structure where a parent has zero or more children, and a child has either one or zero parents. When we query for a list of direct children for a given parent the query returns the children in random order. We need the children to return in the order we define when we create or update the children.
I have added a relationship between children -[:Sibling]-> so the 'top' sibling has only an incoming :Sibling relationship, and the 'bottom' sibling has only an outgoing relationship.
Given this, is there a Cypher query to return the children in sibling order?
I have a query that returns each child, and its sibling, but now I have to write some code to return the list in the correct order.
An alternative approach might be to add a sort number to each child node. This would need to be updated for all the children if one of them changes order. This approach seems slightly foreign to the graph database concept.
If this problem has been encountered before, is there a standard algorithm for solving it programmatically?
Update 1
Sample data, as requested by Bruno:
(parent1)
(child1)-[:ChildOf]->(parent1)
(child2)-[:ChildOf]->(parent1) (child2)-[:Sibling]->(child1)
(child3)-[:ChildOf]->(parent1) (child3)-[:Sibling]->(child2)
Is there a Cypher query to return child1, child2, child3 in that order?
If not, then the ordering can be done programmatically, using properties instead of relationships:
(parent1)
(child1)-[:ChildOf]->(parent1) (child1 {order: 1})
(child2)-[:ChildOf]->(parent1) (child2 {order: 2})
(child3)-[:ChildOf]->(parent1) (child3 {order: 3})
`MATCH (c)-[:ChildOf]->(parent1) RETURN c ORDER BY c.order`
I do not expect that there is a cypher query that can update the order of children.
Update 2
I have now arrived at the following query, which returns the children in the right order:
`match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling`
This query depends on adding a -[:FirstChildOf]->(parent) relationship.
If I don't hear otherwise, I'll mark this as the answer.
Shall I assume there is no Cypher query for inserting a node into an ordered list?
When you create your graph model, you should concentrate on the questions you want your data model to answer.
If you want to get the children of a parent ordered by a "create or update" property, then you should store that property, since a relationship in general does not represent an order.
It is not foreign to the graph database concept, since graph databases use properties too. It is not always an easy task to decide whether to store something as a relationship or a property; it is all about proper modelling.
If you have the concept of '[:NEXT_SIBLING]' or similar, it will be a pain when you have to remove a node, so I would use it only when this state is constant: for example, in a time tree, where the days follow each other and that never changes.
In general, if the creation order should be used, I would use timestamps like this:
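To illustrate that maintenance cost with the :Sibling chain from the question (where each child points to the one before it), here is a sketch of inserting a new child directly after an existing one. The name properties used to find the nodes are hypothetical, since the sample data has no properties; removing a node needs similar relinking in reverse.

```cypher
// Untested sketch. Chain direction follows the question: later -[:Sibling]-> earlier.
// Insert 'childNew' right after 'child2', i.e. between child2 and child3.
MATCH (parent {name: 'parent1'})<-[:ChildOf]-(prev {name: 'child2'})
// The child currently after prev, if any (null when prev is the last child).
OPTIONAL MATCH (nextChild)-[old:Sibling]->(prev)
CREATE (added {name: 'childNew'})-[:ChildOf]->(parent)
CREATE (added)-[:Sibling]->(prev)
// Unlink nextChild -> prev (DELETE of null is a no-op), then relink nextChild -> added.
DELETE old
FOREACH (c IN CASE WHEN nextChild IS NULL THEN [] ELSE [nextChild] END |
  CREATE (c)-[:Sibling]->(added))
```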
create (n:Node {created:timestamp()});
Then you can use the timestamp to order.
match (p:Parent)-[:HAS_CHILDREN]->(n:Child) where p.name='xy' return n order by n.created;
And you can use timestamps for relationships too.
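For example, a sketch that stores the timestamp on the HAS_CHILDREN relationship instead of on the child, reusing the labels from the query above; the child's name value is made up:

```cypher
// Record when the link was created, not when the child node was created.
MATCH (p:Parent {name: 'xy'})
CREATE (p)-[:HAS_CHILDREN {created: timestamp()}]->(:Child {name: 'first'});

// Order the children by the relationship's timestamp.
MATCH (p:Parent {name: 'xy'})-[r:HAS_CHILDREN]->(n:Child)
RETURN n
ORDER BY r.created;
```

This variant fits when a child can be attached to a parent later than it was created.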
A similar question is here:
Modeling an ordered tree with neo4j
Here are some tips I use for dealing with ordered lists of nodes in Neo4J.
Reverse the relation direction to (child1)<-[:HasChild]-(parent1). (mostly just logical reinforcement of the next items, since a "parent has list of children")
Add the property index to HasChild. This will let you do WITH child, hasChild.index as sid ORDER BY sid ASC for sibling order. (This is how I maintain arbitrary order information on lists of children) This is on the relationship because it assumes that a node can be part of more than one ordered list.
You can use LENGTH(shortestPath((root)-[*]->(child1))) as depth to then order them by depth from a root.
Since this is for arbitrary order, you must update the indexes to change the order. (You can do something like MATCH (parent)-[r:HasChild]->() WITH COLLECT(r) AS rels FOREACH (x IN [r IN rels WHERE r.index >= 3] | SET x.index = x.index + 1) for a basic insert; otherwise you will have to just rewrite them all to arbitrarily change the order.)
If you don't actually care about the order, just that it is consistent, you can use WITH child, ID(child) as sid ORDER BY sid ASC. This is essentially "order by node age", though only approximately, since Neo4j can reuse the IDs of deleted nodes.
One other option is to use a meta relationship. So :HasChild would act as a list of nodes, and then something like :NextMember would tell you what the next item from this one is. This is more flexible, but in my opinion harder to work with (you need to handle the case where there is no next node, you have to "trace" through the nodes to get the right order, it doesn't work if you want to add this node to another ordered list later, etc.)
Of course, if the order isn't arbitrary (based on name or age or something), then it is much better to just sort on the non-arbitrary logic.
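A sketch of the index-on-relationship approach for inserting a new child at a given position: shift every sibling at or after that position, then create the new relationship. The name properties and the target position 3 are hypothetical.

```cypher
// Untested sketch: insert a new child at position 3 of parent1's ordered list.
MATCH (parent {name: 'parent1'})
OPTIONAL MATCH (parent)-[r:HasChild]->()
WHERE r.index >= 3
// Make room: move every sibling at position >= 3 down by one (SET on null is a no-op).
SET r.index = r.index + 1
WITH DISTINCT parent
CREATE (parent)-[:HasChild {index: 3}]->({name: 'childNew'});

// Reading the list back in order.
MATCH (parent {name: 'parent1'})-[r:HasChild]->(child)
RETURN child
ORDER BY r.index ASC;
```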
The query for returning the children in the right order is
match (firstChild)-[:FirstChildOf]->(parent) match (sibling)-[:Sibling*]->(firstChild) return firstChild,sibling
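Cypher does not guarantee row order without ORDER BY, so the query above may not always return the siblings in chain order. A sketch that also includes the first child (via *0..) and sorts by distance along the chain; the name property on the parent is hypothetical:

```cypher
MATCH (firstChild)-[:FirstChildOf]->(parent {name: 'parent1'})
// *0.. also matches firstChild itself, with a path length of 0.
MATCH p = (sibling)-[:Sibling*0..]->(firstChild)
RETURN sibling
ORDER BY length(p);
```

With the sample data (child2 -> child1, child3 -> child2), this should return child1, child2, child3.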

Cypher: Create relationships between nodes based on a common property key id

I'm brand new to Cypher (and Stack Overflow) and am having trouble creating relationships between nodes based on shared property keys.
I would like to do something like this:
MATCH (a:Person)-->()<--(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
to create a relationship between Country node and Person nodes where they share the same id.
The above creates no errors when run but doesn't create any relationships either and I know that the ids should match.
Many thanks!!
EDIT:
I think I know what is going wrong: I'm asking to match nodes that have a relationship to each other, but no relationships are set up yet, hence 0 results. I have now tried:
MATCH (a:Person),
(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
and the query is running. It's a big data set, so it might take a while...
That worked. I had to reduce the size of my data set (down from 64k nodes) because Neo4j was taking way too long to process it, but once I had a smaller set it worked fine.
One minor addition for future Googlers.
Per the help files, as of version 3.4:
The has() function has been superseded by exists() and has been removed.
The new code should read
MATCH (a:Person),
(b:Country)
WHERE EXISTS (a.id) AND EXISTS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
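exists() on a property was itself deprecated during the Neo4j 4.x series and removed in Neo4j 5, in favor of IS NOT NULL. A sketch for newer versions follows. The null checks are redundant anyway: a.id = b.id evaluates to null, and the row is filtered out, whenever either side is null.

```cypher
// Neo4j 5 form: IS NOT NULL replaces exists(property).
MATCH (a:Person), (b:Country)
WHERE a.id IS NOT NULL AND b.id IS NOT NULL AND a.id = b.id
CREATE (a)-[:LIVES]->(b);
```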

Is a DFS Cypher Query possible?

My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind: they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to traverse the graph along a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if I raise the values, the queries won't finish. I think that Cypher looks at each outgoing relationship at every path depth from the start node, but as I already explained, there is only one right path. So it should do some kind of DFS instead of a BFS. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:relationship_auto_index(GameID = "3")
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH clause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since Neo4j reuses its internally generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate your own unique IDs, store them in your nodes, and use them for your GameIDs; if you do this, you should also create a uniqueness constraint for your own IDs, which, as a nice side effect, automatically creates an index for them.
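START and legacy auto-indexes no longer exist in current Neo4j versions. A rough modern sketch: a property map on a variable-length relationship pattern requires every hop to match, so only relationships with the right GameID should be followed while the path is expanded. It assumes, as in the question, that GameID holds the start node's ID as a string. It has not been tested on a graph of this size.

```cypher
// Untested sketch for current Cypher. GameID is compared with a string,
// because the question stores it as STR(id(n)).
MATCH (n) WHERE id(n) = 3
MATCH p = (n)-[:move* {GameID: '3'}]->(m)
// Each game has a single path, so the longest match covers the whole game.
RETURN nodes(p) AS positions
ORDER BY length(p) DESC
LIMIT 1;
```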

Why do relationships as a concept exist in neo4j or graph databases in general?

I can't seem to find any discussion on this. I had been imagining a database that was schemaless, node-based and hierarchical, and one day I decided it was too common sense to not exist, so I started searching around, and Neo4j is about 95% of what I imagined.
What I didn't imagine was the concept of relationships. I don't understand why they are necessary. They seem to add a ton of complexity to all topics centered around graph databases, but I don't quite understand what the benefit is. Relationships seem to be almost exactly like nodes, except more limited.
To explain what I'm thinking, I was imagining starting a company, so I create myself as my first nodes:
create (u:User {name:"mindreader"});
create (c:Company {name:"mindreader Corp"});
One day I get a customer, so I put his company into my db.
create (c:Company {name:"Customer Company"});
create (u:User {name:"Customer Employee1"});
create (u:User {name:"Customer Employee2"});
I decide to link users to their customers
match (u:User) where u.name =~ "Customer.*"
match (c:Company) where c.name =~ "Customer.*"
create (u)-[:Employee]->(c);
match (u:User) where u.name = "mindreader"
match (c:Company) where c.name =~ "mindreader.*"
create (u)-[:Employee]->(c);
Then I hire some people:
match (c:Company) where c.name =~ "mindreader.*"
create (u1:User {name:"Employee1"})-[:Employee]->(c)
create (u2:User {name:"Employee2"})-[:Employee]->(c);
One day hr says they need to know when I hired employees. Okay:
match (c:Company)<-[r:Employee]-(u:User)
where c.name =~ "mindreader.*" and u.name =~ "Employee.*"
set r.hiredate = '2013-01-01';
Then hr comes back and says hey, we need to know which person in the company recruited a new employee so that they can get a cash reward for it.
Well now what I need is for a relationship to point to a user but that isn't allowed (:Hired_By relationship between :Employee relationship and a User). We could have an extra relationship :Hired_By, but if the :Employee relationship is ever deleted, the hired_by will remain unless someone remembers to delete it.
What I could have done in neo4j was just have a
(u:User)-[:hiring_info]->(hire_info:HiringInfo)-[:hired_by]->(recruiter:User)
In which case the relationships only confer minimal information, the name.
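A minimal sketch of that HiringInfo alternative, with hypothetical values following the pattern above. Because hired_by hangs off the HiringInfo node, running DETACH DELETE on that node also removes the link, which avoids the dangling :Hired_By problem described earlier.

```cypher
// Record that mindreader hired Employee1 on 2013-01-01.
MATCH (u:User {name: 'Employee1'}), (recruiter:User {name: 'mindreader'})
CREATE (u)-[:hiring_info]->(:HiringInfo {hire_date: date('2013-01-01')})-[:hired_by]->(recruiter);

// Who hired whom, and when.
MATCH (u:User)-[:hiring_info]->(h:HiringInfo)-[:hired_by]->(recruiter:User)
RETURN u.name, h.hire_date, recruiter.name;
```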
What I originally envisioned was that there would be nodes, and then each property of a node could be a datatype or it could be a pointer to another node. In my case, a user record would end up looking like:
User {
name: "Employee1"
hiring_info: {
hire_date: "2013-01-01"
hired_by: u:User # -> would point to a user
}
}
Essentially it is still a graph. Nodes point to each other. The name of the relationship is just a field in the origin node. To query it you would just go
match (u:User) where ... return u.name, u.hiring_info.hire_date, u.hiring_info.hired_by.name
If you needed a one-to-many relationship of the same type, you would just have a collection of pointers to nodes. If you referenced a collection in return, you'd get essentially a join. If you delete hiring_info, it would delete the pointer. References to other nodes would not have to be a disorganized list at the top level of a node. Furthermore, when I query each user I will know all of the info about a user without both querying for the user itself and also all of its relationships. I would know his name and the fact that he hired someone in the same query. From the database backend, I'm not sure much would change.
I see quite a few questions from people asking whether they should use nodes or relationships to model this or that, and occasionally people asking for a relationship between relationships. It feels like the XML problem, where you wonder whether a piece of information should be its own tag or just an attribute of its parent tag.
The query engine goes to great pains to handle relationships, so there must be some huge advantage to having them, but I can't quite see it.
Different databases are for different things. You seem to be looking for a noSQL database.
This is an extremely wide topic area that you've reached into, so I'll give you the short of it. There's a spectrum of database schemas, each of which have different use cases.
NoSQL aka Non-relational Databases:
Every object is a single document. You can have references to other documents, but any additional traversal means making another query. Use these when you don't often have relationships between your data, and usually just want to query once and get back a large amount of flexibly-stored data as a single document. (Note: these are not "nodes". Nodes have a very specific definition that implies there are edges.)
SQL aka Relational Databases:
This is table land, this is where foreign keys and one-to-many relationships come into play. Here you have strict schemas and very fast queries. This is honestly what you should use for your user example. Small amounts of data where the relationships between things are shallow (You don't have to follow a relationship more than 1-2 times to get to the relevant entry) are where these excel.
Graph Database:
Use this when relationships are key to what you're trying to do. The most common example of a graph is something like a social graph, where you're connecting different users together and need to follow relationships for many steps. (Figure out if two people are connected within a depth of 4, for instance.)
Relationships exist in graph databases because that is the entire concept of a graph database. It doesn't really fit your application, but to be fair you could just keep more in the node part of your database. In general the whole idea of a database is something that lets you query a LOT of data very quickly. Depending on the intrinsic structure of your data there are different ways that that makes sense. Hence the different kinds of databases.
In strongly connected graphs, Neo4j is 1000x faster on 1000x the data than a SQL database. NoSQL would probably never be able to perform in a strongly connected graph scenario.
Take a look at what we're building right now: http://vimeo.com/81206025
Update: In reaction to mindreader's comment, we added the related properties to the picture:
RDBM systems are tabular and put more information in the tables than the relationships. Graph databases put more information in relationships. In the end, you can accomplish much the same goals.
However, putting more information in relationships can make queries smaller and faster.
Here's an example:
Graph databases are also good at storing human-readable knowledge representations, being edge (relationship) centric. RDF takes it one step further, where all information is stored as edges rather than nodes. This is ideal for working with predicate logic, propositional calculus, and triples.
Maybe the right answer is an object database.
Objectivity/DB, which now supports a full suite of graph database capabilities, allows you to design complex schemas with one-to-one, one-to-many, many-to-one, and many-to-many reference attributes. It has the semantics to view objects as graph nodes and edges. An edge can be just the reference attribute from one node to another, or an edge can exist as an edge object that sits between two nodes.
An edge object can have any number of attribute and can have references off to other objects, as shown in the diagram below.
Being able to "hang" complex objects off of an edge allows Objectivity/DB to support weighted queries, where the edge weight can be calculated using a user-defined weight calculator operator. The weight calculator operator can build the weight from a static attribute on the edge or by digging down through the objects connected to the edge. In the picture above, we could create an edge-weight calculator that computes the sum of the CallDetail lengths connected to the Call edge.
