neo4j: file import, relationship match, create new relationship - neo4j

I know how this would be accomplished in SQL but having difficulty wrapping my brain around how to do this in cypher..
Basically working on a master data setup where a user has a master_id (node) and need to use an existing relationship property to determine the master_id in order to create a new relationship between the master_id node and a location node.
Currently have master users created as nodes that contain a master_id property. A relationship is created between the master user and a brand, and the relationship has a property of brand_user_id.
I now have another file I need to import that contains data at the brand_user level, but need to create the relationship between the master_id and a location node. In order to do this because the file does not contain the master_id property, I am attempting to use the new file to lookup master_id's based on the existing relationship with the brand, then use that master_id to create the new relationship with the location.
Have this relationship:
(m:Master{master_id:12345})-[:IS_BRAND_USER{brand_user_id:9876}]->(b:Brand{name:"Acme"})
Have this file:
brand_user_id,location_id
9876,6
Need this relationship:
(m:Master{master_id:12345})-[:HAS_LOCATION]->(l:Location{id:6})
My approach:
LOAD CSV WITH HEADERS FROM "file:///brand_user_ids.csv" as buid
MATCH (m:Master)-[r:IS_BRAND_USER{brand_user_id:buid.id}]->(b:Brand)
WITH m, buid.location_id AS location_id
MATCH (l:Location {id: location_id})
CREATE (m)-[:HAS_LOCATION {source: 'abcdef'}]->(l)
Seems to run for an extremely long time and not seeing any real progress after an hour so I'm wondering if this is fundamentally the correct approach or not, or if I have inadvertently created some horrific cross join equivalent.

The problem is that you are trying to enter a graph on a relationship. And that always requires lots of "scanning the graph".
Now, I'm not a specialist in your domain, but you might be missing a type of nodes here ... BrandUser. And there could be several reasons for that :
Based on your data a Master can have many BrandUser id's. Potentially even more than one per Brand ? Do you have other properties that make sense on the BrandUser level ?
That Location data is strange. Wouldn't you agree it's actually the BrandUser that has a location and that a Master may have many locations ?
The most important reason is however ... if you're going to enter the graph on the brand_user_id all the time (and judging from the location example that may be the case) ... you've got the reason to turn it into a node right there.
So ... it's a modeling issue really.
Hope this helps.
Regards,
Tom

Related

neo4j - Relationship between three nodes

I'm totally new to Neo4j and I'm testing it in these days.
One issue I have with it is how to correctly implement a relationship which involves 3 different nodes using Spring Data. Suppose, for example, that I have 3 #NodeEntitys: User, Tag and TaggableObject.
As you can argue, a User can add a Tag to a TaggableObject; I model this operation with a #RelationshipEntity TaggingOperation.
However, I can't find a simple way to glue the 3 entities inside the relationship. I mean, the obvious choice is to set #StartNode User tagger and #EndNode TaggedObject taggedObject; but how can I also add the Tag to the relationship?
This is called a "hyperedge", I believe, and it's not something that Neo4j supports directly. You can create an additional node to support it, tough. So you could have a TagEvent node with a schema like so:
(:User)-[:PERFORMED]->(:TagEvent)
(:Tag)<-[:USED]-(:TagEvent)
(:TagObject)<-[:TAGGED]-(:TagEvent)
Another alternative is to store a foreign key as a property on a relationship or a node. Obviously that's not very graphy, but if you just need it for reference that might not be a bad solution. Just remember to not use the internal Neo4j ID as in future versions that may not be dependable. You should create your own ID for this purpose.

Cypher: Create relationships between nodes based on a common property key id

I'm brand new to Cypher (and Stackoverflow) and am having trouble creating relationships between nodes based on share property keys.
I would like to do something like this:
MATCH (a:Person)-->()<--(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
to create a relationship between Country node and Person nodes where they share the same id.
The above creates no errors when run but doesn't create any relationships either and I know that the ids should match.
Many thanks!!
EDIT:
I think I know what is going wrong - I'm asking to match nodes that have a relationship to eachother but no relationships are set up yet hence 0 results. I have now tried:
MATCH (a:Person),
(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
and the query is running. It's a big data set so might take a while......
That worked. Had to reduce the size of my data set (down from 64k nodes) as Neo4j was taking way too long to process but once I had a smaller set it worked fine.
One minor addition for future Googlers.
per the help files as of version 3.4
The has() function has been superseded by exists() and has been removed.
The new code should read
MATCH (a:Person),
(b:Country)
WHERE EXISTS (a.id) AND EXISTS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);

Dynamic Nodes and relationship creation in Neo4j

I am storing Hierarchy data in Neo4j. I want to store history of the Node. Consider I have a label called GROUP and the earlier name was "MARKETING" now it has been changed to "MARKET123". So i want to create a new node where the name will be MARKET123 and the create a relationship with other connected node same as for the older node named "MARKETING"...
But all this i want to do dynamically instead of passing the other Nodes name and the relationship value in the cypher query.
Please suggest me how it can be done.
You can add versioning to your graph nodes.
Here is a graph gist about time-based versioning that you may adapt to your needs.
http://www.neo4j.org/graphgist?608bf0701e3306a23e77

Neo4j Relationship Schema Indexes

Using Neo4j 2.1.4 and SDN 3.2.0.RELEASE
I have a graph that connects nodes with relationships that have a UUID associated with them. External systems use the UUID as a means to identify the source and target of the relationship. Within Spring Data Neo4j (SDN) we have a #RelationshipEntity(type=”LINKED_TO”) class with a #StartNode, #EndNode and a String uuid field. The uuid field is #Indexed and the resulting schema definition in Neo4j shows up as
neo4j-sh (?)$ SCHEMA
==> Indexes
...
==> ON :Link(uuid) ONLINE
...
However, running a cypher query against the data, e.g.
MATCH ()-[r:LINKED_TO]->() WHERE uuid=’XXXXXX’ RETURN r;
does a full scan of the database and takes a long time
If I try to use the index by running
MATCH ()-[r:LINKED_TO]->() USING INDEX r:Link(uuid) WHERE uuid=’XXXXXX’ RETURN r;
I get
SyntaxException: Type mismatch: expected Node but was Relationship.
As I understand it, Relationships are supposed to be first class citizens in Neo4j, but I can’t see how to utilize the index on the relationship to prevent the graph equivalent of a table scan on the database to locate the relationship.
I know there are posts like How to use relationship index in Cypher which ask similar things, but this Link is the relationship between the two nodes. If I converted the Link to a Node, we would be creating a Node to represent a Relationship which seems wrong when we are working in a graph database - I'd end up with ()-[:xxx]->(:Link)-[:xxx]->() to represent one relationship. It would make the model messy purely due to the fact that the Link couldn't be represented as a relationship.
The Link has got a unique, shared key attached to it that I want to use. The Schema output suggests that the index is there for that field - I just can't use it.
Does anyone have any suggestions?
Many thanks,
Dave
Schema indexes are only available for nodes. The only way to index relationships is using legacy indexes or autoindexes. Legacy indexes need to be used explicitly in the START clause:
START r=relationship:my_index_name(uuid=<myuuid>)
RETURN r
I'm not sure how this can be used in conjunction with SDN.
side note: requiring relationship index is almost always a indication of doing something wrong in your graph data model. Everything being a thing or having an identity in your domain should be a node. So if a relationship requires a uuid, maybe the relationship refers to a thing and therefore should be converted into a node having a inbound relationship to the previous start node and a outbound relationship to the previous end node.

Why do relationships as a concept exist in neo4j or graph databases in general?

I can't seem to find any discussion on this. I had been imagining a database that was schemaless and node based and heirarchical, and one day I decided it was too common sense to not exist, so I started searching around and neo4j is about 95% of what I imagined.
What I didn't imagine was the concept of relationships. I don't understand why they are necessary. They seem to add a ton of complexity to all topics centered around graph databases, but I don't quite understand what the benefit is. Relationships seem to be almost exactly like nodes, except more limited.
To explain what I'm thinking, I was imagining starting a company, so I create myself as my first nodes:
create (u:User { u.name:"mindreader"});
create (c:Company { c.name:"mindreader Corp"});
One day I get a customer, so I put his company into my db.
create (c:Company { c.name:"Customer Company"});
create (u:User { u.name:"Customer Employee1" });
create (u:User { u.name:"Customer Employee2"});
I decide to link users to their customers
match (u:User) where u.name =~ "Customer.*"
match (c:Company) where c.name =~ "Customer.*
create (u)-[:Employee]->(c);
match (u:User where name = "mindreader"
match (c:Company) where name =~ "mindreader.*"
create (u)-[:Employee]->(c);
Then I hire some people:
match (c:Company) where c.name =~ "mindreader.*"
create (u:User { name:"Employee1"})-[:Employee]->(c)
create (u:User { name:"Employee2"})-[:Employee]->(c);
One day hr says they need to know when I hired employees. Okay:
match (c:Company)<-[r:Employee]-(u:User)
where name =~ "mindreader.*" and u.name =~ "Employee.*"
set r.hiredate = '2013-01-01';
Then hr comes back and says hey, we need to know which person in the company recruited a new employee so that they can get a cash reward for it.
Well now what I need is for a relationship to point to a user but that isn't allowed (:Hired_By relationship between :Employee relationship and a User). We could have an extra relationship :Hired_By, but if the :Employee relationship is ever deleted, the hired_by will remain unless someone remembers to delete it.
What I could have done in neo4j was just have a
(u:User)-[:hiring_info]->(hire_info:HiringInfo)-[:hired_by]->(u:User)
In which case the relationships only confer minimal information, the name.
What I originally envisioned was that there would be nodes, and then each property of a node could be a datatype or it could be a pointer to another node. In my case, a user record would end up looking like:
User {
name: "Employee1"
hiring_info: {
hire_date: "2013-01-01"
hired_by: u:User # -> would point to a user
}
}
Essentially it is still a graph. Nodes point to each other. The name of the relationship is just a field in the origin node. To query it you would just go
match (u:User) where ... return u.name, u.hiring_info.hiring_date, u.hiring_info.hired_by.name
If you needed a one to many relationship of the same type, you would just have a collection of pointers to nodes. If you referenced a collection in return, you'd get essentially a join. If you delete hiring_info, it would delete the pointer. References to other nodes would not have to be a disorganized list at the toplevel of a node. Furthermore when I query each user I will know all of the info about a user without both querying for the user itself and also all of its relationships. I would know his name and the fact that he hired someone in the same query. From the database backend, I'm not sure much would change.
I see quite a few questions from people asking whether they should use nodes or relationships to model this or that, and occasionally people asking for a relationship between relationships. It feels like the XML problem where you are wondering if a pieces of information should be its own tag or just a property its parent tag.
The query engine goes to great pains to handle relationships, so there must be some huge advantage to having them, but I can't quite see it.
Different databases are for different things. You seem to be looking for a noSQL database.
This is an extremely wide topic area that you've reached into, so I'll give you the short of it. There's a spectrum of database schemas, each of which have different use cases.
NoSQL aka Non-relational Databases:
Every object is a single document. You can have references to other documents, but any additional traversal means you're making another query. Times when you don't have relationships between your data very often, and are usually just going to want to query once and have a large amount of flexibly-stored data as the document that is returnedNote: These are not "nodes". Node have a very specific definition and implies that there are edges.)
SQL aka Relational Databases:
This is table land, this is where foreign keys and one-to-many relationships come into play. Here you have strict schemas and very fast queries. This is honestly what you should use for your user example. Small amounts of data where the relationships between things are shallow (You don't have to follow a relationship more than 1-2 times to get to the relevant entry) are where these excel.
Graph Database:
Use this when relationships are key to what you're trying to do. The most common example of a graph is something like a social graph where you're connecting different users together and need to follow relationships for many steps. (Figure out if two people are connected within a depth for 4 for instance)
Relationships exist in graph databases because that is the entire concept of a graph database. It doesn't really fit your application, but to be fair you could just keep more in the node part of your database. In general the whole idea of a database is something that lets you query a LOT of data very quickly. Depending on the intrinsic structure of your data there are different ways that that makes sense. Hence the different kinds of databases.
In strongly connected graphs, Neo4j is 1000x faster on 1000x the data than a SQL database. NoSQL would probably never be able to perform in a strongly connected graph scenario.
Take a look at what we're building right now: http://vimeo.com/81206025
Update: In reaction to mindreader's comment, we added the related properties to the picture:
RDBM systems are tabular and put more information in the tables than the relationships. Graph databases put more information in relationships. In the end, you can accomplish much the same goals.
However, putting more information in relationships can make queries smaller and faster.
Here's an example:
Graph databases are also good at storing human-readable knowledge representations, being edge (relationship) centric. RDF takes it one step further were all information is stored as edges rather than nodes. This is ideal for working with predicate logic, propositional calculus, and triples.
Maybe the right answer is an object database.
Objectivity/DB, which now supports a full suite of graph database capabilities, allows you to design complex schema with one-to-one, one-to-many, many-to-one, and many-to-many reference attributes. It has the semantics to view objects as graph nodes and edges. An edge can be just the reference attribute from one node to another or an edge can exist as an edge object that sits between two nodes.
An edge object can have any number of attribute and can have references off to other objects, as shown in the diagram below.
Being able to "hang" complex objects off of an edge allows Objectivity/DB to support weighted queries where the edge-weight can be calculated using a user-defined weight calculator operator. The weight calculator operator can build the weight from a static attribute on the edge or build the weight by digging down through the objects connected to the edge. In the picture, above, we could create a edge-weight calculator that computes the sum of the CallDetail lengths connected to the Call edge.

Resources