I'm new to Neo4j, and playing around by trying to set up a music database. To start simple, I'm just playing with two labels:
Artist
Song
Obviously this is a parent-child relationship, where a Song is a child of an Artist (or possibly multiple Artists), and might look something like:
(:Artist {name:'name'})-[:RECORDED]->(:Song {title:'title'})
I am making the following assumptions:
Artist names are unique
Song titles are not unique
Duplicate ingest data is unavoidable
To give an example of what I'd like to do:
1. I ingest "Hallelujah" by Leonard Cohen. A new Artist node and Song node are created, with a RECORDED relationship.
2. I ingest "Hallelujah" by Jeff Buckley. Again, new Artist and Song nodes are created, with a RECORDED relationship. The first "Hallelujah" Song is not associated with this new graph at all.
3. I ingest "Hallelujah" by Jeff Buckley again. Nothing happens.
4. I ingest "Lilac Wine" by Jeff Buckley. We reuse our old Artist node, but I have a new Song node with a RECORDED relationship.
From what I can tell, using MERGE gets me close, but not quite there (it stops duplication of the Artist, but not of the Song). If I use CREATE, then point 3 doesn't work properly.
I guess I could add another property to the Song label which tracks its Artist (and which I could therefore make unique), but that seems a little redundant and unidiomatic for a graph database, no?
Does anyone have any bright ideas on the most succinct way of enforcing these relationships and requirements?
MERGE the Artist first, then the Song:
WITH 'Leonard Cohen' AS ArtistName,
     'Hallelujah' AS SongTitle
// Find or create the artist
MERGE (A:Artist {name: ArtistName})
WITH A, SongTitle
// Check whether this artist already has a song with this title
OPTIONAL MATCH p = (A)-[:RECORDED]->(:Song {title: SongTitle})
// FOREACH over a one-element list acts as a conditional: create only when nothing matched
FOREACH (x IN CASE WHEN p IS NULL THEN [1] ELSE [] END |
  CREATE (S:Song {title: SongTitle})
  MERGE (A)-[:RECORDED]->(S)
)
WITH A, SongTitle
MATCH p = (A)-[:RECORDED]->(:Song {title: SongTitle})
RETURN p
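A shorter variant may also work, relying on how MERGE treats a pattern whose start node is already bound: it either matches the whole pattern or creates the missing parts. This is a minimal sketch, not the answerer's query; $artistName and $songTitle are assumed query parameters.

```cypher
// $artistName and $songTitle are assumed parameters supplied by the ingest job
MERGE (a:Artist {name: $artistName})
// `a` is already bound, so this matches an existing (a)-[:RECORDED]->(:Song) with that title,
// or creates a new Song node and relationship for this artist only
MERGE (a)-[:RECORDED]->(s:Song {title: $songTitle})
RETURN a, s
```

Because the Song is only ever looked up through its artist, a "Hallelujah" recorded by a different artist is never reused, and re-ingesting the same artist and title leaves the graph unchanged.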
I don't think song title is something you can rely on for uniqueness, especially if this graph includes covers of existing songs.
Determining some additional means to imply uniqueness is the way to go.
Artist is one way. Recorded date might be another piece of data to consider. If you're reading this from some other kind of database, there may be some other unique ID they use for uniqueness.
Whatever the case, once you have the fields that you want to use to determine uniqueness, MERGE your song node with all those fields present.
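As a rough sketch of that idea, assuming the artist name is the extra field chosen to make a song unique (the artist property on Song and the parameter names are hypothetical):

```cypher
// $artistName and $title are assumed parameters; `artist` on Song is a hypothetical property
MERGE (a:Artist {name: $artistName})
// MERGE on every field that defines uniqueness, so the same title by another artist gets its own node
MERGE (s:Song {title: $title, artist: $artistName})
MERGE (a)-[:RECORDED]->(s)
```

If a recorded date or an external ID were the distinguishing field instead, it would simply replace or join artist in the second MERGE.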
I am working on making the framework for a music recommendation system (with data from Million Songs in a CSV file) by connecting songs in a graph database using neo4j. This is my first time using neo4j, but I have used SQL before.
So far I have three nodes: Song, Artist, and Tempo.
I have already created the relationship between artists and their songs, and now I'm trying to create a relationship between each song and a range of tempos.
I could just have each song have a relationship to a specific tempo (ex: 120bpm), however that would not be very useful since I would not then be able to backtrack from Tempo and see another song that's very close in speed (ex: 119 or 121bpm).
Therefore, I'm attempting to change my Tempo nodes (whose values are floats) from holding one exact number (ex: 120bpm) to representing a range, such as 0-80 (classified as very slow), 81-100 (slow), 101-130 (moderate), ... etc.
I know it would theoretically be better not to have set groups of tempos, but I'm just beginning and it will be ok for now.
Each Song node has the properties title, artistName, and tempo.
Each Artist node has the properties artistName and title.
Each Tempo node has the properties tempo and title.
I have tried creating a new node via:
CREATE (Tempo {Tempo.tempo<80});
... and several other ways I can't remember right now. Does anyone know how to do this, or whether it's possible?
You seem to be duplicating properties unnecessarily across multiple node labels, in a way that would prevent a given node from being related to multiple other nodes. For example, an Artist node should not have a title property, since that would tie that node to a specific Song. Every Song would presumably have a relationship to the appropriate Artist anyway, so there is no need to store the song's title in the Artist node.
Also, as @InverseFalcon suggested, you can represent a range by using a pair of properties, say min and max.
Here is an example of a path in a suitable data model:
(:Tempo {min: 0, max: 79})<-[:HAS_TEMPO]-(:Song {title: 'Foo'})<-[:PERFORMED]-(:Artist {name: 'Fred'})
There would be one Tempo node for each tempo range.
Using the above data model, this simple query will return all songs that have the same tempo range ($speed is a parameter indicating the specific tempo of interest):
MATCH (t:Tempo)
WHERE t.min <= $speed <= t.max
MATCH (t)<-[:HAS_TEMPO]-(s:Song)
RETURN s;
And this is how you'd return the distinct artists who have ever performed a song in the desired tempo range:
MATCH (t:Tempo)
WHERE t.min <= $speed <= t.max
MATCH (t)<-[:HAS_TEMPO]-(:Song)<-[:PERFORMED]-(a:Artist)
RETURN DISTINCT a;
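Before those queries can return anything, the Tempo nodes and HAS_TEMPO relationships have to exist. The following is a minimal sketch of one way to build them, assuming each Song already carries a numeric tempo property as described in the question. The name property and the exact boundaries are illustrative, and because tempos are floats the ranges here are half-open (min inclusive, max exclusive) so that a value such as 80.5 does not fall between two ranges.

```cypher
// One Tempo node per range; names and bounds are illustrative assumptions
UNWIND [
  {name: 'very slow', min: 0,   max: 80},
  {name: 'slow',      min: 80,  max: 100},
  {name: 'moderate',  min: 100, max: 130}
] AS bucket
MERGE (t:Tempo {min: bucket.min, max: bucket.max})
SET t.name = bucket.name;

// Link each song to the range that contains its tempo (min inclusive, max exclusive)
MATCH (s:Song), (t:Tempo)
WHERE t.min <= s.tempo AND s.tempo < t.max
MERGE (s)-[:HAS_TEMPO]->(t);
```

With half-open ranges like these, a lookup by tempo would compare with t.min <= $speed AND $speed < t.max rather than using <= on both sides.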
I know how this would be accomplished in SQL, but I'm having difficulty wrapping my brain around how to do this in Cypher.
Basically working on a master data setup where a user has a master_id (node) and need to use an existing relationship property to determine the master_id in order to create a new relationship between the master_id node and a location node.
Currently have master users created as nodes that contain a master_id property. A relationship is created between the master user and a brand, and the relationship has a property of brand_user_id.
I now have another file I need to import that contains data at the brand_user level, but need to create the relationship between the master_id and a location node. In order to do this because the file does not contain the master_id property, I am attempting to use the new file to lookup master_id's based on the existing relationship with the brand, then use that master_id to create the new relationship with the location.
Have this relationship:
(m:Master{master_id:12345})-[:IS_BRAND_USER{brand_user_id:9876}]->(b:Brand{name:"Acme"})
Have this file:
brand_user_id,location_id
9876,6
Need this relationship:
(m:Master{master_id:12345})-[:HAS_LOCATION]->(l:Location{id:6})
My approach:
LOAD CSV WITH HEADERS FROM "file:///brand_user_ids.csv" as buid
MATCH (m:Master)-[r:IS_BRAND_USER{brand_user_id:buid.id}]->(b:Brand)
WITH m, buid.location_id AS location_id
MATCH (l:Location {id: location_id})
CREATE (m)-[:HAS_LOCATION {source: 'abcdef'}]->(l)
Seems to run for an extremely long time and not seeing any real progress after an hour so I'm wondering if this is fundamentally the correct approach or not, or if I have inadvertently created some horrific cross join equivalent.
The problem is that you are trying to enter a graph on a relationship. And that always requires lots of "scanning the graph".
Now, I'm not a specialist in your domain, but you might be missing a type of node here: BrandUser. There could be several reasons for that:
Based on your data, a Master can have many BrandUser ids, potentially even more than one per Brand? Do you have other properties that make sense at the BrandUser level?
That Location data is strange. Wouldn't you agree it's actually the BrandUser that has a location, and that a Master may have many locations?
The most important reason, however: if you're going to enter the graph on the brand_user_id all the time (and judging from the location example that may be the case), you've got the reason to turn it into a node right there.
So ... it's a modeling issue really.
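To make that concrete, here is a rough sketch of what the import could look like once brand_user_id lives on a node. The BrandUser label and the HAS_BRAND_USER relationship type are hypothetical names, the index statement uses Neo4j 4.x+ syntax (older versions use a different form), and the toInteger calls assume the ids are stored as integers, since LOAD CSV reads every field as a string.

```cypher
// Hypothetical model: (m:Master)-[:HAS_BRAND_USER]->(bu:BrandUser {brand_user_id: 9876})-[:OF_BRAND]->(b:Brand)
// Index so each CSV row can find its BrandUser without scanning (Neo4j 4.x+ syntax)
CREATE INDEX brand_user_id_idx IF NOT EXISTS FOR (bu:BrandUser) ON (bu.brand_user_id);

LOAD CSV WITH HEADERS FROM "file:///brand_user_ids.csv" AS row
// Column names come from the file header: brand_user_id, location_id
MATCH (bu:BrandUser {brand_user_id: toInteger(row.brand_user_id)})<-[:HAS_BRAND_USER]-(m:Master)
MATCH (l:Location {id: toInteger(row.location_id)})
// MERGE avoids duplicate HAS_LOCATION relationships if the file is loaded twice
MERGE (m)-[:HAS_LOCATION]->(l);
```

An index on Location id would help the second MATCH in the same way.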
Hope this helps.
Regards,
Tom
I've just recently started using neo4j and I've run into an issue. There doesn't seem to be an answer to this on here, but I might also be wording it incorrectly. I'm building a small site that categorizes music. There are multiple song nodes with BELONGS_TO relationships to genre nodes. How can I get every song that belongs to a set of user-specified genres?
For example: Song1, Song2, and Song3 all belong to both Pop and Electronic. Song4 just belongs to Pop. How can I query to get every song belonging to both Pop and Electronic? In this case, Song1, Song2, and Song3.
I've been struggling with this for a while. This is what I have so far but it doesn't return anything. If I replace AND with OR I get all the songs that belong to one of those genres.
MATCH (n:Song)-[r:BELONGS_TO]->(Genre)
WHERE (n)-[r]->(Genre{name:"Pop"}) AND (n)-[r]->(Genre{name:"Electronic"})
RETURN n
Thank you.
What you're trying to do in the WHERE clause you should actually do in the MATCH clause. Here you go:
MATCH (g1:Genre {name: "Pop"})<-[:BELONGS_TO]-(popElectronicSongs:Song)-[:BELONGS_TO]->(g2:Genre {name: "Electronic"})
RETURN popElectronicSongs;
You can actually do quite a lot with just the MATCH clause, as you can see here. The WHERE bit usually gets used for filtering based on various predicates. For example, you might say WHERE popElectronicSongs.title =~ 'S.*' to filter for only songs whose title starts with S.
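The pattern above hard-codes two genres. For an arbitrary set of user-specified genres, one possible approach is to count how many of the requested genres each song matches. This is a sketch under the assumption that $genres is a list parameter of genre names with no duplicates:

```cypher
// $genres is an assumed list parameter, e.g. ["Pop", "Electronic"]
MATCH (s:Song)-[:BELONGS_TO]->(g:Genre)
WHERE g.name IN $genres
// Keep only songs that matched every requested genre
WITH s, count(DISTINCT g) AS matched
WHERE matched = size($genres)
RETURN s
```

Called with ["Pop", "Electronic"] on the example data, Song4 matches only one genre, so it is filtered out by the second WHERE.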
I am using neo4j 2.1.5 and trying to model a scenario where nodes are video-links that fall under multiple categories and subcategories; however, these nodes will also form an ordered sequence under a particular sequence-category.
e.g.:
Sequence 1: (node 1)-[:relatedTo]->(node 2)-[:relatedTo]->(node 3)-[:relatedTo]->(node 4)
Sequence 2: (node 2)-[:someRel]->(node 8)-[:someRel]->(node 4)
What should be the best way to model this?
Note: more video nodes will be added in a sequence-category between the nodes.
The way that you've done this looks fine to me. A set of nodes, with relationships between them, essentially forming a queue. What you're doing here is basically using neo4j to model a linked list.
Linkage to the category is probably where it's going to get interesting. With linked lists, running through them is easy, but random access (jumping into the middle) is hard. Now, if the order were always fixed, you could just add a seq attribute to your relationships, and label them numerically (0, 1, 2...). You could then jump in anywhere by querying for the appropriate seq value. BUT, if you're saying you want to do dynamic insertions into the list, this is probably a terrible idea. If you create a new #3, that means you have to increment all subsequent seq numbers. So that's probably not going to work.
Probably just a Category node that is always guaranteed to link to the first item in the sequence is a good way to go, like this:
(c:Category {name:"Horror"})-[:contains]->(v1:Video {name: "Texas Chainsaw Massacre"})-[:next]->(v2:Video { name: "Hellraiser"});
You can do as many :next relationships as you want. To insert something new, you might do this:
MATCH (v1:Video {name: "Texas Chainsaw Massacre"})-[r:next]->(v2:Video {name: "Hellraiser"})
CREATE (v1)-[:next]->(v3:Video {name: "New Thing I'm inserting"})-[:next]->(v2)
DELETE r;
An alternative is that you could link the category to every video (not just the first). Your videos would still have ordering, and you'd be able to jump into them with random access by video name, but you wouldn't know where the head of your list was anymore.
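For the first model, where the Category links only to the head of the list, reading the videos back in order could look roughly like this. It is a sketch that assumes the :contains and :next relationship types from the example above, and it uses the path length from the head as the position:

```cypher
// Start from the category's head video, then follow any number of :next hops
MATCH (c:Category {name: "Horror"})-[:contains]->(head:Video)
MATCH path = (head)-[:next*0..]->(v:Video)
// length(path) is 0 for the head itself, 1 for the next video, and so on
RETURN v.name AS video, length(path) AS position
ORDER BY position
```

Since the position is derived from the path rather than stored, inserting a video in the middle needs no renumbering.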
I have read the Neo4j manual and saw the numerous short examples regarding movie graph. I have also installed it locally and played with the cypher.
Here is the setup:
I have the following nodes: Movies (with name and id, and an "owned by friend" property), Actors (with name and id), Directors (with name and id), Genres (with id and name)
Relations are: Actors acted in Movies (1 movie - many actors), Directors directed a Movie (1 director per movie, but a director can direct many movies), and Movies have several Genres (many to many)
1) Owned by friend: I don't know why, but the LOAD CSV example puts USA in as a node rather than a property. Is there a logical reason why it's better to make it a node rather than a property, like I did?
2) What I want to search for is similar to the answer given to this question:
Nearest nodes to a give node, assigning dynamically weight to relationship types
However, I do not have a weight on the relationship, and it's more of a "go find the first given nodes connected to it".
Note that a movie can only be owned by one person.
Given the movie title "Spider-Man" (which, for example purposes, is owned by Frank), go find the next occurrence of a movie that is owned by John.
So after reading the Neo4j manual, I believe that I don't need to specify which relationship to traverse, but can just go find the next movie that meets my criteria, right?
So, following the above link:
MATCH (n:Start { title: 'Spider-Man' }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
So: go to the Spider-Man node and find me x as long as it is connected. But I got stumped by *0..2 because it's a range... what if I just want to say "go find me the first one that meets the 'owned by John' criterion"?
3) Following up on #2: how do I insert the filter "owned by John"?
There are a number of things in your question that don't quite make sense. Here's a stab at an answer.
1) Making 'USA' a node rather than a property is useful if you want to search based on country. If 'USA' is a node, you are able to limit your search by starting at the 'USA' node. If you don't care to do this, then it doesn't really matter. It may also save a small amount of space for longer country names to store the name once and link to it via relationships.
2) Your example doesn't match your described graph. I can't really speak to it without a better example.
3) This is probably easy to answer once you improve your example.
OK. Based on the comments to the answer, here's what you need. To find one movie owned by John that is connected via common actors, directors, etc to the movie Spider-man owned by Frank (that is, sub-graphs like (movie)<--(actor)-->(movie) ) you can write:
MATCH (n:Movie {title : 'Spider-Man', owned_by : 'Frank'})<-[*2]->(m:Movie {owned_by : 'John'})
RETURN m LIMIT 1
If you want more results, raise or remove the LIMIT on the RETURN clause. If you want to allow longer chains such as (movie)<--(actor)-->(movie)<--(director)-->(movie), you can increase the number of relationships matched (the *2) to 4, 6, 8, etc. You probably shouldn't just write the relationship part of the MATCH clause as -[*]-, because an unbounded variable-length match can explore an enormous number of paths and take a very long time to finish.
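If the goal is the nearest movie owned by John rather than one at a fixed distance, shortestPath with an upper bound may be a better fit. This is a sketch under the same assumptions as the query above (an owned_by property on Movie); the bound of 6 hops is arbitrary:

```cypher
// Candidate end nodes: every movie owned by John
MATCH (n:Movie {title: 'Spider-Man', owned_by: 'Frank'}), (m:Movie {owned_by: 'John'})
// Shortest connection of any relationship type, capped at 6 hops
MATCH p = shortestPath((n)-[*..6]-(m))
RETURN m, length(p) AS hops
ORDER BY hops
LIMIT 1
```

The upper bound keeps the search from wandering across the whole graph when no close connection exists.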