I'll preface this by saying that I am a total database pleb. I have zero experience with any kind of database, so I know I'm in way over my head.
Background: I do Active Directory consulting for my company, so I routinely look at the group membership of clients' Active Directory accounts. Currently I have a PowerShell script that runs my analytics, but it takes far too long in larger organizations. I figured there has to be a better way, so I started looking at databases. Neo4j seems like a good fit, as I should be able to link a user account or group as a member of another group. However, after browsing documentation and forums, I have no idea how to create those links.
I have two CSVs that I have successfully imported with the following information:
Users = DistinguishedName, SAMACCOUNTNAME, MemberOf
Groups = DistinguishedName, SAMACCOUNTNAME, MemberOf, Members
What I want to do is match a string from every user and group (DistinguishedName) against a string in the group node's Members property. Members is a concatenated string of all member DistinguishedNames (whether user or group). So if a node's DistinguishedName matches part of the string in a group's Members property, I want to build a one-way relationship like so:
user -[memberof]-> group
The best I could rack my brain on this is the following code but I have no idea if I'm even close:
Match(n)
Match(u:user) WHERE n.Members CONTAINS u.DN
Create (u)-[MS:Memberof]->((match)})
In PowerShell, I know how I would accomplish this (loosely translated to relate to the NEO4J world):
$groups = (all-groups)
$AllUsersAndGroups = (all-objs)
foreach ($line in $groups) {
$line.relationship = $line | where {$_.members -contains $AllUsersAndGroups.DistinguishedName}
}
So, in short, I'm stuck right now. I will keep looking into it, but I figured I would ask the community since you have the experience.
Here is an example of how you should have imported your data (notice that the redundant Members column is not actually needed):
Import (in batches of 5000, to avoid resource issues) each user, and create a unique relationship to its group:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///users.csv" AS row
MERGE (u:User {DistinguishedName: row.DistinguishedName, SAMACCOUNTNAME: row.SAMACCOUNTNAME})
MERGE (g:Group {DistinguishedName: row.MemberOf})
MERGE (u)-[:Memberof]->(g);
Import each group, and create a unique relationship to its parent group, if any:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///groups.csv" AS row
MERGE (g1:Group {DistinguishedName: row.DistinguishedName})
SET g1.SAMACCOUNTNAME = row.SAMACCOUNTNAME
WITH g1, row WHERE row.MemberOf IS NOT NULL
MERGE (g2:Group {DistinguishedName: row.MemberOf})
MERGE (g1)-[:Memberof]->(g2);
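If the User and Group nodes are already loaded with the concatenated Members string as a property, as the question describes, the relationships could also be built without re-importing. The following is a minimal sketch, assuming the labels User and Group and the property names DistinguishedName and Members match what was actually imported. It relies on substring matching, so it could in principle produce a false match if one DistinguishedName happens to be contained in another.

```cypher
// For every group, find each user or group whose DistinguishedName
// appears inside the group's concatenated Members string.
MATCH (g:Group)
WHERE g.Members IS NOT NULL
MATCH (m)
WHERE (m:User OR m:Group)
  AND m <> g
  AND g.Members CONTAINS m.DistinguishedName
// MERGE keeps the query safe to re-run without duplicating relationships.
MERGE (m)-[:Memberof]->(g);
```

This compares every group against every user and group, so on a large directory it is likely to be much slower than creating the relationships during the import.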
Related
I just downloaded and installed Neo4j. Now I'm working with a simple CSV file that looks like this:
So first I'm using this to merge the nodes for that file:
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE(Rank:rank{rang: line.Rank})
MERGE(Name:name{nom: line.Name})
MERGE(Sport:sport{sport: line.Sport})
MERGE(Nation:nation{pays: line.Nation})
MERGE(Gender: gender{genre: line.Gender})
MERGE(BirthDate:birthDate{dateDeNaissance: line.BirthDate})
MERGE(BirthPlace: birthplace{lieuDeNaissance: line.BirthPlace})
MERGE(Height: height{taille: line.Height})
MERGE(Pay: pay{salaire: line.Pay})
and this to create some constraint for that file:
CREATE CONSTRAINT ON(name:Name) ASSERT name.nom IS UNIQUE
CREATE CONSTRAINT ON(rank:Rank) ASSERT rank.rang IS UNIQUE
Then I want to show which country each athlete belongs to. For that I use:
Create(name)-[:WORK_AT]->(nation)
But this is what I get:
I would like to know why this happens, please.
Thanks in advance to anyone who takes the time to help me.
Several issues come to mind:
If your CREATE clause is part of your first query: since the CREATE clause uses the variable names name and nation, and your MERGE clauses use Name and Nation (which have different casing) -- the CREATE clause would just create new nodes instead of using the Name and Nation nodes.
If your CREATE clause is NOT part of your first query: your CREATE clause would just create new nodes (since variable names, even assuming they had the same casing, are local to a query and are not stored in the DB).
Solution: You can add this clause to the end of the first query:
CREATE (Name)-[:WORK_AT]->(Nation)
Yes, I agree with @cybersam: it's the case sensitivity of the 'name' and 'nation' variables.
My suggestion:
MERGE (Name)-[:WORK_AT]->(Nation)
I see that you're using MERGE for the nodes, so in case any Name or Nation values are duplicated, you should use MERGE instead of CREATE for the relationship as well.
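Putting the two answers together, here is a shortened sketch that covers both cases described above. It uses only the Name and Nation columns from the question (the other MERGE clauses are omitted for brevity); the variable names n and c in the second variant are illustrative.

```cypher
// Variant 1: merge the relationship in the same query that binds the nodes,
// using exactly the same variable casing (Name, Nation).
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE (Name:name {nom: line.Name})
MERGE (Nation:nation {pays: line.Nation})
MERGE (Name)-[:WORK_AT]->(Nation);

// Variant 2: the nodes were created by an earlier query, so their variables
// no longer exist and the nodes must be looked up again first.
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MATCH (n:name {nom: line.Name})
MATCH (c:nation {pays: line.Nation})
MERGE (n)-[:WORK_AT]->(c);
```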
I'm very new to the Neo4j world so please forgive me if this is a trivial question. I have 2 tables I've loaded into the database using LOAD CSV
artists:
artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"
agency_list:
"Agency"
"A"
"B"
"C"
"D"
Finally, I have an intermediary table that has the artist and the agencies that represent them.
artist_agencies:
artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"
Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)
I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like:
(agency:Agency) - [:REPRESENTS] -> (artist:Artist)
The code I've tried is:
LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})
So far I'm getting this: each blue node is a duplicate of an agency name, rather than there being one single agency node that connects to all of its artists via the :REPRESENTS relationship.
I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?
Thanks!
Joey
The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)
Aside: The artist_agencies.csv file does not need the artist_name column.
[UPDATE]
If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.
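As a sketch, the MERGE variant mentioned in the update would differ from the query above only in its last clause. It assumes the Agency and Artist nodes were loaded with the agency and artist_id properties used in the question.

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" AS line
MATCH (ag:Agency {agency: line.agency})
MATCH (ar:Artist {artist_id: line.artist_id})
// MERGE creates the relationship only if this pair is not yet connected.
MERGE (ag)-[:REPRESENTS]->(ar);
```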
I am trying to achieve what is shown here:
I have 2 CSV files, disease_mstr and Test_mstr. In Test_mstr I have many test-to-disease-ID records, which means none of them are unique. The disease ID points to the disease_mstr file. In the disease_mstr file I have only 2 fields, ID and Disease_name (the disease name is unique).
Now, I am creating 3 nodes with labels
1) Tests (only "testname" property) which will have unique tests (total 345 unique testnames)
**Properties**
a) testname
2) Linknode (pulled entire Test_mstr file) also pulled "disease_name" for corresponding disease_ID from Disease_mstr File
**Properties**
a)tname
b)dname
c)did
3) Disease (pulled from the disease_mstr file).
**Properties**
a)did
b)diseasename
After that, I create the relationships:
1)MATCH (t:Tests),(n:Linknode) where t.testname = n.tname CREATE (n)-[r:TEST_2]->(t) RETURN n,r,t
2)MATCH (d:Disease), (l:Linknode) where d.did = l.did MERGE (d)-[r:FOR_DISEASE]->(l) RETURN d,r,l
To get the desired result as shown in the image, I run the following Cypher command:
MATCH (d:Disease)-[r2:FOR_DISEASE]->(l:Linknode)-[r:TEST_2]->(t:Tests) RETURN l,r,t,r2 LIMIT 25
Can someone please help me create the 2 additional relationships that are marked in the image with BLUE and GREEN lines?
Sample files and images can be accessed in my google folder link
Is your goal to link all diseases to tests so that for any disease you can find out which tests are relevant and for each test, which diseases it tests for?
If so, you are nearly there.
You don't need the link nodes other than to help you during linking the tests to the diseases. In your current scenario you're treating the link nodes as you would if you were creating a relational database. They won't add any value in your graph db. You can create a single relationship between diseases and tests which will do all the work.
Here's a step by step way to load your database. (It probably isn't the most efficient, but it's easy to follow and it works.)
Normalise and load your tests:
load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Test {testname:line.test_name});
Load your diseases (these looked normalised to me)
load csv with headers from "file:///disease_mstr_csv.csv" as line
create (:Disease {did:line.did, diseasename:line.disease_name});
Load your link nodes:
load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Link {testname:line.test_name, parentdiseaseid:line.parent_disease_ID});
Now you can create a direct relationship between the diseases and tests with the following query:
match(d:Disease), (l:Link) where d.did = l.parentdiseaseid
with d, l.testname as name
match(t:Test {testname:name}) create (d)<-[:TEST_FOR]-(t);
This last query will find all the link nodes for each disease and extract the test name. It then looks up the test and joins it directly to its corresponding disease.
The link nodes are redundant now, so you can delete them if you wish.
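A minimal sketch of that cleanup, assuming the :Link label used in the loading step above:

```cypher
// DETACH DELETE also removes any relationships still attached to the nodes.
MATCH (l:Link)
DETACH DELETE l;
```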
To create the 'blue lines', which I assume are meant to show where tests have diseases in common, run the query below:
match (d:Disease)<-[]-(:Test)-[]->(e:Disease) where id(d) > id(e)
merge (d)-[:BLUE_LINE]->(e);
The match clause finds all disease pairs with a common test, the where clause ensures a link is created in only one direction and the merge clause ensures only one link is created.
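With the TEST_FOR relationships in place, the two lookups mentioned at the start of this answer could be sketched as follows. The parameters $diseaseName and $testName are placeholders introduced here for illustration; supply them from your client or replace them with literal values.

```cypher
// Tests relevant to one disease.
MATCH (t:Test)-[:TEST_FOR]->(d:Disease {diseasename: $diseaseName})
RETURN t.testname;

// Diseases that a given test is used for.
MATCH (t:Test {testname: $testName})-[:TEST_FOR]->(d:Disease)
RETURN d.did, d.diseasename;
```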
I'm learning about neo4j and I have the following question.
I have two groups of nodes, the first one is called Workers who have an ID and the name of the worker.
On the other hand there is another group of nodes, called products, which apart from the id, has the following attributes; price, name.
I want to make a relationship called "manipulate" where I relate a worker to the product that he is going to manipulate.
For this I have a trabajaensector.csv file which relates the workers by id, along with the products they are going to manipulate, also by id.
This is its form:
id1,id2,sector
1,1,fruteria
2,2,fruteria
3,2,fruteria
4,7,panaderia
5,5,fruteria
6,5,fruteria
7,9,bebidas
8,9,bebidas
9,10,bebidas
10,10,bebidas
11,3,pescaderia
12,8,panaderia
13,7,panaderia
14,9,bebidas
15,10,bebidas
16,4,pescaderia
17,2,fruteria
18,4,pescaderia
In summary, id1 (worker) manipulates id2 (product), and its sector is one of "fruteria", "pescaderia", "panaderia" or "bebidas".
This is my CQL for creating manipulate relationship:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH(w:Worker),(p:Product) where w.id= toInt(csvLine.id1) and p.id=
toInt(csvLine.id2) create (w)-[sect:trabajasec]->(p) return sect
Here is my problem: the relationship is apparently created correctly, but I am losing the third column, "sector", which indicates the sector where the worker works when manipulating that product.
For example, the relationship for a worker named Juan who manipulates apples should have in the relation the variable / attribute "fruteria" or for fish "pescaderia".
Any idea of how to properly include that data in the relationship and how to recover it?
You can add a sector property to the trabajasec relationships:
LOAD CSV WITH HEADERS FROM "file:///trabajaensector.csv" AS csvLine
MATCH (w:Worker), (p:Product)
WHERE w.id = TOINT(csvLine.id1) AND p.id = TOINT(csvLine.id2)
CREATE (w)-[sect:trabajasec {sector: csvLine.sector}]->(p)
RETURN sect;
To use the above query, you should first delete the trabajasec relationships created by your earlier LOAD CSV query.
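As a rough sketch, the cleanup and the later retrieval of the sector could look like this. It assumes the name property on Worker and Product nodes is literally called name, which the question implies but does not show.

```cypher
// Run once before re-importing: remove the relationships
// created by the earlier LOAD CSV query.
MATCH (:Worker)-[old:trabajasec]->(:Product)
DELETE old;

// Run after re-importing with the sector property: read it back.
MATCH (w:Worker)-[sect:trabajasec]->(p:Product)
RETURN w.name AS worker, p.name AS product, sect.sector AS sector;
```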
I have a csv file generated with contents as follows
GOID GOName
GO:0007190 activation of adenylate cyclase activity
DiseaseID DiseaseName
D058490 46 XY Disorders of Sex Development
D000172 Acromegaly
D049913 ACTH-Secreting,Pituitary Adenoma
D058186 Acute Kidney Injury
D000310 Adrenal Gland Neoplasms
D000312 Adrenal Hyperplasia Congenital
C537045 Albright's hereditary osteodystrophy
D000544 Alzheimer Disease
D019969 Amphetamine-Related Disorders
D000855 Anorexia
D000860 Anoxia
D001008 Anxiety Disorders
D001169 Arthritis Experimental
D001171 Arthritis Juvenile
D001172 Arthritis Rheumatoid
D001249 Asthma
D001254 Astrocytoma
and so on.
I want to create link between GOIDs through Diseases such that one disease node is connected to two or more different GOID nodes.
My output should look like this
Load your diseases all at once under a :Disease label.
Load all your Global data at once under a :Global label.
Create another CSV file with the Global->Disease linkages, and use MERGE to create the relationships.
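The first two steps might be sketched as below. The file names diseases.csv and go_terms.csv, their headers, and the name property are hypothetical, chosen only for illustration; the goID and diseaseID property names are the ones used in the rest of this answer.

```cypher
// Hypothetical file with header: diseaseID,diseaseName
LOAD CSV WITH HEADERS FROM "file:///diseases.csv" AS line
MERGE (d:Disease {diseaseID: line.diseaseID})
SET d.name = line.diseaseName;

// Hypothetical file with header: goID,goName
LOAD CSV WITH HEADERS FROM "file:///go_terms.csv" AS line
MERGE (g:Global {goID: line.goID})
SET g.name = line.goName;
```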
The relationship CSV would look like this:
goID,diseaseID
"GO:1234","D000456"
The command to read the CSV and create the relationships would look like this:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/Relationships.csv" as line
MATCH (g:Global {goID: line.goID})
MATCH (d:Disease {diseaseID: line.diseaseID})
MERGE (g)-[:RELATIONSHIP]->(d)
Once your data is loaded, you can then query it like so:
MATCH (g:Global {goID: "GO:0007190"})-[r:RELATIONSHIP]->(d:Disease)
return g, r, d
For cases where a disease has multiple global conditions, you can find and create a relationship like so:
match (d:Disease)
match (go1:Global)-[:RELATIONSHIP]->(d)
// id(go1) < id(go2) visits each pair once; merge avoids duplicate relationships
match (go2:Global)-[:RELATIONSHIP]->(d) where id(go1) < id(go2)
merge (go1)-[:RELATIONSHIP]->(go2)
merge (go2)-[:RELATIONSHIP]->(go1)
Strictly speaking you don't need a bi-directional relationship, so creating the second relationship could be left out. One potential concern is if more than one disease links two global values. If that is a concern, then setting a "Disease" property on the relationship would help identify how these globals are related.
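One way to record the linking disease on the relationship, as a sketch only: it assumes the :Global label and the diseaseID property used in the import above, and the property name disease is an illustrative choice.

```cypher
// One relationship per (pair of globals, shared disease); the disease
// property records which disease links the two.
MATCH (go1:Global)-[:RELATIONSHIP]->(d:Disease)<-[:RELATIONSHIP]-(go2:Global)
WHERE id(go1) < id(go2)
MERGE (go1)-[:RELATIONSHIP {disease: d.diseaseID}]->(go2);
```

Because the property is part of the MERGE pattern, two globals that share several diseases end up with one relationship per disease rather than a single ambiguous one.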