Multiple loads in Neo4j

I have loaded some data into a Neo4j graph database using the batch importer. If I now have to load more data, do I have to keep track of what was inserted externally, or are there standard Neo4j features that can be used to:
1) get the id of the last node inserted, so that I know the id for the next node to insert and can index it accordingly;
2) get the list of nodes already present in the database, so that I can check the uniqueness of the nodes that are going to be inserted (if a node already exists in the database, I will reuse its id instead of creating a new node);
3) check the uniqueness of triplets: suppose the triplet "January Month is_a" is already present in the database and the new data also contains it. I would like to skip it, since inserting it again gives duplicate results.
For example, suppose you add the following data to a Neo4j graph database using the batch importer (https://github.com/jexp/batch-import):
$ cat nodes.csv
name age works_on
Michael 37 neo4j
Selina 14
Rana 6
Selma 4
$ cat nodes_index.csv
0 name age works_on
1 Michael 37 neo4j
2 Selina 14
3 Rana 6
4 Selma 4
$ cat rels.csv
start end type since counter:int
1 2 FATHER_OF 1998-07-10 1
1 3 FATHER_OF 2007-09-15 2
1 4 FATHER_OF 2008-05-03 3
3 4 SISTER_OF 2008-05-03 5
2 3 SISTER_OF 2007-09-15 7
Now, if you have to add more data to the same database, you need to know the following:
1) If a node already exists, what its id is, so that you can use it when creating a triplet. If it does not, collect such nodes (not yet in the database) and number them starting from an id that was not used in the last import; that becomes the starting id for a new nodes_index.csv.
2) If a triplet already exists in the database, don't create it again, as that would produce duplicate results when running Cypher queries against the database.
It seems the same issue has been raised here as well: https://github.com/jexp/batch-import/issues/27
Thanks!

1- Why do you need to know the last node id? You don't need the id to insert a new node; it is assigned automatically, using the first free id in the graph.
2- For uniqueness, why not use a CREATE UNIQUE query, for nodes and relationships alike?
You can check the reference here: http://docs.neo4j.org/chunked/1.8/cypher-query-lang.html
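If you keep using the batch importer and track state outside Neo4j, as the question describes, the bookkeeping is small. Below is a minimal Python sketch. It assumes you keep the previous import's name-to-id mapping (for example, rebuilt from nodes_index.csv) and its triplets in memory, and that new ids continue directly after the highest old id. The helper name plan_incremental_load, the node "Tom" and the BROTHER_OF type are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (start name, end name, relationship type), e.g. ("Michael", "Selina", "FATHER_OF")
Triplet = Tuple[str, str, str]


def plan_incremental_load(
    existing_ids: Dict[str, int],
    existing_triplets: Set[Triplet],
    new_triplets: List[Triplet],
) -> Tuple[Dict[str, int], List[Tuple[int, int, str]]]:
    """Hypothetical helper: return (new nodes with fresh ids, relationship rows to import)."""
    ids = dict(existing_ids)
    # Assumption: the next free id is one past the highest id of the previous import.
    next_id = max(ids.values(), default=0) + 1
    new_nodes: Dict[str, int] = {}
    seen = set(existing_triplets)
    rels: List[Tuple[int, int, str]] = []
    for start, end, rel_type in new_triplets:
        # Reuse the id of a node that already exists; otherwise allocate a new one.
        for name in (start, end):
            if name not in ids:
                ids[name] = next_id
                new_nodes[name] = next_id
                next_id += 1
        # Skip triplets already in the database or repeated within this batch.
        triplet = (start, end, rel_type)
        if triplet in seen:
            continue
        seen.add(triplet)
        rels.append((ids[start], ids[end], rel_type))
    return new_nodes, rels


# Ids from the nodes_index.csv example; "Tom" and BROTHER_OF are made up.
existing = {"Michael": 1, "Selina": 2, "Rana": 3, "Selma": 4}
existing_rels = {("Michael", "Selina", "FATHER_OF")}
incoming = [
    ("Michael", "Selina", "FATHER_OF"),  # already imported: skipped
    ("Michael", "Tom", "FATHER_OF"),     # Tom is new: gets id 5
    ("Tom", "Selma", "BROTHER_OF"),
]
new_nodes, rels = plan_incremental_load(existing, existing_rels, incoming)
print(new_nodes)  # {'Tom': 5}
print(rels)       # [(1, 5, 'FATHER_OF'), (5, 4, 'BROTHER_OF')]
```

The new_nodes mapping would become the next nodes_index.csv rows and rels the next rels.csv rows. If you instead load through Cypher, CREATE UNIQUE (or MERGE in later versions) makes this external bookkeeping unnecessary.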

Related

When loading CSV in Neo4j, not all relationships are created

Hello everyone, please help me with this problem :D
When I execute my query:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Create_all.csv" AS row
MATCH(x:Category{uuid:row.uuid_category})
MERGE (t:Subscriber{name:row.name_subscriber, uuid:row.uuid_subscriber})
CREATE (n:Product{name: row.name_product, uuid: row.uuid_product}),
(Price:AttributeValue{name:'Price', value: row.price_product}),
(Stock:AttributeValue{name:'Stock', value: row.stock_product }),
(Style:AttributeValue{name:'Style', value: 'Pop Art'}),
(Subject:AttributeValue{name:'Subject', value: 'Portrait'}),
(Originality:AttributeValue{name:'Originality', value: 'Reproduction'}),
(Region:AttributeValue{name:'Region', value: 'Japan'}),
(Price)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Stock)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Style)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Subject)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Originality)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Region)-[:IS_ATTRIBUTEVALUE_OF]->(n)
WITH (n),(t),(x)
create (n)-[:OF_CATEGORY]->(x)
create (t)-[:SELLS]->(n)
The format of my CSV is as follows:
I have 4 categories, 30 products and 10 subscribers, and the query reports:
Added 164 labels, created 164 nodes, set 328 properties, created 184 relationships, completed after 254 ms.
I verify the result with:
MATCH p=()-[r:OF_CATEGORY]->() RETURN count(r)
Only 23 relationships are created; the remaining 7 are missing.
Please help me fix the query so that it creates all the relationships. In this case there should be 30 relationships between products and categories.
The critical part is MATCH(x:Category{uuid:row.uuid_category})
If that match fails for a row, the row is dropped and none of the other operations for that row execute.
Your input consists of 4 categories (let's call them 1, 2, 3 and 4) repeating 7 times (28 rows so far), followed by two of those categories once more each (for your full 30 rows). So it makes sense that some of your matches are failing because :Category nodes with some of those uuid_category values are not actually present in the graph.
Of those uuids, only 1 and 2 occur again at the end, so they appear in 8 rows each, while uuids 3 and 4 appear in 7 rows each. If either uuid 3 or 4 has no corresponding node in the graph, that gives 1 * 7 + 2 * 8 = 23, which is exactly the number of relationships your query is creating.
So there is most likely no :Category node for the uuid_category ending in either 3 or 4.
Check your graph against your data to confirm.
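The Python sketch below reproduces the counting argument, assuming the CSV column has the layout described above. The uuids "cat-1" to "cat-4" are made up, and the graph is assumed to lack the "cat-4" node. It also checks the reported totals: each surviving row creates six IS_ATTRIBUTEVALUE_OF relationships plus one OF_CATEGORY and one SELLS, so 23 surviving rows account for the 184 relationships in the query summary.

```python
from collections import Counter
from typing import List, Set, Tuple

# Each surviving row creates 6 IS_ATTRIBUTEVALUE_OF + 1 OF_CATEGORY + 1 SELLS.
RELS_PER_ROW = 8


def surviving_rows(row_uuids: List[str], existing: Set[str]) -> Tuple[int, Set[str]]:
    """Count rows whose MATCH on :Category succeeds, and list uuids with no node."""
    # A row whose MATCH finds nothing is dropped, so it creates nothing at all.
    created = sum(1 for uuid in row_uuids if uuid in existing)
    missing = set(row_uuids) - existing
    return created, missing


# Hypothetical uuid_category column: 4 categories x 7, then categories 1 and 2 once more.
rows = ["cat-1", "cat-2", "cat-3", "cat-4"] * 7 + ["cat-1", "cat-2"]
counts = Counter(rows)  # cat-1, cat-2: 8 rows each; cat-3, cat-4: 7 rows each

# Suppose the graph has no :Category node for "cat-4".
created, missing = surviving_rows(rows, {"cat-1", "cat-2", "cat-3"})
print(len(rows), created, missing)  # 30 23 {'cat-4'}
print(created * RELS_PER_ROW)       # 184
```

With your real data, the set difference between the CSV's uuid_category values and the uuids of the existing :Category nodes names the categories you still need to create.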

Neo4j: connecting multiple relationships between multiple nodes

I am trying to achieve what is shown here:
I have 2 CSV files, disease_mstr and Test_mstr. In Test_mstr I have many test-to-disease-ID records, which means none of them are unique. The disease ID points to the disease_mstr file, which has only 2 fields, ID and Disease_name (disease names are unique).
Now, I am creating 3 kinds of nodes, with these labels:
1) Tests (only a "testname" property), which hold the unique tests (345 unique test names in total)
**Properties:**
a) testname
2) Linknode (the entire Test_mstr file), with the "disease_name" for the corresponding disease_ID also pulled from the disease_mstr file
**Properties:**
a) tname
b) dname
c) did
3) Disease (pulled from the disease_mstr file)
**Properties:**
a) did
b) diseasename
After that I create the relationships:
1) MATCH (t:Tests),(n:Linknode) where t.testname = n.tname CREATE (n)-[r:TEST_2]->(t) RETURN n,r,t
2) MATCH (d:Disease), (l:Linknode) where d.did = l.did MERGE (d)-[r:FOR_DISEASE]->(l) RETURN d,r,l
To get the desired result shown in the image, I run the following Cypher command:
MATCH (d:Disease)-[r2:FOR_DISEASE]->(l:Linknode)-[r:TEST_2]->(t:Tests) RETURN l,r,t,r2 LIMIT 25
Can someone please help me create the 2 additional relationships marked in the image with BLUE and GREEN lines?
Sample files and images can be accessed via my Google folder link.
Is your goal to link all diseases to tests so that for any disease you can find out which tests are relevant and for each test, which diseases it tests for?
If so, you are nearly there.
You don't need the link nodes other than to help you during linking the tests to the diseases. In your current scenario you're treating the link nodes as you would if you were creating a relational database. They won't add any value in your graph db. You can create a single relationship between diseases and tests which will do all the work.
Here's a step by step way to load your database. (It probably isn't the most efficient, but it's easy to follow and it works.)
Normalise and load your tests:
load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Test {testname:line.test_name});
Load your diseases (these looked normalised to me)
load csv with headers from "file:///disease_mstr_csv.csv" as line
create (:Disease {did:line.did, diseasename:line.disease_name});
Load your link nodes:
load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Link {testname:line.test_name, parentdiseaseid:line.parent_disease_ID});
Now you can create a direct relationship between the diseases and tests with the following query:
match(d:Disease), (l:Link) where d.did = l.parentdiseaseid
with d, l.testname as name
match(t:Test {testname:name}) create (d)<-[:TEST_FOR]-(t);
This last query will find all the link nodes for each disease and extract the test name. It then looks up the test and joins it directly to its corresponding disease.
The link nodes are redundant now, so you can delete them if you wish.
To create the 'blue lines', which I assume are meant to show where tests have diseases in common, run the query below:
match (d:Disease)<-[]-(:Test)-[]->(e:Disease) where id(d) > id(e)
merge (d)-[:BLUE_LINE]->(e);
The match clause finds all disease pairs with a common test, the where clause ensures a link is created in only one direction and the merge clause ensures only one link is created.
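To make the two linking steps concrete, here is a small in-memory Python model of what the TEST_FOR and BLUE_LINE queries compute. It is an illustration, not a replacement for the Cypher. The link rows and the ids "D1" to "D3" are hypothetical, and string ordering stands in for id(d) > id(e).

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def link_tests_to_diseases(link_rows: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """TEST_FOR edges: map each test name to the disease ids it tests for."""
    tests: Dict[str, Set[str]] = {}
    for testname, disease_id in link_rows:
        # Each link row contributes one edge; duplicate rows collapse in the set.
        tests.setdefault(testname, set()).add(disease_id)
    return tests


def blue_lines(tests: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """BLUE_LINE edges: disease pairs sharing at least one test, each pair kept once."""
    pairs: Set[Tuple[str, str]] = set()
    for diseases in tests.values():
        # Sorting before combinations keeps a single orientation per pair,
        # playing the role of `where id(d) > id(e)` in the Cypher query.
        for e, d in combinations(sorted(diseases), 2):
            pairs.add((d, e))  # the set plays the role of MERGE
    return pairs


# Hypothetical link rows: (test_name, parent_disease_ID).
rows = [("test-a", "D1"), ("test-a", "D2"), ("test-b", "D2"), ("test-b", "D3"), ("test-a", "D1")]
tests = link_tests_to_diseases(rows)
print({t: sorted(d) for t, d in tests.items()})  # {'test-a': ['D1', 'D2'], 'test-b': ['D2', 'D3']}
print(sorted(blue_lines(tests)))                 # [('D2', 'D1'), ('D3', 'D2')]
```

D1 and D3 share no test, so no BLUE_LINE joins them, even though both are linked to D2.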

Neo4j - Problems with MATCH JOIN logic

I am having a problem creating a JOIN (MATCH) relationship. I am using the Neo4j example for loading the Northwind graph database as my learning example.
I have 2 simple CSV files that I successfully loaded via LOAD CSV WITH HEADERS. I then created 2 indexes, one for each entity. My final step is to create the MATCH (JOIN) statement, and this is where I am having problems.
After running the script, instead of telling me how many relationships it created, my return message is "(no changes, no records)". Here are my script lines:
LOAD CSV WITH HEADERS FROM 'FILE:///TestProducts.csv' AS row
CREATE (p:Product)
SET p = row
Added 113 labels, created 113 nodes, set 339 properties, completed after 309 ms.
LOAD CSV WITH HEADERS FROM 'FILE:///TestSuppliers.csv' AS row
CREATE (s:Supplier)
SET s = row
Added 23 labels, created 23 nodes, set 46 properties, completed after 137 ms.
CREATE INDEX ON :Product(productID)
Added 1 index, completed after 20 ms.
CREATE INDEX ON :Supplier(supplierID)
Added 1 index, completed after 2 ms.
MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p)
(no changes, no records)
Why? If I run the Northwind example with the example files, it works: it says 77 relationships were created. Also, is there any way to see the database structure? How can I debug this issue? Any help is greatly appreciated.
I think you may be using the wrong casing for the property names. The Northwind data uses uppercase first letters for its property names.
Try using ProductID and SupplierID in your indexes and the MATCH clause.
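The failure mode can be reproduced outside Neo4j. In the Python sketch below, the CSV contents are made up but use capitalised headers like the Northwind files. It shows that SET p = row keeps each header spelling as-is, so a WHERE on the wrong casing compares two missing (null) values and matches nothing. To debug in Neo4j itself, a query such as MATCH (p:Product) RETURN keys(p) LIMIT 1 should list the property names that were actually stored.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical Northwind-style files with capitalised headers (made-up rows).
PRODUCTS = "ProductID,ProductName,SupplierID\n1,Widget,1\n2,Gadget,1\n3,Gizmo,2\n"
SUPPLIERS = "SupplierID,CompanyName\n1,Acme\n2,Globex\n"


def load(text: str) -> List[Dict[str, str]]:
    # Like `SET p = row`, each header becomes a key spelled exactly as in the file.
    return list(csv.DictReader(io.StringIO(text)))


def supplies(
    products: List[Dict[str, str]], suppliers: List[Dict[str, str]], key: str
) -> List[Tuple[str, str]]:
    """Mimic `WHERE p.<key> = s.<key>`: a missing key is null, and null never matches."""
    pairs = []
    for p in products:
        for s in suppliers:
            pv, sv = p.get(key), s.get(key)
            if pv is not None and pv == sv:
                pairs.append((s["CompanyName"], p["ProductName"]))
    return pairs


products, suppliers = load(PRODUCTS), load(SUPPLIERS)
print(list(products[0].keys()))                     # ['ProductID', 'ProductName', 'SupplierID']
print(supplies(products, suppliers, "supplierID"))  # [] (like "(no changes, no records)")
print(supplies(products, suppliers, "SupplierID"))  # [('Acme', 'Widget'), ('Acme', 'Gadget'), ('Globex', 'Gizmo')]
```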
Thanks for all the suggestions. With Neo4j there are always multiple ways to solve the problem. I did some digging and found a rather simple solution.
MATCH (a)-[r1]->()-[r3]->(b) CREATE UNIQUE (a)-[:REQUIRES]-(b);
Literal Code (for me) is:
MATCH (a:Application)-[:CONSISTS_OF]->()-[:USES]->(o:Object) CREATE UNIQUE (a)-[:REQUIRES]-(o);
This collapsed each two-hop path through the intermediate (n2) nodes into a single direct REQUIRES relationship, making the individual n2 nodes redundant for the query.
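Here is what that one-liner computes, as a plain Python sketch with hypothetical application, component and object names. It joins CONSISTS_OF and USES on the intermediate node and keeps each resulting pair once, which is the effect CREATE UNIQUE has on repeated paths.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def derive_requires(consists_of: List[Edge], uses: List[Edge]) -> Set[Edge]:
    """(application, object) pairs joined by CONSISTS_OF then USES, each kept once."""
    # Index USES edges by their start node (the intermediate "n2" node).
    uses_by_middle: Dict[str, List[str]] = {}
    for middle, obj in uses:
        uses_by_middle.setdefault(middle, []).append(obj)
    requires: Set[Edge] = set()
    for app, middle in consists_of:
        for obj in uses_by_middle.get(middle, []):
            # Adding to a set plays the role of CREATE UNIQUE: repeated paths add nothing.
            requires.add((app, obj))
    return requires


# Hypothetical data: two different components of app1 both use obj-a.
consists_of = [("app1", "comp1"), ("app1", "comp2")]
uses = [("comp1", "obj-a"), ("comp2", "obj-a"), ("comp2", "obj-b")]
print(sorted(derive_requires(consists_of, uses)))  # [('app1', 'obj-a'), ('app1', 'obj-b')]
```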
Namaste Everyone!
Dean

Return only results based on current object for dynamic menus

If I have an object that has_many, how would I go about getting back only the records whose ids are referenced by the associated records?
Example:
tier_tbl
| id | name |
| 1 | low |
| 2 | med |
| 3 | high |
randomdata_tbl
| id | tier_id | name |
| 1 | 1 | xxx |
| 2 | 1 | yyy |
| 3 | 2 | zzz |
I would like to build a query that returns only, in the case of the above example, rows 1 and 2 from tier_tbl, because only 1 and 2 exist in the tier_id data.
I'm new to ActiveRecord, and without a loop I don't know a good way of doing this. Does Rails allow this kind of query to be built more easily?
The reason is that I want to list only the menu items that relate to the specific object I am dealing with. If that object has only the items contained in randomdata_tbl, there is no reason to display the 3rd tier name, so I'd like to omit it completely. I need to go in this direction because of the way the models are set up; the example I'm dealing with is slightly more complicated.
Thanks
Let's call your first table tiers and your second table randoms.
If a tier has many randoms and you want to find all tiers whose id is present in the randoms table, you can do it this way:
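# database query only (Rails 4+; on Rails 3 use .uniq instead of .distinct)
Tier.joins(:randoms).distinct
or
# with some ruby code (loads every tier, then runs one query per tier)
Tier.select { |t| t.randoms.any? }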
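The joins-based query works because an INNER JOIN drops every tier that has no matching random, and DISTINCT removes the duplicates produced by tiers with several randoms. The sketch below runs that SQL with Python's built-in sqlite3 against the example data. The table names follow the answer, and the SQL only approximates what ActiveRecord generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tiers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE randoms (id INTEGER PRIMARY KEY, tier_id INTEGER, name TEXT);
    INSERT INTO tiers VALUES (1, 'low'), (2, 'med'), (3, 'high');
    INSERT INTO randoms VALUES (1, 1, 'xxx'), (2, 1, 'yyy'), (3, 2, 'zzz');
    """
)

# Roughly the SQL behind the joins-based query: the inner join drops tier 3,
# and DISTINCT collapses the two rows produced for tier 1.
rows = conn.execute(
    "SELECT DISTINCT tiers.id, tiers.name FROM tiers "
    "INNER JOIN randoms ON randoms.tier_id = tiers.id ORDER BY tiers.id"
).fetchall()
print(rows)  # [(1, 'low'), (2, 'med')]
```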

neo4j REST 'Server got itself in trouble'

I am running a very basic test to check my understanding and evaluate the neo4j REST server (neo4j-community-1.8.M07), using the Neo4j Python REST Client.
Each test iteration starts with random strings for the source and destination node names. The names contain only the letters a..z and digits 0..9 (oddly enough, it never failed when I used A..Z and 0..9). A name may be from 1 to 36 characters long, with no repeated characters. I create 36 nodes, where the 1st node name is one character long and the 36th node name has 36 characters. Then I create relations between all pairs of nodes; the name of each relation is the concatenation of the source node name and the destination node name. The final graph has 37 nodes (1 reference node and 36 nodes with names of 1 to 36 non-repeating characters) and 1260 relations. Before each test iteration I clear the graph, so that only the reference node remains.
The problem is that after several successful iterations the neo4j REST server crashes:
Error [500]: Internal Server Error. Server got itself in trouble.
Invalid data sent
The query that crashes the system can be different - here is an example of a query_string that caused a problem:
START n_from=node:index_faqts(node_name="h"),
n_to=node:index_faqts(node_name="hg2b8wpj04ms")
CREATE UNIQUE n_from-[r:`hhg2b8wpj04ms`]->n_to
RETURN r
self.cypher_extension.execute_query( query_string )
I spent a lot of time trying to find a pattern, but in vain. If I were doing something wrong with the queries, none of the tests would ever work. I have observed crashes after anywhere between 5 and 25 successful test cycles.
What might be causing neo4j REST server to crash?
P.S. Some details...
The nodes are created like this:
...
self.index_faqts[ "node_name" ][ p_str_node_name ] =
self.gdb.nodes.create( **p_dict_node_attributes )
...
Just in case: before issuing the query to create a new relation, I check the graph to make sure that the source and destination nodes exist. That check never failed.
You are using too many relationship types; the current limit is 32k. This could be patched in Neo4j if you have a valid use case.
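The numbers in the question fit this explanation. The arithmetic sketch below rests on two stated assumptions. First, "32k" means 2**15 = 32,768 types. Second, clearing the graph deletes relationships but not the relationship types already registered, so every iteration's randomly named types accumulate.

```python
# Assumption: "32k" means 2**15 = 32,768 distinct relationship types.
TYPE_LIMIT = 2 ** 15

# 36 named nodes per iteration; the reference node takes part in no relationship.
NODES = 36
# One relationship per ordered pair of distinct nodes, each with its own type name
# (the concatenation of the two node names).
RELS_PER_ITERATION = NODES * (NODES - 1)

# Assumption: random names make nearly every type name new in each iteration, and
# clearing the graph does not free the types registered by earlier iterations.
full_iterations = TYPE_LIMIT // RELS_PER_ITERATION
print(RELS_PER_ITERATION, full_iterations)  # 1260 26
```

Under these assumptions the type table would run out during roughly the 27th iteration, close to the observed maximum of 25 successful rounds. Crashes after fewer rounds would be consistent with types left over from earlier runs against the same store. Storing the name as a property on a single relationship type would avoid the limit altogether.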
