LOAD CSV in Neo4j does not create all the relationships

Good day to all, please help me with this problem :D
when I execute my query:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Create_all.csv" AS row
MATCH(x:Category{uuid:row.uuid_category})
MERGE (t:Subscriber{name:row.name_subscriber, uuid:row.uuid_subscriber})
CREATE (n:Product{name: row.name_product, uuid: row.uuid_product}),
(Price:AttributeValue{name:'Price', value: row.price_product}),
(Stock:AttributeValue{name:'Stock', value: row.stock_product }),
(Style:AttributeValue{name:'Style', value: 'Pop Art'}),
(Subject:AttributeValue{name:'Subject', value: 'Portrait'}),
(Originality:AttributeValue{name:'Originality', value: 'Reproduction'}),
(Region:AttributeValue{name:'Region', value: 'Japan'}),
(Price)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Stock)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Style)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Subject)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Originality)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Region)-[:IS_ATTRIBUTEVALUE_OF]->(n)
WITH n, t, x
CREATE (n)-[:OF_CATEGORY]->(x)
CREATE (t)-[:SELLS]->(n)
The format of my CSV is as follows:
I have 4 categories, 30 products and 10 subscribers. Running the query gives me:
Added 164 labels, created 164 nodes, set 328 properties, created 184
relationships, completed after 254 ms.
I verify the result with:
MATCH p=()-[r:OF_CATEGORY]->() RETURN count(r)
Only 23 relationships were created; the remaining 7 were not.
Please guide me: the query should create all the relationships, which in this case would be 30 relationships between products and categories.

The critical part is MATCH(x:Category{uuid:row.uuid_category})
If that match fails for a row, the row will be wiped out and none of the other operations for that row will execute.
Since your input consists of 4 distinct categories (let's call them 1, 2, 3, and 4) each repeating 7 times (for 28 rows so far), and then two of those occurring one more time each (for a total of your entire 30 rows), it would make sense if some of your matches are failing, with :Category nodes for some of those uuid_category properties not actually being present in the graph.
Of those uuids (1, 2, 3, and 4), only 1 and 2 occur at the end (so these two occur across 8 rows each, as opposed to 7 rows for uuids 3 and 4). It would make sense if either uuid 3 or 4 doesn't have a corresponding node in the graph. That would give us 1 * 7 + 2 * 8 = 23, which is the number of relationships that your query is creating.
So there is no :Category node for the uuid_category ending with either 3 or 4.
Check your graph against your data to confirm.
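A quick way to do that check (just a sketch, reusing the same CSV file and property names from your query) is to list the uuid_category values that have no matching :Category node:
LOAD CSV WITH HEADERS FROM "file:///Create_all.csv" AS row
// keep only the rows whose category uuid has no matching :Category node
OPTIONAL MATCH (x:Category {uuid: row.uuid_category})
WITH row, x
WHERE x IS NULL
RETURN row.uuid_category AS missingCategoryUuid, count(*) AS rowsAffected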

Related

Neo4j - Return Specific Limited-Number of Relationships/Nodes depending on a calculation

How is it possible to return a limited number of nodes/relationships depending on a calculation?
I prepared one example:
Imagine we have 4 users (nodes) who each collect (relationship) 40kg Apples with a created-timestamp. On the other hand, we have a basket (node) which can take 100kg. How is it possible to return only the oldest 3 relationships, because the basket can be filled with those 3 relationships?
In other words:
The sum of collected kilos of the 3 oldest relationships is over the basket size of 100kg. If we took only the 2 oldest relationships, the sum of collected kilos would be 80, which is too little for the basket. Taking all 4 relationships would result in 160kg, which is far too much.
The background of my question is to reduce the size of the query result. If a query returned all 4 relationships and the 4th one would always be ignored anyway, returning it is unnecessary in the first place.
Thank you
In Neo4j the relationships/nodes of the example schema could be created by this:
create
(_0:`Basket` {`size`:100}),
(_1:`User` {`name`:"Franc"}),
(_6:`User` {`name`:"Peter"}),
(_34:`User` {`name`:"Betty"}),
(_35:`User` {`name`:"Rita"}),
(_54:`Fruit` ),
(_1)-[:`COLLECTS` {`created`:"20221206212715",`kilos`:40,`type`:"Apples"}]->(_54),
(_6)-[:`COLLECTS` {`created`:"20221206212417",`kilos`:40,`type`:"Apples"}]->(_54),
(_34)-[:`COLLECTS` {`created`:"20221206212547",`kilos`:40,`type`:"Apples"}]->(_54),
(_35)-[:`COLLECTS` {`created`:"20221206212815",`kilos`:40,`type`:"Apples"}]->(_54)
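One way to approach this (just a sketch, assuming the nodes and COLLECTS relationships from the create statement above, and that the created strings sort chronologically): order the relationships by created, collect them, and keep the shortest prefix whose summed kilos reaches the basket size.
MATCH (b:Basket)
MATCH (:User)-[c:COLLECTS]->(:Fruit)
WITH b, c
ORDER BY c.created ASC
WITH b, collect(c) AS colls
// smallest number of oldest relationships whose kilos reach the basket size (null if none do)
WITH colls, head([i IN range(1, size(colls))
                  WHERE reduce(total = 0, x IN colls[0..i] | total + x.kilos) >= b.size]) AS needed
RETURN colls[0..needed] AS oldestRelationshipsThatFillTheBasket
For the data above this returns the 3 oldest COLLECTS relationships (40 + 40 + 40 = 120kg, which reaches the 100kg basket size), and null if even all relationships together could not fill the basket.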

Find nodes with 3+ occurrences in a 10 minute period

I have a list of nodes with a startTime property. I need to determine if the list contains a clump of 3 or more nodes with a startTime within 10 minutes of each other. I don't need to get the nodes that are in the clump, I just need a boolean indicating the existence of such a clump.
I am at a loss, everything I have tried fails so badly that it is not worth posting them.
I feel that I am missing something easy.
This should be doable.
First you'll need to order the startTimes and collect them.
From there, you'll need to get the relevant pairings (each entry, and the entry 2 indices ahead of it as the end of the window) that would comprise a group of 3, then see whether the start times of that pair occur within 10 minutes of each other.
Assuming for the sake of example :Event nodes with a startTime property, you might use this query to get the results you want:
MATCH (e:Event)
WITH e
ORDER BY e.startTime ASC
WITH collect(e.startTime) as times
WITH times, range(0, size(times) - 3) as indices
RETURN any(index in indices WHERE times[index + 2] <= times[index] + duration({minutes:10}))
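Note that this assumes startTime is stored as a temporal value (e.g. a datetime), so that adding duration({minutes:10}) is valid; if startTime were stored as epoch milliseconds instead, the check would be times[index + 2] <= times[index] + 600000.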

Neo4j - Problems with MATCH JOIN logic

I am having a problem creating a JOIN (MATCH) relationship. I am using the Neo4j example for the Northwinds graph database load as my learning example.
I have 2 simple CSV files that I successfully loaded via LOAD CSV WITH HEADERS. I then set 2 indexes, one for each entity. My final step is to create the MATCH (JOIN) statement. This is where I am having problems.
After running the script, instead of telling me how many relationships it created, my return message is "(no changes, no records)". Here are my script lines:
LOAD CSV WITH HEADERS FROM 'FILE:///TestProducts.csv' AS row
CREATE (p:Product)
SET p = row
Added 113 labels, created 113 nodes, set 339 properties, completed after 309 ms.
LOAD CSV WITH HEADERS FROM 'FILE:///TestSuppliers.csv' AS row
CREATE (s:Supplier)
SET s = row
Added 23 labels, created 23 nodes, set 46 properties, completed after 137 ms.
CREATE INDEX ON :Product(productID)
Added 1 index, completed after 20 ms.
CREATE INDEX ON :Supplier(supplierID)
Added 1 index, completed after 2 ms.
MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p)
(no changes, no records)
Why? If I run the Northwind example with the example files, it works; it says 77 relationships were created. Also, is there any way to see the database structure? How can I debug this issue? Any help is greatly appreciated.
I think you may be using the wrong casing for the property names. The NorthWind data uses uppercased first letters for its property names.
Try using ProductID and SupplierID in your indexes and the MATCH clause.
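For example (just a sketch, assuming the CSV headers really are ProductID and SupplierID, as in the Northwind files), the indexes and the join would then read:
CREATE INDEX ON :Product(ProductID)
CREATE INDEX ON :Supplier(SupplierID)
MATCH (p:Product),(s:Supplier)
WHERE p.SupplierID = s.SupplierID
CREATE (s)-[:SUPPLIES]->(p)
You can also check which property names actually got imported with: MATCH (p:Product) RETURN keys(p) LIMIT 1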
Thanks for all the suggestions. With Neo4j there are always multiple ways to solve the problem. I did some digging and found a rather simple solution.
MATCH (a)-[r1]->()-[r3]->(b) CREATE UNIQUE (a)-[:REQUIRES]-(b);
Literal Code (for me) is:
MATCH (a:Application)-[:CONSISTS_OF]->()-[:USES]->(o:Object) CREATE UNIQUE (a)-[:REQUIRES]-(o);
This grouped the relationships through the intermediate (n2) nodes and created a direct relationship, making the individual n2 nodes redundant for the query.
Namaste Everyone!
Dean

Neo4j: Best way to batch relate nodes using Cypher?

When I run a script that tries to batch merge all nodes of certain types, I am getting some weird performance results.
When merging 2 collections of nodes (~42k) and (~26k), the performance is nice and fast.
But when I merge (~42k) and (5), performance DRAMATICALLY degrades. I'm batching the ParentNodes (so the ~42k are split up in batches of 500). Why does performance drop when I'm, essentially, merging fewer nodes (when the batch size is the same, but the source of the batch set is large and the target set is small)?
Relation Query:
MATCH (s:ContactPlayer)
WHERE has(s.ContactPrefixTypeId)
WITH collect(s) AS allP
WITH allP[7000..7500] as rangedP
FOREACH (parent in rangedP |
MERGE (child:ContactPrefixType
{ContactPrefixTypeId:parent.ContactPrefixTypeId}
)
MERGE (child)-[r:CONTACTPLAYER]->(parent)
SET r.ContactPlayerId = parent.ContactPlayerId ,
r.ContactPrefixTypeId = child.ContactPrefixTypeId )
Performance Results:
Process Starting
Starting to insert Contact items
[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++]
Total time for 42149 Contact items: 19176.87ms
Average time per batch (500): 213.4ms
Longest batch time: 663ms
Starting to insert ContactPlayer items
[++++++++++++++++++++++++++++++++++++++++++++++++++++++++]
Total time for 27970 ContactPlayer items: 9419.2106ms
Average time per batch (500): 167.75ms
Longest batch time: 689ms
Starting to relate Contact to ContactPlayer
[++++++++++++++++++++++++++++++++++++++++++++++++++++++++]
Total time taken to relate Contact to ContactPlayer: 7907.4877ms
Average time per batch (500): 141.151517857143ms
Longest batch time: 883.0918ms for Batch number: 0
Starting to insert ContactPrefixType items
[+]
Total time for 5 ContactPrefixType items: 22.0737ms
Average time per batch (500): 22ms
Longest batch time: 22ms
Already inserted data for Contact.
Starting to relate ContactPrefixType to Contact
[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++]
Total time taken to relate ContactPrefixType to Contact: 376540.8309ms
Average time per batch (500): 4429.78643647059ms
Longest batch time: 14263.1843ms for Batch number: 63
So far, the best I could come up with is the following (and it's a hack, specific to my environment):
If / Else condition:
If childrenNodes.count() < 200 -> assume they are type identifiers for the parent... i.e. ContactPrefixType
Else assume it is a matrix for relating multiple item types together (i.e. ContactAddress)
If childNodes < 200
MATCH (parent:{parentLabel}),
(child:{childLabel} {{childLabelIdProperty}:parent.{parentRelationProperty}})
CREATE child-[r:{relationshipLabel}]->parent
This takes about 3-5 seconds to complete per relationship type
Else
MATCH (child:{childLabel}),
(parent:{parentLabel} {{parentPropertyField}: child.{childLabelIdProperty}})
WITH collect(parent) as parentCollection, child
WITH parentCollection[{batchStart}..{batchEnd}] as coll, child
FOREACH (parent in coll |
CREATE child-[r:{relationshipLabel}]-parent )
I'm not sure this is the most efficient way of doing this, but after trying MANY different options, this seems to be the fastest.
Stats:
insert 225,018 nodes with 2,070,977 properties
create 464,606 relationships
Total: 331 seconds.
Because this is a straight import and I'm not dealing with updates yet, I assume that all the relationships are correct and don't need to worry about invalid data... however, I will try to set properties to the relationship type so as to be able to perform cleanup functions later (i.e. store the parent and child Id's in the relationship type as properties for later reference)
If anyone can improve on this, I would love it.
Can you pass the ids in as parameters rather than fetch them from the graph? The query could look like
MATCH (s:ContactPlayer {ContactPrefixTypeId: {cptid}})
MERGE (c:ContactPrefixType {ContactPrefixTypeId: {cptid}})
MERGE (c)-[:CONTACT_PLAYER]->(s)
If you use the REST API Cypher resource, I think the entity should look something like
{
  "query": ...,
  "params": {
    "cptid": id1
  }
}
If you use the transactional endpoint, it should look something like this. You control transaction size by the number of statements in each call, and also by the number of calls before you commit.
{
  "statements": [
    {
      "statement": ...,
      "parameters": {
        "cptid": id1
      }
    },
    {
      "statement": ...,
      "parameters": {
        "cptid": id2
      }
    }
  ]
}
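Filled in with the Cypher from above (just a sketch; the id value here is made up), a complete request body for the transactional endpoint could look like:
{
  "statements": [
    {
      "statement": "MATCH (s:ContactPlayer {ContactPrefixTypeId: {cptid}}) MERGE (c:ContactPrefixType {ContactPrefixTypeId: {cptid}}) MERGE (c)-[:CONTACT_PLAYER]->(s)",
      "parameters": {
        "cptid": "42"
      }
    }
  ]
}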

multiple loads in neo4j

I have loaded some data into the Neo4j graph database using the batch importer. Now let's say I have to load more data. Do I have to keep track of what was inserted externally, or are there standard features of Neo4j that can be used to:
1) get the id of the last node inserted, so that I know the id for the new node that needs to be inserted and can index accordingly.
2) get the list of nodes already present in the database, so that I can check the uniqueness of the nodes that are going to be inserted. If a node already exists in the database I will just use the same id and won't create a new node.
3) check the uniqueness of the triplets - suppose a triplet "January Month is_a" is already present in the Neo4j database and the new data that I want to insert also has the same triplet; I would like to not insert it, as it will give me duplicate results.
For example, if you add the following data to the Neo4j graph database using batch-importer (https://github.com/jexp/batch-import):
$ cat nodes.csv
name age works_on
Michael 37 neo4j
Selina 14
Rana 6
Selma 4
$ cat nodes_index.csv
0 name age works_on
1 Michael 37 neo4j
2 Selina 14
3 Rana 6
4 Selma 4
$ cat rels.csv
start end type since counter:int
1 2 FATHER_OF 1998-07-10 1
1 3 FATHER_OF 2007-09-15 2
1 4 FATHER_OF 2008-05-03 3
3 4 SISTER_OF 2008-05-03 5
2 3 SISTER_OF 2007-09-15 7
Now, if you have to add more data to the same database, you will need to know the following things:
1) If nodes already exist, what are their ids, so that you can use them while creating a triplet? If not, create a list of such nodes (not yet in the database) and then start from an id that has not been used in the last import, using it as the starting id for a new nodes_index.csv.
2) If a triplet already exists in the database, don't create that triplet again, as it will result in duplicate results when running Cypher queries against the database.
It seems like same issue has been raised here as well: https://github.com/jexp/batch-import/issues/27
Thanks!
1 - Why do you need to know the last node id? You don't need to know the id to insert a new node; it will be added automatically at the first free id in the graph.
2 - For uniqueness, why don't you use a create unique query (for nodes and relationships as well)?
You can check the reference here: http://docs.neo4j.org/chunked/1.8/cypher-query-lang.html
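As an illustration of point 2 (just a sketch with a made-up :Person label, since the CSV above carries no labels; CREATE UNIQUE is what the 1.8 docs describe, and in later Cypher MERGE plays the same role), you can let the database enforce uniqueness instead of tracking ids yourself:
// reuses the node if it already exists, otherwise creates it
MERGE (m:Person {name: 'Michael'})
MERGE (s:Person {name: 'Selina'})
// creates the FATHER_OF relationship only if it is not already there
MERGE (m)-[r:FATHER_OF]->(s)
ON CREATE SET r.since = '1998-07-10'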
