Well, the scenario is like this:
I have created a node called the counter node. Its initial value is 0, and it is incremented every time a user creates an account on my website.
So there are three operations involved:
Read the counter node's value
Do some logic in PHP, here adding 1 to the previous value of the counter node
Write the new value back to the counter node
Now the problem is: if two or more users arrive at exactly the same time, the counter node may be read by the second user before the first user has written the new value. This leaves the value of my counter node in an inconsistent state.
Hope you got what I meant.
Any solution?
I am using Neo4j 1.9.5 and PHP, specifically jadell's Neo4jPHP:
https://github.com/jadell/Neo4jPHP
I have heard of batch processing, but I'm not sure whether it will work here. If there is a solution, can you please give me a short example?
Thanks, Amit Aggarwal
You can't do that with the pure REST API. I would try it with Cypher, maybe something like:
START n=node(123)
SET n.noOfUsers = n.noOfUsers + 1
RETURN n.noOfUsers
This should work in the latest version of Cypher
http://console.neo4j.org/?id=tnkldf
Neo4j 2.0 has mandatory transactions. If you incremented your counter property noOfUsers in a transaction, I'd think that would help you with your concurrency issue.
Just a thought, but first a question: what is the purpose of the counter? Is it for assigning user IDs, or is it strictly informational? If the latter, must you have an exact count? E.g., if you wanted the total number of Twitter or Facebook users, would it matter if the count was off by a few? If the count doesn't need to be exact (or exact at a particular instant in time), you could run a periodic process that returns the count of user nodes, like:
MATCH (n:User)
RETURN count(*)
This would also help you deal with deleted nodes.
I have a Neo4j DB with more than 30 million nodes. I'm wondering what the most efficient approach (regarding memory and speed) would be to do a bulk update of a recurring pattern within a node property, using only the Cypher shell.
E.g.: nodes with the label USER have a String property name like:
'Peter_Test'
If I want to get rid of all the underscores in a bulk update, what is the best way to achieve this without having to select each of the 30 million nodes in a single transaction, update the content, and write it back to the same property?
Selecting all USER nodes up front, followed by an UNWIND over each entry in the selection plus an update, would definitely run into memory issues.
Any advice on how to perform such a task?
You can use the APOC procedure apoc.periodic.iterate for this:
CALL apoc.periodic.iterate(
  "MATCH (o:Order) WHERE o.date > '2016-10-13' RETURN o",
  "MATCH (o)-[:HAS_ITEM]->(i) WITH o, sum(i.value) as value SET o.value = value",
  {batchSize:100})
This is the example from the documentation: you return the nodes to be updated in the first statement and do the update in the second. That way you won't load all 30 million nodes at once; you can tune how many are processed per transaction via the batch size.
Check out the documentation here
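Applied to the question's case, a minimal sketch might look like this (untested; it assumes the USER label and name property from the question, plus Cypher's built-in replace() string function):
CALL apoc.periodic.iterate(
  "MATCH (u:USER) WHERE u.name CONTAINS '_' RETURN u",
  "SET u.name = replace(u.name, '_', '')",
  {batchSize:10000})
Each batch of 10,000 nodes is committed in its own transaction, so you never hold all 30 million nodes in memory at once.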
I am referring to this GraphGist: https://neo4j.com/graphgist/project-management
I'm actually trying to update a project plan when the duration of one task changes.
In the GraphGist, the whole project is always recalculated from the initial activity to the last activity. This doesn't work well for me in a multi-project environment where I don't really know the starting point, nor the end point. What I would like for now is just to update the earliest start of any activity which depends on a task I just updated.
The latest I have is the following:
MATCH p1=(:Activity {description:'Perform needs analysis'})<-[:REQUIRES*]-(j:Activity)
UNWIND nodes(p1) as task
MATCH (pre:Activity)<-[:REQUIRES]-(task:Activity)
WITH MAX(pre.duration+pre.earliest_start) as updateEF,task
SET task.earliest_start = updateEF
The intent is to get all the paths in the project which depend on the task I just updated (in this case: 'Perform needs analysis'); at every step of the path I'm also checking whether there are other dependencies which would override my duration update.
So, of course, it only works on the direct connections.
If I have A<-[:requires]-B<-[:requires]-C and I increase the duration of A, I believe it updates B based on A, but C is then calculated with the duration B had before it was updated.
How can I make this recursive? Maybe using REDUCE?
(still searching...)
This is a very interesting issue.
You want to update the nodes 1 step away from the originally updated node, and then update the nodes 2 steps away (incorporating the previously-updated values as appropriate), and then 3 steps away, and so on until every node reachable from the original node has been updated.
The Cypher planner does not generate code that performs this kind of query/update pattern, where new values are propagated step by step through paths.
However, there is a workaround using the APOC plugin. For example, using apoc.periodic.iterate:
CALL apoc.periodic.iterate(
"MATCH p=(:Activity {description:'Perform needs analysis'})<-[:REQUIRES*]-(task:Activity)
RETURN task ORDER BY LENGTH(p)",
"MATCH (pre:Activity)<-[:REQUIRES]-(task)
WITH MAX(pre.duration+pre.earliest_start) as updateEF, task
SET task.earliest_start = updateEF",
{batchSize:1})
The first Cypher statement passed to the procedure generates the task nodes, ordered by distance from the original node. The second Cypher statement gets the pre nodes for each task, and sets the appropriate earliest_start value for that task. The batchSize:1 option tells the procedure to perform every iteration of the second statement in its own transaction, so that subsequent iterations will see the updated values.
NOTE: If the same task can be encountered multiple times at different distances, you will have to determine if this approach is right for you. Also, you cannot have other operations writing to the DB at the same time, as that could lead to inconsistent results.
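If the same task can appear at several distances, one possible variant (my own sketch, not part of the original answer) is to aggregate in the driving statement so each task is returned exactly once, at its maximum distance from the updated node:
CALL apoc.periodic.iterate(
  "MATCH p=(:Activity {description:'Perform needs analysis'})<-[:REQUIRES*]-(task:Activity)
   RETURN task, max(length(p)) AS dist ORDER BY dist",
  "MATCH (pre:Activity)<-[:REQUIRES]-(task)
   WITH max(pre.duration + pre.earliest_start) AS updateEF, task
   SET task.earliest_start = updateEF",
  {batchSize:1})
Ordering by maximum distance should mean a task is only recalculated after everything upstream of it has been, though you should verify this against your own data.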
Why is Neo4j ORDER BY so slow for a large database? :(
Here is the example query:
PROFILE MATCH (n:Item) RETURN n ORDER BY n.name DESC LIMIT 25
The PROFILE result shows that it reads all records, even though I already have an index on the name property. Reading every node is a real mess for a large number of records.
Is there any solution for this, or is Neo4j just not a good choice for us? :(
Also, is there any way to get the last record from the nodes?
Your question and problem are not very clear.
1) Are you sure that you added the index correctly?
CREATE INDEX ON :Item(name)
In the Neo4j browser execute :schema to see all your indexes.
2) How many Items does your database hold and what running time are you expecting and achieving?
3) What do you mean by 'last record from nodes'?
Indexes are currently only used to find entry points into the graph, not for other purposes such as ordering results.
Index-backed ORDER BY operations have been a highly requested feature for a while, and while we've been tracking and prioritizing this work, several other features have taken priority over it.
I believe index-backed ORDER BY operations are currently scheduled for our 3.5 release, coming in the last few months of 2018.
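In the meantime, one partial workaround (my own sketch, not an official recommendation) is to add a range predicate on the indexed property so the planner can use an index seek instead of a full label scan. This only pays off when the predicate filters out most rows, and the sort itself still happens in memory; the cutoff value 'somevalue' below is hypothetical:
PROFILE MATCH (n:Item)
WHERE n.name > 'somevalue'  // index seek on :Item(name); pick a cutoff that excludes most rows
RETURN n ORDER BY n.name DESC LIMIT 25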
UPDATED: Wes hit a home run here! Thanks. I've added a Rails version I was developing using the Neography gem. It accomplishes the same thing, but his version is much faster; see the comparison below.
I am using a linked list in Neo4j (1.9, REST, Cypher) to help keep the comments in proper order (yes, I know I can sort on the time, etc.):
(object node)---[:comment]--->(comment)--->(comment)--->(comment).... etc
Currently I have 900 comments, and it's taking 7 seconds to get through the whole list, which is completely unacceptable. I'm just returning the ID of each node (I know, don't do this, but it's not the point of my post).
What I'm trying to do is find the IDs of users who commented, so I can return a count (like "Joe and 405 others commented on your post"). Now, I'm not even counting the unique nodes at this point; I'm just returning the author_id for each record. (I'll worry about counting later; first I need to take care of the basic performance issue.)
START object=node(15837) MATCH object-[:COMMENTS*]->comments RETURN comments.author_id
7 seconds is waaaay too long.
Instead of using a linked list, I could simply link all the comments directly to the object node, but that could create a supernode that gets bogged down, and then finding the most recent comments, even with skip and limit, would be dog slow.
Will relationship indexes help here? I've never used them other than to ensure a unique relationship, or to check whether a relationship exists, but can I use them in a Cypher query to help speed things up?
If not, what else can I do to decrease the time it takes to return the IDs?
COMPARISON: Here is the Rails version using "phase II" methods of the Neography gem:
next_node_id = 18233
@neo = Neography::Rest.new
start_node = Neography::Node.load(next_node_id, @neo)
all_nodes = start_node.outgoing(:COMMENTS).depth(10000)
raise all_nodes.size.to_s  # quick hack: raise so the count shows up in the error output
Result: 526 nodes found in 290 ms.
Wes' solution took 5 ms. :-)
Relationship indexes will not help. I'd suggest using an unmanaged extension and the traversal API--it will be a lot faster than Cypher for this particular query on long lists. This example should get you close:
https://github.com/wfreeman/linkedlistlength
I based it on Mark Needham's example here:
http://www.markhneedham.com/blog/2014/07/20/neo4j-2-1-2-finding-where-i-am-in-a-linked-list/
If you're only doing this to return a count, the best solution here is not to figure it out on every query, since it isn't changing that often. Cache the result in a total_comments property on your node, and every time a comment relationship is added or removed, update that count. If you also want to know whether any of the current user's friends commented, so you can say "Joe and 700 others commented on this," you could do a second query:
START joe=node(15830), object=node(15838) MATCH joe-[:FRIENDS]->friend-[:POSTED_COMMENT]->comment<-[:COMMENTS]-object RETURN friend LIMIT 1
You limit it to 1 since you only need the name of one friend who commented. If it returns someone, adjust the displayed number of comments by 1 and include that user's name. You could do that with JS so it doesn't delay your page load. Sorry if my Cypher is a little off; I'm not used to <2.0 syntax.
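A minimal sketch of the cached-counter idea, written in Cypher 2.0 syntax rather than the question's 1.9 (total_comments is the property suggested above; the node ID comes from the question):
START object=node(15837)
SET object.total_comments = coalesce(object.total_comments, 0) + 1
RETURN object.total_comments
You would run this increment (and a matching decrement on delete) in the same transaction that creates or removes the comment relationship, so the cached count can't drift.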
I'm sinking in big trouble.
Can anyone tell me how I can acquire a write lock through Cypher?
Note: I will use the REST API, so my Cypher will be sent from PHP.
EDITED:
Scenario:
I am using the Neo4j REST server and PHP to access it.
I have created a node, say 'counter-node', which generates new user IDs. The logic is to just add 1 to the previous value.
Now, if two users arrive simultaneously, the first user reads the 'counter-node' value, but before it can write the incremented value back, the second user reads it. Thus the value in 'counter-node' is not as expected.
Any help?
You don't need to acquire write locks explicitly. All nodes that you modify in a transaction are write-locked automatically.
So if you do this in your logic:
start tx
increment counter node
read the value of the counter node and set it on the user node as ID
commit tx
no two users will ever get the same ID.
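A single-statement sketch of that logic in Cypher 2.0 (the node ID 123 and the property names are placeholders for illustration):
START counter=node(123)
SET counter.value = counter.value + 1
CREATE (user {id: counter.value, name: 'new user'})
RETURN user.id
Because the SET write-locks the counter node until the transaction commits, concurrent requests serialize on it, and each one sees the previously committed value.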
The popular APOC plugin for Neo4j has a selection of explicit locking procedures that can be called via Cypher, such as CALL apoc.lock.nodes([nodes]).
Learn more at neo4j-contrib.github.io/neo4j-apoc-procedures/#_locking
Note: as far as I can tell, this functionality doesn't exist natively in Cypher, so APOC is probably your best bet.
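For the counter scenario above, a hedged sketch of how that call fits into a query (the :Counter label and value property are made up for illustration):
MATCH (c:Counter {name: 'user-ids'})
CALL apoc.lock.nodes([c])
SET c.value = c.value + 1
RETURN c.value
apoc.lock.nodes is a void procedure, so it can be called mid-query without YIELD; the exclusive lock it takes is held until the enclosing transaction commits.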