I am using this example, http://neo4j.com/docs/stable/cypher-cookbook-newsfeed.html, to maintain newsfeeds for my users. So I use the following to post a status update:
MATCH (me)
WHERE me.name='Bob'
OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.text AS new_status
I've encountered a severe flaw in this and don't know how to fix it. In the rare case where two status updates are posted at almost exactly the same time (e.g., 10 ms apart), instead of replacing the current status, Neo4j creates two status updates. This leads to a much bigger problem: all subsequent updates are posted twice!
This looks like a race condition. To resolve it, you basically need to make sure that only one transaction at a time modifies the status for a given user.
Neo4j's Java API does have the ability to set locks to achieve this. Cypher has no explicit feature for this, but you can, for example, remove a non-existent property to force a lock on the given node. With the lock in place, concurrent transactions have to wait until the lock holder's transaction has finished.
So grab a lock early in your statement:
MATCH (me)
WHERE me.name='Bob'
REMOVE me._not_existing // side effect: grab a lock early
WITH me
OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
DELETE r
CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
WITH latest_update, collect(secondlatestupdate) AS seconds
FOREACH (x IN seconds | CREATE (latest_update)-[:NEXT]->(x))
RETURN latest_update.text AS new_status
Related
I'm working on an app where users can un-bookmark a post they've bookmarked before. But I realized that if a particular user sends multiple requests to un-bookmark a post, node properties get set multiple times. For example, if user 1 bookmarks post 1, noOfBookmarks (on both the user and the post) increases by 1, and when they un-bookmark it, noOfBookmarks decreases by 1. But during concurrent requests I sometimes get incorrect or negative noOfBookmarks values, depending on the number of requests. I'm using MATCH, which returns 0 rows when the pattern can't be found.
I think the problem is the isolation level Neo4j uses. During concurrent requests, the changes made by the first query to run are not visible to other transactions until the first transaction is committed, so the MATCH still returns rows, which is why I'm getting invalid properties. I think what I need is for the transactions to execute sequentially, or to get an exclusive read lock.
I've tried setting a property on the user and post nodes (before MATCHing the bookmark relationship) so that the first transaction takes a write lock on those nodes; a sketch of this attempt appears after the query below. I thought other transactions would wait at that point for the write lock to be released before continuing, but it didn't work.
How do I ensure that the first of the concurrent transactions modifies the graph and the others stop at that MATCH (which is the behaviour with sequential requests)?
This is my cypher query:
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
WITH bookmarkRel, user, post
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
user.noOfBookmarks = user.noOfBookmarks - 1
RETURN post { .* }
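For reference, this is roughly how I attempted the lock-grab mentioned above (a sketch; _lock is just a dummy property name):
MATCH (user:User { id: $userId })
MATCH (post:Post { id: $postId })
SET user._lock = true, post._lock = true // intended side effect: take write locks up front
WITH user, post
MATCH (user)-[bookmarkRel:BOOKMARKED_POST]->(post)
DELETE bookmarkRel
WITH post, user
SET post.noOfBookmarks = post.noOfBookmarks - 1,
user.noOfBookmarks = user.noOfBookmarks - 1
REMOVE user._lock, post._lock
RETURN post { .* }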
Thank you
I need to group an infinite Flux by key, where the key cardinality is high.
For example:
group key is domain url
calls to one domain should be strictly sequential (next call happens after previous one is completed)
calls to different domains should be concurrent
time interval between items with the same key (url) is unknown, but expected to be bursty: several items emitted in a short period of time, then a long pause until the next burst
queue
    .groupBy(keyMapper, groupPrefetch)
    .flatMap(
        { group ->
            group
                .concatMap({ task -> makeSlowRemoteCall(task) }, 0)
                .takeUntil { remoteCallResult -> remoteCallResult == DONE }
                .timeout(groupTimeout, Mono.empty())
                .then()
        },
        concurrency
    )
I cancel the group in two cases:
the result of makeSlowRemoteCall() indicates that, with high probability, there will be no new items in this group in the near future;
the next item is not emitted within groupTimeout. I use the timeout(timeout, fallback) variant to suppress the TimeoutException and let flatMap's inner publisher complete successfully.
I want possible future items with the same key to form a new GroupedFlux and be processed by the same flatMap inner pipeline.
But what happens if a GroupedFlux still has unrequested items when I cancel it?
Does the groupBy operator re-queue them into a new group with the same key, or are they lost forever? If the latter, what is the proper way to solve my problem? I am also not sure whether I need to set the concatMap() prefetch to 0 in this case.
I think the groupBy() operator is not a good fit for my task, with an infinite source and a lot of groups: it makes infinite groups, so idle groups have to be cancelled downstream somehow, but it is not possible to cancel a GroupedFlux with a guarantee that it has no unconsumed elements.
I think it would be great to have a groupBy variant that emits finite groups.
Something like groupBy(keyMapper, boundaryPredicate): when boundaryPredicate returns true, the current group completes, and the next element with the same key starts a new group.
I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used. In addition, if this is the first time this customer-->card relationship has been seen, I want to update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work; however, I think it will re-evaluate the counts every time the relationship is seen, not just on create. Is it possible to use ON MATCH and ON CREATE in this statement to avoid the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
SET r.last_transaction = "30-NOV-2016 07:58:42"
SET a.card_ct = size(()-[:has_card]->(a))
SET c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), and I also want to return something that will let me kick off a bespoke Dijkstra-based search of the neighbourhood. Any ideas how I'd return some variable based on whether this was a new or an existing relationship?
There is no need for you to even have the card_ct or card_count properties.
Since Neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).
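For example, a sketch using the labels from this question, computing both counts on demand:
MATCH (c:customers {customer_id:"12345678"})-[:has_card]->(a:cards)
RETURN SIZE((c)-[:has_card]->()) AS cards_per_customer,
       SIZE(()-[:has_card]->(a)) AS customers_per_card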
How about something like this: create a counter ON MATCH, and if the counter is greater than zero it is an existing relationship; otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
SET r.last_transaction = "30-NOV-2016 07:58:42"
SET a.card_ct = size(()-[:has_card]->(a))
SET c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship
Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis when only 1 customer is associated with 1 card, so I've excluded that case as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship
I am trying to create a social network-like structure.
I would like to create a timeline of posts which looks like this:
(user:Person)-[:POSTED]->(p1:POST)-[:PREV]->(p2:POST)...
My problem is the following.
Assuming a post for the user already exists, I can create a new post by executing the following Cypher query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Assuming the user has not created a post yet, this query fails. So I tried to handle both cases (user has no posts / user has at least one post) in one update query, since I would like to insert a new post at the head of the "post timeline":
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN rel IS NOT NULL THEN [rel] ELSE [] END |
DELETE rel
)
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
CREATE (post)-[:PREV]->(o)
)
MERGE (user)-[:POSTED]->(post)
Is there any kind of if-statement (or some type of CREATE IF NOT NULL) to avoid using a FOREACH loop two times? The query looks a little bit complicated, and I know that each loop will only run at most once.
However, this was the only solution I could come up with after studying this SO post. I read in an older post that there is no such thing as an if-statement.
EDIT: The question is: is it even good to include both cases in one query, given that I know the "no-post case" will only occur once and all other cases are "at least one post"?
Cheers
I've seen a solution to cases like this in some articles: to use a single query for all cases, you can create a special terminating node for the list of posts. A person with no posts would look like:
(:Person)-[:POSTED]->(:PostListEnd)
Now in all cases you can run the query:
MATCH (user:Person {id:#id})-[rel:POSTED]->(prev_post)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
(post)-[:PREV]->(prev_post);
Note that no label is specified for prev_post, so it can match either a (:POST) or the (:PostListEnd).
After running the query, a person with 1 post will look like:
(:Person)-[:POSTED]->(:POST)-[:PREV]->(:PostListEnd)
Since the PostListEnd node has no info of its own, you can have the same one node for all your users.
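For example, a minimal sketch of the setup (#newId is a placeholder, like the question's #id):
// One-time setup: create the single shared terminator node
MERGE (:PostListEnd);

// When registering a new user, attach them to the terminator
MATCH (end:PostListEnd)
CREATE (:Person {id:"#newId"})-[:POSTED]->(end);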
I also do not see a better solution than using FOREACH.
However, I think I can make your query a bit more efficient. My solution essentially merges the 2 FOREACH tests into 1, since prev_post and rel must either be both NULL or both non-NULL. It also combines the CREATE and the MERGE (which should have been a CREATE anyway).
MATCH (user:Person {id:"#id"})
OPTIONAL MATCH (user)-[rel:POSTED]->(prev_post:POST)
CREATE (user)-[:POSTED]->(post:POST {post:"#post2", created:timestamp()})
FOREACH (o IN CASE WHEN prev_post IS NOT NULL THEN [prev_post] ELSE [] END |
DELETE rel
CREATE (post)-[:PREV]->(o)
)
The Neo4j v3.2 developer manual specifies how you can create essentially a composite key made of multiple node properties:
CREATE CONSTRAINT ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
However, this is only available for the Enterprise Edition, not Community.
"CASE" is as close to an if-statement as you're going to get, I think.
The FOREACH probably isn't so bad, given that you're likely limited in scope. But I see no particular downside to separating the query into two, especially to keep it readable and given that the operations are fairly small; a sketch of the split follows below.
Just my two cents.
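For example, a sketch of the split, reusing the queries from the question (run the second statement only when the first matches nothing):
// Case 1: the user already has a post (the relink query from the question)
MATCH (user:Person {id:"#id"})-[rel:POSTED]->(prev_post:POST)
DELETE rel
CREATE (user)-[:POSTED]->(post:POST {post:"#post", created:timestamp()}),
       (post)-[:PREV]->(prev_post);

// Case 2: first post for this user (run only if case 1 returned no rows)
MATCH (user:Person {id:"#id"})
CREATE (user)-[:POSTED]->(:POST {post:"#post", created:timestamp()});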
I have recently started with Neo4j and graph databases.
I am using this API for the persistence of my model. I have everything done and working, but my problem is efficiency.
First, the scenario: I have a couple of XML documents which translate to some nodes and relationships between them. As I have read that this API does not yet support batch insertion, I am creating the nodes and relationships one at a time.
This is the code I am using to create a node:
var newEntry = new EntryNode { hash = incremento++.ToString() };
var result = client.Cypher
    .Merge("(entry:EntryNode {hash: {_hash} })")
    .OnCreate()
    .Set("entry = {newEntry}")
    .WithParams(new
    {
        _hash = newEntry.hash,
        newEntry
    })
    .Return(entry => new
    {
        EntryNode = entry.As<Node<EntryNode>>()
    });
I accept that it takes time to create all the nodes, but I do not understand why the time to create a single node grows so fast. In my tests, creating an EntryNode takes about 0.2 seconds at first, but by the time roughly 500 nodes have been created it has grown to ~2 seconds per node.
I have also created an index on EntryNode(hash) manually in the console before inserting any data, and I ran the tests both with and without the index.
Am I doing something wrong? Is this time normal?
EDITED:
@Tatham
Thanks for the answer, it really helped. Now I am using the FOREACH statement in neo4jclient to create 1000 nodes in just 2 seconds.
On a related topic, now that I create the nodes this way I also want to create the relationships between them. This is the code I am trying right now, but I get some errors:
client.Cypher
    .Match("(e:EntryNode)")
    .Match("(p:EntryPointerNode)")
    .ForEach("(n in {set} | " +
        "FOREACH (e in (CASE WHEN e.hash = n.EntryHash THEN [e] END) " +
        "FOREACH (p in pointers (CASE WHEN p.hash = n.PointerHash THEN [p] END) " +
        "MERGE ((p)-[r:PointerToEntry]->(ee)) )))")
    .WithParam("set", nodesSet)
    .ExecuteWithoutResults();
What I want it to do is: given a list of pairs of strings, find the (unique) nodes whose "hash" property equals each string value, and create a relationship between them. I have tried a couple of variants of this query, but I can't seem to find the solution.
Is this possible?
This approach is going to be very slow, because you make a separate HTTP call to Neo4j for every node you insert, and each call is its own transaction. Finally, you are also returning the node back, which is probably a waste.
There are two options for doing this in batches instead.
From https://stackoverflow.com/a/21865110/211747, you can do something like this, where you pass in a set of objects and then FOREACH through them in Cypher. This means one, larger, HTTP call to Neo4j, executed in a single transaction on the DB:
FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)
http://docs.neo4j.org/chunked/stable/query-foreach.html
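With neo4jclient, that might look roughly like this (a sketch; nodesSet stands in for your list of objects, each carrying an Id property):
// One HTTP call and one transaction for the whole batch
client.Cypher
    .ForEach("(n in {set} | MERGE (c:Label { Id : n.Id }) SET c = n)")
    .WithParam("set", nodesSet)
    .ExecuteWithoutResults();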
The other option, coming soon, is that you will be able to write something like this in Cypher:
LOAD CSV WITH HEADERS FROM 'file://c:/temp/input.csv' AS n
MERGE (c:Label { Id : n.Id })
SET c = n
https://github.com/davidegrohmann/neo4j/blob/2.1-fix-resource-failure-load-csv/community/cypher/cypher/src/test/scala/org/neo4j/cypher/LoadCsvAcceptanceTest.scala