Query timeout in Neo4j 3.0.6

It looks like the previously working approach is now deprecated:
unsupported.dbms.executiontime_limit.enabled=true
unsupported.dbms.executiontime_limit.time=1s
According to the documentation, the new settings responsible for timeout handling are:
dbms.transaction.timeout
dbms.transaction_timeout
At the same time, the new settings appear to relate to transactions.
The new timeout settings do not seem to work. They were set in neo4j.conf as follows:
dbms.transaction_timeout=5s
dbms.transaction.timeout=5s
A slow Cypher query isn't terminated.
Then a Neo4j plugin was added to model a slow query inside a transaction:
#Procedure("test.slowQuery")
public Stream<Res> slowQuery(#Name("delay") Number Delay )
{
ArrayList<Res> res = new ArrayList<>();
try ( Transaction tx = db.beginTx() ){
Thread.sleep(Delay.intValue(), 0);
tx.success();
} catch (Exception e) {
System.out.println(e);
}
return res.stream();
}
The procedure exposed by the plugin is called via the neoism Go package, and the timeout isn't triggered either.

The timeout is only honored if your procedure code either invokes operations on the graph, such as reading nodes and relationships, or explicitly checks whether the current transaction is marked as terminated.
For the latter, see https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/master/src/main/java/apoc/util/Utils.java#L41-L51 as an example.
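For illustration, a minimal sketch of such a termination-aware procedure (not the APOC code itself): it sleeps in short slices and touches the graph on each slice, so the transaction guard gets a chance to abort it once the configured timeout has elapsed. The test.slowQueryGuarded name and the 50 ms slice are arbitrary, and it reuses the Res result class and the @Context-injected db from the question.

@Procedure("test.slowQueryGuarded")
public Stream<Res> slowQueryGuarded(@Name("delay") Number delay)
{
    long deadline = System.currentTimeMillis() + delay.longValue();
    while (System.currentTimeMillis() < deadline) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
        // Any graph operation lets the kernel notice that the surrounding
        // transaction has been terminated; if it has, this call throws and
        // the procedure stops early instead of sleeping the full delay.
        db.execute("RETURN 1").close();
    }
    return Stream.empty();
}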

According to the documentation, the transaction guard is concerned with orphaned transactions only.
The server guards against orphaned transactions by using a timeout. If there are no requests for a given transaction within the timeout period, the server will roll it back. You can configure the timeout in the server configuration, by setting dbms.transaction_timeout to the number of seconds before timeout. The default timeout is 60 seconds.
I have not found a way to trigger the timeout for a query that is not orphaned using native functionality alone.
@StefanArmbruster pointed in a good direction: the timeout-triggering behavior can be obtained by creating a wrapper procedure in a Neo4j plugin, as is done in APOC.

Related

Are stored procedures in Cosmos DB automatically retried on conflict?

Stored procedures in Cosmos DB are transactional and run under snapshot isolation with optimistic concurrency control. That means that write conflicts can occur, but they are detected, so the transaction is rolled back.
If such a conflict occurs, does Cosmos DB automatically retry the stored procedure, or does the client receive an exception (maybe a HTTP 412 precondition failure?) and need to implement the retry logic itself?
I tried running 100 instances of a stored procedure in parallel that would produce a write conflict by reading a document (without setting _etag), waiting for a while, and then incrementing an integer property within that document (again without setting _etag).
In all trials so far, no errors occurred, and the result was as if the 100 instances had run sequentially. So the preliminary answer is: yes, Cosmos DB automatically retries running an SP on write conflicts (or perhaps enforces transactional isolation by some other means, such as locking), so clients hopefully don't need to worry about aborted SPs due to conflicts.
It would be great to hear from a Cosmos DB engineer how this is achieved: retry, locking or something different?
You're correct in that this isn't properly documented anywhere. Here's how an OCC check can be done in a stored procedure:
function storedProcedureWithEtag(newItem)
{
    var context = getContext();
    var collection = context.getCollection();
    var response = context.getResponse();

    if (!newItem) {
        throw 'Missing item';
    }

    // update the item to set changed time
    newItem.ChangedTime = (new Date()).toISOString();

    var etagForOcc = newItem._etag;
    var upsertAccepted = collection.upsertDocument(
        collection.getSelfLink(),
        newItem,
        { etag: etagForOcc }, // <-- Pass in the etag
        function (err2, feed2, options2) {
            if (err2) throw err2;
            response.setBody(newItem);
        }
    );
    if (!upsertAccepted) {
        throw "Unable to upsert item. Id: " + newItem.id;
    }
}
Credit: https://peter.intheazuresky.com/2016/12/22/documentdb-optimistic-concurrency-in-a-stored-procedure/
The SDK does not retry on a 412. 412 failures are related to optimistic concurrency, and in those cases you are controlling the ETag that you pass in. It is expected that the user handles the 412 by reading the newest version of the document, obtaining the newer ETag, and retrying the operation with the updated value.
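For completeness, a rough Java sketch of that client-side retry loop. Everything in it (Item, Store, readLatest, replaceIfEtagMatches, PreconditionFailed) is a hypothetical stand-in for whatever your Cosmos SDK exposes; only the pattern itself - read, mutate, replace conditioned on the ETag, retry on 412 - is the point.

class OccRetrySketch {
    static class Item { String id; String etag; int counter; }
    static class PreconditionFailed extends RuntimeException { } // i.e. HTTP 412

    // Hypothetical data-access interface standing in for the real SDK calls.
    interface Store {
        Item readLatest(String id);                        // plain read; returns the current _etag
        void replaceIfEtagMatches(Item item, String etag); // conditional replace; throws PreconditionFailed on 412
    }

    static void incrementWithRetry(Store store, String id, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Item current = store.readLatest(id);  // pick up the newest version and ETag
            current.counter += 1;                 // the actual change
            try {
                store.replaceIfEtagMatches(current, current.etag);
                return;                           // accepted: no conflict this time
            } catch (PreconditionFailed e) {
                // another writer got in first; loop around, re-read, retry
            }
        }
        throw new IllegalStateException("Still conflicting after " + maxAttempts + " attempts");
    }
}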
Example for V3 SDK
Example for V2 SDK

How to parallelize HTTP requests within an Apache Beam step?

I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once I have 75. This turned out to be too slow, so I thought of executing those HTTP requests on different threads using a thread pool instead.
The implementation of what I have right now looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
    @Transient
    lateinit var api: TheApi

    @Transient
    lateinit var currentBatch: MutableList<TheEvent>

    @Transient
    lateinit var executor: ExecutorService

    @Setup
    fun setup() {
        api = buildApi()
        executor = Executors.newCachedThreadPool()
    }

    @StartBundle
    fun startBundle() {
        currentBatch = mutableListOf()
    }

    @ProcessElement
    fun processElement(processContext: ProcessContext) {
        val record = processContext.element()
        currentBatch.add(record)
        if (currentBatch.size >= 75) {
            flush()
        }
    }

    private fun flush() {
        val payloadTrack = currentBatch.toList()
        executor.submit {
            api.sendToApi(payloadTrack)
        }
        currentBatch.clear()
    }

    @FinishBundle
    fun finishBundle() {
        if (currentBatch.isNotEmpty()) {
            flush()
        }
    }

    @Teardown
    fun teardown() {
        executor.shutdown()
        executor.awaitTermination(30, TimeUnit.SECONDS)
    }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that when load testing (by sending a few million events to Pub/Sub), it takes up to 8 times longer for the pipeline to forward those messages to the API (which has response times under 8 ms) than it takes my laptop to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also... am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by getting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
Are you doing this right / do you need to change anything?
Do you need to wait in @FinishBundle?
The second answer: yes. But actually you need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests - it does not ensure they have succeeded. So you could lose data that way if the requests subsequently fail. Your @FinishBundle method should actually be blocking and waiting for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after finishing the bundle, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
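A Java sketch of that change (the same idea maps directly onto the Kotlin DoFn above, reusing its api, executor and currentBatch fields): keep the futures returned by the executor and block on them in @FinishBundle, so the bundle only commits after every request has been confirmed.

// fields api, executor and currentBatch as in the DoFn above
private transient List<Future<?>> pendingRequests;

@StartBundle
public void startBundle() {
    currentBatch = new ArrayList<>();
    pendingRequests = new ArrayList<>();
}

private void flush() {
    final List<TheEvent> payload = new ArrayList<>(currentBatch);
    pendingRequests.add(executor.submit(() -> api.sendToApi(payload)));
    currentBatch.clear();
}

@FinishBundle
public void finishBundle() throws Exception {
    if (!currentBatch.isEmpty()) {
        flush();
    }
    for (Future<?> f : pendingRequests) {
        f.get();   // propagates failures, so the runner retries the whole bundle
    }
    pendingRequests.clear();
}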
You may find that doing so will cause your pipeline to slow down, because @FinishBundle happens more frequently than @Setup. To batch up requests across bundles you need to use the lower-level features of state and timers. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
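And a compressed Java sketch of that state-and-timers approach, assuming Beam 2.x state/timer APIs; the input must be keyed first (for example with a hashed or random key), and TheEvent plus the API call are placeholders carried over from the question:

// assumed imports: org.apache.beam.sdk.transforms.DoFn, org.apache.beam.sdk.state.*,
// org.apache.beam.sdk.values.KV, org.joda.time.Duration
class BatchAcrossBundlesFn extends DoFn<KV<String, TheEvent>, Void> {
    private static final int BATCH_SIZE = 75;

    @StateId("buffer")
    private final StateSpec<BagState<TheEvent>> bufferSpec = StateSpecs.bag();

    @StateId("count")
    private final StateSpec<ValueState<Integer>> countSpec = StateSpecs.value();

    @TimerId("stale")
    private final TimerSpec staleSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @ProcessElement
    public void process(ProcessContext c,
                        @StateId("buffer") BagState<TheEvent> buffer,
                        @StateId("count") ValueState<Integer> count,
                        @TimerId("stale") Timer stale) {
        buffer.add(c.element().getValue());
        int n = (count.read() == null ? 0 : count.read()) + 1;
        if (n >= BATCH_SIZE) {
            sendBatchAndWait(buffer.read());   // blocking call to the API
            buffer.clear();
            count.clear();
        } else {
            count.write(n);
            // make sure a partial batch is flushed even if no more elements arrive
            stale.offset(Duration.standardSeconds(10)).setRelative();
        }
    }

    @OnTimer("stale")
    public void onStale(@StateId("buffer") BagState<TheEvent> buffer,
                        @StateId("count") ValueState<Integer> count) {
        if (buffer.read().iterator().hasNext()) {
            sendBatchAndWait(buffer.read());
            buffer.clear();
            count.clear();
        }
    }

    private void sendBatchAndWait(Iterable<TheEvent> batch) {
        // send synchronously (or wait on the future) so state is cleared only
        // after the API has confirmed the batch
    }
}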
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.

Why is GroupByKey in beam pipeline duplicating elements (when run on Google Dataflow)?

Background
We have a pipeline that starts by receiving messages from PubSub, each with the name of a file. These files are exploded to line level, parsed into JSON object nodes, and then sent to an external decoding service (which decodes some encoded data). The object nodes are eventually converted to TableRows and written to BigQuery.
It appeared that Dataflow was not acknowledging the PubSub messages until they arrived at the decoding service. The decoding service is slow, resulting in a backlog when many messages are sent at once. This means that lines associated with a PubSub message can take some time to arrive at the decoding service. As a result, PubSub was receiving no acknowledgement and resending the message. My first attempt to remedy this was adding an attribute to each PubSub message that is passed to the Reader using withAttributeId(). However, on testing, this only prevented duplicates that arrived close together.
My second attempt was to add a fusion breaker (example) after the PubSub read. This simply performs a needless GroupByKey and then ungroups, the idea being that the GroupByKey forces Dataflow to acknowledge the PubSub message.
The Problem
The fusion breaker discussed above works in that it prevents PubSub from resending messages, but I am finding that this GroupByKey outputs more elements than it receives: See image.
To try and diagnose this I have removed parts of the pipeline to get a simple pipeline that still exhibits this behavior. The behavior remains even when
PubSub is replaced by some dummy transforms that send out a fixed list of messages with a slight delay between each one.
The Writing transforms are removed.
All Side Inputs/Outputs are removed.
The behavior I have observed is:
Some number of the received messages pass straight through the GroupByKey.
After a certain point, messages are 'held' by the GroupByKey (presumably due to the backlog after the GroupByKey).
These messages eventually exit the GroupByKey (in groups of size one).
After a short delay (about 3 minutes), the same messages exit the GroupByKey again (still in groups of size one). This may happen several times (I suspect it is proportional to the time they spend waiting to enter the GroupByKey).
Example job id is 2017-10-11_03_50_42-6097948956276262224. I have not run the pipeline on any other runner.
The Fusion Breaker is below:
@Slf4j
public class FusionBreaker<T> extends PTransform<PCollection<T>, PCollection<T>> {

    @Override
    public PCollection<T> expand(PCollection<T> input) {
        return group(window(input.apply(ParDo.of(new PassthroughLogger<>(PassthroughLogger.Level.Info, "Fusion break in")))))
                .apply("Getting iterables after breaking fusion", Values.create())
                .apply("Flattening iterables after breaking fusion", Flatten.iterables())
                .apply(ParDo.of(new PassthroughLogger<>(PassthroughLogger.Level.Info, "Fusion break out")));
    }

    private PCollection<T> window(PCollection<T> input) {
        return input.apply("Windowing before breaking fusion", Window.<T>into(new GlobalWindows())
                .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
                .discardingFiredPanes());
    }

    private PCollection<KV<Integer, Iterable<T>>> group(PCollection<T> input) {
        return input.apply("Keying with random number", ParDo.of(new RandomKeyFn<>()))
                .apply("Grouping by key to break fusion", GroupByKey.create());
    }

    private static class RandomKeyFn<T> extends DoFn<T, KV<Integer, T>> {
        private Random random;

        @Setup
        public void setup() {
            random = new Random();
        }

        @ProcessElement
        public void processElement(ProcessContext context) {
            context.output(KV.of(random.nextInt(), context.element()));
        }
    }
}
The PassthroughLoggers simply log the elements passing through (I use these to confirm that elements are indeed repeated, rather than there being an issue with the counts).
I suspect this is something to do with windows/triggers, but my understanding is that elements should never be repeated when .discardingFiredPanes() is used - regardless of the windowing setup. I have also tried FixedWindows with no success.
First, the Reshuffle transform is equivalent to your Fusion Breaker, but has some additional performance improvements that should make it preferable.
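For example, the custom transform can usually be replaced with something like the following (the exact factory method depends on your Beam version: Reshuffle.viaRandomKey() for an un-keyed PCollection, Reshuffle.of() for a PCollection of KVs; fileNames here is a hypothetical PCollection<String> of the incoming messages):

// imports: org.apache.beam.sdk.transforms.Reshuffle, org.apache.beam.sdk.values.PCollection
PCollection<String> unfused =
        fileNames.apply("Break fusion", Reshuffle.viaRandomKey());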
Second, both counters and logging may see an element multiple times if it is retried. As described in the Beam Execution Model, an element at a step may be retried if anything that is fused into it is retried.
Have you actually observed duplicates in what is written as the output of the pipeline?

Zendesk - CreateUser JobStatus Results is Null

So I am currently working with the Zendesk API. I am creating many users using the batch createUsers method, which takes an array of up to 100 user objects. Now, for some reason, some users fail to be created. Therefore, I wanted to obtain the result from the JobStatus after creating the users so that I can identify the problem easily. The issue is that the result variable is null after the createUsers() method has been performed.
Some sample code:
public static void createEndUsers(Zendesk zd) {
    for (Organization org : zd.getOrganizations()) {
        List<User> usersList = getUsersPerOrg(org, 0);
        JobStatus js = zd.createUsers(usersList);
        List<?> resultElements = js.getResults();
    }
}
Why is getResults() returning null in this instance? Is it because the operation has not yet been performed when it reaches that part of the code? How can I make sure that I "wait" until the result is ready before I try to access it?
If you have a look at the possible values of org.zendesk.client.v2.model.JobStatus.JobStatusEnum, you will notice that the job may be queued for execution, or could still be running, at the time the job status is returned by org.zendesk.client.v2.Zendesk#createUsers(org.zendesk.client.v2.model.User...).
As can be seen from the Zendesk documentation for the createUsers operation:
This endpoint returns a job_status JSON object and queues a background job to do the work. Use the Show Job Status endpoint to check for the job's completion.
Only when the job has completed will there be a corresponding result delivered for the operation.
A possible solution in your case would be to check (possibly in a separate thread) every 500 ms whether the job status is no longer queued or running (in other words, whether it has completed), and, if it completed successfully, to retrieve the results.
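A rough sketch of that polling loop, assuming your version of the client exposes a Zendesk#getJobStatus lookup for re-fetching the status (check your zendesk-java-client version); the enum constants are compared by name because their exact casing differs between versions:

// assumes the surrounding method declares throws InterruptedException
JobStatus js = zd.createUsers(usersList);
while (js.getStatus() == null
        || js.getStatus().name().equalsIgnoreCase("queued")
        || js.getStatus().name().equalsIgnoreCase("working")) {
    Thread.sleep(500);          // poll every 500 ms, as suggested above
    js = zd.getJobStatus(js);   // assumed lookup; re-fetches the job status
}
List<?> resultElements = js.getResults();   // populated only once the job has completed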

Timeout for Futures in Akka

We have a server that processes portfolios and the securities inside them in different actors. For portfolios with a small number of securities (<20) this works fine. When I increased the security count to 1000, I encountered the following issue:
akka.dispatch.FutureTimeoutException: Futures timed out after [5000] milliseconds
I could bypass this error by increasing the timeout in the Akka config - is that the right thing to do? In Akka versions earlier than 1.2 I could set self.timeout inside the actor, but that is deprecated.
The other issue I faced (intermittently) is that the entire server hangs when joining the futures in the futures.map code inside my portfolio actor:
// fork out for each security
val listOfFutures = new ListBuffer[Future[Security]]()
for (security <- portfolio.getSecurities.toList) {
  val securityProcessor = actorOf[SecurityProcessor].start()
  listOfFutures += (securityProcessor ? security) map {
    _.asInstanceOf[Security]
  }
}

EventHandler.info(this, "joining results from security processors")

// join for each security
val futures = Future.sequence(listOfFutures.toList)
futures.map {
  listOfSecurities =>
    portfolioResponse = MergeHelper.merge(portfolio, listOfSecurities)
}.get
You do not state which version of Akka you're on, and given my limited time with the crystal ball I'll assume that you're on 1.2.
You can specify a Timeout when you call ask/?
(Also, your code is a bit convoluted, but that I have already solved in your other question.)
Cheers,
√
