Dart eventloop control equivalent - dart

In NodeJS we have process.nextTick(), setImmediate and setTimeout , I want to know what is equivalent in Dart
setImmediate() is designed to execute a script once the current poll phase completes.
setTimeout() schedules a script to be run after a minimum threshold in ms has elapsed.
any time you call process.nextTick() in a given phase, all callbacks passed to process.nextTick() will be resolved before the event loop continues. This can create some bad situations because it allows you to "starve" your I/O by making recursive process.nextTick() calls, which prevents the event loop from reaching the poll phase

The corresponding Dart operations are:
setTimeout(callback, durationMS): Timer(duration, callback) or Future.delayed(duration, callbackWithResult).
setImmediate(callback): Timer.run(callback), Timer(Duration.zero, callback) or Future(callbackWithResult).
nextTick(callback): scheduleMicrotask(callback) or Future.microtask(callbackWithResult).

Related

Is there a way to make repeatedly forever apache beam trigger to only execute after the previous execution is completed?

I am using global window with repeated forever after processing time trigger to process streaming data from pub-sub as below :
PCollection<KV<String,SMMessage>> perMSISDNLatestEvents = messages
.apply("Apply global window",Window.<SMMessage>into(new GlobalWindows())
.triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))))
.discardingFiredPanes())
.apply("Convert into kv of msisdn and SM message", ParDo.of(new SmartcareMessagetoKVFn()))
.apply("Get per MSISDN latest event",Latest.perKey()).apply("Write into Redis", ParDo.of(new WriteRedisFn()));
Is there a way to make repeatedly forever apache beam trigger to only execute after the previous execution is completed ? The reason for my question is because the next trigger processing will need to read data from redis, written by the previous trigger execution.
Thank You
So the trigger here would fire at the interval you provided. The trigger is not aware of any downstream processing so it's unable to depend on such steps of your pipeline.
Instead of depending on the trigger for consistency here, you could add a barrier (a DoFn) that exists before the Write step and only gives up execution after you see the previous data in Redis.
You could try and explicitly declare a global window trigger, as the example below:
Trigger subtrigger = AfterProcessingTime.pastFirstElementInPane();
Trigger maintrigger = Repeatedly.forever(subtrigger);
I think that triggers would help you on your case, since it will allow you to create event times, which will run when you or your code trigger them, so you would only run repeatedly forever when a trigger finishes first.
I found this documentation which might guide you on the triggers you are trying to create.

is there a way to inject and a config into a ParDo without sideInput?

I have a ParDo that uses state and timers with a periodically updating PcollectionView as sideInput to that parDo; google dataflow will throw an exception that timers are not allowed in such a case. Is there another way to feed config data to the parDo with out sideInput? Essentially, the sideInput was a map of config data that was reading from datastore about every 24 hours.
I am currently trying to see if I can create a ParDo before the one with state and timers to periodically update the config, but I don't see how we can access that map from within the next ParDo. Any suggestions?
Note: This pipeline is running in streaming mode with a global window and reading from pubsub messages as they arrive. Datastore is used to hold data needed to decide when to output an element to a pubsub topic.
Instead of using state timers to update the side input, you can use a fixed window to periodically update your PCollectionView with your data source:
PCollectionView<Map<String,String>> sideInput = pipeline
.apply(notifications)
.apply(
Window.<Long>into(FixedWindows.of(Duration.standardMinutes(refreshMinutes)))
.triggering(
Repeatedly.forever(AfterPane.elementCountAtLeast(1))
)
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes()
)
.apply( /* query data source */ )
.apply(View.<Map<String,String>>asSingleton());

Openwhisk: request has not yet finished

I have a distributed Openwhisk setup, and when I execute a large number of functions simultaneously like this
wsk -i action invoke test -r -b
at some point instead of getting the actual result, I start getting the following message:
ok: invoked /_/test, but the request has not yet finished, with id
Any idea how I could force Openwhisk execute the function and still return the result regardless how long the invocation actually takes? Is there maybe some playbook variable responsible for timeout?
You can execute a function in non-blocking mode. In this case, you will get the activation ID immediately and function execution will happen in the background. You can then check/track the status of function execution with the activation ID.
Remove the "-b" option from the command.
Also, a function execution timeout is configurable (default 60s), if function execution requires more time to execute, you can define it while creating the function.
For per function limit you can specify below setting when creating the function.
-t, --timeout LIMIT the timeout LIMIT in milliseconds after which the action is terminated (default 60000)

How can I programmatically cancel a Dataflow job that has run for too long?

I'm using Apache Beam on Dataflow through Python API to read data from Bigquery, process it, and dump it into Datastore sink.
Unfortunately, quite often the job just hangs indefinitely and I have to manually stop it. While the data gets written into Datastore and Redis, from the Dataflow graph I've noticed that it's only a couple of entries that get stuck and leave the job hanging.
As a result, when a job with fifteen 16-core machines is left running for 9 hours (normally, the job runs for 30 minutes), it leads to huge costs.
Maybe there is a way to set a timer that would stop a Dataflow job if it exceeds a time limit?
It would be great if you can create a customer support ticket where we would could try to debug this with you.
Maybe there is a way to set a timer that would stop a Dataflow job if
it exceeds a time limit?
Unfortunately the answer is no, Dataflow does not have an automatic way to cancel a job after a certain time. However, it is possible to do this using the APIs. It is possible to wait_until_finish() with a timeout then cancel() the pipeline.
You would do this like so:
p = beam.Pipeline(options=pipeline_options)
p | ... # Define your pipeline code
pipeline_result = p.run() # doesn't do anything
pipeline_result.wait_until_finish(duration=TIME_DURATION_IN_MS)
pipeline_result.cancel() # If the pipeline has not finished, you can cancel it
To sum up, with the help of #ankitk answer, this works for me (python 2.7, sdk 2.14):
pipe = beam.Pipeline(options=pipeline_options)
... # main pipeline code
run = pipe.run() # doesn't do anything
run.wait_until_finish(duration=3600000) # (ms) actually starts a job
run.cancel() # cancels if can be cancelled
Thus, in case if a job was successfully finished within the duration time in wait_until_finished() then cancel() will just print a warning "already closed", otherwise it will close a running job.
P.S. if you try to print the state of a job
state = run.wait_until_finish(duration=3600000)
logging.info(state)
it will be RUNNING for the job that wasn't finished within wait_until_finished(), and DONE for finished job.
Note: this technique will not work when running Beam from within a Flex Template Job...
The run.cancel() method doesn't work if you are writing a template and I haven't seen any successful work around it...

Defining "global" behavior in Gulp (measuring task duration)

I'm working on moving us from ant to gulp, and as part of the effort I want to write timing stats to Graphite. We're doing this in ant as well (no idea how, beside the point anyway). My question is, I'd prefer to not have to add some or other plugin manually to every task we have (we have over 60), but rather have some sort of global behavior, where for every task, before the task is run a timer is start, and when it signals completion we push some data to Graphite (over statsd).
Can someone point me in the right direction where to hook into gulp for this? I couldn't find anything particularly useful in the docs / recipes...
We're running gulp#4.
Instead of adding timing code to your numerous tasks, you could make use of the NPM gulp-duration package.
A snippet of an example of it's use is shown below:
function rebundle() {
var uglifyTimer = duration('uglify time')
var bundleTimer = duration('bundle time')
return bundler.bundle()
.pipe(source('bundle.js'))
.pipe(bundleTimer)
// start just before uglify recieves its first file
.once('data', uglifyTimer.start)
.pipe(uglify())
.pipe(uglifyTimer)
.pipe(gulp.dest('example/'))
}
gulp-duration's duration function:
Creates a new pass-through duration stream. When this stream is
closed, it will log the amount of time since its creation to your
terminal.
will then allow you to log the duration of the task.
Whilst this is not a global behaviour solution, at least you can specify the timing code in your gulp file, as opposed to having to modify all 60+ of your tasks.

Resources