Defining "global" behavior in Gulp (measuring task duration) - ant

I'm working on moving us from ant to gulp, and as part of the effort I want to write timing stats to Graphite. We're doing this in ant as well (no idea how; beside the point anyway). My question is: I'd prefer not to have to manually add some plugin or other to every task we have (we have over 60), but rather have some sort of global behavior where, for every task, a timer is started before the task runs, and when the task signals completion we push some data to Graphite (over statsd).
Can someone point me in the right direction where to hook into gulp for this? I couldn't find anything particularly useful in the docs / recipes...
We're running gulp#4.

Instead of adding timing code to your numerous tasks, you could make use of the npm gulp-duration package.
A snippet showing an example of its use:
// assumes the usual requires for a browserify-style bundle task:
var duration = require('gulp-duration') // the timing plugin
var source = require('vinyl-source-stream')
var uglify = require('gulp-uglify')

function rebundle() {
  var uglifyTimer = duration('uglify time')
  var bundleTimer = duration('bundle time')

  return bundler.bundle()
    .pipe(source('bundle.js'))
    .pipe(bundleTimer)
    // start just before uglify receives its first file
    .once('data', uglifyTimer.start)
    .pipe(uglify())
    .pipe(uglifyTimer)
    .pipe(gulp.dest('example/'))
}
gulp-duration's duration function, described in its docs as follows, will then allow you to log the duration of each stage:
Creates a new pass-through duration stream. When this stream is closed, it will log the amount of time since its creation to your terminal.
Whilst this is not a global behaviour solution, at least you can specify the timing code in your gulp file, as opposed to having to modify all 60+ of your tasks.
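If you do want something closer to the global behaviour asked for, gulp 4's task system (undertaker) emits 'start' and 'stop' events for every task, so a single hook in the gulpfile can time all 60+ tasks. A minimal sketch, assuming gulp 4 and using node-statsd as a stand-in for whichever statsd client you prefer:
var gulp = require('gulp');
var StatsD = require('node-statsd');

var statsd = new StatsD(); // defaults to localhost:8125
var startTimes = {};       // start timestamps keyed by task run uid

// undertaker fires these for every task, so this times all tasks globally
gulp.on('start', function (evt) {
  startTimes[evt.uid] = Date.now();
});

gulp.on('stop', function (evt) {
  var ms = Date.now() - startTimes[evt.uid];
  delete startTimes[evt.uid];
  statsd.timing('gulp.' + evt.name, ms); // e.g. stats.timers.gulp.scripts
});

gulp.on('error', function (evt) {
  delete startTimes[evt.uid]; // don't leak entries for failed tasks
});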

Related

Is there a way to make repeatedly forever apache beam trigger to only execute after the previous execution is completed?

I am using a global window with a repeatedly-forever, after-processing-time trigger to process streaming data from Pub/Sub, as below:
PCollection<KV<String, SMMessage>> perMSISDNLatestEvents = messages
    .apply("Apply global window", Window.<SMMessage>into(new GlobalWindows())
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))))
        .discardingFiredPanes())
    .apply("Convert into kv of msisdn and SM message", ParDo.of(new SmartcareMessagetoKVFn()))
    .apply("Get per MSISDN latest event", Latest.perKey())
    .apply("Write into Redis", ParDo.of(new WriteRedisFn()));
Is there a way to make a repeatedly-forever Apache Beam trigger execute only after the previous execution has completed? The reason for my question is that the next trigger firing will need to read data from Redis that was written by the previous firing.
Thank You
So the trigger here would fire at the interval you provided. The trigger is not aware of any downstream processing, so it is unable to depend on such steps of your pipeline.
Instead of depending on the trigger for consistency here, you could add a barrier (a DoFn) that sits before the Write step and only lets elements through once you see the previous data in Redis.
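A minimal sketch of that barrier, in the same Java SDK as the question. Here previousDataVisibleInRedis() is a hypothetical helper standing in for however you detect the previous firing's write; this is a sketch, not a drop-in implementation:
static class RedisBarrierFn extends DoFn<KV<String, SMMessage>, KV<String, SMMessage>> {
  @ProcessElement
  public void processElement(ProcessContext c) throws InterruptedException {
    // Hold each element until the data from the previous firing is readable.
    while (!previousDataVisibleInRedis(c.element().getKey())) { // hypothetical check
      Thread.sleep(1000L); // simple fixed backoff; tune for your pipeline
    }
    c.output(c.element());
  }
}
It would then slot in just before the write:
.apply("Wait for previous write", ParDo.of(new RedisBarrierFn()))
.apply("Write into Redis", ParDo.of(new WriteRedisFn()));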
You could try explicitly declaring a global window trigger, as in the example below:
Trigger subtrigger = AfterProcessingTime.pastFirstElementInPane();
Trigger maintrigger = Repeatedly.forever(subtrigger);
I think triggers would help in your case, since they let you control when results are emitted, so the repeated firing would only happen after the previous trigger has finished.
I found this documentation which might guide you on the triggers you are trying to create.

Is there a way to force a Dataflow job to stop after it is running for a long time

Is there a way to force a Dataflow job to kill itself if it has been running longer than xxx hours?
Kind regards
Marco
Posting the comment as an answer.
We are in the process of implementing this for Batch Pipelines. It is not yet available as a Dataflow flag, but it will be within a month.
We recently implemented this feature for Dataflow. You would do it by passing an extra experiment:
--experiments=max_workflow_runtime_walltime_seconds=300
Or whatever number of seconds.
Programmatically, this would be like so:
String experimentValue = String.format(
    "max_workflow_runtime_walltime_seconds=%d",
    killAfterSeconds);
ExperimentalOptions.addExperiment(myOptions.as(ExperimentalOptions.class), experimentValue);
In Python:
experiment_value = "max_workflow_runtime_walltime_seconds=%d" % timeout_secs
my_options.view_as(DebugOptions).add_experiment(experiment_value)

Dask task steam does not display a custom task given to `map_blocks`

I have written a function named nd_rmmeh and passed it to dask.array.Array.map_blocks.
The task runs and completes normally but does not show up in the task stream on the dashboard.
This is despite the fact that it does show in the task "graph" and task "progress" panes.
I moused over the boxes and did not find any nd_rmmeh labels.
The timing of nd_rmmeh does coincide with when empty (white) sections of the task stream appear.
However, I couldn't see how it is actually run from the dashboard.
I am interested in checking whether nd_rmmeh releases the GIL enough to be run in threads instead of processes.
I suspect that it doesn't, judging by the htop task manager.
For context, here is how I call map_blocks:
da.copy(
    deep=False,
    data=da.data.map_blocks(
        nd_rmmeh,
        dtype=np.float64,  # np.float is deprecated/removed in newer NumPy
        meta=da.data,
        # the rest is some keyword arguments to nd_rmmeh ... omitted
    ),
)
I cannot recall why I use `` instead of xarray.map_blocks, but it feels like that shouldn't matter.
So the question is: why doesn't the task stream display the custom function, and what can be done to fix it?
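On the side question of whether nd_rmmeh releases the GIL: one rough way to check, outside Dask entirely, is to run it on identical chunks from a thread pool and compare against the serial wall time. A sketch, assuming nd_rmmeh can be called directly on a bare NumPy chunk:
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

chunk = np.random.rand(512, 512)  # stand-in chunk; match your real shape/dtype

def wall_time(n_workers, n_calls=4):
    # Run the same batch of calls with the given level of threading.
    start = time.perf_counter()
    with ThreadPoolExecutor(n_workers) as pool:
        list(pool.map(lambda _: nd_rmmeh(chunk), range(n_calls)))
    return time.perf_counter() - start

print('1 thread :', wall_time(1))
print('4 threads:', wall_time(4))  # barely faster than 1 thread => GIL-bound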

How can I programmatically cancel a Dataflow job that has run for too long?

I'm using Apache Beam on Dataflow through the Python API to read data from BigQuery, process it, and dump it into a Datastore sink.
Unfortunately, quite often the job just hangs indefinitely and I have to manually stop it. While the data gets written into Datastore and Redis, from the Dataflow graph I've noticed that it's only a couple of entries that get stuck and leave the job hanging.
As a result, when a job with fifteen 16-core machines is left running for 9 hours (normally, the job runs for 30 minutes), it leads to huge costs.
Maybe there is a way to set a timer that would stop a Dataflow job if it exceeds a time limit?
It would be great if you could create a customer support ticket where we could try to debug this with you.
Maybe there is a way to set a timer that would stop a Dataflow job if it exceeds a time limit?
Unfortunately the answer is no, Dataflow does not have an automatic way to cancel a job after a certain time. However, it is possible to do this using the APIs: call wait_until_finish() with a timeout and then cancel() the pipeline.
You would do this like so:
p = beam.Pipeline(options=pipeline_options)
p | ...  # Define your pipeline code
pipeline_result = p.run()  # returns immediately; the job runs asynchronously
pipeline_result.wait_until_finish(duration=TIME_DURATION_IN_MS)
pipeline_result.cancel()  # If the pipeline has not finished, you can cancel it
To sum up, with the help of @ankitk's answer, this works for me (Python 2.7, SDK 2.14):
pipe = beam.Pipeline(options=pipeline_options)
...  # main pipeline code
run = pipe.run()  # returns immediately; the job runs asynchronously
run.wait_until_finish(duration=3600000)  # duration in ms; blocks up to the timeout
run.cancel()  # cancels the job if it can still be cancelled
Thus, if a job finishes successfully within the duration passed to wait_until_finish(), then cancel() will just print a warning ("already closed"); otherwise it will close a running job.
P.S. If you try to print the state of the job:
state = run.wait_until_finish(duration=3600000)
logging.info(state)
it will be RUNNING for a job that didn't finish within wait_until_finish(), and DONE for a finished job.
Note: this technique will not work when running Beam from within a Flex Template Job...
The run.cancel() method doesn't work if you are writing a template, and I haven't seen any successful workaround for it...

Jenkins pipeline "waitUntil" - change delay between attempts

We use a Jenkins pipeline for our builds and tests. After the build, we run automated tests on several measurement devices.
For a better overview about the needed testing time, I created a test stage which is periodically checking the status of the tests. When all tests are finished, the pipeline is done. I use the "waitUntil" implementation of Jenkins pipeline for this functionality.
My problem is: the pause between the attempts grows longer after every try. That is quite a good idea in itself. BUT: after a while, the pause between attempts reaches 16 hours and more. This value is too high for my needs, because I want to know the required test time exactly.
My question is: Does anyone know a way to change this behaviour of "waitUntil"?
I know I could use a "while" loop but I would prefer to solve this using "waitUntil".
stage ">>> Waiting for testruns"
waitUntil {
sleep(10)
return(checkIfTestsAreFinished())
}
New versions of Jenkins have capped this to never go over 15 seconds (see https://issues.jenkins-ci.org/browse/JENKINS-34554 ).
In waitUntil, if the processing in the block returns false, then the waitUntil step waits a bit longer and tries again. "A bit longer" means a 0.25-second wait time. If it needs to loop again, it multiplies that by a factor of 1.2 to get 0.3 seconds for the next wait cycle. On each succeeding cycle, the last wait time is again multiplied by 1.2 to get the time to wait. So the sequence goes 0.25, 0.3, 0.36, 0.43, 0.51... until 15 secs (as mentioned in one of the other answers, Jenkins capped it).
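For reference, the documented progression is easy to reproduce with a throwaway loop (the 15-second cap being the newer-Jenkins behaviour):
double delay = 0.25
(1..20).each { attempt ->
    println "attempt ${attempt}: ${(delay * 100).round() / 100} s"
    delay = Math.min(delay * 1.2, 15d) // newer Jenkins caps the wait at 15 s
}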
If you are using an older Jenkins version, then a possible solution is to wrap waitUntil in a timeout:
timeout(time: 1, unit: 'HOURS') {
    // you can change the unit to seconds, minutes, or hours
    // and pick whatever timeout limit you need
    waitUntil {
        try {
            //sleep(10) // you don't need this
            return(checkIfTestsAreFinished())
        } catch (exception) {
            return false
        }
    } // waitUntil
} // timeout
Please note the behavior is from Jenkins LTS 2.235.3.
The Jenkins waitUntil command is not meant to launch something synchronously.
To know the required time, you must add a timestamp to the test output, parse it, and calculate the duration separately; a sketch follows below.
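A minimal sketch of that idea in the pipeline itself, reusing the checkIfTestsAreFinished() helper from the question:
// Record our own start time, poll as before, then compute the
// elapsed test time independently of waitUntil's internal pacing.
def testStart = System.currentTimeMillis()
waitUntil {
    return checkIfTestsAreFinished()
}
def elapsedSeconds = (System.currentTimeMillis() - testStart) / 1000
echo "Tests finished after ${elapsedSeconds} s"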
