Does dataflow support custom triggers or updating trigger delays? - google-cloud-dataflow

TL;DR: Is it possible to create a custom trigger that only fires if some flag is set? Is it possible to deploy the job with a trigger with a huge delay while we know a large data event is happening, and then deploy an update to the job with the trigger having a normal or no delay once that event is finished?
Following on from: Remove duplicates across window triggers/firings
The situation where this is most problematic (millions of duplicate firings) is when we're doing a backfill of old data. Given that we know when a backfill is happening, I was wondering whether we could implement a custom trigger that doesn't fire while a flag is set. Is that something that would be possible? Alternatively, could we deploy the job with a trigger that includes a huge delay while the backfill is going on, and then issue an update with the normal trigger when it's finished?

Dataflow does not yet support custom triggers, or triggers based on some separate piece of metadata. However, you can change the frequency of a processing time trigger with Update; just change the value of the plusDelay() builder function and run with --update as normal.

Related

How to reflect process on before_update?

I have a project in which when I try to update some attribute, a long and exhausting before_update function runs. This function runs some scripts, and when they're finished successfully the attribute is changed.
The problem is that I want a way to reflect the current status of the currently running scripts (to display some sort of 2/5...3/5... progress), but I can't figure out a solution. I tried saving the last running command in the DB, but because the scripts are running in a before_update scope, the commit is done only after all scripts are finished.
Is there any elegant solution to this kind of problem?
In general, you should avoid running expensive, cross-cutting code in callbacks. A time will come when you want to update one of those records without running that code, and then you'll start adding flags to determine when that callback should run, and all sorts of other nastiness. Also, if the record is being updated during a request, the expensive callback code will slow the whole request down, and potentially time out and/or block other visitors from accessing your application.
The way to architect this would be to create the record first (perhaps with a flag/state that tells the rest of your app that the update hasn't been "processed" yet - meaning that related code currently in your callback hasn't run yet). Then, you'd enqueue a background job that does whatever is in your callback. If you are using Sidekiq, you can use the sidekiq-status gem to update the job's status as it's running.
You'd then add a controller/action that checks up on the job's status and returns it in JSON, and some JS that pings that action every few seconds to check up on the status of the job and update your interface accordingly.
Even if you didn't want to update your users on the status of the job, a background job would probably still be in order here - especially if that code is very expensive, or involves third-party API calls. If not, it likely belongs in the controller, and you could run it all in a transaction. But if you need to update your users on the status of that work, a background job is the way to go.
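To make the shape of that flow concrete, here is a minimal sketch in plain Python rather than Rails and Sidekiq. A thread stands in for the background job, and `JobStatus`, `run_scripts`, and `start_job` are hypothetical names invented for the illustration; a real app would keep the status somewhere the web process can read it (which is what sidekiq-status does for you) and expose `snapshot()` through the polling action.

```python
import threading


class JobStatus:
    """Thread-safe progress record that a status action could return as JSON."""

    def __init__(self, total_steps):
        self._lock = threading.Lock()
        self.total = total_steps
        self.done = 0
        self.state = "queued"

    def advance(self):
        # Called by the worker after each script finishes.
        with self._lock:
            self.done += 1
            self.state = "complete" if self.done == self.total else "working"

    def fail(self):
        with self._lock:
            self.state = "failed"

    def snapshot(self):
        # What the polling endpoint would serialize, e.g. "2/5".
        with self._lock:
            return {"state": self.state, "at": self.done, "total": self.total}


def run_scripts(scripts, status):
    """Run each script in order, recording progress after every one."""
    try:
        for script in scripts:
            script()
            status.advance()
    except Exception:
        status.fail()


def start_job(scripts):
    """Kick off the work outside the request and return a handle to poll."""
    status = JobStatus(len(scripts))
    worker = threading.Thread(target=run_scripts, args=(scripts, status), daemon=True)
    worker.start()
    return status, worker
```

The point of the design is that progress is written after each step instead of inside one long callback transaction, so a reader can see 2/5, 3/5 and so on while the work is still running.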

Rails finding all user-scheduled events for a particular time?

I've got an app that allows users to schedule tasks to run whenever they desire. (A todo list with recurring items.)
I need to somehow re-trigger these events to show up again each time their schedule comes up by updating an attribute on the object - it may also send a notification to the user.
My plan for this was to have a cron job that runs every minute/hour/short interval, and in that job, it would find all of the items with schedules that match the current time or should have been updated since the last job. However, short of iterating through every item, I don't see a quick way of querying for those objects.
Using Ice Cube I can very easily and cleanly save schedules in my database, but I don't see a method of finding all events that match a particular point in time.
I know that once I find the item I can run occurring_between? or occurring_at? to find out if I should run it, but that requires pulling every single item into memory and manually checking it, which is not very scalable.
Is there a way I'm missing, or are there other suggestions for accomplishing what I'm trying to do here? It's still pretty early, so I'm not attached to Ice Cube or any of the current implementations.
After some more thought, I'm not seeing any way to do this, so I've come up with a little hack that I'll do instead:
On the item object, I'll have 2 additional attributes. One will be the schedule which is fed to Ice Cube to generate all of the dates/times to recur at. The next will be next_occurrence, which I'll set on create and each time the item is renewed.
Then in the worker, I'll query for all items that have a next_occurrence in the past and process them, resetting the next_occurrence to be the next time the schedule is to occur.
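A rough sketch of that worker query, written in Python with an in-memory SQLite table rather than Rails, might look like the following. The table layout and the names `make_db`, `add_item`, and `process_due` are made up for the illustration, and a fixed `interval_minutes` column stands in for the Ice Cube schedule, which this sketch does not attempt to reproduce.

```python
import sqlite3
from datetime import datetime, timedelta


def make_db():
    """Create a throwaway items table with an index on next_occurrence."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
        "interval_minutes INTEGER, next_occurrence TEXT)"
    )
    # The index is what lets the worker avoid loading every item.
    db.execute("CREATE INDEX idx_items_next ON items (next_occurrence)")
    return db


def add_item(db, name, interval_minutes, first_occurrence):
    # Timestamps are stored as ISO-8601 text so they sort chronologically.
    db.execute(
        "INSERT INTO items (name, interval_minutes, next_occurrence) VALUES (?, ?, ?)",
        (name, interval_minutes, first_occurrence.isoformat(timespec="seconds")),
    )


def process_due(db, now):
    """Renew every item that is due at `now` and return the names processed."""
    due = db.execute(
        "SELECT id, name, interval_minutes, next_occurrence FROM items "
        "WHERE next_occurrence <= ? ORDER BY next_occurrence",
        (now.isoformat(timespec="seconds"),),
    ).fetchall()
    for item_id, _name, interval, next_at in due:
        # Assumes interval > 0; skip past any occurrences that were missed.
        upcoming = datetime.fromisoformat(next_at)
        while upcoming <= now:
            upcoming += timedelta(minutes=interval)
        db.execute(
            "UPDATE items SET next_occurrence = ? WHERE id = ?",
            (upcoming.isoformat(timespec="seconds"), item_id),
        )
    db.commit()
    return [name for _id, name, _interval, _next in due]
```

Because the query filters on a single indexed column, the cost of each run depends on how many items are due, not on how many items exist.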
I'll leave this answer unmarked for a bit in case anybody has a better solution.

How to process only the newest message using an out-of-context hook

In my application I use a winevent hook to get focus changes system-wide. Because there are no timing problems, I use an out-of-context hook, even though I know that it is slow. If multiple events are fired quickly one after another, the system queues them and gives them to the hook callback function in the right order.
Now I would like to process only the newest focus change. So if there are already other messages in the queue, I want the callback function to stop and restart with the parameters of the newest message. Is there a way to do that?
When you receive a focus change, create an asynchronous notification to yourself, and cancel any previous notification(s) that may still be pending.
You can use PostMessage() and PeekMessage(PM_REMOVE) for that. Post a custom message to yourself, removing any previous custom message(s) that are still in the queue.
Or, you can use TTimer/SetTimer() to (re)start a timer on each focus change, and then process the last change when the timer elapses.
Either way, only the last notification will be processed once the messages slow down.
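The "remove whatever is still pending and keep the newest" idea can be illustrated outside of Win32. The sketch below uses Python's standard `queue.Queue` as a stand-in for the thread's message queue; `take_latest` is a hypothetical helper and the integer events are placeholder window handles, so treat it as an analogy to the PostMessage()/PeekMessage(PM_REMOVE) approach, not as Win32 code.

```python
import queue


def take_latest(pending):
    """Block for one event, then discard it in favor of any newer ones queued."""
    latest = pending.get()
    while True:
        try:
            # Non-blocking removal, similar in spirit to PeekMessage(PM_REMOVE).
            latest = pending.get_nowait()
        except queue.Empty:
            return latest
```

A consumer loop would call `take_latest` once per iteration and process only the value it returns, so a burst of focus changes collapses into a single unit of work.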

Multiple instances of a Quartz job to emulate timer functionality

I have a Grails application and need to set a timer, something that will initiate a broadcast via WebSocket at a given time, with the following stipulations:
A timer can be postponed or cancelled
There might be several different timers running at the same time (but with different "contexts")
Clustered mode should be supported, i.e. a timer fires only once regardless of the number of instances of my app in a cluster.
The solution I have come up with is:
Use Quartz to create an ordinary job without any given date for when it should be fired
The moment I want to set a timer, I create a new instance of the job with a cronExpression to fire it at a given time, and then save it persistently
Should I need to postpone the timer, I fetch it from the DB and rewrite the cronExpression to a new value instead.
My concerns are:
Is there any other way I can "set a timer" in Grails, possibly without using the concept of Quartz jobs?
It is possible to have multiple instances of a Quartz job, but is it the right way to use Quartz, or should it be avoided? Maybe I should use custom triggers instead?
If I go the way I explained before, is it going to work in a clustered mode (multiple instances)?

Send feedback to the user for a long running operation in Grails?

I have a long running operation in my Grails application. My first solution is simply to perform the operation in the controller and let the user wait until the action is finished. This is not an acceptable solution, I want to show the user the progress of the action. The progress is simply text. The action can take from 10 seconds to roughly 30 minutes.
How can I show the progress to the user for my long running action?
First you might want to try the Executor plugin so you can run the job in the background. This works quite well.
Then I guess you have 2 options. Have the browser poll the server via Ajax for an update (as Tim pointed out the JProgress plugin would do something like this for you) or get even more bleeding edge and consider HTML5 WebWorkers for a kind of server push approach. WebWorkers are not available in
You will need something like a Task or Job domain class with a field percentageComplete. The controller will create and save the Task or Job and then spawn a new thread to execute it. Perhaps place the execution code in a service.
It will be up to your execution code to update the Task or Job's percentageComplete field as it completes its task. Then you can poll the job (via ajax) to see how the job is progressing.
Note: determining the percentage that is complete is very much up to your specific code. You will probably have to just come up with a best guess based on the knowledge you have. Even for an operation where it is obvious how to determine percentage complete (like a file download), it is not certain (network issues, etc.)
Can you determine the state of your progress? Let's say in percent?
I would create the long operation as a Quartz job (background) and query the state of the job / long-running progress via Ajax.
The JProgress plugin might help, but I've never tried it out...
