In my process, I need to allow a user to execute the same task (with different params) multiple times.
Looking at the TaskService class, I observe that there is just a complete method.
The complete method sets the task to completed, and the user cannot execute it again.
Is there a solution to that?
Thanks!
A task can only be executed once. However, your process could allow the same task (node) to be triggered multiple times: for example, use an intermediate signal event leading to your task, so that, depending on the data sent along with the signal, a new instance of the task is triggered. That way the process can allow retriggering the same task multiple times (during a certain phase of the process, for example).
I have a stream created in Snowflake and a task which moves stream data into another table. I want the task to execute automatically every time there is new data in the stream. How can I automatically trigger the task when there is new data in the stream?
Tasks can only be triggered on a schedule, but you can have them run as often as every minute.
If a run takes more than one minute, then the next run is delayed until one minute after the previous run finishes.
While it is not "on demand", a task running every minute should suffice for most situations. Of course, it means a warehouse is constantly running, but the same would be true for an on-demand service, assuming new data arrives all the time. If data arrives irregularly, you can add a stream-has-data check so the task does not run when there is no data: https://docs.snowflake.com/en/sql-reference/functions/system_stream_has_data.html
Currently I am working with queued jobs in Ruby on Rails with Sidekiq. I have two jobs that depend on each other, and I want the first job to finish before the second job starts. Is there any way to do this with Sidekiq?
Yes, you can use the YourSidekiqJob.new.perform(parameters_to_the_job) pattern. This will run your jobs in order, synchronously.
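A minimal sketch of that pattern, assuming two hypothetical workers FirstJob and SecondJob that each define a perform method:
# FirstJob and SecondJob are placeholder worker names.
# Calling .new.perform runs each job inline (synchronously) in the calling
# process, in order, without going through the Sidekiq queue at all.
FirstJob.new.perform(user_id)
SecondJob.new.perform(user_id)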
However, there are 2 things to consider here:
What happens if the first job fails?
How long does each job run?
For #2, the pattern blocks execution for the length of time each job takes to run. If the jobs are extremely short in runtime, why use the jobs in the first place? If they're long, are you expecting the user to wait until they're done?
Alternatively, you can schedule the running of the second job as the last line in the body of the first one. You still need to account for the failure mode of job #1 or #2. Also, you need to consider that the job won't necessarily run when it's scheduled to run, due to the state of the queue at schedule time. How does this affect your business logic?
Hope this helps
--edit according to last comment
class SecondJob < SidekiqJob
  def perform(params)
    data = SomeData.find
    return unless data.ready?

    # do whatever you need to do with the ready data
  end
end
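For the chaining approach described earlier (scheduling the second job as the last line of the first), the first job might look something like this sketch; FirstJob and its argument are placeholders, and the base class mirrors the SecondJob example above:
class FirstJob < SidekiqJob
  def perform(params)
    # ... do the first job's work and persist its results ...

    # Enqueue the second job as the very last step, so it is only
    # scheduled once this job's own work has finished.
    SecondJob.perform_async(params)
  end
end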
This question is about Grand Central Dispatch, and dispatch_group_wait() in particular.
Assume a dispatch_group called group with 10 tasks in it waiting to be performed.
Elsewhere I have a task that needs to wait for any tasks in group to finish before it executes. To make that happen, I use dispatch_group_wait(group, DISPATCH_TIME_FOREVER).
To distinguish it from the tasks in group I'll call it lonelyTask.
If another task gets added to group while lonelyTask is waiting, which gets executed first, lonelyTask or the task that was added to group? In other words, do tasks added to a group while another task is waiting to execute get to "cut in line" ahead of the waiting task, or is the order in which they were called maintained?
I have searched the documentation, but haven't been able to find an answer to this question...
dispatch_group_wait and dispatch_group_notify both wait for the number of items that have entered the group to transition to zero. So, if you add an eleventh task to the group before all of the original ten tasks complete, a call to dispatch_group_wait will wait for all eleven to complete before continuing.
A group is a +/- counter (a semaphore) that fires each time it reaches zero. It is unaware of tasks, because dispatch_group_async() is just a wrapper that enter()s the group before task submission and enqueues a new block that will call the task's block and then leave() the group. That's all. Groups may even be used without queues, as asynchronous retain-counters or something like that.
So in general "you can't".
But if you are able to [re]target (dispatch_set_target_queue()) all related queues to a single concurrent queue, then dispatch_barrier_async() on that queue may do what you want, if called in proper isolation. Note that group has nothing to do with this at all.
Also, the original lonelyTask is not a task from the group's point of view; it is an unrelated thread of execution that waits until the group is balanced. The synonym for "cutting in line" here is a "barrier".
Another approach would be to create a private group for each task set and wait for that specific group. That will not insert a barrier, though: following tasks will not wait for lonelyTask to complete, and any uncontrolled async() calls may "break the rules", leading to funny debugging sessions.
See the updated question below.
Original question:
In my current Rails project, I need to parse a large XML/CSV data file and save it into MongoDB.
Right now I use these steps:
Receive the uploaded file from the user and store the data in MongoDB
Use Sidekiq to process the data in MongoDB asynchronously
After processing finishes, delete the raw data
For small and medium data on localhost, the steps above run well. But on Heroku, I use HireFire to dynamically scale worker dynos up and down. While the worker is still processing the large data, HireFire sees an empty queue and scales the worker dyno down. This sends a kill signal to the process and leaves it in an incomplete state.
I'm searching for a better way to do the parsing: allow the parsing process to be killed at any time (saving the current state when it receives the kill signal) and allow the process to be re-queued.
Right now I'm using Model.delay.parse_file and it doesn't get re-queued.
UPDATE
After reading the Sidekiq wiki, I found an article about job control. Can anyone explain the code, how it works, and how it preserves its state when it receives the SIGTERM signal and the worker gets re-queued?
Is there any alternative way to handle job termination, save the current state, and continue right from the last position?
Thanks,
It might be easier to explain the process and the high-level steps, give a sample implementation (a stripped-down version of one that I use), and then talk about throw and catch:
Insert the raw CSV rows with an incrementing index (to be able to resume from a specific row/index later)
Process the CSV, stopping every 'chunk' of rows to check whether the job should stop by checking if Sidekiq::Fetcher.done? returns true
When the fetcher is done?, store the index of the currently processed item on the user and return, so that the job completes and control is returned to Sidekiq
Note that if a job is still running after a short timeout (default 20s), the job will be killed
Then, when the job runs again, simply start where you left off last time (or at 0)
Example:
class UserCSVImportWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)

    # Only fetch rows at or beyond the index we stopped at last time
    items = user.raw_csv_items.where(:index => {'$gte' => user.last_csv_index.to_i})
    items.each_with_index do |item, i|
      # Every 100 items, check whether Sidekiq has started shutting down
      if ((i + 1) % 100).zero? && Sidekiq::Fetcher.done?
        # Remember where we got to so the next run can resume from here
        user.update(last_csv_index: item.index)
        return
      end

      # Process the item as normal
    end
  end
end
The above class makes sure that every 100 items we check whether the fetcher is done (a proxy for whether shutdown has started) and, if so, end execution of the job. Before execution ends, however, we update the user with the last index that has been processed so that we can start where we left off next time.
throw/catch is a way to implement the above functionality a little more cleanly (maybe), but it is a little like using Fibers: a nice concept but hard to wrap your head around. Technically, throw/catch is more like goto than most people are generally comfortable with.
edit
Also, you could skip the call to Sidekiq::Fetcher.done? and instead record the last_csv_index on each row or on each chunk of rows processed; that way, if your worker is killed without having the opportunity to record the last_csv_index, you can still resume 'close' to where you left off.
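A minimal sketch of that variant, reusing the names from the example worker above (the chunk size of 100 is arbitrary):
items.each_with_index do |item, i|
  # Persist progress after every chunk of rows, regardless of shutdown state,
  # so a hard kill loses at most one chunk of work.
  user.update(last_csv_index: item.index) if ((i + 1) % 100).zero?

  # Process the item as normal
end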
You are trying to address the concept of idempotency, the idea that processing a thing multiple times with potential incomplete cycles does not cause problems. (https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-jobs-idempotent-and-transactional)
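In this context, an idempotent job could, for example, mark each raw row once it has been handled so that a re-run safely skips work that is already done. A sketch, assuming a hypothetical processed flag on the raw items:
items.each do |item|
  next if item.processed?          # already handled by a previous, possibly killed, run

  # ... process the item as normal ...

  item.update(processed: true)     # record completion so a re-run can skip this row
end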
Possible steps forward
Split the file up into parts and process those parts with a job per part (see the sketch after this list)
Lift the threshold for HireFire so that it scales down only when jobs are likely to have fully completed (10 minutes)
Don't allow HireFire to scale down while a job is working (set a Redis key on start and clear it on completion)
Track progress of the job as it is processing and pick up where you left off if the job is killed.
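A sketch of the first option (splitting the file into parts, one job per part); PartImportWorker and ROWS_PER_JOB are hypothetical names, and raw_csv_items is the collection from the example above:
class PartImportWorker
  include Sidekiq::Worker

  # Each job handles a fixed slice of rows, so it finishes quickly and a
  # killed job only ever loses its own slice of work.
  def perform(user_id, offset, limit)
    user = User.find(user_id)
    user.raw_csv_items
        .where(:index => { '$gte' => offset, '$lt' => offset + limit })
        .each do |item|
      # Process the item as normal
    end
  end
end

# Enqueue one job per slice once the upload has been stored
# (user is the record the uploaded file belongs to):
ROWS_PER_JOB = 1_000
(0...user.raw_csv_items.count).step(ROWS_PER_JOB) do |offset|
  PartImportWorker.perform_async(user.id, offset, ROWS_PER_JOB)
end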
I am trying to trigger a Quartz job, somehow bind that job to a specific user or session, and have the ability to check the status of the job.
Basically, I need to trigger a job every time a user logs in. This job downloads information specific to that user -- it could take several minutes, and many of these jobs could be triggered asynchronously for multiple users logging in at the same time.
Now the thing is, while the job is running, the user could be visiting different pages on the site -- that is, I don't want to interrupt them, so I need a way to trigger the job, but first check whether there is already one running for that specific user. I also need the ability to update the UI once the job is finished.
I'm sort of at a loss here... I can't find much on Google, so I would really appreciate any insight.
Check out the Executor Plugin. It adds a callAsync() MetaMethod to your Grails artefacts, which will return a Java Future object, so that you can retrieve the status and results of your async call.
If you need Quartz's scheduling ability as well, just put your logic in a Grails Service and call the Service method from the Quartz job or via callAsync() depending on the circumstances.
I can only offer a suggestion. I'm not sure why you're asking how to trigger a job every time a user logs in. For this, all you need to do is put the code from the job into a service, so that the login controller can also call the service code when the user logs in.
In the job class:
class EmailAlertJob {
    def utilService

    static triggers = {
        cron name: 'emailCronTrigger', cronExpression: "0 15 1 ? * SUN-SAT"
    }

    def execute() {
        def send = utilService.sendEmail()
    }
}
In the controller:
def userLogsIn = {
    utilService.sendEmail()
}
As far as keeping track of whether the job is running, there are several ways to do this, such as keeping a flag specific to the user and checking that flag before running the job.