I'm using Celery inside a Django project. I have a Celery scheduled task that runs every minute and checks in a DB whether there is a new task to start; each configured task also has a start time and a duration.
The job of this periodic task is to:
Start a new async task if a new one is configured (task.delay(...)).
Check whether a previously started task is still running.
Stop tasks that exceed their duration (app.control.revoke(...)).
... and other stuff ...
But the question is: what is the best practice for monitoring the status of started async tasks inside a periodic task?
I mean, every time the scheduled task runs I get all the configured tasks from the DB (started, to start, ...), but I don't have the related Celery task id associated with them. Should I store the Celery task id in the DB, so that each DB task is associated with the Celery task running it?
Could django-celery help me?
Thanks.
Celery will track status automatically for you using result backends. If you want to store this state using the Django ORM, then yes, django-celery can help with that:
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html#using-the-django-orm-cache-as-a-result-backend
http://docs.celeryproject.org/en/latest/userguide/tasks.html#result-backends
http://docs.celeryproject.org/en/latest/configuration.html#task-result-backend-settings
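To answer the concrete question about the association: yes, a common approach is to store the task id returned by delay() on your own DB row and query it later with AsyncResult. A minimal sketch, assuming a hypothetical ScheduledJob model with celery_task_id and status fields and a hypothetical long_running_task:

    # Hypothetical sketch: ScheduledJob, long_running_task and
    # exceeded_duration() are illustrative names, not part of Celery.
    from celery import current_app
    from celery.result import AsyncResult

    from myapp.models import ScheduledJob      # hypothetical Django model
    from myapp.tasks import long_running_task  # hypothetical Celery task


    def start_configured_jobs():
        for job in ScheduledJob.objects.filter(status="to_start"):
            result = long_running_task.delay(job.pk)
            job.celery_task_id = result.id      # keep the association in the DB
            job.status = "started"
            job.save(update_fields=["celery_task_id", "status"])


    def check_running_jobs():
        for job in ScheduledJob.objects.filter(status="started"):
            result = AsyncResult(job.celery_task_id)
            if result.ready():                  # SUCCESS or FAILURE in the result backend
                job.status = "finished"
                job.save(update_fields=["status"])
            elif job.exceeded_duration():       # hypothetical duration check
                current_app.control.revoke(job.celery_task_id, terminate=True)

Both functions would be called from your periodic task; the result backend is what makes AsyncResult(...).ready() meaningful.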
One thing that might also help: Celery has several features for stopping tasks that exceed their duration.
You can use configuration to set global limits:
http://docs.celeryproject.org/en/latest/configuration.html#celeryd-task-time-limit
You can set expiration times per task type using decorator parameters:
http://docs.celeryproject.org/en/latest/userguide/tasks.html#list-of-options
You can set expiration times per scheduled instance:
http://docs.celeryproject.org/en/latest/userguide/calling.html#expiration
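A small sketch tying those three options together (the setting and option names come from the Celery docs linked above; the task body is just a placeholder):

    # Hedged sketch of global, per-task-type and per-call limits.
    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")

    # 1. Global limits via configuration (CELERYD_TASK_TIME_LIMIT in the old
    #    uppercase settings, task_time_limit in the newer lowercase ones).
    app.conf.task_time_limit = 300        # hard limit: the worker kills the task
    app.conf.task_soft_time_limit = 240   # soft limit: raises SoftTimeLimitExceeded first

    # 2. Per task type, via decorator options.
    @app.task(time_limit=60, soft_time_limit=50)
    def sync_inventory(job_id):
        ...

    # 3. Per scheduled instance, via expiration: if the task has not started
    #    within 120 seconds, the worker discards it instead of running it late.
    sync_inventory.apply_async(args=[42], expires=120)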
There is one task assigned to WORKER A; however, after spending some time on it, WORKER A realizes it cannot handle the task on its own and it needs to be transferred to WORKER B.
How can we achieve this using Twilio Task Router?
First you have to understand the lifecycle of a Task.
When the Task is created, its first state is pending.
Then Twilio looks for a worker who has the capacity to take this Task.
The Task is now reserved.
While a Task is reserved it cannot be assigned to a new agent, because that would violate the Task lifecycle (https://www.twilio.com/docs/taskrouter/lifecycle-task-state).
If you are going to solve this problem, you have two options:
If you want a solution for the Twilio Flex platform, you can use an available plugin (https://www.twilio.com/docs/flex/solutions-library/chat-and-sms-transfers).
If you want to solve it with a backend solution, you have to (see the sketch below):
Delete or complete the Task.
Create a new Task with the same attributes to preserve the data in the conversation.
Create a new channel to let the worker communicate with the task user.
Assign the new Task to the worker SID (Worker B). Remember that you have to handle the case where Worker B has no capacity to receive a new Task.
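A rough backend sketch of those steps using the Twilio Python helper library (all SIDs are placeholders, the target_worker_sid attribute and the routing step are assumptions about your workflow, and the new-channel step is omitted):

    # Rough, simplified sketch; SIDs are placeholders, and error handling,
    # capacity checks and the chat-channel step are left out.
    import json
    from twilio.rest import Client

    client = Client("ACXXXXXXXXXXXXXXXX", "your_auth_token")
    workspace = client.taskrouter.workspaces("WSXXXXXXXXXXXXXXXX")

    # 1. Complete (or delete) the Task that Worker A could not handle.
    old_task = workspace.tasks("WTXXXXXXXXXXXXXXXX").fetch()
    old_attributes = json.loads(old_task.attributes)
    workspace.tasks(old_task.sid).update(
        assignment_status="completed", reason="transferred to another worker"
    )

    # 2. Create a new Task with the same attributes to preserve the
    #    conversation data, plus a hint the workflow can route on.
    new_attributes = dict(old_attributes, target_worker_sid="WKXXXXXXXXXXXXXXXX")
    new_task = workspace.tasks.create(
        workflow_sid="WWXXXXXXXXXXXXXXXX",
        attributes=json.dumps(new_attributes),
    )

    # 3./4. Your workflow then routes new_task to Worker B (e.g. via a filter
    #    on target_worker_sid); remember to handle the case where Worker B
    #    has no free capacity.
    print(new_task.sid)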
When the Spring Cloud Data Flow server uses the local deployer to handle task lifecycle management (launch, stop, etc.), the corresponding task execution log can be obtained only while the task execution status is RUNNING.
This is by design: the local task launcher prunes the task instance history every time a new task instance is launched, so access to the log is no longer available, as can be seen in the code here.
The reason was to keep the number of task process IDs in the local deployer's in-memory map from growing. You can see the related issue here.
But this causes a side effect, discussed in another thread: the previous instances' task execution logs cannot be shown in local deployer mode.
I think it would be OK to keep some number X of task executions in history; that way we can at least avoid this side effect for the most recent executions. I have created a GH issue to track this.
I'm a little confused about this. I have a couple of tasks that I would like to run asynchronously, for example my inventory sync integration. For this I have implemented Delayed Job, but I realize that I need to run rake jobs:work on Heroku for it. I can use the Heroku Scheduler to run this rake task every 10 minutes. My question is: if I create rake tasks to run, e.g., my inventory sync method, do I still need Delayed Job? My understanding is that the Heroku Scheduler kicks off one-off dynos.
Instead of using Delayed Job, could I not just kick off the sync method directly, since a separate dyno is used anyway? What is the added value of Delayed Job here?
Heroku's Scheduler replaces what cron would handle on a typical server. Delayed Job or Sidekiq are for processing jobs asynchronously from your app, not on a timed schedule.
The reason you use a worker and run these jobs in the background is so that your server can return a response as soon as possible, rather than making the user wait for some potentially long-running process to finish (lots of queries, outbound e-mail, external API requests, etc.).
For example, the Scheduler can run analytics or updates from a script every hour or day, but Delayed Job cannot.
My environment: CentOS, Rails 4, Ruby 2.
I'm running an infinite loop in Delayed Job that gets real-time information from another site.
The task is to run many processes simultaneously to get information from different sites. I add each new process to the DJ queue by running search_engine.delay.track!.
So when I run one worker, it successfully takes the first job from the queue and, when the job completes, takes the next one. When I run more than one worker with ./bin/delayed_job -n5 start, every DJ worker takes the first job in the queue and starts to track! it. But I want each worker to take only jobs that have not already been taken by other workers.
It was because of Delayed::Worker.max_run_time = 0. DJ considered the running job already expired and ran it again on other workers.
I have a rake task which is going to call 4 more rake tasks, in order:
rake:one
rake:two
rake:three
rake:four
Rake tasks one, two, and three are getting data and adding it to my database. Then rake:four is going to do something with that data. But I need to make sure that one, two, and three are complete first. Each rake task is actually spinning up Sidekiq workers to run in the background. In this scenario, would all of the workers created by rake:one finish first, then rake:two, etc.?
If not, how can I ensure that the workers are executed in order?
Sidekiq processes jobs in the order which they are created, but by default it processes multiple jobs simultaneously, and there is no guarantee that a given job will finish before another job is started.
Quoting from https://github.com/mperham/sidekiq/wiki/FAQ:
How can I process a certain queue in serial?
You can't, by design. Sidekiq is designed for asynchronous processing
of jobs that can be completed in isolation and independent of each
other. Jobs will be popped off of Redis in the order in which they
were pushed but there's no guarantee that Job #1 will execute fully
before Job #2 is started.
If you need serial execution, you should look into other systems which
give those types of guarantees.
Note you can create a Sidekiq process dedicated to processing a queue
with a single worker. This will give you serial execution but it's a
hack.
Also note you can use third-party extensions for sidekiq to achieve
that goal.
You can simply create one meta rake task that includes all those tasks in the right order.
Or, as a less hacky solution, reduce the number of workers per queue to 1:
https://github.com/brainopia/sidekiq-limit_fetch#limits
And add all your jobs to that queue.