Starting one delayed job in Rails creates two processes

Initially I have no delayed job processes (as indicated by htop). When I run the command RAILS_ENV=production bin/delayed_job start, I get one delayed job worker, as indicated by the files in tmp/pids. However, htop now shows two processes, as shown in the picture below.
Why is this happening? The second delayed job entry consumes memory, and I don't have much to spare. Its TIME+ is zero, though, so it hasn't used any CPU time. What does this mean?

I guess these are actually not two processes but two threads of a single process. You can hide threads by pressing the capital H key in htop. If you then see just one line, that confirms it is a single process.
Delayed job probably has some master thread that governs the worker threads (or just the single worker in your setup), watches the queues and runs the workers if needed. Threads share most of their memory, so I doubt the memory consumption issue comes from the two lines in htop.

Related

How to have Dask workers terminate when done?

I can't just shut down the entire cluster like in this answer because there might be other jobs running. I run one cluster in order to avoid having to use Kubernetes. Jobs get submitted to this cluster, but they call into C libraries that leak memory.
The workers run one thread per process, so it would be acceptable to terminate the entire worker process and have it be restarted.
I can't just use os.kill from the task itself because the task's return value has to be propagated back through Dask. I have to get Dask to terminate the process for me at the right time.
Is there any way to do this?

How to correctly use Resque workers?

I have the following tasks to do in a rails application:
Download a video
Trim the video with FFMPEG between a given duration (Eg.: 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted to make this happen in background jobs, I used 1 resque worker that processes a queue.
For the first job, I have created a queue like this
@queue = :download_video, which does its task; at the end of the task I move on to the next one by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of queues, each enqueued when the previous task finishes.
This is very wrong: if the first job starts to enqueue the other jobs (one from another), then everything gets blocked with 1 worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job is saving a status in the database and I need the jobs to be processed in the right order.
Should each worker do a single job from above and have at least 4 workers? If I double the amount to 8 workers, would it be an improvement?
Have you considered using Sidekiq?
As the Sidekiq documentation says:
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kinds of jobs: download jobs and convert jobs. Download jobs can run in parallel (you can limit that if you want), and each downloaded video is stored in one queue (the "in-between queue") before being converted by multiple convert jobs running in parallel.
I hope that helps, this link explains quite well the best practices in Sidekiq : https://github.com/mperham/sidekiq/wiki/Best-Practices
As @Ghislaindj noted, Sidekiq might be an alternative, largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers that are specific to each queue. So you have a set of workers all working on the :download_video queue, then other workers attached to the :convert_video queue, etc.
If you want to continue using Resque another approach would be to use delayed execution, so when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The down-side to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison Sidekiq has delayed execution natively available.
Have you considered merging all four tasks into just one job? In that case you can have any number of workers, and each job is handled from start to finish by a single worker. It will behave very predictably; you can even estimate how long a task will take to finish. You also avoid the problem of one subtask taking longer than all the others and piling up in the queue.
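A minimal sketch of that single-job approach in plain Ruby, just to illustrate the ordering guarantee. The class name, the step names, and the status callback are hypothetical; the step bodies are empty placeholders where real code would download the file and call FFmpeg, and a real Resque job would also set @queue.

```ruby
# Hypothetical sketch: one job that runs all four steps in a fixed order.
class ProcessVideo
  STEPS = %i[download trim convert move].freeze

  # record_status is any callable taking (item_id, status); it stands in
  # for the database status update mentioned in the question.
  def self.perform(name, item_id, record_status)
    STEPS.each do |step|
      record_status.call(item_id, "#{step}_started")
      send(step, name)
      record_status.call(item_id, "#{step}_done")
    end
  end

  # Placeholder step bodies (assumptions, not real implementations).
  def self.download(name); end
  def self.trim(name); end
  def self.convert(name); end
  def self.move(name); end
end

statuses = []
ProcessVideo.perform("clip.mp4", 42, ->(id, status) { statuses << [id, status] })
```

Because the steps run sequentially inside one job, the statuses for a given video are always written in order, no matter how many workers are running.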

How do I guarantee two delayed_job jobs aren't run concurrently while still allowing concurrency for other jobs?

I have a scenario where I have long-running jobs that I need to move to a background process. Delayed job with a single worker would be very simple to implement, but would run very, very slowly as jobs mount up. Much of the work is slow because the thread has to sleep to wait on various remote API calls, so running multiple workers concurrently is a very obvious choice.
Unfortunately, some of these jobs are dependent on each other. I can't run two jobs belonging to the same identifier simultaneously. Order doesn't matter, only that exactly one worker can be working on a given ID's work.
My first thought was named queues, and name the queue for the identifiers, but the identifiers are dynamic data. We could be running ID 1 today, 5 tomorrow, 365849 and 645609 the next, so on and so forth. That's too many named queues. Not only would giving each one a single worker probably exceed available system resources (as well as being incredibly wasteful since most of them won't be active at any given time), but since workers aren't configured from inside the code but rather as environment variables, I'd wind up with some insane config files. And creating a sane pool of N generic workers could wind up with all N workers running on the same queue if that's the only queue with work to do.
So what I need is a way to prevent two jobs sharing a given ID from running at the same time, while allowing any number of jobs not sharing IDs to run concurrently.
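To make the requirement concrete, here is a rough in-process sketch in plain Ruby: one Mutex per ID, created on demand, so jobs with the same ID serialize while jobs with different IDs run concurrently. The PerIdLock class is hypothetical. It only works between threads of one process; assuming delayed_job workers run as separate processes, the same idea would need a lock shared across processes (for example, one held in the database).

```ruby
# Hypothetical in-process sketch: one Mutex per ID, created on demand.
class PerIdLock
  def initialize
    @registry_guard = Mutex.new
    @locks = Hash.new { |hash, id| hash[id] = Mutex.new }
  end

  # Runs the block while holding the lock for this id only.
  def with_lock(id)
    lock = @registry_guard.synchronize { @locks[id] }
    lock.synchronize { yield }
  end
end

locks = PerIdLock.new
running = Hash.new(0)   # id => jobs currently inside the critical section
max_seen = Hash.new(0)  # id => highest concurrency observed for that id
stats_guard = Mutex.new

jobs = [1, 1, 1, 5, 5, 365_849]
threads = jobs.map do |id|
  Thread.new do
    locks.with_lock(id) do
      stats_guard.synchronize do
        running[id] += 1
        max_seen[id] = [max_seen[id], running[id]].max
      end
      sleep 0.01 # stands in for a slow remote API call
      stats_guard.synchronize { running[id] -= 1 }
    end
  end
end
threads.each(&:join)
```

Note that this sketch never removes locks for IDs that are no longer in use; with a large, changing set of IDs that cleanup would have to be added.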

Is it possible to force concurrent jobs to run in separate Sidekiq processes?

One of the benefits of Sidekiq over Resque is that it can run multiple jobs in the same process. The drawback, however, is that I can't figure out how to force a set of concurrent jobs to run in different processes.
Here's my use case: say I have to generate 64M rows of data, and I have 8 vCPUs on an amazon EC2 instance. I'd like to carve the task up into 8 concurrent jobs generating 8M rows each. The problem is that if I'm running 8 sidekiq processes, sometimes sidekiq will decide to run 2 or more of the jobs in the same process, and so it doesn't use all 8 vCPUs and takes much longer to finish. Is there any way to tell sidekiq which worker to use or to force it to spread jobs in a group evenly amongst processes?
Answer is you can't easily, by design. Specialization is what leads to SPOFs.
You can create a custom queue for each process and then create one job for each queue.
You can use JRuby which doesn't suffer the same flaw.
You can execute the processing as a rake task which will spawn one process per job, ensuring an even load.
You can carve up 64 jobs instead of 8 and get a more even load that way.
I would probably do the latter unless the resulting I/O crushes the machine.
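The last option, carving the work into more, smaller jobs, can be sketched as follows. The chunk_ranges helper is hypothetical; each resulting range would be passed as the argument of one enqueued job.

```ruby
# Splits `total` rows into `parts` contiguous ranges of near-equal size.
def chunk_ranges(total, parts)
  base, extra = total.divmod(parts)
  start = 0
  Array.new(parts) do |i|
    # The first `extra` chunks get one additional row each.
    size = base + (i < extra ? 1 : 0)
    range = (start...(start + size))
    start += size
    range
  end
end

# 64M rows carved into 64 jobs of 1M rows each, instead of 8 jobs of 8M.
ranges = chunk_ranges(64_000_000, 64)
```

With 64 small jobs, a process that happens to pick up two of them at once delays the overall run far less than it would with 8 large jobs.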

Erlang: Job Scheduling Over a Dynamic Set of Nodes

I need some advice writing a Job scheduler in Erlang which is able to distribute jobs ( external os processes) over a set of worker nodes. A job can last from a few milliseconds to a few hours. The "scheduler" should be a global registry where jobs come in, get sorted and then get assigned and executed on connected "worker nodes". Worker nodes should be able to register on the scheduler by telling how many jobs they are able to process in parallel (slots). Worker nodes should be able to join and leave at any time.
An Example:
Scheduler has 10 jobs waiting
Worker Node A connects and is able to process 3 jobs in parallel
Worker Node B connects and is able to process 1 job in parallel
Some time later, another worker node joins which is able to process 2 jobs in parallel
Questions:
I have spent a fair amount of time thinking about the problem, but I am still not sure which way to go. My current solution is to have a globally registered gen_server for the scheduler which holds the jobs in its state. Every worker node spawns N worker processes and registers them on the scheduler. The worker processes then pull a job from the scheduler (which is an infinite blocking call with {noreply, ...} if no jobs are currently available).
Here are some questions:
Is it a good idea to assign every new job to an existing worker, knowing that I will have to re-assign the job to another worker at the time new workers connect? (I think this is how the Erlang SMP scheduler does things, but reassigning jobs seems like a big headache to me)
Should I start a process for every worker processing slot and where should this process live: on the scheduler node or on the worker node? Should the scheduler make rpc calls to the worker node or would it be better for the worker nodes to pull new jobs and then execute them on their own?
And finally: Is this problem already solved and where to find the code for it? :-)
I already tried RabbitMQ for job scheduling but custom job sorting and deployment adds a lot of complexity.
Any advice is highly welcome!
Having read your answer in the comments I'd still recommend to use pool(3):
Spawning 100k processes is not a big deal for Erlang because spawning a process is much cheaper than in other systems.
One process per job is a very good pattern in Erlang: start a new process, run the job in that process while keeping all the state in it, and terminate the process after the job is done.
Don't bother with worker processes that process a job and wait for a new one. This is the way to go if you are using OS-processes or threads because spawning is expensive but in Erlang this only adds unnecessary complexity.
The pool facility is useful as a low-level building block; the only thing it lacks for your use case is the ability to start additional nodes automatically. What I would do is start with pool and a fixed set of nodes to get the basic functionality.
Then add some extra logic that watches the load on the nodes, e.g. the way pool does it with statistics(run_queue). If you find that all nodes are over a certain load threshold, just slave:start/2,3 a new node on an extra machine and use pool:attach/1 to add it to your pool.
This won't rebalance old running jobs, but new jobs will automatically be moved to the newly started node since it's still idle.
With this you can have a fast pool controlled distribution of incoming jobs and a slower totally separate way of adding and removing nodes.
If you get all this working and still find, after some real-world benchmarking please, that you need to rebalance jobs, you can always build something into the jobs' main loops: on receiving a rebalance message, a job can respawn itself using the pool master, passing its current state as an argument.
Most important just go ahead and build something simple and working and optimize it later.
My solution to the problem:
"distributor" - gen_server,
"worker" - gen_server.
"distributor" starts "workers" using slave:start_link, each "worker" is started with max_processes parameter,
"distributor" behavior:
handle_call(submit,...)
* put job to the queue,
* cast itself check_queue
handle_cast(check_queue,...)
* gen_call all workers for load (current_processes / max_processes),
* find the least busy,
* if chosen worker load is < 1 gen_call(submit,...) worker with next job if any, remove job from the queue,
"worker" behavior (trap_exit = true):
handle_call(report_load, ...)
* return current_processes / max_processes,
handle_call(submit, ...)
* spawn_link job,
handle_info({'EXIT', Pid, Reason}, ...)
* gen_cast distributor with check_queue
In fact it is more complex than that as I need to track running jobs, kill them if I need to, but it is easy to implement in such architecture.
This is not a dynamic set of nodes though, but you can start new node from the distributor whenever you need.
P.S. Looks similar to pool, but in my case I am submitting port processes, so I need to limit them and have better control of what is going where.
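The selection logic of this distributor can be illustrated outside Erlang. The following is a rough single-process sketch in Ruby, not the author's implementation: Worker and Distributor are hypothetical stand-ins for the two gen_servers, method calls replace gen_server calls and casts, and check_queue here keeps assigning jobs until every worker is full rather than handling one job per message.

```ruby
# Hypothetical stand-in for a worker node with a fixed number of slots.
Worker = Struct.new(:name, :max_processes, :current_processes) do
  # Fraction of slots in use (current_processes / max_processes).
  def utilization
    current_processes.to_f / max_processes
  end
end

class Distributor
  attr_reader :queue

  def initialize(workers)
    @workers = workers
    @queue = []
  end

  # Puts the job on the queue, then checks whether it can be assigned.
  def submit(job)
    @queue << job
    check_queue
  end

  # Hands queued jobs to the least busy worker until every worker is full
  # or the queue is empty. Returns the [job, worker name] pairs assigned.
  def check_queue
    assigned = []
    until @queue.empty?
      worker = @workers.min_by(&:utilization)
      break if worker.utilization >= 1
      worker.current_processes += 1
      assigned << [@queue.shift, worker.name]
    end
    assigned
  end

  # Called when a job exits; frees the slot and rechecks the queue.
  def job_done(worker)
    worker.current_processes -= 1
    check_queue
  end
end

a = Worker.new("A", 3, 0)
b = Worker.new("B", 1, 0)
distributor = Distributor.new([a, b])
(1..5).each { |n| distributor.submit("job#{n}") }
```

With 4 slots in total and 5 submitted jobs, one job stays queued until a call such as distributor.job_done(b) frees a slot.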
