I am looking for a way to retrieve information about how scheduling is done during a program's execution: which processes are on which scheduler, whether they migrate between schedulers, which process is active on each scheduler, whether each scheduler runs on one core, and so on.
Any ideas or related documentation/articles/anything?
I would suggest you take a look at the following tracing/profiling options:
erlang:system_profile/2
It has options for monitoring scheduler and run queue (runnable_procs) activity.
The scheduler option will report
{profile, scheduler, Id, State, NoScheds, Ts}
where State tells you whether that scheduler is active, and NoScheds reports the number of currently active schedulers (if I remember correctly).
The runnable_procs option will let you know if a process is put into or removed from a run queue of a particular scheduler.
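For illustration, a minimal sketch (the function names here are made up for the example) that registers the calling process as the profiler and prints both kinds of events:

    %% Register self() as the profiler and subscribe to scheduler and
    %% runnable_procs events, then print whatever arrives.
    start_profiling() ->
        erlang:system_profile(self(), [scheduler, runnable_procs]),
        profile_loop().

    profile_loop() ->
        receive
            {profile, scheduler, Id, State, NoScheds, Ts} ->
                %% State is active | inactive.
                io:format("scheduler ~p ~p, ~p active, at ~p~n",
                          [Id, State, NoScheds, Ts]),
                profile_loop();
            {profile, Pid, State, Mfa, Ts} ->
                %% runnable_procs event: Pid was put into (active) or
                %% removed from (inactive) a run queue, at code Mfa.
                io:format("process ~p ~p in ~p at ~p~n",
                          [Pid, State, Mfa, Ts]),
                profile_loop()
        end.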
If you have a system that supports DTrace, you can use the Erlang DTrace probes being developed to see exactly when process scheduling events occur.
For example, I wrote a simple one-liner that shows the number of nanoseconds that pass between sending a message to a process and the recipient process being scheduled for execution (± a few nanoseconds for cross-core clock variance and such).
I would like to use a Dask delayed task to call an external program, which writes its progress to STDOUT. Inside the delayed task, I plan to monitor the STDOUT and would like to update the Client process that is waiting for the delayed task with progress information extracted from the STDOUT. Is there a recommended way for a delayed task to communicate with its Client process, or do I need to roll my own?
You could achieve this kind of flow with any of the coordination primitives or actors provided by dask. From your description, the Queue or pub/sub mechanisms seem like the best fit. Note that all of these are intended for low-frequency, low-volume communication.
I am trying to understand how the BEAM VM works, so here is my question. When one spawns a process in Erlang, the result is a PID. Does that mean that a process is suspended until the requested process is spawned?
I am trying to understand how BEAM VM works.
The details are in the free book, "The Beam Book".
Does that mean that a process is suspended until the requested process is spawned?
It depends.
Erlang is a concurrent language. When we say that processes run
concurrently we mean that for an outside observer it looks like two
processes are executing at the same time.
In a single core system this is achieved by preemptive multitasking.
This means that one process will run for a while, and then the
scheduler of the virtual machine will suspend it and let another
process run.
In a multicore or a distributed system we can achieve true
parallelism, that is, two or more processes actually executing at the
exact same time. In an SMP-enabled emulator the system uses several OS
threads to indirectly execute Erlang processes by running one
scheduler and emulator per thread. In a system using the default
settings for ERTS there will be one thread per enabled core (physical or hyperthreaded).
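Concretely, the spawning process is not blocked: spawn returns the new pid immediately, and the child runs whenever a scheduler picks it up. A minimal sketch you can paste into an Erlang shell:

    %% The parent gets the pid back right away; the child may or may
    %% not have started executing yet on some scheduler.
    Parent = self(),
    Pid = spawn(fun() -> Parent ! {self(), started} end),
    io:format("spawn returned ~p immediately~n", [Pid]),
    receive
        {Pid, started} -> io:format("child ~p has now run~n", [Pid])
    end.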
No, the processes are independent of one another. Here are the Erlang docs.
I have the following tasks to do in a Rails application:
Download a video
Trim the video with FFmpeg to a given range (e.g., 00:02 - 00:09)
Convert the video to a given format
Move the converted video to a folder
Since I wanted this to happen in background jobs, I used one Resque worker that processes a queue.
For the first job, I have created a queue like this:
#queue = :download_video
It does its task, and at the end it moves on to the next task by calling Resque.enqueue(ConvertVideo, name, itemId). In this way, I have created a chain of jobs, each enqueued when the previous one finishes.
This is very wrong: if the first job starts to enqueue the other jobs (one from another), then everything gets blocked with one worker until the first list of queued jobs is finished.
How should this be optimised? I tried adding more workers to this way of enqueueing jobs, but the results are wrong and unpredictable.
Another aspect is that each job saves a status in the database, and I need the jobs to be processed in the right order.
Should each worker do a single job type from the list above, meaning at least 4 workers? If I doubled that to 8 workers, would it be an improvement?
Have you considered using Sidekiq?
As the Sidekiq documentation says:
resque uses redis for storage and processes messages in a single-threaded process. The redis requirement makes it a little more difficult to set up, compared to delayed_job, but redis is far better as a queue than a SQL database. Being single-threaded means that processing 20 jobs in parallel requires 20 processes, which can take a lot of memory.
sidekiq uses redis for storage and processes jobs in a multi-threaded process. It's just as easy to set up as resque but more efficient in terms of raw processing speed. Your worker code does need to be thread-safe.
So you should have two kinds of jobs: downloading videos and converting videos. Download jobs should run in parallel (you can limit the concurrency if you want), each placing its result on one queue (the "in-between" queue) before multiple convert jobs pick them up in parallel.
I hope that helps. This link explains the best practices in Sidekiq quite well: https://github.com/mperham/sidekiq/wiki/Best-Practices
As #Ghislaindj noted, Sidekiq might be an alternative, largely because it offers plugins that control execution ordering.
See this list:
https://github.com/mperham/sidekiq/wiki/Related-Projects#execution-ordering
Nonetheless, yes, you should be using different queues and more workers, each specific to a queue. So you have a set of workers all working on the :download_video queue, and then other workers attached to the :convert_video queue, and so on.
If you want to continue using Resque, another approach would be to use delayed execution, so that when you enqueue your subsequent jobs you specify a delay parameter.
Resque.enqueue_in(10.seconds, ConvertVideo, name, itemId)
The downside to using delayed execution in Resque is that it requires the resque-scheduler package, so you're introducing a new dependency:
https://github.com/resque/resque-scheduler
For comparison, Sidekiq has delayed execution available natively.
Have you considered merging all four tasks into just one? In that case you can have any number of workers and one will do the whole job. It will behave very predictably, and you can even estimate how long the task will take. You also avoid the problem of one subtask taking longer than all the others and piling up in the queue.
At the moment we have an Amazon Simple Workflow application that has a few tasks that can occur in parallel at the beginning of the process, followed by one path through a critical region where we can only allow one process to proceed.
We have modeled the critical region as a child workflow and we only allow one process to run in the child workflow at a time (though there is a race condition in our code that hasn't caused us issues yet). This is doing the job, but it has some issues.
We have a method that keeps checking whether the child workflow is running; if it isn't, it proceeds (the race condition mentioned above: the is-running check and the start are not an atomic operation), otherwise it throws an exception and retries with exponential backoff. The problems are:
1. With multiple workflows entering, which one proceeds first is non-deterministic; it would be better if this were a FIFO queue.
2. We can end up waiting a long time for the next workflow to start, so there is wasted time; it would be nice if each workflow proceeded as soon as the previous one finished.
We can address point 2 by reducing the retry interval, but we would still have the non-FIFO problem.
I can imagine modeling this quite easily on a single machine with a queue and locks, but what are our options in SWF?
You can have a "critical section" workflow that is always running. Signal it to queue execution requests. Upon receiving a signal, the "critical section" workflow either starts the activity (if it is not already running) or queues the request in the decider. When the activity execution completes, a "response" signal is sent back to the requester workflow. As the "critical section" workflow is always running, it has to periodically restart itself as new (passing the list of outstanding requests as a parameter), the same way all cron workflows do.
Any way to identify the specific core an Erlang process is scheduled on?
Let's say you spawn a bunch of processes that simply print out the core they are running on and then exit. Is there any way to do this?
I spent some time reading docs and googling but couldn't find anything.
Thanks.
EDIT: "core" = CPU core number (or if not number, another identifier that identifies the CPU core).
There is erlang:system_info(scheduler_id), which in most cases maps to a logical core. But this information is ephemeral: the process may be suspended and resumed on any other scheduler.
What is your use case that you really need that kind of information?
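For illustration, a quick sketch that spawns a few processes, each reporting the scheduler it happens to be on at that instant (subject to the migration caveat above):

    %% Each process prints a snapshot of the scheduler it is currently
    %% running on; it may be migrated to another scheduler afterwards.
    [spawn(fun() ->
               io:format("~p is on scheduler ~p~n",
                         [self(), erlang:system_info(scheduler_id)])
           end) || _ <- lists:seq(1, 4)].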
No, there is not. If you spawn 2000 processes and they terminate quickly, chances are that you will finish the job before any rebalancing occurs, in which case a single core would have done all the work.
You could, however, take a look at the scheduler utilization calls; see erlang:statistics(scheduler_wall_time). It will tell you how much work each scheduler is really doing.
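A minimal sketch of how that is typically used: enable the flag, take two samples, and compute per-scheduler utilization from the deltas:

    %% scheduler_wall_time sampling is off by default.
    erlang:system_flag(scheduler_wall_time, true),
    S0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(1000),
    S1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    %% Utilization per scheduler = active-time delta / total-time delta.
    [{Id, (A1 - A0) / (T1 - T0)}
     || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(S0, S1)].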