Execution window time - erlang

I've read an article in the book elixir in action about processes and scheduler and have some questions:
Each process get a small execution window, what is does mean?
Execution windows is approximately 2000 function calls?
What is a process implicitly yield execution?

Let's say you have 10,000 Erlang/Elixir processes running. For simplicity, let's also say your computer only has a single process with a single core. The processor is only capable of doing one thing at a time, so only a single process is capable of being executed at any given moment.
Let's say one of these processes has a long running task. If the Erlang VM wasn't capable of interrupting the process, every single other process would have to wait until that process is done with its task. This doesn't scale well when you're trying to handle tens of thousands of requests.
Thankfully, the Erlang VM is not so naive. When a process spins up, it's given 2,000 reductions (function calls). Every time a function is called by the process, it's reduction count goes down by 1. Once its reduction count hits zero, the process is interrupted (it implicitly yields execution), and it has to wait its turn.
Because Erlang/Elixir don't have loops, iterating over a large data structure must be done recursively. This means that unlike most languages where loops become system bottlenecks, each iteration uses up one of the process' reductions, and the process cannot hog execution.
The rest of this answer is beyond the scope of the question, but included for completeness.
Let's say now that you have a processor with 4 cores. Instead of only having 1 scheduler, the VM will start up with 4 schedulers (1 for each core). If you have enough processes running that the first scheduler can't handle the load in a reasonable amount of time, the second scheduler will take control of the excess processes, executing them in parallel to the first scheduler.
If those two schedulers can't handle the load in a reasonable amount of time, the third scheduler will take on some of the load. This continues until all of the processors are fully utilized.
Additionally, the VM is smart enough not to waste time on processes that are idle - i.e. just waiting for messages.
There is an excellent blog post by JLouis on How Erlang Does Scheduling. I recommend reading it.

Related

What exactly happens when one calls a 'spawn' function?

I am trying to understand how BEAM VM works, so there is my question. When one spawns a process in erlang, the result is a PID. Does that mean that a process is suspended until requested process is spawned?
I am trying to understand how BEAM VM works.
The details are in the free book, "The Beam Book".
Does that mean that a process is suspended until requested process is
spawned?
It depends.
Erlang is a concurrent language. When we say that processes run
concurrently we mean that for an outside observer it looks like two
processes are executing at the same time.
In a single core system this is achieved by preemptive multitasking.
This means that one process will run for a while, and then the
scheduler of the virtual machine will suspend it and let another
process run.
In a multicore or a distributed system we can achieve true
parallelism, that is, two or more processes actually executing at the
exact same time. In an SMP enabled emulator the system uses several OS
threads to indirectly execute Erlang processes by running one
scheduler and emulator per thread. In a system using the default
settings for ERTS there will be one thread per enabled core (physical
or hyper threaded).
No the processes are independent of one another. Here are the erlang docs

Best practices in setting number of dask workers

I am a bit confused by the different terms used in dask and dask.distributed when setting up workers on a cluster.
The terms I came across are: thread, process, processor, node, worker, scheduler.
My question is how to set the number of each, and if there is a strict or recommend relationship between any of these. For example:
1 worker per node with n processes for the n cores on the node
threads and processes are the same concept? In dask-mpi I have to set nthreads but they show up as processes in the client
Any other suggestions?
By "node" people typically mean a physical or virtual machine. That node can run several programs or processes at once (much like how my computer can run a web browser and text editor at once). Each process can parallelize within itself with many threads. Processes have isolated memory environments, meaning that sharing data within a process is free, while sharing data between processes is expensive.
Typically things work best on larger nodes (like 36 cores) if you cut them up into a few processes, each of which have several threads. You want the number of processes times the number of threads to equal the number of cores. So for example you might do something like the following for a 36 core machine:
Four processes with nine threads each
Twelve processes with three threads each
One process with thirty-six threads
Typically one decides between these choices based on the workload. The difference here is due to Python's Global Interpreter Lock, which limits parallelism for some kinds of data. If you are working mostly with Numpy, Pandas, Scikit-Learn, or other numerical programming libraries in Python then you don't need to worry about the GIL, and you probably want to prefer few processes with many threads each. This helps because it allows data to move freely between your cores because it all lives in the same process. However, if you're doing mostly Pure Python programming, like dealing with text data, dictionaries/lists/sets, and doing most of your computation in tight Python for loops then you'll want to prefer having many processes with few threads each. This incurs extra communication costs, but lets you bypass the GIL.
In short, if you're using mostly numpy/pandas-style data, try to get at least eight threads or so in a process. Otherwise, maybe go for only two threads in a process.

Should spawn be used in Erlang whenever I have a non dependent asynchronous function?

If I have a function that can be executed asynchronously without any dependencies and no other functions require its results directly, should I use spawn ? In my scenario I want to proceed to consume a message queue, so spawning would relif my blocking loop, but if there are other situations where I can distribute function calls as much as possible, will that affect negatively my application ?
Overall, what would be the pros and cons of using Spawn.
Unlike operating system processes or threads, Erlang processes are very light weight. There is minimal overhead in starting, stopping, and scheduling new processes. You should be able to spawn as many of them as you need (the max per vm is in the hundreds of thousands). The Actor model Erlang implements allows you to think about what is actually happening in parallel and write your programs to express that directly. Avoid complicating your logic with work queues if you can avoid it.
Spawn a process whenever it makes logical sense, and optimize only when you have to.
The first thing that come in mind is the size of parameters. They will be copied from your current process to the new one and if the parameters are huge it may be inefficient.
Another problem that may arise is bloating VM with such amount of processes that your system will become irresponsive. You can overcome this problem by using pool of worker processes or special monitor process that will allow to work only limited amount of such processes.
so spawning would relif my blocking loop
If you are in the situation that a loop will receive many messages requiring independant actions, don't hesitate and spawn new processes for each message processing, this way you will take advantage of the multicore capabilities (if any) of your computer. As kjw0188 says, the Erlang processes are very light weight and if the system hits the limit of process numbers alive in parallel (with the assumption that you are doing reasonable code) it is more likely that the application is overloading the capability of the node.

When is it appropriate to increase the async-thread size from zero?

I have been reading the documentation trying to understand when it makes sense to increase the async-thread pool size via the +A N switch.
I am perfectly prepared to benchmark, but I was wondering if there were a rule-of-thumb for when one ought to suspect that growing the pool size from 0 to N (or N to N+M) would be helpful.
Thanks
The BEAM runs Erlang code in special threads it calls schedulers. By default it will start a scheduler for every core in your processor. This can be controlled and start up time, for instance if you don't want to run Erlang on all cores but "reserve" some for other things. Normally when you do a file I/O operation then it is run in a scheduler and as file I/O operations are relatively slow they will block that scheduler while they are running. Which can affect the real-time properties. Normally you don't do that much file I/O so it is not a problem.
The asynchronous thread pool are OS threads which are used for I/O operations. Normally the pool is empty but if you use the +A at startup time then the BEAM will create extra threads for this pool. These threads will then only be used for file I/O operations which means that the scheduler threads will no longer block waiting for file I/O and the real-time properties are improved. Of course this costs as OS threads aren't free. The threads don't mix so scheduler threads are just scheduler threads and async threads are just async threads.
If you are writing linked-in drivers for ports these can also use the async thread pool. But you have to detect when they have been started yourself.
How many you need is very much up to your application. By default none are started. Like #demeshchuk I have also heard that Riak likes to have a large async thread pool as they open many files. My only advice is to try it and measure. As with all optimisation?
By default, the number of threads in a running Erlang VM is equal to the number of processor logical cores (if you are using SMP, of course).
From my experience, increasing the +A parameter may give some performance improvement when you are having many simultaneous file I/O operations. And I doubt that increasing +A might increase the overall processes performance, since BEAM's scheduler is extremely fast and optimized.
Speaking of the exact numbers – that totally depends on your application I think. Say, in case of Riak, where the maximum number of opened files is more or less predictable, you can set +A to this maximum, or several times less if it's way too big (by default it's 64, BTW). If your application contains, like, millions of files, and you serve them to web clients – that's another story; most likely, you might want to run some benchmarks with your own code and your own environment.
Finally, I believe I've never seen +A more than a hundred. Doesn't mean you can't set it, but there's likely no point in it.

speed up php-cli

Why is a php cli process using 25% of CPU, is there a way to reduce this? Right now I'm running 3 instances but obviously I would like to run much more to finish the job faster.
Background info: I'm moving data from a transbase db to mysql db.
EDIT: If I run this in a browser there isn't such a noticeable load on the CPU.
More processes doesn't mean faster processing. The PHP process takes as much CPU as it can to finisgh the task as quick as possible. It's probably 25% because you got a quad-core processor and it's a single threaded task.
Ideally, you would need 4 processes if you could assign each of them to a different code. Also, because of waiting for database or disk-I/O, a single thread cannot fully use all CPU power all the time, so go ahead and run more processes. It's not that a 5th processes will crash because all CPU power is used up; it will just take its share, while the OS divides processing power to all running processes.
Just dont' start too many; every process has a little overhead, and you won't benefit from having 200 simultaneous processes.

Resources