What exactly happens when one calls a 'spawn' function? (Erlang)

I am trying to understand how the BEAM VM works, so here is my question. When one spawns a process in Erlang, the result is a PID. Does that mean that the calling process is suspended until the requested process is spawned?

I am trying to understand how the BEAM VM works.
The details are in the free book, "The Beam Book".
Does that mean that the calling process is suspended until the requested process is spawned?
It depends.
Erlang is a concurrent language. When we say that processes run concurrently we mean that for an outside observer it looks like two processes are executing at the same time.
In a single-core system this is achieved by preemptive multitasking. This means that one process will run for a while, and then the scheduler of the virtual machine will suspend it and let another process run.
In a multicore or a distributed system we can achieve true parallelism, that is, two or more processes actually executing at the exact same time. In an SMP-enabled emulator the system uses several OS threads to indirectly execute Erlang processes by running one scheduler and emulator per thread. In a system using the default settings for ERTS there will be one thread per enabled core (physical or hyperthreaded).
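In practice, spawn/1 returns a pid right away and the caller keeps running; the new process is simply added to a scheduler's run queue. A quick shell session illustrates this (the pid shown is illustrative and will differ on your node):

1> Pid = spawn(fun() -> timer:sleep(5000), io:format("child done~n") end).
<0.88.0>
2> is_process_alive(Pid).   % the shell was not blocked waiting for the child
true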

No, the processes are independent of one another. Here are the Erlang docs.

Related

Execution window time

I've read an article in the book Elixir in Action about processes and the scheduler and have some questions:
Each process gets a small execution window; what does that mean?
Is the execution window approximately 2,000 function calls?
What does it mean for a process to implicitly yield execution?
Let's say you have 10,000 Erlang/Elixir processes running. For simplicity, let's also say your computer has only a single processor with a single core. The processor is only capable of doing one thing at a time, so only a single process can be executed at any given moment.
Let's say one of these processes has a long running task. If the Erlang VM wasn't capable of interrupting the process, every single other process would have to wait until that process is done with its task. This doesn't scale well when you're trying to handle tens of thousands of requests.
Thankfully, the Erlang VM is not so naive. When a process spins up, it's given 2,000 reductions (function calls). Every time the process calls a function, its reduction count goes down by 1. Once its reduction count hits zero, the process is interrupted (it implicitly yields execution), and it has to wait its turn.
Because Erlang/Elixir don't have loops, iterating over a large data structure must be done recursively. This means that, unlike most languages where a long-running loop can monopolize a thread, each iteration uses up one of the process's reductions, so the process cannot hog execution.
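You can observe reductions being consumed from the shell with the standard process_info/2 BIF, comparing the shell process's reduction count before and after a recursive loop (the exact numbers will vary, since the shell itself also burns reductions):

1> Loop = fun F(0) -> ok; F(N) -> F(N - 1) end.
2> {reductions, R0} = process_info(self(), reductions).
3> Loop(100000).
4> {reductions, R1} = process_info(self(), reductions).
5> R1 - R0.   % on the order of 100,000: each call consumed at least one reduction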
The rest of this answer is beyond the scope of the question, but included for completeness.
Let's say now that you have a processor with 4 cores. Instead of only having 1 scheduler, the VM will start up with 4 schedulers (1 for each core). If you have enough processes running that the first scheduler can't handle the load in a reasonable amount of time, the second scheduler will take control of the excess processes, executing them in parallel to the first scheduler.
If those two schedulers can't handle the load in a reasonable amount of time, the third scheduler will take on some of the load. This continues until all of the processors are fully utilized.
Additionally, the VM is smart enough not to waste time on processes that are idle - i.e. just waiting for messages.
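To check how many schedulers your VM actually started, you can use standard erlang:system_info/1 queries:

1> erlang:system_info(schedulers).          % schedulers configured, normally one per logical core
2> erlang:system_info(schedulers_online).   % schedulers currently running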
There is an excellent blog post by JLouis on How Erlang Does Scheduling. I recommend reading it.

How does Erlang control the process created by os:cmd?

1> os:cmd("ping google.com").
When the above code is executed, two processes are created: one Erlang process and one system-level process.
Is there any library for Erlang with which we can monitor the system-level process running "ping google.com"?
Using the erlexec application to run OS processes gives you a lot more control over those processes. You can send signals to processes (e.g. to stop them), set up Erlang monitors for OS processes, and you get the status code when the OS process terminates (os:cmd doesn't give you that).
Take a look at the erlexec documentation.
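A rough sketch of what that looks like, based on the erlexec documentation (verify the option names and message formats against the version you install; the -c 4 flag just limits ping to four packets so the command terminates):

{ok, _} = application:ensure_all_started(erlexec),
{ok, Pid, OsPid} = exec:run("ping -c 4 google.com", [stdout, monitor]),
receive {stdout, OsPid, Data} -> io:format("~s", [Data]) end,   % first chunk of output
receive {'DOWN', OsPid, process, Pid, Reason} -> Reason end.    % exit status of the OS process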

Identify the core an Erlang process runs on

Any way to identify the specific core an Erlang process is scheduled on?
Let's say you spawn a bunch of processes to simply print out the core the process is running on, and then exit. Any way to do this?
I spent some time reading docs and googling but couldn't find anything.
Thanks.
EDIT: "core" = CPU core number (or if not number, another identifier that identifies the CPU core).
There is erlang:system_info(scheduler_id), which in most cases is mapped to a logical core. But this information is pretty ephemeral, because the process may be suspended and resumed on any other scheduler.
What is your use case that you really need that kind of information?
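For example, each spawned process can report the scheduler it happens to be running on at that moment (with the caveat above that it may migrate at any time):

1> [spawn(fun() -> io:format("~p is on scheduler ~p~n", [self(), erlang:system_info(scheduler_id)]) end) || _ <- lists:seq(1, 4)].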
No, there is not. If you spawn 2,000 processes and they terminate quickly, chances are that you will finish the job before rebalancing occurs. In that case you would have only a single core operating the whole time.
You could, however, take a look at the scheduler utilization calls; see erlang:statistics(scheduler_wall_time). It will tell you how much work each scheduler is really doing.
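Note that scheduler wall time statistics are disabled by default; enable them with a system flag first:

1> erlang:system_flag(scheduler_wall_time, true).
2> erlang:statistics(scheduler_wall_time).   % [{SchedulerId, ActiveTime, TotalTime}, ...]; ActiveTime/TotalTime is that scheduler's utilization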

Should spawn be used in Erlang whenever I have a non-dependent asynchronous function?

If I have a function that can be executed asynchronously, with no dependencies and no other functions requiring its results directly, should I use spawn? In my scenario I want to proceed to consume a message queue, so spawning would relieve my blocking loop. But if there are other situations where I can distribute function calls as much as possible, will that negatively affect my application?
Overall, what would be the pros and cons of using spawn?
Unlike operating system processes or threads, Erlang processes are very lightweight. There is minimal overhead in starting, stopping, and scheduling new processes. You should be able to spawn as many of them as you need (the default per-VM limit is in the hundreds of thousands, and it can be raised). The Actor model Erlang implements allows you to think about what is actually happening in parallel and write your programs to express that directly. Avoid complicating your logic with work queues if you can avoid it.
Spawn a process whenever it makes logical sense, and optimize only when you have to.
The first thing that comes to mind is the size of the parameters. They will be copied from your current process to the new one, and if the parameters are huge this may be inefficient.
Another problem that may arise is bloating the VM with so many processes that your system becomes unresponsive. You can overcome this by using a pool of worker processes, or a special monitor process that allows only a limited number of such processes to run at a time.
so spawning would relieve my blocking loop
If you are in the situation where a loop will receive many messages requiring independent actions, don't hesitate: spawn a new process for each message, as in the sketch below; this way you will take advantage of the multicore capabilities (if any) of your computer. As kjw0188 says, Erlang processes are very lightweight, and if the system hits the limit of processes alive in parallel (assuming you are writing reasonable code), it is more likely that the application is overloading the capability of the node.
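A minimal sketch of that pattern, where handle/1 stands in for whatever work your application does per message:

loop() ->
    receive
        {job, Msg} ->
            spawn(fun() -> handle(Msg) end),   % handle/1 is your own processing function
            loop();
        stop ->
            ok
    end.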

Erlang VM: scheduler runtime information

I was searching for a way to retrieve information about how scheduling is done during a program's execution: which processes are in which scheduler's run queue, whether they migrate, which process is active on each scheduler, whether each scheduler runs on one core, and so on.
Any ideas or related documentation/articles/anything?
I would suggest you take a look at the following tracing/profiling options:
erlang:system_profile/2
It has options for monitoring scheduler and run queue (runnable_procs) activity.
The scheduler option will report
{profile, scheduler, Id, State, NoScheds, Ts}
where State will tell you if it is active or not. NoScheds reports the number of currently active schedulers (if I remember correctly).
The runnable_procs option will let you know if a process is put into or removed from a run queue of a particular scheduler.
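A minimal way to see those messages is to install a simple printing process as the profiler (the message formats are as described in the erlang:system_profile/2 docs quoted above):

1> Profiler = spawn(fun Print() -> receive Msg -> io:format("~p~n", [Msg]), Print() end end).
2> erlang:system_profile(Profiler, [scheduler, runnable_procs]).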
If you have a system that supports DTrace, you can use the Erlang DTrace probes being developed to see exactly when process scheduling events occur.
For example, I wrote a simple one-liner that shows the number of nanoseconds that pass between sending a message to a process and having the recipient process be scheduled for execution (plus or minus a few nanoseconds for cross-core clock variance and the like).
