Identify core an Erlang process - erlang

Any way to identify the specific core an Erlang process is scheduled on?
Let's say you spawn a bunch of processes to simply print out the core the process is running on, and then exit. Any way to do this?
I spent some time reading docs and googling but couldn't find anything.
Thanks.
EDIT: "core" = CPU core number (or if not number, another identifier that identifies the CPU core).

There is erlang:system_info(scheduler_id) that in most cases is maped to a logical core. But this information is pretty ephemeral because the process may be suspended and resumed on any other scheduler.
What is your use case that you really need that kind of information?

No there is not. If you spawn 2000 processes and they terminate quickly, chances are that you will finish the job before rebalancing occurs. In this case you would only have a single core operating all the time.
You could take a look at the scheduler utilization calls however, see erlang:statistics(scheduler_wall_time). It will tell you how much work each scheduler is really doing.

Related

What exactly happens when one calls a 'spawn' function?

I am trying to understand how BEAM VM works, so there is my question. When one spawns a process in erlang, the result is a PID. Does that mean that a process is suspended until requested process is spawned?
I am trying to understand how BEAM VM works.
The details are in the free book, "The Beam Book".
Does that mean that a process is suspended until requested process is
spawned?
It depends.
Erlang is a concurrent language. When we say that processes run
concurrently we mean that for an outside observer it looks like two
processes are executing at the same time.
In a single core system this is achieved by preemptive multitasking.
This means that one process will run for a while, and then the
scheduler of the virtual machine will suspend it and let another
process run.
In a multicore or a distributed system we can achieve true
parallelism, that is, two or more processes actually executing at the
exact same time. In an SMP enabled emulator the system uses several OS
threads to indirectly execute Erlang processes by running one
scheduler and emulator per thread. In a system using the default
settings for ERTS there will be one thread per enabled core (physical
or hyper threaded).
No the processes are independent of one another. Here are the erlang docs

What makes erlang scalable?

I am working on an article describing fundamentals of technologies used by scalable systems. I have worked on Erlang before in a self-learning excercise. I have gone through several articles but have not been able to answer the following questions:
What is in the implementation of Erlang that makes it scalable? What makes it able to run concurrent processes more efficiently than technologies like Java?
What is the relation between functional programming and parallelization? With the declarative syntax of Erlang, do we achieve run-time efficiency?
Does process state not make it heavy? If we have thousands of concurrent users and spawn and equal number of processes as gen_server or any other equivalent pattern, each process would maintain a state. With so many processes, will it not be a drain on the RAM?
If a process has to make DB operations and we spawn multiple instances of that process, eventually the DB will become a bottleneck. This happens even if we use traditional models like Apache-PHP. Almost every business application needs DB access. What then do we gain from using Erlang?
How does process restart help? A process crashes when something is wrong in its logic or in the data. OTP allows you to restart a process. If the logic or data does not change, why would the process not crash again and keep crashing always?
Most articles sing praises about Erlang citing its use in Facebook and Whatsapp. I salute Erlang for being scalable, but also want to technically justify its scalability.
Even if I find answers to these queries on an existing link, that will help.
Regards,
Yash
Shortly:
It's unmutable. You have no variables, only terms, tuples and atoms. Program execution can be divided by breakpoint at any place. Fully transactional model.
Processes are even lightweight than .NET threads and isolated.
It's made for communications. Millions of connections? Fully asynchronous? Maximum thread safety? Big cross-platform environment, which built only for one purpose — scale&communicate? It's all Ericsson language — first in this sphere.
You can choose some impersonators like F#, Scala/Akka, Haskell — they are trying to copy features from Erlang, but only Erlang born from and born for only one purpose — telecom.
Answers to other questions you can find on erlang.com and I'm suggesting you to visit handbook. Erlang built for other aims, so it's not for every task, and if you asking about awful things like php, Erlang will not be your language.
I'm no Erlang developer (yet) but from what I have read about it some of the features that makes it very scalable is that Erlang has its own lightweight processes that are using message passing to communicate with each other. Because of this there is no such thing as shared state and locking which is the case when using for example a multi threaded Java application.
Another difference compared to Java is that the Erlang VM does garbage collection on every little process that is running which does not take any time at all compared to Java which does garbage collection only per VM.
If you get problem with bottlenecks from database connection you could start by using a database pooling app running against maybe a replicated PostgreSQL cluster or if you still have bottlenecks use a multi replicated NoSQL setup with Mnesia, Riak or CouchDB.
I think process restarts can be very useful when you are experiencing rare bugs that only appear randomly and only when specific criteria is fulfilled. Bugs that cause the application to crash as soon as you restart the app should optimally be fixed or taken care of with a circuit breaker so that it does not spread further.
Here is one way process restart helps. By not having to deal with all possible error cases. Say you have a program that divides numbers. Some guy enters a zero to divide by. Instead of checking for that possible error (and tons more), just code the "happy case" and let process crash when he enters 3/0. It just restarts, and he can figure out what he did wrong.
You an extend this into an infinite number of situations (attempting to read from a non-existent file because the user misspelled it, etc).
The big reason for process restart being valuable is that not every error happens every time, and checking that it worked is verbose.
Error handling is verbose typically, so writing it interspersed with the logic handling doing a task can make it harder to understand the code. Moving that logic outside of the task allows you to more clearly distinguish between "doing things" code, and "it broke" code. You just let the thing that had a problem fail, and handle it as needed by a supervising party.
Since most errors don't mean that the entire program must stop, only that that particular thing isn't working right, by just restarting the part that broke, you can keep operating in a state of degraded functionality, instead of being down, while you repair the problem.
It should also be noted that the failure recovery is bounded. You have to lay out the limits for how much failure in a certain period of time is too much. If you exceed that limit, the failure propagates to another level of supervision. Each restart includes doing any needed process initialization, which is sometimes enough to fix the problem. For example, in dev, I've accidentally deleted a database file associated with a process. The crashes cascaded up to the level where the file was first created, at which point the problem rectified itself, and everything carried on.

What function is to small for spawning into an own process

Where is the limit where there is no benefit of spawning a process to make a more parallelized function call?
For example when doing a recursive lookup in a tree structure, each child node would add a process and a message call to the parent just for a simple comparison.
Spawning process and do the work will be always slower than just do the work. It strongly depend on your exact requirements. Especially non-function requirements are the key. So go and do measurements. It's pretty easy. See documentation about Profiling for more details and there are also 3rd party projects easing benchmarking over there.
Spawning more processes won't necessarily make tasks run in parallel. For example, if you have a 24 cores on your system, only 24 processes can run at any one time.
Instead it might be good to think about how much work is being done when you examine a node in a tree. Lets say the node value represents a url which needs to be called to retrieve a value. In this case it might be a good idea to spawn a process for each node. This way a process can be scheduled to run while another process is waiting for an answer to the http request.

Should spawn be used in Erlang whenever I have a non dependent asynchronous function?

If I have a function that can be executed asynchronously without any dependencies and no other functions require its results directly, should I use spawn ? In my scenario I want to proceed to consume a message queue, so spawning would relif my blocking loop, but if there are other situations where I can distribute function calls as much as possible, will that affect negatively my application ?
Overall, what would be the pros and cons of using Spawn.
Unlike operating system processes or threads, Erlang processes are very light weight. There is minimal overhead in starting, stopping, and scheduling new processes. You should be able to spawn as many of them as you need (the max per vm is in the hundreds of thousands). The Actor model Erlang implements allows you to think about what is actually happening in parallel and write your programs to express that directly. Avoid complicating your logic with work queues if you can avoid it.
Spawn a process whenever it makes logical sense, and optimize only when you have to.
The first thing that come in mind is the size of parameters. They will be copied from your current process to the new one and if the parameters are huge it may be inefficient.
Another problem that may arise is bloating VM with such amount of processes that your system will become irresponsive. You can overcome this problem by using pool of worker processes or special monitor process that will allow to work only limited amount of such processes.
so spawning would relif my blocking loop
If you are in the situation that a loop will receive many messages requiring independant actions, don't hesitate and spawn new processes for each message processing, this way you will take advantage of the multicore capabilities (if any) of your computer. As kjw0188 says, the Erlang processes are very light weight and if the system hits the limit of process numbers alive in parallel (with the assumption that you are doing reasonable code) it is more likely that the application is overloading the capability of the node.

How reliable is windows task scheduler for scheduling code to run repeatedly?

I have a bit of code that needs to sit on a windows server 2003 machine and run every minute.
What is the recommended way of handling this? Is it ok to design it as a console service and just have the task scheduler hit it ever minute? (is that even possible?) Should I just suck it up and write it as a windows service?
Since it needs to run every single minute, I would suggest writing a Windows Service. It is not very complicated, and if you never did this before, it would be great for you to learn how it is done.
Calling the scheduled task every minute is not something I would recommend.
I would say suck it up and write it as a Windows service. I've not found scheduled tasks to be very reliable and when it doesn't run, I have yet to find an easy way to find out why it hasn't.
Windows Scheduled Tasks has been fairly reliable for our purposes and we favor them in almost all cases over Windows Services due to their ease of installing and advanced recovery features. The always on nature of a windows service could end up being a problem if a part of the code that was written ends ups getting locked up or looped in a piece of code that it shouldn't be in. We generally write our code in a fashion similar to this
Init();
Run();
CleanUp();
Then as part of the Scheduled Task we put a time limit on how long the process can run and have it kill the process if it runs for longer. If we do have a piece of code that is having trouble Scheduled Tasks will kill it and the process will start up in the next minute.
if you need to have it run every minute, I would build it as a windows service. I wouldn't use the scheduler for anything less than a daily task.
I would say that it depends on what it was doing, but in general I am always in favor of having the fewest layers. If you write it as a console service and use the task scheduler then you have two places to maintain going forward.
If you write it as a windows service then you only have one fewer places to check in case something goes wrong.
While searching for scheduled service help, i came across to a very good article by Jon Galloway.
There are various diadvantages if a windows service is used for scheduled task. I agreed with it. I would suggest to use Task Scheduled, simple in implementation. Please refer to detailed information of implementing the task scheduler. Hope this info helps in finalizing the implementation approach.
The only other point to consider, is that if you're job involves some kind of database interaction, consider looking into the integration/scheduling services provided by your database.
For example, creating an SSIS package for your SQL Server related service may seem a bit like overkill, but it can be integrated nicely with the environment and will have its own logging/error checking mechanisms already in place.
I agree, it is kind of a waste of effort to create even a console executable and schedule it to be run every minute. I would suggest exploring something like Quartz.Net. That way you can create a simple job and schedule it to run every minute.

Resources