which core runs the session during concurrent computing - ipython-parallel

I opened a ipython session, started ipcluster with 4 engines, let engine[0] continuously work in a non-blocking mode.
Now I have some computation to do in the ipython session. How do I know and make sure that the computation in this session is not using the same core/cpu/resources as engine[0]?

Short answer: If you don't send the work to engine 0 explicitly, it won't use engine 0's resources.
IPython doesn't manage cores or other physical resources, it just allocates processes. It is left to your operating system to assign concurrent processes to your CPU cores.
If you have started an IPython cluster with four engines and are using it from an interactive IPython session, you have five Python processes that can run code concurrently—your four engines, plus the interactive session itself. If engine 0 is running a job in the background and you perform local computations in your interactive session, your operating system should assign the work done in the interactive session to a different core than the one occupied by engine 0, assuming there is a core available.

Related

Excessive memory use pyspark

I have setup a JupyterHub and configured a pyspark kernel for it. When I open a pyspark notebook (under username Jeroen), two processes are added, a Python process and a Java process. The Java process is assigned 12g of virtual memory (see image). When running a test script on a range of 1B number it grows to 22g. Is that something to worry about when we work on this server with multiple users? And if it is, how can I prevent Java from allocating so much memory?
You don't need to worry about virtual memory usage, reserved memory is much more important here (the RES column).
You can control size of JVM heap usage using --driver-memory option passed to spark (if you use pyspark kernel on jupyterhub you can find it in environment under PYSPARK_SUBMIT_ARGS key). This is not exactly the memory limit for your application (there are other memory regions on JVM), but it is very close.
So, when you have multiple users setup, you should learn them to set appropriate driver memory (the minimum they need for processing) and shutdown notebooks after they finish work.

MPI/Pthread program does not scale

I have a MPI/Pthread program in which each MPI process will be running on a separate computing node. Within each MPI process, certain number of Pthreads (1-8) are launched. However, no matter how many Pthreads are launched within a MPI process, the overall performance is pretty much the same. I suspect all the Pthreads are running on the same CPU core. How can I assign threads to different CPU cores?
Each computing node has 8 cores.(two Quad core Nehalem processors)
Open MPI 1.4
Linux x86_64
Questions like this are often dependent on the problem at hand. Most likely, you are running into a resource lock issue (where the threads are competing for a lock) -- this would look like only one core was doing any work, because only one thread can (effectively) do any work at any given time.
Setting CPU affinity for a certain thread is not a good solution. You should allow for the OS scheduler to optimally determine the physical core assignment for a given pthread.
Look at your code and try to figure out where you are locking where you shouldn't be, or if you've come up with a correct parallel solution to the problem at hand. You should also test a version of the program using only pthreads (not MPI) and see if scaling is achieved.

Get the number of cores in Erlang with Linux

i am writing a concurrent program and i need to know the number of cores of the system so then the program will know how many processes to open.
Is there command to get this inside Erlang code?
Thnx.
You can use
erlang:system_info(logical_processors_available)
to get the number of cores that can be used by the erlang runtime system.
There is also:
erlang:system_info(schedulers_online)
which tells you how many scheduler threads are actually running.
To get the number of available cores, use the logical_processors flag to erlang:system_info/1:
1> erlang:system_info(logical_processors).
8
There are two companion flags to this one: logical_processors_online shows how many are in use, and logical_processors_available show how many are available (it will return unknown when all logical processors available are online).
To know how to parallelize your code, you should rely on schedulers_online which will return the number of actual Erlang schedulers that are available in your current VM instance:
1> erlang:system_info(schedulers_online).
8
Note however that parallelizing on this value alone might not be enough. Sometimes you have other processes running that need some CPU time and sometimes your algorithm would benefit from even more parallelism (waiting on IO for example). A rule of thumb is to use the value obtained from schedulers_online as a multiplier for parallelism, but always test with different multiples to see what works best for your application.
How this information is exposed will be very operating system specific (unless you happen to be writing an operating system of course).
You didn't say what operating system you're working on. In the case of Linux, you can get the data from /proc/cpuinfo, however there are subtleties with the meaning of hyperthreading and the issue of multiple cores on the same die using a shared L2 cache (effectively you've got a NUMA architecture).

How does an Erlang process bind to a specific scheduler?

How does an Erlang process bind to a specific scheduler?
Currently processes does not get bound to specific schedulers (though you can force it via undocumentet functions, not recommended). Scheduler threads may be bound to logical processors using cpu topology and binding types. The vm does use some of this information to enhance performance in its normal scheduling scheme.
Reading from an old mail from Kenneth Lundin:
The Erlang VM without SMP support has 1 scheduler which runs in the
main process thread. The scheduler picks runnable Erlang processes
and IO-jobs from the run-queue and there is no need to lock data
structures since there is only one thread accessing them.
The Erlang VM with SMP support can have 1 to many schedulers which are
run in 1 thread each. The schedulers pick runnable Erlang processes
and IO-jobs from one common run-queue. In the SMP VM all shared data
structures are protected with locks, the run-queue is one example of
a data structure protected with locks.
From OTP R12B the SMP version of the VM is automatically started as
default if the OS reports more than 1 CPU (or Core) and with the same
number of schedulers as CPU's or Cores.
Not sure if this answer your question. Could you expand a bit more?

Is it better to start multiple erlang nodes per machine, or just one per machine?

Preface: When I say "machine" below, I mean either a physical dedicated server, or a virtual private server. When I say "node" I mean, an instance of the erlang virtual machine, of which there could be multiple running as separate processes under a single unix kernel.
I've got a project that involves multiple erlang/OTP applications. The applications will be running together and talking to each other on the same machine. They will all be hitting the disk, using memory and spawning erlang processes. They will also be using network resources because they will be talking to similar machines with the same set of applications running on them in a cluster.
Almost all of this communication is via HTTP. Thus I could separate each erlang OTP application into a separate instance of the erlang VM on the same machine and they could still talk to each other.
My question is: Is it better to have them running all under one erlang VM so that this erlang VM process can allocate access to resources among them, and schedule the execution of the various erlang processes.
Or is it better to have separate erlang nodes on a given server?
If one is better than the other, why?
I'm assuming running all of these apps in a single erlang vm which is given, essentially, full run of the server, will result in better performance. The OS is just managing the disk and ram at the low level, and only has one significant process (the erlang VM) to switch with... and the erlang VM is probably smarter about allocating resources when it has the holistic view of all the erlang processes.
This may be something that I need to test, but I'm not in a position to do so effectively in the near term.
The answer is: it depends.
Advantages of using a single node:
Memory is controlled by a single Erlang VM. It is way easier.
Inter-application communication (if using erlang-messaging) is faster.
Less operating system context switches happens
Advantages of using multiple nodes:
If the system is linking in C code to the VM, death of one node due to a bug in C will not kill the others.
Agree with #I GIVE CRAP ANSWERS
I would go with one VM. Here is why:
dynamic handling of run time queues belonging to schedulers (with varied origin of CPU load its important)
fewer VMs to monitor
better understanding of memory allocation and easier to spot malicious process (can compare all of them at once)
much easier inter app supervision
I wouldn't care about VM crash - you need to be prepared any way. Heart works especially well in the cluster of equal units.
We've always used one VM per application because it's easier to manage.
The scheduler and SMP support in Erlang have come a long way in the past few years, so there isn't as much reason as there used to be to run multiple VMs on the same node.
I Agree with previous answers but there is a case scenario where having multiple nodes per cpu is the answer: When a heavy task hits the node. A task may take multiple minutes to complete and in such case a gen server will hold the node until completion of the task.

Resources