How do I check the number of cores used in a Pthread program? - pthreads

I created a Pthread program and executed it on a server.
The number of threads is less than the number of logical cores of the server.
However, since it runs on a server, I am not sure how many cores are actually used to execute this multi-threaded program...
Will the program always be executed with the same number of cores as the number of threads?
How do I check this information?
Any help is appreciated!
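In general the OS scheduler decides where threads run, so a program is not guaranteed one core per thread unless you pin threads with CPU affinity. On Linux, one way to observe what actually happens is to query the number of online cores with sysconf() and have each thread report the core it is currently running on via the glibc-specific sched_getcpu(). A minimal sketch (not from the original thread):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NTHREADS 4

    static void *worker(void *arg) {
        /* sched_getcpu() reports the core this thread is on right now;
           the scheduler is free to migrate it later. */
        printf("thread %ld running on core %d\n", (long)arg, sched_getcpu());
        return NULL;
    }

    int main(void) {
        long ncores = sysconf(_SC_NPROCESSORS_ONLN); /* online logical cores */
        printf("online logical cores: %ld\n", ncores);

        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

From outside the process, tools such as top with the threads view (top -H) can show per-thread activity as well.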

Related

Are there alternatives to pthreads for linux for parallel execution and memory sharing?

I wrote a C++ Linux application that used the pthread library, but it didn't work for me: instead of launching 100 threads it started only 98 (see my other question, "pthread_join Segmentation fault with 100 threads").
Is there an alternative other than 'fork' to parallelize my code? An advantage of threads was the fact that I had all the global variables shared and could place a mutex wherever I had to write to a shared variable.
Is there an alternative other than 'fork' to parallelize my code?
There is std::thread, which the C++ people around here tend to recommend over pthreads anyway. However, it is highly likely to be implemented on top of a lower-level thread library, most likely pthreads on any system that offers pthreads in the first place.
There is also OpenMP, but this is again a wrapper around lower-level threading mechanisms.
The only readily usable alternative to parallelizing via multiple threads is parallelizing via multiple processes, which is what I take you to mean by your reference to fork.
An advantage of threads was the fact that I had all the global variables shared and could place a mutex wherever I had to write to a shared variable.
It is possible to share memory among multiple processes and to have mutexes that are shared among processes. That's a little trickier than just using a regular shared variable, but not so much so. The mechanisms for this are called "shared memory", and in the POSIX world there are two flavors: older, so-called System V shared memory segments, and newer POSIX shared memory.
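For illustration, here is a minimal sketch (the names and structure are mine, not from the question) combining POSIX shared memory with a process-shared mutex:

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared {
        pthread_mutex_t lock;
        int counter;
    };

    int main(void) {
        /* Create and size a POSIX shared-memory object. */
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(struct shared));
        struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);

        /* A mutex must be marked PTHREAD_PROCESS_SHARED to work across
           processes; a plain pthread_mutex_t only works within one. */
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&s->lock, &attr);
        s->counter = 0;

        if (fork() == 0) {            /* child: both processes see *s */
            pthread_mutex_lock(&s->lock);
            s->counter++;
            pthread_mutex_unlock(&s->lock);
            _exit(0);
        }
        wait(NULL);

        printf("counter = %d\n", s->counter);  /* prints 1 */
        shm_unlink("/demo_shm");
        return 0;
    }

Compile with -lpthread (and -lrt for shm_open on older glibc); error handling is omitted for brevity.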
May I suggest, however, that a better solution might be simply to reduce the number of threads. 100 threads is hugely excessive for parallel computation on most machines, because your true concurrency is limited by the number of execution units (cores) the machine has. More threads than that may make some sense if you expect them to regularly block on I/O (on different files) for a significant time, but even then 100 is probably beyond the threshold of reasonable. If you have more threads contending for execution time than you have execution units on which to schedule them, then you are probably getting worse performance than you would with fewer threads.

How to specify number of cores erlang will run on

I wanted to start Erlang while varying the number of cores
in order to test the scalability of my program. I expect that running the program on more cores should be faster than running it on fewer cores.
How can I specify the core limits?
In fact, I have tried -smp disable (and I supposed that it would then run on 1 core, wouldn't it?), but the execution time stays the same as with more cores.
I also tried +S 1:1 (assuming 1 scheduler means running on 1 core, and likewise for other scheduler counts), but it seems nothing changed.
Was that because of the characteristics of my program, or did I do something wrong in specifying the core limits?
And if possible, could someone give some tips on how to scale Erlang programs?
Thank you very much.
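For what it's worth, the scheduler count can also be inspected and changed at runtime from the Erlang shell, which makes scalability runs easier to script. A sketch (the returned values are illustrative):

    %% Started as: erl +S 4:4  (4 schedulers, 4 online)
    1> erlang:system_info(schedulers_online).
    4
    2> erlang:system_flag(schedulers_online, 1). %% returns the old value
    4
    3> erlang:system_info(schedulers_online).
    1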

Rails application servers

I've been reading for a while about how different Rails application servers work, and some things have me confused, probably because of my lack of knowledge in this field. Anyway, the following things confuse me:
Puma server has the following line about its clustered mode workers number in its readme:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, let's say, 2 cores and use Rubinius as the Ruby implementation, should I still use more than 1 process, considering that Rubinius uses native threads and doesn't have a global interpreter lock, and thus uses all the CPU cores anyway, even with 1 process?
My understanding is that I'd only need to increase the thread pool of that single process if I upgraded to a machine with more cores and memory; if that's not correct, please explain it to me.
I've read some articles on using Server-Sent Events with Puma which, as far as I understand, block a Puma thread since the browser keeps the connection open. So if I have 16 threads and 16 people are using my site, the 17th would have to wait until one of those 16 leaves before they could connect? That's not very efficient, is it? Or what am I missing?
If I have a 1-core machine with 3 GB of RAM, just for the sake of the question, and I'm using Unicorn as my application server, and 1 worker takes 300 MB of memory while its CPU usage is insignificant, how many workers should I have? Some say the number of workers should equal the number of cores, but if I set the worker count to, let's say, 7 (since I have enough RAM for it), it will be able to handle 7 concurrent requests, won't it? So is it just a question of CPU usage and the amount of RAM? Or what am I missing?
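For reference, the clustered-mode knobs quoted from the Puma readme live in Puma's config file. A minimal config/puma.rb sketch for the 2-core scenario above (the numbers are illustrative, not a recommendation):

    # config/puma.rb -- illustrative values for a 2-core machine
    workers 2        # clustered mode: one process per core
    threads 1, 16    # each worker runs between 1 and 16 threads
    preload_app!     # load the app once, before forking workers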

find number of idling processor cores in Erlang

I need to find out whether there are any idling processor cores, and how many of them there are.
My task: I need to do a calculation in parallel, and the number of processes running at a time needs to be limited so that it does not exceed the existing number of processor cores.
This line will give you the existing number of processor cores.
The rest of the code does pretty much what you intend to do anyway.
Here's the documentation for that function call.
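The line referred to is presumably an erlang:system_info/1 query; the usual calls look like this:

    %% Schedulers the VM is currently using (one per core by default):
    erlang:system_info(schedulers_online).
    %% Logical processors available to the VM
    %% (may return the atom 'unknown' on some platforms):
    erlang:system_info(logical_processors_available).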

When is it appropriate to increase the async-thread size from zero?

I have been reading the documentation trying to understand when it makes sense to increase the async-thread pool size via the +A N switch.
I am perfectly prepared to benchmark, but I was wondering if there were a rule-of-thumb for when one ought to suspect that growing the pool size from 0 to N (or N to N+M) would be helpful.
Thanks
The BEAM runs Erlang code in special threads it calls schedulers. By default it will start a scheduler for every core in your processor. This can be controlled at start-up time, for instance if you don't want to run Erlang on all cores but want to "reserve" some for other things. Normally, when you do a file I/O operation, it is run in a scheduler, and as file I/O operations are relatively slow, they block that scheduler while they are running, which can affect the real-time properties of the system. Normally you don't do that much file I/O, so it is not a problem.
The asynchronous thread pool consists of OS threads that are used for I/O operations. Normally the pool is empty, but if you use the +A option at startup the BEAM will create extra threads for this pool. These threads will then only be used for file I/O operations, which means that the scheduler threads no longer block waiting for file I/O and the real-time properties are improved. Of course this has a cost, as OS threads aren't free. The threads don't mix, so scheduler threads are just scheduler threads and async threads are just async threads.
If you are writing linked-in drivers for ports, these can also use the async thread pool, but you have to detect yourself whether it has been started.
How many you need is very much up to your application. By default none are started. Like #demeshchuk, I have also heard that Riak likes to have a large async thread pool, as it opens many files. My only advice is to try it and measure, as with all optimisation.
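Concretely, the pool is sized with +A at startup, and its size can be checked from a running node; a sketch:

    %% Started as: erl +A 16
    1> erlang:system_info(thread_pool_size).
    16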
By default, the number of scheduler threads in a running Erlang VM is equal to the number of logical processor cores (if you are using SMP, of course).
From my experience, increasing the +A parameter may give some performance improvement when you have many simultaneous file I/O operations, but I doubt that increasing +A will improve overall process performance, since BEAM's scheduler is extremely fast and optimized.
Speaking of exact numbers, that totally depends on your application, I think. Say, in the case of Riak, where the maximum number of opened files is more or less predictable, you can set +A to that maximum, or several times less if it's way too big (by default it's 64, by the way). If your application handles, say, millions of files and serves them to web clients, that's another story; most likely you'll want to run some benchmarks with your own code in your own environment.
Finally, I don't believe I've ever seen +A set to more than a hundred. That doesn't mean you can't set it higher, but there's likely no point in it.
