Precise definition of "MPI_THREAD_SERIALIZED" - pthreads

Since MPI_THREAD_MULTIPLE is not as performant and stable as MPI_THREAD_SERIALIZED, I am trying to convert my MPI application to serialized. However, the exact definition of MPI_THREAD_SERIALIZED is not clear to me. Does it mean:
1. The start and the end of any MPI call must not overlap with any other MPI call (calls are fully serialized)?
2. Only the start of any MPI call must not overlap with other calls (so two calls can still be in progress at the same time)?
If the former is the case, how can one implement multi-threaded applications that use long-lasting blocking APIs such as MPI_Comm_accept or MPI_Win_fence?
An example would be a DPM (dynamic process management) application waiting to serve clients.
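To make interpretation 1 concrete, this is the kind of global lock I would have to hold around every MPI call (a sketch; the mutex and wrapper function are mine, not from the standard):

#include <mpi.h>
#include <pthread.h>

static pthread_mutex_t mpi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Under interpretation 1, every thread wraps each MPI call like this,
 * so no two MPI calls are ever in flight at the same time. */
void locked_send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
{
    pthread_mutex_lock(&mpi_lock);
    MPI_Send(buf, count, type, dest, tag, comm);
    pthread_mutex_unlock(&mpi_lock);
}

/* The problem: a thread blocked in MPI_Comm_accept would hold mpi_lock
 * indefinitely, stalling every other thread's MPI calls. */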

Related

django channels and running event loop

For a game website, I want a player to compete either against a human or an AI.
I am using Django + Channels (Django-4.0.2 asgiref-3.5.0 channels-3.0.4)
This has been a long learning process...
Human vs Human: the game takes place in the web browser, turn by turn. Each time a player connects, it opens a websocket connection; a move is sent through the socket, processed by the consumer (validated and saved in the database) and sent to the other player.
This is managed with synchronous programming only.
Human vs AI: I try to use the same route as previously. A branch checks whether the game is against the computer and computes a move instead of receiving it from the other end of the websocket. This AI move can be a blocking operation, as it can take from 2 to 5 seconds.
I don't want the receive method of the consumer to wait for the AI to return its move, since I have other operations to do quickly (like updating some information on the client side).
Then I thought I could easily take advantage of the allegedly already existing event loop of the channels framework. I could send the AI thinking process to this loop and return the result later to the client through the send method of the consumer.
However, when I write:
loop = asyncio.get_event_loop()
loop.create_task(my_AI_thinking())
Django raises a RuntimeError (the same as described here: https://github.com/django/asgiref/issues/278) telling me there is no running event loop.
The solution seemed to be to upgrade asgiref to 3.5.0, which I did, but the issue was not solved.
I think I am a little short of background, and some enlightenment would help me understand the root cause of this failure.
My first questions would be:
In the combo Django + Channels + ASGI, which component is in charge of running the event loop?
How can I check whether an event loop is indeed running, and in which thread?
Maybe your answers will raise other questions.
Did you try running your event-loop example on Django 3.2 (and/or with a different Python version)? I experienced various problems with Django 4.0 and Python 3.10, so I am staying with Django 3.2 and Python 3.7/3.8/3.9 for now; maybe your errors are among these problems.
If you can't get the event loop running, I see two possible alternative solutions:
Open two WS connections: one only for the moves, and the other for all the other stuff, such as updating information on Player's UI, etc.
You can also use multiprocessing to "manually" send the AI move calculation to another thread, and then join the two threads again after receiving the result (the move). To be honest, multiprocessing in Python is quite simple -- it's pretty handy if you are familiar with the idea of multithreaded applications.
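For what it's worth, here is a rough, untested sketch of that offloading idea using Channels' AsyncWebsocketConsumer and asgiref's sync_to_async rather than multiprocessing (GameConsumer and compute_ai_move are made-up names):

import asyncio
from asgiref.sync import sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer

def compute_ai_move(game_state):
    # Placeholder for the blocking 2-5 s AI computation.
    return "e2e4"

class GameConsumer(AsyncWebsocketConsumer):  # hypothetical consumer
    async def receive(self, text_data=None, bytes_data=None):
        # An async consumer runs inside a running event loop, so
        # create_task is legal here. Schedule the slow AI move and
        # return immediately to do the quick work.
        asyncio.create_task(self.play_ai_move(text_data))
        await self.send(text_data="ack")  # quick client-side update

    async def play_ai_move(self, move):
        # sync_to_async runs the blocking function in a worker thread,
        # so the event loop is never blocked for the 2-5 seconds.
        ai_move = await sync_to_async(compute_ai_move)(move)
        await self.send(text_data=ai_move)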
Unfortunately, I have not yet used event loops in Channels myself; maybe someone more experienced in that matter will be able to better address your issue.

Why doesn't "epoll accepts a new fd and spawns a new thread per fd" scale well?

I have seen a lot of arguments that the pattern "epoll accepts a new fd and spawns a new thread to read and write on it" doesn't scale well. But how exactly does it not scale? What if every connection has heavy processing, like:
doing database transaction
doing heavy algorithm work
waiting for other things to complete.
If my purpose is just to do the work inside the program (no fancy routing to other connections), and I do not spawn a new thread for read/write I/O, the program might hang forever just because of one function waiting for something, right? If this is the case, how can epoll scale well without spawning new threads?
for (;;) {
    epoll_wait(...);
    /* a fd is available to read now */
    recv(...);
    /* From here, if I don't spawn a thread, the whole loop will hang.
       What should I do? */
    process_algorithm_work(); /* takes at least 3 secs to do the job */
}
AFAIU, epoll(7) does not spawn new threads by itself (see also pthreads(7)...). You need some other call (using pthread_create(3) or the underlying clone(2) system call used by pthread_create...) to create threads.
Read more about the C10K problem (which today should be called C100K) and a pthread tutorial. But it looks like your program could be compute-intensive, not IO-bound, so the bottleneck might be computing power (then you cannot get scalability with just multi-threading on a single computer node; you need distributed computing).
Threads are quite heavy resources, so you want to have some thread pool and have only a few dozen active (i.e. runnable) threads.
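A minimal sketch of that idea: a fixed pool of workers pulls ready fds from a queue, so the epoll loop itself never blocks on the heavy work (all names are mine; error handling, fd registration and EPOLLONESHOT details are omitted):

#include <pthread.h>
#include <sys/epoll.h>

#define NWORKERS 8
#define QSIZE 1024

static void handle_connection(int fd)
{
    (void)fd; /* recv() and the ~3-second algorithm would run here */
}

static int queue[QSIZE];
static int qhead, qtail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

static void enqueue(int fd)
{
    pthread_mutex_lock(&qlock);
    queue[qtail++ % QSIZE] = fd; /* overflow checks omitted for brevity */
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (qhead == qtail)
            pthread_cond_wait(&qcond, &qlock);
        int fd = queue[qhead++ % QSIZE];
        pthread_mutex_unlock(&qlock);
        handle_connection(fd); /* the slow job runs here, off the loop */
    }
    return NULL;
}

int main(void)
{
    int epfd = epoll_create1(0); /* fd registration omitted */
    struct epoll_event evs[64];
    for (int i = 0; i < NWORKERS; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
    }
    for (;;) {
        int n = epoll_wait(epfd, evs, 64, -1);
        for (int i = 0; i < n; i++)
            enqueue(evs[i].data.fd); /* hand off; never block this loop */
    }
}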
Also be aware of other multiplexing system calls (such as poll(2)), of non-blocking I/O (fcntl(2) with O_NONBLOCK), and of asynchronous I/O (see aio(7)).
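For reference, marking a descriptor non-blocking is just a pair of fcntl calls:

#include <fcntl.h>

/* Set O_NONBLOCK on an existing descriptor. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}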
I recommend using some existing event-loop based library (look into libev, libevent, Glib, Poco, Qt, ... or for HTTP mostly: libonion on the server side, libcurl on the client side). Look also into 0mq.
The concepts related to callbacks, continuations, CPS could be useful and improve your thinking.
Languages like Go and its Goroutines could be helpful.
It might be hanging forever ....
That should not happen if you design your program carefully (of course having event loops using something like poll or epoll_wait - with a limited delay of less than a second, and probably preferring non-blocking I/O).
Spending a few weeks learning more about operating-system concepts would probably be worthwhile, as would understanding most system calls (listed in syscalls(2)) after reading more about Linux programming (e.g. the old ALP book, or something newer). Perhaps you don't even need something as sophisticated as epoll (using just poll might be enough).

Using ISR functions in the Contiki Context

I am new to using the Contiki OS and I have a fundamental question.
Can I safely use a low level ISR from within a Contiki Process?
I am doing this as a quick test and it is functioning well.
However, I am concerned that I may be undermining something in the OS that will fail at a later time under different conditions.
In the context of a process which is fired periodically based upon an event timer, I am calling a function which sets up an LED to blink.
The LED blinking function itself is a callback from an ISR fired by a hardware timer on an Atmel SAMD21 MCU.
Can someone please clarify for me what constraints I should be concerned about in this particular case?
Thank You.
Basically you can, but you have to understand the context in which each part of the code runs.
A process has the context of a function: Contiki's scheduler runs in the main body, and timers enqueue process wake-ups in this scheduler. In fact, think of Contiki processes as functions called one after another; notice that those PROCESS_* macros do in fact call return inside the function.
When you are in an interrupt handler or callback, you are in a different context; here you can have race conditions if you share data with processes, the same as in a bare-metal firmware, where the interrupt and main() are different contexts.
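For example, a common way to stay safe across that boundary is to have the ISR do almost nothing and wake the process with process_poll(), which is safe to call from interrupt context. A sketch with made-up names:

#include "contiki.h"
#include <stdint.h>

PROCESS(led_process, "LED blink process");
AUTOSTART_PROCESSES(&led_process);

static volatile uint8_t timer_fired; /* the only data shared with the ISR */

/* Callback invoked from the hardware-timer ISR: keep it minimal. */
void timer_isr_callback(void)
{
    timer_fired = 1;
    process_poll(&led_process); /* safe to call from interrupt context */
}

PROCESS_THREAD(led_process, ev, data)
{
    PROCESS_BEGIN();
    while (1) {
        PROCESS_WAIT_EVENT_UNTIL(ev == PROCESS_EVENT_POLL && timer_fired);
        timer_fired = 0;
        /* toggle the LED here, now safely in process (main body) context */
    }
    PROCESS_END();
}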
I strongly recommend you read about protothreads: although they sound like threads, they are not; they are functions running in the main body. (I believe this link will enlighten you: http://dunkels.com/adam/pt/)
As for the problem you described, I see nothing wrong.
Contiki itself has some hardware abstraction modules, so you won't have to deal with the platform directly from your application code. I have written big firmwares using Contiki and found these abstractions not very usable, since they have limited applicability. What I did in that case was write my own low-level layer to touch the platform, so in the application everything is still platform independent, but, from the OS perspective, I had application code calling platform registers.

Is there an Erlang behaviour that can act on its own instead of waiting to be called?

I'm writing an Erlang application that requires actively polling some remote resources, and I want the process that does the polling to fit into the OTP supervision trees and support all the standard facilities like proper termination, hot code reloading, etc.
However, the two default behaviours, gen_server and gen_fsm, seem to support only callback-based operation. I could abuse gen_server to do this through calls to self, or abuse gen_fsm by having a single state that always loops to itself with a timeout of 0, but I'm not sure that's safe (i.e. that it doesn't exhaust the stack or accumulate unread messages in the mailbox).
I could make my process into a special process and write all that handling myself, but that effectively makes me reimplement the Erlang equivalent of the wheel.
So is there a behaviour for code like this?
loop(State) ->
    NewState = do_stuff(State), % act on our own, without waiting to be called
    loop(NewState).
And if not, is there a safe way to trick the default behaviours into doing this without exhausting the stack or accumulating messages over time?
The standard way of doing this in Erlang is by using erlang:send_after/3.
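A minimal gen_server sketch of that pattern (the module name, interval and do_stuff are placeholders):

-module(poller).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(INTERVAL, 5000). % poll every 5 seconds (made-up value)

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    erlang:send_after(?INTERVAL, self(), poll),
    {ok, initial_state}.

%% The timer message arrives here; do the work, then re-arm the timer.
handle_info(poll, State) ->
    NewState = do_stuff(State),
    erlang:send_after(?INTERVAL, self(), poll),
    {noreply, NewState}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

do_stuff(State) -> State. % placeholder for the actual polling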
Is it possible that you could employ an essentially non-OTP-compliant process? Although to be a good OTP citizen you do ideally want to make your long-running processes into gen_servers and gen_fsms, sometimes you have to look beyond the standard-issue rule book and consider why the rules exist.
What if, for example, your supervisor starts your gen_server, and your gen_server spawns another process (let's call it the active_poll process), and they link to each other so that they share fate (if one dies the other dies)? The active_poll process is now indirectly supervised by the supervisor that spawned the gen_server, because if it dies, so will the gen_server, and they will both get restarted. The only problem you really have to solve now is code upgrade, but this is not too difficult: your gen_server gets a code_change callback when the code is to be upgraded, and it can simply send a message to the active_poll process, which can make an appropriate fully qualified function call, and bingo, it's running the new code.
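Sketched out, that arrangement might look like this (the upgrade message, active_poll and do_stuff are illustrative):

%% In the gen_server (other callbacks omitted):
init([]) ->
    Poller = spawn_link(fun() -> ?MODULE:active_poll(initial_state) end),
    {ok, Poller}.

code_change(_OldVsn, Poller, _Extra) ->
    Poller ! upgrade, % tell the poller to switch to the new code
    {ok, Poller}.

%% The linked active_poll process (active_poll/1 must be exported so
%% the fully qualified call below picks up newly loaded code):
active_poll(State) ->
    NewState = do_stuff(State),
    receive
        upgrade -> ?MODULE:active_poll(NewState) % fully qualified call
    after 0 ->
        active_poll(NewState)
    end.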
If this doesn't suit you for some reason and/or you MUST use gen_server/gen_fsm/similar directly...
I'm not sure that writing a 'special process' really gives you very much. Even if you wrote a special process correctly, such that it is in theory compliant with OTP design principles, it could still be ineffective in practice if it blocks or busy-waits in a loop somewhere and doesn't invoke sys when it should; so you really have at most a small optimisation over using gen_server/gen_fsm with a zero timeout (or having an async message handler which does the polling and sends a message to self to trigger the next poll).
If whatever you are doing to actively poll can block (such as a blocking socket read, for example), you are in real trouble, as a gen_server, gen_fsm or special process will all be prevented from fulfilling their usual obligations (which they would normally be able to do, either because the callback returns in the case of gen_server/gen_fsm, or because receive is called and the sys module invoked explicitly in the case of a special process).
If what you are doing to actively poll is non-blocking, though, you can do it, but if you poll without any delay it effectively becomes a busy wait (not quite, because the loop will include a receive call somewhere, which means the process will yield, giving the scheduler a voluntary opportunity to run other processes, but it's not far off, and it will still be a relative CPU hog). If you can have a 1 ms delay between each poll, that makes a world of difference versus polling as rapidly as you can. It's not ideal, but if you MUST, it'll work. So use a timeout (as big as you can without it becoming a problem), or have an async message handler which does the polling and sends a message to self to trigger the next poll.

Erlang gen_server vs stateless module

I've recently finished Joe's book and quite enjoyed it.
I have since started coding a soft real-time application with Erlang, and I have to say I am a bit confused about the use of gen_server.
When should I use gen_server instead of a simple stateless module?
I define a stateless module as follow:
- A module that takes its state as a parameter (much like ETS/DETS), as opposed to keeping it internally (like gen_server)
Say, for an invoice-manager-type module, should it initialize and return state which I'd then pass back to it on subsequent calls?
SomeState = invoice_manager:init(),
NewState = invoice_manager:add_invoice(SomeState, AnInvoiceFoo).
Suppose I need multiple instances of the invoice manager state (say my application manages multiple companies, each with their own invoices): should each have a gen_server with internal state to manage its invoices, or would the stateless module above be a better fit?
Where is the line between the two?
(Note the invoice manager example above is just that, an example to illustrate my question.)
I don't really think you can make that distinction between what you call a stateless module and gen_server. In both cases there is a recursive receive loop which carries state in at least one argument. This main loop handles requests, does work depending on the requests and, when necessary, sends results back to the requesters. The main loop will most likely handle a number of administrative requests as well, which may not be part of the main API/protocol.
The difference is that gen_server abstracts away the main receive loop and allows the user to write only the actual user code. It will also handle many administrative OTP functions for you. Another difference is that the user code lives in a separate module, which means that you see the passed-around state more easily. Unless you actually manage to write your code in one big receive loop and never call other functions to do the work, there is no real difference.
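Stripped to its essentials, such a hand-rolled loop looks something like this (a sketch; gen_server generates the equivalent machinery for you):

loop(State) ->
    receive
        {From, Ref, {add_invoice, Invoice}} ->
            {Reply, NewState} = do_add_invoice(Invoice, State),
            From ! {Ref, Reply}, % send the result back to the requester
            loop(NewState);
        stop -> % an administrative request outside the main API
            ok
    end.

do_add_invoice(Invoice, State) ->
    {ok, [Invoice | State]}. % placeholder implementation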
Which method is better depends very much on what you need. Using gen_server will simplify your code and give you added functionality "for free", but it can be more restrictive. Rolling your own will give you more power, but also more possibilities to screw things up. It is probably a little faster as well. What do you need?
It strongly depends on your needs and application design. When you need shared state between processes, you have to use a process to keep this state; then gen_server, gen_fsm or another gen_* is your friend. You can avoid this design when your application is not concurrent, or when this design doesn't bring you other benefits. Sometimes breaking your application into processes leads to a simpler design; in other cases you can choose a single-process design and use "stateless" modules for performance and the like. A "stateless" module is the best choice for very simple, stateless (purely functional) tasks. gen_server is often the best choice for things that seem naturally "process-like", and you must use it when you want to share something between processes (though using processes can be constrained by scalability or concurrency concerns).
Having used both models, I must say that using the provided gen_server helps me stay structured more easily. I guess this is why it is included in the OTP stack of tools: gen_server is a good way to get the repetitive boiler-plate out of the way.
If you have shared state over multiple processes you should probably go with gen_server and if the state is just local to one process a stateless module will do fine.
I suppose your invoices (or whatever they stand for) should be persistent, so they would end up in an ETS/Mnesia table anyway. If this is so, you should create a stateless module where you put your API for accessing the invoice table.
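A sketch of such a stateless API module (the names are mine): the ETS table itself holds the state, and the module is just functions over it.

-module(invoice_store).
-export([new/0, add_invoice/2, get_invoice/2]).

%% Create the table that will hold all invoices.
new() ->
    ets:new(invoices, [set, public]).

%% The caller passes the table in; the module keeps no state of its own.
add_invoice(Table, {_Id, _Data} = Invoice) ->
    ets:insert(Table, Invoice),
    Table.

get_invoice(Table, Id) ->
    ets:lookup(Table, Id).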
