Why doesn't "epoll accepts a new fd, then spawn a new thread for it" scale well? - epoll

I have seen a lot of arguments that "epoll accepts a new fd and spawns a new thread to do the reads and writes on that fd" doesn't scale well. But how exactly does it not scale? What if every connection involves heavy processing, like:
doing a database transaction
doing heavy algorithmic work
waiting for other things to complete.
If all I want is to do the work inside the program (no fancy routing to other connections), and I do not spawn a new thread for the read/write I/O, the program might hang forever just because one function is waiting on something, right? If that is the case, how does epoll scale well without spawning new threads?
epoll_wait(...);
// fd is ready to read now
recv(...);
// From here, if I don't spawn a thread, the program will hang. What should I do?
process_work(...); // takes at least 3 seconds to do the job
continue;

AFAIU, epoll(7) does not spawn new threads by itself (see also pthreads(7)). You need some other call (pthread_create(3), or the underlying clone(2) system call that pthread_create uses) to create threads.
Read more about the C10K problem (which today should be called C100K) and some pthreads tutorial. But it looks like your program could be compute-intensive rather than I/O-bound, so the bottleneck might be raw computing power. In that case you cannot get scalability from multi-threading on a single computer node alone; you need distributed computing.
Threads are quite heavy resources, so you want some thread pool and only a few dozen active (i.e. runnable) threads.
Be aware also of other multiplexing system calls (such as poll(2)), of non-blocking I/O (fcntl(2) with O_NONBLOCK), and of asynchronous I/O (see aio(7)).
I recommend using an existing event-loop-based library (look into libev, libevent, GLib, Poco, Qt, ...; or, for HTTP specifically, libonion on the server side and libcurl on the client side). Look also into 0mq.
The concepts of callbacks, continuations, and continuation-passing style (CPS) could be useful and improve your thinking.
Languages like Go, with its goroutines, could be helpful.
It might be hanging forever ....
That should not happen if you design your program carefully: run event loops around something like poll or epoll_wait with a limited delay (less than a second), and probably prefer non-blocking I/O.
Spending a few weeks learning more about operating-system concepts would probably be worthwhile, as would understanding most system calls (listed in syscalls(2)) after reading more about Linux programming (e.g. the old ALP book, or something newer). Perhaps you don't need anything as sophisticated as epoll; plain poll might be enough.

Related

Using ISR functions in the Contiki Context

I am new to using the Contiki OS and I have a fundamental question.
Can I safely use a low level ISR from within a Contiki Process?
I am doing this as a quick test and it is functioning well.
However, I am concerned that I may be undermining something in the OS that will
fail at a later time under different conditions.
In the context of a process which is fired periodically based upon an event timer,
I am calling a function which sets up an LED to blink.
The LED blinking function itself is a callback from an ISR fired by a hardware timer on an Atmel SAMD21 MCU.
Can someone please clarify for me what constraints I should be concerned about in this particular case?
Thank You.
Basically you can, but you have to understand the context in which each part of the code runs.
A process has the context of a function. Contiki's scheduler runs in the main body, and timers enqueue process wake-ups in this scheduler; in fact, think of Contiki processes as functions called one after another, and notice that those PROCESS_* macros do in fact call return inside the function.
When you are in an interrupt handler or callback, you are in a different context; here you can have race conditions if you share data with processes, just as interrupts and main() are different contexts in a bare-metal firmware.
I strongly recommend you read about protothreads; although they sound like threads, they are not: they are functions running in the main body. (I believe this link will enlighten you: http://dunkels.com/adam/pt/)
As for the setup you described, I see nothing wrong with it.
Contiki itself has some hardware abstraction modules, so you won't have to deal with the platform directly from your application code. Having written big firmwares using Contiki, though, I found these abstractions not very usable, since they cover only limited cases. What I did instead was write my own low-level layer to touch the platform, so the application stays platform-independent even though, from the OS's perspective, there is application code poking platform registers.

Is there an Erlang behaviour that can act on its own instead of waiting to be called?

I'm writing an Erlang application that requires actively polling some remote resources, and I want the process that does the polling to fit into the OTP supervision trees and support all the standard facilities like proper termination, hot code reloading, etc.
However, the two default behaviours, gen_server and gen_fsm, seem to support only callback-based operation. I could abuse gen_server to do it through calls to self(), or abuse gen_fsm by having a single state that always loops to itself with a timeout of 0, but I'm not sure that's safe (i.e. that it doesn't exhaust the stack or accumulate unread messages in the mailbox).
I could make my process into a special process and write all that handling myself, but that effectively makes me reimplement the Erlang equivalent of the wheel.
So is there a behavior for code like this?
loop(State) ->
    NewState = do_stuff(State), % without waiting to be called
    loop(NewState).
And if not, is there a safe way to trick default behaviours into doing this without exhausting the stack or accumulating messages over time or something?
The standard way of doing that in Erlang is by using erlang:send_after/3. See this SO answer and also this example implementation.
Could you employ an essentially non-OTP-compliant process? To be a good OTP citizen you do ideally want to make your long-running processes gen_servers and gen_fsms, but sometimes you have to look beyond the standard-issue rule book and consider why the rules exist.
What if, for example, your supervisor starts your gen_server, and your gen_server spawns another process (let's call it the active_poll process), and they link to each other so that they share fate (if one dies, the other dies)? The active_poll process is now indirectly supervised by the supervisor that spawned the gen_server, because if it dies, so will the gen_server, and they will both get restarted. The only problem you really have to solve now is code upgrade, but this is not too difficult: your gen_server gets a code_change callback when the code is to be upgraded, and it can simply send a message to the active_poll process, which can then make a fully qualified function call, and bingo, it's running the new code.
If this doesn't suit you for some reason and/or you MUST use gen_server/gen_fsm/similar directly...
I'm not sure writing a 'special process' really buys you much. Even a special process written correctly, so that it is in theory compliant with the OTP design principles, could still be ineffective in practice if it blocks or busy-waits in a loop somewhere and doesn't invoke sys when it should. So at most you gain a small optimisation over using gen_server/gen_fsm with a zero timeout (or over an async message handler that does the polling and sends a message to self() to trigger the next poll).
If whatever you are doing to actively poll can block (a blocking socket read, for example), you are in real trouble: a gen_server, gen_fsm, or special process would all be stopped from fulfilling their usual obligations (which they normally meet either because the callback returns, in the gen_server/gen_fsm case, or because receive is called and the sys module invoked explicitly, in the special-process case).
If what you are doing to actively poll is non-blocking, though, you can do it, but polling without any delay effectively becomes a busy wait. (Not quite, because the loop will include a receive call somewhere, which means the process will yield and give the scheduler a voluntary opportunity to run other processes, but it's not far off, and it will still be a relative CPU hog.) Even a 1 ms delay between polls makes a world of difference compared with polling as rapidly as you can. It's not ideal, but if you MUST, it will work. So use a timeout (as large as you can get away with), or have an async message handler that does the polling and sends a message to self() to trigger the next poll.

Delphi: form as passive UI during externally controlled processing

In a Delphi forms app, how can I get processing code to execute without user input, and how do I get the UI to update with a given frame rate?
The code in question is a test frame for testing/measuring the concurrent operation of components under heavy load, with multiple processes on the same or different machines. The focus is mostly on database operations (peer-to-peer or server-based) and filesystem reliability/performance with regard to file and byte range locking, especially over the network with heterogeneous client OSes.
The frame waits for external events (IPC, file system, network) that signal start and stop of a test run; after the start signal it calls the provided test function in a tight loop until the stop signal is received. Then it waits for the next start signal or the signal to quit.
I've been doing similar things in FoxPro for ages. There it is easy because the Fox doesn't have to sit on a message pump like Delphi's Application.Run(); so I just put up a non-modal form, arrange for it to be refreshed every couple hundred milliseconds and then dive into the procedural code. In raw Win16/Win32 it was slightly less easy but still fairly straightforward.
In Delphi I wouldn't even begin to know where to look, and the structure of the documentation (D7+XE2) has successfully defied me so far. What's the simplest way to do this in Delphi? I guess I could always spin up a new thread for the actual processing and use raw Win32 calls like RedrawWindow() and PostQuitMessage() to bend the app to my will, but that looks rather clunky. Surely there must be 'delphier' ways of doing this?
Create a background thread to do the processing task. That leaves the main UI thread free to service its message loop as required.
Any information that the task needs to present to the user must be synchronized or queued to the main UI thread. There is, of course, plenty more detail involved in writing the complete application, but threading is the solution. You can use a high-level library to shield yourself from raw threads, but that doesn't change the basic fact that you need to offload the processing to a thread other than the main UI thread.

How does a RunLoop reduce CPU cycles

I have been reading about RunLoops for a few days in the Apple documentation and material from Google searches. I have understood the concept of RunLoops to a great extent, but I still have no answer to some basic questions about them.
How exactly does a RunLoop work? Is it something like a while loop running at some system level?
If it is indeed some sort of while loop at the system level, how does it differ from polling?
Please provide me with some pointers on this.
The whole point of a RunLoop (variously named window handler, main loop, or event loop on other platforms) is that it facilitates an event-driven architecture, in which an application only runs when there is something to do, for example responding to user interaction. This is the opposite of polling.
Fundamental to the architecture is some kind of message queue that a thread can block on until a message is available for processing. On macOS and iOS systems the queue is a Mach kernel RPC port; on Windows it is a kernel IPC queue; and on X Window systems, a Unix-domain or network socket.
Events are inserted into the queue by other system components, for instance the window manager and other applications. It is also common for applications to message themselves from other threads in order to perform all UI processing on the same thread.
The run-loop itself resides in application space and looks something like this:
while (!stop)
{
    message = WaitForNextMessage(); // blocks here, consuming no CPU, until an event arrives
    DispatchMessage(message);
}
Usually, whatever UI framework you use provides a mechanism for registering an event handler for particular types of events.

how to fill the command buffer of the worker thread (iOS threading)

I was going over the RunLoop iOS documentation, which discusses the idea illustrated here:
(source: apple.com)
The RunLoopSource provides the following interface for client threads (i.e. the main thread in the illustration above) to fill the command buffer with commands and data, and to subsequently fire all commands available in that buffer:
// Client interface for registering commands to process
- (void)addCommand:(NSInteger)command withData:(id)data;
- (void)fireAllCommandsOnRunLoop:(CFRunLoopRef)runloop;
In the addCommand method we are simply adding commands to an NSMutableArray.
My question is: how can we encapsulate those commands so that they are callable, like methods? The data variable in the addCommand method is of type id; can we put a block in there, for example? Are there any best practices here, sample code, etc.? Thanks.
This technique pre-dates blocks. The beauty of using blocks with concurrency is that you can throw as much work as you want at the system, which, given its whole-device scope, can schedule that work across multiple cores and threads as it sees fit. You could also use a concurrent NSOperation and have it implement a FIFO to accept work and process it, but in that case there will be only the one secondary thread, again scheduled however the system sees fit, so there is no advantage over blocks.
