Why exactly does ePoll scale better than Poll?

Why exactly does ePoll scale better than Poll? - epoll

Short question but for me its difficult to understand.
Why exactly does ePoll scale better than Poll?

While Damon's reason is correct for the unusual case where you never block on a socket, in typical real-world programs, the reason is completely different. A typical program looks like this:
1) Do all the work we can do now.
2) Check if any network connections need service, blocking if there's nothing to do.
3) Service any network connections discovered.
4) Go to step 1.
Typically, because you just did all the work you can do, when you come around to step 2, there is no work for you to do. So you'll have to wait a bit. Now, imagine there are 800 sockets you are interested in. The kernel has to put on the wait queue for each of those 800 sockets. And, a split-second later when data arrives on one of those 800 sockets, the kernel has to remove you from those 800 wait queues. Placing a task on a wait queue requires creating a 'thunk' to link that task to that wait queue. No good optimizations are possible because the kernel has no idea which 800 sockets you'll be waiting for.
With epoll, the epoll socket itself has a wait queue, and the process is only put on that one single wait queue. A thunk is needed to link each of the 800 connections to the epoll wait queue, but that thunk is persistent. You create it by adding a socket to an epoll set, and it remains there until you remove the socket from the set.
When there's activity on the socket, the kernel handles it in the task that detects the activity. When you wait, the kernel already knows if there's a detected event and the kernel only has to put you on that one wait queue. When you wake, it only has to remove you from that one queue.
So it's not so much the copying that's the killer with select or poll, it's the fact that the kernel has to manipulate a massive number of wait queues on each blocking operation.

The poll system call needs to copy your list of file descriptors to the kernel each time. This happens only once with epoll_ctl, but not every time you call epoll_wait.
Also, epoll_wait is O(1) in respect of the number of descriptors watched1, which means it does not matter whether you wait on one descriptor or on 5,000 or 50,000 descriptors. poll, while being more efficient than select, still has to walk over the list every time (i.e. it is O(N) in respect of number of descriptors).
And lastly, epoll can in addition to the "normal" mode work in "edge triggered" mode, which means the kernel does not need keep track of how much data you've read after you've been signalled readiness. This mode is more difficult to grasp, but somewhat more efficient.
1As correctly pointed out by David Schwartz, epoll_wait is of course still O(N) in respect of events that occur. There is hardly a way it could be any different, with any interface. If N events happen on a descriptor that is watched, then the application needs to get N notifications, and needs to do N "things" in order to react on what's happening.
This is again slightly, but not fundamentally different in edge triggered mode, where you actually get M events with M <= N. In edge triggered mode, when the same event (say, POLLIN) happens several times, you will probably get fewer notifications, possibly only a single one. However, this doesn't change much about the big-O notation as such.
However, epoll_wait is irrespective of the number of descriptors watched. Under the assumption that it is used in the intended, "normal" way (that is, many descriptors, few events), this is what really matters, and here it is indeed O(1).
As an analogy, you can think of a hash table. A hash table accesses its content in O(1), but one could argue that calculating the hash is actually O(N) in respect of the key length. This is technically absolutely correct, and there probably exist cases where this is a problem, however, for most people, this just doesn't matter.

Related

Inhibit Time in Tx-PDO

Objects 180Nh have the following subindices:
0x00:----
0x01:----
0x02:----
0x03 (inhibit time): This subindex contains a time lock in 100 µs steps (see following figure). This can be used to set a time that must elapse after the sending of a PDO before the PDO is sent another time. This time only applies for asynchronous PDOs. This is intended to prevent PDOs from being sent continuously if the mapped object constantly changes.
0x04 (compatibility entry): This subindex has no function and exists only for compatibility reasons.
0x05 (event timer): This time (in ms) can be used to trigger an Event which handles the copying of the data and the sending of the PDO.
According to the above point, we realize that when the event occurs, a certain time is determined, which is blocked, and it is for Tx-PDO; now, if the event occurs in this interval, it will be executed in the next section.
Why should the whole section be implemented? Why is the second, third, and fourth event executed in the last part?
Shouldn't the third and fourth events be executed separately?

By default, common CANopen device profiles like for example CiA 401 "generic I/O module" are configured to suit large automation networks. That is: a large network with lots of nodes where it is important to keep bus traffic low. On such networks nodes only transmit PDOs when there has been a data update (an internal event has occurred).
However, such a setup is very much unsuitable when CANopen is used for real-time control systems, like for example having a PLC controlling a bunch of actuator I/O modules that control motions of a machine. Which could also be a safety-related application. In such systems, it is custom to always send data repeatedly at even intervals, even if it has not changed. For example send all data once every 10ms/100ms.
Only the last data sent is used by the receiving node(s), so in case data goes missing/corrupt, new reliable data will arrive soon again. And if no data arrives at all, that's an indication that something is broken and the system ought to revert to a safe state, after receiving no new data in a certain time period. This is how mobile/automotive control systems are most commonly designed, since it is safe, deterministic and proven in use. Custom, non-standard CAN bus protocols by OEM are often implemented exactly like this.
Now, to achieve this with CANopen, we have to configure the TPDO communication parameters. Event timer to set the interval and inhibit time to prevent the node spamming extra data as soon as something has changed. If I remember correctly we also need to set 180N:2 transmission type to asynchronous (which sounds counter-intuitive).
With a setup like this, only the most recent event matters. The most up to date data will always get sent, at fixed intervals.

CAN J1939 device stops responding after communication timeout

I'm a higher layer guy, I don't and don't want to know much about can-bus, j1939 or even particular ECUs. I just don't like the software solution, so I'd like to ask, if customer's requirements are legitimate.
If particular ECU doesn't receive CAN frame within 300 ms timeout after powerup, it stops responding to any further frames and must be power cycled. This is a information from customer's technicians, I have to just believe it.
It is possible to powerup ECU after CAN driver thread is ready, but it would require some extra wiring by end customers.
Software solutions are all bad or worse, like running FreeRTOS before important checks, put CAN driver code to code common with other products, or start CAN periphery in the bootloader and left running without software control until driver starts.
The sensitive part is, that we have no explicit demand to start CAN driver within such a short time in specification. Customer says, that it's part of J1939 specification.
Can someone confirm or disprove, that J1939 allows devices to unrecoverably stop receiving after 300 ms of silence or requires devices to start transmitting within 300 ms after powerup? Or at least guide me to parts of J1939 standard, which could possibly regard this?
Thank you

If particular ECU doesn't receive CAN frame within 300 ms timeout after powerup, it stops responding to any further frames and must be power cycled. This is a information from customer's technicians, I have to just believe it.
This does of course entirely depend on what task it is performing.
Generally, an ECU, as in an automotive computer in a car/truck etc is never allowed to hang up/latch up. The normal course of action would be for the ECU to either reboot/reset itself or revert to a fail-safe mode.
But in case of tractors and heavy machinery the normal safe mode is "stop everything".
It is possible to powerup ECU after CAN driver thread is ready, but it would require some extra wiring by end customers.
I don't know what this is supposed to mean. What is "extra wiring"? Something to keep other nodes in common mode while one is rebooting? Terminating resistors? Some dirty power-up delay circuit?
Software solutions are all bad or worse, like running FreeRTOS before important checks, put CAN driver code to code common with other products, or start CAN periphery in the bootloader and left running without software control until driver starts.
Generally speaking, it's custom to initialize critical hardware like clocks, watchdogs, prescalers, pull resistors etc very early on. Initializing hardware peripherals may or may not be critical. It's custom to do this after the CRT has been executed, at the beginning of main() and the order of initialization usually matters a lot.
If you have a delay longer than 300ms from power-on reset to the start of main(), something is terribly wrong with the program.
The sensitive part is, that we have no explicit demand to start CAN driver within such a short time in specification. Customer says, that it's part of J1939 specification.
I haven't worked much with J1939 and I don't remember what it says specifically, but 300ms is an eternity in a real-time system! It's not a "short time".
In general, correctly designed mission-/safety-critical CAN control systems in automotive/industrial settings work like this:
All data is sent repeatedly in fixed intervals, regardless of if it has changed or not. Commonly once per 10ms or once per 100ms.
A node which has not received new data will use the previously received data for now.
There is a timeout from the point of when last valid data was received, when the receiving node must stop using old data and revert to a fail-safe mode. This time is often relative to how fast the controlled object can move. It's common to have timeouts after some multiple of 100ms.
I would say that your customer's requirements are very reasonable, it's nothing out of the ordinary.

My colleague answered, that there's no such demand, only vague 300 ms timeout.

Is it sufficient to set ROS publisher buffer to 1 and Subscriber buffer to 1000 and still not loose any messages

I am trying to understand subscriber and publisher buffers. If I set subsrciber buffer to 1000 and publisher buffer to 1, are there any chances that I loose messages ? Could anyone please explain me the same?

Yes, in theory you may lose messages with these settings, in practice it depends.
Theory: spinner threads
On both sides, publisher as well as subscriber, there are so called spinner threads responsible for handling the callbacks (for message sending on the publisher side and message evaluation on the subscriber-side). These spinner threads are working in parallel to the main thread. If messages are arriving faster from the main thread than they are being processed by the spinner thread, the number of messages given by the queue size will be buffered up before beginning to throw away the oldest ones. Therefore if you publish at a very high rate the publisher-sided spinner thread might drop older messages, while if your callback function on the subscriber side takes too long to execute your subscriber queue will start dropping messages. To improve this one can use multi-threaded spinners where one increases the number of spinner threads and activate concurrency in order to process the callback queue more quickly. Read more about it here.
Practice: Choosing the queue size
The queue size of the publisher queue you should set depends on which rate you publish and if you publish in bursts. If you publish in bursts or at higher frequencies (e.g. > 10 Hz) a publisher queue size of 1 won't be sufficient. On the subscriber side it is harder to give recommendations as it also depends on how long the callback takes to process the information.
It is actually also possible to set the value 0 for the queues which results in an arbitrarily large queue but this might be problematic as the required memory might grow indefinitely, well at least until your computer freezes. Furthermore having a large queue size might often be disadvantageous: If you set a large queue and the callback takes long to execute you might be working on very outdated data while the queue gets longer and longer.
Alternative communication patterns
If you want to guarantee that information is actually being processed (e.g. real-time or safety-relevant information) ROS topics are probably the wrong choice. Depending on what precisely you need the other two communication methods services or actions might be an alternative. But for things like large information streams of safety-relevant real-time data there are no perfect communication mechanisms in ROS1.

Is it guaranteed that mnesia event listeners will get each state of a record, if it changes fast?

Let's say I have some record like {my_table, Id, Value}.
I constantly overwrite the value so that it holds consecutive integers like 1, 2, 3, 4, 5 etc.
In a distributed environment, is it guaranteed that my event listeners will receive all of the values? (I don't care about ordering)

I haven't verified this by reading that part of the source yet, but it appears that sending a message out is part of the update process, so messages should always come out, even on very fast changes. (The alternative would be for Mnesia to either queue messages or queue changes and run them in batches. I'm almost positive this is not what happens -- it would be too hard to predict the variability of advantageous moments to start batching jobs or queueing messages. Sending messages is generally much cheaper than making a change in the db.)
Since Erlang guarantees delivery of messages to a live destination this is as close to a promise that every Mnesia change will eventually be seen as you're likely to get. The order of messages couldn't be guaranteed on the receiving end (as it appears you expect), and of course a network failure could make a set of messages get missed (rendering the destination something other than live from the perspective of the sender).

Delaying event handling in Flash

I'd like to delay the handling for some captured events in ActionScript until a certain time. Right now, I stick them in an Array when captured and go through it when needed, but this seems inefficient. Is there a better way to do this?

Well, to me this seems a clean and efficient way of doing that.
What do you mean by delaying? you mean simply processing them later, or processing them after a given time?
You can always set a timout to the actual processing function in your event handler (using flash.utils.setTimeout), to process the event at a precise moment in time. But that can become inefficient, since you may have many timeouts dangeling about, that need to be handled by the runtime.
Maybe you could specify your needs a little more.
edit:
Ok, basically, flash player is single threaded - that is bytecode execution is single threaded. And any event, that is dispatched, is processed immediatly, i.e. dispatchEvent(someEvent) will directly call all registered handlers (thus AS bytecode).
Now there are events, which actually are generated in the background. These come either from I/O (network, userinput) or timers (TimerEvents). It may happen, that some of these events actually occur, while bytecode is executed. This usually happens in a background thread, which passes the event (in the abstract sense of the term) to the main thread through a (de)queue.
If the main thread is busy executing bytecode, then it will ignore these messages until it is done (notice: nearly any bytecode execution is always the implicit consequence of an event (be it enter frame, or input, or timer or load operation or whatever)). When it is idle, it will look in all queues, until it finds an available message, wraps the information into an ActionScript Event object, and dispatches it as previously described.
Thus this queueing is a very low level mechanism, that comes from thread-to-thread communication (and appears in many multi-threading scenarios), and is inaccessible to you.
But as I said before, your approach both is valid and makes sense.

Store them into Vector instead of Array :p
I think it's all about how you structure your program, maybe you can assign the captured event under the related instance? So that it's all natural to process the captured event with it instead of querying from a global vector

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart