Is it guaranteed that mnesia event listeners will get each state of a record, if it changes fast? - erlang

Let's say I have some record like {my_table, Id, Value}.
I constantly overwrite the value so that it holds consecutive integers like 1, 2, 3, 4, 5 etc.
In a distributed environment, is it guaranteed that my event listeners will receive all of the values? (I don't care about ordering)

I haven't verified this by reading that part of the source yet, but it appears that sending a message out is part of the update process itself, so messages should always go out, even on very fast changes. (The alternative would be for Mnesia to either queue messages or queue changes and run them in batches. I'm almost positive this is not what happens; it would be too hard to predict when batching or queueing would actually pay off. Sending a message is generally much cheaper than making a change in the db.)
Since Erlang guarantees delivery of messages to a live destination, this is as close as you're likely to get to a promise that every Mnesia change will eventually be seen. The order of messages can't be guaranteed on the receiving end (which, as you say, you don't need), and of course a network failure could cause a set of messages to be missed (rendering the destination something other than live from the perspective of the sender).

Related

Inhibit Time in Tx-PDO

Objects 180Nh have the following subindices:
0x00:----
0x01:----
0x02:----
0x03 (inhibit time): This subindex contains a lock-out time in 100 µs steps. It can be used to set a time that must elapse after a PDO is sent before the same PDO is sent again. This time only applies to asynchronous PDOs. It is intended to prevent a PDO from being sent continuously when the mapped object changes constantly.
0x04 (compatibility entry): This subindex has no function and exists only for compatibility reasons.
0x05 (event timer): This time (in ms) can be used to trigger an Event which handles the copying of the data and the sending of the PDO.
From the above, I understand that when an event occurs, the inhibit time starts and further transmission of that Tx-PDO is blocked for the duration of the interval; if more events occur within the interval, the PDO is only sent again at the end of it.
Why must the whole interval elapse? Why are the second, third, and fourth events all handled by the single transmission at the end of the interval?
Shouldn't the third and fourth events each trigger a transmission of their own?
By default, common CANopen device profiles such as CiA 401 "generic I/O module" are configured to suit large automation networks: many nodes, where it is important to keep bus traffic low. On such networks, nodes only transmit PDOs when there has been a data update (an internal event has occurred).
However, such a setup is very much unsuitable when CANopen is used for real-time control systems, for example a PLC controlling a group of actuator I/O modules that drive the motions of a machine, which could also be a safety-related application. In such systems it is customary to send data repeatedly at fixed intervals, even if it has not changed; for example, all data once every 10 ms or 100 ms.
Only the last data sent is used by the receiving node(s), so if data goes missing or arrives corrupted, fresh reliable data will arrive soon after. And if no data arrives at all within a certain time period, that is an indication that something is broken and the system ought to revert to a safe state. This is how mobile and automotive control systems are most commonly designed, since it is safe, deterministic, and proven in use. Custom, non-standard CAN bus protocols by OEMs are often implemented in exactly this way.
Now, to achieve this with CANopen, we have to configure the TPDO communication parameters: the event timer to set the interval, and the inhibit time to keep the node from spamming extra data as soon as something changes. If I remember correctly, we also need to set the 180N:2 transmission type to asynchronous (which sounds counter-intuitive).
With a setup like this, only the most recent event matters. The most up to date data will always get sent, at fixed intervals.
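For illustration, a configuration along these lines could look like the following sketch, using the python-canopen package; the node ID, EDS file, and chosen values are all hypothetical, and many devices require the node to be pre-operational (with the PDO disabled) before these parameters can be written:
```python
import canopen

# Sketch only: node ID 6, the EDS file name, and the values are hypothetical.
network = canopen.Network()
network.connect(channel="can0", bustype="socketcan")
node = network.add_node(6, "device.eds")

TPDO1_COMM = 0x1800                # object 180Nh with N = 0 (first TPDO)
node.sdo[TPDO1_COMM][2].raw = 254  # transmission type: asynchronous
node.sdo[TPDO1_COMM][3].raw = 50   # inhibit time: 50 * 100 us = 5 ms
node.sdo[TPDO1_COMM][5].raw = 100  # event timer: (re)send every 100 ms

network.disconnect()
```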

Is there any latency between creating an SQS queue via the AWS API and being able to send messages to it?

I want to create an SQS queue in code whenever it is needed for sending messages, and delete it after all the messages are consumed.
I just want to know whether some delay is required between creating the queue with Java code and then sending messages to it.
Thanks.
Virendra Agarwal
You'll have to try it and make observations. SQS is a distributed system, so there is a possibility that a queue might not be immediately usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
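A defensive pattern, then, is to treat the very first send as something that may fail, and retry it. The question mentions Java, but a minimal sketch of the idea in Python with boto3 (queue name hypothetical) would be:
```python
import time
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue name; created on demand, as in the question.
url = sqs.create_queue(QueueName="my-transient-queue")["QueueUrl"]

# If the new queue is not immediately usable, back off and retry the send
# instead of assuming readiness.
for attempt in range(5):
    try:
        sqs.send_message(QueueUrl=url, MessageBody="hello")
        break
    except sqs.exceptions.QueueDoesNotExist:
        time.sleep(2 ** attempt)
```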
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not deleted; these will return to visibility if the consumer explicitly resets their visibility, or mishandles an exception and lets the visibility timeout expire without deleting them).
However, GetQueueAttributes does not provide a fail-safe way of assuring that a queue is truly empty, because many of the counter attributes are only the approximate number of messages (visible, in flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risk of any such issue seems, to me, increased when a queue does not have an indefinite lifetime (i.e. when the plan is to delete the queue once it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems eventually behave unexpectedly, however rare or unlikely that may be.
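For what it's worth, a best-effort emptiness check along the lines described above might look like this sketch; the 20-second long poll and the chosen attributes are judgment calls, not guarantees:
```python
def looks_empty(sqs, queue_url):
    # A long poll that returns nothing is strong (not absolute) evidence.
    # Note: any message this call does return becomes in flight itself.
    resp = sqs.receive_message(
        QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=1
    )
    if resp.get("Messages"):
        return False
    # The counters are approximate, so zeroes here are still only an indication.
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
            "ApproximateNumberOfMessagesDelayed",
        ],
    )["Attributes"]
    return all(int(v) == 0 for v in attrs.values())
```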

SQS - How far out of order might messages be delivered?

I have a use case for SQS where I'll be sending messages about specific objects within a system. Each object will have a message at most every 20 seconds, and there are hundreds of thousands (potentially millions) of objects, which means I'll be handling tens of thousands (potentially hundreds of thousands) of messages per second. The volume of messages precludes using FIFO queues.
Most of the time, I don't care about in-order messaging. If messages for two different objects get delivered in a different order than they were emitted, that's fine. What could potentially be a problem is if two messages relating to the same object were delivered out of order.
Given that each object would only have events every 20 seconds, and 20 seconds is an eternity in computing time, it strikes me that it would be very unlikely for two messages sent 20 seconds apart (with potentially millions of messages between them) to be delivered out of order. That said, I haven't been able to find any hard data about out-of-order delivery with SQS. I know it's a thing that can happen, but I haven't seen any measured data about it.
I'm wondering if there is any kind of measured data on the probability that a message gets delivered X amount of time out of order, or X messages out of order.
SQS makes no guarantee about how far out of order a message can appear for a non-FIFO queue.
The closest measurement I've seen to what you're looking for is this experiment, which measured how long it takes for a message to become available for polling after it has been submitted to the queue. They also link to the source code if you want to replicate the experiment and gather your own metrics.
If you absolutely must have them in the original order, you have a few options. They're not necessarily good options, but they are options.
Determine a way to horizontally partition your object IDs into n buckets, and use n different FIFO queues. (Probably the best option; see the sketch after this list.)
Add your own sequence numbers to the messages.
Partition your messages into queues based on the current time. Drain each queue in order. (For example, you might publish to a single queue for only 4 seconds, and rotate sequentially through a group of 15 queues.)
Use a database and store the message timestamps in a way that allows you to retrieve the oldest message.
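A minimal sketch of the first option, assuming Python with boto3 and hypothetical queue names: the hash of the object ID picks a stable bucket, and MessageGroupId preserves per-object order within each FIFO queue.
```python
import hashlib
import boto3

sqs = boto3.client("sqs")

# Hypothetical setup: n pre-created FIFO queues, objects-0.fifo .. objects-7.fifo.
N = 8
queue_urls = [
    sqs.create_queue(
        QueueName=f"objects-{i}.fifo",
        Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
    )["QueueUrl"]
    for i in range(N)
]

def publish(object_id, body):
    # A stable hash keeps every message for one object in the same queue,
    # and MessageGroupId gives per-object FIFO ordering within that queue.
    bucket = int(hashlib.sha256(object_id.encode()).hexdigest(), 16) % N
    sqs.send_message(
        QueueUrl=queue_urls[bucket],
        MessageBody=body,
        MessageGroupId=object_id,
    )
```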

Why exactly does ePoll scale better than Poll?

A short question, but for me it's difficult to understand.
Why exactly does ePoll scale better than Poll?
While Damon's reason is correct for the unusual case where you never block on a socket, in typical real-world programs, the reason is completely different. A typical program looks like this:
1) Do all the work we can do now.
2) Check if any network connections need service, blocking if there's nothing to do.
3) Service any network connections discovered.
4) Go to step 1.
Typically, because you just did all the work you could, when you come around to step 2 there is no work for you to do, so you'll have to wait a bit. Now, imagine there are 800 sockets you are interested in. The kernel has to put you on the wait queue for each of those 800 sockets. And, a split-second later when data arrives on one of those 800 sockets, the kernel has to remove you from those 800 wait queues. Placing a task on a wait queue requires creating a 'thunk' to link that task to that wait queue. No good optimizations are possible because the kernel has no idea which 800 sockets you'll be waiting for next time.
With epoll, the epoll socket itself has a wait queue, and the process is only put on that one single wait queue. A thunk is needed to link each of the 800 connections to the epoll wait queue, but that thunk is persistent. You create it by adding a socket to an epoll set, and it remains there until you remove the socket from the set.
When there's activity on the socket, the kernel handles it in the task that detects the activity. When you wait, the kernel already knows if there's a detected event and the kernel only has to put you on that one wait queue. When you wake, it only has to remove you from that one queue.
So it's not so much the copying that's the killer with select or poll, it's the fact that the kernel has to manipulate a massive number of wait queues on each blocking operation.
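To make that loop concrete, here is a minimal sketch using Python's select.epoll (the port and buffer size are arbitrary); each descriptor is registered exactly once, and the blocking wait in step 2 touches only the single epoll wait queue:
```python
import select
import socket

# A non-blocking listener plus one epoll instance.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen()
listener.setblocking(False)

ep = select.epoll()
ep.register(listener.fileno(), select.EPOLLIN)   # registered once, persists
conns = {}

while True:
    # Step 2: block on the single epoll wait queue.
    for fd, events in ep.poll(timeout=1.0):
        if fd == listener.fileno():
            conn, _ = listener.accept()
            conn.setblocking(False)
            ep.register(conn.fileno(), select.EPOLLIN)  # one-time epoll_ctl
            conns[conn.fileno()] = conn
        elif events & select.EPOLLIN:
            # Step 3: service the connection that has data.
            data = conns[fd].recv(4096)
            if not data:                      # peer closed
                ep.unregister(fd)
                conns.pop(fd).close()
```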
The poll system call needs to copy your list of file descriptors to the kernel each time it is called. With epoll, this copying happens only once, in epoll_ctl, and not on every call to epoll_wait.
Also, epoll_wait is O(1) with respect to the number of descriptors watched¹, which means it does not matter whether you wait on one descriptor or on 5,000 or 50,000 descriptors. poll, while being more efficient than select, still has to walk the whole list every time (i.e. it is O(N) in the number of descriptors).
And lastly, in addition to the "normal" level-triggered mode, epoll can work in "edge-triggered" mode, in which the kernel does not need to keep track of how much data you've read after you've been signalled readiness. This mode is more difficult to grasp, but somewhat more efficient.
¹ As correctly pointed out by David Schwartz, epoll_wait is of course still O(N) with respect to the events that occur. There is hardly a way it could be any different, with any interface: if N events happen on descriptors that are watched, then the application needs to get N notifications and needs to do N "things" to react to what's happening.
This is again slightly, but not fundamentally, different in edge-triggered mode, where you actually get M events with M <= N: when the same event (say, POLLIN) happens several times, you will probably get fewer notifications, possibly only a single one. However, this doesn't change much about the big-O notation as such.
However, the cost of epoll_wait does not depend on the number of descriptors watched. Under the assumption that it is used in the intended, "normal" way (that is, many descriptors, few events), this is what really matters, and here it is indeed O(1).
As an analogy, you can think of a hash table. A hash table accesses its content in O(1), but one could argue that calculating the hash is actually O(N) with respect to the key length. This is technically correct, and there probably exist cases where it is a problem; however, for most people, it just doesn't matter.
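To make the edge-triggered mode concrete, here is a sketch using Python's select.epoll with EPOLLET (the peer address is a placeholder); the key obligation is to drain the socket until EAGAIN, since notifications fire on readability transitions only:
```python
import select
import socket

# Hypothetical peer; the point is the drain loop, not the endpoint.
conn = socket.create_connection(("example.com", 80))
conn.setblocking(False)

ep = select.epoll()
# EPOLLET: we are notified on the readable *transition* only, so after a
# wakeup we must read until EAGAIN or we may never be woken for this data again.
ep.register(conn.fileno(), select.EPOLLIN | select.EPOLLET)

def drain(sock):
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:   # EAGAIN: kernel buffer fully drained
            break
        if not data:              # peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

for fd, events in ep.poll(timeout=5.0):
    if events & select.EPOLLIN:
        payload = drain(conn)
```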

sequence ID for handling reliability

I'm trying to figure out a simple way to handle reliability for UDP messages. I figured I would just send each one with a sequencing ID and by comparing the ID to the one previously received, a loss can be detected. I would normally just use integers however the idea that it would just keep incrementing indefinitely did not sit right with me.
I could use base64, but that would only make the IDs a bit more readable; it doesn't really solve anything.
I also considered prefixing a date stamp, but that would get sloppy when dealing with messages received around midnight.
I feel like there must be a better solution that someone could suggest, even if that is to just stick with integers.
My preference for this particular job is to use an incrementing (at least 64-bit) integer sequence seeded with a high-resolution timestamp. In this way, even if there is a loss of state at the sending end, when the sequence is re-seeded from the time, it will in all likelihood simply jump forward. Any Year 10K bugs that this might introduce are left as an exercise for Lazarus Long. :-)
Keep in mind that sequence-gap detection is essentially an optimization. The transmitting end needs to retransmit until an ack is received, with a nack (for a gap or corrupted datagram) simply eliciting earlier retransmission. (ZMODEM is a rare exception to this rule, with its default mode of operation being the use of a single ack at the end of the stream and all other retransmissions governed by nacks; as a file transfer protocol, however, it is essentially one giant multipart datagram.)
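As a sketch of both points, with an invented 8-byte header convention: the sender's counter is seeded from a nanosecond timestamp so a restarted sender jumps forward past anything it previously sent, and the receiver compares each sequence number against the one it expects, treating a positive gap as the cue to nack:
```python
import struct
import time

# Invented framing: an 8-byte big-endian sequence number before the payload.

class Sequencer:
    """Sender side: counter seeded from a nanosecond timestamp, so a sender
    that loses state and restarts simply jumps forward past old numbers."""
    def __init__(self):
        self.seq = time.time_ns()

    def frame(self, payload):
        self.seq += 1
        return struct.pack("!Q", self.seq) + payload

class GapDetector:
    """Receiver side: compare each sequence number with the expected one."""
    def __init__(self):
        self.expected = None

    def feed(self, datagram):
        seq = struct.unpack("!Q", datagram[:8])[0]
        payload = datagram[8:]
        # missing > 0 means a gap: the cue to nack and request retransmission.
        missing = 0 if self.expected is None else seq - self.expected
        self.expected = seq + 1
        return missing, payload
```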
Use TCP? This is exactly why TCP is different from UDP.
I don't mean to be sarcastic, but this is the problem TCP exists to solve.
