Is it sufficient to set the ROS publisher buffer to 1 and the subscriber buffer to 1000 and still not lose any messages

I am trying to understand subscriber and publisher buffers. If I set the subscriber buffer to 1000 and the publisher buffer to 1, is there any chance that I lose messages? Could anyone please explain?

Yes, in theory you may lose messages with these settings; in practice, it depends.
Theory: spinner threads
On both sides, publisher as well as subscriber, there are so-called spinner threads responsible for handling the callbacks (message sending on the publisher side, message evaluation on the subscriber side). These spinner threads work in parallel to the main thread. If messages arrive from the main thread faster than the spinner thread can process them, up to the number of messages given by the queue size is buffered before the oldest ones start being thrown away. Therefore, if you publish at a very high rate, the publisher-side spinner thread might drop older messages, while if your callback function on the subscriber side takes too long to execute, your subscriber queue will start dropping messages. To improve this you can use multi-threaded spinners, where you increase the number of spinner threads and activate concurrency in order to process the callback queue more quickly. Read more about it here.
Practice: Choosing the queue size
Which queue size you should set for the publisher queue depends on the rate at which you publish and whether you publish in bursts. If you publish in bursts or at higher frequencies (e.g. > 10 Hz), a publisher queue size of 1 won't be sufficient. On the subscriber side it is harder to give recommendations, as it also depends on how long the callback takes to process the information.
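As a concrete (hedged) illustration of where these two values go, here is a minimal rospy sketch; the node name, topic name, and 100 Hz rate are made up for the example:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def callback(msg):
    # If this callback takes longer than the publishing interval,
    # the subscriber queue (here up to 1000 messages) fills up and
    # the oldest messages are dropped first.
    rospy.loginfo("got: %s", msg.data)

rospy.init_node('queue_size_demo')

# Publisher-side queue: with queue_size=1 a fast burst of publish()
# calls can drop messages before the spinner thread sends them.
pub = rospy.Publisher('chatter', String, queue_size=1)

# Subscriber-side queue: buffers up to 1000 incoming messages while
# the callback is busy.
sub = rospy.Subscriber('chatter', String, callback, queue_size=1000)

rate = rospy.Rate(100)  # publishing at 100 Hz stresses a queue_size of 1
while not rospy.is_shutdown():
    pub.publish(String(data=str(rospy.get_time())))
    rate.sleep()
```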
It is also possible to set the value 0 for the queues, which results in an arbitrarily large queue. This can be problematic because the required memory may grow indefinitely, or at least until your computer freezes. Furthermore, a large queue size is often disadvantageous: if you set a large queue and the callback takes long to execute, you might be working on very outdated data while the queue grows longer and longer.
Alternative communication patterns
If you want to guarantee that information is actually being processed (e.g. real-time or safety-relevant information), ROS topics are probably the wrong choice. Depending on what precisely you need, the other two communication methods, services or actions, might be an alternative. But for things like large streams of safety-relevant real-time data there is no perfect communication mechanism in ROS 1.
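For instance, a service call blocks the caller until the request has been handled, so the result is guaranteed to have been processed. A minimal rospy sketch, assuming the std_srvs/Trigger type and a made-up service name:

```python
#!/usr/bin/env python
import rospy
from std_srvs.srv import Trigger, TriggerResponse

def handle(request):
    # Unlike a topic, the caller blocks until this handler returns,
    # so the request is guaranteed to have been processed (or to fail loudly).
    return TriggerResponse(success=True, message="processed")

rospy.init_node('service_demo')
rospy.Service('process_data', Trigger, handle)

# Client side (usually in a different node):
rospy.wait_for_service('process_data')
process_data = rospy.ServiceProxy('process_data', Trigger)
result = process_data()
rospy.loginfo("success: %s, message: %s", result.success, result.message)
```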

Related

Is there any latency in SQS while creating it using AWS API and sending messages immediately after creating it

I want to create an SQS queue from code whenever it is required to send messages, and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS queue using Java code and then sending messages to it.
Thanks.
You'll have to try it and make observations. SQS is a distributed system, so there is a possibility that a queue might not immediately be usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not yet deleted -- these will return to visibility if the consumer explicitly resets their visibility, or mishandles an exception and fails to delete them before the visibility timeout expires).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are only the approximate number of messages (visible, in flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare, internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risks of any such issue seem -- to me -- increased when a queue does not have an indefinite lifetime (i.e. when the plan for a queue is to delete it when it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems do eventually behave unexpectedly, however rare or unlikely.
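To make the timing question concrete, here is a hedged sketch using Python and boto3 rather than the Java SDK the asker mentions; the queue name, region, and retry policy are made up for the example:

```python
import time
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # region is an assumption

# Create the queue; remember that a name cannot be reused within
# 60 seconds of deleting a queue with the same name.
queue_url = sqs.create_queue(QueueName="transient-queue-20240101")["QueueUrl"]

# Send immediately, but retry with backoff in case the queue is not
# usable yet right after creation.
for attempt in range(5):
    try:
        sqs.send_message(QueueUrl=queue_url, MessageBody="hello")
        break
    except sqs.exceptions.QueueDoesNotExist:
        time.sleep(2 ** attempt)

# Long poll: an empty response is a strong (but not absolute) signal
# that no visible messages remain.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                           WaitTimeSeconds=20)
if "Messages" not in resp:
    # These counters are only approximate, as discussed above.
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages",
                        "ApproximateNumberOfMessagesNotVisible"])
    print(attrs["Attributes"])
    sqs.delete_queue(QueueUrl=queue_url)
```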

Semaphores in NSOperationQueues

I'm taking my first swing at a Swift/NSOperationQueue based design, and I'm trying to figure out how to maintain data integrity across queues.
I'm early in the design process, but the architecture is probably going to involve one queue (call it sensorQ) handling a stream of sensor measurements from a variety of sensors that will feed a fusion model. Sensor data will come in at a variety of rates, some quite fast (accelerometer data, for example), but some will require extended computation that could take, say, a second or more.
What I'm trying to figure out is how to capture the current state into the UI. The UI must be handled by the main queue (call it mainQ) but will reflect the current state of the fusion engine.
I don't want to hammer the UI thread with every update that happens on the sensor queue because they could be happening quite frequently, so an NSOperationQueue.mainQueue.addOperationWithBlock() call passing state back to the UI doesn't seem feasible. By the same token, I don't want to send queries to the sensor queue because if it's processing a long calculation I'll block waiting for it.
I'm thinking to set up an NSTimer that might copy the state into the UI every tenth of a second or so.
To do that, I need to make sure that the state isn't being updated on the sensor queue at the same time I'm copying it out to the UI queue. Seems like a job for a semaphore, but I'm not turning up much mention of semaphores in relation to NSOperationQueues.
I am finding references to dispatch_semaphore_t objects in Grand Central Dispatch.
So my question is basically: what's the recommended way of handling these situations? I see repeated admonitions to work at the highest level of abstraction (NSOperationQueue) unless you need the optimization of a lower level such as GCD. Is this a case where I need that optimization? Can dispatch_semaphore_t work with NSOperationQueue? Is there an NSOperationQueue-based approach here that I'm overlooking?
How much data are you sending to the UI? A few numbers? A complex graph?
If you are processing all your sensor data on an NSOperationQueue (let's call it sensorQ), why not make the queue serial? Then when your timer fires, you can post an "Update UI" task to sensorQ. When your update task arrives on the sensorQ, you know no other sensor is modifying the state. You can bundle up your data and post to the main (UI) queue.
A better answer could be provided if we knew:
1. Do your sensors have a minimum and maximum data rate?
2. How many sensors are contributing to your fusion model?
3. How are you synchronizing access from the sensors to your fusion model?
4. How much data and in what format is the "update" to the UI?
My hunch is that semaphores are not required.
One method that might work here is to decouple the sensor data queue from your UI activities via a ring buffer. This effectively eliminates the need for semaphores.
The idea is that the sensor data processing component pushes data into the ring buffer and the UI component pulls the data from the ring buffer. The sensor data thread writes at the rate determined by your sensor/processing and the UI thread reads at whatever refresh rate is appropriate for your application.
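Sketched in Python rather than Swift, just to show the shape of the single-producer/single-consumer idea (capacity and names are made up; in a real implementation the memory-ordering guarantees of your target language matter):

```python
class RingBuffer:
    """Single-producer / single-consumer ring buffer.

    The producer only ever advances `head`, the consumer only ever
    advances `tail`, so with exactly one thread on each side no
    semaphore or lock is needed.
    """

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # next write position (owned by the producer)
        self.tail = 0  # next read position (owned by the consumer)

    def push(self, item):
        nxt = (self.head + 1) % self.capacity
        if nxt == self.tail:
            return False              # buffer full: drop or overwrite, your choice
        self.slots[self.head] = item
        self.head = nxt
        return True

    def pop(self):
        if self.tail == self.head:
            return None               # buffer empty
        item = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.capacity
        return item

# Producer: the sensor/fusion side pushes state snapshots as they are computed.
# Consumer: a UI timer (e.g. 10 Hz) pops snapshots and keeps only the latest.
```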

How do I set the inputBuffer of an Akka stream publisher?

I'm using Akka streams in a context where sinks for a single source will come and go. For this reason I'm creating a publisher from a source and attaching subscribers as the need arise:
val publisher = mySource.runWith(Sink.publisher(true))
with
publisher.subscribe(subscriber1) // There will be others
Some of the subscribers will be faster than others, and I'd like to allow the faster ones to go ahead independently of the slowest, at least to the extent permitted by the input buffer of the publisher. This buffer is described by the comment on the Sink.publisher(true) method:
If fanout is true, the materialized Publisher will support multiple Subscribers and the size of the inputBuffer configured for this stage becomes the maximum number of elements that the fastest [[org.reactivestreams.Subscriber]] can be ahead of the slowest one before slowing the processing down due to back pressure.
My problem is that I don't know how to set this inputBuffer value "for this stage". The closest I have seen is described in the Dropping Broadcast section of this article but this seems to insist on the use of the Flow DSL. I believe that I can't use the DSL because of my need to continually attach new Subscribers.
As a result, my overall stream rate is held back by the slowest subscriber. A related aspect of what I am trying to do relates to making sure the different subscribers are running on different threads (without creating explicit actors as subscribers).
It'd look something like this (for Akka Streams 2.0.1):
Sink.asPublisher(true).addAttributes(Attributes.inputBuffer(initialSize, maxSize))

How can I change the background operation priority dynamically using Dispatch or Operation queues?

Here is the problem I have. There are several tasks to complete in the background while the application is running. When I run these tasks in the background by pushing them to a concurrent dispatch queue, it takes more than 10 seconds to complete all of them. They basically load data from disk, parse it and present the result to the user. In other words, they are just cached results that hugely improve the user experience.
These cached results are used by a particular feature inside the app. When that feature is not used immediately after opening the application, it is not a problem that it takes 10 seconds to load the data supporting it, because by the time the user decides to use it, the data will already be loaded.
But when the user enters that feature immediately after opening the app, it takes considerable time (from the user's point of view) to load the data. Also, the whole data set is not needed at the same moment, but rather one piece of it at a given moment.
That's why we need to load the data concurrently and deliver the results as soon as possible. That's why I decided to break the data into chunks: when the user requests the data, we should load the corresponding chunk on a background thread and give that thread the highest priority. I'll explain what I mean.
Imagine there are 100 pieces of data and it takes more than 10 seconds to load them all. Whenever the user queries the data for the first time, the app determines which chunk of the data the user needs and starts loading that chunk. After that part is loaded, the remaining data is also loaded in the background, in order to make later queries faster (without the lag of loading the cache).
But here a problem occurs: the user may change the query immediately after entering one, say at the 2nd second of the loading process (remember it takes more than 10 seconds to load the data, so more than 8 seconds of loading remain). In the extreme case, the user will then have to wait until all of the data has been loaded before receiving theirs. That's why I need to somehow manage the execution of the background tasks: when the user changes the input, I should change the execution priorities and give the thread that loads the corresponding chunk the highest priority, without stopping it, so that it receives more processor time, finishes sooner, and delivers results to the user faster than it would if I had left the priorities unchanged. I know I can assign priorities to queues. But is there a way to change them dynamically while they are still executing?
Or do I need to implement custom thread management to achieve this behaviour? I really don't want to dive into thread management, and would be glad if this is possible using only dispatch or operation queues.
I hope I've described the problem well. If not, please comment below on what is unclear and I'll explain.
Thank you so much for reading this far :) And special thanks to whoever provides an answer. And very special thanks to whoever gives me a solution using dispatch or operation queues :)))
I think you need to move away from thinking about the priority at which the queues are running (which actually doesn't sound very important for the scenario you are describing) and more towards how you can use Dispatch I/O, or an even simpler Dispatch source, to control how the data is being read in. As you say, it takes 10 seconds to load the data, and if the user suddenly changes their query immediately after asking, you need to essentially stop reading the data for the previous request and do whatever needs to be done to fulfill the most recent query. Using Dispatch I/O to chunk the data (asynchronously) and update the UI also asynchronously will allow you to change your mind mid-stream (using some sort of semaphore or cancellation flag) and either continue to trickle the data in (you don't say whether or not that data will remain useful if the user changes their mind), suspend the reading process, or cancel it altogether and start a new operation. Either way, being able to suspend/resume a source, and to have it fire callbacks for reasonably small chunks of data, will certainly enable you to make decisions on a much more granular chunk of time than 8 seconds!
I'm afraid the only way to do that is to cancel the running operation before starting a new one.
You cannot remove it from the queue until it's finished or cancelled.
As an improvement for your problem, I would suggest loading things in the background even when the user doesn't need them yet - that way you can serve them from the cache once they are there.
You can create two NSOperationQueues with two different priorities and download things in the background on the low-priority queue whenever the user is idle. For important operations you can have the high-priority queue, which you cancel each time the search term changes.
On top of that you just need to cache results from both of those queues.
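A rough sketch of that cancel-and-requeue idea, written with Python's concurrent.futures instead of NSOperationQueue (the chunk loading is faked and the pool sizes are arbitrary):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_chunk(chunk_id):
    time.sleep(0.1)                        # stand-in for disk I/O and parsing
    return "data for chunk %d" % chunk_id

high = ThreadPoolExecutor(max_workers=2)   # plays the high-priority queue
low = ThreadPoolExecutor(max_workers=1)    # plays the low-priority/background queue

pending_high = []

def user_query_changed(needed_chunk, all_chunks):
    # Cancel whatever high-priority work has not started yet.
    for f in pending_high:
        f.cancel()                         # only succeeds if the task hasn't started
    pending_high.clear()

    # Load the chunk the user needs right now on the "fast" executor...
    pending_high.append(high.submit(load_chunk, needed_chunk))

    # ...and keep warming the cache for the rest in the background.
    for c in all_chunks:
        if c != needed_chunk:
            low.submit(load_chunk, c)

user_query_changed(2, range(10))
print(pending_high[0].result())            # blocks only for the chunk the user needs
```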

Why exactly does ePoll scale better than Poll?

Short question, but for me it's difficult to understand.
Why exactly does ePoll scale better than Poll?
While Damon's reason is correct for the unusual case where you never block on a socket, in typical real-world programs, the reason is completely different. A typical program looks like this:
1) Do all the work we can do now.
2) Check if any network connections need service, blocking if there's nothing to do.
3) Service any network connections discovered.
4) Go to step 1.
Typically, because you just did all the work you can do, when you come around to step 2 there is no work for you to do, so you'll have to wait a bit. Now, imagine there are 800 sockets you are interested in. The kernel has to put you on the wait queue for each of those 800 sockets. And, a split second later when data arrives on one of those 800 sockets, the kernel has to remove you from those 800 wait queues. Placing a task on a wait queue requires creating a 'thunk' to link that task to that wait queue. No good optimizations are possible because the kernel has no idea which 800 sockets you'll be waiting for.
With epoll, the epoll socket itself has a wait queue, and the process is only put on that one single wait queue. A thunk is needed to link each of the 800 connections to the epoll wait queue, but that thunk is persistent. You create it by adding a socket to an epoll set, and it remains there until you remove the socket from the set.
When there's activity on the socket, the kernel handles it in the task that detects the activity. When you wait, the kernel already knows if there's a detected event and the kernel only has to put you on that one wait queue. When you wake, it only has to remove you from that one queue.
So it's not so much the copying that's the killer with select or poll, it's the fact that the kernel has to manipulate a massive number of wait queues on each blocking operation.
The poll system call needs to copy your list of file descriptors to the kernel each time. With epoll, this happens only once, at epoll_ctl time, not on every call to epoll_wait.
Also, epoll_wait is O(1) with respect to the number of descriptors watched[1], which means it does not matter whether you wait on one descriptor or on 5,000 or 50,000 descriptors. poll, while more efficient than select, still has to walk over the whole list every time (i.e. it is O(N) in the number of descriptors).
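To make the register-once-versus-every-call difference concrete, here is a small sketch using Python's select module wrappers around the same syscalls (epoll is Linux only; fds is assumed to be a list of already-connected sockets):

```python
import select

def wait_with_poll(fds, timeout_ms=1000):
    # The whole descriptor list is handed to the kernel on every poll()
    # call, and the kernel walks all of it each time: O(N) per wait.
    p = select.poll()
    for fd in fds:
        p.register(fd, select.POLLIN)
    return p.poll(timeout_ms)          # [(fd, events), ...]

# With epoll the registration happens once, up front (epoll_ctl)...
ep = select.epoll()

def watch(fd):
    ep.register(fd, select.EPOLLIN)

# ...and each wait only parks the process on the single epoll wait queue,
# independent of how many descriptors are being watched.
def wait_with_epoll(timeout_s=1.0):
    return ep.poll(timeout_s)          # [(fd, events), ...]
```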
And lastly, in addition to the "normal" mode, epoll can work in "edge-triggered" mode, which means the kernel does not need to keep track of how much data you've read after you've been signalled readiness. This mode is more difficult to grasp, but somewhat more efficient.
[1] As correctly pointed out by David Schwartz, epoll_wait is of course still O(N) with respect to the events that occur. There is hardly a way it could be any different, with any interface: if N events happen on descriptors that are watched, then the application needs to get N notifications and needs to do N "things" in order to react to what's happening.
This is again slightly, but not fundamentally, different in edge-triggered mode, where you actually get M events with M <= N. In edge-triggered mode, when the same event (say, POLLIN) happens several times, you will probably get fewer notifications, possibly only a single one. However, this doesn't change much about the big-O notation as such.
However, the cost of epoll_wait does not depend on the number of descriptors watched. Under the assumption that it is used in the intended, "normal" way (that is, many descriptors, few events), this is what really matters, and here it is indeed O(1).
As an analogy, you can think of a hash table. A hash table accesses its content in O(1), but one could argue that calculating the hash is actually O(N) with respect to the key length. This is technically absolutely correct, and there probably exist cases where this is a problem; however, for most people, it just doesn't matter.
