I am trying to use RabbitMQ for a distributed system that would work something like:
a producer puts in a queue a JSON-formatted list of order ids
several consumers pull from that queue, run the business logic on those order ids, and put the result (also JSON-formatted) into another queue
from the second queue, another consumer will take the data and pass it back to the caller
I am still very new to RabbitMQ and I am wondering if this model is the right approach, given that the data should come back as fast as possible (sometimes in a matter of seconds, max 5), so there are real-time requirements.
Also, how large can a message passed to a queue be? The JSON that the producer will get back will be fairly large, depending on what the consumer does.
Thanks for any ideas!
See page 47 in this presentation (InfoQ) for a great comparison between different messaging formats.
There's nothing wrong with the design you suggested.
The slight wrinkle is that enforcing "real time requirements" isn't straightforward. For instance, it's not currently possible to expire messages within a queue, so this would need to be handled by the clients when consuming messages.
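As a rough illustration of handling expiry on the client side, the consumer can compare a timestamp carried in the message against its own clock and skip anything too old. This is only a sketch: the `sent_at` field, the function names and the 5-second limit (taken from the question) are assumptions, and it presumes producer and consumer clocks are reasonably in sync.

```python
import json
import time

MAX_AGE_SECONDS = 5.0  # assumed deadline, from the question's "max 5" seconds


def make_message(order_ids, now=None):
    """Wrap the payload together with the time it was produced."""
    sent_at = time.time() if now is None else now
    return json.dumps({"sent_at": sent_at, "order_ids": order_ids})


def handle_if_fresh(raw_message, now=None):
    """Return the order ids, or None if the message is already too old."""
    now = time.time() if now is None else now
    message = json.loads(raw_message)
    if now - message["sent_at"] > MAX_AGE_SECONDS:
        return None  # stale: the caller has given up, so skip the work
    return message["order_ids"]
```

A consumer would call `handle_if_fresh` on each delivery and acknowledge stale messages without processing them.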
The total size of messages in RabbitMQ <=1.8.1 was bounded by the amount of available RAM. As of 2.0.0, it's bounded by the amount of available disk space (i.e. rabbit will page messages to disk if it's running low on memory). Individual message sizes are recorded as 32-bit integers (IIRC), so individual messages cannot be larger than ~4GB; if this is a problem, consider saving the JSONs to network storage and passing some ID to them in the messages. Other than this, there aren't any constraints.
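The "save the JSON elsewhere and pass an ID" idea could look roughly like this. The dictionary stands in for whatever network storage you use, and `publish_result`, `read_result` and the `result_id` field are made-up names for the sketch.

```python
import json
import uuid

# Stand-in for network storage (shared file system, object store, ...).
blob_store = {}


def publish_result(result):
    """Store the large result and return the small message to enqueue."""
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = json.dumps(result)
    return json.dumps({"result_id": blob_id})


def read_result(raw_message):
    """Resolve the ID carried by the message back into the full result."""
    blob_id = json.loads(raw_message)["result_id"]
    return json.loads(blob_store[blob_id])
```

The message on the queue stays a few dozen bytes no matter how large the stored result is.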
We are developing a typical CANbus networked system with what you could call a controller organizing a number of devices.
The devices need configuration, which the controller writes (and might also read back) using regular object dictionary items (currently in the manufacturer-specific range).
The devices also perform actions (commands) with more than 8 bytes of data, and we solve this by having write-only items in the device object dictionary and relying on the regular segmentation/de-segmentation of SDOs. (I don't know if this is the CANopen way of doing things, but it seems reasonable.)
However, the device also produces events (say some sensor data passes a certain threshold) resulting in more than 8 bytes of asynchronous data coming up from the device. PDOs are meant to be used for sending async event data, but a PDO can only contain 8 bytes. The devices could write the data into an object dictionary item on the controller, but this doesn't seem like the CANopen way. Am I right?
The best we've come up with is to send a PDO to the controller, informing the controller that more data are available in the object dictionary on the device.
Can anyone with a CANopen background weigh in on the best (CANopen) way of solving this?
Since I'm repeating 8 bytes a lot, we can safely assume that this network is not running CAN-FD.
The key of any sensible CAN network design is to consider real-time, data priorities, bus load and data amounts early on. If you find yourself with a chunk of data larger than 8 bytes, then that strongly suggests that something is wrong in this design - it should likely be split in several packages.
Generally, you shouldn't be using SDO for data at all, since they come with overhead. That includes writes to the object dictionary, which also means SDO access. Block transfers etc with SDO are meant for things like bootloaders or one-time configuration, not for live data traffic in operational mode. It can be done, but it is fishy.
You can in theory map data across several PDOs with PDO mapping, but all of this really sounds like an "XY problem" - you are convinced that you need to transmit larger chunks of data and look for a way to do it. But step 1 is to look at the fundamental network data/design and see if you actually need those large chunks or if it makes sense to split them in several. The ideal CANopen design is to have one PDO per type of data, when possible.
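To make "one PDO per type of data" a bit more concrete, here is a small sketch that packs one event into several payloads, each fitting a single 8-byte frame. The fields (temperature, pressure, timestamp) and their encodings are invented for the example and say nothing about your actual device; the only assumption borrowed from CANopen is little-endian byte order.

```python
import struct

CAN_MAX_DATA = 8  # data bytes in one classic CAN frame


def pack_event(temperature_c, pressure_pa, timestamp_ms):
    """Return one payload per kind of data, each small enough for one PDO."""
    payloads = {
        "temperature": struct.pack("<f", temperature_c),  # 4 bytes
        "pressure": struct.pack("<f", pressure_pa),       # 4 bytes
        "timestamp": struct.pack("<Q", timestamp_ms),     # 8 bytes
    }
    # Guard against a field growing beyond a single frame.
    assert all(len(p) <= CAN_MAX_DATA for p in payloads.values())
    return payloads
```

Each payload would then be mapped to its own PDO with its own priority, rather than being shipped as one large blob.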
We need to process a large number of messages stored in SQS (the messages originate from Amazon store and SQS is the only place we can save them to) and save the result to our database. The problem is that SQS can only return 10 messages at a time. Considering we can have up to 300000 messages in SQS, even if requesting and processing 10 messages takes little time, the whole process takes forever, with the main culprit being the requesting and receiving of messages from SQS.
We're looking for a way to speed this up. The intended result would be dumping the results to our database. The process would probably run a few times per day (the number of messages would likely be less per run in that scenario).
Like Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much more reasonable.
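This is not our actual code, just a minimal sketch of the idea using `asyncio`. `fetch_batch` is a hypothetical stand-in for one receive request (here it pops from an in-memory list and sleeps to simulate latency); in real code it would call the SQS client instead.

```python
import asyncio

BATCH_SIZE = 10   # SQS returns at most 10 messages per receive call
CONCURRENCY = 10  # number of requests kept in flight at once


async def fetch_batch(queue):
    """Hypothetical stand-in for one receive request (up to 10 messages)."""
    await asyncio.sleep(0.01)  # simulated network latency
    batch = queue[:BATCH_SIZE]
    del queue[:BATCH_SIZE]
    return batch


async def worker(queue, results):
    """Keep requesting batches until a request comes back empty."""
    while True:
        batch = await fetch_batch(queue)
        if not batch:
            return
        results.extend(batch)  # real code would process and save to the DB


async def drain(queue):
    """Run CONCURRENCY workers side by side and collect what they fetched."""
    results = []
    await asyncio.gather(*(worker(queue, results) for _ in range(CONCURRENCY)))
    return results
```

With a real queue, an empty response is not always proof that the queue is drained, so the stop condition above is a simplification.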
I guess it's because I rarely use multithreading directly in my job, that I haven't thought of using it to solve this problem.
I'm using Akka streams in a context where sinks for a single source will come and go. For this reason I'm creating a publisher from a source and attaching subscribers as the need arises:
val publisher= mySource.runWith(Sink.publisher(true))
with
publisher.subscribe(subscriber1)// There will be others
Some of the subscribers will be faster than others and I'd like to allow the faster ones to go ahead independently of the slowest, at least to the extent permitted by the input buffer of the publisher. This buffer is described by the comment on the Sink.publisher(true) method:
If fanout is true, the materialized Publisher will support multiple Subscribers and the size of the inputBuffer configured for this stage becomes the maximum number of elements that the fastest [[org.reactivestreams.Subscriber]] can be ahead of the slowest one before slowing the processing down due to back pressure.
My problem is that I don't know how to set this inputBuffer value "for this stage". The closest I have seen is described in the Dropping Broadcast section of this article but this seems to insist on the use of the Flow DSL. I believe that I can't use the DSL because of my need to continually attach new Subscribers.
As a result, my overall stream rate is held back by the slowest subscriber. A related aspect of what I am trying to do relates to making sure the different subscribers are running on different threads (without creating explicit actors as subscribers).
It'd look something like (for Akka Streams 2.0.1):
Sink.asPublisher(true).addAttributes(Attributes.inputBuffer(initialSize, maxSize))
In my app I want to retrieve a large amount of data from Parse to build a view of statistics. However, in future, as data builds up, this may be a huge amount.
For example, 10,000 results. Even if I fetched in batches of 1000 at a time, this would result in 10 fetches. This could rapidly send me over Parse's limit of 30 requests per second, especially when several other chunks of data may need to be collected at the same time for other stats.
Any recommendations/tips/advice for this scenario?
You will also run into limits with the skip and limit query variables. And heavy lifting on a mobile device could also present issues for you.
If you can, you should pre-aggregate these statistics, perhaps once per day, so that you can simply request the details directly.
Alternatively, create a cloud code function to do some processing for you and return the results. Again, you may well run into limits here, so a cloud job may meet your needs better. In that case you may need to create a request object which is processed by the job, and then either poll for completion or send out push notifications on completion.
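To illustrate what pre-aggregating might mean, here is a tiny sketch that collapses raw rows into one summary row per day. The `(day, value)` shape and the `count`/`total` fields are assumptions for the example, not anything Parse prescribes.

```python
from collections import defaultdict


def aggregate_daily(events):
    """Collapse raw (day, value) events into one summary row per day."""
    summary = defaultdict(lambda: {"count": 0, "total": 0})
    for day, value in events:
        summary[day]["count"] += 1
        summary[day]["total"] += value
    return dict(summary)
```

The app would then fetch only the handful of summary rows instead of thousands of raw objects.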
I tried to make an Erlang in-memory datastore that would receive messages and add them to a list. Here's the current incarnation. The trouble is, I'm receiving about 200 messages per second and this easily exhausts the memory available.
Once a minute, I send a {write, Pid} message that should clear out and clean up this list, but it doesn't look like it's being garbage collected.
What am I doing wrong? I think I'm approaching this from the completely wrong direction...
datastore(Db) ->
    receive
        {put, Data} ->
            datastore(lists:concat([Data,Db]));
        {write, Responder} ->
            ScratchName = "ScratchFile.dat",
            {ok, ScratchDevice} = file:open(ScratchName,[write]),
            file:write(ScratchDevice,Db),
            ok = file:close(ScratchDevice),
            Responder ! {load, ScratchName},
            datastore([])
    end.
First spontaneous comment is that file:open with [write] will open the file and truncate it before you write to it. So every pass through the loop will overwrite any previous data. If the Responder is slow with its loading of the file, there could be data in the file that you did not expect.
Second reaction is that you don't have to do this buffering yourself. If you open the file with the option {delayed_write, Size, Delay}, and set Size and Delay to values that fit your purpose, you get precisely what you are trying to implement here by just writing all the time.
Third reaction is that you are probably doing the wrong thing if you use a file to communicate between different parts of your system. What are you attempting to do?
ps.
If you need a new random filename, you can easily generate one with erlang:now/0 and io_lib:format/2. As an added bonus they will sort in creation order.
This is a very wrong way of buffering in Erlang. Data structures such as ETS (http://www.erlang.org/doc/man/ets.html) have been designed to handle thousands and millions of in-memory Erlang terms with ease. Please do not use lists or queues for handling too much data. If one part of your code handles data which other parts of the application are supposed to consume, and you know that the consumers will work at a slower rate than the part that generates or receives the data, then you need a more robust way of buffering (ETS tables).
Another thing is that processes are usually a point of failure in a system. If a process is used to buffer or hold on to very essential data, even if that data is short-lived but critical to the system, what happens when the process exits or dies? ETS tables of type public have been designed so that they can provide data access to all processes, even across applications within the same VM. In this way, all processes can use the data, reading as much as they want (concurrently), while you ensure consistency by having a single writer/updater.
ETS Tables rarely fail in an application as compared to the frequency at which processes fail. Most recently, a method that helps us to redeem data in a failing ETS table has been introduced ( ets:give_away/3 ).
Another thing: in a comment above, you mentioned that you are working for a large company. Usually, with large teams, it's better to evaluate a number of options and run intensive tests against several of them, depending on the nature of the application you are developing. To avoid side effects, it's best to identify which data structures are best to use for what. For example, for in-memory storage capable of handling 200 messages per second, if tested properly, lists and files would fail against ETS tables.