We need to process a large number of messages stored in SQS (the messages originate from the Amazon store, and SQS is the only place we can save them to) and save the results to our database. The problem is that SQS can only return 10 messages at a time. Considering we can have up to 300,000 messages in SQS, even if requesting and processing 10 messages takes little time, the whole process takes forever, with the main culprit being the actual requesting and receiving of the messages from SQS.
We're looking for a way to speed this up. The intended result is to dump the processed messages into our database. The process would probably run a few times per day (and the number of messages per run would likely be smaller in that scenario).
Like Michael-sqlbot wrote, parallel requests were the solution. By rewriting our code to use async and making 10 requests at the same time, we managed to reduce the execution time to something much more reasonable.
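For illustration only (not our actual code): a rough sketch of the same idea using the AWS SDK for Java, firing several ReceiveMessage calls in parallel, each asking for the 10-message maximum. The queue URL and the processing step are placeholders.

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSqsReader {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder
        ExecutorService pool = Executors.newFixedThreadPool(10);

        // Issue 10 ReceiveMessage calls concurrently instead of one at a time.
        List<CompletableFuture<Void>> batches = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            batches.add(CompletableFuture.runAsync(() -> {
                ReceiveMessageRequest req = new ReceiveMessageRequest(queueUrl)
                        .withMaxNumberOfMessages(10)   // SQS hard limit per call
                        .withWaitTimeSeconds(1);       // tune as needed
                for (Message m : sqs.receiveMessage(req).getMessages()) {
                    process(m);                                        // save to the database, etc.
                    sqs.deleteMessage(queueUrl, m.getReceiptHandle()); // delete only after a successful save
                }
            }, pool));
        }
        CompletableFuture.allOf(batches.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }

    private static void process(Message m) { /* placeholder for the real work */ }
}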
I guess it's because I rarely use multithreading directly in my job that I hadn't thought of using it to solve this problem.
I want to create an SQS queue from code whenever it is required to send messages, and delete it after all messages are consumed.
I just wanted to know if there is some delay required between creating an SQS queue from Java code and then sending messages to it.
Thanks.
Virendra Agarwal
You'll have to try it and make observations. SQS is a distributed system, so there is a possibility that a queue might not be immediately usable, though I did not find a direct documentation reference for this.
Note the following:
If you delete a queue, you must wait at least 60 seconds before creating a queue with the same name.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_CreateQueue.html
This means your names will always need to be different, but it also implies something about the internals of SQS -- deleting a queue is not an instantaneous process. The same might be true of creation, though that is not necessarily the case.
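If you do create queues on demand, the simplest way to live with that restriction is to never reuse a name at all. A hypothetical sketch with the AWS SDK for Java (the name prefix is arbitrary):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

import java.util.UUID;

public class TransientQueue {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        // A fresh name per run sidesteps the 60-second recreate restriction entirely.
        String queueName = "jobs-" + UUID.randomUUID();
        String queueUrl = sqs.createQueue(queueName).getQueueUrl();
        System.out.println("Created " + queueUrl);
        // ... send, consume, and eventually deleteQueue(queueUrl) once you decide it is drained
    }
}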
Also, there is no way to know with absolute certainty that a queue is truly empty. A long poll that returns no messages is a strong indication that there are no messages remaining, as long as there are also no messages in flight (consumed but not yet deleted -- these return to visibility if the consumer explicitly resets their visibility, or if it mishandles an exception and fails to delete them before the visibility timeout expires).
However, GetQueueAttributes does not provide a fail-safe way of assuring a queue is truly empty, because many of the counter attributes are only approximate numbers of messages (visible, in flight, etc.). Again, this is related to the distributed architecture of SQS. Certain rare internal failures could potentially cause messages to be stranded internally, only to appear later. The significance of this depends on the importance of the messages and the life cycle of the queue, and the risk of any such issue seems -- to me -- higher when a queue does not have an indefinite lifetime (i.e. when the plan is to delete the queue once it is "empty"). This is not to imply that SQS is unreliable, only to make the point that any and all systems eventually behave unexpectedly, however rare or unlikely that may be.
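If you still want a best-effort emptiness check before deleting a queue, something along these lines is about as good as it gets -- the counters are approximate, so treat a zero result as a strong hint rather than proof. This sketch uses the AWS SDK for Java; the queue URL is a placeholder.

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.GetQueueAttributesRequest;

import java.util.Map;

public class QueueDrainCheck {
    // Best-effort only: a zero on every counter is a hint, not a guarantee, that the queue is empty.
    static boolean looksEmpty(AmazonSQS sqs, String queueUrl) {
        Map<String, String> attrs = sqs.getQueueAttributes(
                new GetQueueAttributesRequest(queueUrl)
                        .withAttributeNames("ApproximateNumberOfMessages",
                                            "ApproximateNumberOfMessagesNotVisible",
                                            "ApproximateNumberOfMessagesDelayed"))
                .getAttributes();
        return "0".equals(attrs.get("ApproximateNumberOfMessages"))
            && "0".equals(attrs.get("ApproximateNumberOfMessagesNotVisible"))
            && "0".equals(attrs.get("ApproximateNumberOfMessagesDelayed"));
    }

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-temp-queue"; // placeholder
        System.out.println("Queue looks empty: " + looksEmpty(sqs, queueUrl));
    }
}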
I have a web site where users can upload a PDF and convert it to a Word doc.
It works nicely, but sometimes (5-6 times per hour) users have to wait longer than usual for the conversion to take place...
I use ASP.NET MVC and the flow is:
- User uploads file -> get the stream and convert it to Word -> save the Word file as a temp file -> return the URL to the user
I am not sure whether I have to convert this flow to asynchronous. Basically, my flow is sequential now, BUT I have about 3-5 requests per second and the CPU is dual core with 4 GB of RAM.
As far as I know, maxConcurrentRequestsPerCPU is 5000 and the default value of Threads Per Processor Limit is 25, so these default settings should be more than fine, right?
Then why does my web app still have these "waits" sometimes? Are there any IIS settings I need to change from the defaults, or should I just make my synchronous conversion method async?
PS: The conversion itself takes between 1 second and 40-50 seconds depending on the PDF file size.
UPDATE: What is not very clear to me is this: if a user uploads a file and the conversion is long, shouldn't only the current request "suffer" because of it? The next request is independent and is handled on a different thread, so there should be no wait there, should there?
There are a couple of things that must be defined clearly here. An async(hronous) method and an asynchronous flow are not the same thing, at least as far as I understand.
An asynchronous method (using Task, usually also leveraging the async/await keywords) will work in the following way:
The execution starts on thread t1 until it reaches an await
The (potentially) long operation will not take place on thread t1 - sometimes not even on an app thread at all, leveraging IOCP (I/O completion ports).
Thread t1 is free and released back to the thread pool and is ready to service other requests if needed
When the (potentially) long operation returns, a thread is taken from the thread pool (it could even be the same t1 or, more probably, another one) and the rest of the code execution resumes from the last await encountered
The rest of the code executes
There are a couple of things to note here:
a. The client is blocked during the whole process. The eventual switching of threads and so on happens only on the server.
b. This approach is mainly designed to alleviate an unwanted condition called 'thread starvation'. It is not meant to speed up the total client waiting time, and it usually doesn't.
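Your question is about C#, but the mechanics listed above are not specific to it. Purely as a rough analogue (Java NIO completion handlers instead of async/await; the file name is a placeholder), the sketch below shows the same idea: the calling thread hands the read to the OS and moves on, and the continuation later runs on a pool thread.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AsyncReadSketch {
    public static void main(String[] args) throws Exception {
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                Paths.get("upload.pdf"), StandardOpenOption.READ); // placeholder file
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);

        // The read is handed off to the OS; no application thread sits blocked on it.
        channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer bytesRead, ByteBuffer buf) {
                // Runs later, on a pool thread -- the analogue of the code after 'await'.
                System.out.println("Read " + bytesRead + " bytes on " + Thread.currentThread().getName());
            }

            @Override
            public void failed(Throwable exc, ByteBuffer buf) {
                exc.printStackTrace();
            }
        });

        // The original thread is free immediately -- it could service another request here.
        System.out.println("Caller continues on " + Thread.currentThread().getName());
        Thread.sleep(1000); // keep the demo process alive long enough for the callback
    }
}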
As far as I understand, an asynchronous flow would mean, at least in this case, that after the user requests the document conversion, the client (i.e. the client's browser) quickly receives a response informing the user that this potentially long process has started on the server, that they should be patient, and that the current response page might provide progress feedback.
In your case I recommend the second approach because the first approach would not help at all.
Of course this will not be easy. You need to emulate a queue, you need a processing agent, and you need an eviction policy (most probably enforced by the same agent, if you don't want a second agent).
This would work along the following lines (a rough sketch follows below):
a. The end user submits a file, the web server receives it
b. The web server places it in the queue and receives a job number
c. The web server returns the user a response with the job number (let's say an HTML page with a polling mechanism that would periodically receive progress from the server)
d. The agent would start processing the document when it gets the chance (i.e. finishes other work) and update its status in a common place for the web server to pick up this information
e. The web server would receive calls from the HTML response asking for the status of the job and would find out that the job is complete and offer a download link or start downloading it directly.
This can be refined in some ways:
instead of the client polling the server, websockets or long polling (for example SignalR covers both) could be used
many processing agents could be used instead of one if the hardware configuration makes sense
The queue can be implemented with a simple RDBMS; Remus Rușanu has a nice article about this.
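To make steps a-e a bit more concrete, here is a deliberately minimal, in-memory sketch (Java used only for illustration; in your case submit/status would be controller actions, and a production version would persist the queue, e.g. in the RDBMS mentioned above):

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ConversionJobQueue {
    enum Status { QUEUED, PROCESSING, DONE }

    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    private final Map<Long, Status> statuses = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Step b: the web server enqueues the job and gets a job number back.
    public long submit(byte[] pdfBytes) {
        long id = nextId.incrementAndGet();
        statuses.put(id, Status.QUEUED);
        queue.add(id);      // a real version would also persist pdfBytes somewhere
        return id;          // step c: returned to the client in the response
    }

    // Step e: the polling endpoint asks for the job status.
    public Status status(long jobId) {
        return statuses.get(jobId);
    }

    // Step d: the processing agent, a background worker draining the queue.
    public void startAgent() {
        Thread agent = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    long jobId = queue.take();
                    statuses.put(jobId, Status.PROCESSING);
                    convertPdfToWord(jobId);           // the long-running conversion
                    statuses.put(jobId, Status.DONE);  // visible to the next status poll
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        agent.setDaemon(true);
        agent.start();
    }

    private void convertPdfToWord(long jobId) { /* placeholder for the real conversion */ }
}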
I am trying to use RabbitMQ for a distributed system that would work something like:
a producer puts a JSON-formatted list of order ids into a queue
several consumers pull from that queue, do the business logic with those order ids, and the result (also JSON formatted) is put into another queue
from the second queue, another consumer takes the data and passes it back to the caller
I am still very new to RabbitMQ and I am wondering if this model is the right approach, given that the data should come back as fast as possible (sometimes in a matter of seconds, 5 at most), so there are real-time requirements.
Also, how large can a message passed to a queue be? The JSON that the producer will get back will be fairly large, based on what the consumer does.
Thanks for any ideas!
See page 47 in this presentation (InfoQ) for a great comparison between different messaging formats.
There's nothing wrong with the design you suggested.
The slight wrinkle is that enforcing "real time requirements" isn't straightforward. For instance, it's not currently possible to expire messages within a queue, so this would need to be handled by the clients when consuming messages.
The total size of messages in RabbitMQ <= 1.8.1 was bounded by the amount of available RAM. As of 2.0.0, it's bounded by the amount of available disk space (i.e. Rabbit will page messages to disk if it's running low on memory). Individual message sizes are recorded as 32-bit integers (IIRC), so an individual message cannot be larger than ~4 GB; if this is a problem, consider saving the JSONs to network storage and passing a reference to them in the messages. Other than this, there aren't any constraints.
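As for the design itself, a bare-bones sketch of the consumer side with the RabbitMQ Java client might look like the following (host, queue names and the business logic are placeholders; nothing here is specific to your data):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;

public class OrderWorker {
    private static final String REQUEST_QUEUE = "order.requests";  // placeholder names
    private static final String RESULT_QUEUE  = "order.results";

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.queueDeclare(REQUEST_QUEUE, true, false, false, null);
        channel.queueDeclare(RESULT_QUEUE, true, false, false, null);

        // Consume order-id lists, do the business logic, publish the JSON result to the second queue.
        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String orderIdsJson = new String(delivery.getBody(), StandardCharsets.UTF_8);
            String resultJson = handleOrders(orderIdsJson); // placeholder business logic
            channel.basicPublish("", RESULT_QUEUE, null, resultJson.getBytes(StandardCharsets.UTF_8));
        };
        channel.basicConsume(REQUEST_QUEUE, true, onMessage, consumerTag -> { });
    }

    private static String handleOrders(String orderIdsJson) {
        return "{\"status\":\"done\"}"; // placeholder
    }
}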
My application is basically a content-based router which will route MMS events.
The logger I am using is error_logger, the one that comes with the OTP framework, in SASL mode.
The issue is:
I am using a client to generate MMS events with default values. This client (written in Java) can send a high load of events in multiple threads.
I am sending 100 events in 10 threads (each thread sending 10 MMS events) to my router written in Erlang/OTP.
The problem is that when such a high load is received by my router, my logger hangs, i.e. it stops updating my log file. But the router is still able to route the events.
The conclusions that I have come up with are:
A scheduling problem in Erlang when such a high load of events is received (a separate process for each event).
A very unlikely deadlock state.
It might be due to sending events in multiple threads rather than sending them sequentially. But I guess a router will be connected to multiple service provider boxes, so I thought of sending events in threads.
Can anybody help me demystify the problem?
You already have a good answer, but I'll add to the discussion.
The error_logger is by default using cached write operations to disk. So one possibility is that you don't really notice this while under low load, but under high load your writes get stuck in the cache for a while.
On a side note: there should be no problem having multiple threads doing calls to Erlang.
Another way of testing this is to add your own logger to error_logger, and see what happens. Possibly printing to the shell or something else that is "fast".
Which version of Erlang are you using? Prior to R14A (R13B4 maybe?), there was a performance penalty when you invoked a selective receive when the message queue contained a lot of messages. This behaviour meant that in a process that receives lots of messages (error_logger being the canonical example), if it was barely keeping up with the load then a small spike in load could cause the cost of processing to spike up and stay there as the new processing cost was higher than the process could bear. This problem has been solved in R14A.
Secondly - why are you sending a high volume of events/calls/logs to a text logger? Formatting strings for output to a human readable log file is a lot more expensive than using a binary disk_log for instance. Reducing the cost of logging will help, but reducing the volume of logs will help even more. Maybe investigate exactly why you need to log these things and see if you can't record them another (less expensive) way.
Problems with error_logger are often symptoms of some other overload problem. Try looking at the message queue sizes for all your processes when this problem occurs and see if something else is backed up too. The following Erlang shell code might help:
%% Message queue length of every live process on the node.
[ { P, element(2, process_info(P, message_queue_len)) }
|| P <- erlang:processes(), is_process_alive(P) ].
I'm designing a .NET interface for sending and receiving an HL7 message and noticed on this forum there are a few people with this experience.
My question is: would anyone be able to share their experience of how long it can take to get a message response back from a hospital HL7 server (particularly when requesting patient demographics) -- seconds, minutes, hours?
My dilemma is whether to design my application to make the user wait for the message to come back.
(Sorry if this is a little off topic; it's still kinda programming related. I searched the web for HL7 forums but got stuck, so if anyone knows of any, please let me know.)
cheers,
Jason
In my experience, you should receive an ACK or NAK back within a few seconds. The receiving application shouldn't do something like making you wait while it performs operations on the message. We have timeouts set to 30 seconds, and we almost never wait that long for a response.
This is quite dependent on the kind of HL7 message sent; typically, messages like ADTs are sent as essentially updates to the server and are acknowledged almost immediately if the hospital system is behaving well. This will result in a protocol-level acknowledgement, which indicates that the peer has received the message but not necessarily processed it yet.
Typically, most systems will employ a broker or message queue in their integration engines so you get your ack almost immediately.
Other messages like lab request messages may actually send another non-ack message back which contains the information requested. These requests can take longer.
You can check with the peer you're communicating with to see what integration engine they are using, and whether a queue sits on that end, which would help ensure the response times are short.
In the HL7 integration tool I work on, we use queues for inbound data so we can respond immediately. For our outbound connections, 10-second timeouts are the default and seem to work fine for most of our customers.
When sending a Query type event in HL7, it could take a number of seconds to get the proper response back. You also need to code for the possibility that you will never get a response back, and the possibility that connected systems "don't do" queries.
Most HL7 nets that I have worked on assume that all interested systems are listening for demographic updates at all times. Usually, receiving systems process these updates into a patient database that documents both the Person and Encounter (Stay) information on the fly.
In my location, my system usually gets about 10-20 thousand messages a day, most of which are patient demographic updates.
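On the point about coding for a missing response: whichever library you end up using, the key is a receive timeout on the connection. A hand-rolled, hypothetical sketch over MLLP in Java (host, port and the query message are placeholders; a real project would normally use an HL7 library rather than raw sockets):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class Hl7QueryClient {
    public static void main(String[] args) throws Exception {
        String hl7Query = "MSH|^~\\&|MYAPP|MYSITE|HIS|HOSP|202401011200||QRY^A19|123|P|2.3\r"; // placeholder
        try (Socket socket = new Socket("hl7.hospital.example", 2575)) {                        // placeholder endpoint
            socket.setSoTimeout(30_000); // give up waiting for a response after 30 seconds

            OutputStream out = socket.getOutputStream();
            // MLLP framing: <VT> message <FS><CR>
            out.write(0x0B);
            out.write(hl7Query.getBytes(StandardCharsets.US_ASCII));
            out.write(0x1C);
            out.write(0x0D);
            out.flush();

            try {
                InputStream in = socket.getInputStream();
                StringBuilder response = new StringBuilder();
                int b;
                while ((b = in.read()) != -1 && b != 0x1C) { // read until the end-of-block byte
                    if (b != 0x0B) response.append((char) b);
                }
                System.out.println("Response/ACK: " + response);
            } catch (SocketTimeoutException e) {
                // No response within the timeout -- treat as "query not answered" rather than blocking the user.
                System.out.println("No response from the peer; surface an error or retry asynchronously.");
            }
        }
    }
}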
It depends on whether the response is generated automatically by a system or only after a user does something on that system. An automatic response might take less than a second, depending of course on the processing done by the system and its current workload. If the system is not too busy and the processing is just a couple of queries and verification of some conditions, then, considering network delays, the response time should be a few seconds or less.