I went through the documentation of the trace/3 BIF in Erlang. However, one observation I have made is that it cannot be used to trace the consumption of messages from the mailbox: the 'receive' flag only traces when messages are added to a process's mailbox.
Is there any way to trace events such as reading from the mailbox using the receive construct? If not, is there any reason why this isn't possible? It seems very strange that one can trace most kinds of events in a program, yet the reading of messages from a mailbox is not traceable.
There is no such tool. You can only hope for call tracing of the handling function. That is rather easy in OTP applications, since you can hook the handle_... callbacks (handle_call, handle_cast, handle_info).
So my scenario is the following:
Producer will send a message to a queue using AsyncRabbitTemplate#sendAndReceive
Consumer will process the message and send a reply to the reply queue
So far everything works fine while the producer is up and running: the message from the reply queue is received and everything is OK.
But when the producer is going down before all replies have been received there is no way to get them later. All pending replies will produce a warning "No pending reply - perhaps timed out:". I totally understand why this is happening when I look at the code.
Is there no way to have a persistent store of the information about incoming reply messages? Am I doing something completely wrong, or is it just not possible to cover my use case with spring-amqp?
So the question is what is the best way to receive replies from a fixed reply queue after a restart of the producer.
There's not currently any support for persisting pending reply information; typically with request/reply scenarios, if the requestor dies the reply makes no sense. But I can see there are scenarios where that might not be the case.
Instead of using the async template, you could simply use RabbitTemplate send() together with a listener container configured to handle and route the replies.
You would need to do your own request/reply correlation (e.g. with the correlationId header), persisting the pending reply correlations someplace.
Spring Integration provides a MetadataStore abstraction with several implementations that might be suitable.
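Not Spring code, but as a rough sketch of that manual-correlation pattern, here is the idea in Python with the pika client; the queue names, the shelve-backed pending store, and the handle_reply callback are all illustrative assumptions:

import uuid
import shelve

import pika  # assumed AMQP client, for illustration only

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="requests", durable=True)
channel.queue_declare(queue="replies", durable=True)

# Persist pending correlations so they survive a producer restart
# (Spring Integration's MetadataStore would play this role).
pending = shelve.open("pending-replies.db")

def send_request(body):
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = body  # record the pending reply first
    pending.sync()
    channel.basic_publish(
        exchange="",
        routing_key="requests",
        properties=pika.BasicProperties(correlation_id=correlation_id,
                                        reply_to="replies"),
        body=body)

def on_reply(ch, method, properties, body):
    # Correlate the reply with its request via the correlationId header.
    if pending.pop(properties.correlation_id, None) is not None:
        handle_reply(body)  # assumed application callback
    ch.basic_ack(method.delivery_tag)

channel.basic_consume(queue="replies", on_message_callback=on_reply)
channel.start_consuming()

Because both the durable reply queue and the correlation store survive a restart, replies that arrive while the producer is down can still be matched up afterwards.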
I am looking to run a service that will be consuming messages that are placed into an SQS queue. What is the best way to structure the consumer application?
One thought would be to create a bunch of threads or processes that run this:
import socket

# q is a boto (v2) SQS Queue; VISIBILITY_TIMEOUT, MAX_WAIT_TIME_SECONDS,
# process(), log_exception() and TransientError are defined elsewhere.
def run(q, delete_on_error=False):
    while True:
        try:
            m = q.read(VISIBILITY_TIMEOUT, wait_time_seconds=MAX_WAIT_TIME_SECONDS)
            if m is not None:
                try:
                    process(m.id, m.get_body())
                except TransientError:
                    continue  # leave the message; retry after visibility timeout
                except Exception as ex:
                    log_exception(ex)
                    if not delete_on_error:
                        continue
                q.delete_message(m)
        except StopIteration:
            break
        except socket.gaierror:
            continue  # transient DNS failure; retry the read
Am I missing anything else important? What other exceptions do I have to guard against in the queue read and delete calls? How do others run these consumers?
I did find this project, but it seems stalled and has some issues.
I am leaning toward separate processes rather than threads to avoid the GIL. Is there some container process that can be used to launch and monitor these separate running processes?
There are a few things:
The SQS API allows you to receive more than one message with a single API call (up to 10 messages, or up to 256k worth of messages, whichever limit is hit first). Taking advantage of this feature allows you to reduce costs, since you are charged per API call. It looks like you're using the boto library - have a look at get_messages.
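For example, a sketch reusing the queue object and constants from the question's code:

# One API call fetches up to 10 messages instead of one.
messages = q.get_messages(num_messages=10,
                          visibility_timeout=VISIBILITY_TIMEOUT,
                          wait_time_seconds=MAX_WAIT_TIME_SECONDS)
for m in messages:
    process(m.id, m.get_body())
    q.delete_message(m)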
In your code right now, if processing a message fails due to a transient error, the message won't be able to be processed again until the visibility timeout expires. You might want to consider returning the message to the queue straight away. You can do this by calling change_visibility with 0 on that message. The message will then be available for processing straight away. (It might seem that if you do this then the visibility timeout will be permanently changed on that message - this is actually not the case. The AWS docs state that "the visibility timeout for the message the next time it is received reverts to the original timeout value". See the docs for more information.)
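A sketch of that, inside the question's processing loop (assuming boto2's Message.change_visibility; the connection-level change_message_visibility call is equivalent):

try:
    process(m.id, m.get_body())
except TransientError:
    # Make the message receivable again immediately instead of waiting
    # out the visibility timeout; per the AWS docs, the original timeout
    # still applies the next time the message is received.
    m.change_visibility(0)
    continue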
If you're after an example of a robust SQS message consumer, you might want to check out NServiceBus.AmazonSQS (of which I am the author). (C# - sorry, I couldn't find any python examples.)
I have a system that wraps RabbitMQ using Erlang and the Erlang client. We have the occasional situation where a subscriber goes down and messages queue up. We will be implementing a dead-letter queue in the near future, but I would like to implement a tool in the meantime to bind to a given queue and PULL all messages. I can then push them off somewhere else and replay them when the subscriber comes back online. However, I am having a hard time determining the best way to do this from the Rabbit tutorials/docs, mainly because the tutorials are a bit lacking for Erlang clients.
Does anybody have experience with this or something similar?
I think the best thing to do is to declare the queue as not auto-delete. That way the queue stays alive when the subscriber goes down. The exchange will continue to push messages to the queue, which will store them until the subscriber comes back up and starts reading again.
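The question is about the Erlang client, but as a rough illustration of that idea (the declaration plus a drain loop), here is a sketch with the Python pika client; the queue name and stash_for_replay are assumptions:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# auto_delete=False keeps the queue alive when the subscriber disconnects;
# durable=True additionally lets it survive a broker restart.
channel.queue_declare(queue="work", durable=True, auto_delete=False)

# Pull everything currently queued so it can be stored and replayed later.
while True:
    method, properties, body = channel.basic_get(queue="work", auto_ack=False)
    if method is None:
        break  # queue drained
    stash_for_replay(body)  # assumed application function
    channel.basic_ack(method.delivery_tag)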
I have a gen_server module that logs data to a file when a client process sends it data. What happens when two client processes send data to this module at the same time? Will the file operations conflict with each other? The Erlang documentation is frustratingly unclear here.
Every Erlang process has a mailbox (a message queue), and the process fetches and handles those messages one at a time.
In your example, if two clients call the gen_server at the same time, each call becomes a message in the gen_server process's queue, and the gen_server handles those messages one by one. So there is no need to worry about a conflict.
But if one process has to handle too many messages from other processes, you will need to think about the capacity of the process and optimize the design, or it will become a bottleneck.
The gen_server runs in a separate process from your client process so when you do a call/cast to it you are actually sending messages to the server process.
All messages are placed in a process's message queue and processes handle their messages one by one. If a message arrives while a process is busy, it is placed in the message queue. So log messages arriving concurrently will never interfere with each other, as they will be processed sequentially.
This is not a property of the gen_server as such but a general property of all processes in erlang, which is why no mention of this is made in the gen_server documentation.
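Not Erlang, but as a rough analogy of that guarantee in Python: a single worker thread draining a queue serializes writes exactly the way a gen_server's mailbox does, no matter how many threads log concurrently (the file name and helper functions are made up for illustration):

import threading
import queue

log_queue = queue.Queue()

def logger(path):
    # Single consumer: entries are handled strictly one at a time,
    # which is the same guarantee a gen_server's mailbox gives.
    with open(path, "a") as f:
        while True:
            line = log_queue.get()
            if line is None:
                break  # shutdown sentinel
            f.write(line + "\n")
            f.flush()

worker = threading.Thread(target=logger, args=("app.log",), daemon=True)
worker.start()

# Any number of threads may call this concurrently; the queue serializes
# the entries, so writes to the file never interleave.
def log(msg):
    log_queue.put(msg)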
gen_server just handles requests in the order they were made, no matter whether they came from one process or many.
In the case of writing to a log, there is no reason to worry about race conditions.
Fortunately, the source for OTP is readily available at github, but the short answer is that gen_server runs in a loop, answering requests in order received, with no priority of one type (handle_cast, handle_call, or handle_info) over another.
Using handle_call can potentially be an issue, as the gen_server process has to return from it before it can deal with the next cast/call/info in the queue. For example, inside a handle_call, avoid gen_server:call to self(): the server cannot answer a call to itself, so it will deadlock until the call times out.
I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time. So, the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running the poller.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kick off large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine processes periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistence/reliable-delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though perhaps not at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further on the ActiveMessaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. With STOMP subscriptions, if you use client ack (in order to prevent losing messages on interrupt), a given connection will only be sent a new message once the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole Rails env per poller, and one poller is one connection. But there is nothing magical about the current poller; I could imagine writing a poller as an EventMachine client that is implemented to create new connections to the broker and get many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master that forks many poller workers, so as to get the benefit of the reduced memory footprint (much like Passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which, although it seems like an awesome package, I suspect may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then spins the message off into thousands of small requests to Salesforce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already been performed, I imagine this should be fine.
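As a minimal sketch of that ack-on-completion flow with the Python pika client (the queue name and the fetch/publish helpers are assumptions):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="user-info-requests", durable=True)

def on_request(ch, method, properties, body):
    user_id = body.decode()
    try:
        # Fan out the many small Salesforce queries (assumed helper).
        results = fetch_salesforce_accounts(user_id)
    except Exception:
        # No ack: requeue so another worker eventually retries it.
        ch.basic_nack(method.delivery_tag, requeue=True)
        return
    publish_summary(ch, user_id, results)  # assumed helper
    # Ack only after every sub-request has completed successfully.
    ch.basic_ack(method.delivery_tag)

channel.basic_consume(queue="user-info-requests",
                      on_message_callback=on_request)
channel.start_consuming()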
If you're using (C)Ruby, try to never combine synchronous and asynchronous stuff in a single process. A process should either do everything via Eventmachine, with no code blocking, or only talk to an Eventmachine process via a message queue.
Also, writing asynchronous code is incredibly useful, but also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
Also check out "cramp" and "beanstalk".
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.