RabbitMQ Integration Test and Threading - spring-amqp

I have written a RabbitMQ consumer by implementing the MessageListener interface and setting up a SimpleMessageListenerContainer. It's working well when I test it manually. Now I would like to write an integration test that:
Creates a message
Pushes the message to my RabbitMQ server
Waits while the message is consumed by my MessageListener implementation
Makes some assertions once everything is done
However, since my MessageListener runs in a separate thread, it makes unit testing difficult. Using a Thread.sleep in my test to wait for the MessageListener is unreliable; I need some kind of blocking approach.
Is setting up a response queue and using rabbitTemplate.convertSendAndReceive my only option? I wanted to avoid setting up response queues, since they won't be used in the real system.
Is there any way to accomplish this using only rabbitTemplate.convertAndSend and then somehow waiting for my MessageListener to receive the message and process it? Ideally, I would imagine something like this:
rabbitTemplate.convertAndSend("routing.key", testObject);
waitForListener() // Somehow wait for my MessageListener to consume the message
assertTrue(...)
assertTrue(...)
I know I could just pass a message directly to my MessageListener without connecting to RabbitMQ at all, but I was hoping to test the whole system if it is possible to do so. I plan on falling back to that solution if there is no way to accomplish my goal in a reasonably clean way.

There are several approaches, the easiest being to wrap your listener and pass in a CountDownLatch, which is counted down by the listener while the main test thread uses
assertTrue(latch.await(10, TimeUnit.SECONDS));
You can also pass back the actual message received so you can verify it is as expected.
Also see the integration test cases in the framework itself.
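Sketched out, the latch pattern looks like this (shown in Python for brevity, with a hand-rolled latch standing in for Java's CountDownLatch; in the Spring test, the wrapper would implement MessageListener and delegate to the real listener in the same way):

```python
import threading

class CountDownLatch:
    """Minimal stand-in for java.util.concurrent.CountDownLatch."""
    def __init__(self, count=1):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            if self._count > 0:
                self._count -= 1
                if self._count == 0:
                    self._cond.notify_all()

    def await_(self, timeout=None):
        # Returns True if the latch reached zero before the timeout.
        with self._cond:
            self._cond.wait_for(lambda: self._count == 0, timeout)
            return self._count == 0

class LatchingListener:
    """Wraps the real listener: delegates, records the message, then
    counts down so the blocked test thread wakes up."""
    def __init__(self, delegate, latch):
        self.delegate = delegate
        self.latch = latch
        self.received = None

    def on_message(self, message):
        try:
            self.delegate(message)
            self.received = message
        finally:
            self.latch.count_down()
```

The test thread then does the equivalent of assertTrue(latch.await(10, TimeUnit.SECONDS)) and can inspect `received` to verify the message content.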

Related

Worker Threads and Using Multiple Message Queue Software? Good Idea?

Problem: I have a program that will do a lot of post-processing on a file and then send it to ANOTHER web-service for more processing. Does my design smell, and is this the right way to ‘tackle the problem’?
Rails accepts the file and kicks off a resque_job to do the work
The resque job sends the work via REST to another web-service cluster (very slow.. does MORE work) && places the task to be monitored in a monitor_queue within RabbitMQ for completion. The Resque job will not wait for the other web-service to complete the task. It exits.
Are there some smell issues in my design, or are these gut reactions misguided?
Is it good design to have TWO message queues? The rationale is that RabbitMQ has a built-in method for creating worker_queues (they even give sample code). Resque_Job uses Redis, and seems to be the accepted method of having ‘delayed jobs’ with Rails.
What I like about RabbitMQ: it has round-robin tasking abilities (so all threads get tasked), and it guarantees work will not be removed from a QUEUE without a message acknowledgement.
Resque seems to be the primary suggested solution for launching delayed_jobs within Rails.
Followup: when performing the polling, I was thinking a simple worker_queue of just iterating through the entire queue with separate 'workers' makes the most sense? Do you agree?
I don't think this is a bad design. It's a type of Service Oriented Architecture, and even though you have separate queuing systems, they're completely separate applications and would only communicate through a specific interface, which has some pros and cons. I didn't quite understand the reasoning for using RabbitMQ, though. Also, a lot of new apps seem to be using Sidekiq; IMHO it is superior in every way to Resque.

Background Tasks in Spring (AMQP)

I need to handle a time-consuming and error-prone task (e.g., invoking a SOAP endpoint that will trigger the delivery of an SMS) whenever a given endpoint of my REST API is invoked, but I'd prefer not to make my users wait for that before sending a response back. Spring AMQP is already part of my stack, so I thought about leveraging it to establish a "work queue" and have a number of worker processes consuming from the queue and taking care of the "work units". I have, however, the following requirements:
A work unit is guaranteed to be delivered, and delivered to exactly one worker.
Should a work unit fail to be completed for any reason, it must be placed back in the queue so that another worker can pick it up later.
Work units survive server reboots and crashes. This is mandatory because I won't be using a DB of any kind to store them.
I know RabbitMQ and Spring AMQP can be configured in such a way that ensures these three requirements, but I've only ever used it to achieve RPC so I don't know much about anything other than that. Is there any example I might follow? What are some of the pitfalls to watch out for?
When creating queues, RabbitMQ gives you two options: transient or durable. Messages in a durable queue (published as persistent) survive a broker restart and remain available until you acknowledge them, and they won't expire if you do not give the queue a TTL. For starters you can enable the RabbitMQ management plugin and play around a little.
But if you really want to guarantee the safety of your messages against hard resets or hardware problems, I guess you need to use a RabbitMQ cluster.
See RabbitMQ Clustering; you can find the high-availability topic on the right side of the page.
This guy explains how to cluster.
By the way, I like beanstalkd too. You can make it write messages to disk, and they will be safe except for disk failures.
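To make the three requirements concrete before diving into broker configuration, here is a tiny in-memory model of the ack/requeue contract RabbitMQ implements for you (this is not Spring AMQP or RabbitMQ API, just an illustration of the at-least-once semantics):

```python
import collections

class AckQueue:
    """Toy model of at-least-once delivery: a message stays 'in flight'
    until acked; a nacked message goes back to the queue for another
    worker to pick up."""
    def __init__(self):
        self._ready = collections.deque()
        self._in_flight = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def get(self):
        if not self._ready:
            return None, None
        self._next_tag += 1
        body = self._ready.popleft()
        self._in_flight[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        # Work unit completed: permanently remove it.
        del self._in_flight[tag]

    def nack(self, tag):
        # Work unit failed: requeue it for another worker.
        self._ready.append(self._in_flight.pop(tag))
```

In RabbitMQ terms: publish persistent messages to a durable queue (requirement 3), consume with manual acknowledgements so delivery goes to exactly one worker (requirement 1), and reject/requeue on failure (requirement 2).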

Deploying an SQS Consumer

I am looking to run a service that will be consuming messages that are placed into an SQS queue. What is the best way to structure the consumer application?
One thought would be to create a bunch of threads or processes that run this:
import socket

def run(q, delete_on_error=False):
    while True:
        try:
            m = q.read(VISIBILITY_TIMEOUT, wait_time_seconds=MAX_WAIT_TIME_SECONDS)
            if m is not None:
                try:
                    process(m.id, m.get_body())
                except TransientError:
                    # Leave the message in the queue; it becomes visible
                    # again after the visibility timeout expires.
                    continue
                except Exception as ex:
                    log_exception(ex)
                    if not delete_on_error:
                        continue
                # Only delete once processing succeeded (or on error,
                # if delete_on_error is set).
                q.delete_message(m)
        except StopIteration:
            break
        except socket.gaierror:
            continue
Am I missing anything else important? What other exceptions do I have to guard against in the queue read and delete calls? How do others run these consumers?
I did find this project, but it seems stalled and has some issues.
I am leaning toward separate processes rather than threads to avoid the GIL. Is there some container process that can be used to launch and monitor these separately running processes?
There are a few things:
The SQS API allows you to receive more than one message with a single API call (up to 10 messages, or up to 256k worth of messages, whichever limit is hit first). Taking advantage of this feature allows you to reduce costs, since you are charged per API call. It looks like you're using the boto library - have a look at get_messages.
In your code right now, if processing a message fails due to a transient error, the message won't be able to be processed again until the visibility timeout expires. You might want to consider returning the message to the queue straight away. You can do this by calling change_visibility with 0 on that message. The message will then be available for processing straight away. (It might seem that if you do this then the visibility timeout will be permanently changed on that message - this is actually not the case. The AWS docs state that "the visibility timeout for the message the next time it is received reverts to the original timeout value". See the docs for more information.)
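Putting both suggestions together, the loop might be restructured as below. This is only a sketch: the queue object is injected, so anything with boto-style get_messages / delete_message methods (and messages exposing change_visibility) will do, and process / TransientError are the names from the question, carried over as assumptions:

```python
class TransientError(Exception):
    """Assumed from the question: a retryable processing failure."""

def consume_batch(q, process, batch_size=10, wait_seconds=20):
    """Fetch up to 10 messages per API call (cheaper), process each one,
    and make transiently-failed messages visible again immediately
    instead of waiting out the visibility timeout."""
    for m in q.get_messages(num_messages=batch_size,
                            wait_time_seconds=wait_seconds):
        try:
            process(m.id, m.get_body())
        except TransientError:
            m.change_visibility(0)  # back to the queue right away
            continue
        q.delete_message(m)
```

Because the queue is a parameter, the loop itself is unit-testable with a fake queue, which is handy when tuning the error handling.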
If you're after an example of a robust SQS message consumer, you might want to check out NServiceBus.AmazonSQS (of which I am the author). (C# - sorry, I couldn't find any python examples.)

How to test a class that connects to a FTP server?

I am developing a live-update feature for my application. So far, I have created almost all the unit tests, but I have no idea how to test a specific class that connects to an FTP server and downloads new versions.
To test this class, should I create an FTP test server and use it in my unit tests? If so, how can I make sure this FTP server is always consistent with my tests? Should I manually create every file I will need before the tests begin, or should I automate this in my test class (setup and teardown methods)?
This question also applies to unit testing classes that connect to any kind of server.
EDIT
I am already mocking my FTP class so I don't always need to connect to the FTP server in other tests.
Let me see if I got this right about what Warren said in his comment:
I would argue that once you're talking to a separate app over TCP/IP
we should call that "integration tests". One is no longer testing a
unit or a method, but a system.
When a unit test needs to communicate with another app (which can be an HTTP server or an FTP server), is it no longer a unit test but an integration test? If so, am I doing it wrong by trying to use unit-testing techniques to create this test? Is it correct to say that I should not unit test this class? It does make sense to me, because it seems to be a lot of work for a unit test.
In testing, the purpose is always first to answer the question: what is being tested, that is, the scope of the test.
So if you are testing an FTP server implementation, you'll have to create an FTP client.
If you are testing an FTP client, you'll have to create an FTP server.
You'll therefore have to reduce the extent of the test until you reach a unitary level.
It may be e.g. for your purpose:
Getting a list of the current files installed for the application;
Getting a list of the files available remotely;
Getting a file update;
Checking that a file is correct (checksum?);
and so on...
Each tested item needs some mocks and stubs. See this article about the difference between the two. In short (AFAIK), a stub is just an emulation object which always works, while a mock (which should be unique in each test) is the element which may change the test result (pass or fail).
For the exact purpose of an FTP connection, you may e.g. (when testing the client side) have some stubs which return a list of files, and a mock which will simulate several possible failures of the FTP server (timeout, connection lost, wrong content). Then your client side shall react as expected. Your mock may be a true FTP server instance, but one which behaves as expected to trigger all potential errors. Typically, each error shall raise an exception, which is to be tracked by the test units in order to pass/fail each test.
It is a bit difficult to write good testing code. A test-driven approach is a bit time-consuming at first, but it is always better in the long term. A good book is mandatory here, or at least some reference articles (like Martin Fowler's, as linked above). In Delphi, using interfaces and SOLID principles may help you write such code and create the stubs/mocks for your tests.
From my experience, every programmer can sometimes get lost in writing tests... good test writing can be more time-consuming than feature writing, in some circumstances... you are warned! Each test shall be seen as a feature, and its cost shall be evaluated: is it worth it? Is another test not more suitable here? Is my test decoupled from the feature it is testing? Is it not already tested? Am I testing my code, or a third-party/library feature?
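To make the stub/mock distinction concrete, here is an illustrative sketch (in Python rather than Delphi, purely to keep it short; the Updater, StubFtp and TimeoutFtp names are invented for this example): the updater talks to an injected FTP-like client, the stub always works and returns canned data, and the error-injecting double raises a timeout so the test can assert the client-side reaction:

```python
import socket

class Updater:
    """Downloads updates through an injected FTP-like client."""
    def __init__(self, ftp):
        self.ftp = ftp

    def files_to_update(self, installed):
        # Compare the remote listing against what is already installed.
        remote = self.ftp.list_files()
        return [f for f in remote if f not in installed]

    def fetch(self, name):
        try:
            return self.ftp.download(name)
        except socket.timeout:
            # The failure mode under test: report it, don't crash.
            return None

class StubFtp:
    """Stub: always works, returns canned data."""
    def list_files(self):
        return ["app.exe", "readme.txt"]

    def download(self, name):
        return b"new bytes"

class TimeoutFtp(StubFtp):
    """Error-injecting double: simulates an FTP server timing out."""
    def download(self, name):
        raise socket.timeout()
```

A unit test can then cover both the happy path (Updater with StubFtp) and the timeout path (Updater with TimeoutFtp) without any network connection.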
Out of the subject, but my two cents: HTTP/1.1 may be a better candidate nowadays than FTP, even for file updates. You can resume an HTTP connection, load HTTP content in chunks in parallel, and this protocol is more proxy-friendly than FTP. It is also much easier to host some HTTP content than FTP (and some FTP servers have known security issues). Most software updates are performed via HTTP/1.1 these days, not FTP (e.g., Microsoft products or most Linux repositories).
EDIT:
You may argue that you are doing integration tests when you use a remote protocol. It could make sense, but IMHO this is not the same.
To my understanding, integration tests take place when you let all your components work together as in the real application, then check that they are working as expected. My proposal about FTP testing is that you mock an FTP server in order to explicitly test all potential issues (timeout, connection or transmission error...). This is something else than integration tests: code coverage is much bigger, and you are only testing one part of the code, not the whole code integration. Using a remote connection does not by itself make it an integration test: this is still unit testing.
And, of course, integration and system tests shall be performed after unit tests. But FTP client unit tests can mock an FTP server, running it locally, while testing all the potential issues which may occur out in the real big world wide web.
If you are using Indy 10's TIdFTP component, then you can utilize Indy's TIdIOHandlerStream class to fake an FTP connection without actually making a physical connection to a real FTP server.
Create a TStream object, such as TMemoryStream or TStringStream, that contains the FTP responses you expect TIdFTP to receive for all of the commands it sends (use a packet sniffer to capture those beforehand to give you an idea of what you need to include), and place a copy of your update file in the local folder where you would normally download to. Create a TIdIOHandlerStream object and assign the TStream as its ReceiveStream, then assign that IOHandler to the TIdFTP.IOHandler property before calling Connect().
For example:
ResponseData := TStringStream.Create(
  '220 Welcome' + EOL +
  ... + // login responses here, etc...
  '150 Opening BINARY mode data connection for filename.ext' + EOL +
  '226 Transfer finished' + EOL +
  '221 Goodbye' + EOL);
IO := TIdIOHandlerStream.Create(FTP, ResponseData); // TIdIOHandlerStream takes ownership of ResponseData by default
FTP.IOHandler := IO;
FTP.Passive := False; // Passive=True does not work under this setup
FTP.Connect;
try
  FTP.Get('filename.ext', 'c:\path\filename.ext');
  // copy your test update file to 'c:\path\filename.ext'...
finally
  FTP.Disconnect;
end;
Unit tests are supposed to be fast, lightning fast. Anything that slows them down discourages you from wanting to run them.
They are also supposed to be consistent from one run to another. Testing an actual file transfer would introduce the possibility for random failures in your unit tests.
If the class you are testing does nothing more than wrap the API of the FTP library you are using, then you've reached one of the boundaries of your application and you don't need to unit test it. (Well, sometimes you do. It's called exploratory testing, but those tests are usually thrown away once you get your answer.)
If, however, there is any logic in the class, you should try to test it in isolation from the actual API. You do this by creating a wrapper for the FTP API. Then in your unit tests you create a test double that can stand in as a replacement for the wrapper. There are lots of variations that go by different names: stub, fake, mock object. The bottom line is you want to make sure your unit tests are isolated from any external influence. A unit test with sporadic behavior is less than useless.
Testing the actual file-transfer mechanism should be done in integration testing, which is usually run less frequently because it's slower. Even in integration testing you'll want to control the test environment as much as possible (e.g., testing against an FTP server on the local network that is configured to mimic the production server).
And remember, you'll never catch everything up front. Errors will slip through no matter how good the tests are. Just make sure when they do that you add another test to catch them the next time.
I would recommend either buying or checking out a copy of xUnit Test Patterns by Gerard Meszaros. It's a treasure trove of useful information on what/when/how to unit test.
Just borrow the FTP or HTTP Server demo that comes with whatever socket component set you prefer (Indy, ICS, or whatever). Instant test server.
I would put it into a tools folder to go with my unit tests. I might write some code that checks if TestFtpServer.exe is already live, and if not, launch it.
I would keep it out of my unit test app's process memory space, thus the separate process.
Note that by the time you get to FTP server operations, unit testing should really be called "integration testing".
I would not manually create files from my unit test. I would expect that my code checks out from version control and builds, as it is, from a batch file which runs my test program, which knows about a sub-folder called Tools that contains EXEs, and maybe folders called ServerData and LocalData that hold the data that starts out on the server and gets transferred down to my local unit-test app. Maybe you can hack your demo server to have it terminate a session part way through (when you want to test failures), but I still don't think you're going to get good coverage.
Note: if you're doing automatic updates, I think that no amount of unit testing is going to cut it. You need to deal with a lot of potential issues that are internet-related. What happens when your hostname doesn't resolve? What happens when a download gets part way through and fails? Automatic updating is not a great match for the capabilities of unit testing.
Write a couple of focused integration tests for the one component which knows how to communicate with an FTP server. For those tests you will need to start an FTP server before each test, put any files needed by the test on it, and shut the server down after the test.
With that done, in all other tests you won't use the component which really connects to an FTP server, but you will use a fake or mock version of it (which is backed by some in-memory data structure instead of real files and network sockets). That way you can write unit tests, which don't need an FTP server or network connection, for everything else except the FTP client component.
In addition to those tests, it might be desirable to also have some end-to-end tests which launch the whole program (unlike the component-level focused integration tests) and connect a real FTP server. End-to-end tests can't cover all corner cases (unlike unit tests), but they can help to solve integration issues.

Executing large numbers of asynchronous IO-bound operations in Rails

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time, so the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running the poller.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kick off large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine processes periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistence/reliable-delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though perhaps not at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further on the ActiveMessaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. With STOMP subscriptions, if you use client -> ack in order to prevent losing messages on interrupt, a given connection will only be sent a new message once the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole Rails env per poller, and one poller is one connection. But there is nothing magical about the current poller; I could imagine writing a poller as an EventMachine client implemented to create new connections to the broker and get many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master thread that forks many poller worker threads so as to get the benefit of the reduced memory footprint (much like passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know whether it would be any better at scaling to many workers; I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which although it seems like an awesome package, I suspect may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have the Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then fans the message out into thousands of small requests to SalesForce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already been performed, I imagine this should be fine.
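The flow above hinges on acknowledging the original message only after every sub-request has succeeded. A toy sketch of that "ack on completion" step, with the ack and summary publishing injected as callables (all names here are illustrative, not a specific AMQP client API):

```python
def handle_user_message(user_id, fetch_accounts, ack, publish_summary):
    """Fan out the per-user work; ack the original message and emit a
    summary only if everything succeeds. On failure the message is left
    unacked, so the broker eventually redelivers it to another worker."""
    try:
        accounts = list(fetch_accounts(user_id))
    except Exception:
        return False  # no ack: the message returns to the queue
    publish_summary({"user": user_id, "accounts": accounts})
    ack()
    return True
```

The key design point, as the answer says, is that acknowledging is deferred until the relevant processing is complete, so a crashed or failing worker never silently drops a user.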
If you're using (C)Ruby, try never to combine synchronous and asynchronous stuff in a single process. A process should either do everything via EventMachine, with no blocking code, or only talk to an EventMachine process via a message queue.
Also, asynchronous code is incredibly useful, but difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
Also check out "cramp" and "beanstalk".
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.
