Test using StepVerifier blocks when using Spring WebClient with retry - project-reactor

EDIT: a repository with all the code is at https://github.com/wujek-srujek/reactor-retry-test.
I have the following Spring WebClient code to POST to a remote server (Kotlin code without imports for brevity):
private val logger = KotlinLogging.logger {}
@Component
class Client(private val webClient: WebClient) {
    companion object {
        const val maxRetries = 2L
        val firstBackOff = Duration.ofSeconds(5L)
        val maxBackOff = Duration.ofSeconds(20L)
    }
    fun send(uri: URI, data: Data): Mono<Void> {
        return webClient
            .post()
            .uri(uri)
            .contentType(MediaType.APPLICATION_JSON)
            .bodyValue(data)
            .retrieve()
            .toBodilessEntity()
            .doOnSubscribe {
                logger.info { "Calling backend, uri: $uri" }
            }
            .retryExponentialBackoff(maxRetries, firstBackOff, maxBackOff, jitter = false) {
                logger.debug { "Call to $uri failed, will retry (#${it.iteration()} of max $maxRetries)" }
            }
            .doOnError {
                logger.error { "Call to $uri with $maxRetries retries failed with $it" }
            }
            .doOnSuccess {
                logger.info { "Call to $uri succeeded" }
            }
            .then()
    }
}
(It returns an empty Mono as we don't expect an answer, nor do we care about it.)
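To make the timing concrete, here is a small sketch in plain Java (no Reactor involved) of the delays those settings would produce. It assumes the back-off starts at firstBackOff, doubles on every retry and is capped at maxBackOff, with no jitter; that is my reading of how the exponential back-off behaves, not something taken from the library source. The class and method names are made up for the illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffSchedule {

    /**
     * Delays before each retry, ASSUMING doubling from `first` and a cap at `max`.
     * This mirrors the idea of an exponential back-off; it is not Reactor's code.
     */
    static List<Duration> delays(long maxRetries, Duration first, Duration max) {
        List<Duration> result = new ArrayList<>();
        Duration next = first;
        for (long i = 0; i < maxRetries; i++) {
            // Cap the delay at the configured maximum.
            result.add(next.compareTo(max) > 0 ? max : next);
            next = next.multipliedBy(2);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same numbers as the companion object above: 2 retries, 5 s first, 20 s max.
        System.out.println(delays(2, Duration.ofSeconds(5), Duration.ofSeconds(20)));
    }
}
```

Under that assumption the two retries would be due 5 and 10 seconds after the respective failures, so the 10 and 20 second waits in the test below are generous rather than exact.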
I would like to test 2 cases, and one of them is giving me headaches, namely the one in which I want to test that all the retries have been fired. We are using MockWebServer (https://github.com/square/okhttp/tree/master/mockwebserver) and the StepVerifier from reactor-test. (The test for success is easy and doesn't need any virtual time scheduler magic, and works just fine.) Here is the code for the failing one:
@JsonTest
@ContextConfiguration(classes = [Client::class, ClientConfiguration::class])
class ClientITest @Autowired constructor(
    private val client: Client
) {
    lateinit var server: MockWebServer
    @BeforeEach
    fun `init mock server`() {
        server = MockWebServer()
        server.start()
    }
    @AfterEach
    fun `shutdown server`() {
        server.shutdown()
    }
    @Test
    fun `server call is retried and eventually fails`() {
        val data = Data()
        val uri = server.url("/server").uri()
        val responseStatus = HttpStatus.INTERNAL_SERVER_ERROR
        repeat((0..Client.maxRetries).count()) {
            server.enqueue(MockResponse().setResponseCode(responseStatus.value()))
        }
        StepVerifier.withVirtualTime { client.send(uri, data) }
            .expectSubscription()
            .thenAwait(Duration.ofSeconds(10)) // wait for the first retry
            .expectNextCount(0)
            .thenAwait(Duration.ofSeconds(20)) // wait for the second retry
            .expectNextCount(0)
            .expectErrorMatches {
                val cause = it.cause
                it is RetryExhaustedException &&
                    cause is WebClientResponseException &&
                    cause.statusCode == responseStatus
            }
            .verify()
        // assertions
    }
}
I am using withVirtualTime because I don't want the test to sit through the real back-off delays.
The problem is that the test blocks indefinitely. Here is the (simplified) log output:
okhttp3.mockwebserver.MockWebServer : MockWebServer[51058] starting to accept connections
Calling backend, uri: http://localhost:51058/server
MockWebServer[51058] received request: POST /server HTTP/1.1 and responded: HTTP/1.1 500 Server Error
Call to http://localhost:51058/server failed, will retry (#1 of max 2)
Calling backend, uri: http://localhost:51058/server
MockWebServer[51058] received request: POST /server HTTP/1.1 and responded: HTTP/1.1 500 Server Error
Call to http://localhost:51058/server failed, will retry (#2 of max 2)
As you can see, the first retry works, but the second one blocks. I don't know how to write the test so that it doesn't happen. To make matters worse, the client will actually use jitter, which will make the timing hard to anticipate.
The following test using StepVerifier but without WebClient works fine, even with more retries:
@Test
fun test() {
    StepVerifier.withVirtualTime {
        Mono
            .error<RuntimeException>(RuntimeException())
            .retryExponentialBackoff(5,
                Duration.ofSeconds(5),
                Duration.ofMinutes(2),
                jitter = true) {
                println("Retrying")
            }
            .then()
    }
        .expectSubscription()
        .thenAwait(Duration.ofDays(1)) // doesn't matter
        .expectNextCount(0)
        .expectError()
        .verify()
}
Could anybody help me fix the test, and ideally, explain what is wrong?

This is a limitation of virtual time and the way the clock is manipulated in StepVerifier. The thenAwait methods are not synchronized with the underlying scheduling (that happens for example as part of the retryBackoff operation). This means that the operator submits retry tasks at a point where the clock has already been advanced by one day. So the second retry is scheduled for + 1 day and 10 seconds, since the clock is at +1 day. After that, the clock is never advanced so the additional request is never made to MockWebServer.
Your case is made even more complicated in the sense that there is an additional component involved, the MockWebServer, that still works "in real time".
Advancing the virtual clock is a very quick operation, but the response from the MockWebServer still goes through a real socket, so it reaches the retry scheduling with some real latency. That makes the timing hard to control from the test.
One possible solution to explore would be to externalize the creation of the VirtualTimeScheduler and tie advanceTimeBy calls to the mockServer.takeRequest(), in a parallel thread.
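The effect described above can be reproduced without Reactor. The following is only a toy model in plain Java: VirtualClock is a hypothetical stand-in I wrote for the illustration, not Reactor's VirtualTimeScheduler, and the one-day / ten-second figures are just the ones used in the explanation. It shows that a task scheduled after the clock has already been advanced stays pending until somebody advances the clock again.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class VirtualClockDemo {

    /** Hypothetical, minimal virtual clock: tasks run only when time is advanced. */
    static final class VirtualClock {
        private static final class Task {
            final long dueMillis;
            final Runnable action;
            Task(long dueMillis, Runnable action) {
                this.dueMillis = dueMillis;
                this.action = action;
            }
        }

        private final PriorityQueue<Task> queue =
                new PriorityQueue<>(Comparator.comparingLong((Task t) -> t.dueMillis));
        private long nowMillis = 0;

        /** Schedules relative to the CURRENT virtual time. */
        void schedule(Duration delay, Runnable action) {
            queue.add(new Task(nowMillis + delay.toMillis(), action));
        }

        /** Moves the clock forward and runs every task that has become due. */
        void advanceBy(Duration amount) {
            nowMillis += amount.toMillis();
            while (!queue.isEmpty() && queue.peek().dueMillis <= nowMillis) {
                queue.poll().action.run();
            }
        }
    }

    static List<String> simulate(boolean advanceAgain) {
        List<String> events = new ArrayList<>();
        VirtualClock clock = new VirtualClock();

        // The test advances the clock first (think thenAwait)...
        clock.advanceBy(Duration.ofDays(1));
        // ...and only afterwards does the late response cause a retry to be scheduled.
        clock.schedule(Duration.ofSeconds(10), () -> events.add("retry fired"));

        if (advanceAgain) {
            // Advancing once more, after the retry was scheduled, lets it run.
            clock.advanceBy(Duration.ofSeconds(10));
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // retry never runs
        System.out.println(simulate(true));  // retry runs after the extra advance
    }
}
```

This is why tying each clock advance to an observed request (for example, after the mock server reports it) looks like a promising direction: the advance then happens after the retry has been scheduled, not before.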

Related

Micronaut ReadTimeoutException

I have a Grails 4 application providing a REST API. One of the endpoints sometimes fails with the following exception:
io.micronaut.http.client.exceptions.ReadTimeoutException: Read Timeout
at io.micronaut.http.client.exceptions.ReadTimeoutException.<clinit>(ReadTimeoutException.java:26)
at io.micronaut.http.client.DefaultHttpClient$10.exceptionCaught(DefaultHttpClient.java:1917)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:268)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:426)
at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:92)
at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:147)
at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)
at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:233)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:268)
at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:98)
at io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90)
at io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:505)
at io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:477)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
The endpoint uses the Micronaut HTTP client to call other systems. The remote system takes a very long time to respond, causing the ReadTimeoutException.
Here is the code calling the remote Service:
class RemoteTaskService implements GrailsConfigurationAware {
String taskStepperUrl
// initializes fields from configuration
void setConfiguration(Config config) {
taskStepperUrl = config.getProperty('services.stepper')
}
private BlockingHttpClient getTaskClient() {
HttpClient.create(taskStepperUrl.toURL()).toBlocking()
}
List<Map> loadTasksByProject(long projectId) {
try {
retrieveRemoteList("/api/tasks?projectId=${projectId}")
} catch(HttpClientResponseException e) {
log.error("Loading tasks of project failed with status: ${e.status.code}: ${e.message}")
throw new NotFoundException("No tasks found for project ${projectId}")
}
}
private List<Map> retrieveRemoteList(String path) {
HttpRequest request = HttpRequest.GET(path)
HttpResponse<List> response = taskClient.exchange(request, List) as HttpResponse<List>
response.body()
}
}
I've tried resolving it using the following configuration in my application.yml:
micronaut:
  server:
    read-timeout: 30
and
micronaut.http.client.read-timeout: 30
...with no success. Despite my configuration, the timeout still occurs around 10s after calling the endpoint.
How can I change the read timeout duration for the http rest client?
micronaut.http.client.read-timeout takes a duration, so you should add a measuring unit to the value, like 30s, 30m or 30h.
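To illustrate what "a duration with a unit" means, here is a tiny Java sketch that maps the three suffixes mentioned above onto java.time.Duration. The parse helper is hypothetical and written only for this example; it is not Micronaut's converter, and rejecting a bare number is a choice of this sketch, not a claim about how Micronaut treats it.

```java
import java.time.Duration;

public class DurationWithUnit {

    /** Hypothetical helper: accepts values such as "30s", "30m" or "30h". */
    static Duration parse(String text) {
        String value = text.trim().toLowerCase();
        if (value.length() < 2) {
            throw new IllegalArgumentException("Missing unit in: " + text);
        }
        char unit = value.charAt(value.length() - 1);
        long amount = Long.parseLong(value.substring(0, value.length() - 1));
        switch (unit) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default:
                // A bare "30" ends up here because its last character is a digit.
                throw new IllegalArgumentException("Missing or unknown unit in: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("30s")); // a 30 second duration
    }
}
```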
It seems that the configuration values are not injected in the manually created http clients.
A solution is to configure the HttpClient at creation, setting the readTimeout duration:
private BlockingHttpClient getTaskClient() {
HttpClientConfiguration configuration = new DefaultHttpClientConfiguration()
configuration.readTimeout = Duration.ofSeconds(30)
new DefaultHttpClient(taskStepperUrl.toURL(), configuration).toBlocking()
}
In my case I was streaming a file from a client as
@Get(value = "${service-path}", processes = APPLICATION_OCTET_STREAM)
Flowable<byte[]> fullImportStream();
so when I got this exception my first impulse was to increase the read-timeout value. For streaming scenarios, however, the property that applies is read-idle-timeout, as stated in the docs: https://docs.micronaut.io/latest/guide/configurationreference.html#io.micronaut.http.client.DefaultHttpClientConfiguration

Camel exception handling in Grails

I currently have exception handling being done in an abstract class that all my routes inherit. Something like this:
onException(SocketException,HttpOperationFailedException)
.handled(true)
.maximumRedeliveries(settings.maximumRedeliveries)
.redeliverDelay(settings.redeliverDelay)
.useCollisionAvoidance()
.collisionAvoidanceFactor(settings.collisionAvoidanceFactor)
.onRedelivery(redeliveryProcessor)
.log('retry failed, sending to the route failed coordinator')
.to(routeFailedCoordinator)
Now, I want to do different things based on the response code. For all codes other than 200, an HttpOperationFailedException is thrown. For 4XX codes, I want to send the message on to a failed queue and send an email, if enabled for that particular route. For all other errors, I want to go through the retry cycle. Here's what works for the 4XX errors:
onException(HttpOperationFailedException)
.handled(true)
.process { Exchange x ->
HttpOperationFailedException ex = x.getProperty(Exchange.EXCEPTION_CAUGHT, HttpOperationFailedException.class)
log.debug("Caught a HttpOperationFailedException: statusCode=${ex?.statusCode}")
ProducerTemplate producer = x.getContext().createProducerTemplate()
if (ex?.statusCode >= 400 && ex?.statusCode < 500) {
log.debug("Skipping retries ...")
producer.send(routeFailedEndpoint, x)
x.in.body = "Request:\n${x.in.body}\n\nResponse: ${ex.statusCode}\n${ex.responseBody}".toString()
if (sendFailedEmailEnabled)
producer.send('direct:routeFailedEmailHandler', x)
} else {
producer.send(routeFailedRetryEndpoint, x)
}
}.stop()
How do I add code for retrying like in the first code snippet? I tried using nested choice()...when()...otherwise() clauses and kept getting compile errors.
Anyone had to do something similar?
Here is my code with nested choice()..when()..otherwise() clauses:
onException(HttpOperationFailedException)
.handled(true)
.choice()
.when { Exchange x ->
HttpOperationFailedException ex = x.getProperty(Exchange.EXCEPTION_CAUGHT, HttpOperationFailedException.class)
log.debug("Caught a HttpOperationFailedException: statusCode=${ex?.statusCode}")
if (ex?.statusCode >= 400 && ex?.statusCode < 500) {
log.debug("Skipping retries ...")
x.in.body = "Request:\n${x.in.body}\n\nResponse: ${ex.statusCode}\n${ex.responseBody}".toString()
return true // don't retry
}
log.debug("Performing retries ...")
return false // do attempt retries
}.choice()
.when { !sendFailedEmailEnabled }.to(routeFailedEndpoint)
.otherwise()
.multicast().to(routeFailedEndpoint, 'direct:routeFailedEmailHandler').endChoice()
.otherwise()
.getParent().getParent().getParent()
.maximumRedeliveries(settings.maximumRedeliveries)
.redeliverDelay(settings.redeliverDelay)
.useCollisionAvoidance()
.collisionAvoidanceFactor(settings.collisionAvoidanceFactor)
.onRedelivery(redeliveryProcessor)
.to(routeFailedCoordinator)
You would have to have 2 onException blocks:
One onException with the redelivery settings for redelivery attempts
Another onException that handles the exception, sends that email and does whatever else you want to do.
Use an onWhen on both onException blocks, to return true or false in either situation based on that http status code. The onWhen is executed by Camel to know which of the onException blocks to use (you can have more, but first to return true is used).
You can find more details on the Camel website, or in the Camel in Action book that has a full chapter devoted to error handling.
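The selection rule ("first block whose onWhen returns true is used") can be pictured with a few lines of plain Java. This is only a model of the idea, not Camel code: the handler names "fail-fast" and "retry" and the select method are invented for the illustration, and the status ranges are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class FirstMatchWins {

    /** Returns the name of the first handler whose predicate accepts the status code. */
    static String select(int statusCode) {
        // LinkedHashMap keeps insertion order, so predicates are tried in the order declared.
        Map<String, IntPredicate> handlers = new LinkedHashMap<>();
        handlers.put("fail-fast", code -> code >= 400 && code < 500); // 4XX: no retries
        handlers.put("retry", code -> code >= 500);                   // everything above: redeliver

        for (Map.Entry<String, IntPredicate> handler : handlers.entrySet()) {
            if (handler.getValue().test(statusCode)) {
                return handler.getKey();
            }
        }
        return "unhandled";
    }

    public static void main(String[] args) {
        System.out.println(select(404));
        System.out.println(select(503));
    }
}
```

Keeping the two predicates mutually exclusive, as here, means the declaration order does not matter for correctness, only for which one is evaluated first.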
Thanks, Claus, you pointed me in the right direction.
Basically, as Claus said, use multiple onException blocks, each using an onWhen clause ...
onException(HttpOperationFailedException)
.onWhen(new Predicate() {
public boolean matches(Exchange exchange) {
HttpOperationFailedException ex = exchange.getProperty(Exchange.EXCEPTION_CAUGHT, HttpOperationFailedException.class)
log.debug("Caught an HttpOperationFailedException: statusCode=${ex?.statusCode}, processing 4XX error")
return (ex?.statusCode >= 400 && ex?.statusCode < 500)
}
}).handled(true)
.to(routeFailedEndpoint)
.choice()
.when { sendFailedEmailEnabled }.process(prepareFailureEmail).to('direct:routeFailedEmailHandler')
onException(HttpOperationFailedException)
.onWhen(new Predicate() {
public boolean matches(Exchange exchange) {
HttpOperationFailedException ex = exchange.getProperty(Exchange.EXCEPTION_CAUGHT, HttpOperationFailedException.class)
log.debug("Caught an HttpOperationFailedException: statusCode=${ex?.statusCode}, processing >=500 error")
return (ex?.statusCode >= 500)
}
}).handled(true)
.maximumRedeliveries(settings.maximumRedeliveries)
.redeliverDelay(settings.redeliverDelay)
.useCollisionAvoidance()
.collisionAvoidanceFactor(settings.collisionAvoidanceFactor)
.onRedelivery(redeliveryProcessor)
.to(routeFailedCoordinator)

run something async in Grails 2.3

In my Grails service, there is a part of a method I wish to run asynchronously.
Following the docs for 2.3.x (http://grails.org/doc/2.3.0.M1/guide/async.html),
I do:
public class MyService {
public void myMethod() {
Promise p = task {
// Long running task
}
p.onError { Throwable err ->
println "An error occurred ${err.message}"
}
p.onComplete { result ->
println "Promise returned $result"
}
// block until result is called
def result = p.get()
}
}
However, I want to execute mine without any blocking. The p.get() method blocks. How do I execute the promise without any sort of blocking? I don't care what the task returns by the time myMethod() returns; it is a kind of fire-and-forget method.
So, according to the documentation if you don't call .get() or .waitAll() but rather just make use of onComplete you can run your task without blocking the current thread.
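For comparison, the same "register callbacks, never call get()" pattern can be sketched with the JDK's CompletableFuture. This is my own illustration in plain Java, not Grails code; the two latches exist only so the demo has a deterministic order and can collect its events, a real fire-and-forget caller would not wait at all.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FireAndForget {

    static List<String> run() throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch callerMovedOn = new CountDownLatch(1); // demo only: forces the ordering
        CountDownLatch callbackDone = new CountDownLatch(1);  // demo only: lets us read the result

        CompletableFuture
                .supplyAsync(() -> {
                    // Stand-in for the long running task; it waits until the caller has moved on.
                    try {
                        callerMovedOn.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "the result";
                })
                .whenComplete((result, error) -> {
                    // Plays the role of onComplete / onError.
                    events.add(error == null ? "completed: " + result : "failed: " + error.getMessage());
                    callbackDone.countDown();
                });

        // No get()/join() here: the caller carries on immediately.
        events.add("caller continues");
        callerMovedOn.countDown();

        callbackDone.await(5, TimeUnit.SECONDS);
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```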
Here is a very silly example that I worked up in the console as a proof of concept.
import static grails.async.Promises.*
def p = task {
// Long running task
println 'Off to do something now ...'
Thread.sleep(5000)
println '... that took 5 seconds'
return 'the result'
}
p.onError { Throwable err ->
println "An error occurred ${err.message}"
}
p.onComplete { result ->
println "Promise returned $result"
}
println 'Just to show some output, and prove the task is running in the background.'
Running the above example gives you the following output:
Off to do something now ...
Just to show some output, and prove the task is running in the background.
... that took 5 seconds
Promise returned the result

WinJS.xhr Timeout Loses Requests?

What I'm trying to do (though I fully suspect there's a better way to do it) is to send HTTP requests to a range of hosts on my network. I can hit every host by calling WinJS.xhr in a loop. However, it takes too long to complete the range.
Inspecting in Fiddler shows that a dozen or so requests are sent at a time, wait to time out, and then move on to the next dozen or so. So I figured I'd try to reduce the timeout for each request. For my needs, if the host doesn't respond in 500 ms, it's not going to respond.
Following the documentation, I tried wrapping the call to WinJS.xhr in a call to WinJS.Promise.timeout with a small enough setting, but there was no change. Changing the promise timeout didn't really affect the actual request.
A little more searching led me to a suggestion whereby I could modify the XMLHttpRequest object that WinJS.xhr uses and set the timeout on that. This worked like a charm in terms of blasting out requests at a faster rate. However, there seems to be a side-effect.
Watching the requests in Fiddler, about a dozen or so fire off very quickly and then the whole thing ends. The "next dozen or so" never come. Sometimes (based on the semi-randomness of asynchronous calls) the first dozen or so that show up in Fiddler include 9-10 from the low end of the range and 2-3 from the top end of the range, or close to it.
Is there something else I can try, or some other way to accomplish the end goal here? (Within the scope of this question the end goal is to send a large number of requests in a reasonable amount of time, but any suggestions on a better overall way to scan for a particular service on a network is also welcome.)
Can you write out the code you're using for the timeout? I wrote something like this but it wasn't working, so I'm curious as to how you're doing it:
var timeoutFired = function () {
console.log("derp");
};
var options = {
url: "http://somesite.com",
responseType: "document",
customRequestInitializer: function (req) {
req.timeout = 1;
req.ontimeout = timeoutFired;
//do something with the XmlHttpRequest object req
}
};
WinJS.xhr(options).
....
Here are some alternatives that you may find helpful, not sure how/why timeout wasn't working but I tried to write out a custom timeout function:
(function (global) {
var options = {
url: "http://something.com",
responseType: "document",
};
var request = WinJS.xhr(options).then(
function (xmlHttpRequest) {
console.log("completed");
},
function (xmlHttpRequest) {
//error or cancel() will throw err
console.log("error"+ xmlHttpRequest.message);
},
function (xmlHttpRequest) {
console.log("progress")
});
function waitTime() {
return new WinJS.Promise(
function (complete, error, progress) {
var seconds = 0;
var interval = window.setInterval(
function () {
seconds++;
progress(seconds);
//prob should be called milliseconds
if (seconds > 5) {
window.clearInterval(interval);
complete();
}
}, 100);
});
};
waitTime().done(
function () {
console.log("complete");
request.cancel();
},
function () {
console.log("error")
},
function (seconds) {
console.log("progress:" + seconds)
});
})();
Another cool little trick is using promise.any (vs .join) which fires off when one OR the other finishes first, so taking that into account you can write something like this:
(function (global) {
var options = {
url: "http://url.com",
responseType: "document",
};
var request = {
runRequest: function () {
return WinJS.xhr(options).then(
function (xmlHttpRequest) {
console.log("completed");
},
function (xmlHttpRequest) {
//error or cancel() will throw err
console.log("error" + xmlHttpRequest.message);
},
function (xmlHttpRequest) {
console.log("progress")
});
}
};
WinJS.Promise.any([WinJS.Promise.timeout(500), request.runRequest()]).done(
function () {
console.log("any complete");
});
})();
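The "race the request against a timer" idea is not specific to WinJS. As a rough analogue, here is how it might look with the JDK's CompletableFuture (Java 9 or later for delayedExecutor). This is a sketch of the general technique only; the race method and the "timed out" marker value are my own inventions, and nothing here says anything about how WinJS handles the losing promise.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class RaceWithTimeout {

    /** Returns the request's value, or "timed out" if the timer finishes first. */
    static String race(CompletableFuture<String> request, long timeoutMillis) {
        // A future that completes with a marker value after the given delay.
        CompletableFuture<String> timer = CompletableFuture.supplyAsync(
                () -> "timed out",
                CompletableFuture.delayedExecutor(timeoutMillis, TimeUnit.MILLISECONDS));

        // anyOf completes as soon as either future completes.
        // (If the request fails first, join() throws a CompletionException.)
        Object winner = CompletableFuture.anyOf(request, timer).join();

        // anyOf does not cancel the loser, so do it explicitly; both calls are no-ops
        // on a future that has already completed.
        request.cancel(true);
        timer.cancel(false);
        return (String) winner;
    }

    public static void main(String[] args) {
        // A request that never answers loses against a 500 ms timer.
        System.out.println(race(new CompletableFuture<>(), 500));
    }
}
```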

BlackBerry: Make a repeated HTTP GET request using the Comms API

I want to store position coords (latitude, longitude) in a table in my MySQL DB querying a url in a way similar to this one: http://locationstore.com/postlocation.php?latitude=var1&longitude=var2 every ten seconds. PHP script works like a charm. Getting the coords in the device ain't no problem either. But making the request to the server is being a hard one. My code goes like this:
public class LocationHTTPSender extends Thread {
for (;;) {
try {
//fetch latest coordinates
coords = this.coords();
//reset url
this.url="http://locationstore.com/postlocation.php";
// create uri
uri = URI.create(this.url);
FireAndForgetDestination ffd = null;
ffd = (FireAndForgetDestination) DestinationFactory.getSenderDestination
("MyContext", uri);
if(ffd == null)
{
ffd = DestinationFactory.createFireAndForgetDestination
(new Context("MyContext"), uri);
}
ByteMessage myMsg = ffd.createByteMessage();
myMsg.setStringPayload("doesnt matter");
((HttpMessage) myMsg).setMethod(HttpMessage.POST);
((HttpMessage) myMsg).setQueryParam("latitude", coords[0]);
((HttpMessage) myMsg).setQueryParam("longitude", coords[1]);
((HttpMessage) myMsg).setQueryParam("user", "1");
int i = ffd.sendNoResponse(myMsg);
ffd.destroy();
System.out.println("Lets sleep for a while..");
Thread.sleep(10000);
System.out.println("woke up");
} catch (Exception e) {
// TODO Auto-generated catch block
System.out.println("Exception message: " + e.toString());
e.printStackTrace();
}
}
I haven't run this code to test it, but I would be suspicious of this call:
ffd.destroy();
According to the API docs:
Closes the destination. This method cancels all outstanding messages,
discards all responses to those messages (if any), suspends delivery
of all incoming messages, and blocks any future receipt of messages
for this Destination. This method also destroys any persistable
outbound and inbound queues. If Destination uses the Push API, this
method will unregister associated push subscriptions. This method
should be called only during the removal of an application.
So, if you're seeing the first request succeed (at least sometimes), and subsequent requests fail, I would try removing that call to destroy().
See the BlackBerry docs example for this here
Ok so I finally got it running cheerfully. The problem was with the transport selection; even though this example delivered WAP2 (among others) as an available transport in my device, running the network diagnostics tool showed only BIS as available. It also gave me the connection parameters that I needed to append at the end of the URL (;deviceside=false;ConnectionUID=GPMDSEU01;ConnectionType=mds-public). The code ended up like this:
for (;;) {
try {
coords.refreshCoordinates();
this.defaultUrl();
this.setUrl(stringFuncs.replaceAll(this.getUrl(), "%latitude%", coords.getLatitude() + ""));
this.setUrl(stringFuncs.replaceAll(this.getUrl(), "%longitude%", coords.getLongitude() + ""));
cd = cf.getConnection(this.getUrl());
if (cd != null) {
try {
HttpConnection hc = (HttpConnection)cd.getConnection();
final int i = hc.getResponseCode();
hc.close();
} catch (Exception e) {
}
}
// sleep
Thread.sleep(15000);
} catch (Exception e) {
} finally {
// close connections
// set objects to null
}
Thanks for your help @Nate, it's been very much appreciated.
