Reactor Netty - how to send with delayed Flux - project-reactor

In Reactor Netty, when sending data to a TCP channel via out.send(publisher), one would expect any publisher to work. However, if we replace a simple immediate Flux with a more complex one that has delayed elements, it stops working properly.
For example, this hello-world TCP echo server works as expected:
import reactor.core.publisher.Flux;
import reactor.netty.DisposableServer;
import reactor.netty.tcp.TcpServer;
import java.time.Duration;
public class Reactor1 {
    public static void main(String[] args) throws Exception {
        DisposableServer server = TcpServer.create()
            .port(3344)
            .handle((in, out) -> in
                .receive()
                .asString()
                .flatMap(s ->
                    out.sendString(Flux.just(s.toUpperCase()))
                ))
            .bind()
            .block();
        server.channel().closeFuture().sync();
    }
}
However, if we change out.sendString to
out.sendString(Flux.just(s.toUpperCase()).delayElements(Duration.ofSeconds(1)))
then we would expect an output to be produced for each received item after a one-second delay.
However, if the server receives multiple items during that interval, it produces output only for the first one. For example, below we type aa and bb during the first second, but only AA is produced as output (after one second):
$ nc localhost 3344
aa
bb
AA <after one second>
Then, if we later type an additional line, we get output (after one second), but for the previous input:
cc
BB <after one second>
Any ideas how to make send() work as expected with a delayed Flux?

I think you shouldn't create a new publisher for every call to out.sendString(...). Instead, pass it a single publisher derived from the inbound stream.
This works:
DisposableServer server = TcpServer.create()
    .port(3344)
    .handle((in, out) -> out
        .options(NettyPipeline.SendOptions::flushOnEach)
        .sendString(in.receive()
            .asString()
            .map(String::toUpperCase)
            .delayElements(Duration.ofSeconds(1))))
    .bind()
    .block();
server.channel().closeFuture().sync();

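Try using concatMap. Unlike flatMap, which subscribes to the inner publishers eagerly, concatMap subscribes to them one at a time, in order. This works: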
DisposableServer server = TcpServer.create()
    .port(3344)
    .handle((in, out) -> in
        .receive()
        .asString()
        .concatMap(s ->
            out.sendString(Flux.just(s.toUpperCase())
                .delayElements(Duration.ofSeconds(1)))
        ))
    .bind()
    .block();
server.channel().closeFuture().sync();
Alternatively, apply the delay to the incoming traffic:
DisposableServer server = TcpServer.create()
    .port(3344)
    .handle((in, out) -> in
        .receive()
        .asString()
        .timestamp()
        .delayElements(Duration.ofSeconds(1))
        .concatMap(tuple2 ->
            out.sendString(
                Flux.just(tuple2.getT2().toUpperCase() +
                    " " +
                    (System.currentTimeMillis() - tuple2.getT1())
                ))
        ))
    .bind()
    .block();

When does reactor execute a subscription chain?

The reactor documentation states the following:
Nothing happens until you subscribe
If that was true, why do I see a java.lang.NullPointerException when I run the following code snippet, which has a reactor chain without a subscription?
@Test
void test() {
    String a = null;
    Flux.just(a.toLowerCase())
        .doOnNext(System.out::println);
}
Deepak,
"Nothing happens" means that data will not flow through your chain of functions to your consumers until a subscription happens.
You're getting the NPE because Java evaluates a.toLowerCase(), the argument passed to the hot operator just(), while the Flux is being defined, before any subscription exists.
You can also make just() behave like a cold operator by wrapping it in defer(), so that the NPE is raised only after a subscription happens:
public Flux<String> test() {
    String a = null;
    return Flux.defer(() -> Flux.just(a.toLowerCase()))
        .doOnNext(System.out::println);
}
Please read more about hot vs cold operators.
Update:
A small example of cold and hot publishers. Each time a new subscription happens, the cold publisher's body is re-evaluated. Meanwhile, just() only emits the time that was computed once, at definition time.
Mono<Date> currentTime = Mono.just(Calendar.getInstance().getTime());
Mono<Date> realCurrentTime = Mono.defer(() -> Mono.just(Calendar.getInstance().getTime()));
// 1 sec sleep
Thread.sleep(1000);
currentTime.subscribe(time -> System.out.println("Current Time " + time.getTime()));
realCurrentTime.subscribe(time -> System.out.println("Real current Time " + time.getTime()));
Thread.sleep(2000);
currentTime.subscribe(time -> System.out.println("Current Time " + time.getTime()));
realCurrentTime.subscribe(time -> System.out.println("Real current Time " + time.getTime()));
The output is:
Current Time 1583788755759
Real current Time 1583788756826
Current Time 1583788755759
Real current Time 1583788758833
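The same distinction can be reproduced without Reactor. The following is a minimal sketch in plain Java, in which a Supplier plays the role of the publisher and get() plays the role of a subscription. The helpers just and defer are hypothetical stand-ins for Mono.just and Mono.defer, not Reactor APIs:

```java
import java.util.function.Supplier;

final class EagerVsDeferred {
    // Counts how often the value is computed.
    static int computations = 0;

    static Long compute() {
        computations++;
        return System.nanoTime();
    }

    // Hypothetical stand-in for Mono.just: the caller has already computed the value.
    static <T> Supplier<T> just(T value) {
        return () -> value;
    }

    // Hypothetical stand-in for Mono.defer: the computation runs on every get().
    static <T> Supplier<T> defer(Supplier<T> computation) {
        return () -> computation.get();
    }

    public static void main(String[] args) {
        Supplier<Long> eager = just(compute());                // compute() runs here, once
        Supplier<Long> lazy = defer(EagerVsDeferred::compute); // nothing runs yet
        System.out.println("computations after definition: " + computations); // 1

        eager.get();
        eager.get();
        lazy.get();
        lazy.get();
        System.out.println("computations after four get() calls: " + computations); // 3

        String a = null;
        Supplier<String> deferred = defer(() -> a.toLowerCase()); // no exception at definition
        try {
            deferred.get();
        } catch (NullPointerException e) {
            System.out.println("NPE raised only when the value is requested");
        }
    }
}
```

Writing just(a.toLowerCase()) in place of the deferred version would throw at the definition line, because Java evaluates the argument before just is even called.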

F# Giraffe: Different cache headers based on result

I am struggling with how to set different cache response headers based on whether the result is an Ok or an Error. My code is something like the following (but with other types in the result):
let resultToJson (result:Result<'a,string>) : HttpHandler =
    match result with
    | Ok o -> Successful.ok (json o)
    | Error s -> ServerErrors.internalError (text s)
I can add the headers by doing something like the following:
let resultToJson (result:Result<'a,string>) : HttpHandler =
    fun (next : HttpFunc) (ctx : HttpContext) ->
        let response =
            let headers = ctx.Response.Headers
            match result with
            | Ok o ->
                headers.Add("Cache-Control", new StringValues("public, max-age=10, stale-while-revalidate=2"))
                headers.Add("Vary", new StringValues("Origin"))
                Successful.ok (json o)
            | Error s ->
                headers.Add("Cache-Control", new StringValues("no-cache"))
                ServerErrors.internalError (text s)
        response next ctx
But this does not feel right. I would like to use the standard HttpHandlers from the ResponseCaching module to set the right cache headers:
publicResponseCaching 10 (Some "Origin") // For Ok: Add 10 sec public cache, Vary by Origin
noResponseCaching // For Error: no caching
How do I achieve this?
The response caching handler is meant to be composed into a normal pipeline. Your choice between Ok and Error works like a choose function, so you can use choose, which takes a list of handlers to attempt in order. To reject a path, return task { return None }; to move forward, call next ctx.
If you want to keep all the logic in one controller, as you have now, keep your match and compose your json/text response with one of the caching handlers:
let fn = (json o >=> publicResponseCaching 30 None) in fn next ctx
If this is nested inside a handler rather than in a pipeline, you have to apply next and ctx yourself.
I found the solution to my problem.
Yes, I can chain the HttpHandlers as Gerard and Honza Brestan mentioned, using the fish operator (>=>). The reason I could not make that work in the first place was that I had also created a fish operator for the Result type in an opened module. Basically, I had created proper fish soup.
As soon as I refactored my code so that the module containing the Result fish operator was not open in this scope, everything worked as expected.
Another point to remember is that response caching needs to be called before the finalizing HttpHandler, otherwise it will not be called:
// Simplified code
let resultToJson =
    function
    | Ok o -> publicResponseCaching 10 (Some "Origin") >=> Successful.ok(json o)
    | Error e -> noResponseCaching >=> ServerErrors.internalError(text e)
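The ordering rule can be illustrated with a small model. This is only a sketch in plain Java, not Giraffe code: Ctx, Handler, compose, setHeader and writeBody are hypothetical stand-ins for HttpContext, HttpHandler, >=>, a caching handler and a finalizing handler. It assumes that the finalizing handler returns without calling its continuation, which is what the observation above suggests:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class HandlerOrder {
    /** Hypothetical stand-in for HttpContext: response headers plus a body. */
    static final class Ctx {
        final Map<String, String> headers = new LinkedHashMap<>();
        String body;
    }

    /** A handler receives the continuation ("next") and the context. */
    interface Handler {
        Optional<Ctx> handle(Function<Ctx, Optional<Ctx>> next, Ctx ctx);
    }

    /** Plays the role of >=>: run first, and let it continue into second. */
    static Handler compose(Handler first, Handler second) {
        return (next, ctx) -> first.handle(c -> second.handle(next, c), ctx);
    }

    /** Sets a header, then continues (like a caching handler). */
    static Handler setHeader(String name, String value) {
        return (next, ctx) -> {
            ctx.headers.put(name, value);
            return next.apply(ctx);
        };
    }

    /** Writes the body and stops: it never calls next (like a finalizing handler). */
    static Handler writeBody(String body) {
        return (next, ctx) -> {
            ctx.body = body;
            return Optional.of(ctx);
        };
    }

    public static void main(String[] args) {
        Ctx good = new Ctx();
        compose(setHeader("Cache-Control", "no-cache"), writeBody("error"))
            .handle(Optional::of, good);
        System.out.println(good.headers); // {Cache-Control=no-cache}

        Ctx bad = new Ctx();
        compose(writeBody("error"), setHeader("Cache-Control", "no-cache"))
            .handle(Optional::of, bad);
        System.out.println(bad.headers); // {}
    }
}
```

In the second composition the header handler is never reached, because the body-writing handler ends the chain before the continuation runs.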

Apache Beam PubSubIO with GroupByKey

Using Apache Beam 2.1.0, I'm trying to consume simple (key, value) data from Google Pub/Sub and group it by key so that I can process it in batches.
With the default trigger, my code after GroupByKey never fires (I waited 30 minutes).
If I define a custom trigger, the code is executed, but I would like to understand why the default trigger never fires. I tried defining my own timestamp with withTimestampLabel, with the same result. I also tried changing the window duration (1 second, 10 seconds, 30 seconds, etc.), again with the same result.
I used the command line to insert the test data:
gcloud beta pubsub topics publish test A,1
gcloud beta pubsub topics publish test A,2
gcloud beta pubsub topics publish test B,1
gcloud beta pubsub topics publish test B,2
The documentation says that we can do one or the other, but not necessarily both:
If you are using unbounded PCollections, you must use either
non-global windowing OR an aggregation trigger in order to perform a
GroupByKey or CoGroupByKey
It looks similar to these questions:
Consuming unbounded data in windows with default trigger
Scio: groupByKey doesn't work when using Pub/Sub as collection source
My code
static class Compute extends DoFn<KV<String, Iterable<Integer>>, Void> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        // Code never fires
        System.out.println("KEY:" + c.element().getKey());
        System.out.println("NB:" + c.element().getValue().spliterator().getExactSizeIfKnown());
    }
}
public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply(PubsubIO.readStrings().fromSubscription("projects/" + args[0] + "/subscriptions/test"))
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(
            MapElements
                .into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.integers()))
                .via((String row) -> {
                    String[] parts = row.split(",");
                    System.out.println(Arrays.toString(parts)); // Code fires
                    return KV.of(parts[0], Integer.parseInt(parts[1]));
                })
        )
        .apply(GroupByKey.create())
        .apply(ParDo.of(new Compute()));
    p.run();
}

Handling WebExceptions properly?

I have the following F# program that retrieves a webpage from the internet:
open System.Net
[<EntryPoint>]
let main argv =
    let mutable pageData : byte[] = [| |]
    let fullURI = "http://www.badaddress.xyz"
    let wc = new WebClient()
    try
        pageData <- wc.DownloadData(fullURI)
        ()
    with
    | :? System.Net.WebException as err -> printfn "Web error: \n%s" err.Message
    | exn -> printfn "Unknown exception:\n%s" exn.Message
    0 // return an integer exit code
This works fine as long as the URI is valid, the machine has an internet connection, the web server responds properly, and so on. In an ideal functional programming world, the result of a function would not depend on external state that is not passed in as arguments (side effects).
What I would like to know is: what is the appropriate F# design pattern for operations that may have to deal with recoverable external errors? For example, if the website is down, one might want to wait 5 minutes and try again. Should parameters such as the number of retries and the delay between retries be passed explicitly, or is it OK to embed them in the function?
In F#, when you want to handle recoverable errors you almost universally want to use the option or the Choice<_,_> type. In practice the only difference between them is that Choice allows you to return some information about the error while option does not. In other words, option is best when it doesn't matter how or why something failed (only that it did fail); Choice<_,_> is used when having information about how or why something failed is important. For example, you might want to write the error information to a log; or perhaps you want to handle an error situation differently based on why something failed -- a great use case for this is providing accurate error messages to help users diagnose a problem.
With that in mind, here's how I'd refactor your code to handle failures in a clean, functional style:
open System
open System.Net

/// Retrieves the content at the given URI.
let retrievePage (client : WebClient) (uri : Uri) =
    // Preconditions
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."
    try
        // If the data is retrieved successfully, return it.
        client.DownloadData uri
        |> Choice1Of2
    with
    | :? System.Net.WebException as webExn ->
        // Return the URI and WebException so they can be used to diagnose the problem.
        Choice2Of2 (uri, webExn)
    | _ ->
        // Reraise any other exceptions -- we don't want to handle them here.
        reraise ()

/// Retrieves the content at the given URI.
/// If a WebException is raised when retrieving the content, the request
/// will be retried up to a specified number of times.
let rec retrievePageRetry (retryWaitTime : TimeSpan) remainingRetries (client : WebClient) (uri : Uri) =
    // Preconditions
    checkNonNull "uri" uri
    if not <| uri.IsAbsoluteUri then
        invalidArg "uri" "The URI must be an absolute URI."
    elif remainingRetries = 0u then
        invalidArg "remainingRetries" "The number of retries must be greater than zero (0)."
    // Try to retrieve the page.
    match retrievePage client uri with
    | Choice1Of2 _ as result ->
        // Successfully retrieved the page. Return the result.
        result
    | Choice2Of2 _ as error ->
        // Decrement the number of retries.
        let retries = remainingRetries - 1u
        // If there are no retries left, return the error along with the URI
        // for diagnostic purposes; otherwise, wait a bit and try again.
        if retries = 0u then error
        else
            // NOTE : If this is modified to use 'async', you MUST
            // change this to use 'Async.Sleep' here instead!
            System.Threading.Thread.Sleep retryWaitTime
            // Try retrieving the page again.
            retrievePageRetry retryWaitTime retries client uri

[<EntryPoint>]
let main argv =
    /// WebClient used for retrieving content.
    use wc = new WebClient ()
    /// The amount of time to wait before re-attempting to fetch a page.
    let retryWaitTime = TimeSpan.FromSeconds 2.0
    /// The maximum number of times we'll try to fetch each page.
    let maxPageRetries = 3u
    /// The URI to fetch.
    let fullURI = Uri ("http://www.badaddress.xyz", UriKind.Absolute)
    // Fetch the page data.
    match retrievePageRetry retryWaitTime maxPageRetries wc fullURI with
    | Choice1Of2 pageData ->
        printfn "Retrieved %u bytes from: %O" (Array.length pageData) fullURI
        0 // Success
    | Choice2Of2 (uri, error) ->
        printfn "Unable to retrieve the content from: %O" uri
        printfn "HTTP Status: (%i) %O" (int error.Status) error.Status
        printfn "Message: %s" error.Message
        1 // Failure
Basically, I split your code out into two functions, plus the original main:
One function that attempts to retrieve the content from a specified URI.
One function containing the logic for retrying attempts; this 'wraps' the first function which performs the actual requests.
The original main function now only handles 'settings' (which you could easily pull from an app.config or web.config) and printing the final results. In other words, it's oblivious to the retrying logic -- you could modify the single line of code with the match statement and use the non-retrying request function instead if you wanted.
If you want to pull content from multiple URIs AND wait for a significant amount of time (e.g., 5 minutes) between retries, you should modify the retrying logic to use a priority queue or something instead of using Thread.Sleep or Async.Sleep.
Shameless plug: my ExtCore library contains some things to make your life significantly easier when building something like this, especially if you want to make it all asynchronous. Most importantly, it provides an asyncChoice workflow and collections functions designed to work with it.
As for your question about passing in parameters (like the retry timeout and number of retries) -- I don't think there's a hard-and-fast rule for deciding whether to pass them in or hard-code them within the function. In most cases, I prefer to pass them in, though if you have more than a few parameters to pass in, you're better off creating a record to hold them all and passing that instead. Another approach I've used is to make the parameters option values, where the defaults are pulled from a configuration file (though you'll want to pull them from the file once and assign them to some private field to avoid re-parsing the configuration file each time your function is called); this makes it easy to modify the default values you've used in your code, but also gives you the flexibility of overriding them when necessary.
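For readers more at home in Java, the split between a single-attempt function and a retry wrapper, with the retry count and wait time passed in explicitly, might look roughly like this. It is a sketch under assumptions rather than a port of the F# code: Result is a hypothetical two-case type playing the role of Choice<_,_>, UncheckedIOException stands in for WebException, and the flaky action in main is invented for the demonstration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

final class RetrySketch {
    /** Outcome of an attempt: a value on success, or the error that caused the failure. */
    static final class Result<T> {
        final T value;                    // set on success
        final UncheckedIOException error; // set on failure

        private Result(T value, UncheckedIOException error) {
            this.value = value;
            this.error = error;
        }

        static <T> Result<T> ok(T value) { return new Result<>(value, null); }
        static <T> Result<T> failed(UncheckedIOException error) { return new Result<>(null, error); }
        boolean isOk() { return error == null; }
    }

    /** One attempt. The expected I/O failure becomes a value; anything else propagates. */
    static <T> Result<T> attempt(Supplier<T> action) {
        try {
            return Result.ok(action.get());
        } catch (UncheckedIOException e) {
            return Result.failed(e);
        }
    }

    /** Retry wrapper around attempt(): at most maxAttempts calls, pausing between failures. */
    static <T> Result<T> attemptWithRetry(Supplier<T> action, int maxAttempts, long waitMillis) {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be greater than zero");
        }
        Result<T> result = attempt(action);
        for (int made = 1; made < maxAttempts && !result.isOk(); made++) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop retrying
                return result;
            }
            result = attempt(action);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky action: fails twice, then succeeds.
        Result<String> result = attemptWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new UncheckedIOException(new IOException("host unreachable"));
            }
            return "page data";
        }, 3, 10L);
        System.out.println(result.isOk() + " after " + calls[0] + " calls"); // true after 3 calls
    }
}
```

As in the F# version, the caller only sees the final Result and can swap attemptWithRetry for attempt without touching the code that inspects it.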

MailboxProcessor.PostAndReply design choice

Looking at:
member this.PostAndReply : (AsyncReplyChannel<'Reply> -> 'Msg) * ?int -> 'Reply
I can't figure out why the signature is the way it is; it looks counter-intuitive to me. What we want to do is post a message to an agent and wait for a reply.
Why do we have to give it a weird function as a 'message'?
See again this MSDN snippet:
let rec loop() =
    printf "> "
    let input = Console.ReadLine()
    printThreadId("Console loop")
    let reply = agent.PostAndReply(fun replyChannel -> input, replyChannel)
    if (reply <> "Stopping.") then
        printfn "Reply: %s" reply
        loop()
    else
        ()
loop()
I'd prefer something like this:
member this.PostAndReply : 'Msg * ?int -> 'Reply
Thanks
This type signature looks pretty confusing when you see it for the first time, but it does make sense.
The F# library design
The idea behind the design is that when you call PostAndReply you need to give it a function that:
constructs a message of type 'Msg (to be sent to the agent)
after the F# runtime builds a channel for sending messages back to the caller (channels are represented as values of type AsyncReplyChannel<'Reply>).
The message that you construct needs to contain the reply channel, but the F# library does not know how you want to represent your messages (and so it does not know how you want to store the reply channel in the message). As a result, the library asks you to write a function that will construct the message for the agent after the system constructs the channel.
Your alternative suggestion
The problem with your suggestion is that if PostAndReply had a type 'Msg -> 'Reply, the message that the agent receives after it calls Receive would be of the following type:
'Msg * AsyncReplyChannel<'Reply>
... so every message received by the agent would also have to carry a channel for sending replies back. However, you probably don't want to send a reply back for every received message, so this wouldn't really work. Maybe you could use something like:
'Msg * option<AsyncReplyChannel<'Reply>>
... but that's just getting more complicated (and it still isn't quite right, because you can only reply to some messages from 'Msg, but not to all of them).
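The design can be sketched outside F# as well. Below is a minimal Java analogy, not the F# implementation: Agent, Msg, Log and Echo are hypothetical names, and a CompletableFuture stands in for AsyncReplyChannel<'Reply>. The library side creates the channel, and the caller's function decides how to embed it in a message, so only the messages that need a reply carry one:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class Agent<M> {
    private final BlockingQueue<M> inbox = new LinkedBlockingQueue<>();

    /** Fire-and-forget: the message carries no reply channel. */
    void post(M message) {
        inbox.add(message);
    }

    /** Creates the reply channel, lets the caller build the message around it, then waits. */
    <R> R postAndReply(Function<CompletableFuture<R>, M> buildMessage) {
        CompletableFuture<R> channel = new CompletableFuture<>();
        inbox.add(buildMessage.apply(channel));
        return channel.join();
    }

    M receive() throws InterruptedException {
        return inbox.take();
    }
}

final class AgentDemo {
    interface Msg {}

    /** No reply expected, so no channel is stored. */
    static final class Log implements Msg {
        final String text;
        Log(String text) { this.text = text; }
    }

    /** A reply is expected, so the message stores the channel. */
    static final class Echo implements Msg {
        final String text;
        final CompletableFuture<String> reply;
        Echo(String text, CompletableFuture<String> reply) {
            this.text = text;
            this.reply = reply;
        }
    }

    static Agent<Msg> startEchoAgent() {
        Agent<Msg> agent = new Agent<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Msg m = agent.receive();
                    if (m instanceof Echo) {
                        Echo e = (Echo) m;
                        e.reply.complete(e.text.toUpperCase());
                    }
                    // Log messages are consumed without replying.
                }
            } catch (InterruptedException ignored) {
                // Interrupted: let the worker thread end.
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for the demo
        worker.start();
        return agent;
    }

    public static void main(String[] args) {
        Agent<Msg> agent = startEchoAgent();
        agent.post(new Log("starting"));
        String reply = agent.postAndReply(channel -> new Echo("hello", channel));
        System.out.println(reply); // HELLO
    }
}
```

With a signature that took a plain message instead of a function, Agent would have to decide where the channel lives, which is exactly the problem described above.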
