I am using Spray to query a REST endpoint which will return a largish amount of data with several items that should be processed. The data is a series of JSON objects. Is there a way to convert the response into a stream of these objects that does not require me to read the entire response into memory?
Reading the docs, there is mention of "chunked responses", which seem to be along the lines of what I want. How do I use them in a spray-client pipeline?
I've just implemented something like this today, thanks to the excellent article found at http://boldradius.com/blog-post/VGy_4CcAACcAxg-S/streaming-play-enumerators-through-spray-using-chunked-responses.
Essentially, what you want to do is to get hold of the RequestContext in one of your Route definitions, and get a reference to its "responder" Actor. This is the Actor by which Spray sends responses back to the client that sent the original request.
To send back a chunked response, you have to signal that the response is starting, then send the chunks one by one, and then finally signal that the response has finished. You do this via the ChunkedResponseStart, MessageChunk, and ChunkedMessageEnd classes from the spray.http package.
What I end up doing is sending the response as a series of these messages, like this:
0) A bunch of imports to put into the class containing your Routes, plus a case object:
import akka.actor.{Actor, ActorContext, ActorRef, ActorRefFactory, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import spray.http._
import spray.http.MediaTypes._
import spray.json.RootJsonFormat
import spray.routing.RequestContext
object Messages {
case object Ack
}
1) Get hold of the RequestContext in your Route:
path ("asdf") {
get { requestContext => {
... further code here for sending chunked response ...
}
}
2) Start the response (as a JSON envelope that'll hold the response data in a JSON array called "myJsonData" in this case):
requestContext.responder.forward(ChunkedResponseStart(HttpResponse(entity = HttpEntity(`application/json`, """{"myJsonData": ["""))).withAck(Ack))
3) Iterate over your array of results, sending each element's JSONified version into the JSON array, comma separated, until the final element is sent, which needs no trailing comma:
requestContext.responder.forward(MessageChunk(HttpData(element.toJson.compactPrint)).withAck(Ack))
if (!lastElement) { // however you work this out in your code!
  requestContext.responder.forward(MessageChunk(HttpData(",")).withAck(Ack))
}
4) When there's nothing left to send, close the JSON envelope:
responder.forward(MessageChunk("]}").withAck(Ack))
and signal the end of the response:
responder.forward(ChunkedMessageEnd().withAck(Ack))
In my solution I have been working with Play Iteratees and Enumerators, so I have not included big chunks of my code here; it is very much tied up with those mechanisms, which may not suit your needs. The point of the "withAck" call is that spray will send the Ack message back to you once the chunk has been handed off to the network, i.e. when it's OK to send more chunks. Ideally you would craft your code to wait for that Ack before sending the next chunk.
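To make that concrete, here is a minimal sketch (not taken from the article; ChunkStreamer, elements and myElements are made-up names for illustration) of an actor that sends exactly one chunk per Ack it receives back from spray:
import akka.actor.Actor
import spray.http._
import spray.http.MediaTypes._
import spray.json._
import spray.routing.RequestContext
import Messages.Ack

// Hypothetical actor: streams the given JSON values to the client,
// sending exactly one chunk per Ack received back from spray.
class ChunkStreamer(ctx: RequestContext, elements: List[JsValue]) extends Actor {

  private var remaining = elements
  private var envelopeClosed = false

  // Open the response and the JSON envelope, asking for an Ack once the chunk is on the wire.
  override def preStart(): Unit =
    ctx.responder ! ChunkedResponseStart(
      HttpResponse(entity = HttpEntity(`application/json`, """{"myJsonData": ["""))
    ).withAck(Ack)

  def receive = {
    case Ack if remaining.nonEmpty =>
      // The previous chunk has been accepted by the network, so send the next element.
      val element = remaining.head
      remaining = remaining.tail
      val separator = if (remaining.nonEmpty) "," else ""
      ctx.responder ! MessageChunk(HttpData(element.compactPrint + separator)).withAck(Ack)

    case Ack if !envelopeClosed =>
      // All elements sent: close the JSON envelope.
      envelopeClosed = true
      ctx.responder ! MessageChunk("]}").withAck(Ack)

    case Ack =>
      // Envelope chunk acknowledged: finish the chunked response and stop.
      ctx.responder ! ChunkedMessageEnd()
      context.stop(self)
  }
}
You would start it from the route shown in step 1, e.g. with something like actorRefFactory.actorOf(Props(new ChunkStreamer(requestContext, myElements))), where myElements stands for whatever collection of JsValues you are streaming.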
I hope that the above may give you a starter for ten at least, and as I say, these concepts are explained really well in the article I linked to!
Thanks,
Duncan
Related
There are plenty of basic explanations of complex Reactor concepts all over the web, but they are not particularly useful for production code, so here is a piece of code I wrote which sends a message to Kafka using reactor-kafka + Spring Boot:
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.kafka.sender.KafkaSender;
import reactor.kafka.sender.SenderOptions;
import reactor.kafka.sender.SenderRecord;
import reactor.kafka.sender.SenderResult;
import java.util.Properties;
public class CallbackSender {

    private ObjectMapper objectMapper;
    private String topic;

    private static final Logger log = LoggerFactory.getLogger(CallbackSender.class.getName());

    private final KafkaSender<String, String> sender;

    public CallbackSender(ObjectMapper objectMapper, Properties senderProps, String topic) {
        this.sender = KafkaSender.create(SenderOptions.create(senderProps));
        this.objectMapper = objectMapper;
        this.topic = topic;
    }

    public Mono<SenderResult<String>> sendMessage(ProcessContext<? extends AbstractMessage> processContext) throws JsonProcessingException {
        ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic,
                objectMapper.writeValueAsString(processContext.getMessage()));
        SenderRecord<String, String, String> senderRecord = SenderRecord.create(producerRecord, processContext.getId());
        return sender.send(Flux.just(senderRecord))
                .doOnError(e -> log.error("Send failed", e))
                .last();
    }
}
What I can't grasp is what exactly the difference is between calling this.sendMessage via .map vs .flatMap from the outer pipeline. What is the point of the explanation that map applies a synchronous transformation to the emitted element, if my function is not really doing anything synchronous apart from fetching a few basic fields?
Here the Kafka sender is already reactive and async, so it doesn't matter which one I use? Is that a correct assumption?
Is my code non-idiomatic?
Or, for this particular case, is wrapping everything I do inside .sendMessage in .flatMap just a safety measure in case someone adds synchronous code in the future, i.e. sugar-safety syntax?
My understanding is that .map will simply prepare the pipeline in this case, returning a Mono, and the subscriber of the outer calling pipeline will trigger the entire domino effect. Is that correct?
What I can't grasp is what exactly the difference is between calling this.sendMessage via .map vs .flatMap from the outer pipeline
map() applies a synchronous function (i.e. one "in-place" with no subscriptions or callbacks) and just returns the result as is. flatMap() applies an asynchronous transformer function, and unwraps the Publisher when done. So:
My understanding is that .map will simply prepare the pipeline in this case, returning a Mono, and the subscriber of the outer calling pipeline will trigger the entire domino effect. Is that correct?
Yes, that's correct (if by "domino effect" you mean that the returned Mono will be subscribed to and its result returned).
what is the point of the explanation that map applies a synchronous transformation to the emitted element, if my function is not really doing anything synchronous apart from fetching a few basic fields?
Quite simply, because that's what you've told it to do. There's nothing inherently asynchronous about setting up a publisher, just its execution once it's been subscribed to (which doesn't happen with a map() call.)
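To make the difference concrete, here is a hedged sketch (contexts and the illustrate method are made-up names; CallbackSender, ProcessContext and AbstractMessage are the types from your question, so this fragment would live alongside them):
// Sketch only: `contexts` stands for whatever Flux your outer pipeline produces.
static void illustrate(Flux<ProcessContext<? extends AbstractMessage>> contexts,
                       CallbackSender sender) {

    // map(): the lambda runs synchronously and merely returns the Mono built by sendMessage.
    // The result is a Flux<Mono<SenderResult<String>>>, and nothing is sent to Kafka
    // unless each inner Mono gets subscribed to somewhere.
    Flux<Mono<SenderResult<String>>> prepared = contexts.map(ctx -> {
        try {
            return sender.sendMessage(ctx);
        } catch (JsonProcessingException e) {
            return Mono.<SenderResult<String>>error(e);
        }
    });

    // flatMap(): the Mono returned by sendMessage is subscribed to and flattened, so the
    // send actually executes and the results arrive as a plain Flux<SenderResult<String>>.
    Flux<SenderResult<String>> sent = contexts.flatMap(ctx -> {
        try {
            return sender.sendMessage(ctx);
        } catch (JsonProcessingException e) {
            return Mono.<SenderResult<String>>error(e);
        }
    });
}
With map() you (or something further down the chain) would still have to subscribe to those inner Monos yourself; flatMap() does that for you, which is why it is the idiomatic choice when the mapping function already returns a Publisher.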
I used to do something like this:
HttpResponse res = req.response;
String dataReceived;
await req.listen((List<int> buffer) {
dataReceived = new String.fromCharCodes(buffer);
}).asFuture();
Map data = JSON.decode(dataReceived);
When I needed UTF8 support, I modified it to:
Map data = JSON.decode(await new Utf8Codec().decodeStream(request));
Kevin Moore suggested encoding/decoding like this:
https://dartpad.dartlang.org/1d229cfdc1c1fd2ab877
So I've got:
Map data;
await request.listen((List<int> buffer) {
data = JSON.fuse(UTF8).decode(buffer);
}).asFuture();
Not sure that I need the asFuture():
Map data;
await request.listen((List<int> buffer) => data = JSON.fuse(UTF8).decode(buffer));
Or do I? And this method requires that I encode it into bytes on the client side:
sendData: new JsonUtf8Encoder().convert({'model': message, 'authToken': app.authToken}))
What are the benefits of this? Isn't it more to send over the wire?
I believe Shelf and/or the new RPC lib would handle this stuff for me? Shall I move to one of those? Right now, it's all homegrown.
HttpRequest is a Stream<List<int>>. You don't want to use listen like that, because the callback fires once per chunk, so you'll only end up with one "chunk" of the data.
Instead you'll want to do something like this:
import 'dart:async';
import 'dart:convert';
main() async {
  var input = {'a': 1, 'b': 2};
  var decoder = JSON.fuse(UTF8).decoder;
  var json = await decoder.bind(toByteStream(input)).single;
  print(json);
}

Stream<List<int>> toByteStream(json) =>
    _encoder.bind(new Stream.fromIterable([json]));

final _encoder = new JsonUtf8Encoder();
https://dartpad.dartlang.org/9807d0c5ed89360c9f53
Yes. As you can see at https://github.com/dart-lang/shelf/blob/master/lib/src/message.dart#L136, shelf defaults to UTF-8.
I am likely biased but I would definitely recommend moving over to shelf. You have several options depending on what you prefer, like:
shelf_rpc, as you mentioned. I haven't used it, but it likely offers full-featured API support.
shelf_bind, if you simply want to bind a handler function parameter to a JSON body. This is lower level, more flexible and less prescriptive, but does less. e.g.
router.post('/foo', (@RequestBody() Foo foo) => ...)
shelf_rest. Adds higher-level, more prescriptive API support (similar to shelf_rpc).
Full frameworks like redstone, mojito etc. These do more for you, but you need to buy into more.
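For completeness, the UTF-8 default mentioned above means a plain shelf handler can stay very small. A minimal sketch (the route, port and response text are arbitrary):
import 'dart:convert';

import 'package:shelf/shelf.dart' as shelf;
import 'package:shelf/shelf_io.dart' as io;

main() async {
  handler(shelf.Request request) async {
    // shelf decodes the request body as UTF-8 by default.
    var body = await request.readAsString();
    var data = JSON.decode(body);
    return new shelf.Response.ok('Received: $data');
  }

  await io.serve(handler, 'localhost', 8080);
}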
Had a chat with Kevin to better understand his answer, and thought it best to share my learnings as a new answer.
HttpRequest is always a Stream<List<int>> – a streamed list of integers. Those integers are bytes, and this is commonly referred to as a bytestream. You can be sure that no matter what API you use to send data over the wire, it is sent as a bytestream.
The HttpRequest.request() method accepts sendData in several forms...
* If specified, `sendData` will send data in the form of a [ByteBuffer],
* [Blob], [Document], [String], or [FormData] along with the HttpRequest.
Source:
https://api.dartlang.org/apidocs/channels/stable/dartdoc-viewer/dart:html.HttpRequest#id_request
...but these are just abstractions, and ultimately your data is sent as a Stream<List<int>> bytestream.
So on the server we first set up a decoder that will decode both JSON and UTF8 (for correct char handling), and then we bind that to the HttpRequest request, which is a bytestream. I think single just serves to ensure we throw an exception if we received more than one data event. Here's all the code we need to interpret an HttpRequest:
import 'dart:async';
import 'dart:convert';
import 'dart:io';

Future handleRequest(HttpRequest request) async {
  var decoder = JSON.fuse(UTF8).decoder;
  var data = await decoder.bind(request).single;
  print('The decoded data received is:\n\n$data');
}
I'm using OpenDDS v3.6, and trying to send a message to a specific DDS peer, one of many. In the IDL, the message structure looks like the following:
module Test
{
#pragma DCPS_DATA_TYPE "Test::MyMessage"
#pragma DCPS_DATA_KEY "Test::MyMessage dest_id"
struct MyMessage {
short dest_id;
string txt;
};
};
My understanding is that because the data key is unique, this is a new instance of the topic being written to, and any further messages written with the same data key go to this specific instance of the topic. My send code is as follows:
DDS::ReturnCode_t ret;
Test::MyMessage msg;
// populate msg
msg.dest_id = n;
DDS::InstanceHandle_t handle;
handle = msg_writer->register_instance(msg);
ret = msg_writer->write(msg, handle);
So now I need to figure out how to get the receiving peer to read only from this topic instance and not receive all the other messages being sent to other peers. I started with the following, but I'm not sure how to properly select a specific topic instance.
DDS::InstanceHandle_t instance;
status = msg_dr->take_next_instance(spec, si, 1, DDS::ANY_SAMPLE_STATE,
DDS::ANY_VIEW_STATE, DDS::ANY_INSTANCE_STATE);
Any help much appreciated.
The easiest way to achieve what you are looking for is by using a ContentFilteredTopic. This class is a specialization of the TopicDescription class and allows you to specify an expression (like a SQL WHERE-clause) of the samples that you are interested in.
Suppose you want your DataReader to only receive samples with dest_id equal to 42, then the corresponding code for creating the ContentFilteredTopic would look something like
DDS::ContentFilteredTopic_var cft =
participant->create_contentfilteredtopic("MyTopic-Filtered",
topic,
"dest_id = 42",
StringSeq());
From there on, you create your DataReader using cft as the parameter for the TopicDescription. The resulting reader will look like a regular DataReader, except that it only receives the desired samples and nothing else. Since the field dest_id happens to be the field that identifies the instance, the end result is that you will only have one instance in your DataReader.
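For example, the reader creation could look roughly like this (a sketch: `subscriber` and the generated Test::MyMessageDataReader type are assumed to exist as in the usual OpenDDS setup code):
// Create the DataReader on the filtered topic instead of the plain topic.
DDS::DataReader_var reader =
  subscriber->create_datareader(cft.in(),
                                DATAREADER_QOS_DEFAULT,
                                DDS::DataReaderListener::_nil(),
                                OpenDDS::DCPS::DEFAULT_STATUS_MASK);

// Narrow to the type-specific reader generated from your IDL and use it as usual.
Test::MyMessageDataReader_var msg_dr =
  Test::MyMessageDataReader::_narrow(reader.in());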
You can check out the DDS specification (section 7.1.2.3.3) or OpenDDS Developer's Guide (section 5.2) for more details.
I send an entity from an iOS client and it is processed by the following backendAPI method:
@ApiMethod(name = "dataInserter.insertData", path = "insertData", httpMethod = "post")
public customEntity insertData(customEntity userInput) {
  ofy().save().entity(userInput).now();
  return userInput;
}
customEntity is defined within customEntity.java as follows:
// Import statements here
@Entity
public class customEntity {
  @Id public String someID;
  @Index String providedData;
}
After the above code runs, datastore contains the following entry:
ID/Name providedData
id=5034... <null>
If I add the following lines to my method:
customEntity badSoup=new customEntity();
badSoup.providedData="I am exhausted";
ofy().save().entity(badSoup).now();
I see the following in the datastore after I run the code:
ID/Name providedData
id=5034... I am exhausted
In a post similar to this one, the poster -- Drux -- concludes "...assignments to @Indexed properties only have actual effects on indices (and hence queries) if they are carried out directly with Objectify on the server (not indirectly on iOS clients and then passed to the server with Google Cloud Endpoints)." stickfigure then responds, "It sounds like what you're saying is 'cloud endpoints is not reconstituting your SomeEntity object correctly'. Objectify is not involved; it just saves whatever you give it."
It's hard to tell whether stickfigure is correct, especially given that the same problem described above still occurs when I explore my API using Google's APIs Explorer.
Is anyone able to explain what's causing this or is Drux's conclusion correct?
I am working on implementing the Gatling tool to load test a few of our RESTful web API methods,
but for some reason I am not successful in parameterizing my input data into the URI.
I am getting an "i.g.h.a.AsyncHandlerActor" error.
It would be great if I were able to see exactly what the final URI was that the call was made to.
Below is my Scala code for one of the methods to be load tested:
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
import java.util.concurrent.ThreadLocalRandom
class StockRun3 extends Simulation {

  val httpConf = http
    .baseURL("http://xxx.xxx.xx.xx:95/v9/stk")
    .acceptHeader("application/json")
    .authorizationHeader("appKeyToken=XXXXXXX&appKey=YYYYYYYYYY")

  object Search {
    val Datafeeder = csv("StockDataSource2.csv").random
    val search = feed(Datafeeder)
      .exec(http("Search")
        .get("/availability")
        .queryParam("""productIds""", """${product}""")
        .queryParam("""ocationIds""", """${store}""")
      )
      .pause(1)
  }

  val users = scenario("Users").exec(Search.search)

  setUp(
    users.inject(nothingFor(4 seconds),
      atOnceUsers(10),
      rampUsers(10) over (60 seconds),
      constantUsersPerSec(2) during (30 seconds))
  ).protocols(httpConf)
}
As explained in the documentation, lower the logging level in logback.xml so that the requests being sent (including the final URI) are logged to the console.
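For example, a logback.xml along these lines will do it (a sketch: the io.gatling.http.ahc and io.gatling.http.response logger names are the ones used by Gatling 2.x, so adjust them for your version):
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%-5level] %logger{15} - %msg%n%rEx</pattern>
    </encoder>
  </appender>

  <!-- TRACE logs all HTTP requests/responses; DEBUG logs only the failing ones -->
  <logger name="io.gatling.http.ahc" level="TRACE" />
  <logger name="io.gatling.http.response" level="TRACE" />

  <root level="WARN">
    <appender-ref ref="CONSOLE" />
  </root>
</configuration>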
Then, you have a typo in the "ocationIds" queryParam.