Bidirectional gRPC stream sometimes stops processing responses after stopping and starting - iOS

In short
We have a mobile app that streams fairly high volumes of data to and from a server through various bidirectional streams. The streams need to be closed on occasion (for example when the app is backgrounded). They are then reopened as needed. Sometimes when this happens, something goes wrong:
From what I can tell, the stream is up and running on the device's side (the status of both the GRPCProtoCall and the GRXWriter involved is either started or paused)
The device sends data on the stream fine (the server receives the data)
The server seems to send data back to the device fine (the server's Stream.Send calls return as successful)
On the device, the result handler for data received on the stream is never called
More detail
Our code is heavily simplified below, but this should hopefully provide enough detail to indicate what we're doing. A bidirectional stream is managed by a Switch class:
class Switch {
/** The protocall over which we send and receive data */
var protocall: GRPCProtoCall?
/** The writer object that writes data to the protocall. */
var writer: GRXBufferedPipe?
/** A static GRPCProtoService as per the .proto */
static let service = APPDataService(host: Settings.grpcHost)
/** A response handler. APPData is the datatype defined by the .proto. */
func rpcResponse(done: Bool, response: APPData?, error: Error?) {
NSLog("Response received")
// Handle response...
}
func start() {
// Create a (new) instance of the writer
// (A writer cannot be used on multiple protocalls)
self.writer = GRXBufferedPipe()
// Setup the protocall
self.protocall = Switch.service.rpcToStream(withRequestWriter: self.writer!, eventHandler: self.rpcResponse(done:response:error:))
// Start the stream
self.protocall?.start()
}
func stop() {
// Stop the writer if it is started.
if self.writer?.state == .started || self.writer?.state == .paused {
self.writer?.finishWithError(nil)
}
// Stop the proto call if it is started
if self.protocall?.state == .started || self.protocall?.state == .paused {
protocall?.cancel()
}
self.protocall = nil
}
private var needsRestart: Bool {
if let protocall = self.protocall {
if protocall.state == .notStarted || protocall.state == .finished {
// protocall exists, but isn't running.
return true
} else if self.writer == nil || self.writer?.state == .notStarted || self.writer?.state == .finished {
// writer is missing or isn't running
return true
} else {
// protocall and writer are running
return false
}
} else {
// protocall doesn't exist.
return true
}
}
func restartIfNeeded() {
guard self.needsRestart else { return }
self.stop()
self.start()
}
func write(data: APPData) {
self.writer?.writeValue(data)
}
}
Like I said, heavily simplified, but it shows how we start, stop, and restart streams, and how we check whether a stream is healthy.
When the app is backgrounded, we call stop(). When it is foregrounded and we need the stream again, we call start(). And we periodically call restartIfNeeded(), e.g. when screens that use the stream come into view.
As I mentioned above, what happens occasionally is that our response handler (rpcResponse) stops getting called when the server writes data to the stream. The stream appears to be healthy (the server receives the data we write to it, and protocall.state is neither .notStarted nor .finished). But not even the log on the first line of the response handler is executed.
First question: Are we managing the streams correctly, or is our way of stopping and restarting streams prone to errors? If so, what is the correct way of doing something like this?
Second question: How do we debug this? Everything we could think of that we can query for a status tells us that the stream is up and running, but it feels like the Objective-C gRPC library keeps a lot of its mechanics hidden from us. Is there a way to see whether responses from the server do reach us, but fail to trigger our response handler?
Third question: As per the code above, we use the GRXBufferedPipe provided by the library. Its documentation advises against using it in production because it doesn't have a push-back mechanism. To our understanding, the writer is only used to feed data to the gRPC core in a synchronised, one-at-a-time fashion, and since the server receives data from us fine, we don't think this is an issue. Are we wrong, though? Is the writer also involved in feeding data received from the server to our response handler? I.e. if the writer broke due to overload, could that manifest as a problem reading data from the stream, rather than writing to it?
UPDATE: Over a year after asking this, we finally found a deadlock bug in our server-side code that was causing this behaviour on the client side. The streams appeared to hang because no communication sent by the client was handled by the server, and vice versa, but the streams were actually alive and well. The accepted answer provides good advice on how to manage these bidirectional streams, which I believe is still valuable (it helped us a lot!). But the issue was actually due to a programming error.
Also, for anyone running into this type of issue, it might be worth investigating whether you're experiencing this known issue where a channel gets silently dropped when iOS changes its network. This readme provides instructions for using Apple's CFStream API rather than TCP sockets as a possible fix for that issue.

First question: Are we managing the streams correctly, or is our way of stopping and restarting streams prone to errors? If so, what is the correct way of doing something like this?
From what I can tell by looking at your code, the start() function seems to be right. In the stop() function, you do not need to call cancel() on self.protocall; the call will be finished by the preceding self.writer.finishWithError(nil).
needsRestart is where it gets a bit messy. First, you are not supposed to poll/set the state of protocall yourself; that state is managed by the call itself. Second, setting that state does not close your stream. It only pauses a writer, and if the app is in the background, pausing a writer is effectively a no-op. If you want to close a stream, you should use finishWithError to terminate the call, and maybe start a new call later when needed.
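The lifecycle described above (finish the call to close it, start a fresh call when needed, and let the library own the state) can be sketched in a language-neutral way. The Python below assumes a hypothetical FakeCall standing in for a GRPCProtoCall plus its request writer; it is not the gRPC Objective-C API. Liveness is tracked from the handler's done flag rather than by polling state. One design choice goes beyond the answer: each handler is tagged with a generation number, so a late event from a replaced call cannot mark the new call as dead.

```python
from typing import Callable, List, Optional

# Handler shape mirrors the question: (done, response, error).
EventHandler = Callable[[bool, Optional[bytes], Optional[Exception]], None]


class FakeCall:
    """Hypothetical stand-in for a GRPCProtoCall and its request writer."""

    def __init__(self, handler: EventHandler):
        self._handler = handler
        self.finished = False

    def deliver(self, message: bytes) -> None:
        # Simulates a server message arriving on the receive path.
        if not self.finished:
            self._handler(False, message, None)

    def finish(self, error: Optional[Exception] = None) -> None:
        # Simulates the call ending, e.g. after writer.finishWithError(nil).
        if not self.finished:
            self.finished = True
            self._handler(True, None, error)


class Switch:
    def __init__(self, call_factory: Callable[[EventHandler], FakeCall]):
        self._call_factory = call_factory
        self._call: Optional[FakeCall] = None
        self._generation = 0
        self._alive = False
        self.received: List[bytes] = []

    @property
    def call(self) -> Optional[FakeCall]:
        return self._call

    def start(self) -> None:
        # Each start creates a fresh call whose handler is tagged with a generation.
        self._generation += 1
        gen = self._generation
        self._call = self._call_factory(
            lambda done, resp, err: self._on_event(gen, done, resp, err))
        self._alive = True

    def stop(self) -> None:
        # Closing means finishing the call, not changing its state by hand.
        if self._call is not None:
            self._call.finish()
        self._call = None
        self._alive = False

    def restart_if_needed(self) -> None:
        # Liveness comes from the handler's done flag, not from polling state.
        if not self._alive:
            self.stop()
            self.start()

    def _on_event(self, gen: int, done: bool, response: Optional[bytes],
                  error: Optional[Exception]) -> None:
        if gen != self._generation:
            return  # late event from a call that has already been replaced
        if response is not None:
            self.received.append(response)
        if done:
            self._alive = False


def demo() -> List[bytes]:
    switch = Switch(FakeCall)
    switch.start()
    first = switch.call
    first.deliver(b"a")
    first.finish()              # the call ends on its own, e.g. with an error
    switch.restart_if_needed()  # noticed via the done flag: a fresh call starts
    switch.call.deliver(b"b")
    old = switch.call
    switch.start()              # replaced without stop(): old handler is stale
    old.deliver(b"stale")       # dropped by the generation check
    switch.stop()
    return switch.received
```

In demo(), only b"a" and b"b" are recorded: the stale message from the replaced call is ignored.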
Second question: How do we debug this?
One way is to turn on gRPC logging with the GRPC_TRACE and GRPC_VERBOSITY environment variables. Another way is to set a breakpoint at the point where the gRPC Objective-C library receives a gRPC message from the server.
Third question: Is the writer also involved in feeding data received from server to our response handler?
No. If you create a buffered pipe and pass it as the request writer of your call, it only feeds data to be sent to the server. The receiving path is handled by another writer (which is in fact your protocall object).
I don't see where the usage of GRXBufferedPipe in production is discouraged. The known drawback of this utility is that if you pause the writer but keep writing data to it with writeValue, you end up buffering a lot of data without being able to flush it, which may cause memory issues.
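To make the drawback concrete, here is a hedged Python sketch of what a buffer with push-back could look like: while paused, writes are accepted only up to a fixed limit and then refused, so the producer learns it must slow down instead of buffering without bound. BoundedPipe, write_value, and max_pending are hypothetical names, not part of the gRPC library.

```python
from collections import deque
from typing import Deque, List, Tuple


class BoundedPipe:
    """Hypothetical buffered pipe with push-back; not GRXBufferedPipe itself."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.paused = False
        self.pending: Deque[bytes] = deque()
        self.sent: List[bytes] = []

    def write_value(self, value: bytes) -> bool:
        # While paused, accept only up to max_pending values, then refuse.
        if self.paused:
            if len(self.pending) >= self.max_pending:
                return False  # push-back: the producer must drop, coalesce or wait
            self.pending.append(value)
            return True
        self.sent.append(value)
        return True

    def resume(self) -> None:
        # Flush what was buffered while paused, in order.
        self.paused = False
        while self.pending:
            self.sent.append(self.pending.popleft())


def demo() -> Tuple[List[bool], List[bytes]]:
    pipe = BoundedPipe(max_pending=2)
    pipe.paused = True  # e.g. the app went to the background
    accepted = [pipe.write_value(bytes([i])) for i in range(4)]
    pipe.resume()
    return accepted, pipe.sent
```

The boolean return value is the push-back signal; an unbounded pipe would have accepted all four writes and kept them in memory.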

Related

Distinct Stream in Dart

I'm writing a Flutter app which sends commands via Bluetooth (FlutterBlue) to a device. The device controls some LEDs.
In general the communication works quite well, but:
On the UI I have a slider controlling the light intensity. When I drag the slider, it generates more values than the Bluetooth backend can handle.
In my first implementation I was sending the data directly to the Bluetooth characteristic, which resulted in exceptions from the Bluetooth backend and some lost values. It's hard to fade the light down to zero.
In my second approach I'm using a stream and an await for loop to send the data. Now all values are sent without any exceptions, but it takes several seconds after releasing the slider until all of them have been sent. Since I want direct visual feedback on the LEDs, this is not an option.
Since there are multiple commands of the same type to be sent, I can skip all commands of the same type that were added while the Bluetooth send routine was processing a write event.
I saw that there is a Stream.distinct method, but it returns a new stream, so I would have to exit my await for loop and handle the new stream.
Is there a way of removing undesired events from an existing stream without creating a new stream that I have to listen to?
Here is what I'm doing:
class MyBlueToothDevice {
BluetoothDevice _device;
List<BluetoothCharacteristic> _characteristics =
<BluetoothCharacteristic>[];
final _sendStream = StreamController<Tuple2<SendCommands, List<int>>>();
MyBlueToothDevice(this._device) {
_writeNext();
}
Future<void> write(SendCommands command, List<int> value) async {
if (isConnected) {
_sendStream.add(Tuple2<SendCommands, List<int>>(command, value));
// await _characteristics[command.index].write(value).catchError((value) {
// print("Characteristics.write error: $value");
// });
}
}
Future<void> _writeNext() async {
await for (var tuple in _sendStream.stream) {
await _characteristics[tuple.item1.index]
.write(tuple.item2)
.catchError((value) {
print("Characteristics.write error: $value");
});
}
}
}
The best solution is to use application state management to receive all the events from your slider. The state manager will then rate-limit the messages to the device to something it can handle, and also ensure that the most recent message is not lost.
A very basic solution would receive the slider value and update the value in the state manager. A periodic timer with a suitable rate could then send that value to the device, possibly only if the value actually changed since the last time it was sent.
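A minimal sketch of that idea, written in Python with asyncio rather than Dart: slider events only overwrite a single "latest value" slot, and a periodic loop sends that value if it changed since the last send. RateLimitedSender and the send callback are hypothetical; in the app, send would wrap the characteristic write, and the interval is an assumption to tune to what the Bluetooth backend can sustain.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class RateLimitedSender:
    """Keeps only the latest slider value and sends it on a fixed tick."""

    def __init__(self, send: Callable[[int], Awaitable[None]], interval: float):
        self._send = send
        self._interval = interval
        self._latest: Optional[int] = None
        self._last_sent: Optional[int] = None

    def update(self, value: int) -> None:
        # Called for every slider event; cheap and never touches the device.
        self._latest = value

    async def tick(self) -> None:
        # Send only if there is a value and it changed since the last send.
        if self._latest is not None and self._latest != self._last_sent:
            value = self._latest
            await self._send(value)
            self._last_sent = value

    async def run(self, stop: asyncio.Event) -> None:
        # At most one write is in flight: the next tick waits for the previous send.
        while not stop.is_set():
            await self.tick()
            await asyncio.sleep(self._interval)


async def burst_then_tick() -> List[int]:
    sent: List[int] = []

    async def fake_send(v: int) -> None:
        sent.append(v)

    s = RateLimitedSender(fake_send, interval=0.01)
    for v in range(100):   # a fast slider drag
        s.update(v)
    await s.tick()         # sends only the latest value
    await s.tick()         # unchanged, so nothing is sent
    s.update(0)
    await s.tick()
    return sent


async def run_loop() -> List[int]:
    sent: List[int] = []

    async def fake_send(v: int) -> None:
        sent.append(v)

    s = RateLimitedSender(fake_send, interval=0.001)
    stop = asyncio.Event()
    task = asyncio.create_task(s.run(stop))
    for v in range(50):
        s.update(v)
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return sent
```

Because the most recent value always replaces older ones, the final slider position is never lost, and the backlog that caused the multi-second delay cannot build up.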

Are WebSocket messages cached on iOS?

Strangely, I cannot find any information on this: if I write a [large] message to the WebSocket stream on iOS and execution returns to my code, has the message already been sent, or is it buffered somehow?
I'm using the Starscream library, but it just uses CFStreams.
Looking at the source code for the Starscream library mentioned, the library appends the send operation to an OperationQueue (NSOperationQueue):
private func dequeueWrite(..) {
...
writeQueue.addOperation(operation)
}
and then immediately returns.
So when one of the send methods returns, for example:
open func write(data: Data, completion: (() -> ())? = nil)
The message will not yet have been sent.
But as you can see, you can pass a completion block to this method that will be called when the whole message has been written to the underlying output stream. Note that this doesn't tell you anything about whether the message has actually been sent on the network, or whether the receiver has received it successfully.
To know whether the receiver has received and processed the message successfully, you need to wait for a response message; that is something you need to define in your application protocol.
Before using the Starscream library in production, you might want to report or fix some issues in it. While reviewing the send mechanism, I noticed that if the OutputStream buffer is full on WebSocket.swift line 1254, the library tries sending the rest of the buffer in a busy loop rather than waiting for a hasSpaceAvailable event. This may waste a lot of CPU cycles if you send a large message.
Also, it looks like the case when stream.write returns 0, indicating that the output buffer is full, is incorrectly handled as an error.
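The non-busy alternative can be sketched as follows, in Python and not Starscream's actual code. FakeOutputStream is a hypothetical stand-in whose write() follows the CFWriteStreamWrite return convention (bytes written, 0 when full, -1 on error). A return of 0 parks the remaining bytes until the next has-space-available event instead of spinning or being treated as an error; only -1 is a failure.

```python
from typing import Tuple


class FakeOutputStream:
    """Hypothetical fixed-capacity stream: write() returns n, 0 if full, -1 on error."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = bytearray()
        self.is_open = True

    def write(self, data: bytes) -> int:
        if not self.is_open:
            return -1
        n = min(len(data), self.capacity - len(self.buffer))
        self.buffer += data[:n]
        return n

    def drain(self) -> bytes:
        # Simulates the OS pushing buffered bytes onto the network.
        out = bytes(self.buffer)
        self.buffer.clear()
        return out


class EventDrivenWriter:
    def __init__(self, stream: FakeOutputStream):
        self.stream = stream
        self.pending = b""
        self.failed = False

    def send(self, data: bytes) -> None:
        self.pending += data
        self._flush()

    def on_has_space_available(self) -> None:
        # Called by the run loop when the stream reports free space.
        self._flush()

    def _flush(self) -> None:
        while self.pending:
            n = self.stream.write(self.pending)
            if n < 0:
                self.failed = True  # a real error: stream closed or failed
                return
            if n == 0:
                return  # full, not an error: wait for the next has-space event
            self.pending = self.pending[n:]


def demo() -> Tuple[bytes, bool]:
    stream = FakeOutputStream(capacity=4)
    writer = EventDrivenWriter(stream)
    writer.send(b"hello world")
    wire = stream.drain()
    while writer.pending and not writer.failed:
        writer.on_has_space_available()
        wire += stream.drain()
    return wire, writer.failed
```

No CPU is spent while the stream is full: the writer simply returns and is resumed by the next space event.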
It probably uses
func CFWriteStreamWrite(_ stream: CFWriteStream!,
_ buffer: UnsafePointer<UInt8>!,
_ bufferLength: CFIndex) -> CFIndex
The write call returns "The number of bytes successfully written, 0 if the stream has been filled to capacity (for fixed-length streams), or -1 if either the stream is not open or an error occurs."
So yes, messages are buffered. But I think that is the only option: a write function needs a buffer because every socket has a maximum buffer size.

Dart: Do I have to cancel Stream subscriptions and close StreamSinks?

I know I have to cancel Stream Subscriptions when I no longer want to receive any events.
Do I have to do this even after I receive a 'done' event? Or will I get memory leaks?
What happens to Streams that are passed to addStream of another Stream? Are they automatically canceled?
Same question on the StreamSink side: do I have to close them if the stream is already done?
Short answer: no, but you should. Nothing in the contract of either StreamSubscription or StreamSink requires closing the resources, but some use cases can lead to memory leaks if you don't close them, even though in some cases doing so might be confusing. Part of the confusion around these classes is that they are overloaded and handle two fairly distinct use cases:
Resource streams (like file I/O, network access)
Event streams (like click handlers)
Let's tackle these subjects one at a time, first, StreamSubscription:
StreamSubscription
When you listen to a Stream, you receive a StreamSubscription. In general, when you are done listening to that Stream, for any reason, you should cancel the subscription. Not all streams will leak memory if you choose not to, but some will; for example, if you are reading input from a file, not cancelling the subscription means the handle to the file may remain open.
So, while not strictly required, I'd always cancel when done accessing the stream.
StreamSink
The most common implementation of StreamSink is StreamController, which is a programmatic interface to creating a Stream. In general, when your stream is complete (i.e. all data emitted), you should close the controller.
Here is where it gets a little confusing. Let's look at those two cases:
File I/O
Imagine you were creating an API to asynchronously read a File line-by-line:
Stream<String> readLines(String path);
To implement this, you might use a StreamController:
Stream<String> readLines(String path) {
SomeFileResource someResource;
StreamController<String> controller;
controller = new StreamController<String>(
onListen: () {
someResource = new SomeFileResource(path);
// TODO: Implement adding to the controller.
},
);
return controller.stream;
}
In this case, it would make lots of sense to close the controller when the last line has been read. This gives a signal to the user (a done event) that the file has been read, and is meaningful (you can close the File resource at that time, for example).
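A rough analogue in Python rather than Dart: an async generator plays the role of the StreamController-backed stream, its finally block plays the role of releasing the file resource, and aclose() plays the role of StreamSubscription.cancel(). FakeFileResource is a hypothetical stand-in for SomeFileResource. The sketch only illustrates that the resource is released on normal completion (the done case) or on explicit cancellation, so a consumer that stops early should cancel.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


class FakeFileResource:
    """Hypothetical stand-in for SomeFileResource."""

    def __init__(self, lines: List[str]):
        self.lines = lines
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def read_lines(resource: FakeFileResource) -> AsyncIterator[str]:
    try:
        for line in resource.lines:
            await asyncio.sleep(0)  # pretend each line needs async I/O
            yield line
    finally:
        # Runs after the last line (the done case) and on early aclose().
        resource.close()


async def demo() -> Tuple[List[str], bool, bool, bool]:
    whole = FakeFileResource(["a", "b"])
    lines = [line async for line in read_lines(whole)]

    partial = FakeFileResource(["a", "b", "c"])
    gen = read_lines(partial)
    await gen.__anext__()   # the consumer reads one line, then loses interest
    open_before_cancel = not partial.closed
    await gen.aclose()      # the analogue of StreamSubscription.cancel()
    return lines, whole.closed, open_before_cancel, partial.closed
```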
Events
Imagine you were creating an API to listen to news articles on HackerNews:
Stream<String> readHackerNews();
Here it makes less sense to close the underlying sink/controller. Does HackerNews ever stop? Event streams like this (or click handlers in UI programs) don't traditionally "stop" without the user asking for it (i.e. cancelling the StreamSubscription).
You could close the controller when you are done, but it's not required.
Hope that makes sense and helps you out!
I found in my case that if I have code like this:
Stream<String> readHackerNews(String path) {
StreamController<String> controller = StreamController<String>();
......
return controller.stream;
}
I see a warning message "Close instance of 'dart.core.Sink'." in the Visual Studio Code.
In order to fix this warning I added
controller.close()
to the controller's onCancel callback, as shown below:
Stream<String> readHackerNews(String path) {
StreamController<String> controller = StreamController<String>();
//TODO: your code here
controller.onCancel = () {
controller.close();
};
return controller.stream;
}
Hope this helps!

Are these two Observable Operations Equivalent?

For some reason, when I use the observable created via Concat, I always get all the values pushed from my list (works as intended), whereas with the plain subscribe some values never reach the subscribers of the observable (only under certain conditions).
These are the two cases I am using. Could anyone explain why, in certain cases, not all values are received when subscribing to the second version? Are they not equivalent? The intent here is to rewind the stream. What could explain why Case 2 fails while Case 1 does not?
Replay here is just a list of the ongoing stream.
Case 1.
let observable =
Observable.Create(fun (o:IObserver<'a>) ->
let next b =
for v in replay do
o.OnNext(v.Head)
o.OnNext(b)
o.OnCompleted()
someOtherObs.Subscribe(next, o.OnError, o.OnCompleted))
let toReturn = observable.Concat(someOtherObs).Publish().RefCount()
Case 2.
let toReturn =
Observable.Create(fun (o:IObserver<'a>) ->
for v in replay do
o.OnNext(v.Head)
someOtherObs.Subscribe(o)
).Publish().RefCount()
Caveat! I don't use F# regularly enough to be 100% comfortable with the syntax, but I think I see what's going on.
That said, both of these cases look odd to me and it greatly depends on how someOtherObs is implemented, and where (in terms of threads) things are running.
Case 1 Analysis
You apply concat to a source stream which appears to work like this:
It subscribes to someOtherObs, and in response to the first event (a) it pushes the elements of replay to the observer.
Then it sends event (a) to the observer.
Then it completes. At this point the stream is finished and no further events are sent.
In the event that someOtherObs is empty or just has a single error, this will be propagated to the observer instead.
Now, when this stream completes, someOtherObs is concatenated on to it. What happens now is a little unpredictable: if someOtherObs is cold, then the first event would be sent a second time; if someOtherObs is hot, then the first event is not resent, but there's a potential race condition around which event of the remainder will go next, which depends on how someOtherObs is implemented. You could easily miss events if it's hot.
Case 2 Analysis
You replay all the replay events, and then send all the events of someOtherObs - but again there's a race condition if someOtherObs is hot because you only subscribe after pushing replay, and so might miss some events.
Comments
In either case, it seems messy to me.
This looks like an attempt to merge a state of the world (sotw) with a live stream. In this case, you need to subscribe to the live stream first, and cache any events while you then acquire and push the sotw events. Once sotw is pushed, you push the cached events (being careful to de-dupe events that may have been read in the sotw) until you are caught up with live, at which point you can just pass live events through.
You can often get away with naive implementations that flush the live cache in an OnNext handler of the live stream subscription, effectively blocking the source while you flush - but you run the risk of applying too much back pressure to the live source if you have a large history and/or a fast moving live stream.
Some considerations for you to think on that will hopefully set you on the right path.
For reference, here is an extremely naïve and simplistic C# implementation I knocked up that compiles in LINQPad with the Rx-Main NuGet package. Production-ready implementations I have done in the past can get quite complex:
void Main()
{
// asynchronously produce a list from 1 to 10
Func<Task<List<int>>> sotw =
() => Task<List<int>>.Run(() => Enumerable.Range(1, 10).ToList());
// a stream of 5 to 15 (Observable.Range takes a start value and a count)
var live = Observable.Range(5, 11);
// outputs 1 to 15
live.MergeSotwWithLive(sotw).Subscribe(Console.WriteLine);
}
// Define other methods and classes here
public static class ObservableExtensions
{
public static IObservable<TSource> MergeSotwWithLive<TSource>(
this IObservable<TSource> live,
Func<Task<List<TSource>>> sotwFactory)
{
return Observable.Create<TSource>(async o =>
{
// Naïve indefinite caching, no error checking anywhere
var liveReplay = new ReplaySubject<TSource>();
live.Subscribe(liveReplay);
// No error checking, no timeout, no cancellation support
var sotw = await sotwFactory();
foreach(var evt in sotw)
{
o.OnNext(evt);
}
// note naive disposal
// and extremely naive de-duping (it really needs to compare
// on some unique id)
// we are only supporting disposal once the sotw is sent
return liveReplay.Where(evt => !sotw.Any(s => s.Equals(evt)))
.Subscribe(o);
});
}
}
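The C# sketch de-dupes by value equality, which its own comment flags as naive. A hedged variant, in Python for brevity, de-dupes on an id instead. It assumes each event carries a hypothetical, monotonically increasing seq assigned by the source, that the snapshot is complete up to its highest seq, and that live events arrive in seq order. Live events are cached until the snapshot has been pushed, then flushed, skipping any the snapshot already covered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    seq: int      # hypothetical unique, increasing id assigned by the source
    payload: str


class SotwLiveMerger:
    def __init__(self, emit: Callable[[Event], None]):
        self._emit = emit
        self._cache: List[Event] = []
        self._caught_up = False
        self._last_seq = -1

    def on_live(self, evt: Event) -> None:
        # Live is subscribed first; buffer it until the snapshot is pushed.
        if self._caught_up:
            self._forward(evt)
        else:
            self._cache.append(evt)

    def on_snapshot(self, snapshot: List[Event]) -> None:
        for evt in snapshot:
            self._forward(evt)
        # Flush cached live events; ones covered by the snapshot are skipped.
        for evt in self._cache:
            self._forward(evt)
        self._cache.clear()
        self._caught_up = True

    def _forward(self, evt: Event) -> None:
        if evt.seq > self._last_seq:  # de-dupe on id, not on payload equality
            self._last_seq = evt.seq
            self._emit(evt)


def demo() -> List[int]:
    out: List[Event] = []
    merger = SotwLiveMerger(out.append)
    for seq in (5, 6, 7):           # live events arrive while the snapshot loads
        merger.on_live(Event(seq, f"e{seq}"))
    merger.on_snapshot([Event(seq, f"e{seq}") for seq in range(1, 7)])
    merger.on_live(Event(8, "e8"))  # after catch-up, live passes straight through
    return [e.seq for e in out]
```

Like the C# version, the cache here is unbounded and there is no error handling; those are the parts that make production implementations complex.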

Dart Web Server: prevent crash

I'd like to develop a web services + WebSockets server using Dart, but the problem is that I can't ensure the server's high availability because of uncaught exceptions in isolates.
Of course, I have wrapped my main function in try/catch, but this is not enough.
If an exception occurs in the then() part of a future, the server will crash.
This means that ONE flawed request can take the server down.
I realize that this is an open issue, but is there any way to catch any crash WITHOUT crashing the VM, so that the server can continue serving other requests?
Thank you.
What I've done in the past is use the main isolate to launch a child isolate which hosts the actual web server. When you launch an isolate, you can pass in an "uncaught exception" handler to the child isolate (I also think you should be able to register one at the top-level as well, to prevent this particular issue, as referenced by the issue in the original question).
Example:
import 'dart:isolate';
void main() {
// Spawn a child isolate
spawnFunction(isolateMain, uncaughtExceptionHandler);
}
void isolateMain() {
// this is the "real" entry point of your app
// setup http servers and listen etc...
}
bool uncaughtExceptionHandler(ex) {
// TODO: add logging!
// respawn a new child isolate.
spawnFunction(isolateMain, uncaughtExceptionHandler);
return true; // we've handled the uncaught exception
}
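The same supervise-and-respawn idea, sketched in Python with asyncio rather than Dart isolates: a hypothetical supervise() runs a worker, logs any uncaught exception, and starts a fresh worker. The max_restarts cap is an added assumption so that a persistently failing worker does not restart forever. Unlike isolates, an asyncio task shares memory with its supervisor, so this only illustrates the control flow, not the isolation.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List


async def supervise(worker: Callable[[], Awaitable[None]], max_restarts: int) -> int:
    """Run worker; on an uncaught exception, log it and start a fresh worker.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            await worker()
            return restarts  # the worker finished normally
        except Exception:
            logging.exception("worker crashed")
            if restarts >= max_restarts:
                raise  # give up instead of crash-looping forever
            restarts += 1


async def demo() -> int:
    attempts: List[int] = []

    async def flaky_server() -> None:
        attempts.append(1)
        if len(attempts) < 3:
            raise RuntimeError("one flawed request")

    return await supervise(flaky_server, max_restarts=5)
```

Here the first two runs crash and the third succeeds, so demo() reports two restarts.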
Chris Buckett gave you a good way to restart your server when it fails. However, you still don't want your server to go down.
The try-catch only works for synchronous code.
doSomething() {
try {
someSynchronousFunc();
someAsyncFunc().then((_) => print('foo'));
} catch (e) {
// ...
}
}
When your async method completes or fails, it happens "long" after the program is done with the doSomething method.
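The same pitfall exists in any event-loop runtime. A small Python asyncio sketch (a stand-in for Dart Futures, with hypothetical function names) shows that a try block around scheduling asynchronous work does not see the later failure, while awaiting the work routes the error back into the caller's try block, which is the effect of returning or awaiting the Future in Dart.

```python
import asyncio


async def some_async_func() -> None:
    await asyncio.sleep(0)
    raise RuntimeError("failed later")


async def fire_and_forget() -> str:
    try:
        # Only the scheduling is inside the try; the coroutine fails afterwards.
        task = asyncio.create_task(some_async_func())
    except RuntimeError:
        return "caught"
    await asyncio.sleep(0.01)  # let the task run and fail
    return "missed: " + repr(task.exception())


async def awaited() -> str:
    try:
        # Awaiting routes the failure back into this try block.
        await some_async_func()
    except RuntimeError:
        return "caught"
    return "missed"
```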
When you write asynchronous code, it's generally a good idea to start a method by returning a future:
Future doSomething() {
return new Future(() {
// your code here.
var a = b + 5; // throws and is caught.
return someAsyncCall(); // Errors are forwarded if you return the Future directly.
});
}
This ensures that any exception thrown by your code is caught, and the caller can then handle it with catchError().
If you write code this way, you will have far fewer crashes, assuming that you have at least some error handling at the top level.
Whenever you are calling a method that returns a Future, either return it directly (like shown above) or catchError() for it so that you are handling the possible errors locally.
There's a great lengthy article on the homepage that you should read.
