I am using the rxdart package to handle streams in Dart. I am stuck on a peculiar problem.
Please have a look at this dummy code:
final userId = BehaviorSubject<String>();
Stream<T> getStream(String uid) {
  // a sample code that returns a stream
  return BehaviorSubject<T>().stream;
}
final Observable<Stream<T>> oops = userId.map((uid) => getStream(uid));
Now I want to convert the oops variable into an Observable<T>.
I am finding it difficult to explain clearly, but let me try. I have a stream A. I map each output of stream A to another stream B. Now I have a Stream<Stream<B>> - a kind of nested stream. I just want to listen to the latest value produced by this pattern. How can I achieve this?
I will list several ways to flatten a Stream<Stream<T>> into a single Stream<T>.
1. Using pure Dart
As answered by @Irn, this is a pure Dart solution:
Stream<T> flattenStreams<T>(Stream<Stream<T>> source) async* {
  await for (var stream in source) yield* stream;
}
Stream<int> getStream(String v) {
  return Stream.fromIterable([1, 2, 3, 4]);
}

void main() {
  List<String> list = ["a", "b", "c"];
  Stream<int> s = flattenStreams(Stream.fromIterable(list).map(getStream));
  s.listen(print);
}
Outputs: 1 2 3 4 1 2 3 4 1 2 3 4
2. Using Observable.flatMap
Observable has a flatMap method that flattens each emitted stream and merges its events into the ongoing stream:
import 'package:rxdart/rxdart.dart';
Stream<int> getStream(String v) {
  return Stream.fromIterable([1, 2, 3, 4]);
}

void main() {
  List<String> list = ["a", "b", "c"];
  Observable<int> s = Observable.fromIterable(list).flatMap(getStream);
  s.listen(print);
}
Outputs: 1 2 3 4 1 2 3 4 1 2 3 4
3. Using Observable.switchLatest
Convert a Stream that emits Streams (aka a "Higher Order Stream") into a single Observable that emits the items emitted by the most-recently-emitted of those Streams.
This is the solution I was looking for! I just needed the latest output emitted by the internal stream.
import 'package:rxdart/rxdart.dart';
Stream<int> getStream(String v) {
  return Stream.fromIterable([1, 2, 3, 4]);
}

void main() {
  List<String> list = ["a", "b", "c"];
  Observable<int> s = Observable.switchLatest(
      Observable.fromIterable(list).map(getStream));
  s.listen(print);
}
Outputs: 1 1 1 2 3 4
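Applied back to the question's userId example, the same approach looks roughly like this (a minimal sketch against the older Observable API; getStream stands in for whatever loader each id is mapped to):

import 'package:rxdart/rxdart.dart';

final userId = BehaviorSubject<String>();

Stream<int> getStream(String uid) => Stream.fromIterable([1, 2, 3]);

// Each new userId value cancels the previous inner stream and
// switches to the stream created for the latest id.
final Observable<int> latest =
    Observable.switchLatest(userId.map(getStream));

On recent rxdart versions, where the Observable class is gone, the equivalent is the switchMap extension on Stream: userId.switchMap(getStream).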
It's somewhat rare to have a Stream<Stream<Something>>, so it isn't something that there is much explicit support for.
One reason is that there are several (at least two) ways to combine a stream of streams of things into a stream of things.
Either you listen to each stream in turn, waiting for it to complete before starting on the next, and then emit the events in order.
Or you listen on each new stream the moment it becomes available, and then emit the events from any stream as soon as possible.
The former is easy to write using async/await:
Stream<T> flattenStreams<T>(Stream<Stream<T>> source) async* {
  await for (var stream in source) yield* stream;
}
The latter is more complicated because it requires listening on more than one stream at a time, and combining their events. (If only StreamController.addStream allowed more than one stream at a time, then it would be much easier). You can use the StreamGroup class from package:async for this:
import "package:async/async.dart" show StreamGroup;
Stream<T> mergeStreams<T>(Stream<Stream<T>> source) {
  var sg = StreamGroup<T>();
  source.forEach(sg.add).whenComplete(sg.close);
  // This doesn't handle errors in [source].
  // Maybe insert
  //   .catchError((e, s) => sg.add(Future<T>.error(e, s).asStream()))
  // before `.whenComplete` if you worry about errors in [source].
  return sg.stream;
}
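For example, a quick sketch that merges two finite streams with the helper above:

void main() async {
  final source = Stream.fromIterable([
    Stream<int>.fromIterable([1, 2, 3]),
    Stream<int>.fromIterable([4, 5, 6]),
  ]);
  await mergeStreams(source).forEach(print); // events may interleave
}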
If you want a Stream<Stream<T>> to return a Stream<T>, you basically need to flatten the stream.
The flatMap function is what you would use here:
public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b", "c");
    Stream<String> s = l.stream().map(i -> getStream(i)).flatMap(i -> i);
}

static <T> Stream<T> getStream(T uid) {
    // a sample code that returns a stream
    return Stream.of(uid);
}
If you need only the first object, use the findFirst() method:
public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b", "c");
    String str = l.stream().map(i -> getStream(i)).flatMap(i -> i).findFirst().get();
}
You need to call the asyncExpand method on your stream - it lets you transform each element into a sequence of asynchronous events.
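For example, a minimal sketch using plain dart:async (getStream is a hypothetical stand-in for the question's loader). Note that, unlike switchLatest, asyncExpand concatenates the inner streams, waiting for each to finish before starting the next:

import 'dart:async';

Stream<int> getStream(String uid) => Stream.fromIterable([1, 2, 3]);

void main() {
  final controller = StreamController<String>();

  // Flattens Stream<Stream<int>> into Stream<int>, one inner stream at a time.
  controller.stream.asyncExpand(getStream).listen(print);

  controller.add('a');
  controller.add('b');
  controller.close();
}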
Related
I have a simple widget subscribed to a Stream of elements.
Each time a new element is received I would like to also get the previous element and decide which of them to pass downstream.
Currently I am using the map operator to store the previous element and calculate the next, like this:
elements.map((e) {
  if (this.previous == null) {
    this.previous = e;
    return e;
  }
  final next = merge(this.previous, e);
  this.previous = e;
  return next;
}).listen(...);
How can I do this better and avoid having this.previous?
If you use the rxdart package, there is an extension method called pairwise which, according to the documentation:
Emits the n-th and n-1th events as a pair. The first event won't be emitted until the second one arrives.
Then you should be able to do something along the lines of this:
elements.pairwise().map((pair) => merge(pair.first, pair.last)).listen(...);
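For instance, with a toy merge that just keeps the larger of the two values (a sketch assuming a recent rxdart where pairwise is an extension on Stream):

import 'package:rxdart/rxdart.dart';

int merge(int previous, int current) => current > previous ? current : previous;

void main() {
  Stream.fromIterable([1, 5, 3, 7])
      .pairwise() // emits [1, 5], [5, 3], [3, 7]
      .map((pair) => merge(pair.first, pair.last))
      .listen(print); // prints 5, 5, 7
}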
Here is one possibility:
void main() {
  List<int> list = [12, 24, 48, 60];
  list.reduce((value, element) {
    print(value + element); // Push to another list maybe?
    return element;
  });
}
If you are working with a stream, try this:
void main() {
  var counterStream = Stream<int>.periodic(const Duration(seconds: 1), (x) => x)
      .reduce((previous, element) {
    print(previous + element); // Push to another stream maybe?
    return element;
  });
}
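Note that reduce only completes when the stream is done, so on an infinite periodic stream the returned Future never completes (the prints still fire on every event). Bounding the stream with take makes the result usable (a quick sketch):

void main() async {
  final sum = await Stream<int>.periodic(const Duration(seconds: 1), (x) => x)
      .take(5)
      .reduce((previous, element) => previous + element);
  print(sum); // 0 + 1 + 2 + 3 + 4 = 10
}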
I am making an application using the Flutter framework.
While doing this I came across the keywords async and async* in Dart.
Can anybody tell me what the difference between them is?
Short answer
async gives you a Future
async* gives you a Stream.
async
You add the async keyword to a function that does some work that might take a long time. It returns the result wrapped in a Future.
Future<int> doSomeLongTask() async {
  await Future.delayed(const Duration(seconds: 1));
  return 42;
}
You can get that result by awaiting the Future:
main() async {
  int result = await doSomeLongTask();
  print(result); // prints '42' after waiting 1 second
}
async*
You add the async* keyword to make a function that returns a bunch of future values one at a time. The results are wrapped in a Stream.
Stream<int> countForOneMinute() async* {
  for (int i = 1; i <= 60; i++) {
    await Future.delayed(const Duration(seconds: 1));
    yield i;
  }
}
The technical term for this is asynchronous generator function. You use yield to return a value instead of return because you aren't leaving the function.
You can use await for to wait for each value emitted by the Stream.
main() async {
  await for (int i in countForOneMinute()) {
    print(i); // prints 1 to 60, one integer per second
  }
}
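Alternatively, you can subscribe with listen instead of await for; main then returns immediately while the values keep printing (a quick sketch):

void main() {
  countForOneMinute().listen(print); // prints 1 to 60, one per second
}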
Going on
Watch these videos to learn more, especially the one on Generators:
Isolates and Event Loops
Futures
Streams
async / await
Generators
Marking a function as async or async* allows it to use async/await with Futures.
The difference between them is that async* will always return a Stream and offers some syntactic sugar to emit a value through the yield keyword.
We can therefore do the following:
Stream<int> foo() async* {
  for (int i = 0; i < 42; i++) {
    await Future.delayed(const Duration(seconds: 1));
    yield i;
  }
}
This function emits a value every second, which increments every time.
Solution, Origins and Insights
This answer includes simplified and easy-to-understand examples.
async
An async computation cannot provide a result immediately when it is started, because the program may need to wait for an external response like:
Reading a file
Querying a database
Fetching data from an API
Instead of blocking all computation until the result is available, the asynchronous computation immediately returns a Future object which will eventually "complete" with the result.
Example (this kind of async function is used when you don't need to return a value):
void main() async {
  // The next line awaits 5 seconds
  await Future.delayed(Duration(seconds: 5));
  // Pseudo API call that takes some time
  await fetchStocks();
}
Future
A Future represents a computation that doesn't complete immediately. Whereas a normal function returns the result, an asynchronous function returns a Future, which will eventually contain the result. The Future will tell you when the result is ready.
Future is the return type when an async function returns a value
Represents the result of a single computation (in contrast to a Stream)
Example:
Future<String> fetchUserOrder() =>
    // Imagine that this function is more complex and slow.
    Future.delayed(
      const Duration(seconds: 2),
      () => 'Large Latte',
    );

void main(List<String> arguments) async {
  var order = await fetchUserOrder();
  // App awaits 2 seconds
  print('Your $order is ready');
}
Stream
A source of asynchronous data events.
A Stream provides a way to receive a sequence of events. Each event is either a data event, also called an element of the stream, or an error event.
A Stream is a sequence of results
From a stream you get notified of results (stream elements)
async* (streams)
async* is an asynchronous generator that returns a Stream object; it is made for creating streams.
An example of using a stream and async*:
// Creating a new stream with async*
// Each iteration, this stream yields a number
Stream<int> createNumberStream(int number) async* {
  for (int i = 1; i <= number; i++) {
    yield i;
  }
}

void main(List<String> arguments) {
  // Calling the stream generation
  var stream = createNumberStream(5);
  // Listening to the stream and printing each yielded number
  stream.listen((s) => print(s));
}
Result:
1
2
3
4
5
Bonus: Transforming an Existing Stream
If you already have a stream, you can transform it to a new stream based on the original stream’s events.
Example (same code as before but with a twist):
Stream<int> createNumberStream(int number) async* {
  for (int i = 1; i <= number; i++) {
    yield i;
  }
}

// This takes the previous stream as input and outputs updated values;
// here it multiplies each number from the stream by 2
Stream<int> createNumberDoubling(Stream<int> chunk) async* {
  await for (final number in chunk) {
    yield number * 2;
  }
}

void main(List<String> arguments) {
  // Here we transform the first stream through the createNumberDoubling stream generator
  var stream = createNumberDoubling(createNumberStream(5));
  stream.listen((s) => print(s));
}
Result:
2
4
6
8
10
Solution
async and async* are close relatives; they even come from the same library, dart:async.
async represents a Future and a one-time exchange, while async* represents a Stream, a sequence of multiple events.
Async functions execute synchronously until they reach the await keyword. Therefore, all synchronous code within an async function body executes immediately.
Future<int> foo() async {
  await Future.delayed(Duration(seconds: 1));
  return 0;
}
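A small sketch demonstrating the "synchronous until the first await" behavior described above (the synchronous prefix runs before control returns to the caller):

Future<void> demo() async {
  print('runs immediately'); // synchronous prefix
  await Future.delayed(const Duration(seconds: 1));
  print('runs after the await');
}

void main() {
  demo();
  print('caller continues here');
}

// Output:
// runs immediately
// caller continues here
// runs after the await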
async* is used to create a function that returns a sequence of future values one at a time. The results are wrapped in a Stream.
Stream<int> foo() async* {
  for (var i = 0; i < 10; i++) {
    await Future.delayed(Duration(seconds: 1));
    yield i;
  }
}
async* will always return a Stream
Stream<int> mainStream(int value) async* {
  for (int i = 1; i <= value; i++) {
    yield i;
  }
}
async returns the result wrapped in a Future, so the result may arrive after some time. See the example below:
void main() async {
  // The next line awaits 10 seconds
  await Future.delayed(Duration(seconds: 10));
}
I'd like to accomplish the following using Apache Beam:
calculate every 5 seconds the events that are read from pubsub in the last minute
The goal is to have a semi-realtime view on the rate data comes in. This can then be expanded towards more complex use cases afterwards.
After searching, I've not come across a way to solve this seemingly simple problem. Things that do not work:
global window + repeated triggers (triggers do not fire when there is no input)
sliding window + withoutDefaults (does not allow empty windows to be emitted apparently)
Any suggestion on how to solve this problem?
As already discussed, Beam does not emit data for empty windows. In addition to the reasons given by Rui Wang, we can add the challenge of how the later stages would handle those empty panes.
Anyway, the specific use case that you describe (monitoring a rolling count of messages) should be possible with some work, even if the metric eventually falls to zero. One possibility would be to publish a steady number of dummy messages which would advance the watermark and fire the panes but are filtered out later within the pipeline. The problem with this approach is that the publishing source needs to be adapted, and that might not always be convenient/possible. Another one would involve generating this fake data as another input and co-grouping it with the main stream. The advantage is that everything can be done in Dataflow without the need to tweak the source or the sink. To illustrate this I provide an example.
The inputs are divided into two streams. For the dummy one, I used GenerateSequence to create a new element every 5 seconds. I then window the PCollection (the windowing strategy needs to be compatible with the one for the main stream, so I will use the same). Then I map each element to a key-value pair where the value is 0 (we could use other values, as we know which stream the element comes from, but I want to make clear that dummy records are not counted).
PCollection<KV<String,Integer>> dummyStream = p
    .apply("Generate Sequence", GenerateSequence.from(0).withRate(1, Duration.standardSeconds(5)))
    .apply("Window Messages - Dummy", Window.<Long>into(
        ...
    .apply("Count Messages - Dummy", ParDo.of(new DoFn<Long, KV<String, Integer>>() {
      @ProcessElement
      public void processElement(ProcessContext c) throws Exception {
        c.output(KV.of("num_messages", 0));
      }
    }));
For the main stream, which reads from Pub/Sub, I map each record to the value 1. Later on, I will sum all the ones, as in typical word-count examples, using map-reduce stages.
PCollection<KV<String,Integer>> mainStream = p
    .apply("Get Messages - Data", PubsubIO.readStrings().fromTopic(topic))
    .apply("Window Messages - Data", Window.<String>into(
        ...
    .apply("Count Messages - Data", ParDo.of(new DoFn<String, KV<String, Integer>>() {
      @ProcessElement
      public void processElement(ProcessContext c) throws Exception {
        c.output(KV.of("num_messages", 1));
      }
    }));
Then we need to join them using a CoGroupByKey (I used the same num_messages key to group counts). This stage will output results when one of the two inputs has elements, therefore unblocking the main issue here (empty windows with no Pub/Sub messages).
final TupleTag<Integer> dummyTag = new TupleTag<>();
final TupleTag<Integer> dataTag = new TupleTag<>();

PCollection<KV<String, CoGbkResult>> coGbkResultCollection = KeyedPCollectionTuple.of(dummyTag, dummyStream)
    .and(dataTag, mainStream).apply(CoGroupByKey.<String>create());
Finally, we add all the ones to obtain the total number of messages for the window. If there are no elements coming from dataTag then the sum will just default to 0.
public void processElement(ProcessContext c, BoundedWindow window) {
  Integer total_sum = new Integer(0);
  Iterable<Integer> dataTagVal = c.element().getValue().getAll(dataTag);
  for (Integer val : dataTagVal) {
    total_sum += val;
  }
  LOG.info("Window: " + window.toString() + ", Number of messages: " + total_sum.toString());
}
This should result in one log line per window with the total message count.
Note that results from different windows can arrive unordered (this can happen anyway when writing to BigQuery) and I did not tune the window settings to optimize the example.
Full code:
public class EmptyWindows {

  private static final Logger LOG = LoggerFactory.getLogger(EmptyWindows.class);

  public static interface MyOptions extends PipelineOptions {
    @Description("Input topic")
    String getInput();
    void setInput(String s);
  }

  @SuppressWarnings("serial")
  public static void main(String[] args) {
    MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
    Pipeline p = Pipeline.create(options);

    String topic = options.getInput();

    PCollection<KV<String,Integer>> mainStream = p
        .apply("Get Messages - Data", PubsubIO.readStrings().fromTopic(topic))
        .apply("Window Messages - Data", Window.<String>into(
                SlidingWindows.of(Duration.standardMinutes(1))
                    .every(Duration.standardSeconds(5)))
            .triggering(AfterWatermark.pastEndOfWindow())
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes())
        .apply("Count Messages - Data", ParDo.of(new DoFn<String, KV<String, Integer>>() {
          @ProcessElement
          public void processElement(ProcessContext c) throws Exception {
            //LOG.info("New data element in main output");
            c.output(KV.of("num_messages", 1));
          }
        }));

    PCollection<KV<String,Integer>> dummyStream = p
        .apply("Generate Sequence", GenerateSequence.from(0).withRate(1, Duration.standardSeconds(5)))
        .apply("Window Messages - Dummy", Window.<Long>into(
                SlidingWindows.of(Duration.standardMinutes(1))
                    .every(Duration.standardSeconds(5)))
            .triggering(AfterWatermark.pastEndOfWindow())
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes())
        .apply("Count Messages - Dummy", ParDo.of(new DoFn<Long, KV<String, Integer>>() {
          @ProcessElement
          public void processElement(ProcessContext c) throws Exception {
            //LOG.info("New dummy element in main output");
            c.output(KV.of("num_messages", 0));
          }
        }));

    final TupleTag<Integer> dummyTag = new TupleTag<>();
    final TupleTag<Integer> dataTag = new TupleTag<>();

    PCollection<KV<String, CoGbkResult>> coGbkResultCollection = KeyedPCollectionTuple.of(dummyTag, dummyStream)
        .and(dataTag, mainStream).apply(CoGroupByKey.<String>create());

    coGbkResultCollection
        .apply("Log results", ParDo.of(new DoFn<KV<String, CoGbkResult>, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c, BoundedWindow window) {
            Integer total_sum = new Integer(0);
            Iterable<Integer> dataTagVal = c.element().getValue().getAll(dataTag);
            for (Integer val : dataTagVal) {
              total_sum += val;
            }
            LOG.info("Window: " + window.toString() + ", Number of messages: " + total_sum.toString());
          }
        }));

    p.run();
  }
}
Another way to approach this problem is to use a stateful DoFn with a looping Timer that fires on each 5-second tick. This looping timer generates the default data necessary for the live monitoring, and ensures that each window has at least one event to process.
One issue with the approach described by https://stackoverflow.com/a/54543527/430128 is that, in a system with multiple keys, these "dummy" events need to be generated for every key.
See https://beam.apache.org/blog/looping-timers/. Option 1 and 2 in that article are an external heartbeat source and a generated source in the beam pipeline respectively. Option 3 is the looping timer.
I want to do something after a lot of Future functions are done, but I do not know how to write the code in Dart.
The code is like this:
for (var d in data) {
  d.loadData().then()
}
// when all loaded
// do something here
but I don't want to wait for them one by one:
for (var d in data) {
  await d.loadData(); // don't want to await each one
}
How do I write this code in Dart?
You can use Future.wait to wait for a list of futures:
import 'dart:async';
Future main() async {
  var data = [];
  var futures = <Future>[];
  for (var d in data) {
    futures.add(d.loadData());
  }
  await Future.wait(futures);
}
DartPad example
The existing answer gives enough information, but I want to add a note/warning.
As stated in the docs:
The value of the returned future will be a list of all the values that were produced in the order that the futures are provided by iterating futures.
So, that means that the example below will return 4 as the first element (index 0), and 2 as the second element (index 1).
import 'dart:async';
Future main() async {
  print('start');
  List<int> li = await Future.wait<int>([
    fetchLong(),  // longer (which gives 4) is first
    fetchShort(), // shorter (which gives 2) is second
  ]);
  print('results: ${li[0]} ${li[1]}'); // results: 4 2
}
Future<int> fetchShort() {
  return Future.delayed(Duration(seconds: 3), () {
    print('Short!');
    return 2;
  });
}

Future<int> fetchLong() {
  return Future.delayed(Duration(seconds: 5), () {
    print('Long!');
    return 4;
  });
}
If you want to wait for multiple futures of different types and also support null safety, then you can add a helper function similar to the following.
import 'package:tuple/tuple.dart';
Future<Tuple2<T1, T2>> waitConcurrently<T1, T2>(
    Future<T1> future1, Future<T2> future2) async {
  late T1 result1;
  late T2 result2;
  await Future.wait([
    future1.then((value) => result1 = value),
    future2.then((value) => result2 = value)
  ]);
  return Future.value(Tuple2(result1, result2));
}
In order for this to work you need tuples.
At the moment Dart does not provide tuples natively, but there is a package from Google which does: https://pub.dev/packages/tuple
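Usage could then look like this (a sketch; fetchName and fetchAge are hypothetical stand-ins):

Future<String> fetchName() async => 'Alice';
Future<int> fetchAge() async => 30;

void main() async {
  final result = await waitConcurrently(fetchName(), fetchAge());
  print('${result.item1} is ${result.item2}'); // Alice is 30
}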
In addition, I'd like to supplement Günter Zöchbauer's answer with a FutureOr variant. You'll need to convert your FutureOr<T> values to Future<T> first and then call wait:
Future.wait(list.map((x) async => x))
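For example, assuming a list holding a mix of plain values and futures (a quick sketch):

import 'dart:async';

void main() async {
  final List<FutureOr<int>> list = [1, Future.value(2), 3];

  // The async closure lifts each FutureOr<int> to a Future<int>.
  final values = await Future.wait(list.map((x) async => x));
  print(values); // [1, 2, 3]
}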
I am trying to wrap my head around Dart Streams. In particular, this example of the command-line utility cat has the following lines of code:
Stream<List<int>> stream = new File(path).openRead();
// Transform the stream using a `StreamTransformer`. The transformers
// used here convert the data to UTF8 and split string values into
// individual lines.
return stream
    .transform(UTF8.decoder)
    .transform(const LineSplitter())
    .listen((line) {
      if (showLineNumbers) {
        stdout.write('${lineNumber++} ');
      }
      stdout.writeln(line);
    }).asFuture().catchError((_) => _handleError(path));
The declaration of the stream as Stream<List<int>> has me a bit confused. Why is it not declared as a Stream<int>? How does the List<> type make this different? Are the subscriber events buffered in some way if it is a List?
What type (as in <T>) is passed to the first transform? Is it an int or a List<int>?
What type is passed to each of the next transforms, and what determines their type?
Does this example read the entire file before passing the results of the transform to the next transform? If so, is there an example somewhere of how to stream very large files, similar to this Node question: Parsing huge logfiles in Node.js - read in line-by-line?
Good question.
UTF8 is a Utf8Codec that extends Codec<String, List<int>>. So UTF8.decoder is a Converter<List<int>, String> that takes List<int> as a parameter.
LineSplitter is a Converter<String, List<String>>. So it takes String as a parameter. The resulting stream of .transform(const LineSplitter()) is a Stream<String> that emits each line.
File.openRead doesn't read the entire file before writing the first bytes to the stream, so there's no problem dealing with large files.
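To make the types concrete, here is a sketch of the same pipeline written with the current dart:convert names (lowercase utf8 replaced the pre-Dart-2 UTF8 constant):

import 'dart:convert';
import 'dart:io';

void main() {
  Stream<List<int>> bytes = File('example.txt').openRead();

  // utf8.decoder: Stream<List<int>> -> Stream<String> (decoded chunks)
  Stream<String> chunks = bytes.transform(utf8.decoder);

  // LineSplitter: Stream<String> -> Stream<String> (individual lines)
  Stream<String> lines = chunks.transform(const LineSplitter());

  lines.listen(print);
}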
Alexandre Ardhuin has the first three questions right. The 4th question, however, is not. After taking this apart and stubbing out my own version of the code, I determined the following:
Even on a 37 MB file, the transforms only get called once.
Here is the code I used to figure it out.
import 'dart:async';
import 'dart:convert';
import 'dart:io';

void main(List<String> arguments) {
  Stream<List<int>> stream = new File('Data.txt').openRead();
  stream
      .transform(const Utf8InterceptDecoder())
      .transform(const LineSplitterIntercept())
      .listen((line) {
        // stdout.writeln(line);
      }).asFuture().catchError((_) => print(_));
}

int lineSplitCount = 0;

class LineSplitterIntercept extends LineSplitter {
  const LineSplitterIntercept() : super();

  // Never gets called
  List<String> convert(String data) {
    stdout.writeln("LineSplitterIntercept.convert : Data:" + data);
    return super.convert(data);
  }

  StringConversionSink startChunkedConversion(ChunkedConversionSink<String> sink) {
    stdout.writeln("LineSplitterIntercept.startChunkedConversion Count:" +
        lineSplitCount.toString() + " Sink: " + sink.toString());
    lineSplitCount++;
    return super.startChunkedConversion(sink);
  }
}

int utfCount = 0;

class Utf8InterceptDecoder extends Utf8Decoder {
  const Utf8InterceptDecoder() : super();

  // Never gets called
  String convert(List<int> codeUnits) {
    stdout.writeln("Utf8InterceptDecoder.convert : codeUnits.length:" +
        codeUnits.length.toString());
    return super.convert(codeUnits);
  }

  ByteConversionSink startChunkedConversion(ChunkedConversionSink<String> sink) {
    stdout.writeln("Utf8InterceptDecoder.startChunkedConversion Count:" +
        utfCount.toString() + " Sink: " + sink.toString());
    utfCount++;
    return super.startChunkedConversion(sink);
  }
}