Make stream combination exhaust once one of its underlying streams is exhausted

If I want to combine multiple same-typed streams into one, I would use Stream::select:
let combined = first_stream.select(second_stream);
However, once one of the streams is exhausted, the other can still produce results for the combined stream. What can I use to exhaust the combined stream once either of the underlying streams is exhausted?

Write your own stream combinator:
use futures::{Async, Poll, Stream}; // 0.1.25

struct WhileBoth<S1, S2>(S1, S2)
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>;

impl<S1, S2> Stream for WhileBoth<S1, S2>
where
    S1: Stream,
    S2: Stream<Item = S1::Item, Error = S1::Error>,
{
    type Item = S1::Item;
    type Error = S1::Error;

    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        match self.0.poll() {
            // Return errors or ready values (including the `None`
            // that indicates the stream is empty) immediately.
            r @ Err(_) | r @ Ok(Async::Ready(_)) => r,
            // If the first stream is not ready, try the second one.
            Ok(Async::NotReady) => self.1.poll(),
        }
    }
}
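For instance, a minimal check (my example, not part of the original answer) using futures 0.1's iter_ok and the blocking wait adapter:

use futures::stream::iter_ok;

// `a` and `b` are always-ready streams. `WhileBoth` polls the first
// stream, so this prints Ok(1), Ok(2), Ok(3) and then stops as soon
// as `a` is exhausted; `b` is only polled while `a` is not ready.
let a = iter_ok::<_, ()>(vec![1, 2, 3]);
let b = iter_ok::<_, ()>(vec![10, 20]);
for item in WhileBoth(a, b).wait() {
    println!("{:?}", item);
}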
See also:
How can I add new methods to Iterator?

Related

Lookbehind caused a blank page on iOS device [duplicate]

I am looking for an alternative for this:
(?<=\.\d\d)\d
(Match the third digit after a period.)
I'm aware I can solve it by using other methods, but I have to use a regular expression and more importantly I have to use replace on the string, without adding a callback.
Turn the lookbehind into a consuming pattern and use a capturing group, as shown below:
var s = "some string.005";
var rx = /\.\d\d(\d)/;
var m = s.match(/\.\d\d(\d)/);
if (m) {
console.log(m[1]);
}
Or, to get all matches:
const s = "some string.005 some string.006";
const rx = /\.\d\d(\d)/g;
let result = [], m;
while ((m = rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log(result); // ["5", "6"]
An example with matchAll:
const result = Array.from(s.matchAll(rx), x => x[1]); // ["5", "6"]
EDIT:
To remove the 3 from str.123 under your current constraints, use the same capturing approach: match what you need to remove, capture what you need to keep, and restore the captured text in the result with the $n backreference(s) in the replacement pattern.
var s = "str.123";
var rx = /(\.\d\d)\d/;
var res = s.replace(rx, "$1");
console.log(res);

How do you apply a Combine operator only after the first message has been received?

In Combine, using only the built-in operators, is there a way to skip an operator on the first value but then apply that operator for all subsequent values?
Consider the following:
publisher
.debounce(...)
.sink(...)
In this arrangement, debounce will wait for the specified timeout to elapse before passing the value on to sink. However, there are many times when you only want debounce to kick in after the first element. For example, if the user is trying to filter a list of contacts, it's very possible that they will only enter one letter into the text field. If that's the case, the application should probably start filtering immediately, without waiting for the debounce to time out.
I'm aware of the Drop publishers, but I can't seem to find a combination of them that will perform more of a "skip" operation such that the sink receives every value, but the debounce is ignored on the first value.
Something like the following:
publisher
.if_first_element_passthrough_to_sink(...), else_debounce(...)
.sink(...)
Is something like this possible with the built-in operators?
Clarification
Some clarification since my original posting wasn't as clear as it should have been... The answer provided by Asperi below is very close, but ideally the first element in a sequence is always delivered, then debounce would kick in.
Imagine the user is typing the following:
A B C ... (pauses typing for a few seconds) ... D ... (pauses) ... E F G
What I would like is:
A, D and E are delivered immediately.
B and C are coalesced into just C using debounce
F and G are coalesced into just G using debounce
If I've understood your needs correctly, this can be achieved with Concatenate, as in the following (pseudo-code):
let originalPublisher = ...
let publisher = Publishers.Concatenate(
        prefix: originalPublisher.first(),
        suffix: originalPublisher.debounce(for: 0.5, scheduler: RunLoop.main))
    .eraseToAnyPublisher()
So prefix sends just the first element downstream from the original publisher and then finishes; afterwards, suffix passes all following elements through debounce.
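A minimal concrete sketch of this (my example, assuming the source is a PassthroughSubject so that prefix and suffix observe the same live stream):

import Combine
import Foundation

let input = PassthroughSubject<String, Never>()

let publisher = Publishers.Concatenate(
        prefix: input.first(),
        suffix: input.debounce(for: .seconds(0.5), scheduler: RunLoop.main))
    .eraseToAnyPublisher()

let cancellable = publisher.sink { print($0) }

input.send("A") // delivered immediately by the prefix
input.send("B") // swallowed by debounce...
input.send("C") // ...only "C" arrives, 0.5 s after the last send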
In your particular case of debounce, you might prefer the behavior of throttle. It sends the first element immediately, and then sends no more than one element per interval.
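For example (a sketch of the throttle alternative; the interval and publisher are placeholders):

let cancellable = publisher
    .throttle(for: .milliseconds(500), scheduler: RunLoop.main, latest: true)
    .sink { value in print(value) }

With latest: true, the newest value seen during each interval is the one delivered.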
Anyway, can you do it with Combine built-ins? Yes, with some difficulty. Should you? Maybe…
Here's the goal, originally illustrated with a marble diagram: each time a value goes into the kennyc-debouncer, it starts a timer. If a value arrives while the timer is running, the kennyc-debouncer saves the value and restarts the timer. When the timer expires, if any values arrived while it was running, the kennyc-debouncer emits the latest of those values immediately.
The scan operator allows us to keep state that we mutate each time an input arrives. We need to send two kinds of inputs into scan: the outputs from the upstream publisher, and timer firings. So let's define a type for those inputs:
fileprivate enum DebounceEvent<Value> {
    case value(Value)
    case timerFired
}
What kind of state do we need inside our scan transform? We definitely need the scheduler, the interval, and the scheduler options, so that we can set timers.
We also need a PassthroughSubject we can use to turn timer firings into inputs to the scan operator.
We can't actually cancel and restart a timer, so instead, when the timer fires, we'll see whether it should have been restarted. If so, we'll start another timer. So we need to know whether the timer is running, and what output to send when the timer fires, and the restart time for the timer if restarting is necessary.
Since scan's output is the entire state value, we also need the state to include the output value to send downstream, if any.
Here's the state type:
fileprivate struct DebounceState<Value, S: Scheduler> {
    let scheduler: S
    let interval: S.SchedulerTimeType.Stride
    let options: S.SchedulerOptions?
    let subject = PassthroughSubject<Void, Never>()

    enum TimerState {
        case notRunning
        case running(PendingOutput?)

        struct PendingOutput {
            var value: Value
            var earliestDeliveryTime: S.SchedulerTimeType
        }
    }

    var output: Value? = nil
    var timerState: TimerState = .notRunning
}
Now let's look at how to actually use scan with some other operators to implement the kennyc version of debounce:
extension Publisher {
    func kennycDebounce<S: Scheduler>(
        for dueTime: S.SchedulerTimeType.Stride,
        scheduler: S,
        options: S.SchedulerOptions? = nil
    ) -> AnyPublisher<Output, Failure> {
        let initialState = DebounceState<Output, S>(
            scheduler: scheduler,
            interval: dueTime,
            options: options)

        let timerEvents = initialState.subject
            .map { _ in DebounceEvent<Output>.timerFired }
            .setFailureType(to: Failure.self)

        return self
            .map { DebounceEvent.value($0) }
            .merge(with: timerEvents)
            .scan(initialState) { $0.updated(with: $1) }
            .compactMap { $0.output }
            .eraseToAnyPublisher()
    }
}
We start by constructing the initial state for the scan operator.
Then, we create a publisher that turns the Void outputs of the state's PassthroughSubject into .timerFired events.
Finally, we construct our full pipeline, which has four stages:
1. Turn the upstream outputs (from self) into .value events.
2. Merge the value events with the timer events.
3. Use scan to update the debouncing state with the value and timer events. The actual work is done in an updated(with:) method we'll add to DebounceState below.
4. Map the full state down to just the value we want to pass downstream, and discard nils (which happen when upstream events get suppressed by debouncing).
All that's left is to write the updated(with:) method. It looks at each incoming event's type (value or timerFired) and the state of the timer to decide what the new state should be and, if necessary, set a new timer.
extension DebounceState {
    func updated(with event: DebounceEvent<Value>) -> DebounceState<Value, S> {
        var answer = self
        switch (event, timerState) {
        case (.value(let value), .notRunning):
            answer.output = value
            answer.timerState = .running(nil)
            scheduler.schedule(
                after: scheduler.now.advanced(by: interval),
                tolerance: .zero, options: options
            ) { [subject] in subject.send() }
        case (.value(let value), .running(_)):
            answer.output = nil
            answer.timerState = .running(
                .init(value: value,
                      earliestDeliveryTime: scheduler.now.advanced(by: interval)))
        case (.timerFired, .running(nil)):
            answer.output = nil
            answer.timerState = .notRunning
        case (.timerFired, .running(.some(let pendingOutput))):
            let now = scheduler.now
            if pendingOutput.earliestDeliveryTime <= now {
                answer.output = pendingOutput.value
                answer.timerState = .notRunning
            } else {
                answer.output = nil
                scheduler.schedule(
                    after: pendingOutput.earliestDeliveryTime,
                    tolerance: .zero, options: options
                ) { [subject] in subject.send() }
            }
        case (.timerFired, .notRunning):
            // Impossible!
            answer.output = nil
        }
        return answer
    }
}
Does it work? Let's test it:
import PlaygroundSupport
PlaygroundPage.current.needsIndefiniteExecution = true

let subject = PassthroughSubject<String, Never>()
let q = DispatchQueue.main
let start = DispatchTime.now()
let cfStart = CFAbsoluteTimeGetCurrent()

q.asyncAfter(deadline: start + .milliseconds(100)) { subject.send("A") }
// A should be delivered at start + 100ms.
q.asyncAfter(deadline: start + .milliseconds(200)) { subject.send("B") }
q.asyncAfter(deadline: start + .milliseconds(300)) { subject.send("C") }
// C should be delivered at start + 800ms.
q.asyncAfter(deadline: start + .milliseconds(1100)) { subject.send("D") }
// D should be delivered at start + 1100ms.
q.asyncAfter(deadline: start + .milliseconds(1800)) { subject.send("E") }
// E should be delivered at start + 1800ms.
q.asyncAfter(deadline: start + .milliseconds(1900)) { subject.send("F") }
q.asyncAfter(deadline: start + .milliseconds(2000)) { subject.send("G") }
// G should be delivered at start + 2500ms.

let ticket = subject
    .kennycDebounce(for: .milliseconds(500), scheduler: q)
    .sink {
        print("\($0) \(((CFAbsoluteTimeGetCurrent() - cfStart) * 1000).rounded())") }
Output:
A 107.0
C 847.0
D 1167.0
E 1915.0
G 2714.0
I'm not sure why the later events are so delayed. It could just be playground side effects.

Split events based on criteria and handle in order

Having the following problem: given a list of events that have a partitionId property (0-10, for example), I'd like incoming events to be split according to the partitionId so that events with the same partitionId are handled in the order they're received.
With more or less even distribution, that would lead to 10 events (for each partition) being handled in parallel.
Besides creating 10 single-threaded dispatchers and sending each event to the right dispatcher, is there a way to accomplish the above using Project Reactor?
Thanks.
The code below
splits the source stream into partitions,
creates a ParallelFlux with one "rail" per partition,
schedules the "rails" onto separate threads,
collects the results.
Having a dedicated thread for each partition guarantees that its values are processed in their original order.
@Test
public void partitioning() throws InterruptedException {
    final int N = 10;
    Flux<Integer> source = Flux.range(1, 10000).share();

    // partition source into publishers
    Publisher<Integer>[] publishers = new Publisher[N];
    for (int i = 0; i < N; i++) {
        final int idx = i;
        publishers[idx] = source.filter(v -> v % N == idx);
    }

    // create ParallelFlux, each 'rail' containing a single partition
    ParallelFlux.from(publishers)
            // schedule partitions onto different threads
            .runOn(Schedulers.newParallel("proc", N))
            // process each partition in its own thread, i.e. in order
            .map(it -> {
                String threadName = Thread.currentThread().getName();
                Assert.assertEquals("proc-" + (it % 10 + 1), threadName);
                return it;
            })
            // collect results on a single 'rail'
            .sequential()
            // and on a single thread called 'subscriber-1'
            .publishOn(Schedulers.newSingle("subscriber"))
            .subscribe(it -> {
                String threadName = Thread.currentThread().getName();
                Assert.assertEquals("subscriber-1", threadName);
            });

    Thread.sleep(1000);
}

Wrapping a sequence in a Stream in F#

I have a function that accepts a Stream. My data is in a large list, running into millions of items.
Is there a simple way I can wrap a sequence in a Stream, returning chunks of my sequence in the stream? One obvious approach is to implement my own stream class that returns chunks of the sequence. Something like :
type SeqStream(sequence: seq<'a>) =
    inherit Stream()
    override x.Read(buf, offset, count) =
        // get next chunk
        // yield chunk
Is there a simpler way of doing it? I don't have the means to change the target function that accepts a stream though.
I think that your approach looks good. The only problem is that Stream is a relatively complicated class that has quite a few members and you probably don't want to implement most of them - if you want to pass it to some code that uses some of the additional members, you'll need to make the implementation more complex. Anyway, a simple stream that implements only Read can look like this:
type SeqStream<'a>(sequence: seq<'a>, formatter: 'a -> byte[]) =
    inherit Stream()
    // Keeps bytes that were read previously, but were not used
    let temp = ResizeArray<_>()
    // Enumerator for reading data from the sequence
    let en = sequence.GetEnumerator()

    override x.Read(buffer, offset, size) =
        // Read next elements and add them to temp until we have
        // enough data or until we reach the end of the sequence
        while temp.Count < size && en.MoveNext() do
            temp.AddRange(formatter(en.Current))
        // Copy data to the output & return the count (which may be
        // less than required at the end of the sequence)
        let ret = min size temp.Count
        temp.CopyTo(0, buffer, offset, ret)
        temp.RemoveRange(0, ret)
        ret

    override x.Seek(offset, dir) = invalidOp "Seek"
    override x.Flush() = invalidOp "Flush"
    override x.SetLength(l) = invalidOp "SetLength"
    override x.Length = invalidOp "Length"
    override x.Position
        with get() = invalidOp "Position"
        and set(p) = invalidOp "Position"
    override x.Write(buffer, offset, size) = invalidOp "Write"
    override x.CanWrite = false
    override x.CanSeek = false
    override x.CanRead = true
Note that I added an additional parameter - a function to convert value of the generic type to a byte array. In general, it is difficult to convert anything to bytes (you could use some serialization), so this is probably easier. For example, for integers you can write:
let stream = new SeqStream<_>([ 1 .. 5 ], System.BitConverter.GetBytes)
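To sanity-check it, one can read the bytes back out (an illustrative snippet, not part of the original answer):

// Read the five 4-byte integers back from the stream
let buffer = Array.zeroCreate<byte> 4
let mutable read = stream.Read(buffer, 0, 4)
while read > 0 do
    printfn "%d" (System.BitConverter.ToInt32(buffer, 0))
    read <- stream.Read(buffer, 0, 4)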

How to use a sequence expression to return lines from a webstream on demand

This function is fine, but it doesn't do what I would like it to; I have used it to make sure the use of the objects is OK:
let getStreamData_ok (uri: string) =
    let request = WebRequest.Create uri
    use response = request.GetResponse()
    use stream = response.GetResponseStream()
    use reader = new StreamReader(stream)
    while not reader.EndOfStream do
        ignore <| reader.ReadLine()
I would like to connect to a stream and pull the file down one line at a time, on demand. This function doesn't work, I have tried shifting various lines in and out of the sequence expression without any success:
let getStreamData_OnDemand (uri: string) =
    let request = WebRequest.Create uri
    use response = request.GetResponse()
    seq {
        use stream = response.GetResponseStream()
        use reader = new StreamReader(stream)
        while not reader.EndOfStream do
            yield reader.ReadLine()
    }
Usage code:
let lines = getStreamData_OnDemand("http://stackoverflow.com/")
for line in lines do
    ignore line
Thank you
This should work:
let getStreamData_OnDemand (uri: string) = seq {
    let request = WebRequest.Create uri
    use response = request.GetResponse()
    use stream = response.GetResponseStream()
    use reader = new StreamReader(stream)
    while not reader.EndOfStream do
        yield reader.ReadLine() }
The key difference compared to your second code snippet (the one that uses a sequence expression) is that everything is done inside the sequence expression. Most importantly, the use response = ... line is also enclosed in it.
This is essential, because use here means that the response will be disposed only after the iteration over the sequence is completed.
In your second snippet, the response is disposed as soon as getStreamData_OnDemand returns, which is before anything is read from the returned sequence; by the time you start iterating, it has already been disposed.
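A related caveat (my note, not part of the original answer): because everything now lives inside the sequence expression, each enumeration of the returned seq issues a fresh request. If you need to traverse the lines more than once, materialize them first:

// Download once, then reuse the in-memory list freely
let lines = getStreamData_OnDemand "http://stackoverflow.com/" |> List.ofSeq
printfn "Downloaded %d lines" lines.Length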
