I am implementing a tokenizer. It parses a document, splits it on a set of possible delimiters, and then gives me combinations of 1-, 2- and 3-word tokens.
I was able to achieve my goal, but only in one specific way:
Stream<String> contentStr = file.openRead().transform(utf8.decoder);
Stream<String> tokens = contentStr.transform(charSplitter).transform(tokenizer).asBroadcastStream();
var twoWordTokens = tokens.transform(sliding(2));
var threeWordTokens = tokens.transform(sliding(3));
StreamController<String> merger = StreamController();
tokens.forEach((token) => merger.add(token));
threeWordTokens.forEach((token) => merger.add(token));
twoWordTokens.forEach((token) => merger.add(token));
merger.stream.forEach(print);
As you can see, I do the following:
1. broadcast the original stream of tokens
2. transform it into 2 additional streams with a sliding-window transformation
3. create a StreamConsumer (a StreamController, to be precise) and pump every event from the 3 streams into it
4. print every element of the merged stream as a test
It works, but I don't like adding each element from the source streams one by one via StreamController.add. I wanted to use StreamController.addStream instead, but that does not work.
The following code gives me a "Bad state: Cannot add event while adding a stream" error, and I understand why:
StreamController<String> merger = StreamController();
merger.addStream(tokens);
merger.addStream(twoWordTokens);
merger.addStream(threeWordTokens);
merger.stream.forEach(print);
This matches the API documentation for StreamController.addStream.
So I need to wait for the future returned by each addStream call to complete:
StreamController<String> merger = StreamController();
await merger.addStream(tokens);
await merger.addStream(twoWordTokens);
await merger.addStream(threeWordTokens);
await merger.stream.forEach(print);
But in this case I get nothing printed in the console.
If I do this:
StreamController<String> merger = StreamController();
merger.stream.forEach(print);
await merger.addStream(tokens);
await merger.addStream(twoWordTokens);
await merger.addStream(threeWordTokens);
Then only the 1-word tokens, i.e. elements of the original broadcast stream are printed. Elements of the derived streams are not.
I kind of understand why this happens, because all my streams are derived from the original broadcast stream.
Is there a better way to implement such a pipeline?
My problem can probably be reformulated in terms of stream duplication/forking, but I can't see a way to clone a stream in Dart. If you can advise on that, please do.
I hope to allow concurrent addStream at some point, but until then, you need to handle the events independently:
var allAdds = [
tokens.forEach(merger.add),
twoWordTokens.forEach(merger.add),
threeWordTokens.forEach(merger.add)];
Future.wait(allAdds).then((_) { merger.close(); });
merger.stream.forEach(print);
That's if you want to control everything yourself. You can also use the StreamGroup class from package:async. It collects a number of streams and emits their events as a single stream.
This assumes that you have no error events.
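For reference, here is a minimal, self-contained sketch of the forEach-based merge described above, using only dart:async. The `sliding` helper, the `mergeAll` function and the sample tokens are assumptions made for illustration (the question's own `sliding` transformer is not shown), and the sketch assumes the streams carry no error events.

```dart
import 'dart:async';

/// Hypothetical stand-in for the question's sliding-window transformer:
/// joins every run of [n] consecutive tokens with a space.
Stream<String> sliding(Stream<String> source, int n) {
  final window = <String>[];
  // expand() subscribes to the source as soon as its result is listened to,
  // so no broadcast events are missed.
  return source.expand<String>((token) {
    window.add(token);
    if (window.length > n) window.removeAt(0);
    return window.length == n ? [window.join(' ')] : const <String>[];
  });
}

/// Forwards every event of [streams] into one controller and closes it
/// once all sources are done.
Future<List<String>> mergeAll(List<Stream<String>> streams) async {
  final merger = StreamController<String>();
  final merged = merger.stream.toList(); // listen before any event arrives
  // The list literal is built right away, so every source is subscribed to
  // before the broadcast stream starts emitting.
  await Future.wait([for (final s in streams) s.forEach(merger.add)]);
  await merger.close();
  return await merged;
}

Future<void> main() async {
  final tokens =
      Stream.fromIterable(['a', 'b', 'c', 'd']).asBroadcastStream();
  final all = await mergeAll([
    tokens,
    sliding(tokens, 2),
    sliding(tokens, 3),
  ]);
  all.forEach(print);
}
```

The key point is that all three forEach calls happen before the broadcast stream delivers its first event; awaiting them one after another would let the derived streams miss everything.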
I've been struggling with this for a couple of hours and can't seem to find a solution. I'd appreciate any help.
I am building a Flutter app and trying to follow the BLoC pattern.
I have a widget with two text fields: Country code and PhoneNumber.
I have defined a Bloc with two Sinks (one for each field) and a Stream carrying the state. The state Stream is a merge of the two sinks, like this:
factory AuthenticationBloc(AuthenticationRepository authRepository) {
  final onPhoneChanged = PublishSubject<String>();
  final onCountryChanged = PublishSubject<String>();
  // oldState would be the last entry in the state stream.
  final stateChangeFromThePhone = onPhoneChanged.map((newPhone) => oldState.copyWith(phone: newPhone));
  final stateChangeFromTheCountry = onCountryChanged.map((newCountry) => oldState.copyWith(country: newCountry));
  final state = Observable.merge([stateChangeFromThePhone, stateChangeFromTheCountry]);
}
This is pseudocode, but the idea is there. My question is: how can I get access to the latest event from the state stream, represented in the code by oldState?
I could define a variable in which I store this value on each new event in the state stream, but that looks ugly... :(
Any advice?
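One possible direction, sketched here with plain dart:async rather than RxDart: treat each input as a function from the old state to the new one, and fold those functions over a running state inside a single generator. Everything in this sketch (the `AuthState` class, the `StateChange` typedef, the `_scan` helper) is a hypothetical name introduced for illustration, not part of the original code. The state variable still exists, but it is confined to one place instead of leaking into the Bloc.

```dart
import 'dart:async';

/// Hypothetical state class with the copyWith used in the question.
class AuthState {
  const AuthState({this.phone = '', this.country = ''});
  final String phone;
  final String country;

  AuthState copyWith({String? phone, String? country}) => AuthState(
        phone: phone ?? this.phone,
        country: country ?? this.country,
      );
}

/// A change is a function from the previous state to the next one.
typedef StateChange = AuthState Function(AuthState oldState);

class AuthenticationBloc {
  final _changes = StreamController<StateChange>();

  /// Applies each incoming change to the running state and emits the result.
  late final Stream<AuthState> state = _scan(_changes.stream);

  void onPhoneChanged(String phone) =>
      _changes.add((old) => old.copyWith(phone: phone));

  void onCountryChanged(String country) =>
      _changes.add((old) => old.copyWith(country: country));

  Future<void> dispose() => _changes.close();

  static Stream<AuthState> _scan(Stream<StateChange> changes) async* {
    var current = const AuthState(); // lives only inside this generator
    await for (final change in changes) {
      current = change(current);
      yield current;
    }
  }
}

Future<void> main() async {
  final bloc = AuthenticationBloc();
  final states = bloc.state.toList();
  bloc.onPhoneChanged('5551234');
  bloc.onCountryChanged('+1');
  await bloc.dispose();
  for (final s in await states) {
    print('${s.country} ${s.phone}');
  }
}
```

As far as I know, RxDart's scan operator performs the same kind of fold, so the two mapped streams could be merged and scanned there instead of hand-rolling `_scan`.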
What would be the best way to capture the inner text in the following case?
inner_text = any*;
tag_cdata = '<![CDATA[' inner_text >cdata_start %cdata_end ']]>';
The problem is that the cdata_end action seems to fire several times, because inner_text can also match ].
I found the solution. You need to handle non-determinism. It wasn't clear initially, but the correct solution is something like this:
inner_text = any*;
tag_cdata = '<![CDATA[' inner_text >text_begin %text_end ']]>' %cdata_end;
action text_begin {
  text_begin_at = p;
}
action text_end {
  text_end_at = p;
}
action cdata_end {
  delegate.cdata(data.byteslice(text_begin_at, text_end_at-text_begin_at))
}
Essentially, you wait until you are sure you parsed a complete CDATA tag before firing the callback, using information you previously captured.
In addition, I found that some forms of non-determinism in Ragel need to be handled explicitly using priorities. While this seems a bit ugly, it is the only solution in some cases.
When dealing with a pattern such as (a+ >a_begin %a_end | b)* you will find that the events are called for every single a encountered, rather than once for the longest sub-sequence. This ambiguity can, in some cases, be resolved using the longest-match Kleene star **. It prefers to keep matching the existing pattern rather than wrapping around.
What surprised me is that this also changes the way events are called. As an example, this produces a machine which is unable to buffer more than one character at a time when invoking callbacks:
%%{
machine example;
action a_begin {}
action a_end {}
main := ('a'+ >a_begin %a_end | 'b')*;
}%%
Produces: (state machine diagram not included)
You'll notice that it calls a_begin and a_end every time.
In contrast, we can make the inner loop and event handling greedy:
%%{
machine example;
action a_begin {}
action a_end {}
main := ('a'+ >a_begin %a_end | 'b')**;
}%%
which produces: (state machine diagram not included)
If I have a Stream in Dart, I can use both listen and forEach, but I don't understand the difference.
So for example consider this code:
final process = await Process.start('pub', ['serve']);
process.stdout.map((l) => utf8.decode(l)).forEach(print);
I could also have written:
process.stdout.map((l) => utf8.decode(l)).listen(print);
Is there any difference?
The forEach function on a Stream will stop at the first error, and it won't give you a StreamSubscription to control how you listen on a stream. Using forEach is great if that's what you want - it tells you when it's done (the returned Future) and all you have to do is handle the events. If you need more control, you can use the listen call which is how forEach is implemented.
It's like the difference between Iterable.forEach and Iterable.iterator: the former just calls a callback for each element, the latter gives you a way to control the iteration.
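The difference can be seen in a small, self-contained sketch. The `source`, `viaForEach` and `viaListen` functions are illustrative names, and the stream (a value, then an error, then another value) is made up for the demonstration.

```dart
import 'dart:async';

/// Illustrative source: emits 1, then an error, then 2.
Stream<int> source() {
  final controller = StreamController<int>();
  controller
    ..add(1)
    ..addError(StateError('boom'))
    ..add(2)
    ..close();
  return controller.stream;
}

/// forEach: one callback, one Future, and it stops at the first error.
Future<List<int>> viaForEach() async {
  final seen = <int>[];
  try {
    await source().forEach(seen.add);
  } on StateError {
    // The returned future completed with the stream's first error.
  }
  return seen;
}

/// listen: returns a StreamSubscription and lets you decide what to do
/// with errors.
Future<List<int>> viaListen() {
  final seen = <int>[];
  final done = Completer<List<int>>();
  final subscription = source().listen(
    seen.add,
    onError: (Object error) {}, // ignore the error and keep listening
    onDone: () => done.complete(seen),
  );
  // The subscription can be paused, resumed, or cancelled at any time.
  subscription.pause();
  subscription.resume();
  return done.future;
}

Future<void> main() async {
  print(await viaForEach());
  print(await viaListen());
}
```

With forEach, the value after the error is never seen; with listen and an onError handler, the subscription stays alive and delivers it.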
I'm building an nsIProtocolHandler implementation in Delphi. (more here)
And it's working already. Data the module builds gets streamed over an nsIInputStream. I've got all the nsIRequest, nsIChannel and nsIHttpChannel methods and properties working.
I've started testing and I ran into something strange. I have a page "a.html" with this simple HTML:
<img src="a.png">
Both "xxm://test/a.html" and "xxm://test/a.png" work in Firefox, and return the HTML above or the PNG image data.
The problem is that when the HTML page is displayed, the image doesn't get loaded. When I debug, I see:
NewChannel gets called for a.png, (when Firefox is processing an OnDataAvailable notice on a.html),
NotificationCallbacks is set (I only need to keep a reference, right?)
RequestHeader "Accept" is set to "image/png,image/*;q=0.8,*/*;q=0.5"
but then, the channel object is released (most probably due to a zero reference count)
Looking at other requests, I would expect some other properties to get set (such as LoadFlags or OriginalURI) and AsyncOpen to get called, from where I can start responding to the request.
Does anybody recognise this? Am I doing something wrong? Perhaps with LoadFlags or the LoadGroup? I'm not sure when to call AddRequest and RemoveRequest on the LoadGroup, and judging from nsHttpChannel and nsBaseChannel, I'm not sure whether it's better to call RemoveRequest early or late (before or after OnStartRequest or OnStopRequest).
Update: Checked on the brand-new Firefox 3.5, still the same.
Update: To try to isolate the issue further, I tried "file://test/a1.html" with <img src="xxm://test/a.png" /> and still only get the above sequence of events. If I'm supposed to add this secondary request to a load group to get AsyncOpen called on it, I have no idea where to get a reference to it.
There's more: I find only one place where the "Accept" string gets added to the request headers. It queries for nsIHttpChannelInternal right after creating a new channel, but I don't even see this QueryInterface call come through... (I posted it here)
Me again.
I am going to quote the same stuff from nsIChannel::asyncOpen():
"If asyncOpen returns successfully, the channel is responsible for keeping itself alive until it has called onStopRequest on aListener or called onChannelRedirect."
If you go back to nsViewSourceChannel.cpp, there's one place where loadGroup->AddRequest is called and two places where loadGroup->RemoveRequest is being called.
nsViewSourceChannel::AsyncOpen(nsIStreamListener *aListener, nsISupports *ctxt)
{
    NS_ENSURE_TRUE(mChannel, NS_ERROR_FAILURE);
    mListener = aListener;
    /*
     * We want to add ourselves to the loadgroup before opening
     * mChannel, since we want to make sure we're in the loadgroup
     * when mChannel finishes and fires OnStopRequest()
     */
    nsCOMPtr<nsILoadGroup> loadGroup;
    mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
    if (loadGroup)
        loadGroup->AddRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                             this), nsnull);
    nsresult rv = mChannel->AsyncOpen(this, ctxt);
    if (NS_FAILED(rv) && loadGroup)
        loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                this),
                                 nsnull, rv);
    if (NS_SUCCEEDED(rv)) {
        mOpened = PR_TRUE;
    }
    return rv;
}
and
nsViewSourceChannel::OnStopRequest(nsIRequest *aRequest, nsISupports* aContext,
                                   nsresult aStatus)
{
    NS_ENSURE_TRUE(mListener, NS_ERROR_FAILURE);
    if (mChannel)
    {
        nsCOMPtr<nsILoadGroup> loadGroup;
        mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
        if (loadGroup)
        {
            loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                    this),
                                     nsnull, aStatus);
        }
    }
    return mListener->OnStopRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                   this),
                                    aContext, aStatus);
}
Edit:
I have no clue how Mozilla works, so I have to guess from reading some code. From the channel's point of view, once the original file is loaded, its job is done. If you want to load the secondary items linked in the file, such as an image, you have to implement that in the listener. See TestPageLoad.cpp. It implements a crude parser and retrieves child items in OnDataAvailable:
NS_IMETHODIMP
MyListener::OnDataAvailable(nsIRequest *req, nsISupports *ctxt,
                            nsIInputStream *stream,
                            PRUint32 offset, PRUint32 count)
{
    //printf(">>> OnDataAvailable [count=%u]\n", count);
    nsresult rv = NS_ERROR_FAILURE;
    PRUint32 bytesRead=0;
    char buf[1024];
    if(ctxt == nsnull) {
        bytesRead=0;
        rv = stream->ReadSegments(streamParse, &offset, count, &bytesRead);
    } else {
        while (count) {
            PRUint32 amount = PR_MIN(count, sizeof(buf));
            rv = stream->Read(buf, amount, &bytesRead);
            count -= bytesRead;
        }
    }
    if (NS_FAILED(rv)) {
        printf(">>> stream->Read failed with rv=%x\n", rv);
        return rv;
    }
    return NS_OK;
}
The important thing is that it calls streamParse(), which looks at the src attribute of img and script elements and calls auxLoad(), which creates a new channel with a new listener and calls AsyncOpen().
uriList->AppendElement(uri);
rv = NS_NewChannel(getter_AddRefs(chan), uri, nsnull, nsnull, callbacks);
RETURN_IF_FAILED(rv, "NS_NewChannel");
gKeepRunning++;
rv = chan->AsyncOpen(listener, myBool);
RETURN_IF_FAILED(rv, "AsyncOpen");
Since it passes in another instance of the MyListener object, that listener can in turn load more child items, ad infinitum, like Russian dolls.
I think I found it myself: take a close look at this page. Why it doesn't highlight that the UUID has changed between versions isn't clear to me, but it would explain why things fail when (or just prior to) calling QueryInterface on nsIHttpChannelInternal.
With the new(er) UUID, I'm getting better results. As I mentioned in an update to the question, I've posted this on bugzilla.mozilla.org. I'm curious whether I will get a response there, and what it will be.