dart streams and await - dart

I'm having trouble understanding the flow of the following code.
The code should process MERGE_SIZE lines (3 in this run), save the lines to a 'phase' file, and then process the next 3 lines and so on.
The call to savePhase has an await, so I was expecting savePhase to complete before any additional lines are processed.
As you can see in the output below, every line is processed first, and only then do the savePhase calls complete.
Future _sort() async {
  var completer = Completer<void>();
  var instance = 0;
  var lineCount = MERGE_SIZE;
  var phaseDirectory = Directory.systemTemp.createTempSync();
  var list = <String>[];
  var sentToPhase = false;
  await File(filename)
      .openRead()
      .map(utf8.decode)
      .transform(LineSplitter())
      .forEach((l) async {
    list.add(l);
    print('$l linecount:$lineCount');
    lineCount--;
    if (lineCount == 0) {
      lineCount = MERGE_SIZE;
      instance++;
      sentToPhase = true;
      await savePhase(phaseDirectory, 1, instance, list, lineDelimiter);
      list.clear();
      print('savePhase completed');
    }
  });
which outputs
9 line linecount:3
8 line linecount:2
7 line linecount:1
6 line linecount:3
5 line linecount:2
4 line linecount:1
3 line linecount:3
2 line linecount:2
1 line linecount:1
savePhase completed
savePhase completed
savePhase completed
Is this something to do with the streams that openRead uses to deliver the read lines?
I thought I had await figured out, but apparently not :)

I have not tested your program, but I am fairly sure the problem is that you expect forEach() to wait for each Future to complete before the next call, which is not the case.
Take a look at the following question, which deals with more or less the same problem:
Sequential processing of a variable number of async functions in Dart
Flow of program
What happens in your code is that the file you are reading seems to be small enough for its whole content to be read in one go into a single read buffer. That would explain why you see several "linecount" lines before the first "savePhase completed".
As mentioned previously, the forEach() method on Stream does not take into account that the function passed to it returns a Future that should be awaited. You can see that in the implementation shown here:
https://api.dart.dev/stable/2.7.0/dart-async/Stream/forEach.html
This means the Future returned by forEach() completes once every line has been handed to the callback, but it does not wait for the Future that each callback invocation returns (remember, an async function always returns a Future, whether or not it contains an await).
Because all of those pending Futures share the same variables, you can also get surprising behavior: they all add to the same list, for example, and each one clears it once its savePhase call completes. So there is real potential for errors here.
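The same mechanics can be reproduced outside Dart. The sketch below is an analogy in JavaScript, not Dart code: Array.prototype.forEach also discards the promise returned by an async callback, while a for...of loop inside an async function pauses at every await. savePhase, withForEach and withForOf are hypothetical stand-ins introduced only for this illustration. In Dart, the corresponding sequential loop would be an await for over the stream.

```javascript
// Hypothetical stand-in for savePhase: completes on a later event-loop turn.
function savePhase(batch) {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// forEach invokes the callback for every item and discards the returned promise,
// so every "read" is logged before any "saved".
function withForEach(lines, log) {
  lines.forEach(async (line) => {
    log.push(`read ${line}`);
    await savePhase([line]);
    log.push(`saved ${line}`);
  });
}

// A for...of loop in an async function pauses at each await,
// so each save finishes before the next line is read.
async function withForOf(lines, log) {
  for (const line of lines) {
    log.push(`read ${line}`);
    await savePhase([line]);
    log.push(`saved ${line}`);
  }
}

const forEachLog = [];
withForEach(['9 line', '8 line', '7 line'], forEachLog);
// All three reads are already logged here, and no save has completed yet.
console.log(forEachLog);

const forOfLog = [];
// Reads and saves alternate in this log once the loop has finished.
withForOf(['9 line', '8 line', '7 line'], forOfLog).then(() => console.log(forOfLog));
```

The first log mirrors the output in the question: all lines are consumed before any save completes.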

Related

Waiting for 20 seconds before continuing (permanent error)

The get_chat_history and get_chat_members methods keep failing with a persistent waiting error: "Waiting for 20 (23, 21, 22, 18) seconds before continuing". get_chat works fine. This error appeared a couple of days ago.
async with tg_cl:
    while True:
        try:
            async for members in tg_cl.get_chat_members(target):
                members_chat.append(members)
            break
        except FloodWait as Err:
            print("Flood wait: {} seconds".format(Err.value))
            sleep(Err.value)
            continue
...............
async with tg_cl:
    while True:
        try:
            if 'join' in chat:
                info_chat = await tg_cl.join_chat(chat)
            else:
                info_chat = await tg_cl.get_chat(chat)
            async for messages in tg_cl.get_chat_history(chat, limit=1, offset_id=-1):
                count_messages = messages.id
            break
        except FloodWait as Err:
            print("Flood wait: {} seconds".format(Err.value))
            sleep(Err.value)
            continue
Pyrogram already handles FloodWait errors on its own, you don't need to apply any logic yourself.
When setting up your Client instance, you can set the sleep_threshold. This is the amount of time that Pyrogram will handle a FloodWait error on its own, without any logic needed from you. You can set it to an arbitrarily high value to not get any errors anymore. Keep in mind that Pyrogram will silently handle these errors itself and only print something like "waiting x seconds" in your output.
list_of_members = []
for member in app.get_chat_members(chat_id):
    list_of_members.append(member.id)
print(list_of_members)
[123, 456, 789, ...]
Please note that you can only retrieve 200 members at a time in channels and 10 000 (ten thousand) in chats; this is a hard limit imposed by the server.
See Pyrogram's documentation on the available arguments, as well as some examples:
https://docs.pyrogram.org/api/methods/get_chat_members

Dart: getElementsByClassName returns a 0 element list but the data is there

I'm writing a function that will parse certain websites and fetch data from there, which will be used to create instances of a class. I'm able to successfully extract the data when it is retrieved using the getElementById() function, but for some reason, the getElementsByClassName() always returns a node list with 0 elements.
The site I'm currently parsing is here.
If you search for 'datas-nev', you will find exactly one match:
<p class="datas-nev"><b>Kutya neve: </b>Jhonny</p>
And here is the code used for parsing:
import 'package:html/parser.dart' show parse;
...
final response = await http.get(URL);
var document = parse(response.body);
var detailsContainer = document.getElementById('husky_details_container_right');
var dogName = new List<Node>();
dogName = document.getElementsByClassName('datas-nev');
The contents of the detailsContainer can be extracted successfully, for example this gives me back a string of relevant data I will use later:
var humanBehaviourValue;
try { humanBehaviourValue = detailsContainer.nodes[1].nodes[19].nodes[1].nodes[7].nodes[1].toString(); }
catch (e) { humanBehaviourValue = 'N/A'; }
But when I check the value of dogName in the debug window, I get the following:
dogName = {_growableList} size = 0
I already tried initializing the dogName 'properly' by List<Node> dogName = new List<Node>(); but it didn't help. I also tried other datas-* values, but it seems the parser can't find them. I even tried using just datas (because that is a div, while others are paragraphs), but that didn't help either.
Basically I could just hardwire the name and some data (breed, color, etc) as those never really change, but the location of the shelter can change, and keeping it up-to-date by scraping the data seems better than pushing updates out manually. That means I mostly need the value of datas-helyszin but that isn't parsed either.
As @Günter Zöchbauer pointed out, the code actually works. I was just looking for the value too soon, before it was actually fetched...

Can RxJS be used in a pull-based way?

The examples in the RxJS README seem to suggest we have to subscribe to a source. In other words: we wait for the source to send events. In that sense, sources seem to be push-based: the source decides when it creates new items.
This contrasts, however, with iterators, where strictly speaking new items need only be created when requested, i.e., when a call is made to next(). This is pull-based behavior, also known as lazy generation.
For instance, a stream could return all Wikipedia pages for prime numbers. The items are only generated when you ask for them, because generating all of them upfront is quite an investment, and maybe only 2 or 3 of them might be read anyway.
Can RxJS also have such pull-based behavior, so that new items are only generated when you ask for them?
The page on backpressure seems to indicate that this is not possible yet.
Short answer is no.
RxJS is designed for reactive applications so as you already mentioned if you need pull-based semantics you should be using an Iterator instead of an Observable. Observables are designed to be the push-based counterparts to the iterator, so they really occupy different spaces algorithmically speaking.
Obviously, I can't say this will never happen, because that is something the community will decide. But as far as I know 1) the semantics for this case just aren't that good and 2) this runs counter to the idea of reacting to data.
A pretty good synopsis can be found here. It is for Rx.Net but the concepts are similarly applicable to RxJS.
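To make the iterator side of that contrast concrete, here is a minimal sketch of pull-based (lazy) generation with a plain JavaScript generator, using the prime-number pages from the question as the example. primePages and the stats counter are hypothetical names introduced only for this illustration; no RxJS is involved.

```javascript
// Lazily yields one "page" per prime number.
// Nothing is computed until the consumer calls next().
function* primePages(stats) {
  const found = [];
  for (let n = 2; ; n++) {
    // n is prime if no previously found prime divides it.
    if (found.every((p) => n % p !== 0)) {
      found.push(n);
      stats.generated++; // counts how many pages were actually produced
      yield `page for prime ${n}`;
    }
  }
}

const stats = { generated: 0 };
const pages = primePages(stats);
console.log(stats.generated);    // 0: creating the iterator runs none of the body
console.log(pages.next().value); // "page for prime 2"
console.log(pages.next().value); // "page for prime 3"
console.log(stats.generated);    // 2: only the requested pages were generated
```

The consumer decides when each item is produced, which is the opposite of a subscription, where the source decides.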
Controlled observable from the page you referenced can change a push observable to pull.
var controlled = source.controlled();
// this callback will only be invoked after controlled.request()
controlled.subscribe(n => {
  log("controlled: " + n);
  // do some work, then signal for next value
  setTimeout(() => controlled.request(1), 2500);
});
controlled.request(1);
A truly synchronous iterator is not possible, as it would block when the source was not emitting.
In the snippet below, the controlled subscriber only gets a single item when it signals, and it does not skip any values.
var output = document.getElementById("output");
var log = function(str) {
  output.value += "\n" + str;
  output.scrollTop = output.scrollHeight;
};
var source = Rx.Observable.timer(0, 1000);
source.subscribe(n => log("source: " + n));
var controlled = source.controlled();
// this callback will only be invoked after controlled.request()
controlled.subscribe(n => {
  log("controlled: " + n);
  // do some work, then signal for next value
  setTimeout(() => controlled.request(1), 2500);
});
controlled.request(1);
<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/2.5.2/rx.all.js"></script>
<body>
<textarea id="output" style="width:150px; height: 150px"></textarea>
</body>
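The idea behind a controlled source can be sketched without RxJS: values pushed by the producer are buffered, and the consumer receives only as many as it has requested. ControlledQueue below is a hypothetical class written for this illustration; it is not how RxJS implements controlled(), only a sketch of the request-driven behavior described above.

```javascript
// Hypothetical illustration of request-driven delivery (not RxJS internals).
class ControlledQueue {
  constructor(onValue) {
    this.onValue = onValue; // consumer callback
    this.buffer = [];       // values pushed but not yet delivered
    this.requested = 0;     // outstanding demand from the consumer
  }

  // Push side: the producer hands over a value whenever it likes.
  push(value) {
    this.buffer.push(value);
    this.drain();
  }

  // Pull side: the consumer asks for n more values.
  request(n) {
    this.requested += n;
    this.drain();
  }

  // Deliver buffered values, in order, while there is outstanding demand.
  drain() {
    while (this.requested > 0 && this.buffer.length > 0) {
      this.requested--;
      this.onValue(this.buffer.shift());
    }
  }
}

const received = [];
const queue = new ControlledQueue((v) => received.push(v));
queue.push(0);
queue.push(1);
console.log(received); // []: nothing is delivered before a request
queue.request(1);
console.log(received); // [0]
queue.push(2);
queue.request(2);
console.log(received); // [0, 1, 2]: no value was skipped
```

Note that the buffer grows without bound if the producer is faster than the consumer, which is the backpressure concern raised in the question.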
I'm quite late to the party, but it's actually very simple to combine generators with observables. You can pull a value from a generator function by syncing it with a source observable:
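import { interval } from 'rxjs'
import { map } from 'rxjs/operators'

const fib = fibonacci()
interval(500).pipe(
  // next() returns { value, done }; emit just the number
  map(() => fib.next().value)
)
.subscribe(console.log)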
Generator implementation for reference:
function* fibonacci() {
  let v1 = 1
  let v2 = 1
  while (true) {
    const res = v1
    v1 = v2
    v2 = v1 + res
    yield res
  }
}
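For comparison, the same generator can be pulled from directly, without an observable driving it. This is a minimal sketch; take is a hypothetical helper written for this example, and the generator is repeated so the snippet runs on its own. Each call to next() returns an object of the form { value, done }, so the helper unwraps value.

```javascript
function* fibonacci() {
  let v1 = 1;
  let v2 = 1;
  while (true) {
    const res = v1;
    v1 = v2;
    v2 = v1 + res;
    yield res;
  }
}

// Hypothetical helper: pulls at most n values from an iterator.
function take(iterator, n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    const { value, done } = iterator.next();
    if (done) break; // stop early if the iterator is exhausted
    out.push(value);
  }
  return out;
}

console.log(take(fibonacci(), 6)); // [ 1, 1, 2, 3, 5, 8 ]
```

Here the consumer sets the pace; in the interval-based version the timer does.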

How do I check the end of Stream in Dart?

fellow dart programmers.
I am reading in a file using Stream as below.
Stream<List<int>> stream = new File(filepath).openRead();
stream
    .transform(utf8.decoder)
    .transform(const LineSplitter())
    .listen((line) {
  // TODO: check if this is the last line of the file
  var isLastLine;
});
I want to check whether the line in listen() is the last line of the file.
I don't think you can check if the current chunk of data is the last one.
You can only pass a callback that is called when the stream is closed.
Stream<List<int>> stream = new File('main.dart').openRead();
stream
    .transform(utf8.decoder)
    .transform(const LineSplitter())
    .listen((line) {
  // TODO: check if this is the last line of the file
  var isLastLine;
}, onDone: () => print('done')); // <= add a second callback, called when the stream closes
While the answer from @Günter Zöchbauer works, the last property of streams accomplishes exactly what you are asking for (I guess the Dart team added this functionality in the last 5 years).
Stream<List<int>> stream = new File('main.dart').openRead();
List<int> last = await stream.last;
But note: you cannot both listen to the stream and use await stream.last on the same stream, because a file stream is single-subscription.
This will cause an error:
StateError (Bad state: Stream has already been listened to.)

nsIProtocolHandler: trouble loading image for html page

I'm building an nsIProtocolHandler implementation in Delphi. (more here)
And it's working already. Data the module builds gets streamed over an nsIInputStream. I've got all the nsIRequest, nsIChannel and nsIHttpChannel methods and properties working.
I've started testing and I run into something strange. I have a page "a.html" with this simple HTML:
<img src="a.png">
Both "xxm://test/a.html" and "xxm://test/a.png" work in Firefox, and give above HTML or the PNG image data.
The problem is with displaying the HTML page, the image doesn't get loaded. When I debug, I see:
NewChannel gets called for a.png, (when Firefox is processing an OnDataAvailable notice on a.html),
NotificationCallbacks is set (I only need to keep a reference, right?)
RequestHeader "Accept" is set to "image/png,image/*;q=0.8,*/*;q=0.5"
but then, the channel object is released (most probably due to a zero reference count)
Looking at other requests, I would expect some other properties to get set (such as LoadFlags or OriginalURI) and AsyncOpen to get called, from where I can start getting the request responded to.
Does anybody recognise this? Am I doing something wrong? Perhaps with LoadFlags or the LoadGroup? I'm not sure when to call AddRequest and RemoveRequest on the LoadGroup, and peeping from nsHttpChannel and nsBaseChannel I'm not sure it's better to call RemoveRequest early or late (before or after OnStartRequest or OnStopRequest)?
Update: Checked on the brand-new Firefox 3.5, still the same.
Update: To try to isolate the issue further, I tried "file://test/a1.html" with <img src="xxm://test/a.png" /> and still only get the above sequence of events. If I'm supposed to add this secondary request to a load group to get AsyncOpen called on it, I have no idea where to get a reference to it.
There's more: I find only one instance of the "Accept" string that gets added to the request headers. It queries for nsIHttpChannelInternal right after creating a new channel, but I don't even see this QueryInterface call come through... (I posted it here)
Me again.
I am going to quote the same stuff from nsIChannel::asyncOpen():
If asyncOpen returns successfully, the channel is responsible for keeping itself alive until it has called onStopRequest on aListener or called onChannelRedirect.
If you go back to nsViewSourceChannel.cpp, there's one place where loadGroup->AddRequest is called and two places where loadGroup->RemoveRequest is being called.
nsViewSourceChannel::AsyncOpen(nsIStreamListener *aListener, nsISupports *ctxt)
{
    NS_ENSURE_TRUE(mChannel, NS_ERROR_FAILURE);
    mListener = aListener;
    /*
     * We want to add ourselves to the loadgroup before opening
     * mChannel, since we want to make sure we're in the loadgroup
     * when mChannel finishes and fires OnStopRequest()
     */
    nsCOMPtr<nsILoadGroup> loadGroup;
    mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
    if (loadGroup)
        loadGroup->AddRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                             this), nsnull);
    nsresult rv = mChannel->AsyncOpen(this, ctxt);
    if (NS_FAILED(rv) && loadGroup)
        loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                this),
                                 nsnull, rv);
    if (NS_SUCCEEDED(rv)) {
        mOpened = PR_TRUE;
    }
    return rv;
}
and
nsViewSourceChannel::OnStopRequest(nsIRequest *aRequest, nsISupports* aContext,
                                   nsresult aStatus)
{
    NS_ENSURE_TRUE(mListener, NS_ERROR_FAILURE);
    if (mChannel)
    {
        nsCOMPtr<nsILoadGroup> loadGroup;
        mChannel->GetLoadGroup(getter_AddRefs(loadGroup));
        if (loadGroup)
        {
            loadGroup->RemoveRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                    this),
                                     nsnull, aStatus);
        }
    }
    return mListener->OnStopRequest(NS_STATIC_CAST(nsIViewSourceChannel*,
                                                   this),
                                    aContext, aStatus);
}
Edit:
As I have no clue about how Mozilla works, I have to guess from reading some code. From the channel's point of view, once the original file is loaded, its job is done. If you want to load the secondary items linked in the file, such as an image, you have to implement that in the listener. See TestPageLoad.cpp. It implements a crude parser and retrieves child items on OnDataAvailable:
NS_IMETHODIMP
MyListener::OnDataAvailable(nsIRequest *req, nsISupports *ctxt,
                            nsIInputStream *stream,
                            PRUint32 offset, PRUint32 count)
{
    //printf(">>> OnDataAvailable [count=%u]\n", count);
    nsresult rv = NS_ERROR_FAILURE;
    PRUint32 bytesRead=0;
    char buf[1024];
    if(ctxt == nsnull) {
        bytesRead=0;
        rv = stream->ReadSegments(streamParse, &offset, count, &bytesRead);
    } else {
        while (count) {
            PRUint32 amount = PR_MIN(count, sizeof(buf));
            rv = stream->Read(buf, amount, &bytesRead);
            count -= bytesRead;
        }
    }
    if (NS_FAILED(rv)) {
        printf(">>> stream->Read failed with rv=%x\n", rv);
        return rv;
    }
    return NS_OK;
}
The important thing is that it calls streamParse(), which looks at the src attribute of img and script elements and calls auxLoad(), which creates a new channel with a new listener and calls AsyncOpen().
uriList->AppendElement(uri);
rv = NS_NewChannel(getter_AddRefs(chan), uri, nsnull, nsnull, callbacks);
RETURN_IF_FAILED(rv, "NS_NewChannel");
gKeepRunning++;
rv = chan->AsyncOpen(listener, myBool);
RETURN_IF_FAILED(rv, "AsyncOpen");
Since it passes another instance of the MyListener object in there, that listener can in turn load more child items, ad infinitum, like Russian dolls.
I think I found it myself: take a close look at this page. It isn't clear to me why it doesn't highlight that the UUID has changed between versions, but that would explain why things fail when (or just before) calling QueryInterface on nsIHttpChannelInternal.
With the new(er) UUID, I'm getting better results. As I mentioned in an update to the question, I've posted this on bugzilla.mozilla.org; I'm curious whether I will get a response there, and what it will be.

Resources