I'm learning about Dart's Future and have read some articles about it.
They say Dart is single-threaded, and that we can use a Future to make expensive operations, e.g. reading files, run later.
Suppose reading a file takes 10 seconds, and I have 3 files to read.
My Dart code:
import 'dart:io';

void main() {
  readFile("aaa.txt");
  readFile("bbb.txt");
  readFile("ccc.txt");
  print("Will print the content of the files later");
}

// Starts an asynchronous read; the callback runs once the file has been read.
void readFile(String filename) {
  File file = File(filename);
  file.readAsString().then((content) {
    print("File content:\n");
    print(content);
  });
}
Since reading a file takes 10 seconds, the above code will take at least 30 seconds, right? Do futures just make the expensive tasks run later, one by one, without blocking the current code, but without reducing the total cost?
In Java, I can create a thread pool and run the 3 future tasks in parallel, so the total cost would be between 10 and 20 seconds.
Is it possible to do the same in Dart? Is using Dart's isolates the only solution?
I would expect that this could take 10 seconds, as it will start three reads, each of which will queue a callback to the "then" function when the read completes. It is entirely possible that the three files will load in parallel and all complete after 10 seconds. The callbacks will still be called sequentially on the main thread, though.
Although user code in Dart is single-threaded (assuming you don't use isolates or web workers), nothing says that the implementation can't create threads or use the operating system's asynchronous I/O to perform tasks in parallel, as long as the futures' callbacks run sequentially on the main thread.
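A minimal sketch of starting all three reads at once and waiting for every one of them, assuming the file names from the question exist in the working directory. Future.wait and File.readAsString are standard library calls; readAll is a hypothetical helper name. Whether the reads really overlap is up to the VM's I/O implementation, as described above; only the callbacks are guaranteed to run one at a time.

```dart
import 'dart:io';

/// Hypothetical helper: starts every read immediately and completes when all
/// of them have finished. The results keep the order of [paths].
Future<List<String>> readAll(List<String> paths) {
  return Future.wait(paths.map((path) => File(path).readAsString()));
}

Future<void> main() async {
  final watch = Stopwatch()..start();
  final contents = await readAll(['aaa.txt', 'bbb.txt', 'ccc.txt']);
  for (final content in contents) {
    print('File content:\n$content');
  }
  print('All reads finished after ${watch.elapsedMilliseconds} ms');
}
```

If each read is dominated by waiting on the disk, the total should be closer to the slowest single read than to the sum of all three.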
That's correct. If you start a new async path with Timer(), Future(), or scheduleMicrotask(), it will be scheduled for later execution.
When one of your async paths is waiting for a network request or for the file system to return data, another async path may jump in and run in the meantime. So you might get a runtime of less than 30 seconds, but you can't reduce the runtime by adding a CPU.
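To illustrate the difference between waiting and computing, here is a small sketch. fakeIo, fakeCpu and timeAll are hypothetical names: Future.delayed stands in for a slow read, and a busy loop stands in for CPU work. Three waits can overlap on a single isolate; three busy loops cannot.

```dart
/// Simulated I/O: while the timer is pending, the isolate is free to run
/// other async paths.
Future<void> fakeIo(Duration d) => Future<void>.delayed(d);

/// Simulated CPU work: the busy loop holds the isolate until [d] has passed.
/// There is no await, so the loop runs to completion as soon as it is called.
Future<void> fakeCpu(Duration d) async {
  final sw = Stopwatch()..start();
  while (sw.elapsed < d) {}
}

/// Starts [count] copies of [task] and returns the total wall-clock time in ms.
Future<int> timeAll(Future<void> Function() task, int count) async {
  final sw = Stopwatch()..start();
  await Future.wait([for (var i = 0; i < count; i++) task()]);
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  const d = Duration(milliseconds: 200);
  print('3 waits: ${await timeAll(() => fakeIo(d), 3)} ms');
  print('3 busy loops: ${await timeAll(() => fakeCpu(d), 3)} ms');
}
```

On an otherwise idle machine the first line should report roughly 200 ms, while the second can never be below 600 ms.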
I have to admit that I don't know the details of when scheduling takes place and exactly how it works.
Dart has no threads, so if you want to run code in parallel you need isolates.
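A minimal sketch of running CPU-bound work in parallel with isolates, assuming a Dart SDK recent enough to have Isolate.run (2.19 or later). sumOfSquares is a hypothetical stand-in for "expensive work"; each call runs in its own short-lived isolate, so the three jobs may execute on different cores.

```dart
import 'dart:isolate';

/// Hypothetical CPU-bound workload.
int sumOfSquares(int n) {
  var total = 0;
  for (var i = 0; i < n; i++) {
    total += i * i;
  }
  return total;
}

/// Runs each job in a separate isolate and collects the results in order.
Future<List<int>> runInParallel(List<int> sizes) {
  return Future.wait([
    for (final n in sizes) Isolate.run(() => sumOfSquares(n)),
  ]);
}

Future<void> main() async {
  final results = await runInParallel([1000, 2000, 3000]);
  print(results);
}
```

Isolates do not share memory, so the argument and the result are copied between isolates; this pays off only when the work is large compared to the data being passed.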
Almost 30 seconds.
I just ran the code with Dart 2.16.2, and the result is almost 30 seconds.
Here is my code:
import 'dart:io';

void main() async {
  print('main start');
  printCurrentTime("main before all future");
  Future(() => readFile(0));
  Future(() => readFile(1));
  Future(() => readFile(2));
  Future(() {
    printCurrentTime("future last");
  });
  print('main end');
  printCurrentTime("main");
}

void printCurrentTime(String name) {
  print("$name ${DateTime.now().millisecondsSinceEpoch}");
}

void readFile(int number) {
  print("start read file $number");
  var watch = Stopwatch();
  watch.start();
  var filename = r"path/to/file";
  File file = File(filename);
  // The read is carried out by the VM's I/O machinery; this callback runs on the main isolate.
  file.readAsBytes().then((content) {
    printCurrentTime("\nfuture#$number start");
    print("File $number content:");
    // toString() on a large Uint8List is CPU-bound work on the main isolate.
    print(content.toString().length);
    printCurrentTime("future#$number finish");
    print("finish read file $number");
  });
}
And here is the result:
main start
main before all future 1652964314276
main end
main 1652964314278
// all the event queue start to run
start read file 0
start read file 1
start read file 2
future last 1652964314290
// Dart reads the files in parallel. After each read finishes, its callback
// is put on the event queue, and Dart runs all those event tasks
// one by one:
future#0 start 1652964314343
File 0 content:
241398625
future#0 finish 1652964317457
finish read file 0
future#1 start 1652964317457
File 1 content:
241398625
future#1 finish 1652964320470
finish read file 1
future#2 start 1652964320471
File 2 content:
241398625
future#2 finish 1652964323403
finish read file 2
As we can see:
file.readAsBytes() takes about 53ms (or sometimes 100ms in my tests).
content.toString() takes about 3s or more.
So we can conclude:
The file.readAsBytes() calls all run in parallel on other threads. When each returned Future<Uint8List> completes, its callback is added to the event queue, whose tasks run one at a time on the main isolate. That's why the future#N start ... lines are printed one after another.
import 'dart:async';
void main() async {
  Future.microtask(() => print(1));
  Future.value(2).then(print);
  Future.sync(() => print(3));
  Future.sync(() => 4).then(print);
}
Output I observe in DartPad:
3
1
2
4
Why didn't the microtask get executed first? And what is different about the two Future.sync calls that made them print in different orders?
An important aspect of the .then() method is the following, which you can find in the documentation:
When this future completes with a value, the onValue callback will be called with that value. If this future is already completed, the callback will not be called immediately, but will be scheduled in a later microtask.
https://api.dart.dev/stable/2.15.1/dart-async/Future/then.html
So what happens is:
You schedule a microtask to call print(1).
Future.value(2) creates a future for the value 2, and the following .then callback will run in a microtask. The microtask queue is now: print(1), print(2).
The third line is synchronous and runs immediately in full, so print(3) runs. This gives the first line of your output.
Future.sync(() => 4) returns an already completed future, so the .then() schedules a new microtask. The microtask queue is now: print(1), print(2), print(4).
After main() is done, the microtasks run in the order they were scheduled, which explains the rest of your output.
An important note is that you never await any of the Futures returned by the .then() methods, so any asynchronous work is executed only after main() is done.
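To check the ordering without relying on console output, a small sketch can record it in a list instead of printing. observeOrder is a hypothetical name; the four Future calls are the ones from the question. Awaiting a zero-length Future.delayed yields to the event loop, and every pending microtask runs before that timer fires.

```dart
/// Records the order in which the question's four callbacks run.
Future<List<int>> observeOrder() async {
  final order = <int>[];
  Future.microtask(() => order.add(1));
  Future.value(2).then(order.add);
  Future.sync(() => order.add(3)); // runs synchronously, right now
  Future.sync(() => 4).then(order.add); // already completed: .then is deferred to a microtask
  await Future<void>.delayed(Duration.zero); // all queued microtasks run before this timer
  return order;
}

Future<void> main() async {
  print(await observeOrder()); // [3, 1, 2, 4], matching the DartPad output above
}
```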
I am learning Dart and working with Isolate. I wrote the following code and expected it to create three isolates that would run forever:
import 'dart:isolate';

void main() {
  Isolate.spawn(echo, "Hello");
  Isolate.spawn(echo, "Hello2");
  Isolate.spawn(echo, "Hello3");
}

void echo(String message) {
  while (true) {
    print(message);
  }
}
But I am getting very strange output like this (different every time):
$ dart app.dart
Hello
Hello
Hello
Hello
HelloHello2
Hello
Hello3
Hello2
Hello
The VM will terminate the entire program as soon as the main isolate ends. For you, that happens right after you have spawned all three isolates. Nothing keeps the main isolate alive, so the entire program just ends ... eventually, when the isolate has finished shutting down. When that happens depends on timing, so it can vary quite a lot.
To keep an isolate alive forever, you can create a ReceivePort. Try adding:
var keepalive = ReceivePort();
to your program; then it should keep running forever.
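If the goal is to wait until the spawned isolates are done, rather than to run forever, a sketch of a variation on the same idea: the open ReceivePort keeps the main isolate alive, and Isolate.spawn's onExit parameter makes the VM post a message to it when each isolate terminates. echo (now finite) and runEchoes are illustrative names, and the sketch assumes a non-empty list of texts.

```dart
import 'dart:isolate';

/// Entry point for each spawned isolate: prints [message] a few times and
/// then returns, which ends that isolate.
void echo(String message) {
  for (var i = 0; i < 3; i++) {
    print(message);
  }
}

/// Spawns one isolate per text and waits until all of them have exited.
/// Assumes [texts] is non-empty; otherwise the port would never be closed.
Future<int> runEchoes(List<String> texts) async {
  final exits = ReceivePort(); // an open port keeps the main isolate alive
  for (final text in texts) {
    // With onExit, the VM sends a message (null by default) to this port
    // when the spawned isolate terminates.
    await Isolate.spawn(echo, text, onExit: exits.sendPort);
  }
  var finished = 0;
  await for (final _ in exits) {
    finished++;
    if (finished == texts.length) {
      exits.close(); // no open ports remain, so the program can end
    }
  }
  return finished;
}

Future<void> main() async {
  await runEchoes(['Hello', 'Hello2', 'Hello3']);
}
```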
Also, the output is not just a list of lines containing Hellos; the lines are intermixed.
The three isolates are running concurrently. They all write to the same output (stdout), so the outputs get intermixed. There is no promise that a print call is atomic, and it isn't, so a print call in one isolate can happen in the middle of a print call in another isolate.
What happens here is that print doesn't just print the argument, it also prints a newline afterwards. Those are two different writes to stdout, so it is possible for another isolate to print its message between the "Hello" and the "\n" following it.
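One way to avoid the intermixing is to let only one isolate write to stdout. A sketch under that assumption: the spawned isolates send their text to the main isolate through a SendPort, and the main isolate does all the printing, so whole lines can still arrive in any order but never split. echoTo and collectEchoes are hypothetical names.

```dart
import 'dart:isolate';

/// Spawned isolate: sends each line to the main isolate instead of printing it.
void echoTo(List<Object> args) {
  final out = args[0] as SendPort;
  final text = args[1] as String;
  for (var i = 0; i < 3; i++) {
    out.send(text);
  }
}

/// Prints every received line on the main isolate and returns them all.
Future<List<String>> collectEchoes(List<String> texts) async {
  final inbox = ReceivePort();
  for (final text in texts) {
    await Isolate.spawn(echoTo, <Object>[inbox.sendPort, text]);
  }
  final lines = <String>[];
  await for (final line in inbox) {
    print(line); // the only isolate that ever calls print
    lines.add(line as String);
    if (lines.length == texts.length * 3) {
      inbox.close();
    }
  }
  return lines;
}

Future<void> main() async {
  await collectEchoes(['Hello', 'Hello2', 'Hello3']);
}
```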
I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once I have 75. This turned out to be too slow, so I thought I would instead execute those HTTP requests on different threads using a thread pool.
My current implementation looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
    @Transient lateinit var api: TheApi
    @Transient lateinit var currentBatch: MutableList<TheEvent>
    @Transient lateinit var executor: ExecutorService

    @Setup
    fun setup() {
        api = buildApi()
        executor = Executors.newCachedThreadPool()
    }

    @StartBundle
    fun startBundle() {
        currentBatch = mutableListOf()
    }

    @ProcessElement
    fun processElement(processContext: ProcessContext) {
        val record = processContext.element()
        currentBatch.add(record)
        if (currentBatch.size >= 75) {
            flush()
        }
    }

    private fun flush() {
        val payloadTrack = currentBatch.toList()
        executor.submit {
            api.sendToApi(payloadTrack)
        }
        currentBatch.clear()
    }

    @FinishBundle
    fun finishBundle() {
        if (currentBatch.isNotEmpty()) {
            flush()
        }
    }

    @Teardown
    fun teardown() {
        executor.shutdown()
        executor.awaitTermination(30, TimeUnit.SECONDS)
    }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that when load testing (by sending a few million events to Pub/Sub), it takes up to 8 times longer for the pipeline to forward those messages to the API (which has response times under 8ms) than for my laptop to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also, am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by collecting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
Are you doing this right / do you need to change anything?
Do you need to wait in @FinishBundle?
The second answer: yes. But actually you need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests; it does not ensure they have succeeded. So you could lose data if the requests subsequently fail. Your @FinishBundle method should actually block and wait for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after the bundle finishes, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
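Beam has no Dart SDK, so the following is only a Dart analogue of the pattern described above, not Beam code. BatchingWriter is hypothetical: it sends full batches without waiting, but its finish() (the counterpart of @FinishBundle) does not complete until every in-flight request has, and it fails if any of them failed. The send function stands in for TheApi.sendToApi and is an assumption. In the Kotlin DoFn, the equivalent would be keeping the java.util.concurrent.Future objects returned by executor.submit and calling get() on each of them in @FinishBundle.

```dart
/// Hypothetical stand-in for TheApi.sendToApi from the question.
typedef SendBatch = Future<void> Function(List<String> batch);

class BatchingWriter {
  BatchingWriter(this.send, {this.batchSize = 75});

  final SendBatch send;
  final int batchSize;
  final List<String> _current = [];
  final List<Future<void>> _inFlight = [];
  final List<Object> _errors = [];

  /// Counterpart of @ProcessElement: buffer the event, flush when full.
  void add(String event) {
    _current.add(event);
    if (_current.length >= batchSize) _flush();
  }

  void _flush() {
    final batch = List<String>.of(_current); // copy before clearing
    _current.clear();
    // Attach the error handler right away, so a failed send is recorded
    // instead of being reported as an uncaught error before finish() runs.
    _inFlight.add(send(batch).catchError((Object e) {
      _errors.add(e);
    }));
  }

  /// Counterpart of @FinishBundle: flush the remainder, then wait until every
  /// send has completed. Throwing here lets the caller retry the whole batch
  /// set, which is why the sends should be idempotent.
  Future<void> finish() async {
    if (_current.isNotEmpty) _flush();
    await Future.wait(_inFlight);
    _inFlight.clear();
    if (_errors.isNotEmpty) {
      final count = _errors.length;
      final first = _errors.first;
      _errors.clear();
      throw StateError('$count batch(es) failed, first error: $first');
    }
  }
}

Future<void> main() async {
  final writer = BatchingWriter((batch) async {
    print('sending ${batch.length} events');
  });
  for (var i = 0; i < 160; i++) {
    writer.add('event $i');
  }
  await writer.finish(); // logs batches of 75, 75 and 10 events
}
```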
You may find that doing so slows your pipeline down, because @FinishBundle happens more frequently than @Setup. To batch up requests across bundles, you need to use the lower-level state and timers features. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.
I want to stop/sleep execution to simulate a long-running process, but unfortunately I can't find any information about it. I've read the following topic (How can I "sleep" a Dart program), but it isn't what I'm looking for.
For example, the sleep() function from the dart:io package isn't applicable, because that package is not available in a browser.
For example:
import 'dart:html';

void main() {
  // I want to "sleep"/hang execution for several seconds
  // and only then run the rest of the function's body
  querySelector('#loading')?.remove();
  // ...other functions and actions...
}
I know that there is a Timer class for running callbacks after some time, but it still doesn't pause the execution of the program as a whole.
There is no way to stop execution. You can use a Timer, Future.delayed, or just an endless loop that only ends after a certain amount of time has passed.
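If the goal is only to delay the rest of main, not to freeze the page, a minimal sketch with async/await and Future.delayed works in the browser as well as on the VM, since it needs neither dart:io nor dart:html. pause and simulateLongProcess are hypothetical names; in the question's code, onDone would be the line that removes #loading.

```dart
/// A non-blocking "sleep": only the calling async function is suspended;
/// the event loop (and, in a browser, the UI) keeps running meanwhile.
Future<void> pause(Duration duration) => Future<void>.delayed(duration);

/// Simulates a long process, then runs [onDone].
Future<void> simulateLongProcess(Duration duration, void Function() onDone) async {
  await pause(duration);
  onDone();
}

Future<void> main() async {
  print('Begin.');
  await simulateLongProcess(const Duration(seconds: 2), () => print('End.'));
}
```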
If you want a stop-the-world sleeping function, you could write it entirely yourself. I don't recommend it, since stopping the world is a very bad idea, but if you really want it:
void sleep(Duration duration) {
  var ms = duration.inMilliseconds;
  var start = DateTime.now().millisecondsSinceEpoch;
  // Busy-wait: this blocks the whole isolate (and the browser tab) until time is up.
  while (true) {
    var current = DateTime.now().millisecondsSinceEpoch;
    if (current - start >= ms) {
      break;
    }
  }
}

void main() {
  print("Begin.");
  sleep(Duration(seconds: 2));
  print("End.");
}
The test below attempts to run the less pager command and return once the user quits. The problem is that it doesn't wait for user input; it just lists the entire file and exits. Platform: xubuntu 12.04, Dart Editor build: 13049.
import 'dart:io';

void main() {
  shell('less', ['/etc/mime.types'], (exitCode) => exit(exitCode));
}

void shell(String cmd, List<String> opts, void onExit(int exitCode)) {
  var p = Process.start(cmd, opts);
  p.stdout.pipe(stdout); // Process output to stdout.
  stdin.pipe(p.stdin); // stdin to process input.
  p.onExit = (exitCode) {
    p.close();
    onExit(exitCode);
  };
}
The following CoffeeScript function (using Node.js I/O) works:
shell = (cmd, opts, callback) ->
  process.stdin.pause()
  child = spawn cmd, opts, customFds: [0, 1, 2]
  child.on 'exit', (code) ->
    process.stdin.resume()
    callback code
How can I make this work in Dart?
John has a good example of how to read user input, but it doesn't answer your original question. Unfortunately, your question doesn't fit how Dart operates. The two examples you have, the Dart version and the CoffeeScript/Node.js version, do two completely different things.
In your CoffeeScript version, the spawn command actually creates a new process and then passes execution over to that new process. Basically, your program is not interactively communicating with the process; rather, your user is interacting with the spawned process.
In Dart it is different: your program is interacting with the spawned process. It is not passing off execution to the new process. Basically, what you are doing is piping the input/output of the new process to and from your program itself. Since less is writing to a pipe rather than to the terminal, it has no 'window height', so it passes all the information at once. What you're doing in Dart is almost equivalent to:
less /etc/mime.types | cat
You can use Process.start() to communicate interactively with processes. But it is your program that is communicating with the process, not the user. Thus you can write a Dart program that launches and automatically plays 'zork' or 'adventure', for instance, or logs into a remote server by looking at the prompts in the process's output.
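A minimal sketch of that kind of program-driven interaction, assuming a Unix-like system with cat on the PATH. talkTo is a hypothetical helper: the Dart program writes to the child's stdin and reads its stdout, exactly the role a script playing a text adventure would take.

```dart
import 'dart:convert';
import 'dart:io';

/// Feeds [input] to a child process and returns everything it writes to stdout.
/// Here the Dart program, not the user, talks to the process.
Future<String> talkTo(String executable, List<String> args, String input) async {
  final process = await Process.start(executable, args);
  // Start collecting output before writing, so nothing is missed.
  final output = process.stdout.transform(utf8.decoder).join();
  process.stdin.write(input);
  await process.stdin.close(); // signal EOF so the child can finish
  final result = await output;
  await process.exitCode;
  return result; // stderr is not read here; a chatty child would need it drained too
}

Future<void> main() async {
  // cat just echoes its input; a game or a login prompt would be driven the
  // same way, by reading its prompts from stdout and writing answers to stdin.
  print(await talkTo('cat', [], 'hello from Dart\n'));
}
```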
However, there is currently no way to simply pass execution to the spawned process. If you want to pass the process output to a user, and also take user input and send it back to the process, that involves an additional layer. And even then, not all programs (such as less) behave the same as they do when launched from a shell.
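Newer Dart SDKs, released long after the Dart Editor build in the question, added ProcessStartMode.inheritStdio, which lets the child process inherit the parent's stdin, stdout and stderr, much like customFds: [0, 1, 2] in the Node.js version. A sketch assuming such an SDK; runInteractive is a hypothetical name.

```dart
import 'dart:io';

/// Runs [executable] with the terminal handed over to it (stdin, stdout and
/// stderr are inherited) and returns its exit code.
Future<int> runInteractive(String executable, List<String> args) async {
  final process = await Process.start(
    executable,
    args,
    mode: ProcessStartMode.inheritStdio,
  );
  return process.exitCode;
}

Future<void> main() async {
  final code = await runInteractive('less', ['/etc/mime.types']);
  exit(code);
}
```

Because less now writes directly to the terminal, it can query the window size and page the file as it does when started from a shell.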
Here's a basic structure for reading console input from the user. This example reads lines of text from the user, and exits on 'q':
import 'dart:io';
import 'dart:isolate';

final StringInputStream textStream = new StringInputStream(stdin);

void main() {
  textStream.onLine = checkBuffer;
}

void checkBuffer() {
  final line = textStream.readLine();
  if (line == null) return;
  if (line.trim().toLowerCase() == 'q') {
    exit(0);
  }
  print('You wrote "$line". Now write something else!');
}