I want to set up a queue of functions in Dart. The queue should be processed asynchronously, allowing multiple functions to run concurrently, but at most three functions should be executing at the same time. How can I achieve this?
I already tried working through a list, but I am struggling with adding a limit on the number of functions running at the same time:
List<String> queue = [];
Future<void> main() async {
  queue.add("...");
  queue.add("...");
  queue.add("...");
  for (String q in queue) {
    await crawl(q);
  }
}
Future<void> crawl(String url) async {
  // ...
}
I would use a queue:
import "dart:collection";
final queue = Queue<String>();
Future<void> main() async {
  queue
    ..add("...")
    ..add("...")
    ..add("...");
  while (queue.isNotEmpty) {
    await crawl(queue.removeFirst());
  }
}
Future<void> crawl(String x) async {
  // .... queue.add(...) ...
}
This should work. It will not crawl concurrently, though, because it awaits each operation before starting the next one. If you want concurrent crawling, I recommend being a little more clever: look for worker pools or similar structures that ensure only a certain number of operations run at the same time.
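A minimal sketch of such a worker pool, assuming `crawl` returns a `Future` (the `crawl` below is a hypothetical stand-in that only waits 50 ms and records how many calls overlap). It starts three workers that each keep taking the next item from the shared `Queue` until it is empty. Because Dart user code runs on a single thread, `removeFirst()` between awaits cannot race.

```dart
import 'dart:collection';

/// Crawls currently running, and the highest number seen at once.
/// Tracked only to show that the limit holds.
int running = 0;
int maxRunning = 0;
final List<String> done = [];

/// Hypothetical stand-in for the real crawl: simulates I/O with a delay.
Future<void> crawl(String url) async {
  running++;
  if (running > maxRunning) maxRunning = running;
  await Future.delayed(const Duration(milliseconds: 50));
  done.add(url);
  running--;
}

/// Runs [task] on every item of [queue] with at most [limit] tasks in flight.
/// Items that a running task adds to [queue] are picked up by any worker
/// that is still looping.
Future<void> runWithLimit(
    Queue<String> queue, int limit, Future<void> Function(String) task) async {
  Future<void> worker() async {
    while (queue.isNotEmpty) {
      await task(queue.removeFirst());
    }
  }

  // Start `limit` workers and wait until all of them have drained the queue.
  await Future.wait(List.generate(limit, (_) => worker()));
}

Future<void> main() async {
  final queue = Queue<String>.of(List.generate(10, (i) => 'page$i'));
  await runWithLimit(queue, 3, crawl);
  print('crawled ${done.length} pages, at most $maxRunning at once');
}
```

`runWithLimit` is a name made up for this sketch; its second argument sets the limit, and `Future.wait` only completes once every worker has found the queue empty.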
I have an Apache Beam pipeline running on Google Dataflow whose job is rather simple:
It reads individual JSON objects from Pub/Sub
Parses them
And sends them via HTTP to some API
This API requires me to send the items in batches of 75. So I built a DoFn that accumulates events in a list and publishes them via this API once I have 75 of them. This turned out to be too slow, so I thought I would instead execute those HTTP requests in different threads using a thread pool.
The implementation of what I have right now looks like this:
private class WriteFn : DoFn<TheEvent, Void>() {
  @Transient lateinit var api: TheApi
  @Transient lateinit var currentBatch: MutableList<TheEvent>
  @Transient lateinit var executor: ExecutorService
  @Setup
  fun setup() {
    api = buildApi()
    executor = Executors.newCachedThreadPool()
  }
  @StartBundle
  fun startBundle() {
    currentBatch = mutableListOf()
  }
  @ProcessElement
  fun processElement(processContext: ProcessContext) {
    val record = processContext.element()
    currentBatch.add(record)
    if (currentBatch.size >= 75) {
      flush()
    }
  }
  private fun flush() {
    val payloadTrack = currentBatch.toList()
    executor.submit {
      api.sendToApi(payloadTrack)
    }
    currentBatch.clear()
  }
  @FinishBundle
  fun finishBundle() {
    if (currentBatch.isNotEmpty()) {
      flush()
    }
  }
  @Teardown
  fun teardown() {
    executor.shutdown()
    executor.awaitTermination(30, TimeUnit.SECONDS)
  }
}
This seems to work "fine" in the sense that data is making it to the API. But I don't know if this is the right approach and I have the sense that this is very slow.
The reason I think it's slow is that, when load testing (by sending a few million events to Pub/Sub), it takes the pipeline up to 8 times longer to forward those messages to the API (which has response times of under 8ms) than it takes my laptop to feed them into Pub/Sub.
Is there any problem with my implementation? Is this the way I should be doing this?
Also... am I required to wait for all the requests to finish in my @FinishBundle method (i.e. by getting the futures returned by the executor and waiting on them)?
You have two interrelated questions here:
Are you doing this right / do you need to change anything?
Do you need to wait in @FinishBundle?
The answer to the second question is yes. But you actually need to flush more thoroughly, as will become clear.
Once your @FinishBundle method succeeds, a Beam runner will assume the bundle has completed successfully. But your @FinishBundle only sends the requests; it does not ensure they have succeeded. So you could lose data that way if the requests subsequently fail. Your @FinishBundle method should actually block and wait for confirmation of success from TheApi. Incidentally, all of the above should be idempotent, since after finishing the bundle, an earthquake could strike and cause a retry ;-)
So to answer the first question: should you change anything? Just the above. The practice of batching requests this way can work as long as you are sure the results are committed before the bundle is committed.
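The same idea, sketched in Dart purely as an illustration (this is not Beam's API; `BatchingWriter` and its `send` callback are hypothetical stand-ins for `WriteFn` and `api.sendToApi`). Full batches are sent without waiting, but `finish()`, playing the role of `@FinishBundle`, only completes after every in-flight send has completed, and fails if any of them failed. In the Kotlin code, the analogous step would be keeping the `Future`s returned by `executor.submit` and calling `get()` on each of them in `@FinishBundle`.

```dart
/// Dart analogue of the WriteFn above (not Beam code).
class BatchingWriter<T> {
  BatchingWriter(this.send, {this.batchSize = 75});

  /// Stand-in for the hypothetical api.sendToApi(batch).
  final Future<void> Function(List<T> batch) send;
  final int batchSize;
  final List<T> _current = [];
  final List<Future<void>> _inFlight = [];

  void add(T item) {
    _current.add(item);
    if (_current.length >= batchSize) _flush();
  }

  /// Starts sending the current batch but does not wait for it.
  void _flush() {
    if (_current.isEmpty) return;
    final sending = send(List<T>.of(_current));
    _current.clear();
    // No-op handler so an early failure is not reported as uncaught;
    // finish() still receives the error through Future.wait.
    sending.catchError((Object _) {});
    _inFlight.add(sending);
  }

  /// Flushes the remainder, then completes only once every send has
  /// completed; if any send failed, the returned future fails too.
  Future<void> finish() async {
    _flush();
    final pending = List<Future<void>>.of(_inFlight);
    _inFlight.clear();
    await Future.wait(pending);
  }
}

Future<void> main() async {
  final batchSizes = <int>[];
  final writer = BatchingWriter<int>((batch) async {
    await Future.delayed(const Duration(milliseconds: 10)); // fake HTTP call
    batchSizes.add(batch.length);
  });
  for (var i = 0; i < 200; i++) {
    writer.add(i);
  }
  await writer.finish();
  print('sent batches of sizes $batchSizes');
}
```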
You may find that doing so slows your pipeline down, because @FinishBundle happens more frequently than @Setup. To batch up requests across bundles, you need to use the lower-level features of state and timers. I wrote up a contrived version of your use case at https://beam.apache.org/blog/2017/08/28/timely-processing.html. I would be quite interested in how this works for you.
It may simply be that the extremely low latency you are expecting, in the low millisecond range, is not available when there is a durable shuffle in your pipeline.
I have a simple test program to try ...
import akka.actor.{Actor, ActorSystem, Props}
object ActorLeak extends App {
  val system = ActorSystem("ActorLeak")
  val times = 100000000
  for (i <- 1 to times) {
    val myActor = system.actorOf(Props(classOf[TryActor], i), name = s"TryActor-$i")
    //Thread sleep 100
    myActor ! StopCmd
    if (i % 10000 == 0)
      println(s"Completed $i")
  }
  println(s"Creating and stopping $times end.")
  val hookThread = new Thread(new Runnable {
    def run() {
      system.shutdown()
    }
  })
  Runtime.getRuntime.addShutdownHook(hookThread)
}
case object StopCmd
class TryActor(no: Int) extends Actor {
  def receive = {
    case StopCmd => context stop self
  }
}
What I observe: sometimes an OutOfMemoryError, sometimes the JVM dies, and it runs slower and slower ...
Is there a memory leak in creating/stopping actors?
Actor creation and messaging are both asynchronous: when actorOf returns, this does not mean the actor has been created yet, and when ! returns, it does not mean the actor has received or acted upon the message.
This means that you are not actually creating and stopping an actor in each iteration; you only trigger the creation and send a message. The loop is probably faster at queueing up actor creations than the messages can arrive and stop the actors, so the pending work fills up your JVM's heap.
To do what I think you are trying to do, you would have to send a response from the actor upon receiving the StopCmd and wait for that response inside your loop before continuing with the next iteration. This can be done with the ask pattern together with Await.result, which blocks the main thread until the actor's reply has arrived.
Note that this is only useful for your understanding and not something that you would do in an actual system using Akka.
I want to stop/sleep execution to simulate a long-running process, but unfortunately I can't find information about it. I've read the following topic (How can I "sleep" a Dart program), but it isn't what I'm looking for.
For example, the sleep() function from the dart:io package isn't applicable, because that package is not available in the browser.
For example:
import 'dart:html';
void main() {
  // I want to "sleep"/hang execution for several seconds
  // and only then run the rest of the function's body
  querySelector('#loading')?.remove();
  // ...other functions and actions...
}
I know that there is the Timer class for making callbacks after some time, but it still doesn't stop the execution of the program as a whole.
There is no way to pause execution. You can either use a Timer or Future.delayed, or use a busy loop that only ends after a certain amount of time has passed.
If you want a stop-the-world sleep function, you could write it entirely yourself. I don't recommend it, because stopping the world is a very bad idea, but if you really want it:
/// Busy-waits for [duration]. This blocks the whole isolate: no timers,
/// events or (in a browser) rendering can run until it returns.
void sleep(Duration duration) {
  final stopwatch = Stopwatch()..start();
  while (stopwatch.elapsed < duration) {
    // Spin until enough time has passed.
  }
}
void main() {
  print("Begin.");
  sleep(const Duration(seconds: 2));
  print("End.");
}
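For comparison, a minimal sketch of the non-blocking Future.delayed option. It runs on the Dart VM, so it prints instead of touching the DOM, and `pauseAndCountTicks` is a hypothetical helper: a periodic `Timer` keeps firing during the pause, which would not happen with the busy-loop `sleep` above.

```dart
import 'dart:async';

/// Pauses the calling function for [pause] without blocking the event loop,
/// and returns how often a 50 ms periodic timer fired in the meantime.
Future<int> pauseAndCountTicks(Duration pause) async {
  var ticks = 0;
  final heartbeat =
      Timer.periodic(const Duration(milliseconds: 50), (_) => ticks++);
  await Future.delayed(pause);
  heartbeat.cancel();
  return ticks;
}

Future<void> main() async {
  print('Begin.');
  final ticks = await pauseAndCountTicks(const Duration(seconds: 1));
  // Code after the await runs only once the delay has elapsed; in the browser
  // example this is where querySelector('#loading')?.remove() would go.
  print('End. The timer fired $ticks times during the pause.');
}
```

The rest of `main` still waits, but only that function is suspended; everything else scheduled on the event loop keeps running.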
I am making a (quick and dirty) Batching API that allows the UI to send a selection of REST API calls and get results for all of them at once.
I am using PromiseMap to make some asynchronous REST calls to the relevant services, which get collected afterward.
There could be a large number of threads that need to run, and I would like to throttle the number of threads that run at the same time, similar to Executor's thread pool.
Is this possible without physically separating the threads into multiple PromiseMaps and chaining them? I haven't found anything online describing limiting the thread pool.
//get requested calls
JSONArray callsToMake=request.JSON as JSONArray
//registers calls in promise map
def promiseMap = new PromiseMap()
//Can I limit this Map as a thread pool to, say, run 10 at a time until finished
data.each {
  def tempVar=it
  promiseMap[tempVar.id]={makeCall(tempVar.method, "${basePath}${tempVar.to}" as String, tempVar.body)}
}
def result=promiseMap.get()
def resultList=parseResults(result)
response.status=HttpStatusCodes.ACCEPTED
render resultList as JSON
I'm hoping there's a fairly straightforward setting that I may be unaware of.
Thank you.
The default Async implementation in Grails is GPars. To configure the number of threads you need to use a GParsPool. See:
http://gpars.org/guide/guide/dataParallelism.html#dataParallelism_parallelCollections_GParsPool
Example:
withPool(10) {...}
withPool doesn't seem to work. Just in case anyone is looking to limit threads, here is what I did: create a custom group with a custom thread pool and specify the number of threads.
def customGroup = new DefaultPGroup(new DefaultPool(true, 5))
try {
  Dataflow.usingGroup(customGroup, {
    def promises = new PromiseList()
    (1..100).each { number ->
      promises << {
        log.info "Performing Task ${number}"
        Thread.sleep(200)
        number++
      }
    }
    def result = promises.get()
  })
}
finally {
  customGroup.shutdown()
}
Add
runtime 'org.grails:grails-async-gpars'
to build.gradle, and then use
GParsExecutorsPool.withPool(10) { service ->
  Shop.list().each { shop ->
    Item.list().each { item ->
      service.submit({ createOrder(shop, item) } as Runnable)
    }
  }
}
in your service, for example.
I'm learning Dart's Future and have read some articles about it.
They say Dart is single-threaded, and that we can use Future to make some expensive functions, e.g. reading files, run later.
Suppose reading a file takes 10 seconds, and I have 3 files to read.
My Dart code:
import 'dart:io';
void main() {
  readFile("aaa.txt");
  readFile("bbb.txt");
  readFile("ccc.txt");
  print("Will print the content of the files later");
}
void readFile(String filename) {
  File file = File(filename);
  file.readAsString().then((content) {
    print("File content:\n");
    print(content);
  });
}
Since reading each file takes 10 seconds, the above code will take at least 30 seconds, right? Does using futures to read files just make the expensive tasks run later, one by one, without blocking the current code, but without reducing the total cost?
In Java, I can create a thread pool and run the 3 future tasks in parallel, so the total cost will be between 10 and 20 seconds.
Is it possible to do the same in Dart? Are Dart's isolates the only solution?
I would expect that this could take 10 seconds, as it will start three reads, each of which will queue a callback to the "then" function when the read is complete. It is entirely possible that the three files will load in parallel and all complete after 10 seconds. The callbacks will be called on the main thread sequentially, though.
Although user code in Dart is single-threaded (assuming you don't use isolates or web workers), nothing says that the implementation can't create threads or use the operating system's asynchronous I/O to perform tasks in parallel, as long as the futures' callbacks run sequentially on the main thread.
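A small sketch of this point, with a hypothetical `fakeRead` that uses `Future.delayed` in place of `File.readAsString()`. All three reads are started before any of them is awaited, so their waits overlap and the total is close to one delay rather than three. Whether real disk reads overlap the same way depends on the OS and hardware.

```dart
/// Hypothetical stand-in for a slow read: the waiting happens outside Dart
/// code (here a timer; for real files, the I/O system), so nothing blocks.
Future<String> fakeRead(String name) async {
  await Future.delayed(const Duration(milliseconds: 300));
  return 'contents of $name';
}

/// Starts all reads before awaiting any of them, then waits for all.
Future<List<String>> readAll(List<String> names) =>
    Future.wait(names.map(fakeRead));

Future<void> main() async {
  final stopwatch = Stopwatch()..start();
  final contents = await readAll(['aaa.txt', 'bbb.txt', 'ccc.txt']);
  // The three 300 ms waits overlap, so this is close to 300 ms, not 900 ms.
  print('${contents.length} files in ${stopwatch.elapsedMilliseconds} ms');
}
```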
That's correct. If you start a new async path with new Timer(), new Future(), or scheduleMicrotask(), it will be scheduled for later execution.
When one of your async paths is waiting for a network request or for the file system to return data, another async path may jump in and run in the meantime. So you might get a runtime of less than 30 seconds, but you can't reduce the runtime by adding a CPU.
I have to admit that I don't know the details of when scheduling takes place and how it works exactly.
Dart has no threads, so if you want to run code in parallel you need isolates.
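For CPU-bound work, a minimal sketch using `Isolate.run` (available since Dart 2.19, so newer than the 2.16.2 used below); `sumOfSquares` is a made-up workload. Each call runs in its own isolate, so the computations can use separate cores, and `Future.wait` returns the results in input order.

```dart
import 'dart:isolate';

/// Deliberately CPU-bound, made-up workload: sums i * i for 0 <= i < n.
int sumOfSquares(int n) {
  var sum = 0;
  for (var i = 0; i < n; i++) {
    sum += i * i;
  }
  return sum;
}

/// Runs each computation in its own short-lived isolate and waits for all.
/// The closure only captures an int, so it can be sent to another isolate.
Future<List<int>> computeInParallel(List<int> inputs) =>
    Future.wait(inputs.map((n) => Isolate.run(() => sumOfSquares(n))));

Future<void> main() async {
  final results = await computeInParallel([1000000, 2000000, 3000000]);
  print(results);
}
```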
Almost 30 seconds.
I just ran the code with Dart 2.16.2, and the result is almost 30 seconds.
Here is my code:
import 'dart:async';
import 'dart:convert';
import 'dart:io';
import 'dart:isolate';
main() async {
  print('main start');
  printCurrentTime("main before all future");
  Future(() => readFile(0));
  Future(() => readFile(1));
  Future(() => readFile(2));
  Future(() {
    printCurrentTime("future last");
  });
  print('main end');
  printCurrentTime("main");
}
printCurrentTime(String name) {
  print("$name ${DateTime.now().millisecondsSinceEpoch}");
}
readFile(number) {
  print("start read file $number");
  var watch = Stopwatch();
  watch.start();
  var filename = r"path/to/file";
  File file = File(filename);
  file.readAsBytes().then((content) {
    printCurrentTime("\nfuture#$number start");
    print("File $number content:");
    print(content.toString().length);
    printCurrentTime("future#$number finish");
    print("finish read file $number");
  });
}
And here is the result:
main start
main before all future 1652964314276
main end
main 1652964314278
// the queued futures start to run
start read file 0
start read file 1
start read file 2
future last 1652964314290
// the Dart runtime reads the files in parallel; after each read finishes,
// its callback is put on the event queue, and Dart runs all of
// those event tasks one by one:
future#0 start 1652964314343
File 0 content:
241398625
future#0 finish 1652964317457
finish read file 0
future#1 start 1652964317457
File 1 content:
241398625
future#1 finish 1652964320470
finish read file 1
future#2 start 1652964320471
File 2 content:
241398625
future#2 finish 1652964323403
finish read file 2
As we can see:
file.readAsBytes() takes about 53ms (sometimes 100ms in my tests)
content.toString() takes about 3s or more
So we can conclude:
The file.readAsBytes() calls all run in parallel on other threads, and each resulting Future<Uint8List> adds its callback to the event queue, whose tasks run one at a time. That's why future#0 start, future#1 start, ... are printed one after another.
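To reproduce this effect without a large file, here is a small sketch that replaces the read with `Future.delayed` and the `content.toString()` cost with a busy loop (`burnCpu` and `overlapThenSerialize` are names invented for this illustration). The three waits overlap, but the CPU work in the three callbacks runs back to back on the single event loop; overlapping that part as well would require moving it into isolates.

```dart
/// Busy-waits on the current isolate, standing in for the expensive
/// content.toString() call in the measurements above.
void burnCpu(Duration d) {
  final sw = Stopwatch()..start();
  while (sw.elapsed < d) {}
}

/// Starts three simulated reads together and returns the total time in ms.
/// The waits overlap, but each callback's synchronous work cannot.
Future<int> overlapThenSerialize(Duration ioWait, Duration cpuWork) async {
  final sw = Stopwatch()..start();
  await Future.wait(List.generate(3, (_) async {
    await Future.delayed(ioWait); // overlaps with the other two waits
    burnCpu(cpuWork); // runs on the same thread as the other callbacks
  }));
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  final ms = await overlapThenSerialize(
      const Duration(milliseconds: 300), const Duration(milliseconds: 200));
  // Expect roughly 300 + 3 * 200 ms rather than 3 * (300 + 200) ms.
  print('took $ms ms');
}
```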