async operation concurrency understanding - ios

These queues manage the tasks you provide to GCD and execute those tasks in FIFO order. This guarantees that the first task added to the queue is the first task started in the queue, the second task added will be the second to start, and so on down the line.
The code below:
let anotherQueue = DispatchQueue(label: "com.gcdTest.Queue", qos: .userInteractive)
anotherQueue.async {
    anotherQueue.async {
        anotherQueue.async {
            anotherQueue.async {
                print("task 6")
                for _ in 1...300 { }
            }
        }
        print("task 3")
        for _ in 301...600 {}
    }
    anotherQueue.async {
        anotherQueue.async {
            print("task 5")
            for _ in 700...900 {}
        }
        print("task 4")
        for _ in 5000...7000 {}
    }
    print("task 1")
    for _ in 9000...10000 {}
}
anotherQueue.async {
    print("task 2")
    for _ in 1...1000 {}
}
produces the following output:
task 1
task 2
task 3
task 4
task 5
task 6
But when we run the same code on a concurrent queue, it produces unpredictable output.
For example, change the first line of the code to the line below:
let anotherQueue = DispatchQueue(label: "com.gcdTest.Queue", qos: .userInteractive, attributes: .concurrent)
Output:
task 3
task 2
task 1
task 4
task 5
task 6
The definition states:
Tasks in concurrent queues are guaranteed to start in the order they were added…and that’s about all you’re guaranteed!
So I was expecting output similar to what the serial queue (the default) produces: task 1, task 2, task 3, task 4, task 5, task 6.
Can anyone help me understand where I am going wrong?

Bottom line, GCD will always start the tasks on a queue in the order that they were dispatched to that queue. In the case of a serial queue, that means that they will run sequentially, in that order, and this behavior is easily observable.
In the case of a concurrent queue, however, while it will start the tasks in the queued order, for tasks that are dispatched quickly in succession, they may all start quickly in succession, too, running concurrently with each other. In short, they may start running at nearly the same time, and you therefore have no assurances which will encounter its respective print statement first. Just because the concurrent queue started one task a few milliseconds after another, that provides no assurances regarding the order that those two tasks encounter their respective print statements.
In short, instead of deterministic behavior for the sequence of the print statements, you have a simple race with non-deterministic behavior.
As an aside, while it's clear that your example introduces races when employed on a concurrent queue, it should be noted that because of your nested dispatch statements, you'll have race conditions on your serial queue, too. It looks like the sequence of behavior is entirely predictable on serial queue, but it's not.
Let's consider a simplified version of your example. I'm assuming that we'll start this from the main thread:
queue.async {
    queue.async {
        print("task 3")
    }
    print("task 1")
}
queue.async {
    print("task 2")
}
Clearly, task 1 will be added to the queue first and if that queue is free, it will start immediately on that background thread, while the main thread proceeds. But as the code on the main thread approaches the dispatching of task 2, task 1 will start and will proceed to dispatch task 3. You have a classic race between the dispatching of task 2 and task 3.
Now, in practice, you'll see task 2 dispatched before task 3, but it doesn't take much of a delay to introduce non-deterministic behavior. For example, on my computer, adding a Thread.sleep(forTimeInterval: 0.00005) before dispatching task 2 manifested the non-deterministic behavior. But even without delays (or for loops of a certain number of iterations), the behavior is technically non-deterministic.
But we can create a simple example that eliminates the races implicit in the above examples, but still illustrates the difference between serial and concurrent queue behavior that you were originally asking about:
for i in 0 ..< 10 {
    queue.async { [i] in
        print(i)
    }
}
This is guaranteed to print in order on a serial queue, but not necessarily so on a concurrent queue.
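To observe this difference without relying on console output, one option is to record the order in which the tasks actually run. The following is only a minimal sketch and is not part of the original answer: `OrderLog` and `recordedOrder(on:)` are hypothetical names, and the queue labels are placeholders.

```swift
import Foundation

// Lock-protected list of task indices (hypothetical helper type).
final class OrderLog: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [Int] = []

    func append(_ value: Int) {
        lock.lock(); defer { lock.unlock() }
        storage.append(value)
    }

    var values: [Int] {
        lock.lock(); defer { lock.unlock() }
        return storage
    }
}

// Dispatches ten tiny tasks to `queue` and returns the order in which
// they recorded themselves.
func recordedOrder(on queue: DispatchQueue) -> [Int] {
    let log = OrderLog()
    let group = DispatchGroup()
    for i in 0 ..< 10 {
        queue.async(group: group) {
            log.append(i)
        }
    }
    group.wait()   // block the caller until all ten tasks have finished
    return log.values
}

let serial = DispatchQueue(label: "com.example.serial")
let concurrent = DispatchQueue(label: "com.example.concurrent", attributes: .concurrent)

print(recordedOrder(on: serial))       // always 0 through 9, in order
print(recordedOrder(on: concurrent))   // the same ten values, in whatever order the race produced
```

Note that even here the concurrent result reflects the order in which the tasks acquired the lock, not the order in which GCD started them.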

Related

Is there an equivalent to Akka Streams' `conflate` and/or `batch` operators in Reactor?

I am looking for an equivalent of the batch and conflate operators from Akka Streams in Project Reactor, or some combination of operators that mimic their behavior.
The idea is to aggregate upstream items when the downstream backpressures in a reduce-like manner.
Note that this is different from this question because the throttleLatest / conflate operator described there is different from the one in Akka Streams.
Some background regarding what I need this for:
I am watching a change stream on a MongoDB and for every change I run an aggregate query on the MongoDB to update some metric. When lots of changes come in, the queries can't keep up and I'm getting errors. As I only need the latest value of the aggregate query, it is fine to aggregate multiple change events and run the aggregate query less often, but I want the metric to be as up-to-date as possible so I want to avoid waiting a fixed amount of time when there is no backpressure.
The closest I could come so far is this:
changeStream
.window(Duration.ofSeconds(1))
.concatMap { it.reduce(setOf<String>(), { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName }) }
.concatMap { Flux.fromIterable(it) }
.concatMap { taskRepository.findTaskCountForApplication(it) }
but this would always wait for 1 second regardless of backpressure.
What I would like is something like this:
changeStream
.conflateWithSeed({setOf(it.body.sourceReference.applicationName)}, {applicationNames, event -> applicationNames + event.body.sourceReference.applicationName})
.concatMap { Flux.fromIterable(it) }
.concatMap { taskRepository.findTaskCountForApplication(it) }
I assume you always run only one query at a time, with no parallel execution. My idea is to buffer elements in a list (which can be easily aggregated) for as long as the query is running. As soon as the query finishes, the next list is processed.
I tested it with the following code:
// AtomicBoolean instead of a plain boolean: the flag is mutated from lambdas
// and is set and read on different threads
AtomicBoolean isQueryRunning = new AtomicBoolean(false);
Flux.range(0, 1000000)
    .delayElements(Duration.ofMillis(10))
    // keep buffering while a query is running
    .bufferUntil(ignored -> !isQueryRunning.get())
    .doOnNext(integers -> isQueryRunning.set(true))
    .concatMap(integers -> Mono.fromCallable(() -> {
            // simulate a query of random duration
            int sleepTime = new Random().nextInt(10000);
            System.out.println("processing " + integers.size() + " elements. Sleep time: " + sleepTime);
            Thread.sleep(sleepTime);
            return "";
        })
        .subscribeOn(Schedulers.elastic())
    ).doOnNext(s -> isQueryRunning.set(false))
    .subscribe();
Which prints
processing 1 elements. Sleep time: 4585
processing 402 elements. Sleep time: 2466
processing 223 elements. Sleep time: 2613
processing 236 elements. Sleep time: 5172
processing 465 elements. Sleep time: 8682
processing 787 elements. Sleep time: 6780
It's clearly visible that the size of the next batch is proportional to the previous query's execution time (sleep time).
Note that this is not a "real" backpressure solution, just a workaround. It's also not suited for parallel execution. It might also require some tuning in order to prevent running queries for empty batches.

Concurrent Queue Issue - iOS/Swift

In my program I need two tasks to run simultaneously in the background. To do that, I have used a concurrent queue as below:
let concurrentQueue = DispatchQueue(label: "concurrentQueue", qos: .utility, attributes: .concurrent)
concurrentQueue.async {
    for i in 0 ..< 10 {
        print(i)
    }
}
concurrentQueue.async {
    for i in (0 ..< 10).reversed() {
        print(i)
    }
}
Here I need the output like this,
0
9
1
8
2
7
3
6
4
5
5
4
6
3
7
2
8
1
9
0
But what I get is,
I referred below tutorial in order to have some basic knowledge about Concurrent Queues in Swift 3
https://www.appcoda.com/grand-central-dispatch/
Can someone tell me what is wrong with my code? Or is this the result I should expect? Is there any other way to achieve this? Any help would be highly appreciated.
There is nothing wrong with your code sample. That is the correct syntax for submitting two tasks to a concurrent queue.
The problem is the expectation that you'd necessarily see them run concurrently. There are two issues that could affect this:
The first dispatched task can run so quickly that it just happens to finish before the second task gets going. If you slow them down a bit, you'll see your concurrent behavior:
let concurrentQueue = DispatchQueue(label: Bundle.main.bundleIdentifier! + ".concurrentQueue", qos: .utility, attributes: .concurrent)
concurrentQueue.async {
    for i in 0 ..< 10 {
        print("forward: ", i)
        Thread.sleep(forTimeInterval: 0.1)
    }
}
concurrentQueue.async {
    for i in (0 ..< 10).reversed() {
        print("reversed:", i)
        Thread.sleep(forTimeInterval: 0.1)
    }
}
You'd never sleep in production code, but for pedagogical purposes, it can better illustrate the issue.
You can also omit the sleep calls, and just increase the numbers dramatically (e.g. 1_000 or 10_000), and you might start to see concurrent processing taking place.
Your device could be resource constrained, preventing it from running the code concurrently. Devices have a limited number of CPU cores to run concurrent tasks. Just because you submitted the tasks to concurrent queue, it doesn't mean the device is capable of running the two tasks at the same time. It depends upon the hardware and what else is running on that device.
By the way, note that you might see different behavior on the simulator (which is using your Mac's CPU, which could be running many other tasks) than on a device. You might want to make sure to test this behavior on an actual device, if you're not already.
Also note that you say you "need" the output to alternate print statements between the two queues. While the screen snapshots from your tutorial suggest that this should be expected, you have absolutely no assurances that this will be the case.
If you really need them to alternate back and forth, you have to add some mechanism to coordinate them. You can use semaphores (which I'm reluctant to suggest simply because they're such a common source of problems, especially for new developers) or operation queues with dependencies.
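For the operation-queue route, a rough sketch might look like the following. It is not from the original answer: `LineLog` and `alternatingLines(count:)` are hypothetical names, and the approach simply chains each operation to the previous one with addDependency so that the forward and reversed steps strictly alternate.

```swift
import Foundation

// Lock-protected list of output lines (hypothetical helper type).
final class LineLog: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [String] = []

    func append(_ line: String) {
        lock.lock(); defer { lock.unlock() }
        storage.append(line)
    }

    var lines: [String] {
        lock.lock(); defer { lock.unlock() }
        return storage
    }
}

// Builds a chain: forward 0 -> reversed (count - 1) -> forward 1 -> ...
func alternatingLines(count: Int) -> [String] {
    let operationQueue = OperationQueue()
    let log = LineLog()
    var previous: Operation?

    for i in 0 ..< count {
        let forward = BlockOperation { log.append("forward: \(i)") }
        let reversed = BlockOperation { log.append("reversed: \(count - 1 - i)") }

        // Each step may only start once the step before it has finished.
        if let last = previous { forward.addDependency(last) }
        reversed.addDependency(forward)

        operationQueue.addOperation(forward)
        operationQueue.addOperation(reversed)
        previous = reversed
    }

    operationQueue.waitUntilAllOperationsAreFinished()
    return log.lines
}

alternatingLines(count: 10).forEach { print($0) }
```

The trade-off is that the dependency chain serializes the work completely, so nothing actually runs in parallel any more; that is the price of a guaranteed alternating order.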
Maybe you could try using semaphores:
let semaphore1 = DispatchSemaphore(value: 1)
let semaphore2 = DispatchSemaphore(value: 0)
concurrentQueue.async {
    for i in 0 ..< 10 {
        semaphore1.wait()
        print(i)
        semaphore2.signal()
    }
}
concurrentQueue.async {
    for i in (0 ..< 10).reversed() {
        semaphore2.wait()
        print(i)
        semaphore1.signal()
    }
}

GCD concurrent queue not starting tasks in FIFO order [duplicate]

I have a class which contains two methods as per the example in Mastering Swift by Jon Hoffman. The class is as below:
class DoCalculation {
    func doCalc() {
        var x = 100
        var y = x * x
        _ = y/x
    }
    func performCalculation(_ iterations: Int, tag: String) {
        let start = CFAbsoluteTimeGetCurrent()
        for _ in 0..<iterations {
            self.doCalc()
        }
        let end = CFAbsoluteTimeGetCurrent()
        print("time for \(tag): \(end - start)")
    }
}
Now in the viewDidLoad() of the ViewController from the single view template, I create an instance of the above class and then create a concurrent queue. I then add the blocks executing the performCalculation(_:tag:) method to the queue.
cqueue.async {
    print("Starting async1")
    calculation.performCalculation(10000000, tag: "async1")
}
cqueue.async {
    print("Starting async2")
    calculation.performCalculation(1000, tag: "async2")
}
cqueue.async {
    print("Starting async3")
    calculation.performCalculation(100000, tag: "async3")
}
Every time I run the application on the simulator, I get random output for the start statements. Example outputs that I get are below:
Example 1:
Starting async1
Starting async3
Starting async2
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
Example 2:
Starting async3
Starting async2
Starting async1
time for async2: 2.80141830444336e-05
time for async3: 0.00216799974441528
time for async1: 0.114436984062195
Example 3:
Starting async1
Starting async3
Starting async2
time for async2: 1.60336494445801e-05
time for async3: 0.00220298767089844
time for async1: 0.129496037960052
I don't understand why the blocks don't start in FIFO order. Can somebody please explain what I am missing here?
I know they will be executed concurrently, but it's stated that a concurrent queue will respect FIFO order for starting the execution of tasks, and only won't guarantee which one completes first. So at least the starting statements should have started with
Starting async1
Starting async3
Starting async2
with only the completion statements in random order:
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
A concurrent queue runs the jobs you submit to it concurrently. That's what it's for.
If you want a queue that runs jobs in FIFO order, you want a serial queue.
I see what you're saying about the docs claiming that the jobs will be submitted in FIFO order, but your test doesn't really establish the order in which they're run. If the concurrent queue has 2 threads available but only one processor to run those threads on, it might swap out one of the threads before it gets a chance to print, run the other job for a while, and then go back to running the first job. There's no guarantee that a job runs to the end before getting swapped out.
I don't think a print statement gives you reliable information about the order in which the jobs are started.
cqueue is a concurrent queue, which dispatches your blocks of work to three different threads (this actually depends on thread availability) at almost the same time, but you cannot control the time at which each thread completes its work.
If you want to perform tasks serially on a background queue, you are much better off using a serial queue.
let serialQueue = DispatchQueue(label: "serialQueue")
A serial queue will start the next task in the queue only once the previous task has completed.
"I don't understand why the blocks don't start in FIFO order" How do you know they don't? They do start in FIFO order!
The problem is that you have no way to test that. The notion of testing it is, in fact, incoherent. The soonest you can test anything is the first line of each block — and by that time, it is perfectly legal for another line of code from another block to execute, because these blocks are asynchronous. That is what asynchronous means.
So, they start in FIFO order, but there is no guarantee about the order in which, given multiple asynchronous blocks, their first lines will be executed.
With a concurrent queue, you are effectively specifying that they can run at the same time. So while they’re added in FIFO manner, you have a race condition between these various worker threads, and thus you have no assurance which will hit its respective print statement first.
So, this raises the question: Why do you care which order they hit their respective print statements? If order is really important, you shouldn't be using concurrent queue. Or, the other way of saying that, if you want to use a concurrent queue, write code that isn't dependent upon the order with which they run.
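As one illustration of code that does not depend on execution order, consider the sketch below. It is an assumption-laden example rather than anything from the original answer: `squares(count:)` is a hypothetical helper, and it uses DispatchQueue.concurrentPerform so that each iteration writes only to its own slot of a preallocated array.

```swift
import Foundation

// Computes i * i for every index in parallel. Each iteration touches only
// its own element, so the result is the same no matter which iteration
// happens to run first.
func squares(count: Int) -> [Int] {
    var results = [Int](repeating: 0, count: count)
    results.withUnsafeMutableBufferPointer { buffer in
        // An empty array may have no base address; nothing to compute then.
        guard let base = buffer.baseAddress else { return }
        DispatchQueue.concurrentPerform(iterations: count) { i in
            base[i] = i * i
        }
    }
    return results
}

print(squares(count: 8))   // [0, 1, 4, 9, 16, 25, 36, 49]
```

The position of each result is decided by its index rather than by timing, which is why no lock or ordering guarantee is needed.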
You asked:
Would you suggest some way to get the info when a Task is dequeued from the queue so that I can log it to get the FIFO order.
If you're asking how to enjoy FIFO starting of the tasks on concurrent queue in real-world app, the answer is "you don't", because of the aforementioned race condition. When using concurrent queues, never write code that is strictly dependent upon the FIFO behavior.
If you're asking how to verify this empirically for purely theoretical purposes, just do something that ties up the CPUs and frees them up one by one:
import Foundation
import QuartzCore   // CACurrentMediaTime
import os.log       // os_log

// utility function to spin for certain amount of time
func spin(for seconds: TimeInterval, message: String) {
    let start = CACurrentMediaTime()
    while CACurrentMediaTime() - start < seconds { }
    os_log("%@", message)
}
// my concurrent queue (the label here is just a placeholder)
let queue = DispatchQueue(label: "com.example.fifo-demo", attributes: .concurrent)
// just something to occupy up the CPUs, with varying
// lengths of time; don’t worry about these re FIFO behavior
for i in 0 ..< 20 {
    queue.async {
        spin(for: 2 + Double(i) / 2, message: "\(i)")
    }
}
// Now, add three tasks on concurrent queue, demonstrating FIFO
queue.async {
    os_log(" 1 start")
    spin(for: 2, message: " 1 stop")
}
queue.async {
    os_log(" 2 start")
    spin(for: 2, message: " 2 stop")
}
queue.async {
    os_log(" 3 start")
    spin(for: 2, message: " 3 stop")
}
You'll be able to see those last three tasks are run in FIFO order.
The other approach, if you want to confirm precisely what GCD is doing, is to refer to the libdispatch source code. It's admittedly pretty dense code, so it's not exactly obvious, but it's something you can dig into if you're feeling ambitious.

Async.StartImmediate vs Async.RunSynchronously

As far as my limited (or even wrong) understanding goes, both Async.StartImmediate and Async.RunSynchronously start an async computation on the current thread. What exactly is the difference between these two functions, then? Can anyone help explain?
Update:
After looking into F# source code at https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/control.fs, I think I kind of understand what happens. Async.StartImmediate starts the async on the current thread. After it hits an async binding, whether it will continue to run on the current thread depends on the async binding itself. For example, if the async binding calls Async.SwitchToThreadPool, it will run on ThreadPool instead of the current thread. In this case, you will need to call Async.SwitchToContext if you want to go back to the current thread. Otherwise, if the async binding doesn’t do any switch to other threads, Async.StartImmediate will continue to execute the async binding on the current thread. In this case, there is no need to call Async.SwitchToContext if you simply want to stay on the current thread.
The reason why Dax Fohl’s example works on the GUI thread is that Async.Sleep carefully captures SynchronizationContext.Current and makes sure the continuation runs in the captured context using SynchronizationContext.Post(). See https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/control.fs#L1631, where the unprotectedPrimitiveWithResync wrapper changes “args.cont” (the continuation) to be a Post to the captured context (see https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/control.fs#L1008; trampolineHolder.Post is basically SynchronizationContext.Post). This only works when SynchronizationContext.Current is not null, which is always the case for the GUI thread. In particular, if you run a console app with StartImmediate, you will find that Async.Sleep does indeed go to the ThreadPool, because the main thread in a console app doesn’t have a SynchronizationContext.Current.
So to summarize, this works on the GUI thread because certain functions like Async.Sleep, Async.AwaitWaitHandle, etc. carefully capture the context and make sure to post back to it.
It looks like this is deliberate behavior; however, it doesn’t seem to be documented anywhere on MSDN.
Async.RunSynchronously waits until the entire computation is completed. So use this any time you need to run an async computation from regular code and need to wait for the result. Simple enough.
Async.StartImmediate ensures that the computation is run within the current context but doesn't wait until the entire expression is finished. The most common use for this (for me, at least) is when you want to run a computation on the GUI thread, asynchronously. For example if you wanted to do three things on the GUI thread at 1-second intervals, you could write
async {
    do! Async.Sleep 1000
    doThing1()
    do! Async.Sleep 1000
    doThing2()
    do! Async.Sleep 1000
    doThing3()
} |> Async.StartImmediate
That will ensure everything gets called in the GUI thread (assuming you call that from the GUI thread), but won't block the GUI thread for the whole 3 seconds. If you use RunSynchronously there, it'll block the GUI thread for the duration and your screen will become unresponsive.
(If you haven't done GUI programming, then just note that updates to GUI controls all have to be done from the same thread, which can be difficult to coordinate manually; the above takes away a lot of the pain).
To give another example, here:
// Async.StartImmediate
async {
    printfn "Running"
    do! Async.Sleep 1000
    printfn "Finished"
} |> Async.StartImmediate
printfn "Next"
> Running
> Next
// 1 sec later
> Finished
// Async.RunSynchronously
async {
    printfn "Running"
    do! Async.Sleep 1000
    printfn "Finished"
} |> Async.RunSynchronously
printfn "Next"
> Running
// 1 sec later
> Finished
> Next
// Async.Start just for completion:
async {
    printfn "Running"
    do! Async.Sleep 1000
    printfn "Finished"
} |> Async.Start
printfn "Next"
> Next
> Running // With possible race condition since they're two different threads.
// 1 sec later
> Finished
Also note that Async.StartImmediate can't return a value (since it doesn't run to completion before continuing), whereas RunSynchronously can.

How to find out the PID of the Flinks execution process?

I want to measure Flink's performance with performance counters (perf). My code:
var text = env.readTextFile("<filename>")
var counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts.writeAsText("<filename_result>", WriteMode.OVERWRITE)
env.execute()
I know the PID of the JobManager. I can also see the TID of the thread (CHAIN DataSource) that runs the execute() command during execution. But the TID changes for each execution, so it won't work with the TID. Is there a way to figure out the PID of the JobManager's child process that runs the execute() command? And are there different child processes for every transformation (e.g. flatMap) of the rdd? If so, is it possible to find out their distinct PIDs?
The individual operators are not executed in distinct processes. The JobManager and the TaskManagers are started as Java processes. The TaskManager then runs a set of parallel tasks (corresponding to the operators). Each parallel task is executed in its own thread. When you start Flink, then the system will create files /tmp/your-name-taskmanager.pid and /tmp/your-name-jobmanager.pid which contain the PID of the processes.
