I have been using "usleep" to stop a thread during some milliseconds and I have checked that is stopping more time than expected.
I am sure I am doing something wrong, I am not an expert in Swift, but I don't understand it because it is very easy to check. For example:
DispatchQueue.global(qos: .background).async {
    let timeStart: Int64 = Date().toMillis()
    usleep(20 * 1000) // 20 ms
    let timeEnd: Int64 = Date().toMillis()
    let timeDif = timeEnd - timeStart
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
}
The result:
start: 1522712546115, end: 1522712546235, dif: 120
If I execute the same in the main thread:
start: 1522712586996, end: 1522712587018, dif: 22
I think the way I create the thread is wrong for sleeping it. How could I create a thread that works well with usleep?
Thanks
A couple of thoughts:
The responsiveness to usleep is a function of the quality of service (QoS) of the queue. For example, I did thirty 20 ms usleep calls on each of the five queue types, resulting in the following averages and standard deviations (measured in ms):
QoS              mean   stdev
---------------  -----  -----
background       99.14  28.06
utility          74.12  23.66
default          23.88   1.83
userInitiated    22.09   1.87
userInteractive  20.99   0.29
The higher the quality of service, the closer the results were to 20 ms, and the lower the standard deviation.
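For reference, the following is a minimal sketch of the kind of harness that can produce such numbers (not the exact code used above; it reports only the mean, and it uses CACurrentMediaTime(), discussed in the next point):

import Foundation
import QuartzCore   // for CACurrentMediaTime()

// Sleeps 20 ms `iterations` times on a global queue of the given QoS
// and prints the mean observed sleep time in milliseconds.
func measureSleep(qos: DispatchQoS.QoSClass, label: String, iterations: Int = 30) {
    DispatchQueue.global(qos: qos).async {
        var samples = [Double]()
        for _ in 0..<iterations {
            let start = CACurrentMediaTime()
            usleep(20 * 1000) // request 20 ms
            samples.append((CACurrentMediaTime() - start) * 1000)
        }
        print("\(label): mean \(samples.reduce(0, +) / Double(samples.count)) ms")
        exit(0)
    }
}

// try .utility, .default, .userInitiated, or .userInteractive for comparison
measureSleep(qos: .background, label: "background")

CFRunLoopRun() // keep the process alive, as in the snippets below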
If you want accurate time measurements, you should avoid using Date and/or CFAbsoluteTimeGetCurrent(). As the documentation warns us:
Repeated calls to this function do not guarantee monotonically increasing results. The system time may decrease due to synchronization with external time references or due to an explicit user change of the clock.
You can use a mach_time-based value, such as the one conveniently returned by CACurrentMediaTime(), to avoid this problem. For example:
let start = CACurrentMediaTime()
// do something
let elapsed = CACurrentMediaTime() - start
If you need even higher accuracy, see Apple Technical Q&A #2169.
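For illustration, a common higher-resolution pattern (a sketch, not necessarily identical to what that Q&A describes) is to convert mach_absolute_time() ticks to nanoseconds using mach_timebase_info:

import Darwin

// Ask the kernel for the tick-to-nanosecond conversion factors once.
var timebase = mach_timebase_info_data_t()
mach_timebase_info(&timebase)

let startTicks = mach_absolute_time()
// do something
let endTicks = mach_absolute_time()

// Convert elapsed ticks to nanoseconds, then to milliseconds.
let elapsedNanos = (endTicks - startTicks) * UInt64(timebase.numer) / UInt64(timebase.denom)
let elapsedMillis = Double(elapsedNanos) / 1_000_000
print("elapsed: \(elapsedMillis) ms")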
I, too, get times around 120 ms when sleeping a thread on a background queue:
import Foundation

DispatchQueue.global(qos: .background).async {
    let timeStart = Date()
    usleep(20 * 1000) // 20 ms
    let timeEnd = Date()
    let timeDif = timeEnd.timeIntervalSince(timeStart) * 1000
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
    exit(0)
}

CFRunLoopRun()
outputs:
start: 2018-04-03 00:10:54 +0000, end: 2018-04-03 00:10:54 +0000, dif: 119.734048843384
However, with default QoS, my results are consistently closer to 20 ms:
import Foundation

DispatchQueue.global(qos: .default).async {
    let timeStart = Date()
    usleep(20 * 1000) // 20 ms
    let timeEnd = Date()
    let timeDif = timeEnd.timeIntervalSince(timeStart) * 1000
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
    exit(0)
}

CFRunLoopRun()
start: 2018-04-03 00:12:15 +0000, end: 2018-04-03 00:12:15 +0000, dif: 20.035982131958
So it appears that the .background QoS is causing the behavior you are seeing. Although I don't have direct knowledge of why this would be, it doesn't seem too far-fetched to speculate that the OS may be somewhat more lax about waking up sleeping threads that are marked with a background QoS. Indeed, this is what Apple's documentation has to say about it:
A quality of service (QoS) class allows you to categorize work to be performed by NSOperation, NSOperationQueue, NSThread objects, dispatch queues, and pthreads (POSIX threads). By assigning a QoS to work, you indicate its importance, and the system prioritizes it and schedules it accordingly. For example, the system performs work initiated by a user sooner than background work that can be deferred until a more optimal time. In some cases, system resources may be reallocated away from the lower priority work and given to the higher priority work.
The possibility for your background work to be "deferred until a more optimal time" seems a plausible explanation of the behavior you're seeing.
Related
I have two streams, stream A and stream B. Both streams contain the same type of event, which has an ID and a timestamp. For now, all I want the Flink job to do is join the events that have the same ID inside a window of 1 minute. Watermarks are assigned per event.
sourceA = initialSourceA.map(parseToEvent)
sourceB = initialSourceB.map(parseToEvent)

streamA = sourceA
    .assignTimestampsAndWatermarks(CustomWatermarkStrategy())
    .keyBy(Event.Key)
streamB = sourceB
    .assignTimestampsAndWatermarks(CustomWatermarkStrategy())
    .keyBy(Event.Key)

streamA
    .join(streamB)
    .where(Event.Key)
    .equalTo(Event.Key)
    .window(TumblingEventTimeWindows.of(Time.of(1, TimeUnit.MINUTES)))
    .apply(giveMePairOfEvents)
    .print()
Inside my test I try to send the following:
sourceA.send(Event(ID_1, 0 seconds))
sourceB.send(Event(ID_1, 0 seconds))
//to increase the watermark
sourceA.send(Event(ID_1, 62 seconds))
sourceB.send(Event(ID_1, 62 seconds))
For parallelism = 1, I can see the events from time 0 getting joined together.
However, for parallelism = 2 the print does not display anything getting joined. To figure out the problem, I tried to print the events after the keyBy of each stream, and I can see they are all running on the same instance. Placing the print after the watermarking shows, for obvious reasons, that the events are currently on different instances.
This leads me to believe that I am somehow doing something incorrect with the watermarking, since for a parallelism higher than 1 the watermark doesn't advance. So here are a couple of questions I asked myself:
Is it possible that each event has a separate watermark generator, and I have to advance each of them specifically?
Should I run keyBy first and then assign watermarks, so that the events from each stream use the same watermark generator?
Sending another set of events as follows:
sourceA.send(Event(ID_1, 0 seconds))
sourceB.send(Event(ID_1, 0 seconds))
//to increase the watermark
sourceA.send(Event(ID_1, 62 seconds))
sourceB.send(Event(ID_1, 62 seconds))
sourceA.send(Event(ID_1, 122 seconds))
sourceB.send(Event(ID_1, 122 seconds))
Ended up emitting the join of the first events. Further inspection showed that the third set of events used the same watermark generator that the second set had not used; I am not clear on why this is happening. How can I assign and advance watermarks correctly when using a join function in Flink?
EDIT 1:
The custom watermark generator:
class CustomWaterMarkGenerator(
    private val maxOutOfOrderness: Long,
    private var currentMaxTimeStamp: Long = 0,
) : WatermarkGenerator<EventType> {

    override fun onEvent(event: EventType, eventTimestamp: Long, output: WatermarkOutput) {
        val a = currentMaxTimeStamp.coerceAtLeast(eventTimestamp)
        currentMaxTimeStamp = a
        output.emitWatermark(Watermark(currentMaxTimeStamp - maxOutOfOrderness - 1))
    }

    override fun onPeriodicEmit(output: WatermarkOutput?) {
    }
}
The watermark strategy:
class CustomWatermarkStrategy(
) : WatermarkStrategy<Event> {

    override fun createWatermarkGenerator(context: WatermarkGeneratorSupplier.Context?): WatermarkGenerator<Event> {
        return CustomWaterMarkGenerator(0)
    }

    override fun createTimestampAssigner(context: TimestampAssignerSupplier.Context?): TimestampAssigner<Event> {
        return TimestampAssigner { event: Event, _: Long ->
            event.timestamp
        }
    }
}
Custom source:
The sourceFunction is currently an RSocket connection that connects to a mock stream where I can send events through mockStream.send(event). The first thing I do with the events is parse them using a map function (from a string into my event type), and then I assign my watermarks etc.
Each parallel instance of the watermark generator will operate independently, based solely on the events it observes. Doing the watermarking immediately after the sources makes sense (although even better, in general, is to do watermarking directly in the sources).
An operator with multiple input channels (such as the keyed windowed join in your application) sets its current watermark to the minimum of the watermarks it has received from its active input channels. This has the effect that any idle source instances will cause the watermarks to stall in downstream tasks -- unless those sources explicitly mark themselves as idle. (And FLINK-18934 meant that prior to Flink 1.14 idleness propagation didn't work correctly with joins.) An idle source is a likely suspect in your situation.
One strategy for debugging this sort of problem is to bring up the Flink WebUI and observe the behavior of the current watermark in all of the tasks.
To get more help, please share the rest of the application, or at least the custom source and watermark strategy.
I am looking for an equivalent of the batch and conflate operators from Akka Streams in Project Reactor, or some combination of operators that mimic their behavior.
The idea is to aggregate upstream items in a reduce-like manner while the downstream is applying backpressure.
Note that this is different from this question because the throttleLatest / conflate operator described there is different from the one in Akka Streams.
Some background regarding what I need this for:
I am watching a change stream on a MongoDB and for every change I run an aggregate query on the MongoDB to update some metric. When lots of changes come in, the queries can't keep up and I'm getting errors. As I only need the latest value of the aggregate query, it is fine to aggregate multiple change events and run the aggregate query less often, but I want the metric to be as up-to-date as possible so I want to avoid waiting a fixed amount of time when there is no backpressure.
The closest I could come so far is this:
changeStream
    .window(Duration.ofSeconds(1))
    .concatMap { it.reduce(setOf<String>(), { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName }) }
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
but this would always wait for 1 second regardless of backpressure.
What I would like is something like this:
changeStream
    .conflateWithSeed({ setOf(it.body.sourceReference.applicationName) }, { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName })
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
I assume you always run only one query at a time - no parallel execution. My idea is to buffer elements in a list (which can easily be aggregated) as long as the query is running. As soon as the query finishes, the next list is processed.
I tested it with the following code:
boolean isQueryRunning = false; // shared flag; in real code this needs to be a field (or an AtomicBoolean), since a local cannot be mutated from the lambdas below

Flux.range(0, 1000000)
    .delayElements(Duration.ofMillis(10))
    .bufferUntil(aLong -> !isQueryRunning)
    .doOnNext(integers -> isQueryRunning = true)
    .concatMap(integers -> Mono.fromCallable(() -> {
            int sleepTime = new Random().nextInt(10000);
            System.out.println("processing " + integers.size() + " elements. Sleep time: " + sleepTime);
            Thread.sleep(sleepTime);
            return "";
        })
        .subscribeOn(Schedulers.elastic())
    )
    .doOnNext(s -> isQueryRunning = false)
    .subscribe();
Which prints
processing 1 elements. Sleep time: 4585
processing 402 elements. Sleep time: 2466
processing 223 elements. Sleep time: 2613
processing 236 elements. Sleep time: 5172
processing 465 elements. Sleep time: 8682
processing 787 elements. Sleep time: 6780
It is clearly visible that the size of the next batch is proportional to the previous query's execution time (the sleep time).
Note that this is not a "real" backpressure solution, just a workaround. It is also not suited for parallel execution. It might also require some tuning in order to prevent running queries for empty batches.
This question already has answers here:
iOS GCD custom concurrent queue execution sequence
(2 answers)
Closed 5 years ago.
I have a class which contains two methods as per the example in Mastering Swift by Jon Hoffman. The class is as below:
class DoCalculation {
    func doCalc() {
        var x = 100
        var y = x * x
        _ = y/x
    }

    func performCalculation(_ iterations: Int, tag: String) {
        let start = CFAbsoluteTimeGetCurrent()
        for _ in 0..<iterations {
            self.doCalc()
        }
        let end = CFAbsoluteTimeGetCurrent()
        print("time for \(tag): \(end - start)")
    }
}
Now, in the viewDidLoad() of the ViewController from the single view template, I create an instance of the above class and then create a concurrent queue. I then add the blocks executing the performCalculation(_:tag:) method to the queue.
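The setup looks roughly like this (the queue label is just an illustrative placeholder):

let calculation = DoCalculation()
// a concurrent queue; the label string is only a placeholder
let cqueue = DispatchQueue(label: "cqueue", attributes: .concurrent)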
cqueue.async {
    print("Starting async1")
    calculation.performCalculation(10000000, tag: "async1")
}
cqueue.async {
    print("Starting async2")
    calculation.performCalculation(1000, tag: "async2")
}
cqueue.async {
    print("Starting async3")
    calculation.performCalculation(100000, tag: "async3")
}
Every time I run the application on the simulator, I get random output for the start statements. Example outputs that I get are below:
Example 1:
Starting async1
Starting async3
Starting async2
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
Example 2:
Starting async3
Starting async2
Starting async1
time for async2: 2.80141830444336e-05
time for async3: 0.00216799974441528
time for async1: 0.114436984062195
Example 3:
Starting async1
Starting async3
Starting async2
time for async2: 1.60336494445801e-05
time for async3: 0.00220298767089844
time for async1: 0.129496037960052
I don't understand why the blocks don't start in FIFO order. Can somebody please explain what I am missing here?
I know they will be executed concurrently, but it's stated that a concurrent queue respects FIFO order when starting the execution of tasks, while not guaranteeing which one completes first. So at least the starting statements should have appeared in the order
Starting async1
Starting async2
Starting async3
with the completion statements in random order:
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
A concurrent queue runs the jobs you submit to it concurrently. That's what it's for.
If you want a queue that runs jobs in FIFO order, you want a serial queue.
I see what you're saying about the docs claiming that the jobs will be submitted in FIFO order, but your test doesn't really establish the order in which they're run. If the concurrent queue has 2 threads available but only one processor to run those threads on, it might swap out one of the threads before it gets a chance to print, run the other job for a while, and then go back to running the first job. There's no guarantee that a job runs to the end before getting swapped out.
I don't think a print statement gives you reliable information about the order in which the jobs are started.
cqueue is a concurrent queue, which dispatches your blocks of work to three different threads (it actually depends on thread availability) at almost the same time, but you cannot control the time at which each thread completes its work.
If you want to perform tasks serially on a background queue, you are much better off using a serial queue.
let serialQueue = DispatchQueue(label: "serialQueue")
A serial queue will start the next task in the queue only when the previous task has completed.
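For example, dispatching the three blocks from the question onto this serial queue (a sketch, reusing the question's calculation instance) guarantees both the start order and the completion order:

serialQueue.async {
    print("Starting async1")
    calculation.performCalculation(10000000, tag: "async1")
}
serialQueue.async {
    // will not even start until the async1 block above has finished
    print("Starting async2")
    calculation.performCalculation(1000, tag: "async2")
}
serialQueue.async {
    print("Starting async3")
    calculation.performCalculation(100000, tag: "async3")
}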
"I don't understand why the blocks don't start in FIFO order" How do you know they don't? They do start in FIFO order!
The problem is that you have no way to test that. The notion of testing it is, in fact, incoherent. The soonest you can test anything is the first line of each block — and by that time, it is perfectly legal for another line of code from another block to execute, because these blocks are asynchronous. That is what asynchronous means.
So, they start in FIFO order, but there is no guarantee about the order in which, given multiple asynchronous blocks, their first lines will be executed.
With a concurrent queue, you are effectively specifying that the blocks can run at the same time. So while they're added in FIFO manner, you have a race condition between these various worker threads, and thus you have no assurance which will hit its respective print statement first.
So, this raises the question: why do you care which order they hit their respective print statements in? If order is really important, you shouldn't be using a concurrent queue. Or, to put it the other way around, if you want to use a concurrent queue, write code that isn't dependent upon the order in which the blocks run.
You asked:
Would you suggest some way to get the info when a Task is dequeued from the queue so that I can log it to get the FIFO order.
If you're asking how to enjoy FIFO starting of tasks on a concurrent queue in a real-world app, the answer is "you don't", because of the aforementioned race condition. When using concurrent queues, never write code that is strictly dependent upon the FIFO behavior.
If you're asking how to verify this empirically for purely theoretical purposes, just do something that ties up the CPUs and frees them up one by one:
import Foundation
import QuartzCore   // CACurrentMediaTime()
import os.log       // os_log

// utility function to spin for a certain amount of time
func spin(for seconds: TimeInterval, message: String) {
    let start = CACurrentMediaTime()
    while CACurrentMediaTime() - start < seconds { }
    os_log("%@", message)
}

// my concurrent queue (any label string will do)
let queue = DispatchQueue(label: "demo.concurrent", attributes: .concurrent)

// just something to occupy the CPUs, with varying
// lengths of time; don't worry about these re FIFO behavior
for i in 0 ..< 20 {
    queue.async {
        spin(for: 2 + Double(i) / 2, message: "\(i)")
    }
}

// Now, add three tasks on the concurrent queue, demonstrating FIFO
queue.async {
    os_log(" 1 start")
    spin(for: 2, message: " 1 stop")
}
queue.async {
    os_log(" 2 start")
    spin(for: 2, message: " 2 stop")
}
queue.async {
    os_log(" 3 start")
    spin(for: 2, message: " 3 stop")
}
You'll be able to see those last three tasks are run in FIFO order.
The other approach, if you want to confirm precisely what GCD is doing, is to refer to the libdispatch source code. It's admittedly pretty dense code, so it's not exactly obvious, but it's something you can dig into if you're feeling ambitious.
In the answer to the question on Stack and in the book here on page 52, I found that the usual getTickCount / getTickFrequency combination for measuring execution time gives the time in milliseconds. However, the OpenCV website says it is the time in seconds. I am confused. Please help...
There is no room for confusion; all the references you have given point to the same thing.
getTickCount gives you the number of clock cycles after a certain event, e.g. after the machine is switched on.
A = getTickCount() // A = no. of clock cycles from beginning, say 100
process(image) // do whatever process you want
B = getTickCount() // B = no. of clock cycles from beginning, say 150
C = B - A // C = no. of clock cycles for processing, 150-100 = 50,
// it is obvious, right?
Now you want to know how many seconds these clock cycles correspond to. For that, you need to know how long a single clock cycle takes, i.e. the clock_time_period. If you find that, simply multiply it by 50 to get the total time taken.
For that, OpenCV gives a second function, getTickFrequency(). It gives you the frequency, i.e. how many clock cycles there are per second. Take its reciprocal to get the time period of the clock:
time_period = 1 / frequency
Now that you have the time_period of one clock cycle, multiply it by 50 to get the total time taken in seconds.
Now read all those references you have given once again; you will get it.
dwStartTimer = GetTickCount();
dwEndTimer = GetTickCount();
while ((dwEndTimer - dwStartTimer) < wDelay) // delay is 5000 milliseconds
{
    Sleep(200);
    dwEndTimer = GetTickCount();
    if (PeekMessage(&uMsg, NULL, 0, 0, PM_REMOVE) > 0)
    {
        TranslateMessage(&uMsg);
        DispatchMessage(&uMsg);
    }
}
I have this code:
for i in 1 .. 10 do
    let (tree, interval) = time (fun () -> insert [12.; 6. + 1.0] exampletree 128.)
    printfn "insertion time: %A" interval.TotalMilliseconds
    ()
with the time function defined as
let time f =
    let start = DateTime.Now
    let res = f ()
    let finish = DateTime.Now
    (res, finish - start)
The function insert is not relevant here, other than the fact that it doesn't employ mutation and thus returns the same value every time.
I get the results:
insertion time: 218.75
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
The question is: why does the code calculate the result only once (judging from the insertion times; the result is always correct and equal)? Also, how can I force the program to do the computation multiple times (I need that for profiling purposes)?
Edit: Jared has supplied the right answer. Now that I know what to look for, I can get the stopwatch code from a timeit function for F#.
I had the following results:
insertion time: 243.4247
insertion time: 0.0768
insertion time: 0.0636
insertion time: 0.0617
insertion time: 0.065
insertion time: 0.0564
insertion time: 0.062
insertion time: 0.069
insertion time: 0.0656
insertion time: 0.0553
F# does not do automatic memoization of your functions. In this case memoization would be incorrect: even though you don't mutate items directly, you are accessing a mutable value (DateTime.Now) from within your function. Memoizing that, or a function accessing it, would be a mistake, since it can change from call to call.
What you're seeing here is an effect of the .NET JIT. The first time this is run, the function f() is JIT'd, which produces a noticeable delay. The other times it's already JIT'd, and it executes in a time which is smaller than the granularity of DateTime.
One way to prove this is to use a more granular measuring class like Stopwatch. This will show that the function does execute every time.
The first timing is probably due to JIT compilation. The actual code you're timing probably runs in less time than DateTime is able to measure.
Edit: Beaten by 18 secs... I'm just glad I had the right idea :)