What are Tasks and Frames in Real-Time?

I am trying to understand the difference between tasks and frames in a real-time system. If my understanding is correct, a task is essentially a collection of threads that need to run at a specific rate. For example, I might have task A with 10 threads, and I need to repeat the task every 30 ms (i.e. I need to finish running all 10 threads within 30 ms). Also, if I cannot finish running everything within 30 ms, task 'A' is "overrunning".
In relation to this, what is a frame in real-time and how does it fit in with tasks?

I found out that "passes" are often referred to as "frames", where each pass corresponds to the rate at which the scheduler runs each task.
e.g. if my system demands a 100 Hz base rate:
TASKS    RATE (Hz)   FRAMES (PASSES)
-----    ---------   ---------------
TASK1    100         1
TASK2    50          2
TASK3    25          4
TASK4    12.5        8
TASK5    12.5        8
A 100 Hz base rate can be divided into:
2 passes (each 50 Hz)
4 passes (each 25 Hz)
8 passes (each 12.5 Hz)
16 passes (each 6.25 Hz)
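To make the frame/pass idea concrete, here is a minimal cyclic-executive sketch in Python (illustrative only; a real system would use an RTOS scheduler, and the task bodies are placeholders, with rates taken from the first four tasks in the table above):

import time

BASE_RATE_HZ = 100
MINOR_FRAME = 1.0 / BASE_RATE_HZ      # 10 ms pass at the 100 Hz base rate

# task name -> number of passes between runs (passes = base rate / task rate)
TASKS = {
    "TASK1": 1,   # 100 Hz
    "TASK2": 2,   # 50 Hz
    "TASK3": 4,   # 25 Hz
    "TASK4": 8,   # 12.5 Hz
}

def run(task):
    """Placeholder for the task's threads; all work must finish within the frame."""
    pass

frame = 0
while True:
    start = time.monotonic()
    for name, passes in TASKS.items():
        if frame % passes == 0:
            run(name)
    frame += 1
    # Sleep out the remainder of the minor frame; if we are already past it,
    # the frame has overrun (the situation described for task A above).
    remaining = MINOR_FRAME - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    else:
        print("frame %d overran by %.2f ms" % (frame, -remaining * 1000))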


Is synchronised looping supported for AKPlayers whose durations are multiples of each other?

I'd like to know whether synchronised looping is supported for AKPlayers whose durations are multiples of one another.
It seems that it is not supported, or, if it is intended to work, this is a bug. I found a similar report here (How to use the loop if the track was not started from the beginning (with buffering type = .always in AKPlayer)), where I thought I was providing a solution, but after plenty of tests I found that the suggested solution does not work either. See attachment (*).
I've planned to record some loops whose duration is the same as, or a multiple of, the smallest loop. At first I found that synchronization failed when trying to start .play for several AKPlayers at the same AVAudioTime start point. After a few attempts, I fixed that by sticking to buffering .always, among other things such as the .prepare method. So, hopefully, that's out of the way...
The problem is that I expect to hear a bunch of loops playing synchronously, even if some are 2x or 4x longer in duration...
So while expecting to have looping work for the main requirement where:
- Loop1 of duration 2.5 [looping]
- Loop2 of duration 2.5 [looping]
- Loop3 of duration 5 [looping]
I noticed that Loop3 behaves badly: its last half repeats. Say the loops are in 4/4; looking at the beat numbers, we'd hear the following:
- Loop1: 1 2 3 4, 1 2 3 4, 1 2 3 4, 1 2 3 4
- Loop2: 1 2 3 4, 1 2 3 4, 1 2 3 4, 1 2 3 4
- Loop3: 1 2 3 4 5 6 7 8, 5 6 7 8, 5 6 7 8
Is this expected to fail? Is looping separate players whose durations are multiples of each other a supported feature?
After a few more tests, I found that this happens after adding a third track. For example:
- Loop1: 1 2 3 4
- Loop2: 1 2 3 4 5 6 7 8
This seems to work fine so far, but now I add a new track:
Loop1: 1 2 3 4
Loop2: 1 2 3 4 5 6 7 8
Loop3: 1 2 3 4
And what I hear is:
Loop1: 1 2 3 4 1 2 3 4 1 2 3 4
Loop2: 1 2 3 4 1 2 3 4 5 6 7 8
Loop3: 1 2 3 4 1 2 3 4 1 2 3 4
I'd try AKClipRecorder, but I just found that I need to declare the length ahead of recording time, which breaks the main requirement :)
(*) Audio file exposing the issue; this test was done with AKWaveTable but it seems to be the same problem. I'll look into rewriting some code that is easier to share, to see whether it's related to my implementation, but there's the link I've shared at the top, where someone else reports the same problem.
https://drive.google.com/open?id=1zxIJgFFvTwGsve11RFpc-_Z94gEEzql7
I believe I've found the problem, and it is related to scheduling the play start time for newer loops.
Before, I'd record a loop and then play it at the currentTime reported by a master player. The problem with that is the startTime the player then holds in its state, which, as far as I can tell, is fixed once read. It will always be more or less the end point of the master loop, which is the midpoint of a recorded loop that happens to be twice the length (or another multiple) of the master loop.
To solve this I've scheduled the player items differently, as follows:
// Loop over the whole recorded file instead of starting from the master's current position
player.startTime = 0
player.endTime = audioFile.duration
// Delay playback until the start of the master loop's next bar (4 beats here)
let offsetCurrentTime = (beatLength * 4.0) - currentTime
player.play(at: AVAudioTime.now() + offsetCurrentTime)
.startTime defines the loop's start point, and I've set the file's duration as the .endTime. Finally, I compute the time remaining in the master bar (the master loop I use as a reference, or looper clock) and pass it to the play method. In other words, I'm scheduling playback to begin at the startTime rather than from the currentTime, which would cause the issues I described above.
To summarize: use the at parameter of the .play method to schedule when the loop starts from its starting point, NOT from the current time of the loop that is already playing.

Simulating loops in Google Spreadsheet using built-in formulas

I have the following columns in Google Sheets:
Equipments    Amount     Equipment 1   Equipment 2
----------    ------     -----------   -----------
Equipment 1   2          Process 1     Process 3
Equipment 2   3          Process 2     Process 4
                                       Process 5
I need to produce equipment 1 x2, and equipment 2 x3.
When the equipment is produced, Process 1 is executed 2 times, Process 2 2 times, Process 3 3 times, Process 4 3 times, and Process 5 3 times.
So I need to generate such list:
Process 1
Process 1
Process 2
Process 2
Process 3
Process 3
Process 3
Process 4
Process 4
Process 4
Process 5
Process 5
Process 5
Of course, I want the formula to be dynamic (e.g. so I can add another equipment or change the processes of a particular equipment).
Single-list REPT:
=TRANSPOSE(SPLIT(JOIN(",",FILTER(REPT(C2:C&",",B2),C2:C<>"")),","))
Multi-list REPT:
=TRANSPOSE(SPLIT(JOIN(",",FILTER(REPT(C2:C&",",VLOOKUP(D2:D,A:B,2,)),C2:C<>"")),","))
There is no easy way to solve your problem with formulas.
I would strongly suggest you write a script. It's easier than you think. You can even record an action, and then see the code you need to reproduce the action.
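If you do go the script route, the logic itself is simple. The sketch below is plain Python rather than Apps Script (which Google Sheets scripts actually use), purely to illustrate what the expansion does; the dictionaries mirror the example columns above:

# Repeat each equipment's processes by that equipment's amount, then concatenate.
amounts = {"Equipment 1": 2, "Equipment 2": 3}
processes = {
    "Equipment 1": ["Process 1", "Process 2"],
    "Equipment 2": ["Process 3", "Process 4", "Process 5"],
}

expanded = []
for equipment, procs in processes.items():
    for proc in procs:
        expanded.extend([proc] * amounts[equipment])

print("\n".join(expanded))   # Process 1, Process 1, Process 2, Process 2, Process 3, ...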

How to reduce IPython parallel memory usage

I'm using IPython parallel in an optimisation algorithm that loops a large number of times. Parallelism is invoked in the loop using the map method of a LoadBalancedView (twice), a DirectView's dictionary interface, and an invocation of a %px magic. I'm running the algorithm in an IPython notebook.
I find that the memory consumed by both the kernel running the algorithm and one of the controllers increases steadily over time, limiting the number of loops I can execute (since available memory is limited).
Using heapy, I profiled memory use after a run of about 38 thousand loops:
Partition of a set of 98385344 objects. Total size = 18016840352 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 5059553 5 9269101096 51 9269101096 51 IPython.parallel.client.client.Metadata
1 19795077 20 2915510312 16 12184611408 68 list
2 24030949 24 1641114880 9 13825726288 77 str
3 5062764 5 1424092704 8 15249818992 85 dict (no owner)
4 20238219 21 971434512 5 16221253504 90 datetime.datetime
5 401177 0 426782056 2 16648035560 92 scipy.optimize.optimize.OptimizeResult
6 3 0 402654816 2 17050690376 95 collections.defaultdict
7 4359721 4 323814160 2 17374504536 96 tuple
8 8166865 8 196004760 1 17570509296 98 numpy.float64
9 5488027 6 131712648 1 17702221944 98 int
<1582 more rows. Type e.g. '_.more' to view.>
You can see that about half the memory is used by IPython.parallel.client.client.Metadata instances. A good indicator that results from the map invocations are being cached is the 401177 OptimizeResult instances, the same number as the number of optimize invocations via lbview.map - I am not caching them in my code.
Is there a way I can control this memory usage, on both the kernel and the IPython parallel controller (whose memory consumption is comparable to the kernel's)?
IPython parallel clients and controllers store results and other metadata from past transactions.
The IPython.parallel.Client class provides a method for clearing this data:
Client.purge_everything()
documented here. There are also purge_results() and purge_local_results() methods that give you some control over what gets purged.
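Here is a hedged sketch of how that might be used inside a long optimisation loop; the objective, candidates and purge interval are placeholders, not part of the original question:

from IPython.parallel import Client

rc = Client()                        # connect to the running ipcluster
lbview = rc.load_balanced_view()

def objective(x):                    # placeholder for the real optimisation step
    return x * x

candidates = range(100)              # placeholder work items
n_iterations = 38000

for i in range(n_iterations):
    results = lbview.map(objective, candidates, block=True)
    # ... consume `results`; keep only what you still need ...
    if i % 1000 == 999:
        # Drop cached results and metadata on both the client and the hub,
        # which is where the Metadata/OptimizeResult instances accumulate.
        rc.purge_everything()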

optimize hive query for multitable join

INSERT OVERWRITE TABLE result
SELECT /*+ STREAMTABLE(product) */
i.IMAGE_ID,
p.PRODUCT_NO,
p.STORE_NO,
p.PRODUCT_CAT_NO,
p.CAPTION,
p.PRODUCT_DESC,
p.IMAGE1_ID,
p.IMAGE2_ID,
s.STORE_ID,
s.STORE_NAME,
p.CREATE_DATE,
CASE WHEN custImg.IMAGE_ID is NULL THEN 0 ELSE 1 END,
CASE WHEN custImg1.IMAGE_ID is NULL THEN 0 ELSE 1 END,
CASE WHEN custImg2.IMAGE_ID is NULL THEN 0 ELSE 1 END
FROM image i
JOIN PRODUCT p ON i.IMAGE_ID = p.IMAGE1_ID
JOIN PRODUCT_CAT pcat ON p.PRODUCT_CAT_NO = pcat.PRODUCT_CAT_NO
JOIN STORE s ON p.STORE_NO = s.STORE_NO
JOIN STOCK_INFO si ON si.STOCK_INFO_ID = pcat.STOCK_INFO_ID
LEFT OUTER JOIN CUSTOMIZABLE_IMAGE custImg ON i.IMAGE_ID = custImg.IMAGE_ID
LEFT OUTER JOIN CUSTOMIZABLE_IMAGE custImg1 ON p.IMAGE1_ID = custImg1.IMAGE_ID
LEFT OUTER JOIN CUSTOMIZABLE_IMAGE custImg2 ON p.IMAGE2_ID = custImg2.IMAGE_ID;
I have a join query where I am joining huge tables, and I am trying to optimize this Hive query. Here are some facts about the tables:
image table has 60m rows,
product table has 1b rows,
product_cat has 1000 rows,
store has 1m rows,
stock_info has 100 rows,
customizable_image has 200k rows.
A product can have one or two images (image1 and image2), and product-level information is stored only in the product table. I tried moving the join with product to the bottom, but I couldn't, as all the following joins require data from the product table.
Here is what I have tried so far:
1. I gave Hive the hint to stream the product table, as it is the biggest one.
2. I bucketed the table (during create table) into 256 buckets (on image_id) and then did the join - it didn't give me any significant performance gain.
3. I changed the input format from text file (gzip files) to sequence file, so that it is splittable and more mappers can run if Hive wants to run more mappers.
Here are some key logs from the Hive console. I ran this Hive query on AWS. Can anyone help me understand the primary bottleneck here? This job is only processing a subset of the actual data.
Stage-14 is selected by condition resolver.
Launching Job 1 out of 11
Number of reduce tasks not specified. Estimated from input data size: 22
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Kill Command = /home/hadoop/bin/hadoop job -kill job_201403242034_0001
Hadoop job information for Stage-14: number of mappers: 341; number of reducers: 22
2014-03-24 20:55:05,709 Stage-14 map = 0%, reduce = 0%
.
2014-03-24 23:26:32,064 Stage-14 map = 100%, reduce = 100%, Cumulative CPU 34198.12 sec
MapReduce Total cumulative CPU time: 0 days 9 hours 29 minutes 58 seconds 120 msec
.
2014-03-25 00:33:39,702 Stage-30 map = 100%, reduce = 100%, Cumulative CPU 20879.69 sec
MapReduce Total cumulative CPU time: 0 days 5 hours 47 minutes 59 seconds 690 msec
.
2014-03-26 04:15:25,809 Stage-14 map = 100%, reduce = 100%, Cumulative CPU 3903.4 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 5 minutes 3 seconds 400 msec
.
2014-03-26 04:25:05,892 Stage-30 map = 100%, reduce = 100%, Cumulative CPU 2707.34 sec
MapReduce Total cumulative CPU time: 45 minutes 7 seconds 340 msec
.
2014-03-26 04:45:56,465 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3901.99 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 5 minutes 1 seconds 990 msec
.
2014-03-26 04:54:56,061 Stage-26 map = 100%, reduce = 100%, Cumulative CPU 2388.71 sec
MapReduce Total cumulative CPU time: 39 minutes 48 seconds 710 msec
.
2014-03-26 05:12:35,541 Stage-4 map = 100%, reduce = 100%, Cumulative CPU 3792.5 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 3 minutes 12 seconds 500 msec
.
2014-03-26 05:34:21,967 Stage-5 map = 100%, reduce = 100%, Cumulative CPU 4432.22 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 13 minutes 52 seconds 220 msec
.
2014-03-26 05:54:43,928 Stage-21 map = 100%, reduce = 100%, Cumulative CPU 6052.96 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 40 minutes 52 seconds 960 msec
MapReduce Jobs Launched:
Job 0: Map: 59 Reduce: 18 Cumulative CPU: 3903.4 sec HDFS Read: 37387 HDFS Write: 12658668325 SUCCESS
Job 1: Map: 48 Cumulative CPU: 2707.34 sec HDFS Read: 12658908810 HDFS Write: 9321506973 SUCCESS
Job 2: Map: 29 Reduce: 10 Cumulative CPU: 3901.99 sec HDFS Read: 9321641955 HDFS Write: 11079251576 SUCCESS
Job 3: Map: 42 Cumulative CPU: 2388.71 sec HDFS Read: 11079470178 HDFS Write: 10932264824 SUCCESS
Job 4: Map: 42 Reduce: 12 Cumulative CPU: 3792.5 sec HDFS Read: 10932405443 HDFS Write: 11812454443 SUCCESS
Job 5: Map: 45 Reduce: 13 Cumulative CPU: 4432.22 sec HDFS Read: 11812679475 HDFS Write: 11815458945 SUCCESS
Job 6: Map: 42 Cumulative CPU: 6052.96 sec HDFS Read: 11815691155 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 days 7 hours 32 minutes 59 seconds 120 msec
OK
The query is still taking longer than 5 hours in Hive, whereas in an RDBMS it takes only 5 hrs. I need some help optimizing this query so that it executes much faster. Interestingly, when I ran the job with 4 large core instances, the time improved by only 10 minutes compared to the run with 3 large core instances, but when I ran it with 3 medium core instances, it took 1 hr 10 min longer.
This brings me to the question: "Is Hive even the right choice for such complex joins?"
I suspect the bottleneck is just in sorting your product table, since it seems much larger than the others. I think joins with Hive for tables over a certain size become untenable, simply because they require a sort.
There are parameters to optimize sorting, like io.sort.mb, which you can try setting so that more sorting occurs in memory rather than spilling to disk, re-reading and re-sorting. Look at the number of spilled records and see if it is much larger than your inputs. There are a variety of ways to optimize sorting. It might also help to break your query up into multiple subqueries so it doesn't have to sort as much at one time.
The stock_info and product_cat tables are so small that you could probably keep them in memory (check out the 'distributed_map' UDF in Brickhouse: https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/dcache/DistributedMapUDF.java). For the customizable images, you might be able to use a Bloom filter, if having a few false positives is not a big problem.
To completely remove the join, perhaps you could store the image info in a key-value store like HBase and do lookups instead. Brickhouse also has UDFs for HBase, like hbase_get and hbase_cached_get.
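For what it's worth, here is a conceptual sketch of the Bloom-filter idea in Python (not a Hive UDF; the sizes and ids are made up) showing why membership tests can return false positives but never false negatives:

import hashlib

M = 1 << 20                      # number of bits (tune for the ~200k customizable ids)
K = 4                            # number of hash functions
bits = bytearray(M // 8)

def _positions(key):
    for i in range(K):
        digest = hashlib.md5(("%d:%s" % (i, key)).encode()).hexdigest()
        yield int(digest, 16) % M

def add(key):
    for p in _positions(key):
        bits[p // 8] |= 1 << (p % 8)

def might_contain(key):
    # True may be a false positive; False means definitely not in the set.
    return all(bits[p // 8] & (1 << (p % 8)) for p in _positions(key))

for image_id in ("img-1", "img-2"):      # hypothetical customizable image ids
    add(image_id)
print(might_contain("img-1"), might_contain("img-999"))   # True, almost surely False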

reducing jitter of serial ntp refclock

I am currently trying to connect my DIY DCF77 clock to ntpd (using Ubuntu). I followed the instructions here: http://wiki.ubuntuusers.de/Systemzeit.
With ntpq I can see the DCF77 clock
~$ ntpq -c peers
remote refid st t when poll reach delay offset jitter
==============================================================================
+dispatch.mxjs.d 192.53.103.104 2 u 6 64 377 13.380 12.608 4.663
+main.macht.org 192.53.103.108 2 u 12 64 377 33.167 5.008 4.769
+alvo.fungus.at 91.195.238.4 3 u 15 64 377 16.949 7.454 28.075
-ns1.blazing.de 213.172.96.14 2 u - 64 377 10.072 14.170 2.335
*GENERIC(0) .DCFa. 0 l 31 64 377 0.000 5.362 4.621
LOCAL(0) .LOCL. 12 l 927 64 0 0.000 0.000 0.000
So far this looks OK. However, I have two questions.
What exactly is the sign of the offset? Is .DCFa. ahead of the system clock or behind the system clock?
.DCFa. points to refclock 0, which is a DIY DCF77 clock emulating a Meinberg clock. It is connected to my Ubuntu Linux box with an FTDI USB-serial adapter running at 9600 7E2. I verified with a DSO that it emits the time with jitter significantly below 1 ms, so I assume the jitter is introduced by either the FTDI adapter or the kernel. How would I find out, and how can I reduce it?
Part One:
Positive offsets indicate time in the client is behind time on the server.
Negative offsets indicate that time in the client is ahead of time on the server.
I always remember this as "what needs to happen to my clock?"
+0.123 = Add 0.123 to me
-0.123 = Subtract 0.123 from me
Part Two:
Yes, USB serial converters add jitter. Get a real serial port. :) You can also use setserial and tell it that the serial port needs to be low_latency. Just apt-get setserial.
Bonus Points:
Lose the unreferenced local clock entry. NO LOCL!!!!
