Keep Gcov Test name in GCDA files - gcov

After measuring test coverage on my product using lcov (C++ development), I'd like to build a matrix showing the correspondence between each test and the files it covers.
The idea is to get a quick view of the code covered by a single test file, e.g.:
| xxxx   | file 1 | file 2 | file 3 | file 4 | file 5 |
|--------|--------|--------|--------|--------|--------|
| test 1 | YES    | NO     | YES    | YES    | YES    |
| test 2 | YES    | NO     | NO     | NO     | NO     |
| test 3 | YES    | YES    | NO     | NO     | YES    |
In my project, I need to run thousands of tests to check the coverage of thousands of files, so the matrix will be huge.
Unfortunately, it seems that by design gcov does not work this way: there is only one set of .gcda files covering the whole code base, and it does not seem possible to determine which test covers which part of the code.
The only solution I could imagine is the following one:
for current_test in all_tests; do
    run "$current_test"                      # run a single test
    # retrieve the .gcda files and convert them into a .info file (lcov --capture)
    # extract the names of the covered source files from the .info file
    # append current_test / source filename pairs to the matrix
done
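For the extraction step, here is a small sketch assuming lcov's tracefile format (SF: records for source files, DA:<line>,<hits> records for line counts); the file name current.info is a placeholder:
# Sketch: list the source files that have at least one executed line in a
# single-test .info tracefile (lcov format).
def covered_files(info_path):
    covered, current_file, hit = set(), None, False
    with open(info_path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('SF:'):
                current_file, hit = line[3:], False
            elif line.startswith('DA:'):
                hit = hit or int(line.split(',')[1]) > 0
            elif line == 'end_of_record':
                if hit:
                    covered.add(current_file)
    return covered

print(sorted(covered_files('current.info')))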
The problem is that this approach would take extremely long: around 5 minutes per test, so I would be waiting for weeks...
Any idea would be very welcome.
Thanks a lot for your help.
Regards,
Thomas

Unfortunately the gcov data does not include test names, and they must be added in post-processing. Therefore, your sequential loop is the sensible approach if you stay within gcov-based coverage collection.
Workarounds you can try:
Run your tests with an appropriate GCOV_PREFIX variable so that the coverage is written into a different directory, rather than next to your object files.
Use a different coverage tool. E.g. kcov performs runtime instrumentation and writes the coverage results into a directory you specify. However, the coverage data formats are not usable for gcov-based tools.
Distribute your tests across multiple machines.
My guess is that GCOV_PREFIX is likely to work in your scenario so that you can easily run your tests in parallel. This variable is a bit fiddly because you need to know the absolute paths of your object files, but it's probably easier to figure that out than it is to wait multiple days for your coverage matrix.
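For the parallel route, here is a minimal sketch, assuming one test executable per entry under a tests/ directory and lcov for post-processing (both are assumptions about your setup): each test runs with its own GCOV_PREFIX, so the .gcda files of concurrent runs land in separate directories instead of clobbering each other.
# Minimal sketch, not a drop-in solution: run each test with its own
# GCOV_PREFIX so parallel runs write .gcda files into separate directories.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

tests = sorted(Path('tests').glob('test_*'))          # placeholder test discovery

def run_with_own_coverage(test):
    cov_dir = (Path('coverage') / test.name).resolve()
    cov_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ,
               GCOV_PREFIX=str(cov_dir),              # .gcda files are written here
               GCOV_PREFIX_STRIP='0')                 # keep the full object path below the prefix
    subprocess.run([str(test)], env=env, check=True)
    # Post-processing (per test): the .gcno files still sit next to the object
    # files, so make them reachable (copy or symlink them into cov_dir) before
    # capturing, e.g. lcov --capture --directory ... --output-file coverage/<test>.info

with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    list(pool.map(run_with_own_coverage, tests))
Since each worker just blocks on a subprocess, a thread pool is enough here; the fiddly part, as noted above, is making the .gcno files and the absolute object paths line up for the capture step.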

Related

Algorithms for correlation of events/issues

We are working on a system that aims to help development teams, SREs, and DevOps team members by debugging many of the well-known infrastructure issues (k8s to begin with) on their behalf and generating a detailed report that covers the specifics of the issue, possible root causes, and clear next steps for the users facing the problem. In short, instead of you having to open a terminal and run several commands to pin down an issue, a system does it for you and shows it in a neat UI. We plan to leverage AI to provide better user experiences.
Questions:
1. There are several potential use cases, like predictive analytics, anomaly detection, forecasting, etc. We will not analyze application logs or metrics (metrics may be added in the future). Unlike application-level logs, the platform logs are more uniform. What is a good starting point for AI usage, especially for platform logs?
2. We plan to use AI to analyze issue correlations; we ran Apyori and FP-Growth and got output that looks like this:
| antecedent | consequent | confidence | lift |
|----------------------------|-------------------| ---------- | ---- |
| [Failed, FailedScheduling] | [BackOff] | 0.75 | 5.43 |
| [NotTriggerScaleUp] | [FailedScheduling]| 0.64 | 7.29 |
| [Failed] | [BackOff] | 0.52 | 3.82 |
| [FailedCreatePodSandBox] | [FailedScheduling]| 0.51 | 5.88 |
FP-Growth is a data mining algorithm, and from its output we can identify patterns of events. One potential use case is to save the previous output and compare it with the latest output to detect abnormal patterns (a small sketch of such a comparison follows below). Can we use the output to infer issue correlations, and are there other scenarios where the output is useful?
3. Some logs seem irrelevant but are actually connected: for example, when one host has an issue it impacts the applications running on it, possibly over a long time span. How can we figure out this kind of relationship?
Any comments and suggestions will be greatly appreciated, thank you in advance.
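Regarding the comparison idea in question 2, here is a small sketch under assumed column names matching the table above; the numbers are made up for illustration. It flags rules that are new, rules that disappeared, and rules whose confidence drifted between two mining runs.
import pandas as pd

# Two rule sets in the shape produced by Apyori / FP-Growth style mining;
# the values below are illustrative, not real measurements.
columns = ['antecedent', 'consequent', 'confidence', 'lift']
previous = pd.DataFrame([
    (('Failed', 'FailedScheduling'), ('BackOff',), 0.75, 5.43),
    (('NotTriggerScaleUp',), ('FailedScheduling',), 0.64, 7.29),
], columns=columns)
latest = pd.DataFrame([
    (('Failed', 'FailedScheduling'), ('BackOff',), 0.41, 2.10),
    (('FailedCreatePodSandBox',), ('FailedScheduling',), 0.51, 5.88),
], columns=columns)

merged = previous.merge(latest, on=['antecedent', 'consequent'],
                        how='outer', suffixes=('_prev', '_new'), indicator=True)

new_rules = merged[merged['_merge'] == 'right_only']        # appeared in the latest run
vanished  = merged[merged['_merge'] == 'left_only']         # no longer mined
drifted   = merged[(merged['_merge'] == 'both') &
                   ((merged['confidence_new'] - merged['confidence_prev']).abs() > 0.2)]
print(new_rules, vanished, drifted, sep='\n\n')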

Optimizing repeated transformations in Apache Beam/DataFlow

I wonder if Apache Beam / Google Dataflow is smart enough to recognize repeated transformations in the dataflow graph and run them only once. For example, if I have 2 branches:
p | GroupByKey() | FlatMap(...)
p | combiners.Top.PerKey(...) | FlatMap(...)
both will involve grouping elements by key under the hood. Will the execution engine recognize that GroupByKey() has the same input in both cases and run it only once? Or do I need to manually ensure that a single GroupByKey() precedes all branches where its result gets used?
As you may have inferred, this behavior is runner-dependent. Each runner implements its own optimization logic.
The Dataflow Runner does not currently support this optimization.
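If you take the manual route, the usual pattern is to apply the grouping once, keep a reference to its output, and fan out from there. A minimal sketch (transform names and data are illustrative):
import apache_beam as beam

with beam.Pipeline() as p:
    keyed = p | 'Create' >> beam.Create([('a', 1), ('a', 2), ('b', 3)])
    grouped = keyed | 'GroupOnce' >> beam.GroupByKey()      # executed once

    # Both branches reuse the already-grouped PCollection.
    totals = grouped | 'Sum'   >> beam.Map(lambda kv: (kv[0], sum(kv[1])))
    counts = grouped | 'Count' >> beam.Map(lambda kv: (kv[0], len(list(kv[1]))))
Note that this only helps when the branches genuinely share the same grouping; combiners.Top.PerKey is a combiner and is typically cheaper than a full GroupByKey, so routing it through a shared GroupByKey is not automatically a win.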

How to analyse z3 performance issues?

I have 37 similar SMT2 problems, each in two equisatisfiable versions that I call compact and unrolled. The problems use incremental SMT solving, and every (check-sat) returns unsat. The compact versions use the QF_AUFBV logic; the unrolled versions use QF_ABV. I ran them in z3, Yices, and Boolector (but Boolector only supports the unrolled versions). The results of this performance evaluation can be found here:
http://scratch.clifford.at/compact_smt2_enc_r1102.html
The SMT2 files for these examples can be downloaded from here (~10 MB):
http://scratch.clifford.at/compact_smt2_enc_r1102.zip
I ran each solver 5 times with different values for the :random-seed option (except Boolector, which does not support :random-seed, so I simply ran Boolector 5 times on the same input). The variation I get when running the solvers with different :random-seed values is relatively small (see the +/- values in the table for the max outlier).
There is a wide spread between solvers. Boolector and Yices are consistently faster than z3, in some cases up to two orders of magnitude.
However, my question is about "z3 vs z3" performance. Consider for example the following data points:
| Test Case | Z3 Median Runtime | Max Outlier |
|-------------------|-------------------|-------------|
| insn_add unrolled | 873.35 seconds | +/-  0% |
| insn_add compact | 1837.59 seconds | +/- 1% |
| insn_sub unrolled | 4395.67 seconds | +/- 16% |
| insn_sub compact | 2199.21 seconds | +/- 5% |
The problems insn_add and insn_sub are almost identical. Both are generated from Verilog using Yosys, and the only difference is that insn_add uses this Verilog module while insn_sub uses this one in its place. Here is the diff between those two source files:
--- insn_add.v 2017-01-31 15:20:47.395354732 +0100
+++ insn_sub.v 2017-01-31 15:20:47.395354732 +0100
@@ -1,6 +1,6 @@
// DO NOT EDIT -- auto-generated from generate.py
-module rvfi_insn_add (
+module rvfi_insn_sub (
input rvfi_valid,
input [ 32 - 1 : 0] rvfi_insn,
input [`RISCV_FORMAL_XLEN - 1 : 0] rvfi_pc_rdata,
@@ -29,9 +29,9 @@
wire [4:0] insn_rd = rvfi_insn[11: 7];
wire [6:0] insn_opcode = rvfi_insn[ 6: 0];
- // ADD instruction
- wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata + rvfi_rs2_rdata;
- assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0000000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
+ // SUB instruction
+ wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata - rvfi_rs2_rdata;
+ assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0100000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
assign spec_rs1_addr = insn_rs1;
assign spec_rs2_addr = insn_rs2;
assign spec_rd_addr = insn_rd;
But their behavior in this benchmark is very different: Overall the performance for insn_sub is much worse than the performance for insn_add. Furthermore, in the case of insn_add the unrolled version runs about twice as fast as the compact version, but in the case of insn_sub the compact version runs about twice as fast as the unrolled version.
Here are the individual times from which the medians were taken. The :random-seed setting obviously does not make much of a difference:
insn_add unrolled: 868.15 873.34 873.35 873.36 874.88
insn_add compact: 1828.70 1829.32 1837.59 1843.74 1867.13
insn_sub unrolled: 3204.06 4195.10 4395.67 4539.30 4596.05
insn_sub compact: 2003.26 2187.52 2199.21 2206.04 2209.87
Since the value of :random-seed does not seem to have much of an effect, I would assume there is something intrinsic to those .smt2 files that makes them fast or slow on z3. How would I investigate this? How would I find out what makes the fast cases fast and the slow cases slow, so I can avoid whatever makes the slow cases slow? (Yes, I know that this is a very broad question. Sorry. :)
<edit>
Here are some more concrete questions along the lines of my primary question. These questions are directly inspired by the obvious differences I can see between the (compact) insn_add and insn_sub benchmarks.
Can the order of (declare-..) and (define-..) statements in my SMT input influence performance?
Can changing the names of declared or defined functions influence performance?
If I split a BV into smaller BVs, and then concatenate them back again, can this influence performance?
If I either compare two BVs for equality, or split the BV into single bit variables and compare each of the bits individually, can this influence performance?
Also: what operations in z3 actually change when I choose a different value for :random-seed?
</edit>
Making small changes to the .smt2 files without changing the semantics can be very difficult for large test cases generated by complex tools. I'm hoping there are other things I can try first, or maybe there is some existing expert knowledge about the kind of changes that might be worth investigating. Or alternatively: What kind of changes would effectively be equivalent to changing :random-seed and thus are not worth investigating.
(Tests performed with git rev c67cf16, i.e. current git head of z3 on an AWS EC2 c4.8xlarge instance with make -j40. The runtimes are CPU seconds, not wall-clock seconds.)
Edit 2:
I have now three test cases (test1.smt2, test2.smt2, and test3.smt2) that are identical except that I've renamed some of the functions I declare/define. The test cases can be found at http://svn.clifford.at/handicraft/2017/z3perf/.
This is a variation of the original problem that takes ~2 minutes to solve instead of ~1 hour. As before, changing the value of :random-seed only has a marginal effect, but renaming some of the functions without changing anything else changes the runtime by more than 2x.
I've now opened an issue on GitHub, arguing that :random-seed should be tied to whatever it is that changes randomly inside z3 when I rename the functions in my SMT2 code.
As you say, there can be many things that may be creating that perf difference in add vs sub.
A good start is to check if the formulas after preprocessing are equal modulo add/sub (btw, Z3 converts 'a - b' into 'a + (-1) * b'). If not, then trace down which preprocessing step is at fault. Then trace down the problem and send us a patch :)
Alternatively, the problem could be down the line, e.g., in the bitblaster. You can also dump the bit-blasted formulas of both of your files and check if there is a significant difference in terms of number of variables and/or clauses.
Anyway, you'll need to be prepared to invest a day or two (maybe more) to track down these issues. If you find something, let us know and/or send us a patch! :)
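As a starting point for the first suggestion, here is a rough sketch using the z3 Python bindings (the file names are placeholders, and parse_smt2_file flattens the incremental push/pop structure into a plain list of assertions): dump the formula after the simplify preprocessing step for both versions and diff the two dumps. The same pattern with additional tactics covers the second suggestion (dumping the bit-blasted form), although the array terms in these logics have to be eliminated before a bit-blasting tactic can apply.
# Rough sketch: dump the goal after z3's 'simplify' tactic so that two
# benchmark variants can be compared with an external diff.
from z3 import Goal, Tactic, parse_smt2_file

def dump_preprocessed(smt2_path, out_path):
    g = Goal()
    g.add(parse_smt2_file(smt2_path))        # collect all assertions from the file
    simplified = Tactic('simplify')(g)[0]    # the single subgoal left after 'simplify'
    with open(out_path, 'w') as f:
        f.write(simplified.sexpr())

dump_preprocessed('insn_add.smt2', 'insn_add.pre')   # placeholder file names
dump_preprocessed('insn_sub.smt2', 'insn_sub.pre')
# Then compare the two dumps, e.g. with: diff insn_add.pre insn_sub.pre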

Find size contributed by each external library on iOS

I'm trying to reduce my App Store binary size, and we have lots of external libs that might be contributing to the size of the final ipa. Is there any way to find out how much each external static lib takes up in the final binary (other than removing each one in turn)?
All of this information is contained in the link map, if you have the patience for sifting through it (for large apps, it can be quite large). The link map has a listing of all the libraries, their object files, and all symbols that were packaged into your app, all in human-readable text. Normally, projects aren't configured to generate them by default, so you'll have to make a quick project file change.
From within Xcode:
Under 'Build Settings' for your target, search for "map"
In the results below, under the 'Linking' section, set 'Write Link Map File' to "Yes"
Make sure to make note of the full path and file name listed under 'Path to Link Map File'
The next time you build your app you'll get a link map dumped to that file path. Note that the path is relative to your app's location in the DerivedData folder (usually ~/Library/Developer/Xcode/DerivedData/<your-app-name>-<random-string-of-letters-and-numbers>/Build/Intermediates/..., but YMMV). Since it's just a text file, you can read it with any text editor.
The contents of the link map are divided into 3 sections, of which 2 will be relevant to what you're looking for:
Object Files: this section contains a listing of all of the object files included in your final app, including your own code and that of any third-party libraries you've included. Importantly, each object file also lists the library where it came from;
Sections: this section, not relevant to your question, contains a list of the processor segments and their sections;
Symbols: this section contains the raw data that you're interested in: a list of all symbols/methods with their absolute location (i.e. address in the processor's memory map), size, and most important of all, a cross-reference to their containing object module (under the 'File' column).
From this raw data, you have everything you need to do the required size calculation. From #1, you see that, for every library, there are N possible constituent object modules; from #2, you see that, for every object module, there are M possible symbols, each occupying size S. For any given library, then, your rough order of size will be something like O(N * M * S). That's only to give you an indication of the components that would go into your actual calculations, it's not any sort of a useful formula. To perform the calculation itself, I'm sorry to say that I'm not aware of any existing tools that will do the requisite processing for you, but given that the link map is just a text file, with a little script magic and ingenuity you can construct a script to do the heavy lifting.
For example, I have a little sample project that links to the following library: https://github.com/ColinEberhardt/LinqToObjectiveC (the sample project itself is from a nice tutorial on ReactiveCocoa, here: http://www.raywenderlich.com/62699/reactivecocoa-tutorial-pt1), and I want to know how much space it occupies. I've generated a link map, TwitterInstant-LinkMap-normal-x86_64.txt (it runs in the simulator). In order to find all object modules included by the library, I do this:
$ grep -i "libLinqToObjectiveC.a" TwitterInstant-LinkMap-normal-x86_64.txt
which gives me this:
[ 8] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(LinqToObjectiveC-dummy.o)
[ 9] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(NSArray+LinqExtensions.o)
[ 10] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(NSDictionary+LinqExtensions.o)
The first column contains the cross-references to the symbol table that I need, so I can search for those:
$ cat TwitterInstant-LinkMap-normal-x86_64.txt | grep -e "\[ 8\]"
which gives me:
0x100087161 0x0000001B [ 8] literal string: PodsDummy_LinqToObjectiveC
0x1000920B8 0x00000008 [ 8] anon
0x100093658 0x00000048 [ 8] l_OBJC_METACLASS_RO_$_PodsDummy_LinqToObjectiveC
0x1000936A0 0x00000048 [ 8] l_OBJC_CLASS_RO_$_PodsDummy_LinqToObjectiveC
0x10009F0A8 0x00000028 [ 8] _OBJC_METACLASS_$_PodsDummy_LinqToObjectiveC
0x10009F0D0 0x00000028 [ 8] _OBJC_CLASS_$_PodsDummy_LinqToObjectiveC
The second column contains the size of the symbol in question (in hexadecimal), so if I add them all up, I get 0x103, or 259 bytes.
Even better, I can do a bit of stream hacking to whittle it down to the essential elements and do the addition for me:
$ cat TwitterInstant-LinkMap-normal-x86_64.txt | grep -e "\[ 8\]" | grep -e "0x" | awk '{print $2}' | xargs printf "%d\n" | paste -sd+ - | bc
which gives me the number straight up:
259
Doing the same for "\[ 9\]" (13016 bytes) and "\[ 10\]" (5503 bytes), and adding them to the previous 259 bytes, gives me 18778 bytes.
You can certainly improve upon the stream hacking I've done here to make it a bit more robust (in this implementation, you have to make sure you get the exact number of spaces right and quote the brackets), but you at least get the idea.
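For instance, here is a rough sketch of such a script, not a polished tool: it maps each object-file index to the library (or standalone object file) it came from and sums the symbol sizes per library. The link map file name is the one from the example above.
# Rough sketch: sum symbol sizes in an Xcode link map per containing library.
import re
from collections import defaultdict

objfile_re = re.compile(r'^\[\s*(\d+)\]\s+(\S.*)$')                 # "[  8] /path/libFoo.a(Bar.o)"
symbol_re  = re.compile(r'^0x[0-9A-Fa-f]+\s+(0x[0-9A-Fa-f]+)\s+\[\s*(\d+)\]')

index_to_lib = {}
size_per_lib = defaultdict(int)

with open('TwitterInstant-LinkMap-normal-x86_64.txt', errors='replace') as f:
    for line in f:
        m = objfile_re.match(line)
        if m:                                           # Object Files section
            path = m.group(2)
            lib = path.split('(')[0].rsplit('/', 1)[-1]  # libFoo.a(Bar.o) -> libFoo.a
            index_to_lib[m.group(1)] = lib
            continue
        m = symbol_re.match(line)
        if m and m.group(2) in index_to_lib:            # Symbols section
            size_per_lib[index_to_lib[m.group(2)]] += int(m.group(1), 16)

for lib, size in sorted(size_per_lib.items(), key=lambda kv: -kv[1]):
    print(f'{size:>10}  {lib}')
This reports the uncompressed symbol sizes as seen by the linker, so treat the totals as an approximation of each library's contribution rather than the exact change in .ipa size.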
Make a .ipa file of your app and save it on your system.
Then open the terminal and execute the following command:
unzip -lv /path/to/your/app.ipa
It will return a table of data about your .ipa file. The size column has the compressed size of each file within your .ipa file.
I think you should be able to extract the information you need from this:
symbols -w -noSources YourFileHere
Ref: https://devforums.apple.com/message/926442#926442
IIRC, it isn't going to give you clear summary information on each lib, but you should find that the functions from each library are clustered together, so with a bit of effort you can calculate the approximate contribution from each lib.
Also make sure that you set Generate Debug Symbols to NO in your build settings. This can reduce the size of your static library by about 30%.
In case it's part of your concern, a static library is just the relevant .o files archived together plus some bookkeeping. So a 1.7mb static library — even if the code within it is the entire 1.7mb — won't usually add 1.7mb to your product. The usual rules about dead code stripping will apply.
Beyond that you can reduce the built size of your code. The following probably isn't a comprehensive list.
In your target's build settings look for 'Optimization Level'. By switching that to 'Fastest, Smallest -Os' you'll permit the compiler to sacrifice some speed for size.
Make sure you're building for thumb, the more compact ARM code. Assuming you're using LLVM that means making sure you don't have -mno-thumb anywhere in your project settings.
Also consider which architectures you want to build for. Apple doesn't allow submission of an app that supports both ARMv6 and the iPhone 5 screen and have dropped ARMv6 support entirely from the latest Xcode. So there's probably no point including that at this point.

Does awk store an output file in RAM?

I was doing some simple parsing like awk '{print $3 > "file1.txt"}'
What I noticed was that awk is taking up too much of RAM (the files were huge). Does streaming awk output to file consume memory? Does this work like stream write or does the file remain open till the program terminates?
The exact command that I gave was:
for i in ../../*.txt; do j=${i#*/}; mawk -v f=${j%.txt} '{if(NR%8<=4 && NR%8!=0){print >f"_1.txt" } else{print >f"_2.txt"}}' $i & done
As is evident, I used mawk. The five input files were around 6 GB each, and when I ran top I saw about 22% of memory (~5 GB) being taken up by each mawk process at its peak. I noticed it because my system was hanging due to low memory.
I am quite sure that redirection outside awk consumes negligible memory; I have done it several times with files much larger than this and with more complex operations, and I never faced this problem. Since I had to copy different sections of the input files to different output files, I used redirection inside awk.
I know there are other ways to implement this task, and in any case my job got done without much trouble. All I am interested in is how awk behaves when writing to a file.
I am not sure if this question is better suited for Superuser.
