libclang not emitting certain AST nodes - clang

I'm using the go-clang library to parse the following C file: aac.c. For some reason when I run the file through clang and dump the AST, I don't get AST output for certain functions. For example, the C file contains a forward declaration of aac_ioctl_send_raw_srb and the actual definition later on in the file.
Given this I was expecting to see two AST nodes in the output but only one FuncDecl (the forward declaration) is dumped:
clang -Xclang -ast-dump -fsyntax-only aac.c | grep "aac_ioctl_send_raw_srb" | wc -l
aac.c:38:10: fatal error: 'opt_aac.h' file not found
#include "opt_aac.h"
^
1 error generated.
1 <--- wc output
(Ignoring the error)
I get the same result using the go-clang library to parse the C file from within my own application. Is there any explanation for why the definition is not dumped?

I got some help in #llvm IRC, where someone suggested that the errors are in fact causing the issue. Even though other nodes are still being emitted, clang may simply be ignoring the ones that it thinks require information that resides in the missing #includes.
I fixed the include paths and, sure enough, the nodes I was looking for were emitted.
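For reference, this is roughly the invocation that worked once the headers were reachable; the -I path is a placeholder for wherever opt_aac.h and the other missing headers actually live:
# sketch: point clang at the directory containing the missing headers
clang -I/path/to/aac/headers -Xclang -ast-dump -fsyntax-only aac.c \
  | grep "aac_ioctl_send_raw_srb" | wc -l
# the count should now include both FuncDecl nodes (forward declaration and
# definition), plus any call sites that mention the name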

How does clang check redefinitions?

I'm new to Clang and trying to write some clang-tidy checks. I want something that works as a "variable table", so I can check whether certain names are well-formed.
My intuition is this:
Writing a redefinition will sometimes cause an error, which is reported by Clang's diagnostics, like:
int main(){
int x;
int x; // error: redefinition
return 0;
}
From my perspective, clang may keep a dynamic variable table to check whether a new declaration is compatible, an overload, or an error.
I tried to dive into the clang source code and found the following:
IdentifierTable, kept by the preprocessor, which records all identifiers but does not do the semantic legality checking.
DeclContext, which seems to be a user-facing interface, a product of the semantic checking.
My questions are:
How does Clang do this legality checking?
Am I able to get the variable table (if such a thing exists)?
If I cannot, how could I know which variables are reachable from a given location?
Thanks for your suggestions!
TL;DR: see Answers below.
Discussion
All of your questions are related to one term in the C standard, identifier, defined in C99 6.2.1 p1:
An identifier can denote an object; a function; a tag or a member of a structure, union, or
enumeration; a typedef name; a label name; a macro name; or a macro parameter.
Each identifier has its own scope, according to C99 6.2.1 p2:
For each different entity that an identifier designates, the identifier is visible (i.e., can be
used) only within a region of program text called its scope.
Since what you are interested in is a variable inside a function (i.e., int x), it has block scope.
There is a process called linkage for identifiers declared in different scopes, or in the same scope more than once, according to C99 6.2.2 p2:
An identifier declared in different scopes or in the same scope more than once can be
made to refer to the same object or function by a process called linkage.
This is exactly the process that constrains an object to a single definition, or in your words, the legality checking of definitions. Therefore, compiling the following two files
/* file_1.c */
int a = 123;
/* file_2.c */
int a = 456;
would cause a linkage error:
% clang file_*
...
ld: 1 duplicate symbol
clang: error: linker command failed with exit code 1
However, in your case, the identifiers are inside the same function body, which looks more like the following:
/* file.c */
int main(){
int b;
int b=1;
}
Here the identifier b has block scope and thus shall have no linkage, according to C99 6.2.2 p6:
The following identifiers have no linkage: an identifier declared to be anything other than
an object or a function; an identifier declared to be a function parameter; a block scope
identifier for an object declared without the storage-class specifier extern.
Having no linkage means that we cannot apply the rules mentioned above, that is, this should not be a linkage error at all.
It is natural to consider it an error of redefinition. But while that is indeed defined in C++, where it is called the One Definition Rule, it is NOT in C (check this or this for more details). There is no exact wording for dealing with duplicate identifiers in the same block scope, hence it is implementation-defined behavior. This might be why the error clang reports after compiling the above code (file.c) differs from the one gcc reports, as shown below
(note the term 'with no linkage' used by gcc):
# ---
# GCC (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))
# ---
$ gcc file.c
file.c: In function ‘main’:
file.c:4:6: error: redeclaration of ‘b’ with no linkage
int b=1;
^
file.c:3:6: note: previous declaration of ‘b’ was here
int b;
^
# ---
# CLANG (Apple clang version 13.0.0 (clang-1300.0.29.3))
# ---
% clang file.c
file.c:4:6: error: redefinition of 'b'
int b=1;
^
file.c:3:6: note: previous definition is here
int b;
^
1 error generated.
Answers
With all of the above, I think we can answer your questions:
How does clang perform the legality checking of definitions?
For global variables, both clang and gcc follow the C standard rules; that is to say, they handle the so-called "redefinition errors" through the process called linkage. For local variables, it is left to the implementation (implementation-defined rather than undefined behavior).
In fact, they both treat the "redefinition" as an error. Although variable names inside a function body vanish after compilation (you can verify this in the assembly output; see the sketch after these answers), it is undoubtedly more natural and helpful to require them to be unique.
Am I able to get the variable table (if such a thing exists)?
I do not know much about clang internals, but from the standard passages quoted above, together with how compilation proceeds, we can infer that IdentifierTable will not quite fit your needs, since it belongs to the "preprocessing" stage, which comes before the "linking" stage. To see how the compiler deals with duplicate variables (or, more formally, symbols) and how it stores them, you might want to look at the whole lld project, or in particular its SymbolTable.
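As a quick check of the claim that local variable names do not survive compilation, here is a small sketch; the file name scratch.c and the variable name my_local_counter are just placeholders:
# write a valid one-variable program, compile to assembly, and look for the name
printf 'int main(void){ int my_local_counter = 1; return my_local_counter; }\n' > scratch.c
clang -S -o - scratch.c | grep -c my_local_counter
# typically prints 0: the local name is gone (it may reappear only in the
# debug-info sections if you compile with -g)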

Phred qual error after fastq trimming with Cutadapt

I would like to trim the beginning of all the reads in a fastq file by a given length before mapping to the genome with bowtie2. I have used Cutadapt:
cutadapt -u 48 -o output.fastq.gz input.fastq.gz
My fastq file after trimming looks like this:
gunzip -c output.fastq.gz | head
@NB502143:99:HFF7TAFX2:1:11101:4133:1019 1:N:0:ATCACG
CATGAAAAAGAGCTCATTTTCAGATGCAGGAATTCCTATCCG
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB502143:99:HFF7TAFX2:1:11101:19790:1020 1:N:0:ATCACG
CATGATCCACTTTTCCACGCGCTTTGACGACCATTTTATAA
+
EEEEE<EEEEEEEEEEEEEEEEE<EE/EEAEEEEEEEEEEE
@NB502143:99:HFF7TAFX2:1:11101:6327:1020 1:N:0:ATCACG
CATGATCTCAGTAAAGGCATTTGTGGTTGTTAAGTAGCCATT
When I try to map it with bowtie2, I get the following error message:
Saw ASCII character 10 but expected 33-based Phred qual.
I don't get this error if I map input.fastq.gz, so I suspect something wrong is happening during the trimming but I can't figure out what!
I checked both files with FastQC and they're both Sanger / Illumina 1.9 encoded.
Thanks for your help.
I have been having a similar issue. The error occurs when I use cutadapt, but does not happen when I trim with another tool, fastp.
Checking the integrity of the resulting trimmed fastq files showed that some reads had no bases left. A tool like fastq_info from the fastq_utils package works for this check.
If that is the issue, you might need to use the -m <minimum-length> flag when running cutadapt, which removes reads shorter than the designated length. Alignment should then work.
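A minimal sketch of what that could look like for the command in the question; the threshold of 20 is arbitrary and should be chosen to suit your read lengths, and the fastq_info check assumes fastq_utils is installed:
# trim the first 48 bases, then drop any read shorter than 20 bases
cutadapt -u 48 -m 20 -o output.fastq.gz input.fastq.gz
# optional sanity check that no empty/truncated records remain
fastq_info output.fastq.gz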

Why does the nm tool output for the extern-only and defined-only options overlap?

I'll start by giving my understanding of the options:
extern-only: Show me only those symbols which are referenced by the binary but whose definitions (the code or variable) will be provided by another binary
defined-only: Show me only those symbols whose definitions are contained in the binary.
Here are my commands and their output:
$nm -defined-only GenerationOfNow | grep FIRAZeroingWeakContainer
000000010002c128 t -[FIRAZeroingWeakContainer .cxx_destruct]
000000010002c0fb t -[FIRAZeroingWeakContainer object]
000000010002c114 t -[FIRAZeroingWeakContainer setObject:]
000000010021a218 S _OBJC_CLASS_$_FIRAZeroingWeakContainer
00000001002177f8 s _OBJC_IVAR_$_FIRAZeroingWeakContainer._object
000000010021a1f0 S _OBJC_METACLASS_$_FIRAZeroingWeakContainer
$nm -extern-only GenerationOfNow | grep FIRAZeroingWeakContainer
000000010021a218 S _OBJC_CLASS_$_FIRAZeroingWeakContainer
000000010021a1f0 S _OBJC_METACLASS_$_FIRAZeroingWeakContainer
As you can see, the -extern-only output is a subset of the -defined-only output. Why? Perhaps my question should be: what is the meaning of those items which have an S in the second column?
You're confusing -extern-only with -undefined-only.
There are two concepts that are being mixed here:
extern vs. local (in C extern vs. static, "local" is sometimes also called "private")
defined vs. undefined
The former describes the availability of a symbol while the latter describes its origin. And yes, even the notion of a private undefined symbol exists, as per man nm:
Each symbol name is preceded by its value (blanks if undefined). [...] A lower case u in a dynamic shared library indicates an undefined reference to a private external in another module in the same library.
Now, when using -undefined-only you actually do get the complement of -defined-only:
bash$ nm test.dylib
0000000000000f60 T _derp
0000000000000f70 t _herp
U _printf
U dyld_stub_binder
bash$ nm -defined-only test.dylib
0000000000000f60 T _derp
0000000000000f70 t _herp
bash$ nm -undefined-only test.dylib
_printf
dyld_stub_binder
bash$ nm -extern-only test.dylib
0000000000000f60 T _derp
U _printf
U dyld_stub_binder
-extern-only does not seem to have a complementary flag, however.
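If you want its complement anyway (only the local symbols), a rough workaround is to filter on the case of the type letter, since nm prints upper-case letters for external symbols and lower-case letters for local ones; that same case rule is why the question's S entries show up under -extern-only. This is only a sketch, not an official flag:
bash$ nm test.dylib | grep ' [a-z] '
0000000000000f70 t _herp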

OpenCV 3.1: CMake error if source or bin path contains "++"

If the source or binary path in CMake contains the character sequence "++" (without quotation marks), I get a CMake error when trying to create a project for OpenCV 3.1:
CMake Error at cmake/OpenCVUtils.cmake:76 (if):
if given arguments:
"G:/Desktop/C++ projects/project" "MATCHES" "^G:/Desktop/C++ projects/sources" "OR" "G:/Desktop/C++ projects/project" "MATCHES" "^G:/Desktop/C++ projects/project"
Regular expression "^G:/Desktop/C++ projects/sources" cannot compile
Call Stack (most recent call first):
CMakeLists.txt:437 (ocv_include_directories)
Apparently this line inside OpenCVUtils causes the problem:
if("${__abs_dir}" MATCHES "^${OpenCV_SOURCE_DIR}" OR "${__abs_dir}" MATCHES "^${OpenCV_BINARY_DIR}")
I noticed the problem because I have a folder called "C++ Projects" where I keep C++ projects and libraries. Does anyone know why the sequence causes the problem and whether there is a quick way to fix it? I will also report this as a bug in the OpenCV bug tracker.
+ is a special character in CMake's regular expression syntax (documentation), and MATCHES performs a regular expression match. In "C++" the second + has no valid expression to repeat, so the pattern cannot compile.
Either the strings would have to be escaped first, or, the real fix, a plain (non-regex) string search should test whether ${__abs_dir} starts with ${OpenCV_SOURCE_DIR} or ${OpenCV_BINARY_DIR}:
string(FIND "${__abs_dir}" "${OpenCV_SOURCE_DIR}" strPosSrc)
string(FIND "${__abs_dir}" "${OpenCV_BINARY_DIR}" strPosBin)
if(strPosSrc EQUAL 0 OR strPosBin EQUAL 0)
So basically it is a bug in OpenCV. Ask them to fix it.
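If you want to reproduce the failure outside of OpenCV, a throwaway script is enough; repro.cmake and the paths below are placeholders:
# sketch: the same MATCHES pattern fails in plain script mode
cat > repro.cmake <<'EOF'
if("G:/Desktop/C++ projects/project" MATCHES "^G:/Desktop/C++ projects/sources")
endif()
EOF
cmake -P repro.cmake
# expected: Regular expression "^G:/Desktop/C++ projects/sources" cannot compile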
Missing CMake feature
Overall, I think it is a missing CMake feature that it does not provide a method to escape strings for use in regular expressions.
There are bugs that could be solved by such a function:
https://cmake.org/Bug/view.php?id=15908
https://cmake.org/Bug/view.php?id=10365

Showing full expected and actual value information when ?_assertEqual fails

I'm coding a unit test where a (rather lengthy) binary is generated, and I want to assert that the generated binary equals the one I expect to be generated. I'm running eunit through "rebar eunit".
Thing is, when this assertion fails, the output is abbreviated with "...", and I want to see the complete output so I can spot where the difference is.
I'm now using "?debugFmt()" as a temporary solution, but I'd like to know if there's an alternative to it (a config option or argument somewhere that can be applied to "?_assertEqual()" so the output is only shown when the assertion fails).
Thanks in advance!
EDIT: Following legoscia's answer, I'm including a test sample using a test generator with multiple asserts:
can_do_something(SetupData) ->
% ... some code ...
[?_assertEqual(Expected1, Actual1), ?_assertEqual(Expected2, Actual2)].
The best I can think of for actually showing the value in the console is something like this:
Actual =:= Expected orelse ?assert(?debugFmt("~p is not ~p", [Actual, Expected]))
?debugFmt returns ok, which is not true, so the assertion will always fail.
Alternatively, to use it as a test generator, the entire thing can be put inside ?_assert:
?_assert(Actual =:= Expected orelse ?debugFmt("~p is not ~p", [Actual, Expected]))
The way I usually achieve this is by having Eunit output XML files (in "Surefire" format, AKA "Junit" format). The XML files have much higher limits for term print depth, and thus probably contain the information you need.
Add this to your rebar.config:
{eunit_opts,
[verbose,
%% eunit truncates output from tests - capture full output in
%% XML files in .eunit
{report,{eunit_surefire,[{dir,"."}]}}]}.
Then you can find the results for module foo in .eunit/TEST-foo.xml. I find the files quite readable in a text editor.
1). Open your eunit sources. In my system:
cd /usr/lib/erlang/lib/eunit-2.3.2/src
2). Edit eunit_lib.erl as follows (diff):
54c54
< format_exception(Exception, 20).
---
> format_exception(Exception, 99999).
3). sudo erlc -I ../include eunit_lib.erl
4). mv eunit_lib.beam ../ebin
5). Have a good day))
This PR introduces a print_depth option to eunit:test/2:
eunit:test(my_test, [{print_depth, 200}]).
It should be available starting from OTP-23.
Setting print_depth to a larger number will decrease truncation of the output.
