I have a utility that uses Clang's LibTooling framework to parse the AST and perform static code analysis. I am using LLVM and Clang v10.0.
Recently I noticed that the utility never finishes parsing the AST of one particular file. While debugging, I found that SourceManager.cpp aborts because of a failed assertion. The exact place is here:
FileID SourceManager::createFileID(const ContentCache *File,
                                   SourceLocation IncludePos,
                                   SrcMgr::CharacteristicKind FileCharacter,
                                   int LoadedID, unsigned LoadedOffset) {
  ...
  ...
  assert(NextLocalOffset + FileSize + 1 > NextLocalOffset &&
         NextLocalOffset + FileSize + 1 <= CurrentLoadedOffset &&
         "Ran out of source locations!");
  ...
  ...
  ...
}
The values of the variables when the assertion fails are: NextLocalOffset=2147335549, FileSize=303516, CurrentLoadedOffset=2147483648 and (NextLocalOffset + FileSize)=2147639065.
The source file is automatically generated, is 28,268,746 bytes (~27 MB) in size, and contains roughly 7000 include directives for memory mapping different code blocks.
Is there a limit on the size of the source files Clang can process?
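The reported values can be checked against the two conditions of the assertion. The sketch below is not Clang code. It assumes the offsets are 32-bit unsigned values (which the reported numbers suggest), and the helper names fitsInLocalSpace and remainingLocalSpace are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the two conditions of the quoted assertion.
// Assumes 32-bit unsigned offsets, so the sum wraps modulo 2^32 on overflow.
bool fitsInLocalSpace(std::uint32_t nextLocalOffset, std::uint32_t fileSize,
                      std::uint32_t currentLoadedOffset) {
    const std::uint32_t end = nextLocalOffset + fileSize + 1;
    return end > nextLocalOffset        // the addition did not wrap around
        && end <= currentLoadedOffset;  // the file ends below the loaded region
}

// Offsets still available before the local region reaches the loaded region.
std::uint32_t remainingLocalSpace(std::uint32_t nextLocalOffset,
                                  std::uint32_t currentLoadedOffset) {
    return currentLoadedOffset - nextLocalOffset;
}
```

With the values from the question, fitsInLocalSpace(2147335549u, 303516u, 2147483648u) is false: the sum does not wrap, but the file would end at offset 2147639066, past 2147483648 (2^31). Only 148099 offsets remained for a file that needs 303517. This suggests the limit applies to the combined size of everything entered into the SourceManager for one translation unit, not to the size of a single file.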
I have to work with opencv in an android project. Everything worked fine until I recently had to use c++ exception_ptr as well.
Since then, the use of std::rethrow_exception causes a SIGBUS (signal SIGBUS: illegal alignment).
I created a minimal example to illustrate the problem. The example application only links to opencv 3.4.4 but does not use any opencv function. If you remove the linking to opencv in CMakeLists.txt the app works fine and doesn't crash. If you add it however, the app will crash as soon as the native method triggerException() is called.
In my implementation the example application calls this method if a button is pressed.
native-lib.cpp:
#include <jni.h>
#include <string>
#include <exception>
/*
 * code based on: https://en.cppreference.com/w/cpp/error/exception_ptr
 */
std::string handle_eptr2(std::exception_ptr eptr)
{
    try {
        if (eptr) {
            std::rethrow_exception(eptr);
        }
    } catch (const std::exception &e) {
        return "Caught exception \"" + std::string(e.what()) + "\"\n";
    }
    return "Something went wrong";
}
extern "C" JNIEXPORT jstring JNICALL
Java_com_example_user_exceptiontest_MainActivity_triggerException(
        JNIEnv *env,
        jobject /* this */) {
    std::exception_ptr eptr;
    try {
        std::string().at(1); // this generates an std::out_of_range
    } catch(...) {
        eptr = std::current_exception(); // capture
    }
    std::string res = handle_eptr2(eptr);
    return env->NewStringUTF(res.c_str());
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.4.1)
set(OPENCV_DIR $ENV{HOME}/lib/OpenCV-android-sdk/sdk )
include_directories(${OPENCV_DIR}/native/jni/include )
add_library( native-lib
SHARED
src/main/cpp/native-lib.cpp)
find_library( log-lib
log)
target_link_libraries(
native-lib
# Removing the following line will make everything work as expected (what() message is returned)
${OPENCV_DIR}/native/libs/${ANDROID_ABI}/libopencv_java3.so # <--- critical line
${log-lib})
build.gradle
To use exceptions and c++17 support, I added the following lines to the configuration that is created by android-studio.
externalNativeBuild {
cmake {
arguments '-DANDROID_TOOLCHAIN=clang',
'-DANDROID_STL=c++_shared'
cppFlags "-std=c++1z -frtti -fexceptions"
}
}
Stacktrace:
<unknown> 0x004c4e47432b2b01
___lldb_unnamed_symbol15856$$libopencv_java3.so 0x0000007f811c4a58
_Unwind_Resume_or_Rethrow 0x0000007f811c4fc8
__cxa_rethrow 0x0000007f81181e50
__gnu_cxx::__verbose_terminate_handler() 0x0000007f811b1580
__cxxabiv1::__terminate(void (*)()) 0x0000007f81181c54
std::terminate() 0x0000007f81181cc0
std::rethrow_exception(std::exception_ptr) 0x0000007f802db2cc
handle_eptr2(std::exception_ptr) native-lib.cpp:35
::Java_com_example_user_exceptiontest_MainActivity_triggerException(JNIEnv *, jobject) native-lib.cpp:58
While searching for a solution I looked at the opencv sources (https://github.com/opencv/opencv/blob/master/modules/core/src/parallel.cpp) and stumbled upon this code snippet:
#ifndef CV__EXCEPTION_PTR
# if defined(__ANDROID__) && defined(ATOMIC_INT_LOCK_FREE) && ATOMIC_INT_LOCK_FREE < 2
# define CV__EXCEPTION_PTR 0 // Not supported, details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58938
I'd understand if this changes the behavior of opencv, but I don't get how this might influence code that does not use opencv at all.
EDIT: It is also worth mentioning that linking to opencv has no impact if I use this code directly (without jni) in a linux (x86_64) desktop setting (clang, libc++, opencv3.4.4). Thus, my conclusion that it is an android specific problem...
Does anyone have an idea how to solve this issue or what to try next?
Thanks a lot in advance!
OpenCV is compiled against the GNU runtime (gnustl) while you are using the c++ STL (c++_shared in your build.gradle). See "One STL per app". You will need to either use gnustl (which means going back to NDK 15) or build OpenCV with the c++ STL.
To build OpenCV with c++_static, you can try following this comment from the OpenCV bug tracker:
cmake -GNinja -DINSTALL_ANDROID_EXAMPLES=ON \
  -DANDROID_EXAMPLES_WITH_LIBS=ON -DBUILD_EXAMPLES=ON -DBUILD_DOCS=OFF -DWITH_OPENCL=OFF -DWITH_IPP=ON -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
  -DANDROID_TOOLCHAIN=clang "-DANDROID_STL=c++_static" -DANDROID_ABI=x86 -DANDROID_SDK_TARGET=18 ../opencv
Followed by (the build files were generated for Ninja, so build with ninja rather than make)
ninja && ninja install
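To confirm which standard library your own native code is built against, a small check on the library's version macros can help. This is only a sketch: the function name standardLibraryName is hypothetical, and it reports on the translation unit it is compiled in, not on the prebuilt libopencv_java3.so.

```cpp
#include <string>  // any standard header pulls in the library's version macros

// Reports which C++ standard library this translation unit is compiled against.
// _LIBCPP_VERSION is defined by libc++ (c++_shared / c++_static in the NDK);
// __GLIBCXX__ is defined by libstdc++ (gnustl in the NDK).
std::string standardLibraryName() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++ (gnustl)";
#else
    return "unknown";
#endif
}
```

Returning this string from a JNI method or writing it to the Android log shows quickly whether the app side is on libc++, which is the case that clashes with a gnustl-built OpenCV.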
I'm using MPLAB X (3.26) with a PIC32 on Windows (XC32 v1.40 compiler). I'm trying to use splint to do static code analysis on someone's code as part of a review. I've got most of the compiler defines and search paths sorted, but I'm a bit stumped when it comes to avoiding the parse errors in the PIC32 standard include files.
The command I am using to run splint is
splint ^
-D"__32MX370F512L__" ^
-D"__PIC32_FEATURE_SET__"=370 ^
-D"__LANGUAGE_C__" ^
+I"C:/Program Files (x86)/Microchip/xc32/v1.40/pic32mx/include/" ^
main.c
The output then gives
< Location unknown >: Field name reused:
Code cannot be parsed. For help on parse errors, see splint -help
parseerrors. (Use -syntax to inhibit warning)
< Location unknown >: Previous use of
< Location unknown >: Previous use of
.... approx 100 times then...
C:\Program Files (x86)\Microchip\xc32\v1.40\pic32mx\include\\stddef.h(4,18):
Datatype ptrdiff_t declared with inconsistent type: long int
A function, variable or constant is redefined with a different type. (Use
-incondefs to inhibit warning)
load file standard.lcd: Specification of ptrdiff_t: arbitrary integral type
C:\Program Files (x86)\Microchip\xc32\v1.40\pic32mx\include\\stddef.h(5,27):
Datatype size_t declared with inconsistent type: unsigned long int
load file standard.lcd: Specification of size_t:
arbitrary unsigned integral type
C:\Program Files (x86)\Microchip\xc32\v1.40\pic32mx\include\\stddef.h(6,13):
Datatype wchar_t declared with inconsistent type: int
load file standard.lcd: Specification of wchar_t: arbitrary integral type
C:\Program Files (x86)\Microchip\xc32\v1.40\pic32mx\include\\stdarg.h(75,36):
No type before declaration name (implicit int type): __builtin_va_list :
int
A variable declaration has no explicit type. The type is implicitly int.
(Use -imptype to inhibit warning)
C:\Program Files (x86)\Microchip\xc32\v1.40\pic32mx\include\\stdarg.h(75,36):
Parse Error: Suspect missing struct or union keyword: __builtin_va_list :
int. (For help on parse errors, see splint -help parseerrors.)
*** Cannot continue.
The last one causes things to stop. I've tried things like -skip-iso-headers with no luck. It seems to be seeing conflicts between its standard.lcd file and the XC32 standard header files.
Can anyone tell me:
1. What "< Location unknown >: Field name reused:" means, or what it might be referring to?
2. A way to resolve the parse error caused by the standard header files?
So far the only way I have found to solve the header file issue is to define the types, e.g.
-D"__builtin_va_list"=int ^
I think your code (or some code that you #include) is using anonymous bitfields and/or anonymous structs. Before C11, anonymous structs and anonymous unions were only available as a GNU extension. Since Splint doesn't know about C11 (I only found mentions of C99 in the manual, and Google agrees) and has only partial support for the GNU extensions (search the manual for gnu-extensions), it has a hard time parsing them.
I had a similar problem with some code written for a PIC18f46k22, though I was using sdcc instead of XC8.
The issue was with pic18f46k22.h, which had anonymous structs (bitfields, specifically) inside a typedef union.
This code...
typedef union
{
    struct
    {
        unsigned name0 : 1;
        unsigned name1 : 1;
        unsigned name2 : 1;
        unsigned name3 : 1;
        unsigned name4 : 1;
        unsigned : 1;
        unsigned : 1;
        unsigned : 1;
    };
    struct
    {
        unsigned name : 6;
        unsigned : 2;
    };
} __NAMEbits_t;
...would produce these errors...
< Location unknown >: Field name reused:
Code cannot be parsed. For help on parse errors, see splint -help
parseerrors. (Use -syntax to inhibit warning)
< Location unknown >: Previous use of
...but this code wouldn't.
struct indv
{
    unsigned name0 : 1;
    unsigned name1 : 1;
    unsigned name2 : 1;
    unsigned name3 : 1;
    unsigned name4 : 1;
    unsigned : 1;
    unsigned : 1;
    unsigned : 1;
};
struct all
{
    unsigned name : 6;
    unsigned : 2;
};
typedef union
{
    struct indv individualbits;
    struct all allbits;
} __NAMEbits_t;
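One consequence of the rewrite is that fields are no longer reachable directly on the union; every access has to go through the named member. The sketch below illustrates this in C++ with hypothetical type and function names (the double-underscore name is avoided because such identifiers are reserved in C++).

```cpp
// Same layout idea as the rewritten header, with hypothetical names.
struct IndividualBits {
    unsigned name0 : 1;
    unsigned name1 : 1;
    unsigned name2 : 1;
    unsigned name3 : 1;
    unsigned name4 : 1;
    unsigned : 3;  // three unnamed padding bits
};

struct AllBits {
    unsigned name : 6;
    unsigned : 2;  // two unnamed padding bits
};

union NameBits {
    IndividualBits individualbits;
    AllBits allbits;
};

// With named members, a field is reached through the member name.
unsigned readBit1(const NameBits &reg) { return reg.individualbits.name1; }
unsigned readAll(const NameBits &reg) { return reg.allbits.name; }
```

Code that previously wrote reg.name1 would have to become reg.individualbits.name1, so this kind of rewrite is probably best kept in a copy of the header that is used only for the analysis run.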
I am working with a different processor, compiler, and static analysis tool (PRQA / Helix QAC), but I think we are facing the same problem regarding the parse issue of the standard header files. It took me some time to figure out what is going on.
For one thing, I can say that your workaround is good enough and apparently you should not worry about it too much. I used a slightly different workaround described here:
Pycparser not working on preprocessed code
-D __builtin_va_list = struct __builtin_va_list {}
I guess another way would be to use stub standard headers instead of the real ones. My tool manual claims, for example, that there should be such header files supplied with the tool, although I haven't found/obtained them yet.
After compiling an application with clang 3.6 using -fsanitize=undefined,
I'm trying to start the instrumented program while using a suppression file to ignore some of the errors:
UBSAN_OPTIONS="suppressions=ubsan.supp" ./app.exe
The suppression file ubsan.supp contains:
signed-integer-overflow:example.c
This leads to an error message:
UndefinedBehaviorSanitizer: failed to parse suppressions
The same occurs with a gcc 4.9 build.
The only documentation I can find is http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html, which is for clang 3.9, while I use 3.6 (which doesn't have documentation for ubsan included).
Can anyone provide working examples of ubsan suppression files that work with clang 3.6?
Edit: By browsing the source code of ubsan, I found that the only valid suppression type might be "vptr_check" - dunno which version I was looking at though.
Can anyone confirm that in clang 3.9 more suppression types are available?
I didn't spend the time to find out exactly which suppressions were available in clang-3.6, but it appears that in clang-3.7 only vptr_check is available as a suppression. Starting in clang-3.8, the suppressions list is defined to be the list of checks, plus vptr_check.
In clang-3.9 the checks available are:
"undefined"
"null"
"misaligned-pointer-use"
"alignment"
"object-size"
"signed-integer-overflow"
"unsigned-integer-overflow"
"integer-divide-by-zero"
"float-divide-by-zero"
"shift-base"
"shift-exponent"
"bounds"
"unreachable"
"return"
"vla-bound"
"float-cast-overflow"
"bool"
"enum"
"function"
"returns-nonnull-attribute"
"nonnull-attribute"
"vptr"
"cfi"
"vptr_check"
I tried it by creating three files, compile.sh, main.cpp and suppressions.supp, as shown below. unsigned-integer-overflow is not part of undefined, which is why it has to be enabled explicitly. This works on my machine with clang-3.9.
So I'd guess that more suppression types are valid in clang-3.9.
# compile.sh
set -x
UBSAN_OPTIONS=suppressions=suppressions.supp:print_stacktrace=1 #:help=1
export UBSAN_OPTIONS
clang++-3.9 -g -std=c++11 -fsanitize=undefined -fno-omit-frame-pointer -fsanitize=unsigned-integer-overflow main.cpp
./a.out
// main.cpp
#include <climits>  // UINT_MAX
int main(int argc, char **argv) {
    unsigned int k = UINT_MAX;
    k += 1;  // unsigned overflow, reported by the sanitizer unless suppressed
    return 0;
}
# suppressions.supp
unsigned-integer-overflow:main.cpp
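On why unsigned-integer-overflow has to be requested separately: unsigned arithmetic in C++ is defined to wrap around, so the overflow in main.cpp is not undefined behavior at all. A minimal sketch (the function name incrementWrapping is hypothetical):

```cpp
#include <climits>

// Unsigned arithmetic is defined to wrap modulo 2^N, so this is not undefined
// behavior; UBSan only reports it when -fsanitize=unsigned-integer-overflow
// is passed explicitly.
unsigned int incrementWrapping(unsigned int k) {
    return k + 1;
}
```

incrementWrapping(UINT_MAX) yields 0 on any conforming compiler, with or without the sanitizer.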
Here is a snippet of the .c code generated from the .lex file.
The core dump occurs on the very first iteration:
while (1) /* loops until end-of-file is reached */ {
    yy_cp = yy_c_buf_p;
    /* Support of yytext. */
    *yy_cp = yy_hold_char; // receiving coredump here
    /* yy_bp points to the position in yy_ch_buf of the start of
     * the current run. */
    yy_bp = yy_cp;
    yy_current_state = yy_start;
}
You can find the code here.
I have an answer to my own question. Here is an explanation of the solution.
I have two lex files (Type1_Lex.l and Type2_Lex.l) and two yacc files (Type1_Yacc.y and Type2_Yacc.y).
Compiling them generates the corresponding .c files (Type1_Lex.c, Type2_Lex.c, Type1_Yacc.c and Type2_Yacc.c) and .h files.
Compiling those .c files in turn generates Type1_Lex.o, Type2_Lex.o, Type1_Yacc.o and Type2_Yacc.o.
Finally, I put all these object files into a single .a archive.
The problems start here:
...
ld: Warning: size of symbol `yy_create_buffer' changed from 318 in libuperbe.a(TYPE1_Lex.o) to 208 in libxxx.a (TYPE2_Lex.o)
ld: Warning: size of symbol `yy_load_buffer_state' changed from 262 in libuperbe.a(TYPE1_Lex.o) to 146 in libxxx.a(TYPE2_Lex.o)
ld: Warning: size of symbol `yy_init_buffer' changed from 278 in libuperbe.a(TYPE1_Lex.o) to 164 in libxxx.a(TYPE2_Lex.o)
Some symbols are the same in both generated .c files (TYPE1_Lex.c and TYPE2_Lex.c).
When both object files are bound into a single .a, the identically named symbols (yy_create_buffer, yy_init_buffer, yy_load_buffer_state) override each other.
At run time, calls that should reach yy_create_buffer(), yy_init_buffer() and yy_load_buffer_state() as defined in TYPE2_Lex.c actually reach the versions from TYPE1_Lex.c, and that somehow leads to the memory corruption.
To move ahead, I decided to run sed with the following patterns.
sed on TYPE1_Lex.c with:
s/yy_create_buffer/TYPE1_create_buffer/g
s/yy_init_buffer/TYPE1_init_buffer/g
s/yy_load_buffer_state/TYPE1_load_buffer_state/g
sed on TYPE2_Lex.c with:
s/yy_create_buffer/TYPE2_create_buffer/g
s/yy_init_buffer/TYPE2_init_buffer/g
s/yy_load_buffer_state/TYPE2_load_buffer_state/g
This way the loader can tell the symbols apart, and the confusion between the function names at run time goes away.
After all these steps I was able to move ahead :)
Thanks all for your help :)
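The renaming step can also be sketched outside of sed. The C++ function below is an illustration, not the author's script: the name renameScannerSymbols is hypothetical, and it uses std::regex with word boundaries so that only the three clashing identifiers are touched.

```cpp
#include <regex>
#include <string>

// Renames the three clashing scanner functions by replacing their "yy" prefix.
// The \b anchors keep longer identifiers such as yy_create_buffer_x untouched.
std::string renameScannerSymbols(const std::string &source,
                                 const std::string &prefix) {
    static const std::regex clashing(
        R"(\byy_(create_buffer|init_buffer|load_buffer_state)\b)");
    // $1 is the captured function name without the "yy" prefix.
    return std::regex_replace(source, clashing, prefix + "_$1");
}
```

Applied to the text of TYPE1_Lex.c with the prefix "TYPE1" and to TYPE2_Lex.c with "TYPE2", both definitions and call sites are renamed consistently, so each scanner ends up with its own set of symbols in the archive.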
I've been using fslex and fsyacc, and the F# source files (.fs) they generate from the lexer (.fsl) and parser (.fsp) rules refer to the original .fsl (and sometimes to the same .fs source file) all over the place, with statements such as this (the leading numbers are line numbers):
lex.fs
1 # 1 "/[PROJECT-PATH-HERE]/lex.fsp"
...
16 # 16 "/[PROJECT-PATH-HERE]/lex.fs"
17 // This is the type of tokens accepted by the parser
18 type token =
19 | EOF
...
Also, the .fs files generated from pars.fsp do the same kind of thing, but additionally refer to the F# signature file (.fsi) generated alongside them. What does any of this do or mean?
The annotations you see in the generated code are F# Compiler Directives (specifically, the 'line' directive).
The 'line' directive makes it so that when the F# compiler needs to emit a warning/error message for some part of the generated code, it has a way to determine which part of the original file corresponds to that part of the generated code. In other words, the F# compiler can generate a warning/error message referencing the original code which is the basis of the generated code causing the error.
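The same mechanism exists in C and C++ as the #line preprocessor directive, which makes the effect easy to see in isolation. The sketch below is a C++ analogue, not F#; the file name lexer.fsl and the names SourcePos and generatedCodePosition are hypothetical.

```cpp
#include <string>

struct SourcePos {
    int line;
    std::string file;
};

// Pretends to be generated code: the #line directive tells the compiler that
// the next source line is line 42 of "lexer.fsl".
SourcePos generatedCodePosition() {
#line 42 "lexer.fsl"
    return SourcePos{__LINE__, __FILE__};
}
```

Calling generatedCodePosition() yields line 42 and file "lexer.fsl" even though the code physically lives in a different file, which is how a diagnostic on generated code can point back at the original rule file.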