Linker script: defining symbol size and type

I know I can define a symbol/label in my (GNU binutils ld) linker script simply by assigning an address to the symbol like so: my_symbol = 0xdeadbeef; But, what if I wanted to give that symbol a size (like 0x02) and a type (like OBJECT or FUNC) as it would show up in the symbol table (.symtab)?

Related

How does clang check redefinitions?

I'm new to Clang, and trying to write some clang-tidy checks. I want to find something that works as a "variable table", to check if some names are well-formed.
My intuition is this:
Redefining a variable causes an error, which Clang's diagnostics report, for example:
int main(){
int x;
int x; // error: redefinition
return 0;
}
From my perspective, clang may keep a dynamic variable table to check whether a new definition is compatible, an overload, or an error.
I dug into the clang source code and found the following:
IdentifierTable, which is kept by the preprocessor and records all identifiers, but does no semantic checking.
DeclContext, which seems to be an interface for users, produced by semantic checking.
My questions are:
How does Clang do this legality checking?
Can I get at the variable table (if such a thing exists)?
If I cannot, how can I find out which variables are reachable from a given location?
Thanks for your suggestions!
TL;DR: see Answers below.
Discussion
All of your questions relate to one term of the C standard, the identifier, defined in C99-6.2.1-p1:
An identifier can denote an object; a function; a tag or a member of a structure, union, or
enumeration; a typedef name; a label name; a macro name; or a macro parameter.
Each identifier has its own scope, according to C99-6.2.1-p2:
For each different entity that an identifier designates, the identifier is visible (i.e., can be
used) only within a region of program text called its scope.
Since you are interested in variables inside a function (e.g., int x), the identifier in question has block scope.
There is a process called linkage for identifiers that are declared more than once, according to C99-6.2.2-p2:
An identifier declared in different scopes or in the same scope more than once can be
made to refer to the same object or function by a process called linkage.
This is the mechanism that lets several declarations refer to one object, and with it comes the requirement that the object be defined only once, which is the legality check you are asking about. Therefore, compiling the following code
/* file_1.c */
int a = 123;
/* file_2.c */
int a = 456;
would cause a link error:
% clang file_*
...
ld: 1 duplicate symbol
clang: error: linker command failed with exit code 1
However, in your case the identifiers are inside the same function body, which looks more like the following:
/* file.c */
int main(){
int b;
int b=1;
}
Here the identifier b has block scope and therefore no linkage, according to C99-6.2.2-p6:
The following identifiers have no linkage: an identifier declared to be anything other than
an object or a function; an identifier declared to be a function parameter; a block scope
identifier for an object declared without the storage-class specifier extern.
Having no linkage means that the rules mentioned above do not apply to it; that is, the problem cannot be a linkage error.
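To make the contrast concrete, here is a minimal C sketch. It is not from the original answer, and the names shared_counter and bump_shared_counter are made up for illustration. A block-scope declaration with extern has linkage, so repeating it in the same block is accepted, while the commented-out pair of plain declarations has no linkage and would be rejected.

```c
/* File scope, external linkage: the single definition of the object. */
int shared_counter = 41;

/* Hypothetical helper that declares the same identifier twice in one block. */
int bump_shared_counter(void)
{
    extern int shared_counter; /* block scope, but it has linkage */
    extern int shared_counter; /* accepted: refers to the same object */

    /* int local; int local;      no linkage: the compiler rejects this */

    shared_counter += 1;
    return shared_counter;
}
```

Calling bump_shared_counter() from main updates the file-scope object, since both extern declarations name it.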
It is natural to regard this as a redefinition error. C++ covers it with the One Definition Rule; C has no rule of that name (check this or this for more details). C does constrain it, though: C99-6.7-p3 says that an identifier with no linkage shall have no more than one declaration in the same scope and name space, so a conforming compiler must diagnose it. Only the wording of the diagnostic is left to the implementation, which is why the errors clang reports for the code above (file.c) differ from those of gcc, as shown below:
(note the term 'with no linkage' in the gcc message)
# ---
# GCC (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))
# ---
$ gcc file.c
file.c: In function ‘main’:
file.c:4:6: error: redeclaration of ‘b’ with no linkage
int b=1;
^
file.c:3:6: note: previous declaration of ‘b’ was here
int b;
^
# ---
# CLANG (Apple clang version 13.0.0 (clang-1300.0.29.3))
# ---
% clang file.c
file.c:4:6: error: redefinition of 'b'
int b=1;
^
file.c:3:6: note: previous definition is here
int b;
^
1 error generated.
Answers
With all of the above, I think we can answer your questions:
How does clang perform the legality check on definitions?
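For global variables, both clang and gcc follow the C standard: identifiers with external linkage all refer to one object, so duplicate definitions in different files surface as a linker error. For local variables, which have no linkage, a second declaration in the same block violates the constraint in C99-6.7-p3, so the compiler itself must diagnose it; only the wording of the message is left to the implementation.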
In fact, both compilers treat the "redefinition" as an error. Although the names of variables inside a function body vanish after compilation (you can verify this in the assembly output), it is clearly more natural and helpful to require them to be unique.
Can I get at the variable table (if such a thing exists)?
I do not know much about clang internals, but from the standard passages quoted above and the way compilation proceeds, we can infer that IdentifierTable is probably not a good fit for your needs, since it belongs to the "preprocessing" stage, which comes before the "linking" stage. To see how the toolchain deals with duplicate variables (or more formally, symbols) and how it stores them, you might want to look at the lld project, in particular its SymbolTable.

GNU Assembler (Arm64). Parentheses in macro arguments references

I'm trying to find any information about the parentheses syntax for macro arguments in GNU Assembler. E.g. I have the following code:
.macro do_block, enc, in, rounds, rk, rkp, i
eor \in\().16b, \in\().16b, v15.16b
...
(taken from here)
What do the parentheses in \in\().16b mean? Where can I find documentation for this syntax?
Okay, I've found the answer. This is special syntax that marks the end of a macro-argument name.
From the documentation:
Note that since each of the macargs can be an identifier exactly as any other one permitted by the target architecture, there may be occasional problems if the target hand-crafts special meanings to certain characters when they occur in a special position. For example:
...
problems might occur with the period character (‘.’) which is often allowed inside opcode names (and hence identifier names). So for example constructing a macro to build an opcode from a base name and a length specifier like this:
.macro opcode base length
\base.\length
.endm
and invoking it as ‘opcode store l’ will not create a ‘store.l’ instruction but instead generate some kind of error as the assembler tries to interpret the text \base.\length.
The string \() can be used to separate the end of a macro argument from the following text. eg:
.macro opcode base length
\base\().\length
.endm

Why does the nm tool output for the extern-only and defined-only options overlap?

I'll start by giving my understanding of the options:
extern-only: Show me only those symbols which are referenced by the binary but whose definitions (the code or variable) will be provided by another binary
defined-only: Show me only those symbols whose definitions are contained in the binary.
Here are my commands and their output:
$nm -defined-only GenerationOfNow | grep FIRAZeroingWeakContainer
000000010002c128 t -[FIRAZeroingWeakContainer .cxx_destruct]
000000010002c0fb t -[FIRAZeroingWeakContainer object]
000000010002c114 t -[FIRAZeroingWeakContainer setObject:]
000000010021a218 S _OBJC_CLASS_$_FIRAZeroingWeakContainer
00000001002177f8 s _OBJC_IVAR_$_FIRAZeroingWeakContainer._object
000000010021a1f0 S _OBJC_METACLASS_$_FIRAZeroingWeakContainer
$nm -extern-only GenerationOfNow | grep FIRAZeroingWeakContainer
000000010021a218 S _OBJC_CLASS_$_FIRAZeroingWeakContainer
000000010021a1f0 S _OBJC_METACLASS_$_FIRAZeroingWeakContainer
As you can see, the -extern-only output is a subset of the -defined-only output. Why? Perhaps my question should be: what is the meaning of the items that have an S in the second column?
You're confusing -extern-only with -undefined-only.
There are two concepts that are being mixed here:
extern vs. local (in C extern vs. static, "local" is sometimes also called "private")
defined vs. undefined
The former describes the availability of a symbol while the latter describes its origin. And yes, even the notion of a private undefined symbol exists, as per man nm:
Each symbol name is preceded by its value (blanks if undefined). [...] A lower case u in a dynamic shared library indicates a undefined reference to a private external in another module in the same library.
Now, when using -undefined-only you actually do get the complement of -defined-only:
bash$ nm test.dylib
0000000000000f60 T _derp
0000000000000f70 t _herp
U _printf
U dyld_stub_binder
bash$ nm -defined-only test.dylib
0000000000000f60 T _derp
0000000000000f70 t _herp
bash$ nm -undefined-only test.dylib
_printf
dyld_stub_binder
bash$ nm -extern-only test.dylib
0000000000000f60 T _derp
U _printf
U dyld_stub_binder
-extern-only does not seem to have a complementary flag however.
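As a rough illustration of the two axes, a C source file along the following lines would be expected to produce a symbol mix like the one in test.dylib above. This is only a sketch: the names derp and herp are borrowed from the listing, but the actual source of test.dylib is not known, and the function bodies here are invented.

```c
#include <stdio.h>

/* Local (static) and defined: expected to appear as a lower-case 't'. */
static int herp(void)
{
    return 7;
}

/* External and defined: expected to appear as an upper-case 'T'. */
int derp(void)
{
    /* printf is external but defined in another binary, hence 'U'. */
    printf("herp returned %d\n", herp());
    return herp();
}
```

The exact nm output depends on the platform and build flags: Mach-O prefixes C names with an underscore, and an optimizing build may inline herp so that it no longer appears at all.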

List of all built in symbols in z3

I'm using the smt2-lib interface of z3 and trying to define the following:
(declare-const rem (set sl$REQ))
And get this error:
(error "line 36 column 31: invalid declaration, builtin symbol rem")
Is there a way to get a complete list of all the predefined symbols so that I can do an automatic renaming?
Thanks!
Simon
Yes, but it's not quite that trivial. Depending on options and logic definitions, the list of pre-defined symbols may change. But, you can get a list of all potentially predefined symbols by grepping for builtin_name in src/ast/*_decl_plugin.cpp. For example, the rem symbol is defined at arith_decl_plugin.cpp:540.

What kind of resource is __TFE12CoreGraphicsVSC6CGRectCfMS0_FT1xSi1ySi5widthSi6heightSi_S0_?

My iPhone crashlog (and not my Simulator) shows me that I have the following issue:
Dyld Error Message:
Symbol not found: __TFE12CoreGraphicsVSC6CGRectCfMS0_FT1xSi1ySi5widthSi6heightSi_S0_
Referenced from: /private/var/mobile/Containers/Bundle/Application/8F97818E-F019-42E8-883C-6FB1994C24B7/Ekalipi.app/PlugIns/EkalipiKeyboard.appex/EkalipiKeyboard
Expected in: /private/var/mobile/Containers/Bundle/Application/8F97818E-F019-42E8-883C-6FB1994C24B7/Ekalipi.app/PlugIns/EkalipiKeyboard.appex/../../Frameworks/libswiftCoreGraphics.dylib
Dyld Version: 353.5
Is this a Unicode symbol that can't be loaded?
Last meaningful stack entry is this:
6 EkalipiKeyboard 0x0010ad88 0xf5000 + 89480
7 UIKit 0x2acbe4f0 -[_UIViewServiceViewControllerOperator __createViewController:withContextToken:fbsDisplays:appearanceSerializedRepresentations:legacyAppearance:hostAccessibilityServerPort:canShowTextServices:replyHandler:] + 1152
What is the pattern for understanding the above resource string?
Many thanks in advance!
Klaus
That is a mangled label that the compiler generated for that function (the CGRect initializer).
You can break down the full label like this (I think):
__TFE12CoreGraphicsVSC6CGRectCfMS0_FT1xSi1ySi5widthSi6heightSi_S0_
_ is a common beginning of a symbol
_T is the marker for a Swift global symbol
F says that it's a function
I don't know what E means (but looking at the demangled symbol it seems to correspond to ext)
12CoreGraphics is the name of the module (prefixed with the length of the name)
V marks the start of a struct
I don't know what S or what C means
6CGRect is the name of the type (I think; the demangled symbol shows it as C.CGRect, with init as the function)
I don't know what C means (see M below)
f marks this symbol as an "uncurried function"
I don't know what M means (CfM together seem to mean an init function but I don't know what the individual letters mean)
S0_ is a substitution. I think it's a substitution for "self" which is passed to curry the function
F here marks the beginning of the function's parameter list
T marks the beginning of a "tuple" (for the arguments)
1x is the name of the first parameter (prefixed with the length of the name)
Si says that it is of the Swift.Int type
1y is the name of the second parameter (prefixed with the length of the name)
Si says that it is of the Swift.Int type
5width is the name of the third parameter (prefixed with the length of the name)
Si says that it is of the Swift.Int type
6height is the name of the fourth parameter (prefixed with the length of the name)
Si says that it is of the Swift.Int type
_ marks the end of the uncurried function's arguments tuple
S0_ is the same substitution again (which I think means that it returns "self")
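The "prefixed with the length of the name" convention used by components such as 12CoreGraphics, 6CGRect and 5width can be sketched in C as below. This is only an illustration of that one convention, not a demangler, and read_length_prefixed is a made-up helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Reads one <decimal length><name> component from src into out.
 * Returns a pointer just past the component, or NULL if src does not
 * start with a length, the name is truncated, or it does not fit in out.
 */
const char *read_length_prefixed(const char *src, char *out, size_t out_size)
{
    size_t len = 0;

    if (!isdigit((unsigned char)*src))
        return NULL;                 /* component does not start with a length */

    while (isdigit((unsigned char)*src)) {
        len = len * 10 + (size_t)(*src - '0');
        if (len >= out_size)
            return NULL;             /* name would not fit in out */
        src++;
    }

    if (len == 0 || strlen(src) < len)
        return NULL;                 /* zero length or truncated input */

    memcpy(out, src, len);
    out[len] = '\0';
    return src + len;
}
```

Given "12CoreGraphicsVSC6CGRect" it copies CoreGraphics into out and returns a pointer to the remaining "VSC6CGRect"; markers such as V, S, C or Si are not length-prefixed and make it return NULL.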
Additionally, running it through xcrun swift-demangle gives the official demangling:
ext.CoreGraphics.C.CGRect.init (C.CGRect.Type)(x : Swift.Int, y : Swift.Int, width : Swift.Int, height : Swift.Int) -> C.CGRect
Gwynne Raskind wrote a very detailed article about Swift Name Mangling where you can read more about this name mangling
