Understanding linker script NOLOAD sections in embedded software - memory

According to the GNU documentation for ld, a NOLOAD section works as follows:
The `(NOLOAD)' directive will mark a section to not be loaded at run
time. The linker will process the section normally, but will mark it
so that a program loader will not load it into memory.
Now, regarding the program loader, according to Wikipedia:
Embedded systems typically do not have loaders, and instead, the code
executes directly from ROM. In order to load the operating system
itself, as part of booting, a specialized boot loader is used.
So what exactly does a NOLOAD section do for FW / embedded software?

The NOLOAD section defines a section that is required to link the program properly but must not be loaded to memory. For example you may need to link your program with some code located in ROM, so you tell the linker to mark the code in ROM as NOLOAD. Then the tool that will load the program (a debugger, an OS or whatever) will not load this part of the code.

I have done several experiments, so I will add an answer in the specific context of embedded software, in particular in the absence of loaders (including flash tools with such functionality). This is my understanding after the experiments I have done, but if anything is wrong I will be more than happy to edit the answer.
It's true that NOLOAD will mark a section so that 'the loader' knows that it must not load it into memory. Something similar happens with flashing tools that take an .elf file as input and therefore, to some extent, act as loaders.
However, there are many cases in embedded software in which no loader is present, for instance if the target device only accepts firmware updates as raw binary files and simply copies the raw data to a particular address and nothing else. In these cases, NOLOAD is useful to avoid making the raw binary file unnecessarily big: it prevents a section from being added to the final binary file when that is not desirable.
Cases in which including the section would not be desirable:
If the section in question needs to be initialized to a known value, such as 0x0, 0xff, or some magic number followed by all 0x0, etc. It's not worth carrying a (potentially) big section full of 0xff's in the image when the same can be done with a loop in the startup code.
If that particular area of memory must not be overwritten during the update process. Beware that this requires placing that section at a particular position. See below.
If the section contains non-initialized data, like .bss, but is not .bss itself, in which case NOLOAD needs to be set explicitly. Of course, it will then need to be initialized to 0 or whatever constant value by the startup code.
Notice that the NOLOAD section will still be present in the .elf file (it won't be marked as LOAD though), but it should be removed by objcopy when generating the raw binary file. There is one exception to this: objcopy cannot 'remove' a section that sits between two others, because it needs to preserve the memory map. For instance, given sections A, B and C in that order, NOLOAD won't shrink the image when applied to B, because objcopy needs to fill B's range with zeros to keep section C at its expected offset. However, if C is NOLOAD and it's the last (potentially loadable) section, then it can be left out of the final binary image.
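The A/B/C behaviour described above can be illustrated with a small model. This is only a sketch of the flattening idea, not objcopy's actual implementation; the Section class, the flatten function, the addresses and the zero fill byte are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    address: int       # address the section is linked at
    data: bytes        # contents (for a NOLOAD section only the length matters)
    load: bool = True  # False models a section marked (NOLOAD)


def flatten(sections, fill=b"\x00"):
    """Build a raw image spanning the lowest to the highest loaded byte."""
    loaded = [s for s in sections if s.load and s.data]
    if not loaded:
        return b""
    base = min(s.address for s in loaded)
    end = max(s.address + len(s.data) for s in loaded)
    image = bytearray(fill * (end - base))  # gaps stay filled with `fill`
    for s in loaded:
        off = s.address - base
        image[off:off + len(s.data)] = s.data
    return bytes(image)


# Case 1: NOLOAD section B sits between loaded sections A and C.
middle = flatten([
    Section("A", 0x00, b"\xAA" * 16),
    Section("B", 0x10, b"\x00" * 32, load=False),
    Section("C", 0x30, b"\xCC" * 16),
])

# Case 2: the same sections, but B is moved to the end of the memory map.
b_last = flatten([
    Section("A", 0x00, b"\xAA" * 16),
    Section("C", 0x10, b"\xCC" * 16),
    Section("B", 0x20, b"\x00" * 32, load=False),
])

print(len(middle), len(b_last))
```

In this model the image still covers B's address range in the first case, so nothing is saved, while in the second case the trailing NOLOAD section simply does not appear in the image.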
Another special case I have observed is the .bss section. This section is always treated (at least by my version of the riscv64-unknown-elf toolchain) as NOLOAD, regardless of whether the linker script specifies it or not.
In summary, the uses of NOLOAD for embedded systems that do not use a loader are: to keep raw binary images smaller, or to avoid overwriting memory regions (usually flash) that one wants to keep at the value they had before the firmware update process started.
What NOLOAD cannot do if no loader is present, i.e. when the FW is updated from raw binary files, is keep a section out of the raw binary file if that section lies between two others that do need to be loaded. In that case, all the NOLOAD sections need to be moved to the end of the loadable memory.
As a side note, for the case of not overwriting memory, as far as I can see that could also be achieved by not declaring that section at all. If symbols in that section need to be referenced from the source code, just declare those symbols at the right addresses.

Can I add an entitlements.plist to jailbreak tweaks?

I'd like to restrict how much access to resources jailbreak tweaks receive: things like network, keychain, and location access. Is it possible to manually add an entitlements plist per tweak?
Many thanks.
A tweak is a dylib: it is loaded into a process. That process may have entitlements, and those entitlements will apply to the tweak. That's it. A tweak doesn't have its own entitlements.
As for your question: because of the above, you can't restrict just a tweak. Your restrictions would apply to the whole process that is being tweaked, and you can't do anything about that. That's how tweaks work: they are dylibs dynamically loaded into the process address space, after which the tweak becomes a part of the process. So any restrictions apply to the whole process, which includes the tweak, the application code, and any other dylib/framework the application is linked to.
So if you want to develop an application that helps a user put restrictions on tweaks, I don't think you can do such a thing. What you can do is analyze which applications are being tweaked, what entitlements they have, and what frameworks and dylibs are used by a tweak (mainly the private ones). Based on that, the user can either enable or disable the tweak. You can even analyze the import section and string literals of the tweak to determine exactly which APIs it uses.
Update
Could you explain to me how a native process communicates with a
tweak, before being loaded within the process space?
It doesn't. Before injection, a tweak is a separate dylib that is not linked to any binary. CydiaSubstrate does all the injecting. The main part of CydiaSubstrate is a special loader dylib. On device start it is dynamically linked into launchd, the first process in iOS, which starts all other processes. When a new process is spawned, the CydiaSubstrate loader dylib checks all tweak filters to see which tweaks it needs to inject into the process and injects them. After that, the tweak is loaded into the process address space (it becomes a part of the process) and the tweak's constructor is called, which is usually where all the hooks are set up.
Could you explain to me as to how this is accomplished?
Suppose you have an array of strings naming the objc classes, C/C++ functions, frameworks and dylibs whose usage you would like to detect. There's an easy solution: open the tweak's file and just search through it for any matches. As tweaks are usually not very large, it shouldn't take much time. And there's a more difficult solution: use dyld or any other API to parse the Mach-O sections, find the imported symbols and string literals, and then search through those for any matches.
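A minimal sketch of the easy solution, assuming the names of interest appear in the dylib as plain byte strings. The function names, the WATCHLIST contents and the sample bytes are made up for illustration; a real check would read the actual tweak file from disk.

```python
from pathlib import Path


def find_markers(binary: bytes, names):
    """Return the names that occur as raw byte strings in `binary`."""
    return [n for n in names if n.encode("utf-8") in binary]


def scan_tweak(path, names):
    """Read the whole file at `path` and search it for the given names."""
    return find_markers(Path(path).read_bytes(), names)


# Illustrative watch list of API names to look for.
WATCHLIST = ["CLLocationManager", "SecItemCopyMatching", "NSURLSession"]

# Stand-in for the contents of a tweak dylib (not a real Mach-O file).
sample = b"\x00_OBJC_CLASS_$_CLLocationManager\x00_SecItemCopyMatching\x00"

print(find_markers(sample, WATCHLIST))
```

A plain substring search like this can report false positives (a name that merely appears inside a longer string), which is one reason parsing the imported symbols is the more precise option.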
I'm not sure if this answers your question from a user side, but if you are making a tweak, you can add XXX_CODESIGN_FLAGS = -Sentitlements.xml to your Makefile to add the entitlements described in entitlements.xml.

Unconcatenating files

I have a corrupted 7-zip archive that I am extracting manually using the method outlined by Igor Pavlov at this link. An intermediate result is a large file that is a bunch of files cat'ed together that must be separated manually. I understand that some file formats will need to be extracted manually by a human using discretion (text files, etc.) but many file formats encode the size of the file as part of the file itself (e.g. .zip). Furthermore, some files can be parsed and their size can be deduced with just a little information about the file format (e.g. .pdf). Let's say the large file consists of the following files concatenated together:
Key: <filename>(<contents>)
badfile(aaaaaaaaaaabbbbbbbbbcccccccdddddddd) -> zip1.zip(aaaaaaaaaaa)
badfile2(bbbbbbbbbcccccccdddddddd)
I am looking for a program that I can run on a large file (call it badfile) that can determine the type and size of the first logical file (let's say it's a .zip file) contained within and create a new file to hold the contents (e.g. zip1.zip since filenames are lost) and chop the file off the front of badfile. This would allow me to run the program in a loop to extract files with known types and/or pause and let the user handle the difficult cases. Does such a program exist? I know that the *nix command file(1) will do a lot of the work here, but there would be a lot of effort in encoding rules for sizing files (e.g. .pdf) that I would prefer to not duplicate.
I believe this question should be closed due to being off topic as it asks to find existing programs to solve the problem, but open bounty prevents close vote. However.
Does such a program exist?
Yes, they exist and are called data carving tools.
Some common ones include scalpel, foremost, and PhotoRec.
A list of other tools is available here
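To illustrate the idea behind carving a self-sizing format, here is a rough sketch for the ZIP case from the question. It assumes the archive is not ZIP64 and that the first end-of-central-directory signature found really belongs to the first archive; real carving tools are more careful than this. The helper names are hypothetical.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header signature at the start of a ZIP
EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory (EOCD) signature
EOCD_MIN = 22              # size of the fixed part of the EOCD record


def first_zip_length(blob: bytes):
    """Return the length of a ZIP archive at the start of blob, or None."""
    if not blob.startswith(LOCAL_SIG):
        return None
    pos = blob.find(EOCD_SIG)
    if pos < 0 or pos + EOCD_MIN > len(blob):
        return None
    # The last EOCD field (offset 20) is the length of the trailing comment.
    (comment_len,) = struct.unpack_from("<H", blob, pos + 20)
    return pos + EOCD_MIN + comment_len


def make_zip(name, payload):
    """Build a small in-memory ZIP archive for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


zip1 = make_zip("a.txt", b"aaaaaaaaaaa")
rest = make_zip("b.txt", b"bbbbbbbbb") + b"ccccccc"
badfile = zip1 + rest

n = first_zip_length(badfile)
carved, badfile2 = badfile[:n], badfile[n:]  # chop the first file off the front
print(n, len(badfile2))
```

Running the same function on badfile2 would carve the next archive, which is the loop the question describes.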

How do headers work in Objective-C?

Beyond allowing one file to use another file's attributes, what actually happens behind the scenes? Does it just provide the location to access to that file when its contents are later needed, or does it load the implementation's data into memory?
In short:
The header file defines the API for a module. It's a contract listing which methods a third party can call. The module can be considered a black box to third parties.
The implementation implements the module. It is the inside of the black box. As a developer of a module you have to write this, but as a user of a third party module you shouldn't need to know anything about the implementation. The header should contain all the information you need.
Some parts of a header file could be auto generated - the method declarations. This would require you to annotate the implementation as there are likely to be private methods in the implementation which don't form part of the API and don't belong in the header.
Header files sometimes have other information in them; type definitions, constant definitions etc. These belong in the header file, and not in the implementation.
The main reason for a header is to be able to #include it in some other file, so you can use the functions in one file from that other file. The header includes (only) enough to be able to use the functions, not the functions themselves, so (we hope) compiling it is considerably faster.
Maintaining the two separately mostly results from nobody ever having written an editor that automates the process very well. There's not really a lot of reason they couldn't do so, and a few have even tried to, but the editors that have done so have never done very well in the market, and the more mainstream editors haven't adopted it.
Well, I will try:
Header files are only needed in the preprocessing phase. Once the preprocessor is done with them, the compiler proper never sees the header files as such, only their contents pasted into the source. Obviously, the target system doesn't need them for execution either (the same way .c files aren't needed).
Libraries, by contrast, are resolved during the linking phase. If a program is dynamically linked and the target environment doesn't have the necessary libraries, in the right places and with the right versions, it won't run.
In C nothing like that is needed, since once you compile it you get native code. The header files are copy-pasted in when you #include them. It is very different from the byte-code you get from Java. There's no need for an interpreter (like the JVM): you just feed your binary to the CPU and it does its thing.

Rewriting symbols in static iOS libraries

I am working on an iOS app which links several static libraries. The challenge is, those linked libraries define same method names with different implementations. Oddly, I don't get any duplicate symbol definition errors; but, to no surprise, I end up with access to only one implementation of the method.
To be more clear, say I have libA and libB and they both define a global C method called func1()
When I link both libA and libB, and make a call to func1(), it resolves to either libA's or libB's implementation without any compilation warning. I, however, need to be able to access both libA's func1() and libB's func1() separately.
There's a similar SO post that explains how it can be done in C (via symbol renaming) but unfortunately, as I found out, objcopy tool doesn't work for ARM architecture (hence iPhone).
(I will submit it to the App Store, hence, dynamic linking is not an option)
It appears that you are in luck - you can still rename symbols with the ARM binary format, it's just a bit more hacky than the objcopy method...
NOTE: This has only been tested minimally, and I would strongly advise you to make a backup of all libraries in question before trying this!
Also note that this only works for files not compiled with the C++ compiler! This will fail if the C++ compiler was used on these files.
First, you will need a decent hex editor, for this example, I will be using Hex Fiend.
Next, you will open up a copy of one of your libraries, let's call it lib1-renamed.a, and do the following with it:
Find the name of the symbol you wish to rename. It can be found using the nm tool, or, if you know the name from the header, you should be set.
Next, use Hex Fiend to do a textual replace of the old name (in this case foo) with a new name (in this case bar). These names must have the same length, or it will corrupt the binary's offsets!
Note: if there is more than one function that contains foo in its name, you may have problems.
Now, you must edit the headers of the library you changed, to use the new function name (bar) instead of the old one.
If you have done the three simple† steps above properly, you should now be able to compile & link the two files successfully, and call both implementations.
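The textual replace step can also be scripted instead of done in a hex editor. The sketch below is a variation on the steps above, not the exact procedure: it assumes C symbol names are stored with a leading underscore and a terminating NUL byte, and it only replaces such whole-name occurrences to reduce the substring problem mentioned in the note. The function name and the sample bytes are hypothetical, and it is as minimally tested as the manual method, so work on a backup copy.

```python
def rename_symbol(blob: bytes, old: str, new: str):
    """Replace NUL-terminated `_old` with `_new`; return (new_blob, hit_count)."""
    if len(old) != len(new):
        # A different length would shift every offset after the name.
        raise ValueError("old and new names must have the same length")
    needle = b"_" + old.encode("ascii") + b"\x00"
    patched = b"_" + new.encode("ascii") + b"\x00"
    return blob.replace(needle, patched), blob.count(needle)


# Stand-in for part of a string table (not a real library).
sample = b"\x00_foo\x00_foobar\x00_baz\x00"
out, hits = rename_symbol(sample, "foo", "bar")
print(out, hits)
```

For a real library one would read lib1-renamed.a as bytes, call rename_symbol, and write the result back. The total size must stay unchanged, which the same-length check guarantees.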
If you are trying to do this with a universal binary (e.g. one that works on the simulator as well), you'd be best off using lipo to separate the two binaries, using objcopy on the i386/x64 binary, then using my method on the ARM binary, and using lipo to put them back together.
†: Simplicity is not guaranteed, nor is it covered by the Richard J. Ross III super warranty. For more information about the super warranty, call 1-800-FREE-WARRANTY now. That's 1-800-FREE-WARRANTY now!

How to make an object file that cannot be dead_stripped?

What is the easiest way to produce a Mach-O object file that does not have the SUBSECTIONS_VIA_SYMBOLS flag set, such that the linker (with -dead_strip) will not later try to cut the text section into pieces and guess which pieces are used?
I can use either a command-line option to llvm/gcc (4.2.1) that will prevent it from emitting .subsections_via_symbols in the first place, or a command-line tool that will remove the flag from an existing object file.
(Writing such a tool myself based on the Mach-O spec is an option, but if possible I'd rather not reinvent the wheel that hard).
Platform: iOS, cross-compiling from OSX with XCode 4.5.
Background: We're supplying a static library that other companies build into apps. When our library encounters a problem it produces a crash report with a stack trace and certain other key information that (if we're lucky) we get to analyze later. Typically the apps as deployed have been stripped of debug information so interpreting stack traces is a problem. If we were making the app ourselves we would just save the DWARF debug data from before stripping and use that to decode the addresses in the incoming crash reports. But we can't depend on the app makers supplying us with such data from their linking steps.
What we're doing instead is to let the crash report include the run-time address of a selected function; from that we can deduce the offset between addresses in our linker map and addresses in the crash report. We're linking our entire library incrementally into a single .o before we stuff it into an .a; since it does only one big thing, there wouldn't be much to save from removing unused functionality from it when the app is eventually linked. Unfortunately there are a few small pieces of code in the library that are sometimes not used (alternative API entry points for the main functionality, small helper functions for interpreting our error codes and the like). If the app developer links with -dead_strip, those pieces may be removed, so the relative offsets in the final app differ from the linker map of our incremental link operation, which breaks the address reconstruction for crash reports.
We can't realistically ask all app developers to disable dead-code stripping in their build process, so it seems a better way forward if we could mark our .o as "not dead-strippable" and have the eventual app linking respect that.
I solved it.
The output of an incremental link operation only has MH_SUBSECTIONS_VIA_SYMBOLS set if all the input objects have it set. And an object file produced from assembler input only has it set if there's an explicit directive set. So one can remove the flag by linking with an empty assembler input:
echo > empty.s
$(CC) $(CFLAGS) input.o empty.s -nostdlib -Wl,-r -o output.o
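To confirm that the flag is really gone from output.o, one can inspect the flags field of the Mach-O header. The following is a sketch that assumes a thin (non-fat), little-endian object file; the function names are hypothetical, and the two headers built at the end are synthetic stand-ins, not real object files.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O magic
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic
MH_SUBSECTIONS_VIA_SYMBOLS = 0x2000


def macho_flags(header: bytes) -> int:
    """Return the flags field from the start of a thin little-endian Mach-O."""
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        raise ValueError("not a thin little-endian Mach-O file")
    # Fields: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
    return struct.unpack_from("<7I", header, 0)[6]


def has_subsections_via_symbols(header: bytes) -> bool:
    return bool(macho_flags(header) & MH_SUBSECTIONS_VIA_SYMBOLS)


# Synthetic headers: cputype 12 (ARM), filetype 1 (MH_OBJECT).
with_flag = struct.pack("<7I", MH_MAGIC, 12, 0, 1, 0, 0, MH_SUBSECTIONS_VIA_SYMBOLS)
without_flag = struct.pack("<7I", MH_MAGIC, 12, 0, 1, 0, 0, 0)

print(has_subsections_via_symbols(with_flag),
      has_subsections_via_symbols(without_flag))
```

For a real file, passing the first 28 bytes of output.o to has_subsections_via_symbols is enough, since the flags field sits at byte offset 24 in both the 32-bit and 64-bit headers.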
