GCC optimisation: use of ARM conditional instructions?

I'm looking at some code compiled for iOS in Xcode (so compiled for ARM with gcc), and as far as I can see the compiler never uses ARM's feature of allowing arbitrary instructions to carry a condition, but instead always branches on a condition, as would be the case on Intel and other architectures.
Is this simply a restriction of GCC (I can understand that it might be: perhaps "condition = branch" is embedded at too high a level in the compiler architecture to allow otherwise), or is there a particular optimisation flag that needs to be turned on to enable generation of conditional instructions?
(Obviously I appreciate I'm making big assumptions about where conditional instructions "ought" to be used and would actually be an optimisation, but I have experience of programming earlier ARM chips and of using and analysing the output of Acorn's original ARM C compiler, so I have a rough idea.)
Update: Having investigated this further thanks to the information below, it turns out that:
Xcode compiles in Thumb-2 mode, in which conditional execution of arbitrary instructions is not available;
under some circumstances it does, however, use the ITE (if-then-else) form of the IT instruction to effectively produce conditionally executed instructions.

Seeing some actual assembly would make things clear, but I suspect that the default settings for iOS compilation prefer generation of Thumb code instead of ARM for better code density. While there are pseudo-conditional instructions in Thumb32 aka Thumb-2 (supported in ARMv7 architecture via the IT instruction), the original Thumb16 only has conditional branches. Also, even in ARM mode there are some instructions that cannot be conditional (e.g. many NEON instructions use the extended opcode space with condition field set to NV).
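To illustrate (a hand-written sketch, not captured compiler output; the register allocation shown is assumed), a function like this is a typical candidate for conditional execution:

/* Clamps negative values to zero; a candidate for branch-free code. */
int clamp_negative(int x)
{
    if (x < 0)
        x = 0;   /* ARM mode:  cmp r0, #0 / movlt r0, #0          */
    return x;    /* Thumb-2:   cmp r0, #0 / it lt / movlt r0, #0  */
}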

Yes, gcc does not really produce optimal code with respect to conditional instructions. It works well in the simplest cases, but real code suffers from pointless slowdowns that can be avoided in hand-coded ARM asm. Just to give you a rough idea, I was able to get a 2x speedup on a very low-level graphics blit method by doing the read/write and copy logic in ARM asm instead of the C code emitted by gcc. But keep in mind that this optimisation is only worth it for the most heavily used parts of your code: it takes a lot of work to write well-optimised ARM asm, so don't even attempt it unless there is a real benefit.
The first thing to keep in mind is that Xcode uses Thumb mode by default, so in order to generate ARM asm you will need to add the -mno-thumb option to the per-file compiler flags for the .c file that will contain the ARM asm. Once ARM asm is being emitted, you will want to conditionally compile the asm statements as indicated in the answer to the following question:
ARM asm conditional compilation question
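A minimal sketch of such a guard (the function here is illustrative; GCC predefines __arm__ for ARM targets and __thumb__ when compiling in Thumb mode):

#if defined(__arm__) && !defined(__thumb__)
/* ARM mode: use a conditional instruction instead of a branch. */
static inline int abs_cond(int x)
{
    int r = x;
    __asm__("cmp   %0, #0\n\t"
            "rsblt %0, %0, #0"          /* negate only if x < 0 */
            : "+r"(r) : : "cc");
    return r;
}
#else
/* Fallback when the file is built as Thumb or for another target. */
static inline int abs_cond(int x) { return x < 0 ? -x : x; }
#endif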

Related

Is Dart a compiled or an interpreted language?

I've searched a lot on the internet, and in the documentation, for whether Dart is a compiled or an interpreted language, but I didn't get a clear answer as to whether Dart is compiled, interpreted, or both, and how that works.
Yes!
The concepts of "compiled language" and "interpreted language" are not well defined.
Dart is definitely compiled in some cases. Say, when compiling to JavaScript for the web. That translates the program to a program in a different language, while preserving runtime behavior, which is the definition of compilation. (So "compilation" is well defined!)
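As a concrete illustration (the file names here are placeholders, and exact flags may vary by SDK version), the Dart SDK exposes that compiler as a command:
dart compile js -o main.js main.dart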
To be pedantic, languages are not inherently compiled or interpreted. Implementations of languages may compile or interpret.
When running Dart on the native VM, directly from source code, it looks very much interpreted. But "interpretation" is a much mushier concept than compilation.
The only thing which is fundamentally and unequivocally interpretation is executing code directly by interpreting the source code, with no intermediate representation.
Nobody does that, not since BASIC. It's just too inefficient.
These days, you at least convert the source code to an abstract syntax tree. That is technically compilation (it converts a program from one representation to another while preserving semantics). Then you might compile to a different internal representation, maybe to bytecode, maybe even to native code.
Sometimes you do all of these, doing just-in-time compilation to the next level when optimization calls for it.
The Dart VM compiles source code, through a number of steps, into native machine code. (Which is then interpreted by the CPU.)
Whether that counts as "interpretation" or not is really a philosophical question.
From the outside, it matches one of the definitions of interpretation ("program goes in, runtime behavior occurs", as opposed to compilation which is "program goes in, other program comes out").
Internally, the VM very much considers itself a compiler.

How to restrict a BenchmarkDotNet job to run only on specific platforms?

I am writing an F# port of a program I wrote in native code in the past. I used BenchmarkDotNet to measure its performance. I also placed a native EXE in the application's output directory.
I set my native program as the baseline benchmark and saw it was 5x faster than my F# program. Just as I expected!
However, the native program is posted on GitHub and distributed as a Win64 binary only. If somebody using another OS tries to run it, it will crash.
So, how do I specify that this benchmark should run only on 64-bit Windows?
In BenchmarkDotNet, there is a concept of Jobs. Jobs define how the benchmark should be executed.
So, you can express your "x64 only" condition as a job. Note that there are several different x64 JIT compilers depending on the runtime (LegacyJIT-x64 and RyuJIT-x64 for the full .NET Framework, RyuJIT-x64 for .NET Core, and the Mono JIT compiler). You can request not only a specific platform but also a specific JIT compiler (it can significantly affect performance), e.g.:
[<RyuJitX64Job>]
type MyBenchmarks () =   // illustrative container class for the benchmark
    [<Benchmark>]
    member this.MyAwesomeBenchmark () = // ...
In this case, a user will be notified that it's impossible to compile the benchmark for the required platform.
Unfortunately, there is no way to require a specific OS for now (there is only one option: the current OS). So, in your case, it's probably better to check System.Environment.Is64BitOperatingSystem and System.Environment.OSVersion at the start and skip the benchmarks on unsupported operating systems.

Can bytecode produced by luac be used on computers with no Lua library?

If I compile a regular .lua file with luac, can the result be run without the Lua library or interpreter installed?
No. You can run it on a version of Lua that was built without the compiler, but you still need the Lua interpreter to execute the code.
Incidentally, the compiled Lua bytecode is also machine-specific; i.e. you can't compile on one architecture and then run that output on another architecture unless you understand the subtleties (endianness, sizes of types, etc.).
If your code doesn't use any dynamic-loading facility (that's loadstring, loadfile, require, etc.), you can strip the Lua library down to just the VM, because what the compiler emits is code to be run on that virtual machine. This can easily cut Lua's already small footprint to a third of the original.
However, since this is NOT native binary code for any currently existing architecture, you still CAN'T run it directly without the assistance of the VM.
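For instance, after compiling with something like luac -o out.luac script.lua, a minimal C host for the embedded VM might look like this (a sketch; the file name is a placeholder, and it assumes the standard Lua 5.x C API):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);   /* or register only the libraries you kept */

    /* luaL_loadfile accepts precompiled bytecode as well as source. */
    if (luaL_loadfile(L, "out.luac") != 0 || lua_pcall(L, 0, 0, 0) != 0) {
        fprintf(stderr, "%s\n", lua_tostring(L, -1));
        lua_close(L);
        return 1;
    }
    lua_close(L);
    return 0;
}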

Translation of machine code into LLVM IR (disassembly / reassembly of x86_64, x86, ARM into LLVM bitcode)

I would like to translate x86_64, x86, and ARM executables into LLVM IR (disassembly).
What solution do you suggest?
mcsema is a production-quality binary lifter. It takes x86 and x86-64 binaries and statically "lifts" them to LLVM IR. It's actively maintained, BSD-licensed, and has extensive tests and documentation.
https://github.com/trailofbits/mcsema
Consider using the RevGen tool developed within the S2E project. It allows converting x86 binaries to LLVM IR. The source code can be checked out from the RevGen branch of the Git repository available at https://dslabgit.epfl.ch/git/s2e/s2e.git.
Regarding the RevGen tool mentioned by @bsa2000, the recent paper "A compiler level intermediate representation based binary analysis and rewriting system" has pointed out some limitations in S2E and Revnic.
I quote them here.
Shortcoming of dynamic translation:
S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly, which limits the application of LLVM analyses to only one block at a time.
IR incomplete:
Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for the current execution; consequently, various whole-program analyses will provide incomplete information.
No abstract stack or promotion information:
Further, the translated code retains all the assumptions of the original binary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.
I doubt there will be a universal solution (think about indirect branches, etc.); LLVM IR is much "higher level" than any assembler. It is possible to translate on a per-basic-block basis, though. You might want to check the llvm-qemu and libcpu projects, among others.
Just posting some references on translating ARM binaries to LLVM IR:
disarm - arm binary to llvm ir disassembler
https://code.google.com/p/disarm/
However, I have not tried it, so I am not sure about its quality and stability. Can anyone else post additional information about this project?
There is a new project, still in its early phases: libbeauty.
https://github.com/jcdutton/libbeauty
Article about the project: "Libbeauty: Another Reverse-Engineering Tool", 24 December 2013, Michael Larabel - http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU
It only supports a subset of x86_64 as input for now. One of the project's goals is to be able to compile the generated LLVM IR back to assembly, to get a binary with the same functionality.

Vala: reducing the size of dependencies

I am developing small command-line utilities using Vala on win32. Programs compiled using Vala depend on the following DLLs:
libgobject-2.0-0.dll
libgthread-2.0-0.dll
libglib-2.0-0.dll
They are taking up 1500 KB of space. Is there a way to reduce the size of these dependencies (besides compressing them with UPX and the like)? I can't imagine a simple hello-world-like app using all the features provided by glib.
Thanks!
If your Vala source is fairly simple, you may be able to compile it in the posix profile:
valac --profile posix hello.vala
Then your binary will not have any dependency outside of the standard C library. However, the posix profile may still be experimental.
