I am relatively new to Xcode. We are testing an app that displays incoming data and it needs to be as fast as possible. With other platforms I need to change from "debug" to "release" in order for optimizations to kick in and debug code to be removed, which can have a profound effect on speed. What are the equivalent things I need to do in Xcode to build in fast/release mode?
(I am googling this and see lots of hits that seem to be in the general vicinity but I might be a little thrown off by the terminology, I might need it dumbed down a bit :))
Thanks for the help.
The first step is to set the Optimization Level build setting for the Release configuration. There are lots of options here. From the clang/LLVM compiler man page (man cc); note that -Os is the default for Release:
Code Generation Options
    -O0 -O1 -O2 -O3 -Ofast -Os -Oz -O -O4
        Specify which optimization level to use:

        -O0     Means "no optimization": this level compiles the fastest and
                generates the most debuggable code.

        -O1     Somewhere between -O0 and -O2.

        -O2     Moderate level of optimization which enables most
                optimizations.

        -O3     Like -O2, except that it enables optimizations that take longer
                to perform or that may generate larger code (in an attempt to
                make the program run faster).

        -Ofast  Enables all the optimizations from -O3 along with other
                aggressive optimizations that may violate strict compliance
                with language standards.

        -Os     Like -O2 with extra optimizations to reduce code size.

        -Oz     Like -Os (and thus -O2), but reduces code size further.

        -O      Equivalent to -O2.

        -O4 and higher
                Currently equivalent to -O3
You will notice the 'Ofast' option -- very fast, but somewhat risky, because it may break strict language-standard compliance (for example, in floating-point math).
A second step is to consider enabling "Unroll Loops". I've read that for some code this can give a speed increase of around 15% (at the expense of debuggability, which is not an issue for Release builds).
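To make "unrolling" concrete, here is a rough sketch of the transformation, written in Python purely for illustration. The compiler applies it to the generated machine code, not to your source, and Python itself will not get the speedup. The function names and the unroll factor of 4 are assumptions made up for this example.

```python
def sum_rolled(values):
    """Plain loop: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same result, but each iteration of the main loop handles four elements,
    so the loop test and index update run roughly a quarter as often."""
    total = 0
    n = len(values)
    limit = n - (n % 4)  # largest multiple of 4 that fits in the input
    i = 0
    while i < limit:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Leftover elements (0 to 3 of them) are handled one at a time.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The trade-off is visible even in the sketch: fewer loop checks, but more code, which is why unrolling can increase binary size.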
Next, consider whether you want to build and use an Optimization Profile. See Apple's documentation for details, but the gist is that:
Profile Guided Optimization (PGO) is a means to improve compiler
optimization of an app. PGO utilizes a specially instrumented build of
the app to generate profile information about the most commonly used
code paths and methods. The compiler then uses this profile
information to focus optimization efforts on the most frequently used
code, taking advantage of the extra information about how the program
typically behaves to do a better job of optimization.
You define the profile and whether you use it under Build Settings -> Apple LLVM 6.0 - Code Generation -> Use Optimization Profile.
First, have a look at this part of Xcode (the screenshot is from Xcode 5, but it is the same in Xcode 6).
You should also prefer PNG to JPEG (JPEG requires more computation to decode, although JPEG files are generally smaller and therefore better for the network).
Finally, use multithreading.
Those are (in my humble opinion) the first steps to look at.
Edit the scheme to use the Release build configuration (Product > Scheme > Edit Scheme, then change Build Configuration for the Run action).
Related
I am able to cross-compile some Fortran 90 code (a large block written by someone else, so I do not want to convert it) using x86_64 GNU/Linux as the build system and aarch64-linux as the host system, with dynamic linking. However, I want to generate a statically linked binary, so I added -static to the mpif90 call. When I do this, I get this warning:
/home/me/CROSS-REPOS/glibc-2.35/math/../sysdeps/ieee754/dbl-64/e_log.c:106: warning: too many GOT entries for -fpic, please recompile with -fPIC
When I add this flag as in "mpif90 -static -fPIC", the same warning appears. I also tried the -mcmodel=large option, as in "mpif90 -static -mcmodel=large", to no avail.
I then checked the options for "/home/me/CROSS-JUL2022/lib/gcc/aarch64-linux/12.1.0/../../../../aarch64-linux/bin/ld" and saw this one: --long-plt (to generate long PLT entries and to handle large .plt/.got displacements). But trying "mpif90 -static -Wl,--long-plt" says --long-plt is not an option. How do I invoke this --long-plt option, then?
One other thing, I know static linking will make the binaries a fair amount bigger but do not want to carry libs over to the Android device. Furthermore, some reading is indicating that dynamic linking on the Android device could lead to some security issues. Thanks for any suggestions.
As far as I know, GCC has a website that documents the relationships between the different optimization flags (GCC example website). For example, -fpartial-inlining is only useful when -findirect-inlining is turned on.
I think the same thing would happen in clang; in other words, I think the different passes in LLVM (clang) may have this kind of dependency/conflict relationship.
But after checking all the documentation provided by the developers, I find that it only describes what each pass does (LLVM PASS DOC).
So my question can be divided into two parts:
Do such dependencies/conflicts exist between LLVM passes, or are there none?
If there are, how can I find them?
You can find out which passes clang uses at each optimization level by compiling any C or C++ code with clang, and then try to work out the dependencies from that. For example:
clang -O2 --target=riscv32 -mllvm -debug-pass=Structure example.c
(You can also use -debug-pass=Arguments instead of -debug-pass=Structure; which one is more readable is a matter of preference.)
This shows which passes clang runs at the -O2 optimization level for the riscv32 target. If you don't specify a target, it defaults to your host machine's target. Keep in mind that the passes used at the same optimization level can differ between targets.
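Once you have the output for two optimization levels, comparing them by hand gets tedious. Below is a minimal sketch of a helper that diffs two captures of -debug-pass=Arguments output. It assumes each relevant line starts with "Pass Arguments:" followed by dash-prefixed pass names, and that you saved the compiler's diagnostic output to a file or string yourself. The function name and the sample strings are made up for illustration (the samples are shortened, not real clang output), and the diff only tells you which passes differ, not why one depends on another.

```python
def parse_pass_arguments(output_text):
    """Collect pass names from every 'Pass Arguments:' line in the text."""
    prefix = "Pass Arguments:"
    passes = set()
    for line in output_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            for token in line[len(prefix):].split():
                # "-loop-unroll" -> "loop-unroll" (only leading dashes removed)
                passes.add(token.lstrip("-"))
    return passes


# Hypothetical, shortened captures used only to exercise the parser.
SAMPLE_O0 = "Pass Arguments:  -tti -verify\n"
SAMPLE_O2 = (
    "Pass Arguments:  -tti -verify -simplifycfg -instcombine\n"
    "Pass Arguments:  -loop-unroll\n"
)

only_in_o2 = sorted(parse_pass_arguments(SAMPLE_O2) - parse_pass_arguments(SAMPLE_O0))
print(only_in_o2)  # ['instcombine', 'loop-unroll', 'simplifycfg']
```

Running the same comparison for two different --target values would show how the pass list changes between targets.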
I have been trying to compile my code using -pg to enable profiling in the simulator and once I do that it gives me linker errors.
Compilation command
hexagon-clang++ main.cpp -o hello -mv62 -pg
Error
hexagon-clang++ main.cpp -o hello -mv62 -pg
Error: /tmp/main-924ac3.o(.text+0x30): undefined reference to `mcount'
Error: /tmp/main-924ac3.o(.text+0x130): undefined reference to `mcount'
Fatal: Linking had errors.
This is my first time writing code for a DSP chip, specifically the Hexagon 682. Are there any tutorials or references other than the programmer reference manual because they haven't been very useful in helping me understand how things work. Specially I don't understand how SIMD programming works. I am not sure what's the size of SIMD registers. Also, it seems that using floating point on DSP chips is not a great idea. So would it be better to convert my code to use fixed point?
You can use hexagon-sim to generate the profiling data without rebuilding instrumented binaries.
hexagon-sim --profile ./hello will generate the gmon input file(s) necessary for hexagon-gprof to consume.
e.g. (taken from SDK 3.3.3 Examples/)
hexagon-clang -O2 -g -mv5 -c -o mandelbrot.o mandelbrot.c
hexagon-clang -O2 -g -mv5 mandelbrot.o -o mandelbrot -lhexagon
hexagon-sim -mv5 --timing --profile mandelbrot
hexagon-gprof mandelbrot gmon.t*
Note also that the SDK comes with hexagon-profiler, a richer tool that allows you to see in depth performance counters -- information beyond just which code was executed and how often.
See "Hexagon Profiler User Guide" (doc number 80-N2040-10 A) for details.
Are there any tutorials or references other than the programmer
reference manual because they haven't been very useful in helping me
understand how things work.
Specially I don't understand how SIMD programming works. I am not sure
what's the size of SIMD registers.
Hexagon's vector programming extension is called "HVX". There's an HVX-specific PRM available at https://developer.qualcomm.com/software/hexagon-dsp-sdk/tools -- it describes the 512-bit and 1024-bit vector modes.
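To get a feel for what those vector widths mean in practice, here is a small arithmetic sketch of how many elements fit in one vector register. The element widths (8, 16 and 32 bits) are chosen for illustration, and the function name is made up; check the HVX PRM for the element types each instruction actually supports.

```python
def lanes_per_vector(vector_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide the vector width evenly")
    return vector_bits // element_bits


for vector_bits in (512, 1024):
    for element_bits in (8, 16, 32):
        lanes = lanes_per_vector(vector_bits, element_bits)
        print(f"{vector_bits}-bit vector, {element_bits}-bit elements: {lanes} lanes")
```

So a 512-bit vector is 64 bytes and a 1024-bit vector is 128 bytes; a single vector instruction operates on all of those lanes at once.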
I am trying to build clang, but the build size is quite large. As clang supports non-C-family languages as well (e.g. Java, Fortran), is there a way to turn that off during the build? I just want support for C and C++ and don't care about other languages.
Is there a CMake option that needs to be set to do that?
Thanks a lot!
Best Regards,
Nitish
As others have commented, clang is a C/C++ front end only, and there's no Java/Fortran front end to disable.
However, there are other ways to reduce the clang build size:
Choosing a suitable build configuration
The default build configuration for LLVM/clang is Debug. Building for Debug (that is, not specifying a build configuration) results in huge executables, and the build folder may take more than 20 GB. This is primarily due to debug information.
If you're not developing clang, and don't need debug information, you may build for MinSizeRel, which is a release build that is optimized for size.
Tweaking build settings
If you are planning to debug clang or do light clang development, another option is to build with minimal debug information: the -gmlt option keeps only line-level debug information, which still allows source stepping and results in much more compact object files than full debug information (-g).
Disabling build components
You may disable some components from building, such as tests and examples:
-DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
Putting it together:
cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
For compact debug build:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS=-gmlt -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
Hope this helps!
The answer is easy: clang is a C/C++ front end. It supports neither Java nor Fortran, so there is no such option; there is nothing to turn off.
I'm not sure how much it would help, but you could optimize your compilation of clang for size. Disabling debug symbols as others have said should also help. Set CFLAGS="-Os" CXXFLAGS="-Os" as environment variables when you build clang.
This is from GCC 4.8.5
-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
-Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
I want to use the DS-5 Streamline profiler to profile my code. The documentation mentions that, to be able to see call stacks, the code needs to be compiled with the compiler option -fno-omit-frame-pointer. This option exists in gcc.
Is there an equivalent option for clang also?
-fno-omit-frame-pointer is not working for me with clang.
I have also tried setting the compiler optimization level to 0, but still I am not getting call stacks in streamline.
It looks like DS-5 is an ARM thing, so this might not be relevant, but I ran into a similar issue trying to get good stack traces out of a clang-compiled executable using Linux's perf profiler.
The problem ended up being that, on x86-64 Linux at least, Clang requires both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer in order to get the same behavior that gcc gives with only -fno-omit-frame-pointer. See this LLVM bug: "Need both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer to get a fp on linux on a leaf function"