Compiler flag to prevent partial redundancy elimination in gfortran

I'm trying to get a little more speed out of a large Fortran program I'm working with. I'm not (yet) intimately familiar with the code, so I thought compiler optimization would be a good first stop. Using gfortran's -O1 option is fine, but -O2 and -O3 produced runtime errors in the form of unexpected NaNs. By breaking -O2 down into its constituent flags, I determined that -ftree-pre was the problem and everything else works fine.
Is there a way of suppressing just the -ftree-pre flag? That way I can compile with -O3 -no-ftree-pre, if that makes sense.
I've already determined which part of the code is producing the error, so my long term plan is to eliminate the error. But this will constitute a quick fix for now.

As documented in the GCC manual page:
Many options have long names starting with -f or with -W, for example -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo would be -fno-foo. This manual documents only one of these two forms, whichever one is not the default.
In your case, in order to negate the effect of -O3 enabling -ftree-pre, you should append -fno-tree-pre after the -O3 flag.
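For example, assuming a hypothetical source file main.f90, the invocation would look like:
gfortran -O3 -fno-tree-pre -o prog main.f90
Because the last occurrence of a flag wins, this keeps everything else that -O3 enables while disabling only partial redundancy elimination.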

Related

What does .text.unlikely mean in ELF object files?

In my objdump -t output, I see the following two lines:
00000000000004d2 l F .text.unlikely 00000000000000ec function-signature-goes-here [clone .cold.427]
and
00000000000018e0 g F .text 0000000000000690 function-signature-goes-here
I know l means local and g means global. I also know that .text is a section, or a type of section, in an object file, containing compiled program instructions. But what is .text.unlikely? Assuming it's a different section (or type-of-section) from .text - what's the difference?
In my GCC v5.4.0 manpage, I found the following switch:
-freorder-functions
which says:
Reorder functions in the object file in order to improve code
locality. This is implemented by using special subsections
".text.hot" for most frequently executed functions and
".text.unlikely" for unlikely executed functions. Reordering is done
by the linker so object file format must support named sections and
linker must place them in a reasonable way.
Also profile feedback must be available to make this option effective.
See -fprofile-arcs for details.
Enabled at levels -O2, -O3, -Os.
It looks like this binary was compiled with optimization enabled (or with that switch passed explicitly), so its functions were organized into subsections to improve spatial locality.
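As a rough illustration (not from the original question; the file and function names are invented), GCC also places functions into .text.unlikely when they carry the cold attribute, even without profile feedback:

/* cold_demo.c - hypothetical example. The cold attribute tells GCC the
   function is rarely executed; it is optimized for size and typically
   emitted into the .text.unlikely subsection. */
__attribute__((cold))
void report_fatal_error(void)
{
    /* rarely executed error path */
}

void normal_work(void)
{
    /* frequently executed code stays in .text */
}

Compiling with gcc -O2 -c cold_demo.c and inspecting objdump -t cold_demo.o should show report_fatal_error in .text.unlikely and normal_work in .text. The [clone .cold.427] suffix in your output likely comes from GCC splitting the rarely executed part of a function into a separate cold clone, which lands in the same subsection.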

Flex multiple .l file arguments don't work? (e.g. "flex a.l b.l")

I finally have a comfortable-enough workflow for writing my flex programs, and I'll work bison into it soon (I dabbled with it before but I restarted my project entirely).
flex yy.l; flex flip.l will generate lex.yy.c and lex.flip.c correctly, since I use the prefix option. But I am curious why flex yy.l flip.l or flex *.l does not.
gcc lex* works perfectly fine when all the .c files have been generated correctly, as by the first command, but trying the same shortcut with flex produces a single lex.yy.c file, which looks valid up to the point where the unprocessed contents of flip.l are pasted onto the end, preventing gcc compilation.
Is this just flex telling me my workflow is dumb and I should use more start conditions in a big file? I'd prefer not to, at least until I have a more complete program to tweak for speed.
My workflow is:
fg 1; fg 2; fg 3; fg 4; flex a.l; flex flip.l; flex rot.l; gcc -g lex*; ./a.out < in
With nano editors as jobs 1, 2, 3, 4 to fg out of the background.
I'm lexing the file in this order: flip, rot, a, rot, flip. And it works, and I can even use preprocessor definitions (gcc -DALONE) to compile my .c files alone for testing.
I think what flex is telling you, if anything, is to learn how to use make rather than trying to put together massive build commands.
It's true that flex will only process one file per invocation. On the other hand, both gcc and clang are simply drivers which invoke the actual compiler(s) and linker(s) so that you don't have to write more complicated build recipes. You could easily write a little driver program which invoked flex multiple times, once per argument (a sketch follows below), but it would be even simpler to use make, with the additional advantage that flex would only be invoked as necessary.
In fact, most large C projects do not use gcc's ability to compile multiple files in a single invocation. Instead, they let make figure out which object files need to be rebuilt (because the corresponding source file changed), thereby considerably speeding up the debug/edit/build cycle.
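As a rough sketch of such a driver (entirely hypothetical, and no substitute for a makefile), it only needs to shell out to flex once per argument:

/* flexall.c - hypothetical driver: run "flex <file>" for each argument. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char cmd[4096];
    for (int i = 1; i < argc; i++) {
        /* Build and run one flex invocation per input file. */
        int n = snprintf(cmd, sizeof cmd, "flex %s", argv[i]);
        if (n < 0 || (size_t)n >= sizeof cmd || system(cmd) != 0) {
            fprintf(stderr, "flex failed on %s\n", argv[i]);
            return EXIT_FAILURE;
        }
    }
    return EXIT_SUCCESS;
}

After gcc -o flexall flexall.c, running ./flexall yy.l flip.l invokes flex separately on each file. A makefile with one rule per scanner remains the better long-term answer, since it also skips files that have not changed.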

Is there an -Os equivalent in Rust? [duplicate]

Executing rustc -C help shows (among other things):
-C opt-level=val -- optimize with possible levels 0-3, s, or z
The levels 0 to 3 are fairly intuitive, I think: the higher the level, the more aggressive optimizations will be performed. However, I have no clue what the s and z options are doing and I couldn't find Rust-related information about them.
It seems you are not the only one confused, as described in a Rust issue. The levels follow the same pattern as Clang's:
Os For optimising the size when compiling.
Oz For even more size optimisation.
Looking at the relevant lines in Rust's source code, I can say that s means optimize for size, and z means optimize for size some more.
All optimizations seem to be performed by the LLVM code-generation engine.
These two sequences, Os and Oz, are pretty similar within LLVM. Oz invokes 260 passes (I am using LLVM 12.0), whereas Os invokes 264. Oz's sequence of analyses and optimizations is almost a strict subsequence of Os's, except for one pass (opt -loops), which appears in a different place within Os. That said, notice that the effects of the optimizations can still differ, because they use different cost models, e.g., constants that determine the behavior of individual optimizations. Thus, optimizations that have an impact on size, like loop unrolling and inlining, can behave differently in these two sequences.
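For completeness, selecting these levels looks like this on the command line (the file name is a placeholder):
rustc -C opt-level=s main.rs
rustc -C opt-level=z main.rs
In a Cargo project the equivalent is setting opt-level = "s" (or "z") under the [profile.release] section of Cargo.toml.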

GCC ppc64 aligned functions

I'm using GCC to build some powerpc64 executables, but sometimes between functions I see the following issue: Screenshot
PowerPC instructions are still 4 bytes each; I tried some gcc options (-fno-align-functions) but the compiler still fills bytes between functions.
I want my functions to start directly after the end of the previous function, without any values/zeros filled in (in the case of the screenshot the function should start at 0x124).
Thanks.
The PPC64 ABI specifies a traceback table appended to functions. The zeroes may be due to the traceback table and not related to alignment. Try using the -mtraceback=no command-line option.
In addition to the traceback table issue noted in the previous answer, functions are normally aligned on a 16-byte boundary. This is important for various reasons, including so the compiler can align hot loops on a 16-byte boundary for improved icache performance. Assembly code from GCC will have a directive like:
.p2align 4,,15
before each function definition to enforce this. So even without the traceback table your function will not start at address 0x124 without more effort.
This behavior can be overridden using -fno-align-functions, or using optimization level -Os (optimize for size). I've tried both methods, and they both remove the .p2align directive. Using -fno-align-functions is preferable unless you really want smaller and potentially slower code.
(If you are compiling with -O0 or -O1, you won't see the directive either, but we do not recommend compiling at such low optimization levels for either size or speed.)
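A quick way to verify this on your own code (file names are placeholders):
gcc -O2 -S example.c
grep p2align example.s        # the directive is present
gcc -O2 -fno-align-functions -S example.c
grep p2align example.s        # now it should be gone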

meaning of llvm[n] when compiling llvm, where n is an integer

I'm compiling LLVM as well as clang. I noticed that the output of compilation has llvm[1]: or llvm[2]: or llvm[3]: prefixed to each line. What do those integers in brackets mean?
Apparently, it's not connected to the number of the compilation job (easily checked via make -j 1). The autoconf-based build system indicates the "level" of the makefile inside the source tree. To be precise, it's the value of make's MAKELEVEL variable.
The currently accepted answer is not correct. Furthermore, this is really a GNU Make question, not a LLVM question.
What you're seeing is the current value of the MAKELEVEL variable echoed by make to the command line. This value is set as a result of recursive execution. From the GNU make manual:
As a special feature, the variable MAKELEVEL is changed when it is passed down from level to level. This variable’s value is a string which is the depth of the level as a decimal number. The value is ‘0’ for the top-level make; ‘1’ for a sub-make, ‘2’ for a sub-sub-make, and so on. The incrementation happens when make sets up the environment for a recipe.
If you have a copy of the GNU Make source code on hand, you can see your output message being generated in output.c with the void message(...) function. In GNU make v4.2, this happens on line 626. Specifically the program argument is set to the string "llvm" and the makelevel argument is set as noted above.
Since it was erroneously brought up: it is not the number of the compilation job. The -j [jobs] or --jobs[=jobs] options enable parallel execution of up to jobs recipes simultaneously. If -j or --jobs is given without a jobs value, GNU Make attempts to execute as many recipes simultaneously as possible. See this section of the GNU Make manual for more information.
It is possible to have recursive execution without parallel execution, and parallel execution without recursive execution. This is the main reason that the currently accepted answer is not correct.
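A tiny experiment (the file name is invented) makes the mechanics visible, since MAKELEVEL is exported to the environment of every recipe:

/* showlevel.c - hypothetical: print the MAKELEVEL this process inherited. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *lvl = getenv("MAKELEVEL");
    printf("MAKELEVEL = %s\n", lvl ? lvl : "(not set)");
    return 0;
}

Run straight from a shell it prints (not set); run from a make recipe it prints the already-incremented value that a sub-make launched at that point would inherit, exactly as the quoted manual text describes.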
It's the number of the compilation job (make -j). Helpful to trace compilation errors.
