Is there an -Os equivalent in Rust?

Executing rustc -C help shows (among other things):
-C opt-level=val -- optimize with possible levels 0-3, s, or z
The levels 0 to 3 are fairly intuitive, I think: the higher the level, the more aggressive the optimizations. However, I have no clue what the s and z options do, and I couldn't find Rust-related information about them.

It seems you are not the only one confused, as described in a Rust issue. The options follow the same pattern as Clang's:
Os optimizes for size when compiling.
Oz optimizes for size even more aggressively.

Looking at the relevant lines in Rust's source code, I can say that s means optimize for size, and z means optimize for size some more.
All optimizations seem to be performed by the LLVM code-generation engine.

Within LLVM, the Os and Oz pass sequences are quite similar. Oz invokes 260 passes (measured with LLVM 12.0), whereas Os invokes 264. Oz's sequence of analyses and optimizations is almost a strict subsequence of Os's, except for one pass (opt -loops), which appears at a different position within Os. That said, the effects of the optimizations can still differ, because the two levels use different cost models, i.e., constants that steer the behavior of individual optimizations. Thus, optimizations that affect size, like loop unrolling and inlining, can behave differently in these two sequences.
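For reference, the same levels can also be selected through a Cargo profile; a minimal sketch (the lto line is just an illustrative extra, not required for size optimization):

# Cargo.toml
[profile.release]
opt-level = "z"   # "s" optimizes for size; "z" optimizes for size even more
lto = true        # optional: link-time optimization often shrinks binaries further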

Related

GCC ppc64 aligned functions

I'm using GCC to build some powerpc64 executables, but sometimes there is filler between functions (see screenshot).
PowerPC instructions are still 4 bytes each; I tried some GCC options (-fno-align-functions), but the compiler still inserts filler bytes between functions.
I want my functions to start directly after the end of the previous function, without any zero filling (in the case of the screenshot, the function should start at 0x124).
Thanks.
The PPC64 ABI specifies a traceback table appended to functions. The zeroes may be due to the traceback table and not related to alignment. Try using the -mtraceback=no command-line option.
In addition to the traceback table issue noted in the previous answer, functions are normally aligned on a 16-byte boundary. This is important for various reasons, including so the compiler can align hot loops on a 16-byte boundary for improved icache performance. Assembly code from GCC will have a directive like:
.p2align 4,,15
before each function definition to enforce this. So even without the traceback table your function will not start at address 0x124 without more effort.
This behavior can be overridden using -fno-align-functions, or using optimization level -Os (optimize for size). I've tried both methods, and they both remove the .p2align directive. Using -fno-align-functions is preferable unless you really want smaller and potentially slower code.
(If you are compiling with -O0 or -O1, you won't see the directive either, but we do not recommend compiling at such low optimization levels for either size or speed.)
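To see this concretely, here is a minimal sketch (the file name is illustrative; -mtraceback=no is only accepted on PowerPC targets):

/* align.c -- two trivial functions, used to inspect inter-function padding */
int f(void) { return 1; }
int g(void) { return 2; }

gcc -O2 -S align.c                        # with optimization, the assembly typically has .p2align before each function
gcc -O2 -fno-align-functions -S align.c   # the .p2align directive is gone
gcc -O2 -mtraceback=no -c align.c         # additionally drops the traceback table on PPC64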

Erlang: Will adding type spec to code make dialyzer more effective?

I have a project that doesn't have -spec or -type annotations in its code. Currently, Dialyzer finds some warnings, most of them in machine-generated code.
Will adding type specs to the code make Dialyzer find more errors?
Off topic: is there any tool to check whether the specs are violated or not?
Adding typespecs will dramatically improve the accuracy of Dialyzer.
Because Erlang is a dynamic language, Dialyzer must default to a rather broad interpretation of types unless you give it hints to narrow the "success" typing that it will go by. Think of it as giving Dialyzer a filter by which it can narrow a set of possible successes down to a subset of explicit types that should ever work.
This is not the same as Haskell, where the default assumption is failure and all code must be written with successful typing to be compiled at all -- Dialyzer must default to assume success unless it knows for sure that a type is going to fail.
Typespecs are the main part of this, but Dialyzer also checks guards, so a function like
increment(A) -> A + 1.
is not the same as
increment(A) when A > 100 -> A + 1.
though both may be typed as
-spec increment(integer()) -> integer().
Most of the time you only care about integer values being integer(), pos_integer(), neg_integer(), or non_neg_integer(), but occasionally you need an arbitrary range bounded only on one side -- and the type language has no way to represent this currently (though personally I would like to see a declaration of 100..infinity work as expected).
The unbounded-range of when A > 100 requires a guard, but a bounded range like when A > 100 and A < 201 could be represented in the typespec alone:
-spec increment(101..200) -> pos_integer().
increment(A) -> A + 1.
Guards are fast, with the exception of calling length/1 (which you should probably never actually need in a guard), so don't worry about the performance overhead until you can actually demonstrate a performance problem that comes from guards. Using guards and typespecs to constrain Dialyzer is extremely useful. They also serve as documentation for yourself, especially if you use edoc, since the typespec will be shown there, making APIs less mysterious and easier to grasp at a glance.
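Pulling the pieces above together, here is a minimal sketch you could feed to Dialyzer (module and function names are illustrative):

%% demo.erl
-module(demo).
-export([increment/1, bad/0]).

%% The bounded range lives in the spec, so Dialyzer narrows the success typing to it.
-spec increment(101..200) -> pos_integer().
increment(A) when A > 100, A < 201 -> A + 1.

%% Dialyzer flags this call as breaking the contract: 5 is outside 101..200.
bad() -> increment(5).

Run it with something like dialyzer --src demo.erl, after building a PLT once with dialyzer --build_plt --apps erts kernel stdlib.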
There is some interesting literature on the use of Dialyzer in existing codebases. A well-documented experience is here: Gradual Typing of Erlang Programs: A Wrangler Experience. (Unfortunately, some of the other links I learned a lot from previously have disappeared or moved. A careful read of the Wrangler paper, a skim of the User's Guide and man page, playing with Dialyzer, and some prior experience with a type system like Haskell's will more than prepare you to get a lot of mileage out of Dialyzer, though.)
[On a side note, I've spoken with a few folks before about specifying "pure" functions that could be guaranteed as strongly typed, either with a notation or by using a different definition syntax (maybe Prolog's :- instead of Erlang's ->, or something). That would be cool, and it is very possible even now to concentrate side effects in a tiny part of the program and pass all results back in a tuple of {Results, SideEffectsTODO}, but this is simply not a pressing need, and Erlang works pretty darn well as-is. Dialyzer is indeed very helpful for showing you where you've lost track of yourself, though!]

Lua floating point operations

I run Lua on a CPU without dedicated floating-point hardware, so it depends on software emulation.
From luaconf.h I can see that some macros are set to double, but it does not clearly state when floats are used, and it's a little hard to track.
If my script does simple stuff like:
a=0
a=a+1
for...
Would that involve floating-point operations at any level?
If not, that's fine, but what then is the benefit of changing the macros to long?
(I tried, of course, but it did not work.)
All numeric operations in Lua are performed (according to the default configuration) in floating point. There is no distinction made between floating point and integer, all values are simply numbers.
The actual C type used to store a Lua number is set in luaconf.h, and it is both allowed and even practical to change that to a suitable integral type. You start by changing LUA_NUMBER from double to int, long, or perhaps ptrdiff_t. Then you will find you need to tweak the related macros that control the conversions between strings and numbers. And, of course, you will likely need to eliminate most or all of the base math library since math.sin() and its friends and neighbors are not particularly useful over integers.
The result will be a Lua interpreter where all numbers are integers. The language will still allow you to type 3.14, but it will be stored as 3. Your code will likely not be completely portable to a Lua interpreter built with the standard configuration since a huge amount of Lua code casually assumes that floating point arithmetic is permitted, and remember that your compiled byte code will definitely not be compatible since byte code will store numbers as LUA_NUMBER.
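As a rough sketch of the kind of edits involved in luaconf.h (Lua 5.1 macro names; a complete integer build needs more changes than shown, e.g. the number-to-string macros):

/* luaconf.h -- switching the number type from double to long (illustrative excerpt) */
#define LUA_NUMBER              long
#define LUA_NUMBER_SCAN         "%ld"                   /* scanf format for reading numbers  */
#define LUA_NUMBER_FMT          "%ld"                   /* printf format for writing numbers */
#define lua_str2number(s,p)     strtol((s), (p), 10)    /* string-to-number conversion       */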
There is the LNUM patch (used, for example, by the OpenWrt project, which relies heavily on Lua for providing a web UI on hardware without an FPU) that allows a dual integer/floating-point representation of numbers in Lua, with conversions happening behind the scenes when required. With it, most integer computations are performed without resorting to the FPU. Unfortunately, it is only applicable to Lua 5.1; 5.2 is not supported.

Could the compiler automatically parallelize code that support parallelization?

Instead of adding the PLINQ extension method AsParallel() manually, couldn't the compiler figure this out for us auto-magically? Are there any examples where you specifically do not want parallelization even if the code supports it?
I do research in the area of automatic parallelization. This is a very tricky topic that is the subject of many Ph.D. dissertations. Static code analysis and automatic parallelization have been done with great success for languages such as Fortran, but there are limits to the compiler's ability to analyze inter-regional dependencies. Most people would not be willing to sacrifice the guarantee of code correctness in exchange for potential parallel performance gains, so the compiler must be quite conservative about where it inserts parallel markers.
Bottom line: yes, a compiler can parallelize code. But a human can often parallelize it better, and having the compiler figure out where to put the markers can be very, very, very tricky. There are dozens of research papers available on the subject, such as the background work for the Mitosis parallelizing compiler or the D-MPI work.
Auto parallelization is trickier than it might initially appear. It's that "if the code supports it" part that gets you. Consider something like
counter = 0
f(i)
{
counter = counter + 1
return i + counter
}
result = map(f, {1,2,3,4})
If the compiler just decides to parallelize map here, you could get different results on each run of the program. Granted, it is obvious that f doesn't actually support being used in this manner because it has a global variable. However, if f is in a different assembly, the compiler can't know that it is unsafe to parallelize. It could introspect the assembly f is in at runtime and decide then, but then it becomes a question of "are the introspection and the dynamic parallelization fast enough not to negate the benefits of doing this?" And sometimes introspection may not be possible: f could be a P/Invoked function that might actually be perfectly suited to parallelization, but since the runtime/compiler has no way of knowing, it has to assume it isn't. This is just the tip of the iceberg.
In short, it is possible, but difficult, so the trade-off between implementing the magic and the benefits of the magic were likely skewed too far in the wrong direction.
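A short C# sketch of that hazard (all names are illustrative): the delegate mutates shared state, so the parallel version both races on counter and loses element order unless AsOrdered() is added.

// Demo.cs -- parallelizing a side-effecting delegate changes the results.
using System;
using System.Linq;

class Demo
{
    static int counter = 0;

    static int F(int i)
    {
        counter++;           // unsynchronized shared state: a data race under AsParallel()
        return i + counter;
    }

    static void Main()
    {
        var sequential = new[] { 1, 2, 3, 4 }.Select(F).ToArray();
        counter = 0;
        var parallel = new[] { 1, 2, 3, 4 }.AsParallel().Select(F).ToArray();
        Console.WriteLine(string.Join(",", sequential)); // always 2,4,6,8
        Console.WriteLine(string.Join(",", parallel));   // order and values can differ per run
    }
}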
The good news is that the compiler researchers say that we are really close to having automatically parallelizing compilers. The bad news is that they have been saying that for fifty years.
The problem with impure languages such as C# is usually that there is not enough parallelism. In an impure language, it is very hard for a human and pretty much impossible for a program to figure out if and how two pieces of code interact with each other or with the environment.
In a pure, referentially transparent, language, you have the opposite problem: everything is parallelizable, but it usually doesn't make sense, because scheduling a thread takes way longer than just simply executing the damn thing.
For example, in a pure, referentially transparent, functional language, if I have something like this:
if a <= b + c
foo
else
bar
end
I could fire up five threads and compute a, b, c, foo and bar in parallel, then I compute the + and then the <= and lastly, I compute the if, which simply means throwing away the result of either foo or bar, which both have already been computed. (Note how this depends on being functional: in an impure language, you cannot simply compute both branches of an if and then throw one away. What if both print something? How would you "unprint" that?)
But if a, b, c, foo and bar are really cheap, then the overhead of those five threads will be far greater than the computations themselves.
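In Haskell terms, the idea looks roughly like this sketch (requires the parallel package; par only sparks a computation, and the runtime decides whether it actually runs in parallel):

-- Choose.hs: speculatively evaluate both branches, then discard one.
import Control.Parallel (par, pseq)

choose :: Int -> Int -> Int -> Int -> Int -> Int
choose a b c foo bar =
  foo `par` (bar `pseq` (if a <= b + c then foo else bar))

main :: IO ()
main = print (choose 1 2 3 10 20)   -- 1 <= 5, so the foo branch (10) wins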

Does how you name a variable have any impact on the memory usage of an application?

How much impact (if any) does the length of a variable's name have on the total memory usage of an application? Is there a maximum length, anyway? Or are we free to make our variable (and instance) names as elaborate as we want?
It depends on the language, actually.
If you're using C++ or C, it has no impact.
If you're using an interpreted language, you're passing the source code around, so it can have a dramatic impact.
If you're using a compiled language that compiles to an intermediate language, such as Java or any of the .NET languages, then typically the variable names, class names, method names, etc. are all part of the IL. Having longer method names will have an impact. However, if you later run the output through an obfuscator, this goes away, since the obfuscator will rename everything to (typically) very short names. This is why obfuscation often yields small performance improvements.
However, I will strongly suggest using long, descriptive variable/method/class names. This makes your code understandable, maintainable, and readable - in the long run, that far outweighs any slight perf. benefit.
It has no impact in a compiled language.
In compiled languages, almost certainly not; everything becomes a symbol in a symbol table. In interpreted languages, the answer is also no, with a few extremely rare exceptions (in certain older versions of Python there would be a difference, for example).
MSVC++ truncates variable names to 255 characters. Variable name length has no impact on compiled code size.
As stated by others, variable names disappear in compiled languages. I believe that local variable names in .Net may be discarded. But generally speaking, even in an interpreted language, the memory consumption of variable names is negligible, especially in light of the advantages of good variable names.
Actually, in ASP.NET, long variable names for controls and master pages do add to the size of the generated HTML. This will add some insignificant extra memory to buffer the output stream, but the effect will be most noticeable in the extra few hundred bytes you're sending over the network.
In Python, the names appear to be collected into a number of simple tables; each name appears exactly once in each code object. The names have no impact on performance.
For statistical purposes, I looked at a 20-line function that was a solution to Project Euler problem 15. This function created a 292-byte code object. It used 7 distinct names in the name table. You'd have to use 41-character variable names to double the size of the byte-code file.
That would be the only impact -- insanely large names might slow down load time for your module.
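You can see this directly in CPython (a small probe; exact byte counts vary by version):

# name_probe.py -- names live once in the code object's tables;
# the bytecode refers to them by index, so name length never touches it.
def f():
    a_very_long_and_descriptive_local_name = 1
    return a_very_long_and_descriptive_local_name

print(f.__code__.co_varnames)   # ('a_very_long_and_descriptive_local_name',)
print(len(f.__code__.co_code))  # bytecode length, independent of the name's length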
