I'm working in the automotive field and my company intends to buy a stack analysis tool (a tool to compute the maximum stack usage for a given source code or binary). We use different targets ranging from 8-bit to 32-bit. Previously we used a home-made tool, and we are currently evaluating StackAnalyzer from AbsInt.
Any other tool suggestions would be helpful.
If you can be satisfied with the kind of approximation that can be made by doing the analysis at the source level, and you are using C, Frama-C's value analysis can give you an exhaustive list of call stacks (in terms of source functions) that can happen at run-time.
Frama-C also provides the building blocks to quickly convert these source-level possible call stacks into stack depths if you know precisely how your C compiler works: for each function, you can programmatically inspect local variables, arguments, ...
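To make that post-processing step concrete, here is a minimal sketch in plain Python: the call graph and the per-function frame sizes are made up for illustration (in practice they would come out of the analysis results and your knowledge of the compiler's frame layout), and it assumes a non-recursive call graph.

    # Hypothetical inputs: per-function frame size (bytes) and a call graph.
    # In practice you would extract both from the analysis results.
    frame_size = {"main": 32, "isr": 16, "read_sensor": 48, "filter": 64}
    calls = {
        "main": ["read_sensor", "filter"],
        "read_sensor": [],
        "filter": ["read_sensor"],
        "isr": ["read_sensor"],
    }

    def max_stack(func, seen=()):
        """Worst-case stack usage of a call starting at `func` (no recursion allowed)."""
        if func in seen:
            raise ValueError(f"recursion involving {func}: no static bound exists")
        deepest_callee = max(
            (max_stack(callee, seen + (func,)) for callee in calls[func]),
            default=0,
        )
        return frame_size[func] + deepest_callee

    print(max_stack("main"))                      # 144: main -> filter -> read_sensor
    print(max_stack("main") + max_stack("isr"))   # 208: worst case with one interrupt on top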
I was trying to understand why math might be so slow in Erlang, and if so, what I could do to find out where it is slow and try to speed it up. Some people said it's because it's a VM, but I doubt that, because the JVM is fast at math, and so is V8. Other people said it's because it's an immutable language, but OCaml is immutable and quite fast at math. So what makes Erlang slow at math, and how would I go about finding where in the code it is slow? I can only imagine using DTrace, since I don't know the Linux/BSD tools well enough to know which ones I should use, which of them are good at profiling code running inside a VM versus the VM itself, and whether those require different tools.
Before OTP24:
BEAM does not have a JIT, so typically the Erlang compiler (erlc) outputs bytecode: any math operation needs to access the VM registers (which are memory locations), perform the actual operation, place the result in the relevant VM register, and then jump to the next instruction. This is quite slow compared to just performing the actual operation in machine code.
If you use a language that compiles directly to machine code (like C), the compiler has a lot more information about the code and the platform, and is thus able to use features that speed up execution: optimizing for the processor's pipeline, using vectorized instructions, placing the most-accessed variables in processor registers, arranging memory accesses so that they hit the cache, and so on.
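To make the dispatch overhead concrete, here is a deliberately toy register-machine loop in Python. It has nothing to do with real BEAM bytecode; it only illustrates the decode / load-registers / operate / store / advance pattern that every interpreted math operation pays for, where compiled code would pay for the operation alone.

    # Each "instruction" is (op, destination, source1, source2).
    def run(program, regs):
        pc = 0
        while pc < len(program):
            op, dst, a, b = program[pc]          # decode
            if op == "mul":                      # dispatch
                regs[dst] = regs[a] * regs[b]    # load, operate, store
            elif op == "add":
                regs[dst] = regs[a] + regs[b]
            pc += 1                              # jump to the next instruction
        return regs

    # y * z + w expressed as two bytecode-like steps instead of two machine instructions
    regs = {"y": 3.0, "z": 4.0, "w": 5.0, "t": 0.0, "r": 0.0}
    run([("mul", "t", "y", "z"), ("add", "r", "t", "w")], regs)
    print(regs["r"])   # 17.0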
The HiPE compiler is there to compile Erlang code to native code, and you should use it if your program does a lot of math. Be sure to check its limitations, though.
If the HiPE compiler is not enough, you can always code the critical math operations in C and call them from Erlang (e.g., as NIFs).
Furthermore, you should check this question, which compares Erlang with C (and others) on a pure math problem.
Regarding immutability, it shouldn't have any impact unless your integers are placed on the process's heap (i.e., they are larger than 60 bits), because those are the only ones that require explicit memory handling.
To profile Erlang code, you have these tools for dealing with it from within Erlang itself.
Lastly, you can always post your snippet here and maybe we'll be able to point something out.
After OTP24:
Erlang now has a JIT; some reports state as much as a 25% improvement.
I came across a new dynamic language. I would like to create a coverage tool for it. I started reading the source code of the Perl 5 and Python coverage modules, but it got complicated. It's a dynamic scripting language, so I guess the source code of tools for static languages (like Java and C++) won't help me here. Also, as I understand it, each language is built in a different way and the same ideas won't transfer directly, although the big concepts should be similar.
My question is as follows: how do I "attack" this task? What is the proper workflow I need to follow? What do I need to investigate? Are there any books or blogs I can read about this kind of thing?
There are two kinds of coverage collection mechanisms:
1) Real-time sampling of the program counter, typically by a clock ticking every 1-10 ms. Difficulties: a) mapping an actual PC value back to a source line, and b) sampling means you might never see execution of a rarely used bit of code, so your coverage reporting is inaccurate. Because of these issues, this approach isn't used very often.
2) Instrumenting the program so that it collects coverage as it runs. This is hard to do with object code: a) you have to decode the instructions to see where to put probes, and this can be very hard to do right; b) you have to patch the object code to include the probes (this can be really awkward; a "probe" might consist of a 5-byte subroutine call, but it may have to replace a single-byte instruction); c) you still have to figure out how to map a probe location back to a source code line. A more effective way is to instrument the source code, which requires pretty sophisticated machinery to read the source, insert probes, and regenerate the instrumented code for execution/compilation (a minimal sketch of the probe idea follows this list).
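To make the source-level instrumentation idea concrete, here is a minimal sketch in Python. The probe calls are written by hand here; a real tool would parse the source and generate them automatically, and the point IDs are just hypothetical (file, line) pairs.

    executed = set()               # which source points actually ran

    def probe(point_id):
        executed.add(point_id)

    # What an instrumenter might turn a user function into: one probe per
    # statement (or branch arm) of interest.
    def abs_value(x):
        probe(("demo.py", 1))
        if x < 0:
            probe(("demo.py", 2))
            return -x
        probe(("demo.py", 3))
        return x

    abs_value(5)
    all_points = {("demo.py", 1), ("demo.py", 2), ("demo.py", 3)}
    print("covered:", executed)                 # points 1 and 3
    print("missed :", all_points - executed)    # the x < 0 arm never ran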
My technical paper Branch Coverage for Arbitrary Languages Made Easy provides explicit detail for how to do this in a general way. My company has built commercial test coverage tools for a wide variety of languages (C, Python, PHP, COBOL, Java, C++, C#, ProC,....) using this approach. This covers most static and dynamic languages. Some dynamic mechanisms are extremely difficult to instrument, e.g., eval() but that is true of every approach.
In addition to Ira's answer, there is a third coverage collection mechanism: the language implementation provides a callback that can inform you about program events. For example, Python has sys.settrace: you provide it a function, and Python calls your function for every function called or returned, and every line executed.
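As a rough sketch of what that looks like in practice, here is a bare-bones line-coverage collector built on sys.settrace (the real coverage.py does far more, e.g. branch arcs and a C tracer for speed):

    import sys

    covered = set()   # (filename, line number) pairs seen during execution

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer          # keep tracing inside this frame

    def program_under_test(n):
        total = 0
        for i in range(n):
            total += i
        return total

    sys.settrace(tracer)
    program_under_test(3)
    sys.settrace(None)

    for filename, lineno in sorted(covered):
        print(f"{filename}:{lineno}")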
For a study on genetic programming, I would like to implement an evolutionary system on the basis of LLVM and apply code mutations (possibly at the IR level).
I found llvm-mutate, which is quite useful for executing point mutations.
As far as I understand it, the instructions get counted/numbered, and one can then, e.g., delete a numbered instruction.
However, introducing new instructions only seems to be possible by reusing one of the statements already available in the code.
Real mutation, however, would allow inserting any of the allowed IR instructions, irrespective of whether it is already used in the code being mutated.
In addition, it should be possible to insert calls to functions from linked libraries (not used in the current code, but available because the library has been linked in by clang).
Did I overlook this in llvm-mutate, or is it really not possible so far?
Are there any projects trying to implement (or that have already implemented) such mutations for LLVM?
LLVM has lots of code analysis tools that should allow implementing the aforementioned approach, but LLVM is huge, so I'm a bit disoriented. Any hints on which tools could be helpful (e.g., for getting a list of available library functions)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from its documentation that llvm-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the halting problem, i.e., it would be impossible to know whether a randomly generated instruction would completely crash the whole computer (for example, by assigning a value to an OS-reserved pointer), or whether it might run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance whether a given program will do that. Mind you, llvm-mutate can still cause a perfectly harmless program to crash or run forever, but I think its approach makes that less likely by only reusing existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: in nature, real mutations work a lot more like llvm-mutate than like what we do in normal genetic programming. In other words, they simply swap letters out of a very limited set (A, T, C, G), and every possible variation comes out of this. We could have a program, or set of programs, with an initial set of instructions plus a set of "possible functions" either linked in or defined in the program. Most of these functions would not actually be used, but they would be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would cover the complete (or semi-complete) set of possible functions for a problem space. Then we simply apply basic operations like the ones in llvm-mutate.
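As a toy illustration of that idea (plain Python, with opcode names standing in for IR instructions; the gene pool plays the role of the linked-in but as-yet-unused functions):

    import random

    # The "raw DNA": everything a mutation may draw from, including functions
    # that no current individual actually uses yet.
    GENE_POOL = ["add", "sub", "mul", "div", "sqrt", "log", "sin", "clamp"]

    def mutate(genome, rate=0.1):
        """Point mutations in the spirit of llvm-mutate: delete, replace, insert."""
        out = []
        for gene in genome:
            r = random.random()
            if r < rate / 3:
                continue                              # delete this instruction
            if r < 2 * rate / 3:
                out.append(random.choice(GENE_POOL))  # replace it
            else:
                out.append(gene)                      # keep it unchanged
            if random.random() < rate / 3:
                out.append(random.choice(GENE_POOL))  # insert a new one after it
        return out

    parent = ["add", "mul", "add"]
    print(mutate(parent))   # e.g. ['add', 'sqrt', 'mul', 'add'] -- varies per run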
Some possible problems, though:
1) Given the amount of possible variability, the only way to have acceptable execution times would be to have massive amounts of computing power, possibly achievable in the cloud or with GPUs.
2) You would still have to contend with Mr. Turing's halting problem. However, I think this could be resolved by running the solutions in a "sandbox" that doesn't take you down if the solution blows up: something like a single-use virtual machine or a Docker-like container, with a time limit (to get out of infinite loops). A solution that crashes or times out would get the worst possible fitness, so that the programs would tend to diverge away from those paths (a minimal sketch of such a sandboxed evaluation follows this list).
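A sandboxed evaluation along those lines can be sketched quite simply; here a child process with a timeout stands in for the single-use VM or container, and candidate.py is a hypothetical generated program that prints its own score:

    import subprocess

    WORST_FITNESS = float("inf")    # we minimize, so crashes and hangs are maximally bad

    def evaluate(candidate_path, timeout_s=2.0):
        """Run one generated program in a child process; punish crashes and hangs."""
        try:
            result = subprocess.run(
                ["python", candidate_path],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return WORST_FITNESS            # probably an infinite loop
        if result.returncode != 0:
            return WORST_FITNESS            # crashed
        try:
            return float(result.stdout.strip())
        except ValueError:
            return WORST_FITNESS            # produced garbage output

    print(evaluate("candidate.py"))         # hypothetical candidate program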
As to why do this at all, I can see a number of interesting applications: self-healing programs, programs that self-optimize for a specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open-source project here. It would be insane, dangerous, and a time-sucking vortex: just my kind of project. Count me in if someone starts doing it.
Are floating point operations in Delphi deterministic?
That is, will I get the same result from an identical floating-point mathematical operation in the same code compiled with the Delphi Win32 compiler as I would with the Win64 compiler, or the OS X compiler, or the iOS compiler, or the Android compiler?
This is a crucial question as I'm implementing multiplayer support in my game engine, and I'm concerned that the predictive results on the client side could very often differ from the definitive (and authoritative) determination of the server.
The consequence of this would be the appearance of "lag" or "jerkiness" on the client side whenever the authoritative game-state data overrules the predicted state on the client.
Since I don't realistically have the ability to test dozens of different devices on different platforms compiled with the different compilers in anything resembling a "controlled condition", I figure it's best to put this question to the Delphi developers out there to see if anyone has an internal understanding of floating-point determinism at a low level in the compilers.
I think there is no simple answer. A similar task was discussed here.
In general, there are two standards for the representation of floating-point numbers: IEEE 754-1985 and IEEE 754-2008.
All modern (and actually quite old) CPUs follow these standards, which guarantees some things:
1) The binary representation of the same standard floating-point type will be identical (see the bit-pattern comparison sketch after this list).
2) The result of some operations (not all, only the basic operations!) is guaranteed to be identical for the same inputs, but only if the compiler uses the same kind of instruction; I am not sure this always holds.
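If you want to verify that two results are truly bit-for-bit identical (rather than merely close), you can compare their binary representations directly. A small illustration of the idea in Python; the same check is easy to write in Delphi by comparing the raw bytes of the doubles:

    import struct

    def bits(x):
        """The exact 64-bit IEEE 754 pattern of a double, as a hex string."""
        return struct.pack("<d", x).hex()

    a = 0.1 + 0.2
    b = 0.3
    print(a == b, bits(a) == bits(b))   # False False: the two doubles differ by one ulp
    print(bits(a))
    print(bits(b))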
But if you use some extended operations, such as square root, the result may vary even between different models of desktop CPUs. You can read this article for some details:
http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
P.S. As tmyklebu mentioned, square root is also defined by IEEE 754, so it is possible to guarantee the same result for the same input for add, subtract, multiply, divide, and square root. A few other operations are also defined by IEEE 754, but for all the details it is better to read the standard itself.
Putting aside the standards for floating-point calculations for a moment, consider that the 32-bit and 64-bit compilers compile to use the old FPU vs. the newer SSE instructions. I would find it difficult to trust that every calculation would always come out exactly the same on different hardware implementations. Better to go the safe route and assume that if the results are within a small delta of each other, you evaluate them as equal.
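In code, "within a small delta" usually means a relative/absolute tolerance comparison rather than ==. A quick illustration in Python (Delphi's Math unit offers SameValue for the same purpose); the specific values here are made up:

    import math

    server_x = 1.0000000000000002   # authoritative result from the server
    client_x = 1.0000000000000004   # predicted result computed on different hardware

    print(server_x == client_x)                            # False: bit-exact comparison
    print(math.isclose(server_x, client_x, rel_tol=1e-9))  # True: equal within tolerance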
From experience I can tell that the results are different: 32-bit works with Extended precision by default, while 64-bit works with double precision by default.
Consider the statement

    x, y, z: double;
    x := y * z;

In Win32 this will execute as x := double(Extended(y) * Extended(z));
In Win64 this will execute as x := double(double(y) * double(z));
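The effect of those different intermediate precisions is easy to demonstrate. Delphi's Extended type isn't available from Python, so this sketch uses single vs. double intermediates instead, but the phenomenon is the same: widening the intermediates changes the final rounded result.

    import numpy as np

    a = np.float32(1.0)
    b = np.float32(1e-8)    # smaller than float32's machine epsilon relative to 1.0

    # All intermediates in single precision (analogous to the Win64 double-only case):
    narrow = np.float32(np.float32(a + b) - a)                            # -> 0.0

    # Intermediates widened to double, result rounded back to single
    # (analogous to the Win32 Extended-intermediate case):
    wide = np.float32((np.float64(a) + np.float64(b)) - np.float64(a))    # -> ~1e-8

    print(narrow, wide)     # 0.0 vs roughly 1e-08: same expression, different result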
Even if you put a lot of effort into ensuring that you use the same precision and rounding mode, whenever you call third-party libraries you need to consider that they may internally change these flags.
I've been reading up on JIT's and LuaJIT's trace compiler in particular, and I ended up with some questions.
From what I understand, LuaJIT's JIT doesn't compile hot methods like Java's HotSpot does; it compiles hot paths originating from loops. Does this mean that if something doesn't originate from a loop (say, I call Lua functions from the C API), the code will never be JITted? And what happens when you hit another loop? Will the path to the second loop be JITted, and then a new path from that loop JITted as well, or will the second loop be part of the same path?
How does the interpreter choose the most optimal hot path? Let's say I have a hash table of ints -> strings. Now imagine that I've called table[x] with x being 3 and 5 enough times that those have become hot paths and been JITted; how does the interpreter decide which JITted code to call for table[x] where x is 4?
Another thing that has been racking my brain: since paths are compiled, not functions, won't a trace compiler require more memory? You can't really reuse the compiled code of another path, and paths will probably be larger than single functions in the general case.
Mike Pall responded in quite some detail on the LuaJIT mailing list:
http://www.freelists.org/post/luajit/How-does-LuaJITs-trace-compiler-work,1
The first thing you need to understand is the LuaJIT IR and bytecode, which you can check out on the wiki. The bytecode is what the LuaJIT interpreter runs and traces; the recorded traces determine what needs to be compiled, and additional optimizations such as loop unrolling are applied to hot loops on the traced path.
The second place to check is the LJ FAQ, which has this to say:
Q: Where can I learn more about the compiler technology used by LuaJIT?

I'm planning to write more documentation about the internals of LuaJIT. In the meantime, please use the following Google Scholar searches to find relevant papers:

Search for: Trace Compiler
Search for: JIT Compiler
Search for: Dynamic Language Optimizations
Search for: SSA Form
Search for: Linear Scan Register Allocation

Here is a list of the innovative features in LuaJIT. And, you know, reading the source is of course the only way to enlightenment. :-)
Albeit very tongue-in-cheek (mainly because Mike focuses on development rather than documentation), the most important part is the last sentence: the source is very clean, and reading it is the only real way to find out how LuaJIT does its magic. The innovative-features link also gives a few more clues about what to search for.
Wikipedia has a more descriptive page on tracing JITs; however, the papers referenced at the bottom are what you'll find most useful for understanding the concepts used in the LuaJIT source.
Some of the source files (in C) to get you started:
lj_record.c: core trace recorder, converts bytecode into IR
lj_trace.c: more trace management
lj_snap.c: handles/creates trace snapshots
lj_ffrecord.c: records data for fast functions
lj_crecord.c: records C data ops