Why is -compile(export_all) bad practice? - erlang

All the erlang books seem to say export_all is bad practice but don't give a reason. In the end most modules spend a majority of their time with compile(export_all) because constantly updating the list of modules to remove the helper functions is a hassle. Is it bad practice because I'm supposed to care about the functions I expose to other developers? Or is it bad practice because there's some kind of performance cost in the number of functions a module has, because of maybe things like hot code loading. If there is a performance hit to stuffing a module with a lot of functions, how bad is it?

For several reasons:
Clarity: it's easier to see which functions are intended to be used outside the module.
When you tab-complete in the Erlang shell you get a list of only the exported functions. When you refactor the module, you know which functions you can safely rename without external users depending on them.
Code smell: you get warnings for unused functions, so you'll avoid dead code.
Optimization: the compiler might be able to make more aggressive optimizations knowing that not all functions have to be exported.
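To make this concrete, here is a minimal sketch of what an explicit export list looks like compared with -compile(export_all). (the module and its functions are invented for the example):

    -module(my_queue).                      %% hypothetical module name
    -export([new/0, push/2, pop/1]).        %% the public interface, visible at a glance

    new() -> [].

    push(Item, Queue) ->
        ok = validate(Item),
        Queue ++ [Item].

    pop([Head | Rest]) -> {Head, Rest}.

    %% Internal helper: not exported, so it can be renamed or removed freely,
    %% and the compiler warns if it ever becomes unused. With
    %% -compile(export_all). both of those benefits disappear.
    validate(Item) when is_integer(Item) -> ok.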

While I don't know for sure whether there are any practical performance implications of using -compile(export_all)., I doubt they are significant enough to care about.
However, there is a benefit to declaring the list of exports explicitly. By doing this, everyone can figure out the interface of the module by looking at the first page of the .erl file. Also, as with many other things that we tend to write down, explicit declaration of the module interface helps to maintain its clarity.
That said, when I start working on a new Erlang module I always type -module(...). -compile(export_all). Once the interface becomes mature enough, I add an explicit -export([...]) while keeping the export_all compile option.

Having a defined list of which functions are external, and therefore which ones are internal, is extremely useful for anyone who will work on your code in the future. I've recently been refactoring some old code, and the use of export_all in most of the modules has been a continual source of annoyance.

Related

Why are compilers non-reentrant by default?

I'm reading some yacc- and lex-related material and some other compiler implementations, and it seems like they all use global state, which makes them really unsafe to use in multithreaded situations and hard to embed in other programs. I know GNU Bison and Flex can be made re-entrant, but why is that not the default?
Because when the interface for Lex and Yacc was defined, many many many years ago, the use of globals was much more common. Reentrancy changes the interface, and the reentrant interfaces have never been formally standardised (which is probably just as well, given the state of play). At the time, multithreading was not very common, largely because a typical computer just barely had the resources to do one compilation (and sometimes not even that; it was also pretty common for compilation passes to be sequentially loaded executables).
So the default continues to be the non-reentrant, standardised interface. And it probably will remain that way, whether or not we like it.

LLVM-based code mutation for genetic programming?

For a study on genetic programming, I would like to implement an evolutionary system on the basis of LLVM and apply code mutations (possibly at the IR level).
I found llvm-mutate, which is quite useful for executing point mutations.
As far as I understand, the instructions get counted/numbered, and one can then, e.g., delete a numbered instruction.
However, introducing new instructions only seems to be possible using statements already available in the code.
Real mutation, however, would allow inserting any of the allowed IR instructions, irrespective of whether they are already used in the code to be mutated.
In addition, it should be possible to insert calls to functions from linked libraries (not used in the current code, but possibly available because the library has been linked in by clang).
Did I overlook this in llvm-mutate, or is it really not possible so far?
Are there any projects trying to implement (or that have already implemented) such mutations for LLVM?
LLVM has lots of code analysis tools that should allow implementing the aforementioned approach, but LLVM is huge, so I'm a bit disoriented. Any hints as to which tools could be helpful (e.g. for getting a list of available library functions)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from their documentation that LLVM-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the "Halting Problem", i.e. it would be impossible to know whether a randomly generated instruction would completely crash the whole computer (for example, by assigning a value to an OS-reserved pointer), or whether it might run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance whether a given program would do that. Mind you, LLVM-mutate can still cause a perfectly harmless program to crash or run forever, but I think their approach makes it less likely by only reusing existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: in nature, real mutations work a lot more like LLVM-mutate than like what we do in normal Genetic Programming. In other words, they simply swap letters out of a very limited set (A, T, C, G), and every possible variation comes out of this. We could have a program or set of programs with an initial set of instructions, plus a set of "possible functions" either linked or defined in the program. Most of these functions would not actually be used, but they would be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would cover the complete (or semi-complete) set of possible functions for a problem space. Then, we simply use basic operations like the ones in LLVM-mutate.
Some possible problems though:
Given the amount of possible variability, the only way to have acceptable execution times would be to have massive amounts of computing power. Possibly achievable in the cloud or with GPUs.
You would still have to contend with Mr. Turing's Halting Problem. However, I think this could be resolved by running the solutions in a "sandbox" that doesn't take you down if the solution blows up: something like a single-use virtual machine or a Docker-like container, with a time limitation (to get out of infinite loops). A solution that crashes or times out would get the worst possible fitness, so that the programs would tend to diverge away from those paths.
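Just to make that sandboxing idea concrete, here is a rough sketch in Erlang of time-limited evaluation, assuming the candidate can be invoked as a fun that returns a fitness score (the module, function names and sentinel value are made up; in a real setup the candidate would more likely be a compiled binary launched inside a VM or container):

    %% Illustrative module: run a candidate as an isolated process; crashes
    %% and timeouts map to the worst possible fitness, so evolution steers
    %% away from them.
    -module(gp_eval).
    -export([evaluate/2]).

    -define(WORST_FITNESS, -1.0e9).   %% made-up sentinel value

    evaluate(CandidateFun, TimeoutMs) ->
        Parent = self(),
        {Pid, Ref} = spawn_monitor(fun() -> Parent ! {self(), CandidateFun()} end),
        receive
            {Pid, Fitness} ->
                erlang:demonitor(Ref, [flush]),
                Fitness;
            {'DOWN', Ref, process, Pid, _Crash} ->
                ?WORST_FITNESS                                 %% candidate crashed
        after TimeoutMs ->
                erlang:demonitor(Ref, [flush]),
                exit(Pid, kill),                               %% stuck in an infinite loop
                receive {Pid, _Late} -> ok after 0 -> ok end,  %% drop a late result, if any
                ?WORST_FITNESS
        end.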
As to why do this at all, I can see a number of interesting applications: self-healing programs, programs that self-optimize for a specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open source project here. It would be insane, dangerous and a time-sucking vortex: just my kind of project. Count me in if someone is doing it.

Static analysis vs static typing

I'm learning Elixir, and the tool 'dialyzer' lets you do static analysis: you annotate function definitions with type specifications for the parameters they expect and the output they return. It's completely optional, but if it were used to the fullest extent possible, how would it match up to good ol' static typing?
My impression was that dialyzer is not as exact as static typing, meaning that it sometimes doesn't report an error, although it should.
On the plus side, if dialyzer complains, it's almost always my fault. More often than not, errors are due to an incorrect typespec.
So, while I don't think dialyzer is as good a tool as static typing, it still helps. In particular, I find typespecs very useful, since they can serve as documentation. Recently I switched jobs, and the project I joined is a complex Erlang project. Owing to the typespecs, it was easy to find my way around the codebase.
So my advice is to use typespecs in larger projects. We write them only for exported (public) functions and records, and it's a big help without taking up too much time. I usually first make the code work, and when I'm happy with it, add specs and run dialyzer to verify all is fine.
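To give a flavour of what that looks like on the Erlang side (a made-up example, not taken from any real project), a spec on an exported function doubles as documentation and gives dialyzer a contract to check callers against:

    -module(user_store).                    %% hypothetical module
    -export([find/2]).

    -type user_id() :: pos_integer().
    -type user()    :: #{id := user_id(), name := binary()}.

    %% The spec states the contract; dialyzer flags callers that pass the
    %% wrong kinds of arguments or mishandle the result.
    -spec find(user_id(), [user()]) -> {ok, user()} | not_found.
    find(Id, Users) ->
        case lists:search(fun(#{id := I}) -> I =:= Id end, Users) of
            {value, User} -> {ok, User};
            false         -> not_found
        end.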
While static typing takes care of a whole class of bugs, static analysis tools like dialyzer can tell you a lot more about potential pitfalls in your code. Assuming you used specs to their fullest extent, dialyzer would probably be more useful than static typing would be on its own, at least compared to languages like Go, C#, etc. Something with a much more powerful type system, like Haskell, can still benefit from static analysis, but less so than a language with a more naive type system like Go. Static analysis is most useful when combined with a static type system, though, and since Erlang and Elixir are both dynamic languages, static analysis can only do so much. That said, dialyzer is extremely powerful and useful, and if used consistently, it should offer at least the same level of protection, if not more, as the type systems you are probably already familiar with.
I would take a look at the dialyzer docs (http://www.erlang.org/doc/man/dialyzer.html); they can tell you a lot more about what you can expect from the tool with respect to Erlang and Elixir. Hopefully that helps!
For one thing, static typing is built into the compilation phase, so it's sort of impossible to miss. Static analysis, on the other hand, is something a developer has to run voluntarily.
Likewise, one doesn't have to annotate anything in Elixir; it's entirely down to programmer discretion. In statically typed languages it's impossible to avoid.
I would say your question is sort of broad and therefore hard to answer with any rigor. You might want to put this over on Programmers.StackExchange.

Delphi 2010 inlining useless?

What is the go with inlining functions or procedures in Delphi (specifically v2010 here, but I had the same issue with Turbo Delphi)?
There is some disclaimer in the help about how it may not always inline a function because of "certain criteria", whatever that means.
But I have found that generally inlining functions (even very simple ones that have 3 or 4 lines of code) slows down code rather than speeds it up.
A great idea would be a compiler option to "inline everything". I don't care if my exe grows by 50% or so to get it working faster.
Is there a way I can force Delphi to really inline code even if the compiler decides not to inline it? That would really help. Otherwise you need to do "manual inlining", replicating the procedure code throughout multiple areas of your code with remarks like "//inlining failed here, so if you change the next 5 lines, change them in the other 8 duplicate spots this code exists"
Any tips here?
There's a compiler option for automatic inlining of short routines. In Project Options, under Delphi Compiler -> Compiling -> Code Generation, turn "Code inlining control" to Auto. Be aware, though, that this should only be on a release build, since inlined code is difficult to debug.
Also, you said that you don't mind making your program larger as long as it gets faster, but that often inlining makes it slower. You should be aware that that might be related. The larger your compiled code is, the more instruction cache misses you'll have, which slows down execution.
If you really want to speed your program up, run it through a profiler. I recommend Sampling Profiler, which is free, is made to work with Delphi code (including 2010) and doesn't slow down your execution. It'll show you a detailed report of what code you're actually spending the most time executing. Once you've found that, you can focus on the bottlenecks and try to optimize them.
Inlining can make things slower in some cases. The inlined function may increase the number of CPU registers required for local variables. If there aren't enough registers available, variables will be located in memory instead, which makes it slower.
If the function isn't inlined it will have (almost) all CPU registers available.
I've found that it's typically not a good idea to inline functions containing loops. They will use a couple of variables which are likely to end up in memory, making the inlined code slower.
If you want to force inlining then use include files. You need to make sure you declare the correct variables, and then use {$I filename.inc}. That will always inject that specific code right where you want it, and make it easier to maintain if you need to change it.
Keep in mind that the compiler is written by people way smarter than most mere mortals (including myself) and has access to more information when deciding whether or not to inline, so when it doesn't inline it probably has a good reason.
If I understood one of the FPC compiler developers (FPC has the same issue) correctly, inlining can only happen when the routine to be inlined has already been compiled.
In other words, if you make the unit with the to-be-inlined functions a "leaf" unit, and put it first in the uses clause of your project (.dpr), it should be OK. Note that by "leaf" unit I mean a unit that has no dependency on other units in the project, in other words only on already compiled units.
I wouldn't be surprised if it were the same in Delphi, since it shares a unit system based on the same principles.
It is also pretty unfixable without violating separate compilation principles.

Does F# provide you automatic parallelism?

By this I mean: when you design your app to be free of side effects, etc., will F# code be automatically distributed across all cores?
No, I'm afraid not. Given that F# isn't a pure functional language (in the strictest sense), it would be rather difficult to do so I believe. The primary way to make good use of parallelism in F# is to use Async Workflows (mainly via the Async module I believe). The TPL (Task Parallel Library), which is being introduced with .NET 4.0, is going to fulfil a similar role in F# (though notably it can be used in all .NET languages equally well), though I can't say I'm sure exactly how it's going to integrate with the existing async framework. Perhaps Microsoft will simply advise the use of the TPL for everything, or maybe they will leave both as an option and one will eventually become the de facto standard...
Anyway, here are a few articles on asynchronous programming/workflows in F# to get you started.
http://blogs.msdn.com/dsyme/archive/2007/10/11/introducing-f-asynchronous-workflows.aspx
http://strangelights.com/blog/archive/2007/09/29/1597.aspx
http://www.infoq.com/articles/pickering-fsharp-async
F# does not make it automatic; it just makes it easy.
Yet another chance to link to Luca's PDC talk. Eight minutes starting at 52:20 are an awesome demo of F# async workflows. It rocks!
No, I'm pretty sure that it won't automatically parallelise for you. It would have to know that your code was side-effect free, which could be hard to prove, for one thing.
Of course, F# can make it easier to parallelise your code, particularly if you don't have any side effects... but that's a different matter.
Like the others mentioned, F# will not automatically scale across cores and will still require a framework such as the port of ParallelFX that Josh mentioned.
F# is commonly associated with potential for parallel processing because it defaults to objects being immutable, removing the need for locking for many scenarios.
On purity annotations: Code Contracts have a Pure attribute. I remember hearing that some parts of the BCL already use this. Potentially, this attribute could be used by parallelization frameworks as well, but I'm not aware of such work at this point. Also, I'm not even sure how usable Code Contracts are from within F#, so there are a lot of unknowns here.
Still, it will be interesting to see how all this stuff comes together.
No it will not. You must still explicitly marshal calls to other threads via one of the many mechanisms supported by F#.
My understanding is that it won't, but Parallel Extensions is being modified to make it consumable from F#. That won't make it automatically multi-threaded, but it should make it very easy to achieve.
Well, you have your answer, but I just wanted to add that I think this is the most significant limitation of F# stemming from the fact that it is a hybrid imperative/functional language.
I would like to see some extension to F# that declares a function to be pure. That is, it has no side-effects that are not denoted by the function's type. The idea would be that a function is pure only if it references other "known-pure" functions. Of course, this would only be useful if it were then possible to require that a delegate passed as a function parameter references a pure function.
