I was trying to understand why math might be so slow in Erlang and, if it is, how I could find out where it is slow and try to speed it up. Some people say it's because Erlang runs on a VM, but I doubt that: the JVM is fast at math, and so is V8. Others say it's because it's an immutable language, but OCaml is immutable and quite fast at math. So what makes Erlang slow at math, and how would I go about finding the slow spots in the code? I can only think of DTrace, since I don't know the Linux/BSD tools well. I also don't know which tools are good at profiling code running inside a VM versus profiling the VM itself, and whether those require different tools.
Before OTP 24:
BEAM does not have a JIT, so the Erlang compiler (erlc) outputs bytecode: any math operation needs to load its operands from the VM registers (which are memory locations), perform the actual operation, place the result in the relevant VM register, and then jump to the next instruction. This is quite slow compared to just performing the operation in machine code.
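To make that concrete, here is a minimal sketch in C of a bytecode dispatch loop. The opcodes and instruction layout are invented for illustration (this is not BEAM's actual instruction set); the point is that one addition costs a fetch, a decode, several memory accesses to the "register" array, and a dispatch jump:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy bytecode interpreter: opcode names and encoding are made up. */
enum { OP_ADD, OP_HALT };

typedef struct { uint8_t op, dst, src1, src2; } Instr;

static int64_t run(const Instr *code, int64_t *reg) {
    for (;;) {
        Instr i = *code++;                      /* fetch */
        switch (i.op) {                         /* decode + dispatch */
        case OP_ADD:
            /* load from memory-backed "registers", add, store back:
               several memory accesses for one arithmetic operation */
            reg[i.dst] = reg[i.src1] + reg[i.src2];
            break;
        case OP_HALT:
            return reg[0];
        }
    }
}

int main(void) {
    int64_t reg[4] = {0, 20, 22, 0};
    Instr prog[] = {{OP_ADD, 0, 1, 2}, {OP_HALT, 0, 0, 0}};
    printf("%lld\n", (long long)run(prog, reg));  /* prints 42 */
    return 0;
}
```

A native compiler would turn the same addition into a single add instruction on values already sitting in CPU registers.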
If you use a language that compiles directly to machine code (like C), the compiler has a lot more information about the code and the platform, and thus is able to use features that speed up execution: keeping the processor's pipeline full, using vectorized instructions, placing the most-accessed variables in processor registers, and arranging memory accesses so that they hit the cache.
The HiPE compiler is there to compile Erlang code to native code, and you should use it if your program does a lot of math. Be sure to check its limitations, though.
If the HiPE compiler is not enough, you can always implement the critical math operations in C and call them from Erlang.
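The usual mechanism for that is a NIF. Below is a minimal sketch in C (the module name my_math and the function fast_add are made up for illustration); since NIFs run on a scheduler thread, keep them short:

```c
/* my_math.c -- minimal NIF sketch. Compile as a shared library and
   load it from an Erlang module named my_math via erlang:load_nif/2. */
#include "erl_nif.h"

static ERL_NIF_TERM fast_add(ErlNifEnv *env, int argc,
                             const ERL_NIF_TERM argv[]) {
    ErlNifSInt64 a, b;
    (void)argc;
    if (!enif_get_int64(env, argv[0], &a) ||
        !enif_get_int64(env, argv[1], &b))
        return enif_make_badarg(env);
    return enif_make_int64(env, a + b);   /* the math runs as native code */
}

static ErlNifFunc nif_funcs[] = {
    {"fast_add", 2, fast_add}
};

ERL_NIF_INIT(my_math, nif_funcs, NULL, NULL, NULL, NULL)
```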
Furthermore, you should check this question, which compares Erlang with C (and others) on a pure-math problem.
Regarding immutability, it shouldn't have any impact unless your integers are big enough to be boxed on the process heap (more than 60 bits on a 64-bit system), because those are the only ones that require explicit memory handling.
In order to profile Erlang code, you have tools such as fprof, eprof, and cprof to measure it from within Erlang.
Lastly, you can always post your snippet here and maybe we'll be able to point something out.
After OTP 24:
Erlang now has a JIT (BeamAsm); some reports state as much as a 25% improvement.
My background is in Perl/Python/Node and a smattering of Ruby. I've not used BEAM VM languages before.
In Perl/Python/Node/Ruby, if I wanted to handle 'lower level' tasks - intense compute, needing access to threads, or more commonly, wrapping a C library - I'd write something in C. Elixir/Erlang obviously has great parallelism in the form of Erlang processes and very low latency, eliminating much of that need.
So if I had a C library, would it be preferable to make an Elixir/Erlang wrapper or just reimplement the functionality?
A very specific example: does Elixir/Erlang's TLS use OpenSSL, or is it implemented in a BEAM language?
The usual way to do things in Erlang is to implement them in Erlang first, then measure. If performance isn't good enough, look for improvements in Erlang first. If it is still not good enough, rewrite the hot parts in C: you can use ports as the safer option, or NIFs if the data-transfer overhead would kill you. The rationale is that Erlang (and Elixir as well) provides a great environment for writing simple, safe code, with great exception handling and support for reliability. You can think of Erlang as a domain-specific language for writing server services: a container that handles concurrent tasks, distributes load across CPUs, and executes small, high-performing chunks of code written in another language. The less code you have to implement in that other language, the better.
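To make the "ports as the safer option" idea concrete, here is a minimal sketch in C of an external port program speaking Erlang's {packet, 2} framing (a 2-byte big-endian length prefix); the one-byte command protocol itself is invented for illustration. Because it runs in its own OS process, a crash here cannot take down the VM:

```c
/* port_prog.c -- minimal Erlang port program sketch. */
#include <unistd.h>

static int read_exact(unsigned char *buf, int len) {
    int i, got = 0;
    do {
        if ((i = read(0, buf + got, len - got)) <= 0) return i;
        got += i;
    } while (got < len);
    return len;
}

static int write_exact(const unsigned char *buf, int len) {
    int i, wrote = 0;
    do {
        if ((i = write(1, buf + wrote, len - wrote)) <= 0) return i;
        wrote += i;
    } while (wrote < len);
    return len;
}

int main(void) {
    unsigned char hdr[2], buf[256];
    for (;;) {
        if (read_exact(hdr, 2) != 2) return 0;      /* Erlang side closed */
        int len = (hdr[0] << 8) | hdr[1];
        if (len < 1 || len > (int)sizeof buf) return 1;
        if (read_exact(buf, len) != len) return 1;
        buf[0] = (unsigned char)(buf[0] * 2);       /* "do the math" */
        hdr[0] = 0; hdr[1] = 1;                     /* reply is 1 byte long */
        write_exact(hdr, 2);
        write_exact(buf, 1);
    }
}
```

On the Erlang side you would start it with open_port({spawn, "./port_prog"}, [{packet, 2}, binary]) and exchange messages with it.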
If you think of binary protocol handling as low level, definitely try implementing it in Erlang first, because the bit-manipulation syntax in Erlang is just amazing.
And yes, TLS is mostly written in Erlang (key handling, IO, ...), with the performance-critical parts in C (hash and crypto algorithms).
One piece of very good advice: always measure, don't assume. You may even find a nontrivial solution that performs better implemented in pure Erlang than in C++. See Comparing Cpp And Erlang For Motorola Telecoms Software.
We currently have a large business-critical application written in COBOL, running on OpenVMS (Integrity/Itanium).
As the months pass, there is more and more speculation about the lifetime of the Itanium architecture. Nothing is said out in the open, of course, but articles like this and this paint a worrying picture. Although I can find nothing official to support this, there are even murmurings in the corridors of our company of HP ditching OpenVMS and HP COBOL along with it.
I cannot believe that we are alone in this.
The way I see it, there are a few options:
Emulate some old hardware and run the application on that using a product like CHARON-VAX or CHARON-AXP. The way I see it, the pros are that the process should be relatively painless, especially if the 64-bit (AXP) option is used. Potential cons are a degradation in performance (although this should be offset by faster and faster hardware);
Port the HP COBOL-based application to a more modern dialect of COBOL, such as Visual COBOL. The pros, then, are the fact that the porting effort is relatively low (it's still COBOL) and the fact that one can run the application on a Unix or Windows platform. The cons are that although you're porting COBOL, the fact that you're porting to a different operating system could make things tricky (esp. if there are OpenVMS-specific dependencies);
Automatically translate the COBOL into a more modern language like Java. This has the obvious benefit of immediately freeing one from all the legacy issues in one fell swoop: hardware support, operating system support, and especially finding administrators and programmers. Apart from this being a big job, an obvious downside is the fact that one will end up with non-idiomatic Java (or whatever target language is ultimately chosen); arguably, this is something that can be ameliorated over time.
A rewrite, from the ground up (naturally, using modern technologies). Anyone who has done this knows how expensive and time-consuming it is. I've only included it to make the list complete :)
Note that there is no dependency on a proprietary DBMS; the database is ISAM file-based.
So ... my question is:
What are others faced with the imminent obsolescence of Itanium doing to maintain business continuity when their platform of choice is OpenVMS and COBOL?
UPDATE:
We have had an official assurance from our local HP representative that Integrity/Itanium/OpenVMS will be supported at least up until 2022. I guess this means that this whole issue is less about the platform, and more about the language (COBOL).
The main problem with this effort will be the portions of the code that are OpenVMS-specific. Most applications developed on OpenVMS typically use routines and procedures that are not easily ported to another platform. Rather than worrying about language compatibility specifically, I would initially focus on the runtime routines and command procedures used by the application.
An alternative approach may be to continue using the current application while developing a new one, or modifying a commercially available application to suit your needs. While the long-term status of Itanium is in question, history indicates that OpenVMS will remain viable for some time to come. There are still VAX machines being used today for business-critical applications. The fact that OpenVMS and its hardware are stable is the main reason for their longevity.
Dan
Looks like COBOL is the main dependency that keeps you worried; I understand that Itanium+OpenVMS in this picture is just the platform.
You're definitely not alone in running mission-critical stuff on OpenVMS. The HP site has an OpenVMS roadmap (both Alpha and Integrity); support currently stretches to 2015. Oracle seems to be trying to leverage its Sun assets in different domains recently.
In any case, if your worries are substantial (sure, we all worried about the Compaq and then HP takeovers, and the VAX >> Alpha >> Itanium transitions in the past), there's time to untie the COBOL dependency.
So I would look now into charting a migration path from COBOL to a more portable language of your choice (e.g. ANSI C/C++ without platform extensions). Perhaps Java isn't the friendliest choice, given Oracle's activity. A rewrite, however unpleasant, will be more progressive and will likely streamline the whole process. The sooner one starts, the sooner one completes.
Also, in addition to emulators, there's still plenty of second-hand hardware. Ironically, one company I know is just now phasing in Integrity platforms to supplant mission-critical Alphas -- I guess it's "corporate testing requirements"...
Doing nothing is an option as well, though obviously riskier: OpenVMS platforms have proven dependable, so alternatively, finding a reliable third-party support company may extend your hardware contingency into the future.
This summer's Rolling Roadmap makes porting off OpenVMS look like an excellent idea.
Given how much COBOL exists in the world, finding people to support COBOL will not be a problem for the foreseeable future. As noted above, there are COBOL compilers on other platforms. The problem lies in the OpenVMS system service calls and DEC language extensions your application uses. You do not mention where your data is stored, so worst case your COBOL uses RMS. There is a company that provides an implementation of many OpenVMS system services on Linux and the Unixes; not needing to replace those services while porting to another operating system may reduce the complexity. Check out Sector7.com.
I recently took a look at Factor, and the idea of having a language based around the concept of a stack is very interesting. (This was my first encounter with a stack-oriented language.) However, I don't see any practical advantages of such a paradigm. To me, it just seems like more trouble than it's worth. Why would I use a stack-oriented language such as Factor or Forth?
I'm ignoring factors (excuse the pun) such as the availability of tools and libraries. I'm asking only about the language paradigm itself.
Stack orientation is an implementation detail. For example, Joy can be implemented using rewriting - no stack. This is why some prefer to say "concatenative" or "compositional". With quotations and combinators you can code without thinking about the stack.
Expressing yourself with pure composition and without locals or named arguments is the key. It's extremely succinct with no syntactic overhead. Composition makes it very easy to factor out redundancy and to "algebraically" manipulate your code; boiling it down to its essence.
Once you've fallen in love with this point-free style you'll become annoyed by even the slightest composition syntax in other languages (even if just a dot). In concatenative languages, white space is the composition operator.
I'm not sure whether this will quite answer your question, but you'll find that Factor describes itself as a concatenative language first and foremost; it just happens also to have a stack-based execution model. Unfortunately, I can't find Slava's blog post (or maybe it was on the Factor wiki) talking about this.
The concatenative model basically means that you pass around "hunks of code" (well, that's how you program, anyway), and composition looks like concatenation. Operations like currying are also easy to express in a stack-based language, since you just pre-compose with code that adds one thing to the stack. In Factor, at least, this is expressed via a word called curry. This makes higher-order programming much easier, and mapping over sequences eventually becomes the "obvious way to do it". I came from Lisp, and after programming in Factor for a bit I was amazed, going back, that you couldn't do "obvious things" like bi in Lisp. It really does change how you express things.
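A toy sketch in C may make this concrete (it illustrates the execution model only; it is not how Factor is implemented). Each "word" pops its inputs from a shared stack and pushes its results, so running words one after another composes them, and partial application is just pre-composing a push:

```c
#include <stdio.h>

static double stack[64];
static int sp = 0;

static void push(double x) { stack[sp++] = x; }
static double pop(void)    { return stack[--sp]; }

/* words: each consumes and produces values on the shared stack */
static void add(void)  { double b = pop(), a = pop(); push(a + b); }
static void mul(void)  { double b = pop(), a = pop(); push(a * b); }
static void dup_(void) { double a = pop(); push(a); push(a); }  /* _ avoids POSIX dup */

int main(void) {
    /* "3 dup mul 4 4 mul add" -- i.e. 3*3 + 4*4 = 25 */
    push(3); dup_(); mul();
    push(4); push(4); mul();
    add();
    printf("%g\n", pop());            /* 25 */

    /* "currying": the word [ 10 add ] is just "push 10, then add" --
       composing a push in front of add fixes one of its arguments */
    push(5);
    push(10); add();                  /* acts like add10 applied to 5 */
    printf("%g\n", pop());            /* 15 */
    return 0;
}
```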
Incidentally, it's wise not to get too hung up on the whole stack manipulation thing. Using the locals vocabulary (described here: http://docs.factorcode.org/content/article-locals.html), you don't have to worry about shuffling things around. Often there's a neat way to express things without local variables, but I tend to do that second.
One of the important reasons stack-based languages are being developed is because the minimalism of their semantics allows straightforward interpreter and compiler implementation, as well as optimization.
So, one of the practical advantages of such a paradigm is that it allows enthusiasts to easily build more complex things and paradigms on top of it.
The Scheme programming language is another example of that: minimalist syntax and semantics, straightforward implementation, and lots of fun!
[EDITED] We already have good answers, and I know nothing about the Factor language. However, favouring stack usage is a practical advantage of a stack-oriented paradigm, and a reason to adopt such a paradigm, as asked.
So, I think it is worth listing the advantages of stack usage over heap allocation, for completeness:
CPU time -- The time cost of memory allocation on the stack is practically free: it doesn't matter whether you are allocating one integer or one thousand, all it takes is a single stack-pointer adjustment.
Memory leaks -- There are no memory leaks when using only the stack, and that happens naturally, without additional code to deal with it. The memory used by a function is completely released on return, even under exception handling or longjmp (no reference counting, garbage collection, etc.).
Fragmentation -- Stacks also avoid memory fragmentation naturally. You get zero fragmentation without any additional machinery such as an object pool or slab allocator.
Locality -- Data on the stack favors data locality, taking advantage of the cache and avoiding page swaps.
Of course, it may be more complicated to implement, depending on your problem, but we should favor the stack over the heap whenever we can, in any language; the sketch below illustrates the difference. Leave malloc/new for when it is actually needed (size or lifetime requirements).
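A minimal C sketch of the contrast (allocator behavior and timings vary by platform; this just shows the structural difference):

```c
#include <stdlib.h>

/* Stack version: allocation is one stack-pointer adjustment and the
   memory is released automatically when the function returns. */
static long sum_stack(int n) {           /* n must be <= 1000 here */
    long buf[1000];
    long s = 0;
    for (int i = 0; i < n; i++) buf[i] = i;
    for (int i = 0; i < n; i++) s += buf[i];
    return s;                            /* buf vanishes -- no leak possible */
}

/* Heap version: allocator bookkeeping on entry, possible fragmentation,
   and an explicit free() that is easy to forget. */
static long sum_heap(int n) {
    long *buf = malloc(n * sizeof *buf);
    if (!buf) return -1;
    long s = 0;
    for (int i = 0; i < n; i++) buf[i] = i;
    for (int i = 0; i < n; i++) s += buf[i];
    free(buf);                           /* forget this and you leak */
    return s;
}

int main(void) { return (int)(sum_stack(1000) - sum_heap(1000)); }  /* 0 */
```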
For some people it's easier to think in terms of managing stacks than other paradigms. At the very least, doing some hacking in a stack-based language will improve your ability to manage stacks in general.
Aside: in the early days of handheld calculators, some used Reverse Polish notation, a very simple stack-based postfix notation that is extremely memory-efficient. People who learn to use it efficiently tend to prefer it over algebraic notation.
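An RPN evaluator is itself only a handful of lines; here is a minimal sketch in C that evaluates space-separated tokens given on the command line (only + - * / are handled):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    double stack[64];
    int sp = 0;
    for (int i = 1; i < argc; i++) {
        const char *t = argv[i];
        if (strlen(t) == 1 && strchr("+-*/", t[0])) {
            double b = stack[--sp], a = stack[--sp];  /* operators pop two */
            switch (t[0]) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            case '/': stack[sp++] = a / b; break;
            }
        } else {
            stack[sp++] = atof(t);                    /* operands get pushed */
        }
    }
    if (sp == 1) printf("%g\n", stack[0]);
    return 0;
}
```

Running it as ./rpn 3 4 + 2 '*' prints 14 (the * is quoted so the shell doesn't expand it).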
Does anybody here have any experience with Common Lisp as an embedded language (using ECL)? If so, how good is ECL compared to Lua?
I haven't embedded CL before, but I've done it with both Lua and two particular Scheme implementations (Gambit-C and GNU Guile).
Scheme makes a great embedded language in my opinion, because it is flexible and not too bloated. Gambit-C is particularly awesome for this because it allows you both to run interpreted scripts and to compile your code down to C. In my tests, Gambit-C's generated C code was only a little slower than handwritten C (for example, a particular test that ran in 0.030s in C ran in 0.040s in Gambit!). Gambit also has a really nice FFI (foreign function interface), which is essentially just Scheme with special syntax for writing bindings to C libraries (Objective-C and C++ are directly supported too). Gambit also has a very nice REPL with some debugging capabilities.
Guile is also pretty nice, and it actually runs faster than Lua, making it the fastest interpreted language I currently know of (Guile has made great progress in recent years). But since Gambit-C can compile to really fast code, I generally don't use Guile as much unless I intend to use interpreted code in the final version.
Lua has closures, but you won't get continuations like in Scheme, and you won't get macros either. It's still possible to do a reasonable amount of functional programming, though. It doesn't have a fully featured object system (like CLOS in CL), but it does have tables, and they can be used to implement both class-based and prototype-based inheritance quite easily. Also, Lua has an excellent C API that is really a pleasure to work with: it's stack-based, and designed so that you don't have to worry about the Lua side of memory management at all. The API is very clear and well organized, and there is a lot of great documentation and example code out there. Lua doesn't compile down to native code, but it does use byte-code: when you send code to the Lua VM, it always compiles it to byte-code first and then runs it.
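As an illustration of that C API, here is a minimal sketch of embedding Lua (standard calls from the Lua reference manual; link against liblua):

```c
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
#include <stdio.h>

int main(void) {
    lua_State *L = luaL_newstate();       /* one interpreter instance */
    luaL_openlibs(L);                     /* load the standard libraries */

    /* define a Lua function, then call it from C through the stack */
    if (luaL_dostring(L, "function square(x) return x * x end") != 0) {
        fprintf(stderr, "%s\n", lua_tostring(L, -1));
        return 1;
    }
    lua_getglobal(L, "square");           /* push the function */
    lua_pushnumber(L, 7);                 /* push its argument */
    if (lua_pcall(L, 1, 1, 0) != 0) {     /* call: 1 argument, 1 result */
        fprintf(stderr, "%s\n", lua_tostring(L, -1));
        return 1;
    }
    printf("%g\n", lua_tonumber(L, -1));  /* prints 49 */
    lua_pop(L, 1);
    lua_close(L);
    return 0;
}
```

Note how every exchange between C and Lua goes through the virtual stack, and Lua garbage-collects everything on its own side.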
Now, as for Common Lisp, I think that it would probably not make a very good language for embedding. The reason for this is just that CL is huge. Generally, it's desirable to embed a lightweight language because it's going to be using the platform/libs that you provide to it, and not so much external stuff.
So, I think you can't go wrong with Gambit-C, Guile, or Lua; they're all really nice. CL is powerful, but I just think it's too big for embedding.
I can only agree that Lua can be terrible: it works well with a plain imperative or functional programming style, but not if you try OO with large hierarchies. For example, NEVER try to wrap a typical GUI toolkit like GTK in a Lua class hierarchy; the performance will be just terrible.
I still use Lua because it's so lightweight that you can have dozens of interpreters running at the same time, and end users can manage to write code snippets in it, while Lisp/Scheme's (lack of) syntax is for experts only.
I would now add that mruby 3.0 is out and is a great language to embed. Unfortunately, in the meantime everyone has gone JavaScript, and JavaScript only.
I am having some performance problems with my Delphi 2006 app.
Can you suggest any profiling tools that will help me find the bottleneck, i.e. a tool like Turbo Profiler?
I asked the same question not too long ago
I've downloaded and tried AQtime. It does seem comprehensive, but it is not an easy-to-use tool and is VERY expensive for an individual programmer (i.e. $600 US). I loved the fact that it was non-invasive (it did not change your code) and that it could do line-by-line profiling, until I found that, because it is an instrumenting profiler, it can lead to improper optimizations, as in: Why is CharInSet faster than Case statement?
I tried a demo of ProDelphi, much less expensive (about $80 I think), but it was much too clunky for me - I didn't like the user interface at all, and it is invasive - changing your code to add the instrumenting, which you have to be careful about.
I used GpProfile with Delphi 4 for many years. I loved it. It also was invasive, but it worked so well that I learned to trust it, and it never gave me a problem in 10 years. But when I upgraded to Delphi 2009, I didn't think it best to try using it, since it hasn't been upgraded and, by GP's admission, won't work without modifications. I expect you won't be able to use it with Delphi 2006 either.
ProDelphi and GpProfile only profile at the procedure level. If you want to do individual lines (which I sometimes had to), you have to move each line into its own PROC1, PROC2, PROC3 and call those. It was a bit of an annoyance to have to do that, but it gave me good results (at least I was happy with the results of GpProfile used that way).
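The trick transposes to any language. Here is a sketch of the shape in C (hypothetical code, just to illustrate): a profiler that only reports per-function times will now attribute time to each former line separately:

```c
/* Each hot line of the original routine becomes its own tiny function,
   so a procedure-level profiler reports time per "line". */
static void line1(double *x) { *x += 1.0;    }  /* was: x := x + 1.0;    */
static void line2(double *x) { *x *= 1.0001; }  /* was: x := x * 1.0001; */
static void line3(double *x) { *x -= 0.5;    }  /* was: x := x - 0.5;    */

double hot_routine(double x, long iters) {
    for (long i = 0; i < iters; i++) {
        line1(&x);
        line2(&x);
        line3(&x);
    }
    return x;
}
```

(With this technique you also need to disable inlining, or the compiler may merge the functions back together.)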
The answer I accepted in my CharInSet question said that "Sampling profilers, which periodically check the location of the CPU, are usually better for measuring code time." and a later answer gave Eric Grange's free sampling profiler for Delphi that now supports Delphi 2009. I haven't tried it yet, but I've heard good things about it, and it is the next one I'm going to try.
By the way, you might be best off saving your $600 by NOT buying AQtime, and instead using it to upgrade your Delphi 2006 to Delphi 2009. The stability, speed, and extra features (especially Unicode) will be worth your while. See: What are major incentives to upgrade to D2009 (Unicode excluded)?
Also AQtime does not integrate into Delphi 2009 yet.
One other free one, with source that I found out about, but haven't tried yet is TProfiler. If anyone has tried that one, I'd like to know what they think.
Note: The Addendum I added afterwards to question 291631 seems like it may be the answer. See Andre's open-source program: asmprofiler.
Feb 2010 followup. I bit the bullet and purchased AQTime. A few months ago they finally integrated it into Delphi 2009 which is what I use (but they still have to do Delphi 2010). The viewing of source lines and their individual times and counts is invaluable to me, and AQTime does a superb job of this.
I have just found a very nice free sampling profiler and it supports Delphi 2009
I've used ProDelphi, mostly to determine which routines are eating the most time. It's an instrumenting profiler, meaning it adds a bit of code to the beginning and end of each routine. You control which routines it profiles by directives inside comments. You can also profile sections of a routine, but the sections must start and stop at the same block level, with no entry into or exit out of the section. Optimization must be off where ProDelphi inserts its code (where you put the directives), but you can turn it on anywhere else.
The interface is kinda clunky, but very fast once you get the hang of it. You can do useful work with the free version (limited to 10 routines or sections). ProDelphi can quickly tell you which routines you should examine -- but not why, or which lines.
Recently, I've started using Intel's VTune Performance Analyzer. 'WOW' doesn't begin to sum it up. I am impressed. I simply had no idea all this was built into modern Intel processors. Did you know it can tell you exactly how often a single instruction needed to wait for the L1 Data Cache to look sideways at another core before reloading a word from a higher cache? If I keep writing, I'll just sound like a breathless advert for the product.
Go to Intel and download the full-working timed demo. Dig around the net and find a couple of videos on how to get started. (Otherwise, you run the risk of being stymied by all the options.) It works with any compiler. Just point it to a .exe. It'll show you source lines if your .exe includes debug info & you point it to the source code.
I was stuck trying to optimize an inner loop that called a function I wrote. There were no external calls except length(str). This inner loop ran billions of times per run, and ate up about half the cpu time -- a perfect candidate for optimization. I tried all sorts of standard optimizations, with little to no effect. VTune shows hot-spots. I just drilled down till it showed me the ASM my code generated, and how much time each instruction took.
Here's what VTune told me:
```
line nnnn  [line of Delphi code]
addr hhhh    cmp byte ptr [edx+ecx],0x14h        3 cycles
addr hhhh    ja  label_x                     10302 cycles
```
The absolute values mean nothing. (I think I was measuring cycles per instruction retired.) The relative values make it kinda clear where all the time went. The great thing was the Advice Window. It told me the code stalled waiting for data to load into the L1 data cache, and actually gave me good advice on how to avoid stalls.
My mistake was in thinking of the Core2 Quad as just a really fast 8086 CPU. No^3. The code was spending 99% of its time waiting for data to load from memory because I was jumping around too much. My algorithm assumed that memory was RAM (Random Access). That's not how modern CPUs work. Data in L1 cache might be accessed in 1 or 2 cycles, but accessing the L2 or L3 cache costs tens to hundreds of cycles, and going to RAM costs thousands. However, all that latency is avoided when you access your data sequentially -- because the processor will pre-load the cache with the data following the first byte you ask for.
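The effect is easy to reproduce. Here is a minimal C sketch that sums the same array sequentially and then in a cache-hostile order; the exact ratio depends on the machine, but the sequential pass typically wins by a large factor once the array dwarfs the caches:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)                /* 16M ints = 64 MB, far larger than L3 */

int main(void) {
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;

    clock_t t0 = clock();
    long long s1 = 0;
    for (size_t i = 0; i < N; i++) s1 += a[i];     /* prefetch-friendly scan */
    clock_t t1 = clock();

    long long s2 = 0;                              /* same work, hostile order */
    for (size_t i = 0, j = 0; i < N; i++, j = (j + 4099) % N)
        s2 += a[j];                                /* stride defeats the prefetcher */
    clock_t t2 = clock();

    printf("sequential: %.3fs  strided: %.3fs  (sums match: %d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, s1 == s2);
    free(a);
    return 0;
}
```

Because 4099 is coprime to N, the strided loop touches every element exactly once, so both sums are identical; only the access order differs.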
Net result is that I rewrote the algorithm to access the data more sequentially, and got a 10x speedup, which was good enough. When I have the time, I'm certain I can get another 10x out of it. But that's just the Geek in me. Good Enough is good enough.
I already knew that you get the most bang by optimizing your algorithm, not your code. I thought I only needed the profiler to tell me what needed optimizing. But I also needed it to find the reason for the bottleneck so I could design a faster algorithm.
The new algorithm isn't radically different from the old. It just stores the data such that it can be accessed sequentially. For example, in one place I moved a field from an array of records into its own array of integers, because the inner loop didn't need the rest of the data in each record. I also had a rectangular matrix stored as a dynamic array of dynamic arrays. The code used this to randomly access megabytes of data (and the poor L1 data cache is only 64Kb). I figured out how to store it in a linear array as diagonals of the matrix, which is the order in which I use the data. (OK, maybe that part is radical.)
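For illustration, here is a sketch in C of that kind of diagonal layout (my own reconstruction of the idea, not the original code): elements of each anti-diagonal d = i + j are stored contiguously, so a diagonal traversal becomes a linear scan:

```c
#include <stdlib.h>

typedef struct {
    int n;
    double *data;   /* all diagonals stored back to back */
    long *start;    /* start[d] = offset of diagonal d (d = i + j) */
} DiagMatrix;

DiagMatrix *diag_new(int n) {
    DiagMatrix *m = malloc(sizeof *m);
    m->n = n;
    m->data = calloc((size_t)n * n, sizeof *m->data);
    m->start = malloc((size_t)(2 * n - 1) * sizeof *m->start);
    long off = 0;
    for (int d = 0; d < 2 * n - 1; d++) {
        m->start[d] = off;
        off += (d < n) ? d + 1 : 2 * n - 1 - d;    /* length of diagonal d */
    }
    return m;
}

/* element (i, j) lives on diagonal d = i + j, at position i within it
   (shifted once the diagonal no longer starts in row 0) */
double *diag_at(DiagMatrix *m, int i, int j) {
    int d = i + j;
    int pos = (d < m->n) ? i : i - (d - m->n + 1);
    return &m->data[m->start[d] + pos];
}
```

Walking one diagonal now touches consecutive addresses, which is exactly what the prefetcher wants.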
Anyway, I'm sold on VTune.
I have used http://www.prodelphi.de with success on a Delphi 7 project in the past. It's cheap and it works. Don't let the bush-league web site scare you off.
www.AutomatedQA.com has the best choice for Delphi profiling (AQTime)
I use and recommend Sampling Profiler; I think you can get it from the embarcadero.public.attachments newsgroup.
Here's another choice, I haven't used this one before: http://www.prodelphi.de
Final choice that I know of for Delphi, http://gp.17slon.com/gpprofile/index.htm
Final note: www.torry.net is a great place to search for Delphi components and tools.