I have a legacy Erlang program that needs optimizations. This piece of code uses up to 20G memory in run time. I'm wondering if there is a way to get the Erlang Beam size of the process itself in run time? If that is possible then I can do something like if beam size>10GB then reject all calls to gen_server process. Thanks for the help!
Perhaps you could use some proces_info data:
{memory, Size}:
Size is the size in bytes of the process. This includes call
stack, heap and internal structures.
process_info(self(), memory).
{memory,17128}
Just start with calling memory() from the shell to learn if it is in binaries, ets, processes and so on the memory is being kept. Next you can ask a tool like etop to give you the processes using the most memory if a process is the culprit. This can often track down the problem.
If the problem is ETS or binaries, then you may be keeping certain large binaries around for a long time due to sub-binary pointers inside them. This needs GC tweaks to fix.
Related
Whenever I use spawn(Mod, Func, Arguments) all the arguments are copied. Why are they copied if everything is immutable in Erlang? Why isn't just the pointer copied? Is it because that makes the garbage collection much more complicated?
At present, the Erlang VM maintains a separate heap per process*. This means that a process can collect its garbage independently of others, making Erlang less vulnerable to the effects of GC pauses than runtimes that keep a global heap.
In order for this to be effective, it is imperative that no process references memory allocated on the heap of another process. Presumably, the reason for copying the arguments sent to spawn/3 is so that they are moved into the newly spawned process' heap. The same holds for messages sent to a process, by the way (source: see the link above):
All data in messages between Erlang processes is copied, with the exception of refc binaries on the same Erlang node.
(*) You might enjoy reading this blog post about garbage collection in Erlang. It's actually a little more complicated than I said in the beginning as some objects (notably atoms and large binaries) are handled separately.
Robert Virding added the following in a comment below:
Having separate heaps for each process make the GC simpler and more efficient, you can reclaim much more memory in each pass than with a real-time collector. Also it scales much better in a parallel system as there are much much fewer locks and less synchronisation, which kills speed. It can also give better locality of memory and cache performance. It's one of those things which sounds worse but ends up being better.
In a Haskell program compiled with GHC, is it possible to programmatically guard against excessive memory usage? That is, have it notify the program when memory usage reaches a specified limit, preferably indicating the offending thread.
For example, suppose I want to write a server, hosting a scripting language interpreter, that users can connect to. It's Turing-complete, so programs could theoretically use unlimited memory or time. Suppose each client is handled with a separate thread. If a client writes an infinite loop that consumes memory very quickly, I want to ensure that the thread consumes no more than, say, 1 MB of memory, before being alerted with an exception. I do not want other users to be affected when that happens.
This is probably possible using separate processes and ulimit, but:
I would rather keep it in one program, to avoid the complexity of inter-process communication.
I need to support both Linux and Windows, so I would prefer to keep it platform-agnostic if possible.
Edward Z. Yang and David Mazières have developed an extension to GHC that supports dynamic resource limits, and discuss it at http://ezyang.com/rlimits.html They also provide a version of GHC 7.8 that supports this.
Unfortunately, their work was not included in GHC upstream.
May not be exactly what you want. But, as documented here you have a ghc compile option:
-Ksize, update: Oops, sorry, -K is for stack overflows. Still, you can check that link.
In your example, you may need to modify the source of the scripting language interpreter, make some twists to the memory mgmt. module(s), of course IF it has some managed memory allocation features, the interpreter can complain about an execessive use of memory quota by an API callback to your host application.
A faithful implementation of the actor message-passing semantics means that message contents are deep-copied from a logical point-of-view, even for immutable types. Deep-copying of message contents remains a bottleneck for implementations the actor model, so for performance some implementations support zero-copy message passing (although it's still deep-copy from the programmer's point-of-view).
Is zero-copy message-passing implemented at all in Erlang? Between nodes it obviously can't be implemented as such, but what about between processes on the same node? This question is related.
I don't think your assertion is correct at all - deep copying of inter-process messages isn't a bottleneck in Erlang, and with the default VM build/settings, this is exactly what all Erlang systems are doing.
Erlang process heaps are completely separate from each other, and the message queue is located in the process heap, so messages must be copied. This is also true for transferring data into and out of ETS tables as their data is stored in a separate allocation area from process heaps.
There are a number of shared datastructures however. Large binaries (>64 bytes long) are generally allocated in a node-wide area and are reference counted. Erlang processes just store references to these binaries. This means that if you create a large binary and send it to another process, you're only sending the reference.
Sending data between processes is actually worse in terms of allocation size than you might imagine - sharing inside a term isn't preserved during the copy. This means that if you carefully construct a term with sharing to reduce memory consumption, it will expand to its unshared size in the other process. You can see a practical example in the OTP Efficiency Guide.
As Nikolaus Gradwohl pointed out, there was an experimental hybrid heap mode for the VM which did allow term sharing between processes and enabled zero-copy message passing. It hasn't been a particularly promising experiment as I understand it - it requires extra locking and complicates the existing ability of processes to independently garbage collect. So not only is copying inter-process messages not the usual bottleneck in Erlang systems, allowing it actually reduced performance.
AFAIK there was/is experimental support for zero-copy message-passing in erlang using the -shared or -hybrid modell. I read a blog post in 2009 claiming that it's broken on smp machines, but I have no idea about the current status
As has been mentioned here and in other questions current versions of Erlang basically copy everything except for larger binaries. In older pre-SMP times it was feasible to not copy but pass references. While this resulted in very fast message passing it created other problems in the implementation, primarily it made garbage collection more difficult and complicated implementation. I think that today passing references and having shared data could result in excessive locking and synchronisation which is, of course, not a Good Thing.
I wrote the accepted answer to that other question you're referencing, and in it I give you a direct pointer to this line of code:
message = copy_struct(message, msize, &hp, &bp->off_heap);
This is in a function called when the Erlang run-time system needs to send a message, and it's not inside any kind of "if" that could cause it to be skipped. So, as far as I can tell, the answer is "yes, it's always copied." (That's not strictly true -- there is an "if", but it seems to be dealing with exceptional cases, not the normal code-flow path.)
(I'm ignoring the hybrid heap option brought up by Nikolaus. It looks like he's right, but since this isn't the way Erlang is normally built and it has its own penalties, I don't see that it's worth considering as a way to answer your concern.)
I don't know why you're considering 10 GByte/sec a bottleneck, though. Nothing short of registers or CPU cache goes faster in the computer, and such memories are small, thus constituting a kind of bottleneck themselves. Besides which, the zero-copy idea you're proposing would require locking in the case of cross-CPU message passing in a multi-core system, which is also a bottleneck. We're already paying the locking penalty once in this function to copy the message into the other process's message queue; why pay it again later when that process gets around to reading the message?
Bottom line, I don't think your ideas of ways to make it go faster would actually help much.
I mostly work on C language for my work. I have faced many issues and spent lot time in debugging issues related to dynamically allocated memory corruption/overwriting. Like malloc(A) A bytes but use write more than A bytes. Towards that i was trying to read few things when i read about :-
1.) An approach wherein one allocates more memory than what is needed. And write some known value/pattern in that extra locations. Then during program execution that pattern should be untouched, else it indicated memory corruption/overwriting. But how does this approach work. Does it mean for every write to that pointer which is allocated using malloc() i should be doing a memory read of the additional sentinel pattern and read for its sanity? That would make my whole program very slow.
And to say that we can remove these checks from the release version of the code, is also not fruitful as memory related issues can happen more in 'real scenario'. So can we handle this?
2.) I heard that there is something called HEAP WALKER, which enables programs to detect memory related issues? How can one enable this.
thank you.
-AD.
If you're working under Linux or OSX, have a look at Valgrind (free, available on OSX via Macports). For Windows, we're using Rational PurifyPlus (needs a license).
You can also have a look at Dmalloc or even at Paul Nettle's memory manager which helps tracking memory allocation related bugs.
If you're on Mac OS X, there's an awesome library called libgmalloc. libgmalloc places each memory allocation on a separate page. Any memory access/write beyond the page will immediately trigger a bus error. Note however that running your program with libgmalloc will likely result in a significant slowdown.
Memory guards can catch some heap corruption. It is slower (especially deallocations) but it's just for debug purposes and your release build would not include this.
Heap walking is platform specific, but not necessarily too useful. The simplest check is simply to wrap your allocations and log them to a file with the LINE and FILE information for your debug mode, and most any leaks will be apparent very quickly when you exit the program and numbers don't tally up.
Search google for LINE and I am sure lots of results will show up.
I have a VPS with not very much memory (256Mb) which I am trying to use for Common Lisp development with SBCL+Hunchentoot to write some simple web-apps. A large amount of memory appears to be getting used without doing anything particularly complex, and after a while of serving pages it runs out of memory and either goes crazy using all swap or (if there is no swap) just dies.
So I need help to:
Find out what is using all the memory (if it's libraries or me, especially)
Limit the amount of memory which SBCL is allowed to use, to avoid massive quantities of swapping
Handle things cleanly when memory runs out, rather than crashing (since it's a web-app I want it to carry on and try to clean up).
I assume the first two are reasonably straightforward, but is the third even possible?
How do people handle out-of-memory or constrained memory conditions in Lisp?
(Also, I note that a 64-bit SBCL appears to use literally twice as much memory as 32-bit. Is this expected? I can run a 32-bit version if it will save a lot of memory)
To limit the memory usage of SBCL, use --dynamic-space-size option (e.g.,sbcl --dynamic-space-size 128 will limit memory usage to 128M).
To find out who is using memory, you may call (room) (the function that tells how much memory is being used) at different times: at startup, after all libraries are loaded and then during work (of cource, call (sb-ext:gc :full t) before room not to measure the garbage that has not yet been collected).
Also, it is possible to use SBCL Profiler to measure memory allocation.
Find out what is using all the memory
(if it's libraries or me, especially)
Attila Lendvai has some SBCL-specific code to find out where an allocated objects comes from. Refer to http://article.gmane.org/gmane.lisp.steel-bank.devel/12903 and write him a private mail if needed.
Be sure to try another implementation, preferably with a precise GC (like Clozure CL) to ensure it's not an implementation-specific leak.
Limit the amount of memory which SBCL
is allowed to use, to avoid massive
quantities of swapping
Already answered by others.
Handle things cleanly when memory runs
out, rather than crashing (since it's
a web-app I want it to carry on and
try to clean up).
256MB is tight, but anyway: schedule a recurring (maybe 1s) timed thread that checks the remaining free space. If the free space is less than X then use exec() to replace the current SBCL process image with a new one.
If you don't have any type declarations, I would expect 64-bit Lisp to take twice the space of a 32-bit one. Even a plain (small) int will use a 64-bit chunk of memory. I don't think it'll use less than a machine word, unless you declare it.
I can't help with #2 and #3, but if you figure out #1, I suspect it won't be a problem. I've seen SBCL/Hunchentoot instances running for ages. If I'm using an outrageous amount of memory, it's usually my own fault. :-)
I would not be surprised by a 64-bit SBCL using twice the meory, as it will probably use a 64-bit cell rather than a 32-bit one, but couldn't say for sure without actually checking.
Typical things that keep memory hanging around for longer than expected are no-longer-useful references that still have a path to the root allocation set (hash tables are, I find, a good way of letting these things linger). You could try interspersing explicit calls to GC in your code and make sure to (as far as possible) not store things in global variables.