Understanding Heroku's recommendations for Node.js --gc_interval flag


What exactly is --gc_interval, and what are the best practices for setting this value in production if the goal is to distinguish clearly between memory leaks and lazy garbage collection?
Heroku recommends setting --gc_interval=100 in their Node.js best practices documentation. However, I'm having trouble finding any official resources or documentation that explain the precise semantics of this parameter. 100 seems arbitrary.
I have even found this thread, which says "--gc-interval is a debugging flag and not supposed to be used for production", but perhaps that is outdated since the thread is from 2016.
This gist hosted by https://gist.github.com/listochkin says
--gc_interval (garbage collect after allocations)
type: int default: -1
but I have no idea who listochkin is.
I have tried searching for best practices and official documentation for --gc_interval but have been unable to find any.

It seems to trigger a GC after that number of (new) allocations. I think it's left vague on purpose, so that people don't use it much and it stays internal.
Depending on your workload, 100 might be too short, or too long if you mostly handle big buffers.
It's one of the options node passes through to the V8 engine.
Calling node --v8-options:
--gc_interval (garbage collect after <n> allocations)
type: int default: -1
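Going only by the help text above, the flag reads like an allocation countdown. The C sketch below is a hypothetical model of that reading (gc_trigger and its fields are made up for illustration and are not taken from V8), and it assumes that a negative value disables the trigger, as the default of -1 suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "garbage collect after <n> allocations".
 * This is NOT V8's code; it only illustrates the help-text semantics. */
typedef struct {
    int interval;         /* like --gc_interval; negative disables it (default -1) */
    int allocs_since_gc;  /* allocations counted since the last forced GC */
    int forced_gcs;       /* number of collections this trigger requested */
} gc_trigger;

static void gc_trigger_init(gc_trigger *t, int interval) {
    assert(t != NULL);
    t->interval = interval;
    t->allocs_since_gc = 0;
    t->forced_gcs = 0;
}

/* Call once per allocation; returns true when a forced collection is due. */
static bool gc_trigger_on_alloc(gc_trigger *t) {
    if (t->interval < 0)
        return false;  /* disabled: GC timing is left to the normal heuristics */
    if (++t->allocs_since_gc >= t->interval) {
        t->allocs_since_gc = 0;
        t->forced_gcs++;
        return true;
    }
    return false;
}
```

With an interval of 100, 250 allocations request two forced collections and carry 50 allocations over. The trigger counts allocations, not bytes, which is why the same value can be short for one workload and too long for one dominated by big buffers.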
I suggest you instead use the --expose-gc flag and control when the GC happens by calling global.gc() in your code. There is plenty more documentation for that. You can check memory usage with process.memoryUsage() before and after your call to see whether it's a leak or just lazy GC'ing.
Besides these, there are more GC related options you can tune, listed when calling node --v8-options. Here's a great article on how V8's garbage collector works that could bring some clarity on how to choose from those options.
If you really want to find out what it does you might find it in V8's source code, it's called FLAG_gc_interval.

Related

Why is Application Insights Performance Counter Collection causing high CPU usage

Logging performed by our Performance Team has indicated that this line specifically, is killing our CPUs
Microsoft.AI.PerfCounterCollector!Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.Implementation.PerformanceCounterUtility.ExpandInstanceName()
One theory was that the regexes used to identify Perf Counters in the library are recursing
https://adtmag.com/blogs/dev-watch/2016/07/stack-overflow-crash.aspx
I've inspected the Perf Counter names and nothing looks particularly out of kilter regarding the names and the regexes should have no trouble chewing over them. Certainly for large periods of time there are no issues whatsoever.
I've now turned on Application Insights Diagnostic Logging in an attempt to observe the issue (in a test environment).
Has anyone else observed this, how can we mitigate this?
We have ensured DeveloperMode is NOT set to on.

This answer most likely won't be useful now, as the AppInsights 2 code is much improved, but in my case the cause was a double call to AddApplicationInsightsTelemetry(). It adds a collector each time it is called, and due to a lack of synchronization inside the perf counter collector code, this creates CPU spikes.
So avoid calling AddApplicationInsightsTelemetry multiple times, or use AppInsights 2.x (and the best thing is to do both).
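The fix described above amounts to making the registration idempotent. Since the SDK itself is .NET, this is only the general idea sketched in C: register_collector() and add_telemetry() are hypothetical stand-ins, and a C11 atomic_flag ensures the collector is added once no matter how often setup is called.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: register_collector() represents adding the
 * performance-counter collector; add_telemetry() represents the setup call
 * that application code might invoke more than once. */
static int collectors_registered = 0;

static void register_collector(void) {
    collectors_registered++;
}

static atomic_flag collector_added = ATOMIC_FLAG_INIT;

/* Only the first caller registers; later (or concurrent) callers return
 * immediately. They do not wait for registration to finish; use
 * pthread_once or call_once if that matters. */
static void add_telemetry(void) {
    if (!atomic_flag_test_and_set(&collector_added))
        register_collector();
}
```

Calling add_telemetry() three times still registers a single collector.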

Do the counters you're collecting utilize instance placeholders in their names? If the instance name is known at build time, getting rid of placeholders may significantly improve performance. For instance, instead of
\Process(??APP_WIN32_PROC??)\% Processor Time
try using
\Process(w3wp)\% Processor Time
Also, how many counters are you collecting overall?

Getting details on application RAM usage

According to process explorer / task manager my application has a private working set size of around 190MB even while not performing a specific task, which is way more than I would expect it to need. Using FastMM I have validated that none of this is an actual memory leak in a traditional sense.
I have also read the related discussion going on here, which suggests using FastMM's LogMemoryManagerStateToFile(). However, the output generated states "21299K Allocated, 49086K Overhead", which combined (70MB) is way less than what Task Manager suggests.
Is there any way I can find out what causes the huge difference? Might 190MB even be an expected value for an application with ~15 forms? Also, is having 70% overhead "bad", and is there any way of reducing that number?

You can use VMMap from Sysinternals to get a complete overview of the virtual memory address space your process is using. This should allow you to work out the difference you are seeing between Task Manager and FastMM.
I doubt that FastMM reports, or even can report, sections like Mapped File, Shareable, and Page Table, while those sections do occupy private working set.

DDDebug can give you insights about memory allocation by objects in your app. You can monitor changes live.
Test the trial version or check out the introductory video on the website.

Does Erlang always copy messages between processes on the same node?

A faithful implementation of the actor message-passing semantics means that message contents are deep-copied from a logical point of view, even for immutable types. Deep-copying of message contents remains a bottleneck for implementations of the actor model, so for performance some implementations support zero-copy message passing (although it's still deep-copy from the programmer's point of view).
Is zero-copy message-passing implemented at all in Erlang? Between nodes it obviously can't be implemented as such, but what about between processes on the same node? This question is related.

I don't think your assertion is correct at all - deep copying of inter-process messages isn't a bottleneck in Erlang, and with the default VM build/settings, this is exactly what all Erlang systems are doing.
Erlang process heaps are completely separate from each other, and the message queue is located in the process heap, so messages must be copied. This is also true for transferring data into and out of ETS tables as their data is stored in a separate allocation area from process heaps.
There are a number of shared datastructures however. Large binaries (>64 bytes long) are generally allocated in a node-wide area and are reference counted. Erlang processes just store references to these binaries. This means that if you create a large binary and send it to another process, you're only sending the reference.
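A toy C model of the split described above, assuming the >64 byte threshold from this answer; the types and functions are hypothetical and do not mirror the BEAM's internals. Small binaries are copied on send, while large ones only gain a reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model, not BEAM source: binaries above this size live in a shared,
 * reference-counted area; smaller ones live on the owning process's heap. */
#define SHARED_BINARY_THRESHOLD 64

typedef struct {
    size_t refcount;
    size_t size;
    unsigned char *bytes;
} refc_binary;

typedef struct {
    size_t size;
    unsigned char *private_bytes; /* small binary: owned copy */
    refc_binary *shared;          /* large binary: shared, refcounted */
} binary_term;

/* Error handling for malloc is omitted to keep the toy short. */
static binary_term make_binary(const unsigned char *src, size_t size) {
    binary_term t = { size, NULL, NULL };
    if (size > SHARED_BINARY_THRESHOLD) {
        t.shared = malloc(sizeof *t.shared);
        t.shared->refcount = 1;
        t.shared->size = size;
        t.shared->bytes = malloc(size);
        memcpy(t.shared->bytes, src, size);
    } else {
        t.private_bytes = malloc(size > 0 ? size : 1);
        memcpy(t.private_bytes, src, size);
    }
    return t;
}

/* "Sending" to another process: small binaries are copied byte for byte,
 * large ones only bump the reference count. */
static binary_term send_binary(const binary_term *t) {
    if (t->shared) {
        t->shared->refcount++;
        return *t;
    }
    return make_binary(t->private_bytes, t->size);
}

static void release_binary(binary_term *t) {
    if (t->shared) {
        assert(t->shared->refcount > 0);
        if (--t->shared->refcount == 0) {
            free(t->shared->bytes);
            free(t->shared);
        }
    } else {
        free(t->private_bytes);
    }
    t->shared = NULL;
    t->private_bytes = NULL;
}
```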
Sending data between processes is actually worse in terms of allocation size than you might imagine - sharing inside a term isn't preserved during the copy. This means that if you carefully construct a term with sharing to reduce memory consumption, it will expand to its unshared size in the other process. You can see a practical example in the OTP Efficiency Guide.
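To see why losing sharing matters, the hypothetical C sketch below builds a term in which each level points twice to the level below, then copies it without tracking sharing. It mimics the effect described here; it is not the BEAM's actual copy routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy term (not BEAM's representation): a pair whose two fields may point
 * to the same subterm, i.e. a term built with sharing. */
typedef struct pair {
    struct pair *left;
    struct pair *right;
} pair;

static pair *mk_pair(pair *left, pair *right) {
    pair *p = malloc(sizeof *p);
    assert(p != NULL);  /* toy: abort on out-of-memory */
    p->left = left;
    p->right = right;
    return p;
}

/* A copy that does not track sharing: a subterm reachable twice is copied
 * twice, so the copy can be exponentially larger than the original. */
static pair *copy_unshared(const pair *p, size_t *cells) {
    if (p == NULL) return NULL;
    (*cells)++;
    return mk_pair(copy_unshared(p->left, cells), copy_unshared(p->right, cells));
}

/* Safe only for trees without sharing, such as the result of copy_unshared. */
static void free_unshared(pair *p) {
    if (p == NULL) return;
    free_unshared(p->left);
    free_unshared(p->right);
    free(p);
}
```

Starting from one leaf and wrapping it ten times, the original has 11 distinct cells, while the unshared copy needs 2047, since each extra level doubles the copied size.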
As Nikolaus Gradwohl pointed out, there was an experimental hybrid heap mode for the VM which did allow term sharing between processes and enabled zero-copy message passing. It hasn't been a particularly promising experiment as I understand it - it requires extra locking and complicates the existing ability of processes to independently garbage collect. So not only is copying inter-process messages not the usual bottleneck in Erlang systems, allowing it actually reduced performance.

AFAIK there was/is experimental support for zero-copy message passing in Erlang using the -shared or -hybrid model. I read a blog post in 2009 claiming that it's broken on SMP machines, but I have no idea about the current status.

As has been mentioned here and in other questions, current versions of Erlang basically copy everything except for larger binaries. In older pre-SMP times it was feasible to not copy but pass references. While this resulted in very fast message passing, it created other problems in the implementation: primarily, it made garbage collection more difficult and complicated the implementation. I think that today passing references and having shared data could result in excessive locking and synchronisation, which is, of course, not a Good Thing.

I wrote the accepted answer to that other question you're referencing, and in it I give you a direct pointer to this line of code:
message = copy_struct(message, msize, &hp, &bp->off_heap);
This is in a function called when the Erlang run-time system needs to send a message, and it's not inside any kind of "if" that could cause it to be skipped. So, as far as I can tell, the answer is "yes, it's always copied." (That's not strictly true -- there is an "if", but it seems to be dealing with exceptional cases, not the normal code-flow path.)
(I'm ignoring the hybrid heap option brought up by Nikolaus. It looks like he's right, but since this isn't the way Erlang is normally built and it has its own penalties, I don't see that it's worth considering as a way to answer your concern.)
I don't know why you're considering 10 GByte/sec a bottleneck, though. Nothing short of registers or CPU cache goes faster in the computer, and such memories are small, thus constituting a kind of bottleneck themselves. Besides which, the zero-copy idea you're proposing would require locking in the case of cross-CPU message passing in a multi-core system, which is also a bottleneck. We're already paying the locking penalty once in this function to copy the message into the other process's message queue; why pay it again later when that process gets around to reading the message?
Bottom line, I don't think your ideas of ways to make it go faster would actually help much.

Detect memory intrusion

There are software applications, such as ArtMoney, that edit the memory of other applications.
Is there a way to detect when some other application is editing the memory of my application?

The basic idea to protect from basic memory modification is to encrypt the parts of memory you care about, and have redundant checks to ensure against modification.
None of which will stop a determined hacker, but it's sufficient to keep the script kiddies out of your address space.
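A minimal C sketch of "encrypt plus redundant check", assuming a simple XOR mask and a complemented second copy (guarded_u32 and its helpers are hypothetical). As the answer says, this only keeps out casual tools that search for a plain value and overwrite it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guarded value: stored masked with a key, plus an
 * independently encoded copy, so editing one copy alone is detected. */
typedef struct {
    uint32_t masked;  /* value ^ key: the plain value never sits in memory */
    uint32_t check;   /* ~value: redundant copy for consistency checks */
    uint32_t key;     /* in practice this would be chosen at random */
} guarded_u32;

static void guarded_set(guarded_u32 *g, uint32_t value, uint32_t key) {
    g->key = key;
    g->masked = value ^ key;
    g->check = ~value;
}

/* Returns false if the two encodings disagree, i.e. memory was edited. */
static bool guarded_get(const guarded_u32 *g, uint32_t *out) {
    uint32_t value = g->masked ^ g->key;
    if (value != (uint32_t)~g->check)
        return false;
    *out = value;
    return true;
}
```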

One method, used by many virus checkers, is to perform a checksum of your executable or memory and save it. When running, occasionally calculate a new checksum and compare with the original. Most programs don't intentionally modify their executables.
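Applied to data in memory, the checksum idea could look like the following C sketch. FNV-1a is an arbitrary choice of hash, and integrity_watch and its functions are hypothetical names. An attacker can also patch the checking code itself, so treat this as a tripwire, not protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit; any stable hash or CRC would do for this purpose. */
static uint32_t fnv1a32(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;  /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;        /* FNV prime */
    }
    return h;
}

/* Hypothetical watcher: remember a region's checksum, re-check it later. */
typedef struct {
    const void *region;
    size_t len;
    uint32_t baseline;
} integrity_watch;

static void watch_init(integrity_watch *w, const void *region, size_t len) {
    w->region = region;
    w->len = len;
    w->baseline = fnv1a32(region, len);
}

/* Call occasionally (on a timer, before using the data, ...). */
static bool watch_intact(const integrity_watch *w) {
    return fnv1a32(w->region, w->len) == w->baseline;
}
```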

The short answer is no, it's not possible in the general case. Even if you implement some of the suggestions that have been given, there's nothing stopping someone from patching the code that performs the checks.
I don't know the specifics of how ArtMoney works, but if it functions as a debugger you could try checking regularly to see if DebugHook <> 0, and reacting appropriately if it is. (Just make sure to put that code in a {$IFNDEF DEBUG} block so it doesn't cause trouble for you!)
You might want to ask yourself why you want to prevent people from patching your memory, though. Unless there's a genuine security issue, you probably shouldn't even try. Remember that the user's computer, that your program will be running on, is their property, not yours, and if you interfere too much with the user's choices as to what to do with their property, your program is morally indistinguishable from malware.

I do not know how it works, but I think it can be done in 3 ways:
ReadProcessMemory and WriteProcessMemory Windows API
using a debugger (check for DebugHook, but that's almost too easy, so it won't use that)
injecting a DLL so it can access all memory (because it is in the same process)
The last one is easier to detect (check for an injected DLL or something like that). The first one is trickier, but I found some articles about it:
Memory breakpoints: http://www.codeproject.com/KB/security/AntiReverseEngineering.aspx?fid=1529949&fr=51&df=90&mpp=25&noise=3&sort=Position&view=Quick#BpMem
Hook "WriteProcessMemory" api: http://www.codeproject.com/KB/system/hooksys.aspx

I asked a similar question, and the conclusion was basically that you cannot stop this.
How can I increase memory security in Delphi

How to log mallocs

This is a bit hypothetical and grossly simplified but...
Assume a program that will be calling functions written by third parties. These parties can be assumed to be non-hostile but can't be assumed to be "competent". Each function will take some arguments, have side effects and return a value. They have no state while they are not running.
The objective is to ensure they can't cause memory leaks by logging all mallocs (and the like) and then freeing everything after the function exits.
Is this possible? Is this practical?
p.s. The important part to me is ensuring that no allocations persist so ways to remove memory leaks without doing that are not useful to me.

You don't specify the operating system or environment, this answer assumes Linux, glibc, and C.
You can set __malloc_hook, __free_hook, and __realloc_hook to point to functions which will be called from malloc(), free(), and realloc() respectively. There is a __malloc_hook manpage showing the prototypes. You can track allocations in these hooks, then return to let glibc handle the memory allocation/deallocation.
It sounds like you want to free any live allocations when the third-party function returns. There are ways to have gcc automatically insert calls at every function entrance and exit using -finstrument-functions, but I think that would be inelegant for what you are trying to do. Can you have your own code call a function in your memory-tracking library after calling one of these third-party functions? You could then check if there are any allocations which the third-party function did not already free.
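A minimal sketch of the bookkeeping, assuming allocations from the third-party code can be routed to tracked_malloc() and tracked_free() (hypothetical names). glibc 2.34 removed the __malloc_hook family from its API, so on current systems that routing would need another mechanism, such as symbol interposition; that part is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracking layer: every allocation made while a third-party
 * call is active is recorded, and anything still live afterwards is freed.
 * How third-party malloc/free calls get routed here is out of scope. */
#define MAX_TRACKED 1024

static void *live[MAX_TRACKED];
static size_t live_count = 0;

static void *tracked_malloc(size_t size) {
    if (live_count == MAX_TRACKED)
        return NULL;                      /* table full: refuse the request */
    void *p = malloc(size);
    if (p != NULL)
        live[live_count++] = p;
    return p;
}

static void tracked_free(void *p) {
    if (p == NULL)
        return;
    for (size_t i = 0; i < live_count; i++) {
        if (live[i] == p) {
            live[i] = live[--live_count]; /* swap-remove from the table */
            free(p);
            return;
        }
    }
    /* Unknown pointer: a real implementation would log this, not guess. */
}

/* Call right after the third-party function returns; returns how many
 * allocations it left behind (all of which are now freed). */
static size_t reclaim_leaks(void) {
    size_t leaked = live_count;
    while (live_count > 0)
        free(live[--live_count]);
    return leaked;
}
```

The linear search in tracked_free() keeps the sketch short; a hash table would avoid it for functions that allocate heavily.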

First, you have to provide the entrypoints for malloc() and free() and friends. Because this code is compiled already (right?) you can't depend on #define to redirect.
Then you can implement these in the obvious way and log that they came from a certain module by linking those routines to those modules.
The fastest way involves no logging at all. If the amount of memory they use is bounded, why not pre-allocate all the "heap" they'll ever need and write an allocator out of that? Then when it's done, free the entire "heap" and you're done! You could extend this idea to multiple heaps if it's more complex than that.
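The pre-allocated heap can be sketched as a bump allocator in C. arena, arena_alloc() and the other names are hypothetical, and getting the third-party code to allocate from the arena still requires redirecting its malloc() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal bump ("arena") allocator: one up-front block, allocations carve
 * from it, and releasing the arena frees everything at once. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena;

static int arena_init(arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(arena *a, size_t size) {
    /* Round the offset up so every block is aligned for any object type. */
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start)
        return NULL;  /* the assumed bound was exceeded */
    a->used = start + size;
    return a->base + start;
}

/* Forget every allocation but keep the memory for the next call. */
static void arena_reset(arena *a) {
    a->used = 0;
}

/* "Free the entire heap": one call, no per-block bookkeeping. */
static void arena_release(arena *a) {
    free(a->base);
    a->base = NULL;
    a->capacity = a->used = 0;
}
```

Individual blocks are never freed in this scheme; arena_reset() between third-party calls reuses the memory, and arena_release() hands it back.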
If you really do need to "log" and not make your own allocator, here are some ideas. One, use a hash table with pointers and internal chaining. Another would be to allocate extra space in front of every block and put your own structure there containing, say, an index into your "log table," then keep a free-list of log table entries (as a stack so getting a free one or putting a free one back is O(1)). This takes more memory but should be fast.
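The second idea (a header in front of every block plus a stack of free log-table slots) could look like this C sketch; the table size, the names and the assert-based sanity check are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: a header in front of each block stores the block's
 * slot in a fixed log table; unused slots are kept on a stack, so both
 * recording and removing an allocation are O(1). */
#define LOG_SLOTS 256

typedef union {
    size_t slot;        /* index into log_table */
    max_align_t align;  /* keeps the memory after the header aligned */
} block_header;

static block_header *log_table[LOG_SLOTS]; /* live blocks; NULL = slot unused */
static size_t free_slots[LOG_SLOTS];       /* stack of unused slot indices */
static size_t free_top = 0;
static int log_ready = 0;

static void log_setup(void) {
    for (size_t i = 0; i < LOG_SLOTS; i++) {
        log_table[i] = NULL;
        free_slots[i] = LOG_SLOTS - 1 - i;  /* slot 0 is popped first */
    }
    free_top = LOG_SLOTS;
    log_ready = 1;
}

static void *logged_malloc(size_t size) {
    if (!log_ready) log_setup();
    if (free_top == 0 || size > SIZE_MAX - sizeof(block_header))
        return NULL;                        /* log table full or size overflow */
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->slot = free_slots[--free_top];       /* O(1) pop */
    log_table[h->slot] = h;
    return h + 1;                           /* caller's memory follows the header */
}

static void logged_free(void *p) {
    if (p == NULL) return;
    block_header *h = (block_header *)p - 1;
    assert(log_table[h->slot] == h);        /* debug sanity check, not a full defense */
    log_table[h->slot] = NULL;
    free_slots[free_top++] = h->slot;       /* O(1) push */
    free(h);
}

/* After the third-party call returns: free whatever is still logged. */
static size_t logged_free_all(void) {
    size_t n = 0;
    for (size_t i = 0; i < LOG_SLOTS; i++) {
        if (log_table[i] != NULL) {
            free(log_table[i]);
            log_table[i] = NULL;
            free_slots[free_top++] = i;
            n++;
        }
    }
    return n;
}
```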
Is it practical? I think it is, so long as the speed-hit is acceptable.

You could run the third party functions in a separate process and close the process when you are done using the library.
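On a POSIX system, the separate-process approach can be sketched with fork() and waitpid(). run_isolated() and leaky_plugin() are hypothetical, and passing the result back through the 8-bit exit status is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical third-party function that leaks on every call. */
static int leaky_plugin(int n) {
    int *scratch = malloc(1024 * sizeof *scratch);  /* never freed */
    if (scratch == NULL) return 0;
    scratch[0] = n * 2;
    return scratch[0];
}

/* POSIX sketch: run fn(arg) in a child process, so anything it leaks is
 * reclaimed by the OS when the child exits. Only 8 bits of result come back
 * through the exit status; real code would use a pipe for richer results. */
static int run_isolated(int (*fn)(int), int arg, int *result) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                         /* fork failed */
    if (pid == 0)
        _exit(fn(arg) & 0xFF);             /* child: report and exit */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;                         /* wait failed or child crashed */
    *result = WEXITSTATUS(status);
    return 0;
}
```

Besides reclaiming leaks, this also means a crash in the third-party code no longer takes down the caller.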

A better solution than attempting to log mallocs might be to sandbox the functions when you call them: give them access to a fixed segment of memory and then free that segment when the function is done running.
Unconfined, incompetent memory usage can be just as damaging as malicious code.

Can't you just force them to allocate all their memory on the stack? This way it would be guaranteed to be freed after the function exits.

In the past I wrote a software library in C that had a memory management subsystem that contained the ability to log allocations and frees, and to manually match each allocation and free. This was of some use when attempting to find memory leaks, but it was difficult and time consuming to use. The number of logs was overwhelming, and it took an extensive amount of time to understand the logs.
That being said, if your third party library makes extensive allocations, it's more than likely impractical to track this via logging. If you're running in a Windows environment, I would suggest using a tool such as Purify[1] or BoundsChecker[2] that should be able to detect leaks in your third party libraries. The investment in the tool should pay for itself in time saved.
[1]: http://www-01.ibm.com/software/awdtools/purify/ Purify
[2]: http://www.compuware.com/products/devpartner/visualc.htm BoundsChecker

Since you're worried about memory leaks and talking about malloc/free, I assume you're in C. I'm also assuming based on your question that you do not have access to the source code of the third party library.
The only thing I can think of is to examine memory consumption of your app before & after the call, log error messages if they're different and convince the third party vendor to fix any leaks you find.

If you have money to spare, then consider using Purify to track issues. It works wonders, and does not require source code or recompilation. There are also other debugging malloc libraries available that are cheaper. Electric Fence is one name I recall. That said, the debugging hooks mentioned by Denton Gentry seem interesting too.

If you're too poor for Purify, try Valgrind. It is a lot better than it was 6 years ago and a lot easier to dive into than Purify.

Microsoft Windows provides (use SUA if you need POSIX) quite possibly the most advanced heap debugging infrastructure (covering the heap and other APIs known to use the heap) of any shipping OS today.
The __malloc() debug hooks and the associated CRT debug interfaces are nice for cases where you have the source code to the tests; however, they can often miss allocations made by standard libraries or other linked code. This is expected, as they are the Visual Studio heap debugging infrastructure.
gflags is a very comprehensive and detailed set of debugging capabilities which has been included with Windows for many years. It has advanced functionality for both source and binary-only use cases (as it is the OS heap debugging infrastructure).
It can log full stack traces of all heap users, for all heap-modifying entry points, serially if needed (symbolic information is applied in a post-processing step). It can also lay out the heap in pathological ways so that the page protection offered by the VM system is used optimally (i.e. allocate your requested heap block at the end of a page, so that even a single-byte overflow is detected at the time of the overflow).
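The end-of-page placement can be reproduced on POSIX systems with mmap() and mprotect(). The C sketch below shows only the technique and is not how gflags is implemented; guarded_alloc() and guarded_free() are hypothetical names.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed for `size` bytes (at least one). */
static size_t data_pages_for(size_t size, size_t page) {
    size_t n = (size + page - 1) / page;
    return n > 0 ? n : 1;
}

/* The block ends exactly where an inaccessible guard page begins, so even a
 * one-byte overrun faults immediately. The returned pointer is only as
 * aligned as `size` allows, a trade-off page-heap tools also make. */
static void *guarded_alloc(size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = data_pages_for(size, page);
    size_t total = (pages + 1) * page;              /* data pages + 1 guard page */
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    unsigned char *guard = base + pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return guard - size;                            /* last byte sits right before the guard */
}

static void guarded_free(void *p, size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = data_pages_for(size, page);
    unsigned char *guard = (unsigned char *)p + size;
    munmap(guard - pages * page, (pages + 1) * page);
}
```

Writing p[size] on a block from guarded_alloc(size) would hit the PROT_NONE page and raise SIGSEGV at the faulting instruction, which is the behavior described above.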
umdh is a tool which can help assess the status at various checkpoints; however, the data is continually accumulated during the execution of the target, so it is not a simple checkpointing debug stop in the traditional sense. Also, WARNING: last I checked, at least, the total size of the circular buffer which stores the stack information for each request is somewhat small (64k entries (entries+stack)), so you may need to dump rapidly for heavy heap users. There are other ways to access this data, but umdh is fairly simple.
NOTE: there are 2 modes:
MODE 1, umdh {-p:Process-id|-pn:ProcessName} [-f:Filename] [-g]
MODE 2, umdh [-d] {File1} [File2] [-f:Filename]
I do not know what insanity gripped the developer who chose to alternate between -p:foo argument specifiers and naked ordering of arguments, but it can get a little confusing.
The debugging SDK works with a number of other tools. memsnap is a tool which apparently focuses on memory leaks and such, but I have not used it; your mileage may vary.
Execute gflags with no arguments for the UI mode; +args and /args are also different "modes" of use.

On Linux I've successfully used mtrace(3) to log allocations and freeings. Its usage is as simple as
Modify your program to call mtrace() when you need to begin tracing (e.g. at the top of main()),
Set environment variable MALLOC_TRACE to the file path where the trace should be saved and run the program.
After that the output file will contain something like this (excerpt from the middle to show a failed allocation):
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x99e5e20 0x49
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x99beba0
# /usr/lib/tls/libnvidia-tls.so.390.116:[0xf44b795c] + 0x9a23ec0 0x10
# /opt/gcc-7/lib/libstdc++.so.6:(_ZdlPv+0x18)[0xf6a80f78] - 0x9a23ec0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + 0x99c67c0 0x8
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668f14f] - 0x99c67c0
# /opt/Xorg/lib/video-libs/libGL.so.1:[0xf668ee49] + (nil) 0x30000000
# /lib/libc.so.6:[0xf677f8eb] + 0x99c21f0 0x158
# /lib/libc.so.6:(_IO_file_doallocate+0x91)[0xf677ee61] + 0xbfb00480 0x400
# /lib/libc.so.6:(_IO_setb+0x59)[0xf678d7f9] - 0xbfb00480
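A minimal program for the steps above might look like this, assuming glibc (mtrace() and muntrace() are declared in <mcheck.h>). Since glibc 2.34 the tracing implementation lives in libc_malloc_debug.so, which has to be preloaded (LD_PRELOAD) for a trace to be written.

```c
#include <assert.h>
#include <mcheck.h>
#include <stdlib.h>

/* glibc-only sketch. Run with e.g. MALLOC_TRACE=/tmp/trace.txt set;
 * without MALLOC_TRACE, mtrace() has no effect. */
static int traced_section(void) {
    mtrace();                     /* start logging malloc/free */
    char *kept = malloc(32);      /* freed below: logged as + then - */
    char *leaked = malloc(64);    /* never freed: shows up as a leak */
    free(kept);
    muntrace();                   /* stop logging */
    return leaked != NULL;
}
```

Passing the program and the trace file to the mtrace(1) script then lists the allocations that were never freed.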
