What is causing a repeated glibc error with plink/batch job software?

I am running plink software through a PBS batch job. This error occurs when I run the job:
*** glibc detected *** /software/plink: double free or corruption (out): 0x000000018dfafca0 ***
======= Backtrace: =========
[0x7d7691]
[0x7d8bea]
[0x45f5ed]
[0x47bb11]
[0x40669a]
[0x7bdb2c]
[0x400209]
However, it only occurs with one of my files (the files are between 30 and 60 GB), and each rerun shows the exact same backtrace. I tried running it outside the batch scheduler and received the same error again, with the same backtrace. I am just using the software (plink) and didn't write it, whereas most of the answers online are about allocating and freeing memory in your own program.
Any ideas on
what is causing this error, and
how I can fix it?

what is causing this error, and
A double free or heap corruption in plink itself.
how I can fix it?
You can't. You can do one of two things, depending on how much you know and understand.
First, build the newest version of plink from source, and see if the problem persists.
If it does not, you are done (or at least you might hope that someone else found and fixed this problem).
If it does, you'll have to debug the problem sufficiently for either you or the plink developers to fix it. Some tools that should help: Valgrind and Address Sanitizer (note: in addition to Clang, Address Sanitizer is also included in GCC 4.8).
Once you have a good report (where the memory was allocated, and where it got corrupted), you should either fix it and submit your fix to plink developers, or give them a bug report with the allocation and corruption location and stack traces.
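As a rough sketch of the Valgrind step, the snippet below builds a Memcheck command line around the binary from the question and can optionally run it. The plink arguments and the log file name are placeholders (assumptions, not taken from the question); the Valgrind flags are standard Memcheck options. Expect a run on a 30-60 GB input to be very slow under Valgrind, and note that a build with debug symbols (-g) makes the stack traces in the report readable.

```python
import shutil
import subprocess


def build_valgrind_command(binary, program_args, log_file="valgrind-plink.log"):
    """Return the argv list for running `binary` under Valgrind's Memcheck."""
    return [
        "valgrind",
        "--tool=memcheck",
        "--track-origins=yes",     # report where uninitialised values came from
        "--num-callers=30",        # deeper stack traces in the report
        f"--log-file={log_file}",  # write the report to a file instead of stderr
        binary,
        *program_args,
    ]


def run_under_valgrind(binary, program_args, log_file="valgrind-plink.log"):
    """Run the program under Valgrind and return the process exit code."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind is not installed or not on PATH")
    command = build_valgrind_command(binary, program_args, log_file)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # "--file mydata" is a hypothetical placeholder for your real plink arguments.
    print(" ".join(build_valgrind_command("/software/plink", ["--file", "mydata"])))
```

The resulting log file is the kind of report (allocation site, invalid free, stack traces) that can be attached to a bug report.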

Related

Not sure how to resolve outofmemory issue on jenkins server?

My Jenkins server keeps crashing, so I generated a heap dump which I then put through VisualVM. It shows most of the memory is being used up by the class java.util.concurrent.ConcurrentHashMap$Node.
My understanding is that loads of objects are being referenced and so cannot be GC'd. As a result, most of the memory is being used up by them. Any idea how to resolve this? I'm new to system admin stuff, so not the most technically proficient, sorry.
TIA
I recently came across an OutOfMemoryError which crashed my Jenkins every 2 days. It was due to an LDAP bug in an old version of Java (see: LDAP error, and the matrix of fixed Java versions).
In my case updating Java fixed the problem.
Anyway, to investigate the OutOfMemoryError I did the following:
restarted Jenkins after the crash,
took incremental thread dumps every half an hour (they can be taken from <jenkinsUrl>/threadDump),
compared the thread dumps, which pointed me to a memory leak on the LDAP threads.
In general I'd also suggest that you:
update Java, Jenkins and its plugins, and any other problematic tools,
investigate Jenkins logs and dumps, and profile the heap (which you already did).
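The comparison of thread dumps can be partly automated. The sketch below assumes each dump was saved as plain text in the jstack style, where every thread starts on a line beginning with its name in double quotes; the format of your saved dumps may differ, and the sample thread names are made up. It groups threads by name with digits collapsed and reports the groups that grew between two dumps.

```python
import re
from collections import Counter

# Assumed format: each thread starts with a line like  "Thread-12" daemon prio=5 ...
THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)


def count_thread_groups(dump_text):
    """Count threads per name group, with runs of digits replaced by '#'."""
    names = THREAD_HEADER.findall(dump_text)
    return Counter(re.sub(r"\d+", "#", name) for name in names)


def growing_groups(older_dump, newer_dump):
    """Return {group: increase} for groups with more threads in the newer dump."""
    before = count_thread_groups(older_dump)
    after = count_thread_groups(newer_dump)
    return {g: after[g] - before[g] for g in after if after[g] > before[g]}


if __name__ == "__main__":
    older = '"Thread-1" daemon\n  at x\n"Handling GET /" prio=5\n'
    newer = (
        '"Thread-1" daemon\n"Thread-2" daemon\n"Thread-3" daemon\n'
        '"Handling GET /" prio=5\n'
    )
    print(growing_groups(older, newer))
```

A group that keeps growing from one dump to the next is a reasonable place to start looking for the leak.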

CVS Memory Allocation Error

Running cvs up -Ad, I'm getting the following error thrown:
cvs [update aborted]: out of memory; can not allocate 333685120 bytes
But upon investigation, (running top), there appears to be enough free memory after all.
Mem: 1025528k total, 521660k used, 503868k free, 48736k buffers
Does anyone know of any common CVS issues that might cause this error to be thrown as something of a red herring? If memory isn't the real problem, any ideas as to what I can do to find out what is?
There are at least 3 solutions for this bug:
A) Uninstall CVSNT, then install CVSNT-x64, because this bug is already fixed in the 64-bit version of CVSNT.
B) Fix this bug in your currently installed (32-bit) cvs.exe binary using the editbin.exe tool from the Visual Studio distribution. Use this command:
editbin.exe /LARGEADDRESSAWARE cvs.exe
C) If you can't reinstall or modify your CVSNT, then as a temporary solution, try updating separate sub-folders and/or separate files in the folder you are trying to update, one-by-one, to make CVSNT allocate less memory.
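As a quick sanity check of the numbers in the question (a sketch, not a diagnosis), the failed request can be compared with the free memory that top reported. The request is smaller than the free RAM, which is consistent with the limit being something other than physical memory, such as the address space available to a 32-bit process.

```python
requested_bytes = 333_685_120  # from the cvs error message
free_kib = 503_868             # "free" value reported by top, in KiB

requested_kib = requested_bytes / 1024
requested_mib = requested_kib / 1024
free_mib = free_kib / 1024

print(f"requested: {requested_mib:.1f} MiB, free: {free_mib:.1f} MiB")

# True means the allocation would have fit in the free RAM shown by top.
fits_in_free_ram = requested_kib < free_kib
print("fits in free RAM:", fits_in_free_ram)
```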

Techniques and tools for debugging problems on remote machines?

Users have been reporting problems/crashes/bugs that I can't reproduce on my machine. I'm finding these problems difficult to fix.
I've started using EurekaLog (fantastic!) and SmartInspect. Both these tools have helped greatly but I'm still finding it difficult to catch some problems.
I've just purchased Debugging by David Agans (and waiting for it to arrive).
Are there any other tools or techniques specific to Delphi that will help with catching these hard-to-find remote problems? The kinds of problems I'm finding difficult to track down are those that don't raise exceptions or have a clear cause. EurekaLog catches exceptions and SmartInspect is pretty good once I have a theory to check. But in some cases it is a seemingly random crash and there are several thousand lines of code that may be at fault. How do I narrow it down to the root cause?
MadExcept is what I use, and it is fabulous. I have also used EurekaLog and find the functionality almost exactly identical, except that I have more experience and time using MadExcept. It is free for non-commercial use, and reasonably priced for commercial use.
Update: MadExcept 4 is now out and even supports 64 bit Delphi XE2 apps, and has memory-leak checking too.
When nothing blows up, I rely on heavy use of trace logging. I have a TraceMessage(integer,string) function which I call throughout all my apps, and when someone has problems I get them to click a menu item that turns up the debug trace level to the most verbose level; it gives me a complete history of everything my application did, and this has helped me even more than madExcept, to solve problems at customer sites. Customers get a crash, and that crash report sent by madexcept contains a log file (created by my app) that is attached automatically. I believe you can do this equally well with madExcept and EurekaLog. If you need a logging system you could download Log4D, or you could write your own, it's pretty simple.
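The trace-level idea can be sketched as follows. This is Python rather than Delphi, and the class and method names are illustrative, not the author's actual TraceMessage implementation: each message carries a level, and only messages at or below the current verbosity are written to the log file.

```python
import datetime
import os
import tempfile


class TraceLog:
    """Minimal leveled trace log: a higher level means more verbose detail."""

    def __init__(self, path, level=1):
        self.path = path
        self.level = level  # current verbosity; raise it to capture more

    def set_level(self, level):
        """Change verbosity, e.g. from a 'verbose debugging' menu item."""
        self.level = level

    def trace_message(self, level, text):
        """Append the message if its level is within the current verbosity."""
        if level > self.level:
            return False
        stamp = datetime.datetime.now().isoformat(timespec="milliseconds")
        with open(self.path, "a", encoding="utf-8") as log_file:
            log_file.write(f"{stamp} [{level}] {text}\n")
        return True


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "app-trace.log")
    log = TraceLog(log_path, level=1)
    log.trace_message(1, "application started")
    log.trace_message(5, "skipped at the default level")
    log.set_level(5)  # the customer turned on verbose tracing
    log.trace_message(5, "now recorded in full detail")
    print(open(log_path, encoding="utf-8").read())
```

Keeping the level check at the top of the call keeps the cost of disabled trace calls low, which is what makes it practical to leave them throughout the application.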
For always-free, try JclDebug, which requires more work to set up, but which has worked fabulously for me, also.
For help with heap problems, learn more about fastMM (full version) debug options.
Don't forget that Delphi itself supports remote debugging. If you can reproduce a crash on machines in your office that don't have Delphi installed, use remote debugging across the office network instead of putting a complete RAD Studio installation on that other machine. You could also use remote debugging to connect to a client PC across the internet, but I have not tried that yet, so I can't say how well it works. I do know that since remote debug doesn't support automatic deployment of the EXE file you built (you have to do that part yourself), remote debugging over the internet to a client PC is more work.
You might also find lots of your problems by fixing all your hints and warnings, and then going through with CodeHealer or Pascal Analyzer (PAL) from Peganza. These static analysis tools can help you find real code problems.
If performance and memory usage are your problems, get the full version of AQTime, and use it to profile and watch your system operate. It will help you fix your memory leaks, and understand your app's runtime behaviour and memory usage, not just leaks but bottlenecks for memory and CPU usage. Some of those bottlenecks can also be the source of some odd problems. I have even used AQTime to help me find deadlocks, since it can generate traces of execution, that can help me figure out what code is running, and locate deadlocks. Update: AQTime is not installable on machines other than your main dev machine, without violating the newly modified license terms for AQTime. These terms were never this restrictive in the good old days.
If you gave more exact idea of what your problems are, I'm sure other people could give you some more ideas that are specific, but all of the above are general techniques that have served me well.
One of the best ways is to use the remote debugger that comes with Delphi, so you can directly debug the application running on the remote machine. The remote debugger is somewhat buggy in some Delphi releases, and you need to follow the instructions carefully to get it working, but when needed it's a tool to consider. Also check whether updates are available for your version; they may come in a separate installer for deployment on "remote" systems. Otherwise, first install the remote debugger, then check whether the installed files have newer versions in your local installation, and copy them to the remote machine.
CodeSite has helped me a lot in these situations. Since XE it is bundled with Delphi.
Logging is the key in this matter.
Take a look at our TSynLog class, available in our Open Source SynCommons library.
It has the JCL Debug / MadExcept features, plus some additional ones (like customer-side profiling and logging):
logging with a set of levels;
fast, low execution overhead;
can load .map file symbols to be used in logging;
compression of .map into binary .mab (900 KB -> 70 KB);
inclusion of the .map/.mab into the .exe;
reading of an external .map to add unit names and line numbers to a log file that was produced without .map information available at execution;
exception logging (Delphi or low-level exceptions) with unit names and line numbers;
optional stack trace with units and line numbers;
methods or procedure recursive tracing, with Enter and auto-Leave using interfaces;
high resolution time stamps, for customer-side profiling of the application execution;
set / enumerates / TList / TPersistent / TObjectList / TContainer / dynamic array JSON serialization;
per-thread or global logging;
multiple log files on the same process;
integrated log archival (in zip or any other format);
Open Source, works from Delphi 5 up to XE.

PermGen problems with Lift and Jetty

I'm developing on the standard Lift platform (maven and jetty). I'm repeatedly (once every couple of days) getting this:
Exception in thread "7048009#qtp-3179125-12" java.lang.OutOfMemoryError: PermGen space
2009-09-15 19:41:38.629::WARN: handle failed
java.lang.OutOfMemoryError: PermGen space
This is in my dev environment. It's not a problem because I can keep restarting the server. In deployment I'm not having these problems so it's not a real issue. I'm just curious.
I don't know too much about the JVM. I think I'm correct in thinking that permanent generation memory is for things like classes and interned strings? What I remember is a bit mixed up with the .NET memory model...
Any reason why this is happening? Are the defaults just crazily low? Is it to do with all the auxiliary objects that Scala has to create for Function objects and similar FP things? Every time I restart Jetty with newly written code (every few minutes) I imagine it re-loads classes etc. But even so, it can't be that many, can it? And shouldn't the JVM be able to deal with a large number of classes?
Cheers
Joe
From this post:
This exception occurred for one simple reason :
the permgenspace is where class properties, such as methods, fields, annotations, and also static variables, etc. are stored in the Java VM, but this space has the particularity of not being cleaned by the garbage collector.
So if your webapp uses or creates a lot of classes (I’m thinking dynamic generations of classes), chances are you met this problem.
Here are some solutions that helped me get rid of this exception :
-XX:+CMSClassUnloadingEnabled : this setting enables garbage collection in the permgenspace
-XX:+CMSPermGenSweepingEnabled : allows the garbage collector to remove even classes from the memory
-XX:PermSize=64M -XX:MaxPermSize=128M : raises the amount of memory allocated to the permgenspace
Maybe this could help.
Edit July 2012 (almost 3 years later):
Ondra Žižka comments (and I have updated the answer above):
JVM 1.6.0_27 says: Please use:
CMSClassUnloadingEnabled (Whether class unloading enabled when using CMS GC)
in place of CMSPermGenSweepingEnabled in the future
See the full Hotspot JVM Options - The complete reference for more.
If you see this when running mvn jetty:run,
set the MAVEN_OPTS.
For Linux:
export MAVEN_OPTS="-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
For Windows:
set "MAVEN_OPTS=-XX:+CMSClassUnloadingEnabled -XX:PermSize=256M -XX:MaxPermSize=512M"
mvn jetty:run
Should be fine now. If not, increase -XX:MaxPermSize.
You can also put these permanently into your environment.
For Linux, append the export line to ~/.bashrc
For Windows, press Win-key + Pause/Break, and go to Advanced > Environment Variables.
See also http://support.microsoft.com/kb/310519.
This is because of the reloading of classes as you suggested. If you are using lots of libraries etc. the sum of classes will rapidly grow for each restart. Try monitoring your jetty instance with VisualVM to get an overview of memory consumption when reloading.
The mailing list (http://groups.google.com/group/liftweb/) is the official support forum for Lift, and where you'll be able to get a better answer. I don't know the particulars of your dev setup (you don't go into much detail), but I assume you're reloading your war in Jetty without actually restarting it. Lift doesn't perform dynamic class generation (as suggested by VonC above), but Scala compiles each closure as a separate class. If you're adding and removing closures in your code over the course of several days, it's possible that too many classes are being loaded and never unloaded, taking up perm space. I'd suggest you enable the JVM options mentioned by VonC above and see if they help.
The permanent generation is where the JVM puts stuff that will probably not be (garbage) collected, such as custom class loaders.
Depending on what you are deploying, the default perm gen setting can be too low. Some application and container combinations leak memory, so when an app gets undeployed, things like class loaders are sometimes not collected; this fills the perm space and produces the error you are seeing.
Unfortunately, the best option in this case is currently to raise the perm space limit with the following JVM flag (example for a 192 MB perm size):
-XX:MaxPermSize=192M (or 256M)
The other option is to make sure that either the container or the framework do not leak memory.

strategies to fix runtime errors

I was wondering what strategies you guys are using to fix runtime errors? Really appreciate if you could share some tips!
Here are some of my thoughts (possibly with the help of gdb):
When a runtime error happens because some memory is wrongly accessed, does the address stored in the core dump show where that memory is?
If I can find the address whose access causes the runtime error, is it possible to find out which variable uses that address (which may be at the beginning or in the middle of the variable's memory)? And can I find out which nearby variables occupy the memory just below and just above that block?
If all of this is possible, will it help to fix the bugs?
Thanks and regards!
I use gdb's --args option to start my programs from the command-line.
Example:
gdb --args foocode --with-super-awesome-option
run
This will load the program foocode and pass the --with-super-awesome-option parameter to it. When the program fails, you'll have a ready-to-use gdb session to work within.
From there you can use the backtrace command:
bt
This will show you the chain of events (function calls) that led to your crash.
