Java Memory Usage, many contradictory numbers

I am running multiple instances of a Java web app (Play Framework).
The longer the web apps run, the less memory is available, until I restart them. Sometimes I get an OutOfMemoryError.
I am trying to find the problem, but I am getting a lot of contradictory numbers, so I am having trouble locating the source.
These are the details:
Ubuntu 14.04.5 LTS with 12 GB of RAM
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
EDIT:
Here are the JVM settings:
-Xms64M
-Xmx128m
-server
(I am not 100% sure whether these parameters are actually passed to the JVM, since I am using an /etc/init.d script with start-stop-daemon, which starts the Play Framework launcher script, which in turn starts the JVM.)
This is how I start it:
start() {
echo -n "Starting MyApp"
sudo start-stop-daemon --background --start \
--pidfile ${APPLICATION_PATH}/RUNNING_PID \
--chdir ${APPLICATION_PATH} \
--exec ${APPLICATION_PATH}/bin/myapp \
-- \
-Dinstance.name=${NAME} \
-Ddatabase.name=${DATABASE} \
-Dfile.encoding=utf-8 \
-Dsun.jnu.encoding=utf-8 \
-Duser.country=DE \
-Duser.language=de \
-Dhttp.port=${PORT} \
-J-Xms64M \
-J-Xmx128m \
-J-server \
-J-XX:+HeapDumpOnOutOfMemoryError \
>> $LOGFILE 2>&1
}
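A quick way to check whether those -J options actually reach the JVM (just a sketch, assuming the standard JDK tools are on the PATH of the server) is to ask the running process for its flags:
# lists every running JVM together with the options it was actually started with
jps -lvm
# or, for one specific process, print the final VM flag values
jinfo -flags <mypid>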
I am picking one instance of the web apps now:
htop shows 4615M of VIRT and 338M of RES.
When I create a heap dump with jmap -dump:live,format=b,file=mydump.dump <mypid> the file has only about 50MB.
When I open it in Eclipse MAT the overview shows "20.1MB" of used memory (with the "Keep unreachable objects" option set to ON).
So how can 338MB shown in htop shrink to 20.1MB in Eclipse MAT?
I don't think this is GC related, because no matter how long I wait, htop always shows about this amount of memory; it never goes down.
In fact, I would assume that my simple app does not use more than 20MB, maybe 30MB.
I compared two heap dumps taken 4 hours apart with Eclipse MAT and I don't see any significant increase in objects.
PS: I added the -XX:+HeapDumpOnOutOfMemoryError option, but I will have to wait 5-7 days until it happens again. I hope to find the problem earlier with your help interpreting my numbers.
Thank you,
schube

The heap is the memory containing the Java objects; htop certainly doesn't know about the heap. Among the things that contribute to the used memory reported as VIRT are:
The JVM’s own code and that of the required libraries
The byte code and meta information of the loaded classes
The JIT compiled code of frequently used methods
I/O buffers
Thread stacks
Memory allocated for the heap, but currently not containing live objects
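To see these contributions for a concrete process, you can look at its memory mappings; a minimal sketch, assuming the usual Linux procps tools are installed:
# every mapping that adds to VIRT: the heap reservation, thread stacks, shared libraries, JIT code cache, ...
pmap -x <mypid>
# VSZ/RSS summary for comparison with the htop numbers
ps -o pid,vsz,rss,cmd -p <mypid>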
When you dump the heap, it will contain the live Java objects, plus meta information that allows the content to be interpreted, like class and field names. When a tool calculates the used heap, it counts the objects only, so the result will naturally be smaller than the heap dump file size. Also, this used-memory figure often does not include the memory that is unusable due to padding/alignment; further, the tools sometimes assume the wrong pointer size, as the relevant information (32 bit architecture vs 64 bit architecture vs compressed oops) is not available in the heap dump. These errors may add up.
Note that there might be other reasons for an OutOfMemoryError than having too many objects in the heap. E.g. there might be too much meta information, due to a memory leak combined with dynamic class loading, or too many native I/O buffers…
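If you want to see where the non-heap memory goes, one option is the JVM's own Native Memory Tracking; a sketch, assuming you can restart the JVM with the extra flag (with the Play launcher script that would be -J-XX:NativeMemoryTracking=summary):
# once the app is running again, print a breakdown of heap, metaspace, thread stacks, code cache, ...
jcmd <mypid> VM.native_memory summary
# and the heap configuration/usage as the JVM itself sees it, including committed but unused space
jmap -heap <mypid>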

Related

Using Docker to install QIIME2

Can anyone help me figure out why it took around 20GB of my C drive to install QIIME2 through Docker?
Thank you!
Before installing QIIME2, I had 30GB free on my C drive, but only 8GB remains after installation.
The short answer to that question is: QIIME2 is pretty big. But I'm sure you knew that already, so let's dig into the details.
First, the QIIME image is roughly 12GB when uncompressed. (This raises the question of where the other 8GB went if you lost 20GB in total. I don't have an answer to that.)
Using a tool called dive, I can explore the QIIME image, and see where that disk space is going. There's one entry that stands out in the log:
5.9 GB |1 QIIME2_RELEASE=2022.8 /bin/sh -c chmod -R a+rwx /opt/conda
For reference, chmod is a command which changes the permissions of files and directories without changing their contents. Yet this single command is responsible for half the size of the image. It turns out this is due to the way Docker works internally: if a layer changes the metadata or permissions of a file, then the whole file must be re-included in that layer.
The remainder is 6GB, which comes mostly from a step where QIIME installs all of its dependencies. That's fairly reasonable for a project packaged with conda.
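If you want to reproduce this kind of per-layer breakdown without installing dive, docker history gives a rough view; a sketch, where the image tag (quay.io/qiime2/core:2022.8) is an assumption and should be whatever you actually pulled:
# one line per layer; the largest contributors are easy to spot
docker history --no-trunc --format "{{.Size}}\t{{.CreatedBy}}" quay.io/qiime2/core:2022.8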
To summarize, it's an intersection of three factors:
Conda is fairly space-hungry, compared to equivalent pip packages.
QIIME has a lot of features and dependencies.
Every dependency is included twice.
Edit: this is now fixed in version 2022.11.

{jpillora/chisel} High RAM usage on ARM by Chisel

I am using jpillora's Chisel for WebSockets. I needed to use Chisel on ARM. I cross compiled it and reduced the binary size using the following two commands:
env GOOS=linux GOARCH=arm go build -ldflags "-w -s"
~/go/src/github.com/pwaller/goupx/goupx --brute chisel
However, when I run the chisel binary on the ARM board (512MB RAM), I find that it is taking a huge amount of RAM.
The "top" command reports a usage of 161% and 775m! However, the difference in the output of the free command taken before and after starting the chisel client is only ~6MB.
I ran strace too, and the sum of all mmap2 memory allocated is 700MB+.
The command I executed to connect to the server:
./chisel client --fingerprint <> 10.137.12.88:2002 127.0.0.1:9191:10.137.12.88:9191
Is there some way to optimize / reduce the RAM usage on Chisel?
Any pointers would be helpful!
Thanks,
I was able to reduce the VSZ to ~279m (i.e. by ~60%) by modifying the arenaSizes in malloc.go (/usr/local/go/src/runtime/malloc.go).
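Note that a large VSZ is mostly reserved address space (the Go runtime reserves its arenas up front), which matches your observation that free barely changes; what actually consumes the 512MB of RAM is the resident set. A quick way to compare the two, as a sketch assuming a procps-style ps and that the binary is named chisel:
# VSZ is reserved virtual address space, RSS is memory actually backed by RAM
ps -o pid,vsz,rss,comm -C chisel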

Spark Cloudera - Worker Memory Setting [duplicate]

I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change used memory?
In Spark 1.0.0+, when using spark-shell or spark-submit, use the --executor-memory option. E.g.
spark-shell --executor-memory 8G ...
In 0.9.0 and under:
When you start a job or start the shell, change the memory there. We had to modify the spark-shell script so that it would carry command-line arguments through as arguments for the underlying Java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the Spark context and pass the value in as a command-line argument (I can make it shorter than the long-winded system props then).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the default cluster-wide, sorry, I am not entirely sure how to do it properly.
One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.
In Spark 1.1.1, to set the max memory of workers, write this in conf/spark-env.sh:
export SPARK_EXECUTOR_MEMORY=2G
If you have not used the config file yet, copy the template file
cp conf/spark-env.sh.template conf/spark-env.sh
Then make the change and don't forget to source it
source conf/spark-env.sh
In my case, I use an IPython notebook server to connect to Spark, and I want to increase the memory for the executor.
This is what I do:
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster(CLUSTER_URL).setAppName('ipython-notebook').set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)
According to the Spark documentation you can change the memory per node with the command-line argument --executor-memory when submitting your application. E.g.
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master.node:7077 \
--executor-memory 8G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
I've tested and it works.
The default configuration for the worker is to allocate Host_Memory - 1GB for each worker. The configuration parameter to manually adjust that value is SPARK_WORKER_MEMORY, like in your question:
export SPARK_WORKER_MEMORY=6g

How to change memory in EMR hadoop streaming job

I am trying to overcome the following error in a hadoop streaming job on EMR.
Container [pid=30356,containerID=container_1391517294402_0148_01_000021] is running beyond physical memory limits
I tried searching for answers but the one I found isn't working. My job is launched as shown below.
hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-input determinations/part-00000 \
-output determinations/aggregated-0 \
-mapper cat \
-file ./det_maker.py \
-reducer det_maker.py \
-Dmapreduce.reduce.java.opts="-Xmx5120M"
The last line above is supposed to do the trick as far as I understand, but I get the error:
ERROR streaming.StreamJob: Unrecognized option: -Dmapreduce.reduce.java.opts="-Xmx5120M"
What is the correct way to change the memory usage?
Also, is there some documentation that explains these things to n00bs like me?
You haven't elaborated on which memory you are running low on, physical or virtual.
For both problems, take a look at Amazon's documentation:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html
Usually the solution is to increase the amount of memory per mapper, and possibly reduce the number of mappers:
s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapreduce.map.memory.mb=4000
s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.tasktracker.map.tasks.maximum=2
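As a side note on the "Unrecognized option" error itself: with Hadoop streaming, generic options such as -D have to come before the streaming-specific options (-input, -mapper, ...). A reordered invocation might look like this sketch; the memory values are only examples, not recommendations:
hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-D mapreduce.reduce.memory.mb=6144 \
-D mapreduce.reduce.java.opts=-Xmx5120M \
-input determinations/part-00000 \
-output determinations/aggregated-0 \
-mapper cat \
-file ./det_maker.py \
-reducer det_maker.py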

AIX 6.1, tar issue

On AIX 6.1, I use Java to execute a tar command to extract a tar package. One strange thing I encountered is that some files with long names in this tar package fail to be extracted to where they should be and instead end up in the current working folder. The file owners of these files are not correct either.
I googled and found that there are many posts recommending GNU tar instead to avoid the long file name issue, but I am sure this is not the same issue as the one I am facing.
Does anyone know why this happens? Any tips are much appreciated. Thanks.
The man pages are pretty instructive on this topic. Probably your tar file is not strictly POSIX compatible. On AIX:
The prefix buffer can be a maximum of 155 bytes and the name buffer can
hold a maximum of 100 bytes. If the path name cannot be split into
these two parts by a slash, it cannot be archived.
The Linux man page for GNU tar says it can handle a variety of tar file format variants. One of these is the 'ustar' POSIX standard, which appears to be the one handled by AIX tar. There is a separate gnu format, which is the default for GNU tar.
I'd suspect you're opening a GNU tar archive with a tar tool which only understands the POSIX standard, and it can't quite cope.
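If that is the case, one workaround is to check the archive's actual format and extract it with GNU tar on AIX; a sketch, assuming GNU tar is installed there (it is usually available as gtar from the AIX Toolbox):
# GNU-format archives are typically reported as such here
file mypackage.tar
# extract with GNU tar instead of the native /usr/bin/tar
gtar -xvf mypackage.tar
Alternatively, if you control how the archive is created and the path names fit within the POSIX limits quoted above, creating it with GNU tar's --format=ustar option should produce an archive the AIX tar can handle.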

Resources