JVM in Kubernetes/Docker running out of memory faster than standalone

We are moving our JDK 1.8u131 JVM servers to a Kubernetes/Docker environment.
We have a few JVM servers running in standalone VMs and a few running in Kubernetes/Docker, and both types are in production.
Under the same load, the Kubernetes/Docker JVMs run out of memory, whereas the JVMs in VMs run fine without issues.
We used the exact same JVM parameters in the VM and in the container.
Any ideas how to fix this issue?
Here are the JVM options:
Environment:
JAVA_MEM_OPTS: -Xms2048M -Xmx2048M -XX:MaxPermSize=256M -XX:+ExitOnOutOfMemoryError
               -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
               -XX:+HeapDumpOnOutOfMemoryError
               -XX:HeapDumpPath=/heapdumps/${HOSTNAME}_$(date +%Y%m%d_%H_%M_%S).hprof
JAVA_GC_OPTS: -Dnogclogging=true -XX:+PrintGC -XX:+PrintGCDetails
{Heap before GC invocations=2880 (full 625):
 PSYoungGen      total 435712K, used 249344K
  eden space 249344K, 100% used
  from space 186368K, 0% used
  to   space 228352K, 0% used
 ParOldGen       total 1398272K, used 1397679K
  object space 1398272K, 99% used
 Metaspace       used 229431K, capacity 249792K, committed 249968K, reserved 1271808K
  class space    used 24598K, capacity 27501K, committed 27544K, reserved 1048576K
2018-12-07T15:43:21.420+0000: 124733.208: [...] 1647023K->1646334K(1833984K) [...], 1.2079201 secs] [Times: user=1.98 sys=0.01, real=1.21 secs]
Heap after GC invocations=2880 (full 625):
 PSYoungGen      total 435712K, used 248654K
  eden space 249344K, 99% used
  from space 186368K, 0% used
  to   space 228352K, 0% used
 ParOldGen       total 1398272K, used 1397679K
  object space 1398272K, 99% used
 Metaspace       used 229431K, capacity 249792K, committed 249968K, reserved 1271808K
  class space    used 24598K, capacity 27501K, committed 27544K, reserved 1048576K
}
{Heap before GC invocations=2881 (full 626):
 PSYoungGen      total 435712K, used 249344K
  eden space 249344K, 100% used
  from space 186368K, 0% used
  to   space 228352K, 0% used
 ParOldGen       total 1398272K, used 1397679K
  object space 1398272K, 99% used
 Metaspace       used 229431K, capacity 249792K, committed 249968K, reserved 1271808K
  class space    used 24598K, capacity 27501K, committed 27544K, reserved 1048576K
2018-12-07T15:43:22.632+0000: 124734.420: [...]
SERVER RESTARTS HERE

Did you set your container memory resource requests and limits? JDK 8u131 doesn't know that it is running inside a container; it still sees the host's resources. That could be why your JVM inside the container is killed so quickly.
There's a good article on this from Red Hat from 2017:
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
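A rough sketch of both fixes (the memory figures and the metaspace cap below are illustrative assumptions, not tuned values): give the pod an explicit memory limit with headroom above -Xmx for metaspace, thread stacks and other native memory, and either move to 8u191+ with -XX:+UseContainerSupport or, staying on 8u131, enable the experimental cgroup flags so the JVM respects the container limit:
spec:
  template:
    spec:
      containers:
      - resources:
          requests:
            memory: "3Gi"
          limits:
            memory: "3Gi"
JAVA_MEM_OPTS: -Xms2048M -Xmx2048M -XX:MaxMetaspaceSize=512M
               -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap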

Related

Growing memory usage (leak?) in Dask Distributed profiler

I have a long-running task that I submit to a Dask cluster (the worker runs 1 process and 1 thread), and I use tracemalloc to track memory usage. The task can run long enough that memory usage builds up, which has caused all sorts of problems. Here is the structure of how I use tracemalloc:
import tracemalloc

def task():
    tracemalloc.start()
    ...
    snapshot1 = tracemalloc.take_snapshot()
    for i in range(10):
        ...
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot1, "lineno")
        print("[ Top 6 differences ]")
        for stat in top_stats[:6]:
            print(str(stat))
I get the following (cleaned up a tad), which shows that the profiler in Dask Distributed is accruing memory. This was after the second iteration, and these memory numbers grow linearly.
[ Top 6 differences ]
/usr/local/lib/python3.8/site-packages/distributed/profile.py:112:
size=137 MiB (+113 MiB), count=1344168 (+1108779), average=107 B
/usr/local/lib/python3.8/site-packages/distributed/profile.py:68:
size=135 MiB (+110 MiB), count=1329005 (+1095393), average=106 B
/usr/local/lib/python3.8/site-packages/distributed/profile.py:48:
size=93.7 MiB (+78.6 MiB), count=787568 (+655590), average=125 B
/usr/local/lib/python3.8/site-packages/distributed/profile.py:118:
size=82.3 MiB (+66.5 MiB), count=513462 (+414447), average=168 B
/usr/local/lib/python3.8/site-packages/distributed/profile.py:67:
size=64.4 MiB (+53.1 MiB), count=778747 (+647905), average=87 B
/usr/local/lib/python3.8/site-packages/distributed/profile.py:115:
size=48.1 MiB (+40.0 MiB), count=787415 (+655449), average=64 B
Does anyone know how to clean out the profiler or not use it (we're not using the dashboard so we don't need it)?
I set the following environment variables on the worker pods to dramatically reduce how often the profiler runs, and it seems to be working:
DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL=10000ms
DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE=1000000ms
The defaults can be found here: https://github.com/dask/distributed/blob/master/distributed/distributed.yaml#L74-L76
ETA: @rpanai This is what we have in the K8s manifest for the deployment:
spec:
  template:
    spec:
      containers:
      - env:
        - name: DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL
          value: 10000ms
        - name: DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE
          value: 1000000ms
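If you ever start workers from Python instead of pod environment variables, the same two settings can also be applied through dask.config; a minimal sketch, assuming the worker/nanny is created in the same process after this call:
import dask

# Same effect as the two environment variables above; must run before
# the worker/nanny is created so the profiler reads the new values.
dask.config.set({
    "distributed.worker.profile.interval": "10000ms",
    "distributed.worker.profile.cycle": "1000000ms",
})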

dask jobqueue worker failure at startup 'Resource temporarily unavailable'

I'm running dask over slurm via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is: what could be causing these failures? At first glance it looks like too many workers are writing to disk at once, or my workers are forking into many other processes, but it's pretty difficult to track that down. I can ssh into the node but I'm not seeing an abnormal number of processes, and each node has a 500 GB SSD, so I shouldn't be writing excessively.
Everything below this is just information about my configurations and such
My setup is as follows:
cluster = SLURMCluster(cores=1, memory=f"{args.gbmem}GB", queue='fast_q',
                       name=args.name, env_extra=["source ~/.zshrc"])
cluster.adapt(minimum=1, maximum=200)
client = await Client(cluster, processes=False, asynchronous=True)
I suppose I'm not even sure whether processes=False should be set.
I run this starter script via sbatch with 4 GB of memory, 2 cores (-c) (even though I expect to need only 1), and 1 task (-n). This sets off all of my jobs via the SLURMCluster config from above. I dumped my SLURM submission scripts to files and they look reasonable.
Each job is not complex: it is a subprocess.call() of a command to a compiled executable that takes 1 core and 2-4 GB of memory. I require the Client call and further calls to be asynchronous because I have a lot of conditional computations. So each worker, when loaded, should consist of 1 Python process, 1 running executable, and 1 shell.
The limits imposed by the scheduler are:
>> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 512
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 1031203
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
And each node has 64 cores, so I don't really think I'm hitting any limits.
I'm using a jobqueue.yaml file that looks like this:
slurm:
  name: dask-worker
  cores: 1                    # Total number of cores per job
  memory: 2                   # Total amount of memory per job
  processes: 1                # Number of Python processes per job
  local-directory: /scratch   # Location of fast local storage like /scratch or $TMPDIR
  queue: fast_q
  walltime: '24:00:00'
  log-directory: /home/dbun/slurm_logs
I would appreciate any advice at all! Full log is below.
FORK BLOCKING IO ERROR
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.131.82:13687'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dbun/.local/share/pyenv/versions/3.7.0/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.dask_worker - INFO - End worker
Aborted!
CANT START NEW THREAD ERROR
https://pastebin.com/ibYUNcqD
BLOCKING IO ERROR
https://pastebin.com/FGfxqZEk
EDIT:
Another piece of the puzzle:
It looks like dask_worker is running multiple multiprocessing.forkserver calls? Does that sound reasonable?
https://pastebin.com/r2pTQUS4
This problem was caused by ulimit -u being too low.
As it turns out, each worker has a few processes associated with it, and the Python ones have multiple threads. You end up with approximately 14 threads per worker that count against ulimit -u. Mine was set to 512, so with 64 workers on a 64-core system I was likely hitting ~896. Given that limit, the most I could have afforded is roughly 8 threads per worker.
Solution:
In my .zshrc (or .bashrc) I added the line:
ulimit -u unlimited
Haven't had any problems since.
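If you want to sanity-check a node before raising the limit, here is a small sketch using only the standard library (the 14-threads-per-worker figure is just the estimate from above, not something Dask reports):
import resource
import threading

# RLIMIT_NPROC is the limit behind `ulimit -u`; on Linux it counts all
# threads owned by the user, not just processes.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"ulimit -u (soft/hard): {soft}/{hard}")

# Rough estimate from above: ~14 threads per worker, 64 workers per node.
print("estimated threads needed:", 14 * 64)   # 896 > 512 -> EAGAIN on fork

# Threads in the current process, for comparison.
print("threads in this process:", threading.active_count())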

How to speed up sonarqube analysis job?

I have a Java-based application with a huge amount of source code (~1M lines). I am using Jenkins with sonar-runner-2.4 to run the analysis with code coverage and test case counts. I upgraded the SonarQube server from 5.4 to 6.3.1. Before the upgrade this job took 9 hours to complete the whole analysis (still a very long time, but acceptable); after the upgrade to SonarQube 6.3.1 the same job takes 13 hours for the same analysis.
How do I improve the analysis time, at least back to my earlier 9 hours?
EDIT
Here are my JAVA_OPTS for the sonarqube-6.3.1 instance:
sonar.web.javaOpts=-Xmx6G -Xms2G -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true
Available Hardware:
$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 5
CPU MHz: 1596.000
BogoMIPS: 3999.44
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Available Memory:
$free -m
              total        used        free      shared  buff/cache   available
Mem:         128714       58945       66232         430        3535       68298
Swap:         32767         957       31810
sonar-project.properties for the long running job:
sonar-project.properties
As you haven't given many details, I can't give many details in the answer, but the short answer is that you have to make the scan do less work.
Look at your codebase. Is your scan processing generated classes? Is it scanning test classes? Is it scanning classes that have little real business logic? If you answer "yes" to any of those, consider excluding those classes.
Look at the SonarQube plugins you're using. Are you running every possible plugin you can run? Are there some heuristics you don't need to run, or that you could perhaps run less frequently?
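As an illustration only (the path patterns are hypothetical placeholders, adjust them to your layout), exclusions for generated and test code usually go into sonar-project.properties along these lines:
# Hypothetical patterns - point these at your generated/test sources
sonar.exclusions=**/generated/**,**/*_Generated.java
sonar.test.exclusions=src/test/**
sonar.coverage.exclusions=**/dto/**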

Cassandra eats memory

I have Cassandra 2.1 and the following properties set:
MAX_HEAP_SIZE="5G"
HEAP_NEWSIZE="800M"
memtable_allocation_type: heap_buffers
The top utility shows that Cassandra eats 14.6 GB of virtual memory:
KiB Mem:  16433148 total, 16276592 used,   156556 free,    22920 buffers
KiB Swap: 16777212 total,        0 used, 16777212 free.  9295960 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
23120 cassand+  20   0 14.653g 5.475g  29132 S 318.8 34.9  27:07.43 java
It also dies with various OutOfMemoryError exceptions when I access it from Spark.
How can I prevent these OutOfMemoryErrors and reduce memory usage?
Cassandra does eat a lot of memory, but it can be controlled by tuning the GC (garbage collection) settings.
The GC parameters are contained in the bin/cassandra.in.sh file, in the JAVA_OPTS variable.
You can apply these settings in JAVA_OPTS:
-XX:+UseConcMarkSweepGC
-XX:ParallelCMSThreads=1
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10
Alternatively, instead of specifying MAX_HEAP_SIZE and HEAP_NEWSIZE yourself, let Cassandra's startup script pick them, because it will assign sensible values for these parameters based on the machine.
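For reference, in a stock 2.1 install MAX_HEAP_SIZE and HEAP_NEWSIZE normally live in conf/cassandra-env.sh; a minimal sketch of the "let the script decide" option is simply to leave them commented out so the script's automatic heap-size calculation (based on the node's RAM and core count) takes over:
# conf/cassandra-env.sh (sketch) - leave both unset so the script
# computes heap sizes from the machine's memory and CPU count
#MAX_HEAP_SIZE="5G"
#HEAP_NEWSIZE="800M"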

Why does my garbage collection log show 3.8GB as the max available heap size while I have allocated 4GB as the max heap size?

I have a 64-bit HotSpot JDK version 1.7.0 installed on a 64-bit RHEL 6 machine. I use the following JVM options for my Tomcat application:
CATALINA_OPTS="${CATALINA_OPTS} -Dfile.encoding=UTF8 -Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false -Duser.timezone=EST5EDT"
# General Heap sizing
CATALINA_OPTS="${CATALINA_OPTS} -Xms4096m -Xmx4096m -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseCompressedOops -XX:+DisableExplicitGC"
# Enable the CMS GC policy
CATALINA_OPTS="${CATALINA_OPTS} -XX:+UseConcMarkSweepGC -XX:CMSWaitDuration=15000 -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:+CMSConcurrentMTEnabled -XX:+CMSScavengeBeforeRemark -XX:+CMSClassUnloadingEnabled"
# Verbose Garbage Collection Logging
CURRENT_DATE=`date +%Y%m%d%H%M%S`
CATALINA_OPTS="${CATALINA_OPTS} -verbose:gc -XX:+PrintGCDetails -Xloggc:${CATALINA_BASE}/logs/gc-${CURRENT_DATE}.log -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution"
When I run a garbage collection analysis, the GC logs show a maximum available heap of only 3.8 GB instead of the 4 GB allocated to the JVM. Why is that?
New Generation (2048M) consists of 80% Eden (1638.4M) and two Survivor Spaces (10% or 204.8M each):
Heap
par new generation total 1887488K, used 134226K [0x00000006fae00000, 0x000000077ae00000, 0x000000077ae00000)
eden space 1677824K, 8% used [0x00000006fae00000, 0x00000007031148e0, 0x0000000761480000)
from space 209664K, 0% used [0x0000000761480000, 0x0000000761480000, 0x000000076e140000)
to space 209664K, 0% used [0x000000076e140000, 0x000000076e140000, 0x000000077ae00000)
concurrent mark-sweep generation total 2097152K, used 242K [0x000000077ae00000, 0x00000007fae00000, 0x00000007fae00000)
At any time, one of the survivor spaces is empty (see Generations), so it does not count towards the usable heap.
So the useful heap size is 1638.4 + 204.8 + 2048 = 3891.2 MB, i.e. roughly the 3.8 GB you see.
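A quick back-of-the-envelope check with the sizes reported in the heap printout above (values in KB):
eden     = 1677824   # "eden space 1677824K"
survivor = 209664    # one "from"/"to" space; only one is usable at a time
old_gen  = 2097152   # "concurrent mark-sweep generation total 2097152K"

usable_kb = eden + survivor + old_gen
print(usable_kb, "KB =", usable_kb / 1024, "MB")
# 3984640 KB = 3891.25 MB, i.e. the ~3.8 GB reported by the GC analysis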
