I am using jpillora's Chisel for WebSockets and needed to run it on ARM. I cross-compiled it and reduced the binary size using the following two commands:
env GOOS=linux GOARCH=arm go build -ldflags "-w -s"
~/go/src/github.com/pwaller/goupx/goupx --brute chisel
However, when I run the chisel binary on the ARM board (512MB RAM), I find that it is taking a huge amount of RAM.
Running "top" shows a usage of 161% and 775m! However, the difference in the output of the "free" command taken before and after starting the chisel client is only ~6MB.
I ran strace too, and the sum of all mmap2 allocations is 700MB+.
The command I executed to connect to the server:
./chisel client --fingerprint <> 10.137.12.88:2002 127.0.0.1:9191:10.137.12.88:9191
Is there some way to optimize / reduce the RAM usage on Chisel?
Any pointers would be helpful!
Thanks,
I was able to reduce the VSZ to ~279m (i.e. by ~60%) by modifying the arenaSizes in malloc.go (/usr/local/go/src/runtime/malloc.go).
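In case it helps others: most of the gap between top and free is virtual address space, not physical RAM. The Go runtime reserves large heap arenas up front (which is exactly what the arenaSizes tweak shrinks), so VSZ/VIRT looks huge while the resident set stays small. A quick check, as a sketch assuming a procps-style ps and that the process is simply named chisel:
ps -o pid,vsz,rss,comm -C chisel
If the RSS column stays in the single-digit-MB range, physical memory on the 512MB board is not actually being consumed, which matches the ~6MB difference seen with free.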
Related
I have a server instance with 4 cores, 32 GB RAM, and Ubuntu 20.04.3 LTS installed. On this machine an OpenGrok instance is running as a Docker container.
Inside the Docker container it uses AdoptOpenJDK:
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9 - b4cc246d9
OMR - 162e6f729
JCL - 7796c80419 based on jdk-11.0.11+9)
The code base that the opengrok-indexer scans is 320 GB, and indexing takes 21 hours.
What I figured out is that if I disable the history option, it takes less time. Is there a way to reduce this time when the history flag is set?
Here is my index command:
opengrok-indexer -J=-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -J=-Djava.util.logging.config.file=/usr/share/tomcat10/conf/logging.properties -J=-XX:-UseGCOverheadLimit -J=-Xmx30G -J=-Xms30G -J=-server -a /var/opengrok/dist/lib/opengrok.jar -- -R /var/opengrok/etc/read-only.xml -m 256 -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data --remote on -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress -v -O on -T 3 --assignTags --search --remote on -i *.so -i *.o -i *.a -i *.class -i *.jar -i *.apk -i *.tar -i *.bz2 -i *.gz -i *.obj -i *.zip
Thank you for your help in advance.
Kind Regards
Siegfried
You should try to increase the number of threads using the following options:
--historyThreads number
The number of threads to use for history cache generation on repository level. By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.
--historyFileThreads number
The number of threads to use for history cache generation when dealing with individual files.
By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.
-T, --threads number
The number of threads to use for index generation, repository scan
and repository invalidation.
By default the number of threads will be set to the number of available
CPUs. This influences the number of spawned ctags processes as well.
Take a look at the "renamedHistory" option too. Theoretically "off" is the default, but it has a huge impact on indexing time, so it's worth checking (a combined example invocation follows below):
--renamedHistory on|off
Enable or disable generating history for renamed files.
If set to on, makes history indexing slower for repositories
with lots of renamed files. Default is off.
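As a sketch only (the thread counts are placeholders to tune on your 4-core box, not recommendations), the history-related knobs could be folded into your existing invocation roughly like this:
opengrok-indexer -J=-Xmx30G -a /var/opengrok/dist/lib/opengrok.jar -- \
    -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data \
    -H --historyThreads 4 --historyFileThreads 8 --renamedHistory off \
    -T 4 -W /var/opengrok/etc/configuration.xml --progress
Since --renamedHistory is documented as off by default, spelling it out mainly guards against a configuration.xml that turns it on.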
I am running multiple instances of a java web app (Play Framework).
The longer the web apps run, the less memory is available, until I restart them. Sometimes I get an OutOfMemoryError.
I am trying to find the problem, but I am getting a lot of contradictory information, so I am having trouble finding the source.
Here is the relevant information:
Ubuntu 14.04.5 LTS with 12 GB of RAM
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
EDIT:
Here are the JVM settings:
-Xms64M
-Xmx128m
-server
(I am not 100% sure these parameters are passed correctly to the JVM, since I am using an /etc/init.d script with start-stop-daemon, which starts the Play framework script, which in turn starts the JVM; a quick way to check follows after the script below.)
This is how I use it:
start() {
    echo -n "Starting MyApp"
    sudo start-stop-daemon --background --start \
        --pidfile ${APPLICATION_PATH}/RUNNING_PID \
        --chdir ${APPLICATION_PATH} \
        --exec ${APPLICATION_PATH}/bin/myapp \
        -- \
        -Dinstance.name=${NAME} \
        -Ddatabase.name=${DATABASE} \
        -Dfile.encoding=utf-8 \
        -Dsun.jnu.encoding=utf-8 \
        -Duser.country=DE \
        -Duser.language=de \
        -Dhttp.port=${PORT} \
        -J-Xms64M \
        -J-Xmx128m \
        -J-server \
        -J-XX:+HeapDumpOnOutOfMemoryError \
        >> $LOGFILE 2>&1
}
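To check whether the -J options actually reach the JVM (a quick sanity check, nothing more), I can inspect the running process with standard JDK tools, using the application's PID:
ps -o args= -p <mypid> | tr ' ' '\n' | grep -e '-X'
jcmd <mypid> VM.flags | tr ' ' '\n' | grep -i heapsize
If -Xmx128m was picked up, jcmd should report MaxHeapSize as roughly 128MB.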
I am picking one instance of the web apps now:
htop shows 4615M of VIRT and 338M of RES.
When I create a heap dump with jmap -dump:live,format=b,file=mydump.dump <mypid>, the file is only about 50MB.
When I open it in Eclipse MAT the overview shows "20.1MB" of used memory (with the "Keep unreachable objects" option set to ON).
So how can 338MB shown in htop shrink to 20.1MB in Eclipse MAT?
I don't think this is GC related, because no matter how long I wait, htop always shows about this amount of memory; it never goes down.
In fact, I would assume that my simple app does not use more than 20MB, maybe 30MB.
I compared two heap dumps with an age difference of 4 hours in Eclipse MAT and I don't see any significant increase in objects.
PS: I added the -XX:+HeapDumpOnOutOfMemoryError option, but I have to wait 5-7 days until it happens again. I hope to find the problem earlier with your help interpreting my numbers.
Thank you,
schube
The heap is the memory containing Java objects. htop surely doesn’t know about the heap. Among the things that contribute to the used memory as reported by VIRT are:
The JVM’s own code and that of the required libraries
The byte code and meta information of the loaded classes
The JIT compiled code of frequently used methods
I/O buffers
Thread stacks
Memory allocated for the heap, but currently not containing live objects
When you dump the heap, it will contain live Java objects plus meta information allowing you to understand the content, like class and field names. When a tool calculates the used heap, it will count the objects only, so it will naturally be smaller than the heap dump file size. Also, this used-memory figure often does not include memory that is unusable due to padding/alignment; further, tools sometimes assume the wrong pointer size, as the relevant information (32-bit architecture vs. 64-bit architecture vs. compressed oops) is not available in the heap dump. These errors may add up.
Note that there might be other reasons for an OutOfMemoryError than having too many objects in the heap. E.g. there might be too much meta information, due to a memory leak combined with dynamic class loading, or too many native I/O buffers…
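If you want to see where the non-heap part of the 338MB RES goes, one option (a sketch; it adds a small runtime overhead) is the JVM's Native Memory Tracking, available in HotSpot since Java 8:
-J-XX:NativeMemoryTracking=summary    (added to the start script next to the other -J flags)
jcmd <mypid> VM.native_memory summary
The summary breaks the JVM's own memory accounting down into heap, class metadata, thread stacks, code cache and internal buffers, which lines up with the list above.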
Quick question: what is the compiler flag to allow g++ to spawn multiple instances of itself in order to compile large projects quicker (for example 4 source files at a time for a multi-core CPU)?
You can do this with make; with GNU make it is the -j flag (this will also help on a uniprocessor machine).
For example if you want 4 parallel jobs from make:
make -j 4
You can also run gcc in a pipe with
gcc -pipe
This will pipeline the compile stages, which will also help keep the cores busy.
If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
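A sketch of what that can look like (the host names are placeholders; each machine needs distcc and a compatible compiler installed):
export DISTCC_HOSTS="localhost build-box-1 build-box-2"
make -j12 CC="distcc gcc" CXX="distcc g++"
The -j value is usually set higher than the local core count here, since the remote hosts absorb the extra jobs.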
There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la
make -j4
Or you can use pmake or similar parallel make systems.
People have mentioned make but bjam also supports a similar concept. Using bjam -jx instructs bjam to build up to x concurrent commands.
We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.
If using make, invoke it with -j. From man make:
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously.
If there is more than one -j option, the last one is effective.
If the -j option is given without an argument, make will not limit the
number of jobs that can run simultaneously.
And most notably, if you want to script or detect the number of cores you have available (this can vary a lot depending on your environment, especially if you build in many environments), you may use the ubiquitous Python function cpu_count():
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count
Like this:
make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')
If you're asking why 1.5 I'll quote user artless-noise in a comment above:
The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
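If pulling in Python just for the core count feels heavy, coreutils' nproc gives the same number on Linux; a minimal equivalent of the command above, using shell arithmetic for the 1.5 factor:
make -j "$(( $(nproc) * 3 / 2 ))"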
make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ is parallelizable.
distcc can also be used to distribute compiles not only on the current machine, but also on other machines in a farm that have distcc installed.
I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (as long as the files do not depend on each other).
GNU parallel
I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:
sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
Explanation:
{.} takes the input argument and removes its extension
-t prints out the commands being run to give us an idea of progress
--will-cite removes the request to cite the software if you publish results using it...
parallel is so convenient that I could even do a timestamp check myself:
ls | grep -E '\.c$' | parallel -t --will-cite "\
if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
gcc -c -o '{.}.o' '{}'
fi
"
xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or run multiple commands with it: Calling multiple commands through xargs
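For completeness, a rough xargs equivalent of the parallel one-liner above (a sketch: null-delimited to survive unusual file names, with a small sh -c wrapper doing the .c to .o extension swap):
find . -maxdepth 1 -name '*.c' -print0 | xargs -0 -P "$(nproc)" -I{} sh -c 'gcc -c -o "${1%.c}.o" "$1"' _ {}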
Parallel linking was asked at: Can gcc use multiple cores when linking?
TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.
Tested in Ubuntu 18.10.
I'm using bazel on a computer with 4 GB RAM (to compile the tensorflow project). Bazel, however, does not take the amount of memory I have into account and spawns too many jobs, causing my machine to swap and leading to longer build times.
I already tried setting the ram_utilization_factor flag through the following lines in my ~/.bazelrc
build --ram_utilization_factor 30
test --ram_utilization_factor 30
but that did not help. How are these factors to be understood anyway? Should I just randomly try out some others?
Some other flags that might help:
--host_jvm_args can be used to set how much memory the JVM should use by setting -Xms and/or -Xmx, e.g., bazel --host_jvm_args=-Xmx4g --host_jvm_args=-Xms512m build //foo:bar (docs).
--local_resources in conjunction with the --ram_utilization_factor flag (docs).
--jobs=10 (or some other low number, it defaults to 200), e.g. bazel build --jobs=2 //foo:bar (docs).
Note that --host_jvm_args is a startup option so it goes before the command (build) and --jobs is a "normal" build option so it goes after the command.
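Putting these together, a ~/.bazelrc sketch for a 4 GB machine could look like the following (the numbers are starting points to tune, not recommendations, and --local_resources uses the older RAM-MB,CPU,IO triple):
startup --host_jvm_args=-Xmx1g
build --jobs=2
build --local_resources=2048,2.0,1.0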
For me, the --jobs argument from #kristina's answer worked:
bazel build --jobs=1 tensorflow:libtensorflow_all.so
Note: --jobs=1 must follow, not precede build, otherwise bazel will not recognize it. If you were to type bazel --jobs=1 build tensorflow:libtensorflow_all.so, you would get this error message:
Unknown Bazel startup option: '--jobs=1'.
Just wanted to second #sashoalm's comment that the --jobs=1 flag was what made bazel build finally work.
For reference, I'm running bazel on Lubuntu 17.04 as a VirtualBox guest with about 1.5 GB RAM and two cores of an Intel i3 (on a Thinkpad T460). I was following the O'Reilly tutorial on TensorFlow (https://www.oreilly.com/learning/dive-into-tensorflow-with-linux), and ran into trouble at the following step:
$ bazel build tensorflow/examples/label_image:label_image
Changing this to bazel build --jobs=1 tensorflow/... did the trick.
I ran into quite a bit of instability where bazel build failed in my k8s cluster.
Besides --jobs=1, try this:
https://docs.bazel.build/versions/master/command-line-reference.html#flag--local_resources
E.g. --local_resources=4096,2.0,1.0
In the Apache Jackrabbit Oak Travis build we have a unit test that makes the build error out:
Running org.apache.jackrabbit.oak.plugins.segment.HeavyWriteIT
/home/travis/build.sh: line 41: 3342 Killed mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}
The command "mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}" exited with 137.
https://travis-ci.org/apache/jackrabbit-oak/jobs/44526993
The test code can be seen at
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/segment/HeavyWriteIT.java
What's the actual explanation for the error code? How could we work around or solve the issue?
Error code 137 usually comes up when a script gets killed due to exhaustion of available system resources; in this case it's very likely memory. The infrastructure this build is running on has some limitations due to the underlying virtualization that can cause these errors.
I'd recommend trying out our new infrastructure, which has more resources available and should give you more stable builds: http://blog.travis-ci.com/2014-12-17-faster-builds-with-container-based-infrastructure/
Usually a Killed message means that you are out of memory. Check your limits with ulimit -a or the available memory with free -m, then try to increase your stack size, e.g. ulimit -s 82768 or even more.
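If it is the forked test JVM (rather than Maven itself) that blows past the available memory, another knob worth trying, as a sketch and assuming the Surefire/Failsafe argLine is not hard-coded in the pom, is to cap the heaps explicitly so the kernel's OOM killer never steps in:
MAVEN_OPTS="-Xmx512m" mvn verify -DargLine="-Xmx1g" -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}
MAVEN_OPTS bounds Maven's own JVM, while argLine is passed to the JVMs forked for the tests.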