Increase Flume MaxHeap - flume

Good Afternoon,
I'm having trouble increasing the Heap Size for Flume. As a result, I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I've increased the heap defined in "flume-env.sh" as well as Hadoop/Yarn. No luck.
One thing to note: on starting Flume, the exec (ProcessBuilder?) seems to set the heap to 20 MB. Any ideas on how to override it?
Info: Including Hadoop libraries found via (/usr/local/hadoop/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /usr/lib/jvm/java-9-openjdk-amd64/bin/java -Xmx20m -cp 'conf:/usr/local/flume/lib/* :
........
Ultimately I'm trying to set the heap size to 1512 MB.

Increasing the heap defined in "flume-env.sh" should work. You can also try starting your Flume agent as follows:
flume-ng agent -n myagent -Xmx512m
The flume-ng script passes -D and -X options from the command line through to the JVM.
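For reference, a minimal sketch of the override in flume-env.sh; JAVA_OPTS is the variable the flume-ng launcher reads, and the 1512 MB figure is simply the value mentioned in the question:
# conf/flume-env.sh -- sourced by the flume-ng launcher script
export JAVA_OPTS="-Xms512m -Xmx1512m"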

Related

Tweak logging of standalone Neo4j server

When running the standalone Neo4j community Docker image, some logs are written to stdout and some to a file inside the container: debug.log.
I would like to be able to set log4j options, like log-level, appenders etc. The reasons are:
I can't access the log files when running in e.g. AWS ECS
The debug logs are quite verbose
log4j-properties are convenient to deploy and manage
So the question is: how can I set custom log4j properties for a Neo4j server that's running inside a container?
It isn't documented whether this is even possible, but I have tried the usual things one starts to think of. First I added a log4j.xml to the classpath, to no avail. I have also tried setting dbms.jvm.additional to
-Dlog4j.configuration=<path_to_file>
-Dlog4j.configurationFile=<path_to_file>
-Dlog4j2.configurationFile=<path_to_file>
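For reference, in neo4j.conf those attempts correspond to lines like the following (a sketch only; the file paths are the ones visible in the verified command line below):
# conf/neo4j.conf -- one dbms.jvm.additional line per extra JVM flag
dbms.jvm.additional=-Dlog4j.configuration=file:/var/lib/neo4j/conf/log4j.properties
dbms.jvm.additional=-Dlog4j2.configurationFile=file:/var/lib/neo4j/conf/log4j2.xml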
I've verified that the Neo4j process has the correct jvm-arguments:
/usr/local/openjdk-11/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/*
-Xms2048m -Xmx2048m -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions -XX:+TrustFinalNonStaticFields -XX:+DisableExplicitGC
-XX:MaxInlineLevel=15 -XX:-UseBiasedLocking -Djdk.nio.maxCachedBufferSize=262144
-Dio.netty.tryReflectionSetAccessible=true -Djdk.tls.ephemeralDHKeySize=2048
-Djdk.tls.rejectClientInitiatedRenegotiation=true -XX:FlightRecorderOptions=stackdepth=256
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j2.disable.jmx=true
-Dlog4j.configuration=file:/var/lib/neo4j/conf/log4j.properties
-Dlog4j.configurationFile=file:/var/lib/neo4j/conf/log4j.xml
-Dlog4j2.configurationFile=file:/var/lib/neo4j/conf/log4j2.xml
-Dfile.encoding=UTF-8 org.neo4j.server.CommunityEntryPoint
--home-dir=/var/lib/neo4j --config-dir=/var/lib/neo4j/conf
I have verified that none of the lib jars contain a log4j properties file that takes precedence.
Whatever I try, nothing changes in the way the server logs: the same events are written to /logs/debug.log, and there are no exceptions about a bad log4j config or anything like that.
I have created a small project for easier debugging.
I would forward the log files to stdout.
# forward logs to docker log collector
RUN ln -sf /dev/stdout /logs/debug.log
So this is a Docker-level solution, but it's handy and works for other use cases as well.
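For completeness, a minimal Dockerfile sketch of that idea; the base image tag is illustrative and the /logs/debug.log path is the one from the question:
# hypothetical derived image that forwards the debug log to stdout
FROM neo4j:4.4-community
RUN ln -sf /dev/stdout /logs/debug.log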

Java process killed due to out of memory after running job in container for an hour

I have set up a job for running automation tests in CircleCI (https://hub.docker.com/r/jiteshsojitra/docker-headless-vnc-container). It works fine, but after running tests for an hour it reaches the memory limit and the running Java/Ant job is suddenly killed. Is there any way to increase the container memory so tests can run for 5-6 hours in the container, or is that a paid feature?
I tried putting - JAVA_OPTS: -Xms512m -Xmx1024m in the YAML script, but the overall container memory usage still reaches ~4 GB, as far as I can tell.
References:
https://circleci.com/gh/jiteshsojitra/zm-selenium/231
https://circleci.com/api/v1.1/project/github/jiteshsojitra/zm-selenium/231/output/106/0?file=true
Log trail:
BUILD FAILED
/headless/zm-selenium/build.xml:348: Java returned: 137
Total time: 76 minutes 26 seconds
Exited with code 137
Hint: Exit code 137 typically means the process is killed because it was running out of memory
Hint: Check if you can optimize the memory usage in your app
Hint: Max memory usage of this container is 4286337024
according to /sys/fs/cgroup/memory/memory.max_usage_in_bytes
We had this problem. It is a limit in CircleCI (or the VM, really). The only solution is to make your app use less memory.
fiskeben is right. I think the container you mentioned is a fork of our consol/docker-headless-vnc-container image, so you can add the following lines to the startup script:
# set correct java startup
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${JVM_HEAP_XMX}m"
# add docker jvm flags; can perhaps be removed with JDK 9
export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
Now you can set the environment variable JVM_HEAP_XMX to the number of megabytes your JVM should use, e.g.
docker run -e JVM_HEAP_XMX=512 ...
If you want to determine the size dynamically, take a look at the jvm_options.sh script.
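If that script is not at hand, here is a rough sketch of the idea, assuming the cgroup v1 path mentioned in the build hint above and an arbitrary 50% heap ratio:
#!/bin/bash
# derive -Xmx from the container's cgroup v1 memory limit (sketch, values assumed)
limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
heap_mb=$(( limit_bytes / 1024 / 1024 / 2 ))  # use half of the container limit
export _JAVA_OPTIONS="-Xmx${heap_mb}m"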

Spark Cloudera - Worker Memory Setting [duplicate]

I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change the used memory?
With Spark 1.0.0+, when using spark-shell or spark-submit, use the --executor-memory option, e.g.
spark-shell --executor-memory 8G ...
With 0.9.0 and earlier:
Change the memory when you start a job or start the shell. We had to modify the spark-shell script so that it would carry command-line arguments through as arguments for the underlying Java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the Spark context and pass the value in as a command-line argument (I can make it shorter than the long-winded system props that way).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the cluster-wide default, sorry, I'm not entirely sure how to do it properly.
One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.
In Spark 1.1.1, to set the max memory of workers, write this in conf/spark-env.sh:
export SPARK_EXECUTOR_MEMORY=2G
If you have not used the config file yet, copy the template file
cp conf/spark-env.sh.template conf/spark-env.sh
Then make the change and don't forget to source it
source conf/spark-env.sh
In my case, I use an IPython notebook server to connect to Spark and I want to increase the executor memory.
This is what I do:
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster(CLUSTER_URL).setAppName('ipython-notebook').set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)
According to the Spark documentation, you can change the memory per node with the command-line argument --executor-memory when submitting your application, e.g.
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master.node:7077 \
--executor-memory 8G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
I've tested this and it works.
The default configuration is to allocate host memory minus 1 GB for each worker. The configuration parameter to manually adjust that value is SPARK_WORKER_MEMORY, as in your question:
export SPARK_WORKER_MEMORY=6g

How can I run Neo4j with larger heap size, specify -server and correct GC strategy

As someone who has never really worked with the JVM much, how can I ensure my Neo4j instances are running with all of the recommended JVM settings? E.g. heap size, server mode, and -XX:+UseConcMarkSweepGC.
Should these be set inside a config file? Can I set them dynamically at runtime? Are they set at a system level? Can I have different settings when running two instances of Neo4j on the same machine?
It is a bit fuzzy to me at what point all of these things get set.
I am running Neo4j inside a Docker container, so that is something to consider as well.
My Dockerfile is as follows; I am starting Neo4j with the console command:
FROM dockerfile/java:oracle-java8
# INSTALL OS DEPENDENCIES AND NEO4J
ADD /files/neo4j-enterprise-2.1.3-unix.tar.gz /opt/neo
RUN rm /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
ADD /files/neo4j-server.properties /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
#RUN mv -f /files/neo4j-server.properties /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
EXPOSE 7474
CMD ["console"]
ENTRYPOINT ["/opt/neo/neo4j-enterprise-2.1.3/bin/neo4j"]
OK, so you are using the Neo4j server script. In that case you should configure the low-level JVM properties in neo4j.properties, which should also live in the conf directory. Basically, do the same thing for neo4j.properties as you already do for neo4j-server.properties: create the properties file in your Docker context and configure the properties you want to add. Then in the Dockerfile use:
ADD /files/neo4j.properties /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j.properties
The syntax in the properties file is the following (from the documentation):
# initial heap size (in MB)
wrapper.java.initmemory=<value>
# maximum heap size (in MB)
wrapper.java.maxmemory=<value>
# additional literal JVM parameter, where N is a number for each
wrapper.java.additional.N=<value>
See also http://docs.neo4j.org/chunked/stable/server-performance.html.
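Put together, a sketch covering the settings the question asks about (heap size, server mode, CMS GC); the numbers are placeholders, not recommendations:
# example values only, adjust to your machine
wrapper.java.initmemory=2048
wrapper.java.maxmemory=4096
wrapper.java.additional.1=-server
wrapper.java.additional.2=-XX:+UseConcMarkSweepGC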
One way to test whether the settings are applied is to run jinfo <pid> in the Docker container, where <pid> is the process id of the Neo4j JVM. To enter the container, you can either change the entrypoint to /bin/bash at the command line when you run the container, or use nsenter. The latter would be my choice.

How to change the memory config for the Neo4j-shell command tool of Cypher batch import

I am trying to use neo4j-shell command tool to do Cypher batch import. I followed the instructions described in Import data into your neo4j database from the neo4j-shell command. Here was the command that I ran:
import-cypher -d "," -i c://temp//neo//import.csv -o c://temp//neo//out.csv start n=node:employee_idx(EmpID={emp_id}), m=node:permit_idx(PmtID={pmtid}) create n<-[:Assign{AssID:{assid}}]-m
If there were only 100000 records in the import.csv file, it ran perfectly. But with 200000 records in the file, I got the error: Error occurred in server thread; nested exception is: java.lang.OutOfMemoryError: Java heap space.
How to change the default memory config of this tool?
You need to set the environment variable JAVA_OPTS to appropriate values, e.g. on Linux it can be done using
JAVA_OPTS="-Xmx4G" bin/neo4j-shell
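Since the paths in the question look like Windows, a possible equivalent is shown below, assuming the Windows batch script also honors JAVA_OPTS (the script name may differ between versions):
rem Windows command prompt; script name is an assumption
set JAVA_OPTS=-Xmx4G
bin\Neo4jShell.bat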
