How can I run Neo4j with larger heap size, specify -server and correct GC strategy - neo4j

As someone who has never really worked with the JVM, how can I ensure my Neo4j instances are running with all of the recommended JVM settings, e.g. heap size, server mode, and -XX:+UseConcMarkSweepGC?
Should these be set inside a config file? Can I set them dynamically at runtime? Are they set at a system level? Can I have different settings when running two instances of Neo4j on the same machine?
It is a bit fuzzy to me at what point all of these things get set.
I am running Neo4j inside a Docker container, so that is something to consider as well.
My Dockerfile is as follows. I am starting Neo4j with the console command:
FROM dockerfile/java:oracle-java8
# INSTALL OS DEPENDENCIES AND NEO4J
ADD /files/neo4j-enterprise-2.1.3-unix.tar.gz /opt/neo
RUN rm /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
ADD /files/neo4j-server.properties /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
#RUN mv -f /files/neo4j-server.properties /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-server.properties
EXPOSE 7474
CMD ["console"]
ENTRYPOINT ["/opt/neo/neo4j-enterprise-2.1.3/bin/neo4j"]

OK, so you are using the Neo4j server script. In this case you should configure the low-level JVM properties in neo4j-wrapper.conf, which also lives in the conf directory. Basically, do the same thing for neo4j-wrapper.conf as you already do for neo4j-server.properties: create the file in your Docker context and configure the properties you want to add. Then in the Dockerfile use:
ADD /files/neo4j-wrapper.conf /opt/neo/neo4j-enterprise-2.1.3/conf/neo4j-wrapper.conf
The syntax in the file is the following (from the documentation):
# initial heap size (in MB)
wrapper.java.initmemory=<value>
# maximum heap size (in MB)
wrapper.java.maxmemory=<value>
# additional literal JVM parameter, where N is a number for each
wrapper.java.additional.N=<value>
See also http://docs.neo4j.org/chunked/stable/server-performance.html.
One way to test whether the settings are applied is to run jinfo <pid> in the Docker container, where <pid> is the process ID of the Neo4j JVM. To enter the container, you can either change the entrypoint to /bin/bash on the command line when you run the container, or use nsenter. The latter would be my choice.
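If jinfo is not available in the container, the flags can also be read off the JVM's command line (for example as printed by ps). The following is only a minimal sketch: parse_jvm_flags is a hypothetical helper, not part of Neo4j, and the sample command line is made up for illustration.

```python
import shlex

def parse_jvm_flags(cmdline: str) -> dict:
    """Extract heap sizes, -server and -XX flags from a java command line."""
    flags = {"initial_heap": None, "max_heap": None, "server": False, "xx": []}
    for arg in shlex.split(cmdline):
        if arg.startswith("-Xms"):
            flags["initial_heap"] = arg[4:]   # e.g. "512m"
        elif arg.startswith("-Xmx"):
            flags["max_heap"] = arg[4:]       # e.g. "2048m"
        elif arg == "-server":
            flags["server"] = True
        elif arg.startswith("-XX:"):
            flags["xx"].append(arg[4:])       # e.g. "+UseConcMarkSweepGC"
    return flags

# Hypothetical command line, for illustration only.
sample = "java -server -Xms512m -Xmx2048m -XX:+UseConcMarkSweepGC -jar app.jar"
print(parse_jvm_flags(sample))
```

If the heap values or the GC flag are missing from the result, the settings in the conf directory were probably not picked up.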

Related

Install Keycloak adapter on WILDFLY that depends on ENVs in standalone.xml

I am trying to install the Keycloak adapter on my WildFly application server, which runs as a Docker container. I am using jboss/wildfly:17.0.0.Final as the base image. I am having a lot of trouble building my own image.
My Dockerfile:
FROM jboss/wildfly:17.0.0.Final
ENV WILDFLY_HOME /opt/jboss/wildfly
COPY keycloak-adapter.zip $WILDFLY_HOME
RUN unzip $WILDFLY_HOME/keycloak-adapter.zip -d $WILDFLY_HOME
# My standalone.xml that contains ENVs
COPY standalone.xml $WILDFLY_HOME/standalone/configuration/
# Here it crashes!
RUN $WILDFLY_HOME/bin/jboss-cli.sh --file=$WILDFLY_HOME/bin/adapter-elytron-install-offline.cli
The official documentation says:
Unzip the adapter zip file in $WILDFLY_HOME (/opt/jboss/wildfly) - I've done this, works.
To install the adapter (while the server is offline), you need to execute ./bin/jboss-cli.sh --file=bin/adapter-elytron-install-offline.cli, which basically starts the server (this is needed because you can't modify the configuration otherwise) and modifies standalone.xml.
Here is the problem. My standalone.xml is parametrized with environment variables that are only set during runtime as it runs in multiple different environments. When the ENVs are not set, the server crashes and so does the command above.
The error during docker build at the last step:
Cannot start embedded server WFLYEMB0021: Cannot start embedded process: JBTHR00005: Operation failed WFLYSRV0056: Server boot has failed in an unrecoverable manner.
The cause
Although the error message is not very precise, I have clearly identified the unset ENVs as the cause: I ran the container with bash, set the required ENVs to some random values, and executed the jboss-cli command again, and it worked.
I know the docs say it's also possible to configure this while the server is running, but that is not an option for me; I need this configured at the docker build stage.
So the problem is that they provide an offline installation that fails if standalone.xml depends on environment variables, which are usually not set during docker build. Unfortunately, I could not find a way to tell the jboss-cli to ignore unset environment variables.
Do you know any workaround?
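Since setting the missing variables to placeholder values lets the CLI script run, it helps to know exactly which variables the file requires. This is a minimal sketch, assuming standalone.xml references them with the ${env.NAME} or ${env.NAME:default} expression syntax. The helper name and the sample fragment (including its variable names) are hypothetical.

```python
import re

# Matches expressions such as ${env.DB_HOST} or ${env.DB_PORT:5432}.
ENV_EXPR = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(:[^}]*)?\}")

def env_vars_without_default(xml_text: str) -> list:
    """Return the sorted names of env variables referenced without a default."""
    names = set()
    for match in ENV_EXPR.finditer(xml_text):
        name, default = match.group(1), match.group(2)
        if default is None:  # no ":fallback" part, so the variable must be set
            names.add(name)
    return sorted(names)

# Hypothetical fragment; the variable names are made up for illustration.
sample = """
<datasource jndi-name="java:/AppDS">
  <connection-url>jdbc:postgresql://${env.DB_HOST}:${env.DB_PORT:5432}/app</connection-url>
  <user-name>${env.DB_USER}</user-name>
</datasource>
"""
print(env_vars_without_default(sample))
```

The resulting names are the ones that would need a placeholder value for the build step that runs jboss-cli.sh, or a default written into the file itself.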

Can I run scripts from a docker build context without a copy?

I want to build on top of a windows docker container by installing a couple programs. The files total .5 GB and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build-context, and then have the build-context swept away at the end so I don't have a needless copy of the source files for the setup.exe embedded in my container layers. However, I have not found an example where this is the case. Instead I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the container layers because the COPY command creates a new layer when it's done?
I don't know if the container can see the build-context directly. I was hoping for some magical folder filled with the build-context files so I could run a script using it, but haven't found anything.
It seems like the alternative is to create a private file-server and perform a RUN that can download them from that private server and unpack them, run the install, and remove them (all as 1 docker step). I understand this would make it more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change as the container will build patches for a legacy application. Just seems like a lot to host files on a private, public-facing server for something that will get called once every couple years if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?
It's a long shot, as Windows is tricky when it comes to the file system, but you could do it this way:
In your Dockerfile, use a COPY command, install, then RUN del ... to remove the installation files.
Build your image: docker build -t my-large-image:latest .
Run your image: docker run --name my-large-container my-large-image:latest
Stop the container.
Export your container filesystem: docker export my-large-container > my-large-container.tar
Import the filesystem into a new image: cat my-large-container.tar | docker import - my-small-image
The caveat is that you need to run the container once, which might not be what you want. Also, I haven't tested this with Windows containers, sorry.
I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\\VS_RemoteTools_x64.exe -ArgumentList @('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
But otherwise, I don't think you can mount a custom volume while it's being built.
I didn't find a satisfactory answer to this. Docker seems designed for only the modern era and assumes you'll be able to download what you need via scripts and tools hitting APIs and file servers. The easiest option I found that I eventually went with was to host the files on a private file server or service (in my case, AWS S3).
I really wish there was a way to have files hosted by the docker daemon in some way, eg. if it acted like a temporary server that you could get data from via http instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.

Automatically Configure Config inside Docker Container

While setting up and configuring some Docker containers, I asked myself how I could automatically edit config files inside the container after the containerized service has finished installing (the config files are created during installation).
I tried doing that with a shell script set as the entrypoint in the Dockerfile. However, as I said, the config file does not exist at the very beginning, so the sed commands in the script fail.
Linking a config file with - ./myConfig.conf:/xy/myConfig.conf is also not an option, because the config contains some installation-dependent options.
The most reasonable solution I have found is to run a script that edits the config manually after the installation has finished, with docker exec -i mycontainer sh < editconfig.sh
EDIT
My question is formulated in general terms. However, it arose while working with Nextcloud in a docker-compose setup similar to the official example. That container contains a config.php file, which is Nextcloud's general config file and is generated during installation. Certain properties in that file have to be changed (only a very limited number of environment variables are available to set them). Since I am running tests with this container, I have to reinstall it repeatedly and therefore re-edit the config file each time.
Maybe you can try another approach and have your config file/application pick up its settings from environment variables. That would be consistent with the 12factor app methodology.
As I understand your case, you need to create the config from a template when the container starts.
I see a number of options for doing that:
Use a script that generates the config from a template plus command-line arguments or environment variables (Jinja2 with Python, for example, or Mustache with Node.js). In this case, your entrypoint renders the template and then starts the application. To change the config, you have to restart the service (container).
Run a service that stores the configuration and renders your config at runtime. Personally, I like consul-template; we actively use it in our environment and have had no problems so far. In this case, the config is more dynamic and can be changed "on the fly". Your container will run two processes, the application and the consul-template daemon. Obviously, you will need to run and maintain Consul. To reload the config, restarting the application process is enough.
Run a custom script to create the config. :)
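The first option can be sketched with nothing but the Python standard library. This is only an illustration, using string.Template instead of Jinja2; the function names, the ${APP_HOST}/${APP_PORT} placeholders, and the file paths passed to main are all hypothetical.

```python
import os
from string import Template

def render_config(template_text: str, env) -> str:
    """Fill ${NAME} placeholders from env; raises KeyError if one is missing."""
    return Template(template_text).substitute(env)

def main(template_path: str, config_path: str, command: list) -> None:
    """Entrypoint flow: render the config, then replace this process with the app."""
    with open(template_path) as f:
        text = f.read()
    with open(config_path, "w") as f:
        f.write(render_config(text, os.environ))
    os.execvp(command[0], command)  # the app takes over the PID and receives signals

# Hypothetical template and values, for illustration only.
template_text = "server_name = ${APP_HOST}\nlisten_port = ${APP_PORT}\n"
print(render_config(template_text, {"APP_HOST": "example.local", "APP_PORT": "8080"}))
```

Failing with a KeyError on a missing variable is a deliberate choice here: the container stops at startup instead of running with a half-filled config.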

Set line-buffering in container output

I use the Java S2I image for a container running in OpenShift (on premise). My problem is that the output of the image is page-buffered and oc logs ... does not show me the latest logs.
I could probably build my own Docker image that runs stdbuf -oL -e0 java ..., but I would prefer to stick to the 'official' image (just adding the jar to /deployments). Is there any way to reduce buffering (use line-buffering instead of page-buffering), or flush the output on demand?
EDIT: It seems that I could update the deployment config and pass stdbuf in there, but that means I'd have to compose all the args myself. The ideal solution would be passing --tty to Docker, but I can't see how custom arguments could be passed that way in OpenShift.
In your repo, try creating the file .s2i/bin/run. In it add:
#!/bin/bash
exec stdbuf -oL -e0 /usr/local/s2i/run
I always forget where the S2I assemble and run scripts are in the Java S2I image, so you may need to replace /usr/local/s2i with the correct path.
Adding this file means it will be run as the startup command instead of the original run script. You can then run the original script under stdbuf. Make sure you use exec so that the subprocess replaces the current one; otherwise signals will not be propagated properly.
Even though this might work, I am surprised logging isn't already unbuffered. I expect there would be a better way of controlling it through some Java config instead.

Spark Cloudera - Worker Memory Setting [duplicate]

I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change used memory?
When using 1.0.0+ and using spark-shell or spark-submit, use the --executor-memory option. E.g.
spark-shell --executor-memory 8G ...
0.9.0 and under:
When you start a job or start the shell change the memory. We had to modify the spark-shell script so that it would carry command line arguments through as arguments for the underlying java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the spark context and pass the value in as a command line argument (I can make it shorter than the long winded system props then).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the default cluster wide, sorry, not entirely sure how to do it properly.
One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.
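The last point can be illustrated with a small sketch. It assumes that an executor must fit entirely within a single worker's memory, so a request larger than the smallest worker leaves that worker unused. The helper names are hypothetical; the worker sizes are the ones shown in the question.

```python
def to_mb(size: str) -> int:
    """Convert a size such as '512m' or '6g' to megabytes."""
    units = {"m": 1, "g": 1024}
    return int(size[:-1]) * units[size[-1].lower()]

def workers_that_fit(worker_memory: list, executor_memory: str) -> list:
    """Return the indexes of workers with enough memory to host one executor."""
    need = to_mb(executor_memory)
    return [i for i, mem in enumerate(worker_memory) if to_mb(mem) >= need]

workers = ["2g", "2g", "6g"]             # worker sizes from the question
print(workers_that_fit(workers, "6g"))   # only the 6 GB worker qualifies
print(workers_that_fit(workers, "2g"))   # all three workers qualify
```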
In Spark 1.1.1, to set the maximum memory of the workers, write this in conf/spark-env.sh:
export SPARK_EXECUTOR_MEMORY=2G
If you have not used the config file yet, copy the template file
cp conf/spark-env.sh.template conf/spark-env.sh
Then make the change and don't forget to source it
source conf/spark-env.sh
In my case, I use ipython notebook server to connect to spark. I want to increase the memory for executor.
This is what I do:
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster(CLUSTER_URL).setAppName('ipython-notebook').set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)
According to Spark documentation you can change the Memory per Node with command line argument --executor-memory while submitting your application. E.g.
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master.node:7077 \
--executor-memory 8G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
I've tested and it works.
By default, each worker is allocated the host's total memory minus 1 GB. The configuration parameter to adjust that value manually is SPARK_WORKER_MEMORY, as in your question:
export SPARK_WORKER_MEMORY=6g

Resources