PySpark doesn't find Kafka source - docker

I am trying to deploy a docker container with Kafka and Spark and would like to read to Kafka Topic from a pyspark application. Kafka is working and I can write to a topic and also spark is working. But when I try to read the Kafka stream I get the error message:
pyspark.sql.utils.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".
My Docker Compose yaml looks like this:
---
version: '3.7'
services:
zookeeper:
image: bitnami/zookeeper:3
ports:
- 2181:2181
environment:
ALLOW_ANONYMOUS_LOGIN: "yes"
kafka:
image: bitnami/kafka:2
ports:
- 9092:9092
environment:
KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
ALLOW_PLAINTEXT_LISTENER: "yes"
KAFKA_LISTENERS: >-
INTERNAL://:29092,EXTERNAL://:9092
KAFKA_ADVERTISED_LISTENERS: >-
INTERNAL://kafka:29092,EXTERNAL://localhost:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: >-
INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"
depends_on:
- zookeeper
spark:
image: docker.io/bitnami/spark:3-debian-10
environment:
- SPARK_MODE=master
- SPARK_RPC_AUTHENTICATION_ENABLED=no
- SPARK_RPC_ENCRYPTION_ENABLED=no
- SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
- SPARK_SSL_ENABLED=no
ports:
- '8080:8080'
volumes:
- ./:/home/workspace/
- ./spark/jars:/opt/bitnami/spark/.ivy2
spark-worker-1:
image: docker.io/bitnami/spark:3-debian-10
environment:
- SPARK_MODE=worker
- SPARK_MASTER_URL=spark://spark:7077
- SPARK_WORKER_MEMORY=1G
- SPARK_WORKER_CORES=1
- SPARK_RPC_AUTHENTICATION_ENABLED=no
- SPARK_RPC_ENCRYPTION_ENABLED=no
- SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
- SPARK_SSL_ENABLED=no
volumes:
- ./:/home/workspace/
- ./spark/jars:/opt/bitnami/spark/.ivy2
kafdrop:
image: obsidiandynamics/kafdrop:latest
ports:
- 9000:9000
environment:
KAFKA_BROKERCONNECT: kafka:29092
depends_on:
- kafka
and the pyspark app:
from pyspark.sql import SparkSession
import os
#os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0,org.apache.kafka:kafka-clients:2.8.1'
# the source for this data pipeline is a kafka topic, defined below
spark = SparkSession.builder.appName("fuel-level").master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel('WARN')
kafkaRawStreamingDF = spark \
.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe","SimLab-KUKA") \
.option("startingOffsets","earliest")\
.load()
#this is necessary for Kafka Data Frame to be readable, into a single column value
kafkaStreamingDF = kafkaRawStreamingDF.selectExpr("cast(key as string) key", "cast(value as string) value")
kafkaStreamingDF.writeStream.outputMode("append").format("console").start().awaitTermination()
I am new to Spark and docker, so maybe It's an obvious mistake, I hope you can help me
EDIT
When I uncomment os.env I get the following error:
Error: Missing application resource.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
--exclude-packages Comma-separated list of groupId:artifactId, to exclude while
resolving the dependencies provided in --packages to avoid
dependency conflicts.
--repositories Comma-separated list of additional remote repositories to
search for the maven coordinates given with --packages.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place
on the PYTHONPATH for Python apps.
--files FILES Comma-separated list of files to be placed in the working
directory of each executor. File paths of these files
in executors can be accessed via SparkFiles.get(fileName).
--archives ARCHIVES Comma-separated list of archives to be extracted into the
working directory of each executor.
--conf, -c PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
--driver-java-options Extra Java options to pass to the driver.
--driver-library-path Extra library path entries to pass to the driver.
--driver-class-path Extra class path entries to pass to the driver. Note that
jars added with --jars are automatically included in the
classpath.
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--proxy-user NAME User to impersonate when submitting the application.
This argument does not work with --principal / --keytab.
--help, -h Show this help message and exit.
--verbose, -v Print additional debug output.
--version, Print the version of current Spark.
Cluster deploy mode only:
--driver-cores NUM Number of cores used by the driver, only in cluster mode
(Default: 1).
Spark standalone or Mesos with cluster deploy mode only:
--supervise If given, restarts the driver on failure.
Spark standalone, Mesos or K8s with cluster deploy mode only:
--kill SUBMISSION_ID If given, kills the driver specified.
--status SUBMISSION_ID If given, requests the status of the driver specified.
Spark standalone, Mesos and Kubernetes only:
--total-executor-cores NUM Total cores for all executors.
Spark standalone, YARN and Kubernetes only:
--executor-cores NUM Number of cores used by each executor. (Default: 1 in
YARN and K8S modes, or all available cores on the worker
in standalone mode).
Spark on YARN and Kubernetes only:
--num-executors NUM Number of executors to launch (Default: 2).
If dynamic allocation is enabled, the initial number of
executors will be at least NUM.
--principal PRINCIPAL Principal to be used to login to KDC.
--keytab KEYTAB The full path to the file that contains the keytab for the
principal specified above.
Spark on YARN only:
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").
Traceback (most recent call last):
File "/Users/janikbischoff/Documents/Uni/PuL/BA/Code/Tests/spark-test.py", line 6, in <module>
spark = SparkSession.builder.appName("fuel-level").master("local[*]").getOrCreate()
File "/Users/janikbischoff/Library/Python/3.8/lib/python/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/janikbischoff/Library/Python/3.8/lib/python/site-packages/pyspark/context.py", line 392, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/janikbischoff/Library/Python/3.8/lib/python/site-packages/pyspark/context.py", line 144, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/janikbischoff/Library/Python/3.8/lib/python/site-packages/pyspark/context.py", line 339, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/janikbischoff/Library/Python/3.8/lib/python/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number

Missing application resource
This implies you're running the code using python rather than spark-submit
I was able to reproduce the error by copying your environment, as well as using findspark, it seems PYSPARK_SUBMIT_ARGS aren't working in that container, even though the variable does get loaded...
The workaround would be to pass the argument at execution time.
spark-submit \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 \
script.py

Related

Apache Nifi (on docker): only one of the HTTP and HTTPS connectors can be configured at one time error

Have a problem adding authentication due to a new needs while using Apache NiFi (NiFi) without SSL processing it in a container.
The image version is apache/nifi:1.13.0
It's said that SSL is unconditionally required to add authentication. It's recommended to use tls-toolkit in the NiFi image to add SSL. Worked on the following process:
Except for environment variable nifi.web.http.port for HTTP communication, and executed up the standalone mode container with nifi.web.https.port=9443
docker-compose up
Joined to the container and run the tls-toolkit script in the nifi-toolkit.
cd /opt/nifi/nifi-toolkit-1.13.0/bin &&\
sh tls-toolkit.sh standalone \
-n 'localhost' \
-C 'CN=yangeok,OU=nifi' \
-O -o $NIFI_HOME/conf
Attempt 1
Organized files in directory $NIFI_HOME/conf. Three files keystore.jks, truststore.jsk, and nifi.properties were created in folder localhost that entered the value of the option -n of the tls-toolkit script.
cd $NIFI_HOME/conf &&
cp localhost/*.jks .
The file $NIFI_HOME/conf/localhost/nifi.properties was not overwritten as it is, but only the following properties were imported as a file $NIFI_HOME/conf/nifi.properties:
nifi.web.http.host=
nifi.web.http.port=
nifiweb.https.host=localhost
nifiweb.https.port=9443
Restarted container
docker-compose restart
The container died with below error log:
Only one of the HTTP and HTTPS connectors can be configured at one time
Attempt 2
After executing the tls-toolkit script, all files a were overwritten, including file nifi.properties
cd $NIFI_HOME/conf &&
cp localhost/* .
Restarted container
docker-compose restart
The container died with the same error log
Hint
The dead container volume was also accessible, so copied and checked file nifi.properties, and when did docker-compose up or restart, it changed as follows:
The part I overwritten or modified:
nifi.web.http.host=
nifi.web.http.port=
nifi.web.http.network.interface.default=
#############################################
nifi.web.https.host=localhost
nifi.web.https.port=9443
The changed part after re-executing the container:
nifi.web.http.host=a8e283ab9421
nifi.web.http.port=9443
nifi.web.http.network.interface.default=
#############################################
nifi.web.https.host=a8e283ab9421
nifi.web.https.port=9443
I'd like to know how to execute the container with http.host, http.port empty. docker-compose.yml file is as follows:
version: '3'
services:
nifi:
build:
context: .
args:
NIFI_VERSION: ${NIFI_VERSION}
container_name: nifi
user: root
restart: unless-stopped
network_mode: bridge
ports:
- ${NIFI_HTTP_PORT}:8080/tcp
- ${NIFI_HTTPS_PORT}:9443/tcp
volumes:
- ./drivers:/opt/nifi/nifi-current/drivers
- ./templates:/opt/nifi/nifi-current/templates
- ./data:/opt/nifi/nifi-current/data
environment:
TZ: 'Asia/Seoul'
########## JVM ##########
NIFI_JVM_HEAP_INIT: ${NIFI_HEAP_INIT} # The initial JVM heap size.
NIFI_JVM_HEAP_MAX: ${NIFI_HEAP_MAX} # The maximum JVM heap size.
########## Web ##########
# NIFI_WEB_HTTP_HOST: ${NIFI_HTTP_HOST} # nifi.web.http.host
# NIFI_WEB_HTTP_PORT: ${NIFI_HTTP_PORT} # nifi.web.http.port
NIFI_WEB_HTTPS_HOST: ${NIFI_HTTPS_HOST} # nifi.web.https.host
NIFI_WEB_HTTP_PORT: ${NIFI_HTTPS_PORT} # nifi.web.https.port
Thank you

Monitor ksqlDB with JMX

I have followed the instructions on this page: https://docs.ksqldb.io/en/latest/operate-and-deploy/monitoring/
So this is my ksqldb-server part of docker-compose:
ksqldb-server:
image: confluentinc/ksqldb-server:0.15.0
hostname: ksqldb-server
container_name: ksqldb-server
depends_on:
- kafka
- schema-registry
- kafka-connect
ports:
- "8088:8088"
- "1099:1099"
environment:
KSQL_LISTENERS: http://0.0.0.0:8088
KSQL_BOOTSTRAP_SERVERS: kafka:29092
KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
KSQL_KSQL_CONNECT_URL: http://kafka-connect:8083
KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true"
KSQL_JMX_OPTS: >
-Djava.rmi.server.hostname=localhost
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=1099
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.rmi.port=1099
I have setup Prometheus in the same docker-compose file, and when I visit {prometheus-url}/targets, I see Get "http://ksqldb-server:1099/metrics": EOF
I have already tried plenty configurations during my research, including changing the -Djava.rmi.server.hostname either to the host's IP address or to ksqldb-server's container IP address, but none of them worked. Does anyone have a solution?
Well, six months later after having dealt with this topic once again, I managed to set this up. This follows the approach suggested by swist in my GitHub issue I created back then when I created this issue, too.
You need JMX Exporter. Download it here
You need a YAML file, telling the JMX exporter which metrics to export. You can get it here. If you are only interested in the ksqlDB metrics, remove all other patterns, e.g. the kafka patterns.
Place the JMX Exporter and the YAML file on every node on which you want to monitor a ksqlDB instance
Before starting ksqlDB, create the environment variable KSQL_JMX_OPTS as follows:
export KSQL_JMX_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.util.logging.config.file=logging.properties \
-javaagent:[BLUB]/jmx_prometheus_javaagent.jar=7010:ksqldb.yml"
You need to either create this variable every time you have a new session or create it permantently. [BLUB] is the absolute path to your JMX JAR.
Now you can run ksqlDB and the metrics become available at port 7010 (you can specify any other free port). If you want to have a good dashboard, go with this one.
The jmxremote.port value is also not a proper Prometheus target; it's for jconsole, Visualvm, or other JMX monitoring tools, as the documentation you've linked to says
If you want to use Prometheus, you need to download and mount the JMX exporter agent JAR into the container and modify the JVM arguments to include the agent+scraper port+mbeans config file...
You could also switch to using minikube and apply the Confluent ksqlDB Helm Chart, which does this for you

I cannot use --package option on bitnami/spark docker container

I pulled docker image and executed below command to run image.
docker run -it bitnami/spark:latest /bin/bash
spark-shell --packages="org.elasticsearch:elasticsearch-spark-20_2.11:7.5.0"
and i got message like below
Ivy Default Cache set to: /opt/bitnami/spark/.ivy2/cache
The jars for the packages stored in: /opt/bitnami/spark/.ivy2/jars
:: loading settings :: url = jar:file:/opt/bitnami/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.elasticsearch#elasticsearch-spark-20_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-c785f3e6-7c78-469f-ab46-451f8be61a4c;1.0
confs: [default]
Exception in thread "main" java.io.FileNotFoundException: /opt/bitnami/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-c785f3e6-7c78-469f-ab46-451f8be61a4c-1.0.xml (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:70)
at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:62)
at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:563)
at org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:176)
at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:245)
at org.apache.ivy.Ivy.resolve(Ivy.java:523)
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1300)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I tried other package, but it is not working with all same error message.
Can you give some advice to avoid this error?
Found the solution to it
as given in https://github.com/bitnami/bitnami-docker-spark/issues/7
what we have to do is create a volume on host mapped to docker path
volumes:
- ./jars_dir:/opt/bitnami/spark/ivy:z
give this path as cache path like this
spark-shell --conf spark.jars.ivy=/opt/bitnami/spark/ivy --conf
spark.cassandra.connection.host=127.0.0.1 --packages
com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-beta --conf
spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions
All happened because /opt/bitnami/spark is not writable and we have to mount a volume to bypass that.
The error "java.io.FileNotFoundException: /opt/bitnami/spark/.ivy2/" occured because the location /opt/bitnami/spark/ is not writable. so in order to resolve this issue do modify the master spark service like this.
Added user as root and add mounted volume path for required jars.
see the working block of spark service written in docker compose:
spark:
image: docker.io/bitnami/spark:3
container_name: spark
environment:
- SPARK_MODE=master
- SPARK_RPC_AUTHENTICATION_ENABLED=no
- SPARK_RPC_ENCRYPTION_ENABLED=no
- SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
- SPARK_SSL_ENABLED=no
user: root
ports:
- '8880:8080'
volumes:
- ./spark-defaults.conf:/opt/bitnami/spark/conf/spark-defaults.conf
- ./jars_dir:/opt/bitnami/spark/ivy:z

Local Vault using docker-compose

I'm having big trouble running Vault in docker-compose.
My requirements are :
running as deamon (so restarting when I restart my Mac)
secret being persisted between container restart
no human intervention between restart (unsealing, etc.)
using a generic token
My current docker-compose
version: '2.3'
services:
vault-dev:
image: vault:1.2.1
restart: always
container_name: vault-dev
environment:
VAULT_DEV_ROOT_TOKEN_ID: "myroot"
VAULT_LOCAL_CONFIG: '{"backend": {"file": {"path": "/vault/file"}}, "default_lease_ttl": "168h", "max_lease_ttl": "720h"}'
ports:
- "8200:8200"
volumes:
- ./storagedc/vault/file:/vault/file
However, when the container restart, I get the log
==> Vault server configuration:
Api Address: http://0.0.0.0:8200
Cgo: disabled
Cluster Address: https://0.0.0.0:8201
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level: info
Mlock: supported: true, enabled: false
Storage: file
Version: Vault v1.2.1
Error initializing Dev mode: Vault is already initialized
Is there any recommendation on that matter?
I'm going to pseudo-code an answer to work around the problems specified, but please note that this is a massive hack and should NEVER be deployed in production as a hard-coded master key and single unseal key is COLOSSALLY INSECURE.
So, you want a test vault server, with persistence.
You can accomplish this, it will need a little bit of work because of the default behavior of the vault container - if you just start it, it will start with a dev mode container, which won't allow for persistence. Just adding persistence via the environment variable won't solve that problem entirely because it will conflict with the default start mode of the container.
so we need to replace this entrypoint script with something that does what we want it to do instead.
First we copy the script out of the container:
$ docker create --name vault vault:1.2.1
$ docker cp vault:/usr/local/bin/docker-entrypoint.sh .
$ docker rm vault
For simplicity, we're going to edit the file and mount it into the container using the docker-compose file. I'm not going to make it really functional - just enough to get it to do what's desired. The entire point here is sample, not something that is usable in production.
My customizations all start at about line 98 - first we launch a dev-mode server in order to record the unseal key, then we terminate the dev mode server.
# Here's my customization:
if [ ! -f /vault/unseal/sealfile ]; then
# start in dev mode, in the background to record the unseal key
su-exec vault vault server \
-dev -config=/vault/config \
-dev-root-token-id="$VAULT_DEV_ROOT_TOKEN_ID" \
2>&1 | tee /vault/unseal/sealfile &
while ! grep -q 'core: vault is unsealed' /vault/unseal/sealfile; do
sleep 1
done
kill %1
fi
Next we check for supplemental config. This is where the extra config goes for disabling TLS, and for binding the appropriate interface.
if [ -n "$VAULT_SUPPLEMENTAL_CONFIG" ]; then
echo "$VAULT_SUPPLEMENTAL_CONFIG" > "$VAULT_CONFIG_DIR/supplemental.json"
fi
Then we launch vault in 'release' mode:
if [ "$(id -u)" = '0' ]; then
set -- su-exec vault "$#"
"$#"&
Then we get the unseal key from the sealfile:
unseal=$(sed -n 's/Unseal Key: //p' /vault/unseal/sealfile)
if [ -n "$unseal" ]; then
while ! vault operator unseal "$unseal"; do
sleep 1
done
fi
We just wait for the process to terminate:
wait
exit $?
fi
There's a full gist for this on github.
Now the docker-compose.yml for doing this is slightly different to your own:
version: '2.3'
services:
vault-dev:
image: vault:1.2.1
restart: always
container_name: vault-dev
command: [ 'vault', 'server', '-config=/vault/config' ]
environment:
VAULT_DEV_ROOT_TOKEN_ID: "myroot"
VAULT_LOCAL_CONFIG: '{"backend": {"file": {"path": "/vault/file"}}, "default_lease_ttl": "168h", "max_lease_ttl": "720h"}'
VAULT_SUPPLEMENTAL_CONFIG: '{"ui":true, "listener": {"tcp":{"address": "0.0.0.0:8200", "tls_disable": 1}}}'
VAULT_ADDR: "http://127.0.0.1:8200"
ports:
- "8200:8200"
volumes:
- ./vault:/vault/file
- ./unseal:/vault/unseal
- ./docker-entrypoint.sh:/usr/local/bin/docker-entrypoint.sh
cap_add:
- IPC_LOCK
The command is the command to execute. This is what's in the "$#"& of the script changes.
I've added VAULT_SUPPLEMENTAL_CONFIG for the non-dev run. It needs to specify the interfaces, it needs to turn of tls. I added the ui, so I can access it using http://127.0.0.1:8200/ui. This is part of the changes I made to the script.
Because this is all local, for me, test purposes, I'm mounting ./vault as the data directory, I'm mounting ./unseal as the place to record the unseal code and mounting ./docker-entrypoint.sh as the entrypoint script.
I can docker-compose up this and it launches a persistent vault - there are some errors on the log as I try to unseal before the server has launched, but it works, and persists across multiple docker-compose runs.
Again, to mention that this is completely unsuitable for any form of long-term use. You're better off using docker's own secrets engine if you're doing things like this.
I'd like to suggest a simpler solution for local development with docker-compose.
Vault is always unsealed
Vault UI is enabled and accessible at http://localhost:8200/ui/vault on your dev machine
Vault has predefined root token which can be used by services to communicate with it
docker-compose.yml
vault:
hostname: vault
container_name: vault
image: vault:1.12.0
environment:
VAULT_ADDR: "http://0.0.0.0:8200"
VAULT_API_ADDR: "http://0.0.0.0:8200"
ports:
- "8200:8200"
volumes:
- ./volumes/vault/file:/vault/file:rw
cap_add:
- IPC_LOCK
entrypoint: vault server -dev -dev-listen-address="0.0.0.0:8200" -dev-root-token-id="root"

Elastic search TestContainers Timed out waiting for URL to be accessible in Docker

Local env:
MacOS 10.14.6
Docker Desktop 2.0.1.2
Docker Engine 19.03.2
Compose Engine 1.24.1
Test containers 1.12.1
I'm using Elastic search in an app, and I want to be able to use TestContainers in my integration tests. Sample code in a Play Framework app that uses ElasticSearch testcontainer:
#BeforeAll
public static void setup() {
private static final ElasticsearchContainer ES = new ElasticsearchContainer();
ES.start();
}
This works when testing locally, but I want to be able to run this inside a Docker container to run on my CI server. I'm getting this exception when running the tests inside the Docker container:
[warn] o.t.u.RegistryAuthLocator - Failure when attempting to lookup auth config (dockerImageName: alpine:3.5, configFile: /root/.docker/config.json. Falling back to docker-java default behaviour. Exception message: /root/.docker/config.json (No such file or directory)
[warn] o.t.u.RegistryAuthLocator - Failure when attempting to lookup auth config (dockerImageName: quay.io/testcontainers/ryuk:0.2.3, configFile: /root/.docker/config.json. Falling back to docker-java default behaviour. Exception message: /root/.docker/config.json (No such file or directory)
?? Checking the system...
? Docker version should be at least 1.6.0
? Docker environment should have more than 2GB free disk space
[warn] o.t.u.RegistryAuthLocator - Failure when attempting to lookup auth config (dockerImageName: docker.elastic.co/elasticsearch/elasticsearch:7.1.1, configFile: /root/.docker/config.json. Falling back to docker-java default behaviour. Exception message: /root/.docker/config.json (No such file or directory)
[error] d.e.c.1.1] - Could not start container
org.testcontainers.containers.ContainerLaunchException: Timed out waiting for URL to be accessible (http://172.17.0.1:32911/ should return HTTP [200])
at org.testcontainers.containers.wait.strategy.HttpWaitStrategy.waitUntilReady(HttpWaitStrategy.java:197)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:35)
at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:675)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:332)
at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:285)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:283)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:272)
at controllers.HomeControllerTest.setup(HomeControllerTest.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
I've read the instructions here: https://www.testcontainers.org/supported_docker_environment/continuous_integration/dind_patterns/
So my docker-compose.yml looks like (note: I've been testing with another ES container as seen commented out below, but I have not been using it with this test)($INSTANCE is a random 16 char string for a particular test run):
version: '3'
services:
# elasticsearch:
# container_name: elasticsearch_${INSTANCE}
# image: docker.elastic.co/elasticsearch/elasticsearch:6.7.2
# ports:
# - 9200:9200
# - 9300:9300
# command: elasticsearch -E transport.host=0.0.0.0
# logging:
# driver: 'none'
# environment:
# ES_JAVA_OPTS: "-Xms750m -Xmx750m"
mainapp:
container_name: mainapp_${INSTANCE}
image: test_image:${INSTANCE}
stop_signal: SIGKILL
stdin_open: true
tty: true
working_dir: $PWD
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- $PWD:$PWD
environment:
ES_JAVA_OPTS: "-Xms1G -Xmx1G"
command: /bin/bash /projectfolder/build/tests/wrapper.sh
I've also tried running my tests with this command but received the same error:
docker run -it --rm -v $PWD:$PWD -w $PWD -v /var/run/docker.sock:/var/run/docker.sock test_image:68F75D8FD4C7003772C7E52B87B774F5 /bin/bash /testproject/build/tests/wrapper.sh
I tried creating a postgres container the same way inside my testing container and had no issues. I've also tried making a GenericContainer with the Elasticsearch image with no luck.
I don't think this is a connection issue because if I run curl 172.17.0.1:{port printed to test console} from inside my test container, I do get a valid elastic search response with status code 200, so it almost seems like its timing out trying to connect even though the connection is there.
Thanks.

Resources