Filebeat is not sending logs to logstash on kubernetes - docker

I'm trying to ship Kubernetes logs with Filebeat and Logstash. I have several deployments in the same namespace.
I tried the filebeat.yml configuration suggested by Elastic in this link: https://raw.githubusercontent.com/elastic/beats/7.x/deploy/kubernetes/filebeat-kubernetes.yaml
So, this is my overall configuration:
filebeat.yml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*.log'
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
# To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
#filebeat.autodiscover:
#  providers:
#    - type: kubernetes
#      node: ${NODE_NAME}
#      hints.enabled: true
#      hints.default_config:
#        type: container
#        paths:
#          - /var/log/containers/*${data.kubernetes.container.id}.log
output.logstash:
  hosts: ['logstash.default.svc.cluster.local:5044']
Logstash Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:7.15.0
        ports:
        - containerPort: 5044
        volumeMounts:
        - name: config-volume
          mountPath: /usr/share/logstash/config
        - name: logstash-pipeline-volume
          mountPath: /usr/share/logstash/pipeline
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.yml
            path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.conf
            path: logstash.conf
Logstash Configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
  namespace: default
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      tcp {
        mode => "client"
        host => "10.184.0.4"
        port => 5001
        codec => "json_lines"
      }
      stdout {
        codec => rubydebug
      }
    }
Logstash Service
kind: Service
apiVersion: v1
metadata:
  name: logstash
  namespace: default
spec:
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
The Filebeat DaemonSet and the Logstash deployment are both running, and kubectl logs for each shows the following.
The Filebeat DaemonSet shows:
2021-10-13T04:10:14.201Z INFO instance/beat.go:665 Home path: [/usr/share/filebeat] Config path: [/usr/share/filebeat] Data path: [/usr/share/filebeat/data] Logs path: [/usr/share/filebeat/logs]
2021-10-13T04:10:14.219Z INFO instance/beat.go:673 Beat ID: b90d1561-e989-4ed1-88f9-9b88045cee29
2021-10-13T04:10:14.220Z INFO [seccomp] seccomp/seccomp.go:124 Syscall filter successfully installed
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1014 Beat info {"system_info": {"beat": {"path": {"config": "/usr/share/filebeat", "data": "/usr/share/filebeat/data", "home": "/usr/share/filebeat", "logs": "/usr/share/filebeat/logs"}, "type": "filebeat", "uuid": "b90d1561-e989-4ed1-88f9-9b88045cee29"}}}
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1023 Build info {"system_info": {"build": {"commit": "9023152025ec6251bc6b6c38009b309157f10f17", "libbeat": "7.15.0", "time": "2021-09-16T03:16:09.000Z", "version": "7.15.0"}}}
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1026 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":2,"version":"go1.16.6"}}}
2021-10-13T04:10:14.221Z INFO [beat] instance/beat.go:1030 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2021-10-06T19:41:55Z","containerized":true,"name":"filebeat-hvqx4","ip":["127.0.0.1/8","10.116.6.42/24"],"kernel_version":"5.4.120+","mac":["ae:ab:28:37:27:2a"],"os":{"type":"linux","family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":9,"patch":2009,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0,"id":"38c2fd0d69ba05ae64d8a4d4fc156791"}}}
2021-10-13T04:10:14.221Z INFO [beat] instance/beat.go:1059 Process info {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/filebeat", "exe": "/usr/share/filebeat/filebeat", "name": "filebeat", "pid": 8, "ppid": 1, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2021-10-13T04:10:12.819Z"}}}
2021-10-13T04:10:14.221Z INFO instance/beat.go:309 Setup Beat: filebeat; Version: 7.15.0
2021-10-13T04:10:14.222Z INFO [publisher] pipeline/module.go:113 Beat name: filebeat-hvqx4
2021-10-13T04:10:14.224Z WARN beater/filebeat.go:178 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-10-13T04:10:14.225Z INFO [monitoring] log/log.go:142 Starting metrics logging every 30s
2021-10-13T04:10:14.225Z INFO instance/beat.go:473 filebeat start running.
2021-10-13T04:10:14.227Z INFO memlog/store.go:119 Loading data file of '/usr/share/filebeat/data/registry/filebeat' succeeded. Active transaction id=0
2021-10-13T04:10:14.227Z INFO memlog/store.go:124 Finished loading transaction log file for '/usr/share/filebeat/data/registry/filebeat'. Active transaction id=0
2021-10-13T04:10:14.227Z WARN beater/filebeat.go:381 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:109 States Loaded from registrar: 0
2021-10-13T04:10:14.228Z INFO [crawler] beater/crawler.go:71 Loading Inputs: 1
2021-10-13T04:10:14.228Z INFO beater/crawler.go:148 Stopping Crawler
2021-10-13T04:10:14.228Z INFO beater/crawler.go:158 Stopping 0 inputs
2021-10-13T04:10:14.228Z INFO beater/crawler.go:178 Crawler stopped
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:132 Stopping Registrar
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:166 Ending Registrar
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:137 Registrar stopped
2021-10-13T04:10:44.229Z INFO [monitoring] log/log.go:184 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cgroup":{"cpu":{"cfs":{"period":{"us":100000}},"id":"/"},"cpuacct":{"id":"/","total":{"ns":307409530}},"memory":{"id":"/","mem":{"limit":{"bytes":209715200},"usage":{"bytes":52973568}}}},"cpu":{"system":{"ticks":80,"time":{"ms":85}},"total":{"ticks":270,"time":{"ms":283},"value":270},"user":{"ticks":190,"time":{"ms":198}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":9},"info":{"ephemeral_id":"f5abb082-a094-4f99-a046-bc183d415455","uptime":{"ms":30208},"version":"7.15.0"},"memstats":{"gc_next":19502448,"memory_alloc":10052000,"memory_sys":75056136,"memory_total":55390312,"rss":112922624},"runtime":{"goroutines":12}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0},"type":"logstash"},"pipeline":{"clients":0,"events":{"active":0},"queue":{"max_events":4096}}},"registrar":{"states":{"current":0}},"system":{"cpu":{"cores":2},"load":{"1":0.14,"15":0.28,"5":0.31,"norm":{"1":0.07,"15":0.14,"5":0.155}}}}}}
The Logstash deployment logs show:
Using bundled JDK: /usr/share/logstash/jdk
warning: no jvm.options file found
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2021-10-13 08:46:58.674 [main] runner - Starting Logstash {"logstash.version"=>"7.15.0", "jruby.version"=>"jruby 9.2.19.0 (2.5.8) 2021-06-15 55810c552b OpenJDK 64-Bit Server VM 11.0.11+9 on 11.0.11+9 +jit [linux-x86_64]"}
[INFO ] 2021-10-13 08:46:58.698 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2021-10-13 08:46:58.700 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2021-10-13 08:46:59.077 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2021-10-13 08:46:59.097 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"7a0e5b89-70a1-4004-b38e-c31fadcd7251", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2021-10-13 08:47:00.950 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[INFO ] 2021-10-13 08:47:01.468 [Converge PipelineAction::Create<main>] Reflections - Reflections took 203 ms to scan 1 urls, producing 120 keys and 417 values
[WARN ] 2021-10-13 08:47:02.496 [Converge PipelineAction::Create<main>] plain - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[WARN ] 2021-10-13 08:47:02.526 [Converge PipelineAction::Create<main>] beats - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[WARN ] 2021-10-13 08:47:02.664 [Converge PipelineAction::Create<main>] jsonlines - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[INFO ] 2021-10-13 08:47:02.947 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x3b822f13#/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:125 run>"}
[INFO ] 2021-10-13 08:47:05.467 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.52}
[INFO ] 2021-10-13 08:47:05.473 [[main]-pipeline-manager] beats - Starting input listener {:address=>"0.0.0.0:5044"}
[INFO ] 2021-10-13 08:47:05.555 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2021-10-13 08:47:05.588 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2021-10-13 08:47:05.907 [[main]<beats] Server - Starting server on port: 5044
So, my questions are:
1. Why is Filebeat not ingesting the logs from Kubernetes?
2. Are there different ways to specify the Logstash hosts in filebeat.yml? Some examples use the full DNS name, just like my configuration, while others use only the service name.
3. How can I trigger/test logs to make sure my configuration is working?

My mistake: in the Filebeat environment I missed setting the NODE_NAME env variable. So, to the configuration above I just added
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
and Filebeat is running well now.
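For reference, this is roughly where that variable sits in the Filebeat DaemonSet container spec from the Elastic example (the container name, image tag, and config path follow that example and may differ in your manifest):
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.15.0
        args: ["-c", "/etc/filebeat.yml", "-e"]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
As for question 3, one way to check the output connection is from inside a running Filebeat pod, e.g. kubectl exec <filebeat-pod> -- filebeat test output -c /etc/filebeat.yml, which reports whether the Logstash endpoint resolves and accepts connections.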

Related

How do I start my jar application from a flink docker image inside Kubernetes?

I am trying to use my felipeogutierrez/explore-flink:1.11.1-scala_2.12 image, available here, in a Kubernetes cluster configuration as described here. I compile my project https://github.com/felipegutierrez/explore-flink with Maven and extend the default Flink image flink:1.11.1-scala_2.12 with this Dockerfile:
FROM maven:3.6-jdk-8-slim AS builder
# get explore-flink job and compile it
COPY ./java/explore-flink /opt/explore-flink
WORKDIR /opt/explore-flink
RUN mvn clean install
FROM flink:1.11.1-scala_2.12
WORKDIR /opt/flink/usrlib
COPY --from=builder /opt/explore-flink/target/explore-flink.jar /opt/flink/usrlib/explore-flink.jar
ADD /opt/flink/usrlib/explore-flink.jar /opt/flink/usrlib/explore-flink.jar
#USER flink
then the tutorial 2 says to create the common cluster components:
kubectl create -f k8s/flink-configuration-configmap.yaml
kubectl create -f k8s/jobmanager-service.yaml
kubectl proxy
kubectl create -f k8s/jobmanager-rest-service.yaml
kubectl get svc flink-jobmanager-rest
and then create the jobmanager-job.yaml:
kubectl create -f k8s/jobmanager-job.yaml
I am getting a CrashLoopBackOff status on the flink-jobmanager pod, and the log says that it cannot find the class org.sense.flink.examples.stream.tpch.TPCHQuery03 in the flink-dist_2.12-1.11.1.jar file. However, I want Kubernetes to also look into the /opt/flink/usrlib/explore-flink.jar jar file. I am copying and adding this jar file in the Dockerfile of my image, but it does not seem to work. What am I missing here? Below is my jobmanager-job.yaml file:
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager
spec:
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      restartPolicy: OnFailure
      containers:
      - name: jobmanager
        image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
        imagePullPolicy: Always
        env:
        args: ["standalone-job", "--job-classname", "org.sense.flink.examples.stream.tpch.TPCHQuery03"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        - name: job-artifacts-volume
          mountPath: /opt/flink/usrlib
        securityContext:
          runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      - name: job-artifacts-volume
        hostPath:
          path: /host/path/to/job/artifacts
and my complete log file:
$ kubectl logs flink-jobmanager-qfkjl
Starting Job Manager
sed: couldn't open temporary file /opt/flink/conf/sedSg30ro: Read-only file system
sed: couldn't open temporary file /opt/flink/conf/sed1YrBco: Read-only file system
/docker-entrypoint.sh: 72: /docker-entrypoint.sh: cannot create /opt/flink/conf/flink-conf.yaml: Permission denied
/docker-entrypoint.sh: 91: /docker-entrypoint.sh: cannot create /opt/flink/conf/flink-conf.yaml.tmp: Read-only file system
Starting standalonejob as a console application on host flink-jobmanager-qfkjl.
2020-09-21 08:08:29,528 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2020-09-21 08:08:29,531 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Preconfiguration:
2020-09-21 08:08:29,532 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
JM_RESOURCE_PARAMS extraction logs:
jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
logs: INFO [] - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 4
INFO [] - Loading configuration property: blob.server.port, 6124
INFO [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO [] - Loading configuration property: taskmanager.rpc.port, 6122
INFO [] - Loading configuration property: queryable-state.proxy.ports, 6125
INFO [] - Loading configuration property: jobmanager.memory.process.size, 1600m
INFO [] - Loading configuration property: taskmanager.memory.process.size, 1728m
INFO [] - Loading configuration property: parallelism.default, 2
INFO [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
INFO [] - Final Master Memory configuration:
INFO [] - Total Process Memory: 1.563gb (1677721600 bytes)
INFO [] - Total Flink Memory: 1.125gb (1207959552 bytes)
INFO [] - JVM Heap: 1024.000mb (1073741824 bytes)
INFO [] - Off-heap: 128.000mb (134217728 bytes)
INFO [] - JVM Metaspace: 256.000mb (268435456 bytes)
INFO [] - JVM Overhead: 192.000mb (201326592 bytes)
2020-09-21 08:08:29,533 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2020-09-21 08:08:29,533 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneApplicationClusterEntryPoint (Version: 1.11.1, Scala: 2.12, Rev:7eb514a, Date:2020-07-15T07:02:09+02:00)
2020-09-21 08:08:29,533 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - OS current user: flink
2020-09-21 08:08:29,533 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.265-b01
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Maximum heap size: 989 MiBytes
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JAVA_HOME: /usr/local/openjdk-8
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - No Hadoop Dependency available
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM Options:
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xmx1073741824
2020-09-21 08:08:29,534 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xms1073741824
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -XX:MaxMetaspaceSize=268435456
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog.file=/opt/flink/log/flink--standalonejob-0-flink-jobmanager-qfkjl.log
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2020-09-21 08:08:29,535 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Program Arguments:
2020-09-21 08:08:29,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --configDir
2020-09-21 08:08:29,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - /opt/flink/conf
2020-09-21 08:08:29,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --job-classname
2020-09-21 08:08:29,536 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - org.sense.flink.examples.stream.tpch.TPCHQuery03
2020-09-21 08:08:29,537 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Classpath: /opt/flink/lib/flink-csv-1.11.1.jar:/opt/flink/lib/flink-json-1.11.1.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.11.1.jar:/opt/flink/lib/flink-table_2.12-1.11.1.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.11.1.jar:::
2020-09-21 08:08:29,538 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2020-09-21 08:08:29,540 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Registered UNIX signal handlers for [TERM, HUP, INT]
2020-09-21 08:08:29,577 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not create application program.
org.apache.flink.util.FlinkException: Could not find the provided job class (org.sense.flink.examples.stream.tpch.TPCHQuery03) in the user lib directory (/opt/flink/usrlib).
at org.apache.flink.client.deployment.application.ClassPathPackagedProgramRetriever.getJobClassNameOrScanClassPath(ClassPathPackagedProgramRetriever.java:140) ~[flink-dist_2.12-1.11.1.jar:1.11.1]
at org.apache.flink.client.deployment.application.ClassPathPackagedProgramRetriever.getPackagedProgram(ClassPathPackagedProgramRetriever.java:123) ~[flink-dist_2.12-1.11.1.jar:1.11.1]
at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:110) ~[flink-dist_2.12-1.11.1.jar:1.11.1]
at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:78) [flink-dist_2.12-1.11.1.jar:1.11.1]
I had two problems with my configuration. First, the Dockerfile was not copying explore-flink.jar to the right location. Second, I did not need to mount the job-artifacts-volume volume in the Kubernetes file jobmanager-job.yaml. Here is my Dockerfile:
FROM maven:3.6-jdk-8-slim AS builder
# get explore-flink job and compile it
COPY ./java/explore-flink /opt/explore-flink
WORKDIR /opt/explore-flink
RUN mvn clean install
FROM flink:1.11.1-scala_2.12
WORKDIR /opt/flink/lib
COPY --from=builder --chown=flink:flink /opt/explore-flink/target/explore-flink.jar /opt/flink/lib/explore-flink.jar
and the jobmanager-job.yaml file:
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager
spec:
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      restartPolicy: OnFailure
      containers:
      - name: jobmanager
        image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
        imagePullPolicy: Always
        env:
        #command: ["ls"]
        args: ["standalone-job", "--job-classname", "org.sense.flink.App", "-app", "36"] #, <optional arguments>, <job arguments>] # optional arguments: ["--job-id", "<job id>", "--fromSavepoint", "/path/to/savepoint", "--allowNonRestoredState"]
        #args: ["standalone-job", "--job-classname", "org.sense.flink.examples.stream.tpch.TPCHQuery03"] #, <optional arguments>, <job arguments>] # optional arguments: ["--job-id", "<job id>", "--fromSavepoint", "/path/to/savepoint", "--allowNonRestoredState"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
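With those two changes in place, rebuilding and redeploying would look roughly like this (image name and k8s/ paths taken from the question; adjust to your registry and workflow):
docker build -t felipeogutierrez/explore-flink:1.11.1-scala_2.12 .
docker push felipeogutierrez/explore-flink:1.11.1-scala_2.12
kubectl delete -f k8s/jobmanager-job.yaml
kubectl create -f k8s/jobmanager-job.yaml
kubectl logs -f job/flink-jobmanager
Because the jar now lives in /opt/flink/lib, it is already on the classpath built by the Flink entrypoint, which is why the usrlib volume mount is no longer needed.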

How to configure a RabbitMQ cluster in Kubernetes with a mounted persistent volume that will allow data to persist when the entire cluster restarts?

I am trying to set up a high-availability RabbitMQ cluster of nodes in my Kubernetes cluster as a StatefulSet so that my data (e.g. queues, messages) persists even after restarting all of the nodes simultaneously. Since I'm deploying the RabbitMQ nodes in Kubernetes, I understand that I need to include an external persistent volume for the nodes to store data in so that the data will persist after a restart. I have mounted an Azure Files share into my containers as a volume at the directory /var/lib/rabbitmq/mnesia.
When starting with a fresh (empty) volume, the nodes start up without any issues and successfully form a cluster. I can open the RabbitMQ management UI and see that any queue I create is mirrored on all of the nodes, as expected, and the queue (plus any messages in it) will persist as long as there is at least 1 active node. Deleting pods with kubectl delete pod rabbitmq-0 -n rabbit will cause the node to stop and then restart, and the logs show that it successfully syncs with any remaining/active node so everything is fine.
The problem I have encountered is that when I simultaneously delete all RabbitMQ nodes in the cluster, the first node to start up will have the persisted data from the volume and tries to re-cluster with the other two nodes which are, of course, not active. What I expected to happen was that the node would start up, load the queue and message data, and then form a new cluster (since it should notice that no other nodes are active).
I suspect that there may be some data in the mounted volume that indicates the presence of other nodes which is why it tries to connect with them and join the supposed cluster, but I haven't found a way to prevent that and am not certain that this is the cause.
There are two different error messages: one in the pod description (kubectl describe pod rabbitmq-0 -n rabbit) when the RabbitMQ node is in a crash loop and another in the pod logs. The pod description error output includes the following:
exited with 137:
20:38:12.331 [error] Cookie file /var/lib/rabbitmq/.erlang.cookie must be accessible by owner only
Error: unable to perform an operation on node 'rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local']
rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local:
* connected to epmd (port 4369) on rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-345-rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: xxxxxxxxxxxxxxxxx
and the logs output the following info:
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...2020-06-12 20:39:08.678 [info] <0.294.0>
node : rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : xxxxxxxxxxxxxxxxx
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local
...
2020-06-12 20:48:39.015 [warning] <0.294.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit#rabbitmq-2.rabbitmq-internal.rabbit.svc.cluster.local','rabbit#rabbitmq-1.rabbitmq-internal.rabbit.svc.cluster.local','rabbit#rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local'],[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-12 20:48:39.015 [info] <0.294.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-06-12 20:49:09.341 [info] <0.44.0> Application mnesia exited with reason: stopped
2020-06-12 20:49:09.505 [error] <0.294.0>
2020-06-12 20:49:09.505 [error] <0.294.0> BOOT FAILED
2020-06-12 20:49:09.505 [error] <0.294.0> ===========
2020-06-12 20:49:09.505 [error] <0.294.0> Timeout contacting cluster nodes: ['rabbit#rabbitmq-2.rabbitmq-internal.rabbit.svc.cluster.local',
2020-06-12 20:49:09.505 [error] <0.294.0> 'rabbit#rabbitmq-1.rabbitmq-internal.rabbit.svc.cluster.local'].
...
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
2020-06-12 20:49:09.506 [error] <0.294.0>
2020-06-12 20:49:09.506 [error] <0.294.0> This cluster node was shut down while other nodes were still running.
2020-06-12 20:49:09.506 [error] <0.294.0> To avoid losing data, you should start the other nodes first, then
2020-06-12 20:49:09.506 [error] <0.294.0> start this one. To force this node to start, first invoke
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
What I've tried so far is clearing the contents of the /var/lib/rabbitmq/mnesia/rabbit@rabbitmq-0.rabbitmq-internal.rabbit.svc.cluster.local/nodes_running_at_shutdown file, and fiddling with config settings such as the volume mount directory and Erlang cookie permissions.
Below are the relevant deployment files and config files:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbit
spec:
  serviceName: rabbitmq-internal
  revisionHistoryLimit: 3
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      name: rabbitmq
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
      - name: rabbitmq
        image: rabbitmq:0.13
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - >
                until rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} node_health_check; do sleep 1; done;
                rabbitmqctl --erlang-cookie ${RABBITMQ_ERLANG_COOKIE} set_policy ha-all "" '{"ha-mode":"all", "ha-sync-mode": "automatic"}'
        ports:
        - containerPort: 4369
        - containerPort: 5672
        - containerPort: 5671
        - containerPort: 25672
        - containerPort: 15672
        resources:
          requests:
            memory: "500Mi"
            cpu: "0.4"
          limits:
            memory: "600Mi"
            cpu: "0.6"
        livenessProbe:
          exec:
            # Stage 2 check:
            command: ["rabbitmq-diagnostics", "status", "--erlang-cookie", "$(RABBITMQ_ERLANG_COOKIE)"]
          initialDelaySeconds: 60
          periodSeconds: 60
          timeoutSeconds: 15
        readinessProbe:
          exec:
            # Stage 2 check:
            command: ["rabbitmq-diagnostics", "status", "--erlang-cookie", "$(RABBITMQ_ERLANG_COOKIE)"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 10
        envFrom:
        - configMapRef:
            name: rabbitmq-cfg
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_NODENAME
          value: "rabbit@$(HOSTNAME).rabbitmq-internal.$(NAMESPACE).svc.cluster.local"
        - name: K8S_SERVICE_NAME
          value: "rabbitmq-internal"
        - name: RABBITMQ_DEFAULT_USER
          value: user
        - name: RABBITMQ_DEFAULT_PASS
          value: pass
        - name: RABBITMQ_ERLANG_COOKIE
          value: my-cookie
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: my-volume-mount
          mountPath: "/var/lib/rabbitmq/mnesia"
      imagePullSecrets:
      - name: my-secret
      volumes:
      - name: my-volume-mount
        azureFile:
          secretName: azure-rabbitmq-secret
          shareName: my-fileshare-name
          readOnly: false
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-cfg
  namespace: rabbit
data:
  RABBITMQ_VM_MEMORY_HIGH_WATERMARK: "0.6"
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbit
  name: rabbitmq-internal
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
  - name: http
    protocol: TCP
    port: 15672
  - name: amqp
    protocol: TCP
    port: 5672
  - name: amqps
    protocol: TCP
    port: 5671
  selector:
    app: rabbitmq
---
kind: Service
apiVersion: v1
metadata:
  namespace: rabbit
  name: rabbitmq
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  selector:
    app: rabbitmq
  ports:
  - name: http
    protocol: TCP
    port: 15672
    targetPort: 15672
  - name: amqp
    protocol: TCP
    port: 5672
    targetPort: 5672
  - name: amqps
    protocol: TCP
    port: 5671
    targetPort: 5671
Dockerfile:
FROM rabbitmq:3.8.4
COPY conf/rabbitmq.conf /etc/rabbitmq
COPY conf/enabled_plugins /etc/rabbitmq
USER root
COPY conf/.erlang.cookie /var/lib/rabbitmq
RUN /bin/bash -c 'ls -ld /var/lib/rabbitmq/.erlang.cookie; chmod 600 /var/lib/rabbitmq/.erlang.cookie; ls -ld /var/lib/rabbitmq/.erlang.cookie'
rabbitmq.conf
## cluster formation settings
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.service_name = rabbitmq-internal
cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.rabbit.svc.cluster.local
cluster_formation.node_cleanup.interval = 60
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
queue_master_locator=min-masters
## general settings
log.file.level = debug
## Mgmt UI secure/non-secure connection settings (secure not implemented yet)
management.tcp.port = 15672
## RabbitMQ entrypoint settings (will be injected below when image is built)
Thanks in advance!

Flink 1.5.4 is not registering Google Cloud Storage (GCS) filesystem in Kubernetes, although it works in docker container

Note: baking keys into an image is the worst thing you can do; I did it here only to have a binary-identical filesystem between Docker and Kubernetes while debugging.
I am trying to start up a flink-jobmanager that persists its state in GCS, so I added a high-availability.storageDir: gs://BUCKET/ha line to my flink-conf.yaml, and I am building my Dockerfile as described here.
This is my Dockerfile:
FROM flink:1.5-hadoop28
ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar /opt/flink/lib/gcs-connector-latest-hadoop2.jar
RUN mkdir /opt/flink/etc-hadoop
COPY flink-conf.yaml /opt/flink/conf/flink-conf.yaml
COPY key.json /opt/flink/etc-hadoop/key.json
COPY core-site.xml /opt/flink/etc-hadoop/core-site.xml
Now if I build this container via docker build -t flink:dev . and start an interactive shell in it like docker run -ti flink:dev /bin/bash, I am able to start the flink jobmanager via:
flink-console.sh jobmanager --configDir=/opt/flink/conf/ --executionMode=cluster
Flink picks up the jars and starts normally. However, when I use the following YAML to start it on Kubernetes, based on the one here:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:dev
        imagePullPolicy: Always
        resources:
          requests:
            memory: "1024Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob
        - containerPort: 6125
          name: query
        - containerPort: 8081
          name: ui
        - containerPort: 46110
          name: ha
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /opt/flink/etc-hadoop/key.json
        - name: JOB_MANAGER_RPC_ADDRESS
          value: flink-jobmanager
Flink seems to be unable to register the filesystem:
2018-10-04 09:20:51,357 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-default configuration-file path in Flink config.
2018-10-04 09:20:51,358 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-site configuration-file path in Flink config.
2018-10-04 09:20:51,359 DEBUG org.apache.flink.runtime.util.HadoopUtils - Adding /opt/flink/etc-hadoop//core-site.xml to hadoop configuration
2018-10-04 09:20:51,767 DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:flink (auth:SIMPLE) cause:java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
2018-10-04 09:20:51,767 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Cluster initialization failed.
java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:122)
at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:95)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:115)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:402)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:270)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:225)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:189)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:188)
at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:91)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
... 12 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop File System abstraction does not support scheme 'gs'. Either no file system implementation exists for that scheme, or the relevant classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:102)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
... 15 more
Caused by: java.io.IOException: No FileSystem for scheme: gs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
... 16 more
As Kubernetes should be using the same image, I am confused about how this is possible. Am I overlooking something here?
The problem was using the dev tag. Using specific version tags fixed the issue.
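In practice that means building the image with an immutable tag and pointing the Deployment at it, for example (the tag name here is arbitrary):
docker build -t flink:1.5-hadoop28-gcs-1 .
and in the Deployment spec:
        image: flink:1.5-hadoop28-gcs-1
        imagePullPolicy: IfNotPresent
With imagePullPolicy: Always and a mutable tag such as dev, the kubelet may pull a different (older) image from the registry than the one just built locally, which would explain why the same image appeared to behave differently under Docker and Kubernetes.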

activemq stateful set kubernetes pods failing to start

I have this weird error plaguing me.
I am trying to get an ActiveMQ pod running with a Kubernetes StatefulSet, with a volume attached.
The ActiveMQ image is just a plain old vanilla Docker image, picked from here: https://hub.docker.com/r/rmohr/activemq/
INFO | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1#3fee9989: startup date [Thu Aug 23 22:12:07 GMT 2018]; root of context hierarchy
INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/opt/activemq/data/kahadb]
INFO | KahaDB is version 6
INFO | PListStore:[/opt/activemq/data/localhost/tmp_storage] started
INFO | Apache ActiveMQ 5.15.4 (localhost, ID:activemq-0-43279-1535062328969-0:1) is starting
INFO | Listening for connections at: tcp://activemq-0:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600
INFO | Connector openwire started
INFO | Listening for connections at: amqp://activemq-0:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600
INFO | Connector amqp started
INFO | Listening for connections at: stomp://activemq-0:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600
INFO | Connector stomp started
INFO | Listening for connections at: mqtt://activemq-0:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600
INFO | Connector mqtt started
WARN | ServletContext#o.e.j.s.ServletContextHandler#65a15628{/,null,STARTING} has uncovered http methods for path: /
INFO | Listening for connections at ws://activemq-0:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600
INFO | Connector ws started
INFO | Apache ActiveMQ 5.15.4 (localhost, ID:activemq-0-43279-1535062328969-0:1) started
INFO | For help or more information please see: http://activemq.apache.org
WARN | Store limit is 102400 mb (current store usage is 6 mb). The data directory: /opt/activemq/data/kahadb only has 95468 mb of usable space. - resetting to maximum available disk space: 95468 mb
WARN | Failed startup of context o.e.j.w.WebAppContext#478ee483{/admin,file:/opt/apache-activemq-5.15.4/webapps/admin/,null}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
at org.eclipse.jetty.webapp.WebInfConfiguration.makeTempDirectory(WebInfConfiguration.java:336)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.webapp.WebInfConfiguration.resolveTempDirectory(WebInfConfiguration.java:304)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.webapp.WebInfConfiguration.preConfigure(WebInfConfiguration.java:69)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.webapp.WebAppContext.preConfigure(WebAppContext.java:468)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:504)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.security.SecurityHandler.doStart(SecurityHandler.java:391)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.security.ConstraintSecurityHandler.doStart(ConstraintSecurityHandler.java:449)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.Server.start(Server.java:387)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.server.Server.doStart(Server.java:354)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)[jetty-all-9.2.22.v20170606.jar:9.2.22.v20170606]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_171]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_171]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498)[:1.8.0_171]
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:265)[spring-core-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.config.MethodInvokingBean.invokeWithTargetException(MethodInvokingBean.java:119)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.config.MethodInvokingFactoryBean.afterPropertiesSet(MethodInvokingFactoryBean.java:106)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1692)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1630)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:312)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:742)[spring-beans-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)[spring-context-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:543)[spring-context-4.3.17.RELEASE.jar:4.3.17.RELEASE]
at org.apache.xbean.spring.context.ResourceXmlApplicationContext.<init>(ResourceXmlApplicationContext.java:64)[xbean-spring-4.2.jar:4.2]
at org.apache.xbean.spring.context.ResourceXmlApplicationContext.<init>(ResourceXmlApplicationContext.java:52)[xbean-spring-4.2.jar:4.2]
at org.apache.activemq.xbean.XBeanBrokerFactory$1.<init>(XBeanBrokerFactory.java:104)[activemq-spring-5.15.4.jar:5.15.4]
at org.apache.activemq.xbean.XBeanBrokerFactory.createApplicationContext(XBeanBrokerFactory.java:104)[activemq-spring-5.15.4.jar:5.15.4]
at org.apache.activemq.xbean.XBeanBrokerFactory.createBroker(XBeanBrokerFactory.java:67)[activemq-spring-5.15.4.jar:5.15.4]
at org.apache.activemq.broker.BrokerFactory.createBroker(BrokerFactory.java:71)[activemq-broker-5.15.4.jar:5.15.4]
at org.apache.activemq.broker.BrokerFactory.createBroker(BrokerFactory.java:54)[activemq-broker-5.15.4.jar:5.15.4]
at org.apache.activemq.console.command.StartCommand.runTask(StartCommand.java:87)[activemq-console-5.15.4.jar:5.15.4]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.4.jar:5.15.4]
at org.apache.activemq.console.command.ShellCommand.runTask(ShellCommand.java:154)[activemq-console-5.15.4.jar:5.15.4]
at org.apache.activemq.console.command.AbstractCommand.execute(AbstractCommand.java:63)[activemq-console-5.15.4.jar:5.15.4]
at org.apache.activemq.console.command.ShellCommand.main(ShellCommand.java:104)[activemq-console-5.15.4.jar:5.15.4]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_171]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_171]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498)[:1.8.0_171]
at org.apache.activemq.console.Main.runTaskClass(Main.java:262)[activemq.jar:5.15.4]
at org.apache.activemq.console.Main.main(Main.java:115)[activemq.jar:5.15.4]
The ActiveMQ pod runs fine if we don't define it as a StatefulSet.
Below is the spec:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: activemq
  namespace: dev
  labels:
    app: activemq
spec:
  replicas: 1
  serviceName: activemq-svc
  selector:
    matchLabels:
      app: activemq
  template:
    metadata:
      labels:
        app: activemq
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
        runAsNonRoot: false
      containers:
      - name: activemq
        image: "mydocker/amq:latest"
        imagePullPolicy: "Always"
        ports:
        - containerPort: 61616
          name: port-61616
        - containerPort: 8161
          name: port-8161
        volumeMounts:
        - name: activemq-data
          mountPath: "/opt/activemq/data"
      restartPolicy: Always
      imagePullSecrets:
      - name: regsecret
      tolerations:
      - effect: NoExecute
        key: appstype
        operator: Equal
        value: ibd-mq
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: appstype
                operator: In
                values:
                - dev-mq
  volumeClaimTemplates:
  - metadata:
      name: activemq-data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: "gp2-us-east-2a"
      resources:
        requests:
          storage: 100Gi
WARN | Failed startup of context o.e.j.w.WebAppContext#478ee483{/admin,file:/opt/apache-activemq-5.15.4/webapps/admin/,null}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
Unless you altered the activemq userid in your image, that filesystem permission issue is caused by this stanza in your PodSpec:
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
    runAsNonRoot: false
failing to match up with the userid configuration in rmohr/activemq:5.15.4:
$ docker run -it --entrypoint=/bin/bash rmohr/activemq:5.15.4 -c 'id -a'
uid=999(activemq) gid=999(activemq) groups=999(activemq)
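One way to reconcile the two, assuming the stock image's uid/gid of 999 shown above, is to align the pod securityContext with it (or simply drop the runAsUser/fsGroup override):
      securityContext:
        runAsUser: 999
        fsGroup: 999
        runAsNonRoot: true
With matching ids, the mounted /opt/activemq/data volume and the directories owned by the activemq user inside the image stay writable for the broker process, which should avoid the "Parent for temp dir not configured correctly: writeable=false" failure from the admin webapp.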

OpenMapTiles docker doesn't start with previous configuration

I created an OpenMapTiles container:
- using a volume for the /data directory
- using the image klokantech/openmaptiles-server:1.6
The container started nicely. I downloaded the planet file, and the service was working fine.
As I am going to push this to production, if the container dies my orchestration system (Kubernetes) will restart it automatically, and I want it to pick up the previous configuration (so it doesn't need to download the planet file again or be reconfigured).
So I killed my container and restarted it using the same volume as before.
Problem: when my container was restarted, OpenMapTiles didn't have the previous configuration and I got this error in the UI:
OpenMapTiles Server is designed to work with data downloaded from OpenMapTiles.com, the following files are unknown and will not be used:
osm-2018-04-09-v3.8-planet.mbtiles
Also, in the logs, this appeared:
/usr/lib/python2.7/dist-packages/supervisor/options.py:298: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2018-05-09 09:20:18,359 CRIT Supervisor running as root (no user in config file)
2018-05-09 09:20:18,359 INFO Included extra file "/etc/supervisor/conf.d/openmaptiles.conf" during parsing
2018-05-09 09:20:18,382 INFO Creating socket tcp://localhost:8081
2018-05-09 09:20:18,383 INFO Closing socket tcp://localhost:8081
2018-05-09 09:20:18,399 INFO RPC interface 'supervisor' initialized
2018-05-09 09:20:18,399 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-05-09 09:20:18,399 INFO supervisord started with pid 1
2018-05-09 09:20:19,402 INFO spawned: 'wizard' with pid 11
2018-05-09 09:20:19,405 INFO spawned: 'xvfb' with pid 12
2018-05-09 09:20:20,407 INFO success: wizard entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2018-05-09 09:20:20,407 INFO success: xvfb entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Starting OpenMapTiles Map Server (action: run)
Existing configuration found in /data/config.json
Data file "undefined" not found!
Starting installation...
Installation wizard started at http://:::80/
List of available downloads ready.
And I guess maybe it's this "undefined" in the config that is causing problems:
Existing configuration found in /data/config.json
Data file "undefined" not found!
This is my config file:
root@maptiles-0:/# cat /data/config.json
{
  "styles": {
    "standard": [
      "dark-matter",
      "klokantech-basic",
      "osm-bright",
      "positron"
    ],
    "custom": [],
    "lang": "",
    "langLatin": true,
    "langAlts": true
  },
  "settings": {
    "serve": {
      "vector": true,
      "raster": true,
      "services": true,
      "static": true
    },
    "raster": {
      "format": "PNG_256",
      "hidpi": 2,
      "maxsize": 2048
    },
    "server": {
      "title": "",
      "redirect": "",
      "domains": []
    },
    "memcache": {
      "size": 23.5,
      "servers": [
        "localhost:11211"
      ]
    }
  }
}
Should I mount a new volume somewhere else? Should I change my /data/config.json? I have no idea how to make it safe for the container to be killed.
I fixed this using the image klokantech/tileserver-gl:v2.3.1. With this image, you can download the vector tiles in the form of an MBTiles file from OpenMapTiles Downloads.
You can find the instructions here: https://openmaptiles.org/docs/host/tileserver-gl/
Also: I deployed it to Kubernetes using the following StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    name: maptiles
  name: maptiles
spec:
  replicas: 2
  selector:
    matchLabels:
      name: maptiles
  serviceName: maptiles
  template:
    metadata:
      labels:
        name: maptiles
    spec:
      containers:
      - name: maptiles
        command: ["/bin/sh"]
        args:
        - -c
        - |
          echo "[INFO] Starting container"; if [ $(DOWNLOAD_MBTILES) = "true" ]; then
            echo "[INFO] Download MBTILES_PLANET_URL";
            rm /data/*
            cd /data/
            wget -q -c $(MBTILES_PLANET_URL)
            echo "[INFO] Download finished";
          fi; echo "[INFO] Start app in /usr/src/app"; cd /usr/src/app && npm install --production && /usr/src/app/run.sh;
        env:
        - name: MBTILES_PLANET_URL
          value: 'https://openmaptiles.com/download/W...'
        - name: DOWNLOAD_MBTILES
          value: 'true'
        livenessProbe:
          failureThreshold: 120
          httpGet:
            path: /health
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 120
          httpGet:
            path: /health
            port: 80
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 2Gi
        volumeMounts:
        - mountPath: /data
          name: maptiles
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: maptiles
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 60Gi
      storageClassName: standard
I first deploy it with DOWNLOAD_MBTILES='true' and afterwards change it to DOWNLOAD_MBTILES='false' (so it doesn't wipe the map the next time it is deployed).
I tested it, and with DOWNLOAD_MBTILES='false' you can kill the containers and they start again in a minute or so.
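For that second step, the variable can also be flipped without editing the manifest (StatefulSet name as in the manifest above):
kubectl set env statefulset/maptiles DOWNLOAD_MBTILES=false
Depending on the StatefulSet's update strategy you may still need to delete the pods once (e.g. kubectl delete pod maptiles-0 maptiles-1) so they restart with the new value; the downloaded MBTiles file stays on the persistent volume either way.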
