Filebeat is not sending logs to Logstash on Kubernetes (Docker)
I'm trying to ship Kubernetes logs with Filebeat and Logstash. I have several deployments running in the same namespace.
I tried the filebeat.yml configuration suggested by Elastic in this [link](https://raw.githubusercontent.com/elastic/beats/7.x/deploy/kubernetes/filebeat-kubernetes.yaml).
So, this is my overall configuration:
filebeat.yml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*.log'
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"

# To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
#filebeat.autodiscover:
#  providers:
#    - type: kubernetes
#      node: ${NODE_NAME}
#      hints.enabled: true
#      hints.default_config:
#        type: container
#        paths:
#          - /var/log/containers/*${data.kubernetes.container.id}.log

output.logstash:
  hosts: ['logstash.default.svc.cluster.local:5044']
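To sanity-check this output section, one option is to exec into a running Filebeat pod and use Filebeat's built-in connection test against the configured Logstash host. This is only a sketch: the pod name is taken from the logs further down, and it assumes the config is mounted at /etc/filebeat.yml and the DaemonSet runs in kube-system, as in the upstream manifest.

kubectl exec -n kube-system filebeat-hvqx4 -- filebeat test output -c /etc/filebeat.yml

The command reports whether the DNS lookup of logstash.default.svc.cluster.local and the TCP connection to port 5044 succeed, which separates name-resolution and networking problems from Filebeat input problems.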
Logstash Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:7.15.0
        ports:
        - containerPort: 5044
        volumeMounts:
        - name: config-volume
          mountPath: /usr/share/logstash/config
        - name: logstash-pipeline-volume
          mountPath: /usr/share/logstash/pipeline
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.yml
            path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.conf
            path: logstash.conf
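One thing worth noting here: mounting a ConfigMap over the whole /usr/share/logstash/config directory hides the default files the image ships there (jvm.options, log4j2.properties, pipelines.yml), which is consistent with the warnings that appear later in the Logstash logs. A quick way to see what actually ends up in the mounted directories, using the names from the manifest above:

kubectl exec -n default deploy/logstash-deployment -- ls /usr/share/logstash/config
kubectl exec -n default deploy/logstash-deployment -- ls /usr/share/logstash/pipeline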
Logstash ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
  namespace: default
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      tcp {
        mode => "client"
        host => "10.184.0.4"
        port => 5001
        codec => "json_lines"
      }
      stdout {
        codec => rubydebug
      }
    }
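Because the pipeline also has a stdout { codec => rubydebug } output, every event that reaches Logstash from Beats is printed to the container's stdout. Tailing the Deployment's logs is therefore an easy way to confirm whether events are arriving at all (a sketch using the names from the manifests above):

kubectl logs -n default deploy/logstash-deployment -f

If Filebeat is publishing, decoded events show up here; if nothing appears while Filebeat is running, the problem is upstream of Logstash.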
Logstash Service
kind: Service
apiVersion: v1
metadata:
  name: logstash
  namespace: default
spec:
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
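A quick check on the Service itself is whether its selector matches the Logstash pod and therefore has endpoints behind it; if the endpoints list is empty, the DNS name still resolves but connections to port 5044 fail. The commands below assume the default namespace used throughout:

kubectl get svc logstash -n default
kubectl get endpoints logstash -n default

The ENDPOINTS column should show the Logstash pod IP with port 5044.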
The Filebeat DaemonSet and the Logstash Deployment are both running. Here is what kubectl logs shows for each of them.
Filebeat DaemonSet logs:
2021-10-13T04:10:14.201Z INFO instance/beat.go:665 Home path: [/usr/share/filebeat] Config path: [/usr/share/filebeat] Data path: [/usr/share/filebeat/data] Logs path: [/usr/share/filebeat/logs]
2021-10-13T04:10:14.219Z INFO instance/beat.go:673 Beat ID: b90d1561-e989-4ed1-88f9-9b88045cee29
2021-10-13T04:10:14.220Z INFO [seccomp] seccomp/seccomp.go:124 Syscall filter successfully installed
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1014 Beat info {"system_info": {"beat": {"path": {"config": "/usr/share/filebeat", "data": "/usr/share/filebeat/data", "home": "/usr/share/filebeat", "logs": "/usr/share/filebeat/logs"}, "type": "filebeat", "uuid": "b90d1561-e989-4ed1-88f9-9b88045cee29"}}}
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1023 Build info {"system_info": {"build": {"commit": "9023152025ec6251bc6b6c38009b309157f10f17", "libbeat": "7.15.0", "time": "2021-09-16T03:16:09.000Z", "version": "7.15.0"}}}
2021-10-13T04:10:14.220Z INFO [beat] instance/beat.go:1026 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":2,"version":"go1.16.6"}}}
2021-10-13T04:10:14.221Z INFO [beat] instance/beat.go:1030 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2021-10-06T19:41:55Z","containerized":true,"name":"filebeat-hvqx4","ip":["127.0.0.1/8","10.116.6.42/24"],"kernel_version":"5.4.120+","mac":["ae:ab:28:37:27:2a"],"os":{"type":"linux","family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":9,"patch":2009,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0,"id":"38c2fd0d69ba05ae64d8a4d4fc156791"}}}
2021-10-13T04:10:14.221Z INFO [beat] instance/beat.go:1059 Process info {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/filebeat", "exe": "/usr/share/filebeat/filebeat", "name": "filebeat", "pid": 8, "ppid": 1, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2021-10-13T04:10:12.819Z"}}}
2021-10-13T04:10:14.221Z INFO instance/beat.go:309 Setup Beat: filebeat; Version: 7.15.0
2021-10-13T04:10:14.222Z INFO [publisher] pipeline/module.go:113 Beat name: filebeat-hvqx4
2021-10-13T04:10:14.224Z WARN beater/filebeat.go:178 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-10-13T04:10:14.225Z INFO [monitoring] log/log.go:142 Starting metrics logging every 30s
2021-10-13T04:10:14.225Z INFO instance/beat.go:473 filebeat start running.
2021-10-13T04:10:14.227Z INFO memlog/store.go:119 Loading data file of '/usr/share/filebeat/data/registry/filebeat' succeeded. Active transaction id=0
2021-10-13T04:10:14.227Z INFO memlog/store.go:124 Finished loading transaction log file for '/usr/share/filebeat/data/registry/filebeat'. Active transaction id=0
2021-10-13T04:10:14.227Z WARN beater/filebeat.go:381 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:109 States Loaded from registrar: 0
2021-10-13T04:10:14.228Z INFO [crawler] beater/crawler.go:71 Loading Inputs: 1
2021-10-13T04:10:14.228Z INFO beater/crawler.go:148 Stopping Crawler
2021-10-13T04:10:14.228Z INFO beater/crawler.go:158 Stopping 0 inputs
2021-10-13T04:10:14.228Z INFO beater/crawler.go:178 Crawler stopped
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:132 Stopping Registrar
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:166 Ending Registrar
2021-10-13T04:10:14.228Z INFO [registrar] registrar/registrar.go:137 Registrar stopped
2021-10-13T04:10:44.229Z INFO [monitoring] log/log.go:184 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cgroup":{"cpu":{"cfs":{"period":{"us":100000}},"id":"/"},"cpuacct":{"id":"/","total":{"ns":307409530}},"memory":{"id":"/","mem":{"limit":{"bytes":209715200},"usage":{"bytes":52973568}}}},"cpu":{"system":{"ticks":80,"time":{"ms":85}},"total":{"ticks":270,"time":{"ms":283},"value":270},"user":{"ticks":190,"time":{"ms":198}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":9},"info":{"ephemeral_id":"f5abb082-a094-4f99-a046-bc183d415455","uptime":{"ms":30208},"version":"7.15.0"},"memstats":{"gc_next":19502448,"memory_alloc":10052000,"memory_sys":75056136,"memory_total":55390312,"rss":112922624},"runtime":{"goroutines":12}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"active":0},"type":"logstash"},"pipeline":{"clients":0,"events":{"active":0},"queue":{"max_events":4096}}},"registrar":{"states":{"current":0}},"system":{"cpu":{"cores":2},"load":{"1":0.14,"15":0.28,"5":0.31,"norm":{"1":0.07,"15":0.14,"5":0.155}}}}}}
Logstash Deployment logs:
Using bundled JDK: /usr/share/logstash/jdk
warning: no jvm.options file found
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2021-10-13 08:46:58.674 [main] runner - Starting Logstash {"logstash.version"=>"7.15.0", "jruby.version"=>"jruby 9.2.19.0 (2.5.8) 2021-06-15 55810c552b OpenJDK 64-Bit Server VM 11.0.11+9 on 11.0.11+9 +jit [linux-x86_64]"}
[INFO ] 2021-10-13 08:46:58.698 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2021-10-13 08:46:58.700 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2021-10-13 08:46:59.077 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2021-10-13 08:46:59.097 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"7a0e5b89-70a1-4004-b38e-c31fadcd7251", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2021-10-13 08:47:00.950 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[INFO ] 2021-10-13 08:47:01.468 [Converge PipelineAction::Create<main>] Reflections - Reflections took 203 ms to scan 1 urls, producing 120 keys and 417 values
[WARN ] 2021-10-13 08:47:02.496 [Converge PipelineAction::Create<main>] plain - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[WARN ] 2021-10-13 08:47:02.526 [Converge PipelineAction::Create<main>] beats - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[WARN ] 2021-10-13 08:47:02.664 [Converge PipelineAction::Create<main>] jsonlines - Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[INFO ] 2021-10-13 08:47:02.947 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x3b822f13#/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:125 run>"}
[INFO ] 2021-10-13 08:47:05.467 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.52}
[INFO ] 2021-10-13 08:47:05.473 [[main]-pipeline-manager] beats - Starting input listener {:address=>"0.0.0.0:5044"}
[INFO ] 2021-10-13 08:47:05.555 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2021-10-13 08:47:05.588 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2021-10-13 08:47:05.907 [[main]<beats] Server - Starting server on port: 5044
So, my questions are:
1. Why is Filebeat not ingesting the logs from Kubernetes?
2. Are there different ways to specify the Logstash hosts in filebeat.yml? Some examples use the full service DNS name, as in my configuration, while others use just the service name.
3. How can I trigger/test logs to make sure my configuration is working correctly?
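For the last question, one way to generate test traffic end to end is to run a throwaway pod that writes to stdout and then watch whether its lines surface in the Logstash rubydebug output. This is only a sketch: log-generator is an arbitrary pod name, and the busybox image is used purely to echo log lines.

kubectl run log-generator --image=busybox --restart=Never -- sh -c 'while true; do echo "filebeat-test $(date)"; sleep 5; done'
kubectl logs -n default deploy/logstash-deployment -f | grep filebeat-test
kubectl delete pod log-generator

If the pipeline is healthy, the filebeat-test lines should appear in the Logstash output shortly after the pod starts.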
My mistake: in the Filebeat DaemonSet environment I had missed setting the NODE_NAME variable. So, to the configuration above I just added:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
and Filebeat is running well now.
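For reference, in the upstream filebeat-kubernetes.yaml this variable sits in the DaemonSet's container env list, roughly like this (a sketch trimmed to the relevant part; the image tag matches the 7.15.0 version used above):

containers:
- name: filebeat
  image: docker.elastic.co/beats/filebeat:7.15.0
  args: ["-c", "/etc/filebeat.yml", "-e"]
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Without NODE_NAME, the ${NODE_NAME} reference in the add_kubernetes_metadata processor has nothing to resolve to, which is presumably why the crawler in the Filebeat logs above started loading its input and then stopped immediately.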