I am trying to run the Spark sample SparkPi Docker image on EKS. My Spark version is 3.0.
I created the spark service account and role binding. When I submit the job, I get the error below:
2020-07-05T12:19:40.862635502Z Exception in thread "main" java.io.IOException: failure to login
2020-07-05T12:19:40.862756537Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
2020-07-05T12:19:40.862772672Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.862777401Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.862788327Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.862792294Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.8628321Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.862836906Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.862907673Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.862917119Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.86294845Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.862964245Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.862979665Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863055425Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863060434Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863096062Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863103831Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863163804Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863168546Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863194449Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863218817Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863246594Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.863252341Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863277236Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863314173Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863319847Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863653699Z Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
2020-07-05T12:19:40.863660447Z at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
2020-07-05T12:19:40.863663683Z at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
2020-07-05T12:19:40.863667173Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-07-05T12:19:40.863670199Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-07-05T12:19:40.863673467Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-07-05T12:19:40.86367674Z at java.lang.reflect.Method.invoke(Method.java:498)
2020-07-05T12:19:40.863680205Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
2020-07-05T12:19:40.863683401Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.86368671Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863689794Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863693081Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863696183Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863698579Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863700844Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.863703393Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.86370659Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.863709809Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.863712847Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.863716102Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.863719273Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.86372651Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.863728947Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.863731207Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.863733458Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.863736237Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863738769Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863742105Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863745486Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863749154Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863752601Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863756118Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863759673Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863762774Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863765929Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.86376906Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863792673Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863797161Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863799703Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863802085Z
2020-07-05T12:19:40.863804184Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
2020-07-05T12:19:40.863806454Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.863808705Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863811134Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863815328Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863817575Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863819856Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863829171Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.86385963Z ... 24 more
My deployments are:
apiVersion: v1
kind: Namespace
metadata:
  name: helios
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: helios
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
  namespace: helios
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark
  namespace: helios
---
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-pi
  namespace: helios
spec:
  template:
    spec:
      containers:
      - name: spark-pi
        image: <registry>/spark-pi-3.0
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
          --master k8s://https://<EKS_API_SERVER> \
          --deploy-mode cluster \
          --name spark-pi \
          --class org.apache.spark.examples.SparkPi \
          --conf spark.kubernetes.namespace=helios \
          --conf spark.executor.instances=2 \
          --conf spark.executor.memory=2G \
          --conf spark.executor.cores=2 \
          --conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
          --conf spark.kubernetes.container.image.pullPolicy=Always \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
          --conf spark.jars.ivy=/tmp/.ivy \
          local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
        ]
      serviceAccountName: spark
      restartPolicy: Never
The Docker image is created using the OOTB Dockerfile provided in the Spark installation:
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
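I then tagged and pushed the image to my registry along these lines (the registry name is the same placeholder used in the manifest above):
docker tag spark:latest <registry>/spark-pi-3.0
docker push <registry>/spark-pi-3.0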
What am I doing wrong here? Please help.
SOLUTION
Finally it worked after I commented out the line below in the Dockerfile.
USER ${spark_uid}
Though the container now runs as root, at least it is working.
I had the same problem. I solved it by changing the k8s Job.
Hadoop is failing to find a username for the user. You can see the problem by running whoami in the container, which yields whoami: cannot find name for user ID 185. The Spark image's entrypoint.sh contains code that adds the user to /etc/passwd, which sets a username. However, command bypasses entrypoint.sh, so instead you should use args like so:
containers:
- name: spark-pi
  image: <registry>/spark-pi-3.0
  args: [
    "/bin/sh",
    "-c",
    "/opt/spark/bin/spark-submit \
    --master k8s://https://10.100.0.1:443 \
    --deploy-mode cluster ..."
  ]
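For reference, the stock kubernetes/dockerfiles/spark/entrypoint.sh does roughly the following (paraphrased, not the verbatim script), which is why running through it, rather than bypassing it with command, gives the process a passwd entry:
myuid=$(id -u)
mygid=$(id -g)
uidentry=$(getent passwd $myuid)
# if the container UID has no passwd entry, create one so whoami / Hadoop UGI can resolve a name
if [ -z "$uidentry" ] && [ -w /etc/passwd ]; then
    echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
fi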
Seems like you are missing the ServiceAccount/AWS role credentials so that your job can connect to the EKS cluster.
I recommend you set up fine-grained IAM roles for service accounts.
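If you use eksctl, the IAM role and the annotated service account can be created in one step; a sketch, where the cluster name and policy ARN are placeholders to adjust:
eksctl create iamserviceaccount \
  --cluster <my-cluster> \
  --namespace helios \
  --name spark \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve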
Basically, you would have something like this (after you set up the roles in AWS):
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
  name: spark
  namespace: helios
Then your job would look something like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-pi
  namespace: helios
spec:
  template:
    spec:
      containers:
      - name: spark-pi
        image: <registry>/spark-pi-3.0
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
          --master k8s://https://<EKS_API_SERVER> \
          --deploy-mode cluster \
          --name spark-pi \
          --class org.apache.spark.examples.SparkPi \
          --conf spark.kubernetes.namespace=helios \
          --conf spark.executor.instances=2 \
          --conf spark.executor.memory=2G \
          --conf spark.executor.cores=2 \
          --conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
          --conf spark.kubernetes.container.image.pullPolicy=Always \
          --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
          --conf spark.jars.ivy=/tmp/.ivy \
          local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
        ]
        env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        volumeMounts:
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
      serviceAccountName: spark
      restartPolicy: Never
I had the same problem. I solved it by adding the following to the submit container:
export SPARK_USER=spark3
without commenting out the USER ${spark_uid} line.
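In the Job from the question, that amounts to prefixing the spark-submit call, for example (a sketch):
command: ["/bin/sh", "-c", "export SPARK_USER=spark3 && /opt/spark/bin/spark-submit --master k8s://https://<EKS_API_SERVER> --deploy-mode cluster ..."]
Spark checks the SPARK_USER environment variable before falling back to the Hadoop UGI lookup, so the login call that throws here is skipped.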
I ran into the same issue and was able to resolve it by specifying runAsUser on the pod spec without having to modify the spark docker image.
securityContext:
  runAsUser: 65534
  runAsGroup: 65534
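In the submitter Job from the question, that might sit at the pod level, e.g. (a sketch; 65534 is the conventional nobody user, which already has an /etc/passwd entry in Debian-based images):
spec:
  template:
    spec:
      securityContext:
        runAsUser: 65534
        runAsGroup: 65534
      containers:
      - name: spark-pi
        image: <registry>/spark-pi-3.0
        ...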
I had the same issue and fixed it by adding the line
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
near the end of the Spark Dockerfile:
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
so the full Dockerfile looks like this
cat spark-3.2.0-bin-hadoop3.2/kubernetes/dockerfiles/spark/Dockerfile
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG ROOT_CONTAINER=ubuntu:focal
FROM ${ROOT_CONTAINER}
ARG openjdk_version="8"
ARG spark_uid=1000
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN apt-get update --yes && \
apt-get install --yes --no-install-recommends \
"openjdk-${openjdk_version}-jre-headless" \
ca-certificates-java
RUN apt-get install --yes software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y \
python3.7 \
python3-pip \
python3-distutils \
python3-setuptools
RUN pip install pyspark==3.2.0
RUN set -ex && \
sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
apt-get update && \
ln -s /lib /lib64 && \
export DEBIAN_FRONTEND=noninteractive && \
apt install -y -qq bash tini libc6 libpam-modules krb5-user libnss3 procps && \
mkdir -p /opt/spark && \
mkdir -p /opt/spark/examples && \
mkdir -p /opt/spark/work-dir && \
mkdir -p /etc/metrics/conf/ && \
mkdir -p /opt/hadoop/ && \
touch /opt/spark/RELEASE && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
apt-get clean && rm -rf /var/lib/apt/lists/* \
rm -rf /var/cache/apt/*
COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/prometheus.yaml /etc/metrics/conf/
ENV SPARK_HOME /opt/spark
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
RUN mkdir -p /opt/spark/logs && \
chown -R 1000:1000 /opt/spark/logs
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
RUN cat /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
Build the Spark Docker image:
sudo ./bin/docker-image-tool.sh -r <my_docker_repo>/spark-3.2.0-bin-hadoop3.2-gcs -t <tag_number> build
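Once built, the same helper can push the image (assuming you are already logged in to the registry):
sudo ./bin/docker-image-tool.sh -r <my_docker_repo>/spark-3.2.0-bin-hadoop3.2-gcs -t <tag_number> push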
Related
I need to mount an S3 bucket in a Kubernetes pod. I am using this guide to help me. It works perfectly; however, the pod is stuck indefinitely in the "Terminating" status when I issue the command to delete it, and I don't know why.
Here is the .yaml:
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  volumes:
  - name: mntdatas3fs
    emptyDir: {}
  - name: devfuse
    hostPath:
      path: /dev/fuse
  restartPolicy: Always
  containers:
  - image: nginx
    name: s3-test
    securityContext:
      privileged: true
    volumeMounts:
    - name: mntdatas3fs
      mountPath: /var/s3fs:shared
  - name: s3fs
    image: meain/s3-mounter
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    env:
    - name: S3_REGION
      value: "us-east-1"
    - name: S3_BUCKET
      value: "xxxxxxx"
    - name: AWS_KEY
      value: "xxxxxx"
    - name: AWS_SECRET_KEY
      value: "xxxxxx"
    volumeMounts:
    - name: devfuse
      mountPath: /dev/fuse
    - name: mntdatas3fs
      mountPath: /var/s3fs:shared
Here is the Dockerfile of meain/s3-mounter used by the s3fs container:
FROM alpine:3.3
ENV MNT_POINT /var/s3fs
ARG S3FS_VERSION=v1.86
RUN apk --update --no-cache add fuse alpine-sdk automake autoconf libxml2-dev fuse-dev curl-dev git bash; \
git clone https://github.com/s3fs-fuse/s3fs-fuse.git; \
cd s3fs-fuse; \
git checkout tags/${S3FS_VERSION}; \
./autogen.sh; \
./configure --prefix=/usr; \
make; \
make install; \
make clean; \
rm -rf /var/cache/apk/*; \
apk del git automake autoconf;
RUN mkdir -p "$MNT_POINT"
COPY run.sh run.sh
CMD ./run.sh
Here is the run.sh copied into the container:
#!/bin/sh
set -e
echo "$AWS_KEY:$AWS_SECRET_KEY" > passwd && chmod 600 passwd
s3fs "$S3_BUCKET" "$MNT_POINT" -o passwd_file=passwd && tail -f /dev/null
I had this exact problem with a very similar setup. s3fs mounts S3 at /var/s3fs, and the mount has to be unmounted before the pod can terminate cleanly. This is done with umount /var/s3fs. See https://manpages.ubuntu.com/manpages/xenial/man1/s3fs.1.html
So in your case, adding
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "umount /var/s3fs"]
Should fix it.
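In context, the hook goes on the s3fs container from the pod above, roughly like this (a sketch):
- name: s3fs
  image: meain/s3-mounter
  lifecycle:
    preStop:
      exec:
        command: ["sh", "-c", "umount /var/s3fs"]
  ...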
I'm trying to set up Ansible Molecule for testing roles on different OSes. For example, this role fails when it gets to the task that runs snap install core:
https://github.com/ProfessorManhattan/Ansible-Role-Snapd
molecule.yml:
---
dependency:
  name: galaxy
  options:
    role-file: requirements.yml
    requirements-file: requirements.yml
driver:
  name: docker
platforms:
  - name: Ubuntu-20.04
    image: professormanhattan/ansible-molecule-ubuntu2004
    command: /sbin/init
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
provisioner:
  name: ansible
  connection_options:
    ansible_connection: docker
    ansible_password: ansible
    ansible_ssh_user: ansible
  inventory:
    group_vars:
      all:
        molecule_test: true
  options:
    vvv: true
  playbooks:
    converge: converge.yml
verifier:
  name: ansible
scenario:
  create_sequence:
    - dependency
    - create
    - prepare
  check_sequence:
    - dependency
    - cleanup
    - destroy
    - create
    - prepare
    - converge
    - check
    - destroy
  converge_sequence:
    - dependency
    - create
    - prepare
    - converge
  destroy_sequence:
    - dependency
    - cleanup
    - destroy
  test_sequence:
    - lint
    - dependency
    - cleanup
    - destroy
    - syntax
    - create
    - prepare
    - converge
    - idempotence
    - side_effect
    - verify
    - cleanup
    - destroy
install-Debian.yml:
---
- name: Ensure snapd is installed
  apt:
    name: snapd
    state: present
    update_cache: true
- name: Ensure fuse filesystem is installed
  apt:
    name: fuse
    state: present
- name: Ensure snap is started and enabled on boot
  ansible.builtin.systemd:
    enabled: true
    name: snapd
    state: started
- name: Ensure snap core is installed # This task is failing
  community.general.snap:
    name: core
    state: present
The error I receive is:
TASK [professormanhattan.snapd : Ensure fuse filesystem is installed] **********
ok: [Ubuntu-20.04]
TASK [professormanhattan.snapd : Ensure snap is started and enabled on boot] ***
changed: [Ubuntu-20.04]
TASK [professormanhattan.snapd : Ensure snap core is installed] ****************
fatal: [Ubuntu-20.04]: FAILED! => {"changed": false, "channel": "stable", "classic": false, "cmd": "sh -c \"/usr/bin/snap install core\"", "msg": "Ooops! Snap installation failed while executing 'sh -c \"/usr/bin/snap install core\"', please examine logs and error output for more details.", "rc": 1, "stderr": "error: cannot perform the following tasks:\n- Setup snap \"core\" (10823) security profiles (cannot reload udev rules: exit status 1\nudev output:\nFailed to send reload request: No such file or directory\n)\n", "stderr_lines": ["error: cannot perform the following tasks:", "- Setup snap \"core\" (10823) security profiles (cannot reload udev rules: exit status 1", "udev output:", "Failed to send reload request: No such file or directory", ")"], "stdout": "", "stdout_lines": []}
The same is true for all the other operating systems I'm trying to test. Here's a link to the Dockerfile I'm using to build the Ubuntu image:
Dockerfile:
FROM ubuntu:20.04
LABEL maintainer="help@megabyte.space"
ENV container docker
ENV DEBIAN_FRONTEND noninteractive
# Source: https://github.com/ansible/molecule/issues/1104
RUN set -xe \
&& apt-get update \
&& apt-get install -y apt-utils \
&& apt-get upgrade -y \
&& apt-get install -y \
build-essential \
libyaml-dev \
python3-apt \
python3-dev \
python3-pip \
python3-setuptools \
python3-yaml \
software-properties-common \
sudo \
systemd \
systemd-sysv \
&& apt-get clean \
&& pip3 install \
ansible \
ansible-lint \
flake8 \
molecule \
yamllint \
&& mkdir -p /etc/ansible \
&& echo "[local]\nlocalhost ansible_connection=local" > /etc/ansible/hosts \
&& groupadd -r ansible \
&& useradd -m -g ansible ansible \
&& usermod -aG sudo ansible \
&& sed -i "/^%sudo/s/ALL\$/NOPASSWD:ALL/g" /etc/sudoers
VOLUME ["/sys/fs/cgroup", "/tmp", "/run"]
CMD ["/sbin/init"]
Looking for a geerlingguy.
I'm fairly new to Kubernetes and I'm trying to orchestrate my Rails app using minikube on my MacBook. My app includes MySQL, Redis and Sidekiq, and I'm running the webapp, Sidekiq, Redis and the database in separate pods. The Sidekiq pod is not connecting to the Redis pod.
kubectl logs of the Sidekiq pod says this:
2020-09-15T14:01:16.978Z 1 TID-gnaz4yzs0 INFO: Booting Sidekiq 4.2.10 with redis options {:url=>"redis://redis:6379/0"}
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: Running in ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
Error connecting to Redis on redis:6379 (Errno::ECONNREFUSED)
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:345:in `rescue in establish_connection'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:330:in `establish_connection'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:101:in `block in connect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:293:in `with_reconnect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:100:in `connect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:364:in `ensure_connected'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:221:in `block in process'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:306:in `logging'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:220:in `process'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:120:in `call'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:251:in `block in info'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:58:in `block in synchronize'
/usr/local/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:250:in `info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:113:in `block in redis_info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:95:in `block in redis'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:63:in `block (2 levels) in with'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:62:in `handle_interrupt'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:62:in `block in with'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:59:in `handle_interrupt'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:59:in `with'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:92:in `redis'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:106:in `redis_info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq/cli.rb:71:in `run'
/usr/local/bundle/gems/sidekiq-4.2.10/bin/sidekiq:12:in `<top (required)>'
/usr/local/bundle/bin/sidekiq:29:in `load'
/usr/local/bundle/bin/sidekiq:29:in `<main>'
My webapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checklist-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: railsapp
    spec:
      containers:
      - name: webapp
        image: masettyabhishek/checklist:latest
        command: ["rails", "s", "-p", "3001", "-b", "0.0.0.0", "-e", "PRODUCTION"]
        ports:
        - name: checklist-port
          containerPort: 3001
        env:
        - name: MYSQL_HOST
          value: database-service
        - name: MYSQL_USER
          value: root
        - name: MYSQL_PASSWORD
          value: Mission2019
        - name: MYSQL_DATABASE
          value: checklist
        - name: MYSQL_ROOT_PASSWORD
          value: Mission2019
        - name: REDIS_URL
          value: redis
        - name: REDIS_PORT
          value: "6379"
  selector:
    matchLabels:
      app: railsapp
webapp-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  ports:
  - port: 3001
    protocol: TCP
  type: NodePort
  selector:
    app: railsapp
sidekiq.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidekiq-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        instance: sidekiq
    spec:
      containers:
      - name: sidekiq
        image: masettyabhishek/checklist:latest
        command: ["sidekiq", "-C", "config/sidekiq.yml"]
        env:
        - name: MYSQL_HOST
          value: database-service
        - name: MYSQL_USER
          value: root
        - name: MYSQL_PASSWORD
          value: Mission2019
        - name: MYSQL_DATABASE
          value: checklist
        - name: MYSQL_ROOT_PASSWORD
          value: Mission2019
        - name: REDIS_URL
          value: redis
        - name: REDIS_PORT
          value: "6379"
        ports:
        - name: redis-port
          containerPort: 6379
  selector:
    matchLabels:
      instance: sidekiq
redis.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod
spec:
  containers:
  - name: redis
    image: redis:alpine
    command: ["redis-server"]
    ports:
    - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    name: redis-pod
    instance: sidekiq
    app: railsapp
  type: NodePort
  ports:
  - port: 6379
This is sidekiq.yml in my rails app
Sidekiq.configure_server do |config|
config.redis = { url: "redis://#{ENV['REDIS_URL']}:#{ENV['REDIS_PORT']}/0"}
end
Sidekiq.configure_client do |config|
config.redis = { url: "redis://#{ENV['REDIS_URL']}:#{ENV['REDIS_PORT']}/0"}
end
This is the Dockerfile, if that helps to answer the question.
FROM ubuntu:16.04
ENV RUBY_MAJOR="2.6" \
RUBY_VERSION="2.6.3" \
RUBYGEMS_VERSION="3.0.8" \
BUNDLER_VERSION="1.17.3" \
RAILS_VERSION="5.2.1" \
RAILS_ENV="production" \
GEM_HOME="/usr/local/bundle"
ENV BUNDLE_PATH="$GEM_HOME" \
BUNDLE_BIN="$GEM_HOME/bin" \
BUNDLE_SILENCE_ROOT_WARNING=1 \
BUNDLE_APP_CONFIG="$GEM_HOME"
ENV PATH="$BUNDLE_BIN:$GEM_HOME/bin:$GEM_HOME/gems/bin:$PATH"
USER root
RUN apt-get update && \
apt-get -y install sudo
RUN echo "%sudo ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers && \
addgroup --gid 1024 stars && \
useradd -G stars,sudo -d /home/user --shell /bin/bash -m user
RUN mkdir -p /usr/local/etc \
&& echo 'install: --no-document' >> /usr/local/etc/gemrc \
&& echo 'update: --no-document' >> /usr/local/etc/gemrc
USER user
RUN sudo apt-get -y install --no-install-recommends vim make gcc zlib1g-dev autoconf build-essential libssl-dev libsqlite3-dev \
curl htop unzip mc openssh-server openssl bison libgdbm-dev ruby git libmysqlclient-dev tzdata mysql-client
RUN sudo rm -rf /var/lib/apt/lists/* \
&& sudo curl -fSL -o ruby.tar.gz "http://cache.ruby-lang.org/pub/ruby/$RUBY_MAJOR/ruby-$RUBY_VERSION.tar.gz" \
&& sudo mkdir -p /usr/src/ruby \
&& sudo tar -xzf ruby.tar.gz -C /usr/src/ruby --strip-components=1 \
&& sudo rm ruby.tar.gz
USER root
RUN cd /usr/src/ruby \
&& { sudo echo '#define ENABLE_PATH_CHECK 0'; echo; cat file.c; } > file.c.new && mv file.c.new file.c \
&& autoconf \
&& ./configure --disable-install-doc
USER user
RUN cd /usr/src/ruby \
&& sudo make -j"$(nproc)" \
&& sudo make install \
&& sudo gem update --system $RUBYGEMS_VERSION \
&& sudo rm -r /usr/src/ruby
RUN sudo gem install bundler --version "$BUNDLER_VERSION"
RUN sudo mkdir -p "$GEM_HOME" "$BUNDLE_BIN" \
&& sudo chmod 777 "$GEM_HOME" "$BUNDLE_BIN" \
&& sudo gem install rails --version "$RAILS_VERSION"
RUN mkdir -p ~/.ssh && \
chmod 0700 ~/.ssh && \
ssh-keyscan github.com > ~/.ssh/known_hosts
ARG ssh_pub_key
ARG ssh_prv_key
RUN echo "$ssh_pub_key" > ~/.ssh/id_rsa.pub && \
echo "$ssh_prv_key" > ~/.ssh/id_rsa && \
chmod 600 ~/.ssh/id_rsa.pub && \
chmod 600 ~/.ssh/id_rsa
USER root
RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
RUN apt-get install -y nodejs
USER user
WORKDIR /data
RUN sudo mkdir /data/checklist
WORKDIR /data/checklist
ADD Gemfile Gemfile.lock ./
RUN sudo chown -R user /data/checklist
RUN bundle install
ADD . .
RUN sudo chown -R user /data/checklist
EXPOSE 3001
ENV RAILS_SERVE_STATIC_FILES true
ENV RAILS_LOG_TO_STDOUT true
RUN chmod +x ./config/docker/prepare-db.sh && sh ./config/docker/prepare-db.sh
ENTRYPOINT ["bundle", "exec"]
CMD ["sh", "./config/docker/startup.sh"]
kubectl describe svc redis
➜ checklist kubectl describe svc redis
Name: redis
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=railsapp,instance=sidekiq,name=redis-pod
Type: NodePort
IP: 10.103.6.43
Port: <unset> 6379/TCP
TargetPort: 6379/TCP
NodePort: <unset> 31886/TCP
Endpoints: <none>
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
As you can see, the Endpoints section of the redis service has no pod IPs, which is the reason for the connection refused error. The pod needs labels matching the selector of the service. Updating the redis pod with labels as below should solve the issue.
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod
  labels:
    instance: sidekiq
    app: railsapp
    name: redis-pod
spec:
  containers:
  - name: redis
    image: redis:alpine
    command: ["redis-server"]
    ports:
    - containerPort: 6379
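After relabelling, the service should pick up the pod; this can be verified without any further changes:
kubectl describe svc redis   # Endpoints should now show the pod IP
kubectl get pods -l name=redis-pod,instance=sidekiq,app=railsapp -o wide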
I have a container I'm deploying to Kubernetes (GKE). The image I built locally is good and runs as expected, but when the run command is changed to pwd && ls, the image being pulled from Google Container Registry returns the output shown here:
I 2020-06-17T16:24:54.222382706Z /app
I 2020-06-17T16:24:54.226108583Z lost+found
I 2020-06-17T16:24:54.226143620Z package-lock.json
and the output of the same commands when running the container locally with docker run -it <container:tag> bash is this:
#${API_CONTAINER} resolves to gcr.io/<project>/container: I.E. tag gets appended
.../# docker run -it ${API_CONTAINER}latest bash
root@362737147de4:/app# pwd
/app
root@362737147de4:/app# ls
Dockerfile dist files node_modules package.json ssh.bat stop_forever.bat test tsconfig.json
cloudbuild.yaml environments log package-lock.json src startApi.sh swagger.json test.pdf tsconfig.test.json
root@362737147de4:/app#
My thoughts on this start with: either the push to the registry is failing, or I'm not pulling the right image, i.e. pulling some old latest tag that was built by Cloud Build in a previous attempt to get this going.
What could be the potential issue? what could potentially fix this issue?
Edit: After using differing tags in deployment, using --no-cache during build, and pulling from the registry on another machine, my inclination is that GKE is having an issue pulling the image from GCR. Is there a way I can put this somewhere else, or get visibility on what's going on with the pull?
EDIT 2:
So yes, I have a Dockerfile I can share, but please be aware that I have inherited it and don't understand the process that went into building it, or why some steps were necessary to the other developer. (I am definitely interested in refactoring this as much as possible.)
FROM node:8.12.0
RUN mkdir /app
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
RUN apt-get update && apt-get install snmp -y
RUN npm install --unsafe-perm=true
RUN apt-get update \
&& apt-get install -y \
gconf-service \
libasound2 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libc6 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libexpat1 \
libfontconfig1 \
libgcc1 \
libgconf-2-4 \
libgdk-pixbuf2.0-0 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libstdc++6 \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
ca-certificates \
fonts-liberation \
libappindicator1 \
libnss3 \
lsb-release \
xdg-utils \
wget
COPY . /app
# Installing puppeteer and chromium for generating PDF of the invoices.
# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg libpam-cracklib \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Uncomment to skip the chromium download when installing puppeteer. If you do,
# you'll need to launch puppeteer with:
# browser.launch({executablePath: 'google-chrome-unstable'})
# ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# Install puppeteer so it's available in the container.
RUN npm i puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /app/node_modules
#build the api, and move into place.... framework options are limited with the build.
RUN npm i puppeteer kiwi-server-cli && kc build -e prod
RUN rm -Rf ./environments & rm -Rf ./src && cp -R ./dist/prod/* .
# Run everything after as non-privileged user.
# USER pptruser
CMD ["google-chrome-unstable"] # I have tried adding this here as well "&&", "node", "src/server.js"
For pushing the image I'm using this command:
docker push gcr.io/<projectid>/api:latest-<version> and I have the credentials set up with gcloud auth configure-docker. Here's a sanitized version of the YAML manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert -f ./docker-compose.yml
    kompose.version: 1.21.0 ()
  creationTimestamp: null
  labels:
    io.kompose.service: api
  name: api
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: api
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert -f ./docker-compose.yml
        kompose.version: 1.21.0 ()
      creationTimestamp: null
      labels:
        io.kompose.service: api
    spec:
      containers:
      - args:
        - bash
        - -c
        - node src/server.js
        env:
        - name: NODE_ENV
          value: production
        - name: TZ
          value: America/New_York
        image: gcr.io/<projectId>/api:latest-0.0.9
        imagePullPolicy: Always
        name: api
        ports:
        - containerPort: 8087
        resources: {}
        volumeMounts:
        - mountPath: /app
          name: api-claim0
        - mountPath: /files
          name: api-claim1
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
      - name: api-claim0
        persistentVolumeClaim:
          claimName: api-claim0
      - name: api-claim1
        persistentVolumeClaim:
          claimName: api-claim1
status: {}
The solution comes from the original intent of the docker-compose.yml file which was converted into a kubernetes manifest via a tool called kompose. The original docker-compose file was intended for development and as such had overrides in place to push the local development environment into the running container.
This was because of this in the yml file:
services:
  api:
    build: ./api
    volumes:
      - ./api:/app
      - ./api/files:/files
which translates to this in the Kubernetes manifest:
volumeMounts:
- mountPath: /app
  name: api-claim0
- mountPath: /files
  name: api-claim1
volumes:
- name: api-claim0
  persistentVolumeClaim:
    claimName: api-claim0
- name: api-claim1
  persistentVolumeClaim:
    claimName: api-claim1
Kubernetes has no files to supply for these claims, so the app directory is essentially overwritten with an empty volume and the files are not found.
Removing those directives from the Kubernetes manifest resulted in success.
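With the claims removed, the container serves the files baked into the image, which can be confirmed with something like:
kubectl exec -it <api-pod-name> -- ls /app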
Reminder to us all to be mindful.
Managing images [1] includes listing images in a repository, adding tags, deleting tags, copying images to a new repository, and deleting images. I hope the troubleshooting documents [2] are helpful for diagnosing the issue.
[1] https://cloud.google.com/container-registry/docs/managing
[2] https://cloud.google.com/container-registry/docs/troubleshooting
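In practice, two concrete checks along those lines (placeholders as in the question): list what is actually in the registry, and see exactly which image and digest the pod pulled:
gcloud container images list-tags gcr.io/<projectid>/api
kubectl describe pod <api-pod-name> | grep -i image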
I run Jenkins as a Docker container built from a custom Docker image, starting it like this:
docker run -dt --privileged -p 8080:8080 -p 80:80 -p 443:443 -p 40440:40440 \
--name jenkins-master-$(hostname) \
--restart unless-stopped \
-h jenkins-master-$(hostname) \
-v $LOCAL_JENKINS_ROOT_DIR/ssl:/etc/nginx/ssl \
-v $LOCAL_JENKINS_ROOT_DIR/users:/var/jenkins/users \
-v $LOCAL_JENKINS_ROOT_DIR/jobs:/var/jenkins/jobs \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /var/log/jenkins:/var/log/jenkins \
[artifactory-url-here]/jenkins-master:prod
Lately, I've noticed that my container randomly restarts throughout the day, and using docker events I discovered that every time it restarted automatically there was a quadruple of events like this:
2020-02-28T12:55:08.022251372+01:00 container die 87c7c5601bdee803072d2c8fe9405cb56b765ba6b1298461fbe17304979d7c2a (exitCode=0, image=ulcentral.ullink.lan:5001/ul-jenkins-master:prod, name=jenkins-master-cos-jenkins-prd-qEvu, version=1.0+beta.2)
2020-02-28T12:55:08.278391656+01:00 network disconnect c5afeafee778f463af73e5b7cc305cb90f0a6d6c0d81e8814cd43e5d73bac24e (container=87c7c5601bdee803072d2c8fe9405cb56b765ba6b1298461fbe17304979d7c2a, name=bridge, type=bridge)
2020-02-28T12:55:08.426582932+01:00 network connect c5afeafee778f463af73e5b7cc305cb90f0a6d6c0d81e8814cd43e5d73bac24e (container=87c7c5601bdee803072d2c8fe9405cb56b765ba6b1298461fbe17304979d7c2a, name=bridge, type=bridge)
2020-02-28T12:55:08.993064785+01:00 container start 87c7c5601bdee803072d2c8fe9405cb56b765ba6b1298461fbe17304979d7c2a (image=ulcentral.ullink.lan:5001/ul-jenkins-master:prod, name=jenkins-master-cos-jenkins-prd-qEvu, version=1.0+beta.2)
The exit code is 0. I need more info on how to debug the underlying reason the container stops.
Docker image (some sections omitted due to privacy reasons):
FROM debian:stretch
LABEL version 1.0+beta.2
ENV JENKINS_HOME /var/jenkins
ENV DEBIAN_FRONTEND noninteractive
RUN echo "moonlight:/share/dev-common/Applications/x86-64/linux /mnt/applis nfs defaults 0 0" >> /etc/fstab && \
echo "moonlight:/share/home /home nfs defaults 0 0" >> /etc/fstab && \
echo "sharing:/mnt/samba/share /mnt/share nfs defaults 0 0" >> /etc/fstab
# Global config
# FIXME: nfs mounting hangs forever, so no path, etc...
RUN echo "nslcd nslcd/ldap-base string dc=openldap,dc=ullink,dc=lan" | debconf-set-selections && \
echo "nslcd nslcd/ldap-uris string ldap://ldap" | debconf-set-selections && \
echo "libnss-ldapd:amd64 libnss-ldapd/nsswitch multiselect group, passwd, shadow" | debconf-set-selections
RUN apt-get upgrade -y && apt-get update
RUN apt-get -y install \
git \
libnss-ldapd \
libpam-ldapd \
locales \
maven \
nfs-common \
ntp \
openjdk-8-jdk \
openssh-server \
python2.7 \
sudo \
supervisor \
unzip \
vim \
wget \
ca-certificates \
nginx \
--no-install-recommends
RUN wget https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py && \
pip install pywinrm
RUN sed 's/#PermitRootLogin yes/PermitRootLogin yes/' -i /etc/ssh/sshd_config && \
sed 's/# fr_FR.UTF-8/fr_FR.UTF-8/' -i /etc/locale.gen && \
sed 's/# en_US.UTF-8/en_US.UTF-8/' -i /etc/locale.gen && \
locale-gen && \
update-locale LANG=en_US.UTF-8 && \
echo 'Europe/Paris' > /etc/timezone && \
cp /usr/share/zoneinfo/Europe/Paris /etc/localtime
RUN mkdir -p /var/run/sshd \
/var/log/supervisor \
/var/log/jenkins \
/mnt/applis \
/mnt/share \
$JENKINS_HOME/plugins
# until https://github.com/jenkinsci/jenkins/pull/3293 is merged we use a custom Jenkins build
# For some reason wget doesn't recognize the GoDaddy certs, even with `ca-certificates` being installed
RUN wget --no-proxy --no-check-certificate https://ulcentral.itiviti.com/artifactory/ext-release-local/org/jenkins-ci/main/jenkins-war/2.187-ullink/jenkins-war-2.187-ullink.war \
-O $JENKINS_HOME/jenkins.war
COPY packaged/install_plugin.sh $JENKINS_HOME/install_plugin.sh
RUN JENKINS_HOME=$JENKINS_HOME \
$JENKINS_HOME/install_plugin.sh \
[A LIST OF JENKINS PLUGINS]
COPY packaged/bootstrap.sh /var/bootstrap.sh
COPY packaged/subversion_servers /root/.subversion/servers
ADD packaged/ssh $JENKINS_HOME/ssh
COPY packaged/proxy.xml $JENKINS_HOME/proxy.xml
COPY packaged/commit-jenkins-config.sh /usr/bin/commit-jenkins-config.sh
COPY packaged/ntp.conf /etc/ntp.conf
COPY packaged/default.conf /etc/nginx/sites-available/default
RUN chmod +x /var/bootstrap.sh
RUN chmod +x /usr/bin/commit-jenkins-config.sh
EXPOSE 8080
EXPOSE 443
# For jnlp agents
EXPOSE 40440
WORKDIR $JENKINS_HOME
CMD ["/var/bootstrap.sh"]
# OUTPUT OF DOCKER INFO:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 57
Server Version: 18.09.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.639GiB
Name: cos-jenkins-prd
ID: NTNU:KI65:HCY3:5EG4:AGN5:NUGB:HS7U:I75I:LSVE:EEBN:ZY7D:HU3M
Docker Root Dir: /opt/docker/lib
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: http://proxy.ullink.lan:9876/
HTTPS Proxy: http://proxy.ullink.lan:9876/
No Proxy: ulcentral,.ullink.lan,localhost,127.0.0.1
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
localhost:5001
ulcentral.ullink.lan:5001
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
EDIT
Contents of the bootstrap.sh file (used to start the jenkins process):
#!/bin/bash
export http_proxy https_proxy no_proxy
systemctl start commit-jenkins-config.timer
nslcd
rpcbind
service ntp start
service nginx start
$(which sshd)
cd $JENKINS_HOME && git pull
java -XX:+ExitOnOutOfMemoryError -Dhttp.proxyHost=proxy.ullink.lan -Dhttp.proxyPort=9876 -Dhttps.proxyHost=proxy.ullink.lan -Dhttps.proxyPort=9876 -Dhttp.nonProxyHosts="LIST OF HOSTS HERE" -Dhudson.remoting.ExportTable.unexportLogSize=0 -Dhudson.model.ParametersAction.keepUndefinedParameters=false -Dhudson.model.DirectoryBrowserSupport.CSP="" -jar jenkins.war -httpPort=8080 --sessionTimeout=10080 --httpKeepAliveTimeout=60000 2>&1 | tee /var/log/jenkins/jenkins.log
docker logs {containername} - shows what was going on inside the container.
or
kubectl logs {containername} - shows you the logs if you are using Kubernetes.
For me it shows logs even if the container was stopped (restarted), which is very useful for seeing the reason why.
You also need to make sure you output as much as you can inside the container - if you are running some application inside, write any errors to the console.
But in general I'd bet this is happening because of a lack of memory - you can try increasing the amount of memory available to the container. (It depends on how you are running the container; I would definitely recommend Kubernetes for production - far more control over everything.)
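If the memory theory needs checking first, the container state can be inspected without changing anything (container name as in the question):
docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} restarts={{.RestartCount}}' jenkins-master-$(hostname)
docker logs --tail 200 jenkins-master-$(hostname)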