mkdir: cannot create directory '/var/lib/rabbitmq/mnesia': Permission denied - docker

When I use a Dockerfile to build a RabbitMQ image, it runs well with Docker and Kubernetes, but when I add a PVC to it, the container goes into CrashLoopBackOff.
The error is:
The following plugins have been configured:
rabbitmq_management
rabbitmq_management_agent
rabbitmq_web_dispatch
Applying plugin configuration to rabbit@rongqiyun-dev-base-qqqqqqqq-0...
The following plugins have been enabled:
rabbitmq_management
rabbitmq_management_agent
rabbitmq_web_dispatch
set 3 plugins.
Offline change; changes will take effect at broker restart.
mkdir: cannot create directory '/var/lib/rabbitmq/mnesia': Permission denied
Failed to create directory: /var/lib/rabbitmq/mnesia
Here is my Dockerfile:
FROM hub.gcloud.lab/library/centos:7.4.1708
WORKDIR /root
RUN groupadd rabbitmq
RUN useradd -g rabbitmq rabbitmq
RUN mkdir -p /var/lib/rabbitmq/mnesia && \
    chown -R rabbitmq:rabbitmq /var/lib/rabbitmq && \
    chown -R rabbitmq:rabbitmq /var/lib/rabbitmq/mnesia
RUN yum install -y epel-release
RUN yum install -y deltarpm gcc glibc-devel make ncurses-devel openssl-devel xmlto perl wget xz lsof dos2unix unixODBC unixODBC-devel wxBase wxGTK SDL wxGTK-gl socat git
RUN yum clean all
RUN wget https://packages.erlang-solutions.com/erlang-solutions-1.0-1.noarch.rpm
RUN rpm -Uvh erlang-solutions-1.0-1.noarch.rpm
RUN yum install -y erlang
RUN yum install -y initscripts logrotate
RUN wget https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.8/rabbitmq-server-3.7.8-1.el6.noarch.rpm
RUN rpm -ivh rabbitmq-server-3.7.8-1.el6.noarch.rpm
ENTRYPOINT rabbitmq-plugins enable rabbitmq_management && rabbitmq-server
EXPOSE 5672
EXPOSE 15672
CMD ["rabbitmq-server"]
And this is my StatefulSet. Before I add rabbitmq-persistent-storage, it starts normally; when I add it, the container can't start:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rongqiyun-dev-base-qqqqqqqq
  namespace: rongqiyun-dev
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rongqiyun-dev
      ns-baseServiceName: rongqiyun-dev-base-qqqqqqqq
  serviceName: qqqqqqqq
  template:
    metadata:
      labels:
        app: rongqiyun-dev
        ns-baseServiceName: rongqiyun-dev-base-qqqqqqqq
    spec:
      containers:
      - env:
        - name: RABBITMQ_DEFAULT_PASS
          value: "12345"
        image: hub.gcloud.lab/library/rabbitmq:3.7
        imagePullPolicy: Always
        name: qqqqqqqq
        ports:
        - containerPort: 15672
          protocol: TCP
        - containerPort: 5672
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 800Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: rabbitmq-persistent-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 0
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: rabbitmq-persistent-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: rook-ceph-block

try:
chmod 777 /var/lib/rabbitmq/mnesia
The mount hides the ownership you set at build time: the PVC arrives root-owned, so the rabbitmq user cannot create the mnesia directory inside it, and permissions have to be fixed at runtime, after the volume is mounted.
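A minimal sketch of one runtime fix (not from the original answer): an init container that shares the PVC and chowns it before the broker starts. It reuses the image above so the rabbitmq user exists, and runs as root because the image sets no USER; it goes next to containers: in the pod template spec:
      initContainers:
      - name: fix-rabbitmq-perms
        image: hub.gcloud.lab/library/rabbitmq:3.7
        # runs before the broker container, so the chown is in place when rabbitmq starts
        command: ["sh", "-c", "mkdir -p /var/lib/rabbitmq/mnesia && chown -R rabbitmq:rabbitmq /var/lib/rabbitmq"]
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: rabbitmq-persistent-storage
Alternatively, since the StatefulSet already sets securityContext.fsGroup, pointing fsGroup at the rabbitmq group's GID instead of 0 makes Kubernetes apply group ownership to the volume on mount, with no extra container.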

Related

Jenkins Container - Docker installation - docker daemon not working

I am trying to install Docker within Jenkins docker container.
Here is my Dockerfile.
FROM jenkinsci/jenkins:lts
USER root
RUN apt-get update -qq \
    && apt-get install -qqy apt-transport-https ca-certificates curl gnupg2 software-properties-common
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -
RUN add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/debian \
    $(lsb_release -cs) \
    stable"
RUN apt-get update -qq \
    && apt-get install docker-ce=17.12.1~ce-0~debian -y
RUN usermod -aG docker jenkins && docker images
When the docker images command at the end of that RUN line fires, it throws this error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Any suggestions on how to make Docker run?
(not important, FYI):
Finally I am running it from kubernetes which works fine without Docker but I need the docker images to be build from jenkins ci, hence it is needed.
Jenkins yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
  labels:
    app: jenkins
spec:
  selector:
    matchLabels:
      app: jenkins
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      serviceAccountName: jenkins
      containers:
      - name: jenkins
        image: europe-west1-docker.pkg.dev/xxx/xxxxx-custom-jenkins/jenkins:v5
        imagePullPolicy: IfNotPresent
        env:
        - name: JAVA_OPTS
          value: -Xmx2048m -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85
        ports:
        - containerPort: 8080
          protocol: TCP
        - containerPort: 50000
          protocol: TCP
        volumeMounts:
        - mountPath: /var/jenkins_home
          name: jenkins
      restartPolicy: Always
      securityContext:
        runAsUser: 0
      terminationGracePeriodSeconds: 30
      volumes:
      - name: jenkins
        persistentVolumeClaim:
          claimName: pvc-jenkins
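As for the build failure itself: RUN steps execute while the image is being built, when no Docker daemon is available, so docker images can never succeed inside docker build. At runtime, one common approach (a sketch under the assumption that the node's container runtime is Docker and exposes /var/run/docker.sock) is to mount the host's Docker socket into the pod:
      containers:
      - name: jenkins
        # ... image, env, ports as above ...
        volumeMounts:
        - mountPath: /var/jenkins_home
          name: jenkins
        - mountPath: /var/run/docker.sock   # docker CLI in the pod talks to the node's daemon
          name: docker-sock
      volumes:
      - name: jenkins
        persistentVolumeClaim:
          claimName: pvc-jenkins
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
          type: Socket
Dropping && docker images from the Dockerfile then lets the build complete, and the CLI is exercised at runtime instead. Note this only works on nodes that actually run dockerd, not on containerd-only nodes.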

How to use the --tmpfs mount in the Kubernetes YAML file?

I am trying to use the --tmpfs flag from the docker run command in a Kubernetes YAML file but could not find a way:
sudo docker run --name=ubuntu-gnome -d --rm \
    --tmpfs /run --tmpfs /run/lock --tmpfs /tmp \
    --cap-add SYS_BOOT --cap-add SYS_ADMIN \
    -v /sys/fs/cgroup:/sys/fs/cgroup \
    -p 5901:5901 -p 6901:6901 \
    darkdragon001/ubuntu-gnome-vnc
You're looking for an emptyDir volume, such as the following:
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: container
    ...
    volumeMounts:
    - mountPath: /tmp
      name: tmp
      subPath: tmp
    - mountPath: /run
      name: tmp
      subPath: run
    - mountPath: /run/lock
      name: tmp
      subPath: run-lock
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory
      sizeLimit: 64Mi
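A single memory-backed emptyDir (medium: Memory is tmpfs under the hood) is split across the three mount points with subPath, mirroring the three --tmpfs flags with one volume; note that whatever the volume holds counts against the container's memory limit, which is what sizeLimit caps here.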

Pod with S3FS stuck in terminating status

I need to mount an s3 bucket in a kubernetes pod. I am using this guide to help me. It works perfectly; however, the pod gets stuck indefinitely in "Terminating" status when I issue the command to delete it, and I don't know why.
Here is the .yaml:
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  volumes:
  - name: mntdatas3fs
    emptyDir: {}
  - name: devfuse
    hostPath:
      path: /dev/fuse
  restartPolicy: Always
  containers:
  - image: nginx
    name: s3-test
    securityContext:
      privileged: true
    volumeMounts:
    - name: mntdatas3fs
      mountPath: /var/s3fs:shared
  - name: s3fs
    image: meain/s3-mounter
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    env:
    - name: S3_REGION
      value: "us-east-1"
    - name: S3_BUCKET
      value: "xxxxxxx"
    - name: AWS_KEY
      value: "xxxxxx"
    - name: AWS_SECRET_KEY
      value: "xxxxxx"
    volumeMounts:
    - name: devfuse
      mountPath: /dev/fuse
    - name: mntdatas3fs
      mountPath: /var/s3fs:shared
Here is the Dockerfile of meain/s3-mounter, used by the s3fs container:
FROM alpine:3.3
ENV MNT_POINT /var/s3fs
ARG S3FS_VERSION=v1.86
RUN apk --update --no-cache add fuse alpine-sdk automake autoconf libxml2-dev fuse-dev curl-dev git bash; \
    git clone https://github.com/s3fs-fuse/s3fs-fuse.git; \
    cd s3fs-fuse; \
    git checkout tags/${S3FS_VERSION}; \
    ./autogen.sh; \
    ./configure --prefix=/usr; \
    make; \
    make install; \
    make clean; \
    rm -rf /var/cache/apk/*; \
    apk del git automake autoconf;
RUN mkdir -p "$MNT_POINT"
COPY run.sh run.sh
CMD ./run.sh
Here is run.sh, copied into the container:
#!/bin/sh
set -e
echo "$AWS_KEY:$AWS_SECRET_KEY" > passwd && chmod 600 passwd
s3fs "$S3_BUCKET" "$MNT_POINT" -o passwd_file=passwd && tail -f /dev/null
I had this exact problem with a very similar setup. s3fs mounts the bucket at /var/s3fs, and the mount has to be unmounted before the pod can be terminated cleanly. This is done with umount /var/s3fs. See https://manpages.ubuntu.com/manpages/xenial/man1/s3fs.1.html
So in your case, adding the following to the s3fs container:
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "umount /var/s3fs"]
should fix it.
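A side note, not from the original answer: Kubernetes does not parse docker's :shared suffix on a mount path; mount propagation has its own field on the volumeMount, so the shared mount would normally be written as:
    volumeMounts:
    - name: mntdatas3fs
      mountPath: /var/s3fs
      mountPropagation: Bidirectional
Bidirectional propagation requires a privileged container, which both containers here already are.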

Not able to connect to redis pod in kubernetes using NodePort service

I'm fairly new to Kubernetes and I'm trying to orchestrate my Rails app using minikube on my MacBook. My app includes MySQL, Redis and Sidekiq, with the webapp, sidekiq, redis and database each running in isolated pods. The Sidekiq pod is not connecting to the Redis pod.
kubectl logs for the sidekiq pod says this:
2020-09-15T14:01:16.978Z 1 TID-gnaz4yzs0 INFO: Booting Sidekiq 4.2.10 with redis options {:url=>"redis://redis:6379/0"}
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: Running in ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2020-09-15T14:01:18.475Z 1 TID-gnaz4yzs0 INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
Error connecting to Redis on redis:6379 (Errno::ECONNREFUSED)
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:345:in `rescue in establish_connection'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:330:in `establish_connection'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:101:in `block in connect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:293:in `with_reconnect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:100:in `connect'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:364:in `ensure_connected'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:221:in `block in process'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:306:in `logging'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:220:in `process'
/usr/local/bundle/gems/redis-3.3.1/lib/redis/client.rb:120:in `call'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:251:in `block in info'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:58:in `block in synchronize'
/usr/local/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
/usr/local/bundle/gems/redis-3.3.1/lib/redis.rb:250:in `info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:113:in `block in redis_info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:95:in `block in redis'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:63:in `block (2 levels) in with'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:62:in `handle_interrupt'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:62:in `block in with'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:59:in `handle_interrupt'
/usr/local/bundle/gems/connection_pool-2.2.3/lib/connection_pool.rb:59:in `with'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:92:in `redis'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq.rb:106:in `redis_info'
/usr/local/bundle/gems/sidekiq-4.2.10/lib/sidekiq/cli.rb:71:in `run'
/usr/local/bundle/gems/sidekiq-4.2.10/bin/sidekiq:12:in `<top (required)>'
/usr/local/bundle/bin/sidekiq:29:in `load'
/usr/local/bundle/bin/sidekiq:29:in `<main>'
My webapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checklist-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: railsapp
    spec:
      containers:
      - name: webapp
        image: masettyabhishek/checklist:latest
        command: ["rails", "s", "-p", "3001", "-b", "0.0.0.0", "-e", "PRODUCTION"]
        ports:
        - name: checklist-port
          containerPort: 3001
        env:
        - name: MYSQL_HOST
          value: database-service
        - name: MYSQL_USER
          value: root
        - name: MYSQL_PASSWORD
          value: Mission2019
        - name: MYSQL_DATABASE
          value: checklist
        - name: MYSQL_ROOT_PASSWORD
          value: Mission2019
        - name: REDIS_URL
          value: redis
        - name: REDIS_PORT
          value: "6379"
  selector:
    matchLabels:
      app: railsapp
webapp-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  ports:
  - port: 3001
    protocol: TCP
  type: NodePort
  selector:
    app: railsapp
sidekiq.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidekiq-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        instance: sidekiq
    spec:
      containers:
      - name: sidekiq
        image: masettyabhishek/checklist:latest
        command: ["sidekiq", "-C", "config/sidekiq.yml"]
        env:
        - name: MYSQL_HOST
          value: database-service
        - name: MYSQL_USER
          value: root
        - name: MYSQL_PASSWORD
          value: Mission2019
        - name: MYSQL_DATABASE
          value: checklist
        - name: MYSQL_ROOT_PASSWORD
          value: Mission2019
        - name: REDIS_URL
          value: redis
        - name: REDIS_PORT
          value: "6379"
        ports:
        - name: redis-port
          containerPort: 6379
  selector:
    matchLabels:
      instance: sidekiq
redis.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod
spec:
  containers:
  - name: redis
    image: redis:alpine
    command: ["redis-server"]
    ports:
    - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    name: redis-pod
    instance: sidekiq
    app: railsapp
  type: NodePort
  ports:
  - port: 6379
This is the Sidekiq Redis configuration in my rails app:
Sidekiq.configure_server do |config|
  config.redis = { url: "redis://#{ENV['REDIS_URL']}:#{ENV['REDIS_PORT']}/0" }
end
Sidekiq.configure_client do |config|
  config.redis = { url: "redis://#{ENV['REDIS_URL']}:#{ENV['REDIS_PORT']}/0" }
end
This is the Dockerfile, if that helps answer the question:
FROM ubuntu:16.04
ENV RUBY_MAJOR="2.6" \
    RUBY_VERSION="2.6.3" \
    RUBYGEMS_VERSION="3.0.8" \
    BUNDLER_VERSION="1.17.3" \
    RAILS_VERSION="5.2.1" \
    RAILS_ENV="production" \
    GEM_HOME="/usr/local/bundle"
ENV BUNDLE_PATH="$GEM_HOME" \
    BUNDLE_BIN="$GEM_HOME/bin" \
    BUNDLE_SILENCE_ROOT_WARNING=1 \
    BUNDLE_APP_CONFIG="$GEM_HOME"
ENV PATH="$BUNDLE_BIN:$GEM_HOME/bin:$GEM_HOME/gems/bin:$PATH"
USER root
RUN apt-get update && \
    apt-get -y install sudo
RUN echo "%sudo ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers && \
    addgroup --gid 1024 stars && \
    useradd -G stars,sudo -d /home/user --shell /bin/bash -m user
RUN mkdir -p /usr/local/etc \
    && echo 'install: --no-document' >> /usr/local/etc/gemrc \
    && echo 'update: --no-document' >> /usr/local/etc/gemrc
USER user
RUN sudo apt-get -y install --no-install-recommends vim make gcc zlib1g-dev autoconf build-essential libssl-dev libsqlite3-dev \
    curl htop unzip mc openssh-server openssl bison libgdbm-dev ruby git libmysqlclient-dev tzdata mysql-client
RUN sudo rm -rf /var/lib/apt/lists/* \
    && sudo curl -fSL -o ruby.tar.gz "http://cache.ruby-lang.org/pub/ruby/$RUBY_MAJOR/ruby-$RUBY_VERSION.tar.gz" \
    && sudo mkdir -p /usr/src/ruby \
    && sudo tar -xzf ruby.tar.gz -C /usr/src/ruby --strip-components=1 \
    && sudo rm ruby.tar.gz
USER root
RUN cd /usr/src/ruby \
    && { sudo echo '#define ENABLE_PATH_CHECK 0'; echo; cat file.c; } > file.c.new && mv file.c.new file.c \
    && autoconf \
    && ./configure --disable-install-doc
USER user
RUN cd /usr/src/ruby \
    && sudo make -j"$(nproc)" \
    && sudo make install \
    && sudo gem update --system $RUBYGEMS_VERSION \
    && sudo rm -r /usr/src/ruby
RUN sudo gem install bundler --version "$BUNDLER_VERSION"
RUN sudo mkdir -p "$GEM_HOME" "$BUNDLE_BIN" \
    && sudo chmod 777 "$GEM_HOME" "$BUNDLE_BIN" \
    && sudo gem install rails --version "$RAILS_VERSION"
RUN mkdir -p ~/.ssh && \
    chmod 0700 ~/.ssh && \
    ssh-keyscan github.com > ~/.ssh/known_hosts
ARG ssh_pub_key
ARG ssh_prv_key
RUN echo "$ssh_pub_key" > ~/.ssh/id_rsa.pub && \
    echo "$ssh_prv_key" > ~/.ssh/id_rsa && \
    chmod 600 ~/.ssh/id_rsa.pub && \
    chmod 600 ~/.ssh/id_rsa
USER root
RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
RUN apt-get install -y nodejs
USER user
WORKDIR /data
RUN sudo mkdir /data/checklist
WORKDIR /data/checklist
ADD Gemfile Gemfile.lock ./
RUN sudo chown -R user /data/checklist
RUN bundle install
ADD . .
RUN sudo chown -R user /data/checklist
EXPOSE 3001
ENV RAILS_SERVE_STATIC_FILES true
ENV RAILS_LOG_TO_STDOUT true
RUN chmod +x ./config/docker/prepare-db.sh && sh ./config/docker/prepare-db.sh
ENTRYPOINT ["bundle", "exec"]
CMD ["sh", "./config/docker/startup.sh"]
kubectl describe svc redis
Name: redis
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=railsapp,instance=sidekiq,name=redis-pod
Type: NodePort
IP: 10.103.6.43
Port: <unset> 6379/TCP
TargetPort: 6379/TCP
NodePort: <unset> 31886/TCP
Endpoints: <none>
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
As you can see, the Endpoints section of the redis service has no pod IPs, which is the reason for the connection refused error. The pod needs labels matching the service's selector. Updating the redis pod with labels as below should solve the issue:
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod
  labels:
    instance: sidekiq
    app: railsapp
    name: redis-pod
spec:
  containers:
  - name: redis
    image: redis:alpine
    command: ["redis-server"]
    ports:
    - containerPort: 6379
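After reapplying the relabeled pod, a quick check with plain kubectl:
kubectl get endpoints redis
should now list the pod's IP on port 6379, where the Endpoints line previously read <none>.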

Spark submit fails on Kubernetes (EKS) with "invalid null input: name"

I am trying to run the sample SparkPi docker image on EKS. My Spark version is 3.0.
I created the spark serviceaccount and role binding. When I submit the job, I get the error below:
2020-07-05T12:19:40.862635502Z Exception in thread "main" java.io.IOException: failure to login
2020-07-05T12:19:40.862756537Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
2020-07-05T12:19:40.862772672Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.862777401Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.862788327Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.862792294Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.8628321Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.862836906Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.862907673Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.862917119Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.86294845Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.862964245Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.862979665Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863055425Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863060434Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863096062Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863103831Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863163804Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863168546Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863194449Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863218817Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863246594Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.863252341Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863277236Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863314173Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863319847Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863653699Z Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
2020-07-05T12:19:40.863660447Z at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
2020-07-05T12:19:40.863663683Z at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
2020-07-05T12:19:40.863667173Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-07-05T12:19:40.863670199Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-07-05T12:19:40.863673467Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-07-05T12:19:40.86367674Z at java.lang.reflect.Method.invoke(Method.java:498)
2020-07-05T12:19:40.863680205Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
2020-07-05T12:19:40.863683401Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.86368671Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863689794Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863693081Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863696183Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863698579Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863700844Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.863703393Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.86370659Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.863709809Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.863712847Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.863716102Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.863719273Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.86372651Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.863728947Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.863731207Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.863733458Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.863736237Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863738769Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863742105Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863745486Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863749154Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863752601Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863756118Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863759673Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863762774Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863765929Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.86376906Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863792673Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863797161Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863799703Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863802085Z
2020-07-05T12:19:40.863804184Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
2020-07-05T12:19:40.863806454Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.863808705Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863811134Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863815328Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863817575Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863819856Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863829171Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.86385963Z ... 24 more
My deployments are:
apiVersion: v1
kind: Namespace
metadata:
  name: helios
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: helios
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
  namespace: helios
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark
  namespace: helios
---
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-pi
  namespace: helios
spec:
  template:
    spec:
      containers:
      - name: spark-pi
        image: <registry>/spark-pi-3.0
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
            --master k8s://https://<EKS_API_SERVER> \
            --deploy-mode cluster \
            --name spark-pi \
            --class org.apache.spark.examples.SparkPi \
            --conf spark.kubernetes.namespace=helios \
            --conf spark.executor.instances=2 \
            --conf spark.executor.memory=2G \
            --conf spark.executor.cores=2 \
            --conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
            --conf spark.kubernetes.container.image.pullPolicy=Always \
            --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
            --conf spark.jars.ivy=/tmp/.ivy \
            local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
        ]
      serviceAccountName: spark
      restartPolicy: Never
The docker image is created using the OOTB Dockerfile provided in the Spark installation:
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
What am I doing wrong here? Please help.
SOLUTION
Finally it worked after I commented out the line below in the Dockerfile:
USER ${spark_uid}
Though the container now runs as root, at least it is working.
I had the same problem. I solved it by changing the k8s job.
Hadoop is failing to find a username for the user. You can see the problem by running whoami in the container, which yields whoami: cannot find name for user ID 185. The Spark image's entrypoint.sh contains code to add the user to /etc/passwd, which sets a username; however, command bypasses entrypoint.sh, so you should use args instead, like so:
containers:
- name: spark-pi
  image: <registry>/spark-pi-3.0
  args: [
    "/bin/sh",
    "-c",
    "/opt/spark/bin/spark-submit \
      --master k8s://https://10.100.0.1:443 \
      --deploy-mode cluster ..."
  ]
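The distinction matters because of how Kubernetes maps onto the image: command: replaces the image's ENTRYPOINT, while args: only replaces CMD and is passed to the ENTRYPOINT. Using args therefore keeps /opt/entrypoint.sh, and its /etc/passwd fix-up, in the execution path.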
Seems like you are missing the ServiceAccount/AWS role credentials so that your job can connect to the EKS cluster.
I recommend you set up fine-grained IAM roles for service accounts.
Basically, you would have something like this (after you set up the roles in AWS):
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
  name: spark
  namespace: helios
Then your job would look something like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-pi
  namespace: helios
spec:
  template:
    spec:
      containers:
      - name: spark-pi
        image: <registry>/spark-pi-3.0
        command: [
          "/bin/sh",
          "-c",
          "/opt/spark/bin/spark-submit \
            --master k8s://https://<EKS_API_SERVER> \
            --deploy-mode cluster \
            --name spark-pi \
            --class org.apache.spark.examples.SparkPi \
            --conf spark.kubernetes.namespace=helios \
            --conf spark.executor.instances=2 \
            --conf spark.executor.memory=2G \
            --conf spark.executor.cores=2 \
            --conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
            --conf spark.kubernetes.container.image.pullPolicy=Always \
            --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
            --conf spark.jars.ivy=/tmp/.ivy \
            local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
        ]
        env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        volumeMounts:
        - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          name: aws-iam-token
          readOnly: true
      serviceAccountName: spark
      restartPolicy: Never
I had the same problem. I solved it by adding
export SPARK_USER=spark3
to the submit container, without commenting out the USER ${spark_uid} line.
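This works because Spark's Utils.getCurrentUserName reads the SPARK_USER environment variable before falling back to the Hadoop UserGroupInformation lookup visible at the top of the stack trace, so the failing UnixPrincipal login is never attempted.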
I ran into the same issue and was able to resolve it by specifying runAsUser on the pod spec without having to modify the spark docker image.
securityContext:
  runAsUser: 65534
  runAsGroup: 65534
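65534 is the conventional nobody user on most Linux distributions, so it already has an /etc/passwd entry in the base image and the UnixPrincipal lookup can resolve a name for it.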
I had the same issue, and fixed it by adding the line
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
to the last part of the Spark Dockerfile:
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
so the full Dockerfile looks like this:
cat spark-3.2.0-bin-hadoop3.2/kubernetes/dockerfiles/spark/Dockerfile
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG ROOT_CONTAINER=ubuntu:focal
FROM ${ROOT_CONTAINER}
ARG openjdk_version="8"
ARG spark_uid=1000
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN apt-get update --yes && \
    apt-get install --yes --no-install-recommends \
    "openjdk-${openjdk_version}-jre-headless" \
    ca-certificates-java
RUN apt-get install --yes software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y \
    python3.7 \
    python3-pip \
    python3-distutils \
    python3-setuptools
RUN pip install pyspark==3.2.0
RUN set -ex && \
    sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
    apt-get update && \
    ln -s /lib /lib64 && \
    export DEBIAN_FRONTEND=noninteractive && \
    apt install -y -qq bash tini libc6 libpam-modules krb5-user libnss3 procps && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    mkdir -p /etc/metrics/conf/ && \
    mkdir -p /opt/hadoop/ && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    apt-get clean && rm -rf /var/lib/apt/lists/* \
    rm -rf /var/cache/apt/*
COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/prometheus.yaml /etc/metrics/conf/
ENV SPARK_HOME /opt/spark
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
RUN mkdir -p /opt/spark/logs && \
    chown -R 1000:1000 /opt/spark/logs
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
RUN cat /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
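For reference, the appended line follows the standard passwd format name:password:UID:GID:GECOS:home:shell, so UID 1000 resolves to the user name "1000" with /opt/spark as its home directory, which is exactly the lookup the UnixLoginModule was failing.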
Build the Spark docker image:
sudo ./bin/docker-image-tool.sh -r <my_docker_repo>/spark-3.2.0-bin-hadoop3.2-gcs -t <tag_number> build
