GCR - image missing all but two files, local image is complete - docker

I have a container I'm deploying to Kubernetes (GKE), and the image I have built locally is good, and runs as expected, but it appears that the image being pulled from Google Container Registry, when the run command is changed to pwd && ls returns the output shown here:
I 2020-06-17T16:24:54.222382706Z /app
I 2020-06-17T16:24:54.226108583Z lost+found
I 2020-06-17T16:24:54.226143620Z package-lock.json
and the output of the same commands when running in the container locally, with docker run -it <container:tag> bash is this:
#${API_CONTAINER} resolves to gcr.io/<project>/container: I.E. tag gets appended
.../# docker run -it ${API_CONTAINER}latest bash
root#362737147de4:/app# pwd
/app
root#362737147de4:/app# ls
Dockerfile dist files node_modules package.json ssh.bat stop_forever.bat test tsconfig.json
cloudbuild.yaml environments log package-lock.json src startApi.sh swagger.json test.pdf tsconfig.test.json
root#362737147de4:/app#
My thoughts on this start with, either the push to the registry is literally failing to work, or I'm not pulling the right one, i.e. pulling some off latest tag that was build by cloud build in a previous attempt to get this going.
What could be the potential issue? what could potentially fix this issue?
Edit: After using differing tags in deployment, using --no-cache during build, and pulling from the registry on another machine, my inclination is that GKE is having an issue pulling the image from GCR. Is there a way I can put this somewhere else, or get visibility on what's going on with the pull?
EDIT 2:
So Yes, I have a docker file I can share, but please be aware that I have inherited it, and don't understand the process that went into building this, or why some steps were necessary to the other developer. (I am definitely interested in refactoring this as much as possible.
FROM node:8.12.0
RUN mkdir /app
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
RUN apt-get update && apt-get install snmp -y
RUN npm install --unsafe-perm=true
RUN apt-get update \
&& apt-get install -y \
gconf-service \
libasound2 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libc6 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libexpat1 \
libfontconfig1 \
libgcc1 \
libgconf-2-4 \
libgdk-pixbuf2.0-0 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libpango-1.0-0 \
libpangocairo-1.0-0 \
libstdc++6 \
libx11-6 \
libx11-xcb1 \
libxcb1 \
libxcomposite1 \
libxcursor1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxi6 \
libxrandr2 \
libxrender1 \
libxss1 \
libxtst6 \
ca-certificates \
fonts-liberation \
libappindicator1 \
libnss3 \
lsb-release \
xdg-utils \
wget
COPY . /app
# Installing puppeteer and chromium for generating PDF of the invoices.
# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg libpam-cracklib \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Uncomment to skip the chromium download when installing puppeteer. If you do,
# you'll need to launch puppeteer with:
# browser.launch({executablePath: 'google-chrome-unstable'})
# ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# Install puppeteer so it's available in the container.
RUN npm i puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /app/node_modules
#build the api, and move into place.... framework options are limited with the build.
RUN npm i puppeteer kiwi-server-cli && kc build -e prod
RUN rm -Rf ./environments & rm -Rf ./src && cp -R ./dist/prod/* .
# Run everything after as non-privileged user.
# USER pptruser
CMD ["google-chrome-unstable"] # I have tried adding this here as well "&&", "node", "src/server.js"
For pushing the image I'm using this command:
docker push gcr.io/<projectid>/api:latest-<version> and I have the credentials setup with cloud auth configure-docker and here's a sanitized version of the yaml manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
kompose.cmd: kompose convert -f ./docker-compose.yml
kompose.version: 1.21.0 ()
creationTimestamp: null
labels:
io.kompose.service: api
name: api
spec:
replicas: 1
selector:
matchLabels:
io.kompose.service: api
strategy:
type: Recreate
template:
metadata:
annotations:
kompose.cmd: kompose convert -f ./docker-compose.yml
kompose.version: 1.21.0 ()
creationTimestamp: null
labels:
io.kompose.service: api
spec:
containers:
- args:
- bash
- -c
- node src/server.js
env:
- name: NODE_ENV
value: production
- name: TZ
value: America/New_York
image: gcr.io/<projectId>/api:latest-0.0.9
imagePullPolicy: Always
name: api
ports:
- containerPort: 8087
resources: {}
volumeMounts:
- mountPath: /app
name: api-claim0
- mountPath: /files
name: api-claim1
restartPolicy: Always
serviceAccountName: ""
volumes:
- name: api-claim0
persistentVolumeClaim:
claimName: api-claim0
- name: api-claim1
persistentVolumeClaim:
claimName: api-claim1
status: {}

The solution comes from the original intent of the docker-compose.yml file which was converted into a kubernetes manifest via a tool called kompose. The original docker-compose file was intended for development and as such had overrides in place to push the local development environment into the running container.
This was because of this in the yml file:
services:
api:
build: ./api
volumes:
- ./api:/app
- ./api/files:/files
which translates to this on the kubernetes manifest:
volumeMounts:
- mountPath: /app
name: api-claim0
- mountPath: /files
name: api-claim1
volumes:
- name: api-claim0
persistentVolumeClaim:
claimName: api-claim0
- name: api-claim1
persistentVolumeClaim:
claimName: api-claim1
Which Kubernetes has no files to supply, and the app is essentially overwritten with an empty volume, so the file is not found.
removal of the directives in the kubernetes manifest resulted in success.
Reminder to us all to be mindful.

To manage images [1] includes listing images in a repository, adding tags, deleting tags, copying images to a new repository, and deleting images. I hope the troubleshooting documents [2] could be helpful for you to troubleshoot the issue.
[1] https://cloud.google.com/container-registry/docs/managing
[2] https://cloud.google.com/container-registry/docs/troubleshooting

Related

Docker sync files

im not familiar with docker at all. im trying to use symfony with docker (windows). for docker i use (without changes, only +mysql)
https://github.com/dunglas/symfony-docker
it works well
problem that when i change files or create new files changes not sync, only after new build.
Please tell how to write proper dockerfile so files from host (and from docker) will be in sync
i need to use docker volumes, but idk how 😁
dockerfile
ARG PHP_VERSION=8.1
ARG CADDY_VERSION=2
# "php" stage
FROM php:${PHP_VERSION}-fpm-alpine AS symfony_php
# persistent / runtime deps
RUN apk add --no-cache \
acl \
fcgi \
file \
gettext \
git \
;
ARG APCU_VERSION=5.1.21
RUN set -eux; \
apk add --no-cache --virtual .build-deps \
$PHPIZE_DEPS \
icu-data-full \
icu-dev \
libzip-dev \
zlib-dev \
; \
\
docker-php-ext-configure zip; \
docker-php-ext-install -j$(nproc) \
intl \
zip \
; \
pecl install \
apcu-${APCU_VERSION} \
; \
pecl clear-cache; \
docker-php-ext-enable \
apcu \
opcache \
; \
\
runDeps="$( \
scanelf --needed --nobanner --format '%n#p' --recursive /usr/local/lib/php/extensions \
| tr ',' '\n' \
| sort -u \
| awk 'system("[ -e /usr/local/lib/" $1 " ]") == 0 { next } { print "so:" $1 }' \
)"; \
apk add --no-cache --virtual .phpexts-rundeps $runDeps; \
\
apk del .build-deps
RUN docker-php-ext-install mysqli pdo pdo_mysql && docker-php-ext-enable pdo_mysql
COPY docker/php/docker-healthcheck.sh /usr/local/bin/docker-healthcheck
RUN chmod +x /usr/local/bin/docker-healthcheck
HEALTHCHECK --interval=10s --timeout=3s --retries=3 CMD ["docker-healthcheck"]
RUN ln -s $PHP_INI_DIR/php.ini-production $PHP_INI_DIR/php.ini
COPY docker/php/conf.d/symfony.prod.ini $PHP_INI_DIR/conf.d/symfony.ini
COPY docker/php/php-fpm.d/zz-docker.conf /usr/local/etc/php-fpm.d/zz-docker.conf
COPY docker/php/docker-entrypoint.sh /usr/local/bin/docker-entrypoint
RUN chmod +x /usr/local/bin/docker-entrypoint
VOLUME /var/run/php
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
# https://getcomposer.org/doc/03-cli.md#composer-allow-superuser
ENV COMPOSER_ALLOW_SUPERUSER=1
ENV PATH="${PATH}:/root/.composer/vendor/bin"
WORKDIR /srv/app
# Allow to choose skeleton
ARG SKELETON="symfony/skeleton"
ENV SKELETON ${SKELETON}
# Allow to use development versions of Symfony
ARG STABILITY="stable"
ENV STABILITY ${STABILITY}
# Allow to select skeleton version
ARG SYMFONY_VERSION=""
ENV SYMFONY_VERSION ${SYMFONY_VERSION}
# Download the Symfony skeleton and leverage Docker cache layers
#RUN composer create-project "${SKELETON} ${SYMFONY_VERSION}" . --stability=$STABILITY --prefer-dist --no-dev --no-progress --no-interaction; \
# composer clear-cache
###> recipes ###
###> doctrine/doctrine-bundle ###
#RUN apk add --no-cache --virtual .pgsql-deps postgresql-dev; \
# docker-php-ext-install -j$(nproc) pdo_pgsql; \
# apk add --no-cache --virtual .pgsql-rundeps so:libpq.so.5; \
# apk del .pgsql-deps
###< doctrine/doctrine-bundle ###
###< recipes ###
COPY . .
RUN set -eux; \
mkdir -p var/cache var/log; \
composer install --prefer-dist --no-dev --no-progress --no-scripts --no-interaction; \
composer dump-autoload --classmap-authoritative --no-dev; \
composer symfony:dump-env prod; \
composer run-script --no-dev post-install-cmd; \
chmod +x bin/console; sync
VOLUME /srv/app/var
ENTRYPOINT ["docker-entrypoint"]
CMD ["php-fpm"]
FROM caddy:${CADDY_VERSION}-builder-alpine AS symfony_caddy_builder
RUN xcaddy build \
--with github.com/dunglas/mercure \
--with github.com/dunglas/mercure/caddy \
--with github.com/dunglas/vulcain \
--with github.com/dunglas/vulcain/caddy
FROM caddy:${CADDY_VERSION} AS symfony_caddy
WORKDIR /srv/app
COPY --from=dunglas/mercure:v0.11 /srv/public /srv/mercure-assets/
COPY --from=symfony_caddy_builder /usr/bin/caddy /usr/bin/caddy
COPY --from=symfony_php /srv/app/public public/
COPY docker/caddy/Caddyfile /etc/caddy/Caddyfile
docker-compose
version: "3.4"
services:
db:
image: mysql
# NOTE: use of "mysql_native_password" is not recommended: https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password
# (this is just an example, not intended to be a production configuration)
command: --default-authentication-plugin=mysql_native_password
restart: always
environment:
MYSQL_ROOT_PASSWORD: root
adminer:
image: adminer
restart: always
ports:
- 8080:8080
php:
build:
context: .
target: symfony_php
args:
SYMFONY_VERSION: ${SYMFONY_VERSION:-}
SKELETON: ${SKELETON:-symfony/skeleton}
STABILITY: ${STABILITY:-stable}
restart: unless-stopped
volumes:
- php_socket:/var/run/php
healthcheck:
interval: 10s
timeout: 3s
retries: 3
start_period: 30s
environment:
MERCURE_URL: ${CADDY_MERCURE_URL:-http://caddy/.well-known/mercure}
MERCURE_PUBLIC_URL: https://${SERVER_NAME:-localhost}/.well-known/mercure
MERCURE_JWT_SECRET: ${CADDY_MERCURE_JWT_SECRET:-!ChangeMe!}
caddy:
build:
context: .
target: symfony_caddy
depends_on:
- php
environment:
SERVER_NAME: ${SERVER_NAME:-localhost, caddy:80}
MERCURE_PUBLISHER_JWT_KEY: ${CADDY_MERCURE_JWT_SECRET:-!ChangeMe!}
MERCURE_SUBSCRIBER_JWT_KEY: ${CADDY_MERCURE_JWT_SECRET:-!ChangeMe!}
restart: unless-stopped
volumes:
- php_socket:/var/run/php
- caddy_data:/data
- caddy_config:/config
ports:
# HTTP
- target: 80
published: ${HTTP_PORT:-80}
protocol: tcp
# HTTPS
- target: 443
published: ${HTTPS_PORT:-443}
protocol: tcp
# HTTP/3
- target: 443
published: ${HTTP3_PORT:-443}
protocol: udp
volumes:
php_socket:
caddy_data:
caddy_config:
This is not something you declare on the dockerfile level, the dockerfile is a layer you tell the container (or the host) to add the container in it's run time. (Installations, copies..)
this a flag on the docker API `docker run -v vol:vol .... ```
Or inside of the docker-compose file
I see your docker-compose file has a declaration of volumes for service caddy.
Try to init the volume as an empty one
# bottom of the docker-compose file
volumes:
php_socket: {}
Docker is built upon layers, every change you'll make, will create a new layer and will be saved on the volume.

How to continuously copy files into docker

I am a docker newbie and i can't rly figure out how the changes that will be made to my working directory will be continuously copied to the docker container. Is there a command that copies all my changes to the docker container all the time ?
Edit : i added docker file and docker compose
My docker file
FROM scratch
ADD centos-7-x86_64-docker.tar.xz /
LABEL \
org.label-schema.schema-version="1.0" \
org.label-schema.name="CentOS Base Image" \
org.label-schema.vendor="CentOS" \
org.label-schema.license="GPLv2" \
org.label-schema.build-date="20201113" \
org.opencontainers.image.title="CentOS Base Image" \
org.opencontainers.image.vendor="CentOS" \
org.opencontainers.image.licenses="GPL-2.0-only" \
org.opencontainers.image.created="2020-11-13 00:00:00+00:00"
RUN yum clean all && yum update -y && yum -y upgrade
RUN yum groupinstall "Development Tools" -y
RUN yum install -y wget gettext-devel curl-devel openssl-devel perl-devel perl-CPAN zlib-devel && wget https://github.com/git/git/archive/v2.26.2.tar.gz\
&& tar -xvzf v2.26.2.tar.gz && cd git-2.26.2 && make configure && ./configure --prefix=/usr/local && make install
# RUN mkdir -p /root/.ssh && \
# chmod 0700 /root/.ssh && \
# ssh-keyscan github.com > /root/.ssh/known_hosts
# RUN ssh-keygen -q -t rsa -N '' -f /id_rsa
# RUN echo "$ssh_prv_key" > /root/.ssh/id_rsa && \
# echo "$ssh_pub_key" > /root/.ssh/id_rsa.pub && \
# chmod 600 /root/.ssh/id_rsa && \
# chmod 600 /root/.ssh/id_rsa.pub
RUN ls
RUN cd / && git clone https://github.com/odoo/odoo.git \
&& cd odoo \
&& git fetch \
&& git checkout 9.0
RUN yum install python-devel libxml2-devel libxslt-dev openldap-devel libtiff-devel libjpeg-devel libzip-devel freetype-devel lcms2-devel \
libwebp-devel tcl-devel tk-devel python-pip nodejs
RUN pip install setuptools==1.4.1 beautifulsoup4==4.9.3 pillow openpyxl==2.6.4 luhn gmp-devel paramiko==1.7.7.2 python2-secrets cffi pysftp==0.2.8
RUN pip install -r requirements.txt
RUN npm install -g less
CMD ["/bin/bash","git"]
My docker-compose
version: '3.3'
services:
app: &app
build:
context: .
dockerfile: ./docker/app/Dockerfile
container_name: app
tty: true
db:
image: postgres:9.2.18
environment:
- POSTGRES_DB=test
ports:
- 5432:5432
volumes:
- ./docker/db/pg-data:/var/lib/postgresql/data
odoo:
<<: *app
command: python odoo.py -w odoo -r odoo
ports:
- '8069:8069'
depends_on:
- db
If I understand correctly you want to mount a path from the host into a container which can be done using volumes. Something like this would keep the folders in sync which can be useful for development
docker run -v /path/to/local/folder:/path/in/container busybox

Apache is adding header to images resulting in corrupting images

I'm dockerizing a laravel application, my image is based on an apache image, this is being hosted in AKS, where I'm mounting azure files with images share inside /public/images, the problem is apache would add header inside the image resulting in corrupting the images
even if I exec inside the pod itself and try curl localhost, I get the same problem so I'm sure it's not a problem with routing or my ingress
FROM php:7.3-apache
#install all the system dependencies and enable PHP modules
RUN apt-get update -y && apt-get install -y libmcrypt-dev openssl
RUN apt-get update && apt-get install -y libmcrypt-dev \
&& pecl install mcrypt-1.0.2 \
&& docker-php-ext-enable mcrypt
RUN docker-php-ext-install pdo mbstring
RUN apt-get install -y \
libzip-dev \
zip \
&& docker-php-ext-install zip
RUN apt-get install -y libfreetype6-dev libjpeg62-turbo-dev libpng-dev && \
docker-php-ext-configure gd --with-freetype-dir=/usr/include/ --with-jpeg-dir=/usr/include/
RUN docker-php-ext-install gd
RUN docker-php-ext-install mysqli pdo pdo_mysql
# RUN apt-get install wget
RUN apt-get update; apt-get install curl -y
#install composer
RUN curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/bin/ --filename=composer
#set our application folder as an environment variable
ENV APP_HOME /var/www/html
#change uid and gid of apache to docker user uid/gid
RUN usermod -u 1000 www-data && groupmod -g 1000 www-data
#change the web_root to laravel /var/www/html/public folder
#RUN sed -i -e "s/html/html\/public/g" /etc/apache2/sites-enabled/000-default.conf
COPY vhost.conf /etc/apache2/sites-available/000-default.conf
RUN echo "EnableSendfile off" >> /etc/apache2/apache2.conf
# enable apache module rewrite
RUN a2enmod rewrite
#copy source files and run composer
COPY . $APP_HOME
# install all PHP dependencies
RUN composer install --no-interaction
#change ownership of our applications
RUN chown -R www-data:www-data $APP_HOME
next using regular deployment yaml file to push this to kubernetes with the following Volume mounts:
volumeMounts:
- name: sessions
mountPath: /var/www/html/storage/framework/sessions
- name: cache
mountPath: /var/www/html/storage/framework/cache
- name: views
mountPath: /var/www/html/storage/framework/views
- name: images
mountPath: /var/www/html/public/images
volumes:
name: sessions
azureFile:
secretName: appmnt
shareName: sessions
readOnly: false
name: cache
azureFile:
secretName: appmnt
shareName: cache
readOnly: false
name: views
azureFile:
secretName: appmnt
shareName: views
readOnly: false
name: images
azureFile:
secretName: appmnt
shareName: images
readOnly: false
now the problem is if i try to access a static file from images folder, by example using a url like "https://www.somedomain.com/images/somefile.png"
the file will be download but apache will attach the above headers to the content resulting in corruption.
the web applications work perfectly fine, except for any files inside the volume mounts.
if i do "kubectl exec -it podname -- bash" and browse the files i can see the volume mounts are working fine, also if i try to upload files from the application interface, the file gets written in the write way inside the folder, only problem is with browsing the file.
We fixed the issue, simply in the vhost.conf, we needed to turn off EnableMMAP
EnableMMAP off

Spark submit fails on Kubernetes (EKS) with "invalid null input: name"

I am trying to run spark sample SparkPi docker image on EKS. My Spark version is 3.0.
I created spark serviceaccount and role binding. When I submit the job, there is error below:
2020-07-05T12:19:40.862635502Z Exception in thread "main" java.io.IOException: failure to login
2020-07-05T12:19:40.862756537Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
2020-07-05T12:19:40.862772672Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.862777401Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.862788327Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.862792294Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.8628321Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.862836906Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.862907673Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.862917119Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.86294845Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.862964245Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.862979665Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863055425Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863060434Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863096062Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863103831Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863163804Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863168546Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863194449Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863218817Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863246594Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.863252341Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863277236Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863314173Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863319847Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863653699Z Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
2020-07-05T12:19:40.863660447Z at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
2020-07-05T12:19:40.863663683Z at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
2020-07-05T12:19:40.863667173Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-07-05T12:19:40.863670199Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-07-05T12:19:40.863673467Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-07-05T12:19:40.86367674Z at java.lang.reflect.Method.invoke(Method.java:498)
2020-07-05T12:19:40.863680205Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
2020-07-05T12:19:40.863683401Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.86368671Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863689794Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863693081Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863696183Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863698579Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863700844Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.863703393Z at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
2020-07-05T12:19:40.86370659Z at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
2020-07-05T12:19:40.863709809Z at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2412)
2020-07-05T12:19:40.863712847Z at scala.Option.getOrElse(Option.scala:189)
2020-07-05T12:19:40.863716102Z at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2412)
2020-07-05T12:19:40.863719273Z at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.configurePod(BasicDriverFeatureStep.scala:119)
2020-07-05T12:19:40.86372651Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
2020-07-05T12:19:40.863728947Z at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2020-07-05T12:19:40.863731207Z at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2020-07-05T12:19:40.863733458Z at scala.collection.immutable.List.foldLeft(List.scala:89)
2020-07-05T12:19:40.863736237Z at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
2020-07-05T12:19:40.863738769Z at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
2020-07-05T12:19:40.863742105Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
2020-07-05T12:19:40.863745486Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863749154Z at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
2020-07-05T12:19:40.863752601Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
2020-07-05T12:19:40.863756118Z at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
2020-07-05T12:19:40.863759673Z at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
2020-07-05T12:19:40.863762774Z at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
2020-07-05T12:19:40.863765929Z at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
2020-07-05T12:19:40.86376906Z at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
2020-07-05T12:19:40.863792673Z at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
2020-07-05T12:19:40.863797161Z at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
2020-07-05T12:19:40.863799703Z at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-07-05T12:19:40.863802085Z
2020-07-05T12:19:40.863804184Z at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
2020-07-05T12:19:40.863806454Z at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
2020-07-05T12:19:40.863808705Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
2020-07-05T12:19:40.863811134Z at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
2020-07-05T12:19:40.863815328Z at java.security.AccessController.doPrivileged(Native Method)
2020-07-05T12:19:40.863817575Z at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
2020-07-05T12:19:40.863819856Z at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
2020-07-05T12:19:40.863829171Z at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
2020-07-05T12:19:40.86385963Z ... 24 more
My deployments are:
apiVersion: v1
kind: Namespace
metadata:
name: helios
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: helios
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: spark-role-binding
namespace: helios
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: edit
subjects:
- kind: ServiceAccount
name: spark
namespace: helios
---
apiVersion: batch/v1
kind: Job
metadata:
name: spark-pi
namespace: helios
spec:
template:
spec:
containers:
- name: spark-pi
image: <registry>/spark-pi-3.0
command: [
"/bin/sh",
"-c",
"/opt/spark/bin/spark-submit \
--master k8s://https://<EKS_API_SERVER> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=helios
--conf spark.executor.instances=2 \
--conf spark.executor.memory=2G \
--conf spark.executor.cores=2 \
--conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.jars.ivy=/tmp/.ivy
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
]
serviceAccountName: spark
restartPolicy: Never
The docker image is created using OOTB dockerfile provided in Spark installation.
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
What am I doing wrong here? Please help.
SOLUTION
Finally it worked out after I comment the below line from docker file.
USER ${spark_uid}
Though, now, container is running as root but at least it is working.
I had the same problem. I solved it by changing the k8s job.
Hadoop is failing to find a username for the user. You can see the problem by running whoami in the container, which yields whoami: cannot find name for user ID 185. The spark image entrypoint.sh contains code to add the user to /etc/passwd, which sets a username. However command bypasses the entrypoint.sh, so instead you should use args like so:
containers:
- name: spark-pi
image: <registry>/spark-pi-3.0
args: [
"/bin/sh",
"-c",
"/opt/spark/bin/spark-submit \
--master k8s://https://10.100.0.1:443 \
--deploy-mode cluster ..."
]
Seems like you are missing the ServiceAccount/AWS role credentials so that your job can connect to the EKS cluster.
I recommend you set up fine-grained IAM roles for service accounts.
Basically, you would have something like this (after you set up the roles in AWS):
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
name: spark
namespace: helios
Then your job would look something like this:
apiVersion: batch/v1
kind: Job
metadata:
name: spark-pi
namespace: helios
spec:
template:
spec:
containers:
- name: spark-pi
image: <registry>/spark-pi-3.0
command: [
"/bin/sh",
"-c",
"/opt/spark/bin/spark-submit \
--master k8s://https://<EKS_API_SERVER> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=helios
--conf spark.executor.instances=2 \
--conf spark.executor.memory=2G \
--conf spark.executor.cores=2 \
--conf spark.kubernetes.container.image=<registry>/spark-pi-3.0 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.jars.ivy=/tmp/.ivy
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar" ]
env:
- name: AWS_ROLE_ARN
value: arn:aws:iam::123456789012:role/my-serviceaccount-Role1
- name: AWS_WEB_IDENTITY_TOKEN_FILE
value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
- mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
name: aws-iam-token
readOnly: true
serviceAccountName: spark
restartPolicy: Never
I had the same problem. I solved it by adding into submit container
export SPARK_USER=spark3
without comment line USER ${spark_uid}
Finally it worked out after I comment the below line from docker file.
USER ${spark_uid}
Though, now, container is running as root but at least it is working.
I ran into the same issue and was able to resolve it by specifying runAsUser on the pod spec without having to modify the spark docker image.
securityContext:
runAsUser: 65534
runAsGroup: 65534
I had the same issue, fixed it by adding
RUN echo 1000:x:1000:0:anonymous uid:/opt/spark:/bin/false >> /etc/passwd
line in the last part Spark Dockerfile
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
so full dockerfile looks like this
cat spark-3.2.0-bin-hadoop3.2/kubernetes/dockerfiles/spark/Dockerfile
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG ROOT_CONTAINER=ubuntu:focal
FROM ${ROOT_CONTAINER}
ARG openjdk_version="8"
ARG spark_uid=1000
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN apt-get update --yes && \
apt-get install --yes --no-install-recommends \
"openjdk-${openjdk_version}-jre-headless" \
ca-certificates-java
RUN apt-get install --yes software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y \
python3.7 \
python3-pip \
python3-distutils \
python3-setuptools
RUN pip install pyspark==3.2.0
RUN set -ex && \
sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
apt-get update && \
ln -s /lib /lib64 && \
export DEBIAN_FRONTEND=noninteractive && \
apt install -y -qq bash tini libc6 libpam-modules krb5-user libnss3 procps && \
mkdir -p /opt/spark && \
mkdir -p /opt/spark/examples && \
mkdir -p /opt/spark/work-dir && \
mkdir -p /etc/metrics/conf/ && \
mkdir -p /opt/hadoop/ && \
touch /opt/spark/RELEASE && \
rm /bin/sh && \
ln -sv /bin/bash /bin/sh && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
apt-get clean && rm -rf /var/lib/apt/lists/* \
rm -rf /var/cache/apt/*
COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/prometheus.yaml /etc/metrics/conf/
ENV SPARK_HOME /opt/spark
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
RUN mkdir -p /opt/spark/logs && \
chown -R 1000:1000 /opt/spark/logs
RUN echo '1000:x:1000:0:anonymous uid:/opt/spark:/bin/false' >> /etc/passwd
RUN cat /etc/passwd
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
Build spark-docker image
sudo ./bin/docker-image-tool.sh -r <my_docker_repo>/spark-3.2.0-bin-hadoop3.2-gcs -t <tag_number> build

Dockerized lambda : Invoking sam local invoke returns an Unable to import module 'index': Dockerized

I am using AWS SAM and dockerized the lambdas. Here is my Dockerfile
FROM python:3.7-alpine
RUN apk update && \
apk upgrade && \
apk add bash && \
apk add --no-cache --virtual build-deps build-base gcc && \
pip install aws-sam-cli && \
apk del build-deps
WORKDIR /app/
RUN ls
COPY bin/sam_entrypoint.sh bin/
COPY lambda/hello_world/requirements.txt .
RUN pip install -r requirements.txt
EXPOSE 8000
ENTRYPOINT "bin/sam_entrypoint.sh"
And Here is my docker-compose.yaml
version: '3.6'
services:
sam_app:
build:
context: .
dockerfile: Dockerfile
command: ["$PWD"]
image: sam:0.1
ports:
- "8000:8000"
volumes:
- .:/app
- /var/run/docker.sock:/var/run/docker.sock
Here is my app structure.
And here is my bin/sam_entrypoint.sh
#!/bin/sh
set -ex
echo "Hello"
BASEDIR="${PWD}/lambda/"
/usr/local/bin/sam build \
--template lambda/template.yaml
/usr/local/bin/sam local start-api \
--template lambda/template.yaml \
--host 0.0.0.0 \
--port 8000 \
--docker-volume-basedir "${BASEDIR}" \
--docker-network sam-app_default \
--skip-pull-image
when I run dc up (starts api in local) and hit http://0.0.0.0:8000/hello. I got the following error.
It seems like mounting is correct, but not sure what exactly i am doing wrong. It got to be something simple missing. I would appreciate any suggestions or help.
EDIT: Adding template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
sam-app
Sample SAM Template for sam-app
Globals:
Function:
Timeout: 3
Resources:
HelloWorldFunction:
Type: AWS::Serverless::Function
Properties:
CodeUri: hello_world/
Handler: index.lambda_handler
Runtime: python3.7
Events:
HelloWorld:
Type: Api
Properties:
Path: /hello
Method: get

Resources