Running a Spark application on GKE - failing on spark-submit - docker

I have a simple Spark application which I'm trying to run on an existing GKE cluster.
Note: Spark is installed on GKE using Helm.
Here is what has been done so far:
Created a Dockerfile and a small PySpark application.
Dockerfile:
FROM python:3.7-slim
RUN apt-get update && \
    apt-get install -y default-jre && \
    apt-get install -y openjdk-11-jre-headless && \
    apt-get clean
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
RUN pip install pyspark
RUN mkdir -p /myexample && chmod 755 /myexample
WORKDIR /myexample
COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py
CMD ["pyspark"]
Simple PySpark code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
df = spark.createDataFrame(data, ('id', 'salary'))
df.show(5, False)
Command being run:
spark-submit \
  --master k8s://https://<IP>:<port> \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py
Error I get:
23/02/13 13:18:27 INFO KubernetesUtils: Uploading file: /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py to dest: /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py failed...
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Error uploading file StructuredStream-on-gke.py
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
... 21 more
Caused by: java.io.IOException: Mkdirs failed to create /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
... 22 more
23/02/13 13:18:27 INFO ShutdownHookManager: Shutdown hook called
23/02/13 13:18:27 INFO ShutdownHookManager: Deleting directory /private/var/folders/yp/p_9783hn1fg_dyzbf4sgnxqh0000gn/T/spark-39c64e15-6c4c-491f-9d29-1e88fa90b2e8
When I log on to the Spark master node on Kubernetes, I don't see the /myexample folder being created.
What needs to be done to fix this?
Thanks in advance!
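For context, the Mkdirs failed error comes from the submission client itself: in cluster mode, spark-submit copies the local .py file to spark.kubernetes.file.upload.path before the driver pod exists, and with no URI scheme that path resolves to /myexample on the submitting machine's local filesystem (hence the RawLocalFileSystem frames in the trace), not inside the image. The Spark-on-Kubernetes docs expect this property to point at a Hadoop-compatible filesystem (for example a GCS or S3 bucket) reachable from both the submitting machine and the driver pod. Below is a minimal sketch under that assumption; the bucket name is hypothetical, and the GCS connector plus credentials would need to be available to the submitting Spark installation:

# Sketch only: stage the application file on a shared filesystem
# (hypothetical GCS bucket) instead of a path local to the image.
spark-submit \
  --master k8s://https://<IP>:<port> \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://my-spark-staging/uploads \
  src/StructuredStream-on-gke.py

Alternatively, since the script is already copied into the image at /myexample/StructuredStream-on-gke.py, referencing it as local:///myexample/StructuredStream-on-gke.py should avoid the upload step entirely.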

Related

Packer fails on Gitlab-CI with {message:401 Unauthorized}: command not found

I am trying to use Packer with GCP in GitLab CI, but every time it gets to the packer build it fails with the following error:
starting remote command: chmod +x /tmp/script_5147.sh; DEBIAN_FRONTEND='noninteractive' PACKER_BUILDER_TYPE='googlecompute' PACKER_BUILD_NAME='my_vm' /tmp/script_5147.sh
==> googlecompute.my_vm: /tmp/script_5147.sh: line 1: {message:401 Unauthorized}: command not found
2022/11/13 13:26:05 [INFO] 0 bytes written for 'stdout'
2022/11/13 13:26:05 packer-plugin-googlecompute_v1.0.16_x5.0_linux_amd64 plugin: 2022/11/13 13:26:05 [ERROR] Remote command exited with '127': chmod +x /tmp/script_5147.sh; DEBIAN_FRONTEND='noninteractive' PACKER_BUILDER_TYPE='googlecompute' PACKER_BUILD_NAME='my_vm' /tmp/script_5147.sh
2022/11/13 13:26:05 packer-plugin-googlecompute_v1.0.16_x5.0_linux_amd64 plugin: 2022/11/13 13:26:05 [INFO] RPC endpoint: Communicator ended with: 127
The script:
#!/bin/bash
set -e
if [ "$EUID" -ne 0 ]
then echo "Please run as root"
exit
fi
apt update
apt install -y curl
curl -fsSL https://deb.nodesource.com/setup_18.x | bash -
apt upgrade -y iptables
# The iptables-persistent must be installed in order to create the /etc/iptables/rules.v4 file
apt install -y nginx libzmq3-dev nodejs ipset iptables-persistent net-tools libre2-dev
npm install -g yarn
rm /etc/nginx/sites-enabled/default
Packer works locally (also with this image) and I created a VM in GCP; all of those work, it only fails in GitLab.
I built a custom Packer image and it still failed for me on GitLab.
I will try to move it to my own runner, but that will take me a few days.
I would love it if someone could help me figure this out.
OK, I figured out the problem.
I curled some files and the token was not valid, so when it got to Packer it failed because the files were not as expected.
The {message:401 Unauthorized} in the output was the 401 response body from the curl being printed.
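One way to catch this earlier is to make the download step fail on HTTP errors instead of writing the error body into the script that Packer later executes. A minimal sketch, assuming the file is fetched in a prior CI step; the URL and token variable are hypothetical:

# Sketch only: fail the CI job on a 401 instead of saving
# {message:401 Unauthorized} into the provisioning script.
set -euo pipefail

# --fail returns a non-zero exit code on HTTP errors such as 401,
# so the job stops here rather than at the Packer provisioner.
curl --fail --silent --show-error \
  --header "PRIVATE-TOKEN: ${CI_DOWNLOAD_TOKEN}" \
  --output provision.sh \
  "https://gitlab.example.com/api/v4/projects/123/repository/files/provision.sh/raw?ref=main"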

Error when starting custom Airflow Docker Image GROUP_OR_COMMAND

I created a custom image with the following Dockerfile:
FROM apache/airflow:2.1.1-python3.8
USER root
RUN apt-get update \
&& apt-get -y install gcc gnupg2 \
&& curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
&& curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update \
&& ACCEPT_EULA=Y apt-get -y install msodbcsql17 \
&& ACCEPT_EULA=Y apt-get -y install mssql-tools
RUN echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc \
&& echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc \
&& source ~/.bashrc
RUN apt-get -y install unixodbc-dev \
&& apt-get -y install python-pip \
&& pip install pyodbc
RUN echo -e “AIRFLOW_UID=$(id -u) \nAIRFLOW_GID=0” > .env
USER airflow
The image builds successfully, but when I try to run it, I get this error:
"airflow command error: the following arguments are required: GROUP_OR_COMMAND, see help above."
I have tried supplying a group ID with --user, but I can't figure it out.
How can I start this custom Airflow Docker image?
Thanks!
First of all, this line is wrong:
RUN echo -e “AIRFLOW_UID=$(id -u) \nAIRFLOW_GID=0” > .env
If you are running it with Docker Compose (I presume you took it from https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html), this is something you should run on the "host" machine, not in the image. Remove that line; it has no effect.
Secondly, it really depends on what "command" you run. The "GROUP_OR_COMMAND" message you got is the output of the "airflow" command. You have not copied the whole output of your command, but this is the message you get when you run airflow without telling it what to do. When you run the image, by default it runs the airflow command, which has a number of subcommands that can be executed. So the "see help above" message tells you exactly what to do: look at the help and see which subcommand you wanted to run (and then run it).
docker run -it apache/airflow:2.1.2
usage: airflow [-h] GROUP_OR_COMMAND ...

positional arguments:
  GROUP_OR_COMMAND

    Groups:
      celery         Celery components
      config         View configuration
      connections    Manage connections
      dags           Manage DAGs
      db             Database operations
      jobs           Manage jobs
      kubernetes     Tools to help run the KubernetesExecutor
      pools          Manage pools
      providers      Display providers
      roles          Manage roles
      tasks          Manage tasks
      users          Manage users
      variables      Manage variables

    Commands:
      cheat-sheet    Display cheat sheet
      info           Show information about current Airflow and environment
      kerberos       Start a kerberos ticket renewer
      plugins        Dump information about loaded plugins
      rotate-fernet-key
                     Rotate encrypted connection credentials and variables
      scheduler      Start a scheduler instance
      sync-perm      Update permissions for existing roles and optionally DAGs
      version        Show the version
      webserver      Start a Airflow webserver instance

optional arguments:
  -h, --help         show this help message and exit

airflow command error: the following arguments are required: GROUP_OR_COMMAND, see help above.
When you extend the official image, it passes the parameters to the "airflow" command, which is what causes this problem. Check this out: https://airflow.apache.org/docs/docker-stack/entrypoint.html#entrypoint-commands
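In other words, the custom image just needs to be started with an airflow subcommand (or one of the special entrypoint commands such as bash). A minimal sketch; the image tag my-custom-airflow:2.1.1 is hypothetical:

# Build the custom image (tag name is just an example).
docker build -t my-custom-airflow:2.1.1 .

# The entrypoint forwards its arguments to the airflow CLI,
# so pass the subcommand you actually want to run:
docker run -it my-custom-airflow:2.1.1 webserver

# The official entrypoint also treats "bash" specially and drops you into a shell:
docker run -it my-custom-airflow:2.1.1 bash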

shell script having ssh command inside docker container fails with - ssh: not found

I have a Spring Boot Java application running in a Docker container, and it tries to run a shell script. The shell script has an ssh command, and I get the following error while running it:
2020-08-12 09:22:29.425 INFO 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService : Executing spark submit, calling shell script: /tmp/temp843155675494688636.sh 172.29.199.15
2020-08-12 09:22:29.434 DEBUG 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService : Starting Input Stream:
2020-08-12 09:22:29.435 INFO 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService : #1 arg: 172.29.199.15
2020-08-12 09:22:29.436 INFO 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService : Exist Value127
2020-08-12 09:22:29.436 ERROR 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService : Starting Error Stream:
2020-08-12 09:22:29.436 ERROR 1 --- [io-11013-exec-1] b.n.i.s.d.e.service.EmrManagerService :
/tmp/temp843155675494688636.sh: line 5: ssh: not found
The same code works fine when I run the jar directly and not as a Docker container.
Is it something to do with ssh not being recognized in the Docker container?
The shell script:
#!/bin/bash
echo "#1 arg:" $1
ssh -i /home/dnaidaasd/aws-oneid-idaas-2020Q2.pem -oStrictHostKeyChecking=no hadoop@$1 '/etc/alternatives/jre/bin/java -Xmx1000m -server \
-XX:OnOutOfMemoryError="kill -9 %p" -cp "/usr/share/aws/emr/instance-controller/lib/*" \
-Dhadoop.log.dir=/mnt/var/log/hadoop/steps/s-100-120 \
-Dhadoop.log.file=syslog -Dhadoop.home.dir=/usr/lib/hadoop \
-Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native \
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true \
-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-14611-353/tmp \
-Dhadoop.security.logger=INFO,NullAppender \
-Dsun.net.inetaddr.ttl=30 \
org.apache.hadoop.util.RunJar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit \
--conf spark.hadoop.mapred.output.compress=true \
--conf spark.hadoop.mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
--class biz.neustar.idaas.services.dataprofile.ProfileMain \
--name IdaasProfile --conf spark.dynamicAllocation.enabled=true \
--conf spark.executor.instances=2 --conf spark.driver.memory=8G \
--conf spark.executor.memory=4G --conf spark.executor.cores=1 \
--conf spark.sql.catalogImplementation=hive \
--jars s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/spark-core_2.11-2.4.5.jar,s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/spark-sql_2.11-2.4.5.jar,s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/spark-mllib_2.11-2.4.5.jar,s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/jackson-module-scala_2.11-2.6.7.1.jar,s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/jackson-databind-2.6.7.jar s3://oneid-idaas-dev-us-east-1/dev/emr/TestIdaasProfile/data-profile-14.0.jar' \
$2 $3 $4
This shell script is called as:
public void executeSparkSubmit(String masterNodeIp, String pathToScript, String input_hive_table,
        String s3_output_path, String output_hive_table) throws IOException, InterruptedException, DataProfileServiceException {
    log.info("Executing spark submit, calling shell script: " + pathToScript + " " + masterNodeIp);
    ProcessBuilder pb = new ProcessBuilder("sh", pathToScript, masterNodeIp, input_hive_table, s3_output_path, output_hive_table);
    Process pr = pb.start();
And the Dockerfile contents are:
FROM openjdk:8-jdk-alpine
ADD ./data-profile-provider/build/libs/data-profile-provider-203.2.0-SNAPSHOT.jar data-profile.jar
EXPOSE 11013
ENTRYPOINT ["java", "-jar", "data-profile.jar", "application.properties"]
As I suspected, your image is Alpine-based, and Alpine does not have an SSH client installed by default.
Corrected Dockerfile:
FROM openjdk:8-jdk-alpine
RUN apk add --no-cache openssh-client
ADD ./data-profile-provider/build/libs/data-profile-provider-203.2.0-SNAPSHOT.jar data-profile.jar
EXPOSE 11013
ENTRYPOINT ["java", "-jar", "data-profile.jar", "application.properties"]
Edit: I forgot to add that Alpine does not have Bash either. Luckily your app invokes the script with sh scriptname.sh - otherwise you'd get a bash: not found error.
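If the script ever does need to run under bash, it can be installed alongside the SSH client; a one-line sketch for inside an Alpine-based image:

# Sketch only: in an Alpine image, install bash next to the ssh client.
apk add --no-cache bash openssh-client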
SSH might not be installed.
My example here assumes an image derived from Ubuntu/Linux, since you did not specify the Dockerfile contents at the time.
If your container can launch successfully (ignore the fact that your app is failing), you can simply run ssh on the command line to check (it will give you something similar to command not found).
To run commands inside the Docker container (an Ubuntu image has bash installed), you can run:
docker exec -ti containername bash
Inside the Docker container (one of my containers where there is no SSH installed):
ssh
ssh: command not found
The base image you inherit from might not have the tool installed. Most Docker images you inherit from are built with a 'bare minimum' in mind, so your custom Docker image needs to install it.
Below are the commands you can add to the Dockerfile; make sure your user is able to run them (in this example I made sure the container image user is root). This example installs only the ssh client (which is what is required):
USER root
RUN apt-get update \
&& apt-get install -y openssh-client
USER mydockercontaineruser
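For a quick check that the client actually ended up in the rebuilt image, something like this should work (the image tag is hypothetical):

# Sketch only: rebuild and confirm the ssh client is present.
docker build -t data-profile-with-ssh .
docker run --rm data-profile-with-ssh sh -c 'which ssh && ssh -V'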

fastlane - error at google cloud build: "OCI runtime create failed: container_linux.go:345"

I'm using a fastlane container, stored in Google Container Registry, to upload an APK to the Google Play Store using Google Cloud Build.
The APK has been successfully created. However, when processing the last step (fastlane), it fails with errors:
Step #2: 487ea6dabc0c: Pull complete
Step #2: a7ae4fee33c9: Pull complete
Step #2: Digest: sha256:2e31d5ae64984a598856f1138c6be0577c83c247226c473bb5ad302f86267545
Step #2: Status: Downloaded newer image for gcr.io/myapp789-app/fastlane:latest
Step #2: gcr.io/myapp789-app/fastlane:latest
Step #2: docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"supply\": executable file not found in $PATH": unknown.
Step #2: time="2019-08-29T23:22:55Z" level=error msg="error waiting for container: context canceled"
Finished Step #2
ERROR
ERROR: build step 2 "gcr.io/myapp789-app/fastlane" failed: exit status 127
Note:
1) The Docker source file was taken from https://hub.docker.com/r/fastlanetools/fastlane and then I built my own image.
2) The Docker image was built on a Google Cloud VM running Debian GNU/Linux 9 (stretch).
Docker source file for fastlane:
# Final image #
###############
FROM circleci/ruby:latest
MAINTAINER milch
ENV PATH $PATH:/usr/local/itms/bin
# Java versions to be installed
ENV JAVA_VERSION 8u131
ENV JAVA_DEBIAN_VERSION 8u131-b11-1~bpo8+1
ENV CA_CERTIFICATES_JAVA_VERSION 20161107~bpo8+1
# Needed for fastlane to work
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
# Required for iTMSTransporter to find Java
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/jre
USER root
# iTMSTransporter needs java installed
# We also have to install make to install xar
# And finally shellcheck
RUN echo 'deb http://archive.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list \
&& apt-get -o Acquire::Check-Valid-Until=false update \
&& apt-get install --yes \
make \
shellcheck \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
USER circleci
COPY --from=xar_builder /tmp/xar /tmp/xar
RUN cd /tmp/xar \
&& sudo make install \
&& sudo rm -rf /tmp/*
CloudBuild.yaml:
- name: 'gcr.io/$PROJECT_ID/fastlane'
  args: ['supply', '--package_name','${_ANDROID_PACKAGE_NAME}', '--track', '${_ANDROID_RELEASE_CHANNEL}', '--json_key_data', '${_GOOGLE_PLAY_UPLOAD_KEY_JSON}', '--apk', '/workspace/${_REPO_NAME}/build/app/outputs/bundle/release/app.aab']
  timeout: 1200s
Any idea how to solve this?
I solved this by building the Docker image using the Docker source from the official Google Cloud builders, rather than the fastlane one on hub.docker.com (which has not been updated for 5 months).
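For context, Cloud Build runs each step's image with args as the container command, so unless the image's entrypoint is the fastlane CLI, supply is looked up as a standalone executable and not found. The behaviour can be reproduced and checked locally; a minimal sketch, assuming the rebuilt image has fastlane on its PATH (the image name is illustrative):

# Sketch only: mimic how Cloud Build invokes the step.
# Without a fastlane entrypoint, "supply" is treated as the executable and fails:
docker run --rm gcr.io/my-project/fastlane supply --help

# Invoking the CLI explicitly shows whether the image itself is usable:
docker run --rm --entrypoint fastlane gcr.io/my-project/fastlane supply --help

In the build step itself, the equivalent is either baking ENTRYPOINT ["fastlane"] into the image or setting the step's entrypoint: 'fastlane' field in cloudbuild.yaml.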

How to setup local syslog-ng in docker container of CentOS 7?

I want to isolate a testing environment in Docker; I did that on CentOS 6 (see: How to let syslog workable in docker?).
In CentOS 7, syslog-ng's configuration is different. When I run
/usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
it shows the following error message, but there is no proc/kmsg in the config files:
syslog-ng: Error setting capabilities, capability management disabled; error='Operation not permitted'
Error opening file for reading; filename='/proc/kmsg', error='Operation not permitted (1)'
The Dockerfile:
FROM centos
RUN yum update --exclude=systemd -y \
    && yum install -y yum-plugin-ovl \
    && yum install -y epel-release
RUN yum install -y syslog-ng syslog-ng-libdbi
The test process:
docker build -t t1 .
docker run --rm -i -t t1 /bin/bash
In the container, run the following commands:
# check config, no keyword like proc/kmsg
cd /etc/syslog-ng
grep -r -E 'proc|kmsg'
/usr/sbin/syslog-ng -F -p /var/run/syslogd.pid
Change /etc/syslog-ng/syslog-ng.conf from
source s_sys {
    system();
    internal();
};
to
source s_sys {
    unix-stream("/dev/log");
    internal();
};
It still shows an error message, but keeps running instead of exiting:
syslog-ng: Error setting capabilities, capability management disabled; error='Operation not permitted'
To solve this, just run with the --no-caps option:
/usr/sbin/syslog-ng --no-caps -F -p /var/run/syslogd.pid
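To run syslog-ng as the container's main process in this test setup, the same flags can be passed straight to docker run; a small sketch using the t1 image built above:

# Sketch only: run the test image with syslog-ng in the foreground,
# skipping capability management (fine for an isolated test container).
docker run --rm -i -t t1 /usr/sbin/syslog-ng --no-caps -F -p /var/run/syslogd.pid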
