Dataflow flex template job attempts to launch second job (for pipeline) with same job_name - google-cloud-dataflow

I am trying to launch a Dataflow flex template. As part of the build and deploy process, I am pre-building a custom SDK container image to reduce worker start-up time.
I have attempted this in these ways:
When no sdk_container_image is specified and a requirements.txt file is provided, the Dataflow flex template launches successfully and a graph is built, but the workers cannot start because they are not authorized to install my private packages.
Errors:
Dataflow job appears to be stuck; no worker activity
Workers fail to install requirements
When an sdk_container_image is given (with the dependencies pre-installed), the Dataflow job starts, but instead of running the pipeline within the same job, it attempts to launch a second Dataflow job for the pipeline using the same name, which results in an error.
Errors:
Duplicate Dataflow job name, DataflowJobAlreadyExistsError
If I pass a second job_name for the pipeline, the pipeline successfully starts in a separate job, but the original flex template job eventually fails due to a polling timeout. The pipeline then fails at the very last step with an SDK harness disconnect because "The worker VM had to shut down one or more processes due to lack of memory."
Template errors:
Timeout in polling result file
Pipeline Errors:
SDK harness disconnected (Out of memory)
When I launch the pipeline locally using DataflowRunner, the pipeline runs successfully under one job name.
Here are my Dockerfiles and gcloud commands:
Flex template Dockerfile:
FROM gcr.io/dataflow-templates-base/python39-template-launcher-base
# Create working directory
ARG WORKDIR=/flex
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
# Due to a change in the Apache Beam base image in version 2.24, you must install
# libffi-dev manually as a dependency. For more information:
# https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4891
RUN apt-get update && apt-get install -y libffi-dev && rm -rf /var/lib/apt/lists/*
COPY ./ ./
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/launch_pipeline.py"
# Install the pipeline dependencies
RUN pip install --no-cache-dir --upgrade pip setuptools wheel
RUN pip install --no-cache-dir apache-beam[gcp]==2.41.0
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT [ "/opt/google/dataflow/python_template_launcher" ]
Worker Dockerfile:
# Set up image for worker.
FROM apache/beam_python3.9_sdk:2.41.0
WORKDIR /worker
COPY ./requirements.txt ./
RUN pip install --no-cache-dir --upgrade pip setuptools wheel
RUN pip install --no-cache-dir -r requirements.txt
Building template:
gcloud dataflow flex-template build $TEMPLATE_LOCATION \
--image "$IMAGE_LOCATION" \
--sdk-language "PYTHON" \
--metadata-file "metadata.json"
Launching template:
gcloud dataflow flex-template run ddjanke-local-flex \
--template-file-gcs-location=$TEMPLATE_LOCATION \
--project=$PROJECT \
--service-account-email=$EMAIL \
--parameters=[OTHER_ARGS...],sdk_container_image=$WORKER_IMAGE \
--additional-experiments=use_runner_v2

I actually managed to solve this yesterday. The problem was that I was passing sdk_container_image to Dataflow through the flex template and then passing that through to the PipelineOptions within my code. After I removed sdk_container_image from the options, it launched the pipeline in the same job.
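In case it helps anyone else, the relevant part of my launcher now looks roughly like this (a trimmed sketch, not my exact code; everything except dropping sdk_container_image is placeholder):
# launch_pipeline.py (sketch): the flex template already passes
# sdk_container_image to the Dataflow service, so the launcher must NOT
# repeat it in the PipelineOptions it builds.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(argv=None):
    # Before (broken): I also set sdk_container_image here, which made the
    # launcher submit the pipeline as a second job with the same job_name.
    options = PipelineOptions(argv, save_main_session=True)
    with beam.Pipeline(options=options) as pipeline:
        ...  # actual transforms go here

if __name__ == '__main__':
    run()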

Related

Docker build on top of existing docker image

I have a nodejs app wrapped in a docker image. Whenever I make any change to the code, even adding a few console.logs, I need to rebuild the whole image, which takes over 10 minutes during the CI process.
Is there a way to build one image on top of another, say adding only the delta?
Thanks
EDIT in response to @The Fool's comment:
I'm running the CI using AWS tools, mainly CodeBuild. I can use the latest Docker image I created (all are stored in AWS ECR) and build based on it, but would it know to take only the delta even if it conflicts with the new code?
Following is my Dockerfile:
FROM node:15.3.0
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app
RUN apt-get update
RUN apt-get install -y build-essential libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev librsvg2-dev
RUN npm install
COPY . /usr/src/app
EXPOSE 3000
CMD bash -c "npm run sequelize db:migrate&&npm run sequelize db:seed:all&&npm run prod"
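In case it matters, this is roughly what I was planning to try in CodeBuild (untested sketch; $ECR_REPO is a placeholder for my ECR repository URI), pulling the last image and handing it to docker build as a cache source:
# Pull the previous image (ignore failure on the very first build), then let
# docker build reuse its layers wherever a Dockerfile step and its inputs are
# unchanged; only the layers from the first changed step onward are rebuilt.
docker pull "$ECR_REPO:latest" || true
docker build --cache-from "$ECR_REPO:latest" -t "$ECR_REPO:latest" .
docker push "$ECR_REPO:latest"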

Confused by Dockerfile

I feel confused by the Dockerfile and build process. Specifically, I am working my way through the book Docker on AWS and I feel stuck until I can work my way through a few more of the details. The book had me write the following Dockerfile.
#Test stage
FROM alpine as test
LABEL application=todobackend
#Install basic utilities
RUN apk add --no-cache bash git
#Install build dependencies
RUN apk add --no-cache gcc python3-dev libffi-dev musl-dev linux-headers mariadb-dev py3-pip
RUN ../../usr/bin/pip3 install wheel
#Copy requirements
COPY /src/requirements* /build/
WORKDIR /build
#Build and install requirements
RUN pip3 wheel -r requirements_test.txt --no-cache-dir --no-input
RUN pip3 install -r requirements_test.txt -f /build --no-index --no-cache-dir
# Copy source code
COPY /src /app
WORKDIR /app
# Test entrypoint
CMD ["python3","manage.py","test","--noinput","--settings=todobackend.settings_test"]
The following is a list of the things I understand versus don't understand.
I understand this.
#Test stage
FROM alpine as test
LABEL application=todobackend
It is defining a 'test' stage so I can run commands like docker build --target test, and Docker will execute all of the following commands until the next FROM ... AS line indicates a different stage. LABEL is labeling the specific Docker image that is built and from which containers will be 'born' (not sure if that is the right word to use). I don't feel any confusion about that, except about whether that label carries over to containers spawned from that image.
So NOW I start to feel confused.
I PARTLY understand this
#Install basic utilities
RUN apk add --no-cache bash git
I understand that apk is an overloaded term that represents both the package manager on Alpine Linux and a file type. In this context, it is a package manager command to install (or upgrade) a package on the running system. HOWEVER, I am supposed to be building / packaging up an application and all of its dependencies into an enclosed 'environment'. Sooo... where / when does this 'environment' come in? That is where I feel confused. When the Dockerfile is running apk, is it just saying "locally, on your current machine, please install these the normal way" (i.e., the equivalent of a bash script where apk installs to the current system)? When I run docker build --target test -t todobackend-test on my previously pasted Dockerfile, is the docker command doing both a native command execution AND a Docker Engine call to create an isolated environment for my Docker image? I feel like what must be happening is that when the docker command is run, it acts like a wrapper around the built-in package manager / bash / pip functionality AND the Docker Engine, and is doing both, but I don't know.
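For example, if I understand correctly, this quick check would show where things actually end up (a sketch I haven't fully verified):
# Build just the test stage of the Dockerfile above.
docker build --target test -t todobackend-test .
# bash, git and pip3 exist inside containers created from that image...
docker run --rm todobackend-test which pip3    # prints /usr/bin/pip3
# ...presumably because each RUN line executed inside a temporary container
# created from the previous layer, not directly on the machine running docker build.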
Anyway, I hope that this made sense. I just want some implementation details. Feel free to link documentation, but it can feel super tedious and unnecessarily detailed OR obfuscated sometimes.
I DO want to point out that if I run an apk command in my Dockerfile with a bad dependency name (e.g. python3-pip instead of py3-pip), I get a very interesting error:
/bin/sh: pip3: not found
Notice the command path. I am assuming anyone reading this will understand why that feels hella confusing.

How to retain cmake changes when building Docker image in Google Cloud Build?

I am working on a CI pipeline with Google Cloud Build to run tests on code stored in Cloud Source Repositories. As it stands, the pipeline uses the docker cloud builder to build an image with docker build. The building process takes nearly an hour to complete and it runs periodically. It builds the program and then runs a single test on it in one build step; this part works fine. What I want to do is build the program using cmake and make, then store the image in the Container Registry so that I can run future tests from that image without having to spend the time building it before testing.
The issue is that when I run the custom image from the Container Registry in Cloud Build, it does not recognize the module that was built with cmake. The pipeline ran tests just fine when I built and then tested in the same build step, but it no longer recognizes the module when I run the image as a custom builder on Cloud Build.
The dockerfile used to build the image is as follows:
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
COPY . /app
WORKDIR /app
RUN apt-get update
RUN apt-get -y install python3.7
RUN apt-get -y install python3-pip
RUN pip3 install numpy
RUN pip3 install matplotlib
RUN pip3 install transitions
RUN pip3 install pandas
RUN apt-get install -y cmake
RUN apt-get install -y swig
RUN pip3 install conan
RUN ln -s ~/.local/bin/conan /usr/bin/conan
RUN apt-get install gcc
RUN cd ~
RUN python3 path/to/master_build.py
The master_build.py uses os.system commands to build the program. It calls a shell script to do the cmake process. The shell script is:
#!/bin/sh
mkdir dist3
cd dist3
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../src
make
cd ~
This builds the program no problem, then the python script calls other scripts to run basic tests, which all work fine when I do it in this build step. The issue is, when I use Cloud Build to run the custom image from container registry, it can no longer find the module that was built with CMake.
This is the cloudbuild config file that runs the custom image:
steps:
- name: 'gcr.io/$PROJECT_ID/built_images:latest'
  args: ['python3', 'path/to/run_tests.py']
I get a ModuleNotFoundError, which is weird because it worked fine when I ran the test script in the same build after calling cmake. I'm guessing that the file system is not being retained when I push the image to container registry and it can no longer find the dist3 folder.
How can I retain the dist3 folder when I am pushing the image to container registry?
Thanks!
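One check I can run locally is to pull the pushed image and look for the build output inside it (sketch; /app/dist3 is my guess at where the script's mkdir dist3 lands, since the WORKDIR is /app and RUN cd ~ doesn't persist between RUN steps):
docker pull gcr.io/$PROJECT_ID/built_images:latest
# List the directory the cmake build should have created inside the image.
docker run --rm gcr.io/$PROJECT_ID/built_images:latest ls /app/dist3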

Using docker to create CI server agents

I'm trying to set up a local GoCD CI server using docker for both the base server and agents. I can get everything running fine, but issues spring up when I try to make sure the agent containers have everything installed in them that I need to build my projects.
I want to preface this with I'm aware that I might not be using these technologies correctly, but I don't know much better atm. If there are better ways of doing things, I'd love to learn.
To start, I'm using the official GoCD docker image and that works just fine.
Creating a blank agent also works just fine.
However, one of my projects requires node, yarn and webpack to be built (good ol' react site).
Of course a standard agent container has nothing but the agent installed on it, so I've had a shot at using a Dockerfile to install all the tech I need to build my projects.
FROM gocd/gocd-agent-ubuntu-18.04:v19.11.0
SHELL ["/bin/bash", "-c"]
USER root
RUN apt-get update
RUN apt-get install -y git curl wget build-essential ca-certificates libssl-dev htop openjdk-8-jre python python-pip
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - && \
echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
RUN apt-get update && apt-get install -y yarn
# This user is created in the base agent image
USER go
ENV NVM_DIR /home/go/.nvm
ENV NODE_VERSION 10.17.0
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.1/install.sh | bash \
&& . $NVM_DIR/nvm.sh \
&& nvm install $NODE_VERSION \
&& nvm alias default $NODE_VERSION \
&& nvm use default \
&& npm install -g webpack webpack-cli
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/v$NODE_VERSION/bin:$PATH
This is the current version of this file, but I've been through many, many iterations of frustration where a globally installed npm package is never on the path and thus not conveniently available.
The docker build works fine; it's just that in this iteration of the Dockerfile, webpack is not found when the agent tries to run a build.
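One throwaway debugging step that has helped is a RUN line near the end that sources nvm first (sketch; I remove it once things work):
# Sanity check at build time: with nvm.sh sourced, node and webpack resolve,
# which makes me suspect my ENV PATH / NODE_PATH lines point at the wrong
# place (recent nvm versions install under $NVM_DIR/versions/node/v$NODE_VERSION).
RUN . $NVM_DIR/nvm.sh && nvm use default && which node && which webpack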
My question is:
Is a Dockerfile the right place to do things like install yarn, node, webpack etc... ?
If so, how can I ensure everything I install through npm is actually available?
If not, what are the current best practices about this?
Any help, thoughts and anecdotes are fully welcomed and appreciated!
Cheers~!
You should separate gocd-server and gocd-agent into different containers.
Pull images:
docker pull gocd/gocd-server:v18.10.0
docker pull gocd/gocd-agent-alpine-3.8:v18.10.0
Build and run them, and check that everything is OK. Then connect to a bash shell in the agent container:
docker exec -it gocd-agent bash
Install the binaries using the alpine package manager.
apk add --no-cache nodejs yarn
Then log out and update the container image (for example with docker commit). Now you have an image with the needed packages.
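Roughly, that last step could look like this (container and image names are just examples):
# Snapshot the modified agent container as a new image and publish it.
docker commit gocd-agent my-registry/gocd-agent-node:latest
docker push my-registry/gocd-agent-node:latest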
You have two options with GoCD agents.
The first one is that the agent uses Docker and creates other containers for whatever the pipeline needs. You can have a lot of agents with this option, and the rules or definitions live in the pipeline; the agent only executes.
The second one is an agent with every program you need already installed. I use this one. For this case, you use a Dockerfile with everything and generate the image for all the agents.
For example, I have an agent with gcloud, kubectl, SonarQube Scanner and JMeter, which tests with Sonar before the deploy, then deploys to GCP, and as a last step tests with JMeter after the deploy.

Docker-compose cannot find Java

I am trying to use a Python wrapper for a Java library called Tabula. I need both Python and Java available within my Docker container. I am using the openjdk:8 and python:3.5.3 images. I am trying to build the image using Docker Compose, but it returns the following message:
/bin/sh: 1: java: not found
when it reaches the line RUN java -version within the Dockerfile. The line RUN find / -name "java" also doesn't return anything, so I can't even find where Java is being installed in the Docker environment.
Here is my Dockerfile:
FROM python:3.5.3
FROM openjdk:8
FROM tailordev/pandas
RUN apt-get update && apt-get install -y \
python3-pip
# Create code directory
ENV APP_HOME /usr/src/app
RUN mkdir -p $APP_HOME/temp
WORKDIR /$APP_HOME
# Install app dependencies
ADD requirements.txt $APP_HOME
RUN pip3 install -r requirements.txt
# Copy source code
COPY *.py $APP_HOME/
RUN find / -name "java"
RUN java -version
ENTRYPOINT [ "python3", "runner.py" ]
How do I install Java within the Docker container so that the Python wrapper class can invoke Java methods?
This Dockerfile cannot work, because the multiple FROM statements at the beginning don't mean what you think they mean. They don't mean that all the contents of the images you're referring to in the FROM statements will somehow end up in the image you're building; the statement has actually meant two different things throughout the history of Docker:
In newer versions of Docker, multi-stage builds, which are a very different thing from what you're trying to achieve (but very interesting nonetheless).
In earlier versions of Docker, it simply gave you the ability to build multiple images in one Dockerfile.
The behavior you are describing makes me assume you are using such an earlier version. Let me explain what's actually happening when you run docker build on this Dockerfile:
FROM python:3.5.3
# Docker: "The User wants me to build an Image that is based on python:3.5.3. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image."

FROM openjdk:8
# Docker: "The User wants me to build an Image that is based on openjdk:8. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image."

FROM tailordev/pandas
# Docker: "The User wants me to build an Image that is based on tailordev/pandas. No Problem!"
# Docker: "A RUN Statement is coming up. I'll put this as a layer in the Image the User is asking me to build."
RUN apt-get update && apt-get install -y \
    python3-pip
...
# Docker: "EOF reached, nothing more to do!"
As you can see, this is not what you want.
What you should do instead is build a single image where you first install your runtimes (Python, Java, ...) and then your application-specific dependencies. The last two parts you're already doing; here's how you could go about installing the general dependencies:
# Let's start from the Alpine Java Image
FROM openjdk:8-jre-alpine
# Install Python runtime
RUN apk add --update \
python \
python-dev \
py-pip \
build-base \
&& pip install virtualenv \
&& rm -rf /var/cache/apk/*
# Install your framework dependencies
RUN pip install numpy scipy pandas
... do the rest ...
Note that I haven't tested the above snippet, you may have to adapt a few things.
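Once you have a single Dockerfile like that, the docker-compose side stays simple; something along these lines (the service name app is just an example):
# Rebuild the combined Python + Java image and run the wrapper through it.
docker-compose build app
docker-compose run --rm app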
