Trouble Building and Deploying a Docker Container to Cloud Run - docker

I've had a couple of Cloud Run services live for several months. However, when I attempted to make some updates to a service yesterday, the scripts I have been using since the beginning suddenly stopped functioning.
gcloud builds submit
I've been using the following command to build my node/npm project via the remote docker container:
gcloud builds submit --tag gcr.io/PROJECT_ID/generator
I have a Dockerfile and a .dockerignore in the same directory as my package.json, from where I run this script. However, yesterday I suddenly started getting an error saying that a Dockerfile is required when using the --tag parameter, and the image would not build.
Tentative Solution
After some research, I tried moving my build config into a gcloudbuild-staging.json, which looks like this:
{
  "steps": [
    {
      "name": "gcr.io/cloud-builders/docker",
      "args": [
        "build",
        "-t",
        "gcr.io/PROJECT_ID/generator",
        "."
      ]
    }
  ]
}
And I've changed my build script to:
gcloud builds submit --config=./gcloudbuild-staging.json
After doing this, the container builds, or at least appears to as far as I can tell. The console output looks like this:
------------------------------------------------- REMOTE BUILD OUTPUT --------------------------------------------------
starting build "8ca1af4c-d337-4349-959f-0000577e4528"
FETCHSOURCE
Fetching storage object: gs://PROJECT_ID/source/1650660913.623365-8a689bcf007749b7befa6e21ab9086dd.tgz#1650660991205773
Copying gs://PROJECT_ID/source/1650660913.623365-8a689bcf007749b7befa6e21ab9086dd.tgz#1650660991205773...
/ [0 files][ 0.0 B/ 22.2 MiB]
/ [1 files][ 22.2 MiB/ 22.2 MiB]
-
Operation completed over 1 objects/22.2 MiB.
BUILD
Already have image (with digest): gcr.io/cloud-builders/docker
Sending build context to Docker daemon 785.4kB
Step 1/6 : FROM node:14-slim
14-slim: Pulling from library/node
8bd3f5a20b90: Pulling fs layer
3a665e454db5: Pulling fs layer
11fcaa1377c4: Pulling fs layer
bf0a7233d366: Pulling fs layer
0d4d73621610: Pulling fs layer
bf0a7233d366: Waiting
0d4d73621610: Waiting
3a665e454db5: Verifying Checksum
3a665e454db5: Download complete
bf0a7233d366: Verifying Checksum
bf0a7233d366: Download complete
8bd3f5a20b90: Verifying Checksum
8bd3f5a20b90: Download complete
0d4d73621610: Verifying Checksum
0d4d73621610: Download complete
11fcaa1377c4: Verifying Checksum
11fcaa1377c4: Download complete
8bd3f5a20b90: Pull complete
3a665e454db5: Pull complete
11fcaa1377c4: Pull complete
bf0a7233d366: Pull complete
0d4d73621610: Pull complete
Digest: sha256:9ea3dfdff723469a060d1fa80577a090e14ed28157334d649518ef7ef8ba5b9b
Status: Downloaded newer image for node:14-slim
---> 913d072dc4d9
Step 2/6 : WORKDIR /usr/src/app
---> Running in 96bc104b9501
Removing intermediate container 96bc104b9501
---> 3b1b05ea0470
Step 3/6 : COPY package*.json ./
---> a6eca4a75ddd
Step 4/6 : RUN npm ci --only=production
---> Running in 7e870db13a9b
> protobufjs@6.11.2 postinstall /usr/src/app/node_modules/protobufjs
> node scripts/postinstall
added 237 packages in 7.889s
Removing intermediate container 7e870db13a9b
---> 6a86cc961a09
Step 5/6 : COPY . ./
---> 9e1f0f7a69a9
Step 6/6 : CMD [ "node", "index.js" ]
---> Running in d1b4d054a974
Removing intermediate container d1b4d054a974
---> 672075ef5897
Successfully built 672075ef5897
Successfully tagged gcr.io/PROJECT_ID/generator:latest
PUSH
DONE
------------------------------------------------------------------------------------------------------------------------
ID CREATE_TIME DURATION SOURCE IMAGES STATUS
8ca1af4c-d337-4349-959f-0000577e4528 2022-04-22T20:56:31+00:00 31S gs://PROJECT_ID/source/1650660913.623365-8a689bcf007749b7befa6e21ab9086dd.tgz - SUCCESS
There are no errors in the online logs.
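For reference, one way to double-check what the build actually pushed is to list the tags and digests currently in the registry for that image:
# Sanity check: list the most recent tags/digests for the image in the registry
gcloud container images list-tags gcr.io/PROJECT_ID/generator --limit=5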
gcloud run deploy
Here is the command I use to deploy the container:
gcloud run deploy generator --image gcr.io/PROJECT_ID/generator --region=us-central1 --set-env-vars ENVIRONMENT=DEV
The console output for this is:
Deploying container to Cloud Run service [generator] in project [PROJECT_ID] region [us-central1]
✓ Deploying... Done.
✓ Creating Revision...
✓ Routing traffic...
Done.
Service [generator] revision [generator-00082-kax] has been deployed and is serving 100 percent of traffic.
Service URL: https://generator-SERVICE_ID-uc.a.run.app
No errors in the run console, either. It shows the deployment as if everything is fine.
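For reference, these commands can confirm which revision is serving and which container image (including its digest) that revision points at:
gcloud run revisions list --service=generator --region=us-central1
gcloud run services describe generator --region=us-central1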
The Problem
Nothing is changing. Locally, running this service with the front-end app that accesses it produces successful results. However, my staging version of the app, hosted on Firebase, is still acting as if the old version of the code is active.
What I've Tried
I've made sure I'm testing and deploying on the same git branch.
I've done multiple builds and deployments in case there was some kind of fluke.
I've tried using the gcloud command to update the service's traffic to its latest revision (a minimal example follows this list).
I've made sure my client app is using the correct service URL. It doesn't appear to have changed, but I copy/pasted it anyway just in case.
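A minimal form of that traffic command, for reference:
gcloud run services update-traffic generator --to-latest --region=us-central1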
My last successful deployment was on March 19, 2022. Since then, the only thing I've done is update all my WSL Linux apps, which would include gcloud. I couldn't tell what version I was on before, but I'm now on 38.0.0 of the Google Cloud CLI.
I've tried searching for my issue but nothing relevant is coming up. I'm totally stumped as to why all of this has stopped working and I'm receiving no errors whatsoever. Any suggestions or further info I can provide?

gcloud builds submit should (!?) continue to work with --tag as long as there is a Dockerfile in the folder from which you're running the command or you explicitly specify a source folder.
I'm not disputing that you received an error but it would be helpful to see the command you used and the error that resulted. You shouldn't have needed to switch to a build config file. Although that isn't the problem.
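For example, explicitly passing the source directory alongside --tag should work, assuming the Dockerfile sits in that directory:
gcloud builds submit . --tag gcr.io/PROJECT_ID/generator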
Using latest as a tag value is challenging. The term suggests that the latest version of a container image will be used but this is often not what happens. It is particularly challenging when a service like Cloud Run is running an image tagged latest and a developer asks the service to run -- what the developer knows (!) is a different image -- but also tagged latest.
As far as most services are concerned, same tag means same image and so it's possible (!) either that Cloud Run is not finding a different image or you're not providing it with a different image. I'm unclear which alternative is occurring but I'm confident that your use of latest is causing some of your issues.
So, for starters, please consider using a system in which every time you create a new container, you tag it with a unique identifier. A common way to do this is to use a commit hash (as these change with every commit). Alternatively you can use the container's digest (instead of a tag) to reference an image version. This requires image references of the form {IMG}@sha256:{HASH}.
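A minimal sketch of that approach, assuming the same image and service names as above:
# Tag each build with the current commit hash instead of :latest
TAG=$(git rev-parse --short HEAD)
gcloud builds submit --tag "gcr.io/PROJECT_ID/generator:${TAG}"
gcloud run deploy generator \
  --image "gcr.io/PROJECT_ID/generator:${TAG}" \
  --region=us-central1 --set-env-vars ENVIRONMENT=DEV
# Or deploy by digest: --image gcr.io/PROJECT_ID/generator@sha256:<digest>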
Lastly, gcloud run now supports (and may have always supported) deployment from a source folder to a running service: it runs the Cloud Build process for you and deploys the result to Cloud Run. It may be worth using this flow to reduce your steps and thereby the possibility of error.
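A sketch of that flow, assuming the same service name and region:
# Build and deploy straight from the source folder (Cloud Build runs behind the scenes)
gcloud run deploy generator --source . --region=us-central1 --set-env-vars ENVIRONMENT=DEV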
See: Deploying from source code

Related

Can Kaniko take snapshots per stage? (Not per RUN or COPY operation)

I have a Dockerfile, and I'm building a container image on Google Cloud Build (GCP) using Kaniko.
About my Dockerfile
The Dockerfile has 4 stages (Multi-Stage Build).
And there are 13 RUN or COPY steps in the Dockerfile.
Current build speed.
Kaniko on GCP
Full build on Kaniko: about 10 minutes.
Rebuild without code change: about 3~4 minutes
docker build on my local Mac
Full build: About 6 min 58 sec.
Rebuild without code change: 3.48 sec.
Question
I'd like to reduce the amount of cache pulling and cache saving, if Kaniko can do that.
Kaniko doesn't appear to have an option to take snapshots per docker build stage (rather than per step).
https://github.com/GoogleContainerTools/kaniko/blob/master/README.md
Does anyone know a solution?
Otherwise, do you have any ideas about suppressing the cache pulling/saving overhead?
I had a similar issue where Kaniko was taking a snapshot of the full filesystem after every Docker command, with some snapshots taking upwards of ~5 minutes each, which was too long:
2022-11-04T09:09:38+13:00 INFO[0170] Taking snapshot of full filesystem...
2022-11-04T09:09:38+13:00 DEBU[0170] Current image filesystem: map[ ***REMOVED LARGE LOGS**** ]
***~5mins***
2022-11-04T09:14:43+13:00 DEBU[0000] Getting source context from dir://
I tried the parameter mentioned:
--single-snapshot
which did skip the intermediate snapshots, saving tons of time:
DEBU[0169] Build: skipping snapshot for [RUN ls -la /workspace/appHelloWorld]
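For reference, a rough sketch of a Kaniko invocation combining layer caching with a single final snapshot; the image names and cache TTL are placeholders, and pushing requires registry credentials to be available inside the container:
docker run \
  -v "$PWD":/workspace \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace \
  --dockerfile /workspace/Dockerfile \
  --destination gcr.io/PROJECT_ID/my-image:latest \
  --cache=true \
  --cache-ttl=6h \
  --single-snapshot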

Bitbucket pipelines: Why does the pipeline not seem to be using my custom docker image?

In my pipelines yml file, I specify a custom image to use from my AWS ECR repository. When the pipeline runs, the "Build setup" logs suggests that the image was pulled in and used without issue:
Images used:
build : 123456789.dkr.ecr.ca-central-1.amazonaws.com/my-image@sha256:346c49ea675d8a0469ae1ddb0b21155ce35538855e07a4541a0de0d286fe4e80
I had worked through some issues locally relating to having my Cypress E2E test suite run properly in the container. Having fixed those issues, I expected everything to run the same in the pipeline. However, looking at the pipeline logs it seems that it was being run with an image other than the one I specified (I suspect it's using the Atlassian default image). Here is the source of my suspicion:
STDERR: /opt/atlassian/pipelines/agent/build/packages/server/node_modules/.cache/mongodb-memory-server/mongodb-binaries/4.0.14/mongod: /usr/lib/x86_64-linux-gnu/libcurl.so.4: version `CURL_OPENSSL_3' not found (required by /opt/atlassian/pipelines/agent/build/packages/server/node_modules/.cache/mongodb-memory-server/mongodb-binaries/4.0.14/mongod)
I know the working directory of the default Atlassian image is "/opt/atlassian/pipelines/agent/build/". Is there a reason that this image would be used and not the one I specified? Here is my pipelines config:
image:
  name: 123456789.dkr.ecr.ca-central-1.amazonaws.com/my-image:1.4
  aws:
    access-key: $AWS_ACCESS_KEY_ID
    secret-key: $AWS_SECRET_ACCESS_KEY

cypress-e2e: &cypress-e2e
  name: "Cypress E2E tests"
  caches:
    - cypress
    - nodecustom
    - yarn
  script:
    - yarn pull-dev-secrets
    - yarn install
    - $(npm bin)/cypress verify || $(npm bin)/cypress install && $(npm bin)/cypress verify
    - yarn build:e2e
    - MONGOMS_DEBUG=1 yarn start:e2e && yarn workspace e2e e2e:run
  artifacts:
    - packages/e2e/cypress/screenshots/**
    - packages/e2e/cypress/videos/**

pipelines:
  custom:
    cypress-e2e:
      - step:
          <<: *cypress-e2e
For anyone who happens to stumble across this, I suspect that the repository is mounted into the pipeline container at "/opt/atlassian/pipelines/agent/build" rather than the working directory specified in the image. I ran a "pwd" which gave "/opt/atlassian/pipelines/agent/build", though I also ran a "cat /etc/os-release" which led me to the conclusion that it was in fact running the image I specified. I'm still not entirely sure why, even testing everything locally in the exact same container, I was getting that error.
For posterity: I was using an in-memory mongo database from this project "https://github.com/nodkz/mongodb-memory-server". It generally works by automatically downloading a mongod executable into your node_modules and using it to spin up a mongo instance. I was running into a similar error locally, which I fixed by upgrading my base image from a Debian 9 to a Debian 10 based image. Again, still not sure why it didn't run the same in the pipeline, I suppose there might be some peculiarities with how containers are run in pipelines that I'm unaware of. Ultimately my solution was installing mongod into the image itself, and forcing mongodb-memory-server to use that executable rather than the one in node_modules.
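A rough sketch of that final workaround, with the package and path details as assumptions rather than a verified recipe: install mongod into the custom image and point mongodb-memory-server at it instead of the downloaded binary:
# In the image build (Debian-based; the exact apt repo/package setup is an assumption):
apt-get update && apt-get install -y mongodb-org-server
# In the pipeline step, tell mongodb-memory-server to use the system binary
# instead of the one it downloads into node_modules:
export MONGOMS_SYSTEM_BINARY=/usr/bin/mongod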

Docker build not using cache when copying Gemfile while using --cache-from

On my local machine, I have built the latest image, and running another docker build uses cache everywhere it should.
Then I upload the image to the registry as the latest, and then on my CI server, I'm pulling the latest image of my app in order to use it as the build cache to build the new version:
docker pull $CONTAINER_IMAGE:latest
docker build --cache-from $CONTAINER_IMAGE:latest \
--tag $CONTAINER_IMAGE:$CI_COMMIT_SHORT_SHA \
.
From the build output, we can see that the COPY of the Gemfile is not using the cache from the latest image, even though I haven't updated that file:
Step 15/22 : RUN gem install bundler -v 1.17.3 && ln -s /usr/local/lib/ruby/gems/2.2.0/gems/bundler-1.16.0 /usr/local/lib/ruby/gems/2.2.0/gems/bundler-1.16.1
---> Using cache
---> 47a9ad7747c6
Step 16/22 : ENV BUNDLE_GEMFILE=$APP_HOME/Gemfile BUNDLE_JOBS=8
---> Using cache
---> 1124ad337b98
Step 17/22 : WORKDIR $APP_HOME
---> Using cache
---> 9cd742111641
Step 18/22 : COPY Gemfile $APP_HOME/
---> f7ff0ee82ba2
Step 19/22 : COPY Gemfile.lock $APP_HOME/
---> c963b4c4617f
Step 20/22 : RUN bundle install
---> Running in 3d2cdf999972
Side note: it is working perfectly on my local machine.
The Docker documentation (Leverage build cache) doesn't seem to explain the behaviour here, as neither the Dockerfile nor the Gemfile has changed, so the cache should be used.
What could prevent Docker from using the cache for the Gemfile?
Update
I tried copying the files while setting the right permissions using COPY --chown=user:group source dest, but it still doesn't use the cache.
Opened Docker forum topic: https://forums.docker.com/t/docker-build-not-using-cache-when-copying-gemfile-while-using-cache-from/69186
Let me share with you some information that helped me to fix some issues with Docker build and --cache-from, while optimizing a CI build.
I had struggled for several days because I didn't have the correct understanding; I was basing myself on incorrect explanations found on the web.
So I'm sharing this here hoping it will be useful to you.
When providing multiple --cache-from, the order matters
The order is very important, because at the first match, Docker will stop looking for other matches and it will use that one for all the rest of the commands.
This is explained by the person who implemented the feature in the Github PR:
When using multiple --cache-from they are checked for a cache hit in the order that user specified. If one of the images produces a cache hit for a command only that image is used for the rest of the build.
There is also a lengthier explanation in the initial ticket proposal:
Specifying multiple --cache-from images is bit problematic. If both images match there is no way(without doing multiple passes) to figure out what image to use. So we pick the first one(let user control the priority) but that may not be the longest chain we could have matched in the end. If we allow matching against one image for some commands and later switch to a different image that had a longer chain we risk in leaking some information between images as we only validate history and layers for cache. Currently I left it so that if we get a match we only use this target image for rest of the commands.
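For example, with two cache sources the more specific tag can be listed first so it takes priority; the branch-tagged image here is an assumption, not something from the question:
docker build \
  --cache-from "$CONTAINER_IMAGE:$CI_COMMIT_REF_SLUG" \
  --cache-from "$CONTAINER_IMAGE:latest" \
  --tag "$CONTAINER_IMAGE:$CI_COMMIT_SHORT_SHA" \
  .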
Using --cache-from is exclusive: the local Docker cache won't be used
This means that it doesn't add new caching sources: the image tags you provide will be the only caching sources for the Docker build.
Even if you just built the same image locally, the next time you run docker build for it, in order to benefit from the cache, you need to either:
provide the correct tag(s) with --cache-from (and with the correct precedence); or
not use --cache-from at all (so that it will use the local build cache)
If the parent image changes, the cache will be invalidated
For example, if you have an image based on docker:stable, and docker:stable gets updated, the cached builds of your image will not be valid anymore as the layers of the base image were changed.
This is why, if you're configuring a CI build, it can be useful to docker pull the base image as well and include it in the --cache-from, as mentioned in this comment in yet another Github discussion.
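A sketch of that CI pattern, where the base image name is just an example standing in for whatever your Dockerfile's FROM uses:
docker pull ruby:2.6-slim || true
docker pull "$CONTAINER_IMAGE:latest" || true
docker build \
  --cache-from ruby:2.6-slim \
  --cache-from "$CONTAINER_IMAGE:latest" \
  --tag "$CONTAINER_IMAGE:$CI_COMMIT_SHORT_SHA" \
  .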
I struggled with this problem, and in my case I used COPY when the checksum might have changed (but only technically; the content was functionally identical). So I worked around it this way:
Dockerfile:
ARG builder_image=base-builder
# Compilation/build stage
FROM golang:1.16 AS base-builder
RUN echo "build the app" > /go/app
# This step is required to facilitate the Docker cache. With the definition of a `builder_image` build arg
# we can essentially skip the build stage and use a prebuilt image directly.
FROM $builder_image AS builder
# myapp docker image
FROM ubuntu:20.04 AS myapp
COPY --from=builder /go/app /opt/my-app/bin/
Then, I can run the following:
# build cache
DOCKER_BUILDKIT=1 docker build --target base-builder -t myapp-builder .
docker push myapp-builder
# use cache
DOCKER_BUILDKIT=1 docker build --target myapp --build-arg=builder_image=myapp-builder -t myapp .
docker push myapp
This way we can force Docker to use a prebuilt image as a cache.
For whoever is fighting with Docker Hub automated builds and --cache-from: I realized images built by Docker Hub would always lead to a cache bust on COPY commands when pulled and used as a build cache source. It seems to also be the case for @Marcelo (see his comment).
I investigated by creating a very simple image doing a couple of RUN commands and a later COPY. Everything uses the cache except the COPY, even though the content and permissions of the file being copied are the same in both the pulled image and the one built locally (verified via sha1sum and ls -l).
The solution for me was to publish the image to the registry from the CI (Travis in my case) rather than letting the Docker Hub automated build do it. Let me emphasize that I'm talking about a specific case where the files are definitely the same and should not bust the cache, but you're using Docker Hub automated builds.
I'm not sure why that is, but I know, for instance, that old docker-engine versions (prior to 1.8.0) didn't ignore file timestamps when deciding whether to use the cache; see https://docs.docker.com/release-notes/docker-engine/#180-2015-08-11 and https://github.com/moby/moby/pull/12031.
For a COPY command to be cached, the checksum needs to be identical on the source being copied. You can compare the checksum in the docker history output between the cache image and the one you just built. Most importantly, the checksum includes metadata like the file owner and file permissions, in addition to the file contents. Whitespace changes inside a file, like changing line endings between Linux and Windows styles, will also affect this. If you download the code from a repo, it's likely the metadata, like the owner, will be different from the cached value.
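One way to compare, assuming the tags used earlier in this thread: inspect both images' layer history and look at the COPY rows, whose CREATED BY entries embed the file checksum:
docker history --no-trunc "$CONTAINER_IMAGE:latest"
docker history --no-trunc "$CONTAINER_IMAGE:$CI_COMMIT_SHORT_SHA"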

How to write a Dockerfile to properly pull code from my GitHub

I'm working on building a website in Go, which is hosted on my home server via docker.
What I'm trying to do:
I make changes to my website/server locally, then push them to GitHub. I'd like to write a Dockerfile that pulls this code from my GitHub repo and builds the image, which my docker-compose file will then use to create the container.
Unfortunately, all of my attempts have been somewhat close but wrong.
FROM golang:1.8-onbuild
MAINTAINER <my info>
RUN go get <my github url>
ENV webserver_path /website/
ENV PATH $PATH: webserver_path
COPY website/ .
RUN go build .
ENTRYPOINT ./website
EXPOSE <ports>
This file is kind of a combination of a few small guides I found through google searches, but none quite gave me the information I needed and it never quite worked.
I'm hoping somebody with decent docker experience can just put a Dockerfile together for me to use as a guide so I can find what I'm doing wrong? I think what I'm looking for can be done in only a few lines, and mine is a little more verbose than needed.
ADDITIONAL BUT PROBABLY UNNECESSARY INFORMATION BELOW
Project layout:
Data: where my Go files are. Side note: this was throwing errors when trying to build the image, something about not being in the environment path. Not sure if that is helpful.
Static: CSS, JS, Images
TPL: go template files
Main.go: launches server/website
There are several strategies:
Using a pre-built app. Build your app with the
go build command for the target system architecture and OS (using the GOOS and GOARCH environment variables, for example), then use the COPY Docker instruction to move the built binary (with assets and templates) into your WORKDIR, and finally run it via CMD or ENTRYPOINT (the latter is preferable). The Dockerfile for this example will look like:
FROM scratch
ENV PORT 8000
EXPOSE $PORT
COPY advent /
CMD ["/advent"]
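Hedged companion commands for that approach: the binary name advent matches the sample above, the image name my-website is a placeholder, and CGO_ENABLED=0 keeps the binary static so it can run on scratch:
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o advent .
docker build -t my-website .
docker run -p 8000:8000 my-website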
Building inside the Dockerfile. A typical Dockerfile:
# Start from a Debian image with the latest version of Go installed
# and a workspace (GOPATH) configured at /go.
FROM golang
# Copy the local package files to the container's workspace.
ADD . /go/src/github.com/golang/example/outyet
# Build the outyet command inside the container.
# (You may fetch or manage dependencies here,
# either manually or with a tool like "godep".)
RUN go install github.com/golang/example/outyet
# Run the outyet command by default when the container starts.
ENTRYPOINT /go/bin/outyet
# Document that the service listens on port 8080.
EXPOSE 8080
Using GitHub. Build your app and push it to Docker Hub as a ready-to-use image.
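An illustrative local workflow for that option (the Docker Hub repository name is a placeholder):
docker build -t mydockerhubuser/website:latest .
docker push mydockerhubuser/website:latest
# later, on the home server:
docker pull mydockerhubuser/website:latest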
Github supports Webhooks which can be used to do all sorts of things automagically when you push to a git repo. Since you're already running a web server on your home box, why don't you have Github send a POST request to that when it receives a commit on master and have your home box re-download the git repo and restart web services from that?
I was able to solve my issue by creating an automated build through Docker Hub and just using this for my Dockerfile:
FROM golang:onbuild
EXPOSE <ports>
It isn't exactly the correct answer to my question, but it is an effective workaround. The automated build connects with my GitHub repo the way I was hoping my Dockerfile would.

wercker-cli - copying source to container takes a few minutes - how to make it faster

I'm running a PHP/MySQL/Laravel project in Wercker to perform PHPUnit tests.
I installed the wercker CLI and Docker on my MacBook.
I'm able to run it in exactly the same way as on remote wercker.com; however, locally it takes much longer than remotely.
The longest step is when the sources are copied to the container. Is there any way to bypass or cache this step?
Disk SSD, 3GB reserved for Docker.
What exactly is this step doing?
wercker build --expose-ports
--> No Docker host specified, checking: /var/run/docker.sock
--> Executing pipeline
--> Running step: setup environment
Pulling from library/php: 7.1-fpm
Digest: sha256:2e94b90aa3...f3b355fb
Status: Image is up to date for php:7.1-fpm
--> Copying source to container
I had the same issue a few days ago on a Node stack, with a lot of dependencies in node_modules.
The solution I found to make it faster was to clone my git repository into a fresh path, NOT install the dependencies, and run Wercker from there.
I went from ~2 minutes of copying to <1s ^_^
NOTE: I think that a .werckerignore file should also do the job.
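A sketch of that fresh-clone approach, with the repository URL and path as placeholders:
git clone --depth 1 git@github.com:me/my-project.git /tmp/wercker-src
cd /tmp/wercker-src            # no vendor/ or node_modules/ here yet
wercker build --expose-ports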
