Building a Rust project with Docker is extremely slow on Google Cloud

I'm relatively new to Rust, but I've been working on a project inside a Docker container. Below is my Dockerfile, and it works great. My build uses an intermediary container to build all the Cargo dependencies before the main project. Unless I update a dependency, the project builds very quickly locally. Even when the dependencies get rebuilt, it takes no more than 10 minutes on my old MacBook Pro.
FROM ekidd/rust-musl-builder as builder
WORKDIR /home/rust/
# Avoid having to install/build all dependencies by copying
# the Cargo files and making a dummy src/main.rs
COPY Cargo.toml .
COPY Cargo.lock .
RUN echo "fn main() {}" > src/main.rs
RUN cargo test
RUN cargo build --release
# We need to touch our real main.rs file or else docker will use
# the cached one.
COPY . .
RUN sudo touch src/main.rs
RUN cargo test
RUN cargo build --release
# Size optimization
RUN strip target/x86_64-unknown-linux-musl/release/project-name
# Start building the final image
FROM scratch
WORKDIR /home/rust/
COPY --from=builder /home/rust/target/x86_64-unknown-linux-musl/release/project-name .
ENTRYPOINT ["./project-name"]
However, when I set up my project to build automatically from the GitHub repo via Google Cloud Build, I was shocked to see builds taking almost 45 minutes! I figured that if I got caching set up properly for the intermediary container, that would at least shave some time off. Even though the builder successfully pulls the cached image, it doesn't seem to use it and always builds the intermediary container from scratch. Here is my cloudbuild.yaml:
steps:
  - name: gcr.io/cloud-builders/docker
    args:
      - "-c"
      - >-
        docker pull $_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest
        || exit 0
    id: Pull
    entrypoint: bash
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - "-t"
      - "$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest"
      - "--cache-from"
      - "$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest"
      - .
      - "-f"
      - Dockerfile
    id: Build
  - name: gcr.io/cloud-builders/docker
    args:
      - push
      - "$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest"
    id: Push
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    args:
      - run
      - services
      - update
      - $_SERVICE_NAME
      - "--platform=managed"
      - "--image=$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest"
      - >-
        --labels=managed-by=gcp-cloud-build-deploy-cloud-run,commit-sha=$COMMIT_SHA,gcb-build-id=$BUILD_ID,gcb-trigger-id=$_TRIGGER_ID,$_LABELS
      - "--region=$_DEPLOY_REGION"
      - "--quiet"
    id: Deploy
    entrypoint: gcloud
timeout: 3600s
images:
  - "$_GCR_HOSTNAME/$PROJECT_ID/$REPO_NAME/$_SERVICE_NAME:latest"
options:
  substitutionOption: ALLOW_LOOSE
I'm looking for any info about what I'm doing wrong in my cloudbuild.yaml, and for tips on how to speed up my cloud builds, considering they are so fast locally. Ideally I'd like to stick with Google Cloud, but if another CI service handles Rust/Docker builds like this better, I'd be open to switching.

This is what I did to improve build times for Rust projects on Google Cloud Build. It's not a perfect solution, but better than nothing:
Made changes to the Dockerfile similar to yours, to create separate cache layers for dependencies and my own sources.
Used kaniko to leverage caching (this seems to be your particular issue):
steps:
  - name: 'gcr.io/kaniko-project/executor:latest'
    args:
      - --destination=eu.gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA
      - --cache=true
      - --cache-ttl=96h
timeout: 2400s
Docs: https://cloud.google.com/build/docs/kaniko-cache
Changed the machine type to a larger option; in my case:
options:
  machineType: 'E2_HIGHCPU_8'
Be careful, though: changing the machine type affects your costs, so consider whether it is worth it for your particular project.
If you push your changes frequently, this works much better, though to be honest it is still not good enough.

There are two things to consider in terms of speed:
On your (even old) MacBook Pro:
You have a multi-core, hyper-threaded CPU
The CPU can go up to 3.5 GHz in turbo mode
On Cloud Build:
You have only one vCPU per build (by default)
The vCPUs are server-class CPUs: not high-end performance, but stable and consistent, at around 2.1 GHz (slightly more in turbo mode)
So the performance difference is obvious. To speed up your build, I recommend using the machineType option:
...
...
options:
  substitutionOption: ALLOW_LOOSE
  machineType: 'E2_HIGHCPU_8'
It should be better!

Related

Bitbucket pipelines: Why does the pipeline not seem to be using my custom docker image?

In my pipelines yml file, I specify a custom image to use from my AWS ECR repository. When the pipeline runs, the "Build setup" logs suggest that the image was pulled in and used without issue:
Images used:
    build : 123456789.dkr.ecr.ca-central-1.amazonaws.com/my-image@sha256:346c49ea675d8a0469ae1ddb0b21155ce35538855e07a4541a0de0d286fe4e80
I had worked through some issues locally relating to having my Cypress E2E test suite run properly in the container. Having fixed those issues, I expected everything to run the same in the pipeline. However, looking at the pipeline logs it seems that it was being run with an image other than the one I specified (I suspect it's using the Atlassian default image). Here is the source of my suspicion:
STDERR: /opt/atlassian/pipelines/agent/build/packages/server/node_modules/.cache/mongodb-memory-server/mongodb-binaries/4.0.14/mongod: /usr/lib/x86_64-linux-gnu/libcurl.so.4: version `CURL_OPENSSL_3' not found (required by /opt/atlassian/pipelines/agent/build/packages/server/node_modules/.cache/mongodb-memory-server/mongodb-binaries/4.0.14/mongod)
I know the working directory of the default Atlassian image is "/opt/atlassian/pipelines/agent/build/". Is there a reason that this image would be used and not the one I specified? Here is my pipelines config:
image:
  name: 123456789.dkr.ecr.ca-central-1.amazonaws.com/my-image:1.4
  aws:
    access-key: $AWS_ACCESS_KEY_ID
    secret-key: $AWS_SECRET_ACCESS_KEY
cypress-e2e: &cypress-e2e
  name: "Cypress E2E tests"
  caches:
    - cypress
    - nodecustom
    - yarn
  script:
    - yarn pull-dev-secrets
    - yarn install
    - $(npm bin)/cypress verify || $(npm bin)/cypress install && $(npm bin)/cypress verify
    - yarn build:e2e
    - MONGOMS_DEBUG=1 yarn start:e2e && yarn workspace e2e e2e:run
  artifacts:
    - packages/e2e/cypress/screenshots/**
    - packages/e2e/cypress/videos/**
pipelines:
  custom:
    cypress-e2e:
      - step:
          <<: *cypress-e2e
For anyone who happens to stumble across this, I suspect that the repository is mounted into the pipeline container at "/opt/atlassian/pipelines/agent/build" rather than the working directory specified in the image. I ran a "pwd" which gave "/opt/atlassian/pipelines/agent/build", though I also ran a "cat /etc/os-release" which led me to the conclusion that it was in fact running the image I specified. I'm still not entirely sure why, even testing everything locally in the exact same container, I was getting that error.
For posterity: I was using an in-memory Mongo database from this project: "https://github.com/nodkz/mongodb-memory-server". It generally works by automatically downloading a mongod executable into your node_modules and using it to spin up a Mongo instance. I was running into a similar error locally, which I fixed by upgrading my base image from a Debian 9 based image to a Debian 10 based one. Again, I'm still not sure why it didn't run the same in the pipeline; I suppose there are some peculiarities in how containers are run in Pipelines that I'm unaware of. Ultimately, my solution was to install mongod into the image itself and force mongodb-memory-server to use that executable rather than the one in node_modules.
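A minimal sketch of that last workaround as a Dockerfile fragment. The base image tag and the /usr/bin/mongod path are assumptions for illustration, and the step that installs mongod is left out because it depends on the distribution. MONGOMS_SYSTEM_BINARY is the environment variable that mongodb-memory-server reads to use an existing binary instead of downloading one:

```dockerfile
# Assumed Debian 10 based Node image; substitute your own base image.
FROM node:12-buster

# ... install mongod here (distribution-specific, omitted) ...

# Point mongodb-memory-server at the system binary (path is an assumption)
ENV MONGOMS_SYSTEM_BINARY=/usr/bin/mongod
```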

How to write a Dockerfile to properly pull code from my GitHub

I'm working on building a website in Go, which is hosted on my home server via docker.
What I'm trying to do:
I make changes to my website/server locally, then push them to github. I'd like to write a dockerfile such that it pulls this data from my github, builds the image, which my docker-compose file will then use to create the container.
Unfortunately, all of my attempts have been somewhat close but wrong.
FROM golang:1.8-onbuild
MAINTAINER <my info>
RUN go get <my github url>
ENV webserver_path /website/
ENV PATH $PATH: webserver_path
COPY website/ .
RUN go build .
ENTRYPOINT ./website
EXPOSE <ports>
This file is kind of a combination of a few small guides I found through google searches, but none quite gave me the information I needed and it never quite worked.
I'm hoping somebody with decent docker experience can just put a Dockerfile together for me to use as a guide so I can find what I'm doing wrong? I think what I'm looking for can be done in only a few lines, and mine is a little more verbose than needed.
ADDITIONAL BUT PROBABLY UNNECESSARY INFORMATION BELOW
Project layout:
Data: where my Go files are. Sidenote: this was throwing errors when I tried to build the image, something about not being in the environment path. Not sure if that is helpful.
Static: CSS, JS, Images
TPL: go template files
Main.go: launches server/website
There are several strategies:
Use a pre-built app. Build your app with the go build command for the target architecture and OS (for example, using the GOOS and GOARCH environment variables), then use the COPY instruction to move the built binary (with assets and templates) into your WORKDIR, and finally run it via CMD or ENTRYPOINT (the latter is preferable). The Dockerfile for this example looks like:
FROM scratch
ENV PORT 8000
EXPOSE $PORT
COPY advent /
CMD ["/advent"]
Build with a Dockerfile. A typical Dockerfile:
# Start from a Debian image with the latest version of Go installed
# and a workspace (GOPATH) configured at /go.
FROM golang
# Copy the local package files to the container's workspace.
ADD . /go/src/github.com/golang/example/outyet
# Build the outyet command inside the container.
# (You may fetch or manage dependencies here,
# either manually or with a tool like "godep".)
RUN go install github.com/golang/example/outyet
# Run the outyet command by default when the container starts.
ENTRYPOINT /go/bin/outyet
# Document that the service listens on port 8080.
EXPOSE 8080
Use GitHub. Build your app and push it to Docker Hub as a ready-to-use image.
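The first two strategies can also be combined with a multi-stage build (Docker 17.05 or later), so that the binary is compiled inside a golang image and only the result is copied into a scratch image. This is a sketch, not taken from the answer: the /go/src/app path is an assumption, the advent binary name is borrowed from the example above, and it assumes a main package with no external dependencies.

```dockerfile
FROM golang:1.8 AS build
WORKDIR /go/src/app
COPY . .
# CGO_ENABLED=0 produces a static binary that can run in a scratch image
RUN CGO_ENABLED=0 GOOS=linux go build -o /advent .

FROM scratch
COPY --from=build /advent /advent
# Copy assets and templates too if the app needs them, for example:
# COPY --from=build /go/src/app/static /static
ENTRYPOINT ["/advent"]
```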
Github supports Webhooks which can be used to do all sorts of things automagically when you push to a git repo. Since you're already running a web server on your home box, why don't you have Github send a POST request to that when it receives a commit on master and have your home box re-download the git repo and restart web services from that?
I was able to solve my issue by just creating an automated build through docker hub, and just using this for my dockerfile:
FROM golang:onbuild
EXPOSE <ports>
It isn't exactly the correct answer to my question, but it is an effective workaround. The automated build connects with my github repo the way I was hoping my dockerfile would.
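For reference, a Dockerfile that fetches the code from GitHub during the build could look roughly like the sketch below. The import path github.com/example/website and port 8080 are placeholders, not values from the question. Note that Docker caches the go get layer, so a rebuild only picks up new commits when it is run with --no-cache.

```dockerfile
FROM golang:1.8
# Placeholder import path; go get clones the repo into
# /go/src/github.com/example/website and installs the binary into /go/bin
RUN go get github.com/example/website
# Run from the source directory so relative paths to templates
# and static files resolve
WORKDIR /go/src/github.com/example/website
EXPOSE 8080
ENTRYPOINT ["/go/bin/website"]
```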

How to use big file only to build the container without adding it?

I have a big tar/executable (over 30 GB) that I COPY/ADD into the image, but it is used only for the installation. Once the application is installed, I don't need it anymore.
How can I do this? I have tried, but:
Every time I run a build, it takes minutes to send the build context.
I'd like to share this image. If I create a tar with docker save, does it include only the final version or every layer?
I found some solutions that suggest RUN wget tar ... && rm tar, but I don't want to set up a web server for that.
Why isn't it possible to mount a volume during the build process?! It would be very useful.
Use Docker's multi-stage builds. This mechanism allows you to drop intermediate artifacts and therefore achieve a lightweight image.
Example:
FROM alpine:latest as build
# copy large file
# build
FROM alpine:latest as output
# copy necessary files built in the previous stage
COPY --from=build app /app
Anything built in the build stage will not be included in the final image, unless you explicitly COPY them.
Docs: https://docs.docker.com/develop/develop-images/multistage-build/
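Filled in a little, the skeleton above might look like this. The file name big-installer.tar and the /opt/app install path are hypothetical, and the unpack step stands in for whatever the real installation does. The large file still has to be sent as part of the build context, so this keeps it out of the final image but does not shorten the context transfer.

```dockerfile
FROM alpine:latest AS build
# The large archive only ever exists in this stage
COPY big-installer.tar /tmp/big-installer.tar
RUN mkdir -p /opt/app \
    && tar -xf /tmp/big-installer.tar -C /opt/app

FROM alpine:latest AS output
# Only the installed files are copied; the archive is left behind
COPY --from=build /opt/app /opt/app
```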
This is solvable using two different build contexts.
Please follow the steps below.
The objective is to create:
a Docker image that contains your large build file.
a Docker image that contains your real codebase/executables.
For this you have to create two folders (BUILD and CodeBase) as follows.
Application
|---> BUILD
|      |---> Large-File
|      |---> Dockerfile
|---> CodeBase
|      |---> SRC + other stuff
|      |---> Dockerfile
The BUILD and CodeBase folders each have their own Dockerfile; arrange the files accordingly.
Dockerfile (BUILD)
FROM <base-image>
COPY Large-File /tmp/Large-File
Build this and tag it with a name such as base-build-app-image:
cd Application    # Application root folder as mentioned above
docker build -t base-build-app-image BUILD    # path to your build folder
Dockerfile (CodeBase)
FROM base-build-app-image
RUN *****
CMD *****
RUN rm -f /tmp/Large-File
# Remove installation files that are no longer required
RUN rm -f *****
ENTRYPOINT *****
Now build the code base. base-build-app-image is already in your local Docker image store, and the large file is not in the current build context.
cd Application    # Application root folder as mentioned above
docker build CodeBase    # path to your code base
This time the build context contains only your code base and not the large file, so the build time will drop.
You can also take advantage of docker-compose to do both operations together, so you do not have to execute two separate commands.
If you need help on preparing this docker-compose file then do let me know in comments.
If anything is not clear then leave a comment or come over chat to fix this issue.

Dockerfile with multiple base image

I'm trying to create a simple Dockerfile in which I need to build my Node.js project in multiple steps:
Installing and caching my dependencies
Running my unit tests
Running my acceptance tests
Building my project
to ensure that my project is working great. Here's what I have for now:
FROM node:6.9
# Environment variables
ENV HOMEDIR /data
RUN mkdir -p ${HOMEDIR}
WORKDIR ${HOMEDIR}
# install all dependencies
ADD package.json ./
RUN npm install
# ... some stuff goes here without any importance
# add node content initially
ADD . .
CMD CI=true npm test && npm run test:acceptance && npm run build
When running my acceptance tests, I use a selenium server. And I need java for this.
The problem is that I don't have Java installed, and I wanted to use a "standard" image (like https://hub.docker.com/_/openjdk/) while keeping my current node:6.9 image, which would let me switch easily from version to version. By that I mean I don't want to manually install Java on my current image.
My problem is that I can't use multiple FROM sources inside my Dockerfile, and I don't know if what I need is even possible.
Any suggestions?
The Docker way is to keep images as small and lightweight as possible. Your production image does not need Java, Selenium, etc.
Building and testing the application should happen outside the production container. You can use another image (with Selenium, Java, etc.), or a build cluster with multiple containers (Selenium, Java, etc.), to build the production images.
I would recommend having a base image that contains only the base OS and the software your application requires to run.
Use that base image to create multiple images for the different tests.
Once you are done with all the testing, use the same base image to package and dockerize your application.
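As a sketch of that layout, a separate acceptance-test image can add Java on top of the shared base image while the production image stays unchanged. The image name my-app-base, the file name Dockerfile.acceptance, and the Debian package default-jre-headless are assumptions; the test:acceptance script comes from the question.

```dockerfile
# Dockerfile.acceptance (hypothetical): used only for running acceptance tests
FROM my-app-base:latest
# Assumes a Debian-based base image
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-jre-headless \
    && rm -rf /var/lib/apt/lists/*
CMD ["npm", "run", "test:acceptance"]
```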

Selecting different code branches when using a shared base image in Docker

I am containerising a codebase that serves multiple applications. I have created three images;
app-base:
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
...
app-foo:
FROM app-base:latest
RUN foo-specific-setup.sh
and app-buzz which is very similar to app-foo.
This works currently, except I want to be able to build versions of app-foo and app-buzz for specific code branches and versions. It's easy to do that for app-base and tag appropriately, but app-foo and app-buzz can't dynamically select that tag, they are always pinned to app-base:latest.
Ultimately I want this build process automated by Jenkins. I could just dynamically re-write the Dockerfile, or not have three images and just have two nearly-but-not-quite identical Dockerfiles for each app that would need to be kept in sync manually (later increasing to 4 or 5). Each of those solutions has obvious drawbacks however.
I've seen lots of discussions in the past about things such as an INCLUDE statement, or dynamic tags. None seemed to come to anything.
Does anyone have a working, clean(ish) solution to this problem? As long as it means Dockerfile code can be shared across images, I'd be happy. If it also means that the shared layers of images don't need to be rebuilt for each app, then even better.
You could still use build args to do this.
Dockerfile:
FROM ubuntu
ARG APP_NAME
RUN echo $APP_NAME-specific-setup.sh >> /root/test
ENTRYPOINT cat /root/test
Build:
docker build --build-arg APP_NAME=foo -t foo .
Run:
$ docker run --rm foo
foo-specific-setup.sh
In your case you could run the correct script in the RUN using the argument you just set before. You would have one Dockerfile per app-base variant and run the correct set-up based on the build argument.
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
ARG APP_NAME
RUN $APP_NAME-specific-setup.sh
Any layers before setting the ARG would not need to be rebuilt when creating other versions.
You can then push the built images to separate docker repositories for each app.
If your apps need different ENTRYPOINT instructions, you can have an APP_NAME-entrypoint.sh per app and rename it to entrypoint.sh within your APP_NAME-specific-setup.sh (or pass it through as an argument to run).
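To select the app-base tag dynamically as well, an ARG can be declared before FROM (supported since Docker 17.05). A sketch, where BASE_TAG and the feature-x tag are hypothetical names:

```dockerfile
# An ARG declared before FROM can be used in the FROM line
ARG BASE_TAG=latest
FROM app-base:${BASE_TAG}

# ARGs declared after FROM are available to the build steps of this stage
ARG APP_NAME
RUN ${APP_NAME}-specific-setup.sh

# Example build:
#   docker build --build-arg BASE_TAG=feature-x --build-arg APP_NAME=foo -t app-foo:feature-x .
```

With this layout, Jenkins would only need to pass the branch or version as a build argument instead of rewriting the Dockerfile.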
