My docker builds are failing because of a file handle limit error. They crash out with
Error: EMFILE: too many open files
When I check ulimit -n in the container, I see
-n: file descriptors 1024
So I pass the following flags to my build command
docker build --ulimit nofile=65536:65536 -t web .
but this does not change anything; my container still shows
-n: file descriptors 1024
No matter what I do, I don't seem to be able to get that file descriptor limit to change.
What am I doing wrong here?
So, I discovered the cause. Posting the answer in case anyone else is having the same issue, as I just wasted most of a day on this.
I have been debugging a very long running build and have been using
export DOCKER_BUILDKIT=1
to enable some extended build information (very useful timings, etc.). However, it appears that enabling DOCKER_BUILDKIT causes ulimit flags passed to the docker build command to be completely ignored.
When I set
export DOCKER_BUILDKIT=0
it works. So, long story short: avoid using BuildKit with ulimit params.
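For anyone hitting the same thing, a minimal sketch of the workaround, setting the variable inline for a single build (same tag as in the question):
DOCKER_BUILDKIT=0 docker build --ulimit nofile=65536:65536 -t web .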
I wrote a simple test and it seems to work fine on Docker 18.06
> $ docker -v
Docker version 18.06.1-ce, build e68fc7a
I created a Dockerfile like this:
FROM alpine
RUN ulimit -n > /tmp/ulimit.txt
And then:
> $ docker build --ulimit nofile=65536:65536 .
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM alpine
---> e21c333399e0
Step 2/2 : RUN ulimit -n > /tmp/ulimit.txt
---> Running in 1aa4391d057d
Removing intermediate container 1aa4391d057d
---> 18dd1953d365
Successfully built 18dd1953d365
> $ docker run -ti 18dd1953d365 cat /tmp/ulimit.txt
65536
> $ docker build --ulimit nofile=1024:1024 --no-cache .
Sending build context to Docker daemon 2.048kB
Step 1/2 : FROM alpine
---> e21c333399e0
Step 2/2 : RUN ulimit -n > /tmp/ulimit.txt
---> Running in c20067d1fe10
Removing intermediate container c20067d1fe10
---> 134fc7252574
Successfully built 134fc7252574
> $ docker run -ti 134fc7252574 cat /tmp/ulimit.txt
1024
When using BuildKit, docker seems to execute the command in the systemd unit context of the daemon, which is where the ulimit comes from.
I used this Dockerfile to test:
> cat <<'EOF' >Dockerfile
FROM alpine
RUN echo -e "\n\n-----------------\nulimit: $(ulimit -n)\n-----------------\n\n"
EOF
First, check the actual limit values for the docker service:
> systemctl show docker.service | grep LimitNOFILE
LimitNOFILE=infinity
LimitNOFILESoft=infinity
The value set inside a running container is 1048576:
> docker run -it --rm alpine sh -c "ulimit -n"
1048576
The value set inside a BuildKit build is 1073741816:
> DOCKER_BUILDKIT=1 docker build --progress=plain --no-cache .
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 195B done
#2 DONE 0.0s
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s
#3 [internal] load metadata for docker.io/library/alpine:latest
#3 DONE 0.0s
#5 [1/2] FROM docker.io/library/alpine
#5 CACHED
#4 [2/2] RUN echo -e "\n\n-----------------\nulimit: $(ulimit -n)\n--------...
#4 0.452
#4 0.452
#4 0.452 -----------------
#4 0.452 ulimit: 1073741816
#4 0.452 -----------------
#4 0.452
#4 0.452
#4 DONE 0.5s
#6 exporting to image
#6 exporting layers 0.0s done
#6 writing image sha256:facf7aee0b81d814d5b23a663e4f859ec8ba54d7e5fe6fdbbf8beacf0194393b done
#6 DONE 0.0s
Configure docker.service to set a different default value (LimitNOFILE=1024) that will also be used by BuildKit (be careful not to overwrite an existing file):
> mkdir -p /etc/systemd/system/docker.service.d
> cat <<EOF >/etc/systemd/system/docker.service.d/service.conf
[Service]
LimitNOFILE=1024
EOF
> systemctl daemon-reload
> systemctl restart docker.service
The value set inside a running container remains unchanged at 1048576:
> docker run -it --rm alpine sh -c "ulimit -n"
1048576
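(As an aside on why this run-time value does not move: docker run containers take their default from the daemon's default-ulimits configuration rather than from the unit's LimitNOFILE. A minimal /etc/docker/daemon.json sketch, assuming you wanted 1024 there as well:)
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Soft": 1024, "Hard": 1024 }
  }
}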
The value set inside a BuildKit build is now 1024:
> DOCKER_BUILDKIT=1 docker build --progress=plain --no-cache .
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 195B done
#2 DONE 0.0s
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s
#3 [internal] load metadata for docker.io/library/alpine:latest
#3 DONE 0.0s
#5 [1/2] FROM docker.io/library/alpine
#5 CACHED
#4 [2/2] RUN echo -e "\n\n-----------------\nulimit: $(ulimit -n)\n--------...
#4 0.452
#4 0.452
#4 0.452 -----------------
#4 0.452 ulimit: 1024
#4 0.452 -----------------
#4 0.452
#4 0.452
#4 DONE 0.5s
#6 exporting to image
#6 exporting layers 0.0s done
#6 writing image sha256:7e40c8a8d5f0ca8f2b2b53515f11f47655f6e1693ffcd5f5a118402c13a44ab4 done
#6 DONE 0.0s
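If you cannot (or would rather not) reconfigure the daemon's unit, lowering the soft limit inside the RUN instruction itself is a possible per-step workaround; reducing a soft limit needs no extra privileges. A minimal sketch, reusing the test Dockerfile from above:
FROM alpine
# Lower the soft nofile limit only for this RUN's shell, then record it.
RUN sh -c 'ulimit -n 1024 && ulimit -n > /tmp/ulimit.txt'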
Related
I use Gitpod as my online IDE. Gitpod builds a Docker container from a user-provided Dockerfile. The user doesn't have access to the terminal which runs the docker build command, and thus no flags can be passed. At the moment, my Dockerfile fails to build because Docker incorrectly caches instructions, including mkdir commands. Specifically, given the Dockerfile:
# Base image is one of Ubuntu's official distributions.
FROM ubuntu:20.04
# Install curl.
RUN apt-get update
RUN apt-get -y install sudo
RUN sudo apt-get install -y curl
RUN sudo apt-get install -y python3-pip
# Download Google Cloud CLI installation script.
RUN mkdir -p /tmp/google-cloud-download
RUN curl -sSL https://sdk.cloud.google.com > /tmp/google-cloud-download/install.sh
# Install Google Cloud CLI.
RUN mkdir -p /tmp/google-cloud-cli
RUN bash /tmp/gcloud.sh --install-dir=/tmp/google-cloud-cli --disable-prompts
# Move the content of /tmp/gcloud into the container.
COPY /tmp/google-cloud-cli /google-cloud-cli
The build fails with the following log:
#1 [internal] load .dockerignore
#1 transferring context: 114B done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 1.43kB done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/ubuntu:20.04
#3 DONE 1.2s
#4 [ 1/13] FROM docker.io/library/ubuntu:20.04@sha256:af5efa9c28de78b754777af9b4d850112cad01899a5d37d2617bb94dc63a49aa
#4 resolve docker.io/library/ubuntu:20.04@sha256:af5efa9c28de78b754777af9b4d850112cad01899a5d37d2617bb94dc63a49aa done
#4 sha256:3b65ec22a9e96affe680712973e88355927506aa3f792ff03330f3a3eb601a98 0B / 28.57MB 0.1s
#4 ...
#5 [internal] load build context
#5 transferring context: 1.70MB 0.1s done
#5 DONE 0.1s
#6 [ 5/13] RUN sudo apt-get install -y python3-pip
#6 CACHED
#7 [ 9/13] RUN bash /tmp/gcloud.sh --install-dir=/tmp/google-cloud-cli --disable-prompts
#7 CACHED
#8 [ 4/13] RUN sudo apt-get install -y curl
#8 CACHED
#9 [ 7/13] RUN curl -sSL https://sdk.cloud.google.com > /tmp/google-cloud-download/install.sh
#9 CACHED
#10 [ 8/13] RUN mkdir -p /tmp/google-cloud-cli
#10 CACHED
#11 [ 3/13] RUN apt-get -y install sudo
#11 CACHED
#12 [ 6/13] RUN mkdir -p /tmp/google-cloud-download
#12 CACHED
#13 [10/13] COPY /tmp/google-cloud-cli /google-cloud-cli
#13 ERROR: failed to calculate checksum of ref j0t2zzxkw0572xeibprcp5ebn::w8exf03p6f5luerwcumrkxeii: "/tmp/google-cloud-cli": not found
#14 [ 2/13] RUN apt-get update
#14 CANCELED
------
> [10/13] COPY /tmp/google-cloud-cli /google-cloud-cli:
------
Dockerfile:22
--------------------
20 |
21 | # Move the content of /tmp/gcloud into the container.
22 | >>> COPY /tmp/google-cloud-cli /google-cloud-cli
23 |
24 | # Copy local code to the container image.
--------------------
error: failed to solve: failed to compute cache key: failed to calculate checksum of ref j0t2zzxkw0572xeibprcp5ebn::w8exf03p6f5luerwcumrkxeii: "/tmp/google-cloud-cli": not found
{"#type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","command":"build","error":"exit status 1","level":"error","message":"build failed","serviceContext":{"service":"bob","version":""},"severity":"ERROR","time":"2022-08-28T05:31:11Z"}
exit
headless task failed: exit status 1
Other than stopping using Gitpod altogether, which I'm considering, how could I solve this issue?
When you COPY /tmp/google-cloud-cli /google-cloud-cli, Docker tries to copy a file from outside of Docker space (the build context: the directory argument to docker build, frequently the same directory as the Dockerfile) into the image.
In your case, you already have the files inside the image, so you need to RUN cp or mv or another command to relocate them.
RUN bash /tmp/gcloud.sh --install-dir=/tmp/google-cloud-cli --disable-prompts
RUN mv /tmp/google-cloud-cli /google-cloud-cli
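Note, too, that the question's Dockerfile downloads the installer to /tmp/google-cloud-download/install.sh but then executes /tmp/gcloud.sh, which is never created. A sketch of those steps with consistent paths (flags as in the question):
RUN mkdir -p /tmp/google-cloud-download
RUN curl -sSL https://sdk.cloud.google.com > /tmp/google-cloud-download/install.sh
RUN bash /tmp/google-cloud-download/install.sh --install-dir=/tmp/google-cloud-cli --disable-prompts
RUN mv /tmp/google-cloud-cli /google-cloud-cli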
A way to invalidate the cache of Docker layers in Gitpod is to put an environment variable above all the layers you want to invalidate and change its value:
FROM gitpod/workspace-full
ENV INVALIDATE_CACHE=1
...
(If this doesn't help, please share a repository with the mentioned Dockerfile so it can be reproduced.)
I updated my docker compose file, but when I rebuild the containers, it seems they are not restarted. Why?
Here we can see that the last step is not cached and a new image was created:
$ BUILDKIT_PROGRESS=plain docker compose --verbose -p bot -f docker-compose.yml -f docker-compose.dev.yml --env-file etc/db_env.conf up --detach --build
...
#16 [my-perl 12/12] COPY . .
#16 DONE 0.0s
#17 exporting to image
#17 exporting layers 0.0s done
DEBU[0001] stopping session
#17 writing image sha256:f64d73e7d5c5d5baa69df94dfc083bac08fd8395ae86e25204bad45d20007134 done
#17 naming to docker.io/library/bot_app done
#17 DONE 0.1s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 2/0
⠿ Container bot-db-1 Running 0.0s
⠿ Container bot-app-1 Running
Found the answer: https://github.com/docker/compose/issues/9259
This is a bug that was fixed in docker compose 2.6.1. My version is 2.3.3.
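A quick way to confirm which Compose version is installed (the version subcommand exists in Compose v2):
$ docker compose version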
I am trying to build a docker image for my sample-go app.
I am running it from the sample-app folder itself, using the GoLand editor's terminal, but the build is failing and giving me certain errors.
My Dockerfile looks like this:
FROM alpine:latest
RUN mkdir -p /src/build
WORKDIR /src/build
RUN apk add --no-cache tzdata ca-certificates
COPY ./configs /configs
COPY main /main
EXPOSE 8000
CMD ["/main"]
Command for building:
docker build --no-cache --progress=plain - < Dockerfile
Error And Logs:
#1 [internal] load build definition from Dockerfile
#1 sha256:8bb9ee83603259cf748d90ce42602f12527fa720d7417da22799b2ad4e503497
#1 transferring dockerfile: 222B done
#1 DONE 0.0s
#2 [internal] load .dockerignore
#2 sha256:f93d938488588cd0e0a94d9d343fe69dcfd28d0cb1da95ad7aab00aac50235c3
#2 transferring context: 2B done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/alpine:latest
#3 sha256:13549c58a76bcb5dac9d52bc368a8fb6b5cf7659f94e3fa6294917b85546978d
#3 DONE 0.0s
#10 [1/6] FROM docker.io/library/alpine:latest
#10 sha256:d20daa00e252bfb345a1b4f53b6bb332aafe702d8de5e583a76fcd09ba7ea1c1
#10 CACHED
#7 [internal] load build context
#7 sha256:0f7a8a6082a837c139acc2855e1b745bba9f28cc96709d45cd0b7be42442c0e8
#7 transferring context: 2B done
#7 DONE 0.0s
#4 [2/6] RUN mkdir -p /src/build
#4 sha256:b9fa3007a44471d47414dd29b3ff07ead6af28ede820a2b4bae0ce84cf2c5a83
#4 CACHED
#5 [3/6] WORKDIR /src/build
#5 sha256:b2ec58a365fdd74c4f9030b0caff2e2225eea33617da306678ad037fce675388
#5 CACHED
#6 [4/6] RUN apk add --no-cache tzdata ca-certificates
#6 sha256:0966097abf956d5781bc2330d49cf715cd52c3807e8fedfff07dec50907ff03b
#6 CACHED
#9 [6/6] COPY main /main
#9 sha256:f4b81960427c014a020361bea0903728f289e1d796892fe0adc6409434f3ca76
#9 ERROR: "/main" not found: not found
#8 [5/6] COPY ./configs /configs
#8 sha256:630f272dd60dd307f40dbbdaef277ee0dfc24b71fa11e10a3b8efd64d3c05086
#8 ERROR: "/configs" not found: not found
#4 [2/6] RUN mkdir -p /src/build
#4 sha256:b9fa3007a44471d47414dd29b3ff07ead6af28ede820a2b4bae0ce84cf2c5a83
#4 DONE 0.2s
------
> [5/6] COPY ./configs /configs:
------
------
> [6/6] COPY main /main:
------
failed to compute cache key: "/main" not found: not found
PS: I am not able to find where the problem is. Help please!
The two folders /main and /configs do not exist.
The COPY command can't copy into these folders.
Solution 1
Create the folders during the build:
RUN mkdir -p /main
RUN mkdir -p /configs
And then use COPY.
Solution 2
Try to build without COPY and CMD.
Then run the new image.
Exec into the running container with bash or sh.
Create the folders.
Exit the exec'd container.
Create a new image from the running container with docker commit.
Stop the container and delete it.
Build again with your new image and include COPY and CMD.
This is a basic mistake.
COPY ./configs /configs: copy the folder configs from the host to the Docker image.
COPY main /main: copy the executable file main from the host to the Docker image.
The problem is:
The base Docker image does not have the folders /configs and /main. You must create them manually (Docker understood your command this way).
But I have some advice:
Create two Docker images for two purposes: build and production (see the sketch below).
Copy the source code into the Docker builder image, which is used for building your app.
Copy the necessary output files from the Docker builder image into the Docker production image.
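A minimal multi-stage sketch of that advice, assuming the app builds with a plain go build and reusing the names from the question (main, configs, port 8000):
# Stage 1: builder image compiles the app.
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN mkdir -p /out && CGO_ENABLED=0 go build -o /out/main .

# Stage 2: production image receives only the build outputs.
FROM alpine:latest
RUN apk add --no-cache tzdata ca-certificates
COPY --from=builder /out/main /main
COPY ./configs /configs
EXPOSE 8000
CMD ["/main"]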
In my case, the issue was a connected VPN/proxy network on my machine.
It worked after I disconnected from the VPN/proxy network.
In my case I was missing the folder entries in the .dockerignore file. Do something like this:
**/*
!docker-images
!configs
!main
I have a Gitlab pipeline that builds my Docker image from a Dockerfile, but when the docker build command fails, the pipeline still reports success.
build:
stage: build
script:
- docker build --no-cache -t $CI_REGISTRY/dockerfile:$CONTAINER_LABEL .
I've added an error to my Dockerfile so I can provoke a failure:
FROM ubuntu:20.04
RUN not_a_real_command_that_should_fail
The pipeline runs:
Running with gitlab-runner 14.0.1 (c1edb478)
on ******* Cy33WtLD
Preparing the "shell" executor
00:00
Using Shell executor...
Preparing environment
00:01
Running on **********...
Getting source from Git repository
00:04
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in C:/gitlab-runner/builds/Cy33WtLD/0/dockerfile/.git/
Checking out 999a815d as fix_pipeline_status...
git-lfs/2.13.2 (GitHub; windows amd64; go 1.14.13; git fc664697)
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:03
$ docker build --no-cache -t $CI_REGISTRY/dockerfile:$CONTAINER_LABEL .
#1 [internal] load build definition from Dockerfile
#1 sha256:c544637cbaca3e93c2a8a8c00efd4f81ee45b1abd410d971af12de8dae21e8ea
#1 transferring dockerfile: 3.04kB done
#1 DONE 0.0s
#2 [internal] load .dockerignore
#2 sha256:ab745a167b371ba5e9380063cb278a7792a5838550b89f02f35d7f6a583fb548
#2 transferring context: 2B done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/ubuntu:20.04
#3 sha256:8e67b796a66f85f06793e026943184e32d365c77929e94d2ac98b34a1e1cb30e
#3 DONE 0.6s
#4 [ 1/17] FROM docker.io/library/ubuntu:20.04@sha256:9d6a8699fb5c9c39cf08a0871bd6219f0400981c570894cd8cbea30d3424a31f
#4 sha256:c8b7f784dc481f981cf0bc39c4d4e60a54a355d96ca108a13ffffa3bfa047067
#4 CACHED
#20 [internal] load build context
#20 sha256:d12ef8e847404a2cc9437d8099f4b73f215c48eb92002e759a5f264989ae3ace
#20 transferring context: 92B 0.0s done
#20 DONE 0.0s
#5 [ 2/17] RUN not_a_real_command_that_should_fail
#5 sha256:724c85340f260555ab116f9064ba3c7a2c16fe0af059ef5226df31545b30ddb6
#5 0.485 /bin/sh: 1: not_a_real_command_that_should_fail: not found
#5 ERROR: executor failed running [/bin/sh -c not_a_real_command_that_should_fail]: exit code: 127
------
> [ 2/17] RUN not_a_real_command_that_should_fail:
------
executor failed running [/bin/sh -c not_a_real_command_that_should_fail]: exit code: 127
Cleaning up file based variables
00:01
Job succeeded
The exit code is 127. I've added an after_script step to print out the return code ("echo $?") and got "true" back. It seems like this should be sufficient for the command to trigger a failure in the pipeline.
The gitlab-runner is a shell executor on a Windows-machine, if that matters.
Any suggestions?
1 - Gitlab issue
There is a bug in gitlab-runner versions prior to 13.1.1:
https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26347
where the job is always reported successful when FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY is set to false.
The solution seems to be updating.
2 - Script issue
Try to force exit 1 on error with:
build:
stage: build
script:
- docker build --no-cache -t $CI_REGISTRY/dockerfile:$CONTAINER_LABEL . || exit 1
Edit 1
By default, PowerShell continues execution after an error.
You can set these variables:
job:
stage: build
variables:
ErrorActionPreference: stop
script:
- docker build --no-cache -t $CI_REGISTRY/dockerfile:$CONTAINER_LABEL .
Please see this issue on GitLab for more information:
https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4683
After moving the pipeline to a gitlab-runner on an Ubuntu machine, the pipeline fails as expected. I assume the Windows PowerShell exit codes do not trigger correctly.
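If you have to stay on the Windows runner, one possible workaround (not from this thread, so treat it as a sketch) is to propagate the native command's exit code explicitly; PowerShell exposes it as $LASTEXITCODE:
build:
  stage: build
  script:
    - docker build --no-cache -t $CI_REGISTRY/dockerfile:$CONTAINER_LABEL .
    - if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }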
I'm trying to run Docker inside a container with my app, which creates images and builds them. I've read that if I bind docker.sock from my host computer (in the docker-compose.yml) I can do it, but I'm getting this error:
Error, Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?panic: runtime error: invalid memory address or nil pointer dereference
My code:
docker-compose
version: "3.5"
services:
lqcli-backend:
build:
context: .
dockerfile: Dockerfile
image: lqcli-backend
container_name: lqcli-backend
volumes:
# Bind Docker socket on the host so we can connect to the daemon from
# within the container
- "/var/run/docker.sock:/var/run/docker.sock:rw"
Dockerfile
FROM golang:latest
WORKDIR /go/src/app
COPY . .
# Download all the dependencies
RUN go get -d -v ./...
# Install the package
RUN go install -v ./...
RUN go install lqcli.go
RUN go build lqcli.go
RUN mkdir /var/local/lightquery
WORKDIR /var/local/lightquery
RUN cp /go/src/app/lqcliconfig.yml .
RUN cp -R /go/src/app/templatefolder/ .
RUN cp -R /go/src/app/buildfolder/ .
RUN chown -R $(whoami) /var/local/lightquery/
WORKDIR /go/src/app
RUN lqcli -d example2.py -task example2
CMD ["lqcli","-d","example2.py"]
I have also tried to RUN service docker start, but it doesn't work:
------
> [15/16] RUN service docker start:
#19 0.410 docker: unrecognized service
------
Edit:
Complete error message:
loren#RONDAN1:/mnt/c/Users/rondan/Desktop/lightquery-cli$ docker-compose up --build
Building lqcli-backend
[+] Building 27.6s (19/19) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 38B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/golang:latest 2.3s
=> [internal] load build context 0.3s
=> => transferring context: 4.23kB 0.3s
=> [ 1/15] FROM docker.io/library/golang:latest@sha256:7f69ee6e3ea6c3acab98576d8d51bf2e72ed722a0bd4e4363423fddb3 0.0s
=> CACHED [ 2/15] WORKDIR /go/src/app 0.0s
=> [ 3/15] COPY . . 0.0s
=> [ 4/15] RUN go get -d -v ./... 9.7s
=> [ 5/15] RUN go install -v ./... 8.7s
=> [ 6/15] RUN go install lqcli.go 1.6s
=> [ 7/15] RUN go build lqcli.go 1.5s
=> [ 8/15] RUN mkdir /var/local/lightquery 0.5s
=> [ 9/15] WORKDIR /var/local/lightquery 0.1s
=> [10/15] RUN cp /go/src/app/lqcliconfig.yml . 0.4s
=> [11/15] RUN cp -R /go/src/app/templatefolder/ . 0.6s
=> [12/15] RUN cp -R /go/src/app/buildfolder/ . 0.5s
=> [13/15] RUN chown -R $(whoami) /var/local/lightquery/ 0.6s
=> [14/15] WORKDIR /go/src/app 0.0s
=> ERROR [15/15] RUN lqcli -d example2.py -task example2 0.8s
------
> [15/15] RUN lqcli -d example2.py -task example2:
[████████████ ] 25% 25/100Error, Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?panic: runtime error: invalid memory address or nil pointer dereference
#19 0.762 [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x49ea13]
#19 0.762
#19 0.762 goroutine 1 [running]:
#19 0.762 io.ReadAll(0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x1)
#19 0.762 /usr/local/go/src/io/io.go:633 +0xb3
#19 0.762 io/ioutil.ReadAll(...)
#19 0.762 /usr/local/go/src/io/ioutil/ioutil.go:27
#19 0.762 main.buildDocker(0xc000105d70, 0x22, 0xc000105da0, 0x25, 0xc00033ff10)
#19 0.762 /go/src/app/lqcli.go:328 +0x83d
#19 0.762 main.main()
#19 0.762 /go/src/app/lqcli.go:571 +0x350
Is the error happening INSIDE the running container (not during docker build)? I will assume so, but if not, you cannot access the docker daemon from inside a build.
You need the following things for this kind of setup to work (see the sketch after this list):
You need a Docker CLIENT inside your image: apt install docker-cli or the like.
You need a Docker DAEMON exposed to your container. Bind-mounting the socket like you did works, so long as the daemon is set up to listen that way.
You need PERMISSION for the daemon inside your container. By default the daemon uses a unix socket, so the user running in your container must have unix filesystem permissions to write to that socket.
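A minimal sketch of the first point, assuming a Debian-based image such as the golang:latest used above (docker.io is Debian's packaging of Docker; docker-ce-cli from Docker's own apt repository is the client-only alternative):
FROM golang:latest
# Install a Docker client so code in the container can talk to the host
# daemon through the bind-mounted /var/run/docker.sock.
RUN apt-get update && apt-get install -y docker.io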
You didn't post an error message, so I can't guess which one was your issue.