docker build running into GB size

I have a Cassandra.tar.gz file that I want to convert into an image. I created a Dockerfile (CassandraImageDockerFile.txt) with the following contents:
FROM scratch
ADD apache-cassandra-3.11.6-bin.tar /
Then I ran the following command, but noticed that the image size was running into GBs while the .tar is only 140MB. I pressed Ctrl+C to stop the command:
C:\Users\manuc\Documents\manu>docker build -f CassandraImageDockerFile.txt .
Sending build context to Docker daemon 4.34GB
What happened under the hood? Why did the image size go in GB? What is the right way to build the image?

The last arg to the build command is the build context. All files that you add or copy to the image must be within that context. It gets sent to the docker engine and the build runs within a sandbox (temp folder and containers) using that context. In this case, the context path is . aka the current directory. So look in that folder and all child directories for files that will total many GB. You can exclude files from being sent in the context to the engine using the .dockerignore file, with a nearly identical syntax to the .gitignore file.
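To see roughly what docker build . would send, you can total the file sizes under the context directory while skipping anything matched by .dockerignore. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, patterns are matched with fnmatch against the path relative to the context root (and each of its parent directories), and "!" exception lines and Docker's exact wildcard rules are not implemented, so treat the result as an estimate rather than what the engine will report.

```python
import fnmatch
import os
from pathlib import Path


def load_ignore_patterns(context: Path) -> list[str]:
    """Read simple patterns from .dockerignore (hypothetical helper; '!' lines are skipped)."""
    ignore_file = context / ".dockerignore"
    if not ignore_file.exists():
        return []
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and not line.startswith("!"):
            patterns.append(line.strip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if the path or any of its parent directories matches a pattern (approximation)."""
    parts = rel_path.split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if any(fnmatch.fnmatch(prefix, p) for p in patterns):
            return True
    return False


def estimate_context_size(context: str) -> int:
    """Approximate number of bytes the classic builder would send for this context."""
    root = Path(context)
    patterns = load_ignore_patterns(root)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root).as_posix()
            if is_ignored(rel, patterns):
                continue
            try:
                total += full.stat().st_size
            except OSError:
                continue  # e.g. a broken symlink
    return total


if __name__ == "__main__":
    print(f"{estimate_context_size('.') / 1e9:.2f} GB")
```

If the estimate is close to the "Sending build context" figure, the fix is to build from a smaller directory or add the large folders to .dockerignore.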

Here are the things to check:
Size of the base image, i.e., scratch (scratch is Docker's empty base image, so on its own it adds nothing).
Size of the build context: check the directory from which you are building the image.
For example, docker image build -t xyz:1 .
Here, the build context is the contents of the current folder.
So, while building the image, Docker sends the whole build context to the daemon, and anything you COPY or ADD from it gets copied into the image, which might be the reason for the huge size.
So, check the contents of the directory and see whether you are sending or adding any unnecessary files.
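One quick way to check the contents of the directory is to list its biggest files. A minimal Python sketch (the function name and the top-10 default are arbitrary choices, not part of any Docker tooling); it does not apply .dockerignore, so it shows everything in the folder you pass to docker build:

```python
import heapq
import os
from pathlib import Path


def largest_files(context: str, top_n: int = 10) -> list[tuple[int, str]]:
    """Return (size_in_bytes, relative_path) for the top_n biggest files under context."""
    root = Path(context)
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                sizes.append((full.stat().st_size, full.relative_to(root).as_posix()))
            except OSError:
                # Skip files that vanish or cannot be stat'ed (e.g. broken symlinks).
                continue
    return heapq.nlargest(top_n, sizes)


if __name__ == "__main__":
    for size, path in largest_files("."):
        print(f"{size / 1e6:10.1f} MB  {path}")
```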

I think the image you are starting from may already be several GB in size. Can you please check? Look at the scratch image in the FROM line of your Dockerfile.

Related

Dockerfile ADD variable not expandable

I am setting up a Docker image. In the Dockerfile I have an ADD command whose source is a variable.
The Dockerfile takes a build argument, and I want to use that arg as the source of the ADD command.
But the ADD command is not expanding the variable, and I get an error.
Please share any workaround that comes to mind.
FROM ubuntu
ARG source_dir
RUN echo ${source_dir}
ADD ${source_dir} ./ContainerDir
Build command
docker build . -t image --build-arg source_dir=/home/john/Desktop/data
Error
Step 3/3 : ADD ${source_dir} ./ContainerDir
ADD failed: stat /var/lib/docker/tmp/docker-builder311119108/home/john/Desktop/data: no such file or directory
However, the directory (/home/john/Desktop/data) exists.
From the error message, the variable expanded and complained that you don't have the path in your build context:
stat /var/lib/docker/tmp/docker-builder311119108/a/b/c: no such file or directory
In your example, the build context is . (the current directory), so you need a/b/c in the current directory for this not to error. It also must not be excluded by a ./.dockerignore file, if you have one.
From your second edit:
docker build . -t image --build-arg source_dir=/home/john/Desktop/data
It looks like you are trying to include a directory from outside of the build context in your build. That is explicitly not allowed in Docker builds. All files needed for the ADD and COPY commands must be included in your context, and the entire content of the context is sent to the build server in the first step, so you want to keep it small (rather than sending your entire home directory). The source is always relative to this context, so /home/... is looked up as ./home/... because your context is . in the build command.
The fix is to move the data directory so that it is a subdirectory of ., where you are building your Docker images. You can also switch to COPY, since you don't need any functionality that only ADD provides.
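The rule that ADD/COPY sources are resolved against the context root, not the host filesystem, can be modelled in a few lines. This is a simplified illustration, not Docker's implementation: the helper name is hypothetical, and because builders differ in how they treat ".." segments, the model is not meant for that case. It does reproduce the path from the error message above when given the builder's temporary context directory.

```python
import posixpath


def where_builder_looks(context_root: str, source: str) -> str:
    """Model how a COPY/ADD source is resolved: always relative to the context root.

    A leading '/' does not mean the host's root; '/home/john/data' is treated
    like 'home/john/data' inside the context. Simplified illustration only.
    """
    relative = posixpath.normpath("/" + source).lstrip("/")
    return posixpath.join(context_root, relative)


# With context '.', the absolute host path from the question is looked up inside the context.
print(where_builder_looks(".", "/home/john/Desktop/data"))  # ./home/john/Desktop/data
# After moving the data directory into the context, pass a relative path instead.
print(where_builder_looks(".", "data"))  # ./data
```

Calling where_builder_looks("/var/lib/docker/tmp/docker-builder311119108", "/home/john/Desktop/data") yields the exact path the stat error complains about.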
Disclaimer: there are two oversimplifications here:
The COPY command can include files from different contexts using the --from option to COPY.
The entire context is sent before the build starts with the classic build command. The newer BuildKit implementation is much more selective about how much and what parts of the context to send.

Why does it show "File not found" when I try to run a command from a Dockerfile to find and remove specific logs?

I have a Dockerfile with the command below.
# Kafka log cleanup for log files older than 7 days
RUN find /opt/kafka/logs -name "*.log.*" -type f -mtime +7 -exec rm {} \;
While building, it gives an error that /opt/kafka/logs is not found, but I can access that directory. Any help on this is appreciated. Thank you.
Changing the contents of a directory defined with VOLUME in your Dockerfile using a RUN step will not work. The temporary container will be started with an anonymous volume and only changes to the container filesystem are saved to the image layer, not changes to the volume.
The RUN step, like every other step in the Dockerfile, is used to build the image, and that image is the input to the container. The build does not use your running containers or volumes as input, so it makes no sense to clean up files that were not created as part of your image build.
If you do delete files created in your image build, make sure this is done within the same RUN step that created them. Otherwise, the files you delete have already been written to an earlier image layer; they are still transferred and stored on disk, just not visible in containers based on the layer that includes the delete step.
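The point about deleting in a separate RUN step can be illustrated with a toy model of image layers: each layer records the files it adds and the paths it deletes (whiteouts), every layer is stored and transferred, and a container only sees the merged result. This is a simplified sketch with a made-up file name and size, not how Docker's storage drivers are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    added: dict[str, int] = field(default_factory=dict)  # path -> size in bytes
    deleted: set[str] = field(default_factory=set)        # paths hidden by this layer


def stored_bytes(layers: list[Layer]) -> int:
    """Bytes kept on disk and transferred: every layer's added files, even if later deleted."""
    return sum(sum(layer.added.values()) for layer in layers)


def visible_files(layers: list[Layer]) -> dict[str, int]:
    """What a container sees: apply layers in order, honouring deletions."""
    view: dict[str, int] = {}
    for layer in layers:
        for path in layer.deleted:
            view.pop(path, None)
        view.update(layer.added)
    return view


# Two separate RUN steps: the log is created in one layer and deleted in the next.
separate = [Layer(added={"/opt/app/app.log": 500}), Layer(deleted={"/opt/app/app.log"})]
# One RUN step that creates and deletes the log: the file never reaches a layer.
same_step = [Layer()]

print(stored_bytes(separate), visible_files(separate))    # 500 {}
print(stored_bytes(same_step), visible_files(same_step))  # 0 {}
```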

How to navigate up one folder in a dockerfile

I'm having some trouble building a Docker image because of the way the code is structured. The code is written in C#, and the solution contains a lot of projects that "support" the application I want to build.
My problem: if I put the Dockerfile in the root, I can build it without any problem, but I don't think that's the optimal way, because we have other Dockerfiles we also need to build, and if I put them all in the root folder I think it will end up messy.
So if I put the Dockerfile in the folder with the application, how do I navigate up to the root folder to grab the folders I need?
I tried "../", but it didn't seem to work. Is there any way to do it, or what is the best practice in this scenario?
TL;DR
Run it from the root directory:
docker build . -f ./path/to/dockerfile
The long answer:
In a Dockerfile you can't really go up.
Why?
When the Docker daemon builds your image, it uses two inputs:
your Dockerfile
the context
The context is what you refer to as . in the Dockerfile (for example, in COPY . /app).
Both of them affect the final image: the Dockerfile determines what is going to happen, and the context tells Docker which files to perform the operations specified in that Dockerfile on.
That's how the docs put it:
A build’s context is the set of files located in the specified PATH or URL. The build process can refer to any of the files in the context. For example, your build can use a COPY instruction to reference a file in the context.
So, usually the context is the directory where the Dockerfile is placed. My suggestion is to leave it where it belongs: name your Dockerfiles after their role (Dockerfile.dev, Dockerfile.prod, etc.); it's OK to have a few of them in the same directory.
The context can still be changed:
After all, you are the one who specifies the context, since the docker build command accepts both the context and the Dockerfile path. When I run:
docker build .
I am actually giving it my current directory as the context (I've omitted the Dockerfile path, so it defaults to PATH/Dockerfile).
So if you have a Dockerfile in dockerfiles/Dockerfile.dev, you should place yourself in the directory you want as the context and run:
docker build . -f dockerfiles/Dockerfile.dev
The same applies to the build section of docker-compose (you specify a context and the Dockerfile path there).
Hope that made sense.
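Because COPY and ADD sources are resolved inside the context, a Dockerfile that reaches for ../ only works if you move the context up instead. Below is a rough lint for this as a hypothetical Python helper: it scans simple single-line COPY/ADD instructions, skips those using --from=... (their sources come from another stage or image), treats the last argument as the destination, and reports sources whose normalized path climbs above the context root. It does not handle JSON-array syntax, line continuations, or variables, and the example paths are made up.

```python
import posixpath


def escapes_context(path: str) -> bool:
    """True if a relative source path climbs above the context root after normalization."""
    normalized = posixpath.normpath(path)
    return normalized == ".." or normalized.startswith("../")


def sources_outside_context(dockerfile_text: str) -> list[str]:
    """Return COPY/ADD source paths that point above the build context root."""
    offenders = []
    for raw in dockerfile_text.splitlines():
        words = raw.split()
        if len(words) < 3 or words[0].upper() not in ("COPY", "ADD"):
            continue
        if any(w.startswith("--from") for w in words[1:]):
            continue  # sources come from another stage or image, not the context
        args = [w for w in words[1:] if not w.startswith("--")]
        for src in args[:-1]:  # the last argument is the destination
            if escapes_context(src):
                offenders.append(src)
    return offenders


EXAMPLE = """\
FROM some-base-image
COPY ../Shared ./Shared
COPY MyApp/ ./MyApp/
COPY --from=build /app/out .
"""
print(sources_outside_context(EXAMPLE))  # ['../Shared']
```

With the context moved to the solution root (docker build . -f path/to/dockerfile), the same instruction would refer to Shared rather than ../Shared.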
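You can use a RUN command and chain whatever you want after &&. Note that this only changes directories inside the image being built; it cannot reach host files outside the build context.
RUN cd ../ && <your command>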

docker: how to get the code inside the container

I was reading a few articles on how to get code inside a Docker container.
I found "In short, for production use ADD/COPY method, for the development use docker volume feature"
What I understand from the above:
1) For production, we build an image with the code inside it, i.e. on the production server I just pull the image and run it. No need to worry about the code files, because everything is packed into the image.
2) While developing, use volumes to share the folder.
My question is: whenever I make a change, I will build an image on the development server, then pull and run that image on the production server.
Assuming my image's Dockerfile is as below:
# base image layers: ~375 MB
FROM some-os
# application code layer: ~25 MB
COPY codefolder /root/
When I put in the updated codefolder, the image is different from the previous one.
Most of the time there are no changes in some-os; only codefolder changes.
So every time (after the first time) I pull the modified image, how much is downloaded: 400 MB or 25 MB?
Only the new layer is downloaded after the first time: 25 MB.
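The arithmetic behind that answer: an image is a list of layers identified by content digests, and docker pull only downloads the layers it does not already have locally. A toy Python model with made-up digests and the sizes from the question (real downloads are compressed layer sizes, so actual numbers will differ):

```python
def bytes_to_download(image_layers: list[tuple[str, int]], local_digests: set[str]) -> int:
    """Sum the sizes of layers whose digests are not already present locally."""
    return sum(size for digest, size in image_layers if digest not in local_digests)


MB = 1_000_000
# Hypothetical digests: the base layer is shared, the code layer changes between versions.
v1 = [("sha256:os", 375 * MB), ("sha256:code-v1", 25 * MB)]
v2 = [("sha256:os", 375 * MB), ("sha256:code-v2", 25 * MB)]

first_pull = bytes_to_download(v1, set())
cache = {digest for digest, _ in v1}
second_pull = bytes_to_download(v2, cache)
print(first_pull // MB, second_pull // MB)  # 400 25
```

This only holds while the base layer's digest stays the same; keeping COPY codefolder as late as possible in the Dockerfile lets the earlier layers be reused.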

Dockerfile build error and writes to another folder

I made a Dockerfile like this:
FROM hyeshik/tailseeker:latest
RUN rm /opt/tailseeker/conf/defaults.conf
COPY defaults.conf /opt/tailseeker/conf/
COPY level2/* /opt/tailseeker/refdb/level2/
COPY level3/* /opt/tailseeker/refdb/level3/
My /Users/Downloads/ folder also has other folders, such as input.
When I ran
docker build -f /Users/Downloads/Dockerfile /Users/Downloads/
I get an error saying
Sending build context to Docker daemon 126.8 GB
Error response from daemon: Error processing tar file(exit status 1): write /input/Logs/Log.00.xml: no space left on device
One strange thing is why it is trying to write to the input folder at all. The other is why it complains about no space left on device: I have a 1 TB disk and only 210 GB of it is used. I also used qemu-img to resize my Docker.qcow2. Here is the info for my Docker.qcow2:
image: /Users/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
file format: qcow2
virtual size: 214G (229780750336 bytes)
disk size: 60G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: true
refcount bits: 16
corrupt: false
Can anyone please help me copy the contents of my /Users/Downloads folder into the Docker image using the Dockerfile above?
Thanks in advance.
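docker build starts by creating a tarball from the context directory (in your case /Users/Downloads/) and sending that tarball to the server. The tarball is created in the tmp directory, which is probably why you're running out of space when trying to build. That is also why the error mentions input/: the folder is part of the context, so it is included in the tarball even though your Dockerfile never copies it.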
When you're working with large datasets the recommended approach is to use a volume. You can use a bind mount volume to mount the files from the host.
If the files you're trying to add aren't that large, you can use a .dockerignore file to exclude the other files under /Users/Downloads from the context.
You can also start the Docker daemon with an alternative temp directory by setting $DOCKER_TMPDIR.
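To see why everything under /Users/Downloads/ is involved, including input/, here is a small Python model of that first step: pack the whole context directory into a tar archive. It uses the standard tarfile module with an exclusion callback standing in for .dockerignore; the helper name and the top-level-only exclusion rule are assumptions for illustration, and the real client streams the archive rather than holding it in memory.

```python
import io
import os
import tarfile


def pack_context(context: str, exclude_top_level: frozenset[str] = frozenset()) -> bytes:
    """Tar up the whole context directory, optionally skipping some top-level entries."""

    def keep(info: tarfile.TarInfo):
        # Returning None drops the member (and, for a directory, everything below it).
        top = info.name.split("/", 1)[0]
        return None if top in exclude_top_level else info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for entry in sorted(os.listdir(context)):
            tar.add(os.path.join(context, entry), arcname=entry, filter=keep)
    return buf.getvalue()
```

On the question's folder, pack_context("/Users/Downloads") would have to read all of it, input/ included; passing frozenset({"input"}) shows the effect that a .dockerignore entry for input would have on what gets packed.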
