Docker commit alterations of a built image - docker

I 've built an image using A Dockerfile and I've noticed an error in the created image. I've then run the image and fixed the error.
Now I would like to know if the correct flow is to commit the changes to the built image, or create a new image.
Thanks

Basically you have two options:
Fix your Dockerfile and rebuild it, that's the reason we have Dockerfile, from which an expected/correct image can be built by anybody at anywhere
Run a container from your incorrect image and fix it, then commit back to the image. You can choose this option when your building process is extremely long but what you need is to do a quick/small fix. But remember to always fix Dockerfile, which is the "definetion" of your image and I believe you don't want to leave any kind of "bug" in it.

If a working image is all you want, then committing it is fine. You will then get a new image that includes your fix.
However, the benefit of having a Dockerfile is that your build is reproducible, so if your ever want to share the image with someone else, or foresee yourself rebuilding it, you should probably maintain the Dockerfile.

Related

Docker image size doesn’t make sense

I’ve created a ubuntu:bionic base image on my computer. Originally super large size but I deleted 80% of the content by running container and then committing. If I got to root directory and do “du -sh”, it said disk usage 4.5GB. Curious enough, the size of docker image when I do "docker images’ show 11 GB. After pushing to docker hub, I see that it’s 3.34 GB. So I thought perhaps it cleaned up something before compressing? I ran the new image, deleted some more content, commit, and pushed again. This time, “du–sh” said 3.0 GB, “docker images” still said 11GB and docker hub also 3.34 GB. Clearly it is compressing the 11GB file and not the 3.0GB content I’m expecting. Is there a easy way to “clean up” the image?
Docker images are built from layers. When you add a new layer, it doesn't remove the previous layers, it just adds a new one, rather like a new Git commit—the history is still there.
That means when you deleted the content, you made it invisible but it's still there in earlier layers.
You can see the layers and their sizes with docker history yourimagename.
Your options:
Make sure files you don't need don't make it in the first place, e.g. with .dockerignore.
Use a multi-stage build to create new image from the old one with only the files you need. https://docs.docker.com/develop/develop-images/multistage-build/

How to speed up pulling image speed for k8s cluster

I have one application will start one pod in any node of cluster, but if that node doesn't have this image, it will downloading it the first time and it takes a lot of time (it is around 1gb, and takes more than 3 minutes to download the image), what is the best practice to solve this kind of issue ? Pre pull image or share docker image via nfs ?
Deploy a personal docker repository.
https://docs.docker.com/registry/deploying/
Try to reduce the size of the image.
This can be done a few ways depending on your project. For example, if you are using Node you could use FROM node:11-alpine instead of FROM node:11 for a significantly smaller image. Also, make sure you are not putting build files inside the image. Languages like C# and Java have separate build and runtime images. For example, use java-8-jdk to build your project but use java-8-jre your final image as you only need the runtime.
Good luck!
Continuation to #anskurtis-streutker answer, this page explains in detail how to create smaller images.
Build smaller images - Kubernetes best practises
You can give a try to pod disruption budget. With it you can achieve high availability in your app and the download time shouldn't be a problem.
Regards

Outsource source code in docker-compose to use minimal disk space

I am using docker successfully in dev environment and want to use it now at staging and prod too.
I am developing a web application with symfony where the code is mounted local to the docker container. For staging and prod i want to "bake" the source code to the image, cause theres no need to change it anymore at this time.
At the moment my services "php" and "nginx" needs access to the src files. For staging/prod i would create a extra volume called "src" and mount it to both services. In one of the services (nginx/php) i would add a COPY command to copy the src code on build to the mounted "src" volume.
The problem now is the following:
Whenever a new version of my code exist, the whole image have to build new ... the smallest image (nginx) has a size of 200MB. So every time i want to update only my code (size just 10MB) the whole container (200MB) has to build new ...
In addition i want to check in all builds into a repository.
That is quite expensive with time ...
My thought is the following:
Is it possible to only build the data volume "src" new on each code update (triggered trough a jenkins build job) and check them in?
I think, there is no need to build rarely changed environments like php/nginx/mysql new on every build ...
Or is there another approach?
Initially having 1,5GB for all needed services is quite ok, But having for each version another 200 MB in the repository is too heavy.
Thanks
First the approach you are following is definitely a bad practice. A docker container should be portable and self-contained. Relying on data volumes that are bounded to the host machine will make your container not portable.
By design containers should package all of the dependencies needed to run the application. You should thus add the source to each image if the source code is a dependency that must be provided.
You should investigate other options to make the image size smaller. Depending on the programming language you are using, it is possible to compile/compress the source code and have a smaller binary for instance that can be copied into the image.
One final note is that using very different appraoches to deploy between environment(dev/staging/prod) is usually a bad idea. It is much preferable to have similar deployment strategies to avoid unexpected errors.
If you set up your Dockerfile properly (see docs) so you are adding the code last, it should be a pretty quick operation to update as all the other unchanged layers will be cached. This is pretty common practice as part of a Docker workflow.
You can use this same image for your local development and mount your working code over the code in the container for active development. As long as that exact same code is used to rebuild your images, you should maintain consistency. You could optimize further by choosing which parts of your code are likely to change and order your build accordingly.
You may also want to look into multi-stage build process where you can further optimize your base image and reduce final image size.

Why does the thrift docker image need go

This docker file's goal is to:
Goal: provide a thrift-compiler Docker image
I was just wondering why does this image need to install golang
It appears to download the Golang binary package but only copies over gofmt. Looking at https://github.com/apache/thrift/blob/19baeefd8c38d62085891d7956349601f79448b3/compiler/cpp/src/thrift/generate/t_go_generator.cc it seems that at one point they were running gofmt on the Golang generated code.
The comment for that part of code links to https://issues.apache.org/jira/browse/THRIFT-3893 which references pull request https://github.com/apache/thrift/pull/1061 where the feature was actually removed.
The specific commit (https://github.com/apache/thrift/commit/2007783e874d524a46b818598a45078448ecc53e) appears to be in 0.10 but not 0.9. So, along with the disabling of gofmt they probably just forgot to remove it from the image or decided it was just worth leaving as the feature could be fixed and re-enabled at a later date.
It might be worth opening an issue to ask the Thrift team about it and if it can be removed.

Dockerfile vs Docker image

I'm working on creating some docker images to be used for testing on dev machines. I plan to build one for our main app as well as one for each of our external dependencies (postgres, elasticsearch, etc). For the main app, I'm struggling with the decision of writing a Dockerfile or compiling an image to be hosted.
On one hand, a Dockerfile is easy to share and modify over time. On the other hand, I expect that advanced configuration (customizing application property files) will be much easier to do in vim before simply committing an new image.
I understand that I can get to the same result either way, but I'm looking for PROS, CONS, and gotchas with either direction.
As a side note, I plan on wrapping this all together using Fig. My initial impression of this tool has been very positive.
Thanks!
Using a Dockerfile:
You have an 'audit log' that describes how the image is built. For me this is fundamental if it is going to be used in a production pipeline where more people are working and maintainability should be a priority.
You can automate the building process of your image, being an easy way of updating the container with system updates, or if it has to take part in a continuous delivery pipeline.
It is a cleaner way of create the layers of your container (each Dockerfile command is a different layer)
Changing a container and committing the changes is great for testing purposes and for fast development for a conceptual test. But if you plan to use the result image for some time, I would definitely use Dockerfiles.
Apart from this, if you have to modify a file and doing it using bash tools (awk, sed...) results very tedious, you can add any file you wish from outside during the building process.
I totally agree with Javier but you need to understand that one image created with a dockerfile can be different with an image build with the same version of the dockerfile 1 day after.
maybe in your build process you retrieve automatically last updates of an app or the os etc …
And at this time if you need to reproduce a crash or whatever you can’t rely on the dockerfile.

Resources