Multiple Dockerfiles in project with different contexts

I have a repository where I need to create several Dockerfiles, but each of them should have a different context.
I like the solution posted here, but it doesn't fully fit with my use case.
NO, THIS IS NOT A DUPLICATE. IT'S A DIFFERENT USE CASE. PLEASE KEEP READING.
I know it's better to exclude unnecessary folders from the build context, especially if they are big. Well, my project consists of several folders, some of which are really huge.
For simplicity, suppose this is the file tree of my project:
hugeFolder1/
hugeFolder2/
littleFolder1/
littleFolder2/
And suppose that I need to create two Dockerfiles (following the solution that I previously mentioned):
docker/A/Dockerfile <- let's call this Dockerfile "A"
docker/B/Dockerfile <- let's call this Dockerfile "B"
docker-compose.yml
Now the point is:
A only needs hugeFolder1 and both the little folders.
B only needs hugeFolder2 and both the little folders.
So I would like to exclude the unneeded huge folder from each build context.
What is the best way to achieve this?

Edit: The previous answer added folders to the image that were outside the build context, which won't work. Additionally, the OP clarified the contents and how the image will be used in the comments, showing a very good use case for multi-stage builds.
I'll take a stab at it, based on the info provided.
Firstly, you can't exclude folders from a given docker context. If you use docker build -t bluescores/myimage /some/path, your context is /some/path/**/*. There's no excluding the huge folders, or the little folders, or anything in them.
Second, in order to use ADD or COPY to bring files into your docker image, they must exist in the build context.
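For illustration, a minimal sketch of how both builds might be invoked from the repository root (the image names are placeholders); note that because the context is the same for both, each build still sends all four folders to the daemon:

# run from the repository root; the trailing "." is the build context
docker build -f docker/A/Dockerfile -t myproject-a .
docker build -f docker/B/Dockerfile -t myproject-b .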
That said, it sounds like you'll end up using various combinations of the huge and little folders. I think you'll be better off sticking with the strategy you've outlined, with some optimizations, namely multi-stage builds.
Skip docker-compose for now
The solution here that you reference isn't really aiming to solve the problem of closely controlling context. It's a good answer to a totally different question than what you're asking. You are just trying to build the images right now, and while docker-compose can handle that, it doesn't bring anything to the table you don't have with docker build. When you need to orchestrate these containers you're building, then docker-compose will be incredible.
If you aren't sure you need docker-compose, try doing this without it. You can always bring it back into the mix later.
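If and when you do bring docker-compose back, a sketch of how the two builds could be declared there (service names are assumptions):

services:
  service-a:
    build:
      context: .
      dockerfile: docker/A/Dockerfile
  service-b:
    build:
      context: .
      dockerfile: docker/B/Dockerfile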
Now my image is gigantic
See if multi-stage builds are something you can make use of.
This would essentially let you cherry pick the build output from the image you COPY the huge folders into, and put that output in a new, smaller, cleaner image.
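As a rough, hypothetical sketch of what Dockerfile A could look like as a multi-stage build (base images, paths and the build command are placeholders, since the real build steps weren't shared):

# stage 1: bring in the big context and build
FROM debian:bookworm AS builder
WORKDIR /build
COPY hugeFolder1/ hugeFolder1/
COPY littleFolder1/ littleFolder1/
COPY littleFolder2/ littleFolder2/
# placeholder build step that produces /build/out
RUN ./littleFolder1/build.sh

# stage 2: the final image keeps only the build output
FROM debian:bookworm-slim
COPY --from=builder /build/out /app
# placeholder entrypoint for the final image
CMD ["/app/run"]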

Related

How to Dockerize multiple scripts that share requirements.txt and need frequent updates

I'm new to Docker so I want to find best practices for my specific problem.
PROBLEM:
I have 6 Python web-scraping scripts that use the same libraries (same requirements.txt).
My scripts need frequent updating (a few times per week).
Also, my scripts read from and write to Excel files, and I need to be able to update those Excel files from time to time.
SOLUTIONS?
Do I really need 6 images and 6 containers even though my containers will have the same libraries? I find it time-consuming to delete the container and image every time I update my code.
For accessing my Excel files, I read about volumes and I intend to implement them. Is that a good solution?
Do I really need 6 images and 6 containers even though my containers will have the same libraries?
It depends on technical possibility and personal preference. If you find a good, maintainable way to run all scripts in one Docker container, there's no reason you cannot do it. You could easily use a cron-like solution such as this image.
There are advantages to keeping Docker images single-purpose, though. One of them is clear isolation. If one of your scripts fails to run, you'll have one failing container only and five others that still run successfully. Plus you have full transparency over what exactly fails where.
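One way to get both the shared requirements.txt and the per-script isolation is a single image run as six services that differ only in their command; a minimal sketch, with script names assumed:

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# docker-compose.yml
services:
  scraper1:
    build: .
    command: python scraper1.py
  scraper2:
    build: .
    command: python scraper2.py
  # ... and so on for the remaining scripts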
I find it time-consuming to delete the container and image every time I update my code.
I would propose using a CI pipeline to do things like this. The pipeline would automatically build the images on a push, publish them to a registry, and recreate the containers/services on your server.
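The pipeline steps would boil down to something like this (registry URL and tag are placeholders):

# built and pushed by CI on every push
docker build -t registry.example.com/scrapers:latest .
docker push registry.example.com/scrapers:latest
# executed on the server to pick up the new image
docker compose pull
docker compose up -d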
For accessing my Excel files, I read about volumes and I intend to implement them. Is that a good solution?
Yes, that's what volumes were made for: Accessing and storing data that isn't part of your image.
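For example, a bind mount in the compose file would keep the Excel files on the host while the container reads and writes them (paths are assumptions):

services:
  scraper1:
    build: .
    command: python scraper1.py
    volumes:
      - ./data:/app/data    # Excel files live in ./data on the host, /app/data in the container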

How to list all files from the build context that affect the contents of the image?

Is it possible to list all files that get copied into the image from the build context, or affect the final contents of the image in any other way?
I need this for dependency tracking; I am sculpting a build system for a project that involves building multiple images and running containers from them in the local dev environment. I need this to be optimized for a rapid code-build-debug cycle, and therefore I need to avoid invoking docker build unless it is strictly necessary. Knowing the exact set of files in the build context that end up affecting the image will allow me to specify those as tracked dependencies for the build step that invokes docker build, and avoid unnecessary rebuilds.
I don't need to have this file list generated in advance, though that is preferable. If no tool exists to generate it in advance, but there is a way to obtain it from a built image, then that's OK too; the build tool I use is capable of recording dynamic dependencies discovered by a post-build step.
Things that I am acutely aware of, and I still make an informed decision that pursuing this avenue is worthwhile:
I know that the number of dependencies thus tracked can be huge-ish. I believe the build tool can handle it.
I know that there are other kinds of dependencies for a docker image besides files in the build context. This is solved by also tracking those dependencies outside of docker build. Unlike files from the build context, those dependencies are either much fewer in number (i.e. files that the Dockerfile's RUN commands explicitly fetch from the internet), or the problem of obtaining an exhaustive list of such dependencies is already solved (e.g. dependencies obtained using a package manager like apt-get are modeled separately, and the installing RUNs are generated into the Dockerfile from the model).
Nothing is copied to the image unless you specifically say so. So, check your Dockerfile for COPY (and ADD) statements and you will know which files from the build context are added to the image.
Notice that, in the event you have a COPY . ., you might have a .dockerignore file in the build context with files you don't want to copy.
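For instance, a .dockerignore like the following (entries are only examples) would keep those paths out of the context and therefore out of any COPY . .:

# .dockerignore
.git
node_modules/
*.log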
I don't think what you're looking for would be useful even if it were possible. A list of all files in the previously built image wouldn't factor in new files, and it would be difficult to differentiate new files that affect the build from new files that would be ignored.
It's possible that you could parse the Dockerfile, extract every COPY and ADD command, run the current files through a hashing process to identify if they changed from the hash in the image history (you would need to match docker's hashing algorithm which includes details like file ownership and permissions), and then when that hash doesn't match you would know the build needs to run again. You could look at creating a custom buildkit syntax parser, or reuse the low level buildkit code to build your own context processor.
But before you spend too much time trying to implement the above code, realize that it already exists, as docker build. Rather than trying to avoid running a build, I'd focus on getting the build to utilize the build cache so new builds skip all unchanged steps, possibly generating the exact same image id.
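If the surrounding build system still needs a signal for whether anything changed, one hedged approach is to let docker build run against the cache and compare the resulting image ID with the previous one (image tag is a placeholder):

old_id=$(docker image inspect myimage:dev --format '{{.Id}}' 2>/dev/null || true)
new_id=$(docker build -q -t myimage:dev .)    # -q prints only the image ID
if [ "$old_id" != "$new_id" ]; then
  echo "image changed, re-run downstream steps"
fi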

In "eshoponcontainers", most of the dockerfiles have copy(all csproj) and restore, does it not make overhaed on container?

I can see this comment in each Dockerfile:
Keep the project list and the dotnet restore command identical in all Dockerfiles to maximize image cache utilization
But I have two points of confusion:
It would build fast (due to caching), but does it not take extra space in the container FS?
And in the future, if I add a new project to the solution, do I have to change every Dockerfile?
Here, the basket-api Dockerfile has COPY commands for the projects: https://github.com/dotnet-architecture/eShopOnContainers/blob/dev/src/Services/Basket/Basket.API/Dockerfile
The reason for doing this is to take advantage of Docker's layer caching. Yes, there is maintenance overhead to ensure that the list here reflects the actual set of project files that are defined in the repo.
The caching optimization comes into play for iterative development scenarios (inner loop). In those scenarios, you're typically making changes to source code files, not the project files. If you haven't changed any project files since you last built the Dockerfile, you get to take advantage of Docker's layer caching and skip the restoration of NuGet packages for those projects, which can be quite time consuming.
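The pattern looks roughly like this (a simplified, abridged sketch of the linked Dockerfile, with the project list shortened):

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /src
# copy only the project files first; this layer and the restore below stay cached
# until one of the .csproj files changes
COPY "Services/Basket/Basket.API/Basket.API.csproj" "Services/Basket/Basket.API/"
COPY "BuildingBlocks/EventBus/EventBus/EventBus.csproj" "BuildingBlocks/EventBus/EventBus/"
RUN dotnet restore "Services/Basket/Basket.API/Basket.API.csproj"
# source code changes only invalidate the layers from here on
COPY . .
RUN dotnet publish "Services/Basket/Basket.API/Basket.API.csproj" -c Release -o /app/publish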
Yes, there is a small amount of extra space included in the container image, because the project files end up getting copied twice, which results in duplicate data between the two layers. But that's fairly small, because project files aren't that big.
There are always tradeoffs. In this case, the desire is to have a fast inner loop build time at the expense of maintenance and a small amount of extra space in the container image.

Outsource source code in docker-compose to use minimal disk space

I am using Docker successfully in the dev environment and now want to use it in staging and prod too.
I am developing a web application with Symfony where the code is mounted locally into the Docker container. For staging and prod I want to "bake" the source code into the image, because there's no need to change it anymore at that point.
At the moment my services "php" and "nginx" need access to the src files. For staging/prod I would create an extra volume called "src" and mount it into both services. In one of the services (nginx/php) I would add a COPY command to copy the source code at build time into the mounted "src" volume.
The problem now is the following:
Whenever a new version of my code exists, the whole image has to be rebuilt ... the smallest image (nginx) has a size of 200 MB. So every time I want to update only my code (just 10 MB), the whole image (200 MB) has to be rebuilt ...
In addition, I want to check all builds into a repository.
That gets quite expensive time-wise ...
My thought is the following:
Is it possible to rebuild only the data volume "src" on each code update (triggered through a Jenkins build job) and check that in?
I think there is no need to rebuild rarely changing environments like php/nginx/mysql on every build ...
Or is there another approach?
Initially having 1.5 GB for all needed services is quite OK, but having another 200 MB in the repository for each version is too heavy.
Thanks
First, the approach you are following is definitely a bad practice. A Docker container should be portable and self-contained. Relying on data volumes that are bound to the host machine will make your container non-portable.
By design containers should package all of the dependencies needed to run the application. You should thus add the source to each image if the source code is a dependency that must be provided.
You should investigate other options to make the image size smaller. Depending on the programming language you are using, it may be possible to compile or compress the source code into a smaller artifact, for instance a binary, that can be copied into the image.
One final note is that using very different approaches to deploy between environments (dev/staging/prod) is usually a bad idea. It is much preferable to have similar deployment strategies to avoid unexpected errors.
If you set up your Dockerfile properly (see docs) so you are adding the code last, it should be a pretty quick operation to update as all the other unchanged layers will be cached. This is pretty common practice as part of a Docker workflow.
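A sketch of that ordering for the PHP service (the base image and paths are assumptions, not the OP's actual setup):

FROM php:8.2-fpm
WORKDIR /var/www/app
# rarely changing layers first: system packages, PHP extensions, etc.
RUN apt-get update && apt-get install -y git unzip && rm -rf /var/lib/apt/lists/*
# dependency manifests next, so vendor installs stay cached between code changes
COPY composer.json composer.lock ./
# (composer install would run here)
# frequently changing application code last
COPY . .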
You can use this same image for your local development and mount your working code over the code in the container for active development. As long as that exact same code is used to rebuild your images, you should maintain consistency. You could optimize further by choosing which parts of your code are likely to change and ordering your build accordingly.
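In development, the same image can then be combined with a bind mount that shadows the baked-in code, for example via a compose override (paths assumed):

# docker-compose.override.yml, applied automatically in dev
services:
  php:
    volumes:
      - ./:/var/www/app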
You may also want to look into multi-stage build process where you can further optimize your base image and reduce final image size.

Dockerfile vs Docker image

I'm working on creating some docker images to be used for testing on dev machines. I plan to build one for our main app as well as one for each of our external dependencies (postgres, elasticsearch, etc). For the main app, I'm struggling with the decision of writing a Dockerfile or compiling an image to be hosted.
On one hand, a Dockerfile is easy to share and modify over time. On the other hand, I expect that advanced configuration (customizing application property files) will be much easier to do in vim before simply committing a new image.
I understand that I can get to the same result either way, but I'm looking for PROS, CONS, and gotchas with either direction.
As a side note, I plan on wrapping this all together using Fig. My initial impression of this tool has been very positive.
Thanks!
Using a Dockerfile:
You have an 'audit log' that describes how the image is built. For me this is fundamental if it is going to be used in a production pipeline where more people are working and maintainability should be a priority.
You can automate the building process of your image, which makes it easy to keep the image up to date with system updates or to have it take part in a continuous delivery pipeline.
It is a cleaner way of creating the layers of your image (each Dockerfile command is a different layer).
Changing a container and committing the changes is great for testing purposes and for fast development for a conceptual test. But if you plan to use the result image for some time, I would definitely use Dockerfiles.
Apart from this, if you have to modify a file and doing it with bash tools (awk, sed...) turns out to be very tedious, you can add any file you wish from outside during the build process.
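To make the trade-off concrete, here is a hedged sketch of the two workflows side by side (image and file names are placeholders):

# snapshot approach: tweak a running container in vim, then commit it
docker run -it --name app-cfg myapp:base bash     # edit property files inside, then exit
docker commit app-cfg myapp:configured

# Dockerfile approach: the same customization, recorded as a repeatable recipe
#   FROM myapp:base
#   COPY application.properties /opt/app/config/
docker build -t myapp:configured .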
I totally agree with Javier, but you need to understand that an image created with a Dockerfile can differ from an image built with the same version of the Dockerfile one day later.
Maybe in your build process you automatically retrieve the latest updates of an app or the OS, etc.
And at that point, if you need to reproduce a crash or whatever, you can't rely on the Dockerfile.
