I'm wondering how people go about using docker for both production and development. In development I want to mount my source/build files to be able to quickly and easily make changes. For production, I want to include the source/build files in the image.
How is this typically done and is there a best or more common practice?
In my mind, ideally I would have one Dockerfile that uses something like a flag or environment variable to set up a prod or dev image, but I can't find any examples of people doing this and I am not sure exactly how it would be done.
I've also seen a few rough examples of projects with unique Dockerfiles for production and development, but then there is the issue of maintaining separate Dockerfiles which could diverge over time if we aren't careful.
Are both of these sensible or am I possibly misunderstanding something? I'm relatively new to Docker. An example Dockerfile or project utilizing a similar setup would be great. I'm wary of dockerizing some of our services with bad practices at the start.
Edit: All my current apps are Python-based, if that affects any responses.
A good approach is to use Docker's multi-stage builds, since they allow you to build your artifact inside an image that contains your dev dependencies and ship only the artifact and its runtime dependencies in the final image.
I'd generally advise against using different Dockerfiles for different environments; differences between environments should normally be handled through configuration parameters (environment variables are a good solution).
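To make the environment-variable idea concrete for a Python app, here is a minimal sketch of one way the application could pick its settings at startup. The variable names APP_ENV and DATABASE_URL, the Settings fields, and the placeholder values are all assumptions made for illustration; they are not something Docker prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    debug: bool
    database_url: str


# Hypothetical per-environment defaults; the values are placeholders.
DEFAULTS = {
    "dev": Settings(debug=True, database_url="sqlite:///dev.db"),
    "prod": Settings(debug=False, database_url="sqlite:///prod.db"),
}


def load_settings(environ=None):
    """Pick settings from APP_ENV, letting DATABASE_URL override the default."""
    environ = os.environ if environ is None else environ
    # Default to "prod" so a container started without configuration is not in debug mode.
    env_name = environ.get("APP_ENV", "prod")
    if env_name not in DEFAULTS:
        raise ValueError(f"unknown APP_ENV: {env_name!r}")
    base = DEFAULTS[env_name]
    return Settings(
        debug=base.debug,
        database_url=environ.get("DATABASE_URL", base.database_url),
    )
```

The same image can then be started with a different APP_ENV in each environment, for example through docker run -e or the environment section of a compose file.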
How fast the development feedback cycle can be depends, IMO, heavily on the language used. Many people develop in an IDE in a more classic fashion and only build the image for integration testing and so on (in my experience this is often the case for Java developers, for example). Interpreted languages, on the other hand, might indeed profit from mounting your sources into a development environment image. In this case, you might incorporate this image into your multi-stage build.
In addition to the great answer by Kevin Wittek, it turns out this may have been more of a non-issue.
I can use COPY to copy the files into the image, and simply mount over the top of them for development.
If there are other things, like dev-only deps, that need to get installed, I can use ARG to specify the environment, for example ARG APP_ENV=prod, and override that with --build-arg APP_ENV=dev, or vice versa.
I'm new to Docker so I want to find best practices for my specific problem.
PROBLEM:
I have 6 Python web-scraping scripts that use the same libraries (same requirements.txt).
My scripts need frequent updating (a few times per week).
Also, my scripts read from and write to Excel files, and I need to be able to update those Excel files from time to time.
SOLUTIONS?
Do I really need 6 images and 6 containers even though my containers will have the same libraries? I find it time consuming to delete the container and image every time I update my code.
For accessing my Excel files, I read about VOLUMES and I intend to implement them. Is that a good solution?
Do I really need 6 images and 6 containers even though my containers will have the same libraries?
It depends on technical possibility and personal preference. If you find a good, maintainable way to run all scripts in one Docker container, there's no reason you cannot do it. You could easily use a cron-like solution such as this image.
There are advantages to keeping Docker images single-purpose, though. One of them is clear isolation. If one of your scripts fails to run, you'll have one failing container only and five others that still run successfully. Plus you have full transparency over what exactly fails where.
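One middle ground between the two options is a single image that contains all the scripts plus a small dispatcher, so that each container still runs exactly one job. The sketch below assumes this layout; the job names and the two stand-in functions are hypothetical, and in a real project each registry entry would call one of the six scraping scripts.

```python
import sys


def scrape_site_a():
    # Stand-in for one real scraping script.
    return "scraped site A"


def scrape_site_b():
    # Stand-in for another real scraping script.
    return "scraped site B"


# Hypothetical registry mapping a job name to the function that runs it.
JOBS = {
    "site_a": scrape_site_a,
    "site_b": scrape_site_b,
}


def run_job(name):
    """Run one named job and return its result; unknown names raise KeyError."""
    if name not in JOBS:
        raise KeyError(f"unknown job {name!r}; choose from {sorted(JOBS)}")
    return JOBS[name]()


def main(argv):
    """Return a process exit code: 0 on success, 1 for an unknown job, 2 for bad usage."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <job>", file=sys.stderr)
        return 2
    try:
        print(run_job(argv[1]))
    except KeyError as exc:
        print(exc, file=sys.stderr)
        return 1
    return 0
```

With sys.exit(main(sys.argv)) as the entry point, you would build the image once and start one container per job by passing a different job name as the command, which keeps the isolation described above without maintaining six images.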
I find it time consuming to delete container and image every time I update my code.
I would suggest using a CI pipeline for things like this. The pipeline would automatically build the images on a push, publish them to a registry and recreate the containers/services on your server.
For accessing my Excel files, I read about VOLUMES and I intend to implement them. Is that a good solution?
Yes, that's what volumes were made for: Accessing and storing data that isn't part of your image.
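On the script side, it helps if the location of the Excel files is not hard-coded, so the same code works with whatever path the volume is mounted at. A minimal sketch, assuming an environment variable named DATA_DIR and a default mount point of /data (both are made-up names for illustration):

```python
import os
from pathlib import Path


def data_dir(environ=None):
    """Return the directory where the volume is expected to be mounted."""
    environ = os.environ if environ is None else environ
    return Path(environ.get("DATA_DIR", "/data"))


def workbook_path(name, environ=None):
    """Build the full path of a workbook inside the data directory."""
    # Use only the final path component so a name like "../x" cannot escape the directory.
    return data_dir(environ) / Path(name).name
```

A script would then open workbook_path("prices.xlsx") with whichever Excel library it already uses, and updating a file on the host side of the volume needs no image rebuild.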
I am using Docker containers and have docker-compose files for both local development and the production environment. I want to try Google Cloud Platform for my new app, specifically Google Kubernetes Engine. My tooling is Docker for Mac with Kubernetes on my local machine.
It is super important for developers to be able to change code and to see changes live for local development.
Use cases:
A backend developer makes changes to a basic Flask API (or whatever you use) and should see the changes in the reloaded app immediately.
A frontend developer makes changes to the HTML layout and should see the changes on the web page immediately.
At the moment I am using docker-compose files to mount source code into local containers. But Kubernetes does not support relative paths for mounting the source code.
Ideally I should be able to set the variable
Deployment.spec.template.spec.volumes.hostPath
as a relative path to my repo. For example, in our team developers clone the repo to these folders:
/User/BACKEND_developer/code/project_repo
/User/FRONTEND_developer/code/project_repo
Obviously you can't commit and build the image after every little change to the source code.
So what is the best practice for local development with Kubernetes? Do I need some additional tools to modify the .yaml files for every developer?
@tgogos is right.
The best way to achieve your goal is to use Skaffold.
It will rebuild the container whenever it sees changes in the source code.
Skaffold has a pluggable architecture that allows you to choose the tools in the developer workflow that work best for you.
A very promising approach for dynamic languages is the hybrid approach recently introduced by Skaffold, which lets you take advantage of the usual auto-reload mechanisms. You can define two sets of files:
Changing a file in the first set triggers the full rebuild+push+deploy mechanism.
Changing a file in the second set only syncs the file between the local machine and the container.
Such a hybrid approach is well suited to a large class of technology stacks, like Node.js, React, Angular, Python, where you can use the native hot-reload mechanism for source code changes, and trigger the full rebuild only when it’s needed (for example, adding a dependency). This helps a lot in keeping the latency low.
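The decision between the two sets can be pictured with a small Python sketch. To be clear, this is not Skaffold's implementation or its configuration format; it is only a toy model of the idea, and the file patterns are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical patterns, for illustration only.
REBUILD_PATTERNS = ["Dockerfile", "requirements.txt"]
SYNC_PATTERNS = ["*.py", "*.html"]


def classify_change(path):
    """Return "rebuild", "sync" or "ignore" for a changed file path."""
    # Match on the file name only, ignoring the directory part.
    name = path.rsplit("/", 1)[-1]
    if any(fnmatch(name, pattern) for pattern in REBUILD_PATTERNS):
        return "rebuild"
    if any(fnmatch(name, pattern) for pattern in SYNC_PATTERNS):
        return "sync"
    return "ignore"


def plan(changed_paths):
    """One rebuild-triggering file forces a full rebuild; otherwise sync what matches."""
    kinds = {path: classify_change(path) for path in changed_paths}
    if "rebuild" in kinds.values():
        return ("rebuild", [])
    return ("sync", [path for path, kind in kinds.items() if kind == "sync"])
```

For example, plan(["app/views.py"]) only asks for a file sync, while adding requirements.txt to the changed files switches the plan to a full rebuild.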
I spoke about this in my recent talk at All Day DevOps. There is an example based on Node.js here.
I am using Docker successfully in my dev environment and now want to use it in staging and prod too.
I am developing a web application with Symfony where the code is mounted locally into the Docker container. For staging and prod I want to "bake" the source code into the image, because there's no need to change it anymore at that point.
At the moment my services "php" and "nginx" need access to the src files. For staging/prod I would create an extra volume called "src" and mount it into both services. In one of the services (nginx/php) I would add a COPY command to copy the src code into the mounted "src" volume on build.
The problem now is the following:
Whenever a new version of my code exists, the whole image has to be rebuilt ... the smallest image (nginx) has a size of 200MB. So every time I want to update only my code (size just 10MB), the whole image (200MB) has to be rebuilt ...
In addition, I want to check all builds into a repository.
That is quite expensive in terms of time ...
My thought is the following:
Is it possible to rebuild only the data volume "src" on each code update (triggered through a Jenkins build job) and check it in?
I think there is no need to rebuild rarely changed environments like php/nginx/mysql on every build ...
Or is there another approach?
Initially having 1.5GB for all needed services is quite OK, but having another 200MB in the repository for each version is too heavy.
Thanks
First, the approach you are following is definitely a bad practice. A Docker container should be portable and self-contained. Relying on data volumes that are bound to the host machine will make your container non-portable.
By design containers should package all of the dependencies needed to run the application. You should thus add the source to each image if the source code is a dependency that must be provided.
You should investigate other options to make the image size smaller. Depending on the programming language you are using, it is possible to compile/compress the source code and have a smaller binary for instance that can be copied into the image.
One final note is that using very different approaches to deployment across environments (dev/staging/prod) is usually a bad idea. It is much preferable to have similar deployment strategies to avoid unexpected errors.
If you set up your Dockerfile properly (see docs) so you are adding the code last, it should be a pretty quick operation to update as all the other unchanged layers will be cached. This is pretty common practice as part of a Docker workflow.
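Why the ordering matters can be shown with a toy model in Python. This is a deliberately simplified illustration, not Docker's actual cache algorithm: it assumes each layer's cache key depends on its own instruction, the content it brings in, and the key of the layer before it. The step names and contents are made up.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per step; each key depends on all earlier steps."""
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def reused_layers(old_steps, new_steps):
    """Count how many leading layers could be reused from the cache."""
    count = 0
    for old, new in zip(layer_keys(old_steps), layer_keys(new_steps)):
        if old != new:
            break
        count += 1
    return count
```

In this model, a build whose last step is the code copy reuses every earlier layer when only the code changes, whereas changing an early step such as the dependency list invalidates everything after it.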
You can use this same image for your local development and mount your working code over the code in the container for active development. As long as that exact same code is used to rebuild your images, you should maintain consistency. You could optimize further by choosing which parts of your code are likely to change and order your build accordingly.
You may also want to look into multi-stage builds, where you can further optimize your base image and reduce the final image size.
I'm working on creating some docker images to be used for testing on dev machines. I plan to build one for our main app as well as one for each of our external dependencies (postgres, elasticsearch, etc). For the main app, I'm struggling with the decision of writing a Dockerfile or compiling an image to be hosted.
On one hand, a Dockerfile is easy to share and modify over time. On the other hand, I expect that advanced configuration (customizing application property files) will be much easier to do in vim before simply committing a new image.
I understand that I can get to the same result either way, but I'm looking for PROS, CONS, and gotchas with either direction.
As a side note, I plan on wrapping this all together using Fig. My initial impression of this tool has been very positive.
Thanks!
Using a Dockerfile:
You have an 'audit log' that describes how the image is built. For me this is fundamental if it is going to be used in a production pipeline where more people are working and maintainability should be a priority.
You can automate the build process of your image, which gives you an easy way to update the container with system updates or to make it part of a continuous delivery pipeline.
It is a cleaner way of creating the layers of your container (each Dockerfile command is a different layer).
Changing a container and committing the changes is great for testing purposes and for fast development of a proof of concept. But if you plan to use the resulting image for some time, I would definitely use Dockerfiles.
Apart from this, if you have to modify a file and doing it with shell tools (awk, sed...) proves very tedious, you can add any file you wish from outside during the build process.
I totally agree with Javier, but you need to understand that an image created from a Dockerfile can differ from an image built from the same version of the Dockerfile one day later.
Maybe your build process automatically retrieves the latest updates of an app or the OS, etc.
In that case, if you need to reproduce a crash or whatever, you can't rely on the Dockerfile alone.