How to speed up file change propagation from the host into a Docker container?

My host is macOS with Docker Desktop. I have a Debian container in which a PHP application is running. Parts of the PHP application are baked into the Docker image; the parts I am still working on are shared with the host through a volume. Think of
docker run -td --name my-app -v /Users/me/mycode:/var/www/html/phpApp/variableParts <image>
My problem: When I save a change on the host it takes some 10-15 seconds until this change becomes available to the containerized app. So (1) after every save it takes (too) long waiting for the code to be available and (2) I cannot be sure whether I already see the new code running or still the old one.
My problem is not that the execution of the application is slow (as some sources in the web suggest), in fact it is quite fast. My problem is that the time for the change to propagate from the host to the docker container is too long. Earlier I developed and had the code from the remote server NFS-mounted on my developing machine and there it was blazing fast.
Is there any way I can reasonably speed this up? Or does a different workflow make more sense? Would mounting the code parts I want to edit from the container (as NFS server) to the host (where the editor runs) make sense?
My workflow consists of many small adaptations to the PHP code, so waiting 10-15 seconds after every edit is a no-go.

I have used Docker on Mac, and have seen edits to a bind mount propagate to the Docker container in under a second, so I think Docker is not to blame here.
Instead, I would look at any caching that PHP is doing. Is PHP reloading your code from disk on every page view, or does it cache it? For example, the opcache feature of PHP keeps a pre-compiled version of your PHP code in memory, and occasionally checks if that version is still up to date. Take a look at your php.ini, and in particular what opcache.revalidate_freq is set to.
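For development you generally want the opcache to re-check files on every request. A minimal php.ini sketch (the directive names are standard opcache settings; the values are just one reasonable dev configuration):
opcache.enable=1
; keep checking scripts on disk for changes
opcache.validate_timestamps=1
; re-check on every request instead of only every N seconds
opcache.revalidate_freq=0
If opcache.validate_timestamps is off, or opcache.revalidate_freq is set to something like 10, edits will appear to lag by that long even though the bind mount propagated them immediately.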

Related

What's the best way to cache downloads done by docker while building containers?

While testing new Docker builds (modifying the Dockerfile), it can take quite some time for the image to rebuild due to the huge download size (either directly via wget, or indirectly via apt, pip, etc.).
One way around this that I personally use often is to split the commands I plan to modify into their own RUN instructions, as sketched below. This avoids re-downloading some parts because the previous layers are cached. This, however, doesn't cut it if the command that requires "tuning" is early on in the Dockerfile.
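For illustration, a hedged Dockerfile sketch of that split (the base image, packages, and URL are placeholders):
FROM debian:bookworm
# Heavy, rarely-changing downloads sit in their own layer so the build cache can reuse them...
RUN apt-get update && apt-get install -y --no-install-recommends build-essential curl ca-certificates
# ...while the command still being tuned lives in a later, cheap-to-rebuild layer.
RUN curl -fsSL -o /tmp/some-tool.tar.gz https://example.com/some-tool.tar.gz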
Another solution is to use an image that already contains most of the required packages so that it would just be pulled once and cached, but this can come with unnecessary "baggage".
So is there a straightforward way to cache all downloads done by Docker while building/running? I'm thinking of running Memcached on the host machine, but it seems like overkill. Any suggestions?
I'm also aware that I can test in an interactive shell first, but sometimes you need to test the Dockerfile itself and make sure it's production-ready (including arguments and defaults), especially if the only way you will ever see what's going on after that point is ELK or cluster crash logs.
This question:
https://superuser.com/questions/303621/cache-for-apt-packages-in-local-network
is the same question, but for a local network instead of a single machine. The answer can still be used in this scenario; a single machine is actually a simpler case than a network with multiple machines.
If you install Squid locally you can use it to cache all your downloads including your host-side downloads.
But more specifically, there's also a Squid Docker image!
Heads-up: if you run Squid as a service in a docker-compose file, don't forget to use the Squid service name instead of Docker's subnet gateway, i.e. 172.17.0.1:3128 becomes squid:3128.
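A hedged docker-compose sketch of that setup (the Squid image name and cache path are assumptions; check the image you actually use):
services:
  squid:
    image: ubuntu/squid              # assumed image; any Squid image listening on 3128 works
    volumes:
      - squid-cache:/var/spool/squid # persist the download cache between runs
  app:
    build: .
    environment:
      # use the service name, not Docker's gateway: 172.17.0.1:3128 becomes squid:3128
      http_proxy: http://squid:3128
      https_proxy: http://squid:3128
    depends_on:
      - squid
volumes:
  squid-cache: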
The way I did this was:
I used the new BuildKit cache mount, --mount=type=cache,target=/home_folder/.cache/curl,
wrote a script which checks the cache before calling curl (a caching wrapper around curl),
and called that script in the Dockerfile during build.
It is more of a hack, but it works.
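A hedged Dockerfile sketch of that cache-mount approach (the image, paths, and URL are placeholders; requires BuildKit):
# syntax=docker/dockerfile:1
FROM debian:bookworm
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates
# The cache mount survives across builds, so the file is only downloaded when it is missing.
RUN --mount=type=cache,target=/root/.cache/curl \
    [ -f /root/.cache/curl/tool.tar.gz ] \
      || curl -fsSL -o /root/.cache/curl/tool.tar.gz https://example.com/tool.tar.gz; \
    cp /root/.cache/curl/tool.tar.gz /opt/tool.tar.gz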

Is there a way to set the "--rm" option for a docker container deployed in a GCP compute instance?

I'm admittedly very new to Docker so this might be a dumb question but here it goes.
I have a Python ETL script that I've packaged in a Docker container, essentially following this tutorial. Then, using Cloud Functions and Cloud Scheduler, I have the instance start every hour, run the sync, and then shut the instance down.
I've run into an issue though where after this process has been running for a while the VM runs out of hard drive space. The script doesn't require any storage or persistence of state - it pulls any state data from external systems and only uses temporary files which are supposed to be deleted when the machine shuts down.
This has caused particular problems where updates I make to the script stop working because the machine doesn't have the space to download the latest version of the container.
I'm guessing it's either logs or perhaps files created automatically to try to persist the state - either within the Docker container or on the VM.
I'm wondering whether if I could get the VM to run the instance with the "--rm" flag so that the image was removed when it was finished this could solve this problem. This would theoretically guarantee that I'm always starting with the most recent image.
The trouble is, I can't for the life of me find a way to configure the "--rm" option within the instance settings, and the documentation for container options only covers passing arguments to the container ENTRYPOINT, not the docker run options (docker run [OPTIONS] IMAGE [COMMAND] [ARG...]).
I feel like I'm either missing something obvious or it's not designed to be used this way. Is this something that can be configured in the Dockerfile or is there a different way I have to set up the VM in the first place?
Basically I just want the docker image to be pulled fresh and run each time and not leave any remnants on the VM that will slowly run out of space.
Also, I know Cloud Run might work in some similar situations but I need the script to be able to run for as long as it needs to (particularly at the start when it's backfilling data) and so the 15 minute cap on runtime would be a problem.
Any suggestions would be appreciated!
Note: I'm posting this as an answer because I need more space than a comment allows. If anyone feels it is not a good answer and wants it deleted, I will be delighted to do so.
Recapping the story: we have a Compute Engine instance configured to start a Docker container. The instance runs the container and then we stop it. An hour later we restart it, let it run, and then we stop it again. This continues on into the future. What we seem to find is that the disk associated with the Compute Engine instance fills up and things eventually break. The thinking is that the container within the Compute Engine instance is created at the first launch of the instance and is then "re-used" on each restart, as opposed to a brand new container instance being created. This means that resources consumed by the container from one run to the next (e.g. disk storage) continue to grow.
What we would like to happen is that when the Compute Engine starts, it will always create a brand new instance of the container with no history / resource usage of the past. This means that we won't consume resources over time.
One way to achieve this outside of GCP would be to start the container through Docker with the "--rm" flag. This means that when the container ends, it will be auto-deleted and hence there will be no previous container to start the next time the Compute Engine starts. Again ... this is a recap.
If we dig into how GCP Compute Engine instances work as they relate to containers, we come across a package called "Konlet". This is the package responsible for loading the container on the Compute Engine instance, and it appears to itself be a Docker container application written in Go. It appears to read the metadata associated with the Compute Engine instance and, based on that, to perform API calls to Docker to launch the target container. The first thing to see from this is that the launch of the target Docker container does not appear to be executed through a simple docker command line. This then implies that we can't "simply" edit a script.
Konlet is open source so in principle, we could study it in detail and see if there are special flags associated with it to achieve the equivalent of --rm. However, my immediate recommendation is to post an issue at the Konlet GitHub site and ask the author whether there is a --rm equivalent option for Konlet and, if not, could one be added (and if not, what is the higher level thinking).
In the meantime, let me offer you an alternative. If I am hearing you correctly, every hour you fire a job to start a Compute Engine instance, do the work, and then shut the instance down. This instance hosts your "leaky" Docker container. What if, instead of starting/stopping your Compute Engine instance, you created/destroyed it? While the creation/destruction steps may take a little longer to run, given that you are running this once an hour, a minute or two of extra delay might not be egregious.
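A rough sketch of that alternative with the gcloud CLI (the instance name, zone, project, and image are placeholders, and the completion check is left out):
# Create a brand new VM that runs the container...
gcloud compute instances create-with-container etl-job-vm \
    --zone=us-central1-a \
    --container-image=gcr.io/my-project/etl-job:latest
# ...wait for the job to signal that it has finished, then throw the whole VM away.
gcloud compute instances delete etl-job-vm --zone=us-central1-a --quiet
Because the VM and its boot disk are recreated every hour, nothing accumulates between runs.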

What's the purpose of the node-modules container in wolkenkit?

That container is built when deploying the application.
Looks like its purpose is to share dependencies across modules.
It looks like it is started as a container but nothing is apparently running, a bit like an init container.
The console says it starts and stops that component when running the respective wolkenkit start and wolkenkit stop commands. Yet when you run docker ps, that container cannot be found.
Can someone explain these components?
When starting a wolkenkit application, the application is boxed in a number of Docker containers, and these containers are then started along with a few other containers that provide the infrastructure, such as databases, a message queue, ...
The reason why the application is split into several Docker containers is because wolkenkit builds upon the CQRS pattern, which suggests separating the read side of an application from the application's write side, and hence there is one container for the read side, and one for the write side (actually there are a few more, but you get the picture).
Now, since you may develop on an operating system other than Linux, the wolkenkit application may run under a different operating system than the one you develop on, as within Docker it's always Linux. This means that the start command cannot simply copy the node_modules folder into the containers, as it may contain binary modules that would then not be compatible (imagine installing on Windows on the host, but running on Linux within Docker).
To avoid issues here, wolkenkit runs an npm install when starting the application inside of the containers. The problem now is that if wolkenkit did this in every single container, the start would be super slow (it's not the fastest thing on earth anyway, due to all the Docker building and starting that's happening under the hood). So wolkenkit tries to optimize this as much as possible.
One concept here is to run npm install only once, inside of a container of its own. This is the node-modules container you encountered. This container is then linked as a volume to all the containers that contain the application's code. This way you only have to run npm install once, but multiple containers can use the outcome of this command.
Since this container now contains data, but no code, it only has to be there, it doesn't actually do anything. This is why it gets created, but is not run.
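The general shape of that pattern with plain Docker commands (a sketch only; the image names and paths are assumptions, not wolkenkit's actual commands):
# A container whose only job is to hold /app/node_modules as a volume.
docker create -v /app/node_modules --name node-modules alpine:3 true
# Run npm install once, writing into that shared volume.
docker run --rm --volumes-from node-modules -v "$PWD":/app -w /app node:20 npm install
# Every application container reuses the same node_modules without installing again.
docker run --rm --volumes-from node-modules -v "$PWD":/app -w /app node:20 node app.js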
I hope this makes it a little bit clearer, and I was able to answer your question :-)
PS: Please note that I am one of the core developers of wolkenkit, so take my answer with a grain of salt.

Avoiding a constant "stop, build, up" loop when developing with Docker locally

I've recently delved into the wonder that is Docker and have set up my containers using docker-compose (an Express app and a MySQL DB).
It's been great for sharing the project with the team and pushing to a VPS, but one thing that's fast becoming tedious is the need to stop the running app, run docker-compose build, then docker-compose up any time there are changes to the code (which I believe is also creating numerous unnecessary images?).
I've searched around but haven't found a clear-cut way to get around this, short of ditching docker-compose locally and using docker run to run the Express app pointing at a local DB (which would do away with a lot of the easy-setup perks that come with Docker, such as building the DB from scratch).
Is there a Nodemon-style way of working with Docker (images/containers get updates automatically when code changes)? Is there something obvious I'm missing? Or is my approach the necessary "evil" that comes with working on a Dockerised app?
You can bind mount your source directory into the container for local development. Any changes on the host will then be reflected in the container. https://docs.docker.com/storage/volumes/
You might consider separate services for deployment/development. I usually have a separate service which mounts the source directory and installs test dependencies inside the container.
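A hedged docker-compose sketch of such a development service (service names, paths, and the use of nodemon are assumptions about your project):
services:
  app:
    build: .
    command: npx nodemon server.js   # assumes nodemon is a dev dependency; restarts on changes
    volumes:
      - ./src:/usr/src/app/src       # host edits are visible in the container immediately
    environment:
      NODE_ENV: development
With this in place you only rebuild when dependencies or the Dockerfile change, not on every code edit.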

Is it possible/sane to develop within a container Docker

I'm new to Docker and was wondering if it was possible (and a good idea) to develop within a docker container.
I mean: create a container, execute bash, install and configure everything I need, and start developing inside the container.
The container becomes then my main machine (for CLI related works).
When I'm on the go (or when I buy a new machine), I can just push the container, and pull it on my laptop.
This solves the problem of having to keep and synchronize your dotfiles.
I haven't started using Docker yet, so is this realistic, or something to avoid (disk space problems and/or push/pull timing issues)?
Yes, it is a good idea, with the correct set-up. You'll be running code as if it were in a virtual machine.
The Dockerfile configuration for creating a build system is not polished and will not expand shell variables, so pre-installing applications may be a bit tedious. On the other hand, after building your own image to create the users and working environment, it won't be necessary to build it again, plus you can mount your own file system with the -v parameter of the run command, so you can have the files you need both on your host and in the container. It's versatile.
sudo docker run -t -i -v /home/user_name/Workspace/project:/home/user_name/Workspace/myproject <image>
I'll play the contrarian and say it's a bad idea. I've done work where I've tried to keep a container "long running" and have modified it, but then accidentally lost it or deleted it.
In my opinion containers aren't meant to be long running VMs. They are just meant to be instances of an image. Start it, stop it, kill it, start it again.
As Alex mentioned, it's certainly possible, but in my opinion goes against the "Docker" way.
I'd rather use VirtualBox and Vagrant to create VMs to develop in.
A Docker container for development can be very handy. Depending on your stack and preferred IDE you might want to keep the editing part outside, on the host, and instead mount the directory with the sources from the host into the container, as per Alex's suggestion. If you do so, beware of potential performance issues on macOS with boot2docker.
I would not expect much from the workflow of pushing images to sync between dev environments. IMHO, keeping Dockerfiles together with the code and syncing via SCM is a more straightforward direction to start with. I also keep supporting Makefiles in the same place to build the image(s) and run the container(s).
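A minimal sketch of such a Makefile (image name, paths, and the mount target are placeholders; recipe lines must be tab-indented):
IMAGE := myapp:dev

build:
	docker build -t $(IMAGE) .

run: build
	docker run --rm -it -v $(CURDIR):/usr/src/app -w /usr/src/app $(IMAGE)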
