How to preserve docker cache on disconnected systems?

Synopsis. A remote instance gets connected to the Internet via satellite modem when a technician visits the cabin. The technician sets up the application stack via docker compose and leaves the location. The location has no internet connection and periodically loses electricity (once every few days).
The application stack is typical, like mysql + nodejs, and it is used by "polar bears". I mean nobody; it is a monitoring app.
How do I ensure that the docker images are persisted for an indefinite amount of time and that the compose stack survives endless reboots?

Unfortunately there is no really easy solution.
But with a little bit of yq magic to parse docker-compose.yaml and the docker save command, it is possible to store the images locally in a specific location.
Then we can add a startup script that imports those images back into the local docker cache using docker load.
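A minimal sketch of both halves, assuming yq v4 syntax, that every service in docker-compose.yaml declares an explicit image: key, and a backup directory of /var/lib/docker-image-backup (all of these details are assumptions, not part of the original answer):

#!/bin/sh
# backup-images.sh: run once while the satellite link is up.
IMAGE_DIR=/var/lib/docker-image-backup
mkdir -p "$IMAGE_DIR"
yq '.services[].image' docker-compose.yaml | while read -r image; do
  # Turn e.g. "mysql:8.0" into a filesystem-safe archive name.
  docker save -o "$IMAGE_DIR/$(echo "$image" | tr '/:' '__').tar" "$image"
done

#!/bin/sh
# restore-images.sh: run at boot (e.g. from a systemd unit or an @reboot
# cron job) before the stack starts, so no registry access is ever needed.
for archive in /var/lib/docker-image-backup/*.tar; do
  docker load -i "$archive"
done
docker compose up -d

For the reboots themselves, giving every service restart: unless-stopped in the compose file should make the stack come back on its own once the docker daemon starts.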

Related

How to speed up file changes from the host into a docker container?

My host is macOS with Docker Desktop. I have a Debian container in which a PHP application is running. Some parts of the PHP application are baked into the docker image; the parts I am still working on are shared with the host through a volume. Think of
docker run -td --name my-app -v /Users/me/mycode:/var/www/html/phpApp/variableParts
My problem: when I save a change on the host, it takes some 10-15 seconds until that change becomes available to the containerized app. So (1) after every save I wait (too) long for the code to become available, and (2) I cannot be sure whether I am already seeing the new code running or still the old one.
My problem is not that the execution of the application is slow (as some sources on the web suggest); in fact it is quite fast. My problem is that the time for a change to propagate from the host to the docker container is too long. Earlier I developed with the code NFS-mounted from the remote server onto my development machine, and there it was blazing fast.
Is there any way I can reasonably speed this up? Or does a different workflow make more sense? Would mounting the code parts I want to edit from the container (as an NFS server) to the host (where the editor runs) make sense?
My workflow consists of many small adaptations to the PHP code, so waiting 10-15 seconds after every edit is a no-go.
I have used Docker on Mac and have seen edits to a bind mount propagate to the Docker container in under a second, so I don't think Docker is to blame here.
Instead, I would look at any caching that PHP is doing. Is PHP reloading your code from disk on every page view, or does it cache it? For example, the opcache feature of PHP keeps a pre-compiled version of your PHP code in memory and occasionally checks whether that version is still up to date. Take a look at your php.ini, and in particular at what opcache.revalidate_freq is set to.
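To illustrate, this is how those settings could be inspected and relaxed for development. The container name my-app comes from the question's docker run line; the conf.d path is an assumption that holds for the official PHP images:

# Inspect the current opcache settings inside the container.
docker exec my-app php -i | grep -E 'opcache\.(enable|validate_timestamps|revalidate_freq)'

# For development, make opcache re-check file timestamps on every request,
# so edits on the bind mount show up immediately. Restart the PHP service
# afterwards so the new ini file is loaded.
docker exec my-app sh -c 'cat > /usr/local/etc/php/conf.d/zz-dev-opcache.ini <<EOF
opcache.validate_timestamps=1
opcache.revalidate_freq=0
EOF'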

Run a Docker service multiple times in parallel to take advantage of a computer with multiple CPU cores

I have a small Python application which is packaged in a compose file as an app and a db service.
The job of the app service is to run some spatial computations using data from the db (PostgreSQL) and write some results back into that same database, along with some files on disk. For the latter, I'm using a bind mount as a volume, so that the files are saved on the host machine.
The problem I'm facing is that, based on a sample dataset, I estimated the time to finish the computations on all the records of the database at roughly 1 year...
I also noticed that the Python scripts of the app only use one CPU core at a time. This is fine, because I'm not used to parallel programming, and also because I rely on a third-party piece of software to run some analyses, and that software is also single-threaded.
On the other hand, I have access to a machine with many CPU cores (60 of them). I noticed that each time I start my compose file, only one CPU core is active.
Hence my naive question: could I take advantage of the dockerization to run the same app service as many times as there are available CPU cores on that machine (or a bit fewer, maybe)?
Please note that the db service can only exist once and must be shared by these multiple identical app services.
If yes, how do I do that properly and efficiently?
I was thinking of "copy-pasting" the app service 50 times in the compose file, giving it a different name each time, but this is probably awfully ugly(!). There should be better ways of doing that... From the host machine, maybe? Any hints are appreciated. I'm not a docker expert.
In short, this is possible by using the --scale option of docker-compose up:
docker-compose up --scale app=50 app
Doc: https://docs.docker.com/compose/reference/up/
This will start 50 instances of the app service.
Of course, the application must be designed to run in parallel if it accesses a single shared database, in order to avoid trouble.
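One caveat worth showing: a service can only be scaled if it does not set a container_name or map a fixed host port, since 50 replicas cannot share either. A minimal scale-friendly compose file, written here as a shell sketch (the postgres tag, volume name, and build context are illustrative assumptions):

# Create a compose file whose app service carries no container_name
# and no host port mapping, so --scale can start many replicas.
cat > docker-compose.yml <<'EOF'
services:
  app:
    build: .
    depends_on:
      - db
  db:
    image: postgres:13
    volumes:
      - db_data:/var/lib/postgresql/data
volumes:
  db_data:
EOF
docker-compose up -d --scale app=50

Each replica then needs a way to claim a disjoint chunk of work, e.g. by selecting rows with PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED, so that two instances never process the same record.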
Versioning information on Ubuntu 18.04 (`5.4.0-81-generic x86_64 GNU/Linux`):
$ docker-compose --version
docker-compose version 1.27.4, build 40524192
$ docker --version
Docker version 20.10.8, build 3967b7d

Docker remote volume and remote machine performance

I am planning a setup where the docker containers use a remote volume: a volume that is SSH-mounted from another machine and is being read all the time.
Let's say we have 5 containers using that remote volume. In my understanding, docker SSHes into the remote machine and constantly reads a certain directory (with about 100 files, none more than a few MB).
Presumably that constant reading will put some load on the remote machine. Will that load be significant, or can it be considered negligible? There are php-fpm and Apache2 on the remote machine; will the constant reading slow that web server down? Also, how often does the volume refresh the files?
Sincerely.
OK, after some testing:
I created a remote volume with the vieux/sshfs driver.
Created an ubuntu container with the volume mounted under a certain folder.
Then tailed a txt file from inside the container.
Wrote to that txt file from the remote machine (the one that contains the physical folder). The rough commands are sketched below.
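Roughly, the test setup above can be reproduced like this (host, user, path, password, and file name are placeholders, not values from the original post):

# Install the sshfs volume plugin and create the remote volume.
docker plugin install --grant-all-permissions vieux/sshfs
docker volume create -d vieux/sshfs \
  -o sshcmd=user@remote-host:/srv/shared \
  -o password=changeme \
  remote_vol

# Tail a file in the mounted folder from inside an ubuntu container.
docker run --rm -it -v remote_vol:/mnt ubuntu tail -f /mnt/thefile.txt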
I found that if we write to the file continuously (like echo "whatever" >> thefile.txt), the changes appear all at once after a few seconds, not one by one as they are introduced. Also, if I print or list the files in the mounted directory, the response is instant. This makes me think that Docker makes a local copy of the folder SSH-ed into the volume and refreshes it every 5 seconds or so. Basically negligible load once the folder has been copied.
Also, when writing from the container to the mounted folder, the changes to the file are reflected almost instantly (allowing for some latency), which makes me think that the daemon propagates write changes immediately.
In conclusion: reading a remote folder puts negligible load on the remote machine. The plan is to use such a setup in a production environment, so we don't have to pull changes in two different places (the prod server and the machine which shares a (local) volume between the containers).
If there is anyone who can confirm my findings, that would be great.
Sincerely

What's the best practice for Docker logging?

I'm using docker with my web service.
When I deploy using Docker, I lose some log files (nginx access log, service log, system log, etc.),
because the docker deployment system uses a down-and-up container architecture.
So I thought about this problem:
the logging server and the service server (for the api) must be separated,
using one of these methods.
First, using logstash (in ELK), attaching all my log files.
Second, using a batch system; this batch system would move the log files to another server every midnight.
Isn't that okay?
I expect a better answer.
Thanks.
There are many approaches that admins use for container logging:
1) Mount the log directory to the host, so even if the container goes up or down, the logs persist on the host (see the sketch after this list).
2) An ELK server, using logstash/filebeat to push logs to an elasticsearch server with the file-tailing option, so new log content is pushed to the server as it appears.
3) For application logs, e.g. in maven-based projects, there are many plugins that push logs to a server.
4) A batch system, which is not recommended, because if a container dies before midnight its logs are lost.
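For option 1, a minimal sketch; the host path and the nginx image are illustrative choices, not from the original answer:

# Bind-mount nginx's log directory to the host, so access/error logs
# survive the container being torn down and recreated on each deployment.
docker run -d --name web \
  -p 80:80 \
  -v /var/log/my-service/nginx:/var/log/nginx \
  nginx

(Note that the official nginx image normally symlinks these logs to stdout/stderr so docker logs can collect them; the bind mount puts real files back on disk instead.)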

How do I use Docker on a cloud or datacenter?

I couldn't muster enough courage to start using docker until now; I feel like I came from the last century. I want to clear up my doubts about docker before getting started. My question is mainly about deploying/running docker images on a cloud or hosting environment.
Can I build a docker image with any type of server (e.g. wildfly, payara) and/or database server (e.g. mysql, oracle), and will it work on a docker-enabled cloud/datacenter?
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
I read posts about what docker is, but I couldn't find a specific answer. Forgive me for not doing enough googling.
I have run docker on AWS and other cloud providers. It is really not that hard if you have some experience with system administration and/or devops. Regarding cloud hosters and getting started, most providers have some sort of tutorial on how to get started using docker with their infrastructure:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/
Can I build a docker image with any type of server (e.g. wildfly, payara) and/or database server (e.g. mysql, oracle), and will it work on a docker-enabled cloud/datacenter?
To get a server up and running, you just need the docker engine installed on the host; there are packages for many distros:
https://docs.docker.com/engine/installation/
After the docker engine is installed, you can create dockerfiles for basically any server or service. Hopefully you do not need to in most cases, since there are countless dockerfiles and pre-configured, vendor-maintained images already available on dockerhub (I use wildfly, the elk-stack, and mysql, for example). Be careful to select images that are maintained, otherwise you end up with security issues in your images that might never get fixed! Or you have to fix them yourself!
Example images:
https://hub.docker.com/r/jboss/wildfly/
https://hub.docker.com/_/mysql/
https://hub.docker.com/_/oraclelinux/
https://hub.docker.com/u/payara/
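Running one of these prebuilt images is a one-liner. As an illustration (the port follows the jboss/wildfly image's default, the container name is an assumption):

# Pull and start WildFly, exposing its default HTTP port on the host.
docker run -d --name wildfly -p 8080:8080 jboss/wildfly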
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
In general, you will want to store persistent data outside the docker image and mount it into the container as a volume:
https://docs.docker.com/engine/tutorials/dockervolumes/
Some cloud based storage providers might be easier to mount or connect to in other ways, but this volume approach is standard, IMO.
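As a concrete sketch of that approach (the volume name, image tag, and password are placeholders):

# Keep MySQL's data in a named volume so it survives redeploys: the
# container can be removed and recreated from a newer image while the
# volume, and thus the database, stays in place.
docker volume create mysql_data
docker run -d --name db \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -v mysql_data:/var/lib/mysql \
  mysql:8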
For logfiles, I actually push them to an ELK server, so having a volume for the logs is not necessarily required. However, since the ELK server is also a docker image, it does have a volume where the data is persisted.
So you have:
documentation from your cloud hoster (or docker themselves)
a host in your cloud running docker engine
0..n images that you can either grab from dockerhub or build yourself.
storage for persistent data on this host, or mounted from elsewhere, that you mount into your docker containers on startup. This is where e.g. mysql data folders live, or where you can persist logs, etc.
Of course, it can get much more complex from there, e.g. how to transparently scale and update your environment, but that is something for e.g. kubernetes or docker swarm or some other solution (I've scripted a bit on my own but do not need the robustness or elastic scalability of large systems).
Regarding cluster management, it should be noted that Swarm is now included in the Docker core. This has created some controversy in the community and even talk of a fork of the core:
https://technologyconversations.com/2015/11/04/docker-clustering-tools-compared-kubernetes-vs-docker-swarm/
https://jaxenter.com/docker-1-12-is-probably-the-most-important-release-since-1-0-129080.html
http://searchitoperations.techtarget.com/news/450303918/Docker-fork-talk-prompts-container-standardization-brawl
http://www.infoworld.com/article/3118345/cloud-computing/why-kubernetes-is-winning-the-container-war.html
I have experience running docker on Alibaba Cloud and AWS as well. I did not see any difference working with docker on the two cloud providers. Docker images can be built the same way on any linux platform, regardless of the cloud provider. However, persistence of data needs to be taken care of using docker volumes. For databases, it is recommended to use a managed service, such as RDS on Alibaba Cloud, instead of running them in docker.
Can I build a docker image with any type of server (e.g. wildfly, payara) and/or database server (e.g. mysql, oracle), and will it work on a docker-enabled cloud/datacenter?
You can build your own Docker images or use solutions that are already pre-packaged and proven by cloud providers. For example, here is an auto-clustering, Docker-based implementation of GlassFish that can be run and managed on Jelastic PaaS.
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
With the above-mentioned cluster, all data is kept inside the containers and remains unchanged after a restart. As an option, you can also connect a separate data storage container if you wish to share it across other containers.
