Docker swarm - docker.sock slowdown

I have a Docker swarm where I deploy 3 replicas of my microservice. The job of the microservice is to let a client download files. I am currently testing with large files of up to 3 GB in size and multiple such downloads in parallel. I am on Docker 17.06.1-ce.
My microservice has docker.sock mounted inside it; it is the same socket that is on my mac-docker-vm.
I have a bash script that, whether I execute it inside the microservice or on the Mac, should give me the same output (as the same socket is mounted inside the container). The output is 3 IP addresses. The script is essentially nothing more than a docker inspect call to get those IP addresses, and it does that fine. The script uses the docker command, which I believe talks to docker.sock internally to process those commands.
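In rough shape, the script does something like this (the service name here is just a placeholder, not the real one):

#!/bin/bash
# Sketch only: print the IP address of each container backing the service
for id in $(docker ps -q --filter "name=my-service"); do
  docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$id"
done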
Problem description
When I make my microservice busy (I have more than one copy of the service running) streaming huge amounts of data, say up to 3 parallel streams of 3 GB files, the Docker socket seems to slow down. The reason I think so: when I send a download request, it hits the REST controller, and the controller executes the bash script and sits there waiting for the script to finish. To verify my theory that the script is the bottleneck and not Scala's Process class, I executed the same bash script from my laptop while this bottleneck was occurring. The script took over a minute to respond while the streaming was in progress. Remember, whether I execute the script from my laptop or from within my Scala code (which runs inside the microservice), it is the same socket that is being used (as the same docker.sock is mounted).
How do I debug this further to confirm my theory, and how do I get around it? I understand it is my own code base that supports the download of files, but could I be leaving a resource open that makes the socket behave badly? I have not tested this on CentOS Docker, so I am not sure whether the behavior there would be the same as on the Mac.
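For reference, the socket's responsiveness can be timed directly, bypassing the docker CLI entirely, with a raw request to the Engine API's ping endpoint:

time curl --unix-socket /var/run/docker.sock http://localhost/_ping

If that also stalls while the streams are running, the daemon/socket is the slow part rather than the script or Scala's Process handling.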

Related

ML serving service architecture with Docker

I am in the early stages of developing an image segmentation service. Currently I have a simple Flask server that is responsible for receiving data and running a Docker container with an AI model on the local GPU server. I am also thinking about something asynchronous like FastAPI or Node.js to implement a scheduler for prediction tasks. Which is better: a) the server calls the Docker container over ssh, and the container runs only when it is called, predicts the images, saves the results, and stops; or b) running an API server inside the AI container? Each container is around 5-10 GB. Running all containers looks more expensive, but I am not sure which practice is better.
So far I have tried calling the container each time and stopping it after the work was done.
You should avoid approaches based on dynamically starting containers and approaches based on ssh. I'd recommend a long-running process that accepts some network input, like your existing Flask server, and either always has the ML model running or launches it as a subprocess.
If you can use a subprocess, that could be a good match here. When the subprocess exits, all of its memory resources will be automatically cleaned up, so you won't pay the cost of the subprocess when it's not being used. If the container happens to exit, the subprocess will get cleaned up with it. Subprocesses are also basic Unix functionality, so you can develop your service locally without needing any particularly complex setup.
Dynamically launching containers comes with many challenges. It ties your application to the Docker API, which will make it harder to run, even in local development. Using that API grants unrestricted root-level access to the host system (you can very easily run a container that compromises the host). You need to remember to clean up after your own containers. The setup may not work in other container systems like Kubernetes that don't make a Docker socket available.
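As a concrete illustration of that risk (busybox is used here only as an example image): anyone who can reach the Docker socket can bind-mount the host's root filesystem into a container they start, which is effectively root on the host.

docker run --rm -v /:/host busybox cat /host/etc/shadow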
An ssh-based system presents different complexities. You need to distribute credentials to various places. If you're trying to run an ssh daemon inside a Docker container, that is difficult to configure securely (what creates the host keys? how do you provision users and private keys?). You also need to think about various failure cases around the ssh transport that might not be present in a purely-local system.

How to speed up file change from host into docker container?

My host is macOS with Docker Desktop. I have a Debian container in which a PHP application is running. Parts of the PHP application are part of the Docker image; the parts I am still working on are shared with the host through a volume. Think of
docker run -td --name my-app -v /Users/me/mycode:/var/www/html/phpApp/variableParts
My problem: when I save a change on the host it takes some 10-15 seconds until that change becomes available to the containerized app. So (1) after every save it takes (too) long waiting for the code to become available, and (2) I cannot be sure whether I am already seeing the new code running or still the old one.
My problem is not that the execution of the application is slow (as some sources on the web suggest); in fact it is quite fast. My problem is that the change takes too long to propagate from the host to the Docker container. Earlier I developed with the code from the remote server NFS-mounted on my development machine, and there it was blazing fast.
Is there any way I can reasonably speed this up? Or does a different workflow make more sense? Would mounting the code parts I want to edit from the container (as NFS server) to the host (where the editor runs) make sense?
My workflow consists of many small adaptations to the PHP code, so waiting 10-15 seconds after every edit is a no-go.
I have used Docker on Mac, and have seen edits to a bind mount propagate to the Docker container in under a second, so I think Docker is not to blame here.
Instead, I would look at any caching that PHP is doing. Is PHP reloading your code from disk on every page view, or does it cache it? For example, the opcache feature of PHP keeps a pre-compiled version of your PHP code in memory, and occasionally checks if that version is still up to date. Take a look at your php.ini, and in particular what opcache.revalidate_freq is set to.
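If opcache turns out to be the cause, a development-oriented configuration re-checks your files on every request; in php.ini that would be something like this (values shown are illustrative, not necessarily what you should ship):

opcache.validate_timestamps=1
opcache.revalidate_freq=0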

Logging from multiple processes in a single docker container

I have an application (let's call it Master) which runs on linux and starts several processes (let's call them Workers) using fork/exec. Therefore each Worker has its own PID and writes its own logs.
When running directly on a host machine (without Docker), each process uses syslog for logging, and rsyslog puts the output from each Worker into a separate file, using a config like this:
$template workerfile,"/var/log/%programname%.log"
:programname, startswith, "worker" ?workerfile
:programname, isequal, "master" "/var/log/master"
Now, I want to run my application inside a Docker container. Docker starts the Master process as the main process (in the CMD section of the Dockerfile), and it then forks the Workers at runtime (not sure if this is a canonical way to use Docker, but that's what I have). Of course I only get the stdout of the Master process from Docker, and the Workers' logs get lost.
So my question is: is there any way I could get the logs from the forked processes?
To be precise, I want the logs from the different processes to eventually end up in individual files on the host machine.
I tried running the rsyslog daemon inside the Docker container (just like I do when running without Docker), writing the logs to a mounted volume, but it doesn't seem to work. I guess it requires a workaround like supervisord to run the Master process and rsyslogd at the same time, which looks like overkill to me.
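For illustration, that supervisord-style workaround would mean baking a config roughly like this into the image (program paths are placeholders):

[supervisord]
nodaemon=true

[program:rsyslogd]
command=/usr/sbin/rsyslogd -n

[program:master]
command=/usr/local/bin/master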
I couldn't find any simple solution for that, though my problem seems to be trivial.
Any help is appreciated, thanks

Best Practices for Cron on Docker

I've been using Docker with cron for some time, but I'm not sure my setup is optimal. I have one cron container that runs about 12 different scripts. I can edit the schedule of the scripts, but in order to deploy a new version of the software (some scripts run for about half a day) I have to create a new container to run some of the scripts while the others finish.
One option I'm considering is running one container per script (the containers would share everything in the image except the crontab), but that still makes it hard to coordinate updates across containers that share some of the same code.
The other alternative I'm considering is running cron on the host machine, with each command being a docker run command. That would let me update the image used for the next run via an environment variable in the crontab.
Does anybody have any experience with either of these two solutions? Are there any other solutions that could help?
If you are just running Docker standalone (single host) and need to run a bunch of cron jobs without thinking too much about their impact on the host, then keeping it simple and running them on the host works just fine.
It would make sense to run them in Docker if you benefit from Docker features like limiting memory and CPU usage (so they can't do anything disruptive). If you also use a log driver that ships container logs to an external logging service, so you can easily monitor the jobs, that's another good reason to do it. The last (but obvious) advantage is that deploying new software as a Docker image, instead of messing around on the host, is often a winner.
It's a lot cleaner to make one single image containing all the code you need. Then you trigger docker run commands from the host's cron daemon and override the command/entrypoint. The container will then die and delete itself after the job is done (you might need to capture the container output to logs on the host depending on what logging driver is configured). Try not to send in config values or parameters you change often so you keep your cron setup as static as possible. It can get messy if a new image also means you have to edit your cron data on the host.
When you use docker run like this you don't have to worry about updating images while jobs are running. Just make sure you tag the image with, for example, latest so that the next job will use the new image.
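As a sketch of that approach (image name, schedule, script path, and log location are made up), one host crontab line per job would look something like:

0 2 * * * docker run --rm --memory 512m my-cron-image:latest /scripts/nightly-report.sh >> /var/log/cron-jobs/nightly-report.log 2>&1

Here --rm removes the container when the job finishes, the command after the image name overrides the default command, and the redirect captures the job's output on the host.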
Having 12 containers running in the background, each with its own cron daemon, also wastes some memory, but the worst part is that cron doesn't use the environment variables from the parent process, so if you are injecting config with env vars you'll have to hack around that mess (write them to disk when the container starts, and so on).
If you are worried about jobs running in parallel, there are plenty of task-scheduling services out there you can use, but that might be overkill for a single standalone Docker host.

What is image "containersol/minimesos" in minimesos?

I was able to set up the minimesos cluster on my laptop and could also deploy a small command-line utility. Now, the questions:
1. What is the image "containersol/minimesos" used for? It is pulled, but I don't see it running when I do "docker ps"; "docker images" lists it.
2. How come when I run "top" inside the mesos-agent container, I see all the processes running on my host (laptop)? This is a bit strange.
3. I was trying to figure out what's inside the minimesos script. I see that it's just one "docker run ..." command. I would really appreciate knowing what that command does that results in 4 containers (1 master, 1 slave, 1 zk, 1 marathon) running on my laptop.
containersol/minimesos runs the Java code that is the core of minimesos. It only runs until it has executed the command from the CLI. When you do minimesos up, the command name and the minimesosFile are passed to this container. The container in turn executes the Java code that creates the other containers that form the Mesos cluster specified in the minimesosFile. That should answer #3 as well. Take a look at the MesosCluster class; that's the root of where the magic happens.
I don't know the answer to #2; I will get back to you when I find out.
Every minimesos command runs as a short-lived container, whose image is containersol/minimesos.
When you run 'minimesos up' it launches containersol/minimesos with 'up' as the argument. It then launches a cluster by starting other containers like containersol/mesos-agent and containersol/mesos-master. After the cluster is up, the containersol/minimesos container exits and is removed.
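Conceptually, the wrapper script's docker run is along these lines (the exact mounts and arguments here are assumptions for illustration, not the literal script):

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v "$(pwd)":/tmp/minimesos containersol/minimesos up

Giving the container access to the host's Docker socket is what would let the Java code inside it start the master, agent, ZooKeeper, and Marathon containers as siblings on your laptop, even though containersol/minimesos itself exits right afterwards.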
We have separated the CLI and the minimesos core as a refactoring to prepare for the upcoming API module. We are creating an API to support clients in different programming languages. The first client will be a Golang client.
In this new setup minimesos will launch a long-running API server, and any minimesos CLI command will call that API. The clients will also launch the API server and call the API.
