How to segregate Docker services when using Celery?

I have a project with the following structure:
backend
|-app # fastapi and celery
|-scrapper # selenium and celery
I am using Celery to run long and short tasks, so I will have multiple queues/workers. I will also have tasks that are called from app but processed by scrapper. I am thinking about how to split things among containers and I am not sure how to proceed. Here is what I am thinking of doing:
one container to run fastapi
one container for each worker related to the fastapi one. Each container here will be pretty much a copy of the fastapi one, but the entrypoint will be to run the celery worker
one container to run the scrapper module
one container for each worker related to the scrapper one. Each container here will be pretty much a copy of the scrapper one, but the entrypoint will be to run the celery worker
I am kind of new to Docker and this seems like a waste of resources (having multiple copies of the same thing, one to run FastAPI and another to run Celery alone), but if I understood right this is the way to go. Is that right?

I would actually have four different containers:
fastapi
fastapi-celery (worker(s) run with -Q fastapi)
scraper
scraper-celery (worker(s) run with -Q scraper)
The reason for this is simply the ability to easily manage and scale any piece of your infrastructure. At some point you may find that you do not have enough workers to handle heavy FastAPI load - then you would just add more fastapi-celery containers, etc.
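As a rough illustration, that four-container layout could be expressed in a docker-compose.yml along the lines of the sketch below. The module paths, the Redis broker and the exact commands are assumptions made for the example, not taken from the question:

services:
  redis:
    image: redis:7                       # assumed Celery broker
  fastapi:
    build: ./app
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
  fastapi-celery:
    build: ./app                         # same image as fastapi, different command
    command: celery -A app.worker worker -Q fastapi --loglevel=info
  scraper:
    build: ./scrapper
  scraper-celery:
    build: ./scrapper                    # same image as scraper, different command
    command: celery -A scrapper.worker worker -Q scraper --loglevel=info

Because fastapi and fastapi-celery share a build context, Docker builds and caches one image and simply starts it with two different commands, so the duplication is cheap; adding workers later is just docker-compose up --scale fastapi-celery=3.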

Related

docker-compose autorestart and supervisord autorestart: which to use?

I've seen in some builds the use of supervisor to run the docker-compose up -d command, with the possibility to autostart and/or autorestart.
I'm wondering whether this cohabitation of supervisor and docker-compose works well. Don't the two autorestart options interfere with each other? Also, what is the benefit of using supervisor in place of plain docker-compose, other than starting things at boot if the server was shut down?
Please share your experience if you have any with using these two tools.
Thank you
Running multiple single-process containers is almost always better than running a single multiple-process container; avoid supervisord when possible.
Mechanically, the combination should work fine. Supervisord will capture logs and take responsibility for restarting the process in the container. That means docker logs will have no interesting output, and you need to get the file content out of the container. If one of the managed processes fails then supervisord will restart it. The container itself will probably never be restarted, unless supervisord manages to crash somehow.
There are a couple of notable disadvantages to using supervisord:
As noted, it swallows logs, so you need a complex file-oriented approach to read them out.
If one of the processes fails then you'll have difficulty seeing that from outside the container.
If you have a code update you have to delete and recreate the container with the new image, which with supervisord means restarting every process.
In a clustered environment like Kubernetes, every process in the supervisord container runs on the same node, and you can't scale individual processes up to handle additional load.
Given the choice of tools you suggest, I'd pretty much always use Compose with its restart: policies, and not use supervisord at all.
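For completeness, this is roughly what "Compose with its restart: policies" looks like; the service and image names here are placeholders:

services:
  web:
    image: myapp/web:latest
    restart: unless-stopped    # the Docker daemon restarts the container if the process dies
  worker:
    image: myapp/worker:latest
    restart: unless-stopped

Each container runs exactly one process, so docker logs and per-service scaling behave the way you expect, and the Docker daemon (not supervisord) restarts whatever dies.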

Running FastAPI on Google Cloud Run (Docker Image)

I'm looking to build a Docker image to run FastAPI on Google Cloud Run. FastAPI uses Uvicorn as an ASGI server, and Uvicorn recommends using Gunicorn with the Uvicorn worker class for production deployments. FastAPI themselves also have some excellent documentation on using Gunicorn with Uvicorn. I even see that FastAPI provide an official image combining the two (uvicorn-gunicorn-fastapi-docker), but this comes with a warning:
You are probably using Kubernetes or similar tools. In that case, you
probably don't need this image (or any other similar base image). You
are probably better off building a Docker image from scratch
This warning basically explains that replication would be handled at cluster level and doesn't need to be handled at process level. This makes sense. I am, however, not quite sure whether Cloud Run falls into this category. Essentially it is an abstracted and managed Knative service, which therefore runs on Kubernetes.
My question is, should I be installing Gunicorn along with Uvicorn in my Dockerfile and handling replication at process-level? Along the lines of:
CMD ["gunicorn", "app.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:80"]
Or should I stick with Uvicorn, a single process, and let Cloud Run (Kubernetes) handle replication at cluster-level? E.g.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
A. Let's consider the large scale first.
At the time of writing, Cloud Run instances can be set to a maximum concurrency of 1000 requests, and CPUs can be set to 4 vCPUs.
Going back to basics a bit: Cloud Run will spawn many instances, each working individually, and each can handle up to the configured concurrency, so each could handle 1000 requests. If you set multiple vCPUs, you need to handle multiple processes.
When we talk about numbers this large we need to be cautious. If your container is big enough in CPU/memory terms to handle this traffic, you may want to use a process manager (Gunicorn) to start several Uvicorn workers, as your referenced image does. In that case you can use that Docker image.
B. Staying at a small scale.
On the other hand, if you set 1 vCPU and stay single-process, you don't need Gunicorn as a process manager. You can still have concurrency enabled, just not at the top end; pick a lower value that fits the 1 vCPU model, something like 80 concurrent requests. In this situation, under heavy traffic Cloud Run will start many instances for you, and you rely on Cloud Run to spawn as many as needed, which it does really well. It's like having Kubernetes on top of your simple container.
I would suggest starting with a single process, building a container that doesn't use the referenced image, and only swapping from option B to option A when you know there are benefits (cost-wise) to having larger instances.
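If it helps to see option B spelled out, it roughly corresponds to a Cloud Run service description like the sketch below (Knative-style YAML of the kind gcloud run services replace accepts; the service name and image are placeholders):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: fastapi-app                          # placeholder service name
spec:
  template:
    spec:
      containerConcurrency: 80               # the "something like 80" from option B
      containers:
        - image: gcr.io/my-project/fastapi-app   # placeholder image
          resources:
            limits:
              cpu: "1"                       # 1 vCPU, single Uvicorn process inside
              memory: 512Mi

The container itself then only needs the single-process uvicorn CMD from the question, and Cloud Run handles the fan-out to more instances.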

What is the best way to do periodical cleanups inside a docker container?

I have a Docker container that runs a simple custom download server using uWSGI on Debian and a Python script. The files are generated and saved inside the container for each request. Now I want to periodically delete old files that the server generated for past requests.
So far, I achieved the cleanup via a cronjob on the host, that looks something like this:
*/30 * * * * docker exec mycontainer /path/on/container/delete_old_files.sh
But that has a few drawbacks:
Cron needs to be installed and running on the docker host
The user manually has to add a cronjob for each container they start
There is an extra cleanup script in the source
The fact that the cron job is needed needs to be documented
I would much prefer a solution that rolls out with the docker container and is also suitable for more general periodical tasks in the background of a docker container.
Any best practices on this?
Does python or uwsgi have an easy mechanism for periodical background tasks?
I'm aware that I could install cron inside the container and do something like CMD ["sh", "-c", "cron; uwsgi <uwsgi-options>... --wsgi-file server.py"], but that seems a bit clunky and against the Docker philosophy.
A solution like this in server.py:
import threading

def cleanup():
    # ...
    threading.Timer(30 * 60, cleanup).start()  # seconds...

cleanup()
# ... rest of the code here ...
Seems good, but I'm not sure how it interferes with uWSGI's own threading and process model.
It seems like a simple problem but isn't.
You should not store live data in containers. Containers can be a little bit fragile and need to be deleted and restarted routinely (because you forgot an option; because the underlying image has a critical security fix) and when this happens you will lose all of the data that's in the container.
What you can do instead is use a docker run -v option to cause the data to be stored in a path on the host. If they're all in the same place then you can have one cron job that cleans them all up. Running cron on the host is probably the right solution here, though in principle you could have a separate dedicated cron container that did the cleanup.
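A compose-style sketch of that bind-mount idea, with made-up paths and image name:

services:
  downloader:
    image: my-download-server            # the uwsgi + server.py image
    volumes:
      - /srv/downloads:/data/downloads   # generated files land on the host
# On the host, a single cron entry can then clean up for every container that
# mounts the same path, for example:
#   */30 * * * * find /srv/downloads -type f -mmin +60 -delete

The container no longer needs to know about the cleanup schedule, and recreating it does not touch the files.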

Running a cronjob or task inside a docker cloud container

I got stuck and need help. I have set up multiple stacks on Docker Cloud. The stacks are running multiple containers like data, mysql, web, elasticsearch, etc.
Now I need to run commands on the web containers. Before Docker I did this with a cronjob, e.g.:
*/10 * * * * php /var/www/public/index.php run my job
But my web Dockerfile ends with
CMD ["apache2-foreground"]
As I understand the Docker concept, running two commands in one container would be bad practice. But how would I schedule a job like the cronjob above?
Should I start cron in the CMD too, something like
CMD ["cron", "apache2-foreground"] (cron should exit with 0 before apache starts)?
Should I make a startup script that runs both commands?
In my opinion the smartest solution would be to create another service like the dockercloud haproxy one, where other services are linked.
Then the cron service would exec commands that are defined in the Stackfile of the linked containers/stacks.
Thanks for your help
With docker in general I see 3 options:
run your cron process in the same container
run your cron process in a different container
run cron on the host, outside of docker
For running cron in the same container you can look into https://github.com/phusion/baseimage-docker
Or you can create a separate container where the only running process inside is the cron daemon. I don't have a link handy for this, but they are out there. Then you use the cron invocations to connect to the other containers and call what you want to run. With an Apache container that should be easy enough: just expose some minimal HTTP API endpoint that will do what you want when it's called (make sure it's not vulnerable to any injections, i.e. don't pass any arguments; keep it simple, stupid).
If you have control of the host as well then you can (ab)use the cron daemon running there (I currently do this with my containers). I don't know docker cloud, but something tells me that this might not be an option for you.
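A rough compose-style sketch of that second option (a dedicated cron container calling into the web container); the image names and the /internal/run-job endpoint are invented for the example:

services:
  web:
    image: my-php-apache      # serves the app plus a minimal endpoint that runs the job
  cron:
    image: my-cron-runner     # tiny image whose only process is the cron daemon
    depends_on:
      - web
    # its crontab holds a single line along the lines of:
    #   */10 * * * * curl -fsS http://web/internal/run-job

The web container keeps its single apache2-foreground process, and the schedule lives in exactly one place.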

Understanding Docker best practice and running webservers

I am in the situation of running a simple PHP 7.0, Redis and NGINX server in a single container.
This means I run php7.0-fpm, nginx and redis as services.
But in the best practices I am reading:
# Run only one process per container
In almost all cases, you should only run a single process in a single container.
Decoupling applications into multiple containers makes it much easier to scale horizontally and reuse containers.
If that service depends on another service, make use of container linking.
Does this mean that it would be best to run one container with PHP7.0 and the application and another with nginx and another with redis?
@nwinkler in the comments is right; the recommendation is there for a good reason. A couple of advantages of decoupling applications into multiple containers are:
Build time
It is true that Docker does a hash check and does not rebuild the layers of an image if nothing has changed, but this is limited by the layer structure (if layer X changes, all layers above X will be rebuilt). This means it will start getting painful as your images get bigger.
Containers are isolated
When you are attached to your nginx container you can be pretty sure that any change you make is not going to cause changes in your PHP container, and that's always a good practice.
Scalability
You need ten more Redis instances? Good, let's run ten more Redis containers.
In general I would go for a Dockerfile for a base image in any scenario, in your case one containing whatever all three of your containers (php, redis & nginx) share (third-party libs, tools, etc.). Then three Dockerfiles for building each image, and then a bash script or docker-compose.yml file for running the images as containers.
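A minimal docker-compose.yml sketch of that three-container split (image names and the published port are placeholders):

services:
  nginx:
    image: my-nginx           # built from its own Dockerfile on top of the shared base
    ports:
      - "80:80"
    depends_on:
      - php
  php:
    image: my-php-fpm         # php7.0-fpm plus the application code
    depends_on:
      - redis
  redis:
    image: redis:alpine

With this split, "ten more Redis containers" becomes a scaling operation on the redis service rather than a change to any image.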
