Running FastAPI on Google Cloud Run (Docker Image)

I'm looking to build a Docker image to run FastAPI on Google Cloud Run. FastAPI uses Uvicorn as an ASGI server, and the Uvicorn docs recommend using Gunicorn with the Uvicorn worker class for production deployments. FastAPI also has some excellent documentation on using Gunicorn with Uvicorn. I even see that FastAPI provides an official image combining the two (uvicorn-gunicorn-fastapi-docker), but this comes with a warning:
You are probably using Kubernetes or similar tools. In that case, you
probably don't need this image (or any other similar base image). You
are probably better off building a Docker image from scratch
This warning basically explains that replication would be handled at the cluster level and doesn't need to be handled at the process level. This makes sense. However, I am not quite sure whether Cloud Run falls into this category. It is essentially an abstracted and managed Knative service, which therefore runs on Kubernetes.
My question is, should I be installing Gunicorn along with Uvicorn in my Dockerfile and handling replication at the process level? Along the lines of:
CMD ["gunicorn", "app.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:80"]
Or should I stick with Uvicorn, a single process, and let Cloud Run (Kubernetes) handle replication at the cluster level? E.g.
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

A. Let's look at the large scale first.
At the time of writing, Cloud Run instances can be set to a maximum of 1000 concurrent requests, and up to 4 vCPUs.
Going back to basics a bit: Cloud Run will spawn many instances, each working individually, and each able to handle the maximum allowed concurrency, so each could handle 1000 requests. If you set multiple vCPUs, you need to handle multiple processes.
With numbers that large you need to be cautious. If your container is big enough in CPU/memory terms to handle this traffic, you may want to use a process manager (Gunicorn) to start several Uvicorn workers, as the image you referenced does. In that case you can use that Docker image.
B. The small scale.
On the other hand, if you set 1 vCPU and stay single-process, you don't need Gunicorn as a process manager. You can still have concurrency enabled, just not at the top level; something lower that fits your 1 vCPU model, like 80 concurrent requests. Under heavy traffic you will then have many instances started by Cloud Run, and you rely on Cloud Run to spawn as many instances as needed, which it does really well. It's like having Kubernetes on top of your simple container.
I would suggest starting with a single process and a container that doesn't use the referenced image, and only swapping option B for option A when you know there are (cost-wise) benefits to having larger instances.
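As a concrete starting point for option B, a minimal Dockerfile could look something like the sketch below (assumptions: a requirements.txt that includes fastapi and uvicorn, and the app.main:app module path from your question; Cloud Run injects the port to listen on via the PORT environment variable, defaulting to 8080):
FROM python:3.9-slim
WORKDIR /code
# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY ./app ./app
# Single Uvicorn process; Cloud Run handles replication by adding instances.
# Shell form + exec so ${PORT} (set by Cloud Run) is expanded and signals reach Uvicorn.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8080}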

Related

docker-compose autorestart and supervisord autorestart : which to use?

I've seen in some builds the use of supervisor to run the docker-compose up -d command, with the possibility to autostart and/or autorestart.
I'm wondering if this cohabitation of supervisor and docker-compose works well? Aren't the two autorestart options interfering with each other? Also, what is the benefit of using supervisor in place of a simple docker-compose, other than running at startup if the server was shut down?
Please share your experience if you have some on using these two tools.
Thank you
Running multiple single-process containers is almost always better than running a single multiple-process container; avoid supervisord when possible.
Mechanically, the combination should work fine. Supervisord will capture logs and take responsibility for restarting the process in the container. That means docker logs will have no interesting output, and you need to get the file content out of the container. If one of the managed processes fails then supervisord will restart it. The container itself will probably never be restarted, unless supervisord manages to crash somehow.
There are a couple of notable disadvantages to using supervisord:
As noted, it swallows logs, so you need a complex file-oriented approach to read them out.
If one of the processes fails then you'll have difficulty seeing that from outside the container.
If you have a code update you have to delete and recreate the container with the new image, which with supervisord means restarting every process.
In a clustered environment like Kubernetes, every process in the supervisord container runs on the same node, and you can't scale individual processes up to handle additional load.
Given the choice of tools you suggest, I'd pretty much always use Compose with its restart: policies, and not use supervisord at all.
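For reference, a hypothetical docker-compose.yml using restart policies instead of supervisord could look like this (service names, image, and the worker command are made up for illustration):
version: "3.8"
services:
  web:
    image: my-app:latest         # hypothetical application image
    restart: unless-stopped      # restarted on failure, but not after an explicit stop
  worker:
    image: my-app:latest         # second process gets its own container from the same image
    command: python worker.py    # hypothetical worker entrypoint
    restart: unless-stopped
Each process is then visible in docker ps, readable via docker logs, and restartable or scalable on its own.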

How to segregate docker services when using celery?

I have a project with the following structure:
backend
|-app # fastapi and celery
|-scrapper # selenium and celery
I am using Celery to run long and short tasks, so I will have multiple queues/workers. Also, I will have tasks called from app but processed by scrapper. I am thinking about how to split things among containers and I am not sure how to proceed. Here is what I am thinking of doing:
one container to run fastapi
one container for each worker related to the fastapi one. Each container here will be pretty much a copy of the fastapi one, but the entrypoint will be to run the celery worker
one container to run the scrapper module
one container for each worker related to the scrapper one. Each container here will be pretty much a copy of the scrapper one, but the entrypoint will be to run the celery worker
I am kinda new to Docker and this seems to be a waste of resources (having multiple copies of the same thing, one to run fastapi and another to run celery alone), but if I understood right this is the way to go. Is that right?
I would actually have four different containers:
fastapi
fastapi-celery (worker(s) run with -Q fastapi)
scraper
scraper-celery (worker(s) run with -Q scraper)
The reason for this is simply the ability to manage and scale any piece of your infrastructure easily. At some point you may find that you do not have enough workers to handle heavy fastapi load; then you would just add more fastapi-celery containers, etc.
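A hypothetical docker-compose.yml along those lines (module paths and Celery app names are placeholders, and the broker, e.g. Redis or RabbitMQ, is omitted):
version: "3.8"
services:
  fastapi:
    build: ./app
    command: uvicorn main:app --host 0.0.0.0 --port 8000    # placeholder module path
  fastapi-celery:
    build: ./app                                  # same image as fastapi, different command
    command: celery -A app worker -Q fastapi                 # placeholder Celery app name
  scraper:
    build: ./scrapper
  scraper-celery:
    build: ./scrapper                             # same image as scraper, different command
    command: celery -A scrapper worker -Q scraper            # placeholder Celery app name
Because each worker service is built from the same context as its sibling, there is no duplicated code, just another container started from the same image with a different command; adding workers is then e.g. docker-compose up --scale fastapi-celery=3.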

Docker services route network before task is actually up - zero downtime

I'm currently running Docker version 18.03.1-ce, build 9ee9f40 on multiple nodes. My setup is a nginx service and multiple java restful API services running in a wildfly cluster.
For my API services I've configured a simple healthcheck to determine whether my API task is actually up:
HEALTHCHECK --interval=5m --timeout=3s \
--retries=2 --start-period=1m \
CMD curl -f http://localhost:8080/api/healthcheck || exit 1
But even with the use of HEALTHCHECK my nginx sometimes gets an error, caused by the fact that the API is still not fully up and can't serve REST requests yet.
The only solution I have managed to get working so far is increasing the --start-period by hand to a much longer period.
How does the docker service load balancer decide when to start routing requests to the new service?
Is setting a higher time with the --start-period currently the only way to prevent load balancer from redirecting traffic to a task that is not ready for traffic or am I missing something?
I've seen the "blue-green" deployment answers like this where you can manage zero downtime, but I'm still hoping this could be done with the use of docker services.
The routing mesh will start routing traffic on the "first successful healthcheck", even if future ones fail.
Whatever you put in the HEALTHCHECK command, it needs to only start returning exit 0 when things are truly ready. If it returns a good result too early, then it's not a good healthcheck command.
The --start-period only gives the task a grace period before failed healthchecks start counting towards --retries (and the task being killed); it won't cause green healthchecks to be ignored during the start period.
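In your case that usually means tightening the check rather than the start period; for example (a sketch based on your snippet, assuming /api/healthcheck itself only returns 200 once the application can actually serve requests):
HEALTHCHECK --interval=10s --timeout=3s \
  --retries=3 --start-period=2m \
  CMD curl -f http://localhost:8080/api/healthcheck || exit 1
With a short interval, the first successful check, and therefore routing, happens shortly after the API becomes ready instead of up to five minutes later.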

Understanding Docker best practice and running webservers

I am in the situation of running a simple PHP 7.0, Redis and nginx server in a single container.
This means I run php7.0-fpm, nginx and redis as services.
But in the best practices I am reading:
# Run only one process per container
In almost all cases, you should only run a single process in a single container.
Decoupling applications into multiple containers makes it much easier to scale horizontally and reuse containers.
If that service depends on another service, make use of container linking.
Does this mean that it would be best to run one container with PHP7.0 and the application and another with nginx and another with redis?
#nwinkler in the comments is right; the recommendation is there for good reason. A couple of advantages of decoupling applications into multiple containers are:
Build time
It is true that Docker checks layer hashes and does not rebuild image layers when nothing has changed, but this is limited by the layer structure (if layer X changes, all layers above X will be rebuilt). This means builds start getting painful as your images get bigger.
Containers are isolated
When you are attached to your nginx container you can be sure that any changes you make are not going to cause changes in your PHP container, and that's always good practice.
Scalability
You need ten more Redis instances? Good, let's run ten more Redis containers.
In general I would go for a Dockerfile for a base image containing whatever all three of your containers (PHP, Redis & nginx) share (third-party libs, tools, etc.), then three Dockerfiles for building each image, and then a bash script or docker-compose.yml for running the images inside containers.
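A hypothetical docker-compose.yml for that layout (build contexts and images are placeholders):
version: "3.8"
services:
  nginx:
    build: ./nginx            # nginx config proxying to the php service
    ports:
      - "80:80"
    depends_on:
      - php
  php:
    build: ./php              # php7.0-fpm plus your application code
  redis:
    image: redis:alpine       # the official image is usually enough here
Each piece can then be rebuilt, restarted, and scaled on its own.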

What are the Docker RUN params for mimicking IronWorker memory constraints?

In the past I've run into trouble when hosting my workers in a cloud infrastructure because of memory constraints that weren't faithfully reproduced when testing the code locally on my overpowered machine.
IronWorker is one such cloud provider that limits workers in its multi-tenant infrastructure to 380 MB. Luckily, with their switch to Docker, I can hope to catch problems early on by asking my local Docker container to use artificial memory limits when testing.
But I'm not sure which parameters from https://docs.docker.com/engine/reference/run/ are the right ones to use when setting a 380 MB limit ... any advice?
Does the logic from https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/#_example_managing_the_memory_shares_of_a_container still apply?
You'll want to use --memory, for example, based on the node README:
docker run --memory 380M --rm -e "PAYLOAD_FILE=hello.payload.json" -v "$PWD":/worker -w /worker iron/node node hello.js
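If you also want the limit to behave like a hard cap, i.e. no spilling over into swap once 380 MB is used, you can set --memory-swap to the same value, which disables swap for the container:
docker run --memory 380M --memory-swap 380M --rm -e "PAYLOAD_FILE=hello.payload.json" -v "$PWD":/worker -w /worker iron/node node hello.js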
