I'm using Celery 5.2.7 + RabbitMQ 3.9-management for RPC. My application is split into 3 containers: worker, rabbitmq, and the API that makes the RPC calls.
All containers are deployed via a single docker-compose file and connected to a bridge network. The whole cluster runs on a single machine under Ubuntu 20.04.3 LTS.
The work scenario is the following:
The API receives a request from a user.
The API puts a job for the GPU worker onto RabbitMQ and waits for the response.
The GPU worker listens to the task queue and performs the computations when a new task arrives.
When the computations are complete, the GPU worker sends the results back (via another RabbitMQ queue, I suppose).
My problem is that the API never gets its results back. As I can see in the logs, my GPU worker gets the job done in <0.1 seconds, but the result never reaches the API.
Setup:
Worker:
# celery_worker.py
from celery import Celery

app = Celery('celery_worker', broker=f'amqp://{rabbitmq_username}:{rabbitmq_password}@{rabbitmq_host}:{rabbitmq_port}/', backend='rpc://')

@app.task
def remote_procedure(a, b):
    # some heavy computations
    return res
API main:
# API main.py
import celery.exceptions
import uvicorn
from fastapi import FastAPI

from proj.celery_worker import remote_procedure

RESPONSE_TIMEOUT_SECONDS = 1.5

app = FastAPI()  # assumed: the app instance is not shown in the original excerpt

@app.get('/search/')
def search(a, b):
    try:
        res = remote_procedure.apply_async(
            (a, b), expires=RESPONSE_TIMEOUT_SECONDS - 0.3, retry=False,
        ).get(timeout=RESPONSE_TIMEOUT_SECONDS)
    except celery.exceptions.TimeoutError:
        print('timeout')
        return {"res": "timeout"}
    return {"res": res}

def main():
    """
    Main function
    """
    uvicorn.run(app, host='127.0.0.1', port=CONFIG['port'])

if __name__ == '__main__':
    main()
Compose:
gpu_worker:
  build:
    context: .
    dockerfile: Dockerfile.worker
  deploy:
    resources:
      limits:
        memory: 10gb
      reservations:
        devices:
          - capabilities: [ gpu ]
  networks:
    - search_net
api:
  build:
    context: .
    dockerfile: Dockerfile.api
  depends_on:
    - gpu_worker
  ports:
    - "8001:8000"
  environment:
    MAX_WORKERS: 32
  deploy:
    resources:
      limits:
        memory: 10gb
      reservations:
        devices:
          - capabilities: [ gpu ]
  networks:
    - search_net
rabbitmq:
  image: rabbitmq:3.9-management
  hostname: rabbitmq
  container_name: 'rabbitmq'
  ports:
    - 5672:5672
    - 15672:15672
  volumes:
    - rabbitmq_data:/var/lib/rabbitmq/
    - rabbitmq_log:/var/log/rabbitmq/
  restart: on-failure
  networks:
    - search_net
volumes:
  rabbitmq_data:
  rabbitmq_log:
networks:
  search_net:
    driver: bridge
Dockerfile.worker:
...
ENTRYPOINT [ "celery", "-A", "proj.celery_worker", "worker", "--loglevel=DEBUG", "--pool=solo", "-c", "1"]
Dockerfile.api:
...
EXPOSE 80
# Start the app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
I'm not setting any other params, so everything else should be at its defaults.
I'm not sure, but it seems to me that there is some kind of trouble with returning the results. Is it possible that there are networking problems inside the containers?
Related
I'm new to the deployment world and I'm having this issue when I try to deploy an app. The application I'm trying to deploy consists of 2 services. The first service is an AI model and the second one is the web app. In order to run the web app, the AI model has to run first. This is the docker-compose.yml that I tried to make:
version: '3.8'
services:
  max-image-caption-generator:
    image: quay.io/codait/max-image-caption-generator
    ports:
      - "5000"
  app:
    build: .
    depends_on:
      - max-image-caption-generator
    ports:
      - "8088"
Here are my questions:
Am I defining the docker-compose.yml right?
How do I tell app to run the max-image-caption-generator first?
I was able to build from the file above; I could curl http://localhost:5000 and it gave me the right HTML of the AI model, but I couldn't curl http://localhost:8088. Either the connection was reset by peer, or it can't connect to http://localhost:5000, which means the AI model is not running.
There are a couple of misunderstandings in your question:
depends_on means that app will be started after max-image-caption-generator, but Docker will not check whether the service inside max-image-caption-generator started properly or not. You have to add a healthcheck to be sure that max-image-caption-generator is running properly, and then add the condition service_healthy to app.
or it can't connect to the http://localhost:5000
and it can't, because localhost:5000 is only accessible from the Docker host, not from inside a container. You have to use the container name to communicate between containers.
Your docker compose should be like:
version: '3.9'
services:
  max-image-caption-generator:
    image: quay.io/codait/max-image-caption-generator
    ports:
      - "5000"
    # networks is an optional parameter
    networks:
      service_network:
        aliases:
          - generator.hostname
    # use this if you want to start app only after max-image-caption-generator is ready to receive requests
    # healthcheck:
    #   test: ["CMD", "some_test_script", "--params"]
    #   interval: 30s
    #   timeout: 10s
    #   retries: 2
  app:
    build: .
    # networks is an optional parameter
    networks:
      - service_network
    depends_on:
      max-image-caption-generator:
        # set this condition if you added a healthcheck to the max-image-caption-generator container
        # condition: service_healthy
        # this condition just starts app after max-image-caption-generator has started, whether or not max-image-caption-generator is running properly
        condition: service_started
    ports:
      - "8088"
# optional block that may be deleted (docker will use the default network)
networks:
  service_network:
    name: service_network
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.10.240/28
          gateway: 10.0.10.241
After that you will be able to connect to the max-image-caption-generator container from the app container using the http://generator.hostname:5000 URL (if the networks block is not provided, the service may be accessed via http://max-image-caption-generator:5000, the same as the service key).
Here you can find information on how healthchecks work.
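To make the second point concrete, here is a minimal sketch (not from the original answer) of how code running in the app container could reach the generator through the compose network rather than through localhost; the requests library and the generator_is_up helper are illustrative assumptions:
# reach_generator.py - minimal sketch; the service hostname comes from the compose file above.
import requests

# Inside the compose network, use the service name (or the network alias), never localhost.
GENERATOR_URL = "http://max-image-caption-generator:5000"  # or http://generator.hostname:5000

def generator_is_up() -> bool:
    """Return True if the generator's HTTP endpoint answers."""
    try:
        return requests.get(GENERATOR_URL, timeout=5).ok
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("generator reachable:", generator_is_up())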
I have to deploy a web app with a Jetty server. This app needs a database, running on MariaDB. Here is the docker-compose file used to deploy the app:
version: '2.1'
services:
  jetty:
    build:
      context: .
      dockerfile: docker/jetty/Dockerfile
    container_name: app-jetty
    ports:
      - "8080:8080"
    depends_on:
      mariadb:
        condition: service_healthy
    networks:
      - app
    links:
      - "mariadb:mariadb"
  mariadb:
    image: mariadb:10.7
    container_name: app-mariadb
    restart: always
    environment:
      MARIADB_ROOT_PASSWORD: myPassword
      MARIADB_DATABASE: APPDB
    ports:
      - "3307:3306"
    healthcheck:
      test: [ "CMD", "mariadb-admin", "--protocol", "tcp", "ping", "-u root", "-pmyPassword" ]
      interval: 10s
      timeout: 3m
      retries: 10
    volumes:
      - datavolume:/var/lib/mysql
    networks:
      - app
networks:
  app:
    driver: bridge
volumes:
  datavolume:
I use a volume to keep the MariaDB data even if I run docker-compose down. In my Jetty app, the data is stored into the database when the contextDestroyed function is invoked (when the container is stopped).
But I have another problem: when I execute docker-compose down, all the containers are stopped and deleted. Although MariaDB is the last container to stop (that's what the terminal says), the save in contextDestroyed is interrupted and I lose some information, because the MariaDB container is stopped while the Jetty container is still saving data. I tested stopping every container except MariaDB and my data was successfully saved without loss, so the problem is obviously the MariaDB container.
How can I tell the mariadb container to wait for all the other containers to stop before stopping itself?
According to the depends_on documentation, your dependency will force the following shutdown order:
jetty
mariadb
You might want to look into what's happening during the shutdown of these containers and add some custom logic that will guarantee your data is consistent.
You can influence what happens during the shutdown of a container in 2 main ways:
adding a custom script as an entrypoint
handling the SIGTERM signal yourself
here's the relevant documentation on this.
Maybe the simplest - not necessarily the smartest - way would be to add a sleep(5) to the db shutdown, so your app has enough time to flush its writes.
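To illustrate the second option above (handling SIGTERM yourself), here is a minimal sketch in Python; the real app is a Java/Jetty service, so this only shows the pattern, and flush_pending_writes is a hypothetical stand-in for the work done in contextDestroyed:
# graceful_shutdown.py - illustrative sketch: trap SIGTERM and finish pending writes before exiting.
import signal
import sys
import time

def flush_pending_writes():
    """Hypothetical stand-in for the save performed in contextDestroyed()."""
    print("flushing pending writes to the database...")
    time.sleep(2)  # simulate the save

def handle_sigterm(signum, frame):
    # docker-compose down sends SIGTERM first, then SIGKILL after the grace period
    # (10 seconds by default, adjustable with stop_grace_period in the compose file).
    flush_pending_writes()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

if __name__ == "__main__":
    while True:  # pretend to serve requests
        time.sleep(1)
Whatever language you use, keep in mind that the container only gets the stop grace period to finish its work before it is killed.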
This question already has answers here:
How to stop all containers when one container stops with docker-compose?
I have this docker-compose.yml, which runs a Node script that depends on Redis.
version: "3.9"
services:
  redis:
    image: "redis:alpine"
    # restart: always
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - ./docker/redis:/data
  node:
    image: "node:17-alpine"
    user: "node"
    depends_on:
      - redis
    environment:
      - NODE_ENV=production
      - REDIS_HOST_ENV=redis
    volumes:
      - ./docker/node/src:/home/node/app
      - ./docker/node/log:/home/node/log
    expose:
      - "8081"
    working_dir: /home/node/app
    command: "npm start"
When starting this with docker compose up, both services start. However, when the node service has finished, the redis service keeps running. Is there a way to specify that the redis service should stop when the node service is done?
I have examined the documentation for the Compose Spec but I have not found anything that allows you to immediately stop a container based on the state of another one. Perhaps there really is a way, but you can always control the behaviour of the redis service by using a healthcheck:
services:
  redis:
    image: "redis:alpine"
    # restart: always
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - ./docker/redis:/data
    healthcheck:
      test: ping -c 2 mynode || kill 1
      interval: 5s
      retries: 1
      start_period: 20s
  node:
    image: "node:17-alpine"
    container_name: mynode
    user: "node"
    depends_on:
      - redis
    environment:
      - NODE_ENV=production
      - REDIS_HOST_ENV=redis
    volumes:
      - ./docker/node/src:/home/node/app
      - ./docker/node/log:/home/node/log
    expose:
      - "8081"
    working_dir: /home/node/app
    command: "npm start"
As for the node service, I have added container_name: mynode, which the redis service needs in order to contact it. The container name also becomes the hostname, if one is not specified with the hostname property.
The redis service has a healthcheck that pings the node container every 5 seconds, starting 20 seconds after the container starts (the start_period). If the ping is successful, the container is labeled as healthy; otherwise it is killed.
This solution might work in your case but has some downsides:
The healthcheck feature is abused here; besides, what if you needed a real healthcheck for redis itself?
You cannot always kill the init process, because it is protected by default. There are some discussions about this, and it seems the most popular choice is to use tini as the init process. Fortunately, in the image you are using, it is possible.
The redis service contacts the node service via its hostname, which means they must be in the same network. Right now that is the default bridge network, which should be avoided most of the time; I suggest you declare a custom bridge network.
This solution is based on polling the node container, which is not very elegant, mainly because you have to hope that the time-based parameters in the healthcheck section are "good enough".
I want to be able to restart a Golang container on failure to connect to RabbitMQ, as outlined here: (Docker Compose wait for container X before starting Y, see the answer by svenhornberg).
Unfortunately my Golang container exits but never restarts, and I don't know why.
Docker-compose:
version: '3.3'
services:
  mongo:
    image: 'mongo:3.4.1'
    container_name: 'datastore'
    ports:
      - '27017:27017'
  rabbitmq:
    restart: always
    tty: true
    image: rabbitmq:3.7-management-alpine
    hostname: "rabbit"
    ports:
      - "15672:15672"
      - "5672:5672"
    labels:
      NAME: "rabbitmq"
    volumes:
      - ./rabbitmq-isolated.conf:/etc/rabbitmq/rabbitmq.config
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:15672"]
      interval: 3s
      timeout: 5s
      retries: 20
  api:
    restart: always
    tty: true
    container_name: 'api'
    build: '.'
    working_dir: /go/src/github.com/patientplatypus/project
    ports:
      - '8000:8000'
    volumes:
      - './:/go/src/github.com/patientplatypus/project'
      - './uploads:/uploads'
      - './scripts:/scripts'
      - './templates:/templates'
    depends_on:
      - "mongo"
      - "rabbitmq"
Dockerfile:
FROM golang:latest
WORKDIR /go/src/github.com/patientplatypus/project
COPY . .
RUN go get github.com/imroc/req
<...more go gets...>
RUN go get github.com/joho/godotenv
EXPOSE 8000
ENTRYPOINT [ "fresh" ]
Here is my golang code:
package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
    "os/exec"

    "github.com/joho/godotenv"
)

func main() {
    fmt.Println("Golang server started")
    godotenv.Load()
    fmt.Println("now doing healthcheck on rabbit")
    exec.Command("docker-compose restart api")
    os.Exit(1)
    <...>
And here is my terminal output (the Golang container never restarts after RabbitMQ comes up):
api | 23:23:00 app | Golang server started
api | 23:23:00 app | now doing healthcheck on rabbit
rabbitmq_1 |
rabbitmq_1 | ## ##
rabbitmq_1 | ## ## RabbitMQ 3.7.11. Copyright (C) 2007-2019 Pivotal Software, Inc.
rabbitmq_1 | ########## Licensed under the MPL. See http://www.rabbitmq.com/
rabbitmq_1 | ###### ##
rabbitmq_1 | ########## Logs: <stdout>
<...more rabbit logging...>
I'm very confused about how to get this to work. What am I doing wrong?
EDIT:
The exec.Command call was incorrectly implemented; however, os.Exit(1), log.Fatal, and log.Panic all exit the container, but the container does not restart. Still confused.
The Docker documentation says:
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
Since the Go code you show exits basically immediately, it never meets this 10-second-minimum rule.
You can force Go to wait until the process has been alive a minimum of 10 seconds by using time.After somewhat like:
ch := time.After(10 * time.Second)
defer (func() { fmt.Println("waiting"); <-ch; fmt.Println("waited") })()
That is, create a channel that will receive an event after 10 seconds, and then actually receive it (immediately if it's happened, waiting if not) before main returns. From playing with https://play.golang.org/p/zGY5jFWbXyk, the one trick is that there needs to be some observable effect after receiving from the channel or else it doesn't actually wait.
I have the following docker-compose, where I need to wait for the service jhipster-registry to be up and accepting connections before starting myprogram-app.
I tried the healthcheck way, following the official docs: https://docs.docker.com/compose/compose-file/compose-file-v2/
version: '2.1'
services:
  myprogram-app:
    image: myprogram
    mem_limit: 1024m
    environment:
      - SPRING_PROFILES_ACTIVE=prod,swagger
      - EUREKA_CLIENT_SERVICE_URL_DEFAULTZONE=http://admin:$${jhipster.registry.password}@jhipster-registry:8761/eureka
      - SPRING_CLOUD_CONFIG_URI=http://admin:$${jhipster.registry.password}@jhipster-registry:8761/config
      - SPRING_DATASOURCE_URL=jdbc:postgresql://myprogram-postgresql:5432/myprogram
      - JHIPSTER_SLEEP=0
      - SPRING_DATA_ELASTICSEARCH_CLUSTER_NODES=myprogram-elasticsearch:9300
      - JHIPSTER_REGISTRY_PASSWORD=53bqDrurQAthqrXG
      - EMAIL_USERNAME
      - EMAIL_PASSWORD
    ports:
      - 8080:8080
    networks:
      - backend
    depends_on:
      - jhipster-registry:
          "condition": service_started
      - myprogram-postgresql
      - myprogram-elasticsearch
  myprogram-postgresql:
    image: postgres:9.6.5
    mem_limit: 256m
    environment:
      - POSTGRES_USER=myprogram
      - POSTGRES_PASSWORD=myprogram
    networks:
      - backend
  myprogram-elasticsearch:
    image: elasticsearch:2.4.6
    mem_limit: 512m
    networks:
      - backend
  jhipster-registry:
    extends:
      file: jhipster-registry.yml
      service: jhipster-registry
    mem_limit: 512m
    ports:
      - 8761:8761
    networks:
      - backend
    healthcheck:
      test: "exit 0"
networks:
  backend:
    driver: "bridge"
but I get the following error when running docker-compose up:
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.myprogram-app.depends_on contains {"jhipster-registry": {"condition": "service_started"}}, which is an invalid type, it should be a string
Am I doing something wrong, or is this feature no longer supported? How can I achieve this synchronization between services?
Updated version
version: '2.1'
services:
  myprogram-app:
    image: myprogram
    mem_limit: 1024m
    environment:
      - SPRING_PROFILES_ACTIVE=prod,swagger
      - EUREKA_CLIENT_SERVICE_URL_DEFAULTZONE=http://admin:$${jhipster.registry.password}@jhipster-registry:8761/eureka
      - SPRING_CLOUD_CONFIG_URI=http://admin:$${jhipster.registry.password}@jhipster-registry:8761/config
      - SPRING_DATASOURCE_URL=jdbc:postgresql://myprogram-postgresql:5432/myprogram
      - JHIPSTER_SLEEP=0
      - SPRING_DATA_ELASTICSEARCH_CLUSTER_NODES=myprogram-elasticsearch:9300
      - JHIPSTER_REGISTRY_PASSWORD=53bqDrurQAthqrXG
      - EMAIL_USERNAME
      - EMAIL_PASSWORD
    ports:
      - 8080:8080
    networks:
      - backend
    depends_on:
      jhipster-registry:
        condition: service_healthy
      myprogram-postgresql:
        condition: service_started
      myprogram-elasticsearch:
        condition: service_started
    #restart: on-failure
  myprogram-postgresql:
    image: postgres:9.6.5
    mem_limit: 256m
    environment:
      - POSTGRES_USER=myprogram
      - POSTGRES_PASSWORD=tuenemreh
    networks:
      - backend
  myprogram-elasticsearch:
    image: elasticsearch:2.4.6
    mem_limit: 512m
    networks:
      - backend
  jhipster-registry:
    extends:
      file: jhipster-registry.yml
      service: jhipster-registry
    mem_limit: 512m
    ports:
      - 8761:8761
    networks:
      - backend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://jhipster-registry:8761", "|| exit 1"]
      interval: 30s
      retries: 20
      #start_period: 30s
networks:
  backend:
    driver: "bridge"
The updated version gives me a different error,
ERROR: for myprogram-app Container "8ebca614590c" is unhealthy.
ERROR: Encountered errors while bringing up the project.
saying that the jhipster-registry container is unhealthy, even though it's reachable via the browser. How can I fix the healthcheck command to make it work?
Best Approach - Resilient App Starts
While Docker does support startup dependencies, the official recommendation is to update your app's start logic to test for the availability of external dependencies and retry. Besides circumventing the race condition in docker compose up, this makes applications more robust when they restart in the wild.
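As a rough illustration of that recommendation (not part of the original answer), a startup retry loop with exponential backoff might look like this; connect stands for whatever client call your app uses, and the retry parameters are arbitrary:
# startup_retry.py - minimal sketch of "retry until the dependency is available".
import time

def connect_with_retry(connect, attempts=10, initial_delay=1.0, backoff=2.0):
    """Call connect() until it succeeds, sleeping with exponential backoff between failures."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}; retrying in {delay:.1f}s")
            time.sleep(delay)
            delay *= backoff
    raise RuntimeError("dependency never became available")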
depends_on & service_healthy - Compose 1.27.0+
The condition form of depends_on is back in Docker Compose v1.27.0+ (it was dropped in the v3 file format) as part of the Compose Specification.
Each dependency service should also implement a healthcheck so that it can report when it is fully set up and ready for downstream services; the service_healthy condition relies on this.
version: '3.0'
services:
  php:
    build:
      context: .
      dockerfile: tests/Docker/Dockerfile-PHP
    depends_on:
      redis:
        condition: service_healthy
  redis:
    image: redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 3s
      retries: 30
wait-for-it.sh
The approach recommended by Docker in their docs on Controlling startup and shutdown order in Compose is to download wait-for-it.sh, which polls the given host:port and then executes the following command once it succeeds.
version: "2"
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres
Note: this requires overriding the image's startup command, so make sure you know what the default command was and pass it along to maintain parity with the default startup.
Further Reading
Docker Compose wait for container X before starting Y
Difference between links and depends_on in docker_compose.yml
How can I wait for a docker container to be up and running?
Docker Compose Wait til dependency container is fully up before launching
depends_on doesn't wait for another service in docker-compose 1.22.0
The documentation suggests that, in Docker Compose version 2 files specifically, depends_on: can be a list of strings, or a mapping where the keys are service names and the values are conditions. For the services where you don't have (or need) health checks, there is a service_started condition.
depends_on:
  # notice: these lines don't start with "-"
  jhipster-registry:
    condition: service_healthy
  myprogram-postgresql:
    condition: service_started
  myprogram-elasticsearch:
    condition: service_started
Depending on how much control you have over your program and its libraries, it's better still if you can arrange for the service to be able to start without its dependencies necessarily being available (equivalently, to keep functioning if its dependencies die while the service is running), and not use the depends_on: option. You might return an HTTP 503 Service Unavailable error if the database is down, for instance. Another strategy that is often helpful is to exit immediately if your dependencies aren't available but use a setting like restart: on-failure to ask the orchestrator to restart the service.
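For example, a minimal sketch (an assumption, not the answerer's code) of the "degrade instead of depend" idea in a FastAPI handler, where check_database is a hypothetical probe of the backing store:
# degrade_gracefully.py - return 503 while a dependency is unavailable instead of refusing to start.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

def check_database() -> bool:
    """Hypothetical probe; replace with a real connection check."""
    return False  # placeholder: pretend the database is down

@app.get("/items")
def list_items():
    if not check_database():
        return JSONResponse(status_code=503, content={"error": "database unavailable"})
    return {"items": []}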
Update to version 3+.
Please follow the documentation for version 3:
There are several things to be aware of when using depends_on:
depends_on does not wait for db and redis to be "ready" before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it. Version 3 no longer supports the condition form of depends_on.
The depends_on option is ignored when deploying a stack in swarm mode with a version 3 Compose file.
I would consider using the restart_policy option for configuring your myprogram-app to restart until the jhipster-registry is up and accepting connections:
restart_policy:
  condition: on-failure
  delay: 3s
  max_attempts: 5
  window: 60s
With the new docker compose CLI, we can now use the --wait option:
docker compose up --wait
If your service has a healthcheck, Docker waits until it has the "healthy" status; otherwise, it waits for the service to be started. That's why it is crucial to have relevant healthchecks for all your services.
Note that this option automatically activates the --detach option.
Check out the documentation here.
The best approach I found is to check for the desired port in the entrypoint. There are different ways to do that, e.g. wait-for-it, but I like this solution because it is cross-platform between alpine and bash images and doesn't download custom scripts from GitHub:
Install netcat-openbsd (works with apt and apk). Then in the entrypoint (works with both #!/bin/bash and #!/bin/sh):
#!/bin/bash

wait_for()
{
    echo "Waiting $1 seconds for $2:$3"
    timeout $1 sh -c 'until nc -z $0 $1; do sleep 0.1; done' $2 $3 || return 1
    echo "$2:$3 available"
}

wait_for 10 db 5432
wait_for 10 redis 6379
You can also make this into a 1-liner if you don't want to print anything.
Although you already got an answer, it should be mentioned that what you are trying to achieve has some nasty risks.
Ideally a service should be self-sufficient and smart enough to retry and wait for its dependencies to become available (before going down). Otherwise you will be more exposed to one failure propagating to other services. Also consider that a system reboot, unlike a manual start, might ignore the dependency order.
If one service crashing causes your whole system to go down, you might have a tool to restart everything again, but it would be better to have services that can withstand that case.
After trying several approaches, IMO the simplest and most elegant option is using the jwilder/dockerize (dockerize) utility image with its -wait flag. Here is a simple example where I need a PostgreSQL database to be ready before starting my app:
version: "3.8"
services:
  # Start Postgres.
  db:
    image: postgres
  # Wait for Postgres to be joinable.
  check-db-started:
    image: jwilder/dockerize:0.6.1
    depends_on:
      - db
    command: 'dockerize -wait=tcp://db:5432'
  # Only start myapp once Postgres is joinable.
  myapp:
    image: myapp:latest
    depends_on:
      - check-db-started