Flink TaskManager in Docker Swarm doesn't recover

I'm running Flink v1.10 with 1 JobManager and 3 TaskManagers in Docker Swarm, without ZooKeeper. I have a job running that takes 12 slots, and I have 3 TMs with 20 slots each (60 total).
After some tests everything went well, except for one test.
The failing test is this: if I cancel the job manually, a side-car retries the job, but the TaskManager count in the browser console doesn't recover and keeps decreasing.
A more practical example: I have a job running, consuming 12 slots of 60 total.
The web console shows me 48 free slots and 3 TMs.
I cancel the job manually, the side-car retriggers the job, and the web console shows me 36 free slots and 2 TMs.
The job enters a failed state, and the free slots keep decreasing until the console shows 0 free slots and 1 TM.
The workaround is to scale all 3 TMs down and back up, after which everything returns to normal.
Everything else works fine with this configuration: the JobManager recovers if I remove it, and scaling the TMs up or down works, but if I cancel the job, the TMs appear to lose their connection to the JM.
Any suggestions as to what I'm doing wrong?
Here is my flink-conf.yaml.
env.java.home: /usr/local/openjdk-8
env.log.dir: /opt/flink/
env.log.file: /var/log/flink.log
jobmanager.rpc.address: jobmanager1
jobmanager.rpc.port: 6123
jobmanager.heap.size: 2048m
#taskmanager.memory.process.size: 2048m
#env.java.opts.taskmanager: 2048m
taskmanager.memory.flink.size: 2048m
taskmanager.numberOfTaskSlots: 20
parallelism.default: 2
#==============================================================================
# High Availability
#==============================================================================
# The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
#
high-availability: NONE
#high-availability.storageDir: file:///tmp/storageDir/flink_tmp/
#high-availability.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
#high-availability.zookeeper.quorum:
# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
# high-availability.zookeeper.client.acl: open
#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints
# state.savepoints.dir: hdfs://namenode-host:port/flink-checkpoints
# state.backend.incremental: false
jobmanager.execution.failover-strategy: region
#==============================================================================
# Rest & web frontend
#==============================================================================
rest.port: 8080
rest.address: jobmanager1
# rest.bind-port: 8081
rest.bind-address: 0.0.0.0
#web.submit.enable: false
#==============================================================================
# Advanced
#==============================================================================
# io.tmp.dirs: /tmp
# classloader.resolve-order: child-first
# taskmanager.memory.network.fraction: 0.1
# taskmanager.memory.network.min: 64mb
# taskmanager.memory.network.max: 1gb
#==============================================================================
# Flink Cluster Security Configuration
#==============================================================================
# security.kerberos.login.use-ticket-cache: false
# security.kerberos.login.keytab: /mobi.me/flink/conf/smart3.keytab
# security.kerberos.login.principal: smart_user
# security.kerberos.login.contexts: Client,KafkaClient
#==============================================================================
# ZK Security Configuration
#==============================================================================
# zookeeper.sasl.login-context-name: Client
#==============================================================================
# HistoryServer
#==============================================================================
#jobmanager.archive.fs.dir: hdfs:///completed-jobs/
#historyserver.web.address: 0.0.0.0
#historyserver.web.port: 8082
#historyserver.archive.fs.dir: hdfs:///completed-jobs/
#historyserver.archive.fs.refresh-interval: 10000
blob.server.port: 6124
query.server.port: 6125
taskmanager.rpc.port: 6122
high-availability.jobmanager.port: 50010
zookeeper.sasl.disable: true
#recovery.mode: zookeeper
#recovery.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
#recovery.zookeeper.path.root: /
#recovery.zookeeper.path.namespace: /cluster_one

The solution was to increase the metaspace size in flink-conf.yaml.
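For reference, a minimal sketch of the kind of change that fixed it (the config key exists in Flink 1.10; the 256m value is illustrative, so tune it to your workload):
# flink-conf.yaml
# Each job (re)submission loads classes through a fresh user classloader,
# so repeated cancel/retry cycles can exhaust the default metaspace and kill the TM.
taskmanager.memory.jvm-metaspace.size: 256m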
Br,
André.

Related

Where did Kubernetes Security context runAsUser 1000 come from?

I learnt that to run a container as rootless, you need to specify either securityContext: runAsUser: 1000 or the USER directive in the Dockerfile.
My question is that there is no UID 1000 on the Kubernetes/Docker host system itself.
I learnt before that Linux user namespacing allows a user to have a different UID outside its original namespace.
So how does UID 1000 exist under the hood? Did the original root (UID 0) create a new user namespace which is represented by UID 1000 in the container?
What happens if we specify UID 2000 instead?
Hope this answer helps you
"I learnt that to run a container as rootless, you need to specify either the SecurityContext:runAsUser 1000 or specify the USER directive in the DOCKERFILE"
You are correct, except about runAsUser: 1000: you can specify any UID, not only 1000. Remember that whatever UID you want to use (runAsUser: UID) should already exist in the image!
Often, base images will already have a user created and available but leave it up to the development or deployment teams to leverage it. For example, the official Node.js image comes with a user named node at UID 1000 that you can run as, but they do not explicitly set the current user to it in their Dockerfile. We will either need to configure it at runtime with a runAsUser setting or change the current user in the image using a derivative Dockerfile.
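For example, a minimal derivative Dockerfile sketch that adopts the node user the official image already ships (the tag is illustrative):
# Dockerfile
FROM node:18
# switch to the non-root user (UID 1000) that the base image already created
USER node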
runAsUser: 1001 # hardcode user to non-root if not set in Dockerfile
runAsGroup: 1001 # hardcode group to non-root if not set in Dockerfile
runAsNonRoot: true # hardcode to non-root. Redundant to above if Dockerfile is set USER 1000
Remember that runAsUser and runAsGroup ensure container processes do not run as the root user, but don't rely on these two settings alone to guarantee it. Be sure to also set runAsNonRoot: true.
Here is a full example of a securityContext:
# generic pod spec that's usable inside a deployment or other higher level k8s spec
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    # basic container details
    - name: my-container-name
      # never use reusable tags like latest or stable
      image: my-image:tag
      # hardcode the listening port if Dockerfile isn't set with EXPOSE
      ports:
        - containerPort: 8080
          protocol: TCP
      readinessProbe:   # I always recommend using these, even if your app has no listening ports (this affects any rolling update)
        httpGet:        # Lots of timeout values with defaults, be sure they are ideal for your workload
          path: /ready
          port: 8080
      livenessProbe:    # only needed if your app tends to go unresponsive or you don't have a readinessProbe, but this is up for debate
        httpGet:        # Lots of timeout values with defaults, be sure they are ideal for your workload
          path: /alive
          port: 8080
      resources:        # Because if limits = requests then QoS is set to "Guaranteed"
        limits:
          memory: "500Mi"   # If container uses over 500MB it is killed (OOM)
          #cpu: "2"         # Not normally needed, unless you need to protect other workloads or QoS must be "Guaranteed"
        requests:
          memory: "500Mi"   # Scheduler finds a node where 500MB is available
          cpu: "1"          # Scheduler finds a node where 1 vCPU is available
      # per-container security context
      # lock down privileges inside the container
      securityContext:
        allowPrivilegeEscalation: false # prevent sudo, etc.
        privileged: false               # prevent acting like host root
  terminationGracePeriodSeconds: 600 # default is 30, but you may need more time to gracefully shutdown (HTTP long polling, user uploads, etc)
  # per-pod security context
  # enable seccomp and force non-root user
  securityContext:
    seccompProfile:
      type: RuntimeDefault # enable seccomp and the runtime's default profile
    runAsUser: 1001    # hardcode user to non-root if not set in Dockerfile
    runAsGroup: 1001   # hardcode group to non-root if not set in Dockerfile
    runAsNonRoot: true # hardcode to non-root. Redundant to the above if the Dockerfile sets USER 1000
Sources:
Kubernetes Pod Specification Good Defaults
Configure a Security Context for a Pod or Container
10 Kubernetes Security Context settings you should understand
Something at the container layer calls the setuid(2) system call with that numeric user ID. There's no particular requirement to "create" a user; if you are able to call setuid() at all, you can call it with any numeric uid you want.
You can demonstrate this with plain Docker pretty easily. The docker run -u option takes any numeric uid, and you can docker run -u 2000 and your container will (probably) still run. It's common enough to docker run -u $(id -u) to run a container with the same numeric user ID as the host user even though that uid doesn't exist in the container's /etc/passwd file.
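A quick sketch of that (alpine is just an arbitrary small image here):
# run as a numeric UID that has no /etc/passwd entry in the image
docker run --rm -u 2000 alpine id
# prints something like: uid=2000 gid=0(root)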
At a Kubernetes layer this is a little less common. A container can't usefully access host files in a clustered environment (...on which host?) so there's no need to have a user ID matching the host's. If the image already sets up a non-root user ID, you should be able to just use it as-is without setting it at the Kubernetes layer.

GKE private cluster Django server timeout 502 after 64 API requests

So I have a production-ready GKE private cluster environment, where I host my Django Rest Framework microservices.
It all works fine, but after 64 API requests the server times out and the pod becomes unreachable.
I am not sure why this is happening.
I use the following stack:
Django 3.2.13
GKE v1.22.8-gke.201
Postgresql Cloud SQL
Docker
My Django application is a simple one. No authentication on POST. Just a small json-body is sent, and the server saves it to the PostgreSQL database.
The server connects to the database via cloud_sql_proxy, but I also tried using the IP and the PySQL library. It works, but I get the same error/timeout.
The workloads having the issue are any workloads that make a DB call; it does not matter whether it's a SELECT * or an INSERT.
However, when I do a load-balancing test (locust, Python) against the home page of any microservice within the cluster (readiness), I do not experience any API call timeouts or server restarts.
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST /api/logserver/logs/ 64 0(0.00%) | 96 29 140 110 | 10.00 0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
Aggregated 64 0(0.00%) | 96 29 140 110 | 10.00 0.00
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST /api/logserver/logs/ 77 13(19.48%) | 92 17 140 100 | 0.90 0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
Aggregated 77 13(19.48%) | 92 17 140 100 | 0.90 0.00
So it looks like it has something to do with the way the DB is connected to the pods?
I use cloud_sql_proxy to connect to the DB, and this also results in a timeout and a restart of the pod.
I have tried updating gunicorn in the docker environment for Django to:
CMD gunicorn -b :8080 --log-level debug --timeout 90 --workers 6 --access-logfile '-' --error-logfile '-' --graceful-timeout 30 helloadapta.wsgi
And I have tried replacing gunicorn with uwsgi.
I also tried using plain python manage.py runserver 0.0.0.0:8080
They all serve the backend, and I can connect to it. But the issue on timeout persists.
This is the infrastructure:
Private GKE cluster which uses subnetwork in GCP.
Cloud NAT on the network for an outbound external static IP (needed to whitelist the microservices in third-party servers)
The Cluster has more than enough memory and cpu:
nodes: 3
total vCPUs: 24
total memory: 96GB
Each node has:
CPU allocatable: 7.91 CPU
Memory allocatable: 29.79 GB
The config in the yaml file states that the pod gets:
resources:
  limits:
    cpu: "1"
    memory: "2Gi"
  requests:
    cpu: "1"
    memory: "2Gi"
Only when I make a readiness call to the server is there no timeout.
So it really points in the direction that Cloud SQL breaks after 64 API calls.
The Cloud SQL Database stack is:
1 sql instance
1 database within the instance
4 vCPUs
15 GB memory
max_connections 85000
The CPU utilisation never goes above 5%
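One way to verify this hypothesis is to watch connection usage on the database while the load test runs; a hedged sketch using plain psql through the proxy (the host, user, and dbname are placeholders):
# count open connections by state, and show the server-side limit
psql "host=127.0.0.1 port=5432 user=postgres dbname=mydb" \
  -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;" \
  -c "SHOW max_connections;"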

How to use the Kafka exporter Docker image?

I'm trying to use the Kafka Exporter packaged by Bitnami, https://github.com/bitnami/bitnami-docker-kafka-exporter, together with the Bitnami image for Kafka, https://github.com/bitnami/bitnami-docker-kafka. I'm trying to run the following docker-compose.yml:
version: '2'
networks:
  app-tier:
    driver: bridge
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    environment:
      - 'ALLOW_ANONYMOUS_LOGIN=yes'
    networks:
      - app-tier
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    networks:
      - app-tier
  kafka-exporter:
    image: bitnami/kafka-exporter:latest
    ports:
      - "9308:9308"
    command:
      - --kafka.server=kafka:9092
However, if I run this with docker-compose up, I get the following error:
bitnami-docker-kafka-kafka-exporter-1 | F0103 17:44:12.545739 1 kafka_exporter.go:865] Error Init Kafka Client: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
I've tried to use the answer to "How to pass arguments to entrypoint in docker-compose.yml" to specify a command for the kafka-exporter service which, assuming the entrypoint is defined in exec form, should append additional flags to the invocation of the Kafka Exporter binary. However, it seems that either kafka:9092 is not the right value for the kafka.server flag, or the flag is not getting picked up, or perhaps there is some kind of race condition where the exporter fails and exits before Kafka is up and running. Any ideas on how to get this example to work?
It would appear that this is just caused by a race condition, with the Kafka Exporter trying to connect to Kafka before it has started up. If I just run docker-compose up and allow the Kafka Exporter to fail, and then separately run the danielqsj/kafka-exporter container, it works:
> docker run -it -p 9308:9308 --network bitnami-docker-kafka_app-tier danielqsj/kafka-exporter
I0103 18:49:04.694898 1 kafka_exporter.go:774] Starting kafka_exporter (version=1.4.2, branch=HEAD, revision=15e4ad6a9ea8203135d4b974e825f22e31c750e5)
I0103 18:49:04.703058 1 kafka_exporter.go:934] Listening on HTTP :9308
and on http://localhost:9308 I can see metrics:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 8.82e-05
go_gc_duration_seconds{quantile="0.25"} 8.82e-05
go_gc_duration_seconds{quantile="0.5"} 8.82e-05
go_gc_duration_seconds{quantile="0.75"} 8.82e-05
go_gc_duration_seconds{quantile="1"} 8.82e-05
go_gc_duration_seconds_sum 8.82e-05
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 20
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.17.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.546384e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 4.492048e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.448119e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 6512
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.6141668631896834e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.835688e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.546384e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.686976e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 5.07904e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 4687
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 1.736704e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 7.766016e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6412358823440428e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 11199
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9600
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 71536
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 81920
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 5.283728e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.499625e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 622592
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 622592
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.6270344e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 1
# HELP kafka_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which kafka_exporter was built.
# TYPE kafka_exporter_build_info gauge
kafka_exporter_build_info{branch="HEAD",goversion="go1.17.3",revision="15e4ad6a9ea8203135d4b974e825f22e31c750e5",version="1.4.2"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.11
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.703936e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.64123574422e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.35379456e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 3
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
Update
A more reliable way to do this is to use a wrapper script in order to perform an application-specific health check as described in https://docs.docker.com/compose/startup-order/. With the following directory structure,
.
├── docker-compose.yml
└── kafka-exporter
├── Dockerfile
└── run.sh
the following Dockerfile,
FROM bitnami/kafka-exporter:latest
COPY run.sh /opt/bitnami/kafka-exporter/bin
ENTRYPOINT ["run.sh"]
and the following run.sh,
#!/bin/sh
while ! bin/kafka_exporter; do
echo "Waiting for the Kafka cluster to come up..."
sleep 1
done
and the following docker-compose.yml,
version: '2'
networks:
  app-tier:
    driver: bridge
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    environment:
      - 'ALLOW_ANONYMOUS_LOGIN=yes'
    networks:
      - app-tier
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    networks:
      - app-tier
  kafka-exporter:
    build: kafka-exporter
    ports:
      - "9308:9308"
    networks:
      - app-tier
    entrypoint: ["run.sh"]
  myapp:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    networks:
      - app-tier
Upon running docker-compose build && docker-compose up, I can see from the logs that after ~2 seconds (on the third attempt) the Kafka Exporter starts successfully:
> docker logs bitnami-docker-kafka-kafka-exporter-1 -f
I0104 16:05:39.765921 8 kafka_exporter.go:769] Starting kafka_exporter (version=1.4.2, branch=non-git, revision=non-git)
F0104 16:05:40.525065 8 kafka_exporter.go:865] Error Init Kafka Client: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Waiting for the Kafka cluster to come up...
I0104 16:05:41.538482 16 kafka_exporter.go:769] Starting kafka_exporter (version=1.4.2, branch=non-git, revision=non-git)
F0104 16:05:42.295872 16 kafka_exporter.go:865] Error Init Kafka Client: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Waiting for the Kafka cluster to come up...
I0104 16:05:43.307293 24 kafka_exporter.go:769] Starting kafka_exporter (version=1.4.2, branch=non-git, revision=non-git)
I0104 16:05:43.686798 24 kafka_exporter.go:929] Listening on HTTP :9308
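Note that Compose file format 2.1 and later can also gate startup on a health check instead of a retry wrapper. A hedged sketch, assuming the Bitnami Kafka image has the standard kafka-topics.sh CLI on its PATH (only the changed services are shown, and version must be bumped to '2.1'):
  kafka:
    image: 'bitnami/kafka:latest'
    healthcheck:
      # consider Kafka healthy once it can answer a metadata request
      test: ["CMD-SHELL", "kafka-topics.sh --bootstrap-server localhost:9092 --list"]
      interval: 5s
      timeout: 10s
      retries: 10
  kafka-exporter:
    image: bitnami/kafka-exporter:latest
    depends_on:
      kafka:
        condition: service_healthy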
It's a bit difficult to run all the required dependencies for kafka-exporter.
I did a few simple steps myself, as follows.
Step 1
docker pull danielqsj/kafka-exporter:latest
Step 2
./bin/zookeeper-server-start.sh config/zookeeper.properties
Step 3
./bin/kafka-server-start.sh config/server.properties
Step 4
docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=host.docker.internal:9092 --log.enable-sarama
Note that in the above command I used "host.docker.internal", which lets kafka-exporter reach the Kafka broker on my machine's localhost.

Airflow Gunicorn timeout

I have an external service that calls the Airflow trigger-DAG API (quite a lot of requests at once) at a fixed time.
I am getting Gunicorn timeouts despite tweaking the airflow.cfg settings
master_timeout, worker_timeout, and refresh_interval.
I am also using the gevent worker type for async handling.
Is there something I am missing?
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 600
# Number of seconds the gunicorn webserver waits before timing out on a worker
web_server_worker_timeout = 480
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
# Number of workers to run the Gunicorn web server
workers = 5
# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = gevent

How to increase RPS in distributed locust load test

I cannot get past 1200 RPS no matter if I use 4 or 5 workers.
I tried to start Locust in 3 variations: one, four, and five worker processes (docker-compose up --scale worker_locust=num_of_workers). I use 3000 clients with a hatch rate of 100. The service that I am loading is a dummy that just always returns yo and HTTP 200, i.e., it's not doing anything but returning a constant string. When I have one worker I get up to 600 RPS (and start to see some HTTP errors); when I have 4 workers I can get up to ~1200 RPS (without a single HTTP error).
When I have 5 workers I get the same ~1200 RPS, but with lower CPU usage.
I suppose that if the CPU went down in the 5-worker case (with respect to the 4-worker case), then it's not the CPU that is bounding the RPS.
I am running this on a 6-core MacBook.
The locustfile.py I use posts essentially empty requests (just a few parameters):
from locust import HttpUser, task, constant

class QuickstartUser(HttpUser):
    wait_time = constant(1)  # seconds

    @task
    def add_empty_model(self):
        self.client.post(
            "/models",
            json={
                "grouping": {
                    "grouping": "a/b"
                },
                "container_image": "myrepo.com",
                "container_tag": "0.3.0",
                "prediction_type": "prediction_type",
                "model_state_base64": "bXkgc3RhdGU=",
                "model_config": {},
                "meta": {}
            }
        )
My docker-compose.yml:
services:
  myservice:
    build:
      context: ../
    ports:
      - "8000:8000"
  master_locust:
    image: locustio/locust
    ports:
      - "8089:8089"
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --master
  worker_locust:
    image: locustio/locust
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --worker --master-host master_locust
Can someone suggest the direction of getting towards the 2000 RPS?
You should check out the FAQ:
https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
It's probably your server not being able to handle more requests, at least from your one machine. There are things you can do to verify that's the case. You can try FastHttpUser, running on multiple machines, or just upping the number of users. But if you can, check how the server is handling the load and see what you can optimize there.
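If you want to try FastHttpUser, a minimal sketch of the switch (same locustfile as above, different base class; FastHttpUser lives in locust.contrib.fasthttp in recent Locust versions):
from locust import task, constant
from locust.contrib.fasthttp import FastHttpUser

class QuickstartFastUser(FastHttpUser):
    wait_time = constant(1)  # seconds

    @task
    def add_empty_model(self):
        # body trimmed here; reuse the full json payload from the locustfile above
        self.client.post("/models", json={"grouping": {"grouping": "a/b"}})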
You will need more workers to generate more RPS. Also bear in mind that each worker has a limited local port range when creating TCP connections to the destination.
You can check this value on your Linux workers:
net.ipv4.ip_local_port_range
Try tweaking that number on each Linux worker, or simply create many more workers on another, more powerful machine (your 6-core MacBook is too small).
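A hedged sketch of checking and widening that range (the values are illustrative):
# inspect the current ephemeral port range
sysctl net.ipv4.ip_local_port_range
# widen it for this boot; add the setting to /etc/sysctl.conf to persist it
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"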
To create many workers you could try Locust on Kubernetes with horizontal pod autoscaling for the worker deployment.
Here is a Helm chart to start playing around with a Locust k8s deployment:
https://github.com/deliveryhero/helm-charts/tree/master/stable/locust
You may need to check these values for it:
worker.hpa.enabled
worker.hpa.maxReplicas
worker.hpa.minReplicas
worker.hpa.targetCPUUtilizationPercentage
Simply set the maxReplicas value to get more workers when the load test is started, or scale the worker pods manually to your desired number with a kubectl command (see the sketch below).
I've managed to generate a minimum of 8K RPS (a stable value for my app; it can't serve more) with 1000 worker pods, using Locust load test parameters of 200K users at a spawn rate of 2000 per second.
You may have to scale out your server when you reach higher throughput, but with 1000 worker pods I think you can easily reach 15K-20K RPS.
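A hedged example of that manual scaling, assuming the chart names the worker deployment locust-worker (check your release's actual name with kubectl get deploy):
# scale the Locust worker pods to 100 replicas
kubectl scale deployment locust-worker --replicas=100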
