I noticed that Redis runs much slower in Docker than it does natively. What could this be related to?
Here are my tests on a VPS with 1 CPU core / 1 GB RAM / Ubuntu 22.04.
Native Redis on the host (the fastest)
sudo apt-get update
sudo apt-get install redis (redis_version:6.0.16)
redis-benchmark -q -n 100000
PING_INLINE: 23004.37 requests per second
PING_BULK: 21915.41 requests per second
SET: 23792.53 requests per second
GET: 22867.60 requests per second
INCR: 23894.86 requests per second
LPUSH: 25252.53 requests per second
RPUSH: 24551.93 requests per second
LPOP: 24414.06 requests per second
RPOP: 24307.24 requests per second
SADD: 23512.82 requests per second
HSET: 24746.35 requests per second
SPOP: 22758.31 requests per second
ZADD: 23969.32 requests per second
ZPOPMIN: 22701.47 requests per second
LPUSH (needed to benchmark LRANGE): 24113.82 requests per second
LRANGE_100 (first 100 elements): 17531.56 requests per second
LRANGE_300 (first 300 elements): 7954.18 requests per second
LRANGE_500 (first 450 elements): 6106.12 requests per second
LRANGE_600 (first 600 elements): 5296.89 requests per second
MSET (10 keys): 30012.00 requests per second
Redis in Docker tested outside the container (the slowest)
docker pull redis:6.0.17
docker run --name redis -p 6378:6379 -d redis:6.0.17
redis-benchmark -q -n 100000 -p 6378
PING_INLINE: 7548.31 requests per second
PING_BULK: 7623.69 requests per second
SET: 7474.96 requests per second
GET: 7474.96 requests per second
INCR: 7488.95 requests per second
LPUSH: 7443.25 requests per second
RPUSH: 7487.27 requests per second
LPOP: 7401.92 requests per second
RPOP: 7163.84 requests per second
SADD: 7252.16 requests per second
HSET: 7192.17 requests per second
SPOP: 7217.61 requests per second
ZADD: 7331.38 requests per second
ZPOPMIN: 7597.63 requests per second
LPUSH (needed to benchmark LRANGE): 7392.62 requests per second
LRANGE_100 (first 100 elements): 6248.05 requests per second
LRANGE_300 (first 300 elements): 6377.55 requests per second
LRANGE_500 (first 450 elements): 5748.45 requests per second
LRANGE_600 (first 600 elements): 4578.75 requests per second
MSET (10 keys): 6895.60 requests per second
Running the test inside the container (slightly slower than on the host, but I don't run my app inside the container)
docker exec -it redis sh
redis-benchmark -q -n 100000
PING_INLINE: 22416.50 requests per second
PING_BULK: 21654.40 requests per second
SET: 23413.72 requests per second
GET: 22351.36 requests per second
INCR: 22784.23 requests per second
LPUSH: 24467.83 requests per second
RPUSH: 23651.84 requests per second
LPOP: 23781.21 requests per second
RPOP: 23691.07 requests per second
SADD: 22747.95 requests per second
HSET: 24301.34 requests per second
SPOP: 22172.95 requests per second
ZADD: 24301.34 requests per second
ZPOPMIN: 22578.46 requests per second
LPUSH (needed to benchmark LRANGE): 24177.95 requests per second
LRANGE_100 (first 100 elements): 13817.88 requests per second
LRANGE_300 (first 300 elements): 7212.93 requests per second
LRANGE_500 (first 450 elements): 5898.31 requests per second
LRANGE_600 (first 600 elements): 4890.45 requests per second
MSET (10 keys): 29154.52 requests per second
Running the test from another container with Docker Compose (average performance)
Dockerfile
FROM redis:6.0.17
CMD redis-benchmark -q -n 100000 -h redis
docker-compose.yml
version: "3.9"
services:
  web:
    build: .
  redis:
    image: "redis:6.0.17"
PING_INLINE: 16173.38 requests per second, p50=1.495 msec
PING_MBULK: 15172.20 requests per second, p50=1.503 msec
SET: 15989.77 requests per second, p50=1.503 msec
GET: 15701.05 requests per second, p50=1.511 msec
INCR: 16121.23 requests per second, p50=1.487 msec
LPUSH: 16428.46 requests per second, p50=1.495 msec
RPUSH: 14630.58 requests per second, p50=1.639 msec
LPOP: 15542.43 requests per second, p50=1.567 msec
RPOP: 15518.31 requests per second, p50=1.551 msec
SADD: 14900.91 requests per second, p50=1.567 msec
HSET: 15176.81 requests per second, p50=1.575 msec
SPOP: 14918.69 requests per second, p50=1.583 msec
ZADD: 15855.40 requests per second, p50=1.543 msec
ZPOPMIN: 15248.55 requests per second, p50=1.551 msec
LPUSH (needed to benchmark LRANGE): 16466.33 requests per second, p50=1.495 msec
LRANGE_100 (first 100 elements): 10330.58 requests per second, p50=2.423 msec
LRANGE_300 (first 300 elements): 5361.07 requests per second, p50=4.671 msec
LRANGE_500 (first 500 elements): 3863.99 requests per second, p50=6.191 msec
LRANGE_600 (first 600 elements): 3718.99 requests per second, p50=6.559 msec
MSET (10 keys): 20618.56 requests per second, p50=1.415 msec
As you can see from the tests, every time Docker is involved, Redis performance drops.
Yet a major selling point of Docker is that it is supposed to have no significant negative impact on performance.
Docker containers are isolated from the host operating system through namespaces and cgroups, which can add some overhead. Running Redis in a container can also lead to resource constraints, such as limited memory and CPU availability. For example, if you did not specify memory or CPU resources for the Redis container, the host's kernel may have to limit the container's access to these resources, which can also impact performance.
To improve performance when running Redis inside a Docker container, you can consider the following improvements:
Assign a minimum of 2 GB of memory to the container.
Use host network mode for the container, rather than the default bridge mode.
Mount the Redis data directory from the host, rather than using a volume, to eliminate the overhead of file-system access in the container.
You can also check whether your Redis configuration itself is tuned.
I hope these tips help you improve the performance of Redis when running inside a Docker container.
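A rough sketch of the host-network and bind-mount suggestions above; the image tag, data path, and memory value are just examples:

```shell
# Host networking: Redis binds directly to the host's port 6379, skipping
# the translation done for published ports, so no -p mapping is used.
docker run -d --name redis \
  --network host \
  --memory 2g \
  -v /var/lib/redis-data:/data \
  redis:6.0.17

# Benchmark as if Redis were native:
redis-benchmark -q -n 100000
```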
I would assume the different results are due to the network setup of the containers.
For host -> container communication with published ports, Docker needs to do network address translation (NAT) to route your request. If you run ss -tlpn you will see that it is not redis listening on the published port but dockerd. The Docker daemon then forwards the connection to the correct container port. To do this it needs to keep track of all the connection information and rewrite every IP packet sent to the port.
For container -> container communication, Docker does not need to do full NAT but still involves considerable routing logic. Depending on your configuration, this may be implemented in iptables, for example.
To give you a quick overview over how this works:
Docker sets up a network namespace for every container. A bare network namespace has no network connectivity without further modification.
To add connectivity to the container Docker creates a veth pair and moves one end into the network namespace and configures IP addresses for the veth interfaces.
This is done for both containers.
Now if those containers want to talk to each other docker needs to set up correct routing in iptables for both veth interfaces.
I am not sure about all the details, because my investigation into all of this was quite a while back.
As already suggested by sks147, you could try to run the container with host networking to disable network isolation.
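The namespace/veth plumbing described above can be sketched manually with iproute2; names and addresses here are illustrative, and this requires root on a Linux host:

```shell
# Create a bare network namespace -- it has no connectivity yet.
ip netns add demo
# Create a veth pair and move one end into the namespace.
ip link add veth-host type veth peer name veth-ctr
ip link set veth-ctr netns demo
# Configure addresses on both ends and bring them up.
ip addr add 172.18.0.1/24 dev veth-host
ip link set veth-host up
ip netns exec demo ip addr add 172.18.0.2/24 dev veth-ctr
ip netns exec demo ip link set veth-ctr up
# The "container" end can now reach the host end.
ip netns exec demo ping -c 1 172.18.0.1
```

Docker additionally attaches the host-side veth to a bridge and adds the iptables rules mentioned above; this sketch only shows the namespace/veth part.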
I have an external service that sends quite a lot of requests at once to the Airflow trigger-DAG API at a fixed time.
I am hitting Gunicorn timeouts despite tweaking the airflow.cfg settings
master_timeout, worker_timeout, and refresh_interval.
I am also using the gevent worker type for async handling.
Is there something I am missing?
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 600
# Number of seconds the gunicorn webserver waits before timing out on a worker
web_server_worker_timeout = 480
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
# Number of workers to run the Gunicorn web server
workers = 5
# The worker class gunicorn should use. Choices include
# sync (default), eventlet, gevent
worker_class = gevent
I cannot get past 1200 RPS no matter if I use 4 or 5 workers.
I tried to start locust in 3 variations -- one, four, and five worker processes (docker-compose up --scale worker_locust=num_of_workers). I use 3000 clients with a hatch rate of 100. The service that I am loading is a dummy that just always returns yo and HTTP 200, i.e., it's not doing anything, but returning a constant string. When I have one worker I get up to 600 RPS (and start to see some HTTP errors), when I have 4 workers I can get up to the ~1200 RPS (without a single HTTP error):
When I have 5 workers I get the same ~1200 RPS, but with a lower CPU usage:
I suppose that if the CPU usage went down in the 5-worker case (with respect to the 4-worker case), then it's not the CPU that is bounding the RPS.
I am running this on a 6-core MacBook.
The locustfile.py I use posts essentially empty requests (just a few parameters):
from locust import HttpUser, task, constant

class QuickstartUser(HttpUser):
    wait_time = constant(1)  # seconds

    @task
    def add_empty_model(self):
        self.client.post(
            "/models",
            json={
                "grouping": {
                    "grouping": "a/b"
                },
                "container_image": "myrepo.com",
                "container_tag": "0.3.0",
                "prediction_type": "prediction_type",
                "model_state_base64": "bXkgc3RhdGU=",
                "model_config": {},
                "meta": {}
            }
        )
My docker-compose.yml:
services:
  myservice:
    build:
      context: ../
    ports:
      - "8000:8000"
  master_locust:
    image: locustio/locust
    ports:
      - "8089:8089"
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --master
  worker_locust:
    image: locustio/locust
    volumes:
      - ./:/mnt/locust
    command: -f /mnt/locust/locustfile.py --worker --master-host master_locust
Can someone suggest the direction of getting towards the 2000 RPS?
You should check out the FAQ.
https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
It's probably your server not being able to handle more requests, at least from your one machine. There are other things you can do to make more sure that's the case. You can try FastHttpUser, running on multiple machines, or just upping the number of users. But if you can, check to see how the server is handling the load and see what you can optimize there.
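A minimal sketch of the FastHttpUser variant the FAQ suggests, reusing a trimmed version of the question's payload; FastHttpUser swaps in a faster HTTP client, which often raises the per-worker RPS ceiling:

```python
# Hypothetical locustfile; run it with the same locust CLI / compose setup
# as before (locust -f locustfile.py --master, plus workers).
from locust import FastHttpUser, task, constant

class QuickstartUser(FastHttpUser):
    wait_time = constant(1)  # seconds

    @task
    def add_empty_model(self):
        # Trimmed version of the original near-empty payload.
        self.client.post("/models", json={"grouping": {"grouping": "a/b"}})
```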
You will need more workers to generate more RPS. I believe a single worker machine has a limited local port range when creating TCP connections to the destination.
You may check this value in your linux worker:
net.ipv4.ip_local_port_range
Try to tweak that number on each of your Linux workers, or simply create hundreds of new workers on another, more powerful machine (your 6-core MacBook is too small).
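For reference, checking and (hypothetically) widening the range might look like this; the default on many distributions is roughly 32768-60999:

```shell
# Read the current ephemeral port range on a Linux worker.
sysctl net.ipv4.ip_local_port_range
# Widen it (requires root); the values here are just an example.
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```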
To create many workers you could try Locust on Kubernetes with horizontal pod autoscaling for the worker deployment.
Here is a Helm chart to start playing around with a Locust k8s deployment:
https://github.com/deliveryhero/helm-charts/tree/master/stable/locust
You may need to check these args for it:
worker.hpa.enabled
worker.hpa.maxReplicas
worker.hpa.minReplicas
worker.hpa.targetCPUUtilizationPercentage
Simply set the maxReplicas value to get more workers when the load test starts, or scale the worker pods manually to your desired number with the kubectl command.
I've managed to generate at least 8K RPS (a stable value for my app; it can't serve more) with 1000 worker pods, using Locust parameters of 200K users and a spawn rate of 2000 per second.
You may have to scale out your server when you reach higher throughput, but with 1000 worker pods I think you can easily reach 15K-20K RPS.
I have uWSGI running behind an Nginx proxy server. I tried to benchmark my backend instance using ApacheBench. At one point I get a Too many open files (24) error when I run the command ab -c 1100 -n 2000 https://example.com/test.
I changed the ulimits of my ECS Instance as well as the docker containers and confirmed it by typing ulimit -n which returns 100000 in both the locations.
I cross checked the Individual NGINX, Uwsgi processes limits by opening the /proc/PID where the Max open files is set to 100000.
The worker_connections and worker_rlimit_nofile parameters in /etc/nginx/nginx.conf are also set to highest limit possible.
I want to ask two questions about docker stats.
For example:
NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
container_1 1.52% 11.72MiB / 7.388GiB 0.15% 2.99GB / 372MB 9.4MB / 0B 9
In this output, the NET I/O column shows 2.99GB / 372MB.
What time period does that reflect? One hour, or the container's entire lifetime?
And how can I check a docker container's network traffic for a specific hour or minute?
I would appreciate any other advice as well.
Thank you.
This blog explains the network I/O of the docker stats command:
It displays the total bytes received (RX) and transmitted (TX).
If you need finer-grained access, the blog also suggests using the network pseudo-files on your host system.
$ CONTAINER_PID=`docker inspect -f '{{ .State.Pid }}' $CONTAINER_ID`
$ cat /proc/$CONTAINER_PID/net/dev
To your second part: I'm not aware of any built-in method to get the traffic over a specific period; others might correct me. I think the easiest solution is to poll one of the two interfaces and calculate the differences yourself.
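The polling approach could be sketched like this: take two samples of the container's /proc/&lt;pid&gt;/net/dev some interval apart and subtract. The parse_dev helper and the inline sample strings are illustrative, not a Docker API:

```python
def parse_dev(text, iface="eth0"):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev contents."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(iface + ":"):
            # After "iface:", field 0 is RX bytes and field 8 is TX bytes.
            fields = line.split(":", 1)[1].split()
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")

# Two hypothetical samples taken some interval apart
# (in practice: read /proc/$CONTAINER_PID/net/dev twice).
sample_t0 = "eth0: 1000 10 0 0 0 0 0 0  500 5 0 0 0 0 0 0"
sample_t1 = "eth0: 4000 40 0 0 0 0 0 0 2500 25 0 0 0 0 0 0"

rx0, tx0 = parse_dev(sample_t0)
rx1, tx1 = parse_dev(sample_t1)
print(rx1 - rx0, tx1 - tx0)  # bytes received / sent during the interval
```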
I would like to run two containers with the following resource allocation:
Container "C1": reserved cpu1, shared cpu2 with 20 cpu-shares
Container "C2": reserved cpu3, shared cpu2 with 80 cpu-shares
If I run the two containers in this way:
docker run -d --name='C1' --cpu-shares=20 --cpuset-cpus="1,2" progrium/stress --cpu 2
docker run -d --name='C2' --cpu-shares=80 --cpuset-cpus="2,3" progrium/stress --cpu 2
I found that C1 takes 100% of cpu1 as expected but 50% of cpu2 (instead of 20%), and C2 takes 100% of cpu3 as expected and 50% of cpu2 (instead of 80%).
It looks like the --cpu-shares option is ignored.
Is there a way to obtain the behavior I'm looking for?
docker run mentions that parameter as:
--cpu-shares=0 CPU shares (relative weight)
And contrib/completion/zsh/_docker#L452 includes:
"($help)--cpu-shares=[CPU shares (relative weight)]:CPU shares:(0 10 100 200 500 800 1000)"
So those values are not %-based.
The OP mentions --cpu-shares=20/80 works with the following Cpuset constraints:
docker run -ti --cpuset-cpus="0,1" C1 # instead of 1,2
docker run -ti --cpuset-cpus="3,4" C2 # instead of 2,3
(those values are validated/checked only since docker 1.9.1 with PR 16159)
Note: there is also CPU quota constraint:
The --cpu-quota flag limits the container’s CPU usage. The default 0 value allows the container to take 100% of a CPU resource (1 CPU).
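Since shares are only a relative weight applied under contention, a hard 20%/80% split can be approximated with CFS quotas instead. A hedged sketch for the shared CPU only; it does not cover the reserved cpu1/cpu3 part, because the quota applies per container across all its CPUs:

```shell
# Hypothetical: pin both containers to cpu2 and cap their CPU time via
# CFS quota (20 ms vs 80 ms out of every 100 ms period).
docker run -d --name C1 --cpuset-cpus="2" --cpu-period=100000 --cpu-quota=20000 progrium/stress --cpu 1
docker run -d --name C2 --cpuset-cpus="2" --cpu-period=100000 --cpu-quota=80000 progrium/stress --cpu 1
```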