Apache Bench 'Time per Request' decreases with increasing concurrency

I am testing my web server using Apache Bench and I am getting the following results:
Request: ab -n 1000 -c 20 https://www.my-example.com/
Time per request: 16.264 [ms] (mean, across all concurrent requests)
Request: ab -n 10000 -c 100 https://www.my-example.com/
Time per request: 3.587 [ms] (mean, across all concurrent requests)
Request: ab -n 10000 -c 500 https://www.my-example.com/
Time per request: 1.381 [ms] (mean, across all concurrent requests)
The 'Time per request' is decreasing with increasing concurrency. May I know why? Or is this by any chance a bug?

You should be seeing two values for Time per request: one labelled [ms] (mean) and the other [ms] (mean, across all concurrent requests). A concurrency of 20 means that 20 simultaneous requests were sent in a single go and that concurrency was maintained for the duration of the test. The lower value is total_time_taken/total_number_of_requests, which effectively disregards the concurrency aspect; the other value is closer to the mean response time (the actual response time) of your requests. I generally visualize it as x concurrent requests being sent in a single batch: that value is the mean time it took for a batch of concurrent requests to complete. It will also be closer to your percentiles, which again points to it being the actual time taken by a request.
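As a quick sanity check, you can recover the actual per-request mean from the numbers in your question by multiplying the reported across-all value by the concurrency level (this follows from how ab computes the two figures):

// mean ≈ acrossAll × concurrency, using the three runs from the question
const runs = [
  { concurrency: 20, acrossAll: 16.264 },
  { concurrency: 100, acrossAll: 3.587 },
  { concurrency: 500, acrossAll: 1.381 },
];
for (const { concurrency, acrossAll } of runs) {
  console.log(`c=${concurrency}: ~${(acrossAll * concurrency).toFixed(1)} ms per request`);
}
// c=20: ~325.3 ms, c=100: ~358.7 ms, c=500: ~690.5 ms

So the real per-request latency is rising with concurrency, as you would expect under load; only the across-all figure falls.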

Related

Understand how k6 manages, at a low level, a large number of API calls in a short period of time

I'm new to k6, so I'm sorry if I'm asking something naive. I'm trying to understand how the tool manages network calls under the hood. Does it execute them at the maximum rate it can? Does it queue them based on the System Under Test's response time?
I need to understand this because I'm running a lot of tests using both k6 run and k6 cloud, but I can't make more than ~2,000 requests per second (according to the k6 results). I was wondering whether k6 implements some kind of back-pressure mechanism when it detects that my system is "slow", or whether there is some other reason why I can't overcome that limit.
I read here that it is possible to make 300,000 requests per second and that the cloud environment is already configured for that. I also tried manually configuring my machine, but nothing changed.
For example, the following tests are identical; the only change is the number of VUs. I ran all tests on k6 cloud.
Shared parameters:
60 API calls (I have a single http.batch with 60 API calls)
Iterations: 100
Executor: per-vu-iterations
Here I got 547 reqs/s:
VUs: 10 (60,000 calls with an avg response time of 108 ms)
Here I got 1,051.67 reqs/s:
VUs: 20 (120,000 calls with an avg response time of 112 ms)
Here I got 1,794.33 reqs/s:
VUs: 40 (240,000 calls with an avg response time of 134 ms)
Here I got 2,060.33 reqs/s:
VUs: 80 (480,000 calls with an avg response time of 238 ms)
Here I got 2,223.33 reqs/s:
VUs: 160 (960,000 calls with an avg response time of 479 ms)
Here I got a peak of 2,102.83 reqs/s:
VUs: 200 (1,081,380 calls with an avg response time of 637 ms) // I hit the max duration here, which is why it stopped
What I was expecting is that if my system couldn't handle that many requests, I would see a lot of timeout errors, but I haven't seen any. What I'm seeing is that all the API calls are executed and no errors are returned. Can anyone help me?
Because k6 (or more specifically, your VUs) executes code synchronously, the throughput you can achieve depends entirely on how quickly the system you're interacting with responds.
Let's take this script as an example:
import http from 'k6/http';

export default function () {
  http.get('https://httpbin.org/delay/1');
}
The endpoint here is purposefully designed to take 1 second to respond. There is no other code in the exported default function. Because each VU will wait for a response (or a timeout) before proceeding past the http.get statement, the maximum amount of throughput for each VU will be a very predictable 1 HTTP request/sec.
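As a rough rule of thumb (my own back-of-the-envelope formula, not something k6 itself reports), the throughput ceiling for sequential requests is the number of VUs divided by the mean response time:

// maxRequestsPerSecond ≈ VUs / meanResponseTimeSeconds (sequential requests)
// e.g. 50 VUs against the 1-second endpoint above:
const vus = 50;               // hypothetical VU count
const meanResponseSec = 1.0;  // httpbin.org/delay/1 takes ~1 s
console.log(vus / meanResponseSec); // ≈ 50 requests per second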
Often, response times (and/or errors, like timeouts) will increase as you increase the number of VUs. You will eventually reach a point where adding VUs does not result in higher throughput. In this situation, you've basically established the maximum throughput the System-Under-Test can handle. It simply can't keep up.
The only situation where that might not be the case is when the system running k6 runs out of hardware resources (usually CPU time). This is something that you must always pay attention to.
If you are using k6 OSS, you can scale to as many VUs (concurrent threads) as your system can handle. You could also use http.batch to fire off multiple requests concurrently within each VU (the statement will still block until all responses have been received). This might be slightly less overhead than spinning up additional VUs.
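For illustration, a minimal http.batch variant of the earlier script might look like the following sketch (the batch size of 10 is arbitrary):

import http from 'k6/http';

export default function () {
  // Build 10 identical requests and fire them concurrently from one VU.
  // http.batch blocks until every response (or timeout) has arrived.
  const requests = [];
  for (let i = 0; i < 10; i++) {
    requests.push(['GET', 'https://httpbin.org/delay/1']);
  }
  http.batch(requests);
}

Note that k6 caps how many requests of a batch run in parallel (the batch and batchPerHost options), so very large batches will still be partially serialized.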

Difference of Dockerized Redis performance between server and local system

I conducted a stress test on my server; the latency for 1,000 requests per second was 1 second. I found out the latency problem was caused by Redis, so I checked dockerized Redis performance (benchmark) on CentOS 7 (CentOS runs in a VMware virtual machine; CPU: 22 cores, RAM: 30 GB) with
redis-benchmark -q -n 100000
which gave the following results:
PING_INLINE: 26420.08 requests per second
PING_BULK: 27389.76 requests per second
SET: 27144.41 requests per second
GET: 26702.27 requests per second
INCR: 27041.64 requests per second
LPUSH: 27203.48 requests per second
RPUSH: 27188.69 requests per second
LPOP: 27005.13 requests per second
RPOP: 27367.27 requests per second
SADD: 26645.35 requests per second
HSET: 26881.72 requests per second
SPOP: 27624.31 requests per second
LPUSH (needed to benchmark LRANGE): 27100.27 requests per second
LRANGE_100 (first 100 elements): 20703.93 requests per second
LRANGE_300 (first 300 elements): 11763.32 requests per second
LRANGE_500 (first 450 elements): 9627.42 requests per second
LRANGE_600 (first 600 elements): 8078.20 requests per second
MSET (10 keys): 26709.40 requests per second
But when I check dockerized Redis performance (benchmark) on my laptop, which runs Ubuntu 19.04 (CPU: Core i3, RAM: 12 GB), with
redis-benchmark -q -n 100000
I get the following results:
PING_INLINE: 117096.02 requests per second
PING_BULK: 126742.72 requests per second
SET: 119904.08 requests per second
GET: 126903.55 requests per second
INCR: 127064.80 requests per second
LPUSH: 111482.72 requests per second
RPUSH: 121359.23 requests per second
LPOP: 112994.35 requests per second
RPOP: 123152.71 requests per second
SADD: 130378.09 requests per second
HSET: 130039.02 requests per second
SPOP: 103199.18 requests per second
LPUSH (needed to benchmark LRANGE): 88809.95 requests per second
LRANGE_100 (first 100 elements): 51046.45 requests per second
LRANGE_300 (first 300 elements): 17853.96 requests per second
LRANGE_500 (first 450 elements): 12784.45 requests per second
LRANGE_600 (first 600 elements): 9744.69 requests per second
MSET (10 keys): 132802.12 requests per second
Why is the performance of my local system so much better than the server's, despite the server's higher hardware capabilities?

What is the best way to performance test an SQS consumer to find the max TPS that one host can handle?

I have an SQS consumer running in EventConsumerService that needs to handle up to 3K TPS successfully, sometimes upwards of 20K TPS (1.2 million messages per minute). For each message processed, I make a REST call to DataService's TCP VIP. I'm trying to perform a load test to find the max TPS that one host can handle in EventConsumerService without overstraining:
Request volume on dependencies, DynamoDB storage, etc
CPU utilization in both EventConsumerService and DataService
Network connections per host
I/O stats due to over-logging
DLQ size must be minimal; currently my DLQ is growing to 500K messages due to 500 Service Unavailable exceptions thrown by DataService, so something must be wrong.
Approximate age of oldest message. I do not want a message sitting in the queue for over X minutes.
Fatals and latency of the REST call to DataService
Active threads
This is how I am performing the performance test:
I set up both my consumer and the other service on one host, because I want to understand the load on both services per host.
I use a TPS generator to fill the SQS queue with a million messages.
The EventConsumerService service is already running in production. Once messages started filling the SQS queue, I could immediately see requests being sent to DataService.
Here are the parameters I am tuning to find messagesPolledPerSecond:
messagesPolledPerSecond = (numberOfHosts * numberOfPollers * messageFetchSize) * (1000/(sleepTimeBetweenPollsPerMs+receiveMessageTimePerMs))
messagesInSurge / messagesPolledPerSecond = ageOfOldestMessageSLA
ageOfOldestMessage + settingsUpdatedLatency < latencySLA
The variables for SqsConsumer which I kept constant are:
numberOfHosts = 1
receiveMessageTimePerMs = 60 ms? It's out of my control
Max thread pool size: 300
All other factors are fair game:
Number of pollers (default 1), I set to 150
Sleep time between polls (default 100 ms), I set to 0 ms
Sleep time when no messages (default 1000 ms), ???
Message fetch size (default 1), I set to 10
However, with the above parameters I am seeing a large number of messages sent to the DLQ due to server errors, so I have clearly set the values too high. This testing methodology seems highly inefficient: I am unable to find the optimal TPS that avoids sending such a tremendous number of messages to the DLQ while also keeping the approximate age of the oldest message low.
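Plugging the chosen values into the formula above shows how aggressive this configuration is (a quick check using the constants listed):

// messagesPolledPerSecond for the values above
const numberOfHosts = 1;
const numberOfPollers = 150;
const messageFetchSize = 10;
const sleepTimeBetweenPollsPerMs = 0;
const receiveMessageTimePerMs = 60; // approximate; out of my control

const messagesPolledPerSecond =
  numberOfHosts * numberOfPollers * messageFetchSize *
  (1000 / (sleepTimeBetweenPollsPerMs + receiveMessageTimePerMs));

console.log(messagesPolledPerSecond); // 25000, far above the 3K TPS target

A poll rate of ~25K messages/second against a dependency sized for ~3K TPS would be consistent with the 500s and the growing DLQ.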
Any guidance on how best I should test is appreciated.

How to write bosun alerts which handle low traffic volumes

If you are writing a bosun alert based on a percentage error rate for requests handled by your system, how do you write it in such a way that it handles periods of low traffic?
For example:
If I have an alert which looks back over the last 5 minutes and works out the error rate for requests,
$errorRate = $numberErr/$numberReq, and then triggers an alarm when the error rate exceeds a predefined threshold, crit = $errorRate > 0.05, this can work quite well as long as every 5-minute period has a sufficiently large number of requests ($numberReq).
If the number of requests in a 5-minute period is 10,000, then 501 errors are required to trigger the alarm. However, if the number of requests in a 5-minute period is only 100, then just 6 errors are required.
How can I write an alert which handles periods where the number of requests is so low that a small number of errors equates to a large error rate? I had considered a sliding window of time, rather than a fixed 5-minute period, where the window would grow until the number of requests was high enough to give some confidence in the alarm, e.g. increase the time period until the number of requests reaches 10,000.
I can't find a way to achieve this in bosun, and I don't want to commit to a longer period for my alerts because the traffic rate varies so much; a longer period during peak traffic would let an actual error cause a much larger impact before the alert fires.
I generally pair any percentage-based and/or historical alert with a static threshold.
For example: crit = $numberErr > 100 && $errorRate > 0.05. That way the percentage part doesn't matter unless the number of errors has also crossed some threshold, because otherwise the entire statement won't be true.
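Put together, a full alert might look roughly like this sketch (the metric names and OpenTSDB-style q() queries are placeholders for whatever your system actually exposes):

alert web.request.errors {
    # Error and request counts over the last 5 minutes (placeholder metrics).
    $numberErr = sum(q("sum:web.errors{host=*}", "5m", ""))
    $numberReq = sum(q("sum:web.requests{host=*}", "5m", ""))
    $errorRate = $numberErr / $numberReq
    # Alarm only when both the absolute count and the rate are high, so a
    # low-traffic window cannot trip the alert on a handful of errors.
    crit = $numberErr > 100 && $errorRate > 0.05
}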

Neo4j node creation speed

I have a fresh neo4j setup on my laptop, and creating new nodes via the REST API seems quite slow (~30-40 ms on average). I've Googled around a bit, but can't find any real benchmarks for how long it "should" take; there's this post, but it only lists relative performance, not absolute performance. Is neo4j inherently limited to adding only ~30 new nodes per second (outside of batch mode), or is there something wrong with my configuration?
Config details:
Neo4j version 2.2.5
Server is on my mid-range 2014 laptop, running Ubuntu 15.04
OpenJDK version 1.8
Calls to the server are also from my laptop (via localhost:7474), so there shouldn't be any network latency involved
I'm calling neo4j via Clojure/Neocons; the method used is create in the namespace clojurewerkz.neocons.rest.nodes
Using Cypher seems to be even slower; e.g. running PROFILE CREATE (you:Person {name: 'Jane Doe'}) RETURN you via the web interface returns "Cypher version: CYPHER 2.2, planner: RULE. 5 total db hits in 54 ms."
Neo4j performance characteristics are a tricky area.
Measuring performance
First of all, a lot depends on how the server is configured. Measuring anything on a laptop is the wrong way to do it.
Before measuring performance, you should check the following:
You have appropriate server hardware (see the requirements)
Client and server are on a local network.
Neo4j is properly configured (memory mapping, web server thread pool, Java heap size, etc.)
The server is properly configured (Linux TCP stack, maximum open files, etc.)
The server is warmed up. Neo4j is written in Java, so you should do an appropriate warm-up before measuring numbers (i.e. apply some load for ~15 minutes).
And one last thing: the Enterprise edition. Neo4j Enterprise has some advanced features that can improve performance a lot (e.g. the HPC cache).
Neo4j internally
Internally, Neo4j consists of:
Storage
Core API
Traversal API
Cypher API
Everything is performed without any additional network requests. The Neo4j server is built on top of this solid foundation.
So, when you make a request to the Neo4j server, you are measuring:
Latency between client and server
JSON serialization costs
The web server (Jetty)
Additional modules for managing locks, transactions, etc.
And Neo4j itself
So the bottom line here is: Neo4j is pretty fast by itself when used in embedded mode, but going through the Neo4j server adds costs.
Numbers
We have done internal Neo4j testing, measuring several cases.
Create nodes
Here we are using vanilla Transactional Cypher REST API.
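For context, a single request of this shape looks roughly like the following sketch (the endpoint path is the Neo4j 2.x transactional Cypher API; the host, label, and 1,000-node batch are illustrative, and any auth header is omitted):

// One transactional request creating a batch of 1,000 nodes.
const payload = {
  statements: [
    { statement: 'UNWIND range(1, 1000) AS i CREATE (:Person {idx: i})' },
  ],
};

fetch('http://localhost:7474/db/data/transaction/commit', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
})
  .then((res) => res.json())
  .then((body) => console.log(body.errors)); // [] on success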
Threads: 2
Nodes per transaction: 1000
Execution time: 1635 s
Total nodes created: 7,000,000
Nodes per second: 7070

Threads: 5
Nodes per transaction: 750
Execution time: 852 s
Total nodes created: 7,000,000
Nodes per second: 8215
Huge database sync
This one uses a custom-developed unmanaged extension, with a binary protocol between server and client and some concurrency.
But it is still the Neo4j server (in fact, a Neo4j cluster).
Node count: 80.32M (80,320,000)
Relationship count: 80.30M (80,300,000)
Property count: 257.78M (257,780,000)
Time consumed: 2142 seconds
Per second:
Nodes - 37497
Relationships - 37488
Properties - 120345
These numbers show Neo4j's true power.
My numbers
I tried measuring performance just now, on a fresh, unconfigured database (2.2.5) on Ubuntu 14.04 (in a VM).
Results:
$ ab -p post_loc.txt -T application/json -c 1 -n 10000 http://localhost:7474/db/data/node
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Jetty(9.2.4.v20141103)
Server Hostname: localhost
Server Port: 7474
Document Path: /db/data/node
Document Length: 1245 bytes
Concurrency Level: 1
Time taken for tests: 14.082 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 14910000 bytes
Total body sent: 1460000
HTML transferred: 12450000 bytes
Requests per second: 710.13 [#/sec] (mean)
Time per request: 1.408 [ms] (mean)
Time per request: 1.408 [ms] (mean, across all concurrent requests)
Transfer rate: 1033.99 [Kbytes/sec] received
101.25 kb/s sent
1135.24 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 19
Processing: 1 1 1.3 1 53
Waiting: 0 1 1.2 1 53
Total: 1 1 1.3 1 54
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 2
98% 3
99% 4
100% 54 (longest request)
This run creates 10,000 nodes via the REST API, with no properties, in one thread.
As you can see, even on my laptop, in a Linux VM, with default settings, Neo4j is able to create nodes in 4 ms or less (99th percentile).
Note: I warmed up the database beforehand (created and deleted 100K nodes).
Bolt
If you are looking for the best Neo4j performance, you should follow Bolt development. Bolt is the new binary protocol for the Neo4j server.
More info: here, here and here.
One other thing to try is running ./bin/neo4j-shell. Since there's no HTTP connection, it can help you understand how much time is Neo4j itself and how much is the HTTP interface.
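For example (prompt shown approximately; the shell prints the execution time after each statement):

$ ./bin/neo4j-shell
neo4j-sh (?)$ CREATE (n:Person {name: 'Jane Doe'});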
When I do that on 2.2.2, my CREATEs are generally around 10 ms.
I'm not sure what the ideal is and if there is configuration which can improve the performance.
