How can I maximize throughput in Docker and Akka HTTP?

I am building a specific jig for performance measurement. I have a load generator, boom (https://github.com/rakyll/boom). With this I can generate a pretty decent amount of load.
I also have a Docker image containing nginx as a load balancer, and two Akka-HTTP based REST servers. These do nothing except count hits (they always just return 200).
Running one of these servers stand-alone (outside Docker) I have been able to get 1000 hits/second. Not sure if that's good or not. In this Docker configuration that figure drops to about 220 hits/second. I was kinda expecting, well... 2000 hits/second or thereabouts. Higher would be even better. I'd be happy if I could find a way to get 3-4K hits/sec with this arrangement.
I often get an error message like this:
[9549] Get http://192.168.99.100:9090/dispatcher?reply_to=foo: dial tcp 192.168.99.100:9090: socket: too many open files
I tried running my Docker container with --ulimit nofile=2048, but that didn't help.
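(A note for anyone reproducing this: the --ulimit flag accepts either a single value or a soft:hard pair, so a more generous setting looks something like the line below; the values and image name are purely illustrative. Also, the dial error above is emitted by boom's Go HTTP client, so the file-descriptor limit on the load-generating host, e.g. ulimit -n 65536, may be the one that actually matters.)
docker run --ulimit nofile=65536:65536 -p 9090:9090 my-image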
My application.conf for Akka is merely:
akka {
  loglevel = "ERROR"
  stdout-loglevel = "ERROR"
  http.host-connection-pool.max-open-requests = 512
}
The server code:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.concurrent.duration._

object Main extends App {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()
  import system.dispatcher // ExecutionContext for scheduleOnce

  println(":: Starting Simulator on port " + args(0))
  Http().bindAndHandle(route, java.net.InetAddress.getLoopbackAddress.getHostAddress, args(0).toInt)

  var hits = 0
  var isTiming = false
  var numSec = 1

  lazy val route =
    get {
      path("dispatcher") {
        if (isTiming) hits += 1
        complete(StatusCodes.OK)
      } ~
      path("startTiming" / IntNumber) { sec =>
        isTiming = true
        hits = 0
        numSec = sec
        val timeUnit = FiniteDuration(sec, SECONDS)
        system.scheduler.scheduleOnce(timeUnit) { isTiming = false }
        complete(StatusCodes.OK)
      } ~
      path("tps") {
        val tps = hits / numSec * 2
        complete(s"${args(0)}: TPS-$tps\n") // single-quoted interpolator so \n is a real newline
      }
    }
}
Theory of operation: start traffic flowing, then call the /startTiming/10 endpoint on one of the two servers (for a 10-second capture). After 10 seconds, call /tps a couple of times and the timing node will return the approximate hits/second (×2, since only one of the two servers is counting).
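For concreteness, a capture run looks something like this (host and port illustrative; boom keeps driving /dispatcher through nginx the whole time):
curl http://192.168.99.100:8080/startTiming/10
# ... keep the load running for at least 10 seconds ...
curl http://192.168.99.100:8080/tps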
Any idea how I can get more performance out of this?

Related

Kafka sink to InfluxDB

I'm trying to get data from my kafka topic into InfluxDB using the Confluent/Kafka stack. At the moment, the messages in the topic have a form of {"tag1":"123","tag2":"456"} (I have relatively good control over the message format, I chose the JSON to be as above, could include a timestamp etc if necessary).
Ideally, I would like to add many tags without needing to specify a schema/column names in the future.
I followed https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html (the "Schemaless JSON tags example") as this matches my use case quite closely. The "key" of each message is currently just the MQTT topic name (the topic's source is an MQTT connector). So I set "key.converter" to StringConverter (instead of the JsonConverter used in the example).
Other examples I've seen online seem to suggest the need for a schema to be set, which I'd like to avoid. I'm using InfluxDB v1.8, with everything running on Docker and maintained in Portainer.
I cannot seem to start the connector, and I never get any data to move across.
Below is the config for my InfluxDBSink Connector:
{
  "name": "InfluxDBSinkKafka",
  "config": {
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "name": "InfluxDBSinkKafka",
    "connector.class": "io.confluent.influxdb.InfluxDBSinkConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "topics": "KAFKATOPIC1",
    "influxdb.url": "http://URL:PORT",
    "influxdb.db": "tagdata",
    "measurement.name.format": "${topic}"
  }
}
The connector fails, and each time I click "start" (the play button) the following pops up in the connect container's logs:
[2022-03-22 15:46:52,562] INFO [Worker clientId=connect-1, groupId=compose-connect-group]
Connector InfluxDBSinkKafka target state change (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2022-03-22 15:46:52,562] INFO Setting connector InfluxDBSinkKafka state to STARTED (org.apache.kafka.connect.runtime.Worker)
[2022-03-22 15:46:52,562] INFO SinkConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.influxdb.InfluxDBSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = InfluxDBSinkKafka
predicates = []
tasks.max = 1
topics = [KAFKATOPIC1]
topics.regex =
transforms = []
value.converter = class org.apache.kafka.connect.json.JsonConverter
(org.apache.kafka.connect.runtime.SinkConnectorConfig)
[2022-03-22 15:46:52,563] INFO EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.influxdb.InfluxDBSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = InfluxDBSinkKafka
predicates = []
tasks.max = 1
topics = [KAFKATOPIC1]
topics.regex =
transforms = []
value.converter = class org.apache.kafka.connect.json.JsonConverter
(org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig)
I am feeling a little out of my depth and would appreciate any and all help.
The trick here is getting the data into Kafka in the right format in the first place. My MQTT source stream needed its value converter set to ByteArray, with a schema URL and schema=true. The Influx sink then started working once I used the JsonConverter with schema=false. This is deceptive because the messages in the topic look the same whichever value converter the MQTT source connector uses, so it took a while to figure out that this was the problem.
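For reference, a minimal sketch of the two converter settings in question (all other properties as in the configs above; treat this as illustrative rather than a drop-in config):
MQTT source connector:
  "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
InfluxDB sink connector:
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false"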
After getting this working, and realising that the Confluent stack was perhaps overkill for this task, I went the (much) easier route of pushing MQTT directly to Telegraf and having Telegraf push into InfluxDB. I would recommend this.

Bandwidth Speed Test Slower Than Expected

I'm trying to create a bandwidth test in Dart using a LibreSpeed server, but for some reason it's reporting a much slower speed than expected. When I test using the web client for LibreSpeed I get around 30mb/s download bandwidth, however with my Dart program I'm only getting around 4mb/s.
The code for the speed test is as follows:
// client (an HttpClient) and _serverAddress are fields of the enclosing class
Future<void> start() async {
  var rand = Random();
  var req = await client.getUrl(Uri.http(_serverAddress, '/garbage.php',
      {'r': rand.nextDouble().toString(), 'ckSize': '20'}));
  var resp = await req.close();
  var bytesDownloaded = 0;
  var start = DateTime.now();
  await for (var bytes in resp) {
    bytesDownloaded += bytes.length;
  }
  var timeTaken = DateTime.now().difference(start).inSeconds;
  var mbsDownloaded = bytesDownloaded / 1000000;
  print(
      '$mbsDownloaded megabytes downloaded in $timeTaken seconds at a rate of ${mbsDownloaded / timeTaken} mbs per second');
}
I think I'm probably not understanding a crucial reason as to why it appears to be so slow compared to the web client. Can anyone give me any ideas as to what the bottleneck might be?
The problem is confusion between the units used when measuring internet speed. In general terms there are two ways to measure it:
Mbps (megabits per second)
MB/s (megabytes per second)
To understand your problem, we need to notice that 1 byte = 8 bits, and that the unit LibreSpeed (like most internet providers) uses is Mbps.
The unit your current program measures in is MB/s, since you are counting the length of the byte lists (each element of a list is 1 byte = 8 bits):
await for (var bytes in resp) {
  bytesDownloaded += bytes.length;
}
and you never multiply that number by 8 later in your code. You are then comparing this number against the number from LibreSpeed, which uses Mbps (the same as most internet providers), which means your number is 8 times smaller than expected. Indeed, 4 MB/s × 8 = 32 Mbps, which lines up with the ~30 Mbps the web client reports.
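A minimal sketch of the corrected calculation, in the context of the code above (also switching to milliseconds for the elapsed time, since inSeconds truncates and can even be zero for a fast transfer):
// after the await-for loop:
var elapsedSeconds = DateTime.now().difference(start).inMilliseconds / 1000;
var megabits = bytesDownloaded * 8 / 1000000; // bytes -> bits, then to megabits
print('Download speed: ${megabits / elapsedSeconds} Mbps');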

cpuPercent metric from docker stats vs cgroups

I am new to cgroups and am trying to get container stats using cgroups. Previously I was using docker stats, but I am trying to gather similar metrics with cgroups as well.
In docker stats, the cpu stats section looks like this:
"cpu_usage": {
  "total_usage": 27120642519,
  "percpu_usage": [27120642519],
  "usage_in_kernelmode": 4550000000,
  "usage_in_usermode": 19140000000
},
"system_cpu_usage": 42803030000000,
And, the cpu % metric is calculated using the below equation:
cpuDelta = float64(v.CpuStats.CpuUsage.TotalUsage - previousCPU)
systemDelta = float64(v.CpuStats.SystemUsage - previousSystem)
cpuPct = cpuDelta/systemDelta
I am looking at cgroups to gather the systemUsage and totalUsage figures, but it does not seem to have similar metrics:
cgroups has a pseudo-file, cpuacct.stat, which has user and system ticks, but these match only usage_in_usermode and usage_in_kernelmode from the docker stats output;
and the cpuacct.usage_percpu pseudo-file has a usage per CPU, which matches total_usage from the docker stats output above.
$cat cpuacct.stat
user 1914
system 455
$cat cpuacct.usage_percpu
27120642519
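(The tick values line up with the nanosecond fields via the kernel tick length: assuming the usual USER_HZ of 100, one tick is 10 ms, so user 1914 ticks × 10,000,000 ns/tick = 19,140,000,000 ns = usage_in_usermode above, and likewise system 455 ticks = 4,550,000,000 ns = usage_in_kernelmode.)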
But I could not find any way to gather "systemUsage" from cgroups.
Any leads will be of great help!
Thanks!
The answer to your question doesn't lie in cgroups. Please refer to the points below:
func calculateCPUPercentUnix(previousCPU, previousSystem uint64, v *types.StatsJSON) float64 {
    var (
        cpuPercent = 0.0
        // calculate the change for the cpu usage of the container in between readings
        cpuDelta = float64(v.CPUStats.CPUUsage.TotalUsage) - float64(previousCPU)
        // calculate the change for the entire system between readings
        systemDelta = float64(v.CPUStats.SystemUsage) - float64(previousSystem)
    )
    if systemDelta > 0.0 && cpuDelta > 0.0 {
        cpuPercent = (cpuDelta / systemDelta) * float64(len(v.CPUStats.CPUUsage.PercpuUsage)) * 100.0
    }
    return cpuPercent
}
The "system_cpu_usage" of the Docker stats API refers to the CPU usage of the host.
The "cpu_usage" > "total_usage" of the Docker stats API refers to the per-CPU usage of the container.
Hence after calculation of the (cpuDelta/systemDelta) we get the per-CPU usage per system CPU.
Now we need to multiply the result of the step 3 and the total number of CPU allocated to the docker container to get the total CPU usage per system CPU.
The result of step 4 when multiplied by 100 gives us the CPU utilization in percentage.
Back to the question: how is the system CPU usage calculated by Docker?
To calculate the system CPU usage, Docker uses /proc/stat, as defined by POSIX. It looks for the aggregate cpu statistics line, sums up the first seven fields provided, and converts the total from clock ticks to nanoseconds. The Go code that performs these steps is below.
https://github.com/rancher/docker/blob/3f1b16e236ad4626e02f3da4643023454d7dbb3f/daemon/stats_collector_unix.go#L137
// getSystemCPUUsage returns the host system's cpu usage in
// nanoseconds. An error is returned if the format of the underlying
// file does not match.
//
// Uses /proc/stat defined by POSIX. Looks for the cpu
// statistics line and then sums up the first seven fields
// provided. See `man 5 proc` for details on specific field
// information.
func (s *statsCollector) getSystemCPUUsage() (uint64, error) {
    var line string
    f, err := os.Open("/proc/stat")
    if err != nil {
        return 0, err
    }
    defer func() {
        s.bufReader.Reset(nil)
        f.Close()
    }()
    s.bufReader.Reset(f)
    err = nil
    for err == nil {
        line, err = s.bufReader.ReadString('\n')
        if err != nil {
            break
        }
        parts := strings.Fields(line)
        switch parts[0] {
        case "cpu":
            if len(parts) < 8 {
                return 0, derr.ErrorCodeBadCPUFields
            }
            var totalClockTicks uint64
            for _, i := range parts[1:8] {
                v, err := strconv.ParseUint(i, 10, 64)
                if err != nil {
                    return 0, derr.ErrorCodeBadCPUInt.WithArgs(i, err)
                }
                totalClockTicks += v
            }
            return (totalClockTicks * nanoSecondsPerSecond) /
                s.clockTicksPerSecond, nil
        }
    }
    return 0, derr.ErrorCodeBadStatFormat
}
Please match the "system_cpu_usage" of the docker stats API with the output of the command below to confirm (note the loop must run over fields 2 through 8 to cover all seven values on the cpu line):
cat /proc/stat | grep -w cpu | awk '{sum=0; for (i=2; i<=8; i++) sum+=$i} END {print sum}'
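One caveat, as the Go code above shows: system_cpu_usage is in nanoseconds, while /proc/stat reports clock ticks, so the tick sum needs scaling before the numbers will match. A sketch of the conversion (assuming getconf CLK_TCK reports the kernel tick rate, typically 100):
ticks=$(cat /proc/stat | grep -w cpu | awk '{sum=0; for (i=2; i<=8; i++) sum+=$i} END {print sum}')
echo $((ticks * 1000000000 / $(getconf CLK_TCK)))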

Get Docker Container CPU Usage as Percentage

Docker provides an interactive stats command, docker stats [cid], which gives up-to-date information on the CPU usage, like so:
CONTAINER   CPU %   MEM USAGE/LIMIT       MEM %   NET I/O
36e8a65d    0.03%   4.086 MiB/7.798 GiB   0.05%   281.3 MiB/288.3 MiB
I'm trying to get the CPU usage as a percentage in a digestible format to do some analysis.
I've seen the stats in /sys/fs, which seem to provide values similar to the Docker Remote API, which gives me this JSON blob:
{
  "cpu_usage": {
    "usage_in_usermode": 345230000000,
    "total_usage": 430576697133,
    "percpu_usage": [
      112999686856,
      106377031910,
      113291361597,
      97908616770
    ],
    "usage_in_kernelmode": 80670000000
  },
  "system_cpu_usage": 440576670000000,
  "throttling_data": {
    "throttled_time": 0,
    "periods": 0,
    "throttled_periods": 0
  }
}
But I'm unsure how to get an exact CPU Usage as a percentage from that.
Any ideas?
If you are going to use the Stats API call, you can take a look at how the Docker client does it: https://github.com/docker/docker/blob/eb131c5383db8cac633919f82abad86c99bffbe5/cli/command/container/stats_helpers.go#L175-L188
func calculateCPUPercent(previousCPU, previousSystem uint64, v *types.StatsJSON) float64 {
    var (
        cpuPercent = 0.0
        // calculate the change for the cpu usage of the container in between readings
        cpuDelta = float64(v.CPUStats.CPUUsage.TotalUsage) - float64(previousCPU)
        // calculate the change for the entire system between readings
        systemDelta = float64(v.CPUStats.SystemUsage) - float64(previousSystem)
    )
    if systemDelta > 0.0 && cpuDelta > 0.0 {
        cpuPercent = (cpuDelta / systemDelta) * float64(len(v.CPUStats.CPUUsage.PercpuUsage)) * 100.0
    }
    return cpuPercent
}
Basically, you take a point of reference, then see the difference in, say, 10 secs; you can then tell how much of the time was used by the container. Say we start with 0 SystemCPUUsage and 0 CPUUsage for the container. If after 10 secs we have 10 SystemCPUUsage and 1 CPUUsage, then we have 10% usage. The API just gives you the results in nanoseconds rather than seconds. The actual time does not matter; the total SystemCPUUsage change is what matters, and you compare CPUUsage to that.
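As a worked example, applying the formula to the JSON blob above with the previous readings taken as zero: cpuDelta = 430576697133 ns, systemDelta = 440576670000000 ns, and percpu_usage has 4 entries, so cpuPercent = 430576697133 / 440576670000000 × 4 × 100 ≈ 0.39%.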
After we consume the remote API we get these fields: precpu_stats and cpu_stats.
Then, basically, here is the code (JavaScript example):
var res; // <-- remote API response
var cpuDelta = res.cpu_stats.cpu_usage.total_usage - res.precpu_stats.cpu_usage.total_usage;
var systemDelta = res.cpu_stats.system_cpu_usage - res.precpu_stats.system_cpu_usage;
var RESULT_CPU_USAGE = cpuDelta / systemDelta * 100;
Just to clarify RESULT_CPU_USAGE: it is the share of your physical hardware being consumed, so if you get a RESULT_CPU_USAGE of 50%, it means that 50% of your machine's total CPU power is being used by container X.
I needed this as well, and the following gives me the correct CPU usage, factoring in the number of cores.
var cpuDelta = metric.cpu_stats.cpu_usage.total_usage - metric.precpu_stats.cpu_usage.total_usage;
var systemDelta = metric.cpu_stats.system_cpu_usage - metric.precpu_stats.system_cpu_usage;
var RESULT_CPU_USAGE = cpuDelta / systemDelta * metric.cpu_stats.cpu_usage.percpu_usage.length * 100;
console.log(RESULT_CPU_USAGE);

Console Print Speed

I’ve been looking at a few example programs in order to find better ways to code with Dart.
Not that this example (below) is of any particular importance; however, it is taken from rosettacode.org, with alterations by me to (hopefully) bring it up to date.
The point of this posting is with regard to benchmarks, and to what may be detrimental to Dart's results in some benchmarks: the speed of printing to the console compared to other languages. I don't know what the comparison to other languages is, but in Dart the console output (at least on Windows) appears to be quite slow, even using StringBuffer.
As an aside, in my test, if n1 is allowed to grow to 11, the total recursion count = >238 million, and it takes (on my laptop) c. 2.9 seconds to run Example 1.
In addition, of possible interest, if the String assignment is altered to int, without printing, no time is recorded as elapsed (Example 2).
Typical times on my low-spec laptop (run from the Console - Windows).
Elapsed Microseconds (Print) = 26002
Elapsed Microseconds (StringBuffer) = 9000
Elapsed Microseconds (no Printing) = 3000
Obviously in this case, console print times are a significant factor relative to computation etc. times.
So, can anyone advise how this compares to eg. Java times for console output? That would at least be an indication as to whether Dart is particularly slow in this area, which may be relevant to some Benchmarks. Incidentally, times when running in the Dart Editor incur a negligible penalty for printing.
// Example 1. The base code for the test (Ackermann).
main() {
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
    }
  }
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1 : n2 == 0 ?
    fAcker(m2 - 1, 1) : fAcker(m2 - 1, fAcker(m2, n2 - 1));
The altered code for the test.
// Example 2 //
main() {
  fRunAcker(1); // print
  fRunAcker(2); // StringBuffer
  fRunAcker(3); // no printing
}

void fRunAcker(int iType) {
  String sResult;
  StringBuffer sb1;
  Stopwatch oStopwatch = new Stopwatch();
  oStopwatch.start();
  List lType = ["Print", "StringBuffer", "no Printing"];
  if (iType == 2) // use StringBuffer
    sb1 = new StringBuffer();
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      if (iType == 1) // print
        print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
      if (iType == 2) // StringBuffer
        sb1.write("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n");
      if (iType == 3) // no printing
        sResult = "Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n";
    }
  }
  if (iType == 2)
    print(sb1.toString());
  oStopwatch.stop();
  print("Elapsed Microseconds (${lType[iType-1]}) = " +
      "${oStopwatch.elapsedMicroseconds}");
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1 : n2 == 0 ?
    fAcker(m2 - 1, 1) : fAcker(m2 - 1, fAcker(m2, n2 - 1));
//Typical times on my low-spec laptop (run from the console).
// Elapsed Microseconds (Print) = 26002
// Elapsed Microseconds (StringBuffer) = 9000
// Elapsed Microseconds (no Printing) = 3000
I tested using Java, which was an interesting exercise.
The results from this small test indicate that Dart takes about 60% longer for the console output than Java, using the results from the fastest for each. I really need to do a larger test with more terminal output, which I will do.
In terms of "computational" speed with no output, using this test with m = 3 and n = 10, the comparison is consistently around 530 milliseconds for Java versus 580 milliseconds for Dart. That is 59.5 million calls. Java bombs out with n = 11 (238 million calls), which I presume is a stack overflow. I'm not saying this is a definitive benchmark of much, but it is an indication of something. Dart appears to be very close in computational time, which is pleasing to see. I also altered the Dart code from the conditional (?:) operator to "if" statements, the same as Java, and that appeared to be consistently a bit faster, by c. 10% or more.
I ran a further test for console printing as shown below (example 1 – Dart), (Example 2 – Java).
The best times for each are as follows (100,000 iterations) :
Dart 47 seconds.
Java 22 seconds.
Dart Editor 2.3 seconds.
While it is not earth-shattering, it does appear to illustrate that, for some reason, (a) Dart is slow with console output, (b) the Dart Editor is extremely fast with console output, and (c) this needs to be taken into account when evaluating any performance test that involves console output, which is what initially drew my attention to it.
Perhaps when they have time :) the Dart team could look at this if it is considered worthwhile.
Example 1 - Dart
// Dart - Test 100,000 iterations of console output //
Stopwatch oTimer = new Stopwatch();

main() {
  // "warm-up"
  for (int i1 = 0; i1 < 20000; i1++) {
    print("The quick brown fox chased ...");
  }
  oTimer.reset();
  oTimer.start();
  for (int i2 = 0; i2 < 100000; i2++) {
    print("The quick brown fox chased ....");
  }
  oTimer.stop();
  print("Elapsed time = ${oTimer.elapsedMicroseconds/1000} milliseconds");
}
Example 2 - Java
public class console001
{
    // Java - Test 100,000 iterations of console output
    public static void main(String[] args)
    {
        // warm-up
        for (int i1 = 0; i1 < 20000; i1++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmStart = System.nanoTime();
        for (int i2 = 0; i2 < 100000; i2++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmEnd = System.nanoTime() - tmStart;
        System.out.println("Time elapsed in microseconds = " + (tmEnd / 1000));
    }
}
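One common mitigation, sketched below for Dart (untested here, and it simply extends the StringBuffer variant above): accumulate the output and hand it to stdout in a single write, instead of 100,000 separate print calls.

import 'dart:io';

main() {
  var sb = new StringBuffer();
  for (int i = 0; i < 100000; i++) {
    sb.writeln("The quick brown fox jumped ....");
  }
  stdout.write(sb.toString()); // one write to the console instead of per-line prints
}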
