cpuPercent metric from docker stats vs cgroups - docker

I am new to cgroups and am trying to get container stats from them. Previously I was using docker stats, but I am now trying to gather similar metrics with cgroups as well.
In docker stats, the CPU stats section looks like this:
"cpu_usage": {
    "total_usage": 27120642519,
    "percpu_usage": [27120642519],
    "usage_in_kernelmode": 4550000000,
    "usage_in_usermode": 19140000000
},
"system_cpu_usage": 42803030000000,
And, the cpu % metric is calculated using the below equation:
cpuDelta = float64(v.CpuStats.CpuUsage.TotalUsage - previousCPU)
systemDelta = float64(v.CpuStats.SystemUsage - previousSystem)
cpuPct = cpuDelta/systemDelta
I am looking at cgroups to gather systemUsage and totalUsage, but it does not seem to have similar metrics:
cgroups has a pseudo file cpuacct.stat with user and system ticks, but these only match usage_in_usermode and usage_in_kernelmode from the docker stats output.
The cpuacct.usage_percpu pseudo file has the usage per CPU, which matches total_usage from the docker stats output above.
$cat cpuacct.stat
user 1914
system 455
$cat cpuacct.usage_percpu
But I could not find any way to gather "systemUsage" from cgroups.
Any leads will be of great help!

The answer to your question doesn't lie in cgroups. Please refer to the points below:
func calculateCPUPercentUnix(previousCPU, previousSystem uint64, v *types.StatsJSON) float64 {
	var (
		cpuPercent = 0.0
		// calculate the change for the cpu usage of the container in between readings
		cpuDelta = float64(v.CPUStats.CPUUsage.TotalUsage) - float64(previousCPU)
		// calculate the change for the entire system between readings
		systemDelta = float64(v.CPUStats.SystemUsage) - float64(previousSystem)
	)

	if systemDelta > 0.0 && cpuDelta > 0.0 {
		cpuPercent = (cpuDelta / systemDelta) * float64(len(v.CPUStats.CPUUsage.PercpuUsage)) * 100.0
	}
	return cpuPercent
}
1. The "system_cpu_usage" of the Docker stats API is the cumulative CPU time of the whole host, summed over all CPUs.
2. The "cpu_usage" > "total_usage" of the Docker stats API is the cumulative CPU time consumed by the container, also summed over all the CPUs it ran on.
3. Hence cpuDelta/systemDelta is the fraction of the host's total CPU capacity that the container used between the two readings.
4. Multiplying the result of step 3 by the number of CPUs (the length of percpu_usage) rescales that fraction so that 1.0 corresponds to one fully used CPU.
5. The result of step 4 multiplied by 100 gives the CPU utilization as a percentage, so a container saturating two CPUs reports 200%.
Back to the question:
How does Docker calculate the system CPU usage?
To calculate the system CPU usage, Docker reads the host's /proc/stat, looks for the aggregate "cpu" statistics line, and sums the first seven fields. The sum is then converted from clock ticks to nanoseconds. The Go code that performs these steps is shown below.
// getSystemCPUUsage returns the host system's cpu usage in
// nanoseconds. An error is returned if the format of the underlying
// file does not match.
//
// Uses /proc/stat defined by POSIX. Looks for the cpu
// statistics line and then sums up the first seven fields
// provided. See `man 5 proc` for details on specific field
// information.
func (s *statsCollector) getSystemCPUUsage() (uint64, error) {
	var line string
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, err
	}
	defer func() {
		s.bufReader.Reset(nil)
		f.Close()
	}()
	s.bufReader.Reset(f)
	err = nil
	for err == nil {
		line, err = s.bufReader.ReadString('\n')
		if err != nil {
			break
		}
		parts := strings.Fields(line)
		switch parts[0] {
		case "cpu":
			if len(parts) < 8 {
				return 0, derr.ErrorCodeBadCPUFields
			}
			var totalClockTicks uint64
			for _, i := range parts[1:8] {
				v, err := strconv.ParseUint(i, 10, 64)
				if err != nil {
					return 0, derr.ErrorCodeBadCPUInt.WithArgs(i, err)
				}
				totalClockTicks += v
			}
			return (totalClockTicks * nanoSecondsPerSecond) /
				s.clockTicksPerSecond, nil
		}
	}
	return 0, derr.ErrorCodeBadStatFormat
}
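To confirm, compare the "system_cpu_usage" of the docker stats API with the output of the command below. The command prints clock ticks, so multiply its result by 1e9 and divide by the tick rate (getconf CLK_TCK, usually 100) to get nanoseconds:
grep -w cpu /proc/stat | awk '{sum=0; for(i=2;i<=8;i++) sum+=$i} END{print sum}'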
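The same host-side reading can be reproduced outside Docker. The following is a minimal sketch, not Docker's own code: the names sumCPUTicks and hostCPUNanos are made up for illustration, and the tick rate of 100 per second is an assumption that should be checked with getconf CLK_TCK on the target machine.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const (
	// Assumed USER_HZ value; verify with `getconf CLK_TCK`.
	clockTicksPerSecond  = 100
	nanoSecondsPerSecond = 1000000000
)

// sumCPUTicks adds the first seven counters of the aggregate "cpu" line
// of /proc/stat (user, nice, system, idle, iowait, irq, softirq).
func sumCPUTicks(line string) (uint64, error) {
	parts := strings.Fields(line)
	if len(parts) < 8 || parts[0] != "cpu" {
		return 0, errors.New("not an aggregate cpu line with seven counters")
	}
	var total uint64
	for _, field := range parts[1:8] {
		v, err := strconv.ParseUint(field, 10, 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, nil
}

// hostCPUNanos scans the file at path (normally /proc/stat) for the
// aggregate cpu line and converts its tick sum to nanoseconds.
func hostCPUNanos(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.HasPrefix(scanner.Text(), "cpu ") {
			ticks, err := sumCPUTicks(scanner.Text())
			if err != nil {
				return 0, err
			}
			return ticks * nanoSecondsPerSecond / clockTicksPerSecond, nil
		}
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("no aggregate cpu line found")
}

func main() {
	ns, err := hostCPUNanos("/proc/stat")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ns)
}
```

Taking two such readings, and two readings of the container's total CPU time (on cgroup v1 the cpuacct.usage file reports it in nanoseconds), gives the systemDelta and cpuDelta used in the formula above.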


How to find the byte size of a table in Lua

I'm writing a log aggregator and I want to send the logs once the batch reaches a max byte size. Is there a way in Lua to find the size of a variable (here, the size of active_batch)?
local batch = {
  batch_to_execute = {},
  active_batch = { entries = {}, count = 0, retries = 0 }
}
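Lua can only report the total memory it is using, via collectgarbage("count") (the result is in kilobytes); it cannot report the size of a single table.
In this case I think keeping a running sum of the string lengths (#str) of the entries will work.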

How can I maximize throughput in Docker and Akka HTTP?

I am building a specific jig for performance measurement. I have a load generator, boom (https://github.com/rakyll/boom). With this I can generate a pretty decent amount of load.
I also have a Docker image containing nginx as a load balancer, and two Akka-HTTP based REST servers. These do nothing except count hits (they always just return 200).
Running one of these servers stand-alone (outside the Docker) I have been able to get 1000 hits/second. Not sure if that's good or not. In this Docker configuration that figure drops to about 220 hits/second. I was kinda expecting, well... 2000 hits/second or thereabouts. Higher would even be better. I'd be happy if I can find a way to get 3-4K hits/sec with this arrangement.
I often get an error message like this:
[9549] Get dial tcp socket: too many open files
Tried running my Docker with --ulimit nofile=2048, but that didn't help. My application.conf for Akka is merely:
akka {
  loglevel = "ERROR"
  stdout-loglevel = "ERROR"
  http.host-connection-pool.max-open-requests = 512
}
The server code:
object Main extends App {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()

  println(":: Starting Simulator on port "+args(0))

  Http().bindAndHandle(route, java.net.InetAddress.getLoopbackAddress.getHostAddress, args(0).toInt)

  var hits = 0
  var isTiming = false
  var numSec = 1

  lazy val route =
    get {
      path("dispatcher") {
        if(isTiming) hits += 1
        complete(StatusCodes.OK) // the servers always just return 200
      } ~
      path("startTiming" / IntNumber) { sec =>
        isTiming = true
        hits = 0
        numSec = sec
        val timeUnit = FiniteDuration(sec, SECONDS)
        system.scheduler.scheduleOnce(timeUnit){ isTiming = false }
        complete(StatusCodes.OK)
      } ~
      path("tps") {
        val tps = hits/numSec * 2
        complete(s"""${args(0)}: TPS-$tps\n""")
      }
    }
}
Theory of operation: Start traffic flowing then call the /startTiming/10 endpoint (for a 10-second capture on one of the 2 servers). After 10 seconds, call /tps a couple of times and the timing node will return approx. hits/second (x2).
Any idea how I can get more performance out of this?

Continuous memory increase when declaring very large array and iterating over stdin

The following code declares two arrays, and then iterates over stdin ( just blindly iterates over the file - no interaction with the arrays ).
This is causing continuous increase in memory.
However, if I just declare two arrays and sleep - there is no increase in memory.
Similarly, if I just iterate over stdin - there is no increase in memory.
But together ( apart from the memory allocated for the arrays) there is a continuous increase.
I measure this by looking at the RES memory using top tool.
I have commented out the first few lines in func doSomething() to show that there is no memory increase when it is commented. Uncommenting the lines and running will cause an increase.
NOTE: This was run on go 1.4.2, 1.5.3 and 1.6
NOTE: You will need to recreate this on a machine with at least 16GB RAM as I have observed it only on the array size of 1 billion.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

type MyStruct struct {
	arr1 []int
	arr2 []int
}

func (ms *MyStruct) Init(size int, arr1 []int, arr2 []int) error {
	fmt.Printf("initializing mystruct arr1...\n")
	ms.arr1 = arr1
	if ms.arr1 == nil {
		ms.arr1 = make([]int, size, size)
	}
	fmt.Printf("initializing mystruct arr2...\n")
	ms.arr2 = arr2
	if ms.arr2 == nil {
		ms.arr2 = make([]int, size, size)
	}
	fmt.Printf("done initializing ...\n")
	for i := 0; i < size; i++ {
		ms.arr1[i] = 0
		ms.arr2[i] = 0
	}
	return nil
}

func doSomething() error {
	ms := &MyStruct{}
	size := 1000000000
	ms.Init(size, nil, nil)
	fmt.Printf("finished allocating..%d %d\n", len(ms.arr1), len(ms.arr2))

	fmt.Printf("reading from stdin...\n")
	reader := bufio.NewReader(os.Stdin)
	var line string
	var readErr error
	var lineNo int = 0
	for {
		if lineNo%1000000 == 0 {
			fmt.Printf("read %d lines...\n", lineNo)
		}
		lineNo++
		line, readErr = reader.ReadString('\n')
		if readErr != nil {
			fmt.Printf("break at %s\n", line)
			break
		}
	}
	if readErr == io.EOF {
		readErr = nil
	}
	if readErr != nil {
		return readErr
	}
	return nil
}

func main() {
	if err := doSomething(); err != nil {
		panic(err)
	}
}
Is this an issue with my code? Or is the Go runtime doing something unintended?
If it's the latter, how can I go about debugging this?
To make it easier to replicate here are pastebin files for good case ( commented portion of the above code) and bad case ( with uncommented portion )
wget http://pastebin.com/raw/QfG22xXk -O badcase.go
yes "1234567890" | go run badcase.go
wget http://pastebin.com/raw/G9xS2fKy -O goodcase.go
yes "1234567890" | go run goodcase.go
Thank you Volker for your above comments. I wanted to capture the process of debugging this as an answer.
The RES top / htop just tells you at a process level what is going on with memory. GODEBUG="gctrace=1" gives you more insight into how the memory is being handled.
A simple run with gctrace set gives the following
root@localhost ~ # yes "12345678901234567890123456789012" | GODEBUG="gctrace=1" go run badcase.go
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.050s 0%: 0.19+0.23+0.068 ms clock, 0.58+0.016/0.16/0.25+0.20 ms cpu, 7629->7629->7629 MB, 7630 MB goal, 8 P
done initializing ...
gc 2 @0.100s 0%: 0.070+2515+0.23 ms clock, 0.49+0.025/0.096/0.24+1.6 ms cpu, 15258->15258->15258 MB, 15259 MB goal, 8 P
finished allocating..1000000000 1000000000
reading from stdin...
read 0 lines...
gc 3 @2.620s 0%: 0.009+0.32+0.23 ms clock, 0.072+0/0.20/0.11+1.8 ms cpu, 15259->15259->15258 MB, 30517 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
read 51000000 lines...
read 52000000 lines...
read 53000000 lines...
read 54000000 lines...
What does this mean ?
As you can see, the GC hasn't run for a while now. This means that the garbage generated by reader.ReadString hasn't been collected and freed.
Why isn't the garbage collector collecting this garbage ?
From the Go blog post on the garbage collector:
Instead we provide a single knob, called GOGC. This value controls
the total size of the heap relative to the size of reachable objects.
The default value of 100 means that total heap size is now 100% bigger
than (i.e., twice) the size of the reachable objects after the last
collection.
Since GOGC wasn't set, the default of 100 applied. The two arrays already occupy about 16 GB of live heap, so the GC triggers only once the heap doubles, i.e. at about 32 GB (the 30517 MB goal in the trace above).
How can I change this ?
Try setting the GOGC=25.
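As a rough check of the numbers in the traces, the heap goal can be approximated as the live heap plus GOGC percent of it. The sketch below is only that approximation (the runtime's exact pacing has more inputs); heapGoalMB is a hypothetical helper, and 15258 MB is the live heap size taken from the gctrace output. It also shows debug.SetGCPercent, which sets the same knob from inside the program.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapGoalMB approximates the heap size at which the next collection
// starts: the live heap after the last collection plus gogc percent of it.
func heapGoalMB(liveMB, gogc float64) float64 {
	return liveMB * (1 + gogc/100)
}

func main() {
	const liveMB = 15258 // live heap reported by gctrace for the two arrays

	fmt.Printf("GOGC=100: about %.1f MB\n", heapGoalMB(liveMB, 100))
	fmt.Printf("GOGC=25:  about %.1f MB\n", heapGoalMB(liveMB, 25))

	// Setting the knob from code instead of the environment.
	// SetGCPercent returns the previous setting.
	previous := debug.SetGCPercent(25)
	fmt.Println("previous GC percent:", previous)
}
```

The two results (30516 MB and 19072.5 MB) are within a megabyte of the 30517 MB and 19073 MB goals that gctrace prints for GOGC=100 and GOGC=25.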
With the GOGC as 25
root@localhost ~ # yes "12345678901234567890123456789012" | GODEBUG="gctrace=1" GOGC=25 go run badcase.go
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.051s 0%: 0.14+0.30+0.11 ms clock, 0.42+0.016/0.31/0.094+0.35 ms cpu, 7629->7629->7629 MB, 7630 MB goal, 8 P
done initializing ...
finished allocating..1000000000 1000000000
gc 2 @0.102s 12%: 0.058+2480+0.26 ms clock, 0.40+0.022/2480/0.10+1.8 ms cpu, 15258->15258->15258 MB, 15259 MB goal, 8 P
reading from stdin...
read 0 lines...
gc 3 @2.584s 12%: 0.009+0.20+0.22 ms clock, 0.075+0/0.24/0.046+1.8 ms cpu, 15259->15259->15258 MB, 19073 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
read 19000000 lines...
read 20000000 lines...
gc 4 @6.539s 4%: 0.019+2.3+0.23 ms clock, 0.15+0/2.1/12+1.8 ms cpu, 17166->17166->15258 MB, 19073 MB goal, 8 P
As you can see, another gc was triggered.
But top/htop shows it stable at ~20 GB instead of the calculated 16 GB.
The garbage collector doesn't have to give memory back to the OS right away. It sometimes keeps freed memory so that it can reuse it efficiently later, instead of repeatedly requesting memory from the OS and handing it back. The extra 4 GB is in its pool of free space, to be used before it asks the OS again.

Avoiding excessive memory allocation in golang when using an io.Writer

I am working on a command line tool in Go called redis-mass that converts a bunch of redis commands into redis protocol format.
The first step was to port the node.js version, almost literally to Go. I used ioutil.ReadFile(inputFileName) to get a string version of the file and then returned an encoded string as output.
When I ran this on a file with 2,000,000 redis commands, it took about 8 seconds, compared to about 16 seconds with the node version. I guessed that the reason it was only twice as fast was because it was reading the whole file into memory first, so I changed my encoding function to accept a pair (raw io.Reader, enc io.Writer), and it looks like this:
func EncodeStream(raw io.Reader, enc io.Writer) {
	var args []string
	var length int

	scanner := bufio.NewScanner(raw)
	for scanner.Scan() {
		command := strings.TrimSpace(scanner.Text())
		args = parse(command)
		length = len(args)
		if length > 0 {
			io.WriteString(enc, fmt.Sprintf("*%d\r\n", length))
			for _, arg := range args {
				io.WriteString(enc, fmt.Sprintf("$%d\r\n%s\r\n", len(arg), arg))
			}
		}
	}
}
However, this took 12 seconds on the 2 million line file, so I used github.com/pkg/profile to see how it was using memory, and it looks like the number of memory allocations is huge:
# Alloc = 3162912
# TotalAlloc = 1248612816
# Mallocs = 46001048
# HeapAlloc = 3162912
Can I constrain the io.Writer to use a fixed sized buffer and avoid all those allocations?
More generally, how can I avoid excessive allocations in this method? Here's the full source for more context
Reduce allocations by working with []byte instead of strings. Use fmt.Fprintf to write directly to the output instead of combining fmt.Sprintf and io.WriteString.
func EncodeStream(raw io.Reader, enc io.Writer) {
	var args []string
	var length int

	scanner := bufio.NewScanner(raw)
	for scanner.Scan() {
		// scanner.Bytes avoids the string allocation made by scanner.Text;
		// parse must be changed to accept a []byte.
		command := bytes.TrimSpace(scanner.Bytes())
		args = parse(command)
		length = len(args)
		if length > 0 {
			fmt.Fprintf(enc, "*%d\r\n", length)
			for _, arg := range args {
				fmt.Fprintf(enc, "$%d\r\n%s\r\n", len(arg), arg)
			}
		}
	}
}

Get Docker Container CPU Usage as Percentage

Docker provides an interactive stats command, docker stats [cid] which gives up to date information on the CPU usage, like so:
36e8a65d 0.03% 4.086 MiB/7.798 GiB 0.05% 281.3 MiB/288.3 MiB
I'm trying to get the CPU usage as a percentage in a digestible format to do some analysis.
I've seen the stats in /sys/fs which seem to provide similar values as the Docker Remote API which gives me this JSON blob:
"cpu_usage": {
    "usage_in_usermode": 345230000000,
    "total_usage": 430576697133,
    "percpu_usage": [
        ...
    ],
    "usage_in_kernelmode": 80670000000
},
"system_cpu_usage": 440576670000000,
"throttling_data": {
    "throttled_time": 0,
    "periods": 0,
    "throttled_periods": 0
}
But I'm unsure how to get an exact CPU Usage as a percentage from that.
Any ideas?
If you are going to use the Stats API call - you can take a look at how the docker client does it: https://github.com/docker/docker/blob/eb131c5383db8cac633919f82abad86c99bffbe5/cli/command/container/stats_helpers.go#L175-L188
func calculateCPUPercent(previousCPU, previousSystem uint64, v *types.StatsJSON) float64 {
	var (
		cpuPercent = 0.0
		// calculate the change for the cpu usage of the container in between readings
		cpuDelta = float64(v.CPUStats.CPUUsage.TotalUsage) - float64(previousCPU)
		// calculate the change for the entire system between readings
		systemDelta = float64(v.CPUStats.SystemUsage) - float64(previousSystem)
	)

	if systemDelta > 0.0 && cpuDelta > 0.0 {
		cpuPercent = (cpuDelta / systemDelta) * float64(len(v.CPUStats.CPUUsage.PercpuUsage)) * 100.0
	}
	return cpuPercent
}
Basically, you take a point of reference, then look at the difference after, say, 10 seconds; that tells you how much of the time was used by the container. Say we start with 0 SystemCPUUsage and 0 CPUUsage for the container. If after 10 seconds we have 10 SystemCPUUsage and 1 CPUUsage, then we have 10% usage. The API just gives you the values in nanoseconds, not seconds. The actual elapsed time does not matter; what matters is the total change in SystemCPUUsage, which you then compare the change in CPUUsage against.
After we consume the remote api we get these fields: precpu_stats/cpu_stats
Then, basically here is the code: (javascript example)
var res; // <---- remote API response
var cpuDelta = res.cpu_stats.cpu_usage.total_usage - res.precpu_stats.cpu_usage.total_usage;
var systemDelta = res.cpu_stats.system_cpu_usage - res.precpu_stats.system_cpu_usage;
var RESULT_CPU_USAGE = cpuDelta / systemDelta * 100;
Just to clarify RESULT_CPU_USAGE: it is the share of your physical hardware's CPU that was consumed. So if RESULT_CPU_USAGE is 50%, then 50% of all your machine's CPU capacity is being used by container X.
So I needed this too, and the following gives me the correct CPU usage, factoring in the number of cores.
var cpuDelta = metric.cpu_stats.cpu_usage.total_usage - metric.precpu_stats.cpu_usage.total_usage;
var systemDelta = metric.cpu_stats.system_cpu_usage - metric.precpu_stats.system_cpu_usage;
var RESULT_CPU_USAGE = cpuDelta / systemDelta * metric.cpu_stats.cpu_usage.percpu_usage.length * 100;
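For completeness, here is the same calculation as a self-contained Go sketch that decodes a stats response and applies the formula. The struct types and the cpuPercent name are made up for illustration and cover only the fields used here; the JSON sample is invented test data, not real container output. The online_cpus field is read on the assumption that the API version in use reports it, with the length of percpu_usage as the fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuStats mirrors only the parts of cpu_stats / precpu_stats used below.
type cpuStats struct {
	CPUUsage struct {
		TotalUsage  uint64   `json:"total_usage"`
		PercpuUsage []uint64 `json:"percpu_usage"`
	} `json:"cpu_usage"`
	SystemUsage uint64 `json:"system_cpu_usage"`
	OnlineCPUs  uint32 `json:"online_cpus"` // assumed present on newer API versions
}

type statsSample struct {
	CPUStats    cpuStats `json:"cpu_stats"`
	PreCPUStats cpuStats `json:"precpu_stats"`
}

// cpuPercent returns container CPU usage where 100 means one full CPU.
func cpuPercent(s statsSample) float64 {
	cpuDelta := float64(s.CPUStats.CPUUsage.TotalUsage) - float64(s.PreCPUStats.CPUUsage.TotalUsage)
	systemDelta := float64(s.CPUStats.SystemUsage) - float64(s.PreCPUStats.SystemUsage)
	if systemDelta <= 0 || cpuDelta <= 0 {
		return 0
	}
	cpus := float64(s.CPUStats.OnlineCPUs)
	if cpus == 0 {
		cpus = float64(len(s.CPUStats.CPUUsage.PercpuUsage))
	}
	return cpuDelta / systemDelta * cpus * 100
}

func main() {
	// Invented sample: the container used 1 s of CPU while the
	// two-CPU host accumulated 8 s of CPU time.
	raw := []byte(`{
		"precpu_stats": {"cpu_usage": {"total_usage": 1000000000, "percpu_usage": [600000000, 400000000]}, "system_cpu_usage": 100000000000},
		"cpu_stats": {"cpu_usage": {"total_usage": 2000000000, "percpu_usage": [1100000000, 900000000]}, "system_cpu_usage": 108000000000}
	}`)

	var s statsSample
	if err := json.Unmarshal(raw, &s); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.2f%%\n", cpuPercent(s))
}
```

With this sample the fraction is 1/8 of the host's capacity, which on two CPUs comes out as 25% of one CPU.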
