Efficient use of Docker containers for fuzzing - docker

I've been trying out various fuzzers (AFL, Nautilus, KLEE, etc.) on different applications that take a file input, and I was looking into pointing the "out" directory of these fuzzers (e.g. afl-fuzz -i in -o out ./app @@) to some sort of in-memory partition (like ramfs). Is this necessary for these types of fuzzers? I'm concerned about all of the I/O to my disk from reading and writing the files sent to the application.
I came across this answer to a similar question: Running Docker in Memory?
They mentioned that you can use -v to accomplish this. But when I tried to mount the RAM disk into the container with the -v option for the out directory, I saw a significant performance drop in AFL, from ~2000 execs/sec to ~100 execs/sec. I know this is not caused by the RAM disk itself, because using -v without the RAM disk yields the same poor performance. For now I run the fuzzer and copy the contents out after stopping it to keep performance up. Should I be concerned about the hit on my disk?
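For reference, a hedged sketch of what that setup can look like: either bind-mount a host tmpfs into the container with -v, or let Docker create a tmpfs inside the container itself. The image name, paths, and mount size below are placeholders, not anything from the setup described above:
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
docker run -v /mnt/ramdisk:/fuzz/out afl-image afl-fuzz -i /fuzz/in -o /fuzz/out ./app @@
# Or skip the host directory entirely and have Docker manage the tmpfs:
docker run --mount type=tmpfs,destination=/fuzz/out,tmpfs-size=512m afl-image afl-fuzz -i /fuzz/in -o /fuzz/out ./app @@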

Related

/var/lib/docker/containers/* eats my hard disk space

My raspberrypi suddenly had no more free space.
By looking at the folder sizes with the following command:
sudo du -h --max-depth=3
I noticed that a docker folder eats an incredible amount of hard disk space. It's the folder
/var/lib/docker/containers/*
The folder seems to contain data for the currently running Docker containers; the first letters of each folder name correspond to a container ID. One folder was growing dramatically fast. After stopping the affected container and removing it, the related folder disappeared, so it evidently belonged to that container.
Problem solved.
I now wonder what could make this folder grow so much, and what the best way is to avoid running into the same problem again later.
I could write a bash script that removes the related container at boot and starts it again, but better ideas are very welcome.
The container ids are directories, so you can look inside to see what is using space in there. The two main reasons are:
Logs from stdout/stderr. These can be capped with logging options. You can view them with docker logs.
Filesystem changes. The underlying image filesystem is not changed, so any writes trigger a copy-on-write to a directory within each container id. You can view these with docker diff.
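A short sketch of how you might inspect and cap both causes, assuming the default json-file logging driver; the container ID, image name, and size limits are placeholders:
sudo du -h --max-depth=1 /var/lib/docker/containers/<container-id>
docker logs <container-id>     # captured stdout/stderr
docker diff <container-id>     # copy-on-write filesystem changes
# Cap the json-file log so it cannot grow without bound:
docker run --log-opt max-size=10m --log-opt max-file=3 <image>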

How to bypass memory caching while using FIO inside of a docker container?

I am trying to benchmark I/O performance on my host and in a Docker container using the flexible I/O tester (fio) with O_DIRECT enabled in order to bypass memory caching. The result is very suspicious: Docker performs almost 50 times better than my host machine, which should be impossible. It seems like Docker is not bypassing the cache at all, even when I run it in --privileged mode. This is the command I ran inside the container; any suggestions?
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --output-format=json >/home/docker/docker_seqread_4k.json
(Note this isn't really a programming question, so Stack Overflow is the wrong place to ask it... Maybe Super User or Server Fault would be a better choice and get faster answers?)
The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all.
If your best-case latencies are suspiciously small compared to your worst-case latencies, it is highly likely your suspicions are well founded and that kernel caching is still happening. Asking for O_DIRECT is a hint, not an order, and the filesystem can choose to ignore it and use the cache anyway (see the part about "You're asking for direct I/O to a file in a filesystem but...").
If you have the option and you're interested in disk speed, it is better to do any such test outside of a container (with all the caveats that implies). Another option, when you can't or don't want to disable caching, is to ensure that you do I/O that is at least two to three times the size of RAM (both in terms of amount and the region being used), so the majority of the I/O can't be satisfied by buffers/cache (and if you're doing write I/O, then also do something like end_fsync=1).
In summary, the filesystem being used by docker may make it impossible to accurately do what you're requesting (measure the disk speed by bypassing cache while using whatever your default docker filesystem is).
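To make that advice concrete, a write test sized well past RAM and fsynced at the end might look like the following sketch; the 48G size assumes a machine with roughly 16G of RAM, and the target directory is a placeholder:
fio --name=seqwrite --rw=write --ioengine=libaio --bs=1M --size=48G --end_fsync=1 --directory=/bench --group_reporting --output-format=json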
Why a Docker benchmark may not give the results you expect
The Docker engine uses, by default, the OverlayFS [1][2] driver for data storage in containers. It assembles all of the different layers from the image and makes them readable. Writing is always done to the "top" layer, which is the container storage.
When performing reads and writes to the container's filesystem, you're passing through Docker's overlay2 driver, through the OverlayFS kernel driver, through your filesystem driver (e.g. ext4) and onto your block device. Additionally, as Anon mentioned, DIRECT/O_DIRECT is just a hint, and may not be respected by any of the layers you're passing through.
Getting more accurate results
To get accurate benchmarks within a Docker container, you should write to a volume mount or change your storage driver to one that is not overlay-based, such as the Device Mapper driver or the ZFS driver.
Both the Device Mapper driver and the ZFS driver require a dedicated block device (you'll likely need a separate hard drive), so using a volume mount might be the easiest way to do this.
Use a volume mount
Use the -v option with a directory that sits on a block device on your host.
docker run -v /absolute/host/directory:/container_mount_point alpine
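For example, rerunning the earlier fio job against such a mount could look like this sketch; the host path and the image (which is assumed to have fio installed) are placeholders:
docker run --rm -v /mnt/benchmark:/bench my-fio-image fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --size=10G --runtime=600 --directory=/bench --group_reporting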
Use a different Docker storage driver
Note that the storage driver must be changed on the Docker daemon (dockerd) and cannot be set per container. From the documentation:
Important: When you change the storage driver, any existing images and containers become inaccessible. This is because their layers cannot be used by the new storage driver. If you revert your changes, you can access the old images and containers again, but any that you pulled or created using the new driver are then inaccessible.
With that disclaimer out of the way, you can change your storage driver by editing daemon.json and restarting dockerd.
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/sd_",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
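After editing the file (typically /etc/docker/daemon.json), restart the daemon and confirm the driver took effect; the systemd service name assumes a standard Linux install:
sudo systemctl restart docker
docker info --format '{{.Driver}}'   # should now print devicemapper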
Additional container benchmark notes - kernel
If you are trying to compare different flavors of Linux, keep in mind that Docker is still running on your host machine's kernel.

Docker for Mac - sync issue

So I have noticed that on Mac there is a huge problem with file sync while developing a PHP app: it can take up to 60 seconds before a page loads.
Because Docker on Mac runs inside an additional virtual machine, I have used http://docker-sync.io to work around it. But I wonder, are you having similar issues? Yesterday I noticed that there is something called File Sharing in the Docker settings (screenshot omitted). As I've put my code at /Volumes/Documents/wwwdata, do I also have to add that path there?
As the author of docker-sync, I might be able to give you a comprehensive answer.
Under macOS there is currently no solution using the native Docker for Mac tooling that gives a reasonably acceptable development environment, which here means sharing source code into the container during its lifetime.
The main reason is that read and write speeds on mounted volumes in Docker for Mac are extremely slow; see the performance comparison. That said, you could mount a volume using -v or volumes into a normal container, but it will be extremely slow. VirtualBox or Fusion shares are slow for the same reasons; OSXFS currently performs better than those, but is still horribly slow.
Docker-sync avoids the slow read/write speed of OSXFS by synchronizing with unison instead of using a direct mount.
Long story short:
Docker for Mac is still (very) slow; this holds even for High Sierra with APFS, and it is unusable for development purposes.
The "folder" you are looking at, named "images", is nothing more than an OSXFS-based mount into the HyperKit VM, so it is just what has been used all along; you can now simply configure folders other than the defaults to be shared via OSXFS and made available for mounting. So this will not help you either.
To make this answer more balanced towards the general case: you can find alternatives to docker-sync here, and the sheer number of alternatives also tells you that there is (still) a huge issue in Docker for Mac; it is not something docker-sync made up.
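For completeness, one of those alternatives is simply relaxing the osxfs consistency guarantees on the bind mount itself with the cached or delegated flags; whether that is fast enough depends on the workload, and the paths and image below are placeholders:
docker run -v /Volumes/Documents/wwwdata:/var/www/html:cached php:7-apache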

Using memory mapped files with docker and alpine

We have a Java application that was recently ported to Docker. The application uses memory-mapped files. We have observed a huge performance degradation after this change and are trying to diagnose exactly why it happened.
Our previous setup consisted of CentOS 6.8, Java 8, and files stored on the same filesystem the application ran from. Our new setup consists of Docker 17.03, CentOS 7.4, openjdk:8u131-alpine, and a read-only volume mounted into the container that holds the files used for memory mapping.
Using iostat we have seen that the tps is several times higher in the Docker setup than in the non-Docker setup. We are not sure whether this is because the OS is loading parts of the files into memory more often. Using YourKit and VisualVM we could see that there are memory issues in the Docker setup and that, after some time, the application runs out of memory. This is probably because resources are being consumed somewhere else and the application cannot properly handle all the incoming load.
In addition, we would like to understand whether the memory-mapped files consume memory inside or outside the container, since depending on that we will reserve more or less memory for the container itself.
Also, any suggestion for getting better insight into the root cause of this issue is appreciated.
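For the inside-or-outside question specifically, one way to see where those pages are being charged is to look at the container's memory cgroup; this sketch assumes cgroup v1 with Docker's default cgroupfs layout, and the container name is a placeholder:
CID=$(docker inspect --format '{{.Id}}' my_container)
# cache / rss / mapped_file show how much page cache and mmapped data is charged to the container:
grep -E '^(cache|rss|mapped_file) ' /sys/fs/cgroup/memory/docker/$CID/memory.stat
docker stats --no-stream my_container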

Running Docker in Memory?

As far as I understand, Docker uses memory-mapped files to start from an image. Since I can do this over and over again, and as far as I remember start different instances of the same image in parallel, I guess Docker abstracts the filesystem and stores changes somewhere else.
I wonder if docker can be configured (or does it by default) to run in a memory only mode without some sort of a temporary file?
Docker uses a union filesystem that allows it to work in "layers" (devicemapper, BTRFS, etc). It's doing copy-on-write so that starting new containers is cheap, and when it performs the first write, it actually creates a new layer.
When you start a container from an image, you are not using memory-mapped files to restore a frozen process (unless you built all of that into the image yourself...). Rather, you're starting a normal Unix process but inside a sandbox where it can only see its own unionfs filesystem.
Starting many copies of an image where no copy writes to disk is generally cheap and fast. But if you have a process with a long start-up time, you'll still pay that cost for every instance.
As for running Docker containers wholly in memory, you could create a RAM disk and specify that as Docker's storage location (configurable, but typically located under /var/lib/docker).
In typical use-cases, I would not expect this to be a useful performance tweak. First, you'll spend a lot of memory holding files you won't access: the base layer of an image contains most Linux system files, and if you pull 10 images from Docker Hub you'll probably hit 20G worth of images easily (after that the storage cost tends to plateau). Second, the system already manages memory and swapping pretty well (which is why a RAM disk is only a modest performance tweak), and you get all of that applied to processes running inside a container. Third, for most of the cases where a RAM disk might help, you can use the -v flag to mount that disk as a volume on the container rather than needing to store your whole unionfs there.
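If you do want to experiment with it anyway, one sketch is to point the daemon's data root at a tmpfs; the size and paths are assumptions, and everything stored there disappears on reboot:
sudo systemctl stop docker
sudo mkdir -p /mnt/docker-ram
sudo mount -t tmpfs -o size=8g tmpfs /mnt/docker-ram
# /etc/docker/daemon.json: { "data-root": "/mnt/docker-ram" }
sudo systemctl start docker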
