Testing minimal docker containers with healthcheck - docker

I have 5 containers running one after another. The first 3 (ABC) are very minimal. The ABC containers need to be health checked, but curl and wget cannot be run on them, so currently I just run test: ["CMD-SHELL", "whoami || exit 1"] in docker-compose.yml, which seems to bring them to a healthy state. The other 2 (DE), which depend on ABC being healthy, are checked using test: ["CMD-SHELL", "curl --fail http://localhost"]. My question is: how can I properly check the health of those minimal containers without using curl, wget, etc.?

If you can live with a TCP connection test to your internal service's port, you could use /dev/tcp for this:
HEALTHCHECK CMD bash -c 'echo -n > /dev/tcp/127.0.0.1/<port>'
Like this:
# PASS (webserver is serving on 8099)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/8099
root@ab7470ea0c8b:/app# echo $?
0
# FAIL (webserver is NOT serving on 9000)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/9000
bash: connect: Connection refused
bash: /dev/tcp/127.0.0.1/9000: Connection refused
root@ab7470ea0c8b:/app# echo $?
1
Unfortunately, I think this is the best that can be done without installing curl or wget.
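As a minimal sketch, the same probe can be wrapped in a small function so that a hung connection cannot stall the health check. The name tcp_check and the 3-second limit are assumptions for illustration, and the sketch presumes that bash and the coreutils timeout command exist in the image:

```shell
# Hypothetical helper (not part of the original answer): succeeds only if a
# TCP connection to host:port can be opened within 3 seconds.
tcp_check() {
    local host="$1" port="$2"
    # bash opens the socket through its /dev/tcp pseudo-device; the socket is
    # closed when the child bash exits. stderr is discarded to keep it quiet.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: report the state of the port used in the transcript above.
if tcp_check 127.0.0.1 8099; then
    echo "port 8099 is open"
else
    echo "port 8099 is closed"
fi
```

In a compose file the function body could be used directly as the CMD-SHELL test, with the port replaced by the one your service listens on.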

Related

Cannot get docker healthcheck to work with ECS Fargate v 1.4.0

I have a health check defined for my ECS Fargate Service, it works when I test locally and works with Fargate v 1.3.0.
But when I change to Fargate platform version 1.4.0 it always turns unhealthy, even though the actual service is working: I can access the service on the container's public IP.
The health check is defined as:
"CMD-SHELL", "curl --fail http://localhost || exit 1"
So we looked into this and there's an issue in platform version 1.4 where a false negative occurs if the health check outputs anything to stderr. We will, obviously, fix this, but in the meantime you can work around it by (in this case) running curl in silent mode or simply redirecting stderr to /dev/null:
curl -s --fail http://localhost || exit 1
or
curl --fail http://localhost 2>/dev/null || exit 1
Should unblock you for now.
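The workaround boils down to two rules: emit nothing on stderr and return only 0 or 1. A minimal sketch of a wrapper that enforces both for any check command; the name quiet_check is hypothetical and not part of ECS or Docker:

```shell
quiet_check() {
    # Run the given command with all output discarded, then collapse any
    # non-zero exit code to 1.
    "$@" >/dev/null 2>&1 || return 1
}

# Stand-in for a noisy, failing check: writes to stderr and exits with 7.
quiet_check sh -c 'echo "noise" >&2; exit 7'
echo "exit code: $?"   # prints "exit code: 1"
```

With curl this would read quiet_check curl --fail http://localhost, which behaves like the second variant above.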
I wanted to collate some answers together and build on them, as follows.
I'm not being funny, but first and foremost make sure you have a healthcheck endpoint running somewhere. Note that this doesn't have to be inside your container! Let me show you what I mean:
curl -s --fail -I https://127.0.0.1:8000/ || exit 1
will only pass if you have a HTTP server running on localhost port 8000 (etc.). This can be anything that returns a 200 - over to you.
Tips:
Make sure curl is installed inside the container
-s is for silent
--fail makes curl exit with a non-zero code when the server returns an HTTP error (status 400 or above)
-I header only
If localhost doesn't work try 127.0.0.1
Now, in my case I was not running an HTTP server but rather a long-running Python script. In its error state the script exits with 1 (which terminates the task), but otherwise (after a long time) it exits with 0. To pass the healthcheck, the healthcheck call must also return 0 (otherwise there is a 1 and the task is again terminated*). [*Exit codes > 1 can be converted to a 1; see the stolen trick below.]
So I had to fake a different endpoint with the same behaviour.
Step forward, Google.
curl -s --fail -I https://www.google.com || exit 1
As before, but now hit an external endpoint kindly provided. Note the || exit 1, which converts any non-zero exit code to the 1 liked by the healthcheck.
Sorry to "state the bleeding obvious", but you really do need a function running here - don't run curl on a local endpoint and expect to get a healthy status!
Remember to expose the https / http ports 443 / 80 in your docker file and in the JSON task definition spec/through the console UI.
TIP! Note that the CMD-SHELL syntax is slightly different depending.
Putting it all together, for ECS Fargate the rest is correct.
You could also try an echo rather than a curl. I am unclear whether a point-to-point call is even required.

Cannot connect to Docker container running in VSTS

I have a test which starts a Docker container, performs the verification (which is talking to the Apache httpd in the Docker container), and then stops the Docker container.
When I run this test locally, this test runs just fine. But when it runs on hosted VSTS, thus a hosted build agent, it cannot connect to the Apache httpd in the Docker container.
This is the .vsts-ci.yml file:
queue: Hosted Linux Preview
steps:
- script: |
    ./test.sh
This is the test.sh shell script to reproduce the problem:
#!/bin/bash
set -e
set -o pipefail
function tearDown {
    docker stop test-apache
    docker rm test-apache
}
trap tearDown EXIT
docker run -d --name test-apache -p 8083:80 httpd
sleep 10
curl -D - http://localhost:8083/
When I run this test locally, the output that I get is:
$ ./test.sh
469d50447ebc01775d94e8bed65b8310f4d9c7689ad41b2da8111fd57f27cb38
HTTP/1.1 200 OK
Date: Tue, 04 Sep 2018 12:00:17 GMT
Server: Apache/2.4.34 (Unix)
Last-Modified: Mon, 11 Jun 2007 18:53:14 GMT
ETag: "2d-432a5e4a73a80"
Accept-Ranges: bytes
Content-Length: 45
Content-Type: text/html
<html><body><h1>It works!</h1></body></html>
test-apache
test-apache
This output is exactly as I expect.
But when I run this test on VSTS, the output that I get is (irrelevant parts replaced with …).
2018-09-04T12:01:23.7909911Z ##[section]Starting: CmdLine
2018-09-04T12:01:23.8044456Z ==============================================================================
2018-09-04T12:01:23.8061703Z Task : Command Line
2018-09-04T12:01:23.8077837Z Description : Run a command line script using cmd.exe on Windows and bash on macOS and Linux.
2018-09-04T12:01:23.8095370Z Version : 2.136.0
2018-09-04T12:01:23.8111699Z Author : Microsoft Corporation
2018-09-04T12:01:23.8128664Z Help : [More Information](https://go.microsoft.com/fwlink/?LinkID=613735)
2018-09-04T12:01:23.8146694Z ==============================================================================
2018-09-04T12:01:26.3345330Z Generating script.
2018-09-04T12:01:26.3392080Z Script contents:
2018-09-04T12:01:26.3409635Z ./test.sh
2018-09-04T12:01:26.3574923Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/02476800-8a7e-4e22-8715-c3f706e3679f.sh
2018-09-04T12:01:27.7054918Z Unable to find image 'httpd:latest' locally
2018-09-04T12:01:30.5555851Z latest: Pulling from library/httpd
2018-09-04T12:01:31.4312351Z d660b1f15b9b: Pulling fs layer
[…]
2018-09-04T12:01:49.1468474Z e86a7f31d4e7506d34e3b854c2a55646eaa4dcc731edc711af2cc934c44da2f9
2018-09-04T12:02:00.2563446Z % Total % Received % Xferd Average Speed Time Time Time Current
2018-09-04T12:02:00.2583211Z Dload Upload Total Spent Left Speed
2018-09-04T12:02:00.2595905Z
2018-09-04T12:02:00.2613320Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8083: Connection refused
2018-09-04T12:02:00.7027822Z test-apache
2018-09-04T12:02:00.7642313Z test-apache
2018-09-04T12:02:00.7826541Z ##[error]Bash exited with code '7'.
2018-09-04T12:02:00.7989841Z ##[section]Finishing: CmdLine
The key thing is this:
curl: (7) Failed to connect to localhost port 8083: Connection refused
10 seconds should be enough for apache to start.
Why can curl not communicate with Apache on its port 8083?
P.S.:
I know that a hard-coded port like this is rubbish and that I should use an ephemeral port instead. I wanted to get it running first with a hard-coded port, because that's simpler than using an ephemeral port, and then switch to an ephemeral port as soon as the hard-coded port works. And if the hard-coded port didn't work because the port was unavailable, the error should look different: in that case, docker run should fail because the port can't be allocated.
Update:
Just to be sure, I've rerun the test with sleep 100 instead of sleep 10. The results are unchanged, curl cannot connect to localhost port 8083.
Update 2:
When extending the script to execute docker logs, docker logs shows that Apache is running as expected.
When extending the script to execute docker ps, it shows the following output:
2018-09-05T00:02:24.1310783Z CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2018-09-05T00:02:24.1336263Z 3f59aa014216 httpd "httpd-foreground" About a minute ago Up About a minute 0.0.0.0:8083->80/tcp test-apache
2018-09-05T00:02:24.1357782Z 850bda64f847 microsoft/vsts-agent:ubuntu-16.04-docker-17.12.0-ce-standard "/home/vsts/agents/2…" 2 minutes ago Up 2 minutes musing_booth
The problem is that the VSTS build agent runs in a Docker container. When the Docker container for Apache is started, it runs on the same level as the VSTS build agent Docker container, not nested inside the VSTS build agent Docker container.
There are two possible solutions:
Replacing localhost with the ip address of the docker host, keeping the port number 8083
Replacing localhost with the ip address of the docker container, changing the host port number 8083 to the container port number 80.
Access via the Docker Host
In this case, the solution is to replace localhost with the ip address of the docker host. The following shell snippet can do that:
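host=localhost
# Inside a Docker container, the cgroup path of PID 1 starts with /docker/
if grep '^1:name=systemd:/docker/' /proc/1/cgroup
then
    # net-tools provides the route command; -y avoids the interactive prompt
    apt-get update
    apt-get install -y net-tools
    # The gateway of the default route (destination 0.0.0.0) is the Docker host
    host=$(route -n | grep '^0.0.0.0' | sed -e 's/^0.0.0.0\s*//' -e 's/ .*//')
fi
curl -D - "http://$host:8083/"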
The if grep '^1:name=systemd:/docker/' /proc/1/cgroup inspects whether the script is running inside a Docker container. If so, it installs net-tools to get access to the route command, and then parses the default gw from the route command to get the ip address of the host. Note that this only works if the container's network default gw actually is the host.
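The grep and sed pipeline can also be written as a single awk filter. A sketch, assuming the usual route -n column layout (destination first, gateway second); the function name default_gateway and the sample addresses are made up for illustration:

```shell
default_gateway() {
    # Read `route -n` style output on stdin and print the gateway of the
    # first default route (destination 0.0.0.0).
    awk '$1 == "0.0.0.0" { print $2; exit }'
}

# Sample input with illustrative addresses; in the real script this would be
#   host=$(route -n | default_gateway)
default_gateway <<'EOF'
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
EOF
# prints 172.17.0.1
```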
Direct Access to the Docker Container
After launching the docker container, its ip addresses can be obtained with the following command:
docker container inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' <container-id>
Replace <container-id> with your container id or name.
So, in this case, it would be (assuming that the first ip address is okay):
ips=($(docker container inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' test-apache))
host=${ips[0]}
curl http://$host/

Starting a service during Docker build

I would like to start a service during the Docker build. I do not necessarily need this service to continue running after the build process has finished (I know I can use the CMD instruction for that); however, I do need it running long enough to execute a second command which relies on this service being up and running.
To be more precise I am trying to write a Dockerfile for the ejabberd XMPP Server, which also installs a module for this server. I am trying to start the ejabberd server with ejabberdctl start and then install the module with the ejabberdctl module_install utility, which depends on the node being up and running. It looks like this:
RUN ejabberdctl start && ejabberdctl modules_update_specs && ejabberdctl module_install ejabberd_auth_http
Now I have run into a problem, and I came up with two possible causes. The problem is that my build does not work from this line on, because the node is down when the second command is trying to execute. I get the following error, which is a typical one when you try to use the ejabberdctl utility, without the node actually being up:
Failed RPC connection to the node ejabberd@localhost
The command '/bin/sh -c ejabberdctl start && ejabberdctl modules_update_specs && ejabberdctl module_install ejabberd_auth_http' returned a non-zero code: 3
This could be either because starting the service takes a little longer than it takes for the second command to get executed, so the second command runs into a node which is just starting up. I am not sure how likely this is. The second cause could be that starting a service which depends on init.d just doesn't work in Docker during the build process.
I built the container up to the line that causes the problem, entered the container and executed the commands manually, and everything worked as it should.
So to summarize I would like to start the ejabberd server during the build and then use its control utility to install some stuff. A last option would be to install the module manually without the server running, however I would prefer doing it with the ejabberdctl control utility.
These *ctl programs usually come in with a few utilities to start / stop / monitor the status of the service.
In your case I think the best idea is to have a simple bash script you can run at build time that does this:
start ejabberd
monitor the status at intervals
if the status of the process is up, run your command
Look at this:
root@158479dec020:/# ejabberdctl status
Failed RPC connection to the node ejabberd@158479dec020: nodedown
root@158479dec020:/# echo $?
3
root@158479dec020:/# ejabberdctl start
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl status
The node ejabberd@158479dec020 is started with status: started
ejabberd 16.01 is running in that node
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl stop
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl status
Failed RPC connection to the node ejabberd@158479dec020: nodedown
root@158479dec020:/# echo $?
3
So this tells us that if you run ejabberdctl status and the daemon is not running you receive exit code 3, and 0 if it's up and running.
There you go with your bash script:
function run() {
    ejabberdctl start # Repeating just in case...
    ejabberdctl status &>/dev/null
    if [ $? -eq 0 ]; then
        echo "Do some magic here, ejabberd is running..."
        exit 0
    fi
    echo "Ejabberd still down..."
}
while true; do run; sleep 1; done
And this is what you'd get at the CLI:
root@158479dec020:/# ./check.sh
Ejabberd still down...
Do some magic here, ejabberd is running...
root@158479dec020:/# ejabberdctl stop
root@158479dec020:/# ./check.sh
Ejabberd still down...
Ejabberd still down...
Do some magic here, ejabberd is running...
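The loop in check.sh polls forever, which would hang a docker build if the node never comes up. A bounded variant, as a sketch: wait_until, the attempt count and the delay are illustrative names and values, not ejabberd tooling:

```shell
wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # Succeed as soon as the probe command exits with 0.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # The probe never succeeded within the allowed attempts.
    return 1
}

# Demo with stand-in probes; in the build it might look like
#   ejabberdctl start && wait_until 30 1 ejabberdctl status \
#       && ejabberdctl module_install ejabberd_auth_http
wait_until 3 0 true && echo "probe succeeded"
wait_until 3 0 false || echo "gave up after 3 attempts"
```

Because the function returns 1 on timeout, a RUN line that chains it with && fails the build instead of blocking it.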

How to know if my program is completely started inside my docker with compose

In my CI chain I execute end-to-end tests after a "docker-compose up". Unfortunately my tests often fail because even if the containers are properly started, the programs contained in my containers are not.
Is there an elegant way to verify that my setup is completely started before running my tests ?
You could poll the required services to confirm they are responding before running the tests.
curl has inbuilt retry logic or it's fairly trivial to build retry logic around some other type of service test.
#!/bin/bash
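await(){
    local url=${1}
    local seconds=${2:-30}
    # --retry-connrefused (curl 7.52.0+) also retries while the port is not yet listening
    curl --max-time 5 --retry 60 --retry-delay 1 \
        --retry-connrefused --retry-max-time "${seconds}" "${url}" \
        || exit 1
}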
docker-compose up -d
await http://container_ms1:3000
await http://container_ms2:3000
run-ze-tests
The alternate to polling is an event based system.
All your services could push notifications to an external service: scaeda gave the example of a log file, or you could use something like Amazon SNS. Your services emit a "started" event, and you subscribe to those events and run whatever you need once everything has started.
Docker 1.12 did add the HEALTHCHECK build command. Maybe this is available via Docker Events?
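If the images define a HEALTHCHECK, one way to use it from a CI script is to poll the status that Docker records for the container. This is a polling sketch rather than an event-based solution; the function name wait_healthy, the default limits and the container name web are assumptions:

```shell
wait_healthy() {
    local container="$1" attempts="${2:-30}" delay="${3:-1}"
    local i=0 status
    while [ "$i" -lt "$attempts" ]; do
        # Health.Status is "starting", "healthy" or "unhealthy"; the lookup
        # fails (leaving status empty) if the container has no health check.
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (container name is a placeholder):
#   docker-compose up -d && wait_healthy web 60 1 && run-ze-tests
```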
If you have control over the docker engine in your CI setup you could execute docker logs [Container_Name] and read out the last line which could be emitted by your application.
RESULT=$(docker logs [Container_Name] 2>&1 | grep [Search_String])
logs output example:
Agent pid 13
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
parse specific line:
RESULT=$(docker logs ssh_jenkins_test 2>&1 | grep Enter)
result:
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)

Inform me when site (server) is online again

When I ping one site it returns "Request timed out". I want to make little program that will inform me (sound beep or something like that) when this server is online again. No matter in which language. I think it should be very simple script with a several lines of code. So how to write it?
Some implementations of ping allow you to specify conditions for exiting after receipt of packets:
On Mac OS X, use ping -a -o $the_host
ping will keep trying (by default)
-a means beep when a packet is received
-o means exit when a packet is received
On Linux (Ubuntu at least), use ping -a -c 1 -w inf $the_host
-a means beep when a packet is received
-c 1 specifies the number of packets to send before exit (in this case 1)
-w inf specifies the deadline for when ping exits no matter what (in this case Infinite)
when -c and -w are used together, -c becomes number of packets received before exit
Either can be chained to perform your next command, e.g. to ssh into the server as soon as it comes up (with a gap between to allow sshd to actually start up):
# ping -a -o $the_host && sleep 3 && ssh $the_host
Don't forget the notify sound, like echo "^G"!
Just to be different - here's Windows batch:
C:\> more pingnotify.bat
:AGAIN
ping -n 1 %1%
IF ERRORLEVEL 1 GOTO AGAIN
sndrec32 /play /close "C:\Windows\Media\Notify.wav"
C:\> pingnotify.bat localhost
:)
One way is to run ping in a loop, e.g.
while ! ping -c 1 host; do sleep 1; done
(You can redirect the output to /dev/null if you want to keep it quiet.)
On some systems, such as Mac OS X, ping may also have the options -a -o (as per another answer) available which will cause it to keep pinging until a response is received. However, the ping on many (most?) Linux systems does not have the -o option and the kind of equivalent -c 1 -w 0 still exits if the network returns an error.
Edit: If the host does not respond to ping or you need to check the availability of service on a certain port, you can use netcat in the zero I/O mode:
while ! nc -w 5 -z host port; do sleep 1; done
The -w 5 specifies a 5 second timeout for each individual attempt. Note that with netcat you can even list multiple ports (or port ranges) to scan, to detect when some of them become available.
Edit 2: The loops shown above keep trying until the host (or port) is reached. Add your alert command after them, e.g. beep or pop-up a window.
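Putting the loop and the alert together, as a sketch: alert_when_up is a hypothetical name, and it assumes an nc that supports -z and -w as used above. printf '\a' emits the terminal bell character (the ^G mentioned in the other answer):

```shell
alert_when_up() {
    local host="$1" port="$2"
    # Retry once per second until the port accepts a connection.
    until nc -w 5 -z "$host" "$port" >/dev/null 2>&1; do
        sleep 1
    done
    # \a is the BEL character, which most terminals render as a beep.
    printf '\a%s:%s is reachable\n' "$host" "$port"
}

# Usage (host and port are placeholders):
#   alert_when_up example.org 22 && ssh example.org
```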
