I would like to start a service during the Docker build. I do not necessarily need this service to keep running after the build process has finished (I know I can use the CMD instruction for that), but I do need it running long enough to execute a second command which relies on this service being up and running.
To be more precise, I am trying to write a Dockerfile for the ejabberd XMPP server which also installs a module for this server. I am trying to start the ejabberd server with ejabberdctl start and then install the module with the ejabberdctl module_install utility, which depends on the node being up and running. It looks like this:
RUN ejabberdctl start && ejabberdctl modules_update_specs && ejabberdctl module_install ejabberd_auth_http
Now I have run into a problem, and I came up with two possible causes. The problem is that my build does not work from this line on, because the node is down when the second command tries to execute. I get the following error, which is the typical one you get when you use the ejabberdctl utility without the node actually being up:
Failed RPC connection to the node ejabberd@localhost
The command '/bin/sh -c ejabberdctl start && ejabberdctl modules_update_specs && ejabberdctl module_install ejabberd_auth_http' returned a non-zero code: 3
The first possible cause is that starting the service takes a little longer than it takes for the second command to execute, so the second command hits a node that is still starting up; I'm not sure how likely this is. The second possible cause is that starting a service which depends on init.d simply doesn't work in Docker during the build process.
I built the image up to the line that causes the problem, entered the container, executed the commands manually, and everything worked as it should.
So, to summarize: I would like to start the ejabberd server during the build and then use its control utility to install some modules. A last resort would be to install the module manually without the server running, but I would prefer doing it with the ejabberdctl utility.
These *ctl programs usually come with a few utilities to start / stop / monitor the status of the service.
In your case I think the best idea is to have a simple bash script you can run at build time that does the following:
start ejabberd
monitor the status at intervals
if the status of the process is up, run your command
Look at this:
root@158479dec020:/# ejabberdctl status
Failed RPC connection to the node ejabberd@158479dec020: nodedown
root@158479dec020:/# echo $?
3
root@158479dec020:/# ejabberdctl start
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl status
The node ejabberd@158479dec020 is started with status: started
ejabberd 16.01 is running in that node
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl stop
root@158479dec020:/# echo $?
0
root@158479dec020:/# ejabberdctl status
Failed RPC connection to the node ejabberd@158479dec020: nodedown
root@158479dec020:/# echo $?
3
So this tells us that if you run ejabberdctl status while the daemon is not running you receive exit code 3, and 0 if it's up and running.
There you go with your bash script:
#!/bin/bash
run() {
    ejabberdctl start                 # Repeating just in case...
    if ejabberdctl status &>/dev/null; then
        echo "Do some magic here, ejabberd is running..."
        exit 0
    fi
    echo "Ejabberd still down..."
}
# Poll once per second until the node is up
while true; do run; sleep 1; done
And this is what you'd get at the CLI:
root@158479dec020:/# ./check.sh
Ejabberd still down...
Do some magic here, ejabberd is running...
root@158479dec020:/# ejabberdctl stop
root@158479dec020:/# ./check.sh
Ejabberd still down...
Ejabberd still down...
Do some magic here, ejabberd is running...
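To tie this back to the original Dockerfile question, here is a rough sketch of how that wait script could be used during the build. The script name check.sh and the module commands come from above; the paths and the final ejabberdctl stop are my assumptions:
# Hypothetical Dockerfile excerpt: copy the wait script into the image first
COPY check.sh /usr/local/bin/check.sh
RUN chmod +x /usr/local/bin/check.sh

# Start the node, wait until ejabberdctl status reports it up, then install the module.
# Everything runs in a single RUN so the started node stays alive for the whole step.
RUN ejabberdctl start \
    && /usr/local/bin/check.sh \
    && ejabberdctl modules_update_specs \
    && ejabberdctl module_install ejabberd_auth_http \
    && ejabberdctl stop
Note that check.sh as written loops forever, so if the node never comes up the build will hang rather than fail; you may want to add a maximum number of attempts.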
Related
I have 5 containers running one after another. The first 3 (A, B, C) are very minimal. The A, B, C containers need to be health checked, but curl and wget cannot be run on them, so currently I just use test: ["CMD-SHELL", "whoami || exit 1"] in docker-compose.yml, which seems to bring them to a healthy state. The other 2 (D, E), which depend on A, B, C being healthy, are checked using test: ["CMD-SHELL", "curl --fail http://localhost"]. My question is: how can I properly check the health of those minimal containers without using curl, wget, etc.?
If you can live with a TCP connection test to your internal service's port, you could use /dev/tcp for this:
HEALTHCHECK CMD bash -c 'echo -n > /dev/tcp/127.0.0.1/<port>'
Like this:
# PASS (webserver is serving on 8099)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/8099
root@ab7470ea0c8b:/app# echo $?
0
# FAIL (webserver is NOT serving on 9000)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/9000
bash: connect: Connection refused
bash: /dev/tcp/127.0.0.1/9000: Connection refused
root@ab7470ea0c8b:/app# echo $?
1
Unfortunately, I think this is the best that can be done without installing curl or wget.
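For completeness, since the question uses docker-compose.yml, a healthcheck along those lines could look roughly like the sketch below. The service name and port are placeholders, and it assumes the image ships bash, since /dev/tcp is a bash feature:
# Hypothetical docker-compose.yml excerpt
services:
  a:
    image: your-minimal-image
    healthcheck:
      test: ["CMD", "bash", "-c", "echo -n > /dev/tcp/127.0.0.1/80"]
      interval: 10s
      timeout: 3s
      retries: 3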
I installed Docker on Mac and am trying to run Vespa on Docker following the steps specified in the link below:
https://docs.vespa.ai/documentation/vespa-quick-start.html
I didn't have any issues till step 4. I can see the vespa container running after step 2, and step 3 returned a 200 OK response.
But step 5 failed to return a 200 OK response. Below is the command I ran in my terminal:
curl -s --head http://localhost:8080/ApplicationStatus
I keep getting curl: (52) Empty reply from server whenever I run it without the -s option.
So I tried to check the listening ports inside my vespa container and don't see anything for 8080, but I can see 19071 (used in step 3):
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run (step 2 of the quick start link) and accessible outside the container, since port mapping is done.
But I don't see port 8080 active inside the container in the first place.
Am I missing something? Do I need to perform any additional steps beyond those mentioned in the quick start? FYI, I installed Jenkins inside Docker and was able to access it outside the container via port mapping, but I'm not sure why it's not working with Vespa. I have been trying for quite some time with no progress. Please advise me if I'm missing something here.
You have too little memory for your Docker setup: "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs)." See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock detector warnings and the failure to get configuration from the configuration server (which is likely OOM killed) indicate that you are too low on memory.
My guess is that your jdisc container had not finished initializing or did not initialize properly. Did you try to check the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer#79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started #10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1
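As a rough illustration only (the URL is the one from the question, the loop itself is just a sketch), you could poll the status endpoint until it answers before treating the container as ready:
# Hypothetical wait loop: poll ApplicationStatus until it returns 200 OK,
# giving up after roughly two minutes
for i in $(seq 1 120); do
    if curl -s --head http://localhost:8080/ApplicationStatus | grep -q "200 OK"; then
        echo "Vespa container is ready"
        break
    fi
    sleep 1
done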
I have an issue using Docker swarm.
I have 3 replicas of a Python web service running on Gunicorn.
The issue is that when I restart the swarm service after a software update, an old running task is killed, then a new one is created and started. But in the short window when the old task has already been killed and the new one hasn't fully started yet, requests are already routed to the new instance, resulting in 502 Bad Gateway errors (I proxy to the service from nginx).
I use the --update-parallelism 1 --update-delay 10s options, but this doesn't eliminate the issue, only slightly reduces the chances of getting the 502 error (because there are always at least 2 replicas running, even if one of them might still be starting up).
So, following what I've proposed in comments:
Use the HEALTHCHECK feature of Dockerfile: Docs. Something like:
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
Knowing that Docker Swarm does honor this healthcheck during service updates, it's relatively easy to have a zero-downtime deployment.
But as you mentioned, you have a resource-intensive healthcheck and you need larger healthcheck intervals.
In that case, I recommend customizing your healthcheck so that the first run happens immediately and the expensive check only runs when current_minute % 5 == 0, while the healthcheck itself runs every 30s:
HEALTHCHECK --interval=30s --timeout=3s \
  CMD /healthcheck.sh
healthcheck.sh
#!/bin/bash
CURRENT_MINUTE=$(date +%M)
INTERVAL_MINUTE=5

do_healthcheck() {
    curl -f http://localhost/ || exit 1
}

# On the very first run, check immediately and leave a marker file
if [ ! -f /tmp/healthcheck.first.run ]; then
    do_healthcheck
    touch /tmp/healthcheck.first.run
    exit 0
fi

# Afterwards, run the expensive check only on minutes that are a multiple of $INTERVAL_MINUTE
[ $((CURRENT_MINUTE % INTERVAL_MINUTE)) -eq 0 ] && do_healthcheck
exit 0
Remember to COPY healthcheck.sh into the image at /healthcheck.sh (and chmod +x it).
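For illustration, the corresponding Dockerfile lines could look something like this; the paths simply mirror what is described above:
# Hypothetical Dockerfile excerpt wiring up the custom healthcheck script
COPY healthcheck.sh /healthcheck.sh
RUN chmod +x /healthcheck.sh

HEALTHCHECK --interval=30s --timeout=3s \
  CMD /healthcheck.sh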
There are some known issues (e.g. moby/moby #30321) with rolling upgrades in docker swarm in the current 17.05 and earlier releases (and it doesn't look like all the fixes will make it into 17.06). These issues result in connection errors during a rolling upgrade like the ones you're seeing.
If you have a true zero downtime deployment requirement and can't solve this with a client side retry, then I'd recommend putting in some kind of blue/green switch in front of your swarm and do the rolling upgrade to the non-active set of containers until docker finds solutions to all of the scenarios.
In my CI chain I execute end-to-end tests after a "docker-compose up". Unfortunately my tests often fail because even if the containers are properly started, the programs contained in my containers are not.
Is there an elegant way to verify that my setup is completely started before running my tests ?
You could poll the required services to confirm they are responding before running the tests.
curl has inbuilt retry logic or it's fairly trivial to build retry logic around some other type of service test.
#!/bin/bash
# Poll a URL until it responds, or fail the build after ${seconds}.
await(){
    local url=${1}
    local seconds=${2:-30}
    # --retry-connrefused (curl 7.52+) also retries while the port isn't open yet
    curl --max-time 5 --retry 60 --retry-delay 1 --retry-connrefused \
        --retry-max-time "${seconds}" "${url}" \
        || exit 1
}
docker-compose up -d
await http://container_ms1:3000
await http://container_ms2:3000
run-ze-tests
The alternate to polling is an event based system.
If all your services push notifications to an external system (scaeda gave the example of a log file, or you could use something like Amazon SNS), each service can emit a "started" event. Then you can subscribe to those events and run whatever you need once everything has started.
Docker 1.12 did add the HEALTHCHECK build command. Maybe this is available via Docker Events?
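As far as I know, health state changes do show up in the engine's event stream as health_status events, so one way to watch for them could be:
# Watch for health_status events from the local Docker engine
docker events --filter 'event=health_status'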
If you have control over the docker engine in your CI setup, you could execute docker logs [Container_Name] and read out the last line, which could be emitted by your application.
RESULT=$(docker logs [Container_Name] 2>&1 | grep [Search_String])
logs output example:
Agent pid 13
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
parse specific line:
RESULT=$(docker logs ssh_jenkins_test 2>&1 | grep Enter)
result:
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
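If the marker line might not be there yet at the moment you check, a small polling loop around the same grep is one option; the container name and search string below are just the ones from the example above:
# Hypothetical wait loop: retry the log grep once per second, up to 60 times
for i in $(seq 1 60); do
    if docker logs ssh_jenkins_test 2>&1 | grep -q "Enter"; then
        echo "Marker line found, container is ready"
        break
    fi
    sleep 1
done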
I use Jenkins for builds and the Publish over SSH plugin to deploy my artifacts to a server. After deploying the files I stop the service by calling exec in the plugin:
sudo service myservice stop
and I receive this answer from Publish over SSH:
SSH: EXEC: channel open
SSH: EXEC: STDOUT/STDERR from command [sudo service myservice stop]...
SSH: EXEC: connected
Stopping script myservice
SSH: EXEC: completed after 200 ms
SSH: Disconnecting configuration [172.29.19.2] ...
ERROR: Exception when publishing, exception message [Exec exit status not zero. Status [-1]]
Build step 'Send build artifacts over SSH' changed build result to UNSTABLE
Finished: UNSTABLE
The build fails, but the service is stopped.
My /etc/init.d/myservice
#! /bin/sh
# /etc/init.d/myservice
#
# Some things that run always
# touch /var/lock/myservice
# Carry out specific functions when asked to by the system
case "$1" in
start)
echo "Starting myservice"
setsid /opt/myservice/bin/myservice --spring.config.location=/etc/ezd/application.properties --server.port=8082 >> /opt/myservice/app.log &
;;
stop)
echo "Stopping script myservice"
pkill -f myservice
#
;;
*)
echo "Usage: /etc/init.d/myservice {start|stop}"
exit 1
;;
esac
exit 0
Can you please tell me why I get a -1 exit status?
Well, the script is called /etc/init.d/myservice, so it matches the myservice pattern given to pkill -f. And because the script is still waiting for the pkill to complete, it is still alive, gets killed itself, and returns -1 for that reason (there is also the killing signal in the result of wait, but the Jenkins slave daemon isn't printing it).
Either:
come up with a more specific pattern for pkill,
use a proper pid file, or
switch to systemd, which can reliably kill exactly the process it started.
In this day and age, I recommend the last option. Systemd is simply a lot more reliable than init scripts.
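For illustration only, a minimal unit could look something like the sketch below; the ExecStart line is taken from your init script, while the unit name and the rest of the settings are assumptions:
# /etc/systemd/system/myservice.service (hypothetical sketch)
[Unit]
Description=myservice
After=network.target

[Service]
ExecStart=/opt/myservice/bin/myservice --spring.config.location=/etc/ezd/application.properties --server.port=8082
# systemd tracks the process it started, so stopping the unit kills exactly that process
Restart=on-failure

[Install]
WantedBy=multi-user.target
With something like that in place, the Jenkins step would call sudo systemctl stop myservice instead of the init script.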
Yes, Jan Hudec is right. I call ps ax | grep myservice in the Publish over SSH plugin:
83469 pts/5 Ss+ 0:00 bash -c ps ax | grep myservice service myservice stop
So pkill -f myservice also matches the process with PID 83469, which is the parent of pkill. As I understand it, this is the cause of the -1 status.
I changed pkill -f myservice to pkill -f "java.*myservice" and this solved my problem.