docker exec command doesn't return after completing execution - docker

I started a docker container based on an image which has a file "run.sh" in it. Within a shell script, I use docker exec as shown below:
docker exec <container-id> sh /test.sh
test.sh completes execution, but docker exec does not return until I press Ctrl+C. As a result, my shell script never ends. Any pointers to what might be causing this?

I could get it working by adding the -it parameters:
docker exec -it <container-id> sh /test.sh

Mine works like a charm with this command. Maybe you only forgot the path to the binary (/bin/sh)?
docker exec 7bd877d15c9b /bin/bash /test.sh
File location at
/test.sh
File Content:
#!/bin/bash
echo "Hi"
echo
echo "This works fine"
sleep 5
echo "5"
Output:
ArgonQQ@Terminal ~ docker exec 7bd877d15c9b /bin/bash /test.sh
Hi
This works fine
5
ArgonQQ@Terminal ~

My case is a script a.sh with content like:
php test.php &
If I execute it like:
docker exec container1 a.sh
it also never returns.
After half a day of googling and trying, I changed a.sh to:
php test.php >/tmp/test.log 2>&1 &
and it works!
So it seems related to stdin/stdout/stderr. Please try adding:
>/tmp/test.log 2>&1
Also note that my test.php is an endless-loop script that monitors a specified process and restarts it if it is down, so test.php will never exit.

As described here, this "hanging" behavior occurs when you have processes that keep stdout or stderr open.
To prevent this from happening, each long-running process should:
be executed in the background, and
close both stdout and stderr or redirect them to files or /dev/null.
I would therefore make sure that any processes already running in the container, as well as the script passed to docker exec, conform to the above.
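For instance, here is a minimal sketch of what such a script could look like (long_task and the log path are made-up placeholders, not taken from the question):

#!/bin/sh
# Start the long-running process in the background with its streams detached,
# so that `docker exec <container-id> sh /test.sh` returns as soon as this script exits.
/usr/local/bin/long_task >/var/log/long_task.log 2>&1 &
echo "long_task started with PID $!"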

OK, I got it.
docker stop a590382c2943
docker start a590382c2943
Then it is OK:
docker exec -ti a590382c2943 echo "5"
returns immediately now, and whether I add -it or not makes no difference.
Actually, in my program the daemon held stdin, stdout and stderr open, so I changed my Python daemon as follows and things work like a charm:
import os
import sys
import time

if __name__ == '__main__':
    # do the UNIX double-fork magic, see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            os._exit(0)
    except OSError, e:
        print "fork #1 failed: %d (%s)" % (e.errno, e.strerror)
        os._exit(0)

    # decouple from parent environment
    #os.chdir("/")
    os.setsid()
    os.umask(0)

    # std in/out/err: redirect to /dev/null so the daemon no longer
    # holds the streams inherited from `docker exec`
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())

    # do second fork
    while True:
        try:
            pid = os.fork()
            if pid == 0:
                serve()
            if pid > 0:
                print "Server PID %d, Daemon PID: %d" % (pid, os.getpid())
                os.wait()
                time.sleep(3)
        except OSError, e:
            #print "fork #2 failed: %d (%s)" % (e.errno, e.strerror)
            os._exit(0)
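As a quick sanity check (my own suggestion, not part of the fix above; it assumes pgrep is available in the image, and mydaemon.py is just a placeholder for the actual daemon script name), you can list the daemon's stdio descriptors from inside the container and confirm they now point at /dev/null:

# List the daemon's open file descriptors; 0, 1 and 2 should now link to /dev/null.
docker exec <container-id> sh -c 'ls -l /proc/$(pgrep -of mydaemon.py)/fd'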

Related

Can't run Docker command via ssh and python's subprocess module

I am trying to automatically run a docker build command using the subprocess module as such:
command = "docker build -t image_name ."
ssh_command = "ssh -o 'StrictHostKeyChecking=no' -i 'XXXXX.pem' ubuntu#" + cur_public_ip + " " + command
retval = subprocess.run(command.split(" "), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
if retval.stderr != '':
print('Error trace: ')
print(retval.stderr)
else:
print("Docker image succesfully built.")
print(retval.stdout)
Interestingly, if I run this command (the string that is the command variable) after I manually SSH into my ec2 instance, it works fine.
But when I run the code above, I get this error:
Error trace:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I can't seem to solve this problem, and I am stuck since I don't see how what I am doing is different from manually sshing into the instance and running the command.
The docker daemon is definitely running, since I can build manually through an ssh terminal. I've tried changing the rwx permissions of the Dockerfile and all related files on the ec2 instance, but that did not help either.
How do I make this work? I need to programmatically be able to do this.
Thank you.
Your first problem is that you're only passing command to subprocess.run, so you're running docker build locally:
                        +--- look here
                        |
                        v
retval = subprocess.run(command.split(" "), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
Your second problem is that you've got way too much quoting going on in ssh_command, which is going to result in a number of problems. As written, for example, you'll be passing the literal string 'StrictHostKeyChecking=no' to ssh, resulting in an error like:
command-line: line 0: Bad configuration option: 'stricthostkeychecking
Because you're not executing your command via a shell, all of those quotes will be passed literally in the command line.
Rather than calling command.split(" "), you would be better off just building the command as a list, something like this:
import subprocess

cur_public_ip = "1.2.3.4"

command = ["docker", "build", "-t", "image_name", "."]
ssh_command = [
    "ssh",
    "-o",
    "stricthostkeychecking=no",
    "-i",
    "XXXXX.pem",
    f"ubuntu@{cur_public_ip}",
] + command

retval = subprocess.run(
    ssh_command,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

who and w commands in CentOS 8 Docker container

While playing with CentOS 8 in a Docker container, I found out that the outputs of the who and w commands are always empty.
[root@9e24376316f1 ~]# who
[root@9e24376316f1 ~]# w
 01:01:50 up  7:38,  0 users,  load average: 0.00, 0.04, 0.00
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
Even when I'm logged in as a different user in a second terminal.
When I want to write to this user, it shows:
[root@9e24376316f1 ~]# write test
write: test is not logged in
Is this because of Docker? Maybe it works in some way that disallows sessions from seeing each other?
Or maybe that's some other issue. I would really appreciate some explanation.
These utilities obtain the information about current logins from the utmp file (/var/run/utmp). You can easily check that in ordinary circumstances (e.g. on the desktop system) this file contains something like the following string (here qazer is my login and tty7 is a TTY where my desktop environment runs):
$ cat /var/run/utmp
tty7:0qazer:0�o^�
while in the container this file is (usually) empty:
$ docker run -it centos
[root@5e91e9e1a28e /]# cat /var/run/utmp
[root@5e91e9e1a28e /]#
Why?
The utmp file is usually modified by programs which authenticate the user and start the session: login(1), sshd(8), lightdm(1). However, the container engine cannot rely on them, as they may be absent in the container file system, so "logging in" and "executing on behalf of" are implemented in the most primitive and straightforward manner, avoiding reliance on anything inside the container.
When any container is started or any command is exec'd inside it, the container engine just spawns a new process, arranges some security settings, calls setgid(2)/setuid(2) to forcibly (without any authentication) alter the process' UID/GID, and then executes the required binary (the entry point, the command, and so on) within this process.
Say, I start the CentOS container running its main process on behalf of UID 42:
docker run -it --user 42 centos
and then try to execute sleep 1000 inside it:
docker exec -it $CONTAINER_ID sleep 1000
The container engine will perform something like this:
[pid 10170] setgid(0) = 0
[pid 10170] setuid(42) = 0
...
[pid 10170] execve("/usr/bin/sleep", ["sleep", "1000"], 0xc000159740 /* 4 vars */) = 0
There will be no writes to /var/run/utmp, thus it will remain empty, and who(1)/w(1) will not find any logins inside the container.
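If you want to confirm this yourself, a quick check (assuming the utmpdump tool from util-linux is present both on the host and in the image) is to dump the utmp records in both places:

# On the host, real login sessions show up as USER_PROCESS records.
utmpdump /var/run/utmp

# Inside the container, the same file stays empty because no login program ever writes to it.
docker exec $CONTAINER_ID utmpdump /var/run/utmp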

Monit bundle exec rails s

I have the following shell script that allows me to start my rails app, let's say it's called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
The file above has the following permissions:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
If I want to check the process, I do the following:
ps aux | grep -v grep | grep ":4445"
it'd give me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
P.S.: the reason I grep ":4445" is that I have a few processes running on different ports (for different projects).
Now coming to monit: I used apt-get to install it, and the latest version from the repo is 5.16, as I'm running Ubuntu 16.04. Also note that monit runs as root; that's why I specified the uid and gid below (because the start script is supposed to be executed as "user" and not "root").
Here's the configuration for monit:
set daemon 20              # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
    basedir /var/lib/monit/events   # set the base directory where events will be stored
    slots 100                       # optionally limit the queue size
set mailserver xx.com port xxx
    username "xx@xx.com" password "xxxxxx"
    using tlsv12
    with timeout 20 seconds
set alert xx@xx.com
set mail-format {
    from: xx@xx.com
    subject: monit alert -- $EVENT $SERVICE
    message: $EVENT Service $SERVICE
        Date:        $DATE
        Action:      $ACTION
        Host:        $HOST
        Description: $DESCRIPTION
}
set limits {
    programOutput:     51200 B
    sendExpectBuffer:  25600 B
    fileContentBuffer: 51200 B
    networktimeout:    10 s
}
check system $HOST
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage > 90% for 10 cycles then alert
    if memory usage > 85% then alert
    if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
    start program = "/bin/systemctl start nginx"
    stop program = "/bin/systemctl stop nginx"
check process redis
    matching "redis"
    start program = "/bin/systemctl start redis"
    stop program = "/bin/systemctl stop redis"
check process myapp
    matching ":4445"
    start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
    stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
Now monit is detecting and alerting me when the process goes down (if I kill it manually) and when it's manually recovered, but it won't start that shell script automatically. According to /var/log/monit.log, it shows the following:
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
So far, what I see when monit tries to execute the script is that it tries to load it (I can see it for less than 3 seconds using ps aux | grep -v grep | grep ":4445"), but this output is different from the one I showed above: it shows the content of the shell script being executed, specifically this part:
blablalba... nohup bundle exec rails s -e production -p 4445
and then it disappears, and monit tries to re-execute the shell script again and again.
What am I missing, and what is wrong with my configuration? Note that I can't change anything in start-app.sh because it's in production and working 100% (I just want to monitor it).
Edit: from my understanding and experience, it seems to be an environment-variable or path issue, but I'm not sure how to solve it. It doesn't make sense to put the env variables inside monit; what if someone else wanted to edit that shell script or add something new? I hope you get my point.
As I expected, it was a user-environment issue, and I solved it by editing the monit configuration as below:
Before (not working):
check process myapp
    matching ":4445"
    start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
    stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
After (working):
check process myapp
    matching ":4445"
    start program = "/bin/su -s /bin/bash -c '/home/user/start-app.sh' user"
    stop program = "/bin/su -s /bin/bash -c '/home/user/stop-app.sh' user"
Explanation: I removed the uid "user" and gid "user" options from monit because they only change the identity under which the shell script is executed; they do not load the user's environment (PATH or other environment variables), whereas su sets up a basic environment for the target user.
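To see the difference for yourself, a rough comparison (my own sketch, assuming setpriv from util-linux is installed; "user" is the same account as above) is:

# Mimics monit's `as uid "user" and gid "user"`: only the IDs change,
# the environment (HOME, PATH, rvm setup) is still root's.
setpriv --reuid=user --regid=user --init-groups env | grep -E '^(HOME|USER|PATH)='

# su resets HOME, SHELL, USER, LOGNAME and PATH for the target user,
# which is why the start script finds the environment it expects.
su -s /bin/bash -c 'env | grep -E "^(HOME|USER|PATH)="' user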

Does `strace -f` work differently when run inside a docker container?

Assume the following:
I have a program myprogram inside a docker container
I'm running the docker container with
docker run --privileged=true my-label/my-container
Inside the container - the program is being run with:
strace -f -e trace=desc ./myprogram
What I see is that strace (despite having -f on) doesn't follow all the child processes.
I see the following output from strace
[pid 10] 07:36:46.668931 write(2, "..\n"..., 454 <unfinished ...>
<stdout of ..>
<stdout other output - but I don't see the write commands - so probably from a child process>
[pid 10] 07:36:46.669684 write(2, "My final output\n", 24 <unfinished ...>
<stdout of My final output>
What I want to see is the other write commands.
I should see the other write commands, because I'm using -f.
What I think is happening is that running inside docker makes the process handling and security different.
My question is: Does strace -f work differently when run inside a docker container?
Note that this application starts and stops in 2 seconds - so the tracing tool has to follow the application lifecycle - like strace does. Connecting to a server background process won't work.
It turns out strace truncates string output - you have to explicitly tell it that you want more than the first n (10?) string chars. You do this with -s 800.
strace -s 800 -ff ./myprogram
You can also get all the write commands by asking strace explicitly with -e write.
strace -s 800 -ff -e write ./myprogram
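A related variant worth knowing (my own suggestion; note that -ff only splits output per process when combined with -o) writes one trace file per child, which makes the writes of short-lived children easy to find afterwards:

# One output file per traced process (trace.<pid>), strings up to 800 bytes,
# limited to write(2) so the children's output is easy to spot.
strace -ff -o trace -s 800 -e trace=write ./myprogram
grep -l 'My final output' trace.*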

Remove docker container at the end of each test

I'm using docker to scale the test infrastructure / browsers based on the number of requests received in Jenkins.
I created a python script to identify the total number of spec files and the browser type, and to spin up that many docker containers. The python code has the logic to determine how many nodes are currently in use or stale, and from that it determines the required number of containers.
I want to programmatically delete the container / de-register the selenium node at the end of each spec file (the Docker --rm flag is not helping me), so that the next test gets a clean browser and environment.
The selenium grid runs on the same box where Jenkins is. Once I invoke protractor protractor.conf.js (Step 3), the selenium grid will start distributing the tests to the containers created in Step 1.
When I say '--rm' is not helping, I mean that after Step 3 the communication is mainly between the selenium hub and the nodes. I'm finding it difficult to determine which node / container was used by the selenium grid to execute the test, and to remove the container before the grid sends another test to it.
-- Jenkins Build Stage --
Shell:
# Step 1
python ./create_test_machine.py ${no_of_containers} # This will spin-up selenium nodes
# Step 2
npm install # install node modules
# Step 3
protractor protractor.conf.js # Run the protractor tests
--Python code to spin up containers - create_test_machine.py--
Python Script:
import sys
import docker
import docker.utils
import requests
import json
import time
c = docker.Client(base_url='unix://var/run/docker.sock', version='1.23')
my_envs = {'HUB_PORT_4444_TCP_ADDR' :'172.17.0.1', 'HUB_PORT_4444_TCP_PORT' : 4444}
def check_available_machines(no_of_machines):
    t = c.containers()
    noof_running_containers = len(t)
    if noof_running_containers == 0:
        print("0 containers running. Creating " + str(no_of_machines) + "new containers...")
        spinup_test_machines(no_of_machines)
    else:
        out_of_stock = 0
        for obj_container in t:
            print(obj_container)
            container_ip_addr = obj_container['NetworkSettings']['Networks']['bridge']['IPAddress']
            container_state = obj_container['State']
            res = requests.get('http://' + container_ip_addr + ':5555/wd/hub/sessions')
            obj = json.loads(res.content)
            node_inuse = len(obj['value'])
            if node_inuse != 0:
                noof_running_containers -= 1
        if noof_running_containers < no_of_machines:
            spinup_test_machines(no_of_machines - noof_running_containers)
    return

def spinup_test_machines(no_of_machines):
    '''
    Parameter : Number of test nodes to spin up
    '''
    print("Creating " + str(no_of_machines) + " new containers...")
    # my_envs = docker.utils.parse_env_file('docker.env')
    for i in range(0, no_of_machines):
        new_container = c.create_container(image='selenium/node-chrome', environment=my_envs)
        response = c.start(container=new_container.get('Id'))
        print(new_container, response)
    return

if len(sys.argv) - 1 == 1:
    no_of_machines = int(sys.argv[1]) + 2
    check_available_machines(no_of_machines)
    time.sleep(30)
else:
    print("Invalid number of parameters")
Here the difference can be seen clearly between docker run with -d and with --rm.
Using the -d option:
C:\Users\apps>docker run -d --name testso alpine /bin/echo 'Hello World'
5d447b558ae6bf58ff6a2147da8bdf25b526bd1c9f39117498fa017f8f71978b
Check the logs
C:\Users\apps>docker logs testso
'Hello World'
Check the last run containers
C:\Users\apps>docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5d447b558ae6 alpine "/bin/echo 'Hello Wor" 35 hours ago Exited (0) 11 seconds ago testso
Finally, the user has to remove it explicitly:
C:\Users\apps>docker rm -f testso
testso
Using --rm, the container vanishes, including its logs, as soon as the process run inside the container completes. There is no trace of the container any more.
C:\Users\apps>docker run --rm --name testso alpine /bin/echo 'Hello World'
'Hello World'
C:\Users\apps>docker logs testso
Error: No such container: testso
C:\Users\apps>docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
I believe it is now clear how to run a container so that it leaves no trace after the process inside it finishes.
So, to start a container in detached mode, you use -d=true or just the -d option. By design, containers started in detached mode exit when the root process used to run the container exits. A container in detached mode cannot be automatically removed when it stops; this means you cannot use the --rm option together with the -d option.
Take a look at this:
https://docs.docker.com/engine/reference/run/
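If you do need -d (for example when the container runs for a while), a small sketch of the manual equivalent of --rm, reusing the testso example above, is:

docker run -d --name testso alpine /bin/echo 'Hello World'
docker wait testso      # block until the container exits; prints its exit code
docker logs testso      # the logs are still available at this point
docker rm testso        # clean up explicitly, since --rm cannot be combined with -d here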
You can use nose tests. For every def test_xxx(), it will call the setup and teardown functions specified with the @with_setup decorator. Below is an example:
import docker
from nose.tools import *

c = docker.Client(base_url='unix://var/run/docker.sock', version='1.23')
my_envs = {'HUB_PORT_4444_TCP_ADDR': '172.17.0.1', 'HUB_PORT_4444_TCP_PORT': 4444}
my_containers = {}

def setup_docker():
    """ Setup Test Environment,
    create/start your docker container(s), populate the my_containers dict.
    """

def tear_down_docker():
    """Tear down test environment:
    stop and remove every container created in setup_docker().
    """
    for container in my_containers.itervalues():
        try:
            c.stop(container=container.get('Id'))
            c.remove_container(container=container.get('Id'))
        except Exception as e:
            print e

@with_setup(setup=setup_docker, teardown=tear_down_docker)
def test_xxx():
    # do your test here
    # you can call a subprocess to run your selenium
    pass
Or, you can write a separate python script that detects the containers you set up for your test and then does something like this:
for container in my_containers.itervalues():
    try:
        c.stop(container=container.get('Id'))
        c.remove_container(container=container.get('Id'))
    except Exception as e:
        print e
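If tracking container IDs in Python becomes awkward, a rough shell alternative (assuming all the nodes were started from the selenium/node-chrome image used in create_test_machine.py) is to drop the matching containers after each spec run:

# Force-remove every container created from the selenium node image.
# --filter ancestor=... matches containers started from that image; xargs -r skips an empty list.
docker ps -q --filter "ancestor=selenium/node-chrome" | xargs -r docker rm -f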
