I am trying to automatically run a docker build command using the subprocess module as such:
command = "docker build -t image_name ."
ssh_command = "ssh -o 'StrictHostKeyChecking=no' -i 'XXXXX.pem' ubuntu#" + cur_public_ip + " " + command
retval = subprocess.run(command.split(" "), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
if retval.stderr != '':
    print('Error trace: ')
    print(retval.stderr)
else:
    print("Docker image successfully built.")
    print(retval.stdout)
Interestingly, if I manually SSH into my EC2 instance and run this command (the string in the command variable), it works fine.
But when I run the code above, I get this error:
Error trace:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I can't seem to solve this problem, and I am stuck since I don't see how what I am doing is different from manually sshing into the instance and running the command.
The docker daemon is definitely running, since I can build manually through an SSH terminal. I've tried changing the rwx permissions of the Dockerfile and all related files on the EC2 instance, but that did not help either.
How do I make this work? I need to programmatically be able to do this.
Thank you.
Your first problem is that you're only passing command to subprocess.run, so you're running docker build locally:
                        +--- look here
                        |
                        v
retval = subprocess.run(command.split(" "), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
Your second problem is that you've got way too much quoting going on in ssh_command, which is going to result in a number of problems. As written, for example, you'll be passing the literal string 'StrictHostKeyChecking=no' to ssh, resulting in an error like:
command-line: line 0: Bad configuration option: 'stricthostkeychecking
Because you're not executing your command via a shell, all of those quotes will be passed literally in the command line.
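You can see the difference by comparing a naive str.split with shlex.split, which tokenizes a string the way a POSIX shell would; a quick sketch (the IP address here is just a placeholder):
import shlex

ssh_command = "ssh -o 'StrictHostKeyChecking=no' -i 'XXXXX.pem' ubuntu@1.2.3.4 docker build -t image_name ."

# str.split keeps the quote characters as part of the arguments...
print(ssh_command.split(" "))
# ['ssh', '-o', "'StrictHostKeyChecking=no'", '-i', "'XXXXX.pem'", ...]

# ...while shlex.split strips them, the way a shell would:
print(shlex.split(ssh_command))
# ['ssh', '-o', 'StrictHostKeyChecking=no', '-i', 'XXXXX.pem', ...]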
Rather than calling command.split(" "), you would be better off just building the command as a list, something like this:
import subprocess
cur_public_ip = "1.2.3.4"
command = ["docker", "build", "-t", "image_name", "."]
ssh_command = [
    "ssh",
    "-o",
    "stricthostkeychecking=no",
    "-i",
    "XXXXX.pem",
    f"ubuntu@{cur_public_ip}",
] + command

retval = subprocess.run(
    ssh_command,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
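Also note that ssh can print warnings to stderr even when the remote command succeeds, so checking retval.returncode (ssh exits with the status of the remote command) is usually more reliable than testing for an empty stderr; a minimal sketch:
# Non-zero means either ssh itself or the remote `docker build` failed.
if retval.returncode != 0:
    print("Error trace:")
    print(retval.stderr)
else:
    print("Docker image successfully built.")
    print(retval.stdout)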
I am currently working with TensorFlow Serving, and while running a command I encountered an error.
Step 1: I pulled the tensorflow/serving image using
docker pull tensorflow/serving
Step 2: I made a project where I save the TF model in a directory:
C:/Code/potato-disease:
Step 3: After running the command:
docker run -t --rm -p 8505:8505 -v C:/Code/potato-disease:/potato-disease tensorflow/serving --rest_api_port=8505 --model_config_file=/potato-disease/models.config
Error:
Failed to start server. Error: Invalid argument: Expected model potatoes_model to have an absolute path or URI; got base_path()=C:/Code/potato-disease/saved_models
2022-03-16 03:21:46.161233: I tensorflow_serving/core/basic_manager.cc:279] Unload all remaining servables in the manager.
My models.config file
model_config_list {
  config {
    name: 'potatoes_model'
    base_path: 'C:/Code/potato-disease/saved_models'
    model_platform: 'tensorflow'
    model_version_policy: {all: {}}
  }
}
You should edit your models.config and set base_path to /potato-disease/saved_models. Since TensorFlow Serving runs inside the container, the path is interpreted in the container's filesystem rather than on your Windows host, so it has to be the absolute path as seen from inside the container (i.e. the target of the -v mount).
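With the -v C:/Code/potato-disease:/potato-disease mount from the docker run command above, the adjusted config would look like this:
model_config_list {
  config {
    name: 'potatoes_model'
    base_path: '/potato-disease/saved_models'
    model_platform: 'tensorflow'
    model_version_policy: {all: {}}
  }
}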
I'm trying to convert the following docker run command to the Python Docker SDK:
docker run -v ${HOME}/mypath/somepath:/root/mypath/somepath:ro -v /tmp/report/:/root/report -e MY_VAR=fooname DOCKER_IMAGE
and this is what I have so far:
client = docker.from_env()
client.containers.run(DOCKER_IMAGE, 'MY_VAR=fooname', volumes={
    f'{home}/mypath/somepath': {'bind': '/root/mypath/somepath', 'mode': 'ro'},
    '/tmp/report': {'bind': '/root/report', 'mode': 'rw'},
},)
But it seems like I'm running into issues when passing the env variables
docker.errors.APIError: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:346: starting container process caused "exec: \"MY_VAR=fooname\": executable file not found in $PATH": unknown")
What's the right way to pass the env variables?
EDIT
After changing it to
client.containers.run(DOCKER_IMAGE, None, environment=['MY_VAR=fooname'], volumes={
    f'{home}/mypath/somepath': {'bind': '/root/mypath/somepath', 'mode': 'ro'},
    '/tmp/report': {'bind': '/root/report', 'mode': 'rw'},
},)
I'm getting this error instead: docker.errors.ContainerError: Command 'None' in image
The Dockerfile already has the command declared; it just runs a Python script.
The second parameter of the run() method is the command, not the environment. If you don't have a command then pass None.
According to the documentation the environment must be either a dict or a list, so in your case:
client.containers.run(DOCKER_IMAGE, None, environment=['MY_VAR=fooname'], ...
Docs: https://docker-py.readthedocs.io/en/stable/containers.html#docker.models.containers.ContainerCollection.run
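For example, a sketch of the full call using the dict form (the image name and host path here are placeholders):
import os
import docker

DOCKER_IMAGE = "my-image:latest"    # placeholder image name
home = os.path.expanduser("~")      # stands in for ${HOME}

client = docker.from_env()
client.containers.run(
    DOCKER_IMAGE,
    None,                                 # no command override; the image's default CMD runs
    environment={"MY_VAR": "fooname"},    # dict form; ["MY_VAR=fooname"] also works
    volumes={
        f"{home}/mypath/somepath": {"bind": "/root/mypath/somepath", "mode": "ro"},
        "/tmp/report": {"bind": "/root/report", "mode": "rw"},
    },
)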
I'm having a problem running a GNU Parallel job in distributed mode (i.e. across multiple machines via --sshloginfile): even though the job runs on each machine as the same user (or at least is dictated that way in the file given to --sshloginfile, e.g. myuser@myhostname00x), I get a "Permission denied" error when the job tries to access a file. This occurs despite my being able to (passwordless) ssh into the remote nodes in question and ls the files that the Parallel job claims it has no permission for (the specified path is on a filesystem that is shared and NFS-mounted on all the nodes).
I have a list file of nodes like
me@host001
me@host005
me@host006
and the actual Parallel job looks like
bcpexport() {
    <do stuff to arg $1 to BCP copy to a MSSQL DB>
}
export -f bcpexport
parallel -q -j 10 --sshloginfile $basedir/src/parallel-nodes.txt --env $bcpexport \
bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER \
::: $DATAFILES/$TARGET_GLOB
where the $DATAFILES/$TARGET_GLOB glob pattern returns files from a directory. Running this job in single node mode works fine, but when running across all the nodes in the parallel-nodes.txt file throws
/bin/bash: line 27: /path/to/file001: Permission denied
/bin/bash: line 27: /path/to/file002: Permission denied
...and so on for all the files...
If anyone knows what could be going on here, advice or debugging suggestions would be appreciated.
I think the problem is the additional $:
parallel [...] --env $bcpexport bcpexport {} [...]
Unless you set the shell variable $bcpexport to something you probably meant bcpexport (no $) instead.
If $bcpexport is undefined, then it will be replaced with nothing by the shell. Thus --env will eat the next argument, so you will really be running:
parallel [...] --env bcpexport {} [...]
which will execute {} as a command, which is exactly what you experience.
So try this instead:
parallel [...] --env bcpexport bcpexport {} [...]
I'm using docker to scale the test infrastructure / browsers based on the number of requests received in Jenkins.
I created a Python script to identify the total number of spec files and the browser type, and to spin up that many Docker containers. The Python code has logic to determine how many nodes are currently in use or stale, and from that it works out the required number of containers.
I want to programmatically delete the container / de-register the Selenium node at the end of each spec file (the Docker --rm flag is not helping me), so that the next test gets a clean browser and environment.
The Selenium grid runs on the same box as Jenkins. Once I invoke protractor protractor.conf.js (Step 3), the grid starts distributing the tests to the containers created in Step 1.
When I say '--rm' is not helping, I mean that after Step 3 the communication is mainly between the Selenium hub and the nodes. I'm finding it difficult to determine which node / container the grid used to execute a test, and to remove that container before the grid sends another test to it.
-- Jenkins Build Stage --
Shell:
# Step 1
python ./create_test_machine.py ${no_of_containers} # This will spin-up selenium nodes
# Step 2
npm install # install node modules
# Step 3
protractor protractor.conf.js # Run the protractor tests
--Python code to spin up containers - create_test_machine.py--
Python Script:
import sys
import docker
import docker.utils
import requests
import json
import time
c = docker.Client(base_url='unix://var/run/docker.sock', version='1.23')
my_envs = {'HUB_PORT_4444_TCP_ADDR' :'172.17.0.1', 'HUB_PORT_4444_TCP_PORT' : 4444}
def check_available_machines(no_of_machines):
    t = c.containers()
    noof_running_containers = len(t)
    if noof_running_containers == 0:
        print("0 containers running. Creating " + str(no_of_machines) + " new containers...")
        spinup_test_machines(no_of_machines)
    else:
        out_of_stock = 0
        for obj_container in t:
            print(obj_container)
            container_ip_addr = obj_container['NetworkSettings']['Networks']['bridge']['IPAddress']
            container_state = obj_container['State']
            res = requests.get('http://' + container_ip_addr + ':5555/wd/hub/sessions')
            obj = json.loads(res.content)
            node_inuse = len(obj['value'])
            if node_inuse != 0:
                noof_running_containers -= 1
        if noof_running_containers < no_of_machines:
            spinup_test_machines(no_of_machines - noof_running_containers)
    return

def spinup_test_machines(no_of_machines):
    '''
    Parameter : Number of test nodes to spin up
    '''
    print("Creating " + str(no_of_machines) + " new containers...")
    # my_envs = docker.utils.parse_env_file('docker.env')
    for i in range(0, no_of_machines):
        new_container = c.create_container(image='selenium/node-chrome', environment=my_envs)
        response = c.start(container=new_container.get('Id'))
        print(new_container, response)
    return

if len(sys.argv) - 1 == 1:
    no_of_machines = int(sys.argv[1]) + 2
    check_available_machines(no_of_machines)
    time.sleep(30)
else:
    print("Invalid number of parameters")
Here the difference can be seen clearly between running docker run with -d and with --rm.
Using the -d option:
C:\Users\apps>docker run -d --name testso alpine /bin/echo 'Hello World'
5d447b558ae6bf58ff6a2147da8bdf25b526bd1c9f39117498fa017f8f71978b
Check the logs
C:\Users\apps>docker logs testso
'Hello World'
Check the last run containers
C:\Users\apps>docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5d447b558ae6 alpine "/bin/echo 'Hello Wor" 35 hours ago Exited (0) 11 seconds ago testso
Finally, the user has to remove it explicitly:
C:\Users\apps>docker rm -f testso
testso
Using --rm, the container vanishes, including its logs, as soon as the process run inside the container completes. No trace of the container remains.
C:\Users\apps>docker run --rm --name testso alpine /bin/echo 'Hello World'
'Hello World'
C:\Users\apps>docker logs testso
Error: No such container: testso
C:\Users\apps>docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
I believe this makes it clear how to run a container and leave no trace once the process inside it has finished.
To start a container in detached mode, you use -d=true or just the -d option. By design, containers started in detached mode exit when the root process used to run the container exits. A container in detached mode cannot be automatically removed when it stops, which means you cannot use the --rm option together with the -d option.
See https://docs.docker.com/engine/reference/run/
You can use nose tests. For every def test_xxx(), it will call the setup and teardown functions attached with the @with_setup decorator. Below is an example:
import docker
from nose.tools import *

c = docker.Client(base_url='unix://var/run/docker.sock', version='1.23')
my_envs = {'HUB_PORT_4444_TCP_ADDR': '172.17.0.1', 'HUB_PORT_4444_TCP_PORT': 4444}
my_containers = {}

def setup_docker():
    """ Setup test environment:
    create/start your docker container(s), populate the my_containers dict.
    """

def tear_down_docker():
    """Tear down test environment.
    """
    for container in my_containers.itervalues():
        try:
            c.stop(container=container.get('Id'))
            c.remove_container(container=container.get('Id'))
        except Exception as e:
            print e

@with_setup(setup=setup_docker, teardown=tear_down_docker)
def test_xxx():
    # do your test here
    # you can call a subprocess to run your selenium
    pass
Or, you can write a separate Python script that detects the containers you set up for your test and then does something like this:
for container in my_containers.itervalues():
    try:
        c.stop(container=container.get('Id'))
        c.remove_container(container=container.get('Id'))
    except Exception as e:
        print e
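If you are on a newer version of the Python Docker SDK, the same cleanup can also be written against docker.from_env(); a rough sketch, assuming you keep track of the started container IDs yourself (the list name here is hypothetical):
import docker

client = docker.from_env()
my_container_ids = []   # hypothetical: populate this when you spin the nodes up

for container_id in my_container_ids:
    try:
        container = client.containers.get(container_id)
        container.stop()
        container.remove()
    except Exception as e:
        print(e)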
I started a docker container based on an image which has a file "run.sh" in it. Within a shell script, I use docker exec as shown below:
docker exec <container-id> sh /test.sh
test.sh completes execution but docker exec does not return until I press Ctrl+C. As a result, my shell script never ends. Any pointers to what might be causing this?
I could get it working by adding the -it parameters:
docker exec -it <container-id> sh /test.sh
Mine works like a charm with this command. Maybe you only forgot the path to the binary (/bin/sh)?
docker exec 7bd877d15c9b /bin/bash /test.sh
File location at
/test.sh
File Content:
#!/bin/bash
echo "Hi"
echo
echo "This works fine"
sleep 5
echo "5"
Output:
ArgonQQ@Terminal ~ docker exec 7bd877d15c9b /bin/bash /test.sh
Hi
This works fine
5
ArgonQQ@Terminal ~
My case is a script a.sh with content like
php test.php &
if I execute it like
docker exec contianer1 a.sh
It also never returned.
After half a day of googling and trying, I changed a.sh to
php test.php >/tmp/test.log 2>&1 &
It works!
So it seems related to stdin/stdout/stderr. Please try adding:
>/tmp/test.log 2>&1
And please note that my test.php is an endless-loop script that monitors a specified process; if the process is down, it restarts it. So test.php will never exit.
As described here, this "hanging" behavior occurs when you have processes that keep stdout or stderr open.
To prevent this from happening, each long-running process should:
be executed in the background, and
close both stdout and stderr or redirect them to files or /dev/null.
I would therefore make sure that any processes already running in the container, as well as the script passed to docker exec, conform to the above.
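Applied to the docker exec call from the question, that could look like the following sketch (the container name and script path are placeholders); the long-running process is backgrounded with its output redirected, so the shell started by docker exec exits right away and the call returns:
import subprocess

# Background the long-running process and detach its output so that
# `docker exec` is not left holding an open stdout/stderr pipe.
subprocess.run(
    ["docker", "exec", "my_container", "sh", "-c",
     "php test.php >/tmp/test.log 2>&1 &"],
    check=True,
)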
OK, I got it.
docker stop a590382c2943
docker start a590382c2943
and then it will be OK:
docker exec -ti a590382c2943 echo "5"
now returns immediately; whether the -it flag is added or not makes no difference.
Actually, in my program the daemon had stdin, stdout and stderr attached, so I changed my Python daemon as follows, and now things work like a charm:
if __name__ == '__main__':
    # do the UNIX double-fork magic, see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            os._exit(0)
    except OSError, e:
        print "fork #1 failed: %d (%s)" % (e.errno, e.strerror)
        os._exit(0)

    # decouple from parent environment
    #os.chdir("/")
    os.setsid()
    os.umask(0)

    #std in out err, redirect
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())

    # do second fork
    while(True):
        try:
            pid = os.fork()
            if pid == 0:
                serve()
            if pid > 0:
                print "Server PID %d, Daemon PID: %d" % (pid, os.getpid())
                os.wait()
                time.sleep(3)
        except OSError, e:
            #print "fork #2 failed: %d (%s)" % (e.errno, e.strerror)
            os._exit(0)