I'm trying to automate the following loop with Docker: spawn a container, do some work inside of it (more than a single command), get some data out of the container.
Something along the lines of:
for ( i = 0; i < 10; i++ )
spawn a container
wget revision-i
do something with it and store results in results.txt
According to the documentation I should go with:
for ( ... )
docker run <image> <long; list; of; instructions; separated; by; semicolon>
Unfortunately, this approach is neither attractive nor maintainable as the list of instructions grows in complexity.
Wrapping the instructions in a script as in docker run <image> /bin/bash script.sh doesn't work either since I want to spawn a new container for every iteration of the loop.
To sum up:
1. Is there any sensible way to run a complex series of commands as described above inside the same container?
2. Once some data are saved inside a container in, say, /home/results.txt, and the container returns, how do I get results.txt? The only way I can think of is to commit the container and tar the file out of the new image. Is there a more efficient way to do it?
Bonus: should I use vanilla LXC instead? I don't have any experience with it though so I'm not sure.
Thanks.
I eventually came up with a solution that works for me and greatly improved my Docker experience.
Long story short: I used a combination of Fabric and a container running sshd.
Details:
The idea is to spawn container(s) with sshd running using Fabric's local, and run commands on the containers using Fabric's run.
To give a (Python) example, you might have a Container class with:
1) a method to locally spawn a new container with sshd up and running, e.g.
local('docker run -d -p 22 your/image /usr/sbin/sshd -D')
2) set the env parameters needed by Fabric to connect to the running container - check Fabric's tutorial for more on this
3) write your methods to run everything you want in the container exploiting Fabric's run, e.g.
run('uname -on')
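For what it's worth, the same flow can also be scripted with plain ssh/scp from a shell script instead of Fabric. This is only a rough sketch: the image name, URL, paths and credentials are placeholders, and the image is assumed to have SSH access set up for root.

#!/usr/bin/env bash
# One fresh sshd container per iteration; work is done over ssh, results are
# copied back with scp, and the container is thrown away afterwards.
for i in $(seq 0 9); do
  cid=$(docker run -d -p 22 your/image /usr/sbin/sshd -D)
  port=$(docker port "$cid" 22 | head -n1 | cut -d: -f2)   # host port mapped to 22
  ssh -p "$port" root@localhost \
    "wget http://example.com/revision-$i && ./process.sh > /home/results.txt"
  scp -P "$port" root@localhost:/home/results.txt "results-$i.txt"
  docker rm -f "$cid"
done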
Oh, and if you like Ruby better you can achieve the same using Capistrano.
Thanks to @qkrijger (+1'd) for putting me on the right track :)
On question 2.
I don't know if this is the best way, but you could install SSH on your image and use that. For more information on this, you can check out this page from the documentation.
You posted 2 questions in one. Maybe you should put 2. in a different post. I will consider 1. here.
It is unclear to me whether you want to spawn a new container for every iteration (as you say first) or if you want to "run a complex series of commands as described above inside the same container?" as you say later.
If you want to spawn multiple containers I would expect you to have a script on your machine handling that.
If you need to pass an argument to your container (like i): work is currently being done on passing arguments. See https://github.com/dotcloud/docker/pull/1015 (and https://github.com/dotcloud/docker/pull/1015/files for the documentation change, which is not online yet).
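(For reference, in current Docker this can be done with the -e/--env flag; the image name and script path below are placeholders.)

# Spawn a fresh, throwaway container per iteration and pass the index in
# through the environment; your/image and /work/fetch_and_process.sh are
# hypothetical names.
for i in $(seq 0 9); do
  docker run --rm -e REVISION="$i" your/image /work/fetch_and_process.sh
done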
I'm admittedly very new to Docker, so this might be a dumb question, but here goes.
I have a Python ETL script that I've packaged in a Docker container, essentially following this tutorial. Using Cloud Functions and Cloud Scheduler, I have the instance start every hour, run the sync, and then shut the instance down.
I've run into an issue though where after this process has been running for a while the VM runs out of hard drive space. The script doesn't require any storage or persistence of state - it pulls any state data from external systems and only uses temporary files which are supposed to be deleted when the machine shuts down.
This has caused particular problems where updates I make to the script stop working because the machine doesn't have the space to download the latest version of the container.
I'm guessing it's either logs or perhaps files created automatically to try to persist the state - either within the Docker container or on the VM.
I'm wondering whether, if I could get the VM to run the container with the "--rm" flag so that the container is removed when it finishes, that would solve this problem. It would also theoretically guarantee that I'm always starting with the most recent image.
The trouble is, I can't for the life of me find a way to configure the "rm" option within the instance settings, and the documentation for container options only covers passing arguments to the container ENTRYPOINT, not the options to docker run itself (docker run [OPTIONS] IMAGE [COMMAND] [ARG...]).
I feel like I'm either missing something obvious or it's not designed to be used this way. Is this something that can be configured in the Dockerfile or is there a different way I have to set up the VM in the first place?
Basically I just want the docker image to be pulled fresh and run each time and not leave any remnants on the VM that will slowly run out of space.
Also, I know Cloud Run might work in some similar situations but I need the script to be able to run for as long as it needs to (particularly at the start when it's backfilling data) and so the 15 minute cap on runtime would be a problem.
Any suggestions would be appreciated!
Note: I'm posting this as an answer as I need more space than a comment allows. If anyone feels it is not a good answer and wants it deleted, I will be delighted to do so.
Recapping the story: we have a Compute Engine configured to start a Docker container. The Compute Engine runs the container and then we stop it. An hour later we restart it, let it run and then stop it again. This continues on into the future. What we seem to find is that the disk associated with the Compute Engine fills up and things end up breaking. The thinking is that the container contained within the Compute Engine is created at the first launch of the Compute Engine and then, each time it is restarted, it is being "re-used" as opposed to a brand new container instance being created. This means that resources consumed by the container from one run to the next (e.g. disk storage) continue to grow.
What we would like to happen is that when the Compute Engine starts, it will always create a brand new instance of the container with no history / resource usage of the past. This means that we won't consume resources over time.
One way to achieve this outside of GCP would be to start the container through Docker with the "--rm" flag. This means that when the container ends, it will be auto-deleted and hence there will be no previous container to start the next time the Compute Engine starts. Again ... this is a recap.
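(In plain Docker terms, what we would like Konlet to do for us amounts to the following; the image name is a placeholder.)

# Run the job in a throwaway container that is deleted as soon as it exits.
docker run --rm gcr.io/my-project/etl:latest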
If we dig through how GCP Compute Engines work as they relate to containers, we come across a package called "Konlet". This is the package responsible for loading the container in the Compute Engine. It appears to be itself a Docker container application written in Go. It appears to read the metadata associated with the Compute Engine and, based on that, performs API calls to Docker to launch the target container. The first thing to see from this is that the launch of the target Docker container does not appear to be executed through a simple docker command line. This then implies that we can't "simply" edit a script.
Konlet is open source so in principle, we could study it in detail and see if there are special flags associated with it to achieve the equivalent of --rm. However, my immediate recommendation is to post an issue at the Konlet GitHub site and ask the author whether there is a --rm equivalent option for Konlet and, if not, could one be added (and if not, what is the higher level thinking).
In the meantime, let me offer you an alternative to your story. If I am hearing you correctly, every hour you fire a job to start a compute engine, do work and then shutdown the compute engine. This compute engine hosts your "leaky" docker container. What if instead of starting/stopping your compute engine you created/destroyed your compute engine? While the creation/destruction steps may take a little longer to run, given that you are running this once an hour, a minute or two delay might not be egregious.
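A rough sketch of that create/destroy cycle with the gcloud CLI; the instance name, zone and image are placeholders.

# Create a fresh container-backed VM for this run...
gcloud compute instances create-with-container etl-runner \
  --zone us-central1-a \
  --container-image gcr.io/my-project/etl:latest

# ...and delete it entirely once the job has finished, instead of just stopping it.
gcloud compute instances delete etl-runner --zone us-central1-a --quiet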
We are working with the fabric-ca Docker image. It does not come with scp installed, so we have two options:
Option 1: create a new image as described here
Option 2: install scp from the shell when the container is started
We'd like to understand the pros and cons of each.
Option 1 allows you to build on the image further, gives you a stable state, and lets you verify/test the image before releasing it.
Option 2 takes longer to start up and requires being online when the container starts. It is also harder to trace, understand and manage a software stack locked away in e.g. the bash scripts that start the containers, versus a Dockerfile and whatever technology you end up using for container orchestration.
Ultimately, I use option 2 only for discovery, proof of concept or trying something out. Once I know I need a certain container on an ongoing basis, I build a proper image via a Dockerfile.
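As a sketch of what "a proper image" means here (the base tag and package manager are assumptions; Alpine-based fabric-ca tags would need apk add --no-cache openssh-client instead):

# Build a derived image once, with scp (openssh-client) baked in.
docker build -t fabric-ca-with-scp - <<'EOF'
FROM hyperledger/fabric-ca:latest
RUN apt-get update && apt-get install -y openssh-client && rm -rf /var/lib/apt/lists/*
EOF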
You should consider your option 2 a non-starter. Either build a custom image or use a host directory bind-mount (docker run -v /host/path:/container/path option) to inject the data you need; I would probably prefer the bind-mount option.
It’s extremely routine to docker rm a container, and when you do, any changes you’ve made locally in a container are lost. For example, if there is a new software release or a critical security update, you have to recreate the container with a new image. You should pretty much never install software in an interactive shell in a container, especially if you’re going to use it to copy in data your application needs: you’ll have to repeat this step every single time you delete and recreate the container.
Option 1:
The BUILD of the image takes longer, but you execute it only the first time
The RUN is faster
You don't need an internet connection at RUN
Includes verification of the different steps
Allows traceability
Option 2:
The RUN is longer
You need an internet connection at RUN
Harder to trace
I'm trying to create a Docker container based on CentOS 7 that will host R, shiny-server, and rstudio-server, but I need to have systemd in order for the services to start. I can use the systemd-enabled centos image as a basis, but then I need to run the container in privileged mode and allow access to /sys/fs/cgroup on the host. I might be able to tolerate the less secure situation, but then I'm not able to share the container with users running Docker on Windows or Mac.
I found this question but it is 2 years old and doesn't seem to have any resolution.
Any tips or alternatives are appreciated.
UPDATE: SUCCESS!
Here's what I found: For shiny-server, I only needed to execute shiny-server with the appropriate parameters from the command line. I captured the appropriate call into a script file and call that using the final CMD line in my Dockerfile.
rstudio-server was more tricky. First, I needed to install initscripts to get the dependencies in place so that some of the rstudio scripts would work. After this, executing rstudio-server start would essentially do nothing and provide no error. I traced the call through the various links and found myself in /usr/lib/rstudio-server/bin/rstudio-server. The daemonCmd() function tests cat /proc/1/comm to determine how to start the server. For some reason it was failing, but looking at the script, it seems clear that it needs to execute /etc/init.d/rstudio-server start. If I do that manually or in a Docker CMD line, it seems to work.
I've taken those two CMD line requirements and put them into an sh script that gets called from a CMD line in the Dockerfile.
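A minimal sketch of that script (the exact shiny-server arguments are omitted, since they depend on your configuration):

#!/bin/sh
# Start rstudio-server via its init script, as discovered above, then run
# shiny-server in the foreground so the container stays alive.
/etc/init.d/rstudio-server start
exec shiny-server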
A bit of a hack, but not bad. I'm happy to hear any other suggestions.
You don't necessarily need to use an init system like systemd.
Essentially, you need to start multiple services, and there are existing patterns for this. Check out this page about how to use supervisord to achieve the same thing: https://docs.docker.com/engine/admin/using_supervisord/
I often find myself needing to re-create a container with minor modifications to the arguments originally passed to docker run (things like changing published ports, the network, or the memory amount).
Now I am making images and running them in place of old containers.
This works fine, but I don't always have the original params to docker run saved, and sometimes (especially when there are a lot of things to define) it becomes a pain to recover them.
Is there any way to recover docker run arguments from existing container?
Sorry for being a couple of years late, but I had a similar question and no satisfying answer yet, so I still needed to find my way out.
I've found two sources addressing the issue:
A gist
To run, save this to a file, e.g. run.tpl and do docker inspect --format "$(<run.tpl)" name_or_id_of_running_container
A docker image
Quick run:
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nexdrew/rekcod <container>
Both solutions are quite simple to use, but the second one failed to generate the command for an Nginx container, because it did not manage to quote the command properly, like this: "nginx" "-g" "daemon off;"
So I focused on the first solution, which is a Go template intended to feed the --format parameter of docker inspect. I liked it because it is kind of simple and elegant, and no other tool is needed.
I've made some improvements in my forked gist and notified the original author about it.
A couple of answers to this. The first: run your containers using docker-compose; then you can just run the compose files and retain all your configuration. Obviously compose is designed for multi-container applications, but it is massively underrated for single-container, complex-run-argument use cases.
The second is to put your run command into a LABEL on the image. Take a look at Label Schema's docker.cmd etc. Then you can easily retrieve it from the image (or from your Dockerfile).
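For the LABEL route, a hedged example; the label key follows the label-schema convention, and the image name and run command are placeholders.

# Record the intended run command on the image at build time...
docker build -t myorg/myapp --label org.label-schema.docker.cmd="docker run -d -p 8080:80 myorg/myapp" .
# ...and read it back whenever you need to recreate the container.
docker inspect --format '{{ index .Config.Labels "org.label-schema.docker.cmd" }}' myorg/myapp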
The best way to do this is not to type the commands manually. Put them into a shell script... a .sh file on Linux/Mac, or a .cmd file on Windows. Then you just run the shell script to create your container and you never have to worry about re-typing the commands and options, you'll never get them wrong, etc.
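A hypothetical version of such a script, using the kinds of options mentioned in the question:

#!/usr/bin/env bash
# run-myapp.sh - keeps every docker run option in one versioned place.
docker run -d \
  --name myapp \
  -p 8080:80 \
  --network my-net \
  --memory 512m \
  myorg/myapp:latest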
Personally, I write my scripts with "npm scripts" in my package.json file. But the same thing can be done with any tool that can run a command-line program with arguments.
I do this along with a few other tricks to make sure I never fail to build my images or run my containers. It makes life with Docker soooo much easier. :)
You can use docker inspect to get the container's configuration. Reconstructing the docker run command from that can be somewhat tedious though.
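For example, a hedged one-liner that pulls out the fields most relevant to the question (image, network, memory and port mappings); the container name is a placeholder.

docker inspect --format '{{ .Config.Image }} {{ .HostConfig.NetworkMode }} {{ .HostConfig.Memory }} {{ json .HostConfig.PortBindings }}' mycontainer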
Another option is to search your shell history using either history | grep "docker run" or ctrl+r (if you use bash). That way, you don't need to go out of your way to save the commands but can still recover them quickly.
I'm new to Docker and was wondering if it was possible (and a good idea) to develop within a docker container.
I mean create a container, execute bash, install and configure everything I need, and start developing inside the container.
The container becomes then my main machine (for CLI related works).
When I'm on the go (or when I buy a new machine), I can just push the container, and pull it on my laptop.
This solves the problem of having to keep and synchronize your dotfiles.
I haven't started using Docker yet, so is this realistic or something to avoid (disk space problems and/or push/pull timing issues)?
Yes. It is a good idea, with the correct set-up. You'll be running code as if it was a virtual machine.
The Dockerfile configuration for creating a build system is not polished and will not expand shell variables, so pre-installing applications may be a bit tedious. On the other hand, after building your own image to create new users and a working environment, it won't be necessary to build it again; plus, you can mount your own file system with the -v parameter of the run command, so you can have the files you are going to need both on your host and in the container. It's versatile.
sudo docker run -t -i -v /home/user_name/Workspace/project:/home/user_name/Workspace/myproject <image-ID>
I'll play the contrarian and say it's a bad idea. I've done work where I've tried to keep a container "long running" and have modified it, but then accidentally lost it or deleted it.
In my opinion containers aren't meant to be long running VMs. They are just meant to be instances of an image. Start it, stop it, kill it, start it again.
As Alex mentioned, it's certainly possible, but in my opinion goes against the "Docker" way.
I'd rather use VirtualBox and Vagrant to create VMs to develop in.
A Docker container for development can be very handy. Depending on your stack and preferred IDE, you might want to keep the editing part outside, on the host, and instead mount the directory with the sources from the host into the container, as per Alex's suggestion. If you do so, beware of potential performance issues on Mac OS X with boot2docker.
I would not expect much from the workflow of pushing images to sync between dev environments. IMHO, keeping Dockerfiles together with the code and syncing them by SCM means is a more straightforward direction to start with. I also keep supporting Makefiles to build the image(s) / run the container(s) in the same place.