Copy many large files from the host to a docker container and back - docker

I am a beginner with Docker and I have been searching for 2 days now and I do not understand which would be a better solution.
I have a docker container on a Ubuntu server. I need to copy many large video files to the Ubuntu host via FTP. Docker via cron will process the videos using ffmpeg and save the result to the Ubuntu host somehow so the files are accessible via FTP.
What is the best solution:
create a bind drive - I understand the host may change files in the bind drive
create a volume but I do not understand how may I add files to the volume
create a folder on the Ubuntu and have a cron that will copy using "docker cp" command and after a video has been processed to copy it to the host?
Thank you in advance.

Bind-mounting a host directory for this is probably the best approach, for exactly the reasons you lay out: both the host and container can directly read and write to it, but the host can't easily write to a named volume. docker cp is tricky, you note the problem of knowing when the process is completed, and anyone who can run any docker command at all can pretty trivially root the host; you don't want to give this permission to something network-facing.
If you're designing a larger-scale system, you also might consider an approach where no files are actually shared at all. The upload server sends the files (maybe via HTTP POST) to an internal storage service, then posts a message to a message queue (maybe RabbitMQ). That then retrieves the files from the storage service, does its work, uploads the result, and posts a response message. The big advantages of this approach are being able to run it on multiple systems, easily being able to scale the individual components of it, and not needing to worry about filesystem permissions. But, it's a much more involved design.

Related

Dockerized executable read/write on host filesystem

I just dockerized an executable that reads from a file and creates a new file in the very directory that file came from.
I want to use Docker in that setup, so that I avoid installing numerous third-party libraries in the production environment.
My problem now: I have file /this/is/a.file on my underlying (host) file system and my executable is supposed to create /this/is/b.file.
As far as I see it, the only chance to get this done is by mapping a volume that points to /this/is and then let the executable know where I mounted it to in the docker, container.
Am I right? Or is there a way that I just pass docker run mydockerizedstuff /this/is/a.file without using Docker volumes?
You're correct, you need to pass in /this/is as a volume and the executable will write to that location.
If you want to constrain the thing even more, you can pass /this/is/b.file as a volume. You need to create it (simply via touch) beforehand, otherwise Docker will consider it a directory and create it as such for you, but you'll know that the thing won't be able to create /this/is/c.file or any other thing.

Editing Docker content

I am looking at Docker to share and contain applications, after reading several articles on the subject I can't figure out what the steps would be to use a Docker container for actual development. Is that even acceptable?
My thought process goes like this
Create DockerFile
Share DockerFile
User A and B download DockerFile
User A and B install images
User A and B are able to make changes to their local containers
User A and B submit changes
The way I have been reading different articles Docker is only to share applications but not for continuous development the way I am thinking, the closest I can think of on what I am explaining above is to make changes outside the containers and commit to a repo outside the containers, then the containers will update the local repo and re-run the application internally but you would never develop on the container itself.
Using docker for development process is not only possible, but handy and convenient in my opinion.
What you might have missed during your study of the docker ecosystem is the concept of volumes.
With volumes you can bind mount a directory of your host (the developer computer) into the container.
You may want to use volumes to share application data folder, making it possible for the devs to work on their local copies normally, but have their application served by a docker container.
A link to get started: https://docs.docker.com/engine/admin/volumes/volumes/

Docker separation of concerns / services

I have a laravel project which I am using with docker. Currently I am using a single container to host all the services (apache, mySQL etc) as well as the needed dependencies (project files, git, composer etc) I need for my project.
From what I am reading the current best practice is to put each service into a separate container. So far this seems simple enough since these services are designed to run at length (apache server, mySQL server). When I spin up these 'service' containers using -d they remain running (docker ps) since their main process continuously runs.
However, when I remove all the services from my project container, then there is no main process left to continuously run. This means my container immediately exits once spun up.
I have read the 'hacks' of running other processes like tail -f /dev/null, sleep infinity, using interactive mode, installing supervisord (which I assume would end up watching no processes in such containers?) and even leaving the container to run in the foreground (taking up a terminal console...).
How do I network such a container to keep it running like the abstracted services but detached without these hacks? I cannot seem to find much information on this in the official docker docs nor can I find any examples of other projects (please link any)
EDIT: I am not talking about volumes / storage containers to store the data my project processes, but rather how I can use a container to store the project itself and its dependencies that aren't services (project files, git, composer)
when you run the container try running with the flags ...
docker run -dt ..... etc
you might even try .....
docker run -dti ..... etc
let me know if this brings any joy. has certainly worked for me on occassions.
i know you wanted to avoid hacks but if the above fails then also add ...
CMD cat
to the end of your Dockerfile - it is a hack but is the cleanest hack :)
So after reading this a few times along with Joachim Isaksson's comment, I finally get it. Tools don't need the containers to run continuously to use. Proper separation of the project files, services (mySQL, apache) and tools (git, composer) are done differently.
The project files are persisted within a data volume container. The services are networked since they expose ports. The tools live in their own containers which share the project files data volume - they are not networked. Logs, databases and other output can be persisted in different volumes.
When you wish to run one of these tools, you spin up the tool container by passing the relevant command using docker run. The tool then manipulates the data within the directory persisted within the shared volume. The containers only persist as long as the command to manipulate the data within the shared volume takes to run and then the container stops.
I don't know why this took me so long to grasp, but this is the aha moment for me.

How to share images between multiple docker hosts?

I have two hosts and docker is installed in each.
As we know, each docker stores the images in local /var/lib/docker directory.
So If I want to use some image, such as ubuntu, I must execute the docker pull to download from internet in each host.
I think it's slow.
Can I store the images in a shared disk array? Then have some host pull the image once, allowing every host, with access to the shared disk, to use the image directly.
Is it possible or good practice? Why docker is not designed like this?
It may need to hack the docker's source code to implement this.
Have you looked at this article
Dockerizing an Apt-Cacher-ng Service
http://docs.docker.com/examples/apt-cacher-ng/
extract
This container makes the second download of any package almost instant.
At least one node will be very fast, and I think it should possible to tell the second node to use the cache of the first node.
Edit : you can run your own registry, with a command similar to
sudo docker run -p 5000:5000 registry
see
https://github.com/docker/docker-registry
What you are trying to do is not supposed to work as explained by cpuguy83 at this github/docker issue.
Indeed:
The underlying storage driver would need to synchronize access.
Sharing /var/lib/docker is far not enough and won't work!
According to the doc.docker.com/registry:
You should use the Registry if you want to:
tightly control where your images are being stored
fully own your images distribution pipeline
integrate image storage and distribution tightly into your in-house development workflow
So I guess that this is the (/your) best option to work this out (but I guess that you got that info -- I just add it here to update the details).
Good luck!
Update in 2016.1.25 docker mirror feature is deprecated
Therefore this answer is not applicable now, leave for reference
Old info
What you need is the mirror mode for docker registry, see https://docs.docker.com/v1.6/articles/registry_mirror/
It is supported directly from docker-registry
Surely you can use public mirror service locally.

Graphite installation in a docker container - volume query

I am installing graphite via a docker container.
I have seen that whisper files should not be saved in the container.
So I will be using a data volume from docker to save these on the host machine.
My question is is there anything else I should be saving on the host (I know this is subjective so I guess Im looking for recommendations on whats important)?
Don't believe I need configuration e.g. carbon conf as this will come from my installation
So I'm thinking are there any other files from graphite I need (e.g log files etc)?
What is your reason for keeping log files? Though you do need the directory structure in place. Logging defaults to /opt/graphite/storage/logs. In here you have carbon-cache/ and webapp/ directories. The log directory for the webapp is set in the config- local_settings.py, whereas carbon uses carbon.conf. The configs are well documented so you can look into them for specific information.
Apart from configs that are generated during installation the only other 'file' crucial for the webapp to work is graphite.db in the /opt/graphtie/storage. It is used internally by the django webapp for housekeeping information such as user-auth etc. It gets generated by python manage.py --syncdb so i believe you can generate it again at the target system.

Resources