I have read this and the intro docs on Docker.io and I like the concept it presents. But, can you help me understand it a little better? Can you give me some practical examples and/or case studies on how Docker is used and when it makes sense to actually use it?
Just a side note, I have recently started using Vagrant to distribute a preconfigured DEV box to our development team (so we all use the same base system). I have even seen examples where Docker is used inside Vagrant and whatnot, but I don't get what the benefits of doing this are in a practical sense; I understand the difference between VMs and containers and the logical separation the latter provide, but when should I use one instead of the other, and when Docker inside Vagrant? (This is a more specific question, but I am mostly interested in the bigger picture, as outlined in the first question above.)
I participate in an effort to make software for doing science analysis more available to the research community. Often, the software in question is written by one individual or just a few without sufficient planning for reuse, such as one person on their own computer writing a Python script or a Matlab module. If the software works well, often others would like to try it themselves...but it can be a real challenge in some cases to successfully replicate an environment that's undocumented or difficult to reimplement.
Docker is a great tool to help others reuse software like this, since it is an even lower barrier to entry than writing a Vagrant script to install software in an environment. If I give a person a Docker container, she can do whatever she wants inside of it (write code, install libraries, set up the environment, etc.). When it's "done", she can save an image of it, publish the image in a Docker repository, and tell another researcher, "here it is, just start it up and run this..."
We are also considering using containers as our own configuration management strategy for delivering and archiving production software...at least the server-side components.
We have also done some work with writing scripts in Python and shell to run data processing workflows of multiple Docker containers. One demo that we concocted was to run OpenCV on an image to extract faces of people, then ImageMagick to crop out the faces, and finally ImageMagick again to make a collage of all of the faces. We built a container for OpenCV and a container for ImageMagick, then wrote a Python script to execute a "docker run ..." on each of the containers with the requisite parameters. The Python scripting was accomplished using the docker-py project which worked well for what we needed from it.
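As a rough illustration of that driver-script pattern (image names, mount points, and per-stage arguments below are hypothetical placeholders, not the actual containers from the demo), the pipeline boils down to assembling one "docker run" command per stage:

```python
# Hypothetical sketch of the driver-script pattern described above: one
# "docker run" per pipeline stage, each stage sharing a host directory.
# Image names and per-stage arguments are placeholders, not the real demo.

def docker_run_command(image, host_dir, container_dir="/data", args=()):
    """Build the argv list for one stage, mounting host_dir into the
    container and appending the stage-specific arguments."""
    return ["docker", "run", "--rm",
            "-v", f"{host_dir}:{container_dir}",
            image, *args]

def build_pipeline(workdir):
    """Three stages: detect faces, crop them out, assemble a collage."""
    return [
        docker_run_command("opencv-faces", workdir, args=("detect", "/data/photo.jpg")),
        docker_run_command("imagemagick", workdir, args=("crop", "/data/faces.json")),
        docker_run_command("imagemagick", workdir, args=("montage", "/data/faces")),
    ]

if __name__ == "__main__":
    for cmd in build_pipeline("/tmp/faces-demo"):
        print(" ".join(cmd))
        # On a machine with Docker: subprocess.run(cmd, check=True)
```

Newer versions of docker-py expose a client API (rather than shelling out), but the shape of the pipeline stays the same.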
Have a look at "how and why Spotify uses Docker" for a case study.
To answer your last question:
I have even seen examples where Docker is used inside Vagrant and whatnot, but I don't get what the benefits of doing this are in a practical sense; I understand the difference between VMs and containers and the logical separation the latter provide, but when should I use one instead of the other, and when Docker inside Vagrant?
Docker is frequently used inside Vagrant because it doesn't currently run natively on Mac OS X (see Kernel Requirements), which is very commonly used by developers.
So, to have your dev team work on the same containers, building and testing products on a laptop and later "running at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds and more", you need Vagrant on their Mac OS X machines.
That said, here you can see another awesome case study.
There is a nice Docker hack day use case:
Auto-deployment of a Java stack with Git and Jenkins: you push your code to your containerized Git repository, which triggers a Jenkins build, so your webapp is packaged into a Docker container and run by Docker.
https://www.youtube.com/watch?v=Q1l-WoJ7I7M
I want to access my NVIDIA GPUs from inside containers. Can I do this without nvidia-container-runtime?
Requiring a custom Docker runtime just to talk to one device seems very strange. There is a whole universe of PCI devices out there. Why does this one need its own runtime? For example, suppose I had both NVIDIA and AMD GPUs. Would I be unable to access both from inside one container?
I understand that nvidia-container-runtime lets me control which GPUs are visible via NVIDIA_VISIBLE_DEVICES. But I do not care about this. I am not using containers to isolate devices; I am using containers to manage CUDA/CUDNN/TensorFlow version h*ll. And if I did want to isolate devices, I would use the same mechanism as forever: By controlling access to nodes in /dev.
In short, the whole "custom runtime" design looks flawed to me.
So, questions:
What am I missing?
Can I obtain access to my NVIDIA GPUs using the stock Docker (or podman) runtime?
If not, why not?
I certainly won't be able to answer every conceivable question related to this. I will try to give a summary. Some of what I write here is based on what's documented here and here. My discussion here will also be focused on linux, and docker (not windows, not singularity, not podman, etc.). I'm also not likely to be able to address in detail questions like "why don't other PCI devices have to do this?". I'm also not trying to make my descriptions of how docker works perfectly accurate to an expert in the field.
The NVIDIA GPU driver has components that run in user space and also other components that run in kernel space. These components work together and must be in harmony. This means the kernel mode component(s) for driver XYZ.AB must be used only with user-space components from driver XYZ.AB (not any other version), and vice-versa.
Roughly speaking, docker is a mechanism to provide an isolated user-space linux presence that runs on top of, and interfaces to, the linux kernel (where all the kernel space stuff lives). The linux kernel is in the base machine (outside the container) and much/most of linux user space code is inside the container. This is one of the architectural factors that allow you to do neato things like run an ubuntu container on a RHEL kernel.
From the NVIDIA driver perspective, some of its components need to be installed inside the container and some need to be installed outside the container.
Can I obtain access to my NVIDIA GPUs using the stock Docker (or podman) runtime?
Yes, you can, and this is what people did before nvidia-docker or the nvidia-container-toolkit existed. You need to install the exact same driver in the base machine as well as in the container. Last time I checked, this works (although I don't intend to provide instructions here.) If you do this, the driver components inside the container match those outside the container, and it works.
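Purely as a sketch of the shape such a stock-runtime invocation can take (the device-node names are the conventional NVIDIA ones; verify against /dev on your own machine, and remember the container image must contain the exact same driver version as the host):

```python
# Sketch of a stock-runtime GPU invocation: expose the conventional NVIDIA
# device nodes with --device. No custom runtime is involved, but the image
# must contain the exact same driver version as the host. Node names are
# the usual ones; verify against /dev on your own machine.

NVIDIA_DEVICE_NODES = [
    "/dev/nvidia0",      # first GPU (nvidia1, nvidia2, ... for more GPUs)
    "/dev/nvidiactl",    # control device
    "/dev/nvidia-uvm",   # unified memory device
]

def stock_gpu_run_command(image, command, devices=NVIDIA_DEVICE_NODES):
    """Build a plain "docker run" argv list that passes the GPU device
    nodes through to the container."""
    cmd = ["docker", "run", "--rm"]
    for node in devices:
        cmd += ["--device", node]
    return cmd + [image, *command]
```

For example, `stock_gpu_run_command("my-cuda-image", ["nvidia-smi"])` builds the full command line; podman accepts `--device` in the same way.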
What am I missing?
NVIDIA (and presumably others) would like a more flexible scenario. The above description means that a container built with any driver version other than the one installed on your base machine cannot work. This is inconvenient.
The original purpose of nvidia-docker was to do the following: At container load time, install the runtime components of the driver, which are present in the base machine, into the container. This harmonizes things, and although it does not resolve every compatibility scenario, it resolves a bunch of them. With a simple rule "keep your driver on the base machine updated to the latest" it effectively resolves every compatibility scenario that might arise from a mismatched driver/CUDA runtime. (The CUDA toolkit, and anything that depends on it, like CUDNN, need only be installed in the container.)
As you point out, the nvidia-container-toolkit has picked up a variety of other, presumably useful, functionality over time.
I'm not spending a lot of time here talking about the compatibility strategy ("forward") that exists for compiled CUDA code, and the compatibility strategy ("backward") that exists when talking about a specific driver and the CUDA versions supported by that driver. I'm also not intending to provide instructions for use of the nvidia-container-toolkit, that is already documented, and many questions/answers about it already exist also.
I won't be able to respond to follow up questions like "why was it architected that way?" or "that shouldn't be necessary, why don't you do this?"
To answer my own question: No, we do not need nvidia-container-runtime.
The NVIDIA shared libraries are tightly coupled to each point release of the driver. NVIDIA likes to say "the driver has components that run in user space", but of course that is a contradiction in terms. So for any version of the driver, you need to make the corresponding release of these shared libraries accessible inside the container.
A brief word on why this is a bad design: Apart from the extra complexity, the NVIDIA shared libraries have dependencies on other shared libraries in the system, in particular C and X11. If a newer release of the NVIDIA libraries ever required features from newer C or X11 libraries, a system running those newer libraries could never host an older container. (Because the container would not be able to run the newer injected libraries.) The ability to run old containers on new systems is one of the most important features of containers, at least in some applications. I guess we have to hope that never happens.
The HPC community figured this out and made it work some time ago. Here are some old instructions for creating a portable Singularity GPU container which injects the required NVIDIA shared libraries when the container runs. You could easily follow a similar procedure to create a portable OCI or Docker GPU container.
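A minimal sketch of that injection idea, assuming the driver's user-space libraries live under the distro's usual library directory (the glob patterns below are illustrative; real driver installs scatter these across distro-specific paths):

```python
# Sketch of the library-injection idea: find the driver's user-space
# shared libraries on the host and bind-mount them read-only into the
# container at the same paths. The glob patterns are illustrative only.
import glob

DRIVER_LIB_PATTERNS = [
    "/usr/lib/x86_64-linux-gnu/libcuda.so*",
    "/usr/lib/x86_64-linux-gnu/libnvidia-*.so*",
]

def nvidia_bind_mount_flags(patterns=DRIVER_LIB_PATTERNS):
    """Return the -v flags that mirror the matching host libraries into
    a container, read-only."""
    flags = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            flags += ["-v", f"{path}:{path}:ro"]
    return flags
```

These flags would be combined with device pass-through (e.g. `--device=/dev/nvidiactl`) in the final `docker run` command.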
These days, Singularity supports a --nv flag to inject the necessary shared libraries automatically. It also supports a --rocm flag for AMD GPUs. (Yes, AMD chose the same bad design.) Presumably you could combine these flags if you needed both.
All of these details are pretty well-documented in the Singularity manual.
Bottom line: If you are asking the same question I was, try Singularity.
I am new to Docker and want to learn the ropes with real-life challenges.
I have an application hosted on IIS that has dependencies on SQL Express and SOLR.
I want to understand the following:
Is it possible to have my whole set-up, including enabling IIS, SQL, SOLR and my application, in one single container?
If point 1 is feasible, how should I start with it?
Sorry if my questions are basic.
It is feasible, just not good practice. You want to isolate the software stack to improve maintainability (easier to deploy updates), modularity (you can reuse a certain component in a different project, and even have multiple projects reusing the same image) and security (a software vulnerability in one component of the stack will hardly be able to reach a different component).
So, instead of putting it all together into the same image, I recommend using Docker Compose to have a separate image for each component of the stack (you can even pull generic, up-to-date images from Docker Hub) and assemble them from the Compose file, so that with a single command you can fire up all the components your application needs.
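As a sketch of that layout, a hypothetical docker-compose.yml might look roughly like the following; service names, image tags, and paths are assumptions, not a tested configuration (note also that an IIS app needs a Windows container, which constrains how you mix it with Linux-based SQL/Solr images):

```yaml
# Hypothetical compose file: one service per stack component.
# Image names/tags are assumptions; substitute the ones you actually use.
version: "3"
services:
  web:
    build: .                 # your IIS application image (Windows base)
    ports:
      - "80:80"
    depends_on:
      - db
      - solr
  db:
    image: microsoft/mssql-server-express   # hypothetical tag
    environment:
      SA_PASSWORD: "example"
      ACCEPT_EULA: "Y"
    volumes:
      - dbdata:/var/opt/mssql
  solr:
    image: solr:6
    volumes:
      - solrdata:/opt/solr/server/solr
volumes:
  dbdata:
  solrdata:
```

With this layout, `docker-compose up` starts the whole stack, and each component can be updated or reused independently.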
That being said, it is feasible to have the whole stack in the same Dockerfile, but it will be quite a mess. You'll need a Dockerfile that installs all the required software, which will make it bulky and hard to maintain. If you're really up for this, you'll have to start from a basic OS image (maybe Windows Server Core with IIS) and from there install all the other software manually. If there are Dockerfiles for the other components you need to install, and they share the same base image or a compatible one, you can copy-paste their contents straight into your Dockerfile, at the cost of said maintainability.
Also, you should definitely use volumes to keep your data safe, especially if you take this monolithic approach, since you risk losing data from the database otherwise.
TL;DR: yes, you can, but you really don't want to, since there are much better alternatives that are hardly any harder.
I am coding in a few different languages/technologies. Actually to be honest, I am only messing around, playing with golang, node.js, ruby on rails, etc.
But now I want to jump on the Docker bandwagon as well, but I am not sure what the benefits would be and if I should put in the effort.
What is the best practice for using Docker for development environments? Do I set up a separate container for each language or technology I dabble with? Or are containers overkill, and should I just set up one VM (a Linux VM on a Windows host) where I do all my development?
How do you guys use Docker for development work?
You should definitely go ahead and do that, as it is the best approach to follow, even if you share volumes between containers. Avoid setting up separate VMs if your workstation has the necessary hardware power and you do not need to distribute your environment across different workstations.
At my current company, I'm the guy responsible for setting up all the development environments among other things. We have a few monolithic applications but we're quickly decoupling multiple functionalities into separate micro-services.
The way we're starting to manage that is that every micro-service code repository is self-contained: docker-compose files, a Makefile for automation, tests, etc.
Developers just have to install docker-toolbox on their Mac OS X, clone the repo and type make. That will start the docker compose with all the links between the containers and all the necessary bits and pieces (DBs, Caches, Queues).
Direct link to the Makefile: https://github.com/marclop/prometheus-demo/blob/master/Makefile.
Also if you want to avoid setting up all the containers there's a few alternatives out there, for example Phusion's one: https://github.com/phusion/baseimage-docker.
I hope this answers your questions.
You shouldn't use Docker for your development environments; use regular VMs like VirtualBox for that if you want complete separation.
Docker is more suited for delivering finished code somewhere, e.g. to a staging environment.
The reason is that Docker containers are not ideal for persisted state unless you mess around with sharing volumes.
The answer to this is inherently subjective and tied to how you like to do development. It will also be tied to how you want to deploy these in a testing scenario.
Jonas is correct that the primary purpose of Docker is to provide finished code to a staging/production environment. HOWEVER, I have used it for development, and indeed it may be preferable depending on your situation.
To wit: let's say you have a single virtual server and you want to minimize the amount of space your environment uses. Docker containers share the host's Linux kernel (and can share base image layers), re-using them across instances. You can also minimize the RAM and CPU used for running the base Linux "pieces" by running the Docker containers on top of Linux.
Probably the most compelling reason (in your case) to use Docker is that it makes finding the base setup you want easier. There are plenty of pre-made Docker images that you can use to build your test/dev environment, and deploying your code to a different machine when you are finished is WAY easier with Docker than with VMware or VirtualBox (yes, you can create an OVF and that would work, but Docker is IMHO much easier).
I personally used Project Photon when I was playing around with this, which provided a very easy way to setup the base Docker setup in a VMWare environment.
https://blogs.vmware.com/cloudnative/introducing-photon/
The last time I used Docker was for an assignment in one of my classes where I had to play around with MongoDB on a local instance. Setting up MongoDB would have been trivial on either (or both) Windows or Linux, but I felt the opportunity to learn Docker was too good to pass up. In the end, I feel much more comfortable now with Docker :)
There are ways of backing up Containers, which can (as Jonas pointed out) get kind of messy, but it isn't outside the realm of a reasonably technical individual.
Good Luck either way you go! Again, either way will obviously work - and I don't see anything inherently wrong with either approach.
I viewed some video tutorials about Docker containers.
Yet its purpose is still not clear to me.
Would it make sense to use Docker for relatively small Wordpress projects as a regular web designer?
When does it make sense to use it in conjunction with Rails?
There are a number of reasons I can think of:
As a demo
Lots of people are familiar with Wordpress so it works well as an example of using Docker. You create the MySQL container and then the Wordpress container, which links to MySQL, and then you've got a simple application built from two pieces.
As a packaging system
You can think of Docker as an alternative way to install software. Rather than getting the right versions of PHP and MySQL installed and configuring plugins, you can just fetch a Wordpress image that's configured correctly.
In the context of a Rails app, the first part of getting the app working is to fetch a bunch of dependencies. This leads to the possibility that your app worked in development but some server is inaccessible and your app can't be deployed. Or you depended on some system tool without thinking about it, and the tool is only on your dev machine. Packaging your app in Docker means that you either have the image on the server (so everything's installed and working) or you don't (and it's obvious why your app isn't running).
For isolation and security
You can run multiple Wordpress instances in separate containers just like many providers do with VMs.
If someone's Wordpress server gets broken into, you've still got Docker isolating them from the other Wordpress instances and the hosting server. You can assign resource limits on containers so that nobody can hog the CPU or memory.
It's also trivial to run multiple versions of Wordpress side by side, even if they have incompatible dependencies.
As a development environment
(This doesn't really apply to Wordpress, unless you're involved in Wordpress development.)
One of my favorite uses of Docker is to take our production images, run them locally (giving me a personal copy of our production system) and then run destructive tests against my environment. When the tests are done, I restart all the containers and I'm right back to my starting state. And I can hunt for regressions by standing up a second complete system using last week's images and comparing the two systems' responses to the same requests.
Docker is useful for creating simple, binary-like building blocks for deploying complex applications. Personally, I use it for simple ones as well, as it reduces the number of things that you have to worry about and increases the repeatability of deployment tasks, but there are plenty of other tools (VMs, Chef, etc) that will help with that too, so YMMV.
I recently researched some best practices for Docker and came across different opinions on how, or whether, to handle the init process.
As pointed out here, the init process should not be run at all. I can follow the thought that a container should model a single process and not the whole OS.
On the other hand as described here there can be problems if I just ignore the basic OS services like syslog.
As is often the case, there is maybe no absolute answer on how to handle these cases. Can you share some experiences or more insights on this topic? To me, both approaches seem legit.
As is often the case, there is maybe no absolute answer on how to handle these cases. Can you share some experiences or more insights on this topic? To me, both approaches seem legit.
Spot on. There is no absolute answer to this question.
Now, having said that, I think that there are substantial advantages to the single-process-per-container model, because that really encourages you to create containers that are composable (like lego blocks: you can put them together in different combinations to solve a problem) and that are scalable (you can spin up more instances of a particular service without too much effort). By not doing crazy things like running an ssh daemon inside your container, you are discouraged from editing things "in place" and will -- hopefully -- be more likely to rely on Dockerfiles to generate your images, which leads to a much more robust, reproducible process.
On the other hand, there are some applications that don't lend themselves well to this model. For example, if you have an application that forks lots of child processes and doesn't properly wait() for them, you end up with a collection of zombie processes. You can run a full-blown init process to solve this particular problem, or you can run something simple like this (disclaimer: I wrote that) or this.
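As an illustration of that "something simple" idea (a toy sketch, not a reimplementation of either linked project), the reaping part only needs waitpid with WNOHANG:

```python
# Toy sketch of a "simple init" reaping zombies: an illustration of the
# idea, not a reimplementation of the projects linked above.
import os
import time

def reap_exited_children():
    """Non-blocking reap of every child that has already exited.
    Returns {pid: raw_wait_status}."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break      # no children at all
        if pid == 0:
            break      # children exist, but none have exited yet
        reaped[pid] = status
    return reaped

if __name__ == "__main__":
    # Demo: fork a child that exits immediately, then reap it.
    child = os.fork()
    if child == 0:
        os._exit(0)
    collected = {}
    deadline = time.time() + 5
    while child not in collected and time.time() < deadline:
        collected.update(reap_exited_children())
        time.sleep(0.01)
    print("reaped:", collected)
```

A real PID 1 would run this loop whenever it receives SIGCHLD, forever, alongside supervising its single application process.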
Some applications are just really tightly coupled, and while it's possible to run them in separate containers through liberal application of Docker volumes and --net=container:..., it's easier just to let them run in the same container.
Logging in Docker is particularly challenging. Running some sort of log collector inside a container along with your application can be one solution to that problem, but there are other solutions, too. Logspout is an interesting one, but I have also been looking at running systemd inside containers in order to make use of journald for logging. So, while I am still running one application process per container, I also have an init process, and a journald process.
So, ultimately, it really depends on the situation: both on your needs and the needs of the particular application you are trying to run. Even in situations where a single process per container isn't possible, designing containers to offer a single service still confers many of the advantages I mentioned in the first paragraph.