Docker and file upload server

I am new to the Docker concept and trying to figure out how an image and a container work. I currently have a running server that lets users upload files and crop them on demand using a PHP tool (this is a theoretical case for this question).
As I understand it, a lazy way would be:
To make an Ubuntu image with a running Apache serving both the upload and the cropping service.
Then I would run a container from this image and expose the upload directory so that the uploaded content is saved somewhere on my computer.
No uploaded content will be stored in the image at any time; all the content must be stored either within the container or on my computer through a path mapping. Is this accurate?
PS: I am aware that the good practice would be to have the upload service in one container and the crop service in another running container.

Yes, that is correct.
Containers can store modifications, and can even be saved to new images if desired. However, there is a limit on how much a container's filesystem can grow (around 8 GB, I think, unless you reconfigure it). For something like building an image library, or analyzing big data, it would be best to link in a volume in the run command with the -v option.
No matter what happens in the container, the original image is unmodified unless you explicitly overwrite that image from the host. For example, if someone uses an exploit to hijack the task in the container (like an SQL injection against poorly written third-party code running in the container), only that one container is modified. You could respond by stopping the exploited container, copying out some log files for forensic examination (using docker cp), and then running a new container from the image (which would still have the same weaknesses, but without any of the unauthorized changes that might exist in the stopped container).
In the Dockerfile you use to build the image, you would probably expose port 80 if your Apache will run on port 80. Later, when running the container with docker run, you can choose whether to publish port 80 (-p 80, which maps it to a random host port), remap it to a specific host port (-p 8080:80), or publish all exposed ports to random host ports (-P).
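As a rough sketch of what that could look like (the package list, paths, and image tag here are assumptions for illustration, not a complete working service):

# Hypothetical Dockerfile: Ubuntu base with Apache + PHP serving both the upload and crop services
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y apache2 php libapache2-mod-php && rm -rf /var/lib/apt/lists/*
COPY ./app /var/www/html
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]

You would then build it and run a container with the upload directory mapped to a path on your computer, for example:

docker build -t upload-crop .
docker run -d -p 8080:80 -v /home/me/uploads:/var/www/html/uploads upload-crop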

Related

Blazor server app multi-platform and docker data persistency

I am creating a core library for Blazor Server apps that creates a core DB automatically at runtime.
Until now, I have created the database in Environment.SpecialFolder.LocalApplicationData, which I got working on multiple platforms (OSX, Ubuntu and Windows).
As I just discovered Docker's simplicity for deploying images, I am trying to make my library compatible with it.
So I face two issues:
Determine if the app is hosted on a Docker image or not
Persist data on a different volume that is NOT on the host if running Docker.
Of course, if in Docker, I shall not use Environment.SpecialFolder.LocalApplicationData as this is not a persistent location on the image itself. I can mount a volume when starting the image as described here.
So my natural idea is to assume users will mount a volume with a specific path when starting the image, say
docker volume create MyAppDB
and run it with
docker run -dp 3000:3000 -v MyAppDB:/app/Data/MyAppDB myBuildDockerImage
1. can be verified by testing the existence of the folder /app/Data/MyAppDB, and once 1. is verified, 2. becomes trivial.
If the folder does not exist, I am for sure in a non-Docker environment... Well, am I? What if users forgot to mount the volume? Or misspelled it? Maybe the folder does not exist simply because I am running in a non-Docker environment...!
Is there a way to tweak my Docker image when building it to force the volume mounts - i.e. created by ME and not the end-user? That seems safest... Alternatively, if that is not possible, can I add some specific element to the Docker image so I can know for sure whether or not I am running in the image I built?
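To make the question concrete, here is a hedged sketch of the kind of Dockerfile being described (the base image, app name, and variable name are invented for illustration): declare the volume in the image and bake a marker environment variable into it at build time, which the library could then test for at runtime. (If I recall correctly, the official .NET images also set DOTNET_RUNNING_IN_CONTAINER=true, which can serve as a more general "am I in a container" check.)

# Hypothetical Dockerfile: VOLUME declares the data location, and the ENV value
# acts as an "I am running in the image I built" marker the library can test for.
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY ./publish .
ENV MYAPP_RUNNING_IN_MY_IMAGE=true
VOLUME /app/Data/MyAppDB
ENTRYPOINT ["dotnet", "MyBlazorApp.dll"]

Note that VOLUME only guarantees that some volume (an anonymous one, if the user does not supply -v) backs that path; it cannot force users to mount a specific named volume.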

How does Docker detect which changes should be saved and which not?

I know that when we stop a Docker container our changes are lost. There are many answers on how to prevent this - commit each time. The idea is that when Docker runs, it will spin up a fresh container based on the image. On the other hand, a container persists some data after it exits, unless you start it using --rm.
Just to simplify:
If you run apt-get install vim, you must commit to save the change
BUT if you change nginx.conf or upload a new file to HDFS, you do not lose the data.
So, just curious:
How does Docker know what to save and what not? Ex: at the end of apt-get install we have new files in the system. The same happens when I upload a new file. For the container/image there is NO difference, right? Just an I/O modification. So how does Docker know which modifications should be saved when we stop the image?
The basic rules here:
Anything you explicitly store outside the container — a database, S3 — will outlive the container.
If you attach a volume to the container when you create the container using a docker run -v option or a Docker Compose volumes: option, any data written to that directory outlives the container. (If it’s a named volume, it lasts until you docker volume rm it.)
Anything else in the container filesystem is lost as soon as you docker rm the container.
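A quick sketch of the second and third rules in action, using busybox and a throwaway volume name purely for illustration:

# The file written into the named volume survives removing the container;
# the file written elsewhere in the container filesystem is lost with it.
docker volume create appdata
docker run --name demo -v appdata:/data busybox sh -c 'echo keep > /data/file; echo lose > /tmp/file'
docker rm demo
docker run --rm -v appdata:/data busybox cat /data/file

The last command prints "keep"; the file under /tmp is gone along with the removed container.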
If you need things like your application source code or a helper tool installed in an image, write a Dockerfile to describe how to build the image and run docker build. Check the Dockerfile into source control alongside your application.
The general theory of working with Docker is that you always start from a clean slate. When you docker build an image, you start from a base image and install your application into it; you never try to upgrade an installed application. Similarly, when you docker run a container, you start from a fresh copy of its image.
So the clearest answer to the question you ask is really, if you consistently docker rm a container when you stop it, when you docker run a new container, it will have the base image plus the content from the mounted volumes. Docker will never automatically persist anything outside of this.
You should never run docker commit: this leads to magic images that can’t be recreated later (in six months when you discover a critical security issue that risks taking your site down). Similarly, you should never install software in a running container, because it will be lost as soon as the container exits; add it to your Dockerfile and rebuild.
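As an illustrative sketch (the base image and file names are assumed, not taken from the question), the "add it to your Dockerfile and rebuild" advice looks like this:

# Instead of apt-get install inside a running container plus docker commit,
# bake the tool and the config into an image you can always rebuild.
FROM nginx:1.25
RUN apt-get update && apt-get install -y --no-install-recommends vim && rm -rf /var/lib/apt/lists/*
COPY nginx.conf /etc/nginx/nginx.conf

Then docker build -t my-nginx . reproduces exactly the same image from source control whenever you need it.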
For any container on the Docker platform, by default all generated data is temporary: no data will persist if you have not mounted part of the filesystem, i.e. if you have not attached volumes to the container.
If you are finding that nginx.conf is getting reused even after changes, I would suggest checking which directories you are mounting or mapping to Docker volumes.
The nginx configuration resides at /etc/nginx/conf.d/*, and you might be mapping a volume onto this directory. In that case, if you make changes in a running container and then remove the container, the data will still persist, because it is written to the volume rather than only to the container's writable layer. If you later deploy a new container with the same volume mapping, you will find that all the changes you made previously are reflected in the newer container as well.
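A small sketch of that behaviour, assuming the stock nginx image and a volume name invented for the example:

docker volume create nginx-conf
docker run -d --name web1 -v nginx-conf:/etc/nginx/conf.d nginx
# change the config from inside the first container
docker exec web1 sh -c 'echo "# my tweak" >> /etc/nginx/conf.d/default.conf'
docker rm -f web1
# a brand-new container using the same volume still sees the tweak
docker run --rm -v nginx-conf:/etc/nginx/conf.d nginx cat /etc/nginx/conf.d/default.conf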

Writing and reading files to/from host system on Docker

Context:
I have a Java Spring Boot Application which has been deployed to run on a Docker Container. I am using Docker Toolbox to be precise.
The application exposes a few REST APIs to upload and download files. The application works fine on Docker, i.e. I'm able to upload and download files using the API.
Questions:
In the application I have hard coded the path as something like "C:\SomeFolder". What location is this stored on the Docker container?
How do I force the application when running on Docker to use the Host file system instead of Docker's File system?
This is all done by Docker Volumes.
Read more about that in the Docker documentation:
https://docs.docker.com/storage/volumes/
In the application I have hard coded the path as something like "C:\SomeFolder". What location is this stored on the Docker container?
c:\SomeFolder, assuming you have a Windows container. This is the sort of parameter you'd generally set via a command-line option or environment variable, though.
How do I force the application when running on Docker to use the Host file system instead of Docker's File system?
Use the docker run -v option or an equivalent option to mount some directory from the host on that location. Whatever the contents of that directory are on the host will replace what's in the container at startup time, and after that changes in the host should be reflected in the container and vice versa.
If you have an opportunity to rethink this design, there are a number of lurking issues around file ownership and the like. The easiest way to circumvent these issues is to store data somewhere like a database (which may or may not itself be running in Docker), use network I/O to send data to and from the container, and store as little as possible in the container filesystem. docker run -v is an excellent way to inject configuration files and get log files out in a typical server-oriented use.
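A hedged sketch of that pattern (the host path, container path, and environment variable are placeholders; with Docker Toolbox the host directory would typically live under the VM's /c/Users mount):

# Bind-mount a host folder into the container and point the application at it
# through configuration instead of the hard-coded C:\SomeFolder path.
docker run -d -p 8080:8080 -e STORAGE_DIR=/data/uploads -v /c/Users/me/uploads:/data/uploads my-spring-app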

How do you access/pull data from an outside server into a Docker container?

I have run into more and more data scientists who use Docker containers, in order to allow for reproducible analyses.
Question: How do you download/pull data into a Docker container?
If the data is downloadable via a URL, naturally you could add a line like this in the Dockerfile
RUN wget www.server_to_data.org/path/path/myfile.gz
But I have data sitting on a server, whereby users ssh into the server with a key-pair in ~/.ssh/id_rsa.pub. I'm not sure how this could work security-wise.
How does one normally download or access the data in this case?
One could possibly mount the server, but I'm not sure how one would access it from within the container/VM.
For your current situation, where you've got the data on a server and you're handing out key pairs to people who should have access, and you want to just use that existing infrastructure without changing it: this could be done by declaring a volume for the SSH keys in the image, and then people running the image would start the container with that volume pointed at their SSH key.
Set a volume in the image with the Dockerfile:
FROM ubuntu
#[RUN your installation process]
VOLUME /home/container_user/.ssh
Run the container, mounting the location of the SSH key onto that volume:
docker run -d -v PATH_TO_DIRECTORY_HOLDING_SSH_KEY:/home/container_user/.ssh [OTHER OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]
Then you can download the data as part of the script that runs when the container is started.
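For example, a hypothetical startup script (the host name, remote path, and key file name are placeholders, not taken from the question) might look like:

#!/bin/sh
# entrypoint.sh: fetch the data over SSH using the mounted key, then hand
# control to whatever command the container was asked to run.
set -e
scp -i /home/container_user/.ssh/id_rsa user@data-server:/path/to/myfile.gz /data/myfile.gz
exec "$@"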
The basic idea is lifted from How can I get my ~/.ssh keys into a docker container running locally?
That said, if we back the question up a little and ask how exactly people are going to be using your image, where the image is going to be stored (public or private repo), and how often the data changes, there may be some more user-friendly ways to satisfy the need. Also, if you allow docker-compose to be the means by which the container is run, there are some other options available to you.
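For instance, a hedged docker-compose sketch (the service and image names are invented) that bind-mounts the user's key read-only into the location the VOLUME above expects:

# docker-compose.yml (illustrative only)
version: "3.8"
services:
  analysis:
    image: your-analysis-image
    volumes:
      - ~/.ssh:/home/container_user/.ssh:ro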

Docker Storage - Getting a Layman's answer

I am just discovering Docker - I am finding so much information, but I can't seem to get a straight answer on this. If someone could give me a clear explanation based on the understanding I have of it so far, it would be appreciated.
I am downloading a Docker image locally - say the sample one from Microsoft, microsoft/dotnet-samples:dotnetapp-nanoserver - and I am lost as to where this is downloaded to. Is it downloaded and installed as a program on the host machine, with an isolated script that controls the container? The download is about 1.3 GB because it includes .NET Core.
In another example, if I download apache2 to run as a web server, does it get installed in the default paths on the host system, with every container I want to use tapping into that - or does every container contain its own isolated copy of apache2?
I ask this because I can't find files that mimic the file size of these programs.
I know they are not complete VMs, but where can I find the files associated with a container?
I am using Windows Server 2016 and a Mac since I want to do some trials with containers.
An image is a filesystem
Docker images are encapsulated filesystems. The software and files inside are not being directly installed onto your system.
You can think of a Docker image sort of the way you think of a .zip file. You can download a .zip file from somewhere, and it is a single file. Contained inside it might be one file, or dozens of files, or a nested tree of directories and files. But on your disk, it exists as one file.
A Docker image is similar (conceptually, at least... the details are more complicated).
Image storage
Where images are stored varies by platform. On a Linux system, they are usually under /var/lib/docker. I don't know where they are stored on Windows, but this is a more or less opaque store. Poking around inside will not reveal very much to you anyway.
To see what you have, you should use the docker images command. It will show you the images you have stored locally.
Each image may consist of multiple layers. By default, that command will only show you the top-level images, which are the ones you'll care about for running containers. Technically, there are other layers underneath, and you can see all of them using docker images -a.
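For example (the ID, age, and size below are illustrative only, though the size matches the roughly 1.3 GB download you observed):

docker images
REPOSITORY                 TAG                    IMAGE ID       CREATED       SIZE
microsoft/dotnet-samples   dotnetapp-nanoserver   0123456789ab   2 weeks ago   1.3GB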
Where is the software installed?
When you download an Apache image, nothing is installed on your system at all. The image file(s) are downloaded and stored. Hiding inside is Apache and everything Apache needs in order to run, but Apache is not installed onto your Windows OS anywhere.
When you want to use Apache, you would run a container. Docker takes the Apache image and, using it as a starting template, creates a running container inside of which Apache is running. This is isolated from your operating system. Apache is only running inside of the container.
If you run a second container from the Apache image, you now have two completely separate Apache instances running, each in their own isolated filesystem environment.
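For example, sticking with the hypothetical apache:latest image name used below, running it twice gives you two independent containers:

# Two isolated Apache instances from one image, published on different host ports
docker run -d --name apache1 -p 8081:80 apache:latest
docker run -d --name apache2 -p 8082:80 apache:latest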
Where can I find the files?
If you just want to poke around in the container filesystem, you can start the container in interactive mode, and run a shell instead of whatever it normally runs (like Apache). For instance, if you have an image apache:latest, you can do this:
docker run --rm -it apache:latest bash
This will run an instance of apache:latest, but instead of launching Apache, it will run a bash shell and drop you into it.
The --rm flag is convenient for cases like this. It tells Docker to remove the running container when its process exits. That way for a "just looking at something" container like this one, it cleans up after itself.
The -it is actually two flags. -i is interactive mode, and -t allocates a terminal. These are common flags to pass when you want to directly interact with the container.
Once inside, you can use the usual commands to look at files and directory listings. Note that many containers are stripped-down, though. You don't always have all of the tools you are used to having. Things like ls in Linux are typically there, but a lot of things will not be.
Simply type exit when you're done looking around.
Looking around while the process is running
You can also look at the container while Apache is running. First start it normally.
docker run -d apache:latest
This will return a container ID. You can also get the ID from docker ps. Then you can attach to the container with that ID by executing a shell.
docker exec -it <container_id> bash
Now you're in a shell inside the container, and Apache is running in there alongside you.
