Move file downloaded in Dockerfile to harddrive

Move file downloaded in Dockerfile to harddrive - docker

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know that it'd be way more beneficial to learn the basics first, but I do require this to work in order to move on to other things for now.
So within a Dockerfile I installed wget & used it to download a file from a website, authentification & download are successful. However, when I later try move said file it can't be found, and it doesn't show up using e.g explorer either (path was specified)
I thought it might have something to do with RUN & how it executes the wget command; I read that the Id can be used to copy it to harddrive, but how'd I do that within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --username xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
Download is shown & log-in is successful, but as I mentioned I am having trouble using that file later on as it's not to be located on the harddrive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similiar to the "move to harddrive" one, I am searching for ways to get the id of the container created within the Dockerfile in order to move it; while the thread provides such answers, I haven't had any luck using them within the Dockerfile itself.

Short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system. Which kinda makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container, and mount a path from your host as a volume into the container. This is the only way how you can share files between the host and a container.
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to work well with your own image. As a general recommendation: For basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or so, where you want to use wget or curl but can't execute it directly.

Use wget -O instead of -P for specific file download
for e.g.,
RUN wget -O /tmp/new_file.txt http://something.com --username xyz --password bluh/new_file.txt
Thanks

Related

How to manage environment variables that point to credential files in docker container?

In my ~/.bashrc, I have set GOOGLE_APPLICATION_CREDENTIALS=~/.gc/credential_file_name.json.
My source code is located in (and I'm working from here) ~/repos/github_repo/ where I have a Dockerfile with its working directory set to /usr/src/app.
If I copy ~/.gc/credential_file_name.json to ~/repos/github_repo/credential_file_name.json and run the docker container with
docker run -t \
-e GOOGLE_APPLICATION_CREDENTIALS=/usr/src/app/credential_file_name.json \
...
the credential file gets picked up and subsequent code runs ok.
But, ideally, I don't want to copy the credential to my github repository, as that risks possibly pushing it on github (even when I add it to .gitignore, it's still not safe).
Additionally, instead of having to explicitly give then full path -e GOOGLE_APPLICATION_CREDENTIALS=/usr/src/app/credential_file_name.json, I would like to do something like -e GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS} where ${GOOGLE_APPLICATION_CREDENTIALS} gets picked up from my ~/.bashrc.
But obviously, ${GOOGLE_APPLICATION_CREDENTIALS} will point to a path on my computer, which different directory structure than the docker container.
What is the best way to resolve this? I'm new to this and I came across direnv and .envrc, but don't quite understand.
I'm using Makefile to run the docker commands. I will try to avoid docker-compose, but if it solves this problem, please let me know.
Thanks for help!

Can I run scripts from a docker build context without a copy?

I want to build on top of a windows docker container by installing a couple programs. The files total .5 GB and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build-context, and then have the build-context swept away at the end so I don't have a needless copy of the source files for the setup.exe embedded in my container layers. However, I have not found an example where this is the case. Instead I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the container layers because the COPY command creates a new layer when it's done?
I don't know if the container can see the build-context directly. I was hoping for some magical folder filled with the build-context files so I could run a script using it, but haven't found anything.
It seems like the alternative is to create a private file-server and perform a RUN that can download them from that private server and unpack them, run the install, and remove them (all as 1 docker step). I understand this would make it more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change as the container will build patches for a legacy application. Just seems like a lot to host files on a private, public-facing server for something that will get called once every couple years if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?

It's a long shot as Windows is a tricky thing with file system, but you could do this way:
In your Dockerfile use a COPY command, install then RUN del ... to remove the installation files
Build your image docker build -t my-large-image:latest .
Run your image docker run --name my-large-container my-large-image:latest
Stop the container
Export your container filesystem docker export my-large-container > my-large-container.tar
Import the filesystem to a new image cat my-large-container.tar | docker import - my-small-image
Caveat is you need to run the container once which might not be what you want. And also I haven't tested with windows container, sorry.

I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\\VS_RemoteTools_x64.exe -ArgumentList #('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
But otherwise, I don't think you can mount a custom volume while it's being built.

I didn't find a satisfactory answer to this. Docker seems designed for only the modern era and assumes you'll be able to download what you need via scripts and tools hitting APIs and file servers. The easiest option I found that I eventually went with was to host the files on a private file server or service (in my case, AWS S3).
I really wish there was a way to have files hosted by the docker daemon in some way, eg. if it acted like a temporary server that you could get data from via http instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.

Separating Docker files and application source files to optimize production environment

I have a bunch of (Ruby) scripts stored on a server. Up until now, my team has used them by opening an accessor app that launches a list of the script names, and they select the script they want to run in that instance on the files in their working folder. The scripts are run directly from the server, so updates made to the script files are automatically reflected when a user runs the script.
The scripts require a fair amount of specific dependencies, so I'm trying to move to a Docker-based workflow to eliminate the problems we encounter with incongruent computer environments. I've been able to successfully build an image with our script library and run an instance of it on my computer.
However, all of the documentation and tutorials include the application source files when building an image, so that all the files are copied over by the Dockerfile. From my understanding, this means that any time the code in the application files needs to be updated, all the users will need to rebuild the image before trying to run anything. I would very rarely ever need to make changes to the environment settings/dependencies, but the app code is changed relatively frequently, so it seems like having every user rebuild an image every single time a line of app code is changed would actually slow down everyone's workflow considerably.
My question is this: Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored? And does a new container need to be created every single time a user wants to run any one of the scripts? (The users are not tech-savvy.)

Generally you'd do this by using a Docker image instead of the checked-out tree of scripts. You can use a Docker registry to store a built copy of the image somewhere on the network; Docker Hub works for this, most large public-cloud providers have some version of this (AWS ECR, Google GCR, Azure ACR, ...), or you can run your own. The workflow for using this would generally look like
# Get any updates to the "latest" version of the image
# (can be run infrequently)
docker pull ourorg/scripts
# Actually run the script, injecting config files and credentials
docker run --rm \
-v $PWD/config:/config \
-v $HOME/.ssh:/config/.ssh \
ourorg/scripts \
some_script.rb
# Nothing in this example actually requires a local copy of the scripts
I'm envisioning a directory that has kind of a mix of scripts and support files and not a lot of organization to it. Still, you could write a simple Dockerfile that looks like
FROM ruby:2.7
WORKDIR /opt/scripts
# As of Bundler 2.1, there is no compatibility between Bundler
# versions; this must match exactly what is in Gemfile.lock
RUN gem install bundler -v 2.1.4
# Copy the scripts in and do basic installation
COPY Gemfile Gemfile.lock .
RUN bundle install
COPY . .
ENV PATH /opt/scripts:$PATH
# Prefix all commands with...
ENTRYPOINT ["bundle", "exec"]
# The default command to run is...
CMD ["ls"]
On the back end you'd need a continous integration service (Jenkins is popular if a little unwieldy; there are a large selection of cloud-hosted ones) that can rebuild the Docker image whenever there's a commit to the source repository. You can generally rig this up so that it happens automatically whenever anybody pushes anything.
This process makes more sense of most people are just using the set of scripts and few of them are developing them. It's also a little bit difficult to discover what the scripts are (you might be able to docker run --rm ourorg/scripts ls though).
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
This always strikes me as an ineffective use of Docker. You have all of the fiddly steps of your current workflow that require everyone to run a git pull or equivalent routinely, but you also have to inject the host source tree into the container. If there are OS incompatibilities in, for example, native gems in the vendor tree, you have to work around that.
# You still need to do this periodically
git pull
# And you also need to
sudo docker run \
--rm \
-v $PWD:/app \
-v $HOME/config:/config \
-v $HOME/.ssh:/config/.ssh \
-w /app \
ruby:2.7 \
bundle exec ./some_script.rb
Some of these details (especially the config file and credentials) you'd have to deal with even if you did build an image; some others of the details you could improve by building an image. Inside the image you need to correct the ownership and permissions on the ssh keys and replace the $PWD/vendor tree with something the container can run, without modifying the mounted host directories.

Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
You can build an image with all the environment already installed then mount the directory with the scripts so the container can read the scripts from the host. Something like
docker run -it --rm -v /opt/myscripts:/myscripts myimage somescript.rb
Then your image Dockerfile would end with:
WORKDIR /myscripts
ENTRYPOINT ["/usr/bin/ruby"]
And does a new container need to be created every single time a user wants to run any one of the scripts?
Of course, a container is just an isolated process managed by docker, you could make a wrapper so the users wouldn't need to type the full docker run command.

How do I edit the source code of a docker image?

I hope you are doing well.
I'm trying to rebuild a docker image.
What I mean is, I don't just want to get some files into the file system of the image, but want to edit the source code/the codebase itself... whatever it's called.
Especially, I'd like to make the image instances leave some log information.
But I'm totally clueless what to edit(even I can't find the source base code of that image)
Could you please help me edit the source code if you know how?
I would really appreciate. Thank you in advance.

I'd like to make the image instances leave some log information
This requirement may be reached with bind mounts:
$ docker run -d \
-it \
--name container-name \
-v "$(pwd)"/logs:/app/logs \
your-image
Here, $(pwd)/logs is a directory on your host filesystem that will contain the logs, and /app/logs is a directory that your application uses to write logs inside the container. Of course, you need to modify these according to your needs.
The other requirement might as well be achieved in a similar way:
I don't just want to get some files into the file system of the image, but want to edit the source code/the codebase itself
It depends on the tech stack you use for development. For example, if your app is written in PHP, you can mount source code folder to the container, and each time you modify a file, the same version will "appear" inside the container, since PHP is an interpreted language that does not require compilation.
If you use, for example, Go, this will not work the same way, since Go programs require compilation, and it is not enough to update source code inside the container. In such case you'll have to build the image again each time you need to make a change.

Copying an exe and composing it as a docker image and making it platform independent

I need to create a Docker image, which when run, should install an exe in the specified directory that mentioned in my docker file.
Basically, I need ImageMagick application. The docker file created should be platform independent, say if I ran in windows it should use windows distribution, Linux means Linux distribution. It would be great if it adds an environmental variable in the system. I browsed for the solution, but I couldn't find an appropriate solution.

I know it's a bit late but maybe someone (like me) was still searching.
I ended up using a java-imagemagick docker version from https://hub.docker.com/r/cpaitsupport/java-imagemagick/dockerfile
You can run docker pull cpaitsupport/java-imagemagick to get this docker image to your docker machine.
Now comes the tricky part: as I needed to run the imagemagick inside a docker container for my main app. Now you can COPY the files from cpaitsupport/java-imagemagick to your custom container. Example :
COPY --from=cpaitsupport/java-imagemagick:latest . ./some/dir/imagemagick
now you should have the docker file structure for your custom app and also under some/dir/imagemagick/ the file structure for imagemagick. Here are all ImageMagick relative files (also convert, magic, the libraries etc).
Now if you want to use ImageMagick in your Code you need to setup some ENV variables to your docker container with the "new" path to the ImageMagick directory. Example:
IM4JAVA_TOOLPATH=/some/dir/imagemagick/usr/bin \
LD_LIBRARY_PATH=/usr/lib:/some/dir/imagemagick/usr/lib \
MAGICK_CONFIGURE_PATH=/some/dir/imagemagick/etc/ImageMagick-7 \
MAGICK_CODER_MODULE_PATH=/some/dir/imagemagick/usr/lib/ImageMagick-7.0.5/modules-Q16HDRI/coders \
MAGICK_HOME=/some/dir/imagemagick/usr
Now delete (in Java Code) ProcessStarter.setGlobalSearchPath(imPath); this part if it is set. So you can use the IM4JAVA_TOOLPATH.
Now the ConvertCmd cmd = new ConvertCmd(); and cmd.run(op); should be working.
Maybe it's not the best way but worked for me and I was struggling a lot.
Hope this helps!
Feel free to correct or add additional info.

You can install (extract files) to the external hosting system using docker mount or volumes -
however you can not change system setting by updating environment variables of the hosting system from inside of the containers.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart