Containerized Azure Function does not see databricks odbc driver - docker

I have the following Dockerfile:
FROM AS installer-env
COPY . /src/dotnet-function-app
RUN cd /src/dotnet-function-app && \
mkdir -p /home/site/wwwroot && \
dotnet publish *.csproj --output /home/site/wwwroot
ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
#ODBCINI=/etc/ \
#ODBCSYSINI=/etc/odbcinst.ini \
WORKDIR ./home/site/wwwroot
COPY --from=installer-env /home/site/wwwroot /home/site/wwwroot
RUN apt update && apt install -y apt-utils odbcinst1debian2 libodbc1 odbcinst vim unixodbc unixodbc-dev freetds-dev curl tdsodbc unzip libsasl2-modules-gssapi-mit
RUN curl -sL -o && unzip
RUN dpkg -i SimbaSparkODBC-
RUN export ODBCINI=/etc/odbc.ini ODBCSYSINI=/etc/odbcinst.ini SIMBASPARKINI=/opt/simba/spark/lib/64/simba.sparkodbc.ini
The reason this Azure Function app is containerized is to make it possible to use the Databricks ODBC driver to connect to an Azure Databricks instance and Delta Lake. I have read in another Stack Overflow thread that there is no way to install custom drivers for an App Service unless it is containerized, so I assumed the same approach would work for a containerized Function app.
Unfortunately, I get the following exception:
ERROR [01000] [unixODBC][Driver Manager]Can't open lib 'Simba Spark ODBC Driver' : file not found
Dependency unixODBC with minimum version 2.3.1 is required. Unable to load shared library '' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: cannot open shared object file: No such file or directory
even though I point to the driver configuration in this line:
RUN export ODBCINI=/etc/odbc.ini ODBCSYSINI=/etc/odbcinst.ini SIMBASPARKINI=/opt/simba/spark/lib/64/simba.sparkodbc.ini
It looks like /home/site/wwwroot cannot access the folders above. Interestingly, I also tried to copy the contents of /etc to /home/site/wwwroot/bin so that I could set environment variables pointing to that folder, but the copy is not possible:
COPY . /home/site/wwwroot/bin
Generally, I pass the connection details for the Databricks instance in the connection string, but I also tried to set up the files in /etc with the command below:
RUN gawk -i inplace '{ print } ENDFILE { print "[ODBC Drivers]" }' /etc/odbcinst.ini
but during the build I get this error:
gawk: inplace:59: warning: inplace::begin: Cannot stat '/etc/odbcinst.ini' (No such file or directory)
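A note on the RUN export line: each RUN instruction is executed by its own shell process, so a variable exported there is gone by the next instruction and is not present when the container runs. The minimal sketch below illustrates the effect; the two separate sh -c calls are only a stand-in (my assumption for the demonstration) for two RUN lines.

```shell
# Two separate "sh -c" calls stand in for two RUN instructions:
# each one is a fresh shell process with its own environment.
unset ODBCINI   # make sure the demo starts from a clean state

# First "RUN": the variable is visible only inside this shell.
first=$(sh -c 'export ODBCINI=/etc/odbc.ini; echo "${ODBCINI:-unset}"')

# Second "RUN": a new shell, so the earlier export is gone.
second=$(sh -c 'echo "${ODBCINI:-unset}"')

echo "first shell:  $first"
echo "second shell: $second"
```

In the Dockerfile itself, the ENV instruction is the mechanism that persists variables into later layers and into the running container. Also, to my knowledge unixODBC expects ODBCSYSINI to name the directory that contains odbcinst.ini rather than the file itself, so that value may be worth double-checking.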


Source To Image not finding C++ libs

I am creating a custom builder image using S2I for .NET Core. It will run in an OpenShift Linux container.
I have modified the custom builder image and added a few lines that copy some DLLs and ".so" files.
When running the container in OpenShift I get the error below:
"unable to load shared library 'CustomCppWrapper' or one of its dependencies. In order to help diagnose loading problems,
consider setting the LD_DEBUG environment variable: libWrapperName: cannot open shared object file: No such file or directory"
I set the LD_DEBUG environment variable and found the errors below:
/lib64/ error: version lookup error: version `CXXABI_1.3.8' not found (required by /opt/app-root/app/ (fatal)
/lib64/ error: version lookup error: version `CXXABI_1.3.8' not found (required by ./ (fatal)
I ran the command below and got the following output:
./ /lib64/ version `CXXABI_1.3.8' not found (required by ./
./ /lib64/ version `GLIBCXX_3.4.20' not found (required by /ab/sdk/customlib/gcc540/lib/
./ /lib64/ version `GLIBCXX_3.4.20' not found (required by /ab/sdk/customlib/gcc540/lib/
Below is the Dockerfile for my custom builder image:
FROM dotnet/dotnet-31-runtime-rhel7
# This image provides a .NET Core 3.1 environment you can use to run your .NET
# applications.
ENV PATH=/opt/app-root/src/.local/bin:/opt/app-root/src/bin:/opt/app-root/node_modules/.bin:${PATH} \
LABEL io.k8s.description="Platform for building and running .NET Core 3.1 applications" \
# Labels consumed by Red Hat build service
LABEL name="dotnet/dotnet-31-rhel7" \
com.redhat.component="rh-dotnet31-container" \
version="3.1" \
release="1" \
#-------------------------- COPY CPP LIBS
COPY CustomCppWrapper.lib /opt/app-root/app
COPY /opt/app-root/app
# Labels consumed by Eclipse JBoss OpenShift plugin
# Switch to root for package installs
# Copy the S2I scripts from the specific language image to $STI_SCRIPTS_PATH.
COPY ./s2i/bin/ /usr/libexec/s2i
RUN INSTALL_PKGS="rh-nodejs10-npm rh-nodejs10-nodejs-nodemon rh-dotnet31-dotnet-sdk-3.1 rsync" && \
yum install -y --setopt=tsflags=nodocs --disablerepo=\* \
--enablerepo=rhel-7-server-rpms,rhel-server-rhscl-7-rpms,rhel-7-server-dotnet-rpms \
rpm -V $INSTALL_PKGS && \
yum clean all -y && \
# yum cache files may still exist (and quite large in size)
rm -rf /var/cache/yum/*
# Directory with the sources is set as the working directory.
RUN mkdir /opt/app-root/src
WORKDIR /opt/app-root/src
# Trigger first time actions.
RUN scl enable rh-dotnet31 'dotnet help'
# Build the container tool.
RUN /usr/libexec/s2i/container-tool build-tool
# Since $HOME is set to /opt/app-root, the yum install may have created config
# directories (such as ~/.pki/nssdb) there. These will be owned by root and can
# cause actions that work on all of /opt/app-root to fail. So we need to fix
# the permissions on those too.
RUN chown -R 1001:0 /opt/app-root && fix-permissions /opt/app-root
# Needed for the `dotnet watch` to detect changes in a container.
# Run container by default as user with id 1001 (default)
USER 1001
# Set the default CMD to print the usage of the language image.
CMD /usr/libexec/s2i/usage
Your code depends on a version of libstdc++ that provides CXXABI_1.3.8 and GLIBCXX_3.4.20, but it would seem that version isn't installed.
Adding a yum install command to your Dockerfile should do it. The exact command depends on the operating system you're using, but for RHEL 7, for example, you could do:
RUN yum install -y libstdc++
With more details about the operating system I can give a more specific command.
In this specific example the Dockerfile could look something like this:
FROM centos:7
RUN yum install -y libstdc++
CMD ["/bin/bash"]
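To check whether the installed library actually provides the required versions, one could list the version tags embedded in it. This is a minimal sketch: the helper name list_versions is hypothetical, and the path /lib64/libstdc++.so.6 is an assumption, since the exact file name is elided in the error output above.

```shell
# list_versions FILE PREFIX
# Print the distinct PREFIX_x.y.z version tags embedded in FILE.
list_versions() {
    # Turn every non-printable byte into a newline, then pick out the tags.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" \
        | LC_ALL=C grep -o "${2}_[0-9][0-9.]*" \
        | sort -u
}

# Assumed library path; adjust it to the file named in your own error message.
lib=/lib64/libstdc++.so.6
if [ -r "$lib" ]; then
    list_versions "$lib" CXXABI
    list_versions "$lib" GLIBCXX
fi
```

If CXXABI_1.3.8 or GLIBCXX_3.4.20 is missing from the output, the installed libstdc++ is older than the one the .so files were built against, and installing the same package again will not help.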

Robot Framework test case fails with “Element not found” when running selenium library based test case with headless chrome inside a DOCKER container

Below is the test case that I am trying to execute inside the docker container.
Login To GUI
    [Documentation]    To open GUI and login with valid credentials
    ${chrome_options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
    Call Method    ${chrome_options}    add_argument    --no-sandbox
    Call Method    ${chrome_options}    add_argument    --headless
    Call Method    ${chrome_options}    add_argument    --disable-dev-shm-usage
    Call Method    ${chrome_options}    add_argument    --ignore-certificate-errors-spki-list
    Call Method    ${chrome_options}    add_argument    --ignore-ssl-errors
    Open Browser    ${url}    chrome    options=${chrome_options}    executable_path=/usr/lib/chromium/chromedriver
    Set Browser Implicit Wait    5
    Input Text    id=username    ${username}
    Input Text    id=password    ${password}
    Click Button    //input[@value='Sign in']
The test case passes when I run it directly from the IDE (PyCharm) in the Mac terminal. But when I run the same test in a Docker container, it fails with the error “Element with locator 'id=username' not found”, and the screenshot attached to the logs shows a blank white screen. The page I request should redirect to a Keycloak authentication page with username and password fields, but in the Docker container I get a blank page.
I checked the log file inside the container, “/usr/lib/chromium/chrome_debug.log”:
[0302/] Failed to read DnsConfig.
[0302/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0302/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0302/] AiaRequest::OnFetchCompleted got error -301
[0302/] handshake failed; returned -1, SSL error code 1, net_error -202
Then I ran the command below inside the container and got:
/usr/lib/chromium # chromium-browser --headless --no-sandbox --ignore-certificate-errors --ignore-ssl-errors https://<url>
[0302/] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0302/] Failed to read DnsConfig.
[0302/] Failed to read DnsConfig.
[0302/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0302/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0302/] AiaRequest::OnFetchCompleted got error -301
[0302/115904.273717:INFO:CONSOLE(27)] "Mixed Content: The page at 'https://<url>/auth/realms/ml/protocol/openid-connect/auth?client_id=ml-client&redirect_uri=https%3A%2F%2F<url>%2Foauth%2Fcallback&response_type=code&scope=ml-scope+openid+email+profile&state=6d35f7-add8-40b-a8e7-b169876cfc' was loaded over a secure connection, but contains a form that targets an insecure endpoint 'http://ml-sec-access-mgmt-http:8080/auth/realms/ml/login-actions/authenticate?session_code=mrjXrpjeadGywFIIgkHhddBag74tDnWV6FHA3Qk&execution=f19849-6670-406c-a1b0-139bb1f1dc05&client_id=ml-client&tab_id=vGTrJ7OI8'. This endpoint should be made available over a secure connection.", source: https://<url>/auth/realms/ml/protocol/openid-connect/auth?client_id=ml-client&redirect_uri=https%3A%2F%2F<url>%2Foauth%2Fcallback&response_type=code&scope=ml-scope+openid+email+profile&state=6d85f7-add8-40db-a8e7-b16239876cfc (27)
I even downloaded the Chromium browser on my Mac and tried opening the URL; it works fine.
Dockerfile:
#Base image
FROM python:3.9.0-alpine3.12
# Set the reports directory environment variable
ENV ROBOT_REPORTS_DIR /opt/robotframework/reports
# Set the tests directory environment variable
ENV ROBOT_TESTS_DIR /opt/robotframework/tests
# Set the working directory environment variable
ENV ROBOT_WORK_DIR /opt/robotframework/temp
# Set number of threads for parallel execution
# By default, no parallelisation
# Install system dependencies
RUN apk update \
&& apk --no-cache upgrade \
&& apk --no-cache --virtual .build-deps add \
gcc \
libffi-dev \
linux-headers \
make \
musl-dev \
openssl-dev \
which \
wget \
curl \
vim \
ca-certificates \
git \
jq \
chromium \
#Install robotframework and required libraries from the requirements file
ADD requirements.txt /
RUN pip3 install \
--no-cache-dir \
-r requirements.txt
# Create the default report and work folders with the default user to avoid runtime issues
# These folders are writeable by anyone, to ensure the user can be changed on the command line.
&& mkdir -p ${ROBOT_WORK_DIR} \
# Installing product related utilities inside the container
XXXXX<contents are hidden as it is not relevant to this query>
# Allow any user to write logs
RUN chmod ugo+w /var/log
# Update system path
ENV PATH=/opt/robotframework/bin:$PATH
# A dedicated work folder to allow for the creation of temporary files
Requirements.txt file contents:
#Required robot framework packages
I also looked at the question "Getting empty page running selenium in headless chrome Docker".
I could not figure out what the issue is. Is it really a redirect issue, a certificate issue, or mixed content? I am quite confused. Any ideas?
I found a solution for the above problem statement.
First I tried using Chrome and Firefox instead of Chromium. But Alpine doesn't have Chrome, so I switched my base image to Ubuntu. Also, Ubuntu is generally suggested as a good Docker base image for running Python applications.
But even after changing to Ubuntu as the new Docker base image, with Chrome and Firefox, I got the same error (blank white page).
I also got the error below:
root@a4ac8fd9a950:/opt/google/chrome# google-chrome --headless --no-sandbox https://<URL>
[0306/] Cannot create Pref Service with no user data dir.
[0306/] Less than 64MB of free space in temporary directory for shared memory files: 63
[0306/] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[0306/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0306/] Error parsing cert retrieved from AIA (as DER):
ERROR: Failed parsing Certificate SEQUENCE
ERROR: Failed parsing Certificate
[0306/] AiaRequest::OnFetchCompleted got error -301
[0306/] handshake failed; returned -1, SSL error code 1, net_error -202
Then I tried the same with Xvfb, and this worked. Xvfb (short for X virtual framebuffer) is an in-memory display server for UNIX-like operating systems (e.g., Linux). It enables you to run graphical applications without a display (e.g., browser tests on a CI server) while also having the ability to take screenshots. All the contents are given below for reference.
I modified the Dockerfile as below:
FROM ubuntu:20.04
# Set the reports directory environment variable
ENV ROBOT_REPORTS_DIR /opt/robotframework/reports
# Set the tests directory environment variable
ENV ROBOT_TESTS_DIR /opt/robotframework/tests
# Set the working directory environment variable
ENV ROBOT_WORK_DIR /opt/robotframework/temp
# Set number of threads for parallel execution
# By default, no parallelisation
ENV DEBIAN_FRONTEND=noninteractive
# Install system dependencies
RUN apt-get update \
&& apt-get install --quiet --assume-yes \
python3-pip \
unzip \
firefox \
wget \
curl \
vim \
ca-certificates \
git \
jq \
# Install chrome package
RUN wget --no-verbose
RUN dpkg --install google-chrome-stable_current_amd64.deb; apt-get --fix-broken --assume-yes install
#Install robotframework and required libraries from the requirements file
ADD requirements.txt /
RUN pip3 install \
--no-cache-dir \
-r requirements.txt
# Install webdrivers for chrome and firefox
RUN CHROMEDRIVER_VERSION=`wget --no-verbose --output-document -` && \
wget --no-verbose --output-document /tmp/$CHROMEDRIVER_VERSION/ && \
unzip -qq /tmp/ -d /opt/chromedriver && \
chmod +x /opt/chromedriver/chromedriver && \
ln -fs /opt/chromedriver/chromedriver /usr/local/bin/chromedriver
RUN GECKODRIVER_VERSION=`wget --no-verbose --output-document - | grep tag_name | cut -d '"' -f 4` && \
wget --no-verbose --output-document /tmp/geckodriver.tar.gz$GECKODRIVER_VERSION/geckodriver-$GECKODRIVER_VERSION-linux64.tar.gz && \
tar --directory /opt -zxf /tmp/geckodriver.tar.gz && \
chmod +x /opt/geckodriver && \
ln -fs /opt/geckodriver /usr/local/bin/geckodriver
# Create the default report and work folders with the default user to avoid runtime issues
# These folders are writeable by anyone, to ensure the user can be changed on the command line.
&& mkdir -p ${ROBOT_WORK_DIR} \
# Installing product related utilities inside the container
# Allow any user to write logs
RUN chmod ugo+w /var/log
# Update system path
ENV PATH=/opt/robotframework/bin:$PATH
# A dedicated work folder to allow for the creation of temporary files
Requirements.txt file contents:
#Required robot framework packages
New Robot Framework test case with Xvfb:
*** Settings ***
Library    SeleniumLibrary
Library    XvfbRobot
*** Test Cases ***
Login To GUI
    [Documentation]    To open GUI and login with valid credentials
    Start Virtual Display    1920    1080
    Open Browser    ${URL}
    Set Window Size    1920    1080
    Set Browser Implicit Wait    5
    Input Text    id=username    ${username}
    Input Text    id=password    ${password}
    Click Button    //input[@value='Sign in']

Successfully created a virtualenv (using "mkproject") in Dockerfile, but can't run "workon" properly

Edit: Solved - typo
I have a Dockerfile that successfully creates a virtualenv using virtualenvwrapper (along with setting up a heap of "standard" settings/packages in our normal environment). I am using the resulting image as a "base image" for further use. All good so far. However, the following Dockerfile (based on the first image, "base_image_14.04") falls down at the last line:
FROM base_image_14.04
USER root
RUN DEBIAN_FRONTEND=noninteractive \
apt-get update && apt-get install -y \
libproj0 libproj-dev \
libgeos-c1v5 libgeos-dev \
libjpeg62 libjpeg-dev \
zlib1g zlib1g-dev \
libfreetype6 libfreetype6-dev \
libgdal20 libgdal-dev \
&& rm -rf /var/lib/apt/lists
USER webdev
RUN ["/bin/bash", "-ic", "mkproject maproxy"]
ADD ./requirements.txt .
RUN ["/bin/bash", "-ic", "workon mapproxy && pip install -r requirements.txt"]
The "mkproject mapproxy" works fine. If I comment out the last line it builds successfully and I can spin up the container and run "workon mapproxy" manually, not a problem. But when I try and build with the last line, it gives a workon error:
ERROR: Environment 'mapproxy' does not exist. Create it with 'mkvirtualenv mapproxy'.
workon is being called, but for some reason it can't find the mapproxy virtualenv.
WORKON_HOME & PROJECT_HOME both exist (defined in the parent image) and point to the correct locations (and are used successfully by "mkproject mapproxy").
So why is workon returning an error when the mapproxy virtualenv exists? The same error happens when I isolate that last line into a third Dockerfile building on the second.
Solved: It was a simple typo. mkproject maproxy instead of mapproxy. :sigh:
I am trying to build a docker image and am running into similar problems.
The first question was: why use a virtualenv in Docker? The main reason, in a nutshell, is to minimize the effort of migrating an existing, working approach into a Docker container. I will eventually use docker-compose, but I wanted to start by getting my feet wet with everything in a single Docker container.
In my first attempt I installed almost everything with apt-get, including uwsgi. I installed my app "globally" with pip3. The app has command-line functionality and a separate Flask web app, hence the need for uwsgi. The command-line functionality works, but when I make a request to the Flask app, uwsgi/Python has a problem with the locale: Fatal Python error: Py_Initialize: Unable to get the locale encoding and ImportError: No module named 'encodings'
I have stripped away all my app specific additions to narrow down the problem. This is the Dockerfile I'm using:
# Docker image definition for testing
FROM ubuntu:xenial
# Create a user
RUN useradd -G sudo -ms /bin/bash tester
RUN echo 'tester:password' | chpasswd
WORKDIR /home/tester
# Skipping apt-get update to save some build time. Some are kept
# to ensure they are the same as on host setup.
RUN apt-get install -y python3 python3-dev python3-pip \
virtualenv virtualenvwrapper sudo nano && \
apt-get clean -qy
# After above, can we use those installed in rest of Dockerfile?
# Yes, but not always, such as with virtualenvwrapper. What about
# virtualenv? How do you "source" the script? Doesn't appear to be
# installed, as bash complains "source needs a single parameter"
RUN ["/bin/bash", "-c", "source", "/usr/share/virtualenvwrapper/"]
# Create a virtualenv so uwsgi can find locale
# RUN mkdir /home/tester/.virtualenv && virtualenv -p`which python3` /home/bts_tools/.virtualenv/bts_tools
RUN mkvirtualenv -p`which python3` bts_tools && \
workon bts_tools && \
pip3 --disable-pip-version-check install --upgrade bts_tools
USER tester
ENTRYPOINT ["/bin/bash"]
CMD ["--login"]
The build fails on the line where I try to source the virtualenvwrapper script. Bash complains that source needs an argument: the file to be sourced. So I commented out the RUN lines, and it builds without error. When I run the resulting container I see all the additions to the environment that virtualenvwrapper makes (you can see all of them by executing the "set" command without any args), and the script to be sourced is there too.
So my question is: why doesn't Docker find them? How does the Docker build process work if the results of previous RUNs or ENVs aren't applied for subsequent use in the Dockerfile? I know some things are applied and work: for example, if you apt-get nginx you can refer to /etc/nginx or alter things under that folder. You can create a user and set its password, or cd into its home folder. If I move the WORKDIR before the RUN useradd -G, I see a warning from useradd that the home folder already exists. I tried to use the "time" program to time how long various steps in the Dockerfile take, and Docker complains it can't find 'time'.
So what exactly is going on? I have spent the last 3 days trying to figure this out. It just shouldn't be this difficult. What am I missing?
Parts of the bts_tools flask app worked when I wasn't using virtual envs. Most of the app didn't work, and the issue was this locale problem. Since everything works on the host outside of docker, and after trying to alter the PATH, PYTHONHOME, PYTHONPATH in my uwsgi start script to overcome the dreaded "locale encoding" fatal error, I decided to try to replicate the host setup as closely as possible since that didn't have the locale issue. When I have had that problem before I could run dpkg-reconfigure python3 or fix with changes to PATH or ENV settings. If you google the problem you'll see many people have difficulties with python & locale. It's almost enough reason to avoid using python!
I posted this elsewhere about locale issue, if it helps.
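A possible explanation for the "source needs an argument" complaint, sketched under the assumption of standard bash behavior: with bash -c, only the first argument after -c is the command string, and any further arguments become $0, $1, and so on. The exec-form RUN above therefore runs the bare command source, with the file path ending up in $0. In addition, every RUN starts a new shell, so anything sourced in one RUN is gone in the next; the source call and the commands that need it have to share one command string. The temporary file and the greet function below are made up for the demonstration.

```shell
# What the command string sees when the path is passed as a separate argument:
# the path lands in $0 and there are no positional arguments at all.
seen=$(bash -c 'echo "$0 $#"' /some/file)
echo "\$0 and argument count: $seen"

# Sourcing a file and using its definitions inside ONE command string works.
script=$(mktemp)
echo 'greet() { echo "hello from sourced file"; }' > "$script"
result=$(bash -c "source '$script' && greet")
echo "$result"
rm -f "$script"
```

Applied to the Dockerfile, this would suggest a single RUN whose command string sources the virtualenvwrapper script and then calls mkvirtualenv and pip3 in the same string, chained with &&.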

Using ccache in automated builds on Docker cloud

I am using automated builds on Docker cloud to compile a C++ app and provide it in an image.
Compilation takes quite long (2-3 hours) and commits on GitHub are frequent (~10 to 30 per day).
Is there a way to keep the building cache (using ccache) somehow?
As far as I understand it, Docker's layer caching is useless here, since the compilation layer that produces the ccache data is invalidated by every source code change.
Or can we tweak something to bring some data back into the first layer?
Any other solution? Pushing it somewhere?
Here is the Dockerfile:
# CACHE_TAG is provided by Docker cloud
# see
# using ARG in FROM requires min v17.05.0-ce
ARG CACHE_TAG
FROM qgis/qgis3-build-deps:${CACHE_TAG}
MAINTAINER Denis Rouzaud <>
ENV CC=/usr/lib/ccache/clang
ENV CXX=/usr/lib/ccache/clang++
COPY . /usr/src/QGIS
WORKDIR /usr/src/QGIS/build
RUN cmake \
-GNinja \
.. \
&& ninja install \
&& rm -rf /usr/src/QGIS
You should try saving and restoring your cache data using a third-party service:
- an online object storage service like Amazon S3
- a simple FTP server
- an Internet-reachable machine with SSH, so that you can copy files with scp
I'm assuming that your cache data is stored inside the ~/.ccache directory.
Using Docker multistage build
For some time now, Docker has supported multi-stage builds, and you can try using them to implement the solution with a single Dockerfile:
Warning: I've not tested it
# CACHE_TAG is provided by Docker cloud
# see
# using ARG in FROM requires min v17.05.0-ce
ARG CACHE_TAG
FROM qgis/qgis3-build-deps:${CACHE_TAG} as builder
MAINTAINER Denis Rouzaud <>
ENV CC=/usr/lib/ccache/clang
ENV CXX=/usr/lib/ccache/clang++
COPY . /usr/src/QGIS
WORKDIR /usr/src/QGIS/build
# restore cache (~ is expanded by the shell in RUN, but not by COPY)
RUN curl -o ccache.tar.bz2 http://my-object-storage/ccache.tar.bz2
RUN tar -xjvf ccache.tar.bz2 -C ~
RUN cmake \
-GNinja \
.. \
&& ninja install
# save the current cache online
RUN tar -cvjSf ccache.tar.bz2 -C ~ .ccache
RUN curl -T ccache.tar.bz2 -X PUT http://my-object-storage/ccache.tar.bz2
FROM alpine:latest
# USE THE FROM IMAGE YOU NEED, this is only an example
# E.g.:
# COPY --from=builder /usr/src/QGIS/build/YOUR_EXECUTABLE /usr/bin
# ...
In stage 2 you build the final image that will be pushed to your repository.
Using Docker cloud hooks
Another, but less clear, approach could be to use a Docker Cloud pre_build hook file to download the cache data:
echo "=> Downloading build cache data"
curl -o ccache.tar.bz2 http://my-object-storage/ccache.tar.bz2 # e.g. Amazon S3 like service
cd /
tar -xjvf ccache.tar.bz2
Obviously, in this script you can use dedicated Docker images to run curl or tar, mounting the local directory as a volume.
Then copy the extracted .ccache folder into your container during the build, using a COPY command before your cmake call:
WORKDIR /usr/src/QGIS/build
COPY /.ccache ~/.ccache
RUN cmake ...
To make this work, you need a way to upload your cache data after the build, which you could do easily with a post_build hook file:
echo "=> Uploading build cache data"
tar -cvjSf ccache.tar.bz2 ~/.ccache
curl -T ccache.tar.bz2 -X PUT http://my-object-storage/ccache.tar.bz2
However, your compilation data isn't available from the outside, because it lives inside the container. So you should upload the cache after the cmake command inside your main Dockerfile:
RUN cmake...
&& tar ...
&& curl ...
&& ninja ...
&& rm ...
If curl or tar aren't available, just add them to your container using the package manager (qgis/qgis3-build-deps is based on Ubuntu 16.04, so they should be available).
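The save and restore steps used in both approaches could be factored into two small helpers. This is a minimal sketch: the function names save_cache and restore_cache are hypothetical, and it uses gzip (tar -z) instead of the bzip2 (tar -j) shown above only because gzip is more widely installed; swap the flag if you prefer bzip2.

```shell
# save_cache DIR ARCHIVE
# Pack the cache directory DIR (e.g. ~/.ccache) into the tarball ARCHIVE.
# Paths inside the archive are relative to DIR's parent, e.g. ".ccache/...".
save_cache() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# restore_cache ARCHIVE PARENT
# Unpack ARCHIVE below PARENT (e.g. ~), recreating PARENT/.ccache.
restore_cache() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, restore_cache ccache.tar.gz ~ would run before the cmake call and save_cache ~/.ccache ccache.tar.gz after ninja, with the curl download and upload around them as in the answer. If the cache lives elsewhere, ccache reads its location from the CCACHE_DIR environment variable.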

How to add a file to an image in Dockerfile without using the ADD or COPY directive

I need the contents of a large *.zip file (5 GB) in my Docker container in order to compile a program. The *.zip file resides on my local machine. The strategy for this would be:
COPY /tmp/
RUN cd /tmp \
&& unzip \
&& make
After having done this I would like to remove the unzipped directory and the original *.zip file because they are not needed any more. The problem is that the COPY directive (and also the ADD directive) adds a layer to the image that contains the file, which is problematic because my image will then be at least 5 GB in size. Is there a way to add a file to a container without using the COPY or ADD directive? wget will not work, as the *.zip file is on my local machine, and curl file://localhost/home/user/ -o /tmp/ will not work either.
It is not straightforward but it can be done via wget or curl with a little support from python. (All three tools should usually be available on a *nix system.)
wget will not work when no url is given and
curl file://localhost/home/user/ -o /tmp/
will not work from within a Dockerfile's RUN instruction. Hence, we will need a server which wget and curl can access and download from.
To do this we set up a little Python server to serve our HTTP requests, using Python's http.server module. (This module requires Python 3; on Python 2 the equivalent is python -m SimpleHTTPServer.)
python -m http.server --bind 8000
The Python server will serve all files in the directory it is started in. So you should make sure that you start your server either in the directory where the file you want to download during your image build resides, or in a temporary directory that contains your program. For illustration purposes let's create the file foo.txt, which we will later download via wget in our Dockerfile:
echo "foo bar" > foo.txt
When starting the HTTP server, it is important that we specify the IP address of our local machine on the LAN. Furthermore, we will open port 8000. Having done this, we should see the following output:
python3 -m http.server --bind 8000
Serving HTTP on port 8000 ...
Now we build a Dockerfile to illustrate how this works. (We will assume that the file foo.txt should be downloaded into /tmp):
FROM debian:latest
RUN apt-get update -qq \
&& apt-get install -y wget
RUN cd /tmp \
&& wget
Now we start the build with
docker build -t test .
During the build you will see the following output on our python server: - - [01/Nov/2014 23:32:37] "GET /foo.txt HTTP/1.1" 200 -
and the build output of our image will be:
Step 2 : RUN cd /tmp && wget
---> Running in 49c10e0057d5
--2014-11-01 22:56:15--
Connecting to connected.
HTTP request sent, awaiting response... 200 OK
Length: 25872 (25K) [text/plain]
Saving to: `foo.txt'
0K .......... .......... ..... 100% 129M=0s
2014-11-01 22:56:15 (129 MB/s) - `foo.txt' saved [25872/25872]
---> 5228517c8641
Removing intermediate container 49c10e0057d5
Successfully built 5228517c8641
You can then check whether it really worked by starting and entering a container from the image you just built:
docker run -i -t --rm test bash
You can then look in /tmp for foo.txt.
We can now add any file to our image without creating a new layer that holds it. Assuming you want to add a program of about 5 GB, as mentioned in the question, we could do:
FROM debian:latest
RUN apt-get update -qq \
&& apt-get install -y wget unzip make
RUN cd /tmp \
&& wget http://conventiont:8000/ \
&& unzip \
&& cd program \
&& make \
&& make install \
&& cd /tmp \
&& rm -f \
&& rm -rf program
In this way we will not be left with 10 gb of cruft.
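The manual steps above (start the server, run the build, stop the server) could be wrapped in one helper so the server never stays up longer than the build. This is a minimal sketch, not a tested recipe: serve_during is a hypothetical name, python3 is assumed to be on the PATH, and the fixed one-second wait is a crude stand-in for a real readiness check.

```shell
# serve_during DIR ADDR PORT CMD...
# Serve DIR over HTTP on ADDR:PORT while CMD runs, then stop the server.
serve_during() {
    dir=$1 addr=$2 port=$3
    shift 3
    # Start the server in the background; exec makes $! the python3 process.
    ( cd "$dir" && exec python3 -m http.server "$port" --bind "$addr" ) \
        >/dev/null 2>&1 &
    server_pid=$!
    sleep 1    # crude wait for the server to start listening
    status=0
    "$@" || status=$?
    # Stop the server whether or not the command succeeded.
    kill "$server_pid" 2>/dev/null || true
    wait "$server_pid" 2>/dev/null || true
    return "$status"
}
```

A call would look like serve_during /path/to/files LAN_IP 8000 docker build -t test . where /path/to/files and LAN_IP are placeholders for the directory holding the archive and the host address the build container can reach.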
There's no way to do this. A feature request is here
Can you not map a local folder into the container when it is launched and then copy the files you need?
sudo docker run -d -P --name myContainerName -v /localpath/zip_extract:/container/path/ yourContainerID
I have posted a similar answer here:
You can use docker-squash to squash newly created layers. That will essentially remove the archive from the final image if you remove it in a subsequent RUN instruction.
