Docker RUN fails but executing the same commands in the image succeeds

I'm working on a Jetson TK1 deployment scheme where I use Docker to create the root filesystem, which then gets flashed onto the device.
It works like this: I create an armhf image from the NVIDIA-provided sample filesystem plus a qemu-arm-static binary, which I can then build upon using standard Docker tools.
I then have a "flasher" image which copies the contents of the filesystem, creates an ISO image, and flashes it onto my device.
The problem that I'm having is that I'm getting inconsistent results between installing apt packages using a docker RUN statement vs entering the image and installing apt packages.
I.e.:
# docker build -t jetsontk1:base .
Dockerfile:
FROM jetsontk1:base1
RUN apt update
RUN apt install build-essential cmake
# or
RUN /bin/bash -c 'apt install build-essential cmake -y'
vs:
docker run -it jetsontk1:base1 /bin/bash
# apt update
# apt install build-essential cmake
When I install via the Dockerfile build, I get the following error:
Processing triggers for man-db (2.6.7.1-1) ...
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::append
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)
The command '/bin/sh -c /bin/bash -c 'apt install build-essential cmake -y'' returned a non-zero code: 134
I have no issues manually installing applications when I'm inside the container, but there's no point in using Docker to manage this image-building process if I can't do apt installs in the Dockerfile. :/
The project can be found here: https://github.com/dtmoodie/tk1_builder
The current state, with the issue as presented, is at commit 8e22c0d5ba58e9fdab38e675eed417d73ae0aad9.
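For what it's worth, the abort happens while man-db processes its triggers under qemu emulation, not during the package unpack itself. A possible workaround (a sketch, assuming the crash really does come from the man-db trigger) is to keep the trigger from ever running:
FROM jetsontk1:base1
# Assumed workaround: mandb crashing under qemu-arm-static is what kills
# the build, so remove man-db (or exclude man pages with dpkg's
# path-exclude mechanism) before installing anything that ships man pages.
RUN apt remove -y man-db
RUN apt update && apt install -y build-essential cmake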

Related

How to pack and transport only the delta of a container?

I have the following scenario:
A Docker or Podman container is set up and deployed to several production instances that are NOT connected to the internet.
A new release has been developed that needs only one new package, e.g. a Python module a few kilobytes in size.
The new package is installed on the dev container, and the Dockerfile has been updated to also load the latest module (just for documentation, because the target system cannot reach docker.io).
We have packed the new container release, which is more than a gigabyte in size, and could transport the new container to the target environments.
My question is: is there a way to pack and transport only the delta of the container compared to the previously deployed version?
podman version 3.4.7
echo "\
FROM jupyter/scipy-notebook
USER root
RUN apt-get update && apt-get install --no-install-recommends -y mupdf-tools python3-dev
USER user
RUN pip -V
RUN pip install fitz==0.0.1.dev2
RUN pip install PyMuPDF==1.20.2
RUN pip install seaborn
RUN pip install openpyxl==3.0.10
RUN pip install flask==2.1.3
" > sciPyDockerfile
podman build --tag python_runner -f ./sciPyDockerfile
sudo podman save -o python_runner.tar python_runner
gzip python_runner.tar
The result is a file:
1.1G Nov 28 15:27 python_runner.tar.gz
Is there any way to pack the delta only?
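Image layers are content-addressed, so if the new image shares its base layers with the previously deployed one, only the changed layers actually need to move. One possible approach (a sketch, assuming you can run a plain registry inside the offline environment; registry.internal:5000 is a placeholder) is to push with skopeo, which skips blobs the destination already has:
# Copy from local podman storage to an in-environment registry.
# skopeo deduplicates by layer digest, so only new layers are uploaded.
skopeo copy --dest-tls-verify=false \
    containers-storage:localhost/python_runner:latest \
    docker://registry.internal:5000/python_runner:latest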

Docker build dependent on host Ubuntu version not on the actual Docker File

I'm facing an issue with my docker build.
I have a Dockerfile as follows:
FROM python:3.6
RUN apt-get update && apt-get install -y libav-tools
....
The issue I'm facing is that I'm getting this error when building on an Ubuntu 20.04 LTS host:
E: Package 'libav-tools' has no installation candidate
I did some research and found that ffmpeg should be a replacement for libav-tools:
FROM python:3.6
RUN apt-get update && apt-get install -y ffmpeg
....
I tried again without any issue.
But when I tried to build the same image with ffmpeg on an Ubuntu 16.04 (xenial) host, I got:
E: Package 'ffmpeg' has no installation candidate
After that, I replaced ffmpeg with libav-tools and it worked on Ubuntu 16.04.
I'm confused now as to why the docker build depends on the host Ubuntu version I'm using rather than on the actual Dockerfile.
Shouldn't docker build behave the same whatever Ubuntu version I'm using?
Delete the existing image and pull it again. It seems you have an old image which may have a different base OS, and that is why you are seeing the issue.
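For example (my_image is a placeholder tag), you can drop the cached base image and force the build to pull it fresh:
# Remove the locally cached base image, then rebuild while forcing a
# fresh pull of python:3.6 and ignoring cached layers.
docker rmi python:3.6
docker build --pull --no-cache -t my_image .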

How to retain cmake changes when building Docker image in Google Cloud Build?

I am working on a CI pipeline with Google Cloud Build to run tests on code stored in Cloud Source Repositories. As it stands, the pipeline uses the docker cloud builder to build an image with docker build. The build takes nearly an hour to complete and runs periodically. It builds the program and then runs a single test on it in one build step; this part works fine. What I want to do is build the program using cmake and make, then store the image in the Container Registry so that I can run future tests from that image without having to spend the time building it first.
The issue is that when I run the custom image from the Container Registry in Cloud Build, it does not recognize the module that was built with cmake. The pipeline ran tests just fine when I built it then ran tests in the same build steps, but no longer recognizes the module when I run the image as a custom builder on Cloud Build.
The dockerfile used to build the image is as follows:
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
COPY . /app
WORKDIR /app
RUN apt-get update
RUN apt-get -y install python3.7
RUN apt-get -y install python3-pip
RUN pip3 install numpy
RUN pip3 install matplotlib
RUN pip3 install transitions
RUN pip3 install pandas
RUN apt-get install -y cmake
RUN apt-get install -y swig
RUN pip3 install conan
RUN ln -s ~/.local/bin/conan /usr/bin/conan
RUN apt-get install gcc
RUN cd ~
RUN python3 path/to/master_build.py
master_build.py uses os.system calls to build the program. It invokes a shell script to do the cmake step. The shell script is:
#!/bin/sh
mkdir dist3
cd dist3
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../src
make
cd ~
This builds the program no problem, then the python script calls other scripts to run basic tests, which all work fine when I do it in this build step. The issue is, when I use Cloud Build to run the custom image from container registry, it can no longer find the module that was built with CMake.
This is the cloudbuild config file that runs the custom image:
steps:
- name: 'gcr.io/$PROJECT_ID/built_images:latest'
  args: ['python3', 'path/to/run_tests.py']
I get a ModuleNotFoundError, which is weird because it worked fine when I ran the test script in the same build after calling cmake. I'm guessing that the file system is not being retained when I push the image to container registry and it can no longer find the dist3 folder.
How can I retain the dist3 folder when I am pushing the image to container registry?
Thanks!
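One thing worth checking (an assumption, not something the question confirms): each RUN line runs in a fresh shell, so RUN cd ~ does not persist into the next step, and Cloud Build starts custom builders with the working directory set to /workspace rather than the image's WORKDIR. If master_build.py writes dist3 relative to the current directory, a sketch that pins the output to an absolute path and exposes it to Python would be:
# Hypothetical fix: build from a fixed absolute path and put the output
# directory on PYTHONPATH so imports resolve regardless of the working
# directory Cloud Build uses at runtime.
WORKDIR /app
RUN python3 path/to/master_build.py
ENV PYTHONPATH=/app/dist3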

Changes to my dockerfile are not reflected when running `docker build`

Docker beginner here. I'm trying to build a docker image by invoking docker build -t my_image . and making changes to the dockerfile on lines that fail. I'm currently running into a problem on this line:
RUN apt-get install -qy locales
Which was corrected after previously being:
RUN apt-get install -q locales
(I had forgotten the -y flag, which assumes 'yes' at prompts.)
But, when I run the build command again, the change to -qy is seemingly not reflected:
---> Running in bc68a3eec929
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
libc-l10n
The following NEW packages will be installed:
libc-l10n locales
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 4907 kB of archives.
After this operation, 20.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c apt-get install -q locales' returned a non-zero code: 1
What I've tried:
Removing any recently stopped containers
Removing any recently built images (which were all failed)
Using docker build --no-cache -t my_image .
Note that I do not use VOLUME in the dockerfile. I saw some users had problems with this, but it's not my issue.
In short, why are changes to my dockerfile not recognized during the build command?
Based on what you have mentioned, I think the first failure is not due to -qy; it must be due to not updating the package lists before running the apt-get commands.
Can you try this and see?
RUN apt-get update && apt-get install -qy locales
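As a general pattern, keeping the update and the install in a single RUN also prevents the layer cache from serving stale package lists when only the install line changes:
# Update, install, and clean up in one layer so the package lists can
# never be cached separately from the install that uses them.
RUN apt-get update \
 && apt-get install -qy --no-install-recommends locales \
 && rm -rf /var/lib/apt/lists/*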

"No module named PIL" after "RUN pip3 install Pillow" in docker container; neither PIL nor Pillow present in dist-packages directory

I'm following this SageMaker guide and using the 1.12 CPU Dockerfile:
https://github.com/aws/sagemaker-tensorflow-serving-container
If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
To work around that issue, I'm trying to install Pillow in my container before deploying to SageMaker.
When I include the lines "RUN pip3 install Pillow" and "RUN pip3 show Pillow" in my Dockerfile, the build output says "Successfully installed Pillow-6.2.0", and the show command indicates the library was installed at /usr/local/lib/python3.5/dist-packages. Running "RUN ls /usr/local/lib/python3.5/dist-packages" in the Dockerfile likewise shows "PIL" and "Pillow-6.2.0.dist-info" in dist-packages, and the PIL directory contains many code files.
However, when I run my container locally, trying to import the module in Python with "from PIL import Image" results in the error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code runs when I start the container.
Before the line "from PIL import Image", I added "import subprocess" and 'print(subprocess.check_output("ls /usr/local/lib/python3.5/dist-packages".split()))'
The ls output matches what I get when running it in the Dockerfile, except that "PIL" and "Pillow-6.2.0.dist-info" are missing. Why are those two present in /usr/local/lib/python3.5/dist-packages at build time but not when my container is started locally?
Is there a better way to include Pillow in my container? The referenced GitHub page also shows that I can deploy libraries by including the files (in code/lib of the model package), but to get files compatible with Ubuntu 16.04 (which the container uses; I'm on a Mac), I'd probably have to copy them out of the container after running "RUN pip3 install Pillow" in my Dockerfile, and it seems odd to need to pull files from the container just to deploy them back to the container.
My Dockerfile looks like:
ARG TFS_VERSION
FROM tensorflow/serving:${TFS_VERSION} as tfs
FROM ubuntu:16.04
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
COPY --from=tfs /usr/bin/tensorflow_model_server /usr/bin/tensorflow_model_server
# nginx + njs
RUN \
apt-get update && \
apt-get -y install --no-install-recommends curl && \
curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - && \
echo 'deb http://nginx.org/packages/ubuntu/ xenial nginx' >> /etc/apt/sources.list && \
apt-get update && \
apt-get -y install --no-install-recommends nginx nginx-module-njs python3 python3-pip python3-setuptools && \
apt-get clean
RUN pip3 install Pillow
# cython, falcon, gunicorn, tensorflow-serving
RUN \
pip3 install --no-cache-dir cython falcon gunicorn gevent requests grpcio protobuf tensorflow && \
pip3 install --no-dependencies --no-cache-dir tensorflow-serving-api
COPY ./ /
ARG TFS_SHORT_VERSION
ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
ENV PATH "$PATH:/sagemaker"
RUN pip3 show Pillow
RUN ls /usr/local/lib/python3.5/dist-packages
I've tried installing Pillow on the same line as cython and the other dependencies, but the result is the same: those dependencies are in /usr/local/lib/python3.5/dist-packages both when the image is built and when the container is started locally, while "PIL" and "Pillow-6.2.0.dist-info" are only present at build time.
Apologies for the late response.
If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
If restricted internet access isn't a requirement, then you should be able to enable internet access by setting enable_network_isolation=False when instantiating your Model class in the SageMaker Python SDK, as shown here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L85
If restricted internet access is a requirement, this means that you will need to either install your dependencies in your own container beforehand or make use of the packaging as you mentioned in your correspondence.
I copied your provided Dockerfile and built an image from it in order to reproduce the error you are seeing. I was not able to reproduce the error quoted below:
However, when I run my container locally, trying to import in python using "from PIL import Image" results in error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code is running when I start the container.
I created a similar Docker image and ran it as a container with the following command:
docker run -it --entrypoint bash <DOCKER_IMAGE>
From within the container, I started a Python 3 session and ran the following commands without error:
root@13eab4c6e8ab:/# python3 -s
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
Can you please provide the code for how you're starting your SageMaker jobs?
Please double check that the Docker image you have created is the one being referenced when starting your SageMaker jobs.
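One quick way to double-check (a sketch; <DOCKER_IMAGE> is a placeholder for your image tag) is to inspect the image directly, bypassing any entrypoint:
# List what the image actually ships in dist-packages:
docker run --rm --entrypoint ls <DOCKER_IMAGE> /usr/local/lib/python3.5/dist-packages
# Confirm the import works inside that exact image:
docker run --rm --entrypoint python3 <DOCKER_IMAGE> -c "from PIL import Image; print(Image.__file__)"
If PIL shows up here but not in your SageMaker job, the job is almost certainly running a different image or tag.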
Please let me know if there is anything I can clarify.
Thanks!
