Sagemaker with windows - docker

I am trying to use aws sagemaker with Windows using Docker :
Here is the docker file :
# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.
FROM ubuntu:16.04
MAINTAINER Amazon AI <sage-learner#amazon.com>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python3.5 \
nginx \
libgcc-5-dev \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/3.3/get-pip.py && python3.5 get-pip.py && \
pip3 install numpy==1.14.3 scipy scikit-learn==0.19.1 xgboost==0.72.1 pandas==0.22.0 flask gevent gunicorn && \
(cd /usr/local/lib/python3.5/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && \
rm -rf /root/.cache
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
COPY xgboost /opt/program
WORKDIR /opt/program
My question is should I, since I work under windows 7, change these path : ?
Thank you

Are you talking about the ENV PATH?
That sets the PATH env within the docker container, which uses linux file system (ubuntu:16.04), so you shouldn't have to change anything.
https://docs.docker.com/engine/reference/builder/#environment-replacement
EDIT:
I reread your question. None of your paths have to change within your Dockerfile, as they are configured for the docker container themselves.

Related

How to edit mounted files in a devcontainer?

I'm struggling to setup a dev container for my project. Initially, the setup was fine. However, when I edited files from inside the container, I could no longer modify them outside of the container because the user inside the container was a root user.
So the, I modified my Dockerfile to build my project inside of a user directory. The issue is now, I can't edit the mounted source code because I lack the necessary permissions inside the container.
Here is my (new) Dockerfile:
# We build everything from the perspective of the user
# This lets us edit mounted files from inside the container using
# the VsCode Remote Container extension w/o setting their permissions to root
FROM nvidia/cuda:11.6.1-devel-ubuntu20.04
# Prevents apt from giving prompts
# Set as ARG so it does not persist after build
# https://serverfault.com/questions/618994/when-building-from-dockerfile-debian-ubuntu-package-install-debconf-noninteract
ARG DEBIAN_FRONTEND=noninteractive
# Docker docs: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
# In addition, when you clean up the apt cache by removing /var/lib/apt/lists it reduces the image size, since the apt cache is not stored in a layer.
# Since the RUN statement starts with apt-get update, the package cache is always refreshed prior to apt-get install.
# Install all apt dependencies before we switch to non-root
RUN apt update && apt install curl \
# for bulding torchvision
git ninja-build -y && rm -rf /var/lib/apt/lists/*
# These commands are for the VsCode Docker extension
RUN adduser --system --group user
USER user
WORKDIR /home/user
# Install miniconda.sh
ENV PATH="/home/user/miniconda3/bin:${PATH}"
RUN curl \
https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh -o miniconda.sh \
&& mkdir /home/user/.conda \
&& bash miniconda.sh -b \
&& rm miniconda.sh
COPY ./environment.yml ./environment.yml
RUN conda env create -f environment.yml && rm ./environment.yml
# pip throws a warning "don't run pip as root", but it's not actually running as root -- so ignore it
# (you can check by attaching to the container & running `pip list`)
COPY ./requirements.txt ./requirements.txt
RUN conda run --no-capture-output -n video-rec pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu116 && rm ./requirements.txt
# This is a hack, will figure it out later
# RUN conda run --no-capture-output -n video-rec pip install 'mosiacml[all]'
# TODO: Pillow SIMD for fast image augmentations
# Uncomment + remove FFMPEG from environment.yml if using GPU decoding
# Compile nv-codec
# RUN git clone --depth 1 --branch n11.1.5.1 https://git.videolan.org/git/ffmpeg/nv-codec-headers.git && \
# cd nv-codec-headers && \
# make install -j 100
# Build FFMPEG with Nvidia, torch requires FFMPEG 4.2 (I think)
# RUN git clone --depth 1 --branch n4.2.7 https://git.ffmpeg.org/ffmpeg.git ffmpeg/ \
# && cd ffmpeg && \
# ./configure --enable-nonfree --enable-shared --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 \
# # I need this, I believe this generates code that works for A100s
# # See: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
# # SM80 or SM_80, compute_80 => NVIDIA A100
# # SM76 => Tesla GA10x cards, RTX Ampere – RTX 3080, GA102 – RTX 3090, RTX A2000, A3000, RTX A4000, A5000, A6000, NVIDIA A40, GA106 – RTX 3060, GA104 – RTX 3070, GA107 – RTX 3050, RTX A10, RTX A16, RTX A40, A2 Tensor Core GPU
# --nvccflags="-gencode arch=compute_80,code=sm_80 -O2" \
# && make -j 100 \
# && make install
# torchvision (FFMPEG dependency is installed through conda)
RUN git clone --depth 1 --branch v0.13.1 https://github.com/pytorch/vision.git vision \
&& cd vision \
# remove the existing torchvision
&& conda run --no-capture-output -n video-rec pip uninstall --yes torchvision \
&& conda run --no-capture-output -n video-rec python3 setup.py install
# Set a default shell for the Dev container extension
ENV SHELL /bin/bash
and here is my devcontainer.json:
{
"dockerComposeFile": "../commands/develop/docker-compose.yml",
"service": "develop",
"workspaceFolder": "/home/user/app",
"shutdownAction": "stopCompose",
"updateRemoteUserUID": true
}
Lastly, here is my compose file:
version: "3.8"
services:
develop:
build: ../..
stdin_open: true
volumes:
- ../..:/home/user/app
- ${DATA_DIR}:/data
environment:
WANDB_API_KEY: ${WANDB_API_KEY}
user: user
# This removes a certain memory restriction ??
# w/o the container crashes
ipc: host
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
What can I do to let me edit files on the host machine from inside my dev container without ruining their permissions?

what is the lightest docker image to be used for automation test?

I need to create a docker image to run the UI automation test in headless mode.
it should contain:
NodeJs, JDK, chrome browser.
I have created the one below, which is 1.6 GB, is there a better way to make it lighter and optimized
FROM node:slim
ENV DEBIAN_FRONTEND noninteractive
WORKDIR /project
#=============================
# Install Dependenices
#=============================
SHELL ["/bin/bash", "-c"]
RUN apt update && apt install -y wget bzip2 openjdk-11-jre xvfb libnotify-dev
#==============================
# install chrome
#==============================
RUN wget https://dl.google.com/linux/direct/${CHROME_PACKAGE} && \
dpkg-deb -x ${CHROME_PACKAGE} / && \
apt-get install -f -y
#=========================
# Copying Scripts to root
#=========================
COPY . /project
RUN chmod a+x ./execute_test.sh
#=======================
# framework entry point
#=======================
CMD [ "/bin/bash" ]
Your build will fail if not use SHELL ["/bin/bash", "-c"]? Otherwise eliminate this line can save you a layer. You can combine the 2 RUN into one which save you a few more. Then try --squash flag when building your image. Note your docker daemon needs to have experiment enabled to use this flag. You should get a smaller image using these steps.

Dockerfile cannot find executable script (no such file or directory)

I'm writting a Dockerfile in order to create an image for a web server (a shiny server more precisely). It works well, but it depends on a huge database folder (db/) that it is not distributed with the package, so I want to do all this preprocessing while creating the image, by running the corresponding script in the Dockerfile.
I expected this to be simple, but I'm struggling figuring out where my files are being located within the image.
This repo has the following structure:
Dockerfile
preprocessing_files
configuration_files
app/
application_files
db/
processed_files
So that app/db/ does not exist, but is created and filled with files when preprocessing_files are run.
The Dockerfile is the following:
# Install R version 3.6
FROM r-base:3.6.0
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxml2-dev \
libxt-dev \
libssl-dev
# Download and install ShinyServer (latest version)
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
# Install R packages that are required
RUN R -e "install.packages(c('shiny', 'flexdashboard','rmarkdown','tidyverse','plotly','DT','drc','gridExtra','fitdistrplus'), repos='http://cran.rstudio.com/')"
# Copy configuration files into the Docker image
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /app /srv/shiny-server/
COPY /app/db /srv/shiny-server/app/
# Make the ShinyApp available at port 80
EXPOSE 80
CMD ["/usr/bin/shiny-server"]
This above file works well if preprocessing_files are run in advance, so app/application_files can successfully read app/db/processed_files. How could this script be run in the Dockerfile? To me the intuitive solution would be simply to write:
RUN bash -c "preprocessing.sh"
Before the ADD instruction, but then preprocessing_files are not found. If the above instruction is written below ADD and also WORKDIR app/, the same error happens. I cannot understand why.
You cannot execute code on the host machine from Dockerfile. RUN command executes inside the container being built. You can:
Copy preprocessing_files inside docker container and run preprocessing.sh inside the container (this would increase size of the container)
Create a makefile/build.sh script which launches preprocessing.sh before executing docker build

Up to date (lts-13.0) minimal base docker image for haskell/stack?

I'm would like to deploy my haskell application on docker and the base image fco/stack-build that I've found takes 9GB ! Do you know a base image more minimal than that ??
stack-build is as large as it is, because it contains the required system dependencies of all packages on Stackage.
I am using the following base image for building and deploying:
FROM ubuntu:18.04
RUN apt-get update
# Build dependencies
RUN apt-get install --assume-yes curl
RUN curl -sSL https://get.haskellstack.org/ | sh
RUN apt-get install --assume-yes libtinfo-dev
# Without this haddock crashes for modules containing
# non-ASCII characters.
ENV LANG C.UTF-8
It's not really minimal if you just want to use the image during runtime as you wouldn't need stack in that case.
Firstly, that image might only be necessary to build the excutable, once you have it built you could use a multistage docker build or just copy the executable directly to a more slim image.
The dockerfile is available here: https://github.com/commercialhaskell/stack/blob/master/etc/dockerfiles/stack-build/lts-13.0/Dockerfile
You could remove these commands (which probably add up to the bulk of the size):
# Use Stackage's debian-bootstrap.sh script to install system libraries and
# tools required to build any Stackage package.
#
RUN apt-get update && \
apt-get install -y wget && \
wget -qO- https://raw.githubusercontent.com/fpco/stackage/$BOOTSTRAP_COMMIT/debian-bootstrap.sh | bash && \
rm -rf /var/lib/apt/lists/*

Can't build openjdk:8-jdk image directly

I'm slowly making my way through the Riot Taking Control of your Docker Image tutorial http://engineering.riotgames.com/news/taking-control-your-docker-image. This tutorial is a little old, so there are some definite changes to how the end file looks. After hitting several walls I decided to work in the opposite order of the tutorial. I successfully folded the official jenkinsci image into my personal Dockerfile, starting with FROM: openjdk:8-dk. But when I try to fold in the openjdk:8-dk file into my personal image I receive the following error
E: Version '8u102-b14.1-1~bpo8+1' for 'openjdk-8-jdk' was not found
ERROR: Service 'jenkinsmaster' failed to build: The command '/bin/sh
-c set -x && apt-get update && apt-get install -y openjdk-8-jdk="$JAVA_DEBIAN_VERSION"
ca-certificates-java="$CA_CERTIFICATES_JAVA_VERSION" && rm -rf
/var/lib/apt/lists/* && [ "$JAVA_HOME" = "$(docker-java-home)" ]'
returned a non-zero code: 100 Cosettes-MacBook-Pro:docker-test
Cosette$
I'm receiving this error even when I gave up and directly copied and pasted the openjdk:8-jdk Dockerfile into my own. My end goal is to bring my personal Dockerfile down to the point that it starts FROM debian-jessie. Any help would be appreciated.
My Dockerfile:
FROM buildpack-deps:jessie-scm
# A few problems with compiling Java from source:
# 1. Oracle. Licensing prevents us from redistributing the official JDK.
# 2. Compiling OpenJDK also requires the JDK to be installed, and it gets
# really hairy.
RUN apt-get update && apt-get install -y --no-install-recommends \
bzip2 \
unzip \
xz-utils \
&& rm -rf /var/lib/apt/lists/*
RUN echo 'deb http://deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list
# Default to UTF-8 file.encoding
ENV LANG C.UTF-8
# add a simple script that can auto-detect the appropriate JAVA_HOME value
# based on whether the JDK or only the JRE is installed
RUN { \
echo '#!/bin/sh'; \
echo 'set -e'; \
echo; \
echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
} > /usr/local/bin/docker-java-home \
&& chmod +x /usr/local/bin/docker-java-home
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV JAVA_VERSION 8u102
ENV JAVA_DEBIAN_VERSION 8u102-b14.1-1~bpo8+1
# see https://bugs.debian.org/775775
# and https://github.com/docker-library/java/issues/19#issuecomment-70546872
ENV CA_CERTIFICATES_JAVA_VERSION 20140324
RUN set -x \
&& apt-get update \
&& apt-get install -y \
openjdk-8-jdk="$JAVA_DEBIAN_VERSION" \
ca-certificates-java="$CA_CERTIFICATES_JAVA_VERSION" \
&& rm -rf /var/lib/apt/lists/* \
&& [ "$JAVA_HOME" = "$(docker-java-home)" ]
# see CA_CERTIFICATES_JAVA_VERSION notes above
RUN /var/lib/dpkg/info/ca-certificates-java.postinst configure
# Jenkins Specifics
# install Tini
ENV TINI_VERSION 0.9.0
ENV TINI_SHA fa23d1e20732501c3bb8eeeca423c89ac80ed452
# Use tini as subreaper in Docker container to adopt zombie processes
RUN curl -fsSL https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini-static -o /bin/tini && chmod +x /bin/tini \
&& echo "$TINI_SHA /bin/tini" | sha1sum -c -
# Set Jenkins Environmental Variables
ENV JENKINS_HOME /var/jenkins_home
ENV JENKINS_SLAVE_AGENT_PORT 50000
# jenkins version being bundled in this docker image
ARG JENKINS_VERSION
ENV JENKINS_VERSION ${JENKINS_VERSION:-2.19.1}
# jenkins.war checksum, download will be validated using it
ARG JENKINS_SHA=dc28b91e553c1cd42cc30bd75d0f651671e6de0b
ENV JENKINS_UC https://updates.jenkins.io
ENV COPY_REFERENCE_FILE_LOG $JENKINS_HOME/copy_reference_file.log
ENV JAVA_OPTS="-Xmx8192m"
ENV JENKINS_OPTS="--handlerCountMax=300 --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war"
# Can be used to customize where jenkins.war get downloaded from
ARG JENKINS_URL=http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/${JENKINS_VERSION}/jenkins-war-${JENKINS_VERSION}.war
ARG user=jenkins
ARG group=jenkins
ARG uid=1000
ARG gid=1000
# Jenkins is run with user `jenkins`, uid = 1000. If you bind mount a volume from the host or a data
# container, ensure you use the same uid.
RUN groupadd -g ${gid} ${group} \
&& useradd -d "$JENKINS_HOME" -u ${uid} -g ${gid} -m -s /bin/bash ${user}
# Jenkins home directory is a volume, so configuration and build history
# can be persisted and survive image upgrades
VOLUME /var/jenkins_home
# `/usr/share/jenkins/ref/` contains all reference configuration we want
# to set on a fresh new installation. Use it to bundle additional plugins
# or config file with your custom jenkins Docker image.
RUN mkdir -p /usr/share/jenkins/ref/init.groovy.d
# Install Jenkins. Could use ADD but this one does not check Last-Modified header neither does it
# allow to control checksum. see https://github.com/docker/docker/issues/8331
RUN curl -fsSL ${JENKINS_URL} -o /usr/share/jenkins/jenkins.war \
&& echo "${JENKINS_SHA} /usr/share/jenkins/jenkins.war" | sha1sum -c -
# Prep Jenkins Directories
USER root
RUN chown -R ${user} "$JENKINS_HOME" /usr/share/jenkins/ref
RUN mkdir /var/log/jenkins
RUN mkdir /var/cache/jenkins
RUN chown -R ${group}:${user} /var/log/jenkins
RUN chown -R ${group}:${user} /var/cache/jenkins
# Expose ports for web (8080) & node (50000) agents
EXPOSE 8080
EXPOSE 50000
# Copy in local config filesfiles
COPY init.groovy /usr/share/jenkins/ref/init.groovy.d/tcp-slave-agent-port.groovy
COPY jenkins-support /usr/local/bin/jenkins-support
COPY jenkins.sh /usr/local/bin/jenkins.sh
# NOTE : Just set pluginID to download latest version of plugin.
# NOTE : All plugins need to be listed as there is no transitive dependency resolution.
# from a derived Dockerfile, can use `RUN plugins.sh active.txt` to setup
# /usr/share/jenkins/ref/plugins from a support bundle
COPY plugins.sh /usr/local/bin/plugins.sh
RUN chmod +x /usr/local/bin/plugins.sh
RUN chmod +x /usr/local/bin/jenkins.sh
# Switch to the jenkins user
USER ${user}
# Tini as the entry point to manage zombie processes
ENTRYPOINT ["/bin/tini", "--", "/usr/local/bin/jenkins.sh"]
Try a JAVA_DEBIAN_VERSION of 8u111-b14-2~bpo8+1
Here's what happens: when you build the docker file, docker tries to execute all the lines in the dockerfile. One of those is this apt command: apt-get install -y openjdk-8-jdk="$JAVA_DEBIAN_VERSION". This comand says "Install OpenJDK version $JAVA_DEBIAN_VERSION, exactly. Nothing else.". This version is no longer available in Debian repositories, so it can't be apt-get installed! I believe this happens with all packages in official mirrors: if a new version of the package is released, the older version is no longer around to be installed.
If you want to access older Debian packages, you can use something like http://snapshot.debian.org/. The older OpenJDK package has known security vulnerabilities. I recommend using the latest version.
You can use the latest version by leaving out the explicit version in the apt-get command. On the other hand, this will make your image less reproducible: building the image today may get you u111, building it tomorrow may get you u112.
As for why the instructions worked in the other Dockerfile, I think the reason is that at the time the other Dockerfile was built, the package was available. So docker could apt-get install it. Docker then built the image containing the (older) OpenJDK. That image is a binary, so you can install it, or use it in FROM without any issues. But you can't reproduce the image: if you were to try and build the same image yourself, you would run into the same errors.
This also brings up an issue about security updates: since docker images are effectively static binaries (built once, bundle in all dependencies), they don't get security updates once built. You need to keep track of any security updates affecting your docker images and rebuild any affected docker images.

Resources