I am building a Lambda function to denoise audio files. Python's soundfile package uses the libsndfile system dependency, which I am installing via apt in my Dockerfile. The container runs fine locally, but after deploying to Lambda the invocation fails with:
{ "errorMessage": "sndfile library not found", "errorType": "OSError" }
Here is my Dockerfile:
# Define global args
ARG FUNCTION_DIR="/home/app"
ARG RUNTIME_VERSION="3.7"
# Stage 1 - bundle base image + runtime
# Grab a fresh copy of the image and install GCC if not already installed (on Debian it is already installed)
FROM python:${RUNTIME_VERSION} AS python-3.7
# Stage 2 - build function and dependencies
FROM python-3.7 AS build-image
# Install aws-lambda-cpp build dependencies (on Debian they're already installed)
RUN apt-get update && apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy handler function
COPY app/requirements.txt ${FUNCTION_DIR}/app/requirements.txt
# Optional – Install the function's dependencies
RUN python${RUNTIME_VERSION} -m pip install -r ${FUNCTION_DIR}/app/requirements.txt --target ${FUNCTION_DIR}
# Install Lambda Runtime Interface Client for Python
# RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-3.7
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# Install librosa system dependencies
RUN apt-get update -y && apt-get install -y \
libsndfile1 \
ffmpeg
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
# ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
# COPY entry.sh /
COPY app ${FUNCTION_DIR}/app
ENV NUMBA_CACHE_DIR=/tmp
RUN ln -s /usr/lib/x86_64-linux-gnu/libsndfile.so.1 /usr/local/bin/libsndfile.so.1
# enable below for local testing
# COPY events ${FUNCTION_DIR}/events
# COPY .env ${FUNCTION_DIR}
# RUN chmod 755 /usr/bin/aws-lambda-rie /entry.sh
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler.lambda_handler" ]
Below is my Lambda config:
{
"FunctionName": "DenoiseAudio",
"FunctionArn": "arn:aws:lambda:us-east-1:xxxx:function:DenoiseAudio",
"Role": "arn:aws:iam::xxxx:role/lambda-s3-role",
"CodeSize": 0,
"Description": "",
"Timeout": 120,
"MemorySize": 128,
"LastModified": "2021-01-25T13:41:00.000+0000",
"CodeSha256": "84ae6e6e475cad50ae5176d6176de09e95a74d0e1cfab3df7cf66a41f65e4e19",
"Version": "$LATEST",
"TracingConfig": {
"Mode": "PassThrough"
},
"RevisionId": "43c6e7c4-27a8-4c6d-8c32-c1e074d40a62",
"State": "Active",
"LastUpdateStatus": "Successful",
"PackageType": "Image",
"ImageConfigResponse": {}
}
It's resolved now. It turns out that the error message was misleading: it was happening because the Lambda function was not assigned enough memory. As soon as I assigned enough memory the problem went away.
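For reference, the memory can be raised from the 128 MB shown in the config above with the AWS CLI; a minimal sketch (the 1024 MB value is only an illustrative choice):
aws lambda update-function-configuration \
  --function-name DenoiseAudio \
  --memory-size 1024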
It looks like you are not building from one of the AWS managed base images, which include the runtime interface client, and you are not adding it manually either. My recommendation is to start with the Python 3.7 or Python 3.8 base image (both found here), then add the other libraries as needed. I would also encourage you to look at the AWS SAM implementation of container image support for Lambda functions. It will make your life easier :). You can find a video demonstration of it here: https://youtu.be/3tMc5r8gLQ8?t=1390.
If you still want to build an image from scratch, take a look at the documents found here that walk you through the requirements.
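For comparison, a minimal sketch of the managed-base-image approach, assuming the handler lives at app/handler.py; the yum package name for libsndfile is an assumption and may need adjusting for the Amazon Linux repos:
FROM public.ecr.aws/lambda/python:3.8
# System library needed by soundfile (availability in the Amazon Linux repos is an assumption)
RUN yum install -y libsndfile
# Install Python dependencies into the Lambda task root
COPY app/requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy the function code
COPY app/ ${LAMBDA_TASK_ROOT}/app/
CMD [ "app.handler.lambda_handler" ]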
Related
I have a web application that's a hybrid of JS/NPM/Webpack for the frontend and Python/Django for the backend. The backend code and the source code for the frontend are stored in the code repository, but the "compiled" frontend code is not, as the expectation is that Webpack will build it after deployment.
Currently, I have the following package.json:
{
"name": "Name",
"description": "...",
"scripts": {
"start": "npx webpack --config webpack.config.js"
},
"engines": {
"npm": ">=8.11.0",
"node": ">=16.15.1"
},
"devDependencies": {
[...]
},
"dependencies": {
[...]
}
}
The app is deployed to Google Cloud Run via the deploy command, specifically:
/gcloud/google-cloud-sdk/bin/gcloud run deploy [SERVICE-NAME] --source . --region us-west1 --allow-unauthenticated
However, the command npx webpack --config webpack.config.js is apparently never executed as the built files are not generated. Django returns the error:
Error reading /app/webpack-stats.json. Are you sure webpack has generated the file and the path is correct?
What's the most elegant/efficient way to execute the build command in production? Should I include it in the Dockerfile via RUN npx webpack --config webpack.config.js? I'm not even sure that would work.
Edit 1:
My current Dockerfile:
# Base image is one of Python's official distributions.
FROM python:3.8.13-slim-buster
# Declare generic app variables.
ENV APP_ENVIRONMENT=Dev
# Update and install libraries.
RUN apt update
RUN apt -y install \
sudo \
curl \
install-info \
git-all \
gnupg \
lsb-release
# Install nodejs.
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
RUN sudo apt install -y nodejs
RUN npx webpack --config webpack.config.js
# Copy local code to the container image. This is necessary
# for the installation on Cloud Run to work.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Handle requirements.txt first so that we don't need to re-install our
# python dependencies every time we rebuild the Dockerfile.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
# Note that the $PORT variable is available by default on Cloud Run.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 --chdir project/ backbone.wsgi:application
According to your error message (Error reading /app/webpack-stats.json), there is a reference to the /app directory in webpack.config.js. That could be the problem, because at the point where you run webpack the directory does not exist yet. Try running the npx webpack command after WORKDIR /app (and after the COPY that brings the source, including webpack.config.js, into the image).
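A minimal sketch of the reordered tail of the Dockerfile, assuming webpack and its plugins are declared in package.json (the npm install step is an assumption, since the original Dockerfile never installs the JS dependencies):
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
# Install the JS dependencies declared in package.json, then build the bundle
RUN npm install
RUN npx webpack --config webpack.config.js
# Python dependencies as before
RUN pip install --no-cache-dir -r requirements.txt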
I am trying to create a Python-based image with some packages installed, but I want the image layers not to show anything about the packages I installed.
I am trying to use the multistage build
e.g.:
FROM python:3.9-slim-buster as builder
RUN pip install django # (I don't want this command to be seen when checking the image layers; that's why I'm using a multi-stage build)
FROM python:3.9-slim-buster
# Here i want to copy all the site packages
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
Now build the image:
docker build -t python_3.9-slim-buster_custom:latest .
and later check the image layers:
dive python_3.9-slim-buster_custom:latest
This will not show the RUN pip install django line.
Is this a good way to achieve what I want (hiding all the pip install commands)?
Whether this is sufficient depends on what you are installing. Some Python libraries add binaries to your system that they rely on.
FROM python:3.9-alpine as builder
# install stuff
FROM python:3.9-alpine
# this is for sure required
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
# this depends on what you are installing
COPY --from=builder /usr/local/bin /usr/local/bin
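To check whether a package drops binaries into /usr/local/bin, you can build just the builder stage and list that directory; a small sketch with an illustrative tag:
docker build --target builder -t myapp-builder .
docker run --rm myapp-builder ls /usr/local/bin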
The usual approach I see for this is to use a virtual environment in an earlier build stage, then copy the entire virtual environment into the final image. Remember that virtual environments are very specific to a single Python build and installation path.
If your application has its own setup.cfg or setup.py file, then a minimal version of this could look like:
FROM python:3.9-slim-buster as builder
# If you need build-only tools, like build-essential for Python C
# extensions, install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
WORKDIR /src
# Create and "activate" the virtual environment
RUN python3 -m venv /app
ENV PATH=/app/bin:$PATH
# Install the application as normal
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN pip install .
FROM python:3.9-slim-buster
# If you need runtime libraries, like a database client C library,
# install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
# Copy the entire virtual environment over
COPY --from=builder /app /app
ENV PATH=/app/bin:$PATH
# Run an entry_points script from the setup.cfg as the main command
CMD ["my_app"]
Note that this has only minimal protection against a curious user seeing what's in the image. The docker history or docker inspect output will show the /app container directory, you can run docker run --rm the-image pip list to see the package dependencies, and the application and library source will be present in human-readable form.
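For illustration, the inspection commands referred to above, using the image tag from the question:
docker history python_3.9-slim-buster_custom:latest
docker inspect python_3.9-slim-buster_custom:latest
docker run --rm python_3.9-slim-buster_custom:latest pip list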
Currently, what's working for me is:
FROM python:3.9-slim-buster as builder
# DO ALL YOUR STUFF HERE
FROM python:3.9-slim-buster
COPY --from=builder / /
Can anyone explain the advantages of multi-stage builds, especially what is going on here in this specific Dockerfile example?
ref: the section titled "Create an image from an alternative base image" in the AWS Lambda docs: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#images-create-from-alt
Question:
What advantage does this approach have:
FROM python:buster as build-image
ARG FUNCTION_DIR="/function"
<instructions>
FROM python:buster
# Copy in the build image dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
<more instructions>
over this approach
FROM python:buster
<all needed instructions>
I'm not seeing the advantage, or why this approach would be taken, but I don't doubt there is one.
Copy of the Dockerfile from the link above:
# Define function directory
ARG FUNCTION_DIR="/function"
FROM python:buster as build-image
# Install aws-lambda-cpp build dependencies
RUN apt-get update && \
apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy function code
COPY app/* ${FUNCTION_DIR}
# Install the runtime interface client
RUN pip install \
--target ${FUNCTION_DIR} \
awslambdaric
# Multi-stage build: grab a fresh copy of the base image
FROM python:buster
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the build image dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler" ]
A complete C toolchain is quite large; as a rough guess, a Debian-based python image probably triples in size if you install g++, make, and the required C header files. You only need this toolchain to build C extensions, but you don't need it once they're built.
So the multi-stage build here runs in two parts:
Install the full C toolchain. Build Python packages with C extensions and put them in a known directory. This is very large but is only an intermediate step.
Start from a plain Python image, without the toolchain, and copy the built libraries from the first image. This is more moderately sized and is what is eventually run and redistributed.
This is a very typical use of a multi-stage build, and you see similarities to it in many languages. In Go you can have a first stage that builds a statically-linked binary and then a final stage that only contains the binary and not the compiler; in Node you can have a first stage that includes things like the Typescript compiler and a final stage that only includes production dependencies; and so on.
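As an illustration of the Go analogy, a minimal sketch (module layout and names are hypothetical):
# Stage 1: full Go toolchain, builds a statically-linked binary
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/myapp .
# Stage 2: small runtime image containing only the binary, not the compiler
FROM alpine:3.19
COPY --from=build /bin/myapp /usr/local/bin/myapp
CMD ["myapp"]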
I am trying to dockerise my elm application (code is open source), here is my Dockerfile:
# set base image as alpine
FROM alpine:3.11.2 AS builder
# download the elm compiler and extract it to /usr/local/bin/elm
RUN wget -O - 'https://github.com/elm/compiler/releases/download/0.19.1/binary-for-linux-64-bit.gz' \
| gunzip -c >/usr/local/bin/elm
# make the elm compiler executable
RUN chmod +x /usr/local/bin/elm
# update remote repositories
RUN apk update
# install nodejs
RUN apk add --update nodejs npm
# install uglifyjs
RUN npm install uglify-js --global
# set the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD
# instructions that follow the WORKDIR instruction.
WORKDIR /app
# remember, our current working directory within the container is /app
# we now copy everything (except stuff listed in .dockerignore)
# from local machine to /app (in the container).
COPY . .
# build elm production code
RUN elm make src/app/Main.elm --optimize --output=elm.js
When I run docker build . --no-cache I get the following error:
ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = , addrCanonName = }, host name: Just "package.elm-lang.org", service name: Just "443"): does not exist (Try again)
I don't have any connection issues, plus if I did have any, then you'd think the install of nodejs and uglifyjs would also fail, correct? Yet those install without any problems.
I'm confused and not really sure what I need to do.
Looks like this is an OS-level networking problem. An easy hack would be to wrap the elm make command in an infinite retry loop (in a separate script) and run that script instead:
#!/usr/bin/env bash
while :
do
  elm make src/app/Main.elm --optimize --output=elm.js
  [ $? -eq 0 ] && exit # exit if above command is successful
done
And in your Dockerfile, change the last line to
# build elm production code
RUN ./retry.sh
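Note that retry.sh must be present in the image and executable; the COPY . . step above already brings it in if it sits in the build context, so the tail of the Dockerfile could look like this (a sketch, assuming the script is named retry.sh):
# make the retry script executable, then build elm production code with retries
RUN chmod +x retry.sh && ./retry.sh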
I am trying to use a Python wrapper for a Java library called Tabula. I need both Python and Java within my Docker container. I am using the openjdk:8 and python:3.5.3 images. I am trying to build the image using docker-compose, but it returns the following message:
/bin/sh: 1: java: not found
when it reaches the line RUN java -version within the Dockerfile. The line RUN find / -name "java" also doesn't return anything, so I can't even find where Java is being installed in the Docker environment.
Here is my Dockerfile:
FROM python:3.5.3
FROM openjdk:8
FROM tailordev/pandas
RUN apt-get update && apt-get install -y \
python3-pip
# Create code directory
ENV APP_HOME /usr/src/app
RUN mkdir -p $APP_HOME/temp
WORKDIR /$APP_HOME
# Install app dependencies
ADD requirements.txt $APP_HOME
RUN pip3 install -r requirements.txt
# Copy source code
COPY *.py $APP_HOME/
RUN find / -name "java"
RUN java -version
ENTRYPOINT [ "python3", "runner.py" ]
How do I install Java within the Docker container so that the Python wrapper class can invoke Java methods?
This Dockerfile cannot work, because the multiple FROM statements at the beginning don't mean what you think they mean. They don't mean that all the contents of the images you refer to will somehow end up in the image you're building; the instruction has actually meant two different things throughout the history of Docker:
In newer versions of Docker: multi-stage builds, which are a very different thing from what you're trying to achieve (but very interesting nonetheless).
In earlier versions of Docker: the ability to simply build multiple images in one Dockerfile.
The behavior you are describing makes me assume you are using such an earlier version. Let me explain what's actually happening when you run docker build on this Dockerfile:
FROM python:3.5.3
# Docker: "The user wants me to build an image based on python:3.5.3. No problem!"
# Docker: "Ah, the next FROM statement is coming up, which means the user is done with building this image."
FROM openjdk:8
# Docker: "The user wants me to build an image based on openjdk:8. No problem!"
# Docker: "Ah, the next FROM statement is coming up, which means the user is done with building this image."
FROM tailordev/pandas
# Docker: "The user wants me to build an image based on tailordev/pandas. No problem!"
# Docker: "A RUN statement is coming up. I'll put it as a layer in the image the user is asking me to build."
RUN apt-get update && apt-get install -y \
    python3-pip
...
# Docker: "EOF reached, nothing more to do!"
As you can see, this is not what you want.
What you should do instead is build a single image in which you first install your runtimes (Python, Java, ...), and then your application-specific dependencies. The last two parts you're already doing; here's how you could go about installing your general dependencies:
# Let's start from the Alpine Java Image
FROM openjdk:8-jre-alpine
# Install Python runtime
RUN apk add --update \
python \
python-dev \
py-pip \
build-base \
&& pip install virtualenv \
&& rm -rf /var/cache/apk/*
# Install your framework dependencies
RUN pip install numpy scipy pandas
... do the rest ...
Note that I haven't tested the above snippet; you may have to adapt a few things.
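Once the combined image builds, a quick way to confirm both runtimes are present (the image tag is illustrative):
docker build -t tabula-wrapper .
docker run --rm --entrypoint sh tabula-wrapper -c "java -version && python --version"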