How to locate openjdk in a Docker container?

I tried to run a pyspark application. For this, I first installed pyspark from pip, then pulled openjdk:8 to set the JAVA_HOME variable.
Dockerfile:
FROM python:3
ADD my_script.py /
COPY requirements.txt ./
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python", "./my_script.py" ]
my_script.py:
from pyspark import SparkContext
from pyspark import SparkConf
#spark conf
conf1 = SparkConf()
conf1.setMaster("local[*]")
conf1.setAppName('hamza')
print(conf1)
sc = SparkContext(conf = conf1)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
print(sqlContext)
requirements.txt:
pyspark
numpy
Getting this error :
C:\Users\hrafiq\Desktop\sample>docker run -it --rm --name data2 my-python-app
<pyspark.conf.SparkConf object at 0x7f4bd933ba58>
/usr/local/lib/python3.7/site-packages/pyspark/bin/spark-class: line 71:
/usr/lib/jvm/java-8-openjdk-amd64//bin/java: No such file or directory
Traceback (most recent call last):
File "./my_script.py", line 14, in <module>
sc = SparkContext(conf = conf1)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
So the question is: if it is not finding the java file, then how will I find that file? I know it is stored in some virtual hard disk which we don't have access to.
Any help would be appreciated
Thanks

Setting the JAVA_HOME env var is not enough. You need to actually install openjdk inside your docker image.
Your base image (python:3) is itself based on a Debian Stretch image, so you can use apt-get install to fetch the JDK:
FROM python:3
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk-headless && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY my_script.py ./
CMD [ "python", "./my_script.py" ]
(In the above, I have optimized the layer ordering so that you won't need to rebuild the pip install layer when only your script changes.)
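To see why the original image fails, note that pyspark's spark-class script simply executes $JAVA_HOME/bin/java; when that file does not exist, the gateway process dies before reporting a port, which is exactly the exception above. A minimal Python sketch of that existence check (the helper name is made up for illustration):

```python
import os

def java_home_has_java(java_home: str) -> bool:
    """Return True if java_home contains a bin/java file, which is
    what pyspark's spark-class script tries to execute."""
    return os.path.isfile(os.path.join(java_home, "bin", "java"))

# Before the JDK is installed in the image, this check fails:
print(java_home_has_java("/usr/lib/jvm/java-8-openjdk-amd64/"))
```

Once openjdk-8-jdk-headless is installed, the path exists and the gateway can start.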

Related

Cannot install private dependency from artifact registry inside docker build

I am trying to install a private Python package that was uploaded to an Artifact Registry inside a docker container (to deploy it on Cloud Run).
I have successfully used that package in a Cloud Function in the past, so I am sure the package works.
cloudbuild.yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
args: [ 'build', '-t', 'gcr.io/${_PROJECT}/${_SERVICE_NAME}:$SHORT_SHA', '--network=cloudbuild', '.', '--progress=plain']
Dockerfile
FROM python:3.8.6-slim-buster
ENV APP_PATH=/usr/src/app
ENV PORT=8080
# Copy requirements.txt to the docker image and install packages
RUN apt-get update && apt-get install -y cython
RUN pip install --upgrade pip
# Set the WORKDIR to be the folder
RUN mkdir -p $APP_PATH
COPY / $APP_PATH
WORKDIR $APP_PATH
RUN pip install -r requirements.txt --no-color
RUN pip install --extra-index-url https://us-west1-python.pkg.dev/my-project/my-package/simple/ my-package==0.2.3 # This line is where the bug occurs
# Expose port
EXPOSE $PORT
# Use gunicorn as the entrypoint
CMD exec gunicorn --bind 0.0.0.0:8080 app:app
The permissions I added are:
cloudbuild default service account (project-number@cloudbuild.gserviceaccount.com): Artifact Registry Reader
service account running the cloudbuild: Artifact Registry Reader
service account running the app: Artifact Registry Reader
The cloudbuild error:
Step 10/12 : RUN pip install --extra-index-url https://us-west1-python.pkg.dev/my-project/my-package/simple/ my-package==0.2.3
---> Running in b2ead00ccdf4
Looking in indexes: https://pypi.org/simple, https://us-west1-python.pkg.dev/muse-speech-devops/gcp-utils/simple/
User for us-west1-python.pkg.dev: ERROR: Exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
status = run_func(*args)
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
return func(self, options, args)
File "/usr/local/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 340, in run
requirement_set = resolver.resolve(
File "/usr/local/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
result = self._result = resolver.resolve(
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 348, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
if not criterion.candidates:
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
From your traceback log, we can see that Cloud Build doesn't have the credentials to authenticate to the private repo:
Step 10/12 : RUN pip install --extra-index-url https://us-west1-python.pkg.dev/my-project/my-package/simple/ my-package==0.2.3
---> Running in b2ead00ccdf4
Looking in indexes: https://pypi.org/simple, https://us-west1-python.pkg.dev/muse-speech-devops/gcp-utils/simple/
User for us-west1-python.pkg.dev: ERROR: Exception: // <- asking for username
I uploaded a simple package to a private Artifact Registry repo to test this out when building a container and also received the same message. Since you seem to be authenticating with a service account key, the username and password will need to be stored inside pip.conf:
pip.conf
[global]
extra-index-url = https://_json_key_base64:KEY@LOCATION-python.pkg.dev/PROJECT/REPOSITORY/simple/
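KEY here stands for the service account's JSON key file, base64-encoded onto a single line. A small sketch of producing that value with Python's standard library (the stand-in key body below is obviously not a real key):

```python
import base64

def encode_key(json_key_bytes: bytes) -> str:
    """Base64-encode a service account JSON key so it can be embedded
    as the _json_key_base64 credential in an index URL."""
    return base64.b64encode(json_key_bytes).decode("ascii")

# Example with a stand-in key body; a real key is a full JSON document:
print(encode_key(b'{"type": "service_account"}'))
```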
This file therefore needs to be available during the build process. Multi-stage docker builds are very useful here to ensure the configuration keys are not exposed, since we can choose what files make it into the final image (configuration keys would only be present while used to download the packages from the private repo):
Sample Dockerfile
# Installing packages in a separate image
FROM python:3.8.6-slim-buster as pkg-build
# Target Python environment variable to bind to pip.conf
ENV PIP_CONFIG_FILE /pip.conf
WORKDIR /packages/
COPY requirements.txt /
# Copying the pip.conf key file only during package downloading
COPY ./config/pip.conf /pip.conf
# Packages are downloaded to the /packages/ directory
RUN pip download -r /requirements.txt
RUN pip download --extra-index-url https://LOCATION-python.pkg.dev/PROJECT/REPO/simple/ PACKAGES
# Final image that will be deployed
FROM python:3.8.6-slim-buster
ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR /packages/
# Copying ONLY the packages from the previous build
COPY --from=pkg-build /packages/ /packages/
# Installing the packages from the copied files
RUN pip install --no-index --find-links=/packages/ /packages/*
WORKDIR $APP_HOME
COPY ./src/main.py ./
# Executing sample flask web app
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
I based the dockerfile above on this related thread, and I could confirm the packages were correctly downloaded from my private Artifact Registry repo, and also that the pip.conf file was not present in the resulting image.

Dockerfile for a FastAPI app using the factory pattern with uvicorn

I am building a back-end service for a full-stack application using FastAPI and uvicorn.
src/asgi.py
import uvicorn
from src import create_app
app = create_app()
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", log_level="info", reload=True)
src/init.py
from fastapi import FastAPI
from src.api.v1.auth import auth_router
from src.core.config import *
def create_app() -> FastAPI:
root_app = FastAPI()
root_app.include_router(
auth_router,
prefix="/api/v1",
tags=["auth"],
)
return root_app
Dockerfile
FROM python:3.9
RUN mkdir /app
WORKDIR /app
RUN apt update && \
apt install -y postgresql-client
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
How I am building and running:
docker build -t travian-back:v1 .
travian-back:v1 uvicorn asgi:app
There is no error at all; the server is up at http://127.0.0.1:8000
Now I am trying to add the uvicorn asgi:app command directly to my Dockerfile. The reason is that I am going to use docker-compose in the end, and it would be easier. This is what I have now:
Dockerfile
FROM python:3.9
RUN mkdir /app
WORKDIR /app
RUN apt update && \
apt install -y postgresql-client
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "asgi:app"]
Now, instead of running travian-back:v1 uvicorn asgi:app, I am just running travian-back:v1. I have no error when building and running my docker image, but the server can't be reached at http://127.0.0.1:8000
The thing is that you don't run the asgi file as __main__, since you use uvicorn to point at it. So it's not listening on 0.0.0.0; or better put, all those options are ignored.
Either invoke the asgi file directly, which I would not recommend, or drop the asgi file and use uvicorn with the --factory flag and point it to your app factory.
ENTRYPOINT ["uvicorn", "src.init:create_app", "--factory", "--host", "0.0.0.0"]
I am using ENTRYPOINT here so that you can pass additional flags, such as the log level, at run time without overriding this.
docker run -p 8000:8000 myapp --log-level warning
That said, I am somewhat confused by your file name init.py. Do you mean __init__.py? If so I would not put the factory in this file, __init__.py is not meant to be used like this. Put it in a file named main.py or similar.
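For intuition, an app string such as src.init:create_app is resolved as "import the module, then fetch the attribute", and --factory additionally calls that attribute to obtain the app. A simplified sketch of this resolution (not uvicorn's actual code):

```python
import importlib

def load_app(spec: str, factory: bool = False):
    """Resolve a "module:attribute" string the way an ASGI server does;
    with factory=True, the attribute is called to build the app."""
    module_name, _, attr_name = spec.partition(":")
    attr = getattr(importlib.import_module(module_name), attr_name)
    return attr() if factory else attr

# "os.path:join" resolves to the join function itself;
# with factory=True it would instead be called with no arguments.
f = load_app("os.path:join")
```

This is why "asgi:app" points at an already-built app object, while "src.init:create_app" with --factory points at the callable that builds it.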

Missing CV2 in Docker container

I've made the following Dockerfile to build a python application:
FROM python:3.7
WORKDIR /app
# Install python dependencies
ADD requirements.txt /app/
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
# Copy sources
ADD . /app
# Run detection
CMD ["detect.py" ]
ENTRYPOINT ["python3"]
The requirements.txt file contains only a few dependencies, including opencv:
opencv-python
opencv-python-headless
filterpy==1.1.0
lap==0.4.0
paho-mqtt==1.5.1
numpy
Pillow
Building the Docker image works perfectly, but when I try to run the image, I get the following error:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "detect.py", line 6, in <module>
import cv2
File "/usr/local/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
It seems like the cv2 dependency is not satisfied.
Is there something I missed when building the Dockerfile?
I have tried to replace opencv-python-headless with python3-opencv, but there is no matching distribution found.
libGL.so.1 is provided by the libgl1 package, so you could add the following to your Dockerfile:
RUN apt update; apt install -y libgl1
Typically, docker images leave out many libraries to keep the size small; these dependencies are most probably already installed on your host system.
So I usually use dpkg -S on the host system to find which package is needed, then install it in the container:
shubuntu1@shubuntu1:~$ dpkg -S libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
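If you want to check from inside Python whether a shared library is resolvable at all, before cv2 trips over it, ctypes.util.find_library offers a rough test (note it may rely on ldconfig or a compiler being present, so treat a None result as "not found or not discoverable"):

```python
import ctypes.util

def shared_lib_available(name: str) -> bool:
    """True if the dynamic linker can locate the library; e.g.
    shared_lib_available("GL") looks for libGL.so / libGL.so.1."""
    return ctypes.util.find_library(name) is not None

# Inside a slim image that is missing libgl1, this would report False:
print(shared_lib_available("GL"))
```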

docker: Error response from daemon: OCI runtime create failed:

I am trying to run my docker image, but I am receiving the following error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"python3\": executable file not found in $PATH": unknown.
ERRO[0000] error waiting for container: context canceled
Here is my Dockerfile:
FROM python:3.6-alpine3.7
RUN apk add --no-cache python3-dev \
&& pip3 install --upgrade pip
RUN apk add --no-cache --update \
python3 python3-dev gcc \
gfortran musl-dev
RUN apk add --no-cache libressl-dev musl-dev libffi-dev
RUN python3.6 -m pip install --upgrade pip
RUN apk --no-cache add git
RUN apk add mariadb-dev
WORKDIR /socialworks-api
COPY . /socialworks-api
RUN pip3 --no-cache-dir install -r requirements.txt
ENV PATH="/opt/gtk/bin:$env/development.env"
EXPOSE 5000
ENTRYPOINT ["python3"]
CMD ["app.py"]
I think the issue might be with the ENV PATH which I set. I have also tried setting it to:
ENV PATH="/opt/gtk/bin:${env/development.env}"
But I would receive the following error when building my dockerfile:
Step 11/14 : ENV PATH="/opt/gtk/bin:${env/development.env}"
failed to process "\"/opt/gtk/bin:${env/development.env}\"": missing ':' in substitution
Without setting the environment, my application won't run.
I have also tried running on my dockerfile this command:
RUN export $(grep -v '^#' ./env/development.env | xargs)
It builds successfully, but when I enter this command in the terminal:
docker run -it flaskapp
I receive an error that it is still unable to locate the env variables.
$ docker run -it flaskapp
Traceback (most recent call last):
File "app.py", line 11, in <module>
app.config.from_object("config.Config")
File "/usr/local/lib/python3.6/site-packages/flask/config.py", line 174, in from_object
obj = import_string(obj)
File "/usr/local/lib/python3.6/site-packages/werkzeug/utils.py", line 568, in import_string
__import__(import_name)
File "/socialworks-api/config.py", line 4, in <module>
class Config(object):
File "/socialworks-api/config.py", line 5, in Config
MYSQL_HOST = os.environ['MYSQL_HOST']
File "/usr/local/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'MYSQL_HOST'
In your Dockerfile you specify
ENV PATH="/opt/gtk/bin:$env/development.env"
When you later run python3, the system looks in only these two directories, and since there's no Python there, you get the error you see. It's probably in /usr/bin/python3, but the directory /usr/bin isn't in $PATH.
The simplest answer here is to delete this line entirely. Your Dockerfile creates neither a /opt/gtk nor a /development.env directory, so there are no files in either path that could be executed. If you do need to install custom software, putting it in a directory that's in the default system path (like /usr/local/bin) is a good approach.
If you do need to keep this line, make sure the existing $PATH stays as part of the value.
ENV PATH="/opt/gtk/bin:$PATH"
(Don't specify ENTRYPOINT ["python3"]; combine this into a single CMD ["python3", "app.py"]. I'd encourage you to run docker run --rm -it yourimage /bin/sh to look around the filesystem and run debugging commands like env, but the ENTRYPOINT declaration breaks this use, and having only the script file named in the CMD means it's still not the "container-as-command" pattern.)
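The PATH mechanics are easy to demonstrate in Python with shutil.which, which performs the same directory-by-directory lookup the shell does (the directory names below are made up):

```python
import os
import shutil
import tempfile

# shutil.which accepts an explicit path argument, mimicking $PATH lookup.
with tempfile.TemporaryDirectory() as fake_bin:
    exe = os.path.join(fake_bin, "python3")
    open(exe, "w").close()
    os.chmod(exe, 0o755)  # must be executable to be found

    # A PATH consisting only of a nonexistent /opt/gtk/bin finds nothing:
    print(shutil.which("python3", path="/opt/gtk/bin"))
    # Prepending the broken entry to a PATH that still contains the
    # real directory keeps the lookup working:
    print(shutil.which("python3", path="/opt/gtk/bin" + os.pathsep + fake_bin))
```

This is exactly why ENV PATH="/opt/gtk/bin:$PATH" is safe while replacing $PATH outright is not.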

Docker build for numpy, pandas giving error

I have a Dockerfile in a directory called docker_test. The structure of docker_test is as follows:
M00618927A:docker_test i854319$ ls
Dockerfile hello_world.py
My Dockerfile looks like this:
### Dockerfile
# Created by Baktaawar
# Pulling from base Python image
FROM python:3.6.7-alpine3.6
# author of file
LABEL maintainer="Baktawar"
# Set the working directory of the docker image
WORKDIR /docker_test
COPY . /docker_test
# packages that we need
RUN pip --no-cache-dir install numpy pandas jupyter
EXPOSE 8888
ENTRYPOINT ["python"]
CMD ["hello_world.py"]
I run the command
docker build -t dockerfile .
It starts the building process, but then gives the following error, failing to get numpy etc. installed in the image:
Sending build context to Docker daemon 4.096kB
Step 1/8 : FROM python:3.6.7-alpine3.6
---> 8f30079779ef
Step 2/8 : LABEL maintainer="Baktawar"
---> Running in 7cf081021b1e
Removing intermediate container 7cf081021b1e
---> 581cf24fa4e6
Step 3/8 : WORKDIR /docker_test
---> Running in 7c58855c4332
Removing intermediate container 7c58855c4332
---> dae70a34626b
Step 4/8 : COPY . /docker_test
---> 432b174b4869
Step 5/8 : RUN pip --no-cache-dir install numpy pandas jupyter
---> Running in 972efa9336ed
Collecting numpy
Downloading https://files.pythonhosted.org/packages/cf/8d/6345b4f32b37945fedc1e027e83970005fc9c699068d2f566b82826515f2/numpy-1.16.2.zip (5.1MB)
Collecting pandas
Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 357, in get_provider
module = sys.modules[moduleOrReq]
KeyError: 'numpy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-8c3o0ycd/pandas/setup.py", line 732, in <module>
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "/tmp/pip-install-8c3o0ycd/pandas/setup.py", line 475, in maybe_cythonize
numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1142, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 359, in get_provider
__import__(moduleOrReq)
ModuleNotFoundError: No module named 'numpy'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-8c3o0ycd/pandas/
You are using pip version 18.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip --no-cache-dir install numpy pandas jupyter' returned a non-zero code: 1
How can I get this basic setup done?
You basically need to install the following on alpine, in order to be able to install numpy:
apk --no-cache add musl-dev linux-headers g++
Try the following Dockerfile:
### Dockerfile
# Created by Baktawar
# Pulling from base Python image
FROM python:3.6.7-alpine3.6
# author of file
LABEL maintainer="Baktawar"
# Set the working directory of the docker image
WORKDIR /app
COPY . /app
# Install native libraries, required for numpy
RUN apk --no-cache add musl-dev linux-headers g++
# Upgrade pip
RUN pip install --upgrade pip
# packages that we need
RUN pip install numpy && \
    pip install pandas && \
    pip install jupyter
EXPOSE 8888
ENTRYPOINT ["python"]
CMD ["hello_world.py"]
You may find this gist interesting:
https://gist.github.com/orenitamar/f29fb15db3b0d13178c1c4dd611adce2
And this package on alpine is also of interest, I think:
https://pkgs.alpinelinux.org/package/edge/community/x86/py-numpy
Update
In order to tag the image properly, use the syntax:
docker build -f <dockerfile> -t <tagname:tagversion> <buildcontext>
For you, this would be:
docker build -t mypythonimage:0.1 .
