Docker container can't find package even after install

I have the following repository structure (I know the names are stupid):
dummy/
    main/
        hello.py
        requirements.txt
    yips/
        yips/
            __init__.py
            lol.py
        setup.py
    Dockerfile
The idea is to run the program hello.py, which imports a method from lol.py in the Yips library. For testing purposes, I import sklearn in lol.py despite not using it. My Dockerfile looks like the following:
FROM python:3.9-bullseye
WORKDIR /yips
COPY yips/ ./
RUN pip3 install .
WORKDIR /main
COPY ./main/ .
RUN pip3 install --no-cache-dir -r ./requirements.txt
CMD ["python3", "hello.py"]
requirements.txt has both sklearn and numpy, which is used in hello.py.
I have tried running the Docker image and it complains that it cannot find sklearn. For what it's worth, when I do not import it, everything works fine (so there is no issue with the numpy import in hello.py). I have also tried adding a direct call to pip install sklearn before installing my yips library. Does anyone have any insight on how to fix this?

You might have put sklearn (or scikit) in your requirements.txt when you should include scikit-learn: the module you import is called sklearn, but the PyPI package that provides it is scikit-learn.
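If that is the problem, the fix lives entirely in requirements.txt; a minimal corrected version might look like this (a sketch assuming hello.py and lol.py need nothing beyond numpy and scikit-learn; the import sklearn statement in the code stays as it is):
scikit-learn
numpy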

Missing CV2 in Docker container

I've made the following Dockerfile to build a python application:
FROM python:3.7
WORKDIR /app
# Install python dependencies
ADD requirements.txt /app/
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
# Copy sources
ADD . /app
# Run detection
CMD ["detect.py" ]
ENTRYPOINT ["python3"]
The requirements.txt file contains only a few dependencies, including opencv:
opencv-python
opencv-python-headless
filterpy==1.1.0
lap==0.4.0
paho-mqtt==1.5.1
numpy
Pillow
Building the Docker image works perfectly.
When I try to run the image, I get the following error:
Traceback (most recent call last):
  File "detect.py", line 6, in <module>
    import cv2
  File "/usr/local/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
It seems like the cv2 dependency is not satisfied.
Is there something I missed when building the Dockerfile?
I have tried replacing opencv-python-headless with python3-opencv, but pip reports that no matching distribution is found.
libGL.so.1 is provided by the libgl1 package, so you can add the following to your Dockerfile:
RUN apt update; apt install -y libgl1
Docker images typically leave out many libraries to keep the image size small; these dependencies are most likely already installed on your host system.
So I usually use dpkg -S on the host system to find which package provides the missing file, then install that package in the container:
shubuntu1@shubuntu1:~$ dpkg -S libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
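Putting it together, a minimal revised Dockerfile might look like the following sketch (keeping the python:3.7 base and the rest of the original file; the apt-get line is the only addition):
FROM python:3.7
WORKDIR /app
# Install the native library that cv2 needs at import time
RUN apt-get update && apt-get install -y --no-install-recommends libgl1 && rm -rf /var/lib/apt/lists/*
# Install python dependencies
ADD requirements.txt /app/
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
# Copy sources
ADD . /app
# Run detection
CMD ["detect.py"]
ENTRYPOINT ["python3"]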

Hashes mismatch when installing from Dockerfile - 'packages do not match the hashes from the requirements file'

I have a Dockerfile that installs packages from a requirements.txt file. Installing librosa pulled in many other required libraries, and when installing pycparser I got this error:
Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone
may have tampered with them.
torch from https://files.pythonhosted.org/packages/8c/5d/faf0d8ac260c7f1eda7d063001c137da5223be1c137658384d2d45dcd0d5/torch-1.6.0-cp38-cp38-manylinux1_x86_64.whl#sha256=5357873e243bcfa804c32dc341f564e9a4c12addfc9baae4ee857fcc09a0a216 (from -r requirements.txt (line 4)):
Expected sha256 5357873e243bcfa804c32dc341f564e9a4c12addfc9baae4ee857fcc09a0a216
Got eb3c7b3621d64e9d9955ec0546729291338556d4ee8ccbf347169f574816f089
What's the problem with the hashes? I did not specify any hashes or IDs in my requirements file.
My requirements file:
flask
pydub
scipy
torch
numpy
librosa
Dockerfile:
FROM voice
RUN mkdir -p ./voice_flask/d
WORKDIR /voice_flask/d
COPY . /voice_flask/d
RUN pip install -r requirements.txt
CMD ["python", "server.py"]
I suggest you install the packages fresh using --no-cache-dir:
RUN pip install -r requirements.txt --no-cache-dir
See this issue for more background.
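Applied to the Dockerfile above, the change is just the extra flag on the pip line (a sketch reusing the voice base image and server.py from the question):
FROM voice
RUN mkdir -p ./voice_flask/d
WORKDIR /voice_flask/d
COPY . /voice_flask/d
# Disable pip's local cache so each wheel is downloaded and hash-checked fresh
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "server.py"]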

How does COPY in separate lines help with fewer cache invalidations?

Docker documentation suggests the following as best practice.
If you have multiple Dockerfile steps that use different files from
your context, COPY them individually, rather than all at once. This
ensures that each step’s build cache is only invalidated (forcing the
step to be re-run) if the specifically required files change.
For example:
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
COPY . /tmp/
Results in fewer cache invalidations for the RUN step, than if you put
the COPY . /tmp/ before it.
My question is: how does it help?
In either case, if the requirements.txt file doesn't change, then pip install would produce the same result, so why does it matter that, in the best-practice version, requirements.txt is the only file copied in before pip install runs?
On the other hand, it creates one more layer in the image, which is something I would rather avoid.
Say you have a very simple application
$ ls
Dockerfile main.py requirements.txt
With the corresponding Dockerfile
FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["./main.py"]
Now say you only change the main.py script. Since the requirements.txt file hasn't changed, the RUN pip install ... can reuse the Docker image cache. This avoids re-running pip install, which can download a lot of packages and take a while.
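For contrast, here is a sketch of the single-COPY variant the documentation warns about. Because COPY . . comes before the install step, editing any file in the build context, including main.py, changes that layer and forces every layer after it, including the pip install, to be rebuilt:
FROM python:3
WORKDIR /app
# Copying the whole context first means any file change invalidates this layer...
COPY . .
# ...and every layer after it, so pip re-installs everything on each code change
RUN pip install -r requirements.txt
CMD ["./main.py"]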

Issues with running keras model in docker

I wrote a Python script for a Keras model which uses TensorFlow as the backend. I tried to run this model in Docker, but when I ran it, it always showed "Killed". Is this a memory issue, i.e. does my Docker container not have enough memory to run my Python script? Any suggestions are greatly appreciated.
I checked my Python script and it runs well in other environments.
The data I use to feed the model is very big. When I use a smaller data set, the model works in the container. Any suggestions on how to use the large data set?
This is my dockerfile that I used to build up the image:
FROM tensorflow/tensorflow:latest
RUN pip install pandas
RUN pip install numpy
RUN pip install sklearn
RUN pip install matplotlib
RUN pip install keras
RUN pip install tensorflow
RUN mkdir app
COPY . /app
CMD ["python", “app/model2-keras-model.py”]
When I ran it in the container, the only output I got before the process exited was "Killed".

How to use pip to install packages from a requirements file without reinstalling everything

I am trying to build a Docker image. My Dockerfile is like this:
FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
And my requirement.txt file like this:
wheel==0.29.0
numpy==1.11.3
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
Now I have a small change in my code and I need pandas, so I added it to the requirement.txt file:
wheel==0.29.0
numpy==1.11.3
pandas==0.19.2
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
pip install -r requirement.txt will install all packages in that file, although almost all of them were installed before. My question is: how do I make pip install only pandas? That would save time when building the image.
Thank you
If you rebuild your image after changing requirement.txt with docker build -t <your_image> ., I'm afraid this can't be done: each time Docker runs docker build it starts an intermediate container from the base image, and that is a fresh environment, so pip obviously needs to install all of the dependencies again.
You can consider building your own base image on top of python:2.7 with the common dependencies pre-installed, then building your application image on that base image. Whenever you need more dependencies, rebuild the base image on top of the previous one with only the extra dependencies installed, and then optionally docker push it back to your registry.
Hope this could be helpful :-)
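A sketch of that approach (the image names myapp-base and myapp are made up for illustration). First, a base Dockerfile with the common dependencies baked in, built once with docker build -t myapp-base -f Dockerfile.base .:
FROM python:2.7
# Common dependencies that rarely change, installed once in the base image
RUN pip install wheel==0.29.0 numpy==1.11.3 django==1.10.5 django-cors-headers==2.0.2 gspread==0.6.2 oauth2client==4.0.0
The application Dockerfile then only needs to install the newly added dependency on top, so rebuilding with docker build -t myapp . does not reinstall the rest:
FROM myapp-base
# Only the new dependency has to be installed here
RUN pip install pandas==0.19.2
ADD . /code
WORKDIR /code
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]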
