NLTK is not working in Docker

I want to run an NLTK service in Docker, but I always get the error message "'nltk' is not a package". Can you figure out what is going wrong? During the build everything works fine and the NLTK version is printed, but when I start the container with docker-compose up nltk I get:
$ docker-compose up nltk
Recreating nltk
Attaching to nltk
nltk | Traceback (most recent call last):
nltk | File "/var/www/nltk.py", line 1, in <module>
nltk | from nltk.corpus import brown
nltk | File "/var/www/nltk.py", line 1, in <module>
nltk | from nltk.corpus import brown
nltk | ModuleNotFoundError: No module named 'nltk.corpus'; 'nltk' is not a package
docker-compose.yml
nltk:
  build: docker/nltk
  container_name: nltk
  volumes:
    - ./volumes/nltk/var/www/nltk.py:/var/www/nltk.py
  environment:
    HOME: /var/www
Dockerfile
FROM python:3.6
RUN mkdir /var/www
ENV HOME /var/www
WORKDIR /var/www
RUN pip install -U nltk
RUN pip install -U numpy
RUN python -m nltk.downloader -d $HOME/nltk_data all
RUN python -c "import nltk"
RUN python -c "import nltk; print(nltk.__version__)"
EXPOSE 80
CMD [ "python", "/var/www/nltk.py" ]
nltk.py
import nltk
from nltk.corpus import brown
brown.words()

final Dockerfile
FROM python:3.6
ENV NLTK_DATA /usr/share/nltk_data
RUN pip install -U nltk
RUN pip install -U numpy
RUN python -m nltk.downloader -d /usr/share/nltk_data all
EXPOSE 80
WORKDIR /var/www
CMD [ "python", "/var/www/main.py" ]
final docker-compose
nltk:
  build: docker/nltk
  container_name: nltk
  volumes:
    - ./volumes/nltk/var/www/main.py:/var/www/main.py

Try renaming nltk.py to something else. I'm guessing import nltk and from nltk.corpus import brown are trying to import from your nltk.py file instead of the installed package. The reason it works while building the image is that your nltk.py file doesn't exist yet at build time; it is only mounted at runtime by the compose file.
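As a quick sanity check (a sketch; docker-compose run overrides the container's command, and importlib.util.find_spec locates a module without executing it), you can ask Python where nltk resolves from. While the shadowing file is mounted it should print the mounted script rather than site-packages:
$ docker-compose run nltk python -c "import importlib.util; print(importlib.util.find_spec('nltk').origin)"
/var/www/nltk.py
After the rename, the same command should point at .../site-packages/nltk/__init__.py instead.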

Related

Docker run giving 'No module named 'pandas''

This is my Dockerfile:
FROM public.ecr.aws/i7d4o1h8/miniconda3:4.10.3p0
RUN pip install --upgrade pip
COPY condaEnv.yml .
RUN conda env create -f condaEnv.yml python=3.9.7
RUN pip install sagemaker-inference
COPY inference_code.py /opt/ml/code/
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code/
ENV SAGEMAKER_PROGRAM inference_code.py
ENTRYPOINT ["python", "/opt/ml/code/inference_code.py"]
When I run docker build with the command docker build -t docker_name ., it succeeds; at the end I see Successfully tagged docker_name:latest.
But when I try to run the Docker image it gives:
Traceback (most recent call last):
File "/opt/ml/code/inference_code.py", line 4, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
But in the condaEnv.yml file I have pandas defined as:
name: plato_vrar
channels:
  - conda-forge
  - defaults
dependencies:
  - pandas=1.3.4
  - pip=21.2.4
prefix: plato_vrar/
What am I missing here?
In Anaconda, creating an environment means only that the environment is prepared. You also need to activate it using conda activate <ENV_NAME>; then python is correctly linked to the environment's interpreter rather than the system version. Refer to the conda documentation for more details: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment
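In a Dockerfile there is no interactive shell to activate the environment in, so one common workaround (a sketch, assuming the environment name plato_vrar from the yml and a conda version new enough to support conda run) is to execute everything through conda run:
FROM public.ecr.aws/i7d4o1h8/miniconda3:4.10.3p0
COPY condaEnv.yml .
RUN conda env create -f condaEnv.yml
# run subsequent shell commands inside the environment, not the base interpreter
SHELL ["conda", "run", "-n", "plato_vrar", "/bin/bash", "-c"]
RUN pip install sagemaker-inference
COPY inference_code.py /opt/ml/code/
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code/
ENV SAGEMAKER_PROGRAM inference_code.py
# conda run executes the command inside the named environment at container start
ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "plato_vrar", "python", "/opt/ml/code/inference_code.py"]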

Missing CV2 in Docker container

I've made the following Dockerfile to build a python application:
FROM python:3.7
WORKDIR /app
# Install python dependencies
ADD requirements.txt /app/
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
# Copy sources
ADD . /app
# Run detection
CMD ["detect.py" ]
ENTRYPOINT ["python3"]
The requirements.txt file contains only a few dependencies, including opencv:
opencv-python
opencv-python-headless
filterpy==1.1.0
lap==0.4.0
paho-mqtt==1.5.1
numpy
Pillow
Building the Docker image works perfectly, but when I try to run the image, I get the following error:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "detect.py", line 6, in <module>
import cv2
File "/usr/local/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
It seems the cv2 dependency is not satisfied.
Is there something I missed when building the Dockerfile?
I tried replacing opencv-python-headless with python3-opencv, but there is no matching distribution found.
libGL.so.1 is provided by the libgl1 package, so you can add the following to your Dockerfile:
RUN apt update; apt install -y libgl1
Typically, Docker images remove many libraries to keep the size small; these dependencies are most probably already installed on your host system.
So I usually use dpkg -S on the host system to find which package I need, then install it in the container:
shubuntu1@shubuntu1:~$ dpkg -S libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1
libgl1:amd64: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
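Putting it together, the patched Dockerfile could look like this (a sketch; the apt-get line is the only addition, placed early so the layer stays cached across code changes):
FROM python:3.7
WORKDIR /app
# native library that the non-headless OpenCV wheel links against
RUN apt-get update && \
    apt-get install -y --no-install-recommends libgl1 && \
    rm -rf /var/lib/apt/lists/*
# Install python dependencies
ADD requirements.txt /app/
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
# Copy sources
ADD . /app
# Run detection
CMD ["detect.py"]
ENTRYPOINT ["python3"]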

Dockerfile Run Python Pipenv Build Fails

I'm learning how to dockerize applications and ran into a snag where my app fails at container startup, outputting:
app_1 | Traceback (most recent call last):
app_1 | File "app.py", line 4, in <module>
app_1 | from flask import Flask, render_template, request, json
app_1 | ModuleNotFoundError: No module named 'flask'
bucketlist-flask-mysql_app_1 exited with code 1
The directory structure in the container is:
/app
/app -> app.py Pipfile.lock
The directory in the repository is
Dockerfile docker-compose.yml /src
/src -> Pipfile Pipfile.lock sql_scripts/ FlaskApp/app.py
The Dockerfile is:
# alpine with py3.8 - reqd version of python for pipenv
FROM python:3.8-alpine
# EXPOSE port, default is 5000, app uses 5000
EXPOSE 5000
# create a directory to run the app in
WORKDIR /app
# install pipenv system-wide
RUN pip install pipenv
#move the files into /app
COPY src/Pipfile.lock /app
# add the application files
COPY src/FlaskApp /app
# run the application at launch
RUN pipenv install --ignore-pipfile
CMD ["pipenv", "run", "python3", "app.py"]
and the docker-compose is:
version: "3"
services:
app:
build: .
links:
- db
ports:
- "5000:5000"
db:
image: mariadb
restart: always
ports:
- "32000:3306"
environment:
MYSQL_ROOT_PASSWORD: root
volumes:
- ./src/sql_scripts:/docker-entry-point-initdb.d/:ro
I've done quite a bit of iteration and troubleshooting. If I start a python:3.8-alpine container and manually copy over Pipfile.lock and app.py, I can run in sh:
pip install pipenv
pipenv install --ignore-pipfile
pipenv run python3 app.py
When run manually from sh, the application builds and runs perfectly. My best guess currently is that the processes may be running concurrently, not allowing the pipenv install to finish executing?
For reference, I believe the Python is just fine, but the first four lines of app.py are:
#!/bin/bash
"exec" "pipenv" "run" "python3" "$(pwd | sed 's?src.*?src/FlaskApp/app.py?g')"
#START: app.py code \/
from flask import Flask, render_template, request, json
Solution!
After a lot of troubleshooting I solved the problem by purging my Docker containers. The real solution: after a change to your Docker files, you need to run docker-compose down before running docker-compose up. I assumed this happened when shutting down containers and didn't know about the docker-compose down command.
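For reference, a typical rebuild cycle after editing a Dockerfile looks like this (a sketch; docker-compose up --build is an equivalent shortcut for the last two steps):
$ docker-compose down    # stop and remove the old containers
$ docker-compose build   # rebuild the image from the changed Dockerfile
$ docker-compose up      # start fresh containers from the new image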

Dockerizing a Python Flask API that uses mysql.connector gives ModuleNotFoundError when running the Docker image

I created an API using Python 3.7, Flask, and MySQL. I simply want to dockerize it. I was able to successfully create a Docker image, but it throws a ModuleNotFoundError when I run it.
Dockerfile
FROM python:3.7.4-buster
COPY ./ /app
WORKDIR ./app
RUN pip install -r requirements.txt
ENTRYPOINT ["python3.7"]
CMD ["app.py"]
requirements file
pandas==0.25.1
pymongo==3.9.0
Flask==1.1.1
SQLAlchemy==1.3.8
mysql_connector_repackaged==0.3.1
The Docker image built successfully. However, when I run it I get the following error:
$ docker run -p 5000:5000 api_module:latest
Traceback (most recent call last):
File "app.py", line 5, in <module>
from mysqlDB import input_data_mysql
File "/app/mysqlDB.py", line 2, in <module>
import mysql.connector
File "/usr/local/lib/python3.7/site-packages/mysql/connector/__init__.py", line 34, in <module>
import _version
ModuleNotFoundError: No module named '_version'
Replacing mysql_connector_repackaged with mysql-connector-python==8.0.17 in requirements.txt solved this.
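For clarity, the working requirements file would then read (a sketch; the other pins are unchanged from the question):
pandas==0.25.1
pymongo==3.9.0
Flask==1.1.1
SQLAlchemy==1.3.8
mysql-connector-python==8.0.17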

How to locate openjdk in Docker container?

I tried to run a PySpark application. For this I first installed pyspark from pip, then pulled openjdk:8 to set the JAVA_HOME variable.
Dockerfile :
FROM python:3
ADD my_script.py /
COPY requirements.txt ./
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python", "./my_script.py" ]
my_script.py :
from pyspark import SparkContext
from pyspark import SparkConf
#spark conf
conf1 = SparkConf()
conf1.setMaster("local[*]")
conf1.setAppName('hamza')
print(conf1)
sc = SparkContext(conf = conf1)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
print(sqlContext)
Requirements.txt :
pyspark
numpy
Getting this error:
C:\Users\hrafiq\Desktop\sample>docker run -it --rm --name data2 my-python-app
<pyspark.conf.SparkConf object at 0x7f4bd933ba58>
/usr/local/lib/python3.7/site-packages/pyspark/bin/spark-class: line 71:
/usr/lib/jvm/java-8-openjdk-amd64//bin/java: No such file or directory
Traceback (most recent call last):
File "./my_script.py", line 14, in <module>
sc = SparkContext(conf = conf1)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
So the question is: if it is not finding the java binary, how do I find that file? I know it is stored in some virtual hard disk that we don't have access to.
Any help would be appreciated.
Thanks
Setting the JAVA_HOME env var is not enough. You need to actually install OpenJDK inside your Docker image.
Your base image (python:3) is itself based on a Debian Stretch image, so you can use apt-get install to fetch the JDK:
FROM python:3
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk-headless && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY my_script.py ./
CMD [ "python", "./my_script.py" ]
(In the above I have optimized the layer ordering so that you won't need to rebuild the pip install layer when only your script changes.)
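To verify the JDK actually sits where JAVA_HOME points before running the app (a sketch, assuming an image tag of my-python-app as in the question), you can override the command at run time:
$ docker build -t my-python-app .
$ docker run --rm my-python-app java -version      # should print an openjdk 1.8 version string
$ docker run --rm my-python-app ls /usr/lib/jvm/   # should list java-8-openjdk-amd64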
