Dockerfile for tesseract 4.0 - docker

I am trying to create a Dockerfile for tesseract-ocr version 4.0.
Following are the contents of the Docker file.
FROM ubuntu:16.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y software-properties-
common && add-apt-repository -y ppa:alex-p/tesseract-ocr
RUN apt-get update && apt-get install -y tesseract-ocr
FROM python:3.6-alpine
ADD . /App
WORKDIR /App
COPY requirements.txt ./
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
I am able to build the Docker image, but when I spin a container and try to run a tesseract command, I get
"tesseract" not found

The solution was to upgrade to Ubuntu 18.04:
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get install tesseract-ocr -y \
python3 \
#python-setuptools \
python3-pip \
&& apt-get clean \
&& apt-get autoremove
ADD . /home/App
WORKDIR /home/App
COPY requirements.txt ./
COPY . .
RUN pip3 install -r requirements.txt
VOLUME ["/data"]
EXPOSE 5000 5000
CMD ["python3","OCRRun.py"]

Related

Run 32bit app nn ubuntu 20.04 docker container

I built a ubuntu image using the following Dockerfile:
FROM ubuntu:20.04
# Disable Prompt During Packages Installation
ARG DEBIAN_FRONTEND=noninteractive
# Add 32bit architecture
RUN dpkg --add-architecture i386 \
&& apt-get update \
&& apt-get install -y libc6:i386 libncurses5:i386 libstdc++6:i386 zlib1g:i386
RUN apt-get update && apt-get install -y locales && rm -rf /var/lib/apt/lists/* \
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8
RUN apt-get update && apt-get install -y \
iputils-ping \
python3 python3-pip
# Copy app to container
COPY . /app
WORKDIR /app
# Install pip requirements
COPY requirements.txt /app
RUN python3 -m pip install -r requirements.txt
# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug
CMD ["bash"]
I've been trying to run a 32bit app (hence the first run command in the Dockerfile) I have inside the my_app directory using:
./app
but I keep getting
bash: ./app: No such file or directory
I build your docker file with no error, do you have more detail ?

Error running python code via docker image

I have a python code which runs fine to pull data from an API but I am getting issues to run it via docker. I am using pyodbc to load data into SQLServer in my python code. Here is my dockerfile:
FROM python:3.9.2
RUN apt-get update -y && apt-get install -y --no-install-recommends \
unixodbc-dev \
unixodbc \
libpq-dev
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3","LoadAPI_data.py"]
After creating the docker image, when I am trying to run the docker image, I get the following error:
Error !!!!: ('01000', "[01000] [unixODBC][Driver Manager]Can't open
lib 'ODBC Driver 17 for SQL Server' : file not found (0)
(SQLDriverConnect)")
Can anyone let me know how do I get rid of this error?
I was able to get my code running by updating my dockerfile to run installation of SQL DB as well as python. Here is what my new dockerfile looks like.
FROM ubuntu:18.04
RUN apt-get update -y && \
apt-get install -y \
libpq-dev \
gcc \
python3-pip \
unixodbc-dev
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5\
&& rm -rf /var/lib/apt/lists/*
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated msodbcsql17
RUN pip3 install pyodbc
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3","LoadAPI_data.py"]

Entrypoint not found when deployed to Fargate. Locally works

I have the following Dockerfile, currently working locally in my device:
FROM python:3.7-slim-buster
WORKDIR /app
COPY . /app
VOLUME /app
RUN chmod +x /app/cat/sitemap_download.py
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
ARG VERSION=3.7.4
RUN apt update && \
apt install -y bash wget && \
wget -O /tmp/nordrepo.deb https://repo.nordvpn.com/deb/nordvpn/debian/pool/main/nordvpn-release_1.0.0_all.deb && \
apt install -y /tmp/nordrepo.deb && \
apt update && \
apt install -y nordvpn=$VERSION && \
apt remove -y wget nordvpn-release
RUN apt-get clean \
&& apt-get -y update
RUN apt-get -y install python3-dev \
python3-psycopg2 \
&& apt-get -y install build-essential
RUN pip install --upgrade pip
RUN pip install -r cat/requirements.txt
RUN pip install awscli
ENTRYPOINT ["sh", "-c", "./entrypoint.sh"]
But when I deploy it to Fargate, the container stops before reaching the steady state with:
sh: 1: ./entrypoint.sh: not found
Edit: Adding entrypoint.sh file for clarification:
#!/bin/env sh
# start process, but it should exit once the file is in S3
/app/cat/sitemap_download.py
# Once the process is done, we are good to scale down the service
aws ecs update-service --cluster cluster_name --region eu-west-1 --service service-name --desired-count 0
I have tried modifying ENTRYPOINT to use it as exec form, or with full path but always get the same issue. Any ideas on what am I doing wrong?
I've managed to fix it now.
Changing the Dockerfile to look as follows solves the issue:
COPY . /app
VOLUME /app
RUN chmod +x /app/cat/sitemap_download.py
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
ARG VERSION=3.7.4
RUN apt update && \
apt install -y bash wget && \
wget -O /tmp/nordrepo.deb https://repo.nordvpn.com/deb/nordvpn/debian/pool/main/nordvpn-release_1.0.0_all.deb && \
apt install -y /tmp/nordrepo.deb && \
apt update && \
apt install -y nordvpn=$VERSION && \
apt remove -y wget nordvpn-release
RUN apt-get clean \
&& apt-get -y update
RUN apt-get -y install python3-dev \
python3-psycopg2 \
&& apt-get -y install build-essential
RUN pip install --upgrade pip
RUN pip install -r cat/requirements.txt
RUN pip install awscli
ENTRYPOINT ["/bin/bash"]
CMD ["./entrypoint.sh"]
I tried this after reading: What is the difference between CMD and ENTRYPOINT in a Dockerfile?
I believe this syntax fixes it because with entrypoint I'm indicating bash to be run at start, and then passing the script as parameter.

Docker image size is coming up to 1.7 G for Ubuntu with Python packages

Following is my Dockerfile :-
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3.8 -y && apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]
Docker image is generating with an image file size of 1.7G. I need to have OpenJDK hence cannot use a standard python package as a base package. When I perform docker history , I can see 2 or 3 layers (installing packages above like Python3.8, OpenJDK and libsecp256k1-dev) taking up to 400MB to 500MB in size. Ubuntu as a base image takes only 64 MB however rest of size is taking by my dockerfile layers.
I believe I need to re-write the dockerfile in order to reduce the file size which I did but nothing happened concrete.
Please assist me on reducing the image less than 1 GB at least.
[Update]
Below is my updated Dockerfile:-
FROM ubuntu:18.04 AS builder
WORKDIR /opt/app
COPY requirements.txt /opt/app/aws/requirements.txt
RUN mkdir -p /opt/app/aws \
&& apt-get update -yq \
&& apt-get install -y python3.8 python3-pip openjdk-8-jre -yq && apt-get -y clean all \
&& chown -R root:root /opt/app && cd /opt/app/aws && pip3 install -r requirements.txt
FROM alpine
COPY --from=builder /opt/app /opt/app
SHELL ["/bin/bash", "-c"]
CMD ["bash","/opt/app/aws/bin/connector/connect.sh"]
Screenshot of image size:-
After removing unwanted libraries like git, etc and using the multi-stage build, the image is now approx 1.7 GB which I believe is a lot. Any suggestion to improve this?
You have multiple issues going on.
First, each of your RUN apt install is increasing your image size, you should have them all in the same RUN stage, and at the end of the stage, delete all cached apt files.
Second, you're installing unnecessary stuff. Why would you need vim and git for instance? Why are you installing build-essential and other build-related stuff if you're not building anything?
Third, it seems you tried to do a multi-stage build but ended up adding everything to the same image. Read up on python multi-stage builds.
If we consider best practices instead of multiple RUN use single RUN.
For example
RUN apt-get update -yq \
&& apt-get install -y python3-dev build-essential -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt \
&& apt-get purge -y --auto-remove gcc python3-dev build-essential
you can use multistage builds if you don't require git in your final image you can remove in final stage
Also if possible you can use alpine version also.
Try disabling recommended packages of APT with --no-install-recommends, you can read more about it from here.
Now the image is smaller:
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre-headless -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]

Docker copy src

My project hierarchy is as such:
docker-flowcell-restore
docker-flowcell-restore
config
src
requirements.txt
Dockerfile
I would like to add my src file to my docker image using Copy. So far my Docker image has the following:
FROM ubuntu
RUN apt-get update && apt-get install -y \
python3 \
python3-pip
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENTRYPOINT ['python', 'docker-flowcell-restore/src/main.py']
How do I add the copy of the contents of the src folder? Thank you.
Add the copy after your pip install command. This leverages the docker layer cache so you don't rerun the install of all the requirements after a change to different parts of your code:
FROM ubuntu
RUN apt-get update \
&& apt-get install -y \
python3 \
python3-pip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY src/ src/
ENTRYPOINT ['python', 'src/main.py']
I've also added steps to cleanup the apt cache after performing the install, and adjusted the entrypoint since the WORKDIR was defined.
You can do the following:
FROM ubuntu
RUN apt-get update && apt-get install -y \
python3 \
python3-pip
ENV INSTALL_PATH /docker-flowcell-restore
RUN mkdir -p $INSTALL_PATH
WORKDIR $INSTALL_PATH
COPY requirements.txt requirements.txt
COPY ./src <the path inside the container where you want src>
RUN pip install -r requirements.txt
ENTRYPOINT ['python', 'docker-flowcell-restore/src/main.py']
This is assuming that you are building your image from the directory where the Dockerfile is located.

Resources