Simple PySpark script using Dockerfile not able to run - docker

So I have a test.py file that creates a dataframe and outputs a csv:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([[1,2],[3,4]])
df.write.csv("test", header=True)
I'm trying to get this to run via Docker but with zero luck.
I created the following Dockerfile:
FROM apache/spark-py:v3.3.0
COPY test.py test.py
# RUN python -m pip install --upgrade pip && pip3 install pyspark==3.3.0
RUN apt-get update -y &&\
apt-get install -y python3 &&\
pip3 install pyspark==3.3.0
CMD ["python3", "test.py"]
and run the following to run the test.py script in Docker:
docker build -t ch_image --rm . && docker run --rm --name ch_container ch_image
I'm getting the following error message:
Sending build context to Docker daemon 31.23 kB
Step 1/4 : FROM apache/spark-py:v3.3.0
---> d186e5bd67b6
Step 2/4 : COPY test.py test.py
---> Using cache
---> 460491fa3ac9
Step 3/4 : RUN apt-get update -y && apt-get install -y python3 && pip3 install pyspark==3.3.0
---> Running in 21d6432b8d7b
Reading package lists...
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
The command '/bin/sh -c apt-get update -y && apt-get install -y python3 && pip3 install pyspark==3.3.0' returned a non-zero code: 100
I've tried:
COPY test.py .
CMD ["/opt/spark/bin/pyspark", "test.py"]
RUN python -m pip install --upgrade pip && pip3 install pyspark==3.3.0
Any help to understand why I'm not able to run the script would be much appreciated.

You got "permission denied" error. Could you please add "USER root" between step 2 and step 4 in your dockerfile as shown below?
FROM apache/spark-py:v3.3.0
COPY test.py test.py
USER root
RUN apt-get update -y &&\
apt-get install -y python3 &&\
pip3 install pyspark==3.3.0
CMD ["python3", "test.py"]

Related

Issue facing while dockerization of person tracking

I've build an application which is detecting and tracking region of interest. It workers absolutely fine in my local machine, but when I am dockerazing the app I get this error:
"qt.qpa.plugin: Could not find the Qt platform plugin "xcb" in "" " on opencv version 4.5.5.64.
When I downgrade to opencv version 4.1.2.30, it then starts working( Note I want to work on Opencv 4.5.5.64 for tracker related issues).
I am running docker image like: sudo docker run -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix --device /dev/video0 --gpus all --ipc=host myImage.
Its working with older version of opencv but not with 4.5.5.64. I've been banging my head on this issue since last 4 days now, I would really appreciate if I can get some help
FROM nvcr.io/nvidia/pytorch:22.04-py3
RUN rm -rf /opt/pytorch
RUN apt-get update && apt-get autoclean
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip
RUN pip3 install pyqt5
RUN apt-get install -y '^libxcb.*-dev' libx11-xcb-dev libglu1-mesa-dev libxrender-dev libxi-dev libxkbcommon-dev libxkbcommon-x11-dev
RUN apt-get install -y libsm6 libxrender1 libfontconfig1
ENV QT_DEBUG_PLUGINS=1
COPY requirements.txt .
RUN python -m pip install --upgrade pip
RUN pip uninstall -y torch torchvision torchtext Pillow
RUN pip install --no-cache -r requirements.txt albumentations wandb gsutil notebook Pillow>=9.1.0 \
torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
RUN pip install opencv-contrib-python==4.5.5.64
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . /usr/src/app
ENV OMP_NUM_THREADS=8
below is the dockerfile content.

Trying to build a docker image but the build fails on Cloud Build

This is the Dockerfile that I am trying to build using Cloud Build.
FROM ubuntu:latest
LABEL MAINTAINER example
WORKDIR /app
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& pip3 install --upgrade pip
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 5000
CMD ["newrelic-admin","run-program","gunicorn","wsgi:app","--bind","0.0.0.0:5000","--workers","2","--threads","4","--worker-class=gthread","--log-level","info"]
Here is the requirements.txt file
numpy==1.18.2
pyarrow==0.17.0
lightgbm==2.3.1
scikit-learn==0.22.2.post1
pandas==1.0.3
scipy==1.4.1
Flask==2.1.0
tqdm==4.43.0
joblib==0.15.1
newrelic==6.2.0.156
google-cloud-storage==1.33.0
gunicorn==20.1.0
Once the build begins, it gets stuck at pyarrow and returns
Installing build dependencies: finished with status 'error'
[91m error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
How can I fix this?

Can't reach Flask server running in Docker container

I'm building a Flask + React webapp that runs correctly on my machine. I try to dockerize it. I can build image (stiko:demo), docker runs, server starts:
But when I try to open https://0.0.0.0:5000/ on my browser, connection fails:
I've searched for a while now, trying to start from various images, trying to use ENDPOINT + CMD command, use flask run --host=0.0.0.0 but still the same issue.
Here is Dockerfile:
FROM ubuntu:20.04
RUN useradd -ms /bin/bash ubuntu
RUN apt update
RUN apt install software-properties-common -y
RUN apt-get install libpq-dev -y
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt install python3.9 -y
RUN apt install python3-pip -y
RUN pip3 install --upgrade pip
WORKDIR /app/build
COPY ./build ./
WORKDIR /app/server
COPY ./server ./
RUN pip3 install -r requirements.txt --no-cache-dir
RUN pip3 install python-dotenv
ENV APP_SETTINGS="config.DevelopmentConfig"
EXPOSE 5000
app.py:
import sys
import os
from flask import Flask, request, render_template
from flask_sqlalchemy import SQLAlchemy
from flask_cors import CORS, cross_origin
from models import User, Project, Image, db
from api import blueprints
app = Flask(__name__,
static_folder='../build/static',
template_folder="../build"
)
app.config.from_object(os.environ['APP_SETTINGS'])
db.init_app(app)
cors = CORS(app)
# Register the blueprints
for b in blueprints:
app.register_blueprint(b)
#cross_origin
#app.route('/', defaults={'u_path': ''})
#app.route('/<path:u_path>')
def index(u_path=None):
return render_template("index.html")
if __name__ == "__main__":
app.run(host=('0.0.0.0'), port=5000, ssl_context='adhoc')
project structure:
build
|__static
|__index.html
|__ ...
server
|__app.py
|__requirements.txt
|__ ...
Dockerfile
Any help would be appreciated, thanks!
Your Dockerfile doesn't appear to be starting your flask api. Try adding a CMD to the end.
FROM ubuntu:20.04
RUN useradd -ms /bin/bash ubuntu
RUN apt update
RUN apt install software-properties-common -y
RUN apt-get install libpq-dev -y
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt install python3.9 -y
RUN apt install python3-pip -y
RUN pip3 install --upgrade pip
WORKDIR /app/build
COPY ./build ./
WORKDIR /app/server
COPY ./server ./
RUN pip3 install -r requirements.txt --no-cache-dir
RUN pip3 install python-dotenv
ENV APP_SETTINGS="config.DevelopmentConfig"
EXPOSE 5000
CMD [ "python", "app.py" ]
Well, just in case someone would be stuck like me, with probably zero network skill:
The Dockerfile and app.py are just fine. But instead of trying to access https://0.0.0.0:5000/ in browser I should have tried https://127.0.0.1:5000/ which works perfectly fine!

Rclone installation in docker on heroku with telegram bot [Help]

I want to install rclone on a docker image in heroku to able to use Rclone with python telegram bot
I made a heroku.yml file
build:
docker:
worker: Dockerfile
run:
worker: bash start.sh
And start.sh as
python3 -m bot
And Dockerfile as
FROM ubuntu:18.04
WORKDIR /usr/src/app
RUN docker pull rclone/rclone:latest
RUN docker run rclone/rclone:latest version
RUN chmod 777 /usr/src/app
RUN apt -qq update
RUN apt -qq install -y python3 python3-pip locales
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
CMD ["bash","start.sh"]
I get error The command '/bin/sh -c docker pull rclone/rclone:latest' returned a non-zero code: 127 in the git bash CLI
What Am I doing wrong? Or what is the procedure?
Thanks in advance!
FROM ubuntu:16.04
WORKDIR /app
# line number 12 - 15 in your Dockerfile
RUN echo "LC_ALL=en_US.UTF-8" >> /etc/environment
RUN echo "LANG=en_US.UTF-8" >> /etc/environment
RUN more "/etc/environment"
RUN apt-get update
#RUN apt-get upgrade -y
#RUN apt-get dist-upgrade -y
RUN apt-get install curl htop git zip nano ncdu build-essential chrpath libssl-dev libxft-dev pkg-config glib2.0-dev libexpat1-dev gobject-introspection python-gi-dev apt-transport-https libgirepository1.0-dev libtiff5-dev libjpeg-turbo8-dev libgsf-1-dev fail2ban nginx -y
# Install Rclone
RUN curl -sL https://rclone.org/install.sh | bash
RUN rclone version
# Cleanup
RUN apt-get update && apt-get upgrade -y && apt-get autoremove -y
Based on this answer
You can try this, also don't try to use docker commands inside a Dockerfile.

Connect docker python to SQL server with pyodbc

I'm trying to connect a pyodbc python script running in a docker container to login to a MSSQL database I have tried all sorts of docker files, but not been able to make the connection (fails when bulding the docker or when python tries to connect), Does anyone have a working dockerfile, using pyodbc:
Dockerfile:
# Use an official Python runtime as a parent image
FROM python:2.7-slim
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Run app.py when the container launches
CMD ["python", "App.py"]
requirements.TXT
pyodbc
App.Py
import pyodbc
connection = pyodbc.connect('Driver={SQL Server};'
'Server=xxxx;'
'Database=xxx;'
'UID=xxxx;'
'PWD=xxxx')
cursor = connection.cursor()
cursor.execute("SELECT [Id],[Name] FROM [DCMM].[config].[Models]")
for row in cursor.fetchall():
print(row.Name)
connection.close()
Bulding the container
docker build -t sqltest .
Output:
Sending build context to Docker daemon 4.096kB
Step 1/5 : FROM python:2.7-slim
---> 426d65ab9a72
Step 2/5 : WORKDIR /app
---> Using cache
---> 725f35122880
Step 3/5 : ADD . /app
---> 3feb8b7744f7
Removing intermediate container 4214091a111a
Step 4/5 : RUN pip install -r requirements.txt
---> Running in 27aa4dcfe738
Collecting pyodbc (from -r requirements.txt (line 1))
Downloading pyodbc-4.0.17.tar.gz (196kB)
Building wheels for collected packages: pyodbc
Running setup.py bdist_wheel for pyodbc: started
Running setup.py bdist_wheel for pyodbc: finished with status 'error'
Failed building wheel for pyodbc
Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-EfWsmy/pyodbc/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpa3S13tpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_ext
building 'pyodbc' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPYODBC_VERSION=4.0.17 -DSQL_WCHART_CONVERT=1 -I/usr/local/include/python2.7 -c src/cursor.cpp -o build/temp.linux-x86_64-2.7/src/cursor.o -Wno-write-strings
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
Running setup.py clean for pyodbc
Failed to build pyodbc
Installing collected packages: pyodbc
Running setup.py install for pyodbc: started
Running setup.py install for pyodbc: finished with status 'error'
Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-EfWsmy/pyodbc/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-BV4sRM-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'pyodbc' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPYODBC_VERSION=4.0.17 -DSQL_WCHART_CONVERT=1 -I/usr/local/include/python2.7 -c src/cursor.cpp -o build/temp.linux-x86_64-2.7/src/cursor.o -Wno-write-strings
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-EfWsmy/pyodbc/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-BV4sRM-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-EfWsmy/pyodbc/
The command '/bin/sh -c pip install -r requirements.txt' returned a non-zero code: 1
Need to Run:
sudo apt-get install gcc
need to add a odbcinst.ini file containing:
[FreeTDS]Description=FreeTDS Driver Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so Setup=/usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
need to add folowing to docker file
ADD odbcinst.ini /etc/odbcinst.ini
RUN apt-get update
RUN apt-get install -y tdsodbc unixodbc-dev
RUN apt install unixodbc-bin -y
RUN apt-get clean -y
need to change connection in .py to
connection = pyodbc.connect('Driver={FreeTDS};'
'Server=xxxxx;'
'Database=DCMM;'
'UID=xxxxx;'
'PWD=xxxxx')
Now the container compiles, and gets data from SQL server
Running through this recently I found it was necessary to additionally include the following line (note that it did not build without this step):
RUN apt-get install --reinstall build-essential -y
The full Dockerfile looks as follows:
# parent image
FROM python:3.7-slim
# install FreeTDS and dependencies
RUN apt-get update \
&& apt-get install unixodbc -y \
&& apt-get install unixodbc-dev -y \
&& apt-get install freetds-dev -y \
&& apt-get install freetds-bin -y \
&& apt-get install tdsodbc -y \
&& apt-get install --reinstall build-essential -y
# populate "ocbcinst.ini"
RUN echo "[FreeTDS]\n\
Description = FreeTDS unixODBC Driver\n\
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so\n\
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so" >> /etc/odbcinst.ini
# install pyodbc (and, optionally, sqlalchemy)
RUN pip install --trusted-host pypi.python.org pyodbc==4.0.26 sqlalchemy==1.3.5
# run app.py upon container launch
CMD ["python", "app.py"]
Here's one way to then actually establish the connection inside app.py, via sqlalchemy (and assuming port 1433):
import sqlalchemy as sa
args = (username, password, server, database)
connstr = "mssql+pyodbc://{}:{}#{}/{}?driver=FreeTDS&port=1433&odbc_options='TDS_Version=8.0'"
engine = sa.create_engine(connstr.format(*args))
Based on Kåre Rasmussen's answer, here's a complete dockerfile for further use.
Make sure to edit the last two lines according to your architecture! They should reflect the actual paths to libtdsodbc.so and libtdsS.so.
If you're not sure about the paths to libtdsodbc.so and libtdsS.so, try dpkg --search libtdsodbc.so and dpkg --search libtdsS.so.
FROM python:3
#Install FreeTDS and dependencies for PyODBC
RUN apt-get update && apt-get install -y tdsodbc unixodbc-dev \
&& apt install unixodbc-bin -y \
&& apt-get clean -y
RUN echo "[FreeTDS]\n\
Description = FreeTDS unixODBC Driver\n\
Driver = /usr/lib/arm-linux-gnueabi/odbc/libtdsodbc.so\n\
Setup = /usr/lib/arm-linux-gnueabi/odbc/libtdsS.so" >> /etc/odbcinst.ini
Afterwards, install PyODBC, COPY your app and run it.
I was unable to use all of the above resolutions, I was keeping al kind of errors relating to the pyodbc package, in particular:
ImportError: libodbc.so.2: cannot open shared object file: No such file or directory.
I ended up with another resolution which defines the ODBC SQL Server Driver specifically for an Ubuntu 18.04 Docker image, in this case ODBC Driver 17 for SQL Server. In my specific use case I needed to make the connection to my MySQL database server on Azure via Flask SQLAlchemy, but the latter is not a necessity for the Docker configuration.
Dockerfile, with most important part adding the Microsoft repository and installing msodbcsql17 and unixodbc-dev:
# Ubuntu 18.04 base with Python runtime and pyodbc to connect to SQL Server
FROM ubuntu:18.04
WORKDIR /app
# apt-get and system utilities
RUN apt-get update && apt-get install -y \
curl apt-utils apt-transport-https debconf-utils gcc build-essential g++-5\
&& rm -rf /var/lib/apt/lists/*
# adding custom Microsoft repository
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# install SQL Server drivers
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
# install SQL Server tools
RUN apt-get update && ACCEPT_EULA=Y apt-get install -y mssql-tools
RUN echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
RUN /bin/bash -c "source ~/.bashrc"
# python libraries
RUN apt-get update -y && \
apt-get install -y python3-pip python3-dev
# install necessary locales, this prevents any locale errors related to Microsoft packages
RUN apt-get update && apt-get install -y locales \
&& echo "en_US.UTF-8 UTF-8" > /etc/locale.gen \
&& locale-gen
# copy requirements and install packages, I added this for general use
COPY ./requirements.txt > ./requirements.txt
RUN pip3 install -r ./requirements.txt
# you can also use regular install of the packages
RUN pip3 install pyodbc SQLAlchemy
# and if you are also planning to use Flask and Flask-SQLAlchemy
Run pip3 install Flask Flask-SQLAlchemy
COPY ..
# run your app via entrypoint or change the CMD command to your regular command
COPY docker-entrypoint.sh wsgi.py ./
CMD ["./docker-entrypoint.sh"]
This should build without any errors in Docker.
My database url looked like this:
import urllib.parse
# name the sepcific ODBC driver by version number, we installed msodbcsql17
params = urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};SERVER=<your.database.windows.net>;DATABASE=<your-db-name>;UID=<username>;PWD=<password>")
db_uri = "mssql+pyodbc:///?odbc_connect={PARAMS}".format(PARAMS=params)
And for the bonus if you are using Flask-SQLAlchemy, your app config should contain something like this:
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
app.config["SQLALCHEMY_DATABASE_URI"] = db_uri # from above
Happy coding!
How to install the necessary dependencies for pyodbc is related to the linux distribution and its version (in docker case, that is the base image of your docker image). If none of the above work for you, you can figure out the commands by trying in the docker container instance.
First, exec into the docker container
docker exec -it <container id> bash
Try various ways to get the distribution name and version of your linux. Then try different instructions in Install the Microsoft ODBC driver for SQL Server (Linux)
Here is a working example for Debian 9 based images, deriving exactly as the document instructions.
# Install pyodbc dependencies
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get -y install msodbcsql17
RUN apt-get -y install unixodbc-dev
RUN pip install pyodbc
For me to solve this issue I also had to add the following 2 lines in the dockerfile:
RUN echo MinProtocol = TLSv1.0 >> /etc/ssl/openssl.cnf
RUN echo CipherString = DEFAULT#SECLEVEL=1 >> /etc/ssl/openssl.cnf
For those who wanted to do official microsoft approach to install odbc driver and use python:slim docker image, you can use this as DockerFile:
FROM python:3.9-slim
RUN apt-get -y update && apt-get install -y curl gnupg
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
# download appropriate package for the OS version
# Debian 11
RUN curl https://packages.microsoft.com/config/debian/11/prod.list \
> /etc/apt/sources.list.d/mssql-release.list
RUN exit
RUN apt-get -y update
RUN ACCEPT_EULA=Y apt-get install -y msodbcsql18
Then for sqlalchemy's this can be called:
con_str = f"mssql+pyodbc://{username}:{password}#{host}/{db}?" \
"driver=ODBC+Driver+18+for+SQL+Server&TrustServerCertificate=yes"
engine = create_engine(con_str)
I created a Gist on GitHub on how to do this. I hope it helps. I had to piece things together from what I found on different resources.
https://gist.github.com/joshatxantie/4bcf5d0243fba63845fce7cc40365a3a
Goodluck!
I fixed this problem by using pypyodbc instead of pyodbc.
pip install pypyodbc==1.3.5
https://pypi.org/project/pypyodbc/
Found the hint here:
https://github.com/Azure/azure-functions-python-worker/issues/249
For not more problem use library for python
pymssql this not need install driver
pip install pymssql
import pymssql
conn = pymssql.connect(server, user, password, "tempdb")
cursor = conn.cursor(as_dict=True)
cursor.execute('SELECT * FROM persons WHERE salesrep=%s', 'John Doe')
for row in cursor:
print("ID=%d, Name=%s" % (row['id'], row['name']))
conn.close()
and work in docker

Resources