Installing Python packages on Kubernetes using Helm charts - Docker

I am trying to install some external Python packages with pip so that I can use Snowflake from Apache Airflow.
I have a Dockerfile, and I am using Helm charts to install Airflow.
Now I need to add some Python dependencies to integrate Snowflake and Airflow, and I see two ways of doing this.
Idea 1:
Add the Python packages to the Dockerfile via a requirements.txt file that lists my pip packages, then run docker build with this Dockerfile.
Idea 2:
Add the Python packages to the values.yaml file and use it to upgrade my Helm chart for Airflow, so that it installs Airflow together with these packages.
I tried both approaches and neither seems to work; I don't see my packages.
Are there any alternative or recommended ways of doing this?

I could solve this by updating the Dockerfile, as other users suggested above.
Add your Python packages to a requirements.txt file and save it in your working directory (the folder you build the image from):
FROM apache/airflow:latest
USER airflow
# Run "docker build" from the folder that contains requirements.txt so it is in the build context
COPY requirements.txt ./
RUN pip install -r requirements.txt
You can also do this without a requirements.txt file:
FROM apache/airflow:latest
USER airflow
RUN pip install "package1" "package2" "package3"

Related

Can Docker help me avoid reinstalling libraries every time I ssh to a new machine?

I have a list of Python libraries I need to reinstall every time I ssh onto this new computer on campus. How can I set things up so I only have to create one file or run one command, instead of repeating the installation process? Is Docker a good way to do so?
I have a requirements.txt file with all the libraries I need:
pandas
matplotlib
seaborn
numpy
sklearn
opencv-python
The project has various Jupyter notebooks which require the above imports to run.
I'm not too familiar with Dockerfiles and Docker containers:
FROM my/base
ADD . /srv
RUN pip install -r requirements.txt
RUN python sunflower.ipynb
ENTRYPOINT ["run_server"]
Would I wrap my entire project in a Dockerfile similar to the one above? I have to ssh to a machine called dgx1.cc.gatech.edu by forwarding a port on my machine to the corresponding port on the dgx1:
ssh -L 8088:localhost:8088 username@dgx1.cc.gatech.edu
If you can run any Docker command, you can trivially get unrestricted root access to the entire system. It's extremely unlikely that you'll be able to run Docker commands on shared systems like you describe. Docker has several other disadvantages here and I wouldn't recommend it for the use case you're describing.
On the other hand, the Python requirements.txt file on its own is enough to list out the dependencies you need. You can create a virtual environment and install those packages into it, and they will be under your control and isolated from anything else installed on that system. Using this just involves:
# Create the virtual environment
python3 -m venv ./sunflower
# Make it be your current Python ("activate" it)
. ./sunflower/bin/activate
# Install the packages you need
pip3 install -r requirements.txt
# Run your code (an .ipynb notebook needs Jupyter rather than plain Python)
jupyter nbconvert --to notebook --execute sunflower.ipynb
# Switch back to the system Python
deactivate
If you repeatedly ssh into the same system, you can reuse the same virtual environment: re-run the activate command and the packages will already be installed.
(The current official Python tooling recommendation is to use Pipenv if you don't have more specialized needs, but that uses a different package-listing file setup than the requirements.txt file.)
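For comparison, a minimal Pipenv sketch that mirrors the virtual-environment steps above (assuming Pipenv itself is installed; it records dependencies in a Pipfile rather than requirements.txt):
# Import an existing requirements.txt into a new Pipfile and install everything
pipenv install -r requirements.txt
# Spawn a shell with the environment activated
pipenv shell
# Or run a single command inside the environment without activating it
pipenv run jupyter nbconvert --to notebook --execute sunflower.ipynb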

How to create a Dockerfile which includes a standard Python library for use

I need your help: I have a standard Python library packaged as a .tar.gz file, and I have to manually copy the file into the git repo every time I want to use it.
I need to create a Docker container which will have this file and will install the library from that package.
I am looking for a Dockerfile for this.
I tried the Dockerfile below:
FROM python:3.6
COPY . /app
WORKDIR /app
RUN ls -ltr
EXPOSE 8080
RUN pip install pipenv
RUN pipenv install --system --deploy --skip-lock
I have a .tar.gz file which I need to copy into the Docker image, install the packages from it, and then use them in containers.
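A minimal Dockerfile sketch for this: pip can install straight from a local .tar.gz source distribution, so it is enough to copy the archive into the image and point pip at it. The file name mylib-1.0.0.tar.gz below is hypothetical; substitute your actual archive:
FROM python:3.6
WORKDIR /app
# Copy the packaged library from the build context into the image
COPY mylib-1.0.0.tar.gz /app/
# Install the library (and its declared dependencies) from the local archive
RUN pip install /app/mylib-1.0.0.tar.gz
EXPOSE 8080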

Pip compiled as part of dockerfile - fastest way to add a new entry to requirements.txt?

I'm using this Dockerfile as part of this docker compose file.
Right now, every time I want to add a new pip requirement, I stop my containers, add the new pip requirement, run docker-compose -f local.yml build, and then restart the containers with docker-compose -f local.yml up. This takes a long time, and it even looks like it's recompiling the container for Postgres if I just add a pip dependency.
What's the fastest way to add a single pip dependency to a container?
This is related to the fact that the Docker build cache is being invalidated. When you edit requirements.txt, the step RUN pip install --no-cache-dir -r /requirements/production.txt and all subsequent instructions in the Dockerfile are invalidated, so they are re-executed.
As a best practice, you should avoid invalidating the build cache as much as possible. This is achieved by moving the steps that change often towards the bottom of the Dockerfile. While developing, you can edit the Dockerfile and add separate pip installation steps at the end:
...
USER django
WORKDIR /app
RUN pip install --no-cache-dir <new package>
RUN pip install --no-cache-dir <new package2>
...
Once you are sure of all the dependencies you need, add them to the requirements file. That way you avoid invalidating the build cache early on, and only the steps from the installation of the new packages onward are rebuilt.
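The same caching logic suggests ordering the permanent layers so that only the requirements file is copied before the pip install, and the rest of the source afterwards. A sketch of that pattern (base image and paths are illustrative, not taken from the linked Dockerfile):
FROM python:3.8
# Copy only the requirements file first, so the pip layer stays cached
# as long as the requirements themselves do not change
COPY requirements/production.txt /requirements/production.txt
RUN pip install --no-cache-dir -r /requirements/production.txt
# Copying the application code afterwards means code edits no longer
# invalidate the pip install layer
COPY . /app
WORKDIR /app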

How to use pip to install packages from a requirements file without reinstalling everything

I am trying to build a Docker image. My Dockerfile is like this:
FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
And my requirement.txt file looks like this:
wheel==0.29.0
numpy==1.11.3
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
Now I have a small change in my code and I need pandas, so I add it to the requirement.txt file:
wheel==0.29.0
numpy==1.11.3
pandas==0.19.2
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
pip install -r requirement.txt will install all the packages in that file, although most of them have already been installed. My question is: how can I make pip install only pandas? That would save time when building the image.
Thank you
If you rebuild your image after changing requirement.txt with docker build -t <your_image> ., I don't think this can be avoided: each time Docker runs docker build it starts an intermediate container from the base image, and since that is a fresh environment pip obviously needs to install all of the dependencies.
You could instead build your own base image on top of python:2.7 with the common dependencies pre-installed, then build your application image on top of that base image. Whenever you need more dependencies, manually rebuild the base image on top of the previous one with only the extra dependencies installed, and then optionally docker push it back to your registry.
Hope this could be helpful :-)
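A rough sketch of that two-image setup, using a hypothetical image name myorg/myapp-base (the heavy, rarely changing installs live in the base image, and the application image only tops it up):
# base.Dockerfile - rebuilt only when the shared dependencies change
FROM python:2.7
COPY requirement.txt /tmp/requirement.txt
RUN pip install -r /tmp/requirement.txt

# Dockerfile - rebuilt on every code change, on top of the base
FROM myorg/myapp-base:latest
ADD . /code
WORKDIR /code
# Only packages missing from the base (e.g. the newly added pandas) are installed here
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
Build the base image first, then the application image:
docker build -t myorg/myapp-base:latest -f base.Dockerfile .
docker build -t myorg/myapp:latest .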

Unable to upgrade pip in docker build

When running the Docker build (using Jenkins CI), it fails on upgrading pip (the last line of the Dockerfile). I need it to upgrade to version 8.1.1, as the log suggests, because my deploy fails on a pip version mismatch.
Dockerfile
FROM ubuntu:14.04
FROM python:3.4
# Expose a port for gunicorn to listen on
EXPOSE 8002
# Make a workdir and virtualenv
WORKDIR /opt/documents_api
# Install everything else
ADD . /opt/documents_api
# Set some environment variables for pip installation and db management
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"
RUN apt-get update
RUN apt-get install -y python3-pip
RUN pip3 install --upgrade pip
Here's the error:
Step 15 : RUN pip3 install --upgrade pip
19:46:00 ---> Running in 84e2bcc850c0
19:46:04 Collecting pip
19:46:04 Downloading pip-8.1.1-py2.py3-none-any.whl (1.2MB)
19:46:04 Installing collected packages: pip
19:46:04 Found existing installation: pip 7.1.2
19:46:04 Uninstalling pip-7.1.2:
19:46:05 Successfully uninstalled pip-7.1.2
19:46:10 Exception:
19:46:10 Traceback (most recent call last):
19:46:10 File "/usr/local/lib/python3.4/shutil.py", line 424, in _rmtree_safe_fd
19:46:10 os.unlink(name, dir_fd=topfd)
19:46:10 FileNotFoundError: [Errno 2] No such file or directory: 'pip'
19:46:10 You are using pip version 7.1.2, however version 8.1.1 is available.
When you use two FROM directives, Docker creates two output images, which is why things get messed up.
First, remove FROM ubuntu:14.04, and don't run apt-get update in a Dockerfile; it's bad practice (your image will be different every time you build, defeating the whole purpose of containers/Docker).
Second, you can check the official Python images' Dockerfiles to see which version of pip is installed; python:3.4, for example, already ships pip 8.1.1.
Third, there is a special image for your case (an external application): python:3.4-onbuild. Your Dockerfile can be reduced to:
FROM python:3.4-onbuild
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"
EXPOSE 8002
CMD python myapp.py
One last thing: try to use Alpine-based images; they're much smaller (for Python, almost 10 times smaller than the Ubuntu-based one).
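Putting the first two points together without the -onbuild variant, the original Dockerfile can be trimmed to something like this sketch (the apt-get and pip-upgrade steps are dropped because python:3.4 already ships a recent pip):
FROM python:3.4
# Expose a port for gunicorn to listen on
EXPOSE 8002
# Make a workdir and copy the application in
WORKDIR /opt/documents_api
ADD . /opt/documents_api
# Environment variable for db schema management
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"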
It turns out the host I was running on had no outside (internet) access, so the upgrade was failing. We solved it by adding another package that had the necessary version to the DTR (Docker Trusted Registry).
Use the full path (/usr/bin/) to run pip. Example:
/usr/bin/pip install --upgrade pip
Running this command solved the same problem for me (Python 3.9):
RUN /usr/local/bin/python -m pip install --upgrade pip
