I would like to add the Snowflake connector requirements to a Pipenv Pipfile in a clean way. The tested requirements files are available at https://github.com/snowflakedb/snowflake-connector-python/tree/main/tested_requirements
What is the proper way to reference one of the requirements files directly in a Pipenv pipfile?
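For what it's worth, there is no Pipfile directive that pulls an external requirements file in directly; the usual clean approach is to pin the connector itself in the Pipfile and let Pipenv resolve its dependencies (or run pipenv install -r <downloaded requirements file> once to import the pins). A minimal Pipfile sketch along those lines, with an illustrative pin rather than one taken from the tested_requirements files:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
# Illustrative -- pin to the version listed in the tested_requirements
# file for your Python version.
snowflake-connector-python = "*"

[requires]
python_version = "3.9"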
I have the following repository structure (I know the names are stupid):
dummy/
    main/
        hello.py
        requirements.txt
    yips/
        yips/
            __init__.py
            lol.py
        setup.py
    Dockerfile
The idea is to run the program hello.py, which imports a method from lol.py in the Yips library. For testing purposes, I import sklearn in lol.py despite not using it. My Dockerfile looks like the following:
FROM python:3.9-bullseye
WORKDIR /yips
COPY yips/ ./
RUN pip3 install .
WORKDIR /main
COPY ./main/ .
RUN pip3 install --no-cache-dir -r ./requirements.txt
CMD ["python3", "hello.py"]
requirements.txt lists both sklearn and numpy; numpy is used in hello.py.
I have tried running the Docker image, and it complains that it cannot find sklearn. For what it's worth, when I do not import sklearn everything works fine (so there is no issue with the numpy import in hello.py). I have also tried adding a direct pip install sklearn before installing my yips library. Does anyone have any insight into how to fix this?
You might have put sklearn in your requirements.txt when you should include scikit-learn, which is the package's actual name on PyPI.
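For reference, a requirements.txt along these lines (package names taken from the question) would install what the sklearn import actually needs:

# requirements.txt
scikit-learn   # provides the "sklearn" import; scikit-learn is the PyPI name
numpy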
I am trying to install some external Python packages with pip so that I can use Snowflake with Apache Airflow.
I have a Dockerfile, and I am using Helm charts to install Airflow.
Now I need to add some Python dependencies to integrate Snowflake and Airflow, and I see two ways of doing this.
Idea 1:
Add the Python packages to a requirements.txt file, install them in the Dockerfile, and run docker build with that Dockerfile.
Idea 2:
Add the Python packages to the chart's values.yaml file and use it to upgrade my Airflow Helm release, so that it installs Airflow together with those packages.
I tried both and neither seems to work; I don't see my packages.
Are there any alternative or recommended ways of doing this?
I was able to solve this by updating the Dockerfile, as other users suggested above.
Add your Python packages to a requirements.txt file and save it in a folder (your working directory):
FROM apache/airflow:latest
USER airflow
WORKDIR /opt/airflow
COPY requirements.txt ./
RUN pip install -r requirements.txt
You can also do this without a requirements.txt file, like:
FROM apache/airflow:latest
USER airflow
RUN pip install "package1" "package2" "package3"
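Either way, the Helm release has to be pointed at the image you built. A hedged sketch of the build-and-deploy commands, assuming the official apache-airflow chart (the image name, tag, and release name are illustrative; check your chart's values.yaml for the exact image keys):

docker build -t my-registry/airflow-snowflake:1.0 .
docker push my-registry/airflow-snowflake:1.0

helm upgrade airflow apache-airflow/airflow \
  --set images.airflow.repository=my-registry/airflow-snowflake \
  --set images.airflow.tag=1.0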
I am building a Docker image and need to run pip install against a private PyPI that requires credentials.
What is the best way to secure the credentials?
Using the various file-based configuration options (pip.conf, requirements.txt, .netrc) is still a vulnerability even if I delete the files afterwards, because they can be recovered from earlier image layers.
Environment variables are also visible, for example via docker history or docker inspect.
What's the most secure approach?
I understand that you want to provide those credentials at build time and get rid of them afterwards.
The most secure way to handle this with pip would be a multi-stage build.
First, declare an initial build image containing the configuration files and any dependencies needed to download or compile your packages; don't worry about those files being recoverable, because they only ever exist in this build stage.
Afterwards, define your final image without the build dependencies and copy into it only the source code you want to run plus the installed dependencies from the build image. The resulting image won't contain the configuration files, and they cannot be recovered, since they were never there.
FROM python:3.10-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential gcc
WORKDIR /usr/app
RUN python -m venv /usr/app/venv
ENV PATH="/usr/app/venv/bin:$PATH"
# [HERE YOU COPY YOUR CONFIGURATION FILES WITH CREDENTIALS]
COPY requirements.txt .
RUN pip install -r requirements.txt

FROM python:3.10-slim
WORKDIR /usr/app
COPY --from=build /usr/app/venv ./venv
# [HERE YOU COPY YOUR SOURCE CODE INTO YOUR CURRENT WORKDIR]
ENV PATH="/usr/app/venv/bin:$PATH"
ENTRYPOINT ["python", "whatever.py"]
The Docker documentation suggests the following as a best practice:
If you have multiple Dockerfile steps that use different files from
your context, COPY them individually, rather than all at once. This
ensures that each step’s build cache is only invalidated (forcing the
step to be re-run) if the specifically required files change.
For example:
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
COPY . /tmp/
Results in fewer cache invalidations for the RUN step, than if you put
the COPY . /tmp/ before it.
My question is: how does this help?
In either case, if the requirements.txt file doesn't change, then pip install produces the same result, so why does it matter that in the best-practice scenario requirements.txt is the only file copied in before pip install runs?
On the other hand, it creates one more layer in the image, which is something I would rather avoid.
Say you have a very simple application
$ ls
Dockerfile main.py requirements.txt
With the corresponding Dockerfile
FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["./main.py"]
Now say you only change the main.py script. Since the requirements.txt file hasn't changed, the RUN pip install ... can reuse the Docker image cache. This avoids re-running pip install, which can download a lot of packages and take a while.
I had a simple Dockerfile:
FROM python:3.6
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
The problem was that it reinstalled the requirements on every build. I have a lot of requirements, but they rarely change.
I searched for solutions and ended up with this:
FROM python:3.6
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install -r requirements.txt
COPY . /app
That worked perfectly fine, until it stopped updating the code: for example, if I comment out a couple of lines in a file that goes into /app and rebuild, the lines stay uncommented in the image.
I searched again and found out that this is possibly caused by the cache. I tried the --no-cache build flag, but then the requirements are installed again on every build.
Is there some workaround or right way to do it in my situation?
You should use ADD not COPY if you want to invalidate cache.
FROM python:3.6
COPY requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip install -r requirements.txt
ADD . /app
Try the above Dockerfile.
Have you ever used docker-compose? docker-compose has 'volumes', which act like a cache: when you start the container it will not rebuild your dependencies, and the code refreshes automatically when it changes.
For your situation, you could do something like this:
FROM python:3.6
WORKDIR /app
COPY . /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
Give it a try.
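A minimal docker-compose.yml sketch of that idea, with illustrative service and path names; the bind mount overlays the image's /app with your local source tree, so code edits show up without rebuilding, while the installed requirements stay baked into the image:

services:
  app:
    build: .
    volumes:
      - ./:/app      # bind-mount local code over /app; edits appear without a rebuild
    command: ["python", "app.py"]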
Changing a file that you simply copy in (COPY . /app) will not be seen by Docker, so it will use a cached layer*, hence your result. Using --no-cache forces a rebuild of every layer, which again explains what you've observed.
The 'Docker' way to avoid reinstalling all the requirements every time would be to put all the static requirements in a base image, then use that image in your FROM line and install only the requirements that do change on top of it.
* Although I'm fairly sure I've observed that if you COPY a named file, as opposed to a directory, changes are picked up even without --no-cache.
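A minimal sketch of that base-image approach, using illustrative file and image names: the rarely-changing requirements are baked into a base image once, and the application image builds FROM it, so everyday builds skip the long pip install.

# base.Dockerfile -- rebuilt only when the static requirements change
FROM python:3.6
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# Dockerfile -- rebuilt on every code change, but inherits the installed packages
FROM my-app-base:latest
WORKDIR /app
COPY . /app
CMD ["python", "app.py"]

Build the base first (docker build -f base.Dockerfile -t my-app-base:latest .), then build the application image as usual; only requirements that actually change need to be installed in the second Dockerfile.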