View Docker PySpark output files in Windows directly

Note: I am using Windows 11
I have built a Docker image that runs a PySpark application to read CSV files and write them in Parquet format. Below is my Dockerfile:
FROM gcr.io/datamechanics/spark:platform-3.2-latest
ENV PYSPARK_MAJOR_PYTHON_VERSION=3
WORKDIR /opt/application/
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY input/ input/
COPY output/ output/
COPY main.py .
I use the command below to run the script:
docker run <image_name> driver local:///opt/application/main.py
I am writing the output to the output folder. To view the output files, I use:
docker cp <container_name>:/opt/application/output C:/output/
Is there any ETL way to execute the Python script (using a Dockerfile or shell script) and view the output files directly in Windows, without copying them from the Docker container?
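One common way to avoid the docker cp step is a bind mount: a host folder is mounted over the container's output folder, so files written there appear on the host as they are written. The sketch below wraps the docker run command from the question in a POSIX shell function. The function name run_pyspark_job and its two arguments are hypothetical, and it assumes the container user is allowed to write to the mounted folder.

```shell
# Sketch: run the job with a host folder bind-mounted over the output folder.
# run_pyspark_job and its arguments are illustrative names, not part of the question.
run_pyspark_job() {
  image_name="$1"        # the image built from the Dockerfile above
  host_output_dir="$2"   # host folder that should receive the Parquet files

  # -v HOST:CONTAINER mounts the host folder at /opt/application/output;
  # --rm removes the container when the job finishes.
  docker run --rm \
    -v "${host_output_dir}:/opt/application/output" \
    "${image_name}" driver local:///opt/application/main.py
}
```

Calling run_pyspark_job with the image name and C:/output should leave the Parquet files in C:/output on the host. The same docker run arguments can also be typed on one line in PowerShell or cmd.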

Related

running a docker container and trying to check the log of the container prints an error on console

I have built the image and run a container from it. When I try to print the logs of the container, which I run in detached mode, I get this error:
Error: Could not find or load main class EazyBankApplication
Caused by: java.lang.ClassNotFoundException: EazyBankApplication
How can I fix this? I have tried changing the run command to include a directory path, but the error persists.
FROM openjdk:17
RUN mkdir /eazyApp
COPY ./src/main/java/com/eazybank/ /eazyApp
WORKDIR /eazyApp
CMD java EazyBankApplication
In your Dockerfile, you're COPYing only a single source directory into the image, and nothing ever compiles it. When the container tries to execute the CMD java EazyBankApplication, the corresponding .class file doesn't exist.
Many SO Java questions address this by compiling the application on the host, and then COPYing the built jar file (but not any of its source) into the image.
FROM openjdk:17
WORKDIR /eazyApp
COPY target/eazybankapplication-0.0.1.jar ./
CMD java -jar eazybankapplication-0.0.1.jar
If you want to build the application inside Docker, then you need to invoke the build tool somehow. A typical approach is to COPY the entire application tree into the image, excluding parts of it via a .dockerignore file.
FROM openjdk:17
WORKDIR /eazyApp
# Copy in the entire source tree
# (.dockerignore file excludes host's build/)
COPY ./ ./
# Build the application
RUN ./gradlew build
# Run it
CMD java -jar build/libs/eazybankapplication-0.0.1.jar
(You could combine the two approaches with a multi-stage build to get a final image without the source code, and possibly with only the JRE for a smaller image.)
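A minimal sketch of that multi-stage variant, reusing the Gradle build and jar name from the example above. The eclipse-temurin:17-jre runtime image is an assumption chosen only to illustrate a JRE-only final stage; any JRE base image would fill the same role.

```dockerfile
# Build stage: full JDK plus the entire source tree
FROM openjdk:17 AS build
WORKDIR /eazyApp
COPY ./ ./
RUN ./gradlew build

# Runtime stage: JRE only (assumed base image), no source code
FROM eclipse-temurin:17-jre
WORKDIR /eazyApp
# Copy just the built jar out of the build stage
COPY --from=build /eazyApp/build/libs/eazybankapplication-0.0.1.jar ./
CMD ["java", "-jar", "eazybankapplication-0.0.1.jar"]
```

Only the last stage ends up in the final image, so the source tree and the Gradle caches stay behind in the build stage.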
You probably need a build system like Gradle or Maven in practice, but if you only have the source files, you can still directly use javac.
FROM openjdk:17
WORKDIR /eazyApp
COPY ./src/ ./src/
RUN mkdir class
ENV CLASSPATH=/eazyApp/class
RUN find src -name '*.java' -print > files.txt \
 && javac -d class @files.txt \
 && rm files.txt
CMD java com.eazybank.EazyBankApplication
Your copy command does not copy the files in the directory, use
COPY ./src/main/java/com/eazybank/* /eazyApp
Also, you don't need the mkdir command; COPY will create the directory.

Dockerfile (multistage build?) and container with multiple executable files

I'm having some issues with producing the following setup.
I've implemented a Java application that can start a process with any executable file (with Runtime.exec({whatever-file-here})....), where the file path is provided via external configuration. I have then created a Docker image with the said application, the idea being that the external executable file will be part of a second Docker image, containing all the necessary dependencies. This will leave the option to easily swap the file being executed by the Java app.
So from one side there is the Java image that should look like:
FROM openjdk:14
WORKDIR /app
COPY /build/some.jar /app/some.jar
And let's say I build a service-image out of it. The next step would be to use the aforementioned image as a base image in either a second Dockerfile or a single file with multiple stages.
The way I imagine it, a second Dockerfile for, let's say, a Python executable would be:
# python so I can run the script
FROM python:latest
# to get the runtime environment + app directory + jar
COPY --from=service-image / /
# copying the file for the jar to run
COPY some-file.py /app/some-file.py
# the command that will start the java app
CMD ["java", "-jar", "/app/some.jar"]
Running a container with an image built from the second file should have both a JRE to run the jar file and Python to run the .py file, as well as the actual .jar and .py files. I'm ignoring any details such as environment variables necessary for the Java app to work. But that doesn't seem right, as the resulting image is absolutely massive.
What would you recommend as an approach? Until now I haven't dealt with complex Docker scenarios.
I really do not think that you will be able to create a proper container by replacing the root folder with that of another image.
Here is how you could do it:
Build your jar file using an openjdk image
Create an image with python and Java installed and copy the .jar from the previous image
You can start from a python image and install Java or the opposite.
Here is an example:
# Build stage
FROM openjdk:14 AS build
WORKDIR /app
COPY . .
# Placeholder for your real build command
RUN ./build-my-app.sh
# Runtime stage
FROM openjdk:14-alpine AS runner
WORKDIR /app
# Install python
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python
RUN python3 -m ensurepip
RUN pip3 install --no-cache --upgrade pip setuptools
# The stage name must match the "AS build" alias above
COPY --from=build /app/dist/myapp.jar myapp.jar
COPY some-file.py some-file.py
# The command that will start the Java app
CMD ["java", "-jar", "/app/myapp.jar"]
EDIT: Apparently you are not using Docker to build your jar, so you can simply copy it from your host machine (like the .py file) and skip the build stage.

Dockerfile for Flask app, WORKDIR path should be absolute

So I am learning Docker for the first time and was wondering if I am doing this in the correct format for my Flask app. A lot of documentation online uses the WORKDIR command to change directory into "/app"; however, my main file to run the app is run.py, which is in the same directory as the actual Dockerfile. WORKDIR doesn't let me do "WORKDIR ." to use the current directory, though.
Can someone clarify if I have my Dockerfile set up correctly?
(I also plan to host this on Heroku if that matters)
File structure
Docker file
# start by pulling the python image
FROM python:3.8-alpine
# copy the requirements file into the image
COPY ./requirements.txt /requirements.txt
# Don't need to switch working directory
# WORKDIR
# install the dependencies and packages in the requirements file
RUN pip install -r requirements.txt
# copy every content from the local file to the image
COPY . /app
# configure the container to run in an executed manner
ENTRYPOINT [ "python" ]
CMD ["run.py" ]
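A sketch of how this Dockerfile could use WORKDIR, assuming run.py and requirements.txt sit next to the Dockerfile as described. WORKDIR names a directory inside the image rather than on the host, and /app here is an arbitrary choice. In the version above, COPY . /app puts the script at /app/run.py while the container starts in the image's default directory, so python run.py would likely not find it.

```dockerfile
FROM python:3.8-alpine

# Absolute path inside the image; Docker creates it if it does not exist
WORKDIR /app

# Copy the requirements first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the rest of the project into /app (the current WORKDIR)
COPY . .

# Runs "python run.py" from /app
ENTRYPOINT ["python"]
CMD ["run.py"]
```

Relative paths in the later COPY, RUN, ENTRYPOINT and CMD instructions all resolve against /app once WORKDIR is set.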

Check the file contents of a docker image

I am new to docker and I built my image with
docker build -t mycontainer .
The contents of my Dockerfile are:
FROM python:3
COPY ./* /my-project/
RUN pip install -r requirements.txt
CMD python /my-project/main.py
Here I get an error:
Could not open requirements file: No such file or directory: 'requirements.txt'
I am not sure if all the files from my local are actually copied to the image.
I want to inspect the contents of the image, is there any way I can do that?
Any help will be appreciated!
When you run docker build, it should print out a line like
Step 2/4 : COPY ./* /my-project/
---> 1254cdda0b83
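That number is actually a valid image ID, so you can get a debugging shell in that image. (This relies on the classic builder's output; BuildKit, the default builder in recent Docker releases, does not print these intermediate IDs.)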
docker run --rm -it 1254cdda0b83 bash
In particular the place that container starts up will have the exact filesystem, environment variables (from ENV directives), current directory (WORKDIR), user (USER), and so on; directly typing in the next RUN command should get the same result as Docker running it itself.
(In this specific case, try running pwd and ls -l in the debugging shell; does your Dockerfile need a WORKDIR to tell the pip command where to run?)
You just have to get into the project directory and run the pip command.
The best way to do that is to set the WORKDIR /my-project!
This is the updated file
FROM python:3
COPY ./* /my-project/
WORKDIR /my-project
RUN pip install -r requirements.txt
CMD python /my-project/main.py
Kudos!
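One more detail worth checking in the Dockerfile above: a wildcard source such as ./* copies the contents of each matched directory rather than the directory itself, so a project with subfolders ends up flattened in /my-project. A minimal sketch that keeps the layout intact, assuming requirements.txt and main.py are at the top of the build context:

```dockerfile
FROM python:3

# All following relative paths resolve against /my-project
WORKDIR /my-project

# "COPY . ." copies the build context as-is, preserving subdirectories
COPY . .

RUN pip install -r requirements.txt
CMD ["python", "main.py"]
```

Once the image builds, running it with ls -l /my-project as the command is a quick way to confirm which files were copied.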

Can I copy a directory from within a docker Image in dockerfile?

I am trying to build a simple Java web application (not Maven) from within my Dockerfile by pulling code from my Git server and creating a deployable war. In order to do so, I have to copy the classes directory to my WEB-INF folder. My classes directory is at /usr/app_name/build/classes (in the Docker image) and I want to copy it to /usr/app_name/WebContent/WEB-INF/ (within the same image).
Here is my docker file:
FROM maven:3.5-jdk-8 AS buildserver
WORKDIR /usr/app_name
RUN git clone http://uname:pass@git_server_host:git_server_port/scm/tes/app_name.git /usr/app_name
COPY /usr/app_name/build/classes /usr/app_name/WebContent/WEB-INF/
#***#Is there any way to perform above operation***
WORKDIR /usr/app_name/WebContent/WEB-INF/
RUN jar -cvf app_name.war *
FROM tomcat:latest
COPY --from=buildserver /usr/app_name/WebContent/WEB-INF/app_name.war .
EXPOSE 5060
Without --from, the COPY command in Docker only copies files from the build context on the Docker host into the image being built; it cannot copy from one path to another inside the same image. You can do what you need by just running a cp command in the image (or using rsync or some other tool if you have it installed in the image). An example of this would be:
RUN cp -r /usr/app_name/build/classes /usr/app_name/WebContent/WEB-INF/
to copy the contents to /usr/app_name/WebContent/WEB-INF/classes, or:
RUN cp -r /usr/app_name/build/classes/* /usr/app_name/WebContent/WEB-INF/
if you want to copy the content to /usr/app_name/WebContent/WEB-INF directly.
