Unable to read file from mounted directory in docker

I'm developing a project which accepts an image (photo) as input from the user, performs OCR on it using command-line Tesseract-OCR, stores the result in a text file named "input.txt", and then performs stopword removal on this file using a Java program. All of this should be done inside a Docker container. I have created a Docker image with Tesseract-OCR installed in it, and I have a working stopword-removal Java program.
In my project, I mount a host OS directory as the "/work" directory inside the container, so that I can read the image directly from the user's home directory:
docker run --rm -v `pwd`:/work -w /work ocr
here "ocr" is my docker image. I have created bash script, which calls Tessract-OCR and then calls StopWordRemoval java code, as
#!/bin/bash
tesseract sample.jpg input
java StopWords
The problem is that the output of "tesseract sample.jpg input" is saved as "input.txt", but it is not accessible inside the Java program, whereas if I try to open other files from the same directory using the same code, it works.

Is the file "input.txt" written to host mounted directory or to container filesystem? I assume that "input.txt" is successfully written. It will be good if you can try accessing the file "input.txt" outside Java to narrow down the issue.

Related

How can I load a Docker image created from the .tar file of an original Docker image?

I have a Docker image in .tar format. When I load it using sudo docker load < image.tar, it works fine.
I used tar -xf image.tar to un-archive the file. I then un-archived each layer file so I could edit some scripts and update some libraries manually. Once I was done with this, I used tar -cf to re-pack each layer and then the entire image.
When I load the modified image the same way I loaded the original, it does not work. I get this error:
open /var/lib/docker/tmp/docker-import-742628246/image-edited/json: no such file or directory
What could I have done wrong to cause this error and how can I properly load the modified .tar file into Docker?
P.S.: The problem appeared on Docker 20.10.12, running on Kali Linux 2021.4 inside VMWare Workstation Player.
I am modifying an image tarball and trying to load it back into the Docker daemon, and I saw the same error. I tried this: run sha256sum on the layer.tar in each layer, then search the config JSON file (the other JSON file at the same level as manifest.json, the one with a long SHA value as its filename) for the original hash and replace it with the new sha256 value; then this error disappears.
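Roughly, that hash replacement can be scripted as below; this is only a sketch, and the layer path, old hash, and config file name are placeholders for whatever your unpacked image contains:
# from inside the unpacked image directory
new_sha=$(sha256sum PATH_TO_MODIFIED_LAYER/layer.tar | awk '{print $1}')
# the config file is the JSON next to manifest.json whose name is a long sha256 value
sed -i "s/OLD_LAYER_SHA256/${new_sha}/g" LONG_SHA_VALUE.json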
But then another error appears, something like /var/lib/docker/tmp-xxxx/xxxxx cannot open file, so I think nothing should (or can) be modified in the layer tar, as there is no easy way to bypass Docker's integrity check on the image.
I am now planning to add another layer to the image using Google's jib tool, copying in a script that makes the modification I want, so that the original layers stay intact.

How do I make a script to run another program after a file is created in a specific folder?

How do I make a script run another program after a file is created in a specific folder? Could be Windows or Linux. To be specific, I want to use rclone to move a file to a remote folder right after it is created. The program itself doesn't have a built-in monitoring function.
Could be Windows or Linux.
Linux: It's easy with inotifywait.
cd "a specific folder"
inotifywait -me close_write --format %f . | while read file
do rclone move "$file" …
done
Unfortunately this doesn't work with the Ubuntu Linux app from the Windows Store for files on mounted Windows filesystems.

Why does it show "File not found" when I am trying to run a command from a docker file to find and remove specific logs?

I have a Dockerfile which has the below command.
# Kafka log cleanup for log files older than 7 days
RUN find /opt/kafka/logs -name "*.log.*" -type f -mtime +7 -exec rm {} \;
While executing, it gives an error that opt/kafka/logs is not found. But I can access that directory. Any help on this is appreciated. Thank you.
Changing the contents of a directory defined with VOLUME in your Dockerfile using a RUN step will not work. The temporary container will be started with an anonymous volume and only changes to the container filesystem are saved to the image layer, not changes to the volume.
The RUN step, along with every other step in the Dockerfile, is used to build the image, and this image is the input to the container. The build does not use your running containers or volumes as input, so it makes no sense to clean up files that are not created as part of your image build.
If you do delete files created during your image build, make sure the deletion happens within the same RUN step that creates them. Otherwise, the deleted files have already been written to an image layer; they are still transferred and stored on disk, just not visible in containers based on the layer that includes the delete step.
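For example, a download-and-clean-up done inside one RUN step might look like the sketch below; the URL and paths are illustrative, and it assumes curl is available in the base image:
# fetch, extract, and delete the archive in a single layer
RUN curl -fsSL https://example.com/archive.tgz -o /tmp/archive.tgz \
 && mkdir -p /opt/app \
 && tar -xzf /tmp/archive.tgz -C /opt/app \
 && rm -f /tmp/archive.tgz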

Copying an exe and composing it as a docker image and making it platform independent

I need to create a Docker image which, when run, should install an exe into the directory specified in my Dockerfile.
Basically, I need the ImageMagick application. The Dockerfile should be platform independent: if I run it on Windows it should use the Windows distribution, and on Linux the Linux distribution. It would be great if it also added an environment variable to the system. I browsed around, but I couldn't find an appropriate solution.
I know it's a bit late but maybe someone (like me) was still searching.
I ended up using a java-imagemagick docker version from https://hub.docker.com/r/cpaitsupport/java-imagemagick/dockerfile
You can run docker pull cpaitsupport/java-imagemagick to get this docker image to your docker machine.
Now comes the tricky part: I needed to run ImageMagick inside the Docker container of my main app. You can COPY the files from cpaitsupport/java-imagemagick into your custom container. Example:
COPY --from=cpaitsupport/java-imagemagick:latest . ./some/dir/imagemagick
Now you should have the file structure of your custom app, and also, under some/dir/imagemagick/, the file structure of ImageMagick. There you'll find all the ImageMagick-related files (convert, magick, the libraries, etc.).
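If you want to verify during the build what actually ended up there, a throwaway step can help; a small sketch, assuming the base image provides ls and head:
# temporary check: list part of the copied ImageMagick tree
RUN ls -R ./some/dir/imagemagick | head -n 40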
Now, if you want to use ImageMagick in your code, you need to set some ENV variables in your Docker container with the "new" path to the ImageMagick directory. Example:
ENV IM4JAVA_TOOLPATH=/some/dir/imagemagick/usr/bin \
    LD_LIBRARY_PATH=/usr/lib:/some/dir/imagemagick/usr/lib \
    MAGICK_CONFIGURE_PATH=/some/dir/imagemagick/etc/ImageMagick-7 \
    MAGICK_CODER_MODULE_PATH=/some/dir/imagemagick/usr/lib/ImageMagick-7.0.5/modules-Q16HDRI/coders \
    MAGICK_HOME=/some/dir/imagemagick/usr
Now, in your Java code, delete the ProcessStarter.setGlobalSearchPath(imPath); call if it is set, so that IM4JAVA_TOOLPATH is used instead.
Now ConvertCmd cmd = new ConvertCmd(); and cmd.run(op); should work.
Maybe it's not the best way, but it worked for me after struggling a lot.
Hope this helps!
Feel free to correct or add additional info.
You can install (extract) files to the external host system using Docker bind mounts or volumes; however, you cannot change host system settings, such as its environment variables, from inside a container.
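For example, to extract the ImageMagick files from the image onto the host, you could bind-mount a host folder and copy into it at run time. This is only a rough sketch: it assumes the image ships sh and cp, and the source paths are illustrative, based on the layout used in the example above:
docker run --rm --entrypoint sh \
  -v "$PWD/imagemagick-files:/out" \
  cpaitsupport/java-imagemagick \
  -c 'cp -a /usr/bin /usr/lib /out/'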

Move file downloaded in Dockerfile to harddrive

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know that it'd be way more beneficial to learn the basics first, but I do require this to work in order to move on to other things for now.
So within a Dockerfile I installed wget and used it to download a file from a website; authentication and download are successful. However, when I later try to move said file it can't be found, and it doesn't show up using e.g. Explorer either (the path was specified).
I thought it might have something to do with RUN and how it executes the wget command; I read that the container ID can be used to copy it to the hard drive, but how would I do that within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --username xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
Download is shown & log-in is successful, but as I mentioned I am having trouble using that file later on as it's not to be located on the harddrive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similar to the "move to hard drive" one, I am searching for ways to get the ID of the container created within the Dockerfile in order to move the file; while that thread provides such answers, I haven't had any luck using them within the Dockerfile itself.
The short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system, which kind of makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container and mount a path from your host as a volume into the container. This is the only way to share files between the host and a container.
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to work well with your own image. As a general recommendation: For basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or so, where you want to use wget or curl but can't execute it directly.
Use wget -O instead of -P to download to a specific file,
for example:
RUN wget -O /tmp/new_file.txt http://something.com/new_file.txt --username xyz --password bluh
Thanks
