How to use azcopy version 10.3.0 to copy and then delete file from Blob storage to VM using powershell - azcopy

Some csv files keep getting placed in an azure storage container. I need to continuously move the files to a VM. I am using the following powershell script running on my VM.
while($true){
.\azcopy sync "source blob" "destination folder on VM" --include-pattern "*.csv" --log-level ERROR
Start-Sleep -Seconds 60
}
The files are getting copied, but how do I delete the files from the source. They are not needed anymore after copying to the VM.

I scoured the interwebs for this and still none of the "solutions" were sufficient. SO -- I had to write my own. Below is the code I wrote that will do the following:
a. Download AZCopy if it doesnt exist.
b. Copy information from an Azure blob storage
c. Upon successful copy, delete information from source
https://github.com/SlkRck/demoscripts/blob/master/copy-blobstorage.ps1

You can use azcopy remove command after the azcopy sync operation is completed. And here I need to mention that the azcopy sync operation is thread-blocking, so it's safe to use azcopy rm command at the end of the azcopy sync operation.
Note that if you want to just remove all the .csv file, you should add --include-pattern="*.csv" in the command.
I'm using the latest version of azcopy, v10.3.1. If you prefer to use v10.3.0, then first you need to use this command azcopy remove --help for the details of this command and it's parameters.
A sample command like below:
while($true){
.\azcopy sync "source blob" "destination folder on VM" --include-pattern "*.csv" --log-level ERROR
Start-Sleep -Seconds 60
#after the copy operation is completed, use remove command as below.
azcopy.exe rm "https://xxx.blob.core.windows.net/test4?sastoken" --include-pattern="*.csv" --recursive=true
}
One thing need to mention, in my test, I use the sas token for my testing purpose.

Related

Is there any way to make a COPY command on Dockerfile silent or less verbose?

We have a Dockerfile which copies the folder to another folder.
Something like
COPY . /mycode
But the problem is that there is tons of files in the generated code, and it creates 10K+ lines on the jenkins log where we are running the CICD pipeline.
copying /bla/blah/bla to copying /bla/blah/bla 10k times.
Is there a way to make this COPY less verbose or silent? jenkins admin has already warned us that our log file is nearing his max limit.
You can tar/zip the files on the host, so there's only one file to copy. Then untar/unzip after it's been copied and direct the output of untar/unzip to /dev/null.
By definition the docker cp command doesn't have any "silent" switches, but perhaps redirecting the output may help, have you tried:
docker cp -a CONTAINER:SRC_PATH DEST_PATH|- &> /dev/null
I know it's not the most elegant, but if what you seek is to supress the console output, that may help.
Sources
[1] https://docs.docker.com/engine/reference/commandline/cp/
[2] https://csatlas.com/bash-redirect-stdout-stderr/

Can I run scripts from a docker build context without a copy?

I want to build on top of a windows docker container by installing a couple programs. The files total .5 GB and I want to keep the layers as small as possible. I am hoping I can run the setup files from the build-context, and then have the build-context swept away at the end so I don't have a needless copy of the source files for the setup.exe embedded in my container layers. However, I have not found an example where this is the case. Instead I mostly see people run a COPY command to a temporary build folder, run their setup, then remove the folder. Won't those files still be in the container layers because the COPY command creates a new layer when it's done?
I don't know if the container can see the build-context directly. I was hoping for some magical folder filled with the build-context files so I could run a script using it, but haven't found anything.
It seems like the alternative is to create a private file-server and perform a RUN that can download them from that private server and unpack them, run the install, and remove them (all as 1 docker step). I understand this would make it more available to others who need to rerun the build, but I'm not convinced we'll need to rerun it. It's not likely to change as the container will build patches for a legacy application. Just seems like a lot to host files on a private, public-facing server for something that will get called once every couple years if ever.
So are these my two options?
Make a container with needless copies of source files embedded within
Host the files on a private file server and download/install/remove them
Or am I missing another option or point about how the containers work?
It's a long shot as Windows is a tricky thing with file system, but you could do this way:
In your Dockerfile use a COPY command, install then RUN del ... to remove the installation files
Build your image docker build -t my-large-image:latest .
Run your image docker run --name my-large-container my-large-image:latest
Stop the container
Export your container filesystem docker export my-large-container > my-large-container.tar
Import the filesystem to a new image cat my-large-container.tar | docker import - my-small-image
Caveat is you need to run the container once which might not be what you want. And also I haven't tested with windows container, sorry.
I usually do the download or copy in one step, then in the next step I do the silent installation and remove the installer.
# escape=`
FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2016
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
ADD https://download.visualstudio.microsoft.com/download/pr/6afa582f-fa26-4a73-8cb9-194321e85f8d/ecea51ead62beb7acc73ad9799511ffdb3083ad384fe04ec50e2cbecfb426482/VS_RemoteTools.exe VS_RemoteTools_x64.exe
RUN Start-Process .\\VS_RemoteTools_x64.exe -ArgumentList #('/install','/quiet','/norestart') -NoNewWindow -Wait; `
Remove-Item -Path C:/VS_RemoteTools_x64.exe -Force;
But otherwise, I don't think you can mount a custom volume while it's being built.
I didn't find a satisfactory answer to this. Docker seems designed for only the modern era and assumes you'll be able to download what you need via scripts and tools hitting APIs and file servers. The easiest option I found that I eventually went with was to host the files on a private file server or service (in my case, AWS S3).
I really wish there was a way to have files hosted by the docker daemon in some way, eg. if it acted like a temporary server that you could get data from via http instead of needing to COPY the files and create a layer. Alas, I found no such feature.
Taking this route made my container about a GB smaller.

Why does it show "File not found" when I am trying to run a command from a docker file to find and remove specific logs?

I have a docker file which has below command.
#Kafka log cleanup for log files older than 7 days
RUN find /opt/kafka/logs -name "*.log.*" -type f -mtime -7 -exec rm {} \;
While executing it gives an error opt/kafka/logs not found. But I can access to that directory. Any help on this is appreciated. Thank you.
Changing the contents of a directory defined with VOLUME in your Dockerfile using a RUN step will not work. The temporary container will be started with an anonymous volume and only changes to the container filesystem are saved to the image layer, not changes to the volume.
The RUN step, along with every other step in the Dockerfile, are used to build the image, and this image is the input to the container, it does not use your running containers or volumes for the build input, so it makes no sense to cleanup files that are not created as part of your image build.
If you do delete files created in your image build, you should make sure this is done within the same RUN step. Otherwise, files you delete are already written to an image layer, and are transferred and stored on disk, just not visible in containers based on the layer that includes the delete step.

Alternative to using --squash when building docker images on Windows using local files

We have some local installers and zip files that we use to build our docker images. It is easy to get this to work in a Dockerfile:
FROM mcr.microsoft.com/windows/nanoserver
COPY myinstaller.exe .
RUN myinstaller.exe; \
del myinstaller.exe
The problem here is that it produces a layer for the COPY line, which increases the size of the image. A common work-around for this is to have one RUN line, that downloads the file from the Internet, runs commands, and then deletes the installation file. The problem, as written above, is that the installers are on the local filesystem.
I found that there is a --squash command for docker:
docker build --squash -t mytestimage .
This does exactly what I want: It gives me an image without this extra installer file that is not necessary. To run this command, you need to enable experimental features though. There is also an open issue to simply remove this feature:
https://github.com/moby/moby/issues/34565
Is there some alternative way of using local installers in a Dockerfile when running on Windows, that doesn't involve setting up a server to provide the files?
We ended up setting up nginx to provide files when building. On our build server, the machine building our docker images and the server that has the installer files have a very good connection between them, so downloading huge files is not a real problem.
When it comes to --squash, it is bugged for Docker on Windows. Here is the relevant issue for it:
https://github.com/moby/moby/issues/31468
There is an issue to move --squash out of experimental, but it doesn't seem to have a lot of support:
https://github.com/moby/moby/issues/38657
The alternative that some people propose instead of --squash is multi stage build, discussion here:
https://github.com/moby/moby/issues/34565
There is an alternative to --squash, if you have local installer files, you don't want to set up a web server, and you would like your docker image to be small, and you are running Windows: Use mapped drives.
In Windows, you can share folders with other users on your network. Docker containers are like another computer that is running on your physical machine, and it can access these network drives.
First set up a new user, for example username share and password password1. Create a folder somewhere on your computer. Then right click it, click properties, and then go to the Sharing tab and click "Share". Find the user that you have just created, using the little dropdown menu and Find people ..., and share the folder with this user.
Create a folder somewhere for your test project. Create a batch file setupshare.bat that looks like this:
#echo off
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr "Default Gateway"') do (
set hostip=%%i
goto :end
)
:end
set hostip=%hostip: =%
net use O: \\%hostip%\vms /USER:share password1
The first part of this file is only to find the ip address that the docker container can use to access its host computer. It is not the most pretty thing I've ever put together, so let me know if there's a better way!
It uses a for-loop, as that is the way to save the output of a command to a variable in batch files. The command is ipconfig, and we pipe it to findstr and searches for Default Gateway. We need to use ^| instead of just | because it is in a for-loop. The first part of the for-loop divides each line from the command on the delimiter, which is : in this case, and we only take the second token. The for-loop only handles the first line, if there are multiple entries with a Default Gateway. This script doesn't work if there are multiple entries and the first one is not the correct one.
The line set hostip=%hostip: =% is to remove a space at the start of the string.
We then have the IP address that we want to use stored in hostip. We use this in the net use command, which will map O:\ to shared folder vms on the machine with IP hostip. We use the username share and the password password1. Note that this is a very bad way of handling passwords, as they kind of should be secret!
With a batch file like this, we can set up a Dockerfile in this way:
# escape=`
FROM mcr.microsoft.com/dotnet/core/sdk:3.0
COPY setupshare.bat .
RUN setupshare.bat && `
copy O:\file.txt file.txt
The RUN command will first call setupshare.bat that sets up the network share properly. We can then use any file that we shared, for example a huge installer, and install the things that we want. In this case I have only shared a test file file.txt to see that it works, so just change that line.
I would still advice everyone to just set up a little web server, for example nginx, and use the standard way of writing Dockerfiles, with downloading files and running it in the same RUN command. That's what people expect when they see a Dockerfile, and it should be a more robust solution.
We can also hope that the Docker people either makes a COPY command that can copy, run, and delete installers in the same layer, or that --squash is implemented properly.

Move file downloaded in Dockerfile to harddrive

First off, I really lack a lot of knowledge regarding Docker itself and its structure. I know that it'd be way more beneficial to learn the basics first, but I do require this to work in order to move on to other things for now.
So within a Dockerfile I installed wget & used it to download a file from a website, authentification & download are successful. However, when I later try move said file it can't be found, and it doesn't show up using e.g explorer either (path was specified)
I thought it might have something to do with RUN & how it executes the wget command; I read that the Id can be used to copy it to harddrive, but how'd I do that within a Dockerfile?
RUN wget -P ./path/to/somewhere http://something.com/file.txt --username xyz --password bluh
ADD ./path/to/somewhere/file.txt /mainDirectory
Download is shown & log-in is successful, but as I mentioned I am having trouble using that file later on as it's not to be located on the harddrive. Probably a basic error, but I'd really appreciate some input that might lead to a solution.
Obviously the error is produced when trying to execute ADD as there is no file to move. I am trying to find a way to mount a volume in order to store it, but so far in vain.
Edit:
Though the question is similiar to the "move to harddrive" one, I am searching for ways to get the id of the container created within the Dockerfile in order to move it; while the thread provides such answers, I haven't had any luck using them within the Dockerfile itself.
Short answer is that it's not possible.
The Dockerfile builds an image, which you can run as a short-lived container. During the build, you don't have (write) access to the host and its file system. Which kinda makes sense, since you want to build an immutable image from which to run ephemeral containers.
What you can do is run a container, and mount a path from your host as a volume into the container. This is the only way how you can share files between the host and a container.
Here is an example how you could do this with the sherylynn/wget image:
docker run -v /path/on/host:/path/in/container sherylynn/wget wget -O /path/in/container/file http://my.url.com
The -v HOST:CONTAINER parameter allows you to specify a path on the host that is mounted inside the container at a specified location.
For wget, I would prefer -O over -P when downloading a single file, since it makes it really explicit where your download ends up. When you point -O to the location of the volume, the downloaded file ends up on the host system (in the folder you mounted).
Since I have no idea what your image or your environment looks like, you might need to tweak one or two things to work well with your own image. As a general recommendation: For basic commands like wget or curl, you can find pre-made images on Docker Hub. This can be quite useful when you need to set up a Continuous Integration pipeline or so, where you want to use wget or curl but can't execute it directly.
Use wget -O instead of -P for specific file download
for e.g.,
RUN wget -O /tmp/new_file.txt http://something.com --username xyz --password bluh/new_file.txt
Thanks

Resources