When I want to precompute the sha256 hash of a downloaded file foo.txt, the output of nix-hash on the manually downloaded file differs from the one printed by nix-prefetch-url.
How can I compute the same hash with nix-hash?
The default output format of nix-hash is different from the one that nix-prefetch-url uses.
To get the same output for the manually downloaded file as nix-prefetch-url would print, run nix-hash like this:
nix-hash --type sha256 --flat --base32 <file>
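For example, with a hypothetical URL standing in for wherever foo.txt came from, both commands should print the same base-32 sha256:

$ nix-prefetch-url https://example.com/foo.txt
$ nix-hash --type sha256 --flat --base32 foo.txt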
I have a text file containing image references from Artifactory.
I have a shell script that runs a Black Duck scan on those images in a loop, but I am getting errors like "invalid reference format".
#!/usr/bin/sh
# Read image references, one per line, from the input file.
input="dockerimagesURL.txt"
while read -r dockerimagesURL
do
    docker pull "$dockerimagesURL"
    DockerImageID=$(docker images "$dockerimagesURL" --format '{{.ID}}')
    sudo -S java -jar /home/dxc/Desktop/synopsis-detect-7.11.0/synopsys-detect-7.11.0.jar -- scan command continued.
done < "$input"
The dockerimagesURL.txt file contains:
buildimages-docker-local.artifactory.com/docker-registry1:tag
buildimages-docker-local.artifactory.com/docker-registry2:tag
buildimages-docker-local.artifactory.com/docker-registry3:tag
buildimages-docker-local.artifactory.com/docker-registry4:tag
The above script is failing for multiple reasons:
invalid reference format
docker pull does not happen
Assuming your full error message is
'docker-registry1:tag' is not a valid repository/tag: invalid reference format
'tag' is a placeholder. Replace it with a tag that really exists, e.g. 'latest' (the default):
buildimages-docker-local.artifactory.com/docker-registry1:latest
The error can still occur after that change if the repository is not valid or cannot be found!
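As a sketch of a more defensive loop (assuming one reference per line and network access to the registry; on older Docker versions, docker manifest inspect may require experimental CLI features to be enabled), you could verify each reference before pulling, so a bad entry is skipped instead of breaking the run:

#!/usr/bin/sh
# Sketch: pull only references that actually resolve in the registry.
while read -r ref
do
    if docker manifest inspect "$ref" >/dev/null 2>&1
    then
        docker pull "$ref"
    else
        echo "skipping unresolvable reference: $ref" >&2
    fi
done < dockerimagesURL.txt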
I read the Docker Image Specification v1.2.0.
It said:
Layers are referenced by cryptographic hashes of their serialized representation. This is a SHA256 digest over the tar archive used to transport the layer, represented as a hexadecimal encoding of 256 bits, e.g., sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9. Layers must be packed and unpacked reproducibly to avoid changing the layer ID, for example by using tar-split to save the tar headers. Note that the digest used as the layer ID is taken over an uncompressed version of the tar.
I want to find out the specific process, so I tried the following:
chao@manager-02:~/image_lab$ docker image save busybox:1.27-glibc > busybox.tar
chao@manager-02:~/image_lab$ tar -xvf busybox.tar
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/VERSION
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/json
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json
manifest.json
repositories
chao@manager-02:~/image_lab$ ls
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe Dockerfile manifest.json
busybox.tar fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json repositories
chao@manager-02:~/image_lab$ sha256sum 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
545903a7a569bac2d6b75f18d399251cefb53e12af9f644f4d9e6e0d893095c8 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
Why is the sha256sum I generated not equal to the sha256 hash used as the layer's directory name?
Technically, you did answer your own question. This is what the Docker image spec says (as you quoted):
[DiffID] is a SHA256 digest over the tar archive used to transport the layer (...) Note that the digest used as the layer ID is taken over an uncompressed version of the tar.
But later on, when describing the content of the image, the same doc also says:
There is a directory for each layer in the image. Each directory is named with a 64 character hex name that is deterministically generated from the layer information. These names are not necessarily layer DiffIDs or ChainIDs.
If you look at the image config (the fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json file that the Config key in manifest.json points to), you'll see that its rootfs.diff_ids array contains the same hash you obtained by sha256summing layer.tar. The hash you computed is the DiffID.
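To check this on the extracted archive from the question (jq is assumed to be available; file names are taken from the transcript above), compare the config's diff_ids with the hash of layer.tar; both commands should print the same digest:

$ jq -r '.rootfs.diff_ids[]' fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json
$ sha256sum 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar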
The obvious follow-up question then is: where did that directory name come from?
I am not sure, but it seems to be generated by whatever algorithm was used to generate layer IDs in the older Docker image format v1. Back then, images and layers were conflated into a single concept.
I'd guess they kept the v1 directory names unchanged to simplify using old layers with newer Docker versions.
Footnote: AFAIU, the Docker image format spec is superseded by the OCI image format specification, but docker image save seems to generate archives in the older Docker format.
I would like to edit a docker image's metadata for the following reasons:
I don't like an image parent's EXPOSE, VOLUME, etc. declarations (see #3465; the Docker team did not want to provide a solution), so I'd like to "un-volume" or "un-expose" the image.
I don't like an image's ContainerConfig (see docker inspect [image]) because it was generated from a running container using docker commit [container].
I want to fix errors during docker build or docker run like:
cannot mount volume over existing file, file exists [path]
Is there any way I can do that?
It's a bit hacky, but it works (a combined sketch of these steps follows below):
Save the image to a tar file (docker save writes an uncompressed tar, so no z flag is needed when extracting):
$ docker save [image] > [targetfile.tar]
Extract the tar file to get access to the raw image data:
tar -xvf [targetfile.tar]
Look up the image metadata file via manifest.json: it contains a Config key whose value is a [HEX].json filename, and that exact [HEX].json file sits in the root of the extracted folder.
This is the file containing the image metadata. Edit it as you like.
Pack the extracted files back into a new.tar archive.
Use cat [new.tar] | docker load to re-import the modified image.
Use docker inspect [image] to verify your metadata changes have been applied.
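Put together, a minimal sketch of these steps (the image name is hypothetical, jq is assumed to be installed, and note that the re-imported image gets a new ID, because the image ID is the hash of the now-edited config):

$ docker save myimage:latest > image.tar
$ mkdir image && tar -xf image.tar -C image
$ config=$(jq -r '.[0].Config' image/manifest.json)
$ vi "image/$config"                 # e.g. remove unwanted entries under .config.Volumes
$ (cd image && tar -cf ../new.tar *)
$ docker load < new.tar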
EDIT:
This has been wrapped into a handy script: https://github.com/gdraheim/docker-copyedit
I had come across the same workaround. Since I have to edit the metadata of some images quite often (fixing an automated image rebuild from a third party), I have created a little script to help with the steps of save/unpack/edit/load.
Have a look at docker-copyedit. It can remove or override volumes as well as set other metadata values like entrypoint and cmd.
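As an illustration of the intended use (the exact invocation is an assumption from memory; check the project's README for the current syntax), removing all volumes from an image looks roughly like this:

$ ./docker-copyedit.py FROM myimage:latest INTO myimage:edited -- REMOVE ALL VOLUMES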
I'm trying to extract files from a tar file on a Windows 7 machine using 7-Zip's 7za.exe on the command line. The file is 700 GB and I only need a specific subdirectory. This should be possible using the following command:
7za x -r test.zip folder\subfolder
Running this on a test file (test.zip) does what is expected, i.e. it extracts all (sub)files from the folder\subfolder in the zip file. However, for the tar file, it doesn't work. I think it is related to a difference in file listings, as exemplified below.
7za l test.zip
produces:
folder
folder\subfolder
folder\subfolder\on_ADJ.png
While
7za l 20150602.tar
produces (excerpt):
.\Corpus
.\Corpus\DOC
.\Corpus\DOC\manual.pdf
By analogy with the first command, I tried the following:
7za x -r 20150602.tar .\Corpus\DOC
However, it doesn't work. Wrapping the path in quotes (".\Corpus\DOC") or dropping the .\ prefix doesn't work either, and 7-Zip produces the following error:
Cannot use absolute pathnames for this command
Am I right that the tar file has absolute paths in it? If so, how could I solve this problem without having to extract the whole file?
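One workaround worth trying, assuming the ".\" prefix is what trips the direct path argument, is 7-Zip's include switch (-i) with a recursive wildcard, which selects entries by pattern instead of by literal name and extracts only the matches:

7za x 20150602.tar -ir!Corpus\DOC\*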
I'm running GSUTIL v3.42 from a Windows CMD script on a Windows server 2008 R2 using Python 2.7.6. Files to be uploaded arrive in an "outgoing" directory and are uploaded in parallel by GSUTIL to an "incoming" bucket. The script requests a listing of the "incoming" bucket after uploading has finished and then compares the files listed with those it attempted to upload, in order to detect any upload failures. Another separate script moves files from the "incoming" bucket to a "processed" bucket afterwards.
If I attempt to upload the identical file (same name/size/content/date etc.) a second time, it doesn't upload, although I get no errors and nothing in my logging to indicate failure. I am not using the "no clobber" option, so I would expect gsutil to just upload the file.
In the scenario below, assume that the file has been successfully uploaded and moved to the "processed" bucket already on that day. In case timings matter, the second upload is being attempted within half an hour of the first.
1. File A arrives in "outgoing" directory.
2. I get a file listing of "outgoing" and write this to dirListing.txt
3. I perform a GSUTIL upload using
   type dirListing.txt | python gsutil -m cp -I -L myGsutilLogFile.txt gs://myIncomingBucket
4. I then perform a GSUTIL listing
   python gsutil ls -l -h gs://myIncomingBucket > bucketListing.txt
5. I file-match dirListing.txt and bucketListing.txt to detect mismatches and hence upload failures.
On the second run, File A isn't being uploaded in step 3 and consequently isn't returned in step 4, causing a mismatch in step 5. [I've checked the content of all of the relevant files and it's definitely in dirListing.txt and not in bucketListing.txt]
I need the ability to re-process a file in case the separate script that moves the file from the "incoming" to the "processed" bucket fails for some reason or doesn't do what it should do. I have to upload in parallel because there are normally hundreds of files on each run.
Is what I've described above expected behaviour from GSUTIL? (I haven't seen anything in the documentation that suggests this) If so, is there any way of forcing GSUTIL to re-attempt the upload? Or am I missing something obvious, please? I have debug output from GSUTIL if that's necessary/useful.
From the above, it looks like you're uploading using "-L" to log to a manifest file. If you're using the same manifest file, and the file has already been uploaded once, then gsutil will not try to re-upload the file. From the docs on "-L" in "gsutil help cp":
If the log file already exists, gsutil will use the file as an
input to the copy process, and will also append log items to the
existing file. Files/objects that are marked in the existing log
file as having been successfully copied (or skipped) will be
ignored.
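Given that behavior, a minimal fix sketch (file names are from the question) is to start each run with a fresh manifest, so no file is treated as already copied:

del myGsutilLogFile.txt
type dirListing.txt | python gsutil -m cp -I -L myGsutilLogFile.txt gs://myIncomingBucket

If you want to keep the history for auditing, rename the old log with a timestamp before each run instead of deleting it.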