How are docker layer directories named? - docker

The spec that describes how a Docker image is structured says that there is a directory for each layer. It states:
There is a directory for each layer in the image. Each directory is named with a 64 character hex name that is deterministically generated from the layer information. These names are not necessarily layer DiffIDs or ChainIDs.
Somewhat frustratingly, they don't say how those names are derived from the layer information, although this might appear elsewhere in the document (though looking over it I've been unable to find it).
What is the algorithm to derive the layer directory names? Perhaps the name doesn't matter as long as it's deterministically chosen?

Related

transfer learning practice for distinguishing cat and dog

I'm trying to practice transfer learning myself.
I'm trying to count the cat and dog files (12,500 pictures each, 25,000 pictures in total).
Here is my code.
And here is the path to my picture folder.
I thought this was simple code, but I still can't figure out why I keep getting (0, 0) when the result should be (12500, 12500), one count for the cat files and one for the dog files.
Use os.path.join() inside glob.glob(). Also, if all your images share one extension (say, jpg), you could replace '*.*' with '*.jpg'.
Solution
import os, glob
files = glob.glob(os.path.join(path,'train/*.*'))
As a matter of fact, you might as well just do the following using os library alone, since you are not selecting any particular file extension type.
import os
files = os.listdir(os.path.join(path,'train'))
Some Explanation
The method os.path.join() joins multiple path components into a single path, and it works whether you are on Windows, Mac, or Linux. The path separator is \ on Windows and / on Mac/Linux, so building the path by hand instead of using os.path.join() could produce a path the OS cannot resolve. I would use glob.glob() when I am interested in specific types (extensions) of files, but glob.glob(path) needs a valid path to work with. In my solution, os.path.join() builds that path from its components and feeds it into glob.glob().
For more clarity, I suggest you see documentation for os.path.join and glob.glob.
Also, see pathlib module for path manipulation as an alternative to os.path.join().
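As a rough illustration of the pathlib alternative, the sketch below counts files per class. The question's code and folder layout are not shown, so the `train` folder and the `cat`/`dog` file-name prefixes (for example `cat.0.jpg`) are assumptions, and `count_by_prefix` is a hypothetical helper name.

```python
from pathlib import Path


def count_by_prefix(train_dir, prefixes=("cat", "dog")):
    """Count the files in train_dir whose names start with each prefix."""
    train_dir = Path(train_dir)
    if not train_dir.is_dir():
        # Fail loudly: a wrong path would otherwise just produce zero matches.
        raise FileNotFoundError("not a directory: %s" % train_dir)
    counts = {}
    for prefix in prefixes:
        # Assumed naming scheme: files look like cat.0.jpg, dog.17.jpg, ...
        counts[prefix] = sum(
            1 for p in train_dir.glob(prefix + "*") if p.is_file()
        )
    return counts
```

Calling `count_by_prefix(Path(path) / "train")` returns a dictionary keyed by prefix. The explicit directory check is there because a mistyped path is a common way to end up with a count of zero for every class.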

What does it mean sha256 in Docker, where are directories of layers of image?

I am new to Docker and have a theoretical question. We know that Docker uses AUFS, a layered filesystem, by default. Where under /var/lib/docker can I find the folder for each layer? I would like to see them.
And a second thing:
What is sha256? I know that it is some kind of hash, but what does it mean in Docker?
You can see more at "Docker and AUFS in practice"
This diagram shows that each image layer, and the container layer, is represented in the Docker hosts filesystem as a directory under /var/lib/docker/.
The union mount point provides the unified view of all layers.
As of Docker 1.10, image layer IDs do not correspond to the names of the directories that contain their data.
As I mentioned before:
the V2 API does not deal in Image IDs. Rather, it uses digests to identify layers, which can be calculated as property of the layer and are independently verifiable.
See "Docker Registry HTTP API V2":
This API design is driven heavily by content addressability.
The core of this design is the concept of a content addressable identifier.
It uniquely identifies content by taking a collision-resistant hash of the bytes. Such an identifier can be independently calculated and verified by selection of a common algorithm.
If such an identifier can be communicated in a secure manner, one can retrieve the content from an insecure source, calculate it independently and be certain that the correct content was obtained.
Put simply, the identifier is a property of the content.
To disambiguate from other concepts, we call this identifier a digest.
A digest is a serialized hash result, consisting of a algorithm and hex portion. The algorithm identifies the methodology used to calculate the digest. The hex portion is the hex-encoded result of the hash.
We define a digest string to match the following grammar:
digest := algorithm ":" hex
algorithm := /[A-Fa-f0-9_+.-]+/
hex := /[A-Fa-f0-9]+/
Some examples of digests include the following:
digest: sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b
description: Common sha256 based digest
While the algorithm does allow one to implement a wide variety of algorithms, compliant implementations should use sha256.
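The idea that a digest is a property of the content can be sketched in a few lines of Python. This is only an illustration, not Docker's code: `parse_digest`, `make_digest` and `verify` are hypothetical helper names. One deliberate deviation: the algorithm character class is widened to A-Za-z here, because the class as quoted in the grammar would not match the letters s and h in `sha256`.

```python
import hashlib
import re

# Assumption: algorithm class widened to A-Za-z so that "sha256" matches.
DIGEST_RE = re.compile(r"^([A-Za-z0-9_+.-]+):([A-Fa-f0-9]+)$")


def parse_digest(digest):
    """Split 'algorithm:hex' into its two parts, or raise ValueError."""
    match = DIGEST_RE.match(digest)
    if match is None:
        raise ValueError("not a valid digest string: %r" % digest)
    return match.group(1), match.group(2)


def make_digest(content):
    """Compute the sha256 digest string for a bytes object."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify(content, digest):
    """Return True if content hashes to the given sha256 digest."""
    algorithm, hex_part = parse_digest(digest)
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256")
    return hashlib.sha256(content).hexdigest() == hex_part.lower()
```

Anyone holding the bytes can recompute the digest and compare, which is why content fetched from an untrusted source can still be checked against a digest obtained securely.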

Caffe mean file creation without database

I run Caffe using an image_data_layer and don't want to create an LMDB or LevelDB for the data, but the compute_image_mean tool only works with LMDB/LevelDB databases.
Is there a simple solution for creating a mean file from a list of files (the same format that image_data_layer is using)?
You may notice that recent models (e.g., GoogLeNet) do not use a mean file the same size as the input image, but rather a 3-vector representing a mean value per image channel. These values are quite "immune" to the specific dataset used (as long as it is large enough and contains "natural images").
So, as long as you are working with natural images, you may use the same values that, e.g., GoogLeNet uses: B=104, G=117, R=123.
The simplest solution is to create a LMDB or LevelDB database of the image set.
The more complicated solution is to write a tool similar to compute_image_mean that takes image files as input, applies the transformations, and computes the mean.
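A minimal sketch of such a tool is shown below, under several assumptions. It computes a per-channel mean (a 3-vector) rather than the full mean image that compute_image_mean produces, it expects the list file to hold one `path label` entry per line, and `load_pixels` is a hypothetical callback that you would implement with whatever image library you use.

```python
def read_image_list(list_path):
    """Yield image paths from a 'path label' list file, one entry per line."""
    with open(list_path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                # The label is the last whitespace-separated token.
                yield line.rsplit(None, 1)[0]


def channel_means(list_path, load_pixels):
    """Return the mean of each channel over every pixel of every image.

    load_pixels is a hypothetical callback: it takes a path and returns an
    iterable of (b, g, r) tuples for that image.
    """
    totals = [0.0, 0.0, 0.0]
    n_pixels = 0
    for path in read_image_list(list_path):
        for pixel in load_pixels(path):
            for c in range(3):
                totals[c] += pixel[c]
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("no pixels found")
    return [t / n_pixels for t in totals]
```

Because the loader is passed in, the averaging logic can be tested without any image files. Any resizing or cropping done by the data layer would have to happen inside the loader.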

Preparing image dataset for input into Caffe deep learning

I know the first step is to create two file lists with the corresponding labels, one for the training and one for the test set. Suppose the former is called train.txt and the latter val.txt. The paths in these file lists should be relative. The labels should start at 0 and look similar to this:
relative/path/img1.jpg 0
relative/path/img2.jpg 0
relative/path/img3.jpg 1
relative/path/img4.jpg 1
relative/path/img5.jpg 2
For each of these two sets, we will create a separate LevelDB. Is this formatted as a text file? I thought I would create a directory with several subdirectories for each of my classes. Do I manually have to create a text file?
Please see this tutorial on how to use convert_imageset to build LevelDB or LMDB datasets for Caffe training.
As you can see from these instructions, it does not matter how you arrange the image files on your disk (same folder, different folders, ...) as long as the paths in your 'train.txt'/'val.txt' files are correct relative to the '/path/to/jpegs/' argument. But if you want to use the convert_imageset tool, you'll have to create a text file listing all the images you want to use.
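If the images are already sorted into one subdirectory per class, the text file does not have to be written by hand. The sketch below is one possible way to generate it; `build_image_list` is a hypothetical helper, and the one-folder-per-class layout and the extension list are assumptions. Labels are assigned from the sorted folder names, starting at 0.

```python
import os


def build_image_list(root, out_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write 'relative/path label' lines for the images under root/<class>/."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    with open(out_path, "w") as out:
        for label, cls in enumerate(classes):  # labels start at 0
            for name in sorted(os.listdir(os.path.join(root, cls))):
                if name.lower().endswith(extensions):
                    # Paths are written relative to root, with forward slashes.
                    out.write("%s/%s %d\n" % (cls, name, label))
    return classes
```

The returned list gives the class name for each label index. The root folder passed here would then be the image root argument given to convert_imageset.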

Finding what hard drive sectors occupy a file

I'm looking for a nice, easy way to find which sectors a given file occupies. My language preference is C#.
In my A-Level Computing class I was taught that a hard drive has a lookup table in the first few KB of the disk. In this table there is a linked list for each file detailing which sectors that file occupies. So I'm hoping there's a convenient way to look in this table for a certain file and see which sectors it occupies.
I have tried Googling but I am finding nothing useful. Maybe I'm not searching for the right thing, but I can't find anything at all.
Any help is appreciated, thanks.
About Drives
The physical geometry of modern hard drives is no longer directly accessible by the operating system. Early hard drives were simple enough that it was possible to address them according to their physical structure: cylinder-head-sector. Modern drives are much more complex and use systems like zone bit recording, in which not all tracks have the same number of sectors. It's no longer practical to address them according to their physical geometry.
From the fdisk man page:
If possible, fdisk will obtain the disk geometry automatically. This is not necessarily the physical disk geometry (indeed, modern disks do not really have anything like a physical geometry, certainly not something that can be described in simplistic Cylinders/Heads/Sectors form)
To get around this problem, modern drives are addressed using Logical Block Addressing (LBA), which is what the operating system knows about. LBA is an addressing scheme in which the entire disk is represented as a linear set of blocks, each block being a uniform number of bytes (usually 512 or larger).
About Files
In order to understand where a "file" is located on a disk (at the LBA level) you will need to understand what a file is. This is going to be dependent on what file system you are using. In Unix style file systems there is a structure called an inode which describes a file. The inode stores all the attributes a file has and points to the LBA location of the actual data.
Ubuntu Example
Here's an example of finding the LBA location of file data.
First get your file's inode number
$ ls -i
659908 test.txt
Run the file system debugger. "yourPartition" will be something like sda1, it is the partition that your file system is located on.
$ sudo debugfs /dev/yourPartition
debugfs: stat <659908>
Inode: 659908 Type: regular Mode: 0644 Flags: 0x80000
Generation: 3039230668 Version: 0x00000000:00000001
...
...
Size of extra inode fields: 28
EXTENTS:
(0): 266301
The number under "EXTENTS", 266301, is the logical block in the file system where your file is located. If your file is large, multiple blocks will be listed. There's probably an easier way to get that number, but I couldn't find one.
To validate that we have the right block use dd to read that block off the disk. To find out your file system block size, use dumpe2fs.
dumpe2fs -h /dev/yourPartition | grep "Block size"
Then put your block size in the ibs= parameter, and the extent logical block in the skip= parameter, and run dd like this:
sudo dd if=/dev/yourPartition of=success.txt ibs=4096 count=1 skip=266301
success.txt should now contain the original file's contents.
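The dd step boils down to seeking to block number times block size and reading one block. A minimal Python sketch of the same arithmetic follows; `read_block` is a hypothetical helper, and reading a real partition this way would need root privileges, just as the dd command does.

```python
def read_block(device_path, block_number, block_size=4096):
    """Return the bytes of one block, like dd with skip=block_number count=1."""
    with open(device_path, "rb") as dev:
        # Byte offset of a block = block index * block size.
        dev.seek(block_number * block_size)
        return dev.read(block_size)
```

With the values from the example above, `read_block("/dev/yourPartition", 266301, 4096)` would read the same 4096 bytes that dd copies into success.txt.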
sudo hdparm --fibmap file
This works for ext, vfat, and NTFS, and maybe more.
FIBMAP is also a Linux ioctl that can be called from C.
