Docker registry - garbage collection doesn't actually remove anything - docker

I feel like I'm missing some critical bit about running a Docker registry and removing images from said registry. On an internal-use Docker registry I'm able to mark images as deleted via API calls, so from the user's perspective they're gone. However, they still remain in the actual registry itself, taking up a lot of space.
I've run garbage collection, but this seems to only MARK the blobs for deletion without actually removing anything.
(...)
win-core-vs2019
win-core-vs2022
win-core-vs2022: marking manifest sha256:b540cf77e0441517844513eb1b9988c33bc02e8ea5a080eea7a0d236e17db11e
win-core-vs2022: marking blob sha256:e29df5ef74e5f27acbbdaae30e515af0951180e469caec2e4af0b02d1d9e6102
win-core-vs2022: marking blob sha256:1a65b089bc835b0c3700397b1935e97cf469b0891bb4de3942c8dfbe4b672d47
win-core-vs2022: marking blob sha256:dad2b6fc2adc18eb8acfe9910a98a7f780d922fd6369792237facc90f444344f
win-core-vs2022: marking blob sha256:09b3f845cc0794835f45d9ca6a9f9191ee6d0121fef8f35017a55b702e4996d6
win-core-vs2022: marking blob sha256:687fdefae0b4173cc8f04fc97c5e80f19223b841f71216a0fbacb35f5fb265ae
win-core-vs2022: marking blob sha256:3487e620f3d431026bfd97b9f8afc5e20acb087e2f9dacab68246f6b95b827a4
win-core-vs2022: marking blob sha256:8c860b31b8e4212847e80d7bbacbf2c454d472df1aea08fe1de2a7b225d1a74a
win-core-vs2022: marking blob sha256:5a3edecfab50a7a2021b488b9dc348b3fc92ccba888a5867dea07a4453bd1805
win-core-vs2022: marking blob sha256:685e126bf726f4733d308ed7677ec7d7fb241633f0be93f4f55c3d5bb1af722c
win-core-vs2022: marking blob sha256:ba45e6489207a081c1e472ace13269fc7d1511ae031a2f4b9e34c393768e1075
win-core-vs2022: marking blob sha256:c75ae80571eedcb99a7af6e480fb19bc192e58123c3b6a086f91b06facb8735b
win-core-vs2022: marking blob sha256:379c1a65e56c2a49d65c8dfc7aec19d36e20416ec4e3602764cc7bd481c5cc3e
win-core-vs2022: marking blob sha256:49cfee3d1ef654c176a8f1344c9a60ca2078ac939f63ee7e58cf53593be51d59
152 blobs marked, 0 blobs and 0 manifests eligible for deletion
If I run garbage collection again it produces exactly the same output (and I'm NOT using --dry-run either). Am I doing something wrong?
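For reference, deleting a manifest through the Registry HTTP API v2 looks roughly like the minimal Go sketch below; the registry URL and tag are placeholders, the repository name is taken from the garbage-collection output above, and delete must be enabled in the registry config.

// Minimal sketch: mark an image as deleted through the Registry HTTP API v2.
// Registry URL and tag are placeholders.
package main

import (
    "fmt"
    "net/http"
)

func main() {
    registry := "https://registry.example.com" // placeholder
    repo := "win-core-vs2022"                  // repository from the GC output above
    tag := "latest"                            // placeholder

    // Resolve the tag to its content digest (returned in Docker-Content-Digest).
    head, _ := http.NewRequest("HEAD", fmt.Sprintf("%s/v2/%s/manifests/%s", registry, repo, tag), nil)
    head.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")
    resp, err := http.DefaultClient.Do(head)
    if err != nil {
        panic(err)
    }
    resp.Body.Close()
    digest := resp.Header.Get("Docker-Content-Digest")

    // Delete the manifest by digest; a successful delete returns 202 Accepted.
    del, _ := http.NewRequest("DELETE", fmt.Sprintf("%s/v2/%s/manifests/%s", registry, repo, digest), nil)
    delResp, err := http.DefaultClient.Do(del)
    if err != nil {
        panic(err)
    }
    delResp.Body.Close()
    fmt.Println("delete status:", delResp.Status)
}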

Related

Rollback snapshot but run out of space

I have a 1TB zpool and a 700GB volume with one clean snapshot, such as:
zpool1
zpool1/volume1
zpool1/volume1@snap1
After writing 500GB of data into the volume, its written property has grown to 500GB as well.
Then I tried to roll back to the snapshot and got an "out of space" error.
Does ZFS need extra space to roll back a snapshot with a large written value? Or can anyone explain why it fails?
After searching the ZFS source code (dsl_dataset.c), I found that the last part of dsl_dataset_rollback_check() may explain this limit:
/*
 * When we do the clone swap, we will temporarily use more space
 * due to the refreservation (the head will no longer have any
 * unique space, so the entire amount of the refreservation will need
 * to be free). We will immediately destroy the clone, freeing
 * this space, but the freeing happens over many txg's.
 */
unused_refres_delta = (int64_t)MIN(ds->ds_reserved,
    dsl_dataset_phys(ds)->ds_unique_bytes);

if (unused_refres_delta > 0 &&
    unused_refres_delta >
    dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE)) {
        dsl_dataset_rele(ds, FTAG);
        return (SET_ERROR(ENOSPC));
}
So the volume's avail must be at least as large as the smaller of its refreserv and its unique bytes in order to perform the rollback.
Only a thin volume (one without a refreservation) can pass this check.
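Restated outside the ZFS internals, the quoted check refuses the rollback when min(refreservation, unique bytes) exceeds the space available to the dataset. A minimal Go sketch of that arithmetic, using hypothetical byte figures loosely based on the sizes above:

// Sketch of the space check in dsl_dataset_rollback_check(): the rollback is
// refused if min(refreservation, unique bytes) exceeds the available space.
package main

import "fmt"

func rollbackWouldFail(reservedBytes, uniqueBytes, availBytes uint64) bool {
    unusedRefresDelta := reservedBytes
    if uniqueBytes < unusedRefresDelta {
        unusedRefresDelta = uniqueBytes
    }
    return unusedRefresDelta > 0 && unusedRefresDelta > availBytes
}

func main() {
    const gib = uint64(1) << 30
    // Hypothetical figures: 700GB refreservation, ~500GB unique bytes written
    // since the snapshot, ~300GB available in the pool.
    fmt.Println(rollbackWouldFail(700*gib, 500*gib, 300*gib)) // true -> ENOSPC
}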
Rolling back to a snapshot requires a little space (for updating metadata), but this is very small.
From what you’ve described, I would expect nearly anything you write in the same pool / quota group to fail with ENOSPC at this point. If you run zpool list (or zfs list), I bet you’ll see that the pool is almost entirely full, or, if you are using quotas, that you’ve used up whatever quota group applies here. If this is not what you expected, it could be that you’re using mirroring or RAID-Z, which writes redundant data (to allow corruption recovery); you can tell by comparing the physical bytes used against the logical bytes written in zfs list.
Most of the data you added after the snapshot can only be freed once the rollback has completed, not before, so the rollback has to keep that data around until it completes.

How are docker layer directories named?

In the description of how a Docker image is structured (here), the spec shows that there is a directory for each layer. It states:
There is a directory for each layer in the image. Each directory is named with a 64 character hex name that is deterministically generated from the layer information. These names are not necessarily layer DiffIDs or ChainIDs.
Somewhat frustratingly, they don't say how those names are derived from the layer information, although this might appear elsewhere in the document (though looking over it I've been unable to find it).
What is the algorithm to derive the layer directory names? Perhaps the name doesn't matter as long as it's deterministically chosen?
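For reference, the DiffIDs and ChainIDs mentioned in that quote do have a documented derivation in the image spec: ChainID(L0) = DiffID(L0), and ChainID(Ln) = SHA256(ChainID(Ln-1) + " " + DiffID(Ln)). Per the quote, that is not necessarily how these directories are named, but a minimal sketch of the derivation may still be useful for comparison (the DiffIDs below are placeholders):

// Sketch: computing layer ChainIDs from DiffIDs, per the image spec.
// Note: the layer directory names asked about are not necessarily these values.
package main

import (
    "crypto/sha256"
    "fmt"
)

func chainIDs(diffIDs []string) []string {
    ids := make([]string, 0, len(diffIDs))
    chain := ""
    for i, diff := range diffIDs {
        if i == 0 {
            chain = diff // ChainID(L0) = DiffID(L0)
        } else {
            sum := sha256.Sum256([]byte(chain + " " + diff))
            chain = fmt.Sprintf("sha256:%x", sum)
        }
        ids = append(ids, chain)
    }
    return ids
}

func main() {
    // Placeholder DiffIDs, not taken from any real image.
    diffs := []string{
        "sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
        "sha256:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
    }
    for _, id := range chainIDs(diffs) {
        fmt.Println(id)
    }
}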

Is it possible to append bytes to a blob in Azure Blob Storage?

I had a bit of trouble uploading large files (>2GB) with ASP.NET MVC5, but I managed to fix it by splitting the file into packets with jQuery and uploading each packet separately. In my backend I want to upload those packets to Azure Blob Storage. Is there a way to append those bytes to an already existing blob? Most solutions I find on the internet advise downloading the blob, adding the bytes, and re-uploading it, but that seems like a waste of bandwidth since you download and re-upload the file every time.
Try using append blobs. There is a code sample at https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/#writing-to-an-append-blob. From that page:
An append blob is a new type of blob, introduced with version 5.x of the Azure storage client library for .NET. An append blob is optimized for append operations, such as logging. Like a block blob, an append blob is comprised of blocks, but when you add a new block to an append blob, it is always appended to the end of the blob. You cannot update or delete an existing block in an append blob. The block IDs for an append blob are not exposed as they are for a block blob.
Each block in an append blob can be a different size, up to a maximum of 4 MB, and an append blob can include a maximum of 50,000 blocks. The maximum size of an append blob is therefore slightly more than 195 GB (4 MB X 50,000 blocks).
Block blobs additionally support re-composing a new blob from blocks that are already uploaded and committed, combined with new blocks you upload. The order of the blocks in the committed block list can be chosen freely, so you can append new blocks or insert new blocks.
Append blobs are mostly used for applications that continuously add data to an object.
For an overview, check here.
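If the backend can't use the .NET client library, the same append workflow is also available directly over the Blob service REST API: create the blob once with an x-ms-blob-type: AppendBlob header, then PUT each packet with comp=appendblock. Below is a rough Go sketch of that, assuming you already have a SAS URL for the target blob (the URL is a placeholder):

// Sketch: append packets to an Azure append blob via the REST API,
// using a pre-authorized SAS URL (placeholder below).
package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func createAppendBlob(sasURL string) error {
    // Put Blob with x-ms-blob-type: AppendBlob creates an empty append blob.
    req, err := http.NewRequest("PUT", sasURL, nil)
    if err != nil {
        return err
    }
    req.Header.Set("x-ms-blob-type", "AppendBlob")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    resp.Body.Close()
    if resp.StatusCode != http.StatusCreated {
        return fmt.Errorf("create failed: %s", resp.Status)
    }
    return nil
}

func appendPacket(sasURL string, packet []byte) error {
    // Append Block: the SAS URL already has a query string, so add comp=appendblock with '&'.
    req, err := http.NewRequest("PUT", sasURL+"&comp=appendblock", bytes.NewReader(packet))
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    resp.Body.Close()
    if resp.StatusCode != http.StatusCreated {
        return fmt.Errorf("append failed: %s", resp.Status)
    }
    return nil
}

func main() {
    sasURL := "https://account.blob.core.windows.net/container/upload.bin?sv=...&sig=..." // placeholder
    if err := createAppendBlob(sasURL); err != nil {
        panic(err)
    }
    if err := appendPacket(sasURL, []byte("first packet")); err != nil {
        panic(err)
    }
}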

Texture streaming in DirectX11, Immutable vs Dynamic

We often have the case where we need to stream textures to the graphics card (in the game case, terrain; in my case, images from different input sources like cameras/capture cards/videos).
Of course, in the camera case I receive my data in a separate thread, but I still need to upload that data to the GPU for display.
I know two models for this.
Use a dynamic resource:
You create a dynamic texture with the same size and format as your input image. When you receive a new image, you set a flag indicating that an upload is needed, then use Map on the device context to upload the texture data (with eventual double buffering, of course).
The advantage is that you have a single memory location, so you don't fragment memory over time.
The drawback is that you need to upload on the immediate context, so the upload has to happen in your render loop.
Use an immutable resource and load/discard:
In that case you upload in the image-receiving thread by creating a new resource, pushing the data, and discarding the old resource.
The advantage is that the upload should be stall-free (no need for the immediate context; you can still run your command list while the texture is uploading), and the resource can be used with a simple trigger once it is available (to swap the SRV).
The drawback is that you can fragment memory over time by constantly allocating and freeing resources (at 30 fps for a standard camera, for example).
Also, you have to deal with throttling yourself (but that part is not a big deal).
So is there something I missed in those techniques, or is there an even better way to handle this?
These are the two main methods of updating textures in D3D11.
However, the assumption that the first method will not result in memory usage patterns identical to the second case is dependent on the driver, and likely is not true. You would use D3D11_MAP_WRITE_DISCARD if you are overwriting the whole image (which it sounds like what you are doing), meaning that the current contents of the buffer become undefined. However, this is only true from the CPU's point-of-view. They are retained for the GPU, if they are potentially used in a pending draw operation. Most (maybe all?) drivers will actually allocate new storage for the write location of the mapped texture in this case, otherwise command buffer processing would need to stall. The same holds if you do not use the discard flag. Instead, when the map command is processed in the command buffer, the resource's buffer is updated to the value returned from Map in D3D11_MAPPED_SUBRESOURCE.
Also, it is not true that you must update dynamic textures on the immediate context; it is only that if you update them on a deferred context, you must use the D3D11_MAP_WRITE_DISCARD flag. This means you could update the texture on a worker thread, provided you are overwriting the entire texture.
The bottom line is that, since the CPU/GPU system on a PC is not a unified memory system, there will be synchronization issues when updating GPU resources from the CPU.

Golang async face detection

I'm using an OpenCV binding library for Go and trying to asynchronously detect objects in 10 images, but I keep getting a panic. Detecting only 4 images never fails.
var wg sync.WaitGroup
for j := 0; j < 10; j++ {
    wg.Add(1)
    go func(i int) {
        image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
        defer image.Release()
        faces := cascade.DetectObjects(image)
        fmt.Println(len(faces) > 0)
        wg.Done()
    }(j)
}
wg.Wait()
I'm fairly new to OpenCV and Go and trying to figure out where the problem lies. I'm guessing some resource is being exhausted, but which one?
Each time you call DetectObjects the underlying implementation of OpenCV builds a tree of classifiers and stores them inside of cascade. You can see part of the handling of these chunks of memory at https://github.com/Itseez/opencv/blob/master/modules/objdetect/src/haar.cpp line 2002
Your original code only had one cascade, as a global. Each new goroutine calling DetectObjects used the same root cascade. Each new image would free the old memory and rebuild a new tree, and eventually they would stomp on each other's memory and cause a dereference through 0, causing the panic.
Moving the allocation of the cascade inside the goroutine allocates a new one for each DetectObjects call, so they do not share any memory.
The fact that it never happened on 4 images but failed on 5 is the nature of computing. You got lucky with 4 images and never saw the problem; you always saw the problem on 5 images because exactly the same thing happened each time (regardless of concurrency).
Repeating the same image multiple times doesn't cause the cascade tree to be rebuilt. If the image didn't change, why redo the work? It's an optimization in OpenCV to handle multiple image frames.
The problem seemed to be having the cascade as a global variable.
Once I moved
cascade := opencv.LoadHaarClassifierCascade("haarcascade_frontalface_alt.xml")
into the goroutine all was fine.
You are not handling a nil image:
image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
if image == nil {
    // handle error
}
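Putting the two answers together, a minimal sketch of the loop with a per-goroutine cascade and a nil-image check could look like the following. The opencv calls are the ones from the question; the binding's import path is left as a placeholder comment since it depends on which library you use.

package main

import (
    "fmt"
    "strconv"
    "sync"
    // import your OpenCV binding here as "opencv" (path depends on the library you use)
)

func main() {
    var wg sync.WaitGroup
    for j := 0; j < 10; j++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()

            // Allocate a fresh cascade per goroutine so the calls don't share memory.
            cascade := opencv.LoadHaarClassifierCascade("haarcascade_frontalface_alt.xml")

            image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
            if image == nil {
                fmt.Println("could not load image", i)
                return
            }
            defer image.Release()

            faces := cascade.DetectObjects(image)
            fmt.Println(len(faces) > 0)
        }(j)
    }
    wg.Wait()
}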
