I have a self-hosted GitLab CE Omnibus installation (version 11.5.2) running, including the container registry.
Now, the disk space needed to host all those images is increasing quite fast.
As an admin, I want to list all Docker images in this registry, including their size, so I can decide which ones to delete.
Maybe I haven't looked hard enough, but I couldn't find anything in GitLab's Admin Area. Before I go to the trouble of writing a script that untangles the linking between the repositories and blobs directories in /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2 and then aggregates the sizes per repository, I wanted to ask:
Is there some CLI command or even a curl call to the registry to get the information I want?
Update: This answer is deprecated by now. Please see the accepted answer for a solution built into GitLab's Rails console directly.
Original Post:
Thanks to a great comment from @Rekovni, my problem is somewhat solved.
First: The huge amount of disk space used by Docker images was due to a bug in GitLab/Docker Registry. Follow the link in Rekovni's comment below my question.
Second: His link also mentions an experimental tool which is being developed by GitLab. It lists and optionally deletes those old unused Docker layers (related to the bug).
Third: If anyone wants to do their own thing, I hacked together a pretty ugly script which lists the image size for every repo:
#!/usr/bin/env python3
# coding: utf-8

import os
from os.path import join, getsize


def get_human_readable_size(size, precision=2):
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1
        size = size / 1024.0
    return "%.*f%s" % (precision, size, suffixes[suffixIndex])


registry_path = '/var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/'

repos = []
for repo in os.listdir(registry_path + 'repositories'):
    images = os.listdir(registry_path + 'repositories/' + repo)
    for image in images:
        try:
            layers = os.listdir(registry_path + 'repositories/{}/{}/_layers/sha256'.format(repo, image))
            imagesize = 0
            # get image size: sum up the size of every referenced layer
            for layer in layers:
                # get size of layer: blobs live under blobs/sha256/<first two hex chars>/<digest>
                for root, dirs, files in os.walk("{}/blobs/sha256/{}/{}".format(registry_path, layer[:2], layer)):
                    imagesize += sum(getsize(join(root, name)) for name in files)
            repos.append({'group': repo, 'image': image, 'size': imagesize})
        # if the folder doesn't exist, just skip it
        except FileNotFoundError:
            pass

repos.sort(key=lambda k: k['size'], reverse=True)
for repo in repos:
    print("{}/{}: {}".format(repo['group'], repo['image'], get_human_readable_size(repo['size'])))
But please do note that it's really basic: it doesn't list individual tags of an image and doesn't take into account that some layers may be shared by other images. It will, however, give you a rough estimate in case you don't want to use GitLab's tool mentioned above. You may use the ugly script as you like, but I do not take any liability whatsoever.
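If you would rather query the registry over HTTP than walk the filesystem, here is a rough sketch against the Docker Registry v2 API. It is not GitLab-specific: the registry URL and token below are placeholders, and GitLab's registry expects a JWT issued by GitLab itself, so you first need to obtain a token with read access to the registry. The summed numbers are compressed layer sizes, and shared layers are counted once per tag.

# Sketch only: sums the compressed layer sizes reported by the Docker Registry v2 API.
# REGISTRY_URL and TOKEN are placeholders for your own setup.
import requests

REGISTRY_URL = "https://registry.example.com"  # hypothetical registry address
TOKEN = "<bearer token with registry read access>"

headers = {"Authorization": "Bearer " + TOKEN}
manifest_headers = dict(headers)
manifest_headers["Accept"] = "application/vnd.docker.distribution.manifest.v2+json"

# the catalog may be paginated on large registries; this sketch ignores that
catalog = requests.get(REGISTRY_URL + "/v2/_catalog", headers=headers).json()
for repo in catalog.get("repositories", []):
    tags = requests.get("{}/v2/{}/tags/list".format(REGISTRY_URL, repo),
                        headers=headers).json().get("tags") or []
    for tag in tags:
        manifest = requests.get("{}/v2/{}/manifests/{}".format(REGISTRY_URL, repo, tag),
                                headers=manifest_headers).json()
        # sum of compressed layer sizes as reported by the schema2 manifest
        size = sum(layer["size"] for layer in manifest.get("layers", []))
        print("{}:{} ~ {} bytes".format(repo, tag, size))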
The answer above should now be considered deprecated.
As posted in the comments, if your repositories are nested, you will miss projects. Additionally, from experience, it seems to under-count the disk space used by the repositories it does find. It will also skip repositories created with GitLab 14 and up.
I became aware of this by using the GitLab Rails console, which is now documented here: https://docs.gitlab.com/ee/administration/troubleshooting/gitlab_rails_cheat_sheet.html#registry-disk-space-usage-by-project
You can adapt that command to increase the number of projects it will find, as it only looks at the last 100 projects.
I have, I think, a problem with my Prosody configuration. When I send files (for example photos) larger than about 2 or 3 megabytes (as I established experimentally) using Conversations 2.x (an Android IM app), it transfers these files over a peer-to-peer connection instead of uploading them to the server and sending a link to my interlocutor. Small files transfer fine using HTTP upload, and I couldn't find a reason for this behavior.
Here are the lines for the http_upload module from my config, which I took from the official documentation (where I found no setting for turning off peer-to-peer file transfer):
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
http_upload_expire_after = 604800 -- 60 * 60 * 24 * 7
http_upload_quota = 10737418240 -- 10 GB
http_upload_path = "/var/lib/prosody"
And this is my full config: https://pastebin.com/V6DNYrhe
Small files are transferred well using http upload. And I couldn't
find a reason for such behavior.
TL;DR: You put options in the wrong place. The default 1MB limit
applies. This is advertised to clients so they know about it and can use
more efficient p2p transfer methods for very large files.
http_upload_path = "/var/lib/prosody"
This line makes Prosody's data directory public, allowing anyone easy
access to all user data. You really don't want to do that. You are
lucky you did not put that in the correct section.
And this is my full config: https://pastebin.com/V6DNYrhe
"http_upload" is in the global modules_enabled list which will load
it onto all VirtualHost(s).
You have added options to the end of the config file, putting them under
a Component section. That makes those options only apply to that
Component.
Thus, the VirtualHost where mod_http_upload is loaded sees no options
set and will use the defaults.
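To make this concrete, here is a minimal sketch of where the options would have to live so that the VirtualHost actually picks them up (the host names and the Component are placeholders for your own config):

-- global section: everything above the first VirtualHost/Component line,
-- acting as the default for all hosts
http_upload_file_size_limit = 10485760 -- 10 MB; see the note on the 10M cap below

VirtualHost "example.com"
    -- "http_upload" from the global modules_enabled list runs on this host
    -- and now sees the option above

Component "something.example.com" "some_component"
    -- options placed down here apply to this Component only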
http_upload_file_size_limit = 536870912 -- 512 MB in bytes
Don't do this. Prosody's built-in HTTP server is not optimized for very
large uploads. There is a safety limit on HTTP request size that will
cap the HTTP upload size limit at 10M to prevent DoS attacks.
While that limit can be changed, I would strongly suggest you look at
https://modules.prosody.im/mod_http_upload_external.html instead.
We are trying to use Dask to clean up some data as part of an ETL process.
The original file is a CSV of over 3 GB.
When we run the code on a 1 GB subset, it runs successfully (with a few user warnings regarding our cleaning procedures), such as:
ddf[id1] = ddf[id1].str.extract(r'(\d+)')
repeater = re.compile(r'((\d)\2{5,})')
mask_repeater = ddf[id1].str.contains(repeater, regex=True)
ddf = ddf[~mask_repeater]
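For context, here is a self-contained sketch of those same cleaning steps with the imports they need; the file name 'data.csv' and the column name 'id1' are placeholders standing in for our real data:

# Sketch of the cleaning steps from the snippet above; names are placeholders.
import re
import dask.dataframe as dd

ddf = dd.read_csv('data.csv', dtype=str)   # hypothetical input file

id1 = 'id1'                                # hypothetical column name
# keep only the first run of digits in the column
ddf[id1] = ddf[id1].str.extract(r'(\d+)', expand=False)

# drop rows where the id is a single digit repeated six or more times
repeater = re.compile(r'((\d)\2{5,})')
mask_repeater = ddf[id1].str.contains(repeater, regex=True, na=False)  # na=False keeps the mask boolean
ddf = ddf[~mask_repeater]

ddf = ddf.drop_duplicates(subset=[id1])
print(ddf.compute().head())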
On the 3 GB file, the process nearly completes (there is only one task left, drop-duplicates-agg) and then restarts from the middle (that is what I can see from the Bokeh status page). We also see the same warning that appears when the script starts to run:
RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to '127.0.0.1'...
I'm running on an offline, single Windows 64-bit workstation with 24 cores.
Any suggestions?
I want to do some I/O operations (reading files, listing folder contents, etc.) on a shared/network folder (\\JoulePC\Test\). If the folder is offline, the program freezes for quite a while EVERY TIME I try to access it.
What I need to build is something like this:
function DriveIsOnline(Path: string): Boolean;
The function should return an answer quickly (under 1 second). I would call DriveIsOnline before performing any I/O operations on that remote folder.
The API function GetDriveType returns 1 (which means 'The root path is invalid') if the drive is offline. Would it be logically correct to treat this return value (1) as an indication that the drive is offline?
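To illustrate the idea (in Python via ctypes, not in the Delphi-style declaration above), here is a minimal sketch that calls GetDriveTypeW in a worker thread so the check can be abandoned after a timeout; the UNC path is just the example from the question:

# Sketch only: Windows-specific, calls Win32 GetDriveTypeW via ctypes and gives up
# after a timeout so a dead network path cannot block the caller.
import ctypes
import threading

DRIVE_NO_ROOT_DIR = 1  # "The root path is invalid"

def drive_is_online(root_path, timeout=1.0):
    result = {}

    def worker():
        # GetDriveTypeW expects the root of the drive or share, with a trailing backslash
        result['type'] = ctypes.windll.kernel32.GetDriveTypeW(root_path)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    # timed out, or no valid root found -> treat the path as offline
    if t.is_alive() or result.get('type') == DRIVE_NO_ROOT_DIR:
        return False
    return True

print(drive_is_online('\\\\JoulePC\\Test\\'))

If the call itself hangs, the daemon thread is simply abandoned. Note that a return value of 1 (DRIVE_NO_ROOT_DIR) only tells you the root path is invalid, which can have causes other than the share being offline.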
I want to load a text file into the Session.
The file size is about 50KB ~ 100KB.
When the user triggers the function on my page, it will create the Session.
My server's RAM is about 8 GB, and the maximum number of users is about 100.
There is a script running in the background to collect IP and MAC addresses on the LAN.
The script continuously writes data into a text file.
At the same time, the webpage uses Ajax to fetch fresh data from the text file and display it on the page.
Is it suitable to use the Session to keep the result, or is there a better way to achieve this?
Thanks ~
The Python script collects the data on the LAN over 1~3 minutes (background job).
To avoid blocking for 1~3 minutes, I will use Ajax to fetch the data in the text file (continuously appended to by the Python script) and show it on the page.
And my users should carry the information across pages, so I want to store the data in the Session.
00:02:D1:19:AA:50: 172.19.13.39
00:02:D1:13:E8:10: 172.19.12.40
00:02:D1:13:EB:06: 172.19.1.83
C8:9C:DC:6F:41:CD: 172.19.12.73
C8:9C:DC:A4:FC:07: 172.19.12.21
00:02:D1:19:9B:72: 172.19.13.130
00:02:D1:13:EB:04: 172.19.13.40
00:02:D1:15:E1:58: 172.19.12.37
00:02:D1:22:7A:4D: 172.19.11.84
00:02:D1:24:E7:0F: 172.19.1.79
00:FD:83:71:00:10: 172.19.11.45
00:02:D1:24:E7:0D: 172.19.1.77
00:02:D1:81:00:02: 172.19.11.58
00:02:D1:24:36:35: 172.19.11.226
00:02:D1:1E:18:CA: 172.19.12.45
00:02:D1:0D:C5:A8: 172.19.1.45
74:27:EA:29:80:3E: 172.19.12.62
Why does this need to be stored in the browser? Couldn't you fire off what you're collecting to a data store somewhere?
Anyway, assuming you HAD to do this, and the example you gave is pretty close to the data you'll actually be seeing, you have a lot of redundant data there. You could save space for the IPs by creating a hash pointing to each successive value, i.e.
{172 => {19 => {13 => [39], 12 => [40, 73, 21], 1 => [83]}}} ...etc. Similarly for the MAC addresses. But again, you can probably simplify this problem a LOT by storing the info you need somewhere other than the session.
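For illustration, here is a small Python sketch of that nested-prefix idea, parsing "MAC: IP" lines like the sample above; the resulting structure mirrors the hash shown, and whether the saving is worth the extra complexity is a separate question:

# Sketch: group IPs by octet so repeated prefixes such as 172.19.x.x are stored once.
def group_ips(lines):
    tree = {}
    for line in lines:
        mac, ip = line.rsplit(': ', 1)  # the IP follows the last ": "
        a, b, c, d = ip.split('.')
        tree.setdefault(a, {}).setdefault(b, {}).setdefault(c, []).append(int(d))
    return tree

sample = [
    "00:02:D1:19:AA:50: 172.19.13.39",
    "00:02:D1:13:E8:10: 172.19.12.40",
    "C8:9C:DC:6F:41:CD: 172.19.12.73",
]
print(group_ips(sample))
# {'172': {'19': {'13': [39], '12': [40, 73]}}}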