GitLab: How to list registry containers with size

I have a self-hosted GitLab CE Omnibus installation (version 11.5.2) running, including the container registry.
Now, the disk space needed to host all those containers increases quite fast.
As an admin, I want to list all Docker images in this registry including their size, so I can decide which ones might be deleted.
Maybe I haven't looked hard enough, but I couldn't find anything in the Admin Panel of GitLab. Before I go to the trouble of writing a script that untangles the weird linking between the repositories and blobs directories in /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2 and then aggregates the sizes per repository, I wanted to ask:
Is there some CLI command or even a curl call to the registry to get the information I want?

Update: This answer is now deprecated. Please see the accepted answer for a solution built directly into GitLab's Rails console.
Original Post:
Thanks to a great comment from @Rekovni, my problem is somewhat solved.
First: the huge amount of disk space used by Docker images was due to a bug in the GitLab/Docker registry. Follow the link in Rekovni's comment below my question.
Second: his link also mentions an experimental tool being developed by GitLab, which lists and optionally deletes those old unused Docker layers (related to the bug).
Third: if anyone wants to do their own thing, I hacked together a pretty ugly script which lists the image size for every repo:
#!/usr/bin/env python3
# coding: utf-8
import os
from os.path import join, getsize

def get_human_readable_size(size, precision=2):
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1
        size = size / 1024.0
    return "%.*f%s" % (precision, size, suffixes[suffixIndex])

registry_path = '/var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/'
repos = []
for repo in os.listdir(registry_path + 'repositories'):
    images = os.listdir(registry_path + 'repositories/' + repo)
    for image in images:
        try:
            layers = os.listdir(registry_path + 'repositories/{}/{}/_layers/sha256'.format(repo, image))
            imagesize = 0
            # sum the sizes of all blobs referenced by this image's layers
            for layer in layers:
                for root, dirs, files in os.walk("{}blobs/sha256/{}/{}".format(registry_path, layer[:2], layer)):
                    imagesize += sum(getsize(join(root, name)) for name in files)
            repos.append({'group': repo, 'image': image, 'size': imagesize})
        # if the folder doesn't exist, just skip it
        except FileNotFoundError:
            pass

repos.sort(key=lambda k: k['size'], reverse=True)
for repo in repos:
    print("{}/{}: {}".format(repo['group'], repo['image'], get_human_readable_size(repo['size'])))
But please do note that it's really static: it doesn't list specific tags for an image, and it doesn't take into account that some layers may be shared by other images. It will, however, give you a rough estimate in case you don't want to use GitLab's tool mentioned above. Use the ugly script as you like, but I do not take any liability whatsoever.

The currently accepted answer should now be marked as deprecated.
As posted in the comments, if your repositories are nested, the script will miss projects. From experience, it also seems to under-count the disk space used by the repositories it does find, and it will skip repositories created with GitLab 14 and up.
I was made aware of this by using the GitLab Rails console that is now available: https://docs.gitlab.com/ee/administration/troubleshooting/gitlab_rails_cheat_sheet.html#registry-disk-space-usage-by-project
You can adapt that command to increase the number of projects it inspects, as it only looks at the last 100 projects.
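For completeness, if you would rather do the "curl call to the registry" from the original question, the registry's v2 HTTP API can list repositories, tags, and per-layer sizes. Below is a rough Python sketch under stated assumptions: the registry is reachable at the placeholder REGISTRY_URL, and authentication is already handled (a real GitLab registry requires a bearer token).

import requests

REGISTRY_URL = "https://registry.example.com"  # placeholder
HEADERS = {"Accept": "application/vnd.docker.distribution.manifest.v2+json"}

repos = requests.get(REGISTRY_URL + "/v2/_catalog", headers=HEADERS).json()["repositories"]
for repo in repos:
    tags = requests.get("{}/v2/{}/tags/list".format(REGISTRY_URL, repo),
                        headers=HEADERS).json().get("tags") or []
    for tag in tags:
        manifest = requests.get("{}/v2/{}/manifests/{}".format(REGISTRY_URL, repo, tag),
                                headers=HEADERS).json()
        # schema v2 manifests report a size per layer; shared layers are
        # counted once per image, so this over-counts actual disk usage
        size = sum(layer["size"] for layer in manifest.get("layers", []))
        print("{}:{} {} bytes".format(repo, tag, size))

Like the filesystem script in the deprecated answer, this ignores layer sharing between images, so treat the numbers as an upper bound.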

Related

Rospy message_filter ApproximateTimeSynchronizer issue

I installed ROS Melodic on Ubuntu 18.04.
I'm running a rosbag in the background to mock camera messages on rostopics.
I set the camera names in rosparams and iterate through them to capture each camera's topics.
I'm using message_filters' ApproximateTimeSynchronizer to get time-synchronized data, as mentioned in the official documentation:
http://wiki.ros.org/message_filters
But most of the time the callback registered with the ApproximateTimeSynchronizer is not called, or is delayed. The code snippet I'm using is given below. What am I doing wrong here?
import rospy
import message_filters
from sensor_msgs.msg import CameraInfo, Image, PointCloud2

def camera_callback(*args):
    pass  # Other logic comes here

rospy.init_node('my_listener', anonymous=True)
camera_object_data = []
for camera_name in rospy.get_param('/my/cameras'):
    camera_object_data.append(message_filters.Subscriber(
        '/{}/hd/camera_info'.format(camera_name), CameraInfo))
    camera_object_data.append(message_filters.Subscriber(
        '/{}/hd/image_color_rect'.format(camera_name), Image))
    camera_object_data.append(message_filters.Subscriber(
        '/{}/qhd/image_depth_rect'.format(camera_name), Image))
    camera_object_data.append(message_filters.Subscriber(
        '/{}/qhd/points'.format(camera_name), PointCloud2))

topic_list = [filter_obj for filter_obj in camera_object_data]
ts = message_filters.ApproximateTimeSynchronizer(topic_list, 10, 1, allow_headerless=True)
ts.registerCallback(camera_callback)
rospy.spin()
Looking at your code, it seems correct. The trouble may, however, be bad timestamps, and therefore this synchronizer; see http://wiki.ros.org/message_filters/ApproximateTime for the algorithm's assumptions.
My recommendation is to write a corresponding node that publishes empty versions of these four msgs all at the same time. If it's still not working in this perfect scenario, there is an issue with the code above. If it is working just fine, then you need to pay attention to the headers.
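As a minimal sketch of such a test publisher, assuming a single hypothetical camera named cam0 and the same topic pattern as in the question:

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import CameraInfo, Image, PointCloud2

rospy.init_node('sync_test_publisher')
pubs = [rospy.Publisher('/cam0/hd/camera_info', CameraInfo, queue_size=1),
        rospy.Publisher('/cam0/hd/image_color_rect', Image, queue_size=1),
        rospy.Publisher('/cam0/qhd/image_depth_rect', Image, queue_size=1),
        rospy.Publisher('/cam0/qhd/points', PointCloud2, queue_size=1)]
msgs = [CameraInfo(), Image(), Image(), PointCloud2()]
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    stamp = rospy.Time.now()  # identical stamp on all four messages
    for pub, msg in zip(pubs, msgs):
        msg.header.stamp = stamp
        pub.publish(msg)
    rate.sleep()

If the callback fires with these perfectly aligned stamps, your synchronizer setup is fine and the bag's timestamps deserve the scrutiny.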
Given that you have it as a bag file, you can step through the msgs on the command line and observe the timestamps as well. (You can also step through them within Python.)
$ rosbag play --pause recorded1.bag # step through msgs by pressing 's'
On time-noisy msgs with small payloads, I've just written a node that listens to all these msgs and republishes them all with the latest time found on any of them (for synced logging to CSV). Not optimal, but it should reveal where the issue lies.
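A sketch of that idea for two of the topics (extending it to all four is mechanical; topic names are again placeholders):

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import CameraInfo, Image

rospy.init_node('time_normalizer')
pub_info = rospy.Publisher('/cam0/hd/camera_info_sync', CameraInfo, queue_size=1)
pub_img = rospy.Publisher('/cam0/hd/image_color_rect_sync', Image, queue_size=1)
latest = {}
rospy.Subscriber('/cam0/hd/camera_info', CameraInfo, lambda m: latest.update(info=m))
rospy.Subscriber('/cam0/hd/image_color_rect', Image, lambda m: latest.update(img=m))
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    if 'info' in latest and 'img' in latest:
        # overwrite both stamps with the latest time seen on either message
        stamp = max(latest['info'].header.stamp, latest['img'].header.stamp)
        latest['info'].header.stamp = stamp
        latest['img'].header.stamp = stamp
        pub_info.publish(latest['info'])
        pub_img.publish(latest['img'])
    rate.sleep()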

How to not rebuild artifacts on every invocation

I want to download and build Ruby within a workspace. I've been trying to implement this by mimicking rules_go, and I have that part working. The issue I'm having is that it rebuilds the openssl and ruby artifacts each time ruby_download_sdk is invoked. In the code below, the downloaded artifacts are cached, but the builds of openssl and ruby are always executed.
def ruby_download_sdk(name, version = None):
    # TODO detect os and arch
    os, arch = "osx", "x86_64"
    _ruby_download_sdk(
        name = name,
        version = version,
    )
    _register_toolchains(name, os, arch)

def _ruby_download_sdk_impl(repository_ctx):
    # TODO detect platform
    platform = ("osx", "x86_64")
    _sdk_build_file(repository_ctx, platform)
    _remote_sdk(repository_ctx)

_ruby_download_sdk = repository_rule(
    _ruby_download_sdk_impl,
    attrs = {
        "version": attr.string(),
    },
)

def _remote_sdk(repository_ctx):
    _download_openssl(repository_ctx, version = "1.1.1c")
    _download_ruby(repository_ctx, version = "2.6.3")
    openssl_path, ruby_path = "openssl/build", ""
    _build(repository_ctx, "openssl", openssl_path, ruby_path)
    _build(repository_ctx, "ruby", openssl_path, ruby_path)

def _build(repository_ctx, name, openssl_path, ruby_path):
    script_name = "build-{}.sh".format(name)
    template_name = "build-{}.template".format(name)
    repository_ctx.template(
        script_name,
        Label("@rules_ruby//ruby/private:{}".format(template_name)),
        substitutions = {
            "{ssl_build}": openssl_path,
            "{ruby_build}": ruby_path,
        },
    )
    repository_ctx.report_progress("Building {}".format(name))
    res = repository_ctx.execute(["./" + script_name], timeout = 20 * 60)
    if res.return_code != 0:
        print("res %s" % res.return_code)
        print(" -stdout: %s" % res.stdout)
        print(" -stderr: %s" % res.stderr)
Any advice on how I can make Bazel aware of this, so that it doesn't rebuild these build artifacts every time?
The problem is that Bazel isn't really building your ruby and openssl. When it prepares your build tree and runs the repository rule, it just executes a shell script as instructed; that script happens to build things, but this fact is essentially opaque to Bazel (and it also happens before Bazel itself would even build).
There might be other options, but off the top of my head I see the following:
Pre-build your Ruby environment and provide its results as an external dependency. The obvious downside (which may or may not be quite a lot of pain) is that you need to do so for every platform you need to support (including making sure of correct detection and the corresponding download). The upside is that you really only build once (per platform) and also have control over the tooling used across all hosts. This would likely be my primary choice.
Build ssl and ruby like any other C sources, making them just another Bazel target. This, however, means you'd need to bazelify their builds (describe and maintain a Bazel build of an otherwise Bazel-unaware project).
Continue further along the path you've started and (sort of) leave Bazel out of it. That is, extend the magic in the build scripts: use a deterministic output location and perhaps manifest files recording what is present (which also makes corruption less likely), so the scripts can determine that the build has already taken place and simply collect its previous results, as sketched below.
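For illustration, a very rough Starlark sketch of that third option; the cache location and marker-file convention here are inventions for this example, not anything Bazel or rules_ruby provides:

def _build(repository_ctx, name, openssl_path, ruby_path):
    # deterministic location outside the repository directory (assumption)
    cache_dir = "/var/tmp/rules_ruby_cache/{}".format(name)
    marker = cache_dir + "/.build-complete"
    if repository_ctx.execute(["test", "-f", marker]).return_code == 0:
        # a previous build exists; just collect its results
        repository_ctx.execute(["cp", "-r", cache_dir + "/.", "."])
        return
    # ...otherwise run the template and build script as before, then copy
    # the results into cache_dir and create the marker file

Note that anything living outside Bazel's control like this also lives outside its invalidation logic, so stale caches become your problem to manage.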

Continuous Integration with Blue Ocean, Github and Nuget causes path too long

I'm trying to get a build chain working with Jenkins Blue Ocean, where the sources are on GitHub and additional dependencies come from NuGet.
When I restore packages, I get the following error after the specific package NUnit.Extension.VSProjectLoader.3.7.0:
Errors in packages.config projects
The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
On the agent machine the base path is very short: C:\guinode\. On top of that, additional length is added, making the packages folder the following (MyGitProject stands in for my actual project name; the length is the same):
C:\guinode\workspace\MyGitProject_master-CFRRXMXQEUULVB4YKQOFGB65CQNC4U5VJKTARN2A6TSBK5PBATBA\packages
Checking the package on the agent machine shows that NUnit.Extension.VSProjectLoader.3.7.0 was loaded completely.
Checking a local installation and replacing the first part of the package path, I can find two files that are 260 characters or longer.
They belong to an internal project, so I have a chance of influencing that.
None of the directories are 248 characters or more.
So the immediate solution for me is to redeploy the internal reference package.
My question for future reference is whether I can do something about the packages location, or about workspace\MyGitProject_master-CFRRXMXQEUULVB4YKQOFGB65CQNC4U5VJKTARN2A6TSBK5PBATBA, so that I save some characters by default.
According to the Microsoft documentation, it is possible to work around the 260-character limit.
If you prefix your path with \\?\, e.g. \\?\C:\guinode\workspace..., then long paths can be used (a little more than 32,000 characters). Setting JENKINS_HOME to this kind of path may make all processes use it (I'm not sure).
On recent Windows versions (10 build 1607, Server 2016?) there is a registry option to enable long paths. Set the following value to 1: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled (Type: REG_DWORD) and restart the process.
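For example, from an elevated command prompt (standard reg syntax; restart afterwards):
reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1 /f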

Is it possible to keep my Nix packages in sync across machines not running NixOS?

I know with NixOS, you can simply copy over the configuration.nix file to sync your OS state including installed packages between machines.
Is it possible, then, to do the same using the Nix package manager on a non-NixOS OS to sync only the installed packages?
Please note that, at least since 2017-03-30 (corresponding to the 17.03 Nix/NixOS channel/release), as far as I understand, the official, modern, supported, and suggested solution is to use so-called overlays.
See the chapter titled "Overlays" in the nixpkgs manual for a nice guide on how to use the new approach.
As a short summary: you can put any number of files with a .nix extension in the $HOME/.config/nixpkgs/overlays/ directory. They will be processed in alphabetical order, and each one can modify the set of available Nix packages. Each of the files must be written in the following pattern:
self: super:
{
  boost = super.boost.override {
    python = self.python3;
  };
  rr = super.callPackage ./pkgs/rr {
    stdenv = self.stdenv_32bit;
  };
}
The super set corresponds to the "old" set of packages (before the overlay was applied). If you want to refer to the old version of a package (as with boost above), or use callPackage, you should reference it via super.
The self set corresponds to the eventual, "future" set of packages, representing the final result after all overlays are applied. (Note: don't be scared if using self sometimes gets rejected by Nix because it would result in infinite recursion; in those cases you should probably just use super instead.)
Note: with the above changes, the solution I mention below in my original answer now seems "deprecated". I believe it should still work as of April 2017, but I have no idea for how long; it appears marked as "obsolete" in the nixpkgs repository.
Old answer, before 17.03:
Assuming you want to synchronize apps per-user (as non-NixOS Nix keeps apps visible on a per-user basis, not system-wide, as far as I know), it is possible to do this declaratively. It's just not well advertised in the manual, though it seems quite popular among long-time Nixers!
You must create a text file at $HOME/.nixpkgs/config.nix, e.g.:
$ mkdir -p ~/.nixpkgs
$ $EDITOR ~/.nixpkgs/config.nix
then enter the following contents:
{
  packageOverrides = defaultPkgs: with defaultPkgs; {
    home = with pkgs; buildEnv {
      name = "home";
      paths = [
        nethack mc pstree #...your favourite pkgs here...
      ];
    };
  };
}
Then you should be able to install all listed packages with:
$ nix-env -i home
or:
$ nix-env -iA nixos.home # *much* faster than above
In paths you can list packages in a similar way as in /etc/nixos/configuration.nix on NixOS. Also, home is actually a "fake package" here; you can add more custom package definitions beside it and then include them in your paths.
(Side note: I'm hoping to write a blog post with what I learned on how exactly this works, and also showing how to extend it with more customizations. I'll try to remember to link it here if I succeed.)

Different checksum results for jar files compiled on subsequent build?

I am verifying that the jar files present on remote Unix boxes match those built on my local machine (Windows & Cygwin) with the same JVM.
As a POC, I am trying to verify that consecutive builds on my machine produce jar files with the same checksum. I tried the following:
Generated the jar file first time using ant script
Calculated the checksum (e.g. "xyz abc")
Generated the jar file again with same ant script without changing anything
I got a different checksum but the same byte count (e.g. "xvw abc")
I am not sure how Java's internal processes produce the class files and then the jar files. Can someone please help me understand the following points:
Does the cksum utility of Unix/Cygwin consider the timestamp of the file while computing the value?
Will the checksums of the compiled class files/jar files differ if everything else is kept the same [compiler version + source code + machine + environment]?
Answer to question 1: cksum doesn't consider the timestamp of the archive file itself (e.g. the jar file), but it does consider the timestamps of the files inside the jar, because those are stored as part of the archive's contents.
Answer to question 2: the checksums of the individual class files will be the same with all other things equal (source code, compiler, etc.). The checksums of the jar files will differ. Causes of the differences can be the timestamps of the files inside the jar, or files being put into the archive in a different order (e.g. caused by parallel builds).
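A quick way to confirm this is to compare the CRCs of the individual entries of two jars while ignoring entry metadata. A small Python sketch (the file names build1.jar and build2.jar are placeholders):

import zipfile

def entry_crcs(path):
    # CRC of each entry's contents; entry timestamps are ignored
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}

print(entry_crcs("build1.jar") == entry_crcs("build2.jar"))

If this prints True while cksum over the whole jars differs, the difference lies only in metadata such as timestamps or entry order.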
If you want to create a reproducible build with Gradle, you can do so with the config below:
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}
Maven allows something similar; sorry, I don't know how to do this with Ant.
More info here:
https://dzone.com/articles/reproducible-builds-in-java
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318
