In Artifactory, under Storage in the Monitoring tab, the Binaries size is 1.57 GB, whereas the Artifacts size is just 6.27 MB. What exactly are Binaries in Artifactory, given that they are taking up a lot of storage?
Is it possible to delete these Binaries without affecting the Artifacts?
Binaries size = the sum of the sizes of all the binaries you uploaded.
Artifacts size = the sum of the sizes of all the artifacts stored.
Optimization = the amount of optimization gained from checksum-based storage, which in your case is 25617%.
It appears you are uploading the same binary multiple times to different folders, hence the situation above.
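A minimal sketch of the idea behind checksum-based storage (hypothetical code, not Artifactory's actual implementation): each unique binary is stored once under its checksum, and every artifact path is just a reference to that binary.

```python
import hashlib

# Hypothetical sketch of checksum-based (content-addressable) storage; this is
# not how Artifactory is implemented, it only illustrates the idea.
binary_store = {}    # checksum -> bytes, each unique binary stored once
artifact_index = {}  # repository path -> checksum of the binary it points to

def deploy(path, content):
    checksum = hashlib.sha256(content).hexdigest()
    binary_store.setdefault(checksum, content)  # stored only if not seen before
    artifact_index[path] = checksum

payload = b"x" * 1_000_000
deploy("libs-release/app/1.0/app.jar", payload)
deploy("libs-release/backup/app.jar", payload)  # same bytes, different folder

artifacts_size = sum(len(binary_store[c]) for c in artifact_index.values())
binaries_size = sum(len(b) for b in binary_store.values())
print(artifacts_size, binaries_size)  # the identical upload is stored only once
```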
We have around 80 microservices in our product, and we make quarterly releases of it. In order to identify which version of each service goes into which release, we maintain a manifest file for every release. Our manifest files are growing dramatically and we don't like it.
Is there a better way to maintain this catalog?
Have you used any tool that helps you maintain this application catalog?
I cannot help thinking I am missing the point.
I have a Jenkins system that stores builds in Artifactory Pro, and I would like to end up able to compare builds on the Artifactory server and download a set of files as a zip file.
The output of my build is a few thousand files in a folder structure.
As I understand it, Artifactory uses a de-duplicating file system, so the efficient way to do it is to upload each file individually, as only a few change each build.
When I do this it takes 20 minutes, but the result on the server is good: I can compare builds and see the changes. However, I cannot download the whole release; I have to click on each file and download them one by one.
If I upload a zip file, it is quicker and I can download it, but I lose the ability to see the files inside, and presumably this will eat up disk space as there can be no de-duplication.
Ah, the explode option: this unpacks the zip file on the Artifactory server. Brilliant, except that if I diff builds it just shows me the original archive and says it changed, and I still have to download every file individually.
Has anyone cracked this? I want fast upload, file diffs (with efficient storage) and single-click download.
To download files in a "single click" you can either:
Use the REST API to download a folder as a compressed file, or
Use the JFrog CLI to download files in a single command using file specs (see the rough sketch below).
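A minimal sketch of the REST route, assuming a placeholder base URL, repository and path, and that folder download is enabled on the instance:

```python
import requests

# Hedged example: download a folder as a zip via Artifactory's
# "Retrieve Folder or Repository Archive" REST endpoint.
# Adjust base URL, credentials, repository and path to your setup.
base = "https://artifactory.example.com/artifactory"
url = f"{base}/api/archive/download/generic-builds/myapp/build-123?archiveType=zip"

resp = requests.get(url, auth=("user", "password"), stream=True)
resp.raise_for_status()
with open("build-123.zip", "wb") as out:
    for chunk in resp.iter_content(chunk_size=8192):
        out.write(chunk)
```

For the CLI route, a file spec describing the folder (or a wildcard pattern) given to the JFrog CLI's download command pulls everything in a single invocation.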
HTH,
Yinon
I am making my first steps with Docker repos in Artifactory (5.1.3) and there's something that scares me a little bit.
I pushed different tags of the same Docker image (about 500 MB) to a repo.
I had expected that storage use and the size of the repo would stay at about 500 MB.
But with 5 image versions in it, for example, the repo is about 2.5 GB in size.
Also, the "Max Unique Tags" setting in the local Docker repo settings has no effect: I set 3 but nothing is deleted; there are still 5 versions.
With this behaviour we will easily fill our storage system by the end of the month. Did I miss something, or is this Docker stuff in Artifactory still beta?
Artifactory is physically storing the layers for those tags only once, so the actual storage being used should be ~500MB (deduplication).
The reported size you are seeing in the UI (artifacts count / size) is the amount of storage that would be occupied if each artifact were a physical binary (not just a link). Since deduplication can occur between different repositories, there is no good way of reporting the physical storage size per repository (one image/tag/layer can be shared between multiple repositories).
In the Storage Summary page you can see both the physical storage size used by Artifactory and how much you gained by deduplication.
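If you want the same numbers programmatically, the Storage Summary is also exposed over REST. A rough sketch (GET /api/storageinfo requires admin permissions, and the exact field names may vary between Artifactory versions; base URL and credentials are placeholders):

```python
import requests

# Hedged example: query the Storage Summary over REST.
base = "https://artifactory.example.com/artifactory"
info = requests.get(f"{base}/api/storageinfo", auth=("admin", "password")).json()

summary = info.get("binariesSummary", {})
print("Binaries size (physical):", summary.get("binariesSize"))
print("Artifacts size (logical):", summary.get("artifactsSize"))
print("Optimization:", summary.get("optimization"))
```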
My understanding
As far as I understood artifacts up to now, they were used to know whether my project refers to an unfinished build task's artifacts.
Problem
I tried reading more (on the Jenkins site too) but I'm still not sure I understand what they do. I know that when I promote a build, I can fingerprint artifacts. What does that mean, then?
Artifacts - anything produced during the build process.
Fingerprinting artifacts - recording the MD5 checksum of selected artifacts.
You can then use the MD5 checksum to track a particular artifact back to the build it came from.
Adding to Slav's answer, fingerprinting helps Jenkins keep track of which version of a file is used by which version of a dependency.
Quoting an example and how it works from the Jenkins page:
For example:
Suppose you have the TOP project that depends on the MIDDLE project, which in turn depends on the BOTTOM project.
You are working on the BOTTOM project. The TOP team reported that the bottom.jar they are using causes a NullPointerException, which you (a member of the BOTTOM team) thought you had fixed in BOTTOM #32.
Jenkins can tell you which MIDDLE builds and TOP builds are using (or not using) your bottom.jar #32.
How does it work?
The fingerprint of a file is simply an MD5 checksum. Jenkins maintains a database of md5sums, and for each md5sum, Jenkins records which builds of which projects used it. This database is updated every time a build runs and files are fingerprinted.
To avoid excessive disk usage, Jenkins does not store the actual files. Instead, it just stores the md5sums and their usages. These files can be seen in
$JENKINS_HOME/fingerprints
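As an illustration only (a toy sketch of the concept, not Jenkins' actual code or on-disk format), the database boils down to a map from md5sum to the builds that used that file:

```python
import hashlib
from collections import defaultdict

# Toy sketch of the fingerprint database: md5sum -> builds that used the file.
# Jenkins itself persists these records under $JENKINS_HOME/fingerprints.
fingerprint_db = defaultdict(list)

def record_fingerprint(job, build_number, artifact_path):
    with open(artifact_path, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()
    fingerprint_db[md5].append((job, build_number))
    return md5

# If BOTTOM #32 produces bottom.jar and MIDDLE #7 / TOP #3 consume it,
# looking up the jar's md5sum answers "which builds used this exact file?":
# fingerprint_db[md5] -> [("BOTTOM", 32), ("MIDDLE", 7), ("TOP", 3)]
```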
Is there any way to store metrics of a build in Jenkins, that it is both visible on the build page, and that allows other jobs to later access these metrics and take action based on them?
In my particular case I am running a matrix configuration job. It is performing 25 or so builds. Each build result is archived as an artifact. Each build result has a metric indicating its quality. It is currently stored in a file among the artifacts.
A second job needs to take the build artifact with the best quality metric. Currently it copies the artifacts from all 25 builds for evaluation and deletes everything but the best one. But this takes time, as each build artifact is about 100 MB.
Additionally, it would be nice if the build quality metric was published visually per build in Jenkins.
My current best idea is to first copy only the metric report file artifact from each build, evaluate them, and then somehow copy the complete artifacts only from the best build. Perhaps using the Groovy plugin or something similar.
But I am hoping there is a more integrated solution which also makes the metrics of each build more easily viewable on the build page.
The "Plot plugin" is very nice for visualising metrics, but it does not seem to make the metrics available to other jobs?
(Background for those interested: The matrix build job is performing FPGA Place&Route iterations distributed over a build farm, and the quality metric is the achieved timing margin for each iteration)
EDIT
For clarification, below is an image where I have tried to illustrate the current setup. All jobs run on Jenkins slaves. No jobs run on the Jenkins master; it only holds the artifacts.
As can be seen, a total of 2x25x100 MB = 5 GB of data is copied between the Jenkins slaves and the Jenkins master. This takes a significant amount of time.
As little as 2x100 MB would need to be copied if the metrics could be evaluated earlier.
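To make the "evaluate the metrics first" idea concrete, here is a rough sketch (hypothetical file layout and metric format, independent of any particular Jenkins plugin): copy only the small metric reports, pick the winner, then fetch just that one 100 MB artifact.

```python
from pathlib import Path

# Hypothetical sketch: each matrix build leaves a small report file such as
# reports/build-<n>/timing_margin.txt containing the achieved timing margin.
def best_build(report_dir="reports"):
    margins = {}
    for report in Path(report_dir).glob("build-*/timing_margin.txt"):
        margins[report.parent.name] = float(report.read_text().strip())
    # The build with the largest timing margin is the one worth copying in full.
    return max(margins, key=margins.get)

# print(best_build())  # e.g. "build-17" -> copy only that build's full artifacts
```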