When you run docker images, it shows you which images are locally available, along with other information. Part of this information is the virtual size. What exactly is that?
I found a short explanation in GitHub issue #22 on Docker, but it is still not clear to me. What I really want to know is the number of bytes to be downloaded and how many bytes an image needs on my hard drive.
Additionally, Docker Hub 2.0 shows yet another piece of information. When you look at the Tags page of an image, there is another value shown, and it always seems to be much smaller than the value given by docker images.
The "virtual size" refers to the total sum of the on-disk size of all the layers the image is composed of. For example, if you have two images, app-1 and app-2, and both are based on a common distro image/layer whose total size is 100MB, and app-1 adds an additional 10MB but app-2 adds an additional 20MB, the virtual sizes will be 110MB and 120MB respectively, but the total disk usage will only be 130MB since that base layer is shared between the two.
The transfer size is going to be less (in most cases by quite a bit) due to gzip compression being applied to the layers while in transit.
The extended details provided in https://github.com/docker-library/docs/blob/162cdda0b66dd62ea1cc80a64cb6c369e341adf4/irssi/tag-details.md#irssilatest might make this more concretely obvious. As you can see there, the virtual size (the sum of all the on-disk layer sizes) of irssi:latest is 261.1MB, but the "Content-Length" (the compressed size in transit) is only 97.5MB. That assumes you don't already have any of the layers, and it's fairly likely you already have the first layer, which accounts for 125.1MB of the virtual size and 51.4MB of the "Content-Length" (you likely have it already because that first layer is debian:jessie, a common base for the top-level images).
irssi:latest
Total Virtual Size: 261.1 MB (261122797 bytes)
Total v2 Content-Length: 97.5 MB (97485603 bytes)
Layers (13)
6d1ae97ee388924068b7a4797d995d57d1e6194843e7e2178e592a880bf6c7ad
Created: Fri, 04 Dec 2015 19:27:57 GMT
Docker Version: 1.8.3
Virtual Size: 125.1 MB (125115267 bytes)
v2 Blob: sha256:d4bce7fd68df2e8bb04e317e7cb7899e981159a4da89339e38c8bf30e6c318f0
v2 Content-Length: 51.4 MB (51354256 bytes)
v2 Last-Modified: Fri, 04 Dec 2015 19:45:49 GMT
8b9a99209d5c8f3fc5b4c01573f0508d1ddaa01c4f83c587e03b67497566aab9
...
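If you want to check these numbers on your own machine, here's a rough sketch (using the irssi:latest image from the example above; any locally available image works too). docker history lists the per-layer on-disk sizes that add up to the virtual size, and docker system df shows how much of that is actually shared on disk:

docker pull irssi:latest     # only if you want to reproduce the example image locally
docker history irssi:latest  # per-layer on-disk sizes; their sum is the "virtual size"
docker system df -v          # per-image total size, shared size and unique size on disk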
I'm trying to read an animated gif with ImageMagick. The file in question is available online, located here.
My code (linked with ImageMagick/MagickWand 7) is
#include <stdlib.h>
#include <MagickWand/MagickWand.h>

int main(void) {
    MagickWand *magick_wand;

    MagickWandGenesis();
    magick_wand = NewMagickWand();
    MagickReadImage(magick_wand, "animated.gif");
    return 0;
}
If I run this in the debugger and move to the line right after the image is read, the process is taking up 1.4GB of memory, according to top. I've found animated gifs with similar file sizes, and they don't go anywhere near this amount of memory consumption. Unfortunately, my experience with animated gif processing is very limited, so I'm not sure what's reasonable or not.
I have a few questions: Is this reasonable? Is it a bug? Does anyone know what makes the memory consumption of one file different from another? Is there a way to control the memory consumption of ImageMagick? There's apparently a file called policy.xml which can be used to specify upper memory limits, but I've set it low and still get this behavior.
If you're curious about the larger context behind this question, in real life I'm using a python library called Wand to do this in a CMS web application. If a user uploads this particular file, it causes the OOM killer to kill the app server process (the OOM limit on these machines is set fairly low).
[Update]:
I've been able to get the memory limits in policy.xml to work, but I need to set both the "memory" and "map" values. Setting one low but not the other doesn't work. I'm still curious about the other points.
ImageMagick 6 decompresses the entire image into memory on load and represents each pixel channel as a 16-bit number. This needs a lot of memory! ImageMagick 7 uses floats rather than 16-bit integers, so it'll be twice the size again. Your GIF is 1920 x 1080 RGBA pixels and has 45 frames, so that's 1920 * 1080 * 45 * 4 * 4 bytes, or about 1.4GB.
To save memory, you can get IM to open large images via a temporary disk file. This will be easier on your RAM, but will be a lot slower.
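As a minimal command-line sketch of that idea (the limit values here are illustrative, and animated.gif is the filename from the question): the "memory" and "map" resource limits can also be passed per invocation rather than only through policy.xml, and ImageMagick falls back to a temporary disk file once they are exceeded:

# Illustrative limits only: cap the in-RAM pixel cache ("memory") and the
# memory-mapped cache ("map") so the decoded animation spills to disk instead.
magick identify -limit memory 256MiB -limit map 512MiB animated.gif
# The same caps can be set process-wide with the MAGICK_MEMORY_LIMIT and
# MAGICK_MAP_LIMIT environment variables, or in policy.xml as you found.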
Other image processing libraries can use less memory -- for example libvips can stream images on demand rather than loading them into RAM, and this can give a large saving. With your image and pyvips I see:
$ python3
Python 3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> import os, psutil
>>> process = psutil.Process(os.getpid())
>>> # n=-1 means load all frames, access="sequential" means we want to stream
>>> x = pyvips.Image.new_from_file("huge2.gif", n=-1, access="sequential")
>>> # 50mb total process size after load
>>> process.memory_info().rss
49815552
>>> # compute the average pixel value for the entire animation
>>> x.avg()
101.19390990440672
>>> process.memory_info().rss
90320896
>>> # total memory use is now 90mb
>>>
I start continuous flight recording for my app like this:
java -XX:StartFlightRecording:filename=jfr-logs/ ...
Every time the app is stopped, a new file is generated in the jfr-logs directory:
ls -l jfr-logs
-rw-r--r-- 1 jumar staff 2.2M Dec 13 09:28 hotspot-pid-57471-id-1-2021_12_13_09_28_01.jfr
-rw-r--r-- 1 jumar staff 1.0M Dec 13 09:28 hotspot-pid-57923-id-1-2021_12_13_09_28_19.jfr
I'd like to make sure there are no more than X of these files and/or that they don't consume more than Y MB of disk space.
I run this app in a Docker container and the jfr-logs directory is stored on a persistent volume.
You can set the max size of a recording, but it's more of a minimum size, because it is the size at which JFR will rotate the file, so it might be exceeded by a few MB.
Example:
java -XX:StartFlightRecording:maxsize=50MB,filename=jfr-logs/ ...
By default, JFR uses a maxsize of 250 MB.
There is no way to limit the number of files, but it's possible to set the max chunk size, which is the size at which the chunk file will be rotated, again possibly exceeded by a few MB.
-XX:FlightRecorderOptions:maxchunksize=20MB
The default size is 12 MB, and it is recommended to keep it as is unless there is an issue.
Setting it too low will increase overhead due to additional rotations and less reuse of recurring constants.
Setting it too high might result in additional overhead and memory usage when parsing the file.
All testing of JFR has been done with the default max chunk size.
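Putting the two options together, a combined invocation might look like the sketch below (app.jar stands in for your actual application, and the sizes are placeholders, not recommendations):

java -XX:FlightRecorderOptions:maxchunksize=20MB \
     -XX:StartFlightRecording:maxsize=50MB,filename=jfr-logs/ \
     -jar app.jar
# In recent JDKs the bundled jfr tool can then report what ended up on disk,
# e.g. jfr summary jfr-logs/<recording>.jfr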
My Jenkins instance has been running for over two years without issue, but yesterday it quit responding to HTTP requests. No errors, just endless loading indicators.
I've restarted the service, then restarted the entire server.
There's been a lot of mention of a thread dump. I attempted to get one, but I'm not sure that what I got (below) actually is one.
Heap
PSYoungGen total 663552K, used 244203K [0x00000000d6700000, 0x0000000100000000, 0x0000000100000000)
eden space 646144K, 36% used [0x00000000d6700000,0x00000000e4df5f70,0x00000000fde00000)
from space 17408K, 44% used [0x00000000fef00000,0x00000000ff685060,0x0000000100000000)
to space 17408K, 0% used [0x00000000fde00000,0x00000000fde00000,0x00000000fef00000)
ParOldGen total 194048K, used 85627K [0x0000000083400000, 0x000000008f180000, 0x00000000d6700000)
object space 194048K, 44% used [0x0000000083400000,0x000000008879ee10,0x000000008f180000)
Metaspace used 96605K, capacity 104986K, committed 105108K, reserved 1138688K
class space used 12782K, capacity 14961K, committed 14996K, reserved 1048576K
Ubuntu 16.04.5 LTS
I prefer looking in the Jenkins log file. There you can see errors and then fix them.
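As a sketch of what that can look like (the paths and names assume the standard Ubuntu package install, so adjust to your setup), you can tail the log and take a proper thread dump of the Jenkins JVM while it hangs:

sudo tail -n 200 /var/log/jenkins/jenkins.log                               # typical log location for the Ubuntu package
sudo -u jenkins jstack $(pgrep -f jenkins.war) > /tmp/jenkins-threads.txt   # a real thread dump, unlike the heap summary above
# If the UI still responds at all, Jenkins also serves a thread dump at http://<your-host>/threadDump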
Here's the output from a Duplicity backup that I run every night on a server:
--------------[ Backup Statistics ]--------------
StartTime 1503561610.92 (Thu Aug 24 02:00:10 2017)
EndTime 1503561711.66 (Thu Aug 24 02:01:51 2017)
ElapsedTime 100.74 (1 minute 40.74 seconds)
SourceFiles 171773
SourceFileSize 83407342647 (77.7 GB)
NewFiles 15
NewFileSize 58450408 (55.7 MB)
DeletedFiles 4
ChangedFiles 6
ChangedFileSize 182407535 (174 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 25
RawDeltaSize 59265398 (56.5 MB)
TotalDestinationSizeChange 11743577 (11.2 MB)
Errors 0
-------------------------------------------------
I don't know if I'm reading this right, but what it seems to be saying is that:
I started with 77.7 GB
I added 15 files totaling 55.7 MB
I deleted or changed files whose sum total was 174 MB
My deltas after taking all changes into account totaled 56.5 MB
The total change in disk space on the remote server that I pushed the deltas to was 11.2 MB
It seems to me that we're saying I only pushed 11.2 MB but should've probably pushed at least 55.7 MB because of those new files (can't really make a small delta of a file that didn't exist before), and then whatever other disk space the deltas would've taken.
I get confused when I see these reports. Can someone help clarify? I've tried digging for documentation but am not seeing much in the way of clear, concise plain English explanations on these values.
Disclaimer: I couldn't find a proper resource that explains the difference, nor anything in the duplicity docs that supports this theory.
ChangedDeltaSize, DeltaEntries and RawDeltaSize do not relate to changes in the actual files; they relate to differences between sequential data. Duplicity uses the rsync algorithm to create your backups, which in turn is a type of delta encoding.
Delta encoding is a way of storing data in the form of differences rather than complete files. Thus the delta changes you see listed are changes in those pieces of data and can therefore be smaller. In fact, I think they should be smaller, as they are just small snippets of changed data.
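To make that concrete, here is a tiny sketch using rdiff, the command-line front end to librsync (the same delta-encoding family duplicity builds on); old-file and new-file are placeholder names:

rdiff signature old-file old-file.sig          # block checksums of the old data
rdiff delta old-file.sig new-file new.delta    # only the blocks that changed
ls -l new.delta                                # typically far smaller than new-file itself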
Some sources:
- http://duplicity.nongnu.org/ ("Encrypted bandwidth-efficient backup using the rsync algorithm")
- https://en.wikipedia.org/wiki/Rsync ("The rsync algorithm is a type of delta encoding ...")
- https://en.wikipedia.org/wiki/Delta_encoding
I'm using an LVM thin pool for Docker storage (dm.thinpooldev) and I've run out of space on the metadata pool a few times so far. It's easy to combat, as I can just recreate the thin pool with larger metadata, but I'm just guessing (and probably over-guessing) how big to make it.
Does anyone have any suggestions for the relative size of metadata for Docker? It looks like the defaults in lvcreate aren't enough:
--poolmetadatasize MetadataVolumeSize[bBsSkKmMgG]
    Sets the size of pool's metadata logical volume. Supported values are in range between 2MiB and 16GiB for thin pool, and upto 16GiB for cache pool. The minimum value is computed from pool's data size. Default value for thin pool is (Pool_LV_size / Pool_LV_chunk_size * 64b). Default unit is megabytes.
The basic commands I'm using are:
DISK=/dev/xvdf
VG=docker_vg
LV=docker_pool
pvcreate $DISK
vgcreate $VG $DISK
lvcreate -l 100%FREE --thinpool $LV $VG
Or substitute an arbitrary metadata size:
lvcreate -l 100%FREE --poolmetadatasize 200M --thinpool $LV $VG
[EDIT]
Well, no response, so I'm just going with 1% for now. This is working for us so far, though probably still over-provisioned.
DISK=/dev/xvdf
VG=docker_vg
LV=docker_pool
DOCKER_POOL_METADATA_PERCENTAGE=1
DISK_SIZE=$(blockdev --getsize64 ${DISK})
META_DATA_SIZE=$(echo "scale=0;${DISK_SIZE}*${DOCKER_POOL_METADATA_PERCENTAGE}/100" | bc)
pvcreate ${DISK}
vgcreate ${VG} ${DISK}
# This metadata sizing is in k because bytes don't seem to translate properly
lvcreate -l 100%FREE --poolmetadatasize $((${META_DATA_SIZE}/1024))k --thinpool ${LV} ${VG}
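For what it's worth, the man-page default works out like this in a worked example (the numbers are illustrative, not from my setup), and lvs -a shows what lvcreate actually picked:

# Default metadata size = Pool_LV_size / Pool_LV_chunk_size * 64 bytes; for a
# 100 GiB pool with the default 64 KiB chunks that is
#   100 GiB / 64 KiB = 1,638,400 chunks, and 1,638,400 * 64 B = 100 MiB of metadata.
lvs -a ${VG}    # the hidden ${LV}_tmeta volume shows the metadata size in use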