Docker LVM thinpool metadata sizing - docker

I'm using an LVM thinpool for Docker storage (dm.thinpooldev) and I've run out of space on the metadata pool a few times so far. It's easy to deal with, since I can just recreate the thinpool with larger metadata, but I'm just guessing (and probably over-guessing) how big to make it.
Does anyone have suggestions for the relative size of metadata for Docker? It looks like the defaults in lvcreate aren't enough:
--poolmetadatasize MetadataVolumeSize[bBsSkKmMgG]
Sets the size of pool's metadata logical volume. Supported values are in range between 2MiB and 16GiB for thin pool, and up to 16GiB for cache pool. The minimum value is computed from pool's data size. Default value for thin pool is (Pool_LV_size / Pool_LV_chunk_size * 64b). Default unit is megabytes.
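For a sense of scale, that default works out as follows (illustrative numbers only: a 100GiB pool with the default 64KiB thin-pool chunk size):
# default metadata size = pool_size / chunk_size * 64 bytes
# 100GiB pool with 64KiB chunks -> prints 100 (MiB)
echo $(( (100 * 1024 * 1024 * 1024) / (64 * 1024) * 64 / 1024 / 1024 ))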
The basic commands I'm using are:
DISK=/dev/xvdf
VG=docker_vg
LV=docker_pool
pvcreate $DISK
vgcreate $VG $DISK
lvcreate -l 100%FREE --thinpool $LV $VG
Or substitute an arbitrary metadata size:
lvcreate -l 100%FREE --poolmetadatasize 200M --thinpool $LV $VG
[EDIT]
Well, no response, so I'm just going with 1% for now. This has been working for us so far, though it's probably still over-provisioned.
DISK=/dev/xvdf
DOCKER_VG=docker_vg
DOCKER_POOL=docker_pool
DOCKER_POOL_METADATA_PERCENTAGE=1
DISK_SIZE=$(blockdev --getsize64 ${DISK})
META_DATA_SIZE=$(echo "scale=0;${DISK_SIZE}*${DOCKER_POOL_METADATA_PERCENTAGE}/100" | bc)
pvcreate ${DISK}
vgcreate ${DOCKER_VG} ${DISK}
# The metadata size is passed in KiB because a raw byte value doesn't seem to translate properly
lvcreate -l 100%FREE --poolmetadatasize $((${META_DATA_SIZE}/1024))k --thinpool ${DOCKER_POOL} ${DOCKER_VG}
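Whatever size you pick, it's worth watching how full the metadata volume actually gets so it can be grown before it runs out. A minimal sketch, assuming the VG/pool names defined above:
# Show data and metadata usage of the thin pool as percentages
lvs -o lv_name,lv_size,data_percent,metadata_percent ${DOCKER_VG}
# If metadata is getting tight, it can be grown in place
lvextend --poolmetadatasize +100M ${DOCKER_VG}/${DOCKER_POOL}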

Related

Is it necessary to increase VM memory for redis instance

I'm currently using a 4 CPU / 8GB memory virtual machine in GCP, and I'm running a RediSearch docker container on it.
I have 47.5 million hash keys, which I estimate at over 35GB. So if I import all of my data with redis-cli on the VM, does it really need more than 35GB of memory?
I already tried importing 7.5 million keys, and memory utilization is about 70%.
If your cache needs 35GB, then your cache will need 35GB.
The values you gave are consistent: if 47M keys use 35GB, then 7.5M will use about 5.6GB (which is also roughly 70% of 8GB).
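If you want to sanity-check the estimate empirically, Redis can report overall and per-key memory usage. A rough sketch (the key name is a placeholder; sample a few keys and extrapolate):
# Overall memory used by the instance
redis-cli info memory | grep used_memory_human
# Approximate memory attributed to a single key
redis-cli memory usage some:hash:key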
If you don't want to change your VM's specs, then you can use the swap setting in the Redis conf file to use part of the VM's cold storage.
Note that you have to be careful using swap; depending on the hardware it can be a pretty bad idea. Using anything but NVMe drives is bad (even SSDs), as you can see here:
Benchmark with SSDs
Benchmark with NVMes

How to disable core dump in a docker image?

I have a service that uses a Docker image. About half a dozen people use it. However, the containers occasionally produce big core.xxxx dump files. How do I disable them in Docker images? My base image is Debian 9.
To disable core dumps, set a ulimit value in the /etc/security/limits.conf file, which defines shell-specific restrictions.
A hard limit is something that can never be overridden, while a soft limit may be applicable only for specific users. If you would like to ensure that no process can create a core dump, set them both to zero. Although it may look like a boolean (0 = False, 1 = True), it actually indicates the allowed size.
*    soft    core    0
*    hard    core    0
The asterisk means the entry applies to all users. The second column states whether we want a hard or soft limit, followed by the columns stating the setting (core) and the value.
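If you'd rather enforce this at the container level than inside the image, Docker can set the same limit at run time. A sketch, assuming a hypothetical image name:
# Disable core dumps for a single container (soft and hard limit both 0)
docker run --ulimit core=0 your-image
# Or set it daemon-wide for all containers
dockerd --default-ulimit core=0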

Cassandra Data storage: data directory space not equal to the space occupied

This is a beginner's question on Cassandra architecture.
I have a 3 node Cassandra cluster. The data directory is at $CASSANDRA_HOME/data/data. I've loaded a huge data set, did a nodetool flush, and then ran nodetool tablestats on the table I loaded the data into. It says the total space occupied is around 50GiB. I was curious and checked the size of my data directory with du $CASSANDRA_HOME/data/data on each of the nodes, which shows around 1-2GB on each. How could the data directory be smaller than the space occupied by a single table? Am I missing something? My table is created with replication factor 1.
du reports the true storage capacity used by the paths given to it. This is not always directly connected to the size of the data stored in those paths.
Two main factors can make the output of du differ from other storage-usage figures you might get (e.g. from Cassandra).
du might report a smaller number than expected for two reasons:
(a) It combines hard links. If the paths given to it contain hard-linked files (the term "hard link" is a standard one for Unixish operating systems and can easily be looked up), these are counted only once even though the files appear multiple times.
(b) It is aware of sparse files: files which contain large (sometimes huge) runs of empty space (zero bytes). Many Unixish file systems can store these efficiently, depending on how they were created.
du might report a larger number than expected because file systems have some overhead. To store a file of n bytes, n + h bytes need to be stored, where h depends on the file system and its configuration. The most important factor is that file systems typically store files in a block structure. If a file's size isn't an exact multiple of the file system's block size, the last block it needs is still allocated completely, so some of that block is wasted. du shows the whole block as allocated because, in fact, it is.
So in your case Cassandra might report 50GiB of occupied space, but a lot of that might be empty (never-written) space. It can be stored in sparse files which in fact only use about 2GiB of storage (which is what du shows).
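You can see the sparse-file effect in isolation with a quick experiment (a sketch; the file name and size are arbitrary):
# Create a 1GiB sparse file: the apparent size is 1GiB, the allocated size is ~0
truncate -s 1G sparse.dat
ls -lh sparse.dat                  # apparent size (1.0G)
du -h sparse.dat                   # allocated size (almost nothing)
du -h --apparent-size sparse.dat   # apparent size again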

cache coherence protocol AMD Opteron chips (MOESI?)

If I may start with an example.
Say we have a system of 4 sockets, where each socket has 4 cores and 2GB of RAM, with ccNUMA (cache-coherent non-uniform memory access) memory.
Let's say 4 processes are running, one on each socket, and all of them have some shared memory region allocated in P2's RAM, denoted SHM. This means any load/store to that region will incur a lookup in P2's directory, correct? If so, then: when that lookup happens, is it equivalent to accessing RAM in terms of latency? Where does this directory reside physically? (See below.)
With a more concrete example:
Say P2 does a LOAD on SHM and that data is brought into P2's L3 cache with the tag '(O)wner'. Furthermore, say P4 does a LOAD on the same SHM. This will cause P4 to do a lookup in P2's directory, and since the data is tagged as Owned by P2, my question is:
Does P4 get SHM from P2's RAM, or does it ALWAYS get the data from P2's L3 cache?
If it always gets the data from the L3 cache, wouldn't it be faster to get the data directly from P2's RAM, since it already has to do a lookup in P2's directory? And my understanding is that the directory literally sits on top of the RAM.
Sorry if I'm grossly misunderstanding what is going on here, but I hope someone can help clarify this.
Also, is there any data on how fast such a directory lookup is? In terms of data retrieval, is there documentation on the average latencies of such lookups? How many cycles for an L3 read-hit, read-miss, directory lookup, etc.?
It depends on whether the Opteron processor implements the HT Assist mechanism.
If it does not, then there is no directory. In your example, when P4 issues a load, a memory request will arrive at P2's memory controller. P2 will answer back with the cache line and will also send a probe message to the other two cores. Finally, those other two cores will answer back to P4 with an ACK saying they do not have a copy of the cache line.
If HT Assist is enabled (typically for 6-core and higher sockets), then each L3 cache contains a snoop filter (directory) used to write down which cores are keeping a line. Thus, in your example, P4 will not send probe messages to the other two cores, as it looks up the HT Assist directory to find out that no one else has a copy of the line (this is a simplification, as the state of the line would be Exclusive instead of Owned and no directory lookup would be needed).
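The protocol-level timings aren't exposed by the OS, but you can at least see the NUMA topology and the relative node-to-node access distances the firmware reports, which hint at the cost of remote accesses. A sketch (requires the numactl package):
# Show NUMA nodes, their CPUs/memory, and the relative distance matrix
numactl --hardware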

Finding what hard drive sectors occupy a file

I'm looking for a nice easy way to find what sectors occupy a given file. My language preference is C#.
From my A-Level Computing class I was taught that a hard drive has a lookup table in the first few KB of the disk. In this table there is a linked list for each file detailing what sectors that file occupies. So I'm hoping there's a convenient way to look in this table for a certain file and see what sectors it occupies.
I have tried Googling but I'm finding nothing useful; maybe I'm not searching for the right thing, but I can't find anything at all.
Any help is appreciated, thanks.
About Drives
The physical geometry of modern hard drives is no longer directly accessible by the operating system. Early hard drives were simple enough that it was possible to address them according to their physical structure: cylinder-head-sector. Modern drives are much more complex and use schemes like zone bit recording, in which not all tracks have the same number of sectors. It's no longer practical to address them according to their physical geometry.
from the fdisk man page:
If possible, fdisk will obtain the disk geometry automatically. This is not necessarily the physical disk geometry (indeed, modern disks do not really have anything like a physical geometry, certainly not something that can be described in simplistic Cylinders/Heads/Sectors form)
To get around this problem, modern drives are addressed using Logical Block Addressing (LBA), which is what the operating system knows about. LBA is an addressing scheme where the entire disk is represented as a linear sequence of blocks, each block being a uniform number of bytes (usually 512 or larger).
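You can ask the kernel what block sizes it reports for a given drive (a sketch; the device path is a placeholder):
# Logical sector size and physical block size as seen by the kernel
sudo blockdev --getss --getpbsz /dev/sda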
About Files
In order to understand where a "file" is located on a disk (at the LBA level) you will need to understand what a file is. This is going to depend on what file system you are using. In Unix-style file systems there is a structure called an inode which describes a file. The inode stores all of a file's attributes and points to the LBA locations of the actual data.
Ubuntu Example
Here's an example of finding the LBA location of file data.
First get your file's inode number
$ ls -i
659908 test.txt
Run the file system debugger. "yourPartition" will be something like sda1; it is the partition that your file system is located on.
$ sudo debugfs /dev/yourPartition
debugfs: stat <659908>
Inode: 659908 Type: regular Mode: 0644 Flags: 0x80000
Generation: 3039230668 Version: 0x00000000:00000001
...
...
Size of extra inode fields: 28
EXTENTS:
(0): 266301
The number under "EXTENTS", 266301, is the logical block in the file system that your file is located on. If your file is large, there will be multiple blocks listed. There's probably an easier way to get that number; I couldn't find one (though see the filefrag note below).
To validate that we have the right block, use dd to read that block off the disk. To find out your file system's block size, use dumpe2fs.
dumpe2fs -h /dev/yourPartition | grep "Block size"
Then put your block size in the ibs= parameter, and the extent logical block in the skip= parameter, and run dd like this:
sudo dd if=/dev/yourPartition of=success.txt ibs=4096 count=1 skip=266301
success.txt should now contain the original file's contents.
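One possibly easier way to get the block numbers on ext file systems is the filefrag tool from e2fsprogs, which prints the extent mapping directly (a sketch; the file name matches the example above):
# Verbose extent listing: logical offset, physical block and length for each extent
sudo filefrag -v test.txt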
sudo hdparm --fibmap file
For ext, vfat and NTFS... maybe more.
FIBMAP is also available programmatically from C, as a Linux ioctl.
