How can a volume exist on more than one physical disk? - delphi

Here they say that calling DeviceIoControl with the IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS control code "retrieves the physical location of a specified volume on one or more disks." But from my 25 years of using computers I know that a physical disk can have one or more volumes, not the other way around. I can't even imagine how a volume can exist on multiple physical disks. So the question is: in which cases does a volume exist on multiple disks?

Spanned Volume
A spanned volume combines areas of unallocated space from multiple disks into one logical volume, allowing you to more efficiently use all of the space and all the drive letters on a multiple-disk system.
Though it's only supported on dynamic disks:
The following operations can be performed only on dynamic disks:
...
Extend a simple or spanned volume.

https://learn.microsoft.com/en-us/windows/win32/fileio/basic-and-dynamic-disks explains the other case, which you must at least have heard of in your 25 years: (software) RAID. A RAID 0 array is basically the answer to the problem "no disk exists that is large enough for my needs".
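Since the question started from DeviceIoControl, here is a minimal sketch of the call itself. It is shown in C because the structures come straight from winioctl.h; the same CreateFile/DeviceIoControl calls translate directly to Delphi via the Windows unit. The volume letter C: and the room for eight extents are arbitrary choices. A spanned or striped (RAID 0) volume reports one extent per disk area it occupies; a simple volume on a basic disk reports exactly one.

    /* Minimal sketch: ask Windows which physical disks back volume C:.
       Error handling is kept short; eight extents of room is an arbitrary
       guess (a real program should retry with a larger buffer if the call
       fails with ERROR_MORE_DATA). */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        HANDLE hVol = CreateFileW(L"\\\\.\\C:", 0,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        DWORD size = sizeof(VOLUME_DISK_EXTENTS) + 7 * sizeof(DISK_EXTENT);
        VOLUME_DISK_EXTENTS *vde = (VOLUME_DISK_EXTENTS *)malloc(size);
        DWORD bytes = 0;

        if (!DeviceIoControl(hVol, IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS,
                             NULL, 0, vde, size, &bytes, NULL)) {
            fprintf(stderr, "DeviceIoControl failed: %lu\n", GetLastError());
            return 1;
        }

        /* More than one extent means the volume spans several disk areas,
           e.g. a spanned or striped (RAID 0) dynamic volume. */
        printf("Volume C: has %lu extent(s)\n", vde->NumberOfDiskExtents);
        for (DWORD i = 0; i < vde->NumberOfDiskExtents; i++)
            printf("  disk %lu, offset %lld, length %lld\n",
                   vde->Extents[i].DiskNumber,
                   (long long)vde->Extents[i].StartingOffset.QuadPart,
                   (long long)vde->Extents[i].ExtentLength.QuadPart);

        free(vde);
        CloseHandle(hVol);
        return 0;
    }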

Related

Flash memory raw data changes depending on the reading tool. Why?

I've been playing around with the raw data inside an 8GB Memory Stick, reading and writing directly into specific sectors, but for some reason changes don't remain consistent.
I've used Active@ Disk Editor to write a string at a specific sector, and it seems consistent when I read it through Active (it survives unmounting, rebooting...), but if I try to read it through the terminal using dd and hexdump the outcome is different.
Some time ago I was researching ways to fully and effectively erase a disk, and I read somewhere that solid-state drives such as flash drives or SSDs have more memory than is stated, and their internals keep remapping parts of the memory in order to increase lifespan, or something like that.
I don't know if it is because of that or if it's even correct. Could you tell me if I'm wrong or where to find good documentation about the subject?
Okay I just figured it out.
Apparently when you open a disk inside a hex editor there are two ways you can go: you can open it as a physical disk (the whole disk) or as a logical disk, aka a volume or a partition.
Active@ Disk Editor was opening it as a physical disk, while dd and hexdump were dumping it as a logical disk; in other words, they were dumping the content of the only partition inside the physical disk. This means there was an offset between the real physical sectors where I was writing data using Active and the ones that I was reading (quite a big offset: 2048 sectors of 512 bytes each, i.e. 1 MiB).
So changes were being made; I was just looking at the wrong positions. Hope this saves someone a few minutes.
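To make that offset concrete, here is a sketch assuming Linux, root privileges, and a stick at /dev/sdb whose single partition starts at LBA 2048 (device names and numbers are placeholders; check yours with fdisk -l). It reads the same on-disk sector once through the whole-disk node and once through the partition node:

    /* Sketch: the same on-disk sector seen through two device nodes. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define SECTOR_SIZE 512
    #define PART_START  2048   /* first partition's starting LBA on the disk */

    static void dump_sector(const char *dev, long long sector)
    {
        unsigned char buf[SECTOR_SIZE];
        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror(dev); exit(1); }
        if (pread(fd, buf, SECTOR_SIZE, sector * (long long)SECTOR_SIZE)
                != SECTOR_SIZE) {
            perror("pread");
            exit(1);
        }
        close(fd);

        printf("%-10s sector %6lld:", dev, sector);
        for (int i = 0; i < 16; i++)      /* first 16 bytes, like hexdump */
            printf(" %02x", buf[i]);
        printf("\n");
    }

    int main(void)
    {
        /* Physical sector 5000 on the whole disk is logical sector
           5000 - 2048 = 2952 inside the first partition; both reads
           should show identical bytes. */
        dump_sector("/dev/sdb",  5000);
        dump_sector("/dev/sdb1", 5000 - PART_START);
        return 0;
    }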

Are Read/Write operations allowed in RAID when a drive fails?

RAID 0/1/5 in question. I am trying to determine whether or not it is possible to continue reading or writing if ONE hard drive in the array fails.
For RAID 0 I assumed the entire array fails if one goes down (but is it still possible to read from the remaining drives?).
For RAID 1 I assumed reading and writing continue as normal, since there is a mirror copy.
For RAID 5 I assumed that reading would continue as normal (the information would just have to be reconstructed where needed) and writes should cease until a new drive is inserted.
I am pretty sure that my assumptions for RAID 1 are correct, but I am uncertain about the behavior of RAID 0 and 5. Ideally you wouldn't touch RAID 5 until you could rebuild onto the new drive, but is it possible? Any resources on the subject would be awesome.
RAID 0: If one disk fails, the entire array fails (i.e. you can no longer read from the remaining disks, since the data is striped across all of them).
RAID 1: If one disk fails, the array can continue to be read and written. After the failed disk has been replaced, reconstruction of the RAID can occur online, but performance will be impacted while it runs.
RAID 5: If one disk fails, the array can continue to be read and written, with data that was on the failed disk reconstructed on the fly from the parity. After the failed disk has been replaced, reconstruction of the RAID can occur online, but performance will be impacted while it runs.
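The reason a degraded RAID 5 can still serve reads (and later rebuild) is that the strip on the failed disk is simply the XOR of the surviving data strips and the parity strip. A toy sketch with a three-disk array and 4-byte strips; real arrays do the same thing per stripe:

    /* Sketch of degraded-mode reconstruction in RAID 5 using XOR parity. */
    #include <stdio.h>
    #include <string.h>

    #define STRIP 4

    int main(void)
    {
        unsigned char d0[STRIP] = "abcd";     /* strip on disk 0 */
        unsigned char d1[STRIP] = "efgh";     /* strip on disk 1 */
        unsigned char parity[STRIP], rebuilt[STRIP];

        /* Parity written during normal operation: P = D0 ^ D1. */
        for (int i = 0; i < STRIP; i++) parity[i] = d0[i] ^ d1[i];

        /* Disk 1 fails; reconstruct its strip from the survivors: D1 = D0 ^ P. */
        for (int i = 0; i < STRIP; i++) rebuilt[i] = d0[i] ^ parity[i];

        printf("rebuilt strip: %.4s (matches original: %s)\n",
               (char *)rebuilt,
               memcmp(rebuilt, d1, STRIP) == 0 ? "yes" : "no");
        return 0;
    }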

Is executable code stored uniquely in tmpfs copied to another part of RAM when run?

An executable on disk needs to first have its code and data sections loaded into RAM before it can be executed. When an executable is stored in tmpfs, it's already in RAM, so does the kernel bypass the step of loading the executable into RAM by just mapping the tmpfs pages into the process's address space? Does the answer apply to both executables and loaded libraries?
Your question appears to have been answered in a post on the Linux Kernel Mailing List in 2007.
(As tmpfs is a scheme for storing files in the filesystem cache, with no backing storage, the "buffer cache" mentioned below should, I believe, be the "original".)
Phillip Susi asked:
The question is, when you execute a binary on tmpfs, does its code segment get
mapped directly where it's at in the buffer cache, or does it get copied to
another page for the executing process? At least, assuming this is possible
due to the vma and file offsets of the segment being aligned.
And Hugh Dickins replied:
Its pages are mapped directly into the executing process, without copying.
You might want to read the full thread - a note is made that this depends on having a system with an MMU, and then the discussion veers into tmpfs's non-persistence.
Linux's copy-on-write behavior means, I believe, that any data page you write to gets a unique copy created for your process at the time of the first write.
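If you want to see the "mapped directly, copied only on first write" behavior for yourself, here is a small sketch assuming Linux with a tmpfs mounted at /dev/shm (the file name cow-demo is just a placeholder). It maps a tmpfs file MAP_PRIVATE, the way the loader maps a binary's segments, prints the file-backed entry from /proc/self/maps, and then shows that the first write goes to a private copy rather than to the file:

    /* Sketch: copy-on-write mapping of a tmpfs file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/dev/shm/cow-demo";   /* placeholder tmpfs file */
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

        /* MAP_PRIVATE = copy-on-write, like a binary's code/data segments. */
        unsigned char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* The maps entry names the tmpfs file: the mapping is file-backed,
           i.e. the page was not copied just to map or read it. */
        char cmd[64];
        snprintf(cmd, sizeof cmd, "grep cow-demo /proc/%d/maps", (int)getpid());
        system(cmd);

        p[0] = 0x42;   /* first write: the kernel now gives this process a
                          private copy of the page */

        unsigned char b = 0;
        pread(fd, &b, 1, 0);
        printf("byte 0 of the tmpfs file is still %u (the write went to the copy)\n", b);

        munmap(p, 4096);
        close(fd);
        unlink(path);
        return 0;
    }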

File Systems - Memory-Mapped Files

An example final question for my operating systems class:
Most operating systems support "memory-mapped files"; this describes files which are mapped into the address space of a running process. Reads and writes to the file are converted into memory reads and writes. We can imagine the existence of two new system calls, map() and unmap().
a) Consider map(); it accepts a file name and a virtual address, causing the operating system to map the file into the address space starting at the virtual address. Describe how the virtual memory system could be used to support this call.
b) Consider unmap(); it disassociates the file from the virtual address space. Describe the steps that should be taken to implement this system call. List all your assumptions.
c) In many UNIX systems, the inodes are kept at the start of the disk. An alternative design is to allocate an inode when a file is created and put the inode at the start of the first block of the file. Discuss the pros and cons of this alternative.
d) What would happen if the bitmap or free list containing information about free disk blocks was completely lost due to a crash? Is there any way to recover from this disaster, or is the disk no longer usable? Discuss your answer for a UNIX and a FAT-style of disk-block allocation.
Any information or discussion on these questions is greatly appreciated.
For c): a drawback is the extra overhead of going to retrieve a directory/file and its data; this is especially true when allocating space for a new file, which can involve looking up every inode and retrieving its file_size, permissions, etc.
It is good, however, when you have a large number of small files: keeping all of those inodes in a fixed area at the start of the disk would cost a lot of space there.
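For parts a) and b), it may help that the hypothetical map()/unmap() pair corresponds closely to POSIX mmap()/munmap(): the virtual memory system sets up page-table entries that fault pages in from the file's blocks on first access, and unmapping flushes dirty pages back and tears those entries down. A minimal sketch, assuming a file named example.txt exists and is non-empty (the name is a placeholder):

    /* Sketch: file reads/writes become plain memory accesses. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("example.txt", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* "map()": the kernel installs page-table entries for this range
           that fault in pages from the file's blocks on first access. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        p[0] = 'X';                      /* a memory write = a file write */

        /* "unmap()": flush dirty pages, then tear down the mapping. */
        msync(p, st.st_size, MS_SYNC);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }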

2 Files, Half the Content, vs. 1 File, Twice the Content, Which is Greater?

If I have 2 files each with this:
"Hello World" (x 1000)
Does that take up more space than 1 file with this:
"Hello World" (x 2000)
What are the drawbacks of dividing content into multiple smaller files (assuming there's reason to divide them into more files, not like this example)?
Update:
I'm using a MacBook Pro running Mac OS X 10.5, but I'd also like to know for Ubuntu Linux.
Marcelo gives the general performance case. I'd argue that worrying about this is premature optimization; you should split things into different files where it is logical to split them.
Also, if you really care about the file size of such repetitive files, you can compress them.
Your example even hints at this: a simple run-length encoding of
"Hello World" x 1000
is much more space efficient than actually having "Hello World" written out 1000 times.
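As a small illustration of how well such repetitive content compresses, here is a sketch assuming zlib is available (compile with -lz); it is only a demonstration of the space saving, not a recommendation of any particular format:

    /* Sketch: compress "Hello World" repeated 1000 times and compare sizes. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *unit = "Hello World";
        size_t n = strlen(unit), total = n * 1000;

        char *input = malloc(total);
        for (size_t i = 0; i < 1000; i++)
            memcpy(input + i * n, unit, n);

        uLongf out_len = compressBound(total);
        unsigned char *output = malloc(out_len);
        if (compress(output, &out_len, (const Bytef *)input, total) != Z_OK) {
            fprintf(stderr, "compress failed\n");
            return 1;
        }

        printf("raw: %zu bytes, compressed: %lu bytes\n",
               total, (unsigned long)out_len);
        free(input);
        free(output);
        return 0;
    }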
Files take up space in the form of clusters on the disk. A cluster is a number of sectors, and the size depends on how the disk was formatted.
A typical size for clusters is 8 kilobytes. That would mean that the two smaller files would use two clusters (16 kilobytes) each and the larger file would use three clusters (24 kilobytes).
A file will on average use half a cluster more than its size. So with a cluster size of 8 kilobytes, each file will on average have an overhead of 4 kilobytes.
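A quick way to observe this overhead on a real system (including the OS X and Ubuntu machines mentioned above) is to compare a file's logical size with the space actually allocated for it. A minimal sketch assuming a POSIX system; st_blocks is counted in 512-byte units regardless of the filesystem's cluster size:

    /* Sketch: logical size vs. allocated size for each file given on the
       command line, showing the rounding-up-to-a-cluster overhead. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; i++) {
            struct stat st;
            if (stat(argv[i], &st) != 0) { perror(argv[i]); continue; }
            long long logical   = (long long)st.st_size;
            long long allocated = (long long)st.st_blocks * 512;
            printf("%s: %lld bytes of data, %lld bytes allocated (%lld wasted)\n",
                   argv[i], logical, allocated, allocated - logical);
        }
        return 0;
    }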
Most filesystems use a fixed-size cluster (4 kB is typical but not universal) for storing files. Files below this cluster size will all take up the same minimum amount.
Even above this size, the proportional wastage tends to be high when you have lots of small files. Ignoring skewness of size distribution (which makes things worse), the overall wastage is about half the cluster size times the number of files, so the fewer files you have for a given amount of data, the more efficiently you will store things.
Another consideration is that metadata operations, especially file deletion, can be very expensive, so again smaller files aren't your friends. Some interesting work was done in ReiserFS on this front until the author was jailed for murdering his wife (I don't know the current state of that project).
If you have the option, you can also tune the file sizes to always fill up a whole number of clusters, and then small files won't be a problem. This is usually too finicky to be worth it though, and there are other costs. For high-volume throughput, the optimal file size these days is between 64 MB and 256 MB (I think).
Practical advice: Stick your stuff in a database unless there are good reasons not to. SQLite substantially reduces the number of reasons.
I think how the file(s) will be used should also be taken into consideration, depending on the API and the language used to read/write them (and hence any API restrictions).
Disk fragmentation, which tends to be lower when you only have big files, will penalize data access if you're reading one big file in one shot, whereas several accesses to small files spread out over time will not be penalized as much by fragmentation.
Most filesystems allocate space in units larger than a byte (typically 4KB nowadays). Effective file sizes get "rounded up" to the next multiple of that "cluster size". Therefore, dividing up a file will almost always consume more total space. And of course there's one extra entry in the directory, which may cause it to consume more space, and many file systems have an extra intermediate layer of inodes where each file consumes one entry.
What are the drawbacks of dividing content into multiple smaller files (assuming there's reason to divide them into more files, not like this example)?
More wasted space
The possibility of running out of inodes (in extreme cases)
On some filesystems: very bad performance when directories contain many files (because they're effectively unordered lists)
Content in a single file can usually be read sequentially (i.e. without having to move the read/write head) from the HD, which is the most efficient way. When it spans multiple files, this ideal case becomes much less likely.
