I'm looking into the Mach-O structure and there is one bit I'm confused about.
I understand the basic structure of a Mach-O file. I'm trying to programmatically read the bytes in the first TEXT section of the first TEXT segment, and I have a pointer to the start of the Mach-O header. I am trying to compute the appropriate offset to add to that pointer so that it points to the bytes of the TEXT section.
In order to obtain the data for the sections in a segment, I would have to "take the offset of the segment command in the file, add the size of the segment structure, and then loop through nsects times, incrementing the offset by the size of the section struct each time", as mentioned in this article: https://h3adsh0tzz.com/2020/01/macho-file-format/
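For concreteness, the walk described above might look roughly like this (a minimal sketch assuming a thin, 64-bit Mach-O and the structures from <mach-o/loader.h>; find_text_section is just an illustrative name, not an existing API):

    #include <mach-o/loader.h>
    #include <cstdint>
    #include <cstring>

    // Sketch: given a pointer to the start of a thin 64-bit Mach-O header,
    // walk the load commands and return a pointer to the bytes of the
    // __TEXT,__text section, or nullptr if it isn't found.
    static const uint8_t* find_text_section(const uint8_t* base) {
        auto* hdr = reinterpret_cast<const mach_header_64*>(base);
        const uint8_t* cmd = base + sizeof(mach_header_64);   // load commands follow the header

        for (uint32_t i = 0; i < hdr->ncmds; ++i) {
            auto* lc = reinterpret_cast<const load_command*>(cmd);

            if (lc->cmd == LC_SEGMENT_64) {
                auto* seg = reinterpret_cast<const segment_command_64*>(lc);

                if (std::strncmp(seg->segname, SEG_TEXT, 16) == 0) {
                    // The nsects section headers follow the segment command immediately.
                    auto* sect = reinterpret_cast<const section_64*>(cmd + sizeof(segment_command_64));

                    for (uint32_t j = 0; j < seg->nsects; ++j, ++sect) {
                        if (std::strncmp(sect->sectname, SECT_TEXT, 16) == 0) {
                            // On disk, sect->offset is a file offset, i.e. relative to the
                            // Mach-O header at the start of the file. For an image already
                            // loaded in memory, sect->addr (a virtual address) plus the
                            // ASLR slide is the relevant quantity instead.
                            return base + sect->offset;
                        }
                    }
                }
            }
            cmd += lc->cmdsize;   // advance to the next load command
        }
        return nullptr;
    }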
However, with reference to the same article, in the “Data” section at the bottom of the page, the article also mentions that the memory addresses are relative to the start of the data and not the start of the Mach-O. In that case, why did we need to calculate all the offsets above if it is relative to the start of the data and not the Mach-O header?
Edit: Just a note, I'm interested in reading the bytes both in memory and on disk.
I've developed a Delphi service that writes a log to a file. Each entry is written to a new line. Once this log file reaches a specific size limit, I'd like to trim the first X lines from the beginning of the file to keep its size below the specified limit. I've found some code here on SO which demonstrates how to delete fixed-size chunks of data from the beginning of the file, but how do I go about deleting whole, variable-length lines rather than fixed-size chunks?
First of all, this is not language-tag spam; this question is not specific to one language in particular, and I think this Stack Exchange site is the most appropriate one for it.
I'm working on caches and memory, trying to understand how they work.
What I don't understand is this sentence (in bold, not in the picture):
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
So let's say I have these two addresses:
[1........0]10
[1........0]00
They share the same upper 30 bits: [31-12] for the tag and [11-2] for the index (see figure below).
As I understand it, the first one will result in a MISS (I assume the cache is initially empty), so one slot in the cache will be filled with the data located at this memory address.
Now we take the second one. Since it has the same 30 bits, it will result in a HIT in the cache, because we access the same slot (same 10 index bits) and the 20 bits of the address are equal to the 20 bits stored in the Tag field.
So as a result we'll get the data located at memory address [1........0]10 and not at [1........0]00, which is wrong!
I assume this has to do with the sentence I quoted above. Can anyone explain why my reasoning is wrong?
The cache in the figure:
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
It just means that in memory my words are aligned like this:
So when selecting a word, I shouldn't care about the last two bits, because I'll load a whole word: the two addresses above differ only in those bits, so they name bytes inside the same word, and returning that word on a hit is correct.
These last two bits will be useful to the processor when a load byte (lb) instruction is performed, to shift the loaded word and extract the byte at the correct position.
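To make the split concrete, here is a small sketch (not from the book; the two addresses are made up and differ only in their low two bits) of how the tag, index and byte offset would be extracted for the [31-12]/[11-2]/[1-0] layout in the figure:

    #include <cstdint>
    #include <iostream>

    int main() {
        const uint32_t a = 0x80000002u;  // ...10 in the low two bits
        const uint32_t b = 0x80000000u;  // ...00 in the low two bits

        auto tag    = [](uint32_t addr) { return addr >> 12; };           // bits [31-12], 20 bits
        auto index  = [](uint32_t addr) { return (addr >> 2) & 0x3FFu; }; // bits [11-2], 10 bits
        auto offset = [](uint32_t addr) { return addr & 0x3u; };          // bits [1-0], byte within the word

        std::cout << std::hex
                  << "a: tag=" << tag(a) << " index=" << index(a) << " byte=" << offset(a) << '\n'
                  << "b: tag=" << tag(b) << " index=" << index(b) << " byte=" << offset(b) << '\n';
        // Same tag and same index: both addresses select the same cached word;
        // only the byte offset within that word differs, so the HIT is correct.
    }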
I need to read and interpret a binary file containing a TIFF image. I know there exist readers for doing this, but I want to go the hard way. I found the TIFF format description and need to parse the binary file in small chunks. Assume I was able to read the complete binary file into memory. This means that I have a variable containing one long list of bytes.
From the format definition I know the meaning of the different groups of n bytes.
How can one define character variables with different lengths (sometimes 2, sometimes 3, sometimes 4 etc.) so that the variable address points to the right position in the image variable array?
In other words, assume my image is loaded into an array Image containing all bytes of the file.
I want to load the first 2 bytes into a string of length 2, so that I can just point its address at the first position of the Image array and the first 2 bytes are automatically associated with that string. A second string of 4 bytes would have another meaning, so I make its address point to the 3rd position of the Image array.
Is this feasible in C++? I remember that this was a normal way of working for dynamic memory allocation in Fortran 77, in a simulation code I analysed a long time ago.
Thanks in advance for the hints!
Regards,
Stefan
The C++ language is easily capable of processing TIFF files from a byte array. The idea you have in mind is basically correct, but there are a few problems with it. C strings are zero-terminated, and the strings which appear in TIFF files are not necessarily zero-terminated since their length is specified explicitly. It really is simpler to create a dedicated data structure to hold the TIFF-specific data fields and then parse the binary data into that structure. Your method will also immediately run into trouble with the Motorola/Intel byte-order issue if your machine has the opposite endianness.
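For example, here is a minimal sketch (not the poster's code; read16, read32 and parseHeader are illustrative names) of reading just the 8-byte TIFF header out of the byte array while handling the "II"/"MM" byte-order mark explicitly:

    #include <cstdint>
    #include <cstring>
    #include <vector>
    #include <stdexcept>

    struct TiffHeader {
        bool     bigEndian;       // true for "MM" (Motorola), false for "II" (Intel)
        uint16_t magic;           // should be 42
        uint32_t firstIfdOffset;  // file offset of the first image file directory
    };

    static uint16_t read16(const uint8_t* p, bool bigEndian) {
        return bigEndian ? uint16_t(p[0] << 8 | p[1]) : uint16_t(p[1] << 8 | p[0]);
    }

    static uint32_t read32(const uint8_t* p, bool bigEndian) {
        return bigEndian
            ? (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | p[3]
            : (uint32_t(p[3]) << 24) | (uint32_t(p[2]) << 16) | (uint32_t(p[1]) << 8) | p[0];
    }

    static TiffHeader parseHeader(const std::vector<uint8_t>& image) {
        if (image.size() < 8)
            throw std::runtime_error("file too small to be a TIFF");

        TiffHeader h{};
        if (std::memcmp(image.data(), "MM", 2) == 0)      h.bigEndian = true;
        else if (std::memcmp(image.data(), "II", 2) == 0) h.bigEndian = false;
        else throw std::runtime_error("unknown byte-order mark");

        h.magic          = read16(image.data() + 2, h.bigEndian);
        h.firstIfdOffset = read32(image.data() + 4, h.bigEndian);
        return h;
    }

Reading the fields out of the byte array like this, rather than pointing string variables into it, sidesteps both the zero-termination and the byte-order problems.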
What is the difference between a .pag file and an .ind file?
I know the page file contains the actual data, i.e. the data blocks and cells, and the index file holds pointers to the data blocks that are in the page file.
But is there any other difference, for example regarding size?
In my opinion the page file is always larger than the index file. Is that right?
If the index file is larger than the page file, then what happens? Is that still correct?
If I delete the page file, does that affect the index file?
or
If I delete some data blocks from the page file, how does that affect the index file?
You are correct about the page file including the actual data of the cube (although there is no data without the index, so in effect they are both the data).
Very typically the page files are bigger than the index. It's simply based on the number of dimensions and whether they are sparse or dense, the number of stored members in the dimensions, the density of the data blocks, the compression scheme used in the data blocks, and the number of index entries in the database.
It's not a requirement that one be larger than the other; it simply depends on how you use the cube. I would advise you not to worry about it unless you run into specific performance problems. At that point it becomes useful to consider, for the purposes of optimizing retrieval, calc, or data-load time, whether you should make a change to the configuration of the cube.
If you delete the page file it doesn't affect the index file necessarily, but you would lose all of the data in the cube. You would also lose the data if you just deleted all the index files. While the page files have data in them, as I mentioned, it is truly the combination of the page and index files that make up the data in the cube.
Under the right circumstances you can delete data from the database (such as doing a CLEARDATA operation) and you can reduce the size of the page files and/or the index. For example, deleting data such that you are clearing out some combination of sparse members may reduce the size of the index a bit as well as any data blocks associated with those index entries (that is, those particular combinations of sparse dimensions). It may be necessary to restructure and compact the cube in order for the size of the files to decrease. In fact, in some cases you can remove data and the size of the store files could grow.
Why does a process's address space have to be divided into four segments (text, data, stack and heap)? What is the advantage? Is it possible to have just one big segment?
There are multiple reasons for splitting programs into parts in memory.
One of them is that instruction and data memories can be architecturally distinct and discontiguous, that is, read and written from/to using different instructions and circuitry inside and outside of the CPU, forming two different address spaces (i.e. reading code from address 0 and reading data from address 0 will typically return two different values, from different memories).
Another is reliability/security. You rarely want the program's code and constant data to change. Most of the time when that happens, it happens because something is wrong (either in the program itself or in its inputs, which may be maliciously constructed). You want to prevent that from happening and know if there are any attempts. Likewise you don't want the data areas that can change to be executable. If they are and there are security bugs in the program, the program can be easily forced to do something harmful when malicious code makes it into the program data areas as data and triggers those security bugs (e.g. buffer overflows).
Yet another is storage... In many programs a number of data areas aren't initialized at all or are initialized to one common predefined value (often 0). Memory has to be reserved for these data areas when the program is loaded and is about to start, but these areas don't need to be stored on the disk, because there's no meaningful data there.
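As a quick illustration of the storage point (assuming a typical ELF or Mach-O toolchain; the variable names are only for illustration):

    #include <cstdio>

    static char zero_filled[8 * 1024 * 1024];             // zero-initialized -> .bss: reserved at load time, not stored in the file
    static char initialized[] = "stored in the binary";   // initialized data -> .data: these bytes are in the file
    static const char read_only[] = "read-only data";     // constant data    -> .rodata (or __TEXT,__const)

    int main() {
        std::printf("%zu %s %s\n", sizeof zero_filled, initialized, read_only);
    }

Comparing the executable's size with and without the large zero-filled array (or running size on the binary) shows that it adds almost nothing to the file, while the initialized string does occupy space in it.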
On some systems you may have everything in one place (section/segment/etc). One notable example here is MSDOS, where .COM-style programs have no structure other than that they have to be less than about 64KB in size and the first executable instruction must appear at the very beginning of the file and assume that its location corresponds to IP=0x100 (where IP is the instruction pointer register). How code and data are placed and interleaved in a .COM program is unimportant and up to the programmer.
There are other architectural artifacts such as x86 segments. Again, MSDOS is a good example of an OS that deals with them. .EXE-style programs in it may have multiple segments that correspond directly to the x86 CPU segments of the real-mode addressing scheme, in which memory is viewed through 64KB-long "windows" known as segments. The position of these windows/segments is relative to the value of the CPU's segment registers. By altering the segment register values you can move the "windows". In order to access more than 64KB one needs to use different segment register values, and that often implies having multiple segments in the .EXE (there can be not just one segment for code and one for data, but also multiple segments for either of them).
At least the text and data segments are separated to prevent malicious code that's stored inside a variable from being run.
Instructions (compiled code) are stored in the text segment, while the contents of your variables are stored in a data segment, the latter of which never gets executed, only read from and written to.
A little more info here.
Isn't this distinction just a big, hacky workaround for patching security into the von-Neumann architecture where data and instructions share the same memory?