What is paging? - memory

Paging is explained on slide #6 of my lecture notes:
http://www.cs.ucc.ie/~grigoras/CS2506/Lecture_6.pdf
but I cannot for the life of me understand it. I know it's a way of translating virtual addresses to physical addresses. So the virtual addresses, which are on disk, are divided into chunks of 2^k. I am really confused after this. Can someone please explain it to me in simple terms?

Paging is, as you've noted, a type of virtual memory. To answer the question raised by #John Curtsy: it's covered separately from virtual memory in general because there are other types of virtual memory, although paging is now (by far) the most common.
Paged virtual memory is pretty simple: you split all of your physical memory up into blocks, mostly of equal size (though having a selection of two or three sizes is fairly common in practice). Making the blocks equal sized makes them interchangeable.
Then you have addressing. You start by breaking each address up into two pieces. One is an offset within a page. You normally use the least significant bits for that part. If you use (say) 4K pages, you need 12 bits for the offset. With (say) a 32-bit address space, that leaves 20 more bits.
From there, things are really a lot simpler than they initially seem. You basically build a small "descriptor" to describe each page of memory. This will have a linear address (the address used by the client application to address that memory), and a physical address for the memory, as well as a Present bit. There will (at least usually) be a few other things like permissions to indicate whether data in that page can be read, written, executed, etc.
Then, when client code uses an address, the CPU starts by breaking up the page offset from the rest of the address. It then takes the rest of the linear address, and looks through the page descriptors to find the physical address that goes with that linear address. Then, to address the physical memory, it uses the upper 20 bits of the physical address with the lower 12 bits of the linear address, and together they form the actual physical address that goes out on the processor pins and gets data from the memory chip.
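To make the split concrete, here is a minimal sketch of that translation, assuming 4K pages and 32-bit addresses as in the example above. The `page_table` dict and the `translate` function are made up for illustration; a real CPU does this in hardware with page tables in memory, not with a Python dict.

```python
PAGE_SIZE = 4096              # 4K pages
OFFSET_BITS = 12              # log2(4096): bits used for the offset
OFFSET_MASK = PAGE_SIZE - 1   # 0xFFF, selects the low 12 bits


def translate(linear_address, page_table):
    """Translate a linear address into a physical address.

    page_table is a hypothetical mapping:
    linear page number (upper 20 bits) -> physical frame number.
    """
    page_number = linear_address >> OFFSET_BITS   # upper 20 bits
    offset = linear_address & OFFSET_MASK         # lower 12 bits
    frame_number = page_table[page_number]        # look up the descriptor
    # Upper bits come from the physical frame, lower 12 bits from the
    # original linear address.
    return (frame_number << OFFSET_BITS) | offset


page_table = {0x12345: 0x00ABC}
print(hex(translate(0x12345678, page_table)))  # 0xabc678
```

Note that the offset (0x678) passes through unchanged; only the page number is replaced.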
Now, we get to the part where we get "true" virtual memory. When programs are using more memory than is actually available, the OS takes the data for some of those descriptors, and writes it out to the disk drive. It then clears the "Present" bit for that page of memory. The physical page of memory is now free for some other purpose.
When the client program tries to refer to that memory, the CPU checks that the Present bit is set. If it's not, the CPU raises an exception (a page fault). When that happens, the OS frees up a block of physical memory as above, reads the data for the current page back in from disk, and fills in the page descriptor with the address of the physical page where it's now located. When it's done all that, it returns from the exception, and the CPU restarts execution of the instruction that caused the exception to start with -- except now, the Present bit is set, so using the memory will work.
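The fault-and-restart cycle can be sketched as a toy simulation. Everything here (`Descriptor`, `cpu_access`, `os_handle_fault`, the `disk` set) is invented for illustration, and the victim selection is deliberately naive; real operating systems use proper replacement policies.

```python
class PageFault(Exception):
    """Raised by the simulated CPU when a page's Present bit is clear."""


class Descriptor:
    def __init__(self):
        self.present = False  # the Present bit
        self.frame = None     # physical frame number, valid only if present


def cpu_access(descriptors, page):
    """What the simulated CPU does: check Present, otherwise raise."""
    d = descriptors[page]
    if not d.present:
        raise PageFault(page)
    return d.frame


def os_handle_fault(descriptors, page, free_frames, disk):
    """What the simulated OS does in its page-fault handler."""
    if not free_frames:
        # No free frame: evict any resident page (naive policy), "write"
        # it to disk, and clear its Present bit.
        victim = next(p for p, d in descriptors.items() if d.present)
        disk.add(victim)
        free_frames.append(descriptors[victim].frame)
        descriptors[victim].present = False
        descriptors[victim].frame = None
    frame = free_frames.pop()
    disk.discard(page)                 # "read" the page back in from disk
    descriptors[page].frame = frame
    descriptors[page].present = True


def access(descriptors, page, free_frames, disk):
    try:
        return cpu_access(descriptors, page)
    except PageFault:
        os_handle_fault(descriptors, page, free_frames, disk)
        return cpu_access(descriptors, page)   # restart the faulting access


# Two virtual pages competing for a single physical frame (frame 7).
descriptors = {0: Descriptor(), 1: Descriptor()}
free_frames, disk = [7], set()
access(descriptors, 0, free_frames, disk)   # faults, page 0 gets frame 7
access(descriptors, 1, free_frames, disk)   # faults, page 0 is evicted
print(descriptors[0].present, descriptors[1].present, disk)
```

After the second access, page 0 is no longer present and lives only on the simulated disk, while page 1 occupies the single frame.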
There is one more detail that you probably need to know: the page descriptors are normally arranged into page tables, and (the important part) you normally have a separate set of page tables for each process in the system (and another for the OS kernel itself). Having separate page tables for each process means that each process can use the same set of linear addresses, but those get mapped to different sets of physical addresses as needed. You can also map the same physical memory to more than one process by just creating two separate page descriptors (one for each process) that contain the same physical address. Most OSes use this so that, for example, if you have two or three copies of the same program running, it'll really only have one copy of the executable code for that program in memory -- but it'll have two or three sets of page descriptors that point to that same code so all of them can use it without making separate copies for each.
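As a rough illustration of per-process page tables, here is a sketch with two made-up tables (`process_a`, `process_b`) modelled as plain dicts; the page and frame numbers are arbitrary. Both processes use the same linear addresses: the code page is shared, the data page is private.

```python
PAGE_SIZE = 4096

# Hypothetical page tables: linear page number -> physical frame number.
# Page 0x400 (program code) maps to the same frame in both processes;
# page 0x401 (private data) maps to a different frame in each.
process_a = {0x400: 0x100, 0x401: 0x2A0}
process_b = {0x400: 0x100, 0x401: 0x3B7}


def lookup(page_table, address):
    """Translate an address using the given process's page table."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


code_addr = 0x400010   # same linear address in both processes
data_addr = 0x401010

# Shared code: both processes reach the same physical location.
print(lookup(process_a, code_addr) == lookup(process_b, code_addr))  # True
# Private data: same linear address, different physical locations.
print(lookup(process_a, data_addr) == lookup(process_b, data_addr))  # False
```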
Of course, I'm simplifying a lot -- quite a few complete (and often fairly large) books have been written about virtual memory. There's also a fair amount of variation among machines, with various embellishments added, minor changes in parameters made (e.g., whether a page is 4K or 8K), and so on. Nonetheless, this is at least a general idea of the core of what happens (and it's still at a high enough level to apply about equally to an ARM, x86, MIPS, SPARC, etc.)

Simply put, it's a way of letting programs use more memory than is physically installed. For example, with a 32-bit virtual address space each process can address 2^32 bytes (4 GiB) even on a machine with far less RAM, because pages that are not currently in use can be kept on disk.

Paging is a storage mechanism that allows the OS to retrieve processes from secondary storage into main memory in the form of pages. In the paging method, main memory is divided into small fixed-size blocks of physical memory, which are called frames. The size of a frame should be kept the same as that of a page to make maximum use of main memory and to avoid external fragmentation.

Related

Confusion on Memory Layout vs Memory Management Schemes

I was studying some operating system concepts, got a bit confused, and now have the following question...
Does the memory layout of a program in execution (i.e. text, data, stack, heap) only make sense in the context of its virtual address space? If a program is organized ("laid out") into these logical sections in its virtual address space, don't these sections just get messed up as soon as addresses start getting converted from virtual to physical addresses using a memory management scheme like paging or segmentation?
As far as I'm aware, these two schemes allow for non-contiguous partitioning in the physical address space. So if my "text" section was from address 0 to 100 (random size I picked) in the virtual address space, and I choose to use paging, and my page sizes were 20 addresses in length each (ie there would be 5 pages for the text section), once these pages get placed in the physical address space non-contiguously (based on wherever free space is available), wouldn't the notion of a TEXT "section" kinda not make sense anymore (as it's been chunked and scattered)?
Lastly, are the variable-sized segments in segmentation that end up in the physical address space the exact same size as the logical categories (text, data, stack, heap) of the memory layout present in the virtual space? Is the only caveat here that in the physical space the segments are scattered non-contiguously (are not adjacent to one another) but still exist wholesomely within their specific category (ie all the "data" remains together/contiguous in the physical space)?
Any help and clarification is greatly appreciated, thank you!
Does the memory layout of a program in execution (i.e. text, data, stack, heap) only make sense in the context of its virtual address space? If a program is organized ("laid out") into these logical sections in its virtual address space, don't these sections just get messed up as soon as addresses start getting converted from virtual to physical addresses using a memory management scheme like paging or segmentation?
That's correct. The sections are contiguous in virtual memory, but not contiguous in physical memory. This isn't an issue since the operating system maintains page tables; the processor's MMU uses those to translate virtual to physical addresses transparently on each access, and the operating system itself can use them to figure out which (scattered) physical pages to interact with e.g. when the process ends and its memory is to be reclaimed.
As far as I'm aware, these two schemes allow for non-contiguous partitioning in the physical address space. So if my "text" section was from address 0 to 100 (random size I picked) in the virtual address space, and I choose to use paging, and my page sizes were 20 addresses in length each (ie there would be 5 pages for the text section), once these pages get placed in the physical address space non-contiguously (based on wherever free space is available), wouldn't the notion of a TEXT "section" kinda not make sense anymore (as it's been chunked and scattered)?
The idea of a section is still applicable in contexts where virtual addresses are applicable. Your user-mode program deals with virtual addresses (i.e. pointers essentially are virtual addresses), and a lot of the operating system still deals with virtual addresses as well. The translation to scattered physical addresses is done on demand by the MMU, and only a subset of kernel code needs to deal with physical addresses.
An aside: Those aren't realistic sizes due to the overhead of bookkeeping for pages; a typical page size is 4096 bytes, and there are ways of creating larger pages on some platforms to reduce this overhead further.
Lastly, are the variable-sized segments in segmentation that end up in the physical address space the exact same size as the logical categories (text, data, stack, heap) of the memory layout present in the virtual space? Is the only caveat here that in the physical space the segments are scattered non-contiguously (are not adjacent to one another) but still exist wholesomely within their specific category (ie all the "data" remains together/contiguous in the physical space)?
Nope, they are scattered on a page-by-page basis and not every virtual page will be backed by a physical page of memory. One example of this is demand paging, where a page only gets a physical backing lazily, when one is actually needed. Pages of .text that haven't been used yet might not be loaded from disk until a page fault actually induces the kernel to load them.
Likewise, if physical memory is scarce, unused pages might be evicted from physical memory and placed onto disk; when they're next accessed, a page fault will induce the kernel to load them back in from disk.
A virtual address might also map to a physical address that doesn't represent a physical page of DRAM memory on a DIMM somewhere. It's possible to map virtual addresses to physical addresses that represent memory-mapped IO, or a page of virtual memory might be shared between two processes as a form of cooperative communication.
There are further tricks done for the sake of optimization. For example, Linux's fork syscall doesn't copy pages; rather it sets up the page tables to enable a feature called copy on write, where pages are only copied when either the parent or child writes to them, and pages which are only read are shared between the two.
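The copy-on-write idea can be sketched with a toy model. This is not how Linux implements it (the real kernel marks the shared pages read-only and does the copy in the write-fault handler); the `Frame` and `AddressSpace` classes and the reference count are simplifications invented for this example.

```python
class Frame:
    """A physical page, shared by one or more address spaces."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 1


class AddressSpace:
    def __init__(self):
        self.pages = {}   # virtual page number -> Frame

    def fork(self):
        """Create a child that shares every frame with the parent."""
        child = AddressSpace()
        for number, frame in self.pages.items():
            frame.refcount += 1          # no copying at fork time
            child.pages[number] = frame
        return child

    def read(self, number, index):
        return self.pages[number].data[index]

    def write(self, number, index, value):
        frame = self.pages[number]
        if frame.refcount > 1:
            # First write to a shared page: copy it now, privately.
            frame.refcount -= 1
            frame = Frame(frame.data)
            self.pages[number] = frame
        frame.data[index] = value


parent = AddressSpace()
parent.pages[0] = Frame(b"abc")
parent.pages[1] = Frame(b"xyz")
child = parent.fork()

child.write(0, 0, ord("z"))    # triggers the copy of page 0 only
print(chr(parent.read(0, 0)), chr(child.read(0, 0)))  # a z
print(parent.pages[1] is child.pages[1])              # True
```

Page 1 is never written, so both processes keep pointing at the same frame.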

Committed vs Reserved Memory

According to "Windows Internals, Part 1" (7th Edition, Kindle version):
Pages in a process virtual address space are either free, reserved, committed, or shareable.
Focusing only on the reserved and committed pages, the first type is described in the same book:
Reserving memory means setting aside a range of contiguous virtual addresses for possible future use (such as an array) while consuming negligible system resources, and then committing portions of the reserved space as needed as the application runs. Or, if the size requirements are known in advance, a process can reserve and commit in the same function call.
Both reserving and committing will initially get you entries in the VADs (virtual address descriptors), but neither operation will touch the PTE (page table entry) structures. It used to cost PTEs for reserving before Windows 8.1, but not anymore.
As described above, reserved means blocking a range of virtual addresses, NOT blocking physical memory or paging file space at the OS level. The OS doesn't include this in the commit limit, therefore when the time comes to allocate this memory, you might get a surprise. It's important to note that reserving happens from the perspective of the process address space. It's not that there's any physical resource reserved - there's no stamping of "no vacancy" against RAM space or page file(s).
The analogy with plots of land might be missing something: take reserved as the area of land surrounded by wooden poles, thus letting others know that the land is taken. But how about committed? It can't be land on which structures (e.g. houses) have already been built, since those would require PTEs and there are none there yet, since we haven't accessed anything. It's only when touching committed data that the PTEs will get built, which will make the pages available to the process.
The main problem is that committed memory, at least in its initial state, is functionally very much like reserved memory. It's just an area blocked within VADs. Try to touch one of the addresses, and you'll get an access violation exception for a reserved address:
Attempting to access free or reserved memory results in an access violation exception because the page isn’t mapped to any storage that can resolve the reference
...and an initial page fault for a committed one (immediately followed by the required PTE entries being created).
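The behaviour described so far can be summed up in a toy model. This is only a sketch of the states as I understand them from the book, not the Windows API: `ToyAddressSpace`, `AccessViolation` and the per-page bookkeeping are all made up, and committing here is allowed on any page (as if reserve and commit were done in one call).

```python
class AccessViolation(Exception):
    """Touching a free or reserved page."""


class ToyAddressSpace:
    def __init__(self):
        self.state = {}      # page -> "reserved" or "committed"; absent = free
        self.ptes = {}       # page -> frame; built lazily on first touch
        self.next_frame = 0

    def reserve(self, first, count):
        for page in range(first, first + count):
            self.state[page] = "reserved"    # blocks addresses only

    def commit(self, first, count):
        for page in range(first, first + count):
            self.state[page] = "committed"   # still no PTE at this point

    def touch(self, page):
        if self.state.get(page) != "committed":
            raise AccessViolation(page)
        if page not in self.ptes:
            # First touch of a committed page: the initial page fault,
            # after which the PTE exists.
            self.ptes[page] = self.next_frame
            self.next_frame += 1
        return self.ptes[page]


space = ToyAddressSpace()
space.reserve(0, 4)      # four pages of address space set aside
space.commit(0, 1)       # only the first one is committed
print(space.ptes)        # {} : committing built no PTEs
space.touch(0)
print(space.ptes)        # {0: 0} : the first touch did
try:
    space.touch(1)       # reserved but not committed
except AccessViolation as exc:
    print("access violation on page", exc.args[0])
```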
Back to the land analogy: once houses are built, that patch of land is still committed. Yet this is a bit peculiar, since it was already committed when the original grass was there, before the very first shovelful was dug to start construction. It resembled the same state as that of a reserved patch. Maybe it would be better to think of it as terrain eligible for construction, e.g. you have a permit to build (although you might never build as much as a wall on that patch of land).
What would be the reasons for using one type of memory versus the other? There's at least one: the OS guarantees that there will be room to allocate committed memory, should that ever occur in the future, but doesn't guarantee anything for reserved memory aside from blocking that process' address space range. The only downside of committed memory is that one or more paging files might need to be extended so that the commit limit takes the recently allocated block into account; that way, should the requester demand the use of part or all of the data in the future, the OS can provide access to it.
I can't really think how the land analogy can capture this detail of "guarantee". After all, the reserved patch also physically existed, covered by the same grass as a committed one in its pristine state.
The stack is another scenario where reserved and committed memory are used together:
When a thread is created, the memory manager automatically reserves a predetermined amount of virtual memory, which by default is 1 MB.[...] Although 1 MB is reserved, only the first page of the stack will be committed [...]
along with a guard page. When a thread’s stack grows large enough to touch the guard page, an exception occurs, causing an attempt to allocate another guard. Through this mechanism, a user stack doesn’t immediately consume all 1 MB of committed memory but instead grows with demand.
There is an answer here that deals with why one would want to use reserved memory as opposed to committed . It involves storing continuously expanding data - which is actually the stack model described above - and having specific absolute address ranges available when needed (although I'm not sure why one would want to do that within a process).
Ok, what am I actually asking ?
What would be a good analogy for the reserved/committed concept ?
Any other reason aside those depicted above that would mandate the use of reserved memory ? Are there any interesting use cases when resorting to reserved memory is a smart move ?
Your question hits upon the difference between logical memory translation and virtual memory translation. While CPU documentation likes to conflate these two concepts, they are different in practice.
If you look at logical memory translation, there are only two states for a page. Using your terminology, they are FREE and COMMITTED. A FREE page is one that has no mapping to a physical page frame and a COMMITTED page has such a mapping.
In a virtual memory system, the operating system has to maintain a copy of the address space in secondary storage. How this is done depends upon the operating system. Typically, a process will have its mapping to several different files for secondary storage. The operating system divides the address space into what is usually called a SECTION.
For example, the code and read-only data could be stored virtually as one or more SECTIONS in the executable file. Code and static data in shared libraries could each be in different sections that are paged to the shared libraries. You might map a shared file into the process, providing memory that can be accessed by multiple processes; that forms another section. Most of the read/write data is likely to be in a page file in one or more sections. How the operating system tracks where it virtually stores each section of data is system dependent.
For Windows, that gives the definition of one of your terms: Sharable. A sharable section is one where a range of addresses can be mapped to different processes, at different (or possibly the same) logical addresses.
Your last term is then RESERVED. If you look at the documentation for Windows' VirtualAlloc function, you can see that (among your options) you can RESERVE or COMMIT. If you reserve, you are creating a section of VIRTUAL MEMORY that has no mapping to physical memory.
This RESERVE/COMMIT model is Windows-specific (although other operating systems may do the same). The likely reason was to save disk space. When Windows NT was developed, 600MB drives the size of a washing machine were still in use.
In these days of 64-bit address spaces, this system works well for (as you say) expanding data. In theory, an exception handler for a stack overrun can simply expand the stack. Reserving 4GB of memory takes no more resources than reserving a single page (which would not be practicable in a 32-bit system—see above). If you have 20 threads, this makes reserving stack space efficient.
What would be a good analogy for the reserved/committed concept ?
One could say RESERVE is like buying options to buy and COMMIT is exercising the option.
Any other reason aside those depicted above that would mandate the use of reserved memory ? Are there any interesting use cases when resorting to reserved memory is a smart move ?
IMHO, the most likely places to RESERVE without COMMITTING are for creating stacks and heaps with the former being the most important.

What makes a TLB faster than a Page Table if they both require two memory accesses?

Just going off wikipedia:
The page table, generally stored in main memory, keeps track of where the virtual pages are stored in the physical memory. This method uses two memory accesses (one for the page table entry, one for the byte) to access a byte. First, the page table is looked up for the frame number. Second, the frame number with the page offset gives the actual address. Thus any straightforward virtual memory scheme would have the effect of doubling the memory access time. Hence, the TLB is used to reduce the time taken to access the memory locations in the page table method.
So given that, what I'm curious about is why the TLB is actually faster because from what I know it's just a smaller, exact copy of the page table.
You still need to access the TLB to find the physical address, and then once you have that, you still need to actually access the data at the physical address, which is two lookups just like with the page table.
I can only think of two reasons why the TLB is faster:
looking up an address in the TLB or page table is not O(n) (I assumed it's O(1) like a hash table). Thus, since the TLB is much smaller, it's faster to do a lookup. Also in this case, why not just use a hash table instead of a TLB?
I incorrectly interpreted how the TLB works, and it's not actually doing two accesses.
I realize it has been three years since this question was asked, but since it is still just as relevant, and it still shows up in search engines, I'll try my best to produce a complete answer.
Accessing the main memory through the TLB rather than the page table is faster primarily for two reasons:
1. The TLB is faster than main memory (which is where the page table resides).
The typical access time is on the order of <1 ns for the TLB and 100 ns for main memory.
A TLB access is part of an L1 cache hit, and modern CPUs can do 2 loads per clock if they both hit in L1d cache.
The reasons for this are twofold:
The TLB is located within the CPU, while main memory - and thus the page table - is not.
The TLB - like other caches - is made of fast and expensive SRAM, whereas main memory usually consists of slow and inexpensive DRAM (read more here).
Thus, if the supposition that both the TLB and page table require only one memory access were correct, a TLB hit would still, roughly speaking, halve memory access time. However, as we shall see next, the supposition is not correct, and the benefit of having a TLB is even greater.
2. Accessing the page table usually requires multiple memory accesses.
This really is the crux of the issue.
Modern CPUs tend to use multilevel page tables in order to save memory. Most notably, x86-64 page tables currently consist of up to four levels (and a fifth may be coming). This means that accessing a single byte in memory through the page table requires up to five memory accesses: four for the page table and one for the data. Obviously the cost would be unbearably high if not for the TLB; it is easy to see why CPU and OS engineers put in a lot of effort to minimize the frequency of TLB misses.
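To see where the five accesses come from, here is a sketch of a four-level walk. It assumes the common x86-64 layout of 4K pages and 9 index bits per level (48-bit virtual addresses); the nested dicts standing in for page tables and the `walk` function are invented for illustration and ignore caching entirely.

```python
LEVELS = 4            # four levels of page tables
BITS_PER_LEVEL = 9    # 512 entries per table
OFFSET_BITS = 12      # 4K pages


def split(vaddr):
    """Split a virtual address into four table indices and a page offset."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    indices = []
    for level in range(LEVELS):
        shift = OFFSET_BITS + BITS_PER_LEVEL * (LEVELS - 1 - level)
        indices.append((vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1))
    return indices, offset


def walk(root, vaddr):
    """Walk the tables without a TLB, counting memory accesses."""
    indices, offset = split(vaddr)
    entry = root
    accesses = 0
    for index in indices:
        entry = entry[index]   # one memory read per level; each read
        accesses += 1          # depends on the result of the previous one
    physical = (entry << OFFSET_BITS) | offset   # last entry is the frame
    accesses += 1              # finally, the access to the data itself
    return physical, accesses


# A tiny hypothetical hierarchy mapping exactly one page to frame 0x77.
root = {1: {2: {3: {4: 0x77}}}}
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
physical, accesses = walk(root, vaddr)
print(hex(physical), accesses)  # 0x77005 5
```

With a TLB hit, the four dependent reads in the loop are skipped and only the final data access remains.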
Finally, do note that even this explanation is somewhat of a simplification, as it ignores, among other things, data caching. The detailed mechanics of modern desktop CPUs are complex and, to a degree, undisclosed. For a more detailed discussion on the topic, refer to this thread, for instance.
Page-table accesses can be and are cached by data caches on modern CPUs, but the next access in a page-walk depends on the result of the first access (a pointer to the next level of the page table), so a 4-level page walk would have about 4x 4 cycle = 16 cycle latency even if all accesses hit in L1d cache. That would be a lot more for the pipeline to hide than the ~3 to 4 cycle TLB latency that's part of an L1d cache hit load in a modern Intel CPU (which of course uses TLBs for data and instruction accesses).
You are right in your assumption that approach with TLB still requires 2 accesses. But the approach with TLB is faster because:
TLB is made of faster memory called associative memory
Usually we make 2 accesses to physical memory, but with a TLB there is 1 access to the TLB and the other access is to physical memory.
Associative memory is faster because it is content-addressable memory, but it's expensive too, because of the extra logic circuits required.
You can read about the content addressable memory here.
It depends upon the specific implementation. In general, the TLB is a cache that exists within the CPU.
You still need to access the TLB to find the physical address, and then once you have that, you still need to actually access the data at the physical address, which is two lookups just like with the page table.
The CPU can access the cache much faster than it can access data through the memory bus. It is making two accesses to two different places (one faster and one slower). Also, it is possible for the memory location to be cached within the CPU as well, in which case no accesses are required to go through the memory bus.
I think #ihohen said pretty much everything, but as a student writing for future students who may come here, an explanation in simple words is:
Without a TLB, in single-level paging you need 2 accesses to main memory: 1 for finding the translation of the logical address in the page table (which is placed in main memory) and 1 more for actually accessing the memory block.
Now with a TLB, you reduce the above to only one access (the second one), because the step of finding the translation will (hopefully) take place without needing to access main memory: you will find the translation in the TLB, which is located in the CPU.
So when we say that a TLB halves access time, we mean that, approximately, if we ignore the case of a TLB miss and consider the simplest model of paging (the single-level one), then it is fair to say that a TLB speeds up the process by a factor of 2.
There will be many variations, because first and foremost today's computers use advanced paging techniques (multilevel paging, demand paging, etc.), but this is an intuitive explanation as to why the idea of a TLB is much more helpful than a simple page table.
The book "Operating Systems" by Silberschatz gives another (slightly more detailed) formula for measuring access time with a TLB:
Consider:
h : TLB hit ratio
τ : time to access main memory
e : time spent searching the TLB for an entry
t = h * (e + τ) + (1-h)*(e + 2τ)
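Plugging numbers into that formula shows how close a good hit ratio gets you to a single memory access. In this small sketch the 100 ns and 1 ns figures are the rough orders of magnitude quoted earlier in this thread, and the 98% hit ratio is just an assumed value for the example.

```python
def effective_access_time(h, tau, e):
    """t = h * (e + tau) + (1 - h) * (e + 2 * tau), single-level paging.

    h:   TLB hit ratio (0..1)
    tau: time to access main memory
    e:   time spent searching the TLB
    """
    hit_cost = e + tau         # TLB hit: TLB search + one memory access
    miss_cost = e + 2 * tau    # TLB miss: search + page table + data
    return h * hit_cost + (1 - h) * miss_cost


tau, e = 100.0, 1.0            # nanoseconds, rough figures
with_tlb = effective_access_time(0.98, tau, e)   # assumed 98% hit ratio
without_tlb = 2 * tau          # always page table + data
print(round(with_tlb, 2), without_tlb)           # 103.0 200.0
```

So with these assumed numbers the average access costs roughly 103 ns instead of 200 ns, i.e. close to the "factor of 2" intuition above.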

memory segments and physical RAM [closed]

The memory map of a process appears to be fragmented into segments (stack, heap, bss, data, and text),
I was wondering are these segments just an abstraction for the convenience of the process and the physical RAM is just a linear array of addresses or is the physical RAM also fragmented into these segments?
Also if the RAM is not fragmented and is just a linear array then how does the OS provide the process the abstraction of these segments?
Also how would programming change if the memory map to a process appeared as just a linear array and not divided into segments (with the MMU translating virtual addresses into physical ones)?
In a modern OS supporting virtual memory, it is the address space of the process that is divided into these segments. And in the general case that address space is projected onto the physical RAM in an essentially arbitrary fashion (with some fixed granularity, 4K typically). Address space pages located next to each other do not have to be projected onto neighboring physical pages of RAM. Physical pages of RAM do not have to maintain the same relative order as the process's address space pages. This all means that there is no such separation into segments in RAM and there can't possibly be.
In order to optimize memory access an OS might (and typically will) try to map sequential pages of the process address space to sequential pages in RAM, but that's just an optimization. In the general case, the mapping is unpredictable. On top of that, the RAM is shared by all processes in the system, with RAM pages belonging to different processes being arbitrarily interleaved in RAM, which eliminates any possibility of having such "segments" in RAM. There's no process-specific ordering or segmentation in RAM. RAM is just a cache for the virtual memory mechanism.
Again, every process works with its own virtual address space. This is where these segments can exist. The process has no direct access to RAM. The process doesn't even need to know that RAM exists.
These segments are largely a convenience for the program loader and operating system (though they also provide a basis for coarse-grained protection; execution permission can be limited to text and writes prohibited from rodata).[1]
The physical memory address space might be fragmented but not for the sake of such application segments. For example, in a NUMA system it might be convenient for hardware to use specific bits to indicate which node owns a given physical address.
For a system using address translation, the OS can somewhat arbitrarily place the segments in physical memory. (With segmented translation, external fragmentation can be a problem; a contiguous range of physical memory addresses may not be available, requiring expensive moving of memory segments. With paged translation, external fragmentation is not possible. Segmented translation has the advantage of requiring less translation information: each segment requires only a base and bound plus other metadata, whereas with paging a memory section would typically span many more than two pages, each of which has its own base address and metadata.)
Without address translation, placement of segments would necessarily be less arbitrary. Fortunately, most programs do not care about the specific address where segments are placed. (Single address space OSes
(Note that it can be convenient for sharable sections to be in fixed locations. For code this can be used to avoid indirection through a global offset table without requiring binary rewriting in the program loader/dynamic linker. This can also reduce address translation overhead.)
Application-level programming is generally sufficiently abstracted from such segmentation that its existence is not noticeable. However, pure abstractions are naturally unfriendly to intense optimization for physical resource use, including execution time.
In addition, a programming system may choose to use a more complex placement of data (without the application programmer needing to know the implementation details). For example, use of coroutines may encourage using a cactus/spaghetti stack where contiguity is not expected. Similarly, a garbage collecting runtime might provide additional divisions of the address space, not only for nurseries but also for separating leaf objects, which have no references to collectable memory, from non-leaf objects (reducing the overhead of mark/sweep). It is also not especially unusual to provide two stack segments, one for data whose address is not taken (or at least is fixed in size) and one for other data.
[1] One traditional layout of these segments (with a downward growing stack) in a flat virtual address space for Unix-like OSes places text at the lowest address, rodata immediately above that, initialized data immediately above that, zero-initialized data (bss) immediately above that, heap growing upward from the top of bss, and stack growing downward from the top of the application's portion of the virtual address space.
Having heap and stack growing toward each other allows arbitrary growth of each (for a single thread using that address space!). This placement also allows a program loader to simply copy the program file into memory starting at the lowest address, groups memory by permission, and can sometimes allow a single global pointer to address all of the global/static data range (rodata, data, and bss).
The memory map to a process appears fragmented into segments (stack, heap, bss, data, and text)
That's the basic mapping used by Unix; other operating systems use different schemes. Generally, though, they split the process memory space into separate segments for executing code, stack, data, and heap data.
I was wondering are these segments are just abstraction for the processes for convenience and the physical RAM is just a linear array of addresses or the physical RAM is also fragmented into these segments?
Depends.
Yes, these segments are created and managed by the OS for the benefit of the process. But physical memory can be arranged as linear addresses, or banked segments, or non-contiguous blocks of RAM. It's up to the OS to manage the total system memory space so that each process can access its own portion of it.
Virtual memory adds yet another layer of abstraction, so that what looks like linear memory locations are in fact mapped to separate pages of RAM, which could be anywhere in the physical address space.
Also if the RAM is not fragmented and is just a linear array then how the OS provides the process the abstraction of these segments?
The OS manages all of this by using virtual memory mapping hardware. Each process sees contiguous memory areas for its code, data, stack, and heap segments. But in reality, the OS maps the pages within each of these segments to physical pages of RAM. So two identical running processes will see the same virtual address space composed of contiguous memory segments, but the memory pages comprising these segments will be mapped to entirely different physical RAM pages.
But bear in mind that physical RAM may not actually be one contiguous block of memory, but may in fact be split across multiple non-adjacent blocks or memory banks. It is up to the OS to manage all of this in a way that is transparent to the processes.
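As a rough illustration of the per-process mapping described above, here is a toy model in Python. The 4 KiB page size, the dict-based page tables, and all page and frame numbers are assumptions chosen for the example; a real MMU walks hardware-defined tables rather than a dictionary.

```python
PAGE_SIZE = 4096   # assumed 4 KiB pages
OFFSET_BITS = 12   # log2(PAGE_SIZE)

def translate(page_table, vaddr):
    """Map a virtual address to a physical one using a dict page table."""
    vpn = vaddr >> OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # byte offset inside the page
    if vpn not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[vpn] << OFFSET_BITS) | offset

# Two hypothetical processes: same virtual pages, different physical frames.
process_a = {0x400: 0x1A2, 0x401: 0x07F}
process_b = {0x400: 0x0C3, 0x401: 0x2B0}

vaddr = 0x400123
print(hex(translate(process_a, vaddr)))  # 0x1a2123
print(hex(translate(process_b, vaddr)))  # 0xc3123
```

Both processes use the same virtual address, yet each lands in a different physical frame, and neither can name a frame that is absent from its own table.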
Also, how would programming change if the memory map of a process appeared as a plain linear array, not divided into segments, and the MMU just translated these virtual addresses into physical ones?
The MMU always operates that way, translating virtual memory addresses into physical memory addresses. The OS sets up and manages the mapping of each page of each segment for each process. Each time the process exceeds its stack allocation, for example, the OS traps a page fault and adds another page to the process's stack segment, mapping the virtual page to a physical page selected from available memory.
Virtual memory also allows the OS to swap process pages out to disk temporarily, so that the total amount of virtual memory used by all running processes can easily exceed the physical RAM of the system. Only the pages that are actually in use at the moment need to occupy physical RAM.
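The stack-growth case can be sketched with a toy model. The class name `ToyAddressSpace`, the frame numbers, and the stack bounds below are all made up for illustration, and the growth rule is deliberately simplified: a real kernel applies further checks before it extends a stack.

```python
PAGE_SIZE = 4096  # assumed page size

class ToyAddressSpace:
    """Toy model: pages inside the stack range are mapped on first touch."""

    def __init__(self, free_frames, stack_top, stack_limit):
        self.page_table = {}                 # virtual page number -> frame number
        self.free_frames = list(free_frames)
        self.stack_top = stack_top           # one past the highest stack address
        self.stack_limit = stack_limit       # lowest address the stack may reach

    def access(self, vaddr):
        vpn = vaddr // PAGE_SIZE
        if vpn not in self.page_table:
            self._handle_fault(vaddr)        # the "trap" into the OS
        return self.page_table[vpn] * PAGE_SIZE + vaddr % PAGE_SIZE

    def _handle_fault(self, vaddr):
        if not (self.stack_limit <= vaddr < self.stack_top):
            raise MemoryError(f"segmentation fault at {vaddr:#x}")
        if not self.free_frames:
            raise MemoryError("out of physical memory")
        # Grow the stack: back the touched page with a free physical frame.
        self.page_table[vaddr // PAGE_SIZE] = self.free_frames.pop(0)

space = ToyAddressSpace(free_frames=[7, 3, 9],
                        stack_top=0x80000000, stack_limit=0x7FFF0000)
print(hex(space.access(0x7FFFFFF8)))  # first touch maps frame 7
print(hex(space.access(0x7FFFEFF0)))  # next page down maps frame 3
```

An access outside the permitted stack range is not mapped and raises an error instead, which is the toy equivalent of the process being killed.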
I was wondering whether these segments are just an abstraction for the
convenience of processes, with physical RAM being a plain linear array of
addresses, or whether physical RAM is also fragmented into these segments?
This in fact depends heavily on the architecture. Some architectures have hardware support (e.g. descriptor registers on x86) for splitting the RAM into segments. Others just keep this information in software (the OS kernel's bookkeeping for the process). Also, some segment information is totally irrelevant at execution time and is used merely for loading code and data (e.g. relocation segments).
Also, if the RAM is not fragmented and is just a linear array, how does
the OS provide the process with the abstraction of these segments?
Process code never refers to segments; it only knows about addresses, so the OS has nothing to abstract.
Also, how would programming change if the memory map of a process
appeared as a plain linear array, not divided into segments,
and the MMU just translated these virtual addresses into
physical ones?
Programming would not be affected. When you program in C you don't define any of these segments, and your code doesn't reference them either. The segments exist to keep the layout orderly, and they don't even need to be the same across operating systems.

memory management and segmentation faults in modern day systems (Linux)

In modern-day operating systems, memory is available as an abstracted resource. A process is exposed to a virtual address space (which is independent from address space of all other processes) and a whole mechanism exists for mapping any virtual address to some actual physical address.
My doubt is:
If each process has its own address space, then it should be free to access any address in it. So apart from permission-restricted sections such as .data, .bss, .text etc., one should be free to change the value at any address. But this usually gives a segmentation fault. Why?
For acquiring the dynamic memory, we need to do a malloc. If the whole virtual space is made available to a process, then why can't it directly access it?
Different runs of a program result in different addresses for variables (both on the stack and on the heap). Why is that, when the environment for each run is the same? Does it not affect the amount of addressable memory available for use? (Does it have something to do with address space randomization?)
Some links on memory allocation (e.g. in heap).
The data available at different places is very confusing, as they talk about old and modern times, often not distinguishing between them. It would be helpful if someone could clarify the doubts while keeping modern systems in mind, say Linux.
Thanks.
Technically, the operating system is able to allocate any memory page on access, but there are important reasons why it shouldn't or can't:
Different memory regions serve different purposes:
code. It can be read and executed, but shouldn't be written to.
literals (strings, const arrays). This memory is read-only and should be.
the heap. It can be read and written, but not executed.
the thread stack. There is no reason for two threads to access each other's stack, so the OS might as well forbid that. Moreover, the thread stack can be de-allocated when the thread ends.
memory-mapped files. Any changes to this region should affect a specific file. If the file is open for reading, the same memory page may be shared between processes because it's read-only.
the kernel space. Normally the application should not (or cannot) access that region; only kernel code can. It's basically a scratch space for the kernel and it's shared between processes. The network buffer may reside there, so that it's always available for writes, no matter when the packet arrives.
...
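On Linux, the regions listed above and their permissions can be inspected in `/proc/<pid>/maps`. The following is a small sketch that parses one line of that file; the sample line is made up for illustration, and the live dump only runs where `/proc/self/maps` exists.

```python
import os

def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split(None, 5)   # address perms offset dev inode [pathname]
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]              # e.g. "r-xp": read, no write, execute, private
    name = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, name

# Hypothetical sample line in the format Linux uses.
sample = "7ffd1c2e0000-7ffd1c301000 rw-p 00000000 00:00 0          [stack]"
print(parse_maps_line(sample))

def dump_own_maps(path="/proc/self/maps"):
    """Print the regions of the current process (Linux only)."""
    with open(path) as maps:
        for line in maps:
            start, end, perms, name = parse_maps_line(line)
            print(f"{start:#x}-{end:#x} {perms} {name}")

if os.path.exists("/proc/self/maps"):
    dump_own_maps()
```

In typical output the code regions show up as `r-xp`, read-only data as `r--p`, and the heap and stack as `rw-p`. Any address that falls in the gaps between the listed ranges is simply not mapped.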
The OS might assume that all unrecognised memory access is an attempt to allocate more heap space, but:
if an application touches kernel memory from user code, it must be killed. On 32-bit Windows, all memory at or above 1<<31 (top bit set) or, depending on the boot configuration, at or above 3<<30 (top two bits set) is kernel memory. You should not assume that any unallocated memory region is in user space.
if an application thinks about using a memory region but doesn't tell the OS, the OS may allocate something else to that memory (OS: sure, your file is at 0x12341234; App: but I wanted to store my data there). You could tell the OS by touching the end of your array (which is unreliable anyways), but it's easier to just call an OS function. It's just a good idea that the function call is "give me 10MB of heap", not "give me 10MB of heap starting at 0x12345678"
If the application allocates memory by using it then it typically does not de-allocate at all. This can be problematic as the OS still has to hold the unused pages (but the Java Virtual Machine does not de-allocate either, so hey).
Different runs of a program result in different addresses for variables
This is called address space layout randomisation (ASLR) and is used, alongside proper permissions (stack space is not executable), to make buffer overflow attacks much more difficult. An attacker can still crash the app, but not execute arbitrary code.
Some links on memory allocation (e.g. in heap).
Do you mean, what algorithm the allocator uses? The easiest algorithm is to always allocate at the first available position, link each memory block to the next, and store a flag saying whether it is a free block or a used block. More advanced algorithms always allocate blocks whose size is a power of two or a multiple of some fixed size to prevent memory fragmentation (lots of small free blocks), or link the blocks in different structures to find a free block of sufficient size faster.
An even simpler approach is to never de-allocate and just point to the first (and only) free block and hold its size. If the remaining space is too small, throw it away and ask the OS for a new one.
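A minimal sketch of the first-fit idea over a simulated region might look like this. The class name `FirstFitHeap` is hypothetical, and a Python list in address order stands in for the links between blocks; merging adjacent free blocks on `free` is an extra step that real allocators usually add.

```python
class FirstFitHeap:
    """Toy first-fit allocator over a simulated region of `size` bytes."""

    def __init__(self, size):
        # Each block is [offset, size, is_free]; list order is address order.
        self.blocks = [[0, size, True]]

    def alloc(self, size):
        for i, (offset, block_size, is_free) in enumerate(self.blocks):
            if is_free and block_size >= size:
                self.blocks[i] = [offset, size, False]
                if block_size > size:
                    # Split off the unused tail as a new free block.
                    self.blocks.insert(i + 1, [offset + size, block_size - size, True])
                return offset
        return None  # no free block is large enough

    def free(self, offset):
        for i, block in enumerate(self.blocks):
            if block[0] == offset and not block[2]:
                block[2] = True
                # Merge with a free successor, then with a free predecessor.
                if i + 1 < len(self.blocks) and self.blocks[i + 1][2]:
                    block[1] += self.blocks.pop(i + 1)[1]
                if i > 0 and self.blocks[i - 1][2]:
                    self.blocks[i - 1][1] += self.blocks.pop(i)[1]
                return
        raise ValueError("offset does not name an allocated block")

heap = FirstFitHeap(100)
a = heap.alloc(30)   # offset 0
b = heap.alloc(20)   # offset 30
heap.free(a)
c = heap.alloc(10)   # reuses the hole at offset 0
d = heap.alloc(25)   # the 20-byte hole is too small, so this lands at offset 50
print(a, b, c, d)
```

The last allocation shows how fragmentation arises: 20 bytes are free at offset 10, but they cannot satisfy a 25-byte request.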
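The never-free approach can be sketched as follows. `BumpAllocator` and `fake_os` are hypothetical names: `fake_os` merely hands out consecutive made-up addresses and stands in for whatever call really obtains memory from the OS.

```python
def fake_os(start=0x10000):
    """Stand-in for the OS: returns the base address of each new chunk."""
    next_base = start
    def request(size):
        nonlocal next_base
        base = next_base
        next_base += size
        return base
    return request

class BumpAllocator:
    """Never frees: keeps one free block and carves allocations off its front."""

    def __init__(self, chunk_size, request_chunk):
        self.chunk_size = chunk_size
        self.request_chunk = request_chunk
        self.base = request_chunk(chunk_size)
        self.used = 0

    def alloc(self, size):
        if size > self.chunk_size:
            raise ValueError("request larger than a whole chunk")
        if self.chunk_size - self.used < size:
            # Remaining space is too small: throw it away and get a new chunk.
            self.base = self.request_chunk(self.chunk_size)
            self.used = 0
        addr = self.base + self.used
        self.used += size
        return addr

arena = BumpAllocator(64, fake_os())
print(hex(arena.alloc(40)))  # 0x10000
print(hex(arena.alloc(40)))  # 0x10040, the 24 leftover bytes are discarded
print(hex(arena.alloc(8)))   # 0x10068
```

Allocation is a comparison and an addition, which is why this scheme is fast, at the price of wasting the tail of every chunk and never returning memory.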
There's nothing magical about memory allocators. All they do is ask the OS for a large region and partition it into smaller chunks, without wasting too much space or taking too long.
Anyways, the Wikipedia article about memory allocation is http://en.wikipedia.org/wiki/Memory_management .
One interesting algorithm is called "(binary) buddy blocks". It holds several pools of a power-of-two size and splits them recursively into smaller regions. Each region is then either fully allocated, fully free or split in two regions (buddies) that are not both fully free. If it's split, then one byte suffices to hold the size of the largest free block within this block.
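One possible way to realise this scheme is a complete binary tree stored in an array, where each node records the size of the largest free block beneath it. The sketch below is a toy under that assumption: the class name `BuddyAllocator` is hypothetical, sizes are in abstract units, and the tree stores plain integers rather than a compact one-byte encoding.

```python
class BuddyAllocator:
    """Toy binary buddy allocator over `size` units (size is a power of two)."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0
        self.size = size
        # longest[i] = size of the largest free block inside node i's range.
        self.longest = [0] * (2 * size - 1)
        node_size = size * 2
        for i in range(2 * size - 1):
            if (i + 1) & i == 0:      # first node of a new tree level
                node_size //= 2
            self.longest[i] = node_size

    def alloc(self, size):
        size = 1 << (max(size, 1) - 1).bit_length()   # round up to a power of two
        if self.longest[0] < size:
            return None
        index, node_size = 0, self.size
        while node_size != size:      # descend into a child that still fits
            left = 2 * index + 1
            index = left if self.longest[left] >= size else left + 1
            node_size //= 2
        self.longest[index] = 0       # mark this block fully allocated
        offset = (index + 1) * node_size - self.size
        while index:                  # refresh the ancestors
            index = (index - 1) // 2
            self.longest[index] = max(self.longest[2 * index + 1],
                                      self.longest[2 * index + 2])
        return offset

    def free(self, offset):
        index, node_size = offset + self.size - 1, 1
        while self.longest[index] != 0:   # climb from the leaf to the allocated node
            if index == 0:
                raise ValueError("offset is not allocated")
            index = (index - 1) // 2
            node_size *= 2
        self.longest[index] = node_size
        while index:                  # merge buddies that are both fully free
            index = (index - 1) // 2
            node_size *= 2
            left = self.longest[2 * index + 1]
            right = self.longest[2 * index + 2]
            self.longest[index] = node_size if left + right == node_size else max(left, right)

pool = BuddyAllocator(16)
a = pool.alloc(4)    # offset 0
b = pool.alloc(3)    # rounded up to 4, offset 4
c = pool.alloc(8)    # offset 8
print(a, b, c, pool.alloc(1))
```

After `a` and `b` are freed, the two 4-unit buddies merge back into one 8-unit block, so a subsequent `pool.alloc(8)` succeeds at offset 0.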

Resources