Mapping and allocating memory

I am a little confused by the term mapping. For example, when we say we are mapping memory for a database, does it mean that we are assigning a specific amount of memory at some memory location to that database?
Also, is allocating memory a synonym for reserving memory?
I encounter these two terms very often, and they aren't very clear to me.
If someone can clarify these two terms, I will be very thankful.

This might be a question better asked of the software community on Stack Overflow. However, as a computer scientist, I would say that these terms aren't always used accurately and precisely.
In general, allocating memory means making memory available to a program for an active purpose, such as allocating buffers to hold a file or an in-memory structure right now.
Reserving memory is often used to mean the same thing. However, it is sometimes more passive: for example, reserving memory in case there is a future requirement, or to protect against too much memory being allocated for a different purpose.
Often the term 'mapping' is used for a file. It may mean exactly the same as allocating, or it may mean more: mapping may use an underlying mechanism provided by virtual memory management, where part of virtual memory is 'mapped' to the file without actually reading the file into physical memory. The trick is that, as the memory-mapped file is accessed, the block/page being accessed is read in 'invisibly' to the process when necessary. This uses a mechanism called demand paging. Its benefit is that a program can access the file as if it were all read into memory, but only the parts actually accessed are retrieved from the persistent storage system (disk, flash, whatever), which can be a huge win if only small parts of the file are needed.
Further, it simplifies the program, which can be written as if the whole file were in memory. Instead of the application developer keeping track of which parts of the file have been loaded into memory, the operating system does that.
Even better, the operating system can be asked to track which blocks/pages have had their contents changed, and to periodically write those back out to persistent storage. This can simplify the application program even further.
This is popular with some databases.
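As a rough illustration of that mechanism, here is a minimal POSIX sketch using mmap (the file name data.db is made up for illustration); nothing is read from disk until a mapped page is actually touched, and msync asks the OS to write modified pages back:

    #include <sys/mman.h>   // mmap, msync, munmap
    #include <sys/stat.h>   // fstat
    #include <fcntl.h>      // open
    #include <unistd.h>     // close
    #include <cstdio>

    int main() {
        int fd = open("data.db", O_RDWR);           // hypothetical file
        if (fd < 0) { std::perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        // Map the whole file into virtual memory. No data is read yet;
        // pages are faulted in on first access (demand paging).
        char* p = static_cast<char*>(
            mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

        p[0] = 'x';                                  // touches (and pages in) only one page

        msync(p, st.st_size, MS_SYNC);               // write dirty pages back to the file
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }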

Mapping basically means assigning, except that we often want a one-to-one mapping, as in the case of functions. If you define the function of an object, physical or just logical, and define its relationships and how it changes under transformation, then you have mapped it.

Related

How to access heap space of a program by using another program?

Consider a case where I have executed a program and created an instance of a class:
MyClass mClass = new MyClass();
After execution, the object will be stored in heap space. Now I want to write another program that can access that heap space to retrieve the data from the previously created instance of the first program.
Can I do it?
Thank you
As far as I know, in practice, no. If you give the other program admin privileges, it can read the memory of another program, but as far as I know there is no way to know for sure where the heap of that program resides. (There are probably hacky ways to achieve this, but it's not going to be pretty or reliable.)
However, it is possible for a process to establish a region of shared memory that another process can read, or use sockets, but this requires co-operation between the processes. Also, it still doesn't give a process direct access to another process' heap - your program can only see what the other process lets it see.
Note that, while you can't change the behavior of new (as far as I know), there is nothing preventing you from writing code to manage your heap manually; in that sense it would be possible to place the heap directly in the shared memory region. Whether that would be wise or not is another question and, obviously, highly dependent on the context.
In order to fully understand how operating systems manage memory, you'll have to understand virtual memory and memory-management hardware (you'll probably want to go deeper than Wikipedia, though).
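For the cooperative shared-memory route, a minimal POSIX sketch (the segment name /demo_heap and the 4 KB size are invented for illustration; older Linux systems need linking with -lrt); one process creates and fills the segment, and a cooperating process can open it by name and read it:

    #include <sys/mman.h>   // shm_open, mmap
    #include <fcntl.h>      // O_CREAT, O_RDWR
    #include <unistd.h>     // ftruncate
    #include <cstring>
    #include <cstdio>

    int main() {
        // Writer side: create a named shared-memory object.
        int fd = shm_open("/demo_heap", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { std::perror("shm_open"); return 1; }
        ftruncate(fd, 4096);                         // size the segment

        void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

        // Place data where the cooperating process knows to look for it.
        std::strcpy(static_cast<char*>(p), "hello from process A");

        // A cooperating reader would do:
        //   int fd = shm_open("/demo_heap", O_RDONLY, 0);
        //   void* p = mmap(nullptr, 4096, PROT_READ, MAP_SHARED, fd, 0);
        //   std::printf("%s\n", static_cast<char*>(p));
        return 0;
    }

Note that this only works because both processes agree on the segment name and layout; it is cooperation, not access to the other process's private heap.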
Well, I think you can use sockets for accessing the heap space. I hope this works.

Can RAM be conceptually divided into 'stack' and 'heap memory'?

I have been reading up on how processes are implemented by computers, and found mention of stacks and heaps. This was pretty cool to me, since a lack of any kind of computer science background means that these things have been pretty esoteric to me.
My current understanding is that, on an extremely basic level, a process is represented in RAM as a fixed-size 'stack' and associated variably-sized data structures that are collectively known as 'heap' space. Frames are added to the stack and may result in creating, editing, or deleting data stored in heap space, thereby changing a process's 'state'.
So my question is, can all RAM usage be categorized either as being part of a heap or part of a stack?
What else could be stored in RAM that doesn't fall into either of these categories?
Yes. In extremely simple systems, especially old ones that do not support multitasking, all user memory can be used by a combination of user code, user data, user heap data, and user stack data. It's up to the programmer what to use the memory for; the memory doesn't mind. But, naturally, there needs to be code and, in most cases, a stack. Everything else is optional.

Constant memory dumps

Is it possible to constantly dump the memory of a process to record every change that happens? For example, if I have a program that modifies the contents of an array, I'd like to know the contents of that array before some modification. I imagine a program could save the initial memory and then all changes to a file, and I'd just search the file for the modified contents of the array, which I know. Then I'd look for changes at that specific memory location before that moment and find the initial contents.
Does a program like that exist? If so, what program would you recommend?
EDIT: I wrote a program in C++ that captures packets of another process using pcap and I would like to know how these packets are constructed inside that program. I'm using Windows.
Notice that memory contents can change much faster than a disk is capable of writing.
Also, your question is OS-specific. I guess that you are using Linux.
In any case, design your application around these goals very early.
Perhaps you are looking for application checkpointing. If on Linux, consider BLCR.
Perhaps you are looking for some persistence mechanism. A possible way might be to explicitly persist the state of your application at certain points in your program that are executed frequently. Persisting the call stack or continuations is a difficult issue.
You may want to use textual formats (like JSON) for serialization. You could also be interested in database technology, either relational/SQL (e.g. SQLite or PostgreSQL) or NoSQL (e.g. MongoDB).
Persistence and checkpointing may be related to garbage collection algorithms (notably copying GC).
Some language implementations are able to persist their entire heap. For example, in Common Lisp, the SBCL implementation offers save-lisp-and-die.
For debugging, you might want watchpoints, or the gcore(1) command.
Notice that if you fork(2) your process and immediately make the child process sleep or idle, you keep a snapshot of your address space in that child process.
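A tiny sketch of that fork-and-idle trick (purely illustrative); the idle child keeps a copy-on-write image of the parent's address space exactly as it was at the moment of the fork:

    #include <sys/types.h>  // pid_t
    #include <unistd.h>     // fork, pause, _exit
    #include <cstdio>

    pid_t snapshot() {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: just idle. Its copy-on-write image preserves the
            // parent's memory as it was when fork() was called.
            pause();
            _exit(0);
        }
        return pid;   // parent keeps the child's pid so the snapshot can be inspected or killed later
    }

    int main() {
        int value = 42;
        pid_t snap = snapshot();
        value = 99;   // the snapshot child still "sees" 42 in its copy
        std::printf("snapshot pid %d, current value %d\n", (int)snap, value);
        return 0;
    }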
Read also about transactional memory and ACID properties.

In what situations does Static Allocation fare better than Dynamic Allocation?

I was going through some of the decisions made in Xara Xtreme, an open-source SVG graphics application. Their memory management decision was quite intriguing to me, since I naively took it for granted that on-demand dynamic allocation was the way to write an object-oriented application.
The explanation from the documentation is:
How on earth can static allocations be efficient?
If you are used to large dynamic data structures, this may seem strange to you. Firstly, all our objects (and thus allocation size) are far smaller (on average) than each dynamic area allocation within a program such as Impression. This means that though there are likely to be many holes within memory, they are small. Also, we have far more allocated objects within memory, and thus these holes quickly get filled. Furthermore, virtual memory managers will free up any pages of memory that contain no allocations and give this memory back to the operating system so that it may be used again (either by us, or by another task).
We benefit greatly from the fact that whenever we allocate memory in this manner, we do not have to move any memory about. This proved a bottleneck in ArtWorks, which also had many small allocations being used concurrently.
In brief, the presence of plenty of small objects and the need to prevent memory move are the reasons given for choosing static allocation. I don't have clear understanding about the reasons mentioned.
Though this talks about static allocation, what I see from a cursory look at the code is that a block of memory is dynamically allocated at application start and kept alive until the application ends, roughly simulating static allocation.
Could you explain in what situations static allocation fares better than on-demand dynamic allocation, such that it would be considered the main mode of allocation in a serious application?
It's quicker because you avoid the overhead of calling a system routine to manage your storage. malloc() maintains a heap, so every request requires a scan for an appropriately-sized block, possibly resizing the block, updating the block list to mark this block as used, etc. If you're allocating a lot of small objects, this overhead can be excessive. With static allocation you can create an allocation pool and just maintain a simple bitmap to show which areas are in use. This assumes that each object is the same size, so you commonly create one pool per object type.
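A minimal sketch of such a pool, assuming one object type and a bitmap of used slots (the Node type and pool size are invented for illustration); all the storage is a single statically allocated array, and allocation is just a scan for a free bit:

    #include <bitset>
    #include <cstddef>

    struct Node { int key; Node* next; };      // example fixed-size object

    constexpr std::size_t kPoolSize = 1024;

    // One statically allocated block of storage plus a bitmap of used slots.
    static Node pool[kPoolSize];
    static std::bitset<kPoolSize> used;

    Node* pool_alloc() {
        for (std::size_t i = 0; i < kPoolSize; ++i)
            if (!used[i]) { used[i] = true; return &pool[i]; }
        return nullptr;                         // pool exhausted
    }

    void pool_free(Node* p) {
        used[p - pool] = false;                 // O(1) free: just clear the slot's bit
    }

No block headers, no searching variable-sized free lists, and no memory ever has to be moved.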
In short, there's really no such thing as static allocation other than the space allocated for your functions themselves and other read-only kinds of memory. (Do an assemble-only "gcc -S" and look for all the memory blocks, if you're interested.) If you're making and breaking objects, you're dynamically allocating. That being said, there's nothing to stop you from tightly controlling the allocation mechanism itself.
That's what functions like mallinfo() and mallopt() do for controlling how malloc() does its magic. However, that might not even be good enough for you. If you know all your chunks are going to be the same size, you can allocate and deallocate much more efficiently. And if you know you have 3 sizes of stuff, you can keep 3 arenas of memory each with their own allocator.
On top of this, there is the situation at runtime where the process doesn't have enough room and needs to ask the OS for more; that involves a system call, which is more expensive than just incrementing an array index. On Unix, it's usually brk() or sbrk() or the like. And that can take valuable time.
Another, rarer situation, would be if you need to multiply-allocate things. Like 3 threads need to share information and only when all 3 release it does it get freed. That's something nonstandard and not generally covered by typical mallopt() or even pthread-specific memory or mutex/semaphore-locked chunks.
So if you have high speed optimization issues or you are running on an embedded system where you need to squeeze all you can out of the available memory, then "static allocation", or at least controlling the allocation mechanism, may be the way to go.

How does shared memory vs message passing handle large data structures?

In looking at Go and Erlang's approach to concurrency, I noticed that they both rely on message passing.
This approach obviously alleviates the need for complex locks because there is no shared state.
However, consider the case of many clients wanting parallel read-only access to a single large data structure in memory -- like a suffix array.
My questions:
Will using shared state be faster and use less memory than message passing, as locks will mostly be unnecessary because the data is read-only, and only needs to exist in a single location?
How would this problem be approached in a message passing context? Would there be a single process with access to the data structure and clients would simply need to sequentially request data from it? Or, if possible, would the data be chunked to create several processes that hold chunks?
Given the architecture of modern CPUs & memory, is there much difference between the two solutions -- i.e., can shared memory be read in parallel by multiple cores -- meaning there is no hardware bottleneck that would otherwise make both implementations roughly perform the same?
One thing to realise is that the Erlang concurrency model does NOT really specify that the data in messages must be copied between processes; it states that sending messages is the only way to communicate and that there is no shared state. As all data is immutable, which is fundamental, an implementation may very well not copy the data but just send a reference to it, or it may use a combination of both methods. As always, there is no best solution, and there are trade-offs to be made when choosing how to do it.
The BEAM uses copying, except for large binaries where it sends a reference.
Yes, shared state could be faster in this case, but only if you can forgo the locks, and that is only doable if the data is absolutely read-only. If it's 'mostly read-only' then you need a lock (unless you manage to write lock-free structures; be warned that they're even trickier than locks), and then you'd be hard-pressed to make it perform as fast as a good message-passing architecture.
Yes, you could write a 'server process' to share it. With really lightweight processes, it's no more heavy than writing a small API to access the data. Think like an object (in OOP sense) that 'owns' the data. Splitting the data in chunks to enhance parallelism (called 'sharding' in DB circles) helps in big cases (or if the data is on slow storage).
Even if NUMA is getting mainstream, you still have more and more cores per NUMA cell. And a big difference is that a message can be passed between just two cores, while a lock has to be flushed from cache on ALL cores, limiting it to the inter-cell bus latency (even slower than RAM access). If anything, shared-state/locks is getting more and more unfeasible.
In short: get used to message passing and server processes; it's all the rage.
Edit: revisiting this answer, I want to add a note about a phrase found in Go's documentation:
Share memory by communicating; don't communicate by sharing memory.
The idea is: when you have a block of memory shared between threads, the typical way to avoid concurrent access is to use a lock to arbitrate. The Go style is to pass a message containing the reference; a thread only accesses the memory after receiving the message. It relies on some measure of programmer discipline, but results in very clean-looking code that can be easily proofread, so it's relatively easy to debug.
The advantage is that you don't have to copy big blocks of data on every message, and you don't have to effectively flush caches as with some lock implementations. It's still somewhat early to say whether this style leads to higher-performance designs or not (especially since the current Go runtime is somewhat naive about thread scheduling).
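As a rough C++ analogy of that style (not Go code; the Channel class here is invented for illustration), ownership of the data can be moved through a small blocking queue so that only the thread that has received the message ever touches it:

    #include <condition_variable>
    #include <memory>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>
    #include <cstdio>

    // A tiny blocking queue: the "channel" that carries ownership of the data.
    template <typename T>
    class Channel {
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void send(T v) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
            cv_.notify_one();
        }
        T receive() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            T v = std::move(q_.front());
            q_.pop();
            return v;
        }
    };

    int main() {
        Channel<std::unique_ptr<std::vector<int>>> ch;

        std::thread consumer([&] {
            // Only after receiving the message does this thread touch the data.
            auto data = ch.receive();
            std::printf("received %zu elements\n", data->size());
        });

        auto big = std::make_unique<std::vector<int>>(1000000, 7);
        ch.send(std::move(big));   // ownership moves; the sender no longer uses it

        consumer.join();
        return 0;
    }

Nothing here stops the sender from keeping a raw pointer and mutating the data anyway; as with Go channels, the discipline is a convention, not something the compiler enforces.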
In Erlang, all values are immutable - so there's no need to copy a message when it's sent between processes, as it cannot be modified anyway.
In Go, message passing is by convention: there's nothing to prevent you from sending someone a pointer over a channel and then modifying the data it points to, only convention, so once again there's no need to copy the message.
Most modern processors use variants of the MESI protocol. Because of the Shared state, passing read-only data between different threads is very cheap. Modified shared data is very expensive, though, because all other caches that store this cache line must invalidate it.
So if you have read-only data, it is very cheap to share it between threads instead of copying with messages. If you have read-mostly data, it can be expensive to share between threads, partly because of the need to synchronize access, and partly because writes destroy the cache friendly behavior of the shared data.
Immutable data structures can be beneficial here. Instead of changing the actual data structure, you simply make a new one that shares most of the old data, but with the things changed that you need changed. Sharing a single version of it is cheap, since all the data is immutable, but you can still update to a new version efficiently.
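A minimal sketch of that idea as an immutable singly linked list (purely illustrative); a 'new version' is just a new head node whose tail is shared, unchanged, with the old version:

    #include <memory>
    #include <cstdio>

    // Immutable list node: once built, it is never modified.
    struct List {
        int value;
        std::shared_ptr<const List> tail;
    };

    // "Updating" means building a new head that shares all the old nodes.
    std::shared_ptr<const List> push(int v, std::shared_ptr<const List> tail) {
        return std::make_shared<const List>(List{v, std::move(tail)});
    }

    int main() {
        auto v1 = push(1, nullptr);
        auto v2 = push(2, v1);   // new version; v1's node is shared, not copied
        auto v3 = push(3, v2);   // all older versions remain valid and readable
        std::printf("%d %d %d\n", v3->value, v3->tail->value, v3->tail->tail->value);
        return 0;
    }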
What is a large data structure?
One person's large is another person's small.
Last week I talked to two people. One person was making embedded devices and used the word "large"; I asked him what it meant, and he said over 256 KBytes. Later in the same week another guy was talking about media distribution; he used the word "large", I asked him what he meant, and he thought for a bit and said "won't fit on one machine", say 20-100 TBytes.
In Erlang terms, "large" could mean "won't fit into RAM", so with 4 GBytes of RAM, data structures over 100 MBytes might be considered large, and copying a 500 MBytes data structure might be a problem. Copying small data structures (say < 10 MBytes) is never a problem in Erlang.
Really large data structures (i.e. ones that won't fit on one machine) have to be copied and "striped" over several machines.
So I guess you have the following:
Small data structures are no problem: since they are small, processing times are fast, copying is fast, and so on (just because they are small).
Big data structures are a problem: because they don't fit on one machine, copying is essential.
Note that your questions are technically nonsensical, because message passing can use shared state, so I shall assume that you mean message passing with deep copying to avoid shared state (as Erlang currently does).
Will using shared state be faster and use less memory than message passing, as locks will mostly be unnecessary because the data is read-only, and only needs to exist in a single location?
Using shared state will be a lot faster.
How would this problem be approached in a message passing context? Would there be a single process with access to the data structure and clients would simply need to sequentially request data from it? Or, if possible, would the data be chunked to create several processes that hold chunks?
Either approach can be used.
Given the architecture of modern CPUs & memory, is there much difference between the two solutions -- i.e., can shared memory be read in parallel by multiple cores -- meaning there is no hardware bottleneck that would otherwise make both implementations roughly perform the same?
Copying is cache unfriendly and, therefore, destroys scalability on multicores because it worsens contention for the shared resource that is main memory.
Ultimately, Erlang-style message passing is designed for concurrent programming whereas your questions about throughput performance are really aimed at parallel programming. These are two quite different subjects and the overlap between them is tiny in practice. Specifically, latency is typically just as important as throughput in the context of concurrent programming and Erlang-style message passing is a great way to achieve desirable latency profiles (i.e. consistently low latencies). The problem with shared memory then is not so much synchronization among readers and writers but low-latency memory management.
One solution that has not been presented here is master-slave replication. If you have a large data structure, you can replicate changes to it out to all slaves, which perform the update on their own copies.
This is especially interesting if one wants to scale to several machines that don't even have the possibility of sharing memory without very artificial setups (mmap of a block device that reads/writes from a remote computer's memory?).
A variant of this is to have a transaction manager that one asks nicely to update the replicated data structure, and it will make sure that it serves one and only one update request at a time. This is more like the mnesia model for master-master replication of mnesia table data, which qualifies as a "large data structure".
The problem at the moment is indeed that the locking and cache-line coherency might be as expensive as copying a simpler data structure (e.g. a few hundred bytes).
Most of the time, a cleverly written new multi-threaded algorithm that tries to eliminate most of the locking will be faster, and a lot faster with modern lock-free data structures, especially when you have well-designed cache systems like Sun's Niagara chip-level multi-threading.
If your system/problem is not easily broken down into a few simple data accesses, then you have a problem. And not all problems can be solved by message passing. This is why there are still some Itanium-based supercomputers sold: they have terabytes of shared RAM and up to 128 CPUs working on the same shared memory. They are an order of magnitude more expensive than a mainstream x86 cluster with the same CPU power, but you don't need to break down your data.
Another reason not mentioned so far is that programs can become much easier to write and maintain when you use multi-threading; message passing and the shared-nothing approach make them even more maintainable.
As an example, Erlang was never designed to make things faster, but instead to use a large number of threads to structure complex data and event flows.
I guess this was one of the main points in the design. In the web world of Google, you usually don't care about performance as long as it can run in parallel in the cloud. And with message passing, you ideally can just add more computers without changing the source code.
Usually message-passing languages (this is especially easy in Erlang, since it has immutable variables) optimise away the actual data copying between processes (for local processes only, of course: you'll want to think through your network distribution pattern wisely), so this isn't much of an issue.
The other concurrency paradigm is STM, software transactional memory. Clojure's refs are getting a lot of attention. Tim Bray has a good series exploring Erlang's and Clojure's concurrency mechanisms:
http://www.tbray.org/ongoing/When/200x/2009/09/27/Concur-dot-next
http://www.tbray.org/ongoing/When/200x/2009/12/01/Clojure-Theses
