What is "tagged memory"? - memory

What is "tagged memory" and how does it help in reducing program size?

You may be referring to a tagged union, or more specifically a hardware implementation such as the tagged architecture used in LISP machines. Basically, it is a method for storing data together with its type information.
In a LISP machine, this was done in-memory by using a longer word length and using some of the extra bits to store type information. Handling and checking of tags was done implicitly in hardware.
For a type-safe C++ implementation, see boost::variant.
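For illustration, here is a minimal hand-rolled tagged union in C++ (a sketch; in practice you would use boost::variant, or std::variant in modern C++, which manage the tag for you):

#include <cstdio>

// A tagged union: the `tag` field records which union member is
// currently valid, so readers can check it before touching the data.
struct Value {
    enum class Tag { Int, Float } tag;
    union {
        int   i;
        float f;
    };
};

void print(const Value& v) {
    switch (v.tag) {            // dispatch on the type tag
        case Value::Tag::Int:   std::printf("int: %d\n", v.i);   break;
        case Value::Tag::Float: std::printf("float: %f\n", v.f); break;
    }
}

int main() {
    Value v;
    v.tag = Value::Tag::Int;
    v.i = 42;
    print(v);                   // prints "int: 42"
}

A LISP machine does the same thing in hardware: the tag travels with the word, and the type check happens implicitly on every access.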

Not sure, but it is possible that you are referring to garbage collection, which is the process of automatically disposing of no longer used objects created when running a program.
"Tagged memory" can be a synonym for mark-and-sweep, which is the most basic way to implement garbage collection.
If this is all wrong, please edit your question to clarify.

The Windows DDK makes use of "pool tags" when allocating memory out of the kernel page pool. It costs 4 bytes of memory per allocation, but allows you to label (i.e. tag) portions of kernel memory which might help with debugging and detecting memory leaks.
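For example, a kernel-mode driver might tag its allocations like this (a WDK sketch, not buildable user-mode code; the tag value is arbitrary and shows up in pool-tracking tools such as poolmon):

#include <ntddk.h>

// The four bytes of the ULONG tag appear reversed in the tools,
// so 'tseT' is displayed as "Test".
#define MY_POOL_TAG 'tseT'

VOID Example(VOID) {
    // The tag is recorded alongside the allocation in the pool.
    PVOID p = ExAllocatePoolWithTag(NonPagedPool, 256, MY_POOL_TAG);
    if (p != NULL) {
        // ... use the buffer ...
        ExFreePoolWithTag(p, MY_POOL_TAG);
    }
}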
BTW I don't see how anything called "tagged memory" could reduce program code size. It sounds like extra work, which translates to "more code" and "bigger program." Maybe it's meant to reduce the memory footprint somehow?

Here's a more technical description going into the implementation details of how this is used for garbage collection. You may also want to check out the Wikipedia article on tagged pointers.
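To make the tagged-pointer idea concrete, here is a minimal C++ sketch (names invented for illustration): because heap allocations are aligned, the low bits of a pointer are always zero and can carry a small type tag, which is what many garbage-collected language runtimes do.

#include <cassert>
#include <cstdint>

// malloc/new typically return at least 8-byte-aligned addresses,
// so the low 3 bits of a pointer are free to hold a tag.
constexpr std::uintptr_t TAG_MASK = 0x7;

void* tag_pointer(void* p, unsigned tag) {
    assert(tag <= TAG_MASK);
    assert((reinterpret_cast<std::uintptr_t>(p) & TAG_MASK) == 0);
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) | tag);
}

unsigned get_tag(void* p) {
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) & TAG_MASK);
}

void* strip_tag(void* p) {
    return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) & ~TAG_MASK);
}

The tag must be stripped before the pointer is dereferenced, which is why this is usually hidden inside a runtime rather than exposed to application code.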

Related

Can a garbage collected language compile to a non-garbage collected one without including a garbage collector in the runtime?

As I understand it, when a managed language (like Haxe) can and wants to compile to a non-managed language (like C++), it includes some form of garbage collector in the runtime.
I was wondering if it would be possible to completely abstract away memory management in the intermediate representation / abstract syntax tree, so that a garbage collector would not be needed and the default behavior (stack allocations live until the end of scope, heap allocations live until freed) could be used instead?
Thank you!
If I understood you correctly, you're asking whether it's possible to take a garbage-collected language and compile it to an equivalent program in a non-garbage-collected language, without introducing memory errors or leaks, just by adding frees in the right places (i.e. no reference counting, no otherwise keeping track of references, no implementing a garbage collection algorithm, and nothing else at run time that could be considered garbage collection).
No, that is not possible. To do something like this, you'd have to be able to statically answer the question "What is the point in the program after which a given object is no longer referenced?" That is a non-trivial semantic property and thus undecidable per Rice's theorem.
You could define a sufficiently restricted subset of the language (something like "only one live variable may hold a strong reference to an object at a time and anything else must use weak references"), but programming in that subset would be so different from programming in the original language¹ that there wouldn't be much of a point in doing that.
¹ And perhaps more importantly: it would be highly unlikely that existing code would conform to that subset. So if there's a compiler that can compile my code to efficient GC-free native code, but only if I completely re-write my code to fit an awkward subset of the language, why wouldn't I just re-write the project in Rust instead? Especially since interop with libraries that aren't written in the subset would probably be infeasible as well.

Memory segmentation?

I now understand how virtual memory works and what is responsible for setting it up. However, some days ago I encountered memory segmentation, which splits up the address space into segments like data and text. I cannot find any clear, unambiguous resources (at least to me) that explain memory segmentation. For instance, I would like to know:
What is responsible for splitting up address spaces into segments?
How exactly does it work? For example, how are segments translated to physical addresses, and what checks whether an address within a certain segment has been accessed?
I have found this wiki article but it does not really answer such questions.
The term "segment" appears in at least two distinct memory contexts.
In ye olde days, segmentation was a method used for memory protection. Intel chips continued to use segments for decades after they were obsolete. Intel finally dropped segments in 64-bit mode, but they still exist in vestigial form, and they still exist in 32-bit mode.
That is the type of "segmentation" described in the wikipedia link.
The "code" and "data"-type segmentation is something entirely different. Another term for this is "program section."
When you link your code, the linker usually groups memory with the same attributes into "program sections" (aka "segments"). Typically you will have memory that:
Is read only/execute (code)
Is read/write and initialized to zero
Is read/write and initialized to specified values
Is read only
In order to control the grouping of related memory, linkers generally use named segments/program sections. A linker may, by default, create a program section/segment called "Code" and place all the executable code in that segment. It may create, by default, a segment called "Data" and place the read-only data in that segment.
Powerful linkers allow the programmer to override these. Some assembly languages and system languages allow you to specify program sections.
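For instance, with GCC or Clang on Linux the compiler picks a section for each definition based on exactly these attributes, and you can override the placement per symbol (the section name ".mydata" below is just an example):

// .rodata: read only
const char banner[] = "hello";

// .bss:    read/write, initialized to zero
int counter;

// .data:   read/write, initialized to specified values
int limit = 100;

// .text:   read only/execute (code)
int get_limit(void) { return limit; }

// Overriding the default placement for one symbol:
__attribute__((section(".mydata"))) int special = 7;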
"Segments" in this context only exist only in the linking process. There is no area in memory marked "Code" or "Data" (unless you are using the olde Intel system).
What is responsible for splitting up address spaces into segments?
The address space is not split up into segments of this second type on modern systems (i.e. those designed after 1970 and not from Intel). Some confusing books use this as a pedagogical concept in diagrams. A process can (and usually does) have code pages interspersed with data pages.
For example, how are segments translated to physical addresses, and what checks whether an address within a certain segment has been accessed?
That question relates to the use of the term "segment" described at the top. The translation is done using hardware registers.
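As a toy model of that first kind of segmentation: the hardware keeps a base and a limit for each segment, adds the base to every offset, and faults when the offset exceeds the limit. A simplified C++ sketch (real hardware also checks permission bits):

#include <cstdint>
#include <stdexcept>

struct Segment {
    std::uint32_t base;   // where the segment starts in physical memory
    std::uint32_t limit;  // size of the segment in bytes
};

// Translate a (segment, offset) pair to a physical address the way
// classic base+limit segmentation hardware does.
std::uint32_t translate(const Segment& s, std::uint32_t offset) {
    if (offset >= s.limit)
        throw std::out_of_range("fault: offset beyond segment limit");
    return s.base + offset;
}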
Well, to be honest, I would rather you consult books that cover the basics thoroughly than read articles, because articles tend to be narrow and pitched above the basic level (at least to me).
Every term in your question is a separate topic, and all of them are very well described in the reference below. If you really want answers and clear concepts, then you should go through this:
Abraham Silberschatz, "Operating System Concepts".
Chapter 8: Memory Management
Subtopics: Paging (basic method and hardware support), Segmentation

Pointers in Delphi

Pointers can still be used in Pascal, and I think they will be preserved for as long as Delphi is alive.
Even though I used pointers when I was learning Pascal, I still can't understand their real use; I manage to write all my Delphi programs without them (by other means).
What is the real use of pointers? I am asking about real-world usage: can we manage to do everything without pointers?
You use pointers a lot more often than you think you do in Delphi. The compiler just hides it.
var
  SL: TStringList;
...
SL := TStringList.Create;
// SL is now a pointer to an instance of the TStringList class.
// You don't know it because the compiler handles dereferencing
// it, so you don't have to use SL^. You can just use the var.
SL.Add('This is a string');
A string is also a pointer to a block of memory that stores the string. (It actually stores more than that, but...)
So are instances of records, PChars (which is a pointer to a character array, basically), and tons of other things you use every day. There are lots of pointers that aren't called Pointer. :)
Pointers contain the address of a memory location, and they are present everywhere: every variable you declare, and even the code you write, can be accessed using a pointer. Pointers are one of the most essential elements in Win32 programming. You should read the great article "Addressing pointers" by Rudy Velthuis to understand pointer usage.
To understand what pointers might be used for in modern day Delphi, one needs to understand how pointers have been historically used and how Delphi uses pointers behind the scenes.
Pointers to code
One can have a pointer to a piece of code. This can be used for the intuitive thing (parameterizing some algorithm with the functions it needs to call; example: TObjectList.Sort takes a function pointer as a parameter). Modern Delphi uses pointers-to-code to implement the following (without going into details):
Virtual Method Tables; we can't have OOP without VMTs
Interfaces
Events and anonymous methods.
Those are all very powerful methods of avoiding raw pointers to code; indeed, there's very little need for raw code pointers today.
Pointers to data
Everybody learned pointers using linked lists. Pointers are essential in the implementation of most non-trivial data structures; in fact, it's hard to name a useful data structure that is not implemented using pointers.
Delphi gives us lots of great abstractions for data pointers, so we can work without ever touching a pointer. We have objects (class instances) and good implementations of most data structures (string, TObjectList, dynamic arrays).
When do we use Pointers?
We essentially use pointers to implement more of the great stuff that Delphi provides for us. I could give examples of what I've used pointers for, but I find it more interesting to give examples of what others have used pointers for:
TVirtualTree: Makes good use of pointers to data.
Pascal Script for Delphi: Makes extensive use of raw pointers to code!
Let's start with a definition, taken from Wikipedia:
A pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address.
All computers address memory, and the machine language they execute must do so using pointers.
However, high level languages do not need to include pointers explicitly. Some examples of those that do not are LISP, Java, C#, Perl, Python, but there are many more.
I'm interpreting your question to be why languages support explicit pointer use, since all languages use pointers implicitly.
Delphi descends from Pascal, which is a rather primitive language when viewed from the 21st century. In Pascal, pointers are the only way to use references. Modern Pascal derivatives such as Delphi have many data types that are reference based, but implicitly so; for example, strings, object instances, interfaces and so on.
It is perfectly possible to write any program in Delphi without resorting to explicit pointers. The modern trend away from explicit pointers is down to the observation that explicit pointer code is more prone to errors than the alternatives.
I don't think there's any real reason to carry on using explicit pointer code in Delphi. Perhaps very time critical algorithms may push you that way, but I'm really struggling to think of anything that is significantly better implemented with pointers than with the alternatives.
Personally I avoid using explicit pointers wherever feasible. It generally makes code easier to write, verify and maintain, in my experience.
1) Classes are pointers. Strings are pointers. The compiler pretty much hides this from you but you do need to understand this when debugging.
2) Any sort of tree structure needs pointers.
I use a lot of custom data structures in my programs (for instance, in my own scripting language/interpreter, where I store the structures, or "records", used in this language; these can contain other records, and the nesting is unrestricted). I do this at the very lowest level: I allocate a number of bytes on the heap, and then I read, parse, and write to them completely manually. To this end, I need pointers to point to the bytes in the allocated blocks. Typically, the blocks consist of "records" that reference each other in some way. In addition, many of these structures can be written to file (that is, I have also designed my own binary file formats), simply by copying from the heap (in RAM) to the disk byte by byte.
Pointers (memory addresses) are a basic CPU type used to access memory. Whatever language you use to write an application, when it comes to the CPU it has to use "pointers" to work.
They are thus a low-level type which allows a huge degree of versatility, at the price of somewhat more complex management and the risk of accessing or writing the wrong memory if not used correctly.
You could let a language translate your data to pointers and memory buffers, or use pointers directly if the language allows for it. Being able to do it allows for direct and optimized memory access and management.
I thought I could add my salt to the soup, but the answers above say mostly everything. There's one more thing, though. I remember being scared when I saw all those referencing and dereferencing operators (in different languages) and all the magic that was done with pointers. I didn't dare to look the devil in the eye for YEARS. I preferred to copy data inefficiently instead of going down the pointer path.
Today, though, I love pointers. Working with pointers makes you aware of what happens under the hood, and it makes you aware of the memory you are playing with (responsibly). Practically, it allows you to code more efficiently and consciously! Last but not least, it turns out to be quite fun to play with such simple but powerful toys.

Low level programming: How to find data in a memory of another running process?

I am trying to write a statistics tool for a game by extracting values from the game process's memory (as there is no other way). The biggest challenge is to find the addresses that store the data I am interested in. What makes it even harder is dynamic memory allocation: I need to find not only the addresses that store the data but also the pointers to those memory blocks, because the addresses change every time the game restarts.
For now I am just manually searching the game memory using a memory editor (ArtMoney), looking for addresses whose values change as the data changes (or don't change). After an address is found, I look for a pointer that points to this memory block in a similar way.
I wonder what techniques/tools exist for such tasks? Maybe there are some articles I can read? Is mastering a disassembler the only way to go? Game trainers, for example, solve similar tasks in days, while I have already been struggling for weeks.
Thanks.
PS. It's all under windows.
Is mastering a disassembler the only way to go?
Yes; go download WinDbg from http://www.microsoft.com/whdc/devtools/debugging/default.mspx, or if you've got some money to blow, IDA Pro is probably the best tool for the job.
If you know how to code in C, it is easy to search for memory values. If you don't know C, this page might point you to a solution if you can code in C#. It would not be hard to port the C# they have to Java.
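As a rough sketch of what such a search looks like against the Win32 API (error handling kept minimal; the pid and target value are placeholders):

#include <windows.h>
#include <cstdio>
#include <cstring>
#include <vector>

// Walk the target process's address space with VirtualQueryEx and
// scan each committed, writable region for a 4-byte value.
void scan(DWORD pid, int target) {
    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                           FALSE, pid);
    if (h == NULL) return;

    unsigned char* addr = nullptr;
    MEMORY_BASIC_INFORMATION mbi;
    while (VirtualQueryEx(h, addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        if (mbi.State == MEM_COMMIT && (mbi.Protect & PAGE_READWRITE)) {
            std::vector<unsigned char> buf(mbi.RegionSize);
            SIZE_T got = 0;
            if (ReadProcessMemory(h, mbi.BaseAddress, buf.data(),
                                  buf.size(), &got) && got >= sizeof(int)) {
                for (SIZE_T i = 0; i <= got - sizeof(int); ++i) {
                    int v;
                    std::memcpy(&v, &buf[i], sizeof(v));
                    if (v == target)
                        std::printf("match at %p\n",
                            static_cast<unsigned char*>(mbi.BaseAddress) + i);
                }
            }
        }
        addr = static_cast<unsigned char*>(mbi.BaseAddress) + mbi.RegionSize;
    }
    CloseHandle(h);
}

This is essentially what tools like ArtMoney do under the hood, plus bookkeeping to re-scan only the previous matches as the value changes.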
You might take a look at DynInst (Dynamic Instrumentation). In particular, look at the Dynamic Probe Class Library (DPCL). These tools will let you attach to running processes via the debugger interface and insert your own instrumentation (via special probe classes) into them while they're running. You could probably use this to instrument the routines that access your data structures and trace when the values you're interested in are created or modified.
You might have an easier time doing it this way than doing everything manually. There are a bunch of papers on those pages you can look at to see how other people built similar tools, too.
I believe the Windows support is maintained, but I have not used it myself.

Virtual Memory

Most of the literature on virtual memory points out that, as an application developer, understanding virtual memory can help me harness its powerful capabilities. I have been involved in developing applications on Linux for some time, but I didn't care about virtual memory intricacies while coding. Am I missing something? If so, please shed some light on how I can leverage the workings of virtual memory. Otherwise, let me know if I am not making sense with the question!
Well, the concept is actually pretty simple. I won't repeat it here, but you should pick up any book on OS design and it will be explained there. I recommend "Operating System Concepts" by Silberschatz and Galvin; it's what I had to use at university, and it's good.
A couple of things I can think of that virtual memory knowledge might give you:
Learning to allocate memory on page boundaries to avoid waste (applies only to virtual memory, not the usual heap/stack memory);
Locking some pages in RAM so they don't get swapped to HDD;
Guard pages;
Reserving an address range and committing actual memory later (see the sketch after this answer);
Perhaps using the NX (non-executable) bit to increase security, but I'm not sure about this one;
PAE for accessing >4GB of memory on a 32-bit system.
Still, all of these things would have uses only in quite specific scenarios. Indeed, 99% of applications need not concern themselves about this.
Added: That said, it's definitely good to know all these things, so that you can identify such scenarios when they arise. Just beware: with power comes responsibility.
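To make a couple of those items concrete on Linux (a sketch with minimal error handling; the page size is assumed to be 4 KiB):

#include <sys/mman.h>

int main() {
    const size_t size = 1 << 20;  // 1 MiB of address space

    // Reserve an address range without committing usable memory:
    // PROT_NONE pages fault on any access until mprotect() opens them.
    void* p = mmap(nullptr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    // Commit (make accessible) just the first page.
    mprotect(p, 4096, PROT_READ | PROT_WRITE);

    // Lock that page in RAM so it cannot be swapped out.
    mlock(p, 4096);

    munlock(p, 4096);
    munmap(p, size);
    return 0;
}

(The Windows equivalents are VirtualAlloc with MEM_RESERVE/MEM_COMMIT and VirtualLock.)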
It's a bit of a vague question.
The chief way you can use virtual memory is through memory-mapped files. See the mmap() man page for more details (and the sketch at the end of this answer).
Although, you are probably using it implicitly anyway, as any dynamic library is implemented as a mapped file, and many database libraries use them too.
The interface to use mapped files from higher level languages is often quite inconvenient, which makes them less useful.
The chief benefits of using mapped files are:
No system call overhead when accessing parts of the file (this actually might be a disadvantage, as a page fault probably has as much overhead anyway, if it happens)
No need to copy data from OS buffers to application buffers - this can improve performance
Ability to share memory between processes.
Some drawbacks are:
32-bit machines can run out of address space easily
Tricky to handle file extending correctly
No easy way to see how many / which pages are currently resident (there may be some ways however)
Not good for real-time applications, as a page fault may cause an IO request, which blocks the thread (the file can be locked in memory, but only if there is enough physical memory).
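A minimal read-only mapping on a POSIX system looks like this (a sketch; "data.bin" is a placeholder file name):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

    // Map the whole file: the kernel pages it in on demand, with no
    // read() system calls and no copy into an application-owned buffer.
    void* m = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (m == MAP_FAILED) { close(fd); return 1; }

    const unsigned char* p = static_cast<const unsigned char*>(m);
    std::printf("first byte: %u\n", p[0]);

    munmap(m, st.st_size);
    close(fd);
    return 0;
}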
In maybe 9 out of 10 cases, you need not worry about virtual memory management; that's the job of the kernel. Only in some highly specialized applications might you need to tweak it.
I know of one article that talks about computer memory management with an emphasis on Linux [ http://lwn.net/Articles/250967 ]. Hope this helps.
For most applications today, the programmer can remain unaware of the workings of computer memory without any harm. But sometimes -- for example the case when you want to improve the footprint of your program -- you do end up having to manipulate memory yourself. In such situations, knowing how memory is designed to work is essential.
In other words, although you can indeed survive without it, learning about virtual memory will only make you a better programmer.
And I would think the Wikipedia article can be a good start.
If you are concerned with performance -- understanding memory hierarchy is important.
For small data sets which are fully contained in physical memory you need to be concerned with caching (accessing memory from the cache is much faster).
When dealing with large data sets -- which may be paged out due to lack of physical memory you need to be careful to keep your access patterns localized.
For example, if you declare a matrix in C (int a[rows][cols]), it is laid out in memory row by row. Thus, when scanning the matrix, you should scan by rows rather than by columns; otherwise you will be paging the same data in and out many times.
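In C++ the two scan orders look like this; the row-major loop touches consecutive addresses, while the column-major loop jumps cols * sizeof(int) bytes on every step:

enum { ROWS = 1024, COLS = 1024 };
static int a[ROWS][COLS];

long long sum_row_major() {
    long long s = 0;
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            s += a[r][c];   // consecutive addresses: cache- and page-friendly
    return s;
}

long long sum_col_major() {
    long long s = 0;
    for (int c = 0; c < COLS; ++c)
        for (int r = 0; r < ROWS; ++r)
            s += a[r][c];   // strided access: poor locality, more paging
    return s;
}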
Another issue is the difference between dirty and clean data held in memory. Clean data is information loaded from file that was not modified by the program. The OS may page out clean data (perhaps depending on how it was loaded) without writing it to disk. Dirty pages must first be written to the swap file.
