Virtual Memory

Most of the literature on Virtual Memory points out that, as an application developer, understanding Virtual Memory can help me harness its powerful capabilities. I have been developing applications on Linux for some time, but have never cared about Virtual Memory intricacies while I code. Am I missing something? If so, please shed some light on how I can leverage the workings of Virtual Memory. Otherwise, let me know if I am just not making sense with the question!

Well, the concept is pretty simple actually. I won't repeat it here, but you should pick up any book on OS design and it will be explained there. I recommend "Operating System Concepts" by Silberschatz and Galvin - it's what I had to use at university, and it's good.
A couple of things that knowledge of Virtual Memory might give you:
Learning to allocate memory on page boundaries to avoid waste (this applies to page-granular memory you request from the OS directly, not the usual heap/stack allocations);
Lock some pages in RAM so they don't get swapped to HDD;
Guard pages;
Reserving some address range and committing actual memory later;
Perhaps using the NX (non-executable) bit to increase security, but I'm not sure about this one.
PAE for accessing >4GB on a 32-bit system.
Still, all of these things would have uses only in quite specific scenarios. Indeed, 99% of applications need not concern themselves about this.
Added: That said, it's definitely good to know all these things, so that you can identify such scenarios when they arise; a rough sketch of a few of them follows below. Just beware - with power comes responsibility.
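To make a few of these concrete, here is a minimal Linux-flavoured sketch (error handling is mostly abbreviated and the sizes are arbitrary; on Windows the rough equivalents would be VirtualAlloc, VirtualLock and VirtualProtect):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Allocate one page, aligned on a page boundary. */
    void *buf = NULL;
    if (posix_memalign(&buf, page, page) != 0)
        return 1;

    /* Lock it in RAM so it cannot be swapped out. */
    if (mlock(buf, page) != 0)
        perror("mlock");

    /* Reserve a 16-page address range; pages are only backed by RAM once touched. */
    void *range = mmap(NULL, 16 * page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (range == MAP_FAILED)
        return 1;

    /* Turn the last page into a guard page: any access to it faults. */
    mprotect((char *)range + 15 * page, page, PROT_NONE);

    memset(buf, 0, page);        /* use the locked page */

    munlock(buf, page);
    free(buf);
    munmap(range, 16 * page);
    return 0;
}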

It's a bit of a vague question.
The way you can use virtual memory is chiefly through memory-mapped files. See the mmap() man page for more details; there is also a short sketch at the end of this answer.
You are probably using them implicitly anyway, as any dynamic library is implemented as a mapped file, and many database libraries use them too.
The interface to use mapped files from higher level languages is often quite inconvenient, which makes them less useful.
The chief benefits of using mapped files are:
No system call overhead when accessing parts of the file (this actually might be a disadvantage, as a page fault probably has as much overhead anyway, if it happens)
No need to copy data from OS buffers to application buffers - this can improve performance
Ability to share memory between processes.
Some drawbacks are:
32-bit machines can run out of address space easily
Tricky to handle file extending correctly
No easy way to see how many / which pages are currently resident (there may be some ways however)
Not good for real-time applications, as a page fault may cause an IO request, which blocks the thread (the file can be locked in memory, but only if there is enough RAM).
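A minimal example of mapping a file on a POSIX system (error handling is abbreviated, and "data.bin" is just a placeholder file name):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);    /* placeholder file name */
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;

    /* Map the whole file; the kernel pages it in on demand. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;

    /* Access it like ordinary memory - no read() calls, no extra buffer copy. */
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];
    printf("%ld\n", sum);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

madvise() can be used to hint the access pattern, and mlock() addresses the real-time concern mentioned in the last drawback above.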

Maybe in 9 out of 10 cases you need not worry about virtual memory management; that's the job of the kernel. Perhaps only in some highly specialized applications do you need to tweak things.
I know of one article that talks about computer memory management with an emphasis on Linux [ http://lwn.net/Articles/250967 ]. Hope this helps.

For most applications today, the programmer can remain unaware of the workings of computer memory without any harm. But sometimes -- for example the case when you want to improve the footprint of your program -- you do end up having to manipulate memory yourself. In such situations, knowing how memory is designed to work is essential.
In other words, although you can indeed survive without it, learning about virtual memory will only make you a better programmer.
And I would think the Wikipedia article can be a good start.

If you are concerned with performance -- understanding memory hierarchy is important.
For small data sets which are fully contained in physical memory you need to be concerned with caching (accessing memory from the cache is much faster).
When dealing with large data sets -- which may be paged out due to lack of physical memory -- you need to be careful to keep your access patterns localized.
For example if you declare a matrix in C (int a[rows][cols]), it is allocated by rows. Thus when scanning the matrix, you need to scan by rows rather than by columns. Otherwise you will be paging the same data in and out many times.
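As a sketch (4096 x 4096 is an arbitrary size, about 64 MB of ints), the two traversal orders look like this; only the loop nesting differs, but the first walks memory sequentially while the second jumps a whole 16 KB row between consecutive accesses:

#define ROWS 4096
#define COLS 4096
static int a[ROWS][COLS];

long sum_by_rows(void)             /* cache- and page-friendly */
{
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

long sum_by_cols(void)             /* strides COLS * sizeof(int) bytes per access */
{
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}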
Another issue is the difference between dirty and clean data held in memory. Clean data is information loaded from file that was not modified by the program. The OS may page out clean data (perhaps depending on how it was loaded) without writing it to disk. Dirty pages must first be written to the swap file.

Related

Will many attributes use (much) more memory in a Rails app?

As a Ruby on Rails programmer I am constantly struggling with memory problems, and my apps on Heroku often hit the 100% mark. As I have learned RoR by doing, and mostly by myself, I believe I have missed quite a lot of conventions and best practices, specifically in terms of memory conservation.
Measuring memory usage is difficult, even using great gems such as derailed, as they only give me very indirect pointers to code that uses too much memory.
I tend to use a lot of gems and a lot of attributes on my major models. My most important one, the Product model, holds some 40 attributes, and there are about 400,000 objects in my database. Which brings me to my question, or rather clarifications.
A. I assume that if I do a Product.all request in the controller it will load some (400,000 * 40 =) 16M "fields" (not sure what to call them) into memory, right?
B. If I do Product.where(<query that brings up half of the objects>), it will load 200,000 * 40 = 8M "fields" into memory, right?
C. If I do Product.select(:id, :name, :price), it would bring 400,000 * 3 = 1.2M "fields" into memory, right?
D. I also assume that selecting attributes that are integers (such as price) will be less expensive than strings (such as name), which in turn will be less expensive than text (such as description). In other words, a Product.select(:id) will use less memory than Product.select(:long_description_with_1000_characters), right?
E. I also assume that if I do a search such as Product.all.limit(30), it will use less memory than a Product.all.limit(500), right?
F. Considering all this, and assuming the answer is YES to all of the above, I assume it would be worth the time to go through my code to find "fat and greedy" requests of the 16M type. In that case, what would be the most effective way, or tool, for understanding how many "fields" a certain request will use? At the moment, I think I will just go through the code and try to picture, and troubleshoot, each database request to see if it can be slimmed down.
Edit: Another way of phrasing question (F): if I make a change to a certain database request, how can I tell whether it is using less memory or not? My current approach is to upload to my production app on Heroku and check the total memory usage, which is obviously VERY blunt.
FYI: I am using Scout on Heroku to find memory bloat, which is useful to some extent. I also use the bullet gem to find N+1 issues, which helps to speed up requests, but I am not sure whether it affects memory issues.
Any pointers to blog posts or similar that discuss this would be interesting as well. In addition, any comments on how to make sure requests like the above will not bloat or use too much memory would be highly appreciated.

What advantages does in-memory OLAP have over traditional systems with significant memory?

Do in-memory OLAP engines have advantages over the traditional OLAP engines backed by enough RAM to contain the entire cube(s)?
For example, if I use a MOLAP engine (SSAS) and GB / TB of RAM where the entire cube (or even star-schema) is RAM resident, what is the difference compared to something like TM1 / SAP HANA?
Basically it comes down to the following:
A system that is really optimized for "in-memory" operations takes into account several aspects like random access, memory page size, different cache levels (CPU, ...) etc.
This leads to maximum use of the possibilities that RAM offers and a HDD does not, which in turn makes for excellent performance.
A traditional engine which is optimized for filesystem access usually takes into account several aspects of file handling and of how the OS manages the filesystem.
Even when such an engine loads everything into its cache (memory), it still operates on the data as if it were on disk. That makes sense, since the code must also work in situations where not everything fits into memory, and using the same implementation for both situations makes for better testing/stability/bug-fixing/maintainability etc. BUT it means the engine does not take advantage of what makes RAM access different from file/disk access. Such an engine could usually be made faster if it implemented RAM-specific optimizations, so that it offered the best of both worlds (RAM versus disk)... I am not aware of any engine doing that.

Memory efficiency vs Processor efficiency

In general use, should I bet on memory efficiency or processor efficiency?
In the end, I know it must depend on the software/hardware specs, but I think there is a general rule for when there are no such constraints.
Example 01 (memory efficiency):
int n=0;
if(n < getRndNumber())
n = getRndNumber();
Example 02 (processor efficiency):
int n=0, aux=0;
aux = getRndNumber();
if(n < aux)
n = aux;
They're just simple examples and wrote them in order to show what I mean. Better examples will be well received.
Thanks in advance.
I'm going to wheel out the universal performance question trump card and say "neither, bet on correctness".
Write your code in the clearest possible way, set specific measurable performance goals, measure the performance of your software, profile it to find the bottlenecks, and then if necessary optimise knowing whether processor or memory is your problem.
(As if to make a case in point, your 'simple examples' have different behaviour assuming getRndNumber() does not return a constant value. If you'd written it in the simplest way, something like n = max(0, getRndNumber()) then it may be less efficient but it would be more readable and more likely to be correct.)
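In C that "simplest way" could be written as below; MAX here is just a local helper macro, not a standard library function, and note that passing getRndNumber() directly into such a macro would evaluate it twice, which is exactly the kind of subtlety this answer is warning about:

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int r = getRndNumber();   /* call it exactly once, so the behaviour is unambiguous */
int n = MAX(0, r);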
Edit:
To answer Dervin's criticism below, I should probably state why I believe there is no general answer to this question.
A good example is taking a random sample from a sequence. For sequences small enough to be copied into another contiguous memory block, a partial Fisher-Yates shuffle which favours computational efficiency is the fastest approach. However, for very large sequences where insufficient memory is available to allocate, something like reservoir sampling that favours memory efficiency must be used; this will be an order of magnitude slower.
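For illustration, a minimal reservoir-sampling sketch (Algorithm R) in C: it keeps a uniform sample of k items from a stream of any length in O(k) memory (rand() % n is slightly biased, but fine for a sketch):

#include <stdio.h>
#include <stdlib.h>

/* Keep a uniform random sample of k items from a stream of unknown length. */
void reservoir_sample(FILE *stream, int *sample, int k)
{
    int value;
    long seen = 0;
    while (fscanf(stream, "%d", &value) == 1) {
        if (seen < k) {
            sample[seen] = value;            /* fill the reservoir first */
        } else {
            long j = rand() % (seen + 1);    /* keep each new item with prob k/(seen+1) */
            if (j < k)
                sample[j] = value;
        }
        seen++;
    }
}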
So what is the general case here? For sampling a sequence should you favour CPU or memory efficiency? You simply cannot tell without knowing things like the average and maximum sizes of the sequences, the amount of physical and virtual memory in the machine, the likely number of concurrent samples being taken, the CPU and memory requirements of the other code running on the machine, and even things like whether the application itself needs to favour speed or reliability. And even if you do know all that, then you're still only guessing, you don't really know which one to favour.
Therefore the only reasonable thing to do is implement the code in a manner favouring clarity and maintainability (taking factors you know into account, and assuming that clarity is not at the expense of gross inefficiency), measure it in a real-life situation to see whether it is causing a problem and what the problem is, and then if so alter it. Most of the time you will not have to change the code as it will not be a bottleneck. The net result of this approach is that you will have a clear and maintainable codebase overall, with the small parts that particularly need to be CPU and/or memory efficient optimised to be so.
You think one is unrelated to the other? Why do you think that? Here are two examples where you'll find often unconsidered bottlenecks.
Example 1
You design a DB related software system and find that I/O is slowing you down as you read in one of the tables. Instead of allowing multiple queries resulting in multiple I/O operations you ingest the entire table first. Now all rows of the table are in memory and the only limitation should be the CPU. Patting yourself on the back you wonder why your program becomes hideously slow on memory poor computers. Oh dear, you've forgotten about virtual memory, swapping, and such.
Example 2
You write a program whose methods create many small objects but possess O(1), O(log n), or at worst O(n) complexity. You've optimized for speed but see that your application takes a long time to run. Curiously, you profile to discover what the culprit could be. To your chagrin you discover that all those small objects add up fast. Your code is being held back by the GC.
You have to decide based on the particular application, usage, etc. In your example above, both memory and processor usage are trivial, so it's not a good example.
A better example might be the use of history tables in chess search. This method caches previously searched positions in the game tree in case they are re-searched in other branches of the game tree or on the next move.
However, it does cost space to store them, and space also requires time. If you use up too much memory you might end up using virtual memory which will be slow.
Another example might be caching in a database server. Clearly it is faster to access a cached result from main memory, but then again it would not be a good idea to keep loading and freeing from memory data that is unlikely to be re-used.
In other words, you can't generalize. You can't even make a decision based on the code - sometimes the decision has to be made in the context of likely data and usage patterns.
In the past 10 years, main memory has hardly increased in speed at all, while processors have continued to race ahead. There is no reason to believe this is going to change.
Edit: Incidentally, in your example, aux will most likely end up in a register and never make it to memory at all.
Without context I think optimising for anything other than readability and flexibility is premature.
So, the only general rule I could agree with is "Optimise for readability, while bearing in mind the possibility that at some point you may have to optimise for either memory or processor efficiency".
Sorry it isn't quite as catchy as you would like...
In your example, version 2 is clearly better, even though version 1 is prettier to me, since, as others have pointed out, calling getRndNumber() multiple times requires knowing more about getRndNumber() in order to follow the code.
It's also worth considering the scope of the operation you are looking to optimize; if the operation is time sensitive, say part of a web request or GUI update, it might be better to err on the side of completing it faster than saving memory.
Processor efficiency. Memory is egregiously slow compared to your processor. See this link for more details.
Although, in your example, the two would likely be optimized to be equivalent by the compiler.

Low level programming: How to find data in a memory of another running process?

I am trying to write a statistics tool for a game by extracting values from the game's process memory (as there is no other way). The biggest challenge is finding the addresses that store the data I am interested in. What makes it even harder is dynamic memory allocation - I need to find not only the addresses that store the data but also the pointers to those memory blocks, because the addresses change every time the game restarts.
For now I am just manually searching the game's memory using a memory editor (ArtMoney), looking for addresses whose values change as the data changes (or don't change). After an address is found, I look for a pointer to that memory block in a similar way.
I wonder what techniques/tools exist for such tasks? Maybe there are some articles I can read? Is mastering a disassembler the only way to go? For example, game trainers solve similar tasks, but they manage it in days while I have been struggling for weeks already.
Thanks.
PS. It's all under windows.
Is mastering a disassembler the only way to go?
Yes; go download WinDbg from http://www.microsoft.com/whdc/devtools/debugging/default.mspx, or, if you've got some money to blow, IDA Pro is probably the best tool for doing this.
If you know how to code in C, it is easy to search for memory values. If you don't know C, this page might point you to your solution if you can code in C#. It will not be hard to port the C# they have to Java.
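As a rough illustration of what "searching for memory values" in C looks like on Windows, the scan itself boils down to OpenProcess / VirtualQueryEx / ReadProcessMemory. This is only a sketch under assumptions: the PID and target value below are placeholders, the protection check is simplified, and error handling is minimal.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD pid = 1234;        /* placeholder: the game's process id */
    int target = 1000;       /* placeholder: the value you are looking for */

    HANDLE h = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!h) return 1;

    MEMORY_BASIC_INFORMATION mbi;
    unsigned char *addr = 0;
    while (VirtualQueryEx(h, addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        /* only scan committed, writable regions (simplified protection check) */
        if (mbi.State == MEM_COMMIT && (mbi.Protect & PAGE_READWRITE)) {
            unsigned char *buf = malloc(mbi.RegionSize);
            SIZE_T got = 0;
            if (buf && ReadProcessMemory(h, mbi.BaseAddress, buf, mbi.RegionSize, &got)) {
                for (SIZE_T i = 0; i + sizeof(int) <= got; i++)
                    if (*(int *)(buf + i) == target)
                        printf("hit at %p\n", (void *)((unsigned char *)mbi.BaseAddress + i));
            }
            free(buf);
        }
        addr = (unsigned char *)mbi.BaseAddress + mbi.RegionSize;
    }
    CloseHandle(h);
    return 0;
}

Finding the pointer chain that leads to a hit (so it survives restarts) is then a second scan for values equal to the hit address.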
You might take a look at DynInst (Dynamic Instrumentation). In particular, look at the Dynamic Probe Class Library (DPCL). These tools will let you attach to running processes via the debugger interface and insert your own instrumentation (via special probe classes) into them while they're running. You could probably use this to instrument the routines that access your data structures and trace when the values you're interested in are created or modified.
You might have an easier time doing it this way than doing everything manually. There are a bunch of papers on those pages you can look at to see how other people built similar tools, too.
I believe the Windows support is maintained, but I have not used it myself.

Is it worth caching objects created by Delphi's Memory Manager?

I have an application that creates and destroys thousands of objects. Is it worth caching and reusing objects, or is Delphi's memory manager fast enough that creating and destroying objects multiple times is not that great an overhead (as opposed to keeping track of a cache)? When I say worth it, I am of course looking for a performance boost.
From recent testing - if object creation is not expensive (i.e. doesn't depend on external resources - accessing files, registry, database ...) then you'll have a hard time beating Delphi's memory manager. It is that fast.
That of course holds if you're using a recent Delphi - if not, get FastMM4 from SourceForge and use it instead of Delphi's internal MM.
Memory allocation is only a small part of why you would want to cache. You need to know the full cost of constructing a semantically valid object, and compare it with the cost of retrieving items from the cache, and not just for a micro-benchmark: cache effects (CPU cache, that is) may change the runtime dynamics in a real live running application.
Or to put it another way, measure it and find out. If you're not measuring, you're not engineering, just guessing.
Only a profiler will tell you. Try both approaches in a tight loop and see what comes out on top :-)
You absolutely have to measure with real-world loads to answer questions like this. Depending on what resources are held in those objects, any resource contention, construction cost, size, etc., the answer may surprise you, and may even change depending on the nature of the load.
It is usually very difficult to determine where your performance issues will be without measuring.
I think this depends on the code your objects execute during create and destroy. The impact of TObject.Create and TObject.Destroy is normally negligible and may easily be outweighed by the caching overhead.
You should also consider that the state of an object may differ when reused from that after just being created.
Often the only way to tell is to try it.
If current performance is adequate then you don't have much call to try and increase it. However, if you have performance issues, then some caching (or indeed some other strategies) may help.
You will also need some stats on how often a specific object (instance) is being used. If you're referencing the same set of data regularly, then caching may really improve performance, but if the accesses are distributed across all the possible objects, then your cache miss rate might be too high for it to be worthwhile.
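For reference, "caching and reusing objects" usually boils down to a free list. Here is a language-neutral sketch in C (Node and its payload are made up for illustration; the same idea translates directly to a Delphi class keeping a list of idle instances):

#include <stdlib.h>

typedef struct Node {
    struct Node *next;     /* reused as the free-list link while the node is idle */
    int payload[16];
} Node;

static Node *free_list = NULL;

Node *node_acquire(void)
{
    if (free_list) {                      /* reuse a cached object */
        Node *n = free_list;
        free_list = n->next;
        return n;
    }
    return malloc(sizeof(Node));          /* otherwise fall back to the allocator */
}

void node_release(Node *n)
{
    n->next = free_list;                  /* push back onto the cache */
    free_list = n;
}

Whether such a pool actually beats the memory manager is exactly what the answers above tell you to measure.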
