Is it worth caching objects created by Delphi's Memory Manager? - delphi

I have an application that creates and destroys thousands of objects. Is it worth caching and reusing objects, or is Delphi's memory manager fast enough that creating and destroying objects repeatedly is not that great an overhead (as opposed to keeping track of a cache)? When I say worth it, I'm of course looking for a performance boost.

From recent testing: if object creation is not expensive (i.e. it doesn't depend on external resources such as files, the registry, or a database), then you'll have a hard time beating Delphi's memory manager. It is that fast.
That of course holds if you're using a recent Delphi - if not, get FastMM4 from SourceForge and use it instead of Delphi's internal MM.

Memory allocation is only a small part of why you would want to cache. You need to know the full cost of constructing a semantically valid object, and compare it with the cost of retrieving items from the cache, and not just for a micro-benchmark: cache effects (CPU cache, that is) may change the runtime dynamics in a real live running application.
Or to put it another way, measure it and find out. If you're not measuring, you're not engineering, just guessing.

Only a profiler will tell you. Try both approaches in a tight loop and see what comes out on top :-)
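To make "try both in a tight loop" concrete, here is a rough C sketch of that kind of micro-benchmark: plain allocate/free versus reuse through a simple free list. The Widget type, its size, and the iteration count are made up for illustration, and the same pooling idea carries straight over to Delphi objects; a serious benchmark would also have to make sure the compiler cannot optimise the loops away:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical object type, used only for this sketch. */
typedef struct Widget {
    int id;
    double payload[16];
    struct Widget *next;   /* free-list link while the object sits in the pool */
} Widget;

static Widget *free_list = NULL;

/* Pooled "create": reuse a previously released object if one is available. */
static Widget *pool_acquire(void) {
    if (free_list != NULL) {
        Widget *w = free_list;
        free_list = w->next;
        return w;
    }
    return malloc(sizeof(Widget));
}

/* Pooled "destroy": push the object back onto the free list. */
static void pool_release(Widget *w) {
    w->next = free_list;
    free_list = w;
}

int main(void) {
    enum { ITERATIONS = 5000000 };

    clock_t t0 = clock();
    for (int i = 0; i < ITERATIONS; i++) {       /* plain allocate/free */
        Widget *w = malloc(sizeof(Widget));
        w->id = i;
        free(w);
    }

    clock_t t1 = clock();
    for (int i = 0; i < ITERATIONS; i++) {       /* reuse via the pool */
        Widget *w = pool_acquire();
        w->id = i;
        pool_release(w);
    }
    clock_t t2 = clock();

    /* Objects left in the pool are deliberately leaked to keep the sketch short. */
    printf("malloc/free: %.3fs  pool: %.3fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}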

You absolutely have to measure with real-world loads to answer questions like this. Depending on what resources are held in those objects, any resource contention, construction cost, size, etc., the answer may surprise you, and may even change depending on the nature of the load.
It is usually very difficult to determine where your performance issues will be without measuring.

I think this depends on the code your objects will execute during create and destroy. The impact from TObject.Create and TObject.Destroy themselves is normally negligible and may easily be outweighed by the caching overhead.
You should also consider that the state of an object may differ when reused from that after just being created.

Often the only way to tell is to try it.
If current performance is adequate then you don't have much call to try and increase it. However, if you have performance issues, then some caching (or indeed some other strategies) may help.

You will also need some stats on how often a specific object (instance) is being used. If you're referencing the same set of data regularly, then caching may really improve performance, but if the accesses are distributed across all the possible objects, then your cache miss rate might be too high for it to be worthwhile.
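That trade-off can be written down directly. A back-of-the-envelope sketch in C, where every number is an invented placeholder that would have to come from your own measurements:
#include <stdio.h>

int main(void) {
    /* All numbers below are made up; measure your own. */
    double hit_rate       = 0.30;  /* fraction of requests served from the cache   */
    double cost_cache_hit = 0.1;   /* cost of a cache lookup (arbitrary time units) */
    double cost_construct = 1.0;   /* cost of constructing the object from scratch  */

    /* With a cache you always pay the lookup; on a miss you also construct. */
    double with_cache    = cost_cache_hit + (1.0 - hit_rate) * cost_construct;
    double without_cache = cost_construct;

    printf("with cache: %.2f, without cache: %.2f per access\n",
           with_cache, without_cache);
    /* Caching only wins when the hit rate is high enough to amortise the lookup. */
    return 0;
}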

Related

Will many attributes use (much) more memory in a Rails app?

As a Ruby on Rails programmer I am constantly struggling with memory problems, and my apps on Heroku often hit the 100% mark. As I have learned RoR by doing, and mostly by myself, I believe I have missed quite a lot of conventions and best practices, specifically in terms of memory conservation.
Measuring memory usage is difficult, even with great gems such as derailed, as they only give very indirect pointers to the code that uses too much memory.
I tend to use a lot of gems and a lot of attributes on my major models. My most important one, the Product model, holds some 40 attributes, and there are about 400,000 objects in my database. Which brings me to my question, or rather a few clarifications.
A. I assume that if I do a Product.all request in the controller, it will load some (400,000 * 40 =) 16M "fields" (not sure what to call them) into memory, right?
B. If I do Product.where(<query that brings up half of the objects>), it will load 200,000 * 40 = 8M "fields" into memory, right?
C. If I do Product.select(:id, :name, :price), it would bring 400,000 * 3 = 1.2M "fields" into memory, right?
D. I also assume that selecting attributes that are integers (such as price) will be less expensive than strings (such as name), which in turn will be less expensive than text (such as description). In other words, a Product.select(:id) will use less memory than Product.select(:long_description_with_1000_characters), right?
E. I also assume that if I do a search such as Product.all.limit(30), it will use less memory than a Product.all.limit(500), right?
F. Considering all this, and assuming the answer is YES to all of the above, I assume it would be worth the time to go through my code to find "fat and greedy" requests of the 16M type. In that case, what would be the most effective way, or tool, to understand how many "fields" a certain request will use? At the moment, I think I will just go through the code and try to picture, and troubleshoot, each database request to see if it can be slimmed down.
Edit: Another way of phrasing question (F): if I make a change to a certain database request, how can I tell whether it is using less memory or not? My current approach is to deploy to my production app on Heroku and check the total memory usage, which is obviously very blunt.
FYI: I am using Scout on Heroku to find memory bloat, which is useful to some extent. I also use the bullet gem to find N+1 issues, which helps to speed up requests, but I am not sure whether it affects memory issues.
Any pointers to blog posts or similar that discuss this would be interesting as well. In addition, any comments on how to make sure requests like the above will not bloat or use too much memory would be highly appreciated.

CGContextSaveGState performance

I am working on an app with image-editing utilities where I use Core Graphics. In a few scenarios the images, which are fetched from the server, are huge. Does using CGContextSaveGState often have an impact on performance?
While rob mayoff's advice is very good and you should listen to him, I can spoil part of the reveal in this one case.
Yes. Saving and restoring gstates is very expensive. It's almost always cheaper to simply undo what you did.
That said, it's also usually much more readable code, so you should still measure it to see whether or not it's too expensive in your case. You might find that it's cheap enough that it's not worth the expense of rewriting the code to avoid it.
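For what it's worth, here is a rough sketch of the two styles in plain C against the Core Graphics API; the function names, the 2x scale, and the assumption that a valid CGContextRef and CGImageRef come from somewhere else (e.g. the current drawing context) are all just for illustration:
#include <CoreGraphics/CoreGraphics.h>

/* Version 1: save/restore the whole graphics state around a temporary
   transform. Readable, but the save copies more state than just the CTM. */
void drawScaledSaveRestore(CGContextRef ctx, CGImageRef image, CGRect rect) {
    CGContextSaveGState(ctx);
    CGContextScaleCTM(ctx, 2.0, 2.0);
    CGContextDrawImage(ctx, rect, image);
    CGContextRestoreGState(ctx);
}

/* Version 2: undo only what was changed, by applying the inverse scale. */
void drawScaledManualUndo(CGContextRef ctx, CGImageRef image, CGRect rect) {
    CGContextScaleCTM(ctx, 2.0, 2.0);
    CGContextDrawImage(ctx, rect, image);
    CGContextScaleCTM(ctx, 0.5, 0.5);   /* invert the 2x scale by hand */
}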

Memory efficiency vs Processor efficiency

In general use, should I bet on memory efficiency or processor efficiency?
In the end, I know that it must depend on the software/hardware specs, but I think there's a general rule when there are no constraints.
Example 01 (memory efficiency):
int n=0;
if(n < getRndNumber())
n = getRndNumber();
Example 02 (processor efficiency):
int n=0, aux=0;
aux = getRndNumber();
if(n < aux)
n = aux;
They're just simple examples; I wrote them in order to show what I mean. Better examples will be well received.
Thanks in advance.
I'm going to wheel out the universal performance question trump card and say "neither, bet on correctness".
Write your code in the clearest possible way, set specific measurable performance goals, measure the performance of your software, profile it to find the bottlenecks, and then if necessary optimise knowing whether processor or memory is your problem.
(As if to make a case in point, your 'simple examples' have different behaviour assuming getRndNumber() does not return a constant value. If you'd written it in the simplest way, something like n = max(0, getRndNumber()) then it may be less efficient but it would be more readable and more likely to be correct.)
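For reference, the "simplest way" from that parenthetical might look like this as compilable C. max is not a standard C function for ints, so a small helper is defined here, and getRndNumber() is stubbed out only so the sketch builds:
#include <stdlib.h>

/* Stand-in for the question's getRndNumber(); any int-returning function will do. */
static int getRndNumber(void) {
    return rand() - RAND_MAX / 2;   /* may be negative, so the clamp below matters */
}

static int max_int(int a, int b) {
    return a > b ? a : b;
}

int compute_n(void) {
    /* One call, one comparison: the intent of both original examples,
       without calling getRndNumber() twice. */
    return max_int(0, getRndNumber());
}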
Edit:
To answer Dervin's criticism below, I should probably state why I believe there is no general answer to this question.
A good example is taking a random sample from a sequence. For sequences small enough to be copied into another contiguous memory block, a partial Fisher-Yates shuffle which favours computational efficiency is the fastest approach. However, for very large sequences where insufficient memory is available to allocate, something like reservoir sampling that favours memory efficiency must be used; this will be an order of magnitude slower.
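Here is a minimal sketch of the memory-favouring option mentioned above, reservoir sampling (Algorithm R), which picks k items uniformly from a stream of unknown length in O(k) memory. The stream is just synthetic integers here, and modulo bias from rand() is ignored to keep it short:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Reservoir sampling (Algorithm R): choose k items uniformly from a stream
   using O(k) memory. Assumes RAND_MAX >= number of items seen; a real
   implementation would use a better random number generator. */
static void reservoir_sample(int k, long n_items, int *out) {
    long seen = 0;
    for (long i = 0; i < n_items; i++) {
        int item = (int)i;              /* stand-in for "next item from the stream" */
        if (seen < k) {
            out[seen] = item;           /* fill the reservoir first */
        } else {
            long j = rand() % (seen + 1);
            if (j < k)
                out[j] = item;          /* replace with decreasing probability */
        }
        seen++;
    }
}

int main(void) {
    enum { K = 5 };
    int sample[K];

    srand((unsigned)time(NULL));
    reservoir_sample(K, 1000000L, sample);

    for (int i = 0; i < K; i++)
        printf("%d ", sample[i]);
    printf("\n");
    return 0;
}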
So what is the general case here? For sampling a sequence should you favour CPU or memory efficiency? You simply cannot tell without knowing things like the average and maximum sizes of the sequences, the amount of physical and virtual memory in the machine, the likely number of concurrent samples being taken, the CPU and memory requirements of the other code running on the machine, and even things like whether the application itself needs to favour speed or reliability. And even if you do know all that, then you're still only guessing, you don't really know which one to favour.
Therefore the only reasonable thing to do is implement the code in a manner favouring clarity and maintainability (taking factors you know into account, and assuming that clarity is not at the expense of gross inefficiency), measure it in a real-life situation to see whether it is causing a problem and what the problem is, and then if so alter it. Most of the time you will not have to change the code as it will not be a bottleneck. The net result of this approach is that you will have a clear and maintainable codebase overall, with the small parts that particularly need to be CPU and/or memory efficient optimised to be so.
You think one is unrelated to the other? Why do you think that? Here are two examples where you'll find often unconsidered bottlenecks.
Example 1
You design a DB-related software system and find that I/O is slowing you down as you read in one of the tables. Instead of allowing multiple queries resulting in multiple I/O operations, you ingest the entire table first. Now all rows of the table are in memory and the only limitation should be the CPU. Patting yourself on the back, you wonder why your program becomes hideously slow on memory-poor computers. Oh dear, you've forgotten about virtual memory, swapping, and such.
Example 2
You write a program whose methods create many small objects but possess O(1), O(log n), or at worst O(n) running time. You've optimized for speed but see that your application takes a long time to run. Curiously, you profile to discover what the culprit could be. To your chagrin you discover that all those small objects add up fast. Your code is being held back by the GC.
You have to decide based on the particular application, usage, etc. In your example above, both memory and processor usage are trivial, so it's not a good example.
A better example might be the use of history tables in chess search. This method caches previously searched positions in the game tree in case they are re-searched in other branches of the game tree or on the next move.
However, it does cost space to store them, and space also requires time. If you use up too much memory you might end up using virtual memory which will be slow.
Another example might be caching in a database server. Clearly it is faster to access a cached result from main memory, but then again it would not be a good idea to keep loading and freeing from memory data that is unlikely to be re-used.
In other words, you can't generalize. You can't even make a decision based on the code - sometimes the decision has to be made in the context of likely data and usage patterns.
In the past 10 years, main memory has increased in speed hardly at all, while processors have continued to race ahead. There is no reason to believe this is going to change.
Edit: Incidentally, in your example, aux will most likely end up in a register and never make it to memory at all.
Without context, I think optimising for anything other than readability and flexibility is a mistake.
So the only general rule I could agree with is: "Optimise for readability, while bearing in mind that at some point in the future you may have to optimise for either memory or processor efficiency."
Sorry it isn't quite as catchy as you would like...
In your example, version 2 is clearly better, even though version 1 is prettier to me, since, as others have pointed out, calling getRndNumber() multiple times requires more knowledge of getRndNumber() to reason about.
It's also worth considering the scope of the operation you are looking to optimize; if the operation is time sensitive, say part of a web request or GUI update, it might be better to err on the side of completing it faster than saving memory.
Processor efficiency. Memory is egregiously slow compared to your processor. See this link for more details.
Although, in your example, the two would likely be optimized to be equivalent by the compiler.

What's more expensive, comparison or assignment?

I've started reading Algorithms and I keep wondering, when dealing with primitives of the same type, which is the more expensive operation, assignment or comparison? Does this vary a great deal between languages?
What do you think?
At the lowest level one does two reads, the other does a read and a write.
But why should you really care? You shouldn't care about performance at this level. Optimize for Big-O.
Micro-optimization is almost always the wrong thing to do. Don't even start on it unless the program runs too slowly, and you use a profiler to determine exactly where the slow parts are.
Once you've done that, my advice is to see about improving code and data locality, because cache misses are almost certainly worse than suboptimal instructions.
That being done, in the rather odd case that you can use either an assignment-based or comparison-based approach, try both and time them. Micro-optimization is a numbers game. If the numbers aren't good enough, find out why, then verify that what you're doing actually works.
So, what do you mean by a comparison? Conditional jumps cause problems for any vaguely modern processor, but different processors do different things, and there's no guarantee that any given one will slow things down. Also, if either causes a cache miss, that's probably the slower one no matter what.
Finally, languages are normally compiled to machine code, and simple things like comparisons and assignments will normally be compiled the same. The big difference will be the type of CPU.
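To illustrate that last point, both functions below are, on a typical optimising compiler for x86-64 or ARM, little more than a register move or two plus (for the comparison) a compare instruction; the exact output depends on compiler and flags, and neither is inherently "the expensive one":
/* Assignment: one read of src, one write to *dst. */
int assign(int *dst, int src) {
    *dst = src;
    return *dst;
}

/* Comparison: two reads, a compare, and the flag result moved to a register. */
int less_than(int a, int b) {
    return a < b;
}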

Virtual Memory

Most of the literature on virtual memory points out that, as an application developer, understanding virtual memory can help me harness its powerful capabilities. I have been developing applications on Linux for some time but didn't care about virtual memory intricacies while coding. Am I missing something? If so, please shed some light on how I can leverage the workings of virtual memory. Otherwise, let me know if I am not making sense with the question!
Well, the concept is pretty simple actually. I won't repeat it here, but you should pick up any book on OS design and it will be explained there. I recommend "Operating System Concepts" by Silberschatz and Galvin - it's what I had to use at university and it's good.
A couple of things I can think of that virtual memory knowledge might give you:
Learning to allocate memory on page boundaries to avoid waste (applies only to virtual memory, not the usual heap/stack memory) - sketched below;
Locking some pages in RAM so they don't get swapped out to the HDD - also sketched below;
Guard pages;
Reserving some address range and committing actual memory later;
Perhaps using the NX (non-executable) bit to increase security, but I'm not sure about this one;
PAE for accessing >4GB on a 32-bit system.
Still, all of these things would have uses only in quite specific scenarios. Indeed, 99% of applications need not concern themselves about this.
Added: That said, it's definitely good to know all these things, so that you can identify such scenarios when they arise. Just beware - with power comes responsibility.
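A small Linux/POSIX sketch of the first two list items above (page-aligned allocation and pinning pages in RAM); the buffer size is arbitrary, and mlock may fail without sufficient privileges or RLIMIT_MEMLOCK headroom:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);          /* page size, typically 4096 bytes */

    /* Allocate a buffer aligned to a page boundary. */
    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page, (size_t)page * 4) != 0) {
        perror("posix_memalign");
        return 1;
    }

    /* Pin those pages in RAM so they cannot be swapped out. */
    if (mlock(buf, (size_t)page * 4) != 0)
        perror("mlock");

    memset(buf, 0, (size_t)page * 4);
    printf("page size %ld, buffer at %p\n", page, buf);

    munlock(buf, (size_t)page * 4);
    free(buf);
    return 0;
}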
It's a bit of a vague question.
The way you can use virtual memory is chiefly through the use of memory-mapped files. See the mmap() man page for more details.
Although, you are probably using it implicitly anyway, as any dynamic library is implemented as a mapped file, and many database libraries use them too.
The interface to use mapped files from higher level languages is often quite inconvenient, which makes them less useful.
The chief benefits of using mapped files are:
No system call overhead when accessing parts of the file (this actually might be a disadvantage, as a page fault probably has as much overhead anyway, if it happens)
No need to copy data from OS buffers to application buffers - this can improve performance
Ability to share memory between processes.
Some drawbacks are:
32-bit machines can run out of address space easily
Tricky to handle file extending correctly
No easy way to see how many / which pages are currently resident (there may be some ways however)
Not good for real-time applications, as a page fault may cause an IO request, which blocks the thread (the file can be locked in memory, however, but only if there is enough physical memory).
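A minimal example of the memory-mapped file approach on POSIX systems: counting newlines in a file through mmap(), so the pages are faulted in on demand and shared with the page cache instead of being copied through read() buffers. Error handling is kept to the bare minimum:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    if (st.st_size == 0) { printf("0 lines\n"); return 0; }

    /* Map the whole file read-only; pages are loaded lazily on first access. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            lines++;

    printf("%ld lines\n", lines);
    munmap(data, st.st_size);
    close(fd);
    return 0;
}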
In maybe 9 out of 10 cases you need not worry about virtual memory management; that's the job of the kernel. Only in some highly specialized applications do you need to tweak around it.
I know of one article that talks about computer memory management with an emphasis on Linux [ http://lwn.net/Articles/250967 ]. Hope this helps.
For most applications today, the programmer can remain unaware of the workings of computer memory without any harm. But sometimes -- for example the case when you want to improve the footprint of your program -- you do end up having to manipulate memory yourself. In such situations, knowing how memory is designed to work is essential.
In other words, although you can indeed survive without it, learning about virtual memory will only make you a better programmer.
And I would think the Wikipedia article can be a good start.
If you are concerned with performance -- understanding memory hierarchy is important.
For small data sets which are fully contained in physical memory you need to be concerned with caching (accessing memory from the cache is much faster).
When dealing with large data sets -- which may be paged out due to lack of physical memory you need to be careful to keep your access patterns localized.
For example, if you declare a matrix in C (int a[rows][cols]), it is allocated row by row. Thus, when scanning the matrix, you need to scan by rows rather than by columns. Otherwise you will be paging the same data in and out many times.
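A small C illustration of that point; the matrix dimensions are arbitrary, and with a matrix this large (64 MB of int) the column-wise loop touches a different row, and therefore a different cache line or even a different page, on every access:
#include <stdio.h>

#define ROWS 4096
#define COLS 4096

static int a[ROWS][COLS];   /* C stores this row by row (row-major order) */

int main(void) {
    long sum = 0;

    /* Good: walk memory in the order it is laid out - row by row. */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += a[r][c];

    /* Bad: column by column jumps COLS * sizeof(int) bytes between accesses,
       defeating the cache and, for huge matrices, causing extra paging. */
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += a[r][c];

    printf("%ld\n", sum);
    return 0;
}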
Another issue is the difference between dirty and clean data held in memory. Clean data is information loaded from file that was not modified by the program. The OS may page out clean data (perhaps depending on how it was loaded) without writing it to disk. Dirty pages must first be written to the swap file.
