I'm taking a college course about compilers and we just finished talking about garbage collection and ways to free memory. However, in class lectures and in our textbook, I was led to believe that reference counting was not a great way to manage memory.
The reasoning was that that reference counting is very expensive because the program has to insert numerous additional instructions to increment and decrement the reference count. Additionally, everytime the reference count changes, the program has to check if it equals zero and if so, reclaim the memory.
My textbook even has the sentence: "On the whole, the problems with reference counting outweight its advantages and it is rarely used for automatic storage management in programming language environments.
My questions are: Are these legitamate concerns? Does objective-c avoid them somehow? If so how?
Reference counting does have meaningful overhead, it's true. However, the "classic textbook" solution of tracing garbage collectors are not without downsides as well. The biggest one is nondeterminism, but pausing vs throughput is a significant concern as well.
In the end though, ObjC doesn't really get a choice. A state of the art copying collector requires certain properties of the language (no raw pointers for example) that ObjC just doesn't have. As a result, trying to apply the textbook solution to ObjC ends up requiring a partially conservative, non-copying collector, which in practice is around the same speed as refcounting but without its deterministic behavior.
(edit) My personal feelings are that throughput is a secondary, or even tertiary, concern and that the really important debate comes down to deterministic behavior vs cycle collection and heap compaction by copying. All three of those are such valuable properties that I'd be hard-pressed to pick one.
The consensus on RC vs. tracing in computer science research has been, for a long time, that tracing has superior CPU throughput despite longer (maximum) pause times. (E.g. see here, here, and here.) Only very recently, in 2013, has there been a paper (last link under those three) presenting an RC based system that performs equally or a little better than the best tested tracing GC, with regard to CPU throughput. Needless to say it has no "real" implementations yet.
Here is a tiny benchmark I just did on my iMac with 3.1 GHz i5, in the iOS 7.1 64-bit simulator:
long tenmillion = 10000000;
NSTimeInterval t;
t = [NSDate timeIntervalSinceReferenceDate];
NSMutableArray *arr = [NSMutableArray arrayWithCapacity:tenmillion];
for (long i = 0; i < tenmillion; ++i)
[arr addObject:[NSObject new]];
NSLog(#"%f seconds: Allocating ten million objects and putting them in an array.", [NSDate timeIntervalSinceReferenceDate] - t);
t = [NSDate timeIntervalSinceReferenceDate];
for (NSObject *obj in arr)
[self doNothingWith:obj]; // Can't be optimized out because it's a method call.
NSLog(#"%f seconds: Calling a method on an object ten million times.", [NSDate timeIntervalSinceReferenceDate] - t);
t = [NSDate timeIntervalSinceReferenceDate];
NSObject *o;
for (NSObject *obj in arr)
o = obj;
NSLog(#"%f seconds: Setting a pointer ten million times.", [NSDate timeIntervalSinceReferenceDate] - t);
With ARC disabled (-fno-objc-arc), this gives the following:
2.029345 seconds: Allocating ten million objects and putting them in an array.
0.047976 seconds: Calling a method on an object ten million times.
0.006162 seconds: Setting a pointer ten million times.
With ARC enabled, that becomes:
1.794860 seconds: Allocating ten million objects and putting them in an array.
0.067440 seconds: Calling a method on an object ten million times.
0.788266 seconds: Setting a pointer ten million times.
Apparently allocating objects and calling methods became somewhat cheaper. Assigning to an object pointer became more expensive by orders of magnitude, though don't forget that I didn't call -retain in the non-ARC example, and note that you can use __unsafe_unretained should you ever have a hotspot that assigns object pointers like crazy. Nevertheless, if you want to "forget about" memory management and let ARC insert retain/release calls where ever it wants, you will, in the general case, be wasting lots of CPU cycles, repeatedly and in all code pathes that set pointers. A tracing GC on the other hand leaves your code itself alone, and only kicks in at select moments (usually when allocating something), doing its thing in one fell swoop. (Of course the details are a lot more complicated in truth, given generational GC, incremental GC, concurrent GC, etc.)
So yes, since Objective-C's RC uses atomic retain/release, it is rather expensive, but Objective-C also has many more inefficiencies than that imposed by refcounting. (For instance, the fully dynamic/reflective nature of methods, which can be "swizzled" at any time by at run-time, prevents the compiler from doing many cross-method optimizations that would require data flow analysis and such. An objc_msgSend() is always a call to a "dynamically linked" black box from the view of the static analyzer, so to say.) All in all Objective-C as a language is not exactly the most efficient or best optimizable out there; people call it "C's type safety with Smalltalk's blazing speed" for a reason. ;-)
When writing Objective-C, one generally just instruments around well-implemented Apple libraries, which surely use C and C++ and assembly or whatever for their hotspots. Your own code barely ever needs to be efficient. When there is a hot spot, you can make it very efficient by dropping down to lower level constructs like pure C-style code within a single Objective-C method, but one rarely ever needs this. That's why Objective-C can afford the cost of ARC in the general case. I'm not yet convinced that tracing GC has any inherent problems in memory-constrained environments and think one could use a properly high-level language to instrument said libraries just as well, but apparently RC sits better with Apple/iOS. One has to consider the whole of the framework they've built up so far and all their legacy libraries when asking oneself why they didn't go with a tracing GC; for instance I've heard that RC is rather deeply built into CoreFoundation.
On the whole, the problems with reference counting outweight its advantages and it is rarely used for automatic storage management in programming language environments
The tricky word is automatic
Manual reference counting which is the traditional Obj-C way to do things, avoids the problems by delegating them to the programmer. The programmer has to know about the reference counting and manually add retain and release calls. If he/she creates a reference cycle, he/she is responsible for solving it.
The modern automatic reference counting does a lot of things for the programmer but still it's not a transparent storage management. The programmer still has to know about reference counting, still has to solve the reference cycles.
What's really tricky is to create a framework which handles memory managment by reference counting transparently, that is, without the need for the programmer to know about it. That's why it isn't used for automatic storage management.
The performance loss caused by additional instructions is not very big and usually it's not important.
Related
I have a C++ application which embeds Lua and periodically (~10 times per second) runs scripts that can create new userdata objects that allocate non-trivial amounts of memory (~1MiB each) on C++ side. The memory is correctly freed when these objects are garbage-collected, but the problem is that they don't seem to be collected until it's too late and the process starts consuming inordinate amounts of memory (many GiBs). I guess this happens because Lua GC thinks that these objects are just not worth collecting because they appear small on Lua side, and it has no idea how much memory do they actually consume. If this is correct, is there any way to tell it about the real cost of keeping these objects around?
For people familiar with C#, I'm basically looking for the equivalent of GC.AddMemoryPressure() for Lua.
The closest thing you're going to get at present is LUA_GCSTEP. The manual says:
lua_gc
int lua_gc (lua_State *L, int what, int data);
Controls the garbage collector. […] For more details about these options, see collectgarbage.
collectgarbage ([opt [, arg]])
This function is a generic interface to the garbage collector. It performs different functions according to its first argument, opt:
[…]
"step": performs a garbage-collection step. The step "size" is controlled by arg. With a zero value, the collector will perform one basic (indivisible) step. For non-zero values, the collector will perform as if that amount of memory (in KBytes) had been allocated by Lua. Returns true if the step finished a collection cycle.
(emphasis mine)
This will advance the GC once (as if that much was allocated), but that information isn't kept, meaning that parameters influencing the behavior for future collections (GC pause length, total memory in use, etc.) aren't affected. (So check if that's works well enough for you and if it doesn't maybe try playing with the GC pause and step multiplier.)
No. there is no such mechanism in Lua.
As i know, volatile is usually used to prevent unexpected compile optimization during some hardware operations. But which scenes volatile should be declared in property definition puzzles me. Please give some representative examples.
Thx.
A compiler assumes that the only way a variable can change its value is through code that changes it.
int a = 24;
Now the compiler assumes that a is 24 until it sees any statement that changes the value of a. If you write code somewhere below above statement that says
int b = a + 3;
the compiler will say "I know what a is, it's 24! So b is 27. I don't have to write code to perform that calculation, I know that it will always be 27". The compiler may just optimize the whole calculation away.
But the compiler would be wrong in case a has changed between the assignment and the calculation. However, why would a do that? Why would a suddenly have a different value? It won't.
If a is a stack variable, it cannot change value, unless you pass a reference to it, e.g.
doSomething(&a);
The function doSomething has a pointer to a, which means it can change the value of a and after that line of code, a may not be 24 any longer. So if you write
int a = 24;
doSomething(&a);
int b = a + 3;
the compiler will not optimize the calculation away. Who knows what value a will have after doSomething? The compiler for sure doesn't.
Things get more tricky with global variables or instance variables of objects. These variables are not on stack, they are on heap and that means that different threads can have access to them.
// Global Scope
int a = 0;
void function ( ) {
a = 24;
b = a + 3;
}
Will b be 27? Most likely the answer is yes, but there is a tiny chance that some other thread has changed the value of a between these two lines of code and then it won't be 27. Does the compiler care? No. Why? Because C doesn't know anything about threads - at least it didn't used to (the latest C standard finally knows native threads, but all thread functionality before that was only API provided by the operating system and not native to C). So a C compiler will still assume that b is 27 and optimize the calculation away, which may lead to incorrect results.
And that's what volatile is good for. If you tag a variable volatile like that
volatile int a = 0;
you are basically telling the compiler: "The value of a may change at any time. No seriously, it may change out of the blue. You don't see it coming and *bang*, it has a different value!". For the compiler that means it must not assume that a has a certain value just because it used to have that value 1 pico-second ago and there was no code that seemed to have changed it. Doesn't matter. When accessing a, always read its current value.
Overuse of volatile prevents a lot of compiler optimizations, may slow down calculation code dramatically and very often people use volatile in situations where it isn't even necessary. For example, the compiler never makes value assumptions across memory barriers. What exactly a memory barrier is? Well, that's a bit far beyond the scope of my reply. You just need to know that typical synchronization constructs are memory barriers, e.g. locks, mutexes or semaphores, etc. Consider this code:
// Global Scope
int a = 0;
void function ( ) {
a = 24;
pthread_mutex_lock(m);
b = a + 3;
pthread_mutex_unlock(m);
}
pthread_mutex_lock is a memory barrier (pthread_mutex_unlock as well, by the way) and thus it's not necessary to declare a as volatile, the compiler will not make an assumption of the value of a across a memory barrier, never.
Objective-C is pretty much like C in all these aspects, after all it's just a C with extensions and a runtime. One thing to note is that atomic properties in Obj-C are memory barriers, so you don't need to declare properties volatile. If you access the property from multiple threads, declare it atomic, which is even default by the way (if you don't mark it nonatomic, it will be atomic). If you never access it from multiple thread, tagging it nonatomic will make access to that property a lot faster, but that only pays off if you access the property really a lot (a lot doesn't mean ten times a minute, it's rather several thousand times a second).
So you want Obj-C code, that requires volatile?
#implementation SomeObject {
volatile bool done;
}
- (void)someMethod {
done = false;
// Start some background task that performes an action
// and when it is done with that action, it sets `done` to true.
// ...
// Wait till the background task is done
while (!done) {
// Run the runloop for 10 ms, then check again
[[NSRunLoop currentRunLoop]
runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.01]
];
}
}
#end
Without volatile, the compiler may be dumb enough to assume, that done will never change here and replace !done simply with true. And while (true) is an endless loop that will never terminate.
I haven't tested that with modern compilers. Maybe the current version of clang is more intelligent than that. It may also depend on how you start the background task. If you dispatch a block, the compiler can actually easily see whether it changes done or not. If you pass a reference to done somewhere, the compiler knows that the receiver may the value of done and will not make any assumptions. But I tested exactly that code a long time ago when Apple was still using GCC 2.x and there not using volatile really caused an endless loop that never terminated (yet only in release builds with optimizations enabled, not in debug builds). So I would not rely on the compiler being clever enough to do it right.
Just some more fun facts about memory barriers:
If you ever had a look at the atomic operations that Apple offers in <libkern/OSAtomic.h>, then you might have wondered why every operation exists twice: Once as x and once as xBarrier (e.g. OSAtomicAdd32 and OSAtomicAdd32Barrier). Well, now you finally know it. The one with "Barrier" in its name is a memory barrier, the other one isn't.
Memory barriers are not just for compilers, they are also for CPUs (there exists CPU instructions, that are considered memory barriers while normal instructions are not). The CPU needs to know these barriers because CPUs like to reorder instructions to perform operations out of order. E.g. if you do
a = x + 3 // (1)
b = y * 5 // (2)
c = a + b // (3)
and the pipeline for additions is busy, but the pipeline for multiplication is not, the CPU may perform instruction (2) before (1), after all the order won't matter in the end. This prevents a pipeline stall. Also the CPU is clever enough to know that it cannot perform (3) before either (1) or (2) because the result of (3) depends on the results of the other two calculations.
Yet, certain kinds of order changes will break the code, or the intention of the programmer. Consider this example:
x = y + z // (1)
a = 1 // (2)
The addition pipe might be busy, so why not just perform (2) before (1)? They don't depend on each other, the order shouldn't matter, right? Well, it depends. Consider another thread monitors a for changes and as soon as a becomes 1, it reads the value of x, which should now be y+z if the instructions were performed in order. Yet if the CPU reordered them, then x will have whatever value it used to have before getting to this code and this makes a difference as the other thread will now work with a different value, not the value the programmer would have expected.
So in this case the order will matter and that's why barriers are needed also for CPUs: CPUs don't order instructions across such barriers and thus instruction (2) would need to be a barrier instruction (or there needs to be such an instruction between (1) and (2); that depends on the CPU). However, reordering instructions is only performed by modern CPUs, a much older problem are delayed memory writes. If a CPU delays memory writes (very common for some CPUs, as memory access is horribly slow for a CPU), it will make sure that all delayed writes are performed and have completed before a memory barrier is crossed, so all memory is in a correct state in case another thread might now access it (and now you also know where the name "memory barrier" actually comes from).
You are probably working a lot more with memory barriers than you are even aware of (GCD - Grand Central Dispatch is full of these and NSOperation/NSOperationQueue bases on GCD), that's why your really need to use volatile only in very rare, exceptional cases. You might get away writing 100 apps and never have to use it even once. However, if you write a lot low level, multi-threading code that aims to achieve maximum performance possible, you will sooner or later run into a situation where only volatile can grantee you correct behavior; not using it in such a situation will lead to strange bugs where loops don't seem to terminate or variables simply seem to have incorrect values and you find no explanation for that. If you run into bugs like these, especially if you only see them in release builds, you might miss a volatile or a memory barrier somewhere in your code.
A good explanation is given here: Understanding “volatile” qualifier in C
The volatile keyword is intended to prevent the compiler from applying any optimizations on objects that can change in ways that cannot be determined by the compiler.
Objects declared as volatile are omitted from optimization because their values can be changed by code outside the scope of current code at any time. The system always reads the current value of a volatile object from the memory location rather than keeping its value in temporary register at the point it is requested, even if a previous instruction asked for a value from the same object. So the simple question is, how can value of a variable change in such a way that compiler cannot predict. Consider the following cases for answer to this question.
1) Global variables modified by an interrupt service routine outside the scope: For example, a global variable can represent a data port (usually global pointer referred as memory mapped IO) which will be updated dynamically. The code reading data port must be declared as volatile in order to fetch latest data available at the port. Failing to declare variable as volatile, the compiler will optimize the code in such a way that it will read the port only once and keeps using the same value in a temporary register to speed up the program (speed optimization). In general, an ISR used to update these data port when there is an interrupt due to availability of new data
2) Global variables within a multi-threaded application: There are multiple ways for threads communication, viz, message passing, shared memory, mail boxes, etc. A global variable is weak form of shared memory. When two threads sharing information via global variable, they need to be qualified with volatile. Since threads run asynchronously, any update of global variable due to one thread should be fetched freshly by another consumer thread. Compiler can read the global variable and can place them in temporary variable of current thread context. To nullify the effect of compiler optimizations, such global variables to be qualified as volatile
If we do not use volatile qualifier, the following problems may arise
1) Code may not work as expected when optimization is turned on.
2) Code may not work as expected when interrupts are enabled and used.
volatile comes from C. Type "C language volatile" into your favourite search engine (some of the results will probably come from SO), or read a book on C programming. There are plenty of examples out there.
To make this language agnostic let's pseudo code something along the lines of:
for(int i=0;i<=N;i++){
double d=0;
userDefinedObject o=new userDefinedObject();
//effectively do something useful
o.destroy();
}
Now, this may get into deeper details between Java/C++/Python etc, but:
1 - Is doing this with primitives wrong or just sort of ugly/overkill (d could be defined above, and set to 0 in each iteration if need be).
2 - Is doing this with an object actually wrong? Now,I know Java will take care of the memory but for C++ let's assume we have a proper destructor that we call.
Now - the question is quite succinct - is this wrong or just a matter of taste?
Thank you.
Java Garbage Collector will take care of any memory allocation without a reference, which means that if you instantiate on each iteration, you will allocate new memory and lose the reference to the previous one. Said this, you can conclude that the GC will take care of the non-referenced memory, BUT you also have to consider the fact that memory allocation, specifically object initialization takes time and process. So, if you do this on a small program, you're probably not going to feel anything wrong. But let say you're working with something like Bitmap, the allocation will totally own your memory.
For both cases, I'd say it is a matter of taste, but in a real life project, you should be totally sure that you need to initialize within a loop
I have been designing a component-based game library, with the overall intention of writing it in C++ (as that is my forte), with Ogre3D as the back-end. Now that I am actually ready to write some code, I thought it would be far quicker to test out my framework under the XNA4.0 framework (somewhat quicker to get results/write an editor, etc). However, whilst I am no newcomer to C++ or C#, I am a bit of a newcomer when it comes to doing things the "XNA" way, so to speak, so I had a few queries before I started hammering out code:
I read about using arrays rather than collections to avoid performance hits, then also read that this was not entirely true and that if you enumerated over, say, a concrete List<> collection (as opposed to an IEnumerable<>), the enumerator is a value-type that is used for each iteration and that there aren't any GC worries here. The article in question was back in 2007. Does this hold true, or do you experienced XNA developers have real-world gotchas about this? Ideally I'd like to go down a chosen route before I do too much.
If arrays truly are the way to go, no questions asked, I assume when it comes to resizing the array, you copy the old one over with new space? Or is this off the mark? Do you attempt to never, ever resize an array? Won't the GC kick in for the old one if this is the case, or is the hit inconsequential?
As the engine was designed for C++, the design allows for use of lambdas and delegates. One design uses the fastdelegate library which is the fastest possible way of using delegates in C++. A more flexible, but slightly slower approach (though hardly noticeable in the world of C++) is to use C++0x lambdas and std::function. Ideally, I'd like to do something similar in XNA, and allow delegates to be used. Does the use of delegates cause any significant issues with regard to performance?
If there are performance considerations with regards to delegates, is there a difference between:
public void myDelegate(int a, int b);
private void myFunction(int a, int b)
{
}
event myDelegate myEvent;
myEvent += myFunction;
vs:
public void myDelegate(int a, int b);
event myDelegate myEvent;
myEvent += (int a, int b) => { /* ... */ };
Sorry if I have waffled on a bit, I prefer to be clear in my questions. :)
Thanks in advance!
Basically the only major performance issue to be aware of in C# that is different to what you have to be aware of in C++, is the garbage collector. Simply don't allocate memory during your main game loop and you'll be fine. Here is a blog post that goes into detail.
Now to your questions:
1) If a framework collection iterator could be implemented as a value-type (not creating garbage), then it usually (always?) has been. You can safely use foreach on, for example, List<>.
You can verify if you are allocating in your main loop by using the CLR Profiler.
2) Use Lists instead of arrays. They'll handle the resizing for you. You should use the Capacity property to pre-allocate enough space before you start gameplay to avoid GC issues. Using arrays you'd just have to implement all this functionality yourself - ugly!
The GC kicks in on allocations (not when memory becomes free). On Xbox 360 it kicks in for every 1MB allocated and is very slow. On Windows it is a bit more complicated - but also doesn't have such a huge impact on performance.
3) C# delegates are pretty damn fast. And faster than most people expect. They are about on-par with method calls on interfaces. Here and here are questions that provide more detials about delegate performance in C#.
I couldn't say how they compare to the C++ options. You'd have to measure it.
4) No. I'm fairly sure this code will produce identical IL. You could disassemble it and check, or profile it, though.
I might add - without checking myself - I suspect that having an event myDelegate will be slower than a plain myDelegate if you don't need all the magic of event.
Warning! possibly a very dumb question
Does functional programming eat up more memory than procedural programming?
I mean ... if your objects(data structures whatever) are all imutable. Don't you end up having more object in the memory at a given time.
Doesn't this eat up more memory?
It depends on what you're doing. With functional programming you don't have to create defensive copies, so for certain problems it can end up using less memory.
Many functional programming languages also have good support for laziness, which can further reduce memory usage as you don't create objects until you actually use them. This is arguably something that's only correlated with functional programming rather than a direct cause, however.
Persistent values, that functional languages encourage but which can be implemented in an imperative language, make sharing a no-brainer.
Although the generally accepted idea is that with a garbage collector, there is some amount of wasted space at any given time (already unreachable but not yet collected blocks), in this context, without a garbage collector, you end up very often copying values that are immutable and could be shared, just because it's too much of a mess to decide who is responsible for freeing the memory after use.
These ideas are expanded on a bit in this experience report which does not claim to be an objective study but only anecdotal evidence.
Apart from avoiding defensive copies by the programmer, a very smart implementation of pure functional programming languages like Haskell or Standard ML (which lack physical pointer equality) can actively recover sharing of structurally equal values in memory, e.g. as part of the memory management and garbage collection.
Thus you can have automatic hash consing provided by your programming language runtime-system.
Compare this with objects in Java: object identity is an integral part of the language definition. Even just exchanging one immutable String for another poses semantic problems.
There is indeed at least a tendency to regard memory as affluent ressource (which, in fact, it really is in most cases), but this applies to modern programming as a whole.
With multiple cores, parallel garbage collectors and available RAM in the gigabytes, one used to concentrate on different aspects of a program than in earlier times, when every byte one could save counted. Remember when Bill Gates said "640K should be enough for every program"?
I know that I'm a lot late on this question.
Functional languages does not in general use more memory than imperative or OO languages. It depends more on the code you write. Yes F#, SML, Haskell and such has immutable values (not variables), but for all of them it goes without saying that if you update f.x. a single linked list, it re-compute only what is necessary.
Say you got a list of 5 elements, and you are removing the first 3 and adding a new one in front of it. it will simply get the pointer that points to the fourth element and let the new list point to that point of data i.e. reusing data. as seen below.
old list
[x0,x1,x2]
\
[x3,x4]
new list /
[y0,y1]
If it was an imperative language we could not do this because the values x3 and x4 could very well change over time, the list [x3,x4] could change too. Say that the 3 elements removed are not used afterward, the memory they use can be cleaned up right away, in contrast to unused space in an array.
That all data are immutable (except IO) are a strength. It simplifies the data flow analysis from a none trivial computation to a trivial one. This combined with a often very strong type system, will give the compiler a bunch of information about the code it can use to do optimization it normally could not do because of indicability. Most often the compiler turn values that are re-computed recursively and discarded from each iteration (recursion) into a mutable computation. These two things gives you the proof that if your program compile it will work. (with some assumptions)
If you look at the language Rust (not functional) just by learning about "borrow system" you will understand more about how and when things can be shared safely. it is a language that is painful to write code in unless you like to see your computer scream at you that your are an idiot. Rust is for the most part the combination of all the study made of programming language and type theory for more than 40 years. I mention Rust, because it despite the pain of writing in it, has the promise that if your program compile, there will be NO memory leaking, dead locking, dangling pointers, even in multi processing programs. This is because it uses much of the research of functional programming language that has been done.
For a more complex example of when functional programming uses less memory, I have made a lexer/parser interpreter (the same as generator but without the need to generate a code file) when computing the states of the DFA (deterministic finite automata) it uses immutable sets, because it compute new sets of already computed sets, my code allocate less memory simply because it borrow already known data points instead of copying it to a new set.
To wrap it up, yes functional programming can use more memory than imperative once. Most likely it is because you are using the wrong abstraction to mirror the problem. i.e. If you try to do it the imperative way in a functional language it will hurt you.
Try this book, it has not much on memory management but is a good book to start with if you will learn about compiler theory and yes it is legal to download. I have ask Torben, he is my old professor.
http://hjemmesider.diku.dk/~torbenm/Basics/
I'll throw my hat in the ring here. The short answer to the question is no, and this is because immutability does not mean the same thing as stored in memory. For example, let's take this toy program :
x = 2
x = x * 3
x = x * 2
print(x)
Which uses mutation to compute new values. Compare this to the same program which does not use mutation:
x = 2
y = x * 3
z = y * 2
print(z)
At first glance, it appears this requires 3x the memory of the first program! However, just because a value is immutable doesn't mean it needs to be stored in memory. In the case of the second program, after y is computed, x is no longer necessary, because it isn't used for the rest of the program, and can be garbage collected, or removed from memory. Similarly, after z is computed, y can be garbage collected. So, in principle, with a perfect garbage collector, after we execute the third line of code, I only need to have stored z in memory.
Another oft-worried about source of memory consumption in functional languages is deep recursion. For example, calculating a large Fibonacci number.
calc_fib(x):
if x > 1:
return x * calc_fib(x-1)
else:
return x
If I run calc_fib(100000), I could implement this in a way which requires storing 100000 values in memory, or I could use Tail-Call Elimination (basically storing only the most-recently computed value in memory instead of all function calls). For less straightforward recursion you can resort to trampolining. So for functional languages which support this, recursion does not need to be a source of massive memory consumption, either. However, not all nominally functional languages do (for example, JavaScript does not).