According to this, OCaml's memory is contained in two contiguous chunks of virtual memory managed by the OCaml runtime. I would like to know why this is necessary. Couldn't OCaml simply use the system malloc to allocate memory, and use these heaps only to store the block headers and pointers to each object's actual home in memory? This seems like a reinvention of the wheel when the operating system can do so much of the work instead.
I would also like to know why OCaml must allocate an entirely new major heap during the compact phase of garbage collection (source here), e.g.:
In the following rough sketch, let the letters A-D represent equally-sized pieces of OCaml blocks, and let . represent a freed space of the same size. From what I am given to understand, the OCaml garbage collector would "compact" this major heap:
[AAABB..CCCCCC.....DDD....]
By allocating a new major heap:
[.........................]
And copying the still-live blocks into it, before freeing the original heap:
[AAABBCCCCCCDDD...........]
How is this more efficient than simply rearranging those blocks inside the original heap? In the example above, a contiguity check could have avoided having to move blocks A and B at all, and in any case, how is it more efficient to ask the operating system to always allocate an entire new major heap?
Every garbage collector handles its own heap for performance reasons. malloc is quite slow; compared to OCaml's minor-heap allocation it is orders of magnitude slower (an allocation in the minor heap takes just two assembly instructions).
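For a feel of why a dedicated heap is so cheap, here is a conceptual OCaml sketch of bump-pointer allocation (my illustration; the real minor-heap allocator lives inside the runtime as C/assembly): allocating is just "move a pointer and check a bound", with no free lists involved.

(* stands in for the minor heap: one contiguous arena and a pointer *)
let arena = Bytes.create (256 * 1024)
let top = ref (Bytes.length arena)

(* reserve [size] bytes by bumping the pointer; returns the offset of
   the fresh block, or None when the arena is exhausted (which is when
   a minor GC would run) *)
let alloc size =
  let candidate = !top - size in
  if candidate < 0 then None
  else begin
    top := candidate;
    Some candidate
  end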
As for compaction: no, it does not reallocate a new heap.
Warning! possibly a very dumb question
Does functional programming eat up more memory than procedural programming?
I mean... if your objects (data structures, whatever) are all immutable, don't you end up having more objects in memory at a given time?
Doesn't this eat up more memory?
It depends on what you're doing. With functional programming you don't have to create defensive copies, so for certain problems it can end up using less memory.
Many functional programming languages also have good support for laziness, which can further reduce memory usage as you don't create objects until you actually use them. This is arguably something that's only correlated with functional programming rather than a direct cause, however.
Persistent values, which functional languages encourage but which can also be implemented in an imperative language, make sharing a no-brainer.
Although it is generally accepted that with a garbage collector there is some amount of wasted space at any given time (blocks that are already unreachable but not yet collected), consider the alternative: without a garbage collector, you very often end up copying values that are immutable and could be shared, just because it's too much of a mess to decide who is responsible for freeing the memory after use.
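As a tiny OCaml illustration of that trade-off (my example, not from the report below): a function handed a mutable array must make a defensive copy to protect itself, while a function handed an immutable list can simply keep the value it was given.

let keep_mutable (a : int array) = Array.copy a   (* must copy: the caller may mutate a later *)
let keep_immutable (l : int list) = l             (* safe to share: the list can never change *)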
These ideas are expanded on a bit in this experience report which does not claim to be an objective study but only anecdotal evidence.
Apart from avoiding defensive copies by the programmer, a very smart implementation of pure functional programming languages like Haskell or Standard ML (which lack physical pointer equality) can actively recover sharing of structurally equal values in memory, e.g. as part of the memory management and garbage collection.
Thus you can have automatic hash consing provided by your programming language's runtime system.
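A minimal sketch of the idea in OCaml (illustrative only; real implementations hook this into memory management rather than an explicit table): an intern table maps each structurally distinct value to a single shared physical copy.

(* hypothetical intern table for strings *)
module StringHC = struct
  let table : (string, string) Hashtbl.t = Hashtbl.create 16
  let intern s =
    match Hashtbl.find_opt table s with
    | Some shared -> shared                (* reuse the existing physical copy *)
    | None -> Hashtbl.add table s s; s     (* first time we see this value *)
end

let a = StringHC.intern (String.concat "" ["foo"; "bar"])
let b = StringHC.intern ("foob" ^ "ar")
(* a == b (physical equality) now holds, even though the two equal
   strings were built independently *)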
Compare this with objects in Java: object identity is an integral part of the language definition. Even just exchanging one immutable String for another poses semantic problems.
There is indeed at least a tendency to regard memory as an abundant resource (which, in fact, it really is in most cases), but this applies to modern programming as a whole.
With multiple cores, parallel garbage collectors and available RAM in the gigabytes, one tends to concentrate on different aspects of a program than in earlier times, when every byte one could save counted. Remember when Bill Gates supposedly said "640K should be enough for every program"?
I know I'm quite late to this question.
Functional languages do not, in general, use more memory than imperative or OO languages; it depends more on the code you write. Yes, F#, SML, Haskell and the like have immutable values (not variables), but for all of them it goes without saying that if you update, e.g., a singly linked list, only what is necessary is recomputed.
Say you have a list of 5 elements, and you remove the first 3 and add new ones in front of what remains. The new list simply takes the pointer to the fourth element and points to that piece of data, i.e. it reuses it, as seen below.
old list:  [x0, x1, x2] --\
                           +--> [x3, x4]
new list:      [y0, y1] --/
In an imperative language we could not do this, because the values x3 and x4 could very well change over time, and so could the list [x3,x4]. And if the 3 removed elements are not used afterwards, the memory they occupy can be reclaimed right away, in contrast to unused slots in an array.
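In OCaml this sharing falls out for free (a small illustrative example, not from the original answer):

let old_list = [1; 2; 3; 4; 5]
(* dropping the first three elements just returns a pointer to the
   existing suffix [4; 5]; nothing is copied *)
let suffix = List.tl (List.tl (List.tl old_list))
(* cons two new elements in front; the cells holding 4 and 5 are now
   shared between old_list and new_list *)
let new_list = 10 :: 20 :: suffix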
That all data (except IO) are immutable is a strength. It turns data-flow analysis from a non-trivial computation into a trivial one. Combined with an often very strong type system, this gives the compiler a lot of information about the code, which it can use for optimizations it normally could not perform safely. Most often the compiler turns values that are recomputed recursively and discarded at each iteration into a mutable computation. These two things give you considerable assurance that if your program compiles, it will work (with some assumptions).
Look at the language Rust (not functional): just by learning about its borrow system you will understand more about how and when things can be shared safely. It is a language that is painful to write code in unless you like seeing your computer scream at you that you are an idiot. Rust is, for the most part, the distillation of more than 40 years of study of programming languages and type theory. I mention Rust because, despite the pain of writing in it, it promises that if your program compiles there will be no dangling pointers or data races, even in concurrent programs. This is because it draws on much of the research that has been done on functional programming languages.
For a more complex example of functional programming using less memory: I wrote a lexer/parser interpreter (the same as a generator, but without the need to emit a code file). When computing the states of the DFA (deterministic finite automaton) it uses immutable sets; because new sets are computed from already-computed sets, my code allocates less memory, simply because it borrows already-known data points instead of copying them into a new set.
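A small sketch of that effect with OCaml's persistent sets (my illustration; it is not the interpreter's actual code): adding an element builds a new set that reuses most of the old set's internal tree instead of copying every member.

module IntSet = Set.Make (Int)

let known_states = IntSet.of_list [1; 2; 3; 4; 5]
(* known_states is untouched; next_states shares most of its internal
   nodes with it rather than holding fresh copies of the elements *)
let next_states = IntSet.add 6 known_states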
To wrap it up: yes, functional programming can use more memory than imperative programming, but most likely that is because you are using the wrong abstraction to mirror the problem, i.e. if you try to do it the imperative way in a functional language, it will hurt you.
Try this book; it doesn't have much on memory management, but it is a good book to start with if you want to learn about compiler theory. And yes, it is legal to download: I have asked Torben, he is my old professor.
http://hjemmesider.diku.dk/~torbenm/Basics/
I'll throw my hat in the ring here. The short answer to the question is no, and this is because being immutable is not the same thing as having to stay in memory. For example, let's take this toy program:
x = 2
x = x * 3
x = x * 2
print(x)
Which uses mutation to compute new values. Compare this to the same program which does not use mutation:
x = 2
y = x * 3
z = y * 2
print(z)
At first glance, it appears this requires 3x the memory of the first program! However, just because a value is immutable doesn't mean it needs to be stored in memory. In the case of the second program, after y is computed, x is no longer necessary, because it isn't used for the rest of the program, and can be garbage collected, or removed from memory. Similarly, after z is computed, y can be garbage collected. So, in principle, with a perfect garbage collector, after we execute the third line of code, I only need to have stored z in memory.
Another oft-worried-about source of memory consumption in functional languages is deep recursion, for example computing a large factorial.
calc_fact(x):
    if x > 1:
        return x * calc_fact(x-1)
    else:
        return x
If I run calc_fact(100000), I could implement this in a way that requires storing 100000 frames in memory, or I could use tail-call elimination (basically storing only the most recently computed value in memory instead of every pending call). For less straightforward recursion you can resort to trampolining. So for functional languages that support this, recursion does not need to be a source of massive memory consumption either. However, not all nominally functional languages do (JavaScript, for example, does not).
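For concreteness, here is the tail-recursive rewrite in OCaml, a language that does eliminate tail calls (my sketch, not from the answer above): the accumulator carries the most recently computed value, so the recursion runs in constant stack space.

(* tail-recursive factorial: the multiplication happens before the
   recursive call, so each call can reuse the same stack frame *)
let calc_fact x =
  let rec go acc n =
    if n > 1 then go (acc * n) (n - 1) else acc
  in
  go 1 x

With OCaml's native integers this overflows long before 100000; a real version would use an arbitrary-precision integer library. The point here is only that the stack usage stays constant.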
I have been using cl_mem in some of my OpenCL boilerplate code, but I have been using it from context rather than from a sharp understanding of what exactly it is. I have been using it as the type for the memory I push onto and off of the device, which so far has been floats. I tried looking at the OpenCL docs, but cl_mem doesn't show up (does it?). Is there any documentation on it, or is it simple enough that someone can explain?
The cl_mem type is a handle to a "Memory Object" (as described in Section 3.5 of the OpenCL 1.1 Spec). These essentially are inputs and outputs for OpenCL kernels, and are returned from OpenCL API calls in host code such as clCreateBuffer
cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags,
size_t size, void *host_ptr, cl_int *errcode_ret)
The memory areas represented can be given different access permissions, e.g. read-only, or be allocated in different memory regions, depending on the flags set in the buffer-creation calls.
The handle is typically stored to allow a later call to release the memory, e.g:
cl_int clReleaseMemObject (cl_mem memobj)
In short, it provides an abstraction over where the memory actually is: you can copy data into the associated memory or back out via the OpenCL APIs clEnqueueWriteBuffer and clEnqueueReadBuffer, but the OpenCL implementation can allocate the space where it wants.
To the computer, a cl_mem is just a number (like a file handle on Linux) reserved for use as a "memory identifier": the API/driver stores information about your memory under this number, so it knows what the buffer holds, how big it is, and so on.
I've heard of stackless languages. However I don't have any idea how such a language would be implemented. Can someone explain?
The modern operating systems we have (Windows, Linux) operate with what I call the "big stack model". And that model is wrong, sometimes, and motivates the need for "stackless" languages.
The "big stack model" assumes that a compiled program will allocate "stack frames" for function calls in a contiguous region of memory, using machine instructions to adjust registers containing the stack pointer (and optional stack frame pointer) very rapidly. This leads to fast function call/return, at the price of having a large, contiguous region for the stack. Because 99.99% of all programs run under these modern OSes work well with the big stack model, the compilers, loaders, and even the OS "know" about this stack area.
One common problem all such applications have is, "how big should my stack be?". With memory being dirt cheap, mostly what happens is that a large chunk is set aside for the stack (MS defaults to 1Mb), and typical application call structure never gets anywhere near to using it up. But if an application does use it all up, it dies with an illegal memory reference ("I'm sorry Dave, I can't do that"), by virtue of reaching off the end of its stack.
Most so-called "stackless" languages aren't really stackless. They just don't use the contiguous stack provided by these systems. What they do instead is allocate a stack frame from the heap on each function call. The cost per function call goes up somewhat; if functions are typically complex, or the language is interpretive, this additional cost is insignificant. (One can also determine call DAGs in the program call graph and allocate a heap segment to cover the entire DAG; this way you get both heap allocation and the speed of classic big-stack function calls for all calls inside the call DAG.)
There are several reasons for using heap allocation for stack frames:
1) The program does deep recursion that depends on the specific problem it is solving. It is very hard to preallocate a "big stack" area in advance because the needed size isn't known. One can awkwardly arrange for function calls to check whether there is enough stack left and, if not, reallocate a bigger chunk, copy the old stack and readjust all the pointers into it; that's so awkward that I don't know of any implementations. Allocating stack frames from the heap means the application never has to say it's sorry until there is literally no allocatable memory left.
2) The program forks subtasks. Each subtask requires its own stack, and therefore can't use the one "big stack" provided. So, one needs to allocate stacks for each subtask. If you have thousands of possible subtasks, you might now need thousands of "big stacks", and the memory demand suddenly gets ridiculous. Allocating stack frames solves this problem. Often the subtask "stacks" refer back to the parent tasks to implement lexical scoping; as subtasks fork, a tree of "substacks" is created, called a "cactus stack".
3) Your language has continuations. These require that the data in lexical scope visible to the current function somehow be preserved for later reuse. This can be implemented by copying parent stack frames, climbing up the cactus stack, and proceeding.
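As a rough OCaml illustration of the heap-frame idea (my sketch, not PARLANSE or any particular runtime): in continuation-passing style every pending step is a heap-allocated closure and every call is a tail call, so no contiguous native stack is consumed.

(* continuation-passing factorial: the work still to be done after each
   recursive call is captured in the closure k, which lives on the heap *)
let rec fact_k n k =
  if n <= 1 then k 1
  else fact_k (n - 1) (fun r -> k (n * r))

let () = Printf.printf "%d\n" (fact_k 10 (fun r -> r))   (* prints 3628800 *)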
The PARLANSE programming language I implemented does 1) and 2). I'm working on 3). It is amusing to note that PARLANSE allocates stack frames from a very fast-access heap-per-thread; it costs typically 4 machine instructions. The current implementation is x86 based, and the allocated frame is placed in the x86 EBP/ESP registers much like other conventional x86 based language implementations. So it does use the hardware "contiguous stack" (including pushing and popping), just in chunks. It also generates "frame local" subroutine calls that don't switch stacks for lots of generated utility code where the stack demand is known in advance.
Stackless Python still has a Python stack (though it may have tail call optimization and other call frame merging tricks), but it is completely divorced from the C stack of the interpreter.
Haskell (as commonly implemented) does not have a call stack; evaluation is based on graph reduction.
There is a nice article about the language framework Parrot. Parrot does not use the stack for calling and this article explains the technique a bit.
In the stackless environments I'm more or less familiar with (Turing machine, assembly, and Brainfuck), it's common to implement your own stack. There is nothing fundamental about having a stack built into the language.
In the most practical of these, assembly, you just choose a region of memory available to you, set the stack register to point to the bottom, then increment or decrement to implement your pushes and pops.
EDIT: I know some architectures have dedicated stacks, but they aren't necessary.
Call me ancient, but I can remember when the FORTRAN standards and COBOL did not support recursive calls, and therefore didn't require a stack. Indeed, I recall the implementations for CDC 6000 series machines where there wasn't a stack, and FORTRAN would do strange things if you tried to call a subroutine recursively.
For the record, instead of a call stack, the CDC 6000 series instruction set used the RJ instruction to call a subroutine. This saved the current PC value at the call target location and then branched to the location following it. At the end, the subroutine would perform an indirect jump to the call target location, which reloaded the saved PC, effectively returning to the caller.
Obviously, that does not work with recursive calls. (And my recollection is that the CDC FORTRAN IV compiler would generate broken code if you did attempt recursion ...)
There is an easy to understand description of continuations on this article: http://www.defmacro.org/ramblings/fp.html
Continuations are something you can pass into a function in a stack-based language, but which can also be used by a language's own semantics to make it "stackless". Of course the stack is still there, but as Ira Baxter described, it's not one big contiguous segment.
Say you wanted to implement stackless C. The first thing to realize is that this doesn't need a stack:
a == b
But, does this?
isequal(a, b) { return a == b; }
No. Because a smart compiler will inline calls to isequal, turning them into a == b. So, why not just inline everything? Sure, you will generate more code but if getting rid of the stack is worth it to you then this is easy with a small tradeoff.
What about recursion? No problem. A recursive function like:
bang(x) { return x == 1 ? 1 : x * bang(x-1); }
Can still be inlined, because really it's just a for loop in disguise:
bang(x) {
    for(int i = x; i > 1; i--)
        x *= i - 1;
    return x;
}
In theory a really smart compiler could figure that out for you. But a less-smart one could still flatten it as a goto:
ax = x;
NOTDONE:
if(ax > 1) {
    x = x * (--ax);
    goto NOTDONE;
}
There is one case where you have to make a small trade off. This can't be inlined:
fib(n) { return n <= 2 ? n : fib(n-1) + fib(n-2); }
Stackless C simply cannot do this. Are you giving up a lot? Not really; this is something normal C can't do very well either. If you don't believe me, just call fib(1000) and see what happens to your precious computer.
Please feel free to correct me if I'm wrong, but I would think that allocating memory on the heap for each function call frame would cause extreme memory thrashing. The operating system does, after all, have to manage this memory. I would think that the way to avoid this memory thrashing would be a cache for call frames. So if you need a cache anyway, we might as well make it contiguous in memory and call it a stack.
I've been looking into how programming languages work, and some of them use so-called virtual machines. I understand that this is some form of emulation of the programming language within another programming language, and that it works the way a compiled language would be executed, with a stack. Did I get that right?
With the proviso that I did, what bamboozles me is that many non-compiled languages allow variables with "liberal" type systems. In Python for example, I can write this:
x = "Hello world!"
x = 2**1000
Strings and big integers are completely unrelated and occupy different amounts of space in memory, so how can this code even be represented in a stack-based environment? What exactly happens here? Is x pointed to a new place on the stack and the old string data left unreferenced? Do these languages not use a stack? If not, how do they represent variables internally?
Probably your question should be titled "How do dynamic languages work?"
That's simple: they store the type information along with the value in memory. And this is done not only in interpreted or JIT-compiled languages but also in natively compiled languages such as Objective-C.
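As a toy model in OCaml (my sketch, not any particular interpreter): each value carries a type tag, and the variable is just a binding that can later be pointed at a value with a different tag.

(* a dynamically typed "variable" modelled as a mutable reference to a
   tagged value; a real implementation would also cover big integers,
   floats, objects, and so on *)
type value =
  | Str of string
  | Int of int

let describe v =
  match v with
  | Str s -> "a string: " ^ s
  | Int n -> "an int: " ^ string_of_int n

let x = ref (Str "Hello world!")
let () = x := Int 1024            (* rebinding: the old string becomes garbage *)
let () = print_endline (describe !x)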
In most VM languages, variables can be conceptualized as pointers (or references) to memory in the heap, even if the variable itself is on the stack. For languages that have primitive types (int and bool in Java, for example) those may be stored on the stack as well, but they can not be assigned new types dynamically.
Ignoring primitive types, all variables that exist on the stack have their actual values stored in the heap. Thus, if you dynamically reassign a value to them, the original value is abandoned (and the memory cleaned up via some garbage collection algorithm), and the new value is allocated in a new bit of memory.
The VM has nothing to do with the language. Any language can run on top of a VM (hundreds of languages already run on the Java VM).
A VM enables a different kind of "assembly language" to be run, one that is more fit to adapting a compiler to. Everything done in a VM could be done in a CPU, so think of the VM like a CPU. (Some actually are implemented in hardware).
It's extremely low level, and in many cases heavily stack based: instead of registers, machine-level math all operates on locations relative to the current stack pointer.
With normal compiled languages, many instructions are required for a single step. A + might look like: grab an item at some offset from the stack pointer into register A, grab another into register B, add A and B, then put A back at an offset from the stack pointer.
The VM does all this with a single, short instruction, possibly one or two bytes, instead of 4 or 8 bytes per instruction in machine language (depending on 32 or 64 bit architecture), which (guessing) should mean around 16 or 32 bytes of x86 for 1-2 bytes of VM code. (I could be wrong; my last x86 coding was in the 80286 era.)
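A toy stack machine in OCaml (illustrative only, not the JVM or any real bytecode format) shows the shape of this: one compact instruction per operation, with all arithmetic happening on the operand stack.

type instr = Push of int | Add | Mul

(* interpret a program by folding each instruction over the operand stack *)
let run program =
  let step stack instr =
    match instr, stack with
    | Push n, _ -> n :: stack
    | Add, a :: b :: rest -> (a + b) :: rest
    | Mul, a :: b :: rest -> (a * b) :: rest
    | _ -> failwith "stack underflow"
  in
  List.fold_left step [] program

(* computes (2 + 3) * 4 *)
let () =
  match run [Push 2; Push 3; Add; Push 4; Mul] with
  | result :: _ -> Printf.printf "%d\n" result   (* prints 20 *)
  | [] -> ()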
Microsoft used (probably still uses) VMs in their office products to reduce the amount of code.
The procedure for creating the VM code is the same as creating machine language, just a different processor type essentially.
VMs can also implement their own security, error recovery and memory mechanisms that are very tightly related to the language.
Some of my description here is summary and from memory. If you want to explore the bytecode definition yourself, it's kinda fun:
http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html
The key to many of the "how do VMs handle variables like this or that" questions really comes down to metadata. The meta information stored, and then updated, gives the VM a much better handle on how to allocate and then do the right thing with variables.
In many cases this is the kind of overhead that can really get in the way of performance. However, modern implementations have come a long way in doing the right thing.
As for your specific question: treating variables as vanilla objects comes down to reassigning and re-evaluating the meta information on each new assignment; that's why x can look like one thing and then another.
To answer part of your question, I'd recommend a Google tech talk about Python, where some of your questions concerning dynamic languages are answered, for example what a variable is (it is not a pointer, nor a reference, but in the case of Python, a label).