I get that a flex buffer stack, or a stack in general, likes to abide by LIFO (Last In? First Out!). And I see in the manual that there are two "intended" ways to go about buffer selection: manually allocate and choose and free, or abide by the LIFO behavior of yypush_buffer_state() and yypop_buffer_state().
But my reentrant lexer would do well to yylex its buffers in this order:
[3] pushed from [2] pushed from [1] pushed from [0]
[2]
[1]
[0]
[3] the same one from before...
[2]
[1]
[0]
[3]
[2]
[1]
[0]
(pop pop pop)
[0] which was [3] a moment ago.
I thought I would be cheeky and simply yy_switch_to_buffer() a lower buffer in the stack, and make use of the existing push function for automatic allocation. But it turns out that function internally refers to the YY_CURRENT_BUFFER macro, which, when a stack exists, expands to:
((struct yyguts_t*)yyscanner)->yy_buffer_stack[
((struct yyguts_t*)yyscanner)->yy_buffer_stack_top]
And it will become confused when we try to switch to buffers lower on the stack than the top. I guess it was a very strongly worded "intended"!
I have found that decrementing yy_buffer_stack_top and then calling yy_switch_to_buffer(((struct yyguts_t*)yyscanner)->yy_buffer_stack[((struct yyguts_t*)yyscanner)->yy_buffer_stack_top], yyscanner) seems to work. (I'd use YY_CURRENT_BUFFER, but it requires an internal yyg definition.) I can even increment the stack top back to its maximum value and return to the untouched buffer that went unacknowledged for a while.
Is this a poor way to go about it? Am I going to run into any problems other than: I must not yypush_buffer_state() or yypop_buffer_state() without first resetting yy_buffer_stack_top? I can manage that...
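To make the hack concrete, here's roughly what I'm doing (a sketch against flex's private internals; struct yyguts_t is only visible inside the generated scanner, so this has to live in the .l file's code section, and the field names assume the default yyguts_t layout):

static void yy_descend_buffer(yyscan_t yyscanner)
{
    struct yyguts_t *yyg = (struct yyguts_t *)yyscanner;
    if (yyg->yy_buffer_stack_top > 0) {
        /* Step down without popping; the buffer above stays allocated. */
        yyg->yy_buffer_stack_top--;
        /* YY_CURRENT_BUFFER would resolve correctly now, but it needs a
         * local yyg definition, so index the stack directly instead. */
        yy_switch_to_buffer(
            yyg->yy_buffer_stack[yyg->yy_buffer_stack_top], yyscanner);
    }
}

Incrementing yy_buffer_stack_top back up (and switching again) returns to the buffer that was left untouched above.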
Related
Why is the stack pointer initialized to the maximum value?
I only know that it is a small register which stores the address of the last program request on a stack. It is a particular kind of buffer that stores information top-down. But can anyone explain why it initially holds the maximum value?
Without more detail on the actual microprocessor, there isn't a single exact answer; in general, each architecture handles stack pointer initialization a bit differently. For example, the version of ARM used in many microprocessors initializes the stack pointer (also R13) from the first entry of the vector table. Other architectures initialize the register to 0 or some other value, so it's not always all 1s as you mention in your question. If the hardware itself doesn't initialize the stack pointer to somewhere meaningful, one of the first instructions usually does. And usually this value is near or at the top of memory: as you mention, the stack typically grows down from higher addresses to lower ones, so a value of all 1s or some other large value might make sense depending on how memory is laid out and managed.
One thing also worth mentioning: you say the stack stores the "last program request's address", which, if I understand correctly, is only one of the things the stack stores. In most architectures, the stack can store much more than just the return address of a call: local variables, saved context when a call is made or a context is swapped (either by an OS or by an exception/interrupt), and anything else the program might want to push onto it.
So the short answer is: it isn't always set to the maximum value but it's usually set to some high value as the stack will grow down to lower addresses as things like data and addresses are pushed on it.
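To make the ARM example concrete, a minimal Cortex-M sketch (symbol names such as _estack are linker-script conventions I'm assuming here, not fixed by the architecture):

#include <stdint.h>

extern uint32_t _estack;          /* top of RAM, defined in the linker script */
void Reset_Handler(void);

/* On Cortex-M, the hardware loads SP from the first vector-table entry
 * at reset, before the first instruction runs; the second entry is the
 * reset vector. A real table continues with NMI, HardFault, etc. */
__attribute__((section(".isr_vector")))
const void *const vector_table[] = {
    &_estack,        /* initial stack pointer: a high address, stack grows down */
    Reset_Handler,   /* first code to execute */
};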
How to implement copy-on-write technique for stack management in postfix calculations of big numbers?
I want to optimize my system regarding operations like:
duplicate top of stack
swap the two top elements
copy the second element on stack to the top
rotate the stack making the third element on top
apply n-ary operations on the stack
A typical operation takes a number of parameters from the top of the stack and leaves a number of results in their place.
My stack is implemented as an array where data grows from low-mem to hi-mem and pointers from hi-mem to low-mem.
As I can see, the problem is not in the copy-on-write technique per se, but in memory management and garbage collecting.
In Forth we cannot have operator overloading and should have an explicitly separate set of operations for each class of numbers. Then we have two options:
A set of operations over bigint objects (their handles), with manual memory management, similar to operations over files, dynamic memory buffers, or other objects.
A full set of operations including the separate stack, like operations over floating-point numbers, with automatic memory management.
Note that option (2) contains (1) under the hood.
Reference counting
One approach to implementing automatic memory management in option (2) is to use a dedicated stack (or maybe two stacks) and reference counting. The stack contains references to objects (buffers). Operations that alter data (like b::1+) should make a copy of the buffer and replace the reference with a new one if the counter is greater than 1 (otherwise the data can be altered in place).
Placing an item (a reference) onto the stack should increase its counter; removing it from the stack should decrease its counter. When the counter becomes 0, the buffer should be freed.
Operations like b::dup and b::over increase the counter. b::swap doesn't change any counter. b::drop decreases the counter (and frees the buffer if the counter reaches 0).
Moving an item from the stack into a bigint variable should not change the counter in the result. But the counter for the previous value of the variable (if any) should be decreased.
If named variables and constants are not enough (e.g., to have user-defined arrays), you may need to introduce pointers (a kind of anonymous variable) or handles to bigint objects into the API.
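A minimal C sketch of this reference-counting scheme (the bigint_t layout, the fixed-size stack, and the function names are all illustrative assumptions, not an established API; error checks omitted):

#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t refs;              /* how many stack slots / variables point here */
    size_t len;
    unsigned char *digits;
} bigint_t;

#define BSTACK_MAX 64
static bigint_t *bstack[BSTACK_MAX];
static size_t bsp;            /* next free slot */

static void b_push(bigint_t *b) { b->refs++; bstack[bsp++] = b; }

static void b_drop(void)      /* b::drop */
{
    bigint_t *b = bstack[--bsp];
    if (--b->refs == 0) { free(b->digits); free(b); }
}

static void b_dup(void) { b_push(bstack[bsp - 1]); }  /* b::dup: counter++ */

static void b_swap(void)      /* b::swap: no counter changes */
{
    bigint_t *t = bstack[bsp - 1];
    bstack[bsp - 1] = bstack[bsp - 2];
    bstack[bsp - 2] = t;
}

/* Copy-on-write: call before an in-place mutation like b::1+ */
static bigint_t *b_mutable_top(void)
{
    bigint_t *b = bstack[bsp - 1];
    if (b->refs > 1) {        /* shared: copy, then replace the reference */
        bigint_t *c = malloc(sizeof *c);
        c->refs = 1;
        c->len = b->len;
        c->digits = malloc(b->len);
        memcpy(c->digits, b->digits, b->len);
        b->refs--;
        bstack[bsp - 1] = c;
        b = c;
    }
    return b;                 /* now safe to alter in place */
}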
Immutable objects
Another approach to implementing option (2) is to have immutable objects and a garbage collection loop. In this approach we don't need to count references, but we do need to maintain the list of external pointers (including variables).
A simple example of such garbage collection can be found in the implementation of s-expressions by Peter Sovietov: Функциональное программирование на языке Форт ("Functional Programming in the Forth Language", in Russian, but easily translated using Google Translate).
I have created several tasks whose body is the same function. Inside the function there is a delay that is the same for every task. When this delay is large enough, the stack of each task fills with 6 fewer words than when the delay is smaller.
As far as I understand, a task's stack usage increases when several tasks are in the Ready state at once.
1. In this situation, is some additional 6-word context written to the task stack?
2. In my example it comes to 6 words (24 bytes); can this value change somehow?
3. What else can affect the growth of the stack, such as jumping into an interrupt handler?
int main(void){
    xTaskCreate(Task_PrintCountString, "Task_1", mySTACK_SIZE, &param1, 1, NULL);
    xTaskCreate(Task_PrintCountString, "Task_2", mySTACK_SIZE, &param2, 1, NULL);
    xTaskCreate(Task_PrintCountString, "Task_3", mySTACK_SIZE, &param3, 1, NULL);
    vTaskStartScheduler();
}
#define myDELAY 100

void Task_PrintCountString(void *pParams){
    uint16_t c = 0;
    for(;;){
        if(xSemaphoreTake(WriteCountMutex, portMAX_DELAY) == pdTRUE){
            PrintCountString(*(uint8_t *)pParams, c++);
            xSemaphoreGive(WriteCountMutex);
        }
        vTaskDelay(myDELAY/portTICK_PERIOD_MS); // When myDELAY=1, the task stack holds 6 words more than when myDELAY=100!
    }
}
Address 0x20000158 is the top of the stack for one of the tasks. Similarly for the others!
The only function, PrintCountString, always runs at the same depth. Yet the stack can grow to its maximum value in several stages, over dozens of iterations of the task loop! It turns out that not only the context is saved to the stack, but something else as well?
P.S.
I use ARM CM0 port and heap_1.
I made an observation:
In the PrintCountString function there was a block waiting on the SPI flag - a while() loop. I switched to a DMA transfer method and removed the while() loop. The stacks of the tasks then filled immediately to their maximum value, regardless of delays.
FreeRTOS is just C code so, with one exception, the stack usage is determined by the compiler, including the selected optimisation level. The one exception is that when a task is not running its context is saved to its stack. The context size is fixed in all cases that matter. The size of the context depends on which FreeRTOS port you are using (there are more than 40).
There are a few things in your post I'm not sure about.
when this delay is large enough, the stack of each task fills with 6 fewer words than when the delay is smaller
As above, the C code doesn't change depending on how long a delay you have - the compiler knows nothing about FreeRTOS or the concept of a delay. You may see different stack depths depending on where the code was in the call tree when an interrupt occurs (including interrupts that cause context switches).
I look at the top of the stack and count the remaining number of bytes to be 0xA5
You don't say which port you are using, so I don't know if the stack grows up or down. In any case the stack is filled with 0xa5 when the task is created and never touched by the kernel again, so the number of 0xa5s left shows the minimum amount of stack that has ever been free, which in turn tells you the maximum amount of stack the task has used since it was created. That minimum-free value is returned by calling uxTaskGetStackHighWaterMark().
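For completeness, a minimal sketch of querying it (requires INCLUDE_uxTaskGetStackHighWaterMark set to 1 in FreeRTOSConfig.h; the threshold is an arbitrary example):

#include "FreeRTOS.h"
#include "task.h"

/* Can be called from any task: NULL queries the calling task. The
 * returned value is the smallest amount of stack that has ever been
 * free, in words, so (stack size - watermark) is the worst-case
 * usage so far. */
void vCheckOwnStack(void)
{
    UBaseType_t uxWatermark = uxTaskGetStackHighWaterMark(NULL);
    configASSERT(uxWatermark > 10);  /* trap if fewer than 10 words were ever spare */
}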
In general the stack memory is used by the microcontroller itself. There are several instructions which write/read data to/from the stack.
The PUSH instruction copies the content of one register to the stack and modifies the stack pointer register
The POP instruction reads a word from the stack into a register and modifies the stack pointer register
Both instructions are managed by the compiler. When a function is executed, the compiler emits some PUSH instructions to "free up" some registers to work with. At the end of the function, the original register state is restored using POP instructions to guarantee the upcoming control flow.
and each branch instruction stores several register values to the stack
The amount of used stack memory depends on the call hierarchy.
According to your example, if xSemaphoreTake() is called and finds a released semaphore, the call hierarchy is different from a situation where the semaphore is blocked. This creates different stack usage.
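A sketch of that effect with hypothetical functions (none of these are from the post): the high water mark only moves when the deeper path actually runs, which is why it can creep up over many iterations.

/* The deepest call path, and therefore the high water mark, is only
 * reached when the branch that takes it actually executes. */
static int deep_path(int x)
{
    volatile int scratch[8];  /* extra locals = a bigger stack frame */
    scratch[0] = x;
    return scratch[0] + 1;
}

static int shallow_path(int x) { return x + 1; }

int work(int contended)
{
    return contended ? deep_path(1)      /* deeper frames, more stack */
                     : shallow_path(1);  /* shallow path, less stack */
}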
Objects can be put on and removed only from the top of a stack. But what about reading and writing their values? Please correct me if I'm wrong, but I think the process must be able to read from any part of the stack, since if only reading from the top were possible it would have to remove (and store somewhere) the whole content of the stack above a variable it wants to examine. But in that case, how does the process know where exactly in the stack a particular variable is? I suspect it just holds a pointer to it, but where is that pointer stored?
Another thing - reading about stacks I often find phrases like "All memory allocated on the stack is known at compile time." Well, I probably misunderstand this, so please tell me where's the flaw in my logic:
Suppose a local variable is created when an if() statement is true, and isn't when it's false. Whether it's true will turn out at run time. So at compile time there's no way to know if it should be created, hence I wouldn't think memory for it is allocated at all, as it would be wasteful. Consequently, it isn't created/known at compile time.
At compile time, it's known how much space each type needs: an integer, for instance, is 4 bytes wide on 32-bit platforms, and a class with 2 integers consumes 8 bytes. Whether this space is allocated for a specific variable is not necessarily known (it may depend on an if, as you stated).
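A sketch of what a typical compiler does here (a common implementation strategy, not mandated by the C standard):

int f(int cond)
{
    /* A typical compiler reserves space for ALL locals in one
     * stack-pointer adjustment right here, at function entry... */
    if (cond) {
        int x = 42;   /* ...so x's slot exists even when cond is false; */
        return x;     /* the branch only decides whether it is written. */
    }
    return 0;
}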
When you invoke a method, all parameters and the return address are pushed onto the stack. To get one parameter, you walk up the stack to its position, which is computed from the base pointer and the size of each parameter.
So it is not entirely true for this stack that you can access the top element only. It is, however, for the Stack data structure.
I've heard of stackless languages. However I don't have any idea how such a language would be implemented. Can someone explain?
The modern operating systems we have (Windows, Linux) operate with what I call the "big stack model". And that model is wrong, sometimes, and motivates the need for "stackless" languages.
The "big stack model" assumes that a compiled program will allocate "stack frames" for function calls in a contiguous region of memory, using machine instructions to adjust registers containing the stack pointer (and optional stack frame pointer) very rapidly. This leads to fast function call/return, at the price of having a large, contiguous region for the stack. Because 99.99% of all programs run under these modern OSes work well with the big stack model, the compilers, loaders, and even the OS "know" about this stack area.
One common problem all such applications have is, "how big should my stack be?". With memory being dirt cheap, mostly what happens is that a large chunk is set aside for the stack (MS defaults to 1Mb), and typical application call structure never gets anywhere near to using it up. But if an application does use it all up, it dies with an illegal memory reference ("I'm sorry Dave, I can't do that"), by virtue of reaching off the end of its stack.
Most so-called "stackless" languages aren't really stackless. They just don't use the contiguous stack provided by these systems. What they do instead is allocate a stack frame from the heap on each function call. The cost per function call goes up somewhat; if functions are typically complex, or the language is interpretive, this additional cost is insignificant. (One can also determine call DAGs in the program call graph and allocate a heap segment to cover the entire DAG; this way you get both heap allocation and the speed of classic big-stack function calls for all calls inside the call DAG).
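A toy sketch of heap-allocated frames in C (illustrative only: real implementations add free lists, chunking, and the per-DAG optimisation above, and this toy still lets C's own stack drive control flow):

#include <stdlib.h>

/* Each activation gets its own heap block, linked back to its caller;
 * the "stack" becomes a linked list, and a tree once subtasks fork. */
struct frame {
    struct frame *caller;   /* dynamic link to the calling frame */
    int n;                  /* this activation's parameter */
};

static int fact(struct frame *caller, int n)
{
    struct frame *f = malloc(sizeof *f);   /* "push": allocate a frame */
    f->caller = caller;
    f->n = n;
    int r = (n <= 1) ? 1 : n * fact(f, n - 1);
    free(f);                               /* "pop": release the frame */
    return r;
}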
There are several reasons for using heap allocation for stack frames:
1. The program does deep recursion dependent on the specific problem it is solving. It is very hard to preallocate a "big stack" area in advance because the needed size isn't known. One can awkwardly arrange function calls to check whether there's enough stack left and, if not, reallocate a bigger chunk, copy the old stack, and readjust all the pointers into the stack; that's so awkward that I don't know of any implementations. Allocating stack frames means the application never has to say it's sorry until there's literally no allocatable memory left.
2. The program forks subtasks. Each subtask requires its own stack, and therefore can't use the one "big stack" provided. So, one needs to allocate stacks for each subtask. If you have thousands of possible subtasks, you might now need thousands of "big stacks", and the memory demand suddenly gets ridiculous. Allocating stack frames solves this problem. Often the subtask "stacks" refer back to the parent tasks to implement lexical scoping; as subtasks fork, a tree of "substacks" is created, called a "cactus stack".
3. Your language has continuations. These require that the data in lexical scope visible to the current function somehow be preserved for later reuse. This can be implemented by copying parent stack frames, climbing up the cactus stack, and proceeding.
The PARLANSE programming language I implemented does 1) and 2). I'm working on 3). It is amusing to note that PARLANSE allocates stack frames from a very fast-access heap-per-thread; it costs typically 4 machine instructions. The current implementation is x86 based, and the allocated frame is placed in the x86 EBP/ESP register much like other conventional x86 based language implementations. So it does use the hardware "contiguous stack" (including pushing and popping), just in chunks. It also generates "frame local" subroutine calls that don't switch stacks for lots of generated utility code where the stack demand is known in advance.
Stackless Python still has a Python stack (though it may have tail call optimization and other call frame merging tricks), but it is completely divorced from the C stack of the interpreter.
Haskell (as commonly implemented) does not have a call stack; evaluation is based on graph reduction.
There is a nice article about the language framework Parrot. Parrot does not use the stack for calling and this article explains the technique a bit.
In the stackless environments I'm more or less familiar with (Turing machine, assembly, and Brainfuck), it's common to implement your own stack. There is nothing fundamental about having a stack built into the language.
In the most practical of these, assembly, you just choose a region of memory available to you, set the stack register to point to the bottom, then increment or decrement to implement your pushes and pops.
EDIT: I know some architectures have dedicated stacks, but they aren't necessary.
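In C terms, the manual stack described above amounts to this sketch (a full-descending stack like the earlier SP-initialization answer describes; bounds checks omitted for brevity):

#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS 1024
static uint32_t stack_mem[STACK_WORDS];  /* the chosen region of memory */
static size_t sp = STACK_WORDS;          /* "stack register": starts past the end */

static void push(uint32_t v) { stack_mem[--sp] = v; }    /* decrement, then store */
static uint32_t pop(void)    { return stack_mem[sp++]; } /* load, then increment */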
Call me ancient, but I can remember when the FORTRAN standards and COBOL did not support recursive calls, and therefore didn't require a stack. Indeed, I recall the implementations for CDC 6000 series machines where there wasn't a stack, and FORTRAN would do strange things if you tried to call a subroutine recursively.
For the record, instead of a call stack, the CDC 6000 series instruction set used the RJ instruction to call a subroutine. This saved the current PC value at the call target location and then branched to the location following it. At the end, the subroutine would perform an indirect jump through the call target location, which reloaded the saved PC, effectively returning to the caller.
Obviously, that does not work with recursive calls. (And my recollection is that the CDC FORTRAN IV compiler would generate broken code if you did attempt recursion ...)
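A C sketch of the RJ convention (simulated word-addressed memory; names and layout are purely illustrative, not CDC assembly):

/* A subroutine occupies words SUB..SUB+n: word SUB holds the saved
 * return PC, execution starts at SUB+1, and "return" is an indirect
 * jump through word SUB. */
static int memory[64];
enum { SUB = 10 };                /* entry word of the subroutine */

static int rj_call(int return_pc)
{
    memory[SUB] = return_pc;      /* RJ: plant the caller's PC at the target */
    /* ... subroutine body (words SUB+1 onward) executes here ... */
    return memory[SUB];           /* jump indirect through the saved PC */
}
/* A recursive RJ to SUB would overwrite memory[SUB], losing the outer
 * return address -- which is exactly why recursion broke. */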
There is an easy-to-understand description of continuations in this article: http://www.defmacro.org/ramblings/fp.html
Continuations are something you can pass into a function in a stack-based language, but which can also be used by a language's own semantics to make it "stackless". Of course the stack is still there, but as Ira Baxter described, it's not one big contiguous segment.
Say you wanted to implement stackless C. The first thing to realize is that this doesn't need a stack:
a == b
But, does this?
isequal(a, b) { return a == b; }
No. Because a smart compiler will inline calls to isequal, turning them into a == b. So, why not just inline everything? Sure, you will generate more code but if getting rid of the stack is worth it to you then this is easy with a small tradeoff.
What about recursion? No problem. A recursive function like:
bang(x) { return x == 1 ? 1 : x * bang(x-1); }
Can still be inlined, because really it's just a for loop in disguise:
bang(x) {
    int acc = 1;
    for (int i = x; i >= 1; i--) acc *= i;
    return acc;
}
In theory a really smart compiler could figure that out for you. But a less-smart one could still flatten it as a goto:
ax = x;
NOTDONE:
if (ax > 1) {
    x = x * (--ax);
    goto NOTDONE;
}
There is one case where you have to make a small trade off. This can't be inlined:
fib(n) { return n <= 2 ? n : fib(n-1) + fib(n-2); }
Stackless C simply cannot do this. Are you giving up a lot? Not really. This is something normal C can't do very well either. If you don't believe me, just call fib(1000) and see what happens to your precious computer.
Please feel free to correct me if I'm wrong, but I would think that allocating memory on the heap for each function call frame would cause extreme memory thrashing. The operating system does, after all, have to manage this memory. I would think that the way to avoid this thrashing would be a cache for call frames. So if you need a cache anyway, we might as well make it contiguous in memory and call it a stack.