Load from the terminal input buffer to parameter stack - forth

Why does this code not work?
TIB 10 ACCEPT
TIB SP@ 1 cells - 10 cmove
In that code I tried to enter a string and store it in the terminal input buffer and later store it on the parameter stack.
But with .S I see that it does not work.

The parameter stack grows towards low memory
The main problem with the sample code is that the parameter stack grows towards low memory. So the starting point for the destination of the copy should be at a higher memory address (inside the existing/defined parameter stack). So instead of
TIB SP@ 1 cells - 10 cmove
it should be:
TIB SP@ 1 cells + 10 cmove
Memory allocation for the string
The next problem is that there is not enough storage for the string on the parameter stack. ACCEPT leaves one cell on the stack (four bytes on a 32-bit system): the actual number of characters received. With sample input "user10181" (9 characters),
TIB 10 ACCEPT
results in:
.S <1> 9 ok
Forgetting about that extra element for the moment (see footnote 1), for the purpose of this elaboration we allocate four cells on the parameter stack, 16 bytes on a 32-bit system (the actual value, for example 235, does not matter):
235 DUP DUP DUP
The result of TIB SP@ 1 cells + 10 cmove is then:
.S <5> 9 235 8241 541085779 541215060 ok
We see that three of the four cells (each with four bytes) have been overwritten by cmove (as expected).
TIB is overwritten by subsequent input from the terminal
Unfortunately, our copied bytes are not as expected. Decoding the output for the three changed cells (that are in decimal), first from
8241 541085779 541215060
to hexadecimal:
2031 20405053 20424954
And then decoding as ASCII:
20 31 20 40 50 53 20 42 49 54
sp 1  sp @  P  S  sp B  I  T     (sp = space)
And reversing the byte order (we had the high memory first and the test platform is little-endian):
"TIB SP@ 1 "
That is the first ten characters of our second line, TIB SP@ 1 cells + 10 cmove. Thus it is clear that the terminal input buffer (TIB) is too temporary to be used in this case.
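The decoding steps above can be reproduced mechanically. The following is a minimal Python sketch (Python is used purely for illustration; the helper name cells_to_text is made up here) that assumes 32-bit little-endian cells and takes the values in the order .S prints them, with the top of the stack last:

```python
import struct

def cells_to_text(cells, count=10):
    """Decode 32-bit stack cells, listed as .S prints them (top of stack last).

    Assumption: little-endian cells, with the top of the stack at the
    lowest address, as on the test platform described in the answer.
    """
    # Walk from the top of the stack (lowest address) towards higher addresses.
    raw = b"".join(struct.pack("<I", cell) for cell in reversed(cells))
    return raw[:count].decode("ascii")

# The three cells overwritten in the interactive experiment.
interactive = cells_to_text([8241, 541085779, 541215060])
# The three cells overwritten when the code is compiled into a word first.
compiled = cells_to_text([24881, 942747697, 1919251317])

print(repr(interactive))  # 'TIB SP@ 1 '
print(repr(compiled))     # 'user10181a'
```

The first result is the start of the second input line rather than the typed string, which is exactly the symptom described above.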
The solution
The solution to the third problem is to have all the code compiled before we ask for user input. For instance, put it into a word, inputOnStack:
: inputOnStack TIB 10 ACCEPT 235 DUP DUP DUP TIB SP@ 1 cells + 10 cmove ;
The result is then:
inputOnStack user10181 ok
.S <5> 9 235 24881 942747697 1919251317 ok
That corresponds to "user10181" and the tenth character is "a" (most likely from the "a" in inputOnStack).
Test platform:
Raspberry Pi, model B.
Operating system: Raspbian, as installed by NOOBS 1.3.10, released 2014-09-09.
Gforth: version 0.7.0 (installed with sudo apt-get update; sudo apt-get install gforth)
1. A more advanced version of the code could use the actual number of characters. In any case, it should be DROPped one way or another to balance the stack if this code is to be integrated into other code.

Consider very carefully what happens to the stack after each word. I have reproduced your code below, and annotated it with the stack depth at every point.
( 0 ) TIB ( 1 ) 10 ( 2 ) ACCEPT ( 1 )
( 1 ) TIB ( 2 ) SP@ ( 3 ) 1 ( 4 ) cells ( 4 ) - ( 3 ) 10 ( 4 ) cmove ( 1 )
So when SP@ is executed, it returns a pointer to stack element 2. The pointer is then decremented by one cell, resulting in a pointer to stack element 3 (because the stack grows downwards). cmove then overwrites 10 bytes, i.e. 2 stack elements (guessing you're running a 64-bit Forth). So stack elements 3 and 2 are changed. Finally, cmove pops three elements from the stack, leaving only one, which is unchanged.

MU!
Basically, what you do here is a CMOVE copy into the stack area, on a Forth where SP@ is defined (it is not a standard word). So you destroy the stack. The answer is: such code is not supposed to work.
The question should not be "why doesn't this work?" but "under what circumstances does this have the intended effect?"
Let us assume:
this Forth has an addressable and writable stack (if you think those are the same thing, read up on Intel segment descriptors and the particulars of the POP and MOV instructions)
SP@ points to the top of the stack at the moment it is called.
the stack grows up (if it grows down, the situation is totally different).
at the start of the call the stack was empty.
The TIB and ACCEPT are red herrings and will be ignored.
1 cells - \ The pointer to the stack is changed opening up one cell
100 cmove \ you overwrite one cell below the stack,
\ then the (32-bit) cell where the result of SP@ resides *and*
\ then TIB and then a couple of more bytes
Assuming a protected Forth, the last bytes are beyond the stack, and lead to a segmentation fault.
So you must readjust your thinking to a low level if you want to use Forth. Think of memory as a stack of letterboxes and start from there.

Related

Understanding AArch64 Translation Tables

I'm doing a hobby OS project and I am trying to get virtual memory set up. I had another project on an x86 architecture working with page tables, but I am now learning ARMv8.
Now, I know that the maximum number of bits used for addressing is 48[1]. The last 12 to 16 bits are used "as-is" to index within the selected region (depending on which granule size is selected[2]).
I just don't understand how we get those intermediate bits. Obviously the documentation is showing that intermediate tables are used[3] but it is quite unclear on how those tables are used.
In the first half of the following image, we see translation of an address with 4k granules and using 38 address bits.
I can't understand this image in the slightest. The "offsets", for example bits 38 to 30, point to an entry in the L1 table. How and where is this table defined?
What I think is happening is that this is a 12+8+8+8 address translation scheme. Starting from the right, 12 bits to find an offset within a 4096-byte block of memory. Left of that are 8 bits for L3, meaning that L3 indexes 256 blocks of 4096 bytes (1MB). Left of this, L2 also has 8 bits, so 256 entries of (256*4096), totalling 256MB per L2 entry. Left of L2 is L1, also with 8 bits; 256 entries of 256MB means the total addressable memory is 64GB of physical RAM.
I don't think this is correct because that would only allow a 1:1 mapping of memory. Each table descriptor needs to carry some access flags and whatnot. Thus going back to the question: how are those tables defined? Each offset section is 8 bits and that's not enough to contain the address of a translation table.
Anyway, I am completely lost. I would appreciate it if someone could give me a "plain English" explanation of how a translation table walk is done. A graph would be nice but probably too much effort; I'll make one and share it afterwards to help me synthesize the information. Or at least, if someone has one, a link to a good video/guide where the information isn't totally obfuscated?
Here is the list of materials I have consulted:
https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address
https://forums.raspberrypi.com/viewtopic.php?t=227139
https://armv8-ref.codingbelief.com/en/chapter_d4/d42_4_translation_tables_and_the_translation_proces.html
https://github.com/bztsrc/raspi3-tutorial/blob/master/10_virtualmemory/mmu.c
[1]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A
[2]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translation-tables-in-ARMv8-A/Effect-of-granule-sizes-on-translation-tables
[3]https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Translating-a-Virtual-Address-to-a-Physical-Address
The entire model behind translation tables arises from three values: the size of a translation table entry (TTE), the hardware page size (aka "translation granule"), and the number of bits used for virtual addressing.
On arm64, TTEs are always 8 bytes. The hardware page size can be one of 4KiB, 16KiB or 64KiB (0x1000, 0x4000 or 0x10000 bytes), depending on both hardware support and runtime configuration. The amount of bits used for virtual addressing similarly depends on hardware support and runtime configuration, but with a lot more complex constraints.
By example
For the sake of simplicity, let's consider address translation under TTBR0_EL1 with no block mappings, no virtualization going on, no pointer authentication, no memory tagging, no "large physical address" support and the "top byte ignore" feature being inactive. And let's pick a hardware page size of 0x1000 bytes and 39-bit virtual addressing.
From here, I find it easiest to start at the result and go backwards in order to understand why we arrived here. So suppose you have a virtual address of 0x123456000 and the hardware maps that to physical address 0x800040000 for you. Because the page size is 0x1000 bytes, that means that for 0 <= n <= 0xfff, all accesses to virtual address 0x123456000+n will go to physical address 0x800040000+n. And because 0x1000 = 2^12, that means the lowest 12 bits of your virtual address are not used for address translation, but for indexing into the resulting page. Though the ARMv8 manual does not use this term, they are commonly called the "page offset".
 63                                                        12 11          0
+------------------------------------------------------------+-------------+
|                         upper bits                         | page offset |
+------------------------------------------------------------+-------------+
Now the obvious question is: how did we get 0x800040000? And the obvious answer is: we got it from a translation table. A "level 3" translation table, specifically. Let's defer how we found that for just a moment and suppose we know it's at 0x800037000. One thing of note is that translation tables adhere to the hardware page size as well, so we have 0x1000 bytes of translation information there. And because we know that one TTE is 8 bytes, that gives us 0x1000/8 = 0x200, or 512 entries in that table. 512 = 2^9, so we'll need 9 bits from our virtual address to index into this table. Since we already use the lower 12 bits as page offset, we take bits 20:12 here, which for our chosen address yield the value 0x56 ((0x123456000 >> 12) & 0x1ff). Multiply by the TTE size, add to the translation table address, and we know that the TTE that gave us 0x800040000 is written at address 0x8000372b0.
 63                                             21 20      12 11          0
+-------------------------------------------------+----------+-------------+
|                   upper bits                    | L3 index | page offset |
+-------------------------------------------------+----------+-------------+
Now you repeat the same process for how you got 0x800037000, which this time came from a TTE in a level 2 translation table. You again take 9 bits off your virtual address to index into that table, this time with a value of 0x11a ((0x123456000 >> 21) & 0x1ff).
 63                                  30 29      21 20      12 11          0
+--------------------------------------+----------+----------+-------------+
|              upper bits              | L2 index | L3 index | page offset |
+--------------------------------------+----------+----------+-------------+
And once more for a level 1 translation table:
 63                       39 38      30 29      21 20      12 11          0
+---------------------------+----------+----------+----------+-------------+
|        upper bits         | L1 index | L2 index | L3 index | page offset |
+---------------------------+----------+----------+----------+-------------+
At this point, you used all 39 bits of your virtual address, so you're done. If you had 40-bit addressing, then there'd be another L0 table to go through. If you had 38-bit addressing, then we would've taken the L1 table all the same, but it would only span 0x800 bytes instead of 0x1000.
But where did the L1 translation table come from? Well, from TTBR0_EL1. Its physical address is just in there, serving as the root for address translation.
Now, to perform the actual translation, you have to do this whole process in reverse. You start with a translation table from TTBR0_EL1, but you don't know a priori whether it's L0, L1, etc. To figure that out, you have to look at the translation granule and the number of bits used for virtual addressing. With 4KiB pages there's a 12-bit page offset and 9 bits for each level of translation tables, so with 39 bits you're looking at an L1 table. Then you take bits 38:30 of the virtual address to index into it, giving you the address of the L2 table. Rinse and repeat with bits 29:21 for L2 and 20:12 for L3, and you've arrived at the physical address of the target page.
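To make the walk concrete, here is a minimal Python sketch of the procedure described above, for the 4KiB granule and 39-bit case only. It is a toy model, not how hardware descriptors really look: physical memory is a plain dictionary, each TTE holds nothing but the address of the next table or page (real descriptors also carry valid, type and attribute bits), and the L1 and L2 table addresses are invented for the example. Only 0x800037000 and 0x800040000 come from the text.

```python
PAGE_SHIFT = 12   # 4 KiB granule: 12-bit page offset
INDEX_BITS = 9    # 0x1000 / 8 = 512 entries per table
TTE_SIZE = 8

def start_level(va_bits):
    """Level of the root table for a given virtual address width."""
    levels = -(-(va_bits - PAGE_SHIFT) // INDEX_BITS)  # ceiling division
    return 4 - levels

def index(va, level):
    """Index into the table of the given level (3 is the last level)."""
    shift = PAGE_SHIFT + INDEX_BITS * (3 - level)
    return (va >> shift) & ((1 << INDEX_BITS) - 1)

def translate(va, ttbr0, memory, va_bits=39):
    """Walk the tables, starting from the root address held in TTBR0_EL1."""
    table = ttbr0
    for level in range(start_level(va_bits), 4):
        table = memory[table + index(va, level) * TTE_SIZE]
    return table + (va & ((1 << PAGE_SHIFT) - 1))

# Hypothetical layout: the L1 and L2 table addresses are assumptions.
L1_TABLE, L2_TABLE, L3_TABLE, PAGE = 0x800035000, 0x800036000, 0x800037000, 0x800040000
va = 0x123456000
memory = {
    L1_TABLE + index(va, 1) * TTE_SIZE: L2_TABLE,
    L2_TABLE + index(va, 2) * TTE_SIZE: L3_TABLE,
    L3_TABLE + index(va, 3) * TTE_SIZE: PAGE,   # the TTE at 0x8000372b0
}

print(hex(translate(0x123456abc, L1_TABLE, memory)))  # 0x800040abc
```

The start_level helper also reproduces the remark about other address widths: 40 bits needs a root at level 0, while 38 bits still starts at level 1.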

How do I create an array in Forth?

I know the question has often been asked in the past, and perhaps the information is given in previous Stack Overflow postings. But learning Forth is a very complicated task, and repetition helps to understand the advantages of a concatenative programming language over alternative languages like C.
What I have learned from Forth tutorials is that Forth doesn't provide commands for creating a 2D array, but the user has to realize everything from scratch in the program. I've found two options in occupying memory in Forth. At first by creating a new word:
: namelist s" hello" s" world" ;
or secondly by the CREATE statement:
create temperature 10 allot
temperature 10 cells dump
So far so good; we have created an array of 10 cells in which integer variables can be stored. But what if I need to save float numbers? Do I always have to convert floats to ints, or can they be saved into the normal cells?
Another open problem is how to store string values in the array. As far as I know, a string contains a pointer plus a size. So in theory I can store only 5 strings in a 10 cell array and additionally I need memory somewhere else which holds the string itself. That doesn't make much sense.
Is there some kind of higher abstraction available to store values in arrays which can be used to write easy to read programs? I mean, if every programmer is using his own Forth method to store something in the memory, other programmers will find it hard to read the code.
create creates a word that returns the address of a buffer in the dictionary (data space); the buffer has zero length initially, so you have to reserve the required space for it right away.
allot reserves space that is measured in address units (usually bytes), so you have to calculate the required size in bytes.
For example:
create a 10 cells allot
create b 10 floats allot
These are just buffers, and you still need to deal with pointer arithmetic to get or set an item, e.g.:
0.35e 2 floats b + f! \ store the float number into third item (0-based indexing)
Example of a word that creates an array of floats in the dictionary:
: create-floats-array ( length "name" -- ) create floats allot does> swap 1- floats + ;
10 create-floats-array c
0.35e 3 c f! \ store the float number into third item (1-based indexing)
3 c f@ f. \ print the float number from the third item
If you need many arrays and many strings it is better to use appropriate libraries.
For example, see Cell array module and Dynamic text string module from Forth Foundation Library.
A generalised 2darray of elements. Takes the element size as a parameter
\ size is the per element multiplier
\ n size * is the per_row factor
\ m n size * * is the total space required
: 2darray \ create> m n size -- ; does> mix nix -- a
\ size is the number of bytes for one element
\
create 2dup * , \ multiplier to apply to m index
dup , \ multiplier to apply to n index
* * allot \ calculate total bytes and allot them
does> \ mix nix -- a
>r r@ cell+ @ * \ offset from n index
swap r@ @ * + \ offset with m index
r> + 2 cells+ \ 2 cells offset for the 'admin' cells
;
Examples
50 40 /float 2darray 50x40floats
50 40 2 cells 2darray 50x40stringpairs
even
20 constant /record
10 200 /record 2darray 2000records
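The address arithmetic performed by the does> part may be easier to follow outside Forth. This is a small Python model of it, not a translation of the word itself: the function name and the 8-byte cell size are assumptions, and it only computes the byte offset from the start of the created body.

```python
CELL = 8  # assumed cell size in bytes (64-bit Forth); use 4 for a 32-bit system

def element_offset(n, size, mix, nix, cell=CELL):
    """Byte offset of element (mix, nix) from the start of the created body.

    n    -- number of columns given at creation time
    size -- bytes per element
    """
    row_stride = n * size   # value stored in the first 'admin' cell
    col_stride = size       # value stored in the second 'admin' cell
    return 2 * cell + mix * row_stride + nix * col_stride

# A 50 x 40 array of 8-byte elements.
first = element_offset(40, 8, 0, 0)
last = element_offset(40, 8, 49, 39)
print(first, last)  # 16 16008
```

The last element ends exactly where the allotted space ends: 16008 + 8 equals the two admin cells plus 50 * 40 * 8 data bytes.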
You're confused about strings. The string just goes into memory and memory at that address is allocated for that string, and it's there forever (unless you change that).
So if you wanted to store 5 ( c-addr u) strings in an allocated block of memory (calling it an array is a bit of a stretch), you can just store the c-addr in cell n and the length u in cell n+1.
If you're worried about 10 cells being a lot of space (it's really nothing to worry about) and only want to use 5 cells, you can store your strings as counted strings, using words like C" - counted strings store the length in the first byte, every subsequent byte is a character.
Also, you can store things into the dictionary at the current dp using the word , (comma).

translate virtual address to physical address

The following page table is for a system with 16-bit virtual and physical addresses and with 4,096-byte pages. The reference bit is set to 1 when the page has been referenced. Periodically, a thread zeroes out all values of the reference bit. All numbers are provided in decimal.
I want to convert the following virtual addresses (in hexadecimal) to the equivalent physical addresses. Also I want to set the reference bit for the appropriate entry in the page table.
• 0xE12C
• 0x3A9D
• 0xA9D9
• 0x7001
• 0xACA1
I know the answers, but I want to know how to arrive at them:
0xE12C → 0x312C
0x3A9D → 0xAA9D
0xA9D9 → 0x59D9
0x7001 → 0xF001
0xACA1 → 0x5CA1
I found and tried this, but it did not help me much.
It is given that the virtual address is 16 bits long. Hence, there are 2^16 addresses in the virtual address space.
The page size is given as 4 KB (there are 4K = 4 * 2^10 addresses in a page), so the number of pages will be (2^16) / (2^12) = 2^4.
To address each page 4 bits are required.
The most significant 4 bits in the virtual address will denote the page number being referred and the remaining 12 bits will be the page offset.
One thing to remember is that the page size (in the virtual address space) is always the same as the frame size in main memory. Hence the last 12 bits of the physical address are the same as those of the virtual address.
To get the frame address in the main memory just use the first 4 bits.
Example: Consider the virtual address 0xACA1
Here A in ACA1 denotes the page number ( 10 ) and corresponding frame no is 5 ( 0101) hence the resulting physical address will be → 0x5CA1.
To translate a virtual address to a physical address (applies ONLY to this homework question), we need to know 2 things:
Page Size
Number of bits for virtual address
In this example: 16-bit system, 4KB page size and physical memory size is 64KB.
First of all we need to determine the number of needed bits to act as offset inside page. log2(Page-Size) = log2(4096) = 12 bits for offset
Out of the 16 bits for virtual address, 12 are for offset, that means each process has 2^4 = 16 virtual pages. Each entry in page table stores the corresponding frame accommodating the page. For example:
Now let's translate!
First of all, for ease of work, let's convert 0xE12C to binary.
0xE12C = (1110 0001 0010 1100) in base 2
1110 = 14 in decimal
Entry 14 in P.T => Page frame 3.
Let's concatenate it with the 12 offset bits
Answer: (0011 0001 0010 1100) = 0x312C
Another example: 0x3A9D
0x3A9D = 0011 1010 1001 1101
0011 = 3
PageTable[3] = 10
10 in decimal = 1010 in binary
1010 1010 1001 1101 in binary = 0xAA9D
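The same procedure can be written as a few lines of Python. This is only a sketch: the page table below contains just the four entries quoted in the answers (14 → 3, 3 → 10, 10 → 5, 7 → 15), since the full table from the question is not reproduced here, and the names are made up for the example.

```python
OFFSET_BITS = 12                  # log2(4096)
PAGE_SIZE = 1 << OFFSET_BITS

# Partial page table: virtual page number -> page frame (entries quoted in the answers).
page_table = {14: 3, 3: 10, 10: 5, 7: 15}
reference_bits = {page: 0 for page in page_table}

def translate(vaddr):
    """Translate a 16-bit virtual address and mark its page as referenced."""
    page = vaddr >> OFFSET_BITS          # top 4 bits
    offset = vaddr & (PAGE_SIZE - 1)     # low 12 bits, copied unchanged
    frame = page_table[page]
    reference_bits[page] = 1
    return (frame << OFFSET_BITS) | offset

for vaddr in (0xE12C, 0x3A9D, 0xA9D9, 0x7001, 0xACA1):
    print(f"0x{vaddr:04X} -> 0x{translate(vaddr):04X}")
```

After the loop, reference_bits holds a 1 for pages 14, 3, 10 and 7, which corresponds to setting the reference bit in those rows of the table.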
To help you solve this question, we need to get our details right:
16 bits of virtual address space = 2^16 = 65,536 addresses
16 bits of physical address space = 2^16 = 65,536 addresses
The 4096-byte page size determines the offset, which is Log(4096) / Log(2) = 12 bits. This means 2^12 for the page size
As per @Akash Mahapatra, the offset from the virtual address is directly mapped to the offset in the physical address
As such, we now have:
16 bits for the virtual address - 12 bits for the offset = 4 bits for the page number, which gives the total number of pages available.
I won't repeat the calculation for physical since it's the same numbers.
2^4 (4bit) for pages = 16, which correlates to the number of table entries above!
We're getting there... be patient! :)
Memory Address 0xE12C in hex notation is also known to be holding 16-bit of address. (Since it's stated in the question.)
Let's butcher the address now...
We first remove '0x' from the info.
We can convert E12C to binary notation like @Tony Tannous, but I am going to apply a little short-cut.
I simply use a ratio. The address is written with 4 hex digits, and since 16/4 = 4 bits per digit, I can take the first digit as the virtual page number, while the other 3 are the offset.
With the information, 'E' in hexadecimal format, I need to convert to Decimal = 14. Then I look at your table provided, and I found page frame '3'. Page frame 3 is noted in decimal format, which then need to be converted back to Hexadecimal format... Duh!... which is 3!
So, the Physical address mapping of the virtual memory location of 0xE12C can be found at 0x312C on the physical memory.
You will then go back to the table, and refer to the reference bit column and put a '1' to the row 14.
Apply the same concept for these -
0x3A9D → 0xAA9D
0xA9D9 → 0x59D9
0x7001 → 0xF001
0xACA1 → 0x5CA1
If you notice, the last 3 digits are the same (which determines the offset).
And the 1st of the 4-digits are mapped according to the table:
table entry 3 -> page frame 10 -> hex notation A
table entry A (10) -> page frame 5 -> hex notation 5
table entry 7 -> page frame 15 -> hex notation F
table entry A (10) -> page frame 5 -> hex notation 5
Hope this explanation helps you and others like me! :)

Direct Mapped Cache of Blocks Example

So i have this question in my homework assignment that i have struggling a bit with. I looked over my lecture content/notes and have been able to utilize those to answer the questions, however, i am not 100% sure that i did everything correctly. There are two parts (part C and D) in the question that i was not able to figure out even after consulting my notes and online sources. I am not looking for a solution for those two parts by any means, but it would be greatly appreciated if i could get, at least, a nudge in the right direction in how i can go about solving it.
I know this is a rather large question, however, i hope someone could possibly check my answers and tell me if all my work and methods of looking at this problem is correct. As always, thank you for any help :)
Alright, so now that we have the formalities out of the way,
--------------------------Here is the Question:--------------------------
Suppose a small direct-mapped cache of blocks with 32 blocks is constructed. Each cache block stores
eight 32-bit words. The main memory, which is byte addressable, is 16,384 bytes in size. 32-bit words are stored
word aligned in memory, i.e., at an address that is divisible by 4.
(a) How many 32-bit words can the memory store (in decimal)?
(b) How many address bits would be required to address each byte of memory?
(c) What is the range of memory addresses, in hex? That is, what are the addresses of the first and last bytes of
memory? I'll give you a hint: memory addresses are numbered starting at 0.
(d) What would be the address of the last word in memory?
(e) Using the cache mapping scheme discussed in the Chapter 5 lecture notes, how many and which address bits
would be used to form the block offset?
(f) How many and which memory address bits would be used to form the cache index?
(g) How many and which address bits would be used to form the tag field for each cache block?
(h) To which cache block (in decimal) would memory address 0x2A5C map to?
(i) What would be the block offset (in decimal) for 0x2A5C?
(j) How many other main memory words would map to the same block as 0x2A5C?
(k) When the word at 0x2A5C is moved into a cache block, what are the memory addresses (in hex) of the other
words which will also be moved into this block? Express your answer as a range, e.g., [0x0000, 0x0200].
(l) The first word of a main memory block that is mapped to a cache block will always be at an address that is
divisible by __ (in decimal)?
(m) Including the V and tag bits of each cache block, what would be the total size of the cache (in bytes)
(n) what would be the size allocated for the data bits (in bytes)?
----------------------My answers and work-----------------------------------
a) memory = 16384 bytes. 16384 bytes into bits = 131072 bits. 131072/32 = 4096 32-bit words
b) 2^14 (main memory) * 2^2 (4 bits/word) = 2^16. take log(base2)(2^16) = 16 bits
c) couldnt figure this part out (would appreciate some input (NOT A SOLUTION) on how i can go about looking at this problem
d)could not figure this part out either :(
e)8 words in each cache line. 8 * 4(2^2 bits/word) = 32 bits in each cache line. log(base2)(2^5) = 5 bits used for block offset.
f) # of blocks = 2^5 = 32 blocks. log(base2)(2^5) = 5 bits for cache index
g) tag = 16 - 5 - 5 - 2(word alignment) = 4 bits
h) 0x2A5C
0010 10100 10111 00
tag index offset word aligned bits
maps to cache block index = 10100 = 0x14
i) maps to block offset = 10111 = 0x17
j) 4 tag bits, 5 block offset = 2^9 other main memory words
k) it is a permutation of the block offsets. so it maps the memory addresses with the same tag and cache index bits and block offsets of 0x00 0x01 0x02 0x04 0x08 0x10 0x11 0x12 0x14 0x18 0x1C 0x1E 0x1F
l)divisible by 4
m) 2(V+tag+data) = 2(1+4+2^3*2^5) = 522 bits = 65.25 bytes
n)data bits = 2^5 blocks * 2^3 words per block = 256 bits = 32 bytes
Part C:
If a memory has M bytes, and the memory is byte addressable, then the memory addresses range from 0 to M - 1.
For your question, this means that memory addresses range from 0 to 16383, or in hex 0x0 to 0x3FFF.
Part D:
Words are 4 bytes long. So given your answer to C, the last word is at:
(0x3FFF - 3) -> 0x3FFC.
You can see that this is correct because the lowest 2 bits of the address are 0, which must be true of any 4 byte aligned address.
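Both results are quick to check with a few lines of Python (a small sketch; the constant names are chosen here for illustration):

```python
MEMORY_BYTES = 16_384   # size of main memory from the question
WORD_BYTES = 4          # one 32-bit word

first_byte = 0
last_byte = MEMORY_BYTES - 1           # addresses run from 0 to M - 1
last_word = MEMORY_BYTES - WORD_BYTES  # same as last_byte - 3

print(hex(first_byte), hex(last_byte), hex(last_word))  # 0x0 0x3fff 0x3ffc
```

The last word address is divisible by 4, so its two lowest bits are zero, as noted above.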

Irregularities in Gforth's conversion to doubles

I'm fairly confused about how the s>d and d>s functions work in Forth.
From what I've read, typing 16.0 will put 160 0 on the stack (since it takes up two cells) and d. will show 160.
Now, if I enter 16 s>d I would expect the stack to be 160 0 and d. to show 160 like in the previous example. However, the stack is 16 0 and d. is 16.
Am I entering doubles incorrectly? Is s>d not as simple as "convert a single-celled value into a double-celled value"? Is there any reason for this irregularity? Any clues would be much appreciated.
Gforth interprets all of these the same: 1.60, 16.0, and 160., i.e. 160 converted to a double number. Whereas 16 s>d converts 16 to a double number.
ANS Forth only mandates that when the text interpreter processes a number that is immediately followed by a decimal point and is not found as a definition name, the text interpreter shall convert it to a double-cell number. But Gforth goes beyond that: http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Number-Conversion.html#Number-Conversion
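The difference between the two cases can be mimicked with a toy Python model. It is a deliberate simplification of the behaviour described above, not Gforth's actual number parser: the dot only marks the literal as a double and is otherwise ignored, and a double is shown as a (low cell, high cell) pair in the order .S would print it. Both function names are invented for the example.

```python
def s_to_d(n):
    """Model of s>d: sign-extend a single cell into a (low, high) pair."""
    return (n, -1 if n < 0 else 0)

def parse_double_literal(text):
    """Toy model of a double literal: drop the dot, read the digits as one number.

    Assumption: the value fits in a single cell, so the high cell is only
    the sign extension.
    """
    return s_to_d(int(text.replace(".", "")))

print(parse_double_literal("16.0"))  # (160, 0)
print(parse_double_literal("160."))  # (160, 0)
print(s_to_d(16))                    # (16, 0)
```

In this model the position of the dot never scales the value, which is why 16.0 ends up as 160 while 16 s>d stays 16.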
