While developing BigZ, which I mostly use for number-theoretical experiments, I've discovered the need for orthogonality in the word set that creates, filters or transforms sets. I want a few words that, logically combined, cover a wide range of commands, without the need to memorize a large number of words and ways to combine them.
1 100 condition isprime create-set
puts the set of all prime numbers between 1 and 100 on a set stack, while
function 1+ transform-set
transforms this set into the set of all numbers p+1, where p is a prime less than 100.
Further,
condition sqr filter-set
leaves the set of all perfect squares of the form p+1 on the stack.
This works rather nicely for sets of natural numbers, but to create, filter and transform sets of n-tuples I need to be able to count the locals in unnamed words. I have defined words that denote compound conditions and functions concisely:
: ~ :noname ;
: :| postpone locals| ; immediate
1 100 ~ :| p | p isprime p 2 + isprime p 2 - isprime or and ;
1 100 ~ :| a b | a dup * b dup * + isprime ;
Executing these two examples leaves ( 1 100 xt ) on the parameter stack. To handle them correctly, producing a set of numbers in the first case and a set of pairs in the second, I'll have to extend the word :| so that it leaves ( 1 100 xt n ), where n is the number of locals used. I think one could use >IN and PARSE to do this, but it was a long time ago that I did such things, so I doubt I can do it properly nowadays.
I didn't understand (LOCAL), but with patience and luck I managed to do it with my original idea:
: bl# \ ad n -- m
over + swap 0 -rot
do i c@ bl = +
loop negate ;
\ count the number of blanks in the string ad n
variable loc#
: locals# \ --
>in @ >r
[char] | parse bl# loc# !
r> >in ! ; immediate
\ count the number of locals while loading
: -| \ --
postpone locals#
postpone locals| ; immediate
\ replace LOCALS|
Now
: test -| a b | a b + ;
works like LOCALS| but also leaves the number of locals in the global variable loc#.
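Counting blanks gives the right answer when the names are separated by exactly one space, as in the example. A slightly more tolerant sketch counts the names themselves, so leading, trailing or repeated spaces don't change the result. The word #names is a hypothetical helper, not part of BigZ, and it treats only the space character (BL) as a separator:

```forth
\ Hypothetical helper: count the space-delimited names in c-addr u.
\ A run of several spaces counts as a single separator.
: #names ( c-addr u -- n )
  0 >r                                   \ running count on the return stack
  begin
    \ skip any spaces in front of the next name
    begin dup 0> if over c@ bl = else false then while 1 /string repeat
    dup 0>                               \ any characters left?
  while
    r> 1+ >r                             \ found the start of a name
    \ skip the characters of the name itself
    begin dup 0> if over c@ bl <> else false then while 1 /string repeat
  repeat
  2drop r> ;
```

Inside locals#, the phrase `[char] | parse #names loc# !` could then stand in for the call to bl#.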
Maybe you should drop LOCALS| and parse the local variables yourself. For each one, call (LOCAL) with its name, and finish by passing a zero-length string.
See http://lars.nocrew.org/dpans/dpans13.htm#13.6.1.0086 for details.
I use Forth (namely Swapforth) to configure certain hardware via I2C. I have a word:
i2c1-send ( reg-address byte -- )
that writes a byte to the specific internal register of a certain chip.
The initialization sequence is quite long, so implementing it as below is not viable due to memory consumption:
: i2c1-init
$1201 $10 i2c1-send
$2130 $43 i2c1-send
[...]
$0231 $43 i2c1-send
;
I have created an implementation that builds a structure holding the length of the sequence in the first cell and byte triples in the following cells. (Please note that i2c1-send is just a placeholder that lets you test it without my hardware.)
: i2c1-send ( reg_addr byte -- )
\ It is just a placeholder to show what will be written in HW
swap
." addr=" hex . ." val=" . decimal CR
;
: i2c1: ( "<spaces>name" -- )
create here $326e9 0 ,
does> dup cell+ swap
@ 0 do
dup c@ >r 1+
dup c@ 8 lshift swap 1+
dup c@ rot or r> i2c1-send
1+
loop
drop
;
: i2c1-def ( addr val -- )
c, ( adr )
dup 8 rshift c,
255 and c,
;
: i2c1; ( -- )
\ Make sure that i2c1: was used before
$326e9 <> abort" i2c1; without i2c1:"
dup cell+ here swap - ( first_cell length )
\ Verify that the length is a multiple of 3
3 /mod swap 0<> abort" illegal length - not a multiple of 3"
swap !
;
With the above code, you define the initialization list like this:
i2c1: set1
$1234 $11 i2c1-def
$1521 $18 i2c1-def
[...]
$2313 $10 i2c1-def
i2c1;
This reduces memory consumption significantly (by a factor of 2 on the J1B Forth CPU).
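Each entry that i2c1-def appends is a 3-byte record: the value byte first, then the high and low bytes of the register address, which is exactly the order the DOES> part reads back. As a sketch, packing and unpacking one record can be factored into a pair of words. The names rec!, rec@ and the scratch buffer rec are hypothetical, introduced only for illustration:

```forth
\ Hypothetical helpers mirroring the record layout used by i2c1-def:
\ byte 0 = value, byte 1 = address high byte, byte 2 = address low byte.
create rec 3 allot                     \ scratch space for one record
: rec! ( reg val c-addr -- )
  tuck c!                              \ value byte at offset 0
  >r dup 8 rshift r@ 1+ c!             \ address high byte at offset 1
  255 and r> 2 + c! ;                  \ address low byte at offset 2
: rec@ ( c-addr -- reg val )
  dup 1+ c@ 8 lshift                   \ high byte shifted into place
  over 2 + c@ or                       \ combined with the low byte
  swap c@ ;                            \ value byte fetched last
```

With rec@, the loop body in DOES> could shrink to `dup rec@ i2c1-send 3 +`.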
However, I dislike the syntax. I'd prefer something that lets me define the initialization list with just numbers, up to a certain delimiter, like below:
i2c1-x: i2c1-init
$1234 $11
$1521 $18
[...]
$2313 $10
i2c1-x;
I have created the word shown below:
: i2c-delim s" i2c1-x;" ;
: i2c1-x: create here 0 ,
begin
parse-name
2dup i2c-delim compare 0<> while
evaluate \ We store the address later
parse-name
evaluate
c,
\ Now store the address
dup 8 rshift c,
255 and c,
repeat
2drop
dup cell+ here swap - ( first_cell length )
\ Verify that the length is a multiple of 3
3 /mod swap 0<> abort" length not a multiple of 3"
swap !
does> dup cell+ swap
@ 0 do
dup c@ >r 1+
dup c@ 8 lshift swap 1+
dup c@ rot or r> i2c1-send
1+
loop
drop
;
It works perfectly for short definitions:
i2c1-x: set2 $1234 $ac $6543 $78 $9871 $01 $3440 $02 i2c1-x;
But fails for longer ones that use multiple lines:
i2c1-x: set2
$1234 $ac
$6543 $78
$9871 $01
$3440 $02
i2c1-x;
Is it possible to define i2c1-x: so that it handles multiple lines, or do I have to use the solution based on separate i2c1:, i2c1-def and i2c1;?
There is the REFILL word, which reads the next line of input so that parsing can continue across multiple lines.
\ Get the next name (lexeme), possibly from the next lines
\ NB: Use the result of parse-name-sure immediately,
\ since it may be garbled after the next refill
\ (the buffer may be overwritten by the next line).
: parse-name-sure ( -- c-addr u|0 )
begin parse-name dup 0= while refill 0= if exit then 2drop repeat
;
\ Check if the first string equals the second
: equals ( c-addr2 u2 c-addr1 u1 -- flag )
dup 3 pick <> if 2drop 2drop false exit then
compare 0=
;
Translating the input up to some delimiter is a common approach. A general word that performs it:
\ Translate the input till a delimiter
\ using xt as translator for a lexeme
2variable _delimiter
: translate-input-till-with ( i*x c-addr u xt -- j*x )
>r _delimiter 2!
begin parse-name-sure dup while
2dup _delimiter 2@ equals 0= while
r@ execute
repeat then 2drop rdrop
;
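Because translate-input-till-with only needs a delimiter and an execution token, it can drive other multi-line notations as well. A minimal sketch, assuming PARSE-NAME and REFILL are available (they are in Gforth), that sums numbers spread over several lines up to the lexeme end. The words add-lexeme and sum-till are hypothetical, RDROP is replaced by the standard R> DROP, and EVALUATE is used for number conversion only to keep the sketch short; it is not a hygienic way to convert numbers.

```forth
\ Helpers repeated from above so this block stands alone.
: parse-name-sure ( -- c-addr u|0 )
  begin parse-name dup 0= while refill 0= if exit then 2drop repeat ;
: equals ( c-addr2 u2 c-addr1 u1 -- flag )
  dup 3 pick <> if 2drop 2drop false exit then compare 0= ;
2variable _delimiter
: translate-input-till-with ( i*x c-addr u xt -- j*x )
  >r _delimiter 2!
  begin parse-name-sure dup while
    2dup _delimiter 2@ equals 0= while
    r@ execute
  repeat then 2drop r> drop ;

\ Hypothetical translator: convert one lexeme and add it to the sum.
: add-lexeme ( n c-addr u -- n' ) evaluate + ;
\ Hypothetical word: sum all numbers up to the lexeme "end", across lines.
: sum-till ( "numbers... end" -- n )
  0 s" end" ['] add-lexeme translate-input-till-with ;
```

Loaded from a source file, `sum-till 1 2` followed by a line `3 4 end` leaves 10 on the stack.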
It also makes sense to factor out the manipulation of 16-bit units into a library:
[undefined] w@ [if]
\ NB: big-endian variant (high byte at the lower address)
: w! ( x addr -- ) dup 1+ >r >r dup 8 rshift r> c! r> c! ;
: w@ ( addr -- x ) dup c@ 8 lshift swap 1+ c@ or ;
: w, ( x -- ) here 2 allot w! ;
[then]
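The fallback definitions store the high byte first, i.e. in big-endian order. Because the [undefined] guard may select a system's own w@ instead (Gforth, for instance, provides w@ and w! that use the host's native byte order), a sketch with explicitly named words keeps the byte order fixed regardless of the system. The names be-w!, be-w@ and be-w, are hypothetical:

```forth
\ Hypothetical explicitly big-endian 16-bit accessors.
: be-w! ( x addr -- )
  over 8 rshift 255 and over c!        \ high byte at addr
  1+ swap 255 and swap c! ;            \ low byte at addr+1
: be-w@ ( addr -- x )
  dup c@ 8 lshift swap 1+ c@ or ;      \ high byte first, then low byte
: be-w, ( x -- ) here 2 allot be-w! ;  \ append 16 bits to the dictionary
```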
Also, a word for converting text into a number belongs in a library. Using EVALUATE for that is not hygienic; see the example StoN definition in the "How to enter numbers in Forth" question. A helper to convert "$"-prefixed numbers may be available in your Forth system.
\ dummy definitions for test only
: s-to-n ( addr u -- x ) evaluate ;
: send-i2c1 ( addr x -- ) ." send: " . . CR ;
The application code:
\ Translate the input numbers till the delimiter into the special format
\ (the code could be simplified using quotations)
: i2c-delim s" i2c1-x;" ;
: translate-i2c-pair ( c-addr u -- )
s-to-n
parse-name-sure
2dup i2c-delim equals abort" translate-i2c: unexpected delimiter"
s-to-n c, w,
;
: translate-i2c-input ( -- )
i2c-delim ['] translate-i2c-pair translate-input-till-with
;
\ Send data from the special format
: send-i2c1-bulk ( addr u -- )
3 / 0 ?do
dup c@ swap 1+
dup w@ swap 2 + >r swap send-i2c1 r> \ swap into ( addr x ) order
loop drop
;
\ The defining word
: i2c1-x:
create here >r 0 , here >r translate-i2c-input here r> - r> !
does> dup cell+ swap @ send-i2c1-bulk
;
A test case:
i2c1-x: test
1 2
3 4
5
6
i2c1-x;
test
Is there something in Forth like INPUT in BASIC or scanf("%d") in C?
Probably it will be something like this:
200 buffer: buf
: input ( -- n ) buf 200 accept
some-magic-filter
buf swap evaluate ;
The problem with the above code is how to define a filter that passes only numbers, and not words, definitions, etc.
The standard specifies only the low-level >NUMBER word for converting integer numbers.
OTOH, using EVALUATE to convert strings into numbers is quick and dirty. Either use it without checks (for trusted input) or do not use it at all. Trying to filter the string before EVALUATE is a bad idea: the filter costs about as much as >NUMBER itself and is hardly reusable.
NB: neither >NUMBER nor EVALUATE detects numeric overflow.
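To show what >NUMBER leaves to the caller, here is a minimal sketch that converts an unsigned digit string in the current BASE and rejects both stray characters and values whose double-cell result does not fit in one cell. The name s>u? is hypothetical. It only catches overflow of a single cell; a string long enough to overflow the double-cell accumulator still wraps silently.

```forth
\ Hypothetical helper: unsigned string -> single cell, with basic checks.
: s>u? ( c-addr u -- u true | false )
  dup 0= if 2drop false exit then      \ an empty string is not a number
  0 0 2swap >number nip                \ ( ud-lo ud-hi #unconverted )
  if 2drop false exit then             \ stray, non-digit characters
  if drop false exit then              \ high cell set: too big for one cell
  true ;
```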
In any case, your word to input a single-cell integer can be defined something like this:
: accept-number ( -- n )
PAD DUP 80 ACCEPT ( addr u ) StoN ( n )
;
For trusted input, you can define StoN like this:
: StoN ( addr u -- x )
STATE @ ABORT" This naive StoN should not be used in compilation state"
DEPTH 2 - >R
EVALUATE
DEPTH 1- R> <> IF -24 THROW THEN
\ check depth to accept the single-cell numbers only
;
Otherwise (in the case of untrusted input) you have two choices: to rely on the specific words of a particular Forth system or to use some (perhaps your own) library.
I use the following lexicon to define StoN:
\ ---
\ The words from Substring Matching library
\ (where length is counted in address units)
: MATCH-HEAD ( a u a-key u-key -- a-right u-right true | a u false )
2 PICK OVER U< IF 2DROP FALSE EXIT THEN
DUP >R
3 PICK R@ COMPARE IF RDROP FALSE EXIT THEN
SWAP R@ + SWAP R> - TRUE
;
\ ---
\ The words from Literals interpreting library
\ (where prefix 'I-' is shortcut for Interpret)
: I-DLIT ( a u -- x x true | a u false )
2DUP S" -" MATCH-HEAD >R
DUP 0= IF NIP RDROP EXIT THEN
0 0 2SWAP >NUMBER NIP IF RDROP 2DROP FALSE EXIT THEN
R> IF DNEGATE THEN 2SWAP 2DROP TRUE
;
: I-LIT ( a u -- x true | a u false )
I-DLIT IF D>S TRUE EXIT THEN FALSE
;
After that StoN can be defined as:
: StoN ( a u -- x ) I-LIT IF EXIT THEN -24 THROW ;
The mentioned libraries can be found on GitHub:
Substring matching functions library
Resolvers example (for various lexemes)
Rosetta Code suggests this code snippet, working with GForth 0.6.2, to determine if an input string is numeric:
: is-numeric ( addr len -- )
2dup snumber? ?dup if
0< if
-rot type ." as integer = " .
else
2swap type ." as double = " <# #s #> type
then
else 2dup >float if
type ." as float = " f.
else
type ." isn't numeric in base " base @ dec.
then then ;
I built a BASIC-like #INPUT word for Camel Forth to give BASIC users something more familiar. It takes more code than one might think. It starts with $ACCEPT, which can be used like INPUT with a string variable or a memory block.
The definition of NUMBER? here handles single-cell integers only, but it compiles on Gforth. Its flag is nonzero when the conversion fails, the reverse of SNUMBER?
DECIMAL
: NUMBER? ( addr len -- n ?) \ ?=0 is good conversion
\ ?<>0 is bad conversion (count of unconverted chars)
OVER C@ [CHAR] - = DUP >R \ save flag for later
IF 1 /STRING THEN \ remove minus sign
0 0 2SWAP >NUMBER NIP NIP \ convert the number
R> IF SWAP NEGATE SWAP THEN \ negate if needed
;
: $ACCEPT ( $addr -- ) CR ." ? " DUP 1+ 80 ACCEPT SWAP C! ;
: #INPUT ( variable -- ) \ made to look/work like TI-BASIC
BEGIN
PAD $ACCEPT \ $ACCEPT text into temp buffer PAD
PAD COUNT NUMBER? \ convert the number in PAD
WHILE \ while the conversion is bad do this
CR ." Input error "
CR DROP
REPEAT
SWAP ! ; \ store the number in the variable
\ USAGE: VARIABLE X
\ X #INPUT
Some online gforth docs provide a seemingly complete description of base-execute's effects:
base-execute i*x xt u – j*x gforth “base-execute”
execute xt with the content of BASE being u, and restoring the original BASE afterwards.
But the syntax for the effects seems like a lock without a key: the page links to nothing that describes what i*x xt u – j*x signifies. Some hunting turned up a partial description of the syntax notation (which tells us that u is an unsigned number and xt is an execution token), but that's still not enough to understand i*x xt u – j*x.
How is base-execute used, and what does it do?
To understand what base-execute does you need to understand execute and BASE. I'll also explain how to read i*x and j*x in the stack effect.
execute works by taking an execution token xt and executing it. ' 1+ execute is the same as 1+ on its own. The reason to use execute, though, is that you can pass xt on the stack instead of having to choose it ahead of time. For instance:
: exec-twice dup >r execute r> execute ;
2 ' 1+ exec-twice . ( this outputs 4 )
BASE is a variable that controls what numeric base to use for input and output.
BASE is initially 10. So 5 2 BASE ! . outputs 101 (which is 5 in base 2).
base-execute puts them together: it changes BASE to u, executes xt, then restores BASE to its previous value. Its implementation might look like this:
: base-execute BASE @ >r BASE ! execute r> BASE ! ;
Here's an example usage:
: squared ( n1 -- n2 ) dup * ;
: squares ( n -- ) 0 do i squared . loop ;
10 squares ( 0 1 4 9 16 25 36 49 64 81 )
: hex-execute ( i*x xt -- j*x ) 16 base-execute ;
10 ' squares hex-execute ( 0 1 4 9 10 19 24 31 40 51 )
10 squares ( 0 1 ... 81 we're back to decimal )
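The simple definition above leaves BASE changed if xt throws an exception. A sketch of a variant that restores BASE on every path by catching the exception and re-throwing it afterwards; my-base-execute is a hypothetical name, and this is not claimed to be how Gforth itself defines base-execute:

```forth
\ Hypothetical exception-safe variant of base-execute.
: my-base-execute ( i*x xt u -- j*x )
  base @ >r        \ save the current base
  base !           \ switch to base u
  catch            \ run xt, catching any exception it throws
  r> base !        \ restore the saved base in every case
  throw ;          \ re-throw the exception code (0 means none)
```

For example, `255 ' . 16 my-base-execute` prints 255 in hexadecimal, and BASE is back to its previous value afterwards, even when xt fails.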
Now for i*x xt u -- j*x:
The stack notation documentation you linked to has most of the information you need to read the effect. i*x -- j*x means that something might happen to the stack, but it doesn't specify what. In this case, the exact stack effect depends on what xt is.
To know the stack effect with a given xt, replace i*x and j*x with the two sides of xt's stack effect.
For example, if xt is ' . you would look at .'s stack effect, which is n --. In that case you could think of base-execute's stack effect as n xt-of-. u --.
Recently, on comp.lang.forth I found some code, kindly written by Coos Haak, which I have difficulty understanding.
It is supposed to sum or multiply the numbers between the parentheses. For example,
( 1 2 3 +) ok
. 6 ok
For convenience, I'll reproduce it here:
: (
depth 1+ r> 2>r
;
: cond
depth j >
;
: done
2r> rdrop 2>r
;
: +)
begin cond
while +
repeat
done
;
: *)
begin cond
while *
repeat
done
;
I see the phrases r> 2>r and 2r> rdrop 2>r. But, I'm rather confused about what they are doing. I'd guess that the stack depth at the open parenthesis is being hidden on the return stack somehow. But, I don't get it.
What do these do to the return stack?
In the Gforth documentation I see:
r> R:w – w core “r-from”
2>r d – R:d core-ext “two-to-r”
2r> R:d – d core-ext “two-r-from”
rdrop R:w – gforth “rdrop”
w Cell, can contain an integer or an address
d double sized signed integer
Does this have something to do with the conversion between w and d?
2>r (and the Forth 200x word n>r) preserves the order of the elements pushed to the return stack. So if you have ( 1 0 ) on the data stack, with 0 as the top of the stack, then after 2>r you will have 0 at the top of the return stack and 1 below it. 2>r is therefore definable, not as
: 2>r ]] >r >r [[ ; immediate
But as:
: 2>r ]] swap >r >r [[ ; immediate
And these definitions are equivalent:
: a ]] 0 >r 1 >r [[ ; immediate
: b ]] 0 1 2>r [[ ; immediate
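The same point can be checked without ]] and [[, using only standard words: reading the two cells back with two R> shows which one ended up on top of the return stack. The word names are hypothetical:

```forth
\ 2>R keeps the pair's order: 0 ends up on top of the return stack.
: via-2>r  ( -- x1 x2 ) 1 0 2>r r> r> ;        \ leaves ( 0 1 )
\ Two plain >R reverse it: 1 ends up on top of the return stack.
: via->r   ( -- x1 x2 ) 1 0 >r >r r> r> ;      \ leaves ( 1 0 )
\ SWAP >R >R behaves like 2>R.
: via-swap ( -- x1 x2 ) 1 0 swap >r >r r> r> ; \ leaves ( 0 1 )
```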
What Coos Haak does in that code then is to slip a value below the top of the return stack. If his ( merely pushed the depth to the top of the return stack, then on exit from this word, gforth would try to jump to the depth as an address. The same error condition is seen if you try to use his words in this way:
: numbers ( 1 2 ;
: sum +) ;
numbers sum
\ output: :16: error: Invalid memory address
\ >>>numbers<<< sum
That code would work however (and the normal usage would fail) if ( and +) coordinated with the third element on the return stack instead of the second.
There are a few pitfalls with this code:
The normal denizens of the return stack, so to speak, aren't guaranteed to take up only one cell of the return stack.
The use of j relies on knowledge about the precise depth into the return stack that j pulls from - i.e., it relies on knowledge about how DO ... LOOP and related words are implemented.
These words could be portably implemented as immediate words, where they would keep depth at the top of the return stack, but then you couldn't use them outside of a definition. It's simple enough to make them work as is on any given Forth.
This is a typical example of premature optimisation.
2>R moves two items to the return stack, but the standard prescribes the order that the two items arrive there. Coos Haak knows this and takes "advantage" of it.
Replace the code with the equivalent:
: (
R> \ remember return address
depth >R
>R \ restore return address.
;
Now you see what is going on. You want to remember the stack depth, but if it is on the stack it will interfere with the calculation. So you tuck it under the return address of the ( code, later to be retrieved in a similar fashion.
Alternatively, you could make this a machine-code definition; then there would be no return address to worry about.
CODE (
<DEPTH> <to-r>
ENDCODE
where the actual machine code is left as an exercise.
Yet another alternative is a macro, which also need not worry about the return stack:
: ( POSTPONE DEPTH POSTPONE >R ; IMMEDIATE
I ignored the 1+. It is a technicality: depth itself changes the depth by 1, so you always have to add 1- or 1+ judiciously whenever you actually use depth.
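Following the macro route, here is a sketch of a complete set that works only inside colon definitions (immediate versions cannot be used interactively, as the other answer notes). The names ((, +)) and *)) and the runtime helpers are hypothetical, chosen so that the standard comment word ( is not redefined. Between (( and its closing word the return stack holds the saved depth, so I and J cannot be used there.

```forth
\ Runtime helpers: fold the cells above the saved depth d with + or *.
: sum-above ( x1 ... xk d -- n )
  >r begin depth r@ 1+ > while + repeat r> drop ;
: product-above ( x1 ... xk d -- n )
  >r begin depth r@ 1+ > while * repeat r> drop ;
\ Compile-time words: remember the depth, then fold back down to it.
: (( ( -- ) postpone depth postpone >r ; immediate
: +)) ( -- ) postpone r> postpone sum-above ; immediate
: *)) ( -- ) postpone r> postpone product-above ; immediate
```

For example, `: six (( 1 2 3 +)) ;` defines a word that leaves 6, and anything already on the stack below the (( is left untouched.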
I need a base converter function for Lua. I need to convert from base 10 to any base from 2 to 36. How can I do this?
In the string to number direction, the function tonumber() takes an optional second argument that specifies the base to use, which may range from 2 to 36 with the obvious meaning for digits in bases greater than 10.
In the number to string direction, this can be done slightly more efficiently than Nikolaus's answer by something like this:
local floor,insert = math.floor, table.insert
function basen(n,b)
n = floor(n)
if not b or b == 10 then return tostring(n) end
local digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
local t = {}
local sign = ""
if n < 0 then
sign = "-"
n = -n
end
repeat
local d = (n % b) + 1
n = floor(n / b)
insert(t, 1, digits:sub(d,d))
until n == 0
return sign .. table.concat(t,"")
end
This creates fewer garbage strings to collect by using table.concat() instead of repeated calls to the string concatenation operator "..". Although it makes little practical difference for strings this small, this idiom is worth learning, because building a buffer in a loop with the concatenation operator tends toward O(n^2) performance, while table.concat() has been designed to do substantially better.
There is an unanswered question as to whether it is more efficient to push the digits on a stack in the table t with calls to table.insert(t,1,digit), or to append them to the end with t[#t+1]=digit, followed by a call to string.reverse() to put the digits in the right order. I'll leave the benchmarking to the student. Note that although the code I pasted here does run and appears to give correct answers, there may be other opportunities to tune it further.
For example, the common case of base 10 is culled off and handled with the built in tostring() function. But similar culls can be done for bases 8 and 16 which have conversion specifiers for string.format() ("%o" and "%x", respectively).
Also, neither Nikolaus's solution nor mine handles non-integers particularly well. I emphasize that here by forcing the value n to an integer with math.floor() at the beginning.
Correctly converting a general floating point value to any base (even base 10) is fraught with subtleties, which I leave as an exercise to the reader.
You can use a loop to convert an integer into a string in the required base. For bases up to 10, use the following code; if you need a larger base, you have to add a line that maps the result of x % base to a character (using an array, for example).
x = 1234
r = ""
base = 8
while x > 0 do
r = "" .. (x % base ) .. r
x = math.floor(x / base)
end
print( r );