Is it possible to consume tick in a Forth definition? - forth

When reading about the tick (') operator I wondered if it can be useful inside a word definition. I know that there is ['] to be used inside a definition, but I thought about using it to read the word name following invocation.
An example:
4 variable cnt
: cycle: ( arg fn -- )
  4 cnt !
  '
  begin
    dup execute
    -1 cnt +!
  cnt @ 0 = until
  drop ;
I can use cycle: to repeat some word invocation, as follows.
: hello ." hello" ;
cycle: hello
Which prints hello four times, as expected.
But the following code won't define a word that prints hello four times:
: 4hello cycle: hello ;
The tick operator still expects a word from the input stream following invocation of 4hello.
Is it possible to inject it somehow when using cycle: in a word definition, so it won't "leak" outside?

Yes, it's possible. You would have to make cycle: immediate, and then also change it to postpone its actions rather than perform them as soon as it runs.
Postponing means delaying the action of a word. POSTPONE compiles an immediate word into the current definition instead of executing it on the spot, and it arranges for a normal word to be compiled when the current definition executes.
In this case it might look something like this.
: (cycle) 4 0 do dup execute loop drop ;
: cycle: ' postpone literal postpone (cycle) ; immediate
Note that this version no longer works outside definitions.
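To see why the immediate version no longer "leaks", here is a toy model in Python. It is not Forth and not how any real Forth system is implemented; define, cycle_runtime and run_demo are made-up names for illustration. The point it sketches is that the name after cycle: is parsed while the definition is being compiled, and the result is baked into the definition.

```python
from collections import deque


def run_demo():
    """Toy model: an immediate word consumes the next name while a
    definition is being compiled, not when the definition later runs."""
    output = []
    # Dictionary of already-defined words (name -> callable).
    words = {"hello": lambda: output.append("hello")}

    def cycle_runtime(xt):
        # Plays the role of (cycle): run the given word four times.
        for _ in range(4):
            xt()

    def define(name, body):
        source = deque(body)      # the input stream during compilation
        compiled = []             # the steps of the new definition
        while source:
            token = source.popleft()
            if token == "cycle:":
                # Immediate behaviour: parse the next name right now
                # (the job of tick) and bake it in (the job of LITERAL).
                target = words[source.popleft()]
                compiled.append(lambda t=target: cycle_runtime(t))
            else:
                compiled.append(words[token])

        def run():
            for step in compiled:
                step()

        words[name] = run

    define("4hello", ["cycle:", "hello"])
    words["4hello"]()             # needs nothing more from the input stream
    return output
```

Calling run_demo() runs the compiled 4hello once, and that call does not read any further input.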


Skip over input stream in ATLAST forth

I'm trying to implement a kind of "conditional :" in ATLAST, the reasoning being I have a file that gets FLOADed multiple times to handle multiple steps of my program flow (I'm essentially abusing Forth as an assembler, step 1 does a first parsing for references, etc. and in step 2 the instruction words actually emit bytes).
So when declaring words for "macros" in that file, it errors out in step 2, because they were already declared in step 1, but I also can't just FORGET them, because that would forget everything that came afterwards, such as the references I just collected in step 1.
So essentially I need a ": that only runs in step 1", my idea being something like this:
: ::
  STAGE @ 0 = IF
    [COMPILE] : ( be a word declaration )
  ELSE
    BEGIN ( eat the disabled declaration )
      ' ( get the address of the next word )
      ['] ; ( get the address of semicolon )
      = ( loop until they are equal )
    UNTIL
  THEN
; IMMEDIATE
:: FIVE 5 ; ( declares as expected )
FIVE . ( prints 5 )
1 STAGE ! ( up to here everything's fine )
:: FIVE 6 ; ( is supposed to do nothing, but errors out )
FIVE . ( is supposed to print 5 again )
The traced error message (starting from 1 STAGE !):
Trace: !
Trace: ::
Trace: STAGE
Trace: @
Trace: (LIT) 0
Trace: =
Trace: ?BRANCH
Trace: '
Trace: (LIT) 94721509587192
Trace: =
Trace: ?BRANCH
Trace: '
Word not specified when expected.
Trace: ;
Compiler word outside definition.
KEY ( -- ch ) as common in some other Forths for reading a single character from the input stream ( outside the :: declaration, since it's IMMEDIATE ) doesn't exist in ATLAST, the only related words I could find are:
': is supposed to read a word from the input stream, then pushes its compile address
[']: like ' but reads a word from the current line (the inside of the :: declaration)
(LIT)/(STRLIT): are supposed to read literals from the input stream according to the documentation, I could only ever make them segmentation fault, I think they're for compiler-internal use only (e.g., if the compiler encounters a number literal it will compile the (LIT) word to make it push that number onto the stack)
There aren't any WORD or PARSE either, as in some other Forths.
As you can see, ' is struggling actually getting something from the input stream for some weird reason, and it looks like ['] is failing to capture the ; which then errors out because it's suddenly encountering a ; where it doesn't belong.
I suspect it actually ran ' ['], even though it's supposed to work on the input stream, not the immediate line, and I'm clearly in compile mode there.
I did a similar thing with conditionally declaring variables, there it was rather easy to just [COMPILE] ' DROP to skip a single word (turning RES x into ' x DROP), but here I'm pretty sure I can't actually compile those instructions, because I can't emit a loop outside of a declaration. Unless there is a way to somehow compile similar code that recursively gets rid of everything until the ;.
A problem is that ' cannot find a number. A possible solution is to compile the definition under a special dummy name instead of skipping over it:
: ::
' DROP \ this xt isn't needed
" : _dummy" EVALUATE ( -- n ) DROP
Or maybe use a new name every time:
: ::
' >NAME @ \ ( s1 ) \ should be checked
": _dummy_" DUP >R S+
But because this relies on non-standard words, it might not work. Another problem is that definitions other than colon definitions are not covered.
Perhaps, a better solution is a preprocessing by external means.
It appears that ATLAST is a primitive Forth, that doesn't allow you to go to a more sophisticated handling of sources. But all is not lost!
For example, a Forth implementation according to the ISO standard will handle the matter with ease with one or more of: REQUIRE [IF] [THEN] [DEFINED] SRC >IN NAME WORD FIND.
As you have a Forth, you can steal these words from another Forth and compile the code.
Another solution that may help directly is executing EXIT in interpret mode while loading a file.
You have to find out whether you can compute a flag that says whether to abandon the input source. Then this definition might help:
: ?abandon IF S" EXIT" EVALUATE THEN ;
S" FIVE" FOUND ?abandon
Note that ?abandon must be executed in interpret mode.

How to refactor string containing variable names into booleans?

I have an SPSS variable containing lines like:
Every line starts with a pipe and ends with one. I need to refactor it into boolean variables as follows:
var      var1 var2 var3 var4 var5
|2|4|5|  0    1    0    1    1
I have tried to do it with a loop like:
loop # = 1 to 72.
compute var# = SUBSTR(var,2#,1).
end loop.
My code won't work with numbers of two or more digits, and it won't place the values into their respective variables. I tried nesting char.substr(var,char.rindex(var,'|') + 1) in another loop, with no luck, because that still doesn't let me recognize the variable number.
How can I do it?
This looks like a nice job for the DO REPEAT command. However the type conversion is somewhat tricky:
DO REPEAT var#i=var1 TO var72
 /i=1 TO 72.
COMPUTE var#i = (CHAR.INDEX(var,CONCAT("|",LTRIM(STRING(i,F2.0)),"|"))>0).
END REPEAT.
Explanation: Let's go from the inside to the outside:
STRING(value,F2.0) converts the numeric value into a string of two characters (with a leading space where the number consists of just one digit), e.g. 2 -> " 2".
LTRIM() removes the leading whitespace, e.g. " 2" -> "2".
CONCAT() concatenates strings. In the above code it adds the "|" before and after the number, e.g. "2" -> "|2|"
CHAR.INDEX(stringvar,searchstring) returns the position at which the searchstring was found. It returns 0 if the searchstring wasn't found.
CHAR.INDEX(stringvar,searchstring)>0 returns a boolean value indicating if the searchstring was found or not.
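The same search idea can be sketched outside SPSS. The Python function below is only an illustration (flags_from_pipes and n_vars are made-up names): wrapping each number in pipes before searching is what keeps "|2|" from matching inside "|12|".

```python
def flags_from_pipes(text, n_vars):
    """Return one 0/1 flag per variable number 1..n_vars.

    A flag is 1 when the number, wrapped in pipes, occurs in text.
    """
    return [1 if "|{}|".format(i) in text else 0
            for i in range(1, n_vars + 1)]
```

For example, flags_from_pipes("|2|4|5|", 5) gives the row from the question.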
It's easier to do the manipulations in Python than native SPSS syntax.
You can use SPSSINC TRANS extension for this purpose.
/* Example data*/.
data list free / TextStr (a99).
begin data.
end data.
/* defining function to achieve task */.
begin program.
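def runTask(x):
    # Split on "|" and keep the non-empty pieces as integers.
    numbers = [int(i) for i in x.split("|") if i.strip()]
    # One 0/1 flag per position from 1 up to the largest number present.
    answer = [1 if i in numbers else 0 for i in range(1, max(numbers) + 1)]
    return answer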
end program.
/* Run job*/.
spssinc trans result = V1 to V30 type=0 /formula "runTask(TextStr)".

New lines in word definition using interpreter directives of Gforth

I am using the interpreter directives (non ANS standard) control structures of Gforth as described in the manual section 5.13.4 Interpreter Directives. I basically want to use the loop words to create a dynamically sized word containing literals. I came up with this definition for example:
: foo
[ 10 ] [FOR]
Yet this produces an Address alignment exception after the [FOR] (yes, I know you should not use a for loop in Forth at all. This is just for an easy example).
In the end it turned out that you have to write loops as one-liners in order to ensure their correct execution. So doing
: foo [ 10 [FOR] ] 1 [ [NEXT] ] ;
instead works as intended. Running see foo yields:
: foo
1 1 1 1 1 1 1 1 1 1 1 ; ok
which is exactly what I want.
Is there a way to get new lines in the word definition? The words I would like to write are way more complex, and for a presentation I would need them better formatted.
It would really be best to use an immediate word instead. For example,
: ones ( n -- ) 0 ?do 1 postpone literal loop ; immediate
: foo ( -- ten ones ) [ 10 ] ones ;
SEE FOO then shows the same result as your example. With POSTPONE, especially with Gforth's ]] .. [[ syntax, the repeated code can be as elaborate as you like.
A multiline [FOR] would need to do four things:
Use REFILL to read in subsequent lines.
Save the read-in lines, because you'll need to evaluate them one by one to preserve line-expecting parsing behavior (such as from comments: \ ).
Stop reading in lines, and loop, when you match the terminating [NEXT].
Take care to leave >IN right after the [NEXT] so that interpretation can continue normally.
You might still run into issues with some code, like code checking SOURCE-ID.
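The four steps can be sketched in Python to show the bookkeeping involved. This is a model of the idea only, not Gforth code: collect_until and replay are hypothetical names, the list of lines stands in for REFILL, and the returned column stands in for >IN. It does not handle nested loops or a terminator hidden inside a comment.

```python
import re


def collect_until(lines, start, terminator="[NEXT]"):
    """Save lines from lines[start:] until a token equals terminator.

    Returns (saved, line_index, column). The column points just past the
    terminator, so normal processing could resume from there.
    """
    saved = []
    for index in range(start, len(lines)):
        line = lines[index]
        for match in re.finditer(r"\S+", line):   # whitespace-delimited tokens
            if match.group() == terminator:
                saved.append(line[:match.start()])
                return saved, index, match.end()
        saved.append(line)                        # keep the whole line
    raise ValueError("terminator not found")


def replay(saved, count, evaluate):
    """Evaluate the saved lines one by one, count times over."""
    for _ in range(count):
        for line in saved:
            evaluate(line)
```

Evaluating line by line, rather than joining everything into one string, is what keeps a trailing \ comment confined to its own line.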
For an example of using REFILL to parse across multiple lines, here's code from a recent posting from CLF, by Gerry:
: line, ( u1 caddr2 u2 -- u3 )
   tuck here swap chars dup allot move +
;
: <text>  ( "text" -- caddr u )
   here 0
   begin
      refill
   while
      bl word count s" </text>" compare
   while
      0 >in ! source line, bl c, 1+
   repeat then
;
This collects everything between <text> and a </text> that's on its own line, as with a HERE document, while also adding a space after each line. To save the individual lines for [FOR] in an easy way, I'd recommend leaving 0 as a sentinel on the data stack and then dropping the SAVE-MEM'd lines on top of it.

PostScript execution of nested procedures

(I'm back with yet another question :-) )
Given the following PostScript code:
/riverside { 5 pop } def
/star { 6 pop 2 {riverside} repeat } def
I'm wondering how nested procedures should be handled. (I'm creating my own interpreter).
When I execute the star procedure, halfway through it finds a name object (riverside), replaces it with an executable array containing the values from the riverside procedure, and executes them.
If I execute the repeat operator the interpreter crashes because there is only one item left on the stack.
Should I actually execute an executable array (=procedure) directly when I'm already in an executable array (=procedure), or should executable arrays (=procedures) always be pushed on the (operand?/execution?) stack? Or should they only be executed by another operator?
How many times should this riverside be executed? (2 or 3 times?) I guess 2?
For your information: this is the situation that I have when I execute star on the 3rd line (see the ERROR):
% begin execute 3rd line (star)
% OP = operand stack
% EX = execution stack
% handle 6
OP: 6
EX: star
% handle pop (removes 6 from OP)
OP: -
EX: star
% handle 2
OP: 2
EX: star
% set the riverside executable array on the EX, execute the values
OP: 2
EX: star riverside
% repeat operator:
CRASH, only one item on the OP left, but repeat operator requires 2 operands.
OP: 5
% end
Please shine a light on this matter, because it is somewhat complex/confusing :-)
Another code sample might be this one:
/starside
{ 72 0 lineto
currentpoint translate
-144 rotate } def
/star
{ moveto
currentpoint translate
4 {starside} repeat
.5 setgray fill
stroke } def
200 200 star
When the interpreter tokenizes /star { moveto ... and encounters the nested {starside}, how should that be treated? (And what if it were {starside 5 2 mul pop} instead of only {starside}?)
I believe you need to look at section 3.5.3 of the PLRM. Although this deals with a simple executable array the concept is the same. When the token scanner encounters a '{' it starts to build an executable array. Until it reaches a matching '}' token the scanner simply stores what it encounters on the operand stack. When it encounters the matching '}' then the objects are converted into an executable array (and stored on the operand stack).
In the case of the scanner encountering an executable name, it stores the name on the operand stack. It does not execute the name, nor does it even perform lookup on it to retrieve the associated object.
So immediately before the execution of '}' in your example, the operand stack would contain two objects, the '{' opening array, and the executable name riverside. When you encounter the '}' then the scanner creates the actual executable array and stores it on the operand stack. (Note, implementation details vary here.)
So immediately before the execution of 'repeat' you would have two objects on the stack, the counter and an executable array containing a single executable name.
You don't look up the name until the executable array containing the name is executed.
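As a rough sketch of that scanning step (in Python, since your interpreter's language isn't stated; scan and the ("proc", body) tuple are my own illustrative choices, not anything from the PLRM), braces only collect objects, and names inside them are stored untouched:

```python
def scan(tokens):
    """Turn a flat token list into objects, building nested procedures.

    Names inside braces are stored as plain strings; nothing is looked
    up or executed while the procedure body is being built.
    """
    stack = [[]]                      # the innermost list being built is last
    for token in tokens:
        if token == "{":
            stack.append([])          # start collecting a procedure body
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unmatched }")
            body = stack.pop()
            stack[-1].append(("proc", body))
        else:
            stack[-1].append(token)   # numbers and names are stored as-is
    if len(stack) != 1:
        raise ValueError("unmatched {")
    return stack[0]
```

With this, 2 {riverside} repeat scans to three objects, so repeat finds both of its operands.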
This might make it clearer:
/test {(This is my initial string\n) print} def
2 {test} repeat
2 {test} /test {(This is my second string\n) print} def repeat
Notice that I've redefined 'test' after creating the executable array containing the executable name 'test', yet the execution uses the later definition of test. As you can see, it's vitally important not to do name lookup too early!
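The same experiment can be modelled with a toy executor. This is a Python sketch under my own assumptions, not a real PostScript implementation: executable names are plain strings, procedure bodies are lists, and literal names and strings are tagged tuples. The only point is that the lookup happens inside execute, at the moment a name is run.

```python
def execute(items, ops, userdict, stack, out):
    """Run a sequence of objects; names are looked up only when executed."""
    for item in items:
        if isinstance(item, str):
            if item in userdict:      # user procedure: run its current body
                execute(userdict[item], ops, userdict, stack, out)
            else:                     # built-in operator
                ops[item](ops, userdict, stack, out)
        else:
            stack.append(item)        # numbers, procedure bodies, literals


def op_def(ops, userdict, stack, out):
    body = stack.pop()
    _, name = stack.pop()             # a ("name", text) literal
    userdict[name] = body


def op_repeat(ops, userdict, stack, out):
    body = stack.pop()
    count = stack.pop()
    for _ in range(count):
        execute(body, ops, userdict, stack, out)


def op_print(ops, userdict, stack, out):
    _, text = stack.pop()             # a ("text", string) literal
    out.append(text)


def demo():
    ops = {"def": op_def, "repeat": op_repeat, "print": op_print}
    userdict, stack, out = {}, [], []
    program = [
        ("name", "test"), [("text", "initial"), "print"], "def",
        2, ["test"], "repeat",
        # The body ["test"] is already on the stack when test is redefined.
        2, ["test"], ("name", "test"), [("text", "second"), "print"], "def",
        "repeat",
    ]
    execute(program, ops, userdict, stack, out)
    return out
```

Here demo() records the initial text twice and then the second text twice, mirroring the three PostScript lines.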

Why does return/redo evaluate result functions in the calling context, but block results are not evaluated?

Last night I learned about the /redo option for when you return from a function. It lets you return another function, which is then invoked at the calling site and reinvokes the evaluator from the same position.
>> foo: func [a] [(print a) (return/redo (func [b] [print b + 10]))]
>> foo "Hello" 10
Even though foo is a function that only takes one argument, it now acts like a function that took two arguments. Something like that would otherwise require the caller to know you were returning a function, and that caller would have to manually use the do evaluator on it.
Thus without return/redo, you'd get:
>> foo: func [a] [(print a) (return (func [b] [print b + 10]))]
>> foo "Hello" 10
== 10
foo consumed its one parameter and returned a function by value (which was not invoked, thus the interpreter moved on). Then the expression evaluated to 10. If return/redo did not exist you'd have had to write:
>> do foo "Hello" 10
This keeps the caller from having to know (or care) if you've chosen to return a function to execute. And is cool because you can do things like tail call optimization, or writing a wrapper for the return functionality itself. Here's a variant of return that prints a message but still exits the function and provides the result:
>> myreturn: func [] [(print "Leaving...") (return/redo :return)]
>> foo: func [num] [myreturn num + 10]
>> foo 10
== 20
But functions aren't the only thing that has behavior in do. So if this is a general pattern for "removing the need for a DO at the callsite", then why doesn't this print anything?
>> test: func [] [return/redo [print "test"]]
>> test
== [print "test"]
It just returned the block by value, like a normal return would have. Shouldn't it have printed out "test"? That's what do would...uh, do with it:
>> do [print "test"]
The short answer is because it is generally unnecessary to evaluate a block at the call point, because blocks in Rebol don't take parameters so it mostly doesn't matter where they are evaluated. However, that "mostly" may need some explanation...
It comes down to two interesting features of Rebol: static binding, and how do of a function works.
Static Binding and Scopes
Rebol doesn't have scoped word bindings, it has static direct word bindings. Sometimes it seems like we have lexical scope, but we really fake that by updating the static bindings each time we're building a new "scoped" code block. We can also rebind words manually whenever we want.
What that means for us in this case though, is that once a block exists, its bindings and values are static - they're not affected by where the block is physically located, or where it is being evaluated.
However, and this is where it gets tricky, function contexts are weird. While the bindings of words bound to a function context are static, the set of values assigned to those words are dynamically scoped. It's a side effect of how code is evaluated in Rebol: What are language statements in other languages are functions in Rebol, so a call to if, for instance, actually passes a block of data to the if function which if then passes to do. That means that while a function is running, do has to look up the values of its words from the call frame of the most recent call to the function that hasn't returned yet.
This does mean that if you call a function and return a block of code with words bound to its context, evaluating that block will fail after the function returns. However, if your function calls itself and that call returns a block of code with its words bound to it, evaluating that block before your function returns will make it look up those words in the call frame of the current call of your function.
This is the same for whether you do or return/redo, and affects inner functions as well. Let me demonstrate:
Function returning code that is evaluated after the function returns, referencing a function word:
>> a: 10 do do has [a] [a: 20 [a]]
** Script error: a word is not bound to a context
** Where: do
** Near: do do has [a] [a: 20 [a]]
Same, but with return/redo and the code in a function:
>> a: 10 do has [a] [a: 20 return/redo does [a]]
** Script error: a word is not bound to a context
** Where: function!
** Near: [a: 20 return/redo does [a]]
Code do version, but inside an outer call to the same function:
>> do f: function [x] [a: 10 either zero? x [do f 1] [a: 20 [a]]] 0
== 10
Same, but with return/redo and the code in a function:
>> do f: function [x] [a: 10 either zero? x [f 1] [a: 20 return/redo does [a]]] 0
== 10
So in short, with blocks there is usually no advantage to doing the block elsewhere than where it is defined, and if you want to it is easier to use another call to do instead. Self-calling recursive functions that need to return code to be executed in outer calls of the same function are an exceedingly rare code pattern that I have never seen used in Rebol code at all.
It could be possible to change return/redo so it would handle blocks as well, but it probably isn't worth the increased overhead to return/redo to add a feature that is only useful in rare circumstances and already has a better way to do it.
However, that brings up an interesting point: If you don't need return/redo for blocks because do does the same job, doesn't the same apply to functions? Why do we need return/redo at all?
How DO of a Function Works
Basically, we have return/redo because it uses exactly the same code that we use to implement do of a function. You might not realize it, but do of a function is really unusual.
In most programming languages that can call a function value, you have to pass the parameters to the function as a complete set, sort of how R3's apply function works. Regular Rebol function calling causes some unknown-ahead-of-time number of additional evaluations to happen for its arguments using unknown-ahead-of-time evaluation rules. The evaluator figures out these evaluation rules at runtime and just passes the results of the evaluation to the function. The function itself doesn't handle the evaluation of its parameters, or even necessarily know how those parameters were evaluated.
However, when you do a function value explicitly, that means passing the function value to a call to another function, a regular function named do, and then that magically causes the evaluation of additional parameters that weren't even passed to the do function at all.
Well it's not magic, it's return/redo. The way do of a function works is that it returns a reference to the function in a regular shortcut-return value, with a flag in the shortcut-return value that tells the interpreter that called do to evaluate the returned function as if it were called right there in the code. This is basically what is called a trampoline.
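For readers who haven't met the term, here is a minimal trampoline sketch in Python. It is not how R3 is implemented; Func, Redo and evaluate_next are hypothetical names, and the flat list of items stands in for the code being evaluated. The evaluator, not the callee, gathers the extra arguments whenever a flagged result comes back.

```python
class Func:
    """A callable with a fixed number of arguments."""
    def __init__(self, arity, body):
        self.arity = arity
        self.body = body


class Redo:
    """Flagged result: 'evaluate this function right at the call site'."""
    def __init__(self, func):
        self.func = func


def evaluate_next(items):
    """Evaluate one expression from an iterator of values and Funcs."""
    item = next(items)
    if not isinstance(item, Func):
        return item
    func = item
    while True:
        # The evaluator collects the arguments from the code that follows.
        args = [evaluate_next(items) for _ in range(func.arity)]
        result = func.body(*args)
        if not isinstance(result, Redo):
            return result
        func = result.func            # bounce: call the returned function


def demo():
    log = []
    add_ten = Func(1, lambda b: b + 10)

    def foo_body(a):
        log.append(a)
        return Redo(add_ten)          # plays the role of return/redo

    foo = Func(1, foo_body)
    value = evaluate_next(iter([foo, "Hello", 10]))
    return log, value
```

In demo(), foo takes one argument, yet the expression as a whole consumes two, just like foo "Hello" 10 in the question.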
Here's where we get to another interesting feature of Rebol: The ability to shortcut-return values from a function is built into the evaluator, but it doesn't actually use the return function to do it. All of the functions you see from Rebol code are wrappers around the internal stuff, even return and do. The return function we call just generates one of those shortcut-return values and returns it; the evaluator does the rest.
So in this case, what really happened is that all along we had code that did what return/redo does internally, but Carl decided to add an option to our return function to set that flag, even though the internal code doesn't need return to do so because the internal code calls the internal function. And then he didn't tell anyone that he was making the option externally available, or why, or what it did (I guess you can't mention everything; who has the time?). I have the suspicion, based on conversations with Carl and some bugs we've been fixing, that R2 handled do of a function differently, in a way that would have made return/redo impossible.
That does mean that the handling of return/redo is pretty thoroughly oriented towards function evaluation, since that is its entire reason for existing at all. Adding any overhead to it would add overhead to do of a function, and we use that a lot. Probably not worth extending it to blocks, given how little we'd gain and how rarely we'd get any benefit at all.
For return/redo of a function though, it seems to be getting more and more useful the more we think about it. In the last day we've come up with all sorts of tricks that this enables. Trampolines are useful.
While the question originally asked why return/redo did not evaluate blocks, there were also formulations like: "is cool because you can do things like tail call optimization", "[can write] a wrapper for the return functionality", "it seems to be getting more and more useful the more we think about it".
I do not think these are true. My first example demonstrates a case where return/redo can really be used, an example being in the "area of expertise" of return/redo, so to speak. It is a variadic sum function called sumn:
use [result collect process] [
    collect: func [:value [any-type!]] [
        unless value? 'value [return process result]
        append/only result :value
        return/redo :collect
    ]
    process: func [block [block!] /local result] [
        result: 0
        foreach value reduce block [result: result + value]
    ]
    sumn: func [] [
        result: copy []
        return/redo :collect
    ]
]
This is the usage example:
>> sumn 1 * 2 2 * 3 4
== 12
Variadic functions taking an "unlimited number" of arguments are not as useful in Rebol as they may look at first sight. For example, if we wanted to use the sumn function in a small script, we would have to wrap it in a paren to indicate where it should stop collecting arguments:
result: (sumn 1 * 2 2 * 3 4)
print result
This is not any better than using a more standard (non-variadic) alternative called e.g. block-sum and taking just one argument, a block. The usage would be like
result: block-sum [1 * 2 2 * 3 4]
print result
Of course, if the function can somehow detect what is its last argument without needing enclosing paren, we really gain something. In this case we could use the #[unset!] value as the sumn stopping argument, but that does not spare typing either:
result: sumn 1 * 2 2 * 3 4 #[unset!]
print result
Seeing the example of a return wrapper I would say that return/redo is not well suited for return wrappers, return wrappers being outside of its area of expertise. To demonstrate that, here is a return wrapper written in Rebol 2 that actually is outside of return/redo's area of expertise:
myreturn: func [
    {my RETURN wrapper returning the string "indefinite" instead of #[unset!]}
    ; the [throw] attribute makes this function a RETURN wrapper in R2:
    [throw]
    value [any-type!] {the value to return}
] [
    either value? 'value [return :value] [return "indefinite"]
]
Testing in R2:
>> do does [return #[unset!]]
>> do does [myreturn #[unset!]]
== "indefinite"
>> do does [return 1]
== 1
>> do does [myreturn 1]
== 1
>> do does [return 2 3]
== 2
>> do does [myreturn 2 3]
== 2
Also, I do not think it is true that return/redo helps with tail call optimizations. There are examples of how tail calls can be implemented without using return/redo at the site. As said, return/redo was tailor-made to support implementation of variadic functions and it is not flexible enough for other purposes as far as argument passing is concerned.
