Does Gforth optimize proper tail calls?

I have the following (somewhat inefficient) code:
\ RNG support
VARIABLE seed
: rand ( -- random )
  seed @
  DUP 13 LSHIFT XOR
  DUP 17 RSHIFT XOR
  DUP 5 LSHIFT XOR
  DUP seed ! ;
\ Checker for number of set bits
: clear-lsb ( u -- u )
  DUP 1- AND ;
: under++ ( u x -- u++ x )
  SWAP 1+ SWAP ;
: ones-rec ( c u -- c u | c )
  ?DUP       ( c 0 | c u u )
  0= IF EXIT ( c | c u )
  THEN under++ clear-lsb RECURSE ;
: ones-rec ( c u -- c u | c )
  ?DUP IF
    under++ clear-lsb
    RECURSE
  THEN ;
: ones ( u -- n )
  0 SWAP ones-rec ;
\ Makes a random number with n set bits
: rand-n-bits ( n -- random )
  rand      ( n random )
  OVER SWAP ( n n random )
  ones      ( n n bits )
  =         ( n f )
  IF EXIT
  THEN RECURSE ;
By my understanding, both the RECURSEs in rand-n-bits and ones-rec should be proper tail calls. However, when I ask Gforth to do 10 rand-n-bits, I overflow the return stack. Does Gforth not optimize proper tail calls, or am I simply not doing this correctly?


How to combine two sums and factor out a common element expression?

Here's the initial premise: two sums s1 and s2 are added; the sum element expressions have a common factor a[n].
s1: sum(r1[m,q]*b[m,n]*a[n],n,0,N)$
s2: sum(r2[m,q]*c[m,n]*a[n],n,0,N)$
s1+s2;
I expect the sums to be combined and the common element expression a[n] factored out:
s12: sum(a[n]*(r1[m,q]*b[m,n]+r2[m,q]*c[m,n]),n,0,N);
However, I'm unable to make Maxima produce such a contraction. The most simplification I was able to obtain was with sumcontract(s1+s2), which results in two sums without the common element factored out:
r1[m,q]*sum(b[m,n]*a[n], n,0,N) + r2[m,q]*sum(c[m,n]*a[n], n,0,N);
How to make Maxima produce the factored out expression from s1+s2 as in s12 above?
NOTE: If we remove r1 and r2, then factor(sumcontract(s1+s2)) indeed produces the expected s12 expression. However, with both present, it results in two sums and does not factor out the a[n] as mentioned.
How about this. I've applied sumcontract, intosum, and factor.
(%i1) s1: sum(r1[m,q]*b[m,n]*a[n],n,0,N)$
(%i2) s2: sum(r2[m,q]*c[m,n]*a[n],n,0,N)$
(%i3) s1 + s2;
(%o3) r2[m,q]*'sum(c[m,n]*a[n], n, 0, N) + r1[m,q]*'sum(b[m,n]*a[n], n, 0, N)
(%i4) intosum (%);
(%o4) 'sum(c[m,n]*r2[m,q]*a[n], n, 0, N) + 'sum(b[m,n]*r1[m,q]*a[n], n, 0, N)
(%i5) sumcontract (%);
(%o5) 'sum(c[m,n]*r2[m,q]*a[n] + b[m,n]*r1[m,q]*a[n], n, 0, N)
(%i6) factor (%);
(%o6) 'sum((c[m,n]*r2[m,q] + b[m,n]*r1[m,q])*a[n], n, 0, N)
Here, intosum pushes the constant factors r1[m,q] and r2[m,q] back into the sums, which lets sumcontract merge the two sums into one that factor can then work on.
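The three steps can also be composed into a single call (same s1 and s2 as above):

```maxima
factor (sumcontract (intosum (s1 + s2)));
```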

Using quotation computed by another word causes compilation error

Background
Loading the snippet below results in an error message Cannot apply "call" to a run-time computed value:
: subtract-sum ( seq -- quot: ( n -- n ) ) sum '[ _ - ] ;
: subtract-sum-seq ( seq -- x ) dup subtract-sum map ;
My understanding is that this is expected behavior, since the internal call to call inside map requires the inputs and outputs of the processed quotation to be known at compile time.
Problem
However I tested in listener what I believe to be two equivalent expressions and they worked just fine.
Example 1:
# : subtract-sum ( seq -- quot: ( n -- n ) ) sum '[ _ - ] ;
# : subtract-sum-seq ( seq -- seq call ) dup subtract-sum ;
# { 1 2 3 4 } subtract-sum-seq
{ 1 2 3 4 }
[ 10 - ]
# map
{ -9 -8 -7 -6 }
Example 2:
# : subtract-sum-seq ( seq -- x ) dup '[ _ - ] map ;
# { 1 2 3 4 } subtract-sum-seq
{ -9 -8 -7 -6 }
Question
What is the difference between original code and the working examples that causes an error in the first one but not the other two? There clearly seems to be something about quotations I'm not understanding here.
Additional info
Interestingly, I tried to wrap my call to map inside the listener from the first example into a word and it resulted in the same error as the original code:
# { 1 2 3 4 } subtract-sum-seq map
{ -9 -8 -7 -6 }
# : apply ( -- seq ) { 1 2 3 4 } subtract-sum-seq map ; ! error: Cannot apply "call" to a run-time computed value
There are two different problems at play in this example.
The first is that in interactive code, the Listener will not check the stack effect of quotations, but they get checked when the code gets compiled in a definition. That's the reason that manually expanding the words in the Listener worked.
The second problem is that the nested effects declared for quotations are ignored by most words. You could replace the ( seq -- quot: ( n -- n ) ) with ( seq -- q ) and it would work the same.
In this case, the declaration for the quotation in the first word doesn't carry to the second word. That's the reason that even if it's all theoretically correct, the compiler can't prove it; it just doesn't know the effects of the quotation.
The solution is to declare the effects of the quotation at the call site:
: subtract-sum ( seq -- quot: ( n -- n ) ) sum '[ _ - ] ;
: subtract-sum-seq ( seq -- x ) dup subtract-sum [ call( n -- n ) ] curry map ;
{ 1 2 3 4 } subtract-sum-seq .
! -> { -9 -8 -7 -6 }
https://docs.factorcode.org/content/article-effects.html
https://docs.factorcode.org/content/article-inference-escape.html

How to convert a string (hexadecimal string) to uint64 in Lua?

I need to convert a sip callId (e.g. 1097074724_100640573#8.8,8.8) string into a requestId, and I am using a sha1 digest to get a hash. I need to convert this hexadecimal digest into a uint64_t for internal compatibility:
--
-- Obtain request-id from callId
--
-- Returns hash
--
function common_get_request_id( callId )
    local command = "echo -n \"" .. callId .. "\" | openssl sha1 | sed 's/(stdin)= //g'"
    local handle = assert( io.popen( command, "r" ) )
    local output = handle:read( "*all" )
    local outputHash = string.gsub(output, "\n", "") -- strip newline
    handle:close()
    -- How to convert outputHash to uint64?
end
I am not sure about uint64 support in Lua. Also, how to do the conversion?
You can return two 32-bit integer numbers from Lua as two "double" values and combine them into one "uint64" on the C side.
outputHash = '317acf63c685455cfaaf1c3255eeefd6ca3c5571'
local p = 4241942993 -- some prime below 2^32
local c1, c2 = 0, 0
for a, b in outputHash:gmatch"(%x)(%x)" do
    c1 = (c1 * 16 + tonumber(a, 16)) % p
    c2 = (c2 * 16 + tonumber(b, 16)) % p
end
return c1, c2 -- both numbers in 0..(p-1), i.e. below 2^32

Counting number of variables in Z3 quantified formula

I'm trying to collect all the variables in a formula (quantified formula in Z3py). A small example
w, x, y, z = Bools('w x y z')
fml = And( ForAll(x, ForAll(y, And(x, y))), ForAll(z, ForAll(w, And(z, w))) )
varSet = traverse( fml )
The code i use to traverse is
def traverse(e):
    r = set()
    def collect(e):
        if is_quantifier(e):
            # Let's assume there is only one type of quantifier
            if e.is_forall():
                collect(e.body())
        else:
            if is_and(e):
                n = e.num_args()
                for i in range(n):
                    collect(e.arg(i))
            if is_or(e):
                n = e.num_args()
                for i in range(n):
                    collect(e.arg(i))
            if is_not(e):
                collect(e.arg(0))
            if is_var(e):
                r.add(e)
    collect(e)
    return r
And I'm getting set([Var(0), Var(1)]). As I understand it, this is because Z3 uses de Bruijn indices. Is it possible to avoid this and get the desired set, set([Var(0), Var(1), Var(2), Var(3)])?
Your code is correct; there is no Var(2) or Var(3) in this example. There are two top-level quantifiers and the de-Bruijn indices in each of them are 0 and 1. Those two quantifiers do not appear within the body of another quantifier, so there can be no confusion.

Can a function be optimized for tail recursion even when there are more than one distinct recursive calls?

As I mentioned in a recent SO question, I'm learning F# by going through the Project Euler problems.
I now have a functioning answer to Problem 3 that looks like this:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    else
        if n % p = 0L then findLargestPrimeFactor p (n/p)
        else findLargestPrimeFactor (p + 2L) n

let result = findLargestPrimeFactor 3L 600851475143L
However, since there are 2 execution paths that can lead to a different call to findLargestPrimeFactor, I'm not sure it can be optimized for tail recursion. So I came up with this instead:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    else
        let (p', n') = if n % p = 0L then (p, (n/p)) else (p + 2L, n)
        findLargestPrimeFactor p' n'

let result = findLargestPrimeFactor 3L 600851475143L
Since there's only one path that leads to a tail call to findLargestPrimeFactor, I figure it is indeed going to be optimized for tail recursion.
So my questions:
Can the first implementation be optimized for tail recursion even if there are two distinct recursive calls?
If both versions can be optimized for tail recursion, is there one better (more "functional", faster, etc) than the other?
Your first findLargestPrimeFactor function is tail recursive - a function can be made tail recursive if all recursive calls occur in the tail position, even if there are more than one.
Here's the IL of the compiled function:
.method public static int64 findLargestPrimeFactor(int64 p,
                                                   int64 n) cil managed
{
  .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 )
  // Code size 56 (0x38)
  .maxstack 8
  IL_0000:  nop
  IL_0001:  ldarg.1
  IL_0002:  ldc.i8     0x1
  IL_000b:  bne.un.s   IL_000f
  IL_000d:  br.s       IL_0011
  IL_000f:  br.s       IL_0013
  IL_0011:  ldarg.0
  IL_0012:  ret
  IL_0013:  ldarg.1
  IL_0014:  ldarg.0
  IL_0015:  rem
  IL_0016:  brtrue.s   IL_001a
  IL_0018:  br.s       IL_001c
  IL_001a:  br.s       IL_0026
  IL_001c:  ldarg.0
  IL_001d:  ldarg.1
  IL_001e:  ldarg.0
  IL_001f:  div
  IL_0020:  starg.s    n
  IL_0022:  starg.s    p
  IL_0024:  br.s       IL_0000
  IL_0026:  ldarg.0
  IL_0027:  ldc.i8     0x2
  IL_0030:  add
  IL_0031:  ldarg.1
  IL_0032:  starg.s    n
  IL_0034:  starg.s    p
  IL_0036:  br.s       IL_0000
} // end of method LinkedList::findLargestPrimeFactor
The first branch in the else clause (i.e. if n % p = 0L) starts at IL_0013 and continues until IL_0024 where it unconditionally branches back to the entry point of the function.
The second branch in the else clause starts at IL_0026 and continues until the end of the function where it again unconditionally branches back to the start of the function. The F# compiler has converted your recursive function into a loop for both cases of the else clause which contains the recursive calls.
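To see what that compiled loop does, here is the same control flow sketched in Python (for illustration only, not the compiler's actual output):

```python
def find_largest_prime_factor(p, n):
    # Loop form equivalent to the tail-recursive F# function:
    # divide out p while it divides n, otherwise try the next odd candidate.
    while n != 1:
        if n % p == 0:
            n //= p
        else:
            p += 2
    return p

print(find_largest_prime_factor(3, 600851475143))  # prints 6857
```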
Can the first implementation be optimized for tail recursion even if there are two distinct recursive calls?
The number of recursive branches is orthogonal to tail recursion. Your first function is tail recursive, since findLargestPrimeFactor is the last operation on both branches. If in doubt, you can run the function in Release mode (where the tail call optimization option is turned on by default) and observe the results.
If both versions can be optimized for tail recursion, is there one better (more "functional", faster, etc) than the other?
There is just a slight difference between the two versions. The second version creates an extra tuple, but it will not slow down the computation much. I consider the first function more readable and to the point.
To be nitpicking, the first variant is shorter using elif keyword:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    elif n % p = 0L then findLargestPrimeFactor p (n/p)
    else findLargestPrimeFactor (p + 2L) n
Another version is to use pattern matching:
let rec findLargestPrimeFactor p = function
    | 1L -> p
    | n when n % p = 0L -> findLargestPrimeFactor p (n/p)
    | n -> findLargestPrimeFactor (p + 2L) n
Since the underlying algorithm is the same, it will not be faster either.
