Plotting dual recursion equations with wxplot2d always exceeds control stack - maxima

I have the following Maxima code:
A[t] :=
if t=0 then A0
else (a+b)*A[t-1]+B[t-1]+c
;
B[t] :=
if t=0 then B0
else (a-b)*B[t-1]+c
;
a:0.1;
b:0.1;
c:1;
A0:100;
B0:0;
wxplot2d(A[t], [t, 0, 100]);
The only remotely weird thing I can think of is that recurion equation A depends on recurion equation B. I would think everything else is extremely basic.
But when I run it, I always get the following error repeated multiple times and no plot.
Maxima encountered a Lisp error:
Control stack exhausted (no more space for function call frames).
This is probably due to heavily nested or infinitely recursive function
calls, or a tail call that SBCL cannot or has not optimized away.
Even when I plot from time steps 0 to 1 with wxplot2d(A[t], [t, 0, 1]);, which by my count would only be two recursions and one external function reference, I still get the same error. Is there no way to have Maxima plot these equations?

I find that the following seems to work.
myvalues: makelist ([t, A[t]], t, 0, 100);
wxplot2d ([discrete, myvalues]);
Just to be clear, A[t] := ..., with square brackets, defines what is called an array function Maxima, which is a memoizing function (i.e. remembers previously calculated values). An ordinary, non-memoizing function is defined as A(t) := ..., with parentheses.
Given that A and B are defined only for nonnegative integers, it makes sense that they should be memoizing functions, so there's no need to change it.

Related

What do the square brackets and numbers mean in Lua's documentation?

What does the [x, y, z] section mean in the Lua docs? You see it on every C function.
For example, lua_gettop says:
int lua_gettop (lua_State *L);
[-0, +0, –]
Returns the index of the top element in the stack. Because indices start at 1, this result is equal to the number of elements in the stack; in particular, 0 means an empty stack.
What does [-0, +0, –] mean?
The [-0, +0, –] section tells you how that function manipulates the Lua argument stack. There's an explanation in the Functions and Types section, but their wall of text is a bit hard to parse. I'll summarize.
Given:
[-o, +p, x]
Those elements mean:
o: how many elements the function pops from the stack.
mnemonic: - because it reduces the size of the stack.
-0 means it pops nothing.
p: how many elements the function pushes onto the stack.
mnemonic: + because it increases the size of the stack.
+0 means it pushes nothing.
Any function always pushes its results after popping its arguments.
the form x|y means the function can push or pop x or y elements, depending on the situation;
? means that we cannot know how many elements the function pops/pushes by looking only at its arguments. (For instance, they may depend on what is in the stack.)
x: whether the function may raise errors:
- means the function never raises any error;
m means the function may raise only out-of-memory errors;
v means the function may raise the errors explained in the text;
e means the function can run arbitrary Lua code, either directly or through metamethods, and therefore may raise any errors.

Maxima: Is there any way to make functions defined within the main function be local, in a similar way to local variables?

I wonder if there is any way to make functions defined within the main function be local, in a similar way to local variables. For example, in this function that calculates the gradient of a scalar function,
grad(var,f) := block([aux],
aux : [gradient, DfDx[i]],
gradient : [],
DfDx[i] := diff(f(x_1,x_2,x_3),var[i],1),
for i in [1,2,3] do (
gradient : append(gradient, [DfDx[i]])
),
return(gradient)
)$
The variable gradient that has been defined inside the main function grad(var,f) has no effect outside the main function, as it is inside the aux list. However, I have observed that the function DfDx, despite being inside the aux list, does have an effect outside the main function.
Is there any way to make the sub-functions defined inside the main function to be local only, in a similar way to what can be made with local variables? (I know that one can kill them once they have been used, but perhaps there is a more elegant way)
To address the problem you are needing to solve here, another way to compute the gradient is to say
grad(var, e) := makelist(diff(e, var1), var1, var);
and then you can say for example
grad([x, y, z], sin(x)*y/z);
to get
cos(x) y sin(x) sin(x) y
[--------, ------, - --------]
z z 2
z
(There isn't a built-in gradient function; this is an oversight.)
About local functions, bear in mind that all function definitions are global. However you can approximate a local function definition via local, which saves and restores all properties of a symbol. Since the function definition is a property, local has the effect of temporarily wiping out an existing function definition and later restoring it. In between you can create a temporary function definition. E.g.
foo(x) := 2*x;
bar(y) := block(local(foo), foo(x) := x - 1, foo(y));
bar(100); /* output is 99 */
foo(100); /* output is 200 */
However, I don't this you need to use local -- just makelist plus diff is enough to compute the gradient.
There is more to say about Maxima's scope rules, named and unnamed functions, etc. I'll try to come back to this question tomorrow.
To compute the gradient, my advice is to call makelist and diff as shown in my first answer. Let me take this opportunity to address some related topics.
I'll paste the definition of grad shown in the problem statement and use that to make some comments.
grad(var,f) := block([aux],
aux : [gradient, DfDx[i]],
gradient : [],
DfDx[i] := diff(f(x_1,x_2,x_3),var[i],1),
for i in [1,2,3] do (
gradient : append(gradient, [DfDx[i]])
),
return(gradient)
)$
(1) Maxima works mostly with expressions as opposed to functions. That's not causing a problem here, I just want to make it clear. E.g. in general one has to say diff(f(x), x) when f is a function, instead of diff(f, x), likewise integrate(f(x), ...) instead of integrate(f, ...).
(2) When gradient and Dfdx are to be the local variables, you have to name them in the list of variables for block. E.g. block([gradient, Dfdx], ...) -- Maxima won't understand block([aux], aux: ...).
(3) Note that a function defined with square brackets instead of parentheses, e.g. f[x] := ... instead of f(x) := ..., is a so-called array function in Maxima. An array function is a memoizing function, i.e. if f[x] is called two or more times, the return value is only computed once, and then returned every time thereafter. Sometimes that's a useful optimization when the domain of the function comprises a finite set.
(4) Bear in mind that x_1, x_2, x_3, are distinct symbols, not related to each other, and not related to x[1], x[2], x[3], even if they are displayed the same. My advice is to work with subscripted symbols x[i] when i is a variable.
(5) About building up return values, try to arrange to compute the whole thing at one go, instead of growing the result incrementally. In this case, makelist is preferable to for plus append.
(6) The return function in Maxima acts differently than in other programming languages; it's a little hard to explain. A function returns the value of the last expression which was evaluated, so if gradient is that last expression, you can just write grad(var, f) := block(..., gradient).
Hope this helps, I know it's obscure and complex. The Maxima programming language was not designed before being implemented, and some of the decisions are clearly questionable at the long interval of more than 50 years (!) later. That's okay, they were figuring it out as they went along. There was not a body of established results which could provide a point of reference; the original authors were contributing to what's considered common knowledge today.

Check that at least 1 element is true in each of multiple vectors of compare results - horizontal OR then AND

I'm looking for an SSE Bitwise OR between components of same vector. (Editor's note: this is potentially an X-Y problem, see below for the real comparison logic.)
I am porting some SIMD logic from SPU intrinsics. It has an instruction
spu_orx(a)
Which according to the docs
spu_orx: OR word across d = spu_orx(a) The four word elements of
vector a are logically Ored. The result is returned in word element 0
of vector d. All other elements (1,2,3) of d are assigned a value of
zero.
How can I do that with SSE 2 - 4 involving minimum instruction? _mm_or_ps is what I got here.
UPDATE:
Here is the scenario from SPU based code:
qword res = spu_orx(spu_or(spu_fcgt(x, y), spu_fcgt(z, w)))
So it first ORs two 'greater' comparisons, then ORs its result.
Later couples of those results are ANDed to get final comparison value.
This is effectively doing (A||B||C||D||E||F||G||H) && (I||J||K||L||M||N||O||P) && ... where A..D are the 4x 32-bit elements of the fcgt(x,y) and so on.
Obviously vertical _mm_or_ps of _mm_cmp_ps results is a good way to reduce down to 1 vector, but then what? Shuffle + OR, or something else?
UPDATE 1
Regarding "but then what?"
I perform
qword res = spu_orx(spu_or(spu_fcgt(x, y), spu_fcgt(z, w)))
On SPU it goes like this:
qword aRes = si_and(res, res1);
qword aRes1 = si_and(aRes, res2);
qword aRes2 = si_and(aRes1 , res3);
return si_to_uint(aRes2 );
several times on different inputs,then AND those all into a single result,which is finally cast to integer 0 or 1 (false/true test)
SSE4.1 PTEST bool any_nonzero = !_mm_testz_si128(v,v);
That would be a good way to horizontal OR + booleanize a vector into a 0/1 integer. It will compile to multiple instructions, and ptest same,same is 2 uops on its own. But once you have the result as a scalar integer, scalar AND is even cheaper than any vector instruction, and you can branch on the result directly because it sets integer flags.
#include <immintrin.h>
bool any_nonzero_bit(__m128i v) {
return !_mm_testz_si128(v,v);
}
On Godbolt with gcc9.1 -O3 -march=nehalem:
any_nonzero(long long __vector(2)):
ptest xmm0, xmm0 # 2 uops
setne al # 1 uop with false dep on old value of RAX
ret
This is only 3 uops on Intel for a horizontal OR into a single bit in an integer register. AMD Ryzen ptest is only 1 uop so it's even better.
The only risk here is if gcc or clang creates false dependencies by not xor-zeroing eax before doing a setcc into AL. Usually gcc is pretty fanatical about spending extra uops to break false dependencies so I don't know why it doesn't here. (I did check with -march=skylake and -mtune=generic in case it was relying on Nehalem partial-register renaming for -march=nehalem. Even -march=znver1 didn't get it to xor-zero EAX before the ptest.)
It would be nice if we could avoid the _mm_or_ps and have PTEST do all the work. But even if we consider inverting the comparisons, the vertical-AND / horizontal-OR behaviour doesn't let us check something about all 8 elements of 2 vectors, or about any of those 8 elements.
e.g. Can PTEST be used to test if two registers are both zero or some other condition?
// NOT USEFUL
// 1 if all the vertical pairs AND to zero.
// but 0 if even one vertical AND result is non-zero
_mm_testz_si128( _mm_castps_si128(_mm_cmpngt_ps(x,y)),
_mm_castps_si128(_mm_cmpngt_ps(z,w)));
I mention this only to rule it out and save you the trouble of considering this optimization idea. (#chtz suggested it in comments. Inverting the comparison is a good idea that can be useful for other ways of doing things.)
Without SSE4.1 / delaying the horizontal OR
We might be able to delay horizontal ORing / booleanizing until after combining some results from multiple vectors. This makes combining more expensive (imul or something), but saves 2 uops in the vector -> integer stage vs. PTEST.
x86 has cheap vector mask->integer bitmap with _mm_movemask_ps. Especially if you ultimately want to branch on the result, this might be a good idea. (But x86 doesn't have a || instruction that booleanizes its inputs either so you can't just & the movemask results).
One thing you can do is integer multiply movemask results: x * y is non-zero iff both inputs are non-zero. Unlike x & y which can be false for 0b0101 &0b1010for example. (Our inputs are 4-bit movemask results andunsigned` is 32-bit so we have some room before we overflow). AMD Bulldozer family has an integer multiply that isn't fully pipelined so this could be a bottleneck on old AMD CPUs. Using just 32-bit integers is also good for some low-power CPUs with slow 64-bit multiply.
This might be good if throughput is more of a bottleneck than latency, although movmskps can only run on one port.
I'm not sure if there are any cheaper integer operations that let us recover the logical-AND result later. Adding doesn't work; the result is non-zero even if only one of the inputs was non-zero. Concatenating the bits together (shift+or) is also of course like an OR if we eventually just test for any non-zero bit. We can't just bitwise AND because 2 & 1 == 0, unlike 2 && 1.
Keeping it in the vector domain
Horizontal OR of 4 elements takes multiple steps.
The obvious way is _mm_movehl_ps + OR, then another shuffle+OR. (See Fastest way to do horizontal float vector sum on x86 but replace _mm_add_ps with _mm_or_ps)
But since we don't actually need an exact bitwise-OR when our inputs are compare results, we just care if any element is non-zero. We can and should think of the vectors as integer, and look at integer instructions like 64-bit element ==. One 64-bit element covers/aliases two 32-bit elements.
__m128i cmp = _mm_castps_si128(cmpps_result); // reinterpret: zero instructions
// SSE4.1 pcmpeqq 64-bit integer elements
__m128i cmp64 = _mm_cmpeq_epi64(cmp, _mm_setzero_si128()); // -1 if both elements were zero, otherwise 0
__m128i swap = _mm_shuffle_epi32(cmp64, _MM_SHUFFLE(1,0, 3,2)); // copy and swap, no movdqa instruction needed even without AVX
__m128i bothzero = _mm_and_si128(cmp64, swap); // both halves have the full result
After this logical inversion, ORing together multiple bothzero results will give you the AND of multiple conditions you're looking for.
Alternatively, SSE4.1 _mm_minpos_epu16(cmp64) (phminposuw) will tell us in 1 uop (but 5 cycle latency) if either qword is zero. It will place either 0 or 0xFFFF in the lowest word (16 bits) of the result in this case.
If we inverted the original compares, we could use phminposuw on that (without pcmpeqq) to check if any are zero. So basically a horizontal AND across the whole vector. (Assuming that it's elements of 0 / -1). I think that's a useful result for inverted inputs. (And saves us from using _mm_xor_si128 to flip the bits).
An alternative to pcmpeqq (_mm_cmpeq_epi64) would be SSE2 psadbw against a zeroed vector to get 0 or non-zero results in the bottom of each 64-bit element. It won't be a mask, though, it's 0xFF * 8. Still, it's always that or 0 so you can still AND it. And it doesn't invert.

How do I tell Maxima about valid approximations of subexpressions of a large expression?

I have a fairly large expression that involves a lot of subexpressions of the form (100*A^3 + 200*A^2 + 100*A)*x or (-A^2 - A)*y or (100*A^2 + 100*A)*z
I know, but I don't know how to tell Maxima this, that it in this case is valid to make the approximation A+1 ~ A, thereby effectively removing anything but the highest power of A in each coefficient.
I'm now looking for functions, tools, or methods that I can use to guide Maxima in dropping various terms that aren't important.
I have attempted with subst, but that requires me to specify each and every factor separately, because:
subst([A+1=B], (A+2)*(A+1)*2);
subst([A+1=B], (A+2)*(A*2+2));
(%o1) 2*(A+2)*B
(%o2) (A+2)*(2*A+2)
(that is, I need to add one expression for each slightly different variant)
I tried with ratsimp, but that's too eager to change every occurrence:
ratsubst(B, A+1, A*(A+1)*2);
ratsubst(B, A+1, A*(A*2+2));
(%o3) 2*B^2-2*B
(%o4) 2*B^2-2*B
which isn't actually simpler, as I would have preferred the answer to have been given as 2*B^2.
In another answer, (https://stackoverflow.com/a/22695050/5999883) the functions let and letsimp were suggested for the task of substituting values, but I fail to get them to really do anything:
x:(A+1)*A;
let ( A+1, B );
letsimp(x);
(x)A*(A+1)
(%o6) A+1 --\> B
(%o7) A^2+A
Again, I'd like to approximate this expression to A^2 (B^2, whatever it's called).
I understand that this is, in general, a hard problem (is e.g. A^2 + 10^8*A still okay to approximate as A^2?) but I think that what I'm looking for is a function or method of calculation that would be a little bit smarter than subst and can recognize that the same substitution could be done in the expression A^2+A as in the expression 100*A^2+100*A or -A^2-A instead of making me create a list of three (or twenty) individual substitutions when calling subst. The "nice" part of the full expression that I'm working on is that each of these A factors are of the form k*A^n*(A+1)^m for various small integers n, m, so I never actually end up with the degenerate case mentioned above.
(I was briefly thinking of re-expressing my expression as a polynomial in A, but this will not work as the only valid approximation of the expression (A^3+A^2+A)*x + y is A^3*x + y -- I know nothing about the relative sizes of x and y.

Maxima spline result not changing in loop

I'm trying to iterate over 2 parameters to get two splines for each pair. The code:
y_arr:[0.2487,0.40323333333333,0.55776666666667,0.7123]$
str_h_arr:[-0.8,-1.0,-1.2,-1.4]$
z_points:[0,0.1225,0.245,0.3675,0.49,0.6125,0.735,0.8575,0.98,1.1025,1.225,1.3475,1.47,
1.5925,1.715,1.8375,1.96,2.0825,2.205,2.26625,2.3275,2.3765,2.401,2.4255,2.43775,
2.4451,2.448775,2.45]$
length(a)$
length(b)$
load(interpol)$
for y_k:1 thru length(a) do (
for h_k:1 thru length(b) do (
y:y_arr[y_k],
str_h:str_h_arr[h_k],
bot_startpoints: [[-2.45,0],[0,y],[2.45,0]],
top_startpoints: [[-2.45,str_h_min],[0,y+str_h],[2.45,str_h_min]],
spline: cspline(bot_startpoints),
bot(x):=''spline,
print(bot(0))
)
);
//Part with top spline is skipped.
For all iterations output is now the same: 0.7123
What I want to get is two splines like in picture
Members of y_arr are y values in x=0, str_h_arr: height between splines in x=0.
So bot(0) should give me all values from y_arr.
If i don't use loop and just give this block values of y_k and h_k, it's working properly.
Can anybody point me to where I'm (or Maxima is) wrong with using loop with cspline?
The problem is that quote-quote (two single quotes, '') is applied only once, when it is read in input; it is not applied every time the expression in evaluated in the loop.
Looks like you need only to evaluate the spline at x = 0 and nothing else. So I'll suggest ev(spline, x=0) to evaluate it. You can also construct a lambda expression and evaluate that.
Here is the program after I've revised it as described above. Also, it is simpler and clearer to write for y in y_arr do (...) rather than making use of an explicit index for y_arr.
y_arr:[0.2487,0.40323333333333,0.55776666666667,0.7123]$
str_h_arr:[-0.8,-1.0,-1.2,-1.4]$
z_points:[0,0.1225,0.245,0.3675,0.49,0.6125,0.735,0.8575,0.98,1.1025,1.225,1.3475,1.47,
1.5925,1.715,1.8375,1.96,2.0825,2.205,2.26625,2.3275,2.3765,2.401,2.4255,2.43775,
2.4451,2.448775,2.45]$
load(interpol)$
for y in y_arr do (
for str_h in str_h_arr do (
bot_startpoints: [[-2.45,0],[0,y],[2.45,0]],
top_startpoints: [[-2.45,str_h_min],[0,y+str_h],[2.45,str_h_min]],
spline: cspline(bot_startpoints),
print (ev (spline, x=0))));
This is the output I get:
0.2487
0.2487
0.2487
0.2487
0.40323333333333
0.40323333333333
0.40323333333333
0.40323333333333
0.55776666666667
0.55776666666667
0.55776666666667
0.55776666666667
0.7123
0.7123
0.7123
0.7123

Resources