Why does the output differ between these two Erlang expression sequences in the shell?

In the Erlang shell, why do the following two sequences produce different results?
1> Total=15.
2> Calculate=fun(Number)-> Total=2*Number end.
3> Calculate(6).
exception error: no match of right hand side value 12
1> Calculate=fun(Number)-> Total=2*Number end.
2> Total=15.
3> Calculate(6).
12

In Erlang the = operator is both assignment and assertion.
If I do this:
A = 1,
A = 2,
my program will crash. I just told it that A = 1; since A was unbound (it didn't yet exist as a label), it is now bound to the value 1 forever and ever -- until the scope of execution changes. So when I then tell it that A = 2, it tries to assert that the value of A is 2, which it is not. So we get a crash on a bad match.
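A quick shell sketch of that assertion behavior -- re-asserting the same value is fine, asserting a different one crashes:
1> A = 1.
1
2> A = 2.
** exception error: no match of right hand side value 2
3> A = 1.
1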
Scope in Erlang is defined by two things:
Definition of the current function. This scope is absolute for the duration of the function definition.
Definition of the current lambda or list comprehension. This scope is local to the lambda but also closes over whatever values from the outer scope are referenced.
These scopes are seeded, at the moment they are declared, with whatever is already bound in the outer scope. That is how we make closures with anonymous functions. For example, let's say I have a socket I want to send a list of data through. The socket is already bound to the variable name Socket in the head of the function, and we want to use a list operation to map the list of values to send to the side effect of each one being sent over that specific socket. I can close over the value of the socket within the body of a lambda, which has the effect of currying that value out of the more general operation of "sending some data":
send_stuff(Socket, ListOfMessages) ->
    Send = fun(Message) -> ok = gen_tcp:send(Socket, Message) end,
    lists:foreach(Send, ListOfMessages).
lists:foreach/2 only accepts a function of arity 1 as its first argument. We have created a closure that has already captured the value of Socket (because it was already bound in the outer scope) and combines it with the unbound, inner variable Message. Note also that we check whether gen_tcp:send/2 worked on each iteration by asserting, inside the lambda, that its return value really was ok.
This is a super useful property.
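For comparison, here is a sketch of an equivalent definition with the closure written inline at the call site -- the fun still captures Socket, just without the intermediate Send variable:
send_stuff(Socket, ListOfMessages) ->
    lists:foreach(fun(Message) -> ok = gen_tcp:send(Socket, Message) end,
                  ListOfMessages).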
So with that in mind, let's look at your code:
1> Total = 15.
2> Calculate = fun(Number)-> Total = 2 * Number end.
3> Calculate(6).
In the code above you've just assigned a value to Total, meaning you have created a label for that value (just like we bound Socket in the example above). Then later you are asserting that the value of Total is whatever the result of 2 * Number might be -- which can never be true: Total is the integer 15, and even 2 * 7.5 wouldn't cut it, because the result would be the float 15.0, not the integer 15.
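A sketch of that in the shell (output of the first two lines omitted, as in the question) -- even a float argument whose double is 15.0 fails, because the float 15.0 does not match the integer 15:
1> Total = 15.
2> Calculate = fun(Number) -> Total = 2 * Number end.
3> Calculate(7.5).
** exception error: no match of right hand side value 15.0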
1> Calculate = fun(Number)-> Total = 2 * Number end.
2> Total = 15.
3> Calculate(6).
In this example, though, you've got an inner variable called Total which does not close over any value from the outer scope, because nothing named Total was bound there yet. Later, you declare a label in the outer scope called Total, but by this time the lambda on the first line has already been converted into a function, and the label Total used inside it belongs entirely to the immutable scope of that new function definition -- the one the assignment to Calculate represented. Thus, no conflict.
Consider what happens, for example, with trying to reference an inner value from a list comprehension:
1> A = 2.
2
2> [A * B || B <- lists:seq(1,3)].
[2,4,6]
3> A.
2
4> B.
* 1: variable 'B' is unbound
This is not what you would expect from, say, Python 2:
>>> a = 2
>>> a
2
>>> [a * b for b in range(1,4)]
[2, 4, 6]
>>> b
3
Incidentally, this has been fixed in Python 3:
>>> a = 2
>>> a
2
>>> [a * b for b in range(1,4)]
[2, 4, 6]
>>> b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
(And I would provide a JavaScript example for comparison as well, but the scoping rules there are just so absolutely insane it doesn't even matter...)

In the first case you have bound Total to 15. In Erlang, variables are immutable, but in the shell, when you write Total = 15. you do not really create a variable Total; the shell does its best to mimic the behavior you would get in a running application, and it stores the pair {"Total",15} in a table.
On the next line you define the fun Calculate. The parser finds the expression Total = 2*Number, looks through its table, and sees that Total was previously defined. The evaluation is turned into something equivalent to 15 = 2*Number.
So, on the third line, when you ask it to evaluate Calculate(6), it evaluates 15 = 2*6 and issues the error message
exception error: no match of right hand side value 12
In the second example, Total is not yet defined when you define the function, so inside the fun, Total is stored as a plain local variable that gets bound on each call -- there is no reference to any shell variable. So there is no conflict when you later define Total, and no error when you evaluate Calculate(6).
The behavior would be exactly the same in a compiled module.
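A minimal sketch of the two cases in a compiled module (the module and function names here are made up for illustration):
-module(scope_demo).
-export([bound_before/1, bound_inside/1]).

%% Total is bound before the fun is built, so the fun closes over 15 and
%% its body is effectively the assertion 15 = 2 * Number.
bound_before(Number) ->
    Total = 15,
    Calculate = fun(N) -> Total = 2 * N end,
    Calculate(Number).    %% badmatch unless 2 * Number is exactly the integer 15

%% Total exists only inside the fun, so it is a fresh binding on every call.
bound_inside(Number) ->
    Calculate = fun(N) -> Total = 2 * N end,
    Calculate(Number).    %% returns 2 * Number (the compiler merely warns that the inner Total is unused)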

The variable Total has already been assigned the value 15, so you can NOT reuse the same variable name Total on the second line. You should change it to another name such as Total1 or Total2...

What does (_,[]) mean?

I was given a question which was:
given a number N as the first argument, select only the numbers greater than N in the list, so that
greater(2,[2,13,1,4,13]) = [13,4,13]
This was the solution provided:
member(_,[]) -> false;
member(H,[H|_]) -> true;
member(N,[_|T]) -> member(N,T).
I don't understand what "_" means. I understand it has something to do with pattern matching but I don't understand it completely. Could someone please explain this to me?
This was the solution provided:
I think you are confused: the name of the solution function isn't even the same as the name of the function in the question. The member/2 function returns true when the first argument is an element of the list provided as the second argument, and it returns false otherwise.
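Incidentally, a function matching the exercise description could be written as a one-line list comprehension; a sketch (the name greater is taken from the question):
greater(N, List) -> [X || X <- List, X > N].
%% greater(2, [2,13,1,4,13]) gives [13,4,13]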
I don't understand what "_" means. I understand it has something to do with pattern matching but I don't understand it completely. Could someone please explain this to me?
_ is a variable name, and like any variable it will match anything. Here are some examples of pattern matching:
35> f(). %"Forget" or erase all variable bindings
ok
45> {X, Y} = {10, 20}.
{10,20}
46> X.
10
47> Y.
20
48> {X, Y} = {30, 20}.
** exception error: no match of right hand side value {30,20}
Now why didn't line 48 match? X was already bound to 10 and Y to 20, so erlang replaces those variables with their values, which gives you:
48> {10, 20} = {30, 20}.
...and those tuples don't match.
Now let's try it with a variable named _:
49> f().
ok
50> {_, Y} = {10, 20}.
{10,20}
51> Y.
20
52> {_, Y} = {30, 20}.
{30,20}
53>
As you can see, the variable _ sort of works like the variable X, but notice that there is no error on line 52, like there was on line 48. That's because the _ variable works a little differently than X:
53> _.
* 1: variable '_' is unbound
In other words, _ is a variable name, so it will initially match anything, but unlike X, the variable _ is never bound/assigned a value, so you can use it over and over again without error to match anything.
The _ variable is also known as a don't care variable because you don't care what that variable matches because it's not important to your code, and you don't need to use its value.
Let's apply those lessons to your solution. This line:
member(N,[_|T]) -> member(N,T).
recursively calls the member function, namely member(N, T). And, the following function clause:
member(_,[]) -> false;
will match the function call member(N, T) whenever T is an empty list--no matter what the value of N is. In other words, once the given number N has not matched any element in the list, i.e. when the list is empty so there are no more elements to check, then the function clause:
member(_,[]) -> false;
will match and return false.
You could rewrite that function clause like this:
member(N, []) -> false;
but erlang will warn you that N is an unused variable in the body of the function, which is a way of saying: "Are you sure you didn't make a mistake in your function definition? You defined a variable named N, but then you didn't use it in the body of the function!" The way you tell erlang that the function definition is indeed correct is to change the variable name N to _ (or _N).
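Here is a minimal sketch of the three spellings side by side (the module and function names are made up purely to show what the compiler says):
-module(warn_demo).
-export([with_name/2, with_anon/2, with_prefixed/2]).

with_name(N, []) -> false.       %% compiler warning: variable 'N' is unused
with_anon(_, []) -> false.       %% anonymous variable: no warning
with_prefixed(_N, []) -> false.  %% underscore-prefixed: bound, but no warning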
It means a variable you don't care to name. If you are never going to use a variable inside the function you can just use underscore.
% if the list is empty, it has no members
member(_, []) -> false;
% if the element I am searching for is the head of the list, it is a member
member(H, [H|_]) -> true;
% if the element I am searching for is not the head of the list, and the list
% is not empty, let's recursively go look at the tail of the list to see if
% it is present there
member(H, [_|T]) -> member(H, T).
The comments above spell out what is happening. You can also have multiple '_' unnamed variables in the same pattern.
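If those clauses were compiled into a module (call it my_lists, purely for illustration), the behaviour would look like this:
1> my_lists:member(13, [2,13,1,4,13]).
true
2> my_lists:member(7, [2,13,1,4,13]).
false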
According to Documentation:
The anonymous variable is denoted by underscore (_) and can be used when a variable is required but its value can be ignored.
Example:
[H, _] = [1,2] % H will be 1
Also documentation says that:
Variables starting with underscore (_), for example, _Height, are normal variables, not anonymous. They are however ignored by the compiler in the sense that they do not generate any warnings for unused variables.
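A short shell sketch of the difference -- _ never binds, so it can repeat freely within one pattern, while an underscore-prefixed variable is a real variable that has to be consistent:
1> {_, _} = {1, 2}.        % '_' matches both positions independently
{1,2}
2> {_A, _A} = {1, 2}.      % '_A' is a normal variable, so both positions must agree
** exception error: no match of right hand side value {1,2}
3> {_B, Second} = {1, 2}.  % '_B' binds quietly; in a module it would not trigger a warning
{1,2}
4> Second.
2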
Sorry if this is repetitive...
What does (_,[]) mean?
That means (1) two parameters, (2) the first one matches anything and everything, yet I don't care about it (you're telling Erlang to just forget about its value via the underscore) and (3) the second parameter is an empty list.
Given that Erlang binds or matches values with variables (depending on the particular case), here you're basically looking for a match of the second parameter against an empty list. If that match happens, the clause returns false. Otherwise, Erlang tries to match the two arguments of the call against one of the other two clauses below it.

Why is Lua's arg table not one-indexed?

In the Lua command line, when I pass arguments to a script like this:
lua myscript.lua a b c d
I can read the name of my script and arguments from the global arg table. arg[0] contains the script name, arg[1] - arg[#arg] contain the remaining arguments. What's odd about this table is that it has a value at index 0, unlike every other Lua array, which starts indexing at 1. This means that when iterating over it like this:
for i,v in ipairs(arg) do print(i, v) end
the output only covers indices 1 through 4, and does not print the script name. Also, #arg evaluates to 4, not 5.
Is there any good reason for this decision? It initially took me aback, and I had to verify that the manual wasn't mistaken.
Asking why certain design decisions were made is always tricky, because only the creator of the language can really answer. I guess it was chosen so that you can iterate over the arguments using ipairs and don't have to treat the first one specially because it's the script name and not an argument.
#arg is misleading anyway, because it only counts the elements in the consecutive array section; the zeroth and negative indices are stored in the hash section. To obtain the actual number of elements, use
local n = 0
for _ in pairs(arg) do
  n = n + 1
end
At least it is documented in Programming in Lua:
A main script can retrieve its arguments in the global variable arg. In a call like
prompt> lua script a b c
lua creates the table arg with all the command-line arguments, before running the script. The script name goes into index 0; its first argument (a in the example), goes to index 1, and so on. Eventual options go to negative indices, as they appear before the script. For instance, in the call
prompt> lua -e "sin=math.sin" script a b
lua collects the arguments as follows:
arg[-3] = "lua"
arg[-2] = "-e"
arg[-1] = "sin=math.sin"
arg[0] = "script"
arg[1] = "a"
arg[2] = "b"
More often than not, the script only uses the positive indices (arg[1] and arg[2], in the example).
arg in Lua mimics argv in C: arg[0] contains the name of the script just like argv[0] contains the name of the program.
This does not contradict 1-based arrays in Lua, since the arguments to the script are the more important data. The name of the script is seldom used.

erlang has no shared memory. So what happens with the sum function for example?

Erlang has no shared memory. Look at the sum function,
sum([H|T]) -> H + sum(T);
sum([]) -> 0.
So
sum([1,2,3])=1+2+3+0
Now what happens? Does Erlang create an array with [1, 1+2, 1+2+3, 1+2+3+0]?
This is what happens:
sum([1,2,3])   = 1 + sum([2,3])
=> sum([2,3])  = 2 + sum([3])
=> sum([3])    = 3 + sum([])
=> sum([])     = 0
Now sum([3]) can be evaluated:
sum([3]) = 3 + sum([]) = 3 + 0 = 3
which means that sum([2, 3]) can be evaluated:
sum([2, 3]) = 2 + sum([3]) = 2 + 3 = 5
which means that sum([1, 2, 3]) can be evaluated:
sum([1,2,3]) = 1 + sum([2,3]) = 1 + 5 = 6
Response to comment:
Okay, I figured what you were really asking about was immutable variables. Suppose you have the following C code:
int x = 0;
x += 1;
Does that code somehow demonstrate shared memory? If not, then C does not use shared memory for int variables...and neither does erlang.
In C you introduce a variable, sum, give it an initial value, 0, and
after that you add values to it. Erlang does not do this. What does
Erlang do?
Erlang allocates a new frame on the stack for each recursive function call. Each frame stores the local variables and their values, e.g. the parameter variables, for that particular function call. There can be multiple frames on the stack each storing a variable named X, but they are separate variables, so none of the X variables is ever mutated--instead a new X variable is created for each new frame, and the new X is given a new value.
Now, if the stack really worked like that in erlang, then a recursive function that executed millions of times would add millions of frames to the stack and in the process would probably use up its allocated memory and crash your program. To avoid using excessive amounts of memory, erlang employs tail call optimization, which allows the amount of memory that a function uses to remain constant. Tail call optimization allows erlang to replace the first frame on the stack with a subsequent frame of the same size, which keeps the memory usage constant. In addition, even when a function is not defined in a tail recursive format, like your sum() function, erlang can optimize the code so that it uses constant memory (see the Seven Myths of Erlang Performance).
In your sum() function, no variables are mutated and no memory is shared. In effect, though, function parameter variables do act like mutable variables.
My first diagram above is a representation of the stack adding a new frame for each recursive function call. If you redefine sum() to be tail recursive, like this:
sum(List) ->
    sum(List, 0).

sum([H|T], Total) ->
    sum(T, Total + H);
sum([], Total) ->
    Total.
then below is a diagram of a recursive function executing that represents frames being replaced on the stack to keep the memory usage constant:
sum([1, 2, 3]) => sum([1, 2, 3], 0)   [H=1, T=[2,3], Total=0]
               => sum([2, 3], 1)      [H=2, T=[3], Total=1]
               => sum([3], 3)         [H=3, T=[], Total=3]
               => sum([], 6)          [Total=6]
               => 6
You're making recursive calls. The scope of each function body is not closed until that call returns something, so the immutable variable H of each call is kept until the base case is reached.
It could be made tail recursive with the help of an accumulator in the function arguments, which is lighter on memory: the H part is calculated first, and then the recursive call is made with that calculated value handed over as the accumulator.
Either way, nothing outside of your function scopes is used.

Does ets provide a means to do an Update & Read in one go - like an Increment Operation?

I initialize a named ets table in my start(_StartType, _StartArgs) function, before setting up my standard Cowboy web handling routines.
ets:new(req_stats,[named_table,public]),ets:insert(req_stats,{req_count,0})
I have this function:
count_req() ->
    [{_, Cnt}] = ets:lookup(req_stats, req_count),
    ets:insert(req_stats, {req_count, Cnt + 1}),
    Cnt + 1.
My concern is this:
If I call count_req() for each web request under high load, I will most likely end up with an inaccurate count, because req_count could be updated by other processes several times between the ets:lookup and the point where I return Cnt+1.
Does ets provide a means to do an update & read in one go - like an increment operation?
Thanks.
You can use ets:update_counter/3:
ets:update_counter(req_stats, req_count, {2, 1})
That is, increment the 2nd element of the tuple by 1, and return the new value.
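With that, count_req/0 collapses to a single atomic call; a sketch:
count_req() ->
    %% The read-modify-write happens inside ets, so concurrent callers cannot interleave.
    ets:update_counter(req_stats, req_count, {2, 1}).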
In Erlang/OTP 18.0 (released on 2015-06-24), ets:update_counter/4 was introduced. It lets you provide a default value, to be used if the key is not yet present in the table. So if you want the counter to become 1 after the first time you increment it, give 0 as the default value:
1> ets:new(req_stats, [named_table]).
req_stats
2> ets:tab2list(req_stats).
[]
3> ets:update_counter(req_stats, req_count, {2, 1}, {req_count, 0}).
1
4> ets:tab2list(req_stats).
[{req_count,1}]
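Using the four-argument form, the count_req/0 sketch above no longer needs the ets:insert that seeds the row at start-up:
count_req() ->
    ets:update_counter(req_stats, req_count, {2, 1}, {req_count, 0}).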

Why does return/redo evaluate result functions in the calling context, but block results are not evaluated?

Last night I learned about the /redo option for when you return from a function. It lets you return another function, which is then invoked at the calling site and reinvokes the evaluator from the same position
>> foo: func [a] [(print a) (return/redo (func [b] [print b + 10]))]
>> foo "Hello" 10
Hello
20
Even though foo is a function that only takes one argument, it now acts like a function that took two arguments. Something like that would otherwise require the caller to know you were returning a function, and that caller would have to manually use the do evaluator on it.
Thus without return/redo, you'd get:
>> foo: func [a] [(print a) (return (func [b] [print b + 10]))]
>> foo "Hello" 10
Hello
== 10
foo consumed its one parameter and returned a function by value (which was not invoked, thus the interpreter moved on). Then the expression evaluated to 10. If return/redo did not exist you'd have had to write:
>> do foo "Hello" 10
Hello
20
This keeps the caller from having to know (or care) if you've chosen to return a function to execute. And is cool because you can do things like tail call optimization, or writing a wrapper for the return functionality itself. Here's a variant of return that prints a message but still exits the function and provides the result:
>> myreturn: func [] [(print "Leaving...") (return/redo :return)]
>> foo: func [num] [myreturn num + 10]
>> foo 10
Leaving...
== 20
But functions aren't the only thing that have behavior in do. So if this is a general pattern for "removing the need for a DO at the callsite", then why doesn't this print anything?
>> test: func [] [return/redo [print "test"]]
>> test
== [print "test"]
It just returned the block by value, like a normal return would have. Shouldn't it have printed out "test"? That's what do would...uh, do with it:
>> do [print "test"]
test
The short answer is because it is generally unnecessary to evaluate a block at the call point, because blocks in Rebol don't take parameters so it mostly doesn't matter where they are evaluated. However, that "mostly" may need some explanation...
It comes down to two interesting features of Rebol: static binding, and how do of a function works.
Static Binding and Scopes
Rebol doesn't have scoped word bindings, it has static direct word bindings. Sometimes it seems like we have lexical scope, but we really fake that by updating the static bindings each time we're building a new "scoped" code block. We can also rebind words manually whenever we want.
What that means for us in this case though, is that once a block exists, its bindings and values are static - they're not affected by where the block is physically located, or where it is being evaluated.
However, and this is where it gets tricky, function contexts are weird. While the bindings of words bound to a function context are static, the set of values assigned to those words are dynamically scoped. It's a side effect of how code is evaluated in Rebol: What are language statements in other languages are functions in Rebol, so a call to if, for instance, actually passes a block of data to the if function which if then passes to do. That means that while a function is running, do has to look up the values of its words from the call frame of the most recent call to the function that hasn't returned yet.
This does mean that if you call a function and return a block of code with words bound to its context, evaluating that block will fail after the function returns. However, if your function calls itself and that call returns a block of code with its words bound to it, evaluating that block before your function returns will make it look up those words in the call frame of the current call of your function.
This is the same for whether you do or return/redo, and affects inner functions as well. Let me demonstrate:
Function returning code that is evaluated after the function returns, referencing a function word:
>> a: 10 do do has [a] [a: 20 [a]]
** Script error: a word is not bound to a context
** Where: do
** Near: do do has [a] [a: 20 [a]]
Same, but with return/redo and the code in a function:
>> a: 10 do has [a] [a: 20 return/redo does [a]]
** Script error: a word is not bound to a context
** Where: function!
** Near: [a: 20 return/redo does [a]]
Code do version, but inside an outer call to the same function:
>> do f: function [x] [a: 10 either zero? x [do f 1] [a: 20 [a]]] 0
== 10
Same, but with return/redo and the code in a function:
>> do f: function [x] [a: 10 either zero? x [f 1] [a: 20 return/redo does [a]]] 0
== 10
So in short, with blocks there is usually no advantage to doing the block elsewhere than where it is defined, and if you want to it is easier to use another call to do instead. Self-calling recursive functions that need to return code to be executed in outer calls of the same function are an exceedingly rare code pattern that I have never seen used in Rebol code at all.
It could be possible to change return/redo so it would handle blocks as well, but it probably isn't worth the increased overhead to return/redo to add a feature that is only useful in rare circumstances and already has a better way to do it.
However, that brings up an interesting point: If you don't need return/redo for blocks because do does the same job, doesn't the same apply to functions? Why do we need return/redo at all?
How DO of a Function Works
Basically, we have return/redo because it uses exactly the same code that we use to implement do of a function. You might not realize it, but do of a function is really unusual.
In most programming languages that can call a function value, you have to pass the parameters to the function as a complete set, sort of how R3's apply function works. Regular Rebol function calling causes some unknown-ahead-of-time number of additional evaluations to happen for its arguments using unknown-ahead-of-time evaluation rules. The evaluator figures out these evaluation rules at runtime and just passes the results of the evaluation to the function. The function itself doesn't handle the evaluation of its parameters, or even necessarily know how those parameters were evaluated.
However, when you do a function value explicitly, that means passing the function value to a call to another function, a regular function named do, and then that magically causes the evaluation of additional parameters that weren't even passed to the do function at all.
Well it's not magic, it's return/redo. The way do of a function works is that it returns a reference to the function in a regular shortcut-return value, with a flag in the shortcut-return value that tells the interpreter that called do to evaluate the returned function as if it were called right there in the code. This is basically what is called a trampoline.
Here's where we get to another interesting feature of Rebol: The ability to shortcut-return values from a function is built into the evaluator, but it doesn't actually use the return function to do it. All of the functions you see from Rebol code are wrappers around the internal stuff, even return and do. The return function we call just generates one of those shortcut-return values and returns it; the evaluator does the rest.
So in this case, what really happened is that all along we had code that did what return/redo does internally, but Carl decided to add an option to our return function to set that flag, even though the internal code doesn't need return to do so because the internal code calls the internal function. And then he didn't tell anyone that he was making the option externally available, or why, or what it did (I guess you can't mention everything; who has the time?). I have the suspicion, based on conversations with Carl and some bugs we've been fixing, that R2 handled do of a function differently, in a way that would have made return/redo impossible.
That does mean that the handling of return/redo is pretty thoroughly oriented towards function evaluation, since that is its entire reason for existing at all. Adding any overhead to it would add overhead to do of a function, and we use that a lot. Probably not worth extending it to blocks, given how little we'd gain and how rarely we'd get any benefit at all.
For return/redo of a function though, it seems to be getting more and more useful the more we think about it. In the last day we've come up with all sorts of tricks that this enables. Trampolines are useful.
While the question originally asked why return/redo did not evaluate blocks, there were also formulations like: "is cool because you can do things like tail call optimization", "[can write] a wrapper for the return functionality", "it seems to be getting more and more useful the more we think about it".
I do not think these are true. My first example demonstrates a case where return/redo can really be used, an example being in the "area of expertise" of return/redo, so to speak. It is a variadic sum function called sumn:
use [result collect process] [
    collect: func [:value [any-type!]] [
        unless value? 'value [return process result]
        append/only result :value
        return/redo :collect
    ]
    process: func [block [block!] /local result] [
        result: 0
        foreach value reduce block [result: result + value]
        result
    ]
    sumn: func [] [
        result: copy []
        return/redo :collect
    ]
]
This is the usage example:
>> sumn 1 * 2 2 * 3 4
== 12
Variadic functions taking an "unlimited number" of arguments are not as useful in Rebol as they may seem at first sight. For example, if we wanted to use the sumn function in a small script, we would have to wrap it in a paren to indicate where it should stop collecting arguments:
result: (sumn 1 * 2 2 * 3 4)
print result
This is not any better than using a more standard (non-variadic) alternative called e.g. block-sum and taking just one argument, a block. The usage would be like
result: block-sum [1 * 2 2 * 3 4]
print result
Of course, if the function can somehow detect what is its last argument without needing enclosing paren, we really gain something. In this case we could use the #[unset!] value as the sumn stopping argument, but that does not spare typing either:
result: sumn 1 * 2 2 * 3 4 #[unset!]
print result
Seeing the example of a return wrapper, I would say that return/redo is not well suited for return wrappers; they are outside of its area of expertise. To demonstrate that, here is a return wrapper written in Rebol 2 whose behavior really is outside of return/redo's area of expertise:
myreturn: func [
    {my RETURN wrapper returning the string "indefinite" instead of #[unset!]}
    ; the [throw] attribute makes this function a RETURN wrapper in R2:
    [throw]
    value [any-type!] {the value to return}
] [
    either value? 'value [return :value] [return "indefinite"]
]
Testing in R2:
>> do does [return #[unset!]]
>> do does [myreturn #[unset!]]
== "indefinite"
>> do does [return 1]
== 1
>> do does [myreturn 1]
== 1
>> do does [return 2 3]
== 2
>> do does [myreturn 2 3]
== 2
Also, I do not think it is true that return/redo helps with tail call optimizations. There are examples of how tail calls can be implemented without using return/redo on the www.rebol.org site. As said, return/redo was tailor-made to support the implementation of variadic functions, and it is not flexible enough for other purposes as far as argument passing is concerned.
