How does single assignment in Erlang lead to more readable code? - erlang

How does single assignment in Erlang lead to more readable code (referential transparency)?

Coding is easy, debugging is hard. Code to make debugging trivial. -- Barry Rountree
With a single assignment, you can be sure that variable has one value in the whole function body. It makes debugging much easier. You can put debugging and logging whenever you want. You can easily spot the place where it gains its value and so on. Isn't it obvious?

On of the goals of functional programs is to avoid side effects. In short this is a property of the code that it behaves exactly the same every time it's executed. That's why shared state is being avoided and why developers often frown upon the process dictionary in Erlang. A pure functional language wouldn't have any side effects. Various functional languages try to formalize code that produces side effects, like for example Haskell.
Obviously, if the value assigned to a variable can be changed, then the same function executed twice would produce different result depending on the value contained in the variable. In OOP output from a function executed on an object produces a result that depends on the state contained in that object. So you can't understand the code properly without knowing the state contained in the object as well.
With single assignment the output doesn't depend on the state but only on the arguments passed to the function. This is especially useful when something crashed and you have a stack trace or you log a debug output from a function. You can read the code and assign values to each variable knowing that nothing would have altered those values if the same code was to be executed again.

Related

COBOL: What is the benefit of using paragraphs and sections instead of subprograms?

What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead? As far as I can see paragraphs and sections are dangerous because they have an non intuitive control flow, its easy to fall through and execute stuff you never meant to execute, and there is no variable (item) scoping, therefore it encourages a style of programming where everything is visible to everything else. Its a slippery soup.
I read a lot, but I could not find anything related to the comparative benefit of paragraphs/sections vs a subprogram. I also asked online some people in some COBOL forum, but their answers were along the lines of "is this a joke" or "go learn programming"(!!!).
I do not wish to engage in a discussion of stylistic preferences, everyone writes the way that their brain works, I only want to know, is there any benefit to using paragraphs/sections for flow control? As in, are there any COBOL operations that can be done only by using paragraphs/sections? Or is it just a remnant of an early way of thinking about code?
Because no other language I know of has mimicked that, so it either has some mechanical concrete essential reason to exist in COBOL, or it is a stylistic preference of the COBOL people. Can someone illuminate me on what is happening?
These are multiple questions... the two most important ones:
Are there any COBOL operations that can be done only by using paragraphs/sections?
Yes. A likely not complete list:
USE statements in DECLARATIVES can only apply to a paragraph or a section. These are used for handling file errors and exceptions. Not all compilers support this COBOL standard feature in full.
Segmentation (primary: a program that is only partially loaded in memory) is only possible with sections; but that is to be considered a "legacy feature" (at least I don't know of people actually using it this way explicitly); see the comment of Gilbert Le Blanc for more details on this
fall-through, many other languages have this feature with a kind of a switch statement (COBOL's EVALUATE, which is not the same as a common switch but can be used similar has an explicit break and no fall-through)
GO TO DEPENDING ON (could be recoded to achieve something similar with EVALUATE and then either PERFORM, if the paragraphs are expected to fall-through, which is not uncommon, then that creates a lot of extra code)
GO TO in general and especially nice - the old obsolete ALTER statement
PERFORM statement, format 1 "out-of-line"
file state is only shared between programs when you define it as EXTERNAL, and you often want to have a file state being limited to a single program
up to COBOL85: EXIT statement (plain without anything else, actually doing nothing else then a CONTINUE would)
What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead?
shared data (I guess you know of programs with static data or otherwise (module)global data that is shared between functions/methods and also different source code files)
much less overhead than a CALL is
consistency:
you know what's in your code, you don't know what another program does (or at least: you cannot guarantee that it will do the same some years later exactly the same)
easier to extend/change: adding another variable (and also removing part of it, change its size) to a CALL USING means that you also have to adjust the called program - and all programs that call this, even when you place the complete definition in a copybook, which is very reasonable, this means you have to recompile all programs that use this
a section/paragraph is always available (it is already loaded when the program runs), a CALLed program may not be available or lead to an exception, for example because it cannot be loaded as its parameters have changed
less stuff to code
Note: While not all compilers support this you can work around nearly all of the runtime overhead and consistency issue when you use one source files with multiple program definitions in (possibly nested) and using a static call-convention. This likely gives you the "modern" view you aim for with scope-limitation of variables, within the programs either persistent (like local-static) when defined in WORKING-STORAGE or always passed when in LINKAGE or "local-temporary" when in LOCAL-STORAGE.
Should all code of an application be in one program?
[I've added this one to not lead to bad assumptions] Of course not!
Using sub-programs and also user-defined functions (possibly even nested providing the option for "scoped" and "shared" data) is a good thing where you have a "feature boundary" (for example: access to data, user-interface, ...) or with "modern" COBOL where you have a "language boundary" (for example: direct CALLs of C/Java/whatever), but it isn't "jut for limiting a counter to a section" - in this case: either define a variable which state is not guaranteed to be available after any PERFORM or define one for the section/paragraph; in both cases it would be reasonable to use a prefix telling you this.
Using that "separate by boundary" approach also takes care of the "bad habit of everything being seen by everyone" issue (which is in any case only true for "all sections/paragraphs in the same program).
Personal side note: I would only use paragraphs where it is a "shop/team rule" (it is better to stay consistent then to do things different "just because they are better" [still providing an option to possibly change the common rule]) or for GO TO, which I normally not use.
SECTIONs and EXIT SECTION + EXIT PERFORM [CYCLE] (and very rarely GOBACK/EXIT PROGRAM) make paragraphs nearly unnecessary.
very short answer. subroutines!!
Subroutines execute in the context of the calling routine. Two virtues: no parameter passing, easy to create. In some languages, subroutines are private to (and are part of) the calling (invoking) routine (see various dialects of BASIC).
direct answer: Section and Paragraph support a different way of thinking about programming. Higher performance than call subprogram. Supports overlays. The "fall thru" aspect can be quite useful, a feature rather than a vice. They may be necessary depending on what you are doing with a specific COBOL compiler.
See also PL/1, BAL/360, architecture 360/370/...
As a veteran Cobol dinosaur, I would say asking about the benefit is not the right question. I used paragraph (or section) differently than a subprogram. The right question in my opinion is when to use them logically. If I can make an analogy, if you have a Dog java class, you will write Dog-appropriate methods within it. If there's a cat involved, you may need a helper class. In this case the helper class is the subprogram. Though, you can instead code the helper class methods inside the Dog class, but that will be bad coding.
In any other language I would recommend putting self contained functions into subroutines.
However in COBOL not so much. If the code is very likely to be used in other programs then a subroutine is a good idea. Otherwise not!
The reason being the total lack of any checks on the number type or existence of passed parameters at compile time. Small errors in call statements lead to program crashes at run time. Limiting the use of sub-routines and carefully checking the calling code for errors makes for a more reliable program.
Using paragraphs any type mismatch will be flagged at compile time, or, an automatic conversion will occur.

Can I somehow preserve values of local macros in Stata after the completion of the do-file?

Whenever I add new lies to the code (e.g. when computing a different estimate) I do not want to rerun the whole do-file again. However, I often need the values of certain local macros that were generated during the previous run of the do-file.
Is there a way to keep those values? Or I should switch to using more globals instead?
Yes, use global.
But note that you need to be careful with global for the exact reason you are using it: the macro remains in memory until you exit that instance of Stata, or until you reset it within the code.
Some people have very strong feelings about not using global ever (see pp5 and continuing here: http://faculty.chicagobooth.edu/matthew.gentzkow/research/ra_manual_coding.pdf). Once you learn their properties, and to not incur the small number of problems they can potentially cause, you should be fine.
Globals are by no means the only alternative.
First, consider using scalars. A scalar with a permanent name will survive beyond the end of a do-file.
Second, consider converting your do-file to a program and learning about saved results.
Third, you can always consider putting results in a new variable; it's just that it is usually bad style and wasteful on storage.
At a guess, the first is likely to be the most useful for you. Many Stata users are happy to use do-files with many dataset-specific statements. Jumping to writing fully-fledged and more general programs is a big jump and not (at first) trivial.

Does functional programming avoid state?

According to wikipedia: functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. (emphasis mine).
Is this really true? My personal understanding is that it makes the state more explicit, in the sense that programming is essentially applying functions (transforms) to a given state to get a transformed state. In particular, constructs like monads let you carry the state explicitly through functions. I also don't think that any programming paradigm can avoid state altogether.
So, is the Wikipedia definition right or wrong? And if it is wrong what is a better way to define functional programming?
Edit: I guess a central point in this question is what is state? Do you understand state to be variables or object attributes (mutable data) or is immutable data also state? To take an example (in F#):
let x = 3
let double n = 2 * n
let y = double x
printfn "%A" y
would you say this snippet contains a state or not?
Edit 2: Thanks to everyone for participating. I now understand the issue to be more of a linguistic discrepancy, with the use of the word state differing from one community to the other, as Brian mentions in his comment. In particular, many in the functional programming community (mainly Haskellers) interpret state to carry some state of dynamism like a signal varying with time. Other uses of state in things like finite state machine, Representational State Transfer, and stateless network protocols may mean different things.
I think you're just using the term "state" in an unusual way. If you consider adding 1 and 1 to get 2 as being stateful, then you could say functional programming embraces state. But when most people say "state," they mean storing and changing values, such that calling a function might leave things different than before the function was called, and calling the function a second time with the same input might not have the same result.
Essentially, stateless computations are things like this:
The result of 1 + 1
The string consisting of 's' prepended to "pool"
The area of a given rectangle
Stateful computations are things like this:
Increment a counter
Remove an element from the array
Set the width of a rectangle to be twice what it is now
Rather than avoids state, think of it like this:
It avoids changing state. That word "mutable".
Think in terms of a C# or Java object. Normally you would call a method on the object and you might expect it to modify its internal state as a result of that method call.
With functional programming, you still have data, but it's just passed through each function, creating output that corresponds to the input and the operation.
At least in theory. In reality, not everything you do actually works functionally so you frequently end up hiding state to make things work.
Edit:
Of course, hiding state also frequently leads to some spectacular bugs, which is why you should only use functional programming languages for purely functional situations. I've found that the best languages are the ones that are object oriented and functional, like Python or C#, giving you the best of both worlds and the freedom to move between each as necessary.
The Wikipedia definition is right. Functional programming avoids state. Functional programming means that you apply a function to a given input and get a result. However, it is guaranteed that your input will not be modified in any way. It doesn't mean that you cannot have a state at all. Monads are perfect example of that.
The Wikipedia definition is correct. It may seem puzzling at first, but if you start working with say Haskell, you'll note that you don't have any variables laying around containing values.
The state can still be sort of represented using state monads.
At some point, all apps must have state (some data that changes). Otherwise, you would start the app up, it wouldn't be able to do anything, and the app would finish.
You can have immutable data structures as part of your state, but you will swap those for other instances of other immutable data structures.
The point of immutability is not that data does not change over time. The point is that you can create pure functions that don't have side effects.

Are there any downsides to passing in an Erlang record as a function argument?

Are there any downsides to passing in an Erlang record as a function argument?
There is no downside, unless the caller function and the called function were compiled with different 'versions' of the record.
Some functions from erlangs standard library do indeed use records in their interfaces (I can't recall which ones, right now--but there are a few), but in my humble opinion, the major turnoff is, that the user will have to include your header file, just to use your function.
That seems un-erlangy to me (you don't ever do that normally, unless you're using said functions from the stdlib), creates weird inter-dependencies, and is harder to use from the shell (I wouldn't know from the top of my head how to load & use records from the shell -- I usually just "cheat" by constructing the tuple manually...)
Also, handling records is a bit different from the stuff you usually do, since their keys per default take the atom 'undefined' as value, au contraire to how you usually do it with proplists, for instance (a value that wasn't set just isn't there) -- this might cause some confusion for people who do not normally work a lot with records.
So, all-in-all, I'd usually prefer a proplist or something similar, unless I have a very good reason to use a record. I do usually use records, though, for internal state of for example a gen_server or a gen_fsm; It's somewhat easier to update that way.
I think the biggest downside is that it's not idiomatic. Have you ever seen an API that required you to construct a record and pass it in?
Why would you want to do something that's going to feel foreign to any erlang programmer? There's a convention already in use for optional named arguments to functions. Inventing yet another way without good cause is pointless.

Is there an idiomatic way to order function arguments in Erlang?

Seems like it's inconsistent in the lists module. For example, split has the number as the first argument and the list as the second, but sublists has the list as the first argument and the len as the second argument.
OK, a little history as I remember it and some principles behind my style.
As Christian has said the libraries evolved and tended to get the argument order and feel from the impulses we were getting just then. So for example the reason why element/setelement have the argument order they do is because it matches the arg/3 predicate in Prolog; logical then but not now. Often we would have the thing being worked on first, but unfortunately not always. This is often a good choice as it allows "optional" arguments to be conveniently added to the end; for example string:substr/2/3. Functions with the thing as the last argument were often influenced by functional languages with currying, for example Haskell, where it is very easy to use currying and partial evaluation to build specific functions which can then be applied to the thing. This is very noticeable in the higher order functions in lists.
The only influence we didn't have was from the OO world. :-)
Usually we at least managed to be consistent within a module, but not always. See lists again. We did try to have some consistency, so the argument order in the higher order functions in dict/sets match those of the corresponding functions in lists.
The problem was also aggravated by the fact that we, especially me, had a rather cavalier attitude to libraries. I just did not see them as a selling point for the language, so I wasn't that worried about it. "If you want a library which does something then you just write it" was my motto. This meant that my libraries were structured, just not always with the same structure. :-) That was how many of the initial libraries came about.
This, of course, creates unnecessary confusion and breaks the law of least astonishment, but we have not been able to do anything about it. Any suggestions of revising the modules have always been met with a resounding "no".
My own personal style is a usually structured, though I don't know if it conforms to any written guidelines or standards.
I generally have the thing or things I am working on as the first arguments, or at least very close to the beginning; the order depends on what feels best. If there is a global state which is chained through the whole module, which there usually is, it is placed as the last argument and given a very descriptive name like St0, St1, ... (I belong to the church of short variable names). Arguments which are chained through functions (both input and output) I try to keep the same argument order as return order. This makes it much easier to see the structure of the code. Apart from that I try to group together arguments which belong together. Also, where possible, I try to preserve the same argument order throughout a whole module.
None of this is very revolutionary, but I find if you keep a consistent style then it is one less thing to worry about and it makes your code feel better and generally be more readable. Also I will actually rewrite code if the argument order feels wrong.
A small example which may help:
fubar({f,A0,B0}, Arg2, Ch0, Arg4, St0) ->
{A1,Ch1,St1} = foo(A0, Arg2, Ch0, St0),
{B1,Ch2,St2} = bar(B0, Arg4, Ch1, St1),
Res = baz(A1, B1),
{Res,Ch2,St2}.
Here Ch is a local chained through variable while St is a more global state. Check out the code on github for LFE, especially the compiler, if you want a longer example.
This became much longer than it should have been, sorry.
P.S. I used the word thing instead of object to avoid confusion about what I was talking.
No, there is no consistently-used idiom in the sense that you mean.
However, there are some useful relevant hints that apply especially when you're going to be making deeply recursive calls. For instance, keeping whichever arguments will remain unchanged during tail calls in the same order/position in the argument list allows the virtual machine to make some very nice optimizations.

Resources