How efficient is (Guile) Scheme's reverse function - linked-list

It is very easy to use Scheme's reverse function, for example after creating a list in reverse order with (cons new-obj my-list) rather than (append my-list (list new-obj)).
However, I'm wondering how efficient that part would be. If a Scheme list is a singly linked list (as I assume) that would imply traveling through the whole list at least once to reach the last element, isn't it? And this would require reverse to somehow create the backlinks along the way?
OTOH in a doubly-linked list one would simply traverse the list from the end to the start, which would be more efficient.
My question is: if I have a situation where a program generates a list (in reverse order) and at some point processes the whole list is it worth bothering to implement that processing in a way that it can handle the list in reverse order, or is it no penalty to simply use reverse first?
This is for Guile Scheme, and Guile 1.8 to be specific (if it makes a difference).

Reversing takes O(n) time to create the n-long reversed list, so takes O(n) space as well. You have only traverse the original list once to reverse it.
If the original list isn't used anymore, it could be garbage collected (when? becomes the question) and its memory reclaimed, for the overall theoretical O(1) space cost; but garbage collecting has its own costs.
So, test, and if your lists are long and you see lots of garbage collecting going on, do what you suggested. It will more or less halve your time and space requirements (in that segment of the code base).
But if the list-building and reversing takes up 0.0001% of your overall time and space, halving that won't be such a huge improvement.
There are no back links in singly-linked lists, by the way. The reversing simply builds new cons cells as it traverses the original one, by the repeated consing.

There will be a penalty in using the built-in reverse procedure, which is O(n) in time complexity and O(1) in space complexity, and looks similar to this (and yes, we have to traverse the list until the end, creating backlinks along the way):
(define (reverse lst)
(let loop ((lst lst) (acc '()))
(if (null? lst)
acc
; notice the tail recursion!
(loop (cdr lst) (cons (car lst) acc)))))
On the other hand, I wouldn't worry too much about it, unless the lists you're operating on are humongous, and your profiler demonstrates that the reverse procedure is a bottleneck.
Just use the built-in, to simplify your code - composing existing functions is the style encouraged by functional programming. In Scheme, we try to process lists in a tail-recursive way, even if that means creating a list in reverse and we need to call reverse at the end, it's totally acceptable.

Imagine the simple copy function:
(define (copy1 lst)
(let loop ((lst lst) (acc '()))
(if (null? lst)
(reverse acc)
(loop (cdr lst) (cons (car lst) acc)))))
Versus one where you use append:
(define (copy2 lst)
(let loop ((lst lst) (acc '()))
(if (null? lst)
acc
(loop (cdr lst) (append acc (list (car lst)))))))
Both append and reverse is O(n), which means they iterate the whole list. The difference between the two versions of copy is that the version with reverse does all elements once at the end while the append version calls append for every element. That makes the last version O(n*n) and much more inefficient than the first which is O(n). With a large list you will quickly notice that difference.
Now to answer your question. You should work with a data type optimized for your task. If you always are making a list where you are appending to the end then do all the work on the list in reverse and wait for the reverse until you need it. You can even make abstractions around it.

Related

Using a Closure instead of a Global Variable

This question is a continuation of the comments at Using Local Special Variables, regarding how best to avoid global variables. As I understand it, global variables are problematic mainly because they have the potential to interfere with referential transparency. Transparency is violated if an expression changes a global value using information outside its calling context (eg, a previous value of the global variable itself, or any other external values). In these cases evaluating the expression may have different results at different times, either in the value(s) returned or in side effects. (However, it seems not all global updates are problematic, since some updates may not depend on any external information--eg, resetting a global counter to 0). The normal global approach for a deeply embedded counter might look like:
* (defparameter *x* 0)
*X*
* (defun foo ()
(incf *x*))
FOO
* (defun bar ()
(foo))
BAR
* (bar)
1
* *x*
1
This would seem to violate referential transparency because (incf *x*) depends on the external (global) value of *x* to do its work. The following is an attempt to maintain both functionality and referential transparency by eliminating the global variable, but I'm not convinced that it really does:
* (let ((x 0))
(defun inc-x () (incf x))
(defun reset-x () (setf x 0))
(defun get-x () x))
GET-X
* (defun bar ()
(inc-x))
BAR
* (defun foo ()
(bar))
FOO
* (get-x)
0
* (foo)
1
* (get-x)
1
The global variable is now gone, but it still seems like the expression (inc-x) has a (latent) side effect, and it will return different (but unused) values each time it is called. Does this confirm that using a closure on the variable in question does not solve the transparency problem?
global variables are problematic mainly because they have the potential to interfere with referential transparency
If one wants to create a global configuration value, a global variable in Common Lisp is just fine.
Often it's desirable to package a bunch of configuration state and then it may be better to put that into an object.
There is no general requirement for procedures to be referential transparent.
It's useful to guide software design by software engineering principles, but often easy debugging and maintenance is more important than strict principles.
(let ((x 0))
(defun inc-x () (incf x))
(defun reset-x () (setf x 0))
(defun get-x () x))
Practically above means that it
is difficult to inspect
has problematic effects of reloading the code
prohibits the file compiler to recognize the top-level nature of the functions
creates a whole API for just managing a single variable
Referential transparency means that if you bind some variable x to an expression e, you can replace all occurrences of x by e without changing the outcome. For example:
(let ((e (* pi 2)))
(list (cos e) (sin e)))
The above could be written:
(list (cos (* pi 2))
(sin (* pi 2)))
The resulting value is equivalent to the first one for some useful definition of equivalence (here equalp, but you could choose another one). Contrast this with:
(let ((e (random))
(list e e))
Here above, each call to random gives a different result (statistically), and thus the behaviour is different if you reuse the same result multiple times or generate a new after each call.
Special variables are like additional arguments to functions, they can influence the outcome of a result simply by being bound to different values. Consider *default-pathname-defaults*, which is used to build pathnames.
In fact, for a given binding of that variable, each call to (merge-pathnames "foo") returns the same result. The result changes only if you use the same expression in different dynamical context, which is no different than calling a function with different arguments.
The main difficulty is that the special variable is hidden, i.e. you might not know that it influences some expressions, and that's why you need them documented and limited in number.
What breaks referential transparency is the presence of side-effects, whether you are using lexical or special variables. In both cases, a place is modified as part of the execution of the function, which means that you need to consider when and how often you call it.
You could have better suggestions if you explained a bit more how your code is organized. You said that you have many special variables due to prototyping but in the refactoring you want to do it seems as-if you want to keep to prototypal code mostly untouched. Maybe there is a way to pack things in a nice modular way but we can't help without knowing more about why you need many special variables, etc.
That code isn't referentially transparent. It is an improvement from special variables though.
The code you put would be a functional nonce if you dropped the reset-x.
My answer to your previous question had general guidelines about special variables. For your specific case, perhaps they are worth it? I could see the case for using special variables as a nonce, for example, where it is probably silly to pass them around.
Common Lisp has so many facilities for dealing with global information, so there is rarely a need for having lots of global variables. You could define an *env* alist to store your values in, or put them in a hash table, or put them into symbol plists, or package them in a closure to pass around, or do something else, or use CLOS.
Where is the side effect of the second example ? The x inside the let isn't accessible from the outside.
Here's another closure example, with top-level functions, and a counter explicitly inside it.
(defun repeater (n)
(let ((counter -1))
(lambda ()
(if (< counter n)
(incf counter)
(setf counter 0)))))
(defparameter *my-repeater* (repeater 3))
;; *MY-REPEATER*
(funcall *my-repeater*)
0
(funcall *my-repeater*)
1
https://lispcookbook.github.io/cl-cookbook/functions.html#closures

Fully evaluated results in Z3?

Z3 often gives back models defined in terms of a bunch of intermediate functions. For example, it's common to see the following (pardon my improper syntax):
(define-const myArray (Array Bool Int) (_ as-array f))
(define-fun f (x Bool) Int (f!10 (k!26 x)))
... And so on.
I'd like to be able to get a result back that I can take in my program (calling Z3 using library bindings) and both print the results, and parse them into a function I can actually run. This is much easier if I can get my model functions as single, straight line programs that I can run, instead of as multiple functions defined in terms of each other.
Is this possible? I'm dealing only with finite domain functions, if that helps.
We will be updating the model construction in a future release to compress away intermediate functions when possible. There are, however, cases where this can cause an exponential overhead because the same auxiliary function can be reused in several contexts. For those models, it doesn't make sense to expand the auxiliary functions. So users are still going to be forced to deal with such functions if they want to post-process the models.

Why does F#'s Seq.windowed return seq of array

Seq.windowed in F# returns a sequence where each window within is an array. Is there a reason why each window is returned as an array (a very concrete type) as opposed to say, another sequence or IList<'T>? An IList<'T>, for example, would be sufficient if the purpose was to communicate that the items of the window can be randomly accessed but an array says two things: elements are mutable and randomly accessible. If you can rationalise the choice of array, how is windowed different from Seq.groupBy? Why does that latter (or operators in the same vein) not also return the members of a group as an array?
I'm wondering if this is simply a design oversight or is there a deeper, contractual reason for an array?
I do not know what is the design principle behind this. I suppose it might just be an accidental aspect of the implementation - Seq.windowed can be quite easily implemented by storing items in arrays, while Seq.groupBy probably needs to use some more complicated structure.
In general, I think that F# APIs either use 'T[] if using array is the natural efficient implementation, or return seq<'T> when the data source may be infinite, lazy, or when the implementation would have to convert the data to an array explicitly (then this can be left to the caller).
For Seq.windowed, I think that array makes a good sense, because you know the length of the array and so you are likely to use indexing. For example, assuming that prices is a sequence of date-price tuples (seq<DateTime * float>) you can write:
prices
|> Seq.windowed 5
|> Seq.map (fun win -> fst (win.[2]), Seq.averageBy snd win)
The sample calculates floating average and uses indexing to get the date in the middle.
In summary, I do not really have a good explanation for the design rationale, but I'm quite happy with the choices made - they seem to work really well with the usual use cases for the functions.
A couple of thoughts.
First, know that in their current version, both Seq.windowed and Seq.groupBy use non-lazy collections in their implementation. windowed uses arrays and returns you arrays. groupBy builds up a Dictionary<'tkey, ResizeArray<'tvalue>>, but keeps that a secret and returns the group values back as a seq instead of ResizeArray.
Returning ResizeArrays from groupBy wouldn't fit with anything else, so that obviously needs to be hidden. The other alternative is to give back the ToArray() of the data. This would require another copy of the data to be created, which is a downside. And there isn't really much upside, since you don't know ahead of time how big your group will be anyways, so you aren't expecting to do random access or any of the other special stuff arrays enable. So simply wrapping in a seq seems like a good option.
For windowed, it's a different story. You want an array back in this case. Why? Because you already know how big that array is going to be, so you can safely do random access or, better yet, pattern matching. That's a big upside. The downside remains, though - the data needs to be re-copied into a newly-allocated array for every window.
seq{1 .. 100} |> Seq.windowed 3 |> Seq.map (fun [|x; _; y|] -> x + y)
There is still the open question - "but couldn't we avoid the array allocation/copy downside by internally only using true lazy seqs, and returning them as such? Isn't that more in the 'spirit of the seq'?" It would be kind of tricky (would need some fancy cloning of enumerators?), but sure, probably with some careful coding. There's a huge downside to this, though. You'd need to cache the entire unspooled seq in memory to make it work, which kind of negates the whole goal of doing things lazily. Unlike lists or arrays, enumerating a seq multiple times is not guaranteed to yield the same results (e.g. a seq that returns random numbers), so the backing data for these seq windows you are returning needs to be cached somewhere. When that window is eventually accessed, you can't just tap in and re-enumerate through the original source seq - you might get back different data, or the seq might end at a different place. This points to the other upside of using arrays in Seq.windowed - only windowSize elements need to be kept in memory at once.
This is of course pure guess. I think this is related to the way both functions are implemented.
As already mentioned, in Seq.groupBy the groups are of variable length and in Seq.windowed they are of a fixed size.
So in the implementation from Seq.windowed it makes more sense to use a fixed size array, as opposed to the Generic.List used in Seq.groupBy, which btw in F# is called ResizeArray.
Now to the outside world, an Array though mutable is widely used in F# code and libraries and F# provides syntactic support for creating, initializing and manipulating arrays, whereas the ResizeArray is not that widely used in F# code and the language provides no syntactic support apart from the type alias, so I think that's why they decided to expose it as a Seq.

Charting with F# does not take in Seq?

Following this from the Real-World Functional Programming blog about drawing bar and column charts, I was trying to draw a histogram for my data which is stored at a set of tuples (data_value, frequency) in a lazy sequence.
It does not work unless I convert the sequence into a List, the error message being that in case of sequence "the IEnumerable 'T does not support the Reset function". Is there any way to generate a histogram/chart etc. using the .NET library from a lazily-evaluated sequence?
Also (ok newbie query alert), is there any way to make the chart persist when the program is run from the console? The usual System.Console.ReadKey() |> ignore makes the chart window hang, and otherwise it disappears in an instant. I've been using "Send to Interactive" to see results til now.
The problem is that sequences (of type seq<T>, which is just an alias for IEnumerable<T>) generated using the F# sequence expression notation do not support the Reset method. The method is required by the charting library (because it needs to obtain the data each time it redraws the screen).
This means that, for example, the following will not work:
seq { for x in data -> x } |> FSharpChart.Line
Many of the standard library functions from the Seq module are implemented using sequence expressions, so the result will not support Reset. You can fix that by converting the data to a list (using List.ofSeq) or to an array (using Array.ofSeq) or by writing the code using lists:
[ for x in data -> x ] |> FSharpChart.Line
... and if you're using some function, you can take the one from List (not all of the Seq functions are available for List, so sometimes you will need to use conversion):
[ for x in data -> x ] |> List.choose op |> FSharpChart.Line
No, it does not accept sequences.
That said, there is a good reason for not supporting seq. it is about the structure itself : A seq is just that, a seq, and therefore does not and should not, support the kind of operations needed in drawing a graph. That said, I really wish this stack was more advanced and supported more use style.
So the answer is to
|> Seq.toArray
or
|> Seq.toList
before sending to the chart library

Erlang: Can this be done without lists:reverse?

I am a beginner learning Erlang. After reading about list comprehensions and recursion in Erlang, I wanted to try to implement my own map function, which turned out like this:
% Map: Map all elements in a list by a function
map(List,Fun) -> map(List,Fun,[]).
map([],_,Acc) -> lists:reverse(Acc);
map([H|T],Fun,Acc) -> map(T,Fun,[Fun(H)|Acc]).
My question is: It feels wrong to build up a list through the recursive function, and then reverse it at the end. Is there any way to build up the list in the right order, so we don't need the reverse ?
In oder to understand why accumulating and reversing is quite fast you have to understand how lists are build in Erlang. Erlangs lists like those in Lisp are build out of cons cells (look at the picture in the link).
In a singly linked list like the Erlang lists it is very cheap to prepend a element (or a short list). This is was the List = [H|T] construct does.
Reversing a singly linked list made of cons cells is quite fast since you only need one pass along the list, just prepending the next element to you already reversed partial result. And as was already mentioned, it is also implemented in C in Erlang.
Also building a result in reverse order often can be accomplished by a tail recursive function which means that no stack is built up and (in old versions of Erlang only!) therefore some memory can be saved.
Having said all this: it is one of The Eight Myths of Erlang Performance that it is always better to build in reverse in a tail recursive function and call lists:reverse/1 at the end.
Here is a body recursive version without lists:reverse/1 this would use more memory on Erlang versions before 12 but not so on current versions:
map([H|T], Fun) ->
[ Fun(H) | map(T,Fun) ];
map([],_) -> [].
And here is a version of map using list comprehensions:
map(List, Fun) ->
[ Fun(X) || X <- List ].
As you can see this is so simple because map is just a built in part of the list comprehension, so you would use the list comprehension directly and have no need for map anymore.
As an extra a pure Erlang implementation that shows how reversing a list of cons cells wors efficiently (in Erlang its always faster to call lists:reverse/1 because it is in C, but does the same thing).
reverse(List) ->
reverse(List, []).
reverse([H|T], Acc) ->
reverse(T, [H|Acc]);
reverse([], Acc) ->
Acc.
As you can see there are only [A|B] operations on the list, taking cons cells apart (when pattern matching) and building new when doing the [H|Acc].
Such construction [ Element | Acc] and then lists:reverse(Acc) is quite common pattern in functional programming.
You can write code with an accumulator storing data in correct order, but this code will be slower than the one in your question (hint: use Acc ++ [Fun(H)]).
Actually within Erlang lists:reverse is implemented in C and it works quite fast.

Resources