SICP Streams (Scheme) - CDR into the implicit stream of ones - stream

(define ones (cons-stream 1 ones))
(stream-cdr ones)
returns an infinite sequence of evaluated 1s - i.e. I get
;Value: #0={1 1 1 1 1 1 1 and so on... - not the symbolic {1 1 ...} I would expect ...
On the other end if I define ints and cdr into it,
(define ints (cons-stream 1 (stream-map + ones ints)))
(stream-cdr ints)
I obtain the expected {1 2 ...}
Can anyone explain me why? I expect the stream-map definition to be not too far from
(define (mystream-map proc . argstreams)
(if (stream-null? (car argstreams))
the-empty-stream
(cons-stream
(apply proc (map stream-car argstreams))
(apply mystream-map (cons proc (map stream-cdr argstreams))))))
which I defined in ex 3.50 and returns the same result ... but this seems to (via (map stream-cdr argstreams)) stream-cdr into ones, which I would expect to result in the infinite sequence I get above!
Even though, if I understand correctly, due to cons-stream being a macro the second part inside the stream-map will only be lazily evaluated, at some point while I cdr into the ints stream I would expect to have to cdr into ones as well.. which instead doesn't seem to be happening! :/
Any help in understanding this would be very much appreciated!
(I also didn't override any scheme symbol and I'm running everything in MIT Scheme - Release 11.2 on OS X.)

(define ones (cons-stream 1 ones))
(stream-cdr ones)
returns an infinite sequence of evaluated ones.
No it does not. (stream-cdr ones) = (stream-cdr (cons-stream 1 ones)) = ones. It is just one ones, not the sequence of ones.
And what is ones?
Its stream-car is 1, and its stream-cdr is ones.
Whose stream-car is 1, and stream-cdr is ones again.
And so if you (stream-take n ones) (with an obvious implementation of stream-take) you will receive an n-long list of 1s, whatever the (non-negative) n is.
In other words, repeated cdring into ones produces an unbounded sequence of 1s. Or symbolically, {1 1 1 ...} indeed.
After your edit it becomes clear the question is about REPL behavior. What you're seeing most probably has to do with the difference between sharing and recalculation, just like in letrec with the true reuse (achieved through self-referring binding) vs. the Y combinator with the recalculation ((Y g) == g (Y g)).
You probably can see the behavior you're expecting, with the re-calculating definition (define (onesF) (cons-stream 1 (onesF))).
I only have Racket, where
(define ones (cons-stream 1 ones))
(define (onesF) (cons-stream 1 (onesF)))
(define ints (cons-stream 1 (stream-map + ones ints)))
(stream-cdr ones)
(stream-cdr (onesF))
(stream-cdr ints)
prints
(mcons 1 #<promise>)
(mcons 1 #<promise>)
(mcons 2 #<promise>)
Furthermore, having defined
(define (take n s)
(if (<= n 1)
(if (= n 1) ; prevent an excessive `force`
(list (stream-car s))
'())
(cons (stream-car s)
(take (- n 1)
(stream-cdr s)))))
we get
> (display (take 5 ones))
(1 1 1 1 1)
> (display (take 5 (onesF)))
(1 1 1 1 1)
> (display (take 5 ints))
(1 2 3 4 5)

Related

SICP Chapter 3.5.2 infinite streams integers definition

I am reading SICP and am having difficulty understanding one example provided for infinite streams:
https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5.2
We can do more interesting things by manipulating streams with operations such as add-streams, which produces the elementwise sum of two given streams:62
(define (add-streams s1 s2)
(stream-map + s1 s2))
Now we can define the integers as follows:
(define integers (cons-stream 1 (add-streams ones integers)))
I can obviously understand the intent behind the integers definition but I'm struggling to "simulate" this stream in my head. Prior examples haven't been an issue because the maintenance of state has been explicit. Such as in this example:
(define (integers-starting-from n)
(cons-stream n (integers-starting-from (+ n 1))))
(define integers (integers-starting-from 1))
I have no problem understanding this definition of integers.
The book describes the definition of ones:
(define ones (cons-stream 1 ones))
This works much like the definition of a recursive procedure: ones is a pair whose car is 1 and whose cdr is a promise to evaluate ones. Evaluating the cdr gives us again a 1 and a promise to evaluate ones, and so on.
Perhaps this line is throwing me off. Ones is simple, because on each stream-cdr the procedure is evaluated and a new "1" is provided and the next promise.
When I try to apply this reasoning to integers, I'm struggling to understand why the resultant stream isn't "1 2 2 2 2 2 ..." as integers gets continually re-evaluated and essentially restarting at 1.
edit
I was remiss in not elaborating on whether or not memoization was to be assumed in my question. SICP does indeed mention the concern of quadratic behavior raised in the answers and provides a solution in the form of a memoizing delay function:
(define (memo-proc proc)
(let ((already-run? false) (result false))
(lambda ()
(if (not already-run?)
(begin (set! result (proc))
(set! already-run? true)
result)
result))))
Delay is then defined so that (delay ) is equivalent to
(memo-proc (lambda () <exp>))
If our streams are memoized then the integers passed as an argument to add-streams is always "one step behind" the integers that we are enumerating, so it always has access to the memoized value. With numbers in (parens) showing the use of memoized values:
Integers: 1, add-streams / Ones: 1, 1, 1, 1, 1, 1, ...
\Integers: 1, (2), (3), (4), (5), (6), ...
=== === === === === ===
Results: 1 2, 3, 4, 5, 6, 7, ...
Memoized: 2, 3, 4, 5, 6,
If our streams are not memoized then each time stream-cdr is called on integers a new series of ones is created and added to all the previous ones.
integers 1
ones 1, 1, 1, 1, 1, 1, ...
integers 1
ones 1, 1, 1, 1, 1, ...
integers 1
ones 1, 1, 1, 1, ...
integers 1
ones 1, 1, 1, ...
integers 1
ones 1, 1, ...
integers 1
ones 1, ...
integers 1
== == == == == == ==
1, 2, 3, 4, 5, 6, 7, ...
So 100 is generated by adding an element of ones 99 times and the stream-car of the integers that is the result of the previous 99 calls to integers.
Although the first add-streams is only combining two streams, the second stream will (after returning 1) be returning results from a new add-streams, the second stream of which will be will be the result of another add-streams:
1, add-streams / 1, 1, 1, ...
\ 1, add-streams / 1, 1, ...
\ 1, add-streams / 1, ...
\ 1, add-streams ...
So add-streams, a bit like using cons to create a list, is creating pairs of streams where the first is a stream of ones and the second is another pair of streams.
Without memoizing this is not a practical implementation of integers because its performance is O(n^2):
Time to Access Elements
Element of CPU Time
integers (msec)
========== ========
1 st 0
2 nd 0
4 th 0
8 th 0
16 th 0
32 nd 47
64 th 78
128 th 313
256 th 1,171
512 th 4,500
1,024 th 17,688
2,048 th 66,609
4,096 th 272,531
With the simplest, non-memoizing streams implementation, we get:
(define (stream-map2 f s1 s2)
(cons (f (car s1) (car s2))
(lambda ()
(stream-map2 f ((cdr s1)) ((cdr s2))))))
(define ones (cons 1 (lambda () ones)))
(define integers
(cons 1
(lambda ()
(stream-map2 + ones integers))) ;; 1
=
(cons 1
(lambda ()
(cons (+ (car ones) (car integers))
(lambda ()
(stream-map2 + ones
(stream-map2 + ones integers)))))) ;; 2
=
(cons 1
(lambda ()
(cons (+ (car ones) (car integers))
(lambda ()
(let ((i2 (stream-map2 + ones integers)))
(stream-map2 + ones i2))))))
i.e.
=
(cons 1
(lambda ()
(cons (+ (car ones) (car integers))
(lambda ()
(let ((i2 (cons (+ (car ones) (car integers)) ;; <---- 1
(lambda ()
(stream-map2 + ones
(stream-map2 + ones integers))))))
(cons (+ (car ones) (car i2))
(lambda ()
(stream-map2 + ones ((cdr i2))))))))))
=
(cons 1
(lambda ()
(cons (+ (car ones) (car integers))
(lambda ()
(cons (+ (car ones)
(+ (car ones) (car integers)))
(lambda ()
(stream-map2 + ones
(stream-map2 + ones
(stream-map2 + ones integers))))))))) ;; 3
=
....
Indeed we see a triangular, quadratic computation unfolding here.
See the foot note 56.
cons-stream must be a special form. If cons-stream were a procedure, then, according to our model of evaluation, evaluating (cons-stream <a> <b>) would automatically cause <b> to be evaluated, which is precisely what we do not want to happen.
The piece I was missing here was that integers isn't being re-evaluated at all.
The promise that add-streams returns is the stream-cdr of each of the input streams.
The "state" previously referred to is maintained in the feedback loop.
Its quite mind-bending and to be honest still seems almost magical in it's power.

Scheme: problem about display when using `delay` expression

This is a problem related to ex3.51 in SICP, here is the code
(define (cons-stream x y)
(cons x (delay y)))
(define (stream-car stream) (car stream))
(define (stream-cdr stream) (force (cdr stream)))
(define (stream-map proc s)
(if (stream-null? s)
the-empty-stream
(cons-stream
(proc (stream-car s))
(stream-map proc (stream-cdr s)))))
(define (stream-enumerate-interval low high)
(if (> low high)
the-empty-stream
(cons-stream
low
(stream-enumerate-interval (+ low 1) high))))
(define (stream-ref s n)
(if (= n 0)
(stream-car s)
(stream-ref (stream-cdr s) (- n 1))))
(define (show x)
(display x)
x)
;test
(stream-map show (stream-enumerate-interval 0 10))
the output is 012345678910(0 . #<promise>).
but I thought the delay expression in cons-stream delayed the evaluation, if i use a different processing function in stream-map like lambda (x) (+ x 1) the output (1 . #<promise>) is more reasonable, so why does display print all the numbers?
The problem is with this definition:
(define (cons-stream x y)
(cons x (delay y)))
It defines cons-stream as a function, since it uses define.
Scheme's evaluation is eager: the arguments are evaluated before the function body is entered. Thus y is already fully calculated when it is passed to delay.
Instead, cons-stream should be defined as a macro, like
(define-syntax cons-stream
(syntax-rules ()
((_ a b) (cons a (delay b)))))
or we can call delay explicitly, manually, like e.g.
(define (stream-map proc s)
(if (stream-null? s)
the-empty-stream
(cons
(proc (stream-car s))
(delay
(stream-map proc (stream-cdr s))))))
Then there'd be no calls to cons-stream in our code, only the (cons A (delay B)) calls. And delay is a macro (or special form, whatever), it does not evaluate its arguments before working but rather goes straight to manipulating the argument expressions instead.
And we could even drop the calls to delay, and replace (cons A (delay B)) with (cons A (lambda () B)). This would entail also reimplementing force (which is built-in, and goes together with the built-in delay) as simply (define (force x) (x)) or just calling the (x) manually where appropriate to force a stream's tail.
You can see such lambda-based streams code towards the end of this answer, or an ideone entry (for this RosettaCode entry) without any macros using the explicit lambdas instead. This approach can change the performance of the code though, as delay is memoizing but lambda-based streams are not. The difference will be seen if we ever try to access a stream's element more than once.
See also this answer for yet another take on streams implementation, surgically modifying list's last cons cell as a memoizing force.

SICP Infinite Streams (Chapter 3.5.2)

This is a question related to the SICP Book Chapter 3.5.2.
I'm implementing a stream data structure in other programming languages. And I'm not sure if I understand the following snippet correctly.
(define (integers-starting-from n)
(cons-stream n (integers-starting-from (+ n 1))))
(define integers (integers-starting-from 1))
From what I understood at (integers-starting-from (+ n 1)) will execute the function which return a value by executing(cons-stream n (integers-starting-from (+ n 1)))). Because the second formal parameter of the cons-stream is (integers-starting-from (+ n 1)), and because it is enclosed by( ), thus it will execute the function again and again infinitely instead of delaying the execution.
From what I see before executing this snippet, It seems that the following integer will leads to an infinite recursive before even the seconds element of the stream being executed.
Why does this seems to work for scheme as shown during the lecture?
From my understand it should be written something like this instead:
(define (integers-starting-from n)
(cons-stream n (lambda() (integers-starting-from (+ n 1)))))
(define integers (integers-starting-from 1))
Does this means that scheme has some kinds of magic that delay the execution of (integers-starting-from (+ n 1)) ?
Thank you in advance
The trick lies in how we implement cons-stream. You explicitly created an evaluation promise when you defined the (lambda () ...) thunk. The special form cons-stream does this, but implicitly and using Scheme's primitives. For example, it can be implemented like this, notice how we use delay:
(define-syntax stream-cons
(syntax-rules ()
((stream-cons head tail)
(cons head (delay tail)))))
It makes more sense to encapsulate all the promise-creation logic in a single place, say cons-stream, instead of explicitly creating thunks everywhere.

How to limit memory use when using a stream

I'm working with a data set that is too big to fit into memory and therefore would like to use a stream to process the data. However, I'm finding that the stream isn't limiting the amount of memory used as I expected. The sample code illustrates the problem and will run out of memory if the memory limit is set to 128mb. I know I could up the memory limit, but with the sort of big data set that I want to use this won't be an option. How should I keep the memory use down?
#lang racket
(struct prospects (src pos num max-num)
#:methods gen:stream
[(define (stream-empty? stream)
(equal? (prospects-num stream) (prospects-max-num stream)))
;
(define (stream-first stream)
(list-ref (prospects-src stream) (prospects-pos stream)))
;
(define (stream-rest stream)
(let ([next-pos (add1 (prospects-pos stream))]
[src (prospects-src stream)]
[next-num (add1 (prospects-num stream))]
[max-num (prospects-max-num stream)])
(if (< next-pos (length src))
(prospects src next-pos next-num max-num)
(prospects src 0 next-num max-num))))])
(define (make-prospects src num)
(prospects src 0 0 num))
(define (calc-stats prospects)
(let ([just-a (stream-filter
(λ (p) (equal? "a" (vector-ref p 0)))
prospects)]
[just-b (stream-filter
(λ (p) (equal? "b" (vector-ref p 0)))
prospects)])
;
(let ([num-a (stream-length just-a)]
[num-b (stream-length just-b)]
[sum-ref1-a (for/sum ([p (in-stream just-a)])
(vector-ref p 1))]
[sum-ref1-b (for/sum ([p (in-stream just-b)])
(vector-ref p 1))])
;
#|
; Have also tried with stream-fold instead of for/sum as below:
[sum-ref1-a (stream-fold
(λ (acc p) (+ acc (vector-ref p 1)))
0 just-a)]
[sum-ref1-b (stream-fold
(λ (acc p) (+ acc (vector-ref p 1)))
0 just-b)])
|#
;
(list num-a num-b sum-ref1-a sum-ref1-b))))
;================================
; Main
;================================
(define num-prospects 800000)
(define raw-prospects (list #("a" 2 2 5 4 5 6 2 4 2 45 6 2 4 5 6 3 4 5 2)
#("b" 1 3 5 2 4 3 2 4 5 34 3 4 5 3 2 4 5 6 3)))
(calc-stats (make-prospects raw-prospects num-prospects))
Note: This program was created just to demonstrate the problem; the real stream would access a database to bring in the data.
The main problem in your code was that you were trying to make multiple passes through the stream. (Each call to stream-length is one pass, and each of your calls to for/sum (or stream-fold, for that matter) is another pass.) This means that you had to materialise the whole stream without allowing the earlier stream elements to be garbage-collected.
Here's a modification of your code to make only one pass. Note that I made num-prospects to be 8,000,000 in my version, since even the multi-pass version didn't run out of memory on my system with only 800,000:
#lang racket
(require srfi/41)
(define (make-prospects src num)
(stream-take num (apply stream-constant src)))
(define (calc-stats prospects)
(define default (const '(0 . 0)))
(define ht (for/fold ([ht #hash()])
([p (in-stream prospects)])
(hash-update ht (vector-ref p 0)
(λ (v)
(cons (add1 (car v))
(+ (cdr v) (vector-ref p 1))))
default)))
(define stats-a (hash-ref ht "a" default))
(define stats-b (hash-ref ht "b" default))
(list (car stats-a) (car stats-b) (cdr stats-a) (cdr stats-b)))
;================================
; Main
;================================
(define num-prospects 8000000)
(define raw-prospects '(#("a" 2 2 5 4 5 6 2 4 2 45 6 2 4 5 6 3 4 5 2)
#("b" 1 3 5 2 4 3 2 4 5 34 3 4 5 3 2 4 5 6 3)))
(calc-stats (make-prospects raw-prospects num-prospects))
I should clarify that the use of srfi/41 is simply to enable writing a more-efficient version of make-prospects (though, the reference implementation of stream-constant isn't very efficient, but still more efficient than what your prospects stream generator did); calc-stats doesn't use it.
I upvoted Chris' excellent answer and suggest you pick it to mark as accepted.
However, what would use the least memory for a data set that doesn't fit into RAM? Probably something like the following pseudo code:
(require db)
(define dbc <open a db connection>)
(define just-a (query-value dbc "SELECT Count(*) FROM t WHERE n = $1" "a"))
(define just-b (query-value dbc "SELECT Count(*) FROM t WHERE n = $1" "b"))
Why this is a somewhat smartypants answer:
Ostensibly you asked about using Racket streams to handle things that don't fit in memory.
If you need more complex aggregations than count (or sum/min/max), you'll need to write more complex SQL queries, and probably want to make them stored procedures on the server.
Why it isn't necessarily smartypants:
You did mention your real use case involved a database. ;)
A speciality of a DB server and SQL is large data sets that don't fit in memory. Taking advantage of the server often will beat something DB-ish re-implemented in a general-purpose language (as well as probably being friendlier to other uses/users of the same DB server).
You are running out of memory because you are hanging onto the head of the stream while traversing it. The GC can't collect anything because since you have a pointer to the head, every element of the stream is still reachable.
To demonstrate, with this stream:
(define strm (make-prospects raw-prospects num-prospects))
this blows up:
(define just-a (stream-filter (λ (p) (equal? "a" (vector-ref p 0))) strm))
(stream-length just-a)
while this is fine:
(stream-length (stream-filter (λ (p) (equal? "a" (vector-ref p 0))) strm))

Infinite ascending sequence in Racket

Is there an analog of Python's itertools.count in Racket? I want to create an infinite stream of evenly spaced numbers. in-naturals is similar to what i want, but does not provide step. I'd want not to reinvent the wheel, but if there's no equivalent function, how to write one? (i presume, generators should be used)
You can get the same functionality of Python's count using in-range with an infinite end value:
(define (count start step)
(in-range start +inf.0 step))
For example:
(define s (count 2.5 0.5))
(stream-ref s 0)
=> 2.5
(stream-ref s 1)
=> 3.0
(stream-ref s 2)
=> 3.5
(stream-ref s 3)
=> 4.0
Making the function yourself can be done in a single line:
(define (stream-from n s) (stream-cons n (stream-from (+ n s) s)))
To test it, you here is an example that prints 100000 numbers:
#lang racket
(require racket/stream)
(define (stream-from n s) (stream-cons n (stream-from (+ n s) s)))
(define (stream-while s p)
(let ([fst (stream-first s)])
(if (p fst) (stream-cons fst (stream-while (stream-rest s) p)) empty-stream)))
(define test (stream-while (stream-from 0 1) (λ (x) (< x 100000))))
(stream-for-each println test)

Resources