GHC compiled binary stack overflow?

I'm reading Real World Haskell, Chapter 8, and wanted to see how the SumFile.hs program handles, say, 1 million numbers:
main :: IO ()
main = do
    contents <- getContents
    print (sumFile contents)
  where sumFile = sum . map read . words
When I feed 1 million integers to the program with:
runhaskell SumFile.hs < data.txt, the program gives a correct result.
However, when I compiled it using GHC:
ghc SumFile.hs
The binary gives a "Stack space overflow" error:
./SumFile < data.txt
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
I have two questions:
What is causing the stack space usage?
Why does the compiled version differ from the interpreted version and what can I do?
Thanks!
EDIT:
Alright, the reason is map, but here's a modified version that uses a lazy ByteString:
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LCHAR
import Data.Monoid
import Data.List
main :: IO ()
main = do
    contents <- L.getContents
    case sumFile contents of
        Nothing -> print "Invalid input"
        Just s -> print $ getSum s
  where sumFile = foldl' mappend (Just (Sum 0)) . map ((fmap Sum) . (fmap fst) . LCHAR.readInt) . (LCHAR.words)
The result is the same: the compiled binary still runs out of stack space even though I'm not using sum.

I had a discussion with folks on #haskell. The ByteString version overflows the stack because of the nested Just (Sum n): the inner part is not evaluated strictly.
Essentially, when we mappend two such values, say Just (Sum 2) and Just (Sum 3), foldl' uses seq to produce Just (Sum 2 `mappend` Sum 3). That is, seq evaluates only the outermost constructor (the Just), so the inner Sum is left as an unevaluated chain of mappend applications. After 1 million elements, this is a chain of 1 million pending mappend calls wrapped in a single Just.
Saizan on #haskell gave this version, which strictly evaluates the inner part of the Maybe (Sum n):
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LCHAR
import Data.Monoid
import Data.List
forceMaybe Nothing = Nothing
forceMaybe (Just x) = x `seq` (Just x)
main :: IO ()
main = do
    contents <- L.getContents
    case sumFile contents of
        Nothing -> print "Invalid input"
        Just s -> print $ getSum s
  where sumFile = foldl' (\ x y -> forceMaybe (x `mappend` y)) (Just (Sum 0)) . map ((fmap Sum) . (fmap fst) . LCHAR.readInt) . (LCHAR.words)
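An alternative sketch that avoids the Maybe (Sum n) accumulator altogether: keep a plain Int accumulator and force it with a bang pattern at every step. The name sumInts and the explicit go loop are assumptions made for illustration, not something from the #haskell discussion. This sketch returns Nothing as soon as a token fails to parse, which is a choice made here rather than a property of the code above.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as LCHAR

-- Sum whitespace-separated integers with a strict Int accumulator.
-- sumInts is a hypothetical name used only for this sketch.
sumInts :: LCHAR.ByteString -> Maybe Int
sumInts = go 0 . LCHAR.words
  where
    -- The bang pattern forces acc on every call, so no chain of
    -- pending additions can build up.
    go !acc []       = Just acc
    go !acc (w : ws) = case LCHAR.readInt w of
        Just (n, _) -> go (acc + n) ws
        Nothing     -> Nothing

main :: IO ()
main = do
    contents <- LCHAR.getContents
    case sumInts contents of
        Nothing -> putStrLn "Invalid input"
        Just s  -> print s
```

The accumulator is an unwrapped Int here, so forcing it to weak head normal form is enough to evaluate it fully.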

First, a simple clarification: the stack in the GHC runtime has nothing to do with the process stack segment. It is an internal structure of the runtime, and it is not a source of buffer-overflow type attacks.
Second, Haskell is lazy. Lazy IO (getContents) produces a lazy list, and sum produces its result lazily. However, once the result of sum is requested, it has to dig into the list recursively, quickly exhausting the stack space (you can look in the sources if you wish).
To avoid this, use a strict version of sum. The standard library has a function for such cases: foldl', a strict version of foldl. Using foldl' (+) 0 in place of sum should eliminate the problem.
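A minimal sketch of that suggestion, assuming the input is whitespace-separated integers. Fixing the element type to Integer is an assumption of this sketch; the original program leaves it to type defaulting.

```haskell
import Data.List (foldl')

-- foldl' forces the running total at every step, so no chain of
-- unevaluated additions is built while the list is consumed.
sumFile :: String -> Integer
sumFile = foldl' (+) 0 . map read . words

main :: IO ()
main = do
    contents <- getContents
    print (sumFile contents)
```

It is used the same way as the original program, for example ./SumFile < data.txt after compiling with ghc SumFile.hs.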
Third, stack space leaks are a very common problem when one uses lazy IO. They may be solved by switching to iteratee-based IO. Otherwise, one should learn to add strictness annotations where needed.
And by the way, GHC is an optimizing compiler. It is not common, but it is possible for a compiled program to have memory problems that do not show up in GHCi, and vice versa.

I checked the online version of the book. There is some discussion under that program: the reason it uses stack space is map, and replacing map with foldl' solves the problem.

Related

Why does my program run using so much memory?

I'm reading a book on Erlang and tried a simple example from the book.
%% exrs.erl
-module(exrs).
-export([sum/1]).
sum(0) -> 0;
sum(N) -> N + sum(N - 1).
When I run this example for a large number (e.g. 1000000000), it uses 16 GB of RAM and 48 GB of swap on my PC to calculate this function.
1> exrs:sum(1000000000).
Is this the usual behavior for the Erlang VM? And how can I avoid a problem like that?
PS:
10> erlang:system_info(version).
"11.1"
11> erlang:system_info(otp_release).
"23"
OS: Win10 x64
As said in other answers, your recursion is not tail-call optimized. What happens in your code is that Erlang evaluates the right side of the expression and pushes a new function call onto the stack for each step, like this:
1_000_000 + (999_999 + (999_998 + sum(999_997)))
That is what eats your memory. The proper way is to write a function that takes an accumulator as a second argument, like this:
-module(exrs).
-export([sum/2]).
sum(0, ACC) -> ACC;
sum(N, ACC) -> sum(N - 1, ACC + N).
Your recursive function can't make use of tail-call optimisation, so it's using a stack frame for each recursive call.
1,000,000,000 recursive calls is a lot of stack frames.
See, for example, this section of "Learn you some Erlang", for more details.
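The same accumulator idea, sketched in Haskell to match the other examples on this page. It is an illustration only, not a translation of any Erlang library code, and the names sumNaive and sumAcc are made up for the example. One Haskell-specific assumption: the accumulator is forced with a bang pattern, because a lazy accumulator would build a chain of unevaluated additions instead of a running total.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Not tail recursive: the addition still has to run after the
-- recursive call returns, so every step keeps work pending.
sumNaive :: Integer -> Integer
sumNaive 0 = 0
sumNaive n = n + sumNaive (n - 1)

-- Tail recursive: the recursive call is the last thing go does,
-- and the running total travels in the accumulator.
sumAcc :: Integer -> Integer
sumAcc = go 0
  where
    go !acc 0 = acc
    go !acc n = go (acc + n) (n - 1)

main :: IO ()
main = print (sumAcc 1000000)
```

Both functions compute the same value for non-negative input; only the shape of the recursion differs.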

How to count number of non-empty nodes in binary tree in F#

Consider the binary tree algebraic datatype
type btree = Empty | Node of btree * int * btree
and a new datatype defined as follows:
type finding = NotFound | Found of int
Here's my code so far:
let s = Node (Node(Empty, 5, Node(Empty, 2, Empty)), 3, Node (Empty, 6, Empty))
(*
(3)
/ \
(5) (6)
/ \ | \
() (2) () ()
/ \
() ()
*)
(* size: btree -> int *)
let rec size t =
    match t with
    Empty -> false
    | Node (t1, m, t2) -> if (m != Empty) then sum+1 || (size t1) || (size t2)
let num = occurs s
printfn "There are %i nodes in the tree" num
This probably isn't close, I took a function that would find if an integer existed in a tree and tried changing the code for what I was trying to do.
I am very new to using F# and would appreciate any help. I am trying to count all non empty nodes in the tree. For example the tree I'm using should print the value 4.
I did not run the compiler on your code, but I believe it does not even compile.
However, your idea to use a pattern match in a recursive function is good.
As rmunn commented, you want to determine the number of nodes in each case:
An empty tree has no nodes, hence the result is zero.
A non-empty tree has at least the root node plus the count of its left and right subtrees.
So something along the lines of the following should work:
let rec size t =
    match t with
    | Empty -> 0
    | Node (t1, _, t2) -> 1 + (size t1) + (size t2)
The most important detail here is that you do not need a global variable sum to store any intermediate values. The whole idea of a recursive function is that those intermediate values are the results of recursive calls.
As a remark, your tree in the comment should look like this, I believe.
(*
         (3)
        /   \
     (5)     (6)
    /  \     |  \
  ()   (2)   ()  ()
       /  \
      ()  ()
*)
Edit: I misread the misaligned () as leaves of an empty tree, where in fact they are leaves of the subtree (2). So it was just an ASCII art issue :-)
Friedrich already posted a simple version of the size function that will work for most trees. However, the solution is not "tail-recursive", so it can cause a stack overflow for large trees. In functional programming languages like F#, recursion is often the preferred technique for things like counting and other aggregate functions. However, recursive functions generally consume a stack frame for each recursive call. This means that for large structures, the call stack can be exhausted before the function completes. To avoid this problem, compilers can optimize functions that are considered "tail-recursive" so that they use only one stack frame regardless of how many times they recurse. Unfortunately, this optimization cannot be applied to just any recursive algorithm. It requires that the recursive call be the last thing the function does, so the compiler does not have to jump back into the function after the call and can overwrite the stack frame instead of adding another one.
In order to change the size function to be tail-recursive, we need some way to avoid having to call it twice in the case of a non-empty node, so that the call can be the last step of the function, instead of the addition between the two calls in Friedrich's solution. This can be accomplished using a couple different techniques, generally either using an accumulator or using Continuation Passing Style. The simpler solution is often to use an accumulator to keep track of the total size instead of having it be the return value, while Continuation Passing Style is a more general solution that can handle more complex recursive algorithms.
In order to make an accumulator pattern work for a tree where we have to sum both the left and right sub-trees, we need some way to make one tail-call at the end of the function, while still making sure that both sub-trees are evaluated. A simple way to do that is to also accumulate the right sub-trees in addition to the total count, so we can make subsequent tail-calls to evaluate those trees while evaluating the left sub-trees first. That solution might look something like this:
let size t =
    let rec size acc ts = function
        | Empty ->
            match ts with
            | [] -> acc
            | head :: tail -> head |> size acc tail
        | Node (t1, _, t2) ->
            t1 |> size (acc + 1) (t2 :: ts)
    t |> size 0 []
This adds the acc parameter and the ts parameter to represent the total count and the remaining unevaluated sub-trees. When we hit a populated node, we evaluate the left sub-tree while adding the right sub-tree to our list of trees to evaluate later. When we hit an empty node, we start evaluating any ts we've accumulated, until we have no further populated nodes or unevaluated sub-trees. This isn't the best possible solution for computing the tree size, and most real solutions would use Continuation Passing Style to make it tail-recursive, but that should make a good exercise as you get more familiar with the language.
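For comparison, a sketch of the continuation-passing variant mentioned above, written in Haskell rather than F# to match the other examples on this page. The type BTree and the name sizeCps are assumptions made for illustration. Every recursive call to go is in tail position, and the pending work lives in the continuation k instead of on the call stack. The additions are forced with $! so that laziness does not rebuild a deep chain of unevaluated sums.

```haskell
-- A Haskell rendering of the btree type from the question.
data BTree = Empty | Node BTree Int BTree

-- Count the non-empty nodes in continuation-passing style.
sizeCps :: BTree -> Int
sizeCps t = go t id
  where
    -- k receives the size of the subtree once it is known.
    go Empty        k = k 0
    go (Node l _ r) k =
        go l (\nl -> go r (\nr -> k $! 1 + nl + nr))

-- The example tree from the question.
sample :: BTree
sample = Node (Node Empty 5 (Node Empty 2 Empty)) 3 (Node Empty 6 Empty)

main :: IO ()
main = putStrLn ("There are " ++ show (sizeCps sample) ++ " nodes in the tree")
```

The trade-off is that the continuations are allocated on the heap, so the memory is still used, just not as stack frames.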

Parse Error: (incorrect indentation or misplaced bracket)

I'm starting out to learn Haskell. Even though I'm a dunce extraordinaire, I am intent on making this work. The error I received is listed as the title. This is the code that I wrote to try to implement the behavior of replicating a list (n) times and concatenating its new length as a new list. Now I have a basic understanding of how parsing works in Haskell, below my original code I will give example of some modified code to see if my understanding on parsing is adequate. My question for now is how I can properly indent or structure my block in order to not receive this error (is that specific enough :O) -- is there a piece of information I'm missing when it comes to creating instances and formatting? PLEASE DO NOT TELL ME OR OFFER SUGGESTIONS IF YOU NOTICE THAT MY CURRENT INSTANCE OR MAIN FUNCTION ARE SYNTACTICALLY WRONG. I want to figure it out and will deal with that GHC error when I get to it. (I hope that's the proper way to learn). BUT if I could ask for anyone's help in getting past this first obstacle in understanding proper formatting, I'd be grateful.
module Main where
import Data.List
n :: Int
x :: [Char]
instance Data stutter n x where
    x = []
    n = replicate >>= x : (n:xs)
    stutter >>= main = concat [x:xs]
let stutter 6 "Iwannabehere" -- <-- parse error occurs here!!!
--Modified code with appropriate brackets, at least where I think they go.
module Main where
import Data.List
n :: Int
x :: [Char]
instance Data stutter n x where{
;x = []
;n = replicate >>= x : (n:xs)
;stutter >>= main = concat [x:xs]
;
};let stutter 6 "Iwannabehere" -- there should be no bracket of any kind at the end of this
I placed the 'let' expression on the outside of the block, I don't believe it goes inside and I also receive a parsing error if I do that. Not correct but I thought I'd ask anyway.
I'm not sure what the instance Data stutter n x is supposed to be, the instance XYZ where syntax is used solely for typeclasses, but you have a couple syntax errors here.
First of all, while GHC says that the error is on let stutter 6 "Iwannabehere", your first error occurs before that with stutter >>= main = concat [x:xs]. A single = sign is reserved for assignments, which are merely definitions. You can have assignments at the top level, inside a where block, or inside a let block (the where includes typeclass instance definitions). You can't have an assignment be part of an expression like x >>= y = z.
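A small sketch of the three places where a definition with = can appear: at the top level, in a where block, and in a let block. The names greeting, shout, upper and bangs are made up for this illustration.

```haskell
import Data.Char (toUpper)

-- Top-level definition.
greeting :: String
greeting = "hello"

shout :: String -> String
shout s = upper ++ bangs
  where
    -- Definition inside a where block.
    upper = map toUpper s
    -- Definition inside a let block, which is itself an expression.
    bangs = let n = 3 in replicate n '!'

main :: IO ()
main = putStrLn (shout greeting)
```

In each case the = introduces a name for an expression; none of them is a statement that can be embedded in the middle of another expression.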
Your next syntax error is the let itself. let blocks cannot appear at the top level; they only appear within another definition. You can use let in GHCi, but the reasons for that are outside the scope of this answer. Suffice it to say that entering expressions in GHCi is not equivalent to writing at the top level of a source file.
Next, if you were to use a let block somewhere, it can only contain definitions. The syntax looks more like
let <name> [<args>] = <definition>
    [<name> [<args>] = <definition>]
in <expression>
And this whole block makes an expression. For example, you could write
def f(x, y, z):
    w = x + y + z
    u = x - y - z
    return w * u
in Python, and this would be equivalent to the Haskell function definition
f x y z = let w = x + y + z
              u = x - y - z
          in w * u
It only defines local variables. There is another form when you're using it inside do blocks where you can exclude the in <expression> part, such as
main = do
    name <- getLine
    let message = if length name > 5 then "long name" else "short name"
        goodbye n = putStrLn ("Goodbye, " ++ n)
    putStrLn message
    goodbye name
Note that there is no need to use in here. You can if you want, it just means you have to start a new do block:
main = do
    name <- getLine
    let message = ...
        goodbye n = ...
      in do
        putStrLn message
        goodbye name
And this isn't as pretty.
Hopefully this points you more towards correct syntax, but it looks like you have some misunderstandings about how Haskell works. Have you looked at Learn You a Haskell? It's a pretty gentle and fun introduction to the language that can really help you learn the syntax and core ideas.
Your parse error is from the let keyword. Remove it and no error related to that will occur. let x = y is only relevant in GHCi and do-blocks, neither of which is relevant at this point. Essentially, just replace it with this line:
theWordIGet = stutter 6 "Iwannabehere"
Secondly, the instance keyword in Haskell has absolutely nothing to do with what you want at this stage. This is not how Haskell functions are defined, which is what I'm guessing you want to do. Here is how you would create a stutter function, assuming it simply repeats a string n times:
stutter :: Int -> String -> String
stutter n x = concat (replicate n x)
You'll also want to remove the type declarations for the (out-of-scope) values n and x: they're not objects, they're arguments for a function, which has its own signature determining the types of n and x within a function call.
Lastly, I imagine you will want to print the value of stutter 6 "Iwannabehere" when the program is executed. To do that, just add this:
main :: IO ()
main = print (stutter 6 "Iwannabehere")
In conclusion, I implore you to start from scratch and read 'Learn You a Haskell' online, because you're going off in entirely the wrong direction: the program you've quoted is a jumble of expressions that could have a meaning, but are in the wrong place entirely. The book will show you the syntax of Haskell much better than I can in this one answer, and will explain fully how to make your program behave in the way you expect.

How to parse arbitrary lists with Haskell parsers?

Is it possible to use one of the parsing libraries (e.g. Parsec) for parsing something different than a String? And how would I do this?
For the sake of simplicity, let's assume the input is a list of ints [Int]. The task could be
drop leading zeros
parse the rest into the pattern (S+L+)*, where S is a number less than 10, and L is a number larger or equal to ten.
return a list of tuples (Int,Int), where fst is the product of the S and snd is the product of the L integers
It would be great if someone could show how to write such a parser (or something similar).
Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined token parsers (as there are for text) you have to roll your own, (myToken below) using e.g. tokenPrim
The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.
Anyway, here is the code (without the skipping of leading zeroes, for brevity)
import Text.Parsec
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos $ justIf test where
    incPos pos _ _ = incSourceColumn pos 1
    justIf test x = if (test x) then Just x else Nothing
small = myToken (< 10)
large = myToken (>= 10)
smallLargePattern = do
    smallints <- many1 small
    largeints <- many1 large
    let prod = foldl1 (*)
    return (prod smallints, prod largeints)
myIntListParser :: Parsec [Int] () [(Int,Int)]
myIntListParser = many smallLargePattern
testMe :: [Int] -> [(Int, Int)]
testMe xs = case parse myIntListParser "your list" xs of
    Left err -> error $ show err
    Right result -> result
Trying it all out:
*Main> testMe [1,2,55,33,3,5,99]
[(2,1815),(15,99)]
*Main> testMe [1,2,55,33,3,5,99,1]
*** Exception: "your list" (line 1, column 9):
unexpected end of input
Note the awkward line/column format in the error message
Of course one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition
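A hedged sketch of both loose ends: skipping the leading zeros that the code above leaves out, and turning the column of a parse error back into a list index. The names withLeadingZeros, errorIndex and parseInts are hypothetical, and reporting the error as a bare zero-based Int is a simplification of the MyListPosition idea. It relies on myToken advancing the column by one per token, so the column minus one is the index of the offending element.

```haskell
import Text.Parsec

-- Same token parser as in the answer above.
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos justIf
  where
    incPos pos _ _ = incSourceColumn pos 1
    justIf x = if test x then Just x else Nothing

smallLargePattern :: Parsec [Int] () (Int, Int)
smallLargePattern = do
    smallints <- many1 (myToken (< 10))
    largeints <- many1 (myToken (>= 10))
    return (product smallints, product largeints)

-- Drop leading zeros, then parse (S+L+)* up to the end of the list.
withLeadingZeros :: Parsec [Int] () [(Int, Int)]
withLeadingZeros =
    skipMany (myToken (== 0)) *> many smallLargePattern <* eof

-- Hypothetical helper: zero-based index of the element at which
-- parsing failed (columns start at 1, one column per token).
errorIndex :: ParseError -> Int
errorIndex err = sourceColumn (errorPos err) - 1

parseInts :: [Int] -> Either Int [(Int, Int)]
parseInts xs =
    either (Left . errorIndex) Right (parse withLeadingZeros "list" xs)

main :: IO ()
main = print (parseInts [0, 0, 1, 2, 55, 33, 3, 5, 99])
```

Zeros that appear after the leading run still count as small numbers in this sketch, so they make the corresponding product zero.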
There is very likely a way to get Parsec to use [a] as the stream type, but the idea behind parser combinators is actually very simple, and it's not very difficult to roll your own library.
A very accessible resource I would recommend is Monadic Parsing in Haskell by Graham Hutton and Erik Meijer.
Indeed, right now Erik Meijer is teaching an intro Haskell/functional programming course on edx.org and Lecture 7 is all about functional parsers. As he states in the intro to the lecture:
"... No one can follow the path towards mastering functional programming without writing their own parser combinator library. We start by explaining what parsers are and how they can naturally be viewed as side-effecting functions. Next we define a number of basic parsers and higher-order functions for combining parsers. ..."

How do I know if a function is tail recursive in F#

I wrote the following function:
let str2lst str =
    let rec f s acc =
        match s with
        | "" -> acc
        | _ -> f (s.Substring 1) (s.[0]::acc)
    f str []
How can I know if the F# compiler turned it into a loop? Is there a way to find out without using Reflector (I have no experience with Reflector and I don't know C#)?
Edit: Also, is it possible to write a tail-recursive function without using an inner function, or does the loop have to live in an inner function?
Also, is there a function in the F# standard library to run a given function a number of times, each time feeding it the previous output as input? Let's say I have a string: I want to run a function over the string, then run it again over the resulting string, and so on.
Unfortunately there is no trivial way.
It is not too hard to read the source code and use the types and determine whether something is a tail call by inspection (is it 'the last thing', and not in a 'try' block), but people second-guess themselves and make mistakes. There's no simple automated way (other than e.g. inspecting the generated code).
Of course, you can just try your function on a large piece of test data and see if it blows up or not.
The F# compiler will generate .tail IL instructions for all tail calls (unless the compiler flag that turns them off is used, e.g. when you want to keep stack frames for debugging), with the exception that directly tail-recursive functions will be optimized into loops. (EDIT: I think nowadays the F# compiler also fails to emit .tail in cases where it can prove there are no recursive loops through this call site; this is an optimization given that the .tail opcode is a little slower on many platforms.)
'tailcall' is a reserved keyword, with the idea that a future version of F# may allow you to write e.g.
tailcall func args
and then get a warning/error if it's not a tail call.
Only functions that are not naturally tail-recursive (and thus need an extra accumulator parameter) will 'force' you into the 'inner function' idiom.
Here's a code sample of what you asked:
let rec nTimes n f x =
    if n = 0 then
        x
    else
        nTimes (n-1) f (f x)
let r = nTimes 3 (fun s -> s ^ " is a rose") "A rose"
printfn "%s" r
I like the rule of thumb Paul Graham formulates in On Lisp: if there is work left to do, e.g. manipulating the recursive call output, then the call is not tail recursive.

Resources