megaparsec reports incorrect location on parse error - parsing

For this project I'm parsing in two stages. The first stage handles include/ifdef/define directives and chunks the input up into [Span] items which define their start/end points in the original inputs along with the body text. This stream is then parsed by the second stage into my AST for subsequent processing.
Each element of the AST carries its source position, and any semantic error caught after parsing prints the correct error position regardless of include depth. This part is crucial since it comes after the stage that has the problem.
The problem is that, given a parse error in the second stage from an included file, Megaparsec reports a bogus error with a location at the top-level rule in the input. A parse error in the initial file works fine. The presence of any directives will divide even the initial file into multiple chunks, so it's not a 'single chunk' vs. 'multiple chunks' issue.
Given that the AST is getting the locations correct, I'm stumped as to how Megaparsec is reporting bad info when parse errors are encountered.
I've included my Stream instance and the (set|get)(Position|Input) code, since these seem like the relevant bits. I feel like there must be some bit of Megaparsec housekeeping that I'm not doing, or that my Stream instance is invalid for some reason.
data Span = Span
  { spanStart :: SourcePos
  , spanEnd :: SourcePos
  , spanBody :: T.Text
  } deriving (Eq, Ord, Show)

instance Stream [Span] where
  type Token [Span] = Span
  type Tokens [Span] = [Span]
  tokenToChunk Proxy = pure
  tokensToChunk Proxy = id
  chunkToTokens Proxy = id
  chunkLength Proxy = foldl1 (+) . map (T.length . spanBody)
  chunkEmpty Proxy = all ((== 0) . T.length . spanBody)
  positionAt1 Proxy pos (Span start _ _) = trace ("pos1" ++ show start) start
  positionAtN Proxy pos [] = pos
  positionAtN Proxy _ (Span start _ _:_) = trace ("posN" ++ show start) start
  advance1 Proxy _ _ (Span _ end _) = end
  advanceN Proxy _ pos [] = pos
  advanceN Proxy _ _ ts = let Span _ end _ = last ts in end
  take1_ [] = Nothing
  take1_ s = case takeN_ 1 s of
    Nothing -> Nothing
    Just (sp, s') -> Just (head sp, s')
  takeN_ _ [] = Nothing
  takeN_ n s@(t:ts)
    | s == [] = Nothing
    | n <= 0 = Just ([t {spanEnd = spanStart t, spanBody = ""}], s)
    | n < (T.length . spanBody) t = let (l, r) = T.splitAt n (spanBody t)
                                        sL = spanStart t
                                        eL = foldl (defaultAdvance1 (mkPos 3)) sL (T.unpack (T.tail l))
                                        sR = defaultAdvance1 (mkPos 3) eL (T.last l)
                                        eR = spanEnd t
                                        l' = [Span sL eL l]
                                        r' = (Span sR eR r):ts
                                    in Just (trace (show n) l', r')
    | n == (T.length . spanBody) t = Just ([t], ts)
    | otherwise = case takeN_ (n - T.length (spanBody t)) ts of
        Nothing -> Just ([t], [])
        Just (t', ts') -> Just (t:t', ts')
  takeWhile_ p s = fromJust $ takeN_ (go 0 s) s
    where go n s = case take1_ s of
            Nothing -> n
            Just (c, s') -> if p c
                              then go (n + 1) s'
                              else n
Find include and swap to it:
"include" -> do
  file <- between dquote dquote (many (alphaNumChar <|> char '.' <|> char '/' <|> char '_'))
  s <- liftIO (Data.Text.IO.readFile file)
  p <- getPosition
  i <- getInput
  pushPosition p
  stack %= (:) (p, i)
  setPosition (initialPos file)
  setInput s
And if we reach the end of input, pop the stack and continue:
parseStream' :: StreamParser [Span]
parseStream' = concat <$> many p
  where p = do
          b <- tick <|> block
          end <- option False (True <$ hidden eof)
          h <- use stack
          when (end && (h /= [])) $ do
            popPosition
            setInput (h ^?! ix 0 . _2)
            stack %= tail
          return b

Related

Stack overflow with two functions calling each other in Applicative parser

I'm doing data61's course: https://github.com/data61/fp-course. In the parser one, the following implementation will cause parse (list1 (character *> valueParser 'v')) "abc" stack overflow.
Existing code:
data List t =
Nil
| t :. List t
deriving (Eq, Ord)
-- Right-associative
infixr 5 :.
type Input = Chars
data ParseResult a =
UnexpectedEof
| ExpectedEof Input
| UnexpectedChar Char
| UnexpectedString Chars
| Result Input a
deriving Eq
instance Show a => Show (ParseResult a) where
show UnexpectedEof =
"Unexpected end of stream"
show (ExpectedEof i) =
stringconcat ["Expected end of stream, but got >", show i, "<"]
show (UnexpectedChar c) =
stringconcat ["Unexpected character: ", show [c]]
show (UnexpectedString s) =
stringconcat ["Unexpected string: ", show s]
show (Result i a) =
stringconcat ["Result >", hlist i, "< ", show a]
instance Functor ParseResult where
_ <$> UnexpectedEof =
UnexpectedEof
_ <$> ExpectedEof i =
ExpectedEof i
_ <$> UnexpectedChar c =
UnexpectedChar c
_ <$> UnexpectedString s =
UnexpectedString s
f <$> Result i a =
Result i (f a)
-- Function to determine is a parse result is an error.
isErrorResult ::
ParseResult a
-> Bool
isErrorResult (Result _ _) =
False
isErrorResult UnexpectedEof =
True
isErrorResult (ExpectedEof _) =
True
isErrorResult (UnexpectedChar _) =
True
isErrorResult (UnexpectedString _) =
True
-- | Runs the given function on a successful parse result. Otherwise return the same failing parse result.
onResult ::
ParseResult a
-> (Input -> a -> ParseResult b)
-> ParseResult b
onResult UnexpectedEof _ =
UnexpectedEof
onResult (ExpectedEof i) _ =
ExpectedEof i
onResult (UnexpectedChar c) _ =
UnexpectedChar c
onResult (UnexpectedString s) _ =
UnexpectedString s
onResult (Result i a) k =
k i a
data Parser a = P (Input -> ParseResult a)
parse ::
Parser a
-> Input
-> ParseResult a
parse (P p) =
p
-- | Produces a parser that always fails with @UnexpectedChar@ using the given character.
unexpectedCharParser ::
Char
-> Parser a
unexpectedCharParser c =
P (\_ -> UnexpectedChar c)
--- | Return a parser that always returns the given parse result.
---
--- >>> isErrorResult (parse (constantParser UnexpectedEof) "abc")
--- True
constantParser ::
ParseResult a
-> Parser a
constantParser =
P . const
-- | Return a parser that succeeds with a character off the input or fails with an error if the input is empty.
--
-- >>> parse character "abc"
-- Result >bc< 'a'
--
-- >>> isErrorResult (parse character "")
-- True
character ::
Parser Char
character = P p
where p Nil = UnexpectedString Nil
p (a :. as) = Result as a
-- | Parsers can map.
-- Write a Functor instance for a @Parser@.
--
-- >>> parse (toUpper <$> character) "amz"
-- Result >mz< 'A'
instance Functor Parser where
(<$>) ::
(a -> b)
-> Parser a
-> Parser b
f <$> P p = P p'
where p' input = f <$> p input
-- | Return a parser that always succeeds with the given value and consumes no input.
--
-- >>> parse (valueParser 3) "abc"
-- Result >abc< 3
valueParser ::
a
-> Parser a
valueParser a = P p
where p input = Result input a
-- | Return a parser that tries the first parser for a successful value.
--
-- * If the first parser succeeds then use this parser.
--
-- * If the first parser fails, try the second parser.
--
-- >>> parse (character ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (character ||| valueParser 'v') "abc"
-- Result >bc< 'a'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') "abc"
-- Result >abc< 'v'
(|||) ::
Parser a
-> Parser a
-> Parser a
P a ||| P b = P c
where c input
| isErrorResult resultA = b input
| otherwise = resultA
where resultA = a input
infixl 3 |||
My code:
instance Monad Parser where
  (=<<) ::
    (a -> Parser b)
    -> Parser a
    -> Parser b
  f =<< P a = P p
    where p input = onResult (a input) (\i r -> parse (f r) i)

instance Applicative Parser where
  (<*>) ::
    Parser (a -> b)
    -> Parser a
    -> Parser b
  P f <*> P a = P b
    where b input = onResult (f input) (\i f' -> f' <$> a i)

list ::
  Parser a
  -> Parser (List a)
list p = list1 p ||| pure Nil

list1 ::
  Parser a
  -> Parser (List a)
list1 p = (:.) <$> p <*> list p
However, if I change list to not use list1, or use =<< in list1, it works fine. It also works if <*> uses =<<. I feel like it might be an issue with tail recursion.
UPDATE:
If I use lazy pattern matching here
P f <*> ~(P a) = P b
where b input = onResult (f input) (\i f' -> f' <$> a i)
It works fine. Pattern matching here is the problem. I don't understand this... Please help!
If I use lazy pattern matching P f <*> ~(P a) = ... then it works fine. Why?
This very issue was discussed recently. You could also fix it by using newtype instead of data: newtype Parser a = P (Input -> ParseResult a).(*)
The definition of list1 wants to know both parser arguments to <*>, but actually when the first will fail (when input is exhausted) we don't need to know the second! But since we force it, it will force its second argument, and that one will force its second parser, ad infinitum.(**) That is, p will fail when input is exhausted, but we have list1 p = (:.) <$> p <*> list p which forces list p even though it won't run when the preceding p fails. That's the reason for the infinite looping, and why your fix with the lazy pattern works.
What is the difference between data and newtype in terms of laziness?
(*) A newtype'd type always has exactly one data constructor, and pattern matching on it does not actually force the value, so it is implicitly like a lazy pattern. Try newtype P = P Int, let foo (P i) = 42 in foo undefined and see that it works.
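The difference can be checked directly. The following is a minimal sketch, not taken from the course code: the names Boxed, Wrapped, strictMatch, lazyMatch, newtypeMatch and throws are made up for illustration. It applies three functions to undefined and reports whether evaluating the result throws.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical types for illustration only.
data    Boxed   = Boxed Int    -- ordinary data type: the constructor exists at run time
newtype Wrapped = Wrapped Int  -- newtype: the constructor exists only at compile time

strictMatch :: Boxed -> Int
strictMatch (Boxed _) = 42      -- must evaluate the argument to find the constructor

lazyMatch :: Boxed -> Int
lazyMatch ~(Boxed _) = 42       -- irrefutable pattern: the argument is never inspected

newtypeMatch :: Wrapped -> Int
newtypeMatch (Wrapped _) = 42   -- matching on a newtype constructor forces nothing

-- True if evaluating the Int throws an exception.
throws :: Int -> IO Bool
throws x = do
  r <- try (evaluate x) :: IO (Either SomeException Int)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  throws (strictMatch undefined)  >>= print  -- True: the match forces undefined
  throws (lazyMatch undefined)    >>= print  -- False
  throws (newtypeMatch undefined) >>= print  -- False
```

Only the plain data match forces its argument, which is the same forcing that makes P f <*> P a loop when its second argument is defined recursively.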
(**) This happens while the parser is still being prepared (composed), before the combined parser even gets to run on the actual input. This means there's yet another, third way to fix the problem: define
list1 p = (:.) <$> p <*> P (\s -> parse (list p) s)
This should work regardless of the laziness of <*> and whether data or newtype was used.
Intriguingly, the above definition means that the parser will be actually created during run time, depending on the input, which is the defining characteristic of Monad, not Applicative which is supposed to be known statically, in advance. But the difference here is that the Applicative depends on the hidden state of input, and not on the "returned" value.
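To see the third fix in isolation, here is a small self-contained sketch. It is an assumption-laden stand-in for the course code: it uses a Maybe-based result instead of ParseResult, built-in lists instead of List, and the hypothetical names runP, orElse and item. The Applicative instance is deliberately strict in both arguments and Parser is declared with data, yet list terminates because the recursive call is wrapped in a fresh P.

```haskell
-- Simplified stand-in for the course's Parser (assumption: Maybe instead of ParseResult).
data Parser a = P (String -> Maybe (a, String))

runP :: Parser a -> String -> Maybe (a, String)
runP (P p) = p

instance Functor Parser where
  fmap f (P p) = P $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = P $ \s -> Just (a, s)
  -- Strict patterns on both sides, as in the question's original instance.
  P f <*> P a = P $ \s -> case f s of
    Nothing      -> Nothing
    Just (g, s') -> case a s' of
      Nothing       -> Nothing
      Just (x, s'') -> Just (g x, s'')

-- Plays the role of (|||): try the first parser, fall back to the second.
orElse :: Parser a -> Parser a -> Parser a
orElse (P a) (P b) = P $ \s -> case a s of
  Nothing -> b s
  r       -> r

item :: Parser Char
item = P $ \s -> case s of
  []     -> Nothing
  (c:cs) -> Just (c, cs)

list :: Parser a -> Parser [a]
list p = list1 p `orElse` pure []

-- The recursive call sits under a lambda, so building list1 p does not force list p.
list1 :: Parser a -> Parser [a]
list1 p = (:) <$> p <*> P (\s -> runP (list p) s)

main :: IO ()
main = print (runP (list item) "abc")  -- Just ("abc","")
```

Replacing the last argument of <*> in list1 with plain list p makes this sketch loop while the parser is being built, before any input is consumed, which matches the behaviour described in the question.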

How to parse a boolean expression

I'm pretty new to F#, and I'm trying to use recursion to solve a problem.
The function receives a string, and returns a bool. The string gets parsed, and evaluated. This is bool logic, so
(T|F) returns true
(T&(T&T)) returns true
((T|T)&(T&F)) returns false
(F) = returns false
My idea was that every time I found a ), replace the part of the string from the previous ( to that ) with the result of the Comparison match. Doing this over and over until only T or F remains, to return true or false.
EDIT:
I expect it to take the string, and keep swapping out what is in between the ( and ) with the result of the comparison until it comes down to a T or F. What is happening, is an error about an incomplete structured construct. The error is in the for loop.
As I am so new to this language, I'm not sure what I'm doing wrong. Do you see it?
let ComparisonSolver (comp:string) =
    let mutable trim = comp
    trim <- trim.Replace("(", "")
    trim <- trim.Replace(")", "")
    match trim with
    | "T" -> "T"
    | "F" -> "F"
    | "!T" -> "F"
    | "!F" -> "T"
    | "T&T" -> "T"
    | "F&F" -> "T"
    | "T&F" -> "F"
    | "F&T" -> "F"
    | "T|T" -> "T"
    | "F|F" -> "F"
    | "T|F" -> "T"
    | "F|T" -> "T"
    | _ -> ""

let rec BoolParser arg =
    let mutable args = arg
    if String.length arg = 1 then
        match arg with
        | "T" -> true
        | "F" -> false
    else
        let mutable ParseStart = 0
        let endRange = String.length args
        for letter in [0 .. endRange]
            if args.[letter] = "(" then
                ParseStart <- letter
            else if args.[letter] = ")" then
                args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
                BoolParser args

let result = BoolParser "(T)&(F)"
There are a few things you need to correct.
for letter in [0 .. endRange] is missing a do at the end of it - it should be for letter in [0 .. endRange] do
The if comparisons in the for loop are comparing chars with strings. You need to replace "(" and ")" with '(' and ')'
for letter in [0 .. endRange] will go out of range: In F# the range expression [x..y] builds a list from x to y inclusive. It's a bit like in C# if you had for (int i = 0; i <= array.Length; i++). In F# you can also declare loops like this: for i = 0 to endRange - 1 do.
for letter in [0 .. endRange] will go out of range again: It's going from 0 to endrange, which is the length of args. But args is getting shortened in the for loop, so it will eventually try to get a character from args that's out of range.
Now, the problem with the if..then..else statements, which is what I think you were looking at from the beginning.
if args.[letter] = '(' then
    ParseStart <- letter
else if args.[letter] = ')' then
    args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
    BoolParser args
Let's take the code within the two branches as two separate functions.
The first does ParseStart <- letter, which assigns letter to ParseStart. This function returns unit, which is F# equivalent of void.
The second does:
args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
BoolParser args
This function returns a bool.
Now when you put them together in an if..then..else expression, one branch returns unit and the other returns bool. The compiler cannot give the whole expression a single type, so it shows an "expression was expected to have type" error.
I strongly suspect that you wanted to call BoolParser args from outside the for/if block. But it's been indented so that F# treats it as part of the else if branch.
There are many ways to parse a boolean expression. It might be a good idea to look at the excellent library FParsec.
http://www.quanttec.com/fparsec/
Another way to implement parsers in F# is to use Active Patterns which can make for readable code
https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/active-patterns
It's hard to provide good error reporting through Active Patterns, but perhaps you can find some inspiration from the following example:
let next s i = struct (s, i) |> Some

// Skips whitespace characters
let (|SkipWhitespace|_|) struct (s, i) =
  let rec loop j =
    if j < String.length s && s.[j] = ' ' then
      loop (j + 1)
    else
      next s j
  loop i

// Matches a specific character: ch
let (|Char|_|) ch struct (s, i) =
  if i < String.length s && s.[i] = ch then
    next s (i + 1)
  else
    None

// Matches a specific character: ch
// and skips trailing whitespaces
let (|Token|_|) ch =
  function
  | Char ch (SkipWhitespace ps) -> Some ps
  | _ -> None

// Parses the boolean expressions
let parse s =
  let rec term =
    function
    | Token 'T' ps -> Some (true, ps)
    | Token 'F' ps -> Some (false, ps)
    | Token '(' (Parse (v, Token ')' ps)) -> Some (v, ps)
    | _ -> None
  and opReducer p ch reducer =
    let (|P|_|) ps = p ps
    let rec loop l =
      function
      | Token ch (P (r, ps)) -> loop (reducer l r) ps
      | Token ch _ -> None
      | ps -> Some (l, ps)
    function
    | P (l, ps) -> loop l ps
    | _ -> None
  and andExpression ps = opReducer term '&' (&&) ps
  and orExpression ps = opReducer andExpression '|' (||) ps
  and parse ps = orExpression ps
  and (|Parse|_|) ps = parse ps
  match (struct (s, 0)) with
  | SkipWhitespace (Parse (v, _)) -> Some v
  | _ -> None
module Tests =
  // FsCheck allows us to get better confidence in that the parser actually works
  open FsCheck

  type Whitespace =
    | Space

  type Ws = Ws of (Whitespace [])*(Whitespace [])

  type Expression =
    | Term of Ws*bool
    | And of Expression*Ws*Expression
    | Or of Expression*Ws*Expression

    override x.ToString () =
      let orPrio = 1
      let andPrio = 2
      let sb = System.Text.StringBuilder 16
      let ch c = sb.Append (c : char) |> ignore
      let token (Ws (l, r)) c =
        sb.Append (' ', l.Length) |> ignore
        sb.Append (c : char) |> ignore
        sb.Append (' ', r.Length) |> ignore
      let enclose p1 p2 f =
        if p1 > p2 then ch '('; f (); ch ')'
        else f ()
      let rec loop prio =
        function
        | Term (ws, v) -> token ws (if v then 'T' else 'F')
        | And (l, ws, r) -> enclose prio andPrio <| fun () -> loop andPrio l; token ws '&' ;loop andPrio r
        | Or (l, ws, r) -> enclose prio orPrio <| fun () -> loop orPrio l ; token ws '|' ;loop orPrio r
      loop andPrio x
      sb.ToString ()

    member x.ToBool () =
      let rec loop =
        function
        | Term (_, v) -> v
        | And (l, _, r) -> loop l && loop r
        | Or (l, _, r) -> loop l || loop r
      loop x

  type Properties() =
    static member ``Parsing expression shall succeed`` (expr : Expression) =
      let expected = expr.ToBool () |> Some
      let str = expr.ToString ()
      let actual = str |> parse
      expected = actual

  let fscheck () =
    let config = { Config.Quick with MaxTest = 1000; MaxRejected = 1000 }
    Check.All<Properties> config
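For comparison, the same grammar (terms T and F, parentheses, & binding tighter than |, optional spaces) can be sketched with a parser-combinator library. The sketch below is in Haskell rather than F#, using ReadP from base; the names tok, term, andExpr, orExpr and evalBool are made up here, and like the Active Patterns version it does not handle negation.

```haskell
import Control.Applicative ((<|>))
import Text.ParserCombinators.ReadP
  (ReadP, between, chainl1, char, eof, readP_to_S, skipSpaces)

-- Match one character, then skip trailing spaces.
tok :: Char -> ReadP Char
tok c = char c <* skipSpaces

-- A term is T, F, or a parenthesised expression.
term :: ReadP Bool
term =  (True  <$ tok 'T')
    <|> (False <$ tok 'F')
    <|> between (tok '(') (tok ')') orExpr

-- '&' binds tighter than '|'; both are left-associative.
andExpr, orExpr :: ReadP Bool
andExpr = chainl1 term    ((&&) <$ tok '&')
orExpr  = chainl1 andExpr ((||) <$ tok '|')

-- Nothing unless the whole input is a well-formed expression.
evalBool :: String -> Maybe Bool
evalBool s = case readP_to_S (skipSpaces *> orExpr <* eof) s of
  [(v, "")] -> Just v
  _         -> Nothing

main :: IO ()
main = mapM_ (print . evalBool)
  [ "(T|F)"          -- Just True
  , "(T&(T&T))"      -- Just True
  , "((T|T)&(T&F))"  -- Just False
  , "(F)"            -- Just False
  , "(T"             -- Nothing
  ]
```

The layering of term, andExpr and orExpr mirrors term, andExpression and orExpression in the F# code, with chainl1 doing the job of opReducer.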

Parser Error Reporting deriving the right instances

I am trying to build an error reporting parser in haskell. Currently I have been looking at a tutorial and this is what I have so far.
type Position = (Int, Int)
type Err = (String, Position)
newtype Parser1 a = Parser1 {parse1 :: StateT String (StateT Position (MaybeT
(Either Err))) a} deriving (Monad, MonadState String, Applicative, Functor)
runParser :: Parser1 a -> String -> Either Err (Maybe ((a, String), Position))
runParser p ts = runMaybeT $ runStateT (runStateT (parse1 p) ts) (0, 0)
basicItem = Parser1 $ do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    [] -> empty
item = Parser1 $ do
  c <- basicItem
  pos <- lift get
  lift (put (f pos))
  return c
f :: Char -> Position -> Position
f d (ln, c) = (ln + 1, 0)
f _ (ln, c) = (ln , c + 1)
This piece of code does not compile, I think it is to do with my item parser and the fact that I am trying to access the inner state namely position. I was wondering how in the deriving clause do I make Haskell derive the instances for both states in my parser type, so then I can access the inner state?
Edit 1:
I initially tried declaring basicItem as:
basicItem :: (MonadState String m, Alternative m) => m t
basicItem = do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    [] -> empty
However, I kept getting the error:
I was wondering why it cannot deduce context of get from MonadState String m,
when in my deriving clause I have MonadState String.
The error for my initial question is here:

parsec produce strange errors when trying to handle Maybe as ParseError

If I have this code :
import Text.Parsec
ispositive a = if (a<0) then Nothing else (Just a)
f a b = a+b
parserfrommaybe :: String -> (Maybe c) -> Parsec a b c
parserfrommaybe msg Nothing = fail msg
parserfrommaybe _ (Just res) = return res
(<!>) :: Parsec a b (Maybe c) -> String -> Parsec a b c
(<!>) p1 msg = p1 >>= (parserfrommaybe msg)
integermaybeparser = (ispositive <$> integer) <!> "negative numbers are not allowed"
testparser = f <$> (integermaybeparser <* whiteSpace) <*> integermaybeparser
when I test testparser with input like this "-1 3" it gives :
Left (line 1, column 4):
unexpected "3"
negative numbers are not allowed
I expected it to give error on Column 1 and give the error message without the sentence "unexpected 3" but it seems parsec continued parsing.
Why did this happen ? and how to make parsec give the error message I expect ?
I have found the solution: the cause is that the first parser gets run and consumes input even when failing.
The solution was to use lookAhead like this:
(<!>) :: (Monad m,Stream a m t) => ParsecT a b m (Maybe c) -> String -> ParsecT a b m c
(<!>) p1 msg = ((lookAhead p1) >>= (parserfrommaybe msg)) *> (p1 >>= (parserfrommaybe msg))
If lookAhead p1 returns Nothing, then the first argument of *> fails without consuming input, because of lookAhead. If lookAhead p1 returns Just res, it succeeds, again without consuming input, and the result is obtained from the second argument of *>.
Of course, I had to change the type annotation of parserfrommaybe to (Monad m) => String -> (Maybe c) -> ParsecT a b m c to satisfy GHC.

Monadic parsing functional pearl - gluing multiple parsers together

I am working my way through the functional pearl paper Monadic parsing in Haskell (after recommendation at haskellforall.com to read that paper to understand parsing). I wrote an implementation until section 4 on page 3 as below:
newtype Parser a = Parser (String -> [(a,String)])
parse (Parser p) = p
instance Monad Parser where
return a = Parser (\cs -> [(a,cs)])
p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse p cs])
item :: Parser Char
item = Parser (\cs -> case cs of
                        "" -> []
                        (c:cs) -> [(c,cs)])
p :: Parser (Char,Char)
p = do { a <- item; item; b <- item; return (a,b)}
According to the paper, p is a parser that consumes three characters, skips the middle one, and returns a pair of the first and third. What I can't figure out is how the modified input string is passed to the 2nd and 3rd occurrences of item in p. We are not passing the result of the first parser to the second parser, and so on (because ;, syntactic sugar for >>, is used, which discards the result, as shown by the type signature (>>) :: Monad m => m a -> m b -> m b). I would appreciate an explanation of how the modified input string is passed to the last two invocations of item in p.
Another thing that confuses me is the handling of cs in item: it doesn't seem to return a (head, tail) pair. Shouldn't it be redefined as follows, since the item parser consumes one character according to the paper:
item :: Parser Char
item = Parser (\cs -> case cs of
                        "" -> []
                        (c:cs') -> [(c,cs')]) -- redefinition - use cs' to denote tail
The syntax ; is not always syntactic sugar for >>.
Rather, we have:
do m ; n = m >> n
do x<-m ; n = m >>= \x -> n
(The above translation is simplified, the full gory details can be found in the Haskell Report)
So, your definition for p is equivalent to:
p = item >>= \a -> ( item >> (item >>= \b -> return (a,b) ))
Here, you can see that the first and third items do not have their results discarded (because >>= binds them to a and b respectively), while the middle item does.
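The equivalence can be checked by running both forms. This is a minimal sketch of the paper's parser, with one assumption about the environment: current GHC requires Functor and Applicative instances before Monad, so they are added here via liftM and ap. The names pDo and pDesugared are made up for the two versions of p.

```haskell
import Control.Monad (ap, liftM)

newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Boilerplate instances required by modern GHC (not in the paper).
instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  (<*>)  = ap

instance Monad Parser where
  -- Run p, then run (f a) on the LEFTOVER input cs' of each result.
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""     -> []
  (x:xs) -> [(x, xs)])

-- The do-block from the question.
pDo :: Parser (Char, Char)
pDo = do { a <- item; item; b <- item; return (a, b) }

-- Its hand-desugared form.
pDesugared :: Parser (Char, Char)
pDesugared = item >>= \a -> (item >> (item >>= \b -> return (a, b)))

main :: IO ()
main = do
  print (parse pDo "abcd")         -- [(('a','c'),"d")]
  print (parse pDesugared "abcd")  -- [(('a','c'),"d")]
  print (parse pDo "ab")           -- []
```

The middle item still consumes 'b' even though its result is discarded, because >> is defined through >>= and so threads the leftover input in the same way.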
Also note that the code
\cs -> case cs of
         "" -> []
         (c:cs) -> [(c,cs)]
is misleading since it is defining variable cs twice: once in the \cs and once in the
pattern (c:cs). It is equivalent to
\cs -> case cs of
         "" -> []
         (x:xs) -> [(x,xs)]
This clarifies that the final String in the output is not the original cs, but rather its tail xs.
In a comment, the poster wondered why the three uses of item do not return the same result, i.e., why in return (a,b) the character a is not equal to b. This is due to the >>= monadic operator, which in this Parser monad automatically feeds the output string xs of each item occurrence to the next one. Indeed, the whole point of this monad is to help feed the "leftover" output of each parser as the "to-be-consumed" input of the next one. This has two advantages: it frees the programmer from having to write code to pass this string around, and it ensures that the string is not accidentally "rewound" to a previous state. To illustrate the latter point, here's some wrong code:
let [(c1,s1)] = someParser someInitialString
    [(c2,s2)] = anotherParser1 s1
    [(c3,s3)] = anotherParser2 s2
    [(c4,s4)] = anotherParser3 s3
    [(c5,s5)] = anotherParser4 s2 -- Whoops! Should have been s4
in [c1,c2,c3,c4,c5]
In the last step the string, after having been consumed multiple times, is wrongly rolled back to a previous state, as if the parsers anotherParser2 and anotherParser3 did not consume anything at all. This error is prevented by composing parsers through >>= instead.
I'll try shedding some more light regarding >>.
As you see in the other answer, you should desugar the do's into >>= to better understand what's going on.
Let's for example write a parser that parses two chars and returns them.
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  j <- item
  return (i,j)
Now, desugar the do syntax:
twoChars :: Parser (Char,Char)
twoChars =
  item >>= (\i ->
  item >>= (\j ->
  return (i,j) ) )
I put brackets for clarity. As you see, the second item receives the result of the first item parser in the anonymous function, with the result bound to i. The >>= function takes a parser, a function, and returns a parser. Best way to understand it would be to plug it into the definition:
f = \i -> item >>= \j -> return (i,j)
twoChars = item >>= f
twoChars = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse item cs])
So we got back a new Parser. Try to imagine what it will do on an input "abc". cs is bound to "abc", and the item Parser is used to get back [('a',"bc")]. Now, we apply f to 'a', to get back the new parser:
item >>= \j -> return ('a',j)
This parser will be passed the rest of our string left to process ("bc"), and it will use the item parser to get out the 'b', so the \j above is bound to 'b'. We then get a return ('a','b') statement, which puts ('a','b') into a parser that just returns ('a','b').
I hope this clears up how the information flow happens. Now, suppose that you want to ignore a character. You could do it like this.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \j ->
  item >>= \k ->
  return (i,k)
It's fine that j is bound to 'b' for the example "abc"; you never use it. So we can replace j with _.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \_ ->
  item >>= \k ->
  return (i,k)
But we also know that >> :: m a -> m b -> m b can be defined as:
p >> q = p >>= \_ -> q
So we are left with
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>
  item >>= \k ->
  return (i,k)
Finally, you can sugar this back into do. The application of >> simply sugars into a single-line statement with no binding. It results in:
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  item
  j <- item
  return (i,j)
Hope this cleared some things up.
The more uniform translation of your
p3 = do { a <- item; item; b <- item; return (a,b)}
-- do { a <- item; z <- item; b <- item; return (a,b)} -- z is ignored
is
p3 = item >>= (\a ->
       item >>= (\z ->
         item >>= (\b ->
           return (a,b)))) -- z is unused
(the key observation here is that the functions are nested). Which means that
-- parse (return a) cs = [(a,cs)]
-- parse (p >>= f) cs = [r | (a,cs1) <- parse p cs,   -- concat
--                           r <- parse (f a) cs1] )   -- inlined !
parse p3 cs
  = [ r | (a,cs1) <- parse item cs,
          r <- [ r | (z,cs2) <- parse item cs1,
                     r <- [ r | (b,cs3) <- parse item cs2,
                                r <- -- parse (return (a,b)) cs3
                                     [((a,b),cs3)]]]] -- z is unused
  = [ ((a,b),cs3) | (a,cs1) <- parse item cs,
                    (_,cs2) <- parse item cs1,
                    (b,cs3) <- parse item cs2]
So you see, "the input string" does change: first it's cs, then cs1, then cs2.
That is the simple real computation behind all the Parser tags and do syntax. It's all just about the chaining of inputs and outputs in the nested loops, in the end:
parse p3 cs =
  for each (a,cs1) in (parse item cs):
    for each (z,cs2) in (parse item cs1):
      for each (b,cs3) in (parse item cs2):
        yield ((a,b),cs3)
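The nested loops above can be written as ordinary Haskell with no Parser wrapper at all. A minimal sketch, where itemF, p3Explicit and remainders are hypothetical names for the unwrapped item, the unwrapped p3, and a helper that exposes the intermediate inputs:

```haskell
-- item without the Parser wrapper: consume one character, return the rest.
itemF :: String -> [(Char, String)]
itemF ""     = []
itemF (c:cs) = [(c, cs)]

-- p3 with the input threaded by hand: cs, then cs1, then cs2.
p3Explicit :: String -> [((Char, Char), String)]
p3Explicit cs =
  [ ((a, b), cs3)
  | (a, cs1) <- itemF cs
  , (_, cs2) <- itemF cs1
  , (b, cs3) <- itemF cs2
  ]

-- The leftover input after each of the three steps.
remainders :: String -> [(String, String, String)]
remainders cs =
  [ (cs1, cs2, cs3)
  | (_, cs1) <- itemF cs
  , (_, cs2) <- itemF cs1
  , (_, cs3) <- itemF cs2
  ]

main :: IO ()
main = do
  print (p3Explicit "abcd")  -- [(('a','c'),"d")]
  print (remainders "abcd")  -- [("bcd","cd","d")]
```

Each generator reads the string produced by the one before it, which is exactly the plumbing that >>= hides.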

Resources