let sub4 a b c d =
a-b-c-d;
printfn"%d" sub4;
I'm thinking there are 14 tokens in first one and 5 tokens in second one.
Another confusion is "sub4","a" here is identifier ,"printfn" is keyword, and "%d" is literal ,right?
Thank you for helping me!
I think the counting will somewhat depend on what you mean by a "token".
The most over-engineered solution to this answer is to use the F# compiler itself to parse the source code and report what it parses.
The following invokes the F# parser and reports the tokens that the parser sees when it looks at your code:
#r "FSharp.Compiler.Service.dll"
open FSharp.Compiler.SourceCodeServices
let sourceTok = FSharpSourceTokenizer([], Some "C:\\test.fsx")
let tokenizer = sourceTok.CreateLineTokenizer("""let sub4 a b c d =
a-b-c-d;
printfn"%d" sub4;""")
Seq.unfold (fun state ->
let res, state = tokenizer.ScanToken(state)
res |> Option.map (fun res -> res, state) ) FSharpTokenizerLexState.Initial
|> Seq.filter (fun t -> t.TokenName <> "WHITESPACE")
|> Seq.iter (fun t -> printf "%s, " t.TokenName)
The tokenizer counts whitespace as tokens, but I assume you want to skip over those, so I added a bit of code to filter those out. The result is (formatted to match the source code lines):
LET, IDENT, IDENT, IDENT, IDENT, IDENT, EQUALS
IDENT, MINUS, IDENT, MINUS, IDENT, MINUS, IDENT, SEMICOLON
IDENT, STRING_TEXT, STRING_TEXT, STRING_TEXT, STRING, IDENT, SEMICOLON
This should give you a good idea about what the F# compiler considers as a token.
The F# parser treats semicolon as a token - it's a valid token in the language, because it's used to separate expressions, but you may or may not want to count it.
The F# tokenizer is a bit confused by the string "%d" and parses this as a couple of separate STRING_TEXT and STRING tokens - but this is just a technical aspect of the tokenizer, so I would count all of those as just a single token.
Related
I am calling the string method "contains" in a lambda function, and would like to negate it. I thought this could be done with not myString.Contains("abbr") but it gives me the error
Successive arguments should be separated by spaces or tupled, and arguments involving function or method applications should be parenthesized
My actual function is this
open System.IO
let createWordArray filePath =
File.ReadLines(filePath)
|> Seq.filter (fun line -> line <> "")
|> Seq.filter (fun line -> not line.Contains("abbr.")) // Error occurs here
|> Seq.map (fun line -> line.Split(' ').[0])
|> Seq.filter (fun word -> word.StartsWith("-") || word.EndsWith("-"))
|> Seq.toArray
Please point out any other obvious mistakes I'm making.
You just need to add parentheses around the argument of the not function:
|> Seq.filter (fun line ->
not (line.Contains("abbr.")))
Without the parentheses, the compiler is interpreteing your code as a call to not with two arguments:
not (line.Contains) ("abbr.")
F# syntax is not like C# (or C, or C++, or Java)
In particular, F# does not use parentheses for passing function arguments. Instead, F# uses whitespace for that:
let x = f y z
You are, of course, free to enclose any terms in parentheses if you wanted to indicate the order of operations, or just for aesthetic reasons:
let x = f (y+5) z // parens for order of operations
let x = f (y) (z) // parens just for the heck of it
So you see, when you write:
line.Contains("abbr.")
There is no special meaning to the parens. You could just as well write this:
line.Contains "abbr."
It would be equivalent.
See what's happening? Not yet? Well, ok, let's try to add the not to the mix:
not line.Contains "abbr."
Is it clearer now? This looks like you're trying to call the not function, and you're giving it two arguments: first argument is line.Contains, and the second argument is "abbr."
This is not what you meant, right? What you meant was probably to first call line.Contains passing it "abbr " as argument, and then pass the result of that to not
The most straightforward way to do this is to use parentheses to indicate the order of operations:
not (line.Contains "abbr.")
Or, alternatively, you could use operator <|, which is intended specifically for this kind of thing. It just passes a parameter to a function, so pretty much does nothing. But its point is that it's an operator, so it's precedence is lower than a function call:
not <| line.Cobtains "abbr."
In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (\dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail [] = Parser (\dvs -> NoParse (nullError dvs))
fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
[] -> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat(). Its implementation is almost a drop-in replacement for pyparsing's _parse method (renamed to _parseNoCache), with a _parseCache method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat(), or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
The example code below appears to work nicely:
open FParsec
let capitalized : Parser<unit,unit> =(asciiUpper >>. many asciiLower >>. eof)
let inverted : Parser<unit,unit> =(asciiLower >>. many asciiUpper >>. eof)
let capsOrInvert =choice [capitalized;inverted]
You can then do:
run capsOrInvert "Dog";;
run capsOrInvert "dOG";;
and get a success or:
run capsOrInvert "dog";;
and get a failure.
Now that I have a ParserResult, how do I do things with it? For example, print the string backwards?
There are several notable issues with your code.
First off, as noticed in #scrwtp's answer, your parser returns unit. Here's why: operator (>>.) returns only the result returned by the right inner parser. On the other hand, (.>>) would return the result of a left parser, while (.>>.) would return a tuple of both left and right ones.
So, parser1 >>. parser2 >>. eof is essentially (parser1 >>. parser2) >>. eof.
The code in parens completely ignores the result of parser1, and the second (>>.) then ignores the entire result of the parser in parens. Finally, eof returns unit, and this value is being returned.
You may need some meaningful data returned instead, e.g. the parsed string. The easiest way is:
let capitalized = (asciiUpper .>>. many asciiLower .>> eof)
Mind the operators.
The code for inverted can be done in a similar manner.
This parser would be of type Parser<(char * char list), unit>, a tuple of first character and all the remaining ones, so you may need to merge them back. There are several ways to do that, here's one:
let mymerge (c1: char, cs: char list) = c1 :: cs // a simple cons
let pCapitalized = capitalized >>= mymerge
The beauty of this code is that your mymerge is a normal function, working with normal char's, it knows nothing about parsers or so. It just works with the data, and (>>=) operator does the rest.
Note, pCapitalized is also a parser, but it returns a single char list.
Nothing stops you from applying further transitions. As you mentioned printing the string backwards:
let pCapitalizedAndReversed =
capitalized
>>= mymerge
>>= List.rev
I have written the code in this way for purpose. In different lines you see a gradual transition of your domain data, still within the paradigm of Parser. This is an important consideration, because any subsequent transition may "decide" that the data is bad for some reason and raise a parsing exception, for example. Or, alternatively, it may be merged with other parser.
As soon as your domain data (a parsed-out word) is complete, you extract the result as mentioned in another answer.
A minor note. choice is superfluous for only two parsers. Use (<|>) instead. From experience, careful choosing parser combinators is important because a wrong choice deep inside your core parser logic can easily make your parsers dramatically slow.
See FParsec Primitives for further details.
ParserResult is a discriminated union. You simply match the Success and Failure cases.
let r = run capsOrInvert "Dog"
match r with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
But this is probably not what you find tricky about your situation.
The thing about your Parser<unit, unit> type is that the parsed value is of type unit (the first type argument to Parser). What this means is that this parser doesn't really produce any sensible output for you to use - it can only tell you whether it can parse a string (in which case you get back a Success ((), _, _) - carrying the single value of type unit) or not.
What do you expect to get out of this parser?
Edit: This sounds close to what you want, or at least you should be able to pick up some pointers from it. capitalized accepts capitalized strings, inverted accepts capitalized strings that have been reversed and reverses them as part of the parser logic.
let reverse (s: string) =
System.String(Array.rev (Array.ofSeq s))
let capitalized : Parser<string,unit> =
(asciiUpper .>>. manyChars asciiLower)
|>> fun (upper, lower) -> string upper + lower
let inverted : Parser<string,unit> =
(manyChars asciiLower .>>. asciiUpper)
|>> fun (lower, upper) -> reverse (lower + string upper)
let capsOrInvert = choice [capitalized;inverted]
run capsOrInvert "Dog"
run capsOrInvert "doG"
run capsOrInvert "dog"
Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust, your parser has problems with superfluous parentheses, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
= case parse (withremaining $ needsParens (prec >= threshold) parsecParser) "" input of
Left _ -> []
Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.
Given a LL(1) grammar what is an appropriate data structure or algorithm for producing an immutable concrete syntax tree in a functionally pure manner? Please feel free to write example code in whatever language you prefer.
My Idea
symbol : either a token or a node
result : success or failure
token : a lexical token from source text
value -> string : the value of the token
type -> integer : the named type code of the token
next -> token : reads the next token and keeps position of the previous token
back -> token : moves back to the previous position and re-reads the token
node : a node in the syntax tree
type -> integer : the named type code of the node
symbols -> linkedList : the symbols found at this node
append -> symbol -> node : appends the new symbol to a new copy of the node
Here is an idea I have been thinking about. The main issue here is handling syntax errors.
I mean I could stop at the first error but that doesn't seem right.
let program token =
sourceElements (node nodeType.program) token
let sourceElements node token =
let (n, r) = sourceElement (node.append (node nodeType.sourceElements)) token
match r with
| success -> (n, r)
| failure -> // ???
let sourceElement node token =
match token.value with
| "function" ->
functionDeclaration (node.append (node nodeType.sourceElement)) token
| _ ->
statement (node.append (node nodeType.sourceElement)) token
Please Note
I will be offering up a nice bounty to the best answer so don't feel rushed. Answers that simply post a link will have less weight over answers that show code or contain detailed explanations.
Final Note
I am really new to this kind of stuff so don't be afraid to call me a dimwit.
You want to parse something into an abstract syntax tree.
In the purely functional programming language Haskell, you can use parser combinators to express your grammar. Here an example that parses a tiny expression language:
EDIT Use monadic style to match Graham Hutton's book
-- import a library of *parser combinators*
import Parsimony
import Parsimony.Char
import Parsimony.Error
(+++) = (<|>)
-- abstract syntax tree
data Expr = I Int
| Add Expr Expr
| Mul Expr Expr
deriving (Eq,Show)
-- parse an expression
parseExpr :: String -> Either ParseError Expr
parseExpr = Parsimony.parse pExpr
where
-- grammar
pExpr :: Parser String Expr
pExpr = do
a <- pNumber +++ parentheses pExpr -- first argument
do
f <- pOp -- operation symbol
b <- pExpr -- second argument
return (f a b)
+++ return a -- or just the first argument
parentheses parser = do -- parse inside parentheses
string "("
x <- parser
string ")"
return x
pNumber = do -- parse an integer
digits <- many1 digit
return . I . read $ digits
pOp = -- parse an operation symbol
do string "+"
return Add
+++
do string "*"
return Mul
Here an example run:
*Main> parseExpr "(5*3)+1"
Right (Add (Mul (I 5) (I 3)) (I 1))
To learn more about parser combinators, see for example chapter 8 of Graham Hutton's book "Programming in Haskell" or chapter 16 of "Real World Haskell".
Many parser combinator library can be used with different token types, as you intend to do. Token streams are usually represented as lists of tokens [Token].
Definitely check out the monadic parser combinator approach; I've blogged about it in C# and in F#.
Eric Lippert's blog series on immutable binary trees may be helpful. Obviously, you need a tree which is not binary, but it will give you the general idea.