I'm currently working on a language and have run into an issue with variable declarations: to check what value a variable holds in the REPL, I have to put quotation marks around its name.
> x := 1
> "x"
1
The desired behaviour, however, would be the following:
> x := 1
> x
1
I have defined my ADT in the following way:
data S = Integer Integer | String String | Assign String S | Expr Add [S]
I can parse everything correctly.
parseString :: Parser HVal
parseString = char '"' *> many1 (noneOf "\"") <* char '"' >>= (return . String)
parseAssign :: Parser HVal
parseAssign = do
  var <- many1 letter
  spaces
  string ":="
  spaces
  val <- try parsersHVal
  spaces
  return $ Assign var val
I think, however, that the problem is down to the evaluation functions.
evalHVal :: Env -> HVal -> IOThrowsError HVal
evalHVal env val@(Integer _) = return $ val
evalHVal env val@(String _) = return $ val
evalHVal env (String val) = getVar env val >>= \var -> evalHVal env var
If I keep the first line that evaluates a string, the following occurs in the REPL and I receive a warning that the second line is redundant:
> x := 1
> "x"
'x'
If I keep the second line however I get the behaviour as described from the beginning.
In both cases, quotation marks have to be placed around the variable in order to evaluate it. I recognise, though, that I use many1 letter rather than parseString in parseAssign. I have tried changing this to parseString, but I get the same behaviour.
What confuses me the most, however, is that since everything is read in as a string, why doesn't many1 letter require quotation marks in parseAssign the way parseString does? I tried changing parseString to (many1 letter >>= (return . String)), but then it neither assigns nor allows the use of strings as before.
Should variables and strings be treated differently in constructing a language?
Yes
data S = ...
Should be:
data S = ... | Var String
What confuses me the most however is that since everything is read in as a string, then why doesn't many1 letter require quotations in parseAssign like how parseString requires?
That should be obvious. See the definition:
parseString = char '"' *> ...
The very first part of parseString clearly looks to parse char '"'.
The definition of parseAssign does not look for ".
I tried changing parseString to the following (many1 letter >>= (return . String)) but it neither assigns nor allows for the use of strings like before.
Well " is not a letter so it shouldn't/wouldn't allow for a quotation mark. More, it wouldn't "assign", whatever that verb means, because it lacks the Assign constructor along with all the other critical parts of parseAssign.
Variables and strings should be considered differently. Although they're both strings in the source language, they cannot be treated identically as they are distinct within the data model of the language.
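To make that concrete, here is a minimal sketch of how the new Var case could slot into the question's parser and evaluator. The question mixes the type names S and HVal, so HVal is used below, and parseInteger and the exact shape of parsersHVal are assumptions rather than code from the question:
parseVar :: Parser HVal
parseVar = Var <$> many1 letter   -- a bare identifier is a variable reference, not a string literal

parsersHVal :: Parser HVal
parsersHVal = try parseAssign     -- try, because an assignment also begins with letters
          <|> parseInteger        -- assumed to exist in the question's parser
          <|> parseString
          <|> parseVar

evalHVal :: Env -> HVal -> IOThrowsError HVal
evalHVal env val@(Integer _) = return val
evalHVal env val@(String _)  = return val
evalHVal env (Var name)      = getVar env name >>= evalHVal env
With something along these lines, typing x at the prompt parses as Var "x" and is evaluated by looking it up with getVar, so no quotation marks are needed.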
Related
I am trying to write a parser for strings such as x and A (i.e. single letters), and 657 and 0 (i.e. non-negative integers).
Here is the code I wrote.
import Text.Parsec
data Expression = String String | Number Int
value = letter <|> many1 digit
However I get the following error.
Couldn't match type ‘[Char]’ with ‘Char’
How to convert Char -> String inside the parser?
What should the type annotation be for value ?
letter parses just a single letter and returns a Char. You want to parse a String, namely [Char] (it's the same thing), so I guess you want to parse many letter?
But if you want to parse just a single letter as a String, you can take advantage of the fact that Parsec s u has a Functor instance in order to map over its result and pack it in a list:
value :: Parsec s u String
value = fmap (:[]) letter <|> many1 digit
After the edit I guess you want to parse the Expression you have presented to us, so you will need some more fancy fmapping to wrap the results in proper constructors:
value :: Parsec s u Expression
value = fmap (String . (:[])) letter
<|> fmap (Number . read) (many1 digit)
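For a quick check in GHCi, one way is to pin the stream type down with the Parser alias and derive Show; the deriving clause and the parseTest calls below are my additions, not part of the answer:
import Text.Parsec
import Text.Parsec.String (Parser)

data Expression = String String | Number Int
  deriving Show

value :: Parser Expression
value = fmap (String . (:[])) letter
    <|> fmap (Number . read) (many1 digit)

main :: IO ()
main = do
  parseTest value "x"    -- prints: String "x"
  parseTest value "657"  -- prints: Number 657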
I'm taking a Haskell course at school, and I have to define a Logical Proposition datatype in Haskell. Everything so far works fine (definition and functions), and I've declared it as an instance of Ord, Eq and Show. The problem comes when I'm required to define a program which interacts with the user: I have to parse the user's input into my datatype:
type Var = String
data FProp = V Var
           | No FProp
           | Y FProp FProp
           | O FProp FProp
           | Si FProp FProp
           | Sii FProp FProp
where the formula: ¬q ^ p would be: (Y (No (V "q")) (V "p"))
I've been researching, and found that I can declare my datatype as an instance of Read.
Is this advisable? If it is, can I get some help in order to define the parsing method?
Not a complete answer, since this is a homework problem, but here are some hints.
The other answer suggested getLine followed by splitting at words. It sounds like you instead want something more like a conventional tokenizer, which would let you write things like:
(Y
(No (V q))
(V p))
Here’s one implementation that turns a string into tokens that are either a string of alphanumeric characters or a single, non-alphanumeric printable character. You would need to extend it to support quoted strings:
import Data.Char
type Token = String
tokenize :: String -> [Token]
{- Here, a token is either a string of alphanumeric characters, or else one
 - non-spacing printable character, such as "(" or ")".
 -}
tokenize [] = []
tokenize (x:xs) | isSpace x = tokenize xs
                | not (isPrint x) = error $
                    "Invalid character " ++ show x ++ " in input."
                | not (isAlphaNum x) = [x]:(tokenize xs)
                | otherwise = let (token, rest) = span isAlphaNum (x:xs)
                              in token:(tokenize rest)
It turns the example into ["(","Y","(","No","(","V","q",")",")","(","V","p",")",")"]. Note that you have access to the entire repertoire of Unicode.
The main function that evaluates this interactively might look like:
main = interact ( unlines . map show . map evaluate . parse . tokenize )
Where parse turns a list of tokens into a list of ASTs and evaluate turns an AST into a printable expression.
As for implementing the parser, your language appears to have similar syntax to LISP, which is one of the simplest languages to parse; you don’t even need precedence rules. A recursive-descent parser could do it, and is probably the easiest to implement by hand. You can pattern-match on parse ("(":xs) =, but pattern-matching syntax can also implement lookahead very easily, for example parse ("(":x1:xs) = to look ahead one token.
If you’re calling the parser recursively, you would define a helper function that consumes only a single expression, and that has a type signature like :: [Token] -> (AST, [Token]). This lets you parse the inner expression, check that the next token is ")", and proceed with the parse. However, externally, you’ll want to consume all the tokens and return an AST or a list of them.
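For illustration, here is a minimal sketch of such a helper for the FProp type above; parseOne and expect are hypothetical names, error handling is deliberately crude, and only some of the constructors are covered:
parseOne :: [Token] -> (FProp, [Token])
parseOne ("(":"No":ts) =
  let (p, rest) = parseOne ts
  in  (No p, expect ")" rest)
parseOne ("(":"Y":ts) =
  let (p, rest1) = parseOne ts
      (q, rest2) = parseOne rest1
  in  (Y p q, expect ")" rest2)
parseOne ("(":"V":v:ts) = (V v, expect ")" ts)
parseOne ts = error ("Unexpected input: " ++ show ts)

-- Consume one expected token, such as a closing parenthesis.
expect :: Token -> [Token] -> [Token]
expect t (t':ts) | t == t' = ts
expect t ts = error ("Expected " ++ show t ++ " before " ++ show ts)
Running parseOne on the token list from the earlier example returns (Y (No (V "q")) (V "p")) together with an empty list of remaining tokens.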
The stylish way to write a parser is with monadic parser combinators. (And maybe someone will post an example of one.) The industrial-strength solution would be a library like Parsec, but that’s probably overkill here. Still, parsing is (mostly!) a solved problem, and if you just want to get the assignment done on time, using a library off the shelf is a good idea.
The read part of a REPL interpreter typically looks like this:
repl :: ForthState -> IO () -- parser definition
repl state
  = do putStr "> " -- puts a > character to indicate it's waiting for input
       input <- getLine -- this is what you're looking for, to read a line.
       if input == "quit" -- allows user to quit the interpreter
         then do putStrLn "Bye!"
                 return ()
         else let (is, cs, d, output) = eval (words input) state -- your grammar definition is somewhere down the chain when eval is called on input
              in do mapM_ putStrLn output
                    repl (is, cs, d, [])

main = do putStrLn "Welcome to your very own interpreter!"
          repl initialForthState -- runs the parser, starting with read
Your eval method will have various loops, stack manipulations, conditionals, etc. to actually figure out what the user typed. Hope this helps you with at least the reading-input part.
I am trying to parse some comma separated string which may or may not contain a string with image dimensions. For example "hello world, 300x300, good bye world".
I've written the following little program:
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec
import qualified Text.Parsec.Text as PS
parseTestString :: Text -> [Maybe (Int, Int)]
parseTestString s = case parse dimensStringParser "" s of
  Left _ -> [Nothing]
  Right dimens -> dimens

dimensStringParser :: PS.Parser [Maybe (Int, Int)]
dimensStringParser = (optionMaybe dimensParser) `sepBy` (char ',')

dimensParser :: PS.Parser (Int, Int)
dimensParser = do
  w <- many1 digit
  char 'x'
  h <- many1 digit
  return (read w, read h)
main :: IO ()
main = do
  print $ parseTestString "300x300,40x40,5x5"
  print $ parseTestString "300x300,hello,5x5,6x6"
According to optionMaybe documentation, it returns Nothing if it can't parse, so I would expect to get this output:
[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing, Just (5,5), Just (6,6)]
but instead I get:
[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing]
I.e. parsing stops after first failure. So I have two questions:
Why does it behave this way?
How do I write a correct parser for this case?
In order to answer this question, it's handy to take a piece of paper, write down the input, and act as a dumb parser.
We start with "300x300,hello,5x5,6x6"; our current parser is optionMaybe .... Does our dimensParser correctly parse the dimension? Let's check:
w <- many1 digit -- yes, "300"
char 'x' -- yes, "x"
h <- many1 digit -- yes, "300"
return (read w, read h) -- never fails
We've successfully parsed the first dimension. The next token is ,, so sepBy successfully parses that as well. Next, we try to parse "hello" and fail:
w <- many1 digit -- no. 'h' is not a digit. Stop
Next, sepBy tries to parse ,, but that's not possible, since the next token is a 'h', not a ,. Therefore, sepBy stops.
We haven't parsed all the input, but that's not actually necessary. You would get a proper error message if you had used
parse (dimensStringParser <* eof)
Either way, if you want to discard anything in the list that's not a dimension, you can use
dimensStringParser1 :: Parser (Maybe (Int, Int))
dimensStringParser1 = (Just <$> dimensParser) <|> (skipMany (noneOf ",") >> return Nothing)

dimensStringParser = dimensStringParser1 `sepBy` char ','
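With that change, and the rest of the question's program left as it is, I would expect the second test case to now produce the output the question was after:
> parseTestString "300x300,hello,5x5,6x6"
[Just (300,300),Nothing,Just (5,5),Just (6,6)]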
I'd guess that optionMaybe dimensParser, when fed with input "hello,...", tries dimensParser. That fails, so optionMaybe returns success with Nothing, and consumes no portion of the input.
The last part is the crucial one: after Nothing is returned, the input string to be parsed is still "hello,...".
At that point sepBy tries to parse char ',', which fails. So, it deduces that the list is over, and terminates the output list, without consuming any more input.
If you want to skip other entities, you need a "consuming" parser that returns Nothing instead of optionMaybe. That parser, however, need to know how much to consume: in your case, until the comma.
Perhaps you need something like (untested)
(try (Just <$> dimensParser)
  <|> (many1 (noneOf ",") >> return Nothing))
  `sepBy` char ','
I am trying to parse a file that looks like:
a b c
f e d
I want to match each of the symbols in the line and parse everything into a list of lists such as:
[[A, B, C], [D, E, F]]
In order to do that I tried the following:
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as P
parserP :: Parser [[MyType]]
parserP = do
  x <- rowP
  xs <- many (newline >> rowP)
  return (x : xs)
rowP :: Parser [MyType]
rowP = manyTill cellP $ void newline <|> eof
cellP :: Parser (Cell Color)
cellP = aP <|> bP <|> ... -- rest of the parsers, they all look very similar
aP :: Parser MyType
aP = symbol "a" >> return A
bP :: Parser MyType
bP = symbol "b" >> return B
lexer = P.makeTokenParser emptyDef
symbol = P.symbol lexer
But it fails to return multiple inner lists. Instead what I get is:
[[A, B, C, D, E, F]]
What am I doing wrong? I was expecting manyTill to parse cellP until the newline character, but that's not the case.
Parser combinators are overkill for something this simple. I'd use lines :: String -> [String] and words :: String -> [String] to break up the input and then map the individual tokens into MyTypes.
toMyType :: String -> Maybe MyType
toMyType "a" = Just A
toMyType "b" = Just B
toMyType "c" = Just C
toMyType _ = Nothing
parseMyType :: String -> Maybe [[MyType]]
parseMyType = traverse (traverse toMyType) . fmap words . lines
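A quick sanity check of that approach in GHCi, assuming MyType has constructors A, B and C and derives Show:
> parseMyType "a b c\nc b a"
Just [[A,B,C],[C,B,A]]
> parseMyType "a b z"
Nothing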
You're right that manyTill keeps parsing until a newline. But manyTill never gets to see the newline because cellP is too eager. cellP ends up calling P.symbol, whose documentation states
symbol :: String -> ParsecT s u m String
Lexeme parser symbol s parses string s and skips trailing white space.
The keyword there is 'white space'. It turns out, Parsec defines whitespace as being any character which satisfies isSpace, which includes newlines. So P.symbol is happily consuming the c, followed by the space and the newline, and then manyTill looks and doesn't see a newline because it's already been consumed.
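You can see this directly in GHCi with the question's lexer and symbol in scope; getInput shows what is left after symbol has run, and the newline is already gone (this check is mine, not from the original answer):
> parseTest (symbol "a" >> getInput) "a\nb c"
"b c"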
If you want to drop the Parsec routine, go with Benjamin's solution. But if you're determined to stick with that, the basic idea is that you want to modify the language's whiteSpace field to correctly define whitespace to not be newlines. Something like
lexer = let lexer0 = P.makeTokenParser emptyDef
        in lexer0 { whiteSpace = void $ many (oneOf " \t") }
That's pseudocode and probably won't work for your specific case, but the idea is there. You want to change the definition of whiteSpace to be whatever you want to define as whiteSpace, not what the system defines by default. Note that changing this will also break your comment syntax, if you have one defined, since whiteSpace was previously equipped to handle comments.
In short, Benjamin's answer is probably the best way to go. There's no real reason to use Parsec here. But it's also helpful to know why this particular solution didn't work: Parsec's default definition of a language wasn't designed to treat newlines with significance.
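For reference, a version of that idea that I would expect to typecheck against the question's imports (the field has to be qualified because the Token module was imported qualified; void comes from Control.Monad, which is already imported) looks like this, with the same comment-syntax caveat as above:
lexer = let lexer0 = P.makeTokenParser emptyDef
        in lexer0 { P.whiteSpace = void (many (oneOf " \t")) }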
I am working on building a parser in Haskell using parser combinators. I have an issue with parsing keywords such as "while", "true", "if", etc.
The issue I am facing is that after a keyword there is a requirement that there is a separator or whitespace. For example, in the statement
if cond then stat1 else stat2 fi;x = 1
with this statement, all keywords are followed by either a space or a semicolon. However, in different situations there can be different separators.
Currently I have implemented it as follows:
keyword :: String -> Parser String
keyword k = do
  kword <- leadingWS (string k)
  check (== ';') <|> check isSpace <|> check (== ',') <|> check (== ']')
  junk
  return kword
However, the problem with this keyword parser is that it will allow programs which have statements like if; cond then stat1 else stat2 fi.
We tried passing a (Char -> Bool) into keyword, which would then be passed to check. But this wouldn't work because, at the point where we parse the keyword, we don't know what kind of separator is allowed.
I was wondering if I could have some help with this issue?
Don't try to handle the separators in keyword; instead, ensure that keyword "if" will not be confused with an identifier like "iffy" (see the comment by sepp2k).
keyword :: String -> Parser String
keyword k = leadingWS $ try (do kw <- string k
                                notFollowedBy alphanum
                                return kw)
Handling separators for statements would go like this:
statements = statement `sepBy` semi
statement = ifStatement <|> assignmentStatement <|> ...
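For example, an if-statement parser built on top of keyword might look roughly like this; expression, statements and the If constructor are placeholders for whatever the question's grammar actually uses:
ifStatement = do keyword "if"
                 cond  <- expression
                 keyword "then"
                 stat1 <- statements
                 keyword "else"
                 stat2 <- statements
                 keyword "fi"
                 return (If cond stat1 stat2)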