How to convert string to a list of integers?


I would like to be able to input a sequence of integers on one line, such as:
97, 128, 125, 17, 2
and have the Haskell program convert the input into a list of integers, such as:
[97, 128, 125, 17, 2]
so that I can do some math operations, like zipping the list with another list of integers using zipWith. I'm having trouble with this: I tried using the read and words functions, but I wasn't able to achieve the expected result. Any ideas?

One possible (again, quick'n'dirty) solution is to use read with the instance defined for lists, which expects strings in the format [item1, item2, item3...]:
convert :: String -> [Int]
convert s = read $ "[" ++ s ++ "]"
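If you'd rather get Nothing than a runtime exception on malformed input, the same bracket-wrapping trick works with readMaybe from Text.Read in base. This is a minimal sketch, and the name convertMaybe is only for illustration:

```haskell
import Text.Read (readMaybe)

-- Same trick as convert: wrap the input in brackets and let the list
-- Read instance do the work. Malformed input gives Nothing instead of
-- a "Prelude.read: no parse" error.
convertMaybe :: String -> Maybe [Int]
convertMaybe s = readMaybe ("[" ++ s ++ "]")
```

For example, convertMaybe "97, x" and convertMaybe "97 128" both give Nothing.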
A more robust solution would be parsing with filter or similar (as shown in the other answer) or using a parsing library to do the job properly.

The problem with only using words is that the comma (,) will still be included.
A quick-and-dirty hack is probably to first map every non-digit character to a space:
import Data.Char (isDigit)
-- keep digits, turn every other character into a space
cnv :: Char -> Char
cnv x | isDigit x = x
      | otherwise = ' '
and then use:
map read . words . map cnv :: Read b => [Char] -> [b]
demo
*Main> (map read . words . map cnv) "97, 128, 125, 17, 2" :: [Int]
[97,128,125,17,2]
A potential problem is, of course, that this discards every other character, such as letters or a minus sign (so -5 would be read as 5). Furthermore, this approach is not the most efficient.
An advantage is that, because read is polymorphic, the resulting stream of "words" can be read into any type with a Read instance, not just Int.
Why not filtering?
One can also use filter to keep, for instance, only spaces and digits (isSpace also comes from Data.Char):
map read . words . filter (\x -> isDigit x || isSpace x)
A potential problem is that the numbers may not be separated by spaces at all, only by commas (,), semicolons (;), etc. When spaces are present, the above expression gives the correct result:
(map read . words . filter (\x -> isDigit x || isSpace x)) "97, 128, 125, 17, 2" :: [Int]
[97,128,125,17,2]
but
(map read . words . filter (\x -> isDigit x || isSpace x)) "97,128,125,17,2" :: [Int]
[97128125172]
doesn't.
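One way around the separator problem, sketched here with base only, is to split on the commas explicitly and read each field with readMaybe. That way "97,128" and "97, 128" both work, and a bad field makes the whole result Nothing. The helper names splitOnComma and parseInts are hypothetical:

```haskell
import Text.Read (readMaybe)

-- Split a string on commas (base has no splitOn, so use break).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnComma rest

-- Read every comma-separated field as an Int. If any field is bad, the
-- whole result is Nothing. readMaybe tolerates surrounding whitespace and
-- a leading minus sign.
parseInts :: String -> Maybe [Int]
parseInts = traverse readMaybe . splitOnComma
```

Here traverse stops at the first field that fails to read, which gives the all-or-nothing behaviour.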

The task you're specifying falls under the category of textual parsing. When facing such a problem the safe bet is to approach it with either the "parsec" or the "attoparsec" library. Those libraries provide APIs which abstract over parsing in a safe and composable (hence scalable) way.
Here's how you'd write the "attoparsec" parser for your task:
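import Data.Attoparsec.Text (Parser, char, decimal, sepBy, skipSpace)
listOfInts :: Parser [Int]
listOfInts =
  sepBy decimal separator
  where
    -- a comma, optionally surrounded by whitespace
    separator =
      skipSpace *> char ',' *> skipSpace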
Note that this implementation already accepts input that is not perfectly formatted, where the separator may have multiple spaces, or none, before and after the comma. Also note how simply such a parser expresses this already fairly complicated condition.
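If you'd rather avoid the attoparsec dependency, roughly the same parser can be written with Text.ParserCombinators.ReadP from base. This swaps in a different library and is not attoparsec's API. The names decimalP, separatorP, listOfIntsP and parseIntList are illustrative, and decimalP only handles non-negative numbers:

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch1, readP_to_S, sepBy, skipSpaces)

-- An unsigned decimal number (no minus sign).
decimalP :: ReadP Int
decimalP = read <$> munch1 isDigit

-- A comma with any amount of whitespace (possibly none) on either side.
separatorP :: ReadP ()
separatorP = skipSpaces *> char ',' *> skipSpaces

listOfIntsP :: ReadP [Int]
listOfIntsP = sepBy decimalP separatorP

-- Run the parser, keeping only results that consume the whole input.
parseIntList :: String -> Maybe [Int]
parseIntList s =
  case [xs | (xs, "") <- readP_to_S (skipSpaces *> listOfIntsP <* skipSpaces <* eof) s] of
    [xs] -> Just xs
    _    -> Nothing
```

ReadP explores alternatives in parallel and returns every possible parse. This is why parseIntList anchors the parser with eof and keeps only the results that consumed the whole input.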

Thank you all for your help. For my application, this seems to work well:
myInput <- getLine
123 23 345 23
(map read . words) myInput::[Int]
I was having a little trouble understanding why the parentheses go where they do, but this seems to work too:
myInput <- getLine
234 34 235 465 34
map read $ words myInput::[Int]
Since I'm just using spaces to separate the numbers, I don't have to use the filter, but thanks for posting it because now I understand the syntax better.
Don
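On the parentheses question: function application binds more tightly than (.), so map read . words myInput would parse as map read . (words myInput) and fail to type-check. The composed function must either be parenthesized before it is applied, or applied with ($). The first three variants below are equivalent. The fourth swaps read for readMaybe so that a bad word gives Nothing instead of a crash. The names are only for illustration:

```haskell
import Text.Read (readMaybe)

-- Composition: build the function (map read . words), then apply it to s.
readInts1 :: String -> [Int]
readInts1 s = (map read . words) s

-- Application: ($) has the lowest precedence, so this is map read (words s).
readInts2 :: String -> [Int]
readInts2 s = map read $ words s

-- Point-free: the same function without naming the argument.
readInts3 :: String -> [Int]
readInts3 = map read . words

-- Safe variant: Nothing if any word is not an Int.
readIntsMaybe :: String -> Maybe [Int]
readIntsMaybe = traverse readMaybe . words
```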

Related

How to parse a keyword that is also an operator

I am trying to parse the following code using parsec
for x = Int in [1, 2, 3]
    print x + 1
The only part of the example that might be hard to understand is x = Int which means the variable x is defined as an Int. Syntactically Int here is an expression. It might just as well be replaced with a function call that returns a type.
So far I have been able to parse all the simple literals and operators. My problem now is that in this language "in" is a keyword as well as an operator, and types (Int) are objects like any other (they can be in lists). E.g. the following code is perfectly valid and prints false:
print (Int in [1, 2, 3])
So right now my parser parses for x = correctly, and then it parses Int in [1, 2, 3] as ONE expression. How can I make the for parser grab the "in" instead of leaving it to the expression parser? I have a feeling that parsec has something like that built in, but I have no idea how to find it.
Edit: I changed the example to make more sense...
Edit: I have this difficulty in various places; the language is very complex. Another example is the else operator, which returns its second argument if its first argument is null:
print (if true then (null else "hello") else "world")
# >> hello
print (if true then null else "hello" else "world")
# >> world

Thank you very much @talex and @n.m. for pointing me to where I had to look. This is how I solved this specific problem:
I parameterized the expression parser (I had to enable {-# LANGUAGE FlexibleContexts #-}) with a list of "eject" words, and likewise every relevant parser below it, specifically the binOperator parser:
expression :: [String] -> MyParser AST
binOperator :: [String] -> MyParser AST
If one of the "eject" words is encountered in the position of a binary operator, the binOperator parser fails (and with it the chainl1-based parser that reads binary operations). This leaves the eject word (in this case in) for the for parser to consume. This should work just as well with the if parser.
And I simply don't pass the eject words to the paren parser, so no eject words are recognized between ( and ) (or in similar parsers, such as the one for lists).
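Below is a minimal, self-contained sketch of the eject-word idea. It assumes the parsec package (Text.Parsec) and uses a toy expression language. The Expr type, the operator list, and every parser name are hypothetical stand-ins for the asker's real AST and MyParser. As in the answer, binOperator and expression take the eject list, and parentheses and lists reset it to []:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy AST (hypothetical, not the asker's).
data Expr
  = Var String
  | Num Integer
  | List [Expr]
  | BinOp String Expr Expr
  | For String Expr Expr
  deriving (Eq, Show)

-- Word-like binary operators of the toy language (an assumption).
operators :: [String]
operators = ["in", "else"]

symbol :: Char -> Parser ()
symbol c = char c *> spaces

identifier :: Parser String
identifier = many1 letter <* spaces

-- Match a whole keyword, so "in" does not match the start of "index".
keyword :: String -> Parser ()
keyword w = try (string w *> notFollowedBy letter) *> spaces

-- Fails without consuming input when the next word is not an operator or
-- is an eject word; chainl1 then stops and returns what it has so far.
binOperator :: [String] -> Parser (Expr -> Expr -> Expr)
binOperator eject = try $ do
  op <- identifier
  if op `elem` operators && op `notElem` eject
    then return (BinOp op)
    else unexpected ("word " ++ show op)

-- Parenthesized expressions and list literals get an empty eject list,
-- so "in" is an ordinary operator inside them.
atom :: Parser Expr
atom =
      between (symbol '(') (symbol ')') (expression [])
  <|> List <$> between (symbol '[') (symbol ']') (sepBy (expression []) (symbol ','))
  <|> Num . read <$> many1 digit <* spaces
  <|> Var <$> identifier

expression :: [String] -> Parser Expr
expression eject = chainl1 atom (binOperator eject)

-- for NAME = EXPR in EXPR: the first EXPR is told to leave "in" alone.
forLoop :: Parser Expr
forLoop = do
  keyword "for"
  name <- identifier
  symbol '='
  ty <- expression ["in"]
  keyword "in"
  iterable <- expression []
  return (For name ty iterable)

parseFor :: String -> Maybe Expr
parseFor = either (const Nothing) Just . parse (spaces *> forLoop <* eof) ""

parseExpr :: String -> Maybe Expr
parseExpr = either (const Nothing) Just . parse (spaces *> expression [] <* eof) ""
```

Because binOperator wraps its lookahead in try, rejecting an eject word costs no input, and chainl1 simply ends the expression there.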

Parse String to Datatype in Haskell

I'm taking a Haskell course at school, and I have to define a Logical Proposition datatype in Haskell. Everything so far works fine (definition and functions), and I've declared it as an instance of Ord, Eq and Show. The problem comes when I'm required to define a program that interacts with the user: I have to parse the user's input into my datatype:
type Var = String
data FProp = V Var
           | No FProp
           | Y FProp FProp
           | O FProp FProp
           | Si FProp FProp
           | Sii FProp FProp
where the formula: ¬q ^ p would be: (Y (No (V "q")) (V "p"))
I've been researching, and found that I can declare my datatype as an instance of Read.
Is this advisable? If it is, can I get some help in order to define the parsing method?

Not a complete answer, since this is a homework problem, but here are some hints.
The other answer suggested getLine followed by splitting at words. It sounds like you instead want something more like a conventional tokenizer, which would let you write things like:
(Y
  (No (V q))
  (V p))
Here’s one implementation that turns a string into tokens that are either a string of alphanumeric characters or a single, non-alphanumeric printable character. You would need to extend it to support quoted strings:
import Data.Char
type Token = String
tokenize :: String -> [Token]
{- Here, a token is either a string of alphanumeric characters, or else one
 - non-spacing printable character, such as "(" or ")".
 -}
tokenize [] = []
tokenize (x:xs) | isSpace x = tokenize xs
                | not (isPrint x) = error $
                    "Invalid character " ++ show x ++ " in input."
                | not (isAlphaNum x) = [x]:(tokenize xs)
                | otherwise = let (token, rest) = span isAlphaNum (x:xs)
                              in token:(tokenize rest)
It turns the example into ["(","Y","(","No","(","V","q",")",")","(","V","p",")",")"]. Note that you have access to the entire repertoire of Unicode.
The main function that evaluates this interactively might look like:
main = interact ( unlines . map show . map evaluate . parse . tokenize )
Where parse turns a list of tokens into a list of ASTs and evaluate turns an AST into a printable expression.
As for implementing the parser, your language appears to have similar syntax to LISP, which is one of the simplest languages to parse; you don’t even need precedence rules. A recursive-descent parser could do it, and is probably the easiest to implement by hand. You can pattern-match on parse ("(":xs) =, but pattern-matching syntax can also implement lookahead very easily, for example parse ("(":x1:xs) = to look ahead one token.
If you’re calling the parser recursively, you would define a helper function that consumes only a single expression, and that has a type signature like :: [Token] -> (AST, [Token]). This lets you parse the inner expression, check that the next token is ")", and proceed with the parse. However, externally, you’ll want to consume all the tokens and return an AST or a list of them.
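Here is a minimal sketch of such a helper, assuming a token list like the one tokenize produces above. It returns Maybe (FProp, [Token]) rather than a bare pair, so malformed input gives Nothing instead of an error. The names parseOne, expect, binary and parseFProp are hypothetical. The FProp type is repeated with derived Eq and Show so the block stands on its own:

```haskell
import Data.Char (isAlphaNum)

type Var = String
type Token = String

data FProp
  = V Var
  | No FProp
  | Y FProp FProp
  | O FProp FProp
  | Si FProp FProp
  | Sii FProp FProp
  deriving (Eq, Show)

-- Consume exactly one parenthesized expression from the front of the
-- token list and return it together with the leftover tokens.
parseOne :: [Token] -> Maybe (FProp, [Token])
parseOne ("(" : "V" : name : ")" : rest)
  | all isAlphaNum name = Just (V name, rest)
parseOne ("(" : "No" : rest) = do
  (p, rest1) <- parseOne rest
  rest2 <- expect ")" rest1
  return (No p, rest2)
parseOne ("(" : op : rest) = do
  ctor <- lookup op binary        -- one token of lookahead picks the constructor
  (p, rest1) <- parseOne rest
  (q, rest2) <- parseOne rest1
  rest3 <- expect ")" rest2
  return (ctor p q, rest3)
parseOne _ = Nothing

binary :: [(Token, FProp -> FProp -> FProp)]
binary = [("Y", Y), ("O", O), ("Si", Si), ("Sii", Sii)]

-- Check that the next token is the expected one and drop it.
expect :: Token -> [Token] -> Maybe [Token]
expect t (t' : rest) | t == t' = Just rest
expect _ _ = Nothing

-- Externally, insist that every token was consumed.
parseFProp :: [Token] -> Maybe FProp
parseFProp tokens = case parseOne tokens of
  Just (p, []) -> Just p
  _            -> Nothing
```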
The stylish way to write a parser is with monadic parser combinators. (And maybe someone will post an example of one.) The industrial-strength solution would be a library like Parsec, but that’s probably overkill here. Still, parsing is (mostly!) a solved problem, and if you just want to get the assignment done on time, using a library off the shelf is a good idea.
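To illustrate the combinator style, here is a sketch using Text.ParserCombinators.ReadP from base rather than Parsec. It reads the fully parenthesized prefix form directly from a String, so no separate tokenizer is needed. The names lexeme, name, fprop and parseFProp are hypothetical:

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP
  (ReadP, between, char, eof, munch1, pfail, readP_to_S, skipSpaces)

type Var = String

data FProp
  = V Var
  | No FProp
  | Y FProp FProp
  | O FProp FProp
  | Si FProp FProp
  | Sii FProp FProp
  deriving (Eq, Show)

-- Run a parser, then skip any whitespace (including newlines) after it.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

-- A constructor tag or variable name: one or more alphanumeric characters.
name :: ReadP String
name = lexeme (munch1 isAlphaNum)

-- "(" TAG ARGS ")", where the tag decides how many sub-formulas follow.
fprop :: ReadP FProp
fprop = between (lexeme (char '(')) (lexeme (char ')')) $ do
  tag <- name
  case tag of
    "V"   -> V <$> name
    "No"  -> No <$> fprop
    "Y"   -> Y <$> fprop <*> fprop
    "O"   -> O <$> fprop <*> fprop
    "Si"  -> Si <$> fprop <*> fprop
    "Sii" -> Sii <$> fprop <*> fprop
    _     -> pfail

parseFProp :: String -> Maybe FProp
parseFProp s =
  case [p | (p, "") <- readP_to_S (skipSpaces *> fprop <* eof) s] of
    [p] -> Just p
    _   -> Nothing
```

Since readP_to_S has type ReadP a -> ReadS a, readP_to_S fprop is already a ReadS FProp, which is the type a Read instance's readsPrec returns.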

The read part of a REPL interpreter typically looks like this (ForthState, eval and initialForthState belong to the rest of the interpreter and are not shown here):
import System.IO (hFlush, stdout)
repl :: ForthState -> IO ()
repl state = do
  putStr "> "          -- prompt, to show we're waiting for input
  hFlush stdout        -- flush so the prompt appears before getLine blocks
  input <- getLine     -- this is what you're looking for: read one line
  if input == "quit"   -- lets the user quit the interpreter
    then putStrLn "Bye!"
    else do
      -- your grammar is applied somewhere down the chain when eval is called on the input
      let (is, cs, d, output) = eval (words input) state
      mapM_ putStrLn output
      repl (is, cs, d, [])
main :: IO ()
main = do
  putStrLn "Welcome to your very own interpreter!"
  repl initialForthState   -- start the loop, beginning with the read step
Your eval function will have various loops, stack manipulations, conditionals, etc. to actually figure out what the user entered. I hope this helps you with at least the input-reading part.
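To make the shape concrete, here is a self-contained toy along the same lines. The interpreter's eval is replaced by a hypothetical evalWords over a stack of Ints (a tiny Forth-like subset). Keeping evaluation pure and the loop in IO makes the evaluator easy to test on its own:

```haskell
import System.IO (hFlush, stdout)
import Text.Read (readMaybe)

-- Hypothetical toy state: the interpreter's state is just a stack of Ints.
type Stack = [Int]

-- Evaluate one line's words against the stack, returning the new stack and
-- the lines to print. Numbers are pushed, "+" and "*" combine the top two
-- entries, "." pops and prints the top entry. An unknown word (or a stack
-- underflow) stops evaluation of the rest of the line.
evalWords :: [String] -> Stack -> (Stack, [String])
evalWords [] st = (st, [])
evalWords (w : ws) st = case (w, st) of
  ("+", x : y : rest) -> evalWords ws (y + x : rest)
  ("*", x : y : rest) -> evalWords ws (y * x : rest)
  (".", x : rest)     -> let (st', out) = evalWords ws rest in (st', show x : out)
  _ -> case readMaybe w of
    Just n  -> evalWords ws (n : st)
    Nothing -> (st, ["unknown word or stack underflow: " ++ w])

-- The read-eval-print loop: read a line, evaluate it, print, repeat.
repl :: Stack -> IO ()
repl st = do
  putStr "> "
  hFlush stdout  -- make sure the prompt is shown before getLine blocks
  input <- getLine
  if input == "quit"
    then putStrLn "Bye!"
    else do
      let (st', output) = evalWords (words input) st
      mapM_ putStrLn output
      repl st'
```

To run it, add main = putStrLn "Welcome!" >> repl []. Typing 1 2 + . at the prompt then prints 3.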

removing piped dot in string common lisp

My task is to parse this list
(100 30 5 . 50 6)
into the number 135.56. The format of the input list is always the same.
I've written:
(reduce 'string-concat
        (mapcar (lambda (x) (remove #\0 x))
                (mapcar 'write-to-string l)))
and the output I get is "135|.|56",
and then read-from-string doesn't read it, so...
Do you have any idea how I can do this parsing, with or without the code above?

Your approach does not look particularly robust. It is also somewhat difficult to understand what the input list is. Is the dot a symbol, as in |.|? The vertical bars escape the name so that it does not collide with the built-in use of the dot character in Lisp. There, the dot is used in dotted pairs, which stand for cons cells: (a . b).
If it is a symbol, then you can write the symbol without escaping to a string. First, with escaping:
CL-USER 5 > (write-to-string '|.|)
"\\."
Next, without:
CL-USER 6 > (princ-to-string '|.|)
"."

Your list (100 30 5 . 50 6) isn't a valid list structure in Common Lisp. A "dotted pair" must have only one element after the dot. If you want to know more about it, look up in your favorite Common Lisp book how lists are built from cons cells (for example, Peter Seibel's "Practical Common Lisp").
So you cannot parse this string as a list as such - you need to have a pre-processing step.
(defun pre-processing (str)
  (let ((idx (position #\. str)))
    (list (read-from-string (concatenate 'string (subseq str 0 idx) ")"))
          (read-from-string (concatenate 'string "(" (subseq str (1+ idx)))))))
This function splits your string in two lists that you can process the way you want to.
CL-USER 1 > (pre-processing "(100 30 5 . 50 6)")
((100 30 5) (50 6))

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on GitHub. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)

However, I already have a very robust parser who does that very thing for me.
It's actually not that robust: your parser has problems with superfluous parentheses. For example, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
{-# LANGUAGE TypeOperators #-}
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p > precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary; ((((((1)))))) should parse fine as an Int]. As in derived instances, parentheses become mandatory only when the context precedence p is strictly greater than the value's own level.
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser into a readsPrec parser would be
import Text.Parsec (getInput, parse)
import Text.Parsec.String (Parser)
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
-- needsParens (as described above) and threshold (the precedence level of
-- your type) are yours to define
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
  = case parse (withRemaining $ needsParens (prec > threshold) parsecParser) "" input of
      Left _ -> []
      Right result -> [result]
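Putting the pieces together, here is one runnable sketch. It assumes the parsec package is available and uses a toy type Single as a hypothetical stand-in for the asker's MyType. needsParens is not a parsec function; it is defined here as described above. Following the Haskell Report's convention for derived instances, parentheses become mandatory only when the context precedence is strictly greater than the type's own (prec > threshold):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy stand-in for the asker's type (hypothetical).
newtype Single = Single Int
  deriving (Eq, Show)

-- Parses "Single 12" without surrounding parentheses; many1 (not many)
-- avoids calling read on an empty string.
singleP :: Parser Single
singleP = Single . read <$> (string "Single" *> spaces *> many1 digit) <* spaces

-- needsParens True p: at least one pair of parentheses is required.
-- needsParens False p: any number of parentheses (including none) is accepted.
needsParens :: Bool -> Parser a -> Parser a
needsParens required p
  | required  = inParens (optionalParens p)
  | otherwise = optionalParens p
  where
    inParens q = between (char '(' *> spaces) (char ')' *> spaces) q
    optionalParens q = try (inParens (optionalParens q)) <|> q

withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput

-- threshold: precedence of the type's own syntax (10 for constructor application).
parsecToReadsPrec :: Int -> Parser a -> Int -> ReadS a
parsecToReadsPrec threshold p prec input =
  case parse (withRemaining (spaces *> needsParens (prec > threshold) p)) "" input of
    Left _       -> []
    Right result -> [result]

instance Read Single where
  readsPrec = parsecToReadsPrec 10 singleP
```

With this, read "((Single 7))" :: Single succeeds. readsPrec 11 "Single 12" returns [], because an argument position demands parentheses.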
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.
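For comparison, here is a sketch of the ReadPrec route for the same hypothetical Single type, using only Text.Read from base. This is essentially what a derived Read instance does. parens accepts optional extra parentheses. prec 10 demands them when the context precedence is above 10. step raises the precedence for the constructor's argument:

```haskell
import Text.Read (Lexeme (..), Read (..), lexP, parens, prec, readListPrecDefault, step)

newtype Single = Single Int
  deriving (Eq, Show)

instance Read Single where
  readPrec = parens $ prec 10 $ do
    Ident "Single" <- lexP   -- the constructor name as a lexeme
    n <- step readPrec       -- the Int argument, read at precedence 11
    return (Single n)
  readListPrec = readListPrecDefault
```

Because the Int is read at precedence 11, a negative argument needs parentheses, e.g. Single (-3), exactly as with derived instances.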

Parsing user input with reads in Haskell

I am trying to parse a user-entered string like "A12" into a Haskell tuple, like ('A', 12).
Here's what I have tried:
import Data.Maybe
type Pos = (Char, Int)
parse :: String -> Maybe Pos
parse u = do
  (c, rest) <- (listToMaybe . reads) u
  (r, _) <- (listToMaybe . reads) rest
  return $ (c, r)
But this always returns Nothing. Why does this happen, and what is the correct way to parse this string? Since this is fairly simple, I'd like to avoid using Parsec or a similar advanced parsing library.
EDIT (to clarify):
Sample Input and Output:
"A12" gives Just ('A', 12)
"J5" gives Just ('J', 5)
"A" gives Nothing
"2324" gives Nothing

read is usually the opposite of show and they both generally use Haskell syntax to represent the given values. This means that since the Haskell syntax for characters uses single quotes, show on a character will add single quotes around it, and read will expect the single quotes to be there.
In other words, your function expects syntax like 'A' 42, and indeed it works if you try that:
> parse "'A' 42"
Just ('A',42)
For your format, I would instead use pattern matching for the first character and then reads for the rest, e.g. something like this:
import Data.Maybe (listToMaybe)
type Pos = (Char, Int)
parse :: String -> Maybe Pos
parse [] = Nothing
parse (c:rest) = do
  (r, _) <- listToMaybe $ reads rest
  return (c, r)
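The version above accepts any first character, so "2324" gives Just ('2', 324) rather than the Nothing the question's sample table asks for. It also ignores trailing input. Here is a stricter sketch that checks the first character with isAlpha and requires the whole remainder to be an Int, using readMaybe from Text.Read:

```haskell
import Data.Char (isAlpha)
import Text.Read (readMaybe)

type Pos = (Char, Int)

-- The first character must be a letter; the rest must read as a whole Int.
parse :: String -> Maybe Pos
parse (c : rest)
  | isAlpha c = do
      r <- readMaybe rest
      return (c, r)
parse _ = Nothing
```

Note that readMaybe tolerates surrounding whitespace and a leading minus sign, so "A 12" and "A-3" are accepted too. Add further checks if those should be rejected.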

Do you have to use do notation? If not, the following function suits your needs. It's not pretty, but it gets the job done.
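parse :: String -> Maybe Pos
-- caveat: this never returns Nothing; "" fails with a pattern-match error,
-- and a non-numeric rest (e.g. "A") makes read throw once the Int is used
parse (x:xs) = Just (x,read xs::Int)
I'm not sure what you consider "failing" and thus worthy of a Nothing.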

Develop Reference
