Haskell Parsing Recursion and Maybe - parsing

I'm trying to write a parser in Haskell.
This parser takes a string (for example "abc def") as a parameter and returns a Maybe (String, String):
The first String collects characters while they are letters or digits.
The second String gets the rest.
In this example, I want to return Just ("abc", " def").
parseString :: String -> Maybe (String, String)
parseString "" = Nothing
parseString expr = case isString expr of
    Just (char, rest) -> fmap (char:) (parseString rest)
    Nothing -> Just ("", expr)
isString returns Maybe (Char, String): the Char is the first character and the String is the rest, or Nothing if the first character isn't a letter or digit.
The problem is that I cannot return the rest of my String in the Maybe.

The issue seems to be in
fmap (char:) (parseString rest)
Now, (char:) is a function String -> String, so fmap (char:) becomes Maybe String -> Maybe String (or its generalization to another functor). However, parseString rest is not a Maybe String, it is a Maybe (String, String).
So, we need to adapt (char:) to work on the first component of that pair. I'd try
fmap (\(res,rest2) -> (char:res, rest2)) (parseString rest)
(By importing first from Data.Bifunctor or Control.Arrow, that can be written as fmap (first (char:)) (parseString rest), but that's not that important.)
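For comparison, here is a rough Rust transliteration of the fixed parser (a sketch, not the asker's code). `is_string` is a hypothetical stand-in for the question's isString, assumed to accept letters and digits via `char::is_alphanumeric`. The key point carries over directly: the recursive result is mapped over the first component of the pair only, just like fmap (first (char:)).

```rust
/// Hypothetical stand-in for the question's `isString`: split off the first
/// character if it is a letter or digit, otherwise return None.
fn is_string(expr: &str) -> Option<(char, &str)> {
    let c = expr.chars().next()?;
    if c.is_alphanumeric() {
        Some((c, &expr[c.len_utf8()..]))
    } else {
        None
    }
}

/// Mirrors the fixed Haskell `parseString`: returns (alphanumeric prefix, rest).
fn parse_string(expr: &str) -> Option<(String, &str)> {
    // parseString "" = Nothing
    if expr.is_empty() {
        return None;
    }
    match is_string(expr) {
        // Prepend `c` to the *first* component of the recursive result and
        // leave the rest untouched: the same idea as `fmap (first (char:))`.
        Some((c, rest)) => parse_string(rest).map(|(word, rest2)| (format!("{}{}", c, word), rest2)),
        // Not a letter or digit: stop and hand back the remaining input.
        None => Some((String::new(), expr)),
    }
}
```

Because the empty string maps to None (mirroring parseString "" = Nothing), an input made only of letters and digits, such as "abc", produces None here, and the Haskell version likewise yields Nothing. Returning an empty prefix with an empty rest for the empty-string case instead would make "abc" parse to ("abc", "").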

Related

Best way to implement very simple F# function

I'm very new to F# and functional programming; I started learning today! Is this the best way to implement a function that returns whether a string has a digit?
open System;
let stringHasDigit (str: String) =
    not (String.forall(fun c -> (Char.IsDigit(c) = false)) str)
printfn "%b" (stringHasDigit "This string has 1 digits")
Look for functions in the String module before looking in Seq; they tend to be faster. This is twice as fast as using Seq.exists:
let stringHasDigit (s: string) =
    String.exists Char.IsDigit s
By the way, you don't need a semicolon at the end of the open statement.
Since string is a sequence of chars, you can also use functions from the Seq module:
let hasDigits (s: string) =
    s |> Seq.exists Char.IsDigit
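The same two shapes, "exists" versus the question's "not (forall (not digit))", can be sketched in Rust to show that they compute the same thing. One assumption to flag: Rust's `is_ascii_digit` accepts only '0' to '9', whereas .NET's Char.IsDigit also accepts other Unicode decimal digits, so this is an ASCII-only approximation.

```rust
/// Counterpart of `String.exists Char.IsDigit`: true if any char is a digit.
/// ASCII-only: `is_ascii_digit` accepts just '0'..='9'.
fn has_digit(s: &str) -> bool {
    s.chars().any(|c| c.is_ascii_digit())
}

/// The question's double-negation form: "not every char is a non-digit".
/// By De Morgan's law this always agrees with `has_digit`.
fn has_digit_via_all(s: &str) -> bool {
    !s.chars().all(|c| !c.is_ascii_digit())
}
```

The direct `any` form is the easier one to read, which matches the advice to prefer String.exists.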

How to pick correct method overload for function composition?

Here is a simple composition of functions in F#
let composedFunction = System.Text.Encoding.UTF8.GetBytes >> Array.length
"test" |> composedFunction
Type inference correctly infers the type of the composed function as string -> int. But the compiler cannot pick the correct overload of the System.Text.Encoding.UTF8.GetBytes method:
Error FS0041: A unique overload for method 'GetBytes' could not be
determined based on type information prior to this program point. A
type annotation may be needed. Candidates:
System.Text.Encoding.GetBytes(chars: char []) : byte [],
System.Text.Encoding.GetBytes(s: string) : byte []
Is there any way to compose the correct overload of System.Text.Encoding.UTF8.GetBytes, the one that accepts a string parameter?
Of course, I can do the following:
// declare a function which calls the correct overload and then use it for composition
let getBytes (s: string) = System.Text.Encoding.UTF8.GetBytes s
let composedFunction = getBytes >> Array.length
// start composition with ugly lambda
let composedFunction =
    (fun (s: string) -> s) >> System.Text.Encoding.UTF8.GetBytes >> Array.length
But I wonder if there is any way, without additional function declarations, to make the compiler pick the right overload according to the inferred string -> int type of the composed function?
You can always add annotations:
let composedFunction : string -> _ = System.Text.Encoding.UTF8.GetBytes >> Array.length
or
let composedFunction = (System.Text.Encoding.UTF8.GetBytes : string -> _) >> Array.length
As your example shows, .NET methods do not always compose well - I think the idiomatic approach in such situations is just to use the .NET style when you're dealing with .NET libraries (and use functional style when you're dealing with functional libraries).
In your specific case, I would just define a normal function with type annotation and get the length using the Length member rather than using the function:
let composedFunction (s:string) =
    System.Text.Encoding.UTF8.GetBytes(s).Length
The existing answer shows how to get the composition to work with type annotations. Another trick (which I would definitely not use in practice) is to add the identity function on strings to the composition to constrain the types:
let composedFunction = id<string> >> System.Text.Encoding.UTF8.GetBytes >> Array.length
It's fun that this works, but as I said, I would never actually use this, because a normal function as defined above is much easier to understand.
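Rust has no method overloading, so this exact ambiguity doesn't arise there, but a small sketch shows the general pattern of pinning the input type when composing. `compose` is a hypothetical helper (Rust's standard library has no function-composition operator), and annotating the first closure's parameter plays roughly the role of the string -> _ annotation above.

```rust
/// Hypothetical left-to-right composition helper, analogous to F#'s `>>`.
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}
```

Usage: `let utf8_len = compose(|s: String| s.into_bytes(), |bytes: Vec<u8>| bytes.len());` gives a String -> usize function, and `utf8_len("test".to_string())` returns 4. As with the last F# suggestion, a plain call is clearer: `str::len` already returns the UTF-8 byte length, so `"test".len()` gives the same 4.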

Parsing camel case strings with nom

I want to parse a string like "ParseThis" or "parseThis" into a vector of strings like ["Parse", "This"] or ["parse", "this"] using the nom crate.
All attempts I've tried do not return the expected result. It's possible that I don't understand yet how to use all the functions in nom.
I tried:
named!(camel_case<(&str)>,
    map_res!(
        take_till!(is_not_uppercase),
        std::str::from_utf8));
named!(p_camel_case<&[u8], Vec<&str>>,
    many0!(camel_case));
But p_camel_case just returns an Error(Many0) when parsing a string that starts with an uppercase letter, and for a string that starts with a lowercase letter it returns Done but with an empty string as the result.
How can I tell nom that I want to parse the string, separated by uppercase letters (given there can be a first uppercase or lowercase letter)?
You are looking for things that start with any character, followed by a number of non-uppercase letters. As a regex, that would look akin to .[a-z]*. Translated directly to nom, that's something like:
#[macro_use]
extern crate nom;
use nom::anychar;
fn is_uppercase(a: u8) -> bool { (a as char).is_uppercase() }
named!(char_and_more_char<()>, do_parse!(
    anychar >>
    take_till!(is_uppercase) >>
    ()
));
named!(camel_case<(&str)>, map_res!(recognize!(char_and_more_char), std::str::from_utf8));
named!(p_camel_case<&[u8], Vec<&str>>, many0!(camel_case));
fn main() {
    println!("{:?}", p_camel_case(b"helloWorld"));
    // Done([], ["hello", "World"])
    println!("{:?}", p_camel_case(b"HelloWorld"));
    // Done([], ["Hello", "World"])
}
Of course, you probably need to be careful about handling non-ASCII bytes properly (casting a single UTF-8 byte to char is only correct for ASCII), but you should be able to extend this in a straightforward manner.
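Outside nom, the same `.[a-z]*` rule (start a new word at every uppercase character except the very first one) can be sketched with the standard library alone. This is an alternative to the nom answer, not a nom API. Because it works on `char`s with the Unicode-aware `char::is_uppercase`, it avoids the byte-level caveat above.

```rust
/// Split a camelCase or PascalCase string into words, starting a new word
/// at each uppercase character except the first character of the input.
fn split_camel_case(s: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0; // byte index where the current word begins
    for (i, c) in s.char_indices() {
        // `i > start` means the first char of a word never ends the previous word.
        if i > start && c.is_uppercase() {
            words.push(&s[start..i]);
            start = i;
        }
    }
    // Push the final word, if the input was non-empty.
    if start < s.len() {
        words.push(&s[start..]);
    }
    words
}
```

Like the nom version, it keeps the original casing, so "helloWorld" becomes ["hello", "World"], and a run of capitals such as "ABC" is split into single letters.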

fparsec to parse sequence of string

I have a user input text like "abc,def,ghi". I want to parse it to get a list of strings like ["abc", "def", "ghi"].
I tried
let str : Parser<_> = many1Chars (noneOf ",")
let listParser : Parser<_> = many (str);;
but it always gives me only the first item, ["abc"]. "def" and the others do not appear in the result list.
You're parsing up to the first comma, but not parsing the comma itself.
To parse a list of things separated by other things, use sepBy:
let comma = pstring ","
let listParser = sepBy str comma
If you need to parse "at least one", use sepBy1 instead.
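To see why sepBy fixes this, here is a hand-rolled Rust sketch of the same shape (not FParsec's implementation). `item` plays the role of many1Chars (noneOf ","), `comma` the role of pstring ",", and `sep_by` parses an item followed by any number of comma-then-item pairs. Treating a comma that is not followed by an item as an error is an assumption of this sketch.

```rust
/// Item parser: one or more characters that are not ','.
/// Returns (parsed item, remaining input), or None if nothing matched.
fn item(input: &str) -> Option<(&str, &str)> {
    let end = input.find(',').unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// Separator parser: a single ','. Returns the remaining input.
fn comma(input: &str) -> Option<&str> {
    input.strip_prefix(',')
}

/// Zero or more items separated by commas: (item (',' item)*)?
fn sep_by(input: &str) -> Option<(Vec<&str>, &str)> {
    let mut items = Vec::new();
    // No first item: succeed with an empty list and leave the input untouched.
    let Some((first, mut rest)) = item(input) else {
        return Some((items, input));
    };
    items.push(first);
    // Unlike the question's `many str`, we consume the comma before the next item.
    while let Some(after_comma) = comma(rest) {
        let (next, after_item) = item(after_comma)?; // comma must be followed by an item
        items.push(next);
        rest = after_item;
    }
    Some((items, rest))
}
```

With this, "abc,def,ghi" yields all three items, whereas repeating `item` alone would stop at the first comma, exactly like many (str).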

fparsec parsing key value pairs with different data types

I'm trying to write a parser that can parse key-value pairs whose value data type varies:
KEY1:1,2,3
KEY2:abc
KEY3:123
with the following code:
open FParsec
type UserState = unit
type Parser<'t> = Parser<'t,UserState>
let str s = pstring s
let str_ws s = str s .>> spaces
let stringLiteral : Parser<_> = manyChars (noneOf "\">")
let numList : Parser<_> = sepBy1 (pint32) (str ",")
let parseHeader inner header = str header >>. str ":" >>. inner
let parseKvps =
    let strHeader header = parseHeader stringLiteral header .>> newline
    let numListHeader header = parseHeader numList header .>> newline
    let numHeader header = parseHeader pint32 header .>> newline
    let rest = parse {
        let! key1 = numListHeader "KEY1"
        let! key2 = strHeader "KEY2"
        let! key3 = numHeader "KEY3"
        return key1,key2,key3
    }
    rest
let kvps = "KEY1:1,2,3\nKEY2:abc\nKEY3:123"
run parseKvps kvps
The above gives the following error:
val it : ParserResult<(int32 list * string * int32),unit> =
Failure:
Error in Ln: 3 Col: 9
KEY3:123
^
Note: The error occurred at the end of the input stream.
Expecting: any char not in ‘">’ or newline
I think this has something to do with the numList parser because taking the first key out works as expected.
Appreciate any help! Thanks in advance!
The third parser fails because FParsec did not find a required \n at the end.
There are several options to fix the problem:
Make your data valid by adding \n to the stream:
let kvps = "KEY1:1,2,3\nKEY2:abc\nKEY3:123\n"
Downside: modifying a source stream is only good for testing, not for the real app.
Make .>> newline optional:
.>> (optional newline)
Downside: it may lead to errors when the parser attempts to parse two keys on the same source line.
Try using eof as an alternative to newline.
As a side note, your code seems very hard to maintain. Think about what happens if the keys occur in the wrong order in the source string, or if a new key needs to be added during your app's development.
Check this and this answers for more details.
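Both points, "newline or end of input" and order independence, can be sketched without FParsec. This is a Rust sketch, not FParsec code, and it assumes a hypothetical fixed schema: KEY1 holds a comma-separated int list, KEY2 a string, and KEY3 an int. Each line's value parser is chosen from its key, so keys may appear in any order.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Value {
    NumList(Vec<i32>),
    Str(String),
    Num(i32),
}

/// Hypothetical schema: pick the value parser based on the key.
fn parse_value(key: &str, raw: &str) -> Option<Value> {
    match key {
        "KEY1" => raw
            .split(',')
            .map(|n| n.parse::<i32>().ok())
            .collect::<Option<Vec<i32>>>()
            .map(Value::NumList),
        "KEY2" => Some(Value::Str(raw.to_string())),
        "KEY3" => raw.parse::<i32>().ok().map(Value::Num),
        _ => None, // unknown key
    }
}

/// Parse "KEY:value" lines in any order. `lines()` does not require a final
/// newline, which covers the "newline or end of input" case.
fn parse_kvps(input: &str) -> Option<HashMap<String, Value>> {
    let mut map = HashMap::new();
    for line in input.lines() {
        let (key, raw) = line.split_once(':')?;
        map.insert(key.to_string(), parse_value(key, raw)?);
    }
    Some(map)
}
```

Dispatching on the key, rather than sequencing one header parser per key, is what makes reordering keys or adding a new one a local change: only `parse_value` needs a new match arm.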

Resources