Why am I unable to parse a string? | Rust - parsing

I am making a decimal to binary conversion program. Why am I unable to parse a string into a number? binary_vec[i] is a character. I applied the to_string() method to convert it into a string, because parse() doesn't apply to a character, but it is still giving me an error.
use std::io;

fn main() {
    let mut binary = String::new();
    println!("Enter a decimal: ");
    io::stdin().read_line(&mut binary)
        .ok()
        .expect("Couldn't read line");
    println!("{}", to_decimal(binary));
}

fn to_decimal(mut binary_str: String) -> String {
    let mut binary_no: u32 = binary_str.trim().parse().expect("invalid input");
    if binary_no == 0 {
        format!("{}", binary_no)
    } else {
        let mut bits = String::new();
        let mut binary_vec: Vec<char> = binary_str.chars().collect();
        let mut result = 0;
        let mut i = 0;
        while i <= binary_str.len() - 2 {
            let mut result = result + (binary_vec[i].to_string().parse::<u32>().unwrap() * 2^(i as u32));
            i = i + 1;
        }
        format!("{}", result)
    }
}
Result:
Compiling sixteen v0.1.0 (C:\Users\Muhammad.3992348\Desktop\rust\hackathon\sixteen)
warning: unused variable: `bits`
--> src\main.rs:19:17
|
19 | let mut bits = String::new();
| ^^^^ help: consider prefixing with an underscore: `_bits`
|
= note: #[warn(unused_variables)] on by default
warning: unused variable: `result`
--> src\main.rs:24:21
|
24 | let mut result = result + (binary_vec[i].to_string().parse::<u32>().unwrap() * 2^(i as u32));
| ^^^^^^ help: consider prefixing with an underscore: `_result`
warning: variable does not need to be mutable
--> src\main.rs:13:15
|
13 | fn to_decimal(mut binary_str:String) -> String {
| ----^^^^^^^^^^
| |
| help: remove this `mut`
|
= note: #[warn(unused_mut)] on by default
warning: variable does not need to be mutable
--> src\main.rs:14:9
|
14 | let mut binary_no: u32 = binary_str.trim().parse().expect("invalid input");
| ----^^^^^^^^^
| |
| help: remove this `mut`
warning: variable does not need to be mutable
--> src\main.rs:19:13
|
19 | let mut bits = String::new();
| ----^^^^
| |
| help: remove this `mut`
warning: variable does not need to be mutable
--> src\main.rs:20:13
|
20 | let mut binary_vec: Vec<char> = binary_str.chars().collect();
| ----^^^^^^^^^^
| |
| help: remove this `mut`
warning: variable does not need to be mutable
--> src\main.rs:21:13
|
21 | let mut result = 0;
| ----^^^^^^
| |
| help: remove this `mut`
warning: variable does not need to be mutable
--> src\main.rs:24:17
|
24 | let mut result = result + (binary_vec[i].to_string().parse::<u32>().unwrap() * 2^(i as u32));
| ----^^^^^^
| |
| help: remove this `mut`
Finished dev [unoptimized + debuginfo] target(s) in 1.04s
Running `target\debug\sixteen.exe`
Enter a decimal:
100
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: InvalidDigit }', src\libcore\result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: process didn't exit successfully: `target\debug\sixteen.exe` (exit code: 101)
Edit:
I input 100 and expect the output to be 4.

There are a lot of problems with the code as written. The good news is that you can fix most of the problems by listening to the compiler warnings!
warning: unused variable: `bits`
--> src/main.rs:19:17
|
19 | let mut bits = String::new();
| ^^^^ help: consider prefixing with an underscore: `_bits`
|
= note: #[warn(unused_variables)] on by default
This tells you that the variable bits isn't used after this declaration. If you intend to use it later, you can just comment it out for now.
warning: unused variable: `result`
--> src/main.rs:24:21
|
24 | let mut result = result + (binary_vec[i].to_string().parse::<u32>().unwrap() * 2^(i as u32));
| ^^^^^^ help: consider prefixing with an underscore: `_result`
|
= note: #[warn(unused_variables)] on by default
This is the big one. This tells us that the variable result is unused. But wait! We're using it right here! Nope! Actually, by using let mut here, we're creating a new variable that shadows the old one. What you want instead is to overwrite the old value: simply change let mut result = ... to result = ....
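Here is a tiny self-contained sketch (not from the question) showing the difference between shadowing with let and plain assignment:

```rust
// Demonstrates why `let mut result = ...` inside a loop never updates
// the outer `result`: `let` introduces a new, shadowing variable.
fn with_shadowing() -> u32 {
    let result = 0;
    for _ in 0..3 {
        let result = result + 1; // brand-new variable, dies at end of iteration
        let _ = result;          // use it so the compiler doesn't warn
    }
    result // still 0: the outer binding was never touched
}

fn with_assignment() -> u32 {
    let mut result = 0;
    for _ in 0..3 {
        result = result + 1; // updates the existing binding
    }
    result // 3
}

fn main() {
    println!("shadowing: {}", with_shadowing());   // 0
    println!("assignment: {}", with_assignment()); // 3
}
```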
Now if we run the program again and input 100, we'll get an output of 5. This still doesn't seem right, but we still have a few warnings to fix, so let's come back to this.
warning: variable does not need to be mutable
--> src/main.rs:13:15
|
13 | fn to_decimal(mut binary_str:String) -> String {
| ----^^^^^^^^^^
| |
| help: remove this `mut`
|
= note: #[warn(unused_mut)] on by default
If we aren't going to mutate the input string, we shouldn't make it mutable. Just remove the mut. Same for the other two warnings (lines 14 and 20).
Alright! Now we can run the program without any warnings. However, there are some more advanced lints we can run using cargo clippy. If you don't have clippy installed yet, you can install it with rustup component add clippy.
Now we have a few more warnings to take care of.
warning: operator precedence can trip the unwary
--> src/main.rs:24:32
|
24 | result = result + (binary_vec[i].to_string().parse::<u32>().unwrap() * 2^(i as u32));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider parenthesizing your expression: `(binary_vec[i].to_string().parse::<u32>().unwrap() * 2) ^ (i as u32)`
|
= note: #[warn(clippy::precedence)] on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence
Uh oh. Clippy is telling us that the precedence rules indicate that * is evaluated before ^. That's not what we'd expect from multiplication and exponents. It turns out that ^ is not the exponent operator. Instead, it's the bitwise xor. If we want powers of a number, we can use the pow method, so replace 2 ^ (i as u32) with 2.pow(i as u32). This will cause a compiler error with the message error[E0689]: can't call method `pow` on ambiguous numeric type `{integer}`. Ok. We can make the numeric type unambiguous using a suffix. Change it to 2_u32.pow(i as u32).
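A quick standalone demonstration of the difference (mine, not from the answer):

```rust
fn main() {
    // `^` is bitwise XOR in Rust, not exponentiation.
    assert_eq!(2u32 ^ 3, 1);    // 0b10 XOR 0b11 == 0b01
    assert_eq!(2u32.pow(3), 8); // actual exponentiation

    // And `*` binds tighter than `^`, so `a * 2 ^ b` parses as `(a * 2) ^ b`:
    assert_eq!(1 * 2 ^ 2, 0); // (1 * 2) XOR 2 == 0, not 1 * 4
    println!("^ is xor, pow is exponentiation");
}
```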
We can fix the other three warnings that Clippy gives us just by using the suggestion. After that, Clippy gives one more warning, which also is fixed with just the suggestion.
Before we continue, let's make our code a bit prettier by running cargo fmt. Finally, the code looks like this:
use std::io;

fn main() {
    let mut binary = String::new();
    println!("Enter a decimal: ");
    io::stdin()
        .read_line(&mut binary)
        .expect("Couldn't read line");
    println!("{}", to_decimal(binary));
}

fn to_decimal(binary_str: String) -> String {
    let binary_no: u32 = binary_str.trim().parse().expect("invalid input");
    if binary_no == 0 {
        format!("{}", binary_no)
    } else {
        //let mut bits = String::new();
        let binary_vec: Vec<char> = binary_str.chars().collect();
        let mut result = 0;
        let mut i = 0;
        while i <= binary_str.len() - 2 {
            result += binary_vec[i].to_string().parse::<u32>().unwrap() * 2_u32.pow(i as u32);
            i += 1;
        }
        format!("{}", result)
    }
}
We fixed two bugs and cleaned up the code only using compiler warnings and basic tools! Pretty good, eh? But now if we input 100, our output is 1. That isn't right. Let's add some debug statements to see if we can see what's going on.
result += dbg!(binary_vec[i].to_string().parse::<u32>().unwrap()) * dbg!(2_u32.pow(i as u32));
If we run it now, here's what we get.
Enter a decimal:
100
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 1
[src/main.rs:22] 2u32.pow(i as u32) = 1
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 0
[src/main.rs:22] 2u32.pow(i as u32) = 2
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 0
[src/main.rs:22] 2u32.pow(i as u32) = 4
1
We can see that it's parsing the digits correctly: binary_vec[i].to_string().parse::<u32>().unwrap() is 1, then 0, then 0. The powers of 2 look fine too. The problem here is that it's backwards! We want the left digit to be multiplied with the highest power of 2. Let me note that this is the least idiomatic part of the code. It would be much better to use, at the very least, a for loop for this kind of thing; iterators would be better still if you can manage them.
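For reference, here is one hedged sketch of an iterator version (my code, not the answer's): walking the digits from the right pairs digit i with 2^i and sidesteps the ordering bug entirely.

```rust
// Iterator-based binary-to-decimal conversion (a sketch).
// Reversing the char iterator pairs the rightmost digit with 2^0,
// the next with 2^1, and so on; then we sum the weighted digits.
fn to_decimal(binary_str: &str) -> u32 {
    binary_str
        .trim()
        .chars()
        .rev()
        .enumerate()
        .map(|(i, c)| c.to_digit(2).expect("invalid binary digit") * 2u32.pow(i as u32))
        .sum()
}

fn main() {
    println!("{}", to_decimal("100")); // 4
    println!("{}", to_decimal("1011")); // 11
}
```

Note that to_digit(2) also rejects digits other than 0 and 1, which the original index-and-parse approach does not.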
In any case, we can reverse the order by making our powers go from binary_str.len() - 2 down to 0 instead of the other way around. Thus, we can change the code to
while i <= binary_str.len() - 2 {
    result += dbg!(binary_vec[i].to_string().parse::<u32>().unwrap()) * dbg!(2_u32.pow((binary_str.len() - 2 - i) as u32));
    i += 1;
}
We get
Enter a decimal:
100
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 1
[src/main.rs:22] 2u32.pow((binary_str.len() - 2 - i) as u32) = 4
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 0
[src/main.rs:22] 2u32.pow((binary_str.len() - 2 - i) as u32) = 2
[src/main.rs:22] binary_vec[i].to_string().parse::<u32>().unwrap() = 0
[src/main.rs:22] 2u32.pow((binary_str.len() - 2 - i) as u32) = 1
4
so we see that the powers are correctly reversed from 4 down to 1. And our answer is finally correct! Test it on a few more inputs and try to find some edge cases. There are still some more bugs to find. Once you're happy with your code, take out the debug statements.
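As an aside, once you've worked through the exercise, it's worth knowing that the standard library can do this conversion directly: u32::from_str_radix parses a string in any radix from 2 to 36.

```rust
fn main() {
    // Parse the (trimmed) input as a base-2 number in one call.
    let n = u32::from_str_radix("100", 2).expect("invalid binary");
    println!("{}", n); // 4

    // Non-binary digits are rejected with a ParseIntError.
    assert!(u32::from_str_radix("102", 2).is_err());
}
```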

I would usually try to fix the code, but this one just seems way too convoluted.
Here is a simpler solution:
use std::io;

fn main() {
    let mut binary = String::new();
    println!("Enter a decimal: ");
    io::stdin().read_line(&mut binary)
        .expect("Couldn't read line");
    match to_decimal(binary) {
        Some(num) => println!("{}", num),
        None => println!("Not a binary!"),
    }
}

fn to_decimal(binary_str: String) -> Option<i64> {
    let binary_str = binary_str.trim();
    let mut out_num = 0;
    for c in binary_str.chars() {
        match c {
            '0' => out_num = out_num * 2,
            '1' => out_num = out_num * 2 + 1,
            _ => return None,
        }
    }
    Some(out_num)
}

Related

Using FParsec to parse possibly malformed input

I'm writing a parser for a specific file format using FParsec as a first-ish foray into learning F#. Part of the file has the following format:
{ 123 456 789 333 }
Where the numbers in the brackets are pairs of values and there can be an arbitrary number of spaces to separate them. So these would also be valid things to parse:
{ 22 456 7 333 }
And of course the content of the brackets might be empty, i.e. {}
In addition I want the parser to be able to handle the case where the content is a bit malformed, eg. { some descriptive text } or maybe more likely { 12 3 4} (invalid since the 4 wouldn't be paired with anything). In this case I just want the contents saved to be processed separately.
I have this so far:
type DimNummer = int
type ObjektNummer = int
type DimObjektPair = DimNummer * ObjektNummer
type ObjektListResult = Result<DimObjektPair list, string>

let sieObjektLista =
    let pnum = numberLiteral NumberLiteralOptions.None "dimOrObj"
    let ws = spaces
    let pobj = pnum .>> ws |>> fun x ->
        let on: ObjektNummer = int x.String
        on
    let pdim = pnum |>> fun x ->
        let dim: DimNummer = int x.String
        dim
    let pdimObj = (pdim .>> spaces1) .>>. pobj |>> DimObjektPair
    let toObjektLista (objList: list<DimObjektPair>) =
        let res: ObjektListResult = Result.Ok objList
        res
    let pdimObjs = sepBy pdimObj spaces1
    let validList = pdimObjs |>> toObjektLista
    let toInvalid (str: string) =
        let res: ObjektListResult =
            match str.Trim(' ') with
            | "" -> Result.Ok []
            | _ -> Result.Error str
        res
    let invalidList = manyChars anyChar |>> toInvalid
    let pres = between (pchar '{') (pchar '}') (ws >>. (validList <|> invalidList) .>> ws)
    pres

let parseSieObjektLista = run sieObjektLista
However running this on a valid sample I get an error:
{ 53735 7785 86231 36732 }
^
Expecting: whitespace or '}'
You're trying to consume too many spaces.
Look: pdimObj is a pdim, followed by some spaces, followed by pobj, which is itself a pnum followed by some spaces. So if you look at the first part of the input:
{ 53735 7785 86231 36732 }
  \___/^\__/^
    |  |  | |
  pdim |  | ws
       | pnum
 spaces1 \ /
         pobj
  \_________/
    pdimObj
One can clearly see from here that pdimObj consumes everything up to 86231, including the space just before it. And therefore, when sepBy inside pdimObjs looks for the next separator (which is spaces1), it can't find any. So it fails.
The smallest way to fix this is to make pdimObjs use many instead of sepBy: since pobj already consumes trailing spaces, there is no need to also consume them in sepBy:
let pdimObjs = many pdimObj
But a cleaner way, in my opinion, would be to remove ws from pobj, because, intuitively, trailing spaces aren't part of the number representing your object (whatever that is), and instead handle possible trailing spaces in pdimObjs via sepEndBy:
let pobj = pnum |>> fun x ->
let on: ObjektNummer = int x.String
on
...
let pdimObjs = sepEndBy pdimObj spaces1
The main problem here is in pdimObjs. The sepBy parser fails because the separator spaces following each number have already been consumed by pobj, so spaces1 cannot succeed. Instead, I suggest you try this:
let pdimObjs = many pdimObj
Which gives the following result on your test input:
Success: Ok [(53735, 7785); (86231, 36732)]

How to construct a match expression

I am allowing a command-line parameter like this --10GB, where -- and GB are constant, but a number like 1, 10, or 100 could be substituted in between the constant values, like --5GB.
I could easily parse the start and end of the string with substr or written a command line parser, but wanted to use match instead. I am just not sure how to structure the match expression.
let GB1 = cvt_bytes_to_gb(int64(DiskFreeLevels.GB1))
let arg = argv.[0]
let match_head = "--"
let match_tail = "GB"

let parse_min_gb_arg arg =
    match arg with
    | match_head & match_tail -> cvt_gb_arg_to_int arg
    | _ -> volLib.GB1
I get a warning saying This rule will never be matched. How should what is meant to be an AND expression be constructed?
You can't match on strings, except by matching on the whole value, e.g. match s with | "1" -> 1 | "2" -> 2 ...
Parsing the beginning and end would be the most efficient way to do this; there is no need to get clever (this, by the way, is a universally true statement).
But if you really want to use pattern matching, it is definitely possible to do, but you'll have to make yourself some custom matchers (also known as "active patterns").
First, make a custom matcher that would parse out the "middle" part of the string surrounded by prefix and suffix:
let (|StrBetween|_|) (starts: string) (ends: string) (str: string) =
    if str.StartsWith starts && str.EndsWith ends then
        Some (str.Substring(starts.Length, str.Length - ends.Length - starts.Length))
    else
        None
Usage:
let x = match "abcd" with
        | StrBetween "a" "d" s -> s
        | _ -> "nope"
// x = "bc"
Then make a custom matcher that would parse out an integer:
let (|Int|_|) (s: string) =
    match System.Int32.TryParse s with
    | true, i -> Some i
    | _ -> None
Usage:
let x = match "15" with
        | Int i -> i
        | _ -> 0
// x = 15
Now, combine the two:
let x = match "--10GB" with
        | StrBetween "--" "GB" (Int i) -> i
        | _ -> volLib.GB1
// x = 10
This ability of patterns to combine and nest is their primary power: you get to build a complicated pattern out of small, easily understandable pieces, and have the compiler match it to the input. That's basically why it's called "pattern matching". :-)
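For comparison, the same strip-the-prefix-and-suffix-then-parse-the-middle idea can be sketched in Rust with strip_prefix, strip_suffix, and a chained parse (my sketch; the question's volLib.GB1 default is stood in for by the literal 1 here):

```rust
// Parse "--10GB"-style arguments: peel off the fixed prefix and suffix,
// then parse whatever is left as an integer, falling back to a default.
fn parse_min_gb_arg(arg: &str) -> i32 {
    arg.strip_prefix("--")
        .and_then(|rest| rest.strip_suffix("GB"))
        .and_then(|mid| mid.parse::<i32>().ok())
        .unwrap_or(1) // stand-in for the question's volLib.GB1 default
}

fn main() {
    println!("{}", parse_min_gb_arg("--10GB")); // 10
    println!("{}", parse_min_gb_arg("--GB"));   // 1 (empty middle fails to parse)
    println!("{}", parse_min_gb_arg("oops"));   // 1 (no prefix)
}
```

Like the nested F# patterns, each step either succeeds and hands its result to the next step or short-circuits to the fallback.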
The best I can come up with is using a partial active pattern:
let (|GbFormat|_|) (x: string) =
    let prefix = "--"
    let suffix = "GB"
    if x.StartsWith(prefix) && x.EndsWith(suffix) then
        let len = x.Length - prefix.Length - suffix.Length
        Some(x.Substring(prefix.Length, len))
    else
        None

let parse_min_gb_arg arg =
    match arg with
    | GbFormat gb -> gb
    | _ -> volLib.GB1

parse_min_gb_arg "--42GB"

F#, This value is not a function and the + operator

This code:
let extractTime line =
    let result = Regex.Match(line, "(\d\d):(\d\d):(\d\d)")
    let captureGroups = List.tail [for g in result.Groups -> g.Value]
    match captureGroups with
    | hrs::mins::secs::[] -> ((int hrs)*3600) + ((int mins)*60) +(int secs)
    | _ -> 0
Gives the error: "This value is not a function and cannot be applied" on this part of the code
((int mins)*60)
I bashed my head on it for a while, then added a space here
+ (int secs)
Taking out the spaces altogether also makes the error go away. e.g.
((int hrs)*3600)+((int mins)*60)+(int secs)
I don't understand why that space makes a difference. Can someone please explain it to me?
It's because +(int secs) is interpreted as a value (+ being a unary operator), in which case ((int mins)*60) would need to be a function, for it to make sense. When there is no space, the following + can't possibly be interpreted as anything else than the binary operator. You can compare it with the following simple lines:
> 1 + 2;;
val it : int = 3
> 1 +2;;
1 +2;;
^
stdin(2,1): error FS0003: This value is not a function and cannot be applied
> 1 2;;
1 2;;
^
stdin(3,1): error FS0003: This value is not a function and cannot be applied
> 12;;
val it : int = 12

F# Develop a function counting the number of branches containing only "leaves"

I have this expression: (cos(9**5)-cos(8*5))*(sin(3+1)**exp(6*6)). I represent it with this type:
type common =
    Exp of common * common
    | Sin of common
    | Cos of common
    | Bin of common * string * common
    | Digit of float
    | Exponent of common

let expr = Bin(Bin(Cos(Exp(Digit(9.0),Digit(5.0))),"-",Cos(Bin(Digit(8.0),"*",Digit(5.0)))),"*",Exp(Sin(Bin(Digit(3.0),"+",Digit(1.0))),Exponent(Bin(Digit(6.0),"*",Digit(6.0)))))
I have a function that evaluates the expression:
let rec evalf com =
    match com with
    | Digit(x) -> x
    | Exp(d1,d2) ->
        let dig1 = evalf(d1)
        let dig2 = evalf(d2)
        System.Math.Pow(dig1, dig2)
    | Sin(d) ->
        let dig = evalf(d)
        System.Math.Sin(dig)
    | Cos(d) ->
        let dig = evalf(d)
        System.Math.Cos(dig)
    | Exponent(d) ->
        let dig = evalf(d)
        System.Math.Exp(dig)
    | Bin(d1,op,d2) ->
        let dig1 = evalf(d1)
        let dig2 = evalf(d2)
        match op with
        | "*" -> dig1 * dig2
        | "+" -> dig1 + dig2
        | "-" -> dig1 - dig2
I need to develop a function counting the number of branches containing only "leaves". Please help.
If you define "leaves" as digits, then to count the number of branches containing only "leaves" you would need to count the number of expressions that only reference digits.
This can be achieved with a recursive function similar to evalf, that returns 1 for branches with only "leaves"/digits and recurses for the non-digit cases e.g.
let rec count expr =
    match expr with
    | Exp(Digit(_), Digit(_)) -> 1
    | Exp(d1, d2) -> count d1 + count d2
    | Sin(Digit(_)) -> 1
    | Sin(d) -> count d
    // ... for all cases
A similar technique can be used to simplify an expression tree, for example a binary operation (Bin) on 2 numbers could be matched and simplified to a single number. This might be used for example as a compiler optimization step.
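The same recursion can be sketched in Rust over a comparable (reduced) expression enum, for readers more familiar with that language; this is my illustration, covering only a subset of the question's cases:

```rust
// A branch whose children are all Digit leaves counts as 1;
// anything else recurses into its children.
enum Common {
    Digit(f64),
    Sin(Box<Common>),
    Bin(Box<Common>, String, Box<Common>),
}

fn count(expr: &Common) -> u32 {
    match expr {
        Common::Digit(_) => 0,
        Common::Sin(inner) => match inner.as_ref() {
            Common::Digit(_) => 1, // sin applied directly to a leaf
            other => count(other),
        },
        Common::Bin(left, _, right) => match (left.as_ref(), right.as_ref()) {
            (Common::Digit(_), Common::Digit(_)) => 1, // branch with only leaves
            _ => count(left) + count(right),
        },
    }
}

fn main() {
    // (8 * 5) - sin(3): two all-leaf branches, Bin(8, "*", 5) and Sin(3).
    let expr = Common::Bin(
        Box::new(Common::Bin(
            Box::new(Common::Digit(8.0)),
            "*".to_string(),
            Box::new(Common::Digit(5.0)),
        )),
        "-".to_string(),
        Box::new(Common::Sin(Box::new(Common::Digit(3.0)))),
    );
    println!("{}", count(&expr)); // prints 2
}
```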

Translate one term differently in one program

I am trying to make a frontend for a kind of program... there are two particularities:
1) When we meet a string beginning with =, I want to read the rest of the string as a formula instead of a string value. For instance, "123", "TRUE", and "TRUE+123" are considered to have type string, while "=123", "=TRUE", and "=TRUE+123" are considered to have type Syntax.formula. By the way,
(* in syntax.ml *)
and expression =
  | E_formula of formula
  | E_string of string
  ...

and formula =
  | F_int of int
  | F_bool of bool
  | F_Plus of formula * formula
  | F_RC of rc

and rc =
  | RC of int * int
2) Inside a formula, some strings are interpreted differently than outside. For instance, in a command R4C5 := 4, R4C5, which is actually a variable, is treated as an identifier, while in "=123+R4C5", which is translated to a formula, R4C5 is translated to RC (4, 5): rc.
So I don't know how to realize this with one or two lexers, and one or two parsers.
At the moment, I am trying to realize everything in one lexer and one parser. Here is part of the code, which doesn't work: it still considers R4C5 as an identifier instead of an rc:
(* in lexer.mll *)
let begin_formula = double_quote "="
let end_formula = double_quote
let STRING = double_quote ([^ "=" ])* double_quote

rule token = parse
  ...
  | begin_formula { BEGIN_FORMULA }
  | 'R' { R }
  | 'C' { C }
  | end_formula { END_FORMULA }
  | lex_identifier as li
      { try Hashtbl.find keyword_table (lowercase li)
        with Not_found -> IDENTIFIER li }
  | STRING as s { STRING s }
  ...

(* in parser.mly *)
expression:
  | BEGIN_FORMULA f = formula END_FORMULA { E_formula f }
  | s = STRING { E_string s }
  ...

formula:
  | i = INTEGER { F_int i }
  | b = BOOL { F_bool b }
  | f0 = formula PLUS f1 = formula { F_Plus (f0, f1) }
  | rc { F_RC $1 }

rc:
  | R i0 = INTEGER C i1 = INTEGER { RC (i0, i1) }
Could anyone help?
New idea: I am thinking of sticking with one lexer + one parser, and creating an entrypoint for formulas in the lexer, as we normally do for comments... Here are some updates in lexer.mll and parser.mly:
(* in lexer.mll *)
rule token = parse
  ...
  | begin_formula { formula lexbuf }
  ...
  | INTEGER as i { INTEGER (int_of_string i) }
  | '+' { PLUS }
  ...

and formula = parse
  | end_formula { token lexbuf }
  | INTEGER as i { INTEGER_F (int_of_string i) }
  | 'R' { R }
  | 'C' { C }
  | '+' { PLUS_F }
  | _ { raise (Lexing_error ("unknown in formula")) }

(* in parser.mly *)
expression:
  | formula { E_formula f }
  ...

formula:
  | i = INTEGER_F { F_int i }
  | f0 = formula PLUS_F f1 = formula { F_Plus (f0, f1) }
  ...
I have done some tests, for instance parsing "=R4". The problem is that it parses R well, but it considers 4 as INTEGER instead of INTEGER_F; it seems that formula lexbuf needs to be re-entered from time to time in the body of the formula entrypoint (though I don't understand why parsing in the body of the token entrypoint works without always mentioning token lexbuf). I have tried several possibilities: | 'R' { R; formula lexbuf }, | 'R' { formula lexbuf; R }, etc., but none of them worked... Could anyone help?
I think the simplest choice would be to have two different lexers and two different parsers; call the lexer and parser for formulas from inside the global parser. After the fact you can see how much is shared between the two grammars, and factor things out when possible.
