Rust - How to parse UTF-8 alphabetical characters in nom? - parsing

I am trying to parse sequences of alphabetical characters, including German umlauts (ä ö ü) and other alphabetical characters from the UTF-8 charset.
This is the parser I tried first:
named!(
parse(&'a str) -> Self,
map!(
alpha1,
|s| Self { chars: s.into() }
)
);
But it only works for ASCII alphabetical characters (a-zA-Z).
I tried to perform the parsing char by char:
named!(
parse(&str) -> Self,
map!(
take_while1!(nom::AsChar::is_alpha),
|s| Self { chars: s.into() }
)
);
But this won't even parse "hello"; instead it results in an Incomplete(Size(1)) error.
How do you parse UTF-8 alphabetical characters in nom?
A snippet from my code:
extern crate nom;
#[derive(PartialEq, Debug, Eq, Clone, Hash, Ord, PartialOrd)]
pub struct Word {
chars: String,
}
impl From<&str> for Word {
fn from(s: &str) -> Self {
Self {
chars: s.into(),
}
}
}
use nom::*;
impl Word {
named!(
parse(&str) -> Self,
map!(
take_while1!(nom::AsChar::is_alpha),
|s| Self { chars: s.into() }
)
);
}
#[test]
fn parse_word() {
let words = vec![
"hello",
"Hi",
"aha",
"Mathematik",
"mathematical",
"erfüllen"
];
for word in words {
assert_eq!(Word::parse(word).unwrap().1, Word::from(word));
}
}
When I run this test,
cargo test parse_word
I get:
thread panicked at 'called `Result::unwrap()` on an `Err` value: Incomplete(Size(1))', ...
I know that Rust strings are already UTF-8 encoded (thank heavens, almighty), but the nom library does not seem to behave as I would expect. I am using nom 5.1.0.
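The ASCII-only behaviour is easy to reproduce with the standard library alone. The sketch below contrasts `char::is_ascii_alphabetic` (which is assumed here to match what nom 5's `alpha1` and `AsChar::is_alpha` check, based on the behaviour described above) with the Unicode-aware `char::is_alphabetic`. The helper names `all_ascii_alpha` and `all_unicode_alpha` are hypothetical.

```rust
/// True if the string is non-empty and every char is an ASCII letter (a-z, A-Z).
/// This mirrors the ASCII-only behaviour described in the question.
fn all_ascii_alpha(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphabetic())
}

/// True if the string is non-empty and every char has the Unicode
/// `Alphabetic` property, which includes umlauts such as 'ä', 'ö', 'ü'.
fn all_unicode_alpha(s: &str) -> bool {
    !s.is_empty() && s.chars().all(char::is_alphabetic)
}
```

With these, "erfüllen" fails the ASCII check but passes the Unicode one, which is the distinction the parser needs.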

First, nom 5 uses functions for parsing. I advise using this form because the error messages are much better and the code is much cleaner.
Your requirement is odd: you could just take the full input, make it a string, and be done:
use nom::IResult;
impl Word {
fn parse(input: &str) -> IResult<&str, Self> {
Ok((
&input[input.len()..],
Self {
chars: input.to_string(),
},
))
}
}
But I guess your purpose is to parse a word, so here is an example of what you could do:
#[derive(PartialEq, Debug, Eq, Clone, Hash, Ord, PartialOrd)]
pub struct Word {
chars: String,
}
impl From<&str> for Word {
fn from(s: &str) -> Self {
Self { chars: s.into() }
}
}
use nom::{character::complete::*, combinator::*, multi::*, sequence::*, IResult};
impl Word {
fn parse(input: &str) -> IResult<&str, Self> {
let (input, word) =
delimited(space0, recognize(many1_count(none_of(" \t"))), space0)(input)?;
Ok((
input,
Self {
chars: word.to_string(),
},
))
}
}
#[test]
fn parse_word() {
let words = vec![
"hello",
" Hi",
"aha ",
" Mathematik ",
" mathematical",
"erfüllen ",
];
for word in words {
assert_eq!(Word::parse(word).unwrap().1, Word::from(word.trim()));
}
}
You could also make a custom function that uses is_alphabetic() instead of none_of(" \t"), but this requires making a custom error for nom, which is currently, in my opinion, very annoying to do.
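For illustration, here is a minimal std-only sketch of what a Unicode-aware `alpha1` has to do: find the byte offset of the first non-alphabetic char and split there, returning `(remaining_input, matched)` in the same order a nom parser does. The name `unicode_alpha1` is hypothetical. In nom 5 itself it may be worth trying `nom::bytes::complete::take_while1(|c: char| c.is_alphabetic())`, since for `&str` input the predicate receives a `char`, and the `complete` variant should not return `Incomplete` at the end of input the way the streaming macros do; this has not been checked against 5.1.0 here, so treat it as an assumption.

```rust
/// Hypothetical std-only analogue of a Unicode-aware `alpha1`.
/// Returns (remaining_input, matched_word), or None if the input
/// does not start with at least one alphabetic char.
fn unicode_alpha1(input: &str) -> Option<(&str, &str)> {
    // Byte offset of the first non-alphabetic char, or the whole input.
    let end = input
        .char_indices()
        .find(|&(_, c)| !c.is_alphabetic())
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        // "1" semantics: at least one alphabetic char is required.
        return None;
    }
    // `end` is always on a char boundary because it came from char_indices,
    // so slicing here cannot split a multi-byte char such as 'ü'.
    Some((&input[end..], &input[..end]))
}
```

Inside `Word::parse`, the matched slice would then be wrapped with `Word::from(word)`.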

On this GitHub issue, a fellow contributor quickly whipped up a library (nom-unicode) to handle this nicely:
use nom_unicode::complete::{alphanumeric1};
impl Word {
named!(
parse(&str) -> Self,
map!(
alphanumeric1,
|w| Self::from(w)
)
);
}

Related

Parsing an f64 variable into a usize variable in Rust

I have currently been dabbling in the Rust programming language and decided a good way to test my skills was to program an application that would find the median of any given list of numbers.
Eventually I got into the final stretch of code and stumbled into a problem.
I needed to parse an f64 variable into a usize variable.
However, I don't know how to go about doing this (Wow what a surprise!).
Take a look at the second function, calc_med() in my code. The variable n2 is supposed to take n and parse it into a usize. The code is not finished yet, but if you can see any more problems with the code please let me know.
use std::io;
use std::sync::Mutex;
#[macro_use]
extern crate lazy_static;
lazy_static! {
static ref v1: Mutex<Vec<f64>> = Mutex::new(Vec::new());
}
fn main() {
loop {
println!("Enter: ");
let mut inp: String = String::new();
io::stdin().read_line(&mut inp).expect("Failure");
let upd_inp: f64 = match inp.trim().parse() {
Ok(num) => num,
Err(_) => if inp.trim() == String::from("q") {
break;
} else if inp.trim() == String::from("d"){
break
{
println!("Done!");
calc_med();
}
} else {
continue;
}
};
v1.lock().unwrap().push(upd_inp);
v1.lock().unwrap().sort_by(|a, b| a.partial_cmp(b).unwrap());
println!("{:?}", v1.lock().unwrap());
}
}
fn calc_med() { // FOR STACKOVERFLOW: THIS FUNCTION
let n: f64 = ((v1.lock().unwrap().len()) as f64 + 1.0) / 2.0;
let n2: usize = n.to_usize().expect("Failure");
let median: f64 = v1[n2];
println!("{}", median)
}
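Two things stand out in `calc_med`: `f64` has no `to_usize` method in the standard library (that comes from the `num-traits` crate's `ToPrimitive` trait), and `v1[n2]` indexes the `Mutex` rather than the locked `Vec`. Also, `(len + 1) / 2` is a 1-based position, so the matching 0-based index would be one less. A minimal std-only sketch, assuming the vector is already sorted, sidesteps floating-point indices by using integer division; `median` and `f64_to_index` are hypothetical names.

```rust
/// Median of a slice that is already sorted in ascending order.
/// Returns None for an empty slice.
fn median(sorted: &[f64]) -> Option<f64> {
    let len = sorted.len();
    if len == 0 {
        return None;
    }
    // Integer division on usize: no f64 -> usize conversion is needed.
    let mid = len / 2;
    if len % 2 == 1 {
        // Odd length: the middle element has 0-based index len / 2.
        Some(sorted[mid])
    } else {
        // Even length: average the two middle elements.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

/// If you really do need to turn an f64 into an index, `as` truncates
/// toward zero and (on Rust 1.45+) saturates: negative values and NaN become 0.
fn f64_to_index(n: f64) -> usize {
    n as usize
}
```

With the global from the question, this might be called as `median(&v1.lock().unwrap())`, relying on deref coercion from the `MutexGuard` to `&[f64]`.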

Leetcode 1249. Minimum Remove to Make Valid Parentheses

I need help understanding the Swift implementation of the problem below. The part I do not understand is the for loop: the if branch appends the index of "(" to the stack array, but I am not sure how the else if pops elements from the stack. Also, what does the final loop do?
Given a string s of '(', ')' and lowercase English characters.
Your task is to remove the minimum number of parentheses ( '(' or ')', in any positions ) so that the resulting parentheses string is valid and return any valid string.
Formally, a parentheses string is valid if and only if:
It is the empty string, contains only lowercase characters, or
It can be written as AB (A concatenated with B), where A and B are valid strings, or
It can be written as (A), where A is a valid string.
func minRemoveToMakeValid(_ s: String) -> String {
var arraySrting = s.map({String($0)})
var stacks = [Int]()
for i in 0..<arraySrting.count{
if arraySrting[i] == "("{
stacks.append(i)
}
else if arraySrting[i] == ")" && stacks.popLast() == nil{
arraySrting[i] = ""
}
}
for stack in stacks {
arraySrting[stack] = ""
}
return arraySrting.joined()
}
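To answer the question directly: the else if branch only runs for ")". popLast() both removes the most recent "(" index and returns it, so a nil result means there was no "(" left to pair with, and that ")" is blanked out. Any indices still on the stack after the loop belong to "(" characters that were never closed, and the final loop blanks those too.
Those variable names are misspelled and misleading, though. Here is the same algorithm using optional characters instead of the language-agnostic approach of using empty strings.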
import Algorithms // `compacted()` is better than `compactMap { $0 }`
func minRemoveToMakeValid(_ s: String) -> String {
var characters: [Character?] = Array(s)
var unmatchedOpenParenthesisIndices: [Array.Index] = []
func removeUnpairedParenthesis(at index: Array.Index) {
characters[index] = nil
}
for (index, character) in characters.enumerated() {
switch character {
case "(":
unmatchedOpenParenthesisIndices.append(index)
case ")":
switch unmatchedOpenParenthesisIndices.popLast() {
case .some:
// This `)` was paired with a previous `(`.
break
case nil:
// This `)` was not.
removeUnpairedParenthesis(at: index)
}
default:
break
}
}
unmatchedOpenParenthesisIndices.forEach(removeUnpairedParenthesis)
return .init(characters.compacted())
}
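You can just build up characters more directly, though, and unmatchedOpenParenthesisIndices can be a Set: every index still in it belongs to a ( that comes before the current ), so it doesn't matter which one popFirst() removes.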
func minRemoveToMakeValid(_ s: String) -> String {
var characters: [Character?] = []
var unmatchedOpenParenthesisIndices: Set<Int> = []
for (index, character) in s.enumerated() {
switch character {
case "(":
unmatchedOpenParenthesisIndices.insert(index)
case ")":
switch unmatchedOpenParenthesisIndices.popFirst() {
case .some:
// This `)` was paired with a previous `(`.
break
case nil:
// This `)` was not.
characters.append(nil)
continue
}
default:
break
}
characters.append(character)
}
return .init(
characters.enumerated().compactMap {
unmatchedOpenParenthesisIndices.contains($0.offset)
? nil
: $0.element
}
)
}

How to sort an array both numerically and alphabetically in Swift

Say I have an array:
var array = ["5C", "4D", "2H", "13S", "4C", "5H"]
How would I be able to sort this array alphabetically by the last character, and then numerically by the preceding number, like this:
["4C", "5C", "4D", "2H", "5H", "13S"]
I am relatively new to coding in general and have a very basic grasp of syntax. Other searches have shown me how to sort numerically using the .sorted function and .ascendingOrder, but I couldn't find a solution that could sort both alphabetically and numerically.
EDIT:
My answer shows how to use sorted() to sort an array of strings into "numeric" order. It is not quite what the OP asked.
To the OP: You should accept vadian's answer. His was the first correct answer.
However, I spent some time in my answer explaining Swift closure syntax, so I am going to leave the answer.
You can use the array method sorted(), which takes a closure that compares pairs of objects and returns true if the first item should come first.
Then you can use the NSString method compare(options:) to do a "numeric" string comparison, where sequences of digits are treated as numbers inside the string.
Here is a working code snippet that will sort your array:
var array = ["5C", "4D", "2H", "13S", "4C", "5H"]
let sorted = array.sorted (by: { (first: String, second: String) -> Bool in
return first.compare(second, options: .numeric) == .orderedAscending
})
The function sorted() is a "higher-order function", or a function that takes another function as a parameter. For an array of strings, that function takes 2 strings and returns a Bool. It actually takes a closure rather than a function, where a closure is an "anonymous function" (a function with no name).
Adapting vadian's code that gives the CORRECT answer to my snippet, it would look like this:
var array = ["5C", "4D", "2H", "13S", "4C", "5H"]
let sorted = array.sorted (by: { (first: String, second: String) -> Bool in
if first.suffix(1) == second.suffix(1) {
return first.dropLast().compare(second.dropLast(), options: .numeric) == .orderedAscending
} else {
return first.suffix(1) < second.suffix(1)
}
})
You can rewrite the above with several shortcuts:
With a "trailing closure" you skip the () that contains the closure as a parameter and just provide the closure in braces after the function name.
You can skip the declaration of the parameters and return type of the closure, and skip the return statement:
let sorted = array.sorted { $0.compare($1, options: .numeric) == .orderedAscending }
For more complex code like vadian's that gives the correct answer, I suggest not using positional parameters like that. Named parameters like first and second make the code easier to read.
I suggest studying the chapter on Closures in Apple's Swift iBooks carefully until you understand the various ways that closures can be expressed and their different shortcut syntaxes. It's confusing at first, and using closures is fundamental to using Swift.
You have to write your own comparator, which is pretty easy in Swift.
If the last characters are the same, sort the strings without their last character numerically; otherwise, sort by the last character:
let array = ["5C", "4D", "2H", "13S", "4C", "5H"]
let sortedArray = array.sorted { (str1, str2) -> Bool in
if str1.suffix(1) == str2.suffix(1) {
return str1.dropLast().localizedStandardCompare(str2.dropLast()) == .orderedAscending
} else {
return str1.suffix(1) < str2.suffix(1)
}
}
// ["4C", "5C", "4D", "2H", "5H", "13S"]
Welcome to StackOverflow!
What do these numbers represent? I would create a struct to model that "thing" (I'll call it Thing for now), and a function that can parse a String into a Thing, like so:
struct Thing: Equatable { // FIXME: Name me something descriptive
let number: Int // FIXME: Name me something descriptive
let letter: Character // FIXME: Name me something descriptive
static func parse(from string: String) -> Thing? {
let numberSegment = string.prefix(while: { $0.isNumber })
guard !numberSegment.isEmpty,
let number = Int(numberSegment) else { return nil }
let letterSegment = string.drop(while: { $0.isNumber })
guard letterSegment.count == 1,
let letter = letterSegment.first else { return nil }
return Thing(number: number, letter: letter)
}
}
Then, you can just conform to Comparable, defining how you want things to be sorted, by defining the comparison operator <:
extension Thing: Comparable {
static func < (lhs: Thing, rhs: Thing) -> Bool {
return (lhs.letter, lhs.number) < (rhs.letter, rhs.number)
}
}
From there, it's just a matter of parsing all your strings into Things, and sorting them:
let array = ["5C", "4D", "2H", "13S", "4C", "5H"]
let things = array.map { Thing.parse(from: $0)! }
print("Before sorting:")
things.forEach { print("\t\($0)") }
let sortedThings = things.sorted()
print("\nAfter sorting:")
sortedThings.forEach { print("\t\($0)") }
Output:
Before sorting:
Thing(number: 5, letter: "C")
Thing(number: 4, letter: "D")
Thing(number: 2, letter: "H")
Thing(number: 13, letter: "S")
Thing(number: 4, letter: "C")
Thing(number: 5, letter: "H")
After sorting:
Thing(number: 4, letter: "C")
Thing(number: 5, letter: "C")
Thing(number: 4, letter: "D")
Thing(number: 2, letter: "H")
Thing(number: 5, letter: "H")
Thing(number: 13, letter: "S")
Welcome to StackOverflow!
This is my solution; I hope it works for you. I first sort by the numbers and then go through the alphabet, letter by letter, to build a new array:
var array = ["5C", "4D", "2H", "13S", "4C", "5H"]
// Compare the numbers as integers, so that "13" sorts after "2"
array = array.sorted { (Int($0.numbersValues) ?? 0) < (Int($1.numbersValues) ?? 0) }
let str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
var newArray: [String] = []
for letter in str {
for value in array {
if value.lettersValues.hasPrefix(String(letter)) {
newArray.append(value)
}
}
}
Don't forget to include these helper methods in your project:
extension String {
var lettersValues: String {
return self.components(separatedBy: CharacterSet.decimalDigits).joined()
}
var numbersValues: String {
return self.filter { "0"..."9" ~= $0 }
}
}

How to parse an octal string as a float in Rust?

I need to take an octal string, such as "42.1", and get a float from it (34.125). What's the best way to do this in Rust? I see there previously was a from_str_radix function, but it's now removed.
use std::fmt;
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseFloatError {
_private: (),
}
impl ParseFloatError {
fn new() -> ParseFloatError {
ParseFloatError { _private: () }
}
}
impl fmt::Display for ParseFloatError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "Could not parse float")
}
}
pub fn parse_float_radix(s: &str, radix: u32) -> Result<f64, ParseFloatError> {
let s2 = s.replace(".", "");
let i = i64::from_str_radix(&s2, radix).map_err(|_| ParseFloatError::new())?;
let count = s.split('.').count();
let fraction_len: usize;
match count {
0 => unreachable!(),
1 => fraction_len = 0,
2 => fraction_len = s.split('.').last().unwrap().len(),
_ => return Err(ParseFloatError::new()),
}
let f = (i as f64) / f64::from(radix).powi(fraction_len as i32);
Ok(f)
}
fn main() {
println!("{}", parse_float_radix("42.1", 8).unwrap());
}
It first parses the input as an integer and then divides it by radix^number_of_fractional_digits.
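As a worked example of that arithmetic for the input from the question:

```latex
42.1_8 \;\to\; 421_8 = 4 \cdot 8^2 + 2 \cdot 8 + 1 = 273, \qquad \frac{273}{8^{1}} = 34.125
```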
It doesn't support scientific notation or special values like infinity or NaN. It also fails if the intermediate integer overflows.
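If the intermediate overflow matters, one alternative (not the answer's approach; `parse_float_radix_f64` is a hypothetical name) is to accumulate the digits directly in an `f64` using `char::to_digit`. Long inputs then lose precision instead of failing, but the result is not guaranteed to be correctly rounded, so a dedicated crate remains the safer choice when exact results matter.

```rust
/// Hypothetical helper: parses "int.frac" in the given radix by accumulating
/// digits directly in f64, so long inputs lose precision instead of overflowing.
/// Returns None on invalid input. No exponents, infinity or NaN.
fn parse_float_radix_f64(s: &str, radix: u32) -> Option<f64> {
    if !(2..=36).contains(&radix) {
        return None; // char::to_digit panics for radix > 36
    }
    // Optional leading sign.
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    // Split at the first '.'; a second '.' ends up in frac_part and is rejected below.
    let mut parts = body.splitn(2, '.');
    let int_part = parts.next().unwrap_or("");
    let frac_part = parts.next().unwrap_or("");
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    let base = f64::from(radix);
    let mut value = 0.0;
    // Integer digits: value = value * radix + digit.
    for c in int_part.chars() {
        value = value * base + f64::from(c.to_digit(radix)?);
    }
    // Fractional digits: the k-th digit is worth digit * radix^-k.
    let mut scale = 1.0 / base;
    for c in frac_part.chars() {
        value += f64::from(c.to_digit(radix)?) * scale;
        scale /= base;
    }
    Some(if negative { -value } else { value })
}
```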
Since posting this question, a crate has appeared that solves this: lexical. Compiling with the radix feature enables a parse_radix function, which can parse strings into floats with radices from 2 to 36.

Swift: Remove specific characters of a string only at the beginning

I was looking for an answer but haven't found one yet, so:
For example, if I have a string like "#blablub" and want to remove the # at the beginning, I can simply remove the first character. But if I have a string like "#####bla#blub" and only want to remove the # characters at the beginning, I have no idea how to solve that.
My goal is to get a string like "bla#blub"; otherwise it would be too easy with replacingOccurrences(of:with:)...
I hope you can help.
Swift2
func ltrim(str: String, _ chars: Set<Character>) -> String {
if let index = str.characters.indexOf({!chars.contains($0)}) {
return str[index..<str.endIndex]
} else {
return ""
}
}
Swift3
func ltrim(_ str: String, _ chars: Set<Character>) -> String {
if let index = str.characters.index(where: {!chars.contains($0)}) {
return str[index..<str.endIndex]
} else {
return ""
}
}
Usage:
ltrim("#####bla#blub", ["#"]) //->"bla#blub"
var str = "###abc"
while str.hasPrefix("#") {
str.remove(at: str.startIndex)
}
print(str)
I recently built an extension to String that will "clean" a string from the start, end, or both, and allow you to specify a set of characters which you'd like to get rid of. Note that this will not remove characters from the interior of the String, but it would be relatively straightforward to extend it to do that. (NB built using Swift 2)
enum stringPosition {
case start
case end
case all
}
extension String {
func trimCharacters(charactersToTrim: Set<Character>, usingStringPosition: stringPosition) -> String {
// Trims any characters in the specified set from the start, end or both ends of the string
guard self != "" else { return self } // Nothing to do
var outputString : String = self
if usingStringPosition == .end || usingStringPosition == .all {
// Remove the characters from the end of the string
while outputString.characters.last != nil && charactersToTrim.contains(outputString.characters.last!) {
outputString.removeAtIndex(outputString.endIndex.advancedBy(-1))
}
}
if usingStringPosition == .start || usingStringPosition == .all {
// Remove the characters from the start of the string
while outputString.characters.first != nil && charactersToTrim.contains(outputString.characters.first!) {
outputString.removeAtIndex(outputString.startIndex)
}
}
return outputString
}
}
A regex-less solution would be:
func removePrecedingPoundSigns(s: String) -> String {
for (index, char) in s.characters.enumerate() {
if char != "#" {
return s.substringFromIndex(s.startIndex.advancedBy(index))
}
}
return ""
}
A Swift 3 extension based on OOPer's answer:
extension String {
func leftTrim(_ chars: Set<Character>) -> String {
if let index = self.characters.index(where: {!chars.contains($0)}) {
return self[index..<self.endIndex]
} else {
return ""
}
}
}
As Martin R already pointed out in a comment above, a regular expression is appropriate here:
myString.replacingOccurrences(of: #"^#+"#, with: "", options: .regularExpression)
You can replace the inner # with any symbol you're looking for, or you can get more complicated if you're looking for one of several characters or a group etc. The ^ indicates it's the start of the string (so you don't get matches for # symbols in the middle of the string) and the + represents "1 or more of the preceding character". (* is 0 or more but there's not much point in using that here.)
Note the outer hash symbols are to turn the string into a raw String so escaping is not needed (though I suppose there's nothing that actually needs to be escaped in this particular example).
To play around with regex I recommend: https://regexr.com/
