Parsing camel case strings with nom

I want to parse a string like "ParseThis" or "parseThis" into a vector of strings like ["Parse", "This"] or ["parse", "this"] using the nom crate.
None of my attempts return the expected result. It's possible that I don't yet understand how to use all the functions in nom.
I tried:
named!(camel_case<(&str)>,
map_res!(
take_till!(is_not_uppercase),
std::str::from_utf8));
named!(p_camel_case<&[u8], Vec<&str>>,
many0!(camel_case));
But p_camel_case just returns an Error(Many0) when parsing a string that starts with an uppercase letter; for a string that starts with a lowercase letter it returns Done, but with an empty string as the result.
How can I tell nom that I want to parse the string, separated by uppercase letters (given there can be a first uppercase or lowercase letter)?

You are looking for things that start with any character, followed by a number of non-uppercase letters. As a regex, that would look akin to .[a-z]*. Translated directly to nom, that's something like:
#[macro_use]
extern crate nom;
use nom::anychar;
fn is_uppercase(a: u8) -> bool { (a as char).is_uppercase() }
named!(char_and_more_char<()>, do_parse!(
anychar >>
take_till!(is_uppercase) >>
()
));
named!(camel_case<(&str)>, map_res!(recognize!(char_and_more_char), std::str::from_utf8));
named!(p_camel_case<&[u8], Vec<&str>>, many0!(camel_case));
fn main() {
println!("{:?}", p_camel_case(b"helloWorld"));
// Done([], ["hello", "World"])
println!("{:?}", p_camel_case(b"HelloWorld"));
// Done([], ["Hello", "World"])
}
Of course, you probably need to be careful about correctly matching non-ASCII bytes, since is_uppercase here inspects one byte at a time, but you should be able to extend this in a straightforward manner.
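For the non-ASCII caveat, here is a minimal sketch that does the same split without nom, using only the standard library and working on chars instead of bytes. The function name split_camel_case is made up for illustration, and this is not a drop-in replacement for the nom parser above, only the same ".[^A-Z]*"-style rule applied to a &str:

```rust
/// Splits a camelCase or PascalCase string into its words.
/// A new word starts at every uppercase character (Unicode-aware).
/// Hypothetical helper, not part of nom.
fn split_camel_case(input: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0;
    // char_indices yields byte offsets that always fall on char boundaries,
    // so slicing with them is safe for multi-byte characters.
    for (idx, ch) in input.char_indices() {
        // Begin a new word at each uppercase char, except at the very start.
        if ch.is_uppercase() && idx > start {
            words.push(&input[start..idx]);
            start = idx;
        }
    }
    // Push whatever remains after the last uppercase char.
    if start < input.len() {
        words.push(&input[start..]);
    }
    words
}
```

Like the nom version, a run of capitals such as "ABC" comes out as one word per letter, because every uppercase character starts a new word.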

Related

Dart and Flutter: How can I substitute invisible control characters in a String with e.g. \n?

I download an XML file in my flutter app and convert it into Dart Objects that I later want to serialize with JSON. Since JSON does not accept any invisible carriage return characters, I am looking for a way to substitute those with \n.
Based on your question, why not use Dart's String replaceAll method?
With a simple RegExp you could replace all the carriage returns.
You can pass a String to the jsonEncode() function from the dart:convert library, and it will automatically replace each newline with the two-character sequence \ followed by n (and will quote the string).
You can convert a value to and from JSON with jsonEncode() and jsonDecode(), and you can declare the variable with var:
import 'dart:convert';
void main() {
var string = {
'a': 'Indication\n',
'b': 'Indication\t',
'c': 1
};
var enCode = json.encode(string);
print(enCode); // {"a":"Indication\n","b":"Indication\t","c":1}
print(jsonDecode(enCode)); // {a: Indication
// , b: Indication	, c: 1}
}

What's the best way to transform an Array of type Character to a String in Swift?

This question is specifically about converting an Array of type Character to a String. Converting an Array of Strings or numbers to a string is not the topic of discussion here.
In the following 2 lines, I would expect myStringFromArray to be set to "C,a,t,!,🐱"
var myChars: [Character] = ["C", "a", "t", "!", "🐱"]
let myStringFromArray = myChars.joinWithSeparator(",");
However, I can't execute that code because the compiler complains about an "ambiguous reference to member joinWithSeparator".
So, two questions:
1) Apple says,
"Every instance of Swift’s Character type represents a single extended
grapheme cluster. An extended grapheme cluster is a sequence of one or
more Unicode scalars that (when combined) produce a single
human-readable character."
Which to me sounds at least homogeneous enough to think it would be reasonable to implement the joinWithSeparator method to support the Character type. So, does anyone have a good answer as to why they don't do that???
2) What's the best way to transform an Array of type Character to a String in Swift?
Note: if you don't want a separator between the characters, the solution would be:
let myStringFromArray = String(myChars)
and that would give you "Cat!🐱"
Which to me sounds at least homogeneous enough to think it would be reasonable to implement the joinWithSeparator method to support the Character type. So, does anyone have a good answer as to why they don't do that???
This may be an oversight in the design. This error occurs because there are two possible candidates for joinWithSeparator(_:). I suspect this ambiguity exists because of the way Swift can implicitly interpret double quotes as either String or Character. In this context, it's ambiguous as to which to choose.
The first candidate is joinWithSeparator(_: String) -> String. It does what you're looking for.
If the separator is treated as a String, this candidate is picked, and the result would be: "C,a,t,!,🐱"
The second is joinWithSeparator<Separator : SequenceType where Separator.Generator.Element == Generator.Element.Generator.Element>(_: Separator) -> JoinSequence<Self>. It's called on a Sequence of Sequences, and given a Sequence as a separator. The method signature is a bit of a mouthful, so let's break it down. The argument to this function is of Separator type. This Separator is constrained to be a SequenceType where the elements of the sequence (Separator.Generator.Element) must have the same type as the elements of this sequence of sequences (Generator.Element.Generator.Element).
The point of that complex constraint is to ensure that the Sequence remains homogeneous. You can't join sequences of Int with sequences of Double, for example.
If the separator is treated as a Character, this candidate is picked, the result would be: ["C", ",", "a", ",", "t", ",", "!", ",", "🐱"]
The compiler throws an error to ensure you're aware that there's an ambiguity. Otherwise, the program might behave differently than you'd expect.
You can disambiguate this situation by explicitly making each Character into a String. Because String is NOT a SequenceType, the #2 candidate is no longer possible.
var myChars: [Character] = ["C", "a", "t", "!", "🐱"]
var anotherVar = myChars.map(String.init).joinWithSeparator(",")
print(anotherVar) //C,a,t,!,🐱
This answer assumes Swift 2.2.
var myChars: [Character] = ["C", "a", "t", "!", "🐱"]
var myStrings = myChars.map({String($0)})
var result = myStrings.joinWithSeparator(",")
joinWithSeparator is only available on String arrays:
extension SequenceType where Generator.Element == String {
/// Interpose the `separator` between elements of `self`, then concatenate
/// the result. For example:
///
/// ["foo", "bar", "baz"].joinWithSeparator("-|-") // "foo-|-bar-|-baz"
@warn_unused_result
public func joinWithSeparator(separator: String) -> String
}
You could create a new extension to support Characters:
extension SequenceType where Generator.Element == Character {
@warn_unused_result
public func joinWithSeparator(separator: String) -> String {
var str = ""
self.enumerate().forEach({
str.append($1)
if let arr = self as? [Character], endIndex: Int = arr.endIndex {
if $0 < endIndex - 1 {
str.appendContentsOf(separator) // Character(separator) would trap for a multi-character separator
}
}
})
return str
}
}
var myChars: [Character] = ["C", "a", "t", "!", "🐱"]
let charStr = myChars.joinWithSeparator(",") // "C,a,t,!,🐱"
Related discussion on Code Review.SE.
Context: Swift3(beta)
TL;DR Goofy Solution
var myChars:[Character] = ["C", "a", "t", "!", "🐱"]
let separators = repeatElement(Character(","), count: myChars.count)
let zipped = zip(myChars, separators).lazy.flatMap { [$0, $1] }
let joined = String(zipped.dropLast())
Exposition
OK. This drove me nuts. In part because I got caught up in the join semantics. A join method is very useful, but when you back away from its very specific (yet common) case of string concatenation, it's doing two things at once. It's splicing other elements in with the original sequence, and then it's flattening the 2 deep array of characters (array of strings) into one single array (string).
The OP's use of single characters in an Array sent my brain elsewhere. The answers given above are the simplest way to get what was desired. Convert the single characters to single character strings and then use the join method.
If you want to consider the two pieces separately though... We start with the original input:
var myChars:[Character] = ["C", "a", "t", "!", "🐱"]
Before we can splice our characters with separators, we need a collection of separators. In this case, we want a pseudo collection that is the same thing repeated again and again, without having to actually make any array with that many elements:
let separators = repeatElement(Character(","), count: myChars.count)
This returns a Repeated object (which oddly enough you cannot instantiate with a regular init method).
Now we want to splice/weave the original input with the separators:
let zipped = zip(myChars, separators).lazy.flatMap { [$0, $1] }
The zip function returns a Zip2Sequence (which, curiously, must also be instantiated via a free function rather than a direct initializer). By itself, when enumerated, the Zip2Sequence just enumerates paired tuples of (eachSequence1, eachSequence2). The flatMap expression turns that into a single series of alternating elements from the two sequences.
For large inputs, this would create a largish intermediary sequence, just to be soon thrown away. So we insert the lazy accessor in there which lets the transform only be computed on demand as we're accessing elements from it (think iterator).
Finally, we know we can make a String from just about any sort of Character sequence. So we just pass this directly to the String creation. We add a dropLast() to avoid the last comma being added.
let joined = String(zipped.dropLast())
The valuable thing about decomposing it this way (it's definitely more lines of code, so there had better be a redeeming value), is that we gain insight into a number of tools we could use to solve problems similar, but not identical, to join. For example, say we want the trailing comma? Joined isn't the answer. Suppose we want a non constant separator? Just rework the 2nd line. Etc...
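For comparison, the same two-step decomposition (splice in the separators, then flatten and drop the trailing one) can be sketched in Rust. This is only an illustration of the idea under the assumption of a single-character separator; join_chars is a made-up name, not a standard library function:

```rust
use std::iter;

/// Interleaves `sep` between the characters and collects into a String.
/// Hypothetical helper mirroring the zip / flatMap / dropLast steps.
fn join_chars(chars: &[char], sep: char) -> String {
    let mut joined: String = chars
        .iter()
        // Pair each character with a separator from an endless repeat.
        .zip(iter::repeat(sep))
        // Flatten each (char, separator) pair into two consecutive items.
        .flat_map(|(&c, s)| [c, s])
        .collect();
    // Drop the trailing separator (a no-op for empty input).
    joined.pop();
    joined
}
```

As in the Swift version, the iterator chain is lazy, so no intermediate array of pairs is built; only the final String is allocated. Keeping the trailing separator is a matter of deleting the pop() call.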

How to modify the word boundary so that it includes '-' as a word character

I'd like to capture a passcode that is between 6 and 8 digits long.
I'd like to match:
123-4567 and
12-34-56-78
And fail:
1234567890 and 123-456-7890
As it stands I'm using (\\b(?:\\d[-,\\h]?+){5,7}\\d\\b)
This successfully rejects 1234567890, but gives a partial match on 123-456-7890. Is there a way for the word boundary to include hyphens within its count?
You can use lookarounds:
(?<!-)\b\d(?:[-,\h]?\d){5,7}(?!-)\b
See the regex demo
Swift regex uses ICU flavor, so both the lookbehind and a lookahead will work. The (?<!-) lookbehind makes sure there is no - before the digit that starts a new word (or after a word boundary), and (?!-) lookahead makes sure there is no - after the 8th digit right at the word boundary.
Do not forget to use double backslashes.
As @AlanMoore suggests, the word boundaries and the hyphen lookarounds can be replaced with the lookarounds (?<![\w-]) and (?![\w-]). This will make the regex a bit more efficient, since only one check is needed at the start and one at the end:
(?<![\w-])\d(?:[-,\h]?\d){5,7}(?![\w-])
See another demo
Not an exact literal answer, but an alternative native Swift solution:
enum CheckResult {
case Success(String), Failure
}
func checkPassCode(string : String) -> CheckResult
{
let filteredArray = string.characters.filter{ $0 != "-" }.map{ String($0) }
return (6...8).contains(filteredArray.count) ? .Success(filteredArray.joinWithSeparator("")) : .Failure
}
checkPassCode("123-4567") // Success(1234567)
checkPassCode("12-34-56-78") // Success(12345678)
checkPassCode("1234567890") // Failure
checkPassCode("123-456-7890") // Failure
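The same non-regex idea (strip the hyphens, then count what is left) can be sketched in Rust as well. Note one deliberate difference, which is an assumption on my part and not something the Swift code above does: this version also rejects input that contains anything other than ASCII digits and hyphens. The name check_passcode is hypothetical:

```rust
/// Returns the bare digits if `input`, with hyphens removed, consists of
/// 6 to 8 ASCII digits; otherwise returns None.
/// Hypothetical helper; the digit check is an added assumption.
fn check_passcode(input: &str) -> Option<String> {
    // Remove the hyphen separators.
    let digits: String = input.chars().filter(|&c| c != '-').collect();
    // Valid only if the length is in range and every char is a digit.
    let valid = (6..=8).contains(&digits.len())
        && digits.chars().all(|c| c.is_ascii_digit());
    if valid {
        Some(digits)
    } else {
        None
    }
}
```

Like the Swift function, this validates a whole string rather than finding passcodes inside longer text, so it does not replace the lookaround-based regex when searching.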

String literal as argument for func within println?

Is there any way to use a string literal as an argument to a function within a println statement?
func greetings(name: String) -> String {
return "Greetings \(name)!"
}
What I was trying to do: (I tried escaping the quotes around Earthling.)
println("OUTPUT: \(greetings("Earthling"))")
You can alternatively do this:
let name = "Earthling"
println("OUTPUT: \(greetings(name))")
And this works too:
println(greetings("Earthling"))
I tried escaping the quotes in the first example, but with no luck. It's not super important, as it's only a test; I was just curious whether there is a way to call a function with a string literal as an argument inside a print or println statement that contains other text.
From the Apple docs:
The expressions you write inside parentheses within an interpolated
string cannot contain an unescaped double quote (") or backslash (\),
and cannot contain a carriage return or line feed.
The problem is of course not with println but with the embedding of expressions with quotes in string literals.
Thus
let b = false
let s1 = b ? "is" : "isn't"
let s2 = "it \(b ? "is" : "isn't")" // won't compile
However, NSLog as a one-liner works quite well here:
NSLog("it %@", b ? "is" : "isn't")
Note %@, not %s. Try the latter in a playground to see why.

How do I compare two characters in Dart?

I want to compare two characters. Something like this:
if ('a' > 'b')
However, the above code is comparing two strings.
How do I do this in Dart?
Dart doesn't have a 'char' or 'character' type. You can get the UTF-16 character code from any point in a string, and compare that.
Use codeUnitAt to get the actual character code from a string.
if ('a'.codeUnitAt(0) > 'b'.codeUnitAt(0))
See the codeUnitAt docs: https://api.dartlang.org/docs/channels/stable/latest/dart_core/String.html#codeUnitAt
String in Dart implements the Comparable interface. You can use compareTo to compare them.
String a = 'a';
String b = 'b';
String c = 'a';
print('value: ${a.compareTo(b)}'); // prints "value: -1"
print('value: ${a.compareTo(c)}'); // prints "value: 0"
print('value: ${b.compareTo(a)}'); // prints "value: 1"
