Swift 4.2 extract substring using multiple characters as delimiter - ios

I'm new to Swift, and after going through the Apple documentation and other sources it is still not clear to me how I can extract a substring using more than one character as the delimiter. For example, I have a string which looks like:
A.1 value1
B.2 value2
E value3
C value4
and need to assign the values 1 - 4 to different variables.

• Possible solution:
1. Separate all the elements (separator: white space)
2. Iterate 2 by 2 and use a key/value system, like a Dictionary.
3. Read each value from its key afterward
Step 1:
let string = "A.1 value1 B.2 value2 E value3 C value4"
let components = string.components(separatedBy: CharacterSet.whitespaces)
Step 2:
var dictionary: [String: String] = [:]
stride(from: 0, to: components.count - 1, by: 2).forEach({
dictionary[components[$0]] = components[$0+1]
})
or
let dictionary = stride(from: 0, to: components.count - 1, by: 2).reduce(into: [String: String]()) { (result, currentInt) in
result[components[currentInt]] = components[currentInt+1]
}
dictionary is ["A.1": "value1", "C": "value4", "E": "value3", "B.2": "value2"]
Inspiration for the stride(from:to:) that I rarely use.
Step 3:
let name = dictionary["A.1"]
let surname = dictionary["C"]
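As a small variation on steps 1 to 3, here is a minimal sketch that wraps the pairing in a function and uses split, which ignores repeated spaces. The name keyValuePairs(from:) is my own helper for illustration, not something from Foundation, and it still assumes that no value contains a space.

```swift
import Foundation

// Hypothetical helper: pairs up tokens as key, value, key, value, ...
func keyValuePairs(from string: String) -> [String: String] {
    // split drops the empty pieces that consecutive spaces would otherwise produce
    let parts = string.split(separator: " ").map(String.init)
    var result = [String: String]()
    // Stop one short of the end so a trailing key without a value is ignored
    for index in stride(from: 0, to: parts.count - 1, by: 2) {
        result[parts[index]] = parts[index + 1]
    }
    return result
}

let pairs = keyValuePairs(from: "A.1 value1 B.2 value2 E value3 C value4")
print(pairs["A.1"] ?? "missing") // value1
```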
• Potential issues:
If you have:
let string = "A.1 value One B.2 value2 E value3 C value4"
You want "value One", but since it contains a space, which is also the separator, you'll get a wrong result.
You'll get: ["A.1": "value", "One": "B.2", "value2": "E", "value3": "C"] for dictionary.
So you could use a regex instead, for instance A\.1(.*)B\.2(.*)E(.*)C(.*) (with the dots escaped so that they match a literal "." and not any character).
let string = "A.1 value One B.2 value2 E value3 C value4"
let regex = try! NSRegularExpression(pattern: "A\\.1(.*)B\\.2(.*)E(.*)C(.*)", options: [])
regex.enumerateMatches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count)) { (result, flags, stop) in
guard let result = result,
let aValueRange = Range(result.range(at: 1), in: string),
let bValueRange = Range(result.range(at: 2), in: string),
let cValueRange = Range(result.range(at: 4), in: string),
let eValueRange = Range(result.range(at: 3), in: string) else { return }
let aValue = string[aValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("aValue: \(aValue)")
let bValue = string[bValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("bValue: \(bValue)")
let cValue = string[cValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("cValue: \(cValue)")
let eValue = string[eValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("eValue: \(eValue)")
}
Output:
$>aValue: value One
$>bValue: value2
$>cValue: value4
$>eValue: value3
Note that the trim could be inside the regex, but I don't especially like having too complex regexes.
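For what it's worth, here is a rough sketch of what moving the trim into the regex could look like. The function name captureValues(in:) is hypothetical, and the pattern assumes, as above, that the delimiters A.1, B.2, E and C never occur inside a value. The lazy groups (.*?) plus the surrounding \s* keep the whitespace out of the captures.

```swift
import Foundation

// Hypothetical helper: returns the four values in the order A.1, B.2, E, C,
// or an empty array when the string does not match the pattern.
func captureValues(in string: String) -> [String] {
    let pattern = "A\\.1\\s*(.*?)\\s*B\\.2\\s*(.*?)\\s*E\\s*(.*?)\\s*C\\s*(.*?)\\s*$"
    let fullRange = NSRange(string.startIndex..<string.endIndex, in: string)
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: string, options: [], range: fullRange)
    else { return [] }
    // Capture group 0 is the whole match, so start at 1
    return (1..<match.numberOfRanges).compactMap { index in
        Range(match.range(at: index), in: string).map { String(string[$0]) }
    }
}

print(captureValues(in: "A.1 value One B.2 value2 E value3 C value4"))
// ["value One", "value2", "value3", "value4"]
```

Whether this reads better than trimming afterwards is a matter of taste; the pattern is noticeably harder to scan.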

I like regular expressions for this sort of thing.
I'm going to take you very literally and assume that the substrings to be found are preceded by "A.1", "B.2", "E", and "C", and are all preceded and followed by a space except for the last substring which is followed by the end of the original string. Moreover I'm going to assume very simple-mindedly that the delimiters such as "E" cannot appear in our string in any other way. Then we can capture each substring with an appropriate pattern:
let s = "A.1 harpo B.2 chico E zeppo C groucho"
let p1 = "^A\\.1 (.*) B\\.2 "
let p2 = " B\\.2 (.*) E "
let p3 = " E (.*) C "
let p4 = " C (.*)$"
let patts = [p1,p2,p3,p4]
var result = [String]()
for patt in patts {
let regex = try! NSRegularExpression(pattern: patt, options: [])
if let match = regex.firstMatch(in: s, options: [],
range: NSRange(s.startIndex..<s.endIndex, in: s)) {
let r = match.range(at: 1)
result.append((s as NSString).substring(with: r))
}
}
// result is now ["harpo", "chico", "zeppo", "groucho"]
We now have the four desired substrings extracted into an array, and dealing with them from there is trivial.
Observe that we make no assumptions about spaces. The above works perfectly well even if the target substrings contain spaces, because we are appealing only to the delimiters. For example, if the original string is
let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
then result is the array
["the rain", "in spain", "stays mainly", "in the plain"]
I should point out, however, that another way to do this sort of thing is to walk the original string with a Scanner. You might prefer this because regular expressions are not really needed here, and if you don't know regular expressions you'll find this kind of walk much clearer. So here it is rewritten to use a scanner. Note that we end up with four Optional NSString objects, because Scanner is actually an Objective-C Cocoa Foundation thing, but it isn't difficult to turn those into String objects as needed:
let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
let scan = Scanner(string: s)
scan.scanString("A.1 ", into: nil)
var r1 : NSString? = nil
scan.scanUpTo(" B.2 ", into: &r1)
scan.scanString("B.2 ", into: nil)
var r2 : NSString? = nil
scan.scanUpTo(" E ", into: &r2)
scan.scanString("E ", into: nil)
var r3 : NSString? = nil
scan.scanUpTo(" C ", into: &r3)
scan.scanString("C ", into: nil)
var r4 : NSString? =
(scan.string as NSString).substring(from: scan.scanLocation) as NSString
r1 // the rain
r2 // in spain
r3 // stays mainly
r4 // in the plain
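If you would rather stay with Swift String values throughout, the same walk can be sketched with range(of:range:) instead of Scanner. This is only an illustration under the same assumptions as above; values(in:after:) is a made-up helper name, and the delimiters are passed in with their surrounding spaces.

```swift
import Foundation

// Hypothetical helper: returns the text that follows each delimiter, in order,
// or nil if one of the delimiters cannot be found.
func values(in string: String, after delimiters: [String]) -> [String]? {
    var results = [String]()
    var valueStart: String.Index? = nil
    var searchStart = string.startIndex
    for delimiter in delimiters {
        // Only look in the part of the string we have not walked yet
        guard let found = string.range(of: delimiter, range: searchStart..<string.endIndex) else {
            return nil
        }
        // Everything between the previous delimiter and this one is a value
        if let start = valueStart {
            results.append(string[start..<found.lowerBound].trimmingCharacters(in: .whitespaces))
        }
        valueStart = found.upperBound
        searchStart = found.upperBound
    }
    // The last value runs to the end of the string
    if let start = valueStart {
        results.append(string[start...].trimmingCharacters(in: .whitespaces))
    }
    return results
}

let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
print(values(in: s, after: ["A.1 ", " B.2 ", " E ", " C "]) ?? [])
// ["the rain", "in spain", "stays mainly", "in the plain"]
```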

Related

Convert placeholders such as %1$s to {x} in Swift

I'm parsing an XML doc (using XMLParser) and some of the values have PHP-like placeholders, e.g. %1$s, which I would like to convert to {x-1}.
Examples:
%1$s ---> {0}
%2$s ---> {1}
I'm doing this in a seemingly hacky way, using regex:
But there must be a better implementation of this regex.
Consider a string:
let str = "lala fawesfgeksgjesk 3rf3f %1$s rk32mrk3mfa %2$s fafafczcxz %3$s czcz $#$##%## %4$s qqq %5$s"
Now we're going to extract the integer strings between strings % and $s:
let regex = try! NSRegularExpression(pattern: "(?<=%)[^$s]+")
let range = NSRange(location: 0, length: str.utf16.count)
let matches = regex.matches(in: str, options: [], range: range)
matches.map {
print(String(str[Range($0.range, in: str)!]))
}
This mostly works. The issue is that the "4" value got mixed up because of the random characters preceding %4$s.
Prints:
1
2
3
## %4
5
Is there any better way to do this?
This might not be a very efficient (or Swifty :)) way, but it gets the job done. It searches for the given regex, uses the matched substring to extract the numeric value and decrement it, and then does a simple replace of the substring with a newly constructed placeholder. This is repeated in a loop until no more matches are found. Note that str has to be declared with var rather than let.
let pattern = #"%(\d+)\$s"# // \d+ rather than \d*, so a digit-less "%$s" can never match and stall the loop
while let range = str.range(of: pattern, options: .regularExpression) {
    let placeholder = str[range]
    // Strip everything that is not a digit or a dot from both ends, leaving only the number
    let number = placeholder.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)
    if let value = Int(number) {
        str = str.replacingOccurrences(of: placeholder, with: "{\(value - 1)}")
    }
}
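A single-pass alternative, sketched here with NSRegularExpression, collects all matches first and then replaces them from back to front so that the earlier ranges stay valid. The function name convertPlaceholders(in:) is my own, and the sketch assumes every placeholder has the exact form %N$s.

```swift
import Foundation

// Hypothetical helper: turns %1$s, %2$s, ... into {0}, {1}, ...
func convertPlaceholders(in input: String) -> String {
    let regex = try! NSRegularExpression(pattern: #"%(\d+)\$s"#)
    let fullRange = NSRange(input.startIndex..<input.endIndex, in: input)
    var output = input
    // Replace the last match first: text before a match is never touched,
    // so the UTF-16 offsets found in input are still correct in output.
    for match in regex.matches(in: input, options: [], range: fullRange).reversed() {
        guard let whole = Range(match.range, in: output),
              let digits = Range(match.range(at: 1), in: output),
              let number = Int(output[digits]) else { continue }
        output.replaceSubrange(whole, with: "{\(number - 1)}")
    }
    return output
}

print(convertPlaceholders(in: "czcz $#$##%## %4$s qqq %5$s"))
// czcz $#$##%## {3} qqq {4}
```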

How to put and sort word in NSCountedSet in swift?

I'm trying to get the most duplicated words from a string with this code.
let text = """
aa bb aa bb aa bb cc dd dd cc zz zz cc dd zz
"""
let words = text.unicodeScalars.split(omittingEmptySubsequences: true, whereSeparator: { !CharacterSet.alphanumerics.contains($0) })
.map { String($0) }
let wordSet = NSCountedSet(array: words)
let sorted = wordSet.sorted { wordSet.count(for: $0) > wordSet.count(for: $1) }
print(sorted.prefix(3))
result is
[cc, dd, aa]
Currently, it puts all words into the set, even single-character ones.
What I want to do is:
only put words that have more than one character into the NSCountedSet;
if words in the NSCountedSet have the same count, sort them alphabetically.
(The desired result is aa, cc, dd.)
And, if it is possible:
omit certain parts of speech from the string, such as 'and', 'a', 'how', 'of', 'to', 'it', 'in', 'on', 'who', etc.
Let's consider this string:
let text = """
She was young the way an actual young person is young.
"""
You could use a linguistic tagger :
import Foundation // NSLinguisticTagger is part of Foundation
let options = NSLinguisticTagger.Options.omitWhitespace.rawValue
let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemes(forLanguage: "en"), options: Int(options))
To count the multiplicity of each word I'll be using a dictionary:
var dict = [String : Int]()
Let's define the accepted linguistic tags (you can change these to your liking):
let acceptedtags: Set = ["Verb", "Noun", "Adjective"]
Now let's parse the string, using the linguistic tagger :
let range = NSRange(location: 0, length: text.utf16.count)
tagger.string = text
tagger.enumerateTags(
in: range,
scheme: .nameTypeOrLexicalClass,
options: NSLinguisticTagger.Options(rawValue: options),
using: { tag, tokenRange, sentenceRange, stop in
guard let range = Range(tokenRange, in: text)
else { return }
let token = String(text[range]).lowercased()
if let tagValue = tag?.rawValue,
acceptedtags.contains(tagValue)
{
dict[token, default: 0] += 1
}
// print(String(describing: tag) + ": \(token)")
})
Now the dict has the desired words with their multiplicity
print("dict =", dict)
As you can see, a Dictionary is an unordered collection. Now let's introduce some law and order:
let ordered = dict.sorted {
($0.value, $1.key) > ($1.value, $0.key)
}
Now let's get the keys 🗝 only:
let mostFrequent = ordered.map { $0.key }
and print the three most frequent words :
print("top three =", mostFrequent.prefix(3))
To get the topmost frequent words, it would be more efficient to use a Heap (or a Trie) data structure, instead of having to hash every word, sort them all by frequency, and then prefixing. It should be a fun exercise 😉.
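For the literal requirements in the question (skip one-character words, break ties alphabetically, drop a fixed list of words), a plain dictionary may already be enough. The sketch below is only an illustration: topWords(in:limit:stopWords:) is a made-up name, the sample input is mine rather than the asker's, and a fixed stop-word set is a much cruder filter than the tagger.

```swift
import Foundation

// Hypothetical helper: most frequent words, ties broken alphabetically.
func topWords(in text: String, limit: Int, stopWords: Set<String> = []) -> [String] {
    let words = text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        // Drop empty pieces, one-character words and stop words
        .filter { $0.count > 1 && !stopWords.contains($0) }
    var counts = [String: Int]()
    for word in words {
        counts[word, default: 0] += 1
    }
    let ordered = counts.sorted { lhs, rhs in
        // Higher count first; for equal counts, alphabetical order
        lhs.value != rhs.value ? lhs.value > rhs.value : lhs.key < rhs.key
    }
    return ordered.prefix(limit).map { $0.key }
}

print(topWords(in: "aa b aa b aa b cc dd dd cc zz zz cc dd zz", limit: 3))
// ["aa", "cc", "dd"]
```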

extract only words from sentence which includes numbers

I am using TesseractOCR to read a receipt, and I have managed to extract the text from the receipt line by line, e.g.
2 melon £3.00
1 lime £1.50
5 chicken wings £10.00
But now, for each line, I would like to extract the item name (melon, lime, chicken wings), then the integer, and then the float, all separately. I have googled a lot and have written this in Ruby using regex, but I can't figure out how to do it in Swift. I have figured out the float and integer parts, just not the words-only part.
A link to an existing answer, or an answer, would be great. Thanks in advance for any help.
If you have solved this using regex in Ruby, the solution in Swift is similar. First let's define some helper functions since NSRegularExpression still deals in NSRange units:
extension String {
    // NSRange counts UTF-16 code units, so build it from the string's own indices
    var fullRange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
    subscript(range: NSRange) -> String {
        // Convert back to a Swift range; return "" if the NSRange does not map onto this string
        guard let swiftRange = Range(range, in: self) else { return "" }
        return String(self[swiftRange])
    }
}
And the code:
let text =
"2 melon £3.00\n" +
"1 lime £1.50\n" +
"5 chicken wings £10.00"
let regex = try! NSRegularExpression(pattern: "(\\d+)\\s+(.+?)\\s+£([\\d\\.]+)$", options: [.anchorsMatchLines])
regex.enumerateMatches(in: text, options: [], range: text.fullRange) { result, flag, stop in
if let result = result {
let r1 = result.range(at: 1)
let r2 = result.range(at: 2)
let r3 = result.range(at: 3)
print("quantity = \(text[r1]), item = \(text[r2]), price = \(text[r3])")
}
}
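If you want typed values rather than printed strings, the same idea can be sketched as a function that returns a small struct per line. ReceiptLine and parseReceipt(_:) are names I made up for this example, the price pattern is slightly stricter than the one above, and Double is used only for brevity (Decimal is usually a safer choice for money).

```swift
import Foundation

// Hypothetical model for one parsed receipt line
struct ReceiptLine {
    let quantity: Int
    let item: String
    let price: Double
}

// Hypothetical helper: parses lines of the form "<quantity> <item> £<price>"
func parseReceipt(_ text: String) -> [ReceiptLine] {
    let pattern = "^(\\d+)\\s+(.+?)\\s+£(\\d+(?:\\.\\d+)?)$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.anchorsMatchLines]) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match -> ReceiptLine? in
        // Skip the line if any group is missing or does not convert to a number
        guard let quantityRange = Range(match.range(at: 1), in: text),
              let itemRange = Range(match.range(at: 2), in: text),
              let priceRange = Range(match.range(at: 3), in: text),
              let quantity = Int(text[quantityRange]),
              let price = Double(text[priceRange]) else { return nil }
        return ReceiptLine(quantity: quantity, item: String(text[itemRange]), price: price)
    }
}

let receipt = parseReceipt("2 melon £3.00\n1 lime £1.50\n5 chicken wings £10.00")
for line in receipt {
    print(line.quantity, line.item, line.price)
}
```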
Use components(separatedBy:)
let a = "5 Chicken Wing"
let b = a.components(separatedBy: " ") // split on spaces
let b0 = b[0] // 5
let b1 = b[1] // Chicken
let b2 = b[2] // Wing

regex to get "words" delimited by space(s)?

What would the regex be (to be used on iOS with NSRegularExpression) to get the "words" from a string delimited by one or more spaces, i.e. the delimiter could be " ", or "  ", or "   ", etc.?
So therefore:
"26:43:33 S 153:02:51 E"
Would give:
1-"26:43:33"
2-"S"
3-"153:02:51"
4-"E"
So if you're going to use a regex for this, you want to look for all contiguous stretches of not-space. Like this:
let s = "26:43:33 S 153:02:51 E" as NSString
let pattern = "[^ ]+"
let reg = try! NSRegularExpression(pattern: pattern, options: [])
let matches = reg.matches(in: s as String, options: [], range: NSRange(location: 0, length: s.length))
let result = matches.map { s.substring(with: $0.range) }
// result is: ["26:43:33", "S", "153:02:51", "E"]
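If tabs or other whitespace may also separate the words, the pattern \S+ (runs of non-whitespace) is a common variant of the same idea. A minimal sketch, with tokens(in:) as a made-up helper name:

```swift
import Foundation

// Hypothetical helper: every maximal run of non-whitespace characters
func tokens(in string: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: "\\S+")
    let fullRange = NSRange(string.startIndex..<string.endIndex, in: string)
    return regex.matches(in: string, options: [], range: fullRange).compactMap { match in
        // Map the NSRange back to a Swift range before slicing
        Range(match.range, in: string).map { String(string[$0]) }
    }
}

print(tokens(in: "26:43:33   S\t153:02:51  E"))
// ["26:43:33", "S", "153:02:51", "E"]
```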
As an alternative to regex, I would suggest using the split method on your string.
let string = "26:43:33 S 153:02:51 E"
let words = string.split(separator: " ").map { String($0) }
Because split returns an array of Substring values, we use the map method to convert them back to strings. map runs a closure on each element of a collection; in this case we just use it to convert each element to a String. By default split also drops the empty pieces produced by consecutive spaces.
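To make the difference with several spaces in a row concrete, here is a short comparison of split(separator:) and components(separatedBy:). The sample string with extra spaces is my own.

```swift
import Foundation

let spaced = "26:43:33   S  153:02:51 E"

// split omits empty subsequences by default, so runs of spaces collapse
let bySplit = spaced.split(separator: " ").map(String.init)

// components(separatedBy:) keeps an empty string between adjacent separators
let byComponents = spaced.components(separatedBy: " ")

print(bySplit)                              // ["26:43:33", "S", "153:02:51", "E"]
print(byComponents.filter { !$0.isEmpty })  // same four words once the empty strings are removed
```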

Swift sort NSArray

I have a problem with my sorting algorithm.
My NSArray (here vcd.signals.keys) contains String values, for example:
"x [0]", "x [18]", "x [15]", "x [1]"...
When I try to sort this the result ends up in
"x [0]", "x [15]", "x [18]", "x [1]"
instead of:
"x [0]", "x [1]", "x [15]", "x [18]"
This is my code:
let sortedKeys = sorted(vcd.signals.keys) {
var val1 = $0 as! String
var val2 = $1 as! String
return val1 < val2
}
Any idea how I can fix this issue?
Your problem comes from the comparison. For example, see what happens when you compare the following two strings:
print("x [15]" < "x [1]") // true
This is because the default lexicographic comparison goes character by character, and the first difference is at the fifth character, where "5" is less than "]":
print("5" < "]") // true
Because of this, you need to write your own comparison that only compares the numbers inside the brackets. To achieve this, I use a regular expression to match the numbers, in the following way:
func matchesForRegexInText(_ regex: String, text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
    let nsString = text as NSString
    let results = regex.matches(in: text, options: [],
                                range: NSRange(location: 0, length: nsString.length))
    return results.map { nsString.substring(with: $0.range) }
}
let keysSorted = keys.sorted { key1, key2 in
    let pattern = "([0-9]+)"
    // Compare the first number in each key as an Int, not as a String,
    // so that 2 sorts before 15; keys without a number count as 0
    let n1 = matchesForRegexInText(pattern, text: key1).first.flatMap { Int($0) } ?? 0
    let n2 = matchesForRegexInText(pattern, text: key2).first.flatMap { Int($0) } ?? 0
    return n1 < n2
}
In the above regular expression I assume that numbers only appear inside the brackets, so it simply matches any number in the string; feel free to change the regular expression if you need something more specific. You then get the following:
print(keysSorted) // ["x [0]", "x [1]", "x [15]", "x [18]"]
I hope this helps.
The issue you are running into is that the closing bracket character ']' sorts after the digits. This means that "18" is less than "1]". As long as all your strings share the form "[digits]", you can remove the closing bracket, sort the strings, and add the closing bracket back to the final array. Note that this is still a plain string sort, so it does not give numeric order in general: "x [2]" would still end up after "x [15]". The code below works for Swift 2:
let arr = ["x [0]", "x [18]", "x [15]", "x [1]"]
let sorted = arr.map { $0.substringToIndex($0.endIndex.predecessor()) }.sort().map { $0 + "]" }
print(sorted)
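If the goal is numeric order for arbitrary indices, Foundation's .numeric compare option may be a simpler route, since it compares runs of digits by their numeric value. A minimal sketch in current Swift, with numericallySorted(_:) as a made-up helper name:

```swift
import Foundation

// Hypothetical helper: sorts strings so that embedded numbers compare by value
func numericallySorted(_ keys: [String]) -> [String] {
    return keys.sorted { $0.compare($1, options: .numeric) == .orderedAscending }
}

print(numericallySorted(["x [0]", "x [18]", "x [15]", "x [2]", "x [1]"]))
// ["x [0]", "x [1]", "x [2]", "x [15]", "x [18]"]
```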
