I have the string [Desired Annual Income] /([Income per loan %] /100)
From this string, I need to extract the two substrings 'Desired Annual Income' and 'Income per loan %' in Swift 3.
I am using the code below, taken from 'How do I get the substring between braces?':
let myString = "[Desired Annual Income] /([Income per loan %] /100)"
let start: NSRange = (myString as NSString).range(of: "[")
let end: NSRange = (myString as NSString).range(of: "]")
if start.location != NSNotFound && end.location != NSNotFound && end.location > start.location {
let result: String = (myString as NSString).substring(with: NSRange(location: start.location + 1, length: end.location - (start.location + 1)))
print(result)
}
But the output is only 'Desired Annual Income'. How can I get all of the substrings?
Try this,
Hope it will work
let str = "[Desired Annual Income] /([Income per loan %] /100)"
let trimmedString = str.components(separatedBy: "]")
for i in 0..<trimmedString.count - 1{ // not considering last component since it's of no use hence count-1 times loop
print(trimmedString[i].components(separatedBy: "[").last ?? "")
}
Output:-
Desired Annual Income
Income per loan %
It's a very good use case for regular expressions (NSRegularExpression). The principle of regular expressions is to describe a "pattern" that you want to search in a string.
In that case you search something between two brackets.
The code is then:
let str = "[Desired Annual Income] /([Income per loan %] /100)"
if let regex = try? NSRegularExpression(pattern: "\\[(.+?)\\]", options: [.caseInsensitive]) {
var collectMatches: [String] = []
for match in regex.matches(in: str, options: [], range: NSRange(location: 0, length: (str as NSString).length)) {
// range at index 0: full match (including brackets)
// range at index 1: first capture group
let substring = (str as NSString).substring(with: match.range(at: 1))
collectMatches.append(substring)
}
print(collectMatches)
}
There are plenty of tutorials on regular expressions on the internet, but in short:
\\[ and \\]: the opening and closing bracket characters. Brackets have a special meaning in regular expressions, so you need to escape them with a backslash. In a text editor one backslash is enough, but inside a Swift string literal you need a second one to escape the backslash itself.
(.+?) is a bit more complex: the parentheses form the "capture group", the part you want to extract. . means "any character" and + means "one or more times". The ? after the + makes the quantifier lazy (non-greedy), which means that the capture stops as soon as possible. Without it the quantifier is greedy, as it is by default in Foundation and in most other regex engines, and the capture in your case would be "Desired Annual Income] /([Income per loan %".
Regular expressions are not always easy or direct, but if you often process text, they are a very powerful tool to know.
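To make the difference between the lazy and the greedy quantifier concrete, here is a minimal sketch that runs both patterns against the string from the question. firstCaptures(of:in:) is a hypothetical helper name of my own, not part of Foundation.

```swift
import Foundation

// Hypothetical helper: returns the first capture group of every match.
func firstCaptures(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

let formula = "[Desired Annual Income] /([Income per loan %] /100)"
let lazy = firstCaptures(of: "\\[(.+?)\\]", in: formula)   // stops at the first "]"
let greedy = firstCaptures(of: "\\[(.+)\\]", in: formula)  // runs up to the last "]"
print(lazy)   // ["Desired Annual Income", "Income per loan %"]
print(greedy) // ["Desired Annual Income] /([Income per loan %"]
```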
Related
I need to make a function that returns me ranges of matching words in a given string, for example, given the sentence below:
Hey, bro! Your brother is also her brother.
I want to find an array of Range values in the sentence that match the word "bro". It should match the exact word (case insensitive), so "bro" should only match "bro" but not "brother".
I thought about:
split the sentence, e.g. "hey", "bro", "your", "brother", "is", "also", "her", "brother"
map each word to a word with range, e.g. "hey" would become ["hey", 0...2]
filter and map the word and range array, matching "bro"
Step 2 needs some treatment to make sure the range for each word (in the sentence) can be mapped to the right word, e.g. the first "brother" and second "brother" should have different ranges depending on where they are located.
Is there any smarter way of doing this?
Edit:
Sorry, I forgot to mention, the reason for not using Regex was that sometimes the word has a dot in it, for example:
there is orange in the basket.
from the above sentence, finding the string "or.ge" using regex would match "orange" as well.
I have tested this in a Playground. You can use this extension to get the ranges that match a regular expression.
extension String {
func ranges(of substring: String, options: CompareOptions = [], locale: Locale? = nil) -> [Range<Index>] {
var ranges: [Range<Index>] = []
while ranges.last.map({ $0.upperBound < self.endIndex }) ?? true,
let range = self.range(of: substring, options: options, range: (ranges.last?.upperBound ?? self.startIndex)..<self.endIndex, locale: locale)
{
ranges.append(range)
}
return ranges
}
}
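let searchString = "bro"
var str = "Hey, bro! Your brother is also her brother."
// Escape the search term so that characters such as "." are matched literally.
let escaped = NSRegularExpression.escapedPattern(for: searchString)
let reg = str.ranges(of: "(?<![\\p{L}\\d])\(escaped)(?![\\p{L}\\d])", options: [.regularExpression, .caseInsensitive])
// Remove the first match, if there is one, instead of force-unwrapping.
if let first = reg.first { str.removeSubrange(first) }
print(str)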
Credits to,
iOS - regex to match word boundary, including underscore
One simple solution is to use regular expressions with \b to match “word boundaries”, e.g.
let searchString = "bro"
let sentence = "Hey, Bro! Your brother is also her brother."
let regex = try! NSRegularExpression(pattern: #"\b\#(searchString)\b"#, options: .caseInsensitive)
regex.enumerateMatches(in: sentence, range: NSRange(sentence.startIndex..., in: sentence)) { match, _, _ in
guard let match = match else { return }
print(match.range)
// or, if you want a String.Range
if let range = Range(match.range, in: sentence) {
print(sentence[range])
}
}
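The question's edit mentions search terms that contain a dot, such as "or.ge". Interpolating such a term straight into a pattern lets the dot match any character. A minimal sketch of one way around this, assuming it is acceptable to escape the term with NSRegularExpression.escapedPattern(for:) and to use the letter/digit lookarounds from the earlier answer instead of \b (wordRanges(of:in:) is a hypothetical helper name):

```swift
import Foundation

// Hypothetical helper: ranges of `word` in `text`, matched as a whole word, ignoring case.
func wordRanges(of word: String, in text: String) -> [Range<String.Index>] {
    // Escape metacharacters so that "." in the search term matches only a literal dot.
    let escaped = NSRegularExpression.escapedPattern(for: word)
    let pattern = "(?<![\\p{L}\\d])" + escaped + "(?![\\p{L}\\d])"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return []
    }
    return regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
        .compactMap { Range($0.range, in: text) }
}

let basket = "There is orange and or.ge in the basket."
let dotHits = wordRanges(of: "or.ge", in: basket).map { String(basket[$0]) }
print(dotHits) // ["or.ge"]

let greeting = "Hey, Bro! Your brother is also her brother."
let broHits = wordRanges(of: "bro", in: greeting).map { String(greeting[$0]) }
print(broHits) // ["Bro"]
```

The lookarounds are used here because \b only behaves as expected when the search term starts and ends with a word character.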
There are other, richer APIs (e.g. the Natural Language framework) which, while not perfect, provide more sophisticated parsing of natural language text. For example, the code below differentiates between the verb “saw” and the noun “saw”:
import NaturalLanguage
let text = "I saw the hammer. I did not see a saw."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitWhitespace, .joinContractions]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, range in
guard let tag = tag else { return true }
print(tag, String(text[range]))
return true
}
Producing:
NLTag(_rawValue: Pronoun) I
NLTag(_rawValue: Verb) saw
NLTag(_rawValue: Determiner) the
NLTag(_rawValue: Noun) hammer
NLTag(_rawValue: SentenceTerminator) .
NLTag(_rawValue: Pronoun) I
NLTag(_rawValue: Verb) did
NLTag(_rawValue: Adverb) not
NLTag(_rawValue: Verb) see
NLTag(_rawValue: Determiner) a
NLTag(_rawValue: Noun) saw
NLTag(_rawValue: SentenceTerminator) .
I'm parsing an XML doc (using XMLParser) and some of the values have php-like placeholders, e.g. %1$s, and I would like to convert those to {x-1}.
Examples:
%1$s ---> {0}
%2$s ---> {1}
I'm doing this in a seemingly hacky way, using regex:
But there must be a better implementation of this regex.
Consider a string:
let str = "lala fawesfgeksgjesk 3rf3f %1$s rk32mrk3mfa %2$s fafafczcxz %3$s czcz $#$##%## %4$s qqq %5$s"
Now we're going to extract the integer strings between strings % and $s:
let regex = try! NSRegularExpression(pattern: "(?<=%)[^$s]+")
let range = NSRange(location: 0, length: str.utf16.count)
let matches = regex.matches(in: str, options: [], range: range)
matches.forEach {
print(String(str[Range($0.range, in: str)!]))
}
This mostly works. The issue is that the "4" value is mixed up with the random characters that precede %4$s.
Prints:
1
2
3
## %4
5
Is there any better way to do this?
This might not be a very efficient (or Swifty :)) way, but it gets the job done. It searches for a regular expression match, extracts the numeric value from the matched substring, decrements it, and then replaces the substring with a newly constructed placeholder. This repeats in a loop until no more matches are found. Note that str must be declared with var, and that the pattern requires at least one digit (\d+) so that a stray "%$s" cannot keep the loop running forever.
let pattern = #"%(\d+)\$s"#
while let range = str.range(of: pattern, options: .regularExpression) {
let placeholder = str[range]
let number = placeholder.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)
if let value = Int(number) {
str = str.replacingOccurrences(of: placeholder, with: "{\(value - 1)}")
}
}
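As a possible alternative to searching the string again after every replacement, here is a minimal sketch that collects all matches once and replaces them from the end of the string. convertPlaceholders(in:) is a hypothetical helper name, and skipping a position of 0 is my own assumption about what the input should allow.

```swift
import Foundation

// Hypothetical helper: rewrites every "%N$s" as "{N-1}" in a single pass.
func convertPlaceholders(in text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "%(\\d+)\\$s") else { return text }
    var result = text
    let matches = regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
    // Replace from the end so that the offsets of earlier matches stay valid.
    for match in matches.reversed() {
        guard let whole = Range(match.range, in: result),
              let digits = Range(match.range(at: 1), in: result),
              let value = Int(result[digits]),
              value > 0 // assumption: positions are 1-based, so "%0$s" is left alone
        else { continue }
        result.replaceSubrange(whole, with: "{\(value - 1)}")
    }
    return result
}

let input = "lala %1$s rk32 %2$s $#$##%## %4$s qqq %5$s"
let converted = convertPlaceholders(in: input)
print(converted) // lala {0} rk32 {1} $#$##%## {3} qqq {4}
```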
I'm trying to replace matched strings using regex in Swift. My requirement is as follows:
originalString = "It is live now at Germany(DE)"
I want the letters within the parentheses, i.e. DE, to be separated by a space, i.e. "D E",
so replacedString should be "It is live now at Germany(D E)".
I tried the code below:
var value: NSMutableString = "It is live now at Germany(DE)"
let pattern = "(\\([A-Za-z ]+\\))"
let regex = try? NSRegularExpression(pattern: pattern)
regex?.replaceMatches(in: value, options: .reportProgress, range:
NSRange(location: 0,length: value.length), withTemplate: " $1 ")
print(value)
The output is "It is live now at Germany (DE)", which I know is not what is required.
The replacement is based on a template, so it cannot be modified based on the matched string value. Is there any way to achieve this?
Thanks in advance
You may use
var value: NSMutableString = "It is live now at Germany(DE) or (SOFE)"
let pattern = "(?<=\\G(?<!\\A)|\\()[A-Za-z](?=[A-Za-z]+\\))"
let regex = try? NSRegularExpression(pattern: pattern)
regex?.replaceMatches(in: value, options: .reportProgress, range: NSRange(location: 0,length: value.length), withTemplate: "$0 ")
print(value)
Or just
let val = "It is live now at Germany(DE) or (SOFE)"
let pattern = "(?<=\\G(?<!\\A)|\\()[A-Za-z](?=[A-Za-z]+\\))"
print( val.replacingOccurrences(of: pattern, with: "$0 ", options: .regularExpression, range: nil) )
Output: It is live now at Germany(D E) or (S O F E)
Pattern details
(?<=\\G(?<!\\A)|\\() - a positive lookbehind that matches a location right after ( or at the end of the preceding successful match
[A-Za-z] - matches and consumes any ASCII letter
(?=[A-Za-z]+\\)) - a positive lookahead that matches a location that is immediately followed with 1+ ASCII letters and then a ) char.
The $0 in the replacement inserts the whole match value back into the resulting string.
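If the replacement has to be computed from the matched value rather than from a template, another option is to iterate over the matches and build each replacement in code. This is only a sketch under that assumption; spaceOutParenthesizedLetters(in:) is a hypothetical helper name, and it uses a simpler pattern than the one above.

```swift
import Foundation

// Hypothetical helper: puts a space between the letters of every "(LETTERS)" group.
func spaceOutParenthesizedLetters(in text: String) -> String {
    guard let regex = try? NSRegularExpression(pattern: "\\(([A-Za-z]+)\\)") else { return text }
    var result = text
    let matches = regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
    // Work backwards so that the offsets of earlier matches remain valid.
    for match in matches.reversed() {
        guard let letters = Range(match.range(at: 1), in: result) else { continue }
        let spaced = result[letters].map { String($0) }.joined(separator: " ")
        result.replaceSubrange(letters, with: spaced)
    }
    return result
}

let spacedOut = spaceOutParenthesizedLetters(in: "It is live now at Germany(DE) or (SOFE)")
print(spacedOut) // It is live now at Germany(D E) or (S O F E)
```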
A string:
"jim#domain.com, bill#domain.com, chad#domain.com, tom#domain.com"
Through gesture recognizer, I am able to get the character the user tapped on (happy to provide code, but don't see the relevance at this point).
Let's say the User tapped on o in "chad#domain.com" and the character index is 39
Given 39 the index of o, I would like to get the string start index of c where "chad#domain.com" begins, and an end index for m from "com" where "chad#domain.com" ends.
In other words, given the index of a character in a String, I need to get the indexes to its left and right: the one just after the nearest space on the left and the one just before the nearest comma on the right.
Tried, but this only provides the last word in the String:
if let range = text.range(of: " ", options: .backwards) {
let suffix = String(text.suffix(from: range.upperBound))
print(suffix) // tom#domain.com
}
I am not sure where to go from here?
You can call range(of:) on two slices of the given string:
text[..<index] is the text preceding the given character position,
and text[index...] is the text starting at the given position.
Example:
let text = "jim#domain.com, bill#domain.com, chad#domain.com, tom#domain.com"
let index = text.index(text.startIndex, offsetBy: 39)
// Search the space before the given position:
let start = text[..<index].range(of: " ", options: .backwards)?.upperBound ?? text.startIndex
// Search the comma after the given position:
let end = text[index...].range(of: ",")?.lowerBound ?? text.endIndex
print(text[start..<end]) // chad#domain.com
Both range(of:) calls return nil if no space (or comma) has
been found. In that case the nil-coalescing operator ?? is used
to get the start (or end) index instead.
(Note that this works because Substrings share a common index
with their originating string.)
An alternative approach is to use a "data detector",
so that the URL detection does not depend on certain separators.
Example (compare How to detect a URL in a String using NSDataDetector):
let text = "jim#domain.com, bill#domain.com, chad#domain.com, tom#domain.com"
let index = text.index(text.startIndex, offsetBy: 39)
let detector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
let matches = detector.matches(in: text, range: NSRange(location: 0, length: text.utf16.count))
for match in matches {
if let range = Range(match.range, in: text), range.contains(index) {
print(text[range])
}
}
Different approach:
You have the string and the Int index
let string = "jim#domain.com, bill#domain.com, chad#domain.com, tom#domain.com"
let characterIndex = 39
Get the String.Index from the Int
let stringIndex = string.index(string.startIndex, offsetBy: characterIndex)
Convert the string into an array of addresses
let addresses = string.components(separatedBy: ", ")
Map the addresses to their ranges (Range<String.Index>) in the string
let ranges = addresses.map{string.range(of: $0)!}
Get the (Int) index of the range which contains stringIndex
if let index = ranges.firstIndex(where: { $0.contains(stringIndex) }) {
// Get the corresponding address
let address = addresses[index]
print(address)
}
One approach could be to split the original string on the “,” and then use simple arithmetic to find which element of the array contains the given position (39). From there you can get the right string, or the indexes of the previous space and the next comma, depending on your end goal.
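A minimal sketch of that idea, assuming the elements are separated by ", " and that the position is a character offset from the start of the string (element(at:in:separator:) is a hypothetical helper name):

```swift
import Foundation

// Hypothetical helper: returns the element whose characters cover `offset`.
func element(at offset: Int, in text: String, separator: String = ", ") -> String? {
    var lowerBound = 0
    for part in text.components(separatedBy: separator) {
        let upperBound = lowerBound + part.count
        if (lowerBound..<upperBound).contains(offset) { return part }
        // Skip this element and the separator that follows it.
        lowerBound = upperBound + separator.count
    }
    return nil // the offset is out of range or falls on a separator
}

let list = "jim#domain.com, bill#domain.com, chad#domain.com, tom#domain.com"
print(element(at: 39, in: list) ?? "none") // chad#domain.com
```

The running lowerBound is also the character offset at which the returned element starts, so the start and end positions can be recovered the same way if they are needed.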
So here is the string s:
"Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
I want it to be separated into an array like this:
["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it"]
This means the separators should be ". ", "? " and "! ".
I've tried:
let charSet = NSCharacterSet(charactersInString: ".?!")
let array = s.componentsSeparatedByCharactersInSet(charSet)
But it also splits p.m. into two elements. Result:
["Hi", " How are you", " I'm fine", " It is 6 p", "m", " Thank you", " That's it"]
I've also tried
let array = s.componentsSeparatedByString(". ")
It works well for separating on ". ", but if I also want to separate on "? " and "! ", it becomes messy.
So is there any way I can do it? Thanks!
There is a method provided that lets you enumerate a string. You can do so by words or sentences or other options. No need for regular expressions.
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
var sentences = [String]()
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex, options: .BySentences) {
substring, substringRange, enclosingRange, stop in
sentences.append(substring!)
}
print(sentences)
The result is:
["Hi! ", "How are you? ", "I\'m fine. ", "It is 6 p.m. ", "Thank you! ", "That\'s it."]
rmaddy's answer is correct (+1). A Swift 3 implementation is:
var sentences = [String]()
string.enumerateSubstrings(in: string.startIndex ..< string.endIndex, options: .bySentences) { substring, substringRange, enclosingRange, stop in
sentences.append(substring!)
}
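The substrings produced by .bySentences keep the closing punctuation and the trailing space, while the question asked for elements such as "Hi" and "It is 6 p.m". A minimal sketch of one way to strip them, assuming that removing a single trailing ".", "?" or "!" is what is wanted (trimmedSentence(_:) is a hypothetical helper name):

```swift
import Foundation

// Hypothetical helper: drops surrounding whitespace and one closing ".", "?" or "!".
func trimmedSentence(_ sentence: String) -> String {
    var result = sentence.trimmingCharacters(in: .whitespacesAndNewlines)
    if let last = result.last, ".?!".contains(last) { result.removeLast() }
    return result
}

let paragraph = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
var cleaned = [String]()
paragraph.enumerateSubstrings(in: paragraph.startIndex..<paragraph.endIndex, options: .bySentences) { substring, _, _, _ in
    if let substring = substring { cleaned.append(trimmedSentence(substring)) }
}
print(cleaned)
```

Only one punctuation character is removed, so an abbreviation such as "p.m." keeps its inner dot.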
You can also use a regular expression with NSRegularExpression, though it's much hairier than rmaddy's .bySentences solution. In Swift 3:
var sentences = [String]()
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))")
regex.enumerateMatches(in: string, range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in // NSRange counts UTF-16 code units
sentences.append((string as NSString).substring(with: match!.rangeAt(2)))
}
Or Swift 2:
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))", options: [])
var sentences = [String]()
regex.enumerateMatchesInString(string, options: [], range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in // NSRange counts UTF-16 code units
sentences.append((string as NSString).substringWithRange(match!.rangeAtIndex(2)))
}
The [.!?] syntax matches any of those three characters. The | means "or". The ^ matches the start of the string. The $ matches the end of the string. The \\s matches a whitespace character. The \\w matches a "word" character. The * matches zero or more of the preceding character. The + matches one or more of the preceding character. The (?=) is a look-ahead assertion (e.g. see if there's something there, but don't advance through that match).
I've tried to simplify this a bit, and it's still pretty complicated. Regular expressions offer rich text pattern matching but, admittedly, they are a little dense when you first use them. This rendition also handles (a) repeated punctuation (e.g. "Thank you!!!"), (b) leading spaces, and (c) trailing spaces.
If the splitting basis is something a little more esoteric than sentences, this extension could work.
extension String {
public func components(separatedBy separators: [String]) -> [String] {
var output: [String] = [self]
for separator in separators {
output = output.flatMap { $0.components(separatedBy: separator) }
}
return output.map { $0.trimmingCharacters(in: .whitespaces)}
}
}
let artists = "Rihanna, featuring Calvin Harris".components(separatedBy: [", with", ", featuring"])
I tried to find a regex to solve this too: (([^.!?]+\s)*\S+(\.|!|\?))
Here is the explanation from regexper, and an example.
Well I've found a regex too from here
var pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let sReplaced = s.stringByReplacingOccurrencesOfString(pattern, withString:"[*-SENTENCE-*]" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
let array = sReplaced.componentsSeparatedByString("[*-SENTENCE-*]")
Perhaps it's not a good way, as it has to first replace and then separate the string. :)
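One way to avoid the intermediate placeholder is to cut the string directly at the regex matches. This is only a sketch, written in current Swift rather than the Swift 2 syntax above; splitAtMatches(of:in:) is a hypothetical helper name.

```swift
import Foundation

// Hypothetical helper: splits `text` at every match of `pattern`, dropping the matched separators.
func splitAtMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [text] }
    var pieces: [String] = []
    var cursor = text.startIndex
    for match in regex.matches(in: text, range: NSRange(text.startIndex..., in: text)) {
        guard let separator = Range(match.range, in: text) else { continue }
        pieces.append(String(text[cursor..<separator.lowerBound]))
        cursor = separator.upperBound
    }
    pieces.append(String(text[cursor...])) // whatever follows the last separator
    return pieces
}

let talk = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let parts = splitAtMatches(of: "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])", in: talk)
print(parts)
// ["Hi!", "How are you?", "I'm fine.", "It is 6 p.m.", "Thank you!", "That's it."]
```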
UPDATE:
For the regex part, if you also want to match Chinese/Japanese punctuation (where a space after each punctuation mark is not required), you can use the following:
((?<=[.?!;…])\\s+|(?<=[。!?;…])\\s*)(?=[\\p{L}\\p{N}])