I have a string, Add "ABC", and I want to extract ABC from it. To do this, I call:
text.rangeOfString("(?<=\")[^\"]+", options: .RegularExpressionSearch)
but it returns:
Optional(Range(5..<7))
How can I extract the matched text itself?
You first need to unwrap the resulting range and then call substringWithRange. You can do this with optional binding:
let text = "Add \"ABC\""
let range = text.rangeOfString("(?<=\")[^\"]+", options: .RegularExpressionSearch, range: nil, locale: nil)
if let nonNilRange = range {
print(text.substringWithRange(nonNilRange))
}
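For readers on Swift 3 or later, the same idea can be sketched with the renamed Foundation API. This is a minimal sketch, assuming the `range(of:options:)` spelling that replaced `rangeOfString`; the constant name `extracted` is only illustrative.

```swift
import Foundation

let text = "Add \"ABC\""

// range(of:options:) returns an optional range, so it is unwrapped with
// Optional.map before it is used to subscript the string.
let extracted = text
    .range(of: "(?<=\")[^\"]+", options: .regularExpression)
    .map { String(text[$0]) }

if let extracted = extracted {
    print(extracted) // ABC
}
```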
You can use the "([^"]+)" regex to extract any matches and any captured groups with the following code:
func regMatchGroup(regex: String, text: String) -> [[String]] {
do {
var resultsFinal = [[String]]()
let regex = try NSRegularExpression(pattern: regex, options: [])
let nsString = text as NSString
let results = regex.matchesInString(text,
options: [], range: NSMakeRange(0, nsString.length))
for result in results {
var internalString = [String]()
for i in 0..<result.numberOfRanges { // range-based loop; C-style for loops were removed in Swift 3
internalString.append(nsString.substringWithRange(result.rangeAtIndex(i)))
}
resultsFinal.append(internalString)
}
return resultsFinal
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return [] // empty result, so the matches.count check in the usage code stays safe
}
}
// USAGE:
let string = "Add \"ABC\" \"ABD\""
let matches = regMatchGroup("\"([^\"]+)\"", text: string)
if (matches.count > 0) // If we have matches....
{
print(matches[0][1]) // Print the first one, Group 1.
}
See SwiftStub demo
Because the usage code checks matches.count first, no crash should occur when no match is found.
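The same approach can be sketched for Swift 4 and later. This is not the answer's original code: the function name `captureGroups` is hypothetical, and the sketch assumes that a group which did not take part in a match should be reported as an empty string.

```swift
import Foundation

// Returns one inner array per match: index 0 is the whole match,
// the following indices are the capture groups.
func captureGroups(pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return [] // invalid pattern: report no matches
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { match in
        (0..<match.numberOfRanges).map { index in
            // Range(_:in:) is nil for a group that did not participate.
            Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
        }
    }
}

let groups = captureGroups(pattern: "\"([^\"]+)\"", in: "Add \"ABC\" \"ABD\"")
if let first = groups.first, first.count > 1 {
    print(first[1]) // ABC
}
```

Building the NSRange from String indices avoids mixing Character counts with the UTF-16 offsets that NSRegularExpression works in.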
The solution is:
let range = myText.rangeOfString("(?<=\")[^\"]+", options: .RegularExpressionSearch)! // force-unwrap crashes if nothing matches
myText.substringWithRange(range)
I want to slice a very long string from one word to another. I want to get the substring between those words.
For that, I use the following string extension:
extension String {
func slice(from: String, to: String) -> String? {
guard let rangeFrom = range(of: from)?.upperBound else { return nil }
guard let rangeTo = self[rangeFrom...].range(of: to)?.lowerBound else { return nil }
return String(self[rangeFrom..<rangeTo])
}
}
That works really well, but my raw string contains several "from"/"to" pairs, and I need every substring that sits between these two words. With my extension I can only get the first substring.
Example:
let raw = "id:244476end36475677id:383848448end334566777788id:55678900end543"
I want to get the following substrings from this raw string example:
sub1 = "244476"
sub2 = "383848448"
sub3 = "55678900"
If I call:
var text = raw.slice(from: "id:" , to: "end")
I only get the first occurrence (text = "244476").
Thank you for reading. Any answer would be appreciated.
PS: I always get an error when formatting code snippets on Stack Overflow.
You can get the ranges of your substrings using a while loop to repeat the search from that point to the end of your string and use map to get the substrings from the resulting ranges:
extension StringProtocol {
func ranges<S:StringProtocol,T:StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> [Range<Index>] {
var ranges: [Range<Index>] = []
var startIndex = self.startIndex
while startIndex < endIndex,
let lower = self[startIndex...].range(of: start, options: options)?.upperBound,
let range = self[lower...].range(of: end, options: options) {
let upper = range.lowerBound
ranges.append(lower..<upper)
startIndex = range.upperBound
}
return ranges
}
func substrings<S:StringProtocol,T:StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> [SubSequence] {
ranges(between: start, and: end, options: options).map{self[$0]}
}
}
Playground testing:
let string = """
your text
id:244476end
id:383848448end
id:55678900end
the end
"""
let substrings = string.substrings(between: "id:", and: "end") // ["244476", "383848448", "55678900"]
Rather than trying to parse the string from start to end, I would use a combination of existing methods to transform it into the desired result. Here's how I would do this:
import Foundation
let raw = "id:244476end36475677id:383848448end334566777788id:55678900end543"
let result = raw
.components(separatedBy: "id:")
.filter{ !$0.isEmpty }
.map { segment -> String in
let slices = segment.components(separatedBy: "end")
return slices.first! // Removes the `end` and everything thereafter
}
print(result) // => ["244476", "383848448", "55678900"]
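A further alternative, not taken from either answer above, is a lazy regular expression that captures whatever sits between the two markers. This is a sketch: the function name `substrings(in:between:and:)` is hypothetical, and it assumes the markers should be treated as literal text, which is why they are escaped.

```swift
import Foundation

func substrings(in text: String, between start: String, and end: String) -> [String] {
    // Escape the markers so characters such as "." or "(" are matched literally.
    let pattern = NSRegularExpression.escapedPattern(for: start)
        + "(.*?)" // lazy: stop at the first following end marker
        + NSRegularExpression.escapedPattern(for: end)
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

let raw = "id:244476end36475677id:383848448end334566777788id:55678900end543"
let ids = substrings(in: raw, between: "id:", and: "end")
print(ids) // ["244476", "383848448", "55678900"]
```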
I have a certain string given like so..
let string = "[#he man:user:123] [#super man:user:456] [#bat man:user:789]"
Now, I need an array containing just the name and the id. For that, I applied the following regex..
extension String {
func findMentionText2() -> [[String]] {
let regex = try? NSRegularExpression(pattern: "(#\\w+(?: \\w+)*):user:(\\w+)", options: [])
if let matches = regex?.matches(in: self, options:[], range:NSMakeRange(0, self.utf16.count)) { // NSRange counts UTF-16 units
return matches.map { match in
return (1..<match.numberOfRanges).map {
let rangeBounds = match.range(at: $0)
guard let range = Range(rangeBounds, in: self) else {
return ""
}
return String(self[range])
}
}
} else {
return []
}
}
}
Now when I do let hashString = string.findMentionText2() and print hashString, I get an array like this:
[["#he man", "123"], ["#super man", "456"], ["#bat man", "789"]]
So far so good..:)
Now I made a typealias and want to add it to an array..
So I did this...
typealias UserTag = (name: String, id: String)
var userTagList = [UserTag]()
and then,
let hashString2 = string.findMentionText2()
for unit in hashString2 {
let user: UserTag = (name: unit.first!, id: unit.last!)
userTagList.append(user)
}
for value in userTagList {
print(value.id)
print(value.name)
}
Here, instead of passing unit.first and unit.last in let user: UserTag = (name: unit.first!, id: unit.last!), I want to build the UserTag values as the name and id are matched by the regex, i.e. each match should be added to the array directly instead of going through unit.first and unit.last.
How can I achieve that..?
You just need to refactor your map to generate an array of UserTag instead of an array of string arrays. Here's one approach:
typealias UserTag = (name: String, id: String)
extension String {
func findMentionText2() -> [UserTag] {
let regex = try? NSRegularExpression(pattern: "(#\\w+(?: \\w+)*):user:(\\w+)", options: [])
if let matches = regex?.matches(in: self, options:[], range:NSMakeRange(0, self.utf16.count)) { // NSRange counts UTF-16 units
return matches.compactMap { match in
if match.numberOfRanges == 3 {
let name = String(self[Range(match.range(at: 1), in:self)!])
let id = String(self[Range(match.range(at: 2), in:self)!])
return UserTag(name: name, id: id)
} else {
return nil
}
}
} else {
return []
}
}
}
let string = "[#he man:user:123] [#super man:user:456] [#bat man:user:789]"
print(string.findMentionText2())
But I suggest you create a struct instead of using a tuple. It doesn't really change the implementation of findMentionText2 but using a struct lets you add other properties and methods as needed.
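The struct suggestion could look roughly like this. It is only a sketch: the method name `userTags()` and the Equatable conformance are illustrative additions, not part of the answer.

```swift
import Foundation

struct UserTag: Equatable {
    let name: String
    let id: String
}

extension String {
    // Hypothetical helper: builds one UserTag per regex match.
    func userTags() -> [UserTag] {
        guard let regex = try? NSRegularExpression(pattern: "(#\\w+(?: \\w+)*):user:(\\w+)") else {
            return []
        }
        let fullRange = NSRange(startIndex..., in: self)
        return regex.matches(in: self, range: fullRange).compactMap { match -> UserTag? in
            // Skip a match if either group cannot be mapped back to String indices.
            guard let nameRange = Range(match.range(at: 1), in: self),
                  let idRange = Range(match.range(at: 2), in: self) else { return nil }
            return UserTag(name: String(self[nameRange]), id: String(self[idRange]))
        }
    }
}

let tags = "[#he man:user:123] [#super man:user:456]".userTags()
for tag in tags {
    print(tag.name, tag.id)
}
```

With a struct, extra properties or methods can be added later without touching the call sites that only read name and id.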
I want to extract a value from a string that is wrapped in a unique opening and closing tag. In my case the tag is em:
"Fully <em>Furni<\/em>shed |Downtown and Canal Views",
result
Furnished
I guess you want to remove the tags.
If the backslash is only an escaping artifact and not part of the actual string, the pattern is pretty simple: basically <em> with an optional slash, /?
let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)
Considering also the backslash it's
let trimmedString = string.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)
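As a quick check, the second pattern can be run against both spellings of the closing tag. This is a small sketch; the constant names are illustrative.

```swift
import Foundation

let plainTags = "Fully <em>Furni</em>shed"
let escapedTags = "Fully <em>Furni<\\/em>shed" // contains a literal backslash

// "<", an optional backslash, an optional slash, then "em>".
let tagPattern = "<\\\\?/?em>"

let strippedPlain = plainTags.replacingOccurrences(of: tagPattern, with: "", options: .regularExpression)
let strippedEscaped = escapedTags.replacingOccurrences(of: tagPattern, with: "", options: .regularExpression)

print(strippedPlain)   // Fully Furnished
print(strippedEscaped) // Fully Furnished
```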
If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.
let string = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
let pattern = "<em>(.*)<\\\\?/em>(\\S+)"
do {
let regex = try NSRegularExpression(pattern: pattern)
if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {
let part1 = string[Range(match.range(at: 1), in: string)!]
let part2 = string[Range(match.range(at: 2), in: string)!]
print(String(part1 + part2))
}
} catch { print(error) }
Given this string:
let str = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
and the corresponding NSRange:
let range = NSRange(location: 0, length: (str as NSString).length)
Let's construct a regular expression that would match letters between <em> and </em>, or preceded by </em>
let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\\\/em>)|(?<=<\\\\/em>)\\w+")
What it does:
look for one or more word characters: \\w+,
that are preceded by <em>: (?<=<em>) (positive lookbehind),
and followed by <\/em>: (?=<\\\\/em>) (positive lookahead),
or: |
one or more word characters: \\w+,
that are preceded by <\/em>: (?<=<\\\\/em>) (positive lookbehind)
Let's get the matches:
let matches = regex.matches(in: str, range: range)
Which we can turn into substrings:
let strings: [String] = matches.compactMap { match in
// Convert the UTF-16 based NSRange into String indices before slicing
Range(match.range, in: str).map { String(str[$0]) }
}
Now we can join the strings in even indices, with the ones in odd indices:
let evenStride = stride(from: strings.startIndex,
to: strings.index(strings.endIndex, offsetBy: -1),
by: 2)
let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)]}
print(result) //["Furnished"]
We can test it with another string:
let str2 = "<em>Furni<\\/em>shed <em>balc<\\/em>ony <em>gard<\\/em>en"
the result would be:
["Furnished", "balcony", "garden"]
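The steps above can be collected into one function. This is a sketch rather than the answer's exact code: the name `emphasizedWords(in:)` is hypothetical, and the pairing assumes every tagged fragment is immediately followed by the rest of its word, so that matches strictly alternate between "inside the tags" and "after the closing tag".

```swift
import Foundation

func emphasizedWords(in text: String) -> [String] {
    let pattern = "(?<=<em>)\\w+(?=<\\\\/em>)|(?<=<\\\\/em>)\\w+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    let parts: [String] = regex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
    // Even indices hold the tagged fragment, odd indices the trailing fragment.
    return stride(from: 0, to: parts.count - 1, by: 2).map { parts[$0] + parts[$0 + 1] }
}

let words = emphasizedWords(in: "<em>Furni<\\/em>shed <em>balc<\\/em>ony <em>gard<\\/em>en")
print(words) // ["Furnished", "balcony", "garden"]
```

If a tagged fragment has no trailing letters, the alternation assumption breaks and the pairs shift, so input of that shape needs a different strategy such as capture groups.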
Not a regex but, for obtaining all words in tags, e.g [Furni, sma]:
let text = "Fully <em>Furni<\\/em>shed <em>sma<\\/em>shed |Downtown and Canal Views"
let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\\/em>")}.compactMap { $0.components(separatedBy: "<\\/em>").first }
For full words, e.g [Furnished, smashed]:
let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }
Regex:
If you want to achieve that by regex, you can use Valexa's answer:
public extension String {
public func capturedGroups(withRegex pattern: String) -> [String] {
var results = [String]()
var regex: NSRegularExpression
do {
regex = try NSRegularExpression(pattern: pattern, options: [])
} catch {
return results
}
let matches = regex.matches(in: self, options: [], range: NSRange(location:0, length: self.utf16.count)) // NSRange counts UTF-16 units
guard let match = matches.first else { return results }
let lastRangeIndex = match.numberOfRanges - 1
guard lastRangeIndex >= 1 else { return results }
for i in 1...lastRangeIndex {
let capturedGroupIndex = match.range(at: i)
let matchedString = (self as NSString).substring(with: capturedGroupIndex)
results.append(matchedString)
}
return results
}
}
like this:
let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(text.capturedGroups(withRegex: "<em>([a-zA-Z]+)</em>"))
result:
["Furni"]
NSAttributedString:
If you want to do some highlighting, only need to get rid of the tags, or cannot use the first solution for any other reason, you can also do this with NSAttributedString:
extension String {
var attributedStringAsHTML: NSAttributedString? {
do{
return try NSAttributedString(data: Data(utf8),
options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue],
documentAttributes: nil)
}
catch {
print("error: ", error)
return nil
}
}
}
func getTextSections(_ text:String) -> [String] {
guard let attributedText = text.attributedStringAsHTML else {
return []
}
var sections:[String] = []
let range = NSMakeRange(0, attributedText.length)
// we don't need to enumerate any special attribute here,
// but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead
let attribute: NSAttributedString.Key = .init(rawValue: "")
attributedText.enumerateAttribute(attribute,
in: range,
options: .longestEffectiveRangeNotRequired) {attribute, range, pointer in
let text = attributedText.attributedSubstring(from: range).string
sections.append(text)
}
return sections
}
let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(getTextSections(text))
result:
["Fully ", "Furni", "shed |Downtown and Canal Views"]
Here is a basic implementation in PHP (yes, I know you asked for Swift, but it's to demonstrate the regex part):
<?php
$in = "Fully <em>Furni</em>shed |Downtown and Canal Views";
$m = preg_match("/<([^>]+)>([^>]+)<\/\\1>([^ ]+|$)/i", $in, $t);
$s = $t[2] . $t[3];
echo $s;
Output:
ZC-MGMT-04:~ jv$ php -q regex.php
Furnished
Obviously, the most important bit is the regular expression, which matches any tag, finds the respective closing tag, and captures the remainder of the word afterward.
If you just want to extract the text between the <em> and <\/em> tags (note that these are not normal HTML tags, which would be <em> and </em>), you can capture this pattern and replace the match with the value of group 1. There is no need to worry about the text around the match. The captured text may even be an empty string, since the OP mentioned no constraint on that. The regex for matching this pattern is:
<em>(.*?)<\\\/em>
Or, to be more robust against optional spaces anywhere within the tags (as someone pointed out in the comments on other answers), we can use this regex:
<\s*em\s*>(.*?)<\s*\\\/em\s*>
And replace it with \1 or $1 depending upon where you are doing it. Now whether these tags contain empty string, or contains some actual string within it, doesn't really matter as shown in my demo on regex101.
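In Swift, this replacement can be sketched with `replacingOccurrences(of:with:options:)`, where `$1` in the replacement string refers to capture group 1. The constant names below are illustrative, and the pattern is the whitespace-tolerant regex from above with its backslashes doubled for a Swift string literal.

```swift
import Foundation

let input = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"

// Regex seen by the engine: <\s*em\s*>(.*?)<\s*\\/em\s*>
let emPattern = "<\\s*em\\s*>(.*?)<\\s*\\\\/em\\s*>"

let output = input.replacingOccurrences(of: emPattern, with: "$1", options: .regularExpression)
print(output) // Fully Furnished |Downtown and Canal Views
```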
Here is the demo
Let me know if this meets your requirements and further, if any of your requirement remains unsatisfied.
I highly recommend the use of regex capture groups.
Create your regex, giving each desired capture group a name:
let capturePattern = "(?<=<em>)(?<data1>\\w+)(?=<\\\\/em>)|(?<=<\\\\/em>)(?<data2>\\w+)"
Now use NSRegularExpression with that pattern to get the data:
let captureRegex = try! NSRegularExpression(
pattern: capturePattern,
options: []
)
let textInput = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views" // backslash escaped so the literal compiles
let textInputRange = NSRange(
textInput.startIndex..<textInput.endIndex,
in: textInput
)
let matches = captureRegex.matches(
in: textInput,
options: [],
range: textInputRange
)
guard let match = matches.first else {
// Handle exception
throw NSError(domain: "", code: 0, userInfo: nil)
}
let data1Range = match.range(withName: "data1")
// Extract the substring matching the named capture group
if let substringRange = Range(data1Range, in: textInput) {
let capture = String(textInput[substringRange])
print(capture)
}
The same can be done to get the data2 group name:
let data2Range = match.range(withName: "data2")
if let substringRange = Range(data2Range, in: textInput) {
let capture = String(textInput[substringRange])
print(capture)
}
The main advantage of this method is that it does not depend on group indices, so the calling code is less tightly coupled to the exact shape of the regular expression.
I cannot think of a function to remove a repeating substring from my string. My string looks like this:
"<bold><bold>Rutger</bold> Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>."
And if <bold> is followed by another <bold> I want to remove the second <bold>. When removing that second <bold> I also want to remove the first </bold> that follows.
So the output that I'm looking for should be this:
"<bold>Rutger Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>."
Anyone know how to achieve this in Swift (2.2)?
I wrote a solution using regex, on the assumption that a tag is never nested inside itself more than once. In other words, it only cleans doubled tags, nothing deeper. You can use the same code with a recursive call to clean as many levels of nested repeating tags as you want:
class Cleaner {
var tags:Array<String> = [];
init(tags:Array<String>) {
self.tags = tags;
}
func cleanString(html:String) -> String {
var res = html
do {
for tag in tags {
let start = "<\(tag)>"
let end = "</\(tag)>"
let pattern = "\(start)(.*?)\(end)"
let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options.caseInsensitive)
let matches = regex.matches(in: res, options: [], range: NSRange(location: 0, length: res.utf16.count))
var diff = 0;
for match in matches {
let outer_range = NSMakeRange(match.rangeAt(0).location - diff, match.rangeAt(0).length)
let inner_range = NSMakeRange(match.rangeAt(1).location - diff, match.rangeAt(1).length)
let node = (res as NSString).substring(with: outer_range)
let content = (res as NSString).substring(with: inner_range)
// look for the starting tag in the content of the node
if content.range(of: start) != nil {
res = (res as NSString).replacingCharacters(in: outer_range, with: content);
//for shifting future ranges
diff += (node.utf16.count - content.utf16.count)
}
}
}
}
catch {
print("regex was bad!")
}
return res
}
}
let cleaner = Cleaner(tags: ["bold"]);
let html = "<bold><bold>Rutger</bold> Roger</bold> rented a <bold><bold>testitem</bold> zero dollars</bold> from <bold>Rutger</bold>."
let cleaned = cleaner.cleanString(html: html)
print(cleaned)
//<bold>Rutger Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>.
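For deeper nesting, one option is to repeat a single replacement until the string stops changing. The sketch below is a swapped-in technique, not the Cleaner class above: the function name `collapseNestedTags` is hypothetical, it uses a loop instead of recursion, and it only handles an opening tag that is directly followed by the same opening tag, which is the case described in the question.

```swift
import Foundation

func collapseNestedTags(_ html: String, tag: String) -> String {
    let open = NSRegularExpression.escapedPattern(for: "<\(tag)>")
    let close = NSRegularExpression.escapedPattern(for: "</\(tag)>")
    // Matches "<tag><tag>inner</tag>"; the replacement keeps "<tag>inner".
    let pattern = "\(open)\(open)(.*?)\(close)"
    let replacement = NSRegularExpression.escapedTemplate(for: "<\(tag)>") + "$1"

    var current = html
    while true {
        let next = current.replacingOccurrences(of: pattern, with: replacement, options: .regularExpression)
        if next == current { return current } // nothing left to collapse
        current = next
    }
}

let nested = "<bold><bold>Rutger</bold> Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>."
let collapsed = collapseNestedTags(nested, tag: "bold")
print(collapsed)
// <bold>Rutger Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>.
```

Each pass removes one doubled opening tag and its first closing tag, so the loop ends once no doubled opening tag remains.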
Try this; I have just written it. I hope it is helpful.
class Test : NSObject {
static func removeFirstString (originString: String, removeString: String, withString: String) -> String {
var genString = originString
if originString.contains(removeString) {
let range = originString.range(of: removeString)
genString = genString.replacingOccurrences(of: removeString, with: withString, options: String.CompareOptions.anchored, range: range)
}
return genString
}
}
var newString = Test.removeFirstString(originString: str, removeString: "<bold>", withString: "")
newString = Test.removeFirstString(originString: newString, removeString: "</bold>", withString: "")
I'm trying to parse a string using one regular expression pattern.
Here is the pattern:
(\")(.+)(\")\s*(\{)
Here is the text to be parsed:
"base" {
I want to find these 4 capturing groups:
1. "
2. base
3. "
4. {
I am using the following code trying to capture those groups
class func matchesInCapturingGroups(text: String, pattern: String) -> [String] {
var results = [String]()
let textRange = NSMakeRange(0, count(text))
var index = 0
if let matches = regexp(pattern)?.matchesInString(text, options: NSMatchingOptions.ReportCompletion, range: textRange) as? [NSTextCheckingResult] {
for match in matches {
// this match = <NSExtendedRegularExpressionCheckingResult: 0x7fac3b601fd0>{0, 8}{<NSRegularExpression: 0x7fac3b70b5b0> (")(.+)(")\s*(\{) 0x1}
results.append(self.substring(text, range: match.range))
}
}
return results
}
Unfortunately it is able to find only one group with range (0, 8) which is equal to: "base" {. So it finds one group which is the entire string instead of 4 groups.
Is that even possible to get those groups using NSRegularExpression?
Yes, of course it is possible. You just have to change your current logic for finding the actual groups:
func matchesInCapturingGroups(text: String, pattern: String) -> [String] {
var results = [String]()
let textRange = NSMakeRange(0, (text as NSString).length) // NSRange counts UTF-16 units, not UTF-8 bytes
do {
let regex = try NSRegularExpression(pattern: pattern, options: [])
let matches = regex.matchesInString(text, options: NSMatchingOptions.ReportCompletion, range: textRange)
guard let match = matches.first else { return [] } // no match: nothing to extract
for index in 1..<match.numberOfRanges {
results.append((text as NSString).substringWithRange(match.rangeAtIndex(index)))
}
return results
} catch {
return []
}
}
let pattern = "(\")(.+)(\")\\s*(\\{)"
print(matchesInCapturingGroups("\"base\" {", pattern: pattern))
You actually only get 1 match. You have to go into that match and in there you will find the captured groups. Note that I omit the first group since the first group represents the entire match.
This will output
["\"", "base", "\"", "{"]
Note the escaped regex string and make sure that you are using the same one.
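In Swift 4 and later, the same "go into the match and read its groups" logic could be sketched as follows. The function name `firstMatchGroups` is hypothetical, and the sketch assumes that groups which did not take part in the match may simply be dropped.

```swift
import Foundation

func firstMatchGroups(in text: String, pattern: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return [] }
    // Range 0 is the whole match, so the capture groups start at index 1.
    return (1..<match.numberOfRanges).compactMap { index in
        Range(match.range(at: index), in: text).map { String(text[$0]) }
    }
}

let baseGroups = firstMatchGroups(in: "\"base\" {", pattern: "(\")(.+)(\")\\s*(\\{)")
print(baseGroups) // ["\"", "base", "\"", "{"]
```

Because compactMap drops non-participating groups, positions in the result only line up with group numbers when every group matched, as it does for this pattern.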