How to regex string that includes html - ios

I have this string, which is part of a larger string with multiple occurrences of "content" and "/content":
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">RATING: ★★★★<br/>
TAGS: Fiction, General, Science Fiction<br/>
SERIES: 20 SienceFiction Greats [19]<br/>
<p class="description">SUMMARY:<br/>Luna is an open colony and the regime is a harsh one....</p></div>
</content>
I want to capture everything between <content type="xhtml"> and </content>.
I tried this code:
let regexPattern = "<content type=\"xhtml\">.*</content>"
let result:[String] = matches(for: regexPattern, in: dataString)
but it returns an empty array.

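Your regex pattern is close, but it needs a few adjustments. First, . does not match line breaks by default, and your sample spans several lines, so the pattern never matches at all; the inline flag (?s) (equivalent to the .dotMatchesLineSeparators option of NSRegularExpression) fixes that. Second, a capturing group, which is any pattern between (), lets you extract just the text between the tags, and the lazy quantifier .*? stops at the nearest </content> instead of the last one. Finally, / needs no escaping, and \/ is not a valid escape sequence in a Swift string literal. The updated regex should look like this:
let regexPattern = "(?s)<content type=\"xhtml\">(.*?)</content>"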
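The question calls a matches(for:in:) helper that is not shown. As a minimal sketch, here is a hypothetical helper, captures(for:in:), built on NSRegularExpression that returns the text of capture group 1 for every match, so each <content> block comes back as a separate element:

```swift
import Foundation

// Hypothetical helper: returns capture group 1 of every match of `pattern` in `text`.
func captures(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let searchRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: searchRange).compactMap { (match) -> String? in
        // Convert the NSRange of group 1 back to a Swift String range.
        guard let range = Range(match.range(at: 1), in: text) else { return nil }
        return String(text[range])
    }
}

let sample = """
<content type="xhtml"><div>first
line</div></content>
<content type="xhtml"><div>second</div></content>
"""

// (?s) lets "." match newlines; the lazy ".*?" stops at the nearest </content>.
let blocks = captures(for: "(?s)<content type=\"xhtml\">(.*?)</content>", in: sample)
print(blocks.count) // 2
```

With the greedy (.*) instead of (.*?), the same call would return a single element running from the first opening tag to the last closing tag.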

Well, I found a solution. The problem is that the greedy .* pattern matches everything from the first occurrence of "content" to the last occurrence of "/content".
So this is my solution: find each start tag and the first end tag after it, take the text between them, and then continue searching after that end tag:
import Foundation

private func getXHTMLContentFromDataString(dataString: String) -> [String] {
    var contentStringArray: [String] = []
    let startString = "<content type=\"xhtml\">"
    let endString = "</content>"
    var searchStart = dataString.startIndex
    // Find the next start tag, then the first end tag that follows it.
    while let contentStartRange = dataString.range(of: startString, range: searchStart..<dataString.endIndex),
          let contentEndRange = dataString.range(of: endString, range: contentStartRange.upperBound..<dataString.endIndex) {
        contentStringArray.append(String(dataString[contentStartRange.upperBound..<contentEndRange.lowerBound]))
        // Continue searching after this end tag.
        searchStart = contentEndRange.upperBound
    }
    return contentStringArray
}

Related

How to Check if String begins with Alphabet Letter in Swift 5?

Problem: I am currently trying to sort a list in SwiftUI according to each item's first character. I would also like to add a section for all items that don't begin with a letter of the alphabet (numbers, special characters).
My code so far:
let nonAlphabetItems = items.filter { $0.name.uppercased() != /* begins with A-Z */ }
Does anyone have a solution for this issue? Of course I could write a huge loop, but I hope there is a more elegant way.
Thanks for your help.
You can check whether the string range "A"..."Z" contains the uppercased first letter of your name property:
struct Item {
let name: String
}
let items: [Item] = [.init(name: "Def"),.init(name: "Ghi"),.init(name: "123"),.init(name: "Abc")]
let nonAlphabetItems = items.filter { !("A"..."Z" ~= ($0.name.first?.uppercased() ?? "#")) }
nonAlphabetItems // [{name "123"}]
Expanding on this topic, we can extend Character to add an isAsciiLetter property:
extension Character {
var isAsciiLetter: Bool { "A"..."Z" ~= self || "a"..."z" ~= self }
}
This allows us to extend StringProtocol to check whether a string starts with an ASCII letter:
extension StringProtocol {
var startsWithAsciiLetter: Bool { first?.isAsciiLetter == true }
}
And just a helper to negate a boolean property:
extension Bool {
var negated: Bool { !self }
}
Now we can filter the items collection as follows (passing a key path as a function requires Swift 5.2 or later):
let nonAlphabetItems = items.filter(\.name.startsWithAsciiLetter.negated) // [{name "123"}]
If you need an occasional filter, you could simply write a condition combining the standard isLetter and isASCII properties, which are already defined for Character. It's as simple as:
let items = [ "Abc", "01bc", "Ça va", "", " ", "𓀫𓀫𓀫𓀫"]
let nonAlphabetItems = items.filter { $0.isEmpty || !$0.first!.isASCII || !$0.first!.isLetter }
print (nonAlphabetItems) // -> Output: ["01bc", "Ça va", "", " ", "𓀫𓀫𓀫𓀫"]
The force unwrap $0.first! is safe because the isEmpty check runs first and short-circuits for empty strings. It is tempting to use isLetter alone, but it is true for letters of many alphabets, including for example ancient Egyptian hieroglyphs like "𓀫" or French letters such as "Ç" and other accented characters. This is why you need to combine it with isASCII, to limit yourself to the basic Latin alphabet.
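Since the question also asks for a separate section for non-letter items, the same isASCII && isLetter test can serve as a grouping key. This is a sketch, not tied to any particular SwiftUI view: Item mirrors the struct from the first answer, sectionKey(for:) is a hypothetical helper name, and names that do not start with an ASCII letter go into a "#" section that is sorted last:

```swift
struct Item {
    let name: String
}

// Hypothetical helper: uppercased first letter for A-Z/a-z, otherwise "#".
func sectionKey(for name: String) -> String {
    guard let first = name.first, first.isASCII, first.isLetter else { return "#" }
    return first.uppercased()
}

let items = [Item(name: "Def"), Item(name: "123"), Item(name: "abc"), Item(name: "Ça va")]
let sections = Dictionary(grouping: items, by: { sectionKey(for: $0.name) })

// Sort the section titles alphabetically and keep "#" last.
let sortedKeys = sections.keys.sorted { lhs, rhs in
    if lhs == "#" { return false }
    if rhs == "#" { return true }
    return lhs < rhs
}

for key in sortedKeys {
    print(key, sections[key]!.map { $0.name })
}
// A ["abc"]
// D ["Def"]
// # ["123", "Ça va"]
```

Each key and its items could then feed one Section inside a SwiftUI List; Dictionary(grouping:by:) keeps the original item order within each group.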
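You can use CharacterSet (NSCharacterSet) in the following way:
import Foundation
// ASCII letters only; CharacterSet.letters would also accept letters from other alphabets.
let characterSet = CharacterSet(charactersIn: "A"..."Z").union(CharacterSet(charactersIn: "a"..."z"))
let phrase = "Test case"
// .anchored restricts the search to the start of the string, i.e. the first character.
let range = phrase.rangeOfCharacter(from: characterSet, options: .anchored)
// range will be nil if the string does not start with a letter
if range != nil {
    print("starts with a letter")
} else {
    print("does not start with a letter")
}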
You can check the ASCII value of the first character:
extension String {
    var firstCharacterIsAlphabet: Bool {
        guard let firstChar = self.first else { return false }
        let unicode = String(firstChar).unicodeScalars
        let ascii = Int(unicode[unicode.startIndex].value)
        // 65...90 is "A"..."Z", 97...122 is "a"..."z"
        return (ascii >= 65 && ascii <= 90) || (ascii >= 97 && ascii <= 122)
    }
}
let isAlphabet = "Hello".firstCharacterIsAlphabet
The Character type has a property for this:
let x: Character = "x"
x.isLetter // true for letters, false for punctuation, numbers, whitespace, ...
Note that this will include characters from other alphabets (Greek, Cyrillic, Chinese, ...).
Since String is a Sequence whose Element is Character, we can use its first property to get the first character.
With this, you can filter your items:
let filtered = items.filter { $0.name.first?.isLetter ?? false }
You can do this with a simple StringProtocol extension:
extension StringProtocol {
var isFirstCharacterAlp: Bool {
first?.isASCII == true && first?.isLetter == true
}
}
Usage:
print ("H1".isFirstCharacterAlp)
print ("ابراهيم1".isFirstCharacterAlp)
Output
true
false
Happy Coding!

How to split string as English and non English using Swift 4?

I have a string that contains both English and Arabic. It comes from an API, which is why I cannot add a marker to it.
What I want is to split the Arabic and the English into two parts. Here is a sample string:
"بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants.",
I cannot figure out how to split it so that I get the Arabic and the English in two separate parts.
What I want:
The string can contain any language; I only need to extract the English or Arabic text and show each in its own field.
How can I achieve it?
You can use a Natural Language Tagger, which would work even if both scripts are intermingled:
import NaturalLanguage
let str = "¿como? بداية start وسط middle начать средний конец نهاية end. 從中間開始. "
let tagger = NLTagger(tagSchemes: [.script])
tagger.string = str
var index = str.startIndex
var dictionary = [String: String]()
var lastScript = "other"
while index < str.endIndex {
let res = tagger.tag(at: index, unit: .word, scheme: .script)
let range = res.1
let script = res.0?.rawValue
switch script {
case .some(let s):
lastScript = s
dictionary[s, default: ""] += dictionary["other", default: ""] + str[range]
dictionary.removeValue(forKey: "other")
default:
dictionary[lastScript, default: ""] += str[range]
}
index = range.upperBound
}
print(dictionary)
and print the result if you'd like:
for entry in dictionary {
print(entry.key, ":", entry.value)
}
yielding (the order of the entries may vary, since Dictionary is unordered):
Hant : 從中間開始.
Cyrl : начать средний конец
Arab : بداية وسط نهاية
Latn : ¿como? start middle end.
This is still not perfect, since the tagger assigns each word the script that most of its letters belong to. For example, in the string you're working with, the tagger would treat الصَّالِحِينَ.Bismika as one word. To overcome this, we can use two pointers to traverse the original string and check the script of each word individually, where a word is a run of contiguous letters:
let str = "بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants."
let tagger = NLTagger(tagSchemes: [.script])
var i = str.startIndex
var dictionary = [String: String]()
var lastScript = "glyphs"
while i < str.endIndex {
var j = i
while j < str.endIndex,
CharacterSet.letters.inverted.isSuperset(of: CharacterSet(charactersIn: String(str[j]))) {
j = str.index(after: j)
}
if i != j { dictionary[lastScript, default: ""] += str[i..<j] }
if j < str.endIndex { i = j } else { break }
while j < str.endIndex,
CharacterSet.letters.isSuperset(of: CharacterSet(charactersIn: String(str[j]))) {
j = str.index(after: j)
}
let tempo = String(str[i..<j])
tagger.string = tempo
let res = tagger.tag(at: tempo.startIndex, unit: .word, scheme: .script)
if let s = res.0?.rawValue {
lastScript = s
dictionary[s, default: ""] += dictionary["glyphs", default: ""] + tempo
dictionary.removeValue(forKey: "glyphs")
}
else { dictionary["other", default: ""] += tempo }
i = j
}
You can use NLTagger as answered by @ielyamani, but its one limitation is that it requires iOS 12 or later.
If you are trying to do this on earlier iOS versions, you can take a look at NSCharacterSet (CharacterSet in Swift).
You can create your own character set to check whether a string consists only of English letters and digits:
import Foundation
extension String {
    // Returns true only if every character is an ASCII letter or digit (spaces and punctuation return false).
    func containsLatinCharacters() -> Bool {
        let latinSet = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
        // Look for any character outside the set; if there is none, the string is all Latin.
        return rangeOfCharacter(from: latinSet.inverted) == nil
    }
}
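Another option is to use the character sets that are already available. Keep in mind that trimmingCharacters(in:) only removes characters from the start and end of a string, and that .alphanumerics contains letters and digits from every script, Arabic included:
let trimmedEnds = string.trimmingCharacters(in: .alphanumerics) // strips letters and digits of any script, but only at the two ends
let trimmedSymbols = string.trimmingCharacters(in: CharacterSet.alphanumerics.inverted) // strips symbols, punctuation and whitespace, but only at the two ends
These are handy for cleaning up the ends of a string, but to separate scripts you need to test every character against a set, so create your own character set and use union, intersection etc. to keep the wanted characters and filter out the unwanted ones.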
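As a sketch of the "build your own set" idea that needs neither NLTagger nor a recent iOS version, the function below walks the Unicode scalars and routes each one by script. Assumptions: "Arabic" means the basic Arabic block U+0600...U+06FF, "English" means ASCII letters, and neutral characters (spaces, digits, punctuation) follow whichever script came last, much like lastScript in the tagger answer. splitArabicAndLatin is a hypothetical name:

```swift
// Assumptions: Arabic = basic Arabic block U+0600...U+06FF, English = ASCII letters;
// neutral characters (spaces, digits, punctuation) follow the most recent script.
func splitArabicAndLatin(_ text: String) -> (arabic: String, latin: String) {
    var arabic = String.UnicodeScalarView()
    var latin = String.UnicodeScalarView()
    var lastWasArabic = false
    for scalar in text.unicodeScalars {
        switch scalar.value {
        case 0x0600...0x06FF:           // Arabic letters, harakat, Arabic comma
            lastWasArabic = true
        case 0x41...0x5A, 0x61...0x7A:  // A-Z, a-z
            lastWasArabic = false
        default:
            break                       // neutral: keep the current script
        }
        if lastWasArabic {
            arabic.append(scalar)
        } else {
            latin.append(scalar)
        }
    }
    return (String(arabic), String(latin))
}

let parts = splitArabicAndLatin("مرحبا بك.Hello there.")
print(parts.latin) // Hello there.
```

On the question's sample, the transliteration ("Bismika rabbee ...") lands in the English part because it uses ASCII letters; other scripts (Cyrillic, Chinese, and so on) would need their own ranges.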
Step 1:
Split the whole string into an array by ".", since the sentences in your string are separated by ".".
Step 2:
Pass each sentence to NSLinguisticTagger to determine its language and append it to the matching string.
Final Code
//add in your viewController
enum Language : String {
case arabic = "ar"
case english = "en"
}
override func viewDidLoad() {
super.viewDidLoad()
//make array of string
let kalmaArray = "بِاسْمِكَ رَبِّي وَضَعْتُ جَنْبِي، وَبِكَ أَرْفَعُهُ، فَإِنْ أَمْسَكْتَ نَفْسِي فَارْحَمْهَا، وَإِنْ أَرْسَلْتَهَا فَاحْفَظْهَا، بِمَا تَحْفَظُ بِهِ عِبَادَكَ الصَّالِحِينَ.Bismika rabbee wadaAAtu janbee wabika arfaAAuh, fa-in amsakta nafsee farhamha, wa-in arsaltaha fahfathha bima tahfathu bihi AAibadakas-saliheen. In Your name my Lord, I lie down and in Your name I rise, so if You should take my soul then have mercy upon it, and if You should return my soul then protect it in the manner You do so with Your righteous servants.".components(separatedBy: ".")
splitInLanguages(kalmaArray: kalmaArray)
}
private func splitInLanguages(kalmaArray: [String]){
var englishText = ""
var arabicText = ""
for kalma in kalmaArray {
if kalma.count > 0 {
if let language = NSLinguisticTagger.dominantLanguage(for: kalma) {
switch language {
case Language.arabic.rawValue:
arabicText.append(kalma)
arabicText.append(".")
break
default: // English
englishText.append(kalma)
englishText.append(".")
break
}
} else {
print("Unknown language")
}
}
}
debugPrint("Arabic: ", arabicText)
debugPrint("English: ", englishText)
}
I hope this helps you split the string into two languages. Let me know if you are still having any issues.

How to remove 1 component in a string? Swift

I have this string "01:07:30" and I would like to remove the zero in "01" and keep everything else the same. My final string should look like this:
"1:07:30"
Is there a way to do so in Swift? Thank you so much!
Try the following. It checks whether the first character is "0" and, if so, removes it; otherwise the string is left unchanged.
var myString = "01:07:30"
if myString.first == "0" {
myString.remove(at: myString.startIndex)
}
print(myString) // 1:07:30
The final result is the string will now be 1:07:30 instead of 01:07:30.
If I understand your question correctly, you want to replace occurrences of "01" with "1" in your string, so you can use a regex or the built-in string methods. Simply:
let searchText = "01"
let replaceWithValue = "1"
let string = "01:07:08:00:01:01"
let newString = string.replacingOccurrences(of: searchText, with: replaceWithValue) // "1:07:08:00:1:1"
If you want to replace the first occurrence only, follow this answer:
https://stackoverflow.com/a/33822186/3911553
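If only a zero at the very start of the string should go, an anchored regular expression is another option. A minimal sketch using Foundation's replacingOccurrences(of:with:options:) with the .regularExpression option:

```swift
import Foundation

let time = "01:07:30"
// "^0" only matches a zero at the very start of the string, so "07" is left alone.
let trimmed = time.replacingOccurrences(of: "^0", with: "", options: .regularExpression)
print(trimmed) // 1:07:30
```

A string without a leading zero, such as "10:07:30", comes back unchanged.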
If you want to strip the leading zero from every component, here is one of many solutions (note that "01:07:30" becomes "1:7:30" this way):
var myString = "01:07:30"
let list = myString.components(separatedBy: ":")
var finalString = ""
for var obj in list{
if obj.first == "0" {
obj.removeFirst()
}
finalString += finalString.count == 0 ? "\(obj)" : ":\(obj)"
}
print(finalString)

Remove special characters from the string

I am trying to use an iOS app to dial a number. The problem is that the number is in the following format:
po placeAnnotation.mapItem.phoneNumber!
"‎+1 (832) 831-6486"
I want to get rid of some special characters and I want the following:
832-831-6486
I used the following code but it did not remove anything:
let charactersToRemove = CharacterSet(charactersIn: "()+-")
var telephone = placeAnnotation.mapItem.phoneNumber?.trimmingCharacters(in: charactersToRemove)
Any ideas?
placeAnnotation.mapItem.phoneNumber!.components(separatedBy: CharacterSet.decimalDigits.inverted)
.joined()
Here you go! I tested it and it works well. Note that this keeps every digit, so the example above becomes 18328316486.
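The question asks for 832-831-6486 rather than just the digits. A sketch of one extra step, assuming North American numbers (10 digits, optionally preceded by the country code 1); formatNANP is a hypothetical name, and numbers from other countries would need different rules:

```swift
import Foundation

// Assumption: a North American (NANP) number, 10 digits with an optional leading "1".
func formatNANP(_ raw: String) -> String? {
    // Keep only the digits, e.g. "+1 (832) 831-6486" -> "18328316486".
    var digits = raw.components(separatedBy: CharacterSet.decimalDigits.inverted).joined()
    if digits.count == 11, digits.hasPrefix("1") {
        digits.removeFirst() // drop the country code
    }
    guard digits.count == 10 else { return nil }
    let area = digits.prefix(3)
    let exchange = digits.dropFirst(3).prefix(3)
    let line = digits.suffix(4)
    return "\(area)-\(exchange)-\(line)"
}

print(formatNANP("+1 (832) 831-6486") ?? "invalid") // 832-831-6486
```

Returning nil for anything that is not 10 digits after cleanup keeps the dialing code from silently using a malformed number.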
If you want something similar to CharacterSet with some flexibility, this should work:
let phoneNumber = "1 (832) 831-6486"
let charsToRemove: Set<Character> = Set("()+-")
let newNumberCharacters = String(phoneNumber.filter { !charsToRemove.contains($0) })
print(newNumberCharacters) //prints 1 832 8316486
I know the question is already answered, but to format phone numbers in any way, one could use a custom formatter like the one below:
import Foundation

class PhoneNumberFormatter: Formatter {
    // Each "#" is replaced by one digit, filled from right to left.
    var numberFormat: String = "(###) ### ####"

    override func string(for obj: Any?) -> String? {
        guard let number = obj as? NSNumber else { return nil }
        var input = number.int64Value
        var output = numberFormat
        // Replace the last remaining "#" with the last remaining digit until none are left.
        while let range = output.range(of: "#", options: .backwards) {
            output.replaceSubrange(range, with: "\(input % 10)")
            input /= 10
        }
        return output
    }

    func string(from number: NSNumber) -> String? {
        return string(for: number)
    }
}

let phoneNumberFormatter = PhoneNumberFormatter()
// Digits are filled backwards in place of the hashes; change numberFormat to any layout you like.
phoneNumberFormatter.numberFormat = "###-##-##-##-##"
phoneNumberFormatter.string(from: 18063783889) // "180-63-78-38-89"
Swift 4 and later:
func removeSpecialCharsFromString(_ str: String) -> String {
    struct Constants {
        // Characters to keep; everything else is dropped.
        static let validChars = Set("1234567890-")
    }
    return str.filter { Constants.validChars.contains($0) }
}
Usage:
let str: String = "+1 (832) 831-6486"
let newStr: String = removeSpecialCharsFromString(str)
print(newStr) // 1832831-6486
Note: add any characters you want to keep to validChars.
If you have a number with special characters in a String, use the following code to remove the special characters:
let numberWithSpecialChar = "1800-180-0000"
let actualNumber = numberWithSpecialChar.components(separatedBy: CharacterSet.decimalDigits.inverted).joined() // "18001800000"
Otherwise, if you have letters mixed with special characters, use the following code to remove the special characters:
let charactersWithSpecialChar = "A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!"
// Drop the empty pieces left between adjacent separators (such as ", ") so words are joined by single spaces.
let actualString = charactersWithSpecialChar.components(separatedBy: CharacterSet.letters.inverted).filter { !$0.isEmpty }.joined(separator: " ")
In Objective-C:
NSString *str = @"(123)-456-7890";
NSLog(@"String: %@", str);
// Create a character set with the specified characters
NSMutableCharacterSet *characterSet =
    [NSMutableCharacterSet characterSetWithCharactersInString:@"()-"];
// Build an array of components using the specified characters as separators
NSArray *arrayOfComponents = [str componentsSeparatedByCharactersInSet:characterSet];
// Create a string from the array components
NSString *strOutput = [arrayOfComponents componentsJoinedByString:@""];
NSLog(@"New string: %@", strOutput);

StringBetweenString function

I need to get the substring between two strings in my text. For example, I have the text "http://google.com" and I want to get the substring between "://" and ".".
I don't know how I can do that.
I tried to use regular expressions, but I think that's a bad way.
A couple of options:
Regular expressions work well. See ICU User Guide: Regular Expressions
Example:
let us = "http://google.com"
if let range = us.range(of: "(?<=://)[^.]+(?=\\.)", options: .regularExpression) {
    let found = us[range]
    print("found: \(found)") // found: google
}
Notes:
(?<=://) means preceded by ://
[^.]+ means one or more characters except . (inside brackets the dot is literal)
(?=\.) means followed by a literal . (written "\\." in a Swift string literal; an unescaped . would match any character)
Scanner (NSScanner) is also a good option. See Apple's NSScanner Class Reference.
Example (the String-returning Scanner methods used here require iOS 13 / macOS 10.15 or later):
import Foundation
let us = "http://google.com"
let scanner = Scanner(string: us)
// Skip everything before "://", then "://" itself, then read up to the next "."
if scanner.scanUpToString("://") != nil,
   scanner.scanString("://") != nil,
   let result = scanner.scanUpToString(".") {
    print("result: \(result)") // result: google
}
You can use the regular expression
://.+\.
which matches
://google.
(.+ is greedy, so with more dots, as in www.google.com, it runs up to the last one) in this code:
import Foundation
let yourURL = "http://google.com" // this is your input and could be any URL
// "\\." in the Swift literal is the regex \. (a literal dot); the backslash itself must be escaped
let regex = try! NSRegularExpression(pattern: "://.+\\.")
let nsURL = yourURL as NSString
// No .anchored option: the match starts in the middle of the string, not at index 0
let needleRange = regex.rangeOfFirstMatch(in: yourURL, options: [], range: NSRange(location: 0, length: nsURL.length))
let needle = nsURL.substring(with: needleRange) // "://google."
Now remove the first 3 characters and the last one, and you get
google
with this code:
import Foundation
let halfURL = "://google." as NSString
let prefix = "://" as NSString
let suffix = "." as NSString
let needleRange = NSRange(location: prefix.length, length: halfURL.length - prefix.length - suffix.length)
let needle = halfURL.substring(with: needleRange)
// needle is now "google"
If your input is a valid URL, you can take advantage of the URL (NSURL) type to do the parsing for you:
import Foundation
var result: String?
let input = "http://test.com/blabla"
// Parse the string; might fail
let url = URL(string: input)
// Get the host part of the URL ("test.com")
let host = url?.host
// Split it up at the dots (empty if parsing failed).
let hostParts = host?.components(separatedBy: ".") ?? []
// Assign the first part of the hostname if we were successful up to here.
if let firstPart = hostParts.first {
    result = firstPart
}
Bonus: ignore "www":
if let firstPart = hostParts.first {
    result = (firstPart == "www" && hostParts.count > 1) ? hostParts[1] : firstPart
}
For Swift 4 and later:
let us = "http://example.com"
if let range = us.range(of: "(?<=://)[^.]+(?=\\.com)", options: .regularExpression) {
    let found = us[range]
    print("found: \(found)") // found: example
}
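Without regular expressions, the same "string between two strings" can be written with plain range(of:) searches. A sketch as a String extension; substring(between:and:) is a hypothetical name, and it returns nil when either delimiter is missing:

```swift
import Foundation

extension String {
    // Hypothetical helper: text between the first `start` and the next `end` after it.
    func substring(between start: String, and end: String) -> Substring? {
        guard let startRange = range(of: start),
              let endRange = range(of: end, range: startRange.upperBound..<endIndex) else {
            return nil
        }
        return self[startRange.upperBound..<endRange.lowerBound]
    }
}

print("http://google.com".substring(between: "://", and: ".") ?? "not found") // google
```

Searching for the end delimiter only after the start delimiter avoids picking up a "." that appears earlier in the string.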
