Getting certain parts of strings - ios

I have a string with the following value:
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;"><a href="http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3">Download</a></div>
How can I get just the <a href="http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3" part out of it? I apologise if there are already posts about this; I couldn't find any.

Here's an example of using regular expressions to find substrings. It looks for "href=" and then for the first quote (") after href=. Once these indexes are found, the string between them is returned.
Regular expressions aren't really needed in my example; you could use simple NSString methods to find the substrings instead.
This is just a hard coded example that fits your specific case. In practice you're better off using a DOM/XML parser to do something like this.
Also, I'm assuming you want to extract the actual URL and don't care about the <a href= prefix.
Also note this function doesn't handle the case that there is no href match in the string.
- (NSString *)stringByExtractingAnchorTagURLFromString:(NSString *)dom {
    NSError *error = nil;
    // Find the "href=" part
    NSRegularExpression *firstRegexp = [NSRegularExpression regularExpressionWithPattern:@"href=\"" options:NSRegularExpressionCaseInsensitive error:&error];
    NSTextCheckingResult *firstResult = [firstRegexp firstMatchInString:dom options:NSMatchingReportProgress range:NSMakeRange(0, [dom length])];
    NSUInteger startIndex = firstResult.range.location + firstResult.range.length;
    // Find the first quote (") character after the href=
    NSRegularExpression *secondRegexp = [NSRegularExpression regularExpressionWithPattern:@"\"" options:NSRegularExpressionCaseInsensitive error:&error];
    NSTextCheckingResult *secondResult = [secondRegexp firstMatchInString:dom options:NSMatchingReportProgress range:NSMakeRange(startIndex, [dom length]-startIndex)];
    NSUInteger endIndex = secondResult.range.location;
    // The URL is the string between these two found locations
    return [dom substringWithRange:NSMakeRange(startIndex, endIndex-startIndex)];
}
This is how I tested it:
NSString *dom = @"<div style=\"clear:both;\"></div><div style=\"float:left;\"><div style=\"float:left; height:27px; font-size:13px; padding-top:2px;\"><div style=\"float:left;\"><a href=\"http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3\">Download</a></div>";
NSString *result = [self stringByExtractingAnchorTagURLFromString:dom];
NSLog(@"Result: %@", result);
The test prints:
Result: http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3
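As mentioned above, plain NSString methods are enough for this case. The following is a minimal sketch of that idea, not the answerer's original code: FirstHrefValue is a hypothetical helper name, and unlike the regex version it returns nil when no href is found.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the value of the
// first double-quoted href attribute in html, or nil if there is none.
static NSString *FirstHrefValue(NSString *html) {
    // Locate the opening href=" (case-insensitively)
    NSRange start = [html rangeOfString:@"href=\"" options:NSCaseInsensitiveSearch];
    if (start.location == NSNotFound) {
        return nil;
    }
    // Search for the closing quote only in the part after href="
    NSUInteger valueStart = NSMaxRange(start);
    NSRange rest = NSMakeRange(valueStart, html.length - valueStart);
    NSRange end = [html rangeOfString:@"\"" options:0 range:rest];
    if (end.location == NSNotFound) {
        return nil;
    }
    return [html substringWithRange:NSMakeRange(valueStart, end.location - valueStart)];
}
```

Like the regex version, this only handles double-quoted attribute values; real-world HTML is better handled by a proper parser.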
UPDATE -- Multiple HREFs
For multiple hrefs use this function, which will return an array of NSStrings holding the urls:
- (NSArray *)anchorTagURLsFromString:(NSString *)dom {
    NSError *error = nil;
    NSMutableArray *urls = [NSMutableArray array];
    // First find all matching hrefs in the dom
    NSRegularExpression *firstRegexp = [NSRegularExpression regularExpressionWithPattern:@"href=\"" options:NSRegularExpressionCaseInsensitive error:&error];
    NSArray *matches = [firstRegexp matchesInString:dom options:NSMatchingReportProgress range:NSMakeRange(0, [dom length])];
    // Go through all matches and extract the URL
    for (NSTextCheckingResult *match in matches) {
        NSUInteger startIndex = match.range.location + match.range.length;
        // Find the first quote (") character after the href=
        NSRegularExpression *secondRegexp = [NSRegularExpression regularExpressionWithPattern:@"\"" options:NSRegularExpressionCaseInsensitive error:&error];
        NSTextCheckingResult *secondResult = [secondRegexp firstMatchInString:dom options:NSMatchingReportProgress range:NSMakeRange(startIndex, [dom length]-startIndex)];
        NSUInteger endIndex = secondResult.range.location;
        [urls addObject:[dom substringWithRange:NSMakeRange(startIndex, endIndex-startIndex)]];
    }
    return urls;
}
This is how I tested it:
NSString *dom2 = @"<div style=\"clear:both;\"></div><div style=\"float:left;\"><div style=\"float:left; height:27px; font-size:13px; padding-top:2px;\"><div style=\"float:left;\"><a href=\"http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3\">Download</a><a href=\"http://www.google.com/blabla\">Download</a></div>";
NSArray *urls = [self anchorTagURLsFromString:dom2];
for (NSString *url in urls) {
    NSLog(@"URL: %@", url);
}
This is the output of the test:
URL: http://www.hulkshare.com/ap-nxy2n2wn7ke8.mp3
URL: http://www.google.com/blabla

I would have a look at the NSRegularExpression class:
http://developer.apple.com/library/ios/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html

Related

Detect hashtags including & in hashtag

I can detect hashtags like this.
+ (NSArray *)getHashArrayWithInputString:(NSString *)inputStr
{
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+)" options:0 error:&error];
    NSArray *matches = [regex matchesInString:inputStr options:0 range:NSMakeRange(0, inputStr.length)];
    NSMutableArray *muArr = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        NSRange wordRange = [match rangeAtIndex:1];
        NSString* word = [inputStr substringWithRange:wordRange];
        NSCharacterSet* notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
        if ([word rangeOfCharacterFromSet:notDigits].location == NSNotFound)
        {
            // word consists only of the digits 0 through 9, so it is skipped
        }
        else
            [muArr addObject:[NSString stringWithFormat:@"#%@",word]];
    }
    return muArr;
}
The problem is that if inputStr is "#D&D", it detects only #D. What should I do?
To handle that, add the special characters you want to allow to your regular expression.
#(\\w+([&]*\\w*)*) //To allow #D&D&d...
#(\\w+([&-]*\\w*)*) //To allow both #D&D-D&...
You can add any other special characters you want in the same way.
So simply change your regex like this:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+([&]*\\w*)*)" options:0 error:&error];
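For comparison, here is a minimal sketch of a slightly stricter variant, under the assumption that an ampersand should only count when it joins two word runs. The pattern #\w+(?:&\w+)* and the helper name HashtagsAllowingAmpersand are illustrative choices, not part of the answer above, and the digits-only filter from the question is left out for brevity.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects hashtags, allowing '&' only between word runs.
static NSArray *HashtagsAllowingAmpersand(NSString *text) {
    NSError *error = nil;
    // Regex seen by the engine: #\w+(?:&\w+)*
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#\\w+(?:&\\w+)*"
                                                                           options:0
                                                                             error:&error];
    NSMutableArray *tags = [NSMutableArray array];
    if (regex == nil) {
        return tags; // pattern failed to compile; error describes why
    }
    NSArray *matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // The whole match already includes the leading '#'
        [tags addObject:[text substringWithRange:match.range]];
    }
    return tags;
}
```

With this variant a trailing ampersand is not swallowed, so "#D& more" yields #D, whereas the pattern above would yield #D&. Which behaviour is right depends on what you consider a valid hashtag.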
I was using this lib:
https://cocoapods.org/pods/twitter-text
There is TwitterText class with method
(NSArray *)hashtagsInText:(NSString *)text checkingURLOverlap:(BOOL)checkingURLOverlap
It could help.
I last used this pod a year ago, and it worked great then. You should check whether it still works today. Let me know :) Good luck

How to use a regular expression to match " in iOS

For example, the following is the source I want to match:
<div class="cont">
I use
<div\s+class\=\"cont\">
But it doesn't work. If I modify the expression like this:
<div\s+class\=.*?cont.*?>
it gives me the result I want.
So I think the problem must be the " character.
The following is the code I use in iOS; it works for some other regular expressions:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:reg options:NSRegularExpressionCaseInsensitive error:nil];
NSArray *matches = [regex matchesInString:resultStr options:0 range:NSMakeRange(0, [resultStr length])];
for (NSTextCheckingResult *match in matches) {
NSRange matchRange = [match range];
NSString *tagString = [resultStr substringWithRange:matchRange];
[resultArr addObject:tagString];
}
You are trying to match HTML with regular expressions. That is troublesome, since the HTML you receive can be all uppercase, single quotes may be used instead of double quotes or be missing, etc.
That said, if you really need a regex solution, I'd recommend accounting for any number of attributes before class=cont and allowing any attribute value delimiters:
NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
Here, I am using \b to match a word boundary, [^<]* checks for any other attributes before class, ["']? allows either a single or double quotation mark or nothing, then \b makes sure cont is followed by a non-word character, and [^<]* checks for any other attributes before final >.
Also, \" is escaped once as it is a C string delimiter and \\b is escaped twice to make sure we pass \b to the regex engine.
Sample code at CodingGround:
#import <Foundation/Foundation.h>
#import <Foundation/NSTextCheckingResult.h>
int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    NSError *error = nil;
    NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
    NSString *string = @"<div class=\"cont\">";
    NSRange range = NSMakeRange(0, string.length);
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    NSArray *matches = [regex matchesInString:string options:0 range:range];
    for (NSTextCheckingResult *match in matches) {
        NSRange matchRange = [match range];
        NSString *m = [string substringWithRange:matchRange];
        NSLog(@"Matched string: %@", m);
    }
    [pool drain];
    return 0;
}
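The same escaping rules can be applied to the pattern from the question itself. The sketch below assumes the original failure came from how the pattern was written as a string literal, which the question does not confirm. CountContDivs is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how often the question's pattern matches in html.
static NSUInteger CountContDivs(NSString *html) {
    // Regex seen by the engine: <div\s+class="cont">
    // In the literal, \s needs a doubled backslash, while the quote needs a
    // single backslash because only the compiler has to see it escaped.
    NSString *pattern = @"<div\\s+class=\"cont\">";
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    if (regex == nil) {
        return 0; // pattern failed to compile; error describes why
    }
    return [regex numberOfMatchesInString:html options:0 range:NSMakeRange(0, html.length)];
}
```

This strict pattern matches only the double-quoted form with class as the first attribute, which is why the more tolerant pattern above is usually the safer choice.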
Here goes the code:
BOOL stricterFilter = YES; // assumed flag, not defined in the original snippet; set to NO for the lax pattern
NSString *stricterFilterString = @"[A-Z0-9a-z\\._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}";
NSString *laxString = @".+@([A-Za-z0-9]+\\.)+[A-Za-z]{2}[A-Za-z]*";
NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So, when I use matchesInString it's not the matches that I'm interested in, but the other stuff.
Below is the code that I'm using. Iterating over the matches and then using indexes and lengths to pull out the content.
Question: I'm just wondering if I'm missing something in the api to get the other bits? Or, is the approach below generally ok?
- (NSArray*)separateText:(NSString*)text
{
    NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
    NSError* error = NULL;
    NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    if (matches.count == 0) {
        return @[text];
    }
    NSInteger itemStartIndex = 0;
    NSMutableArray* result = [NSMutableArray new];
    for (NSTextCheckingResult* match in matches) {
        NSRange matchRange = [match range];
        // Skip a delimiter at the very start of the text
        if (matchRange.location != 0) {
            NSInteger matchStartIndex = matchRange.location;
            NSInteger length = matchStartIndex - itemStartIndex;
            NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
            if (item.length != 0) {
                [result addObject:item];
            }
        }
        itemStartIndex = NSMaxRange(matchRange);
    }
    // Add whatever follows the last delimiter
    if (itemStartIndex != text.length) {
        NSInteger length = text.length - itemStartIndex;
        NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
        [result addObject:item];
    }
    return result;
}
You can capture the string before the and|or with parentheses, and add it to your array with rangeAtIndex.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [result rangeAtIndex:1];
    [phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can eliminate that.
I replaced the second \\s+ (whitespace) with a \\W+ (non-word characters), in case you encountered something like and|or followed by a comma or something else. You could alternatively look explicitly for ,?\\s+ if the comma was the only non-word character you cared about. It just depends upon the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
You could replace all matches of the regex with a template string (e.g. ", " or "," etc) and then separate the string components based on that new delimiter.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
if (error) {
    // handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
                                                              options:0
                                                                range:NSMakeRange(0, stringToBeMatched.length)
                                                         withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];
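One caveat with a "," template is that commas already present in the text would also split it. A minimal sketch of the same replace-then-split idea follows, assuming the ASCII unit separator (0x1F) never occurs in your input. SplitOnAndOr is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text on " and " / " or " (case-insensitive)
// by first replacing each delimiter with a character assumed to be unused.
static NSArray *SplitOnAndOr(NSString *text) {
    NSString *separator = @"\x1F"; // ASCII unit separator, assumed absent from text
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\s+(and|or)\\s+"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    if (regex == nil) {
        return @[text]; // pattern failed to compile; return the text unsplit
    }
    NSString *marked = [regex stringByReplacingMatchesInString:text
                                                       options:0
                                                         range:NSMakeRange(0, text.length)
                                                  withTemplate:separator];
    return [marked componentsSeparatedByString:separator];
}
```

For example, "salt, pepper and sugar or honey" would be split into "salt, pepper", "sugar" and "honey", with the existing comma left alone.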

iOS: extract substring of NSString in objective C

I have an NSString as:
"<a href='javascript:void(null)' onclick='handleCommandForAnchor(this, 10);return false;'>12321<\/a>"
I need to extract the 12321 near the end of the NSString from it and store.
First I tried
NSString *shipNumHtml=[mValues objectAtIndex:1];
NSInteger htmlLen=[shipNumHtml length];
NSString *shipNum=[[shipNumHtml substringFromIndex:htmlLen-12]substringToIndex:8];
But then I found out that number 12321 can be of variable length.
I can't find a method like Java's indexOf() to find the '>' and '<' and then take the substring between those indices. All the answers I've found on SO either know which substring to search for or know the location of the substring. Any help?
I don't usually advocate using regular expressions for parsing HTML content, but it seems a regex matching >(\d+)< would do the job for this simple string.
Here is a simple example:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@">(\\d+)<"
                                                                       options:0
                                                                         error:&error];
// Handle error != nil
NSTextCheckingResult *match = [regex firstMatchInString:string
                                                options:0
                                                  range:NSMakeRange(0, [string length])];
if (match) {
    // Capture group 1 holds just the digits between > and <
    NSRange matchRange = [match rangeAtIndex:1];
    NSString *number = [string substringWithRange:matchRange];
    NSLog(@"Number: %@", number);
}
As @HaneTV says, you can use the NSString method rangeOfString: to search for substrings. However, the characters ">" and "<" appear in multiple places in your string, so you might want to take a look at NSRegularExpression and/or NSScanner.
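To illustrate the NSScanner route, here is a minimal sketch. It assumes the string holds a single element and that no ">" appears inside an attribute value. TextInsideFirstTag is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first '>' and the next '<',
// or nil if there is no '>' or nothing between the two characters.
static NSString *TextInsideFirstTag(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // do not silently skip whitespace
    NSString *content = nil;
    // Move to the end of the opening tag, then step over the '>'
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) {
        return nil;
    }
    // Everything up to the next '<' is the element's text
    if (![scanner scanUpToString:@"<" intoString:&content]) {
        return nil;
    }
    return content;
}
```

Because it does not care what the text looks like, this also works when the value is not purely numeric or varies in length.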
This may help you a bit; I've just tested it:
NSString *_string = @"<a href='javascript:void(null)' onclick='handleCommandForAnchor(this, 10);return false;'>12321</a>";
NSError *_error;
NSRegularExpression *_regExp = [NSRegularExpression regularExpressionWithPattern:@">(.*)<" options:NSRegularExpressionCaseInsensitive error:&_error];
NSArray *_matchesInString = [_regExp matchesInString:_string options:NSMatchingReportCompletion range:NSMakeRange(0, _string.length)];
[_matchesInString enumerateObjectsUsingBlock:^(NSTextCheckingResult * result, NSUInteger idx, BOOL *stop) {
    // Range 0 is the whole match (including > and <); range 1 is the captured text
    for (int i = 0; i < result.numberOfRanges; i++) {
        NSString *_match = [_string substringWithRange:[result rangeAtIndex:i]];
        NSLog(@"%@", _match);
    }
}];

how to replace many occurrences of comma by single comma

Earlier I had a string like 1,2,3,,5,6,7
To clean it up, I used stringByReplacingOccurrencesOfString:@",," withString:@",", which gives 1,2,3,5,6,7
Now I have a string like the one below.
1,2,3,,,6,7
Using stringByReplacingOccurrencesOfString:@",," withString:@"," on it gives 1,2,3,,6,7
Is there a way to replace every run of commas with a single comma?
I know I can do it with a for loop or a while loop, but I want to check whether there is another way.
for (int j = 1; j <= 100; j++) {
    string = [string stringByReplacingOccurrencesOfString:@",," withString:@","];
}
NSString *string = @"1,2,3,,,6,7";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@",{2,}" options:NSRegularExpressionCaseInsensitive error:&error];
NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@","];
NSLog(@"%@", modifiedString);
This matches any run of two or more consecutive commas and replaces it with a single one. It's future proof :)
Not the perfect solution, but what about this
NSString *string = @"1,2,3,,,6,7";
NSMutableArray *array = [[string componentsSeparatedByString:@","] mutableCopy];
[array removeObject:@""];
NSLog(@"%@", [array componentsJoinedByString:@","]);
