Understanding how to use NSRegularExpression correctly - iOS

I am trying to write a function that takes an NSString and parses it, returning an array of tags.
A tag is defined as any text in the string that starts with # and contains only alphanumeric characters after the #.
Is this pattern correct?
#.*?[A-Za-z0-9]
I want to use matchesInString:options:range: but need some help.
My function is:
- (void)getTags
{
    NSString *str = @"This is my string and a couple of #tags for #you.";
    // Range is 0 to 48 (full length of string)
    // NSArray should contain #tags and #you only.
}
Thanks!

The pattern "#.*?[A-Za-z0-9]" matches a # followed by as few characters as possible (.*? is a lazy quantifier) and then exactly one alphanumeric character, so for "#tags" it matches only "#t". What you probably want is a # followed by one or more alphanumeric characters:
NSString *pattern = @"#[A-Za-z0-9]+";
Then you can create a regular expression using that pattern:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:nil];
and enumerate all matches in the string:
NSString *string = @"abc #tag1 def #tag2.";
NSMutableArray *tags = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, string.length)
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [result range];
    NSString *tag = [string substringWithRange:range];
    [tags addObject:tag];
}];
NSLog(@"%@", tags);
Output:
(
"#tag1",
"#tag2"
)
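Since the question specifically asked about matchesInString:options:range:, here is a minimal sketch that uses it instead of the block-based enumeration and wraps the answer's pattern in a function that returns the array. The function name ExtractTags is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits only, exactly as in the pattern above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every tag in `text`, where a tag is a '#'
// followed by one or more ASCII letters or digits.
// matchesInString:options:range: returns an NSArray of NSTextCheckingResult
// objects, in the order the matches occur in the string.
static NSArray *ExtractTags(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"#[A-Za-z0-9]+"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSArray *matches = [regex matchesInString:text
                                      options:0
                                        range:NSMakeRange(0, text.length)];
    NSMutableArray *tags = [NSMutableArray arrayWithCapacity:matches.count];
    for (NSTextCheckingResult *match in matches) {
        // match.range covers the whole match, including the leading '#'.
        [tags addObject:[text substringWithRange:match.range]];
    }
    return tags;
}

int main(void) {
    @autoreleasepool {
        NSString *str = @"This is my string and a couple of #tags for #you.";
        NSLog(@"%@", ExtractTags(str));
    }
    return 0;
}
```

The trailing "." after "#you" is not in [A-Za-z0-9], so the match stops there. If underscores should also count as part of a tag, \\w would be the usual alternative, but that is a change to the question's definition.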

Related

Regex for finding occurrences of all strings present in the original searched term

I'm searching for the string "longer" in the string "This is a long sentence. But can be longer."
I am trying to get the ranges of all the words in the sentence that are present in the original search term. In the above scenario, these would be the ranges of "long" and "longer". Is this possible with a regex?
The code that I'm using:
NSMutableArray *arrayOfAllRanges = [[NSMutableArray alloc] init];
NSString *completeString = @"This is a long sentence. But can be longer.";
NSString *searchedTerm = @"longer";
NSRange range = NSMakeRange(0, completeString.length);
NSString *pattern = [NSString stringWithFormat:@"(%@)", searchedTerm];
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:nil];
[expression enumerateMatchesInString:completeString options:0 range:range usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop)
{
    NSRange foundRange = [result rangeAtIndex:0];
    [arrayOfAllRanges addObject:[NSValue valueWithRange:foundRange]];
}];
NSLog(@"Array of all ranges %@", arrayOfAllRanges);
The above code returns only the occurrences of "longer" (the regex is "(longer)"), but I'm looking for a regex that also finds the word "long".
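One possible approach, sketched here under assumptions rather than as a single-regex answer: use a regex only to find each word (assumed to mean a run of \\w characters), then keep a word if rangeOfString:options: finds it inside the search term, compared case-insensitively to match the NSRegularExpressionCaseInsensitive option in the question. The function name RangesOfContainedWords is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns NSValue-wrapped ranges of every word in `text`
// that occurs somewhere inside `searchTerm` (case-insensitive).
// Assumption: a "word" is a maximal run of \w characters.
static NSArray *RangesOfContainedWords(NSString *text, NSString *searchTerm) {
    NSRegularExpression *wordRegex =
        [NSRegularExpression regularExpressionWithPattern:@"\\w+"
                                                  options:0
                                                    error:NULL];
    NSMutableArray *ranges = [NSMutableArray array];
    [wordRegex enumerateMatchesInString:text
                                options:0
                                  range:NSMakeRange(0, text.length)
                             usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSString *word = [text substringWithRange:result.range];
        // Keep the word only if it appears inside the search term.
        if ([searchTerm rangeOfString:word options:NSCaseInsensitiveSearch].location != NSNotFound) {
            [ranges addObject:[NSValue valueWithRange:result.range]];
        }
    }];
    return ranges;
}

int main(void) {
    @autoreleasepool {
        NSString *completeString = @"This is a long sentence. But can be longer.";
        NSLog(@"%@", RangesOfContainedWords(completeString, @"longer"));
    }
    return 0;
}
```

Note that this treats any substring of the search term as a hit, so a one-letter word such as "e" would also match "longer". If only prefixes of the term should count, hasPrefix: (or a case-insensitive prefix comparison) would be the stricter test.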

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So, when I use matchesInString:, it's not the matches I'm interested in but the text between them.
Below is the code I'm using: it iterates over the matches and uses their locations and lengths to pull out the content between them.
Question: am I missing something in the API that returns the other bits? Or is the approach below generally OK?
- (NSArray*)separateText:(NSString*)text
{
    NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
    NSError* error = NULL;
    NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    if (matches.count == 0) {
        return @[text];
    }
    NSInteger itemStartIndex = 0;
    NSMutableArray* result = [NSMutableArray new];
    for (NSTextCheckingResult* match in matches) {
        NSRange matchRange = [match range];
        // Skip a delimiter at the very start: there is no item before it.
        if (matchRange.location != 0) {
            NSInteger matchStartIndex = matchRange.location;
            NSInteger length = matchStartIndex - itemStartIndex;
            NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
            if (item.length != 0) {
                [result addObject:item];
            }
        }
        itemStartIndex = NSMaxRange(matchRange);
    }
    if (itemStartIndex != text.length) {
        NSInteger length = text.length - itemStartIndex;
        NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
        [result addObject:item];
    }
    return result;
}
You can capture the string before the and|or with parentheses, and add it to your array with rangeAtIndex.
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [result rangeAtIndex:1];
    [phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can remove it.
I replaced the second \\s+ (whitespace) with a \\W+ (non-word characters), in case you encountered something like and|or followed by a comma or something else. You could alternatively look explicitly for ,?\\s+ if the comma was the only non-word character you cared about. It just depends upon the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
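To see the capture-group approach end to end, here is a minimal runnable sketch that wraps the answer's pattern in a function. The function name SplitOnAndOr and the sample input are hypothetical; the regex and the use of rangeAtIndex:1 are taken from the answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `string` on "and"/"or" delimiters using the
// answer's pattern. Group 1 captures the text before each delimiter, and
// the \s*$ alternative captures the final phrase.
static NSArray *SplitOnAndOr(NSString *string) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray *phrases = [NSMutableArray array];
    [regex enumerateMatchesInString:string
                            options:0
                              range:NSMakeRange(0, string.length)
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        // Only the captured phrase is kept; the delimiter in group 2 is dropped.
        NSRange range = [result rangeAtIndex:1];
        [phrases addObject:[string substringWithRange:range]];
    }];
    return phrases;
}

int main(void) {
    @autoreleasepool {
        // Hypothetical sample input; "OR" still splits because of the
        // case-insensitive option.
        NSLog(@"%@", SplitOnAndOr(@"apples and oranges OR pears"));
    }
    return 0;
}
```

Because \\s+ is required before "and"/"or", words that merely contain those letters (such as "sandwich") are not treated as delimiters.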
You could replace all matches of the regex with a template string (e.g. ", " or ",") and then separate the string components based on that new delimiter. This only works if the delimiter you choose never appears in the items themselves.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
if (error) {
    // handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
                                                              options:0
                                                                range:NSMakeRange(0, stringToBeMatched.length)
                                                         withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];

iOS: extract substring of NSString in objective C

I have the following NSString:
"<a href='javascript:void(null)' onclick='handleCommandForAnchor(this, 10);return false;'>12321<\/a>"
I need to extract the 12321 near the end of the NSString and store it.
First I tried:
NSString *shipNumHtml=[mValues objectAtIndex:1];
NSInteger htmlLen=[shipNumHtml length];
NSString *shipNum=[[shipNumHtml substringFromIndex:htmlLen-12]substringToIndex:8];
But then I found out that number 12321 can be of variable length.
I can't find a method like Java's indexOf() to find the '>' and '<' and then take the substring between those indices. All the answers I've found on SO either know what substring to search for or know the location of the substring. Any help?
I don't usually advocate using regular expressions for parsing HTML content, but it seems a regex matching >(\d+)< would do the job in this simple string.
Here is a simple example:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@">(\\d+)<"
                                                                       options:0
                                                                         error:&error];
// Handle error != nil
NSTextCheckingResult *match = [regex firstMatchInString:string
                                                options:0
                                                  range:NSMakeRange(0, [string length])];
if (match) {
    NSRange matchRange = [match rangeAtIndex:1];
    NSString *number = [string substringWithRange:matchRange];
    NSLog(@"Number: %@", number);
}
As @HaneTV says, you can use the NSString method rangeOfString: to search for substrings. Given that the characters ">" and "<" appear in multiple places in your string, you might want to take a look at NSRegularExpression and/or NSScanner.
This may help you a bit; I've just tested it:
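For the "indexOf()" part of the question: rangeOfString:options: accepts NSBackwardsSearch, which behaves like Java's lastIndexOf(). A minimal sketch, assuming the wanted number is always the text of the final element (as in the question's anchor tag): find the last '<', then the last '>' before it, and take what lies between. The function name TextBeforeClosingTag is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the last '>' and the '<'
// that follows it, or nil if either character is missing.
// Assumes the value of interest is the content of the final element.
static NSString *TextBeforeClosingTag(NSString *html) {
    // NSBackwardsSearch finds the last '<' (here, the one in "</a>").
    NSRange closeRange = [html rangeOfString:@"<" options:NSBackwardsSearch];
    if (closeRange.location == NSNotFound) {
        return nil;
    }
    // Search backwards again, but only in the text before that '<'.
    NSRange openRange = [html rangeOfString:@">"
                                    options:NSBackwardsSearch
                                      range:NSMakeRange(0, closeRange.location)];
    if (openRange.location == NSNotFound) {
        return nil;
    }
    NSUInteger start = NSMaxRange(openRange);
    return [html substringWithRange:NSMakeRange(start, closeRange.location - start)];
}

int main(void) {
    @autoreleasepool {
        NSString *html = @"<a href='javascript:void(null)' onclick='handleCommandForAnchor(this, 10);return false;'>12321</a>";
        NSLog(@"Number: %@", TextBeforeClosingTag(html));
    }
    return 0;
}
```

Unlike the fixed-offset substringFromIndex: attempt, this works for numbers of any length, and integerValue can turn the result into a number if needed.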
NSString *_string = @"<a href='javascript:void(null)' onclick='handleCommandForAnchor(this, 10);return false;'>12321</a>";
NSError *_error;
NSRegularExpression *_regExp = [NSRegularExpression regularExpressionWithPattern:@">(.*)<" options:NSRegularExpressionCaseInsensitive error:&_error];
NSArray *_matchesInString = [_regExp matchesInString:_string options:NSMatchingReportCompletion range:NSMakeRange(0, _string.length)];
[_matchesInString enumerateObjectsUsingBlock:^(NSTextCheckingResult * result, NSUInteger idx, BOOL *stop) {
    for (int i = 0; i < result.numberOfRanges; i++) {
        NSString *_match = [_string substringWithRange:[result rangeAtIndex:i]];
        NSLog(@"%@", _match);
    }
}];

How to replace multiple consecutive commas with a single comma

Earlier I had the string 1,2,3,,5,6,7
To collapse the double comma, I used stringByReplacingOccurrencesOfString:@",," withString:@",", which gives the output 1,2,3,5,6,7
Now I have the following string:
1,2,3,,,6,7
Using the same stringByReplacingOccurrencesOfString:@",," withString:@",", the output is 1,2,3,,6,7
Is there a way to replace every run of commas with a single comma?
I know I can do it using a for loop or a while loop, but is there another way?
for (int j = 1; j <= 100; j++) {
    string = [string stringByReplacingOccurrencesOfString:@",," withString:@","];
}
NSString *string = @"1,2,3,,,6,7";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@",{2,}" options:NSRegularExpressionCaseInsensitive error:&error];
NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@","];
NSLog(@"%@", modifiedString);
This matches any run of two or more commas in the string, however long. It's future proof :)
Not the perfect solution, but what about this:
NSString *string = @"1,2,3,,,6,7";
NSMutableArray *array = [[string componentsSeparatedByString:@","] mutableCopy];
[array removeObject:@""];
NSLog(@"%@", [array componentsJoinedByString:@","]);
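For comparison, the loop the question mentions does not need a fixed 100 iterations: it can stop as soon as no ",," remains. A minimal sketch putting that loop next to the regex answer's single-pass replacement; both function names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the loop approach from the question, but it stops as
// soon as the string no longer contains ",," instead of looping 100 times.
static NSString *CollapseCommasByLoop(NSString *string) {
    while ([string rangeOfString:@",,"].location != NSNotFound) {
        string = [string stringByReplacingOccurrencesOfString:@",," withString:@","];
    }
    return string;
}

// Hypothetical helper: the same result in one pass, using the ",{2,}"
// pattern from the regex answer.
static NSString *CollapseCommasByRegex(NSString *string) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@",{2,}" options:0 error:NULL];
    return [regex stringByReplacingMatchesInString:string
                                           options:0
                                             range:NSMakeRange(0, string.length)
                                      withTemplate:@","];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", CollapseCommasByLoop(@"1,2,3,,,6,7"));
        NSLog(@"%@", CollapseCommasByRegex(@"1,2,3,,,6,7"));
    }
    return 0;
}
```

Each pass of the loop roughly halves a run of commas, so it terminates quickly; the regex version simply avoids rescanning the string.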

How to get all strings inside [...] in one NSString?

Say I'm given an NSString:
@"[myLabel]-10-[youImageView]"
I need an array of:
@[@"myLabel", @"youImageView"]
How do I do it?
I thought about going through the string, checking each '[' and ']' and getting the string inside them, but is there a better way?
Thanks
You can use regular expressions:
NSString *string = @"[myLabel]-10-[youImageView]";
// Regular expression to find "word characters" enclosed by [...]:
NSString *pattern = @"\\[(\\w+)\\]";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0
                                                                         error:NULL];
NSMutableArray *list = [NSMutableArray array];
[regex enumerateMatchesInString:string
                        options:0
                          range:NSMakeRange(0, [string length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // range = location of the regex capture group "(\\w+)" in the string:
    NSRange range = [result rangeAtIndex:1];
    [list addObject:[string substringWithRange:range]];
}];
NSLog(@"%@", list);
Output:
(
myLabel,
youImageView
)
Would this work for you?
NSCharacterSet *aSet = [NSCharacterSet characterSetWithCharactersInString:@"]-10["];
NSArray *anArray = [aString componentsSeparatedByCharactersInSet:aSet];
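One caveat with the character-set split: componentsSeparatedByCharactersInSet: produces an empty string between every pair of adjacent separators, so on "[myLabel]-10-[youImageView]" the result contains several "" entries alongside the two names. A minimal sketch that filters them out; the function name NamesBySplitting is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on the characters in "]-10[" (as in the
// answer above) and drops the empty components produced between adjacent
// separators such as "]-" or "10".
static NSArray *NamesBySplitting(NSString *aString) {
    NSCharacterSet *aSet = [NSCharacterSet characterSetWithCharactersInString:@"]-10["];
    NSArray *parts = [aString componentsSeparatedByCharactersInSet:aSet];
    NSMutableArray *names = [NSMutableArray array];
    for (NSString *part in parts) {
        if (part.length > 0) {
            [names addObject:part];
        }
    }
    return names;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", NamesBySplitting(@"[myLabel]-10-[youImageView]"));
    }
    return 0;
}
```

This still splits any name that contains '-', '1' or '0', and a different spacing value such as "-20-" would leave a stray "2" in the result; the regex answer with \\[(\\w+)\\] does not have either problem.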
