How to split an NSString into multiple strings after a certain number of characters (iOS)

I am developing an iOS app using Xcode 4.6.2.
My app receives, for example, 1000 characters from the server, which are then stored in an NSString.
What I want to do is split the 1000 characters into multiple strings. Each string must be at most 100 characters long.
The next question is how to find where the last whole word ends before the 100-character limit, so that I don't split in the middle of a word.

A regex-based solution:
NSString *string = // ... your 1000-character input
NSString *pattern = @"(?ws).{1,100}\\b";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
NSMutableArray *result = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
    [result addObject:[string substringWithRange:match.range]];
}
The code for the regex and the matches part is taken directly from the docs, so the only difference is the pattern.
The pattern basically matches anything from 1 to 100 characters up to a word boundary. Being a greedy pattern, it will give the longest string possible while still ending with a whole word. This ensures that it won't split any words in the middle.
The (?ws) makes the word recognition work with Unicode's definition of word breaks (the w flag) and treat a line end as any other character (the s flag).
Notice that the algorithm doesn't handle "words" with more than 100 characters well - it will give you the last 100 characters and drop the first part, but that should be a corner case.

(assuming your words are separated by a single space, otherwise use rangeOfCharacterFromSet:options:range:)
Use NSString's - (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)aRange with:
aString as @" "
mask as NSBackwardsSearch
Then you need a loop, where you check that you haven't already got to the end of the string, then create a range (for use as aRange) so that you start 100 characters along the string and search backwards looking for the space. Once you find the space, the returned range will allow you to get the string with substringWithRange:.
(written freehand)
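NSUInteger start = 0;
NSUInteger total = sourceString.length;
NSMutableArray *lines = [NSMutableArray array];
while (start < total) {
    NSUInteger length = MIN(100, total - start);
    if (start + length < total) {
        // Not the last chunk: search backwards for the last space inside the 100-character window
        NSRange hitRange = [sourceString rangeOfString:@" " options:NSBackwardsSearch range:NSMakeRange(start, length)];
        if (hitRange.location != NSNotFound && hitRange.location > start) {
            length = hitRange.location - start; // end the chunk just before that space
        }
        // If no space was found, the chunk is split at 100 characters
    }
    [lines addObject:[sourceString substringWithRange:NSMakeRange(start, length)]];
    start += length;
    // Skip the space that separated this chunk from the next one
    if (start < total && [sourceString characterAtIndex:start] == ' ') { start++; }
}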

This can help (note that it splits at fixed offsets, so it can cut a word in half):
- (NSArray *)chunksForString:(NSString *)str {
    NSMutableArray *chunks = [[NSMutableArray alloc] init];
    NSUInteger sizeChunk = 100; // or whatever you want
    // Advance by a full chunk each time so that chunks neither overlap nor skip characters
    for (NSUInteger offset = 0; offset < [str length]; offset += sizeChunk) {
        // The last chunk may be shorter than sizeChunk
        NSUInteger chunkLength = MIN(sizeChunk, [str length] - offset);
        [chunks addObject:[str substringWithRange:NSMakeRange(offset, chunkLength)]];
    }
    return chunks;
}

Use NSArray *words = [stringFromServer componentsSeparatedByString:@" "];
This will give you the words.
If you really need each piece to be as close to 100 characters as possible, append the words one at a time while tracking the total length of the appended string, and check that it does not exceed 100.
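The appending approach described above could be sketched as follows. This is only an illustration: the function name ChunksByWords and its maxLength parameter are hypothetical, words are assumed to be separated by spaces, and a single word longer than the limit is kept whole in its own chunk rather than split.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: greedily packs space-separated words into chunks of
// at most maxLength characters. A word longer than maxLength is kept whole
// in its own chunk (an assumption; you may prefer to hard-split it).
static NSArray *ChunksByWords(NSString *text, NSUInteger maxLength) {
    NSMutableArray *chunks = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    for (NSString *word in [text componentsSeparatedByString:@" "]) {
        if (word.length == 0) { continue; } // skip empty pieces from repeated spaces
        // The +1 accounts for the space that joins the word to the current chunk
        if (current.length > 0 && current.length + 1 + word.length > maxLength) {
            [chunks addObject:[current copy]];
            [current setString:@""];
        }
        if (current.length > 0) { [current appendString:@" "]; }
        [current appendString:word];
    }
    if (current.length > 0) { [chunks addObject:[current copy]]; }
    return chunks;
}
```

For example, ChunksByWords(@"aaa bbb cc dddd", 7) yields the two chunks "aaa bbb" and "cc dddd"; for the question above you would pass 100 as the limit.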

Related

Split an NSString at the first capital letter

I have a string like this @"abcdefghijklmnopqrstuvwxyzA". As you can see, A is at the end. How can I find the first capital letter and split the strings:
NSString *lower = @"abcdefghijklmnopqrstuvwxyz";
NSString *upper = @"A";
The string in the beginning is static so the capital letter could be ANYTHING. Will this scanner help?
NSString *String = titleLabelLatestNews.text;
NSScanner *stringScanner = [NSScanner scannerWithString:String];
NSString *content = [[NSString alloc] init];
while ([stringScanner isAtEnd] == NO) {
    [stringScanner scanUpToString:@"url=\"" intoString:nil];
    [stringScanner scanUpToString:@"/>" intoString:&content];
}
For another example, @"this is all lower case letters I am awesome"; should become two strings, @"this is all lower case letters"; and @"I am awesome";
Get the idea? Anything before the Capital Letter goes to a string and anything after goes to another string.
An NSScanner will do the trick for you, yes. You just need to create an NSCharacterSet consisting of the capital letters, then use scanUpToCharactersFromSet:intoString:
NSString * s = @"this is all lower case letters I am awesome";
NSScanner * scanner = [NSScanner scannerWithString:s];
NSString * firstPart = nil; // stays nil if the string starts with a capital letter
[scanner scanUpToCharactersFromSet:[NSCharacterSet uppercaseLetterCharacterSet]
                        intoString:&firstPart];
NSString * secondPart = [s substringFromIndex:[scanner scanLocation]];
If you insist on using NSScanner, use scanCharactersFromSet:intoString: where the NSCharacterSet is lowercase characters only.
What I would personally do, if anyone cares, is call rangeOfCharacterFromSet(NSCharacterSet.uppercaseLetterCharacterSet()...) and derive the resulting substrings from there.
A better solution is to use NSString's rangeOfCharacterFromSet
NSString *lowerCaseString=@"";
NSString *upperCaseString=@"";
NSString *stringToSplit = titleLabelLatestNews.text;
NSRange capitalRange=[stringToSplit rangeOfCharacterFromSet:[NSCharacterSet uppercaseLetterCharacterSet]];
if (capitalRange.location == NSNotFound) {
    lowerCaseString=stringToSplit;
}
else if (capitalRange.location == 0) {
    upperCaseString=stringToSplit;
}
else {
    // substringToIndex: excludes the character at the given index, so no -1 is needed
    lowerCaseString=[stringToSplit substringToIndex:capitalRange.location];
    upperCaseString=[stringToSplit substringFromIndex:capitalRange.location];
}
NSLog(@"lower case string=%@ uppercase=%@",lowerCaseString,upperCaseString);
For completeness, the regular expression solution:
Use NSRegularExpression
The pattern @"([^A-Z]*)([A-Z].*)" will match what you want if you are only interested in A-Z as uppercase characters (see below for the Unicode variant). Broken down, this is two groups, (...), one for the text before and one for the text after: the first group is anything which is not uppercase, [^A-Z], zero or more times, *; the second group is an uppercase letter, [A-Z], followed by anything, .*.
Use firstMatchInString:options:range:; the NSTextCheckingResult will contain the ranges of the two matched groups.
If you wish to allow for Unicode's myriad of uppercase and titlecase letters just change A-Z above to \\p{Lu}\\p{Lt} (make sure you type the double-backslashes, you are passing a backslash to NSRegularExpression). Those two are all the Unicode uppercase letters, \\p{Lu}, and all the title case letters, \\p{Lt}.
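The steps above might look like the following sketch, which uses the Unicode variant of the pattern. The function name SplitAtFirstCapital is hypothetical, and returning nil when the string contains no capital letter is just one possible choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[before, after], where `after` starts at the
// first uppercase or titlecase letter. Returns nil if no such letter exists.
static NSArray *SplitAtFirstCapital(NSString *input) {
    NSError *error = nil;
    NSString *pattern = @"([^\\p{Lu}\\p{Lt}]*)([\\p{Lu}\\p{Lt}].*)";
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) { return nil; } // the pattern failed to compile; inspect `error`
    NSTextCheckingResult *match = [regex firstMatchInString:input
                                                    options:0
                                                      range:NSMakeRange(0, input.length)];
    if (match == nil) { return nil; } // no capital letter in the input
    // Group 1 is the text before the capital letter, group 2 the rest
    NSString *before = [input substringWithRange:[match rangeAtIndex:1]];
    NSString *after = [input substringWithRange:[match rangeAtIndex:2]];
    return @[before, after];
}
```

Note that the first part keeps any trailing space (for the example sentence it ends in "letters "); trim it with stringByTrimmingCharactersInSet: if that is unwanted.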
HTH
Throwing one more solution into the mix utilizing componentsSeparatedByCharactersInSet: to split the string into multiple arrays (i.e. more than 2 if needed):
// Separate the "sentence" into components separated
// by the characters in the uppercase character set
NSMutableArray *sentenceArray = [[sentence componentsSeparatedByCharactersInSet:[NSCharacterSet uppercaseLetterCharacterSet]] mutableCopy];
// Get the first sentence "segment", i.e. the sentenceArray's
// first object
NSString *segment = [sentenceArray objectAtIndex:0];
// Keep track of the character count with a variable
int characterCount = (int)segment.length;
// Then starting from sentenceArray's index 1, go through
// the rest of sentenceArray's indices
for (int i = 1 ; i < sentenceArray.count ; i ++) {
// Append that "separator" character to the segment at the
// current index by accessing the character before the current segment
segment = [[NSString stringWithFormat:@"%C", [sentence characterAtIndex:characterCount]] stringByAppendingString:[sentenceArray objectAtIndex:i]];
// Replace the object at the current index with this new segment
// string
[sentenceArray replaceObjectAtIndex:i withObject:segment];
// Increment the character count
characterCount += segment.length;
}
NSLog(@"%@", sentenceArray);
// Find index of first capital letter
NSInteger index = ^NSInteger{
for (NSInteger i = 0; i < string.length; ++i) {
unichar c = [string characterAtIndex:i];
if ('A' <= c && c <= 'Z') { return i; }
}
return string.length; // No capital letter, take the entire string
}();
NSLog(@"lower = %@", [string substringToIndex:index]);
NSLog(@"upper = %@", [string substringFromIndex:index]);

How would I use NSRegularExpression so that a section that has been detected and replaced is not processed again?

I want to parse some Markdown, and I have an issue with emphasis, where text wrapped in underscores is to be emphasized (such as this is some _emphasized_ text).
However links also have underscores in them, such as http://example.com/text_with_underscores/, and currently my regular expression would pick up _with_ as an attempt at emphasized text.
Obviously I don't want that. Since emphasis in the middle of a word is valid (such as longword*with*emphasis), my go-to solution is to parse links first and somehow "mark" those replacements so that they are not touched again. Is this possible?
One solution you can implement like this:-
NSString *yourStr=@"this is some _emphasized_ text";
NSMutableString *mutStr=[NSMutableString string];
NSUInteger count=0;
for (NSUInteger i=0; i<yourStr.length; i++)
{
    unichar c =[yourStr characterAtIndex:i];
    if ((c=='_') && (count==0))
    {
        [mutStr appendString:[NSString stringWithFormat:@"%@",@"<em>"]];
        count++;
    }
    else if ((c=='_') && (count>0))
    {
        [mutStr appendString:[NSString stringWithFormat:@"%@",@"</em>"]];
        count=0;
    }
    else
    {
        [mutStr appendString:[NSString stringWithFormat:@"%C",c]];
    }
}
NSLog(@"%@",mutStr);
Output:-
this is some <em>emphasized</em> text
__block NSString *yourString = @"media_w940996738_ _help_ 476.mp3";
NSError *error = NULL;
__block NSString *yourNewString;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"([_])\\w+([_])" options:NSRegularExpressionCaseInsensitive error:&error];
yourNewString=[NSString stringWithString:yourString];
[regex enumerateMatchesInString:yourString options:0 range:NSMakeRange(0, [yourString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop){
    // detect
    NSString *subString = [yourString substringWithRange:[match rangeAtIndex:0]];
    NSRange range=[match rangeAtIndex:0];
    range.location+=1;
    range.length-=2;
    //print
    NSString *string=[NSString stringWithFormat:@"<em>%@</em>",[yourString substringWithRange:range] ];
    yourNewString = [yourNewString stringByReplacingOccurrencesOfString:subString withString:string];
}];
First a more usual way to do processing like this would be to tokenise the input; this both makes handling each kind of token easier and is probably more efficient for large inputs. That said, here is how to solve your problem using regular expressions.
Consider:
matchesInString:options:range returns all the non-overlapping matches for a regular expression.
Regular expressions are built from smaller regular expressions and can contain alternatives. So if you have REemphasis which matches strings to emphasise and REurl which matches URLs, then (REemphasis)|(REurl) matches both.
NSTextCheckingResult, instances of which are returned by matchesInString:options:range, reports the range of each group in the match, and if a group does not occur in the result due to alternatives in the pattern then the group's NSRange.location is set to NSNotFound. So for the above pattern, (REemphasis)|(REurl), if group 1 is NSNotFound the match is for the REurl alternative otherwise it is for REemphasis alternative.
The method replacementStringForResult:inString:offset:template will return the replacement string for a match based on the template (aka the replacement pattern).
The above is enough to write an algorithm to do what you want. Here is some sample code:
- (NSString *) convert:(NSString *)input
{
    NSString *emphPat = @"(_([^_]+)_)"; // note this pattern does NOT allow for markdown's \_ escapes - that needs to be addressed
    NSString *emphRepl = @"<em>$2</em>";
    // a pattern for urls - use whatever suits
    // this one is taken from http://stackoverflow.com/questions/6137865/iphone-reg-exp-for-url-validity
    NSString *urlPat = @"([hH][tT][tT][pP][sS]?:\\/\\/[^ ,'\">\\]\\)]*[^\\. ,'\">\\]\\)])";
    // construct a pattern which matches emphPat OR urlPat
    // emphPat is first so its two groups are numbered 1 & 2 in the resulting match
    NSString *comboPat = [NSString stringWithFormat:@"%@|%@", emphPat, urlPat];
    // build the re
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:comboPat options:0 error:&error];
    // check for error - omitted
    // get all the matches - includes both urls and text to be emphasised
    NSArray *matches = [re matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    NSInteger offset = 0; // will track the change in size
    NSMutableString *output = input.mutableCopy; // mutable copy of input to modify to produce output
    for (NSTextCheckingResult *aMatch in matches)
    {
        NSRange first = [aMatch rangeAtIndex:1];
        if (first.location != NSNotFound)
        {
            // the first group has been matched => that is the emphPat (which contains the first two groups)
            // determine the replacement string
            NSString *replacement = [re replacementStringForResult:aMatch inString:output offset:offset template:emphRepl];
            NSRange whole = aMatch.range; // original range of the match
            whole.location += offset; // add in the offset to allow for previous replacements
            offset += replacement.length - whole.length; // modify the offset to allow for the length change caused by this replacement
            // perform the replacement
            [output replaceCharactersInRange:whole withString:replacement];
        }
    }
    return output;
}
Note the above does not allow for Markdown's \_ escape sequence and you need to address that. You probably also need to consider the RE used for URLs - one was just plucked from SO and hasn't been tested properly.
The above will convert
http://example.com/text_with_underscores _emph_
to
http://example.com/text_with_underscores <em>emph</em>
HTH

Regex in Objective-C: how to replace matches with a dynamic template?

My input is like "Hi {{username}}", ie. a string with keywords to replace. However, the input is quite small (~ 10 keywords and 1000 characters total), and I have a million possible keywords stored in a hashtable data structure, each associated to its replacement.
Therefore, I do not want to iterate over the keyword list and try to replace each one in the input for obvious performance reason. I prefer to iterate only once over the input characters by looking for the regex pattern "\{\{.+?\}\}".
In Java, I make use of the Matcher.appendReplacement and Matcher.appendTail methods to do that. But I cannot find a similar API with NSRegularExpression.
private String replaceKeywords(String input)
{
Matcher m = Pattern.compile("\\{\\{(.+?)\\}\\}").matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find())
{
String replacement = getReplacement(m.group(1));
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
return sb.toString();
}
Am I forced to implement such API myself, or did I miss something?
You can achieve this with NSRegularExpression:
NSString *original = @"Hi {{username}} ... {{foo}}";
NSDictionary *replacementDict = @{@"username": @"Peter", @"foo": @"bar"};
NSString *pattern = @"\\{\\{(.+?)\\}\\}";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0
                                                                         error:NULL];
NSMutableString *replaced = [original mutableCopy];
__block NSInteger offset = 0;
[regex enumerateMatchesInString:original
                        options:0
                          range:NSMakeRange(0, original.length)
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range1 = [result rangeAtIndex:1]; // range of the matched subgroup
    NSString *key = [original substringWithRange:range1];
    NSString *value = replacementDict[key];
    if (value != nil) {
        NSRange range = [result range]; // range of the matched pattern
        // Update location according to previous modifications:
        range.location += offset;
        [replaced replaceCharactersInRange:range withString:value];
        offset += value.length - range.length; // Update offset
    }
}];
NSLog(@"%@", replaced);
// Output: Hi Peter ... bar
I don't believe you can do what you want directly.
You could look at using RegexKit Lite, specifically the stringByReplacingOccurrencesOfRegex:usingBlock: method, where the replacement block holds your logic from your while loop above to find and return the appropriate replacement.

Finding first letter in NSString and counting backwards

I'm new to iOS, and was looking for some guidance.
I have a long NSString that I'm parsing out. The beginning may have a few characters of garbage (can be any non-letter character) then 11 digits or spaces, then a single letter (A-Z). I need to get the location of the letter, and get the substring that is 11 characters behind the letter to 1 character behind the letter.
Can anyone give me some guidance on how to do that?
Example: '!!2553072 C'
and I want : '53072 '
You can accomplish this with the regex pattern: (.{11})\b[A-Z]\b
The (.{11}) will grab any 11 characters and the \b[A-Z]\b will look for a single character on a word boundary, meaning it will be surrounded by spaces or at the end of the string. If characters can follow the C in your example then remove the last \b. This can be accomplished in Objective-C like so:
NSError *error;
NSString *example = @"!!2553072 C";
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"(.{11})\\b[A-Z]\\b"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
if(!regex)
{
    //handle error
}
NSTextCheckingResult *match = [regex firstMatchInString:example
                                                options:0
                                                  range:NSMakeRange(0, [example length])];
if(match)
{
    NSLog(@"match: %@", [example substringWithRange:[match rangeAtIndex:1]]);
}
There may be a more elegant way to do this involving regular expressions or some Objective-C wizardry, but here's a straightforward solution (personally tested).
-(NSString *)getStringContent:(NSString *)input
{
    NSString *substr = nil;
    NSRange singleLetter = [input rangeOfCharacterFromSet:[NSCharacterSet letterCharacterSet]];
    // Require at least 11 characters before the letter to avoid an out-of-range substring
    if(singleLetter.location != NSNotFound && singleLetter.location >= 11)
    {
        NSUInteger startIndex = singleLetter.location - 11;
        NSRange substringRange = NSMakeRange(startIndex, 11);
        substr = [input substringWithRange:substringRange];
    }
    return substr;
}
You can use NSCharacterSets to split up the string, then take the first remaining component (consisting of your garbage and digits) and get a substring of that. For example (not compiled, not tested):
- (NSString *)parseString:(NSString *)myString {
NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
NSArray *components = [myString componentsSeparatedByCharactersInSet:letters];
assert(components.count > 0);
NSString *prefix = components[0]; // assuming relatively new Xcode
if (prefix.length < 11) return nil; // fewer than 11 characters before the letter
return [prefix substringFromIndex:(prefix.length - 11)];
}
//to get rid of all non-Digits in a NSString
NSString *customerphone = CustomerPhone.text;
int phonelength = [customerphone length];
NSRange customersearchRange = NSMakeRange(0, phonelength);
for (int i =0; i < phonelength;i++)
{
    const unichar c = [customerphone characterAtIndex:i];
    NSString* onechar = [NSString stringWithCharacters:&c length:1];
    if(!isdigit(c))
    {
        customerphone = [customerphone stringByReplacingOccurrencesOfString:onechar withString:@"*" options:0 range:customersearchRange];
    }
}
NSString *PhoneAllNumbers = [customerphone stringByReplacingOccurrencesOfString:@"*" withString:@"" options:0 range:customersearchRange];

How to split string into substrings on iOS?

I received an NSString from the server. Now I want to split it into the substring which I need.
How to split the string?
For example:
substring1:read from the second character to 5th character
substring2:read 10 characters from the 6th character.
You can also split a string by a substring, using NSString's componentsSeparatedByString: method.
Example from documentation:
NSString *list = @"Norman, Stanley, Fletcher";
NSArray *listItems = [list componentsSeparatedByString:@", "];
NSString has a few methods for this:
[myString substringToIndex:index];
[myString substringFromIndex:index];
[myString substringWithRange:range];
Check the documentation for NSString for more information.
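Applied to the two substrings from the question, substringWithRange: takes a zero-based location and a length, so "second to 5th character" is NSMakeRange(1, 4) and "10 characters from the 6th" is NSMakeRange(5, 10). Because substringWithRange: raises an exception when the range runs past the end of the string, a small wrapper can be useful for server data of unknown length. The sketch below is only an illustration; the helper name SafeSubstring is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: like substringWithRange:, but clamps the range to the
// string instead of raising NSRangeException when the string is too short.
static NSString *SafeSubstring(NSString *s, NSUInteger location, NSUInteger length) {
    if (location >= s.length) { return @""; } // nothing available at this location
    NSUInteger available = s.length - location;
    return [s substringWithRange:NSMakeRange(location, MIN(length, available))];
}
```

With the input @"0123456789ABCDEFGHIJ", SafeSubstring(input, 1, 4) gives "1234" and SafeSubstring(input, 5, 10) gives "56789ABCDE".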
I wrote a little method to split a string into a specified number of parts.
Note that it only supports single-character separators. But I think it is an efficient way to split an NSString.
//split string into given number of parts
-(NSArray*)splitString:(NSString*)string withDelimiter:(NSString*)delimiter inParts:(int)parts{
NSMutableArray* array = [NSMutableArray array];
NSUInteger len = [string length];
unichar buffer[len+1];
//put separator in buffer
unichar separator[1];
[delimiter getCharacters:separator range:NSMakeRange(0, 1)];
[string getCharacters:buffer range:NSMakeRange(0, len)];
int startPosition = 0;
int length = 0;
for(int i = 0; i < len; i++) {
//if array is parts-1 and the character was found add it to array
if (buffer[i]==separator[0] && array.count < parts-1) {
if (length>0) {
[array addObject:[string substringWithRange:NSMakeRange(startPosition, length)]];
}
startPosition += length+1;
length = 0;
if (array.count >= parts-1) {
break;
}
}else{
length++;
}
}
//add the last part of the string to the array
[array addObject:[string substringFromIndex:startPosition]];
return array;
}

Resources