NSPredicate to match unescaped apostrophes - ios

I'd like to check whether an NSString (json) contains any unescaped apostrophes, but the NSPredicate won't find them, even though the regex is correct.
Here's my code:
NSString* regx = @"[^\\\\]'";
NSPredicate* p = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regx];
if ([p evaluateWithObject:json]) {
    // got it
...
I know that there are some apostrophes that are not escaped, but NSPredicate just doesn't find them.
Any idea how to solve this problem?
Also, if I look at the JSON, I see the apostrophes as \u0027.

"SELF MATCHES …" tries to match the entire string, therefore you have to use the regex
NSString* regx = @".*[^\\\\]'.*";
Alternatively:
NSString* regx = @"[^\\\\]'";
NSRange r = [json rangeOfString:regx options:NSRegularExpressionSearch];
if (r.location != NSNotFound) {
…
}
But the question remains why this is necessary. NSJSONSerialization should handle
all escaping and quoting correctly.
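A minimal sketch comparing the two variants above, assuming the JSON is available as a plain NSString. The function names are hypothetical. The (?s) inline flag is an addition that is not in the original answer: in ICU, . does not match line breaks by default, so .*[^\\\\]'.* under MATCHES would fail on pretty-printed, multi-line JSON.

```objc
#import <Foundation/Foundation.h>

// Variant A: SELF MATCHES must cover the whole string, hence the .* on both ends.
// (?s) lets . match newlines too, so multi-line JSON is still searched completely.
static BOOL HasUnescapedApostropheViaPredicate(NSString *json) {
    NSString *regx = @"(?s).*[^\\\\]'.*";
    NSPredicate *p = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regx];
    return [p evaluateWithObject:json];
}

// Variant B: a regular-expression search looks for the pattern anywhere in the
// string, so no .* padding (and no newline handling) is needed.
static BOOL HasUnescapedApostropheViaSearch(NSString *json) {
    NSRange r = [json rangeOfString:@"[^\\\\]'" options:NSRegularExpressionSearch];
    return r.location != NSNotFound;
}
```

Like the original pattern, [^\\\\]' needs some character before the apostrophe, so an apostrophe at the very start of the string is not reported by either variant.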

This is the regex which works for me:
.*[^\\\\]\\\\u0027.*

Related

NSPredicate with regex capture always gets 0 results

Hi to all overflowers,
I'm scratching my head over putting a regular expression inside an NSPredicate.
I would like to move all our thumbnails from the Documents directory into the Caches directory, and to catch 'em all I've created this regex: _thumb(@[2-3]x)?\.jpg.
Here on regex101.com you can see the above regex working with this test data:
grwior_thumb.jpg <- match
grwior.jpg
vuoetrjrt_thumb@2x.jpg <- match
vuoetrjrt.jpg
hafiruwhf_thumb.jpg <- match
hafiruwhf_thumb@2x.jpg <- match
hafiruwhf_thumb@3x.jpg <- match
hafiruwhf.jpg
But when I put it in the code it's not matching anything:
NSError *error = nil;
NSFileManager *fileManager = [NSFileManager defaultManager];
// Find and move thumbs to the caches folder
NSArray<NSString *> *mediaFilesArray = [fileManager contentsOfDirectoryAtPath:documentsPath error:&error];
NSString *regex = @"_thumb(@[2-3]x)?\\.jpg";
NSPredicate *thumbPredicate = [NSPredicate predicateWithFormat: @"SELF ENDSWITH %@", regex];
NSArray<NSString *> *thumbFileArray = [mediaFilesArray filteredArrayUsingPredicate:thumbPredicate];
thumbFileArray always has 0 elements...
What am I doing wrong?
Use MATCHES rather than ENDSWITH: ENDSWITH does not treat the right-hand side as a regular expression. Also make sure the pattern matches from the start of the string, because MATCHES requires a full-string match, so you need something that consumes the characters before the _.
Use
NSString *regex = @".*_thumb(@[23]x)?\\.jpg";
And then
[NSPredicate predicateWithFormat: @"SELF MATCHES %@", regex]
The .* matches zero or more characters other than line break characters, as many as possible.
Note that if you just want to match either 2 or 3, you might as well write [23]; there is no need for a - range operator here.
You may also replace (@[23]x)? with (?:@[23]x)?, i.e. change the capturing group to a non-capturing one, since you do not seem to need the submatch later. If you do, keep the optional capturing group.
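To check the pattern in isolation, here is a small sketch that runs the suggested predicate, with the non-capturing group mentioned above, over the sample file names from the question. ThumbnailNames is a hypothetical helper; in the question's code its input would be mediaFilesArray.

```objc
#import <Foundation/Foundation.h>

// Keeps only thumbnail names: any prefix, "_thumb", an optional @2x/@3x suffix, ".jpg".
static NSArray<NSString *> *ThumbnailNames(NSArray<NSString *> *names) {
    // MATCHES must cover the whole name, so the leading .* consumes the base name.
    NSString *regex = @".*_thumb(?:@[23]x)?\\.jpg";
    NSPredicate *thumbPredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [names filteredArrayUsingPredicate:thumbPredicate];
}
```

filteredArrayUsingPredicate: keeps the original order, so the thumbnails come back in the same order as they were listed.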
The problem is with ENDSWITH.
ENDSWITH
The left-hand expression ends with the right-hand expression.
MATCHES
The left hand expression equals the right hand expression using a regex-style comparison according to ICU v3
What you need is
NSString *regex = @".+_thumb(@[2-3]x)?\\.jpg";
NSPredicate *thumbPredicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", regex];
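The difference between the two operators, and between the .* of the first answer and the .+ used here, can be seen in a sketch like the following. EndsWithLiteral and FullyMatches are hypothetical helper names. With .+ at least one character must precede _thumb, so a file named exactly _thumb.jpg is rejected; with .* it is accepted.

```objc
#import <Foundation/Foundation.h>

// ENDSWITH compares literally: a regex passed here is just a run of plain characters.
static BOOL EndsWithLiteral(NSString *name, NSString *suffix) {
    NSPredicate *p = [NSPredicate predicateWithFormat:@"SELF ENDSWITH %@", suffix];
    return [p evaluateWithObject:name];
}

// MATCHES treats the right-hand side as an ICU regex that must cover the whole string.
static BOOL FullyMatches(NSString *name, NSString *regex) {
    NSPredicate *p = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [p evaluateWithObject:name];
}
```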

Validate a string using regex

I want to validate a string to check that it contains only alphanumeric characters plus "-" and ".". So I have done something like this to form the regex pattern:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[a-zA-Z0-9\\.\\-]"
options:NSRegularExpressionCaseInsensitive
error:&error];
NSPredicate *regexTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
BOOL valid = [regexTest evaluateWithObject:URL_Query];
The app crashes, stating that the regex pattern cannot be formed. Can anyone give me a quick fix for what I am doing wrong? Thanks in advance.
You must pass an NSString (the pattern itself, not an NSRegularExpression object) to SELF MATCHES:
NSString * URL_Query = @"PAS.S.1-23-";
NSString * regex = @"[a-zA-Z0-9.-]+";
NSPredicate *regexTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
BOOL valid = [regexTest evaluateWithObject:URL_Query];
Note that you need no anchors with SELF MATCHES (the regex is anchored by default), and you need to add + to match one or more allowed symbols, or * to match zero or more (to also allow an empty string).
You do not need to escape the hyphen at the start/end of the character class, and the dot inside a character class is treated as a literal dot char.
Also, since both the lower- and uppercase ASCII letter ranges are present in the pattern, you need not pass any case insensitive flags to the regex.
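For comparison, here is a sketch of the same check done both ways; both function names are hypothetical. The predicate version is the one from this answer. The NSRegularExpression version is an assumption about how the asker's original object could be used directly: NSRegularExpression does not anchor the pattern for you, so the sketch anchors the match at the start with NSMatchingAnchored and then checks that it covers the whole string.

```objc
#import <Foundation/Foundation.h>

// Predicate version: SELF MATCHES is anchored at both ends automatically.
static BOOL IsValidWithPredicate(NSString *query) {
    NSPredicate *regexTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @"[a-zA-Z0-9.-]+"];
    return [regexTest evaluateWithObject:query];
}

// NSRegularExpression version: anchor the match at the start, then require
// that it spans the entire string.
static BOOL IsValidWithRegularExpression(NSString *query) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[a-zA-Z0-9.-]+"
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        return NO; // the pattern did not compile; inspect `error` in real code
    }
    NSRange whole = NSMakeRange(0, query.length);
    NSTextCheckingResult *match = [regex firstMatchInString:query
                                                    options:NSMatchingAnchored
                                                      range:whole];
    return match != nil && NSEqualRanges(match.range, whole);
}
```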

add componentsseparatedbystring into a predicate with core data

I have a string stored in an entity (Core Data), and I want to use an NSFetchedResultsController to get the data.
String format: abc,ba,x,s,d. This is an array of IDs saved as a string.
I want to get only the entities that contain at least one of the IDs in that string.
The problem is that if I use CONTAINS in the predicate and search for "a", I get wrong results.
Could you please tell me if it's possible to add something like "componentsSeparatedByString" to a predicate, so I can iterate and use "IN" on the result, or if there's another solution? Thanks.
You can use the "MATCHES" operator in a predicate, which does a
regular expression match:
NSString *searchID = @"a";
NSString *pattern = [NSString stringWithFormat:@"(^|.*,)%@(,.*|$)",
[NSRegularExpression escapedPatternForString:searchID]];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"ID MATCHES %@", pattern];
The pattern (^|.*,)TERM(,.*|$) searches for TERM which is preceded
by either the start of the string or a comma, and followed by the
end of the string or another comma.
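A quick way to try this pattern without a Core Data store is to evaluate it against dictionaries, since the key path ID resolves through valueForKey: in the same way. PredicateForID is a hypothetical helper and the sample ID strings are made up.

```objc
#import <Foundation/Foundation.h>

// Matches when searchID is a whole element of a comma-separated ID string:
// preceded by the start of the string or a comma, followed by a comma or the end.
static NSPredicate *PredicateForID(NSString *searchID) {
    // Escape the ID so characters like "." or "+" are matched literally.
    NSString *pattern = [NSString stringWithFormat:@"(^|.*,)%@(,.*|$)",
                         [NSRegularExpression escapedPatternForString:searchID]];
    return [NSPredicate predicateWithFormat:@"ID MATCHES %@", pattern];
}
```

Searching for "a" no longer hits "abc" or "ba", which is exactly the false positive CONTAINS produced.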
First convert your string of IDs into an NSArray:
NSArray *arrayOfIds = [stringOfIds componentsSeparatedByString:@","];
Then use an IN predicate on your fetch:
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"ID IN %@", arrayOfIds];
This assumes your database column is called "ID" and your comma-separated string of IDs is stringOfIds.
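Note that this IN predicate works in the opposite direction from the question: it suits entities whose ID attribute holds a single value, tested against a list built from one comma-separated string. A sketch with dictionaries standing in for managed objects (EntitiesWithIDIn is a hypothetical name):

```objc
#import <Foundation/Foundation.h>

// Keeps the objects whose single-valued ID equals one of the comma-separated IDs.
static NSArray *EntitiesWithIDIn(NSArray *entities, NSString *stringOfIds) {
    NSArray *arrayOfIds = [stringOfIds componentsSeparatedByString:@","];
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"ID IN %@", arrayOfIds];
    return [entities filteredArrayUsingPredicate:predicate];
}
```

IN compares whole elements, so an entity with ID "a" is not picked up by a list containing "abc" and "ba".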
Finally I will use a dirty solution. There are 4 possibilities:
string format = a
string format = a,..
string format = ..,a
string format = ..,a,..
so the predicate could be:
[NSPredicate predicateWithFormat:@"(ID == %@ OR "
@"ID BEGINSWITH %@ OR "
@"ID ENDSWITH %@ OR "
@"ID CONTAINS %@)",
searchedID,
[NSString stringWithFormat:@"%@,", searchedID],
[NSString stringWithFormat:@",%@", searchedID],
[NSString stringWithFormat:@",%@,", searchedID]]
But this is a dirty solution; I really want something cleaner.

NSPredicate Format String with unexpected result

I am learning NSPredicate and I have an example with a problem.
NSArray * array = @[@{@"name":@"KudoCC"}, @{@"name":@"123"}] ;
NSPredicate * predicate = [NSPredicate predicateWithFormat:@"name == '%@'", @123] ;
NSArray * result = [array filteredArrayUsingPredicate:predicate] ;
The parameter here is @123, which is of type NSNumber. I thought it would work the same as @"name == '123'", but the result is empty, whereas I expected it to contain @{@"name":@"123"}.
Can somebody tell me why? Thank you in advance.
The documentation says:
If you use variable substitution using %@ (such as firstName like %@), the quotation marks are added for you automatically.
So quotation marks should be avoided in the common case. If you use something like @"%K == '%@'", you are actually comparing the key's value with the literal string @"%@". That predicate only matches if your data literally contains that string, e.g. an array like @[@{@"name": @"%@"}].
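A small sketch reproducing the question with both forms; the helper names are hypothetical. The NSNumber is turned into a string with stringValue because the stored names are strings, and comparing an NSString with an NSNumber inside a predicate is not something I would rely on.

```objc
#import <Foundation/Foundation.h>

// '%@' inside quotes is a literal two-character string; the argument is never substituted.
static NSArray *FilterWithQuotedFormat(NSArray *array, NSNumber *value) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name == '%@'", value];
    return [array filteredArrayUsingPredicate:predicate];
}

// Unquoted %@ is substituted, and string arguments are quoted automatically.
static NSArray *FilterWithSubstitution(NSArray *array, NSNumber *value) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"name == %@", [value stringValue]];
    return [array filteredArrayUsingPredicate:predicate];
}
```

Note that filteredArrayUsingPredicate: returns an empty array, not nil, when nothing matches.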

String search with Turkish dotless i

When searching the text Çınaraltı Café for the text Ci using the code
NSStringCompareOptions options =
NSCaseInsensitiveSearch |
NSDiacriticInsensitiveSearch |
NSWidthInsensitiveSearch;
NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"tr"];
NSRange range = [haystack rangeOfString:needle
options:options
range:NSMakeRange(0, haystack.length)
locale:locale];
I get range.location equal to NSNotFound.
It's not to do with the diacritic on the initial Ç because I get the same result searching for alti where the only odd character is the ı. I also get a valid match searching for Cafe which contains a diacritic (the é).
The Apple docs mention this situation in the notes on the locale parameter, and I think I'm following them. I guess I'm not, though, because it's not working.
How can I get a search for 'i' to match both 'i' and 'ı'?
I don't know whether this helps as an answer, but perhaps explains why it's happening.
I should point out I'm not an expert in this matter, but I've been looking into this for my own purposes and been doing some research.
Looking at the Unicode collation chart for latin, the equivalent characters to ASCII "i" (\u0069) do not include "ı" (\u0131), whereas all the other letters in your example string are as you expect, i.e.:
"c" (\u0063) does include "Ç" (\u00c7)
"e" (\u0065) does include "é" (\u00e9)
The ı character is listed separately as being of primary difference to i. That might not make sense to a Turkish speaker (I'm not one) but it's what Unicode have to say about it, and it does fit the logic of the problem you describe.
In Chrome you can see this in action with an in-page search. Searching in the page for ASCII i highlights all the characters in its block and does not match ı. Searching for ı does the opposite.
By contrast, MySQL's utf8_general_ci collation table maps uppercase ASCII I to ı as you want.
So, without knowing anything about iOS, I'm assuming it's using the Unicode standard and normalising all characters to latin by this table.
As to how you match Çınaraltı with Ci - if you can't override the collation table then perhaps you can just replace i in your search strings with a regular expression, so you search on Ç[iı] instead.
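A minimal sketch of that suggestion, assuming the search term is written with [iı] by hand; FindPattern is a hypothetical helper. Apple's documentation for NSRegularExpressionSearch states that only NSCaseInsensitiveSearch and NSAnchoredSearch still apply alongside it, so NSDiacriticInsensitiveSearch cannot make C match Ç here and the Ç is spelled out literally in the pattern.

```objc
#import <Foundation/Foundation.h>

// Regex search in which the caller has already written every i as [iı].
// Only case insensitivity is honoured together with NSRegularExpressionSearch.
static NSRange FindPattern(NSString *haystack, NSString *pattern) {
    return [haystack rangeOfString:pattern
                           options:(NSRegularExpressionSearch | NSCaseInsensitiveSearch)];
}
```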
I wrote a simple extension in Swift 3 for Turkish string search.
let turkishSentence = "Türkçe ya da Türk dili, batıda Balkanlar’dan başlayıp doğuda Hazar Denizi sahasına kadar konuşulan Altay dillerinden biridir."
let turkishWannabe = "basLayip"
let shouldBeTrue = turkishSentence.contains(turkishString: turkishWannabe, caseSensitive: false)
let shouldBeFalse = turkishSentence.contains(turkishString: turkishWannabe, caseSensitive: true)
You can check it out from https://github.com/alpkeser/swift_turkish_string_search/blob/master/TurkishTextSearch.playground/Contents.swift
I did this and it seems to work well for me. Hope it helps!
NSString *cleanedHaystack = [haystack stringByReplacingOccurrencesOfString:@"ı"
withString:@"i"];
cleanedHaystack = [cleanedHaystack stringByReplacingOccurrencesOfString:@"İ"
withString:@"I"];
NSString *cleanedNeedle = [needle stringByReplacingOccurrencesOfString:@"ı"
withString:@"i"];
cleanedNeedle = [cleanedNeedle stringByReplacingOccurrencesOfString:@"İ"
withString:@"I"];
NSStringCompareOptions options = (NSDiacriticInsensitiveSearch |
NSCaseInsensitiveSearch |
NSWidthInsensitiveSearch);
NSRange range = [cleanedHaystack rangeOfString:cleanedNeedle
options:options];
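The same replacement idea wrapped into one function, as a sketch (FoldTurkishI and TurkishInsensitiveRange are hypothetical names). For precomposed input, ı, i, İ and I are each a single UTF-16 unit, so the replacements do not shift any indices and the returned range can be applied to the original haystack as well.

```objc
#import <Foundation/Foundation.h>

// Maps dotless ı to i and dotted İ to I so the standard comparison treats them alike.
static NSString *FoldTurkishI(NSString *s) {
    NSString *folded = [s stringByReplacingOccurrencesOfString:@"ı" withString:@"i"];
    return [folded stringByReplacingOccurrencesOfString:@"İ" withString:@"I"];
}

// Case-, diacritic- and width-insensitive search on the folded strings.
static NSRange TurkishInsensitiveRange(NSString *haystack, NSString *needle) {
    NSStringCompareOptions options = (NSDiacriticInsensitiveSearch |
                                      NSCaseInsensitiveSearch |
                                      NSWidthInsensitiveSearch);
    return [FoldTurkishI(haystack) rangeOfString:FoldTurkishI(needle) options:options];
}
```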
As Tim mentions, we can use a regular expression to match text containing i or ı. I also didn't want to add a new field or change the source data, as the search looks up huge amounts of strings. So I ended up with a solution using regular expressions and NSPredicate.
Create an NSString category and copy this method into it. It returns a basic pattern, which you can use directly or wrap into a larger matching pattern. You can use it with any method that accepts a regular expression pattern.
- (NSString *)zst_regexForTurkishLettersWithCaseSensitive:(BOOL)caseSensitive
{
    NSMutableString *filterWordRegex = [NSMutableString string];
    for (NSUInteger i = 0; i < self.length; i++) {
        NSString *letter = [self substringWithRange:NSMakeRange(i, 1)];
        if (caseSensitive) {
            if ([letter isEqualToString:@"ı"] || [letter isEqualToString:@"i"]) {
                letter = @"[ıi]";
            } else if ([letter isEqualToString:@"I"] || [letter isEqualToString:@"İ"]) {
                letter = @"[Iİ]";
            }
        } else {
            if ([letter isEqualToString:@"ı"] || [letter isEqualToString:@"i"] ||
                [letter isEqualToString:@"I"] || [letter isEqualToString:@"İ"]) {
                letter = @"[ıiIİ]";
            }
        }
        [filterWordRegex appendString:letter];
    }
    return filterWordRegex;
}
So if the search word is Şırnak, it creates Ş[ıi]rnak for case sensitive and Ş[ıiIİ]rnak for case insensitive search.
And here are the possible usages.
NSString *testString = @"Şırnak";
// First create your search regular expression.
NSString *searchWord = @"şır";
NSString *searchPattern = [searchWord zst_regexForTurkishLettersWithCaseSensitive:NO];
// Then create your matching pattern.
NSString *pattern = searchPattern; // Direct match
// NSString *pattern = [NSString stringWithFormat:@".*%@.*", searchPattern]; // Contains
// NSString *pattern = [NSString stringWithFormat:@"\\b%@.*", searchPattern]; // Begins with
// NSPredicate
// c for case insensitive, d for diacritic insensitive
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"self matches[cd] %@", pattern];
if ([predicate evaluateWithObject:testString]) {
    // Matches
}
// If you want to filter an array of objects
NSArray *matchedCities = [allAirports filteredArrayUsingPredicate:
    [NSPredicate predicateWithFormat:@"city matches[cd] %@", pattern]];
You can also use NSRegularExpression, but I think using case- and diacritic-insensitive search with NSPredicate is much simpler.
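One thing the category above does not do is escape regex metacharacters, so a search word such as "St. John" or "(" is treated as regex syntax, and an invalid pattern makes NSPredicate raise an exception when it is evaluated. A variant sketch of the case-insensitive branch that escapes every other character with escapedPatternForString: (TurkishIPattern is a hypothetical name, written as a plain function rather than a category method):

```objc
#import <Foundation/Foundation.h>

// Builds a pattern in which any of ı, i, I, İ matches all four,
// and every other character is escaped so it is matched literally.
static NSString *TurkishIPattern(NSString *word) {
    NSArray<NSString *> *iLetters = @[@"ı", @"i", @"I", @"İ"];
    NSMutableString *pattern = [NSMutableString string];
    NSUInteger index = 0;
    while (index < word.length) {
        // Step over whole composed character sequences, not single UTF-16 units.
        NSRange r = [word rangeOfComposedCharacterSequenceAtIndex:index];
        NSString *letter = [word substringWithRange:r];
        if ([iLetters containsObject:letter]) {
            [pattern appendString:@"[ıiIİ]"];
        } else {
            [pattern appendString:[NSRegularExpression escapedPatternForString:letter]];
        }
        index = NSMaxRange(r);
    }
    return pattern;
}
```

The result can be wrapped exactly as in the usage above, e.g. in .*%@.* for a "contains" search with matches[c].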

Resources