Can anybody duplicate this result? I'm testing rangeOfMisspelledWordInString (in iOS) to find misspelled words, and some random strings of letters are reported as valid words, as shown below.
UITextChecker *pSpellChecker = [[[UITextChecker alloc] init] autorelease];
NSRange rangeWord = NSMakeRange(0, 8);
NSRange rangeCheck = [pSpellChecker rangeOfMisspelledWordInString:@"lhpcjeuw"
                                                            range:rangeWord
                                                       startingAt:0
                                                             wrap:NO
                                                         language:@"en_US"];
if (rangeCheck.location == NSNotFound) {
    NSLog(@"Valid Word:");
}
Below are some of the words that are also valid according to rangeOfMisspelledWordInString:
BTW, I've made sure to convert the following words to lowercase before testing.
LD
THY
THE
THECA
TD
HL
HT
YD
YLEQXXH
DV
DVX
DVXX
DVXXD
DVXXDX
DVXXX
DVHXG
DVHEJWCP
DH
DH
DPJLEHHY
Very strange. Am I doing something wrong?
I think "the" and "thy" just might be valid words ;)
Other than that, my best guess is that the text system can't provide a guess for the word, and so ignores it entirely - the semantic meaning of "misspelled word" might not include "strings of letters which cannot conceivably be a misspelling of a word." I notice that when I type those strings into a system text field (e.g. in Messages), I don't get any replacement suggestions.
You could also make sure that your UITextChecker instance isn't set to ignore those particular words; take a look at the ignoredWords property.
Yes, I can reproduce it, and I would call this a bug. If I put in your test word, @"lhpcjeuw", it treats it as a valid word. But, if I use @"lahpcjeuw" (added an "a" in second position), it catches it. I noticed the same thing Tim did -- when writing this answer, the spell checker underlined that second one but not the first as I typed.
I've been looking for a good way to see if a string of items are all numbers, and thought there might be a way of specifying a range from 0 to 9 and seeing if they're included in the string, but all that I've looked up online has really confused me.
def validate_pin(pin)
(pin.length == 4 || pin.length == 6) && pin.count("0-9") == pin.length
end
The code above is someone else's work and I've been trying to identify how it works. It's a pin checker - takes in a set of characters and ensures the string is either 4 or 6 digits and all numbers - but how does the range work?
When I did this problem I tried to use is_a? Integer and a bunch of other things, including ranges such as (0..9) and ("0..9") and ("0".."9"), to validate that a character is an integer. When I saw ("0-9") it confused the heck out of me, and half an hour of googling and YouTube has only left me with regex tutorials (which I'm interested in, but currently I'm just trying to get the basics down).
So to sum this up, my goal is to understand a more semantic/concise way to identify if a character is an integer. Whatever is the simplest way. All and any feedback is welcome. I am a new rubyist and trying to get down my fundamentals. Thank You.
Regex really is the right way to do this. It's specifically for testing patterns in strings. This is how you'd test "do all characters in this string fall in the range of characters 0-9?":
pin.match(/\A[0-9]+\z/)
This regex says "Does this string start and end with at least one of the characters 0-9, with nothing else in between?" - the \A and \z are start-of-string and end-of-string matchers, and the [0-9]+ matches any one or more of any character in that range.
You could even do your entire check in one line of regex:
pin.match(/\A([0-9]{4}|[0-9]{6})\z/)
Which says "Does this string consist of the characters 0-9 repeated exactly 4 times, or the characters 0-9, repeated exactly 6 times?"
Ruby's String#count method does something similar to this, though it just counts the number of occurrences of the characters passed, and it uses something similar to regex ranges to allow you to specify character ranges.
The sequence c1-c2 means all characters between c1 and c2.
Thus, it expands the parameter "0-9" into the list of characters "0123456789", and then it tests how many of the characters in the string match that list of characters.
This will work to verify that a certain number of numbers exist in the string, and the length checks let you implicitly test that no other characters exist in the string. However, regexes let you assert that directly, by ensuring that the whole string matches a given pattern, including length constraints.
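To make the comparison concrete, here is a small sketch that puts the count-based check and the regex check side by side. The method names validate_pin_count and validate_pin_regex and the sample inputs are made up for this illustration; only String#count, the length checks, and the patterns come from the discussion above.

```ruby
# Count-based check: the length must be 4 or 6, and every character
# must be counted by the "0-9" character range.
def validate_pin_count(pin)
  (pin.length == 4 || pin.length == 6) && pin.count("0-9") == pin.length
end

# Regex-based check: the whole string is exactly 4 or exactly 6 digits.
def validate_pin_regex(pin)
  pin.match?(/\A([0-9]{4}|[0-9]{6})\z/)
end

# Print both results for a few sample inputs.
%w[1234 123456 12345 12a4 -123].each do |pin|
  puts "#{pin.inspect}: count=#{validate_pin_count(pin)} regex=#{validate_pin_regex(pin)}"
end
```

Both methods should agree on every input; the regex version just states the length and the "digits only" rule in one place.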
Count everything non-digit in pin and check if this count is zero:
pin.count("^0-9").zero?
Since you seem to be looking for answers outside regex and since Chris already spelled out how the count method was being implemented in the example above, I'll try to add one more idea for testing whether a string is an Integer or not:
pin.to_i.to_s == pin
What we're doing is converting the string to an integer, converting that result back to a string, and then testing to see if anything changed during the process. If the result is true, then you know nothing changed during the conversion to an integer and therefore the string contains only an Integer.
EDIT:
The example above only works if the entire string is an Integer and won’t properly deal with leading zeros. If you want to check to make sure each and every character is an Integer then do something like this instead:
("1" + pin).to_i.to_s[1..-1] == pin
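As a sketch of the round-trip idea with leading zeros handled, the helper below (all_digits_roundtrip? is a hypothetical name) builds a new string with "1" + pin instead of calling prepend, because prepend modifies pin in place and the final comparison would then be made against the modified string.

```ruby
def all_digits_roundtrip?(pin)
  # "1" + pin builds a new string, so pin itself is left unchanged.
  # The leading "1" keeps leading zeros from being dropped by to_i;
  # [1..-1] then removes that extra "1" again before comparing.
  ("1" + pin).to_i.to_s[1..-1] == pin
end

puts all_digits_roundtrip?("0042")
puts all_digits_roundtrip?("1234")
puts all_digits_roundtrip?("12a4")
```

Note that this only tests for digits; the length check (4 or 6) still has to be done separately.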
Part of the question seems to be exactly HOW the following portion of code is doing its job:
pin.count("0-9")
This piece of the code is simply returning a count of how many instances of the numbers 0 through 9 exist in the string. That's only one piece of the relevant section of code though. You need to look at the rest of the line to make sense of it:
pin.count("0-9") == pin.length
The first part counts how many instances then the second part compares that to the length of the string. If they are equal (==) then that means every character in the string is an Integer.
Sometimes negation can be used to advantage:
!pin.match?(/\D/) && [4,6].include?(pin.length)
pin.match?(/\D/) returns true if the string contains a character other than a digit (matching /\D/), in which case it would be negated to false.
One advantage of using negation here is that if the string contains a character other than a digit pin.match?(/\D/) would return true as soon as a non-digit is found, as opposed to methods that examine all the characters in the string.
I am facing an issue related to a hexadecimal value in a string: I need to remove hexadecimal (non-printable) characters from an NSString.
The problem is that when I print the object, it prints as a "BLANK line". In debug mode it shows like this:
So how can i remove it from the string?
EDIT
Trimming whitespace:
result of NSLog is :
2015-12-14 15:37:10.710 MyApp [2731:82236] tmp :''
Database:
Earlier question:
how to detect garbage string value in ios?
As your dataset clearly has garbage values, you can use this method to check whether your string is valid. Define your validation criteria and simply don't entertain the values which are garbage. But as suggested before by gnasher, you should rather look for the bug which is causing insertion of garbage data in your database. Once you have done that, check if the input string matches your defined criteria. If it does, do what you want. If it doesn't, simply move on.
- (BOOL)isValidString:(NSString *)input
{
    // Add your desired characters here
    NSMutableCharacterSet *validSpecialChars = [NSMutableCharacterSet characterSetWithCharactersInString:@"_~.,"];
    [validSpecialChars formUnionWithCharacterSet:[NSCharacterSet alphanumericCharacterSet]];
    return [[input stringByTrimmingCharactersInSet:validSpecialChars] isEqualToString:@""];
}
If your string will contain only your defined characters, it will return true. If it contains any other characters (garbage or invalid) it will return false.
I'm not sure exactly what you are looking for, but if you want to remove all the control characters then
string = [[string componentsSeparatedByCharactersInSet:[NSCharacterSet controlCharacterSet]] componentsJoinedByString:@""];
If you need to be faster and are sure the control characters are only at the beginning and ending of a string then
string = [string stringByTrimmingCharactersInSet:[NSCharacterSet controlCharacterSet]];
NOTE: Removing all control characters will remove all new lines (\n)!
From NSCharacterSet Class Reference:
These characters are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.
The value you are having a problem with is \x06 which is U+0006.
If you want to remove just \x06, then you can always create a characters set just for it.
NSCharacterSet *hex6 = [NSCharacterSet characterSetWithCharactersInString:@"\x06"];
string = [[string componentsSeparatedByCharactersInSet:hex6] componentsJoinedByString:@""];
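The difference between removing every control character and removing only \x06 can be sketched outside Cocoa as well. The Ruby snippet below is only an illustration, not the NSString API: it relies on the regex property \p{Cc}, the Unicode "control" category, which covers the same U+0000 to U+001F and U+007F to U+009F ranges quoted above. The sample value is invented.

```ruby
# Invented sample: a stray U+0006 at the front and a newline at the end.
raw = "\u0006John\n"

# Remove every control character anywhere in the string.
# Note that this also removes the trailing "\n".
stripped_all = raw.gsub(/\p{Cc}/, "")

# Remove only U+0006 and leave other control characters alone.
stripped_x06 = raw.delete("\u0006")

p stripped_all
p stripped_x06
```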
First, don't trust the Xcode debugger. Print characterAtIndex:0 to be sure that you really have what you think you have.
Second, deleting stuff is all good and well, but you are doctoring around with a symptom. You should really try to figure out where the contents of _lastUpdatedBy comes from and why it is what it is. You might have a serious bug here and trying to cover it up. For example, there might be a bug that stores rubbish data instead of the correct data, and you are just covering up for that bug.
I have been coding a program in Lua that automatically formats IRC logs from a roleplay. In the roleplay logs there is a specific guideline for "Out of character" conversation, which we use double parentheses for. For example: ((<Things unrelated to roleplay go here>)). I have been trying to have my program remove text between double brackets (and including both brackets). The code is:
ofile = io.open("Output.txt", "w")
rfile = io.open("Input.txt", "r")
p = rfile:read("*all")
w = string.gsub(p, "%(%(.*?%)%)", "")
ofile:write(w)
The pattern here is "%(%(.*?%)%)". I've tried multiple variations of the pattern, all of them fruitless:
1. %(%(.*?%)%) --Wouldn't do anything.
2. %(%(.*%)%) --Would remove *everything* after the first OOC message.
Then, my friend told me that prepending the brackets with percentages wouldn't work, and that I had to use backslashes to 'escape' the parentheses.
3. \(\(.*\)\) --resulted in the output file being completely empty.
4. (\(\(.*\)\)) --Same result as above.
5. (\(\(.*?\)\) --would for some reason, remove large parts of the text for no apparent reason.
6. \(\(.*?\)\) --would just remove all the text except for the last line.
The short, absolute question:
What pattern would I need to use to remove all text between double parentheses, and remove the double parentheses themselves too?
Your friend is thinking of regular expressions. Lua patterns are similar, but different. % is the correct escape character.
Your pattern should be %(%(.-%)%). The - is similar to * in that it matches any number of repetitions of the preceding item, but while * tries to match as many characters as it can (it's greedy), - matches the fewest characters possible (it's non-greedy). It won't go overboard and match extra double-close-parentheses.
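The greedy versus non-greedy difference is easy to see on a small sample. The sketch below uses Ruby regular expressions rather than Lua patterns, so the syntax differs: Ruby escapes parentheses with a backslash and writes the lazy quantifier as *?, where Lua uses % and -. The sample log line is invented.

```ruby
log = "Hello ((brb)) there ((back)) friend"

# Greedy: .* runs from the first "((" to the LAST "))",
# so the in-character text between the two OOC remarks is lost too.
greedy = log.gsub(/\(\(.*\)\)/m, "")

# Lazy: .*? stops at the FIRST "))" after each "((",
# so each OOC remark is removed on its own.
lazy = log.gsub(/\(\(.*?\)\)/m, "")

p greedy
p lazy
```

The greedy result corresponds to what attempt 2 in the question produced; the lazy result is what the Lua pattern with - is meant to give.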
I have some NSString like :
test = @"this is %25test%25 string";
I am trying to replace test with some Arabic text, but the result does not come out as expected:
[test stringByReplacingOccurrencesOfString:@"test" withString:@"اختبار"];
and the result is :
this is %25 اختبار %25 string
Somewhere I read that there could be some problem with encoding or text alignment. Is there any extra adjustment needed for Arabic string operations?
EDIT: I have also tried inserting into an NSMutableString, but the result is still the same.
EDIT 2:
One other thing that occurs to me that is causing most of your trouble in this specific example. You have a partially percent-encoded string above. You have spaces, but you also have %25. You should avoid doing that. Either percent-encode a string or don't. Convert it all at once when required (using stringByAddingPercentEscapesUsingEncoding:). Don't try to "hard-code" percent-encoding. If you just used "this is a %اختبار% string" (and then percent-encoded the entire thing at the end), all your directional problems would go away (see how that renders just fine?). The rest of these answers address the more general question when you really need to deal with directionality.
EDIT:
The original answer after the line relates to human-readable strings, and is correct for human-readable strings, but your actual question (based on your followups) is about URLs. URLs are not human-readable strings, even if they occasionally look like them. They are a sequence of bytes that are independent of how they are rendered to humans. "اختبار" cannot be in the path or fragment parts of an URL. These characters are not part of the legal set of characters for those sections (اختبار is allowed to be part of the host, but you have to follow the IDN rules for that).
The correct URL encoding for "this is a %25<arabic>%25 string" is:
this%20is%20a%20%2525%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1%2525%20string
If you decode and render this string to the screen, it will appear like this:
this is a %25اختبار%25 string
But it is in fact exactly the string you mean (and it is the string you should pass to the browser). Follow the bytes (like the computer will):
this - this (ALPHA)
%20 - <space> (encoded)
is - is (ALPHA)
%20 - <space> (encoded)
a - a (ALPHA)
%20 - <space> (encoded)
%25 - % (encoded)
25 - 25 (DIGIT)
%D8%A7 - ا (encoded)
%D8%AE - خ (encoded)
%D8%AA - ت (encoded)
%D8%A8 - ب (encoded)
%D8%A7 - ا (encoded)
%D8%B1 - ر (encoded)
%25 - % (encoded)
25 - 25 (DIGIT)
%20 - <space> (encoded)
string - string (ALPHA)
The Unicode BIDI display algorithm is doing what it means to do; it just isn't what you expect. But those are the bytes and they're in the correct order. If you add any additional bytes (such as LRO) to this string, then you are modifying the URL and it means something different.
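The byte walk-through above can be reproduced with any percent-encoder that works on UTF-8 bytes. As a hedged illustration in Ruby (not the NSString API discussed here), ERB::Util.url_encode from the standard library encodes spaces, "%" and all non-ASCII bytes, which is enough for this string:

```ruby
require "erb"

readable = "this is a %25اختبار%25 string"

# Percent-encode the whole string in one step. The literal "%" in "%25"
# becomes "%25" itself, which is why "%2525" shows up in the output.
encoded = ERB::Util.url_encode(readable)
puts encoded

# Each Arabic letter is two UTF-8 bytes; print them in hex to compare
# with the list above.
readable.scan(/\p{Arabic}/).each do |letter|
  puts letter.bytes.map { |b| format("%02X", b) }.join(" ")
end
```

The encoded result contains only ASCII characters, so there is nothing left for the BIDI algorithm to reorder.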
So the question you need to answer is, are you making an URL, or are you making a human-readable string? If you're making an URL, it should be URL-encoded, in which case you will not have this display problem (unless this is part of the host, which is a different set of rules, but I don't believe that's your problem). If this is a human-readable string, see below about how to provide hints and overrides to the BIDI algorithm.
It's possible that you really need both (a human-friendly string, and a correct URL that can be pasted). That's fine, you just need to handle the clipboard yourself. Show the string, but when the user goes to copy it, replace it with the fully encoded URL using UIPasteboard or by overriding copy:. See Copy, Cut, and Paste Operations. This is fairly common (note how Safari displays just "stackoverflow.com" in the address bar, but if you copy and paste it, it pastes "https://stackoverflow.com/"). Same thing.
Original answer related to human-readable strings.
Believe it or not, stringByReplacingOccurrencesOfString:withString: is doing the right thing. It's just not displaying the way you expect. If you walk through characterAtIndex:, you'll find that it's:
% 2 5 ا ...
The problem is that the layout engine gets very confused around all the "neutral direction" characters. The engine doesn't understand whether you meant "%25" to be attached to the left to right part or right to left part. You have to help it out here by giving it some explicit directional characters to work with.
There are a few ways to go about this. First, you can do it the Unicode 6.3 tr9-29 way with Explicit Directional Isolates. This is exactly the kind of problem that Isolates are meant to solve. You have some piece of text whose direction you want to be considered completely independently of all other text. Unicode 6.3 isn't actually supported by iOS or OS X as best I can tell, but for many (though not all) uses, it "works."
You want to surround your Arabic with FSI (FIRST STRONG ISOLATE U+2068) and PDI (POP DIRECTIONAL ISOLATE U+2069). You could also use RLI (RIGHT-TO-LEFT ISOLATE) to be explicit. FSI means "treat this text as being in the direction of the first strong character you find."
So you could ideally do this:
NSString *test = @"this is a %25\u2068test\u2069%25 string";
NSString *arabic = @"اختبار";
NSString *result = [test stringByReplacingOccurrencesOfString:@"test" withString:arabic];
That works if you know what you're going to substitute beforehand (so you know where to put the FSI and PDI). If you don't, you can do it the other way and make it part of the substitution:
NSString * const FSI = @"\u2068";
NSString * const PDI = @"\u2069";
NSString *test = @"this is %25test%25 string";
NSString *arabic = @"اختبار";
NSString *replaceString = [@[FSI, arabic, PDI] componentsJoinedByString:@""];
NSString *result = [test stringByReplacingOccurrencesOfString:@"test" withString:replaceString];
I said this "mostly" works. It's fine for UILabel, and it probably is fine for anything using Core Text. But in NSLog output, you'll get these extra "placeholder" characters:
You might get this other places, too. I haven't checked UIWebView for instance.
So there are some other options. You can use directional marks. It's a little awkward, though. LRM and RLM are zero-width strongly directional characters. So you can bracket the arabic with LRM (left to right mark) so that the arabic doesn't disturb the surrounding text. This is a little ugly since it means the substitution has to be aware of what it's substituting into (which is why isolates were invented).
NSString * const LRM = @"\u200e";
NSString *test = @"this is a %25test%25 string";
NSString *replaceString = [@[LRM, arabic, LRM] componentsJoinedByString:@""];
NSString *result = [test stringByReplacingOccurrencesOfString:@"test" withString:replaceString];
BTW, Directional Marks are usually the right answer. They should always be the first thing you try. This particular problem is just a little too tricky.
One more way is to use Explicit Directional Overrides. These are the giant "do what I tell you to do" hammer of the Unicode world. You should avoid them whenever possible. There are some security concerns with them that make them forbidden in certain places (<RLO>elgoog<PDF>.com would display as google.com for instance). But they will work here.
You bracket the whole string with LRO/PDF to force it to be left-to-right. You then bracket the substitution with RLO/PDF to force it to the right-to-left. Again, this is a last resort, but it lets you take complete control over the layout:
NSString * const LRO = @"\u202d";
NSString * const RLO = @"\u202e";
NSString * const PDF = @"\u202c";
NSString *test = [@[LRO, @"this is a %25test%25 string", PDF] componentsJoinedByString:@""];
NSString *arabic = @"اختبار";
NSString *replaceString = [@[RLO, arabic, PDF] componentsJoinedByString:@""];
NSString *result = [test stringByReplacingOccurrencesOfString:@"test" withString:replaceString];
I would think you could solve this problem with the Explicit Directional Embedding characters, but I haven't really found a way to do it without at least one override (for instance, you could use RLE instead of RLO above, but you still need the LRO).
Those should give you the tools you need to figure all of this out. See the Unicode TR9 for the gory details. And if you want a deeper introduction to the problem and solutions, see Cal Henderson's excellent Understanding Bidirectional (BIDI) Text in Unicode.
You should try like this:
NSString *test = @"this is %25test%25 string";
NSString *test2 = [[[test stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding] componentsSeparatedByString:@"test"] componentsJoinedByString:@"اختبار"];
So as I work my way through understanding string methods, I came across this useful class
NSCharacterSet
which is defined in this post quite well as being similar to a string, except that it is used for holding the characters in an unordered set
What is differnce between NSString and NSCharacterset?
So then I came across the useful method invertedSet, and it became a little less clear what was happening exactly. I read page after page on it, and they all sort of glossed over the basics of what was happening and jumped into advanced explanations. So if you want to know, SIMPLY put, what this is and why we use it, it is not so easy; instead you get statements like this from the Apple documentation: "A character set containing only characters that don’t exist in the receiver." - and how do I use this exactly???
So here is what i understand to be the use. PLEASE provide in simple terms if I have explained this incorrectly.
Example Use:
Create a list of characters in an NSCharacterSet that you want to limit a string to contain.
NSString *validNumberChars = @"0123456789"; // Only these are valid.
// Now assign to an NSCharacterSet object to use for searching and comparing later
NSCharacterSet *validCharSet = [NSCharacterSet characterSetWithCharactersInString:validNumberChars];
// Now create an inverted set OF the validCharSet.
NSCharacterSet *invertedValidCharSet = [validCharSet invertedSet];
// Now scrub your input string of bad characters, those characters not in the validCharSet
NSString *scrubbedString = [inputString stringByTrimmingCharactersInSet:invertedValidCharSet];
// By passing in invertedValidCharSet as the characters to trim out, you are left with only characters that are in the original set, captured here in scrubbedString.
So is this how to use this feature properly, or did I miss anything?
Thanks
Steve
A character set is just that - a set of characters. When you invert a character set you get a new set that has every character except those from the original set.
In your example you start with a character set containing the 10 standard digits. When you invert the set you get a set that has every character except the 10 digits.
validCharSet = [NSCharacterSet characterSetWithCharactersInString:validNumberChars];
This creates a character set containing the 10 characters 0, 1, ..., 9.
invertedValidCharSet = [validCharSet invertedSet];
This creates the inverted character set, i.e. the set of all Unicode characters without
the 10 characters from above.
scrubbedString = [inputString stringByTrimmingCharactersInSet:invertedValidCharSet];
This removes from the start and end of inputString all characters that are in
the invertedValidCharSet. For example, if
inputString = @"abc123d€f567ghj😄"
then
scrubbedString = @"123d€f567"
It does not, as you perhaps expect, remove all characters from the given set.
One way to achieve that is (copied from NSString - replacing characters from NSCharacterSet):
scrubbedString = [[inputString componentsSeparatedByCharactersInSet:invertedValidCharSet] componentsJoinedByString:@""];
This is probably not the most effective method, but as your question was about understanding
NSCharacterSet I hope that it helps.
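The difference between trimming and removing everywhere is not specific to Cocoa. As a rough analogue in Ruby (different API, same idea), the sketch below applies both operations to the same sample input; the character class [^0-9] plays the role of the inverted set.

```ruby
input = "abc123d€f567ghj😄"

# Trim: strip non-digits from the start and the end only,
# like stringByTrimmingCharactersInSet: with the inverted set.
trimmed = input.sub(/\A[^0-9]+/, "").sub(/[^0-9]+\z/, "")

# Scrub: delete every character that is not a digit, wherever it appears,
# like splitting on the inverted set and joining the pieces.
scrubbed = input.delete("^0-9")

p trimmed
p scrubbed
```

The trimmed string still contains the non-digits in the middle, while the scrubbed string keeps the digits only.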