Convert escaped Unicode to Unicode in Objective-C - iOS

SO,
A seemingly simple question has me stumped. I have two statements:
NSLog(@"%@", @"\U0001f1ee\U0001f1f9");
NSLog(@"%@", @"\\U0001f1ee\\U0001f1f9");
The first outputs the correct emoji (Flag). The second outputs an escaped string. What conversion do I need to do to the second string to make it output the flag as well?
In other words: I have strings of escaped Unicode that I want to print out as the proper Emoji. How would I go about doing that?
I tried converting to NSUTF8StringEncoding NSData and then back to NSString, I tried using NSNonLossyASCIIStringEncoding, no joy. I must be using them wrong...
Thanks for any help!

Easy. Use -stringByRemovingPercentEncoding.
NSString *string = @"\\U0001f1ee\\U0001f1f9";
NSLog(@"%@", [string stringByRemovingPercentEncoding]);
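As documented, percent-decoding only rewrites %XX sequences, so if the call above leaves the backslash escapes untouched, decoding them by hand is one fallback. Below is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name decode_unicode_escapes is hypothetical, and it assumes the input uses only the 8-digit \UXXXXXXXX form from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse exactly `count` hex digits from s; returns 1 on success. */
static int parse_hex(const char *s, size_t count, unsigned long *value) {
    unsigned long v = 0;
    for (size_t i = 0; i < count; i++) {
        char c = s[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A' + 10);
        else return 0; /* a terminating NUL also ends up here */
        v = (v << 4) | digit;
    }
    *value = v;
    return 1;
}

/* Write code point cp to out as UTF-8; returns the bytes written (1..4). */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: copies `in` to `out`, replacing each \UXXXXXXXX
   escape with the UTF-8 bytes of that code point. Anything that is not a
   valid escape is copied unchanged. `out` must hold at least
   strlen(in) + 1 bytes, because the output never grows. */
void decode_unicode_escapes(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in) {
        unsigned long cp;
        if (in[0] == '\\' && in[1] == 'U' && parse_hex(in + 2, 8, &cp) &&
            cp != 0 && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)) {
            out += put_utf8(cp, out);
            in += 10; /* backslash, 'U', and eight hex digits */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

From Objective-C, the input could come from the string's UTF8String and the result could go back through +stringWithUTF8String:, which is where the two regional-indicator code points would render as a flag.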

Related

How to remove hexadecimal characters from NSString

I am facing an issue with a hexadecimal value in a string: I need to remove hexadecimal characters from an NSString.
The problem is that when I print the object, it prints as a blank line. In debug mode it shows like this:
So how can I remove it from the string?
EDIT
Trimming whitespace:
The result of NSLog is:
2015-12-14 15:37:10.710 MyApp [2731:82236] tmp :''
Database:
Earlier question:
how to detect garbage string value in ios?
As your dataset clearly has garbage values, You can use this method to check if your string is valid or not. Define your validation criteria and simply don't entertain the values which are garbage. But as suggested before by gnasher, you should rather look for the bug which is causing insertion of garbage data in your database. Once you have done that, check if the input string matches your defined criteria. If it does, do what you want. If it doesn't, simply move on.
- (BOOL)isValidString:(NSString *)input
{
    NSMutableCharacterSet *validSpecialChars = [NSMutableCharacterSet characterSetWithCharactersInString:@"_~.,"]; // Add your desired characters here
    [validSpecialChars formUnionWithCharacterSet:[NSCharacterSet alphanumericCharacterSet]];
    return [[input stringByTrimmingCharactersInSet:validSpecialChars] isEqualToString:@""];
}
If your string will contain only your defined characters, it will return true. If it contains any other characters (garbage or invalid) it will return false.
I'm not sure exactly what you are looking for, but if you want to remove all the control characters then
string = [[string componentsSeparatedByCharactersInSet:[NSCharacterSet controlCharacterSet]] componentsJoinedByString:@""];
If you need to be faster and are sure the control characters are only at the beginning and ending of a string then
string = [string stringByTrimmingCharactersInSet:[NSCharacterSet controlCharacterSet]];
NOTE: Removing all control characters will remove all new lines (\n)!
From NSCharacterSet Class Reference:
These characters are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.
The value you are having a problem with is \x06 which is U+0006.
If you want to remove just \x06, then you can always create a character set just for it.
NSCharacterSet *hex6 = [NSCharacterSet characterSetWithCharactersInString:@"\x06"];
string = [[string componentsSeparatedByCharactersInSet:hex6] componentsJoinedByString:@""];
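Because removing every control character also removes newlines, a filter that keeps selected ones can be useful. The following is a minimal sketch in plain C of that idea. The name strip_ascii_controls is hypothetical, and keeping '\n' and '\t' is an assumption about what the data should preserve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes ASCII control bytes (0x01-0x1F and 0x7F)
   from s in place, but keeps '\n' and '\t'. Returns the new length.
   Bytes >= 0x80 belong to UTF-8 multi-byte sequences and are left alone,
   so the C1 range U+0080-U+009F is NOT handled by this sketch. */
size_t strip_ascii_controls(char *s) {
    assert(s != NULL);
    size_t w = 0; /* write index */
    for (size_t r = 0; s[r] != '\0'; r++) {
        unsigned char c = (unsigned char)s[r];
        int is_control = (c < 0x20 || c == 0x7F);
        if (is_control && c != '\n' && c != '\t') {
            continue; /* drop this byte */
        }
        s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```

Applied to a buffer holding \x06 followed by "abc" and a newline, this would drop the \x06 byte and keep the newline.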
First, don't trust the Xcode debugger. Print characterAtIndex:0 to be sure that you really have what you think you have.
Second, deleting stuff is all good and well, but you are doctoring around with a symptom. You should really try to figure out where the contents of _lastUpdatedBy comes from and why it is what it is. You might have a serious bug here and trying to cover it up. For example, there might be a bug that stores rubbish data instead of the correct data, and you are just covering up for that bug.

How to show or convert (non-standard) \U0099 for Trademark "TM" Character?

I've run into an issue displaying the trademark "TM" character in my UILabel.
The "TM" character that fails to show up is \U0099 instead of the usual \U2122.
I dug a little deeper and found out that the "TM" character \U0099 belongs to only a few Chinese fonts.
So I'm guessing iOS doesn't have the font to show it in labels or does not recognize it at all.
I've tried to scan my data for "\U0099" and string-replace it with \U2122, but it seems NSString functions escape Unicode characters automatically, so this "TM" character won't even be there.
Has anyone encountered this issue before or can give me suggestions as to how to deal with this \U0099 character?
Thanks in advance
It is unclear to me how you've obtained your NSString or what you have actually tried to solve your problem. So this suggestion might be completely unsuitable, but let's see if it helps...
U+0099 is an unassigned Unicode control character, it is not a TM symbol. It is fairly hard to get this character into an NSString as Clang at least objects if you place the escape into a literal, and Cocoa fails to translate a sequence of bytes in UTF-8 into an NSString if it contains it. This problem might be what is behind your comment that you could not string replace it.
However starting with UTF-16, I did manage to create a string with U+0099 in it:
unichar b[] = { 0x61, 0x62, 0x63, 0x99, 0x64, 0x65, 0x66 };
NSString *s = [[NSString alloc] initWithBytes:b length:14 encoding:NSUTF16LittleEndianStringEncoding];
That is the string "abc\U0099def" (calling characterAtIndex:3 will show you this).
Using the same approach an NSString with just U+0099 in it can be generated:
unichar notTMChar = 0x99;
NSString *notTMStr = [[NSString alloc] initWithBytes:&notTMChar length:2 encoding:NSUTF16LittleEndianStringEncoding];
and that can be used in a string replace call:
NSString *t = [s stringByReplacingOccurrencesOfString:notTMStr withString:@"™"];
giving t the value "abc™def" as required.
Warning: We are dealing with an unassigned Unicode control character here. Clang/Cocoa rejected it in UTF-8, it is probably unintentional that it accepted it in UTF-16. Using C library functions to do this is probably more reliable. Xcode 5.1.1 with Clang 5.1 was used for the tests.
HTH
Thanks for the suggestions.
I've talked to my clients and they agreed that \u0099 shouldn't be there.
I have also implemented rmaddy's suggestion to replace instances of \u0099 with \u2122.
NSString *problemString = dictionaryWithU099AsValue.description;
if ([problemString rangeOfString:@"0099"].location != NSNotFound) {
    NSString *fixedDescriptionString = [problemString stringByReplacingOccurrencesOfString:@"U0099" withString:@"U2122"];
    // Then I reconstruct the NSString back to a new NSDictionary
}
Note that the trademark symbol ™ appears as hex 99 in Code Page 1252 (a common Windows character set).
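If the data is available as raw UTF-8 bytes before it becomes an NSString, the same replacement can be sketched at the byte level. This is a minimal sketch in plain C under that assumption. The name fix_cp1252_trademark is hypothetical. It relies on U+0099 being encoded in UTF-8 as the bytes C2 99 and U+2122 as E2 84 A2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing every U+0099
   (UTF-8 bytes C2 99) with U+2122 TRADE MARK SIGN (UTF-8 bytes E2 84 A2).
   Each replacement grows the text by one byte, so the caller passes the
   capacity of `out`. Returns 0 on success, -1 if `out` is too small. */
int fix_cp1252_trademark(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    size_t w = 0; /* write index; always stays below out_size */
    while (*in) {
        const unsigned char *u = (const unsigned char *)in;
        if (u[0] == 0xC2 && u[1] == 0x99) {
            if (w + 3 >= out_size) return -1; /* keep room for the NUL */
            memcpy(out + w, "\xE2\x84\xA2", 3);
            w += 3;
            in += 2;
        } else {
            if (w + 1 >= out_size) return -1;
            out[w++] = *in++;
        }
    }
    out[w] = '\0';
    return 0;
}
```

Working on bytes sidesteps the question of whether the string classes accept the stray control character at all, at the cost of handling buffer sizes manually.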

NSString stringWithFormat not working with special chars like currency symbols

I have been having a lot of trouble with NSString's stringWithFormat: method as of late. I have written an object that allows you to align N lines of text (separated by new lines) either centered, right, or left. At the core of my logic is NSString's stringWithFormat. I use this function to pad my strings with spaces on the left or right of individual lines to produce the alignment I want. Here is an example:
NSString *str = @"$3.00" --> 3 dollars
[NSString stringWithFormat:@"%8s", [str cStringUsingEncoding:NSUnicodeStringEncoding]] --> returns --> "   $3.00"
As you can see the above example works great, I padded 3 spaces on the left and the resulting text is right aligned/justified. Problems begin to arise when I start to pass in foreign currency symbols, the formatting just straight up does not work. It either adds extra symbols or just returns garbage.
NSString *str = @"Kč1.00" --> 1 Czech koruna (the Czech Republic's currency)
[NSString stringWithFormat:@"%8s", [str cStringUsingEncoding:NSUnicodeStringEncoding]] --> returns --> " Kč1.00"
The above is just flat out wrong... Now I am not a string encoding expert but I do know NSString uses the international standardized unicode encoding for special characters well outside basic ASCII domain.
How can I fix my problem? What encoding should I use? I have tried so many different encoding enums I have lost count, everything from NSMACOSRomanEncoding to NSUTF32UnicodeBigEndian.. My last resort will be to just completely ditch using stringWithFormat all together, maybe it was only meant for simple UTF8Strings and basic symbols.
If you want to represent currency, it is a lot better to use an NSNumberFormatter with the currency style (NSNumberFormatterCurrencyStyle). It reads the current locale and shows the currency based on it. You just need to ask for its string representation and append it to a string.
It will be a lot easier than managing unicode formats, check a tutorial here
This will give you the required result
NSString *str = @"Kč1.00";
str = [NSString stringWithFormat:@"%@%8@", [@" " stringByPaddingToLength:3 withString:@" " startingAtIndex:0], str];
Output: @" Kč1.00";
Just one more trick to achieve this -
If you like use it :)
[NSString stringWithFormat:@"%8s%@", [@"" cStringUsingEncoding:NSUTF8StringEncoding], str];
This will work too.
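One likely reason the %8s width misbehaves is that C-style %s padding counts bytes, while a character such as č takes two bytes in UTF-8. The sketch below, in plain C, pads by code points instead. The names utf8_length and pad_left_utf8 are hypothetical, and the sketch assumes valid UTF-8 input and treats each code point as one column, which does not hold for combining marks or wide characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx). */
size_t utf8_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: right-aligns s in a field of `width` code points
   by writing leading spaces followed by s into out.
   Returns 0 on success, -1 if out_size is too small. */
int pad_left_utf8(const char *s, size_t width, char *out, size_t out_size) {
    assert(s != NULL && out != NULL);
    size_t chars = utf8_length(s);
    size_t pad = chars < width ? width - chars : 0;
    size_t bytes = strlen(s);
    if (pad + bytes + 1 > out_size) {
        return -1;
    }
    memset(out, ' ', pad);
    memcpy(out + pad, s, bytes + 1); /* copies the terminating NUL too */
    return 0;
}
```

With a width of 8, the six-character string Kč1.00 would get two leading spaces even though it occupies seven bytes, and $3.00 would get three.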

iOS correct comparison of file paths

I have an issue in an application I'm writing where I need to compare one NSURL that points to a file and an NSString, which is an incoming string representation of the same file path.
I can't get them to compare – the output I'm given when NSLogging is confusing, perhaps it is an encoding issue?
I can make them look the same with this code: [urlString stringByRemovingPercentEncoding];
The raw output for the NSURL is:
file:///var/mobile/Applications/F14AFBD8-FF60-4094-8BBD-7AC2477E0B20/Documents/1.%20AKTIV%20SA%CC%88LJFOLDER/Sa%CC%88ljfolder2014-SP1.pdf
And for the NSString:
/var/mobile/Applications/F14AFBD8-FF60-4094-8BBD-7AC2477E0B20/Documents/1. AKTIV SÄLJFOLDER/Säljfolder2014-SP1.pdf
If I run stringByRemovingPercentEncoding on the NSURL it looks the same, but they don't compare.
If I run stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding to the NSString I get file:///var/mobile/Applications/F14AFBD8-FF60-4094-8BBD-7AC2477E0B20/Documents/nestle/1.%20AKTIV%20S%C3%84LJFOLDER/S%C3%A4ljfolder2014-SP1.pdf
Note that the percent escapes are not the same in the two URLs. I have tried so many things, changing encodings etc., but can't find a way to solve this.
Edit
So, I tried the precomposedStringWithCanonicalMapping as follows:
NSLog(@"EQUAL? :%hhd", [[strippedUrlString precomposedStringWithCanonicalMapping] isEqualToString:[filePath precomposedStringWithCanonicalMapping]]); – returns 0
I logged the strings and got
/Users/xxxxxx/Library/Application Support/iPhone Simulator/7.0/Applications/C05E0885-7B58-4B2F-A6B4-D9388E60462C/Documents/1. AKTIV SÄLJFOLDER/Säljfolder2014-SP1.pdf
with NSLog(@"Precompose url 1: %@", [strippedUrlString precomposedStringWithCanonicalMapping]);
for the first string and
/Users/xxxxxx/Library/Application%20Support/iPhone%20Simulator/7.0/Applications/C05E0885-7B58-4B2F-A6B4-D9388E60462C/Documents/1.%20AKTIV%20SA%CC%88LJFOLDER/Sa%CC%88ljfolder2014-SP1.pdf
with NSLog(@"Precompose file 1: %@", [filePath precomposedStringWithCanonicalMapping]);
for the second.
Tried same code, but with precomposedStringWithCompatibilityMapping and got exactly the same result :(
You have probably run into the problem that strings which are equivalent in Unicode are not always binary equal.
http://en.wikipedia.org/wiki/Unicode_equivalence
You have
…SA%CC%88…:
This is the problem.
It means: We have an "A" and a combining diaeresis -> Ä. The diaeresis is 0xCC88, which is UTF-8 for Unicode U+0308 (COMBINING DIAERESIS). So the Ä is encoded as an A with a combining diaeresis.
…S%C3%84…:
This is easy. 0xC384 is UTF-8 for U+00C4, which is A-umlaut -> Ä
First of all: What is the source of the first string?
Addition: You can use precomposedStringWith…Mapping (NSString).
BTW: You can compare strings without diacritic marks using -compare:withOptions: et al. with the option NSDiacriticInsensitiveSearch. In this case, I assume, string 1 equals string 2. But it would equal an "A", too, which is probably not what you want.
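The byte-level difference described above can be made visible by percent-decoding both spellings and comparing the results. This is a minimal sketch in plain C. The name percent_decode is hypothetical, and the two inputs are the fragments quoted in the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes %XX sequences from `in` into `out`.
   Malformed sequences are copied unchanged. `out` must hold at least
   strlen(in) + 1 bytes, because decoding never grows the text. */
void percent_decode(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    while (*in) {
        /* Only look at in[2] once in[1] is known to be a hex digit,
           so the terminating NUL is never read past. */
        int hi = (in[0] == '%') ? hex_value(in[1]) : -1;
        int lo = (hi >= 0) ? hex_value(in[2]) : -1;
        if (lo >= 0) {
            *out++ = (char)(hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Decoding SA%CC%88 gives the bytes S, A, CC, 88, while S%C3%84 gives S, C3, 84, so strcmp reports them as different even though both display as SÄ. Both sides therefore need the same normalization form before comparison. Note also that the second string logged in the question still appears percent-encoded, so it would have to be decoded before normalizing.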

iOS stringByAddingPercentEscapesUsingEncoding does not convert correctly

I have some content like 2ofsjw0234lnc.jpg\t2m03fcsmaokwf.jpg\n that I want to encode as a URL parameter, so I use the code below.
NSString *attachsString = [_attachments stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
The output is 2ofsjw0234lnc.jpg%5ct2m03fcsmaokwf.jpg%5cn
That is not what I wanted, which is 2ofsjw0234lnc.jpg%092m03fcsmaokwf.jpg%0d%0a. The escape characters are not converted as I wanted; only the "\" has been converted.
Could you give me some advice?
If you put \t in a string literal, then at compile time, this gets converted to a tab. Then stringByAddingPercentEscapesUsingEncoding: would convert the tab to %09. However, if at runtime, the string has the two characters \t, then this is not treated any different than any other two characters. In this case, stringByAddingPercentEscapesUsingEncoding: simply sees the backslash and converts it to %5c.
If you want to convert \t or \n strings (not characters) at runtime, then you should use:
// note the double backslash for the tab
string = [string stringByReplacingOccurrencesOfString:@"\\t" withString:@"%09"];
Repeat for the \n.
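The same runtime replacement can be sketched in plain C for both sequences in one pass. The name encode_backslash_escapes is hypothetical, and mapping the two characters backslash-n to %0A (rather than the %0d%0a the question mentions) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing the two-character
   sequences backslash-t and backslash-n with "%09" and "%0A".
   Each replacement grows the text by one byte, so the caller passes the
   capacity of `out`. Returns 0 on success, -1 if `out` is too small. */
int encode_backslash_escapes(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    size_t w = 0; /* write index; always stays below out_size */
    while (*in) {
        const char *rep = NULL;
        if (in[0] == '\\' && in[1] == 't') {
            rep = "%09";
        } else if (in[0] == '\\' && in[1] == 'n') {
            rep = "%0A";
        }
        if (rep != NULL) {
            if (w + 3 >= out_size) return -1; /* keep room for the NUL */
            memcpy(out + w, rep, 3);
            w += 3;
            in += 2;
        } else {
            if (w + 1 >= out_size) return -1;
            out[w++] = *in++;
        }
    }
    out[w] = '\0';
    return 0;
}
```

Real tab or newline characters are left untouched by this sketch, since percent-escaping already handles those.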
