+ (const char /*wchar_t*/ *)wcharFromString:(NSString *)string
{
    return [string cStringUsingEncoding:NSUTF8StringEncoding];
}
Does it return char or wchar_t?
From the method name it should return wchar_t, so why is the wchar_t in the return type commented out?
Source is here:
How to convert wchar_t to NSString?
That code just looks incorrect. They're claiming it does one thing, but it actually does another. The return type is const char *.
This method is not correct. It returns a const char *, encoded as a UTF8 string. That is a perfectly sensible way of getting a C string from an NSString, but nowhere here is anyone actually doing anything with wchar_ts.
wchar_t is a "wide char", and a pointer to it would be a "wide string" (represented by const wchar_t *). These are designed to represent larger character sets precisely; each element is wider than a byte (two bytes on Windows, four bytes on Apple platforms), and wide strings use a whole different variant set of string-manipulation functions. (Strings like this are very rarely seen in iOS development, for what it's worth.)
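To make the distinction concrete, here is a minimal sketch (not the original poster's code; WideDataFromString is a hypothetical helper name) of how you could get an actual wide-character buffer out of an NSString. It assumes an Apple platform, where wchar_t is four bytes, so little-endian UTF-32 matches the in-memory layout on current devices.
#import <Foundation/Foundation.h>
// Hypothetical helper: returns the string's contents as UTF-32 code units, which on
// Apple platforms can be read through a const wchar_t * for as long as the NSData lives.
static NSData *WideDataFromString(NSString *string)
{
    // Note: no terminating NUL is appended; track the length separately.
    return [string dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
}
// Usage:
// NSData *wide = WideDataFromString(@"héllo");
// const wchar_t *wstr = (const wchar_t *)wide.bytes;
// NSUInteger count = wide.length / sizeof(wchar_t);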
While experimenting with Zig syntax, I noticed the type expression of string literals is omitted in all examples. Which is totally fine, I'm not saying it shouldn't be.
const zig_string = "I am a string"; //it looks nice enough for sure and compiles fine of course
However, because this type omission is a bit inconsistent* with other type declarations in Zig, it can lead beginners (like me) to misinterpret the actual type of string literals (which is in fact, quite rightfully, complicated and 'different'). Anyway, I read that the type of a string literal is a 'pointer to a (UTF-8 encoded) immutable (const), sentinel-terminated array of u8 bytes' (yes?), where a terminator field sits next to the hard-coded length field, like so: [<length>:0]. To check my own understanding, I thought it reasonable to try adding this type expression to the declaration, similar to how other arrays are conveniently declared, i.e. with an underscore to infer the length, because who likes counting characters?
const string: *const [_:0]u8 = "jolly good"; //doesn't compile: unable to infer array size
But it didn't compile :(.
After dutifully counting characters and now specifying the length of my string however, it proudly compiled :)!
const string: *const [10:0]u8 = "jolly good"; //happily compiles
Which led me to my question:
Why is this length specification needed for string literals and not for other literals/arrays? (And should it be so?)
Please correct my type description of string literals if I missed an important nuance.
I'd like to know, to further deepen my understanding of the way strings are handled in Zig.
*although there are more cases where the zig compiler can infer the type without it
Types never have _ in them.
"jolly good" is a string literal. *const [10:0]u8 is the type.
For "other literals/arrays":
const a = [_]u8{ 1, 2, 3 };
[_]u8{ 1, 2, 3 } is an array literal. The type is [3]u8 and it cannot be specified as [_]u8.
Look into slices. They offer a very convenient way to use strings and arrays.
I have a string that includes some special characters (like é, â, î, ı, etc.). When I use substring on this string, I get inconsistent results: some of the special characters change uncontrollably.
You are assuming that these are all characters:
[newword substringWithRange:NSMakeRange(0,1)];
[newword substringWithRange:NSMakeRange(1,1)];
[newword substringWithRange:NSMakeRange(2,1)];
[newword substringWithRange:NSMakeRange(3,1)];
// and so on...
In other words, you believe that:
A location always falls at the start of a character.
A character always has length 1.
Both assumptions are wrong. Please read the Characters and Grapheme Clusters chapter of Apple's String Programming Guide (here).
Your é happens to have length 2, because it is a base letter e followed by a combining diacritical accent. If you want it to have length 1, you need to normalize the string before you use it. Call precomposedStringWithCanonicalMapping and use the resulting string.
Example and proof (in Swift, but it won't matter, as I use NSString throughout):
let s = "é,â,î,ı" as NSString
let c = s.substring(with: NSRange(location: 0, length: 1)) // e
let s2 = s.precomposedStringWithCanonicalMapping as NSString
let c2 = s2.substring(with: NSRange(location: 0, length: 1)) // é
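For completeness, here is a small Objective-C sketch of the same idea plus a more general fix: instead of assuming each character occupies one UTF-16 unit, ask NSString for the range of the composed character sequence (grapheme cluster) at an index. The literal below is a made-up example string.
NSString *word = @"e\u0301,a\u0302";   // "é,â" written in decomposed form
// Grapheme-aware indexing works even without normalizing first:
NSRange first = [word rangeOfComposedCharacterSequenceAtIndex:0];
NSString *firstChar = [word substringWithRange:first];   // "é" (two UTF-16 units here)
// Or normalize, as the answer suggests, so the accented letter becomes one UTF-16 unit:
NSString *norm = [word precomposedStringWithCanonicalMapping];
NSString *firstNorm = [norm substringWithRange:NSMakeRange(0, 1)];   // also "é"
NSLog(@"%@ %@ (%lu units vs 1 unit)", firstChar, firstNorm, (unsigned long)first.length);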
You're treating a Unicode string like a sequence of bytes. Unicode code points outside the ASCII range take more than one byte in UTF-8, so you are changing the text by stripping out the parts responsible for the accent above the letter, like this one: https://www.compart.com/en/unicode/U+0301
UTF8 is variable-width, so by treating it as raw bytes you may get weird results. I would suggest using something more Unicode-aware, like ICU (International Components for Unicode).
Now imagine you have a two-byte sequence like this (this may not be 100% accurate, but it illustrates my point):
0x65 0x00
e    NUL
Now you have a UTF8 string with 1 codepoint and a null terminator. Now say you want to add an accent to that e. How would you do that? You could use a special unicode codepoint to modify the e, so now the string is:
0x65 0xCC 0x81 0x00
e    U+0301    NUL
Where U+0301 (Combining Acute Accent) is encoded as a two-byte sequence in UTF-8 and makes the e accented.
Edit: The answer assumes UTF-8 encoding, which is likely a bad assumption, but whether the encoding is UTF-8, UTF-16, or anything else with combining characters, I think it illustrates why you may have mysteriously disappearing accents. While this may well be UTF-16, for the sake of simplicity let's pretend we live in a world where life is just slightly better because everyone only uses UTF-8 and UTF-16 doesn't exist.
To address the comment (this is less to do with the question, but it is some fun trivia) and to give some fun details about the NS/CF/Swift runtimes, bridging, constant CF strings, and other fun stuff like that: the representation of the actual string in memory is implementation-defined and can vary (even for constant strings; trust me, I know, I fixed the ELF implementation of them in Clang for CoreFoundation a few days ago). Anyway, here's some code:
CF_INLINE CFStringEncoding __CFStringGetSystemEncoding(void) {
    if (__CFDefaultSystemEncoding == kCFStringEncodingInvalidId) (void)CFStringGetSystemEncoding();
    return __CFDefaultSystemEncoding;
}

CFStringEncoding CFStringFileSystemEncoding(void) {
    if (__CFDefaultFileSystemEncoding == kCFStringEncodingInvalidId) {
#if DEPLOYMENT_TARGET_MACOSX || DEPLOYMENT_TARGET_EMBEDDED || DEPLOYMENT_TARGET_EMBEDDED_MINI || DEPLOYMENT_TARGET_WINDOWS
        __CFDefaultFileSystemEncoding = kCFStringEncodingUTF8;
#else
        __CFDefaultFileSystemEncoding = CFStringGetSystemEncoding();
#endif
    }
    return __CFDefaultFileSystemEncoding;
}
This pattern shows up throughout CoreFoundation/Foundation/SwiftFoundation. (Yes, you never know what sort of NSString you're actually holding; they usually pretend to be the same thing, but under the hood, depending on how you got the object, you may be holding onto one of three variations of it.)
This is why code like this exists: NS/CF(Constant)/Swift strings have an implementation-defined internal representation.
if (((encoding & 0x0FFF) == kCFStringEncodingUnicode) && ((encoding == kCFStringEncodingUnicode) || ((encoding > kCFStringEncodingUTF8) && (encoding <= kCFStringEncodingUTF32LE)))) {
If you want consistent behavior you have to encode the string using a specific fixed encoding instead of relying on the internal representation.
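A hedged sketch of that last point, using the sample text from the question: pin the bytes down with an explicit encoding and operate only on those bytes.
NSString *s = @"é,â,î,ı";
NSData *utf8 = [s dataUsingEncoding:NSUTF8StringEncoding];   // explicit, fixed encoding
const uint8_t *bytes = utf8.bytes;
// ... work with bytes / utf8.length here, never with the NSString's internal storage ...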
I receive an NSDictionary using AFNetworking. That dictionary has many values.
One of these, I think, is an int or an NSNumber; I don't know what the automatic conversion is. The numbers expected are 2 and 3, but somehow I always get a very long number... I tried all the things I know and could find, but I can't get it to show the right number in the
cell.numberOfLikes.text
These are all the things I've tried, with no success. I would appreciate some guidance:
NSLog(#"%i",_myDownloadedInfo[#"routines"][indexPath.row][#"routine"][#"routineDownloads"]);
NSLog(#"%i",[NSNumber numberWithInt:_myDownloadedInfo[#"routines"][indexPath.row][#"routine"][#"routineDownloads"]]);
cell.numberOfDownloads.text = [NSString stringWithFormat:#"%i",_myDownloadedInfo[#"routines"][indexPath.row][#"routine"][#"routineDownloads"]];
cell.numberOfLikes.text =_myDownloadedInfo[#"routines"][indexPath.row][#"routine"][#"routineDownloads"]);
The %i format specifier is for basic integer data types. Since you have (or are trying to create) an NSNumber object, you can use %@ to log it. %@ is the format specifier for Objective-C objects.
But that doesn't help when you want to assign the NSNumber to your text field. You need to convert the number to an NSString. Ideally you should use an NSNumberFormatter to convert the NSNumber to an NSString.
Do this:
NSNumber *likesNumber = _myDownloadedInfo[@"routines"][indexPath.row][@"routine"][@"routineDownloads"];
NSString *numberText = [NSString stringWithFormat:@"%d", [likesNumber intValue]];
NSLog(@"Number: %@", numberText);
cell.numberOfLikes.text = numberText;
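If you want the NSNumberFormatter route mentioned above, a hedged sketch might look like this (the key path into _myDownloadedInfo is taken from the question and assumed to yield an NSNumber):
NSNumber *likes = _myDownloadedInfo[@"routines"][indexPath.row][@"routine"][@"routineDownloads"];
NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
formatter.numberStyle = NSNumberFormatterDecimalStyle;   // locale-aware number formatting
cell.numberOfLikes.text = [formatter stringFromNumber:likes];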
Several points.
First, the first argument of NSLog is a format string. It is entirely equivalent to the format string used in [NSString stringWithFormat:], and there's rarely a need to use both together.
Next, data in Objective-C can be individual int values, char values, char* values (which are C strings), float values, and Objective-C object values. (Plus a few assorted long, unsigned, etc values that you rarely use.)
An NSString is an Objective-C object, as is an NSNumber. An NSInteger, on the other hand, is an alias for a particular size of int and generally interchangeable with int. Very often people get confused with NSNumber (object) vs NSInteger (scalar int) -- they are not the same.
In an NSLog or stringWithFormat format string you use % followed by one or more characters to indicate how a value (in the following list of values) is to be formatted. %d or %i is an int. (%d is preferred.) %f is a simple floating-point value (but the formatting of these can get more complicated). %s is a C-style string (i.e., a char * value). %@ is used for formatting an Objective-C object. You use %@, in particular, to format an NSString or (to get the default presentation) an NSNumber.
It's important to understand how %@ works in a format string. When the format interpreter encounters it, it interprets the next item in the value list as a pointer to an Objective-C object and invokes the description method of that object. An NSString just returns itself as the result of description, while an NSNumber returns an NSString that represents the character representation of the number's value. Other objects (all Objective-C objects support description, if only by default) return either a simple identification of the object type or a representation of the object's internal values.
So, if you have an NSNumber, you can directly format it with %@, or you can extract the numeric value (eg, [someNSNumber intValue]) and then format that value appropriately (eg, %d). A reason for doing the latter would be if you wanted to specify the column width for the number (eg, %5d, to format the number into a 5-position field).
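To make the %@ vs. intValue distinction concrete, a quick sketch:
NSNumber *n = @42;
NSLog(@"%@", n);              // prints "42" via the NSNumber's description
NSLog(@"%d", [n intValue]);   // prints "42" as a plain int
NSLog(@"%5d", [n intValue]);  // prints "   42", right-aligned in a 5-column field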
See here for some (gory) details on the various format specifiers.
Hint: %i is the conversion specifier for int.
One of these, I think, is an int or an NSNumber; I don't know what the automatic conversion is
Now go back and read the documentation of NSArray, think about whether or not it can contain ints, then proceed reading.
So the objects in the dictionary are instances of NSNumber -> they're Objective-C objects -> they are implemented using pointers. Poor NSLog() tries to convert the pointer value to an integer. Either use %@ to get the description of the number, or %i and [theNumber intValue] to print its integer value.
I'm using CFStringTokenizer to break a load of text into words, but I'm having difficulty bridging between whatever encoding CFString uses and UTF-8. Consider this:
NSString *theString = @"Lorem ipsum dolor sit amet!";
const char *theCString = [theString cStringUsingEncoding:NSUTF8StringEncoding];

tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault,
                                    (__bridge CFStringRef)theString,
                                    CFRangeMake(0, [theString length]),
                                    kCFStringTokenizerUnitWordBoundary,
                                    locale);

while ((tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)) != kCFStringTokenizerTokenNone) {
    tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
    memcpy(resultPtr, theCString + tokenRange.location, tokenRange.length);
}
Unfortunately the range reported by the tokenizer is incorrect when trying to read from the C string if any non-ASCII characters have been encountered. How can I go about getting the correct range from the tokenizer to be able to pull the correct chars from my C string?
To clarify, the memcpy stuff is a tad more complex than above, and is necessary for performance on my target device, the iPhone. So I can't even do anything like create a CFString substring and convert that; I need the range in the C string. Is there any way to do that without reimplementing various word-boundary libraries to get it working for the various different locales I need it to work with? (Which is as many as possible, so I can't just iterate through looking for ' ', unfortunately.)
Alec
NSStrings and CFStrings deal in UTF-16, not UTF-8, but that isn't the real problem.
Your code has two problems:
You're assuming that the C string's indexes correspond to the source string's indexes.
You're copying and converting the entire string to a UTF-8 C string at once.
#1 is the cause of the range mismatches, and #2 causes potentially high memory usage, depending on the length and content of the string. (UTF-8 can take as many as four bytes per character in some alphabets—and then add one for the C string terminator.)
You can solve both of these problems in a single change.
Create an NSMutableData to hold the output. For each token, set the data's length to an upper bound on the token's UTF-8 size (three bytes per UTF-16 unit in the range is always enough); then tell the string to get the bytes within the desired range, in the desired encoding, and store them in the data's mutableBytes buffer. NSString has a method with a very long selector (briefly, getBytes:::::::) that you will want to use for this.
Since you use the range that is relative to the string exclusively with the string, there is no index/range mismatch, and each token will be output correctly.
If you really need a C string, you can set the data's length to the range's length + 1, then set the last byte to '\0' with a separate assignment after getting the token bytes. (Without the separate assignment, the byte may hold a previous value.)
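A hedged sketch of that approach (variable names such as theString and tokenRange follow the question's code; the long selector is getBytes:maxLength:usedLength:encoding:options:range:remainingRange:):
// tokenRange is in UTF-16 units, so it is only ever used against theString itself,
// never against a UTF-8 buffer.
NSRange range = NSMakeRange(tokenRange.location, tokenRange.length);
NSUInteger maxBytes = range.length * 3;   // a UTF-16 unit never needs more than 3 UTF-8 bytes
NSMutableData *tokenData = [NSMutableData dataWithLength:maxBytes + 1];
NSUInteger usedLength = 0;

[theString getBytes:tokenData.mutableBytes
          maxLength:maxBytes
         usedLength:&usedLength
           encoding:NSUTF8StringEncoding
            options:0
              range:range
     remainingRange:NULL];
((char *)tokenData.mutableBytes)[usedLength] = '\0';   // C-string terminator, as described above
// tokenData.mutableBytes now holds the UTF-8 bytes of just this token.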
How to convert TCHAR[] to char[]?
Honestly, I don't know how to do it with arrays, but with pointers Microsoft's C runtime provides us with some APIs, such as wctomb and wcstombs (the first converts a single wide character, the second a whole string, so the second is the one you want here). Since arrays decay to pointers when passed to these functions, you can do what you want to achieve like this:
// ... your includes
#include <stdlib.h>
#include <tchar.h>
// ... your defines
#define MAX_LEN 100
// ... your code
// This assumes a Unicode build, where TCHAR is wchar_t.
// I assume there is no TCHAR array defined to be converted so far, so I'll create one
TCHAR c_wText[MAX_LEN] = _T("Hello world!");
// Now define the char array to act as the destination buffer for wcstombs
char c_szText[MAX_LEN];
wcstombs(c_szText, c_wText, MAX_LEN);   // limit by the destination size, not the source length
// ... and you're free to use your char array, c_szText
PS: It may not be the best solution, but at least it's working and functional.
TCHAR is a Microsoft-specific typedef for either char or wchar_t (a wide character).
Conversion to char depends on which of these it actually is. If TCHAR is actually a char, then you can do a simple cast, but if it is truly a wchar_t, you'll need a routine to convert between character sets. See the function WideCharToMultiByte().
Why not just use wcstombs_s ?
Here is the code to show how simple it is.
#define MAX_LENGTH 500
...
TCHAR szWideString[MAX_LENGTH];
char szString[MAX_LENGTH];
size_t nNumCharConverted;
wcstombs_s(&nNumCharConverted, szString, MAX_LENGTH,
           szWideString, MAX_LENGTH);
It depends on the character set (Unicode or ANSI, i.e. wchar_t or char). If you are building for ANSI, TCHAR is simply char and no conversion is needed, but for Unicode you have to convert from wchar_t to char; you can use WideCharToMultiByte.