Convert TCHAR array to char array - character-encoding

How do I convert a TCHAR[] to a char[]?

Honestly, I don't know how to do it with arrays directly, but for pointers Microsoft provides us with some APIs, such as wctomb and wcstombs: the first converts a single wide character, the second a whole string. Since an array decays to a pointer when passed to a function, you can do what you want to achieve like this:
// ... your includes
#include <stdlib.h> // wcstombs
#include <tchar.h>  // TCHAR, _T() -- this approach assumes UNICODE is defined
// ... your defines
#define MAX_LEN 100
// ... your code
// I assume no TCHAR array has been defined so far, so I'll create one
TCHAR c_wText[MAX_LEN] = _T("Hello world!");
// Now define a char buffer for wcstombs to fill
char c_szText[MAX_LEN];
wcstombs(c_szText, c_wText, MAX_LEN); // third argument is the size of the destination buffer
// ... and you're free to use your char array, c_szText
PS: It may not be the best solution, but at least it works.

TCHAR is a Microsoft-specific typedef for either char or wchar_t (a wide character).
Conversion to char depends on which of these it actually is. If TCHAR is actually a char, then you can do a simple cast, but if it is truly a wchar_t, you'll need a routine to convert between character sets. See the function WideCharToMultiByte() (MultiByteToWideChar() is its counterpart for the opposite direction).
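A minimal sketch of that branching, assuming a Windows build (the helper name and buffer handling are illustrative, not a fixed API):
#include <windows.h>
#include <string.h>
#include <tchar.h>
/* Illustrative sketch: convert a TCHAR string into a caller-supplied
   char buffer. Returns the number of bytes written, or 0 on failure. */
int tchar_to_char(const TCHAR *src, char *dst, int dst_size)
{
#ifdef UNICODE
    /* TCHAR is wchar_t: a real character-set conversion is needed. */
    return WideCharToMultiByte(CP_ACP, 0, src, -1, dst, dst_size, NULL, NULL);
#else
    /* TCHAR is already char: a plain bounded copy is enough. */
    size_t len = strlen(src);
    if ((int)len + 1 > dst_size)
        return 0;
    memcpy(dst, src, len + 1);
    return (int)len + 1;
#endif
}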

Why not just use wcstombs_s?
Here is the code to show how simple it is.
#define MAX_LENGTH 500
...
TCHAR szWideString[MAX_LENGTH]; // assumes a Unicode build, where TCHAR is wchar_t
char szString[MAX_LENGTH];
size_t nNumCharConverted;
wcstombs_s(&nNumCharConverted, szString, MAX_LENGTH,
           szWideString, MAX_LENGTH);

It depends on the character set: in an ANSI build, TCHAR is simply char and no conversion is needed; in a Unicode build it is wchar_t, and you have to convert it to char, e.g. with WideCharToMultiByte().
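For the Unicode case, a minimal sketch of the usual two-call WideCharToMultiByte pattern (the helper name is illustrative): the first call only measures, the second converts.
#include <windows.h>
#include <stdlib.h>
/* Illustrative sketch: returns a malloc'd UTF-8 copy of a wide string,
   or NULL on failure. The caller must free() the result. */
char *narrow_copy(const wchar_t *wide)
{
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (bytes == 0)
        return NULL;
    char *out = malloc(bytes);
    if (out != NULL &&
        WideCharToMultiByte(CP_UTF8, 0, wide, -1, out, bytes, NULL, NULL) == 0) {
        free(out);
        out = NULL;
    }
    return out;
}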

Related

Should const string: *const [_:0]u8 = "infer size"; be compilable in Zig?

While experimenting with the Zig syntax, I noticed the type expression of string literals is omitted in all examples. Which is totally fine, I'm not saying it shouldn't be.
const zig_string = "I am a string"; //it looks nice enough for sure and compiles fine, of course
However, because this type omission is a bit inconsistent* with other type declarations in Zig, it can lead beginners (like me) to misinterpret the actual type of string literals (which is in fact quite complicated and 'different'). Anyway, I read that string literals are 'pointers to (UTF-8 encoded) immutable (const), sentinel-terminated arrays of u8 bytes' (yes?), whose type carries, next to the hard-coded length, a terminator field, like so: [<length>:0]. To check my own understanding, I thought it reasonable to add this type expression to the declaration, similar to how other arrays are conveniently declared: with an underscore to infer the length, because who likes counting characters?
const string: *const [_:0]u8 = "jolly good"; //doesn't compile: unable to infer array size
But it didn't compile :(.
After dutifully counting characters and specifying the length of my string, however, it proudly compiled :)!
const string: *const [10:0]u8 = "jolly good"; //happily compiles
Which led me to my question:
Why is this length specification needed for string literals and not for other literals/arrays? - (And should this be so?)
Please correct my type description of string literals if I missed an important nuance.
I'd like to know to further deepen my understanding of the way strings are handled in zig.
*although there are more cases where the Zig compiler can infer the type without it
Types never have _ in them.
"jolly good" is a string literal. *const [10:0]u8 is the type.
For "other literals/arrays":
const a = [_]u8{ 1, 2, 3 };
[_]u8{ 1, 2, 3 } is an array literal. The type is [3]u8 and it cannot be specified as [_]u8.
Look into slices. They offer a very convenient way to use strings and arrays.
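For example (a minimal sketch; the variable names are illustrative), a string literal coerces to a slice, which carries its length at run time, so nothing needs counting:
const std = @import("std");
pub fn main() void {
    // The literal's type *const [10:0]u8 coerces to these slice types:
    const s: [:0]const u8 = "jolly good"; // sentinel-terminated slice
    const t: []const u8 = "jolly good"; // plain slice of bytes
    std.debug.print("{s} / {s} / len={d}\n", .{ s, t, s.len });
}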

Is it a good bison practice to not have semantic values, but use side effects of actions?

I have a language where the semantic meaning of everything is an array of characters, or an array of arrays. So I have the following YYSTYPE:
typedef struct _array {
    union {
        char *chars; // start of string
        void *base;  // start of array
    };
    unsigned n;         // number of valid elements in above
    unsigned allocated; // number of allocated elements for above
} array;
#define YYSTYPE array
and I can append an array of characters to an array of arrays with
void append(YYSTYPE *parray, YYSTYPE *string);
Suppose the grammar (SSCCE) is:
%token WORD
%%
array : WORD
      | array WORD
      ;
So I accept a sequence of words. For each word, the semantic value becomes that array of characters, and then I would like to append each of these, to the array of arrays, for the whole sequence.
There are several possible ways to design the actions:
Have the array symbol carry a semantic value of type array. If I do this, the action for array WORD has to copy the array $1 to $$, which is slow, so I don't like that.
Have the array symbol carry a semantic value of type array *. In the action for array WORD, I can then just append to the array *$1 and set $$ equal to $1. But I don't like this for two reasons. First, the semantic meaning is not a pointer to an array; it is the array itself. Second, in the action for the rule array : WORD, I have to malloc the structure, which is slow. Yes, append sometimes mallocs, but if I allocate enough up front, not frequently. I want to avoid any unnecessary malloc for performance reasons.
Forget about trying to have a semantic value for the symbol array at all, and use globals:
static YYSTYPE g_array;
YYSTYPE *g_parray = &g_array;
and then, the actions will just use
append(g_parray, word_array)
The way the whole grammar works, I don't need more than one g_array. The above is the fastest approach I can think of, but it is really bad design: lots of globals, no semantic values; instead, everything happens through side effects on globals.
So, personally I don't like any of them. Which is the commonly accepted best practice for bison?
In most cases, there is no point in using globals. More-or-less modern versions of bison have the %parse-param directive, which allows you to have a sort of 'parsing context'. The context may take care of all memory allocations etc.
It may reflect the current parsing state, i.e. have the notion of a 'current array', etc. In this case, your semantic actions can rely on the context knowing where you are.
%{
typedef struct tagContext Context;
typedef struct tagCharString CharString;
void start_words(Context* ctx);
void add_word(Context* ctx, CharString* word);
%}
%union {
CharString* word;
}
%parse-param {Context* ctx}
%token<word> WORD
%start words
%%
words
    : { start_words(ctx); } word
    | words word
    ;
word
    : WORD { add_word(ctx, $1); }
    ;
If you only parse a list of words and nothing else, you can make it your context.
However, in a simple grammar, it is much clearer if you pass information through YYSTYPE:
%{
typedef struct tagContext Context;
typedef struct tagCharString CharString;
typedef struct tagWordList WordList;
// word_list = NULL to start a new list
WordList* add_word(Context* ctx, WordList* prefix, CharString* word);
%}
%union {
CharString* word;
WordList* word_list;
}
%parse-param {Context* ctx}
%token<word> WORD
%type<word_list> words words_opt
%start words
%%
words
    : words_opt WORD { $words = add_word(ctx, $words_opt, $WORD); }
    ;
words_opt
    : %empty { $words_opt = NULL; }
    | words
    ;
Performance differences between the two approaches seem to be negligible.
Memory cleanup
If your input text parses without errors, you are, as always, responsible for cleaning up all dynamic memory. However, if the input causes parse errors, the parser will have to discard some tokens, and there are two approaches to cleanup in that case.
First, you can keep track of all memory allocations in your context and free them all when destroying the context (a minimal sketch of this follows the destructor example below).
Second, you can rely on bison destructors:
%{
void free_word_list(WordList* word_list);
%}
%destructor { free_word_list($$); } <word_list>
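For the first approach, here is a minimal sketch of a context that owns every allocation (all names are illustrative; nothing here is part of bison itself):
#include <stdlib.h>
/* Illustrative sketch: the context remembers every allocation, so one
   call frees everything, including semantic values that bison discarded
   during error recovery. Error checking of malloc/realloc is omitted. */
typedef struct tagContext {
    void **allocs;
    size_t n, cap;
} Context;
void *ctx_alloc(Context *ctx, size_t size)
{
    if (ctx->n == ctx->cap) {
        ctx->cap = ctx->cap ? ctx->cap * 2 : 16;
        ctx->allocs = realloc(ctx->allocs, ctx->cap * sizeof *ctx->allocs);
    }
    return ctx->allocs[ctx->n++] = malloc(size);
}
void ctx_destroy(Context *ctx)
{
    for (size_t i = 0; i < ctx->n; i++)
        free(ctx->allocs[i]);
    free(ctx->allocs);
}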

How to read a char in Vala?

I'm programming in the Vala language and I have a problem: I don't know how to read a char (although I know how to read a string and an integer).
string name = stdin.read_line();
int maximum = int.parse(stdin.read_line());
char option = ????;
One option is scanf(), but when I use it I run into problems at run time.
If you just want to read a single character from a GLib.FileStream, you can use GLib.FileStream.getc, which "reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error." Actually, GLib.FileStream.read_line is implemented using getc. So, the "????" in your question would be something like (char) stdin.getc().
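Putting it together with the snippet from the question (a minimal sketch, using getc as described):
void main () {
    string name = stdin.read_line ();
    int maximum = int.parse (stdin.read_line ());
    char option = (char) stdin.getc (); // the cast narrows getc()'s int result
    stdout.printf ("%s %d %c\n", name, maximum, option);
}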
If you are trying to interact with a user, note that input is typically buffered by the terminal until the end of a line. If you want to get a character immediately after the user presses a key you'll need to use something like ncurses (for which Vala bindings are distributed with valac).

What does this return parameter mean?

+ (const char /*wchar_t*/ *)wcharFromString:(NSString *)string
{
    return [string cStringUsingEncoding:NSUTF8StringEncoding];
}
Does it return char or wchar_t?
From the method name, it should return wchar_t, but why is there a comment around wchar_t in the return type?
Source is here:
How to convert wchar_t to NSString?
That code just looks incorrect. They're claiming it does one thing, but it actually does another. The return type is const char *.
This method is not correct. It returns a const char *, encoded as a UTF8 string. That is a perfectly sensible way of getting a C string from an NSString, but nowhere here is anyone actually doing anything with wchar_ts.
wchar_t is a "wide char", and a pointer to it would be a "wide string" (represented by const wchar_t *). These are designed to represent larger character sets precisely, and are wider than char (two or four bytes, depending on the platform); they use a whole different set of string manipulation functions. (Strings like this are very rarely seen in iOS development, for what it's worth.)
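If all you actually need is the UTF-8 C string, a less misleading version of the same method might look like this (the method name is illustrative):
// Illustrative sketch: identical behavior to the method above, but the
// name and return type say what it actually returns.
+ (const char *)utf8CStringFromString:(NSString *)string
{
    return [string cStringUsingEncoding:NSUTF8StringEncoding];
}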

What is the CCHAR type equivalent in Delphi?

The ShortNameLength member of FILE_BOTH_DIR_INFORMATION structure is declared as follows:
typedef struct FILE_BOTH_DIR_INFORMATION {
    ...
    CCHAR ShortNameLength;
    ...
};
From the explanation of the CCHAR type, CCHAR is an 8-bit Windows (ANSI) character. So it is equivalent to AnsiChar in Delphi, right? However, the description of the ShortNameLength member of the FILE_BOTH_DIR_INFORMATION structure says,
“ShortNameLength specifies the length, in bytes, of the short file name string.”
The statement makes me think that the CCHAR equivalent is Byte in Delphi. Another example is the NumberOfProcessors member of SYSTEM_BASIC_INFORMATION which is declared in winternl.h as follows:
typedef struct _SYSTEM_BASIC_INFORMATION {
    BYTE Reserved1[24];
    PVOID Reserved2[4];
    CCHAR NumberOfProcessors;
} SYSTEM_BASIC_INFORMATION;
Once again, the CCHAR type seems to be used in a Byte context, rather than AnsiChar context.
Now I am confused about whether to use AnsiChar or Byte as the CCHAR equivalent in Delphi.
Note
JwaWinType.pas of JEDI Windows API declares CCHAR as AnsiChar.
It's a byte, or at least it is used as a 1-byte integer. In C, chars can be used for this purpose. In Delphi, you couldn't do that without typecasting. So you could use Char, but then you would need to give it the value 'A' or Chr(65) to indicate a string of 65 characters. Now, that would be silly. :-)
To be able to pass it to the API it must have the same size. Apart from that, the callee will not even know how it is declared, so declaring it as a Delphi Byte is the most logical solution, a choice backed up by the other declaration you found.
I believe the explanation of CCHAR is wrong. The C prefix indicates that this is a count of characters so this is probably a simple copy-paste error done by Microsoft when writing the explanation.
It is stored in a byte and it is used to count the number of bytes of a string of characters. These characters may be wide characters but the CCHAR value still counts the number of bytes used to store the characters.
The natural translation for this type is Byte. If you marshal it to a character type like AnsiChar you will have to convert the character to an integer value (e.g. a byte) before using it.
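For what it's worth, the Windows SDK headers (winnt.h) declare it as a plain char, which is why treating it as an 8-bit integral value works:
typedef char CCHAR;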
