Should const string: *const [_:0]u8 = "infer size"; be compilable in Zig?

While experimenting with Zig syntax, I noticed the type expression of string literals is omitted in all examples. Which is totally fine, I'm not saying it shouldn't be.
const zig_string = "I am a string"; //it looks nice enough for sure and compiles fine of course
However, because this type omission is a bit inconsistent* with other type declarations in Zig, it can lead beginners (like me) to misinterpret the actual type of string literals (which is in fact quite rightfully complicated and 'different'). Anyway, I read that the type of a string literal is a 'pointer to a (UTF-8 encoded) immutable (const), sentinel-terminated array of u8 bytes' (yes?), where the array type carries, next to the hard-coded length, a sentinel terminator, like so: [<length>:0]. To check my own understanding, I thought it reasonable to try adding this type expression to the declaration, similar to how other arrays are conveniently declared, so with an underscore to infer the length, because who likes counting characters?
const string: *const [_:0]u8 = "jolly good"; //doesn't compile: unable to infer array size
But it didn't compile :(.
After dutifully counting characters and specifying the length of my string, however, it proudly compiled :)!
const string: *const [10:0]u8 = "jolly good"; //happily compiles
Which led me to my question:
Why is this length specification needed for string literals and not for other literals/arrays? - (And should this be so?)
Please correct my type description of string literals if I missed an important nuance.
I'd like to know to further deepen my understanding of the way strings are handled in zig.
*although there are more cases where the Zig compiler can infer the type without it

Types never have _ in them.
"jolly good" is a string literal. *const [10:0]u8 is the type.
For "other literals/arrays":
const a = [_]u8{ 1, 2, 3 };
[_]u8{ 1, 2, 3 } is an array literal. The type is [3]u8 and it cannot be specified as [_]u8.
Look into slices. They offer a very convenient way to use strings and arrays.
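As an illustration of that last point, here is a minimal sketch (not from the original answer, and assuming a reasonably recent Zig version): both string literals and array literals coerce to slices, so no length ever has to be written out.
const std = @import("std");

pub fn main() void {
    // A string literal (*const [N:0]u8) coerces to a slice of const bytes;
    // the length travels with the slice at runtime.
    const greeting: []const u8 = "jolly good";
    // Taking the address of an array literal gives *const [3]u8,
    // which coerces to []const u8 the same way.
    const bytes: []const u8 = &[_]u8{ 1, 2, 3 };
    std.debug.print("{s} has {d} bytes\n", .{ greeting, greeting.len });
    std.debug.print("array slice has {d} elements\n", .{bytes.len});
}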

Related

How to define string type in getOrInsertFunction() llvm?

I'm new to LLVM and am trying to do some instrumentation. But I found the LLVM API only has primitive types, like getInt32Ty(Ctx). What I want to do is use getOrInsertFunction() where the function argument type is a string type. As is known, when the argument type is int, we can do it like this:
LLVMContext &Ctx = F.getContext();
Constant *logFunc = F.getParent()->getOrInsertFunction(
    "logop", Type::getVoidTy(Ctx), Type::getInt32Ty(Ctx), NULL);
Type::getInt32Ty(Ctx) is the function argument type (int); what I want to do is:
getOrInsertFunction(
    "logop", Type::getVoidTy(Ctx), string type, NULL);
I don't know how to define the string type. In short, could you please tell me how to define it? Thanks!
LLVM IR does not define any special string or char types.
Usually either [N x i8] or i8* are used, but it's really up to you - for example, Java-style strings will probably be a struct with some i32 for string length and an i16* for the UTF-16 code points.
LLVM IR does have a "string literal" which is typed as an i8 array - for example c"hello world\0A\00" is [13 x i8]. But that doesn't dictate what string form you should be using.
Keep in mind that if your function is supposed to interop with something, e.g. a hosting C++ application, then you need to use the same string type - in that case whatever std::string is compiling to. You can use Clang or this online demo to check what that type is.
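For instance, here is a hedged sketch of declaring such a function with an i8* parameter, written against the older varargs getOrInsertFunction overload that the question uses (newer LLVM versions return a FunctionCallee and expect a FunctionType; declareLogop is just an illustrative wrapper name):
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"

using namespace llvm;

Constant *declareLogop(Function &F) {
  LLVMContext &Ctx = F.getContext();
  // Pass the string as a plain i8* (a C-style char pointer);
  // Type::getInt8PtrTy builds that pointer type in the given context.
  return F.getParent()->getOrInsertFunction(
      "logop", Type::getVoidTy(Ctx), Type::getInt8PtrTy(Ctx), NULL);
}
When instrumenting, the argument value itself can then be built with IRBuilder's CreateGlobalStringPtr, which emits an [N x i8] constant and returns an i8* to it.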

what does this return parameter means?

+ (const char /*wchar_t*/ *)wcharFromString:(NSString *)string
{
    return [string cStringUsingEncoding:NSUTF8StringEncoding];
}
Does it return char or wchar_t?
From the method name, it should return wchar_t, but why is there a comment around the wchar_t return type?
Source is here:
How to convert wchar_t to NSString?
That code just looks incorrect. They're claiming it does one thing, but it actually does another. The return type is const char *.
This method is not correct. It returns a const char *, encoded as a UTF8 string. That is a perfectly sensible way of getting a C string from an NSString, but nowhere here is anyone actually doing anything with wchar_ts.
wchar_t is a "wide char", and a pointer to it would be a "wide string" (represented by const wchar_t *). These are designed to precisely represent larger character sets, and can be two-byte wide character strings; they use a whole different set of string manipulation functions. (Strings like this are very rarely seen in iOS development, for what it's worth.)

which one of these is an example of coercion

I have been pondering a multiple choice question on coercion. One of the 4 examples A, B, C or D is an example of coercion. I narrowed it down to A or B, but I am having a problem choosing between the two. Can someone please explain why one is coercion and one isn't?
A)
string s="tomat";
char c='o';
s=s+c;
I thought A could be correct because we have two different types, character and string, being added. Meaning that c is promoted to string, hence coercion.
B)
double x=1.0;
double y=2.0;
int i=(int)(x+y);
I also thought B was the correct answer because the double (x+y) is being turned into an int to be placed in i. But I thought this could be wrong because it's being done actively through the use of (int) rather than passively, such as "int i = x + y".
I'll list the other two options, even though I believe that neither one is the correct answer
C)
char A=0x20;
A = A << 1 | 0x01;
cout << A << endl;
D)
double x=1.0;
double y=x+1;
return 0;
I'm not just looking for an answer, but an explanation. I have read tons of things on coercion and A and B both look like the right answer. So why is one correct and the other not?
I actually think it's B. Even though there's the explicit (int), it's still type coercion (just not automatic type coercion). You're converting a floating point value (probably stored as an IEEE floating point value) to an integer value (probably stored in two's complement).
Whereas A is simply concatenating a character to a string, where a string is just a null terminated array of characters. There's no data type conversion going on there, just a bit of memory manipulation.
I could be wrong though.
EDIT: I would have to agree with Parris. Given that this is a C++ string and not a C array of characters (my mistake), the character in A is probably being coerced to a string.
I don't think type casting is equivalent to type coercion, which is why A would probably be the right answer.
B takes a double and casts it to an int, which is more like a conversion than a coercion. In A you aren't converting anything explicitly; you're being implicit. You are telling the runtime/compiler/whatever "these 2 things are similar, can you figure out how to concatenate them?"
C isn't a conversion or coercion; it's just bit shifting. Although the cout might be coercion... I am not sure if there is coercion to a string there to write to the console.
D might contain a coercion, since 1 is an int and you are adding it to a double. However, you can do floating-point math with integers; having a decimal is just more explicit.
I think A is the most straightforward example of coercion. Although C's cout statement seems suspicious as well.
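Not part of the original answers, but to make the implicit-versus-explicit distinction concrete, here is a small self-contained sketch that simply combines the snippets from options A, B and D:
#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "tomat";
    char c = 'o';
    s = s + c;            // A: the char is combined with the string implicitly (an operator+ overload)
    double x = 1.0;
    double y = 2.0;
    int i = (int)(x + y); // B: an explicit cast; the programmer requests the conversion
    double z = x + 1;     // D: the int literal 1 is implicitly converted to double
    cout << s << " " << i << " " << z << endl; // prints: tomato 3 2
    return 0;
}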

How can I make fixed-length Delphi strings use wide characters?

Under Delphi 2010 (and probably under D2009 also) the default string type is UnicodeString.
However if we declare...
const
  s: string = 'Test';
  ss: string[4] = 'Test';
... then the first string s is declared as UnicodeString, but the second one ss is declared as AnsiString!
We can check this: SizeOf(s[1]) will return 2 and SizeOf(ss[1]) will return 1.
If I declare...
var
  s: string;
  ss: string[4];
... then I want ss to also be of UnicodeString type.
How can I tell to Delphi 2010 that both strings should be UnicodeString type?
How else can I declare that ss holds four WideChars? The compiler will not accept the type declarations WideString[4] or UnicodeString[4].
What is the purpose of two different compiler declarations for the same type name: string?
The answer to this lies in the fact that string[n], which is a ShortString, is now considered a legacy type. Embarcadero took the decision not to convert ShortString to have support for Unicode. Since the long string was introduced, if my memory serves correctly, in Delphi 2, that seems a reasonable decision to me.
If you really want fixed-length arrays of WideChar then you can simply declare array[1..n] of Char.
You can't, using string[4] as the type. Declaring it that way automatically makes it a ShortString.
Declare it as an array of Char instead, which will make it an array of 4 WideChars.
Because a string[4] makes it a string containing 4 characters. However, since WideChars can be more than one byte in size, this would be a) wrong, and b) confusing. ShortStrings are still around for backward compatibility, and are automatically AnsiStrings because they consist of [x] one byte chars.
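To illustrate the array-of-Char suggestion above, here is a minimal console sketch (the program name FixedWideChars is just made up):
program FixedWideChars;
{$APPTYPE CONSOLE}
var
  s: string;                // UnicodeString in Delphi 2009+
  ss: array[1..4] of Char;  // Char = WideChar here, so four UTF-16 code units
begin
  s := 'Test';
  ss[1] := 'T'; ss[2] := 'e'; ss[3] := 's'; ss[4] := 't';
  Writeln(SizeOf(s[1]));    // 2
  Writeln(SizeOf(ss[1]));   // 2 as well, unlike string[4] (ShortString), where it would be 1
end.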

Casting Delphi 2009/2010 string literals to PAnsiChar

So the question is whether or not string literals (or const strings) in Delphi 2009/2010 can be directly cast as PAnsiChar's or do they need an additional cast to AnsiString first for this to work?
The background is that I am calling functions in a legacy DLL with a C interface that has some functions that require C-style char pointers. In the past (before Delphi 2009) code like the following worked like a charm (where the param to the C DLL function is a LPCSTR):
either:
LegacyFunction(PChar('Fred'));
or
const
FRED = 'Fred';
...
LegacyFunction(PChar(FRED));
So in changing to Delphi 2009 (and now in 2010), I changed the call to this:
LegacyFunction(PAnsiChar('Fred'));
or
const
FRED = 'Fred';
...
LegacyFunction(PAnsiChar(FRED));
This seems to work and I get the correct results from the function call. However there is some definite instability in the app that seems to be occurring mostly the second or third time through the code that calls the legacy functions (that was not present before the move to the 2009 version of the IDE). In investigating this, I realized that the native string literal (and const string) in Delphi 2009/2010 is a Unicode string so my cast was possibly in error. Examples here and elsewhere seem to indicate this call should look more like this:
LegacyFunction(PAnsiChar(AnsiString('Fred')))
What confuses me is that with the code above in the second examples, casting the string literal directly to a PAnsiChar does not generate any compiler warnings. If instead of a string literal, I was casting a string var, I would get a suspicious cast warning (and the string would be mangled). This (and the fact that the string is usable in the DLL) leads me to believe the compiler is doing some magic to correctly interpret the string literal as the intended string type. Is this what is happening or is the double cast (first to AnsiString, then to PAnsiChar) really necessary and the lack of it in my code the reason for the hard to track down instability? And does the same answer hold true for const strings as well?
For type-inferred constants (only initializable from literals) the compiler changes the actual text at compile-time, rather than at runtime. That means it knows whether or not the conversion loses data, so it doesn't need to warn you if it doesn't.
To 'visualize' Barry Kelly and Mason Wheeler words:
const
  FRED = 'Fred';
var
  p: PAnsiChar;
  w: PWideChar;
begin
  w := PWideChar(Fred);
  p := PAnsiChar(Fred);
In ASM:
Unit7.pas.32: w := PWideChar(Fred);
00462146 BFA4214600 mov edi,$004621a4
// no conversion, just a pointer to constant/"-1 RefCounted" UnicodeString
Unit7.pas.33: p := PAnsiChar(Fred);
0046214B BEB0214600 mov esi,$004621b0
// no conversion, just a pointer to constant/"-1 RefCounted" AnsiString
As you can see, in both cases, PWideChar/PChar(FRED) and PAnsiChar(FRED), there is no conversion; the Delphi compiler makes two constant strings, one AnsiString and one UnicodeString.
Constants, including string literals, are untyped by default, and the compiler will fit them into whatever format works in the context you're using them in. As long as there are no non-ANSI characters in your string literal, the compiler won't have any trouble generating the string as ANSI instead of Unicode in this situation.
As Mason Wheeler points out, all is fine as long as you don't have non-ANSI characters in your string const. If you have things like:
const FRED = 'Frédérick';
I'm pretty sure Delphi 2009/2010 will either issue charset hints (and apply a string conversion automatically - thus the hint) or fail at comparing ('Frédérick' is different in ISO-8859-1 than UTF-16).
If you can have "special" characters in your consts you will need to call string conversion.
Here are some basic examples with TStringList:
TStringList.SaveToFile(DestFilename, TEncoding.GetEncoding(28591)); //ISO-8859-1 (Latin1)
TStringList.SaveToFile(DestFilename, TEncoding.UTF8);
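For completeness, here is a sketch (not from the original answers) of the double cast applied to a string variable, reusing the question's LegacyFunction:
var
  S: string;
begin
  S := 'Frédérick';
  // For a string variable, the AnsiString cast performs the Unicode-to-ANSI
  // conversion first, and PAnsiChar then points at the converted buffer,
  // so no suspicious-cast warning is produced.
  LegacyFunction(PAnsiChar(AnsiString(S)));
end;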
