How to use UnicodeString.sprintf() mixed with std::string arguments?

I have to add a feature to a project made in C++ with RAD Studio, and I can't seem to wrap my head around all of the different string types.
This compiles:
std::string batchID = "abc";
UnicodeString msg = UnicodeString().sprintf(L"Batch# %s", batchID.c_str());
But the variables contain these values:
[batchID] _Mypair { { { "abc\0\"\0\0€¼ÇwG\x01\0\0", "", "abc\0\"\0\0€¼ÇwG\x01\0\0" }, 3, 15 } }
[msg] Data :02787394 L"Batch# 扡c\"耀잼瞁Ň
Somewhere else in the code, the format string is %ls and it works! In fact, I copied these two lines from elsewhere, where they work, but this is what I get. Why? And how do I fix it?
Why are there 20 different string types, all incompatible with one another!

What you are getting in msg is commonly known as "Mojibake", which is caused by interpreting string data in the wrong encoding.
UnicodeString is exclusively a UTF-16 encoded string type on all platforms. Internally, the UnicodeString::sprintf() method uses the vsnwprintf() function, where the undecorated %s placeholder expects a C-style null-terminated UTF-16 character string (i.e., wchar_t* on Windows, and char16_t* on POSIX), but you are giving it an 8-bit char* string instead.
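For contrast, here is a minimal sketch of what the undecorated %s does accept: a null-terminated UTF-16 string (shown for Windows, where wchar_t is 16-bit; wideID is an illustrative name):
std::wstring wideID = L"abc";
UnicodeString msg = UnicodeString().sprintf(L"Batch# %s", wideID.c_str()); // OK: %s receives a wchar_t*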
To print a char* string using UnicodeString::sprintf(), you need to use the %hs placeholder (not %ls) instead (see Format Specifiers in C/C++), e.g.:
std::string batchID = "abc";
UnicodeString msg = UnicodeString().sprintf(_D("Batch# %hs"), batchID.c_str());
An alternative solution is to use the UnicodeString::Format() method instead, which accepts both 8-bit and 16-bit string types in its %s placeholder, e.g.:
std::string batchID = "abc";
UnicodeString msg = UnicodeString::Format(_D("Batch# %s"), ARRAYOFCONST(( batchID.c_str() )) );
Alternatively, UnicodeString can be constructed from a char* string, so you can prepare the msg content using just char data and then construct the final UnicodeString at the end, e.g.:
#include <sstream>

std::string batchID = "abc";
std::ostringstream oss;
oss << "Batch# " << batchID;
UnicodeString msg = oss.str().c_str();
Or, more compactly, using std::string concatenation directly:
std::string batchID = "abc";
UnicodeString msg = ("Batch# " + batchID).c_str();
Or, using the {fmt} library (until C++Builder adds support for C++20's std::format()):
#include <fmt/format.h>

std::string batchID = "abc";
UnicodeString msg = fmt::format("Batch# {}", batchID).c_str();
Otherwise, just convert the std::string by itself to UnicodeString and concatenate it, e.g.:
std::string batchID = "abc";
UnicodeString msg = _D("Batch# ") + UnicodeString(batchID.c_str());

Swift: convert const char ** output parameter to String

I'm interacting with a C++ library (with the header in C) which uses const char ** as an output parameter.
After executing a method in that library, the value I need is written in that variable, for example:
CustomMethod(const char **output)
CustomMethod(&output)
// Using the `output` here
Normally, in Swift it's possible to pass just a standard Swift String as a parameter and it will be transparently transformed into the const char * (Interacting with C Pointers - Swift Blog).
For example, I already use the following construct a lot with the same library:
// C
BasicMethod(const char *input)
// Swift
let string = "test"
BasicMethod(string)
However, when it comes to working with const char **, I couldn't just pass a pointer to the Swift String, as I'd expected:
// C
CustomMethod(const char **output)
// Swift
var output: String?
CustomMethod(&output)
Getting an error:
Cannot convert value of type 'UnsafeMutablePointer<String?>' to
expected argument type 'UnsafeMutablePointer<UnsafePointer<CChar>?>'
(aka 'UnsafeMutablePointer<Optional<UnsafePointer<Int8>>>')
The only way I could make it work is by manipulating the pointers directly:
// C
CustomMethod(const char **output)
// Swift
var output: UnsafePointer<CChar>?
CustomMethod(&output)
let stringValue = String(cString: output!)
Is there any way to use the automatic Swift string to const char ** conversion, or does it only work with const char *?
The bridged C function expects a mutable pointer to a CChar pointer, so you'll need to provide one; there's no automatic bridging here.
var characters: UnsafePointer<CChar>?
withUnsafeMutablePointer(to: &characters) {
    CustomMethod($0)
}
if let characters = characters {
    let receivedString = String(cString: characters)
    print(receivedString)
}
Same code, but in a more FP manner:
var characters: UnsafePointer<CChar>?
withUnsafeMutablePointer(to: &characters, CustomMethod)
let receivedString = characters.map(String.init(cString:))
print(receivedString ?? "no value")
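If you call the library in several places, you can wrap this dance into a small helper. A sketch, assuming the question's CustomMethod signature (receiveString is an illustrative name, not part of any library):
func receiveString(from cFunction: (UnsafeMutablePointer<UnsafePointer<CChar>?>) -> Void) -> String? {
    var cString: UnsafePointer<CChar>?
    withUnsafeMutablePointer(to: &cString, cFunction)
    return cString.map(String.init(cString:))
}
// Usage:
// let value = receiveString(from: CustomMethod)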

AnsiString and += operator

this very little code fragment is heavily confusing me, although I'm only trying to concatenate two strings.
void __fastcall TForm1::Button1Click(TObject *Sender)
{
    AnsiString HelloWorld = "Hello ";
    HelloWorld += "World";

    TStringList *sl1 = new TStringList();
    sl1->Add("Hello");
    sl1->Strings[0] += " World";

    TStringList *sl2 = new TStringList();
    sl2->Add("Hello");
    sl2->Strings[0] = sl2->Strings[0] + " World";

    Memo1->Lines->Add( HelloWorld );      // prints "Hello World"
    Memo1->Lines->Add( sl1->Strings[0] ); // prints "Hello" =====> WHY?
    Memo1->Lines->Add( sl2->Strings[0] ); // prints "Hello World"
}
Is the operator += not working on TStringList items?
What would be the proper way to do so?
Because when you use Strings[0] you are actually accessing a property, not the actual string. As such, when you use
sl1->Strings[0] += " World";
what is really happening is that you are invoking the read method of the Strings property, which returns a string by value. You are concatenating onto that resulting temporary string.
This does not change the property's inner string at all.
In this case, just for the sake of understanding how it works, you can think of reading
sl1->Strings[0]
as calling a function that returns a string (and in fact, it is! Reading a property runs its read method).
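In other words, a sketch of what effectively happens (temp is an illustrative name, not what the compiler literally generates):
AnsiString temp = sl1->Strings[0]; // the read method returns a copy
temp += " World";                  // only that temporary copy is modified
// The list item itself is untouched. To change it, write the property back:
sl1->Strings[0] = sl1->Strings[0] + " World"; // read, concatenate, write
This is exactly why the sl2 version in the question works: it assigns the concatenated result back through the property's write method.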

How to convert ascii value in integer to its character equivalent in flutter?

I am new to Flutter and I just want to display a list of alphabet letters in a for loop. I want to know how I can convert an integer to its ASCII character. I searched for this and found the dart:convert library, but I don't know how to use it.
I want something like -
for (int i = 65; i <= 90; i++) {
  print(ascii(i)); // ascii is not a real method; it's just to illustrate my question
}
It should print the letters from 'A' to 'Z'.
You don't need dart:convert; you can just use String.fromCharCode:
print(String.fromCharCode(i));
More info: https://api.dartlang.org/stable/2.0.0/dart-core/String/String.fromCharCode.html
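Applied to the loop from the question:
for (int i = 65; i <= 90; i++) {
  print(String.fromCharCode(i)); // prints 'A' through 'Z', one letter per line
}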
In Dart, use these two functions to convert from an int (byte) to a String (char) and vice versa.
int value = ';'.codeUnitAt(0); // get the code unit for the semicolon
String char = String.fromCharCode(value); // get the semicolon string ";"
This is exactly what you need to generate your alphabet:
import 'dart:core';

void RandomString() {
  List<int> a = new List<int>.generate(26, (int index) => index + 65);
  String f = String.fromCharCodes(a);
  print(f); // prints ABCDEFGHIJKLMNOPQRSTUVWXYZ
}

void main() {
  RandomString();
}
You can also copy, paste, and test it at https://dartpad.dartlang.org/

Why str.FirstChar() does not return the first char?

UnicodeString us = "12345";
Label1->Caption= us.FirstChar();
The caption will show "12345" instead of "1".
Why is that?
The help page for FirstChar is empty:
Embarcadero Technologies does not currently have any additional
information. Please help us document this topic by using the
Discussion page!
The declaration is this:
const WideChar* FirstChar() const;
const WideChar* LastChar() const;
WideChar* FirstChar();
WideChar* LastChar();
The UnicodeString::FirstChar() method returns a pointer to the first character (just as UnicodeString::LastChar() returns a pointer to the last character).
The data being pointed to is null-terminated. So the statement Label1->Caption = us.FirstChar(); is the same as if you had written Label1->Caption = L"12345"; instead. The TLabel::Caption property is also a UnicodeString, which has a constructor that accepts a null-terminated WideChar* pointer as input. That is why you see the result you are getting.
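To make the pointer semantics concrete, a small sketch (variable names are illustrative):
UnicodeString us = _D("12345");
const WideChar* p = us.FirstChar(); // points at '1', with "2345" and the null terminator after it
UnicodeString copy = p;             // copies everything up to the null terminator: "12345"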
If you want just the first character by itself, use UnicodeString::operator[] instead:
Label1->Caption = us[1]; // UnicodeString is 1-indexed!
Or, using FirstChar(), simply dereference the pointer:
Label1->Caption = *(us.FirstChar());
Note that if the UnicodeString::IsEmpty() method returns true, both approaches will fail: operator[] will throw an ERangeError exception, and FirstChar() will return a NULL pointer, which is undefined behavior to dereference. So watch out for that, e.g.:
if (!us.IsEmpty())
    Label1->Caption = us[1];
else
    Label1->Caption = _D("");

if (!us.IsEmpty())
    Label1->Caption = *(us.FirstChar());
else
    Label1->Caption = _D("");
A safer option would be to use the UnicodeString::SubString() method instead, which will return an empty string if the requested substring is out of range:
Label1->Caption = us.SubString(1, 1); // also 1-indexed!
Alternatively, you can use the RTL's System::Strutils::LeftStr() function instead:
#include <System.StrUtils.hpp>
Label1->Caption = LeftStr(us, 1);

golang convert iso8859-1 to utf8

I am trying to convert an ISO 8859-1 encoded string to UTF-8.
The following function works with my test data, which contains German umlauts, but I'm not quite sure what source encoding the rune(b) cast assumes. Is it assuming some kind of default encoding, e.g. ISO 8859-1, or is there a way to tell it which encoding to use?
func toUtf8(iso8859_1_buf []byte) string {
    // length 0, capacity len*4: a buffer created with a non-zero length would
    // keep its leading zero bytes, and WriteRune would append after them
    var buf = bytes.NewBuffer(make([]byte, 0, len(iso8859_1_buf)*4))
    for _, b := range iso8859_1_buf {
        r := rune(b)
        buf.WriteRune(r)
    }
    return string(buf.Bytes())
}
rune is an alias for int32, and when it comes to encoding, a rune is assumed to hold a Unicode code point. So the value b in rune(b) should be a Unicode value. For 0x00 - 0xFF these values are identical to Latin-1, so you don't have to worry about it.
Then you need to encode the runes into UTF-8. But that encoding is done simply by converting a []rune to string.
This is an example of your function without using the bytes package:
func toUtf8(iso8859_1_buf []byte) string {
    buf := make([]rune, len(iso8859_1_buf))
    for i, b := range iso8859_1_buf {
        buf[i] = rune(b)
    }
    return string(buf)
}
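For illustration, a minimal sketch exercising the function above (0xE4 is 'ä' in ISO 8859-1):
package main

import "fmt"

func toUtf8(iso8859_1_buf []byte) string {
    buf := make([]rune, len(iso8859_1_buf))
    for i, b := range iso8859_1_buf {
        buf[i] = rune(b)
    }
    return string(buf)
}

func main() {
    data := []byte{'B', 0xE4, 'r'} // "Bär" encoded as ISO 8859-1
    s := toUtf8(data)
    fmt.Println(s)                 // Bär
    fmt.Println(len(data), len(s)) // 3 4: 'ä' takes two bytes in UTF-8
}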
The effect of
r := rune(expression)
is:
Declare variable r with type rune (alias for int32).
Initialize variable r with the value of expression.
No (re)encoding is involved, and specifying one would only be possible by explicitly writing the re-encoding in code. Luckily, in this case no (re)encoding is necessary: the first 256 Unicode code points are identical to ISO 8859-1, just as the first 128 are identical to ASCII.
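As an aside (not part of the original answers): if you'd rather not hand-roll the conversion, the golang.org/x/text module provides a dedicated decoder for this charset. A sketch:
import "golang.org/x/text/encoding/charmap"

// Decode ISO 8859-1 bytes into UTF-8 bytes:
utf8Bytes, err := charmap.ISO8859_1.NewDecoder().Bytes(iso8859_1_buf)
if err != nil {
    // handle the error
}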
