TParam.LoadFromStream is not working in Delphi XE2? - delphi

I have written a below code in Delphi XE2.
var
stream : TStringStream;
begin
stream := TStringStream.Create;
//Some logic to populate stream from memo.
ShowMessage(stream.datastring); //This line is showing correct data
// some Insert query with below parameter setting
ParamByName('Text').LoadFromStream(stream , ftMemo);
But this is storing text as ???? in table.
This type of code is working fine in Delphi 4.
Is there any issue in TParam.LoadFromStream function in Delphi XE2?
EDIT:
Table field is of type 'Text'.

After doing some trial and error methods I have found a solution to this problem.
We can use below code,
ParamByName('Text').AsMemo := SampleMemo.Text;

The root of your issue is that TStringStream does not operate the same way in D2009+ as it did in D4.
In D4, TStringStream was a simple wrapper around an AnsiString variable. The DataString property simply returned a direct reference to that variable, and all reads/writes operated directly on the contents of the variable. The stream's bytes and String characters were basically one and the same thing back then.
In D2009+, TStringStream is now a wrapper around a TBytes array of encoded bytes instead, where the default encoding is the default Ansi encoding of the OS that your app is running on. If you write a string to the stream using WriteString(), it gets encoded from Unicode to bytes using the stream's encoding, and then those encoded bytes are stored. If you read a string from the stream using ReadString(), or read the DataString property, the stored bytes are decoded to a Unicode string. Any other read/write operations operate on the raw encoded bytes instead, like any other stream type would. So when you call TParam.LoadFromStream(), it is reading the raw encoded bytes, not a Unicode string. The stream's raw bytes and String characters are NOT one and the same thing anymore. So the data you see in the ShowMessage() is not the same data that TParam sees.

Related

Problems with unicode text

I use delphi xe3 and i have small problem !! but i don't how to fix it..
problem is with this letter "è" this letter is inside a file path "C:\lène.mp4"
i save this path into a tstringlist , when i save this tstringlist to a file the path will be shown fine inside the txt file ..
but when trying to loading it using tstringlist it will be shown as "è" (showing it inside a memo or int a variable) in this case it gonna be an invalid path ..
but adding the path(string) directly to the tstring list and then passing it to the path variable it works fine
but loading from the file and passing to the path variable it doesnt work (getting "è" instead of "è")
normally i will work with a lot of uncite string but for i'm struggling with that letter
this will not work ..
var
resp : widestring;
xfiles : tstringlist;
begin
xfiles := tstringlist.Create;
try
xfiles.LoadFromFile('C:\Demo6-out.txt'); // this file contains only "C:\lène.mp4"
resp := (xfiles.Strings[0]);
// if i save xfiles to a file "path string" will be saved fine ... !
finally
xfiles.Free ;
end;
but like this it work ..
var
resp : widestring;
xfiles : tstringlist;
begin
xfiles := tstringlist.Create;
try
xfiles.Add('C:lène.mp4');
resp := (xfiles.Strings[0]);
finally
xfiles.Free ;
end;
i'm really confused
First, you should be using UnicodeString instead of WideString. UnicodeString was introduced in Delphi 2009, and is much more efficient than WideString. The RTL uses UnicodeString (almost) everywhere it previously used AnsiString prior to 2009.
Second, something else introduced in Delphi 2009 is SysUtils.TEncoding, which is used for Byte<->Character conversions. Several existing RTL classes, including TStrings/TStringList, were updated to support TEncoding when converting bytes to/from strings.
What happens when you load a file into TStringList is that an internal TEncoding object is assigned to help convert the file's raw bytes to UnicodeString values. Which implementation of TEncoding it uses depends on the character encoding that LoadFromFile() thinks the file is using, if not explicitly stated (LoadFromFile() has an optional AEncoding parameter). If the file has a UTF BOM, a matching TEncoding is used, whether that be TEncoding.UTF8 or TEncoding.(BigEndian)Unicode. If no BOM is present, and the AEncoding parameter is not used, then TEncoding.Default is used, which represents the OS's default charset locale (and thus provides backwards compatibility with existing pre-2009 code).
When saving a TStringList to file, if the list was previously loaded from a file then the same TEncoding used for loading is used for saving, otherwise TEncoding.Default is used (again, for backwards compatibility), unless overwritten by the optional AEncoding parameter of SaveToFile().
In your first example, the input file is most likely encoded in UTF-8 without a BOM. So LoadFromFile() would use TEncoding.Default to interpret the file's bytes. è is the result of the UTF-8 encoded form of è (byte octets 0xC3 0xA8) being misinterpreted as Windows-1252 instead of UTF-8. So, you would have to load the file like this instead:
xfiles.LoadFromFile('C:\Demo6-out.txt', TEncoding.UTF8);
In your second example, you are not loading a file or saving a file. You are simply assigning a string literal (which is unicode-aware in D2009+) to a UnicodeString variable (inside of the TStringList) and then assigning that to a WideString variable (WideString and UnicodeString use the same UTF-16 character encoding, they just different memory managements). So there are no data conversions being performed.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

How to use an arbitrary string encoding?

I'm trying to get some code working against an API published by a Chinese company. I have a spec and some sample code (in Java), enough to understand most of what's going on, but I ran across one thing I don't know how to do.
String ecodeform = "GBK";
String sm = new String(Hex.encodeHex("Insert message here".getBytes(ecodeform))); //test message
It's creating a string from the char array result of the hex representation of the original string, encoded in GBK format (the standard Chinese character encoding, equivalent to ASCII for English text). I can work out how to do most of that in Delphi, but I don't know how to encode a string to GBK, which is specifically required by this API.
In SysUtils, there's a TEncoding class that comes with a few built-in encodings, such as UTF8, UTF16, and "Default" (the system's default code page), but I don't know how to set up a TEncoding for an arbitrary encoding such as GBK.
Does anyone know how to set this up?
You can use the TEncoding.GetEncoding() method to get a TEncoding object for a specific codepage/charset, eg:
var
Enc: TEncoding;
Bytes: TBytes;
begin
Enc := TEncoding.GetEncoding(936); // or TEncoding.GetEncoding('gb2312')
try
Bytes := Enc.GetBytes('Insert message here');
finally
Enc.Free;
end;
// encode Bytes to hex string as needed...
end;
TEncoding has a GetEncoding method for that. Give it the encoding name or number, and it will return a TEncoding instance.
For GBK, the number I think you want is 936. See Microsoft's list of code pages for more.

TIdHTTP character encoding of POST response

Take following situation:
procedure Test;
var
Response : String;
begin
Response := IdHttp.Post(MyUrL, AStream);
DoSomethingWith(Response);
end;
Now the webserver returns me data in UTF-8.
Suppose it returns me some UTF-8 XML containing the character é.
If I use the variable Response it does not contain this character but it's UTF-8 variant (#C3#A9), so Indy did no decoding?
Now I know how to solve this problem:
procedure Test;
var
Response : String;
begin
Response := UTF8ToString(IdHttp.Post(MyUrL, AStream));
DoSomethingWith(Response);
end;
One caveat with this solution: Delphi raises warning W1058 (Implicit string cast with potential data loss from 'string' to 'RawByteString')
My question : is this the correct way to deal with this problem or can I instruct TIdHTTP to do the conversion to UnicodeString for me?
If you are using an up-to-date version of Indy 10, then the overloaded version of TIdHTTP.Post() that returns a String does decode the data to Unicode, however the actual charset used for the decoding depends on what media type the HTTP Content-Type response header specifies:
if the media type is either application/xml, application/xml-external-parsed-entity, application/xml-dtd, or is not a text/... type but does end with +xml, then the charset specified in the encoding attribute of the XML's prolog is used. If no charset is specified, UTF-8 is used.
otherwise, if the Content-Type response header specifies a charset, then it is used.
otherwise, if the media type is a text/... type, then:
a. if the media type is text/xml, text/xml-external-parsed-entity, or ends with +xml, then us-ascii is used.
b. otherwise ISO-8859-1 is used.
otherwise, Indy's default encoding (ASCII by default) is used.
Without seeing the actual HTTP Content-Type header, it is hard to know which condition your situation falls into. It sounds like it is falling into either #2 or #3b, which would account for the UTF-8 byte values being returned as-is, if ISO-8859-1 or similar charset is being used.
UTF8ToString() expects a UTF-8 encoded RawByteString as input, but you are passing it a UTF-16 encoded UnicodeString instead. The RTL will perform a UTF16->Ansi conversion in that situation, using a default Ansi charset for the conversion. That is why you get the compiler warning, because such a conversion can lose data.
XML is really a binary data format, subject to charset encodings. An XML parser needs to know what the XML's encoding is, and be able to parse the raw encoded bytes accordingly. That is why XML has an explicit encoding attribute right in the XML prolog. However, when TIdHTTP downloads XML as a String, although it does automatically decode it to Unicode, it does not yet update the XML's prolog accordingly.
The real solution is to not download XML as a String in the first place. Download it as a TStream instead (TMemoryStream is a better choice than TStringStream) so your XML parser has access to the original bytes, the original charset declaration, etc. You can pass the TStream to the TXMLDocument.LoadFromStream() method, for instance.
You can do this:
var
sstream: TStringStream;
begin
sstream := TStringStream.Create('', TEncoding.UTF8);
try
IdHttp.Post(MyUrL, AStream, sstream);
DoSomethingWith(sstream.DataString);
finally
sstream.Free;
end;

Is a PChar UTF-8 coded?

I'm writing a tool, which use a C-DLL. The functions of the C-DLL expect a char*, which is in UTF-8 Format.
My question: Can I pass a PChar or do I have to use UTF8Encode(string)?
Consider a string variable named s. On an ANSI Delphi PChar(s) is ANSI encoded. On a Unicode Delphi it is UTF-16 encoded.
Therefore, either way, you need to convert s to UTF-8 encoding. And then you can use PAnsiChar(...) to get a pointer to a null terminated C string.
So, the code you need looks like this:
PAnsiChar(UTF8Encode(s))
Please edit the question and add the tag with your target Delphi version.
Pass it as PAnsiChar; PChar is a joker and may mean different data types. When you work with DLL-like API, you ignore compiler safety net and that means you should make your own. And that means you should use real types, not jokers, the types that would not change no matter which compiler settings and version would be active.
But before getting passing the pointer you should ensure that the source data is encoded in UTF8 actually.
.
Var data: string; buffer: UTF8String; buffer_ptr: PAnsiChar;
Begin
buffer := data + #0;
// transcoding to UTF8 from whatever charset it was, transparently done by Delphi RTL
// last zero to ensure that even for empty string you would have valid pointer below
buffer_ptr := Pointer(#buffer[1]); // making sure there can be no codepage bound to the datatype
C_DLL_CALL(buffeR_ptr);
End;

TSQLQuery.FieldByName().AsString -> TStringStream Corrupts Data

I'm using Delphi XE2. My code pulls data from a SQL-Server 2008 R2 database. The data returned is a nvarchar(max) field with 1,055,227 bytes of data. I use the following code to save the field data to a file:
procedure WriteFieldToFile(FieldName: string; Query: TSQLQuery);
var
ss: TStringStream;
begin
ss := TStringStream.Create;
try
ss.WriteString(Query.FieldByName(FieldName).AsString);
ss.Position := 0;
ss.SaveToFile('C:\Test.txt');
finally
FreeAndNil(ss);
end;
end;
When I inspect the file in a hex viewer, the first 524,287 bytes (exactly 1/2 meg) look correct. The remaining bytes (524,288 to 1,055,227) are all nulls (#0), instead of the original data.
Is this the right way to save a string field from a TSQLQuery to a file? I chose to use TStringStream because I will eventually add code to do other things to the data on the stream, which I can't do with a TFileStream.
TStringStream is TEncoding-aware in XE2, but you are not specifying any encoding in the constructor so TEncoding.Default will be used, meaning that any string you provide to it will internally be converted to the OS default Ansi encoding. Make sure that encoding supports the Unicode characters you are trying to work with, or else specify a more suitable encoding, such as TEncoding.UTF8.
Also make sure that AsString is returning a valid and correct UnicodeString value to begin with. TStringStream will not save the data correctly if it is given garbage as input. Make sure that FieldByName() is returning a pointer to a TWideStringField object and not a TStringField object in order to handle the database's Unicode data correctly.

Resources