How to convert ANSI to UTF-8 with TXMLDocument in Delphi

Is it possible to convert the XML to UTF-8 encoding in Delphi 6?
Currently this is what I am doing:
Fill TXMLDocument with AnsiString
At the end, convert the data to UTF-8 by using WideStringVariable := AnsiToUtf8(Doc.XML.Text);
Save the value of WideStringVariable to a file using TFileStream, adding the UTF-8 BOM at the beginning of the file.
CODE:
procedure SaveAsUTF8(const Name: String; Data: TStrings);
const
  cUTF8 = $BFBBEF;
var
  W_TXT: WideString;
  fs: TFileStream;
  wBOM: Integer;
begin
  if Trim(Data.Text) <> '' then begin
    W_TXT := AnsiToUTF8(Data.Text);
    fs := TFileStream.Create(Name, fmCreate);
    try
      wBOM := cUTF8;
      fs.WriteBuffer(wBOM, SizeOf(wBOM) - 1);
      fs.WriteBuffer(W_TXT[1], Length(W_TXT) * SizeOf(W_TXT[1]));
    finally
      fs.Free
    end;
  end;
end;
If I open the file in Notepad++ or another editor that detects the encoding, it shows UTF-8 with BOM. However, the text does not seem to be properly encoded.
What is wrong and how can I fix it?
UPDATE: XML Properties:
XMLDoc.Version := '1.0';
XMLDoc.Encoding := 'UTF-8';
XMLDoc.StandAlone := 'yes';

You can save the file using the standard SaveToFile method of the TXMLDocument variable: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/XMLDoc_TXMLDocument_SaveToFile.html
Whether or not the file is UTF-8, you have to check with local tools such as the aforementioned Notepad++, a hex editor, or anything else.
If you insist on using an intermediate string and a file stream, you should use the proper variable type. AnsiToUTF8 returns the UTF8String type, and that is what should be used.
Compiling `WideStringVar := AnsiStringSource` would issue a compiler warning, and it is a proper warning. Googling for "Delphi WideString", or reading the Delphi manuals on the topic, shows that WideString, aka the Microsoft OLE BSTR, keeps data in UTF-16 format: http://delphi.about.com/od/beginners/l/aa071800a.htm
Thus assigning an 8-bit source to a UTF-16 string necessarily converts the data, so dumping WideString data cannot produce UTF-8 text, by the very definition of WideString.
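A minimal console sketch of that point (the program name is arbitrary): the same three ASCII characters occupy 3 bytes as UTF-8, but 6 bytes once they have been assigned to a WideString, which is what the original SaveAsUTF8 ended up writing to disk.

```delphi
program Utf8VsWideString;
{$APPTYPE CONSOLE}
var
  U: UTF8String;
  W: WideString;
begin
  U := AnsiToUtf8('abc');  // UTF-8 bytes: 61 62 63
  W := U;                  // implicit conversion: stored as UTF-16LE, 61 00 62 00 63 00
  Writeln(Length(U));                     // 3 (bytes)
  Writeln(Length(W) * SizeOf(WideChar));  // 6 (bytes)
end.
```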
procedure SaveAsUTF8(const Name: String; Data: TStrings);
const
  cUTF8: array [1..3] of Byte = ($EF, $BB, $BF);
var
  W_TXT: UTF8String;
  fs: TFileStream;
  Trimmed: AnsiString;
begin
  Trimmed := Trim(Data.Text);
  if Trimmed <> '' then begin
    W_TXT := AnsiToUTF8(Trimmed);
    fs := TFileStream.Create(Name, fmCreate);
    try
      fs.WriteBuffer(cUTF8[1], SizeOf(cUTF8));
      fs.WriteBuffer(W_TXT[1], Length(W_TXT) * SizeOf(W_TXT[1]));
    finally
      fs.Free
    end;
  end;
end;
BTW, this code of yours would not create even an empty file if the source data was empty. That looks rather suspicious, though it is up to you to decide whether that is an error or not with respect to the rest of your program.
The proper "uploading" of the resulting file or stream to the web is yet another issue (to be asked as a separate question on a Q&A site like SO), related to testing conformance with HTTP. As a foreword, you can read some hints at "WWW server reports error after POST Request by Internet Direct components in Delphi".

In order to have the correct encoding inside the document, you should set it by using the Encoding property in your XML Document, like this:
myXMLDocument.Encoding := 'UTF-8';
I hope this helps.

You simply need to call the SaveToFile method of the document:
XMLDoc.SaveToFile(FileName);
Since you specified the encoding already, the component will use that encoding.
This won't include a BOM, but that's generally what you want for an XML file. The content of the file will specify the encoding.
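A minimal console sketch of this approach, written with the Delphi 6-era unit names. Assumptions: the default MSXML DOM vendor is installed, and out.xml and root are placeholder names. COM is initialized explicitly because TXMLDocument wraps MSXML.

```delphi
program SaveXmlAsUtf8;
{$APPTYPE CONSOLE}
uses
  ActiveX, XMLIntf, XMLDoc;
var
  Doc: IXMLDocument;
  Root: IXMLNode;
begin
  CoInitialize(nil);  // MSXML is a COM server
  try
    Doc := TXMLDocument.Create(nil);  // nil owner: lifetime managed by the interface reference
    Doc.Active := True;
    Doc.Version := '1.0';
    Doc.Encoding := 'UTF-8';          // goes into the XML declaration
    Doc.StandAlone := 'yes';
    Root := Doc.AddChild('root');     // placeholder element name
    Root.Text := 'example';
    Doc.SaveToFile('out.xml');        // encoded according to the Encoding property
    Root := nil;
    Doc := nil;                       // release the COM objects before CoUninitialize
  finally
    CoUninitialize;
  end;
end.
```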
As regards your SaveAsUTF8 method, it is not needed, but it is easy to fix. And that may be instructive to you.
The problem is that you are converting to UTF-16 when you assign to a WideString variable. You should instead put the UTF-8 text into an AnsiString variable. Changing the type of the variable that you named W_TXT to AnsiString is enough.
The function might look like this:
procedure SaveAsUTF8(const Name: string; Data: TStrings);
const
  UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
var
  utf8: AnsiString;
  fs: TFileStream;
begin
  utf8 := AnsiToUTF8(Data.Text);
  fs := TFileStream.Create(Name, fmCreate);
  try
    fs.WriteBuffer(UTF8BOM, SizeOf(UTF8BOM));
    fs.WriteBuffer(Pointer(utf8)^, Length(utf8));
  finally
    fs.Free;
  end;
end;

Another solution (note that TStreamWriter requires Delphi 2010 or later, so it is not available in Delphi 6):
procedure SaveAsUTF8(const Name: string; Data: TStrings);
var
  fs: TFileStream;
  vStreamWriter: TStreamWriter;
begin
  fs := TFileStream.Create(Name, fmCreate);
  try
    vStreamWriter := TStreamWriter.Create(fs, TEncoding.UTF8);
    try
      vStreamWriter.Write(Data.Text);
    finally
      vStreamWriter.Free;
    end;
  finally
    fs.Free;
  end;
end;

Related

Saving my BlobField in a different codepage

My simple code is:
var
  TMPStream : TStringStream;
  myencoding: TEncoding;
...
try
  myencoding := TEncoding.GetEncoding(CP_UTF8);
  TMPStream := TStringStream.Create('', myencoding);
  try
    (IBQueryTMP.FieldByName('MYTEXT') as TBlobField).SaveToStream(TMPStream);
    TMPStream.SaveToFile(ExtractFilePath(Application.ExeName) + 'myfile.txt');
  except
    ShowMessage('Error');
  end;
finally
  TMPStream.Free;
  myencoding.Free;
end;
When I look at the file (for instance, in Notepad++), the encoding is UTF-16 Little Endian, although I declared UTF-8. The database is UTF-8, too.
What's wrong?
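One possible explanation, offered as an assumption rather than a confirmed diagnosis: TBlobField.SaveToStream copies the raw blob bytes, and the encoding passed to TStringStream is only applied when text goes through DataString or WriteString, so TMPStream.SaveToFile writes whatever bytes the field produced. If those bytes are UTF-16 (as they would be for a wide memo field), the file is UTF-16LE regardless of myencoding. A sketch that takes the field's text instead and encodes it explicitly; the unit name FieldExport and the procedure SaveFieldAsUtf8 are hypothetical, and it assumes MYTEXT is a text (memo) blob and Delphi 2009 or later:

```delphi
unit FieldExport;

interface

uses
  SysUtils, Classes, DB;

procedure SaveFieldAsUtf8(Field: TField; const FileName: string);

implementation

// Hypothetical helper: writes the field's text to FileName as UTF-8 (no BOM).
procedure SaveFieldAsUtf8(Field: TField; const FileName: string);
var
  Bytes: TBytes;
  Fs: TFileStream;
begin
  // AsString yields the decoded text, so the on-disk encoding
  // is chosen here instead of being inherited from the raw blob bytes.
  Bytes := TEncoding.UTF8.GetBytes(Field.AsString);
  Fs := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Bytes) > 0 then
      Fs.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Fs.Free;
  end;
end;

end.
```

Usage would then be SaveFieldAsUtf8(IBQueryTMP.FieldByName('MYTEXT'), ExtractFilePath(Application.ExeName) + 'myfile.txt'), with no TStringStream or separately allocated TEncoding to free.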

Encode string variable to UTF-16LE base64 using Delphi

I'm looking to encode a string variable to UTF-16LE and base64; the problem is that I can't find anything about how to do UTF-16LE in Delphi.
Example in Python:
from base64 import b64encode
b64encode('my text'.encode('UTF-16LE'))
Example in Ruby:
require "base64"
Base64.encode64('my text'.encode('UTF-16LE'))
How can I do this in Delphi?
Updated:
procedure TFormTest.btnTestClick(Sender: TObject);
var
  dest, src: TEncoding;
  srcBytes, destBytes: TBytes;
  Encoder: TIdEncoderMime;
begin
  Encoder := TIdEncoderMime.Create(nil);
  src := TEncoding.Unicode;
  srcBytes := src.GetBytes(Edit1.Text);
  Edit2.Text := Encoder.EncodeBytes(srcBytes);
  FreeAndNil(Encoder);
end;
Is a valid UTF-16LE base64 string created?
PowerShell tells me it is invalid.
Command to use:
(New-Object System.Net.WebClient).DownloadFile('http://localhos/update_program.exe','updater.exe'); Start-Process 'updater.exe'
Output error:
Missing expression after unary operator '-'.
What you have shown is technically correct. The String gets encoded to a UTF-16LE byte array first, and then the bytes get base64 encoded.
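For comparison, a sketch of the same two steps without Indy, assuming Delphi XE7 or later (where System.NetEncoding is available); the program and function names are arbitrary. One detail worth checking when the result is pasted into a command line: TNetEncoding.Base64 wraps its output every 76 characters by default, so the sketch creates a TBase64Encoding with CharsPerLine = 0 to get a single unbroken line. The output matches the Python example from the question.

```delphi
program Utf16LeBase64;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.NetEncoding;

// UTF-16LE bytes first, then base64 of those bytes.
function EncodeUtf16LeBase64(const S: string): string;
var
  Base64: TBase64Encoding;
begin
  Base64 := TBase64Encoding.Create(0);  // 0 = no line breaks in the output
  try
    // TEncoding.Unicode is UTF-16 little-endian
    Result := Base64.EncodeBytesToString(TEncoding.Unicode.GetBytes(S));
  finally
    Base64.Free;
  end;
end;

begin
  Writeln(EncodeUtf16LeBase64('my text'));  // bQB5ACAAdABlAHgAdAA=
end.
```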
Since you are calling TIdEncoderMIME.Create() to create an object instance, you should be using the Encode() instance method instead of the EncodeBytes() static method (which creates another instance internally):
procedure TFormTest.btnTestClick(Sender: TObject);
var
  Encoder: TIdEncoderMIME;
begin
  Encoder := TIdEncoderMIME.Create(nil);
  try
    // prior to Indy 10.6.0, use TIdTextEncoding.Unicode
    // instead of IndyTextEncoding_UTF16LE...
    Edit2.Text := Encoder.Encode(Edit1.Text, IndyTextEncoding_UTF16LE);
  finally
    Encoder.Free;
  end;
end;
Which can be simplified further using the EncodeString() static method:
procedure TFormTest.btnTestClick(Sender: TObject);
begin
  // prior to Indy 10.6.0, use TIdTextEncoding.Unicode
  // instead of IndyTextEncoding_UTF16LE...
  Edit2.Text := TIdEncoderMIME.EncodeString(Edit1.Text, IndyTextEncoding_UTF16LE);
end;
But either way, the output is all the same. So any problem you are still having has to be elsewhere. But you have not provided any details about how you are validating the data, what tools are rejecting it, what errors are actually being reported, etc.

How to save classic Delphi string to disk (and read them back)?

I want to achieve a very very basic task in Delphi: to save a string to disk and load it back. It seems trivial but I had problems doing this TWICE since I upgraded to IOUtils (and one more time before that... this is why I took the 'brilliant' decision to upgrade to IOUtils).
I use something like this:
procedure WriteToFile(CONST FileName: string; CONST uString: string; CONST WriteOp: WriteOperation);
begin
  if WriteOp = woOverwrite
  then IOUtils.TFile.WriteAllText(FileName, uString)   //overwrite
  else IOUtils.TFile.AppendAllText(FileName, uString); //append
end;
Simple, right? What could go wrong? Well, I recently ran into another bug in IOUtils. So, TFile is buggy. The bug is detailed here.
Can anyone share an alternative (or simply your thoughts/ideas) that is not based on IOUtils and is known to work? Well... the code above also worked for a while for me... So, I know it is difficult to guarantee that a piece of code (no matter how small) will really work!
Also, I would REALLY like my WriteToFile procedure to save the string to an ANSI file when possible (when uString contains only ANSI chars) and as Unicode otherwise.
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
The idea is that there are still text editors out there that will wrongly open/interpret a Unicode/UTF file. So, whenever possible, give the user a good old ANSI text file.
So:
- Overwrite/Append
- Save as ANSI when possible
- Memory efficient (don't eat 4GB of ram when the file to load is 2GB)
- Should work with any text file (up to 2GB, obviously)
- No IOUtils (too buggy to be of use)
As for "Then the ReadAFile function should automagically detect the encoding and correctly read the string back": this is not possible.
There exist files that are well-formed when interpreted in more than one text encoding. For instance, see "The Notepad file encoding problem, redux".
This means that your goals are unattainable and that you need to change them.
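A small console sketch (Delphi 2009 or later, program name arbitrary) of why detection cannot be reliable: the two bytes $C3 $A9 decode without error both as UTF-8 and as Windows-1252, giving different text, and nothing in the bytes says which one was meant.

```delphi
program AmbiguousBytes;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  Bytes: TBytes;
  Cp1252: TEncoding;
begin
  // Two bytes that are valid text in more than one encoding
  SetLength(Bytes, 2);
  Bytes[0] := $C3;
  Bytes[1] := $A9;
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    Writeln(Length(TEncoding.UTF8.GetString(Bytes)));  // 1: decoded as U+00E9 (e acute)
    Writeln(Length(Cp1252.GetString(Bytes)));          // 2: decoded as U+00C3 U+00A9
  finally
    Cp1252.Free;  // GetEncoding returns an instance owned by the caller
  end;
end.
```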
My advice is to do the following:
Pick a single encoding, UTF-8, and stick to it.
If the file does not exist, create it and write UTF-8 bytes to it.
If the file exists, open it, seek to the end, and append UTF-8 bytes.
A text editor that does not understand UTF-8 is not worth supporting. If you feel inclined, include a UTF-8 BOM when you create the file. Use TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString to encode and decode.
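A minimal sketch of the reading side of that advice, assuming Delphi 2009 or later (for TEncoding) and a file that was written as UTF-8; the unit name and ReadUtf8File are hypothetical. It avoids IOUtils and does no detection beyond skipping an optional BOM. Note that it holds the whole file in memory, once as bytes and once as a UTF-16 string, so it does not meet the 2GB memory goal from the question.

```delphi
unit Utf8FileRead;

interface

uses
  SysUtils, Classes;

function ReadUtf8File(const FileName: string): string;

implementation

// Reads the whole file and decodes it as UTF-8, skipping an optional BOM.
function ReadUtf8File(const FileName: string): string;
var
  Stream: TFileStream;
  Bytes: TBytes;
  Start: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Bytes, Stream.Size);
    if Length(Bytes) > 0 then
      Stream.ReadBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
  Start := 0;
  if (Length(Bytes) >= 3) and (Bytes[0] = $EF) and (Bytes[1] = $BB) and (Bytes[2] = $BF) then
    Start := 3;  // skip the UTF-8 BOM
  Result := TEncoding.UTF8.GetString(Bytes, Start, Length(Bytes) - Start);
end;

end.
```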
Just use TStringList, as long as the file is smaller than about 50-100 MB (the limit depends on CPU speed):
procedure ReadTextFromFile(const AFileName: string; SL: TStringList);
begin
  SL.Clear;
  SL.DefaultEncoding := TEncoding.ANSI; // we know that old files have this encoding
  SL.LoadFromFile(AFileName, nil);      // let TStringList detect the real encoding from the BOM;
                                        // if there is none, it just uses DefaultEncoding
end;
procedure WriteTextToFile(const AFileName: string; const TextToWrite: string);
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    if FileExists(AFileName) then
      ReadTextFromFile(AFileName, SL); // read the whole file with encoding detection
    SL.Add(TextToWrite);
    SL.SaveToFile(AFileName, TEncoding.UTF8); // write the file with the new encoding
    // DO NOT SET SL.WriteBOM to False!!!
  finally
    SL.Free;
  end;
end;
The IniFiles unit should support Unicode, at least according to this answer: How do I read a UTF8 encoded INI file?
INI files are quite commonly used to store strings, integers, booleans and even string lists.
procedure TConfig.ReadValues();
var
  appINI: TIniFile;
begin
  appINI := TIniFile.Create(ChangeFileExt(Application.ExeName, '.ini'));
  try
    FMainScreen_Top := appINI.ReadInteger('Options', 'MainScreen_Top', -1);
    FMainScreen_Left := appINI.ReadInteger('Options', 'MainScreen_Left', -1);
    FUserName := appINI.ReadString('Login', 'UserName', '');
    FDevMode := appINI.ReadBool('Globals', 'DevMode', False);
  finally
    appINI.Free;
  end;
end;
procedure TConfig.WriteValues(OnlyWriteAnalyzer: Boolean);
var
  appINI: TIniFile;
begin
  appINI := TIniFile.Create(ChangeFileExt(Application.ExeName, '.ini'));
  try
    appINI.WriteInteger('Options', 'MainScreen_Top', FMainScreen_Top);
    appINI.WriteInteger('Options', 'MainScreen_Left', FMainScreen_Left);
    appINI.WriteString('Login', 'UserName', FUserName);
    appINI.WriteBool('Globals', 'DevMode', FDevMode);
  finally
    appINI.Free;
  end;
end;
Also see the Embarcadero documentation on INI files: http://docwiki.embarcadero.com/Libraries/Seattle/en/System.IniFiles.TIniFile
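A hedged sketch for the Unicode case, assuming Delphi 2009 or later: TIniFile goes through the Windows profile API, whereas TMemIniFile accepts a TEncoding, so the file can be read and written as UTF-8 explicitly. The section and key names are copied from the example above; the unit name and the two helpers are hypothetical.

```delphi
unit IniUtf8;

interface

uses
  SysUtils, IniFiles;

procedure WriteUserNameUtf8(const IniPath, UserName: string);
function ReadUserNameUtf8(const IniPath: string): string;

implementation

procedure WriteUserNameUtf8(const IniPath, UserName: string);
var
  Ini: TMemIniFile;
begin
  // Loads IniPath (if it exists) as UTF-8; UpdateFile saves with the same encoding
  Ini := TMemIniFile.Create(IniPath, TEncoding.UTF8);
  try
    Ini.WriteString('Login', 'UserName', UserName);
    Ini.UpdateFile;  // TMemIniFile only writes to disk here
  finally
    Ini.Free;
  end;
end;

function ReadUserNameUtf8(const IniPath: string): string;
var
  Ini: TMemIniFile;
begin
  Ini := TMemIniFile.Create(IniPath, TEncoding.UTF8);
  try
    Result := Ini.ReadString('Login', 'UserName', '');
  finally
    Ini.Free;
  end;
end;

end.
```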
Code based on David's suggestions:
{--------------------------------------------------------------------------------------------------
  READ/WRITE UNICODE
--------------------------------------------------------------------------------------------------}
procedure WriteToFile(CONST FileName: string; CONST aString: String; CONST WriteOp: WriteOperation = woOverwrite; WritePreamble: Boolean = FALSE); overload; { Write Unicode strings to a UTF8 file. It can also write a preamble }
VAR
  Stream: TFileStream;
  Preamble: TBytes;
  sUTF8: RawByteString;
  aMode: Integer;
begin
  ForceDirectories(ExtractFilePath(FileName));
  if (WriteOp = woAppend) AND FileExists(FileName)
  then aMode := fmOpenReadWrite
  else aMode := fmCreate;
  Stream := TFileStream.Create(FileName, aMode or fmShareDenyWrite); { Allow others to read during our writes }
  TRY
    sUTF8 := UTF8Encode(aString); { UTF-16 to UTF-8 conversion; sUTF8 holds the UTF-8 bytes }
    if (aMode = fmCreate) AND WritePreamble then
    begin
      Preamble := TEncoding.UTF8.GetPreamble;
      Stream.WriteBuffer(Preamble[0], Length(Preamble));
    end;
    if aMode = fmOpenReadWrite
    then Stream.Position := Stream.Size; { Go to the end }
    Stream.WriteBuffer(PAnsiChar(sUTF8)^, Length(sUTF8));
  FINALLY
    FreeAndNil(Stream);
  END;
end;
procedure WriteToFile(CONST FileName: string; CONST aString: AnsiString; CONST WriteOp: WriteOperation); overload;
begin
  WriteToFile(FileName, String(aString), WriteOp, FALSE);
end;
function ReadFile(CONST FileName: string): String; { Tries to auto-detect the file type (ANSI, UTF8, UTF16, etc.). Works with UNC paths }
begin
  Result := System.IOUtils.TFile.ReadAllText(FileName);
end;

StrAlloc not working after migrating to Delphi XE7

I am working on an application which was recently upgraded from Delphi 2007 to XE7. There is one particular scenario where the conversion of TMemoryStream to PChar is failing. Here is the code:
procedure TCReport.CopyToClipboard;
var
  CTextStream: TMemoryStream;
  PValue: PChar;
begin
  CTextStream := TMemoryStream.Create;
  //Assume that this code is saving a report column to CTextStream
  //Verified that the value in CTextStream is correct
  Self.SaveToTextStream(CTextStream);
  //The value stored in PValue below is corrupt
  PValue := StrAlloc(CTextStream.Size + 1);
  CTextStream.Read(PValue^, CTextStream.Size + 1);
  PValue[CTextStream.Size] := #0;
  { Copy text stream to clipboard }
  Clipboard.Clear;
  Clipboard.SetTextBuf(PValue);
  CTextStream.Free;
  StrDispose(PValue);
end;
Here is the code for SaveToTextStream:
procedure TCReport.SaveToTextStream(CTextStream: TStream);
var
  CBinaryMemoryStream: TMemoryStream;
  CWriter: TWriter;
begin
  CBinaryMemoryStream := TMemoryStream.Create;
  CWriter := TWriter.Create(CBinaryMemoryStream, 24);
  try
    CWriter.Ancestor := nil;
    CWriter.WriteRootComponent(Self);
    CWriter.Free;
    CBinaryMemoryStream.Position := 0;
    { Convert Binary 'WriteComponent' stream to text }
    ObjectBinaryToText(CBinaryMemoryStream, CTextStream);
    CTextStream.Position := 0;
  finally
    CBinaryMemoryStream.Free;
  end;
end;
I observed that StrLen(PValue) also comes out as half the size of the TMemoryStream, whereas in Delphi 2007 it was the same as the size of the TMemoryStream.
I know that the above code is assuming the size of a char to be 1 byte, and that could be a problem. But I tried multiple approaches, and nothing works.
Could you suggest a better way to go about this conversion?
Yet again, this is the issue of Delphi 2009 and later using Unicode text. In Delphi 2007 and earlier:
Char is an alias to AnsiChar.
PChar is an alias to PAnsiChar.
string is an alias to AnsiString.
In Delphi 2009 and later:
Char is an alias to WideChar.
PChar is an alias to PWideChar.
string is an alias to UnicodeString.
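A tiny console check of the aliasing above (program name arbitrary); on Delphi 2009 and later it prints 1, 2, 2. The same fact explains the half-length StrLen observed in the question: when 8-bit text is read through a PWideChar, every two bytes are treated as one character.

```delphi
program CharSizes;
{$APPTYPE CONSOLE}
begin
  Writeln(SizeOf(AnsiChar));  // 1
  Writeln(SizeOf(WideChar));  // 2
  Writeln(SizeOf(Char));      // 2 in Delphi 2009 and later, 1 in Delphi 2007 and earlier
end.
```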
Your code is written assuming that PChar is PAnsiChar. Hence your problems. You need to stop using StrAlloc anyway. You are making life hard for yourself by manually allocating heap memory here. Let the compiler do the work.
You need to obtain your text in a string variable, and then simply do:
Clipboard.AsText := MyStrVariable;
Exactly how best to obtain the string depends on the facilities that TCReport offers. I expect that it will yield a string directly in which case you'll write something like this:
procedure TCReport.CopyToClipboard;
begin
  Clipboard.AsText := Self.ReportAsText;
end;
I'm guessing as to what functionality your TCReport offers, but I'm sure you know.
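If TCReport only offers a stream, a minimal sketch of turning it into a string first. It assumes the stream holds 8-bit text in the system ANSI code page, which the half-length StrLen observation suggests but which you should verify for ObjectBinaryToText in your version; the unit name and StreamToString are hypothetical.

```delphi
unit StreamText;

interface

uses
  SysUtils, Classes;

function StreamToString(Stream: TStream; Encoding: TEncoding): string;

implementation

// Hypothetical helper: reads the whole stream and decodes it with Encoding.
function StreamToString(Stream: TStream; Encoding: TEncoding): string;
var
  Bytes: TBytes;
begin
  Stream.Position := 0;
  SetLength(Bytes, Stream.Size);
  if Length(Bytes) > 0 then
    Stream.ReadBuffer(Bytes[0], Length(Bytes));
  Result := Encoding.GetString(Bytes);
end;

end.
```

With it, CopyToClipboard reduces to calling SaveToTextStream and then Clipboard.AsText := StreamToString(CTextStream, TEncoding.ANSI), with the TMemoryStream freed in a try/finally and no StrAlloc at all.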
Building on what hvd and David Heffernan wrote above, one possible approach is to change CTextStream in CopyToClipboard to a TStringStream, as follows:
procedure TCReport.CopyToClipboard;
var
  CTextStream: TStringStream;
begin
  CTextStream := TStringStream.Create;
  try
    //Assume no error with Self.SaveToTextStream
    Self.SaveToTextStream(CTextStream);
    { Copy text stream to clipboard }
    Clipboard.AsText := CTextStream.DataString;
  finally
    CTextStream.Free;
  end;
end;
But you should make sure that SaveToTextStream provides CTextStream with text data in exactly the encoding that the TStringStream uses.

How to use Delphi XE's TEncoding to save Cyrillic or ShiftJis text to a file?

I'm trying to save some lines of text in a code page different from my system's, such as Cyrillic, to a TFileStream using Delphi XE. However, I can't find any code sample that produces such encoded files.
I tried using the same code as TStrings.SaveToStream, however I'm not sure I implemented it correctly (the WriteBom part, for example) and would like to know how it would be done elsewhere. Here is my code:
FEncoding := TEncoding.GetEncoding(1251);
FFilePool := TObjectDictionary<string,TFileStream>.Create([doOwnsValues]);
//...
procedure WriteToFile(const aFile, aText: string);
var
  Preamble, Buffer: TBytes;
begin
  // Create the file if it doesn't exist
  if not FFilePool.ContainsKey(aFile) then
  begin
    // Create the file
    FFilePool.Add(aFile, TFileStream.Create(aFile, fmCreate));
    // Write the BOM
    Preamble := FEncoding.GetPreamble;
    if Length(Preamble) > 0 then
      FFilePool[aFile].WriteBuffer(Preamble[0], Length(Preamble));
  end;
  // Write to the file
  Buffer := FEncoding.GetBytes(aText);
  FFilePool[aFile].WriteBuffer(Buffer[0], Length(Buffer));
end;
Thanks in advance.
Not sure what example you are looking for; maybe the following can help. The example converts Unicode strings (SL) to ANSI Cyrillic:
procedure SaveCyrillic(SL: TStrings; Stream: TStream);
var
  CyrillicEncoding: TEncoding;
begin
  CyrillicEncoding := TEncoding.GetEncoding(1251);
  try
    SL.SaveToStream(Stream, CyrillicEncoding);
  finally
    CyrillicEncoding.Free;
  end;
end;
If I understand correctly, it's pretty simple. Declare an AnsiString with an affinity for Cyrillic code page 1251:
type
  // The code page for ANSI-Cyrillic is 1251
  CyrillicString = type AnsiString(1251);
Then assign your Unicode string to one of these:
var
  UnicodeText: string;
  CyrillicText: CyrillicString;
....
CyrillicText := UnicodeText;
You can then write CyrillicText to a stream in the traditional manner:
if Length(CyrillicText) > 0 then
  Stream.WriteBuffer(CyrillicText[1], Length(CyrillicText));
There should be no BOM for an ANSI encoded text file.
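A quick way to confirm this for the question's own code (Delphi 2009 or later, program name arbitrary): the encoding returned by TEncoding.GetEncoding(1251) has an empty preamble, so the `if Length(Preamble) > 0` guard in WriteToFile simply writes nothing, while UTF-8 has a 3-byte preamble for comparison.

```delphi
program PreambleLengths;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  Cyrillic: TEncoding;
begin
  Cyrillic := TEncoding.GetEncoding(1251);
  try
    Writeln(Length(Cyrillic.GetPreamble));        // 0: a plain code page has no BOM
    Writeln(Length(TEncoding.UTF8.GetPreamble));  // 3: EF BB BF
  finally
    Cyrillic.Free;  // GetEncoding returns a new instance owned by the caller
  end;
end.
```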
