How to convert widestring to string of unicode bytes?

When I create a file in Notepad containing, for example, the string 1d and save it as a Unicode file, I get a 6-byte file containing the bytes #255#254#49#0#100#0.
Now I need a Delphi 6 function that takes, for example, the WideString 1d as input and returns a string containing #255#254#49#0#100#0 (and vice versa).
How can I do that?
Thanks.
D

It is easier to read bytes if you use hex. #255#254#49#0#100#0 is represented in hex as
FF FE 31 00 64 00
Where
FF FE is the UTF-16LE BOM, which identifies the following bytes as UTF-16 code units stored in little-endian byte order.
31 00 is the character '1' (U+0031).
64 00 is the character 'd' (U+0064).
Creating a WideString that contains this data is very easy:
var
W: WideString;
S: String;
begin
S := '1d';
W := WideChar($FEFF) + S;
end;
When an AnsiString (which is Delphi 6's default string type) is assigned to a WideString, the RTL automatically converts the AnsiString data from 8-bit to UTF-16LE using the local machine's default Ansi charset for the conversion.
Going the other way is just as easy:
var
W: WideString;
S: String;
begin
W := WideChar($FEFF) + '1d';
S := Copy(W, 2, MaxInt);
end;
When you assign a WideString to an AnsiString, the RTL automatically converts the WideString data from UTF-16LE to 8-bit using the default Ansi charset.
If the default Ansi charset is not suitable for your needs (say the 8-bit data needs to be encoded in a different charset), you will have to call the Win32 API MultiByteToWideChar() and WideCharToMultiByte() functions directly (or use a third-party library with equivalent functionality) so you can specify the desired charset/codepage.
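As a rough sketch of that approach, the two Win32 calls can be wrapped like this. The helper names AnsiToWide and WideToAnsi are my own (hypothetical), code page 1251 is only an example, and error handling is reduced to returning an empty string:

```pascal
program CodePageDemo;
{$APPTYPE CONSOLE}
uses
  Windows;

// Hypothetical helper: decode 8-bit text in the given code page to UTF-16.
function AnsiToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then Exit;
  // The first call only asks how many WideChars the converted text needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then Exit; // e.g. invalid code page
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(Result), Len);
end;

// Hypothetical helper: encode UTF-16 text as 8-bit text in the given code page.
function WideToAnsi(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then Exit;
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), nil, 0, nil, nil);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), PAnsiChar(Result), Len, nil, nil);
end;

var
  S: AnsiString;
  W: WideString;
begin
  SetLength(S, 1);
  S[1] := AnsiChar($D4);      // byte $D4 is U+0424 in Windows code page 1251
  W := AnsiToWide(S, 1251);   // explicit code page instead of the system default
  if Length(W) = 1 then
    WriteLn(Ord(W[1]));       // numeric value of the decoded UTF-16 code unit
  WriteLn(WideToAnsi(W, 1251) = S); // round trip back to the original byte
end.
```

Both functions call the API twice: once to measure the required length and once to convert. This needs the requested code page to be installed on the machine.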
Now then, Delphi 6 does not offer any useful helpers to read Unicode files (Delphi 2009 and later do), so you will have to do it yourself manually, for example:
function ReadUnicodeFile(const FileName: string): WideString;
const
cBOM_UTF8: array[0..2] of Byte = ($EF, $BB, $BF);
cBOM_UTF16BE: array[0..1] of Byte = ($FE, $FF);
cBOM_UTF16LE: array[0..1] of Byte = ($FF, $FE);
cBOM_UTF32BE: array[0..3] of Byte = ($00, $00, $FE, $FF);
cBOM_UTF32LE: array[0..3] of Byte = ($FF, $FE, $00, $00);
var
FS: TFileStream;
BOM: array[0..3] of Byte;
NumRead: Integer;
U8: UTF8String;
U32: UCS4String;
I: Integer;
begin
Result := '';
FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
try
NumRead := FS.Read(BOM, 4);
// UTF-8
if (NumRead >= 3) and CompareMem(@BOM, @cBOM_UTF8, 3) then
begin
if NumRead > 3 then
FS.Seek(-(NumRead-3), soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end
// the UTF-16LE and UTF-32LE BOMs are ambiguous! Check for UTF-32 first...
// UTF-32
else if (NumRead = 4) and (CompareMem(@BOM, @cBOM_UTF32LE, 4) or CompareMem(@BOM, @cBOM_UTF32BE, 4)) then
begin
// UCS4String is not a true string type, it is a dynamic array, so
// it must include room for a null terminator...
SetLength(U32, ((FS.Size - FS.Position) div SizeOf(UCS4Char)) + 1);
if Length(U32) > 1 then
begin
FS.ReadBuffer(PUCS4Chars(U32)^, (Length(U32) - 1) * SizeOf(UCS4Char));
if CompareMem(@BOM, @cBOM_UTF32BE, 4) then
begin
for I := Low(U32) to High(U32) do
begin
U32[I] := ((U32[I] and $000000FF) shl 24) or
((U32[I] and $0000FF00) shl 8) or
((U32[I] and $00FF0000) shr 8) or
((U32[I] and $FF000000) shr 24);
end;
end;
U32[High(U32)] := 0;
// Note: UCS4StringToWidestring() does not actually support UTF-16,
// only UCS-2! If you need to handle UTF-16 surrogates, you will
// have to convert from UTF-32 to UTF-16 manually, there is no RTL
// or Win32 function that will do it for you...
Result := UCS4StringToWidestring(U32);
end;
end
// UTF-16
else if (NumRead >= 2) and (CompareMem(@BOM, @cBOM_UTF16LE, 2) or CompareMem(@BOM, @cBOM_UTF16BE, 2)) then
begin
if NumRead > 2 then
FS.Seek(-(NumRead-2), soCurrent);
SetLength(Result, (FS.Size - FS.Position) div SizeOf(WideChar));
if Length(Result) > 0 then
begin
FS.ReadBuffer(PWideChar(Result)^, Length(Result) * SizeOf(WideChar));
if CompareMem(@BOM, @cBOM_UTF16BE, 2) then
begin
for I := 1 to Length(Result) do
begin
Result[I] := WideChar(
((Word(Result[I]) and $00FF) shl 8) or
((Word(Result[I]) and $FF00) shr 8)
);
end;
end;
end;
end
// something else, assuming UTF-8
else
begin
if NumRead > 0 then
FS.Seek(-NumRead, soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end;
finally
FS.Free;
end;
end;
Update: if you want to store UTF-16LE encoded bytes inside an AnsiString variable (why?), you can Move() the raw bytes of a WideString's character data into the memory block of an AnsiString, e.g.:
function WideStringAsAnsi(const AValue: WideString): AnsiString;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PAnsiChar(Result)^, Length(Result));
end;
var
W: WideString;
S: AnsiString;
begin
W := WideChar($FEFF) + '1d';
S := WideStringAsAnsi(W);
end;
I would not suggest misusing AnsiString like this, though. If you need bytes, operate on bytes, e.g.:
type
TBytes = array of Byte;
function WideStringAsBytes(const AValue: WideString): TBytes;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PByte(Result)^, Length(Result));
end;
var
W: WideString;
B: TBytes;
begin
W := WideChar($FEFF) + '1d';
B := WideStringAsBytes(W);
end;
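The question also asks for the reverse direction. A minimal sketch of it, assuming the bytes are already UTF-16LE, could look like this; BytesAsWideString is a hypothetical name that simply mirrors WideStringAsBytes:

```pascal
program BytesToWideDemo;
{$APPTYPE CONSOLE}

type
  TBytes = array of Byte; // local declaration, as Delphi 6 has no TBytes

// Hypothetical counterpart of WideStringAsBytes: copies raw UTF-16LE bytes
// into the character data of a WideString without any conversion.
function BytesAsWideString(const ABytes: TBytes): WideString;
var
  Count: Integer;
begin
  // A trailing odd byte cannot form a UTF-16 code unit, so it is ignored.
  Count := Length(ABytes) div SizeOf(WideChar);
  SetLength(Result, Count);
  if Count > 0 then
    Move(ABytes[0], PWideChar(Result)^, Count * SizeOf(WideChar));
end;

var
  B: TBytes;
  W: WideString;
begin
  SetLength(B, 6);
  B[0] := $FF; B[1] := $FE; // UTF-16LE BOM
  B[2] := $31; B[3] := $00; // '1'
  B[4] := $64; B[5] := $00; // 'd'
  W := BytesAsWideString(B);
  // W[1] is the BOM character U+FEFF; the actual text starts at W[2].
  WriteLn(Length(W));          // 3
  WriteLn(Copy(W, 2, MaxInt)); // 1d
end.
```

The BOM is copied like any other character, so strip it with Copy() if you only want the text.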

A WideString is already a string of Unicode bytes, specifically in UTF-16LE encoding.
The two extra bytes you see in the Unicode file saved by Notepad are called a BOM (Byte Order Mark). This is a special character in Unicode that is used to indicate the order of bytes in the data that follows, to ensure that the string is decoded correctly.
Adding a BOM to a string (which is what you are asking for) is simply a matter of prefixing the string with that special BOM character. The BOM character is U+FEFF (that is the Unicode notation for the hex representation of a 'character').
So, the function you need is very simple:
function WideStringWithBOM(aString: WideString): WideString;
const
BOM = WideChar($FEFF);
begin
result := BOM + aString;
end;
However, although the function is very simple, this possibly isn't the end of the matter.
The string that is returned from this function will include the BOM and as far as any Delphi code is concerned that BOM will be treated as part of the string.
Typically you would only add a BOM to a string when passing that string to some external recipient (via a file or web service response, for example) and there is no other mechanism for indicating the encoding you have used.
Likewise, when reading strings from some received data which may be Unicode you should check the first two bytes:
If you find #255#254 ($FFFE) then you know that the bytes in the U+FEFF BOM have been switched (U+FFFE is not a valid Unicode character), i.e. the string that follows is UTF-16LE. Therefore, for a Delphi WideString you can discard those first two bytes and load the remaining bytes directly into a suitable WideString variable.
If you find #254#255 then the bytes in the U+FEFF BOM have not been switched around, i.e. you know that the string that follows is UTF-16BE. In that case you again need to discard the first two bytes, but when loading the remaining bytes into the WideString you must switch each pair of bytes around to convert from the UTF-16BE bytes to the UTF-16LE encoding of a WideString.
If the first 2 bytes are not #255#254 (or vice versa) then you are either dealing with UTF-16LE without a BOM or possibly some other encoding entirely.
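The check on the first two bytes can be sketched as follows. The TUtf16Bom type and the DetectUtf16Bom function are names I made up for illustration, and the sketch only looks for the two UTF-16 marks discussed here:

```pascal
program BomDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical result type for the three cases described above.
  TUtf16Bom = (bomNone, bomUtf16LE, bomUtf16BE);

// Inspects the first two bytes of received data and reports which
// UTF-16 byte order mark, if any, they contain.
function DetectUtf16Bom(const Data: array of Byte): TUtf16Bom;
begin
  Result := bomNone;
  if Length(Data) < 2 then Exit;
  if (Data[0] = $FF) and (Data[1] = $FE) then
    Result := bomUtf16LE   // #255#254: load the rest as-is
  else if (Data[0] = $FE) and (Data[1] = $FF) then
    Result := bomUtf16BE;  // #254#255: swap each byte pair while loading
end;

begin
  WriteLn(DetectUtf16Bom([$FF, $FE, $31, $00]) = bomUtf16LE); // TRUE
  WriteLn(DetectUtf16Bom([$FE, $FF, $00, $31]) = bomUtf16BE); // TRUE
  WriteLn(DetectUtf16Bom([$31, $00]) = bomNone);              // TRUE
end.
```

When the result is bomNone, the caller still has to decide which encoding to assume.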
Good luck. :)

Related

Delphi FMX dcpcrypt wrong result on macOS 64-bit

I am using Delphi 10.3.2
I am trying to use the DCPCrypt units with the firemonkey framework to encrypt a string.
It works 100% on Win32, Win64 and macOS 32 targets, the result is always the same. But when I compile for macOS 64, the result is different.
This is the code used:
function EncodeAES(code:ansistring; key:ansistring):string;
var
s,u:ansistring;
enc: TEncoding;
k,iv, Data, Crypt: TBytes;
Cipher: TDCP_rijndael;
begin
u:='';
enc:=TEncoding.ANSI;
Data := enc.GetBytes(code); // VMpuXJGbUNOv
k:=enc.GetBytes(key); // kj3214ed)k32nre2
iv:=enc.GetBytes(u);
Cipher:=TDCP_rijndael.Create(nil);
Cipher.Init(K[0], 128, @iv[0]);
Crypt:=Copy(Data, 0, Length(Data));
BytePadding(Crypt, Cipher.BlockSize, pmPKCS7);
Cipher.EncryptECB(Crypt[0], Crypt[0]);
Cipher.Free;
s:=tencoding.ANSI.GetString(crypt);
result:=StringToHex(s);
end;
and this is the BytePadding function:
procedure BytePadding(var Data: TBytes; BlockSize: integer; PaddingMode: TPaddingMode);
var
I, DataBlocks, DataLength, PaddingStart, PaddingCount: integer;
begin
BlockSize := BlockSize div 8;
if PaddingMode in [pmZeroPadding, pmRandomPadding] then
if Length(Data) mod BlockSize = 0 then Exit;
DataBlocks := (Length(Data) div BlockSize) + 1;
DataLength := DataBlocks * BlockSize;
PaddingCount := DataLength - Length(Data);
if PaddingMode in [pmANSIX923, pmISO10126, pmPKCS7] then
if PaddingCount > $FF then Exit;
PaddingStart := Length(Data);
SetLength(Data, DataLength);
case PaddingMode of
pmZeroPadding, pmANSIX923, pmISO7816: // fill with $00 bytes
FillChar(Data[PaddingStart], PaddingCount, 0);
pmPKCS7: // fill with PaddingCount bytes
FillChar(Data[PaddingStart], PaddingCount, PaddingCount);
pmRandomPadding, pmISO10126: // fill with random bytes
for I := PaddingStart to DataLength-1 do Data[I] := Random($FF);
end;
case PaddingMode of
pmANSIX923, pmISO10126:
Data[DataLength-1] := PaddingCount; // set end-marker with number of bytes added
pmISO7816:
Data[PaddingStart] := $80; // set fixed end-marker $80
end;
end;
I call the function like:
procedure TForm1.BtnClick(Sender: TObject);
var
s:string;
begin
s:=EncodeAES('VMpuXJGbUNOv','kj3214ed)k32nre2');
end;
Result with Win32/Win64/macOS 32 (correct):
E32A9DE47CC60BDB70CA27885128D17A
Result with macOS 64 (wrong):
CF622155545E485AC3A083E8A0478493
What am I doing wrong?

Encryption operates on binary data, not textual data. When dealing with text, you have to encode characters to bytes before encryption, and then decode bytes to characters after decryption. Which means using the same character encoding before encryption and after decryption. You are attempting to do that encoding, but you are not accounting for the fact that TEncoding.ANSI is NOT portable across OS platforms, or even across different machines using the same OS platform. For better portability, you need to use a consistent encoding, such as TEncoding.UTF8.
Also, TEncoding operates only with UnicodeString, so using TEncoding.GetBytes() and TEncoding.GetString() with AnsiString will perform implicit conversions between ANSI and Unicode using the RTL's definition of ANSI, not yours, producing bytes you are likely not expecting if your strings contain any non-ASCII characters in them.
Your EncodeAES() function is best off using (Unicode)String for all of its string handling and forget that AnsiString even exists. Although platforms like Linux are largely UTF-8 systems, Delphi's default (Unicode)string uses UTF-16 on all platforms. If you want to encode ANSI strings, use RawByteString to avoid any implicit conversions if calling code wants to use UTF8String, AnsiString(N), etc. Encrypt the 8bit characters as-is without using TEncoding at all. Use TEncoding only for UnicodeString data.
Lastly, your EncodeAES() function should not be decoding encrypted bytes to a UnicodeString just to convert that to a hex string. You should be hex-encoding the encrypted bytes as-is.
Try this instead:
function EncodeAES(const Data: TBytes; const Key: string): string; overload;
const
Hex: array[0..15] of Char = ('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
var
enc: TEncoding;
iv, k, Crypt: TBytes;
Cipher: TDCP_rijndael;
B: Byte;
I: Integer;
begin
enc := TEncoding.UTF8;
iv := enc.GetBytes('');
k := enc.GetBytes(Key);
Cipher := TDCP_rijndael.Create(nil);
try
Cipher.Init(k[0], 128, @iv[0]);
Crypt := Copy(Data, 0, Length(Data));
BytePadding(Crypt, Cipher.BlockSize, pmPKCS7);
Cipher.EncryptECB(Crypt[0], Crypt[0]);
finally
Cipher.Free;
end;
SetLength(Result, Length(Crypt)*2);
I := Low(Result);
for B in Crypt do
begin
Result[ I ] := Hex[(B shr 4) and $F];
Result[I+1] := Hex[B and $F];
Inc(I, 2);
end;
end;
function EncodeAES(const S, Key: string): string; overload;
begin
Result := EncodeAES(TEncoding.UTF8.GetBytes(S), Key);
end;
function EncodeAES(const S: RawByteString; const Key: string): string; overload;
begin
Result := EncodeAES(BytesOf(S), Key);
end;

How to read all bytes from server using Indy Client in Delphi?

I am using an Indy client to read the message the server sends to me (the client). It sends 512 bytes of data in one go. These 512 bytes are composed of two data types (Word and String). For example, it sends a Word of 2 bytes, then another Word of 2 bytes, then a String of 50 bytes, and so on. I am trying the following code to cope with this problem:
var BufferArray : Array[0..512] of Byte;
if IdTCPClient1.IOHandler.InputBufferIsEmpty then
begin
if IdTCPClient1.IOHandler.CheckForDataOnSource(1000) then
begin
Edit1.Text := idtcpclient1.IOHandler.ReadBytes(BufferArray ,512, true);
end;
end;
I am getting an error on the line Edit1.Text := idtcpclient1.IOHandler.ReadBytes(BufferArray ,512, true); Error: Type of actual and formal var parameter must be identical.
Is this the right approach? I want to store the whole 512 bytes in Edit1.Text and then do whatever I want with that data. Please help me get all 512 bytes from the server.
Update: Alternative approach
I am using this approach to read the Word and String values:
WordArray : array[0..5] of word;
if IdTCPClient1.IOHandler.InputBufferIsEmpty then
begin
if IdTCPClient1.IOHandler.CheckForDataOnSource(1000) then
begin
i := 0;
while i < 6 do //Read all the words
begin
//Fill WORD data in array
WordArray[i] := (IdTCPClient1.Socket.ReadWord(True));
Inc(i); // advance to the next word, otherwise the loop never ends
end;
end;
end;
A similar approach is used for strings, like
WordArray[i] := (IdTCPClient1.Socket.ReadString(50));
That's working fine, but I have to keep the connection open while I read all the data in the loop. If the connection drops in between, I lose everything and have to request the whole package from the server again.

It's hard to answer unless you describe precisely what the documentation you have says. So far we know that your 512-byte packet consists of 6 words and 10 strings of 50 bytes each. So take this just as a starting point until you tell us more:
procedure TForm1.Button1Click(Sender: TObject);
var
I: Integer;
Buffer: TIdBytes; // ReadBytes expects Indy's own TIdBytes type
WordArray: array[0..5] of Word;
StringArray: array[0..9] of AnsiString;
begin
if IdTCPClient1.IOHandler.InputBufferIsEmpty then
begin
if IdTCPClient1.IOHandler.CheckForDataOnSource(1000) then
begin
IdTCPClient1.IOHandler.ReadBytes(Buffer, 512, False);
// the first 12 bytes hold the 6 words, high byte first
for I := 0 to High(WordArray) do
begin
WordRec(WordArray[I]).Hi := Buffer[I * 2];
WordRec(WordArray[I]).Lo := Buffer[I * 2 + 1];
end;
// the remaining 500 bytes hold the 10 strings of 50 bytes each
for I := 0 to High(StringArray) do
SetString(StringArray[I], PAnsiChar(@Buffer[I * 50 + 12]), 50);
// here you have the arrays prepared to be processed
end;
end;
end;
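If you would rather keep the parsing separate from the socket code, the same layout can be decoded by two small helpers that work on any byte buffer. This is only a sketch: WordAtBE and FixedStringAt are hypothetical names, the high-byte-first order is the same assumption the code above makes, and neither helper checks bounds:

```pascal
program PacketParseDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: reads a 16-bit value stored high byte first.
// The caller must make sure Offset + 1 is inside the buffer.
function WordAtBE(const Buf: array of Byte; Offset: Integer): Word;
begin
  Result := (Word(Buf[Offset]) shl 8) or Buf[Offset + 1];
end;

// Hypothetical helper: copies Len raw bytes into an AnsiString
// without any charset conversion.
function FixedStringAt(const Buf: array of Byte; Offset, Len: Integer): AnsiString;
begin
  SetLength(Result, Len);
  if Len > 0 then
    Move(Buf[Offset], PAnsiChar(Result)^, Len);
end;

const
  // A tiny stand-in for the 512-byte packet: two words and a 2-byte string.
  Packet: array[0..5] of Byte = ($01, $02, $00, $2A, $48, $69);
begin
  WriteLn(WordAtBE(Packet, 0));         // 258
  WriteLn(WordAtBE(Packet, 2));         // 42
  WriteLn(FixedStringAt(Packet, 4, 2)); // Hi
end.
```

Parsing from a buffer that has been received completely also means a dropped connection cannot leave you with half-filled arrays.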

1: What is the charset of the string? Is it single-byte windows-1251, 2-byte Unicode UCS-2, or variable-length UTF-8 or UTF-16?
2: What is the length of the string? Is it always 50?
Reading the buffer:
1. Read the manuals:
1.1 http://www.indyproject.org/docsite/html/TIdIOHandler_ReadBytes@TIdBytes@Integer@boolean.html
1.2 http://www.indyproject.org/docsite/html/TIdIOHandler_ReadSmallInt@Boolean.html
1.3 http://docwiki.embarcadero.com/Libraries/XE2/en/System.SetString
2. Write code that accurately follows the types and parameter descriptions.
2.1 Reading the header. That should result in something like:
var Word1, Word2: word;
Word1 := IOHandler.ReadSmallInt(false);
Word2 := IOHandler.ReadSmallInt(false);
3. Read the single-byte string:
3.1 Read the buffer.
3.2 Convert the buffer to a string.
var Word1, Word2: word; Buffer: TIdBytes;
var s: RawByteString;
// or AnsiString; or maybe UTF8String; but probably not UnicodeString aka string
Word1 := IOHandler.ReadSmallInt(false);
Word2 := IOHandler.ReadSmallInt(false);
// You should check that you really received 50 bytes,
// then do something like that:
IOHandler.ReadBytes(Buffer, 50, false);
Assert(Length(Buffer)=50);
SetString (s, pointer(@Buffer[0]), 50);
Continue reading the rest: you have only read 50+2+2 = 54 bytes of the 512-byte packet, so there should be more data.
512 = 54*9+26, so it might look like a loop, followed by discarding the 26-byte tail.
var Word1, Word2: word; Buffer: TIdBytes; i: Integer;
var s: RawByteString;
for i := 1 to 9 do begin
Word1 := IOHandler.ReadSmallInt(false);
Word2 := IOHandler.ReadSmallInt(false);
IOHandler.ReadBytes(Buffer, 50, false);
Assert(Length(Buffer)=50);
SetString (s, pointer(@Buffer[0]), 50);
SomeOutputCollection.AppendNewElement(Word1, Word2, s);
end;
IOHandler.ReadBytes(Buffer, 512 - 9*(50+2+2), false); // discard the tail

Convert char pos of UnicodeString to byte pos in a utf8 string

I use Scintilla and set its encoding to UTF-8 (this is the only way to make it compatible with Unicode characters, if I understand it correctly). With this setup, when Scintilla talks about positions in the text, it means byte positions.
The problem is that I use UnicodeString in the rest of my program, and when I need to select a particular range in the Scintilla editor, I need to convert from a char position in the UnicodeString to a byte position in the corresponding UTF-8 string. How can I do that easily? Thanks.
PS: when I found ByteToCharIndex I thought it was what I needed. However, according to its documentation and the results of my testing, it only works if the system uses a multi-byte character set (MBCS).

You should parse UTF-8 strings yourself, following the UTF-8 specification. I have written a quick UTF-8 analog of ByteToCharIndex and tested it on a Cyrillic string:
function UTF8PosToCharIndex(const S: UTF8String; Index: Integer): Integer;
var
I: Integer;
P: PAnsiChar;
begin
Result:= 0;
if (Index <= 0) or (Index > Length(S)) then Exit;
I:= 1;
P:= PAnsiChar(S);
while I <= Index do begin
if Ord(P^) and $C0 <> $80 then Inc(Result);
Inc(I);
Inc(P);
end;
end;
const TestStr: UTF8String = 'abФЫВА';
procedure TForm1.Button2Click(Sender: TObject);
begin
ShowMessage(IntToStr(UTF8PosToCharIndex(TestStr, 1))); // a = 1
ShowMessage(IntToStr(UTF8PosToCharIndex(TestStr, 2))); // b = 2
ShowMessage(IntToStr(UTF8PosToCharIndex(TestStr, 3))); // Ф = 3
ShowMessage(IntToStr(UTF8PosToCharIndex(TestStr, 5))); // Ы = 4
ShowMessage(IntToStr(UTF8PosToCharIndex(TestStr, 7))); // В = 5
end;
The reverse function is no problem too:
function CharIndexToUTF8Pos(const S: UTF8String; Index: Integer): Integer;
var
P: PAnsiChar;
begin
Result:= 0;
P:= PAnsiChar(S);
while (Result < Length(S)) and (Index > 0) do begin
Inc(Result);
if Ord(P^) and $C0 <> $80 then Dec(Index);
Inc(P);
end;
if Index <> 0 then Result:= 0; // char index not found
end;

I wrote a function based on Serg's code, with great respect, and I am posting it here as a separate answer in the hope that it helps others too. Serg's answer remains the accepted one.
{Return the index (1-based) of the first byte of the character (Unicode code point)
specified by aCharIdx (1-based) in aUtf8Str.
Code is amended by Edwin Yip based on code written by SO member Serg (https://stackoverflow.com/users/246408/serg)
ref 1: https://stackoverflow.com/a/10388131/133516
ref 2: http://sergworks.wordpress.com/2012/05/01/parsing-utf8-strings/
}
function CharPosToUTF8BytePos(const aUtf8Str: UTF8String; const aCharIdx:
Integer): Integer;
var
p: PAnsiChar;
charCount: Integer;
begin
p:= PAnsiChar(aUtf8Str);
Result:= 0;
charCount:= 0;
while (Result < Length(aUtf8Str)) do
begin
if IsUTF8LeadChar(p^) then
Inc(charCount);
if charCount = aCharIdx then
Exit(Result + 1);
Inc(p);
Inc(Result);
end;
end;

Both UTF-8 and UTF-16 (which UnicodeString uses) are variable-length encodings. A given Unicode codepoint is encoded in UTF-8 using 1 to 4 single-byte codeunits, and in UTF-16 using either 1 or 2 two-byte codeunits, depending on the codepoint's numeric value. The only way to translate a position in a UTF-16 string into a position in an equivalent UTF-8 string is to decode the UTF-16 codeunits preceding the position back to their original Unicode codepoint values and then re-encode them as UTF-8 codeunits.
It sounds like you are better off rewriting the code that interacts with Scintilla to use UTF8String instead of UnicodeString; then you won't have to translate between UTF-8 and UTF-16 at that layer anymore. When interacting with the rest of your code, you can convert between UTF8String and UnicodeString as needed.
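If you do need the translation, only the encoded length of each preceding codepoint matters, so the bytes never have to be produced. A rough sketch, assuming Delphi 2009 or later, a 1-based char index into the UnicodeString and a 0-based byte offset as the result (Utf16IndexToUtf8Offset is a hypothetical name):

```pascal
program Utf16ToUtf8Pos;
{$APPTYPE CONSOLE}

// Returns how many UTF-8 bytes are needed for the codeunits S[1..CharIndex-1],
// i.e. the 0-based byte offset of S[CharIndex] in the equivalent UTF-8 string.
function Utf16IndexToUtf8Offset(const S: UnicodeString; CharIndex: Integer): Integer;
var
  I: Integer;
  C: Word;
begin
  Result := 0;
  I := 1;
  while (I < CharIndex) and (I <= Length(S)) do
  begin
    C := Ord(S[I]);
    if C < $80 then
      Inc(Result)            // U+0000..U+007F: 1 byte
    else if C < $800 then
      Inc(Result, 2)         // U+0080..U+07FF: 2 bytes
    else if (C >= $D800) and (C <= $DBFF) and (I < Length(S)) and
            (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    begin
      Inc(Result, 4);        // surrogate pair: one codepoint, 4 bytes
      Inc(I);                // skip the low surrogate as well
    end
    else
      Inc(Result, 3);        // rest of the BMP (an unpaired surrogate is counted as 3)
    Inc(I);
  end;
end;

const
  S: UnicodeString = 'ab'#$0424#$042B; // "ab" followed by two Cyrillic letters
begin
  WriteLn(Utf16IndexToUtf8Offset(S, 1)); // 0
  WriteLn(Utf16IndexToUtf8Offset(S, 3)); // 2
  WriteLn(Utf16IndexToUtf8Offset(S, 4)); // 4
end.
```

If CharIndex points at the second half of a surrogate pair, the whole pair is counted, so the offset lands after that character.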

Hex view of a file

I am using Delphi 2009.
I want to view the contents of a file (in hexadecimal) inside a memo.
I'm using this code :
var
Buffer:String;
begin
Buffer := '';
AssignFile(sF,Source); //Assign file
Reset(sF);
repeat
Readln(sF,Buffer); //Load every line to a string.
TempChar:=StrToHex(Buffer); //Convert to Hex using the function
...
until EOF(sF);
end;
function StrToHex(AStr: string): string;
var
I ,Len: Integer;
s: chr (0)..255;
//s:byte;
//s: char;
begin
len:=length(AStr);
Result:='';
for i:=1 to len do
begin
s:=AStr[i];
//The problem is here. Ord(s) is giving false values (251 instead of 255)
//And in general the output differs from a professional hex editor.
Result:=Result +' '+IntToHex(Ord(s),2)+'('+IntToStr(Ord(s))+')';
end;
Delete(Result,1,1);
end;
When I declare the variable "s" as Char (I know that a char goes up to 255), I get hex values up to 65535!
When I declare the variable "s" as Byte or chr (0)..255, it outputs hex values that differ from those in any hexadecimal editor!
Why is that? How can I see the correct values?
Check images for the differences.
1st image: Professional Hex Editor.
2nd image: Function output to Memo.
Thank you.

Your Delphi 2009 is unicode-enabled, so Char is actually WideChar and that's a 2 byte, 16 bit unsigned value, that can have values from 0 to 65535.
You could change all your Char declarations to AnsiChar and all your String declarations to AnsiString, but that's not the way to do it. You should drop Pascal I/O in favor of modern stream-based I/O, use a TFileStream, and don't treat binary data as Char.
Console demo:
program Project26;
{$APPTYPE CONSOLE}
uses SysUtils, Classes;
var F: TFileStream;
Buff: array[0..15] of Byte;
CountRead: Integer;
HexText: array[0..31] of Char;
begin
F := TFileStream.Create('C:\Temp\test', fmOpenRead or fmShareDenyWrite);
try
CountRead := F.Read(Buff, SizeOf(Buff));
while CountRead <> 0 do
begin
BinToHex(Buff, HexText, CountRead);
WriteLn(HexText); // You could add this to the Memo
CountRead := F.Read(Buff, SizeOf(Buff));
end;
finally F.Free;
end;
end.

In Delphi 2009, a Char is the same thing as a WideChar, that is, a Unicode character. A wide character occupies two bytes. You want to use AnsiChar. Prior to Delphi 2009 (that is, prior to Unicode Delphi), Char was the same thing as AnsiChar.
Also, you shouldn't use ReadLn. You are treating the file as a text file with text-file line endings! This is a general file! It might not have any text-file line endings at all!

For output that is easier to read and looks better too, you might want to use this simple hex dump formatter.
The HexDump procedure dumps an area of memory into a TStrings, one line per 16 bytes: two chunks of 8 bytes in hex, followed by the 16 ASCII characters.
Example:
406563686F206F66 660D0A6966206578 @echo off..if ex
69737420257E7331 5C6E756C20280D0A ist %~s1\nul (..
0D0A290D0A ..)..
Here is the code for the dump formatting function:
function HexB (b: Byte): String;
const HexChar: Array[0..15] of Char = '0123456789ABCDEF';
begin
result:= HexChar[b shr 4]+HexChar[b and $0f];
end;
procedure HexDump(var data; size: Integer; s: TStrings);
const
sepHex=' ';
sepAsc=' ';
nonAsc='.';
var
i : Integer;
hexDat, ascDat : String;
buff : Array[0..1] of Byte Absolute data;
begin
hexDat:='';
ascDat:='';
for i:=0 to size-1 do
begin
hexDat:=hexDat+HexB(buff[i]);
if ((buff[i]>31) and (buff[i]<>255)) then
ascDat:=ascDat+Char(buff[i])
else
ascDat:=ascDat+nonAsc;
if (((i+1) mod 16)<>0) and (((i+1) mod 8)=0) then
hexDat:=hexDat+sepHex;
if ((i+1) mod 16)=0 then
begin
s.Add(hexdat+sepAsc+ascdat);
hexdat:='';
ascdat:='';
end;
end;
if (size mod 16)<>0 then
begin
if (size mod 16)<8 then
hexDat:=hexDat+StringOfChar(' ',(8-(size mod 8))*2)
+sepHex+StringOfChar(' ',16)
else
hexDat:=hexDat+StringOfChar(' ',(16-(size mod 16))*2);
s.Add(hexDat + sepAsc + ascDat);
end;
end;
And here is a complete code example for dumping the contents of a file into a Memo field.
procedure TForm1.Button1Click(Sender: TObject);
var
FStream: TFileStream;
buff: array[0..$fff] of Byte;
nRead: Integer;
begin
FStream := TFileStream.Create(edit1.text, fmOpenRead or fmShareDenyWrite);
try
repeat
nRead := FStream.Read(Buff, SizeOf(Buff));
if nRead<>0 then
hexdump(buff,nRead,memo1.lines);
until nRead=0;
finally
FStream.Free;
end;
end;

string is UnicodeString in Delphi 2009. If you want to use single-byte strings use AnsiString or RawByteString.
See String types.

Unicode string and TStringStream

Delphi 2009 and above use Unicode strings as the default string type. To my understanding a Unicode char is actually a 16-bit value, i.e. 2 bytes (note: I understand that 3- or 4-byte chars are possible, but let's consider the most usual case). However, I found that TStringStream is not very reliable for manipulating these strings. For example, the TStringStream.Size property returns the length of the string, while I think it should return the byte count of the contained string. Okay, you can adjust for that on your own, but the thing that confused me the most is that TStringStream does not read from or write to a buffer reliably.
Please check the following code (it's a DUnit test and it always fails). Please let me know where the problem is (I was using D2010 when testing the code).
procedure TestTCPackage.TestStringStream;
const
cCount = 10;
cOrdMaxChar = Ord(High(Char));
var
B: Pointer;
SW, SR: TStringStream;
T: string;
i, j, k : Integer;
vStrings: array [0..cCount-1] of string;
begin
RandSeed := GetTickCount;
for i := 0 to cCount - 1 do
begin
j := Random(100) + 1;
SetLength(vStrings[i], j);
for k := 1 to j do
// fill string with random char (but no #0)
vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
end;
for i := 0 to cCount - 1 do
begin
SW := TStringStream.Create(vStrings[i]);
try
GetMem(B, SW.Size * SizeOf(Char));
try
SW.Read(B^, SW.Size * SizeOf(Char));
SR := TStringStream.Create;
try
SR.Write(B^, SW.Size * SizeOf(Char));
SR.Position := 0;
// check the string in the TStringStream with original value
Check(SR.DataString = vStrings[i]);
finally
SR.Free;
end;
finally
FreeMem(B);
end;
finally
SW.Free;
end;
end;
end;
Note: I already tried using an instance of TMemoryStream as an intermediary for reading/writing the buffer, and using CopyFrom of the TStringStream to read the content of that TMemoryStream, with the same failing result.

Unicode strings aren't for data storage; use TBytes for that. TStringStream uses its associated encoding (the Encoding property) for encoding strings passed in with WriteString, and decoding strings read out with ReadString or the DataString property.
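As a small illustration of keeping text and bytes apart, TEncoding can do the conversion explicitly in both directions. This is a sketch that assumes Delphi 2009 or later; the variable names are mine:

```pascal
program EncodingDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

var
  Bom, Data: TBytes;
  S: string;
begin
  Bom := TEncoding.Unicode.GetPreamble;     // the UTF-16LE BOM: FF FE
  Data := TEncoding.Unicode.GetBytes('1d'); // 31 00 64 00
  S := TEncoding.Unicode.GetString(Data);   // decode the bytes back to text
  WriteLn(Length(Bom), ' ', Length(Data), ' ', S); // 2 4 1d
end.
```

The byte count and the character count differ here, which is the distinction the Size property was hiding in the question.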

After reading this post (thanks to Serg, who provided the answer to that question) and Barry Kelly's answer, I have found the problem. TStringStream actually uses an ANSI (AnsiString) encoding by default. So even though your default string type is Unicode, it won't use Unicode encoding unless you specifically tell it to. Personally I think that's strange. Maybe it is meant to make converting old code easier.
So you have to specifically set the encoding of the TStringStream to TEncoding.Unicode to manipulate Unicode strings properly.
Here is my modified code, which passes the DUnit test:
procedure TestTCPackage.TestStringStream;
const
cCount = 10;
cOrdMaxChar = Ord(High(Char));
var
B: Pointer;
SW, SR: TStringStream;
i, j, k : Integer;
vStrings: array [0..cCount-1] of string;
begin
RandSeed := GetTickCount;
for i := 0 to cCount - 1 do
begin
j := Random(100) + 1;
SetLength(vStrings[i], j);
for k := 1 to j do
// fill string with random char (but no #0)
vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
end;
for i := 0 to cCount - 1 do
begin
SW := TStringStream.Create(vStrings[i], TEncoding.Unicode);
try
GetMem(B, SW.Size);
try
SW.ReadBuffer(B^, SW.Size);
SR := TStringStream.Create('', TEncoding.Unicode);
try
SR.WriteBuffer(B^, SW.Size);
SR.Position := 0;
// check the string in the TStringStream with original value
Check(SR.DataString = vStrings[i]);
finally
SR.Free;
end;
finally
FreeMem(B);
end;
finally
SW.Free;
end;
end;
end;
Last note: Unicode does bite! :D
