Assign [array of byte] to a Variant with no Unicode conversion - delphi

Consider the following code snippet (in Delphi XE2):
function PrepData(StrVal: string; Base64Val: AnsiString): OleVariant;
begin
Result := VarArrayCreate([0, 1], varVariant);
Result[0] := StrVal;
Result[1] := Base64Val;
end;
Base64Val is a binary value encoded as Base64 (so no null bytes). The (OleVariant) Result is automatically marshalled and sent between a client app and a DataSnap server.
When I capture the traffic with Wireshark, I see that both StrVal and Base64Val are transferred as Unicode strings. If I can, I would like to avoid the Unicode conversion for Base64Val. I've looked at all the Variant types and don't see anything other than varString that can transfer an array of characters.
I found this question that shows how to create a variant array of bytes. I'm thinking that I could use this technique instead of using an AnsiString. I'm curious though, is there another way to assign an array of non-Unicode character data to a Variant without a conversion to a Unicode string?

Delphi's implementation supports storing AnsiString and UnicodeString in a Variant, using custom variant type codes. These codes are varString and varUString.
But interop will typically use standard OLE variants and the OLE string, varOleStr, is 16 bit encoded. That would seem to be the reason for your observation.
You'll need to put the data in as an array of bytes if you do wish to avoid a conversion to 16 bit text. Doing so renders base64 encoding pointless. Stop base64 encoding the payload and send the binary in a byte array.

Keeping with the example in the question, this is how I made it work (using code and comments from David's answer to another question as referenced in my question):
function PrepData(StrVal: string; Data: TBytes): OleVariant;
var
SafeArray: PVarArray;
begin
Result := VarArrayCreate([0, 1], varVariant);
Result[0] := StrVal;
Result[1] := VarArrayCreate([1, Length(Data)], varByte);
SafeArray := VarArrayAsPSafeArray(Result[1]);
Move(Pointer(Data)^, SafeArray.Data^, Length(Data));
end;
Then on the DataSnap server, I can extract the binary data from the OleVariant like this, assuming Value is Result[1] from the Variant Array in the OleVariant:
procedure GetBinaryData(Value: Variant; Result: TMemoryStream);
var
SafeArray: PVarArray;
begin
SafeArray := VarArrayAsPSafeArray(Value);
Assert(SafeArray.ElementSize=1);
Result.Clear;
Result.WriteBuffer(SafeArray.Data^, SafeArray.Bounds[0].ElementCount);
end;
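For comparison, here is a minimal round-trip sketch that fills the byte array in its own Variant before storing it in the outer array, and uses VarArrayLock/VarArrayUnlock rather than reaching into the safe array record. BytesToVariant and VariantToBytes are hypothetical helper names, not part of the RTL, and the namespaced unit names assume XE2 or later.

```pascal
program VariantBytesDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Variants;

// Hypothetical helper: copies Data into a new variant array of varByte.
function BytesToVariant(const Data: TBytes): Variant;
var
  P: Pointer;
begin
  Result := VarArrayCreate([0, Length(Data) - 1], varByte);
  if Length(Data) > 0 then
  begin
    P := VarArrayLock(Result);
    try
      Move(Data[0], P^, Length(Data));
    finally
      VarArrayUnlock(Result);
    end;
  end;
end;

// Hypothetical helper: copies a one-dimensional varByte array back out.
function VariantToBytes(const Value: Variant): TBytes;
var
  P: Pointer;
  Count: Integer;
begin
  if VarType(Value) <> (varArray or varByte) then
    raise EArgumentException.Create('Variant array of bytes expected.');
  Count := VarArrayHighBound(Value, 1) - VarArrayLowBound(Value, 1) + 1;
  SetLength(Result, Count);
  if Count > 0 then
  begin
    P := VarArrayLock(Value);
    try
      Move(P^, Result[0], Count);
    finally
      VarArrayUnlock(Value);
    end;
  end;
end;

var
  Packet: OleVariant;
  Raw, Back: TBytes;
begin
  Raw := TBytes.Create(0, 1, 2, 255);
  // Same outer layout as PrepData: element 0 is text, element 1 is binary.
  Packet := VarArrayCreate([0, 1], varVariant);
  Packet[0] := 'some text';
  Packet[1] := BytesToVariant(Raw);
  Back := VariantToBytes(Packet[1]);
  Writeln(Length(Back));
end.
```

Building the inner array in a separate variable avoids depending on whether Result[1] hands back the stored array or a copy of it.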

Related

Copy TByteDynArray (array of byte) to string

How can I copy contents of a TByteDynArray variable to a string variable or even better, to a TMemoryStream?
Remy, thanks for your answer.
Well, I can't get it to work.
I'm doing this:
obtReferenciaPagamentoResponse.pdf is a TByteDynArray (array of byte) that comes through a WebService call and is declared in the XSD as xsd:base64Binary.
procedure saveFile;
var
LInput, LOutput: TMemoryStream;
Id: Integer;
Buff: AnsiString;
//Buff: String;
begin
LInput := TMemoryStream.Create;
LOutput := TMemoryStream.Create;
// Tried like this also
//SetLength(Buff, Length(obtReferenciaPagamentoResponse.pdf));
//Move(obtReferenciaPagamentoResponse.pdf[0], Buff[1], Length(obtReferenciaPagamentoResponse.pdf));
// Tried other charsets
Buff := TEncoding.Ansi.GetString(obtReferenciaPagamentoResponse.pdf);
LInput.Write(Buff[1], Length(Buff) * SizeOf(Buff[1]));
LInput.Position := 0;
TNetEncoding.Base64.Decode(LInput, LOutput);
LOutput.Position := 0;
LOutput.SaveToFile(SaveDialog2.FileName);
LInput.Free;
LOutput.Free;
end;
But the PDF file seems to be saved incomplete, because it is always reported as corrupted when opened.
What am I doing wrong?
String is an alias for UnicodeString since 2009. As UnicodeString characters are now encoded in UTF-16, it does not make sense to copy raw bytes into a (Unicode)String unless the bytes are also encoded in UTF-16. In that case, you can simply use SetLength() to allocate the String's length to the appropriate number of Chars and then Move() the raw bytes into the String's allocated memory. Otherwise, use TEncoding.GetString() instead to decode the bytes into a UTF-16 String using the appropriate charset.
As for TMemoryStream, it has a Write() method for writing raw bytes into the stream. Simply set its Position property to the desired offset and then write the bytes.
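A minimal sketch of writing the bytes straight into a stream follows. BytesToStream is a hypothetical helper name, and the sketch assumes the byte array already holds the decoded binary payload (if the SOAP layer has already decoded the xsd:base64Binary value, no further Base64 step is needed; that is an assumption to verify against your data).

```pascal
program BytesToStreamDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.Types;

// Hypothetical helper: appends the raw bytes to Stream at its current Position.
procedure BytesToStream(const Bytes: TByteDynArray; Stream: TStream);
begin
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

var
  Data: TByteDynArray;
  Stream: TMemoryStream;
begin
  // Stand-in payload; in the question this would be the web service result.
  SetLength(Data, 4);
  Data[0] := $25; Data[1] := $50; Data[2] := $44; Data[3] := $46;

  Stream := TMemoryStream.Create;
  try
    BytesToStream(Data, Stream);
    Writeln(Stream.Size);
    // Stream.SaveToFile(FileName) would then write the bytes unchanged.
  finally
    Stream.Free;
  end;
end.
```

No string type is involved at any point, so no charset conversion can alter the bytes.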

How to interpret arrays passed from VBscript to a Delphi COM server App

I am trying to pass an array of bytes from VBscript to my windows Delphi Application and can't seem to find the correct syntax to interpret the passed data.
The requirement is fairly simple as the VBscript snippet below demonstrates
Dim inst,arr(5)
Sub Main
set inst=instruments.Find("EP1")
arr(0) = 0
arr(1) = 1
arr(2) = 2
arr(3) = 3
arr(4) = 4
inst.writebytes arr,5
end Sub
I can get the server to accept the olevariant passed by the script but the data seems garbled, my example server code is shown below and is based on the Stackoverflow question here How to use variant arrays in Delphi
procedure TInstrument.WriteBytes(Data: OleVariant; Length: Integer);
var i,n:integer; Pdat:Pbyte; Adata:PvarArray;
begin
if VarIsArray(data) then
begin
n:=TVarData(Data).VArray^.Bounds[0].ElementCount;
Adata:= VarArrayLock(Data);
Getmem(Pdat,length);
try
for i:=0 to length-1 do
Pdat[i]:=integerArray(Adata.data^)[i];
Finstrument.WriteBytes(Pdat,Length);
finally
freemem(Pdat)
end;
end;
end;
So the idea is to accept the integers passed by the script, convert it to the local data representation (array of byte) then pass it on to my function to use the data.
I have tried several different data types and methods to try and get some ungarbled data out of the variant all to no avail.
What is the correct method of extracting the array data from the passed variant?
Also, TVarData(Data).VArray^.Bounds[0].ElementCount has a value of zero, why would that be?
Arrays created in VBScript are
zero based
untyped
declared with upper bound (not size as you assumed; size of array declared as Dim arr(5) is 6)
include dimension info in them (so you don't need to pass it along with the array)
When used in COM, they are passed as variant arrays of type varVariant (as Ondrej Kelle points out in his comment). To process such an array in your method, you have to assert that:
the value is a single dimensional array
each element can be converted to byte
You can write helper routine for that:
function ToBytes(const Data: Variant): TBytes;
var
Index, LowBound, HighBound: Integer;
ArrayData: Pointer;
begin
if not VarIsArray(Data) then
raise EArgumentException.Create('Variant array expected.');
if VarArrayDimCount(Data) <> 1 then
raise EArgumentException.Create('Single dimensional variant array expected.');
LowBound := VarArrayLowBound(Data, 1);
HighBound := VarArrayHighBound(Data, 1);
SetLength(Result, HighBound - LowBound + 1);
if TVarData(Data).VType = varArray or varByte then
begin
ArrayData := VarArrayLock(Data);
try
Move(ArrayData^, Result[0], Length(Result));
finally
VarArrayUnlock(Data);
end;
end
else
begin
for Index := LowBound to HighBound do
Result[Index - LowBound] := Data[Index];
end;
end;
The for loop in the routine will be horribly slow when processing large arrays, so there's an optimization for the special case (a variant array of bytes) that uses Move to copy the bytes to the result. But this will never happen with a VBScript array. You might consider using VB.Net or PowerShell.
Using such a routine has the downside of keeping two instances of the array in memory: one as a variant array and one as a byte array. Use it as a guide when applying it to your use case.
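To make the "upper bound, not size" point concrete, here is a small sketch that builds a stand-in for what VBScript passes for Dim arr(5) and converts it element by element. The array is constructed in Delphi purely for illustration; it is an assumption that it matches the layout the script engine produces.

```pascal
program VbsArrayDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Variants;

var
  Arr: Variant;
  Bytes: TBytes;
  I, Lo, Hi: Integer;
begin
  // Stand-in for "Dim arr(5)": bounds 0..5, so six varVariant elements.
  Arr := VarArrayCreate([0, 5], varVariant);
  for I := 0 to 4 do
    Arr[I] := I;
  // arr(5) is never assigned in the script, so it stays Empty.

  Lo := VarArrayLowBound(Arr, 1);
  Hi := VarArrayHighBound(Arr, 1);
  SetLength(Bytes, Hi - Lo + 1);
  for I := Lo to Hi do
    Bytes[I - Lo] := Arr[I];  // element-wise Variant to Byte conversion

  Writeln(Length(Bytes));
end.
```

The element count comes from the array's own bounds, which is why the separate length argument in the original WriteBytes signature is unnecessary.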

Delphi Lockbox Hashing

I need to hash a string, preferably as SHA512, although it could be SHA256, SHA1, MD5 or CRC32.
I have downloaded Lockbox 3, put a TCryptographicLibrary and a THash component on a form, set the Hash property to SHA-512 and used the following code to produce a test result:
procedure TForm1.Button1Click(Sender: TObject);
begin
Hash1.HashString('myhashtest');
Edit1.Text := Stream_To_AnsiString(Hash1.HashOutputValue);
end;
To best illustrate the problem, I have gone on to an online hash calculator and the MD5 hash of 'myhashtest' is ff91e22313f0a41b46719e7ee6f99451 but setting the hash property in my test program to MD5 results in ÿ‘â#ð¤Fqž~æù”Q which is clearly wrong. I have tried the same test using other Hash properties, including the SHA512 which i want, and they all return rubbish.
Where am I going wrong?
THash.HashOutputValue is a stream of the raw hashed bytes. It appears that Stream_To_AnsiString() merely copies those raw bytes as-is into an AnsiString, it does not encode the bytes in any way. What you are looking for is the hex encoded version of the raw bytes instead. I do know that LockBox has a Stream_To_Base64() function (as shown in this example), but I do not know if it has a Stream_To_Hex() type of function. If it does not, you can easily create your own, eg:
function Stream_To_Hex(Stream: TStream): AnsiString;
var
NumBytes, I: Integer;
B: Byte;
begin
NumBytes := Stream.Size - Stream.Position;
SetLength(Result, NumBytes * 2);
for I := 0 to NumBytes-1 do
begin
Stream.ReadBuffer(B, 1);
BinToHex(@B, @Result[(I*2)+1], 1);
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
Hash1.HashString('myhashtest');
Edit1.Text := Stream_To_Hex(Hash1.HashOutputValue);
end;
Many cryptographic functions 'silently' (i.e. without stating so in the docs) output and require Base64- or hex-encoded strings (and also often AnsiStrings). This is because encrypted text can contain any data, and as soon as you start treating that as 'strings', string handling functions can easily choke on it (e.g. null-terminated strings containing a null). By Base64/hex encoding the cryptotext you make sure it will be plain old ASCII characters that even old code can read/write.
If you dig around a little in the cryptocode or its method parameters you usually can determine that, and convert your strings accordingly.
I figured out where stream_to_hex is: it is inside uTPLB_StreamUtils (.pas or .hpp, depending on whether you are using Delphi or C++Builder).
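If the digest is already available as a byte array rather than a stream, the hex encoding can also be sketched with IntToHex. BytesToHex is a hypothetical helper name and is not part of LockBox.

```pascal
program HexDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: two lowercase hex digits per byte.
function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Bytes) do
    Result := Result + IntToHex(Bytes[I], 2);
  Result := LowerCase(Result);
end;

begin
  // The first four bytes of the MD5 digest quoted in the question.
  Writeln(BytesToHex(TBytes.Create($FF, $91, $E2, $23)));
end.
```

The result is lowercased because online hash calculators usually print lowercase hex, whereas IntToHex produces uppercase digits.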

How to convert AnsiChar to UnicodeChar with specific CodePage?

I'm generating texture atlases for rendering Unicode texts in my app. Source texts are stored in ANSI codepages (1250, 1251, 1254, 1257, etc). I want to be able to generate all the symbols from each ANSI codepage.
Here is the outline of the code I would expect to have:
for I := 0 to 255 do
begin
anChar := AnsiChar(I); //obtain AnsiChar
//Apply codepage without converting the chars
//<<--- this part does not work, showing:
//"E2033 Types of actual and formal var parameters must be identical"
SetCodePage(anChar, aCodepages[K], False);
//Assign AnsiChar to UnicodeChar (automatic conversion)
uniChar := anChar;
//Here we get Unicode character index
uniCode := Ord(uniChar);
end;
The code above does not work (E2033) and I'm not sure it is a proper solution at all. Perhaps there's a much shorter version.
What is the proper way of converting AnsiChar into Unicode with specific codepage in mind?
I would do it like this:
function AnsiCharToWideChar(ac: AnsiChar; CodePage: UINT): WideChar;
begin
if MultiByteToWideChar(CodePage, 0, @ac, 1, @Result, 1) <> 1 then
RaiseLastOSError;
end;
I think you should avoid using strings for what is in essence a character operation. If you know up front which code pages you need to support then you can hard code the conversions into a lookup table expressed as an array constant.
Note that all the characters that are defined in the ANSI code pages map to Unicode characters from the Basic Multilingual Plane and so are represented by a single UTF-16 character. Hence the size assumptions of the code above.
However, the assumption that you are making, and that this answer persists, is that a single byte represents a character in an ANSI character set. That's a valid assumption for many character sets, for example the single byte western character sets like 1252. But there are character sets like 932 (Japanese), 949 (Korean) etc. that are double byte character sets. Your entire approach breaks down for those code pages. My guess is that you only wish to support single byte character sets.
If you are writing cross-platform code then you can replace MultiByteToWideChar with UnicodeFromLocaleChars.
You can also do it in one step for all characters. Here is an example for codepage 1250:
var
encoding: TEncoding;
bytes: TBytes;
unicode: TArray<Word>;
I: Integer;
S: string;
begin
SetLength(bytes, 256);
for I := 0 to 255 do
bytes[I] := I;
SetLength(unicode, 256);
encoding := TEncoding.GetEncoding(1250); // change codepage as needed
try
S := encoding.GetString(bytes);
for I := 0 to 255 do
unicode[I] := Word(S[I+1]); // as long as strings are 1-based
finally
encoding.Free;
end;
end;
Here is the code I have found to be working well:
var
I: Byte;
anChar: AnsiString;
Tmp: RawByteString;
uniChar: Char;
uniCode: Word;
begin
for I := 0 to 255 do
begin
anChar := AnsiChar(I);
Tmp := anChar;
SetCodePage(Tmp, aCodepages[K], False);
uniChar := UnicodeString(Tmp)[1];
uniCode := Word(uniChar);
<...snip...>
end;

Delphi: Fast(er) widestring concatenation

i have a function whose job is to convert an ADO Recordset into html:
class function RecordsetToHtml(const rs: _Recordset): WideString;
And the guts of the function involves a lot of wide string concatenation:
while not rs.EOF do
begin
Result := Result+CRLF+
'<TR>';
for i := 0 to rs.Fields.Count-1 do
Result := Result+'<TD>'+VarAsWideString(rs.Fields[i].Value)+'</TD>';
Result := Result+'</TR>';
rs.MoveNext;
end;
With a few thousand results, the function takes what any user would feel is too long to run. The Delphi Sampling Profiler shows that 99.3% of the time is spent in widestring concatenation (@WStrCatN and @WStrCat).
Can anyone think of a way to improve widestring concatenation? i don't think Delphi 5 has any kind of string builder. And Format doesn't support Unicode.
And to make sure nobody tries to weasel out: pretend you are implementing the interface:
IRecordsetToHtml = interface(IUnknown)
function RecordsetToHtml(const rs: _Recordset): WideString;
end;
Update One
I thought of using an IXMLDOMDocument, to build up the HTML as xml. But then i realized that the final HTML would be xhtml and not html - a subtle, but important, difference.
Update Two
Microsoft knowledge base article: How To Improve String Concatenation Performance
WideStrings are inherently slow because they were implemented for COM compatibility and go through COM calls. If you look at the code, it keeps reallocating the string and calls SysAllocStringLen() and related functions, which are APIs from oleaut32.dll. It doesn't use the Delphi memory manager; AFAIK it uses the COM memory manager.
Because most HTML pages don't use UTF-16, you may get better result using the native Delphi string type and a string list, although you should be careful about conversion from UTF and the actual codepage, and the conversion will downgrade performance as well.
Also, you're using the VarAsString() function, which probably converts the variant to an AnsiString that is then converted to a WideString. Check whether your version of Delphi has a VarAsWideString() or similar function to avoid that, or rely on Delphi's automatic conversion if you can be sure your variant will never be NULL.
Yup, your algorithm is clearly in O(n^2).
Instead of returning a string, try returning a TStringList, and replace your loop with
while not rs.EOF do
begin
Result.Add('<TR>');
for i := 0 to rs.Fields.Count-1 do
Result.Add( '<TD>'+VarAsString(rs.Fields[i].Value)+'</TD>' );
Result.Add('</TR>');
rs.MoveNext;
end;
You can then save your Result using TStringList.SaveToFile
I'm unable to spend the time right now to give you the exact code.
But I think the fastest thing you can do is:
Loop through all the strings and total their length also adding for the extra table tags you'll need.
Use SetString to allocate one string of the proper length.
Loop through all the strings again and use the "Move" procedure to copy to the string to the proper place in the final string.
The key thing is that many concatenations to a string take longer and longer because of the constant allocating and freeing of memory. A single allocation will be your biggest timesaver.
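The two-pass idea above can be sketched as follows. ConcatWide is a hypothetical helper name, and the sketch uses SetLength for the single allocation; the parts would have to be collected (for example into an array) before calling it.

```pascal
program ConcatDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: joins Parts with exactly one allocation of the result.
function ConcatWide(const Parts: array of WideString): WideString;
var
  I, Total, Offset, Len: Integer;
begin
  // Pass 1: total length in characters.
  Total := 0;
  for I := Low(Parts) to High(Parts) do
    Inc(Total, Length(Parts[I]));

  // Single allocation of the final string.
  SetLength(Result, Total);

  // Pass 2: copy each part into place; Move counts bytes, not characters.
  Offset := 1;
  for I := Low(Parts) to High(Parts) do
  begin
    Len := Length(Parts[I]);
    if Len > 0 then
    begin
      Move(Parts[I][1], Result[Offset], Len * SizeOf(WideChar));
      Inc(Offset, Len);
    end;
  end;
end;

begin
  Writeln(ConcatWide(['<TR>', '<TD>', 'value', '</TD>', '</TR>']));
end.
```

The trade-off is that every part must be held in memory until the final string is built, in exchange for replacing thousands of reallocations with one.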
i found the best solution. The open source HtmlParser for Delphi, has a helper TStringBuilder class. It is internally used to build what he calls DomStrings, which is actually an alias of WideString:
TDomString = WideString;
With a little bit of fiddling of his class:
TStringBuilder = class
public
constructor Create(ACapacity: Integer);
function EndWithWhiteSpace: Boolean;
function TailMatch(const Tail: WideString): Boolean;
function ToString: WideString;
procedure AppendText(const TextStr: WideString);
procedure Append(const value: WideString);
procedure AppendLine(const value: WideString);
property Length: Integer read FLength;
end;
The guts of the routine becomes:
while not rs.EOF do
begin
sb.Append('<TR>');
for i := 0 to rs.Fields.Count-1 do
sb.Append('<TD>'+VarAsWideString(rs.Fields[i].Value));
sb.AppendLine('</TR>');
rs.MoveNext;
end;
The code then feels like it runs infinitely faster. Profiling shows much improvement; the WideString manipulation and length-counting became negligible. In its place was FastMM's own internal operations.
Notes
Nice catch on the mistaken forcing of all strings into current code-page (VarAsString rather than VarAsWideString)
Some HTML closing tags are optional; omitted ones that logically make no sense.
Widestring is not reference counted, any modification means a string manipulation. If your content is not unicode encoded, you can internally use the native string (reference counted) to concatenate string and then convert it to a Widestring. Example is as follows:
var
NativeString: string;
begin
// ...
NativeString := '';
while not rs.EOF do
begin
NativeString := NativeString + CRLF + '<TR>';
for i := 0 to rs.Fields.Count-1 do
NativeString := NativeString + '<TD>'+VarAsString(rs.Fields[i].Value) + '</TD>';
NativeString := NativeString + '</TR>';
rs.MoveNext;
end;
Result := WideString(NativeString);
I have also seen another approach: encode the Unicode text as UTF8String (which is reference counted), concatenate, and finally convert the UTF8String to a WideString. But I am not sure whether two UTF8Strings can be concatenated directly. The time spent on encoding should also be considered.
Anyway, although WideString concatenation is much slower than native string operations, it is IMO still acceptable. Too much tuning of this kind of thing should be avoided. If performance is a serious concern, you should upgrade your Delphi to at least 2009. The cost of buying a tool is, in the long term, lower than that of doing heavy hacks on an old Delphi.
