Optimal Buffer Stream write process - delphi

We have our own data streaming algorithm that include some metadata+records+fields values.
Currently we use a TStream and write to add values to the stream.
Now I'm wondering if this time cosuming operation could be made faster by using some technic.
Edit: We are only appending data to the end, not moving or seeking.
Some things I was thinking of are:
Not use Streams buf some large memory allocated buffer to copy data to, the problem is if we go beyond the buffer size, then we have to relocate to some new memory space.
Use a Stream prefilled with #0s to some size and then start adding values. The rationale is that Tstream have to allocate it's own buffer every time I do a write (I don't know how it really works, just wondering...)
We are adding strings to the TStream and binary data in the form #0#0#0#1 for example.
The data is then transfered via TCP, so it's not about write to a File.
So what is the best method for this?

First, you are assuming the TStream is the bottleneck. You need to profile your code, such as with AQTime, to determine where the bottleneck really is. Don't make assumptions.
Second, what type of TStream are you actually using? TMemoryStream? TFileStream? Something else? Different stream types handle memory differently. TMemoryStream allocates a memory buffer and grows it by a pre-set amount of bytes whenever the buffer fills up. TFileStream, on the other hand, doesn't use any memory at all, it just writes directly to the file and lets the OS handle any buffering.
Regardless of which type of stream you use, one thing you could try is implement your own custom TStream class that has an internal fixed-size buffer and a pointer to your real destination TStream object. You can then pass an instance of your custom class to your algorithm. Have your class override the TStream::Write() method to copy the input data into its buffer until it fills up, then you can Write() the buffer to the destination TStream and clear the buffer. Your algorithm will never know the difference. Both TMemoryStream and TFileStream would benefit from the extra buffering - fewer larger writes means more efficient memory allocations and file I/O. For example:
type
TMyBufferedStreamWriter = class(TStream)
private
fDest: TStream;
fBuffer: array[0..4095] of Byte;
fOffset: Cardinal;
public
constructor Create(ADest: TStream);
function Read(var Buffer; Count: Longint): Longint; override;
function Write(const Buffer; Count: Longint): Longint; override;
procedure FlushBuffer;
end;
.
uses
RTLConsts;
constructor TMyBufferedStreamWriter.Create(ADest: TStream);
begin
fDest := ADest;
end;
function TMyBufferedStreamWriter.Read(var Buffer; Count: Longint): Longint;
begin
Result := 0;
end;
function TMyBufferedStreamWriter.Write(const Buffer; Count: Longint): Longint;
var
pBuffer: PByte;
Num: Cardinal;
begin
Result := 0;
pBuffer := PByte(#Buffer);
while Count > 0 do
begin
Num := Min(SizeOf(fBuffer) - fOffset, Cardinal(Count));
if Num = 0 then FlushBuffer;
Move(pBuffer^, fBuffer[fOffset], Num);
Inc(fOffset, Num);
Inc(pBuffer, Num);
Dec(Count, Num);
Inc(Result, Num);
end;
end;
procedure TMyBufferedStreamWriter.FlushBuffer;
var
Idx: Cardinal;
Written: Longint;
begin
if fOffset = 0 then Exit;
Idx := 0;
repeat
Written := fDest.Write(fBuffer[Idx], fOffset - Idx);
if Written < 1 then raise EWriteError.CreateRes(#SWriteError);
Inc(Idx, Written);
until Idx = fOffset;
fOffset := 0;
end;
.
Writer := TMyBufferedStreamWriter.Create(RealStreamHere);
try
... write data to Writer normally as needed...
Writer.FlushBuffer;
finally
Writer.Free;
end;

Use a profiler to see where it is actually slow.
If it is indeed because of multiple reallocation to increase the size of the stream, you can avoid that by setting the Size property to a big enough amount upfront.
The only case where using a memory buffer could make an obvious difference is if you're using FileStreams, all other useable streams are already doing this for you.

Overwrite TMemoryStream and remove the restriction of Size and Capacity. And do not call TMemoryStream.Clear but call TMemoryStream.SetSize(0)
type
TMemoryStreamEx = class(TMemoryStream)
public
procedure SetSize(NewSize: Longint); override;
property Capacity;
end;
implementation
{ TMemoryStreamEx }
procedure TMemoryStreamEx.SetSize(NewSize: Integer);
var
OldPosition: Longint;
begin
if NewSize > Capacity then
inherited SetSize(NewSize)
else
begin
OldPosition := Position;
SetPointer(Memory, NewSize);
if OldPosition > NewSize then
Seek(0, soFromEnd);
end;
end;

Related

Sending the right record size over socket

I have this record:
type
TSocks5_Packet = record
Socks_ID: String[6];
Socks_Packet: array of byte;
end;
I put String[6] because I'm sure that this string will always have 6 characters (and to try facilitate my job, but wasn't enough).
I try to send this record here:
procedure TForm1.SocksServerClientRead(Sender: TObject;
Socket: TCustomWinSocket);
var
Socks5_Request: TSocks5_Packet;
begin
SetLength(Socks5_Request.Socks_Packet, Socket.ReceiveLength);
Socket.ReceiveBuf(Socks5_Request.Socks_Packet[0], Length(Socks5_Request.Socks_Packet));
Socks5_Request.Socks_ID:= PSocket_Identity(Socket.Data).Sock_ID;
TunnelClient.SendBuf(Socks5_Request, Length(Socks5_Request.Socks_Packet + Length(Socks5_Request.Socks_ID)));
end;
I'm pretty sure the problem is on SendBuf second parameter, where I specify the count of bytes to be sent. What is the correct approach, and how I should study to learn it?
There are several problems with your code:
you are expecting String[6] to be 6 bytes, but it is actually 7 bytes instead. You are declaring a ShortString with a max length of 6 AnsiChar characters, and a ShortString contains a leading byte for the length.
your record is not packed, so it is subject to alignment padding. So don't try to send it as-is. You are dealing with variable-length data, so you should be serializing the record's content instead.
array of byte is a dynamic array. A dynamic array is a pointer to data allocated elsewhere in memory. The byte size of the array variable itself is SizeOf(Pointer), which is 4 on 32bit and 8 on 64bit. You have to dereference the pointer in order to access the allocated memory block. You are doing that on your receive, but are not doing that on your send.
TCP is a streaming transport, it has no concept of messages at all. Socket.ReceiveLength reports the number of unread bytes that are currently in the socket's internal receive buffer at that particular moment. Those bytes are arbitrary, there may be as few as 1 byte in the buffer. And more bytes may be received after you query the length and before performing the actual read.
when sending data, it is not guaranteed to send as many bytes are you request, it may send fewer bytes. The return value of SendBuf() indicates the actual number of bytes accepted for sending, so you may need to call SendBuf() multiple times to send a particular piece of data. So loop the send until the data is exhausted.
Since you are trying to tunnel arbitrary data, you need to specify how many bytes you are actually tunneling.
With that said, try something more like this:
function TForm1.SendData(Socket: TCustomWinSocket; const Data; DataLen: Integer): Boolean;
var
PData: PByte;
NumSent: Integer;
begin
Result := False;
PData := PByte(#Data);
while DataLen > 0 do
begin
// SendBuf() returns -1 on error. If that error is WSAEWOULDBLOCK
// then retry the send again. Otherwise, TCustomWinSocket disconnects
// itself, and if its OnError event handler does not set the ErrorCode
// to 0 then it raises an ESocketError exception...
//
NumSent := Socket.SendBuf(PData^, DataLen);
if NumSent = -1 then
begin
if WSAGetLastError() <> WSAEWOULDBLOCK then
Exit;
end else
begin
Inc(PData, NumSent);
Dec(DataLen, NumSent);
end;
end;
Result := True;
end;
function TForm1.SendInteger(Socket: TCustomWinSocket; Value: Integer): Boolean;
begin
Value := htonl(Value);
Result := SendData(Socket, Value, SizeOf(Value));
end;
function TForm1.SendString(Socket: TCustomWinSocket; const Value: String): Boolean;
var
S: AnsiString; // or UTF8String
begin
S := AnsiString(Value);
// or: S := UTF8Encode(Value);
Result := SendInteger(Socket, Length(S));
if Result then
Result := SendData(Socket, PAnsiChar(S)^, Length(S));
end;
function TForm1.SendBytes(Socket: TCustomWinSocket; const Data: array of Byte): Boolean;
begin
Result := SendInteger(Socket, Length(Data));
if Result then
Result := SendData(Socket, PByte(Data)^, Length(Data));
end;
procedure TForm1.SocksServerClientRead(Sender: TObject; Socket: TCustomWinSocket);
var
Packet: array of byte;
begin
Len := Socket.ReceiveLength;
if Len <= 0 the Exit;
SetLength(Packet, Len);
Len := Socket.ReceiveBuf(PByte(Packet)^, Len);
if Len <= 0 the Exit;
if Len < Length(Packet) then
SetLength(Packet, Len);
if SendString(TunnelClient, PSocket_Identity(Socket.Data).Sock_ID) then
SendBytes(TunnelClient, Packet);
end;
And then adjust your tunnel receiver accordingly to de-serialize the values as needed (read an Integer byte count, convert it to host byte order using ntohl(), then read the specified number of bytes).

Writing tList<string> to tFileStream

I use Berlin in Windows 10. I try to save tList<string> to a file.
I know how to handle tStringlist, tStreamWriter and tStreamReader but I need to use tFileStream because the other type of data should be added.
In the following code the loop of Button2Click which reads the data raises an eOutOfMemory exception. When I allocate simple string value to _String it works well but if I put tList value to the same _String it seems that wrong data were written on the file. I can't understand the difference between _String := _List.List[i] and _String := 'qwert'.
How can I write tList<string> to tFileSteam?
procedure TForm1.Button1Click(Sender: TObject);
var
_List: TList<string>;
_FileStream: TFileStream;
_String: string;
i: Integer;
begin
_List := TList<string>.Create;
_List.Add('abcde');
_List.Add('abcde12345');
_FileStream := TFileStream.Create('test', fmCreate);
for i := 0 to 1 do
begin
_String := _List.List[i]; // _String := 'qwert' works well
_FileStream.Write(_string, 4);
end;
_FileStream.Free;
_List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
_FileStream: TFileStream;
_String: string;
i: Integer;
begin
_FileStream := TFileStream.Create('test', fmOpenRead);
for i := 0 to 1 do
begin
_FileStream.Read(_String, 4);
Memo1.Lines.Add(_String);
end;
_FileStream.Free;
end;
If you lookup in the docs what TFileStream.Write does, it tells you (inherited from THandleStream.Write):
function Write(const Buffer; Count: Longint): Longint; override;
function Write(const Buffer: TBytes; Offset, Count: Longint): Longint; override;
Writes Count bytes from the Buffer to the current position in the
resource.
Now, Buffer is untyped and as such is expected to be the memory address of the data to be written. You are passing a string variable which is a reference to the actual string data, the address of the variable holds a pointer to string data. You are therefore writing a pointer to the file.
To correct it pass the strings first character for the buffer, ....write(_string[1], ...
If you have compiler directive {$ZEROBASEDSTRINGS ON} you would use index 0.
Alternatively, typecast the string to PChar and dereference it: ....write(PChar(_String)^, ...
Then look at the second parameter, Count. As the docs say, it indicates the number of bytes to be written, specifically not characters. In Delphi 2009 and later strings are UnicodeString, so each character is 2 bytes. You need to pass the strings size in bytes.
This will write 4 characters (8 bytes) to the file stream:
_FileStream.Write(_String[1], 4 * SizeOf(Char));
or better
_FileStream.Write(PChar(_String)^, 4 * SizeOf(Char));
For reading you need to make corresponding changes, but most notable, you need to set the strings length before reading (length is counted in characters).
SetLength(_String, 4);
for i := 0 to 1 do
begin
_FileStream.Read(_String[1], 4 * SizeOf(Char));
Memo1.Lines.Add(_String);
end;
To continue with this low-level approach you could generalize string writing and reading as follows:
Add a variable to hold the length of a string
var
_String: string;
_Length: integer;
then writing
begin
...
for ....
begin
_String := _List.List[i];
_Length := Length(_String);
_FileStream.Write(_Length, SizeOf(Integer));
_FileStream.Write(PChar(_List.List[i])^, _Length * SizeOf(Char));
end;
and reading
begin
...
for ....
begin
_FileStream.Read(_Length, SizeOf(Integer));
SetLength(_String, _Length);
_FileStream.Read(_String[1], _Length * SizeOf(Char));
Memo1.Lines.Add(_String);
end;
IOW, you write the length first and then the string. On reading you read the length and then the string.

What is the purpose of the Count parameter of TStream.WriteData and TStream.ReadData?

The TStream class contains many overloads of WriteData that are of this form:
function WriteData(const Buffer: Int32; Count: Longint): Longint; overload;
There are overloads for all the usual suspects, AnsiChar, Char, UInt32, Double and so on. Similarly for ReadData. I'm trying to understand what purpose the Count parameter serves. The implementation of the overload mentioned above is as follows:
function TStream.Skip(Amount: Integer): Integer;
var
P: Integer;
begin
P := Position;
Result := Seek(Amount, soCurrent) - P;
end;
function TStream.WriteData(const Buffer: Int32; Count: Longint): Longint;
const
BufSize = SizeOf(Buffer);
begin
if Count > BufSize then
Result := Write(Buffer, BufSize) + Skip(Count - BufSize)
else
Result := Write(Buffer, Count)
end;
I can obviously see what this code does, but I cannot understand why it would make sense to perform a partial write. Why would it ever make sense to call this function with Count < BufSize? The behaviour then is very odd.
Does anyone know why these overloads were added and what purpose they are intended for? Naturally I've looked at the documentation which has nothing to say about these methods.
As an aside I will submit bug report concerning this line:
Result := Write(Buffer, BufSize) + Skip(Count - BufSize);
It is a mistake to assume that the call to Write will occur before the call to Skip. The evaluation order of the operands to the + operator is not defined. This code should rather be written like this:
Result := Write(Buffer, BufSize);
inc(Result, Skip(Count - BufSize));
Theory crafting
if TStream predate the introduction of the overload keyword (Delphi 3 IIRC), they probably introduced a single method to write integer that was probably int32. When calling the function with a "byte" variable, it would get passed to the function as Integer, and then the Count parameter would only allow to write a single byte. Now they support this for backward compatibility purpose.
In some cases(like next one), supporting Count < Bufsize is indeed especially silly :
function WriteData(const Buffer: Int8; Count: Longint): Longint; overload;
Another justification would be in the next situation when a variable only need to be saved to stream as an Int8 but is worked on as a Int32 during program execution (because it is passed to a function that only takes a var : Int32 as parameter).
procedure SomeProc(var MyInt : Integer);
procedure DoSomeStream;
var
iVal : Integer;
// bVal : ShortInt;
begin
SomeProc(iVal);
Stream.WriteData(iVal, SizeOf(Byte));
//Instead of
// SomeProc(iVal);
// bVal := iVal;
// Stream.WriteData(bVal)
end;
I'm not saying it's required (can be worked around) but in some corner case situation, it could be useful.
For me it seems that this code enables you to write some data and than skip to a position far behind the data.
e.g. you have a stream containing multiple integers and you want to overwrite every 5th, you can do it with:
mData := 15;
WriteData(mData, SizeOf(mData) * 5);

is possible write/read a file using a string data type structure?

for write something in a file i use for example this code:
procedure MyProc (... );
const
BufSize = 65535;
var
FileSrc, FileDst: TFileStream;
StreamRead: Cardinal;
InBuf, OutBuf: Array [0..bufsize] of byte;
begin
.....
FileSrc := TFileStream.Create (uFileSrc, fmOpenRead Or fmShareDenyWrite);
try
FileDst := TFileStream.Create (uFileTmp, fmCreate);
try
StreamRead := 0;
while ((iCounter < iFileSize) or (StreamRead = Cardinal(BufSize)))
begin
StreamRead := FileSrc.Read (InBuf, BufSize);
Inc (iCounter, StreamRead);
end;
finally
FileDst.Free;
end;
finally
FileSrc.Free;
end;
end;
And for I/O file i use a array of byte, and so is all ok, but when i use a string, for example declaring:
InBuf, OutBuf: string // in delphi xe2 = unicode string
then not work. In sense that file not write nothing. I have understood why, or just think to have understood it.
I think that problem maybe is why string contain just a pointer to memory and not static structure; correct?
In this case, there is some solution for solve it? In sense, is possible to do something for i can to write a file using string and not vector? Or i need necessary use a vector?
If possible, can i can to do ?
Thanks very much.
There are two issues with using strings. First of all you want to use RawByteString so that you ensure the use of byte sized character elements – a Unicode string has elements that are two bytes wide. And secondly you need to dereference the string which is really just a pointer.
But I wonder why you would prefer strings to the stack allocated byte array.
procedure MyProc (... );
const
BufSize = 65536;
var
FileSrc, FileDst: TFileStream;
StreamRead: Cardinal;
InBuf: RawByteString;
begin
.....
FileSrc := TFileStream.Create (uFileSrc, fmOpenRead Or fmShareDenyWrite);
try
FileDst := TFileStream.Create (uFileTmp, fmCreate);
try
SetLength(InBuf, BufSize);
StreamRead := 0;
while ((iCounter < iFileSize) or (StreamRead = Cardinal(BufSize)))
begin
StreamRead := FileSrc.Read (InBuf[1], BufSize);
Inc (iCounter, StreamRead);
end;
finally
FileDst.Free;
end;
finally
FileSrc.Free;
end;
end;
Note: Your previous code declared a buffer of 65536 bytes, but you only ever used 65535 of them. Probably not what you intended.
To use a string as a buffer (which I would not recommend), you'll have to use SetLength to allocate the internal buffer, and you'll have to pass InBuf[1] and OutBuf[1] as the data to read or write.
var
InBuf, OutBuf: AnsiString; // or TBytes
begin
SetLength(InBuf, BufSize);
SetLength(OutBuf, BufSize);
...
StreamRead := FileSrc.Read(InBuf[1], BufSize); // if TBytes, use InBuf[0]
// etc...
You can also use a TBytes, instead of an AnsiString. The usage remains the same.
But I actually see no advantage in dynamically allocating TBytes, AnsiStrings or RawByteStrings here. I'd rather do what you already do: use a stack based buffer. I would perhaps make it a little smaller in a multi-threaded environment.
Yes, you can save / load strings to / from stream, see the following example
var Len: Integer;
buf: string;
FData: TStream;
// save string to stream
// save the length of the string
Len := Length(buf);
FData.Write(Len, SizeOf(Len));
// save string itself
if(Len > 0)then FData.Write(buf[1], Len * sizeof(buf[1]));
// read string from stream
// read the length of the string
FData.Read(Len, SizeOf(Len));
if(Len > 0)then begin
// get memory for the string
SetLength(buf, Len);
// read string content
FData.Read(buf[1], Len * sizeof(buf[1]));
end else buf := '';
On a related note, to copy the contents from one TStream to another TStream, you could just use the TStream.CopyFrom() method instead:
procedure MyProc (... );
var
FileSrc, FileDst: TFileStream;
begin
...
FileSrc := TFileStream.Create (uFileSrc, fmOpenRead Or fmShareDenyWrite);
try
FileDst := TFileStream.Create (uFileTmp, fmCreate);
try
FileDst.CopyFrom(FileSrc, 0); // or FileDst.CopyFrom(FileSrc, iFileSize)
finally
FileDst.Free;
end;
finally
FileSrc.Free;
end;
...
end;
Which can be simplified by calling CopyFile() instead:
procedure MyProc (... );
begin
...
CopyFile(PChar(uFileSrc), PChar(uFileTmp), False);
...
end;
Either way, you don't have to worry about read/writing the file data manually at all!

Packing and unpacking a Cardinal into four bytes

I have to pack and unpack a Cardinal into four one-byte fields (in Delphi 2010).
I'm doing this across all the pixels of a large image, so I need it to be fast!
Can anyone show me how to write these two functions? (The const and out keywords are just for clarity. If they interfere with inline assembly, then I can remove them.)
procedure FromCardinalToBytes( const aInput: Cardinal;
out aByte1: Byte;
out aByte2: Byte;
out aByte3: Byte;
out aByte4: Byte); inline;
function FromBytesToCardinal( const aByte1: Byte;
const aByte2: Byte;
const aByte3: Byte;
const aByte4: Byte):Cardinal; inline;
I'd recommed not using a function, just use a variant record.
type
TCardinalRec = packed record
case Integer of
0: (Value: Cardinal;);
1: (Bytes: array[0..3] of Byte;);
end;
Then you can easily use this to obtain the individual bytes.
var
LPixel: TCardinalRec;
...
LPixel.Value := 123455;
//Then read each of the bytes using
B1 := LPixel.Bytes[0];
B2 := LPixel.Bytes[1];
//etc.
If you absolutely must, you can put this into a function, but it's trivial enough not to bother with the overhead of a function call.
EDIT
To illustrate the efficiency of the variant record approach consider the following (assuming you're reading your image from a Stream).
var
LPixelBuffer: array[0..1023] of TCardinalRec;
...
ImageStream.Read(LPixelBuffer, SizeOf(LPixelBuffer));
for I := Low(LPixelBuffer) to High(LPixelBuffer) do
begin
//Here each byte is accessible by:
LPixelBuffer[I].Bytes[0]
LPixelBuffer[I].Bytes[1]
LPixelBuffer[I].Bytes[2]
LPixelBuffer[I].Bytes[3]
end;
PS: Instead of an arbitrarily generic Bytes array, you could explicitly name each Byte in the variant record as Red, Green, Blue, (and whatever the fourth Byte means).
There are many ways. The simplest is
function FromBytesToCardinal(const AByte1, AByte2, AByte3,
AByte4: byte): cardinal; inline;
begin
result := AByte1 + (AByte2 shl 8) + (AByte3 shl 16) + (AByte4 shl 24);
end;
procedure FromCardinalToBytes(const AInput: cardinal; out AByte1,
AByte2, AByte3, AByte4: byte); inline;
begin
AByte1 := byte(AInput);
AByte2 := byte(AInput shr 8);
AByte3 := byte(AInput shr 16);
AByte4 := byte(AInput shr 24);
end;
Slightly more sophisticated (but not necessarily faster) is
function FromBytesToCardinal2(const AByte1, AByte2, AByte3,
AByte4: byte): cardinal; inline;
begin
PByte(#result)^ := AByte1;
PByte(NativeUInt(#result) + 1)^ := AByte2;
PByte(NativeUInt(#result) + 2)^ := AByte3;
PByte(NativeUInt(#result) + 3)^ := AByte4;
end;
procedure FromCardinalToBytes2(const AInput: cardinal; out AByte1,
AByte2, AByte3, AByte4: byte); inline;
begin
AByte1 := PByte(#AInput)^;
AByte2 := PByte(NativeUInt(#AInput) + 1)^;
AByte3 := PByte(NativeUInt(#AInput) + 2)^;
AByte4 := PByte(NativeUInt(#AInput) + 3)^;
end;
If you don't need the bytes to be byte variables, you can do even trickier things, like declaring
type
PCardinalRec = ^TCardinalRec;
TCardinalRec = packed record
Byte1,
Byte2,
Byte3,
Byte4: byte;
end;
and then just cast:
var
c: cardinal;
begin
c := $12345678;
PCardinalRec(#c)^.Byte3 // get or set byte 3 in c
If you want fast you need to consider the 80x86 architecture.
The speed depends heavily on what you are doing with the bytes.
The x86 can access the bottom 2 bytes really fast, using the AL and AH registers
(least significant bytes in the 32-bit EAX register)
If you want to get at the higher order two bytes, you do not want to access those directly. Because you'll get an unaligned memory access, wasting CPU-cycles and messing up with the cache.
Making it faster
All this stuff messing with individual bytes is really not needed.
If you want to be really fast, work with 4 bytes at a time.
NewPixel:= OldPixel or $0f0f0f0f;
If you want to process your pixels really fast use inline MMX assembly and work with 8 bytes at a time.
Links:
Wikipedia: http://en.wikipedia.org/wiki/MMX_%28instruction_set%29
Explanation of the MMX instruction set: http://webster.cs.ucr.edu/AoA/Windows/HTML/TheMMXInstructionSet.html
Or re-ask your question on SO: How do I do this bitmap manipulation ... in MMX.
Really really fast
If you want it really really fast, like 100 or 1000x faster than MMX can, your videocard can do that. Google for CUDA or GPGPU.

Resources