I am using: HawkNL library.
There is an nlRead procedure as follows:
function nlRead(socket: NLsocket; var buffer; nbytes: NLint): NLint;
In all examples and other resources a static array is used to read the bytes to.
Just something like this:
FBufArray = array [0..1024] of Byte;
I have a few questions regarding this matter.
Which model/type would be appropriate to satisfy nlRead function that I could
dynamically allocate space for read data?
I was trying to use Pointer and GetMem or dynamic table with SetLength, but It seemed not to work as it should.
What is the correct approach in the situation when I have to read bytes with the unknown speed and do it as fast as possible. I mean What should be the size of the buffor for example?
For me it is relevant because read bytes I have to re-send further at the same time.
Generally how can I read and send bytes as fast as it is possible?
If your question is about what to pass as 'buffer'. You can pass anything you like.
If you pass a pointer you have to dereference it. For example when you call nl Read
Procedure read;
Type
TChunk = record
data: pointer;
datasize: NLint;
End;
Var
Chunk: TChunk;
Const
IdealReadSize = 1024;
Begin
GetMem( Chunk.data, IdealReadSize);
Try
Chunk.datasize := nlRead( YourSocket, Chunk.data^, IdealReadSize );
// Chunk.datasize hold the count of bytes which have been effectively read
// (maybe less than 1024 in case of an error)
// do something with your chunk
Finally
Freemem( Chunk.data, IdealReadSize );
End;
End;
It's the same approach as the TStream.Read() procedure.
You could use an untyped pointer: var P: Pointer;, allocate with GetMem or AllocMem, pass it to your function dereferenced: nlRead(Socket, P^, Count); and deallocate with FreeMem.
With regard to speed, a static buffer, sufficiently large: Buf: array[0..BufSize - 1] of Byte is probably the best.
Related
Given a buffer and its size in bytes, is there a way to convert this to TBytes without copying it?
Example:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
Arr: TBytes;
i: Integer;
begin
// some clever code here to get contents of the buffer into the Array
for i := 0 to Length(Arr)-1 do begin
HandleByte(Arr[i]);
end;
end;
I could of course copy the data:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
Arr: TBytes;
i: Integer;
begin
// this works but is very inefficient
SetLength(Arr, _BufSize);
Move(PByte(_Buffer)^, Arr[0], _BufSize);
//
for i := 0 to Length(Arr)-1 do begin
HandleByte(Arr[i]);
end;
end;
But for a large buffer (about a hundred megabytes) this would mean I have double the memory requirement and also spend a lot of time unnecessarily copying data.
I am aware that I could simply use a PByte to process each byte in the buffer, I'm only interested in a solution to use a TBytes instead.
I think it's not possible, but I have been wrong before.
No, this is not possible (without unreasonable hacks).
The problem is that TBytes = TArray<Byte> = array of Byte is a dynamic array and the heap object for a non-empty dynamic array has a header containing the array's reference count and length.
A function that accepts a TBytes parameter, when given a plain pointer to an array of bytes, might (rightfully) attempt to read the (non-existing) header, and then you are in serious trouble.
Also, dynamic arrays are managed types (as indicated by the reference count I mentioned), so you might have problems with that as well.
However, in your particular example code, you don't actually use the dynamic array nature of the data at all, so you can work directly with the buffer:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
i: Integer;
begin
for i := 0 to _BufSize - 1 do
HandleByte(_Buffer[i]);
end;
I have a procedure which calls several functions:
procedure TForm1.Button1Click(Sender: TObject);
var
rawData: TRawData;
rawInts: TRawInts;
processedData: TProcessedData;
begin
rawData := getRawData();
rawInts := getRawInts(rawData);
processedData := getProcessedData(rawInts);
end;
The data types are defined like this:
TRawData = array[0..131069] of Byte;
TRawInts = array[0..65534] of LongInt;
TProcessedData = array[0..65534] of Double;
running the program with just:
rawData := getRawData();
rawInts := getRawInts(rawData);
Works totally fine. However, when I try to run:
getProcessedData(rawInts)
I get a stackoverflow error. I don't see why this is. The function code for getProcessedData is very simple:
function getProcessedData(rawInts : TRawInts) : TProcessedData;
var
i: Integer;
tempData: TProcessedData;
scaleFactor: Double;
begin
scaleFactor := 0.01;
for i := 0 to 65534 do
tempData[i] := rawInts[i] * scaleFactor;
Result := tempData;
end;
Why is this causing an error ?
The default maximum stack size for a thread is 1 MB. The three local variables of Button1Click total 131,070 + 65,535 * 4 + 65,535 * 8 = 917,490 bytes. When you call getProcessedData, you pass the parameter by value, which means that the function makes a local copy of the parameter on the stack. That adds SizeOf(TRawInts) = 262,140 bytes to bring the stack to at least 1,179,630 bytes, or about 1.1 MB. There's your stack overflow.
You can reduce the stack use by passing the TRawInts array by reference instead. Then the function won't make its own copy. Zdravko's answer suggests using var, but since the function has no need to modify the passed-in array, you should use const instead.
function getProcessedData(const rawInts: TRawInts): TProcessedData;
Naively, we might expect the tempData and Result variables in getProcessedData to occupy additional stack space, but in reality, they probably won't. First, large return types typically result in the compiler changing the function signature, so it would act more like your function were declared with a var parameter instead of a return value:
procedure getProcessedData(rawInts: TRawInts; var Result: TProcessedData);
Then the call is transformed accordingly:
getProcessedData(rawInts, processedData);
Thus, Result doesn't take up any more stack space because it's really just an alias for the variable in the caller's frame.
Furthermore, sometimes the compiler recognizes that an assignment at the end of your function, like Result := tempData, means that tempData doesn't really need any space of its own. Instead, the compiler may treat your function as though you had been writing directly into Result all along:
begin
scaleFactor := 0.01;
for i := 0 to 65534 do
Result[i] := rawInts[i] * scaleFactor;
end;
However, it's best not to count on the compiler to make those sorts of memory-saving changes. Instead, it's better not to lean so heavily on the stack in the first place. To do that, you can use dynamic arrays. Those will move the large amounts of memory out of the stack and into the heap, which is the part of memory used for dynamic allocation. Start by changing the definitions of your array types:
type
TRawData = array of Byte;
TRawInts = array of Integer;
TProcessedData = array of Double;
Then, in your functions that return those types, use SetLength to assign the length of each array. For example, the function we've seen already might go like this:
function getProcessedData(const rawInts: TRawInts): TProcessedData;
var
i: Integer;
scaleFactor: Double;
begin
Assert(Length(rawInts) = 65535);
SetLength(Result, Length(rawInts));
scaleFactor := 0.01;
for i := 0 to High(rawInts) do
Result[i] := rawInts[i] * scaleFactor;
end;
These objects are all very large. And you appear to be allocating them as local variables. They will reside on the stack which has an fixed size, by default 1MB on Windows. You have allocated enough of these large objects in various parts of the call stack to exceed the 1MB limit. Hence a stack overflow.
Another problem in your code is the way you pass these objects as parameters. Passing large objects as value parameters results in copies being made. Copying an integer or two is nothing to worry about. Copying 65 thousand doubles is wasteful. It hurts performance. Don't do that. Pass references to large objects. Passing as const parameters achieves that.
The stack is well suited for small objects. It is not suited to these large objects. Allocate these objects on the heap. Use dynamic arrays: array of Integer, TArray<Integer> etc.
Do not increase the default stack size for your process. Especially in modern times of multi-core machines this is a recipe for out of memory errors.
Do not use magic constants. Use low() and high() to obtain array bounds.
Do pass input parameters by const. This allows the compiler to make optimisations that are significantly beneficial.
The key issue here is the size of your arrays.
If you use SizeOf you will see that they are probably larger than you think:
program Project3;
{$APPTYPE CONSOLE}
uses
SysUtils;
type
TRawData = array[0..131069] of Byte;
TRawInts = array[0..65534] of Longint;
TProcessedData = array[0..65534] of Double;
begin
try
writeln('TProcessedData:':20, SizeOf(TProcessedData):8);
writeln('TRawData:':20, SizeOf(TRawData):8);
writeln('TRawInts:':20, SizeOf(TRawInts):8);
writeln('Total:':20, SizeOf(TRawInts) + SizeOf(TProcessedData) + SizeOf(TRawData):8);
readln;
except
on E:Exception do
Writeln(E.Classname, ': ', E.Message);
end;
end.
Output:
TProcessedData: 524280
TRawData: 131070
TRawInts: 262140
Total: 917490
So most of the 1MB stack is consumed by the arrays. As some of the stack will already be allocated, you get a stack overflow.
This can be avoid by using dynamic arrays, which have their memory allocated from the heap.
TRawData = array of Byte;
TRawInts = array of Longint;
TProcessedData = array of Double;
...
SetLength(TProcessedData, 65535);
...
I want to use a function from FMOD library that locks data in memory of a given sound and returns pointer to the data, so some reading or modification of the data can be done:
function FSOUND_Sample_Lock(Sptr: PFSoundSample; Offset: Integer; Length: Integer;
var Ptr1: Pointer; var Ptr2: Pointer; var Len1: Cardinal;
var Len2: Cardinal): ByteBool;
ptr1 and ptr2 is a pointer to memory, len1 and len2 is length of the data in the memory.
How can I iterate over the data at ptr1 reading each time next SmallInt ?
I'm using Delphi 7 so {$POINTERMATH ON} does not work
In a modern Delphi, with {$POINTERMATH ON}, you can index the elements of the array like this:
PSmallint(ptr1)[i]
If you don't have $POINTERMATH in your Delphi, or if you prefer to leave it off, you can do this:
type
TSmallintArray = array[0..MaxInt div SizeOf(Smallint)-1] of Smallint;
PSmallintArray = ^TSmallintArray;
And then access the elements with:
PSmallintArray(ptr1)[i]
Personally I've never found the latter approach to my tastes, but the choice is yours.
I have a function which creates Pointer to a data from a Stream.
function StreamToByteArray(Stream: TStream): Pointer;
var
ByteArr: array of Byte;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(ByteArr, Stream.Size);
Stream.Read(ByteArr[0], Stream.Size);
end
else
SetLength(ByteArr, 0);
result := #ByteArr[0];
end;
How can I convert it back, from a Pointer to dynamic byte array and
then save the content to a stream. Or maybe it is possible to load stream directly from
a Pointer?
Thanks for help.
Ouch, this code is (unfortunately) very bad. Your function returns a pointer to the ByteArr array, but unfortunately that array runs out of scope when the function exists: you're essentially returning an Invalid Pointer! Even if the error doesn't immediately pop up, you've got a latent Access Violation in there.
Longer explanation
A Pointer is a dangerous structure: it doesn't contain data, it simply says where that data exists. Your example of an untyped Pointer is the most difficult kind of Pointer, it says nothing about the data that exists at the given address. It might point towards some bytes you read from a stream, might point to a string or even a picture of some sorts. You can't even know the amount of data that's at the given address.
The Pointer concept is closely related to the concept of allocating memory. We use many different techniques for allocation memory, using local variables, global variables, objects, dynamic arrays etc. In your sample function you're using a dynamic array, the array of Byte. The compiler does a very nice job of shielding you from the internals of allocating and reallocation memory, you can simply use SetLength() to say how big the array should be. Things work pretty well because the dynamic array is a managed data structure in Delphi: the compiler keeps track of how you're using the dynamic array and will free the associated memory as soon as the dynamic array is no longer needed. As far as the compiler is concerned, the associated memory is no longer required when your function exists.
When you're doing:
Result := #ByteArr[0];
You're essentially taking the address for the compiler-allocated memory block. Since you're using a very low level structure to do that (the Pointer), the compiler can't possibly keep track of your usage of the memory, so it will free the memory when the function exists. That leaves you with a Pointer to un-allocated memory.
How to properly return a Pointer from a function
First of all you should avoid Pointers if possible: they're low-level, the compiler can't help with type-safety or deallocation, they're simply too easy to get wrong. And when you do get Pointers wrong, the errors are usually Access Violations, and they're difficult to track.
That said, if you really want to return a pointer, you should return a pointer to explicitly allocated memory, so you know the compiler doesn't free it for you. When you do that, make sure the receiving code knows it's responsible for the memory (should free the memory when it's no longer needed). For example, your function could be re-written like this:
function StreamToByteArray(Stream: TStream): Pointer;
begin
if Assigned(Stream) then
begin
Result := AllocMem(Stream.Size);
Stream.Position := 0;
Stream.Read(Result^, Stream.Size);
end
else
Result := nil;
end;
How to change a Pointer back to array of byte or TStream
The answer is, there's no way to change it back. A pointer is just that, a pointer to some random data. An array of byte is more then the data it contains. A TStream is even more abstract: it's an interface that tells you how to retrieve data, it doesn't necessarily hold any data. For example, a TFileStream (and that is a TStream) doesn't hold any bytes of data: all the data is in the file on disk.
If you need a pointer to memory to pass to e.g. a function in a DLL, you should make that call while the buffer is still allocated. There are numerous ways to refactor the code below, but the same principle applies regardless of how your code ends up: You must not pass your pointer around after buffer has already been deallocated.
var
ByteArr: array of Byte;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(ByteArr, Stream.Size);
Stream.Read(ByteArr[0], Stream.Size);
end
else
SetLength(ByteArr, 0);
Test(Pointer(ByteArray),Length(ByteArray));
end;
In your Test procedure you can do this:
procedure Test(aData: Pointer; aCount: Integer);
var
ByteArr: array of Byte;
begin
SetLength(ByteArr,aCount);
Move(aData^,Pointer(ByteArr)^,aCount);
Possible solution:
type
TBytes = array of byte;
function StreamToByteArray(Stream: TStream): TBytes;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(result, Stream.Size);
Stream.Read(pointer(result)^, Stream.Size);
end
else
SetLength(result, 0);
end;
procedure Test;
var P: pointer;
begin
P := pointer(StreamToByteArray(aStream)); // returns an allocated TBytes
// ... use P
end; // here the hidden TBytes will be released
You can use pointer() around the result to get the memory location.
And your code won't leak any memory nor trigger any access violation, since an implicit try...finally block will be added by the compiler:
procedure Test;
var P: pointer;
tmp: TBytes; // created by the compiler
begin
tmp := StreamToByteArray(aStream)); // returns an allocated TBytes
try
P := pointer(tmp);
// ... use P
finally // here the hidden TBytes will be released
Finalize(tmp);
end;
end;
You can use RawByteString instead of TBytes if you wish.
Cosmin is right your are returing a pointer to an array that will become out of scope, the pointer will point to an area of memory that was on the stack and may get overwriten, It may appear as though the function works if you use the resust immediatly.
You need to pass the array to be filled into the function as well, or as I usually do (depending upon the data type) simple return a string and use that as a byte array (if you intend to move to a newer Delphi you need to be careful which string type your use).
Also dynamic arrays store the length and data type before the data (8 bytes of) and passing pointers to the 1st element losses the fact its a dynamic array and becomes just a memory buffer making freeing of the array dangerous.
To answer your question, a pointer (+ length) can be put back into a stream with the TStream.WriteBuffer. You may need to clear the stream first as this, as most stream write operations do, will append from the current stream position.
Hope that helps
When I use a large file in memorystream or filestream I see an error which is "out of memory"
How can I solve this problem?
Example:
procedure button1.clıck(click);
var
mem:TMemoryStream;
str:string;
begin
mem:=Tmemorystream.create;
mem.loadfromfile('test.txt');----------> there test.txt size 1 gb..
compressstream(mem);
end;
Your implementation is very messy. I don't know exactly what CompressStream does, but if you want to deal with a large file as a stream, you can save memory by simply using a TFileStream instead of trying to read the whole thing into a TMemoryStream all at once.
Also, you're never freeing the TMemoryStream when you're done with it, which means that you're going to leak a whole lot of memory. (Unless CompressStream takes care of that, but that's not clear from the code and it's really not a good idea to write it that way.)
You can't fit the entire file into a single contiguous block of 32 bit address space. Hence the out of memory error.
Read the file in smaller pieces and process it piece by piece.
Answering the question in the title, you need to process the file piece by piece, byte by byte if that's needed: you definitively do not load the file all at once into memory! How you do that obviously depends on what you need to do with the file; But since we know you're trying to implement an Huffman encoder, I'll give you some specific tips.
An Huffman encoder is a stream encoder: Bytes go in and bits go out. Each unit of incoming data is replaced with it's corresponding bit pattern. The encoder doesn't need to see the whole file at once, because it is in fact only working on one byte each time.
Here's how you'd huffman-compress a file without loading it all into memory; Of course, the actual Huffman encoder is not shown, because the question is about working with big files, not about building the actual encoder. This piece of code includes buffered input and output and shows how you'd link an actual encoder procedure to it.
(beware, code written in browser; if it doesn't compile you're expected to fix it!)
type THuffmanBuffer = array[0..1023] of Byte; // Because I need to pass the array as parameter
procedure DoActualHuffmanEncoding(const EncodeByte:Byte; var BitBuffer: THuffmanBuffer; var AtBit: Integer);
begin
// This is where the actual Huffman encoding would happen. This procedure will
// copy the correct encoding for EncodeByte in BitBuffer starting at AtBit bit index
// The procedure is expected to advance the AtBit counter with the number of bits
// that were actually written (that's why AtBit is a var parameter).
end;
procedure HuffmanEncoder(const FileNameIn, FileNameOut: string);
var InFile, OutFile: TFileStream;
InBuffer, OutBuffer: THuffmanBuffer;
InBytesCount: Integer;
OutBitPos: Integer;
i: Integer;
begin
// First open the InFile
InFile := TFileStream.Create(FileNameIn, fmOpenRead or fmShareDenyWrite);
try
// Now prepare the OutFile
OutFile := TFileStream.Create(FileNameOut, fmCreate);
try
// Start the out bit counter
OutBitPos := 0;
// Read from the input file, one buffer at a time (for efficiency)
InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
while InBytesCount <> 0 do
begin
// Process the input buffer byte-by-byte
for i:=0 to InBytesCount-1 do
begin
DoActualHuffmanEncoding(InBuffer[i], OutBuffer, OutBitPos);
// The function writes bits to the outer buffer, not full bytes, and the
// encoding for a rare byte might be significantly longer then 1 byte.
// Whenever the output buffer approaches it's capacity we'll flush it
// out to the OutFile
if (OutBitPos > ((SizeOf(OutBuffer)-10)*8) then
begin
// Ok, we've got less then 10 bytes available in the OutBuffer, time to
// flush!
OutFile.Write(OutBuffer, OutBitPos div 8);
// We're now possibly left with one incomplete byte in the buffer.
// We'll copy that byte to the start of the buffer and continue.
OutBuffer[0] := OutBuffer[OutBitPos div 8];
OutBitPos := OutBitPos mod 8;
end;
end;
// Read next chunk
InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
end;
// Flush the remaining of the output buffer. This time we want to flush
// the final (potentially incomplete) byte as well, because we've got no
// more input, there'll be no more output.
OutFile.Write(OutBuffer, (OutBitPos + 7) div 8);
finally OutFile.Free;
end;
finally InFile.Free;
end;
end;
The Huffman encoder is not a difficult encoder to implement, but doing it both correctly and fast might be a challenge. I suggest you start with a correct encoder, once you've got both encoding and decoding working figure out how to do a fast encoder.
try something like http://www.explainth.at/en/delphi/mapstream.shtml