I'm having a weird issue using Delphi's TMemoryStream (or TFileStream, for that matter) while reading part of the stream into a byte array. Here's some code as an example.
procedure readfromstream();
var
ms : TMemoryStream;
buffer : array of byte;
recordSize : Integer;
begin
try
begin
ms := TMemoryStream.Create();
ms.LoadFromFile(<some_path_to_a_binary_file>);
while ms.Position < ms.Size do
begin
buffer := nil;
SetLength(buffer, 4);
ms.ReadBuffer(buffer, 4);
move(buffer[0], recordSize, 4);
SetLength(buffer, recordSize);
ms.Position := ms.Position - 4; // Because I was having issues trying to read the rest of the record into a specific point in the buffer
FillChar(buffer, recordSize, ' ');
ms.ReadBuffer(buffer, recordSize); // Issue line ???
// Create the record from the buffer
end;
finally
begin
ms.Free();
end;
end;
The procedure is called like this:
// Some stuff happens before it
readfromstream();
// Some stuff happens after it
On debugging, I can see that it reads the stream into the buffer and the record is stored in memory appropriately. The procedure then exits normally and the debugger steps out of the procedure, but I end up straight back in the procedure and it repeats.
By forcing the procedure to exit prematurely, I have narrowed the issue down to the line ms.ReadBuffer(buffer, recordSize); but I don't see why it would cause the issue.
This procedure is called only once. My test data has only one entry/data.
Any help would be greatly appreciated.
FillChar(buffer, recordSize, ' ');
Here you are overwriting the dynamic array variable, which is a pointer, rather than writing to the content of the array. That causes memory corruption. Pretty much anything goes at that point.
The call to FillChar is needless anyway, because you are going to read into the entire array. Remove the call to FillChar.
For future reference, to do that call correctly, you write it like this:
FillChar(Pointer(buffer)^, ...);
or
FillChar(buffer[0], ...);
I prefer the former since the latter is subject to range errors when the array length is zero.
And then
ms.ReadBuffer(buffer, recordSize);
makes the exact same mistake, writing to the array variable rather than the array, and thus corrupting memory.
That should be
ms.ReadBuffer(Pointer(buffer)^, recordSize);
or
ms.ReadBuffer(buffer[0], recordSize);
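To make the difference concrete, here is a minimal sketch of a helper that fills a dynamic array from a stream. The name ReadBytes is hypothetical (it is not part of the RTL), and the sketch assumes the caller wants exactly Count bytes, relying on ReadBuffer to raise an exception otherwise. The {$IFDEF FPC} line only matters when compiling with Free Pascal.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: reads exactly Count bytes from Stream into a new array.
function ReadBytes(Stream: TStream; Count: Integer): TBytes;
begin
  Result := nil;
  SetLength(Result, Count);
  if Count > 0 then
    // Pointer(Result)^ is the array's content. Passing Result itself would
    // overwrite the array reference and corrupt memory.
    Stream.ReadBuffer(Pointer(Result)^, Count);
end;
```

The Count > 0 guard is only there to skip a pointless call for empty reads; Pointer(Result)^ stays safe for an empty array as long as nothing is actually read through it.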
The first 4 lines inside the loop are clumsy. Read directly into the variable:
ms.ReadBuffer(recordSize, SizeOf(recordSize));
I recommend that you perform some sanity checks on the value of recordSize that you read. For instance, any value less than 4 is clearly an error.
There's not a lot of point in moving the stream pointer back and reading again. You can copy recordSize into the first 4 bytes of the array and then read the rest.
Move(recordSize, buffer[0], SizeOf(recordSize));
ms.ReadBuffer(buffer[SizeOf(recordSize)], recordSize - SizeOf(recordSize));
A memory stream also seems wasteful. Why read the entire file into memory? That's going to place stress on your address space for large files. Use a buffered file stream.
Letting the caller allocate the stream would give more flexibility to the caller. They could then read from any type of stream and not be constrained to use a disk file.
Your try/finally block is wrong. You must acquire the resource immediately before the try. As you have it, an exception in the constructor leads to you calling Free on an uninitialized variable.
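As a minimal sketch of that pattern (the function name SizeViaMemoryCopy is made up for illustration, and the body is deliberately trivial), the constructor call sits immediately before the try, so Free is only ever reached for a successfully created object:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical example: copies Source into a temporary memory stream and
// returns the number of bytes copied.
function SizeViaMemoryCopy(Source: TStream): Int64;
var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;   // acquire first; if this raises, there is nothing to free
  try
    ms.LoadFromStream(Source);  // any exception from here on still reaches finally
    Result := ms.Size;
  finally
    ms.Free;                    // ms is guaranteed to be a valid object here
  end;
end;
```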
A better version might be:
procedure ReadFromStream(Stream: TStream);
var
buffer: TArray<byte>;
recordSize: Integer;
begin
while Stream.Position < Stream.Size do
begin
Stream.ReadBuffer(recordSize, SizeOf(recordSize));
if recordSize < SizeOf(recordSize) then
raise ...;
SetLength(buffer, recordSize);
Move(recordSize, buffer[0], SizeOf(recordSize));
if recordSize > SizeOf(recordSize) then
Stream.ReadBuffer(buffer[SizeOf(recordSize)],
recordSize - SizeOf(recordSize));
// process record
end;
end;
Sorry I can't add a comment, being a newb and all :) This reply is based on my understanding of Clayton's code in light of his comment with the recordSize values.
The reason David's code is looping is probably that you are interpreting each four-byte "block" as a number. I'll assume your first Stream.ReadBuffer is correct and that the first four bytes in the file are a length.
Now, unless I'm mistaken, I expect the recordSize will usually be greater than SizeOf(recordSize), which I think should be 4 (the size of an int). Nevertheless, this line is meaningless here.
The SetLength is correct, given my previous assumption.
Now your Move is where the story hits a snag: you haven't read anything since you read the length! So before the move, you should have:
bytesRead := Stream.Read(Pointer(buffer)^, recordSize); // Read returns the byte count; ReadBuffer is a procedure and returns nothing
Now you can check for EOF:
if bytesRead <> recordSize then
raise...;
...and move the buffer somewhere (if you wish):
Move(buffer[0], dest[0], recordSize);
And you are positioned to read the next recordSize value.
Repeat until EOF.
I have two TBytes variables:
A: TBytes;
B: TBytes;
Now I want to swap them like this:
tmp := A;
A := B;
B := tmp;
But I'm not sure if this is the most efficient way, especially with copy-on-write (if it works the same as with String).
Maybe something like this:
Tmp := Pointer(a);
pointer(a) := pointer(b);
pointer(b) := Tmp ;
There is no copy-on-write for dynamic arrays, but if there were, it would not matter, because nothing is written (to the contents of the arrays).
Your way is the most efficient: only references are copied, and a few reference counts are updated.
The way using pointers would be slightly more efficient (no refcounting), but also a bit more risky. You can do this because in the end, the reference counts of both arrays should be the same as they were before. If nothing can access the (local) references during the swap, it should not matter.
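The two variants can be put side by side as procedures. This is only a sketch: SwapManaged, SwapRaw and DemoSwap are illustrative names, and DemoSwap exists purely to show that both routines leave the arrays in the same state.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Managed swap: three reference assignments; no array content is copied.
procedure SwapManaged(var A, B: TBytes);
var
  Tmp: TBytes;
begin
  Tmp := A;
  A := B;
  B := Tmp;
end;

// Pointer swap: exchanges the two references without touching the
// reference counts. Each array still has the same number of owners afterwards.
procedure SwapRaw(var A, B: TBytes);
var
  Tmp: Pointer;
begin
  Tmp := Pointer(A);
  Pointer(A) := Pointer(B);
  Pointer(B) := Tmp;
end;

// Usage sketch: swaps a 3-byte array with a 1-byte array and reports the
// resulting lengths as 'LenA/LenB'.
function DemoSwap(UseRaw: Boolean): string;
var
  A, B: TBytes;
begin
  SetLength(A, 3);
  SetLength(B, 1);
  if UseRaw then
    SwapRaw(A, B)
  else
    SwapManaged(A, B);
  Result := Format('%d/%d', [Length(A), Length(B)]);
end;
```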
Update
And if you do what David recommended, i.e. put this code in a separate procedure, then it doesn't matter a lot if you use a local Temp variable or an external one. But the swap using Pointer casts is 10x (ten times) as fast as the normal swap using TBytes!
See my comment to the other answer: it doesn't matter if you use an external or a local Temp variable, they are almost equally fast. I measured the one with a local Temp variable at an average of 6512 milliseconds, the one with the external Temp variable at 6729 milliseconds and the one using pointers at 589 milliseconds. I did several tests in different orders to rule out timing errors. There are timing differences when swapping empty (nil) arrays, but I assume these don't matter a lot.
As was already answered, your code for swapping two TBytes is the most efficient.
So my post here isn't an answer to your question. Instead, I'm trying to warn you about how you can hurt performance by using this code improperly, where the performance loss is actually caused by the code that calls the code from your question.
Now, based on the fact that you are even thinking about the performance of such a small piece of code, I'm guessing you are planning to execute it in one large loop, where the slightest gain in performance might have big consequences for the overall performance of your application. If you called this code only a few times, I bet you wouldn't worry about its performance at all, since it would be negligible compared to the performance of your entire application.
So if you follow David's suggestion of putting this code into a procedure, I'm guessing you might write something like this:
procedure SwapBytes(var A,B: TBytes);
var Temp: Tbytes;
begin
Temp := A;
A := B;
B := Temp;
end;
Nothing fancy. But the problem with this is that every time you call this procedure in your loop, your application has to initialize that local variable (allocate memory for it) upon entering the procedure and then finalize it (release its memory) on exiting. Now why is this so bad? Because allocating or deallocating memory is much slower than actually writing to or reading from already allocated memory.
So how can you avoid this problem? You do so by initializing the Temp variable outside of your procedure and passing it to the procedure as an additional parameter instead. The performance gain this way can be significant.
Here is my test example where I used both approaches and measure their performance.
//Basic procedure for swapping two TBytes values between each other
//It has local variable Temp of TBytes type which is automatically created when
//entering the procedure and released when exiting the procedure
procedure SwapBytesLocalTempVariable(var A,B: TBytes);
var Temp: TBytes;
begin
Temp := A;
A := B;
B := Temp;
end;
//Same as above but this procedure does not contain any local variable so you
//need to pass the Temp variable as an additional input parameter
procedure SwapBytesExternalTempVariable(var A,B,Temp: TBytes);
begin
Temp := A;
A := B;
B := Temp;
end;
//Quick procedure for testing
procedure TForm1.Button1Click(Sender: TObject);
var A,B: TBytes;
I: Integer;
SW: TStopWatch;
Temp: TBytes;
begin
//Calling first procedure with local temp variable in a loop many times can be
//quite slow because your program needs to initialize and release that local
//variable in each loop cycle.
SW := TStopWatch.Create;
SW.Start;
for I := 0 to 100000000 do
begin
SwapBytesLocalTempVariable(A,B);
end;
SW.Stop;
Memo1.Lines.Add(Format('Swap bytes with local variable: %f',[SW.Elapsed.TotalMilliseconds]));
//Calling second procedure which does not have local temp variable and passing
//the temp variable as additional parameter is much quicker because this way
//the Temp variable isn't initialized and then released in each loop cycle but
//instead we created (initialized) it outside the loop (in the OnClick method of
//TButton) and it is therefore being reused in each loop cycle.
SW := TStopWatch.Create;
SW.Start;
for I := 0 to 100000000 do
begin
SwapBytesExternalTempVariable(A,B,Temp);
end;
SW.Stop;
Memo1.Lines.Add(Format('Swap bytes with external variable: %f',[SW.Elapsed.TotalMilliseconds]));
end;
Now, as you can see, the performance difference between these two approaches is quite significant. During my testing, calling the first procedure with the local variable took about 1800 milliseconds (almost two seconds), while calling the second procedure, where I pass the Temp variable as an additional parameter, took only about 800 milliseconds. That is a one-second performance gain between the two approaches.
Anyway, the general advice is to try to reduce the number of memory allocations as much as possible and to reuse variables where possible.
I am trying to write and read a non-fixed string using TFileStream. I am getting an access violation error though. Here is my code:
// Saving a file
(...)
count:=p.Tags.Count; // Number of lines to save (Tags is a TStringList)
FS.Write(count, SizeOf(integer));
for j := 0 to p.Tags.Count-1 do
begin
str:=p.Tags.Strings[j];
tmp:=Length(str)*SizeOf(char);
FS.Write(tmp, SizeOf(Integer));
FS.Write(str[1], Length(str)*SizeOf(char));
end;
// Loading a file
(...)
p.Tags.Add('hoho'); // Check if Tags is created. This doesn't throw an error.
Read(TagsCount, SizeOf(integer)); // Number of lines to read
for j := 0 to TagsCount-1 do
begin
Read(len, SizeOf(Integer)); // length of this line of text
SetLength(str, len); // don't know if I have to do this
Read(str, len); // No error, but str has "inaccessible value" in watch list
p.Tags.Add(str); // Throws error
end;
The file seems to save just fine, when I open it with a hexeditor, I can find the right strings saved there, but loading is throwing errors.
Could you help me out?
You save the number of bytes, and that's how many bytes you write. When you read the value, you treat it as the number of characters, and then read that many bytes. That won't cause the problem you're seeing now, though, since you're making the buffer bigger than it needs to be as of Delphi 2009.
The problem is that you're reading into the string variable, not the string's contents. You used str[1] when writing; do the same when reading. Otherwise, you're overwriting the string reference that you allocated when you called SetLength.
Read(nBytes, SizeOf(Integer));
nChars := nBytes div SizeOf(Char);
SetLength(str, nChars);
Read(str[1], nBytes);
And yes, you do need to call SetLength. Read doesn't know what it's reading into, so it has no way of knowing that it needs to set the size to anything in advance.
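Putting both directions together, a round trip could be sketched like this. WriteStr, ReadStr and RoundTrip are hypothetical names, the length prefix is assumed to be a byte count stored as a 4-byte Integer (as in the question), and Pointer(S)^ is used instead of S[1] so that empty strings don't trip a range check.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Writes the byte count first, then the characters themselves.
procedure WriteStr(Stream: TStream; const S: string);
var
  nBytes: Integer;
begin
  nBytes := Length(S) * SizeOf(Char);
  Stream.WriteBuffer(nBytes, SizeOf(nBytes));
  if nBytes > 0 then
    Stream.WriteBuffer(Pointer(S)^, nBytes);
end;

// Reads the byte count, sizes the string in characters, then fills its content.
function ReadStr(Stream: TStream): string;
var
  nBytes: Integer;
begin
  Stream.ReadBuffer(nBytes, SizeOf(nBytes));
  if (nBytes < 0) or (nBytes mod SizeOf(Char) <> 0) then
    raise EReadError.Create('Invalid string length in stream');
  SetLength(Result, nBytes div SizeOf(Char));
  if nBytes > 0 then
    Stream.ReadBuffer(Pointer(Result)^, nBytes);
end;

// Usage sketch: writes S to a memory stream and reads it back.
function RoundTrip(const S: string): string;
var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    WriteStr(ms, S);
    ms.Position := 0;
    Result := ReadStr(ms);
  finally
    ms.Free;
  end;
end;
```

Note that a file written this way is only readable by a build with the same SizeOf(Char); a file saved by a pre-2009 Delphi build would need a conversion step.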
I have a function which creates Pointer to a data from a Stream.
function StreamToByteArray(Stream: TStream): Pointer;
var
ByteArr: array of Byte;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(ByteArr, Stream.Size);
Stream.Read(ByteArr[0], Stream.Size);
end
else
SetLength(ByteArr, 0);
result := @ByteArr[0];
end;
How can I convert it back, from a Pointer to dynamic byte array and
then save the content to a stream. Or maybe it is possible to load stream directly from
a Pointer?
Thanks for help.
Ouch, this code is (unfortunately) very bad. Your function returns a pointer to the ByteArr array, but that array goes out of scope when the function exits: you're essentially returning an invalid pointer! Even if the error doesn't immediately pop up, you've got a latent access violation in there.
Longer explanation
A Pointer is a dangerous structure: it doesn't contain data, it simply says where that data exists. Your example of an untyped Pointer is the most difficult kind of Pointer, it says nothing about the data that exists at the given address. It might point towards some bytes you read from a stream, might point to a string or even a picture of some sorts. You can't even know the amount of data that's at the given address.
The Pointer concept is closely related to the concept of allocating memory. We use many different techniques for allocating memory: local variables, global variables, objects, dynamic arrays etc. In your sample function you're using a dynamic array, the array of Byte. The compiler does a very nice job of shielding you from the internals of allocating and reallocating memory; you can simply use SetLength() to say how big the array should be. Things work pretty well because the dynamic array is a managed data structure in Delphi: the compiler keeps track of how you're using the dynamic array and will free the associated memory as soon as the dynamic array is no longer needed. As far as the compiler is concerned, the associated memory is no longer required when your function exits.
When you're doing:
Result := @ByteArr[0];
You're essentially taking the address of the compiler-allocated memory block. Since you're using a very low-level structure to do that (the Pointer), the compiler can't possibly keep track of your usage of the memory, so it will free the memory when the function exits. That leaves you with a Pointer to unallocated memory.
How to properly return a Pointer from a function
First of all you should avoid Pointers if possible: they're low-level, the compiler can't help with type-safety or deallocation, they're simply too easy to get wrong. And when you do get Pointers wrong, the errors are usually Access Violations, and they're difficult to track.
That said, if you really want to return a pointer, you should return a pointer to explicitly allocated memory, so you know the compiler doesn't free it for you. When you do that, make sure the receiving code knows it's responsible for the memory (should free the memory when it's no longer needed). For example, your function could be re-written like this:
function StreamToByteArray(Stream: TStream): Pointer;
begin
if Assigned(Stream) then
begin
Result := AllocMem(Stream.Size);
Stream.Position := 0;
Stream.Read(Result^, Stream.Size);
end
else
Result := nil;
end;
How to change a Pointer back to array of byte or TStream
The answer is, there's no way to change it back. A pointer is just that, a pointer to some random data. An array of byte is more than the data it contains. A TStream is even more abstract: it's an interface that tells you how to retrieve data, it doesn't necessarily hold any data. For example, a TFileStream (and that is a TStream) doesn't hold any bytes of data: all the data is in the file on disk.
If you need a pointer to memory to pass to e.g. a function in a DLL, you should make that call while the buffer is still allocated. There are numerous ways to refactor the code below, but the same principle applies regardless of how your code ends up: You must not pass your pointer around after buffer has already been deallocated.
var
ByteArr: array of Byte;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(ByteArr, Stream.Size);
Stream.Read(ByteArr[0], Stream.Size);
end
else
SetLength(ByteArr, 0);
Test(Pointer(ByteArr),Length(ByteArr));
end;
In your Test procedure you can do this:
procedure Test(aData: Pointer; aCount: Integer);
var
ByteArr: array of Byte;
begin
SetLength(ByteArr,aCount);
Move(aData^,Pointer(ByteArr)^,aCount);
// ... work with the local copy in ByteArr here
end;
Possible solution:
type
TBytes = array of byte;
function StreamToByteArray(Stream: TStream): TBytes;
begin
if Assigned(Stream) then
begin
Stream.Position := 0;
SetLength(result, Stream.Size);
Stream.Read(pointer(result)^, Stream.Size);
end
else
SetLength(result, 0);
end;
procedure Test;
var P: pointer;
begin
P := pointer(StreamToByteArray(aStream)); // returns an allocated TBytes
// ... use P
end; // here the hidden TBytes will be released
You can use pointer() around the result to get the memory location.
And your code won't leak any memory nor trigger any access violation, since an implicit try...finally block will be added by the compiler:
procedure Test;
var P: pointer;
tmp: TBytes; // created by the compiler
begin
tmp := StreamToByteArray(aStream); // returns an allocated TBytes
try
P := pointer(tmp);
// ... use P
finally // here the hidden TBytes will be released
Finalize(tmp);
end;
end;
You can use RawByteString instead of TBytes if you wish.
Cosmin is right: you are returning a pointer to an array that will go out of scope. The pointer will then point to an area of memory that has been released and may get overwritten. It may appear as though the function works if you use the result immediately.
You need to pass the array to be filled into the function as well, or, as I usually do (depending upon the data type), simply return a string and use that as a byte array (if you intend to move to a newer Delphi you need to be careful which string type you use).
Also, dynamic arrays store bookkeeping data (the reference count and the length, 8 bytes in 32-bit Delphi) just before the data, and passing a pointer to the first element loses the fact that it's a dynamic array: it becomes just a memory buffer, which makes freeing the array through that pointer dangerous.
To answer your question, a pointer (+ length) can be put back into a stream with the TStream.WriteBuffer. You may need to clear the stream first as this, as most stream write operations do, will append from the current stream position.
Hope that helps
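The WriteBuffer route mentioned above could look roughly like this. PointerToStream and DemoPointerToStream are hypothetical names, and the sketch assumes the stream supports resizing (as TMemoryStream and TFileStream do) and that the caller keeps the memory behind Data alive for the duration of the call.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: replaces the stream's content with Count bytes
// starting at address Data.
procedure PointerToStream(Data: Pointer; Count: Integer; Stream: TStream);
begin
  Stream.Size := 0;       // discard whatever the stream held before
  Stream.Position := 0;
  if (Data <> nil) and (Count > 0) then
    Stream.WriteBuffer(Data^, Count);
  Stream.Position := 0;   // rewind so the caller can read from the start
end;

// Usage sketch: copies a 3-byte array into a memory stream via its address
// and returns the resulting stream size.
function DemoPointerToStream: Int64;
var
  Bytes: TBytes;
  ms: TMemoryStream;
begin
  SetLength(Bytes, 3);
  Bytes[0] := 1;
  Bytes[1] := 2;
  Bytes[2] := 3;
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(Bytes[0], 1);  // pre-existing content that gets replaced
    PointerToStream(Pointer(Bytes), Length(Bytes), ms);
    Result := ms.Size;
  finally
    ms.Free;
  end;
end;
```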
When I load a large file into a TMemoryStream or TFileStream, I get an "out of memory" error.
How can I solve this problem?
Example:
procedure button1.click(click);
var
mem:TMemoryStream;
str:string;
begin
mem:=Tmemorystream.create;
mem.loadfromfile('test.txt'); // test.txt is 1 GB in size
compressstream(mem);
end;
Your implementation is very messy. I don't know exactly what CompressStream does, but if you want to deal with a large file as a stream, you can save memory by simply using a TFileStream instead of trying to read the whole thing into a TMemoryStream all at once.
Also, you're never freeing the TMemoryStream when you're done with it, which means that you're going to leak a whole lot of memory. (Unless CompressStream takes care of that, but that's not clear from the code and it's really not a good idea to write it that way.)
You can't fit the entire file into a single contiguous block of 32 bit address space. Hence the out of memory error.
Read the file in smaller pieces and process it piece by piece.
Answering the question in the title, you need to process the file piece by piece, byte by byte if that's needed: you definitely do not load the file all at once into memory! How you do that obviously depends on what you need to do with the file, but since we know you're trying to implement a Huffman encoder, I'll give you some specific tips.
A Huffman encoder is a stream encoder: bytes go in and bits go out. Each unit of incoming data is replaced with its corresponding bit pattern. The encoder doesn't need to see the whole file at once, because it is in fact only working on one byte at a time.
Here's how you'd Huffman-compress a file without loading it all into memory. Of course, the actual Huffman encoder is not shown, because the question is about working with big files, not about building the actual encoder. This piece of code includes buffered input and output and shows how you'd link an actual encoder procedure to it.
(beware, code written in browser; if it doesn't compile you're expected to fix it!)
type THuffmanBuffer = array[0..1023] of Byte; // Because I need to pass the array as parameter
procedure DoActualHuffmanEncoding(const EncodeByte:Byte; var BitBuffer: THuffmanBuffer; var AtBit: Integer);
begin
// This is where the actual Huffman encoding would happen. This procedure will
// copy the correct encoding for EncodeByte in BitBuffer starting at AtBit bit index
// The procedure is expected to advance the AtBit counter with the number of bits
// that were actually written (that's why AtBit is a var parameter).
end;
procedure HuffmanEncoder(const FileNameIn, FileNameOut: string);
var InFile, OutFile: TFileStream;
InBuffer, OutBuffer: THuffmanBuffer;
InBytesCount: Integer;
OutBitPos: Integer;
i: Integer;
begin
// First open the InFile
InFile := TFileStream.Create(FileNameIn, fmOpenRead or fmShareDenyWrite);
try
// Now prepare the OutFile
OutFile := TFileStream.Create(FileNameOut, fmCreate);
try
// Start the out bit counter
OutBitPos := 0;
// Read from the input file, one buffer at a time (for efficiency)
InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
while InBytesCount <> 0 do
begin
// Process the input buffer byte-by-byte
for i:=0 to InBytesCount-1 do
begin
DoActualHuffmanEncoding(InBuffer[i], OutBuffer, OutBitPos);
// The function writes bits to the output buffer, not full bytes, and the
// encoding for a rare byte might be significantly longer than 1 byte.
// Whenever the output buffer approaches its capacity we'll flush it
// out to the OutFile
if OutBitPos > (SizeOf(OutBuffer) - 10) * 8 then
begin
// Ok, we've got less than 10 bytes available in the OutBuffer, time to
// flush!
OutFile.Write(OutBuffer, OutBitPos div 8);
// We're now possibly left with one incomplete byte in the buffer.
// We'll copy that byte to the start of the buffer and continue.
OutBuffer[0] := OutBuffer[OutBitPos div 8];
OutBitPos := OutBitPos mod 8;
end;
end;
// Read next chunk
InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
end;
// Flush the remaining of the output buffer. This time we want to flush
// the final (potentially incomplete) byte as well, because we've got no
// more input, there'll be no more output.
OutFile.Write(OutBuffer, (OutBitPos + 7) div 8);
finally OutFile.Free;
end;
finally InFile.Free;
end;
end;
The Huffman encoder is not a difficult encoder to implement, but doing it both correctly and fast might be a challenge. I suggest you start with a correct encoder, once you've got both encoding and decoding working figure out how to do a fast encoder.
try something like http://www.explainth.at/en/delphi/mapstream.shtml
I do not know how to use any API that is not in the RTL. I have been using SetFilePointer and GetFileSize to read a Physical Disk into a buffer and dump it to a file, something like this in a loop does the job for flash memory cards under 2GB:
SetFilePointer(PD,0,nil,FILE_BEGIN);
SetLength(Buffer,512);
ReadFile(PD,Buffer[0],512,BytesReturned,nil);
However, GetFileSize has a limit at 2GB and so does SetFilePointer. I have absolutely no idea how to declare an external API. I have looked at the RTL and googled for many examples and have found no correct answer.
I tried this
function GetFileSizeEx(hFile: THandle; lpFileSizeHigh: Pointer): DWORD;
external 'kernel32';
and as suggested this
function GetFileSizeEx(hFile: THandle; var FileSize: Int64): DWORD;
stdcall; external 'kernel32';
But the function returns a 0 even though I am using a valid disk handle which I have confirmed and dumped data from using the older API's.
I am using SetFilePointer to jump every 512 bytes and ReadFile to read into a buffer. In reverse, I can use it to set the position when I am using WriteFile to write Initial Program Loader code or something else to the disk. I need to be able to set the file pointer well beyond 2GB.
Can someone help me make the external declarations and a call to both GetFileSizeEx and SetFilePointerEx that work, so I can modify my older code to work with, say, 4 to 32GB flash cards?
I suggest that you take a look at this Primoz Gabrijelcic blog article and his GpHugeFile unit which should give you enough pointers to get the file size.
Edit 1 This looks rather a daft answer now in light of the edit to the question.
Edit 2 Now that this answer has been accepted, following a long thread of comments to jachguate's answer, I feel it incumbent to summarise what has been learnt.
GetFileSize and SetFilePointer have no 2GB limitation, they can be used on files of essentially arbitrary size.
GetFileSizeEx and SetFilePointerEx are much easier to use because they work directly with 64 bit quantities and have far simpler error condition signals.
The OP did not in fact need to calculate the size of his disk. Since the OP was reading the entire contents of the disk the size was not needed. All that was required was to read the contents sequentially until there was nothing left.
In fact GetFileSize/GetFileSizeEx do not support handles to devices (e.g. a physical disk or volume) as was requested by the OP. What's more, SetFilePointer/SetFilePointerEx cannot seek to the end of such device handles.
In order to obtain the size of a disk, volume or partition, one should pass the IOCTL_DISK_GET_LENGTH_INFO control code to DeviceIoControl.
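As a rough sketch of that last point: Delphi's Windows unit may not declare the control code, so the version below derives it the way the SDK's CTL_CODE macro does. CtlCode and TryGetDeviceLength are hypothetical names; the function number $17 and the constants are taken from the Windows SDK header winioctl.h. The Windows-only part is wrapped in {$IFDEF MSWINDOWS}, and newer Delphi versions may need Winapi.Windows in the uses clause.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}
uses
  Windows;
{$ENDIF}

const
  // Values from winioctl.h in the Windows SDK.
  IOCTL_DISK_BASE  = $00000007;
  METHOD_BUFFERED  = 0;
  FILE_READ_ACCESS = 1;

// Mirrors the SDK's CTL_CODE macro.
function CtlCode(DeviceType, Func, Method, Access: Cardinal): Cardinal;
begin
  Result := (DeviceType shl 16) or (Access shl 14) or (Func shl 2) or Method;
end;

{$IFDEF MSWINDOWS}
// Hypothetical wrapper: returns True and the length in bytes on success.
// hDevice must be a handle to a disk, volume or partition.
function TryGetDeviceLength(hDevice: THandle; out DeviceLength: Int64): Boolean;
var
  BytesReturned: DWORD;
begin
  DeviceLength := 0;
  // GET_LENGTH_INFORMATION holds a single LARGE_INTEGER, so an Int64 fits it.
  Result := DeviceIoControl(hDevice,
    CtlCode(IOCTL_DISK_BASE, $17, METHOD_BUFFERED, FILE_READ_ACCESS), // IOCTL_DISK_GET_LENGTH_INFO
    nil, 0, @DeviceLength, SizeOf(DeviceLength), BytesReturned, nil);
end;
{$ENDIF}
```

Because the control code requires read access, the device handle has to be opened with at least GENERIC_READ for the call to succeed.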
Finally, should you need to use GetFileSizeEx and SetFilePointerEx then they can be declared as follows:
function GetFileSizeEx(hFile: THandle; var lpFileSize: Int64): BOOL;
stdcall; external 'kernel32.dll';
function SetFilePointerEx(hFile: THandle; liDistanceToMove: Int64;
lpNewFilePointer: PInt64; dwMoveMethod: DWORD): BOOL;
stdcall; external 'kernel32.dll';
One easy way to obtain these API imports is through the excellent JEDI API Library.
The GetFileSizeEx routine expects a pointer to a LARGE_INTEGER data type, and documentation says:
If your compiler has built-in support for 64-bit integers, use the QuadPart member to store the 64-bit integer
Lucky you, Delphi has built-in support for 64 bit integers, so use it:
var
DriveSize: Int64; // must be a 64-bit type; LongWord is only 32 bits wide
begin
GetFileSizeEx(PD, DriveSize); // assumes the second parameter is declared as "var lpFileSize: Int64"
end;
SetFilePointerEx, on the other hand, expects the parameters liDistanceToMove (a 64-bit integer) and lpNewFilePointer (a pointer to a 64-bit integer). My understanding is that it wants signed integers, but you have the UInt64 data type for unsigned 64-bit integers if I'm misunderstanding the documentation.
Alternative coding
Suicide, first of all your approach is wrong, and because of your wrong approach you ran into some hairy problems with the way Windows handles Disk drives opened as files. In pseudo code your approach seems to be:
Size = GetFileSize;
for i=0 to (Size / 512) do
begin
Seek(i * 512);
ReadBlock;
WriteBlockToFile;
end;
That's functionally correct, but there's a simpler way to do the same without actually getting the size of the disk and without seeking. When reading something from a file (or a stream), the "pointer" is automatically advanced by the amount of data you just read, so you can skip the "seek". All the functions used to read data from a file return the amount of data that was actually read: you can use that to know when you reached the end of the file without knowing the size of the file to start with!
Here's an idea of how you can read a physical disk to a file, without knowing much about the disk device, using Delphi's TFileStream:
var DiskStream, DestinationStream:TFileStream;
Buff:array[0..512-1] of Byte;
BuffRead:Integer;
begin
// Open the disk for reading
DiskStream := TFileStream.Create('\\.\PhysicalDrive0', fmOpenRead);
try
// Create the file
DestinationStream := TFileStream.Create('D:\Something.IMG', fmCreate);
try
// Read & write in a loop; This is where all the work's done:
BuffRead := DiskStream.Read(Buff, SizeOf(Buff));
while BuffRead > 0 do
begin
DestinationStream.Write(Buff, BuffRead);
BuffRead := DiskStream.Read(Buff, SizeOf(Buff));
end;
finally DestinationStream.Free;
end;
finally DiskStream.Free;
end;
end;
You can obviously do something similar the other way around, reading from a file and writing to disk. Before writing that code I actually attempted doing it your way (getting the file size, etc), and immediately ran into problems! Apparently Windows doesn't know the exact size of the "file", not unless you read from it.
Problems with disks opened as files
For all my testing I used this simple code as the base:
var F: TFileStream;
begin
F := TFileStream.Create('\\.\PhysicalDrive0', fmOpenRead);
try
// Test code goes here...
finally F.Free;
end;
end;
The first (obvious) thing to try was:
ShowMessage(IntToStr(F.Size));
That fails. In the TFileStream implementation that depends on calling FileSeek, and FileSeek can't handle files larger than 2GB. So I gave GetFileSize a try, using this code:
var RetSize, UpperWord:DWORD;
RetSize := GetFileSize(F.Handle, @UpperWord);
ShowMessage(IntToStr(UpperWord) + ' / ' + IntToStr(RetSize));
That also fails, even though it should be perfectly capable of returning the file size as a 64-bit number! Next I tried using the SetFilePointer API, because that's also supposed to handle 64-bit numbers. I thought I'd simply seek to the end of the file and look at the result, using this code:
var RetPos, UpperWord:DWORD;
UpperWord := 0;
RetPos := SetFilePointer(F.Handle, 0, @UpperWord, FILE_END);
ShowMessage(IntToStr(UpperWord) + ' / ' + IntToStr(RetPos));
This code also fails! And now I'm thinking, why did the first code work? Apparently reading block-by-block works just fine and Windows knows exactly when to stop reading! So I thought maybe there's a problem with the implementation of the 64-bit file handling routines, so let's try seeking to the end of the file in small increments. When we get an error seeking, we know we've reached the end and we stop:
var PrevUpWord, PrevPos: DWORD;
UpWord, Pos: DWORD;
UpWord := 0;
Pos := SetFilePointer(F.Handle, 1024, @UpWord, FILE_CURRENT); // Advance the pointer 1024 bytes from its current position
while (UpWord <> PrevUpWord) or (Pos <> PrevPos) do
begin
PrevUpWord := UpWord;
PrevPos := Pos;
UpWord := 0;
Pos := SetFilePointer(F.Handle, 1024, @UpWord, FILE_CURRENT);
end;
When trying this code I had a surprise: it doesn't stop at the end of the file, it just goes on and on, forever. It never fails. To be perfectly honest I'm not sure it's supposed to ever fail... It's probably not supposed to fail. Anyway, doing a READ in that loop fails when we're past the end of the file, so we can use a VERY hacky mixed approach to handle this situation.
Ready-made routines that work around the problem
Here's the ready-made routine that gets the size of the physical disk opened as a file, even when GetFileSize fails, and SetFilePointer with FILE_END fails. Pass it an opened TFileStream and it will return the size as an Int64:
function Hacky_GetStreamSize(F: TFileStream): Int64;
var Step:DWORD;
StartPos: Int64;
StartPos_DWORD: packed array [0..1] of DWORD absolute StartPos;
KnownGoodPosition: Int64;
KGP_DWORD: packed array [0..1] of DWORD absolute KnownGoodPosition;
Dummy:DWORD;
Block:array[0..512-1] of Byte;
begin
// Get starting pointer position
StartPos := 0;
StartPos_DWORD[0] := SetFilePointer(F.Handle, 0, @StartPos_DWORD[1], FILE_CURRENT);
try
// Move file pointer to the first byte
SetFilePointer(F.Handle, 0, nil, FILE_BEGIN);
// Init
KnownGoodPosition := 0;
Step := 1024 * 1024 * 1024; // Initial step will be 1Gb
while Step > 512 do
begin
// Try to move
Dummy := 0;
SetFilePointer(F.Handle, Step, @Dummy, FILE_CURRENT);
// Test: Try to read!
if F.Read(Block, 512) = 512 then
begin
// Ok! Save the last known good position
KGP_DWORD[1] := 0;
KGP_DWORD[0] := SetFilePointer(F.Handle, 0, @KGP_DWORD[1], FILE_CURRENT);
end
else
begin
// Read failed! Move back to the last known good position and make Step smaller
SetFilePointer(F.Handle, KGP_DWORD[0], @KGP_DWORD[1], FILE_BEGIN);
Step := Step div 4; // it's optimal to divide by 4
end;
end;
// From here on we'll use 512 byte steps until we can't read any more
SetFilePointer(F.Handle, KGP_DWORD[0], @KGP_DWORD[1], FILE_BEGIN);
while F.Read(Block, 512) = 512 do
KnownGoodPosition := KnownGoodPosition + 512;
// Done!
Result := KnownGoodPosition;
finally
// Move file pointer back to starting position
SetFilePointer(F.Handle, StartPos_DWORD[0], @StartPos_DWORD[1], FILE_BEGIN);
end;
end;
To be complete, here are two routines that may be used to set and get the file pointer using Int64 for positioning:
function Hacky_SetStreamPos(F: TFileStream; Pos: Int64):Int64;
var aPos:Int64;
DWA:packed array[0..1] of DWORD absolute aPos;
const INVALID_SET_FILE_POINTER = $FFFFFFFF;
begin
aPos := Pos;
DWA[0] := SetFilePointer(F.Handle, DWA[0], @DWA[1], FILE_BEGIN);
if (DWA[0] = INVALID_SET_FILE_POINTER) and (GetLastError <> NO_ERROR) then
RaiseLastOSError;
Result := aPos;
end;
function Hacky_GetStreamPos(F: TFileStream): Int64;
var Pos:Int64;
DWA:packed array[0..1] of DWORD absolute Pos;
begin
Pos := 0;
DWA[0] := SetFilePointer(F.Handle, 0, @DWA[1], FILE_CURRENT);
Result := Pos;
end;
Last notes
The three routines I'm providing take a TFileStream as a parameter, because that's what I use for file reading and writing. They obviously only use TFileStream.Handle, so the parameter can simply be replaced with a file handle: the functionality would stay the same.
I know this thread is old, but...
One small suggestion - if you use the Windows DeviceIoControl(...) function you can get Drive Geometry and/or Partition Information, and use them to get the total size/length of the opened drive or partition. No more messing around with incrementally seeking to the end of the device.
Those IOCTLs can also be used to give you the correct volume sector size, and you could use that instead of defaulting to 512 everywhere.
Very, very useful. But I had a problem with disks greater than 4 GB.
I solved it by replacing:
// Ok! Save the last known good position
KGP_DWORD[1] := 0;
KGP_DWORD[0] := SetFilePointer(F.Handle, 0, @KGP_DWORD[1], FILE_CURRENT);
with the following:
// Ok! Save the last known good position
KnownGoodPosition := KnownGoodPosition + Step;
Many thanks again...
And many thanks also to James R. Twine. I followed the advice of using IOCTL_DISK_GET_DRIVE_GEOMETRY_EX and got disk dimension with no problem and no strange workaround.
Here is the code:
type
TDISK_GEOMETRY = record
Cylinders : Int64; //LargeInteger
MediaType : DWORD; //MEDIA_TYPE
TracksPerCylinder: DWORD ;
SectorsPerTrack: DWORD ;
BytesPerSector : DWORD ;
end;
TDISK_GEOMETRY_EX = record
Geometry: TDISK_GEOMETRY ;
DiskSize: Int64; //LARGE_INTEGER ;
Data : array[1..1000] of byte; // unknown length
end;
function get_disk_size(handle: thandle): int64;
var
BytesReturned: DWORD;
DISK_GEOMETRY_EX : TDISK_GEOMETRY_EX;
begin
result := 0;
if DeviceIOControl(handle,IOCTL_DISK_GET_DRIVE_GEOMETRY_EX,
nil,0,@DISK_GEOMETRY_EX, sizeof(TDISK_GEOMETRY_EX),BytesReturned,nil)
then result := DISK_GEOMETRY_EX.DiskSize;
end;