TFileStream read huge files piece by piece - delphi

Earlier today I opened a question here asking whether my method of scanning files on the computer was correct. Among the tips I received as answers, the one that made me think "this needs to be fixed urgently!" was about memory overflow, since I was reading the files entirely into memory. So I started trying to find a way to read the files piece by piece, and what I have so far is wrong/bogus, so I need some help figuring out how to do it correctly.
The method is as simple as this for now:
procedure ScanFile(FileName: string);
const
  MAX_SIZE = 100*1024*1024;
var
  i, aux, ReadLimit: integer;
  MyFile: TFileStream;
  Target: AnsiString;
  PlainText: String;
  Buff: array of byte;
  TotalSize: Int64;
begin
  if (Pos('.exe', FileName) = 0) and (Pos('.dll', FileName) = 0) and
     (Pos('.sys', FileName) = 0) then //yeah I know it's not the best way...
  begin
    try
      MyFile := TFileStream.Create(FileName, fmOpenRead);
    except
      on E: EFOpenError do
        MyFile := nil;
    end;
    if MyFile <> nil then
    try
      TotalSize := MyFile.Size;
      while TotalSize > 0 do
      begin
        ReadLimit := Min(TotalSize, MAX_SIZE);
        SetLength(Buff, ReadLimit);
        MyFile.ReadBuffer(Buff[0], ReadLimit);
        PlainText := RemoveNulls(Buff); //this is to transform the array of bytes in string, I posted the code below too...
        for i := 1 to Length(PlainText) do
        begin //Begin the search..
        end;
        Dec(TotalSize, ReadLimit);
      end;
    finally
      MyFile.Free;
    end;
  end;
end;
Code for RemoveNulls is:
function RemoveNulls(const Buff: array of byte): String;
var
  i: integer;
begin
  for i := 0 to Length(Buff) do
  begin
    if Buff[i] <> 0 then
      Result := Result + Chr(Ord(Buff[i]));
  end;
end;
OK, the problems I have had with this code so far are:
1. Each time the while loop repeats, more memory is consumed, when I was expecting it to use at most the 100 MB defined by MAX_SIZE. Right?
2. I created a file with 2 occurrences of what should be matched, and for some unknown reason I got about 10 repeated hits; it looks like I'm scanning the same content over and over.
I'd appreciate your help, and if someone already has this kind of code done, please post it here; I don't intend to reinvent the wheel...

I'd say that RemoveNulls is your problem. Suppose that you just read 100 MB into a buffer that you passed to RemoveNulls. You would then allocate a string of length 1. Then reallocate to length 2. Then to length 3. Then to length 4. And so on, all the way to length 100*1024*1024.
That process will fragment your memory, as well as being appallingly slow. Heap allocation is to be avoided when performance matters. You've no need for it at all. Read a chunk of the file, and search directly in the buffer that you read.
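To make that concrete, here is a minimal sketch of the chunked approach (the function name and the naive byte-by-byte comparison are only illustrative, and it deliberately ignores a match that straddles two chunks):
function CountOccurrences(const FileName: string; const Target: AnsiString): Integer;
const
  CHUNK_SIZE = 100 * 1024 * 1024;
var
  MyFile: TFileStream;
  Buff: array of Byte;
  BytesRead, i, j: Integer;
begin
  Result := 0;
  MyFile := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buff, CHUNK_SIZE); // one allocation, reused for every chunk
    repeat
      BytesRead := MyFile.Read(Buff[0], CHUNK_SIZE);
      for i := 0 to BytesRead - Length(Target) do
      begin
        j := 0;
        while (j < Length(Target)) and (Buff[i + j] = Ord(Target[j + 1])) do
          Inc(j);
        if j = Length(Target) then
          Inc(Result); // match found at this offset
      end;
    until BytesRead = 0;
  finally
    MyFile.Free;
  end;
end;
The point is that the only heap allocation is the single SetLength before the loop; the search itself touches nothing but the bytes already in Buff.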
There are various problems with your code that I can see:
Your file extension check is broken, as I described in your previous question.
You are not handling exceptions correctly, as I described in your previous question.
Your for loop in RemoveNulls has a buffer overrun. Loop from low() to high().
It's not possible to comment on the search code since that's not present in the question.

Related

Delphi Getting Stream read error on really simple program

I am experiencing a strange error which I have narrowed down to a small piece of code (see below). I am reading a binary file into a TMemoryStream (variable ms1) through a TFileStream object (variable fs). Then I want to copy the binary data to another TMemoryStream object (ms2). This is what gives me the "stream read error" exception. The strange thing is that if I don't load the ms1 object with the file contents, things work fine, i.e. ms2.CopyFrom does not give me an exception.
Any help is greatly appreciated....
procedure TForm5.BitBtn1Click(Sender: TObject);
var
  ms1: TMemoryStream;
  fs: TFileStream;
  ms2: TMemoryStream;
  FilePath: string;
begin
  FilePath := 'C:\weekcpdf_tech6.bin';
  ms1 := TMemoryStream.Create;
  fs := nil;
  try
    ms1 := TMemoryStream.Create;
    fs := TFileStream.Create(FilePath, fmOpenRead);
    ms1.CopyFrom(fs, fs.Size);
    ms2 := TMemoryStream.Create;
    ms2.CopyFrom(ms1, ms1.Size);
  finally
    FreeAndNil(fs);
    FreeAndNil(ms1);
  end;
end;
First, you have a memory leak, since you create two memory stream objects that you assign to ms1. Remove the second one.
Second, after ms1.CopyFrom(fs, fs.Size); you must set ms1.Position := 0 so that you copy from the start of ms1 -- and not from the end of it.
This is actually fairly well documented:
CopyFrom copies Count bytes from the stream specified by Source into the stream. It then moves the current position by Count bytes and returns the number of bytes copied.
Hence, after ms1.CopyFrom(fs, fs.Size); you are at the end of ms1. Further,
If Count is greater than or less than 0, CopyFrom reads from the current position in Source.
Therefore, ms2.CopyFrom(ms1, ms1.Size); will read from the current position (= the end!) of ms1. Hence, you will likely try to read a lot of bytes that do not exist. And what happens then?
Because the CopyFrom method uses ReadBuffer and WriteBuffer to do the effective copying, if the Count is greater than the SourceStream size, ReadBuffer throws an exception stating that a stream read error has occured [sic!].
Always read the docs! :)
(Although I must admit that the last quote isn't quite perfect in this situation.)
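Putting both fixes together, the handler from the question could look something like this (a sketch; ms2 is also freed here, which the original snippet never did):
procedure TForm5.BitBtn1Click(Sender: TObject);
var
  ms1, ms2: TMemoryStream;
  fs: TFileStream;
begin
  ms1 := nil;
  ms2 := nil;
  fs := nil;
  try
    fs := TFileStream.Create('C:\weekcpdf_tech6.bin', fmOpenRead);
    ms1 := TMemoryStream.Create;
    ms1.CopyFrom(fs, fs.Size);
    ms1.Position := 0; // rewind before copying out of ms1
    ms2 := TMemoryStream.Create;
    ms2.CopyFrom(ms1, ms1.Size);
  finally
    ms2.Free;
    ms1.Free;
    fs.Free;
  end;
end;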

Copying file fails when open in fmOpenReadWrite mode

I am working on a little byte-patching program, but I have encountered an error.
Copying the file before modifying it fails silently (no copied output is seen), but the file patches successfully.
Here is the patch code:
procedure DoMyPatch();
var
  i: integer;
  FileName: string;
  input: TFileStream;
  FileByteArray, ExtractedByteArray: array of Byte;
begin
  FileName := 'Cute1.res';
  try
    input := TFileStream.Create(FileName, fmOpenReadWrite);
  except
    begin
      ShowMessage('Error Opening file');
      Exit;
    end
  end;
  input.Position := 0;
  SetLength(FileByteArray, input.Size);
  input.Read(FileByteArray[0], Length(FileByteArray));
  for i := 0 to Length(FileByteArray) do
  begin
    SetLength(ExtractedByteArray, Length(OriginalByte));
    ExtractedByteArray := Copy(FileByteArray, i, Length(OriginalByte));
    // function that compares my array of bytes
    if CompareByteArrays(ExtractedByteArray, OriginalByte) = True then
    begin
      // Begin Patching
      CopyFile(PChar(FileName), PChar(ChangeFileExt(FileName, '.BAK')),
        true); // =======>>> fails at this point, no copied output is seen.
      input.Seek(i, soFromBeginning);
      input.Write(BytetoWrite[0], Length(BytetoWrite)); // =====>>> patches successfully
      input.Free;
      ShowMessage('Patch Success');
      Exit;
    end;
  end;
  if Assigned(input) then
  begin
    input.Free;
  end;
  ShowMessage('Patch Failed');
end;
Side note: it copies fine if I close the file stream before attempting the copy.
By the way, I have tested it on Delphi 7 and XE7.
Thanks
You cannot copy the file because you locked it exclusively when you opened it for the file stream, which is why CopyFile fails.
You should close the file before attempting to call CopyFile, which would require you to reopen the file to patch it. Or perhaps open the file with a different sharing mode.
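One way to arrange that is to make the backup before the exclusive open ever happens. A minimal sketch (the procedure name and parameter are invented for illustration; it assumes Windows, SysUtils and Classes are in the uses clause, and the actual scan-and-patch loop is left out):
procedure DoMyPatchWithBackup(const FileName: string);
var
  input: TFileStream;
begin
  // Make the backup while nothing holds an exclusive lock on the file.
  // True = fail if a .BAK already exists, matching the original call.
  Win32Check(CopyFile(PChar(FileName),
    PChar(ChangeFileExt(FileName, '.BAK')), True));
  input := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite);
  try
    // ... search for OriginalByte and write BytetoWrite here, as in the question ...
  finally
    input.Free;
  end;
end;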
Some other comments:
The exception handling is badly implemented. Don't catch exceptions here. Let them float up to the high level.
Lifetime management is fluffed. You can easily leak as it stands. You need to learn about try/finally.
You overrun buffers. Valid indices for a dynamic array are 0 to Length(arr)-1 inclusive. Or use low() and high().
You don't check the value returned by CopyFile. Wrap it with a call to Win32Check.
The Copy function returns a new array. So you make a spurious call to SetLength. To copy the entire array use the one parameter overload of Copy.
Showing messages in this function is probably a mistake. Better to let the caller provide user feedback.
There are loads of other oddities in the code and I've run out of energy to point them all out. I think I got the main ones.

block read error

Can anybody please explain why I am hitting 'I/O error 998' in the block read below?
function ReadBiggerFile: string;
var
  biggerfile: file of char;
  BufArray: array [1 .. 4096] of char; // we will read 4 KB at a time
  nrcit, i: integer;
  sir, path: string;
begin
  path := ExtractFilePath(application.exename);
  assignfile(biggerfile, path + 'asd.txt');
  reset(biggerfile);
  repeat
    blockread(biggerfile, BufArray, SizeOf(BufArray), nrcit);
    for i := 1 to nrcit do
    begin
      sir := sir + BufArray[i];
      Form4.Memo1.Lines.Add(sir);
    end;
  until (nrcit = 0);
  closefile(biggerfile);
  ReadBiggerFile := sir;
end;
I think you mis-tagged the question and you're using Delphi 2009+, not Delphi 7. I got the error from the title trying your exact code on Delphi 2010 (Unicode Delphi). When you say:
var biggerfile: file of Char;
You're declaring biggerfile to be a file of "records", where each record is a Char. On Unicode Delphi that's 2 bytes. You later request to read SizeOf(BufArray) records, not bytes. That is, you request 4096 x 2 = 8192 records. But your buffer is only 4096 records long, so you get a weird error.
I was able to fix your code by simply replacing Char with AnsiChar, since AnsiChar has a size of 1, hence the SizeOf() equals Length().
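For reference, this is the question's function with only that substitution applied (plus two comments marking the changed declarations):
function ReadBiggerFile: string;
var
  biggerfile: file of AnsiChar; // 1-byte records now
  BufArray: array [1 .. 4096] of AnsiChar; // SizeOf(BufArray) = 4096 = record count
  nrcit, i: integer;
  sir, path: string;
begin
  path := ExtractFilePath(application.exename);
  assignfile(biggerfile, path + 'asd.txt');
  reset(biggerfile);
  repeat
    blockread(biggerfile, BufArray, SizeOf(BufArray), nrcit);
    for i := 1 to nrcit do
    begin
      sir := sir + BufArray[i];
      Form4.Memo1.Lines.Add(sir);
    end;
  until (nrcit = 0);
  closefile(biggerfile);
  ReadBiggerFile := sir;
end;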
The permanent fix should involve moving from the very old Pascal-style file operations to something modern, TStream based. I'm not sure exactly what you're trying to obtain, but if you simply want to get the content of the file in a string, may I suggest something like this:
function ReadBiggerFile: AnsiString;
var
  biggerfile: TFileStream;
begin
  biggerfile := TFileStream.Create('C:\Users\Cosmin Prund\Downloads\AppWaveInstall201_385.exe',
    fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, biggerfile.Size);
    biggerfile.Read(Result[1], biggerfile.Size);
  finally
    biggerfile.Free;
  end;
end;
I had the same issue, and I simply passed it the first element of the buffer, which is the starting point of the memory block, like so:
AssignFile(BinFile, binFileName);
Reset(BinFile, SizeOf(Double));
Aux := Length(numberArray);
BlockRead(BinFile, numberArray[0], Aux, numRead);
CloseFile(BinFile);

Is Valid IMAGE_DOS_SIGNATURE

I want to check whether a file has a valid IMAGE_DOS_SIGNATURE (MZ):
function isMZ(FileName: String): boolean;
var
  Signature: Word;
  fexe: TFileStream;
begin
  result := false;
  try
    fexe := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
    fexe.ReadBuffer(Signature, SizeOf(Signature));
    if Signature = $5A4D { 'MZ' } then
      result := true;
  finally
    fexe.Free;
  end;
end;
I know I can use some code from the Windows unit to check the IMAGE_DOS_SIGNATURE. The thing is, I want the fastest way to check the IMAGE_DOS_SIGNATURE (for a big file). I'd like some suggestions about my code, or perhaps better code.
Thanks
The size of the file doesn't matter because your code only reads the first two bytes.
Any overhead from allocating and using a TFileStream, which goes through SysUtils.FileRead before reaching the Win32 ReadFile, ought to be all but invisible noise compared to the cost of seeking in the only situation where it should matter: when you're scanning through hundreds of executables.
There might possibly be some benefit in tweaking Windows' caching by using the raw WinAPI, but I would expect it to be very marginal.
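For the curious, this is roughly what such a raw-API version could look like (a sketch only; the function name is made up, FILE_FLAG_SEQUENTIAL_SCAN is the kind of caching hint meant above, and IMAGE_DOS_SIGNATURE is the $5A4D constant from the Windows unit):
function isMZRaw(const FileName: string): Boolean;
var
  h: THandle;
  Signature: Word;
  BytesRead: DWORD;
begin
  Result := False;
  h := CreateFile(PChar(FileName), GENERIC_READ,
    FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING,
    FILE_FLAG_SEQUENTIAL_SCAN, 0);
  if h <> INVALID_HANDLE_VALUE then
  try
    if ReadFile(h, Signature, SizeOf(Signature), BytesRead, nil) and
       (BytesRead = SizeOf(Signature)) then
      Result := Signature = IMAGE_DOS_SIGNATURE;
  finally
    CloseHandle(h);
  end;
end;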

Fast read/write from file in delphi

I am loading a file into an array in binary form, and this seems to take a while.
Is there a better, faster, more efficient way to do this?
I am using a similar method for writing back to the file.
procedure openfile(fname: string);
var
  myfile: file;
  filesizevalue, i: integer;
begin
  assignfile(myfile, fname);
  filesizevalue := GetFileSize(fname); //my method
  SetLength(dataarray, filesizevalue);
  i := 0;
  Reset(myFile, 1);
  while not Eof(myFile) do
  begin
    BlockRead(myfile, dataarray[i], 1);
    i := i + 1;
  end;
  CloseFile(myfile);
end;
If you really want to read a binary file fast, let Windows worry about the buffering ;-) by using memory-mapped files. Using this you can simply map a file to a memory location and read it as if it were an array.
Your function would become:
procedure openfile(fname: string);
var
  InputFile: TMappedFile;
begin
  InputFile := TMappedFile.Create;
  try
    InputFile.MapFile(fname);
    SetLength(dataarray, InputFile.Size);
    Move(PByteArray(InputFile.Content)[0], dataarray[0], InputFile.Size);
  finally
    InputFile.Free;
  end;
end;
But I would suggest not using the global variable dataarray; either pass it as a var parameter, or use a function that returns the resulting array.
procedure ReadBytesFromFile(const AFileName: String; var ADestination: TByteArray);
var
  InputFile: TMappedFile;
begin
  InputFile := TMappedFile.Create;
  try
    InputFile.MapFile(AFileName);
    SetLength(ADestination, InputFile.Size);
    Move(PByteArray(InputFile.Content)[0], ADestination[0], InputFile.Size);
  finally
    InputFile.Free;
  end;
end;
The TMappedFile is from my article Fast reading of files using Memory Mapping, this article also contains an example of how to use it for more "advanced" binary files.
You generally shouldn't read files byte by byte. Use BlockRead with a larger value (512 or 1024 bytes often work well) and use its return value to find out how many bytes were read.
If the size isn't too large (and your use of SetLength seems to support this), you can also use one BlockRead call to read the complete file at once. So, modifying your approach, this would be:
AssignFile(myfile,fname);
filesizevalue := GetFileSize(fname);
Reset(myFile, 1);
SetLength(dataarray, filesizevalue);
BlockRead(myFile, dataarray[0], filesizevalue);
CloseFile(myfile);
Perhaps you could also change the procedure to a boolean function named OpenAndReadFile and return false if the file couldn't be opened or read.
It depends on the file format. If it consists of several identical records, you can decide to create a file of that record type.
For example:
type
  TMyRecord = record
    fieldA: integer;
    // ...
  end;
  TMyFile = file of TMyRecord;
const
  cRecsPerRead = 100; // BlockRead on a typed file counts records, not bytes
var
  f: TMyFile; // "file" is a reserved word, so the variable needs another name
  i, recsRead: Integer;
begin
  AssignFile(f, filename);
  Reset(f);
  i := 0;
  try
    while not Eof(f) do
    begin
      BlockRead(f, dataarray[i], cRecsPerRead, recsRead);
      Inc(i, recsRead);
    end;
  finally
    CloseFile(f);
  end;
end;
If it's a long enough file that reading it this way takes a noticeable amount of time, I'd use a stream instead. The block read will be a lot faster, and there are no loops to worry about. Something like this:
procedure openfile(fname: string);
var
  myfile: TFileStream;
  filesizevalue: integer;
begin
  filesizevalue := GetFileSize(fname); //my method
  SetLength(dataarray, filesizevalue);
  myFile := TFileStream.Create(fname, fmOpenRead);
  try
    myFile.Seek(0, soFromBeginning);
    myFile.ReadBuffer(dataarray[0], filesizevalue);
  finally
    myFile.Free;
  end;
end;
It appears from your code that your record size is 1 byte long. If not, then change the read line to:
myFile.ReadBuffer(dataarray[0], filesizevalue * SIZE);
or something similar.
Look for a buffered TStream descendant. It will make your code a lot faster: the disk reads are done in large chunks, but you can still loop through the buffer easily. There are various ones around, or you can write your own.
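A rough sketch of the idea with plain manual buffering (not a full TStream descendant; the buffer size and names are just illustrative):
procedure ProcessFileBuffered(const fname: string);
const
  BUF_SIZE = 64 * 1024;
var
  fs: TFileStream;
  Buf: array[0..BUF_SIZE - 1] of Byte;
  BytesRead, i: Integer;
begin
  fs := TFileStream.Create(fname, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      BytesRead := fs.Read(Buf, BUF_SIZE);
      for i := 0 to BytesRead - 1 do
      begin
        // process Buf[i] here, entirely in memory
      end;
    until BytesRead = 0;
  finally
    fs.Free;
  end;
end;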
If you're feeling very adventurous, you can bypass Win32 altogether and call the NT Native API function ZwOpenFile(), which in my informal testing does shave a tiny bit off. Otherwise, I'd use Davy's memory-mapped file solution above.
