I wonder why this code does not catch a 'Disk Full' error like it should.
This is important because the user may lose their data if they do not notice that the saving failed.
I don't get this...
procedure TForm2.Button1Click(Sender: TObject);
var
  Writer: TStreamWriter;
  n: integer;
begin
  Writer := TStreamWriter.Create('MyUTF8Text.txt', false, TEncoding.UTF8);
  Try //Finally
    Try //Except
      for n := 1 to 1000 do
      begin
        Writer.WriteLine('Testing text writing to the UTF-8 file.');
      end;
    Except
      on E: Exception do
      begin
        ShowMessage('Exception Class name: ' + E.ClassName);
        ShowMessage('Exception Message: ' + E.Message);
      end;
    end; // except
  Finally
    Writer.Free();
  End; //finally
end;
"BTW. 'Writer might not be initialized' warning, is it serious really?"
Edit: that warning appeared because TStreamWriter.Create was after the TRY.
Thanks for your advice; I moved that line of code to the correct location, before(!) the TRY.
Try This:
procedure WriteStream;
var
  Writer: TStreamWriter;
  fs: TFileStream;
  s: String;
  n, bytesWritten: integer;
begin
  bytesWritten := 0;
  fs := TFileStream.Create('MyUTF8Text.txt', fmCreate);
  try
    // avoid the warning by initializing before the try
    Writer := TStreamWriter.Create(fs);
    try //Finally
      try //Except
        for n := 1 to 10 do
        begin
          s := 'Testing text writing to the UTF-8 file.' + #13#10;
          // keep count of bytes written
          bytesWritten := bytesWritten + TEncoding.UTF8.GetByteCount(s);
          Writer.Write(s);
        end;
        Writer.Flush;
        Writer.Close;
        // check the stream size to make sure all bytes were written
        if bytesWritten <> Writer.BaseStream.Size then
          raise Exception.Create(String.Format('Expected %d bytes, wrote %d',
            [bytesWritten, Writer.BaseStream.Size]));
      except
        on E: Exception do
        begin
          ShowMessage('Exception Class name: ' + E.ClassName);
          ShowMessage('Exception Message: ' + E.Message);
        end;
      end; // except
    finally
      Writer.Free; // Will only free if it has been constructed
    end; //finally
  finally
    fs.Free;
  end;
end;
When I examine more than 9000 files with the following Delphi code, I get a stream read error.
Please help me fix this error.
I use Delphi 10.2 Tokyo, and a MacBook with 8 GB RAM and an i5 CPU.
I have tested FreeAndNil, and also Stream.Free and MemStr.Free. I don't have more info.
// detect image type
var
Form1: TForm1;
JPG_HEADER: array[0..2] of byte = ($FF, $D8, $FF);
GIF_HEADER: array[0..2] of byte = ($47, $49, $46);
BMP_HEADER: array[0..1] of byte = ($42, $4D);
PNG_HEADER: array[0..3] of byte = ($89, $50, $4E, $47);
TIF_HEADER: array[0..2] of byte = ($49, $49, $2A);
TIF_HEADER2: array[0..2] of byte = (77, 77, 00);
type
TImageType = (ifUnknown, ifJPG, ifGIF, ifBMP, ifPNG, ifTIF);
implementation
{$R *.dfm}
function TypeToStr(ImageType: TImageType): String;
begin
case ImageType of
ifJPG: Result := 'Image/JPEG';
ifGIF: Result := 'Image/GIF';
ifPNG: Result := 'Image/PNG';
ifBMP: Result := 'Image/BMP';
ifTIF: Result := 'Image/TIFF';
else
Result := 'Unknown Type';
end;
end;
function GetImageType(FileName: String): TImageType;
var
Stream: TFileStream;
MemStr: TMemoryStream;
buf: integer;
tmp: string;
begin
Result := ifUnknown;
Stream := TFileStream.Create(FileName, fmOpenRead);
MemStr := TMemoryStream.Create;
try
MemStr.CopyFrom(Stream, 5);
if MemStr.Size > 4 then
begin
// uncomment these lines to detect "unknown types"
// MemStr.Position:=0;
// MemStr.Read(buf,1);
// showmessage(inttostr(ord(buf)));
// MemStr.Read(buf,1);
// showmessage(inttostr(ord(buf)));
// MemStr.Read(buf,1);
// showmessage(inttostr(ord(buf)));
if CompareMem(MemStr.Memory, @JPG_HEADER, SizeOf(JPG_HEADER)) then
Result := ifJPG
else if CompareMem(MemStr.Memory, @GIF_HEADER, SizeOf(GIF_HEADER)) then
Result := ifGIF
else if CompareMem(MemStr.Memory, @PNG_HEADER, SizeOf(PNG_HEADER)) then
Result := ifPNG
else if CompareMem(MemStr.Memory, @BMP_HEADER, SizeOf(BMP_HEADER)) then
Result := ifBMP
else if CompareMem(MemStr.Memory, @TIF_HEADER, SizeOf(TIF_HEADER)) then
Result := ifTIF
else if CompareMem(MemStr.Memory, @TIF_HEADER2, SizeOf(TIF_HEADER2)) then
Result := ifTIF;
end;
finally
Stream.Free;
MemStr.Free;
end;
end;
// Run code
procedure TForm1.Button1Click(Sender: TObject);
var
FileName: String;
it: TImageType;
begin
if OpenDialog1.Execute then
begin
FileName := OpenDialog1.FileName;
it := GetImageType(FileName);
Label1.Caption := TypeToStr(it);
end;
end;
You are asking the TStream.CopyFrom() method to read exactly 5 bytes. Internally, it uses the TStream.ReadBuffer() method, which raises a stream error if the exact number of bytes requested is not read. For instance, if you try to read from a file that is less than 5 bytes in size.
In comments, you show that you have a loop that calls GetFileSize() before calling GetImageType(). But that loop is checking the file size for <> 0 when it should be checking for >= 5 instead. Your TForm1.Button1Click() method is not checking GetFileSize() at all before calling GetImageType().
That being said, in GetImageType(), you don't need the TMemoryStream at all. Use a local array of bytes instead, and call the TFileStream.Read() method (not ReadBuffer()!) to populate it. The return value tells you the actual number of bytes read. Use that size when checking your image signatures. You don't need GetFileSize() at all (which, BTW, is easier to implement using SysUtils.FindFirst() than by actually opening the file and querying its size). GetImageType() should simply return ifUnknown if the requested file cannot be accessed (wrap the TFileStream.Create in a try/except) or is too small.
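(As an aside, if you do want a size check somewhere else, here is a small sketch of the FindFirst()-based helper mentioned above. The helper name QueryFileSize is mine, not part of this answer.)
// uses System.SysUtils
function QueryFileSize(const FileName: string): Int64;
var
  SR: TSearchRec;
begin
  if FindFirst(FileName, faAnyFile, SR) = 0 then
  try
    Result := SR.Size;   // size reported by the directory entry, no file open needed
  finally
    FindClose(SR);
  end
  else
    Result := -1;        // not found / not accessible
end;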
Try this:
type
TImageType = (ifUnknown, ifJPG, ifGIF, ifBMP, ifPNG, ifTIF);
...
function TypeToStr(ImageType: TImageType): String;
begin
case ImageType of
ifJPG: Result := 'Image/JPEG';
ifGIF: Result := 'Image/GIF';
ifPNG: Result := 'Image/PNG';
ifBMP: Result := 'Image/BMP';
ifTIF: Result := 'Image/TIFF';
else
Result := 'Unknown Type';
end;
end;
function GetImageType(FileName: String): TImageType;
const
JPG_HEADER: array[0..2] of byte = ($FF, $D8, $FF);
GIF_HEADER: array[0..2] of byte = ($47, $49, $46);
BMP_HEADER: array[0..1] of byte = ($42, $4D);
PNG_HEADER: array[0..3] of byte = ($89, $50, $4E, $47);
TIF_HEADER: array[0..2] of byte = ($49, $49, $2A);
TIF_HEADER2: array[0..2] of byte = ($4D, $4D, $00);
var
Stream: TFileStream;
buf: array[0..3] of Byte;
bufsize: Integer;
function MatchesSignature(const signature; signatureSize: Integer): Boolean;
begin
Result := (bufsize >= signatureSize) and CompareMem(@buf, @signature, signatureSize);
end;
begin
Result := ifUnknown;
try
Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
try
bufsize := Stream.Read(buf, SizeOf(buf));
if bufsize < 2 then Exit;
finally
Stream.Free;
end;
except
Exit;
end;
// uncomment these lines to detect "unknown types"
// ShowMessage(IntToStr(buf[0]));
// Showmessage(IntToStr(buf[1]));
// Showmessage(IntToStr(buf[2]));
if MatchesSignature(JPG_HEADER, SizeOf(JPG_HEADER)) then
Result := ifJPG
else if MatchesSignature(GIF_HEADER, SizeOf(GIF_HEADER)) then
Result := ifGIF
else if MatchesSignature(PNG_HEADER, SizeOf(PNG_HEADER)) then
Result := ifPNG
else if MatchesSignature(BMP_HEADER, SizeOf(BMP_HEADER)) then
Result := ifBMP
else if MatchesSignature(TIF_HEADER, SizeOf(TIF_HEADER)) then
Result := ifTIF
else if MatchesSignature(TIF_HEADER2, SizeOf(TIF_HEADER2)) then
Result := ifTIF;
end;
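For the 9000-file scenario, here is a hypothetical call site (not from the answer) that enumerates a folder with SysUtils.FindFirst/FindNext and classifies each entry; the Folder parameter and the TStrings output are illustrative only:
// uses System.SysUtils, System.Classes
procedure ClassifyFolder(const Folder: string; Output: TStrings);
var
  SR: TSearchRec;
  Path: string;
begin
  Path := IncludeTrailingPathDelimiter(Folder);
  // TSearchRec also reports each file's size, but GetImageType above already
  // copes with files that are too small, so no extra size check is needed here
  if FindFirst(Path + '*.*', faAnyFile, SR) = 0 then
  try
    repeat
      if (SR.Attr and faDirectory) = 0 then
        Output.Add(SR.Name + ': ' + TypeToStr(GetImageType(Path + SR.Name)));
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
end;
A form could call it as, for example, ClassifyFolder('C:\Images', Memo1.Lines);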
You perform MemStr.CopyFrom(Stream, 5); without checking whether the stream size is large enough.
So opening an empty or too-short file gives that error: the stream error arises from reading beyond the end of the file.
Check Stream.Size before copying.
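For illustration, a minimal sketch of that guard (the helper name is hypothetical; the 5-byte count matches the question's CopyFrom call):
// uses System.Classes
function TryCopySignature(Source: TStream; Dest: TMemoryStream): Boolean;
begin
  // CopyFrom uses ReadBuffer internally, which raises a stream read error
  // when fewer than 5 bytes are available, so check the size first
  Result := Source.Size >= 5;
  if Result then
  begin
    Source.Position := 0;
    Dest.CopyFrom(Source, 5);
  end;
end;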
I am trying to compile the Magenta packet capturing units to 64 bit.
https://www.magsys.co.uk/delphi/magmonsock.asp
When I compile, I get the dreaded 'Left side cannot be assigned to' error on the following line of code:
LongWord(bp) := LongWord(bp) + BPF_WORDALIGN(caplen + hdrlen);
This is because the LongWord cast has a different size than a pointer in 64-bit, right? Can anyone help me properly fix the line so that it compiles happily in both the 32-bit and 64-bit versions? bp is declared as a Pointer; caplen and hdrlen are declared as Integers.
The BPF_WORDALIGN function is
function BPF_WORDALIGN(X:LongWord) : LongWord;
begin
result := (((X)+(BPF_ALIGNMENT-1))and not(BPF_ALIGNMENT-1));
end;
Thanks for any help to get this working.
Here is the full procedure with the faulty line, if that helps:
function pcap_read( p:PPcap;cnt:integer;CallBack:Tpcap_handler;User:pointer)
: integer;
var cc : Longword;//Counter ?
n : integer;
bp,ep: pointer; //Begin and End Point ?
//bhp : Pbpf_hdr;//pointer to BPF header struct - removed by Lars Peter
hdrlen, //Length of Header
caplen: integer;//Length of captured
begin
if NOT LoadPacketDll then
begin
p.errbuf := 'Cannot load packet.dll';
result:=-1;
exit;
end;
cc := p.cc;
n := 0;
if p.cc = 0 then
begin
// *Capture the Packets*
if PacketReceivePacket(p.adapter,p.packet,TRUE)=false then
begin
// ERROR!
p.errbuf :='Read Error: PacketRecievePacket failed';
result:=-1;
exit;
end;
cc := p.packet.ulBytesReceived;
bp := p.buffer;
end else bp := p.bp;
// Loop through each packet.
ep := ptr(longword(bp)+cc); //move end pointer
while (longword(bp) < longword(ep) ) do
begin
caplen := Pbpf_hdr(bp).bh_caplen;
hdrlen := Pbpf_hdr(bp).bh_hdrlen;
// XXX A bpf_hdr matches apcap_pkthdr.
callback(user,
Ppcap_pkthdr(bp),
ptr(longword(bp)+longword(HdrLen)));
LongWord(bp) := LongWord(bp) + BPF_WORDALIGN(caplen + hdrlen);
inc(n);
if (n >= cnt)and(cnt>0) then
begin
p.bp := bp;
p.cc := longword(ep)-longword(bp);
result := n;
exit;
end;
end;
p.cc := 0;
result:=n;
end;
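Not an answer from the thread, but for illustration: NativeUInt matches the pointer size on both 32-bit and 64-bit Delphi, so the failing line can be expressed through it. A sketch under that assumption (the helper name is mine):
// Advance an untyped pointer by a byte count without assuming 32-bit pointers.
function AdvancePointer(P: Pointer; Bytes: NativeUInt): Pointer; inline;
begin
  Result := Pointer(NativeUInt(P) + Bytes);
end;

// The faulty line would then read:
//   bp := AdvancePointer(bp, BPF_WORDALIGN(caplen + hdrlen));
// The other longword(bp)/longword(ep) casts and the ptr(longword(bp)+...) calls
// in pcap_read would likewise need NativeUInt so 64-bit addresses are not truncated.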
I need to read the last line in some very large text files (to get the timestamp from the data). TStringList would be a simple approach, but it returns an out-of-memory error. I'm trying to use Seek and BlockRead, but the characters in the buffer are all nonsense. Is this something to do with Unicode?
Function TForm1.ReadLastLine2(FileName: String): String;
var
FileHandle: File;
s,line: string;
ok: 0..1;
Buf: array[1..8] of Char;
k: longword;
i,ReadCount: integer;
begin
AssignFile (FileHandle,FileName);
Reset (FileHandle); // or for binary files: Reset (FileHandle,1);
ok := 0;
k := FileSize (FileHandle);
Seek (FileHandle, k-1);
s := '';
while ok<>1 do begin
BlockRead (FileHandle, buf, SizeOf(Buf)-1, ReadCount); //BlockRead ( var FileHandle : File; var Buffer; RecordCount : Integer {; var RecordsRead : Integer} ) ;
if ord (buf[1]) <>13 then //Arg to integer
s := s + buf[1]
else
ok := ok + 1;
k := k-1;
seek (FileHandle,k);
end;
CloseFile (FileHandle);
// Reverse the order in the line read
setlength (line,length(s));
for i:=1 to length(s) do
line[length(s) - i+1 ] := s[i];
Result := Line;
end;
Based on www.delphipages.com/forum/showthread.php?t=102965
The test file is a simple CSV I created in Excel (this is not the 100 MB file I ultimately need to read).
a,b,c,d,e,f,g,h,i,j,blank
A,B,C,D,E,F,G,H,I,J,blank
1,2,3,4,5,6,7,8,9,0,blank
Mary,had,a,little,lamb,His,fleece,was,white,as,snow
And,everywhere,that,Mary,went,The,lamb,was,sure,to,go
You really have to read the file in LARGE chunks from the tail to the head.
Since it is too large to fit in memory, reading it line by line from start to end would be very slow; with ReadLn, twice as slow.
You also have to be prepared for the last line either ending with an EOL or not.
Personally I would also account for three possible EOL sequences:
CR/LF aka #13#10=^M^J - DOS/Windows style
CR without LF - just #13=^M - Classic MacOS file
LF without CR - just #10=^J - UNIX style, including MacOS version 10
If you are sure your CSV files will only ever be generated by native Windows programs, it is safe to assume the full CR/LF is used. But if other Java programs, non-Windows platforms or mobile programs can be involved, I would be less sure. Of course, pure CR without LF is the least probable case of them all.
uses System.IOUtils, System.Math, System.Classes;
type FileChar = AnsiChar; FileString = AnsiString; // for non-Unicode files
// type FileChar = WideChar; FileString = UnicodeString;// for UTF16 and UCS-2 files
const FileCharSize = SizeOf(FileChar);
// somewhere later in the code add: Assert(FileCharSize = SizeOf(FileString[1]));
function ReadLastLine(const FileName: String): FileString; overload; forward;
const PageSize = 4*1024;
// the minimal read atom of most modern HDD and the memory allocation atom of Win32
// since the chances your file would have lines longer than 4Kb are very small - I would not increase it to several atoms.
function ReadLastLine(const Lines: TStringDynArray): FileString; overload;
var i: integer;
begin
Result := '';
i := High(Lines);
if i < Low(Lines) then exit; // empty array - empty file
Result := Lines[i];
if Result > '' then exit; // we got the line
Dec(i); // skip the empty ghost line, in case last line was CRLF-terminated
if i < Low(Lines) then exit; // that ghost was the only line in the empty file
Result := Lines[i];
end;
// scan for EOLs in not-yet-scanned part
function FindLastLine(buffer: TArray<FileChar>; const OldRead : Integer;
const LastChunk: Boolean; out Line: FileString): boolean;
var i, tailCRLF: integer; c: FileChar;
begin
Result := False;
if Length(Buffer) = 0 then exit;
i := High(Buffer);
tailCRLF := 0; // test for trailing CR/LF
if Buffer[i] = ^J then begin // LF - single, or after CR
Dec(i);
Inc(tailCRLF);
end;
if (i >= Low(Buffer)) and (Buffer[i] = ^M) then begin // CR, alone or before LF
Inc(tailCRLF);
end;
i := High(Buffer) - Max(OldRead, tailCRLF);
if i - Low(Buffer) < 0 then exit; // no new data to read - results would be like before
if OldRead > 0 then Inc(i); // the CR/LF pair could be sliced between new and previous buffer - so need to start a bit earlier
for i := i downto Low(Buffer) do begin
c := Buffer[i];
if (c=^J) or (c=^M) then begin // found EOL
SetString( Line, @Buffer[i+1], High(Buffer) - tailCRLF - i);
exit(True);
end;
end;
// we did not find non-terminating EOL in the buffer (except maybe trailing),
// now we should ask for more file content, if there is still left any
// or take the entire file (without trailing EOL if any)
if LastChunk then begin
SetString( Line, @Buffer[ Low(Buffer) ], Length(Buffer) - tailCRLF);
Result := true;
end;
end;
function ReadLastLine(const FileName: String): FileString; overload;
var Buffer, tmp: TArray<FileChar>;
// dynamic arrays - eases memory management and protect from stack corruption
FS: TFileStream; FSize, NewPos: Int64;
OldRead, NewLen : Integer; EndOfFile: boolean;
begin
Result := '';
FS := TFile.OpenRead(FileName);
try
FSize := FS.Size;
if FSize <= PageSize then begin // small file, we can be lazy!
FreeAndNil(FS); // free the handle and avoid double-free in finally
Result := ReadLastLine( TFile.ReadAllLines( FileName, TEncoding.ANSI ));
// or TEncoding.UTF16
// warning - TFile is not share-aware, if the file is being written to by another app
exit;
end;
SetLength( Buffer, PageSize div FileCharSize);
OldRead := 0;
repeat
NewPos := FSize - Length(Buffer)*FileCharSize;
EndOfFile := NewPos <= 0;
if NewPos < 0 then NewPos := 0;
FS.Position := NewPos;
FS.ReadBuffer( Buffer[Low(Buffer)], (Length(Buffer) - OldRead)*FileCharSize);
if FindLastLine(Buffer, OldRead, EndOfFile, Result) then
exit; // done !
tmp := Buffer; Buffer := nil; // flip-flop: preparing to broaden our mouth
OldRead := Length(tmp); // need not to re-scan the tail again and again when expanding our scanning range
NewLen := Min( 2*Length(tmp), FSize div FileCharSize );
SetLength(Buffer, NewLen); // this may trigger EOutOfMemory...
Move( tmp[Low(tmp)], Buffer[High(Buffer)-OldRead+1], OldRead*FileCharSize);
tmp := nil; // free old buffer
until EndOfFile;
finally
FS.Free;
end;
end;
PS. Note one extra special case: if you use Unicode chars (two-byte ones) and are given a file of odd length (3 bytes, 5 bytes, etc.), you would never be able to scan the leading single byte (half a WideChar). Maybe you should add an extra guard there, like Assert( 0 = FS.Size mod FileCharSize ).
PPS. As a rule of thumb you are better off keeping those functions out of the form class - because WHY mix them? In general you should separate concerns into small blocks. Reading a file has nothing to do with user interaction, so it is better offloaded to an extra UNIT. Then you can use the functions from that unit in one form or ten forms, in the main thread or in a multi-threaded application. Like LEGO parts - they give you flexibility by being small and separate.
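As a sketch of that suggestion (the unit name is illustrative, not from the answer), the reading code could live in its own unit and any form would merely call it:
unit FileTailUtils;

interface

function ReadLastLine(const FileName: String): AnsiString;

implementation

uses
  System.Classes, System.SysUtils; // whatever the moved implementation needs

function ReadLastLine(const FileName: String): AnsiString;
begin
  // move the chunked, tail-first implementation shown above into this unit;
  // nothing here touches the GUI, so it can be reused from any form or thread
  Result := '';
end;

end.
A form unit would then just add FileTailUtils to its uses clause and call ReadLastLine wherever needed.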
PPPS. Another approach here would be to use memory-mapped files (a rough sketch follows after the links). Google for MMF implementations for Delphi and for articles about the benefits and problems of the MMF approach. Personally I think rewriting the code above to use MMF would greatly simplify it, removing several "special cases" and the troublesome memory-copying flip-flop. OTOH it would demand that you be very strict with pointer arithmetic.
https://en.wikipedia.org/wiki/Memory-mapped_file
https://msdn.microsoft.com/en-us/library/ms810613.aspx
http://torry.net/quicksearchd.php?String=memory+map&Title=No
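To make the PPPS a bit more concrete, here is a rough, Windows-only sketch of the MMF idea (not from the answer); it assumes a single-byte encoding and a file smaller than 2 GB:
// uses Winapi.Windows, System.SysUtils
function LastLineViaMMF(const FileName: string): AnsiString;
var
  hFile, hMap: THandle;
  Size, i, TailEnd: Integer;
  P: PAnsiChar;
begin
  Result := '';
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := Integer(GetFileSize(hFile, nil)); // assumes file < 2 GB
    if Size <= 0 then
      Exit;
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      P := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0); // map the whole file
      if P = nil then
        RaiseLastOSError;
      try
        i := Size - 1;
        // ignore a terminating CR/LF so it is not read as an empty last line
        while (i >= 0) and ((P[i] = #10) or (P[i] = #13)) do
          Dec(i);
        TailEnd := i;
        // walk back to the previous line break (or the start of the file)
        while (i >= 0) and (P[i] <> #10) and (P[i] <> #13) do
          Dec(i);
        SetString(Result, P + i + 1, TailEnd - i);
      finally
        UnmapViewOfFile(P);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;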
Your Char type is two bytes, so that buffer is 16 bytes. Then with BlockRead you read SizeOf(Buf)-1 bytes into it, and you check whether the first two-byte Char is equal to #13.
The SizeOf(Buf)-1 is dodgy (where does that -1 come from?), and the rest is valid, but only if your input file is UTF-16.
Also, you read 8 characters (16 bytes) each time, but compare only one and then Seek again. That is not very logical either.
If your encoding is not UTF-16, I suggest you change the type of the buffer elements to AnsiChar and remove the -1.
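To illustrate the suggestion, here is a sketch of the question's routine with those changes applied, assuming a single-byte encoding (ANSI, or UTF-8 without multi-byte characters around the break). It still does one Seek per byte, so the chunked answers above scale much better:
function ReadLastLineAnsi(const FileName: string): string;
var
  F: file;
  Buf: AnsiChar;
  Idx, ReadCount: Integer;
  s: AnsiString;
begin
  s := '';
  AssignFile(F, FileName);
  Reset(F, 1);                                // 1-byte records for binary access
  try
    Idx := FileSize(F) - 1;
    // skip a terminating CR/LF so it is not mistaken for an empty last line
    while Idx >= 0 do
    begin
      Seek(F, Idx);
      BlockRead(F, Buf, SizeOf(Buf), ReadCount);
      if (Buf <> #13) and (Buf <> #10) then
        Break;
      Dec(Idx);
    end;
    // collect characters backwards until the previous line break
    while Idx >= 0 do
    begin
      Seek(F, Idx);
      BlockRead(F, Buf, SizeOf(Buf), ReadCount);
      if (Buf = #13) or (Buf = #10) then
        Break;
      s := Buf + s;                           // prepend, so no reversing step is needed
      Dec(Idx);
    end;
  finally
    CloseFile(F);
  end;
  Result := string(s);
end;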
In response to kopiks' suggestion, I figured out how to do it with TFileStream. It works OK with the simple test file, though there may be some further tweaks needed when I use it on a variety of CSV files. Also, I don't make any claims that this is the most efficient method.
procedure TForm1.Button6Click(Sender: TObject);
Var
StreamSize, ApproxNumRows : Integer;
TempStr : String;
begin
if OpenDialog1.Execute then begin
TempStr := ReadLastLineOfTextFile(OpenDialog1.FileName,StreamSize, ApproxNumRows);
// TempStr := ReadFileStream('c:\temp\CSVTestFile.csv');
ShowMessage ('approximately '+ IntToStr(ApproxNumRows)+' Rows');
ListBox1.Items.Add(TempStr);
end;
end;
Function TForm1.ReadLastLineOfTextFile(const FileName: String; var StreamSize, ApproxNumRows : Integer): String;
const
MAXLINELENGTH = 256;
var
Stream: TFileStream;
BlockSize,CharCount : integer;
Hash13Found : Boolean;
Buffer : array [0..MAXLINELENGTH] of AnsiChar;
begin
  Hash13Found := False;
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    StreamSize := Stream.Size;
    if StreamSize < MAXLINELENGTH then
      BlockSize := StreamSize
    else
      BlockSize := MAXLINELENGTH;
    // for CharCount := 0 to Length(Buffer)-1 do begin
    //   Buffer[CharCount] := #0; // zeroing the buffer can aid diagnostics
    // end;
    CharCount := 0;
    repeat
      Stream.Seek(-(CharCount+3), 2); //+3 misses out the #0,#10,#13 at the end of the file
      Stream.Read(Buffer[CharCount], 1);
      Result := String(Buffer[CharCount]) + Result;
      if Buffer[CharCount] = #13 then
        Hash13Found := True;
      Inc(CharCount);
    until Hash13Found or (CharCount = BlockSize);
  finally
    Stream.Free; // free the stream (the original version leaked it)
  end;
  ShowMessage(Result);
  ApproxNumRows := Round(StreamSize / CharCount);
end;
Just thought of a new solution.
Again, there could be better ones, but this one is the best I thought of.
function GetLastLine(textFilePath: string): string;
var
list: tstringlist;
begin
list := tstringlist.Create;
try
list.LoadFromFile(textFilePath);
result := list[list.Count-1];
finally
list.free;
end;
end;
I am trying to find and replace text in a text file. I have been able to do this in the past with methods like:
procedure SmallFileFindAndReplace(FileName, Find, ReplaceWith: string);
begin
with TStringList.Create do
begin
LoadFromFile(FileName);
Text := StringReplace(Text, Find, ReplaceWith, [rfReplaceAll, rfIgnoreCase]);
SaveToFile(FileName);
Free;
end;
end;
The above works fine when the file is relatively small; however, when the file size is something like 170 MB, the above code causes the following error:
EOutOfMemory with message 'Out of memory'
I have tried the following with success; however, it takes a long time to run:
procedure Tfrm_Main.button_MakeReplacementClick(Sender: TObject);
var
fs : TFileStream;
s : AnsiString;
//s : string;
begin
fs := TFileStream.Create(edit_SQLFile.Text, fmOpenread or fmShareDenyNone);
try
SetLength(S, fs.Size);
fs.ReadBuffer(S[1], fs.Size);
finally
fs.Free;
end;
s := StringReplace(s, edit_Find.Text, edit_Replace.Text, [rfReplaceAll, rfIgnoreCase]);
fs := TFileStream.Create(edit_SQLFile.Text, fmCreate);
try
fs.WriteBuffer(S[1], Length(S));
finally
fs.Free;
end;
end;
I am new to "Streams" and working with buffers.
Is there a better way to do this?
Thank You.
You have two mistakes in the first code example and three in the second:
Do not load the whole large file into memory, especially in a 32-bit application. If the file size is more than ~1 GB, you will always get "Out of memory".
StringReplace slows down on large strings because of repeated memory reallocation.
In the second code you do not take the file's text encoding into account, so (on Windows) your code "thinks" the file is UCS-2 encoded (two bytes per character). But what do you get if the file encoding is ANSI (one byte per character) or UTF-8 (variable char size)?
Thus, for a correct find & replace you must respect the file encoding and read/write the file in parts, as LU RD said:
interface
uses
System.Classes,
System.SysUtils;
type
TFileSearchReplace = class(TObject)
private
FSourceFile: TFileStream;
FtmpFile: TFileStream;
FEncoding: TEncoding;
public
constructor Create(const AFileName: string);
destructor Destroy; override;
procedure Replace(const AFrom, ATo: string; ReplaceFlags: TReplaceFlags);
end;
implementation
uses
System.IOUtils,
System.StrUtils;
function Max(const A, B: Integer): Integer;
begin
if A > B then
Result := A
else
Result := B;
end;
{ TFileSearchReplace }
constructor TFileSearchReplace.Create(const AFileName: string);
begin
inherited Create;
FSourceFile := TFileStream.Create(AFileName, fmOpenReadWrite);
FtmpFile := TFileStream.Create(ChangeFileExt(AFileName, '.tmp'), fmCreate);
end;
destructor TFileSearchReplace.Destroy;
var
tmpFileName: string;
begin
if Assigned(FtmpFile) then
tmpFileName := FtmpFile.FileName;
FreeAndNil(FtmpFile);
FreeAndNil(FSourceFile);
TFile.Delete(tmpFileName);
inherited;
end;
procedure TFileSearchReplace.Replace(const AFrom, ATo: string;
ReplaceFlags: TReplaceFlags);
procedure CopyPreamble;
var
PreambleSize: Integer;
PreambleBuf: TBytes;
begin
// Copy Encoding preamble
SetLength(PreambleBuf, 100);
FSourceFile.Read(PreambleBuf, Length(PreambleBuf));
FSourceFile.Seek(0, soBeginning);
PreambleSize := TEncoding.GetBufferEncoding(PreambleBuf, FEncoding);
if PreambleSize <> 0 then
FtmpFile.CopyFrom(FSourceFile, PreambleSize);
end;
function GetLastIndex(const Str, SubStr: string): Integer;
var
i: Integer;
tmpSubStr, tmpStr: string;
begin
if not(rfIgnoreCase in ReplaceFlags) then
begin
i := Pos(SubStr, Str);
Result := i;
while i > 0 do
begin
i := PosEx(SubStr, Str, i + 1);
if i > 0 then
Result := i;
end;
if Result > 0 then
Inc(Result, Length(SubStr) - 1);
end
else
begin
tmpStr := UpperCase(Str);
tmpSubStr := UpperCase(SubStr);
i := Pos(tmpSubStr, tmpStr);
Result := i;
while i > 0 do
begin
i := PosEx(tmpSubStr, tmpStr, i + 1);
if i > 0 then
Result := i;
end;
if Result > 0 then
Inc(Result, Length(tmpSubStr) - 1);
end;
end;
var
SourceSize: int64;
procedure ParseBuffer(Buf: TBytes; var IsReplaced: Boolean);
var
i: Integer;
ReadedBufLen: Integer;
BufStr: string;
DestBytes: TBytes;
LastIndex: Integer;
begin
if IsReplaced and (not(rfReplaceAll in ReplaceFlags)) then
begin
FtmpFile.Write(Buf, Length(Buf));
Exit;
end;
// 1. Get chars from buffer
ReadedBufLen := 0;
for i := Length(Buf) downto 0 do
if FEncoding.GetCharCount(Buf, 0, i) <> 0 then
begin
ReadedBufLen := i;
Break;
end;
if ReadedBufLen = 0 then
raise EEncodingError.Create('Cant convert bytes to str');
FSourceFile.Seek(ReadedBufLen - Length(Buf), soCurrent);
BufStr := FEncoding.GetString(Buf, 0, ReadedBufLen);
if rfIgnoreCase in ReplaceFlags then
IsReplaced := ContainsText(BufStr, AFrom)
else
IsReplaced := ContainsStr(BufStr, AFrom);
if IsReplaced then
begin
LastIndex := GetLastIndex(BufStr, AFrom);
LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
end
else
LastIndex := Length(BufStr);
SetLength(BufStr, LastIndex);
FSourceFile.Seek(FEncoding.GetByteCount(BufStr) - ReadedBufLen, soCurrent);
BufStr := StringReplace(BufStr, AFrom, ATo, ReplaceFlags);
DestBytes := FEncoding.GetBytes(BufStr);
FtmpFile.Write(DestBytes, Length(DestBytes));
end;
var
Buf: TBytes;
BufLen: Integer;
bReplaced: Boolean;
begin
FSourceFile.Seek(0, soBeginning);
FtmpFile.Size := 0;
CopyPreamble;
SourceSize := FSourceFile.Size;
BufLen := Max(FEncoding.GetByteCount(AFrom) * 5, 2048);
BufLen := Max(FEncoding.GetByteCount(ATo) * 5, BufLen);
SetLength(Buf, BufLen);
bReplaced := False;
while FSourceFile.Position < SourceSize do
begin
BufLen := FSourceFile.Read(Buf, Length(Buf));
SetLength(Buf, BufLen);
ParseBuffer(Buf, bReplaced);
end;
FSourceFile.Size := 0;
FSourceFile.CopyFrom(FtmpFile, 0);
end;
how to use:
procedure TForm2.btn1Click(Sender: TObject);
var
Replacer: TFileSearchReplace;
StartTime: TDateTime;
begin
StartTime:=Now;
Replacer:=TFileSearchReplace.Create('c:\Temp\123.txt');
try
Replacer.Replace('some текст', 'some', [rfReplaceAll, rfIgnoreCase]);
finally
Replacer.Free;
end;
Caption:=FormatDateTime('nn:ss.zzz', Now - StartTime);
end;
Your first try creates several copies of the file in memory:
it loads the whole file into memory (TStringList)
it creates a copy of this memory when accessing the .Text property
it creates yet another copy of this memory when passing that string to StringReplace (The copy is the result which is built in StringReplace.)
You could try to solve the out of memory problem by getting rid of one or more of these copies:
e.g. read the file into a simple string variable rather than a TStringList
or keep the string list but run StringReplace on each line separately and write the result to the file line by line (see the sketch after the next paragraph).
That would increase the maximum file size your code can handle, but you will still run out of memory for huge files. If you want to handle files of any size, your second approach is the way to go.
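As a sketch of the line-by-line variant (not the answer's code; the temporary file name and the UTF-8 encoding are assumptions, and it only works when the search text never spans a line break):
// uses System.SysUtils, System.Classes, System.IOUtils
procedure ReplaceLineByLine(const FileName, Find, ReplaceWith: string);
var
  Reader: TStreamReader;
  Writer: TStreamWriter;
  TmpName, Line: string;
begin
  TmpName := FileName + '.tmp';
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8); // encoding is an assumption
  try
    Writer := TStreamWriter.Create(TmpName, False, TEncoding.UTF8);
    try
      // only one line is held in memory at a time
      while not Reader.EndOfStream do
      begin
        Line := Reader.ReadLine;
        Writer.WriteLine(StringReplace(Line, Find, ReplaceWith,
          [rfReplaceAll, rfIgnoreCase]));
      end;
    finally
      Writer.Free;
    end;
  finally
    Reader.Free;
  end;
  // swap the rewritten file into place
  TFile.Delete(FileName);
  TFile.Move(TmpName, FileName);
end;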
No - I don't think there is a faster way than the 2nd option (if you want a completely generic search-and-replace function for any file of any size). It may be possible to make a faster version if you code it specifically to your requirements, but as a general-purpose search-and-replace function, I don't believe you can go faster.
For instance, are you sure you need case-insensitive replacement? I would expect that to be a large part of the time spent in the replace function. Try (just for kicks) removing that requirement and see if it doesn't speed up the execution quite a bit on large files (this depends on how StringReplace is implemented internally - whether it has a specific optimization for case-sensitive searches).
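A hypothetical micro-benchmark for that experiment (not part of the answer), timing both flag sets with TStopwatch on a file small enough to load whole:
// uses System.SysUtils, System.Diagnostics, System.IOUtils, Vcl.Dialogs
procedure CompareReplaceSpeed(const FileName, Find, ReplaceWith: string);
var
  S, Dummy: string;
  SW: TStopwatch;
  MsSensitive, MsInsensitive: Int64;
begin
  // load once; keep the test file small enough to fit in memory
  S := TFile.ReadAllText(FileName);

  SW := TStopwatch.StartNew;
  Dummy := StringReplace(S, Find, ReplaceWith, [rfReplaceAll]);
  MsSensitive := SW.ElapsedMilliseconds;

  SW := TStopwatch.StartNew;
  Dummy := StringReplace(S, Find, ReplaceWith, [rfReplaceAll, rfIgnoreCase]);
  MsInsensitive := SW.ElapsedMilliseconds;

  ShowMessage(Format('case-sensitive: %d ms, case-insensitive: %d ms',
    [MsSensitive, MsInsensitive]));
end;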
I believe a refinement of Kami's code is needed to account for the case where the string is not found but the start of a new instance of the string might occur at the end of the buffer. The else clause is different:
if IsReplaced then begin
LastIndex := GetLastIndex(BufStr, AFrom);
LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
end else
LastIndex :=Length(BufStr) - Length(AFrom) + 1;
The correct fix is this one:
if IsReplaced then
begin
LastIndex := GetLastIndex(BufStr, AFrom);
LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
end
else
if FSourceFile.Position < SourceSize then
LastIndex := Length(BufStr) - Length(AFrom) + 1
else
LastIndex := Length(BufStr);