I know that I can efficiently truncate a file and remove bytes from the end of the file.
Is there a corresponding efficient way to truncate files by deleting content from the beginning of the file to a point in the middle of the file?
As I read the question, you are asking how to remove content from the beginning of a file. In other words, you wish to delete content at the start of the file and shift the remaining content down.
This is not possible. You can only truncate a file from the end, not from the beginning. You will need to copy the remaining content into a new file, or copy it down yourself within the same file.
However you do it, there is no efficient shortcut. You have to copy the data, for example as @kobik describes.
Raymond Chen wrote a nice article on this topic: How do I delete bytes from the beginning of a file?
Just for fun, here's a simple implementation of a stream-based method to delete content from anywhere in a file. You could use this with a read/write file stream. I've not tested the code; I'll leave that to you!
procedure DeleteFromStream(Stream: TStream; Start, Length: Int64);
var
  Buffer: Pointer;
  BufferSize: Integer;
  BytesToRead: Int64;
  BytesRemaining: Int64;
  SourcePos, DestPos: Int64;
begin
  SourcePos := Start + Length;
  DestPos := Start;
  BytesRemaining := Stream.Size - SourcePos;
  BufferSize := Min(BytesRemaining, 1024*1024*16); // no bigger than 16MB; Min is from the Math unit
  GetMem(Buffer, BufferSize);
  try
    while BytesRemaining > 0 do
    begin
      BytesToRead := Min(BufferSize, BytesRemaining);
      Stream.Position := SourcePos;
      Stream.ReadBuffer(Buffer^, BytesToRead);
      Stream.Position := DestPos;
      Stream.WriteBuffer(Buffer^, BytesToRead);
      Inc(SourcePos, BytesToRead);
      Inc(DestPos, BytesToRead);
      Dec(BytesRemaining, BytesToRead);
    end;
    Stream.Size := DestPos;
  finally
    FreeMem(Buffer);
  end;
end;
A very simple solution would be to shift (move) blocks of data from the "target position offset"
towards BOF, and then trim (truncate) the leftovers:
--------------------------
|******|xxxxxx|yyyyyy|zzz|
--------------------------
BOF <-^ (target position offset)
--------------------------
|xxxxxx|yyyyyy|zzz|******|
--------------------------
^ EOF
Since @David posted code based on TStream, here is some code based on "low-level" Pascal-style I/O:
function FileDeleteFromBOF(const FileName: string; const Offset: Cardinal): Boolean;
var
  Buf: Pointer;
  BufSize, FSize,
  NumRead, NumWrite,
  OffsetFrom, OffsetTo: Cardinal;
  F: file;
begin
  {$IOCHECKS OFF}
  Result := False;
  AssignFile(F, FileName);
  try
    FileMode := 2; // Read/Write
    Reset(F, 1); // Record size = 1
    FSize := FileSize(F);
    if (IOResult <> 0) or (Offset >= FSize) then Exit;
    BufSize := Min(Offset, 1024 * 64); // Max 64K - this value could be optimized
    GetMem(Buf, BufSize);
    try
      OffsetFrom := Offset;
      OffsetTo := 0;
      repeat
        Seek(F, OffsetFrom);
        BlockRead(F, Buf^, BufSize, NumRead);
        if NumRead = 0 then Break;
        Seek(F, OffsetTo);
        BlockWrite(F, Buf^, NumRead, NumWrite);
        Inc(OffsetFrom, NumWrite);
        Inc(OffsetTo, NumWrite);
      until (NumRead = 0) or (NumWrite <> NumRead) or (OffsetFrom >= FSize);
      // Truncate and set to EOF
      Seek(F, FSize - Offset);
      Truncate(F);
      Result := IOResult = 0;
    finally
      FreeMem(Buf);
    end;
  finally
    CloseFile(F);
  end;
end;
I am playing with BASS from http://www.un4seen.com/.
I need to create a FLAC file (16-bit) or a FLAC stream from a user speaking into the microphone.
I have seen this demo in BASS source code.
There is a bassenc_flac.dll as well with these functions:
function BASS_Encode_FLAC_Start(handle:DWORD; options:PChar; flags:DWORD; proc:ENCODEPROCEX; user:Pointer): HENCODE; {$IFDEF MSWINDOWS}stdcall{$ELSE}cdecl{$ENDIF}; external bassencflacdll;
function BASS_Encode_FLAC_StartFile(handle:DWORD; options:PChar; flags:DWORD; filename:PChar): HENCODE; {$IFDEF MSWINDOWS}stdcall{$ELSE}cdecl{$ENDIF}; external bassencflacdll;
How could I change the next code to encode the audio to flac file or stream?
From RecordTest BASS demo
(* This is called while recording audio *)
function RecordingCallback(Handle: HRECORD; buffer: Pointer; length: DWORD; user: Pointer): boolean; stdcall;
var
  level: DWORD;
begin
  level := BASS_ChannelGetLevel(Handle);
  // Copy new buffer contents to the memory buffer
  Form1.WaveStream.Write(buffer^, length);
  // Allow recording to continue
  Result := True;
end;

(* Start recording to memory *)
procedure TForm1.StartRecording;
begin
  if ComboBox1.ItemIndex < 0 then Exit;
  if WaveStream.Size > 0 then
  begin // free old recording
    BASS_StreamFree(chan);
    WaveStream.Clear;
  end;
  // generate header for WAV file
  with WaveHdr do
  begin
    riff := 'RIFF';
    len := 36;
    cWavFmt := 'WAVEfmt ';
    dwHdrLen := 16;
    wFormat := 1;
    wNumChannels := 2;
    dwSampleRate := 44100;
    wBlockAlign := 4;
    dwBytesPerSec := 176400;
    wBitsPerSample := 16;
    cData := 'data';
    dwDataLen := 0;
  end;
  WaveStream.Write(WaveHdr, SizeOf(WAVHDR));
  // start recording @ 44100hz 16-bit stereo
  rchan := BASS_RecordStart(44100, 2, 0, @RecordingCallback, nil);
  if rchan = 0 then
  begin
    MessageDlg('Couldn''t start recording!', mtError, [mbOk], 0);
    WaveStream.Clear;
  end
  else
  begin
    bRecord.Caption := 'Stop';
    bPlay.Enabled := False;
    bSave.Enabled := False;
  end;
end;

(* Stop recording *)
procedure TForm1.StopRecording;
var
  i: integer;
  he: BassEnc.HENCODE;
begin
  BASS_ChannelStop(rchan);
  bRecord.Caption := 'Record';
  // complete the WAV header
  WaveStream.Position := 4;
  i := WaveStream.Size - 8;
  WaveStream.Write(i, 4);
  i := i - $24;
  WaveStream.Position := 40;
  WaveStream.Write(i, 4);
  WaveStream.Position := 0;
  // create a stream from the recorded data
  chan := BASS_StreamCreateFile(True, WaveStream.Memory, 0, WaveStream.Size, 0);
  if chan <> 0 then
  begin
    // enable "Play" & "Save" buttons
    bPlay.Enabled := True;
    bSave.Enabled := True;
  end
  else
    MessageDlg('Error creating stream from recorded data!', mtError, [mbOk], 0);
  if SaveDialog.Execute then
    WaveStream.SaveToFile(SaveDialog.FileName);
end;
I have updated the code because of comments showing that the previous encoder version worked incorrectly, and I fully agree with those comments.
In order to create a FLAC encoder we should go to the un4seen web-site and download the following files:
BASS audio library 2.4
BASSFLAC 2.4.4
BASSenc 2.4.14
BASSenc_FLAC 2.4.1.1
Go through these folders and look for the following files:
bass.pas
bassenc.pas
bassenc_flac.pas
Now place these .pas files into one folder and add that folder to the library path via Delphi's options.
After this step create a new project and save it in a separate folder.
Then go through the BASS_XXX folders and look for the *.dll files.
Place them together in the folder where you saved your project!
Now let's write some code.
Add bass.pas, bassenc.pas and bassenc_flac.pas to the uses clause, then copy the code shown below.
uses ..., BASS, BASSEnc, BASSEnc_FLAC;
...
type
  TForm1 = class(TForm)
    ProgressBar1: TProgressBar;
  public
    { Public declarations }
    procedure StartEncode(SourceFileName, OutputFileName: String);
    procedure StopEncode;
  end;
...
procedure TForm1.StartEncode(SourceFileName, OutputFileName: String);
var
  PercentDone: Cardinal;
  Buffer: array [0..1024] of Byte;
begin
  Channel := BASS_StreamCreateFile(False, PChar(SourceFileName), 0, 0, BASS_MUSIC_DECODE or BASS_UNICODE);
  BASSEnc_FLAC.BASS_Encode_FLAC_StartFile(Channel, nil, BASS_ENCODE_FP_AUTO or BASS_UNICODE, PChar(OutputFileName)); // nil = no encoder options
  while BASS_ChannelIsActive(Channel) > 0 do
  begin
    BASS_ChannelGetData(Channel, @Buffer, 1024);
    PercentDone := Trunc(100 * (BASS_ChannelGetPosition(Channel, BASS_POS_BYTE) / BASS_ChannelGetLength(Channel, BASS_POS_BYTE)));
    ProgressBar1.Position := PercentDone;
  end;
  StopEncode;
end;

procedure TForm1.StopEncode;
begin
  BASS_Encode_Stop(Channel);
  BASS_StreamFree(Channel);
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  BASS_Init(-1, 44100, 0, Application.Handle, nil);
  try
    // Set name of file to convert it to FLAC and save it with output name
    StartEncode('SourceFileName', 'OutputFileName');
  finally
    BASS.BASS_Free;
  end;
end;
One note:
Indeed, a file encoded with the previous version of the code had an incorrect header (I could see it when I opened the file in Notepad.exe). After the code was updated I can see a valid header (in Notepad, of course, because I have no professional tools for working with audio files).
Now you don't even need to add the plugin to BASS as I did earlier.
Given this, I think the encoder works as expected.
I need to read the last line in some very large text files (to get the timestamp from the data). TStringList would be a simple approach, but it returns an out-of-memory error. I'm trying to use Seek and BlockRead, but the characters in the buffer are all nonsense. Is this something to do with Unicode?
Function TForm1.ReadLastLine2(FileName: String): String;
var
  FileHandle: File;
  s, line: string;
  ok: 0..1;
  Buf: array[1..8] of Char;
  k: longword;
  i, ReadCount: integer;
begin
  AssignFile(FileHandle, FileName);
  Reset(FileHandle); // or for binary files: Reset (FileHandle,1);
  ok := 0;
  k := FileSize(FileHandle);
  Seek(FileHandle, k - 1);
  s := '';
  while ok <> 1 do begin
    BlockRead(FileHandle, buf, SizeOf(Buf) - 1, ReadCount); //BlockRead ( var FileHandle : File; var Buffer; RecordCount : Integer {; var RecordsRead : Integer} ) ;
    if ord(buf[1]) <> 13 then //Arg to integer
      s := s + buf[1]
    else
      ok := ok + 1;
    k := k - 1;
    seek(FileHandle, k);
  end;
  CloseFile(FileHandle);
  // Reverse the order in the line read
  setlength(line, length(s));
  for i := 1 to length(s) do
    line[length(s) - i + 1] := s[i];
  Result := Line;
end;
Based on www.delphipages.com/forum/showthread.php?t=102965
The test file is a simple CSV I created in Excel (this is not the 100 MB file I ultimately need to read).
a,b,c,d,e,f,g,h,i,j,blank
A,B,C,D,E,F,G,H,I,J,blank
1,2,3,4,5,6,7,8,9,0,blank
Mary,had,a,little,lamb,His,fleece,was,white,as,snow
And,everywhere,that,Mary,went,The,lamb,was,sure,to,go
You really have to read the file in LARGE chunks from the tail to the head.
Since the file is too large to fit in memory, reading it line by line from start to end would be very slow - and with ReadLn, twice as slow.
You also have to be prepared for the last line ending with an EOL - or not.
Personally I would also account for three possible EOL sequences:
CR/LF aka #13#10=^M^J - DOS/Windows style
CR without LF - just #13=^M - Classic MacOS file
LF without CR - just #10=^J - UNIX style, including MacOS version 10
If you are sure your CSV files would only ever be generated by native Windows programs it would be safe to assume full CR/LF be used. But if there can be other Java programs, non-Windows platforms, mobile programs - I would be less sure. Of course pure CR without LF would be the least probable case of them all.
uses System.IOUtils, System.Math, System.Classes;
type FileChar = AnsiChar; FileString = AnsiString; // for non-Unicode files
// type FileChar = WideChar; FileString = UnicodeString;// for UTF16 and UCS-2 files
const FileCharSize = SizeOf(FileChar);
// somewhere later in the code add: Assert(FileCharSize = SizeOf(FileString[1]));
function ReadLastLine(const FileName: String): FileString; overload; forward;
const PageSize = 4*1024;
// the minimal read atom of most modern HDD and the memory allocation atom of Win32
// since the chances your file would have lines longer than 4Kb are very small - I would not increase it to several atoms.
function ReadLastLine(const Lines: TStringDynArray): FileString; overload;
var i: integer;
begin
Result := '';
i := High(Lines);
if i < Low(Lines) then exit; // empty array - empty file
Result := Lines[i];
if Result > '' then exit; // we got the line
Dec(i); // skip the empty ghost line, in case last line was CRLF-terminated
if i < Low(Lines) then exit; // that ghost was the only line in the empty file
Result := Lines[i];
end;
// scan for EOLs in not-yet-scanned part
function FindLastLine(buffer: TArray<FileChar>; const OldRead : Integer;
const LastChunk: Boolean; out Line: FileString): boolean;
var i, tailCRLF: integer; c: FileChar;
begin
Result := False;
if Length(Buffer) = 0 then exit;
i := High(Buffer);
tailCRLF := 0; // test for trailing CR/LF
if Buffer[i] = ^J then begin // LF - single, or after CR
Dec(i);
Inc(tailCRLF);
end;
if (i >= Low(Buffer)) and (Buffer[i] = ^M) then begin // CR, alone or before LF
Inc(tailCRLF);
end;
i := High(Buffer) - Max(OldRead, tailCRLF);
if i - Low(Buffer) < 0 then exit; // no new data to read - results would be like before
if OldRead > 0 then Inc(i); // the CR/LF pair could be sliced between new and previous buffer - so need to start a bit earlier
for i := i downto Low(Buffer) do begin
c := Buffer[i];
if (c=^J) or (c=^M) then begin // found EOL
SetString( Line, @Buffer[i+1], High(Buffer) - tailCRLF - i);
exit(True);
end;
end;
// we did not find non-terminating EOL in the buffer (except maybe trailing),
// now we should ask for more file content, if there is still left any
// or take the entire file (without trailing EOL if any)
if LastChunk then begin
SetString( Line, @Buffer[ Low(Buffer) ], Length(Buffer) - tailCRLF);
Result := true;
end;
end;
function ReadLastLine(const FileName: String): FileString; overload;
var Buffer, tmp: TArray<FileChar>;
// dynamic arrays - eases memory management and protect from stack corruption
FS: TFileStream; FSize, NewPos: Int64;
OldRead, NewLen : Integer; EndOfFile: boolean;
begin
Result := '';
FS := TFile.OpenRead(FileName);
try
FSize := FS.Size;
if FSize <= PageSize then begin // small file, we can be lazy!
FreeAndNil(FS); // free the handle and avoid double-free in finally
Result := ReadLastLine( TFile.ReadAllLines( FileName, TEncoding.ANSI ));
// or TEncoding.UTF16
// warning - TFile is not share-aware, if the file is being written to by another app
exit;
end;
SetLength( Buffer, PageSize div FileCharSize);
OldRead := 0;
repeat
NewPos := FSize - Length(Buffer)*FileCharSize;
EndOfFile := NewPos <= 0;
if NewPos < 0 then NewPos := 0;
FS.Position := NewPos;
FS.ReadBuffer( Buffer[Low(Buffer)], (Length(Buffer) - OldRead)*FileCharSize);
if FindLastLine(Buffer, OldRead, EndOfFile, Result) then
exit; // done !
tmp := Buffer; Buffer := nil; // flip-flop: preparing to broaden our mouth
OldRead := Length(tmp); // need not to re-scan the tail again and again when expanding our scanning range
NewLen := Min( 2*Length(tmp), FSize div FileCharSize );
SetLength(Buffer, NewLen); // this may trigger EOutOfMemory...
Move( tmp[Low(tmp)], Buffer[High(Buffer)-OldRead+1], OldRead*FileCharSize);
tmp := nil; // free old buffer
until EndOfFile;
finally
FS.Free;
end;
end;
PS. Note one extra special case - if you use Unicode chars (two-byte ones) and are given an odd-length file (3 bytes, 5 bytes, etc.) - you would never be able to scan the starting single byte (half a WideChar). Maybe you should add an extra guard there, like Assert( 0 = FS.Size mod FileCharSize)
PPS. As a rule of thumb you had better keep those functions out of the form class - because WHY mix them? In general you should separate concerns into small blocks. Reading a file has nothing to do with user interaction - so it is better offloaded to a separate UNIT. Then you would be able to use functions from that unit in one form or 10 forms, in the main thread or in a multi-threaded application. Like LEGO parts - they give you flexibility by being small and separate.
PPPS. Another approach here would be using memory-mapped files. Google for MMF implementations for Delphi and articles about benefits and problems with MMF approach. Personally I think rewriting the code above to use MMF would greatly simplify it, removing several "special cases" and the troublesome and memory copying flip-flop. OTOH it would demand you to be very strict with pointers arithmetic.
https://en.wikipedia.org/wiki/Memory-mapped_file
https://msdn.microsoft.com/en-us/library/ms810613.aspx
http://torry.net/quicksearchd.php?String=memory+map&Title=No
Your char type is two bytes, so that buffer is 16 bytes. Then with BlockRead you read SizeOf(Buf)-1 bytes into it, and check whether the first two-byte char is equal to #13.
The SizeOf(Buf)-1 is dodgy (where does that -1 come from?), and the rest is valid, but only if your input file is UTF-16.
Also you read 8 (or 16) characters each time, but compare only one and then do a Seek again. That is not very logical either.
If your encoding is not UTF-16, I suggest you change the type of the buffer elements to AnsiChar and remove the -1.
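A minimal corrected sketch along those lines (my own illustration, untested against the asker's files): it assumes a single-byte encoding (ANSI, or UTF-8 with single-byte line breaks), uses AnsiChar instead of the two-byte Char, reads one byte at a time backwards, and prepends characters so no reversal pass is needed.

```pascal
function ReadLastLine(const FileName: string): AnsiString;
var
  F: file;       // untyped file, record size 1 byte
  c: AnsiChar;   // one byte per character - no UTF-16 surprises
  k: Int64;
begin
  Result := '';
  AssignFile(F, FileName);
  Reset(F, 1);
  try
    k := FileSize(F);
    while k > 0 do
    begin
      Dec(k);
      Seek(F, k);
      BlockRead(F, c, 1);
      if (c = #13) or (c = #10) then
      begin
        // skip any trailing CR/LF; stop at the EOL preceding the last line
        if Result <> '' then
          Break;
      end
      else
        Result := c + Result; // prepend, so no reversal loop is needed
    end;
  finally
    CloseFile(F);
  end;
end;
```

Reading single bytes backwards is slow on big files; for the 100 MB case the buffered approaches in the other answers are preferable.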
In response to kopik's suggestion, I figured out how to do it with TFileStream. It works OK with the simple test file, though there may be some further tweaks needed when I use it on a variety of CSV files. Also, I don't make any claims that this is the most efficient method.
procedure TForm1.Button6Click(Sender: TObject);
var
  StreamSize, ApproxNumRows: Integer;
  TempStr: String;
begin
  if OpenDialog1.Execute then begin
    TempStr := ReadLastLineOfTextFile(OpenDialog1.FileName, StreamSize, ApproxNumRows);
    // TempStr := ReadFileStream('c:\temp\CSVTestFile.csv');
    ShowMessage('approximately ' + IntToStr(ApproxNumRows) + ' Rows');
    ListBox1.Items.Add(TempStr);
  end;
end;
Function TForm1.ReadLastLineOfTextFile(const FileName: String; var StreamSize, ApproxNumRows: Integer): String;
const
  MAXLINELENGTH = 256;
var
  Stream: TFileStream;
  BlockSize, CharCount: integer;
  Hash13Found: Boolean;
  Buffer: array [0..MAXLINELENGTH] of AnsiChar;
begin
  Hash13Found := False;
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    StreamSize := Stream.Size;
    if StreamSize < MAXLINELENGTH then
      BlockSize := StreamSize
    else
      BlockSize := MAXLINELENGTH;
    // optionally zero Buffer here to aid diagnostics
    CharCount := 0;
    repeat
      Stream.Seek(-(CharCount + 3), soFromEnd); // +3 misses out the #0,#10,#13 at the end of the file
      Stream.Read(Buffer[CharCount], 1);
      Result := String(Buffer[CharCount]) + Result;
      if Buffer[CharCount] = #13 then
        Hash13Found := True;
      Inc(CharCount);
    until Hash13Found or (CharCount = BlockSize);
  finally
    Stream.Free; // the original leaked the stream
  end;
  ShowMessage(Result);
  ApproxNumRows := Round(StreamSize / CharCount);
end;
Just thought of a new solution.
Again, there could be better ones, but this one is the best I thought of.
function GetLastLine(textFilePath: string): string;
var
  list: TStringList;
begin
  list := TStringList.Create;
  try
    list.LoadFromFile(textFilePath);
    result := list[list.Count - 1];
  finally
    list.Free;
  end;
end;
I am trying to find and replace text in a text file. I have been able to do this in the past with methods like:
procedure SmallFileFindAndReplace(FileName, Find, ReplaceWith: string);
begin
  with TStringList.Create do
  begin
    LoadFromFile(FileName);
    Text := StringReplace(Text, Find, ReplaceWith, [rfReplaceAll, rfIgnoreCase]);
    SaveToFile(FileName);
    Free;
  end;
end;
The above works fine when a file is relatively small; however, when the file size is something like 170 MB, the above code will cause the following error:
EOutOfMemory with message 'Out of memory'
I have tried the following with success; however, it takes a long time to run:
procedure Tfrm_Main.button_MakeReplacementClick(Sender: TObject);
var
  fs: TFileStream;
  s: AnsiString;
  //s: string;
begin
  fs := TFileStream.Create(edit_SQLFile.Text, fmOpenRead or fmShareDenyNone);
  try
    SetLength(S, fs.Size);
    fs.ReadBuffer(S[1], fs.Size);
  finally
    fs.Free;
  end;
  s := StringReplace(s, edit_Find.Text, edit_Replace.Text, [rfReplaceAll, rfIgnoreCase]);
  fs := TFileStream.Create(edit_SQLFile.Text, fmCreate);
  try
    fs.WriteBuffer(S[1], Length(S));
  finally
    fs.Free;
  end;
end;
I am new to "Streams" and working with buffers.
Is there a better way to do this?
Thank You.
You have two mistakes in the first code example and three in the second:
Do not load the whole of a large file into memory, especially in a 32-bit application. If the file size is more than ~1 GB, you will always get "Out of memory"
StringReplace slows down on large strings because of repeated memory reallocation
In the second code you don't use the file's text encoding, so (on Windows) your code "thinks" the file has UCS-2 encoding (two bytes per character). But what do you get if the file encoding is ANSI (one byte per character) or UTF-8 (variable char size)?
Thus, for a correct find & replace you must use the file's encoding and read/write the file in parts, as LU RD said:
interface

uses
  System.Classes,
  System.SysUtils;

type
  TFileSearchReplace = class(TObject)
  private
    FSourceFile: TFileStream;
    FtmpFile: TFileStream;
    FEncoding: TEncoding;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
    procedure Replace(const AFrom, ATo: string; ReplaceFlags: TReplaceFlags);
  end;

implementation

uses
  System.IOUtils,
  System.StrUtils;

function Max(const A, B: Integer): Integer;
begin
  if A > B then
    Result := A
  else
    Result := B;
end;

{ TFileSearchReplace }

constructor TFileSearchReplace.Create(const AFileName: string);
begin
  inherited Create;
  FSourceFile := TFileStream.Create(AFileName, fmOpenReadWrite);
  FtmpFile := TFileStream.Create(ChangeFileExt(AFileName, '.tmp'), fmCreate);
end;

destructor TFileSearchReplace.Destroy;
var
  tmpFileName: string;
begin
  if Assigned(FtmpFile) then
    tmpFileName := FtmpFile.FileName;
  FreeAndNil(FtmpFile);
  FreeAndNil(FSourceFile);
  TFile.Delete(tmpFileName);
  inherited;
end;

procedure TFileSearchReplace.Replace(const AFrom, ATo: string;
  ReplaceFlags: TReplaceFlags);

  procedure CopyPreamble;
  var
    PreambleSize: Integer;
    PreambleBuf: TBytes;
  begin
    // Copy Encoding preamble
    SetLength(PreambleBuf, 100);
    FSourceFile.Read(PreambleBuf, Length(PreambleBuf));
    FSourceFile.Seek(0, soBeginning);
    PreambleSize := TEncoding.GetBufferEncoding(PreambleBuf, FEncoding);
    if PreambleSize <> 0 then
      FtmpFile.CopyFrom(FSourceFile, PreambleSize);
  end;

  function GetLastIndex(const Str, SubStr: string): Integer;
  var
    i: Integer;
    tmpSubStr, tmpStr: string;
  begin
    if not (rfIgnoreCase in ReplaceFlags) then
    begin
      i := Pos(SubStr, Str);
      Result := i;
      while i > 0 do
      begin
        i := PosEx(SubStr, Str, i + 1);
        if i > 0 then
          Result := i;
      end;
      if Result > 0 then
        Inc(Result, Length(SubStr) - 1);
    end
    else
    begin
      tmpStr := UpperCase(Str);
      tmpSubStr := UpperCase(SubStr);
      i := Pos(tmpSubStr, tmpStr);
      Result := i;
      while i > 0 do
      begin
        i := PosEx(tmpSubStr, tmpStr, i + 1);
        if i > 0 then
          Result := i;
      end;
      if Result > 0 then
        Inc(Result, Length(tmpSubStr) - 1);
    end;
  end;

var
  SourceSize: Int64;

  procedure ParseBuffer(Buf: TBytes; var IsReplaced: Boolean);
  var
    i: Integer;
    ReadedBufLen: Integer;
    BufStr: string;
    DestBytes: TBytes;
    LastIndex: Integer;
  begin
    if IsReplaced and (not (rfReplaceAll in ReplaceFlags)) then
    begin
      FtmpFile.Write(Buf, Length(Buf));
      Exit;
    end;
    // 1. Get chars from buffer
    ReadedBufLen := 0;
    for i := Length(Buf) downto 0 do
      if FEncoding.GetCharCount(Buf, 0, i) <> 0 then
      begin
        ReadedBufLen := i;
        Break;
      end;
    if ReadedBufLen = 0 then
      raise EEncodingError.Create('Cant convert bytes to str');
    FSourceFile.Seek(ReadedBufLen - Length(Buf), soCurrent);
    BufStr := FEncoding.GetString(Buf, 0, ReadedBufLen);
    if rfIgnoreCase in ReplaceFlags then
      IsReplaced := ContainsText(BufStr, AFrom)
    else
      IsReplaced := ContainsStr(BufStr, AFrom);
    if IsReplaced then
    begin
      LastIndex := GetLastIndex(BufStr, AFrom);
      LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
    end
    else
      LastIndex := Length(BufStr);
    SetLength(BufStr, LastIndex);
    FSourceFile.Seek(FEncoding.GetByteCount(BufStr) - ReadedBufLen, soCurrent);
    BufStr := StringReplace(BufStr, AFrom, ATo, ReplaceFlags);
    DestBytes := FEncoding.GetBytes(BufStr);
    FtmpFile.Write(DestBytes, Length(DestBytes));
  end;

var
  Buf: TBytes;
  BufLen: Integer;
  bReplaced: Boolean;
begin
  FSourceFile.Seek(0, soBeginning);
  FtmpFile.Size := 0;
  CopyPreamble;
  SourceSize := FSourceFile.Size;
  BufLen := Max(FEncoding.GetByteCount(AFrom) * 5, 2048);
  BufLen := Max(FEncoding.GetByteCount(ATo) * 5, BufLen);
  SetLength(Buf, BufLen);
  bReplaced := False;
  while FSourceFile.Position < SourceSize do
  begin
    BufLen := FSourceFile.Read(Buf, Length(Buf));
    SetLength(Buf, BufLen);
    ParseBuffer(Buf, bReplaced);
  end;
  FSourceFile.Size := 0;
  FSourceFile.CopyFrom(FtmpFile, 0);
end;
how to use:
procedure TForm2.btn1Click(Sender: TObject);
var
  Replacer: TFileSearchReplace;
  StartTime: TDateTime;
begin
  StartTime := Now;
  Replacer := TFileSearchReplace.Create('c:\Temp\123.txt');
  try
    Replacer.Replace('some текст', 'some', [rfReplaceAll, rfIgnoreCase]);
  finally
    Replacer.Free;
  end;
  Caption := FormatDateTime('nn:ss.zzz', Now - StartTime);
end;
Your first try creates several copies of the file in memory:
it loads the whole file into memory (TStringList)
it creates a copy of this memory when accessing the .Text property
it creates yet another copy of this memory when passing that string to StringReplace (The copy is the result which is built in StringReplace.)
You could try to solve the out of memory problem by getting rid of one or more of these copies:
e.g. read the file into a simple string variable rather than a TStringList
or keep the string list but run the StringReplace on each line separately and write the result to the file line by line.
That would increase the maximum file size your code can handle, but you will still run out of memory for huge files. If you want to handle files of any size, your second approach is the way to go.
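The line-by-line variant mentioned above could be sketched like this - a minimal sketch of my own, assuming the search text never spans a line break and the file uses the system's default ANSI encoding:

```pascal
uses
  System.Classes, System.SysUtils;

procedure ReplaceLineByLine(const InFile, OutFile, Find, ReplaceWith: string);
var
  Reader: TStreamReader;
  Writer: TStreamWriter;
  Line: string;
begin
  Reader := TStreamReader.Create(InFile, TEncoding.Default);
  try
    Writer := TStreamWriter.Create(OutFile, False, TEncoding.Default);
    try
      // only one line is held in memory at a time
      while not Reader.EndOfStream do
      begin
        Line := Reader.ReadLine;
        Writer.WriteLine(StringReplace(Line, Find, ReplaceWith,
          [rfReplaceAll, rfIgnoreCase]));
      end;
    finally
      Writer.Free;
    end;
  finally
    Reader.Free;
  end;
end;
```

Note the limitation: a match that straddles a line break will be missed, which is exactly the buffer-boundary problem the accepted answer handles explicitly.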
No - I don't think there's a faster way than the 2nd option (if you want a completely generic search-and-replace function for any file of any size). It may be possible to make a faster version if you code it specifically to your requirements, but as a general-purpose search-and-replace function, I don't believe you can go faster...
For instance, are you sure you need case-insensitive replacement? I would expect that this would be a large part of the time spent in the replace function. Try (just for kicks) to remove that requirement and see if it doesn't speed up the execution quite a bit on large files (this depends on how the internal coding of the StringReplace function is made - if it has a specific optimization for case-sensitive searches)
I believe a refinement of Kami's code is needed to account for the case where the string is not found, but the start of a new instance of the string might occur at the end of the buffer. The else clause is different:
if IsReplaced then
begin
  LastIndex := GetLastIndex(BufStr, AFrom);
  LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
end
else
  LastIndex := Length(BufStr) - Length(AFrom) + 1;
Correct fix is this one:
if IsReplaced then
begin
  LastIndex := GetLastIndex(BufStr, AFrom);
  LastIndex := Max(LastIndex, Length(BufStr) - Length(AFrom) + 1);
end
else
if FSourceFile.Position < SourceSize then
  LastIndex := Length(BufStr) - Length(AFrom) + 1
else
  LastIndex := Length(BufStr);
I'm trying to reduce the number of lines in this code, and I cannot see how; is the code I show the minimum number of lines that can achieve this?
function read_file(FileName: String): AnsiString;
var
  F: File;
  Buffer: AnsiString;
  Size: Integer;
  ReadBytes: Integer;
  DefaultFileMode: Byte;
begin
  Result := '';
  DefaultFileMode := FileMode;
  FileMode := 0;
  AssignFile(F, FileName);
  Reset(F, 1);
  if (IOResult = 0) then
  begin
    Size := FileSize(F);
    while (Size > 1024) do
    begin
      SetLength(Buffer, 1024);
      BlockRead(F, Buffer[1], 1024, ReadBytes);
      Result := Result + Buffer;
      Dec(Size, ReadBytes);
    end;
    SetLength(Buffer, Size);
    BlockRead(F, Buffer[1], Size);
    Result := Result + Buffer;
    CloseFile(F);
  end;
  FileMode := DefaultFileMode;
end;
Is there any way to reduce the lines further?
Like this:
function read_file(const FileName: String): AnsiString;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead);
  try
    SetLength(Result, Stream.Size);
    Stream.ReadBuffer(Pointer(Result)^, Stream.Size);
  finally
    Stream.Free;
  end;
end;
In modern Delphi the TFile class has static methods that can do this as a one liner. Although not directly into an AnsiString.
As well as being shorter I perceive the following additional benefits:
Avoiding Pascal I/O in favour of modern streams.
Error handling by exceptions, taken care of by the stream class.
A single allocation of the string variable as opposed to repeated inefficient re-allocations.
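For completeness, the TFile one-liners mentioned above could look like this (a sketch; note that TFile.ReadAllText returns a string decoded per the detected encoding, not an AnsiString, while TFile.ReadAllBytes gives you the raw bytes):

```pascal
uses
  System.SysUtils, System.IOUtils;

var
  Text: string;
  Bytes: TBytes;
begin
  // Whole file as text in one call (encoding detected from the BOM)
  Text := TFile.ReadAllText('data.txt');
  // Or the raw, undecoded bytes
  Bytes := TFile.ReadAllBytes('data.txt');
end.
```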
If you must do this with Pascal I/O, use a single allocation:
SetLength(Result, FileSize(F));
BlockRead(F, Pointer(Result)^, Length(Result), ReadBytes);
If you insist on using old-style I/O, the following function is probably the smallest one you can do and still handle errors appropriately (if appropriately handling errors is to return an empty string).
function read_file(const FileName: String): AnsiString;
var
  F: File;
  DefaultFileMode: Byte;
begin
  DefaultFileMode := FileMode;
  try
    FileMode := 0;
    AssignFile(F, FileName);
    {$I-}
    Reset(F, 1);
    {$I+}
    if IOResult = 0 then
    try
      SetLength(Result, FileSize(F));
      if Length(Result) > 0 then
      begin
        {$I-}
        BlockRead(F, Result[1], Length(Result));
        {$I+}
        if IOResult <> 0 then
          Result := '';
      end;
    finally
      CloseFile(F);
    end;
  finally
    FileMode := DefaultFileMode;
  end;
end;
When I search for the words "JPEG" and "metadata", I get many answers about manipulating the metadata... and that is the opposite of what I want... ;o)
I have written a function which works exactly like I want (if the images are similar and only the metadata changes, or nothing changes, the function returns True; if at least one pixel changes, it returns False), but I'd like to improve the performance...
The bottleneck is the bmp.Assign(jpg);
function CompareImages(fnFrom, fnTo: TFileName): Boolean;
var
  j1, j2: TJpegImage;
  b1, b2: TBitmap;
  s1, s2: TMemoryStream;
begin
  Result := False;
  sw1.Start;
  j1 := TJpegImage.Create;
  j2 := TJpegImage.Create;
  sw1.Stop;
  sw2.Start;
  s1 := TMemoryStream.Create;
  s2 := TMemoryStream.Create;
  sw2.Stop;
  //sw3.Start;
  b1 := TBitmap.Create;
  b2 := TBitmap.Create;
  //sw3.Stop;
  try
    sw1.Start;
    j1.LoadFromFile(fnFrom);
    j2.LoadFromFile(fnTo);
    sw1.Stop;
    // the very long part...
    sw3.Start;
    b1.Assign(j1);
    b2.Assign(j2);
    sw3.Stop;
    sw4.Start;
    b1.SaveToStream(s1);
    b2.SaveToStream(s2);
    sw4.Stop;
    sw2.Start;
    s1.Position := 0;
    s2.Position := 0;
    sw2.Stop;
    sw5.Start;
    Result := IsIdenticalStreams(s1, s2);
    sw5.Stop;
  finally
    // sw3.Start;
    b1.Free;
    b2.Free;
    // sw3.Stop;
    sw2.Start;
    s1.Free;
    s2.Free;
    sw2.Stop;
    sw1.Start;
    j1.Free;
    j2.Free;
    sw1.Stop;
  end;
end;
sw1, ..., sw5 are TStopwatch instances I used to identify the time spent.
IsIdenticalStreams comes from here.
If I directly compare the TJpegImage, the streams are different...
Any better way to code that?
Regards,
W.
Update:
Testing some solutions extracted from the comments, I get the same performance with this code:
type
  TMyJpeg = class(TJPEGImage)
  public
    function Equals(Graphic: TGraphic): Boolean; override;
  end;
...
function CompareImages(fnFrom, fnTo: TFileName): Boolean;
var
  j1, j2: TMyJpeg;
begin
  sw1.Start;
  Result := False;
  j1 := TMyJpeg.Create;
  j2 := TMyJpeg.Create;
  try
    j1.LoadFromFile(fnFrom);
    j2.LoadFromFile(fnTo);
    Result := j1.Bitmap.Equals(j2.Bitmap);
  finally
    j1.Free;
    j2.Free;
  end;
  sw1.Stop;
end;
Any way to directly access the pixel data bytes from the file (skipping the metadata bytes) without bitmap conversion?
A JPEG file consists of chunks whose types are identified by markers. The structure of a chunk (except for the stand-alone SOI, EOI and RSTn markers):
chunk type marker (big-endian FFxx)
chunk length (big-endian word)
data (length - 2 bytes)
Edit: the SOS chunk is terminated by another marker, not by a length field.
Metadata chunks start with an APPn marker (FFEn), except for the APP0 (FFE0) marker with the JFIF title.
So we can read and compare only the significant chunks and ignore the APPn chunks and the COM chunk (as TLama noticed).
Example: hex view of some jpeg file:
It starts with the SOI (Start Of Image) marker FFD8 (stand-alone, without length),
then the APP0 chunk (FFE0) with length = 16 bytes,
then the APP1 chunk (FFE1), which contains metadata (EXIF data, the NIKON COOLPIX name, etc.), so we can skip 9053 bytes ($235D) and check the next chunk marker at address $2373, and so on...
Edit: Simple parsing example:
var
  jp: TMemoryStream;
  Marker, Len: Word;
  Position: Integer;
  PBA: PByteArray;

  procedure ReadLenAndMovePosition;
  begin
    Inc(Position, 2);
    Len := Swap(PWord(@PBA[Position])^);
    Inc(Position, Len);
  end;

begin
  jp := TMemoryStream.Create;
  try
    jp.LoadFromFile('D:\3.jpg');
    Position := 0;
    PBA := jp.Memory;
    while (Position < jp.Size - 1) do
    begin
      Marker := Swap(PWord(@PBA[Position])^);
      case Marker of
        $FFD8:
          begin
            Memo1.Lines.Add('Start Of Image');
            Inc(Position, 2);
          end;
        $FFD9:
          begin
            Memo1.Lines.Add('End Of Image');
            Inc(Position, 2);
          end;
        $FFE0:
          begin
            ReadLenAndMovePosition;
            Memo1.Lines.Add(Format('JFIF Header Len: %d', [Len]));
          end;
        $FFE1..$FFEF, $FFFE:
          begin
            ReadLenAndMovePosition;
            Memo1.Lines.Add(Format('APPn or COM Len: %d Ignored', [Len]));
          end;
        $FFDA:
          begin
            // SOS marker, data stream, ended by another marker except for RSTn
            Memo1.Lines.Add(Format('SOS data stream started at %d', [Position]));
            Inc(Position, 2);
            while Position < jp.Size - 1 do
            begin
              if PBA[Position] = $FF then
                if not (PBA[Position + 1] in [0, $D0..$D7]) then
                begin
                  Inc(Position, 2);
                  Memo1.Lines.Add(Format('SOS data stream ended at %d',
                    [Position]));
                  Break;
                end;
              Inc(Position);
            end;
          end;
      else
        begin
          ReadLenAndMovePosition;
          Memo1.Lines.Add(Format('Marker %x Len: %d Significant', [Marker, Len]));
        end;
      end;
    end;
  finally
    jp.Free; // the original freed the stream without a try/finally
  end;
end;