Copy part of a file into a stream - Delphi

The overall goal is to use part of each file to compute a checksum, in order to find duplicate movie and MP3 files.
I have to hash only a part of each file because whole files can be up to 25 GB in some cases. If I find duplicates that way, I will then compute a complete MD5 to avoid deleting the wrong file by mistake.
I have no problem generating the MD5 from a stream; that will be done with the Indy components.
So, for the first part, I have to copy the first 1 MB of a file. I wrote this function, but the memory stream comes out empty in every test!
function splitFile(FileName: string): TMemoryStream;
var
  fs: TFileStream;
  ms: TMemoryStream;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  ms := TMemoryStream.Create;
  fs.Position := 0;
  ms.CopyFrom(fs, 1048576);
  Result := ms;
end;
How can I fix this? Or where is my problem?
Update 1 (dirty test):
This code raises a "stream read error"; Memo2 shows some text, but Memo3 is empty!
function splitFile(FileName: string): TMemoryStream;
var
  fs: TFileStream;
  ms: TMemoryStream;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  ms := TMemoryStream.Create;
  fs.Position := 0;
  Form1.Memo2.Lines.LoadFromStream(fs);
  ms.CopyFrom(fs, 1048576);
  ms.Position := 0;
  Form1.Memo3.Lines.LoadFromStream(ms);
  Result := ms;
end;
The complete code:
function splitFile(FileName: string): TMemoryStream;
var
  fs: TFileStream;
  ms: TMemoryStream;
  BytesToRead: Integer;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  ms := TMemoryStream.Create;
  fs.Position := 0;
  BytesToRead := Min(fs.Size - fs.Position, 1024 * 1024);
  ms.CopyFrom(fs, BytesToRead);
  Result := ms;
  // fs.Free;
  // ms.Free;
end;
function streamFile(FileName: string): TFileStream;
var
  fs: TFileStream;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  Result := fs;
end;
function GetFileMD5(const Stream: TStream): String; overload;
var
  MD5: TIdHashMessageDigest5;
begin
  MD5 := TIdHashMessageDigest5.Create;
  try
    Result := MD5.HashStreamAsHex(Stream);
  finally
    MD5.Free;
  end;
end;
function getMd5HashString(value: string): string;
var
  hashMessageDigest5: TIdHashMessageDigest5;
begin
  hashMessageDigest5 := nil;
  try
    hashMessageDigest5 := TIdHashMessageDigest5.Create;
    Result := IdGlobal.IndyLowerCase(hashMessageDigest5.HashStringAsHex(value));
  finally
    hashMessageDigest5.Free;
  end;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
  Path, hash: String;
  SR: TSearchRec;
begin
  if od1.Execute then
  begin
    Path := ExtractFileDir(od1.FileName); // Get the path of the selected file
    DirList := TStringList.Create;
    try
      if FindFirst(Path + '\*.*', faArchive, SR) = 0 then
      begin
        repeat
          if (SR.Size > 10240) then
          begin
            hash := GetFileMD5(splitFile(Path + '\' + SR.Name));
          end
          else
          begin
            hash := GetFileMD5(streamFile(Path + '\' + SR.Name));
          end;
          Memo1.Lines.Add(hash + ' | ' + SR.Name + ' | ' + IntToStr(SR.Size));
          Application.ProcessMessages;
        until FindNext(SR) <> 0;
        FindClose(SR);
      end;
    finally
      DirList.Free;
    end;
  end;
end;
Output (note that D41D8CD98F00B204E9800998ECF8427E is the MD5 of zero-length input, which confirms the hashed stream was empty):
D41D8CD98F00B204E9800998ECF8427E | eslahat.docx | 13338
D41D8CD98F00B204E9800998ECF8427E | EXT-3000-Data-Sheet.pdf | 682242
D41D8CD98F00B204E9800998ECF8427E | faktor khate ekhtesasi firoozpoor.pdf | 50091
D41D8CD98F00B204E9800998ECF8427E | FileZilla_3.9.0.5_win32-setup.exe | 6057862
D41D8CD98F00B204E9800998ECF8427E | FileZilla_3.9.0.6_win32-setup.exe | 6126536
11210486C9E54E12DA9DF687792257EA | get_stats_of_all_members_of_mu(1).php | 6227
11210486C9E54E12DA9DF687792257EA | get_stats_of_all_members_of_mu.php | 6227
D41D8CD98F00B204E9800998ECF8427E | GOMAUDIOGLOBALSETUP.EXE | 6855616
D41D8CD98F00B204E9800998ECF8427E | harvester-master(1).zip | 54255
D41D8CD98F00B204E9800998ECF8427E | harvester-master.zip | 54180

Here is a procedure I quickly wrote for you that lets you read part of a file (a chunk) into a memory stream.
The reason I made this a procedure and not a function is so that the same memory stream can be reused for different chunks. This avoids all those memory allocations/deallocations and also reduces the chance of introducing a memory leak.
To make that possible, the memory stream is passed to the procedure as a var parameter.
I also added two more parameters: one specifying the chunk size (the amount of data you want to read from the file) and one specifying the chunk number.
Finally, I added some rudimentary safeguards that tell you when you request a chunk beyond the end of the file, plus the ability to automatically reduce the size of the last chunk, since not all file sizes are multiples of your chunk size (in your case, not every file is exactly X megabytes in size, where X is any valid integer).
procedure readFileChunk(FileName: string; var MS: TMemoryStream; ChunkNo: Integer; ChunkSize: Int64);
var
  fs: TFileStream;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if ChunkSize * (ChunkNo - 1) <= fs.Size then
    begin
      fs.Position := ChunkSize * (ChunkNo - 1);
      if fs.Position + ChunkSize <= fs.Size then
        MS.CopyFrom(fs, ChunkSize)
      else
        // Last chunk: copy only what remains of the file
        MS.CopyFrom(fs, fs.Size - fs.Position);
    end
    else
      MessageBox(Form2.WindowHandle, 'File does not have so many chunks', 'WARNING!', MB_OK);
  finally
    fs.Free;
  end;
end;
You use this procedure by calling:
readFileChunk(FileName, MemoryStream, ChunkNumber, ChunkSize);
Make sure you have already created the memory stream before calling this procedure.
Also, if you want to reuse the same memory stream multiple times, don't forget to reset it before each call; otherwise the new data is appended to the end of the stream, which keeps increasing the memory stream's size. A usage sketch follows.
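For example, a usage sketch (the file name and chunk count are invented for illustration; Clear is used here instead of Position := 0 because it resets both Size and Position, so a shorter final chunk cannot leave stale bytes behind):
var
  MS: TMemoryStream;
  Chunk: Integer;
begin
  MS := TMemoryStream.Create;
  try
    for Chunk := 1 to 3 do
    begin
      MS.Clear; // empty the stream before reusing it
      readFileChunk('C:\videos\sample.avi', MS, Chunk, 1024 * 1024);
      MS.Position := 0;
      // ... hash or otherwise inspect MS here ...
    end;
  finally
    MS.Free;
  end;
end;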
UPDATE:
After some trials I found that the problem resides in your GetFileMD5 method.
I can't explain exactly why this happens, but when a TMemoryStream is passed in through the TStream parameter, the MD5 hashing code behaves as if it had been handed an empty stream.
When I changed the parameter type to TMemoryStream, the code worked, but then you could no longer pass a TFileStream to GetFileMD5, so that broke the hash generation from entire files that worked before.
SOLUTION:
After some more digging, I have great news for you.
You don't even need to use a TMemoryStream at all. The HashStreamAsHex function accepts two optional parameters that let you define the starting point and the size of the data block from which you want to generate the MD5 hash string. This also works with a TFileStream.
So, to generate the MD5 hash string from just a small part of your file, call:
MD5.HashStreamAsHex(Stream, StartPosition, DataSize);
StartPosition specifies the initial offset into the stream for the hashing operation. When StartPosition contains a positive non-zero value, the stream position is moved to the specified offset before calculating the hash value. When StartPosition contains the value -1, the current position of the stream is used as the initial offset.
DataSize indicates the number of bytes from the stream to include in the hashing operation. When DataSize contains a negative value (< 0), the bytes remaining from the current stream position are used; otherwise, the number of bytes in DataSize is used. If DataSize is larger than the size of the stream, the smaller of the two values is used.
In your case, to get the MD5 hash of the first megabyte, you would call:
MD5.HashStreamAsHex(Stream, 0, 1024 * 1024);
Now I believe you can modify the rest of your code to get this working the way you want. If not, tell me where you got stuck and I will help.
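Putting that together, a complete helper might look like the sketch below. GetPartialFileMD5 is a name I made up, and the clamping behavior relies on the HashStreamAsHex semantics described above:
uses
  SysUtils, Classes, IdHashMessageDigest;

function GetPartialFileMD5(const FileName: string): string;
var
  FS: TFileStream;
  MD5: TIdHashMessageDigest5;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    MD5 := TIdHashMessageDigest5.Create;
    try
      // Hash at most the first 1 MB; smaller files are hashed whole,
      // because DataSize is clamped to the stream size.
      Result := MD5.HashStreamAsHex(FS, 0, 1024 * 1024);
    finally
      MD5.Free;
    end;
  finally
    FS.Free;
  end;
end;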

I'm assuming that your code does not raise an exception; if it did, you surely would have mentioned it. I also assume that the file is large enough for your attempted read.
Your code does copy. If the call to CopyFrom does not raise an exception, then the memory stream contains the first 1048576 bytes of the file.
However, after the call to CopyFrom, the memory stream's position is at the end of the stream, so if you read from it you will not be able to read anything. Perhaps you need to move the stream pointer back to the beginning:
ms.Position := 0;
And then read from the memory stream.
1 MB = 1024*1024 = 1048576 bytes, FWIW.
Update
My assumptions above were probably incorrect. It seems likely that your code raises an exception because you attempt to read beyond the end of the file.
What you really seem to want is to read as much of the first part of the file as possible. That's a two-liner:
BytesToRead := Min(Source.Size-Source.Position, 1024*1024);
Dest.CopyFrom(Source, BytesToRead);
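Folded back into the original function, that might look like this sketch (Min lives in the Math unit; note that the caller becomes responsible for freeing the returned stream):
uses
  SysUtils, Classes, Math;

function splitFile(const FileName: string): TMemoryStream;
var
  fs: TFileStream;
  BytesToRead: Int64;
begin
  Result := TMemoryStream.Create;
  try
    fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      // Never ask CopyFrom for more bytes than the file actually holds.
      BytesToRead := Min(fs.Size, Int64(1024 * 1024));
      if BytesToRead > 0 then
        Result.CopyFrom(fs, BytesToRead);
    finally
      fs.Free;
    end;
    Result.Position := 0; // rewind so the caller reads from the start
  except
    Result.Free;
    raise;
  end;
end;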

Related

Delphi - Saving Records to File using Streams

Delphi Tokyo - I have a parameter file that I need to save to (and later load from) disk. The parameters are a series of record objects: one HEADER record and then multiple COMMAND records. These are true records (i.e. type = record). The HEADER record has String, Boolean, Integer, and TStringList fields. Saving appears to work fine, but when I load, whatever comes AFTER a TStringList causes a stream read error. For example...
type
  tEDP_PROJ = record
    Version: Integer;
    Name: String;
    ...
    ColList1: TStringList;
    ColList2: TStringList;
    ReadyToRun: Boolean;
    ...
  end;
When I read ReadyToRun I get a stream read error. If I move it BEFORE the TStringList fields (in both the SAVE and LOAD routines), then ReadyToRun loads properly, but whatever comes after the TStringList causes an error. It is interesting to note that ColList2 loads fine (even though it is NOT the first TStringList).
I am specifying the Encoding method when I save the TStringList.
...
ColList1.SaveToStream(SavingStream, TEncoding.Unicode);
ColList2.SaveToStream(SavingStream, TEncoding.Unicode);
I am using the same encoding when I load from the (file) Stream.
...
ColList1.LoadFromStream(SavingStream, TEncoding.Unicode);
ColList2.LoadFromStream(SavingStream, TEncoding.Unicode);
Note that when I create the StringList, I am just doing the standard create...
ColList1 := TStringList.Create;
When I save and load, I am following the examples Remy gave here...
The TStringList appears to be changing the way that the stream reads non-TStringList types... What do I need to do to fix this?
Why are you using TEncoding.Unicode? TEncoding.UTF8 would have made more sense.
In any case, this is not an encoding issue. What you are attempting will simply not work the way you are trying to do it, because TStrings data is variable-length and needs to be handled accordingly. TStrings does not save any kind of terminating delimiter or size information to an output stream, and when loading from a stream, TStrings.LoadFromStream() simply reads the ENTIRE stream (well, everything between the current Position and the end of the stream, anyway). That is why you are getting streaming errors when trying to read/write any non-TStrings data after TStrings data.
Just like the earlier code needed to serialize String data and other variable-length data into a flat format to know where one field ends and the next begins, you need to serialize the TStrings data as well.
One option is to save a TStrings object to an intermediate TMemoryStream first, then write that stream's Size to your output stream followed by the TMemoryStream's data. When loading back later, first read the Size, then read the specified number of bytes into an intermediate TMemoryStream, and then load that stream into your receiving TStrings object:
procedure WriteInt64ToStream(Stream: TStream; Value: Int64);
begin
  Stream.WriteBuffer(Value, SizeOf(Value));
end;

function ReadInt64FromStream(Stream: TStream): Int64;
begin
  Stream.ReadBuffer(Result, SizeOf(Result));
end;

procedure WriteStringsToStream(Stream: TStream; Values: TStrings);
var
  MS: TMemoryStream;
  Size: Int64;
begin
  MS := TMemoryStream.Create;
  try
    Values.SaveToStream(MS, TEncoding.UTF8);
    Size := MS.Size;
    WriteInt64ToStream(Stream, Size);
    if Size > 0 then
    begin
      MS.Position := 0;
      Stream.CopyFrom(MS, Size);
    end;
  finally
    MS.Free;
  end;
end;

procedure ReadStringsFromStream(Stream: TStream; Values: TStrings);
var
  MS: TMemoryStream;
  Size: Int64;
begin
  Size := ReadInt64FromStream(Stream);
  MS := TMemoryStream.Create;
  try
    if Size > 0 then
    begin
      MS.CopyFrom(Stream, Size);
      MS.Position := 0;
    end;
    Values.LoadFromStream(MS, TEncoding.UTF8);
  finally
    MS.Free;
  end;
end;
Another option is to write the number of string elements in the TStrings object to your output stream, followed by the individual strings, using the same kind of Integer and String helpers as the earlier serialization code (a sketch of such helpers appears after the record-streaming example below):
procedure WriteStringsToStream(Stream: TStream; Values: TStrings);
var
  Count, I: Integer;
begin
  Count := Values.Count;
  WriteIntegerToStream(Stream, Count);
  for I := 0 to Count - 1 do
    WriteStringToStream(Stream, Values[I]);
end;

procedure ReadStringsFromStream(Stream: TStream; Values: TStrings);
var
  Count, I: Integer;
begin
  Count := ReadIntegerFromStream(Stream);
  if Count > 0 then
  begin
    Values.BeginUpdate;
    try
      for I := 0 to Count - 1 do
        Values.Add(ReadStringFromStream(Stream));
    finally
      Values.EndUpdate;
    end;
  end;
end;
Either way, you can then do this when streaming your individual records:
// Saving:
WriteIntegerToStream(SavingStream, Version);
WriteStringToStream(SavingStream, Name);
...
WriteStringsToStream(SavingStream, ColList1);
WriteStringsToStream(SavingStream, ColList2);
WriteBooleanToStream(SavingStream, ReadyToRun);
// Loading:
Version := ReadIntegerFromStream(SavingStream);
Name := ReadStringFromStream(SavingStream);
...
ReadStringsFromStream(SavingStream, ColList1);
ReadStringsFromStream(SavingStream, ColList2);
ReadyToRun := ReadBooleanFromStream(SavingStream);
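The Integer/String/Boolean stream helpers used above are assumed to come from the earlier serialization code this answer refers to. As a sketch of what they might look like (length-prefixed UTF-8 strings; the exact format just has to match on both the save and load sides):
procedure WriteIntegerToStream(Stream: TStream; Value: Integer);
begin
  Stream.WriteBuffer(Value, SizeOf(Value));
end;

function ReadIntegerFromStream(Stream: TStream): Integer;
begin
  Stream.ReadBuffer(Result, SizeOf(Result));
end;

procedure WriteBooleanToStream(Stream: TStream; Value: Boolean);
begin
  Stream.WriteBuffer(Value, SizeOf(Value));
end;

function ReadBooleanFromStream(Stream: TStream): Boolean;
begin
  Stream.ReadBuffer(Result, SizeOf(Result));
end;

procedure WriteStringToStream(Stream: TStream; const Value: string);
var
  Bytes: TBytes;
begin
  // Length-prefixed UTF-8: write the byte count, then the raw bytes.
  Bytes := TEncoding.UTF8.GetBytes(Value);
  WriteIntegerToStream(Stream, Length(Bytes));
  if Length(Bytes) > 0 then
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
end;

function ReadStringFromStream(Stream: TStream): string;
var
  Bytes: TBytes;
  Len: Integer;
begin
  Len := ReadIntegerFromStream(Stream);
  SetLength(Bytes, Len);
  if Len > 0 then
    Stream.ReadBuffer(Bytes[0], Len);
  Result := TEncoding.UTF8.GetString(Bytes);
end;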

Saved data from TBlobField is corrupted for lengths >= 100KB

I'm modifying a program that is written in Delphi 6.0
I have a table in Oracle with a BLOB column named FILE_CONTENT.
I have already managed to upload an XML File that is about 100 KB. I have verified that the file content was correctly uploaded using SQL Developer.
The problem I have is when I try to download the file content from the DB back into a file. This is an example of the code I'm using to download it:
procedure TfrmDownload.Save();
var
  fileStream: TFileStream;
  bField: TBlobField;
begin
  dmDigital.qrGetData.Open;
  dmDigital.RequestLive := True;
  bField := TBlobField(dmDigital.qrGetData.FieldByName('FILE_CONTENT'));
  fileStream := TFileStream.Create('FILE.XML', fmCreate);
  bField.SaveToStream(fileStream);
  FlushFileBuffers(fileStream.Handle);
  fileStream.Free;
  dmDigital.qrGetData.Close;
end;
The previous code already downloads the file content to FILE.XML. I'm using RequestLive := True to be able to download a large BLOB (otherwise the file content is truncated to 32 KB max).
The resulting file is the same size as the original. However, when I compare the downloaded file with the original, there are some differences (for example, the last character is missing and other characters are changed), so it seems to be a problem while downloading the content.
Do you know what could be wrong?
The problem seems to be related to Delphi code because I already tried with C# and the file content is downloaded correctly.
Don't use TBlobField.SaveToStream() directly, use TDataSet.CreateBlobStream() instead (which is what TBlobField.SaveToStream() uses internally anyway):
procedure TfrmDownload.Save;
var
  fileStream: TFileStream;
  bField: TField;
  bStream: TStream;
begin
  dmDigital.qrGetData.Open;
  try
    dmDigital.RequestLive := True;
    bField := dmDigital.qrGetData.FieldByName('FILE_CONTENT');
    bStream := bField.DataSet.CreateBlobStream(bField, bmRead);
    try
      fileStream := TFileStream.Create('FILE.XML', fmCreate);
      try
        fileStream.CopyFrom(bStream, 0);
        FlushFileBuffers(fileStream.Handle);
      finally
        fileStream.Free;
      end;
    finally
      bStream.Free;
    end;
  finally
    dmDigital.qrGetData.Close;
  end;
end;
TDataSet.CreateBlobStream() allows the DataSet to decide the best way to access the BLOB data. If the returned TStream is not delivering the data correctly, then either the TStream class implementation that CreateBlobStream() uses is broken, or the underlying DB driver is buggy. Try taking CopyFrom() out of the equation so you can verify the data as it is being retrieved:
procedure TfrmDownload.Save;
const
  MaxBufSize = $F000;
var
  Buffer: array of Byte;
  N: Integer;
  fileStream: TFileStream;
  bField: TField;
  bStream: TStream;
begin
  dmDigital.qrGetData.Open;
  try
    dmDigital.RequestLive := True;
    bField := dmDigital.qrGetData.FieldByName('FILE_CONTENT');
    bStream := bField.DataSet.CreateBlobStream(bField, bmRead);
    try
      fileStream := TFileStream.Create('FILE.XML', fmCreate);
      try
        //fileStream.CopyFrom(bStream, 0);
        SetLength(Buffer, MaxBufSize);
        repeat
          N := bStream.Read(PByte(Buffer)^, MaxBufSize);
          if N < 1 then Break;
          // verify data here...
          fileStream.WriteBuffer(PByte(Buffer)^, N);
        until False;
        FlushFileBuffers(fileStream.Handle);
      finally
        fileStream.Free;
      end;
    finally
      bStream.Free;
    end;
  finally
    dmDigital.qrGetData.Close;
  end;
end;

How to read first and last 64kb of a video file in Delphi?

I want to use a subtitle API. It requires an MD5 hash of the first and last 64 KB of the video file. I know how to do the MD5 part; I just want to know how to get those 128 KB of data.
Here is the solution to the problem in Java which I am unable to implement in Delphi. How to read first and last 64kb of a video file in Java?
My Delphi code so far:
function TSubdbApi.GetHashFromFile(const AFilename: string): string;
var
  Md5: TIdHashMessageDigest5;
  Filestream: TFileStream;
  Buffer: TByteArray;
begin
  Md5 := TIdHashMessageDigest5.Create;
  Filestream := TFileStream.Create(AFilename, fmOpenRead, fmShareDenyWrite);
  try
    if Filestream.Size > 0 then
    begin
      Filestream.Read(Buffer, 1024 * 64);
      Filestream.Seek(64, soFromEnd);
      Filestream.Read(Buffer, 1024 * 64);
      Result := Md5.HashStreamAsHex(Filestream);
    end;
  finally
    Md5.Free;
    Filestream.Free;
  end;
end;
I am not getting the correct MD5 hash as stated by the official API. API url here. I am using Delphi XE8.
The hash function used by that API is described as:
Our hash is composed by taking the first and the last 64kb of the
video file, putting all together and generating a md5 of the resulting
data (128kb).
I can see a few problems in your code. You are hashing the file stream, not your Buffer array; worse, you were overwriting that array with each subsequent read from the file stream. And you were trying to seek only 64 bytes, and beyond the end of the stream at that (you need a negative value to seek back from the end of the stream). Try something like this instead:
type
  ESubDBException = class(Exception);

function TSubdbApi.GetHashFromFile(const AFileName: string): string;
const
  KiloByte = 1024;
  DataSize = 64 * KiloByte;
var
  Digest: TIdHashMessageDigest5;
  FileStream: TFileStream;
  HashStream: TMemoryStream;
begin
  // Open read-only, denying writers (share mode belongs in the Mode parameter)
  FileStream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    if FileStream.Size < DataSize then
      raise ESubDBException.Create('File is smaller than the minimum required for ' +
        'calculating API hash.');
    HashStream := TMemoryStream.Create;
    try
      // First 64 KB...
      HashStream.CopyFrom(FileStream, DataSize);
      // ...followed by the last 64 KB (note the negative offset from the end)
      FileStream.Seek(-DataSize, soEnd);
      HashStream.CopyFrom(FileStream, DataSize);
      Digest := TIdHashMessageDigest5.Create;
      try
        HashStream.Position := 0;
        Result := Digest.HashStreamAsHex(HashStream);
      finally
        Digest.Free;
      end;
    finally
      HashStream.Free;
    end;
  finally
    FileStream.Free;
  end;
end;
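Note that the order matters here: the two 64 KB blocks are concatenated into a single 128 KB buffer and hashed together, per the API description; hashing each block separately and joining the hex strings would give a different result. A hypothetical call site (the instance name and path are invented for illustration):
Hash := SubdbApi.GetHashFromFile('C:\videos\movie.mkv');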

In Delphi, use SaveToStream to save ClientDataSets plus other material to a file?

I would like to use SaveToStream to save a ClientDataSet ALONG WITH OTHER MATERIAL. Here is a short sample:
filename := ChangeFileExt(Application.ExeName, '.dat');
FS := TFileStream.Create(filename, fmCreate);
CDS.SaveToStream(FS);
ShowMessage('After save, position is ' + IntToStr(FS.Position));
{now write a longint}
L := 1234;
siz := SizeOf(L);
FS.Write(L, siz);
FS.Free;
But when I load this back in using LoadFromStream and again display the position after the ClientDataSet has been loaded, I see that the position is now 4 bytes PAST where the ClientDataSet was originally saved. It seems that CDS.LoadFromStream just plows ahead and consumes whatever follows it. As a result, when I then try to read the longint, I get an end-of-file error.
It is not sufficient to just use the CDS.SaveToStream at the end of creating a file, because what I'd really like to do is to save TWO clientdatasets to the file, one after the other, plus other material.
Ideas? Thanks.
[NB: this solution essentially doubles up the work that ReadDataPacket/WriteDataPacket (TLama's suggestion) already do internally. I would use TLama's approach, i.e. sub-class TClientDataSet to expose those protected methods, and use the WriteSize parameter.]
Save the datasets to a temporary stream and then copy that to your destination stream with size information:
procedure InternalSaveToStream(AStream: TStream);
var
  ATempStream: TMemoryStream;
  ASize: Int64;
begin
  ATempStream := TMemoryStream.Create;
  try
    // Save first dataset:
    DataSet1.SaveToStream(ATempStream, dfBinary);
    ASize := ATempStream.Size;
    AStream.WriteData(ASize);
    ATempStream.Position := 0;
    AStream.CopyFrom(ATempStream, ASize);
    ATempStream.Clear;
    // Save second dataset:
    DataSet2.SaveToStream(ATempStream, dfBinary);
    ASize := ATempStream.Size;
    AStream.WriteData(ASize);
    ATempStream.Position := 0;
    AStream.CopyFrom(ATempStream, ASize);
  finally
    ATempStream.Free;
  end;
end;
To read back, first read the size and then copy that section of your source to a temporary stream again and load your dataset from that:
procedure InternalLoadFromStream(AStream: TStream);
var
  ATempStream: TMemoryStream;
  ASize: Int64;
begin
  ATempStream := TMemoryStream.Create;
  try
    // Load first dataset: read the stored size, then copy exactly that
    // many bytes for the dataset to load.
    AStream.Read(ASize, SizeOf(ASize));
    ATempStream.CopyFrom(AStream, ASize);
    ATempStream.Position := 0;
    DataSet1.LoadFromStream(ATempStream);
    //...etc.
  finally
    ATempStream.Free;
  end;
end;

Delphi: Alternative to using Reset/ReadLn for text file reading

I want to process a text file line by line. In the olden days I loaded the file into a TStringList:
slFile := TStringList.Create;
slFile.LoadFromFile(filename);
for i := 0 to slFile.Count - 1 do
begin
  oneLine := slFile.Strings[i];
  // process the line
end;
The problem with that is that once the file gets to be a few hundred megabytes, I have to allocate a huge chunk of memory, when really I only need enough memory to hold one line at a time. (Plus, you can't really indicate progress while the system is locked up loading the file in step 1.)
Then I tried using the native, and recommended, file I/O routines provided by Delphi:
var
  f: TextFile;
begin
  AssignFile(f, filename);
  Reset(f);
  while not Eof(f) do
  begin
    ReadLn(f, oneLine);
    // process the line
  end;
The problem with AssignFile/Reset is that there is no option to read the file without locking (i.e. fmShareDenyNone). The earlier TStringList example doesn't support no-lock either, unless you change it to LoadFromStream:
slFile := TStringList.Create;
stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone);
slFile.LoadFromStream(stream);
stream.Free;
for i := 0 to slFile.Count - 1 do
begin
  oneLine := slFile.Strings[i];
  // process the line
end;
So now, even though no locks are held, I'm back to loading the entire file into memory.
Is there an alternative to AssignFile/ReadLn where I can read a file line by line without taking a sharing lock?
I'd rather not go directly to Win32 CreateFile/ReadFile and have to deal with allocating buffers and detecting CR, LF, and CRLF.
I thought about memory-mapped files, but there's the difficulty of the entire file not fitting (mapping) into virtual memory, and having to map views (pieces) of the file at a time. It starts to get ugly.
I just want Reset with fmShareDenyNone!
With recent Delphi versions, you can use TStreamReader. Construct it with your file stream, and then call its ReadLine method (inherited from TTextReader); a sketch follows.
An option for all Delphi versions is to use Peter Below's StreamIO unit, which gives you AssignStream. It works just like AssignFile, but for streams instead of file names. Once you've used that function to associate a stream with a TextFile variable, you can call ReadLn and the other I/O functions on it just like any other file.
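A minimal sketch of the TStreamReader route (Delphi 2009 or later; the procedure name and encoding choice are mine):
uses
  SysUtils, Classes;

procedure ProcessFileByLine(const FileName: string);
var
  FS: TFileStream;
  Reader: TStreamReader;
  Line: string;
begin
  // fmShareDenyNone: no sharing lock, which is what the question asks for
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    Reader := TStreamReader.Create(FS, TEncoding.Default);
    try
      while not Reader.EndOfStream do
      begin
        Line := Reader.ReadLine;
        // process Line here
      end;
    finally
      Reader.Free;
    end;
  finally
    FS.Free;
  end;
end;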
You can use this sample code:
type
  TTextStream = class(TObject)
  private
    FHost: TStream;
    FOffset, FSize: Integer;
    FBuffer: array[0..1023] of Char;
    FEOF: Boolean;
    function FillBuffer: Boolean;
  protected
    property Host: TStream read FHost;
  public
    constructor Create(AHost: TStream);
    destructor Destroy; override;
    function ReadLn: string; overload;
    function ReadLn(out Data: string): Boolean; overload;
    property EOF: Boolean read FEOF;
    property HostStream: TStream read FHost;
    property Offset: Integer read FOffset write FOffset;
  end;
{ TTextStream }

constructor TTextStream.Create(AHost: TStream);
begin
  FHost := AHost;
  FillBuffer;
end;

destructor TTextStream.Destroy;
begin
  FHost.Free;
  inherited Destroy;
end;

function TTextStream.FillBuffer: Boolean;
begin
  FOffset := 0;
  FSize := FHost.Read(FBuffer, SizeOf(FBuffer));
  Result := FSize > 0;
  FEOF := not Result; // EOF when the host stream has no more data
end;

function TTextStream.ReadLn(out Data: string): Boolean;
var
  Len, Start: Integer;
  EOLChar: Char;
begin
  Data := '';
  Result := False;
  repeat
    if FOffset >= FSize then
      if not FillBuffer then
        Exit; // no more data to read from stream -> exit
    Result := True;
    Start := FOffset;
    while (FOffset < FSize) and (not (FBuffer[FOffset] in [#13, #10])) do
      Inc(FOffset);
    Len := FOffset - Start;
    if Len > 0 then
    begin
      // Append to Data: it may already hold text from a line that
      // spans a buffer boundary.
      SetLength(Data, Length(Data) + Len);
      Move(FBuffer[Start], Data[Succ(Length(Data) - Len)], Len);
    end;
  until FOffset <> FSize; // EOL char found
  EOLChar := FBuffer[FOffset];
  Inc(FOffset);
  if FOffset = FSize then
    if not FillBuffer then
      Exit;
  // skip the second half of a CRLF / LFCR pair
  if FBuffer[FOffset] in ([#13, #10] - [EOLChar]) then
  begin
    Inc(FOffset);
    if FOffset = FSize then
      FillBuffer;
  end;
end;

function TTextStream.ReadLn: string;
begin
  ReadLn(Result);
end;
Usage:
procedure ReadFileByLine(Filename: string);
var
  sLine: string;
  tsFile: TTextStream;
begin
  // fmShareDenyNone so no sharing lock is taken, as the question requires
  tsFile := TTextStream.Create(TFileStream.Create(Filename, fmOpenRead or fmShareDenyNone));
  try
    while tsFile.ReadLn(sLine) do
    begin
      // sLine is your line
    end;
  finally
    tsFile.Free;
  end;
end;
If you need support for ansi and Unicode in older Delphis, you can use my GpTextFile or GpTextStream.
It seems the FileMode variable is not honored for text files, but my tests showed that opening the file for reading multiple times is not a problem. You didn't mention it in your question, but if you are not going to write to the text file while it is being read, you should be fine.
What I do is use a TFileStream, but I buffer the input into fairly large blocks (e.g. a few megabytes each) and read and process one block at a time. That way I don't have to load the whole file at once.
It works quite quickly that way, even for large files.
I do have a progress indicator. As I load each block, I increment it by the fraction of the file that has additionally been loaded.
Reading one line at a time, without something doing your buffering for you, is simply too slow for large files. A sketch of the block approach follows.
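A minimal sketch of that block-buffered loop (Delphi 2009 or later for TBytes/TProc; the ProcessBlock callback and block size are mine, and splitting blocks into lines, including carrying a partial line over to the next block, is left to the callback):
uses
  SysUtils, Classes;

procedure ProcessFileInBlocks(const FileName: string;
  const ProcessBlock: TProc<TBytes>);
const
  BlockSize = 4 * 1024 * 1024; // a few megabytes per read
var
  FS: TFileStream;
  Buffer: TBytes;
  BytesRead: Integer;
  Done: Int64;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Buffer, BlockSize);
    Done := 0;
    repeat
      BytesRead := FS.Read(Buffer[0], BlockSize);
      if BytesRead <= 0 then
        Break;
      ProcessBlock(Copy(Buffer, 0, BytesRead));
      Inc(Done, BytesRead);
      // progress: Done / FS.Size is the fraction of the file handled so far
    until False;
  finally
    FS.Free;
  end;
end;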
I had the same problem a few years ago, especially the problem of locking the file. What I did was use the low-level ReadFile from the Windows API. I know the question is old, since my answer comes two years later, but perhaps my contribution can help someone in the future.
const
  BUFF_SIZE = $8000;
var
  dwread: LongWord;
  hFile: THandle;
  datafile: array [0..BUFF_SIZE - 1] of char;
  apos: Integer;
  myEOF: Boolean;

hFile := CreateFile(PChar(filename), GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, 0);
SetFilePointer(hFile, 0, nil, FILE_BEGIN);
myEOF := false;
try
  ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
  while (dwread > 0) and (not myEOF) do
  begin
    if dwread = BUFF_SIZE then
    begin
      // Step back to just after the last line break, so a line that was
      // split across buffers is re-read in full on the next pass.
      apos := LastDelimiter(#10#13, datafile);
      if apos = BUFF_SIZE then Inc(apos);
      SetFilePointer(hFile, apos - BUFF_SIZE, nil, FILE_CURRENT);
    end
    else
      myEOF := true;
    ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
  end;
finally
  CloseHandle(hFile);
end;
For me the speed improvement appeared to be significant.
Why not simply read the lines of the file directly from the TFileStream itself, one at a time?
i.e. (in pseudocode):
readline:
  while NOT EOF and (readchar <> EOL) do
    appendchar to result

while NOT EOF do
begin
  s := readline
  process s
end;
One problem you may find with this is that, IIRC, TFileStream is not buffered, so performance over a large file is going to be sub-optimal. However, there are a number of solutions to the problem of unbuffered streams, including this one, that you may wish to investigate if this approach solves your initial problem.
