Get a decompression stream for a file in a Zip file - delphi

Background: I'm processing some log files that are zipped up (I'm inserting the log details into a database). The log files are many gigabytes in size, and it would be nice to process a file without extracting it to disk (the zip files are only a few hundred megabytes). I have this working for an extracted log file (using a TStreamReader).
What I want to do is get some sort of decompression stream for one of the files in the zip file, and use that stream to process the log file without having to extract the entire file to disk. The decompression stream needs to support files larger than 4GB.
TZipFile in System.Zip looks like it has support for this, but I've not been able to get it to work, probably because it only supports 32-bit file sizes and I'm dealing with files bigger than this. I also have Abbrevia installed, but can't see anything that would allow me to do this.
I'm using Delphi XE7 for this project. Note that I don't want to extract a zip file to a stream (the files are many GB and there won't be enough physical memory), but to get the actual decompression stream.

I went with the standard TZipFile in System.Zip and check whether the internal file is bigger than 4GB. Roughly:
// uses System.Classes, System.Zip, Vcl.Dialogs, Winapi.Windows (for MAXDWORD)
var
  Zip: TZipFile;
  zipStream: TStream;
  LocalHeader: TZipHeader;
  StreamSize: Int64;
  FileName: string;
begin
  //...
  Zip := TZipFile.Create;
  try
    Zip.Open(FileName, TZipMode.zmRead);
    // Read creates zipStream; the caller has to free it
    Zip.Read('somefile.xml', zipStream, LocalHeader);
    try
      StreamSize := LocalHeader.UncompressedSize;
      // the 32-bit size field holds $FFFFFFFF when the real size does not fit in it
      if StreamSize = MAXDWORD then
        ShowMessage('File is too large, only the first 4GB will be processed'#10'To process entire file you must extract .xml file manually');
      //process zipStream...
    finally
      zipStream.Free;
    end;
  finally
    Zip.Free;
  end;
end;
Not perfect, but still useful.
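For anyone wanting to see the "process zipStream" step, here is a minimal sketch of feeding the stream returned by TZipFile.Read into a TStreamReader, as the question describes. The function name CountLinesInEntry and the file names are made up for illustration, and counting lines stands in for the real per-line work. It has not been tested against entries larger than 4GB, so the limitation above still applies.

```pascal
program ReadZippedLog;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Zip;

// Hypothetical helper: reads one zip entry line by line without
// extracting it to disk, and returns the number of lines seen.
function CountLinesInEntry(const ZipName, EntryName: string): Int64;
var
  Zip: TZipFile;
  ZipStream: TStream;
  LocalHeader: TZipHeader;
  Reader: TStreamReader;
begin
  Result := 0;
  Zip := TZipFile.Create;
  try
    Zip.Open(ZipName, zmRead);
    // Read creates the stream (out parameter); the caller must free it.
    Zip.Read(EntryName, ZipStream, LocalHeader);
    try
      // TStreamReader does not take ownership of the stream by default.
      Reader := TStreamReader.Create(ZipStream);
      try
        while not Reader.EndOfStream do
        begin
          Reader.ReadLine; // replace with the real per-line processing
          Inc(Result);
        end;
      finally
        Reader.Free;
      end;
    finally
      ZipStream.Free;
    end;
  finally
    Zip.Free;
  end;
end;

begin
  // 'logs.zip' and 'somefile.xml' are placeholder names.
  Writeln(CountLinesInEntry('logs.zip', 'somefile.xml'));
end.
```

The reader only ever asks the stream for the next chunk of bytes, so memory use stays small regardless of how large the entry is.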

Related

Delphi TZipFile extracts zero byte files

Using Delphi XE2 and the native TZipFile I attempt to extract the contents of a downloaded zip file (which contains 2 zipped XML files) and it always extracts zero byte files.
The file is being compressed by C# code like this:
var zipFile = new ZipFile();
foreach (Tuple<string, string> t in filesMeta) {
    zipFile.AddFile(string.Format("{0}{1}", StaticVariables.WebServerFileStorage, t.Item2), "").FileName = t.Item1 + ".xml";
}
response.Clear();
response.ContentType = "application/zip";
zipFile.Save(response.OutputStream);
response.End();
The Delphi extraction code is this:
zipFile := TZipFile.Create;
try
  filename := 'C:\test\57f52480-ec87-4169-a820-0a65bc4ad952.zip';
  if zipFile.IsValid(filename) then begin
    zipFile.Open(filename, zmRead);
    zipFile.ExtractAll('C:\test\');
  end;
finally
  zipFile.Free;
end;
I even tried using a TStream as the source instead of a file on disk. That's actually what I want to do since the zip file is downloaded from a web server into a TStream. I tried to extract the data using the overloaded open method of TZipFile to open the stream.
That got me zero byte files so I saved the zip file to disk and tried to open the file from disk and extract. Same zero byte files are extracted.
I even tried using the class method to extract the files from the zip file on disk:
System.Zip.TZipFile.ExtractZipFile(filename, 'C:\Test\');
Same zero byte files extracted.
The zip file is valid and the 2 zipped XML files can be extracted properly by both Windows 7 native file handling and 7-Zip.
Now here is something nutty...
In desperation I tried to see what the ExtractToFile() procedure David Heffernan came up with in this question about extracting a zip to a stream would do, so I tried using it like this:
var x : integer;
var fileCount : integer;
var fileNames : TArray<string>;
begin
  zipFile := TZipFile.Create;
  try
    filename := 'C:\test\57f52480-ec87-4169-a820-0a65bc4ad952.zip';
    if zipFile.IsValid(filename) then begin
      zipFile.Open(filename, zmRead);
      fileCount := zipFile.FileCount;
      fileNames := copy(zipFile.FileNames, 0, MaxInt);
      zipFile.Close;
      for x := 0 to fileCount-1 do begin
        // Use David Heffernan's stream procedure
        ExtractToFile(filename, x, 'C:\test\' + fileNames[x]);
      end;
    end;
  finally
    zipFile.Free;
  end;
end;
And David's procedure extracts the files to disk as expected! WTF???
I am really confused why a convoluted method of extraction would work and the simple extraction method would not work. I'll use David's example if I have to but I'd prefer to get the normal extract working if possible.
Any ideas are appreciated.
Cheers!
TJ
I had this same problem.
The source code of TZipFile shows that the TStream handed back by the Read function is the entire zip file, with its position set to the start of the file you asked for. So don't rewind it. Just CopyFrom it, or do whatever you want with the TStream, for the uncompressed length given in the TZipHeader.
// ZipStream is an out parameter: Read creates it, so do not create one first
ZipFile.Read(MyFileName, ZipStream, ZipHeader);
try
  //leave ZipStream pointer where it is!!!
  SomethingElse.LoadFromStream(ZipStream, ZipHeader.UncompressedSize);
finally
  ZipStream.Free;
end;
In my opinion, TZipFile should really load the ZipStream with only what is requested. The way this is implemented is not intuitive without first going through the TZipFile source code.
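As a rough illustration of the "just CopyFrom" advice, the sketch below copies one entry to a file for exactly the uncompressed length from the header, without touching the stream position first. ExtractEntryToFile and the file names are hypothetical. The guard against a zero length matters: TStream.CopyFrom with a count of 0 rewinds the source to position 0 and copies the whole stream, which is the rewind this answer warns against.

```pascal
program CopyZipEntry;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Zip;

// Hypothetical helper: writes one zip entry to DestName.
procedure ExtractEntryToFile(const ZipName, EntryName, DestName: string);
var
  Zip: TZipFile;
  ZipStream: TStream;
  Header: TZipHeader;
  Dest: TFileStream;
begin
  Zip := TZipFile.Create;
  try
    Zip.Open(ZipName, zmRead);
    // Read creates ZipStream; it is already positioned at the entry's data.
    Zip.Read(EntryName, ZipStream, Header);
    try
      Dest := TFileStream.Create(DestName, fmCreate);
      try
        // Copy from the current position only. A count of 0 would make
        // CopyFrom rewind the source and copy everything, so skip that case.
        if Header.UncompressedSize > 0 then
          Dest.CopyFrom(ZipStream, Header.UncompressedSize);
      finally
        Dest.Free;
      end;
    finally
      ZipStream.Free;
    end;
  finally
    Zip.Free;
  end;
end;

begin
  // Placeholder names for illustration only.
  ExtractEntryToFile('archive.zip', 'first.xml', 'first.xml');
end.
```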
TL;DR: The solution to my problem was an external component: ZipForge (or Abbrevia).
Read on for details.
Nothing I tried except for the roundabout way of saving the file and re-opening it using David's function worked. While that would have worked, it was not optimal as it required me to save the downloaded file to disk first and reopen it for extract and then delete the zip file. My goal was to just open the downloaded stream and extract the files directly to disk. One write to disk and no temporary file.
We even tried two different C# libraries to zip the files and both gave the same results on the streamed data. The Delphi TZipFile component could not handle it.
It turns out we have a license for ZipForge, which I had forgotten about since I had not used it in ages. It handles the download stream from the C# web server and extracts the files successfully.
For reference, I also tried the Abbrevia component version 5.2 and that also successfully extracted the files from the stream.
Hopefully this will help someone else.
All the suggestions by David and Uwe were appreciated.
Cheers!
TJ

How To check if zipfile already has been downloaded

I have made a Delphi application which downloads a zip file (update.zip) at a regular interval. The zip file contains DLLs and EXEs.
The zip file is unzipped and the DLLs and EXEs are copied to the correct folder.
What I want to know is how I can tell whether the zip file has already been downloaded by the client, so it doesn't have to download it again, because it has already been processed by the client. But when the contents of the zip file have changed, it must download the zip file again.
The contents of the zip file can change if we build a new DLL or EXE, but the name of the zip file stays the same.
If you want to know whether the zip file has changed without downloading the zip file, then your server will have to provide some other way for you to discover what versions of DLLs and EXEs are on the server. That could be as simple as keeping a text file on the server. Download that file instead of the whole zip file. If the versions in that text file are newer than the versions you have locally, then download the zip file.
You can also avoid processing the zip file by deleting it after you've processed it the first time. Instead of comparing versions in the text file with versions of files in the local zip file, you can compare the text-file versions with the versions of the actual files on disk.
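A minimal sketch of the text-file idea, assuming the server publishes a small manifest (for example a version string) next to the zip file. The URL, the file names and the function name UpdateAvailable are invented for illustration. It only compares the manifest text with the copy saved after the last successful update, rather than parsing individual version numbers.

```pascal
program CheckUpdate;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils, IdHTTP;

// Hypothetical helper: True when the server's manifest text differs
// from the copy saved locally after the last update.
function UpdateAvailable(const ManifestUrl, LocalManifest: string): Boolean;
var
  HTTP: TIdHTTP;
  RemoteText, LocalText: string;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // Downloads only the small manifest, not the zip file.
    RemoteText := Trim(HTTP.Get(ManifestUrl));
  finally
    HTTP.Free;
  end;

  if TFile.Exists(LocalManifest) then
    LocalText := Trim(TFile.ReadAllText(LocalManifest))
  else
    LocalText := ''; // nothing processed yet, so any manifest counts as new

  Result := RemoteText <> LocalText;
end;

begin
  // Placeholder URL and file name.
  if UpdateAvailable('http://example.com/update.txt', 'update.txt') then
    Writeln('Download update.zip')
  else
    Writeln('Already up to date');
end.
```

After the zip file has been downloaded and processed, the client would save the new manifest text to the local file so the next check compares against it.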
You could do an HTTP HEAD request and compare the file's last-modified date on the server with the local one; if it has changed, download the new file.
uses
  ......, IdHTTP;

function getHTTPLastModified(url: string): TDateTime;
var
  HTTP: TIdHTTP;
begin
  Result := 0; // 0 means the date could not be determined
  HTTP := TIdHTTP.Create(nil);
  try
    try
      HTTP.Head(url);
      Result := HTTP.Response.LastModified;
    except
      on E: Exception do
        //ShowMessage('ProcessHttpRequest failed.');
        Result := 0;
    end;
  finally
    HTTP.Free; // release the component so it is not leaked on every call
  end;
end;

Extract zip file to TStream using zlibar in Lazarus

I'm trying to extract a zip file from a TMemoryStream to another TMemoryStream using zlibar in Lazarus. From what I can tell, my code follows the examples found here. I am using a simple zip archive with one text file in it. The zip archive was created using PowerArchiver, nothing special. Here is my code:
uses
zlibar;
var
z, Dest: TMemoryStream;
unZip: TZLibReadArchive;
begin
z := TMemoryStream.Create;
z.LoadFromFile('kov.zip');
unZip := TZLibReadArchive.Create(z);
UnZip.ExtractFileToStream(0, Dest);
I am getting this error: "ZLibError(2) corrupt file or not a correct file type."
See zlibar.pas here: https://dl.dropbox.com/u/8899944/files/zlibar.pas
Any ideas why I am getting this error? Thanks.
The Zlibar library does not read zip files. It reads and writes a custom archive format. You can tell because the table-of-contents format described in zlibar.pas is completely different from the one used in zip files.
The FreePascalArchivePackage link looks like it might someday provide what you want, although the page last had significant changes in 2007.
There's also the ZipFile package, which appears to come with Lazarus.
Just a quick guess: Try to set z.Position := 0 before unZip := TZLibReadArchive.Create(z);.

Delphi XE2 DataSnap - Streaming JPEG Files via TStream From Server To Client

I've written a DataSnap server method that returns a TStream object to transfer a file. The client application calls the method and reads the stream to download the file. The server method is very simple :
function TServerMethods.DownloadFile(sFilePath: string): TStream;
var
  strFileStream: TFileStream;
begin
  strFileStream := TFileStream.Create(sFilePath, fmOpenRead);
  Result := strFileStream;
end;
It works fine downloading many file types (PDF, GIF, BMP, ZIP, EXE) but it doesn't work when downloading JPG files. On the client side the stream object returned from the method call is always 0 in size with JPGs. I can successfully stream JPG files locally on my PC, so it must be something to do with DataSnap. I've done some research which suggests DataSnap converts the stream to JSON behind the scenes and there could be a problem with this when it comes to JPG files - can anybody confirm this? On the client side I'm using the TDSRESTConnection to call the server method. I realise I could ZIP the JPG files before streaming, but would rather not have to do this.
Thought I'd update the thread on my attempts to resolve this. I never found a way to transfer a JPEG file over DataSnap using TStream, but have done it by converting the stream to a TJSONArray and passing this back instead. So my server method now looks as follows:
function TServerMethods.DownloadJPEGFile(sFilePath: string): TJSONArray;
var
  strFileStream: TFileStream;
begin
  strFileStream := TFileStream.Create(sFilePath, fmOpenRead);
  try
    Result := TDBXJSONTools.StreamToJSON(strFileStream, 0, strFileStream.Size);
  finally
    // the bytes have been copied into the JSON array, so the file can be closed
    strFileStream.Free;
  end;
end;
then at the client end I convert back to a TStream with:
strFileStream := TDBXJSONTools.JSONToStream(JSONArray);
I have created this as a new server method call purely for downloading JPEGs, as I've found transferring the files using TJSONArray instead of TStream is as much as 4 times slower, so I use my original method for all other file types.
Just as an update - after further research I've found this is related to the system locale in use on the PC. I'm using 'English (United Kingdom)' but if I change this to for example 'Japan (Japanese)' then the errors disappear and the file transfer works fine. I've logged this as a QC report with Embarcadero.
Embarcadero have now come back with a fix to this problem (which also affects .DOC files) :
1. Copy '...\RAD Studio\9.0\source\data\datasnap\Datasnap.DSClientRest.pas' to your DataSnap Client project folder
2. Add the .pas file to the project
3. Modify Line#1288 as below
// LResponseJSON := TJSONObject.ParseJSONValue(BytesOf(LResponseText.StringValue), 0);
LResponseJSON := TJSONObject.ParseJSONValue(BytesOf(UTF8String(LResponseText.StringValue)), 0);
4. Rebuild DataSnap REST Client project
5. Run it with REST Server
This fixes the problem.
Add this line to your DownloadFile method:
GetInvocationMetadata.ResponseContentType := 'image/jpeg';

Delphi Determine filesize in real time

Is it possible in Delphi to determine the size of a file as it is being copied? I get a notification when a file is first copied to a folder, but need to wait until the copy is complete before I can process the file.
I've used JclFileUtils.GetSizeOfFile(Filename) but that gives me the 'expected' file size, not the current filesize.
Regards, Pieter
Prompted by the first answer I decided to give up on trying to determine when a file copy has completed. Instead I found that using TFileStream gave me a reliable indication whether a file is in use or not.
function IsFileInUse(Filename: string; var ResultMessage: string): boolean;
var
  Stream: TFileStream;
begin
  Result := True;
  ResultMessage := '';
  try
    Stream := TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite);
    try
      Result := False;
    finally
      FreeAndNil(Stream);
    end;
  except
    on E: Exception do
      ResultMessage := 'IsFileInUse: ' + E.Message
  end;
end;
In this way I can keep on checking until the file is not in use anymore before attempting to process it.
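The "keep on checking" loop could look something like the sketch below. It is self-contained, so it repeats the same open test under a different, made-up name (CanOpenForReading), and adds a hypothetical WaitUntilReadable with a timeout so the caller does not wait forever. The 250 ms polling interval and the 30 second timeout are arbitrary choices.

```pascal
program WaitForFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

// Same test as above: the open fails while another process
// still has the file open for writing.
function CanOpenForReading(const FileName: string): Boolean;
var
  Stream: TFileStream;
begin
  try
    Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    Stream.Free;
    Result := True;
  except
    on EFOpenError do
      Result := False;
  end;
end;

// Hypothetical helper: polls until the file can be opened or the
// timeout expires. Returns True if the file became available.
function WaitUntilReadable(const FileName: string; TimeoutMs: Int64): Boolean;
var
  Watch: TStopwatch;
begin
  Watch := TStopwatch.StartNew;
  repeat
    Result := CanOpenForReading(FileName);
    if Result then
      Exit;
    Sleep(250); // arbitrary polling interval
  until Watch.ElapsedMilliseconds >= TimeoutMs;
end;

begin
  // The file name comes from the command line; 30 seconds is arbitrary.
  if WaitUntilReadable(ParamStr(1), 30000) then
    Writeln('File is ready')
  else
    Writeln('Timed out waiting for file');
end.
```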
It depends on the technique that is used by the copying function. Most copy-methods will allocate the disk space first before they start to copy a file. Thus, if you want to copy a file of 4 GB, the system starts by creating a file with random data for 4 GB in total. (Which is done lightning-fast, btw.) It then copies the data itself, but the file size is already what you expect.
The advantage of this is that the system can check whether there's enough disk space available to actually copy the data.
If you write your own file copy function then you can have total control over how it does this. Else, you're limited to whatever the chosen copy-method offers you. So, how do you copy a file?
If you have control over the file copy process, it is easiest to have the copy routine create the file using a temporary filename, and when done, rename it to correct filename.
That way, you can use Windows folder monitoring to watch for the renaming (JCL contains a component to help with this, not sure about the name from here). When your code gets triggered you are sure the other side has finished writing the file.
A simple trick I used was to have the copying process create new files with a '$$$' extension. My code still got triggered for those but I ignored them until they were renamed to their proper filename.
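On the copying side, the temporary-name trick might be sketched as follows, assuming you control the code that does the copy. CopyThenRename and the file names are hypothetical. The data is written under a '.$$$' name first, and the file only gets its real name once the copy has finished.

```pascal
program CopyWithTempName;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper: copies SourceName to DestName via a temporary
// name, so a folder watcher never sees a half-written DestName.
procedure CopyThenRename(const SourceName, DestName: string);
var
  TempName: string;
begin
  TempName := DestName + '.$$$';
  // Overwrite a leftover temp file from an earlier failed attempt.
  TFile.Copy(SourceName, TempName, True);
  // RenameFile returns False on failure, e.g. if DestName already exists.
  if not RenameFile(TempName, DestName) then
    raise Exception.CreateFmt('Could not rename %s to %s', [TempName, DestName]);
end;

begin
  // Placeholder paths for illustration only.
  CopyThenRename('C:\outgoing\report.log', 'C:\incoming\report.log');
end.
```

The watching process then ignores anything ending in '.$$$' and only reacts to the rename.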
Hope this helps.
