Calculate the checksum for a file before it has been downloaded? - checksum

Is it possible to calculate a file's checksum without possessing the file?
Background
I'm interested in creating some software that would be used to download external files. I must be careful, because the files can be altered by the file owners.
I would like to keep a list of checksum values inside the software, to allow the software to validate that the external file is what it claims to be.
I believe this is easily possible once the external file is stored locally (i.e. after it has been downloaded), but I would ideally like to calculate the checksum for the file before downloading. Is this possible?
Essentially I'm trying to get a file's checksum without actually possessing the file. I think that sounds impossible, but I'm new to checksums and may be missing obvious techniques.

I'm not an expert, but here's an idea.
Make your file owners' clients (or the server that receives the files) upload the checksums (and any other metadata you need) of the files separately (as a different file, or a database entry). Then your software can download the checksum and verify it before downloading the bigger file.
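To make that idea concrete, here is a minimal Python sketch (Python is chosen only for illustration). A checksum can only be computed from the bytes themselves, so the published value is what you compare against your trusted list before downloading, and the local digest is what you check afterwards. SHA-256 and the function names are assumptions, not something the question specifies.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the local file's digest with a separately published hex digest."""
    actual = sha256_of_file(path)
    # Normalise whitespace and case, since checksum files often differ in both.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

If the published checksum already differs from the value stored in your software, you can skip the large download entirely; if it matches, run matches_published_checksum on the downloaded file to confirm the transfer.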

Related

Why am I sometimes getting files filled with zeros at their end after being downloaded?

I'm developing a download manager using Indy and Delphi XE (the application uses multithreading to open several connections to the server). Everything works fine, but sometimes the final downloaded file is broken, and when I check the downloaded temp files I see that 2 or 3 of them are filled with zeros at their end. (Each temp file is the download result of one connection.)
The larger the file is, the more broken temp files I get as the result.
For example, in one of the temp files, which was 65,536,000 bytes, only the range 0-34,359,426 was valid, and from 34,359,427 to 64,535,999 it was full of zeros. If I delete those zeros, the application automatically downloads the missing segments, and what I get as the result (provided the problem doesn't happen again) is a healthy downloaded file.
I want to get rid of those zeros at the end of the temp files without losing download speed.
P.S. I'm using TFileStream and I'm sending it directly to TIdHTTP and downloading the files using GET method.
Additional Info: I handle the OnWork event, which assigns AWorkCount to a public Int64 variable. Each time a file is downloaded, the downloaded size (that Int64 variable) is logged to a text file, and the log says that the file has been downloaded completely (including those zero bytes).
Make sure the server actually supports downloading byte ranges before you request a range to download. If the server does not support ranges, a requested range will be ignored by the server and the entire file will be sent instead. If you are not already doing so, you should be using TIdHTTP.Head() to test for range support before then calling TIdHTTP.Get(). You also need to do this anyway to detect if the remote file has been altered since the last time you downloaded it. Any decent download manager needs to be able to handle things like that.
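The same check can be sketched outside Delphi. The Python below is only an illustration of the HTTP side, not Indy code: it sends a HEAD request and looks for the Accept-Ranges header, and it treats status 206 as the sign that a ranged GET was honoured. The function names are hypothetical.

```python
import urllib.request


def head_headers(url, timeout=10):
    """Send a HEAD request and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers


def supports_byte_ranges(headers):
    """True if the headers advertise 'Accept-Ranges: bytes'."""
    value = headers.get("Accept-Ranges", "") or ""
    return "bytes" in [token.strip().lower() for token in value.split(",")]


def change_validators(headers):
    """Return (ETag, Last-Modified) so a later run can detect an altered file."""
    return headers.get("ETag"), headers.get("Last-Modified")


def range_was_honoured(status_code):
    """A ranged GET that was honoured returns 206; 200 means the whole file."""
    return status_code == 206
```

Some servers honour ranges without advertising them, so checking the status code of the actual ranged GET is the more reliable of the two tests.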
Also keep in mind that if TIdHTTP knows up front how many bytes are being transferred, it will pre-allocate the size of the destination TStream before then downloading data into it. This is to speed up the transfer and optimize disc I/O when using a TFileStream. So you should NOT use TFileStream to access the same file as the destination for multiple simultaneous downloads, even if they are writing to different areas of the file. Pre-allocating multiple TFileStream objects will likely trample over each other trying to set the file size to different positions. If you need to download a file in multiple pieces simultaneously then either:
1) download each piece to a separate file and copy them into the final file as needed once you have all of the pieces that you need.
2) use a custom TStream class, or Indy's TIdEventStream class, to manage the file I/O yourself so you can ignore TIdHTTP's pre-allocation attempts and ensure that multiple file I/O operations do not overlap each other incorrectly.
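Option 1 can be sketched as follows. This is a Python illustration rather than Delphi, and the function name and the optional size check are assumptions: each connection writes to its own part file, and the parts are concatenated in order only once all of them are complete.

```python
import os
import shutil


def merge_parts(part_paths, final_path, expected_sizes=None):
    """Concatenate part files, in the given order, into final_path."""
    if expected_sizes is not None:
        # Refuse to merge if any part is shorter or longer than its range.
        for path, size in zip(part_paths, expected_sizes):
            actual = os.path.getsize(path)
            if actual != size:
                raise ValueError(f"{path}: expected {size} bytes, found {actual}")
    with open(final_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

Because no two writers ever share a file, one stream's pre-allocation cannot disturb another's data. Note that a size check alone would not catch a pre-allocated part that is padded with zeros, so the number of bytes actually received still has to be tracked per connection.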

A list of professionally-useful and safe file types?

I have a system where users can upload, well, anything really - and these files are available to other users.
I need to come up with a list of file types that are genuinely needed by professionals in different industries that are safe from hacking/viruses, etc.
.doc .docx .gif .jpg .jpeg .mpg .mpeg .mp3 .odt .odp .ods .pdf .ppt .pptx .tif .tiff .txt .xls .xlsx .wav
What other file types do you know of that are both useful and safe?
Clarification
Many of the comments and responses are asking for a clearer definition of 'safe from hacking/viruses' - I ask the question with precisely that level of detail because I don't have as sophisticated an understanding of file types and their risks as many of you do, and I would like guidance on 1) any file types that may keep my site more secure, and 2) if there are no 'safe' file types then any advice on how to move forward with a system that allows for flexible uploading and sharing of files.
If indeed any malicious file can be packaged as a seemingly-safe file, how can I protect my users?
No filetype is safe if the program you use to open it with is badly (or carelessly or evil-y) written.
You can't assume that all files with a given extension are safe from 'viruses'.
I can easily rename a malicious executable to .doc and 'hack' your system.
EDIT:
There is no (simple?) way to check whether a user-uploaded file is malicious or not.
The app that you're creating is no different than any other file sharing websites out there (Rapidshare, Megaupload, etc).
There is nothing stopping anyone from uploading malicious files to those websites.
Safe files do not exist. Is an ordinary text file safe? Consider one with this content:
format c:
If some program can execute the content of the file... you get the idea.
So there are no safe files, only restrictions on RUNNING code (programs). (And I understand if you do not like this answer.) :)
For "useful" you'll need to ask your customers.
For safe, there's no such thing because a file extension is just a part of the file name that gives a suggestion of what type of file it is. It need not accurately represent the type, and is easily manipulated.
Rather than protecting based on file type, I would get a 3rd party to virus-scan each file on upload and reject those which are identified as positive.
The list is pretty endless! A quick search finds http://filext.com/alphalist.php?extstart=^A
Well, you can include all data files and exclude all executable/script files.
One list of executable file extensions is here: http://pcsupport.about.com/od/tipstricks/a/execfileext.htm
You may look at other sources to improve coverage.
Edit: for the second part of the question, addressing security:
It would be best to have a bunch of anti-malware software installed on the server to check each submission. Such tools are designed for this specialized task, so use them. Anyway, no executable file is professionally useful unless people are looking for crackware.

Where to save some simple data?

I'm wondering where's the best place to save some simple insensitive data? Like a few URLs and some settings.
Please advise.
If this is a per-user file, you should save it in the current user's profile. For example, on my Windows 7 system, you should use
C:\Users\Andreas Rejbrand\AppData\Local\Your Company Name\Your Product Name\Version
such as
C:\Users\Andreas Rejbrand\AppData\Local\Rejbrand\AlgoSim\2.0
To get the C:\Users\Andreas Rejbrand\AppData\Local path, you use the SHGetSpecialFolderPath function.
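As a rough illustration of how such a path is assembled, here is a Python sketch. It reads the LOCALAPPDATA environment variable instead of calling SHGetSpecialFolderPath, and the fallback directory for non-Windows systems is an assumption made only so the sketch runs everywhere.

```python
import os
from pathlib import Path


def user_settings_dir(company, product, version, environ=None):
    """Build <local app data>/<company>/<product>/<version> for the current user."""
    env = os.environ if environ is None else environ
    base = env.get("LOCALAPPDATA")  # set by Windows for each user profile
    if base:
        root = Path(base)
    else:
        # Hypothetical fallback, not part of the original answer.
        root = Path.home() / ".config"
    return root / company / product / version
```

For the example above, user_settings_dir("Rejbrand", "AlgoSim", "2.0") would resolve under the current user's local application data folder.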
Settings, and specifically user-specific settings, can be stored in the registry. Have a look at the Registry unit and the TRegistry object.
Here's some demo code to get you going:
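// Requires the Registry unit in the uses clause.
var
  r: TRegistry;
begin
  r := TRegistry.Create; // RootKey defaults to HKEY_CURRENT_USER
  try
    // The second argument (True) creates the key if it does not exist yet.
    if r.OpenKey('\Software\MyApplication', True) then
    begin
      r.WriteInteger('Setting1', Setting1);
      r.WriteString('Setting2', Setting2);
    end;
  finally
    r.Free;
  end;
end;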
INI file or JSON file or XML file depending on your needs for local usage.
DB is for net usage.
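For the local-file case, a JSON settings file can be handled in a few lines. The sketch below is in Python purely for illustration; the function names and the example keys are hypothetical.

```python
import json
from pathlib import Path


def save_settings(path, settings):
    """Write settings as JSON via a temporary file in the same directory."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    tmp.replace(path)  # swap the finished file into place


def load_settings(path, defaults=None):
    """Read settings, falling back to defaults when the file is missing."""
    merged = dict(defaults or {})
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return merged
    merged.update(json.loads(text))
    return merged
```

A call such as save_settings(path, {"urls": ["http://example.com"], "timeout": 30}) followed by load_settings(path) returns the same dictionary, which is usually enough for a few URLs and options.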
It all depends on the purpose of those settings! If you want XCopy deployment, I would suggest an XML file next to the exe. But if you also need to write to this, you should find a suitable location in the current user's profile or the "all users" profile. The registry (local machine or current user) would also be a good option for simple settings.
Another question is the type of settings that you need to store. If it's simple settings, I generally start with Altova's XMLSpy to generate an XML schema, defining the structure of the settings. Then I use Delphi's XML import wizard to generate code from this schema and just use that generated code. It allows me to modify the structure in an easy way and also makes sure there's at least some documentation (the schema) telling others about the structure. It might sound complex at first, but once you're used to this, it's perfect! No more manual editing of registry settings or forgetting about the structure of your INI files. And no more thinking about writing code to read and write those settings, since Delphi will do that for you!
The Registry would also be a good location for settings, but not every user will have proper access rights to read from, or write to, the registry, which could crash your application. Besides, the registry has some other limitations which make it unsuitable if you need to store a lot of settings! It would be okay to store a connection string and maybe a username and encrypted password for some user account, but if you need to store 40 settings or more, then the Registry becomes unsuitable.
The same is true of INI files, which tend to be limited to a maximum size of 64 kilobytes. Of course, you could also store those settings in a regular text file or just some binary file. In the past, I even stored settings inside a ZIP file, because I needed to store dozens of grid-related settings. So each grid would read and write its settings to some binary stream, which would then be stored in an encrypted ZIP file.
There are many options like XML (structured data storage), ini files (simple data), databases or flat files.
I would go for XML files saved with ClientDataSets. They allow a lot of options, like searching, sorting, use of the database controls and many more.

How do I generate files and then zip/compress with Heroku?

I sort of want to do the reverse of this.
Instead of unzipping and adding the collection files to S3, I want to do the following on a user's request:
1) generate a bunch of xml files
2) zip the xml files with some images (pre-existing images hosted on s3)
3) download the zip
Does anybody know a good way of doing this? I think I could manage this no problem on a normal machine but Heroku complicates things somewhat in that it has a read-only filesystem.
From the heroku documentation on the read-only filesystem:
There are two directories that are writeable: ./tmp and ./log (under your application root). If you wish to drop a file temporarily for the duration of the request, you can write to a filename like #{RAILS_ROOT}/tmp/myfile_#{Process.pid}. There is no guarantee that this file will be there on subsequent requests (although it might be), so this should not be used for any kind of permanent storage.
You should be able to pretty easily write your generated xml files to tmp/ and keep track of the names, download and write the s3 files to the same directory, and (maybe?) invoke a zip command as long as the output is in tmp/, then serve the file to the browser with the correct mime type to prompt a download. I would only be concerned with how big the filesize is and if heroku has an undocumented limit on what they'll allow in the tmp directory. Especially since you are only performing this action for a one-time download in the duration of a single request, I think you have a good chance of being able to do it.
Edit: Looking around a bit, you might be able to use something like RubyZip to create your zip file if you want to avoid calling system commands.
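The flow described above (write everything under a temporary directory, zip it there, then serve the result) looks roughly like this. The sketch uses Python's zipfile module rather than RubyZip, purely to illustrate the steps; the function name, the archive name and the assumption that the S3 images have already been fetched to local paths are all hypothetical.

```python
import os
import tempfile
import zipfile


def build_zip(xml_documents, image_paths, tmp_root=None):
    """Create a zip under a fresh temporary directory and return its path.

    xml_documents: dict mapping archive file names to XML text.
    image_paths: local paths of images already downloaded from storage.
    """
    work_dir = tempfile.mkdtemp(dir=tmp_root)  # e.g. the app's writable tmp/
    zip_path = os.path.join(work_dir, "bundle.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml_text in xml_documents.items():
            # Generated XML goes straight into the archive, no file needed.
            archive.writestr(name, xml_text)
        for image_path in image_paths:
            archive.write(image_path, arcname="images/" + os.path.basename(image_path))
    return zip_path
```

The caller would stream the returned file to the browser and then delete the temporary directory, since nothing written there should be treated as permanent.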

(rails) how to validate whether an uploaded .txt file is not, say, an image file?

I have an upload text file field, and with it I plan to save the file somewhere and then store the location of the file in a database. However, I want to make sure the file they uploaded is a .txt file, and not, say, an image file. I imagine this happens in the validation step. How does one validate such a thing? Also, how do you get the filename of the uploaded file? I could always just check if it said '.txt' but for future reference knowing how to validate without just the filename would be helpful.
Trying to validate the contents of a file based on the filename extension is opening the door for major hackerdom. It's trivial to change the extension and upload the file.
If you are on a Mac/Linux/Unix-based system the OS "file" command is the standard because it looks inside the file for key bytes that flag file types. http://en.wikipedia.org/wiki/File_(Unix) I'm not sure what's available for Windows, but this might help: Determine file type in Ruby
One way of doing it, the simple way really, would be to pass the file through an image loader, preferably one that handles multiple common formats, and see if it throws an error.
The other way is to manually check the file header for common image format headers. For example, .bmp files start with BM. Other formats have their own specific markings you can use.
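A manual header check can be sketched like this. The Python is for illustration only; the signature table covers just a few common formats, and the text test (no NUL bytes, valid UTF-8) is an assumption about what counts as a .txt file.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_image_type(data):
    """Return the image format whose signature starts `data`, or None."""
    for signature, kind in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return None


def looks_like_text(data):
    """Heuristic: the full contents hold no NUL bytes and decode as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Short signatures such as BM also begin ordinary sentences, so sniff_image_type alone can misjudge a real text file; combining it with looks_like_text on the whole upload gives a more useful answer.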
