When I download, say, an ISO image using a torrent, should I still verify the file's integrity (by calculating a SHA-256 hash, for example), or is this done automatically while downloading?
The BitTorrent protocol has a mechanism for automatically verifying the integrity of each piece as it is downloaded. Of course, this should only reassure you if you trust the source of the file.
If you have a checksum for the whole file (e.g. for a software package), you can definitely verify the file yourself afterwards.
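For example, such a whole-file check might look like this in Ruby (a minimal sketch; the ISO name and the expected value are placeholders for whatever the publisher actually lists):
require 'digest'

# Placeholder values: substitute the real file name and the published hash.
expected = "published-sha256-value-goes-here"
actual = Digest::SHA256.file("distro.iso").hexdigest  # streams the file in chunks

puts(actual == expected ? "Checksum OK" : "Checksum MISMATCH: #{actual}")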
Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive.
https://en.wikipedia.org/wiki/Bittorrent
I am new to CMake and am trying to understand the following CMake command:
FetchContent_Declare(curl
  URL https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz
  URL_HASH SHA256=fe0c49d8468249000bda75bcfdf9e30ff7e9a86d35f1a21f428d79c389d55675
  USES_TERMINAL_DOWNLOAD TRUE)
When I open a browser and enter https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz, the file curl-7.75.0.tar.xz starts downloading without any need for the URL_HASH. I am sure it is not redundant, so what is the purpose of the URL_HASH?
Also, how can the SHA256 value be found? When I visit https://github.com/curl/curl/releases/download/curl-7_75_0 to find out more, the link is broken.
I am sure it is not redundant, so what is the purpose of the URL_HASH?
Secure hash functions like SHA256 are designed to be one-way: it is, in practice, computationally infeasible to craft a malicious version of a file with the same SHA256 hash as the original. It is infeasible even to find two arbitrary files that have the same hash; such a pair is called a "collision", and finding just one would be a major breakthrough in cryptanalysis.
The purpose of this hash in a CMakeLists.txt, then, is as an integrity check. If a bad actor has somehow intercepted your connection, checking the hash of the file you actually downloaded against this hard-coded expected hash will detect whether the file was changed in transit. It will even catch less nefarious data corruption, such as that caused by a faulty hard drive.
Including such a hash (a "checksum") is absolutely necessary when downloading code or other binary artifacts.
Also, how can the SHA256 value be found?
Often, these will be published alongside the binaries. Use a published value if available.
If you have to compute it yourself, you have a few options. On the Linux command line, you can use the sha256sum command. As a hack, you can put a deliberately wrong value such as SHA256=0 in the URL_HASH and fish the actual value out of CMake's error message.
Note that if you compute the hash yourself, you should either (a) download the file from an absolutely trusted connection and device or (b) download it from multiple independent devices (free CI systems like GitHub Actions are useful for this) and ensure the hash is the same across all of them.
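For instance, here is a tiny Ruby sketch that prints the SHA256 of a locally downloaded archive (the path is just an example); you would then compare its output against the value you hard-code in URL_HASH:
require 'digest'

# Print the SHA256 of the archive you downloaded yourself; the path is an example.
puts Digest::SHA256.file("curl-7.75.0.tar.xz").hexdigest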
After seeing corrupted files on an FTP server, I am thinking about verifying files uploaded with TIdFtp.Put by downloading them again immediately after upload and comparing them byte by byte.
I am concerned that TIdFtp might, in theory, cache data and return it from the cache instead of actually downloading it again.
Please allay or confirm my concerns.
No, there is no caching, as there is no such thing in the FTP protocol in general. TIdFTP deals only with live data.
Are you, perhaps, uploading binary files in ASCII mode? If so, that would alter line break characters (CR and LF) during transmission. That is a common mistake to make, since ASCII is FTP's default mode. Make sure you are setting the TIdFTP.TransferType property as needed before transferring a file. ASCII mode should only be used for text files, if used at all.
And FWIW, you may not need to download a file to verify its bytes. If the server supports any X<Hash> commands (where Hash can be SHA512, SHA256, SHA1, MD5, or CRC), TIdFTP has VerifyFile() methods to use them. They calculate a hash of the local file and compare it to a hash calculated by the server for the remote file, so no transfer of file data is needed.
We have a scenario in our project where files come from the client with the same file name, and sometimes with the same file size too. Currently, when a file is uploaded, we check the new file name against the existing files in the database; if there is a match we mark it as a duplicate and do not allow the upload at all. But now we have a requirement to check the content of files that have the same file name, so we need a way to differentiate such files based on their contents. How do we do that efficiently, while avoiding even a minute chance of error?
Rails 3.1, Ruby 1.9.3
Below is one option I have read about in a web reference.
require 'digest'
digest_value = Digest::MD5.base64digest(File.read( file_path ))
The above line will read the entire contents of the incoming file and generate a unique hash from it, right? Then we can use that hash for unique file identification. But we have more than 500 users working simultaneously, 24/7, and most of them will be performing this operation. So if an incoming file is large (> 25 MB), the digest will take a long time to read the whole contents and performance will suffer. What would be a better solution, considering all these facts?
I have read the question and the comments, and I have to say the problem is not stated quite correctly. It seems that what you need is to identify identical content, period, regardless of whether the name and size are equal. Correct me if I am wrong, but you likely don't want to allow users to upload 100 duplicates of the same file just because they have 100 copies of it locally under different names.
So far, so good. I would use the following approach. The file name is not involved at all. The file size can serve as a fast pre-check for uniqueness: if the sizes differ, the files are definitely different.
Then one might allow the upload and return an instant "OK" response. Afterwards, the server should run Digest::MD5 in the background, comparing the new file against those already uploaded. If there is a duplicate, the new copy of the file should be removed, but its name should stay on the filesystem as a symbolic link to the original.
That way you won't frustrate users, who keep the ability to have as many copies of a file as they want under different names, while disk usage stays as low as possible.
I am implementing PDF upload using CarrierWave with Rails 4. I was asked by the client about malicious content, e.g. if someone attempts to upload a malicious file masked as a PDF. I will be restricting the file type on the frontend to 'application/pdf'. Is there anything else I need to worry about, assuming the uploaded file has a .pdf extension?
File uploads are a frequent source of security issues, since there are so many ways to get them wrong. Regarding just the issue of masking a malicious file as a PDF: checking the content type (application/pdf) is good, but not enough, since it is controlled by the client and can be modified.
Filtering on the .pdf extension is definitely advisable, but make sure you don't accept files like virus.pdf.exe.
Other filename attack techniques exist, e.g. involving null or control characters.
Consider using a file type detector to determine that the file is really a PDF document.
But that's just for restricting the file type. There are many other issues you need to be aware of when accepting file uploads.
PDF files can contain malicious code and are a common attack vector.
Make sure uploaded files are written to an appropriate directory on the server. If they aren't meant to be publicly accessible, choose a directory outside of the web root.
Restrict the maximum upload file size.
This is not a complete list by any means. Check out the Unrestricted File Upload vulnerability by OWASP for more info.
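As a small illustration of the extension filtering mentioned above, a CarrierWave uploader can declare an extension whitelist. This is only a sketch, and the exact hook name varies between CarrierWave versions (older releases use extension_white_list, newer ones extension_allowlist):
class PdfUploader < CarrierWave::Uploader::Base
  # Accept only the .pdf extension; a name like virus.pdf.exe has the
  # extension "exe" and is therefore rejected.
  def extension_whitelist
    %w[pdf]
  end
end
The other points, such as storing files outside the web root and limiting upload size, are typically handled in the uploader's storage configuration and in web server limits, and are not shown here.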
In addition to @StefanOS's great answer, PDF files are required to start with the string:
%PDF-[VERSION]
Generally (at least often), the first few bytes of a file indicate its type. This is especially true for executables; for example, Windows executables (PE files) should start, if memory serves, with "MZ".
For uploaded PDF files, opening the uploaded file and reading the first 5 bytes should always yield %PDF-.
This might be a good enough verification for most use cases.
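In Ruby, that check might look like the following sketch (looks_like_pdf? is just an illustrative name; pass it the path where the upload was stored):
def looks_like_pdf?(path)
  # Read the first five bytes in binary mode and compare against the PDF signature.
  File.open(path, "rb") { |f| f.read(5) == "%PDF-" }
end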
There are some files that I want to download and store in the sandbox. However, they must stay secure (i.e. encrypted) at all times. I can encrypt them while downloading to the Documents directory itself, but when the files need to be consumed I have to decrypt them first. The question is where to put these decrypted files.
tmp - Looks like a good place to keep them, but the contents may be deleted if the app stays in the background for days.
Documents - Keeping the decrypted files here in a separate place may not be a very good idea. It is not cleaned up automatically when the app is relaunched, and if the device runs out of battery while the app is still running, the decrypted files are left exposed.
So the real question is: what is the best way to ensure the security of data in the Documents directory?
One useful aspect of UNIX-based systems is that you can create/open a file and then immediately delete it. The file won't be accessible from outside the app, but the app can still read and write data through the open file handle, and the file is not actually removed until that handle is closed.
This means you can create/open the decrypted file anywhere within the app's accessible file structure.
While I haven't tested this under iOS, I think there is a good chance it will work.
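To illustrate the behaviour (plain Ruby on a desktop UNIX system, not iOS code; the file name is arbitrary):
f = File.open("scratch.bin", "w+")   # create the working file
File.delete("scratch.bin")           # unlink the name; the open handle keeps the data alive
f.write("decrypted contents")
f.rewind
puts f.read                          # => "decrypted contents"
f.close                              # only now is the storage actually reclaimed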
I would keep the files in the Documents directory, written with the NSData NSDataWritingFileProtectionComplete option so that iOS encrypts them at rest.
If you feel the need to encrypt the files yourself and decrypt them only as needed, save the decrypted files in the Documents directory, again with the NSData NSDataWritingFileProtectionComplete option. Add the "do not back up" extended attribute to the files. On app launch/wake, etc., overwrite and delete the files that are no longer needed, based on your policy. Use AES in CBC mode with a random IV and a random key, and keep the key in the Keychain.
Another option is to open the file as a stream and decrypt on the fly into a buffer, if that works for your app.
But the catch is that I don't really understand your full use case. Best practice: hire an iOS security domain expert to advise on and vet your solution (I do). Whether the security is worth that price is a valid question.
To explain my comments: I wrote an application to recover images from a corrupted HD; it was not all that hard.