PDF upload malicious content vulnerability with Rails

I am implementing PDF upload using CarrierWave with Rails 4. I was asked by the client about malicious content, e.g. if someone attempts to upload a malicious file masked as a PDF. I will be restricting the file type on the frontend to 'application/pdf'. Is there anything else I need to worry about, assuming the uploaded file has a .pdf extension?

File uploads are often a security issue, since there are so many ways to get them wrong. Regarding just the issue of masking a malicious file as a PDF: checking the content type (application/pdf) is good, but not enough, since it's controlled by the client and can be modified.
Filtering on the .pdf extension is definitely advisable, but make sure you don't accept files like virus.pdf.exe.
Other filename attack techniques exist, e.g. involving null or control characters.
Consider using a file type detector to determine that the file is really a PDF document.
But that's just for restricting the file type. There are many other issues you need to be aware of when accepting file uploads.
PDF files can contain malicious code and are a common attack vector.
Make sure uploaded files are written to an appropriate directory on the server. If they aren't meant to be publicly accessible, choose a directory outside of the web root.
Restrict the maximum upload file size.
This is not a complete list by any means. Check out the Unrestricted File Upload vulnerability by OWASP for more info.
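To make the extension and size points above concrete, here is a rough CarrierWave sketch, not a drop-in solution. The hook name assumes a Rails 4-era CarrierWave (around 0.10), where it is called extension_white_list (newer versions renamed it to extension_whitelist); the Document model and the 10 MB limit are made up for illustration.

class PdfUploader < CarrierWave::Uploader::Base
  storage :file

  # Rejects anything whose final extension is not .pdf,
  # including double extensions like virus.pdf.exe.
  def extension_white_list
    %w(pdf)
  end
end

class Document < ActiveRecord::Base
  mount_uploader :file, PdfUploader

  # CarrierWave of this era has no built-in size validation, so enforce it in the model.
  validate :file_size_within_limit

  private

  def file_size_within_limit
    errors.add(:file, "is too large (max 10 MB)") if file.size.to_i > 10.megabytes
  end
end

To keep files out of the public web root, you can also point CarrierWave's root at a private directory (e.g. CarrierWave.configure { |config| config.root = Rails.root.join('private_uploads') } in an initializer), so uploads are never served directly.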

In addition to StefanOS's great answer, PDF files are required to start with the string:
%PDF-[VERSION]
Generally (or at least often), the first couple of bytes of a file indicate its type, especially for executables: Windows executables (PE files) should start, if memory serves, with "MZ".
For uploaded PDF files, opening the uploaded file and reading the first 5 bytes should always yield %PDF-.
This might be a good enough verification for most use cases.
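A minimal Ruby sketch of that check (the helper name and the way you obtain the upload's temp path are just for illustration):

# Read only the first five bytes and compare them to the PDF signature.
def looks_like_pdf?(path)
  File.open(path, "rb") { |f| f.read(5) } == "%PDF-"
end

# e.g. with a Rails upload param:
# looks_like_pdf?(params[:document].tempfile.path)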

Related

Uniquely identify files with same name and size but with different contents

We have a scenario in our project where files come from the client with the same file name, sometimes with the same file size too. Currently, when we upload a file, we check the new file name against the existing files in the database, and if there is a match we mark it as a duplicate and do not allow the upload at all. But now we have a requirement to check the content of the file when they have the same file name. So we need a way to differentiate such files based on their contents. How do we do that efficiently, avoiding even a minute chance of error?
Rails 3.1, Ruby 1.9.3
Below is one option I have read from a web reference.
require 'digest'
digest_value = Digest::MD5.base64digest(File.read(file_path))
The above line will read the entire contents of the incoming file and generate a unique hash from them, right? Then we can use it for unique file identification. But we have more than 500 users working simultaneously in 24/7 mode, and most of them will be doing this operation. So, if the incoming file is huge (> 25 MB), the digest will take a long time to read the whole contents and thereby suffer performance issues. So, what could be a better solution considering all these facts?
I have read the question and the comments, and I have to say the problem is not stated 100% correctly. It seems that what you need is to identify identical content. Period. Regardless of whether the name and size are equal or not. Correct me if I am wrong, but you likely don't want to allow users to upload 100 duplicates of the same file just because the user has 100 copies of it locally under different names.
So far, so good. I would use the following approach. The file name is not involved at all. The file size might help as a fast uniqueness check (if the sizes differ, the files are definitely different).
Then one might allow the upload with an instant "OK" response. Afterwards, the server should run Digest::MD5 in the background, comparing the file against all those already uploaded. If there is a duplicate, the new copy of the file should be removed, but the name should stay on the filesystem as a symbolic link to the original.
That way you won't frustrate users, giving them the ability to keep as many copies of the file as they want under different names, while keeping disk usage as low as possible.
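On the performance worry about files over 25 MB: Digest::MD5 does not have to slurp the whole file with File.read; it can be fed in chunks (or you can use Digest::MD5.file, which does the same). A rough sketch:

require 'digest'

# Hashes the file in 1 MB chunks, so memory use stays flat even for large files.
def content_fingerprint(file_path)
  md5 = Digest::MD5.new
  File.open(file_path, 'rb') do |f|
    while (chunk = f.read(1024 * 1024))
      md5.update(chunk)
    end
  end
  md5.base64digest
end

# Shorter equivalent:
# Digest::MD5.file(file_path).base64digest

Combined with the size pre-check above, you only pay the hashing cost when the sizes actually match.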

Zoomify .zif format bad performance

The new .zif single-file format provided by Zoomify Pro seems to have some performance issues. Compared to the old file structure, it loads the page 3 to 4 times slower and sends over 50% more requests (tested with the same initial image in multiple file formats).
Using the old format is not feasible for our product, and we are stuck with over a minute of load time.
Has anyone encountered this issue, and are there some workarounds? The results on the internet and the official site don't seem to be of any help.
NOTE: Contacting the vendor hasn't led to anything yet.
Although the official site claims the ZIF format can handle very large images, I'm skeptical about it because the viewer tries to do everything in JavaScript. The performance is entirely dependent on the client's machine. Try opening it on a faster machine and see if it improves.
Alternative solution: You could create Deep Zoom Image tiles by using the VIPS library.
More information here:
https://libvips.github.io/libvips/API/current/Making-image-pyramids.md.html
Scroll further down in the article and you'll see this snippet:
With 7.40 and later, you can use --container to set the container type. Normally dzsave will write a tree of directories, but with --container zip you'll get a zip file instead. Use .zip as the directory suffix to turn on zip format automatically:
$ vips dzsave wtc.tif mypyr.zip
to write a zipfile containing the tiles.
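Since the rest of this thread is Ruby-centric, note that the same dzsave operation is exposed by the ruby-vips gem; a small sketch (file names are placeholders):

require 'vips'

image = Vips::Image.new_from_file('wtc.tif', access: :sequential)
# The .zip suffix on the output name turns on the zip container,
# so you get a single archive of tiles instead of a directory tree.
image.dzsave('mypyr.zip')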
Also, check out this tutorial:
Serve deepzoom images from a zip archive with openseadragon
https://web.archive.org/web/20170310042401/https://literarymachin.es/deepzoom-osd-server/
The community (openseadragon and vips) is much stronger over there so you'll get help when you hit a wall.
If you want to take a break from all of this and just want the images zoomable, you could use a 3rd-party service such as zoomable.ca or zoomo.ca. It's free and user-friendly (upload your image and embed the viewer in your site like a Google Map).
ZIF format designer here... ZIF can easily handle monstrous images, up to hundreds of terabytes in size.
Without a server, of course the viewer tries to do everything; it's the only option. As a result, serving ZIF directly from a webserver will not be as performant as using an image server. But... you can DO it. Using Zoomify tile folders, speed will be faster, but you may have hundreds of thousands or millions of tiles to deal with on the server side, and transfers will be horrendously slow and error-prone.
There are always trade-offs. See zif.photo for the specification.

Need a use case example for stream response in ChicagoBoss

The ChicagoBoss controller API has this:
{stream, Generator::function(), Acc0}
Stream a response to the client using HTTP chunked encoding. For each chunk, the Generator function is passed an accumulator (initially Acc0) and should return either {output, Data, Acc1} or done.
I am wondering: what is the use case for this? There are others like json and output. When would stream be useful?
Can someone present a real-world use case?
Serving large files for download might be the most straightforward use case.
You could argue that there are also other ways to serve files so that users can download them, but these might have other disadvantages:
By streaming the file, you don't have to read the entire file into memory before starting to send the response to the client. For small files, you could just read the content of the file, and return it as {output, BinaryContent, CustomHeader}. But that might become tricky if you want to serve large files like disk images.
People often suggest serving downloadable files as static files (e.g. here). However, these downloads bypass all controllers, which might be an issue if you want things like download counters or access restrictions. Caching might be an issue, too.

iOS Sandbox - Securing the data in Documents directory

There are some files that I want to download and store in the sandbox. However, they must stay secure (i.e. encrypted) all the time. Now, I can encrypt them while downloading to Documents itself. But when the files need to be consumed, I have to decrypt them first. The question is: where to put these decrypted files?
tmp - Looks like a good place to keep them, but then what if the contents are deleted when the app has been kept minimised for days?
Documents - Keeping the decrypted files here in a separate place may not be a very good idea. It is not automatically cleaned up when the app is relaunched, and if the device runs out of battery while the app is still running, these decrypted files will be exposed.
So the real question is: what is the best way to ensure the security of data in the Documents directory?
One useful aspect of UNIX-based systems is that you can create/open a file and then immediately delete it. The file won't be accessible from outside the app; however, the app will be able to read/write data to the file, and the file will not actually be deleted until the file handle is closed.
This means you can create/open the decrypted file anywhere within the app's accessible file structure.
While I haven't tested this under iOS, I think there is a good chance it will work.
I would keep the encrypted files in the Documents directory, encrypted with the NSData NSDataWritingFileProtectionComplete option.
If you feel the need to encrypt the files yourself and then decrypt them only as needed, save the decrypted files in the Documents directory, again written with the NSData NSDataWritingFileProtectionComplete option. Add the "do not back up" extended attribute to the file. On app launch/wake, etc., based on your policy, overwrite files that are no longer needed and delete them. Use AES in CBC mode with a random IV and a random key, and keep the key in the Keychain.
Another option is to open the file as a stream and decrypt on the fly into a buffer, if that works for your app.
But the catch is that I don't really understand your full use case. Best practice: hire an iOS security domain expert to advise on and vet your solution (I do). Whether the security is worth that price is a valid question.
To explain my comments: I wrote an application to recover images from a corrupted HD; it was not all that hard.

Is the chunking option required with plupload and asp.net MVC?

I have seen various posts where developers have opted for the chunking option to upload files, particularly large files.
It seems that if one uses the chunking option, the files are uploaded and progressively saved to disk. Is this correct? If so, it seems there needs to be a secondary operation to process the files.
If the config is set to allow large files, should plupload work without chunking up to the allowed file size for multiple files?
It seems that if one uses the chunking option, the files are uploaded and progressively saved to disk, is this correct?
If you mean "automatically saved to disk", then as far as I know, that is not correct. Your MVC controller will have to handle as many requests as there are chunks, concatenate each chunk into a temp file, then rename the file after handling the last chunk.
It is handled this way in the upload.php example of plupload.
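The thread is about ASP.NET MVC, but the concatenation logic described above looks the same in any framework. Here is a rough Rails-flavoured sketch (plupload posts each chunk with chunk and chunks parameters by default; the paths and controller name are made up):

class UploadsController < ApplicationController
  def create
    chunk  = params[:chunk].to_i                    # 0-based index of this chunk
    chunks = params[:chunks].to_i                   # total number of chunks
    name   = File.basename(params[:name].to_s)      # strip any directory parts
    part   = Rails.root.join('tmp', "#{name}.part")

    # Append this chunk to the temp file (chunk 0 starts a fresh file).
    File.open(part, chunk.zero? ? 'wb' : 'ab') do |f|
      f.write(params[:file].read)
    end

    # After the last chunk, rename the temp file to its final name.
    FileUtils.mv(part, Rails.root.join('uploads', name)) if chunk == chunks - 1

    head :ok
  end
end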
if so it seems there needs to be a secondary operation to process the files.
I'm not sure I understand this (perhaps you didn't mean "automatically saved to disk").
If the config is set to allow large files, should plupload work without chunking up to the allowed file size for multiple files?
The answer is yes... and no. It should work, then fail with some combinations of browsers / plupload runtimes when the size gets to around 100 MB. People also seem to encounter problems setting up the config.
I handle small files (~15MB) and do not have to use chunking.
I would say that if you are to handle large files, chunking is the way to go.
