How do I change this bit of code so that only PDF files can be uploaded?
unless @file.content_type =~ /^image/
  errors.add(:file, "is not a recognized format")
  return false
end
Of course that code is horribly insecure. It relies on the browser that sends the file getting the MIME type right, and assumes no one has sent a hacked request.
Frankly, unless you open the file and parse it, knowing what makes a valid file for that particular format, you cannot be sure that any uploaded file is of a particular type.
Haven't used that, but the PDF MIME type is application/pdf, so it should be just:
unless @file.content_type =~ /^application\/pdf$/
You're going to have to:
Accept the upload;
Try and open the PDF in some library;
Reject the file if you can't open it.
You can't rely on the MIME type the browser gives you. The only way to do this is to verify the file. You can check the format with markers and the like but the easiest and most robust method is to open it with an appropriate library call.
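As a cheap first filter before handing the upload to a PDF library, the marker check mentioned above could look roughly like the sketch below. The helper name `looks_like_pdf?` is made up for illustration, and a matching header only shows that the file claims to be a PDF; it says nothing about whether the rest of it parses.

```ruby
require "stringio"

# Every PDF is expected to begin with this header marker.
PDF_MAGIC = "%PDF-".b

# Returns true when the stream starts with the PDF header marker.
# `io` is any object that responds to #read and #rewind (File, Tempfile, StringIO).
def looks_like_pdf?(io)
  io.rewind
  header = io.read(PDF_MAGIC.bytesize) # nil when the stream is empty
  io.rewind
  !header.nil? && header.b == PDF_MAGIC
end

looks_like_pdf?(StringIO.new("%PDF-1.4\n...")) # header matches
looks_like_pdf?(StringIO.new("GIF89a..."))     # some other format
```

A file that passes this check should still be opened with a real PDF library before you trust it.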
I use a webView to preview documents such as PDF, Word and PPT files. My requirement is to check whether a document is password protected before loading it into the webView.
I use the function below for PDF:
bool CGPDFDocumentIsEncrypted ( CGPDFDocumentRef document );
I'd like to know how to find out whether Word, PPT and other documents are password protected.
Please provide the possible ways to accomplish the above requirement.
The only way to tell if something is encrypted is to be able to tell if it is not encrypted, and the only way to tell that is to look at the data and see if it makes sense. Look at each type of file and check for something that must be there; usually there is a preamble that can be checked.
An example: a .jpg file always starts with the 2-byte start-of-image marker 0xff 0xd8, followed by another marker beginning with 0xff. In JFIF files that second marker is the application marker 0xff 0xe0, so the first four bytes are 0xff 0xd8 0xff 0xe0.
For every file type whose encryption you want to detect, look up the file format and write code to verify it.
Encryption changes the data bytes in such a way that they cannot be distinguished from random bits and bytes. If it is possible to tell anything from the encrypted bytes, the encryption method has failed. That is the whole point of encryption.
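To make the preamble idea concrete, here is a small sketch that compares the leading bytes against a few well-known signatures. It is written in Ruby only to keep it short; the same byte comparison ports to any language. The table is deliberately incomplete and the name `sniff_format` is just an illustration. Note the assumption baked in: an unrecognised preamble is only a hint that the data may be encrypted or damaged, and a recognised one does not by itself settle the password question for every format.

```ruby
# Leading byte signatures for a few common formats (not exhaustive).
SIGNATURES = {
  pdf:  "%PDF-".b,
  jpeg: "\xFF\xD8\xFF".b,
  png:  "\x89PNG\r\n\x1A\n".b,
  zip:  "PK\x03\x04".b,                       # also the container for .docx/.pptx/.xlsx
  ole2: "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b  # legacy .doc/.ppt/.xls container
}.freeze

# Returns the symbol of the first signature that matches the start of
# `bytes`, or nil when nothing in the table matches.
def sniff_format(bytes)
  data = bytes.b
  match = SIGNATURES.find { |_name, magic| data.start_with?(magic) }
  match && match.first
end

sniff_format("%PDF-1.4 ...")       # matches the :pdf entry
sniff_format("random looking data") # matches nothing, so nil
```

In practice you would read only the first dozen or so bytes of the file and pass those in, rather than loading the whole document.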
The flow is:
The user selects an image on the client.
Only filename, content-type and size are sent to the server. (E.g. "file.png", "image/png", "123123")
The response contains the fields and policies for uploading directly to S3. (E.g. "key": xxx, "acl": ...)
The problem is that if I change the extension of "file.pdf" to "file.png" and then upload it, the data sent to the server before the upload to S3 is:
"file.png"
"image/png"
The server says "ok" and returns the S3 fields for the upload.
But the content type sent is not the real content type. How can I validate this on the server?
Thanks!
Example:
Testing the Redactorjs server-side code (https://github.com/dybskiy/redactor-js/blob/master/demo/scripts/image_upload.php), I see that it checks the file content type. But when I try to upload a fake image (test here: http://imperavi.com/redactor/), it does not allow the fake image. Just what I want!
But how is that possible? Look at the request params (it is sent as image/jpeg, which should be valid):
When I was dealing with this question at work I found a solution using Mechanize.
Say you have an image url, url = "http://my.image.com"
Then you can use img = Mechanize.new.get(url)[:body]
The way to test whether img is really an image is by issuing the following test:
img.is_a?(Mechanize::Image)
If the image is not legitimate, this will return false.
There may be a way to load the image from file instead of URL, I am not sure, but I recommend looking at the mechanize docs to check.
With older browsers there's nothing you can do, since there is no way for you to access the file contents or any metadata beyond its name.
With the HTML5 file api you can do better. For example,
document.getElementById("uploadInput").files[0].type
Returns the MIME type of the first file. I don't believe the standard mandates the method used to perform this identification.
If this is insufficient then you could read the file locally with the FileReader APIs and do whatever tests you require. This could be anything from simply checking for the magic bytes present at the start of various file formats to fully validating that the file conforms to the relevant specification. MDN has a great article that shows how to use various bits of these APIs.
Ultimately none of this would stop a malicious attempt.
My Rails app is having trouble identifying Office 2007 documents (pptx, xlsx, docx); they are uploaded via Paperclip with the application/zip MIME type.
It also appears my system (OS X Lion) is detecting the file as a zip as well.
james@JM:~$ file --mime -b test.docx
application/zip; charset=binary
I've tried adding the following to my initializers/mime_types
Rack::Mime::MIME_TYPES.merge!({
".docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
})
But with no luck.
Any ideas?
This is browser-dependent. The mime types are set as content type by the browser. This depends on the browser implementation and any possible client-side mime type settings that may exist on the client machine.
I've come to the conclusion that checking for document types isn't reliable via the mime type (i.e. content type) alone. It needs a mix of checking for mime type and file extension. File extension alone is also not that reliable, but the combination of both can probably be made to be reasonably workable.
Sadly, Paperclip out of the box doesn't seem to support validating by file extension, so custom code is needed. Here is what I came up with as a custom validation:
has_attached_file :file, ...
validate :mime_type_or_file_extension
private
def mime_type_or_file_extension
  if self.file.present? &&
     !VALID_UPLOAD_FILE_CONTENT_TYPES.include?(self.file_content_type) &&
     !VALID_UPLOAD_FILE_EXTENSIONS.include?(Pathname.new(self.file_file_name).extname[1..-1])
    self.errors.add(:file_file_name, "must be one of ." + VALID_UPLOAD_FILE_EXTENSIONS.join(' .'))
  end
end
Where VALID_UPLOAD_FILE_CONTENT_TYPES and VALID_UPLOAD_FILE_EXTENSIONS are two arrays we have defined in an initializer. Our attachment is called "file".
Perhaps something like this could be added to the Paperclip gem as pull request. I'll see if I find the time.
Update (12/23/2011): @Jamsi asked about download. We set the Content-Disposition and Content-Type in the response header in the controller, like so:
response.headers['Content-Disposition'] = "attachment; filename=#{@upload.file_file_name}"
response.headers['Content-Type'] = Rack::Mime.mime_type(File.extname(@upload.file_file_name))
Where @upload is our file (Paperclip) object.
In my web page (rendered by Rails), I'd like to let the user right-click on a photo to bring up the browser's Save As dialog, to let the user save the photo to their hard drive.
However, the photos on my server have unusual filenames (long hex names) with no file extension. The filename prompt in the Save As dialog has this ugly filename. If the user hits save, they'll end up with a poorly-named file, with no file extension.
The web page is aware of the photo's real file name (the name that came off the camera, for example). Is there a way for me to programmatically override the Save As dialog's filename prompt with a filename of my choosing?
I'm aware of the Content-Disposition header, and that a filename can be specified via this header. However, I think that in order to make use of it, I need to load/render the entire file to the browser. If the asset to be made available for download is a movie, say a 100 MB video, loading the file could time out the browser.
Thoughts?
-A
I think I understand the problem here because I encountered (and resolved) at least part of it myself not too long ago.
I have some large MP3s and I link to them on my website.
A few problems:
I needed to set my content-disposition header to attachment in order to prevent files from automatically streaming whenever a user clicked the download button
my files are on a remote server
my files are large (100MB)
large files can tie up rails controllers if not handled properly
Now, Michael Koziarski advises in this article that the best way to keep your Rails processes free when serving large files is to create a download action in your controller, and then do something like this (note the use of :x_sendfile => true):
def download
  send_file '/path/to/podcast.mp3', :type => 'application/octet-stream', :disposition => 'attachment', :filename => 'something.mp3', :x_sendfile => true
end
:x_sendfile tells Apache to let the file through without tying up a Rails controller process. The rest of the code sets the filename and the Content-Disposition header.
Great, but I'm on Heroku, like everyone else nowadays. So I can't use x_sendfile.
I found that I couldn't modify the nginx configuration file either, as it's locked down by Heroku, so it was not possible to get X-Accel-Redirect (the nginx equivalent of X-Sendfile) working.
So, I decided to add a perl script (see below) to the cgi-bin on our asset-host and this script sets the content-disposition to attachment and gives our file a name too.
Instead of doing a restful download like this:
link_to "download", download_podcast_path(@podcast.mp3)
we just link to the mp3, making sure that we go in through the cgi-bin so that the perl script gets called on every mp3 that leaves the server
# I'm using haml
%a{:href=>"http://afmpodcast.com/cgi-bin/download.cgi?ID=#{@podcast.mp3}"}
  download
The result is that my rails controller is no longer called into action when someone downloads a file
I found the perl script here and chopped it up a bit to work for me:
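#!/usr/local/bin/perl -wT
use strict;
use CGI ':standard';
use CGI::Carp qw(fatalsToBrowser);
my $files_location = "..";
# Accept only a plain file name so the ID cannot point outside $files_location.
my ($ID) = (param('ID') // '') =~ /^([\w.-]+)\z/
    or die "Invalid file ID\n";
# Read the whole file into memory, in binary mode.
open(my $dlfile, '<', "$files_location/$ID") or die "Cannot open file: $!\n";
binmode($dlfile);
my @fileholder = <$dlfile>;
close($dlfile) or die "Cannot close file: $!\n";
# Send the headers, then the file contents.
print "Content-Type: application/x-download\n";
print "Content-Disposition: attachment; filename=\"$ID\"\n\n";
binmode(STDOUT);
print @fileholder;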
My code is on GitHub, but you'll likely have all sorts of problems using it on your machine, as I make heavy use of ENV variables that I store in bashrc and I have no documentation or tests ^hides^
You could do some smart server-side URL rewriting, for example rewriting foo.mpeg to yourveryuglyfilenamewithoutextension.
Set the Content-Disposition to "attachment; filename=..."; that's fine. "attachment" explicitly means the file is not to be rendered in the browser, and the file renaming works nonetheless (or possibly particularly in that case).
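If the friendly name comes from user data (a camera filename, say), it is worth building the header value with a little care. The sketch below is one way to do it; the helper name `attachment_disposition` is hypothetical, and it only covers plain ASCII names. Non-ASCII names would need the `filename*` form, which is not shown here.

```ruby
# Builds a Content-Disposition value that forces a download under `filename`.
def attachment_disposition(filename)
  # Drop control characters (CR/LF in particular would allow header injection),
  # then backslash-escape double quotes and backslashes for the quoted string.
  safe = filename.gsub(/[\x00-\x1f\x7f]/, "").gsub(/["\\]/) { |c| "\\#{c}" }
  %(attachment; filename="#{safe}")
end

attachment_disposition("IMG_0042.jpg")
# => attachment; filename="IMG_0042.jpg"
```

In a Rails controller the result would be assigned to response.headers['Content-Disposition'], or you can let send_file build the header through its :filename and :disposition options.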
Based on your comments, you have a few problems.
You want to set the filename using your Rails app.
The file is on a remote host and your Rails app is acting as a middleman.
The file might be big, so you want the file to be sent out to the browser as you receive it instead of queuing the whole thing.
Streaming only with Rails is tricky for a few reasons.
You would need an HTTP client that lets you access the message body as you receive data instead of blocking until you have everything. Net::HTTP's simple get call buffers the whole body before returning, so used that way it is not that client. I'm not sure what library would be better suited.
Once you have a more event-driven way to get your file in pieces, you can pass a proc to the render:
render :text => proc { |response, output| ... }
output can be used like an IO object. Some servers may buffer before sending anyway, though, so that's something to look out for.
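For what it's worth, Net::HTTP can hand over the body piece by piece if you pass a block to read_body inside a block-form request, so it may be worth trying before reaching for another library. A minimal sketch follows; the method name `each_body_chunk` is made up for illustration, and redirects, timeouts and error statuses are deliberately not handled.

```ruby
require "net/http"
require "uri"

# Yields the response body of a GET request in chunks as they arrive,
# instead of buffering the whole body in memory first.
def each_body_chunk(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    # The block form of #request exposes the response before the body is read.
    http.request(Net::HTTP::Get.new(uri)) do |response|
      response.read_body { |chunk| yield chunk }
    end
  end
end

# Usage (hypothetical URL), writing each chunk to an IO-like object:
#   each_body_chunk("http://assets.example.com/big.mp4") { |chunk| output.write(chunk) }
```

Each chunk could be written straight to the output object in the render proc above, though whether the web server in front of Rails then flushes it immediately is a separate question.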
It would be easier not to handle the byte-shuffling in Rails.
If your webserver or the proxy in front of your webserver supports the X-REPROXY-URL HTTP header, your application can set that header and your webserver or proxy will stream the file.
Perlbal is the only proxy server I know of that supports that header out of the box.
An Apache2 module is also available.
I'm cleaning up some old Maildir folders, and finding messages with names like:
1095812260.M625118P61205V0300FF04I002DC537_0.redoak.cise.ufl.edu,S=2576:2,ST
They don't show up in my IMAP client, so I presume there's some semaphore indicating the message already got moved somewhere else. Is that the case, and can the files be deleted without remorse?
The 'M' is just part of the unique filename and has nothing to do with the fact that the mail doesn't show up in mail clients.
The 'T' at the end of the filename, after the ':' sign, however, tells the IMAP server that this message is Trashed.
See http://cr.yp.to/proto/maildir.html
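If you want to sort through a lot of these files before deleting anything, the flags can be read straight off the filename. Here is a rough sketch using the flag letters from the page linked above; the method name `maildir_flags` is made up, and it assumes the standard ":2," info suffix.

```ruby
# Flag letters defined for the ":2," info section of a Maildir filename.
MAILDIR_FLAGS = {
  "P" => :passed, "R" => :replied, "S" => :seen,
  "T" => :trashed, "D" => :draft, "F" => :flagged
}.freeze

# Returns the flags encoded after ":2," at the end of the filename,
# or an empty array when the filename has no such suffix.
def maildir_flags(filename)
  info = filename[/:2,([A-Z]*)\z/, 1]
  return [] if info.nil?
  info.chars.map { |c| MAILDIR_FLAGS.fetch(c, :unknown) }
end

name = "1095812260.M625118P61205V0300FF04I002DC537_0.redoak.cise.ufl.edu,S=2576:2,ST"
maildir_flags(name) # => [:seen, :trashed]
```

Note that the ",S=2576" part before the colon belongs to the unique name, not to the flags; only the letters after ":2," are flags.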
IMAP is a protocol for communicating with a message store; the actual storage format is standardised separately. The filename looks like a Maildir filename, and I think Maildir does not attach any meaning to the first part of the filename, but you will have to check your software's manual.