rails as proxy for remote file download - ruby-on-rails

I have a Rails application at e.g. example.com. I am using a cloud storage provider for all kinds of files (videos, images, ...).
Now I would like to make them available for download without exposing the URL of the actual storage location.
So I was thinking of a kind of proxy: a simple controller which could look like this:
require 'open-uri'

data = open(params[:file])
filename = "#{RAILS_ROOT}/tmp/my_temp_file"
File.open(filename, 'wb') do |f|  # 'wb' creates the file; 'r+' fails if it doesn't exist
  f.write data.read
end
send_file filename, ...options...
(code taken from a link).
The point is that I would have to download the file first.
So I was wondering if it would be possible to stream the file right away, without downloading it from the cloud storage first.
best
philip

I was working on this exact issue a while ago and came to the conclusion that it is not possible without downloading the file to your server and then passing it on to the client, as you say.
I'd recommend generating a signed, expiring download link that you insert into a hidden iframe whenever a user clicks a download link on your page. That way they get the experience of downloading from your page, without the file making an unnecessary round trip through your server.
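For what it's worth, a minimal sketch of generating such a link with the aws-sdk-s3 gem (the bucket name, region, and helper name are placeholders, not from the question):
require 'aws-sdk-s3'

def signed_download_url(key)
  s3 = Aws::S3::Resource.new(region: 'us-east-1')   # your bucket's region
  # The URL is valid for 5 minutes; after that it goes dead,
  # so the permanent storage location is never exposed.
  s3.bucket('my-bucket').object(key).presigned_url(:get, expires_in: 300)
end
The view can then point a hidden iframe's src at this URL when the user clicks download, as described above.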

Related

Creating a dashboard using csv files

I am trying to create a dashboard using CSV files, Highcharts.js, and HTML5. In a local development environment I can render the charts using CSVs both on my file system and hosted on the web. The current goal is to deploy the dashboard live on Heroku.
The CSVs will be updated manually - for now - once per day in a consistent format as required by Highcharts. The web application should be able to render the charts with these new, "standardized" CSVs whenever the dashboard page is requested. My question is: where do I host these CSVs? Do I use S3? Do I keep them on my local file system and manually push the updates to Heroku daily? If the CSVs are hosted on another machine, is there a way for my application (and only my application) to access them securely?
Thanks!
Use the carrierwave_direct gem to upload the file directly from the client to an Amazon S3 bucket.
https://github.com/dwilkie/carrierwave_direct
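For reference, mounting the gem looks roughly like this (the uploader and model names are made up for the example):
class CsvUploader < CarrierWave::Uploader::Base
  include CarrierWaveDirect::Uploader   # adds the direct-to-S3 upload support
end

class Report < ActiveRecord::Base
  mount_uploader :csv, CsvUploader
end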
You basically give the trusted, logged-in client a temporary key to upload the file and nothing else, and the client then returns information about the uploaded file to your web app. Make sure you have set the upload to be private to prevent any third parties from trying to brute-force the CSV's location. You will then need to create a background worker to do the actual work on the CSV file. The gem has some good docs on how to do this.
https://github.com/dwilkie/carrierwave_direct#processing-and-referencing-files-in-a-background-process
In short, in the background process you download the file temporarily to Heroku, parse it, extract the data you need, and then discard the Heroku copy and, if you want, the copy on S3. This way you get around Heroku's lack of permanent file storage, and the issue of tied-up dynos with direct uploads, because there is nothing like NGINX handling file uploads on Heroku.
Also make sure that the file size does not exceed the available memory of your worker dyno, otherwise it will crash. Since you don't seem to need to worry about concurrency, I would suggest https://github.com/resque/resque.
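A sketch of what that background step could look like with Resque and the aws-sdk-s3 gem (the bucket name, queue name, and parsing step are all placeholders):
require 'resque'
require 'aws-sdk-s3'
require 'csv'
require 'tempfile'

class ProcessCsvJob
  @queue = :csv

  def self.perform(key)
    s3 = Aws::S3::Client.new(region: 'us-east-1')
    # Stream the private object into a tempfile so it only lives on the
    # dyno for the duration of the job (Heroku's filesystem is ephemeral anyway).
    Tempfile.create(['upload', '.csv']) do |tmp|
      s3.get_object(bucket: 'my-bucket', key: key) { |chunk| tmp.write(chunk) }
      tmp.rewind
      CSV.foreach(tmp.path, headers: true) do |row|
        # extract the data you need from each row here
      end
    end
    # optionally discard the S3 copy afterwards:
    # s3.delete_object(bucket: 'my-bucket', key: key)
  end
end
You would enqueue it once the client reports the uploaded file's key, e.g. Resque.enqueue(ProcessCsvJob, params[:key]).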

Rails 3.1 : How to download a PDF housed on Linode to client computer?

I am trying to understand what needs to be done to initiate a download of files that are housed on a server different from the one the Rails app is on. Specifically, my Rails app is on Heroku and the PDFs I want to make available for download are on Linode. I would like the client to get the PDFs when he/she clicks on a download button in the web app.
My first attempt was with send_file :type => 'application/pdf', :x_sendfile => true.
But this is obviously wrong, because it can only serve files stored locally on the web-app server.
Is there, therefore, a way to get Linode to send the data directly to the client? Or
do I have to download the file locally and then call send_file? (yuck!!!)
Would any changes be required on the Linode end? I have a web service running there on Tomcat.
Thanks in advance for your help,
Abhinav
You can try reading the file into a data stream and using send_data. Here is a Ruby forum discussion of a similar idea; the post also describes your "yuck" option (#2, downloading locally). Unfortunately I cannot speak to the Linode changes.
Also, here is an SO post about a similar issue.
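A sketch of that send_data approach, using open-uri to pull the PDF from Linode (the URL helper is a placeholder):
require 'open-uri'

def download
  url = linode_pdf_url(params[:id])   # however you derive the remote URL
  data = open(url).read               # the whole file passes through your dyno's memory
  send_data data,
            :filename    => "#{params[:id]}.pdf",
            :type        => 'application/pdf',
            :disposition => 'attachment'
end
Note that this still ties up a Ruby process (and memory) for the duration of the transfer, which is why redirecting the client straight to the remote host is usually preferred on Heroku.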

Carrierwave + Fog (S3). Letting users download the file

I am working on a project where I have to provide users with download links for files which are stored in S3. Initially I tried:
link_to "Download", @medium.file.url
But this opens the file directly in the browser. When I tried to download an mp4 file, Chrome started playing it automatically. I don't want that to happen, so I am using send_file for this task:
def download
  @medium = Medium.find(params[:id])
  send_file @medium.file.url
end
Locally I have set the storage to file and tested this, and it works perfectly fine. But on staging, where the files are served from S3, I always get ActionController::MissingFile. My app is hosted on Heroku. I would also like to know whether send_file is a good choice here or whether there is a better way of doing this.
While googling I found that the Nginx header setting for sending files should be enabled in production, so I added the following line to config/environments/production.rb, but still no luck:
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'
I need some help on this. Thanks.
I think this is because S3 serves the file with headers that let the browser play it inline.
I don't have access to my S3 here, but I solved the problem by changing the action for PDF files (which were being shown inline in my case) in the S3 control panel.
If you can fix this, I think you can avoid using send_file completely.
Edit: I managed to access S3 now. Go to Properties -> Metadata and add the key
Content-Disposition: Attachment
Then your file will always be downloaded instead of shown.
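If you'd rather set that header at upload time instead of in the console, CarrierWave's fog storage can attach it for you; a sketch, assuming a fog/S3 setup like the question's:
CarrierWave.configure do |config|
  # Every file uploaded through CarrierWave gets this S3 metadata,
  # so S3 serves it with Content-Disposition: attachment.
  config.fog_attributes = { 'Content-Disposition' => 'attachment' }
end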

where is the best place to save images from users upload

I have a website that shows galleries. Users can upload their own content from the web (by entering a URL) or by uploading a picture from their computer.
I am storing the URL in the database, which works fine for the first use case, but I need to figure out where to store the actual images if a user does an upload from their computer.
Is there any recommendation here or best practice on where I should store these?
Should I save them in the App_Data or Content folders? Should they not be stored with the website at all because it's user content?
You should NOT store user uploads anywhere they can be directly accessed by a known URL within your site structure. This is a security risk: users could upload .htm and .js files, and even a file with the correct extension can contain malicious code that is executed in the context of your site by an authenticated user, enabling server-side or client-side attacks.
See for example http://www.acunetix.com/websitesecurity/upload-forms-threat.htm and What security issues appear when users can upload their own files? which mention some of the issues you need to be aware of before you allow users to upload files and then present them for download within your site.
Don't put the files within your normal web site directory structure
Don't use the original file name the user gave you. You can add a Content-Disposition header with the original file name so they can download it again under the same name, but the path and file name on the server shouldn't be something the user can influence.
Don't trust image files - resize them and offer only the resized version for subsequent download
Don't trust MIME types or file extensions; open the file and manipulate it to make sure it's what it claims to be.
Limit the upload size and time.
Depending on the resources you have to implement something like this, it is extremely beneficial to store all this stuff in Amazon S3.
Once you receive the upload you simply push it over to Amazon and pop the URL in your database, as you're doing with the other images. As mentioned above, it would probably be wise to open up the image and resize it before sending it over. This both checks that it is actually an image and makes sure you don't accidentally present a full-camera-resolution image to an end user.
Doing this now will make it much, much easier if you ever have to migrate/failover your site and don't want to sync gigabytes of image assets.
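A sketch of that open-and-resize step with the MiniMagick gem (the gem choice and dimensions are mine, not the answer's):
require 'mini_magick'

def normalize_image!(path)
  image = MiniMagick::Image.open(path)  # processing raises if this isn't a readable image
  image.resize '1600x1600>'             # '>' means shrink only if larger
  image.format 'jpg'                    # re-encode, discarding any embedded payload
  image.write path
end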
One way is to store the image in a database table with a varbinary field.
Another way would be to store the image in the App_Data folder, and create a subfolder for each user (~/App_Data/[userid]/myImage.png).
For both approaches you'd need to create a separate action method that makes it possible to access the images.
When uploading images you need to verify the content of the file before accepting it; the file-extension method is not trustworthy.
Use the magic-number method to verify the file content, which is an easy approach.
See the Stack Overflow post and the list of magic numbers.
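As a sketch (in Ruby, to match the rest of this page), a magic-number check against a few well-known signatures could look like this:
MAGIC_BYTES = {
  "\xFF\xD8\xFF".b      => :jpg,
  "\x89PNG\r\n\x1A\n".b => :png,
  'GIF87a'.b            => :gif,
  'GIF89a'.b            => :gif
}.freeze

def sniff_type(path)
  header = File.binread(path, 8) || ''.b  # 8 bytes cover all the signatures above
  MAGIC_BYTES.each { |magic, type| return type if header.start_with?(magic) }
  nil  # unknown signature: reject the upload
end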
One way of saving the file is to convert it to binary and save it in your database; another is to use the App_Data folder. Which storage option to use depends on your requirements. See this post as well.
Set an upload limit by setting the maxRequestLength property in Web.Config like this, where the file size is specified in KB:
<httpRuntime maxRequestLength="51200" executionTimeout="3600" />
You can keep your trusted data alongside (outside) the htdocs/www folder so that users cannot access it directly. If you are using Apache, you can also add .htaccess authentication on your trusted data (keep the .htpasswd file outside the htdocs/www folder as well).

Alternative to X-sendfile in Apache for sending file given a URL?

I'm writing a Rails application that serves files stored on a remote server to the end user.
In my case the files are stored on S3, but the user requests the file via the Rails application (hiding the actual URL). If the file were on my server's local file system, I could use the Apache X-Sendfile header to free up the Ruby process for other requests while Apache took over the task of sending the file to the client. But in my case, where the file is not on the local file system but on S3, it seems that I'm forced to download it temporarily inside Rails before sending it to the client.
Isn't there a way for Apache to serve a "remote" file to a client when the file is not actually on the server itself? I don't mind if Apache has to download the file for this to work, as long as I don't have to tie up the Ruby process while it's going on.
Any suggestions?
Thomas, I have similar requirements/issues and I think I can answer your problem. First (and I'm not 100% sure you care about this part), hiding the S3 URL is quite easy, as Amazon allows you to point CNAMEs at your bucket and use a custom URL instead of the Amazon URL. To do that, you need to point your DNS at the correct Amazon hostname. When I set mine up it was similar to this: files.domain.com points to files.domain.com.s3.amazonaws.com. Then you need to create the bucket with the name of your custom URL (files.domain.com in this example). How to call that URL will differ depending on which gem you use, but a word of warning: the attachment_fu plugin I was using was incorrectly sending me to files.domain.com/files.domain.com/name_of_file.... I couldn't find the setting to fix it, so a simple .sub on the S3 portion of the plugin fixed it.
On to your other questions: to execute some Rails code (like recording the hit in the DB) before downloading, you can simply do this:
def download
  file = File.find(...)  # look up your file record
  # code to record 'hit' to database
  redirect_to S3Object.url_for(file.filename,
                               bucket,
                               :expires_in => 3.hours)
end
That code will still cause the file to be served by S3, while still giving you the ability to run some Ruby. (Of course the above code won't work as is; you will need to point it at the correct file and bucket, and my Amazon keys are saved in a config file. The above also uses the syntax of the AWS::S3 gem - http://amazon.rubyforge.org/.)
Second, the Content-Disposition: attachment issue is a bit trickier. Hopefully your situation is a bit simpler than mine and the following solution will work. Assuming the object 'file' (in this example) is the correct S3 object, you can set the disposition to attachment with:
file.content_disposition = "attachment"
file.save
The above code can be executed after the file already exists on the S3 server (unlike some other headers and permissions), which is nice, and it can also be applied when you upload the file (the syntax depends on your plugin). I'm still trying to find a way to tell S3 to send it as an attachment only when requested (not every time); if you find one, please let me know your solution. I need to be able to sometimes download a file and other times embed it (an image, for example) into HTML. I'm not using the above-mentioned redirect, but fortunately it seems that if you embed a file that has the Content-Disposition: attachment header (in an HTML image tag, say), the browser still displays the image normally (though I haven't thoroughly tested that across enough browsers to send it into the wild).
Hope that helps! Good luck.
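On the "only when requested" part the answer above was still missing: S3 supports overriding Content-Disposition per request on a signed GET via the response-content-disposition parameter, so the stored object needs no permanent header. A sketch with the modern aws-sdk-s3 gem (not the old AWS::S3 gem used above; bucket and key are placeholders):
require 'aws-sdk-s3'

object = Aws::S3::Resource.new(region: 'us-east-1')
                          .bucket('my-bucket')
                          .object('path/to/file.pdf')
# Attachment behaviour applies to this signed URL only; a plain GET
# (an <img> tag, say) still displays the object inline.
url = object.presigned_url(:get,
                           expires_in: 300,
                           response_content_disposition: 'attachment; filename="file.pdf"')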
