Download large files with Ruby on Rails

My small internal project is something like a file-share portal (like sharerapid) that will be used by about 100 people. I have a problem with downloading large files. Small files (< 200 MB) download fast, but larger files block my server for 2-5 minutes. Maybe the problem is RAM; I have 2 GB. My code to download a file:
def custom_send(userfile)
  file = userfile.attachment.file.url.to_s.split("?").slice(0..-2).join("?")
  send_file "#{Rails.root}/public#{file}", filename: userfile.name, x_sendfile: true
end
I don't know where the problem is. In development mode on my localhost machine it works fine, but the problem occurs on the public virtual server (Ubuntu 12).

What web server are you using? The most likely cause is that the request is blocking further requests in a single-threaded environment.
The best solution to your problem would be to host the files on Amazon S3 and link to them there. If the files must remain local, you could try something more like this:
http://www.therailsway.com/2009/2/22/file-downloads-done-right/
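For the `x_sendfile: true` option in the question's code to actually offload the transfer, Rails also needs the matching header configured for the front-end web server (and the server needs the corresponding module enabled). A minimal sketch of that configuration, assuming Apache with mod_xsendfile or nginx:

```ruby
# config/environments/production.rb
# Hand the file off to the web server so the Ruby process is freed
# immediately instead of streaming the whole file itself.
config.action_dispatch.x_sendfile_header = "X-Sendfile"         # Apache (mod_xsendfile)
# config.action_dispatch.x_sendfile_header = "X-Accel-Redirect" # nginx
```

With this in place, `send_file` emits only the header and the web server streams the bytes, which avoids tying up a single-threaded Rails process for minutes on a large download.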

Related

ActiveStorage image blob disappears

In Rails 5.2.1, I have ActiveStorage (5.2.1) configured for the Disk service.
I have a Pic model:
class Pic < ApplicationRecord
  has_one_attached :image
end
I can attach images:
imgpath = "/tmp/images/..."
Pic.first.image.attach(io: File.open(imgpath), filename: imgpath)
I wanted to do this in something like a Rake task (the result is the same if done from the Rails console) to batch-upload images, for example:
pfs = Dir['testpics/*']
Pic.all.each { |pic|
  pf = pfs.shift
  pic.image.attach(io: File.open(pf), filename: pf)
}
This runs without errors. However, quite surprisingly (to me at least), some images don't have a corresponding blob afterwards, and queries fail with 500 Internal Server Error: Errno::ENOENT (No such file or directory @ rb_sysopen).
Checking pic.image.attached? returns true. However, pic.image.download throws an exception.
Even stranger, calling pic.image.download right after attaching it does work; two seconds later it doesn't.
The only way I could come up with to tell whether an image uploaded correctly is to wait ~2 seconds after attaching it and then try to download it. If I keep reattaching after waiting 2 seconds and checking, all images eventually end up OK. But obviously this is not the right thing to do. :) Simply waiting between attach calls does not help; I have to check after the wait, then reattach, then check again until it is OK. Sometimes it succeeds on the first try, sometimes the 10th, but eventually it does.
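The attach-wait-verify-retry workaround described above can be sketched as a generic helper. The names here are hypothetical; in the real app, `attach` would call `pic.image.attach(...)` and `verify` would attempt `pic.image.download` and rescue `Errno::ENOENT`:

```ruby
# Repeat an attach until a follow-up verification succeeds, or give up.
# attach: a callable that performs the upload.
# verify: a callable that returns true if the blob is still readable.
def attach_with_verification(attach:, verify:, max_tries: 10, wait: 2)
  max_tries.times do
    attach.call
    sleep wait            # give whatever is removing blobs time to act
    return true if verify.call
  end
  false
end
```

This is only a diagnostic crutch, not a fix; as the answer below explains, the real cause was external interference with the storage directory.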
This is all on my local disk, not, for example, ephemeral storage on Heroku. I'm running it on Ubuntu 18.04 (Bionic), with nothing installed that should remove blobs (i.e., no antivirus or similar). I really think the problem is internal to ActiveStorage, or maybe the way I use it.
What's going on? Where do blobs go a few seconds after they were uploaded successfully?
With the S3 service everything is fine, blobs don't disappear.
Wow, I think I figured this out. This is not so much an ActiveStorage issue, but I'll leave the question here in case it's useful for somebody else too.
It turns out the problem was likely Dropbox. :)
What happens is that with the Disk strategy, ActiveStorage stores blobs under two-character directory names inside the storage/ directory, derived from the blob key, similar to a hash. These directories can (and quite often do) differ only in case; for example, there may be both a zu and a Zu directory. The Dropbox client interferes with this: if the storage directory is synced with Dropbox, such directories get renamed, for example "Zu" becomes "zu (Case Conflict)" (so that Dropbox sync works across case-insensitive platforms).
Of course the blobs are then no longer found. This all happens asynchronously: the Dropbox client needs some time to rename things, which is why downloads work for a short while right after attaching an image.
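For reference, the Disk service derives its nested directory names from the first character pairs of the mixed-case blob key, which is why case-only collisions like zu/Zu arise. A rough approximation of the path scheme:

```ruby
# Approximation of ActiveStorage's Disk service layout: a blob with key
# "zuAbCdEf..." is stored under storage/zu/Ab/zuAbCdEf...
# Keys are random mixed-case tokens, so "zu" and "Zu" can both occur.
def folder_for(key)
  [key[0, 2], key[2, 2]].join("/")
end

folder_for("zuAbCdEf")  # => "zu/Ab"
```

Any tool that treats paths case-insensitively (Dropbox sync, a FAT/exFAT volume, macOS's default filesystem) can therefore collide two distinct blob directories.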
So, lesson learned: ActiveStorage's Disk service doesn't work well inside a Dropbox-synced folder.
Now ActiveStorage supports Dropbox directly via a DropboxService; see the activestorage-dropbox gem.
ActiveStorage::Service::DropboxService
Wraps the Dropbox Storage Service as an Active Storage service.
gem 'activestorage-dropbox'
Usage
Declare a Dropbox service in config/storage.yml:
dropbox:
  service: Dropbox
  access_token: ""
Then point Active Storage at it in your environment configuration:
config.active_storage.service = :dropbox
https://rubygems.org/gems/activestorage-dropbox

Paperclip + Rails with load balanced machines

How do I get Paperclip image uploads to work on a Rails app running on 8 machines (load-balanced)?
A user can upload an image on the app. The image is stored on one of the machines. The user later requests the image, but it's not found, because it's being requested from another machine.
What's the workaround for this type of problem? I can't use AWS or any cloud service; images have to be stored in-house.
Thanks.
One solution is to use NFS to mount a shared folder that serves as the root of your public/system (or whatever you call the folder containing Paperclip images).
There are a few things to consider to make everything work, though:
Use a dedicated server that only holds assets, so its hard drive(s) do nothing but serve your Paperclip images.
NFS can be slow. Use it only to write files from your app servers to the asset server. Configure your load balancer, reverse proxy, or web server to fetch all images directly from the asset server, without asking an application server to read them over NFS.
A RAID setup is recommended on your asset server, of course.
A second asset server with the same specs is also recommended. You can make it act as a backup server and regularly rsync your Paperclip images to it. If the master asset server ever goes down, you'll be able to switch to this one.
When mounting the shared NFS folder, use the soft option and mount over a high-speed local network connection, for example: mount -o soft 10.0.0.1:/export/shared_image_folder . If you don't specify the soft option and the asset server goes down, your Ruby instances will keep waiting for the server to come back; everything will hang and the website will look down. Learned this one the hard way...
These are general guidelines for using NFS. I'm using it on a fairly big production website with hundreds of thousands of images, and it works fine for me.
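Once the shared folder is mounted, Paperclip only needs its path pointed at the mount and its url pointed at the asset host. A sketch, with a hypothetical mount point and hostname (the interpolation tokens :class, :id, :style, :filename are standard Paperclip):

```ruby
# app/models/photo.rb (sketch; /mnt/shared_assets and assets.example.com
# are placeholders for your NFS mount point and asset server)
class Photo < ActiveRecord::Base
  has_attached_file :image,
    :path => "/mnt/shared_assets/:class/:id/:style/:filename",
    :url  => "http://assets.example.com/:class/:id/:style/:filename"
end
```

Writes go over NFS to the shared path, while the generated URLs send browsers straight to the asset server, matching the "write over NFS, read directly" split described above.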
If you don't want to use a file share like NFS, you could store the images in your database. Here is a gem that provides a :database storage type for Paperclip:
https://github.com/softace/paperclip_database

How can we upload a large file in chunks in Rails?

I am trying to upload a zip file of 350-500 MB to the server. It fails with an "ENOSPC" error.
Is it possible to upload the file in chunks and receive it on the server as one file?
Or
Can I use a custom location for tmpfs, so that it is independent of the system tmp? In my case tmp is only 128 MB.
Why not use a web-server upload module, like nginx-upload for Nginx or its Apache equivalent?
I'm not sure what it's called in Apache, but I guess Apache has one too.
If you are using Nginx, there is also nginx-upload-progress, which can be helpful if you want to track the progress of the upload.
Hope this helps.
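The question's second idea, relocating the upload buffer to a larger partition, can also work without web-server modules. Rack buffers multipart uploads through Tempfile, which honors Ruby's Dir.tmpdir, and that in turn honors the TMPDIR environment variable. A sketch, assuming a hypothetical /mnt/bigtmp that exists and is writable:

```ruby
require "tmpdir"

# Point Ruby's temp dir (and thus Rack's upload buffering) at a larger
# partition before the app boots. /mnt/bigtmp is a placeholder path.
ENV["TMPDIR"] = "/mnt/bigtmp"
Dir.tmpdir  # returns "/mnt/bigtmp" if that directory exists and is writable
```

Setting TMPDIR in the app server's environment (init script, systemd unit, etc.) achieves the same thing without code changes.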

Recommendations for file server to be used with Rails application

I'm working on a Rails app that accepts file uploads and where users can modify these files later. For example, they can change the text file contents or perform basic manipulations on images such as resizing, cropping, rotating etc.
At the moment the files are stored on the same server where Apache is running with Passenger to serve all application requests.
I need to move user files to a dedicated server to distribute the load on my setup. At the moment our users upload around 10 GB of files per week, which is not a huge amount, but it adds up.
So I'm going through different options for implementing the communication between the application server(s) and a file server. I'd like to start with a simple and fool-proof solution; if it later scales well across multiple file servers, I'd be more than happy.
Here are some of the options I've been investigating:
Amazon S3. I find it a bit difficult to implement for my application: it adds the complexity of "uploading" the uploaded file again (possibly multiple times later); keep in mind that users can modify files and images with my app. Other than that, it would be a nice "set it and forget it" solution.
Some sort of simple RPC server that lives on the file server and transparently manages files from the application server's point of view. I haven't found any standard, well-tested tools here yet, so this is a bit more theoretical in my mind. However, BERT and Ernie, built and used at GitHub, seem interesting, but maybe too complex just to start out.
MogileFS also seems interesting. I haven't seen it in use (but that's my problem :).
So I'm looking for different (and possibly standards-based) approaches to how file servers for web applications are implemented and how they have worked in the wild.
Use S3. It is inexpensive, a-la-carte, and if people start downloading their files, your server won't have to get stressed because your download pages can point directly to the S3 URL of the uploaded file.
"Pedro" has a nice sample application that works with S3 at github.com.
Clone the application ( git clone git://github.com/pedro/paperclip-on-heroku.git )
Make sure that you have the right_aws gem installed.
Put your Amazon S3 credentials (API & secret) into config/s3.yml
Install the Firefox S3 plugin (http://www.s3fox.net/)
Go into Firefox S3 plugin and put in your api & secret.
Use the S3 plugin to create a bucket with a unique name, perhaps 'your-paperclip-demo'.
Edit app/models/user.rb, and put your bucket name on the second last line (:bucket => 'your-paperclip-demo').
Fire up your server locally and upload some files to your local app. You'll see from the S3 plugin that the file was uploaded to Amazon S3 in your new bucket.
I'm usually terribly incompetent or unlucky at getting these kinds of things working, but with Pedro's little S3 upload application I was successful. Good luck.
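Step 7 above edits the model; in that era of Paperclip, the S3 wiring looked roughly like this (the bucket name is from the demo, the attachment name and credentials path are assumptions based on the steps):

```ruby
# app/models/user.rb (sketch)
class User < ActiveRecord::Base
  has_attached_file :avatar,
    :storage        => :s3,
    :s3_credentials => "#{Rails.root}/config/s3.yml",
    :bucket         => 'your-paperclip-demo'
end
```

With :storage => :s3, Paperclip uploads attachments straight to the bucket, and the generated URLs point at S3, so downloads never touch your application servers.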
You could also try to compile a version of Dropbox (they provide the source) and ln -s that to your public/system directory so Paperclip saves to it. That way you can access the files remotely from any desktop as well... I haven't done this yet, so I can't attest to how easy/hard/valuable it is, but it's on my teux deux list... :)
I think S3 is your best bet. With a plugin like Paperclip it's really very easy to add to a Rails application, and not having to worry about scaling it will save on headaches.

Heroku: Serving Large Dynamically-Generated Assets Without a Local Filesystem

I have a question about hosting large dynamically-generated assets and Heroku.
My app will offer bulk download of a subset of its underlying data, which will consist of a large file (>100 MB) generated once every 24 hours. If I were running on a server, I'd just write the file into the public directory.
But as I understand it, this is not possible with Heroku. The /tmp directory can be written to, but the guaranteed lifetime of files there seems to be defined in terms of one request-response cycle, not a background job.
I'd like to use S3 to host the download file. The S3 gem does support streaming uploads, but only for files that already exist on the local filesystem. It looks like the content size needs to be known up-front, which won't be possible in my case.
So this looks like a catch-22. I'm trying to avoid creating a gigantic string in memory when uploading to S3, but S3 only supports streaming uploads for files that already exist on the local filesystem.
Given a Rails app in which I can't write to the local filesystem, how do I serve a large file that's generated daily without creating a large string in memory?
${RAILS_ROOT}/tmp (not /tmp; it's inside your app's directory) lasts for the duration of your process. If you're running a background DJ (Delayed Job) worker, files in tmp will last for the duration of that process.
Actually, the files may last even longer; the reason we say you can't guarantee availability is that tmp isn't shared across servers, and each job/process can run on a different server depending on the cloud's load. You also need to make sure you delete your files when you're done with them, as part of the job.
-Another Heroku employee
Rich,
Have you tried writing the file to ./tmp then streaming the file to S3?
-Blake Mizerany (Heroku)
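Blake's suggestion amounts to: generate the export into the app's tmp directory in chunks, so the whole payload never exists as one in-memory string, and then hand the finished file to the S3 gem's file-based streaming upload. A minimal, framework-free sketch of the chunked write (the `chunks` enumerable stands in for whatever produces the data, e.g. database rows fetched in batches):

```ruby
# Write an export to disk chunk by chunk. `chunks` is any enumerable
# yielding strings; memory usage stays at one chunk at a time.
def write_export(path, chunks)
  File.open(path, "wb") do |file|
    chunks.each { |chunk| file.write(chunk) }
  end
  path
end
```

In the Heroku job, `path` would live under "#{Rails.root}/tmp", and the upload plus a final File.delete would run inside the same background job, since that directory is only guaranteed for the life of the process.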
