Performance issues uploading large text files via Paperclip - ruby-on-rails

I'm in the process of upgrading from Ruby 1.8.7 to 1.9.3 and from Rails 2.3 to 3.2. As part of that upgrade, I'm moving from Paperclip 2.2.9 to 3.5.2. My ImageMagick version is 6.8.6. One issue I've discovered during the upgrade is that upload performance is very poor for large (~1 MB) text files. The files in question do not specifically need to be .txt files; anything in plain text format (.xml files, for example) is also affected.
For your reference, here is my Paperclip setup:
has_attached_file :attachment,
:url => "/shared_documents/:id/:basename.:extension",
:path => ":rails_root/user_uploaded_content/shared_documents/:id/:basename.:extension"
For simplicity, I'm omitting our validations and so on as we are simply checking file size and presence.
Watching the top processes running on my development machine, it seems that the bottleneck occurs when Paperclip is calling ImageMagick's identify command. Calling identify on a variety of files through the command line has allowed me to verify that metadata is returned almost immediately for image files but large, non-image text files take a very long time to process.
For my application, I am allowing users to upload documents in whatever format they like, so I must be able to efficiently process both images and text files. Has anyone else encountered this issue? Is there a way to selectively disable calling identify on certain file formats in Paperclip but not others? Failing that, we could live with simply not calling identify at all if that is an option. Or perhaps there is a way to configure ImageMagick to handle large text files more gracefully?

If you're not actually post-processing the files, just tell Paperclip not to post-process them. According to the Paperclip documentation, you can do this in a couple of ways. One is to supply an empty hash of styles in the model:
has_attached_file :attachment,
styles:{},
url: "/shared_documents/:id/:basename.:extension",
path: ":rails_root/user_uploaded_content/shared_documents/:id/:basename.:extension"
Or you can simply supply no processors:
has_attached_file :attachment,
processors:[],
url: "/shared_documents/:id/:basename.:extension",
path: ":rails_root/user_uploaded_content/shared_documents/:id/:basename.:extension"
Or you could use the before_post_process callback in your model and return false to halt processing; note, however, that Paperclip may call identify first to validate the file, which would make this option pointless for your situation:
has_attached_file :attachment,
url: "/shared_documents/:id/:basename.:extension",
path: ":rails_root/user_uploaded_content/shared_documents/:id/:basename.:extension"
before_post_process :skip_processing
def skip_processing
false
end
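Since you want to skip identify only for certain formats, a variation on the callback approach is to return false for anything that doesn't claim to be an image. This is a minimal sketch that assumes the attachment_content_type column is populated and trustworthy; whether identify is still run for validation depends on your Paperclip version:
before_post_process :image_attachment?

# Only post-process files whose content type says they are images;
# plain text, XML, etc. skip the ImageMagick step entirely.
def image_attachment?
  attachment_content_type.to_s.start_with?("image/")
end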

Related

Serving dynamically generated images in Rails 6.1

My application dynamically generates images using the gruff gem and then serves these images. The images are stored in app/assets/images/gruff. The images are altered during runtime (keeping the same filename but altering their contents), since these images contain a bar graph containing information for the user that changes over time. The images are generated if they are missing, and there is no requirement for them to exist for long periods of time. The images are served using image_tag.
The issue is that there are 2 types of intermittent problems that occur. One is that the image link generated will sometimes have the wrong fingerprint, even though the images will have just been updated before the image_tag is rendered. (Thus 404'ing.)
The second issue is that even though I can verify the image exists on the server, I still occasionally get an ActionView::Template::Error (The asset "gruff/image-1.png" is not present in the asset pipeline.):
If this should be working, how can I further dig into understanding what the issue is?
If the asset pipeline is simply not a good mechanism for what I'm trying to do, what would the community suggest instead?
A better approach for your problem is to save the generated image with Active Storage. Create a model like this:
class DynamicImage < ApplicationRecord
has_one_attached :image
end
Then create a record whenever you use gruff to generate an image.
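A minimal sketch of what that could look like, assuming the DynamicImage model above and illustrative chart data and filenames:
# Render the chart with gruff to a temporary file, then attach it to an
# Active Storage record instead of writing into app/assets/images.
require 'gruff'

graph = Gruff::Bar.new
graph.title = 'User activity'
graph.data(:visits, [10, 20, 30])

tmp_path = Rails.root.join('tmp', 'user-activity.png').to_s
graph.write(tmp_path)

record = DynamicImage.create!
record.image.attach(
  io: File.open(tmp_path),
  filename: 'user-activity.png',
  content_type: 'image/png'
)
In the view you can then render it with image_tag record.image (or url_for(record.image)), which bypasses the asset pipeline and its fingerprinting entirely.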

Paperclip: Validate/process attachment before upload

Paperclip offers nice validator methods like
validates :image, attachment_size: { in: 0..2.megabytes }
My problem is that attachment files get uploaded to S3 even when the validators add errors to the hosting object. So if the image is too big, it still gets uploaded, and the ActiveRecord object just ends up with errors on it after validation. That's okay, but in my situation it would be cleaner to reject uploads that are too big before anything is sent.
Is there a way to tap into the process and prevent a file from being uploaded to S3 under certain conditions?
Currently my implementation checks for the errors and deletes the attachment afterwards if the hosting object is not valid.
The described situation refers to a Rails 4.0 application using Ruby 2.0.
The described problem does not occur in more recent Paperclip versions (the most recent version at the time of writing is 4.2). Files won't be uploaded to S3 when validations have added errors to the ActiveRecord object.
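If you are stuck on an older Paperclip version, one way to reject oversized files before Paperclip ever sees them (and therefore before anything is pushed to S3) is a plain size check on the uploaded file in the controller. This is only a sketch; the controller, model, and parameter names are assumptions:
# Illustrative controller-side guard: reject the file before it is assigned
# to the model, so Paperclip never has anything to flush to S3.
class ImagesController < ApplicationController
  MAX_ATTACHMENT_SIZE = 2.megabytes

  def create
    file = params[:image] && params[:image][:attachment]

    if file && file.size > MAX_ATTACHMENT_SIZE
      @image = Image.new
      @image.errors.add(:attachment, "is too large (maximum is 2 MB)")
      render :new
    else
      @image = Image.new(image_params)
      @image.save ? redirect_to(@image) : render(:new)
    end
  end

  private

  def image_params
    params.require(:image).permit(:attachment)
  end
end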

Rubyzip and zipruby for zip/unzip not creating much difference in the compressed zipfile size

I have been using rubyzip to zip/unzip files and folders ranging from 20 MB to 1 GB. I noticed that after zipping a 20 MB folder, the created zipfile is almost the same size, roughly 20 MB. So does rubyzip just store the files, or does it actually compress them? I expected the compressed file to be no more than 40%-50% of the original size. I even tried using system(zip, archive, Dir["#{path}/**/**"]), but I guess I am unable to get the correct syntax to call it. So my questions are:
Why is rubyzip unable to create a zip file that is actually smaller in size?
For a zipfile of more than 500 MB, how can I send it to the client using send_file, given that a file of that size is going to cause performance problems? What if I place that 500 MB+ zip in the public folder and let the web server serve it, which might improve performance; am I correct?
Are there any other options besides rubyzip/zipruby (which require libraries too)?
I am using Ruby 1.9 and Rails 2.3.
My code:
require 'zip/zip'
require 'fileutils'
require 'zip/zipfilesystem'

def self.compress_test(path)
  path = "#{RAILS_ROOT}/answers"
  path.sub!(%r[/$], '')
  archive = File.join(path, File.basename(path)) + '.zip'
  FileUtils.rm archive, :force => true
  Zip::ZipFile.open(archive, 'w') do |zipfile|
    Dir["#{path}/**/**"].reject { |f| f == archive }.each do |file|
      begin
        zipfile.add(file.sub(path + '/', ''), file)
      rescue Zip::ZipEntryExistsError
      end
    end
  end
end
Why is rubyzip unable to create a zip file that is actually smaller in size?
This varies a lot depending on what files you are trying to compress. Text and xls files will compress reasonably well. Media files in formats like JPEG, PNG, and MPEG are already compressed internally, so zipping them often leaves them at around 99% of their original size. They are usually bigger than the other files in the same folder, so the result of compressing a folder containing some images, text, and spreadsheets will not seem much smaller. Compressing a .zip file can even make the end result larger than what you started with.
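A quick way to check what you are actually getting (a rough sketch using the path and archive variables from your method) is to compare the total size of the inputs with the size of the resulting archive:
# Compare raw input size with archive size to see the real compression ratio.
files = Dir["#{path}/**/**"].select { |f| File.file?(f) }
input_size = files.inject(0) { |sum, f| sum + File.size(f) }

puts "input:   #{input_size} bytes"
puts "archive: #{File.size(archive)} bytes"
puts "ratio:   #{(100.0 * File.size(archive) / input_size).round}%"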
For a zipfile of more than 500 MB, how can I send it to the client using send_file, given that a file of that size is going to cause performance problems? What if I place that 500 MB+ zip in the public folder and let the web server serve it, which might improve performance; am I correct?
Yes, saving a large file to disk and letting the web server send it may be more efficient. The easiest thing to do would be to save the file to a folder it can be served from, and provide a link. You could also serve it from a different server (e.g. a lighttpd instance dedicated to serving the large files) to avoid loading your application server.
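As a rough sketch (the directory and file names are made up), you could write the archive somewhere under public/ and hand the client a plain link, so the front-end web server streams the bytes instead of the Rails process:
require 'fileutils'

# Build the archive under public/ so the web server can serve it directly.
downloads_dir = File.join(RAILS_ROOT, 'public', 'downloads')
FileUtils.mkdir_p(downloads_dir)
archive = File.join(downloads_dir, 'answers.zip')
# ... write the zip to `archive` here, e.g. with Zip::ZipFile.open(archive, 'w') ...

# In the view, just link to the static path; Rails never touches the file.
link_to 'Download answers.zip', '/downloads/answers.zip'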
There are some setups that allow you to pass control back from a Ruby process to e.g. Apache (X-Sendfile is one I know of), but setting that up would be a different question; it depends on what web server you have and whether you have security concerns.
Are there any other options besides rubyzip/zipruby (which require libraries too)?
Probably yes, but it is not likely you will find solutions to your two other questions by changing which gem you are using, because rubyzip is doing a reasonable job here - it is not failing.

Extracting uploaded archive to S3 with CarrierWave on Heroku

I want to do something that I thought would be a simple task:
Have a form with these controls:
File upload for one file
Checkbox if this file should be extracted
Text input where I would specify which file I should link to (required only if the checkbox is checked) - index_file
After submitting form:
If the checkbox isn't checked, upload the file via CarrierWave to S3 to the specified store_dir
If the checkbox is checked, extract all files from the archive (I expect only ZIP archives; I need to keep the directory structure), upload extracted files to the specified store_dir and set the index_file in database (I don't need to save to database anything about other extracted files)
As I have found, it isn't an easy task because of Heroku's limitations. These files will be large (hundreds of MiB or a few GiB), so I don't want to re-download the file from S3 if possible.
I think that using Delayed Job or Resque might work, but I'm not exactly sure how to do it or what the best solution to my problem is.
Does anyone have any idea how to solve this while using as few resources as possible? I can change CarrierWave to another uploader (Paperclip etc.) and change my hosting provider too if it isn't possible on Heroku.
I was also thinking about using CloudFlare, would this still work without problems?
Thank you for answers.
Based on this Heroku support email, it would seem that the /tmp directory is many gigabytes in size. You just need to clean up after yourself, so Heroku as a platform is not the issue.
A couple of articles may help you solve the problem:
https://github.com/jnicklas/carrierwave/wiki/How-to%3A-Make-Carrierwave-work-on-Heroku - which explains how to configure your app to use the /tmp directory as the cache directory for CarrierWave. Pay attention to the following line:
use Rack::Static, :urls => ['/carrierwave'], :root => 'tmp' # adding this line
This instructs Rack to serve /carrierwave/xyz from the /tmp directory (useful for storing images temporarily).
Then, using the uploader.cache! method, you can deliberately cache the inbound uploaded file. Once cached, you can do checks to determine whether to call the uploader.store! method, which will promote the contents to S3 (assuming you configured S3 as the store for CarrierWave).
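A minimal sketch of that cache!/store! flow, assuming a CarrierWave uploader class called DocumentUploader and a local file path; the zip extraction itself is left out:
# Cache the inbound file locally (under the tmp-backed cache dir), decide
# what to do with it, and only then promote it to S3.
uploader = DocumentUploader.new                  # hypothetical uploader class
uploader.cache!(File.open('/tmp/upload.zip'))    # writes into the cache dir on /tmp

# ... inspect or extract the cached file, run your checks, etc. ...

uploader.store!                                  # promotes the cached file to S3
Running this in a Delayed Job or Resque worker, as you suggested, keeps the web dyno free while the extraction and upload happen.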

Uploading/Downloading files - Ruby On Rails system

I'm attempting to create a simple file hosting system using Ruby on Rails. I have a large amount of the system set up (including the registration of new files, and so on), however I've realised there is a bit of a problem: I'm unsure how to actually let users upload and download files.
I assume I'd need some kind of file_link attribute for my file object, but how would people upload and download files to/from the server?
Also (this may be a slightly different topic), how would I get file information such as file size and name (as I need them for the upload)?
Sorry for all my questions; I don't really deal with file handling a lot, so I'm new to the area.
Thanks In Advance,
Regards,
Joe
You should look at the Paperclip gem: https://github.com/thoughtbot/paperclip
It is very easy to use and allows you to upload files.
Look at Paperclip. It does a lot of the heavy lifting for you: https://github.com/thoughtbot/paperclip
As they said, look at Paperclip. I just built an app that allows users to upload and delete files. To get started with Paperclip, see http://railscasts.com/episodes/134-paperclip
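For context, the download action below assumes a model with a Paperclip attachment named uploaded, roughly like this (a sketch, not the exact code from my app); the upload side is just a multipart form with a file_field for :uploaded:
# Paperclip expects uploaded_file_name, uploaded_content_type and
# uploaded_file_size columns on the uploads table for an attachment
# named :uploaded.
class Upload < ActiveRecord::Base
  has_attached_file :uploaded
end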
To download files after uploading them with Paperclip, I did the following in the controller:
def download
  upload = Upload.find(params[:id])
  send_file upload.uploaded.path,
            :filename => upload.uploaded_file_name,
            :type => upload.uploaded_content_type,
            :disposition => 'attachment'
  flash[:notice] = "Your file has been downloaded"
end
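To wire that action up you also need a member route and a link, along these lines (Rails 3 routing syntax shown; adjust to your Rails version):
# config/routes.rb
resources :uploads do
  member { get :download }
end

# in a view
link_to upload.uploaded_file_name, download_upload_path(upload)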
My sample file upload app should be of help: https://github.com/skillachie/File-Upload-App
I need to fix a few things, but the ability to upload and download files is completely functional.
