after_create file saving callback resulting in intermittent error

In my user model, I have an after_create callback that looks like this:
def set_default_profile_image
  file = Tempfile.new([self.initials, ".jpg"])
  file.binmode
  file.write(Avatarly.generate_avatar(self.full_name, format: "jpg", size: 300))
  begin
    self.profile_image = File.open(file.path)
  ensure
    file.close
    file.unlink
  end
  self.save
end
(self.initials is simply a utility method that returns the user's initials, so that e.g. my profile image would be "HB.jpg".)
If I call the method directly on an existing user, it works maybe 80% of the time. The other times, it gives me an error message so long I can't reproduce it here (I can't even scroll back far enough in tmux to see the start of it). The error message (or what I can see of it, anyway) comprises a list of MIME types, followed by this bit:
content type discovered from file command: application/x-empty. See documentation to allow this combination.
If I create a new user, the callback results in the same error message 100% of the time.
My method uses the Avatarly gem to generate placeholder avatars; the gem yields them in blob form, hence the creation of a Tempfile to write to.
I can't understand why the above error would occur.

Make sure that full_name has a valid return value, and try moving your save call into the begin block. You may be racing between the save and the tempfile being removed/unlinked.
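A sketch of that rearrangement (same method as in the question, with the save happening while the tempfile still exists on disk):
def set_default_profile_image
  file = Tempfile.new([self.initials, ".jpg"])
  file.binmode
  file.write(Avatarly.generate_avatar(self.full_name, format: "jpg", size: 300))
  begin
    self.profile_image = File.open(file.path)
    self.save # save while the tempfile is still on disk
  ensure
    file.close
    file.unlink
  end
end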

What do you expect to happen when you do this?
self.profile_image = File.open(file.path)
Without a block, this is the same as:
self.profile_image = File.new(file.path)
They both return a file object. Is profile_image in the database? I'm pretty sure it is going to be mad that you sent a File object to be persisted. If you want the data from that file in the database, do something like:
self.profile_image = File.open(file.path).read
If you want to save the tempfile's path:
self.profile_image = File.path(file.path)
If you are using the path, remember that you are saving a tempfile, and the file will not last very long!
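As an aside, the block form of File.open closes the handle for you, which avoids leaking a file descriptor; for the read-the-data variant above, that would be:
self.profile_image = File.open(file.path, "rb") { |f| f.read }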

I found the solution in an issue on Paperclip's GitHub. I don't really understand the causes very well, but it seems to be a filesystem issue, where the Tempfile is not yet persisted to disk by the time it gets read into the model.
The solution is to do absolutely anything to the Tempfile before assigning it; file.read works just fine.
def set_default_profile_image
  file = Tempfile.new([self.initials, ".jpg"])
  file.binmode
  file.write(Avatarly.generate_avatar(self.full_name, format: "jpg", size: 300))
  file.read # <-- this fixes the issue
  begin
    self.profile_image = File.open(file.path)
  ensure
    file.close
    file.unlink
  end
  self.save
end
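If the underlying cause really is unflushed write buffering (which would also explain the application/x-empty content type reported by the file command), explicitly flushing before reopening the path should work as well; a minimal variant of the lines above:
file.write(Avatarly.generate_avatar(self.full_name, format: "jpg", size: 300))
file.flush # push the buffered bytes to disk before File.open re-reads the path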

Related

How to specify a prefix when uploading to S3 using activestorage's direct upload?

With a standard S3 configuration:
AWS_ACCESS_KEY_ID: [AWS ID]
AWS_BUCKET: [bucket name]
AWS_REGION: [region]
AWS_SECRET_ACCESS_KEY: [secret]
I can upload a file to S3 (using direct upload) with this Rails 5.2 code (only relevant code shown):
form.file_field :my_asset, direct_upload: true
This will effectively put my asset in the root of my S3 bucket, upon submitting the form.
How can I specify a prefix (e.g. "development/", so that I can mimic a folder on S3)?
2022 update: as of Rails 6.1 (check this commit), this is actually supported:
user.avatar.attach(key: "avatars/#{user.id}.jpg", io: io, content_type: "image/jpeg", filename: "avatar.jpg")
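So, for the "development/" prefix from the question, you can bake the environment into the key. A small sketch (my_asset is the attribute from the question; the other names are illustrative):
user.my_asset.attach(
  key: "#{Rails.env}/#{SecureRandom.uuid}.jpg", # e.g. "development/<uuid>.jpg"
  io: File.open(path),
  filename: "my_asset.jpg",
  content_type: "image/jpeg"
)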
My current workaround (at least until ActiveStorage introduces the option to pass a path to the has_one_attached and has_many_attached macros) on S3 is to use the move_to method.
So I'm letting ActiveStorage save the image to S3 as it normally does right now (at the top of the bucket), then moving the file into a folder structure.
The move_to method essentially copies the file to the key you pass, then deletes the file that was put at the root of the bucket. This way your file ends up where you want it.
So, for instance, if we were storing driver details (name and drivers_license), save them as you're already doing, so that the file lands at the top of the bucket.
Then implement the following (I put mine in a helper):
module DriversHelper
  def restructure_attachment(driver_object, new_structure)
    old_key = driver_object.image.key
    begin
      # Passing S3 configs
      config = YAML.load_file(Rails.root.join('config', 'storage.yml'))
      s3 = Aws::S3::Resource.new(region: config['amazon']['region'],
                                 credentials: Aws::Credentials.new(config['amazon']['access_key_id'],
                                                                   config['amazon']['secret_access_key']))
      # Fetching the license's Aws::S3::Object
      old_obj = s3.bucket(config['amazon']['bucket']).object(old_key)
      # Moving the license into the new folder structure
      old_obj.move_to(bucket: config['amazon']['bucket'], key: new_structure)
      update_blob_key(driver_object, new_structure)
    rescue => ex
      driver_helper_logger.error("Error restructuring license belonging to driver with id #{driver_object.id}: #{ex.full_message}")
    end
  end

  private

  # The new structure becomes the new ActiveStorage Blob key
  def update_blob_key(driver_object, new_key)
    blob = driver_object.image_attachment.blob
    begin
      blob.key = new_key
      blob.save!
    rescue => ex
      driver_helper_logger.error("Error reassigning the new key to the blob object of the driver with id #{driver_object.id}: #{ex.full_message}")
    end
  end

  def driver_helper_logger
    @driver_helper_logger ||= Logger.new("#{Rails.root}/log/driver_helper.log")
  end
end
It's important to update the blob key so that references to the key don't return errors.
If the key is not updated, any function attempting to reference the image will look for it in its former location (at the top of the bucket) rather than in its new location.
I'm calling this function from my controller as soon as the file is saved (that is, in the create action) so that it looks seamless even though it isn't.
While this may not be the best way, it works for now.
FYI: Based on the example you gave, the new_structure variable would be new_structure = "development/#{driver_object.image.key}".
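Hypothetical controller usage, per the description above (the model, params, and attachment names are assumptions):
class DriversController < ApplicationController
  include DriversHelper

  def create
    @driver = Driver.create!(driver_params)
    # Move the freshly uploaded file under the desired prefix right after save
    restructure_attachment(@driver, "development/#{@driver.image.key}")
  end
end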
I hope this helps! :)
Thank you, Sonia, for your answer.
I tried your solution and it works great, but I encountered problems with overwriting attachments: I often got an IntegrityError while doing it. I think this, along with checksum handling, may be why the Rails core team doesn't want to add a pass-a-pathname feature; it would require changing the entire logic of the upload method.
The ActiveStorage::Attached#create_from_blob method can also accept an ActiveStorage::Blob object, so I tried a different approach:
Create a Blob manually with a key that represents the desired file structure, and upload the attachment to it.
Attach the created Blob with the ActiveStorage method.
In my usage, the solution was something like that:
def attach(file) # method for attaching in the model
  blob_key = destination_pathname(file)
  blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
  unless blob
    blob = ActiveStorage::Blob.new.tap do |blob|
      blob.filename = blob_key.basename.to_s
      blob.key = blob_key
      blob.upload file
      blob.save!
    end
  end
  # Attach method from ActiveStorage
  self.file.attach blob
end
Thanks to passing a full pathname as the Blob's key, I got the desired file structure on the server.
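destination_pathname isn't shown in the answer; since the code above calls blob_key.basename, it presumably returns a Pathname. A hypothetical implementation, assuming this lives in a model with an id:
def destination_pathname(file)
  # The desired folder structure becomes the blob key, e.g. "uploads/42/report.pdf"
  Pathname.new("uploads/#{id}/#{file.original_filename}")
end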
Sorry, that’s not currently possible. I’d suggest creating a bucket for Active Storage to use exclusively.
The above solution will still give an IntegrityError; you need to use File.open(file). Thanks, though, for the idea.
class History < ApplicationRecord
  has_one_attached :gs_history_file

  def attach(file) # method for attaching in the model
    blob_key = destination_pathname(file)
    blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
    unless blob
      blob = ActiveStorage::Blob.new.tap do |blob|
        blob.filename = blob_key.to_s
        blob.key = blob_key
        # blob.byte_size = 123123
        # blob.checksum = Time.new.strftime("%Y%m%d-") + Faker::Alphanumeric.alpha(6)
        blob.upload File.open(file)
        blob.save!
      end
    end
    # Attach method from ActiveStorage
    self.gs_history_file.attach blob
  end

  def destination_pathname(file)
    "testing/filename-#{Time.now}.xlsx"
  end
end
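Hypothetical usage of the model above (the attach method opens the path itself via File.open):
history = History.new
history.attach("tmp/report.xlsx") # any local file path
history.save!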

How can I ZIP and stream many files without appending to memory on Rails5/Ruby2.4? [duplicate]

I need to serve some data from my database in a zip file, streaming it on the fly such that:
I do not write a temporary file to disk
I do not compose the whole file in RAM
I know that I can do streaming generation of zip files to the filesystem using ZipOutputStream as here. I also know that I can do streaming output from a Rails controller by setting response_body to a Proc as here. What I need (I think) is a way of plugging those two things together. Can I make Rails serve a response from a ZipOutputStream? Can I get ZipOutputStream to give me incremental chunks of data that I can feed into my response_body Proc? Or is there another way?
Short Version
https://github.com/fringd/zipline
Long Version
jo5h's answer didn't work for me in Rails 3.1.1, but I found a YouTube video that helped:
http://www.youtube.com/watch?v=K0XvnspdPsc
The crux of it is creating an object that responds to each. This is what I did:
class ZipGenerator
  def initialize(model)
    @model = model
  end

  def each(&block)
    output = Object.new
    output.define_singleton_method :tell, Proc.new { 0 }
    output.define_singleton_method :pos=, Proc.new { |x| 0 }
    output.define_singleton_method :<<, Proc.new { |x| block.call(x) }
    output.define_singleton_method :close, Proc.new { nil }
    Zip::IoZip.open(output) do |zip|
      @model.attachments.all.each do |attachment|
        zip.put_next_entry "#{attachment.name}.pdf"
        file = attachment.file.file.send :file
        file = File.open(file) if file.is_a? String
        while buffer = file.read(2048)
          zip << buffer
        end
      end
    end
    sleep 10
  end
end

def getzip
  self.response_body = ZipGenerator.new(@model)
  # this is a hack to prevent middleware from buffering
  headers['Last-Modified'] = Time.now.to_s
end
EDIT:
The above solution didn't ACTUALLY work... the problem is that rubyzip needs to jump around the file to rewrite the headers for entries as it goes. In particular, it needs to write the compressed size BEFORE it writes the data, and that is just not possible in a truly streaming situation... so ultimately this task may be impossible. There is a chance that it might be possible to buffer a whole file at a time, but that seemed less worth it. Ultimately I just wrote to a tmp file (on Heroku I can write to Rails.root/tmp): less instant feedback, and not ideal, but necessary.
ANOTHER EDIT:
I got another idea recently... we COULD know the compressed size of the files if we do not compress them. The plan goes something like this:
subclass the ZipOutputStream class as follows:
always use the "stored" compression method, in other words do not compress
ensure we never seek backwards to change file headers, get it all right up front
rewrite any code related to TOC that seeks
I haven't tried to implement this yet, but will report back if there's any success.
OK, ONE LAST EDIT:
In the zip standard: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
they mention that there's a bit you can flip to put the size, compressed size, and CRC AFTER a file. So my new plan was to subclass ZipOutputStream so that it:
sets this flag
writes sizes and CRCs after the data
never rewinds output
Furthermore, I needed to fix up all the hacks required to stream output in Rails...
Anyway, it all worked!
here's a gem!
https://github.com/fringd/zipline
I had a similar issue. I didn't need to stream directly, but only had your first case of not wanting to write a temp file. You can easily modify ZipOutputStream to accept an IO object instead of just a filename.
module Zip
  class IOOutputStream < ZipOutputStream
    def initialize(io)
      super '-'
      @outputStream = io
    end

    def stream
      @outputStream
    end
  end
end
From there, it should just be a matter of using the new Zip::IOOutputStream in your Proc. In your controller, you'd probably do something like:
self.response_body = proc do |response, output|
  Zip::IOOutputStream.open(output) do |zip|
    my_files.each do |file|
      zip.put_next_entry file
      zip << IO.read(file)
    end
  end
end
It is now possible to do this directly:
class SomeController < ApplicationController
  def some_action
    compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
      zos.put_next_entry "some/filename.ext"
      zos.print data
    end
    compressed_filestream.rewind
    respond_to do |format|
      format.zip do
        send_data compressed_filestream.read, filename: "some.zip"
      end
    end
    # or some other return of send_data
  end
end
This is the link you want:
http://info.michael-simons.eu/2008/01/21/using-rubyzip-to-create-zip-files-on-the-fly/
It builds and generates the zipfile using ZipOutputStream and then uses send_file to send it directly out from the controller.
Use chunked HTTP transfer encoding for output: set the HTTP header "Transfer-Encoding: chunked" and restructure the output according to the chunked encoding specification, so there is no need to know the resulting ZIP file's size at the beginning of the transfer. This can be coded easily in Ruby with the help of Open3.popen3 and threads.
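A rough sketch of that idea (hypothetical controller; assumes a zip binary on the server and leaves the actual chunked framing to Rack):
require "open3"

class ZipStreamController < ApplicationController
  def download
    response.headers["Content-Type"] = "application/zip"
    response.headers["Last-Modified"] = Time.now.httpdate # discourage middleware buffering
    paths = ["a.txt", "b.txt"] # hypothetical files to bundle

    # Stream the zip tool's stdout straight into the response body
    self.response_body = Enumerator.new do |out|
      Open3.popen3("zip", "-q", "-", *paths) do |_stdin, stdout, _stderr, _wait|
        while (chunk = stdout.read(16 * 1024))
          out << chunk
        end
      end
    end
  end
end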

Carrierwave & Zipfiles: Using an extracted file as a version

Something I'm not getting about the version process...
I have a zip file with a file inside, and I want to upload the file as a "version" of the zip:
Uploader:
version :specificFile do
  process :extract_file
end

def extract_file
  file = nil
  Zip::ZipFile.open(current_path) do |zip_file|
    file = zip_file.select { |f| f.name.match(/specificFile/) }.first
    zip_file.extract(file, "tmp/" + file.name.gsub("/", "-")) { true }
  end
  File.open("tmp/" + file.name.gsub("/", "-"))
end
Usage:
=link_to "Specific File", instance.uploader.specificFile.url
Only this just nets me two copies of the zip. Clearly, there's something I'm missing about how version / process works, and I haven't been able to find documentation that actually explains the magic.
So how do I do this, and what am I missing?
This provided the "why", although it took a bit to understand:
How do you create a new file in a CarrierWave process?
To rephrase: when you go to create a version, CarrierWave makes a copy of the file and then passes the process the file path. When the process exits, CarrierWave uploads the contents of that path, not the file object the process returns, which is what I thought was going on.
Working code:
version :specificFile do
  process :extract_file
  def full_filename(for_file = model.logo.file)
    "SpecificFile.ext"
  end
end

def extract_file
  file = nil
  Zip::ZipFile.open(current_path) do |zip_file|
    file = zip_file.select { |f| f.name.match(/specificFile/) }.first
    zip_file.extract(file, "tmp/" + file.name.gsub("/", "-")) { true }
  end
  File.delete(current_path)
  FileUtils.cp("tmp/" + file.name.gsub("/", "-"), current_path)
end
So, to make what I want to happen, happen, I:
Tell carrierwave to use a particular filename. I'm using a hardcoded value but you should be able to use whatever you want.
Overwrite the contents of current_path with the contents you want under the version name. In my case, I can't just overwrite the zip while I'm "in it" (I think), so I make a copy of the file I care about and overwrite the zip via File and FileUtils.
PS - It would be nice to avoid the duplication of the zip, but it doesn't look like you can tell carrierwave to skip the duplication.

Generating a CSV and uploading it to S3 when finished in a background job

I'm providing users with the ability to download an extremely large amount of data via CSV. To do this, I'm using Sidekiq and putting the task off into a background job once they've initiated it. What I've done in the background job is generate a csv containing all of the proper data, storing it in /tmp and then call save! on my model, passing the location of the file to the paperclip attribute which then goes off and is stored in S3.
All of this works perfectly fine locally. My problem now lies with Heroku and its practice of storing files only for a short duration, dependent on what node you're on. My background job is unable to find the tmp file that gets saved because of how Heroku deals with these files. I guess I'm searching for a better way to do this. If there's some way that everything can be done in-memory, that would be awesome. The only problem is that Paperclip expects an actual file object as an attribute when you're saving the model. Here's what my background job looks like:
class CsvWorker
  include Sidekiq::Worker

  def perform(report_id)
    puts "Starting the jobz!"
    report = Report.find(report_id)
    items = query_ranged_downloads(report.start_date, report.end_date)
    csv = compile_csv(items)
    update_report(report.id, csv)
  end

  def update_report(report_id, csv)
    report = Report.find(report_id)
    report.update_attributes(csv: csv, status: true)
    report.save!
  end

  def compile_csv(items)
    clean_items = items.compact
    path = File.new("#{Rails.root}/tmp/uploads/downloads_by_title_#{Process.pid}.csv", "w")
    csv_string = CSV.open(path, "w") do |csv|
      csv << ["Item Name", "Parent", "Download Count"]
      clean_items.each do |row|
        if !row.item.nil? && !row.item.parent.nil?
          csv << [
            row.item.name,
            row.item.parent.name,
            row.download_count
          ]
        end
      end
    end
    return path
  end
end
I've omitted the query method for readability's sake.
I don't think Heroku's temporary file storage is the problem here. The warnings around that mostly center on the facts that a) dynos are ephemeral, so anything you write can and will disappear without notice; and b) dynos are interchangeable, so the presence of inter-request tempfiles is a matter of luck when you have more than one web dyno running. However, in no situation do temporary files just vanish while your worker is running.
One thing I notice is that you're actually creating two temporary files with the same name:
> path = File.new("/tmp/filename", "w")
=> #<File:/tmp/filename>
> path.fileno
=> 3
> CSV.open(path, "w") do |csv| csv << %w(foo bar baz); puts csv.fileno end
4
=> nil
You could change the path = line to just set the filename (instead of opening it for writing), and then make update_report open the filename for reading. I haven't dug into what Paperclip does when you give it an empty, already-overwritten, opened-for-writing file handle, but changing that flow may well fix the issue.
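A sketch of that change, reusing the worker's method names from the question:
def compile_csv(items)
  clean_items = items.compact
  path = "#{Rails.root}/tmp/uploads/downloads_by_title_#{Process.pid}.csv" # just the filename now
  CSV.open(path, "w") do |csv|
    csv << ["Item Name", "Parent", "Download Count"]
    clean_items.each do |row|
      csv << [row.item.name, row.item.parent.name, row.download_count] if row.item && row.item.parent
    end
  end
  path
end

def update_report(report_id, csv_path)
  report = Report.find(report_id)
  File.open(csv_path, "r") do |f| # open for reading only when handing the file to Paperclip
    report.update_attributes(csv: f, status: true)
  end
end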
Alternately, you could do this in memory instead: generate the CSV as a string and give it to Paperclip as a StringIO. (Paperclip supports certain non-file objects, including StringIOs, using e.g. Paperclip::StringioAdapter.) Try something like:
# returns a CSV as a string
def compile_csv(items)
  CSV.generate do |csv|
    # ...
  end
end

def update_report(report_id, csv)
  report = Report.find(report_id)
  report.update_attributes(csv: StringIO.new(csv), status: true)
  report.save!
end

Length of uploaded file in Ruby on Rails decreases after UploadedFile.read

On a RoR app that I've inherited, a test involving a file upload is failing. The assertion that fails looks like this:
assert_equal File.size("#{RAILS_ROOT}/test/fixtures/#{filename}"), @candidate.picture.length
It fails with (the test file is 69 bytes):
<69> expected but was <5>.
This is after a post using:
fixture_file_upload(filename, content_type, :binary)
In the candidate model, the uploaded file is assigned to a property that is then saved to a mediumblob in MySQL. It looks to me like the uploaded file is 69 bytes, but after it is assigned to the model property (using UploadedFile.read), the length is showing as only 5 bytes.
So this code:
puts "file.length=" + file.length.to_s
self.picture = file.read
puts "self.picture.length=" + self.picture.length.to_s
results in this output:
file.length=69
self.picture.length=5
I'm at a bit of a loss as to why this is, any ideas?
This came down to a Windows/Ruby idiosyncrasy, where reading the file appeared to be happening in text mode. There is an extension in this app in test_helper, something like:
class ActionController::TestUploadedFile
  # Awkward but necessary for testing since an ActionController::UploadedFile subtype is expected
  include ActionController::UploadedFile

  def read
    tempfile = File.new(self.path)
    tempfile.read
  end
end
And apparently, on Windows, there is a specific IO method that can be called to force the file into binary mode. Calling this method on the tempfile, like so:
tempfile.binmode
caused everything to work as expected, with the read from the UploadedFile matching the size of the fixture file on disk.
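Putting it together, the patched extension from the test_helper becomes:
class ActionController::TestUploadedFile
  include ActionController::UploadedFile

  def read
    tempfile = File.new(self.path)
    tempfile.binmode # force binary mode so Windows doesn't do text-mode translation
    tempfile.read
  end
end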
