Migrating data from EC2 to S3? - ruby-on-rails

I am running a Rails app that uses Paperclip to take care of file attachments, image resizing, etc. The app is currently hosted on EngineYard cloud, and all attachments are stored in their EBS. I'm thinking about using S3 to handle all Paperclip attachments.
Does anyone know of a good and safe way to do this migration? Many thanks!

You could work up a rake task that iterates over your attachments and pushes each to S3. I used the one below a while back with attachment_fu; it wouldn't be too different for Paperclip. It uses the aws-s3 gem.
Basically the process is:
1. Select files from the database that need to be moved
2. Push them to S3
3. Update database to reflect that the file is no longer stored locally (this way you can do them in batches and don't need to worry about pushing the same file twice).
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => S3_CONFIG[:access_key_id],
  :secret_access_key => S3_CONFIG[:secret_access_key]
)

@attachments = Attachment.stored_locally

@attachments.each do |attachment|
  base_path = RAILS_ROOT + '/public/assets/'
  attachment_folder = ((attachment.respond_to?(:parent_id) && attachment.parent_id) || attachment.id).to_s
  full_filename = File.join(base_path, ("%08d" % attachment_folder).scan(/..../), attachment.filename)

  AWS::S3::S3Object.store(
    'assets/' + attachment_folder + '/' + attachment.filename,
    File.open(full_filename),
    S3_CONFIG[:bucket_name],
    :content_type => attachment.content_type,
    :access => :private
  )

  if AWS::S3::Service.response.success?
    # Update the database
    attachment.update_attribute(:stored_on_s3, true)
    # Remove the file from the local filesystem
    FileUtils.rm full_filename
    # Remove the directory too if it is now empty
    Dir.rmdir(File.dirname(full_filename)) if (Dir.entries(File.dirname(full_filename)) - ['.', '..']).empty?
  else
    puts "There was a problem uploading " + full_filename
  end
end
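For this to work you need somewhere to record which files have already been moved. A minimal sketch of that plumbing, assuming you're free to add a boolean column and a named scope (the stored_on_s3 column and stored_locally scope are the names the task above already references; the migration and class shown here are mine, in Rails 2.x style to match the RAILS_ROOT usage):
class AddStoredOnS3ToAttachments < ActiveRecord::Migration
  def self.up
    add_column :attachments, :stored_on_s3, :boolean, :default => false, :null => false
  end

  def self.down
    remove_column :attachments, :stored_on_s3
  end
end

# In app/models/attachment.rb
class Attachment < ActiveRecord::Base
  # Only attachments that haven't been pushed to S3 yet
  named_scope :stored_locally, :conditions => { :stored_on_s3 => false }
end
This also makes the task safely re-runnable: each run only picks up attachments that are still local.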

I found myself in the same situation, so I took bensie's code and made it work for me. This is what I came up with:
require 'aws/s3'

# Ensure you do the following:
# export AMAZON_ACCESS_KEY_ID='your-access-key'
# export AMAZON_SECRET_ACCESS_KEY='your-secret-word-thingy'
AWS::S3::Base.establish_connection!

@failed = []
@attachments = Asset.all # Asset's Paperclip attachment is: has_attached_file :attachment....

@attachments.each do |asset|
  begin
    puts "Processing #{asset.id}"
    base_path = RAILS_ROOT + '/public/'
    attachment_folder = ((asset.respond_to?(:parent_id) && asset.parent_id) || asset.id).to_s
    styles = asset.attachment.styles.keys
    styles << :original
    styles.each do |style|
      full_filename = File.join(base_path, asset.attachment.url(style, false))
      AWS::S3::S3Object.store(
        'attachments/' + attachment_folder + '/' + style.to_s + "/" + asset.attachment_file_name,
        File.open(full_filename),
        "swellnet-assets",
        :content_type => asset.attachment_content_type,
        :access => (style == :original ? :private : :public_read)
      )
      if AWS::S3::Service.response.success?
        puts "Stored #{asset.id}[#{style.to_s}] on S3..."
      else
        puts "There was a problem uploading " + full_filename
      end
    end
  rescue
    puts "Error with #{asset.id}"
    @failed << asset.id
  end
end

puts "Failed uploads: #{@failed.join(", ")}" unless @failed.empty?
Of course, if you have multiple models you will need to adjust as necessary...

Related

Export a large amount of data in a zip

I'm exporting some data from my server to the client.
It's a zip archive, but when the amount of data is too big: timeout!
# On my controller
def export
  filename = 'my_archive.zip'
  temp_file = Tempfile.new(filename)
  begin
    # Initialize the temp file as an empty zip archive
    Zip::OutputStream.open(temp_file) { |zos| }
    Zip::File.open(temp_file.path, Zip::File::CREATE) do |zip|
      @videos.each do |v|
        video_file_name = v.title + '.mp4'
        zip.add(video_file_name, v.source.file.path(:original))
      end
    end
    zip_data = File.read(temp_file.path)
    send_data(zip_data, :type => 'application/zip', :filename => filename)
  ensure
    temp_file.close
    temp_file.unlink
  end
end
I'm using Paperclip to attach the videos in my app.
Is there any way to create and send the zip (with a stream?) without too long a wait?
You could try the zipline gem. It claims to be "Hacks on Hacks on Hacks", so heads up! It looks very easy to use, though, and is worth a shot.
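A minimal sketch of what that might look like here, assuming zipline's include Zipline / zipline controller helper works as its README shows, and reusing the @videos collection and v.source attachment from the question:
class VideosController < ApplicationController
  include Zipline

  def export
    @videos = Video.all
    # Pairs of [attachment, name inside the zip]; zipline streams each entry
    # as the archive is built instead of buffering the whole zip in a temp
    # file, so the response starts immediately rather than timing out.
    files = @videos.map { |v| [v.source, "#{v.title}.mp4"] }
    zipline(files, 'my_archive.zip')
  end
end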

Need to change the storage "directory" of files in an S3 Bucket (Carrierwave / Fog)

I am using Carrierwave with 3 separate models to upload photos to S3. I kept the default settings for the uploader, which was to store photos in a root S3 bucket. I then decided to store them in sub-directories according to model name, like avatars/, items/, etc., based on the model they were uploaded from...
Then, I noticed that files of the same name were being overwritten and when I deleted a model record, the photo wasn't being deleted.
I've since changed the store_dir from an uploader-specific setup like this:
def store_dir
  "items"
end
to a generic one which stores photos under the model ID (I use Mongo, FYI):
def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
Here comes the problem. I am trying to move all the photos already in S3 into the proper "directory" within S3. From what I've read, S3 doesn't have directories per se. I'm having trouble with the rake task: since I changed the store_dir, Carrierwave is looking for all the previously uploaded photos in the wrong directory.
namespace :pics do
  desc "Fix directory location of pictures on s3"
  task :item_update => :environment do
    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => 'XXXX',
      :aws_secret_access_key => 'XXX'
    })
    directory = connection.directories.get("myapp-uploads-dev")
    Recipe.all.each do |l|
      if l.images.count > 0
        l.items.each do |i|
          if i.picture.path.to_s != ""
            new_full_path = i.picture.path.to_s
            filename = new_full_path.split('/')[-1].split('?')[0]
            thumb_filename = "thumb_#{filename}"
            original_file_path = "items/#{filename}"
            puts "attempting to retrieve: #{original_file_path}"
            original_thumb_file_path = "items/#{thumb_filename}"
            photo = directory.files.get(original_file_path) rescue nil
            if photo
              puts "we found: #{original_file_path}"
              photo.expires = 2.years.from_now.httpdate
              photo.key = new_full_path
              photo.save
              thumb_photo = directory.files.get(original_thumb_file_path) rescue nil
              if thumb_photo
                puts "we found: #{original_thumb_file_path}"
                thumb_photo.expires = 2.years.from_now.httpdate
                thumb_photo.key = "/uploads/item/picture/#{i.id}/#{thumb_filename}"
                thumb_photo.save
              end
            end
          end
        end
      end
    end
  end
end
So I'm looping through all the recipes, looking for items with photos, determining the old Carrierwave path, and trying to update it with the new one based on the store_dir change. I thought that if I simply updated photo.key with the new path it would work, but it doesn't.
What am I doing wrong? Is there a better way to accomplish this?
Here's what I did to get this working...
namespace :pics do
  desc "Fix directory location of pictures"
  task :item_update => :environment do
    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => 'XXX',
      :aws_secret_access_key => 'XXX'
    })
    bucket = "myapp-uploads-dev"
    puts "Using bucket: #{bucket}"
    Recipe.all.each do |l|
      if l.images.count > 0
        l.items.each do |i|
          if i.picture.path.to_s != ""
            new_full_path = i.picture.path.to_s
            filename = new_full_path.split('/')[-1].split('?')[0]
            thumb_filename = "thumb_#{filename}"
            original_file_path = "items/#{filename}"
            original_thumb_file_path = "items/#{thumb_filename}"
            puts "attempting to retrieve: #{original_file_path}"
            # copy the original item
            begin
              connection.copy_object(bucket, original_file_path, bucket, new_full_path, 'x-amz-acl' => 'public-read')
              puts "we just copied: #{original_file_path}"
            rescue
              puts "couldn't find: #{original_file_path}"
            end
            # copy the thumb
            begin
              connection.copy_object(bucket, original_thumb_file_path, bucket, "uploads/item/picture/#{i.id}/#{thumb_filename}", 'x-amz-acl' => 'public-read')
              puts "we just copied: #{original_thumb_file_path}"
            rescue
              puts "couldn't find thumb: #{original_thumb_file_path}"
            end
          end
        end
      end
    end
  end
end
Perhaps not the prettiest thing in the world, but it worked.
You need to interact with the S3 objects directly to move them. You'll probably want to look at copy_object and delete_object in the Fog gem, which is what CarrierWave uses to interact with S3.
https://github.com/fog/fog/blob/8ca8a059b2f5dd2abc232dd2d2104fe6d8c41919/lib/fog/aws/requests/storage/copy_object.rb
https://github.com/fog/fog/blob/8ca8a059b2f5dd2abc232dd2d2104fe6d8c41919/lib/fog/aws/requests/storage/delete_object.rb
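To illustrate, a minimal sketch of a "move" with those two calls: S3 has no real move operation, so you copy to the new key and then delete the old one. The bucket name is the one from the question; the key names are hypothetical placeholders:
require 'fog'

connection = Fog::Storage.new(
  :provider              => 'AWS',
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)

bucket  = 'myapp-uploads-dev'
old_key = 'items/photo.jpg'                    # hypothetical old location
new_key = 'uploads/item/picture/42/photo.jpg'  # hypothetical new location

# Copy to the new key first; only remove the original once the copy succeeds.
connection.copy_object(bucket, old_key, bucket, new_key, 'x-amz-acl' => 'public-read')
connection.delete_object(bucket, old_key)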

Migrating paperclip S3 images to new url/path format

Is there a recommended technique for migrating a large set of paperclip S3 images to a new :url and :path format?
The reason for this is that after upgrading to Rails 3.1, new versions of thumbs are not shown after cropping (the previously cached version is shown). This is because the filename no longer changes (since asset_timestamp was removed in Rails 3.1). I'm using :fingerprint in the url/path format, but this is generated from the original, which doesn't change when cropping.
I was intending to insert :updated_at in the url/path format and update attachment.updated_at during cropping, but after implementing that change all existing images would need to be moved to their new location. That's around half a million images to rename on S3.
At this point I'm considering copying them to their new location first, then deploying the code change, then moving any images which were missed (i.e. uploaded after the copy), but I'm hoping there's an easier way... any suggestions?
I had to change my Paperclip path in order to support image cropping, so I ended up creating a rake task to help out.
namespace :paperclip_migration do
  desc 'Migrate data'
  task :migrate_s3 => :environment do
    # Make sure that all of the models have been loaded so any attachments are registered
    puts 'Loading models...'
    Dir[Rails.root.join('app', 'models', '**/*')].each { |file| File.basename(file, '.rb').camelize.constantize }

    # Iterate through all of the registered attachments
    puts 'Migrating attachments...'
    attachment_registry.each_definition do |klass, name, options|
      puts "Migrating #{klass}: #{name}"
      klass.find_each(batch_size: 100) do |instance|
        attachment = instance.send(name)
        unless attachment.blank?
          attachment.styles.each do |style_name, style|
            old_path = interpolator.interpolate(old_path_option, attachment, style_name)
            new_path = interpolator.interpolate(new_path_option, attachment, style_name)
            # puts "#{style_name}:\n\told: #{old_path}\n\tnew: #{new_path}"
            s3_copy(s3_bucket, old_path, new_path)
          end
        end
      end
    end

    puts 'Completed migration.'
  end

  #############################################################################

  private

  # Paperclip configuration
  def attachment_registry
    Paperclip::AttachmentRegistry
  end

  def s3_bucket
    ENV['S3_BUCKET']
  end

  def old_path_option
    ':class/:id_partition/:attachment/:hash.:extension'
  end

  def new_path_option
    ':class/:attachment/:id_partition/:style/:filename'
  end

  def interpolator
    Paperclip::Interpolations
  end

  # S3
  def s3
    AWS::S3.new(access_key_id: ENV['S3_KEY'], secret_access_key: ENV['S3_SECRET'])
  end

  def s3_copy(bucket, source, destination)
    source_object = s3.buckets[bucket].objects[source]
    destination_object = source_object.copy_to(destination, {metadata: source_object.metadata.to_h})
    destination_object.acl = source_object.acl
    puts "Copied #{source}"
  rescue Exception => e
    puts "*Unable to copy #{source} - #{e.message}"
  end
end
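Note that the task reads its configuration from the environment, so a hypothetical invocation would look like S3_BUCKET=my-bucket S3_KEY=<access key> S3_SECRET=<secret> rake paperclip_migration:migrate_s3, with old_path_option and new_path_option adjusted to match your app's old and new :path formats.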
I didn't find a feasible method for migrating to a new url format. I ended up overriding Paperclip::Attachment#generate_fingerprint so that it appends :updated_at.
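A rough sketch of one way to do that kind of override, assuming a Paperclip version that computes fingerprints via Attachment#generate_fingerprint(source) (check your gem's source; the initializer file name here is hypothetical). Since cropping touches the record's updated_at, the fingerprint, and with it the URL, changes:
# config/initializers/paperclip_fingerprint.rb
module Paperclip
  class Attachment
    alias_method :original_generate_fingerprint, :generate_fingerprint

    # Append the owning record's updated_at so that re-cropping
    # (which bumps updated_at) yields a new fingerprint and URL.
    def generate_fingerprint(source)
      "#{original_generate_fingerprint(source)}#{instance.updated_at.to_i}"
    end
  end
end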

PaperClip -- save a new attachment without deleting the old one

I am using Paperclip (with Amazon S3) on Rails 3. I want to attach a new file to my model without replacing the old file. I don't want the old file to be accessible; I only want to have it there on S3 as a backup.
Do you know if there is a way of telling Paperclip to take care of this itself?
In post.rb I have:
has_attached_file :sound,
  :storage => :s3,
  :s3_credentials => "....",
  :styles => {:mp3 => {:format => :mp3}},
  :processors => [:sound_processor],
  :s3_host_alias => '....',
  :bucket => '....',
  :path => ":attachment/:id/:style/out.:extension",
  :url => ":s3_alias_url"
and the processor is as follows:
class Paperclip::SoundProcessor < Paperclip::Processor
  def initialize(file, options = {}, attachment = nil)
    super
    @format = options[:format] || "mp3"
    @current_format = File.extname(@file.path)
    @basename = File.basename(@file.path, @current_format)
  end

  def make
    src = @file
    dst = Tempfile.new([@basename, ".#{@format}"])
    dst.binmode

    cmd = "ffmpeg -y -ab 128k -t 600 -i #{File.expand_path(src.path)} #{File.expand_path(dst.path)}"
    Paperclip.log(cmd)
    out = `#{cmd}`
    raise Paperclip::PaperclipError, "processor does not accept the given audio file" unless $?.exitstatus == 0

    dst
  end
end
Here's what I do: I timestamp the filename before saving it (to prevent another file with the same name from overwriting the original), and force Paperclip to never delete anything.
before_create :timestamp_filename

def timestamp_filename
  fname = Time.now.to_s(:db).gsub(/[^0-9]/, '') + '_' + sound_file_name
  sound.instance_write(:file_name, fname)
end

# Override Paperclip's destroy-files method
# to always keep the files around
def destroy_attached_files
  true
end
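For what it's worth, with this in place a file uploaded as out.mp3 at 2011-01-02 12:34:56 is stored as 20110102123456_out.mp3 (Time.now.to_s(:db) with the non-digits stripped, prepended to the filename), and since destroy_attached_files is now a no-op, later uploads and record deletions leave every earlier file untouched on S3.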
It's quite straightforward to add versioning to your models; recently I used CarrierWave and paper_trail in combination to achieve this. We were deploying to Heroku, so S3 was in the mix too.
The reason I'm posting this as an answer is that, although it's controversial, I don't think libraries like Paperclip should support backing up files; a library dedicated to solving that problem feels better to me personally.
Here is a good resource that you can take a look at: http://eggsonbread.com/2009/07/23/file-versioning-in-ruby-on-rails-with-paperclip-acts_as_versioned/
It creates new versions of a file. Hope this helps.

Rails 3 - Tempfile Path?

I have the following:
attachments.each do |a|
  Rails.logger.info a.filename
  tempfile = Tempfile.new("#{a.filename}", "#{Rails.root.to_s}/tmp/")
  Rails.logger.info tempfile.path
end
Where attachments is from paperclip.
Here's the output:
billgates.jpg
/Users/bhellman/Sites/cline/tmp/billgates.jpg20101204-17402-of0u9o-0
Why is the file name getting 20101204-17402-of0u9o-0 appended to the end? That's breaking everything with Paperclip, etc. Has anyone seen this before? For the life of me, I have no idea what's doing it.
Thanks
UPDATE
Paperclip: Paperclip on github
a is the attachment file
tempfile = Tempfile.new("#{a.filename}", "#{Rails.root.to_s}/tmp/")
tempfile << a.body
tempfile.puts
attachments.build(
  :attachment => File.open(tempfile.path)
)
Best to make sure your tempfile has the correct extension up front, saving you from trying to change it afterwards:
file = Tempfile.new(['hello', '.jpg'])
file.path # => something like: "/tmp/hello2843-8392-92849382--0.jpg"
More here: http://apidock.com/ruby/v1_9_3_125/Tempfile/new/class
The first argument to Tempfile.new is just a basename. To make sure each Tempfile is unique, those extra characters are appended to the end of the filename.
You should use Paperclip's API for this:
tempfiles = []
attachments.each do |a|
  # use Attachment#to_file to get a file back (:filesystem => File, :s3 => Tempfile)
  tempfiles << a.to_file
end

tempfiles.each do |tf|
  Rails.logger.debug tf.filename
end

attachment = attachments.build(
  :attachment => File.open(tempfile.path)
)
# change the displayed file name stored in the db record here
attachment.attachment_file_name = a.filename # or whatever else you like
attachment.save!
The best way I found to deal with this was to specify the file extension in the Paperclip attribute. For example:
has_attached_file :picture,
  :url => "/system/:hash.jpg",
  :hash_secret => "long_secret_string",
  :storage => :s3,
  :s3_credentials => "#{Rails.root}/config/s3.yml"
Note that the :url is declared with '.jpg' rather than the traditional .:extension.
Good luck!
