Activestorage - transfer all assets from one bucket to another bucket - ruby-on-rails

I'm switching my app from Vultr to DigitalOcean. Right now I have a bucket configured on Vultr along with the former server. When I try to access my ActiveStorage images on Vultr from DigitalOcean, the images load only 10% of the time and most requests result in 502 errors.
Since I'm moving this app away from Vultr completely, I think it would be a good idea to transfer my app's image assets over to a DigitalOcean bucket.
I've found a lot of posts and a couple of blogs with migration scripts, but they focus on migrating from local disk to a bucket. I haven't found anything on moving from one bucket to another.
I have no idea how to do this. Has anyone ever moved from one bucket to another? If so, how did you do it?

We also wanted to do an online, incremental migration from one ActiveStorage backend to another. This is some of the extracted code that handled it for us.
It iterates through each blob, copying the file and updating the blob to reference the new service. It leaves the originals intact so you can switch back if there is a problem.
We didn't bother copying any of the variants, instead opting to just let them regenerate as needed, but the code to copy them would probably be similar.
# :service_a and :service_b should be top-level keys from `config/storage.yml`
source_service = ActiveStorage::Blob.services.fetch(:service_a)
destination_service = ActiveStorage::Blob.services.fetch(:service_b)

ActiveStorage::Blob.where(service_name: source_service.name).find_each do |blob|
  key = blob.key
  raise "I can't find blob #{blob.id} (#{key})" unless source_service.exist?(key)

  # Skip the copy if an earlier run already uploaded this key
  unless destination_service.exist?(key)
    source_service.open(key, checksum: blob.checksum) do |file|
      destination_service.upload(key, file, checksum: blob.checksum)
    end
  end

  # Point the blob at the new service without running callbacks or validations
  blob.update_columns(service_name: destination_service.name)
end
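The loop above can be re-run safely: a blob that was already copied is skipped, and the original stays in the source bucket. Here is a minimal sketch of that behaviour using in-memory stand-ins. MemoryService, Blob and migrate_blob are hypothetical names made up for illustration; they are not Active Storage classes and only model the three service calls the loop uses (checksums are left out).

```ruby
require "stringio"

# Hypothetical in-memory stand-in for a storage service.
class MemoryService
  attr_reader :name

  def initialize(name, files = {})
    @name = name
    @files = files
  end

  def exist?(key)
    @files.key?(key)
  end

  # Yields an IO for the stored content, like the block form used above.
  def open(key)
    yield StringIO.new(@files.fetch(key))
  end

  def upload(key, io)
    @files[key] = io.read
  end
end

# Hypothetical stand-in for a blob record.
Blob = Struct.new(:key, :service_name)

# Copies one blob unless the destination already has it, then repoints it.
# Returns :copied or :skipped so a caller can tally progress.
def migrate_blob(blob, source, destination)
  raise "I can't find blob #{blob.key}" unless source.exist?(blob.key)

  result = :skipped
  unless destination.exist?(blob.key)
    source.open(blob.key) { |file| destination.upload(blob.key, file) }
    result = :copied
  end
  blob.service_name = destination.name
  result
end

source = MemoryService.new("service_a", { "abc123" => "image bytes" })
destination = MemoryService.new("service_b")
blob = Blob.new("abc123", "service_a")

first_run = migrate_blob(blob, source, destination)
second_run = migrate_blob(blob, source, destination)
```

The second call does no copying because the key is already present at the destination, which is what makes an interrupted migration resumable.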

Related

ActiveStorage image blob disappears

In Rails 5.2.1, I have ActiveStorage (5.2.1) configured for the Disk service.
I have a Pic model:
class Pic < ApplicationRecord
  has_one_attached :image
end
I can attach images:
imgpath = "/tmp/images/..."
Pic.first.image.attach(io: File.open(imgpath), filename: imgpath)
I wanted to do this in something like a Rake task (but the result is the same if done from the Rails console) to batch-upload images, like for example:
pfs = Dir['testpics/*']
Pic.all.each { |pic|
  pf = pfs.shift
  pic.image.attach(io: File.open(pf), filename: pf)
}
This runs without errors. However, quite surprisingly (to me at least), some images don't have a corresponding blob afterwards, and queries fail with 500 Internal Server Error: Errno::ENOENT (No such file or directory @ rb_sysopen).
Checking pic.image.attached? returns true. However, pic.image.download throws an exception.
Even stranger, calling pic.image.download right after attaching it does work. 2 seconds later it doesn't.
The only way I could come up with to tell if an image uploaded correctly is to wait ~2 seconds after attaching it, and then try to download. If I keep retrying the attach after waiting 2 seconds and checking if it's ok, all images will be ok. But obviously this is not the right thing to do. :) Simply waiting between attach calls does not help, I have to check after the wait, then reattach and then check again until it is ok - sometimes ok on the first try, sometimes 10th, but eventually it will succeed.
This is all on my local disk, not for example ephemeral storage in Heroku. Also I'm running it on Ubuntu 18.04 (Bionic), with nothing installed that should remove blobs (i.e. no antivirus or similar). I really think the problem is internal to ActiveStorage, or maybe the way I use it.
What's going on? Where do blobs go after a few seconds, when they were already uploaded successfully?
With the S3 service everything is fine, blobs don't disappear.
Wow I think I figured this out. This is not so much of an ActiveStorage issue, but I will leave the question here in case it might be useful for somebody else too.
It turns out the problem was likely Dropbox. :)
What happens is with the Disk strategy, ActiveStorage stores identifiers of two characters in the storage/ directory - similar to a hash. These can (and quite often do) happen to only differ in case, like for example there is a zu and a Zu directory. The way the Dropbox client interferes with this is that if all of this is in a directory that is synced with Dropbox, these directories will get renamed, for example "Zu" will become "zu (Case Conflict)" (so that Dropbox sync works across platforms).
Of course the blobs are not found anymore, and this all happens async, the Dropbox client needs some time to rename stuff, that's why it works for a while right after attaching an image.
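If you want to check whether a storage directory is exposed to this kind of rename, one option is to look for sibling directory names that differ only in letter case. A rough sketch, assuming plain Ruby and a hypothetical helper named case_conflicts (the sample names are made up):

```ruby
# Returns groups of names that collide on a case-insensitive filesystem
# or sync client, i.e. names that are equal once downcased.
def case_conflicts(names)
  names.group_by(&:downcase).values.select { |group| group.size > 1 }
end

# Sample names for illustration. Against a real app you might pass
# Dir.children("storage") instead (path assumed, not taken from the question).
conflicts = case_conflicts(%w[zu Zu ab Cd cD xy])
```

Any non-empty result means a case-insensitive sync tool would have to rename at least one of those directories.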
So lesson learnt, ActiveStorage doesn't work well with Dropbox.
ActiveStorage can now use Dropbox through the activestorage-dropbox gem:
ActiveStorage::Service::DropboxService
Wraps the Dropbox Storage Service as an Active Storage service.
gem 'activestorage-dropbox'
Usage
Declare a Dropbox service in config/storage.yml:
dropbox:
  service: Dropbox
  access_token: ""
Then select it as the active service:
config.active_storage.service = :dropbox
https://rubygems.org/gems/activestorage-dropbox

Move Images From Parse To S3 AWS

I need help moving the images I have from Parse to S3 on AWS. I have viewed numerous supposed guides and GitHub projects, but everything stops short at giving you all the information. One even says, you need GCS bucket set up, but gives no details on how to set up one. Just someone please help me with this. I have the S3 File Adapter in my index.js all set up for the app, but none of the images are there, they are still hosted in parse.
If you are referring to old images that were hosted with parse.com and that you want to move across to your own environment, it can be done with the parse-files-utils tool. From its description:
Get all files across all classes in a Parse database. Print file URLs to console OR transfer to S3, GCS, or filesystem. Rename files so that Parse Server no longer detects that they are hosted by Parse. Update MongoDB with new file names.
https://github.com/parse-server-modules/parse-files-utils
Going forward, if you have set up your S3 bucket correctly, all new images from your app will be stored there.
https://github.com/ParsePlatform/parse-server/wiki/Configuring-File-Adapters

What is the recommended approach to parse a CSV file stored in S3?

I am using the aws-sdk gem to read a CSV file stored in AWS S3.
Following the AWS docs, so far I have:
Aws::S3::Resource.new.bucket(ENV['AWS_BUCKET_NAME']).object(s3_key).get({ response_target: "#{Rails.root}/tmp/items.csv" })
In Pry, this returns:
output error: #<IOError: closed stream>
However, navigating to tmp/, I can see the items.csv file and it contains the right content. I am not certain whether the return value is an actual error.
My second concern: is it fine to store temporary files in "#{Rails.root}/tmp/"?
Or should I consider another approach?
I can load the file in memory and then CSV.parse. Will this have implications if the CSV file is huge?
I'm not sure how to synchronously return a file object using the aws gem.
But I can offer some advice on the other topics you mentioned.
First of all, /tmp: I've found that saving files here is a workable approach. On AWS, I've used this directory to create a local LRU cache for S3-stored images. The key thing is to anticipate the situation where the file has been automatically deleted; the file needs to be refetched if this happens. By the way, Heroku has a 'read-only filesystem' but still permits you to write into /tmp.
The second part is the question of synchronously returning a file object.
While it may be possible to do this using the S3 gem, I've had success fetching it over HTTP using something like open-uri or mechanize. If it's not supposed to be a publicly available asset, you can change the permissions on S3 to restrict access to your server.
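On the question of huge files: once the CSV is on disk, you don't have to load it all before parsing. A minimal sketch, assuming Ruby's standard csv library and a hypothetical helper sum_column; the column names and sample data are invented, and a Tempfile stands in for the file downloaded from S3:

```ruby
require "csv"
require "tempfile"

# Reads the file one row at a time, so memory use does not grow
# with the size of the file.
def sum_column(path, column)
  total = 0
  CSV.foreach(path, headers: true) do |row|
    total += Integer(row[column])
  end
  total
end

# Stand-in for tmp/items.csv fetched from S3 (made-up content).
file = Tempfile.new(["items", ".csv"])
file.write("name,quantity\nwidget,2\ngadget,5\n")
file.close

total = sum_column(file.path, "quantity")
file.unlink
```

CSV.parse on a string that holds the whole file keeps everything in memory at once, whereas CSV.foreach only holds the current row.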

Alternative to X-sendfile in Apache for sending file given a URL?

I'm writing a Rails application that serves files stored on a remote server to the end user.
In my case the files are stored on S3 but the user requests the file via the Rails application (hiding the actual URL). If the file was on my server's local file system, I could use the Apache header X-Sendfile to free up the Ruby process for other requests while Apache took over the task of sending the file to the client. But in my case, where the file is not on the local file system but on S3, it seems that I'm forced to download it temporarily inside Rails before sending it to the client.
Isn't there a way for Apache to serve a "remote" file to the client that is not actually on the server itself? I don't mind if Apache has to download the file for this to work, as long as I don't have to tie up the Ruby process while it's going on.
Any suggestions?
Thomas, I have similar requirements/issues and I think I can answer your problem. First (and I'm not 100% sure you care for this part), hiding the S3 url is quite easy as Amazon allows you to point CNAMES to your bucket and use a custom URL instead of the amazon URL. To do that, you need to point your DNS to the correct amazon URL. When I set mine up it was similar to this: files.domain.com points to files.domain.com.s3.amazonaws.com. Then you need to create the bucket with the name of your custom URL (files.domain.com in this example). How to call that URL will be different depending on which gem you use, but a word of warning was that the attachment_fu plugin I was using was incorrectly sending me to files.domain.com/files.domain.com/name_of_file.... I couldn't find the setting to fix it, so a simple .sub method for the S3 portion of the plugin fixed it.
On to your other questions, to execute some rails code (like recording the hit in the db) before downloading you can simply do this:
def download
  file = File.find(...
  # code to record 'hit' to database
  redirect_to S3Object.url_for(file.filename,
                               bucket,
                               :expires_in => 3.hours)
end
That code will still cause the file to be served by S3, but still gives you the ability to run some Ruby first. (Of course the above code won't work as is: you will need to point it to the correct file and bucket, and my Amazon keys are saved in a config file. The above also uses the syntax of the AWS::S3 gem - http://amazon.rubyforge.org/).
Second, the Content-Disposition: attachment issue is a bit more tricky. Hopefully, your situation is a bit more simple than mine and the following solution can work. Assuming the object 'file' (in this example) is the correct S3 object, you can set the disposition to attachment by
file.content_disposition = "attachment"
file.save
The above code can be executed after the file exists on the S3 server (unlike some other headers and permissions), which is nice, and it can also be added when you upload the file (syntax depends on your plugin). I'm still trying to find a way to tell S3 to send it as an attachment only when requested (not every time); if you find that, please let me know your solution. I need to be able to sometimes download the file and other times embed images (for example) into HTML. I'm not using the above-mentioned redirect, but fortunately it seems that if you embed a file that has the Content-Disposition: attachment header (for example in an HTML image tag), the browser still displays the image normally (though I haven't thoroughly tested that across enough browsers to release it in the wild).
Hope that helps! Good luck.

How do I copy files between buckets using s3 from a rails application?

I am currently developing a Rails application that tries to copy/move videos from one bucket to another in S3. However, I keep getting a proxy error 502 on my Rails application. In the mongrel log it says "failed to allocate memory." Once this error occurs the application dies and we must restart it.
Seems like your code is reading the entire resource into memory, and that out-of-memories your application. A naïve way to do this (and from your description, you're doing something like this already) would be to download the file and upload it again: just download it to a local file and not into memory. However, Amazon engineers have thought ahead and provide APIs that can deal with this specific case, as well.
If you're using something like the RightAWS gem, you can use its S3Interface like so:
# With s3 being an S3 object acquired via S3Interface.new
# Copies key1 from bucket b1 to key1_copy in bucket b2:
s3.copy('b1', 'key1', 'b2', 'key1_copy')
And if you're using the naked S3 HTTP interface, see amazon's object copy docs for a solution that uses only HTTP to copy one object from one bucket to another.
Try to stream files instead of loading the whole file into memory and then working with it.
For example, if you're using the aws-s3 gem, do not use:
data = open(file)
S3Object.store file_name, data, BUCKET
Use following instead:
S3Object.store file_name, open(file), BUCKET
I'm not sure how exactly to "stream-download" the file, though.
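For the general idea of streaming rather than buffering, here is a rough sketch in plain Ruby that copies between two IO objects in fixed-size chunks. copy_in_chunks is a hypothetical helper, and StringIO objects stand in for the download and upload streams; it does not use any S3 gem.

```ruby
require "stringio"

# Copies from source to destination chunk by chunk, so at most
# chunk_size bytes are held in memory at a time.
# Returns the number of bytes copied.
def copy_in_chunks(source, destination, chunk_size: 16 * 1024)
  copied = 0
  while (chunk = source.read(chunk_size))
    destination.write(chunk)
    copied += chunk.bytesize
  end
  copied
end

source = StringIO.new("x" * 50_000)   # stand-in for the downloaded object
destination = StringIO.new            # stand-in for the upload target
bytes = copy_in_chunks(source, destination)
```

Ruby's built-in IO.copy_stream does the same kind of copy between IO objects without a hand-written loop.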
boto works well. See this thread. Using boto, you copy the objects straight from one bucket to another, rather than downloading them to the local machine and then uploading them to another bucket.
You can copy bucket to bucket directly using the fog gem.
s3 = Fog::Storage.new(your_aws_credentials)
s3.copy_object('source-bucket', 'source/path', 'dest-bucket', 'dest/path')
