How to migrate local storage (active storage) to google cloud storage - ruby-on-rails

I'm trying to migrate my Rails app to Google Cloud.
I've connected Active Storage to the bucket created on GCS.
I've uploaded the "storage" folder to the bucket, but all the images in the app return 404 errors.
How can I correctly migrate the local storage folder to GCS?
Thank you in advance.

This question is very similar to this one; as mentioned on that thread:
DiskService uses a different folder structure than the cloud storage services.
DiskService uses the first characters of the key as nested folders, while the cloud services just use the key itself and put all variants in a separate folder.
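For illustration only (the key below is made up), the same blob ends up at different paths depending on the service:
key = "q1r2s3t4u5v6w7x8y9z0abcd"                              # example blob key
disk_path = File.join("storage", key[0..1], key[2..3], key)   # => "storage/q1/r2/q1r2s3t4u5v6w7x8y9z0abcd"
cloud_key = key                                               # GCS/S3 store the object directly under the key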
You can create a rake task to copy files to cloud storage, for example:
namespace :active_storage do
  desc "Migrates Active Storage local files to the cloud"
  task migrate_local_to_cloud: :environment do
    raise 'Missing storage_config param' unless ENV.key?('storage_config')

    require 'yaml'
    require 'erb'
    require 'google/cloud/storage'

    config_file = Pathname.new(Rails.root.join('config/storage.yml'))
    configs = YAML.load(ERB.new(config_file.read).result) || {}
    config = configs[ENV['storage_config']]

    client = Google::Cloud.storage(config['project'], config['credentials'])
    bucket = client.bucket(config.fetch('bucket'))

    ActiveStorage::Blob.find_each do |blob|
      key = blob.key
      folder = [key[0..1], key[2..3]].join('/')
      file_path = Rails.root.join('storage', folder, key)
      File.open(file_path, 'rb') do |file|
        md5 = Digest::MD5.base64digest(file.read)
        file.rewind # rewind so the upload doesn't start at EOF after computing the MD5
        bucket.create_file(file, key, content_type: blob.content_type, md5: md5)
      end
      puts key
    end
  end
end
Execute it as: rails active_storage:migrate_local_to_cloud storage_config=google
You can find useful documentation here.

I would write a migration that iterates over all models that have attachments and "reassigns" the current image using the local file in the storage directory, so it gets synced to GCS (see the sketch below). Also have a look at the Active Storage guide.
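A minimal sketch of that idea, assuming a hypothetical User model with has_one_attached :avatar and run on the machine that still has the local storage/ folder (note that re-attaching creates new blobs with new keys):
# Hypothetical sketch: re-attach each local file so Active Storage uploads it
# to the currently configured (GCS) service. Model and attachment names are assumptions.
User.find_each do |user|
  next unless user.avatar.attached?

  blob = user.avatar.blob
  local_path = Rails.root.join("storage", blob.key[0..1], blob.key[2..3], blob.key)
  next unless File.exist?(local_path)

  user.avatar.attach(
    io: File.open(local_path),
    filename: blob.filename.to_s,
    content_type: blob.content_type
  )
end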

Try the mirror solution described in "How to sync new ActiveStorage mirrors?": set up the mirror first, then sync the existing files.
This worked for my migration from the local service to the S3 service.

Related

Rails - Resave All Models for S3 Migration

rails 6.1.3.2
aws-sdk-s3 gem
I currently have a rails app in production that uses ActiveStorage to attach image data to a wrapper Image model. It's currently using the local strategy to save images to disk and I am migrating it to S3. I am not using paperclip or anything similar.
I succeeded in setting it up. Currently it is set to use local storage as the primary service with S3 as a mirror, so that I can write to both places during the migration. However, the documentation says it will only save new images to S3 on create and update of a record. I would like to "re-save" all models in production to force the migration to happen. Does anyone know how to do this?
Looks like it was already answered!
If you happen to be stuck with only access to the Rails Console like I was, this solution worked perfectly. If you copy-paste this code into the console, it will begin to produce output of the S3 uploads. After 5k of those, I was done. An immense thank you to Tayden for the solution.
all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]

# Iterate through each blob
ActiveStorage::Blob.all.each do |blob|
  # Select the services where the file exists
  services = all_services.select { |service| service.exist? blob.key }

  # Skip the blob if the file doesn't exist anywhere
  next unless services.present?

  # Select the services where the file doesn't exist
  mirrors = all_services - services

  # Open the local file (if one exists)
  disk_service = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
  local_file = File.open(disk_service.path_for(blob.key)) if disk_service

  # Upload the local file to the mirrors (if one exists)
  mirrors.each do |mirror|
    mirror.upload blob.key, local_file, checksum: blob.checksum
  end if local_file.present?

  # If no local file exists, download a remote copy and upload it to the mirrors (thanks @Rystraum)
  services.first.open blob.key, checksum: blob.checksum do |temp_file|
    mirrors.each do |mirror|
      mirror.upload blob.key, temp_file, checksum: blob.checksum
    end
  end unless local_file.present?
end

How to upload a file to an s3 bucket with a custom resource in aws cdk

I need to upload a zip file to an S3 bucket after its creation. I'm aware of the s3_deployment package, but it doesn't fit my use case because I need the file to be uploaded only once, on stack creation. The s3_deployment package would upload the zip on every update.
I have the following custom resource defined, however I'm not sure how to pass the body of the file to the custom resource. I've tried opening the file in binary mode, but that returns an error.
app_data_bootstrap = AwsCustomResource(self, "BootstrapData",
    on_create={
        "service": "S3",
        "action": "putObject",
        "parameters": {
            "Body": open('app_data.zip', 'rb'),
            "Bucket": "my-app-data",
            "Key": "app_data.zip",
        },
        "physical_resource_id": PhysicalResourceId.of("BootstrapDataBucket")
    },
    policy=AwsCustomResourcePolicy.from_sdk_calls(resources=AwsCustomResourcePolicy.ANY_RESOURCE)
)
I don't believe that's possible unless you write a custom script that runs before your cdk deploy to upload your local files to an intermediary S3 bucket. Then you can write a custom resource that copies the content of the intermediary bucket, on the on_create event, to the bucket that was created via CDK.
Read this paragraph from s3_deployment in CDK docs:
This is what happens under the hood:
When this stack is deployed (either via cdk deploy or via CI/CD), the contents of the local website-dist directory will be archived and uploaded to an intermediary assets bucket. If there is more than one source, they will be individually uploaded.
The BucketDeployment construct synthesizes a custom CloudFormation resource of type Custom::CDKBucketDeployment into the template. The source bucket/key is set to point to the assets bucket.
The custom resource downloads the .zip archive, extracts it and issues aws s3 sync --delete against the destination bucket (in this case websiteBucket). If there is more than one source, the sources will be downloaded and merged pre-deployment at this step.
So, in order to replicate step 1, you have to write a small script that creates an intermediary bucket and uploads your local files to it. A sample script could look like this:
#!/bin/sh
aws s3 mb s3://<intermediary_bucket> --region <region_name>
aws s3 sync <local_files_dir> s3://<intermediary_bucket>
Then your custom resource can be something like this:
Note: this will work for copying one object; you can adapt the code to copy multiple objects.
import json
import boto3
import cfnresponse

def lambda_handler(event, context):
    print('Received request:\n%s' % json.dumps(event, indent=4))
    resource_properties = event['ResourceProperties']
    if event['RequestType'] in ['Create']:  # What happens when the resource is created
        try:
            s3 = boto3.resource('s3')
            copy_source = {
                'Bucket': 'intermediary_bucket',
                'Key': 'path/to/filename.extension'
            }
            bucket = s3.Bucket('otherbucket')
            obj = bucket.Object('otherkey')
            obj.copy(copy_source)
        except:
            cfnresponse.send(event, context, cfnresponse.FAILED, {})
            raise
        else:
            cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    elif event['RequestType'] == 'Delete':  # What happens when the resource is deleted
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
An alternative to all of this is to open an issue in AWS CDK's GitHub repo and ask them to support your use case.

How to specify a prefix when uploading to S3 using activestorage's direct upload?

With a standard S3 configuration:
AWS_ACCESS_KEY_ID: [AWS ID]
AWS_BUCKET: [bucket name]
AWS_REGION: [region]
AWS_SECRET_ACCESS_KEY: [secret]
I can upload a file to S3 (using direct upload) with this Rails 5.2 code (only relevant code shown):
form.file_field :my_asset, direct_upload: true
This will effectively put my asset in the root of my S3 bucket, upon submitting the form.
How can I specify a prefix (e.g. "development/", so that I can mimic a folder on S3)?
2022 update: as of Rails 6.1 (check this commit), this is actually supported:
user.avatar.attach(key: "avatars/#{user.id}.jpg", io: io, content_type: "image/jpeg", filename: "avatar.jpg")
My current workaround on S3 (at least until ActiveStorage introduces the option to pass a path to the has_one_attached and has_many_attached macros) is to use the move_to method.
So I let ActiveStorage save the image to S3 as it normally does right now (at the top of the bucket), then move the file into a folder structure.
The move_to method basically copies the file into the folder structure you pass, then deletes the file that was put at the root of the bucket. This way your file ends up where you want it.
So, for instance, if we were storing driver details (name and driver's license), save them as you already do so that the file sits at the top of the bucket.
Then implement the following (I put mine in a helper):
module DriversHelper
  def restructure_attachment(driver_object, new_structure)
    old_key = driver_object.image.key
    begin
      # Passing S3 configs
      config = YAML.load_file(Rails.root.join('config', 'storage.yml'))
      s3 = Aws::S3::Resource.new(
        region: config['amazon']['region'],
        credentials: Aws::Credentials.new(config['amazon']['access_key_id'], config['amazon']['secret_access_key'])
      )
      # Fetching the licence's Aws::S3::Object
      old_obj = s3.bucket(config['amazon']['bucket']).object(old_key)
      # Moving the license into the new folder structure
      old_obj.move_to(bucket: config['amazon']['bucket'], key: new_structure)
      update_blob_key(driver_object, new_structure)
    rescue => ex
      driver_helper_logger.error("Error restructuring license belonging to driver with id #{driver_object.id}: #{ex.full_message}")
    end
  end

  private

  # The new structure becomes the new ActiveStorage Blob key
  def update_blob_key(driver_object, new_key)
    blob = driver_object.image_attachment.blob
    begin
      blob.key = new_key
      blob.save!
    rescue => ex
      driver_helper_logger.error("Error reassigning the new key to the blob object of the driver with id #{driver_object.id}: #{ex.full_message}")
    end
  end

  def driver_helper_logger
    @driver_helper_logger ||= Logger.new("#{Rails.root}/log/driver_helper.log")
  end
end
It's important to update the blob key so that references to the key don't return errors.
If the key is not updated, any function attempting to reference the image will look for it in its former location (at the top of the bucket) rather than in its new location.
I'm calling this function from my controller as soon as the file is saved (that is, in the create action) so that it looks seamless even though it isn't; a rough sketch of that call is below.
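A hypothetical controller sketch of that call (the Driver model, the permitted params, and the redirect are assumptions; the helper is included explicitly because Rails view helpers aren't available in controllers by default):
class DriversController < ApplicationController
  include DriversHelper # make restructure_attachment available here

  def create
    @driver = Driver.new(driver_params)
    if @driver.save
      # Move the freshly uploaded file into the desired "folder" right away
      restructure_attachment(@driver, "development/#{@driver.image.key}")
      redirect_to @driver
    else
      render :new
    end
  end

  private

  def driver_params
    params.require(:driver).permit(:name, :image)
  end
end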
While this may not be the best way, it works for now.
FYI: Based on the example you gave, the new_structure variable would be new_structure = "development/#{driver_object.image.key}".
I hope this helps! :)
Thank you, Sonia, for your answer.
I tried your solution and it works great, but I encountered problems with overwriting attachments: I often got an IntegrityError while doing it. I think that this, together with checksum handling, may be the reason why the Rails core team doesn't want to add a pass-a-pathname feature. It would require changing the entire logic of the upload method.
The ActiveStorage::Attached#create_from_blob method can also accept an ActiveStorage::Blob object, so I tried a different approach:
Create a Blob manually, with a key that represents the desired file structure, and upload the attachment to it.
Attach the created Blob with the ActiveStorage method.
In my case, the solution looked something like this:
def attach(file) # method for attaching in the model
  blob_key = destination_pathname(file)
  blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
  unless blob
    blob = ActiveStorage::Blob.new.tap do |b|
      b.filename = blob_key.basename.to_s
      b.key = blob_key
      b.upload file
      b.save!
    end
  end
  # Attach method from ActiveStorage
  self.file.attach blob
end
Thanks to passing a full pathname as the Blob's key, I got the desired file structure on the server.
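The answer doesn't show destination_pathname; one possible version (the prefix and the assumption that file responds to original_filename are mine) returns a Pathname so that blob_key.basename works in the snippet above:
# Hypothetical helper: build a key that embeds the desired "folder" on the bucket
def destination_pathname(file)
  Pathname.new("development/uploads/#{file.original_filename}")
end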
Sorry, that’s not currently possible. I’d suggest creating a bucket for Active Storage to use exclusively.
The above solution will still raise an IntegrityError; you need to use File.open(file). Thanks, though, for the idea.
class History < ApplicationRecord
  has_one_attached :gs_history_file

  def attach(file) # method for attaching in the model
    blob_key = destination_pathname(file)
    blob = ActiveStorage::Blob.find_by(key: blob_key.to_s)
    unless blob
      blob = ActiveStorage::Blob.new.tap do |b|
        b.filename = blob_key.to_s
        b.key = blob_key
        # b.byte_size = 123123
        # b.checksum = Time.new.strftime("%Y%m%d-") + Faker::Alphanumeric.alpha(6)
        b.upload File.open(file)
        b.save!
      end
    end
    # Attach method from ActiveStorage
    self.gs_history_file.attach blob
  end

  def destination_pathname(file)
    "testing/filename-#{Time.now}.xlsx"
  end
end

How to copy list of public S3 files to private S3 bucket

In Rails, using the aws-sdk gem, what is the easiest way to copy a list of public files (say 5k of them) that are hosted on S3 (not my account) into my private bucket? I want to keep the same file and path names.
Example:
http://target.com.s3.amazonaws.com/assets/videos/abc123.mp4 (public)
http://myexample.com.s3.amazonaws.com/assets/videos/abc123.mp4 (private)
I would like to read the files into memory and stream them directly into S3. I won't have disk space with my hosting provider (Heroku). These files are MP4s about 3-4 MB in size.
Here's my approach (UNTESTED):
vid_file = 'http://example.com.s3.amazonaws.com/assets/videos/abc123.mp4'

vid_response = HTTParty.get(vid_file)
if vid_response.code == 200
  filename = File.basename(vid_file) # TODO: fix to include the S3 folder before the object filename
  s3 = Aws::S3::Resource.new(region: ENV['AWS_REGION'])
  obj = s3.bucket(ENV['S3_BUCKET']).object(filename)
  obj.put(body: vid_response.body)
end
However, is there a way with the SDK to have AWS perform an internal copy between the S3 buckets, given that I don't have the keys for the first bucket (but the objects are public)? If not, is my approach above correct (streaming into memory, posting to S3)?
One easy solution, if you know the file name pattern, is to use something like wget and then a Ruby S3 client to upload to your private bucket. I understand why you would want to use memory instead of disk, but honestly, assuming you have a couple of gigs free, your internet connection is probably the bottleneck. A rough sketch of that approach follows.
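A rough sketch of the wget-then-upload idea (the URL list, bucket env vars, and temp directory are assumptions; unlike the in-memory approach, this needs local disk space):
require 'uri'
require 'aws-sdk-s3'

urls = ['http://target.com.s3.amazonaws.com/assets/videos/abc123.mp4'] # example list of public files
s3 = Aws::S3::Resource.new(region: ENV['AWS_REGION'])

urls.each do |url|
  key = URI(url).path.delete_prefix('/')       # keep the same path, e.g. assets/videos/abc123.mp4
  local = File.join('/tmp', File.basename(key))
  system('wget', '-q', '-O', local, url)       # download to disk with wget
  s3.bucket(ENV['S3_BUCKET']).object(key).upload_file(local)
  File.delete(local)
end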
1) There is no SDK feature for an 'internal copy' of public S3 objects into your own private S3 bucket.
2) The source below works and keeps the same S3 directory structure:
vid_file = 'http://example.com.s3.amazonaws.com/assets/videos/abc123.mp4'

vid_response = HTTParty.get(vid_file)
if vid_response.code == 200
  uri_path = URI(vid_file).path
  uri_path.slice!(0) # remove the leading slash, otherwise an empty S3 folder is created
  s3 = Aws::S3::Resource.new(region: ENV['AWS_REGION'])
  obj = s3.bucket(ENV['S3_BUCKET']).object(uri_path)
  obj.put(body: vid_response.body) unless obj.exists?
end

Heroku - how to write into "tmp" directory?

I need to use the tmp folder on Heroku (Cedar) for writing some temporary data. I am trying to do it this way:
open("#{Rails.root}/tmp/#{result['filename']}", 'wb') do |file|
  file.write open(image_url).read
end
But this produces the error:
Errno::ENOENT: No such file or directory - /app/tmp/image-2.png
This code runs properly on localhost, but I cannot make it work on Heroku.
What is the proper way to save some files to the tmp directory on Heroku (Cedar stack)?
Thank you
EDIT:
I am running a method via Delayed Job that needs to have access to the tmp file.
EDIT2:
What I am doing:
files.each_with_index do |f, index|
  unless f.nil?
    result = JSON.parse(buffer)
    filename = "#{Time.now.to_i.to_s}_#{result['filename']}" # thumbnail name
    thumb_filename = "#{Rails.root}/tmp/#{filename}"
    image_url = f.file_url + "/convert?rotate=exif"
    open("#{Rails.root}/tmp/#{result['filename']}", 'wb') do |file|
      file.write open(image_url).read
    end
    img = Magick::Image.read(image_url).first
    target = Magick::Image.new(150, 150) do
      self.background_color = 'white'
    end
    img.resize_to_fit!(150, 150)
    target.composite(img, Magick::CenterGravity, Magick::CopyCompositeOp).write(thumb_filename)
    key = File.basename(filename)
    s3.buckets[bucket_name].objects[key].write(:file => thumb_filename)
    # save the path to the new thumbnail in the database
    f.update_attributes(:file_url_thumb => "https://s3-us-west-1.amazonaws.com/bucket/#{filename}")
  end
end
I have information about the images in the database. The images themselves are stored in an Amazon S3 bucket, and I need to create thumbnails for them. So I go through the images one by one: load the image, save it temporarily, resize it, and then upload the thumbnail to the S3 bucket.
But this procedure doesn't seem to work on Heroku, so how can I do this (my app runs on Heroku)?
Is /tmp included in your git repo? Removed in your .slugignore? The directory may just not exist out on Heroku.
Try tossing in a quick mkdir before the write:
Dir.mkdir(File.join(Rails.root, 'tmp'))
Or even in an initializer or something...
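For example, a minimal initializer sketch (the file name is an assumption); FileUtils.mkdir_p won't fail if the directory already exists:
# config/initializers/ensure_tmp_dir.rb (hypothetical file name)
require 'fileutils'
FileUtils.mkdir_p(Rails.root.join('tmp'))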
Here's an elegant way:
f = File.new("tmp/filename.txt", 'w')
f << "hi there"
f.close
Dir.entries(Dir.pwd + "/tmp") # see your newly created file in /tmp
Don't forget that whenever your app restarts (for any reason, including ones outside your control), your files will be deleted, as they are only stored ephemerally.
Try it with heroku restart; you will see that the new file you created is no longer there.
