Amazon S3 cache audio files - ruby-on-rails

I have created a new music application and I store all the mp3 files on Amazon S3. Before moving to S3 I used to store them on the server's file system itself. The browser used to cache the files, so on consecutive reloads of the page they weren't downloaded from the server again. But after moving to S3, every time I load the page it downloads the files from S3 again. This not only makes my app slow, but every request to S3 costs money.
I found some documentation on Cache-Control and tried all of it, but with no success. I might be missing something here. Any help is appreciated. Thanks.
Here is my code for uploading mp3 files to S3. I use CarrierWave with Rails.
CarrierWave.configure do |config|
  config.fog_credentials = {
    :provider              => 'AWS',
    :aws_access_key_id     => MyAppConfig.config['aws']['aws_access_key'],
    :aws_secret_access_key => MyAppConfig.config['aws']['aws_secret_key']
  }
  config.fog_directory  = MyAppConfig.config['aws']['aws_bucket_name']
  config.fog_public     = false
  config.storage        = :fog
  config.fog_attributes = { 'Cache-Control' => 'max-age=315576000' }
end
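One quick way to confirm whether that Cache-Control attribute actually reached S3 is to HEAD one of the uploaded objects. A sketch with the aws-sdk-s3 gem; the bucket, key, and region are placeholders:
require 'aws-sdk-s3'

# Placeholder bucket/key/region -- substitute your own values.
client = Aws::S3::Client.new(region: 'us-east-1')
resp   = client.head_object(bucket: 'my-bucket', key: 'uploads/song1.mp3')

# Prints the stored header, or nil if the fog attribute never reached S3.
puts resp.cache_control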

If you're using signed URLs, which you say in the comments you are, and you are not reusing those signed URLs, then there is no way to cache these requests.
Amazon Web Services cannot override your web browser's internal cache system. When two URIs differ, as signed URLs do, your web browser treats them as two distinct resources on the Internet.
For example, let's take:
http://www.example.com/song1.mp3
http://www.example.com/song2.mp3
These are two discrete URIs. Even if song1.mp3 and song2.mp3 had the same ETag and Content-Length HTTP response headers, they're still two different resources.
The same is true if we merely alter their query strings:
http://www.example.com/song1.mp3?a=1&b=2&c=3
http://www.example.com/song1.mp3?a=1&b=2&c=4
These are still two discrete URIs. They will not reference one another for caching purposes. This is the same principle behind using query strings to bust caches.
No amount of fiddling with HTTP headers will ever get you the cache behavior you're seeking.
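If you do want the browser to cache responses for signed URLs, the workaround (as the other answer below notes) is to generate the signed URL once and return the identical string on subsequent page loads. A minimal sketch with Rails.cache; the helper name, the mounted uploader, and the expiry times are assumptions:
# Hypothetical Rails helper: sign the URL once and return the identical
# string on later page loads so the browser can reuse its cached copy.
def cached_song_url(song)
  Rails.cache.fetch(["signed-song-url", song.id], expires_in: 50.minutes) do
    song.file.url   # the CarrierWave/fog signed URL for the mounted mp3
  end
end
If you go this route, keep the signature valid for longer than the cache entry; with fog storage, CarrierWave's fog_authenticated_url_expiration setting controls how long the signed URL stays valid.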

Take a look at http://www.bucketexplorer.com/documentation/amazon-s3--how-to-set-cache-control-header-for-s3-object.html
Set Cache-Control for files already uploaded to S3 using Update Metadata:
1) Run Bucket Explorer and log in with your credentials.
2) After it lists all buckets, select any S3 bucket.
3) It will list all objects in the selected bucket.
4) Select one or more files, right-click the selection, and choose the “Update Metadata” option.
5) Add a key and value in the metadata attributes: Key “Cache-Control”, Value “max-age=<time in seconds for which you want the object served from cache>”.
6) Click the Save button. The Cache-Control metadata will be updated on all selected S3 objects.
Example: for a time limit of 15 days, 3600 * 24 * 15 = 1296000 seconds, so set Key “Cache-Control” with Value “max-age=1296000”.
Note: if the object is an HTML file, add must-revalidate after the time in seconds, i.e. Key “Cache-Control”, Value “max-age=2592000, must-revalidate” for 30 days.
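If you would rather not use a GUI tool, the same metadata update can be done programmatically by copying each object onto itself. This is a rough sketch with the aws-sdk-s3 (v3) gem; the bucket and key names are placeholders:
require 'aws-sdk-s3'

# Copy the object onto itself and replace its metadata, mirroring the
# "Update Metadata" step above. Bucket and key names are placeholders.
obj = Aws::S3::Resource.new.bucket('my-bucket').object('songs/song1.mp3')
obj.copy_from(obj,
              metadata_directive: 'REPLACE',
              content_type:       obj.content_type,   # keep the original type
              cache_control:      'max-age=1296000')  # 15 days, as above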

Assuming you have properly set the Cache-Control headers and you are using signed URLs, you will need to hold onto the signed URL for a given file and re-render the exact same URL in subsequent page loads.
If you have not set the Cache-Control headers, or you want them to change based on who is making the request, you can set them before signing your URL with &response-cache-control=value or &response-expires=value.
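For example, with the aws-sdk-s3 gem (rather than fog) that might look like the sketch below; the bucket, key, and durations are placeholders:
require 'aws-sdk-s3'

# Sketch: sign a response-cache-control value into the URL so S3 returns
# that Cache-Control header along with the file.
obj = Aws::S3::Resource.new.bucket('my-bucket').object('songs/song1.mp3')
url = obj.presigned_url(:get,
                        expires_in: 3600,
                        response_cache_control: 'private, max-age=3600')
# The generated URL carries a response-cache-control=... query parameter.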

Related

Amazon CloudFront doesn't require me to invalidate objects

I have a Ruby on Rails application where users can upload or change their avatar. At first I stored the images in Amazon S3, but then I realized the content was being served slowly and decided to use Amazon CloudFront.
There is no problem uploading and fetching the avatar. However, I can see that an updated photo changes immediately, whereas I expected to have to invalidate it through the CloudFront API. Also, uploading an image takes a lot of time.
At this point I can't tell whether I'm using CloudFront correctly or not.
This is my carrierwave.rb file inside config/initializers:
CarrierWave.configure do |config|
  config.fog_provider = 'fog/aws'
  config.fog_credentials = {
    provider:              'AWS',
    aws_access_key_id:     'key',
    aws_secret_access_key: 'value',
    region:                'us-east-1'
  }
  config.storage        = :fog
  config.asset_host     = 'http://images.my-domain.com'
  config.fog_directory  = 'bucket_name'
  config.fog_public     = true
  config.fog_attributes = { cache_control: "public, max-age=315576000" }
end
I can't see what I'm missing. How can I be sure that I'm using CloudFront properly?
Thanks.
Your images aren't being stored in CloudFront, they're being served through CloudFront's CDN.
The first request for an image served through CloudFront looks like this:
Browser -> CloudFront -> S3
Browser <- CloudFront <- S3
The second request for the same image looks like this:
Browser -> CloudFront
Browser <- CloudFront
The second request never makes it past CloudFront to S3, because CloudFront has already cached the result for that URL.
Now, your avatar updating immediately is most likely because each upload goes to S3 under a new key, which results in a new URL and thus an immediate update. This is how you want it to work.
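If you ever do overwrite an existing key and need to purge CloudFront's cached copy, an invalidation request looks roughly like this (a sketch with the aws-sdk-cloudfront gem; the distribution ID and path are placeholders):
require 'aws-sdk-cloudfront'

# Distribution ID and path are placeholders for your own values.
cloudfront = Aws::CloudFront::Client.new(region: 'us-east-1')
cloudfront.create_invalidation(
  distribution_id: 'E1234567890ABC',
  invalidation_batch: {
    paths: { quantity: 1, items: ['/uploads/avatar/1/avatar.png'] },
    caller_reference: Time.now.to_i.to_s   # must be unique per request
  }
)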

Set file expiry date for AWS S3 with aws-sdk in Ruby/Rails

I use the aws-sdk gem to upload files in my Rails app to AWS S3, which works fine. Now I want to set an expiry date for some files to improve performance.
According to the SDK documentation (http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#upload_file-instance_method), this should work by adding an :expires entry to the options hash of the upload_file method:
expire_date = 1.day.since.httpdate
obj = S3_BUCKET.object(path)
obj.upload_file("/tmp/file.png", {acl: 'public-read', expires: expire_date})
The files are uploaded successfully, but in the bucket the file still shows "Expiry Date: None". I tried using Time.now with and without .httpdate and so on; nothing works.
Any help is very much appreciated!

Using S3 Presigned-URL for upload a file that will then have public-read access

I am using Ruby on Rails and AWS gem.
I can get a pre-signed URL for upload and download.
But when I get the URL there is no file yet, so setting the ACL to 'public-read' on the download URL doesn't work.
The use case is this: (1) the server provides the user a path to upload content to my bucket, which is not readable without credentials; (2) that content needs to be public later: readable by anyone.
To clarify:
I am not uploading the file myself; I am providing a URL for my users to upload to. At that time, I also want to give the user a URL that is readable by the public. It seems like it would be easier if I uploaded the file myself. Also, the read URL must never expire.
When you generate a pre-signed URL for a PUT object request, you can specify the key and the ACL the uploader must use. If I wanted the user to upload an object to my bucket with the key "files/hello.txt" and the file to be publicly readable, I could do the following:
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('files/hello.txt')
put_url = obj.presigned_url(:put, acl: 'public-read', expires_in: 3600 * 24)
#=> "https://bucket-name.s3.amazonaws.com/files/hello.txt?X-Amz-..."
obj.public_url
#=> "https://bucket-name.s3.amazonaws.com/files/hello.txt"
I can give the put_url to someone else. This URL will allow them to PUT an object to the URL. It has the following conditions:
The PUT request must be made within the given expiration. In the example above I specified 24 hours. The :expires_in option may not exceed 1 week.
The PUT request must specify the HTTP header of 'x-amz-acl' with the value of 'public-read'.
Using the put_url, I can upload an object using Ruby's Net::HTTP:
require 'net/http'
uri = URI.parse(put_url)
request = Net::HTTP::Put.new(uri.request_uri, 'x-amz-acl' => 'public-read')
request.body = 'Hello World!'
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
resp = http.request(request)
Now that the object has been uploaded by someone else, I can make a vanilla GET request to the #public_url. This could be done by a browser, curl, wget, etc.
You have two options:
Set the ACL on the object to 'public-read' when you PUT the object. This allows anyone to GET the object via its public URL without a signature.
Let the ACL on the object default to private and provide pre-signed GET URLs for users. These expire, so you have to generate new URLs as needed. A pre-signed URL allows someone without credentials of their own to send a GET request for the object.
Upload a public object and generate a public URL:
require 'aws-sdk'
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('key')
obj.upload_file('/path/to/file', acl: 'public-read')
obj.public_url
#=> "https://bucket-name.s3.amazonaws.com/key"
Upload a private object and generate a GET URL that is good for 1 hour:
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('key')
obj.upload_file('/path/to/file')
obj.presigned_url(:get, expires_in: 3600)
#=> "https://bucket-name.s3.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."

Getting Resource interpreted as Image but transferred with MIME type text/html

Just switched over to Refile for image uploads on my Rails application.
I have the uploads going directly to my S3 bucket. I have two buckets configured (with the same settings), one for testing and one for production.
Everything in my local development works fine, and uploading to my bucket in production works, but for all the uploaded images I get the following warning when rendering them on a webpage:
Resource interpreted as Image but transferred with MIME type text/html:
Also, in production the images are not showing up.
I've looked into the permissions for the buckets, but they seem to be good to go. I've also looked at other questions and answers regarding this warning, but have been unable to find any that apply here.
If any code is needed please let me know.
config/initializers/refile.rb
require 'refile/backend/s3'

aws = {
  access_key_id:     Rails.application.secrets.aws['access_key_id'],
  secret_access_key: Rails.application.secrets.aws['secret_access_key'],
  bucket:            Rails.application.secrets.aws['s3_bucket_name'],
  use_ssl:           true
}

Refile.cache = Refile::Backend::S3.new(max_size: 5.megabytes, prefix: 'cache', **aws)
Refile.store = Refile::Backend::S3.new(prefix: 'store', **aws)
Gist of the image helper
attachment_image_tag(avatar, :image, :fill, size, size)
Thanks for taking a look.

Rails AWS-SDK: Set Expiration Date for Objects

In my Rails app, I allow users to upload images directly to S3, which creates a temporary file that gets automatically deleted after the image record is saved in the database.
Instead of automatically deleting the image after the record is saved, I'd like to set an expiration date for the file on S3 so that it automatically gets deleted after a period (say 24 hours).
I've seen documentation on how to set the expiration date on a bucket (http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/BucketLifecycleConfiguration.html), but I only want a certain folder within the bucket to have files that automatically get removed.
Does anyone have suggestions for how to do it?
s3 = AWS::S3.new(:access_key_id => ENV['AWS_ACCESS_KEY_ID'], :secret_access_key => ENV['AWS_ACCESS_KEY'])
foldername  = @image.s3_filepath.split("/")[5]
folder_path = 'uploads/' + foldername
s3.buckets[ENV['AWS_BUCKET']].objects.with_prefix(folder_path).each do |object|
  # set expiration date header here
end
You set the lifecycle configuration on the bucket itself, not on each individual object. Using the REST API you'd just write out an XML configuration (there's a prefix field that lets you apply the lifecycle config only to keys that start with it) and PUT it to the bucket.
Converting that over to the Ruby SDK, it looks like the example is doing what you want; the first parameter of add_rule appears to be the prefix.
Although you set the lifecycle on the bucket, you don't set it on the bucket object directly. You need to use the AWS::S3::BucketLifecycleConfiguration class. There is more about how to manage lifecycle configuration here:
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/BucketLifecycleConfiguration.html
You can now specify a folder or key prefix within a bucket to narrow the lifecycle rule, as in the sketch below.
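As a rough illustration (untested), a prefix-scoped rule for the question's uploads/ folder might look like this with the v1 aws-sdk; the add_rule(prefix, days) form and the one-day expiration are assumptions based on the v1 docs:
# Rough sketch with the v1 aws-sdk: expire everything under 'uploads/' one
# day after creation. Credentials follow the question's setup.
s3 = AWS::S3.new(:access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
                 :secret_access_key => ENV['AWS_ACCESS_KEY'])

s3.buckets[ENV['AWS_BUCKET']].lifecycle_configuration.update do
  add_rule 'uploads/', 1   # delete objects under uploads/ after 1 day
end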
I was struggling with the same issue, and it seems the AWS documentation for Rails apps doesn't say much about how to send this parameter through the write method. Here are some links describing how to upload a file to an S3 bucket: AWS SDK for Ruby and Upload an Object Using the AWS SDK for Ruby.
Using AWS SDK for Ruby - Version 1
#!/usr/bin/env ruby
require 'rubygems'
require 'aws-sdk'
bucket_name = '*** Provide bucket name ***'
file_name = '*** Provide file name ****'
# Get an instance of the S3 interface.
s3 = AWS::S3.new
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:file => file_name)
puts "Uploading file #{file_name} to bucket #{bucket_name}."
This was also a good source for me: Amazon S3: Cache-Control and Expiry Date difference and setting through REST API, but it still doesn't mention how to set the expiration date, and neither do any of the answers or links there.
So I went through the documentation of the code itself (I'm using the aws-sdk-v1 gem) and found, under Method: AWS::S3::S3Object#write, all the possible options for configuring the upload of an S3 object. However, there were two that seemed to serve the same purpose:
:metadata (Hash) - A hash of metadata to be included with the object. These will be sent to S3 as headers prefixed with x-amz-meta. Each name, value pair must conform to US-ASCII.
:expires (String) - The date and time at which the object is no longer cacheable.
So I started looking into which of them I should configure, and here, Set content expires and cache-control metadata for AWS S3 objects with Ruby, I found what I was looking for:
require 'rubygems'
require 'aws-sdk'
s3 = AWS::S3.new(
  :access_key_id     => 'your_access_key',
  :secret_access_key => 'your_secret_access_key')

bucket = s3.buckets['bucket_name']

one_year_in_seconds = 365 * 24 * 60 * 60
one_year_from_now   = Time.now + one_year_in_seconds

# for a new object / to update an existing object
o = bucket.objects['object']
o.write(:file => 'path_to_file',
        :cache_control => "max-age=#{one_year_in_seconds}",
        :expires => one_year_from_now.httpdate)

# to update an existing object
o.copy_to(o.key,
          :cache_control => "max-age=#{one_year_in_seconds}",
          :expires => one_year_from_now.httpdate)
And that is pretty much what I did to configure the expiration date and cache control; here is the code adapted to the app I'm working on:
one_year_in_seconds = 365 * 24 * 60 * 60

files.each do |path, s3path|
  # puts "Uploading #{path} to s3: #{File.join(prefix, s3path)}"
  s3path = File.join(prefix, s3path) unless prefix.nil?
  one_year_from_now = Time.now + one_year_in_seconds
  bucket.objects[File.join(s3path)].write(
    :file => path,
    :acl => (public == true ? :public_read : :private),
    :cache_control => "max-age=#{one_year_in_seconds}",
    :expires => one_year_from_now.httpdate
  )
end
Also, at http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html you will find support for the decision to configure expires and cache_control instead of metadata:
The Cache-Control max-age directive lets you specify how long (in seconds) that you want an object to remain in the cache before CloudFront gets the object again from the origin server. The minimum expiration time CloudFront supports is 0 seconds for web distributions and 3600 seconds for RTMP distributions. The maximum value is 100 years. Specify the value in the following format:
Cache-Control: max-age=seconds
For example, the following directive tells CloudFront to keep the associated object in the cache for 3600 seconds (one hour):
Cache-Control: max-age=3600
If you want objects to stay in CloudFront edge caches for a different duration than they stay in browser caches, you can use the Cache-Control max-age and Cache-Control s-maxage directives together. For more information, see Specifying the Amount of Time that CloudFront Caches Objects for Web Distributions.
The Expires header field lets you specify an expiration date and time using the format specified in RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1 Section 3.3.1, Full Date, for example:
Sat, 27 Jun 2015 23:59:59 GMT
Regarding your question: yes, you can specify an expiration date and a Cache-Control value for each object in your bucket.
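As a closing illustration, reusing the o object from the snippet above (the durations here are made up), you can combine a short browser max-age with a longer s-maxage for CloudFront edge caches and an explicit Expires date in the same write call:
# Reuses the `o` object from the earlier snippet; durations are illustrative.
one_day  = 24 * 60 * 60
one_year = 365 * one_day

o.write(:file          => 'path_to_file',
        :cache_control => "max-age=#{one_day}, s-maxage=#{one_year}",
        :expires       => (Time.now + one_year).httpdate)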
