Rails + S3: Parse bucket and load images outside of controller

In a controller in my Rails app I have a method that parses my S3 bucket and selects images. It makes page loads slow, but I like being able to loop through the bucket without hard-coding all the URLs.
Here is what I have:
@bucket = S3_BUCKET
@images = []
@bucket.objects.each do |file|
  if file.key.include?("inspiration")
    @images << { url: file.public_url, key: file.key, type: 'file' }
  end
end
Is there another way to accomplish this so page load speeds don't suffer?

As it turns out there were many more files than expected and the loop took a long time to complete. I changed the code to:
@images = @bucket.objects(prefix: 'inspiration')
and the response was much faster.
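Another option, if the listing still has to happen on every request, is to cache the result so S3 is only hit occasionally. A minimal sketch, assuming the aws-sdk-s3 resource API and Rails.cache (the cache key and expiry are arbitrary):
@images = Rails.cache.fetch('s3/inspiration_images', expires_in: 10.minutes) do
  @bucket.objects(prefix: 'inspiration').map do |file|
    { url: file.public_url, key: file.key, type: 'file' }
  end
end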

Since you can't really regulate the speed at which you access your S3 bucket, I would suggest setting up a CDN (content delivery network) with Amazon CloudFront. Please take a look at this article written by Brandon Hilkert about implementing a CDN:
https://brandonhilkert.com/blog/setting-up-a-cloudfront-cdn-for-rails/
Side note - If you would like a free CDN option I would use
https://cloudinary.com/pricing
Referencing when to use a CDN over S3:
https://stackoverflow.com/questions/3327425/when-to-use-amazon-cloudfront-or-s3
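As a rough sketch of what that buys you (the distribution domain below is a placeholder, not from the article): once a CloudFront distribution fronts the bucket, you build image URLs from the distribution's domain instead of the raw S3 public_url, and CloudFront serves the cached copies:
# placeholder CloudFront domain; in practice read it from config or ENV
CDN_HOST = "dxxxxxxxxxxxxx.cloudfront.net"

@images = @bucket.objects(prefix: 'inspiration').map do |file|
  { url: "https://#{CDN_HOST}/#{file.key}", key: file.key, type: 'file' }
end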

Related

Best practice for deploying a low-traffic Rails app without cost concerns

My questions
Cost-related pitfalls to avoid when deploying a Rails app?
Attacks are welcome, as they would teach me what to expect and how to brace against them.
I would rather avoid big bills at the end of the month, however.
Easy cloud hosting services to use?
I picked AWS because it seems scalable and I thought I could avoid learning another service later.
I have no regrets, but AWS is overwhelming; if there were a significantly simpler service, I should have used it.
My current concern
A DoS attack or GET request flooding on AWS S3 could raise hosting costs significantly, as I'm uploading some content there.
A billing alarm is useful, but without automatic shutdown I feel a little uncomfortable taking a break and going into a jungle or onto an uninhabited island where I get no internet connection to be informed of the problem or to shut down my service...
Obvious fix for my case
Stop using S3 and save user uploads to the database, where I can control scaling behavior. But then, most people seem to be using S3 with CarrierWave - why?
What I'm doing
Making my first ever home page using:
Elastic Beanstalk
Rails 5
the CarrierWave gem configured to save user uploads in S3
Edit
In the end, I could not find any real solution to the lack of a spending cap for S3.
The below is more or less my notes.
I'm guessing S3 has some basic built-in defenses against attacks, because I have not heard sad stories about people using S3 to host static websites and getting a bill over 10,000 USD, which could still happen regardless of how good Amazon's defenses might be.
Mitigation
A script that periodically checks the S3 log files and calls an action that disables S3 resource serving when the cumulative size of those files is too large.
S3 logs sometimes take more than an hour to become available, so this is not a real solution, but it is better than nothing.
# s3_observer.rb
class LogObserver
  def initialize
    Aws.config.update({
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
      region: 'ap-northeast-1'})
    @bucket_name = "bucket name that holds s3 log"
    @last_checked_log_timestamp = Time.now.utc
    log "started at: #{Time.now}"
  end

  def run
    bucket = Aws::S3::Resource.new.bucket(@bucket_name)
    while true
      prv_log_ts = @last_checked_log_timestamp
      log_size = fetch_log_size(bucket)
      log "Total size of S3 logs accumulated since the last check: #{log_size}"
      time_range = @last_checked_log_timestamp - prv_log_ts # seconds, as a float
      log_size_per_second = log_size / time_range
      if log_size_per_second > (500.kilobyte / 60)
        log "Disabling S3 access as the S3 log size is greater than expected."
        `curl localhost/static_pages/disable_s3`
      end
      sleep 60 * 60
    end
  end

  def log(text)
    puts text
    File.open('./s3_observer_log.txt', 'a') do |f|
      f << text
    end
  end

  # Sum the size of log objects modified since the last check and
  # advance the checkpoint timestamp.
  def fetch_log_size(bucket)
    log_size = 0
    bucket.objects(prefix: 'files').each do |o|
      time_object = o.last_modified
      next if time_object < @last_checked_log_timestamp
      @last_checked_log_timestamp = time_object
      log_size += o.size.to_i
    end
    log_size
  end
end
Rake task:
namespace :s3_log do
  desc "Access S3 access log files and check their cumulative size. If the size is above the expected value, disables S3 access."
  task :start_attack_detection_loop do
    require './s3_observer.rb'
    id = Process.fork do
      o = LogObserver.new
      o.run
    end
    puts "Forked a new process that watches the S3 log. Process id: #{id}"
  end
end
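Assuming s3_observer.rb sits next to the Rakefile (the require above uses a relative path), the watcher is started with:
bundle exec rake s3_log:start_attack_detection_loop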
Controller action:
before_action :ensure_permited_ip, only: [:enable_s3, :disable_s3]

def enable_s3
  # reachable only from localhost (see ensure_permited_ip below)
  CarrierWave.configure do |config|
    config.fog_authenticated_url_expiration = 3
  end
end

def disable_s3
  # reachable only from localhost; an expiration of 0 makes the
  # authenticated URLs unusable, which disables S3 serving
  CarrierWave.configure do |config|
    config.fog_authenticated_url_expiration = 0
  end
end

private

def ensure_permited_ip
  # allow access only from localhost
  if request.remote_ip != "127.0.0.1"
    redirect_to root_path
  end
end
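For completeness, the curl call in the watcher script assumes routes along these lines exist (the path names are taken from the script's URL and are not shown in the original):
# config/routes.rb
get 'static_pages/enable_s3',  to: 'static_pages#enable_s3'
get 'static_pages/disable_s3', to: 'static_pages#disable_s3'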
Gems:
gem 'aws-sdk-rails'
gem 'aws-sdk-s3', '~> 1'
My experiences are limited but my suggestions would be:
Cost related pit falls to avoid when deploying rails app?
If you're going to use background jobs, use rufus-scheduler instead of Sidekiq or delayed_job, because it runs on top of your Rails server and does not require additional memory or dedicated processes. This allows you to procure the smallest/cheapest possible instance, a t2.nano, which I have done once before.
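A minimal sketch of that setup, assuming the rufus-scheduler gem is in the Gemfile (the schedule and job body are placeholders):
# config/initializers/scheduler.rb
require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

# placeholder job: runs inside the Rails server process every hour
scheduler.every '1h' do
  Rails.logger.info "hourly job ran at #{Time.now}"
end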
Easy cloud hosting services to use?
Heroku would be a good choice because it is a lot easier to set up. However, if you're doing this for the experience, I would suggest procuring unmanaged hosting like AWS EC2 or Linode. I migrated my server from AWS to VPSDime 3 months ago because it's cheap and has a lot of memory; so far so good.
My current concern
For CarrierWave, you may restrict S3 access; see the reference. This prevents hotlinking and requires a user to go through your Rails pages in order to download, view, or show the S3 files. Now that Rails has control over the S3 files, you can simply use something like Rack::Attack to prevent DDoS or excessive requests. If your Rails app sits behind Apache or Nginx, you can set up DDoS rules there instead of using Rack::Attack. Or, if you are going to use an AWS load balancer to manage and route the requests, you can use AWS Shield... though I haven't really used that yet.
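As a rough illustration of the Rack::Attack option (the limits and the path are arbitrary assumptions, not from the original answer):
# Gemfile: gem 'rack-attack'
# config/initializers/rack_attack.rb
class Rack::Attack
  # allow at most 60 requests per minute per IP to the file-serving endpoints
  throttle('files/ip', limit: 60, period: 1.minute) do |req|
    req.ip if req.path.start_with?('/hosted_files')
  end
end
Depending on the rack-attack version, you may also need to register the middleware in config/application.rb.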

Retrieving just file name from AWS S3 in Ruby

I'm making an Angular-Rails web app. I can successfully retrieve files from a certain path in AWS S3.
Let's say I call the function below:
@files = bucket.objects.with_prefix('pdf/folder/')
@files.each(:limit => 20) do |file|
  puts file.key
end
file.key prints pdf/folder/file1.pdf, pdf/folder/file2.pdf, etc.
I do not want the whole path, just the names of the files, like file1.pdf, file2.pdf, etc.
Is regex the only way, or is there an API call for this in AWS S3? I was reading the docs and could not find a related API function.
The call you want is probably File#basename:
puts File.basename(file.key)
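Applied to the loop from the question:
@files.each(:limit => 20) do |file|
  puts File.basename(file.key) # => "file1.pdf", "file2.pdf", ...
end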

Carrierwave + Fog + S3 remove file without going through a model

I am building an application that has a chat component. The application allows users to upload files to the chat. The chat is all JavaScript, but I wanted to use CarrierWave for the uploads because I am using it elsewhere in the application. I handle the uploads through AJAX so that I can get into Rails land and let CarrierWave take over.
I have been able to get the chat to successfully upload the files to the correct location in my S3 bucket. The thing I can't figure out is how to delete the files. Here is my code that uploads the files - this is the method that is called from the route that the AJAX call hits.
def upload
  file = File.open(params[:file_0].tempfile)
  uploader = ChatUploader.new
  uploader.store!(file)
end
There is little to no documentation for CarrierWave on how to upload files without going through a model, and basically no documentation on how to remove files without going through a model. I assume it is possible, though; I just need to know what to call. So I guess my question is: how do I delete the files?
UPDATE (11/23)
I got the code to save and delete files from S3 using these methods:
# code to save the file
def upload
  file = File.open(params[:file_0].tempfile)
  uploader = ChatUploader.new
  uploader.store!(file)
  uploader.store_path()
end

# code to remove files
def remove_file
  file = params[:file]
  uploader = ChatUploader.new
  uploader.retrieve_from_store!(file)
  uploader.remove!
end
My only issue now is that the filename for the uploaded file is not correct. It saves all files with "RackMultipart" followed by some numbers that look like a date, time, and identifier (example: RackMultipart20141123-17740-1tq4j1g). I need to try to use the original filename, plus maybe a timestamp for uniqueness.
I believe it has something to do with these two lines:
file = File.open(params[:file_0].tempfile)
and
uploader.store!(file)
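One way to keep the original name - a sketch, assuming params[:file_0] is the ActionDispatch::Http::UploadedFile from the AJAX request - is to hand that object to CarrierWave directly instead of reopening its tempfile, so the uploader can read original_filename:
def upload
  uploader = ChatUploader.new
  # passing the uploaded file itself (not its tempfile) lets CarrierWave
  # pick up original_filename instead of the RackMultipart temp name
  uploader.store!(params[:file_0])
  uploader.store_path
end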

Rails: allow download of files stored on S3 without showing the actual S3 URL to user

I have a Rails application hosted on Heroku. The app generates and stores PDF files on Amazon S3. Users can download these files for viewing in their browser or to save on their computer.
The problem I am having is that although downloading these files is possible via the S3 URL (like "https://s3.amazonaws.com/my-bucket/F4D8CESSDF.pdf"), it is obviously not a good way to do it. It is not desirable to expose so much information about the backend to the user, not to mention the security issues that arise.
Is it possible to have my app somehow retrieve the file data from S3 in a controller, then create a download stream for the user, so that the Amazon URL is not exposed?
You can create your S3 objects as private and generate temporary public URLs for them with the url_for method (aws-s3 gem). This way you don't stream files through your app servers, which is more scalable. It also allows session-based authorization (e.g. Devise in your app), tracking of download events, etc.
In order to do this, change direct links to S3-hosted files into links to a controller/action which creates a temporary URL and redirects to it. Like this:
class HostedFilesController < ApplicationController
  def show
    s3_name = params[:id] # sanitize name here, restrict access to only some paths, etc
    AWS::S3::Base.establish_connection!( ... )
    url = AWS::S3::S3Object.url_for(s3_name, YOUR_BUCKET, :expires_in => 2.minutes)
    redirect_to url
  end
end
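The aws-s3 gem used above is quite dated; with the newer aws-sdk-s3 gem the same idea would look roughly like the following sketch (the bucket name and expiry are assumptions):
def show
  s3_name = params[:id] # sanitize as above
  object = Aws::S3::Resource.new.bucket('your-bucket').object(s3_name)
  redirect_to object.presigned_url(:get, expires_in: 120)
end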
Hiding the Amazon domain in download URLs is usually done with DNS aliasing. You need to create a CNAME record aliasing your subdomain, e.g. downloads.mydomain, to s3.amazonaws.com. Then you can specify the :server option in AWS::S3::Base.establish_connection!(:server => "downloads.mydomain", ...) and the S3 gem will use it for generating links.
Yes, this is possible - just fetch the remote file with Rails and either store it temporarily on your server or send it directly from the buffer. The problem with this is of course that you need to fetch the file before you can serve it to the user. See this thread for a discussion; their solution is something like this:
# environment.rb
require 'open-uri'

# controller
def index
  data = open(params[:file]).read
  send_data data, :filename => params[:name], ...
end
This issue is also somewhat related.
First, you need to create a CNAME record in your domain, as explained here.
Second, you need to create a bucket with the same name you put in the CNAME.
Finally, add this configuration to your config/initializers/carrierwave.rb:
CarrierWave.configure do |config|
  ...
  config.asset_host = 'http://bucket_name.your_domain.com'
  config.fog_directory = 'bucket_name.your_domain.com'
  ...
end

Rails: Images on one server, CSS and Javascript on another

I am working on a Rails app that has a bunch (hundreds) of images hosted on S3. To have helpers like image_tag point there, I had to add this to my config/environments/development.rb, test.rb, and production.rb:
config.action_controller.asset_host = "http://mybucket.s3.amazonaws.com"
However, this also means that it looks there for CSS and JavaScript. This is a huge pain because each time I change the CSS I have to re-upload it to Amazon.
So.. Is there an easy way I can make my app look to Amazon for images, but locally for CSS/Javascript?
(I'm using Rails 3.0)
You can pass a Proc object to config.action_controller.asset_host and have it determine the result programmatically at runtime.
config.action_controller.asset_host = Proc.new do |source|
  case source
  when /^\/(images|videos|audios)/
    "http://mybucket.s3.amazonaws.com"
  else
    "http://mydomain.com"
  end
end
But left as it is, this would give you http://mybucket.s3.amazonaws.com/images/whatever.png when you use image_tag :whatever.
If you want to modify the path as well, you can do something very similar with config.action_controller.asset_path
config.action_controller.asset_path = Proc.new do |path|
  path.sub /^\/(images|videos|audios)/, ""
end
which would give you http://mybucket.s3.amazonaws.com/whatever.png combined with the former.
There's nothing stopping you from passing a full URL to image_tag: image_tag("#{IMAGE_ROOT}/icon.png").
But to me, moving static images (icons, backgrounds, etc.) to S3 and leaving stylesheet/JS files on Rails sounds kind of inconsistent. You could either move them all to S3 or set up Apache for caching (if you're afraid that users pulling big images will create too much overhead for Rails).
BTW, you don't have to put config.action_controller... into config files for all three environments: placing that line just in config/environment.rb will have the same effect.
