My questions
Cost-related pitfalls to avoid when deploying a Rails app?
Attack scenarios are welcome, as they would teach me what to expect and brace against.
I would rather avoid big bills at the end of the month, however.
Easy cloud hosting services to use?
I picked AWS because it seems scalable and I thought I could avoid learning another service later.
I have no regrets, but AWS is overwhelming; if there were a significantly simpler service, I should have used it.
My current concern
A DoS attack or GET request flooding on AWS S3 could raise hosting costs significantly, as I'm uploading some content there.
A billing alarm is useful, but without automatic shutdown I feel a little uncomfortable taking a break and going into a jungle or onto an uninhabited island where I have no internet connection to be informed of a problem, or to shut down my service...
Obvious fix for my case
Stop using S3 and save user uploads to the database, where I can control scaling behavior. But then, most people seem to be using S3 with CarrierWave. Why?
What I'm doing
Making my first ever home page using:
Elastic Beanstalk
Rails 5
the CarrierWave gem, configured to save user uploads in S3
Edit
In the end, I could not find any real solution to the no-cap-for-S3 issue.
The notes below are more or less for my own reference.
I'm guessing S3 has some basic built-in defense against attacks, because I have not heard sad stories about people using S3 to host static websites and getting a bill over 10,000 USD. That could still happen, though, regardless of how good Amazon's defense might be.
Mitigation
A script that periodically checks the S3 log files and calls an action that disables S3 resource serving when the cumulative size of those files is too large.
S3 logs sometimes take more than an hour to become available, so this is no real solution, but it's better than nothing.
require 'aws-sdk-s3'

class LogObserver
  def initialize
    Aws.config.update({
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
      region: 'ap-northeast-1'})
    @bucket_name = "bucket name that holds s3 log"
    @last_checked_log_timestamp = Time.now.utc
    log "started at: #{Time.now}"
  end

  def run
    bucket = Aws::S3::Resource.new.bucket(@bucket_name)
    loop do
      prv_log_ts = @last_checked_log_timestamp
      log_size = fetch_log_size(bucket)
      log "The total size of S3 log accumulated since this script last checked: #{log_size}"
      time_range = @last_checked_log_timestamp - prv_log_ts # seconds, as a float
      # guard against division by zero when no new log objects have appeared
      log_size_per_second = time_range.zero? ? 0 : log_size / time_range
      if log_size_per_second > (500 * 1024 / 60.0) # threshold: ~500 KB of logs per minute
        log "Disabling S3 access as S3 log size is greater than expected."
        `curl localhost/static_pages/disable_s3`
      end
      sleep 60 * 60
    end
  end

  def log(text)
    puts text
    File.open('./s3_observer_log.txt', 'a') do |f|
      f << "#{text}\n"
    end
  end

  def fetch_log_size(bucket)
    log_size = 0
    bucket.objects(prefix: 'files').each do |o|
      next if o.last_modified < @last_checked_log_timestamp
      @last_checked_log_timestamp = o.last_modified
      log_size += o.size.to_i
    end
    log_size
  end
end
Rake task:
namespace :s3_log do
  desc "Check the cumulative size of the S3 access log files. If the size is above the expected value, disable S3 access."
  task :start_attack_detection_loop do
    require './s3_observer.rb'
    id = Process.fork do
      o = LogObserver.new
      o.run
    end
    puts "Forked a new process that watches the S3 log. Process id: #{id}"
  end
end
controller action:
before_action :ensure_permitted_ip, only: [:enable_s3, :disable_s3]

def enable_s3
  # re-enable S3 serving by restoring a short authenticated URL expiration
  CarrierWave.configure do |config|
    config.fog_authenticated_url_expiration = 3
  end
end

def disable_s3
  # effectively disable S3 serving by making authenticated URLs expire immediately
  CarrierWave.configure do |config|
    config.fog_authenticated_url_expiration = 0
  end
end

private

def ensure_permitted_ip
  # allow access only from localhost
  redirect_to root_path if request.remote_ip != "127.0.0.1"
end
Gems:
gem 'aws-sdk-rails'
gem 'aws-sdk-s3', '~> 1'
My experience is limited, but my suggestions would be:
Cost-related pitfalls to avoid when deploying a Rails app?
If you're going to use a background job, use rufus-scheduler instead of Sidekiq or delayed_job, because it runs on top of your Rails server and does not require additional memory or dedicated processes. This allows you to procure the smallest/cheapest possible instance, a t2.nano, which I did once before.
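As a rough illustration, a minimal rufus-scheduler sketch (the file name, interval, and job body are placeholder assumptions, not from the original post):

# config/initializers/scheduler.rb
require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

# the block runs inside the Rails server process itself,
# so no separate worker process or extra instance is needed
scheduler.every '10m' do
  Rails.logger.info "periodic job ran at #{Time.now}"
end

The trade-off is that jobs die with the server process and don't survive restarts, which is usually acceptable for lightweight periodic work on a single small instance.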
Easy cloud hosting services to use?
Heroku would be a good choice, because it is a lot easier to set up. However, if you're doing this for the experience, I would suggest procuring unmanaged hosting like AWS EC2 or Linode. I migrated my server from AWS to VPSDime 3 months ago because it's cheap and has lots of memory; so far so good.
My current concern
For CarrierWave, you may restrict S3 access (see reference). This prevents hotlinking and requires a user to go through your Rails pages first in order to download, view, or show the S3 files. Now that Rails has control over the S3 files, you can simply use something like Rack::Attack to prevent DDoS or excessive requests; see the sketch below. If your Rails app sits behind Apache or Nginx, you can set up DDoS rules there instead of using Rack::Attack. Or, if you are going to use an AWS load balancer to manage and route the requests, you can use AWS Shield... though I haven't really used that yet.
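A minimal Rack::Attack sketch of that throttling idea (the route prefix, limits, and initializer name are assumptions for illustration):

# config/initializers/rack_attack.rb
class Rack::Attack
  # at most 10 requests per minute per IP to the pages that serve S3 files;
  # '/downloads' is a hypothetical route prefix
  throttle('downloads/ip', limit: 10, period: 60) do |req|
    req.ip if req.path.start_with?('/downloads')
  end
end

Depending on your Rails version you may also need to add the middleware explicitly with config.middleware.use Rack::Attack; newer versions of the gem insert it for you.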
Related
In a controller in my Rails app I have a method that loops through my S3 bucket and selects images. It's causing page load speeds to suffer, but I like being able to loop through the bucket without having all the URLs hard-coded.
Here is what I have:
@bucket = S3_BUCKET
@images = []
@bucket.objects.each do |file|
  if file.key.include?("inspiration")
    @images << { url: file.public_url, key: file.key, type: 'file' }
  end
end
Is there another way to accomplish this so page load speeds don't suffer?
As it turns out there were many more files than expected and the loop took a long time to complete. I changed the code to:
@images = @bucket.objects(prefix: 'inspiration')
and the response was much faster.
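If you still need the same url/key hashes the original loop built, the prefixed listing can feed the same mapping; a small sketch combining the two:

# list only the 'inspiration' objects, then build the same hashes as before
@images = @bucket.objects(prefix: 'inspiration').map do |file|
  { url: file.public_url, key: file.key, type: 'file' }
end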
Since you really can't regulate the speed at which you access your S3 bucket, I would suggest setting up a CDN (content delivery network) on Amazon CloudFront. Please take a look at this article written by Brandon Hilkert about implementing a CDN:
https://brandonhilkert.com/blog/setting-up-a-cloudfront-cdn-for-rails/
Side note - if you would like a free CDN option, I would use
https://cloudinary.com/pricing
Referencing when to use a CDN over S3:
https://stackoverflow.com/questions/3327425/when-to-use-amazon-cloudfront-or-s3
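For a CarrierWave-based app like the one above, pointing uploads at a CloudFront distribution is typically a one-line configuration; a sketch, where the distribution domain is a placeholder you would replace with your own:

# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  # hypothetical CloudFront distribution domain
  config.asset_host = 'https://d1234abcd.cloudfront.net'
end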
I have a typical Rails REST API written for HTTP consumers. However, it turns out they need a WebSocket API because of the integration with POS machines.
The typical API looks like this:
class Api::Pos::V1::TransactionsController < ApplicationController
  before_action :authenticate

  def index
    @transactions = @current_business.business_account.business_deposits.last(5)
    render json: {
      status: 200,
      number: @transactions.count,
      transactions: @transactions.as_json(only: [:created_at, :amount, :status, :client_card_number, :client_phone_number])
    }
  end

  private

  def request_params
    params.permit(:account_number, :api_key)
  end

  def authenticate
    render status: 401, json: {
      status: 401,
      error: "Authentication Failed."
    } unless current_business
  end

  def current_business
    account_number = request_params[:account_number].to_s
    api_key = request_params[:api_key].to_s
    if account_number && api_key
      account = BusinessAccount.find_by(account_number: account_number)
      if account && Business.find(account.business_id).business_api_key.token =~ /^(#{api_key})/
        @current_business = account.business
      else
        false
      end
    end
  end
end
How can I serve the same responses using WebSockets?
P.S.: I've never worked with sockets before.
Thank you.
ActionCable
I would second Dimitris's reference to ActionCable, as it's expected to become part of Rails 5 and should (hopefully) integrate with Rails quite well.
Since Dimitris suggested SSE, I would recommend against doing so.
SSE (Server-Sent Events) rely on a long-lived HTTP connection, and I would avoid this technology for many reasons, which include SSE connection interruptions and extensibility (WebSockets allow you to add features that SSE won't support).
I am almost tempted to go into a rant about SSE implementation performance issues, but... even though WebSocket implementations should be more performant, many of them suffer from similar issues, and the performance increase is often only thanks to the WebSocket connection's longer lifetime...
Plezi
Plezi* is a real-time web application framework for Ruby. You can either use it on its own (which is not relevant for you) or together with Rails.
With only minimal changes to your code, you should be able to use WebSockets to return results from your RESTful API. Plezi's Getting Started Guide has a section about unifying the backend's RESTful and WebSocket APIs. Implementing it in Rails should be similar.
Here's a bit of demo code. You can put it in a file called plezi.rb and place it in your application's config/initializers folder...
Just make sure you're not using any specific servers (thin, puma, etc.), allowing Plezi to override the server and use the Iodine server, and remember to add Plezi to your Gemfile.
class WebsocketDemo
  # authenticate
  def on_open
    return close unless current_business
  end

  def on_message data
    data = JSON.parse(data) rescue nil
    return close unless data
    case data['msg']
    when /\Aget_transactions\z/i
      # call the RESTful API method here, if it's accessible. OR:
      transactions = @current_business.business_account.business_deposits.last(5)
      write({
        status: 200,
        number: transactions.count,
        # the next line has what I think is a design flaw, but I left it in
        transactions: transactions.as_json(only: [:created_at, :amount, :status, :client_card_number, :client_phone_number])
        # # Consider, instead, avoiding nested JSON streams:
        # transactions: transactions.select(:created_at, :amount, :status, :client_card_number, :client_phone_number)
      }.to_json)
    end
  end

  # don't disclose inner methods to the router
  protected

  # better to make the original method a class method, letting you reuse it
  def current_business
    account_number = params[:account_number].to_s
    api_key = params[:api_key].to_s
    if account_number && api_key
      account = BusinessAccount.find_by(account_number: account_number)
      if account && Business.find(account.business_id).business_api_key.token =~ /^(#{api_key})/
        return (@current_business = account.business)
      end
      false
    end
  end
end
Plezi.route '/(:api_key)/(:account_number)', WebsocketDemo
Now we have a route that looks something like: wss://my.server.com/app_key/account_number
This route can be used to send and receive data in JSON format.
To get the transaction list, the client side application can send:
JSON.stringify({msg: "get_transactions"})
This will result in the data being sent to the client's websocket.onmessage callback with the last five transactions.
Of course, this is just a short demo, but I think it's a reasonable proof of concept.
* I should point out that I'm biased, as I'm Plezi's author.
P.S.
I would consider moving the authentication into a websocket "authenticate" message, allowing the application key to be sent in a less conspicuous manner.
EDIT
These are answers to the questions in the comments.
Capistrano
I don't use Capistrano, so I'm not sure... but I think it would work if you add the following line to your Capistrano tasks:
Iodine.protocol = false
This will prevent the server from auto-starting, so your Capistrano tasks flow without interruption.
For example, at the beginning of the config/deploy.rb you can add the line:
Iodine.protocol = false
# then the rest of the file, i.e.:
set :deploy_to, '/var/www/my_app_name'
#...
You should also edit your Rakefile so that it includes the same line at the beginning:
Iodine.protocol = false
Let me know how this works. Like I said, I don't use Capistrano and I haven't tested it out.
Keeping Passenger by using a second app
The Plezi documentation states that:
If you really feel attached to your thin, unicorn, puma or passenger server, you can still integrate Plezi with your existing application, but they won't be able to share the same process and you will need to utilize the Placebo API (a guide is coming soon).
But the guide isn't written yet...
There's some information in the GitHub Readme, but it will be removed after the guide is written.
Basically, you include the Plezi application with the Redis URL inside your Rails application (remember to copy over all the gems used in the Gemfile). Then you add this line:
Plezi.start_placebo
That should be it.
Plezi will ignore the Plezi.start_placebo command if there is no other server defined, so you can put the command in a file shared with the Rails application, as long as Plezi's Gemfile doesn't list a different server.
You can include some or all of the Rails application code inside the Plezi application. As long as Plezi (Iodine, actually) is the only server in the Plezi Gemfile, it should work.
The applications will synchronize using Redis, and you can use your Plezi code to broadcast websocket events inside your Rails application.
You may want to have a look at https://github.com/rails/actioncable, which is the Rails way to deal with WebSockets, but it is currently in alpha.
Judging from your code snippet, the client seems to only consume data from your backend. I'm skeptical whether you really need WebSockets. If the client won't push data back to the server, Server-Sent Events seem more appropriate.
See relevant walk-through and documentation.
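For completeness, a hedged sketch of what that SSE route could look like in Rails with ActionController::Live (the controller name is hypothetical, and the authenticate/@current_business setup from the question is assumed to run in a before_action):

class Api::Pos::V1::TransactionStreamsController < ApplicationController
  include ActionController::Live

  def index
    response.headers['Content-Type'] = 'text/event-stream'
    sse = ActionController::Live::SSE.new(response.stream, event: 'transactions')
    transactions = @current_business.business_account.business_deposits.last(5)
    # SSE#write JSON-encodes non-string objects before streaming them
    sse.write(transactions.as_json(only: [:created_at, :amount, :status, :client_card_number, :client_phone_number]))
  ensure
    sse.close if sse
  end
end

On the client, an EventSource pointed at that route would receive the payload in its message handler, with no WebSocket infrastructure required.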
The Problem:
I have a Rails app that requires a user to upload some type of spreadsheet (csv, xlsx, xls, etc.) for processing, which can be a costly operation, so we've decided to send it off to a background service. The issue we're concerned about is that because our production system is on Heroku, we need to store the file on Amazon S3 first and retrieve it later for processing.
Because uploading the file to S3 is itself a costly operation, this should probably also be done as a background job. The problem is the concern that using Resque to do this could eat up a lot of RAM, due to Resque needing to put the file data into Redis for later retrieval. As you know, Redis only stores its data in RAM and also prefers simple key-value pairs, so we would like to try and avoid this.
Here's some pseudocode as an example of what we'd like to try to do:
workers/AS3Uploader.rb
require 'fog'

class AS3Uploader
  @queue = :as3_uploader

  def self.perform(some, file, data)
    # create a connection
    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => APP_CONFIG['s3_key'],
      :aws_secret_access_key => APP_CONFIG['s3_secret']
    })

    # First, a place to contain the glorious details
    directory = connection.directories.create(
      :key => "catalog-#{Time.now.to_i}", # globally unique name
      :public => true
    )

    # list directories
    p connection.directories

    # upload that catalog
    file = directory.files.create(
      :key => 'catalog.xml',
      :body => File.open(blah), # not sure how to get file data here without putting it into RAM first using Resque/Redis
      :public => true
    )

    # make a call to enqueue the processing of the catalog
    Resque.enqueue(CatalogProcessor, some, parameters, here)
  end
end
controllers/catalog_upload_controller.rb
def create
  # process params
  # call enqueue to start the file processing
  # What do I do here? I could send all of the file data right now,
  # but as I said previously, that means potentially storing hundreds of MB in RAM
  Resque.enqueue(AS3Uploader, some, parameters, here)
end
The way I would suggest doing it (see the sketch after this list):
store your file in a tmp dir you create and get the file path
tell Resque to upload the file by passing it the file path
have Resque store the file path in Redis, not the whole file content (that would be very expensive)
the worker will then upload the file to AWS S3
Note: if you have multiple instances, such as one instance for background processing, one for the database, and one as a utility instance, then your tmp dir may not be available to the other instances, so store the file in a tmp dir inside the instance running Resque.
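A minimal sketch of the enqueue side under that approach (the tmp path layout and parameter names are illustrative assumptions):

# controllers/catalog_upload_controller.rb
def create
  upload = params[:file]
  # write the upload to a tmp dir on this instance; only the path goes to Redis
  tmp_path = Rails.root.join('tmp', 'uploads', upload.original_filename).to_s
  FileUtils.mkdir_p(File.dirname(tmp_path))
  File.open(tmp_path, 'wb') { |f| f.write(upload.read) }

  # Redis now stores a short path string instead of hundreds of MB of file data
  Resque.enqueue(AS3Uploader, tmp_path)
end

The worker's self.perform would then receive the path and call File.open on it when creating the S3 object, keeping the file contents out of Redis entirely.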
I have a Rails application hosted on Heroku. The app generates and stores PDF files on Amazon S3. Users can download these files for viewing in their browser or to save on their computer.
The problem I am having is that although downloading these files is possible via the S3 URL (like "https://s3.amazonaws.com/my-bucket/F4D8CESSDF.pdf"), it is obviously not a good way to do it. It is not desirable to expose so much information about the backend to the user, not to mention the security issues that arise.
Is it possible to have my app somehow retrieve the file data from S3 in a controller, then create a download stream for the user, so that the Amazon URL is not exposed?
You can create your S3 objects as private and generate temporary public URLs for them with the url_for method (aws-s3 gem). This way you don't stream files through your app servers, which is more scalable. It also allows session-based authorization (e.g. Devise in your app), tracking of download events, etc.
To do this, change direct links to S3-hosted files into links to a controller/action which creates a temporary URL and redirects to it. Like this:
class HostedFilesController < ApplicationController
  def show
    s3_name = params[:id] # sanitize name here, restrict access to only some paths, etc.
    AWS::S3::Base.establish_connection!( ... )
    url = AWS::S3::S3Object.url_for(s3_name, YOUR_BUCKET, :expires_in => 2.minutes)
    redirect_to url
  end
end
Hiding the Amazon domain in download URLs is usually done with DNS aliasing. You need to create a CNAME record aliasing your subdomain, e.g. downloads.mydomain, to s3.amazonaws.com. Then you can specify the :server option in AWS::S3::Base.establish_connection!(:server => "downloads.mydomain", ...) and the S3 gem will use it when generating links.
Yes, this is possible - just fetch the remote file with Rails and either store it temporarily on your server or send it directly from the buffer. The problem with this is, of course, that you need to fetch the whole file before you can serve it to the user. See this thread for a discussion; their solution is something like this:
# environment.rb
require 'open-uri'

# controller
def index
  data = open(params[:file])
  send_data data.read, :filename => params[:name], ...
end
This issue is also somewhat related.
First you need to create a CNAME in your domain, as explained here.
Second you need to create a bucket with the same name that you put in the CNAME.
And to finish, you need to add this configuration to your config/initializers/carrierwave.rb:
CarrierWave.configure do |config|
  ...
  config.asset_host = 'http://bucket_name.your_domain.com'
  config.fog_directory = 'bucket_name.your_domain.com'
  ...
end
I have been struggling with a problem for the past few days in a Ruby on Rails app I'm currently working on. I have different countries, and for each country we use a different Amazon S3 bucket. The S3 key credentials are stored as constants in config/environments/environment_name.rb (e.g. demo.rb). There is no way for me to determine which country we are operating in from the config file; I can determine it from the controllers, models, views, etc., but not from the config file. Is there some Ruby metaprogramming or other kind of magic I'm not aware of, so that if the app is operating in the UK it uses the UK's bucket credentials, and if in Germany, Germany's? I can't think of a way to pass parameters to environment files from the app itself. Thank you very much in advance for all your help.
Rather than passing the configuration details to whichever S3 client you're using at launch, you should probably select the relevant credentials for each request. Your config file can define them all in a hash, like so:
# config/s3.rb
S3_BUCKETS = {
  :us => 'our-files-us',
  :gb => 'our-files-gb',
  :tz => 'special-case'
}
Then you can select the bucket on each request, like so (perhaps in your ApplicationController):
bucket_name = S3_BUCKETS[I18n.locale]
# pass this info to your S3 client
Make sense?
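Fleshing that out a little, a sketch of the per-request selection (assuming the aws-sdk-s3 gem and the S3_BUCKETS hash above; the fallback choice is an illustrative assumption):

class ApplicationController < ActionController::Base
  before_action :set_s3_bucket

  private

  def set_s3_bucket
    # fall back to the US bucket when the locale has no dedicated entry
    name = S3_BUCKETS.fetch(I18n.locale, S3_BUCKETS[:us])
    @s3_bucket = Aws::S3::Resource.new.bucket(name)
  end
end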
Write a little middleware if you want to keep the knowledge of the per-country configuration out of the main application.
A middleware is extremely simple. A do-nothing middleware looks like this:
class DoesNothing
  def initialize(app, *args)
    @app = app
  end

  def call(env)
    @app.call(env)
  end
end
Rack powers applications by chaining a series of middlewares together... each one is given a reference to @app, which is the next link in the chain, and it must invoke #call on that application. The one at the end of the chain runs the app.
So in your case, you can do some additional configuration here.
class PerCountryConfiguration
  def initialize(app)
    @app = app
  end

  def call(env)
    case env["COUNTRY"]
    when "AU"
      Rails.application.config.s3_buckets = { ... }
    when "US"
      Rails.application.config.s3_buckets = { ... }
    # ... etc.
    end
    @app.call(env)
  end
end
There are several ways to use the middleware, but since it depends on access to the Rails environment, you'll want to do it from inside Rails. Put it in your application.rb:
config.middleware.use PerCountryConfiguration
If you want to pass additional arguments to the constructor of your middleware, just list them after the class name:
config.middleware.use PerCountryConfiguration, :some_argument
You can also mount the middleware from inside of ApplicationController, which means all of the initializers and everything will have already been executed, so it may be too far along the chain.