Prevent bots from accessing rails active_storage images - ruby-on-rails

My site has a large number of graphs which are recalculated each day as new data is available. The graphs are stored on Amazon S3 using active_storage. A typical example would be
# app/models/graph.rb
class Graph < ApplicationRecord
  has_one_attached :plot
end
and in the view
<%= image_tag graphs.latest.plot %>
where graphs.latest retrieves the latest graph. Each day, a new graph and attached plot is created and the old graph/plot is deleted.
A number of bots, including Google's and Yandex's, are indexing the graphs, but then generate exceptions when the bot returns and requests the image again at URLs like
www.myapp.com/rails/active_storage/representations/somelonghash
Is there a way to produce a durable link for the plot that does not expire when the graph/plot is deleted and then recalculated? Failing this, is there a way to block bots from accessing these plots?
Note that I currently have a catchall at the end of the routes.rb file:
get '*all', to: 'application#route_not_found', constraints: lambda { |req|
req.path.exclude? 'rails/active_storage'
} if Rails.env.production?
The exclusion of active storage in the catchall is in response to this issue. It is tempting to remove the active_storage exemption, but this might then stop proper active_storage routes.
Maybe I can put something in rack_rewrite.rb to fix this?

Interesting question.
A simple solution would be to use send_data to send the image directly. However, that has its own issues, mostly that it will probably increase server bandwidth usage (and reduce server performance). Still, it is the solution you need if you don't want to go to the trouble of creating the redirect model and its surrounding logic described below.
Original Answer
The redirect will require setting up some sort of Redirects::Graph model. It can verify that a graph was deleted and redirect to the new graph instead of the requested one. It would have two fields, an old_signed_id (biglonghash) and a new_signed_id.
Every time you delete a graph, we'll need to populate the redirects model, adding a new entry every time a new graph is created (we should be able to generate the signed_id from the blob somehow).
For performance, and to avoid many redirects in a row (which may cause a different error), you'll have to manage the chain of changes. For example, say you have a redirect A => B and you delete B, replacing it with C. Now you need A => C and B => C (to avoid an A => B => C redirect), and this chain could get rather long. This can be handled efficiently by adding the new old_signed_id => new_signed_id entry and doing a Redirects::Graph.where(new_signed_id: old_signed_id).update_all(new_signed_id: new_signed_id) to update all the relevant old redirects whenever you regenerate the graph.
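The chain-collapsing update described above can be illustrated without a database. This is only a minimal sketch: a plain Hash stands in for the Redirects::Graph table (keys are old signed ids, values are the id they currently redirect to), and the RedirectTable class and its method names are hypothetical.

```ruby
# In-memory stand-in for the Redirects::Graph table (hypothetical class).
# Keys are old signed ids, values are the signed id they redirect to.
class RedirectTable
  def initialize
    @targets = {}
  end

  # Call whenever the plot identified by old_id is replaced by new_id.
  def replace(old_id, new_id)
    # Same idea as where(new_signed_id: old_id).update_all(new_signed_id: new_id):
    # repoint every redirect that currently ends at old_id.
    @targets.transform_values! { |to| to == old_id ? new_id : to }
    # Then record the new old_id => new_id redirect.
    @targets[old_id] = new_id
  end

  # Returns the current target for a stale id, or nil if none is recorded.
  def lookup(signed_id)
    @targets[signed_id]
  end
end

table = RedirectTable.new
table.replace("A", "B") # A => B
table.replace("B", "C") # A => C and B => C, never A => B => C
```

After the second call both "A" and "B" point straight at "C", so a returning bot is redirected at most once no matter how many times the plot has been regenerated.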
The controller itself is trickier. The cleanest method I can think of is to monkey patch the ActiveStorage::RepresentationsController to add a before_action that does something like this (it may not work as-is; the params[:signed_id] and the representations path may not be right):
before_action :redirect_if_needed
def redirect_if_needed
  redirect_model = Redirects::Graph.find_by(old_signed_id: params[:signed_id])
  redirect_to rails_activestorage_representations_path(
    signed_id: redirect_model.new_signed_id
  ) if redirect_model.present?
end
If you have version control set up for your database (e.g. the Papertrail gem or something similar) you may be able to work out the old_signed_id and new_signed_id with a bit of work and build the redirects for the URLs currently causing errors. Otherwise, sadly, this approach will only prevent future errors, and it may be impossible to get the current broken URLs working.
Ideally, though, you would update the blob itself to use the new graph instead of deleting the old one, but I'm not sure that's possible/practical.

Have you tried robots.txt?
In the file /robots.txt put:
User-agent: *
Disallow: /rails/active_storage*

Related

How to make a rails server wide object?

I am using the RedditKit gem and in order to access certain elements, I need to send a request to reddit api to create a "client" object. Below is my current logic:
## application_controller
before_action :redditkit_login
private
def redditkit_login
  @client = RedditKit::Client.new ENV["reddit_username"], ENV["reddit_password"]
end
As you can see in my logic here, before EVERY REQUEST, I am subsequently making a new client object and then using that everywhere.
My question is, how do I only make one client object which can be used to serve ALL requests from anywhere?
My motive behind this is speed. For every request to server, I am making a new request to reddit and then responding to the original request. I want to have the client object readily available at all times.
You have a lot of options. A simple one would be to create a config/initializers/reddit_client.rb file and put in there:
RedditClient = RedditKit::Client.new ENV.fetch("reddit_username"), ENV.fetch("reddit_password")
(note I switched to ENV.fetch because it will error if the key is not found, which can be helpful).
You could also rename the file as app/models/reddit_client.rb. Although it's not really a model, that folder is also autoloaded so it should work as well.
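Another of the options is to build the client lazily on first use and reuse it afterwards. The following is a rough sketch only: FakeClient is a hypothetical stand-in for RedditKit::Client so that the example runs without the gem, RedditSession is a made-up module name, and the ENV.fetch defaults exist purely to keep the sketch runnable.

```ruby
# Hypothetical stand-in for RedditKit::Client; it counts how many
# times a client is constructed (i.e. how many logins would happen).
class FakeClient
  @built = 0
  class << self
    attr_accessor :built
  end

  def initialize(username, password)
    self.class.built += 1
    @username = username
    @password = password
  end
end

module RedditSession
  # Builds the client on the first call and returns the same object afterwards.
  def self.client
    @client ||= FakeClient.new(
      ENV.fetch("reddit_username", "demo_user"), # defaults are for the sketch only
      ENV.fetch("reddit_password", "demo_pass")
    )
  end
end

first  = RedditSession.client
second = RedditSession.client # same object, no second login
```

Note that `||=` is not atomic, so on a multi-threaded server two threads could race on the very first call; building the client in an initializer, as above, avoids that.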

Shopify API calls not working in Rails background job

In my Rails controller action, I have a method that does a bunch of Shopify API calls. Things like:
ShopifyAPI::Product.all()
ShopifyAPI::Product.find(:all, params: {title: title})
ShopifyAPI::Product.create(title: title, body_html: description, images: images, tags: tags, product_type: product_type)
All of it does what I want...very neat.
The problem is that I'm going to be uploading a CSV and using this controller method. It's fine if I have like 8 line items, but very quickly it gets slow. So, I thought, let's move it to a background worker.
I'm using Redis/Resque to get everything going and using some dummy outputs (i.e. puts 'Hi there champ!') I've confirmed that the background worker is configured properly and executing when and where it should be. Neat.
So then I put bits and pieces of my controller action in and output that. That all works until I hit my Shopify API calls. I can call .new on just about any object, but when I try to .find, .all, or .create any valid object (which worked before I abstracted it to the background job), it sort of goes dead. Super descriptive! I've tried to output what's going on via logger and puts, but I can't seem to generate much output; I have, however, isolated it to the Shopify API work. I thought that, even though I have an initializer that specifies my passwords, site, API keys, secrets, etc., I might need to reinitialize my Shopify session, as per their setup docs here. I either did it wrong, or that didn't solve the issue.
At this point I'm sure I'm just missing something in the docs, but I cannot find out how to make these necessary API calls from my background job. Any thoughts on what I might be doing obviously wrong that could solve this? Anyone dealt with anything similar?
Turns out this has to do with where the Shopify Engine was mounted. In my routes.rb I have the following (in addition to other routes; these are the two pertinent ones):
mount ShopifyApp::Engine, at: '/'
root to: 'products#index'
This is all fine and good, but sort of forces the context of your Shopify API calls to be made within the context of the products.rb index controller action...without some changes. 2 ways to do this, one obviously the more Railsy way to do it:
Option 1:
Include
session = ShopifyApp::SessionRepository.retrieve(1)
ShopifyAPI::Base.activate_session(session)
at the beginning of any file in which you want to make Shopify API calls. This sets the session (assuming you only have 1 store, by the way; this uses the retrieve method to retrieve store 1, which is a risky assumption), authenticates to the API, and everything in life is good.
Option 2:
Class inheritance for the win. Have all your controllers that are making API calls inherit from ShopifyApp::AuthenticatedController. This makes the initializer actually work, and that's it. This is (in retrospect) the clear and obvious way to go. Have an order controller? class OrdersController < ShopifyApp::AuthenticatedController and done: order = ShopifyAPI::Order.find(params[:id]) does exactly what you'd expect it to.
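Since the original problem was a background job rather than a controller, Option 1 is the one that applies there. Here is a hedged sketch of how it might look inside a Resque job, assuming the same older shopify_api/shopify_app versions used above; ProductImportJob, the queue name and the row format are all hypothetical.

```ruby
# Sketch of a Resque job that activates a Shopify session before making
# API calls. The Shopify constants are only resolved when the job runs.
class ProductImportJob
  @queue = :shopify_import # hypothetical queue name

  # Resque calls this class method with the arguments passed to Resque.enqueue.
  def self.perform(shop_id, rows)
    session = ShopifyApp::SessionRepository.retrieve(shop_id)
    ShopifyAPI::Base.activate_session(session)

    rows.each do |row|
      # Job arguments are serialized as JSON, so hash keys arrive as strings.
      ShopifyAPI::Product.create(title: row["title"], body_html: row["description"])
    end
  ensure
    # Avoid leaking this session into the next job handled by the same worker.
    ShopifyAPI::Base.clear_session
  end
end
```

It would be enqueued from the controller with something like Resque.enqueue(ProductImportJob, 1, rows), where rows is an array of plain hashes parsed from the CSV.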

Rails: Model.find() or Model.find_by_id() to avoid RecordNotFound

I just realized I had a very hard to find bug on my website. I frequently use Model.find to retrieve data from my database.
A year ago I merged three websites, causing a lot of redirections that needed to be handled. To do this I created a "catch all" in my application controller like this:
around_filter :catch_not_found
def catch_not_found
yield
rescue ActiveRecord::RecordNotFound
require 'functions/redirections'
handle_redirection(request.path)
end
in addition I have this at the bottom of my routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
Redirection-controller has:
def not_found_catcher
handle_redirection(request.path)
end
I am not sure these things are relevant in this question but I guess it is better to tell.
My actual problem
I frequently use Model.find to retrieve data from my database. Let's say I have a Product-model with a controller like this:
def show
  @product = Product.find(params[:id])
  @product.country = Country.find(...some id that does not exist...)
end
# View
<%= @product.country.name %>
This is something I use in some 700+ places in my application. What I realized today was that even though the Product will be found, calling Country.find() and NOT finding something raises a RecordNotFound, which in turn causes a 404 error.
I have built my app around the expectation that @product.country = nil if it couldn't find that Country in the .find search. I now know that is not the case: it raises a RecordNotFound. Basically, if I load Product#show I get a 404 page where I would expect a 500 error (since @product.country = nil and nil.name should not work).
My question
My big question now. Am I doing things wrong in my app, should I always use Model.find_by_id for queries like my Country.find(...some id...)? What is the best practise here?
Or, does the problem lie within my catch all in the Application Controller?
To answer your questions:
should I always use Model.find_by_id
If you want to find by an id, use Country.find(...some id...). If you want to find by something else, use e.g. Country.find_by(name: 'Australia'). The find_by_name syntax is no longer favoured in Rails 4.
But that's an aside, and is not your problem.
Or, does the problem lie within my catch all in the Application Controller?
Yeah, that sounds like a recipe for pain to me. I'm not sure what specifically you're doing or what the nature of your redirections is, but based on the vague sense I get of what you're trying to do, here's how I'd approach it:
Your Rails app shouldn't be responsible for redirecting routes from your previous websites / applications. That should be the responsibility of your webserver (eg nginx or apache or whatever).
Essentially you want to make a big fat list of all the URLs you want to redirect FROM, and where you want to redirect them TO, and then format them in the way your webserver expects, and configure your webserver to do the redirects for you. Search for eg "301 redirect nginx" or "301 redirect apache" to find out info on how to set that up.
If you've got a lot of URLs to redirect, you'll likely want to generate the list with code (most of the logic should already be there in your handle_redirection(request.path) method).
Once you've run that code and generated the list, you can throw that code away. Your webserver will be handling the redirects from the old sites, and your Rails app can happily go on with no knowledge of the previous sites / URLs, and no dangerous catch-all logic in your application controller.
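Generating the list with code could look roughly like this for nginx. It is a sketch under stated assumptions: the REDIRECTS mapping is invented for illustration (in practice it would come out of the existing handle_redirection logic), the helper name is hypothetical, and the paths are assumed to contain no spaces or characters that need quoting in nginx config.

```ruby
# Hypothetical mapping of old paths to their new locations.
REDIRECTS = {
  "/old-site/products/42" => "/products/42",
  "/old-site/about.html"  => "/about"
}.freeze

# Emits one exact-match nginx location block per redirect.
def nginx_redirects(mapping)
  mapping.map do |from, to|
    "location = #{from} { return 301 #{to}; }"
  end.join("\n")
end

puts nginx_redirects(REDIRECTS)
# location = /old-site/products/42 { return 301 /products/42; }
# location = /old-site/about.html { return 301 /about; }
```

The output can be saved to a file and pulled into the server block with an include directive.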
That is a very interesting way to handle exceptions...
In Rails you use rescue_from to handle exceptions on the controller layer:
class ApplicationController < ActionController::Base
rescue_from SomeError, with: :oh_noes
private def oh_noes
render text: 'Oh no.'
end
end
However, Rails already handles some exceptions (among them ActiveRecord::RecordNotFound) by serving static HTML pages, which you can override with dynamic handlers.
As @joshua.paling already pointed out, though, you should be handling the redirects at the server level instead of in your application.

Prevent modification ("hacking") of hidden fields in form in rails3?

So let's say I have a form for submitting a new post.
The form has a hidden field which specifies the category_id. We are also on the show view for that very category.
What I'm worried about, is that someone using something like firebug, might just edit the category id in the code, and then submit the form - creating a post for a different category.
Obviously my form is more complicated and a different scenario - but the idea is the same. I also cannot define the category in the post's create controller, as the category will be different on each show view...
Any solutions?
EDIT:
Here is a better question - is it possible to grab the category id in the create controller for the post, if it's not in a hidden field?
Does your site have the concept of permissions / access control lists on the categories themselves? If the user would have access to the other category, then I'd say there's no worry here since there's nothing stopping them from going to that other category and doing the same.
If your categories are restricted in some manner, then I'd suggest nesting your Post under a category (nested resource routes) and do a before_filter to ensure you're granted access to the appropriate category.
config/routes.rb
resources :categories do
resources :posts
end
app/controllers/posts_controller
before_filter :ensure_category_access
def create
  @post = @category.posts.new(params[:post])
  ...
end
private
def ensure_category_access
  @category = Category.find(params[:category_id])
  # do whatever you need to do. if you don't have to validate access, then I'm not sure I'd worry about this.
  # If the user wants to change their category in their post instead of
  # going to the other category and posting there, I don't think I see a concern?
end
URL would look like
GET
/categories/1/posts/new
POST
/categories/1/posts
pst is right: never trust the user. Double-check the value sent via the view in your controller and, if it doesn't match something valid, kick the user out (auto-logout) and send the admin an email. You may also want to lock the user's account if it keeps happening.
Never, ever trust the user, of course ;-)
Now, that being said, it is possible to rely on hidden fields with a very high degree of confidence for temporary storage/staging (although this can generally also be handled entirely on the server with the session): ASP.NET follows this model and it has proven to be very secure against tampering if used correctly. So what's the secret?
Hash validation, aka a MAC (Message Authentication Code). The ASP.NET MAC and its usage are discussed briefly in this article. In short, the MAC is a keyed hash of the form data (built using a server secret key, and perhaps a session secret too) which is embedded in the form as a hidden field. When the form is submitted, the MAC is recalculated from the data and compared with the original MAC. Because the secrets are known only to the server, it is not (realistically) possible for a client to generate a valid MAC from the data itself.
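In Ruby the MAC scheme just described can be sketched with the standard library's OpenSSL bindings. This is an illustration rather than a hardened implementation: SECRET is a placeholder value, and the helper names sign, secure_equal? and valid_mac? are hypothetical.

```ruby
require "openssl"

# Server-side secret (placeholder). It must never be sent to the client.
SECRET = "replace-with-a-long-random-server-secret"

# Computes the MAC to embed in the form next to the hidden field.
def sign(value)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SECRET, value.to_s)
end

# Constant-time comparison, so timing does not reveal how much of the MAC matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Recomputes the MAC from the submitted value and compares it with the submitted MAC.
def valid_mac?(value, mac)
  secure_equal?(sign(value), mac.to_s)
end

category_id = 7
mac = sign(category_id)     # rendered into the form as a second hidden field
valid_mac?(category_id, mac) # untouched value: accepted
valid_mac?(8, mac)           # tampered category_id: rejected
```

Rails also ships ActiveSupport::MessageVerifier, which packages the same sign-and-verify idea.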
However, I do not use RoR or know what modules, if any, may implement security like this. I do hope that someone can provide more insight (in their own answer ;-) if such solutions exist, because it is a very powerful construct and easily allows safe per-form data association and validation.
Happy coding.

Rails: Obfuscating Image URLs on Amazon S3? (security concern)

To make a long explanation short, suffice it to say that my Rails app allows users to upload images to the app that they will want to keep in the app (meaning, no hotlinking).
So I'm trying to come up with a way to obfuscate the image URLs so that the address of the image depends on whether or not that user is logged in to the site, so if anyone tried hotlinking to the image, they would get a 401 access denied error.
I was thinking that if I could route the request through a controller, I could re-use a lot of the authorization I've already built into my app, but I'm stuck there.
What I'd like is for my images to be accessible through a URL to one of my controllers, like:
http://railsapp.com/images/obfuscated?member_id=1234&pic_id=7890
If the user were to right-click on the image displayed on the website and select "Copy Address", then paste it in, it would be the SAME URL (as in, it wouldn't betray where the image is actually hosted).
The actual image would be living on a URL like this:
http://s3.amazonaws.com/s3username/assets/member_id/pic_id.extension
Is this possible to accomplish? Perhaps using Rails' render method? Or something else? I know it's possible for PHP to return the correct headers to make the browser think it's an image, but I don't know how to do this in Rails...
UPDATE: I want all users of the app to be able to view the images if and ONLY if they are currently logged on to the site. If the user does not have a currently active session on the site, accessing the images directly should yield a generic image, or an error message.
S3 allows you to construct query strings for requests which allow a time-limited download of an otherwise private object. You can generate the URL for the image uniquely for each user, with a short timeout to prevent reuse.
See the documentation, look for the section "Query String Request Authentication Alternative". I'd link directly, but the frame-busting javascript prevents it.
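The query string authentication mentioned above can be sketched with the standard library alone. This follows the legacy signature version 2 scheme that the referenced documentation section describes; newer regions accept only signature version 4, so in practice the AWS SDK's presigner is likely the better tool. The bucket, key and credentials are made up, signed_s3_url is a hypothetical helper, and the key is assumed to contain only URL-safe characters.

```ruby
require "openssl"
require "uri"

# Builds a time-limited GET URL for a private S3 object using the legacy
# query string authentication (signature version 2).
def signed_s3_url(bucket:, key:, access_key_id:, secret_access_key:, expires_at:)
  expires = expires_at.to_i
  # StringToSign: verb, Content-MD5, Content-Type, Expires, canonical resource.
  string_to_sign = "GET\n\n\n#{expires}\n/#{bucket}/#{key}"
  hmac = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"), secret_access_key, string_to_sign)
  # Base64-encode the HMAC, then URL-encode it for use in the query string.
  signature = URI.encode_www_form_component([hmac].pack("m0"))
  "https://#{bucket}.s3.amazonaws.com/#{key}" \
    "?AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{signature}"
end

url = signed_s3_url(
  bucket: "example-bucket",
  key: "assets/1234/7890.jpg",
  access_key_id: "AKIDEXAMPLE",
  secret_access_key: "not-a-real-secret",
  expires_at: Time.now + 300 # link stops working after five minutes
)
```

The URL would be generated per request for a logged-in user, so a copied link stops working once the expiry time passes.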
Should the images be available to only that user or do you want to make it available to a group of users (friends)?
In any case if you want to stop hotlinking you should not store the image files under DocumentRoot of your webserver.
If the former, you could store the image on the server as MD5(image_file_name_as_exposed_to_user + logged_in_username_from_cookie). When the user requests image_file_name_as_exposed_to_user, in your rails app, construct the image filename as previously mentioned and then open the file in rails app and write it out (after first setting Content-Type in response header appropriately). This is secure by design.
If the image could be shared with friends, then you should not incorporate username in constructed filename but rest of the advice should work.
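The filename construction described above might look like this in Ruby. It is a minimal sketch: stored_filename is a hypothetical helper and the sample names are invented.

```ruby
require "digest"

# Derives the on-disk name from the name shown to the user plus the
# logged-in username, as MD5(exposed name + username).
def stored_filename(exposed_name, username)
  Digest::MD5.hexdigest(exposed_name + username)
end

alice_file = stored_filename("holiday.jpg", "alice")
bob_file   = stored_filename("holiday.jpg", "bob") # different user, different file name
```

Anyone who knows both inputs can recompute the name, so this only helps if the files stay outside the document root and are served exclusively through the app, as described above.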
This is late in the day to be answering, but another option altogether would be to store the files in MongoDB's GridFS, served through a bit of Rack Middleware that requires auth to be passed. Pretty much as secure as you like, and the URLs don't even need obfuscation.
The other benefit of this is in the availability of the files and the future scalability of the system.
Thanks for your responses, but I'm still skeptical as to whether or not "timing out" the URL from Amazon is a very effective way to go.
I've updated my question above to be a little more clear about what I'm trying to do, and trying to prevent.
After some experimentation, I've come up with a way to do what I want in my Rails app, though this solution is not without downsides. Effectively, I construct my image_tag with a URL that points to a controller and takes a path parameter. That controller first tests whether or not the user is authorized to see the image, then fetches the content of the image in a separate request and stores it in an instance variable, which is then passed to a respond_to view to return the image, successfully obfuscating the actual image's URL (since that request is made separately).
Cons:
Adds to request time (I feel that the additional time it takes to do this double-request is acceptable considering the privacy this method gives me)
Adds some clutter to views and routes (a small amount, maybe a bit more than I'd like)
If the user is authorized, and tries to access the image directly, the image is downloaded immediately rather than displayed in the browser (anyone know how to fix this? Modify HTTP headers? Only seems to do this with the jpg, though...)
You have to make a separate view for each file format you intend to serve (two for me, jpg and png)
Are there any other cons or considerations I should be aware of with this method? So far what I've listed, I can live with...
(Refactoring welcome.)
application_controller.rb
class ApplicationController < ActionController::Base
  def obfuscate_image
    respond_to do |format|
      if current_user
        format.jpg { @obfuscated_image = fetch_url "http://s3.amazonaws.com/#{Settings.bucket}/#{params[:path]}" }
      else
        format.png { @obfuscated_image = fetch_url "#{root_url}/images/assets/profile/placeholder.png" }
      end
    end
  end
  protected
  # helps us fetch an image, obfuscated
  def fetch_url(url)
    r = Net::HTTP.get_response(URI.parse(url))
    if r.is_a? Net::HTTPSuccess
      r.body
    else
      nil
    end
  end
end
views/application/obfuscate_image.png.haml & views/application/obfuscate_image.jpg.haml
= @obfuscated_image
routes.rb
map.obfuscate_image 'obfuscate_image', :controller => 'application', :action => 'obfuscate_image'
config/environment.rb
Mime::Type.register "image/png", :png
Mime::Type.register "image/jpg", :jpg
Calling an obfuscated image
= image_tag "/obfuscate_image?path=#{@user.profile_pic.path}"
The problem you have is that as far as I know you need the images on S3 to be World-readable for them to be accessible. At some point in the process an HTTP GET is going to have to be performed to retrieve the image, which is going to expose the real URL to tools that can sniff HTTP, such as Firebug.
Incidentally, 37signals don't consider this to be a huge problem because if I view an image in my private Backpack account I can see the public S3 URL in the browser address bar. Your mileage may vary...
