Rails: dynamic robots.txt with erb


I'm trying to render a dynamic text file (robots.txt) in my Rails (3.0.10) app, but Rails keeps processing the request as HTML (according to the console).
match 'robots.txt' => 'sites#robots'
Controller:
class SitesController < ApplicationController
  respond_to :html, :js, :xml, :css, :txt
  def robots
    @site = Site.find_by_subdomain # blah blah
  end
end
app/views/sites/robots.txt.erb:
Sitemap: <%= @site.url %>/sitemap.xml
But when I visit http://www.example.com/robots.txt I get a blank page/source, and the log says:
Started GET "/robots.txt" for 127.0.0.1 at 2011-11-21 11:22:13 -0500
Processing by SitesController#robots as HTML
Site Load (0.4ms) SELECT `sites`.* FROM `sites` WHERE (`sites`.`subdomain` = 'blah') ORDER BY created_at DESC LIMIT 1
Completed 406 Not Acceptable in 828ms
Any idea what I'm doing wrong?
Note: I added this to config/initializers/mime_types.rb, because Rails was complaining that it didn't know the .txt MIME type:
Mime::Type.register_alias "text/plain", :txt
Note 2: I did remove the stock robots.txt from the public directory.

NOTE: This is a repost from coderwall.
Following some advice from an answer to a similar question on Stack Overflow, I currently use the following solution to render a dynamic robots.txt based on the request's host.
Routing
# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'
Controller
# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  # No layout
  layout false
  # Render a robots.txt file based on whether the request
  # is performed against a canonical url or not
  # Prevent robots from indexing content served via a CDN twice
  def index
    if canonical_host?
      render 'allow'
    else
      render 'disallow'
    end
  end
  private
  def canonical_host?
    request.host =~ /plugingeek\.com/
  end
end
Views
Based on the request.host we render one of two different .text.erb view files.
Allowing robots
# app/views/robots/allow.text.erb # Note the .text extension
# Allow robots to index the entire site except some specified routes
# rendered when site is visited with the default hostname
# http://www.robotstxt.org/
# ALLOW ROBOTS
User-agent: *
Disallow:
Banning spiders
# app/views/robots/disallow.text.erb # Note the .text extension
# Disallow robots to index any page on the site
# rendered when robot is visiting the site
# via the Cloudfront CDN URL
# to prevent duplicate indexing
# and search results referencing the Cloudfront URL
# DISALLOW ROBOTS
User-agent: *
Disallow: /
Specs
Testing the setup with RSpec and Capybara can be done quite easily, too.
# spec/features/robots_spec.rb
require 'spec_helper'
feature "Robots" do
  context "canonical host" do
    scenario "allow robots to index the site" do
      Capybara.app_host = 'http://www.plugingeek.com'
      visit '/robots.txt'
      Capybara.app_host = nil
      expect(page).to have_content('# ALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow:')
      expect(page).to have_no_content('Disallow: /')
    end
  end
  context "non-canonical host" do
    scenario "deny robots to index the site" do
      visit '/robots.txt'
      expect(page).to have_content('# DISALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow: /')
    end
  end
end
# This would be the resulting docs
# Robots
#   canonical host
#     allow robots to index the site
#   non-canonical host
#     deny robots to index the site
As a last step, you might need to remove the static public/robots.txt if it's still present.
I hope you find this useful. Feel free to comment, helping to improve this technique even further.
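One thing to watch in canonical_host? above: the regex is not anchored, so it also matches any host that merely contains plugingeek.com. Here is a minimal sketch of a stricter check in plain Ruby (no Rails). It assumes that only the bare domain and the www host count as canonical, and the CDN host name in the usage lines is made up.

```ruby
# Sketch only: plain Ruby, no Rails. Which hosts count as canonical is an
# assumption; adjust the pattern to the hosts you actually serve.
CANONICAL_HOST = /\A(?:www\.)?plugingeek\.com\z/

ALLOW_RULES    = "User-agent: *\nDisallow:\n"
DISALLOW_RULES = "User-agent: *\nDisallow: /\n"

# True only when the whole host name is the canonical domain.
def canonical_host?(host)
  CANONICAL_HOST.match?(host)
end

# Picks the robots.txt body for a given request host.
def robots_txt_for(host)
  canonical_host?(host) ? ALLOW_RULES : DISALLOW_RULES
end

puts robots_txt_for("www.plugingeek.com")
puts robots_txt_for("d1234.cloudfront.net") # hypothetical CDN host
```

In the controller the same pattern could be used with request.host. The two string constants only stand in for the allow and disallow views.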

One solution that works in Rails 3.2.3 (not sure about 3.0.10) is as follows:
1) Name your template file robots.text.erb # Emphasis on text vs. txt
2) Set up your route like this: match '/robots.:format' => 'sites#robots'
3) Leave your action as is (you can remove the respond_with in the controller)
def robots
  @site = Site.find_by_subdomain # blah blah
end
This solution also eliminates the need to explicitly specify txt.erb in the render call mentioned in the accepted answer.

For my Rails projects I usually have a separate controller for the robots.txt response:
class RobotsController < ApplicationController
  layout nil
  def index
    host = request.host
    if host == 'lawc.at' then #liveserver
      render 'allow.txt', :content_type => "text/plain"
    else #testserver
      render 'disallow.txt', :content_type => "text/plain"
    end
  end
end
Then I have views named disallow.txt.erb and allow.txt.erb.
And in my routes.rb I have
get "robots.txt" => 'robots#index'

I don't like the idea of requests for robots.txt reaching my Rails app.
If you are using Nginx or Apache as your reverse proxy, they can serve static files much faster than a request that has to reach Rails.
This is much cleaner, and I think it is faster too.
Try the following settings.
nginx.conf - for production
location /robots.txt {
alias /path-to-your-rails-public-directory/production-robots.txt;
}
nginx.conf - for stage
location /robots.txt {
alias /path-to-your-rails-public-directory/stage-robots.txt;
}

I think the problem is that if you define respond_to in your controller, you have to use respond_with in the action:
def robots
  @site = Site.find_by_subdomain # blah blah
  respond_with @site
end
Also, try explicitly specifying the .erb file to be rendered:
def robots
  @site = Site.find_by_subdomain # blah blah
  render 'sites/robots.txt.erb'
  respond_with @site
end

Related

Rendering a dynamic robots.txt with rails

I'm replacing the default robots.txt file in rails with a dynamic one where I can control what the bots see on my site.
I've deleted the public/robots.txt file. In my PagesController, I've defined a robots action:
def robots
respond_to :text
render 'pages/robots.txt.erb'
expires_in 6.hours, public: true
end
And in my routes
get '/robots.:format' => 'pages#robots'
I've created a robots.txt.erb file in the pages views directory to respond only when the site visited is the production site.
<% if Rails.env.production? %>
User-Agent: *
Allow: /
Disallow: /admin
Sitemap: http://www.example.com/sitemap
<% else %>
User-Agent: *
Disallow: /
<% end %>
When I went to the site and robots.txt path, I got the error
Template is Missing
It wasn't finding the robots file in the pages view directory. I had previously named this file robots.html.erb and then renamed it to robots.txt.erb. The error persisted. Finally I just removed the respond_to line so now the robots action in PagesController is just
def robots
render 'pages/robots.txt.erb'
expires_in 6.hours, public: true
end
This works when I go to the URL.
I'm just curious whether or not this is good practice and if I'm losing anything by removing the respond_to action.

You will need to start by creating a route like this
# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'
Now we have to create a controller called robots
# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
# No layout
layout false
# Render a robots.txt file based on whether the request
def index
...
end
end
I hope this helps.
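To try the template logic without booting Rails, the environment switch can be sketched with Ruby's standard ERB library. This is only an illustration: the production flag stands in for Rails.env.production?, and render_robots is a made-up helper name.

```ruby
require "erb"

# Sketch only: renders a robots.txt template with Ruby's standard ERB
# library. `production` stands in for Rails.env.production? here.
ROBOTS_TEMPLATE = <<~TEMPLATE
  <%- if production -%>
  User-Agent: *
  Allow: /
  Disallow: /admin
  Sitemap: http://www.example.com/sitemap
  <%- else -%>
  User-Agent: *
  Disallow: /
  <%- end -%>
TEMPLATE

# Hypothetical helper: returns the rendered robots.txt as a string.
def render_robots(production)
  # trim_mode "-" enables the <%- ... -%> tags, which strip their own line breaks.
  ERB.new(ROBOTS_TEMPLATE, trim_mode: "-").result_with_hash(production: production)
end

puts render_robots(true)
puts render_robots(false)
```

The trimming tags keep stray blank lines out of the served file, which makes the output easier to compare in tests.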

Rendering action based on (random) route name

I'm trying to make a controller action that renders a random route from a set of given route names, without a redirect.
I know about render controller: name, action: name, but rendering fails because it tries to find a template on its own instead of letting the target action determine the template.
Here is my code:
def random
  # create basic route names
  route_names = %w(root route1 route2)
  # get route path
  path = Rails.application.routes.url_helpers.send("#{route_names.sample}_path")
  # {controller: name, action: name, param: val}
  action_config = Rails.application.routes.recognize_path(path, {:method => :get})
  # doesn't work
  # fails with Missing template application/*action name*
  return render action_config
  # doesn't work
  # require 'open-uri'
  # render html: open("http://localhost:3000/#{path}") { |io| io.read }
  # doesn't work
  # require 'net/http'
  # require 'uri'
  # render html: Net::HTTP.get(URI.parse("http://localhost:3000/#{path}"))
  # doesn't work
  # ctrl = (action_config[:controller].camelize + "Controller").constantize.new
  # ctrl.request = request
  # ctrl.response = response
  # ctrl.send(action_config[:action])
  # works, but not for Derailed
  # redirect_to path
  # works, but not for Derailed, since the server doesn't parse the <iframe>
  # render html: "
  #   <iframe
  #     src='#{path}'
  #     width='100%'
  #     height='100%'
  #     style='overflow: visible; border: 0;'></iframe>
  #   <style>body {margin: 0;}</style>".html_safe
end
Could anyone make the render work properly?
Background
I'm trying to debug a memory leak in my Rails app. I'm using the Derailed gem, which retrieves a path from my app 10,000 times. Derailed only supports hitting a single path. So, to actually mimic site usage, I'm trying to implement an action that renders a random route from a set of given routes. Derailed allows me to use a real webserver like Puma, but that configuration doesn't follow redirects, so I need Rails to render without a redirect.

You can write middleware for this:
class RandomMiddleware
  def initialize(app)
    @app = app
  end
  def call(env)
    route_names = %w(/ /route1 /route2)
    if env['PATH_INFO'] == '/random'
      env['PATH_INFO'] = route_names.sample
    end
    @app.call(env)
  end
end
and then insert this middleware in the stack (in config/application.rb):
config.middleware.use RandomMiddleware
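To see the rewrite in isolation, here is a rough sketch of the same idea run against a stand-in app, so it needs neither Rails nor the rack gem. The class name, the route list, and the injectable random source are all assumptions made for the illustration.

```ruby
# Sketch only: a Rack-style middleware exercised with a stand-in app,
# so it runs without Rails or the rack gem. The route list is hypothetical.
class RandomPathMiddleware
  ROUTES = %w[/ /route1 /route2].freeze

  def initialize(app, random: Random.new)
    @app = app
    @random = random
  end

  def call(env)
    # Rewrite only /random; every other request passes through untouched.
    if env["PATH_INFO"] == "/random"
      env["PATH_INFO"] = ROUTES.sample(random: @random)
    end
    @app.call(env)
  end
end

# Stand-in for the Rails app: echoes the path it was asked to serve.
echo_app = ->(env) { [200, { "content-type" => "text/plain" }, [env["PATH_INFO"]]] }

stack = RandomPathMiddleware.new(echo_app)
_status, _headers, body = stack.call({ "PATH_INFO" => "/random" })
puts body.first
```

Because the path is swapped before the router sees it, the target controller runs its normal filters and picks its own template, with no redirect involved.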

You can try opening a new app session inside the controller, rendering the action there, and returning the result:
session = ActionDispatch::Integration::Session.new(Rails.application)
session.get '/'
render html: session.body

Rails: How do I create a custom 404 error page that uses the asset pipeline?

There are many solutions for creating customized error handling pages, but almost none for Rails 4:
Basic Rails 404 Error Page
Dynamic error pages in Rails
The standard answer of encouraging people to modify 404.html in /public doesn't work for me because I want to use the CSS theme that resides in the asset pipeline. Is there a way that html files can access those styles defined in the asset pipeline? If not, is there a way to create a custom error handler that has access to the pipeline?

For Rails 4.1 I prefer the answer that adds an asset type; however, I have not tried it. On Rails 4.0.8, these three references helped me:
Dynamic error pages is the second reference in the question. This worked just fine for me.
Custom error pages may have cribbed from the first reference, or the other way around, but goes the extra mile by adding some information about testing with Capybara.
I did not do the Capybara testing because I didn't want to change the test configuration; however, RSpec-Rails Request Specs clued me in to test these requests independently and see that they complete and return the correct content.
What follows is a nutshell description of what is taught by the three references:
Add the following setting to config/environments/production.rb
# Route exceptions to the application router vs. default
config.exceptions_app = self.routes
Edit the routing configuration, config/routes.rb to direct the error pages to an errors controller
# error pages
%w( 404 422 500 503 ).each do |code|
  get code, :to => "errors#show", :code => code
end
This routes the 404, 422, 500, and 503 page requests to the show action of the errors controller, with a code parameter that holds the status code.
Create the controller, app/controllers/errors_controller.rb. Here is the entire content:
class ErrorsController < ApplicationController
  def show
    status_code = params[:code] || 500
    flash.alert = "Status #{status_code}"
    render status_code.to_s, status: status_code
  end
end
My preference was to set a status message on flash.alert
Create the pages themselves. I use .erb. Here is app/views/errors/500.html.erb:
<p>Our apology. Your request caused an error.</p>
<%= render 'product_description' %>
So you see that you can render a partial. The page renders with all of the layout boilerplate from app/views/layouts/application.html.erb or any other layout boilerplate that you have configured. That includes the <div id='alert'><%= alert %></div> that displays the status message from the flash.
Tested with RSpec by adding a test file, spec/requests/errors_request_spec.rb. Here is abbreviated content of that file that shows a test of the 500 status page:
require 'rails_helper'
RSpec.describe "errors", :type => :request do
  it "displays the 500 page" do
    get "/500"
    assert_select 'div#alert', 'Status 500'
    assert_select 'div[itemtype]'
  end
end
The first assertion checks for the flash alert. The second assertion checks for the partial.
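For anyone wondering why routes named after status codes work: as far as I understand it, once config.exceptions_app is set, Rails passes the failed request to that Rack app with the path rewritten to the status code (for example /404). The following is a minimal plain-Ruby sketch of such an app, with no Rails involved. ErrorsApp and the message table are made up for the illustration.

```ruby
# Sketch only: a tiny Rack-style exceptions app, no Rails required.
# The messages are placeholders, not text from any real application.
ERROR_MESSAGES = {
  404 => "Page not found",
  422 => "Request could not be processed",
  500 => "Our apology. Your request caused an error."
}.freeze

ErrorsApp = lambda do |env|
  # The failed request is assumed to arrive with a path such as "/404".
  code = env["PATH_INFO"].delete_prefix("/").to_i
  # Anything unexpected falls back to a generic 500 response.
  code = 500 unless ERROR_MESSAGES.key?(code)
  [code, { "content-type" => "text/plain" }, [ERROR_MESSAGES[code]]]
end

status, _headers, body = ErrorsApp.call({ "PATH_INFO" => "/404" })
puts "#{status}: #{body.first}"
```

Pointing exceptions_app at self.routes, as above, gets the same dispatch while letting a normal controller render the page with the application layout and asset pipeline.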

We've made a gem which does this for you: exception_handler.
There is also a great tutorial here.
I also wrote an extensive answer on the subject here.
Middleware
# config/application.rb
config.exceptions_app = ->(env) { ExceptionController.action(:show).call(env) }
Controller
# app/controllers/exception_controller.rb
class ExceptionController < ApplicationController
  respond_to :json, :js, :html
  before_action :set_status
  def show
    respond_with @status
  end
  private
  # Reads the exception that Rails stored in the Rack env and derives the status from it
  def set_status
    @exception = env['action_dispatch.exception']
    @status = ActionDispatch::ExceptionWrapper.new(env, @exception).status_code
    @response = ActionDispatch::ExceptionWrapper.rescue_responses[@exception.class.name]
  end
end
View
# app/views/exception/show.html.erb
<h1>404 error</h1>
This is a very simple version. I can explain more if you wish.
Basically, you need to hook into the config.exceptions_app middleware. It captures any exception in the middleware stack (as opposed to rendering the entire environment), allowing you to send the request to your own controller#action.
If you comment, I'll help you out some more!

Recognizing route with regex?

Let's say that I have a postback url that comes in as
http://domain/merkin_postback.cgi?id=987654321&new=25&total=1000&uid=3040&oid=123
and other times as:
http://domain/merkin_postback.php?id=987654321&new=25&total=1000&uid=3040&oid=123
If my route definition is
map.purchase '/merkin_postback', :controller => 'credit_purchases', :action => 'create'
it barks that either of the two forms above is invalid.
Should I be using regex to recognize either of the two forms?

This isn't a routing issue, it's a content format issue. You should be using respond_to.
class CreditPurchasesController < ActionController::Base
  # This is a list of all possible formats this controller might expect
  # We need php and cgi, and I'm guessing html for your other methods
  respond_to :html, :php, :cgi
  def create
    # ...
    # Do some stuff
    # ...
    # This is how you can decide what to render based on the format
    respond_to do |format|
      # This means if the format is php or cgi, then do the render
      format.any(:php, :cgi) { render :something }
      # Note that if you only have one format for a particular render action, you can do:
      # format.php { render :something }
      # The "format.any" is only for multiple formats rendering the exact same thing, like your case
    end
  end
end
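For comparison, this is roughly what treating the suffix as a format means, written as a standalone plain-Ruby sketch. The pattern, the postback_format name, and the "html" default are assumptions for the illustration, not part of the Rails routing API.

```ruby
# Sketch only: matching a path with an optional format suffix in plain Ruby.
# The pattern and the "html" default are assumptions for illustration.
POSTBACK = %r{\A/merkin_postback(?:\.(?<format>cgi|php))?\z}

# Returns the format for a postback path, or nil when the path does not match.
def postback_format(path)
  match = POSTBACK.match(path)
  return nil unless match

  match[:format] || "html"
end

p postback_format("/merkin_postback.cgi")
p postback_format("/merkin_postback.php")
p postback_format("/merkin_postback.exe")
```

The query string is not part of the path, so id, uid and the other parameters are parsed separately and do not affect the match.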

switching rails controller

I have two separate models: nested sections and articles. A section has_many articles.
Both have path attribute like aaa/bbb/ccc, for example:
movies # section
movies/popular # section
movies/popular/matrix # article
movies/popular/matrix-reloaded # article
...
movies/ratings # article
about # article
...
In routes I have:
map.path '*path', :controller => 'path', :action => 'show'
How can I create a show action like this?
def show
  if section = Section.find_by_path!(params[:path])
    # run SectionsController, :show
  elsif article = Article.find_by_path!(params[:path])
    # run ArticlesController, :show
  else
    raise ActiveRecord::RecordNotFound
  end
end

You should use Rack middleware to intercept the request and then rewrite the URL for your proper Rails application. This way, your routes file remains very simple.
map.resources :sections
map.resources :articles
In the middleware you look up the entity associated with the path and remap the url to the simple internal url, allowing Rails routing to dispatch to the correct controller and invoking the filter chain normally.
Update
Here's a simple walkthrough of adding this kind of functionality using a Rails Metal component and the code you provided. I suggest you look at simplifying how path segments are looked up since you're duplicating a lot of database-work with the current code.
$ script/generate metal path_rewriter
create app/metal
create app/metal/path_rewriter.rb
path_rewriter.rb
# Allow the metal piece to run in isolation
require(File.dirname(__FILE__) + "/../../config/environment") unless defined?(Rails)
class PathRewriter
  def self.call(env)
    path = env["PATH_INFO"]
    new_path = path
    if article = Article.find_by_path(path)
      new_path = "/articles/#{article.id}"
    elsif section = Section.find_by_path(path)
      new_path = "/sections/#{section.id}"
    end
    # Rewrite every path-related key so Rails routes on the new path
    env["REQUEST_PATH"] =
      env["REQUEST_URI"] =
      env["PATH_INFO"] = new_path
    # A 404 from a Metal piece passes the request on to the rest of the Rails stack
    [404, {"Content-Type" => "text/html"}, [ ]]
  end
end
For a good intro to using Metal and Rack in general, check out Ryan Bates' Railscast episode on Metal, and episode on Rack.

Rather than instantiating the other controllers I would just render a different template from PathController's show action depending on if the path matches a section or an article. i.e.
def show
  # Use find_by_path (no bang) so a miss returns nil and the next branch can run
  if @section = Section.find_by_path(params[:path])
    render :template => 'section/show'
  elsif @article = Article.find_by_path(params[:path])
    render :template => 'article/show'
  else
    # raise exception
  end
end
The reason is that, whilst you could create instances of one controller within another, it wouldn't work the way you'd want. The second controller wouldn't have access to your params, session, etc., and the calling controller wouldn't have access to instance variables and render requests made in the second controller.
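The lookup order in show can be sketched without a database. In the following, two hashes stand in for Section.find_by_path and Article.find_by_path; the resolve helper and the NotFound class are made up for the illustration, and the paths come from the question's example.

```ruby
# Sketch only: in-memory stand-ins for Section.find_by_path and
# Article.find_by_path; the paths come from the question's example.
SECTIONS = { "movies" => :movies, "movies/popular" => :popular }.freeze
ARTICLES = { "movies/popular/matrix" => :matrix, "about" => :about }.freeze

# Hypothetical error class, standing in for ActiveRecord::RecordNotFound.
class NotFound < StandardError; end

# Returns the template to render plus the record that was found.
def resolve(path)
  if (section = SECTIONS[path])
    ["section/show", section]
  elsif (article = ARTICLES[path])
    ["article/show", article]
  else
    raise NotFound, "no section or article at #{path.inspect}"
  end
end

p resolve("movies/popular")
p resolve("about")
```

A hash lookup returns nil on a miss, which is what lets the second branch run. A finder that raises on a miss would end the action before articles are ever checked.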
