Rendering a dynamic robots.txt with Rails

I'm replacing the default robots.txt file in rails with a dynamic one where I can control what the bots see on my site.
I've deleted the public/robots.txt file. In my PagesController, I've defined a robots action:
def robots
  respond_to :text
  render 'pages/robots.txt.erb'
  expires_in 6.hours, public: true
end
And in my routes
get '/robots.:format' => 'pages#robots'
I've created a robots.txt.erb file in the pages views directory that allows crawling only when the site visited is the production site.
<% if Rails.env.production? %>
User-Agent: *
Allow: /
Disallow: /admin
Sitemap: http://www.example.com/sitemap
<% else %>
User-Agent: *
Disallow: /
<% end %>
When I visited the site's robots.txt path, I got the error
Template is Missing
It wasn't finding the robots file in the pages view directory. I had previously named this file robots.html.erb and then renamed it to robots.txt.erb. The error persisted. Finally I just removed the respond_to line so now the robots action in PagesController is just
def robots
  render 'pages/robots.txt.erb'
  expires_in 6.hours, public: true
end
This works when I go to the URL.
I'm just curious whether or not this is good practice and if I'm losing anything by removing the respond_to call.
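For comparison, here is a version that keeps content negotiation; this is an untested sketch that assumes the same template at app/views/pages/robots.txt.erb:

def robots
  # Cache headers can be set anywhere in the action; expires_in only
  # touches the response headers, not the body.
  expires_in 6.hours, public: true
  respond_to do |format|
    format.text { render 'pages/robots.txt.erb' }
  end
end

With the block form, a request whose format can't be negotiated gets a 406 instead of silently rendering the template.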

You will need to start by creating a route like this
# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'
Now we have to create a controller called robots
# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  # No layout
  layout false

  # Render a robots.txt file based on whether the request
  # is performed against a canonical url or not
  def index
    ...
  end
end
I hope that this helps.

Related

Ruby on Rails Image Assets not found

On my application.html.erb page I have my navbar and footer so that they are shared across all pages. I am referencing images on this page like so:
src="assets/img-1.jpg"
and they appear but only on my index page "localhost:3000".
When I navigate to these pages:
localhost:3000/pages/contact
localhost:3000/pages/home "index page"
localhost:3000/pages/portfolio
localhost:3000/pages/schedule
None of the images are being found. Here is my route setup:
Rails.application.routes.draw do
  root 'pages#home'
  get "/pages/:page" => "pages#show"
end
Here is my pages controller:
class PagesController < ApplicationController
  def show
    if valid_page?
      render template: "pages/#{params[:page]}"
    else
      render file: "public/404.html", status: :not_found
    end
  end

  private

  def valid_page?
    File.exist?(Pathname.new(Rails.root + "app/views/pages/#{params[:page]}.html.erb"))
  end
end
Again: images referenced as src="assets/img-1.jpg" on my application.html.erb are being found, but only when I am on the index page "localhost:3000/". What am I doing wrong?
You must use the Rails asset helpers to reference your image:
= image_tag 'img-1.jpg'
And please have a look here, it will save you many a headache.
You are missing a /.
It will be src="/assets/img-1.jpg".
For production:
Make sure the image is in the /public directory, or use asset_path. Otherwise you won't see it after you precompile the assets for production.
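To make that concrete, here is roughly what the helper-based markup could look like in application.html.erb; a sketch, assuming the image lives at app/assets/images/img-1.jpg:

<%# Both helpers resolve the asset to a root-relative (and, in production,
    fingerprinted) path, so the image is found at any URL depth. %>
<%= image_tag 'img-1.jpg' %>
<img src="<%= asset_path 'img-1.jpg' %>">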

Rails: How do I create a custom 404 error page that uses the asset pipeline?

There are many solutions for creating customized error handling pages, but almost none for Rails 4:
Basic Rails 404 Error Page
Dynamic error pages in Rails
The standard answer of encouraging people to modify 404.html in /public doesn't work for me because I want to use the CSS theme that resides in the asset pipeline. Is there a way that html files can access those styles defined in the asset pipeline? If not, is there a way to create a custom error handler that has access to the pipeline?
For Rails 4.1 I like this answer, which adds an asset type, better; however, I have not tried it. On Rails 4.0.8, these three references helped me:
Dynamic error pages is the second reference in the question. This worked just fine for me.
Custom error pages may have cribbed from the first reference, or the other way around, but goes the extra mile by adding some information about testing with Capybara.
I did not do the Capybara testing because I didn't want to change the test configuration; however, RSpec-Rails Request Specs clued me in to test these requests independently and see that they complete and return the correct content.
What follows is a nutshell description of what is taught by the three references:
Add the following setting to config/environments/production.rb
# Route exceptions to the application router vs. default
config.exceptions_app = self.routes
Edit the routing configuration, config/routes.rb, to direct the error pages to an errors controller:
# error pages
%w( 404 422 500 503 ).each do |code|
  get code, :to => "errors#show", :code => code
end
This will route the 404, 422, 500, and 503 page requests to the show action of the errors controller, with a parameter code that has the value of the status code.
Create the controller, app/controllers/errors_controller.rb. Here is the entire content:
class ErrorsController < ApplicationController
  def show
    status_code = params[:code] || 500
    flash.alert = "Status #{status_code}"
    render status_code.to_s, status: status_code
  end
end
My preference was to set a status message on flash.alert.
Create the pages themselves. I use .erb. Here is app/views/errors/500.html.erb:
<p>Our apology. Your request caused an error.</p>
<%= render 'product_description' %>
So you see that you can render a partial. The page renders with all of the layout boilerplate from app/views/layouts/application.html.erb or any other layout boilerplate that you have configured. That includes the <div id='alert'><%= alert %></div> that displays the status message from the flash.
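For context, the relevant layout fragment described above might look like this; a sketch of app/views/layouts/application.html.erb, not the asker's actual file:

<%# Displays the flash message set by ErrorsController#show %>
<div id='alert'><%= alert %></div>
<%= yield %>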
Tested with RSpec by adding a test file, spec/requests/errors_request_spec.rb. Here is abbreviated content of that file that shows a test of the 500 status page:
require 'rails_helper'

RSpec.describe "errors", :type => :request do
  it "displays the 500 page" do
    get "/500"
    assert_select 'div#alert', 'Status 500'
    assert_select 'div[itemtype]'
  end
end
The first assertion checks for the flash alert. The second assertion checks for the partial.
We've made a gem which does this for you: exception_handler.
There is also a great tutorial here.
I also wrote an extensive answer on the subject here.
Middleware
# config/application.rb
config.exceptions_app = ->(env) { ExceptionController.action(:show).call(env) }
Controller
# app/controllers/exception_controller.rb
class ExceptionController < ApplicationController
  respond_to :json, :js, :html
  before_action :set_status

  def show
    respond_with @status
  end

  private

  def set_status
    @exception = env['action_dispatch.exception']
    @status    = ActionDispatch::ExceptionWrapper.new(env, @exception).status_code
    @response  = ActionDispatch::ExceptionWrapper.rescue_responses[@exception.class.name]
  end
end
View
# app/views/exception/show.html.erb
<h1>404 error</h1>
This is a very simple version; I can explain more if you wish.
Basically, you need to hook into the config.exceptions_app middleware; it will capture any exception in the middleware stack (as opposed to rendering the entire environment), allowing you to send the request to your own controller#action.
If you comment, I'll help you out some more if you want!

Caching: wrong paths and pages are not expired

I've got a problem with simple caching (ruby 1.9.2, rails 3.1.3, development environment):
development.rb:
config.action_controller.perform_caching = true
config.action_controller.cache_store = :file_store, 'tmp/cache'
config.action_controller.page_cache_directory = 'public/cache'
sweeper:
class CacheSweeper < ActionController::Caching::Sweeper
  observe Article, Photo, Advertisement

  def after_save(record)
    expire_home
  end
  ...

  private
  ...

  def expire_home
    expire_page(:controller => '/homes', :action => 'index')
  end
end
controllers:
class HomeController < ApplicationController
  caches_page :index
  cache_sweeper :cache_sweeper

  def index
    ....
Pages are cached in the right directory, and actions trigger the sweeper as they should, but pages are not expired and the server is trying to get cached pages from the default place.
cache: [GET /] miss
Any ideas why? Is there something wrong with my configuration?
You have the wrong controller name and a spurious leading slash. Try the following:
def expire_home
  expire_page(:controller => 'home', :action => 'index')
end
expire_page also accepts the path of the route, so for example for the root URL of a cached page you could do
expire_page "/"
Also, to get your web server to look in the right place, you need to configure a rewrite rule in Apache or nginx that checks the cache directory.
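For example, with nginx that rewrite could be a try_files rule along these lines; a sketch that assumes nginx's root points at the Rails public/ directory, page_cache_directory is 'public/cache', and @rails is a hypothetical upstream:

location / {
  # Serve a cached page if one exists, otherwise fall through to Rails
  try_files /cache$uri/index.html /cache$uri.html $uri @rails;
}

location @rails {
  proxy_pass http://127.0.0.1:3000; # your app server
}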

Rails: dynamic robots.txt with erb

I'm trying to render a dynamic text file (robots.txt) in my Rails (3.0.10) app, but it continues to render it as HTML (says the console).
Route:
match 'robots.txt' => 'sites#robots'
Controller:
class SitesController < ApplicationController
  respond_to :html, :js, :xml, :css, :txt

  def robots
    @site = Site.find_by_subdomain # blah blah
  end
end
end
app/views/sites/robots.txt.erb:
Sitemap: <%= @site.url %>/sitemap.xml
But when I visit http://www.example.com/robots.txt I get a blank page/source, and the log says:
Started GET "/robots.txt" for 127.0.0.1 at 2011-11-21 11:22:13 -0500
Processing by SitesController#robots as HTML
Site Load (0.4ms) SELECT `sites`.* FROM `sites` WHERE (`sites`.`subdomain` = 'blah') ORDER BY created_at DESC LIMIT 1
Completed 406 Not Acceptable in 828ms
Any idea what I'm doing wrong?
Note: I added this to config/initializers/mime_types.rb, because Rails was complaining about not knowing what the .txt mime type was:
Mime::Type.register_alias "text/plain", :txt
Note 2: I did remove the stock robots.txt from the public directory.
NOTE: This is a repost from coderwall.
Following up on some advice from a similar answer on Stack Overflow, I currently use the following solution to render a dynamic robots.txt based on the request's host parameter.
Routing
# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'
Controller
# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  # No layout
  layout false

  # Render a robots.txt file based on whether the request
  # is performed against a canonical url or not
  # Prevent robots from indexing content served via a CDN twice
  def index
    if canonical_host?
      render 'allow'
    else
      render 'disallow'
    end
  end

  private

  def canonical_host?
    request.host =~ /plugingeek\.com/
  end
end
Views
Based on the request.host we render one of two different .text.erb view files.
Allowing robots
# app/views/robots/allow.text.erb # Note the .text extension
# Allow robots to index the entire site except some specified routes
# rendered when site is visited with the default hostname
# http://www.robotstxt.org/
# ALLOW ROBOTS
User-agent: *
Disallow:
Banning spiders
# app/views/robots/disallow.text.erb # Note the .text extension
# Disallow robots to index any page on the site
# rendered when robot is visiting the site
# via the Cloudfront CDN URL
# to prevent duplicate indexing
# and search results referencing the Cloudfront URL
# DISALLOW ROBOTS
User-agent: *
Disallow: /
Specs
Testing the setup with RSpec and Capybara can be done quite easily, too.
# spec/features/robots_spec.rb
require 'spec_helper'

feature "Robots" do
  context "canonical host" do
    scenario "allow robots to index the site" do
      Capybara.app_host = 'http://www.plugingeek.com'
      visit '/robots.txt'
      Capybara.app_host = nil

      expect(page).to have_content('# ALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow:')
      expect(page).to have_no_content('Disallow: /')
    end
  end

  context "non-canonical host" do
    scenario "deny robots to index the site" do
      visit '/robots.txt'

      expect(page).to have_content('# DISALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow: /')
    end
  end
end

# This would be the resulting docs
# Robots
#   canonical host
#     allow robots to index the site
#   non-canonical host
#     deny robots to index the site
As a last step, you might need to remove the static robots.txt from the public folder if it's still present.
I hope you find this useful. Feel free to comment, helping to improve this technique even further.
One solution that works in Rails 3.2.3 (not sure about 3.0.10) is as follows:
1) Name your template file robots.text.erb # Emphasis on text vs. txt
2) Setup your route like this: match '/robots.:format' => 'sites#robots'
3) Leave your action as is (you can remove the respond_with in the controller)
def robots
  @site = Site.find_by_subdomain # blah blah
end
This solution also eliminates the need to explicitly specify txt.erb in the render call mentioned in the accepted answer.
For my Rails projects I usually have a separate controller for the robots.txt response:
class RobotsController < ApplicationController
  layout nil

  def index
    host = request.host
    if host == 'lawc.at' # live server
      render 'allow.txt', :content_type => "text/plain"
    else # test server
      render 'disallow.txt', :content_type => "text/plain"
    end
  end
end
Then I have views named disallow.txt.erb and allow.txt.erb.
And in my routes.rb I have
get "robots.txt" => 'robots#index'
I don't like the idea of a robots.txt request reaching my Rails app at all.
If you are using Nginx/Apache as your reverse proxy, static files are much faster for them to handle than a request that reaches Rails itself.
This is much cleaner, and I think it is faster too.
Try using the following setting.
nginx.conf - for production
location /robots.txt {
  alias /path-to-your-rails-public-directory/production-robots.txt;
}
nginx.conf - for stage
location /robots.txt {
  alias /path-to-your-rails-public-directory/stage-robots.txt;
}
I think the problem is that if you define respond_to in your controller, you have to use respond_with in the action:
def robots
  @site = Site.find_by_subdomain # blah blah
  respond_with @site
end
Also, try explicitly specifying the .erb file to be rendered:
def robots
  @site = Site.find_by_subdomain # blah blah
  render 'sites/robots.txt.erb'
  respond_with @site
end

Multiple robots.txt for subdomains in rails

I have a site with multiple subdomains and I want the named subdomains' robots.txt to be different from the www one.
I tried to use .htaccess, but the FastCGI doesn't look at it.
So I was trying to set up routes, but it doesn't seem that you can do a direct rewrite, since every route needs a controller:
map.connect '/robots.txt', :controller => ?, :path => '/robots.www.txt', :conditions => { :subdomain => 'www' }
map.connect '/robots.txt', :controller => ?, :path => '/robots.club.txt'
What would be the best way to approach this problem?
(I am using the request_routing plugin for subdomains)
Actually, you probably want to set a mime type in mime_types.rb and do it in a respond_to block so it doesn't return it as 'text/html':
Mime::Type.register "text/plain", :txt
Then, your routes would look like this:
map.robots '/robots.txt', :controller => 'robots', :action => 'robots'
For Rails 3:
match '/robots.txt' => 'robots#robots'
and the controller something like this (put the file(s) where ever you like):
class RobotsController < ApplicationController
  def robots
    subdomain = # get subdomain, escape
    robots = File.read(RAILS_ROOT + "/config/robots.#{subdomain}.txt")
    respond_to do |format|
      format.txt { render :text => robots, :layout => false }
    end
  end
end
At the risk of overengineering it, I might even be tempted to cache the file read operation...
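For instance, the read could be wrapped in Rails.cache.fetch; a sketch using the generic cache API, where robots_for is a hypothetical helper and the one-hour TTL is arbitrary:

def robots_for(subdomain)
  # Re-reads the file at most once an hour per subdomain
  Rails.cache.fetch("robots/#{subdomain}", :expires_in => 1.hour) do
    File.read(RAILS_ROOT + "/config/robots.#{subdomain}.txt")
  end
end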
Oh, yeah, you'll almost certainly have to remove/move the existing 'public/robots.txt' file.
Astute readers will notice that you can easily substitute RAILS_ENV for subdomain...
Why not use Rails' built-in views?
In your controller add this method:
class StaticPagesController < ApplicationController
  def robots
    render :layout => false, :content_type => "text/plain", :formats => :txt
  end
end
In the views, create a file app/views/static_pages/robots.txt.erb with the robots.txt content.
In routes.rb place:
get '/robots.txt' => 'static_pages#robots'
Delete the file /public/robots.txt
You can add specific business logic as needed, but this way we don't read any custom files.
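As an example of such business logic, the ERB view can branch on the request; a sketch with a made-up host name:

<%# app/views/static_pages/robots.txt.erb %>
<% if request.host == 'www.example.com' %>
User-agent: *
Disallow: /admin
<% else %>
User-agent: *
Disallow: /
<% end %>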
As of Rails 6.0 this has been greatly simplified.
By default, if you use the :plain option, the text is rendered without
using the current layout. If you want Rails to put the text into the
current layout, you need to add the layout: true option and use the
.text.erb extension for the layout file. Source
class RobotsController < ApplicationController
  def robots
    subdomain = request.subdomain # Whatever logic you need
    robots = File.read("#{Rails.root}/config/robots.#{subdomain}.txt")
    render plain: robots
  end
end
In routes.rb
get '/robots.txt', to: 'robots#robots'
For Rails 3:
Create a controller RobotsController:
class RobotsController < ApplicationController
  # This controller will render the correct 'robots' view depending on your subdomain.
  def robots
    subdomain = request.subdomain # you should also check for emptiness
    render "robots.#{subdomain}"
  end
end
Create robots views (1 per subdomain):
views/robots/robots.subdomain1.txt
views/robots/robots.subdomain2.txt
etc...
Add a new route in config/routes.rb: (note the :txt format option)
match '/robots.txt' => 'robots#robots', :format => :txt
And of course, you should declare the :txt format in config/initializers/mime_types.rb:
Mime::Type.register "text/plain", :txt
Hope it helps.
If you can't configure your HTTP server to do this before the request is sent to Rails, I would just set up a 'robots' controller that renders a template like:
def show_robot
  subdomain = # get subdomain, escape
  render :text => open("robots.#{subdomain}.txt").read, :layout => false
end
Depending on what you're trying to accomplish you could also use a single template instead of a bunch of different files.
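For instance, a single template driven by an instance variable could replace the per-subdomain files; a sketch in which @allow_indexing is a hypothetical flag set in the controller action:

<%# app/views/robots/show_robot.txt.erb %>
User-agent: *
<% if @allow_indexing %>
Disallow:
<% else %>
Disallow: /
<% end %>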
I liked TA Tyree's solution, but it is very Rails 2.x centric, so here is what I came up with for Rails 3.1.x.
mime_types.rb
Mime::Type.register "text/plain", :txt
By adding the format in the routes you don't have to worry about using a respond_to block in the controller.
routes.rb
match '/robots.txt' => 'robots#robots', :format => "text"
I added a little something extra on this one. The SEO people were complaining about duplicated content both on subdomains and on SSL pages, so I created two robots files: one for production and one for non-production, the latter of which is also served for any SSL/HTTPS requests in production.
robots_controller.rb
class RobotsController < ApplicationController
  def robots
    site = request.host
    protocol = request.protocol
    domain = (site.eql?("mysite.com") || site.eql?("www.mysite.com")) && protocol.eql?("http://") ? "production" : "nonproduction"
    robots = File.read("#{Rails.root}/config/robots-#{domain}.txt")
    render :text => robots, :layout => false
  end
end
