About a year ago I merged three websites (oldsite.com, oldsite.nu, newsite.se) into one, which I kept on one of the domains (newsite.se). I am not sure this has been done right, since I still see a lot of traffic from Google for old URLs, even after a year.
Oldsite redirect code
Important edit note: I recently realized the nameservers were no longer pointing to my old Rails app but instead to a PHP folder on my web host, in which I have a .htaccess with the following code:
RewriteEngine on
RewriteRule ^robots\.txt$ - [L]
RewriteRule ^sitemap\.xml$ - [L]
RewriteRule ^(.*)$ http://www.newsite.se/$1 [R=301,L]
This makes the section below (regarding oldsite.com/oldsite.nu) obsolete:
The .com and .nu sites were built in Ruby on Rails and hosted on Heroku.
The logic for redirecting paths from oldsite.com/oldsite.nu was
implemented entirely on the newsite.se site. The redirection code on the
old sites is a straightforward redirect, with this on the first row
of routes.rb on oldsite.com:
match "/(*path)" => redirect {|params, req| "http://www.newsite.se/#{params[:path]}"}, via: [:get, :post]
I used this (Swedish) tool to verify that the redirect actually returns a 301: http://301redirect.se . It confirmed the redirects were 301s.
Newsite.se redirection handler
The content on each old website was matched with the same content on the new one, though rarely on the same path, e.g.
oldsite.com/categories/vacation/item/1243
could lead to
newsite.se/product-items/1243
I handle these types of redirections mostly in an internal redirection controller that catches and redirects any traffic on newsite.se like:
newsite.se/categories/vacation/item/1243 -> newsite.se/product-items/1243
using this at the bottom on my newsite.se routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
This works fine.
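To illustrate the kind of mapping that controller performs (the rule table below is invented for the example; the real logic is more involved):

```ruby
# Sketch of a path-mapping rule table. The patterns and targets here
# are made up for illustration; the actual controller holds more rules.
OLD_PATH_RULES = [
  # old-path regex                       -> builder for the new path
  [%r{\A/categories/[^/]+/item/(\d+)\z}, ->(m) { "/product-items/#{m[1]}" }],
  [%r{\A/contact\z},                     ->(_) { "/contact-us" }]
]

# Returns the new path for a known old path, or "/" as a fallback.
def map_old_path(path)
  OLD_PATH_RULES.each do |pattern, builder|
    if (m = pattern.match(path))
      return builder.call(m)
    end
  end
  "/"
end
```

The controller action would then just 301 to `map_old_path(request.path)`.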
Edit 20151223: The reason I use Newsite.se to handle the redirects is because it holds all the logic of where to redirect the paths. This is virtually impossible for Oldsite.com/.nu to know.
Actions taken
Besides redirecting with 301 (which, as far as I understand, I do), I have also used Google Webmaster Tools to submit a "Change of address" request from my two old websites to my new one. I can no longer find any information about this, but I am fairly sure I got a positive response from WMT that it had been done (though I am not 100% sure).
The problem indications
I am not 100% sure something is wrong, but I have seen indications that make me believe the redirection is not done properly, so that Google does not really realize the websites have moved.
In Google Webmaster Tools, under "Incoming links", the top linking domain is herokuapp.com, which in turn means oldsite.com. I.e. the 301 redirects seem to be interpreted as links (and not as redirects).
I often get new reports in Google WMT about "Not found/404" errors (I don't know what this section is called in the English version) for URLs that could not be reached on newsite.se. When I check the source of those URLs I often see links from e.g. oldsite.nu/oldpath/productitem/1234 - as if someone (Google?) had still accessed that old URL. An important point is that I did NOT have that many links to the old sites, so I don't expect these to be old links still feeding traffic.
I still get traffic to many of my old paths (from oldsite.com/oldsite.nu). I notice this through my redirection controller, which handles plenty of requests on old paths every day.
The website has lost a lot of positions in Google's SERPs; this is only a weak indication, though, since there could be numerous reasons for it.
Solving the problem
How should I go about troubleshooting this problem?
Is it normal for WMT to consider 301's as links?
Is there a smarter way to handle the redirection from oldsite.com than my routes.rb-match line?
I had to do a similar move for a client: migrating a large e-commerce website, which required transitioning all old traffic to the new website and redirecting the appropriate products to the new paths.
In order to transition everything without losing Google ranking, we had to implement 301 redirects, as you mention above. WMT seems to rely on you to handle this yourself rather than offering it as a supported function anymore.
Approach
You should redirect every URL on your old domain to the corresponding
new URL. That is the documented & recommended way of changing your
domain according to Google.
The best approach is to handle the redirects in a controller that contains the logic to send each request to the actual page with a single 301, with no further redirects once it lands on the new website.
I would suggest the following:
routes.rb (oldsite.com/oldsite.nu)
Match the request and send it to a controller to handle the finer logic and 301.
match "/(*path)", to: 'redirector#catch_all', via: [:get, :post]
RedirectorController (oldsite.com/oldsite.nu)
def catch_all
  # Split the path into its components.
  # Using the vacation -> product-items example from the question.
  options = params[:path].split('/')
  options.reject! { |e| e.to_s.empty? } # Remove empty segments, if any

  if options.empty?
    redirect_to "http://www.newsite.se/", notice: 'That page no longer exists, please browse our new site', status: 301
    return
  elsif options[0] == 'categories'
    redirect_to "http://www.newsite.se/product-items/#{options.last}", notice: 'Product updated! We have a new site.', status: 301
    return # the return is a MUST once there are multiple redirect_to's
  elsif options[0] == 'contact'
    redirect_to "http://www.newsite.se/contact-us", notice: 'We moved, contact us here.', status: 301
    return
  else
    more_sorting(options)
  end
end
private

def more_sorting(options)
  # Branch on the deeper path segments; the ..... are placeholders
  # for your actual patterns and target URLs.
  case
  when options[2] == .....
    redirect_to ....., notice: '', status: 301
  when options[3] == .....
    redirect_to ....., notice: '', status: 301
  when options[4] == .....
    redirect_to ....., notice: '', status: 301
  end
end
Why do it this way:
This will allow search engine robots and users to still crawl and visit each page and link, and get redirected to the specific page it is associated with on the new website.
Further, it handles the 301 redirect on this server and does not result in another redirect on the new server - something you may be penalized for, both in user experience and in how robots interpret your attempts to unite the sites. (This will also most likely remove the interpretation of 301s as links.)
If you need more complex routing you can add (as I had to do) private methods in the RedirectorController for more in-depth analysis of the parameters, as in the last else branch of the if/else above.
Clarification?
Let me know if you have any other questions and if this helped.
Related
I'm currently working on a solution for doing redirects in Rails, because I got an error in the Brakeman report saying that I have to fix my redirects properly.
I understand what the message says and how to solve it within one controller action.
But now I have the following: in the new action I read the HTTP_REFERER header, so it can be used in the create action.
This is giving me a Brakeman warning which can be found on the following link
Suppose I have the following controller with multiple endpoints:
def new
  @my_model_set = MyModel.new
  @referer = request.env['HTTP_REFERER'] # We want to redirect to this referer after a create
end
def create
  ...
  if @my_model_set.save
    flash_message :success, t('notification.item_created', type: @my_model_set.model_name.human)
    if params[:referer].present?
      redirect_to params[:referer]
    else
      redirect_to admin_my_model_set_path
    end
  else
    ...
  end
end
I already tried to fix this by using Rails' redirect_back method, but that uses the referer of the create request, which I don't want.
if @my_model_set.save
  flash_message :success, t('notification.item_created', type: @my_model_set.model_name.human)
  redirect_back(fallback_location: admin_my_model_set_path)
else
  ...
end
The main problem in your code is that params[:referer] can be set by your user (or an attacker forging a link for your user) to an arbitrary value by appending ?referer=https://malicious.site to the url. You will then redirect to that, which is an open redirect vulnerability.
You could also argue that the referer header is technically user input, and you will be redirecting to it, but I would say in most cases and modern browsers that would probably be an acceptable risk, because an attacker does not really have a way to exploit it (but it might depend on the exact circumstances).
One solution that immediately comes to mind for similar cases would be the session - but on the one hand this is a rest api if I understand correctly, so there is no session, and on the other hand, it would still not be secure against an attacker linking to your #new endpoint from a malicious domain.
I think you should validate the domain before you redirect to it. If there is a common pattern (like for example if all of these are subdomains of yourdomain.com), validate for that. Or you could have your users register their domains first before you redirect to it (see how OAuth2 works for example, you have to register your app domain first before the user can get redirected there with a token).
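A minimal sketch of such a validation, assuming a hardcoded allowlist of hosts (the host names and helper name here are placeholders, not an existing API):

```ruby
require 'uri'

# Hosts we are willing to redirect to - placeholder values.
ALLOWED_REDIRECT_HOSTS = ['yourdomain.com', 'www.yourdomain.com'].freeze

# Returns the target only if it is an http(s) URL whose host is on the
# allowlist; otherwise returns the safe fallback. Note this also rejects
# relative paths and javascript:/data: schemes.
def safe_redirect_target(target, fallback)
  uri = URI.parse(target.to_s)
  return fallback unless uri.is_a?(URI::HTTP)
  ALLOWED_REDIRECT_HOSTS.include?(uri.host) ? target : fallback
rescue URI::InvalidURIError
  fallback
end
```

In the create action that would become something like `redirect_to safe_redirect_target(params[:referer], admin_my_model_set_path)`.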
If your user might just come from anywhere to #new and you want to send them back wherever they came from - that I think is not a good requirement, you should probably not do that, or you should carefully assess the risk and consciously accept it if you want to for some reason. In most cases there is a more secure solution.
I came across something curious and was wondering how to fix it within Rails.
In my app, I have a Country model; it already contains records for all countries I'll ever need. However, many of them don't contain any data or are otherwise not yet relevant. These countries yield a 404 error, as they would if the record didn't exist in the first place:
begin
  @country = Country.friendly.find(params[:id])
rescue ActiveRecord::RecordNotFound
  @country = nil
end

if !@country.nil? && Country.with_data.include?(@country)
  # render the view
else
  render :file => "#{Rails.root}/public/404", :status => :not_found
end
So assuming that e.g. France contains data while Andorra doesn't, the following should happen:
mysite/country/france --> HTTP 200 OK
mysite/country/andorra --> HTTP 404 Not found
mysite/country/randomstring123 --> HTTP 404 Not found
This all works fine. However, what's curious is that when I track my site in Google Webmaster Tools, it is actually aware of some of the URLs that point to "empty" countries, and shows them to me as 404-yielding "crawling errors". (E.g., it knows mysite/country/andorra.) What I can't see is where Google got those URLs from. Those links are also not included in the WT "Internal Links" section, so that doesn't help.
The routes.rb excludes the index action:
resources :countries, path: "country", except: :index
I generate a sitemap with a custom controller, but it excludes the countries in question.
I conclude that there are two likely options:
It is possible that an earlier version of the sitemap controller included "empty" countries. They might then still be tried by Google eventually (as are some old URLs from a much outdated site structure > 9 months ago).
Otherwise, Rails somehow has to be "leaking" URLs to these empty countries. Is there an "internal" way to check that? I'll also run external 404 checks, but it would be good to know if I can get an efficient output, much like rake routes.
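One way I could at least build such an output myself is to dump the paths of all countries without data and diff that list against the sitemap and the URLs WMT reports (a sketch; feeding it from `Country.pluck(:slug)` and `Country.with_data.pluck(:slug)` is an assumption based on the friendly_id setup above):

```ruby
# Builds the /country/:slug paths for every record NOT in the
# with_data set, so the list can be compared against the sitemap
# and the 404 URLs reported by Webmaster Tools. Works on plain
# slug arrays so it is independent of the models.
def empty_country_paths(all_slugs, slugs_with_data)
  (all_slugs - slugs_with_data).map { |slug| "/country/#{slug}" }
end
```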
I just realized I had a very hard to find bug on my website. I frequently use Model.find to retrieve data from my database.
A year ago I merged three websites, causing a lot of redirections that needed to be handled. To do so, I created a "catch all" functionality in my application controller like this:
around_filter :catch_not_found
def catch_not_found
yield
rescue ActiveRecord::RecordNotFound
require 'functions/redirections'
handle_redirection(request.path)
end
in addition I have this at the bottom of my routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
Redirection-controller has:
def not_found_catcher
handle_redirection(request.path)
end
I am not sure these things are relevant to this question, but I guess it is better to mention them.
My actual problem
I frequently use Model.find to retrieve data from my database. Let's say I have a Product-model with a controller like this:
def show
  @product = Product.find(params[:id])
  @product.country = Country.find(...some id that does not exist...)
end

# View
<%= @product.country.name %>
This is something I use in some 700+ places in my application. What I realized today was that even though the Product model will be found, calling Country.find() and NOT finding anything raises a RecordNotFound, which in turn causes a 404 error.
I built my app around the expectation that @product.country would be nil if the Country couldn't be found. I know now that is not the case - it raises RecordNotFound instead. Basically, if I load Product#show I get a 404 page where I would expect a 500 error (since @product.country would be nil and nil.name should not work).
My question
My big question now: am I doing things wrong in my app - should I always use Model.find_by_id for queries like my Country.find(...some id...)? What is the best practice here?
Or, does the problem lie within my catch all in the Application Controller?
To answer your questions:
should I always use Model.find_by_id
If you want to find by an id, use Country.find(...some id...). If you want to find by something else, use e.g. Country.find_by(name: 'Australia'). The find_by_name syntax is no longer favoured in Rails 4.
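The practical difference is the lookup contract: find raises when nothing matches, find_by returns nil. A plain-Ruby stand-in makes that concrete (FakeCountry is invented here just to show the contract; it is not an ActiveRecord API):

```ruby
# Stand-in error class mirroring ActiveRecord::RecordNotFound.
class RecordNotFound < StandardError; end

# Minimal fake model mimicking ActiveRecord's lookup contract.
class FakeCountry
  ROWS = { 1 => 'Sweden' }

  # Like Model.find: raises when the id is missing.
  def self.find(id)
    ROWS.fetch(id) { raise RecordNotFound, "Couldn't find Country with id=#{id}" }
  end

  # Like Model.find_by(id: ...): returns nil when the id is missing.
  def self.find_by(id:)
    ROWS[id]
  end
end
```

So `@product.country = Country.find_by(id: some_id)` gives you the nil you were expecting, while find raises and triggers your catch-all.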
But that's an aside, and is not your problem.
Or, does the problem lie within my catch all in the Application Controller?
Yeah, that sounds like a recipe for pain to me. I'm not sure what specifically you're doing or what the nature of your redirections is, but based on the vague sense I get of what you're trying to do, here's how I'd approach it:
Your Rails app shouldn't be responsible for redirecting routes from your previous websites / applications. That should be the responsibility of your webserver (eg nginx or apache or whatever).
Essentially you want to make a big fat list of all the URLs you want to redirect FROM, and where you want to redirect them TO, and then format them in the way your webserver expects, and configure your webserver to do the redirects for you. Search for eg "301 redirect nginx" or "301 redirect apache" to find out info on how to set that up.
If you've got a lot of URLs to redirect, you'll likely want to generate the list with code (most of the logic should already be there in your handle_redirection(request.path) method).
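As a sketch of that generation step (the nginx rewrite format is one option; the old/new pairs here are illustrative and would really come from the same data handle_redirection uses):

```ruby
# Turns old-path -> new-URL pairs into nginx `rewrite` directives
# for a one-off server config block on the old domains.
def nginx_redirect_lines(mapping)
  mapping.map do |old_path, new_url|
    "rewrite ^#{Regexp.escape(old_path)}$ #{new_url} permanent;"
  end
end

puts nginx_redirect_lines(
  '/categories/vacation/item/1243' => 'http://www.newsite.se/product-items/1243'
)
```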
Once you've run that code and generated the list, you can throw the code away; your webserver will handle the redirects from the old sites, and your Rails app can happily go on with no knowledge of the previous sites/URLs and no dangerous catch-all logic in its application controller.
That is a very interesting way to handle exceptions...
In Rails you use rescue_from to handle exceptions on the controller layer:
class ApplicationController < ActionController::Base
rescue_from SomeError, with: :oh_noes
private def oh_noes
render text: 'Oh no.'
end
end
However Rails already handles some exceptions by serving static html pages (among them ActiveRecord::RecordNotFound). Which you can override with dynamic handlers.
However, as @joshua.paling already pointed out, you should be handling the redirects at the server level instead of in your application.
A seemingly simple problem which I can't figure out how to deal with (in Rails 3.2): we would like to offer our users the possibility to define a subdomain and map incoming requests using that subdomain to a path partially retrieved from the database. So for example, while www.example.com will go to our usual root path, a request to steve.example.com would first look up "steve" in a database table that associates subdomains with ids; if a match is found, route the request to, say, www.example.com/folders/36 (where 36 is the id associated with "steve"), and if no match is found, continue looking for other routes in routes.rb.
I have a working solution using redirect, which goes something like:
constraints :subdomain => /^(?!www).+/ do # a subdomain is used and different from www
match '*path', :to => redirect {|params, req|
req_protocol=req.env['rack.url_scheme'] # e.g. "http"
req_host=req.env['HTTP_HOST'] # e.g. "steve.example.local:3000"
...
code to pick up "steve", do the lookup and return a suitable URL
}
end
Now, I do NOT want to use redirects, for two reasons: firstly, the URL is then modified in the user's browser address bar, and secondly, browsers tend to cache redirects (even with the status set to 302 or 307), making such a solution less dynamic.
If I had had access to the request object in routes.rb, I could possibly have done something like
match '*path' => "folders##{index}", :constraints => {:subdomain => /^(?!www).+/ }
after having retrieved index from the database table using the subdomain, but the request object is not available.
(I could, however, probably manage the case when no association is found using the "Advanced Constraints" described in the Rails guide, although I haven't tested that, knowing it would only fix a tiny part of the problem.)
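To make the Advanced Constraints idea concrete, this is roughly what I imagine (an untested sketch; the hash is a stand-in for the database table, and the constraint alone still doesn't tell the routed controller which folder id to use):

```ruby
# A routing constraint object receives the full request in matches?,
# so the subdomain lookup can live here. SUBDOMAIN_IDS stands in for
# the real database table for this sketch.
class SubdomainFolderConstraint
  SUBDOMAIN_IDS = { 'steve' => 36 }

  def matches?(request)
    sub = request.subdomain
    sub != '' && sub != 'www' && SUBDOMAIN_IDS.key?(sub)
  end
end

# In routes.rb it could then be used like:
#   match '*path', to: 'folders#show',
#         constraints: SubdomainFolderConstraint.new
# with FoldersController#show repeating the subdomain -> id lookup.
```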
I've made a mistake and allowed two different routes pointing at the same place. Now I've got trouble with duplicated content.
News could be viewed in two ways:
http://website.com/posts/321 and http://website.com/news/this-is-title/321
I want to fix this mess, and my idea is to check which link the user came through. For example, if someone comes through http://website.com/posts/321 I would like to redirect the visitor to the correct route: http://website.com/news/this-is-title/321
My first idea is to validate the request URL in the Post controller and then decide in an if statement whether to redirect or simply display the proper view. Is this a good approach?
I think it's not the best fit.
You should do this at routes level using the redirect methods.
I don't think you should bother, take a look at canonical url's if you're worried about SEO
In your posts_controller.rb show:
def show
  if request.fullpath.match /(your regex)/i
    return redirect_to post_path(params[:id]), :status => 301, :notice => 'This page has been permanently moved'
  end
  @post = Post.find(...)
end
return redirect_to is important because you can't call redirect or render multiple times
match the regex on request.fullpath
if you're super concerned about SEO, set the status to 301. This tells search engines that the page has been permanently moved
the notice is optional and is only for aesthetics after the redirect, in case the user has bookmarked the old page URL