Rails: Find all occurrences of internal 404 links (is Rails "leaking" URLs?) - ruby-on-rails

I came across something curious and was wondering how to fix it within Rails.
In my app, I have a Country model; it already contains records for all countries I'll ever need. However, many of them don't contain any data or are otherwise not yet relevant. These countries yield a 404 error, as they would if the record didn't exist in the first place:
begin
  @country = Country.friendly.find(params[:id])
rescue ActiveRecord::RecordNotFound
  @country = nil
end
if !@country.nil? && Country.with_data.include?(@country)
  # render the view
else
  render :file => "#{Rails.root}/public/404", :status => :not_found
end
So assuming that e.g. France contains data while Andorra doesn't, the following should happen:
mysite/country/france --> HTTP 200 OK
mysite/country/andorra --> HTTP 404 Not found
mysite/country/randomstring123 --> HTTP 404 Not found
This all works fine. However, what's curious is that when I track my site in Google Webmaster Tools, it is actually aware of some of the URLs that point to "empty" countries, and shows them to me as 404-yielding "crawling errors". (E.g., it knows mysite/country/andorra.) What I can't see is where Google got those URLs from. Those links are also not included in the WT "Internal Links" section, so that doesn't help.
The routes.rb excludes the index action:
resources :countries, path: "country", except: :index
I generate a sitemap with a custom controller, but it excludes the countries in question.
I conclude that there are two likely options:
It is possible that an earlier version of the sitemap controller included "empty" countries. They might then still be tried by Google eventually (as are some old URLs from a much outdated site structure > 9 months ago).
Otherwise, Rails would somehow have to "leak" URLs to these empty countries. Is there any "internal" way to check that? I'll also run external 404 checks, but it would be good to know whether I can get an efficient output much like that of rake routes.
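One possible internal check, as a rough sketch only: scan rendered HTML (for instance the response body in an integration test) for links to /country/<slug> and report those whose slug is not in the set of countries with data. The WITH_DATA list and the leaked_country_links helper are made up for illustration; in the app the list would presumably come from Country.with_data.

```ruby
# Hypothetical slugs of countries that currently have data.
WITH_DATA = %w[france germany].freeze

# Returns the country slugs linked from the given HTML that are not in
# with_data, i.e. the links that would lead to a 404.
def leaked_country_links(html, with_data)
  linked = html.scan(%r{href="[^"]*/country/([^"/?#]+)}).flatten.uniq
  linked - with_data
end

html = '<a href="/country/france">France</a> <a href="/country/andorra">Andorra</a>'
puts leaked_country_links(html, WITH_DATA).inspect
```

Running this over every page the app renders would show whether any view still links to an "empty" country; if nothing turns up, the old sitemap is the more likely source.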

Related

RoR Magazine specific category routes

I would like to create routes in RoR for a media website with different sections of articles (ecology, legal, economy, etc.).
I would like my URLs to look like this:
root/magazine/ecology/name-of-articles
(I found nothing corresponding in the Rails routing guide; nested and collection routes don't fit my case, I think.)
Here is my attempt:
get 'magazine/ecology/name-of-article', to: 'articles#name_of_article'
views: folders articles => magazine => ecology => file: name_of_article
controller: articles
But it's not working; the error from Rails is below.
Thanks for your help.
ActionController::UnknownFormat at
/magazine/actualite-juridique/legislation-ce-qui-change-en-2017
ArticlesController#legislation_2017 is missing a template for this
request format and variant.
request.formats: ["text/html"] request.variant: []
NOTE! For XHR/Ajax or API requests, this action would normally respond
with 204 No Content: an empty white screen. Since you're loading it in
a web browser, we assume that you expected to actually render a
template, not nothing, so we're showing an error to be extra-clear. If
you expect 204 No Content, carry on. That's what you'll get from an
XHR or API request. Give it a shot.
As discussed in the comments, I don't recommend what you are trying to do, even though you didn't ask for an alternative. I would like to suggest a better method. Take it if you like it.
Run a migration to add a column to your articles table that stores the name of the section each article belongs to.
Then, your routes:
get 'magazine/:section/:name' => 'articles#name_of_article' , as: 'sectioned_article'
In your controller:
def name_of_article
  # Bound parameters keep user input out of the SQL string (avoids SQL injection).
  @article = Article.where("section ILIKE ? AND name ILIKE ?", "%#{params[:section]}%", "%#{params[:name]}%").first
end
This way, you can create as many sections and articles as you want without code changes.
It also gives you room for customisation and saves you a great deal of the work you were prepared to do [time you can now spend partying ;)].
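To make it clearer what the dynamic segments in 'magazine/:section/:name' do, here is a small plain-Ruby sketch that mimics the idea outside of Rails. compile_route and match_route are hypothetical helpers written for this illustration; they are not how the Rails router is implemented.

```ruby
# Hypothetical stand-in for the router: turns a pattern such as
# "magazine/:section/:name" into a regexp with named captures.
def compile_route(pattern)
  source = pattern.split("/").map do |segment|
    if segment.start_with?(":")
      "(?<#{segment[1..-1]}>[^/]+)" # dynamic segment: anything up to the next slash
    else
      Regexp.escape(segment)        # static segment: must match literally
    end
  end.join("/")
  Regexp.new("\\A/?#{source}/?\\z")
end

# Returns a hash of params for a matching path, or nil when the path does not match.
def match_route(pattern, path)
  match = compile_route(pattern).match(path)
  match && match.named_captures
end

SECTIONED_ARTICLE = "magazine/:section/:name"

puts match_route(SECTIONED_ARTICLE, "/magazine/ecology/name-of-article").inspect
```

Every section and article name ends up in the params, so one route and one action cover all of them.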

Are my old websites being properly redirected?

About a year ago I merged three websites (oldsite.com, oldsite.nu, newsite.se) into one, which I kept on one of the domains (newsite.se). I am not sure this was done right, since I still see a lot of traffic from Google for old URLs, even after a year.
Oldsite redirect code
Important edit note: I recently realized the nameservers were not pointing towards my old rails app any longer but instead to a php-folder on my web host in which I have a .htaccess with the following code:
RewriteEngine on
RewriteRule ^robots.txt - [L]
RewriteRule ^sitemap.xml - [L]
RewriteRule ^(.*)$ http://www.newsite.se/$1 [R=301,L]
This makes this section below (regarding oldsite.com/oldsite.nu) void:
The .com and .nu were built in Ruby on Rails and hosted on Heroku.
The logic to redirect paths from oldsite.com/oldsite.nu were made
completely on the newsite.se site. The redirection code on the
oldsites is a straightforward redirection with this on the first row
in routes.rb on oldsite.com:
match "/(*path)" => redirect {|params, req| "http://www.newsite.se/#{params[:path]}"}, via: [:get, :post]
I used this (Swedish) tool to verify that this redirect actually makes a 301 redirect: http://301redirect.se . It confirmed the redirections were 301.
Newsite.se redirection handler
The content on each old website was matched with the same content on the new one, though quite rarely on the same path, e.g.
oldsite.com/categories/vacation/item/1243
could lead to
newsite.se/product-items/1243
I handle these types of redirections mostly in an internal redirection controller that catches and redirects any traffic on newsite.se like:
newsite.se/categories/vacation/item/1243 -> newsite.se/product-items/1243
using this at the bottom on my newsite.se routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
This works fine.
Edit 20151223: The reason I use Newsite.se to handle the redirects is because it holds all the logic of where to redirect the paths. This is virtually impossible for Oldsite.com/.nu to know.
Actions taken
Besides redirecting with 301 (which, as far as I understand, I do), I have also used Google Webmaster Tools to make a "Request to change address" from my two old websites to my new one. I can't find any information on this any longer, but I am quite sure I got a positive response from WMT that this had been done (though I am not 100% sure).
The problem indications
I am not 100% sure there is something wrong, but I have seen indications that make me believe the redirection is not done properly, so that Google does not really realize the websites have moved.
In Google Webmaster Tools, under "Incoming links", the top linking domain is herokuapp.com, which in turn means oldsite.com. I.e. the 301 redirects seem to be interpreted as links (and not as redirects).
I often get new notifications in Google WMT about "Not founds/404s" (I don't know what this section is called in the English version) for URLs that could not be reached on newsite.se. When I check the source of those URLs I often see links from e.g. oldsite.nu/oldpath/productitem/1234, as if someone (Google?) has still accessed that old URL. An important part of this is that I did NOT have that many links to the old sites, so I don't expect these to be old links still feeding traffic.
I still get traffic to many of my old paths (from oldsite.com/oldsite.nu). I see this in my redirection controller, which handles plenty of requests for old paths every day.
The website has lost a lot of positions in the Google SERPs; this is only a weak indication, though, since there could be numerous reasons for it.
Solving the problem
How should I go about troubleshooting this problem?
Is it normal for WMT to consider 301's as links?
Is there a smarter way to handle the redirection from oldsite.com than my routes.rb-match line?
I had to do a similar move for a client moving a large e-commerce website that required transitioning all old traffic to the new website and redirect the appropriate products to the new pathing.
To get everything transitioned without losing Google ranking, we had to implement 301 redirects as you mentioned above. WMT seems to rely on you to handle this yourself rather than offering it as a supported function any more.
Approach
You should redirect every URL on your old domain to the corresponding
new URL. That is the documented & recommended way of changing your
domain according to Google.
The best approach would be to handle the redirects in a controller and have the logic to send it to the actual page with a 301 and no more redirects once landing on the new website.
I would suggest the following:
routes.rb (oldsite.com/oldsite.nu)
Match the request and send it to a controller to handle the finer logic and 301.
match "/(*path)", to: 'redirector#catch_all', via: [:get, :post]
RedirectorController (oldsite.com/oldsite.nu)
def catch_all
  # Separate the rest of the link into its components
  # I will use the example of the vacation -> product-items you have
  options = params[:path].to_s.split('/') # params[:path] is nil for the root URL
  options.reject! { |e| e.to_s.empty? } # Remove trailing junk if any
  if options[0] == 'categories'
    redirect_to "http://www.newsite.se/product-items/#{options.last}", notice: 'Product Updated! We Have A New Site.', status: 301
    return # the return here is a MUST for more complex if-then and multiple redirect_to's
  elsif options[0] == 'contact'
    redirect_to "http://www.newsite.se/contact-us", notice: 'We moved, Contact us Here.', status: 301
    return
  elsif options.empty?
    redirect_to "http://www.newsite.se/", notice: 'That page no longer exists, Please browse our new site', status: 301
    return
  else
    more_sorting(options)
  end
end
private
def more_sorting(options)
case options
when options[2].....
redirect_to ....., notice: '', status: 301
return
when options[3].....
redirect_to ....., notice: '', status: 301
return
when options[4].....
redirect_to ....., notice: '', status: 301
return
end
end
Why do it this way:
This way, search engine robots and users can still crawl and visit each old page and link, and get redirected to the specific page it corresponds to on the new website.
Further, it handles the 301 redirect on this server and does not result in another redirect on the new server. A chain of redirects is something you may be penalized for, both in user experience and in how robots interpret your attempts to unite the sites. (This will also most likely stop the 301s from being interpreted as links.)
If you need more complex routing you can add (as I had to do) private methods in the RedirectorController for a more in-depth analysis of the parameters, which is what I call in the last else of the if/elsif chain.
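The mapping itself can also be pulled out into a plain function, which makes it easy to test without a running server. This is only a sketch built on the two example rules above; new_url_for and NEW_SITE are names made up for this illustration.

```ruby
NEW_SITE = "http://www.newsite.se"

# Pure function: old request path in, final URL on the new site out.
# Only the example rules from this answer are covered; everything else
# falls back to the new home page.
def new_url_for(path)
  segments = path.to_s.split("/").reject(&:empty?)
  case segments.first
  when "categories" then "#{NEW_SITE}/product-items/#{segments.last}"
  when "contact"    then "#{NEW_SITE}/contact-us"
  else                   "#{NEW_SITE}/"
  end
end

puts new_url_for("categories/vacation/item/1243")
```

In the controller the action would then reduce to something like redirect_to new_url_for(params[:path]), status: 301.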
Clarification?
Let me know if you have any other questions and if this helped.

Rails: Model.find() or Model.find_by_id() to avoid RecordNotFound

I just realized I had a very hard to find bug on my website. I frequently use Model.find to retrieve data from my database.
A year ago I merged three websites, which caused a lot of redirections that needed to be handled. To do so, I created a "catch all" in my application controller like this:
around_filter :catch_not_found
def catch_not_found
  yield
rescue ActiveRecord::RecordNotFound
  require 'functions/redirections'
  handle_redirection(request.path)
end
in addition I have this at the bottom of my routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
Redirection-controller has:
def not_found_catcher
handle_redirection(request.path)
end
I am not sure these things are relevant in this question but I guess it is better to tell.
My actual problem
I frequently use Model.find to retrieve data from my database. Let's say I have a Product-model with a controller like this:
def show
  @product = Product.find(params[:id])
  @product.country = Country.find(...some id that does not exist...)
end
# View
<%= @product.country.name %>
This is something I use in some 700+ places in my application. What I realized today is that even though the Product record is found, calling Country.find() with an id that does NOT exist raises a RecordNotFound, which in turn causes a 404 error.
I have built my app around the expectation that @product.country would be nil if the .find call couldn't find that Country. I know now that this is not the case: it raises a RecordNotFound. Basically, when I load Product#show I get a 404 page where I would expect a 500 error (since @product.country would be nil and nil.name should not work).
My question
My big question now. Am I doing things wrong in my app, should I always use Model.find_by_id for queries like my Country.find(...some id...)? What is the best practise here?
Or, does the problem lie within my catch all in the Application Controller?
To answer your questions:
should I always use Model.find_by_id
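If you want to find by an id, use Country.find(...some id...), which raises ActiveRecord::RecordNotFound when no record matches. If you want to find by something else, use e.g. Country.find_by(name: 'Australia'), which returns nil when nothing matches. The find_by_name syntax is no longer favoured in Rails 4.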
But that's an aside, and is not your problem.
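For what it's worth, the difference in behaviour between the two lookup styles can be shown without a database. FakeCountry and RecordNotFound below are hypothetical stand-ins written for this illustration, not ActiveRecord classes; they only mimic "find raises, find_by returns nil".

```ruby
# Hypothetical stand-in for ActiveRecord::RecordNotFound.
class RecordNotFound < StandardError; end

# Minimal in-memory "model" that mimics the two lookup styles.
class FakeCountry
  RECORDS = { 1 => "Australia", 2 => "Sweden" }.freeze

  # Like Model.find: raises when the id is unknown.
  def self.find(id)
    RECORDS.fetch(id) { raise RecordNotFound, "Couldn't find Country with id=#{id}" }
  end

  # Like Model.find_by(id: ...): returns nil when nothing matches.
  def self.find_by(id:)
    RECORDS[id]
  end
end

puts FakeCountry.find_by(id: 99).inspect

begin
  FakeCountry.find(99)
rescue RecordNotFound => e
  puts "raised: #{e.message}"
end
```

With a catch-all that rescues RecordNotFound around every action, the raising variant ends up in the redirection handler, whichever model caused it.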
Or, does the problem lie within my catch all in the Application Controller?
Yeah, that sounds like a recipe for pain to me. I'm not sure what specifically you're doing or what the nature of your redirections is, but based on the vague sense I get of what you're trying to do, here's how I'd approach it:
Your Rails app shouldn't be responsible for redirecting routes from your previous websites / applications. That should be the responsibility of your webserver (eg nginx or apache or whatever).
Essentially you want to make a big fat list of all the URLs you want to redirect FROM, and where you want to redirect them TO, and then format them in the way your webserver expects, and configure your webserver to do the redirects for you. Search for eg "301 redirect nginx" or "301 redirect apache" to find out info on how to set that up.
If you've got a lot of URLs to redirect, you'll likely want to generate the list with code (most of the logic should already be there in your handle_redirection(request.path) method).
Once you've run that code and generated the list, you can throw that code away. Your webserver will be handling the redirects from the old sites, and your Rails app can happily go on with no knowledge of the previous sites / URLs and no dangerous catch-all logic in your application controller.
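As a rough sketch of the "generate the list with code" step, assuming nginx as the webserver: the REDIRECTS hash below is invented sample data (in practice it would come from running the existing redirection logic over the known old paths), and nginx_redirects is a hypothetical helper name.

```ruby
# Hypothetical old-path => new-URL pairs.
REDIRECTS = {
  "/categories/vacation/item/1243" => "http://www.newsite.se/product-items/1243",
  "/contact"                       => "http://www.newsite.se/contact-us"
}.freeze

# Emits one exact-match nginx location block per old path.
# Paths containing spaces or other special characters would need
# quoting or escaping, which this sketch does not handle.
def nginx_redirects(redirects)
  redirects.map do |old_path, new_url|
    "location = #{old_path} { return 301 #{new_url}; }"
  end
end

puts nginx_redirects(REDIRECTS)
```

The printed lines can be pasted into the server block for the old domains; for Apache the same loop would emit the corresponding redirect rules instead.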
That is a very interesting way to handle exceptions...
In Rails you use rescue_from to handle exceptions on the controller layer:
class ApplicationController < ActionController::Base
  rescue_from SomeError, with: :oh_noes
  private def oh_noes
    render text: 'Oh no.'
  end
end
However, Rails already handles some exceptions (among them ActiveRecord::RecordNotFound) by serving static HTML pages, which you can override with dynamic handlers.
However, as @joshua.paling already pointed out, you should handle the redirects at the server level instead of in your application.

Changing urls in ruby on rails depending on different conditions

I'm new to ruby on rails....I wanted to know if there is a way to change the URL displayed depending on the client's response. I mean... here's an example:
I'm making a project showing listings in various places...
Now in general I have a home page, a search page, and a detail page for listings. So, respective URLs are officespace/home, officespace/search?conditions, officespace/detailpage?id=(controller-officespace)[&Conditions eg.---price,size,place,type...]
So, every time the client makes a request for search, the same URL is shown, of course with the given conditions.
Now I want that if the client asks for only the place and mentions nothing about size, price, etc., the url should be /listing/location_name.
If he mentions other conditions, then it'll be listing/(office_type)/size(x sq feet)_office_for_rent_in_locationname)
B.t.w. (I already have a controller named listings and its purpose is something else.)
And so on ........... Actually, I want to change URLs for a number of things. Anyway, please help me. And please don't refer me to the manuals. I've already read them and they didn't give any direct help.
This is an interesting routing challenge. Essentially, your goal is to create a special expression that will match the kinds of URL's you want to display in the user's browser. These expressions will be used in match formulas in config/routes.rb. Then, you'll need to make sure the form actions and links on relevant search pages link to those specialized URL's and NOT the default pages. Here's an example to get started:
routes.rb
match "/listing/:officeType/size/:squarefeet/office_for/:saleOrRent/in/:locationName" => "searches#index"
match "/listing/*locationName" => "searches#index"
resources :searches
Since you explicitly mentioned that your listings controller is for something else, I just named our new controller searches. Inside the code for the index method for this controller, you have to decide how you want to collect the relevant data to pass along to your view. Everything marked with a : in the match expressions above will be passed to the controller in the params hash as if it were an HTTP GET query string parameter. Thus we can do the following:
searches_controller.rb
def index
  if params[:squarefeet] && params[:officeType] && params[:locationName]
    @listings = Listing.where("squarefeet >= ?", params[:squarefeet].to_i).
      where(:officeType => params[:officeType],
            :locationName => params[:locationName])
  elsif params[:locationName]
    @listings = Listing.where(:locationName => params[:locationName])
  else
    @listings = Listing.all
  end
end
And to send the user to one of those links:
views/searches/index.html.erb
<%= link_to "Click here for a great office!", "/listing/corporate/size/3200/office_for/rent/in/Dallas" %>
The above example would only work if your Listing model is set up exactly the same way as my arbitrary guess, but hopefully you can work from there to figure out what your code needs to look like. Note that I wasn't able to get the underscores in there. The routes only match segments separated by slashes as far as I can tell. Keep working on it and you may find a way past that.
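One possible way around the underscore limitation, sketched in plain Ruby: let the route hand the whole last segment to the controller as a single string and split it there with a regular expression. The slug format, SLUG_PATTERN and parse_listing_slug are assumptions made for this illustration, loosely based on the URL shape in the question.

```ruby
# Hypothetical slug format, based on the question:
#   "<squarefeet>_sq_feet_office_for_<rent|sale>_in_<location>"
SLUG_PATTERN = /\A(?<squarefeet>\d+)_sq_feet_office_for_(?<sale_or_rent>rent|sale)_in_(?<location_name>.+)\z/

# Returns a params-like hash, or nil when the slug does not follow the format.
def parse_listing_slug(slug)
  match = SLUG_PATTERN.match(slug)
  return nil unless match

  {
    squarefeet:    match[:squarefeet].to_i,
    sale_or_rent:  match[:sale_or_rent],
    location_name: match[:location_name]
  }
end

puts parse_listing_slug("3200_sq_feet_office_for_rent_in_Dallas").inspect
```

The resulting hash can be used in the index action in place of the individual route params; a nil result would fall through to the plain location search.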

Rails - any fancy ways to handle 404s?

I have a rails app I built for an old site I converted from another cms (in a non-rails language, hehe). Most of the old pages are mapped to the new pages using routes.rb. But there are still a few 404s.
I am a rails newb so I'm asking if there are any advanced ways to handle 404s. For example, if I was programming in my old language I'd do this:
Get the URL (script_name) that was being accessed and parse it.
Do a lookup in the database for any keywords, ids, etc found in the new URL.
If found, redirect to the page (or if multiple records are found, show them all on a results page and let user choose). With rails I'd probably want to do :status => :moved_permanently I'm guessing?
If not found, show a 404.
Are there any gems/plugins or tutorials you know of that would handle such a thing, if it's even possible. Or can you explain on a high level how that can be done? I don't need a full code sample, just a push in the right direction.
PS. It's a simple rails 3 app that uses a single Content model.
Put this in routes (after every other route that you have, this will capture every url)
match '*url' => 'errors#routing'
In the routing action of the errors controller you can now implement any fancy logic you want and render a view as usual (you might want to add :status => 404 to the render call). The requested URL will be available in the controller as params[:url].
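The lookup steps described in the question (parse the URL, search for ids or keywords, then redirect, list the candidates, or give up) could look roughly like this. It is a sketch in plain Ruby: CONTENT is a made-up in-memory stand-in for the Content model, and resolve_legacy_url is a hypothetical helper name.

```ruby
# Hypothetical in-memory stand-in for the Content model: id => slug.
CONTENT = { 12 => "about-us", 34 => "contact", 56 => "contact-sales" }.freeze

# Decides what to do with an unmatched URL.
# Returns [:redirect, path], [:choices, paths] or [:not_found, nil].
def resolve_legacy_url(url)
  tokens = url.to_s.downcase.scan(/[a-z0-9]+/)
  matches = CONTENT.select do |id, slug|
    tokens.include?(id.to_s) || tokens.any? { |t| slug.split("-").include?(t) }
  end
  paths = matches.values.map { |slug| "/#{slug}" }

  case paths.size
  when 0 then [:not_found, nil]
  when 1 then [:redirect, paths.first] # would be a 301 in the controller
  else        [:choices, paths]        # let the user pick on a results page
  end
end

puts resolve_legacy_url("old/pages/12").inspect
```

In the routing action, params[:url] would be passed in and the result mapped to a redirect with :status => :moved_permanently, a results view, or the 404 page.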
There is an ugly way of doing this:
render :file => "#{Rails.root}/public/404.html", :layout => false, :status => 404
Maybe someone can come up with a better solution.
