Rails 3, snowman and SEO - ruby-on-rails

I'm wondering how the snowman affects the SEO.
For example, if someone links to your post with the snowman parameter, two URLs end up pointing to the same resource (the other being the one without it), and that is basically bad for search engines.
Does this really pose a problem?

If you care about SEO you should already have 301 redirects for normalization in place, so that the URLs with and without the snowman both resolve to the same canonical URL, hopefully without any query parameters.
Also, this only affects forms that use the GET method; URLs that can only be retrieved via POST request are not crawled at all.
Minor note: the snowman is no longer used in current Rails versions, now you have utf8=✓.
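For completeness, a minimal sketch of what such a normalizing redirect could look like in a Rails 3-era controller; the filter name and the decision to strip the parameter only on GET requests are my assumptions, not part of the original answer:

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # Issue a permanent redirect that drops the utf8/snowman parameter
  # from GET requests, so only one canonical URL gets indexed.
  before_filter :strip_utf8_param

  private

  def strip_utf8_param
    return unless request.get?
    return unless params.key?(:utf8) || params.key?(:_snowman)
    clean = request.query_parameters.except("utf8", "_snowman")
    canonical = request.path
    canonical += "?#{clean.to_query}" unless clean.empty?
    redirect_to canonical, :status => :moved_permanently
  end
end
```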

Related

URL routing: do I need it, and why?

While it's easy enough to find technical, technology-dependent descriptions of URL routing, it's surprisingly difficult to find a coherent summary of the various use cases (situations in which it might be required). I need to know under what hypothetical circumstances I am likely to need routing.
Some of those deriving from dynamic URL usage are outlined here, but it seems unlikely that this list is exhaustive.
I'd be glad if someone could list these separately for static and dynamic URLs, including, where applicable, any that are more or less imposed by external tools and services.
If you find yourself talking in terms of HTTP actions such as GET, POST, PATCH, PUT, DELETE or equivalents, to my mind you're already going too deep.
As far as I can make out, this question was too stupid to have been asked by anyone else. :-)
Thanks.
Yes, you do need routing. It can be an excellent way to let users effectively share a resource and find direct routes to it.
URL routes are basically GET requests sent by the browser to the web application, so in theory you could do just as well with POST requests (i.e. without meaningful URLs). However, for the reason mentioned above, and as a best practice supported by most modern web frameworks, it's best to use proper URLs. POST requests are probably fine too, but I would suggest using them when you need to obfuscate parts of the web application's resources from the end user.
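To make "proper URLs" concrete, here is a hedged sketch using Rails resource routing; the products resource name is purely illustrative:

```ruby
# config/routes.rb -- named, shareable GET routes for one resource
MyApp::Application.routes.draw do
  resources :products, :only => [:index, :show]
  # GET /products     => products#index (a listing users can link to)
  # GET /products/:id => products#show  (a direct route to one resource)
end
```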

Rails Mass 301 redirect - will it cause routes.rb issues?

I have a large set of URLs that I need to redirect (over 600), all of them unique. They're all named URLs (i.e. example.com/this-shoe-name and example.com/that-blue-product), so I can't use something like this to handle them semi-dynamically, and they also don't redirect to anything similar (i.e. www.example.com/this-shoe-name redirects to newexample.com/catalog/shoes rather than newexample.com/this-shoe-name).
I guess I'm fine writing 600 redirect rules (though I'd rather avoid it), but that seems to me like it's going to make the routes.rb file rather unwieldy.
Short of writing 600 rules in routes.rb, is there a best practice way to do this? Is having 600 rules in the routes.rb going to make my app slow/break things?
I don't know if it will slow route processing down, but I doubt it. As long as you put them at the end, do you really care? Like you, I don't like the messiness it causes.
An alternative would be to create a catch-all route, or rescue the 404, and then look up the URL in an "old URLs" table; if a match is found, redirect to the new URL.
You might even be able to tie that old URL into your Product or Category model as an attribute. Then search that and, if found, redirect to the correct URL for that model. I've done that in the past with pretty good success. Bonus: you can expose it in an admin tool and make the client enter them all in :-)
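A hedged sketch of that catch-all approach; the RedirectsController and LegacyUrl names are assumptions for illustration:

```ruby
# config/routes.rb -- must be the LAST route in the file
match "*path" => "redirects#show"

# app/controllers/redirects_controller.rb
class RedirectsController < ApplicationController
  def show
    # LegacyUrl is an assumed model backed by the "old URLs" table,
    # with old_path and new_url columns.
    legacy = LegacyUrl.find_by_old_path(request.path)
    if legacy
      redirect_to legacy.new_url, :status => :moved_permanently
    else
      render :file => "#{Rails.root}/public/404.html",
             :status => :not_found, :layout => false
    end
  end
end
```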

SEO for Rails site, now or later?

My freelance web developer is developing a site on Ruby on Rails. Currently the URL structure is
http://abc.com/all?ID=category
Example:
http://abc.com/all?3=CatA
I requested him to structure the URL according to categories, such as
http://abc.com/CatA/3-page-title
but he refused because it would involve too much work and cost, as he is using the same model for Category A, Category B, and so on. I think the URL structure he is using is messy and not search-engine friendly.
My question is, should I add cost to let him do a better structured URL I requested, or should I let the project finish, then do it in the next iteration?
I'm worried that if I do it in the next iteration, all the previous URLs structured in the old way will be purged, and sites that link to them will hit 404 errors. It's like having to rebuild the site's ranking all over again.
Any other solutions for me? Is there any way to redirect old URLs to new URLs automatically in Rails?
The way your developer is proposing to compose the URLs would be considered something of an anti-pattern in Rails. What you are looking for is close to what Rails does out of the box when using RESTful resource routing (admittedly, I'm guessing as to how CatA and page-title are related to each other). A RESTful Rails route might look like this (using your example):
http://abc.com/categories/3-CatA/pages/10-page-title
If he really is using Rails, and he knows what he's doing, then there should be no cost at all to converting to what you want. It just needs the correct routes defined and then a to_param override in your models to get nice, SEO-friendly identifiers.
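For illustration, a hedged sketch of those routes and the to_param override, assuming Category and Page models inferred from the example URL:

```ruby
# config/routes.rb
MyApp::Application.routes.draw do
  resources :categories do
    resources :pages
  end
end

# app/models/category.rb (Page would get the same treatment)
class Category < ActiveRecord::Base
  # Yields identifiers like "3-cata" for use in URLs
  def to_param
    "#{id}-#{name.parameterize}"
  end
end
```

Since String#to_i stops at the first non-digit, Category.find(params[:id]) keeps working unchanged on an identifier like "3-cata".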

Is it safe to depend on a trailing slash in a URL for routing purposes?

I'm building a site that has products, each of which belongs to one or more categories, which can be nested within parent categories. I'd like to have SEO-friendly URLs, which look like this:
mysite.com/category/
mysite.com/category/product
mysite.com/category/sub-category/
mysite.com/category/sub-category/product
My question is: Is it safe to depend on the presence of a trailing slash to differentiate between cases 2 and 3? Can I always assume the user wants a category index when a trailing slash is present, versus a specific product's page when there is none?
I'm not worried about implementing this URI scheme; I've already done as much with PHP and mod_rewrite. I'm simply wondering if anybody knows of any objections to this kind of URL routing. Are there any known issues with browsers stripping or adding trailing slashes in the address bar, or with search engines crawling such a site? Any SEO issues or other stumbling blocks that I'm likely to run into?
In addition to the other pitfall ideas you mentioned, the user might edit the URL by hand (typing the product or category name) and add or remove the trailing "/".
To solve your problem, why not have a special sub-category "all" and instead of
"mysite.com/category/product" have "mysite.com/category/all/product"?
To me, it seems very unnatural that http://product/ and http://product would represent two entirely different resources. It is confusing, and it makes your URLs less hackable, since it is difficult to tell when a trailing slash should be present or not.
Also, in RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, there is a note on Protocol-Based Normalization in section 6.2.4, which talks about this particular situation with regard to non-human visitors of your site, such as search engines and web spiders:
Substantial effort to reduce the incidence of false negatives is often cost-effective for web spiders. Therefore, they implement even more aggressive techniques in URI comparison. For example, if they observe that a URI such as
http://example.com/data
redirects to a URI differing only in the trailing slash
http://example.com/data/
they will likely regard the two as equivalent in the future. (...)
One way to differentiate would be to make sure product pages have an extension, but category or sub-category pages do not. That is:
mysite.com/category/
mysite.com/category/product.html
mysite.com/category/sub-category/
mysite.com/category/sub-category/product.html
That makes it unambiguous.
Never assume the user will do anything BUT the worst-case scenario in anything URL-related.
Unless you're prepared to do redirects in your code, assume an equal chance of a URI ending in a slash or not. That's the only way to make sure your code is robust, and then you won't have to worry about this kind of issue.
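If, as the answers above suggest, you decide to normalize rather than overload the slash, here is a minimal Rack middleware sketch (Ruby, to match the rest of this page; the class name is my own) that treats the no-slash form as canonical:

```ruby
# Permanently redirect any path ending in "/" to the slash-less form,
# so both spellings resolve to one canonical URL.
# (Assumes you have chosen the no-slash form as canonical.)
class StripTrailingSlash
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"]
    if path != "/" && path.end_with?("/")
      location = path.chomp("/")
      query = env["QUERY_STRING"].to_s
      location += "?#{query}" unless query.empty?
      [301, { "Location" => location, "Content-Type" => "text/html" }, []]
    else
      @app.call(env)
    end
  end
end
```

In a Rails app you would enable it with config.middleware.use StripTrailingSlash in the application config.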
This question assumes that the addition of a trailing slash to a URL creates a URL that refers to a different resource. This is wrong; the semantics of URLs is that they both refer to the same resource. The presence of a trailing slash in a base URL merely changes how relative URLs are interpreted using that base URL.

SEO and URI Structure

Standard SEO caveat: It's a black box, and the algorithms are proprietary, and trying to predict and game the search engines is a crappy way to make a living.
That said, what are the baseline steps you want to take to make sure your content is visible to the major search engines (Google, Bing, etc.)?
I'm specifically curious as to what role your URI Information Architecture plays. It's common wisdom that you want keywords in your URI, and you want to avoid the query-string laden approach, but what else beyond that?
A quick example of what I'm talking about. Based on something I read on a forum, I recently exposed a /category/* hierarchy on my site. In the following weeks I noticed a sharp uptick in my page views.
I'm curious what other basic steps a site/weblog should take with its URIs to ensure a baseline visibility.
A few URI tips that have kept me ranking:
Write URIs in English but include a unique ID. SO does this well: http://stackoverflow.com/questions/1278157/seo-and-uri-structure
Stay consistent when linking to a page: domain.com/, domain.com/index and domain.com/index.php are different URIs
Use .html extensions, or purely /one/two/ directories for pages
There's probably hundreds of other tips! The structure of linking plays a very important role too...
Logically break your site down into many categories/subcategories
Link all pages back to your homepage
Don't link to hundreds of pages from your homepage
EDIT: Oh I forgot a very important one - a proper 404 response!
Hopefully that helps a bit.
Some simple things:
meaningful and accurate meta fields (especially description, keywords)
a valid heading (h1–h6) hierarchy on every page (e.g. h1 h2 h3 h2 h2 h3 h3 h4 h3 h2)
all (text) content accessible to a text browser
check spellings
keep content and display functionality separated (e.g. use HTML and CSS fully)
validate CSS and (X)HTML and use standard DOCTYPES
relevant <title> for each page
sensible site hierarchy and no orphan pages
1) Don't use the www subdomain if you do not have to. If you or your company has made the mistake of using subdomains for asset management, then you are likely forced into using www just to be safe.
2) The biggest problem faced by search engines is redundant URIs for the same page. This problem is solved by using a canonical link tag in your HTML (see the sketch after this list). This will perhaps help you more than any other single SEO factor.
3) Make your URIs meaningful. If people can remember URIs well enough to type them out your SEO will be significantly improved.
The most important factors with URIs are being easy to remember and the ability to signal uniqueness to the search engine. Nothing else matters with regard to URIs and SEO.
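As promised above, a hedged sketch of point 2's canonical link tag as a Rails view helper; the helper name and the drop-the-query-string policy are my own assumptions:

```ruby
# app/helpers/application_helper.rb
module ApplicationHelper
  # Emits <link rel="canonical" href="...">, pointing at the current
  # path with the query string dropped, so redundant URIs collapse
  # into one for the search engine.
  def canonical_link_tag
    tag(:link, :rel => "canonical",
               :href => "#{request.protocol}#{request.host_with_port}#{request.path}")
  end
end
```

Render it from the layout's <head> with <%= canonical_link_tag %>.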
