How does rendering different formats/layouts in Rails affect SEO? - ruby-on-rails

I have been reading a lot about cloaking and redirects, and am wondering how this fits into rendering and layouts in Rails...
Two parts:
1) If I have different data formats to render in (json, xml, html, and iphone), and they all use the same url, differing by the ".format" at the end, is this considered "content duplication"? It seems like you could make the search engines unhappy with this. Is there a workaround/best-practice here?
2) If I render just the model template for rails, projects/index.html in one case, and render both the model template and the layout template in another, projects/index.html and layouts/application.html, and they are at different urls, is that considered content duplication? What's best practice in this case?
layout :main # or
layout :projects # or
layout :some_condition
I have read a little about canonicalization but I'm not quite sure how that fits into these cases.
What do you normally do in this situation to prevent being banned by the search engines?
Thanks for the tips.

1) No, this is not content duplication, because you are serving the content in different formats. It would be content duplication if you served the same content in the same format at more than one URL.
2) Yes, it could be. But you would need to provide more details to get a more specific answer.
There are multiple solutions you can adopt:
Use a canonical link tag (here's a how-to with Rails)
Disallow the duplicate content in your robots.txt, or use the noindex header/tag
Don't duplicate your content
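As a rough illustration of the canonical-link option, the sketch below builds a canonical URL by dropping an HTML-variant extension and the query string, then wraps it in a link tag. It is plain Ruby using only the standard library. `canonical_url`, `canonical_link_tag`, and the `HTML_VARIANTS` list are hypothetical names chosen for this example, not Rails APIs.

```ruby
require "uri"
require "cgi"

# Formats assumed (for this example only) to be alternate HTML renderings of one page.
HTML_VARIANTS = %w[html iphone].freeze

# Hypothetical helper: drop an HTML-variant extension, the query string and the fragment.
def canonical_url(url)
  uri = URI.parse(url)
  uri.path = uri.path.sub(/\.(?:#{HTML_VARIANTS.join("|")})\z/, "")
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

# Hypothetical helper: render the tag that goes in the page <head>.
def canonical_link_tag(url)
  %(<link rel="canonical" href="#{CGI.escapeHTML(canonical_url(url))}" />)
end

puts canonical_link_tag("http://example.com/projects.iphone?page=2")
```

Data formats such as json or xml are left alone here, since they are not the same content in the same format.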

Related

URL design: is it bad practice to use consecutive numbering in URLs for user-submitted items?

I'm working on a website where users can submit items (in this case, proposals). The simplest URL design would be something like website.com/proposal/1, website.com/proposal/2, etc. (perhaps with a slug appended) but I've never seen this done in practice.
Is this URL design really as rare as I think it is and if so, why?
This URL design is not uncommon.
It is used, for example, by
stackoverflow.com (for each post):
https://stackoverflow.com/questions/1/
https://stackoverflow.com/questions/2/
https://stackoverflow.com/questions/3/
Drupal.org (for each user):
https://www.drupal.org/user/1
https://www.drupal.org/user/2
https://www.drupal.org/user/3
But there are various cases when not to use this design, for example
when all/some URLs should not be easy to guess (example: YouTube, for unlisted but sharable videos)
when URLs should not contain opaque/unneeded parts (example: Wikipedia)
when it should be private which URLs/pages were created before/after
when it should be private how many URLs/pages of this kind exist
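Where URLs should not be guessable or reveal how many items exist, one common alternative is a random token in place of the sequential number. A minimal sketch, assuming a hypothetical `proposal_path` helper and a 16-byte token (both are illustrative choices):

```ruby
require "securerandom"

# Hypothetical helper: a random, URL-safe identifier to use instead of a sequential ID.
def unguessable_token(bytes = 16)
  SecureRandom.urlsafe_base64(bytes)
end

# Hypothetical helper: build the public URL path for a proposal.
def proposal_path(token)
  "/proposal/#{token}"
end

puts proposal_path(unguessable_token)
```

The token would need to be stored alongside the record so the page can be looked up by it later.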

Route for printer-formatted ERB view

Working with rails 3.1 here.
I'm wondering if there are any best practices for defining routes for views that are meant to be sent to the printer. For instance, I have a report at "/daily" which has a print function that opens up a new, nicely formatted printer view.
What URL should this view sit on? A couple of ideas:
/daily/print
/daily?media=print
What have other people used?
Either is fine.
Probably the main consideration is whether your app is public facing and accessible to search engines. Typically you want to prevent them from indexing (duplicate) content, which a printable version would be, and typically it's easier to exclude search engines (using the robots.txt file) from printable content if it's part of the path, as opposed to the query-string.
Otherwise, it's easier to just tack on the query string parameter and use that to set the printable version of your stylesheet and/or views. This approach saves having to create a new route, which may be more flexible.
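To make the robots.txt point concrete, here is a small sketch that generates exclusion rules for path-based printable views. `robots_txt` is a hypothetical helper written for this example; in practice the file is often just written by hand.

```ruby
# Hypothetical helper: build robots.txt rules that hide printable pages from crawlers.
def robots_txt(disallowed_paths)
  lines = ["User-agent: *"]
  disallowed_paths.each { |path| lines << "Disallow: #{path}" }
  lines.join("\n") + "\n"
end

puts robots_txt(["/daily/print"])
```

Disallow rules match from the start of the URL path, so a path such as /daily/print needs one plain line, whereas matching a query string such as ?media=print typically needs a wildcard pattern.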

ColdFusion - What's the best URL naming convention to use?

I am using ColdFusion 9.
I am creating a brand new site that uses three templates. The first template is the home page, where users are prompted to select a brand or a specific model. The second template is where the user can view all of the models of the selected brand. The third template shows all of the specific information on a specific model.
A long time ago... I would make the URLs like this:
.com/Index.cfm // home page
.com/Brands.cfm?BrandID=123 // specific brand page
.com/Models.cfm?ModelID=123 // specific model page
Now, for SEO purposes and for easy reading, I might want my URLs to look like this:
.com/? // home page
.com/?Brand=Worthington
.com/?Brand=Worthington&Model=TX193A
Or, I might want my URLs to look like this:
.com/? // home
.com/?Worthington // specific brand
.com/?Worthington/TX193A // specific model
My question is, are there really any SEO benefits or easy reading or security benefits to either naming convention?
Is there a best URL naming convention to use?
Is there a real benefit to having a URL like this?
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
Use URLs that make sense for your users. If you use sensible URLs which humans understand, it'll work with search engines too.
i.e. Don't do SEO, do HO. Human Optimisation. Optimise your pages for the users of your page and in doing so you'll make Google (and others) happy.
Do NOT stuff keywords into URLs unless it helps the people your site is for.
To decide what your URL should look like, you need to understand what the parts of a URL are for.
So, given this URL: http://domain.com/whatever/you/like/here?q=search_terms#page-fragment
It breaks down like this:
http
what protocol is used to deliver the page
:
divides protocol from rest of url
//domain.com
indicates what server to load
/whatever/you/like/here
Between the domain and the ? should indicate which page to load.
?
divides query string from rest of url
q=search_terms
Between the ? and the # can be used for a dynamic search query or setting.
#
divides page fragment from rest of the url
page-fragment
Between the # and the end of line indicates which part of the page to focus on.
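The same breakdown can be checked with Ruby's standard `URI` library. This is only an illustration of the parts listed above, not something the original answer included:

```ruby
require "uri"

uri = URI.parse("http://domain.com/whatever/you/like/here?q=search_terms#page-fragment")

puts uri.scheme    # protocol: "http"
puts uri.host      # server: "domain.com"
puts uri.path      # which page to load: "/whatever/you/like/here"
puts uri.query     # dynamic query or setting: "q=search_terms"
puts uri.fragment  # part of the page to focus on: "page-fragment"
```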
If your system setup lets you, a system like this is probably the most human friendly:
domain.com
domain.com/Worthington
domain.com/Worthington/TX193A
However, sometimes a unique ID is needed to ensure there is no ambiguity. With SO, there might be multiple questions with the same title, which is why the ID is included; the question title is included as well because it's easier for humans that way.
Since all models must belong to a brand, you don't need both ID numbers though, so you can use something like this:
domain.com
domain.com/123/Worthington
domain.com/456/Worthington/TX193A
(where 123 is the brand number, and 456 is the model number)
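A sketch of how such ID-plus-name URLs might be built and read back, in plain Ruby. `model_path` and `id_from_path` are hypothetical helpers for this example; the point is that only the leading number identifies the page, while the names are there for human readers.

```ruby
# Hypothetical helper: build the URL path for a model page.
def model_path(model_id, brand_name, model_name)
  "/#{model_id}/#{brand_name}/#{model_name}"
end

# Hypothetical helper: extract the numeric ID from the first path segment,
# or return nil when the path does not start with a number.
def id_from_path(path)
  match = path.match(%r{\A/(\d+)(?:/|\z)})
  match && Integer(match[1], 10)
end

puts model_path(456, "Worthington", "TX193A")
puts id_from_path("/456/Worthington/TX193A")
```

Because the lookup uses only the ID, a renamed brand or model would still resolve; redirecting to the current name is a separate decision.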
You only need extra things (like /questions/ or /index.cfm or /brand.cfm or whatever) if you are unable to disambiguate different pages without them.
Remember: this part of the URL identifies the page - it needs to be possible to identify a single page with a single URL - to put it another way, every page should have a unique URL, and every unique URL should be a different page. (Excluding the query string and page fragment parts.)
Again, using the SO example: there are more than just questions here, there are users and tags and so on too. So they couldn't just do stackoverflow.com/7275745/question-title, because it's not clearly distinct from stackoverflow.com/651924/evik-james. They solve this by inserting /questions and /users into each of those to make it obvious what each one is.
Ultimately, the best URL system to use depends on what pages your site has and who the people using your site are - you need to consider these and come up with a suitable solution. Simpler URLs are better, but too much simplicity may cause confusion.
Hopefully this all makes sense?
Here is an answer based on what I know about SEO and what we have implemented:
The first thing that gets searched and considered is your domain name, so picking a domain name related to your content is very important.
A URL with a query string has lower priority than one without. The reason is that a query string is associated with dynamic content that could change over time. The search engine might also deprioritize URLs with query strings for fear that they are being used for spam and would dilute the search results.
As for using the URL such as
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
As the search engine looks at both the domain and the path, having the question in the path will help the search engine and elevate the question as a more relevant page when someone types part of the question into the search engine.
I am not an SEO expert, but the company I work for has a department dedicated to managing the SEO of our site. They much prefer the params to be in the URI, rather than in the query string, and I'm sure they prefer this for a reason (not simply to make the web team's job slightly trickier... although there could be an element of that ;-)
That said, the bulk of what they concern themselves with is the content within and composition of the page. The domain name and URL are insignificant compared to having good, relevant content in a well defined structure.

How good is the Rails sanitize() method?

Can I use ActionView::Helpers::SanitizeHelper#sanitize on user-entered text that I plan on showing to other users? E.g., will it properly handle all cases described on this site?
Also, the documentation mentions:
Please note that sanitizing
user-provided text does not guarantee
that the resulting markup is valid
(conforming to a document type) or
even well-formed. The output may still
contain e.g. unescaped ’<’, ’>’, ’&’
characters and confuse browsers.
What's the best way to handle this? Pass the sanitized text through Hpricot before displaying?
Ryan Grove's Sanitize goes a lot farther than Rails 3 sanitize. It ensures the output HTML is well-formed and has three built-in whitelists:
Sanitize::Config::RESTRICTED
Allows only very simple inline formatting markup. No links, images, or block elements.
Sanitize::Config::BASIC
Allows a variety of markup including formatting tags, links, and lists. Images and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto protocols, and a rel="nofollow" attribute is added to all links to mitigate SEO spam.
Sanitize::Config::RELAXED
Allows an even wider variety of markup than BASIC, including images and tables. Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images are limited to HTTP and HTTPS. In this mode, rel="nofollow" is not added to links.
Sanitize is certainly better than the "h" helper. Instead of escaping everything, it actually allows the html tags that you specify. And yes, it does prevent cross-site scripting because it removes javascript from the mix entirely.
In short, both will get the job done. Use "h" when you don't expect anything other than plaintext, and use sanitize when you want to allow some HTML, or you believe people may try to enter it. Even if you disallow all tags with sanitize, it'll "pretty up" the output by removing them instead of escaping them as "h" does.
As for incomplete tags: You could run a validation on the model that passes html-containing fields through hpricot, but I think this is overkill in most applications.
The best course of action depends on two things:
Your rails version (2.x or 3.x)
Whether your users are supposed to enter any html at all on the input or not.
As a general rule, I don't allow my users to input html - instead I let them input textile.
On rails 3.x:
Output is HTML-escaped by default in views. You don't have to do anything, unless you want your users to be able to send some html. In that case, keep reading.
This railscast deals with XSS attacks on rails 3.
On rails 2.x:
If you don't allow any html from your users, just protect your output with the h method, like this:
<%= h post.text %>
If you want your users to send some html: you can use rails' sanitize method or HTML::StathamSanitizer
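To see what escaping with h does to markup, the snippet below uses Ruby's standard library, where `ERB::Util.html_escape` is aliased as `h`. It runs outside Rails and only demonstrates escaping; it does not show whitelist-based sanitizing. The sample input is made up for this example.

```ruby
require "erb"

# Sample user input (made up for this example).
user_input = "<script>alert(1)</script> & friends"

# Escaping turns markup characters into entities, so the browser shows them as text.
escaped = ERB::Util.html_escape(user_input)

puts escaped
# &lt;script&gt;alert(1)&lt;/script&gt; &amp; friends
```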

SEO urls in rails: .html ending vs. none. What's best?

I'm thinking about a good SEO Url strategy for a blog application. I'm not sure - and maybe it's just the same - but what is better? With or without .html
/blog/entry_permalink_name.html
VS
/blog/entry_permalink_name
What do you think?
To answer your question directly: without the .html is better SEO-wise. The search engines take keywords from the URL into account, and the more words or characters there are in the URL, the weaker the power of any given keyword.
It follows logically that there is no SEO advantage in adding '.html' at the end of the url.
Similarly removing the blog bit would enhance the power of the keywords in the title but if you want to use 'blog' as a valuable keyword, leave it.
Keep in mind that the url is just one of many factors of optimization of a page for SEO, and not the most powerful at that. The common thinking here is that none of these optimization tricks make a substantial difference by themselves but they do cumulatively.
I would suggest removing /blog/ from the url and making it as follows:
/entry-permalink-name
The word 'blog' introduces an extra, irrelevant term to your URL.
The .html would most likely be ignored by search engines, but its absence makes the URL a bit more user-friendly, as do dashes instead of underscores.
I disagree about removing 'blog' from the URL. I don't think 'blog' is an irrelevant term, since you are writing a 'blog' application and Google has a 'blog' search section.
As for your question, look in your address bar when you view this question. Stack Overflow seems like a good site to emulate.
I do agree with xelurg about the dashes instead of underscores.
I would keep the unique id in the name just like stackoverflow. It's a lot simpler that way.
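Putting the suggestions in this thread together (no .html, dashes rather than underscores, unique ID kept in the URL), a permalink could be generated along these lines. `slugify` and `blog_entry_path` are hypothetical names for this sketch, and the /blog/ prefix is optional, as debated above.

```ruby
# Hypothetical helper: lowercase the title and join its words with dashes.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Hypothetical helper: ID first (used for lookup), slug second (for readers).
def blog_entry_path(id, title)
  "/blog/#{id}/#{slugify(title)}"
end

puts blog_entry_path(42, "SEO urls in rails: .html ending vs. none")
# /blog/42/seo-urls-in-rails-html-ending-vs-none
```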

Resources