Does the position of a slug in a URL matter?

For search engine optimization purposes, does the location of the slug within a URL matter?
There's no doubt that you could code URL slugs to work properly in any order. I'm more interested to know whether search engines assign different weight to portions of the URL on the right-hand side versus the left-hand side.
For example, here the slug appears at the end of the URL:
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
Whereas here the slug appears in the middle of the URL:
https://stackoverflow.com/questions/why-do-some-websites-add-slugs-to-the-end-of-urls/47427

It's better to push whatever has less semantic content to the right, because that's what is more likely to get chopped off by length limits on what's considered relevant. So the second form you posted would be better for SEO purposes than the way SO does it. (Better yet is using the slug as a real identifier and keeping semantic-content-free IDs out of the URL entirely.)

I always go with the rule that you move from right to left when determining the most important information in your URL for the user (an actual user or Google). So the question you have to ask yourself is: do you want your user to see the ID or the title as the most important thing on the page?
Also, consider what happens if they drop the number and just leave the title: the page blows up, right? But what happens if you drop the slug and leave the number? The page functions as normal.

Unfortunately, given Google's (and most other search engines') security through obscurity (a fear that the system will be gamed if any clear methods are described or explained), there's just not going to be a clear, demonstrable answer.
On the whole, you can assume that if it seems slimy, Google will penalize it, and if it seems semantic and useful, Google will promote it. In the case of the above URLs, my guess is that Google will treat both forms the same, but that's entirely a guess, and short of a Google algorithm engineer stopping in here, I doubt you're going to find anything more definitive.

Parsing the URL is probably a lot easier if the slug is at the end. You can pull out the values you need from the beginning of the path and then just ignore everything after them (so the slug could be even more complex than what you have, with multiple "directories", etc.). If you put the slug at the beginning or in the middle, you have to be able to parse it out in order to find what's important.
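To illustrate, here is a minimal TypeScript sketch; the parseQuestionId name and the /questions/<id>/<slug> shape are assumptions modeled on SO's format, not code from the answer:

// Pull the numeric ID out of the leading path segments and ignore
// everything after it: slug, extra "directories", trailing slash.
function parseQuestionId(pathname: string): number | null {
  const segments = pathname.split("/").filter(Boolean);
  if (segments[0] !== "questions" || segments.length < 2) return null;
  const id = Number.parseInt(segments[1], 10);
  return Number.isNaN(id) ? null : id;
}

// Both resolve to 47427; the slug never needs to be parsed at all.
parseQuestionId("/questions/47427/why-do-some-websites-add-slugs");
parseQuestionId("/questions/47427/any/number/of/extra/segments");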

https://stackoverflow.com/questions/727281/blahblablah-lets-assume-that-this-continues-on-and-on-and-on
Now if you truncate that, https://stackoverflow.com/questions/727281/blahbla still works.
In the other case, https://stackoverflow.com/questions/why-do-some-websites-add-slugs-to-the-end-of-urls/47427 truncated to https://stackoverflow.com/questions/why-do-so would have no chance of working.

Related

We're dropping our beta tag. What will Google do about it?

Until now, our ASP.NET MVC site was accessible at http://beta.fleex.tv. Now that we've dropped our beta label, it is located at http://fleex.tv.
We set up an HTTP redirect from beta.fleex.tv to fleex.tv through our domain name registrar, 1&1. That redirect is pretty brutal: it doesn't look at the page requested, just the domain, and will for instance redirect http://beta.fleex.tv/page?arg=0 straight to http://fleex.tv.
I have 2 questions:
Is there a simple way to redirect http://beta.fleex.tv/page?arg=0 to http://fleex.tv/page?arg=0? Is this a good idea, or should we instead delete beta.fleex.tv altogether?
What should we do with Google?
If we keep the 'beta' pages, what will happen to them in Google's index? With the current redirects in place they all point to http://fleex.tv. My guess is that Google will start detecting duplicate content (or even redirected content) and delete everything from the index, but I'd love to understand in more detail how this will play out.
If we submit a new sitemap with all the fleex.tv nodes, will Google penalize us in any way or will it simply start indexing those pages from scratch, untouched by the beta.fleex.tv debacle?
Generally speaking, I'd love to know what you guys think the best strategy might be here. This seems like a fairly common problem. I feel there's no way to avoid losing all the indexing that Google has done; in that case, I'd just like to know how this whole operation will affect our 'reputation' with Google...
Please shoot questions if this is unclear.
When it comes to SEO this is a disaster. You need to do page-level 301 redirects.
Having a different sub-domain was a mistake from the beginning: it's causing trouble now, and links pointing to you have become inaccurate.
Redirecting is not particularly hard to do: in Application_BeginRequest, look at the Request.Url.Host and Request.RawUrl properties and redirect if necessary.
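Not the ASP.NET code the answer describes, but the same host-based, path-preserving 301 sketched as Express-style middleware in TypeScript (the framework choice here is purely for illustration):

import express from "express";

const app = express();

// Send every beta.fleex.tv request to the same path and query string on
// fleex.tv with a permanent (301) redirect, so search engines transfer
// the old URLs' value to the new ones.
app.use((req, res, next) => {
  if (req.hostname === "beta.fleex.tv") {
    res.redirect(301, "http://fleex.tv" + req.originalUrl);
    return;
  }
  next();
});

app.listen(80);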
I was going to give the same advice as usr, but to reiterate: this will absolutely murder your rankings and organic search traffic if you don't fix it, especially for long-tail queries.
You've got about a three-to-four-week grace period to get the page-level redirects implemented before you start to lose value from the old URLs.
I would also add that because you're using 302s instead of 301s (very common in ASP.NET environments), Google will take much longer to view the redirects as permanent, repeatedly checking back to see if the pages have moved, and therefore not passing value from the subdomain to the main domain. In my personal experience it will also end up passing less link value overall.
http://support.google.com/analytics/bin/answer.py?hl=en-GB&answer=2613318

Storing permalinks in the database or building them on the fly

I was going to ask this on Meta but I think it's a general enough question to warrant a place here instead.
I'm interested in knowing some of the ways you manage permalinks in your site, specifically permalinks that are built from data that can change over time.
StackOverflow is a good example of this, whereby the URL to a question is partly made up from the question title. Without posting a dud question to test, I'm unsure whether the link to the question changes if the title of the question changes. My guess is that it doesn't, and if it does, a canonical is likely retained to the original URL.
Changing the title on SO does not change the URL.
Given that is the case, is it common practice to store permalinks against posts in your database? And if so, how much of the permalink would you store?
I ask the latter because there's only one part of the URL that's variable in the context of SO, and that's the question title. So should we store only the sanitized title and build up the rest from the static information we have about the post, or should we store the whole URL including the controller name, ID, etc.?
What you usually want is some identifier uniquely identifying the data item you want to link to (in SO's case the question). How you build your URL is more a question of what you think you will be able to support for a long time and how to convey additional information to the reader.
If you look at SO URLs, you'll notice that they put the unique identifier at the beginning (the number after /questions/), which is enough to get to the question (try putting garbage in the rest of the URL; it will still redirect to your question). The title at the end is therefore just eye candy for the user and not really used in resolving the question.
I think it's relatively common to store the permalink in the database. Space is cheap, and the string parsing can be expensive each time you want to display a link (making a question title HTTP friendly a few thousand times across thousands of questions will eat some processor).
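For a sense of what that conversion involves, here is a hypothetical slugify helper in TypeScript, the kind of thing you would run once at save time and store rather than on every render:

// Make a title "HTTP friendly": lowercase, ASCII, hyphen-separated.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .normalize("NFKD")               // split accented chars into base + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")     // collapse everything else into hyphens
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

slugify("Storing permalinks: in the database, or on the fly?");
// => "storing-permalinks-in-the-database-or-on-the-fly"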
As for how much to store, personally, I would only store the HTTP friendly version of your question/post title in the DB (along with a primary key) for the following reasons.
Storing the entire URL, or even the part that concerns itself with actions and controllers, will make it really, really hard to refactor or rename those things down the road. You would need to run mass DB updates, custom URL rewrites, etc.
Only storing the friendly version of the title allows you to use it in other places. Take the URL of this question, for example: it was probably generated by @Html.ActionLink(Question.Title, "Index", new { controller = "Questions", Id = Question.Id, Slug = Question.Slug }). Keeping the slug as a separate parameter means you can reuse the Id and Slug parameters in other controller/action calls and keep your URLs pretty.

Is it worth using "pretty URLs" if you don't care about SEO/SEM

I'm designing a hosted software-as-a-service application that's like a highly specialized version of 37signals' Highrise product. In that context, where SEO is a non-issue, is it worth implementing "pretty URLs" instead of going with numeric IDs (e.g. customers/john-smith instead of customers/1234)? I notice that a lot of web applications don't bother with them unless they provide real value (e.g. e-commerce apps and blogs, which need SEO to be found via search engines).
It depends on how often URLs are transmitted verbally by your users. People tend to find something like
http://www.domain.com/?id=4535&f=234&r=s%39fu__
relatively difficult to pronounce, and like
http://www.domain.com/john-doe
much better ;)
In addition to readability, another thing to keep in mind is that by exposing an auto-incrementing numeric key you allow people to guess the URLs of other resources, and you may give away certain details about your data. For instance, if someone signs up for your app and sees that their account is at /customer/12, it may affect their confidence in your application to know that you only have 11 other customers. This wouldn't be an issue if they had a URL like /customer/some-company.
It's always worth it if you just have the time to do it right.
Friendly URLs look a lot nicer and give a better idea of where the link will lead. This is useful if the link is shared, e.g. via instant message.
If you're searching for a specific page in your browser history, a human-readable URL helps.
A friendly URL is a lot easier to remember (useful in some cases).
As said earlier, it is also a lot easier to communicate verbally (needed more often than you'd think).
It hides unnecessary technical details from the user. In one case where the user ID was visible in the URL, several users asked why their user ID was higher than the total number of users. No damage done, but why have a confused user if you can avoid it?
I sure am a lot more likely to click on a link when I mouse over it and see http://www.example.com/something-i-am-interested-in.html rather than http://www.example.com/23847ozjo8uflidsa.asp.
It's quite annoying clicking links on MSDN because I never know what to expect I will get.
When I create applications I try my best to hide their structure from prying eyes. While it's subjective how much "SEO" you get out of it, pretty URLs tend to help people navigate and understand where they are, while protecting your code from possible injections.
I notice you're using a Rails app, so you probably wouldn't have a huge query string like in ASP, PHP, or those other languages, but in my opinion the added cleanliness and overall appearance is a plus for customer interaction. When sharing links, it's nicer for customers to be able to copy the URL customer/john_doe than to have to hunt for a "link me" or a random /customer/
Marco
I typically go with a combination -- keeping the ease of using Rails RESTful routing while still providing some extended information in URLs.
My app URLs look something like this:
http://example.com/discussions/123-is-it-worth-using-pretty-urls/
http://example.com/discussions/123-is-it-worth-using-pretty-urls/comments
http://example.com/discussions/123-is-it-worth-using-pretty-urls/comments/34567
You don't have to add ANY custom routes to pull this off; you just need to add the following method to your model:
def to_param
  [id, permalink].join("-")
end
And ensure that anything calling find with params[:id] in your controller converts it to an integer with params[:id].to_i. (Ruby's String#to_i stops at the first non-digit character, so "123-is-it-worth-using-pretty-urls".to_i returns 123.)
Just a note, you'll need to set a permalink attribute when your record is saved...
If your application is RESTful, the URLs that Rails gives you are SEO-friendly by default.
In your example, customers/1234 will probably return something like
<h1>Customer</h1>
<p><strong>Name:</strong> John Smith</p>
etc etc
Any current SEO spider will be smart enough to parse the destination page and extract that "John Smith" from there anyway.
So, in that sense, customers/1234 is already a "nice" URL (as opposed to other systems, in which you would have something like resource/123123/1234 for customer 1234 and resource/23232/321 for client 321).
Now, if you want your users to be regularly using URLs (as in delicious, etc.), you might want to start using logins and readable fields instead of IDs.
But for SEO, ids are just fine.

Why would Google Search use client-side URL parameters?

Yesterday morning I noticed Google Search was using hash parameters:
http://www.google.com/#q=Client-side+URL+parameters
which seems to be the same as the more usual search (with search?q=Client-side+URL+parameters). (It seems they are no longer using it by default when doing a search using their form.)
Why would they do that?
More generally, I see hash parameters cropping up on a lot of web sites. Is it a good thing? Is it a hack? Is it a departure from REST principles? I'm wondering if I should use this technique in web applications, and when.
There's a discussion by the W3C of different use cases, but I don't see which one would apply to the example above. They also seem undecided about recommendations.
Google has many live experimental features that are turned on and off based on your preferences, location and other factors (probably random selection as well). I'm pretty sure the one you mention is one of those.
What happens in the background when a hash is used instead of a query string parameter is that the page requests the "real" URL (http://www.google.com/search?q=hello) using JavaScript, then modifies the existing page with the content. This appears much more responsive to the user, since the page does not have to reload entirely. The reason for the hash is so that browser history and state are maintained. If you go to http://www.google.com/#q=hello you'll find that you actually get the search results for "hello" (even though your browser is really only requesting http://www.google.com/). With JavaScript turned off, however, it wouldn't work, and you'd just get the Google front page.
Hashes are appearing more and more as dynamic web sites are becoming the norm. Hashes are maintained entirely on the client and therefore do not incur a server request when changed. This makes them excellent candidates for maintaining unique addresses to different states of the web application, while still being on the exact same page.
I have been using them myself more and more lately, and you can find one example here: http://blixt.org/js -- If you have a look at the "Hash" library on that page, you'll see my implementation of supporting hashes across browsers.
Here's a little guide for using hashes for storing state:
How?
Maintaining state in hashes implies that your application (I'll call it an application, since you generally only use hashes for state in more advanced web solutions) relies on JavaScript. Without JavaScript, the only function of hashes would be to tell the browser to find content somewhere on the page.
Once you have implemented some JavaScript to detect changes to the hash, the next step is to parse the hash into meaningful data (just as you would with query string parameters).
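A minimal TypeScript sketch of both steps, assuming hash states shaped like #q=hello&page=2 (the shape and names are illustrative, not from the original post):

// Parse the fragment into key/value pairs, query-string style.
function parseHash(): Record<string, string> {
  const state: Record<string, string> = {};
  for (const pair of window.location.hash.slice(1).split("&")) {
    const [key, value] = pair.split("=");
    if (key) state[key] = decodeURIComponent(value ?? "");
  }
  return state;
}

// React whenever the hash changes. (Modern browsers fire "hashchange";
// older ones needed polling, which is what cross-browser hash libraries
// like the one mentioned above did for you.)
window.addEventListener("hashchange", () => {
  const state = parseHash();
  // ...update only the parts of the page that depend on `state`...
});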
Why?
Once you've got the state in the hash, it can be modified by your code (or your user) to represent the current state in your application. There are many reasons for why you would want to do this.
One common case is when only a small part of a page changes based on a variable, and it would be inefficient to reload the entire page to reflect that change (Example: You've got a box with tabs. The active tab can be identified in the hash.)
Other cases are when you load content dynamically in JavaScript and you want to tell the client what content to load (example: http://beta.multifarce.com/#?state=7001 will take you to a specific point in the text adventure).
When?
If you have a look at my "JavaScript realm" you'll see a borderline-overkill case. I did it simply because I wanted to cram as much JavaScript dynamics into that page as possible. In a normal project I would be conservative about when to do this, and only do it when I would see positive changes in one or more of the following areas:
User interactivity
Usually the user won't see much difference, but the URLs can be confusing
Remember loading indicators! Loading content dynamically can be frustrating to the user if it takes time.
Responsiveness (time from one state to another)
Performance (bandwidth, server CPU)
No JavaScript?
Here comes a big deterrent. While you can safely rely on 99% of your users having a browser capable of using your page with hashes for state, there are still many cases where you simply can't rely on this. Search engine crawlers, for example. While Google is constantly working to make its crawler work with the latest web technologies (did you know that they index Flash applications?), it still isn't a person and can't make sense of some things.
Basically, you're at a crossroads between compatibility and user experience.
But you can always build a road in between, which of course requires more work. In less metaphorical terms: implement both solutions, so that there is a server-side URL for every client-side URL that outputs relevant content. For compatible clients, it would redirect them to the hash URL. This way, Google can index the "hard" URLs, and when users click them, they get the dynamic state stuff!
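Here is a small TypeScript sketch of the client-side half of that road, assuming a server-rendered "hard" URL like /search?q=... (the URL shape is hypothetical): script-capable browsers swap themselves over to the hash form, while crawlers and no-JS clients keep the fully rendered page.

// Runs on the server-rendered page. If we arrived via a "hard" URL,
// move to the equivalent hash URL so further navigation can be dynamic.
const q = new URLSearchParams(window.location.search).get("q");
if (q !== null && window.location.hash === "") {
  // replace() swaps the URL without adding a browser history entry.
  window.location.replace("/#q=" + encodeURIComponent(q));
}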
Recently Google also stopped serving direct links in search results, offering redirects instead.
I believe both have to do with gathering usage statistics: what searches were performed by the same user, in what sequence, which of the search results the user followed, etc.
P.S. Now, that's interesting: direct links are back. I absolutely remember seeing only redirects there in the last couple of weeks. They are definitely experimenting with something.

Is it OK to include an ID inside the URL?

Well, my question is simple.
Does the ID affect the position of a webpage on Google?
I have links like this
http://example.com/news/title-slug/15/
and people say to me that I should remove the ID from the URL.
I believe that is not true. By my logic, you can't depend on the title's slug alone. I know it would work perfectly fine if no two pages ever had the same title, but why should I remove the ID if there is no harm in it being there?
Yes, leave it there.
Google has no business trying to second-guess what each element of a URL represents and changing its index based on that.
URLs by their nature can map to any resource, and I'm pretty sure Google recognises that. All you should do is ensure that multiple URLs don't serve the same content, by using redirects. So, for example, http://example.com/news/wrong-title-slug/15/ should redirect to http://example.com/news/title-slug/15/ rather than just echo back the same page. Google doesn't really like duplicate content.
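A sketch of that canonical redirect with Express-style routing in TypeScript, matching the question's /news/<slug>/<id>/ shape; findArticleById is a hypothetical stand-in for your data layer:

import express from "express";

const app = express();

// Hypothetical lookup by primary key; only the ID identifies the page.
declare function findArticleById(
  id: number
): Promise<{ id: number; slug: string } | null>;

app.get("/news/:slug/:id", async (req, res) => {
  const article = await findArticleById(Number(req.params.id));
  if (!article) {
    res.status(404).send("Not found");
    return;
  }
  // A stale or wrong slug gets a 301 to the canonical URL instead of
  // echoing back the same page, so Google never sees duplicate content.
  if (req.params.slug !== article.slug) {
    res.redirect(301, `/news/${article.slug}/${article.id}/`);
    return;
  }
  res.render("article", { article });
});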
It's fine.
But I wouldn't put it after the title slug, though; some URLs might get more confusing than others.
http://example.com/entry/how-to-solve-question-45/15
A better one would be:
http://example.com/entry/15/how-to-solve-question-45
Besides, you can't really rely on just the title slug, because changing the title of an entry means breaking users' bookmarks. Not to mention that it is faster to retrieve an entry from the database by an integer ID than by a URL slug.
The problem here is not whether Google will accept it, but whether or not doing so is user-friendly.
A common reason for keeping the ID in a URL is to ensure that the URL is unique. For example, if two people on here were to create a question named "Jon Skeet Facts" we'd have a problem, whereas with the ID the users are aware that they are two different questions with the same title. This is the same as with relational databases where a unique identifier is required.
In essence, why care what Google thinks? The whole Search Engine Optimisation industry is a farce, and this is coming from someone who has been paid more than once as an SEO consultant. Why chase what Google wants when you can meet Google's intentions by making your website perfect for the user? If you make a good website, Google will reward you. The ID has a reason to be there, so keep it in.
I think you're fine leaving it in. It seems to make sense, as you get one element for identification and one for description. It's done on here, after all.
Zeus won't strike you down for it. I prefer not to have meaningless numbers in there because they're not very attractive or semantic.
Having the ID will NOT hurt your SEO rankings, and having the slug there ensures that the page's main keywords will be indexed, so it's all good.
