Lack of invariance in stackoverflow URL. Why? [duplicate]


This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Why do some websites add “Slugs” to the end of URLs?
This is not a question about Stack Overflow itself; it's a question about a design decision that Stack Overflow implements, which I use as an example.
A question on Stack Overflow is identified by a URL like the following (taken from the suggestions):
https://stackoverflow.com/questions/363292/why-is-visual-c-lacking-refactor
Similarly, my user URL is:
https://stackoverflow.com/users/78374/stefano-borini
In fact, only the numeric index is actually used:
https://stackoverflow.com/users/78374/
The remaining part can be anything. What is the reason behind this design decision, particularly considering that "cool URIs do not change"?
Edit: I am voting to close after seeing this question, which substantially puts the same issue forward. My question is a duplicate.

Part of the reason is so you can change your user name or the title of the post (correcting spellings etc.) but leave the URL valid.
It makes SEO sense to have the title in the URL - it makes it a lot more likely that the site will get indexed correctly.

It allows the URL to contain some interesting information for humans and search engines, but still works even if the title changes.
You could store the original "slug" in the database and verify against that as well as the id, but the only thing it prevents is games like this:
Lack of invariance in stackoverflow URL. Why?
:)

Search engines like text in URLs.
Pages are given higher rank when the search terms actually appear in the URL rather than just the page. It's robot sugar, basically.

This allows you to see some text in the URL which means something to you. If I look through a history of links to a bunch of questions, the number alone would be meaningless. However, having the text there gives me some context.

This is SEO (search engine optimization) in action. It helps with the ranking of pages in search results (on Google, Yahoo, Bing, ...), because search engines give higher rankings to pages whose URLs contain keywords the user is searching for.

Theoretically, it's for SEO reasons. The software ignores the part following the identifier (the "slug"), but the idea is that search engine crawlers consider the description part of the link text, and thus weigh the resulting page higher in search results. Whether this actually happens in any meaningful way, I don't know for sure.
A more practical use is that you can gain a better idea of where a link's pointing just by inspecting the slug, which is handy if you've got multiple question URLs.

In addition to the benefit for Search Engines (the text in the URL is powerful), the fact is for all practical purposes this URL does not "change". A change can be defined as something which causes a link at some point in the future not to work - this would never be the case with this URL. The varying text at the end does not affect any user's ability to access the underlying resource.

"cool URIs do not change"
Cool URIs can change, as long as the old ones are still fine.
Maybe someone will decide that “Lack of invariance in stackoverflow URL. Why ?” is a bad question title, and change it to “Why is there redundant information in SO URLs?”. It would be good if the slug can update to reflect the new title, especially if the reason we wanted to change it was an embarrassing typo. But the old URI must continue to work.
One drawback to non-canonical URIs is that search engines can get confused and think they're two different pages. Or they'll spot that the two pages are the same, but decide that the 'best' page to link to is the one with the title you don't want. This is especially bad if lots of people link to a completely different title, like:
http://stackoverflow.com/questions/1534161/stackoverflow-smells-of-poo
Cue more embarrassment. The best way around this (though few sites bother, and SO doesn't) is to check the slug when the URL is requested and, if it is out of date, do a 301 permanent redirect to the URI with the up-to-date slug. Search engines will then pick up the new URI and not any malicious one with poo in it.
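The redirect idea above could be sketched roughly as follows. This is only an illustration, not how Stack Overflow actually works: the `TITLES` store, the `slugify` rule and the `resolve` function are all assumptions made for the example, and the ID is simply the one from the example URL above.

```python
import re

# Hypothetical in-memory store: question ID -> current title.
TITLES = {1534161: "Lack of invariance in stackoverflow URL. Why?"}


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def resolve(path):
    """Return (status, location) for a /questions/<id>[/<slug>] path."""
    match = re.fullmatch(r"/questions/(\d+)(?:/([^/]*))?/?", path)
    if match is None or int(match.group(1)) not in TITLES:
        return 404, None
    question_id = int(match.group(1))
    canonical = f"/questions/{question_id}/{slugify(TITLES[question_id])}"
    if path != canonical:
        # Stale, missing or malicious slug: send a permanent redirect.
        return 301, canonical
    return 200, canonical


print(resolve("/questions/1534161/stackoverflow-smells-of-poo"))
```

Only the numeric ID is trusted; whatever slug arrives is compared with the one derived from the current title, so old links keep working while search engines are steered to a single URI.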

Related

URL keyword vs URL readibility

This question is about SEO in URL naming. I just want to know: does SEO really weigh that much more than user experience? Where would you draw the limit before SEO starts ruining people's experience? For example, I have a page that contains information about art contests that are running or have run on my website.
Which URL is better?
example.com/contest/{contest-id}/{name-of-contest}
or
example.com/online-graphic-design-contest/{contest-id}/{name-of-contest}
Is stuffing the URL with keywords such as 'online', 'graphic', 'design' and 'contest' really so much more important for SEO than having a shorter, more readable URL like the first one?

The best way to think about SEO these days is through the perspective of the user, firstly, and then through the search engine perspective. I would argue that your second URL is much better for both cases. It's more descriptive to the user (we have an "online graphic design contest") and also to search engines.
Google has made it apparent that their focus is on providing content that is relevant to the user, and the best way to be relevant is with content that is descriptive and fits with what your users are searching for. I don't think you're keyword stuffing if you're using a single natural language phrase in the URL to describe the content of the page. That portion of the URL should also match your page title, and header tags on the page, etc., etc.
Here are some useful resources:
http://static.googleusercontent.com/media/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf
http://linchpinseo.com/user-focused-seo-redefining-what-search-engine-optimization-is

Best way to format pretty URLs for numeric IDs

Alright, so let's say I'm writing a forum application, and I want pretty URLs. However, all my tables use numeric IDs, so I'm not sure the best way to format the URLs for those resources. Let's pretend I'm trying to get a topic with ID 123456 and title This is a forum post. I've seen it done a couple ways:
www.example.com/topic/123456
www.example.com/topic/this-is-a-forum-post
www.example.com/topic/123456/this-is-a-forum-post
Which one would you say is, taking all things into consideration (including SEO), the optimal URL?
Sorry if this question is too vague, but it seems programming-related and it's not incredibly open-ended, as I just want to hear the pros and cons of each method.

I would go with option 3, and make the slug (the last bit) optional
Because?
The ID will always be unique... 2 people may make a thread with the name 'good news' for example
The search bots can access the slug for some SEO goodness
The slug should be optional ... Using just the ID should still give you access to the site. Perhaps if the slug isn't there you could forward to the slug'd version, if you're concerned about duplicate content. You could always use the canonical meta tag to tell Google to index the slugged version.
Another benefit of the optional slug is if someone copies and pastes the URL into a document, there is a chance it could have characters at the end chopped off (because URLs generally don't have spaces, so they don't break to new lines). Having the slug optional means there is more of a chance people will find your page.
I believe this is what Stack Overflow does, and notice that they are doing rather well in the search engines.
Update
From the comments, be sure to 301 redirect any missing slug version to the correct slug.

URL 1 is definitely suboptimal. URL 2 is attractive but you run the risk of confusion if tags collide, especially if they differ only in punctuation. So I'd say URL 3 is the clear winner.
Also note that just because you display URL 3 is no reason not to accept all 3, with the other two redirecting. If URL 2 is ambiguous, it should redirect to a disambiguation page.

I would think that the 2nd URL would be the best for SEO since it is meaningful and has less depth. It's nicer for people as well since you can look at the URL and know what the content is about.

1. Doesn't include the title, so you'll lose the additional SEO value of having those keywords in the URL.
2. Won't work well, because it doesn't have a unique numerical ID, so what are you going to do if someone else tries to post a topic titled "This is a forum post"? Then you start getting into the weird thing digg does, where it has to give the second one the url "http://www.example.com/topic/this-is-a-forum-post_2", and so on. It makes it harder to take the URL they tried to load, and figure out exactly which topic they were trying to get to.
3. Has the best of both worlds, this would be my style of choice.

Stack Overflow seems to be using pattern 3, with the title being ignored completely (just the ID is used).
That makes for a nice semantic URL, is easy to implement, and still works if the title changes later.
Of course, the title could be completely fake:
Best way to format pretty URLs for numeric IDs

I'll go for the first one. It really doesn't matter now, since converters for long URLs exist and will only proliferate and become the norm in the future. Remember, the longer your URL, the fewer SEO points you'll get.
And you can't control the way people name their forum topics. So really, I'd just choose the first one for simplicity and because it is the norm.

For SEO/traffic, definitely no.2 without a doubt. Get those meaningless numbers out of the URL every single time.
www.example.com/topic/this-is-a-forum-post
Look up "this-is-a-forum-post" in your database and map it back to the ID number via a query. Then do an internal URL rewrite to the real page, something like /topic.php?ID=324342
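A minimal sketch of that slug-to-ID lookup, using an in-memory SQLite table. The `topics` table, its columns and the `rewrite` helper are made up for illustration; a real forum would have its own schema. Note the UNIQUE constraint on the slug, which is exactly where this option runs into trouble when two topics share a title.

```python
import sqlite3

# Hypothetical schema: one row per topic, with a unique slug column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topics (id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL)"
)
conn.execute(
    "INSERT INTO topics (id, slug) VALUES (?, ?)",
    (324342, "this-is-a-forum-post"),
)


def rewrite(path):
    """Map /topic/<slug> to the internal /topic.php?ID=<id>, or None if unknown."""
    prefix = "/topic/"
    if not path.startswith(prefix):
        return None
    slug = path[len(prefix):].strip("/")
    row = conn.execute(
        "SELECT id FROM topics WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None
    return f"/topic.php?ID={row[0]}"


print(rewrite("/topic/this-is-a-forum-post"))
```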

I would go with option 2, as search engines can understand it better.
Stack Overflow uses the third way, and that is probably the reason Stack Overflow URLs were not optimized for SEO. I am not sure about the answer above.
But in my experience with Google, quite often I could see a solution from other forums, whereas Stack Overflow solutions were almost invisible.
Best way to format pretty URLs for numeric IDs
Best way to format pretty URLs for numeric IDs
If both URLs are one and the same, the search engine simply goes with option 2, which is less optimized.

I'm not convinced longer URLs are SEO trouble. The depth seems to be a bigger issue, measured not by counting slashes but by the steps it takes to get from an indexed page with rank to the content page. I recently created a dummy test page titled /content/roofing/how-much-does-a-shingle-roof-cost.html and threw it on the server just to test pathways and make sure my directories were working correctly. I'm not even sure how Google discovered the page, but it did and it started getting traffic, so I had to give it content and make it part of the family. The dummy content was a copy of our about page so it wasn't empty, but I was surprised an unpromoted page would get traction, and I think the URL had something to do with that.
Which brings up a slight alternative to the above 3 choices for a URL. What if you went with number 3 but added .html to the end? I generally do this with dynamic URLs, but I have no concrete evidence that it's helpful. Google brags that it can index dynamic URLs just fine, so there's no need to do URL rewrites at all. Google doesn't mind a bit if the other engines aren't as good at that. Several sites I trust add the .html at the end (Blogger, for example) and it can't hurt, so I still do it.

I would suggest the first one, since the topic title can be changed for clarity by the admins, and then the URL would be inconsistent.
www.example.com/topic/123456
It also allows one to just edit the last bit of the URL (the number) and jump to another topic. That's not likely to happen, but it's still a usable feature.

Can an "SEO Friendly" url contain a unique ID?

I'd like to start using "SEO Friendly Urls" but the notion of generating and looking up large, unique text "ids" seems to be a significant performance challenge relative to simply looking up by an integer. Now, I know this isn't as "human friendly", but if I switched from
http://mysite.com/products/details?id=1000
to
http://mysite.com/products/spacelysprokets/sproket/id
I could still use the ID alone to quickly look up the details, but the URL itself contains keywords that will display in that detail. Is that friendly enough for Google? I hope so, as it seems a much easier process than generating something at the end that is both unique and meaningful.
Thanks!
James

Be careful with allowing a page to render using the same method as Stack Overflow.
http://stackoverflow.com/questions/820493/random-text-can-cause-problems
Black hats can use this to cause a duplicate content penalty for long tail competitors (trust me).
Here are two things you can do to protect yourself from this.
HTTP 301 redirect any inbound display url that matches your ID but doesn't match the text to the correct text.
Example:
http://stackoverflow.com/questions/820493/random-text-can-cause-problems
301 ->
http://stackoverflow.com/questions/820493/can-an-seo-friendly-url-contain-a-unique-id
Use canonical URLs.
<link rel="canonical"
href="http://stackoverflow.com/questions/820493/can-an-seo-friendly-url-contain-a-unique-id"
/>
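If the pages are generated by code, the canonical tag can be built from the one URL you consider authoritative. A small sketch, assuming a hypothetical `canonical_link` helper (the name is mine, not from any framework); it only escapes the URL for use inside an HTML attribute:

```python
from html import escape


def canonical_link(url):
    """Build a <link rel="canonical"> tag, escaping the URL for an HTML attribute."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link(
    "http://stackoverflow.com/questions/820493/"
    "can-an-seo-friendly-url-contain-a-unique-id"
))
```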

https://stackoverflow.com/questions/820493/can-an-seo-friendly-url-contain-a-unique-id
I'd say you're fine.

Have a look at the URLs that StackOverflow uses. They have a unique id, then they have the SEO-friendly stuff. You can omit the SEO-friendly stuff and the URL still works.

You are making a devil's bargain here: you are trading away business goals for technology goals.
If you were to ask "From a purely business and SEO perspective, is it better to include unique IDs in the URL or not?", the answer would clearly be to not use them.
The question then becomes, if you do use them, how much does it hurt you in the search engines? The answer is that it definitely has some negative impact. How much is yet to be determined.
In terms of "user friendly", no, they are definitely not user friendly.
In terms of Google, they state "Whenever possible, shorten URLs by trimming unnecessary parameters." See their URL structure document.

I'm not aware of any problems caused by adding an ID to a URL. In fact it can be extremely useful, as it allows the human/search engine friendly part of the URL to be changed without causing a broken link to a page that a search engine has already indexed. Using SO as an example, here's a link to your question:
https://stackoverflow.com/questions/820493/you-can-put-any-text-you-want-here

Nothing wrong with that. An increasing number of services have started to use a hybrid solution as Paul Tomblin already pointed out. In addition to SO, Tumblr uses this pattern too (maybe it was the first).
Furthermore, in certain services—like Google News—the URL must contain a unique numeric ID.

Getting rid of the parameterized URL will definitely help. From my experience, including the ID does not hurt or help, as long as there are no '?key=value' pairs in the url.

I have two seemingly contradictory points to make here:-
Nobody looks at URLs! Experience has "trained" browser users to treat the "Address" box contents as invisible. They know the contents will be any two of 'unreadable', 'meaningless' and 'confusing', hence they just ignore it completely.
Using a string which can be easily converted to an integer may offer a slight performance advantage over using a longer string which is slightly harder (hash() vs. to_int()) to convert into an integer. However, in the context of the average web application any performance difference would be negligible.
My advice would be to stick with what you're comfortable with.

Use something like mod_rewrite to rewrite URLs before they reach your application. So you could convert a slug URL like http://oorl.com/99942/My-Friendly-Text-For-Search-Engines/ into http://oorl.com/lookup.php?id=99942. This will also let you change the slug and keywords used to optimize certain links without damaging functionality.
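The same rewrite can be sketched in Python to show what such a rule does. This is not mod_rewrite configuration; the `SLUG_URL` pattern and the `to_internal` function are assumptions for illustration, reusing the paths from the example above:

```python
import re

# Leading numeric ID, then an optional slug segment that is simply ignored.
SLUG_URL = re.compile(r"^/(\d+)(?:/[^/]*)?/?$")


def to_internal(path):
    """Rewrite /<id>/<any-slug>/ to /lookup.php?id=<id>; return None otherwise."""
    match = SLUG_URL.match(path)
    if match is None:
        return None
    return f"/lookup.php?id={match.group(1)}"


print(to_internal("/99942/My-Friendly-Text-For-Search-Engines/"))
```

Because the slug never takes part in the lookup, it can be edited freely without breaking existing links.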

Duplicate references have a bigger negative impact than the benefit of a friendly URL, so be careful about accepting fake text alongside the ID; your competitors could misuse this.

Yes, and in fact it's more SEO friendly to include a number in your url as it implies to google that you are consistently updating your content.
I am fairly sure that it makes it much more difficult to get indexed in Google News if you don't have an incrementing number attached in some way to your URLs.

Why should I use "Web 2.0"-style URLs? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
In short, why use something like http://stackoverflow.com/badges/6/supporter instead of something "simpler" (subjectively, at that) like http://stackoverflow.com/badges/6/?
Even on my own site I've just been using /post/6/ to reference posts (by ID, even though I still store a slug) instead of /post/6/small-rant-on-urls. In some cases the URLs can get even more absurd, much more so than is really necessary.

Search Engine Optimisation would be one, as well as making the URL more readable to humans. Search engines generally like your URL, Title and H2 to contain the "topic" of the page.
If you have both in there then you can manually type /ID and get automagically taken to the "flowery" URL with rewriting.. saves your fingers a bit :)

Because you can potentially end up with duplicates if you're not careful. I imagine stack overflow added the ID because there was a high potential for duplicates given the volume of posts created.
Other systems may choose not to use the ID in the URL - for example, a blogging system probably would not need to.
If you have user-generated content that results in new URLs being created, it's a better idea to include a post ID. If the only way new URLs can be created is through administrator-type access, you can probably do without it as long as you check for duplicates.

Adding the slug in all links to the content helps with search engines, because search engines will generally use words in the URL itself to help index content.

The reason for including the id in the url is that it makes it easier behind the scenes to retrieve the correct article from the database, as a lookup can be performed on the ID rather than the article's title.
The reason for including the full title of the article, is that Google gives heaps of bonus points for search terms that are matched in the filename.

URL is part of the Web user interface.
There is an eyetracking study of search engine use that found that people spend 24% of their gaze time looking at the URLs in the search results.
Searchers are particularly interested in the URL when they are assessing credibility and usefulness of a destination. If the URL looks like garbage, people are less likely to click on that search hit. On the other hand, if the URL looks like the page will address the user's question, they are more likely to click.

@Greg Hewgill
Adding the slug in all links to the content helps with search engines, because search engines will generally use words in the URL itself to help index content.
I should have clarified a bit: I meant URLs that have both an ID and a slug in them. I just don't see the point in having something like /post/1/la-la-la-la-text-hahahaha vs /post/1/ or /post/la-la-la-la-text-hahahaha, since the first one would work without the extraneous text at the end.

It could be that it is faster to get the post in a blog by the ID than by the slug, so you put in the ID for the SQL query and the slug for the search engines (SEO).
https://stackoverflow.com/users/58163/movaxes65675
I like the /post/la-la-la-la-text-hahahaha type: I can remember the URL and know what the title of the post is (before actually loading the site). I don't much like /post/1/; it means nothing to me but post #1 (bad for marketing?)
Edit: the ID also helps to avoid duplicates, as andybaird pointed out.

Well, firstly it should be pointed out that the "Web 2.0 style URLs" are actually part of something called REST. Those URLs are sometimes called RESTful URLs. The claimed benefits are:
Provides improved response time and reduced server load due to its support for the caching of representations;
Improves server scalability by reducing the need to maintain session state. This means that different servers can be used to handle different requests in a session;
Requires less client-side software to be written than other approaches, because a single browser can access any application and any resource;
Depends less on vendor software and mechanisms which layer additional messaging frameworks on top of HTTP;
Provides equivalent functionality when compared to alternative approaches to communication;
Does not require a separate resource discovery mechanism, due to the use of hyperlinks in representations;
Provides better long-term compatibility and evolvability characteristics than RPC. This is due to:
The capability of document types such as HTML to evolve without breaking backwards- or forwards-compatibility; and
The ability of resources to add support for new content types as they are defined without dropping or reducing support for older content types.

Why do some websites add "Slugs" to the end of URLs? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Many websites, including this one, add what are apparently called slugs - descriptive but as far as I can tell useless bits of text - to the end of URLs.
For example, the URL the site gives for this question is:
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
But the following URL works just as well:
https://stackoverflow.com/questions/47427/
Is the point of this text just to somehow make the URL more user friendly or are there some other benefits?

The slugs make the URL more user-friendly, and you know what to expect when you click a link. Search engines such as Google rank the pages higher if the search word is in the URL.

Usability is one reason, if you receive that link in your e-mail, you know what to expect.
SEO (search engine optimization) is another reason. Search engines such as Google will rank your page higher for the keywords contained in the URL.

I recently changed my website url format from:
www.mywebsite.com/index.asp?view=display&postid=100
To
www.mywebsite.com/this-is-the-title-of-the-post
and noticed that click-through rates to articles increased about 300% after the change. It certainly helps users decide whether what they're thinking of clicking on is relevant. In terms of SEO, though, I have to say I've seen little impact after the change.

I agree with other responses that any mis-typed slug should 301-redirect to the proper form. In other words, https://stackoverflow.com/questions/47427/wh should redirect to https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls . It has one other benefit that hasn't been mentioned--if you do not do a redirect to a canonical URL, it will appear that you have a near-infinite number of duplicate pages. Google hates duplicate content.
That said, you should really only care about the content ID and allow any input for the slug as long as you redirect. Why?
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
... Oops, the mail software cut off the end of the URL! No problem though because you still can roll with just https://stackoverflow.com/questions/47427
The one big problem with this approach is if you derive the slug from the title of your content, how are you going to deal with non-ASCII, UTF-8 titles?
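One common (and lossy) way to handle such titles is to decompose accented characters and drop whatever cannot be represented in ASCII. A rough sketch using only the standard library; the `slugify` name and the "untitled" fallback are assumptions for illustration, and titles in scripts with no ASCII equivalent collapse to the fallback, which is one more reason to keep the numeric ID as the real identifier:

```python
import re
import unicodedata


def slugify(title, fallback="untitled"):
    """Build an ASCII slug; characters with no ASCII form are dropped."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_only.lower())
    return "-".join(words) or fallback


print(slugify("Crème brûlée à la carte"))
print(slugify("日本語"))
```

The alternative is to keep the original characters and percent-encode them in the URL, which preserves the title but makes for much longer raw links.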

The reason most sites use it is probably SEO (Search Engine Optimization). Yahoo used to give a reasonable weighting to the presence of the search keyword in the URL itself, and it also helped in the Google result as well.
More recently the search engines have lowered the weighting given to keywords in the URL, likely because the technique is now more common on spam sites than legitimate. Keywords in the URL now have only a very minor impact on the search results, if at all.
As for stackoverflow itself, SEO might be a motivation (old habits die hard) or simply for usability.

It's basically a more meaningful location for the resource. Using the ID is perfectly valid but it means more to machines than people.
Strictly speaking the ID shouldn't be needed if the slug is unique, you can more easily ensure unique slugs by scoping them inside dates.
ie:
/2008/sept/06/why-some-websites-add-slugs-end-of-urls/
Basically this exploits the low likelihood of two identical slugs being in use on the same day. If there is a clash the general convention is to add a counter at the end of the slug but it's rare that you ever see these:
/2008/sept/06/why-some-websites-add-slugs-end-of-urls/
/2008/sept/06/why-some-websites-add-slugs-end-of-urls-1/
/2008/sept/06/why-some-websites-add-slugs-end-of-urls-2/
A lot of slug algorithms also get rid of common words like "the" and "a" to assist in keeping the URL short. This scoped approach also makes it very straightforward to find all resources for a given day, month or year - you simply chop off segments.
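The date-scoped scheme with a counter and stop-word removal might look something like this. It is a sketch under assumptions: the stop-word list is chosen only so that it reproduces the example slug above, and `taken` stands in for whatever store records the paths already in use.

```python
import re

# Hypothetical stop-word list, picked to match the example slug above.
STOP_WORDS = {"a", "an", "the", "do", "to"}


def make_slug(title):
    """Lowercase, split into words, and drop common stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return "-".join(kept or words)


def unique_path(date_prefix, title, taken):
    """Return an unused /<date>/<slug>/ path, appending -1, -2, ... on clashes."""
    base = f"/{date_prefix}/{make_slug(title)}"
    candidate, counter = base, 0
    while candidate + "/" in taken:
        counter += 1
        candidate = f"{base}-{counter}"
    path = candidate + "/"
    taken.add(path)
    return path


taken = set()
title = "Why do some websites add slugs to the end of URLs?"
for _ in range(3):
    print(unique_path("2008/sept/06", title, taken))
```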
Additionally, stackoverflow URLs are bad in the sense that they introduce an additional segment in order to feature the slug, which is a violation of the idea that each segment should represent descending a resource hierarchy.

The term slug comes from the newspaper/publishing business. It's a short title that's used to identify a story in progress. People interested in URL semantics started using a short, abbreviated title in their URLs. It also pays off in SEO land, as keywords in URLs add importance to a page.
Ironically, lots of websites have started placing a full serialized-with-hyphens version of the titles in their URLs for strictly SEO purposes, which means the term slug no longer quite applies. This also rankles semantic purists, as many implementations just tack this serialized version of the title onto the end of their URLs.

I note that you can change the text freely. This URL appears to work just as well.
https://stackoverflow.com/questions/47427/why-is-billpg-so-very-awesome

As already stated, the 'slug' helps people and the search engines...
Something worth noticing is that in the source of the page there is a canonical URL.
This stops the page from being indexed multiple times.
Example:
<link rel="canonical" href="http://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls">

Remove the formatting from your question, and you'll see part of the answer:
https://stackoverflow.com/questions/47427/
vs
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
With no markup, the second one is self-descriptive.

Don't forget readability when sending a link, not just in search engines. If you email someone the first link they can look at the URL and get a general idea of what it is about. The second one gives no indication of the content of that page before they click.

If you emailed someone a link, wouldn't it make more sense to include a description by actually writing one out, rather than making the other person parse the URL where the description lives and try-to-read-a-bunch-of-hyphenated-words-stuck-together?

First off, it's SEO and user friendly, but in the case of the example (this site), it's not done well or correctly
(as it is open to black hat tricks and rank poisoning by others, which would reflect badly on this site).
If
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
has the content, then
https://stackoverflow.com/questions/47427/
and
https://stackoverflow.com/questions/47427/any-other-bollix
should not be duplicates. They should actually automatically detect the link followed is not using the current text (as obviously the slug is defined by the question title and can be later edited) and they should redirect 301 automatically to
https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls
thus ensuring the "one piece of content to one URI" rule, and if the URI moves/changes, ensure the old bookmarks follow/move with it through 301 redirects (so intelligent browsers can update the bookmarks).

Ideally, the "slug" should be the only identifier needed. In practice, on dynamic sites such as this, you either have to have a unique numerical identifier or start appending/incrementing numbers to the "slug" like Digg does.
