URLs: Dash vs. Underscore [closed] - url

Is it better convention to use hyphens or underscores in your URLs?
Should it be /about_us or /about-us?
From a usability point of view, I personally think /about-us is much better for the end user, yet Google and most other websites (and JavaScript frameworks) use the underscore naming pattern. Is it just a matter of style? Are there any compatibility issues with dashes?

From Google Webmaster Central:
Consider using punctuation in your URLs. The URL http://www.example.com/green-dress.html is much more useful to us than http://www.example.com/greendress.html. We recommend that you use hyphens (-) instead of underscores (_) in your URLs.

Here are a few points in favor of the dashes:
Dashes are recommended by Google over underscores (source).
Dashes are more familiar to the end user.
Dashes are easier to write on a standard keyboard (no need to Shift).
Dashes don't hide behind underlines.
Dashes feel more native in the context of URLs as they are allowed in domain names.
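
As a rough illustration of the dash convention, a slug could be generated along these lines. The helper name `slugify` and its exact rules (lowercasing, dropping non-ASCII marks) are assumptions made for this sketch, not something prescribed by Google or by the answers here.

```python
import re
import unicodedata


def slugify(title):
    """Turn a page title into a lowercase, dash-separated slug (hypothetical helper)."""
    # Decompose accented letters, then drop whatever is left outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of characters that is not a letter or digit with one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim dashes left over at either end.
    return slug.strip("-")


print(slugify("About Us"))      # about-us
print(slugify("Green_Dress!"))  # green-dress
```

Note that this sketch also turns underscores into dashes, so `/about_us` and `/about-us` would collapse to the same slug.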

It's not just dash vs. underscore:
text with spaces
textwithoutspaces
encoded%20spaces%20in%20URL
underscore_means_space
dash-means-space
plus+means+space
camelCase
PascalCase
" quoted text with spaces" (and single quote vs. double quote)
slash/means/space
dot.means.space

Google did not treat underscore as a word separator in the past, which I thought was pretty crazy, but apparently it does now. Because of this history, dashes are preferred. Even though underscores are now permissible from an SEO point of view, I still think that dashes are best.
One benefit is that your average semi-computer-illiterate web surfer is much more likely to be able to type a dash on the keyboard; they may not even know what the underscore is.

This is just a guess, but it seems they picked the one that people most probably wouldn't use in a name. This way you can have a name that includes a hyphenated word, and still use the underbar as a word delimiter, e.g. UseTwo-wayLinks could be converted to use_two-way_links.
In your example, /about-us would be a directory named the hyphenated word "about-us" (if such a word existed), and /about_us would be a directory named the two-word phrase "about us" converted to a single string of non-white characters.

I used to use underscores all the time; now I only use them for parts of a web site that I don't want anyone to link to directly (JS files, CSS, etc.).
From an SEO point of view, dashes seem to be the preferred way of handling it. For a detailed explanation from the horse's mouth, see http://www.mattcutts.com/blog/dashes-vs-underscores/.
The other problem that seems to occur, more with the general public than programmers, is that when a hyperlink with underscores is underlined, you can't see the underscore. Advanced users will work it out, but Joe Public probably won't.
Still use underscores in code in preference to dashes though - programmers understand them, most other people don't.

Jeff has some thoughts on this: https://blog.codinghorror.com/of-spaces-underscores-and-dashes/
There are drawbacks to both. I would suggest that you pick one and be consistent.

I'm more comfortable with underscores. First of all, they match in with my regular programming experience of variable_names_are_not-subtraction, second of all, and I believe this was mentioned already, words can have hyphens, but they do not ever have underscores. To pick a really stupid example, "Nation-state country" is different from "nation state country". The former translates something like "the land of nation-states" (think "this here is gun country! Best move along, y'hear?"), whereas the latter looks like a list of sometime-synonyms. http://example.com/nation-state-country/ doesn't appear to mean the same as http://example.com/nation-state_country/, and yet, if hyphens are delimiters/"space"s in addition to characters in words, it can. The latter seems more clear as to the actual purpose, whereas the former looks more like that list, if anything.

The SEO guru Jim Westergren tested this back in 2005 from a strict SEO perspective and came to the conclusion that + (plus) was actually the best word delimiter. However, this doesn't seem reasonable and may be due to a bug in the search engines' algorithms. He recommends - (dash) for both readability and SEO.

Underscores replace spaces where whitespace is not allowed. Dashes (hyphens) can be part of a word, thus joining words with hyphens that already include hyphens is ugly/confusing.
Bad:
/low-budget-movies
Good:
/low-budget_movies

I think dash is better from a user perspective and it will not interfere with SEO.
Not sure where or why the underscore convention started.
A little more knowledgeable debate

I prefer dashes on the basis that an underscore might be obscured to an extent by a link underline. Textual URLs are primarily for being recognised at a glance rather than being grammatically correct so the argument for preserving dashes for use in hyphenated words is limited.
Where the accuracy of a textual URL is important is when reading it out to someone, in which case you don't want to confuse an underscore for a space (or vice-versa).
I also find dashes more aesthetically pleasing, if that counts for anything.

From an end-user point of view, I prefer "about-us" or "about us", not "about_us".

Personally, I'd avoid using about-us or about_us, and just use about.

Some older web hosting and DNS servers actually have problems parsing underscores for URLs, so that may play a part in conventions like these.

I personally would avoid all dashes and underscores and opt for camelCase or PascalCase if it's in code.
The Wikipedia article on camelCase explains a bit of the reasoning behind its origins. They amount to:
Lazy programmers who didn't like reaching for the _ key
Potential confusion about readability
The "Alto" keyboard at Xerox PARC that had no underscore key.
If the user is to see the string then I'd do none of the above and use "About us." or "AboutUs" if I had to, as camelCase has spread to common usage in some areas such as product names, e.g. ThinkPad, TiVo.

You can write "/about us" in a link, although a literal space is not valid in a URL, so it will be encoded to "/about%20us". But be honest, this will always be personal preference, so there is no real answer to be given here.
I would go with the convention that dashes can appear in words, so spaces should be converted to underscores.
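
The encoding of the space mentioned above can be checked with Python's standard library. This is only a small sketch; the path "/about us" is the example from this thread, and the variable names are illustrative.

```python
from urllib.parse import quote, quote_plus, unquote

path = "/about us"

# quote() percent-encodes the space and leaves "/" alone by default.
encoded = quote(path)
print(encoded)                 # /about%20us

# quote_plus() follows the form-encoding convention: a space becomes "+".
print(quote_plus("about us"))  # about+us

# Decoding restores the original text.
print(unquote(encoded))        # /about us
```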

Better use . - / as separators, because _ seems not to be a separator.
http://www.sistrix.com/blog/832-how-long-may-a-linktext-be.html

Related

Can using commas in URLs sometimes break the URL?

Is anyone aware of any problems with using commas in SEO-friendly URLs? I'm working with some software that uses a lot of commas in its SEO-friendly URLs, but I am 100% certain I have seen some instances where some programs/platforms don't recognize the URL correctly and cut the "linking" of the URL off after the first comma.
I just tested this out with Thunderbird, Gmail, Hotmail and on an SMF forum with no problems; however, I know I have seen the issue before.
So my question is, is there anything in particular that would cause some platforms to stop linking URLs with a comma? Such as a certain character after the comma?
There will be countless implementations that cut the automatic linking at that point, as with many other characters, too. But that's not a problem caused by using these characters; it is caused by a wrong or incomplete implementation.
See for example this very site, Stack Overflow. It will cut off the link at the * when manually entering/pasting this URL (see bug; in case it gets fixed, here’s a screenshot of it):
http://wayback.archive.org/web/*/http://www.example.com/
But when using the hyperlink syntax, it works fine:
http://wayback.archive.org/web/*/http://www.example.com/
The * character is allowed in an HTTP URL path, so the link detection should have recognized the first URL instead of breaking it at the occurrence of *.
Regarding the comma:
The comma is a reserved character and its meaning is relevant for the URL path (bold emphasis mine):
Aside from dot-segments in hierarchical paths, a path segment is
considered opaque by the generic syntax. URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents. For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.
So, if you don’t intend to use the comma for the function it has as a reserved character, you may want to percent-encode it as %2C. Users copying such a URL from their browser’s address bar would paste it in the encoded form, so it should work almost everywhere.
However, especially because it’s a reserved character, the unencoded form should work, too.
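
A minimal sketch of the two options, using Python's `urllib.parse`. The segment "name,1.1" is taken from the quoted passage; whether the comma should stay literal depends on whether your application actually uses it as a delimiter.

```python
from urllib.parse import quote, unquote

segment = "name,1.1"

# With default settings, quote() percent-encodes the comma.
encoded = quote(segment)
print(encoded)                    # name%2C1.1

# Listing "," as safe keeps it literal, for when it really is a delimiter.
literal = quote(segment, safe=",")
print(literal)                    # name,1.1

# Both forms decode to the same text.
print(unquote(encoded))           # name,1.1
```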

is it ever appropriate to localize a single ascii character

When would it be appropriate to localize a single ASCII character?
For instance, / or |?
Is it ever necessary to add these "strings" to the localization effort?
I just want to give some people the benefit of the doubt and make sure there's not something I didn't think of.
Generally it wouldn't be appropriate to use something like that except as a graphic element (which of course wouldn't be I18N'd in the first place, much less L10N'd). If you are trying to use it to e.g. indicate a ratio then you should have something like "%d / %d" instead, and localize the whole thing.
Yes, there are cases where these individual characters change in localization. This is not a comprehensive list, just examples I happen to know.
Not every locale uses , to separate thousands and . for the decimal. (However, these will usually be handled by your number formatter. If you do so yourself, you're probably doing it wrong. See this MSDN blog post by Michael Kaplan, Number format and currency format are not always the same.)
Not every language uses the same quotation marks (“, ”, ‘ and ’). See Wikipedia on Non-English Uses of Quotation Marks. (Many of these are only easy to replace if you use full quote marks. If you use the " and ' on your keyboard to mark both the start and end of sentences, you won't know which of two symbols to substitute.)
In Spanish, a question or exclamation is preceded by an inverted ? or !. ¿Question? ¡Exclamation! (Obviously, you can't fix this with a locale substitution for a single character. Any questions or exclamations in your application should be entire strings anyway, unless you're writing some stunningly intelligent natural language generator.)
If you do find a circumstance where you need to localize these symbols, be extra cautious not to accidentally localize a symbol like / used as a file separator, " to denote a string literal or ? for a search wildcard.
However, this has already happened with CSV files. These may be separated by ,, or may be separated by the local list separator. See What would happen if you defined your system's CSV delimiter as being a quotation mark?
In Greek, questions end with a semicolon rather than ?, so essentially the ? is replaced with ; ... however, you should aim to always translate the question as a complete string including question mark anyway.

Dashes vs underscores in URLs for SEO [duplicate]

What do search engines presently prefer us to use as word separators in URLs: dashes or underscores? Should I use http://example.com/dash-underscore
or http://example.com/dash_underscore?
Search engines tend to treat them differently. Google likes to treat two words joined by an underscore as a single word, but dashes are considered to be separating punctuation. Try it out yourself!
I tried searching for search_engine and search-engine. The first gave me pages and URLs with that exact phrase; the second was a more general search, treating the dash - like a space.
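
The same split can be seen in a common regex-based definition of a "word". This is only an analogy, not a description of how any search engine actually tokenizes; the helper name `words` is made up for the sketch.

```python
import re


def words(text):
    # \w matches letters, digits and the underscore, so "_" never splits a token.
    return re.findall(r"\w+", text)


print(words("search_engine"))  # ['search_engine']
print(words("search-engine"))  # ['search', 'engine']
```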
A lot of the blog sites nowadays build URL slugs using dashes as opposed to underscores as it is a lot easier to read so I wouldn't be surprised if search engines score dashes higher on the results than underscores.
Use hyphens.
I have asked several SEO consultants and they always say to use hyphens in URLs to separate words. That seems in line with practices on most common sites I've looked at.

What's the best character to represent blank spaces in a URL?

When you are building URLs that should be legible for users and search engines and you do it automatically from the content, what's the best way to represent blank spaces? Hyphens (this is what Stack Overflow uses)? Underscores? Anything else? Do any of those make a difference for SEO?
Both are valid URL characters and both have their pros and cons.
Pro dash
Google recommends dashes, and here is what Matt Cutts from Google has to say about dashes vs. underscores:
If you have a url like word1-word2, that page can be returned for the searches word1, word2, and even “word1 word2”.
That’s why I would always choose dashes instead of underscores.
Dashes seem to be what major blogs do: The Huffington Post, TechCrunch, Engadget, ...
Dashes seem to be what major CMS do.
Not sure about that one anymore, can anyone comment?
As mentioned by Kazar, underscores can clash with the underlining of links.
I find underscores awkward to type.
Rene Saarsoo pointed out that dashes take less space than underscores in proportional fonts.
Ionut G. Stan mentioned that underscores are not allowed in hostnames. If you strive for consistency you should opt for dashes.
Pro underscore
Dashes are not allowed in ISO9660 file systems. This can be a problem if your content is also shipped on DVD or CD (e.g. help files or eLearning content).
In some languages (e.g. German) dashes can be word characters and are not generally considered word separators.
Another advantage of dashes is that in proportional fonts they take less space than underscores. Compare:
https://stackoverflow.com/../whats-the-best-character-to-represent-blank-spaces-in-a-url
https://stackoverflow.com/../whats_the_best_character_to_represent_blank_spaces_in_a_url
It's not a lot, but every little helps :)
Again, personal preference - personally I think hyphens work better than underscores, because underscores can clash with the underlining a tags add (by default), so http://someurl.com/this_is_a_address looks like there are no underscores there. (as this is stack overflow, roll over the link). http://someurl.com/this-is-a-address looks fine.
You know, if you buy a domain name, you're allowed to use hyphens inside that name, but no underscores. This is an additional reason for which I believe hyphens are better than underscores.
I'd say dashes. I used to use underscores for pretty much every such purpose (representing spaces) but nowadays, with all the visual thingies blinking all around, you often find underlining that makes them practically invisible.
This may answer your question. It looks like things changed for Google a few years ago regarding - and _.
See this article here:
http://www.blog-tutorials.com/marketing-and-seo/linking/google-oks-underscores-as-word-separators-in-urls-and-more-seo-tips/
I think that depends on your favorite. My favourites are underscores, but I don't see any (dis-)advantages if using hyphens or other valid URL characters instead. And everything looks better than %20 :)

What are all of the allowable characters for people's names? [closed]

There are the standard A-Z, a-z characters, but also there are hyphens, em dashes, quotes, etc.
Plus, there are all of the international characters, like umlauts, etc.
So, for an English-based system, what's the complete set? What about sets for other languages? What about UTF8, UTF16, etc?
Bonus question: How many name fields are needed, and what are their maximum lengths?
EDIT: There are definitely two different types of characters involved in people's names, those that are there as part of the context, and those that are there for structural reasons. I don't want to limit or interfere with the context characters, but I do need to deal with the structural ones.
For example, I had a name come in that was separated by an em dash, but it was hard to distinguish that from the minus character. To make the system easier for searching, I want to take all five different types of dashes, and map them onto one unique character (minus), that way the searcher doesn't need to know specifically which symbol was initially entered.
The problem exists for dashes, probably quotes as well, but also how many other symbols?
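
The dash folding described in the question could be sketched with a translation table. The question does not say which five dashes it means, so the code points below are an assumption chosen for illustration, as is the helper name `fold_dashes`.

```python
# Assumed set of dash-like characters: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash and the mathematical minus sign.
DASH_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2212"

# str.maketrans builds a table mapping each listed character to "-".
DASH_TABLE = str.maketrans({ch: "-" for ch in DASH_LIKE})


def fold_dashes(name):
    """Return name with dash-like characters replaced by ASCII hyphen-minus."""
    return name.translate(DASH_TABLE)


print(fold_dashes("Smith\u2014Jones"))  # Smith-Jones
```

Applying the same function to both the stored name and the search term means the searcher never has to know which symbol was originally entered. Quotes could be folded the same way with a second table.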
There's good article by the W3C called Personal names around the world that explains the problems (and possible solutions) pretty well (it was originally a two-part blog post by Richard Ishida: part 1 and part 2)
Personally I'd say: support every printable Unicode-Character and to be safe provide just a single field "name" that contains the full, formatted name. This way you can store pretty much every form of name. You might need a more structured storage, but then don't expect to be able to store every single combination in a structured form, as there are simply too many different ones.
Whitelisting characters that could appear in a person's name is the wrong way to go, if you ask me. Sure, [A-Za-z] is a fair starting point, but, as you said, you get problems with "European" names. So you map all the umlauts, circumflexes and those. What about Chinese names? Japanese? Indian? Hebrew? You're entering a battle against wind turbines.
If you absolutely must check the validity of someone's name, I'd suggest doing a modest blacklist of certain characters. Braces, mathematical characters, some punctuation and such might be safe to ignore. But I'd be cautious, if I were you.
It might be best to just accept whatever comes in. UTF-16 should be today's overkill character set, that should be adequate for some years to come.
Edit: As for your question about name length and number of names: if you really want people to write their real and complete names, I guess the only foolproof answer to both of those questions would be "infinite". I can't whip out any real examples for human beings, but surely there are human names analogous to the native name for the city of Bangkok.
I don't think there's a definitive answer. After all, some people have names that can't even be expressed in UTF-16...
There are some odd people out there, who will give their kids the craziest of names, including putting in weird punctuation, accents that don't exist in their own language, etc.
However, you can place arbitrary restrictions on your database. If you want to you can insist on 7 bit ASCII names. It's slightly rude to users, but they'll live with it. It certainly makes searching easier.
My colleague's daughter is named Amélie. But even some (not all!) official British government web sites ("Please enter the name exactly as shown on the birth certificate") won't accept the unicode, so he has to use 'Amelie' instead.
Any character that can be represented by any multiple of eight bits (greater than zero) is a possible character for a person's name. Lengths of both names and encodings are arbitrary, so no upper bound should be considered.
Just make sure you sanitize your database inputs so little Bobby Drop-tables doesn't get ya.
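
One common way to do this is to bind the name as a query parameter rather than building SQL by string concatenation. Below is a minimal sketch with Python's built-in `sqlite3` module; the `students` table and the sample name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The ? placeholder passes the name as data, so it is never parsed as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored == name)  # True
```

With binding, no characters need to be stripped from the name at all, which fits the advice elsewhere in this thread to accept whatever comes in.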
On the issue of name fields, the WRONG answer is first name, middle initial, last name, etc. for many reasons.
Many people are known by their middle name, and formally use a first initial, middle name, last name format.
In some cultures, the surname is the first name, and the given name is the last name.
Multiple first and/or middle given names are getting more common. As @Dour High Arch points out, the other extreme is people with only one word in their name.
In an object-oriented database, you would store a Name object with methods to return a directory-style or signature-style name; and the backing store would contain whatever data was necessary to support those methods.
I haven't yet seen a relational database model that improves on the model of two variable-length strings for directory-style and signature-style names.
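
The two-string model could look something like the sketch below. The class, field and method names are hypothetical, and the sample name is only there to show the two styles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    directory_style: str  # form used for sorted listings
    signature_style: str  # form used for display and correspondence

    def for_directory(self):
        return self.directory_style

    def for_signature(self):
        return self.signature_style


ada = Name(directory_style="Lovelace, Ada", signature_style="Ada Lovelace")
print(ada.for_directory())  # Lovelace, Ada
print(ada.for_signature())  # Ada Lovelace
```

Because both strings are stored as entered, nothing has to guess which part is the surname or how many given names there are.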
I'm making software for driving schools in the USA, so to me what matters most is what the state DMVs accept as a proper name on a driver's license. In my case, it would cause problems to allow names beyond what the DMV allows, even if such names were legal, because the same name must later be used for a driver's license.
From StackOverflow, I still hadn't confirmed the answer I needed. And I happen to know that in my state (Calif) they're using AS400's with software probably written in COBOL, and to the best of my knowledge, those only support an 8-bit character set. (Is it EBCDIC?) Anyway... Ugh.
So, I called the California DMV... Sure enough, their system allows A-Z and spaces and absolutely nothing else. Not even hyphens are allowed -- Hyphens are replaced with spaces. In fact, apparently just to be difficult, they only use capitals. And names such as "O'Malley" must be replaced with OMALLEY.
Leave it to government. I must say I'm thrilled not to be a developer working for DMV. (Although I could really use that kind of salary.)
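
The rules reported above (capitals only, hyphens become spaces, nothing but A-Z and spaces) can be approximated as follows. The helper name `dmv_style` is made up, and the handling of accented letters and repeated spaces is an assumption, since the answer does not say what the DMV does with them.

```python
import re


def dmv_style(name):
    """Approximate the reported DMV rules (hypothetical helper)."""
    # Capitals only, and hyphens are replaced with spaces.
    upper = name.upper().replace("-", " ")
    # Drop everything except A-Z and spaces (assumption: accented letters are dropped too).
    kept = re.sub(r"[^A-Z ]", "", upper)
    # Collapse repeated spaces (assumption).
    return " ".join(kept.split())


print(dmv_style("O'Malley"))          # OMALLEY
print(dmv_style("Mary Smith-Jones"))  # MARY SMITH JONES
```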
It really depends on what the app is supposed to be used for.
Sure, in theory it's great if you allow every script on god's green earth to be used, but if the DB is also used by support staff, are they going to be able to handle names in Japanese, Hebrew and Thai script? Can your printer, if it's used to print postage labels?
You might add an extra field "Latin Transcription", but IMO it's really OK to restrict it to ISO-8859-1 characters - People who don't use Latin characters are by now so used to having to use a transcription that they don't mind it anymore, unless they're hardcore nationalists.
UTF-8 should be good enough. As far as name fields go, you'll want at minimum a first name and a last name.
Depending on the complexity of your name structure I could see:
First Name
Middle Initial/Middle Name
Last Name
Suffix (Jr. Sr. II, III, IV, etc.)
Prefix (Mr., Mrs., Ms., etc.)
What do you do when you have "The Artist Formerly Known as Prince"? That symbol he used is not a character in the Unicode set (AFAIK).
It's some levity, but at the same time, names are a rather broad concept that doesn't lend itself well to a structured format. In this case, something free-form might be most appropriate.
