URL my-web-url.com vs myweburl.com in SEO

Can anyone explain the difference between these two domains in search engines, and its effect? Although the domain contains two different words, most people prefer the domain without "-". To my knowledge, "-" means a space in the URL and "_" joins the words into one, but these two symbols are rarely used in domain names. Can anyone explain the difference between the two?

One should give priority to the domain name without '-', because it is hard to pronounce when telling someone your domain name, and chances are high that people will forget the '-' when they are typing it, at least the first few times. Of course, this will impact your business negatively.
Also, a domain with a hyphen doesn't give customers a very good impression. I agree with what @chimpsarehungry said in the earlier answer.
Other than that, I don't think it matters much for SEO. It may even have a good effect in some cases, such as long URLs, e.g. WordPress posts: URLs with '-' are search-engine friendly.

Take a look for yourself, based on 2011 data gathered by SEOmoz:
http://www.seomoz.org/article/search-ranking-factors#metrics
Not looking so good for dashes. Some of that comes from the correlation with spammers using such domains, but definitely not all of it. I apologize that I don't have a reference to back this up, but there was a Matt Cutts Q&A where he said multiple dashes are indicative of spam and do indeed get a negative hit in overall rank score. I believe it was part of a big keynote speech, so it'd be hard to find. You'll just have to take my word for it.

I don't think this will matter at all. But as a search engine user the sites with dashes in between them look spam-like to me. Name one popular website with a dash.

Related

Find almost-duplicate strings in Objective-C on iOS

I have a list of song tracks that I retrieved from the iTunes API. Some of them are duplicates, but not perfect duplicates. For example, one might say "All 4 u" vs "All for you", or "Some song" vs "some song feat. some other artist".
I want to be able to identify the duplicates. Is the best way to compute the Levenshtein distance for all pairs? That seems excessive.
I'm working in the Cocoa Touch framework for iOS programming, so if anyone knows of any libraries, that would help a lot.
Why do you consider computing the Levenshtein distance excessive? What algorithm would you use if you were sitting down to a list with pencil and paper?
That said, Levenshtein is likely necessary, but not sufficient. I would start by normalizing the strings. In some cases, a string might normalize a couple of ways and you'll need to do both. Normalization would look like:
Convert to lowercase
Strip any leading numbers followed by punctuation ("1.", "1 - ", etc.)
Tentatively strip anything after "feat." or "with"
This is an example of special knowledge about your problem set. You're going to have to use a lot of special knowledge like this.
"Tentatively" means you should probably keep both the stripped and non-stripped versions of the string.
Keep in mind that things including "feat." might be remixes, so you have to be careful about assuming duplicates. This is of course true of almost any attempt at de-duping. There are often multiple versions.
Tentatively expand common abbreviations (u=>you, 4=>for, 2=>two, w/=>with, etc.)
Tentatively strip anything in parentheses
Strip English articles (a, an, the). Maybe even strip all very short words (3 or fewer characters) as a first pass.
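The normalization steps above can be sketched in Python (the logic ports directly to Objective-C; no Cocoa library is implied). This is a minimal sketch under stated assumptions: the abbreviation and article tables, the regular expressions, and the helper names `normalize_variants` and `might_be_duplicates` are all hypothetical. Because each "tentative" step keeps both versions, a title yields a set of candidate forms, and two titles are flagged when any of their forms coincide.

```python
import re

# Hypothetical lookup tables; extend them with the special knowledge your data needs.
ABBREVIATIONS = {"u": "you", "4": "for", "2": "two", "w/": "with"}
ARTICLES = {"a", "an", "the"}


def normalize_variants(title: str) -> set[str]:
    """Return every normalized form of a title: strict plus 'tentative' variants."""
    base = title.lower().strip()
    # Strip a leading track number followed by punctuation ("1.", "1 - ", "03)").
    base = re.sub(r"^\d+\s*[.\-)]\s*", "", base)

    candidates = {base}
    # Tentatively drop everything from "feat." or "with" onward, keeping the original too.
    for c in list(candidates):
        candidates.add(re.split(r"\s+(?:feat\.|with)\s+", c, maxsplit=1)[0])
    # Tentatively drop anything in parentheses, again keeping both versions.
    for c in list(candidates):
        candidates.add(re.sub(r"\s*\([^)]*\)", "", c).strip())

    variants = set()
    for c in candidates:
        words = [w for w in c.split() if w not in ARTICLES]
        variants.add(" ".join(words))  # articles stripped, abbreviations untouched
        variants.add(" ".join(ABBREVIATIONS.get(w, w) for w in words))  # expanded
    return variants


def might_be_duplicates(a: str, b: str) -> bool:
    """Flag a pair as a possible duplicate if any normalized variants coincide."""
    return bool(normalize_variants(a) & normalize_variants(b))
```

A shared variant only means "worth a closer look"; as the answer warns, "feat." versions may be genuine remixes.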
Doing this well is complicated and will require a lot of trial and error. I've done a lot of contact de-duping in the past, and one piece of advice: start conservative. It is very easy to accidentally de-dupe way too much. Build a big list of test data that you've de-duped by hand, and test, test, test after every algorithm change. Make sure your UI can present the user with anything you're uncertain about, because there are going to be many, many records that you can't be certain about. (This is true even when you do it by hand. Look at a big list of human-entered titles and tell me which ones are duplicates with 100% certainty without listening to the tracks. A computer isn't going to do better than you at this.)
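To make "start conservative" concrete, here is a hedged sketch that scores an already-normalized pair with the Levenshtein distance and sorts it into "duplicate", "uncertain" (present it to the user) or "distinct". The thresholds 0.95 and 0.8 are arbitrary assumptions meant to be tuned against your hand-deduped test list, and `classify` is a hypothetical name.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def classify(a: str, b: str, sure: float = 0.95, maybe: float = 0.8) -> str:
    """Conservative three-way verdict based on length-normalized similarity."""
    longest = max(len(a), len(b)) or 1
    similarity = 1 - levenshtein(a, b) / longest
    if similarity >= sure:
        return "duplicate"
    if similarity >= maybe:
        return "uncertain"  # hand these to the user instead of merging
    return "distinct"
```

Keeping the "uncertain" band wide at first errs on the side of asking the user, which matches the advice not to de-dupe too much.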
I'm not aware of any publicly available library for this. It's been solved by many people many times (search for "dedupe song titles" or anything similar). But it's generally commercial software.
One more piece of advice for this, since it's a huge O(n^2) or worse problem. Look for bucketing opportunities. If you can match artists first, then albums, then tracks, you can divide and conquer in much less time.
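The bucketing idea can be sketched as grouping tracks by artist and album before generating pairs, so the quadratic comparison only runs inside each small bucket. The `(artist, album, title)` tuple shape and the exact lowercase bucket key are simplifying assumptions; in practice the artist and album names would need their own fuzzy matching first.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(tracks):
    """Yield only pairs of tracks sharing a (normalized) artist and album.

    `tracks` is assumed to be an iterable of (artist, album, title) tuples.
    """
    buckets = defaultdict(list)
    for track in tracks:
        artist, album, _title = track
        buckets[(artist.strip().lower(), album.strip().lower())].append(track)
    # Pairs are compared within each bucket instead of across the whole list.
    for bucket in buckets.values():
        yield from combinations(bucket, 2)
```

Each yielded pair can then go through the title comparison, which is where the expensive string work happens.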

URL Structure: Lower case VS Upper case

This question came to mind while I was going through some websites that use a combination of upper and lower case in their URLs, something like http://www.domain.com/Home/Article
As far as I know, we should always use lowercase in URLs, but I have no idea about the technical reason. I would like to learn from you experts why we should use lowercase in URLs. What are the advantages and disadvantages of uppercase URLs?
The domain part is not case sensitive. GoOgLe.CoM works. You can add uppercase as you like, but normally there's no reason to do so and, as stated in the comments below, it may hurt your SEO ranking.
The path part is or is not case sensitive, depending on the server environment and server. Typically Windows machines are case insensitive, while Linux machines are case sensitive. This means that you should stick to lowercase or you risk introducing a bug that's really hard to hunt down (mismatched case that doesn't matter on the dev server).
The query string part is passed to the server as is. You can readily use mixed case as you like, or discard the case (toLowerCase(...)). This also means that base64-encoded keys will work. You can't expect users to type them correctly, though.
The hash part (called the "fragment identifier") is only available to the client code, not to the server. JavaScript may distinguish between the cases as it likes, and so does the browser: url#a will scroll to the element with the ID a, but url#A won't.
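A small Python sketch of which parts are safe to fold: it lowercases only the scheme and host and leaves the path, query string, and fragment untouched. `normalize_case` is a hypothetical helper built on the standard `urllib.parse` module; for brevity it drops any user-info part and does not handle IPv6 literal hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the case-insensitive parts (scheme, host); keep the rest as-is."""
    parts = urlsplit(url)            # urlsplit already lowercases the scheme
    netloc = parts.hostname or ""    # .hostname is returned lowercased
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Path, query string and fragment may be case sensitive, so leave them alone.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `normalize_case("HTTP://GoOgLe.CoM/Home/Article?Key=AbC#Top")` gives `http://google.com/Home/Article?Key=AbC#Top`.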
I'm going to have to disagree with all established wisdom on this, so I'll probably get downvoted, but:
If you redirect all mixed-case URLs to your properly cased URL, it solves all the problems mentioned. Therefore, it seems this argument comes from tradition and preference. The point of a URL is to be a user-friendly representation of a page, and if your URL is friendlier with uppercase, why not use it? Compare:
moviesforyoutowatch.com/batman-vii-the-dark-knight-whatevers
MoviesForYouToWatch.com/Batman-VII-The-Dark-Knight-Whatevers
I find the mixed case version superior for the purpose. If there's a technical reason that can't be solved with a lower-case compare and redirect, please share it.
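The lower-case compare and redirect this answer relies on might look like the following sketch, assuming a hypothetical in-memory list of canonical paths and a hypothetical `resolve` function that returns a status code and target path. A real site would do the same lookup against its routing table or database and send the 301 with a `Location` header.

```python
from typing import Optional, Tuple

# Hypothetical canonical paths, stored in the case you want users to see.
CANONICAL_PATHS = ["/Batman-VII-The-Dark-Knight-Whatevers", "/Contact"]
BY_LOWERCASE = {path.lower(): path for path in CANONICAL_PATHS}


def resolve(requested_path: str) -> Tuple[int, Optional[str]]:
    """Match case-insensitively, then redirect to the canonical casing."""
    canonical = BY_LOWERCASE.get(requested_path.lower())
    if canonical is None:
        return 404, None
    if requested_path != canonical:
        return 301, canonical  # permanent redirect to the preferred casing
    return 200, canonical
```

Because every casing collapses onto one canonical URL, search engines see a single address per page.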
I know you asked for technical reasons but it's also worth considering this from a UX perspective.
Say you have a URL with uppercase characters and, for argument's sake, it has been distributed in printed media. When users come to enter that URL into their browser, they may well feel compelled to match that case (or be forced to, if your web server is case sensitive). Ultimately, you are giving them more work to do, as they have to consider case as well. After all, they don't know whether your server is case sensitive, and they may have experienced 404s from case-sensitive web servers in the past.
If your server is case sensitive and you are using mixed-case URLs, you are giving users more scope to mistype the URL. Furthermore, say you have the URL www.example.com/Contact. It's easy to confuse an upper- and lowercase "c" (especially if it is copied in handwriting); if the user overlooks this and uses the wrong case, they may never reach your content.
With all this in mind, consider www.example.com/News/Articles/FreeIceCreamForAll. On a keyboard that's not too difficult, but on a mobile device it would be very fiddly to input.
The reverse is also true if a user wants to write down a URL from the address bar. They may feel they need to match the case, again giving them more work to do and increasing the likelihood of errors.
To conclude: keep URLs lowercase.
REGARDING SECURITY ASPECTS OF THIS ISSUE:
There is actually a good security reason to use a mix of uppercase and lowercase.
It has the effect of confusing and blocking attackers!
In conversation, humans are easily confused by a mix of uppercase and lowercase.
Humans can't "speak" identifiers, passwords or URLs clearly if they contain both uppercase and lowercase.
This helps secure data or passwords in parts of a site that are provided as a locked-in or secure "automated access" part of the site or its data.
It's similar to NOT USING JSON.
JSON is "human-readable text", so JSON simply gives all attackers (including governments and Google, who steal your ideas and data) almost everything they need to know about the data. It's much more secure to confuse them by using private, bespoke, very fast "binary protocols" that use your own "unknowable data structures". But watch out, because it is actually possible to confuse yourself or your own development team.
All your security layers and protocols have to be "well managed" to avoid confusion.
There is therefore an extra level of site and data security against human attackers (and some robots) to be had by simply using totally unconventional systems (i.e. why on earth would anybody want to use a "standard security protocol" when they can all be easily broken with some simple heavyweight prior computing?).
Just "salt and hash" everything, and also add some extra bespoke security of your own. It's just common sense!
Conclusion: all the above answers are very clear and correct, but you can also happily leverage that very same knowledge to confuse potential attackers.
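For the "salt and hash" step mentioned above, Python's standard library already provides a salted key-derivation function. This sketch uses `hashlib.pbkdf2_hmac` with a random 16-byte salt and a constant-time comparison; the iteration count is an assumed work factor to tune for your hardware, and `hash_password`/`verify_password` are hypothetical names.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; raise or lower for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plain password."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```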

Designing a Non-Specific Language Application, e.g. planning for localization

Made this community wiki :3
I'm developing a basic RPG, and one of my goals from the beginning is to make sure that my program is not language-specific. Basically, before I design or start programming any menus, I want to make sure that I can load and display them in any supported language, so that I am not hard-coding values.
(It would save me from many migraines down the road.)
For this example, let's use Western left-to-right languages: English, Spanish, German, French, Italian.
This is a basic example of what I have.
One XML file contains a mapping and design of a conversation.
<conversation>
<dialog>line1</dialog>
<dialog>line2</dialog>
</conversation>
Other XML files contain the definitions.
<mappings language="English">
<line1>This is line 1 in English!</line1>
<line2>Other lines are contained in language-separated xml files</line2>
</mappings>
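Loading the two kinds of files described above could be sketched with Python's `xml.etree.ElementTree`, assuming the XML is available as strings and that each `<dialog>` holds a key naming an element in the mappings file. The function names and the `[missing: …]` fallback text are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

CONVERSATION_XML = """<conversation>
<dialog>line1</dialog>
<dialog>line2</dialog>
</conversation>"""

MAPPINGS_XML = """<mappings language="English">
<line1>This is line 1 in English!</line1>
<line2>Other lines are contained in language-separated xml files</line2>
</mappings>"""


def load_mappings(xml_text: str) -> dict:
    """Map each child element's tag (e.g. 'line1') to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


def render_conversation(conversation_xml: str, mappings: dict) -> list:
    """Resolve each <dialog> key through the language-specific mappings."""
    root = ET.fromstring(conversation_xml)
    lines = []
    for dialog in root.iter("dialog"):
        key = (dialog.text or "").strip()
        # A visible placeholder makes untranslated keys easy to spot in testing.
        lines.append(mappings.get(key, f"[missing: {key}]"))
    return lines
```

Swapping languages then means loading a different mappings file while the conversation file stays the same.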
Heh. This would work great, except that I forgot that English doesn't assign genders to its words, whereas other languages do. So where one sentence might be enough in English, I might need two sentences in other languages: one for the masculine form and the other for the feminine form.
What would be the most conducive way of solving this problem? Right now, I've considered coming up with different mapping tables, one exclusively for masculine sentences and the other just for feminine ones. Or just reading from different definition tables.
And another kicker would be within my game data design. I never thought about it, but I might need to store the gender of my game items and characters so I can use the correct sentence. However, other languages might have their own specific quirks that I would need to consider as well (though thankfully, from what I know, Italian and Spanish are relatively similar, and possibly French as well).
So, obviously this is a huge task ahead of me. What other design considerations should I think about? Right now, I'm thinking a static class would be easiest: configure the selected language at startup, throw in inputs and hopefully get a string back.
Any ideas? (Looking to throw ideas around :P)
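The "static class" idea, extended with a gender key, might look like this minimal Python sketch. It assumes a hypothetical table keyed by `(message key, gender)` in which `None` marks a form shared by all genders, so English can keep one line while Spanish stores two. The key `welcome` and the Spanish strings are illustrative only.

```python
class Localizer:
    """Configured once at startup; every lookup goes through it."""

    language = "English"
    _table: dict = {}

    @classmethod
    def configure(cls, language: str, table: dict) -> None:
        cls.language = language
        cls._table = table

    @classmethod
    def text(cls, key: str, gender=None, **values) -> str:
        # Prefer a gender-specific form, then fall back to the shared form.
        template = cls._table.get((key, gender)) or cls._table.get((key, None))
        if template is None:
            return f"[missing: {key}]"
        return template.format(**values)


Localizer.configure("Spanish", {
    ("welcome", "m"): "Bienvenido, {name}.",
    ("welcome", "f"): "Bienvenida, {name}.",
})
print(Localizer.text("welcome", gender="f", name="Ana"))
```

The game data then only needs to carry a gender tag per character or item, which the answers below show is just one of several attributes you may need.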
There are two general ways to approach this: brute force and trying to be clever. Brute force means writing each possible line and including it in your XML files. It's a lot of work, but it will work.
Trying to be clever gets you into deep water fairly fast, particularly if you're trying to cover a whole lot of languages.
You need to keep more information about characters than gender. In Russian, for example, there are different words meaning "you" depending on whether you're being informal or formal (or talking to multiple people), and the verb endings are also different. There are different translations of "please pass the bread" depending on the formality. In other languages, getting the translation right depends on social status.
There are issues, as pawel_dyda pointed out, with singular, plural, and possibly dual forms. Other languages also use different word orders: "The arrows are X coppers each, so to buy Y arrows you'll need Z silver" may require you to keep track of the order of the numbers.
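One common way to keep track of the order of the numbers is to give translators named placeholders instead of concatenating sentence fragments. In this sketch (hypothetical catalog and function name; the German line is an illustrative translation), the German sentence mentions the count first, and `str.format` still fills each slot correctly.

```python
# Hypothetical message catalog keyed by language code.
MESSAGES = {
    "en": "The arrows are {price} coppers each, so to buy {count} arrows you'll need {total} silver.",
    # German puts the quantity first, so the placeholders appear in a different order.
    "de": "Für {count} Pfeile zu je {price} Kupfer brauchst du {total} Silber.",
}


def arrow_price_message(language: str, price: int, count: int, total: int) -> str:
    """Named fields let each translation order the numbers however it needs."""
    return MESSAGES[language].format(price=price, count=count, total=total)
```

The calling code passes the same three values regardless of language; only the catalog decides where they go.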
Visual C++ and MFC come with internationalization facilities that are actually pretty good. You'd keep the strings in a resource file, and it's possible to substitute numbers and the like in while keeping the order correct for different languages.
Look up "internationalization" (often abbreviated to "i18n") on the web. There's plenty of stuff out there.
As for genders, you may try to encourage translators to use gender-neutral translations (which is usually possible in business applications but might be impossible here).
You may also encounter this problem somewhere else: other (non-English) languages have multiple plural forms. For example: "Your team has acquired 2 swords". No matter how many swords you actually receive, be it 5 or 1000, in English you will always end up with the same plural sentence. But this is not the case in many languages.
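A sketch of choosing plural forms per language, using the integer plural rules that Unicode CLDR publishes for English and Polish (Polish distinguishes "one", "few" and "many"). The message table and function names are hypothetical, and only non-negative integer counts are handled.

```python
def plural_category(language: str, n: int) -> str:
    """CLDR-style plural category for a non-negative integer count."""
    if language == "en":
        return "one" if n == 1 else "other"
    if language == "pl":
        if n == 1:
            return "one"
        # 2-4, 22-24, 32-34, ... but not 12-14
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rules for {language!r}")


# Hypothetical message table: one entry per plural category.
SWORDS = {
    "en": {"one": "{n} sword", "other": "{n} swords"},
    "pl": {"one": "{n} miecz", "few": "{n} miecze", "many": "{n} mieczy"},
}


def swords(language: str, n: int) -> str:
    return SWORDS[language][plural_category(language, n)].format(n=n)
```

English needs two entries, Polish three; the translator fills in the forms and the code only picks the category.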

Descriptive URLs vs. Basic URLs

I have a website and I'm employing Clean URLs to all of the links. I'm wondering what the opinion is about short, basic URLs versus longer, descriptive URLs.
For instance, if my website was about Georgia Bulldog football news, which would be better for SEO purposes?
http://www.example.com/news
or
http://www.example.com/georgia-bulldog-football-news
I've read quite a bit, but I'm torn on the simple vs. descriptive factor. Can anyone give opinions based on SEO experience?
The descriptive format, as the search engine can pick up keywords inside the URL. Apart from that, I don't think there's much difference. I personally prefer the simple format, but I'm obsessed with URLs!
Think of it in terms of the end user.
I don't know how much Google really uses URLs in its rankings. It's something that can be so obviously spoofed (like keywords) that I suspect it's low in their algorithm. The heart of what they do is counting incoming links and trying to discern the meaning of the actual page contents.
But users appreciate readable URLs. It gives them a hint of what they will be getting. I know that a readable URL greatly increases the likelihood that I will click on something (in an email, say).
See here and here for lots of detail on this topic.
No one but Google knows for sure exactly how much this factors into rankings, but it helps, and Google recommends that you use hyphens (just as you demonstrated). This also tends to increase clickthroughs from search result pages. I found this article very useful:
http://searchengineland.com/supercharge-your-urls-for-maximum-seo-impact-14006
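Generating hyphenated slugs from titles is simple. Here is a hedged Python sketch, assuming ASCII-only slugs are acceptable: accented letters are reduced to their base letters and every other run of non-alphanumeric characters becomes a single hyphen. `slugify` is a hypothetical helper, not a specific framework's function.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented letters ("é" -> "e" + accent) and drop the non-ASCII marks.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    text = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")
    return text.lower()
```

For the example in the question, `slugify("Georgia Bulldog Football News")` yields `georgia-bulldog-football-news`.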
Readability is nice, and may help your rankings. In your example, the domain itself also matters, e.g.
georgiabulldogs.com/news
and
southerncollegesports.com/news
will leave a user with very different expectations.
In some cases typability is also important, and long hyphenated or ID ridden URLs are terrible anytime you may expect people to type in a URL.
I love the second one.
No. 1: From an SEO perspective, it is better to add keywords to your URL (better to have keywords than none, though I'm not sure exactly how much this counts in ranking), and stackoverflow.com does the same.
No. 2: From the visitor's standpoint, the second one is readable. It's good for end users, so there is no reason for Google not to count this element!

Pros and cons of using DB id in the URL?

For example: http://stackoverflow.com/questions/396164/exposing-database-ids-security-risk and http://stackoverflow.com/questions/396164/blah-blah loads the same question.
(I guess this is DB id of Questions table? Is this standard in ASP.NET?)
What are the pros and cons of using this type of scheme in your web app?
Well, for one, simple IDs are usually sequential, so it's quite easy to guess them and retrieve other data from your application.
https://stackoverflow.com/questions/395858/doesnt-matter-what-I-type-here
Now, having said that, that might also be seen as a bonus, because nobody in their right mind would make their whole security hinge on the fact that you have to click on a link to get to your secure data, and thus easy discoverability of the data might be good.
However, another point is that if you at some point reindex your database, anything that makes the old URLs invalid would be bad, if for no other reason than that search engines would still have the old links.
Also, here on SO it's quite normal to link to other questions like this, so if they at some point want to reindex and thus renumber things (or move to GUIDs), they will still have to keep the old structure and IDs.
Now, is this likely to ever happen or be needed? Probably no.
I wouldn't worry too much about it; just build your security as though every entry point to your application is known, and there should be no problems.
The database ID is used to look up the question in the database. It's numerical, which means: fast. If you left it out, you would have to look up the title, which is a lot slower.
The question itself is part of the url to make it "search engine friendly". It'll be higher ranked by g**gle etc.
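One way to combine the fast numeric lookup with the readable title is to route on the ID alone and treat the slug as decoration, redirecting to the stored slug when it does not match. This is a minimal sketch with a hypothetical in-memory table and `route` function; it is not necessarily how Stack Overflow itself handles it.

```python
import re
from typing import Optional, Tuple

# Hypothetical "questions" table: numeric primary key -> stored slug.
QUESTIONS = {396164: "exposing-database-ids-security-risk"}

URL_PATTERN = re.compile(r"^/questions/(\d+)(?:/([^/]*))?/?$")


def route(path: str) -> Tuple[int, Optional[str]]:
    match = URL_PATTERN.match(path)
    if not match:
        return 404, None
    question_id = int(match.group(1))
    slug = QUESTIONS.get(question_id)  # indexed lookup on the numeric key
    if slug is None:
        return 404, None
    canonical = f"/questions/{question_id}/{slug}"
    # Any slug (or none) still finds the question; redirect to the canonical one.
    if path != canonical:
        return 301, canonical
    return 200, canonical
```

This keeps the lookup on the indexed ID while search engines only ever see one URL per question.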
Pro:
Super easy to retrieve the page information. Take the ID, call the database, voilà. Your table will (should) be indexed to make this lookup super fast.
Guaranteed unique URL.
Con:
IDs in your system are being publicly displayed. Not a problem in a publicly available system like SO. However, proper security measures on the back end can make this not a problem even on sensitive systems.
Ugly URLs. 6+ digit numbers are just hard to remember, and they make it more difficult to distinguish pages if the number is all that identifies them. This can also have SEO consequences, as URLs with more relevant and well-structured information are generally ranked better. SO compensates by providing the post name in the URL as well. While I still can't rattle off a particular post to my buddy at lunch, I can find it more easily in the browser history.
Slower lookups. Doing text searches on a database is generally slower.
But remember that in a community like this there is a higher (although still minimal) chance of the same question title being posted at the same time, which would break things. Thus some kind of unique identification needs to be applied, and IDs are probably quite logical in the context in which this particular web application was developed.
I don't think it's bad practice; it's fairly common to do it in ASP.NET and other frameworks. As @lassevk said, if your security depends on it, then you need some more checks in there (can user X get to record Y?), but it mostly comes down to the SEO-friendliness of the URLs for public sites.
For example, SO's URLs are fairly friendly:
Pros and cons of using DB id in the URL?
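The "can user X get to record Y" check could be sketched like this, with a hypothetical in-memory table, `owner_id` field, and `fetch_record` function. Returning the same 404 for missing and foreign records is an assumed design choice so that guessing sequential IDs reveals nothing; a 403 is also common.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    id: int
    owner_id: int
    body: str


# Hypothetical table with sequential, easily guessable IDs.
RECORDS = {
    1: Record(1, owner_id=10, body="alice's invoice"),
    2: Record(2, owner_id=20, body="bob's invoice"),
}


def fetch_record(record_id: int, current_user_id: int) -> Tuple[int, Optional[str]]:
    """Authorize every access; knowing the ID from the URL is never enough."""
    record = RECORDS.get(record_id)
    # Same answer for "missing" and "not yours", so IDs cannot be probed.
    if record is None or record.owner_id != current_user_id:
        return 404, None
    return 200, record.body
```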
Google rates information at the START of the URL higher than at the end, so having it look like:
https://stackoverflow.com/pros-and-cons-of-using-db-id-in-the-url/q/407120
should get a higher ranking for "pros and cons of using db id in the url". It's not the only factor, but it is quite a major one - look at Amazon's format, they do it for a very good reason:
http://www.amazon.com/Maverick-Ricardo-Semler/dp/0712678867
http://server/book-name/dp/book-id
WordPress does it like this:
http://server/yyyy/mm/dd/name-of-the-post
however, if you post two posts on the same day called "foo", you get:
http://server/yyyy/mm/dd/foo
http://server/yyyy/mm/dd/foo2
the slug (foo/foo2) isn't a PK, but it IS maintained as unique over the posts table.
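The unique-slug rule described above can be sketched as follows. `unique_slug` is a hypothetical helper, and the numeric suffix format follows the answer's foo/foo2 example rather than any particular WordPress version.

```python
def unique_slug(slug: str, existing: set) -> str:
    """Return `slug`, or the first `slugN` (N >= 2) not already taken."""
    if slug not in existing:
        return slug
    n = 2
    while f"{slug}{n}" in existing:
        n += 1
    return f"{slug}{n}"
```

The uniqueness check would run against the posts table (ideally backed by a unique index), so the slug can act as a lookup key without being the primary key.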
I think putting the ID in the URL isn't a problem, unless your URL is a GUID! Way too long, and hard to type. If it's an int, or some kind of short GUID (e.g. 6-8 chars), then it shouldn't be a problem.

Resources