I often see (rewritten) URLs without an ID in them, as on some WordPress installations. What is the best way to achieve this?
Example: site.com/product/some-product-name/
Maybe keep an array of page names and IDs in a cache, to avoid a DB query on every page request?
How can conflicts be avoided, and what other issues come with using URLs without IDs?
Using an ID presents the same conundrum, really--you're just checking for a different value in your database. The "some-product-name" part of your URL above is also something unique. Some people call them slugs (Wordpress, also permalinks). So instead of querying the database for a row that has the particular ID, you're querying the database for a row that has a particular slug. You don't need to know the ID to retrieve the record.
As long as product names are unique it shouldn't be an issue. It won't take significantly longer to look up a product by unique name than by numeric ID, as long as the column is indexed.
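A minimal sketch of that lookup, using Python's built-in sqlite3 module. The `products` table, its columns, and the `find_product_by_slug` helper are assumptions made up for illustration; the `UNIQUE` constraint is what gives the slug column an index and rejects duplicate names.

```python
import sqlite3

# Hypothetical schema: the UNIQUE constraint on slug creates an index,
# so looking a row up by slug is an indexed search, just like by id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, "
    "slug TEXT NOT NULL UNIQUE, "
    "name TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO products (slug, name) VALUES (?, ?)",
    ("some-product-name", "Some Product Name"),
)


def find_product_by_slug(conn, slug):
    """Return (id, name) for the given slug, or None if there is no match."""
    return conn.execute(
        "SELECT id, name FROM products WHERE slug = ?", (slug,)
    ).fetchone()
```

Inserting a second row with the same slug raises `sqlite3.IntegrityError`, which is one way conflicts surface early instead of at request time.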
Wordpress has a field in the wp_posts table for the slug. When you create the post, it creates a slug from the post title (if that's how you have it configured), replacing spaces with dashes (or I think you can set it to underscores). It also takes out the apostrophes, commas, or whatnot. I believe it also limits the overall length of the slug, too.
So, in short, it isn't dynamically decoding the URL into the post's title--there's a field in the table that matches the URL version of the post name directly.
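As a rough illustration of the kind of sanitizing described above, here is a small Python sketch. It is not WordPress's own routine; the `slugify` name and the `max_length` default are assumptions chosen for the example.

```python
import re
import unicodedata


def slugify(title, max_length=200):
    """Turn a title into a lowercase, dash-separated slug (illustrative only)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes outright so "Joe's" becomes "joes", not "joe-s".
    text = text.lower().replace("'", "")
    # Collapse every other run of non-alphanumeric characters into one dash.
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    # Limit the overall length without leaving a trailing dash.
    return text[:max_length].rstrip("-")
```

The slug would be computed once, when the post is saved, and stored in its own column; requests then match the URL segment against that column directly.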
As you may or may not know, the URLs are being re-written with Apache's mod_rewrite module. As mentioned here, Wordpress is, in the background, assigning a slug after sanitizing the title or post name.
But, to answer your question, what you're describing is Wordpress' "Pretty Permalinks" feature, and you can learn more about it in the Wordpress codex. Newer versions of Wordpress do the rewriting internally (no .htaccess editing; wp_rewrite is used instead), which is why you'll see the same ruleset for any permalink structure.
Though, if you do some digging you can find the old rewrite rules. For example:
RewriteRule ^([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /index.php?year=$1&monthnum=$2&day=$3 [QSA,L]
Will take a URL like /2008/01/01/ and direct it to /index.php?year=2008&monthnum=01&day=01 (and load a date category).
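To see what that pattern captures, the same regular expression can be tried outside Apache. This is only a Python sketch of the matching step; `rewrite_date_url` is a made-up helper, and the leading slash is stripped first because mod_rewrite does the same before matching rules in a per-directory (.htaccess) context.

```python
import re

# Same pattern as the RewriteRule above.
DATE_RULE = re.compile(r"^([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$")


def rewrite_date_url(path):
    """Return the internal URL for a date-style path, or None if it does not match."""
    match = DATE_RULE.match(path.lstrip("/"))
    if match is None:
        return None
    year, month, day = match.groups()
    return f"/index.php?year={year}&monthnum={month}&day={day}"
```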
But, as mentioned, a page like product-name exists only because Wordpress already sanitized the post title and stored it as a field in the database.
Related
I know that with Symfony2 it is trivial to get pretty URLs through the routing system, and I love it. But when the route parameters are based only on slugs, I have to find the entity by slug.
$em->getRepository('Bundle:Entity')->findOneBySlug($slug);
I am thinking about combining both parameters, as Stack Overflow does: http://mysite.com/articles/234/the-title. That is, keeping the slug parameter only for SEO purposes and finding the entity directly by its id (234).
$em->getRepository('Bundle:Entity')->find($id);
What are the pros and cons of this strategy? Am I on the right track?
I would go as you suggested and use both an unique identifier and a slug, because you do not have to worry about unique slugs this way.
But one thing you should do is check that the slug is valid.
So do not use URLs like /articles/{id}/{unchecked-slug}, because then the same article can be reached through an unlimited number of different or malicious URLs, e.g. /articles/123/the-correct-title and /articles/123/some-dirty-words.
So I would suggest using something like this:
$em->getRepository('Bundle:Entity')->findOneBy(array('slug' => $slug, 'id' => $id));
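The effect of matching on both values can be sketched without a framework. The in-memory `ARTICLES` list and the `find_article` helper below are stand-ins invented for the example, not part of Symfony or Doctrine.

```python
# Hypothetical stand-in for the articles table.
ARTICLES = [
    {"id": 123, "slug": "the-correct-title", "title": "The Correct Title"},
]


def find_article(article_id, slug):
    """Return the article only when both the id and the slug match."""
    for article in ARTICLES:
        if article["id"] == article_id and article["slug"] == slug:
            return article
    # A wrong slug is treated exactly like an unknown id (e.g. respond with 404).
    return None
```

With this check, /articles/123/some-dirty-words finds nothing, so only one URL per article resolves.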
I am not an SEO expert, but I do not think that shorter URLs are THAT important, as long as the URL contains useful words that may be part of a search.
From a pure SEO perspective, you want to have a shorter URL since they tend to attract more clicks and are easier to share. However, catering to only SEO would be a mistake IMHO.
Adding a unique identifier to the string would be a smart thing to do, and would make things easier to look up and maintain. I would suggest putting the unique identifier at the end of the URL string to maximize the "SEO effect".
Keywords in the URL might be a ranking signal, but really they drive up the CTR if the keywords found in the URL match the user's query. When that happens, the keywords in the URL become bolded in the Search Results Page (SERP). By putting the ID at the end of the URL, you're helping to ensure that the keywords in the slug have a better chance of appearing to the user, which means a better chance of being bolded, which hopefully leads to more CTR.
Here's what I would suggest:
http://example.com/articles/the-title-234
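With the ID at the end, the server has to split it back off the last path segment. A possible way to do that in Python is shown here; the `split_slug_and_id` name and the assumption that slugs contain only lowercase letters, digits, and dashes are mine.

```python
import re

# Slug made of dash-separated lowercase words, followed by "-<digits>".
TRAILING_ID = re.compile(r"^(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<id>[0-9]+)$")


def split_slug_and_id(segment):
    """Split 'the-title-234' into ('the-title', 234); return None if no id is present."""
    match = TRAILING_ID.match(segment)
    if match is None:
        return None
    return match.group("slug"), int(match.group("id"))
```

Titles that themselves end in a number still work, because only the final dash-separated group is taken as the ID.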
No one has suggested it so far, so I'll offer what WordPress does. If there is already a permalink in the database that is identical to the one being supplied, you simply concatenate a counter at the end.
http://example.com/blog/my-article
becomes
http://example.com/blog/my-article-2
becomes
http://example.com/blog/my-article-3
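The counter approach can be sketched in a few lines of Python. This illustrates the idea only and is not WordPress's code; `unique_slug` and the `existing` set (standing in for a database check) are assumptions.

```python
def unique_slug(base, existing):
    """Return base, or base with the first free numeric suffix starting at 2."""
    if base not in existing:
        return base
    counter = 2
    while f"{base}-{counter}" in existing:
        counter += 1
    return f"{base}-{counter}"
```

In a real application the check and the insert would need to happen atomically (for example, by relying on a unique index and retrying), otherwise two simultaneous saves could pick the same suffix.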
The method eywu suggested is second best, but only because you still have the full ID in the permalink. No one wants to remember that, and it has no meaning to search engines.
So I wish to implement URL rewrite once my site is done but I wish to have it in this format.
site.com/city/example-deal
Currently once a city is chosen it links to a page in the following format:
site.com/city.php?city=atlanta
Then once on that page, a deal is selected from there and it links to the next page:
site.com/deal.php?deal=123
With that in mind, could I rewrite it as such with my current linking structure:
site.com/atlanta/example-deal or do I have to link the page as such:
site.com/city.php?city=atlanta/deal.php?deal=123 in order to get the final URL rewrite structure I'm looking for.
Hopefully I explained this right and thanks for the help!
What you need to do is have deal.php read the city from the query string.
You should also slugify your deal names so that you can derive the deal id from the deal slug.
Here's an example of a slugify function in php:
http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php
RewriteRule ^([^./]+)/deals/(.*)$ deal.php?city=$1&deal_slug=$2 [QSA]
Also your deal table in MySQL should be modified to store the slug. With that your deal.php can be modified so that:
// get deal slug from query string
//select from deal table where deal slug = submitted deal slug
// continue with normal code.
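The steps in those comments might look like this in outline. The sketch is in Python rather than PHP, follows the shape of the rewrite rule above (a city segment, then /deals/, then the deal slug), and uses a made-up `DEALS` dictionary in place of the MySQL deal table.

```python
import re

# City segment, literal "/deals/", then the deal slug.
DEAL_PATH = re.compile(r"^/([^./]+)/deals/([^/]+)/?$")

# Hypothetical stand-in for the deal table: (city, slug) -> deal id.
DEALS = {("atlanta", "example-deal"): 123}


def resolve_deal(path):
    """Return the deal id for a pretty URL path, or None if nothing matches."""
    match = DEAL_PATH.match(path)
    if match is None:
        return None
    city, deal_slug = match.groups()
    # Equivalent of: SELECT id FROM deal WHERE city = ? AND slug = ?
    return DEALS.get((city, deal_slug))
```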
I am using ColdFusion 9.
I am creating a brand new site that uses three templates. The first template is the home page, where users are prompted to select a brand or a specific model. The second template is where the user can view all of the models of the selected brand. The third template shows all of the specific information on a specific model.
A long time ago... I would make the URLs like this:
.com/Index.cfm // home page
.com/Brands.cfm?BrandID=123 // specific brand page
.com/Models.cfm?ModelID=123 // specific model page
Now, for SEO purposes and for easy reading, I might want my URLs to look like this:
.com/? // home page
.com/?Brand=Worthington
.com/?Brand=Worthington&Model=TX193A
Or, I might want my URLs to look like this:
.com/? // home
.com/?Worthington // specific brand
.com/?Worthington/TX193A // specific model
My question is, are there really any SEO benefits or easy reading or security benefits to either naming convention?
Is there a best URL naming convention to use?
Is there a real benefit to having a URL like this?
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
Use URLs that make sense for your users. If you use sensible URLs which humans understand, it'll work with search engines too.
i.e. Don't do SEO, do HO. Human Optimisation. Optimise your pages for the users of your page and in doing so you'll make Google (and others) happy.
Do NOT stuff keywords into URLs unless it helps the people your site is for.
To decide what your URL should look like, you need to understand what the parts of a URL are for.
So, given this URL: http://domain.com/whatever/you/like/here?q=search_terms#page-fragment.
It breaks down like this:
http
what protocol is used to deliver the page
:
divides protocol from rest of URL
//domain.com
indicates what server to load
/whatever/you/like/here
Between the domain and the ? should indicate which page to load.
?
divides query string from rest of URL
q=search_terms
Between the ? and the # can be used for a dynamic search query or setting.
#
divides page fragment from rest of the URL
page-fragment
Between the # and the end of line indicates which part of the page to focus on.
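The same breakdown can be checked with Python's standard library, which splits a URL along exactly these boundaries. Only `urlsplit` and `parse_qs` from `urllib.parse` are used; the variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://domain.com/whatever/you/like/here?q=search_terms#page-fragment")

scheme = parts.scheme      # protocol, before the ":"
server = parts.netloc      # server, after the "//"
page = parts.path          # which page to load
query = parts.query        # between "?" and "#"
fragment = parts.fragment  # after "#"

# The query string can be decoded further into a dict of value lists.
params = parse_qs(query)
```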
If your system setup lets you, a system like this is probably the most human friendly:
domain.com
domain.com/Worthington
domain.com/Worthington/TX193A
However, sometimes a unique ID is needed to ensure there is no ambiguity (with SO, there might be multiple questions with the same title, thus why ID is included, whilst the question is included because it's easier for humans that way).
Since all models must belong to a brand, you don't need both ID numbers though, so you can use something like this:
domain.com
domain.com/123/Worthington
domain.com/456/Worthington/TX193A
(where 123 is the brand number, and 456 is the model number)
You only need extra things (like /questions/ or /index.cfm or /brand.cfm or whatever) if you are unable to disambiguate different pages without them.
Remember: this part of the URL identifies the page - it needs to be possible to identify a single page with a single URL - to put it another way, every page should have a unique URL, and every unique URL should be a different page. (Excluding the query string and page fragment parts.)
Again, using the SO example: there are more than just questions here; there are users, tags, and so on too. So they couldn't just use stackoverflow.com/7275745/question-title, because it's not clearly distinct from stackoverflow.com/651924/evik-james. They solve this by inserting /questions and /users into those URLs to make it obvious what each one is.
Ultimately, the best URL system to use depends on what pages your site has and who the people using your site are - you need to consider these and come up with a suitable solution. Simpler URLs are better, but too much simplicity may cause confusion.
Hopefully this all makes sense?
Here is an answer based on what I know about SEO and what we have implemented:
The first thing that gets searched and considered is your domain name, so choosing a relevant domain name is very important.
A URL with a query string has lower priority than one without. The reason is that a query string is associated with dynamic content that could change over time. The search engine might also deprioritize URLs with query strings for fear that they are being used for spam, which would dilute the search results.
As for using the URL such as
http://stackoverflow.com/questions/7113295/sql-should-i-use-a-junction-table-or-not
As the search engine looks at both the domain and the path, having the question in the path will help the search engine and mark the question as a more relevant page when someone types part of the question into the search engine.
I am not an SEO expert, but the company I work for has a dedicated department for managing the SEO of our site. They much prefer the params to be in the URI rather than in the query string, and I'm sure they prefer this for a reason (not simply to make the web team's job slightly trickier... although there could be an element of that ;-)
That said, the bulk of what they concern themselves with is the content within and composition of the page. The domain name and URL are insignificant compared to having good, relevant content in a well defined structure.
I have a news section where the pages resolve to urls like
newsArticle.php?id=210
What I would like to do is use the title from the database to create seo friendly titles like
newsArticle/joe-goes-to-town
Any ideas how I can achieve this?
Thanks,
R.
I suggest you actually include the ID in the URL, before the title part, and ignore the title itself when routing. So your URL might become
/news/210/joe-goes-to-town
That's exactly what Stack Overflow does, and it works well. It means that the title can change without links breaking.
Obviously the exact details will depend on what platform you're using - you haven't specified - but the basic steps will be:
When generating a link, take the article title and convert it into something URL-friendly; you probably want to remove all punctuation, and you should consider accented characters etc. Bear in mind that the title won't need to be unique, because you've got the ID as well
When handling a request to anything starting with /news, take the next part of the path, parse it as an integer and load the appropriate article.
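The request-handling step could be sketched like this. It is written in Python only for illustration, since the platform is unspecified; `NEWS_PATH` and `article_id_from_path` are invented names.

```python
import re

# "/news/", the numeric id, then an optional title segment that is ignored.
NEWS_PATH = re.compile(r"^/news/([0-9]+)(?:/[^/]*)?/?$")


def article_id_from_path(path):
    """Return the article id from a /news/<id>/<title> path, ignoring the title."""
    match = NEWS_PATH.match(path)
    if match is None:
        return None
    return int(match.group(1))
```

Because the title part is never consulted, /news/210/joe-goes-to-town and /news/210/a-renamed-title both load article 210, which is what keeps old links working after a title change.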
Assuming you are using PHP and can alter your source code (this is quite mandatory to get the article's title), I'd do the following:
First, you'll need to have a function (or maybe a method in an object-oriented architecture) to generate the URLs for you in your code. You'd supply the function with the article object or the article ID and it returns the friendly URL with the ID and the friendly title.
Basically function url(Article $article) => URL.
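A possible shape for such a function, shown in Python for brevity (the same idea carries over to PHP). The `Article` class, the `article_url` name, and the /news/ prefix are assumptions for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    id: int
    title: str


def article_url(article):
    """Build the friendly URL from the article's id and a URL-safe form of its title."""
    # Lowercase the title and replace every run of other characters with a dash.
    friendly_title = re.sub(r"[^a-z0-9]+", "-", article.title.lower()).strip("-")
    return f"/news/{article.id}/{friendly_title}"
```

Generating every link through one function means the URL format can later change in a single place.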
You will also need some URL rewriting rules to remove the PHP script from the URL. For Apache, refer to the mod_rewrite documentation for details (RewriteEngine, RewriteRule, RewriteCond).
What should I use:
/findby/name/{first}_{last}
/findby/name/{first}-{last}
/findby/name/{first};{last}
/findby/name/first/{first}/last/{last}
etc.
The URI represents a Person resource with 1 name, but I need to logically separate the first from the last to identify each. I kind of like the last example because I can do:
/findby/name/first/{first}
/findby/name/last/{last}
/findby/name/first/{first}/last/{last}
You could always just accept spaces :-) (querystring escaped as %20)
But my preference is to just use dashes (-), which look nicer in the URL, unless you need to be able to query on each part separately, in which case the last example is better, as you noted.
Why not use + for space?
I am at a loss: dashes, minuses, underscores, %20... why not just use +? This is how spaces are normally encoded in query parameters. Yes, you can use %20 too, but why? It looks ugly.
I'd do
/personNamed/Joe+Blow
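One caveat is easy to check with Python's `urllib.parse`: the + convention belongs to form-style (query string) encoding, and generic percent-decoding leaves a + untouched. So a server handling /personNamed/Joe+Blow as a path would likely have to decode the + itself. The variable names below are just for the demonstration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "Joe Blow"

form_encoded = quote_plus(name)   # form/query-string style: space becomes "+"
percent_encoded = quote(name)     # generic percent-encoding: space becomes "%20"

# Decoding "+" back to a space only happens with the form-style decoder.
decoded_as_form = unquote_plus("Joe+Blow")
decoded_as_path = unquote("Joe+Blow")
```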
I like using "_" because it is the most similar character to space that keeps the URL readable.
However, the URLs you provided don't seem really RESTful. A URL should represent a resource, but in your case it represents a search query. So I would do something like this:
/people/{first}_{last}
/people/{first}_{last}_(2) - in case there are duplicate names
In this case you have to store the slug ({first}_{last}, {first}_{last}_(2)) for each user record. Another option is to prepend the ID, so you don't have to bother with slugs:
/people/{id}-{first}_{last}
And for search you can use non-RESTful URLs:
/people/search?last={last}&first={first}
These would display a list of search results, while the URLs above display the page for a particular person.
I don't think there is any use in making the search URLs RESTful; users will most likely want to share links to a certain person's page and not to search result pages. As for search engines, avoid having the same content under multiple URLs, and you should even deny indexing of your search result pages in robots.txt.
For searching:
/people/search?first={first}&last={last}
/people/search?first=george&last=washington
For resource paths:
/people/{id}-{first}-{last}
/people/35-george-washington
If you are using Ruby on Rails v3 in standard configuration, here's how you can do it.
# set up the /people/{param} piece
# config/routes.rb
My::Application.routes.draw do
  resources :people
end
# set up that {param} should be {id}-{first}-{last}
# app/models/person.rb
class Person < ActiveRecord::Base
  # to_slug is assumed to be a helper, defined elsewhere, that makes a string URL-safe
  def to_param
    "#{id}-#{to_slug(first_name)}-#{to_slug(last_name)}"
  end
end
Note that your suggestion, /findby/name/first/{first}/last/{last}, is not restful. It does not name resources and it does not name them succinctly.
The most sophisticated choice should always and first of all consider two constraints:
As you'll never know how well the developer, or the device the client is implemented on, handles URL encoding, I will always try to limit myself to the table of safe characters, as found in the excellent rant (Please) Stop Using Unsafe Characters in URLs.
Also, we want to consider the client consuming the API. Can we have the whole structure easily represented and accessible in the client-side programming language? What special characters would this requirement leave us with? For example, a $ is fine in JavaScript variable names and thus directly accessible in the parsed result, but a PHP client will still have to use a more complex (and potentially more confusing) notation, $userResult->{'$mostVisited'}->someProperty... that's a shot in your own foot! So for those two (and a couple of other programming environments) the underscore seems the only valid option.
Otherwise I mostly agree with @yfeldblum's response: I'd distinguish between a search endpoint and the actual unique resource lookup. It feels more RESTful to me, but more importantly, the two have significantly different costs on your API server. Keeping them separate makes it easier to tell them apart and, for example, charge more for or rate limit the search endpoint, should you ever need to.
To be pragmatic, as opposed to a "RESTafarian": the mentioned approach /people/35-george-washington could (and should, imho) basically respond to just the id, so if you want a named, urlsafe-for-dummies link, list the reference as /people/35_george_washington. Other ideas could be /people/35/#GeorgeWashington (breaking tons of RFCs) or /people/35_GeorgeWashington - the API wouldn't care.