Advanced URLs and URL rewriting

I was visiting the site asos.com the other day. If you search 'tshirt' on their site the resulting URL is 'http://www.asos.com/search/tshirt?q=tshirt'. Does anyone know which technique they use to make it seem that they generate a page called 'tshirt' on the fly, for whatever search term is entered?
Also, if you select a product the URL becomes something like 'http://www.asos.com/ralph_lauren/polo/product.aspx'. I know they don't have a file and folder for every brand and item, so how is it possible for the browser to follow this URL?
I'm not looking for any code, just a hint on what to google for more information.
Hope this doesn't sound too ignorant!
Many Regards,
Andreas

In most cases, this sort of functionality (often called clean URLs, user-friendly URLs, or spider-friendly URLs) is achieved through server-side rewrites, which point all requests of a specific known structure to a single backend script for processing.
Now these specific URLs you mention are not, in my opinion, the best examples of clean URLs. I will, however, give you an example of how such a clean URL might be achieved using Apache mod_rewrite (since Apache is so popular).
Take for example a URL like http://somedomain.com/product/ralph_lauren/polo
You might be able to do something like this in mod_rewrite
RewriteEngine On
RewriteRule ^/?product/([^/]+)/([^/]+)/?$ /product.php?cat=$1&subcat=$2 [L]
This would silently (to the end user) rewrite the incoming request for any URLs of the structure /product/*/* to a script called /product.php, passing the second and third parts of the URL as cat and subcat parameters to be evaluated by the script. The address shown in the browser does not change, because the rewrite happens inside the server.
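To see what such a rule does to a path, here is a minimal Python sketch of the same idea. It is not how Apache implements it; the function name rewrite_product_url is made up for illustration, and the pattern used here deliberately stops each segment at the next slash.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration: two non-empty path segments after /product/.
PRODUCT_PATTERN = re.compile(r"^/?product/([^/]+)/([^/]+)/?$")


def rewrite_product_url(path: str) -> Optional[str]:
    """Map a clean product path to the backend script URL, or None if it does not match."""
    match = PRODUCT_PATTERN.match(path)
    if match is None:
        return None
    cat, subcat = match.groups()
    # The captured segments become the cat and subcat query parameters.
    return f"/product.php?cat={cat}&subcat={subcat}"


print(rewrite_product_url("/product/ralph_lauren/polo"))
```

A path that does not fit the structure (for example /about) returns None, which corresponds to the rule simply not applying.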

I'm not sure I understand what you are asking, but the example you cited uses a query string, which is everything after the '?' in the URL.
The backend server uses the variables passed in the query string to determine what to return to you.
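As a rough illustration of what the backend sees, this Python sketch pulls the variables out of the query string of the URL from the question. The helper name query_params is made up; the parsing itself is done by the standard library.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Return the query string variables of a URL as a dict of lists."""
    # urlsplit isolates the part after '?'; parse_qs decodes it into name -> [values].
    return parse_qs(urlsplit(url).query)


params = query_params("http://www.asos.com/search/tshirt?q=tshirt")
print(params["q"][0])
```

Each value is a list because the same name may legally appear more than once in a query string.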

Related

What are URL codes called?

I came across a website with a blog post explaining how to clear the cache for web development purposes. My personal favourite is to add /? to the end of a web address in the URL bar.
Are there any more little codes like that? If so, what are they and where can I find a cheat sheet?
Appending /? may work for some URLs, but not for all.
It works if the server/site is configured in a way that, for example, http://example.com/foo and http://example.com/foo/? deliver the same document. But this is not the case for all servers/sites, and the defaults can be changed anyway.
There is no name for this. You just manipulate the canonical URL, hoping to craft a URL that points to the same document, without getting redirected.
Other common variants?
I’d expect appending ? would work even more often than /? (both, of course, only work if the URL has no query component already).
http://example.com/foo
http://example.com/foo?
You’ll also find sites that allow any number of additional slashes where only one slash used to be.
http://example.com/foo/bar
http://example.com/foo////bar
Not sure if it affects the cache, but specifying the domain as FQDN, by adding a dot after the TLD, would work for many sites, too.
http://example.com/foo
http://example.com./foo
Some sites might not have case-sensitive paths.
http://example.com/foo
http://example.com/fOo
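If you want to try a few of these variants quickly, a small Python sketch like the following could generate them. The function name url_variants is made up, and it assumes a plain URL without port, user info, or fragment. Whether a given server actually delivers the same document for each variant still has to be tested per site.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def url_variants(url: str) -> List[str]:
    """Build alternative spellings of a URL that may point to the same document."""
    parts = urlsplit(url)
    variants = []
    if not parts.query:
        # Appending ? or /? only makes sense if there is no query component yet.
        variants.append(url + "?")
        variants.append(url.rstrip("/") + "/?")
    # FQDN form: a dot after the TLD (assumes the netloc is a bare host name).
    variants.append(
        urlunsplit((parts.scheme, parts.netloc + ".", parts.path, parts.query, parts.fragment))
    )
    return variants


for variant in url_variants("http://example.com/foo"):
    print(variant)
```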

How this particular url rewrite can be done

I have a site where both of the following URLs go to same page.
www.mydomain.com/pd/Children_Products/Toys/1/
www.mydomain.com/pd/Children_Products/Toys_/1/
First URL has Toys, 2nd has Toys_.
This is not good for search engines. When the second URL is visited, I want it to either return a 404 Not Found page or simply redirect to the first URL.
I am not looking for a solution to this particular URL only, but for all such instances where this happens on my site for different directories.
Any help would be greatly appreciated.
Thanks!
You should provide more details about your setup. Do you use Apache HTTPD? Do your pages have some kind of code in them, like PHP?
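You could take a look at mod_rewrite, which can be configured using regular expressions. A rule like the following matches a _ directly before a slash in the URL and redirects to the same URL without it. The R=301 flag makes the redirect permanent, which is what search engines expect when a duplicate URL is retired:
RewriteEngine On
RewriteRule ^(.*)_/(.*)$ /$1/$2 [R=301,L]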
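To check which paths would be affected before touching the server configuration, the same idea can be sketched in Python. This is only an illustration: the function name canonical_path is made up, and unlike a single rewrite rule it strips trailing underscores from every path segment in one pass.

```python
import re
from typing import Optional

# One or more underscores that sit directly before a slash or at the end of the path.
TRAILING_UNDERSCORE = re.compile(r"_+(?=/|$)")


def canonical_path(path: str) -> Optional[str]:
    """Return the redirect target for a path with stray underscores, or None if it is already clean."""
    cleaned = TRAILING_UNDERSCORE.sub("", path)
    return cleaned if cleaned != path else None


print(canonical_path("/pd/Children_Products/Toys_/1/"))
```

Underscores inside a segment, as in Children_Products, are left alone because they are not followed by a slash.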

Hide website filenames in URL

I would like to hide the webpage name in the url and only display either the domain name or parts of it.
For example:
I have a website called "MyWebSite". The url is: localhost:8080/mywebsite/welcome.xhtml. I would like to display only the "localhost:8080/mywebsite/".
However if the page is at, for example, localhost:8080/mywebsite/restricted/restricted.xhtml then I would like to display localhost:8080/mywebsite/restricted/.
I believe this can be done in the web.xml file.
I believe that you want URL rewriting. Check out this link: http://en.wikipedia.org/wiki/Rewrite_engine - there are many approaches to URL rewriting, you need to decide what is appropriate for you. Some of the approaches do make use of the web.config file.
You can do this in several ways. The one I see most is to have a "front door" called a rewrite engine that parses the URL dynamically to internally redirect the request, without exposing details about how that might happen as you would see if you used simple query strings, etc. This allows the URL you specify to be digested into a request for a master page with specific content, instead of just looking up a physical page at that location to serve.
The StackExchange sites do this so that you can link to a question in a semi-permanent fashion (and thus can use search engines with crawlers that log these URLs) without them having to have a real page in the file system for every question that's ever been asked (we're up to 9,387,788 questions as of this one).

How to improve the structure of URLs

Based on the article at Google's webmaster center and the SEO PDF, I think I should improve my website's URL structure.
Now the news url looks like "news.php?id=127591". I need to rewrite it to something like "/news/127591/this-is-article-subject"
The problem is what happens if I change the URL structure to the new one. Can I still keep the old one working? If both URLs work, how do I stop search engines like Google and Bing from indexing the same article twice?
Thanks!
Use an HTTP 301 permanent redirect from the old URL to the new URL.
An HTTP 301 redirect communicates a new (permanent) URL for an old (outdated) resource to Google (and other clients). Google will transfer most or all of the value allocated to the old URL to the new URL.
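A minimal Python sketch of the mapping might look like this. It is not tied to any particular server or framework; the function names and the titles lookup (article id to title, standing in for a database query) are assumptions made for illustration.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def slugify(title: str) -> str:
    """Lower-case a title and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def redirect_for(old_url: str, titles: Dict[int, str]) -> Optional[Tuple[int, str]]:
    """Return (301, new_path) for an old news.php URL, or None if no redirect applies."""
    parts = urlsplit(old_url)
    if parts.path != "/news.php":
        return None
    ids = parse_qs(parts.query).get("id", [])
    if not ids or not re.fullmatch(r"[0-9]+", ids[0]):
        return None  # missing or non-numeric id: nothing to redirect
    article_id = int(ids[0])
    title = titles.get(article_id)  # hypothetical lookup; a real site would query its database
    if title is None:
        return None
    return 301, f"/news/{article_id}/{slugify(title)}"


print(redirect_for("/news.php?id=127591", {127591: "This is Article Subject"}))
```

The old URL keeps working for visitors and existing links, but every client is told that the new URL is the only permanent address.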
Also, in order to improve the architecture of your website, you must keep a clean structure by inserting links within all its pages/posts. But be careful: you must not do this lightly, or Google's robot will get confused and leave.
Structure is key to your SEO
1. Find the one page which is the "really important page" for any given keyword
2. Direct relevant content on other pages to the page for that particular keyword
3. Repeat with every relevant keyword
I'm going to leave this post for you, where I explain this in more depth, hoping that you understand Spanish. http://coach2coach.es/la-estructura-web-es-la-base-del-posicionamiento/
Yep, you can use robots.txt to exclude news.php, and create an XML sitemap with the new URLs. mod_rewrite can be set to only rewrite directories, with trailing slashes, so all files in your root directory should work fine.

Enable Query Strings in Code Igniter

I am trying to implement Twitter's OAuth in my CodeIgniter web application. The callback URL is /auth/, so once you have authenticated with Twitter you are taken to /auth/?oauth_token=SOME-TOKEN.
I want to keep the nice clean URLs the framework provides using the /controller/method/ style of URL, but I want to enable query strings as well. There will only ever be one parameter name, oauth_token, so it's OK if it has to be hard coded.
Any ideas?
I have tried tons of the things people are saying to do, but none work :(
PS: I'm using the .htaccess method of URL rewriting.
There are several ways to handle this.
Most people, and Elliot Haughin's Twitter Lib, extend the CI_Input library with a MY_Input library that sets allow_query_strings to true.
You will also need to add ? to the allowed characters in config/config.php and set $config['uri_protocol'] to PATH_INFO.
see here: Enable GET in CodeIgniter
CodeIgniter Reactor lets you access $_GET directly or via $this->input->get(). You don't need to use MY_Input or even change your config.php. This method leaves the query string in the URL, however.
I used a hacked index.php to recognise users coming back from Twitter, check for valid and safe values, then redirect them to a CodeIgniter-friendly URL.
It may not be to everyone's taste, but I preferred it over allowing query strings throughout the entire application instead of in just one particular circumstance.
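The original hack was in PHP, but the logic can be sketched in Python to show the idea. Everything specific here is an assumption for illustration: the function name, the allowed token characters, and the target route /auth/callback/ are hypothetical and not part of CodeIgniter or Twitter's API.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed safe alphabet for the token; adjust to whatever the provider actually issues.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_-]+")


def friendly_callback_url(request_url: str) -> Optional[str]:
    """Turn /auth/?oauth_token=... into a segment-style URL, or None if the request is not a valid callback."""
    parts = urlsplit(request_url)
    if parts.path.rstrip("/") != "/auth":
        return None  # only the one callback URL is special-cased
    tokens = parse_qs(parts.query).get("oauth_token", [])
    if len(tokens) != 1 or not SAFE_TOKEN.fullmatch(tokens[0]):
        return None  # reject missing, repeated, or unsafe values
    # Hypothetical controller/method route that receives the token as a URL segment.
    return f"/auth/callback/{tokens[0]}"


print(friendly_callback_url("/auth/?oauth_token=SOME-TOKEN"))
```

Because only this one path and one parameter are handled, the rest of the application never has to accept query strings.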

Resources