How can this particular URL rewrite be done?

I have a site where both of the following URLs go to the same page.
www.mydomain.com/pd/Children_Products/Toys/1/
www.mydomain.com/pd/Children_Products/Toys_/1/
The first URL has Toys, the second has Toys_.
This is not good for search engines. I want the second URL, when visited, to either return a 404 Not Found page or simply redirect to the first URL.
I am not looking for a solution for this particular URL only, but for all such instances where this happens on my site, across different directories.
Any help would be greatly appreciated.
Thanks!

You should provide more details about your setup. Do you use Apache HTTPD? Do your pages have some kind of code in them, like PHP?
You could take a look at mod_rewrite, which can be configured using regular expressions that would match any _ in the URL. For example:
RewriteEngine On
RewriteRule ^(.*)_/(.*)$ /$1/$2 [R=301,L]
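For reference, here is a slightly tightened version of that idea as it might look in an .htaccess file. This is a sketch under the assumption that only an underscore sitting immediately before a slash should be stripped, so directory names like Children_Products keep their internal underscores:

```apache
# Hedged .htaccess sketch (assumes Apache with mod_rewrite enabled).
# Only an underscore immediately before a slash is removed, so
# /pd/Children_Products/Toys_/1/ redirects to /pd/Children_Products/Toys/1/
# while the underscore inside Children_Products is left untouched.
RewriteEngine On
RewriteRule ^(.*[^/])_/(.*)$ /$1/$2 [R=301,L]
```

Because the redirect is external (R=301), a URL containing several such trailing-underscore segments is cleaned up over a couple of successive redirects, one underscore per request.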

Joomla URL junk

So, I've set up a site with Search Engine Friendly URLs set to YES and page aliases configured. My main URLs are fine, but for some reason those pages can also be accessed through weird links like mysite.com/component/content/article/17-category/61-article-name.html instead of just mysite.com/category/article-name.html, which is what I want and what I have in my sitemap.
Why is Joomla generating these redundant URLs, and how can I get rid of them (so that when somebody clicks one in Google it takes them to a 404)?
Thanks
PS. The answer to the question "How to clean up Joomla! URLs?" does not help me.
As per http://magazine.joomla.org/issues/issue-june-2013/item/1054-duplicate-pages-joomla-causes-errors-solutions,
I used a 301 redirect in the .htaccess file to redirect the duplicated URLs.
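The redirect can be expressed as a mod_rewrite rule. The pattern below is an assumption based on the URL shape in the question (Joomla's numeric-ID prefixes before the category and article aliases), so the capture groups and the target path would need checking against the real site:

```apache
# Hedged .htaccess sketch (assumes Apache with mod_rewrite and the
# ID-prefixed URL shape from the question).
# /component/content/article/17-category/61-article-name.html
#   -> /category/article-name.html (301, so Google drops the duplicate)
RewriteEngine On
RewriteRule ^component/content/article/[0-9]+-([^/]+)/[0-9]+-([^/]+\.html)$ /$1/$2 [R=301,L]
```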

Google indexing only specific domain

I have a problem with Google indexing.
The thing is that I have several domains (language versions), for example www.example.com, www.example.co.uk and www.example.de.
Each site has its own language and its own SEO links, for example:
search/en-us/building/window/77/1/
search/en-gb/building/window/77/1/
search/de-de/gebaude/fenster/77/1/
Now Google is indexing those SEO URLs under the wrong domains, e.g. www.example.co.uk/search/de-de/gebaude/fenster/77/1/, which should correctly be www.example.de/search/de-de/gebaude/fenster/77/1/.
Every language is bound to its own domain, so there is no case in which EN-GB should lead to the German version on www.example.de.
I am open to all advice. Thank you in advance.
This kind of thing happens. It's easier than you think to create a bad link: one hard-coded domain name in the wrong place and you have a link to somewhere you didn't mean to link. Once Googlebot finds a site served on the wrong domain, it will crawl the whole thing. Even if you are correct that you would never create such a link, maybe there is an external link created by a user who got the domain name confused.
The technique that you need to apply is called URL canonicalization. You have two options:
Put a rel="canonical" link element on each of your pages. search/de-de/gebaude/fenster/77/1/ would then declare the correct domain in its canonical URL: http://www.example.de/search/de-de/gebaude/fenster/77/1/. You'd have to make sure that the domain in the canonical tag is the .de domain whether the page is accessed on the .co.uk domain or on the .de domain.
Use 301 redirects to correct any URLs that are wrong. Have your server detect that the domain name is wrong and 301 redirect www.example.co.uk/search/de-de/gebaude/fenster/77/1/ to www.example.de/search/de-de/gebaude/fenster/77/1/.
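The second option can be sketched with mod_rewrite. This is a hedged example under the assumption that the sites run on Apache and that the language code in the path always determines the correct host; the host and path names are taken from the question:

```apache
# Hedged .htaccess sketch: any de-de URL requested on a host other than
# www.example.de is permanently redirected to the .de domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.de$ [NC]
RewriteRule ^search/de-de/(.*)$ http://www.example.de/search/de-de/$1 [R=301,L]
```

An equivalent pair of directives would be needed for en-gb on the .co.uk host and en-us on the .com host.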
Another random piece of SEO advice: I would remove the word "search" from your URLs. Google doesn't like to index site search results in its search results. I've seen cases where it assumes that the word "search" in the URL indicates that the page shouldn't be indexed for this reason.

Advanced URLs and URL rewriting

I was visiting the site asos.com the other day. If you search for 'tshirt' on their site, the resulting URL is 'http://www.asos.com/search/tshirt?q=tshirt'. Does anyone know which technique they use to make it seem as though they generate a live page called 'tshirt', one that basically accepts any extension?
Also, if you select a product the URL becomes something like 'http://www.asos.com/ralph_lauren/polo/product.aspx'. I know they don't have a file and folder for every brand and item, so how is it possible for the browser to follow this URL?
I'm not looking for any code, just a hint on what to google for more information.
Hope this doesn't sound too ignorant!
Many Regards,
Andreas
In most cases, this sort of functionality (often called clean URLs, user-friendly URLs, or spider-friendly URLs) is achieved through server-side rewrites, which point all requests of a specific known structure to a single backend script for processing.
Now, these specific URLs you mention are not, in my opinion, the best examples of clean URLs. I will give you an example, however, of how such a clean URL might be achieved using Apache mod_rewrite (since Apache is so popular).
Take for example a URL like http://somedomain.com/product/ralph_lauren/polo
You might be able to do something like this in mod_rewrite
RewriteEngine On
RewriteRule ^/?product/([^/]+)/([^/]+)$ /product.php?cat=$1&subcat=$2 [L]
This would silently (to the end user) rewrite any incoming request of the structure /product/*/* to a script called /product.php, passing the second and third parts of the path as cat and subcat parameters to be evaluated by the script.
I'm not sure I understand what you are asking, but the example you cited uses a query string, which is everything after the '?' in the URL.
On the backend server it uses the variables passed in the query string to determine what to return back to you.
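For the /search/tshirt part of the question, a rewrite along these lines is one plausible way to do it (a sketch only: the handler name search.php and its term parameter are assumptions, and asos.com may do something entirely different):

```apache
# Hedged sketch: route /search/<term> to a hypothetical search script,
# passing the path segment as a parameter. QSA appends any existing
# query string, so /search/tshirt?q=tshirt keeps its q parameter too.
RewriteEngine On
RewriteRule ^search/([^/]+)$ /search.php?term=$1 [L,QSA]
```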

Remove multiple indexed URLs (duplicates) with redirect

I am managing a website that has only about 20-50 pages (articles, links, etc.). Somehow, Google indexed over 1000 links (duplicates: the same pages with different strings in the URL). I found that those links contain ?date= in the URL. I have already blocked them by writing Disallow: *date* in robots.txt, made an XML sitemap (which I did not have before), placed it in the root folder and imported it into Google Webmaster Tools. But the problem remains: the links are (and probably will stay) in the search results. I would happily remove the URLs in GWT, but it can only remove one link at a time, and removing >1000 one by one is not an option.
The question: is it possible to make dynamic 301 redirects from every page that contains ?date= in the URL to the original one, and how? I am thinking that Google will re-index those pages, follow the redirects to the original ones, and drop the numerous duplicates from the search results.
Example:
bad page: www.website.com/article?date=1961-11-1 and n same pages with different "date"
good page: www.website.com/article
automatically redirect all bad pages to good ones.
I have spent a whole work day trying to solve this problem; it would be nice to get some support. Thank you!
P.S. As far as I can tell this is a coding question and belongs on Stack Overflow, but if I am wrong (forgive me), redirect me to the right place to ask it.
You're looking for the canonical link element, that's the way Google suggests to solve this problem (here's the Webmasters help page about it), and it's used by most if not all search engines. When you place an element like
<link rel='canonical' href='http://www.website.com/article'>
in the header of the page, the URI in the href attribute will be considered the 'canonical' version of the page, the one to be indexed and so on.
For the record: if the duplicate content is not an HTML page (say, it's a dynamically generated image), and supposing you're using Apache, you can use .htaccess to redirect to the canonical version. Unfortunately the Redirect and RedirectMatch directives don't handle query strings (they match only the URL path), but you can use mod_rewrite to strip parts of the query string. See, for example, this answer for a way to do it.
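For the specific ?date= case in the question, the stripping could look like this (a hedged sketch assuming Apache with mod_rewrite; the trailing ? in the target is what clears the query string from the redirect):

```apache
# Hedged .htaccess sketch: 301-redirect any URL carrying a date=
# parameter to the same path with no query string, e.g.
# /article?date=1961-11-1 -> /article
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)date= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
```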

301 Redirects - Advanced?

I am in a situation where there are TWO versions of each page on my site, which runs into the thousands. This is causing all sorts of problems with Google; I am dropping down the search results due to duplicate content. It was created as a result of enabling "SEO Friendly URLs" on my site.
Is there a way that I can rewrite ALL pages that start with, say, brands.php to their SEO-friendly version, e.g. /products.php?product=Oil-Pump-Star to /products/oil-pump-star/, without having to go through each URL manually?
Apologies if this is confusing - I find it hard to put the exact situation into written words!
Any input is appreciated!
This looks like you are using the Joomla CMS. You could use rel="canonical" to get around this, but unfortunately that would have to be done manually. Google still suggests using a 301 redirect, and recommends rel="canonical" only where a 301 is not possible.
I will let you decide what's best for you.
It's hard to give you an example without knowing the type of URLs your system is set up for. However, based on the example you gave, you could do something like this:
RewriteCond %{QUERY_STRING} ^product=([0-9a-zA-Z_-]+)$ [NC]
RewriteRule ^products\.php$ /products/%1/? [R=301,L]
I have not tested it, so it may need some tweaking. Note that RewriteRule never matches the query string itself; the RewriteCond does that, and %1 carries its capture into the target (the trailing ? strips the original query string). Lower-casing the product name would additionally require a RewriteMap, which is only available in the server config. You'll need to adjust the rules to work with the URLs you are trying to restructure.
