How to improve the structure of URLs

Based on the article at Google's webmaster center and the SEO PDF, I think I should improve my website's URL structure.
Right now a news URL looks like "news.php?id=127591". I need to rewrite it to something like "/news/127591/this-is-article-subject".
The problem is: if I change the URL structure to the new one, can I still keep the old URLs working? And if both URLs work, how do I keep search engines like Google and Bing from indexing the same article twice?
Thanks!

Use an HTTP 301 permanent redirect from the old URL to the new URL.
An HTTP 301 redirect tells Google (and other clients) the new, permanent URL for an old, outdated resource. Google will transfer most or all of the value it has allocated to the old URL to the new URL.
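As a rough illustration of that redirect, here is a minimal Python sketch that computes the 301 target for an old-style news URL. The `titles` lookup and the `slugify` and `redirect_for` helpers are assumptions made up for this example; in a real site the title would come from your database and the redirect would be sent by your web framework or server.

```python
import re
from urllib.parse import urlsplit, parse_qs


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def redirect_for(old_url, titles):
    """Return (301, new_location) for an old news.php URL, or None.

    `titles` is a hypothetical {article_id: title} lookup.
    """
    parts = urlsplit(old_url)
    if not parts.path.endswith("news.php"):
        return None
    ids = parse_qs(parts.query).get("id", [])
    if not ids or not re.fullmatch(r"[0-9]+", ids[0]):
        return None
    article_id = int(ids[0])
    title = titles.get(article_id)
    if title is None:
        return None  # unknown article: let the caller answer 404
    return 301, f"/news/{article_id}/{slugify(title)}"
```

The old URL keeps working for visitors, but because it only ever answers with a 301, search engines end up with a single indexed URL per article.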

Also, to improve the architecture of your website, you should keep a clean structure by linking between its pages and posts. But be careful: if you do this carelessly, Google's robot can get confused and leave.
Structure is key to your SEO:
1. Find the one page that is the "really important page" for a given keyword.
2. Point the relevant content on other pages to that page for that particular keyword.
3. Repeat for every relevant keyword.
I'll leave this post here, where I explain this in more depth, hoping that you understand Spanish: http://coach2coach.es/la-estructura-web-es-la-base-del-posicionamiento/

Yep, you can use robots.txt to exclude news.php and create an XML sitemap with the new URLs. mod_rewrite can be set to rewrite only directories with trailing slashes, so all files in your root directory should keep working.

Related

How to delete old Google Urls with parameters

A month ago I relaunched a website on the TYPO3 CMS. Before that, the site ran on the Joomla CMS.
In the Joomla configuration, SEO links were disabled, so Google indexed page URLs like this, for example:
www.domain.de/index.php?com_component&itemid=123....
Now, a month after the TYPO3 relaunch, these links are still visible in Google because the URLs don't return a 404 error. That's because "index.php" also exists in TYPO3, and TYPO3 doesn't care about the additional query string variables: it returns a 200 status code and shows the front page.
In Google Webmaster Tools it's possible to delete single URLs from the Google index, but that way I would have to delete about 10000 URLs manually...
My question is: is there a way to remove these old URLs from the Google index?
Greetings
With this number of URLs there is only one sensible solution: implement proper 404 handling in TYPO3 or, even better, redirects to the same content in TYPO3.
You can use TYPO3's handler (search for it in Install Tool > All configuration). It's called pageNotFound_handling, and it supports options like REDIRECT for redirecting to some page, or USER_FUNCTION, which lets you use your own PHP script. Check the description in the Install Tool.
You can also write a simple condition in TypoScript that checks whether typical Joomla parameters exist in the URL, which is an easy way to return a custom 404 page. If you need a more sophisticated condition (for example, you want to redirect links that previously pointed to a gallery in Joomla to the new gallery in TYPO3), you can use a userFunc condition, and that would probably be the best option for SEO.
If these URLs share a manageable number of common indicators, you could catch them with a rule in your virtual host or .htaccess so that Google runs into the correct error message.
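To make the "common indicators" idea concrete, here is a small Python sketch of the check itself, independent of where you end up implementing it (TypoScript condition, userFunc, or a server rule). The function name `legacy_status` and the exact indicators (an `itemid` parameter or anything starting with `com_`) are assumptions based only on the example URL in the question; adjust them to what your old Joomla URLs really look like.

```python
from urllib.parse import urlsplit, parse_qsl


def legacy_status(url):
    """Return 404 for URLs that look like old Joomla links, else 200."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith("index.php"):
        return 200
    # keep_blank_values=True keeps bare parameters such as "com_component".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for key, value in pairs:
        key, value = key.lower(), value.lower()
        if key == "itemid" or key.startswith("com_") or value.startswith("com_"):
            return 404
    return 200
```

Once the old URLs answer with a 404 instead of a 200, they should drop out of the index on their own over time, without removing them one by one.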
I wrote a Google Chrome extension to remove URLs in bulk in Google Webmaster Tools. Check it out here: https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal.
Basically, it's a glorified for loop. You put all the URLs in a text file. For example:
http://your-domain/link-1
http://your-domain/link-2
Once you have installed the extension as described in the README, you'll find a new "choose a file" button.
Select the file you just created. The extension reads it in, loops through all the URLs and submits them for removal.

Remove multiple indexed URLs (duplicates) with redirect

I am managing a website that has only about 20-50 pages (articles, links, etc.). Somehow, Google indexed over 1000 links (duplicates: the same page with a different string in the URL). I found that those links contain ?date= in the URL. I already blocked them by writing Disallow: *date* in robots.txt, and I made an XML sitemap (which I did not have before), placed it in the root folder and submitted it to Google Webmaster Tools. But the problem remains: the links are (and probably will stay) in the search results. I would happily remove the URLs in GWT, but it can only remove one link at a time, and removing more than 1000 one by one is not an option.
The question: is it possible to make dynamic 301 redirects from every page that contains ?date= in the URL to the original one, and how? My thinking is that Google will re-crawl those pages, follow the redirects to the original ones, and drop the numerous duplicates from the search results.
Example:
bad page: www.website.com/article?date=1961-11-1 and n identical pages with a different "date"
good page: www.website.com/article
Automatically redirect all bad pages to the good ones.
I have spent a whole work day trying to solve this problem, so it would be nice to get some support. Thank you!
P.S. As far as I can tell, this is a coding question and Stack Overflow is the right place for it, but if I am wrong (forgive me), please point me to the right place to ask.
You're looking for the canonical link element. That's the way Google suggests solving this problem (here's the Webmasters help page about it), and it's used by most if not all search engines. When you place an element like
<link rel='canonical' href='http://www.website.com/article'>
in the head of the page, the URI in the href attribute will be considered the 'canonical' version of the page: the one to be indexed and so on.
For the record: if the duplicate content is not an HTML page (say, it's a dynamically generated image), and supposing you're using Apache, you can use .htaccess to redirect to the canonical version. Unfortunately, the Redirect and RedirectMatch directives don't handle query strings (they match only the URL path), but you could use mod_rewrite to strip parts of the query string. See, for example, this answer for a way to do it.
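If you would rather do the redirect in application code than in .htaccess, the core of it is just removing the offending parameter from the query string. Here is a minimal Python sketch of that step; the function name `canonical_url` and the `drop` parameter are made up for this example, and sending the actual 301 is left to whatever server or framework you use.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url, drop=("date",)):
    """Return `url` with the query parameters named in `drop` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def redirect_target(url):
    """Return the URL to 301 to, or None if `url` is already canonical."""
    target = canonical_url(url)
    return target if target != url else None
```

Returning None for already-canonical URLs matters: redirecting a page to itself would create a redirect loop.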

Hide website filenames in URL

I would like to hide the webpage name in the url and only display either the domain name or parts of it.
For example:
I have a website called "MyWebSite". The URL is localhost:8080/mywebsite/welcome.xhtml. I would like to display only "localhost:8080/mywebsite/".
However, if the page is at, for example, localhost:8080/mywebsite/restricted/restricted.xhtml, then I would like to display localhost:8080/mywebsite/restricted/.
I believe this can be done in the web.xml file.
I believe that you want URL rewriting. Check out this link: http://en.wikipedia.org/wiki/Rewrite_engine. There are many approaches to URL rewriting, and you need to decide which is appropriate for you. Some of the approaches do make use of the web.config file.
You can do this in several ways. The one I see most is to have a "front door" called a rewrite engine that parses the URL dynamically to internally redirect the request, without exposing details about how that might happen as you would see if you used simple query strings, etc. This allows the URL you specify to be digested into a request for a master page with specific content, instead of just looking up a physical page at that location to serve.
The StackExchange sites do this so that you can link to a question in a semi-permanent fashion (and thus can use search engines with crawlers that log these URLs) without them having to have a real page in the file system for every question that's ever been asked (we're up to 9,387,788 questions as of this one).
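To show what such a "front door" does, here is a toy Python sketch of a rewrite table: it maps the public URL to the internal resource that actually serves it. The routes, including the `/question.xhtml?id=` target, are invented for illustration and are not how any particular server or the StackExchange sites implement it.

```python
import re

# Hypothetical routing table: public URL pattern -> internal resource.
# Backreferences such as \1 are filled in from the matched public URL.
ROUTES = [
    (re.compile(r"^/mywebsite/$"), "/mywebsite/welcome.xhtml"),
    (re.compile(r"^/mywebsite/restricted/$"), "/mywebsite/restricted/restricted.xhtml"),
    (re.compile(r"^/questions/(\d+)/[^/]*$"), r"/question.xhtml?id=\1"),
]


def resolve(public_path):
    """Return the internal resource for a public path, or None if no route matches."""
    for pattern, target in ROUTES:
        match = pattern.match(public_path)
        if match:
            return match.expand(target)
    return None
```

The visitor only ever sees the public path; the mapping happens inside the server, so no file name appears in the address bar.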

Google indexing urls redirect 301

Let's say my site has the following URLs indexed in Google:
/test/1
/test/2
/test/3
For some reasons, I want those same pages to have the following URLs:
/test/abc
/test/def
/test/ghi
I noticed that even if I use a 301 redirect from /test/1 to /test/abc, the URL /test/1 stays in the Google index for a while after the robot hits the redirect and discovers the change.
Is it normal that it takes a few weeks for the old URLs to disappear from the search engine index, or is there a better way to let it know about the changes?
Should I use the URL removal tool?
Will a new sitemap in Google Webmaster Tools help to get rid of the old URLs?
Help me see inside the Google black box :)
Answering your questions:
Yes, it's normal for this process to take a few weeks. This is nothing to worry about.
The URL removal tool is only for URLs that no longer exist. You can't use it for URLs that now return a 301 (see: http://www.google.com/support/webmasters/bin/answer.py?answer=59819&hl=en).
An XML sitemap is mainly for telling Google about new pages and pages that have changed recently, so I don't think it will help you here.
In short, the index will update naturally, you just need to let Google do its thing.

SEO help for replacing a website

I run a small e-commerce site that over the last few years has built up a reasonable search engine status.
I've been working on a new site that uses new URL formats, and I am worried about how to deal with all the broken links and the frustration of customers who find outdated links through search engines.
Can anyone offer advice on how to mitigate or minimize the damage? The old site was built in ASP.NET, the new one in ASP.NET MVC.
Thanks for any help you can offer.
You will need some sort of parallel structure. Ideally, the old site with the old URLs remains fully accessible for some time, but does not get indexed any more.
If that's not feasible, and since you are saying that the site is small, you could establish a URL mapping old-new and have a 404 handler that attempts to redirect to the new content.
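A minimal sketch of that idea in Python (chosen only to keep the example short; the same logic carries over to an ASP.NET MVC error handler). The `URL_MAP` entries and the `handle_not_found` name are hypothetical placeholders, not URLs from the actual site.

```python
# Hypothetical old -> new URL mapping; in practice this could live in a
# database table or a config file.
URL_MAP = {
    "/products.aspx?id=12": "/products/12",
    "/contact.aspx": "/contact",
}


def handle_not_found(requested):
    """Called when no route matches: return (status, location).

    Known old URLs get a permanent redirect; everything else stays a 404.
    """
    target = URL_MAP.get(requested.lower())
    if target is not None:
        return 301, target
    return 404, None
```

For a small site, an explicit table like this is easy to audit: every old URL that still gets traffic can be listed and checked by hand.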
You should create permanent redirects for the links you want to preserve (at the route level). This way search engines will update their references to the new locations.
As cdonner says, you want a handler that reroutes the traffic to its appropriate destination. Even more important, though, is to make sure that when you redirect the client, you send a status code of 301 (moved permanently) instead of 404. The search engines will rate you negatively if there are a lot of 404 errors on your site, and you will see your standing decrease instead of increase.
You could set up your old site's .htaccess file to redirect traffic to the new site. Beyond that, you could use mod_rewrite to map requests to pages on the old site to the same (or similar) pages on the new one.
This is the way I do it migrating from an old ASP classic site:
Sub Application_BeginRequest(ByVal sender As Object, ByVal e As System.EventArgs)
    ' Lowercase the URL so the checks below are case-insensitive.
    Dim fullOriginalPath As String = Request.Url.ToString().ToLower()
    ' Response.Redirect always sends a 302 and overrides StatusCode,
    ' so use RedirectPermanent (.NET 4.0+) to send a real 301.
    If fullOriginalPath.Contains("/viewitem.asp?itemid=") Then
        ' getItemIDFromPath is a helper defined elsewhere that extracts the item ID.
        Context.Response.RedirectPermanent("/item/" & getItemIDFromPath(fullOriginalPath))
    ElseIf fullOriginalPath.Contains("/search.asp") Then
        Context.Response.RedirectPermanent("/search/")
    ElseIf fullOriginalPath.EndsWith("/default.asp") Then
        Context.Response.RedirectPermanent("/")
    End If
End Sub
Sounds like you have it figured out, but I just wanted to add one more option, the canonical tag, which may have advantages if for any reason you need to keep both the old URL and the new URL active. You can create a copy of the page at the old URL and then add the "canonical" tag, which tells the search engines "please credit the link credits of this page to the following page: www.site.com/newpage".
<link rel="canonical" href="http://www.yoursite.com" />
This line goes in before </head>.
For example, if you have lots of links to certain key pages and those links point to the old URLs, this may help.
A 301 also redirects the link credits, and for pages that have moved you'll generally want to use a 301 redirect. Oh, and if you use a URL rewrite rule and all the URLs change in the same way, you can probably use a regex in the rewrite rule to handle all of them in a single step.
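To illustrate the single-regex idea, here is a short Python sketch that uses the /viewitem.asp?itemid= pattern from the answer above as the old format and /item/ as the new one. The names `LEGACY_ITEM` and `rewrite` are made up for this example; a real rewrite rule would live in your server or routing configuration, with its own regex syntax.

```python
import re

# One pattern covers every old item URL; the captured ID is reused in the new URL.
LEGACY_ITEM = re.compile(r"^/viewitem\.asp\?itemid=(\d+)$", re.IGNORECASE)


def rewrite(url):
    """Return the new URL for an old item URL, or None if the rule does not apply."""
    new_url, count = LEGACY_ITEM.subn(r"/item/\1", url)
    return new_url if count else None
```

Because the ID is captured rather than listed, the same rule handles items that are added later without any further configuration.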

Resources