Joomla URL junk

So, I've set up a site with Search Engine Friendly URLs set to YES, and I've set up page aliases. My main URLs are fine, but for some reason those pages can also be accessed through weird links like mysite.com/component/content/article/17-category/61-article-name.html instead of just mysite.com/category/article-name.html, which is what I want and what I have in my sitemap.
Why is Joomla generating these redundant URLs, and how do I get rid of them (so that when somebody clicks one of them in Google, it takes them to a 404)?
Thanks
P.S. The answer to the question "How to clean up Joomla! URLs?" does not help me.

As per http://magazine.joomla.org/issues/issue-june-2013/item/1054-duplicate-pages-joomla-causes-errors-solutions,
I used 301 redirects in the .htaccess file to redirect the duplicated URLs.
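A minimal sketch of such a rule, assuming the duplicates all follow the /component/content/article/<id>-<alias>/<id>-<alias>.html pattern from the question, so the captured aliases map straight onto the clean URLs:
RewriteEngine On
# Strip the numeric IDs Joomla prepends, e.g.
# /component/content/article/17-category/61-article-name.html
# becomes /category/article-name.html
RewriteRule ^component/content/article/\d+-([^/]+)/\d+-([^/]+)\.html$ /$1/$2.html [R=301,L]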

Related

How to delete old Google URLs with parameters

A month ago I relaunched a website in the TYPO3 CMS. Before that, the site was hosted with the Joomla CMS.
In the Joomla config, SEO links were disabled, so Google indexed the page URLs like this:
www.domain.de/index.php?com_component&itemid=123....
for example.
Now, a month later (after the TYPO3 relaunch), these links are still visible in Google because the URLs don't return a 404 error. That's because "index.php" also exists in TYPO3, and TYPO3 doesn't care about the additional query string/variables: it returns a 200 status code and shows the front page.
In Google Webmaster Tools it's possible to delete single URLs from the Google index, but that way I would have to delete about 10000 URLs manually...
My question is: is there a way to remove these old URLs from the Google index?
Greetings
With this number of URLs there is only one sensible solution: implement proper 404 handling in your TYPO3, or even better, redirects to the same content now placed in TYPO3.
You can use TYPO3's handler (search for it in Install Tool > All configuration); it's called pageNotFound_handling. You can use options like REDIRECT for redirecting to some page, or even USER_FUNCTION, which allows you to use your own PHP script; check the description in the Install Tool.
You can also write a simple condition in TypoScript and check whether typical Joomla params exist in the URL; that way you can easily return a custom 404 page. If you need a more sophisticated condition (for example, you want to redirect links which previously pointed to some gallery in Joomla to the new gallery in TYPO3), you can make use of a userFunc condition, and that would probably be the best option for SEO.
If these URLs contain a manageable number of common indicators, you could redirect these links with a rule in your virtual host or .htaccess, so that Google runs into the correct error message.
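For example, a hedged sketch of such a rule, assuming the old Joomla links can all be recognised by their itemid query parameter:
RewriteEngine On
# Answer old Joomla-style URLs (index.php?com_component&itemid=...)
# with 410 Gone so Google drops them from the index
RewriteCond %{QUERY_STRING} (^|&)itemid= [NC]
RewriteRule ^index\.php$ - [G]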
I wrote a Google Chrome extension to remove URLs in bulk in Google Webmaster Tools. Check it out here: https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal.
Basically, it's a glorified for loop. You put all the URLs in a text file. For example,
http://your-domain/link-1
http://your-domain/link-2
Having installed the extension as described in the README, you'll find a new "choose a file" button.
Select the file you just created. The extension reads it in, loops through all the URLs and submits them for removal.

How to restrict access to non-SEO URLs in Joomla 3?

I've converted all my URLs to SEO-friendly URLs.
But I want to restrict access to my non-SEO-friendly URLs.
As an example, you can reach www.example.com/article-1 via http://www.example.com/index.php?option=com_content&view=article&id=76&Itemid=113. But I don't want this. I just want it to be accessible via http://www.example.com/article-1.
I hope I've explained clearly what I need.
I don't think it's possible, for the simple reason that Joomla always uses the non-SEF links internally. That's why they always work.
Also, there are links which are not converted to SEF links because the user will not see them and Google will not index them, like links used by AJAX scripts and similar things.
If you block non-SEF URLs in your .htaccess file, expect your page to break sooner rather than later. Don't blame the extension developer then :-)
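For illustration only, the kind of blanket block meant here would look roughly like this, and it is exactly what breaks the internal and AJAX links described above:
RewriteEngine On
# Forbid any request still carrying Joomla's component query string.
# Do NOT use this blindly: it also blocks the non-SEF links Joomla
# needs internally (AJAX calls and similar)
RewriteCond %{QUERY_STRING} (^|&)option=com_ [NC]
RewriteRule ^index\.php$ - [F]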

Remove multiple indexed URLs (duplicates) with redirect

I am managing a website that has only about 20-50 pages (articles, links, etc.). Somehow, Google indexed over 1000 links (duplicates: the same page with a different string in the URL). I found that those links contain ?date= in the URL. I have already blocked them by writing Disallow: *date* in robots.txt, made an XML sitemap (which I did not have before), placed it in the root folder and imported it into Google Webmaster Tools. But the problem remains: the links are (and probably will stay) in the search results. I would happily remove the URLs in GWT, but it can only remove one link at a time, and removing >1000 one by one is not an option.
The question: is it possible to make dynamic 301 redirects from every page that contains ?date= in the URL to the original one, and how? I am thinking that Google will re-index those pages, follow the redirect to the original ones, and delete those numerous pages from the search results.
Example:
bad page: www.website.com/article?date=1961-11-1 (and n copies of the same page with different "date" values)
good page: www.website.com/article
automatically redirect all bad pages to good ones.
I have spent a whole work day trying to solve this problem; it would be nice to get some support. Thank you!
P.S. As far as I can tell, this coding question is the right one to ask on Stack Overflow, but if I am wrong (forgive me), redirect me to the right place where I can ask it.
You're looking for the canonical link element; that's the way Google suggests solving this problem (here's the Webmasters help page about it), and it's used by most if not all search engines. When you place an element like
<link rel='canonical' href='http://www.website.com/article'>
in the header of the page, the URI in the href attribute will be considered the 'canonical' version of the page, the one to be indexed and so on.
For the record: if the duplicate content is not an HTML page (say, it's a dynamically generated image), and supposing you're using Apache, you can use .htaccess to redirect to the canonical version. Unfortunately the Redirect and RedirectMatch directives don't handle query strings (they're strictly for URIs), but you can use mod_rewrite to strip parts of the query string. See, for example, this answer for a way to do it.
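For reference, a minimal sketch of that mod_rewrite approach for the ?date= case from the question (the trailing ? in the target is what drops the query string):
RewriteEngine On
# 301-redirect any request carrying a date parameter to the same
# path without the query string
RewriteCond %{QUERY_STRING} (^|&)date= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]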

Strange google result listing, invalid URL created

It would be great if you guys could shed some light on this; it has baffled me.
I was asked by a client if I could try to make the search term for his comedy night, "sketchercise", put his website at the top of the Google ranking. I simply changed the title tag in the header for the whole site from "Allnutt and Simpson" to "Allnutt and Simpson - Sketchercise # Ginglik - Sketch Duo". It did the trick, and now the site comes up at the top of the Google listing when typing in "sketchercise". However, it gives off this very strange link:
http://www.allnuttandsimpson.com/index.php/videos/
This is the link to the google search result too:
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=sketchercise
This link is invalid; it doesn't make any sense. I guess it has something to do with the use of hash tags and the AJAX-driven site, but before I changed the title tag, it linked to the site fine using the # tags. What is the deal with this slash?
The strangest part is that the valid URL for the videos page on that site is /index.php#vidspics; I have never used the word "videos" in a URL!
If anyone can explain the cause of this or just help me stop it from happening, I'd be very grateful. I realise that this is an SEO question and I hate that stuff generally, but I hope you can see this is a bit of a strange case!
Just to compare: if you google "allnutt and simpson", it works just fine and links to the site and all of its pages as .php pages (and then my JS converts them to hash tags to keep things clean).
It's because there must be a folder called 'videos' under your hosted files; use an FTP client and check this.
Google crawls every folder and file unless you tell it not to; look at the robots.txt file to learn how to avoid indexation.
Also ask Google to remove that result once you solve this.
Finally, that behaviour is not related to hash tags; those are just references used by JavaScript to display the appropriate content on your webpage.
Not sure why it's posted like this, but the only way to stop that page from appearing is to use a Google Webmaster account for this website and make sure the crawlers can't find this link anymore. The alternative is to have the site admin put this tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the header when isset($_REQUEST['videos']) is true.
The slash in the address is the parsed form of www.allnuttandsimpson.com/index.php?=videos. You can have the web server change all the PHP parameters into slashes to make the links look pretty.
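A hedged sketch of that kind of rewrite in .htaccess (the internal parameter name here is hypothetical; the real site may use something else):
RewriteEngine On
# Serve the pretty /videos/ URL by mapping it internally onto
# index.php; 'page' is an illustrative parameter name
RewriteRule ^videos/?$ /index.php?page=videos [L]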
The best option for correct results is to create a sitemap and submit it to https://www.google.com/webmasters/tools/ for that site. You will need access.
Oh, I forgot: the sitemap will make Google see all the pages you want it to show; use it for the major pages, like those in the main menu. Removing links you don't want requires a robots.txt in the main directory of the site.

How to improve the structure of URLs

From the article at Google's Webmaster Center and the SEO PDF, I think I should improve my website's URL structure.
Right now a news URL looks like "news.php?id=127591". I need to rewrite it to something like "/news/127591/this-is-article-subject".
The problem is: if I change the URL structure to the new one, can I still keep the old one working? And if both URLs work, how do I stop search engines like Google and Bing indexing one article twice?
Thanks!
HTTP 301 permanent redirect from the old URL to the new URL.
An HTTP 301 redirect communicates a new (permanent) URL for an old (outdated) resource to Google (and other clients). Google will transfer most, if not all, of the value allocated to the old URL to the new URL.
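A minimal sketch of that redirect in .htaccess; note the article subject slug can't be derived at rewrite time, so this assumes the old links land on /news/<id>/ and the application fills in the rest:
RewriteEngine On
# Send the old news.php?id=127591 form to the new /news/127591/ scheme
RewriteCond %{QUERY_STRING} (^|&)id=(\d+) [NC]
RewriteRule ^news\.php$ /news/%2/ [R=301,L]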
Also, in order to improve the architecture of your website, you should keep a clean structure by inserting links within all its pages/posts. But be careful: you must not do this carelessly, or Google's robot will get confused and leave.
Structure is key to your SEO:
1. Find the one page which is the "really important page" for any given keyword
2. Direct relevant content from other pages to that particular keyword's page
3. Repeat with every relevant keyword
I'm going to leave this post for you, where I explain this more in depth, hoping that you understand Spanish: http://coach2coach.es/la-estructura-web-es-la-base-del-posicionamiento/
Yep, you can use robots.txt to exclude news.php and create an XML sitemap with the new URLs. mod_rewrite can be set to only change directories, with trailing slashes, so all files in your root directory should work fine.
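A sketch of that robots.txt exclusion (placed in the site root), assuming the old script really is /news.php:
User-agent: *
# Keep crawlers off the old dynamic URL scheme
Disallow: /news.php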
