Google Analytics tracking some pageviews as just a forward slash - url

Really strange. One of my posts is being tracked half the time in Google Analytics as its correct permalink, while the other half of the pageviews are coming from a single forward slash that is attributed with the same Page Title.
Example:
Title of Page: Official iPhone Unlock
Correct URL of page: /official-iphone-unlock
Two URL's being tracked with that page title:/official-iphone-unlock/
So, needless to say, this is throwing off my numbers as I'm getting pageviews for this page under both URLs, and really hard to figure out what the issue is. I'm using ECWID shopping cart, and I'm suspicious that it's their way of tracking things, but I can't prove it. But the issue started around the time I enabled their tracking code.

Have you tried segmenting the traffic for these pages by browser?
First find the page:
Behavior > Site Content > All Pages (then search for your pages)
...then cross-drill by browser segment:
Secondary Dimension > Visitors > Browsers
One possibility that comes to mind is that some browsers may auto-append a slash to the end of URLs without a file extension, while others may not. For example, Chrome forwards a /foo URL to /foo/ for me. It may only be specific versions of a browser that exhibits this behavior -- like IE9 for example.

You can implement the filter to remove the trailing slashes - check this https://www.petramanos.com/ecommerce-google-analytics/remove-slashes-end-urls-google-analytics

Related

Remove multiple indexed URLs (duplicates) with redirect

I am managing a website that has only about 20-50 pages (articles, links and etc.). Somehow, Google indexed over 1000 links (duplicates, same page with different string in the URL). I found that those links contain ?date= in url. I already blocked by writing Disallow: *date* in robots.txt, made an XML map (which I did not had before) placed it into root folder and imported to Google Webmaster Tools. But the problem still stays: links are (and probably will be) in search results. I would easily remove URLs in GWT, but they can only remove one link at the time, and removing >1000 one by one is not an option.
The question: Is it possible to make dynamic 301 redirects from every page that contains $date= in url to the original one, and how? I am thinking that Google will re-index those pages, redirect to original ones, and delete those numerous pages from search results.
Example:
bad page: www.website.com/article?date=1961-11-1 and n same pages with different "date"
good page: www.website.com/article
automatically redirect all bad pages to good ones.
I have spent whole work day trying to solve this problem, would be nice to get some support. Thank you!
P.S. As far as I think this coding question is the right one to ask in stackoverflow, but if I am wrong (forgive me) redirect me to right place where I can ask this one.
You're looking for the canonical link element, that's the way Google suggests to solve this problem (here's the Webmasters help page about it), and it's used by most if not all search engines. When you place an element like
<link rel='canonical' href='http://www.website.com/article'>
in the header of the page, the URI in the href attribute will be considered the 'canonical' version of the page, the one to be indexed and so on.
For the record: if the duplicate content is not a html page (say, it's a dynamically generated image), and supposing you're using Apache, you can use .htaccess to redirect to the canonical version. Unfortunately the Redirect and RedirectMatch directives don't handle query strings (they're strictly for URIs), but you could use mod_rewrite to strip parts of the query string. See, for example, this answer for a way to do it.

Url with pseudo anchors and duplicate content / SEO

I have a product page with options in select list (ex : color of the product etc...).
You accede to my product with different urls :
www.mysite.com/product_1.html
www.mysite.com/product_1.html#/color-green
If you accede with the url www.mysite.com/product_1.html#/color-green, the option green of the select list is automatically selected (with javascript).
If i link my product page with those urls, is there a risk of duplicate content ? Is it good for my seo ?
thx
You need to use canonical urls in order to let the search engines know that you are aware that the content seems duplicated.
Basically using a canonical url on your page www.mysite.com/product_1.html#/color-green to go to www.mysite.com/product_1.html tells the search engine that whenever they see www.mysite.com/product_1.html#/color-green they should not scan this page but rather scan the page www.mysite.com/product_1.html
This is the suggested method to overcome duplicate content of this type.
See these pages:
SEO advice: url canonicalization
A rel=canonical corner case
At one time I saw Google indexing the odd #ed URL and showing them in results, but it didn't last long. I think it also required that there was an on page link to the anchor.
Google does support the concept of the hashbang (#!) as a specific way to do indexable anchors and support AJAX, which implies an anchor without the bang (!) will no longer be considered for indexing.
Either way, Google is not stupid. The the basic use of the anchor is to move to a place on a page, i.e. it is the same page (duplicate content) but a different spot. So Google will expect a #ed URL to contain the same content. Why would they punish you for doing what the # is for?
And what is "the risk of duplicate content". Generally, the only onsite risk from duplicate content is Google may waste it's time crawling duplicate pages instead of focusing on other valuable pages. As Google will assume # is the same page it is more likely to not event try the #ed URL.
If you're worried, implement the canonical tag, but do it right. I've seen more issues from implementing it badly than the supposed issues they are there to solve.
Both answers above are correct. Google has said they ignore hashtags unless you use hash-bang format (#!) -- and that really only addresses a certain use case, so don't add it just because you think it will help.
Using the canonical link tag is the right thing to do.
One additional point about dupe content: it's less about the risk than about a missed opportunity. In cases where there are dupes, Google chooses one. If 10 sites link to your site using www.example.com and 10 more link using just example.com you'll get the :link goodness" benefit of only 10 links. The complete solution to this involves ensuring that when users and Google arrive at the "wrong" on, the server responds with an HTTP 301 status and redirects the user to the "right" one. This is known as domain canonicalization and is a good thing for many, many reasons. Use this in addition to the "canonical" link tag and other techniques.

Why do some websites have "#!" in the URL [duplicate]

I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:
http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345
As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.
The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:
http://twitter.com/#!/BoltClock
Does #! now play some special role in URLs, like for a certain Ajax framework or something since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?
This technique is now deprecated.
This used to tell Google how to index the page.
https://developers.google.com/webmasters/ajax-crawling/
This technique has mostly been supplanted by the ability to use the JavaScript History API that was introduced alongside HTML5. For a URL like www.example.com/ajax.html#!key=value, Google will check the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents.
The octothorpe/number-sign/hashmark has a special significance in an URL, it normally identifies the name of a section of a document. The precise term is that the text following the hash is the anchor portion of an URL. If you use Wikipedia, you will see that most pages have a table of contents and you can jump to sections within the document with an anchor, such as:
https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test
https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other Javascript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.
In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion (with something like window.location = 'http://raganwald.com';) to a different URL or without specifying an anchor, the browser will load the entire page from the URL. Try this in Firebug or Safari's Javascript console. Load http://minimal-github.gilesb.com/raganwald. Now in the Javascript console, type:
window.location = 'http://minimal-github.gilesb.com/raganwald';
You will see the page refresh from the server. Now type:
window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';
Aha! No page refresh! Type:
window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';
Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page but just changing the anchor, so it doesn't reload. Thanks to this behaviour, we can have a single Javascript application that appears to the browser to be on one 'page' but to have many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states', and likewise if a user uses the back button or a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
So there you have it: Anchors provide Javascript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: It is a Single Page Interface.
p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.
p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive Javascript to change it into an SPI with an anchor.
Here's the key link again: The Single Page Interface Manifesto
First of all: I'm the author of the The Single Page Interface Manifesto cited by raganwald
As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used in FaceBook and Twitter is the use of hash # in URLs
The character ! is added only for Google purposes, this notation is a Google "standard" for crawling web sites intensive on AJAX (in the extreme Single Page Interface web sites). When Google's crawler finds an URL with #! it knows that an alternative conventional URL exists providing the same page "state" but in this case on load time.
In spite of #! combination is very interesting for SEO, is only supported by Google (as far I know), with some JavaScript tricks you can build SPI web sites SEO compatible for any web crawler (Yahoo, Bing...).
The SPI Manifesto and demos do not use Google's format of ! in hashes, this notation could be easily added and SPI crawling could be even easier (UPDATE: now ! notation is used and remains compatible with other search engines).
Take a look to this tutorial, is an example of a simple ItsNat SPI site but you can pick some ideas for other frameworks, this example is SEO compatible for any web crawler.
The hard problem is to generate any (or selected) "AJAX page state" as plain HTML for SEO, in ItsNat is very easy and automatic, the same site is in the same time SPI or page based for SEO (or when JavaScript is disabled for accessibility). With other web frameworks you can ever follow the double site approach, one site is SPI based and another page based for SEO, for instance Twitter uses this "double site" technique.
I would be very careful if you are considering adopting this hashbang convention.
Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.
You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken -- forever -- is a colossal and permanent downside to hashbangs.
To have a good follow-up about all this, Twitter - one of the pioneers of hashbang URL's and single-page-interface - admitted that the hashbang system was slow in the long run and that they have actually started reversing the decision and returning to old-school links.
Article about this is here.
I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.
The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.

What does the #!/ do for SEO in the URL? [duplicate]

I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:
http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345
As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.
The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:
http://twitter.com/#!/BoltClock
Does #! now play some special role in URLs, like for a certain Ajax framework or something since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?
This technique is now deprecated.
This used to tell Google how to index the page.
https://developers.google.com/webmasters/ajax-crawling/
This technique has mostly been supplanted by the ability to use the JavaScript History API that was introduced alongside HTML5. For a URL like www.example.com/ajax.html#!key=value, Google will check the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents.
The octothorpe/number-sign/hashmark has a special significance in an URL, it normally identifies the name of a section of a document. The precise term is that the text following the hash is the anchor portion of an URL. If you use Wikipedia, you will see that most pages have a table of contents and you can jump to sections within the document with an anchor, such as:
https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test
https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other Javascript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.
In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion (with something like window.location = 'http://raganwald.com';) to a different URL or without specifying an anchor, the browser will load the entire page from the URL. Try this in Firebug or Safari's Javascript console. Load http://minimal-github.gilesb.com/raganwald. Now in the Javascript console, type:
window.location = 'http://minimal-github.gilesb.com/raganwald';
You will see the page refresh from the server. Now type:
window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';
Aha! No page refresh! Type:
window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';
Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page but just changing the anchor, so it doesn't reload. Thanks to this behaviour, we can have a single Javascript application that appears to the browser to be on one 'page' but to have many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states', and likewise if a user uses the back button or a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
So there you have it: Anchors provide Javascript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: It is a Single Page Interface.
p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.
p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive Javascript to change it into an SPI with an anchor.
Here's the key link again: The Single Page Interface Manifesto
First of all: I'm the author of the The Single Page Interface Manifesto cited by raganwald
As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used in FaceBook and Twitter is the use of hash # in URLs
The character ! is added only for Google purposes, this notation is a Google "standard" for crawling web sites intensive on AJAX (in the extreme Single Page Interface web sites). When Google's crawler finds an URL with #! it knows that an alternative conventional URL exists providing the same page "state" but in this case on load time.
In spite of #! combination is very interesting for SEO, is only supported by Google (as far I know), with some JavaScript tricks you can build SPI web sites SEO compatible for any web crawler (Yahoo, Bing...).
The SPI Manifesto and demos do not use Google's format of ! in hashes, this notation could be easily added and SPI crawling could be even easier (UPDATE: now ! notation is used and remains compatible with other search engines).
Take a look to this tutorial, is an example of a simple ItsNat SPI site but you can pick some ideas for other frameworks, this example is SEO compatible for any web crawler.
The hard problem is to generate any (or selected) "AJAX page state" as plain HTML for SEO, in ItsNat is very easy and automatic, the same site is in the same time SPI or page based for SEO (or when JavaScript is disabled for accessibility). With other web frameworks you can ever follow the double site approach, one site is SPI based and another page based for SEO, for instance Twitter uses this "double site" technique.
I would be very careful if you are considering adopting this hashbang convention.
Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.
You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken -- forever -- is a colossal and permanent downside to hashbangs.
To have a good follow-up about all this, Twitter - one of the pioneers of hashbang URL's and single-page-interface - admitted that the hashbang system was slow in the long run and that they have actually started reversing the decision and returning to old-school links.
Article about this is here.
I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.
The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.

URL Redirection for Coming Soon Page?

I have a site with over 100 pages. We need to go live with products that are soon available, however, many site pages will not be prepared at the time of release.
In order to move forward, I would like to reference a "coming soon" page with links to pages that are current and available.
Is there an easy way to forward a URL to a Coming Soon page?
Is this valid, or is there a better way?
Found this at:
http://www.web-source.net/html_redirect.htm
"Place the following HTML redirect code between the and tags of your HTML code.
meta HTTP-EQUIV="REFRESH" content="0; url=http://www.yourdomain.com/index.html"
Does this negatively affect you if the search engines crawl through your site?
Thank you!
The code you listed will work. However, I would never do this:
You could just show the page you wanted to show immediately, without a redirect. This will be faster for the visitor, as they don't need to load two pages.
If you must use a redirect, why not create it programmatically, for example by instructing your web server (e.g. Apache) to redirect certain pages?
I would not link to pages that don't exist yet. Most visitors will dislike that - clicking on something to find out "come back later" is a disappointment. We've all seen those coming soon pages, with the content never arriving, or only after months or even years. Either leave out those links (or perhaps put a "work in progress sign" without a link), or add the items only after they've been finished.
Search engines should work well with redirect pages, although it is unlikely your "coming soon" page will show up anywhere in the top the rankings anyway.
Perhaps a better or "more correct way" would be to do the redirection at the header level. Using PHP, you would call
<?php
header("Location: http://www.yourdomain.com/index.html");
There's also ways to do this in Apache (assuming you are using it) and .htaccess-files. See http://www.webweaver.nu/html-tips/web-redirection.shtml for more info about that.

Resources