Google refuses to index new site URL

This is either a problem that Google is inflicting upon me, or a problem I am inflicting upon myself. I'm not totally sure.
When I first created my website a couple years ago, it followed a path similar to: http://www.mywebsite.abc123.com
Now, after a change in hosting services, I changed my domain to simply: https://www.mywebsite.com
I also added an SSL certificate at the time, for what it's worth, and it's been almost six months. I have all the variations (past and present) of my website registered and verified with Google's Search Console, but I can see no reason why the http://www.mywebsite.abc123.com link is getting indexed instead of the https://www.mywebsite.com link. I actually assumed that http://www.mywebsite.abc123.com wouldn't even work anymore.
I've read about 301 redirects and it looks like something like that would solve my problem, but upon trying to implement it, I was confronted with nothing but a "Too many redirects" error.
Long story short, Google won't index my newer better URL.
But Yahoo and Bing will.

301 redirects have to be set up on the old domain so that it points to the new one. If you still have access to that domain, you can add the redirects via .htaccess or in the hosting admin panel.
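For example, if the old domain's host runs Apache with mod_rewrite, a minimal .htaccess sketch might look like the following (the hostnames are the placeholders from the question; adapt them to your real domains). Restricting the rule to the old hostname keeps it from firing again once the visitor lands on the new domain, which is a common cause of a "too many redirects" loop.

RewriteEngine On
# Only rewrite requests that arrive on the old hostname
RewriteCond %{HTTP_HOST} ^www\.mywebsite\.abc123\.com$ [NC]
# Permanently (301) send everything to the same path on the new HTTPS domain
RewriteRule ^(.*)$ https://www.mywebsite.com/$1 [R=301,L]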


.com extension adds random characters/numbers

I'm using a domain name with this general structure: http://mydomainname.com/
However, when I click it, I get a 404 message.
And when I look in the URL, it's not http://mydomainname.com/ but surprisingly http://mydomainname.com/YkPWZ/.
How did YkPWZ/ appear automatically and what can I do to eliminate this issue? Sometimes accessing http://mydomainname.com/ works fine, but most of the time the browser automatically tacks on some random characters at the end of the URL, throwing the 404 message. This is not a browser-specific issue; a few colleagues have replicated it on different operating systems (both desktop and iOS).
P.S. If it matters at all, I generated my website using Github Pages (markdown files, not HTML).
I'm quite certain this is an issue on the GoDaddy side of things, though I'm unable to find any official documentation on the subject. As noted in comments above, the redirect isn't coming from GitHub Pages.
I found an old thread discussing the issue. Here is a brief summary:
GoDaddy may use redirects like this to handle load balancing on their shared hosting servers.
In several cases, users contacted GoDaddy to ask about the problem and had the issue resolved, but they were never told the technical specifics of what was happening.
If you wish to stay with GoDaddy, I recommend contacting them and sending them the link I found above. They may be able to resolve the issue for you, though I wouldn't expect an explanation.
Alternatively, you can use another web host; in many circles GoDaddy isn't rated very highly, and fortunately there are plenty of other web hosts to choose from. You could also use a custom domain directly with GitHub Pages, bypassing a third-party host entirely.

Google script origin request url

I'm developing a Google Sheets add-on. The add-on calls an API. In the API configuration, a url like https://longString-script.googleusercontent.com had to be added to the list of urls allowed to make requests from another domain.
Today, I noticed that this url changed to https://sameLongString-0lu-script.googleusercontent.com.
The url changed about 3 months after development start.
I'm wondering what makes the URL change, because it also means a configuration change in our back-end every time.
EDIT: Thanks for both your responses so far. They helped me better understand how this works, but I still don't know if/when/how/why the URL is going to change.
Quick update, the changing part of the url was "-1lu" for another user today (but not for me when I was testing). It's quite annoying since we can't use wildcards in the google dev console redirect uri field. Am I supposed to paste a lot of "-xlu" uris with x from 1 to like 10 so I don't have to touch this for a while?
For people coming across this now, we've also just encountered this issue while developing a Google Add-on. We've needed to add multiple origin urls to our oauth client for sign-in, following the longString-#lu-script.googleusercontent.com pattern mentioned by OP.
This is annoying as each url has to be entered separately in the authorized urls field (subdomain or wildcard matching isn't allowed). Also this is pretty fragile since it breaks if Google changes the urls they're hosting our add-on from. Furthermore I wasn't able to find any documentation from Google confirming that these are the script origins.
URLs are managed by the host in various ways. At the most basic level, when you build a web server you decide what to call it and what to call any pages on it. Google and other large content providers, with farms of servers and redundant data centers, manage it a bit differently, but for your purposes the result is effectively the same: you need to ask them, since they are the hosting provider of your cloud content.
Something that MIGHT be related is that Google recently rolled out (or at least scheduled) some changes dealing with the googleusercontent.com domain and Picasa images. So the Google support forums will be the way to go with this question for the freshest answers, since the cause of a URL change is usually specific to that moment in time and not something you necessarily need to worry about changing repeatedly. But again, they would need to confirm whether it was related to the recent planned changes... or not. :-)
When you find something out, you can update this question in case it is of use to others, especially if they tell you it wasn't a one-time change on their end.
This is more likely related to Changing origin in Same-origin Policy. As discussed:
A page may change its own origin with some limitations. A script can set the value of document.domain to its current domain or a superdomain of its current domain. If it sets it to a superdomain of its current domain, the shorter domain is used for subsequent origin checks.
For example, assume a script in the document at http://store.company.com/dir/other.html executes the following statement:
document.domain = "company.com";
After that statement executes, the page can pass the origin check with http://company.com/dir/page.html
So, as noted:
When using document.domain to allow a subdomain to access its parent securely, you need to set document.domain to the same value in both the parent domain and the subdomain. This is necessary even if doing so is simply setting the parent domain back to its original value. Failure to do this may result in permission errors.

Verifying Googlebot in Rails

I am looking to implement First Click Free in my Rails application. Google has information here on how to verify whether Googlebot is viewing your site.
I have been searching to see if there is anything existing for Rails to do this, but I have been unable to find anything. So firstly, does anyone know of anything? If not, could anyone point me in the right direction on how to go about implementing what Google suggests on that page?
Also, the solution on that page has to do a lookup every time to try and detect Google, which seems like it's going to be a big performance hit if I have to do it on every page load. I could cache an IP once it has been verified, but Google has stated that their IPs change, so at some point an IP may no longer belong to them. That probably doesn't happen regularly, though, so it may not be that big of an issue.
Many thanks!!
Check out the browser gem: https://github.com/fnando/browser
What I'd do is use the browser.bot? method to check whether your site is being accessed by a bot or not. If you care about Googlebot specifically, you could check whether browser.name includes googlebot. Keep in mind that this gem just checks the user agent sent by the client's browser, which could of course be spoofed. Sounds like that isn't a huge concern for your purposes.
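A minimal sketch of that check, assuming a recent (2.x or later) version of the gem where Browser.new takes the user-agent string directly (the helper name and the Googlebot test are illustrative; the gem itself only ever looks at the User-Agent header):

require "browser"

# Call this from a Rails controller with request.user_agent.
def googlebot_user_agent?(user_agent)
  browser = Browser.new(user_agent.to_s)
  # bot? only inspects the User-Agent, so it can be spoofed.
  browser.bot? && user_agent.to_s.downcase.include?("googlebot")
end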
I built a Ruby gem for that recently; it's called "legitbot".
You can learn whether a Web request comes from a supported bot using:
bot = Legitbot.bot(userAgent, ip)
legitbot does this by looking at the User-Agent and searching for a bot signature, i.e. how bots identify themselves. Matching a signature alone doesn't guarantee that the request IP really belongs to, say, Googlebot. To make sure it does, call:
bot.detected_as # => "Google"
bot.valid? # => true
bot.fake? # => false
Supported bots are Googlebot, Yandex bots, Bing, Baidu, DuckDuckGo.
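A sketch of how this might fit the Rails use case above, caching the DNS-verified verdict per IP so the reverse and forward lookups don't run on every page load (the filter name, cache key, and expiry are just illustrative):

require "legitbot"

class ApplicationController < ActionController::Base
  before_action :verify_googlebot

  private

  def verify_googlebot
    # Legitbot.bot returns a bot object when the User-Agent matches a known
    # signature; valid? then confirms the IP via reverse and forward DNS.
    @googlebot = Rails.cache.fetch(["googlebot", request.remote_ip], expires_in: 12.hours) do
      bot = Legitbot.bot(request.user_agent, request.remote_ip)
      bot ? bot.valid? : false
    end
  end
end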

Redirecting an old SSL domain to a new one on Heroku (w/ Rails 3)

I've been googling for the past hour on this but can't quite get it nailed down. Perhaps you guys can assist here!
Here's what I'm trying to do:
Old site is:
old.tld
New site is:
new.tld
A bunch of folks access one particular legacy url on the old site via SSL, i.e.:
https://www.old.tld/old_url
I've just setup a brand new site on Heroku, running Rails 3, on the new domain.
I also have installed Heroku's SSL Endpoint Add-on and am using a new secure subdomain:
secure.new.tld
I've got a bunch of redirects & constraints in my Rails routes.rb to redirect old -> new and pass the appropriate requests. Everything works perfectly for the non-SSL stuff.
The only thing I can't seem to get working properly is the redirects of
https://www.old.tld/old_url -> https://secure.new.tld/new_url
Safari doesn't seem to mind, but Chrome is throwing a "This is probably not the site you are looking for!" error. It says (only when trying via SSL):
"You attempted to reach www.old.tld, but instead you actually reached a server identifying itself as secure.new.tld" Etc.
(which is exactly what I want, but Chrome doesn't seem to approve ;)
Any thoughts on how to properly configure?
The seamless solution is to get a SAN (Subject Alternative Name) certificate that covers both the old name and the new name.
Another way is to insert a non-secure request in between the two secure requests: redirect from https://www.old.tld -> http://secure.new.tld -> https://secure.new.tld. I haven't actually tested this, but it should work in theory. If it doesn't "just work", then you might try adding an actual page at http://secure.new.tld with a message and a link to https://secure.new.tld.
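If you do try the non-secure hop, a sketch of how the two redirects might be expressed in the same kind of host-constrained routes.rb entries the question already uses (hostnames and paths are the placeholders from the question; in Rails 3 these route-level redirects return 301 by default):

# First hop: the legacy URL on the old host goes to the new host over plain HTTP.
constraints(host: "www.old.tld") do
  match "/old_url" => redirect("http://secure.new.tld/new_url")
end

# Second hop: plain-HTTP requests for the new URL get upgraded to HTTPS.
constraints(host: "secure.new.tld", protocol: "http://") do
  match "/new_url" => redirect("https://secure.new.tld/new_url")
end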

google bot, false links

I have a little problem with Googlebot. I have a server running Windows Server 2009; the system is called Workcube and it runs on ColdFusion. There is a built-in error reporter, so I receive every error message, and many of them concern Googlebot trying to visit a false link that doesn't exist. The links look like this:
http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=282&HIERARCHY=215.005&brand_id=hoyrrolmwdgldah
http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=145&HIERARCHY=200.003&brand_id=hoyrrolmwdgldah
http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=123&HIERARCHY=110.006&brand_id=xxblpflyevlitojg
http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=1&HIERARCHY=100&brand_id=xxblpflyevlitojg
Of course, a value like brand_id=hoyrrolmwdgldah or brand_id=xxblpflyevlitojg is false. I don't have any idea what the problem could be. I need advice! Thank you all for your help! ;)
You might want to verify your site with Google Webmaster Tools, which will report the URLs it finds that error out.
Your logs are also valid, but you need to verify that it really is Googlebot hitting your site and not someone spoofing their User Agent.
Here are instructions to do just that: http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
Essentially you need to do a reverse DNS lookup and then a forward DNS lookup after you receive the host from the reverse lookup.
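The site in this question runs on ColdFusion, but just to illustrate the procedure, a minimal sketch using Ruby's standard Resolv library might look like this (the helper name is made up; googlebot.com and google.com are the domain suffixes Google documents for Googlebot):

require "resolv"

# Reverse-resolve the client IP, check that the resulting host belongs to
# Google, then forward-resolve that host and confirm it maps back to the same IP.
def verified_googlebot?(ip)
  host = Resolv.getname(ip)
  return false unless host.end_with?(".googlebot.com", ".google.com")
  Resolv.getaddresses(host).include?(ip)
rescue Resolv::ResolvError
  false
end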
Once you've verified it's the real Googlebot, you can start troubleshooting. Googlebot won't request URLs that it hasn't naturally seen before, meaning it shouldn't be making direct object reference requests. I suspect it's a rogue bot with a Googlebot User Agent, but if it's not, you might want to look through your site to see if you're accidentally linking to those pages.
Unfortunately you posted the full URLs, so even if you clean up your site, Googlebot will see the links from Stack Overflow and continue to crawl them because they'll be in its crawl queue.
I'd suggest 301 redirecting these URLs to someplace that makes sense to your users. Otherwise I would 404 or 410 these pages so Google knows to remove them from its index.
In addition, if these are pages you don't want indexed, I would suggest adding the path to your robots.txt file so Googlebot can't continue to request more of these pages.
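For example, a robots.txt entry along those lines, using the fuseaction path from the URLs above (adjust the pattern to whatever you actually want blocked):

User-agent: Googlebot
# Block the product-list fuseaction that the bogus brand_id requests are hitting
Disallow: /index.cfm?fuseaction=objects2.view_product_list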
Unfortunately there's no really good way of telling Googlebot to never crawl these URLs again. You can always go into Google Webmaster Tools and request that the URLs be removed from the index, which may stop Googlebot from crawling them again, but that doesn't guarantee it.
