Can a dedicated IP address with a website on it be found and crawled by search engines? - search-engine

I have a VPS. I have placed a Drupal installation on that IP address. There is no URL registered for my website. The site on the IP address is for personal reference.
Can my IP address get indexed and found on search engines if there is no traditional URL for it? Will it get crawled?
I have no A-records pointing to it from other domain names I have on another VPS platform either. As far as I know, I am the only one that knows this IP address by heart or even goes there to add or refer to content.

There are three ways I know of for a search engine to learn about the existence of a website:
1. You submit the domain to them directly.
2. Someone else links to the domain.
3. The search engine watches all domain registrations (Google can do this easily because they run a DNS service themselves) and tries the standard prefixes (e.g. www).
There does not seem to be an automatic approach for discovering content hosted on a bare IP address unless someone links to it.
If it's purely for personal reference and you want to be sure no one else can access it, then you should implement security anyway. Don't just rely on no one knowing the IP.

Can my IP address get indexed and found on search engines if there is no traditional URL for it?
Yes, if you can reach it externally, then so can the search engines. If you don't want it to be indexed, add a robots.txt that requests that the site not be indexed. Bear in mind that crawlers do not have to respect this, but the major ones do.
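For example, a minimal robots.txt placed at the web root would look like the following; note that it only asks crawlers to stay away, and honouring it is voluntary:

# Ask all crawlers to stay out of the entire site
User-agent: *
Disallow: /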
As for how the search engines discover IP addresses that are not indexed elsewhere, that is probably part of their "secret sauce" that we will never know about. Perhaps your IP has been used before, and it has previously been indexed in that context; if so, a search engine that has a poke around may be expecting that old site but will happily index your new one.
Or, maybe other IP addresses in the same netblock are in active use, and the search engines give yours "a quick try" to see if it responds on ports 80 (http) or 443 (https). If they do, it gets added to their indexes (or do-not-crawl lists, if your robots.txt requests it).
If you specifically do not want search engines to see your content, you could make the default home page blank, and put your Drupal installation in a sub-directory. The search engines will then have nothing to index apart from a blank home page.

Related

Given 2 URLs, is it possible to know if the resources are on the same web server?

I am accessing 2 URLs. The domain name/server part is the same. The resource part is different.
The URLs are like the following:
https://aa.bb.com/dir1/dir2
https://aa.bb.com/dir3
When I access the first URL, I get redirected to the second URL. Is it possible that the second URL is hosted on a different web server than the first, or would both resources have to be on the same web server?
If by web server you mean physical computer, absolutely they could be on different servers. Google and Akamai, among others, have large collections of machines serving the same domain names. It helps with speed, since you are likely to receive pages from a server near you.
In general, it does not appear to be possible to reliably tell whether you are talking to the exact same server before and after a redirect. First, it is difficult to test for IP addresses from a Web page (see, e.g., this question and this one). Second, even if the IP addresses are the same before and after the redirect, they may be on different machines. For example, TCP anycast can change which server you are talking to without changing the IP address. Also, network address translation and load-balancing may change which server you are talking to behind a firewall, which you would probably have no way of finding out unless the server provided some ID of its own.
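As an illustration of how weak the available signals are, here is a rough C# sketch (assuming .NET's HttpClient; the URLs are the ones from the question) that fetches the first URL without following the redirect, then fetches the redirect target, and prints the Server response header of each. Matching headers do not prove that the same machine answered, and differing headers are only a hint.

using System;
using System.Net.Http;

class RedirectProbe
{
    static void Main()
    {
        // Do not follow redirects automatically, so both responses can be inspected.
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        {
            var firstUrl = new Uri("https://aa.bb.com/dir1/dir2");
            var first = client.GetAsync(firstUrl).Result;

            // Resolve the Location header against the first URL in case it is relative.
            var target = new Uri(firstUrl, first.Headers.Location);
            var second = client.GetAsync(target).Result;

            // Identical "Server" headers prove nothing; different ones are only a hint.
            Console.WriteLine("First:  " + string.Join(" ", first.Headers.Server));
            Console.WriteLine("Second: " + string.Join(" ", second.Headers.Server));
        }
    }
}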

using \\servername\sharename in IE9 not being picked up as intranet site

I have a very basic intranet site for our company, and its main purpose is to link to SMB shares on our network, so people can open files and edit them without needing to re-upload them to the site.
What I have is a basic <a href="\\IP ADDRESS\SHARENAME\"></a>
The issue seems to be, regardless of whether I use the IP address, or the actual DNS name of the machine, IE9 always seems to think the intranet is an internet site, and stops these links from working.
Let's say for example, the web server address is 10.1.3.81, and I have a share on that same server for a global phone directory spreadsheet. I want someone to be able to click on the link on the page, and have it open that file directly.
So for the href, I put in \\10.1.3.81\intranet\phone directory\list.xls
Or something like that. IE9 (which is what all our users are using) considers this link to point to file://10.1.3.81/intranet/phone directory/list.xls
That's great, but since it doesn't consider this to be on the intranet, it blocks the file:// protocol and the link does nothing.
If I add the site to my trusted sites list, it then works correctly. So I am wondering whether there is a way, on the programming side of things, to create these kinds of links and have them automatically picked up as intranet links?
Failing that, I will post on serverfault, and see if someone can guide me on applying a policy to add this site to trusted sites for all users and computers.
Many thanks
Eds
As it turns out, I was accessing the intranet by using either the FQDN or the IP address of the server.
As this article shows (http://support.microsoft.com/kb/303650), if I just use the server name instead and drop the domain name from the end, the links behave as I would like.
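In other words (the server and share names below are placeholders), browsing to http://intranetserver/ rather than http://intranetserver.example.local/ or http://10.1.3.81/, and writing the links with the short machine name, keeps IE9 in the Intranet zone:

<a href="\\intranetserver\intranet\phone directory\list.xls">Phone directory</a>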
Sorry for this useless question.
Thanks, Eds

Google indexed my domain anyway?

I have a robots.txt like below but Google has still indexed my domain. Basically they've indexed mydomain.com but not mydomain.com/any_page
User-agent: *
Disallow: /
I mean, how can I go back further than /, which I thought was the root of the domain?
Note this domain is a work in progress, hence I don't want Google or any other search engine seeing it for even a minute.
If you don't have one already, get a Google Webmaster Tools account. It includes a URL removal tool that may work for you.
This doesn't address the problem of search engines possibly ignoring or misinterpreting your robots.txt file, of course.
If you REALLY want your site to be off the air until it's launched, your best bet is to actually take it off the air. Make the site inaccessible except by password. If you put HTTP Basic authentication on your documentroot, then no search engine will be able to index anything, but you'll have full access with a password.
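For instance, if the site happens to run on Apache, something like this in an .htaccess file in the document root (the realm text and .htpasswd path are placeholders; the password file itself is created separately with the htpasswd utility) keeps everything behind a password prompt:

# Password-protect the whole site while it is under construction
AuthType Basic
AuthName "Work in progress"
AuthUserFile /path/to/.htpasswd
Require valid-user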

SEO Destroyed By URL Forwarding - Can't figure out another way

We design and host websites for our clients/sales force. We have our own domain: http://www.firstheartland.com
Our agents fill out a series of forms on our website that are loaded into a database. The database then renders the website as a database driven website.
/repwebsites/repSite.cfm?link=&rep=rick.higgins
/repwebsites/repSite.cfm?link=&rep=troy.thompson
/repwebsites/repSite.cfm?link=&rep=david.kover
The database application reads which "rep" the site is for and the appropriate page to display from the query string. The page then outputs the content and the appropriate CSS to style the page and give it its own individual branding.
We have told the agents to use domain name forwarding to get visitors to their spot on our server. However, everyone seems to be getting indexed under our domain instead of their own. We could in theory assign a new IP to each of them; the cost is not the issue.
The issue is how we would possibly accomplish this.
With all of that said, them being indexed under our domain would still be OK as long as they would actually show up high in the ranking for their search term.
For instance, an agent owns TroyLThompson.com. If I search "Troy L Thompson", it does not show up in my search. Only "troy thompson first heartland" works (they show up third).
Apart from scrapping the whole system, I don't know what to do. I'm very open to ideas.
I'm sure you can get this to work as most hosting companies will host hundreds of websites on a single server (i.e. multiple domains on one IP).
I think you need your clients to update the nameservers for their domains (i.e. DNS) to return the IP address of your hosting server. Then you need to configure your server to return the right website based on the domain that was originally requested.
That requires your "database driven website" to look in the HTTP request and check which domain was originally requested, then it can handle the request accordingly.
- If you are using Apache, see how to configure Apache to host multiple domains on one IP address.
- If you are using Microsoft IIS, maybe Host-Header Routing is what you need.
You will likely need code changes on your "database driven website" to cope with these changes.
I'm not sure that having a dedicated IP address per domain will help much, as then you have to find a way to host all those IP addresses from a single web server. However, if your web server architecture already supports a shared database and multiple servers, then that approach might work well for you, especially if you expect the load from some domains to be so heavy that you need a dedicated web server for them.
Google does not include URLs that return a 301 status code in its index. The reason is pretty obvious on second thought: the redirect tells Google "Whatever was here before has moved there, please update your references". One solution I can see is setting up Apache virtual hosts on your server for each external domain, and having each rep configure their domain's DNS A record to point to the IP address of your server.
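A minimal sketch of one such virtual host, assuming Apache and using TroyLThompson.com as the example (the DocumentRoot path is a placeholder):

<VirtualHost *:80>
    ServerName troylthompson.com
    ServerAlias www.troylthompson.com
    DocumentRoot /path/to/repwebsites
    # repSite.cfm still has to look at the requested host name
    # to pick the right rep's content and branding.
</VirtualHost>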

Account based lookup in ASP.NET

I'm looking at using ASP.NET for a new SaaS service, but for the life of me I can't seem to figure out how to do account lookups based on subdomains the way most SaaS applications (e.g. 37Signals) do.
For example, if I offer yourname.mysite.com, then how would I use ASP.NET (MVC specifically) to extract the subdomain so I can load the right template (displaying your company's name and the like)? Can it be done with regular routing?
This seems to be a common thing in SaaS so there has to be an easy way to do it in ASP.NET; I know there are plugins that do it for other frameworks like Ruby on Rails.
This works for me (it needs using System.Text.RegularExpressions for Regex):
// Returns the left-most label of the host name, or "www" when there is none.
public string GetSubDomain()
{
    string subDomain = "";

    // Only try to extract a subdomain when the host is a DNS name, not an IP address.
    if (Request.Url.HostNameType == UriHostNameType.Dns)
        subDomain = Regex.Replace(Request.Url.Host, "((.*)(\\..*){2})|(.*)", "$2");

    // A bare "mysite.com" with no subdomain falls back to "www".
    if (subDomain.Length == 0)
        subDomain = "www";

    return subDomain;
}
I'm assuming that you would like to handle multiple accounts within the same web application rather than building separate sites using the tools in IIS. In our work, we started out creating a new web site for each subdomain but found that this approach doesn't scale well - especially when you release an update and then have to modify dozens of sites! So, based on several years' worth of experience doing exactly what you propose, I recommend this approach over the server-oriented techniques suggested in the other answers.
The code above just makes sure that this is a fully formed URL (rather, say, than an IP address) and returns the subdomain. It has worked well for us in a fairly high-volume environment.
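For instance, a controller action might use it like this (the repository, its FindByName method, and the view name are placeholders invented for illustration; only GetSubDomain() comes from the snippet above):

public ActionResult Index()
{
    // "yourname" for yourname.mysite.com, "www" when no subdomain was used.
    string account = GetSubDomain();

    // Hypothetical data access: look up the tenant's settings by subdomain.
    var settings = _accountRepository.FindByName(account);
    if (settings == null)
        return HttpNotFound();

    // Render a shared template with the tenant's branding.
    return View("AccountTemplate", settings);
}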
You should be able to pick this up from the ServerVariables collection, but first you need to configure IIS and DNS to work correctly. For what it's worth, 37Signals probably use Apache or another open-source, Unix web server; on Apache this is referred to as virtual hosting.
To do this with IIS you would need to create a new DNS entry (create a CNAME yourname.mysite.com to application.mysite.com) for each domain that points to your application in IIS (application.mysite.com).
You then create a host header entry in the IIS application (application.mysite.com) that will accept the header yourname.mysite.com. Users will actually hit application.mysite.com, but the address shown is the custom subdomain. You then access the ServerVariables collection to get the value and decide how to customize the site.
Note: there are several alternative implementations you could follow depending on requirements.
Handle the host header processing at a hardware load balancer (more likely what 37Signals do than relying on the web server) and have it add a custom HTTP header that passes the requested host to the web application (a sketch follows this list).
Create a new web application and host header for each individual application. This is probably an inefficient implementation for a large number of users, but could offer better isolation and security for some people.
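If the load-balancer route is taken, the application side only needs to read that custom header. A minimal sketch, assuming the balancer is configured to inject a header called X-Original-Host (the name is made up; use whatever your device actually sends):

// "X-Original-Host" is an assumed header name injected by the load balancer.
string requestedHost = Request.Headers["X-Original-Host"];
if (string.IsNullOrEmpty(requestedHost))
    requestedHost = Request.ServerVariables["HTTP_HOST"];   // fall back to the normal Host header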
You need to configure your DNS to support wildcard subdomains. It can be done by adding an A record pointing to your IP address, like this:
* A 1.2.3.4
Once it's done, whatever you type before your domain will be sent to your root domain, where you can get it by splitting the HTTP_HOST server variable, as the user buggs said:
// Take the left-most label of the host, e.g. "yourname" from "yourname.mysite.com".
string user = HttpContext.Request.ServerVariables["HTTP_HOST"].Split('.')[0];
// use the user variable to query the database for account-specific data
PS. If you are using shared hosting, you will probably have to buy a unique-IP add-on from the host, since it's required for the wildcard domains to work there. If you are on dedicated hosting, you already have your own IP.
The way I have done it is with HttpContext.Request.ServerVariables["HTTP_HOST"].Split('.').
Let me know if you need more help.
