How do search engines map pages on a site that nothing links to? - search-engine

How can search engines map pages on a site when there are no links to those pages? How can they search domains such as www.zippyshare.com and find links to download pages on that site or on other upload sites?
Thanks.

Related

How do search engines obtain unlinked pages?

I noticed that quite a lot of Dropbox pages are indexed by Google, Bing, etc., and was wondering how these search engines obtain links like these:
https://dl.dropboxusercontent.com/s/85cdji4d5pl5qym/37-71.pdf
https://dl.dropboxusercontent.com/u/11421929/larin2014.pdf
Given that there are no links on dl.dropboxusercontent.com to follow and the path structure is not that easy to guess, how is it possible that a search engine obtains such a link?
One explanation might be that the link was posted on a forum and picked up by the search engine, but I looked up quite a lot of the links and checked their backlinks without success. I also noticed that Bing and Yahoo show considerably more results than Google, which would mean that Bing does a better job of picking up these links, and that seems unlikely to me.
Even if the document is really unlinked (no link on their site, no link on someone else’s site, no sitemap, no Referer log from a site that gets linked in the document, etc.), it’s still possible for search engines to find the link.
Two ways are:
Someone could submit the URL to a search engine (whether via a public tool, or via the site’s webmaster account).
The search engine could get all URLs that certain users visit in their browsers. This could happen, for example, when the user has installed a toolbar from that search engine. This is the case with Bing; see my related answer on Webmasters SE:
Microsoft has confirmed that they do discover and index URLs that they find through users surfing the Internet with the Bing Toolbar installed.
And there might be more ways, of course.

How does Google crawl my database without connecting to it?

I have a website and added Google Custom Site Search to it. It works fine and displays results. The pages are stored in a database and can be edited by an administrator (as in a CMS solution). How does Google search and display content that lives in my database? I would like to know what technique or method Google follows.
Google does not have access to your database, only to your web pages. It regularly crawls your web pages and indexes them, just as it does for its own search results. The only difference is that the custom search on your site serves up only results from your own website.
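To make that concrete, here is a minimal sketch of what a crawler sees, assuming it receives only the HTML that the CMS renders from the database. The page content and the `LinkAndTextExtractor` and `extract` names are made up for illustration; real crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects link targets and visible text from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag; these are the URLs to visit next.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty text fragments; this is what would end up in an index.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def extract(html):
    """Return (links, text) found in an HTML string."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.text


# Hypothetical page, as a CMS might render it from a database row.
RENDERED_PAGE = (
    "<html><body><h1>About us</h1><p>We sell widgets.</p>"
    '<a href="/contact">Contact</a></body></html>'
)

links, text = extract(RENDERED_PAGE)
print(links)
print(text)
```

Nothing here touches a database: the text and links come entirely from the HTTP response, which is all a crawler needs.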

How to block my website from all the web

I have a website that should not be reachable by anyone who does not have the URL, and it should not show up in any search engine.
No search engine should be aware of my website; only a person with the link should be able to access it. Can someone suggest the best approach, since I'm going to share my office data on it?
You can prevent most search engines from indexing your site with a robots.txt file. More details here: http://www.robotstxt.org/
However, this is not very secure. Some robots ignore robots.txt. The best way to restrict access is either to require a user to log in before entering the site, or use a firewall to allow only that user's IP address.
You need to add a robots.txt file to the root folder of your website indicating that search engine spiders should not index your website.
http://www.robotstxt.org/robotstxt.html
But it's left to the search engines to read the file. Most popular search engines do honor this file. Another method is to not have an index.htm or default.htm on your website. Even if one exists, remove any links to internal pages from it. This way spiders will never learn the site structure of your website.
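As a rough illustration of how a well-behaved crawler treats such a file, the sketch below feeds a "disallow everything" robots.txt to Python's standard `urllib.robotparser`. The `ExampleBot` user agent, the example.com URL, and the `is_allowed` helper are placeholders, not anything from the site in question.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URL, used only for illustration.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/reports/q1.html")
print(blocked)
```

This only models crawlers that choose to obey the file. It does nothing against robots that ignore it, which is why the answers here also recommend authentication.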
Wow. OK:
1) robots.txt
http://www.robotstxt.org/robotstxt.html
2) Authentication. If you're using Apache, password-protect the site.
3) Ensure no one ever links to it from anywhere.
4) Consider an alternative, like Dropbox.
http://www.dropbox.com/

How to find web pages that link to a specific page?

Just wondering if there's any way to search for all web pages that link to a specific URL. For example, all web pages containing a link to example.com? Thanks
You might want to explore the Google Search API, which allows you to use Google search results in your programs.

Creating Custom Search Webpage Using Google Engine

I want to create a search webpage which should display the Google results page as well as results from our intranet webpage. Can I design it using Google Custom Search Engine?
Not unless you expose your intranet to the public Internet for Google to index, which is probably not something you want to do.
They have services to index intranet content as well, but they might be a bit costly.
