Why can Google not direct me to the paper? - search-engine

I enter this reference into the google search field
Nature 2008 May 8;453(7192):164-6
I expect at least one link to be from the nature.com website; after all, it says "Nature" right in the query. Yet all the results are from NCBI, which only collects abstracts. Is that so hard? References in journals usually follow this or a similar format... how come it's not recognized as such?
Please redirect the question to the appropriate stackexchange sub-field if necessary.

When Google lists query results, it takes website popularity into consideration. Your paper is on both ncbi and nature.com, and because the ncbi website is ranked 361st for internet traffic on alexa.com while nature.com is ranked 2,984th, your Google results come out the way they do.

Facebook Search in Graph API

I'm developing an iOS application that lets the user search for a person through the Graph API.
What I want is the SAME behavior that's present on the Facebook website. You know when you begin to search for a person in the top text input? The first results will be mostly your friends AND some people you MAY know or have already looked for.
The problem? Try using the same search pattern here to search for a person: Graph Api Explorer
The Graph API returns DIFFERENT results than the search input on the Facebook website.
Does anyone know why? Is there a way to achieve the same results?
Facebook uses many algorithms to rank search results, covering things like Relevance Indicators, the Complexities of User-Centric Search, and The Product.
One of the algorithms used to rank results on their page is described below.
Personal Context:
Unlike most search engines, every Facebook search involves two key elements - a query and a querier.
Just as we need to understand the query, it’s as essential to understand the person behind the query.
People are more likely to be looking for things located in their own city/country or for people who share the same college/workplace.
We consider this information and much more when ranking results. The more we know about you, the better your search results will be.
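To make this concrete, here is a toy sketch (my own illustration, not Facebook's actual code) of how a personal-context boost might rank results that share the querier's city or college higher; the field names are illustrative assumptions:

    # Toy personal-context ranking: results sharing the querier's city or
    # college get boosted above the rest.
    def personal_context_score(querier, result):
        score = 0
        if result.get("city") == querier.get("city"):
            score += 2
        if result.get("college") == querier.get("college"):
            score += 1
        return score

    querier = {"city": "Boston", "college": "MIT"}
    results = [
        {"name": "Bob", "city": "Austin", "college": "UT"},
        {"name": "Ann", "city": "Boston", "college": "MIT"},
    ]
    ranked = sorted(results, key=lambda r: personal_context_score(querier, r),
                    reverse=True)
    print([r["name"] for r in ranked])  # ['Ann', 'Bob']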
In the Graph API they do not use this algorithm; it just returns the plain query result. Hence you cannot achieve the same results using the Graph API search.
To get something close, you can use the following approach (a sketch follows the list):
Get the user's friend list using me/friends?limit=1&offset=1
Get the user list using the search API
Merge both results
Show the result(s) to the user
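Here is a minimal sketch of that merge, assuming a valid user access token obtained via Facebook Login; the endpoint names follow the public Graph API, but the naive name-matching merge is my own stand-in for Facebook's ranking:

    import requests

    GRAPH = "https://graph.facebook.com"
    TOKEN = "<user-access-token>"  # assumption: obtained via Facebook Login

    def get_friends(token):
        # Fetch the user's friends (as of v2.0, only friends using the same app).
        r = requests.get(GRAPH + "/me/friends",
                         params={"access_token": token, "limit": 100})
        r.raise_for_status()
        return r.json().get("data", [])

    def search_users(token, query):
        # Fetch public user search results for the query.
        r = requests.get(GRAPH + "/search",
                         params={"q": query, "type": "user",
                                 "access_token": token})
        r.raise_for_status()
        return r.json().get("data", [])

    def merged_results(token, query):
        # Friends matching the query come first, then everyone else, de-duplicated.
        friends = [f for f in get_friends(token)
                   if query.lower() in f["name"].lower()]
        seen = {f["id"] for f in friends}
        others = [u for u in search_users(token, query) if u["id"] not in seen]
        return friends + others

    for user in merged_results(TOKEN, "John"):
        print(user["id"], user["name"])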
For more information on the approach/algorithm, check Intro to Facebook Search.
Is there a way to achieve the same results? - NO
Does anyone know why? - NOT REALLY
(Edit: judging by another answer, it seems someone does actually, but it doesn't change the answer to "can you achieve it".)
But it's safe to presume that Facebook does not allow all functionality through the API; why would they, after all? They need to keep people coming to their own platform. So I can't give you a straightforward response on WHY, but on IF? Not possible: there is zero documentation about more specific search for type user. And starting with v2.0, when you request a user's friends you will only get the friends who are using the same app.
I'm afraid you will have to drop the functionality you want to achieve.
It is not just Graph Search. When you refresh your timeline, the order of posts changes every time because Facebook takes a pull-on-demand approach: whenever you log in, the data from your friends is fetched. This is also why Facebook has a limit on the maximum number of friends.
As for Graph Search and the Graph API: they are not the same, and Graph Search cannot be accessed through the Graph API. So you would have to change your approach.
To explain why Graph Search gives different results for the same search term, I would guess that it follows the same pull-on-demand model (although it is not open, so we cannot know for sure). Following that model would make sense, though.
Thanks

View search queries by popularity containing a keyword

I have been spending some time in Google Analytics recently watching the search queries that bring people to my site, to see whether people are finding exactly what they are looking for and, if not, to create that new content. But I figured an easier way would be to see which search queries are popular, restricted to ones containing a keyword that relates to my site.
For example, I want to see all the most-queried search terms that contain "in japanese",
like "dog in japanese" or "i love you in japanese".
I have found http://www.google.com/trends/
but after playing with it for a while it doesn't seem like I can do this; it seems I can only see the popularity of specific queries. I don't want to see how popular specific queries are, I want to see which queries containing x are popular. Is there anywhere I can do this?
If you join the Google AdWords program, you can use the Keyword Planner tool to try out keywords and immediately get the number of searches per month in a chosen geography. This is a very interesting tool. See http://adwords.google.com.
I'm not sure this question belongs here on SO though.

From a development perspective, how does the indeed.com URL structure and site work?

On the webmaster's Q and A site, I asked the following:
https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c
But, I would like a little more information about this from a development perspective.
If you search Google for anything job related, for example, Gastonia Jobs (City + jobs), then, in addition to their search results dominating the first page of Google, you get a URL structure back that looks like this:
indeed.com/l-Gastonia,-NC-jobs.html
I am assuming that the l stands for location in the URL structure. If you do a search for an industry-related job, or a job with a specific company name, you will get back something like the following (Microsoft jobs):
indeed.com/q-Microsoft-jobs.html
With just over 40,000 cities in the USA, I thought: OK, maybe it's possible they looped through them and created a page for every single one; that would not be hard for a computer. But the site is obviously dynamic, as each of those pages has tens of thousands of results, paginated by 10. The q above obviously stands for query. The locations I can understand, but they cannot possibly have created a web page for every single query combination, could they?
Ok, it gets a tad weirder. I wanted to see if they had a sitemap, so I typed "indeed.com sitemap.xml" into Google and got the response:
indeed.com/q-Sitemap-xml-jobs.html
.. again, I searched for "indeed.com url structure" and, as I mentioned in the other post on webmasters, I got back:
indeed.com/q-change-url-structure-l-Arkansas.html
Is indeed.com somehow using programming to create a webpage on the fly based on my search input into Google? If not, how are they able to have a static page for millions upon millions of possible query combinations, have them paginate dynamically, and then have all of those dominate Google's first page of results (albeit that last question may be best for the Webmasters Q&A)?
Does the JavaScript in the page somehow interact with the URL?
It's most likely not a bunch of pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then cleverly produces a human-readable URL using URL rewriting, fetches the jobs in the database that match the query, and voilà...
I could be dead wrong, of course. Truth be told, the technical aspect can probably be solved in a multitude of ways. For instance, every time a job is added to the site, all the pages needed to match that job might be created, producing an enormous number of pages for Google to crawl.
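A minimal sketch of the rewrite idea, using Flask purely as an assumption (Indeed's actual stack is not public); the route patterns mirror the URLs quoted in the question:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/l-<location>-jobs.html")
    def jobs_by_location(location):
        # /l-Gastonia,-NC-jobs.html -> location == "Gastonia,-NC"
        city = location.replace("-", " ")
        return "Jobs near %s (fetched dynamically from the database)" % city

    @app.route("/q-<query>-jobs.html")
    def jobs_by_query(query):
        # /q-Microsoft-jobs.html -> query == "Microsoft"
        return "Jobs matching '%s'" % query.replace("-", " ")

    if __name__ == "__main__":
        app.run()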
This is a great question, but it remains unanswered. On the one hand, a basic Google search using
site:indeed.com
returns over 120MM results; on the other, a query such as "product manager new york" ranks #1 in the results. These pages are obviously pre-generated, which is confirmed by the fact that the page cached by the search engine (sometimes several days earlier) shows different results from a live query on the site.
Easy: when Google's search bot crawls the pages on Indeed or any other job search site, those pages are dynamically created. Here is another site I run that works similarly to Indeed: http://jobuzu.co.uk
PHP is your friend in this, and Indeed doesn't just use standard databases: look into Sphinx and Solr, as they offer full-text search with better performance than MySQL etc.
They also make clever use of rel="canonical" and thorough internal linking:
http://www.indeed.com/find-jobs.jsp
Notice that all the pages that actually rank can be found from that direct internal link structure.

How would I find all the short urls that link to a particular long url?

Basically I want to know how many people have tweeted a link to a URL, but since there are dozens of link shorteners out there, I don't see any way to do this without having access to all of their URL maps. I found a previous question here, but it was over a year old and didn't have any new answers.
So #1, does anyone know of a service/API for doing this?
And #2, can anyone think of a way to accomplish this task other than submitting the long url in question to all the popular link shortening sites?
ps- I'm also open to comments about why this is impossible or impractical.
You could perform a Google search (or the equivalent via API) for any pages that link to your page. This is done with the link: keyword. So if you're trying to figure out how many people link to www.example.com (regardless of whether it's through a link-shortener URL), you would just do a Google search for link:www.example.com.
e.g.: http://www.google.com/search?q=link:www.example.com
Note that this will only find pages that have been indexed, so pages that haven't been crawled, or pages that get crawled infrequently, will not show up in the results until a later date (if at all).
Since all sites have different algorithms for shortening the URLs, and these are different sites that most likely do not share their data with each other, how can you hope to find all of them in a single or small number of queries?
All you can do is brute-force it, and even then this might not be any good if a site is content to create a new value for the same long-form URL (especially if you send a different long-form URL that maps to the same place, like http://www.stackoverflow.com/ rather than http://stackoverflow.com/).
In order to really get this to work, there would have to be a site that ALREADY automatically collects all of this information from every shortener, one that the URL-shortening sites voluntarily report to. And even if you built such a site, that doesn't account for the URL-shortening sites already out there that already have data!
In short, I do not see how this is remotely possible, unless I'm wrong about there being such a database somewhere out there.
So, months after asking this question, I came across a solution to a similar problem: how to tell how many times a link has been shared on Facebook. The solution is a simple API call:
http://graph.facebook.com/http://stackoverflow.com
returns the following json data:
{
    "id": "http://stackoverflow.com",
    "shares": 1627
}
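A quick sketch of reading that count programmatically; the id-parameter form of the endpoint is my assumption, on the understanding that it returns the same object as the path form shown above:

    import requests

    def share_count(url):
        # Query the Graph API object for a URL and read its share count.
        r = requests.get("https://graph.facebook.com/", params={"id": url})
        r.raise_for_status()
        return r.json().get("shares", 0)

    print(share_count("http://stackoverflow.com"))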

What's the best method to capture URLs?

I'm trying to find the best method to gather URLs. I could create my own little crawler, but it would take my servers decades to crawl all of the Internet, and the required bandwidth would be huge. The other thought would be to use Google's Search API or Yahoo's Search API, but that's not really a great solution, as it requires a search to be performed before I get results.
Other thoughts include asking DNS servers for a list of URLs, but DNS servers can limit/throttle my requests or even ban me altogether. My knowledge of querying DNS servers is quite limited at the moment, so I don't know if this is the best method or not.
I just want a massive list of URLs, but I want to build this list without running into brick walls in the future. Any thoughts?
I'm starting this project to learn Python but that really has nothing to do with the question.
$ wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
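Once downloaded, turning that list into seed URLs is a short job; this sketch assumes the archive contains a top-1m.csv file of rank,domain rows, which is the format Alexa has been shipping:

    import csv
    import zipfile

    # Read "rank,domain" rows straight out of the downloaded archive.
    with zipfile.ZipFile("top-1m.csv.zip") as zf:
        with zf.open("top-1m.csv") as f:
            rows = csv.reader(line.decode("utf-8") for line in f)
            urls = ["http://%s/" % domain for _rank, domain in rows]

    print(urls[:5])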
You can register to get access to the entire .com and .net zone files at Verisign
I haven't read the fine print for terms of use, nor do I know how much (if anything) it costs. However, that would give you a huge list of active domains to use as URLs.
How big is massive? A good place to start is http://www.alexa.com/topsites. They offer a download of the top 1,000,000 sites (by their ranking mechanism). You could then expand this list by going to Google and scraping the results of the query link: url for each url in the list.
The modern terms now are URI and URN; URL is the shrunken/outdated one. I'd scan for sitemap files, which contain many addresses in one file, and study the classic texts on spiders, wanderers, brokers and bots, as well as RFC 3305 (appendix B, p. 50) defining the URI regex.
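For the sitemap route, here is a minimal sketch that pulls every URL out of a standard sitemaps.org-format file (the example URL is a placeholder):

    import urllib.request
    import xml.etree.ElementTree as ET

    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def sitemap_urls(sitemap_url):
        # Parse the sitemap XML and collect the text of every <loc> element.
        with urllib.request.urlopen(sitemap_url) as resp:
            tree = ET.parse(resp)
        return [loc.text for loc in tree.iter(NS + "loc")]

    print(sitemap_urls("http://example.com/sitemap.xml"))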
