Which are the best papers on indexing and ranking? [duplicate] - search-engine

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Where can I find materials about indexing and page ranking?
I'm reading the source code of a search engine that has no documentation.
Are there classic papers on indexing and ranking?

You can read the original Google Paper: http://infolab.stanford.edu/~backrub/google.html
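To give a feel for the ranking idea that paper introduces, here is a minimal PageRank sketch using power iteration. It is not the paper's implementation: it uses the normalized variant in which the ranks sum to 1, assumes every link target also appears as a key in the graph, and spreads the rank of pages without outlinks evenly over all pages. The damping factor of 0.85 is the value the paper suggests; the toy graph and the function name are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Assumption: every link target is also a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks hands its rank to all pages evenly.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nobody links to "d".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Pages that collect many incoming links (here "c") end up with the highest score, and a page nobody links to keeps only the random-jump share.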

There is this excellent free online book on the subject, which covers indexing, queries, scoring, ranking, PageRank and more. It covers both the theory and the practice of search engine technology and information retrieval. An essential read if you are diving into the nuts and bolts of a search engine such as Lucene.
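As a rough illustration of what "indexing" and "scoring" mean in practice, here is a minimal sketch of an inverted index with a simple TF-IDF ranking. It is not how Lucene or any particular engine does it: tokenization is plain whitespace splitting, the score is just the sum of term frequency times log(N / document frequency), and the documents and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return doc ids ranked by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection.
docs = {
    "d1": "lucene builds an inverted index",
    "d2": "pagerank ranks pages by link structure",
    "d3": "an index maps terms to documents and an index speeds up search",
}
index = build_index(docs)
results = search(index, len(docs), "inverted index")
print(results)
```

The index answers "which documents contain this term?" without scanning every document, and the rarer a term is across the collection, the more it contributes to a document's score.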

Related

A must read for image processing and computer vision? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I tried to read Digital Image Processing by Gonzalez/Woods but I found it difficult to understand/grasp. I have taken a graduate course in Computer Vision, which is more practically oriented, and I am doing a lot of cool stuff with OpenCV. However, I still feel I am swimming in higher abstractions and do NOT understand the basics beneath.
I am planning to read a book on Computer Vision/Image Processing during the winter break to solidify my understanding of the content and would appreciate some must-read suggestions.
I have done assignments such as camera calibration, image transforms, stitching images into panoramas, and Haar classification.
You should probably take a look at Szeliski's book.
Hartley and Zisserman's book is also excellent.
Gonzalez and Woods (or Wintz in my day) is a very good introduction.
There is a more readable but less concise introduction: Image Processing, Analysis, and Machine Vision.
And since you are working with OpenCV, you could do worse than read the OpenCV book.
Have a look at this book. It's quite heavy (and expensive!), but it covers a lot of topics, and each chapter is written by a different author who is competent in the corresponding field. If cost is a huge issue, I've seen reprints from Taiwan that appear to be legitimate for a fraction of the original price (they are soft cover, though, and the print quality is obviously not as good).
Mind you, I've got both The Handbook and Gonzalez & Woods, and I've found Gonzalez to be easier to digest during the initial stages. Rather than just reading, it is definitely recommended to attempt to reproduce all the examples that they give, and make an honest attempt at the exercises at the end of each chapter. The Handbook is great for coverage but lacks exercises.
Finally, your choice of must-read really depends on which specific direction you expect to be working in. The basic knowledge (spatial and frequency domain filtering, for example) has been around since the dawn of the field (early 60s) and is usually covered fairly well by most texts. If you want to learn about more recent applications, you have to be a bit more specific (or go for The Handbook, as it attempts to cover it all).
For contemporary readers viewing this question, an outstanding text is Prince's Computer Vision: Models, Learning, and Inference. The pdf is available free on that site.

Who are the people devs/techies should follow on Twitter/Facebook? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 12 years ago.
Yesterday I was reading an article that touched on Twitter and mentioned how influential it can be: if someone like Tim O'Reilly makes a suggestion, his 1.5 million followers on Twitter will react to the tweet and cause some sort of reaction.
Whether tweets and/or the entire online social media ecosystem are worthwhile is debatable to no end, but it is a means of staying informed, sort of like watching the morning news.
This thought has sparked me to create a Twitter account so that I can follow current events in what I'm interested in, namely software development and technology in general.
This brings me to my current question: which intelligent people are worth following and listening to? I know the social media web is flooded with mind-numbing nonsense, but there are movers and shakers like Tim O'Reilly who are well worth listening to, if for nothing more than getting a sense of which direction the wind is blowing.
So the million dollar question is: who do you follow regularly?
Please list the moniker of the person so that others (ME) can easily add and follow them as well. Also list the medium (Facebook, Twitter, ...).
In particular I'm interested in these technologies: MS SQL, ASP.NET/C#.
Thanks all for helping me get off to a fast start.
The standard ones are probably something like:
haacked
jonskeet
spolsky
scottgu
Not exactly what you are looking for, but if I were you I would also consider blogs, which I find much more in-depth and easier to follow than tweets. I would certainly add Scott Hanselman to the list of people you follow.
Blog: http://www.hanselman.com/blog/
Twitter handle: shanselman
The two that top my list:
martinfowler
unclebobmartin

Are there any useful datasets available on the web for data mining? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Does anyone know any good resource where example (real) data can be downloaded for experimenting with statistics and machine learning techniques such as decision trees?
Currently I am studying machine learning techniques and it would be very helpful to me to have real data for evaluating the accuracy of various tools.
If anyone knows of any good resource (perhaps as csv, xls files or any other format) I would be very thankful for a suggestion.
The UCI Machine Learning Archive and the past datasets of the KDD Cup are probably the best known such archives for general data mining. An example of a more specific kind of source is the UCR Time Series Classification/Clustering Page.
Here's an article from DataWrangling.com that lists hundreds of datasets.
On Kaggle you can find some competitions and download the associated datasets.
There is a system that scores your solutions in real time and you'll see your place on the "live leaderboard".
It's a good way of studying machine learning techniques because, by choosing a "for knowledge" competition, you can compare your solution with those of other participants and discuss the strengths and weaknesses of various approaches.
Try my blog, Vellum Information, where I've got several annotated bibliographies curating data sets and data sources:
http://velluminformation.com/2014/03/05/big-data-public-databases-an-annotated-bibliography/.
I've got an annotated bibliography of various data sources that are available. I've also got an annotated bibliography for health data here:
http://velluminformation.com/2012/05/19/free-online-public-data-sources-an-annotated-bibliography/.
Obvious disclosure: this is my blog, so there are other technical things on there as well.

IDoc Beginners tutorial [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Where can I find a tutorial for IdocScript for Stellent/Oracle UCM?
Hi All,
Can someone point me to a good beginner's tutorial on IDoc? I basically need to understand the IDoc format.
I tried googling, but was not able to find an introductory tutorial on IDoc.
Pardon me for being a noob at googling.
Thanks.
I just added an answer to a similar question here:
Where can I find the documentation for IdocScript for Stellent/Oracle UCM?
Don't sweat the googling part. There really isn't much around, as IDOC is only used in Oracle UCM (formerly Stellent).
AFAIK, there is nothing specifically targeted at beginners. Apart from the book/pdf, your best bet is to start creating copies of fragments (inside Site Studio Designer) and working out what makes them tick.

What are some ways to have fun with a large amount of data? (i.e., the Twitter, del.icio.us etc. APIs) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Twitter, Google, Amazon, del.icio.us etc. all give you a lot of data to play with, all for free. There's also a lot of textual data available through initiatives like Project Gutenberg. And that, it seems, is just the tip of the iceberg.
I have been wondering how you could use this data for fun. I'm a first year IT student, so I have no knowledge of statistics, machine learning, collaborative filtering etc. My interest in this area was piqued by the book Programming Collective Intelligence by Toby Segaran, and now I want to take a deeper look at what you can do with data. I don't know where to start. Any ideas?
I have also been pondering whether I should go and buy something like Paradigms of Artificial Intelligence Programming. Is it worth the trip across the city?
Try firing books in different styles from Gutenberg through a Markov Chain generator - there's one in Perl here to get you started.
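For anyone who would rather start in Python than Perl, here is a minimal sketch of a word-level Markov chain text generator. It is only an illustration, not a port of the linked script: the function names are invented, the `sample` string is a stand-in for the text of a real Project Gutenberg book, and `order` is the number of preceding words used to pick the next one.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only seen at the very end of the text
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)


# Placeholder text; in practice, read in the full text of one or more books.
sample = "the cat sat on the mat the cat ran"
chain = build_chain(sample, order=1)
print(generate(chain, length=10, seed=0))
```

Feeding in two books in different styles and building one chain from the combined text is what produces the amusing mixed-style output. A larger `order` makes the result read more like the source, a smaller one makes it more random.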
Visualizations, do them, share them.
You can use some of that data to make money (if you're really good!)
http://www.netflixprize.com/ Netflix has made available an anonymized dataset, and are asking for better algorithms to predict customer choices.
If you're familiar with Python, try playing around with NLTK. It has tons of libraries for text mining and even machine learning in general. Try working your way through the NLTK book.
If you want to start off with an easy AI problem, you might try clustering.
http://en.wikipedia.org/wiki/Data_clustering
You could use it to group flickr images together by tag or something cool like that.
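Here is a minimal sketch of that idea, grouping images by how much their tag sets overlap. It uses a deliberately simple greedy "leader" clustering with Jaccard similarity rather than a standard algorithm such as k-means. The photo names, tags, and the 0.3 threshold are all made up, and no Flickr API is involved.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_tags(items, threshold=0.3):
    """Greedy clustering: an item joins the first cluster whose leader
    (its first member) has sufficiently similar tags, else starts a new one."""
    clusters = []  # list of (leader_tags, [member names])
    for name, tags in items.items():
        for leader_tags, members in clusters:
            if jaccard(tags, leader_tags) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((tags, [name]))
    return [members for _, members in clusters]


# Hypothetical photos with hand-written tags.
photos = {
    "img1": {"beach", "sunset", "sea"},
    "img2": {"sea", "beach", "surf"},
    "img3": {"city", "night", "lights"},
    "img4": {"night", "city", "skyline"},
    "img5": {"sunset", "sea"},
}
groups = cluster_by_tags(photos)
print(groups)
```

The result depends on the order in which items are visited and on the threshold, which is the price of keeping the sketch this short. Proper clustering algorithms are less sensitive to both.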
You can make puzzles like hangman games. Or build a mashup, or try Yahoo Pipes to join information.
Predict future stock market trends from the data. Profit!
