It would help if I could do a search log analysis for my research. Is it possible to use a search API (Google, Yahoo, Bing) to create a log of web search queries over a specified time span, or is it available on request?
The only thing I know of is the old AOL search logs, which they released a while back. You can find them on some of the torrent sites. For news about it, read this.
My requirement is to search for a file across my entire SharePoint Online (SPO) tenant using the Graph APIs. My query, which I use in MS Graph Explorer, looks something like this:
https://'my domain'.sharepoint.com/_api/search/query?Querytext='res1a2b3c4d5e'
Basically, the above query searches for all documents whose title/name is 'res1a2b3c4d5e'. This works fine when I search for any existing document, but if I search for a document that was created/uploaded just before making the call, I don't get the result.
If I search for the same document after a couple of minutes, the request succeeds. On my customer's site, however, which has millions of documents, it takes around 20+ hours before I can do a successful search.
So does the Graph API work off some SPO cache?
How can I search for a newly added file without having to wait 20+ hours?
Thanks in advance.
Search results become available only after the content has been crawled and added to the search index. While you can control the crawl schedule in SharePoint on-premises, you cannot do so in SharePoint Online: crawl frequency there is managed by Microsoft.
More Details
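Since new items only appear after the next crawl, the practical workaround is to poll until the file shows up in the index. Below is a minimal sketch against the Microsoft Graph search endpoint (POST /v1.0/search/query); the access token is a placeholder you would acquire via Azure AD, and the retry interval and attempt count are arbitrary assumptions, not a documented crawl schedule:

    import time
    import requests

    GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"
    ACCESS_TOKEN = "<your-azure-ad-access-token>"  # placeholder

    def search_file(query_text):
        # One Graph search over driveItems (i.e. SPO documents).
        body = {"requests": [{
            "entityTypes": ["driveItem"],
            "query": {"queryString": query_text},
        }]}
        resp = requests.post(
            GRAPH_SEARCH_URL, json=body,
            headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        )
        resp.raise_for_status()
        container = resp.json()["value"][0]["hitsContainers"][0]
        return container.get("hits", [])

    # Poll until the index catches up.
    for attempt in range(10):
        hits = search_file("res1a2b3c4d5e")
        if hits:
            print("Found:", hits[0]["resource"].get("name"))
            break
        time.sleep(60)
    else:
        print("Not indexed yet; the crawl schedule is managed by Microsoft.")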
I am currently developing an application in Rails which needs to check whether a website is listed in Google, Bing, Yahoo, Yelp, and Yellow Pages. From my research, the best approach is to search for site:domain.com on Google and Bing and look at the results, and to check the Yahoo directory for the domain.
Is there any other way to do it? I mean some code snippet to check on the domain's home page, or using their API, or something like that. Also, how do I check Yelp and Yellow Pages?
You can use mechanize and write web-style drivers
Google: do a search on your domain with this as the search term:
site:checkmeout360.com
https://www.google.com/search?q=site%3A<SITE_NAME>.com
See how Yelp, Yahoo, Bing, and Yellow Pages do their indexing, then use mechanize to automate the searching process for you: do the search as above with Google, then write assertions that check whether what you are looking for appears in the search results.
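Since this is a Rails question, the suggestion above is the mechanize gem, but the idea is language-agnostic; here is the same check sketched in Python with requests instead of mechanize. It scrapes the HTML results page, so it is fragile by design, and (as the next answer points out) automated queries are against Google's Terms of Service:

    import requests

    def indexed_on_google(domain):
        # Rough check: does a site: query mention the domain in its results?
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": f"site:{domain}"},
            headers={"User-Agent": "Mozilla/5.0"},  # default UA is often blocked
        )
        resp.raise_for_status()
        # The "assert" step: look for the domain in the result markup.
        return domain in resp.text

    print(indexed_on_google("checkmeout360.com"))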
Search engines don't appreciate automated queries that are sent their way.
Here is what Google has to say about it:
Google's Terms of Service do not allow the sending of automated queries of any sort to our system without express permission in advance from Google. Sending automated queries consumes resources and includes using any software (such as WebPosition Gold) to send automated queries to Google to determine how a website or webpage ranks in Google search results for various queries. In addition to rank checking, other types of automated access to Google without permission are also a violation of our Webmaster Guidelines and Terms of Service.
I'd like to make a tool which accesses a search engine programmatically.
I've been enjoying using YQL recently and thought it might be useful since it can dig data out of HTML pages.
But I tried it with Google, Bing, and Yahoo search and they all seem to block YQL.
I wonder if there are some lesser-known web search sites that might work with YQL.
Or actually if there's still any search engine which offers an API that would be even better.
(In fact I'm only searching linguistics.stackexchange.com because the Stack Exchange APIs don't provide a way to search by text that I can find.)
Most search engine sites will block access from screen scrapers and other agents. YQL is designed to respect the robots.txt file, so on many sites like this it won't work.
Instead, I suggest moving a step above HTML screen scraping and using a published search API.
In YQL for example, there is a table which provides access to the Bing search results:
select * from microsoft.bing where query="soccer" and source in ("web","image")
You could also look at the Yahoo! BOSS API or using the Bing Search API directly.
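For reference, a YQL query like the one above was issued over YQL's public REST endpoint. A minimal sketch follows, assuming the microsoft.bing table lives in the community tables loaded via the env parameter; the service has since been retired, so treat this as illustrative only:

    import requests

    YQL_URL = "https://query.yahooapis.com/v1/public/yql"
    params = {
        "q": 'select * from microsoft.bing where query="soccer" and source in ("web","image")',
        "format": "json",
        # Loads the community data tables, where microsoft.bing was defined.
        "env": "store://datatables.org/alltableswithkeys",
    }

    resp = requests.get(YQL_URL, params=params)
    resp.raise_for_status()
    print(resp.json()["query"]["results"])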
For a normal search engine, I understand that it regularly travels across the internet to gather web page information, and sometimes web pages voluntarily submit their latest updates to the engines. But how about BitTorrent search engines? Torrents cannot simply be found by viewing web pages, so how do they work? Do users submit them?
A publisher submits their torrent to a tracker, and then distributes a link to the torrent file on that tracker. Users in turn use that file to connect to the specified tracker and download the content; the tracker gives them a list of peers who are sharing it. The torrent search sites just list which trackers are available and which files can be found on which trackers, as submitted by publishers.
However, I think this question may be better suited to Super User than to Stack Overflow...
No, users do not submit torrents. As we did with our torrent search site http://tornado.li/, we created various robots that scan all of the torrent sites we have added for new torrents and insert them into the database. The whole process is fully automated; only in this way is it possible to offer a good choice of torrents.
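To make the robot side concrete: after a crawler finds a new torrent, it typically downloads the .torrent file and indexes its metadata, which is bencoded. A minimal sketch in Python with a hand-rolled bencode decoder (the file path is a placeholder):

    def bdecode(data, i=0):
        # Decode one bencoded value starting at offset i; return (value, next_i).
        c = data[i:i + 1]
        if c == b"i":                          # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":                          # list: l<items>e
            i, items = i + 1, []
            while data[i:i + 1] != b"e":
                item, i = bdecode(data, i)
                items.append(item)
            return items, i + 1
        if c == b"d":                          # dict: d<key><value>...e
            i, d = i + 1, {}
            while data[i:i + 1] != b"e":
                key, i = bdecode(data, i)
                d[key], i = bdecode(data, i)
            return d, i + 1
        colon = data.index(b":", i)            # byte string: <length>:<bytes>
        length, start = int(data[i:colon]), colon + 1
        return data[start:start + length], start + length

    with open("example.torrent", "rb") as f:   # placeholder path
        meta, _ = bdecode(f.read())

    print("tracker:", meta.get(b"announce", b"(trackerless)").decode())
    print("name:", meta[b"info"][b"name"].decode())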
Is there any service from which we can download tweets?
UPDATE!!!
Googling for some time gave me these results:
a.) http://snap.stanford.edu/data/twitter7.html
b.) http://140kit.com/datasets
Yes, there is. It's called the Twitter API.
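For example, a single search call against the v1.1 REST endpoint looks like this; a minimal sketch, assuming app-only auth with a bearer token (placeholder) and an arbitrary example query:

    import requests

    BEARER_TOKEN = "<your-bearer-token>"  # placeholder, issued by Twitter

    resp = requests.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        params={"q": "example query", "count": 100},
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    )
    resp.raise_for_status()
    for status in resp.json()["statuses"]:
        print(status["user"]["screen_name"], status["text"])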
As we have access to only limited tweets via the Twitter API, we should make use of third-party resellers: Topsy for historical data only, Gnip for streaming data only, or DataSift for both streaming and historical data.
You might also want to check the following sites:
http://www.infochimps.com/collections/twitter-census
http://www.tweetarchivist.com/
The Twitter API provides only partial results: it gives you the last 100, or at most around 500, tweets for every search. If you need to keep tweets long term, the Twitter API shows its limits.
I had the same need as you apparently have, and I developed a tool that queries the Twitter API periodically and stores the search results in a WordPress database.
I called the tool twittcorder, and you can find a live demo at twittcorder.com.
I hope this helps.
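The pattern behind such a tool is straightforward: poll the search endpoint with since_id so each pass only fetches new tweets, and append them to local storage. A sketch of that loop, using SQLite in place of the WordPress database (the token and search term are placeholders):

    import sqlite3
    import time
    import requests

    BEARER_TOKEN = "<your-bearer-token>"   # placeholder
    QUERY = "<your search term>"           # placeholder

    db = sqlite3.connect("tweets.db")
    db.execute("CREATE TABLE IF NOT EXISTS tweets"
               " (id INTEGER PRIMARY KEY, user TEXT, text TEXT)")

    since_id = 0
    while True:
        params = {"q": QUERY, "count": 100}
        if since_id:
            params["since_id"] = since_id  # only tweets newer than last seen
        resp = requests.get(
            "https://api.twitter.com/1.1/search/tweets.json",
            params=params,
            headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        )
        resp.raise_for_status()
        for status in resp.json()["statuses"]:
            db.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?)",
                       (status["id"], status["user"]["screen_name"],
                        status["text"]))
            since_id = max(since_id, status["id"])
        db.commit()
        time.sleep(300)  # poll every 5 minutes to stay under rate limits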
These other data sources are probably shared in violation of the Twitter TOS. I wouldn't want to invest my time and effort building something on datasets that are non-repeatable. The Twitter Streaming API allows you to collect a sample of tweets.
There's also Gnip: http://gnip.com/.
Sysomos is there for complete data analysis, including Twitter, Facebook, and various boards and forums.