I have a particular keyword that I want to search for, and I want to scrape all tweets containing it from the last 10 years.
I used tweepy, but it can only provide the last 7 days of tweets containing the keyword.
I have access to the Twitter Academic API but don't know how to use it.
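For what it's worth, here is a minimal sketch of how the full-archive search could be paged through, assuming the v2 endpoint `/2/tweets/search/all` and its `query`, `start_time`, `end_time`, `max_results` and `next_token` parameters as I understand them from the v2 documentation. The helper names (`build_params`, `http_get_json`, `search_all`) are my own, not part of any library, and access tiers have changed over time, so check the current docs before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Assumed full-archive search endpoint of the Twitter API v2 (Academic Research access).
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def build_params(query, start_time, end_time, next_token=None, max_results=100):
    """Build the query parameters for one page of full-archive search."""
    params = {
        "query": query,
        "start_time": start_time,  # ISO 8601, e.g. "2013-01-01T00:00:00Z"
        "end_time": end_time,
        "max_results": max_results,
        "tweet.fields": "created_at,author_id",
    }
    if next_token:
        params["next_token"] = next_token
    return params


def http_get_json(params, bearer_token):
    """Send one authenticated GET request and decode the JSON response."""
    url = SEARCH_ALL_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + bearer_token}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_all(query, start_time, end_time, fetch_page):
    """Yield every tweet, following meta.next_token until no token is returned."""
    next_token = None
    while True:
        page = fetch_page(build_params(query, start_time, end_time, next_token))
        yield from page.get("data", [])
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break
```

You would call it with something like `search_all("#keyword", "2013-01-01T00:00:00Z", "2023-01-01T00:00:00Z", lambda p: http_get_json(p, BEARER_TOKEN))`, where `BEARER_TOKEN` is your own credential. Passing the fetch function in keeps the paging logic testable without network access; a real run would also need to sleep between requests to respect the rate limit.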
I'm trying to write some code to get all the tweets posted with certain hashtags, then parse them, and finally analyse them. I believe I've already thought through and solved the last two parts, but I'm having trouble with the first one. I've read the Twitter Search API documentation but haven't yet worked out how to do this. Can anyone help me?
If you want to retrieve recently sent tweets, you should use the search/tweets endpoint of Twitter's REST API and put the hashtag in the q parameter.
If you want to listen for tweets containing the hashtag and receive them in real time, then Twitter's streaming API is what you should use (the statuses/filter endpoint).
Have a look at the documentation on Twitter's website; there's also plenty of information on how to do this all around the web.
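As a rough illustration of putting the hashtag in the q parameter, here is a small sketch that only builds the request URL. It assumes the v1.1 form of the endpoint (`https://api.twitter.com/1.1/search/tweets.json`); the function name `hashtag_search_url` is made up for this example, and the actual request would still have to be authenticated, which is not shown.

```python
from urllib.parse import urlencode

# Assumed v1.1 location of the search/tweets endpoint.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def hashtag_search_url(hashtag, count=100):
    """Return the search URL for a hashtag, with or without a leading '#'."""
    tag = "#" + hashtag.lstrip("#")
    # urlencode percent-encodes '#' as %23 so it is not read as a URL fragment.
    return SEARCH_URL + "?" + urlencode({"q": tag, "count": count})


print(hashtag_search_url("python"))
# https://api.twitter.com/1.1/search/tweets.json?q=%23python&count=100
```

The encoding step is the part that is easy to get wrong: an unescaped `#` would cut the query string off before it ever reaches the server.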
Is it possible to access historic Twitter data?
I want to write a function that accesses tweets which match certain keywords, like the FilterStream in the Twitter streaming API, but can access tweets which have been created historically.
No, it is not.
All you can do is use the search method to get recent matching tweets (there will be few of them if the keywords are unpopular).
Or connect to the streaming API and wait; after a little while you will have built up your own history of tweets.
For academic use, I would like to analyze about three months of tweets. However, it seems the official Twitter search API doesn't provide tweets older than one week.
I've tried to write my own crawler; however, given a search keyword, the Twitter page will not show tweets older than about one week.
Is there any trick to get older tweets? Or is my best bet to hit the API once a week for the next three months?
From the Twitter API documentation regarding limitations:
- The Search API is not a complete index of all Tweets, but instead an index of recent Tweets.
- At the moment that index includes between 6-9 days of Tweets.
- You cannot use the Search API to find Tweets older than about a week.
So, yes, if you need to collect a certain span of time, it will require multiple queries, as you suggested.
(You should also read this answer: retrieving tweets from specific user older than 7 days)
There are also currently two commercial companies that have access to the Twitter firehose and can provide this data (they are called "licensed re-syndicators"):
Gnip - offers 30 days of Twitter data
DataSift - up to two years of Twitter data
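If you go the repeated-query route described above, consecutive runs will overlap, so the results need to be merged without duplicates. A minimal sketch, assuming each tweet is a dict with an integer `id` field as in the v1.1 JSON; the function names are hypothetical, and the value returned by `next_since_id` is meant to be passed as the `since_id` parameter of the next search request:

```python
def merge_batch(archive, batch):
    """Add tweets from one collection run to the archive, keyed by tweet id.

    Tweets already present are left untouched, so overlapping runs are harmless.
    """
    for tweet in batch:
        archive.setdefault(tweet["id"], tweet)
    return archive


def next_since_id(archive):
    """Return the highest tweet id collected so far, or None for an empty archive."""
    return max(archive) if archive else None


archive = {}
merge_batch(archive, [{"id": 101, "text": "first"}, {"id": 102, "text": "second"}])
merge_batch(archive, [{"id": 102, "text": "second"}, {"id": 103, "text": "third"}])
print(len(archive), next_since_id(archive))  # 3 103
```

Running this once a week (or more often for popular keywords) and persisting `archive` between runs would give you the three-month window without gaps, as long as no run is skipped for longer than the index covers.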
I have a video blog for which I would like to track certain statistics, including stats from Google Analytics, Twitter, YouTube, Facebook, etc.
The problem is that the various stats are on different websites, which require different logins, etc. It takes a long time to actually view everything. I am looking for a way to be able to aggregate all of this information in one place.
I have searched quite a bit on Google, Mashable, Delicious, etc and I haven't found any websites that do what I want. Are my searching skills bad, or does this really not exist?
The data in which I am interested appears to be available in readily parsable forms (see below), but I am hesitant to write an app to do this myself, because of an already more than full workload.
Data I want to aggregate:
- Google Analytics -- tracking on my website
  - number of visitors
  - traffic sources
  - use Data Export API -- http://code.google.com/apis/analytics/docs/gdata/gdataDeveloperGuide.html
- Twitter
  - number of followers
  - number of retweets
  - new # messages
  - new direct messages
  - Twitter API -- (sorry, I can only post one hyperlink because I am new)
- Facebook fan page
  - number of fans
  - new posts on wall
  - Facebook API -- (sorry, I can only post one hyperlink because I am new)
- Tumblr
  - number of followers
- Video
  - number of views
  - view location
  - number of comments
  - number of channel subscribers
  - do this for
    - YouTube -- CSV report available at (sorry, I can only post one hyperlink because I am new)
    - MetaCritic
- Feed burner (RSS)
  - number of subscribers
  - CSV report available at (sorry, I can only post one hyperlink because I am new)
- SEO stuff
  - Google PageRank
  - Alexa rankings
So is there an app that does this already, or should I do this myself? I would like a quick and dirty way to do this. I was thinking of something like Yahoo Pipes, but it does not appear to be up to the task. I could probably get it done in Grails, but that might be more trouble than it's worth. Other ideas?
I have a better answer. YQL has community data tables for all the services you listed. You can pull in all the different values through their API.
http://developer.yahoo.com/yql/
You could try creating a Google Spreadsheet and using its external data import tools.
http://docs.google.com/support/bin/answer.py?hl=en&answer=75507
The biggest problem will probably be accessing authenticated APIs.
Presuming that all of the services above provide a statistics API, I would advise you to write it yourself rather than fight an integration war with a bunch of aggregating programs.
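To give an idea of how small the do-it-yourself core can be, here is a rough sketch of an aggregator that calls one fetcher per service and collects the results into a single report. Everything in it is hypothetical: `collect_stats` and the two placeholder fetchers are names I made up, and the real fetchers would have to call each service's own API with its own authentication.

```python
def collect_stats(sources):
    """Call every fetcher and gather the results into one report.

    `sources` maps a service name to a zero-argument function that returns
    a dict of metrics. A failing service is recorded in the report instead
    of aborting the whole run.
    """
    report = {}
    for name, fetch in sources.items():
        try:
            report[name] = fetch()
        except Exception as error:  # one broken login should not hide the other stats
            report[name] = {"error": str(error)}
    return report


# Hypothetical placeholder fetchers; real ones would call each service's API.
def fetch_twitter():
    return {"followers": 0, "retweets": 0}


def fetch_feedburner():
    raise RuntimeError("not implemented yet")


report = collect_stats({"twitter": fetch_twitter, "feedburner": fetch_feedburner})
print(report)
```

Keeping each service behind its own function means you can add them one at a time as your workload allows, and a service that changes its API only breaks its own entry in the report.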
Here's an iPhone app that does at least a bit of this:
http://ego-app.com/
I don't know a single tool that can do this, off the top of my head. But you can chain a few tools together to do this.
1- If you're on Windows, use Website Watcher. It has a macro-recording tool to log in to a web page, a regex-based tool to filter content, and a scripting language that lets you email/export the result. IMO, this will let you extract data from just about any web page, RSS feed, or forum.
2- Then use Dropbox to automatically upload the result files to your Dropbox public folder (because you will need public links to these files).
3- Use Yahoo Pipes to consolidate/aggregate the result files.
I suggest you try Metricly (http://metricly.com/), which natively integrates Facebook and Google Analytics data. It is extensible by nature, and with a little bit of tweaking you can push any metric to it. I enjoy it.
I originally suggested this as an edit to abraham's answer but it was rejected:
Mikael Thuneberg has written a freely available Google script for pulling GA data into Google Docs using the GA API: http://www.automateanalytics.com/2010/04/google-analytics-data-to-google-docs.html
I use it for creating client dashboards all the time. I suspect there may be others for pulling in Twitter/Facebook data etc.
And Google have just released this tool for importing GA data into Google Docs:
http://analytics.blogspot.co.uk/2012/08/automate-google-analytics-reporting.html
Also see SEOTools for Excel, which can pull some Facebook and Twitter data as well as Google Analytics data through the API.
YouTube has a public API http://developers.google.com/youtube/analytics to retrieve reports for your videos and channels.