Collecting all or at least a good sample of tweets related to a particular event - twitter

I'm interested in using tweets in a complex networks research project, for which I have to collect a good, unbiased sample of tweets related to a particular event, for example the Boston bombing. I need to collect the tweets related to this event from the event date to the current date.
I tried the Search API and found that I can't retrieve tweets older than a week. I also have a small doubt regarding the Streaming API: does it allow you to get older tweets, or only real-time ones? Also, I don't need tweets from a particular user.
Is there any way I can collect the necessary tweets? If there isn't, can you point me to any other archiving sources that I can use?
Thanks a lot.

The streaming API is for accessing what is happening right now; there is no option to "replay" the stream from some arbitrary point in the past. You can access past tweets from individual users' timelines (up to 3,200 per user), but this would require knowing who had tweeted about your event, and, as you say, search only covers the past week.
If you are prepared to pay for it, you can get tweets since the start of 2010 via DataSift.
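If you do know which users to look at, walking back through a timeline can be sketched roughly as below. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable that you would implement yourself around the timeline endpoint, returning tweets (newest first) whose ID is at most `max_id`, and each tweet is assumed to be a dict with an integer `id`.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, object]


def collect_timeline(
    fetch_page: Callable[[Optional[int]], List[Tweet]],
    limit: int = 3200,
) -> List[Tweet]:
    """Page backwards through one user's timeline.

    fetch_page is a hypothetical wrapper around the timeline endpoint:
    it takes max_id (or None for the newest page) and returns a list of
    tweet dicts, each with an integer "id".
    """
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Ask for tweets strictly older than the oldest one seen so far.
        max_id = min(int(t["id"]) for t in page) - 1
    return collected[:limit]
```

The default `limit` mirrors the 3,200 cap mentioned above; rate limiting and error handling are left out of the sketch.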

Historical data going back to the very first Tweet ever is only available from Gnip.

Related

How to get the greatest Tweets flow?

I am working on a project using the Twitter API, doing data analysis on Twitter data. I need to stream as many tweets as possible to run my algorithm on them. Everything works well, but I am quite disappointed by the number of tweets I can actually collect.
Using the Streaming API, I only get around 4500 tweets per hour in an area like London. The more tweets I can collect, the better my analysis. I read somewhere that someone was collecting more than 100,000 per hour...
Do you think my authentication might be rate limited?
Does the Search API allow us to collect more tweets than the Streaming API?
Thank you.

How do I collect the 1000 most recent tweets of a lot of users (say 80000) from Twitter?

I am trying to collect the 1000 most recent tweets of a lot of users (around 80000) for my research work. I tried using the REST API, but because of the rate limit it is not practical for 80000 users. A lot of research papers say they collected tweets and other information for thousands of users, but I could not figure out how they did it. What is the best way to do the same?
Use the Twitter Streaming API.
You will be able to get real-time tweets from everyone. You can filter it down with search parameters if you want.

Is there any way to "backdate" requests to google server-side analytics?

I have an iOS app which can be used offline. I need to do anonymous page view tracking, so our customers can tell which pages people are most interested in (to drive future investments). So when the user is offline, we save a timestamped page view list, and if the user happens to be online when they use the app, we send these historic records up, and also do real-time tracking.
I'm keeping some summary statistics in my GAE app, so I can report the page views with historic accuracy. However, I'm also feeding these views into google analytics, using some python code I ported from google's server-side samples.
That all works great (except for language tracking, which I may have solved thanks to a separate question here on SO). However, I'd love for google analytics to be able to understand the historical hits in context. Right now, if I connect up after looking at several pages offline, GA thinks I just popped through a bunch of pages over the course of a couple seconds.
There is no documented utm variable for timestamping. The google analytics SDK for iOS (which I'm not using) has this ominous note:
Known Issues
Possible inaccurate timestamps: timestamps are recorded at the time the application dispatches to Google Analytics, so if a user experiences long periods of offline use, the timestamps may not be 100% accurate.
That seems like a bit of an understatement. Wouldn't offline timestamps be 100% inaccurate?
Anyway, the fact that the SDK doesn't handle this right makes me think I'm not going to be able to solve this. But I figured some SO wizard might have an idea...
In fact, the timestamp is "relative" (client-side) information that Analytics uses to compute things like "time on page".
The "absolute" date and time at which a page was viewed is always the time you send the request.
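Your own summary statistics do not have this problem, because they can be keyed on the recorded timestamp rather than on the upload time. A minimal sketch of that idea, assuming each saved record is a (page, Unix timestamp) pair; the function name `views_per_day` and the record format are made up for illustration and are not taken from the question's code:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple


def views_per_day(records: Iterable[Tuple[str, float]]) -> Counter:
    """Count page views per (UTC day, page) using the time the view
    was recorded on the device, not the time it was uploaded."""
    counts: Counter = Counter()
    for page, recorded_at in records:
        day = datetime.fromtimestamp(recorded_at, tz=timezone.utc).date().isoformat()
        counts[(day, page)] += 1
    return counts
```

Views uploaded in one burst still end up in the day on which they actually happened.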

About data mining by using twitter data

I plan to write a thesis about using sentiment information to improve the predictive power of a financial trading model for currencies.
The sentiment data would be tweets containing some keyword, like "EUR.USD", and I would pick out sentiment words to identify the sentiment. It is a simple idea. Then we try to see whether there is any relation between the degree of sentiment and the movement of EUR.USD.
My big concern is the Twitter data. As we all know, Twitter limits access to historical data: you can only go back about 5 days. That is not enough, since our strategy is based on daily sentiment.
I noticed that Google has a fantastic feature, a timeline of Twitter updates: http://www.readwriteweb.com/archives/googles_twitter_timeline_lets_you_explore_the_past.php
But first of all, I am in Switzerland and it seems I don't have this feature; Google identifies my location and may block some features of the US version like this one. Secondly, even if I could see the interactive Google timeline control in Firefox, how could I extract the data from my query and save it? Does Google supply an API for this?
The Google service you mentioned has shut down recently so you won't be able to use it. (http://www.searchenginejournal.com/google-realtime-shuts-down-as-twitter-deal-expires/31007/)
If you need a longer timespan of data to analyze I see the following options:
pay for historical data :) (https://dev.twitter.com/docs/twitter-data-providers)
if you don't want to pay, you need to fetch tweets containing EUR/USD or whatever else you are interested in (you could use the streaming API for this) and store them somehow. Run this service for a while (if possible) and you'll have more than just 5 days of data.
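The second option could look roughly like the sketch below. It is not tied to any particular Twitter client: `tweets` is assumed to be any iterable of tweet dicts (for example, whatever your streaming client yields), the `id`, `text` and `created_at` keys are assumptions about that dict, and `store_matching` and the table layout are made up for illustration.

```python
import json
import sqlite3
from typing import Dict, Iterable, List


def store_matching(
    tweets: Iterable[Dict[str, object]],
    keywords: List[str],
    db_path: str = ":memory:",
) -> sqlite3.Connection:
    """Store every tweet whose text contains one of the keywords.

    tweets is assumed to yield dicts with "id", "text" and (optionally)
    "created_at"; the full dict is kept as JSON for later analysis.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT, raw TEXT)"
    )
    lowered = [k.lower() for k in keywords]
    for tweet in tweets:
        text = str(tweet.get("text", ""))
        if any(k in text.lower() for k in lowered):
            # INSERT OR IGNORE skips tweets that were already stored.
            conn.execute(
                "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
                (tweet["id"], tweet.get("created_at"), text, json.dumps(tweet)),
            )
    conn.commit()
    return conn
```

Pass a file path as `db_path` to keep the data between runs; daily sentiment can then be computed later from the stored rows.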

Is it possible to get the number of tweets with a certain hashtag?

Is it possible to get the number of tweets with a certain hashtag? I have been looking through the Twitter API but haven't found anything. Any help?
Thanks!
You have basically two options to get tweets based on a hashtag (neither of which will give you all tweets from the past): the search api (see here for usage) to get a limited number of tweets from the past, or the streaming api (using a filter to get current tweets as they happen).
The search api has a limit on the number of tweets it returns: usually it only goes a couple of days back and maxes out at around 2500 tweets. This very much depends on the hashtag.
The streaming api can be used to get all tweets of a hashtag (assuming that the hashtag is not so popular as to make you hit the streaming api bandwidth limit). -- But not from the past, only new tweets and only as long as you monitor the stream.
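Whichever of the two you use, the counting itself is straightforward once you have the tweets. A minimal sketch, assuming each tweet is a dict with a `text` key; `count_hashtag` is a made-up helper, not part of any Twitter library:

```python
import re
from typing import Dict, Iterable


def count_hashtag(tweets: Iterable[Dict[str, object]], hashtag: str) -> int:
    """Count tweets whose text contains the hashtag as a whole tag,
    ignoring case ("#python" matches "#Python" but not "#pythonic")."""
    tag = hashtag.lstrip("#")
    pattern = re.compile(
        r"(?<!\w)#" + re.escape(tag) + r"(?!\w)", re.IGNORECASE
    )
    return sum(1 for t in tweets if pattern.search(str(t.get("text", ""))))
```

The result only covers the tweets you actually managed to fetch, so with the limits described above it is a lower bound rather than the true total.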
A bit late for my answer, but I have found this API which may be of use to you:
https://code.google.com/p/otterapi/
And here is a small tutorial to get you started:
http://clarklab.com/posts/finding-the-number-of-tweets-for-a-hashtag-or-twitter-account-mentions-using-the-topsy-api/
Unfortunately, it is not free but it offers a 30-day trial so you could try it out and see if it suits your needs.
