Python twitter crawler for tweets older than one week? - twitter

For academic purposes, I would like to analyze about three months of tweets. However, it seems the official Twitter Search API doesn't provide tweets older than one week.
I've tried writing my own crawler, but for a given search keyword the Twitter page will not show tweets older than about one week.
Is there any trick to get older tweets? Or is my best bet to hit the API once a week for the next three months?

From the Twitter API documentation regarding limitations:
- The Search API is not complete index of all Tweets, but instead an index of recent Tweets.
- At the moment that index includes between 6-9 days of Tweets.
- You cannot use the Search API to find Tweets older than about a week.
So, yes, if you need to collect a certain span of time, it will require multiple queries, as you suggested.
(You should also read this answer: retrieving tweets from specific user older than 7 days)
There are also currently two commercial companies that have access to the Twitter firehose and can provide this data (they are called "licensed re-syndicators"):
- Gnip: offers 30 days of Twitter data
- DataSift: offers up to two years of Twitter data
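
If you go the "query once a week" route, each run will overlap with the previous one, so the results need de-duplicating before they are stored. The following is only a minimal sketch of that bookkeeping, not an official client: `append_new_tweets` and the JSON Lines archive file are hypothetical names, and it assumes each tweet is a dict with an integer "id" key, as in the v1.1 search payloads.

```python
import json
from pathlib import Path


def append_new_tweets(archive_path, tweets):
    """Append tweets not already archived; return the highest tweet id seen.

    `tweets` is an iterable of dicts with an integer "id" key
    (assumed shape, modelled on v1.1 search results).
    """
    path = Path(archive_path)
    seen = set()
    if path.exists():
        # One JSON object per line; collect the ids already stored.
        with path.open(encoding="utf-8") as f:
            seen = {json.loads(line)["id"] for line in f if line.strip()}

    with path.open("a", encoding="utf-8") as f:
        for tweet in tweets:
            if tweet["id"] not in seen:
                f.write(json.dumps(tweet) + "\n")
                seen.add(tweet["id"])

    # The caller can pass this back as `since_id` on the next run.
    return max(seen) if seen else None
```

Running this after every weekly search keeps one growing archive file, and the returned id can be used as `since_id` so the next query skips tweets that were already collected.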

Related

Is there any method to get the like list of a tweet?

I want to get the complete like list of a specific tweet, but the Twitter API only provides an endpoint that retrieves the 100 most recent users who liked it. I also looked for Twitter crawlers on GitHub, but they all work in a user-oriented manner, i.e. they can only get the list of tweets a user has liked, not the list of users who liked a specific tweet.
I also tried to crawl the list using Selenium, but, perhaps due to my limited skill, it didn't work well. I don't want to spend a lot of time studying Selenium and front-end development just to accomplish a simple thing, so is there any open-source code or Twitter API that can do this?
Yes. This has just been announced in the Twitter API v2.
Previously, you were limited to the 100 most recent Likes or Retweets with these endpoints. We heard your feedback that this was too limiting and have updated these endpoints to now return all results. To retrieve a complete list of Likes and Retweets, you can now use pagination.
Use the v2 Likes lookup endpoint: GET /2/tweets/:id/liking_users
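
As a rough illustration of the pagination mentioned in the announcement, the sketch below follows `meta.next_token` until no further page is offered. It is an assumption-laden sketch rather than an official client: the helper names `fetch_liking_users_page` and `iter_liking_users` are made up, the `max_results` and `pagination_token` query parameters and the `data` / `meta.next_token` response fields are assumed to behave as on other paginated v2 endpoints, and rate limiting is not handled.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.twitter.com/2"


def fetch_liking_users_page(tweet_id, bearer_token, pagination_token=None):
    """Fetch one page of liking users and return the parsed JSON dict."""
    params = {"max_results": 100}  # assumed page-size parameter
    if pagination_token:
        params["pagination_token"] = pagination_token
    url = (f"{API_ROOT}/tweets/{tweet_id}/liking_users?"
           + urllib.parse.urlencode(params))
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_liking_users(tweet_id, fetch_page):
    """Yield every liking user, calling fetch_page(tweet_id, token) per page."""
    token = None
    while True:
        page = fetch_page(tweet_id, token)
        yield from page.get("data", [])
        # A missing next_token means the last page has been reached.
        token = page.get("meta", {}).get("next_token")
        if not token:
            break
```

With a real bearer token this could be driven as `iter_liking_users(tweet_id, lambda tid, tok: fetch_liking_users_page(tid, token_value, tok))`, where `token_value` is your own credential. Passing the page fetcher in as an argument keeps the paging loop testable without network access.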

How to get the old timeline or tweet?

I'm a university student in South Korea.
I'm developing an analysis application using Twitter big data with my advising professor, so I'm gathering tweets that contain specific keywords (words related to crime) over a given period. I currently use the Streaming API and the Search API, but I have seen that both return only about one week of tweets.
I need the old data containing crime keywords from 2006 to 2016.
Do you have any ideas?
Sadly you can't get tweets from that time range.
From the documentation:
The Search API is not complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets.
So, you can only get recent tweets from the Search API. Be careful with the data too, because it is about relevance, not completeness. From the same documentation:
Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.
If you really need older tweets, you will have to get them from other sources such as Gnip. Otherwise, you will have to approach your problem differently.
If you have the names (or IDs) of all the users you want information about, you could fetch each user's timeline, which returns up to 3200 tweets per user.
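
The per-user timeline approach usually means paging backwards with `max_id` until the cap is reached or nothing older comes back. Here is a minimal sketch of that loop under stated assumptions: `collect_timeline` is a hypothetical helper, and `fetch_page(max_id, count)` stands in for whatever wrapper you write around the user timeline endpoint, assumed to return a list of tweet dicts (newest first) that each carry an integer "id".

```python
def collect_timeline(fetch_page, page_size=200, limit=3200):
    """Walk a user timeline backwards using max_id paging.

    `fetch_page(max_id, count)` is a hypothetical callable returning a
    list of tweet dicts (newest first), each with an integer "id".
    """
    tweets = []
    max_id = None  # None means "start from the newest tweet"
    while len(tweets) < limit:
        page = fetch_page(max_id, min(page_size, limit - len(tweets)))
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen.
        max_id = min(t["id"] for t in page) - 1
    return tweets[:limit]
```

Subtracting one from the oldest id avoids fetching the same boundary tweet twice on consecutive pages.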

Is there a way to search for old tweets on a specific date using a specific hashtag?

For example, how could I search for tweets using #oldtweets sent on 5-29-16, from 8pm to 9pm?
According to the Twitter Search API documentation, the query you want is not possible: https://dev.twitter.com/rest/public/search
The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.
Beyond the last 7 days, what you want to achieve can only be done through manually searching an account on Twitter.
Another alternative would be to use https://webrecorder.io/
Scroll to the timeframe you want to record or you can attempt to capture the entire feed. Note the auto scrolling option as well.

Twitter Search API for around 3 month data

I have been working for the last 3 days to find a way to get about 3 months of old data from the Twitter API, beyond its limit of about 1 week. Can someone help me find the best solution?
You cannot get tweets from the last 3 months unless you have been storing them in your own database.
The only solution for you would be to pay Twitter GNIP to access historical data.
The Twitter API only provides the most recent tweets for a given search, up to ~3200.
The so-called "firehose" and some datasets are available mostly for research or developer purposes.
There are some services which sell custom datasets, including Twitter itself (which purchased Gnip). The overview by Justin Littman at GWU is rather comprehensive.

Can I use twitter api to search tweets one week before?

I am trying to search for keywords on Twitter through tweepy.
However, it seems that I cannot search for tweets older than one week. The code below is the main search code.
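import tweepy

# `API` is assumed to be an authenticated tweepy.API instance (setup not shown).
for searched_tweets in tweepy.Cursor(API.search,
                                     q="python",
                                     since="2014-02-03",  # dates must be quoted strings
                                     until="2014-02-04",
                                     lang="en").items():
    print(searched_tweets.text)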
I am not sure whether there is a limit, or whether there is a better way to search by time. Thanks for your help!!
:)
Unfortunately, you can't get tweets older than a week with the twitter search API.
Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.
You can get specific tweets older than a week by looking at individual users or using specific post ids (if you have them) but it's not reasonable to index every single tweet ever to be searchable using the API.
If you need a large time range, you can collect them yourself using the streaming API or check out a service that does (see this dev twitter thread for examples).
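
Once you collect tweets yourself, searching by time becomes a local filtering problem rather than an API call. The sketch below shows one way to do that, under stated assumptions: `tweets_in_window` is a hypothetical helper, and each stored tweet is assumed to be a dict whose `created_at` string uses the v1.1 format (for example "Mon Feb 03 20:15:00 +0000 2014").

```python
from datetime import datetime, timezone

# Assumed v1.1 `created_at` layout, e.g. "Mon Feb 03 20:15:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_in_window(tweets, start, end):
    """Return tweets whose created_at falls in the half-open range [start, end)."""
    selected = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        if start <= created < end:
            selected.append(tweet)
    return selected


# Example window: 20:00 to 21:00 UTC on 2014-02-03 (timezone-aware bounds).
window_start = datetime(2014, 2, 3, 20, tzinfo=timezone.utc)
window_end = datetime(2014, 2, 3, 21, tzinfo=timezone.utc)
```

The bounds must be timezone-aware, because the parsed `created_at` values carry a UTC offset and Python refuses to compare aware and naive datetimes.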
