I am trying to collect all Tweets of certain users and the replies to those Tweets. Collecting the original Tweets works fine, but collecting the replies does not. As soon as on_tweet is called (when the stream receives a Tweet), I try to add a rule 'in_reply_to_tweet_id:<id of incoming Tweet>' to my stream so that it also streams those replies. Yet the code below doesn't work. I checked with get_rules after the stream was closed and no rule had been added. I also tried adding a simple 'OR: keyword' rule, which was also not added, so the ID is not the problem.
Thanks, any help is appreciated!:)
import tweepy

class stream(tweepy.StreamingClient):
    def __init__(self, token):
        tweepy.StreamingClient.__init__(self, token)
        self.raw_tweets = []
        self.raw_replies = []

    # this method is called whenever the stream receives a tweet
    def on_tweet(self, tweet):
        # check whether the new tweet in the stream is an original tweet or a reply
        if tweet.conversation_id == tweet.id:
            # add the id to the rules so that replies to the tweet are also streamed
            # this is where the problem is
            id_as_str = str(tweet.id)
            new_rule = 'OR in_reply_to_tweet_id:' + id_as_str
            self.add_rules(add=tweepy.StreamRule(new_rule), dry_run=True)
            print('this is an original' + tweet.text)
        else:
            print('this is a reply:' + tweet.text)
        return self.raw_tweets
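One possible cause, offered as an assumption rather than a confirmed diagnosis: dry_run=True only asks the API to validate the rule, so nothing is stored, and a rule value that begins with OR is probably not a complete rule on its own. A minimal sketch of building a standalone rule value follows. build_reply_rule is a hypothetical helper name, and the commented call only illustrates how the value might be passed on.

```python
def build_reply_rule(tweet_id):
    """Return a standalone rule value that matches direct replies to tweet_id."""
    # Hypothetical helper: each stream rule is a complete expression by itself,
    # so there is no leading OR here.
    return "in_reply_to_tweet_id:" + str(tweet_id)


rule_value = build_reply_rule(535462600294035456)
print(rule_value)

# Inside on_tweet the value could then be submitted without dry_run, e.g.:
#   self.add_rules(tweepy.StreamRule(rule_value))
```

Calling get_rules afterwards should then show whether the rule was accepted.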
I have Academic Research access to Twitter's API and have been using Tweepy to access it. My problem is that I cannot retrieve the conversation tweets for older tweets.
This is the code that attempts to retrieve the tweets by conversation_id, for a tweet from 2014:
# https://twitter.com/NintendoAmerica/status/535462600294035456
start_time = '2014-11-01T00:00:00Z'
end_time = '2014-12-12T00:00:00Z'
tweets = client.search_all_tweets(query = 'conversation_id:535462600294035456', max_results = 500, start_time=start_time, end_time=end_time)
and the output is:
Response(data=[<Tweet id=535465221679489024 text='#NintendoAmerica #Pokemon [this was a link I had to remove]'>], includes={}, errors=[], meta={'newest_id': '535465221679489024', 'oldest_id': '535465221679489024', 'result_count': 1})
which is only one seemingly random tweet amongst many.
However, when I tried running the same code on a more recent tweet, it retrieved all the tweets. I do not have to specify a start/end time because it's a tweet from the past 30 days.
# https://twitter.com/380kmh/status/1545477360916373504
tweets = client.search_all_tweets(query = 'conversation_id:1545477360916373504', max_results = 500)
the output was complete (shortened, I removed the tweets):
Response(data=[...], meta={'newest_id': '1546465585093087235', 'oldest_id': '1545477768229670912', 'result_count': 18})
I followed Tweepy's documentation for Client.search_all_tweets.
I also tried using Postman to retrieve the tweets, following Twitter's documentation, but the result came back empty:
Query: https://api.twitter.com/2/tweets/search/all?query=conversation_id%3A537923834557771776&start_time=2014-11-01T00:00:00.000Z&end_time=2014-12-18T00:00:00.000Z&tweet.fields=in_reply_to_user_id,text
{
    "meta": {
        "result_count": 0
    }
}
What am I doing wrong?
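To rule out an encoding mistake in the hand-written Postman URL, the same request URL can be rebuilt programmatically. This is a minimal sketch using only the standard library. It builds the URL and sends nothing, and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.twitter.com/2/tweets/search/all"


def build_search_url(conversation_id, start_time, end_time):
    """Build a full-archive search URL for one conversation (no request is sent)."""
    params = {
        "query": "conversation_id:" + str(conversation_id),
        "start_time": start_time,
        "end_time": end_time,
        "tweet.fields": "in_reply_to_user_id,text",
    }
    # urlencode percent-encodes reserved characters such as ':' and ','
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(
    537923834557771776,
    "2014-11-01T00:00:00.000Z",
    "2014-12-18T00:00:00.000Z",
)
print(url)
```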
I am building a data mining app to collect tweets using the Twitter streaming API (via tweepy) and run a suite of NLP algorithms on them. So far all I have been able to do is write the tweets to an external file. Because I will collect only 100 tweets at a time (a pretty small volume), and because of deployment concerns, I want to collect these tweets into a dictionary or list for further analysis. However, I have failed to do this. The code I have so far is given below:
import tweepy

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__()
        self.num_tweets = 0
        self.tweets = []

    def on_status(self, status):
        self.num_tweets += 1
        if self.num_tweets > 100:
            return False

def getstreams(keyword):
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    myStreamListener = MyStreamListener()
    myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
    tweet_list = myStream.filter(track=[keyword])
    return tweet_list.tweets
However when I run this, all I get is:
AttributeError: 'NoneType' object has no attribute 'tweets'
pointing to the line:
return tweet_list.tweets
I'd be grateful if anyone could answer how to overcome this issue and shed insight on how to collect n number of tweets into a list.
You can use the on_data function in your class.
import json

def on_data(self, data):
    # Parse the raw JSON string into a Python dict
    tweet = json.loads(data)
    # my_tweet is our list declared globally
    my_tweet.append(tweet)
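As an alternative to a global list, the tweets can be kept on the listener itself. The sketch below is a plain class so that it runs without tweepy. The assumption is that in the real app it would also inherit from tweepy.StreamListener (tweepy 3.x), where returning False from on_status stops the stream. TweetCollector and limit are hypothetical names.

```python
class TweetCollector:
    """Collects up to `limit` statuses in memory.

    Assumption: in a real app this class would also inherit from
    tweepy.StreamListener so that the stream calls on_status.
    """

    def __init__(self, limit=100):
        self.limit = limit
        self.tweets = []

    def on_status(self, status):
        self.tweets.append(status)
        # Returning False signals that the stream should disconnect.
        if len(self.tweets) >= self.limit:
            return False
        return True


# Stand-in for a live stream: feed fake statuses until the collector says stop.
collector = TweetCollector(limit=3)
for fake_status in ["a", "b", "c", "d"]:
    if collector.on_status(fake_status) is False:
        break
print(collector.tweets)
```

Because filter() returns nothing, the list would be read from the listener instance (for example myStreamListener.tweets) once the stream has stopped, rather than from the return value of filter().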
I am using the tweepy streaming API to get tweets containing a particular hashtag. The problem is that I cannot extract the full text of a tweet from the Streaming API: only 140 characters are available, and the rest is truncated.
Here is the code:
import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    # Treat the status as a retweet if 'RT' appears in its first three characters
    if 'RT' in text[0:3]:
        return True
    return False

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Only store tweets that are not retweets
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a') as tf:
                tf.write(status.text.encode('utf-8') + '\n\n')

    def on_error(self, status):
        # status is an integer HTTP status code, so convert it before concatenating
        print("Error Code : " + str(status))
from datetime import datetime
from time import sleep

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the reset timestamp (epoch seconds) to a datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))
        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False
    # We have not reached the rate limit
    return True
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')
myStream.filter(track=['#bitcoin'], async=True)
Does anyone have a solution?
tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.
In that case, you'll want something like print(status.extended_tweet.full_text) to extract the longer text.
There is a Boolean available in the Twitter stream: 'status.truncated' is True when the message contains more than 140 characters. Only then is the 'extended_tweet' object available:
if not status.truncated:
    text = status.text
else:
    text = status.extended_tweet['full_text']
This works only when you are streaming tweets. When you are collecting older tweets using the API method you can use something like this:
tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
This full_text field contains the text of all tweets, truncated or not.
You have to enable extended tweet mode like so:
s = tweepy.Stream(auth, l, tweet_mode='extended')
Then you can print the extended tweet, but remember that, because of how the Twitter API works, you have to make sure the extended tweet exists; otherwise it'll throw an error:
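class listener(StreamListener):
    def on_status(self, status):
        try:
            # extended_tweet is only present on truncated tweets
            print(status.extended_tweet['full_text'])
        except Exception as e:
            # no extended_tweet: fall back to the short text
            print(status.text)
        return True

    def on_error(self, status_code):
        # 420 means the client is being rate limited; returning False disconnects
        if status_code == 420:
            return False

l = listener()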
Worked for me.
Building upon @AndyPiper's answer, you can check whether the extended tweet is there with either a try/except:
def get_tweet_text(tweet):
    try:
        return tweet.extended_tweet['full_text']
    except AttributeError as e:
        return tweet.text
Or check against the inner JSON:
def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
        return tweet.extended_tweet['full_text']
    return tweet.text
Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" doesn't actually work and will throw an error.
In addition to the previous answer: in my case it worked only as status.extended_tweet['full_text'], because the status.extended_tweet is nothing but a dictionary.
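The dictionary point can be checked without a live stream. In this rough sketch, SimpleNamespace objects stand in for the Status objects a stream would deliver (an assumption for illustration only), and getattr with a default replaces the try/except.

```python
from types import SimpleNamespace


def get_tweet_text(status):
    """Return the full text when extended_tweet is present, else the short text."""
    extended = getattr(status, "extended_tweet", None)
    if extended is not None:
        # extended_tweet is a plain dict, so index it instead of using dot access
        return extended["full_text"]
    return status.text


# Hypothetical stand-ins for streamed Status objects.
short = SimpleNamespace(text="short tweet", truncated=False)
long_ = SimpleNamespace(
    text="a long tweet that was cut",
    truncated=True,
    extended_tweet={"full_text": "a long tweet that was cut off in the text field"},
)

print(get_tweet_text(short))
print(get_tweet_text(long_))
```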
this is what worked for me:
status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)
https://github.com/tweepy/tweepy/issues/935 - implemented from here, needed to change what they suggest but the idea stays the same
I use the following function:
def full_text_tweeet(id_):
    status = api.get_status(id_, tweet_mode="extended")
    try:
        # for a retweet, the original tweet holds the untruncated text
        return status.retweeted_status.full_text
    except AttributeError:
        return status.full_text
and then call it when building my list:
tweets_list = []
# loop through all tweets pulled
for tweet in tweets:
    # store the id and the full text of each tweet
    tweet_list = [str(tweet.id), str(full_text_tweeet(tweet.id))]
    tweets_list.append(tweet_list)
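Try this; it is the simplest and fastest way.
def on_status(self, status):
    if hasattr(status, "retweeted_status"):  # Check if Retweet
        try:
            print(status.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            print(status.retweeted_status.text)
    else:
        try:
            print(status.extended_tweet["full_text"])
        except AttributeError:
            print(status.text)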
Visit the link; it explains how extended tweets can be retrieved.
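The same idea can be applied to the raw JSON payload (for example the dict produced by json.loads in on_data) instead of a Status object. This is a minimal sketch under that assumption. full_text_from_json is a hypothetical helper name, and the sample payloads are made up for illustration.

```python
def full_text_from_json(payload):
    """Pick the longest available text from a raw Tweet payload (a dict)."""
    # For a retweet, the original Tweet carries the untruncated text.
    source = payload.get("retweeted_status", payload)
    extended = source.get("extended_tweet")
    if extended:
        return extended["full_text"]
    # Payloads fetched with tweet_mode='extended' expose full_text directly.
    return source.get("full_text", source.get("text"))


# Made-up payloads covering the three cases.
plain = {"text": "hello"}
truncated = {"text": "hello wor", "extended_tweet": {"full_text": "hello world"}}
retweet = {
    "text": "RT @someone: hello wor",
    "retweeted_status": {
        "text": "hello wor",
        "extended_tweet": {"full_text": "hello world"},
    },
}

for sample in (plain, truncated, retweet):
    print(full_text_from_json(sample))
```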
I am using twitter_timeline to get the user details.
It provides a set of tweets, including RTs. I am only considering the retweets among all the tweets.
Suppose I retweeted a tweet, which I can get using:
$tweets3 = $connection->get("https://api.twitter.com/1.1/statuses/user_timeline.json?trim_user=true&include_rts=true");
foreach ($tweets3 as $item)
{
    $rt_reach = $item->retweet_count; //This is available
    $text = $item->text; //This is available
    $follower_count = $item->user->followers_count; //This is not available
    echo "User location {$item->user->location}"; //This is not available
    echo $follower_count = $item->user->screen_name; //This is not available
}
Link to document: https://dev.twitter.com/docs/api/1/get/statuses/user_timeline
Why does it not provide the last three values in the code above?
Since you're using "trim_user=true", Twitter strips the user record down to just the user ID.
Please check the trim_user parameter here.
The "trim_user" param is used by applications to keep the tweet data from getting bloated; it should be omitted if the app needs the full user record, which seems to be the case for you.
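The effect can be illustrated on decoded JSON. The sketch below is in Python rather than PHP and uses made-up payloads: with trim_user the user object is assumed to carry only the ID fields, so lookups of other user fields come back empty. user_field is a hypothetical helper name.

```python
def user_field(tweet, field):
    """Return a field of the tweet's user record, or None if it was trimmed away."""
    return tweet.get("user", {}).get(field)


# Made-up payloads: the first mimics trim_user=true, the second a full user record.
trimmed = {"text": "hi", "retweet_count": 3, "user": {"id": 42, "id_str": "42"}}
full = {
    "text": "hi",
    "retweet_count": 3,
    "user": {"id": 42, "id_str": "42", "followers_count": 120, "screen_name": "example"},
}

print(user_field(trimmed, "followers_count"))
print(user_field(full, "followers_count"))
```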