Adding rules to a running stream (Tweepy StreamingClient) - Twitter

I am trying to collect all Tweets of certain users and the replies to those Tweets. Collecting the original Tweets works fine, but collecting the replies does not. As soon as on_tweet is called (when the stream receives a Tweet), I try to add a rule 'in_reply_to_tweet_id: <id of incoming tweet>' to my stream so that it also streams those replies. Yet the code below doesn't work: I checked with get_rules after the stream was closed and no rule had been added. I also tried adding a simple 'OR: keyword' rule, which was also not added, so the ID is not the problem.
Thanks, any help is appreciated!:)
class stream(tweepy.StreamingClient):
    def __init__(self, token):
        tweepy.StreamingClient.__init__(self, token)
        self.raw_tweets = []
        self.raw_replies = []

    # this method is called whenever the stream receives a tweet
    def on_tweet(self, tweet):
        # check whether the new tweet in the stream is an original tweet or a reply to a tweet
        if tweet.conversation_id == tweet.id:
            # add the id to the rules so that replies to the tweet are also streamed
            # this is where the problem is
            id_as_str = str(tweet.id)
            new_rule = 'OR in_reply_to_tweet_id:' + id_as_str
            self.add_rules(add=tweepy.StreamRule(new_rule), dry_run=True)
            self.raw_tweets.append(tweet)
            print('this is an original: ' + tweet.text)
        else:
            self.raw_replies.append(tweet)
            print('this is a reply: ' + tweet.text)
        return self.raw_tweets
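
For reference, a minimal sketch (not the asker's code, and the class name is illustrative) of adding a rule so that it actually persists: in Tweepy, dry_run=True only validates a rule without submitting it, rules are implicitly OR-ed together so a rule should not start with 'OR', and the conversation_id: operator is a documented v2 stream operator for matching replies in a Tweet's conversation.

import tweepy

# Sketch only (assumed Tweepy 4.x); ReplyCollector is a hypothetical name.
class ReplyCollector(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        if tweet.conversation_id == tweet.id:
            # Original Tweet: register a rule so replies in its conversation are streamed too.
            # dry_run is left at its default (False) so the rule is actually added.
            self.add_rules(tweepy.StreamRule(f"conversation_id:{tweet.id}"))
            print("original: " + tweet.text)
        else:
            print("reply: " + tweet.text)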

Related

Twitter replying to tweet twice incorrectly

I wrote a Python script which listens for Twitter mentions and replies with a text. Everything worked well until I decided to change the response text; now, every time a new tweet comes in, the script replies twice, once with the old response text and once with the new text.
Have you come across this before, and how were you able to resolve it?
received = "Hey"
response = client.create_tweet(
text=received,
in_reply_to_tweet_id = tweet.id
)

How to disconnect the stream (Tweepy)?

I'm trying to update the stream every 15 minutes to change its rules.
As far as I understand, it is impossible to update the filter rules in real time, so I am trying to stop the stream and then start it again.
class MyStream(tweepy.StreamingClient):
    def disconnect(self):
        self.running = False
        print('stop stream')

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)
stream.disconnect()
But it doesn't work; the stream keeps running.
Can you please tell me how to accomplish what I have in mind?
Update
I tried adding a rule to the stream, then waiting 10 seconds and adding another one, but it doesn't work. Can you please tell me what the problem is and how to fix it?
import telebot
import tweepy
import time

bot = telebot.TeleBot()

api_key =
api_key_secret =
bearer_token =
access_token =
access_token_secret =

client = tweepy.Client(bearer_token, api_key, api_key_secret, access_token, access_token_secret)
auth = tweepy.OAuth1UserHandler(api_key, api_key_secret, access_token, access_token_secret)
api = tweepy.API(auth)

class MyStream(tweepy.StreamingClient):
    def on_connect(self):
        print('Connected')

    def on_response(self, response):
        print(response)

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)

rules = ['book', 'tree', 'word']
# create the stream
for rule in rules:
    stream.add_rules(tweepy.StreamRule(rule))

print('Showing the rules')
print(stream.get_rules().data)

stream.filter(tweet_fields=["referenced_tweets"])

# this part of the code is never reached
print('sleep 10 sec')
time.sleep(10)

# this part is not reached either
print('Final Streaming Rules:')
print(stream.get_rules().data)
In the Twitter API v2 (Tweepy's tweepy.Client interface and the StreamingClient object), the stream does not need to disconnect in order to update the rules; you do that by adding rules via StreamingClient.add_rules(). From the docs:
StreamingClient.add_rules() can be used to add rules before using StreamingClient.filter() to connect to and run a filtered stream:
streaming_client.add_rules(tweepy.StreamRule("Tweepy"))
streaming_client.filter()
StreamingClient.get_rules() can be used to retrieve existing rules, and StreamingClient.delete_rules() can be used to delete rules.
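As a sketch of the second part of the question (assuming Tweepy 4.x, where filter(threaded=True) runs the connection on a background thread and returns it), rules can be added while the stream is running and take effect without reconnecting:

import time
import tweepy

bearer_token = "YOUR_BEARER_TOKEN"  # placeholder

class MyStream(tweepy.StreamingClient):
    def on_connect(self):
        print('Connected')

    def on_tweet(self, tweet):
        print(tweet.text)

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)
stream.add_rules(tweepy.StreamRule('book'))

# threaded=True makes filter() return (a Thread) instead of blocking,
# so the code below still runs while the stream is connected.
thread = stream.filter(threaded=True)

time.sleep(10)
# Add another rule to the live stream; no reconnect is needed.
stream.add_rules(tweepy.StreamRule('tree'))
print(stream.get_rules().data)

# Shut the stream down when finished; the thread can then be joined.
stream.disconnect()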

list() returns different number of comments from a video

I'm trying to crawl the comments of a given videoId with the YouTube API, but the number of crawled comments is less than the actual number shown for the video.
Do you have any idea why? My code is below.
from googleapiclient.discovery import build
from typing import List

def get_comments(api, video_id: str, fields: str) -> List[List[str]]:
    comments = list()
    response = api.commentThreads().list(part='snippet', fields=fields, videoId=video_id, maxResults=50).execute()
    all_comment_crawled = True
    while all_comment_crawled:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([comment['textOriginal'], comment['likeCount']])
        if 'nextPageToken' in response:
            response = api.commentThreads().list(part='snippet', videoId=video_id, fields=fields, pageToken=response['nextPageToken'], maxResults=50).execute()
        else:
            all_comment_crawled = False
    return comments

api_key = "MY_API_KEY"
api_obj = build('youtube', 'v3', developerKey=api_key)
video_id = 'fgSvGLxanCo'
fields = 'items(snippet(totalReplyCount, topLevelComment(snippet(textOriginal, likeCount)))), nextPageToken'
comments = get_comments(api_obj, video_id, fields)
print(len(comments))  # returns 1,945; the actual count is over 2,000
There is a trap (and another trap that is easy to fall into) when listing the comments of a YouTube video:
The comment count shown on YouTube includes all (unfiltered) comments, replies included, and your algorithm does not retrieve replies. Have a look at CommentThreads: list.
When using commentThreads, at most 5 replies are returned per thread.
If a comment has more than 5 replies, you have to use Comments: list to list them all.
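A minimal sketch of that approach (assuming the same api object built above; get_comments_with_replies is an illustrative name): page through commentThreads().list for the top-level comments, and for any thread that reports replies, page through comments().list with parentId to fetch them all.

def get_comments_with_replies(api, video_id: str):
    comments = []
    request = api.commentThreads().list(part='snippet', videoId=video_id, maxResults=100)
    while request is not None:
        response = request.execute()
        for item in response['items']:
            top = item['snippet']['topLevelComment']
            snippet = top['snippet']
            comments.append([snippet['textOriginal'], snippet['likeCount']])
            # commentThreads embeds at most a few replies, so fetch them separately.
            if item['snippet']['totalReplyCount'] > 0:
                reply_request = api.comments().list(part='snippet', parentId=top['id'], maxResults=100)
                while reply_request is not None:
                    reply_response = reply_request.execute()
                    for reply in reply_response['items']:
                        r = reply['snippet']
                        comments.append([r['textOriginal'], r['likeCount']])
                    reply_request = api.comments().list_next(reply_request, reply_response)
        request = api.commentThreads().list_next(request, response)
    return comments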

tweepy Streaming API : full text

I am using the tweepy streaming API to get the tweets containing a particular hashtag. The problem I am facing is that I am unable to extract the full text of the tweet from the Streaming API: only 140 characters are available, and after that the text gets truncated.
Here is the code:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    if 'RT' in text[0:3]:
        return True
    else:
        return False

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a') as tf:
                tf.write(status.text.encode('utf-8') + '\n\n')
            print(status.text)

    def on_error(self, status):
        print("Error Code : " + status)
def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Parse the UTC time
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print "0 of {} requests remaining until {}.".format(limit, reset)
        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print "Sleeping for {}s...".format(delay)
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False
    # We have not reached the rate limit
    return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')
myStream.filter(track=['#bitcoin'], async=True)
Does anyone have a solution?
tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.
In that case, you'll want something like print(status.extended_tweet.full_text) to extract the longer text.
There is a Boolean available in the Twitter stream: status.truncated is True when the message contains more than 140 characters, and only then is the extended_tweet object available:
if not status.truncated:
    text = status.text
else:
    text = status.extended_tweet['full_text']
This only works when you are streaming tweets. When you are collecting older tweets using the API methods, you can use something like this:
tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
This full_text field contains the text of all tweets, truncated or not.
You have to enable extended tweet mode like so:
s = tweepy.Stream(auth, l, tweet_mode='extended')
Then you can print the extended tweet, but remember that, due to the Twitter API, you have to make sure the extended tweet exists, otherwise it'll throw an error:
class listener(StreamListener):
    def on_status(self, status):
        try:
            # Longer tweets carry their text in extended_tweet
            print(status.extended_tweet['full_text'])
        except AttributeError:
            # Fall back to the normal (possibly truncated) text
            print(status.text)
        return True

    def on_error(self, status_code):
        if status_code == 420:
            return False

l = listener()
Worked for me.
Building upon @AndyPiper's answer, you can check whether the full text is there either with a try/except:
def get_tweet_text(tweet):
    try:
        return tweet.extended_tweet['full_text']
    except AttributeError as e:
        return tweet.text
OR check against the inner JSON:
def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
        return tweet.extended_tweet['full_text']
    else:
        return tweet.text
Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" doesn't actually work and will throw an error.
In addition to the previous answer: in my case it worked only as status.extended_tweet['full_text'], because status.extended_tweet is nothing but a dictionary.
This is what worked for me:
status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)
https://github.com/tweepy/tweepy/issues/935 - implemented from here; I needed to change what they suggest, but the idea stays the same.
I use the following function:
def full_text_tweeet(id_):
    status = api.get_status(id_, tweet_mode="extended")
    try:
        return status.retweeted_status.full_text
    except AttributeError:
        return status.full_text
and then call it in my list:
tweets_list = []
# loop through all tweets pulled
for tweet in tweets:
    # store the id and full text of each tweet
    tweet_list = [str(tweet.id), str(full_text_tweeet(tweet.id))]
    tweets_list.append(tweet_list)
Try this; it is the simplest and fastest way:
def on_status(self, status):
    if hasattr(status, "retweeted_status"):  # Check if Retweet
        try:
            print(status.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            print(status.retweeted_status.text)
    else:
        try:
            print(status.extended_tweet["full_text"])
        except AttributeError:
            print(status.text)
Visit the link; it explains how extended Tweets can be retrieved.

Tweepy user_search api is very slow

After two days of unsuccessful attempts to use the twitter gem, I have decided to use tweepy in Python for this task. (My original attempt was with Ruby and I posted that question here.)
My task is to collect all the actresses who have a verified account on Twitter. I have taken the list of actresses from Wikipedia.
Everything looks fine so far. I have started hitting the Twitter REST API with each name and checking whether it is a verified account or not.
The only problem I have is that the response is very slow: it takes about 12-15 seconds for every request. Am I doing something wrong here, or is that how it is supposed to be?
Below is my code in its entirety :
import tweepy

consumer_key = 'xxx'
consumer_secret = 'xxx'
access_token_key = 'xx-xx'
access_token_secret = 'xxx'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth)

actresses = []
f = open('final', 'r')
for line in f:
    actresses.append(line)
f.close()

print actresses

for actress in actresses:
    print actress
    users = api.search_users(actress)
    for u in users:
        if u.verified == True and u.name == actress:
            print u.name + " === https://twitter.com/" + u.screen_name
Also, is there a better way to extract the verified actresses using that list?
Unfortunately, there is no faster way to do it, given that you only know the actresses' full names and not their screen names. Each request will take a long time, as Twitter needs to return the results of users matching the query (there may be quite a few). Each result needs to be loaded and examined, which can take a while depending on how many were returned.
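One thing that can be tightened up, although it does not speed up individual requests, is the matching loop itself: lines read with for line in f keep their trailing newline, so u.name == actress may never match. A small sketch under that assumption (written for Python 3; the file name 'final' and the credential names are taken from the question):

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
# wait_on_rate_limit makes tweepy sleep through rate-limit windows
# instead of raising, which matters when looping over many names.
api = tweepy.API(auth, wait_on_rate_limit=True)

with open('final', 'r') as f:
    # strip newlines so the name comparison below can succeed
    actresses = [line.strip() for line in f if line.strip()]

for actress in actresses:
    for u in api.search_users(actress):
        if u.verified and u.name == actress:
            print(u.name + " === https://twitter.com/" + u.screen_name)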
