YouTube Data API stopping after 1000 results

I have been trying to retrieve a list of all my subscribers for my YouTube channel. I am using a query of the form:
https://content.googleapis.com/youtube/v3/subscriptions?mySubscribers=true&maxResults=50&part=subscriberSnippet&access_token=xxxx
However, after 1000 results (for example, 20 pages at 50 per page, or 50 pages at 20 per page) it stops. The documentation says that myRecentSubscribers retrieves at most 1000 results, but for mySubscribers no limit is mentioned. Is there some unwritten limit?

I just ran into the same issue with the following code:
import os

import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors

scopes = ["https://www.googleapis.com/auth/youtube.readonly"]

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "google_oauth_client_secret.json"

    # Get credentials and create an API client
    flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes)
    credentials = flow.run_console()
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, credentials=credentials)

    next_page_token = ''
    count = 0
    while next_page_token is not None:
        print("Getting next page:", next_page_token, count)
        request = youtube.subscriptions().list(
            part="subscriberSnippet",
            maxResults=50,
            mySubscribers=True,
            pageToken=next_page_token
        )
        response = request.execute()
        next_page_token = response.get('nextPageToken')
        subscribers = response.get('items')
        for subscriber in subscribers:
            count += 1
    print('Total subscribers:', count)

if __name__ == "__main__":
    main()
This is the expected behavior if myRecentSubscribers=True, but the list should continue with mySubscribers=True...
EDIT: Apparently this is a WONT FIX issue, and the documentation will likely eventually be updated to reflect that. (https://issuetracker.google.com/issues/172325507)
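If you want to confirm that you are hitting a cap rather than the real end of your subscriber list, you can compare the total the API reports against what pagination actually returns. A minimal sketch, assuming the authorized youtube client built in the code above:

# The API still reports the full subscriber total in pageInfo,
# even though pagination stops after roughly 1000 items.
response = youtube.subscriptions().list(
    part="subscriberSnippet",
    maxResults=50,
    mySubscribers=True
).execute()
total_reported = response['pageInfo']['totalResults']
print('API reports', total_reported, 'subscribers; pagination returns at most ~1000.')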

Related

How to disconnect the stream (Tweepy)?

I'm trying to update the stream every 15 minutes to change its rules.
As far as I understand, it is impossible to update the filter rules in real time, so I am trying to stop the stream and then start it again.
class MyStream(tweepy.StreamingClient):
    def disconnect(self):
        self.running = False
        print('stop stream')

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)
stream.disconnect()
But it doesn't work; streaming continues.
Can you please tell me how to achieve what I have in mind?
Update:
I tried to add a rule to the stream, then wait 10 seconds and add another one, but that doesn't work either. Can you please tell me what the problem is and how to fix it?
import telebot
import tweepy
import time

bot = telebot.TeleBot()

api_key =
api_key_secret =
bearer_token =
access_token =
access_token_secret =

client = tweepy.Client(bearer_token, api_key, api_key_secret, access_token, access_token_secret)
auth = tweepy.OAuth1UserHandler(api_key, api_key_secret, access_token, access_token_secret)
api = tweepy.API(auth)

class MyStream(tweepy.StreamingClient):
    def on_connect(self):
        print('Connected')

    def on_response(self, response):
        print(response)

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)

rules = ['book', 'tree', 'word']

# create the stream
for rule in rules:
    stream.add_rules(tweepy.StreamRule(rule))

print('Showing the rule')
print(stream.get_rules().data)

stream.filter(tweet_fields=["referenced_tweets"])

# this part of the code no longer works.
print('sleep 10 sec')
time.sleep(10)

# this part is not working either
print('Final Streaming Rules:')
print(stream.get_rules().data)
In Twitter API v2 (Tweepy using the tweepy.Client interface and the StreamingClient object), the stream does not need to disconnect in order to update the rules; you do that by adding rules via StreamingClient.add_rules(). From the docs:
StreamingClient.add_rules() can be used to add rules before using StreamingClient.filter() to connect to and run a filtered stream:
streaming_client.add_rules(tweepy.StreamRule("Tweepy"))
streaming_client.filter()
StreamingClient.get_rules() can be used to retrieve existing rules, and StreamingClient.delete_rules() can be used to delete rules.
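Putting that together: because rules are managed over a REST endpoint rather than the streaming connection itself, you can change them while the stream is running. A minimal sketch, assuming Tweepy 4.x and a valid bearer_token:

import time
import tweepy

class MyStream(tweepy.StreamingClient):
    def on_connect(self):
        print('Connected')

    def on_tweet(self, tweet):
        print(tweet.text)

stream = MyStream(bearer_token=bearer_token, wait_on_rate_limit=True)
stream.add_rules(tweepy.StreamRule('book'))

# threaded=True runs the stream in a background thread,
# so the script continues past this call.
stream.filter(threaded=True)

# Later (e.g. every 15 minutes) swap the rules without disconnecting:
time.sleep(900)
old_rules = stream.get_rules().data or []
if old_rules:
    stream.delete_rules([rule.id for rule in old_rules])
stream.add_rules(tweepy.StreamRule('tree'))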

Storing streamed tweets in a list for further analysis

I am building a data mining app to collect tweets using the Twitter streaming API (via tweepy) and run a suite of NLP algorithms on them. So far, all I have been able to do is write the tweets to an external file. Because the volume of tweets I collect is small (100 at a time) and because of deployment concerns, I would like to collect these tweets into a dictionary or list for further analysis. However, I have failed to do this. The code I have so far is given below:
import tweepy

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__()
        self.num_tweets = 0
        self.tweets = []

    def on_status(self, status):
        # print(status.text)
        self.num_tweets += 1
        self.tweets.append(status.text)
        if self.num_tweets > 100:
            return False

def getstreams(keyword):
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    ACCESS_TOKEN = ''
    ACCESS_SECRET = ''
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    myStreamListener = MyStreamListener()
    myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
    tweet_list = myStream.filter(track=[keyword])
    return tweet_list.tweets

getstreams('Starbucks')
However, when I run this, all I get is:
AttributeError: 'NoneType' object has no attribute 'tweets'
pointing to the line:
return tweet_list.tweets
I'd be grateful if anyone could explain how to overcome this issue and shed light on how to collect n tweets into a list.
You can use the on_data method in your listener class:

def on_data(self, data):
    # data is the raw JSON string; parse it into a Python dict
    tweet = json.loads(data)
    # my_tweet is our list declared globally
    my_tweet.append(tweet)
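As for the AttributeError itself: Stream.filter() returns None, so tweet_list in the question is None and tweet_list.tweets fails. The collected tweets live on the listener object, so return those instead. A minimal sketch of the fix, reusing the question's MyStreamListener:

def getstreams(keyword):
    # ... auth setup exactly as in the question ...
    myStreamListener = MyStreamListener()
    myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
    myStream.filter(track=[keyword])  # blocks until on_status returns False
    return myStreamListener.tweets    # the list populated by on_status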

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 512 more expected)', IncompleteRead

I wanted to write a program to fetch tweets from Twitter and then do sentiment analysis on them. I wrote the following code and got this error even after importing all the necessary libraries. I'm relatively new to data science, so please help me.
I could not understand the reason for this error:
class TwitterClient(object):
    def __init__(self):
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'XXXXXXXXX'
        consumer_secret = 'XXXXXXXXX'
        access_token = 'XXXXXXXXX'
        access_token_secret = 'XXXXXXXXX'
        api = Api(consumer_key, consumer_secret, access_token, access_token_secret)

        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except:
            print("Error: Authentication Failed")

def preprocess(tweet, ascii=True, ignore_rt_char=True, ignore_url=True,
               ignore_mention=True, ignore_hashtag=True, letter_only=True,
               remove_stopwords=True, min_tweet_len=3):
    sword = stopwords.words('english')
    if ascii:  # maybe remove lines with ANY non-ascii character
        for c in tweet:
            if not (0 < ord(c) < 127):
                return ''
    tokens = tweet.lower().split()  # to lower, split
    res = []
    for token in tokens:
        if remove_stopwords and token in sword:  # ignore stopword
            continue
        if ignore_rt_char and token == 'rt':  # ignore 'retweet' symbol
            continue
        if ignore_url and token.startswith('https:'):  # ignore url
            continue
        if ignore_mention and token.startswith('@'):  # ignore mentions
            continue
        if ignore_hashtag and token.startswith('#'):  # ignore hashtags
            continue
        if letter_only:  # ignore digits
            if not token.isalpha():
                continue
        elif token.isdigit():  # otherwise unify digits
            token = '<num>'
        res += token,  # append token
    if min_tweet_len and len(res) < min_tweet_len:  # ignore tweets with fewer than n tokens
        return ''
    else:
        return ' '.join(res)

for line in api.GetStreamSample():
    if 'text' in line and line['lang'] == u'en':  # step 1
        text = line['text'].encode('utf-8').replace('\n', ' ')  # step 2
        p_t = preprocess(text)
Assume all the necessary libraries are imported. The error occurs on line 69:
for line in api.GetStreamSample():
    if 'text' in line and line['lang'] == u'en':  # step 1
        text = line['text'].encode('utf-8').replace('\n', ' ')  # step 2
        p_t = preprocess(text)
I tried searching the internet for the reason for this error but could not find a solution. The error was:
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 512 more expected)', IncompleteRead(0 bytes read, 512 more expected))
I'm using Python 2.7 and requests version 2.14, the latest one.
If you set stream to True when making a request, Requests cannot release the connection back to the pool unless you consume all the data or call Response.close. This can lead to inefficiency with connections. If you find yourself partially reading request bodies (or not reading them at all) while using stream=True, you should make the request within a with statement to ensure it's always closed:

with requests.get('http://httpbin.org/get', stream=True) as r:
    # Do things with the response here.
    ...
I had the same problem but without stream=True. As stone mini said, just apply the with clause to make sure your request is closed before you make a new one:

with requests.request("POST", url_base, json=task, headers=headers) as report:
    print('report: ', report)
Actually, the problem can be with your Django-based application: by default, Django allows only 2.5 MB of request-body data to be held in memory (the DATA_UPLOAD_MAX_MEMORY_SIZE setting).
I was facing the same issue with a Django-based application, and I fixed it by updating the settings.py file of the Django application serving my URLs (endpoints):

DATA_UPLOAD_MAX_MEMORY_SIZE = None

I just added the above variable to my application's settings.py file. You can read more about the setting in the Django documentation.
I'm pretty sure this will work for you.
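One more note: since this exception is raised mid-iteration of a streaming response (the server closed the connection before sending everything it promised), a pragmatic option is to catch it and reconnect. A minimal sketch, not tied to any of the answers above:

import requests

def stream_lines(url):
    # Reconnect whenever the server drops the connection mid-stream.
    while True:
        try:
            with requests.get(url, stream=True) as r:
                for line in r.iter_lines():
                    if line:
                        yield line
            return  # the stream ended cleanly
        except requests.exceptions.ChunkedEncodingError:
            continue  # connection broke; reconnect and resume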

tweepy Streaming API: full text

I am using the tweepy streaming API to get tweets containing a particular hashtag. The problem I am facing is that I am unable to extract the full text of the tweet from the Streaming API: only 140 characters are available, and after that the text gets truncated.
Here is the code:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    if 'RT' in text[0:3]:
        return True
    else:
        return False

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a') as tf:
                tf.write(status.text.encode('utf-8') + '\n\n')
            print(status.text)

    def on_error(self, status):
        print("Error Code : " + status)

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Parse the UTC time
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print "0 of {} requests remaining until {}.".format(limit, reset)
        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print "Sleeping for {}s...".format(delay)
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle it manually.
            return False
    # We have not reached the rate limit
    return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')
myStream.filter(track=['#bitcoin'], async=True)
Does anyone have a solution?
tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.
In that case, you'll want something like print(status.extended_tweet.full_text) to extract the longer text.
There is a Boolean available in the Twitter stream: status.truncated is True when the message contains more than 140 characters. Only then is the extended_tweet object available:
if not status.truncated:
    text = status.text
else:
    text = status.extended_tweet['full_text']
This works only when you are streaming tweets. When you are collecting older tweets using the API method, you can use something like this:
tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
This full_text field contains the text of all tweets, truncated or not.
You have to enable extended tweet mode like so:

s = tweepy.Stream(auth, l, tweet_mode='extended')

Then you can print the extended tweet, but remember that, per the Twitter API, you have to make sure the extended tweet exists, otherwise it'll throw an error:

class listener(StreamListener):
    def on_status(self, status):
        try:
            print(status.extended_tweet['full_text'])
        except AttributeError:
            print(status.text)
        return True

    def on_error(self, status_code):
        if status_code == 420:
            return False

l = listener()

Worked for me.
Building upon @AndyPiper's answer, you can check whether the tweet is there with either a try/except:
def get_tweet_text(tweet):
    try:
        return tweet.extended_tweet['full_text']
    except AttributeError as e:
        return tweet.text
or check against the inner JSON:
def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
        return tweet.extended_tweet['full_text']
    else:
        return tweet.text
Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" doesn't actually work and will throw an error.
In addition to the previous answer: in my case it worked only as status.extended_tweet['full_text'], because status.extended_tweet is nothing but a dictionary. This is what worked for me:

status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)

Implemented from https://github.com/tweepy/tweepy/issues/935 - I needed to change what they suggest, but the idea stays the same.
I use the following function:
def full_text_tweeet(id_):
    status = api.get_status(id_, tweet_mode="extended")
    try:
        return status.retweeted_status.full_text
    except AttributeError:
        return status.full_text
and then call it in my list:

tweets_list = []

# for each tweet pulled, store its id and full text
for tweet in tweets:
    tweet_list = [str(tweet.id), str(full_text_tweeet(tweet.id))]
    tweets_list.append(tweet_list)
Try this; it is the simplest and fastest way:
def on_status(self, status):
    if hasattr(status, "retweeted_status"):  # Check if Retweet
        try:
            print(status.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            print(status.retweeted_status.text)
    else:
        try:
            print(status.extended_tweet["full_text"])
        except AttributeError:
            print(status.text)
See the Twitter documentation on extended tweets for more on how this can be achieved.

How to get content owner info for particular video via YouTube Partner API?

Is there a way to get content owner information (attribution) for a particular video via the YouTube Partner API?
For example, this video: http://www.youtube.com/watch?v=I67cgXr6L6o&feature=c4-overview&list=UUGnjeahCJW1AF34HBmQTJ-Q is attributed to VEVO.
Is there a way to get that info via the API?
It's possible now:
https://developers.google.com/youtube/partner/docs/v1/ownership/
[EDIT]
I've used this sample to create the snippet below and successfully got the assets' content owner.
You will need to set up your Python environment and the client_secrets.json file correctly.
#!/usr/bin/python

import httplib2
import os
import sys
import logging
import requests
import json
import time

from apiclient.discovery import build
from oauth2client.file import Storage
from oauth2client.client import flow_from_clientsecrets
from oauth2client.tools import argparser, run_flow

# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:

%s

with information from the Developers Console
https://console.developers.google.com/

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__), CLIENT_SECRETS_FILE))

YOUTUBE_SCOPES = (
    # This OAuth 2.0 access scope allows for read-only access to the
    # authenticated user's account, but not other types of account access.
    "https://www.googleapis.com/auth/youtube.readonly",
    # This OAuth 2.0 scope grants access to YouTube Content ID API functionality.
    "https://www.googleapis.com/auth/youtubepartner")
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
YOUTUBE_PARTNER_API_SERVICE_NAME = "youtubePartner"
YOUTUBE_PARTNER_API_VERSION = "v1"
KEY = "USE YOUR KEY"

# Authorize the request and store authorization credentials.
def get_authenticated_services(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
        scope=" ".join(YOUTUBE_SCOPES), message=MISSING_CLIENT_SECRETS_MESSAGE)
    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
        http=credentials.authorize(httplib2.Http()))
    youtube_partner = build(YOUTUBE_PARTNER_API_SERVICE_NAME,
        YOUTUBE_PARTNER_API_VERSION, http=credentials.authorize(httplib2.Http()))
    return (youtube, youtube_partner)

def get_content_owner_id(youtube_partner):
    content_owners_list_response = youtube_partner.contentOwners().list(
        fetchMine=True).execute()
    return content_owners_list_response["items"][0]["id"]

def claim_search(youtube_partner, content_owner_id, _videoId):
    video_info = get_video_info(_videoId)
    print '\n---------- CLAIM SEARCH LIST - VIDEO (', video_info['title'], ') ID: (', _videoId, ')'
    claims = youtube_partner.claimSearch().list(
        videoId=_videoId, onBehalfOfContentOwner=content_owner_id).execute()
    print ' --- CLAIMS: ', claims
    if claims['pageInfo']['totalResults'] < 1:
        print " --- DOESN'T HAVE CLAIMS"
        return
    assetId = claims['items'][0]['assetId']
    print ' --- ASSET ID: ', assetId
    ownershipHistory = youtube_partner.ownershipHistory().list(
        assetId=assetId, onBehalfOfContentOwner=content_owner_id).execute()
    print ' --- OWNERSHIP HISTORY: ', ownershipHistory
    contentOwnerId = ownershipHistory['items'][0]['origination']['owner']
    print ' --- CONTENT OWNER ID: ', contentOwnerId
    contentOwners = youtube_partner.contentOwners().get(
        contentOwnerId=contentOwnerId, onBehalfOfContentOwner=content_owner_id).execute()
    print ' --- CONTENT OWNERS: ', contentOwners

def get_video_info(videoId):
    r = requests.get("https://www.googleapis.com/youtube/v3/videos?id=" + videoId +
        "&part=id,snippet,contentDetails,status,statistics,topicDetails,recordingDetails&key=" + KEY)
    content = json.loads(r.content)
    return content['items'][0]['snippet']

if __name__ == "__main__":
    args = argparser.parse_args()
    (youtube, youtube_partner) = get_authenticated_services(args)
    # video IDs, e.g. from "https://www.youtube.com/watch?v=lgSLz5FeXUg"
    videos = ['I67cgXr6L6o', 'lgSLz5FeXUg']
    content_owner_id = get_content_owner_id(youtube_partner)
    for vid in videos:
        claim_search(youtube_partner, content_owner_id, vid)
        time.sleep(0.5)
The (censored) proof: [screenshot of the script's output omitted]
There is no way to check other people's ownership or management rights on a video or channel. YouTube deliberately avoids that.
While ownershipHistory works, I found that GET https://www.googleapis.com/youtube/partner/v1/assets?id=[multiple asset ids] also works when it's called with the fetchOwnership=effective option:
https://developers.google.com/youtube/partner/docs/v1/assets/list
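For reference, a minimal sketch of that call using the youtube_partner client from the sample above (asset_ids is a hypothetical list of asset ID strings), keeping the Python 2 print style of that sample:

# Sketch: fetch effective ownership for several assets in one call.
assets = youtube_partner.assets().list(
    id=','.join(asset_ids),
    fetchOwnership='effective'
).execute()
for asset in assets.get('items', []):
    print asset['id'], asset.get('ownershipEffective')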
