twython : get followers list - twitter

Using Twython, I am trying to retrieve the list of all followers of a particular ID which has more than 40k followers, but I am running into the error below:
"Twitter API returned a 429 (Too Many Requests), rate limit exceeded"
How can I overcome this issue? Below is the snippet; I am printing the user name and time zone information.
next_cursor = -1
while next_cursor:
    # twitter is an authenticated twython.Twython instance
    search = twitter.get_followers_list(screen_name='ndtvgadgets', cursor=next_cursor)
    for result in search['users']:
        time_zone = result['time_zone'] if result['time_zone'] is not None else "N/A"
        print result["name"].encode('utf-8') + ' ' + time_zone.encode('utf-8')
    next_cursor = search["next_cursor"]

Change the search line to:
search = twitter.get_followers_list(screen_name='ndtvgadgets',count=200,cursor=next_cursor)
Then import the time module and insert time.sleep(60) between each API call.
It'll take ages for a user with 41K followers (around three and a half hours for the ndtvgadgets account), but it should work. With the count increased to 200 (the maximum), you're effectively requesting 200 results every minute. If there are other API calls in your script in addition to twitter.get_followers_list, you might want to pad the sleep time a little or insert a sleep call after each one.
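Putting both changes together, the loop might look like this (a sketch, assuming twitter is the same authenticated twython.Twython instance as in the question; the 60-second sleep matches the followers/list limit of 15 calls per 15-minute window):

import time

next_cursor = -1
while next_cursor:
    # count=200 is the maximum page size for followers/list
    search = twitter.get_followers_list(screen_name='ndtvgadgets',
                                        count=200, cursor=next_cursor)
    for result in search['users']:
        time_zone = result['time_zone'] if result['time_zone'] is not None else "N/A"
        print result["name"].encode('utf-8') + ' ' + time_zone.encode('utf-8')
    next_cursor = search["next_cursor"]
    time.sleep(60)  # stay under the rate limit between calls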

Related

BioPython Entrez article limit

I've been using this basic article function, which returns the PubMed IDs matching a search string:
from Bio import Entrez, __version__

print('Biopython version : ', __version__)

def article_machine(t):
    Entrez.email = 'email'
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            retmax='100000',
                            retmode='xml',
                            term=t)
    return list(Entrez.read(handle)["IdList"])

print(len(article_machine('T cell')))
I've noticed that there is now a limit on the number of articles I receive, and it is not the one I set in retmax: I get 9,999 PMIDs for keywords that used to return around 100k PMIDs ('T cell', for example).
I know it's not a bug in the package itself but a change on NCBI's side. Has anyone managed to solve it?
From The E-utilities In-Depth: Parameters, Syntax and More:
retmax
Total number of UIDs from the retrieved set to be shown in the XML output (default=20). By default, ESearch only includes the first 20 UIDs retrieved in the XML output. If usehistory is set to 'y', the remainder of the retrieved set will be stored on the History server; otherwise these UIDs are lost. Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 10,000 records.
To retrieve more than 10,000 UIDs from databases other than PubMed, submit multiple esearch requests while incrementing the value of retstart (see Application 3). For PubMed, ESearch can only retrieve the first 10,000 records matching the query. To obtain more than 10,000 PubMed records, consider using EDirect, which contains additional logic to batch PubMed search results.
Unfortunately, the code I devised based on the information above doesn't work:
from Bio import Entrez, __version__

print('Biopython version : ', __version__)

def article_machine(t):
    all_res = []
    Entrez.email = 'email'
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            rettype='count',
                            term=t)
    number = int(Entrez.read(handle)['Count'])
    print(number)
    retstart = 0
    while retstart < number:
        retstart += 1000
        print('\n retstart now : ', retstart, ' out of : ', number)
        Entrez.email = 'email'
        handle = Entrez.esearch(db='pubmed',
                                sort='relevance',
                                rettype='xml',
                                retstart=retstart,
                                retmax=str(retstart),
                                term=t)
        all_res.extend(list(Entrez.read(handle)["IdList"]))
    return all_res

print(article_machine('T cell'))
Changing while retstart < number: to while retstart < 5000: makes the code work, but as soon as retstart exceeds 9,998 (that is, when using the original while loop needed to access all the results), I get the following error:
RuntimeError: Search Backend failed: Exception:
'retstart' cannot be larger than 9998. For PubMed,
ESearch can only retrieve the first 9,999 records
matching the query.
To obtain more than 9,999 PubMed records, consider
using EDirect that contains additional logic
to batch PubMed search results automatically
so that an arbitrary number can be retrieved.
For details see https://www.ncbi.nlm.nih.gov/books/NBK25499/
The error message points to https://www.ncbi.nlm.nih.gov/books/NBK25499/, but the relevant page is actually https://www.ncbi.nlm.nih.gov/books/NBK179288/?report=reader.
Have a look at the NCBI APIs to see if there is something that could solve your problem from a Python interface (I am not an expert on this, sorry): https://www.ncbi.nlm.nih.gov/pmc/tools/developers/
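Since ESearch simply refuses to page past the first 10,000 PubMed records, one workaround (my own sketch, not something the quoted docs guarantee; they recommend EDirect) is to split the query into publication-date slices so that each slice stays under the cap, using esearch's datetype/mindate/maxdate parameters:

from Bio import Entrez

Entrez.email = 'email'

def pmids_in_range(term, start, end):
    # mindate/maxdate with datetype='pdat' restrict the search to a
    # publication-date window; each window must return < 10,000 hits
    handle = Entrez.esearch(db='pubmed', term=term, retmax='10000',
                            datetype='pdat', mindate=start, maxdate=end)
    return list(Entrez.read(handle)["IdList"])

all_res = []
for year in range(1970, 2023):
    all_res.extend(pmids_in_range('T cell',
                                  '%d/01/01' % year,
                                  '%d/12/31' % year))

For high-volume terms a single year can still exceed 10,000 hits, in which case the windows need to be narrowed further (per month, for example).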

Search results have a couple of more "nextPageToken", but no more items

I call the YouTube Search API and then iterate over the results, page by page.
At some point, there are still more pages to iterate, but the "items" array is empty.
I had to add a condition to break the iteration loop.
if (searchResults.items.length == 0 || !('nextPageToken' in searchResults)) {
    break;
}
Is that a known issue?
I saw this post but I recall it wasn't like that a few weeks ago.
I used to call the YT search and get ~2000 results per keyword.
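For reference, a defensive paging loop might look like this (a sketch using google-api-python-client; the API key and query are placeholders):

from googleapiclient.discovery import build

youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')  # placeholder key

videos, page_token = [], None
while True:
    response = youtube.search().list(part='snippet', q='some keyword',
                                     maxResults=50, pageToken=page_token).execute()
    videos.extend(response.get('items', []))
    page_token = response.get('nextPageToken')
    # guard against the behaviour described above: a nextPageToken can
    # be present even though 'items' came back empty
    if not page_token or not response.get('items'):
        break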

Tweepy not returning all tweets?

I am simply trying to get n tweets for a given query. The problem is, tweepy keeps returning fewer than n tweets when I do this via the Cursor method. I think this has something to do with rate limiting, though I initialized the API to wait on the rate limit and tell me when that happens.
Here's my code.
import tweepy

# Initialize API
ckey = "xxx"
csecret = "xxx"
atoken = "xxx"
asecret = "xxx"

auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

max_tweets = 1000
query = 'tweepy'
l = []
for tweet in tweepy.Cursor(api.search, q=query).items(max_tweets):
    l.append(tweet.text)
print(len(l))
And it always happens that l has fewer tweets than max_tweets.
It's due to the query: try a different query instead of 'tweepy'; there may not be enough matching tweets in the last seven days. I just tried a different query and got 1,000 tweets as requested.
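To check whether query volume is the problem, the same loop can be re-run with a high-volume term (a sketch reusing the api object from the question; 'python' is an arbitrary example query):

# same setup as in the question; only the query changes
max_tweets = 1000
l = [tweet.text for tweet in tweepy.Cursor(api.search, q='python').items(max_tweets)]
print(len(l))  # a high-volume query should reach 1000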

Microsoft Graph "messages" delta request truncates too many results with date filter

I think I've found a bug with the date filtering on the delta API.
I'm finding on one of the email accounts I'm working with using Office 365 Graph API that the "messages" graph API delta request is returning a different number of items than are actually in a folder for the expected time range. There are 150,000 items covering 10 years in the folder but delta only returns the last 5,000-ish items covering the last 60 or so days.
Paging Works Fine
When querying the graph API for the folder "Inbox", it has 154,045 total items and 57,456 unread items.
IUserMailFoldersCollectionPage foldersPage =
    await client.Users[mailboxid].MailFolders.Request().GetAsync();
I can skip over 10,000, 50,000 or more messages using paging.
model.messages = await client.Users[mailboxid].MailFolders[folderid].Messages.Request()
    .Top(top).Skip(skip).GetAsync();
Delta with Date Filter doesn't work
But when looping with nextToken and deltaTokens, the deltaToken appears after 5000 or so email messages. Basically it seems like it's only returning results for the last couple months even though the filter is saying find messages for the last 20 years.
Here is an example of how we generate the delta request. The time is hardcoded here, but in reality it is a variable.
var sFilter = $"receivedDateTime ge {DateTimeOffset.UtcNow.AddYears(-20).ToString("yyyy-MM-dd")}";
model.messages = await client.Users[mailboxid].MailFolders[folderid].Messages.Delta().Request()
    .Header("Prefer", "odata.maxpagesize=" + maxpagesize)
    .Filter(sFilter)
    .OrderBy("receivedDateTime desc")
    .GetAsync();
And then on each paging operation I do the following. "nexttoken" is either the next or delta link depending on what came back from the first request.
model.messages = new MessageDeltaCollectionPage();
model.messages.InitializeNextPageRequest(client, nexttoken);
model.messages = await model.messages.NextPageRequest
    .Header("Prefer", "odata.maxpagesize=" + maxpagesize)
    .GetAsync();
Delta without Filter works
If I do the exact same code for delta above but remove the "Filter" operation on date, then I get all the messages in the folder.
This isn't a great solution since I normally only need messages for the last year or 2 years and if there are 15 years of messages it is a huge waste to query everything.
Update on 12/3/2019
I'm still getting this issue. I recently switched back to trying to use Delta again whereas before I was querying everything from the server even though I might only need the last month of data. But that's super wasteful.
This code works fine for most mailboxes but sometimes I encounter a mailbox with this issue.
My code looks like this.
string sStartingTime = startingTime.ToString("yyyy'-'MM'-'dd'T'HH':'mm':'ss") + "Z";
var messageCollectionPage = await client.Users[mailboxsource.GetMailboxIdFromAccountID()].MailFolders[folder.Id].Messages.Delta().Request()
    .Filter("receivedDateTime+ge+" + Uri.EscapeDataString(sStartingTime))
    .Select(select)
    .Header("Prefer", "odata.maxpagesize=" + preferredPageSize)
    .OrderBy("receivedDateTime desc")
    .GetAsync(cancellationToken);
At around 5000 results the Delta request just stops returning results even though there are 66K items in the folder.
Paul, my peers confirmed there is indeed a 5000-item limit if you apply $filter to a delta query of the message resource.
Within the next day, the docs will also be updated with this information. Thank you for your patience and support!
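Given that limit, and since the questioner observed that delta without the Filter works, one workaround is to drop $filter, page the unfiltered delta, and apply the date cutoff client-side. A minimal sketch against the raw REST endpoint (the token, mailbox, folder id, and cutoff are placeholders; only the delta endpoint, Prefer header, and @odata.nextLink / @odata.deltaLink fields come from the Graph documentation):

import requests

access_token = '...'                 # placeholder: acquired via MSAL or similar
mailbox_id = 'user@contoso.com'      # placeholder mailbox
folder_id = 'AAMk...'                # placeholder folder id
cutoff = '2018-01-01T00:00:00Z'      # keep messages received after this

headers = {
    'Authorization': 'Bearer ' + access_token,
    'Prefer': 'odata.maxpagesize=100',
}

url = ('https://graph.microsoft.com/v1.0/users/%s/mailFolders/%s/messages/delta'
       % (mailbox_id, folder_id))
messages = []
while url:
    data = requests.get(url, headers=headers).json()
    for msg in data.get('value', []):
        # date filtering happens here instead of in a $filter clause
        if msg.get('receivedDateTime', '') >= cutoff:
            messages.append(msg)
    # keep following nextLink; it disappears once the deltaLink is issued
    url = data.get('@odata.nextLink')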

twitter API limiting tweets to one day, tweepy

I'm trying to pull data from Twitter over a month or so for a project. There are fewer than 10,000 tweets over this time period with this hashtag, but I only seem to be getting tweets from the current day: I got 68 yesterday and 80 today, and both batches were timestamped with the current day.
api = tweepy.API(auth)
igsjc_tweets = api.search(q="#igsjc", since='2014-12-31', count=100000)
ipdb> len(igsjc_tweets)
80
I know for certain there should be more than 80 tweets. I've heard that Twitter rate-limits to 1500 tweets at a time, but does it also rate-limit to a certain day? Note that I've also tried the Cursor approach with
igsjc_tweets = tweepy.Cursor(api.search, q="#igsjc", since='2015-12-31', count=10000)
This also only gets me 80 tweets. Any tips or suggestions on how to get the full data would be appreciated.
Here's the official tweepy tutorial on Cursor. Note: you need to iterate through the Cursor, as shown below. Also, there is a max count that you can pass to .items(), so it's probably a good idea to pull month-by-month or something similar, and to sleep between calls. HTH!
igsjc_tweets_jan = [tweet for tweet in tweepy.Cursor(
api.search, q="#igsjc", since='2016-01-01', until='2016-01-31').items(1000)]
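Expanding on the month-by-month suggestion, a sketch (assuming api is the authenticated tweepy.API object from the question; the month list and per-month cap are arbitrary placeholders):

import time
import tweepy

months = [('2016-01-01', '2016-01-31'),
          ('2016-02-01', '2016-02-29'),
          ('2016-03-01', '2016-03-31')]

all_tweets = []
for since, until in months:
    # pull up to 1000 tweets per month-sized window
    batch = [tweet for tweet in tweepy.Cursor(
        api.search, q="#igsjc", since=since, until=until).items(1000)]
    all_tweets.extend(batch)
    time.sleep(60)  # be gentle between month-sized pulls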
First, tweepy cannot bring back very old data through its search API. I don't know the exact limitation; maybe it reaches back only a month or two. Anyway, you can use this piece of code to get tweets. I ran it to collect tweets from the last few days and it works for me. Notice that you can refine it and add geocode information; I left an example commented out for you.
flag = True
last_id = None
while flag:
    flag = False
    for status in tweepy.Cursor(api.search,
                                #q='geocode:"37.781157,-122.398720,1mi" since:'+since+' until:'+until+' include:retweets',
                                q="#igsjc",
                                since='2015-12-31',
                                max_id=last_id,
                                result_type='recent',
                                include_entities=True,
                                monitor_rate_limit=False,
                                wait_on_rate_limit=False).items(300):
        tweet = status._json
        print(tweet)
        flag = True  # there is still more data to collect
        last_id = status.id - 1  # max_id is inclusive, so step below the last seen id
Good luck
