How does Twitter Search API rate limit work?

I am not clear about what the Twitter rate limit, "350 requests per hour per access token/user", means. How are they limiting the requests? How much data can I get in one request?

The rate limits are based on requests, not the amount of data (e.g. bytes) you receive. With that in mind, you can maximize each request by using the available parameters of the particular endpoint you're calling. I'll give you a couple of examples to explain what I mean:
One way is to set count, if supported, to the highest available value. On statuses/home_timeline, you can max out count at 200. If you aren't setting it now, you're getting the default of 20, which means you would (theoretically) need 10 times as many queries to get the same amount of data. More queries eat up your rate limit faster.
Using statuses/home_timeline again, notice that you can page through data using since_id and max_id, as described in Working with Timelines. Essentially, you keep track of the tweets you've already requested so you can save requests by only fetching the newest tweets (see the sketch after this list).
Rate limits are applied in 15-minute windows, so you can pace your requests to minimize the chance of running out in any given window.
Use a combination of the Streaming API and REST requests - streaming connections don't count against your REST rate limit, so in effect you can retrieve more data.
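Here's a minimal Python sketch of the count and since_id pattern above, assuming Twitter API v1.1 and valid OAuth 1.0a credentials (all four credential values below are placeholders):

    from requests_oauthlib import OAuth1Session

    twitter = OAuth1Session("CONSUMER_KEY", "CONSUMER_SECRET",
                            "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    URL = "https://api.twitter.com/1.1/statuses/home_timeline.json"
    since_id = None  # highest tweet id seen so far

    def fetch_newest():
        """One request, maxed out at count=200, skipping tweets we already have."""
        global since_id
        params = {"count": 200}  # default is 20, so this is 10x fewer requests
        if since_id:
            params["since_id"] = since_id
        tweets = twitter.get(URL, params=params).json()
        if tweets:
            since_id = max(t["id"] for t in tweets)
        return tweets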
There are more optimizations like this that help you conserve your rate limit, some more subtle than others. Looking at the rate limits per endpoint, studying the parameters, and thinking about how the API is used can help you minimize rate limit usage.

Related

What's the most efficient way to handle quota for the YouTube Data API when developing a chat bot?

I'm currently developing a chat bot for one specific YouTube channel, and it can already fetch messages from the currently active live chat. However, I noticed my quota usage shooting up, so I took the liberty of calculating my quota cost.
My API call currently looks like this: https://www.googleapis.com/youtube/v3/liveChat/messages?liveChatId=some_livechat_id&part=snippet,authorDetails&pageToken=pageTokenIfProvided, which uses up 5 units. I checked this by running one API call and comparing the quota usage before and after (so apologies if this is inaccurate). The response contains pollingIntervalMillis set to 5086 milliseconds. My bot currently adds that interval to the current datetime and schedules the next fetch at that time (using Celery), so it fetches messages every 4-6 seconds. To be safe, I'll assume it always waits 6 seconds.
Calculating my API quota results in a usage of 72,000 units per day:
10 requests per minute * 60 minutes * 24 hours = 14,400 requests per day
14,400 requests * 5 units per request = 72,000 units per day
This means that if I used pollingIntervalMillis as a guideline for how often to request, I'd reach the daily quota of 10,000 units after running the bot for just 3 hours and 20 minutes. To stay within the quota by just fetching chat messages, I could make only about 1 API call per minute (1.3889, approximately). That is very unfeasible for a chatbot, since this covers only fetching messages, not sending any messages to the chat.
So my question is: Is there a more efficient way to fetch chat messages that won't use up the quota as quickly? Or can this only be resolved by applying for a quota extension? And if so, how much would I realistically need to ask for? Around 100k units? Even more?
I am also asking myself how something like Streamlabs Chatbot (previously known as AnkhBot) accomplishes this without hitting the quota limit despite thousands of users using their API client; their quota must be in the millions, or even billions.
And another question: how would I actually fill out the quota extension form while the bot is still in this early state of development?
You pretty much hit the nail on the head. Services like Streamlabs are owned by larger companies, in their case Logitech. They not only have the money to throw around for things like increasing their API quota, but they also have professional relationships with companies like Google to decrease their per unit cost.
As for efficiency: the API costs are easy to find in the documentation, and for live chat, as you've found, each call costs 5 units. The only way to lower your overall daily cost is to call less frequently. While once per minute is clearly too long, once every 15-18 seconds could shrink the quota increase you need to ask for while keeping the chat bot adequately responsive (see the sketch below).
Of course that all depends on your desired usage of the data, but it's still my recommendation if the bot remains in the realm of hobbyist usage.
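To put numbers on that trade-off, here's a quick back-of-the-envelope helper in Python (purely illustrative; the 5-units-per-call figure is the one measured in the question):

    def daily_quota(poll_seconds, units_per_call=5):
        """Units consumed per day when polling once every poll_seconds."""
        calls_per_day = 86400 / poll_seconds  # seconds per day / poll interval
        return calls_per_day * units_per_call

    print(daily_quota(6))   # 72000.0 -- far over the 10,000 default quota
    print(daily_quota(18))  # 24000.0 -- still over, but a much smaller ask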

Getting realtime twitter search results using the streaming API

I have an application where I need to get complete, realtime search results from Twitter (preferably polling every 500ms or less). Based on my understanding, doing this with the search API will run into rate limits very quickly. However, the streaming API doesn't seem to support getting complete results for anything (only a 5% sample).
More specifically, I have a search query term which typically comes up with <20 matching tweets per hour, and I would like to be informed of these new tweets within 1-2 seconds, and it is considered a failure if I am not notified within 5 seconds. Due to the relatively low frequency of posting, missing even one tweet is very undesirable.
Is there any way I can realistically do this using twitter API, or is my only choice to write a browser extension to repeatedly refresh the search page?
The answer is "yes". Although the streaming API is capped (the cap is closer to 1% of the full firehose than 5%), that cutoff only kicks in when your query matches more than that share of all tweets. Very roughly, you can stream about 60 tweets per second at most. In your case, you expect under 20 matching tweets per hour, so you should have no problem getting all of them.
You also require latency under 5 seconds. In my experience, streaming latency has always been a second or two, so I think you'll be fine.
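For reference, here's a minimal sketch of consuming the v1.1 filtered stream in Python with requests_oauthlib (the credentials and track term are placeholders):

    import json
    from requests_oauthlib import OAuth1Session

    twitter = OAuth1Session("CONSUMER_KEY", "CONSUMER_SECRET",
                            "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    stream = twitter.post(
        "https://stream.twitter.com/1.1/statuses/filter.json",
        data={"track": "your search term"},
        stream=True,  # keep the HTTP connection open
    )
    for line in stream.iter_lines():
        if line:  # the stream sends blank keep-alive lines
            tweet = json.loads(line)
            print(tweet["text"])  # typically arrives within a second or two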

Ways to pull (potentially) large amounts of data from Twitter

I've been playing around with the Twitter API using Twitter4j. I am trying to pull data given a keyword and date; an example of a query I would run using the REST API would be
bagels since:2014-12-27
This would give me all tweets containing the keyword 'bagels' since 2014-12-27.
This works in theory, but I quickly exceeded the rate limits, since each query returns at most 100 results and only 180 queries are allowed within a 15-minute interval. Many keywords return more than 18k results.
Is there a better way to pull large amounts of data from Twitter? I looked at the Streaming API but I don't know if I can pull data from a certain date range.
There are a few things you can do to improve your rates:
Make sure your count is maxed at 100, which it looks like you're doing.
Use Application-only authentication - it increases your search rate limit to 450 requests per 15-minute window.
Use the max_id and since_id parameters to page through data and avoid querying for results you've already received (both shown in the sketch after this list). See the Working with Timelines docs to see what I mean.
Consider using Gnip if you're willing to pay to remove rate limits.
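Here's a rough Python sketch of the second and third tips together - application-only auth plus max_id paging against API v1.1 (the key/secret values are placeholders):

    import requests

    # Exchange consumer key/secret for a bearer token (app-only auth:
    # 450 search requests per 15-minute window instead of 180).
    token = requests.post(
        "https://api.twitter.com/oauth2/token",
        auth=("CONSUMER_KEY", "CONSUMER_SECRET"),
        data={"grant_type": "client_credentials"},
    ).json()["access_token"]
    headers = {"Authorization": "Bearer " + token}

    # Page backwards through results with max_id, 100 tweets per request.
    params = {"q": "bagels since:2014-12-27", "count": 100}
    while True:
        batch = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                             params=params, headers=headers).json()["statuses"]
        if not batch:
            break
        # max_id is inclusive, so subtract 1 to avoid re-fetching the oldest tweet
        params["max_id"] = min(t["id"] for t in batch) - 1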

Twitter rate limiting confusion?

So I'm currently using NodeXL to search for a particular Twitter hashtag, and I'm having trouble understanding exactly how the rate limiting works. I looked it up on Twitter's API Rate Limits page, and also in this SO post, but even after reading both, I don't really understand. The API page says:
Search will be limited at 180 queries per 15 minute window for the time being.
and also
Rate limiting in version 1.1 of the API is primarily considered on a per-user basis — or more accurately described, per access token in your control. If a method allows for 15 requests per rate limit window, then it allows you to make 15 requests per window per leveraged access token.
But I'm totally confused... probably because I've never really worked with databases or social network analysis before.
When it says that it allows 180 queries per 15 minutes, what exactly constitutes a query? The way search works in NodeXL is that you limit the number of tweets you are searching for. So if I search once and set my tweet limit to 1000 tweets, is that only 1 query?
Sorry if this seems like a stupid or really elementary question, but I just don't have any experience with this stuff at all, and any help would be much appreciated, thanks!
When it says that it allows 180 queries per 15 minutes, what exactly constitutes a query?
Whenever you make one request to Twitter, it's considered one query. For the Search API, you can make 180 calls per 15 minutes.
So if I search once and set my tweet limit to 1000 tweets, is that only 1 query?
Each request counts as one query, but a single request can return at most 100 tweets, as mentioned here - so a 1000-tweet search actually costs 10 queries.
You can retrieve the latest 100 tweets with a normal search query; for pagination, use since_id and max_id to retrieve the next 100 tweets.
The number of queries you can make per 15-minute window varies by endpoint. For example, you can make 180 requests per 15-minute window with the Search API, but an endpoint like GET friends/ids is limited to 15 queries per window, i.e. you can only call it 15 times per 15 minutes.
Here's the Rate Limits chart, where you can find how many requests you can make per 15-minute window for each endpoint.
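You can also check your remaining quota programmatically. A minimal Python sketch against the v1.1 application/rate_limit_status endpoint (the credentials are placeholders):

    from requests_oauthlib import OAuth1Session

    twitter = OAuth1Session("CONSUMER_KEY", "CONSUMER_SECRET",
                            "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    status = twitter.get(
        "https://api.twitter.com/1.1/application/rate_limit_status.json",
        params={"resources": "search"},
    ).json()
    window = status["resources"]["search"]["/search/tweets"]
    print(window["remaining"], "of", window["limit"], "requests left this window")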

Twitter Search API rate limit

I'm using the Twitter search API (for example: http://search.twitter.com/search.rss?q=%23juventus&rpp=100&page=4)
I read this at http://search.twitter.com/api/:
We do not rate limit the search API under ordinary circumstances, however we have put measures in place to limit the abuse of our API. If you find yourself encountering these limits, please contact us and describe your app's requirements.
The limit seems random: sometimes I can make 150 requests, sometimes 300; generally, after 5 minutes I can make more requests.
I was wondering if it is possible to do more requests.
Rather than enforcing fixed, published limits, they detect floods and throttle accordingly, which is why it appears random. It's also no doubt based on load from other sources at the time.
If you need lots more, then they gave you the answer - contact them and describe why.
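If you do keep hitting the throttle, backing off exponentially is the usual defensive pattern. A Python sketch (the URL is the one from the question; this legacy endpoint may no longer be live, and 420 was Twitter's old "Enhance Your Calm" throttling status):

    import time
    import requests

    def search_with_backoff(url, max_retries=5):
        delay = 60  # start with a one-minute wait, roughly what was observed
        for attempt in range(max_retries):
            resp = requests.get(url)
            if resp.status_code == 200:
                return resp
            if resp.status_code in (420, 429):  # throttled
                time.sleep(delay)
                delay *= 2  # back off harder each time
            else:
                resp.raise_for_status()
        raise RuntimeError("still throttled after %d retries" % max_retries)

    resp = search_with_backoff(
        "http://search.twitter.com/search.rss?q=%23juventus&rpp=100&page=4")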
