How can I set up per-user rate limiting in Dropwizard?
I am doing user authentication and authorization, and on top of that I want rate limiting so that a user cannot bombard my application with a huge number of calls. The rate limits might vary per user according to the use case.
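Dropwizard has no per-user rate limiting built in, but one common pattern is a Jersey ContainerRequestFilter that keeps a Guava RateLimiter per authenticated user. Here is a minimal sketch, assuming your authentication filter runs first and that the per-user limit lookup (permitsPerSecondFor, a hypothetical method here) reads from your own config or database:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.security.Principal;
import java.util.concurrent.ConcurrentHashMap;
import javax.annotation.Priority;
import javax.ws.rs.Priorities;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.Response;

// Runs after authentication so the user principal is already available.
@Priority(Priorities.AUTHORIZATION + 10)
public class PerUserRateLimitFilter implements ContainerRequestFilter {

    private final ConcurrentHashMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();

    @Override
    public void filter(ContainerRequestContext ctx) {
        Principal principal = ctx.getSecurityContext().getUserPrincipal();
        if (principal == null) {
            return; // let the auth filter reject unauthenticated requests
        }
        RateLimiter limiter = limiters.computeIfAbsent(
                principal.getName(),
                user -> RateLimiter.create(permitsPerSecondFor(user)));
        if (!limiter.tryAcquire()) {
            ctx.abortWith(Response.status(429).entity("Rate limit exceeded").build());
        }
    }

    // Hypothetical lookup: each user's allowed requests per second.
    private double permitsPerSecondFor(String user) {
        return 10.0;
    }
}
```

Register it with environment.jersey().register(new PerUserRateLimitFilter()) in your Application's run method. Note the in-memory map only works for a single instance; a clustered deployment would need a shared store such as Redis.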
Related
Currently we are on Google My Business API V3, with a 5 QPS rate limit. I see the new V4 API release. In the changelog I see some minor upgrades to the API, but I don't see any mention of rate limits.
Are they the same as the previous rate limits?
There are changes to the rate limits. Earlier, every API call fell into the same 5 QPS bucket; now the rate-limiting buckets are separated by type of API call.
You can find more details about the newer rate limits at the following link:
https://developers.google.com/my-business/content/limits
I'm considering using the Twitter Streaming API (public streams) to keep track of the latest tweets for many users (up to 100k). Despite having read various sources on the different rate limits, I still have a couple of questions:
According to the documentation, the default access level allows up to 400 track keywords and 5,000 follow userids. What are the best practices for following more than 5k users? Creating, for example, 20 applications to get 20 different access tokens?
If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" still apply? And how does this change if I add more users, up to 5k?
Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?
What are the best practices for following more than 5k users? Creating, for example, 20 applications to get 20 different access tokens?
You don't want to use multiple applications. This response from a mod sums up the situation well. The Twitter Streaming API documentation also specifically calls out devs who attempt to do this:
Each account may create only one standing connection to the public endpoints, and connecting to a public stream more than once with the same account credentials will cause the oldest connection to be disconnected.
Clients which make excessive connection attempts (both successful and unsuccessful) run the risk of having their IP automatically banned.
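The practical consequence is that reconnection attempts should be spaced out with exponential backoff rather than retried immediately. A rough sketch of that pattern, where connectToStream() stands in for whatever streaming client you actually use:

```java
// Reconnects with exponential backoff so repeated failures never turn
// into the "excessive connection attempts" Twitter warns about.
public class BackoffReconnect {

    public static void run() throws InterruptedException {
        long delayMs = 1_000;            // start with a 1 second delay
        final long maxDelayMs = 320_000; // cap the delay at about 5 minutes

        while (true) {
            try {
                connectToStream();  // blocks for as long as the connection is healthy
                delayMs = 1_000;    // reset after a clean run
            } catch (Exception e) {
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }

    // Placeholder: open the single allowed streaming connection here.
    private static void connectToStream() throws Exception {
    }
}
```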
A rate limit is a rate limit--you can't get more than Twitter allows.
If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" still apply? And how does this change if I add more users, up to 5k?
The 1% rule still applies, but it is practically impossible for a single user to account for 1% of all tweet volume in a given time interval. More users means more tweets, but unless all 5k are very high-volume tweeters you shouldn't have a problem.
Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?
Interesting idea, but probably not. You're rate-limited in the REST API as well. For GET statuses/user_timeline, the rate limit is 180 queries per 15 minutes, and the endpoint only returns tweets for a single user per call. The regular GET search/tweets doesn't accept a user id as a parameter, so you can't take advantage of that either (it is likewise limited to 180 queries per 15 minutes).
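To put numbers on that: 180 requests per 15 minutes works out to 12 users polled per minute per token, so minute-by-minute coverage of even the 5,000-user streaming cap, let alone 100k users, is far out of reach:

```java
// Feasibility check for minute-by-minute polling against the
// 180-requests-per-15-minutes limit quoted above.
public class PollingBudget {
    public static void main(String[] args) {
        int requestsPerWindow = 180;
        int windowMinutes = 15;
        int usersPerMinute = requestsPerWindow / windowMinutes; // = 12

        System.out.println("Users pollable once per minute per token: " + usersPerMinute);
        // Polling 5,000 users every minute would need 5,000 requests per
        // minute, i.e. over 400x what a single token allows.
    }
}
```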
The Twitter Streaming and REST API overviews are excellent and merit a thorough reading. Tweepy unfortunately has spotty documentation, and Twython isn't much better, but both wrap the Twitter APIs directly, so reading the overviews will give you a good understanding of how everything works. Good luck!
To get past the 400 keywords and 5,000 user ids, you need to apply for enterprise access.
Basic
400 keywords, 5,000 user ids, and 25 location boxes
One filter rule on one allowed connection, disconnection required to adjust rule
Enterprise
Up to 250,000 filters per stream, up to 2,048 characters each.
Thousands of rules on a single connection, no disconnection needed to add/remove rules using Rules API
https://developer.twitter.com/en/enterprise
Is there a maximum limit on the Valence API? I've made a number of calls, but I put some self-throttling in the program: it makes a call to the user page, loops through the data, and then makes another call, averaging about one call every second.
I'm looking at expanding some functionality, and I'm worried that we may hit a limit if we aren't careful about how we go about it.
So, is there a limit to how often we can call the Valence API?
The back-end LMS can be configured to rate limit Valence Learning Framework API calls; however, by default this is not active. To be sure, you should consult the administrators of your back-end LMS.
Update: Brightspace no longer supports the kind of rate limiting mentioned above. As Brightspace evolved, D2L found that the rate limiting was not providing the value originally intended, and as a result deprecated the feature. D2L no longer rate limits the Brightspace APIs and instead depends on developer self-governance and on asynchronous APIs for more resource-intensive operations (the APIs around importing courses, for example). When you use the Brightspace APIs, be mindful that you are using the same computing resources made available to end users interacting with the web UI; if you over-stress these resources (as can easily be done through any API), you can have a negative impact on those end users.
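Given that advice, client-side pacing is worth making explicit. Below is a minimal sketch of the one-call-per-second self-throttling described in the question, using Guava's RateLimiter; this is not a Valence SDK feature, and callUserPage() is a placeholder for your actual request code.

```java
import com.google.common.util.concurrent.RateLimiter;

// Paces outgoing API calls at roughly one per second, matching the
// self-throttling described in the question above.
public class PacedClient {
    private final RateLimiter limiter = RateLimiter.create(1.0); // 1 permit per second

    public void fetchAllPages() {
        boolean morePages = true;
        while (morePages) {
            limiter.acquire();          // blocks until the next permit is available
            morePages = callUserPage(); // placeholder: fetch and process one page
        }
    }

    // Placeholder for the real Valence call; returns true while pages remain.
    private boolean callUserPage() {
        return false;
    }
}
```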
Primary question: Will Twitter's rate limits allow me to do the data mining necessary to construct a complete social network graph with all directed edges among about 600K users?
Here is the idea:
The edges/ties/relations in the network will be follower/followed relationships.
Start with a specific list of approximately 600 Twitter users, chosen because they are all from news outlets in a large city.
Collect all of the followers and friends (people they follow) for all 600 users. These users probably average 2,000 followers and 500 friends each.
Since these followers are all in the same city, it is expected that many of them follow several of the 600 accounts. So let's approximate and guess that these 600 users have roughly 600,000 distinct followers and friends in total. This would be a subgraph/network of 600,600 total Twitter users.
Once I have collected all 600,000 followers and friends of these 600 people, I want to construct a social network of all 600,600 people AND their followers. This would require me to at least find all of the directed edges among these 600,600 users (whether or not each of them follows each other).
With Twitter rate limits, would this kind of data mining be feasible?
I'll answer these questions in reverse order, starting with David Marx first:
Well, I do have access to a pretty robust computer research center with a ton of storage capacity, so that should not be an issue. I don't know if the software can handle it, however.
Chances are that I will have to scale down the project, which is OK. The idea for me is to start out with a bigger idea, figure out how big it can be, and then pare down accordingly.
Following up on Anony-Mousse's question now: part of my problem is that I am not sure I am interpreting the Twitter rate limits correctly. I'm not sure if it's 15 requests per 15 minutes or 30 requests per 15 minutes. I think one request will return 5,000 followers/friends, so you could presumably collect 75,000 friends or followers every 15 minutes if the limit is 15 requests per 15 minutes. I'm also trying to figure out whether there is any process for requesting higher rate limits for research purposes.
Here is where they list the limits:
https://dev.twitter.com/docs/rate-limiting/1.1/limits
Primary question: Will Twitter's rate limits allow me to do the data mining (...)
Yes, it is technically feasible; however, it will take ages if you use only a single user's API access token: probably more than 6 months of uninterrupted running.
To be more precise:
the extraction of nodes (Twitter users) can be done very quickly, as you will use the users/lookup API endpoint, which lets you extract 100 nodes per request and make 180 requests per 15-minute window (per access token you have)
the extraction of edges (the follow relationships between users) is the slow part: you will use the friends/ids and followers/ids API endpoints, limited to 15 queries per 15 minutes each, and extracting at most 5,000 friend or follower ids of a single user per request (a back-of-envelope time estimate follows below)
You can use the node metadata (description texts, locations, languages, time zones) to perform some interesting analysis, even without having extracted the 'graph' (the follow relationships between everyone).
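To make the "more than 6 months" figure concrete, here is a back-of-envelope estimate. It assumes friends/ids and followers/ids each have their own 15-requests-per-window budget (so 15 users can be fully covered per window) and that every user's ids fit in a single 5,000-id page; users with more connections would need extra paged requests:

```java
// Rough runtime estimate for extracting all follow edges of the
// 600,600-user subgraph with a single access token.
public class CrawlEstimate {
    public static void main(String[] args) {
        long users = 600_600;     // size of the target subgraph
        int usersPerWindow = 15;  // 15 friends/ids + 15 followers/ids calls per window
        int windowMinutes = 15;

        long windows = (users + usersPerWindow - 1) / usersPerWindow;
        double days = windows * windowMinutes / (60.0 * 24.0);
        System.out.printf("~%d windows, about %.0f days of uninterrupted crawling%n",
                windows, days);
        // Prints roughly 417 days, consistent with "more than 6 months".
    }
}
```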
A way around this is to parallelize sub-parts of the extraction by spreading it across several access tokens. That seems compliant with the terms of use, as long as you respect protected accounts.
In any case, you should skip edge extraction for celebrities (you probably do not want to extract the followers of hootsuite; there are almost 6 million of them).
Disclaimer (self-promotion): in case you do not want to develop this yourself, I could do the extraction for you and provide you with the graph file, as I am extracting Twitter graphs at tribalytics. (I have read this and that before posting.)
I'm also trying to figure out if there is any process for requesting higher rate limits for any kind of research purposes
Officially, there are no longer whitelisted apps with higher rate limits, as there were with the previous version of Twitter's API. You should probably still contact Twitter and see whether they can help, since your work is for academic purposes.
Chances are that I will have to scale down the project, which is OK
I would advise you to reduce your initial list of 600 users as much as you can. Only keep those who are really central to your topic and whose audience is not too large. Extracting the graph of local celebrities will give you a graph full of people unrelated to the population you want to study.
I am developing an application in Unity that uses the compass and GPS on the iPhone platform, and I will build it for Android afterwards.
While searching compass-related questions, I found an answer saying there is a limit on querying the Google API. I am a bit worried, because I want to query every second to update the user's location and the device's compass information.
Does anyone know what the maximum query limit is? What would be a better approach for this task?
If you are referring to the Google Elevation API, then yes, there is a limit. Basically, all Google APIs have usage limits. From the Elevation API webpage:
Usage Limits
Use of the Google Elevation API is subject to a limit of 2,500 requests per day (Maps API for Business users may send up to 100,000 requests per day). In each given request you may query the elevation of up to 512 locations, but you may not exceed 25,000 total locations per day (1,000,000 for Maps API for Business users). This limit is enforced to prevent abuse and/or repurposing of the Elevation API, and this limit may be changed in the future without notice. Additionally, we enforce a request rate limit to prevent abuse of the service. If you exceed the 24-hour limit or otherwise abuse the service, the Elevation API may stop working for you temporarily. If you continue to exceed this limit, your access to the Elevation API may be blocked.
If you need more requests, you may have to use Maps API for Business.
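Note that querying every second, as planned, would be 86,400 requests per day, vastly more than the free quota of 2,500. You would need to batch up to 512 locations per request, cache results, and space out calls. A minimal client-side throttle sketch (not part of any Google SDK; the class and method names are made up for illustration):

```java
// Spaces out requests so the free Elevation API quota quoted above
// (2,500 requests per day) is never exceeded.
public class ElevationThrottle {
    private static final int REQUESTS_PER_DAY = 2_500;
    private static final long MIN_INTERVAL_MS =
            24L * 60 * 60 * 1000 / REQUESTS_PER_DAY; // about 34.5 s between requests

    private long lastRequestAt = 0;

    /** Blocks until another request can be sent without exceeding the daily rate. */
    public synchronized void acquire() throws InterruptedException {
        long wait = lastRequestAt + MIN_INTERVAL_MS - System.currentTimeMillis();
        if (wait > 0) {
            Thread.sleep(wait);
        }
        lastRequestAt = System.currentTimeMillis();
    }
}
```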