Twitter Rate Limiting IP/OAuth Concerns

I have a series of web apps that collect tweets for all terms relating to a subject using the public Streaming API. So far, I've been taking a very, very arduous route: creating a new account for each stream, setting up a new Twitter application on that account, copying the OAuth tokens, spinning up a new EC2 instance, and setting up the stream.
This allows me to have the streams coming from multiple different IPs, OAuth token generation is easy with the generator tool when you create an app, and because they are each in different accounts I don't hit any account limits.
I'm wondering if there's anything I can do to speed up the process, specifically in terms of EC2 instances. Can I have a bunch of streams running off the same instance using different accounts?

If you run multiple consumers from a single machine you may be temporarily banned,
and repeated bans may end up getting you banned for longer periods.
At least, this happened to me a few times in the past.
What I had found at the time was:
same credentials, same ip -> block/ban
different credentials, same ip -> mostly ok, but banned from time to time
different credentials, different ip -> ok
This was a few years ago, so I am not sure the same is still true, but I'd expect twitter to have tightened the rules, rather than having relaxed them.
(Also, I think you're infringing their ToS)
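For what it's worth, running several consumers with different credentials from one box is mechanically simple; the risk described above is entirely on Twitter's side. A minimal sketch, assuming Tweepy 3.x and the old public Streaming API (credentials and keywords are placeholders):

    import threading
    import tweepy

    # One entry per dedicated account: (consumer_key, consumer_secret,
    # access_token, access_token_secret, tracked terms). All placeholders.
    ACCOUNTS = [
        ("CK1", "CS1", "AT1", "ATS1", ["subject-term-a"]),
        ("CK2", "CS2", "AT2", "ATS2", ["subject-term-b"]),
    ]

    class PrintListener(tweepy.StreamListener):
        def on_status(self, status):
            print(status.text)

    def run_stream(ck, cs, at, ats, terms):
        auth = tweepy.OAuthHandler(ck, cs)
        auth.set_access_token(at, ats)
        # Each stream authenticates as its own account but shares the host/IP.
        tweepy.Stream(auth=auth, listener=PrintListener()).filter(track=terms)

    for creds in ACCOUNTS:
        threading.Thread(target=run_stream, args=creds).start()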

You should check the new Twitter API version 1.1. It was released a few days ago, and many changes were made to how rate limits are calculated.
One big change is that the IP is now completely ignored, so you don't have to create many instances anymore (profit!).
From the Twitter dev @episod:
Unlike in API v1, all rate limiting is per user per app -- IP address has no involvement in rate limiting consideration. Rate limits are completely partitioned between applications.
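An easy way to see the new accounting in action is to ask v1.1 itself via GET application/rate_limit_status; the limits come back per app+user pair. A rough sketch, assuming requests_oauthlib and placeholder keys:

    import requests
    from requests_oauthlib import OAuth1

    # Placeholder credentials for one app + one user.
    auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
                  "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

    resp = requests.get(
        "https://api.twitter.com/1.1/application/rate_limit_status.json",
        params={"resources": "search,statuses"},
        auth=auth,
    )
    # Limits are reported per endpoint for this app+user pair; the client IP
    # plays no part in them.
    print(resp.json()["resources"]["search"]["/search/tweets"])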

Related

Nginx rate limiting for unique IPs

We've been dealing with constant attacks on our authentication URL (we're talking millions of requests per day); my guess is they are trying to brute-force passwords.
Whenever we would block the IP with the server firewall, a few seconds later the attacks would start again from a different IP.
We ended up implementing a combination of throttling through rack-attack plus custom code to dynamically block the IPs in the firewall. But as we improved our software's security, so did the attackers, and now we are seeing every request they make come from a different IP, one call per IP, still several per second; not as many, but still an issue.
Now I'm trying to figure out what else I can do to prevent this. We tried reCAPTCHA, but quickly ran out of the monthly quota and then nobody could log in.
I'm looking into the Nginx rate limiter, but from what I can see it also uses the IP; considering they now rotate IPs for each request, is there a way that this would work?
Any other suggestions on how to handle this? Maybe one of you went through the same thing?
Stack: Nginx and Rails 4, Ubuntu 16.
For your situation, the most effective way to prevent this attack is using a captcha. You mentioned that you have used reCAPTCHA and that its quota runs out quickly, but if you develop the captcha yourself you would have unlimited captcha images.
As for the other prevention methods, like locking the IPs, this is largely useless when the attackers use an IP pool; there are so many IPs (including some IoT devices' IPs) that you cannot identify and lock them all, even if you use a commercial threat-intelligence pool.
So my suggestions are:
Develop the captcha yourself and implement it on your API.
Identify and lock the IPs that you think are malicious.
Set some rules to identify the UA and Cookie of the HTTP request (usually a normal request differs from an attack request).
Use a WAF (if you have enough budget).
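On the specific Nginx question: limit_req zones can be keyed on any variable, not just the client IP, so one hedged option is an overall cap on the login URL itself. A sketch (the location path, upstream name and rates are assumptions, and legitimate logins are also throttled during an attack, so pair it with the captcha above):

    # In the http {} block of nginx.conf. Keying the zone on $server_name
    # (effectively a constant) caps the TOTAL request rate to the login URL,
    # regardless of how many source IPs the requests are spread across.
    limit_req_zone $server_name zone=login_global:1m rate=10r/s;

    server {
        location = /users/sign_in {        # adjust to your auth URL
            limit_req zone=login_global burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://rails_app;   # your existing upstream
        }
    }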

Video Streaming for Mobile App

I'm building an iOS app for a client that allows users to pay a subscription and unlock additional content within the app. Part of the additional content will be videos which need to be streamed from a server... but I'm not sure whether we should use a hosting service (like Amazon CloudFront or Wowza, perhaps?) or roll our own solution.
Have any of you had experience with either of these options? It looks like this is supported natively by nginx, which we're currently using as our reverse proxy, but I'd like to hear some thoughts about that. I would also be somewhat concerned about saturating our server's 1 Gb network connection...
Whatever the solution, we must be able to verify a person's account before they can access the video content. Variable bitrate is also desirable, and the ability to support >500 concurrent users. This company is also a new startup, so subscription costs are an important factor.
It is usually best to deploy streaming-specific software or services instead of generic HTTP servers such as Nginx. For Wowza, as an example, here's a quick list of features for this type of workflow.
Performance and scalability. You can do a quick comparison on playing back concurrent streams (using load test tools) and see what kind of load can be handled by an HTTP server vs Wowza.
Monitoring. Statistics collection is also integrated with Wowza, which may prove beneficial for start-up companies that need to leverage this kind of data mining.
Security. Wowza also has several options that you can use, such as SecureToken. For example, you can configure your mobile app to query the user's IP address once you determine that they are authorized to receive the stream. You can then generate a hash token based on this IP address and the stream they are authorized for, and only allow playback with a valid token. You can also expire these tokens (a rough token sketch follows this list).
Manager UI. Not as attractive for developers/sys admins, but users can take advantage of a relatively intuitive UI.
Extensibility. Wowza has REST and Java APIs that allow you to add custom modules or integrate third-party systems. For example, you can use a custom module that monitors stream connection time and cuts off any connection longer than x hours.
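As a rough illustration of the token idea above (this is a generic HMAC sketch, not Wowza's actual SecureToken format; the secret, field layout and TTL are assumptions):

    import hashlib
    import hmac
    import time

    SHARED_SECRET = b"change-me"            # placeholder shared secret

    def make_token(client_ip, stream_name, ttl_seconds=3600):
        # Bind the token to the viewer's IP, the stream name and an expiry time.
        expires = int(time.time()) + ttl_seconds
        payload = "{}:{}:{}".format(client_ip, stream_name, expires).encode()
        digest = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
        return "{}.{}".format(expires, digest)

    def token_is_valid(token, client_ip, stream_name):
        expires, digest = token.split(".")
        if int(expires) < time.time():
            return False                     # token expired
        payload = "{}:{}:{}".format(client_ip, stream_name, expires).encode()
        expected = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(digest, expected)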

Twitter Streaming API to follow thousands of users

I'm considering using the Twitter Streaming API (public streams) to keep track of the latest tweets for many users (up to 100k). Despite having read various sources regarding the different rate limits, I still have couple of questions:
According to the documentation: The default access level allows up to 400 track keywords, 5,000 follow userids. What are the best practices to follow more than 5k users? Creating, for example, 20 applications to get 20 different access tokens?
If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" indeed apply? And how does this change if I add more users, up to 5k?
Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?
What are the best practices to follow more than 5k users? Creating, for example, 20 applications to get 20 different access tokens?
You don't want to use multiple applications. This response from a mod sums up the situation well. The Twitter Streaming API documentation also specifically calls out devs who attempt to do this:
Each account may create only one standing connection to the public endpoints, and connecting to a public stream more than once with the same account credentials will cause the oldest connection to be disconnected.
Clients which make excessive connection attempts (both successful and unsuccessful) run the risk of having their IP automatically banned.
A rate limit is a rate limit--you can't get more than Twitter allows.
If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" indeed apply? And how does this change if I add more users, up to 5k?
The 1% rule still applies, but it is very unlikely (if not impossible) for one user to be responsible for 1% of all tweet volume in a given time interval, so you would receive essentially all of their tweets. More users means more tweets, but unless all 5k are very high-volume tweeters you shouldn't have a problem.
Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?
Interesting idea, but probably not. You're also rate-limited in the REST API. For GET statuses/user_timeline, the rate limit is 180 queries per 15 minutes, and you can only get the tweets of one user per request with this endpoint; at that rate, a single pass over 100k users takes roughly 100,000 / 180 * 15 minutes, i.e. several days. The regular GET search/tweets doesn't accept a user id as a parameter, so you can't take advantage of that either (it is also limited to 180 queries per 15 minutes).
The Twitter Streaming and REST API overviews are excellent and merit a thorough reading. Tweepy unfortunately has spotty documentation and Twython isn't too much better, but they both leverage the Twitter APIs directly so this will give you a good understanding of how everything works. Good luck!
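For a feel of the two approaches, here is a minimal sketch with Tweepy 3.x (credentials and ids are placeholders); the streaming call holds one standing connection for up to 5,000 ids, and the trailing comment spells out why REST polling is a multi-day sweep:

    import tweepy

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

    class FollowListener(tweepy.StreamListener):
        def on_status(self, status):
            print(status.user.screen_name, status.text)

        def on_error(self, status_code):
            if status_code == 420:           # being rate limited: stop rather than hammer reconnects
                return False

    # Streaming: one standing connection, at most 5,000 numeric user ids.
    user_ids = ["783214", "6253282"]         # placeholders
    stream = tweepy.Stream(auth=auth, listener=FollowListener())
    stream.filter(follow=user_ids)

    # REST alternative (for comparison): api = tweepy.API(auth, wait_on_rate_limit=True)
    # and api.user_timeline(user_id=uid, count=200) per user. At 180 calls per
    # 15-minute window, 100,000 users take about 100000/180*15 min, i.e. ~5.8 days per pass.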
To get past the 400 keywords and 5,000 followed user ids, you need to apply for enterprise access.
Basic
400 keywords, 5,000 userids and 25 location boxes
One filter rule on one allowed connection, disconnection required to adjust rule
Enterprise
Up to 250,000 filters per stream, up to 2,048 characters each.
Thousands of rules on a single connection, no disconnection needed to add/remove rules using Rules API
https://developer.twitter.com/en/enterprise

How to prevent gaming of website rewards for new visitors

I'm about to embark on a website build where a company wants to reward new visitors with a gift. The gift has some monetary value, and I'm concerned about the site being gamed. I'm looking for ways to help reduce the chance that any one person can drain the entire gift inventory.
The plans call for an integration with Facebook, so authenticating with your FB credentials will provide at least a bit of confidence that a new visitor is actually a real person (assuming that scripting the creation of 100's of FB accounts and then authenticating with them is no simple task).
However, there is also a requirement to reward new visitors who do not have FB accounts, and this is where I'm looking for ideas. An email verification system by itself won't cut it, because it's extremely easy to obtain countless email addresses (me+1@gmail.com, me+2@gmail.com, etc.). I've been told that asking for a credit card number is too much of a barrier.
Are there some fairly solid strategies or services for dealing with situations like this?
EDIT: The "gift" is virtual - like a coupon
Ultimately, this is an uphill, losing battle. If there is an incentive to beat the system, someone will try, and they will eventually succeed. (See for example: every DRM scheme ever implemented.)
That said, there are strategies to reduce the ease of gaming the system.
I wouldn't really consider FB accounts to be that secure. The barrier to creating a new FB account is probably negligibly higher than creating a new webmail account.
Filtering by IP address is bound to be a disaster. There may be thousands of users behind a proxy on a single IP address (cough, AOL), and a scammer could employ a botnet to distribute each account requests to a unique IP. It is likely to be more trouble than it is worth to preemptively block IPs, but you could analyze the requests later—for example, before actually sending the reward—to see if there's lots of suspicious behavior from an IP.
Requiring a credit card number is a good start, but you've already ruled that out. Also consider that one individual can have 10 or more card numbers between actual credit cards, debit cards, and one-time-use card numbers.
Consider sending a verification code via SMS to PSTN numbers. This will cost you some money (a few cents per message), but it also costs a scammer a decent amount of change to acquire a large number of phone numbers to receive those messages. (Depending on the value of your incentive, the cost a prepaid SIM may make it cost-prohibitive.) Of course, if a scammer already has many SMS-receiving PSTN numbers at his disposal, this won't work.
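A hedged sketch of the "analyze the requests later" idea above: group reward claims by source IP before fulfilment and flag clusters for manual review (the record fields and threshold are assumptions):

    from collections import Counter

    def flag_suspicious_ips(claims, threshold=5):
        # claims: iterable of dicts like {"email": ..., "ip": ...}
        per_ip = Counter(claim["ip"] for claim in claims)
        return {ip: n for ip, n in per_ip.items() if n >= threshold}

    claims = [
        {"email": "a@example.com", "ip": "203.0.113.7"},
        {"email": "b@example.com", "ip": "203.0.113.7"},
        {"email": "c@example.com", "ip": "198.51.100.4"},
    ]
    print(flag_suspicious_ips(claims, threshold=2))   # {'203.0.113.7': 2}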
First thing I wonder is if these gifts need to be sent to a physical address. It's easy to spoof 100 email addresses or FB accounts but coming up with 100 clearly unique physical addresses is much harder, obviously.
Of course, you may be giving them an e-coupon or something, so an address might not be an option.
Once upon a time I wrote a pretty intense anti-gaming script for a contest judging utility. While this was many months of development and is far too complex to describe in great detail, I can outline the basic features of the script:
For one, we logged every detail we could when a user applied for the contest. It was pretty easy to catch obvious similarities between accounts by factoring the average time between logins/submissions against a group of criteria (like IP, browser, etc.; all things that can be spoofed, so by themselves they are unreliable). In addition, I compared account credentials for obvious gaming, like acct1@yahoo.com, acct2@yahoo.com, etc., using a combination of Levenshtein distance (which is not solely reliable) and a parsing script that broke apart the various details of the credentials and looked for patterns.
Depending on the scores of each test, we assigned a probability of gaming as well as a list of possible account matches. Then it was up to the admins to exclude them from the results.
You could go on for months refining your algorithm and never get it perfect. That's why my script only flagged accounts and did not take any automatic action.
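As an illustration of that kind of credential-pattern check (this is not the original script; the normalization rules and similarity threshold are assumptions): collapse Gmail plus/dot aliases, then flag near-duplicate local parts.

    import difflib
    import re

    def normalize_email(address):
        # me+1@gmail.com, m.e@gmail.com -> me@gmail.com
        local, _, domain = address.lower().partition("@")
        if domain in ("gmail.com", "googlemail.com"):
            local = local.split("+", 1)[0].replace(".", "")
        return local + "@" + domain

    def looks_like_sibling(a, b, threshold=0.85):
        # Strip trailing digits (acct1/acct2) and compare what is left.
        stem = lambda addr: re.sub(r"\d+$", "", normalize_email(addr).split("@")[0])
        return difflib.SequenceMatcher(None, stem(a), stem(b)).ratio() >= threshold

    print(normalize_email("Me+promo@gmail.com"))                     # me@gmail.com
    print(looks_like_sibling("acct1@yahoo.com", "acct2@yahoo.com"))  # True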
Since you're talking about inventory, can we therefore assume your gift is an actual physical item?
If so, then delivery of the gift will require a physical address for delivery - requiring unique addresses (or, allowing duplicate addresses but flagging those users for manual review) should be a good restriction.
My premise is this: While you can theoretically run a script to create 100s of Facebook or Google accounts, exercising physical control over hundreds of distinct real world delivery locations is a whole different class of problem.
I would suggest a more 'real world' solution instead of all the security: make it clear that it is one coupon per address, meaning the physical (delivery and/or payment) address. Then just do as you want, maybe limit it by email or something for the looks of it, but in the end, limit it per real end user, not per person receiving the coupon.

reading from multiple imap.gmail.com from the same fetchmail client

For my portfolio software I have been using fetchmail to read from a Google email account over IMAP, and life has been great. Thanks to the miracle of the IMAP IDLE extension, my triggers fire in near real time due to server push, much sooner than periodic polling would allow otherwise.
In my basic .fetchmailrc setup, in which a brokerage customer's account emails trade notifications to a dedicated Gmail/Google Apps box, I've had
poll imap.gmail.com proto imap user "youraddress@yourdomain-OR-gmail.com" pass "yoMama" keep nofetchall ssl idle mimedecode limit 29000 no rewrite mda "myCustomSpecialMDAhandler.sh %F %T"
Trouble is, now I need to support reading from multiple email boxes and hand off the emails to other specialized MDA scripts I wrote. No problem, just add more poll lines to .fetchmailrc, right? Well, that doesn't work when the other accounts also use imap.gmail.com. What ends up happening is that while one account reads fine (and not necessarily the first one listed, though usually yes), the other gets "socket error" all day and its emails remain untouched, unread. I can't figure out why, and I'm not even sure whether it's some mechanism at imap.gmail.com, e.g. limiting to one IMAP connection per host. That doesn't seem right, since I have kept IMAP connections to many separate Gmail & Google Apps accounts from the same client for years (like Thunderbird) and never noticed this exclusivity problem.
I haven't tried launching multiple fetchmail daemons using separate -f config files (assuming they wouldn't conflict), or deploying one or more getmail-style fetchers in addition. I'm still trying to avoid that kind of mess; it doesn't scale as the number of boxes I have to monitor grows.
Don't have the reference offhand but somewhere in fetchmail's docs I recall reading that idle is not so much an imap feature as a fetchmail optional trick, which has a (nasty for me) side effect of choking off all other defined accounts from polling until the connection is cut off by some external event or timeout. So at least that would vindicate Google.
Credit to Carl's Whine Rack blog for some tips.
For now I use killall fetchmail; fetchmail -f fetcher.$[$RANDOM % $numaccounts].rc periodically from crontab to cycle reading accounts, each defined individually in fetcher.1.rc, fetcher.2.rc, etc. Acceptable while email events are relatively infrequent.
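For reference, here is roughly what the split configs look like (addresses, passwords and MDA script names are placeholders); with idle in effect, only the account currently holding the connection gets real-time push, and the rest wait for the next cron rotation:

    # fetcher.1.rc
    poll imap.gmail.com proto imap
        user "alerts-1@yourdomain.com" pass "secret1"
        keep nofetchall ssl idle mimedecode limit 29000 no rewrite
        mda "mdaHandlerOne.sh %F %T"

    # fetcher.2.rc
    poll imap.gmail.com proto imap
        user "alerts-2@yourdomain.com" pass "secret2"
        keep nofetchall ssl idle mimedecode limit 29000 no rewrite
        mda "mdaHandlerTwo.sh %F %T"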

Resources