Fastest way to track API access - ruby-on-rails

I'm creating an API for my Rails app, and I want to track how many times a user calls a particular API method and cap them at, say, 1,000 requests per day. I'm expecting very high request volumes across multiple users.
Do you have a suggestion as to how I can keep track of something like that per user? I want to avoid having to write to the database repeatedly and deal with locks.
I'm okay with doing a delayed write (the API limits don't have to be super exact), but is there a standard way of doing this?

You could try Apigee. It looks like it's "free up to 10,000 messages per hour".
Disclaimer: I have never used Apigee.

It really depends on the # of servers, the dataset, the # of users, etc.
One approach would be to maintain a quota data structure in memory on the server and update it per invocation. If you have multiple servers you could keep the quota in memcache. Obviously, a memory-based implementation wouldn't survive a reboot or restart, so some sort of serialization would be required to support that.
If quota accuracy is critical, it's probably best to just do it in the DB. You could do it in a file, but then you face the same issues you're trying to avoid with the database.
EDIT:
You could also do a mixed approach -- maintain an in-memory cache of user/API/invocation counts and periodically write them to the database (sketched below).
A bit more info on the requirements would help pare down the options.
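For illustration, a rough sketch of that mixed approach. The ApiQuota / ApiCallCount names and table layout are made up for the example, and it assumes a single app process; with multiple servers you'd swap the in-process hash for memcached or Redis.

class ApiQuota
  @counts = Hash.new(0)
  @mutex  = Mutex.new

  # bump the in-memory counter on every API call
  def self.record(user_id, api_method)
    @mutex.synchronize { @counts[[user_id, api_method]] += 1 }
  end

  # run this from a periodic job (cron, clockwork, etc.) to flush counts to the DB
  def self.flush!
    pending = nil
    @mutex.synchronize do
      pending = @counts
      @counts = Hash.new(0)
    end
    pending.each do |(user_id, api_method), count|
      row = ApiCallCount.where(user_id: user_id,
                               api_method: api_method,
                               day: Date.today).first_or_create(count: 0)
      row.increment!(:count, count)
    end
  end
end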

Here's a way of doing it using the Rails cache:
# build an hourly counter key for this API key
call_count_key = "api_calls_#{params[:api_key]}_#{Time.now.strftime('%Y-%m-%d-%Hh')}"
call_count = Rails.cache.read(call_count_key) || 0
call_count += 1
# expire the key so old hourly counters don't pile up in the cache
Rails.cache.write(call_count_key, call_count, expires_in: 2.hours)
# here is our limit (per hour in this example)
raise "too many calls" if call_count > 100
This isn't a perfect solution: the read-increment-write sequence isn't atomic, so it doesn't handle concurrency properly, and if you're using the in-memory cache (Rails' default) the counter will be per process.
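If you do point Rails.cache at a shared store like memcached or Redis, you can sidestep the race by using the cache's atomic increment instead of read/write. A rough sketch (note the raw: true write, which memcached needs for increment to work):

call_count_key = "api_calls_#{params[:api_key]}_#{Time.now.strftime('%Y-%m-%d-%Hh')}"

# initialise the counter if it isn't there yet (tiny race on the first hit only),
# then increment atomically on the cache server
unless Rails.cache.exist?(call_count_key)
  Rails.cache.write(call_count_key, 0, raw: true, expires_in: 2.hours)
end
call_count = Rails.cache.increment(call_count_key)

raise "too many calls" if call_count > 100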

If you're okay with a hosted solution, take a look at my company, WebServius ( http://www.webservius.com ), which does API management (issuing keys, enforcing quotas, etc.). We also have billing support, so you will be able to set per-call prices, etc.

Related

Max number of queues

Is there a limit on the number of queues you can create with AWS SQS?
This page https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-limits.html doesn't say so one way or the other.
We're not looking to create thousands of the things, but we might dynamically create a good few dozen for a while and then destroy them. I've come across unexpected limits with AWS before (only 4 transcoding pipelines - why?), so I need to be sure about this.
Thanks for any advice.
AB
Indeed, there is no information about that in the AWS documentation.
I don't think there is a limit on the number of queues.
We are currently running 28 full-time queues on our infrastructure without any problem.
If you ever do hit a limit, a simple AWS support ticket can get it raised, just like the EC2 instance limit increase process.
Hope it helps.
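For what it's worth, dynamically creating and tearing down queues is straightforward with the aws-sdk-sqs gem. A minimal sketch (queue names are made up; note that SQS makes you wait about 60 seconds before recreating a queue with the same name you just deleted):

require 'aws-sdk-sqs'

sqs = Aws::SQS::Client.new(region: 'us-east-1')

# spin up a few dozen queues on demand
queue_urls = (1..30).map do |i|
  sqs.create_queue(queue_name: "batch-worker-#{i}").queue_url
end

# ... do the work ...

# and tear them down again when finished
queue_urls.each { |url| sqs.delete_queue(queue_url: url) }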

Respond with large amount of objects through a Rails API

I currently have an API for one of my projects and a service that is responsible for generating export files as CSVs, archiving them, and storing them somewhere in the cloud.
Since my API is written in Rails and my service in plain Ruby, I use the Her gem in the service to interact with the API. But I find my current implementation underperforming, since I do a Model.all in my service, which in turn triggers a request whose response may contain far too many objects.
I am curious on how to improve this whole task. Here's what I've thought of:
implement pagination at API level and call Model.where(page: xxx) from my service;
generate the actual CSV at API level and send the CSV back to the service (this may be done sync or async).
If I were to use the first approach, how many objects should I retrieve per page? How big should a response be?
If I were to use the second approach, this would bring quite an overhead to the request (and I guess API requests shouldn't take that long) and I also wonder whether it's really the API's job to do this.
What approach should I follow? Or, is there something better that I'm missing?
You need to pass a lot of information through a Ruby process, and that's never simple; I don't think you're missing anything here.
If you decide to generate the CSVs at the API level, then what do you gain by maintaining the service? You could ditch the service altogether, since replacing it with an nginx proxy would do the same thing better (if you're just streaming the response from the API host).
If you decide to paginate, there will be a performance cost for sure, but nobody can tell you exactly how much to paginate: bigger pages will be faster and consume more memory (reducing throughput by letting you run fewer workers), while smaller pages will be slower and consume less memory but demand more workers because of IO wait times. The exact numbers will depend on the IO response times of your API app, the cloud, and your infrastructure. I'm afraid no one can give you a simple answer you can follow without a stress test, and once you set up a stress test you will get a number of your own anyway, which beats anybody's estimate.
A suggestion: write a bit more about your problem, the constraints you are working under, etc., and maybe someone can help you with a more radical solution. For some reason I get the feeling that what you're really looking for is a background processor like Sidekiq or Delayed Job, or maybe connecting your service directly to the DB through a DB view if you are anxious to decouple your apps, or an nginx proxy for API responses, or nothing at all... but I really can't tell without more information.
I think it really depends on how you want to define 'performance' and what the goal of your API is. If you want to make sure no request to your API takes longer than 20 msec to respond, then adding pagination would be a reasonable approach, especially if the CSV generation is just an edge case and the API is really built for other services. The number of items per page would then be limited by the speed at which you can deliver them. Your service would not be particularly more performant (quite the opposite), since it needs to call the API multiple times.
Creating an async call (maybe with a webhook as a callback) would be worth adding to your API if you think it is a valid use case for services to dump the whole record set.
Having said that, I think strictly speaking it is the job of the API to be quick and responsive. So maybe try to figure out how caching can improve response times, so that paging through all the records is reasonable. On the other hand, it is the job of the service to be mindful of the number of calls to the API, so maybe store old records locally and only poll for updates instead of dumping the whole set of records each time.
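To make option 1 concrete, a rough sketch of the paginated export loop in the service, assuming the API accepts page/per_page parameters and returns an empty collection past the last page; the page size is exactly the number you'd tune with a stress test, and the exact call for forcing the Her relation to fetch may differ in your setup.

require 'csv'

PER_PAGE = 500 # hypothetical starting point; measure and adjust

CSV.open('export.csv', 'w') do |csv|
  csv << %w[id name created_at]   # whatever columns you export
  page = 1
  loop do
    # .to_a forces the Her relation to fetch; adjust to however your client maps collections
    batch = Model.where(page: page, per_page: PER_PAGE).to_a
    break if batch.empty?
    batch.each { |record| csv << [record.id, record.name, record.created_at] }
    page += 1
  end
end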

Ruby on Rails performance on lots of requests and DB updates per second

I'm developing a polling application that will deal with an average of 1000-2000 votes per second coming from different users. In other words, it'll receive 1k to 2k requests per second with each request making a DB insert into the table that stores the voting data.
I'm using RoR 4 with MySQL and planning to push it to Heroku or AWS.
What performance issues related to database and the application itself should I be aware of?
How can I address this amount of inserts per second into the database?
EDIT
I was thinking of not inserting into the DB for each request, but instead writing the insert data to an in-memory store. I would then have a scheduled job running every second that reads from this store and generates a bulk insert, so the rows aren't inserted one at a time. But I can't think of a nice way to implement this.
While you can certainly do what you need to do in AWS, that high level of I/O will probably cost you. RDS can support up to 30,000 IOPS; you can also use multiple EBS volumes in different configurations to support high IO if you want to run the database yourself.
Depending on your planned usage patterns, I would probably look at pushing into an in-memory data store, something like memcached or redis, and then processing the requests from there. You could also look at DynamoDB, which might work depending on how your data is structured.
Are you going to have that level of sustained throughput consistently, or will it be in bursts? Do you absolutely have to preserve every single vote, or do you just need summary data? How much will you need to scale - i.e. will you ever get to 20,000 votes per second? 200,000?
These types of questions will help determine the proper architecture.
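On the EDIT part of the question, a rough sketch of that buffer-and-flush idea using Redis as the in-memory store. The votes table and field names are made up; on Rails 4 the hand-built SQL could also be replaced with the activerecord-import gem.

require 'redis'
require 'json'

REDIS = Redis.new

# In the controller: push the vote onto a Redis list instead of hitting MySQL.
def record_vote(poll_id, option_id)
  REDIS.rpush('pending_votes',
              { poll_id: poll_id, option_id: option_id,
                created_at: Time.now.utc.strftime('%Y-%m-%d %H:%M:%S') }.to_json)
end

# In a job scheduled every second or so: drain the list and issue one
# multi-row INSERT instead of a thousand single-row inserts.
def flush_votes(batch_size = 5000)
  rows = []
  batch_size.times do
    raw = REDIS.lpop('pending_votes') or break
    rows << JSON.parse(raw)
  end
  return if rows.empty?

  values = rows.map do |r|
    "(#{r['poll_id'].to_i}, #{r['option_id'].to_i}, '#{r['created_at']}')"
  end.join(', ')

  ActiveRecord::Base.connection.execute(
    "INSERT INTO votes (poll_id, option_id, created_at) VALUES #{values}"
  )
end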

Storing sessions in db in rails

I've recently encountered a CookieOverflow exception in my Rails application. I've googled a bit and found this answer to be the most helpful:
https://stackoverflow.com/a/9474262/169277
After implementing database-backed sessions, I'm trying to figure out the drawbacks of this approach. So far I see around 1,200 entries in the sessions table, accumulated in only a few hours.
When does the actual interaction with the database occur? Only when writing data to the session?
The table grows rather fast, so is there a way to purge old, unused sessions from the DB other than some daily cron job?
I'm just looking for some additional information regarding this approach; right now I'm deciding whether to keep it or change the logic of my app.
More than 4KB in a cookie is a lot, so changing your app is probably not a bad idea to consider.
That said, 1,200 entries in a few hours doesn't seem outlandish. If you're worried about it growing unbounded, you can use memcache or redis as a caching layer to store your sessions instead of your database. That would free you from worrying about growth in your database. The downside is that evictions probably mean you're logging people out.
All that said, we have a number of daily cron-like jobs that clean out our database tables, not for sessions, but it's similar. They run at night when utilization is low anyway.
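On the purge question, a cron-triggered cleanup really is the usual answer. A minimal sketch, assuming the ActiveRecord session store setup from the linked answer (adjust the model and column names to your schema):

# delete sessions that haven't been touched in a couple of weeks
class SessionCleaner
  def self.purge!(older_than = 2.weeks.ago)
    ActiveRecord::SessionStore::Session.where('updated_at < ?', older_than).delete_all
  end
end

# e.g. wired up as a nightly rake task in lib/tasks/sessions.rake:
#   task cleanup_sessions: :environment do
#     SessionCleaner.purge!
#   end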

Number of Average Concurrent and Peak Concurrent Users for a Web Application

I have a RoR 2.1 web application up and running on the Mongrel server, and now I want to calculate the average number of concurrent users and the peak number of concurrent users for the web application.
Is there an explicit way to figure this out, or what analytics should I use to calculate this?
The back end of my application is MySQL, and I tried looking at the values of Threads_connected and Threads_created in the MySQL status. These values return the number of currently open connections and the number of threads created to handle connections.
Do these values directly imply the number of currently connected users? If not, please suggest ways of calculating this.
The MySQL stats reflect the number of processes connected to the database, so if you had 10 Mongrels up you would probably see 10 there (plus whatever other processes you had connected to the DB: scripts, daemons, console sessions, etc.). This wouldn't change whether there were 100 users on the site or none (unless you have something scaling the number of processes).
In terms of the number of users, Google Analytics can give a good idea of this sort of thing, or you can analyse your own log files, depending on the level of sophistication you require.
Services like newrelic or union central are good if the end goal of this is to figure out what server resources you need.
You can only have as many concurrent users (in the sense of simultaneous in-flight requests) as you have Mongrel processes, as Rails is single-threaded unless you enable threadsafe mode when configuring your app (but you should never do this unless you really understand the implications of doing so).
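If you go the log-analysis route, a rough sketch of estimating request rates from the Rails log (the regexp assumes the old "Processing ... at YYYY-MM-DD HH:MM:SS" log lines; adjust it to whatever your production.log actually contains):

# count requests per minute in production.log and report the peak and average
counts = Hash.new(0)

File.foreach('log/production.log') do |line|
  # e.g. "Processing FooController#bar (for 1.2.3.4 at 2013-05-01 10:15:42) [GET]"
  counts[$1] += 1 if line =~ /at (\d{4}-\d{2}-\d{2} \d{2}:\d{2})/
end

unless counts.empty?
  peak    = counts.values.max
  average = counts.values.inject(:+) / counts.size.to_f
  puts "peak: #{peak} requests/minute, average: #{average.round(1)} requests/minute"
end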
