Call API but not for every user - ruby-on-rails

I would like to do something similar to this: Rails way to call external API from view?
But I don't want to call the API for every request from users because that would put a lot of unnecessary load on the API server and deplete my quota too fast.
Is there any way to cache the response from every 100th user and display the cached version to every other user or something of the sort? There's probably something already out there to do this, but I'm very new to Ruby and would appreciate some help.

There are numerous ways to achieve what you are looking for. I would advise against caching the response per every N users, since traffic varies by day and time of day, and some periods will be far busier than others. Instead, ask yourself what the behaviour of the method is: is it pulling complex data, or just a simple count? If real-time information is not important, what is an acceptable timeframe for the information to be cached?
If the above questions can be answered with a time metric rather than "every N users visiting", then you may want to use the built-in Rails.cache, defining the collection method in a helper and calling it from a view:
def method_to_call
  Rails.cache.fetch("some_method", expires_in: 1.hour) do
    SomeThing.to_cache
  end
end
From here you can forecast your access to the API and be certain of your usage over a defined time period, without worrying about which times of day your website is busier, or about unexpected spikes in application usage.
If you want to cache per every N user visits, I would highly recommend Redis. It's a fantastic piece of software that is incredibly fast and scalable. It's a key-value store that can hold the data around unique users and page views.
Another question to ask is whether you are caching per individual user or per individual page view. Based on the answer, you can store a user id or a page-view count and have conditional logic to refresh the cache every N hits (a minimal sketch follows below). Performance should not be too much of an issue if you have the due diligence to clear the store every week or so, depending on the data stored.
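A minimal sketch of that per-N-page-views refresh, using the redis gem; the key names, the threshold of 100 views, and the SomeThing.to_cache call (borrowed from the helper above) are illustrative assumptions only:

require "json"
require "redis"

REDIS = Redis.new

def cached_api_response
  views = REDIS.incr("api_response:page_views")   # count every page view
  # Refresh the stored payload on the first hit and on every 100th view after that
  if views == 1 || views % 100 == 0
    REDIS.set("api_response:payload", SomeThing.to_cache.to_json)
  end
  JSON.parse(REDIS.get("api_response:payload"))
end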
When you get to large scales of caching you might have to think about the infrastructure of hosting a Redis instance. Will you need a dedicated server? Is Docker a viable option for a production Redis? Can you host the Redis instance on the same instance as the application? All of these possible overheads favour the initial approach, but again it is dependent on your needs.

Related

Getting most recent paths visited across sessions in Rails app

I have a simple rails app with no database and no controllers. It uses High Voltage for routing queries, then uses javascript to go get data using the params hash.
A typical URL looks like this:
http://example.com/?id=37ed660aa222e61ebbbc02db
I'd like to grab the ten unique URLs users have most recently visited and pass them to a view. Note that I said users, preferably across concurrent sessions.
Is there a way to retrieve this using ActiveSupport::Notifications or Production.log? Any examples, including where the code should best go, would be greatly appreciated!
I think Redis would be ideally suited to this. It's a NoSQL key-value store, but its support for values that are ordered lists, queues, etc. makes it easy to store unique URLs in a FIFO list as they are visited, limit the size of that list (discarding URLs at the 'old' end), and retrieve the most recent N URLs to pass to your view. The list should stay small enough to fit entirely in memory and be very fast. You might be able to do this with memcached or mongo or another store as well; I think it would be best, though, if the solution kept the stored values in memory.
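A rough sketch of that FIFO list with the redis gem; the ten-entry limit comes from the question, while the recent_urls key name and helper method names are assumptions for illustration:

require "redis"

redis = Redis.new

# Record a visit: keep the list unique, newest-first, and capped at ten entries
def record_visit(redis, url)
  redis.lrem("recent_urls", 0, url)   # drop any existing copy of this url
  redis.lpush("recent_urls", url)     # push it onto the 'new' end of the list
  redis.ltrim("recent_urls", 0, 9)    # discard anything beyond the ten most recent
end

# Fetch the ten most recently visited unique urls to pass to the view
def recent_urls(redis)
  redis.lrange("recent_urls", 0, 9)
end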
If you aren't already using redis (or similar), it might seem like overkill to set it up and maintain just for this feature. But you can make it pay for itself by also using it for caching, background job processing (Resque / Sidekiq), and probably other things in your app.

Limit user to perform an action a certain number of times in a day

I am using Rails 3.1.0 with Devise 2.1.0. I would like to limit the number of times a user can perform an action in a day. The main purpose of this limitation is to prevent spam.
I see many questions similar to this one but was wondering if there is a specific way to accomplish what I am trying to do through Devise.
For the actions that create model instances, the number of times an action has been performed in a day is easy to keep track of. However, at least one action that I would like to restrict does not create a model instance, so I'm not sure what to do about it.
I was also wondering if this is a legitimate/effective way of preventing spam (in addition to requiring users to register and sign in to perform the actions).
Personally, I find these sorts of systems to be over-complications. Unless spam is an existing, provable problem, I'm not sure adding a system that's likely to be rather extensive is a good use of time and energy.
Alternatives to this would be requiring registration through a third-party service (say Facebook) and using either captchas or exciting and new negative captchas.
That said, if you want to do this, I think the best place to keep track of it would be in an ephemeral data store. Redis would be really good for this since you can use queues. In the actions that you want to restrict, add a timestamp to the queue, and before you allow the user to perform said action, check the number of elements in the queue, purging ones that are too old while you do so.
That's sort of pseudo-codey, but it should at least help you get started.
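A rough sketch of that idea, using a Redis sorted set keyed by timestamp rather than a plain queue (which makes purging old entries a one-liner); the key layout and the limit of 10 actions per day are assumptions:

require "redis"
require "securerandom"

REDIS = Redis.new
DAILY_LIMIT = 10  # assumed limit; tune to whatever counts as spam for you

# Returns true (and records the action) if the user is still under the daily limit
def allow_action?(user)
  key = "user:#{user.id}:restricted_actions"
  now = Time.now.to_i

  REDIS.zremrangebyscore(key, 0, now - 86_400)   # purge entries older than 24 hours
  return false if REDIS.zcard(key) >= DAILY_LIMIT

  REDIS.zadd(key, now, SecureRandom.uuid)        # record this action with its timestamp
  REDIS.expire(key, 86_400)                      # let idle keys clean themselves up
  true
end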

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A User model has_many Comments. I'd like to store in a user stats model how many comments each user has. That way, when I need to display the number of comments a user has made, it's only a simple query against the stats model. Every time a comment is created or destroyed, the counter is simply incremented or decremented.
How can I build these stats while the site is live? What I'm concerned about is that after I ask the database to count the number of Comments a User has, but before I can save that count into the stats table, the user might sneak in and add another comment somewhere. That would increment the counter, which would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with ActiveRecord transaction blocks, but as I understand it, those guarantee that all statements succeed or none do, rather than acting as mutex protection for data in the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails; it's called a counter cache. There is a RailsCast on it here: http://railscasts.com/episodes/23-counter-cache-column
Since the episode is quite old it might be out of date, but the general idea is there.
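For reference, a minimal counter cache setup looks roughly like this; comments_count is the column name Rails expects by convention, and the migration and backfill are sketches of what your schema change might look like:

# app/models/comment.rb
class Comment < ActiveRecord::Base
  belongs_to :user, counter_cache: true   # bumps users.comments_count on create/destroy
end

# db/migrate/xxxx_add_comments_count_to_users.rb
class AddCommentsCountToUsers < ActiveRecord::Migration
  def change
    add_column :users, :comments_count, :integer, default: 0, null: false
  end
end

# One-off backfill for existing users (e.g. from the console):
User.find_each { |user| User.reset_counters(user.id, :comments) }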
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really want real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports against the backup. That way it doesn't affect the running application, and your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
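For what it's worth, a pessimistic-locking version of the counter rebuild might look something like the following sketch (it assumes the comments_count column from the counter cache answer above, and that user_id is in scope):

# Rebuild one user's counter while holding a row lock (SELECT ... FOR UPDATE),
# so a comment added mid-rebuild has to wait rather than being silently overwritten.
User.transaction do
  user = User.lock.find(user_id)
  user.update_column(:comments_count, user.comments.count)
end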

Twitter app development best practices?

Let's imagine an app which is not just another way to post tweets, but something like an aggregator that needs to store / have access to tweets posted through it.
Since Twitter added a limit on API calls, the app should/may use some cache, and it should then periodically check whether a tweet has been deleted, etc.
How do you manage the limits? How do you think well-trafficked apps survive without being whitelisted?
To name a few.
Aggressive caching. Don't call out to the API unless you have to.
I generally pull down as much data as I can upfront and store it somewhere. Then I operate off the local store until it runs out and needs to be refreshed.
Avoid doing things in real time. Queue up requests and make them on a timer.
If you're on Linux, cron jobs are the easiest way to do this (see the sketch after this list).
Combine requests as much as possible.
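A minimal sketch of the cache-upfront, refresh-on-a-timer approach in a Rails app; TwitterClient.fetch_timeline is a hypothetical stand-in for whatever API client you actually use, and the task name, cache key, and cron schedule are assumptions:

# lib/tasks/tweets.rake
namespace :tweets do
  desc "Refresh the locally cached tweets instead of hitting the API per request"
  task refresh: :environment do
    tweets = TwitterClient.fetch_timeline          # hypothetical wrapper around your API client
    Rails.cache.write("cached_tweets", tweets, expires_in: 2.hours)
  end
end

# crontab entry: refresh once an hour rather than on every page view
# 0 * * * * cd /path/to/app && bundle exec rake tweets:refresh

# In the app itself, read from the cache rather than calling Twitter:
# tweets = Rails.cache.read("cached_tweets") || []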
Well, you have 100 requests per hour, so the question is how you balance them between the various types of requests. I think the best option is the way TweetDeck does it, which lets you set a percentage for each type of request and saves the rest of the quota for posting (because that is important too):
[Screenshot of TweetDeck's API allocation settings (source: livefilestore.com)]
For the caching, a database would be good, and I would ignore deleted tweets: once you have downloaded a tweet, it doesn't matter whether it was later deleted. If you wanted to, you could in theory just try to open the tweet's page, and if you get a 404 then it's been deleted. That means no cost against the API.

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request takes ~15-20 seconds to complete and a couple of seconds to process. Once there, the user can manipulate values on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call once (the 15-second hit) and then want to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 seconds or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle recalculation of this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only one user will be working w/ this data at any given moment?
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event, which will meet my failsafe requirements. Cons: Slowest performance of my current options. The Session End event is sometimes tricky to ensure it fires properly.
Caching - Pros: Good performance. Has access to dependencies, which could be a bonus later down the line but is not really useful in the current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best performance. Easiest to maintain, as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on which option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across page loads. You can name the cache you want to store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change, and then sweep that file at intervals to save the changes back to the DB. If you name the files based on the user/account or some other session-unique indicator, then there's no issue with conflicts, and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the writes more: save changes to the Session, then write those to a file at intervals, then sweep the file at larger intervals. You can tune it for performance and decide what level of possible user-change loss is acceptable.
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You cannot rely on the session simply because it is (typically) tied to the user's browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic, but I'd recommend putting those long processing calls in asynchronous pages (not to be confused with AJAX's asynchronous).
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned is to store the data on the client.
Assuming the dataset is not too large and the code that manipulates it can run client side, you could store the data as an XML data island or a JSON object. This data could then be manipulated / processed entirely client side, with no round trips to the server. If you need to persist the data back to the server, the resulting data could be posted via an AJAX call or a standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.
