Refresh data with API every X minutes - ruby-on-rails

Ruby on Rails 4.1.4
I made an interface to a Twitch gem, to fetch information of the current stream, mainly whether it is online or not, but also stuff like the current title and game being played.
Since the website has a lot of traffic, I can't make a request every time a user walks in, so instead I need to cache this information.
Cached information is stored as a class variable @@stream_data inside the class Twitcher.
I've made a rake task to update this using cron jobs, calling Twitcher.refresh_stream, but naturally that does not run within my active process (the one every visitor connects to) but in a separate process. So the @@stream_data in the actual app is always empty.
Is there a way to run code, within my currently running rails app, every X minutes? Or a better approach, for that matter.
Thank you for your time!

This sounds like a good case for caching:
Rails.cache.fetch("stream_data", expires_in: 5.minutes) do
  fetch_new_data
end
If the data is in the cache and has not expired, it is returned without executing the block; otherwise the block is run and its result is used to populate the cache.
The default cache store just keeps things in memory so doesn't fix your problem: you'll need to pick a cache store that is shared across your processes. Both redis and memcached (via the dalli gem) are popular choices.
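To make the fetch behaviour above concrete, here is a toy sketch in plain Ruby. TinyCache and its clock parameter are made-up names for illustration, not part of Rails, and it lives in one process's memory, so it only shows the "run the block when missing or expired" logic, not the cross-process sharing that redis or memcached would give you.

```ruby
# Toy in-memory cache that mimics the behaviour of Rails.cache.fetch with
# expires_in. TinyCache is a hypothetical name, not part of Rails.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock: -> { Time.now })
    @clock = clock      # injectable so the example can fake the passage of time
    @entries = {}
  end

  # Returns the cached value for key while it is fresh; otherwise runs the
  # block, stores its result for expires_in seconds, and returns it.
  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @entries[key] = Entry.new(value, @clock.call + expires_in)
    value
  end
end

calls = 0
now = Time.at(0)
cache = TinyCache.new(clock: -> { now })

first  = cache.fetch("stream_data", expires_in: 300) { calls += 1; "online" }
second = cache.fetch("stream_data", expires_in: 300) { calls += 1; "offline" }
now += 301 # pretend a bit more than five minutes have passed
third  = cache.fetch("stream_data", expires_in: 300) { calls += 1; "offline" }
```

Here the second call returns the cached "online" without running its block, and only the third call, made after the entry expired, runs the block again.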

Check out Whenever (basically a ruby interface to cron) to invoke something on a regular schedule.

I actually had a similar problem with using google analytics. Google analytics requires that you have an API key for each request. However the api key would expire every hour. If you requested a new api key for every google analytics request, it'd be very slow per request.
So what I did was make another class variable @@expires_at. Now in every method that made a request to google analytics, I would check @@expires_at.past?. If it was true, then I would refresh the api key and set @@expires_at = 45.minutes.from_now.
You can do something like this.
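@@expires_at = nil # set once in the class body so the first check does not raise
def method_that_needs_stream_data
  # refresh on first use (nil) or once the data has expired
  renew_data if @@expires_at.nil? || @@expires_at.past?
  # use @@stream_data
end
def renew_data
  # renew @@stream_data here
  @@expires_at = 5.minutes.from_now
end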
Tell me how it goes.

Related

Caching an HTTP request made from a Rails API (google-id-token)?

ok, first time making an API!
My assumption is that if data needs to be stored on the back end such that it persists across multiple API calls, it needs to be 1) in cache or 2) in a Database. is that right?
I was looking at the code for the gem "google-id-token". it seems to do just what i need for my google login application. My front end app will send the google tokens to the API with requests.
the gem appears to cache the public (PEM) certificates from Google (for an hour by default) and then uses them to validate the Google JWT you provide.
but when i look at the code (https://github.com/google/google-id-token/blob/master/lib/google-id-token.rb) it just seems to fetch the google certificates and put them into an instance variable.
am i right in thinking that the next time someone calls the API, it will have no memory of that stored data and just fetch it again?
i guess its a 2 part question:
if i put something in an @instance_variable in my API, will that data exist when the next API call comes in?
if not, is there any way that "google-id-token" is caching its data correctly? maybe HTTP requests are somehow cached on the backend and therefore the network request doesnt actually happen over and over? can i test this?
my impulse is to write "google-id-token" functionality in a way that caches the google certs using MemCachier. but since i dont know what I'm doing i thought i would ask. Maybe the gem works fine as is, i dont know how to test it.
Not sure about google-id-token, but Rails instance variables are not available beyond single requests and views (and definitely not from one user's session to another).
You can low-level cache anything you want with Rails.cache.fetch. It takes a key name, an expiration, and a block that computes the value. So it looks like this:
Rails.cache.fetch("google-id-token", expires_in: 24.hours) do
  @instance_variable = something
end
If the cache exists and it is not past its expiration date/time, Rails grabs it from the cache; otherwise, it would make your API request.
It's important to note that low-level caching does nothing under :null_store, which development uses until caching is switched on, and that :memory_store is private to each process, so you need a solution with redis or memcached or something like that if several processes have to share the cache, in development too. Also, make sure the file tmp/caching-dev.txt exists. You can run rails dev:cache or just touch it to create it.
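To see why an instance variable set during one request is gone on the next, here is a small plain-Ruby sketch. CertsController and download_certs are hypothetical stand-ins (no Rails, no real HTTP call); the only assumption is that the framework builds a fresh object per request, which is how Rails controllers behave.

```ruby
# Hypothetical stand-in for a controller: the framework builds a new
# instance for every request, so @certs starts out nil each time.
class CertsController
  @@fetch_count = 0            # class-level state survives within one process

  def self.fetch_count
    @@fetch_count
  end

  def show
    @certs ||= download_certs  # memoizes only for the lifetime of this object
  end

  private

  # Placeholder for the real HTTP call to Google.
  def download_certs
    @@fetch_count += 1
    { "kid-1" => "PEM data" }
  end
end

3.times { CertsController.new.show }   # three "requests", three objects
CertsController.fetch_count            # 3: nothing was reused

one_request = CertsController.new
2.times { one_request.show }
CertsController.fetch_count            # 4: reuse only inside one object
```

An object that is kept alive between requests (for example in a constant) would keep its instance variables, but only inside that one server process, which is why a shared cache store is the safer bet once there are several workers.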
More on Rails caching

How can I create a lock for concurrency across different requests (on a process-based webserver)

I have a rails app that people can send data to in the query params of a url. The rails app then validates the correctness of the data and creates a json response listing any detected errors. The validation itself is done by checking the data against a set of rules that live in a github repo.
Ideally I'd like to update my local copy of this repo once a day. In order to prevent complications I'd like any requests that come in while this update takes place to back off for a few seconds.
What's the best way to communicate to the incoming requests that an update is currently occurring? I'm using a process based webserver (unicorn), so memory mutexes don't seem like the right answer :(.
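One option that works across processes on the same host is an advisory file lock via File#flock. The following is only a sketch under that assumption: the lock file path and the helper names with_update_lock and with_read_lock are made up for illustration. The updater takes an exclusive lock; each request tries for a shared lock without blocking and learns immediately whether an update is running.

```ruby
require "tmpdir"

# Run by the daily updater: takes an exclusive lock, blocking until all
# readers have released theirs.
def with_update_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX)
    yield
  end # closing the file releases the lock
end

# Run by each request: tries to take a shared lock without blocking.
# Returns nil if an update is in progress so the caller can back off.
def with_read_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
    return nil unless f.flock(File::LOCK_SH | File::LOCK_NB)
    yield
  end
end

# Hypothetical lock file; any path that all unicorn workers can reach works.
lock_path = File.join(Dir.tmpdir, "rules_repo.lock")

during_update = nil
with_update_lock(lock_path) do
  # ... pull the repo here ...
  during_update = with_read_lock(lock_path) { :validated }
end
after_update = with_read_lock(lock_path) { :validated }
```

A request that gets nil back could sleep for a moment and retry, or return a "try again shortly" response. Note that flock is advisory and local to one machine, so it will not coordinate workers spread over several hosts.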

Sending data from an analytics engine to a Rails server

I have an analytics engine which periodically packages a bunch of stats in JSON format. I want to send these packages to a Rails server. Upon a package arriving, the Rails server should examine it, generate a model instance out of it (for historical purposes), and then display the contents to the user. I've thought of two approaches.
1) Have a little app residing on the same host as the Rails server to be listening for these packages (using ZeroMQ). Upon receiving a package, the app would invoke a Rails action through CURL, passing on the package as a parameter. My concern with this approach is that my Rails server checks that only signed-in users can access actions which affect models. By creating an action accessible to this listening app (and therefore other entities), am I exposing myself to a major security flaw?
2) The second approach is to simply have the listening app dump the package into a special database table. The Rails server will then periodically check this table for new packages. Upon detecting one or more, it will process them and remove them from the table.
This is the first time I'm doing something like this, so if you have techniques or experiences you can share for better solutions, I'd love to learn.
Thank you.
You can restrict access to a certain call by limiting the IP address that is allowed for the request in routes.rb:
post "/analytics" => "analytics#create", :constraints => {:ip => /127\.0\.0\.1/}
If you want the users to see updates, you can use polling to refresh the page every minute or so.
1) Yes, you are exposing a major security hole unless one of the following holds:
Your zeroMQ app provides the data needed to do authentication and authorization on the rails side
Your rails app is configured to listen only on the 127.0.0.1 interface and is thus not accessible from the outside
Like Benjamin suggests, you restrict specific routes to certain IPs
2) This approach looks a lot like what delayed_job does. You might wanna take a look there: https://github.com/collectiveidea/delayed_job and use a rake task to add a new job.
In short, your listening app will call a rake task that will add a custom delayed_job when receiving a packet. Then let delayed_job handle the load. You benefit from delayed_job goodness (different queues, scaling, ...). The hard part is getting the result.
One idea would be to associate a unique ID with each job, and have the delayed_job task output the result to a data store which associates the job ID with the result. This data store can be a simple relational table
+----+--------+
| ID | Result |
+----+--------+
or a memcache/redis/whatever instance. You just need to poll that data store looking for the result associated with the job ID. And delete everything when you are done displaying that to the user.
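The job-ID idea could be sketched roughly like this in plain Ruby. ResultStore is a hypothetical in-memory stand-in for the table or redis instance, the thread stands in for the delayed_job worker, and the payload is dummy data.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the results table (or redis/memcached).
class ResultStore
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  def write(job_id, result)
    @lock.synchronize { @results[job_id] = result }
  end

  # Returns and deletes the result, or nil if the job has not finished yet.
  def take(job_id)
    @lock.synchronize { @results.delete(job_id) }
  end
end

store = ResultStore.new
job_id = SecureRandom.uuid

# Stand-in for the background worker processing one package (dummy payload).
worker = Thread.new do
  store.write(job_id, { "visits" => 42 })
end

# Poll until the result shows up; taking it also removes it from the store.
result = nil
until (result = store.take(job_id))
  sleep 0.01
end
worker.join
leftover = store.take(job_id) # nil, the entry was deleted when it was read
```

With a real table the take step would be a select followed by a delete for that job ID, ideally in one transaction.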
3) Why don't you directly POST the data to the rails server ?
Following Benjamin's lead, I implemented a filter for this particular action.
def verify_ip
  @ips = ['127.0.0.1']
  unless @ips.include?(request.remote_ip)
    redirect_to root_url
  end
end
The listening app on the localhost now invokes the action, passing the JSON package received from the analytics engine as a param. Thank you.

Twitter Rate Limits and Cron caching with rails

I have a small rails app on Heroku that pulls in my client's latest Tweet to display on all pages. It is hitting Twitter rate limits already. I'm trying to come up with a solution. Would the following be a sensible approach ...
Use a cron gem like Whenever to pull down the latest Tweet every minute and write it to a file, then have pages pull the Tweet from that file instead of directly from Twitter.
Yes, this is one possibility. Or you could use caching to store the tweets, for example using Memcached. This will also make your app faster.
I'm not familiar with the specific rate limits on twitter, but if they're expressed in requests/minute then the cron job might work. Whatever you do, you need to stop letting incoming traffic drive your requests. Typically you'd create a queue and have a single worker pull requests off of it. That worker would take care of rate limiting itself so you don't go over.
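A minimal sketch of the "single worker pulls from a queue and throttles itself" idea, in plain Ruby. ThrottledWorker, min_interval and the handler block are illustrative names, and the handler is a stand-in for the real Twitter call; a production setup would more likely use a background job library.

```ruby
# Single worker that drains a queue and spaces out calls so that traffic
# spikes never translate into bursts against the remote API.
class ThrottledWorker
  def initialize(min_interval:, &handler)
    @min_interval = min_interval   # minimum seconds between two calls
    @handler = handler             # stands in for the real API call
    @queue = Queue.new
    @thread = Thread.new { run }
  end

  def enqueue(job)
    @queue << job
  end

  # Tells the worker to stop after the queued jobs and waits for it.
  def shutdown
    @queue << :stop
    @thread.join
  end

  private

  def run
    last_call = nil
    while (job = @queue.pop) != :stop
      if last_call
        wait = @min_interval - (monotonic_now - last_call)
        sleep(wait) if wait > 0
      end
      last_call = monotonic_now
      @handler.call(job)
    end
  end

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

timestamps = []
worker = ThrottledWorker.new(min_interval: 0.05) do |_job|
  timestamps << Process.clock_gettime(Process::CLOCK_MONOTONIC)
end
3.times { |i| worker.enqueue(i) }
worker.shutdown
gaps = timestamps.each_cons(2).map { |a, b| b - a }
```

Incoming requests would only ever enqueue work (or read the cached tweet), so however many visitors arrive, the remote API sees at most one call per interval.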
API rate limits are a necessary evil. Maybe you can make a gem to help other folks easily throttle themselves.
I ended up using memcache to cache the requests:
latest_tweet = Rails.cache.read "latest_tweet"
if !latest_tweet
  latest_tweet = Twitter.user_timeline("sometwitterusername").first.text
  Rails.cache.write("latest_tweet", latest_tweet, :expires_in => 5.minutes)
end

Suggestions for how to write a service in Rails 3

I am building an application which will send status requests to users (via email & sms) on a regular basis. I want to execute the service each hour which will:
Query the database for all requests that need to be sent (based on some logic)
Send the requests through Amazon's Simple Email Service (this is already working)
Write a record of the status request notification back to the data store
I am considering wrapping up this series of operations into a single controller with an end point that can be called remotely to kick off the process within the rails app.
Longer term, I will break this process out into an app that can be run independently of my rails app, but for now I'm just trying to keep it simple.
My first inclination is to build the following:
Controller with the following elements:
A method which will orchestrate the steps outlined above (and can be called externally)
A call to the status_request model which will bring back a collection of requests needing to be sent
A loop to iterate through the pending requests, which will:
Make a call to my AWS Simple Email Service module to actually send the email, and
Make a call to the status_request model to log the request back to the database
Model:
A method on my status_request model which will bring back a collection of requests that need to be sent
A method in my status_request model which will log that a notification was sent
Since this will behave as a service that gets called periodically from an outside scheduler I don't think I'll need a view for this operation. (Will, of course, need views to show users and admins what requests have been sent, but that's later...).
As someone new to Rails, I'm asking for review of this approach and any suggestions you may have.
Thanks!
Instead of a controller, which as Jeff pointed out exposes a security risk, you may just want to expose a rake task and use cron to invoke it on an hourly basis.
If you are still interested in building a controller, look at devise gem and its single access token, token_authenticatable, for securing the methods you are exposing.
You may also want to look at delayed_job or resque to offload the call to status_request and the loop to AWS simple service to a background worker process.
You may want a separate controller and view for the log file so you can review progress on demand.
And if you want to get real fancy use Amazon SNS to send you alerts when the service reaches some unacceptable level of failures, backlog, etc.
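Whichever entry point you pick (rake task, controller, or background worker), it can help to keep the orchestration in one plain Ruby object that all of them can call. A rough sketch, with every name here (StatusRequestNotifier, pending, deliver, record) being hypothetical and the model and mailer replaced by stubs so it runs without Rails or AWS:

```ruby
# Hypothetical service object: the model and mailer are injected, so this
# sketch runs without Rails or AWS. All names here are illustrative.
class StatusRequestNotifier
  def initialize(requests:, mailer:, log:)
    @requests = requests  # responds to #pending
    @mailer = mailer      # responds to #deliver(request)
    @log = log            # responds to #record(request, sent_at)
  end

  # Sends every pending request, logs each one, and returns how many were sent.
  def call(now: Time.now)
    @requests.pending.count do |request|
      @mailer.deliver(request)
      @log.record(request, now)
      true
    end
  end
end

# Stubs standing in for the status_request model and the SES module.
requests = Struct.new(:pending).new(["alice@example.com", "bob@example.com"])
sent = []
mailer = Object.new
mailer.define_singleton_method(:deliver) { |request| sent << request }
logged = []
log = Object.new
log.define_singleton_method(:record) { |request, sent_at| logged << [request, sent_at] }

notifier = StatusRequestNotifier.new(requests: requests, mailer: mailer, log: log)
count = notifier.call(now: Time.at(0))
```

The rake task body, a controller action, or a delayed_job/resque job would then each be a one-line call to this object, which also makes it easy to move out of the web app later.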
Since you are trying to invoke this from an outside process, your approach should work. You could also have a worker process that processes tasks when they are there.
You will need routes to expose your service, and you may want to also make security decisions. How will the service that invokes your application authenticate so all others can't hit it at will?
Another consideration should be how many emails you are sending. If there are enough of them, keep in mind that running this sort of loop inside the web application is going to be heavy and may affect users on the current system.
In the end, there are many ways to do this. I would focus on the performance/usage you expect as well as security. There's never one perfect way to solve a problem like this, and your way should just be aware of the variables it will need to be operating within.
Resque and Redis might be helpful to you in scheduling and performing operations. They are simple and superfast; [here](http://railscasts.com/episodes/271-resque) is a simple tutorial on the same.

Resources