We are using reverse-geocoding in a rails webservice, and have run into quota problems when using the Google reverse geocoder through geokit. We are also implementing the simple-geo service, and I want to be able to track how many requests per minute/hour we are making.
Any suggestions for tracking our reverse-geocoding calls?
Our code will look something like the following. Would you do any of these?
Add a custom logger and process in the background daily
Use a super-fantastic gem that I don't know about that does quotas and rating easily
Insert a record into the database for each call and do queries there.
Note: I don't need the data in real time; I just want to be able to know, on an hourly basis, what our usual and max requests per hour are (and total monthly requests).
def use_simplegeo(lat, lng)
  SimpleGeo::Client.set_credentials(SIMPLE_GEO_OAUTHTOKEN, SIMPLE_GEO_OAUTHSECRET)
  # maybe do logging/tracking here?
  nearby_address = SimpleGeo::Client.get_nearby_address(lat, lng)

  located_location = LocatedLocation.new
  located_location.city = nearby_address[:place_name]
  located_location.county = nearby_address[:county_name]
  located_location.state = nearby_address[:state_code]
  located_location.country = nearby_address[:country]
  return located_location
end
Thanks!
The first part here doesn't answer the question you are asking, but it may be helpful if you haven't considered it before.
Have you looked at not doing your reverse geocoding on your server (i.e. through Geokit) but instead having it done by the client? In other words, some JavaScript loaded into the user's browser would make the Google geocoder API calls on behalf of your service.
If your application can support this approach, then it has a number of advantages:
You get around the quota problem because your distributed users each have their own daily quota and don't consume yours
You don't expend server resources of your own doing this
If you still would like to log your geocoder queries and you are concerned about the performance impact to your primary application database then you might consider one of the following options:
Just create a separate database (or databases) for logging (which is write-intensive) and do it synchronously. It could be relational, but perhaps MongoDB or Redis would work as well.
Log to the file system (with a custom logger) and then cron these logs in batches into structured, queryable storage later; the storage could be external, such as Amazon S3, if that works better (see the sketch after this list).
Just write a record into SimpleGeo each time you do a geocode, and add custom metadata to those records to tie them back to your own model(s).
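For the file-system option, a minimal sketch could look like the following (the log path, line format, and aggregation command are just assumptions to illustrate the idea):

require 'logger'
require 'time'

# Hypothetical dedicated log for reverse-geocode calls, kept separate from the Rails log.
GEOCODE_LOGGER = Logger.new(Rails.root.join('log', 'reverse_geocode.log').to_s)

def use_simplegeo(lat, lng)
  SimpleGeo::Client.set_credentials(SIMPLE_GEO_OAUTHTOKEN, SIMPLE_GEO_OAUTHSECRET)
  # One line per call; cheap enough to do synchronously.
  GEOCODE_LOGGER.info("provider=simplegeo lat=#{lat} lng=#{lng} at=#{Time.now.utc.iso8601}")
  # ... rest of the method as in the question
end

# A nightly cron/rake task can then tally requests per hour, e.g.:
#   grep -c "at=$(date -u +%Y-%m-%dT%H)" log/reverse_geocode.log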
Every n seconds the application requests a remote JSON file that provides live prices for securities in the trading system. The JSON has a block with the data I need (marketdata) and a block with the current data version (version and seqnum).
Right now I use ActionController::Live (with EventSource on the client side) to push updated data to the browser. All actions are done within one method:
opening SSE connection;
forming dynamic URL;
pulling new data from remote server;
comparing/reassigning seqnum value;
updating database if needed.
So my goal now is to separate pulling data & updating the database (ActiveJob) from pushing updated values to the browser (ActionController::Live). To accomplish this I need:
either to store seqnum & version somewhere on the server side, shared between the controller and the background job;
or to monitor the database for the latest changes via the updated_at fields.
So basically I have two questions:
Which of the two options above is more efficient? Are there any other good approaches?
(in case the first one is viable) How would I implement this approach?
Considering that you might have, for example, multiple Rails processes running, I believe it becomes quite hard to let ActiveJob talk directly to a Rails controller in any way.
Definitely store seqnum and version. I wouldn't rely on updated_at in any case; it's too easy for it to be updated incidentally, leaving you sending stuff to the client without any real reason. seqnum and version, by contrast, seem like very solid fields for telling whether the file has been updated.
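For the first option, a single shared record (or a Redis key) is enough. A rough sketch with made-up column names, matching the YourCachedRequest used in the polling loop below:

# Sketch only: a one-row table that both the background job and the
# ActionController::Live action can read. Column names are assumptions.
class YourCachedRequest < ActiveRecord::Base
  # columns: seqnum (integer), version (integer), payload (text)

  def self.latest
    order(:id).last
  end
end

# In the ActiveJob, after pulling the remote JSON and comparing seqnum:
#   YourCachedRequest.latest.update!(seqnum: new_seqnum, version: new_version, payload: marketdata_json)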
With polling
That being said, you want to "signal" ActionController::Live in some way, and I'm afraid polling is your only option here, unless there is a specific moment on the client side when it needs to know whether the file has been updated, in which case you might want to use WebSockets or something similar.
So, something like
cached_request = YourCachedRequest.latest # Assuming it returns a single record
updated = true
loop do
  if updated
    updated = false
    response.stream.write cached_request.serialize_in_some_way
  end
  current_version = cached_request.version # use seqnum too if you need
  cached_request = cached_request.reload
  updated = true if cached_request.version > current_version
  sleep 20.0
end
Without polling
If you want an option that doesn't involve polling, you can only go for WebSockets, I believe. However, there is a more efficient option:
Create a mini application (EventMachine/Sinatra/something light) that the clients will poll (you can pass through your main application to distribute this across different nodes of the mini application); the point of this app is only to relay messages from your main application to the polling clients.
Now, you can create an internal API endpoint on your main application that is used only by the background job. The job hits this endpoint only when it notices that the fetched JSON has actually been updated relative to the one currently stored. When that happens, the main app in turn sends a message (again, probably through an HTTP API endpoint, this time on your mini app) to all your mini app instances, which in turn push it to your clients.
In this way, you don't overload your main server but only these mini-nodes which can have localized outages (which is a big advantage, instead of having a big system outage).
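To make that concrete, here is a very rough sketch of such a mini app (Sinatra, with an in-memory store and made-up route names and token header, so treat it as an illustration only):

require 'sinatra'
require 'json'

LATEST = { payload: nil, seqnum: -1 }
RELAY_TOKEN = ENV.fetch('RELAY_TOKEN', 'change-me') # shared secret with the main app

# Hit by the main application's internal endpoint when fresh data arrives.
post '/internal/update' do
  halt 401 unless request.env['HTTP_X_RELAY_TOKEN'] == RELAY_TOKEN
  body = JSON.parse(request.body.read)
  if body['seqnum'].to_i > LATEST[:seqnum]
    LATEST[:payload] = body['payload']
    LATEST[:seqnum]  = body['seqnum'].to_i
  end
  status 204
end

# Polled by browsers; returns 304 when nothing has changed since the given seqnum.
get '/marketdata' do
  since = params['since'].to_i
  if LATEST[:seqnum] > since
    content_type :json
    { seqnum: LATEST[:seqnum], payload: LATEST[:payload] }.to_json
  else
    status 304
  end
end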
Ruby on Rails 4.1.4
I made an interface to a Twitch gem to fetch information about the current stream, mainly whether it is online or not, but also stuff like the current title and the game being played.
Since the website has a lot of traffic, I can't make a request every time a user walks in, so instead I need to cache this information.
Cached information is stored as a class variable @@stream_data inside the Twitcher class.
I've made a rake task to update this via cron, calling Twitcher.refresh_stream, but naturally that does not run within my active process (the one every visitor connects to) but in a separate process. So the @@stream_data in the actual app is always empty.
Is there a way to run code, within my currently running rails app, every X minutes? Or a better approach, for that matter.
Thank you for your time!
This sounds like a good case for caching:
Rails.cache.fetch("stream_data", expires_in: 5.minutes) do
  fetch_new_data
end
If the data is in the cache and is not stale, it is returned without executing the block; if not, the block is used to populate the cache.
The default cache store just keeps things in memory, so it doesn't fix your problem: you'll need to pick a cache store that is shared across your processes. Both redis and memcached (via the dalli gem) are popular choices.
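For example, switching to memcached can be as small as this sketch (the server address is a placeholder, and it assumes dalli is in your Gemfile):

# config/environments/production.rb
config.cache_store = :mem_cache_store, 'localhost:11211'

# or, with the redis-rails / redis-store gems (also an assumption):
# config.cache_store = :redis_store, 'redis://localhost:6379/0/cache'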
Check out Whenever (basically a ruby interface to cron) to invoke something on a regular schedule.
I actually had a similar problem with using Google Analytics. Google Analytics requires an API key for each request. However, the API key would expire every hour. If you requested a new API key for every Google Analytics request, each request would be very slow.
So what I did was make another class variable, @@expires_at. In every method that made a request to Google Analytics, I would check @@expires_at.past?. If it was true, I would refresh the API key and set @@expires_at = 45.minutes.from_now.
You can do something like this.
def method_that_needs_stream_data
  renew_data if @@expires_at.nil? || @@expires_at.past? # nil check guards the very first call
  # use @@stream_data
end

def renew_data
  # renew @@stream_data here
  @@expires_at = 5.minutes.from_now
end
Tell me how it goes.
I have a Rails production application that is down several times per day. This application, in addition to serving its users, is the endpoint for a 3rd party website that sends it updates.
Occasionally, these updates will come flooding in so fast that the requests back up and the application becomes unavailable for long periods of time. It is a legitimate usage which ends up causing a Denial of Service.
The request from the 3rd party is pretty simple:
class NotificationsController < ApplicationController
  def notify
    begin
      notification_xml = request.body.read
      notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
      user = User.find(notification_hash['UserID'])
      user.delay.set_notification(notification_hash)
    rescue Exception => bang
      logger.error bang.backtrace
      unless user.blank?
        alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
        File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
      end
    end
    render nothing: true, status: 200
  end
end
I have two app servers against a very large database. However, when this 3rd party website really hits us with the notification requests, over 200 per minute up to close to 1,000 requests per minute, both webservers get completely tied up.
You can also see above that I'm using the .delay call since I'm using Sidekiq. I thought that would help, and it did for a while, but the application can't handle that many requests.
Other than handling the requests in a separate application, which I'm not sure is really possible in my EngineYard installation, is there something I can do to speed up the handling of this request?
If it takes too long to process all those requests, try a different approach.
Create a new model (I will call it Request) with only one field (I'll name it message): the XML sent to you by that 3rd party app.
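The migration for it would be tiny; something like this sketch (the text column type is an assumption, since the payload is raw XML):

class CreateRequests < ActiveRecord::Migration
  def change
    create_table :requests do |t|
      t.text :message, null: false
      t.timestamps
    end
  end
end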
Rewrite your notify action to be very simple and fast:
def notify
  Request.create(message: request.body.read) # store the raw XML payload
  render nothing: true, status: 200
end
Create a new action, let's say process_requests, like this:
def process_requests
  # find_in_batches already iterates in primary key (id) order
  Request.find_in_batches(batch_size: 100) do |group|
    group.each do |request|
      process_request(request)
      request.destroy
    end
  end
end
def process_request(request)
  notification_xml = request.message
  notification_hash = Hash.from_xml(notification_xml)['Envelope']['Body']['NotificationResponse']
  user = User.find(notification_hash['UserID'])
  user.set_notification(notification_hash)
rescue Exception => bang
  logger.error bang.backtrace
  unless user.blank?
    alert_file_name = "#{user.id}_#{notification_hash['Message']['MessageID']}_#{notification_hash['NotificationEventName']}_#{notification_hash['Timestamp']}.xml"
    File.open(alert_file_name, 'w') { |f| f.write(notification_xml) }
  end
end
Create a cron job and call process_requests at a defined interval (every few minutes).
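If you use the whenever gem (just one option; plain cron hitting a rake task or rails runner works the same), the schedule could be a sketch like this, assuming the batch loop above is moved into a class method rather than a controller action:

# config/schedule.rb
every 5.minutes do
  runner "RequestProcessor.process_requests" # hypothetical class wrapping the loop above
end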
I've never used Sidekiq, so I preferred to use find_in_batches (I used a batch size of 100 just for the sake of the example).
The notify action shouldn't run for more than a few milliseconds (inserts are pretty fast), so this should be able to handle the incoming traffic in your critical moments.
If you try something similar and it helps your servers reduce the load in critical moments, let me know :D
If it proves useful and you add background processing here too, please post that for others to see.
If you're monitoring this app with New Relic/AppNet/something else, checking your reports might give you an idea of some low-hanging fruit. We've only got a small picture of the application here; it's possible that enhancements elsewhere in the app might help as well.
With that said, here are a few ideas which can be applied separately or together:
Do Less Work on Intake
Right now you're doing a bunch of XML processing—which is expensive—before you pass the job off to Sidekiq. That's a choke point, and by running in the app process it's tying up your application.
If your Redis instance has enough memory, consider refactoring notify so the whole XML payload gets passed off to Sidekiq. You're already always returning a 200 response to the API consumer, so there's no impact on your existing external API.
Your worker instances can then process the XML payloads at their own pace without impacting the application.
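As a sketch of what that refactor could look like (the worker name is made up, and error handling is trimmed for brevity):

class NotificationWorker
  include Sidekiq::Worker

  def perform(raw_xml)
    notification_hash = Hash.from_xml(raw_xml)['Envelope']['Body']['NotificationResponse']
    user = User.find(notification_hash['UserID'])
    user.set_notification(notification_hash)
  end
end

# notify then shrinks to roughly:
#   def notify
#     NotificationWorker.perform_async(request.body.read)
#     render nothing: true, status: 200
#   end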
Implement API Throttling
The third-party site is hammering you at a tremendous rate not normally permitted even by huge sites. That's a problem.
If you can't get them to address it on their end, play like the big dogs: Implement request throttling on your end. You likely have some ability to do this at the Rack level on EngineYard (though a quick search of their docs didn't immediately yield anything), but even doing it at the application level is likely to improve things.
There's a previous Stack Overflow discussion that may offer a couple options.
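One concrete application-level option is the rack-attack gem; a sketch (the path and limits are placeholders you'd tune to the third party's real traffic):

# config/initializers/rack_attack.rb
# (depending on your setup you may also need: config.middleware.use Rack::Attack)
Rack::Attack.throttle('notify/ip', limit: 300, period: 1.minute) do |req|
  req.ip if req.post? && req.path == '/notifications/notify'
end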
Proxy the API
A few services exist that will proxy your API for you, allowing you to easily implement features like rate limiting, throttling, and quotas that might otherwise be difficult to add.
The one I'm familiar with off the top of my head is Azure's API Management service. If this isn't a revenue-generating project, the cost might be prohibitive. ($49/month postpaid, though it would be cheaper prepaid, or could even be free if you qualify for BizSpark.)
Farm the API Out
The more advanced cousin of API proxies, "API as a Service" actually lets you run your API on its own VM instance—as well as offering the features a proxy does. If your database isn't a choke point, this can be a way to spread the load out and help prevent machine clients from affecting the experience of human clients.
The ten thousand pound gorilla is Apigee, though there are a variety of other established and startup options.
There is a catch: Most of these services are built around Node.js. If your Rails app is already leaning toward service-oriented architecture, and if you know and like JavaScript, this may not be an issue for you. Otherwise, the need to build an interface between services and maintain a service in a second language may be a bridge too far.
I have a Rails app that people can send data to in the query params of a URL. The Rails app then validates the correctness of the data and creates a JSON response listing any detected errors. The validation itself is done by checking the data against a set of rules that live in a GitHub repo.
Ideally I'd like to update my local copy of this repo once a day. In order to prevent complications I'd like any requests that come in while this update takes place to back off for a few seconds.
What's the best way to communicate to the incoming requests that an update is currently occurring? I'm using a process-based webserver (Unicorn), so in-memory mutexes don't seem like the right answer :(.
I have a small rails app on Heroku that pulls in my client's latest Tweet to display on all pages. It is hitting Twitter rate limits already. I'm trying to come up with a solution. Would the following be a sensible approach ...
Use a cron gem like Whenever to pull down the latest Tweet every minute and write it to a file, then have pages pull the Tweet from that file instead of directly from Twitter.
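Roughly, I'm imagining something like this (the rake task name and file path are just placeholders):

# config/schedule.rb (whenever gem)
every 1.minute do
  rake "twitter:cache_latest_tweet"
end

# lib/tasks/twitter.rake
namespace :twitter do
  task cache_latest_tweet: :environment do
    tweet = Twitter.user_timeline("sometwitterusername").first.text
    File.write(Rails.root.join('tmp', 'latest_tweet.txt'), tweet)
  end
end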
Yes, this is one possibility. Or you could use caching to store the tweets, for example using Memcached. This will also make your app faster.
I'm not familiar with the specific rate limits on twitter, but if they're expressed in requests/minute then the cron job might work. Whatever you do, you need to stop letting incoming traffic drive your requests. Typically you'd create a queue and have a single worker pull requests off of it. That worker would take care of rate limiting itself so you don't go over.
API rate limits are a necessary evil. Maybe you can make a gem to help other folks easily throttle themselves.
I ended up using memcache to cache the requests:
latest_tweet = Rails.cache.read "latest_tweet"
if !latest_tweet
  latest_tweet = Twitter.user_timeline("sometwitterusername").first.text
  Rails.cache.write("latest_tweet", latest_tweet, :expires_in => 5.minutes)
end