When fetching data using an api, is it best to store that data on another database, or is it best to keep fetching that data whenever you need it? [duplicate] - ruby-on-rails

This question already has an answer here:
Caching calls to an external API in a rails app
(1 answer)
Closed 6 years ago.
I'm using the TMDB api in order to fetch information such as film titles and release years, but am wondering whether I need to create an extra database in order to store all this information locally, rather than keep having to use the api to get the info? For example, should I create a film model and call:
film.title
and by doing so accessing a local database with the title stored on it, or do I call:
Tmdb::Movie.detail(550).title
and by doing so making another call to the api?

Having dealt with a large Rails application that made service calls to about a dozen other applications, caching is your best bet. The problem with the database solution is keeping it up to date. The problem with making the calls every time is that it's too slow. There is a middle ground. For this you want Ruby on Rails Low Level Caching:
Sometimes you need to cache a particular value or query result instead of caching view fragments. Rails' caching mechanism works great for storing any kind of information.
The most efficient way to implement low-level caching is using the Rails.cache.fetch method. This method does both reading and writing to the cache. When passed only a single argument, the key is fetched and value from the cache is returned. If a block is passed, the result of the block will be cached to the given key and the result is returned.
An example that is pertinent to your use case:
class TmdbService
  def self.movie_details(id)
    # Cache each movie's details for 4 hours; the block runs only on a cache miss.
    Rails.cache.fetch("tmdb.movie.details.#{id}", expires_in: 4.hours) do
      Tmdb::Movie.detail(id)
    end
  end
end
You can then configure your Rails application to use memcached or the database as the cache store; either works. The point is that you want this cached data to expire at some point so that you keep getting up-to-date information.

This is a big decision to make. If the amount of data you get through the API is not huge you can store all of it in your database. This way you will get the data much faster and your application will work even when the API is down.
If the amount of data is huge and you don't have the resources to store all of it, you should at least store the most important data in your database as a cache.
If you do not store any data on your own, you depend on the data source, and it can have downtime.
The problem with storing data on your side is that the data changes and you need to synchronize. Even so, it is still worth storing the data on your side as a cache to get results faster, and synchronizing it periodically.

Calls to a local database are much faster than calls to external APIs. I would expect a local database to return within a few milliseconds, whereas an API will probably take hundreds of milliseconds. Local calls are also less likely to be affected by network issues or downtime.
Therefore I would always cache the result of an API call in a local database and occasionally update the local version with a newer version from the API.
But in the end it depends on your requirements: Do you need real-time data or is a cached version okay? How often do you need that data and how often is it updated? How fast is the API and is latency an issue? Does the API have a rate limit (a maximum number of requests per time period)?
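The "cache locally, refresh occasionally" idea from this answer could be sketched roughly as follows. This is only an illustration: LocalFilmStore, its max_age setting, and the fetcher callable are hypothetical names, and a plain Hash stands in for the database table.

```ruby
# Sketch: keep a local copy of API results and refresh entries older than max_age.
# LocalFilmStore and the fetcher callable are hypothetical names for illustration.
class LocalFilmStore
  Entry = Struct.new(:data, :fetched_at)

  def initialize(max_age:, fetcher:, clock: -> { Time.now })
    @max_age = max_age   # seconds before a local copy counts as stale
    @fetcher = fetcher   # callable that performs the real API request
    @clock   = clock     # injectable clock, handy for testing
    @entries = {}        # stands in for a database table
  end

  def find(id)
    entry = @entries[id]
    if entry.nil? || @clock.call - entry.fetched_at > @max_age
      begin
        entry = Entry.new(@fetcher.call(id), @clock.call)
        @entries[id] = entry
      rescue StandardError
        # The API failed: fall back to the stale copy if one exists.
        raise if entry.nil?
      end
    end
    entry.data
  end
end
```

With this shape, a slow or unavailable API only affects the first request for a record; later requests are served from the local copy until it ages out.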

Related

How to optimise computation intensive request response on rails [duplicate]

This question already has answers here:
How do I handle long requests for a Rails App so other users are not delayed too much?
(3 answers)
Closed 6 years ago.
I have an application which does a lot of computation on a few pages (requests). The web interface sends an AJAX request. The computation sometimes takes about 2-5 minutes. The problem is that by then the AJAX request has timed out.
We can certainly increase the timeout on the web portal, but that doesn't sound like right solution. Also, to improve performance:
Removed N+1/Duplicate queries
Implemented Caching
What else could be done here to reduce the calculation time?
Also, if it still takes longer, I was thinking of following solutions:
Do the computation beforehand and store it in DB. So when the actual request comes, there is no need of calculation. (Apprehensive about this approach. Since we will have to modify/Erase-and-recalculate this data, whenever there is some application logic change.)
Load the whole data in cache when application starts/data gets modified. But for the first time computation has to be done. Also, can't keep whole data in the cache when the application starts. So need to store it in the cache as per demand.
Maybe, do something like Angular promise, where promise gets fulfilled when the response comes from the server.
Do we have any alternative to do this efficiently?
UPDATE:
Depending on user input, the calculation might finish in a few seconds, or it might take 2-5 minutes. The scenario is: the user imports an Excel file, which is parsed and saved to the DB by a background job. Then, on another page, the user wants to see a report/analytics graph derived from a few calculations on the imported data. The calculation depends on many factors, so I do not want to save its result in the DB (as pointed out above). Also, when the user requests the report/analytics graph, it would be a bad experience to tell them that the graph will be shown after some time and that they'll get an email/notification.
The extremely typical solution is to enqueue a job for background processing, and return a job ID to the front-end. Your front-end can then poll for completion using that job ID, or you can trigger a notification such as an email to be sent to the user when the job completes.
There are a multitude of gems for this, and it is such a popular and accepted solution that Rails introduced its own ActiveJob for this exact purpose.
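The "enqueue, return a job ID, poll for completion" flow might look something like the sketch below. JobRunner is a hypothetical in-process stand-in built on threads, not ActiveJob or any particular gem; a real setup would hand the work to a job backend and keep the status in a shared store.

```ruby
require "securerandom"

# Sketch of "enqueue, return a job ID, poll for completion".
# JobRunner is a hypothetical in-process stand-in for a real job backend.
class JobRunner
  def initialize
    @jobs  = {}
    @mutex = Mutex.new
  end

  # Starts the work in a background thread and returns immediately with an ID.
  def enqueue(&work)
    id = SecureRandom.uuid
    @mutex.synchronize { @jobs[id] = { status: :pending, result: nil } }
    Thread.new do
      begin
        result = work.call
        @mutex.synchronize { @jobs[id] = { status: :done, result: result } }
      rescue StandardError => e
        @mutex.synchronize { @jobs[id] = { status: :failed, result: e.message } }
      end
    end
    id
  end

  # What a polling endpoint would return for a given job ID.
  def status(id)
    @mutex.synchronize { @jobs.fetch(id).dup }
  end
end
```

The front-end would call the equivalent of status(id) every few seconds and render the result once the status is no longer :pending.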
Here are a few possible solutions:
1. Optimize your tables with indexes to reduce data fetching time.
2. Preload all rows you'll be dealing with at the beginning, so you won't do a query each time you calculate something... it's faster/easier to @things.select { |r| r.blah } than to Thing.where(conditions)
3. Instead of all that, just do the computing in PL/SQL on the database side. Sure, it's not the same as writing Ruby code, but it could be faster.
4. And yes, cache the whole result set in memcache or redis or something (and expire it when something changes)
5. Run the calculation in the background (crontab?) and store the results as JSON somewhere, or cache the entire HTML file (if you're not localizing or anything)
PS: I'm doing 1,2,3 combined with 5 (caching JSON results into memcache and then pulling the array and formatting/localizing) for a few M records from about 12 tables... sports data mainly.
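As a rough illustration of the preloading suggestion, the sketch below loads the rows once and indexes them in memory so later lookups need no further queries. Row and the sample data are made up; in Rails the rows would come from a single query instead.

```ruby
# Sketch: load the rows once, index them in memory, then compute without further queries.
# Row and the sample data are made up for illustration.
Row = Struct.new(:team_id, :points)

rows = [Row.new(1, 3), Row.new(2, 1), Row.new(1, 0), Row.new(2, 3)]

# One pass builds a hash keyed by team_id, so each later lookup is a hash access.
rows_by_team = rows.group_by(&:team_id)

# Every per-team calculation now works on the preloaded rows.
totals = rows_by_team.transform_values { |team_rows| team_rows.sum(&:points) }
```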

What is the most efficient way to store temp data for processing?

I am writing an application in Ruby which collects a huge amount of data from API calls and stores it in a file. After that, it processes the records one by one. Is there a better way to achieve the same thing?
Note: I want to process the records one by one by storing all of them locally because they may change during API calls.
I would look at storing the information in an in-memory key/value store (such as memcached or redis). If you use an in-memory key/value store, you can update information based on subsequent API calls rather than having multiple records in a file which represent the same data, just with different values.
Keep in mind, however, if your data is significantly large, you may run out of memory. That said, if you are into the gigabytes of data, the way you have implemented your solution may be the best route to take.
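The point about updating by key rather than accumulating duplicate records could be sketched like this. A plain Hash stands in for redis or memcached here, and the record shape (id, price) is invented for the example.

```ruby
# Sketch: keep only the latest version of each record, keyed by its ID.
# A plain Hash stands in for redis or memcached; the record shape is made up.
store = {}

api_pages = [
  [{ id: 1, price: 10 }, { id: 2, price: 20 }],
  [{ id: 1, price: 12 }] # record 1 changed in a later API call
]

api_pages.each do |page|
  page.each { |record| store[record[:id]] = record } # overwrite, never duplicate
end

# Process each record exactly once, using its most recent values.
processed = store.values.map { |record| record[:price] }
```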

iOS app with remote server - I don't need data to persist on app, should I still use CoreData?

Design question:
My app talks to a server. Json data being sent/received.
Data on server is always changing, and I want users to see most current data, not stored/cached data. So I require a user to be logged in order to use the app, and care not to persist data in the app.
Should I still use CoreData and map it to the JSON?
Or can I just create custom model classes, map the JSON to their properties, and have NSArray properties which point to their child objects, etc.?
Which is better?
Thanks
If you don't want to persist data, I personally think Core Data would be overkill for this application.
Core Data is really for local persistence. If the data did not change so often and you didn't want users to have to fetch updated data every time they visited the page, then you would load the JSON and store it locally using Core Data.
Use plain old Objective-C objects for now. It's not hard to switch to Core Data in the future, but once you've done so it gets a lot harder to change your schema.
That depends on what your needs are.
If you need the app to work offline, you need to store your information somehow in the client.
In order to save on network usage, you could store locally, then query the server to see if it had an updated answer -- you could do this by sending a time stamp to the server and return a 304 Not Modified if the entity hasn't changed.
Generally, it depends on how much time you have to put into the app and what your specific requirements are, but as a general rule I would optimise for as low bandwidth usage as possible, as that not only reduces potential data costs, but also means the answers will be more quickly available to your users (when online and they have not changed) and also available offline.
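The timestamp check described above can be sketched from the server's side, here in Ruby. conditional_response is a hypothetical helper, not part of any framework; it only shows the decision between 304 Not Modified and a full 200 response.

```ruby
require "time"

# Sketch of the server-side check: answer 304 when the entity has not changed
# since the timestamp the client sent. conditional_response is a hypothetical helper.
def conditional_response(last_modified:, if_modified_since:, body:)
  client_time = if_modified_since && Time.httpdate(if_modified_since)
  if client_time && last_modified.to_i <= client_time.to_i
    { status: 304, headers: {}, body: nil } # the client's copy is still current
  else
    { status: 200, headers: { "Last-Modified" => last_modified.httpdate }, body: body }
  end
end
```

The client keeps the Last-Modified value from the last full response and sends it back on the next request, so an unchanged entity costs only a small header exchange.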
If you do not wish to store data locally at all,

Updating an existing Memcached record

I have an application that needs to perform multiple network queries, each of which returns 100 records.
I'd like to keep all the results (several thousand or so) together in a single Memcached record named according to the user's request.
Is there a way to append data to a Memcached record or do I need to read and write it back and forth and combine the old results with the new ones by the means of my application?
Thanks!
P.S. I'm using Rails 3.2
There's no way to append anything to a memcached key. You'd have to read it in and out of storage every time.
Redis does allow this sort of operation, however, as rubish points out: it has a native list type that allows you to push new data onto it. Check out the Redis list documentation for information on how to do that.
You can write a class that emulates a list in memcached (which is actually what I did), but appending to a record isn't an atomic operation, so it will produce errors that accumulate over time (at least in memcached). Besides, it will be very slow.
As pointed out, Redis has native lists, but they can be emulated in any NoSQL / K-V storage solution.
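Emulating a list on a store that only offers get and set might look like the sketch below. KeyValueStore is a hypothetical stand-in rather than a real memcached client; the read-combine-write step is what makes the approach non-atomic.

```ruby
# Sketch: emulating "append" on a key/value store that only offers get and set.
# KeyValueStore is a hypothetical stand-in, not a real memcached client.
class KeyValueStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

# Read the current list, combine it with the new batch, write it back.
# Not atomic: two concurrent callers can read the same old list and one update is lost.
def append_results(store, key, new_results)
  combined = (store.get(key) || []) + new_results
  store.set(key, combined)
  combined
end
```

Each call transfers the whole list in both directions, which is also why this gets slow as the record grows to several thousand entries.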

How to cache objects store them for multiple requests?

I am using Ruby on Rails and I need to store a search result set obtained by connecting to another server. The problem is I don't want to store the result set in the session and I want something where I can store the result set object over multiple requests.
The querying takes time so I don't want to repeat it. Is there a way I could store objects or cache objects so that I don't have to query it again and again?
Can I use some kind of object store?
Any help would be great.
If memoizing is an option, how do I memoize objects? The connection would still take time, so how do I store the result set?
If you don't want to store it in the session, obviously you have options here.
You can store in your DB temporarily (I assume querying the DB is faster than re-fetching from another server :)).
Also there's an option to use something like memcached. Although you have to be aware that restarting it will throw all your data away.
Depends on what you need to achieve and how you need to handle your data.
Memoization works only within a single request. It can't be used across several requests.
All requests share storage resources such as Memcached or the database. If you use ActiveSupport::Cache::MemCacheStore, you can write and fetch any object from any of your requests.
If you need to store data between requests, memoization likely isn't what you are looking for. Memoization is typically used in Ruby/Rails when you are calling the same method repeatedly within a single request, and that method is expensive (either CPU intensive, multiple DB requests, etc).
You can memoize a method, which stores the result in an instance variable, and the next time it is called, the instance variable value is returned, rather than reevaluating the method. There are tons of resources out there on this if you want to look into it further.
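A minimal sketch of the instance-variable memoization described above, using a made-up Report class whose computations counter exists only to show that the expensive method runs once:

```ruby
# Sketch: memoizing an expensive method in an instance variable.
# Report and its computations counter are made up for illustration.
class Report
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # The first call computes and stores the value; later calls return the stored one.
  def totals
    @totals ||= expensive_totals
  end

  private

  def expensive_totals
    @computations += 1
    [1, 2, 3].sum
  end
end
```

Note that ||= recomputes whenever the stored value is nil or false, and that the cached value lives only as long as the object does, which in a Rails controller or model usually means a single request.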
For data that needs to persist across sessions, and may need to be shared between different users, I highly recommend memcached. Rails has some built in support for it, so it shouldn't be too hard to dig up some good resources.
