Updating an existing Memcached record - ruby-on-rails

I have an application that needs to perform multiple network queries each one of those returns 100 records.
I'd like to keep all the results (several thousand or so) together in a single Memcached record named according to the user's request.
Is there a way to append data to a Memcached record or do I need to read and write it back and forth and combine the old results with the new ones by the means of my application?
Thanks!
P.S. I'm using Rails 3.2

There's no way to append anything to a memcached key. You'd have to read it in and out of storage every time.
redis does allow this sort of operation, however, as rubish points out -- it has a native list type that allows you to push new data onto it. Check out the redis list documenation for information on how to do that.

You can write a class that'll emulate list in memcached (which is actually what i did)... appending to record isn't atomic operation, so it'll generate errors that'll accumulate over time (at least in memcached). Beside it'll be very slow.
As pointed out Redis has native lists, but it can be emulated in any noSQL / K-V storage solution.

Related

When fetching data using an api, is it best to store that data on another database, or is it best to keep fetching that data whenever you need it? [duplicate]

This question already has an answer here:
Caching calls to an external API in a rails app
(1 answer)
Closed 6 years ago.
I'm using the TMDB api in order to fetch information such as film titles and release years, but am wondering whether I need to create an extra database in order to store all this information locally, rather than keep having to use the api to get the info? For example, should I create a film model and call:
film.title
and by doing so accessing a local database with the title stored on it, or do I call:
Tmdb::Movie.detail(550).title
and by doing so making another call to the api?
Having dealt with a large Rails application that made service calls to about a dozen other applications, caching is your best bet. The problem with the database solution is keeping it up to date. The problem with making the calls every time is that it's too slow. There is a middle ground. For this you want Ruby on Rails Low Level Caching:
Sometimes you need to cache a particular value or query result instead of caching view fragments. Rails' caching mechanism works great for storing any kind of information.
The most efficient way to implement low-level caching is using the Rails.cache.fetch method. This method does both reading and writing to the cache. When passed only a single argument, the key is fetched and value from the cache is returned. If a block is passed, the result of the block will be cached to the given key and the result is returned.
An example that is pertinent to your use case:
class TmdbService
def self.movie_details(id)
Rails.cache.fetch("tmdb.movie.details.#{id}", expires_in: 4.hours) do
Tmdb::Movie.detail id
end
end
You can then configure your Rails application to use memcached or the database for the cache, it doesn't matter. The point is you want this cached data to expire at some point to ensure you are getting up-to-date information.
This is a big decision to make. If the amount of data you get through the API is not huge you can store all of it in your database. This way you will get the data much faster and your application will work even when the API is down.
If the amount of data you get is huge and you don't have sources to store all the data, you should at least store the most important data in your database as cache.
If you do not store any data on you own you are dependent on the source of data and it can have downtime.
Problem with storing data on your side is when the data change and you need to synchronize. In that case it is still good to store data on your side as cache to get results faster and synchronize the data periodically.
Calls to a local database are way faster than calls to external APIs. I would expect a local database to return within a few milliseconds, whereas an API will probably take hundreds of milliseconds. And local calls are less likely effected by network issues or downtimes.
Therefore I would always cache the result of an API call in a local database and occasionally updated the local version with a newer version from the API.
But in the end it depends on your requirement: Do you need real-time or is a cached version okay? How often do you need that data and how often is is updated? How fast is the API and is latency an issue? Does the API have a rate limit (a maximum number of request per time)?

What is the most efficient way to store temp data for processing?

I am writing an application in Ruby, which collect huge amount of data from API calls and stores it in a file. After that it processes it one by one. I was wondering if there is a way better than this to achieve the same?
Note: I want to process the records one by one by storing all of them locally because they may change during API calls.
I would look at storing the information in an in-memory key/value store (such as memcached or redis). If you use an in-memory key/value store, you can update information based on subsequent API calls rather than having multiple records in a file which represent the same data, just with different values.
Keep in mind, however, if your data is significantly large, you may run out of memory. That said, if you are into the gigabytes of data, the way you have implemented your solution may be the best route to take.

Getting most recent paths visited across sessions in Rails app

I have a simple rails app with no database and no controllers. It uses High Voltage for routing queries, then uses javascript to go get data using the params hash.
A typical URL looks like this:
http://example.com/?id=37ed660aa222e61ebbbc02db
I'd like to grab the ten unique URLs users have most recently visited and pass them to a view. Note that I said users, preferably across concurrent sessions.
Is there a way to retrieve this using ActiveSupport::Notifications or Production.log? Any examples, including where the code should best go, would be greatly appreciated!
I think that Redis would be ideally suited to this. It's one of the NoSQL key-value store db's, but its support for the value part being an ordered list, queue, etc. should make it easy to store unique urls in a FIFO list as they are visited, limit the size of that list (discard urls at the 'old' end of the list), and retrieve the most recent N urls to pass to your view. Your list should stay small enough that it would all stay in memory and be very fast. You might be able to do this with memcached or mongo or another one as well; I think it would be best though if the solution kept the stored values in memory.
If you aren't already using redis (or similar), it might seem like overkill to set it up and maintain just for this feature. But you can make it pay for itself by also using it for caching, background job processing (Resque / Sidekiq), and probably other things in your app.

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A model Users has_many Comments. I'd like to store into a user cache model how many comments they have. That way when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I request the database to count the number of Comments a User has, but before it is able to execute the command to save it into stats, that user might sneak in and add another comment somewhere. This would increment the counter, but then by immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by rails. It's called counter cache. There is a rails cast here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handle the storage particulars.
If all that sounds like too much work then, you don't really want real time reporting. Assuming you have a backup of some sort (hot or cold) run the aggregations and generate the reports on the backup. That way it doesn't affect running application and you data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.

Sharing an large array with all users on a rails app

I have inherited an app that generates a large array for every user that visit the app. I recently discovered that it is identical for nearly all the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea but a few engineers at work feel as though it does not deserve to be a full model. 97% of the times users will see the same exact thing but 3% of the time users will get a slightly different array (a few elements will have changed).
Any other approaches that I should consider.??..thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex ('require thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.

Resources