Share a Lua variable across requests in lua-resty (OpenResty)

I'm using OpenResty with lua-resty, and naturally each request gets its own variables. To share simple strings or configuration values across requests I currently use lua_shared_dict.
But if I need to share and maintain a big variable across requests (e.g. a complex table built by parsing a large INI file, regenerated every hour or so to keep performance up), how can I do it?
(As another example, imagine translating this into Lua: https://github.com/dangrossman/node-browscap/blob/master/browscap.js; how can I maintain the browser[] array across multiple OpenResty HTTP requests without having to re-parse it for each request?)

how can I maintain the browser[] array across multiple OpenResty HTTP requests, without having to re-parse it for each request?
I assume you mean "across multiple OpenResty workers" or "across requests that may hit different workers", since all requests handled by the same worker can already access the same variables. If so, you probably can't share the table directly. But since you only seem to need to read that browser[] value (you are parsing a large INI file), you can try a hybrid approach (a minimal sketch follows the steps below):
1) Store the result of parsing in serialized form in one of the lua_shared_dict values (let's say iniFile).
2) When a request comes in, check whether the worker-level iniFile variable is nil; if it is, read the iniFile value from lua_shared_dict, deserialize it, and store it in that variable, which is shared by all the code run by the same worker.
3) If you need to refresh it after 1h to keep it up to date, also store the time the value was retrieved from the dictionary, and extend the check in step 2 to re-retrieve once that time exceeds your limit.
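For illustration, here is a minimal Lua sketch of that approach, kept in a module that each worker loads once via require. The dict name iniFile, the parse_ini helper, the file path, and the one-hour TTL are all assumptions; cjson and ngx.shared ship with OpenResty:

-- browscap.lua; assumes "lua_shared_dict iniFile 10m;" in nginx.conf
local cjson = require "cjson"

local cache            -- per-worker copy of the parsed table
local cache_time = 0   -- when that copy was loaded (seconds)
local TTL = 3600       -- refresh after one hour (assumption)

local M = {}

function M.get()
  local now = ngx.now()
  if not cache or now - cache_time > TTL then
    local dict = ngx.shared.iniFile
    local serialized = dict:get("browsers")
    if not serialized then
      -- parse_ini is a hypothetical parser for your INI file
      serialized = cjson.encode(parse_ini("/path/to/browscap.ini"))
      dict:set("browsers", serialized, TTL)
    end
    cache = cjson.decode(serialized)  -- deserialize once per worker per TTL
    cache_time = now
  end
  return cache
end

return M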

Related

When fetching data using an API, is it best to store that data in another database, or to keep fetching it whenever you need it?

I'm using the TMDB API to fetch information such as film titles and release years, but I'm wondering whether I need to create an extra database to store all this information locally, rather than keep having to use the API to get it. For example, should I create a film model and call:
film.title
and by doing so access a local database with the title stored in it, or do I call:
Tmdb::Movie.detail(550).title
and by doing so make another call to the API?
Having dealt with a large Rails application that made service calls to about a dozen other applications, I'd say caching is your best bet. The problem with the database solution is keeping it up to date; the problem with making the calls every time is that it's too slow. There is a middle ground. For this you want Ruby on Rails low-level caching:
Sometimes you need to cache a particular value or query result instead of caching view fragments. Rails' caching mechanism works great for storing any kind of information.
The most efficient way to implement low-level caching is using the Rails.cache.fetch method. This method does both reading and writing to the cache. When passed only a single argument, the key is fetched and the value from the cache is returned. If a block is passed, the result of the block will be cached under the given key and returned.
An example that is pertinent to your use case:
class TmdbService
  def self.movie_details(id)
    Rails.cache.fetch("tmdb.movie.details.#{id}", expires_in: 4.hours) do
      Tmdb::Movie.detail id
    end
  end
end
You can then configure your Rails application to use memcached or the database for the cache; it doesn't matter which. The point is that you want this cached data to expire at some point to ensure you are getting up-to-date information.
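For example, to back Rails.cache with memcached, the cache store can be set in the environment config (the server address here is an assumption):

# config/environments/production.rb
config.cache_store = :mem_cache_store, "localhost:11211"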
This is a big decision to make. If the amount of data you get through the API is not huge you can store all of it in your database. This way you will get the data much faster and your application will work even when the API is down.
If the amount of data is huge and you don't have the resources to store it all, you should at least store the most important data in your database as a cache.
If you do not store any data on your own, you are dependent on the source of the data, and it can have downtime.
The problem with storing data on your side is that when the data changes you need to synchronize. In that case it is still good to store the data on your side as a cache, to get results faster, and to synchronize the data periodically.
Calls to a local database are way faster than calls to external APIs. I would expect a local database to return within a few milliseconds, whereas an API will probably take hundreds of milliseconds. And local calls are less likely to be affected by network issues or downtime.
Therefore I would always cache the result of an API call in a local database and occasionally update the local version with a newer version from the API.
But in the end it depends on your requirements: Do you need real-time data, or is a cached version okay? How often do you need the data, and how often is it updated? How fast is the API, and is latency an issue? Does the API have a rate limit (a maximum number of requests per unit of time)?

Local vs. Remote SproutCore queries

What’s the difference between SC.Query.local and SC.Query.remote? Why do both kinds of queries get sent to my data source's fetch method?
The words “local” and “remote” have nothing to do with where the data comes from – all your data probably comes from your server, and at any rate SC.Query doesn’t care. What the words mean is where the decisions are made. Local queries get fulfilled (and are updated live!) locally by SproutCore itself; remote queries rely on your server to make the decisions.
The fundamental difference, then, is: “Can/should/does my local store have local copies of all of the records required to fulfill this request?” If yes, use a local query; if no, use a remote query.
For example, in a document editing application, if the query is “get all of the logged-in user’s documents”, then the decision must be made by someone with access to “all documents across all users” – which the client should clearly not have, for reasons of performance and security! This should therefore be a remote query. But once all of the user’s documents are loaded locally, then another query such as “All of the user’s documents which have been edited in the last three days” can (and should) be executed locally.
Similarly, if a query is looking across a small number of records, it makes sense for the client to load them all and search locally; if it’s a search across millions of records, which the client can’t be expected to load and search, then the remote server will have to be relied upon for fulfillment.
The store will send both remote and local queries to your data source's fetch method, where you can tell the difference via query.get('isRemote'). But the store expects the data source to do different things in each case.
It is your data source's responsibility to fulfill remote queries. You'll probably make a server call, then load records (if needed) via store.loadRecords(recordTypes, hashes, [ids]) (which returns an array of store keys), then use store.dataSourceDidFetchQuery(query, storeKeys) to fulfill the query and specify results.
On the other hand, with local queries, the store is essentially making a courtesy call – it’s letting your data source know that this kind of information has been requested, so you can decide if you’d like to load some data from the server. The store will happily fulfill the query all by itself, whether or not the data source takes any action. If you do decide to load some data from the server, you just need to let the store know with store.dataSourceDidFetchQuery(query) – without the store keys this time, because the store is making its own decisions about what records satisfy the query.
Of course, there are other ways to implement this all. SC.Store is set up (somewhat controversially) to aggressively request information from its data source, so it’s possible to run an application using nothing but local queries, or by triggering most server requests with toMany relationships (which run through dataSource.retrieveRecord). My personal favorite approach is to do all loading of data with remote queries in my application’s state chart, and then populate my controllers and relationships with nothing but local queries.
See the Records guide, the SCQL documentation at the top of the SC.Query docs, and the SC.Store / SC.DataSource documentation for a ton more information.

Updating an existing Memcached record

I have an application that needs to perform multiple network queries, each of which returns 100 records.
I'd like to keep all the results (several thousand or so) together in a single Memcached record named according to the user's request.
Is there a way to append data to a Memcached record, or do I need to read it and write it back and forth, combining the old results with the new ones in my application?
Thanks!
P.S. I'm using Rails 3.2
There's no way to append anything to a memcached key. You'd have to read it in and out of storage every time.
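A sketch of that read-modify-write pattern with Rails.cache (the key name and new_records are assumptions); note that it is not atomic, so concurrent writers can overwrite each other's updates:

key = "search_results:#{request_id}"   # hypothetical per-request key
existing = Rails.cache.read(key) || []
Rails.cache.write(key, existing + new_records)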
Redis does allow this sort of operation, however; as rubish points out, it has a native list type that lets you push new data onto an existing list. Check out the Redis list documentation for information on how to do that.
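A minimal sketch with the redis gem (the key name, user_request_id, and the records variable are assumptions):

require "json"
require "redis"

redis = Redis.new  # assumes a local Redis server on the default port

# Append each batch of results to the list as it arrives.
records.each do |record|
  redis.rpush("search_results:#{user_request_id}", record.to_json)
end

# Later, read the accumulated results back in one call.
all_results = redis.lrange("search_results:#{user_request_id}", 0, -1).map { |r| JSON.parse(r) }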
You can write a class that emulates a list in memcached (which is actually what I did), but appending to a record isn't an atomic operation, so it will generate errors that accumulate over time (at least in memcached). Besides, it will be very slow.
As pointed out, Redis has native lists, but a list can be emulated in any NoSQL / key-value storage solution.

Sharing a large array with all users of a Rails app

I have inherited an app that generates a large array for every user that visits the app. I recently discovered that it is identical for nearly all the users!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea, but a few engineers at work feel it doesn't deserve to be a full model. 97% of the time users will see the exact same thing, but 3% of the time users will get a slightly different array (a few elements will have changed).
Are there any other approaches I should consider? Thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this (a sketch follows the steps below):
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex (require 'thread') when retrieving/updating the cached data, in case your server setup uses multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
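A minimal sketch of that setup (the SharedArray model, its payload and updated_at columns, and JSON serialization are all assumptions; it also assumes the record already exists):

require "json"
require "thread"  # for Mutex, as suggested above

class SharedArrayCache
  @mutex = Mutex.new
  @data = nil
  @cached_at = nil

  # Returns the cached array, reloading from the DB only when the
  # stored record is newer than the in-memory copy.
  def self.fetch
    @mutex.synchronize do
      # Pull back only the timestamp, not the whole payload.
      latest = SharedArray.maximum(:updated_at)
      if @data.nil? || latest > @cached_at
        @data = JSON.parse(SharedArray.order(:updated_at).last.payload)
        @cached_at = latest
      end
      @data
    end
  end
end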
I'd go with option 2. You can add two constants (for the 97% case and the 3% case) that load from a YAML file when the app initializes; a sketch follows below. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.
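A sketch of that initializer approach (the file names and constant names are assumptions):

# config/initializers/shared_array.rb
require "yaml"

SHARED_ARRAY_COMMON  = YAML.load_file(Rails.root.join("config", "shared_array_common.yml"))
SHARED_ARRAY_VARIANT = YAML.load_file(Rails.root.join("config", "shared_array_variant.yml"))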

How to cache objects and store them across multiple requests?

I am using Ruby on Rails and I need to store a search result set obtained by connecting to another server. The problem is I don't want to store the result set in the session, and I want something that lets me store the result set object across multiple requests.
The querying takes time so I don't want to repeat it. Is there a way I could store objects or cache objects so that I don't have to query it again and again?
Can I use some kind of object store?
Any help would be great.
If memoizing is an option, how do I memoize objects? The connection would still take time, so how do I store the result set?
If you don't want to store it in the session, obviously you have options here.
You can store it in your DB temporarily (I assume querying the DB is faster than re-fetching from another server :)).
There's also the option of using something like memcached, although you have to be aware that restarting it will throw all your data away.
Depends on what you need to achieve and how you need to handle your data.
Memoization works only within a single request; it can't be used across several requests.
All requests can share a storage resource such as memcached or a database. If you use ActiveSupport::Cache::MemCacheStore, you can push and fetch objects across all of your requests.
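A sketch of that store in use (the server address, the key built from a hypothetical query_digest, and the RemoteServer.search call are all assumptions):

require "active_support/cache"

# A shared store that survives across requests and processes.
CACHE = ActiveSupport::Cache::MemCacheStore.new("localhost:11211")

# First request: runs the slow remote query and caches the result set.
# Later requests: get the cached result set without re-querying.
results = CACHE.fetch("search_results:#{query_digest}", expires_in: 1800) do
  RemoteServer.search(params)  # hypothetical slow call to the other server
end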
If you need to store data between requests, memoization likely isn't what you are looking for. Memoization is typically used in Ruby/Rails when you are calling the same method repeatedly within a single request, and that method is expensive (either CPU intensive, multiple DB requests, etc).
You can memoize a method, which stores the result in an instance variable, and the next time it is called, the instance variable value is returned, rather than reevaluating the method. There are tons of resources out there on this if you want to look into it further.
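For reference, the usual idiom looks like this (method names are placeholders); the cached value lives only as long as the object, typically a single request:

def search_results
  # ||= runs the expensive call once, then reuses the stored value
  @search_results ||= expensive_remote_query
end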
For data that needs to persist across sessions, and may need to be shared between different users, I highly recommend memcached. Rails has some built in support for it, so it shouldn't be too hard to dig up some good resources.
