Storing Data In Memory: Session vs Cache vs Static - memory

A bit of backstory: I am working on an web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request task ~ 15 / 20 secs to complete and a couple secs to process. Once there, the user can manipulate vaules on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the intial data into memory and using Ajax calls back to the server so that I can reduce processing time to handle the recalculation that occurs w/ this user's updates.
Here is my question, with performance in mind, what would be the best way to storing this data, assuming that only 1 user will be working w/ this data at any given moment.
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of the my current options. The Session End event is sometimes tricky to ensure it fires properly.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best Perf. Easies to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on what I option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.

I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).

I'd go with the caching method of for storing the data across any page loads. You can name the cache you want to store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the write out more: save changes to Session, then write that to file at intervals, then sweep the file at larger intervals. you can tune it to performance and choose what level of possible user-change loss will be possible.

Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You can not rely on the session simply because it is (typically) tied to the users browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.

I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
perhaps a bit off topic. I'd recommend putting those long processing calls in asynchronous (not to be confused with AJAX's asynchronous) pages.
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx

I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.

One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large, and the code that manipulates it can be handled client side. You could store the data as an XML data island or JSON object. This data could then be manipulated/processed and handled all client side with no round trips to the server. If you need to persist this data back to the server the end resulting data could be posted via an AJAX or standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.

Related

Syncing of memory and database objects upon changes in objects in memory

I am currently implementing a web application in .net core(C#) using entity framework. While working on the project, I actually encountered quite a few challenges but I will start with the one which I think are most important. My questions are as follows:
Instead of frequent loading data from the database, I am having a set of static objects which is a mirror of the data in the database. However, it is tedious and error prone when I want to ensure any changes, i.e., adding/deleting/modifying of objects are being saved to the database at real time. Is there any good example or advice that I can refer to improve my approach to do this?
Another thing is that value of some objects' properties will be changed on the fly according to the value of some other objects' properties. Something like a spreadsheet where a cell's value will be changed automatically if the value in the cell that the formula is referring to changes. I do not have a solution to do this yet. Appreciate if anyone has any example that I can refer to. But this will add another layer of complexity to sync the changes of the objects in memory to database.
At the moment, I am unsure if there is any better approach. Appreciate if anyone can help. Thanks!
Basically, you're facing a problem that's called eventual consistency. Something changes and two or more systems need to be aware at the same time. The problem here is that both changes need to be applied in order to consider the operation successful. If either one fails, you need to know.
In your case, I would use the Azure Service Bus. You can create queues and put messages on a queue. An Azure Function would handle these queue messages. You would create two queues, one for database updates, and one for the in-memory update (I think changing this to a cache service may be something to think off). Now the advantage of these queues is that you can easily drop messages on these queues from anywhere. Because you mentioned the object is going to evolve, you may need to update these objects either in the database or in memory (cache).
Once you've done that, I'd create a topic, with two subscriptions. One forwarding messages to Queue 1, and the other to Queue 2. This will solve your primary problem. In case an object changes, just send it to the topic. Both changes (database and memory) will be executed automagically.
The only problem you have now, it that you mentioned you wanted to update the database in real-time. With this scenario, you're going to have to leave that.
Also, you need to make sure you have proper alerts in place for the queues so in case you did miss a message, or your functions didn't handle it well enough, you'll receive an alert to check & correct errors.
I'm totally agree with #nineedm's and answer, but there are also other solutions.
If you introduce cache, you will always face cache revalidation problem - you have to mark cache as invalid when data were changed. Sometimes it is easy, depending on nature of cached data and how often data are changed.
If you have just single application, MemoryCache can be enough with proper specified expiration options.
If there is a cluster - you have to look at Distributed Cache solutions, for example Redis. There is MS article about that Distributed caching in ASP.NET Core

How to optimise computation intensive request response on rails [duplicate]

This question already has answers here:
How do I handle long requests for a Rails App so other users are not delayed too much?
(3 answers)
Closed 6 years ago.
I have an application, which does a lot of computation on few pages(requests). The web interface sends an AJAX request. The computation takes sometimes about 2-5 minutes. The problem is, by this time AJAX request times out.
We can certainly increase the timeout on the web portal, but that doesn't sound like right solution. Also, to improve performance:
Removed N+1/Duplicate queries
Implemented Caching
What else could be done here to reduce the calculation time?
Also, if it still takes longer, I was thinking of following solutions:
Do the computation beforehand and store it in DB. So when the actual request comes, there is no need of calculation. (Apprehensive about this approach. Since we will have to modify/Erase-and-recalculate this data, whenever there is some application logic change.)
Load the whole data in cache when application starts/data gets modified. But for the first time computation has to be done. Also, can't keep whole data in the cache when the application starts. So need to store it in the cache as per demand.
Maybe, do something like Angular promise, where promise gets fulfilled when the response comes from the server.
Do we have any alternative to do this efficiently?
UPDATE:
Depending on user input, the calculation might happen in few seconds. And also it might take 2-5 minutes. The scenario is, user imports an excel. The excel has been parsed and saved in DB. Now on another page, user wants to see the report/analytics graph derived with few calculations on the imported data(which has already been saved to db with background job). The calculation has to be done with many factors, so do not want to save it in DB(As pointed above). Also, when user request the report/analytics graph, It'll be bad experience to tell him that graph will be shown after sometime. You'll get email/notification etc.
The extremely typical solution is to enqueue a job for background processing, and return a job ID to the front-end. Your front-end can then poll for completion using that job ID, or you can trigger a notification such as an email to be sent to the user when the job completes.
There are a multitude of gems for this, and it is such a popular and accepted solution that Rails introduced its own ActiveJob for this exact purpose.
Here are a few possible solutions:
Optimize your tables with indexes to reduce data fetching time.
Preload all rows you'll be dealing with at the beginning, so you won't do a query each time you calculate something... it's faster/easier to #things.select { |r| r.blah } than to Thing.where(conditions)
Instead of all that, just do the computing in PLSQL on the database side. Sure, it's not the same as writing Ruby code but it could be faster.
And yes, cache the whole results set into memcache or redis or something (and expire when something change)
Run the calculation in the background (crontab?) and store the results in a JSON somewhere, or cache the entire HTML file (if you're not localizing or anything)
PS: I'm doing 1,2,3 combined with 5 (caching JSON results into memcache and then pulling the array and formatting/localizing) for a few M records from about 12 tables... sports data mainly.

How to build cached stats in database without taking down site?

I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A model Users has_many Comments. I'd like to store into a user cache model how many comments they have. That way when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I request the database to count the number of Comments a User has, but before it is able to execute the command to save it into stats, that user might sneak in and add another comment somewhere. This would increment the counter, but then by immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by rails. It's called counter cache. There is a rails cast here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handle the storage particulars.
If all that sounds like too much work then, you don't really want real time reporting. Assuming you have a backup of some sort (hot or cold) run the aggregations and generate the reports on the backup. That way it doesn't affect running application and you data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.

Sharing an large array with all users on a rails app

I have inherited an app that generates a large array for every user that visit the app. I recently discovered that it is identical for nearly all the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea but a few engineers at work feel as though it does not deserve to be a full model. 97% of the times users will see the same exact thing but 3% of the time users will get a slightly different array (a few elements will have changed).
Any other approaches that I should consider.??..thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex ('require thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.

ASP.NET MVC 3 in-memory data store

I have a project which provides users with a list of current tasks that need to be completed. Any user can complete any task, and so to ensure that only one user is working on a task at a time I need to be able to 'lock' it. I'm using SignalR for this, so a user requests a lock on a task, and if they are successful (ie. if noone else has locked it) then they will be able to access the further information that they need.
My problem is how to store the list of locked tasks. The original plan was simply to add an additional bit field 'IsLocked' to the Task table and update this when the user requested a lock and when the task was unlocked. We have about 300 concurrent users, however, and a task takes only about 3-4 minutes, meaning huge numbers of additional - and tiny - queries on the database. Therefore we were wondering about in-memory storage, simply storing a list of task ids in a 'lockedTasks' list.
I had considered using caching, but am unsure on the best ways to do this, or even if better alternatives exist. If anyone has any experience in this then some advice would be great thanks
I would avoid memory completely as IIS is not that great with it, if you found your self in the IIS need for refreshing the Application Pool for some sort of reason, your list is simply gone!
Maybe a MemCache system? If it does not loose things in the above way, but...
I would advice to be in the middle, IO File is fast that request data to a Database, specially if it's not in the same machine (witch for security reasons, it should never be), so... why not, and just to hold your list, you don't use one of the currently famous NoSQL database?
MongoDB is a document database that has a .NET Library and it's easy to use, it is not as fast as Memmory, but extremely quicker than Physical databases for what you want.
Normally the NoSQL Database will be hosted in the App_Data folder so it will be extremely fast to access and you can just hold there the task_id and user_id of all locked tasks.
Have you considered stateful filters?
Check out this links for more info:
ASP.NET MVC Filters and Statefulness
Brad Wilson: Advanced MVC
3 - (Video)
Brad Wilson: Advanced MVC 3 - (PDF)
I'm sorry, but if your app can't handle a single query every 3-4 minutes x 300 users, then you're doing something very wrong. Just browsing a site typically generates orders of magnitude more queries than that.

Resources