How to revert changes with procedural memory? - memory

Is it possible to store all changes of a set by using some means of logical paths - of the changes as they occur - such that one may revert the changes by essentially "stepping back"? I assume that something would need to map the changes as they occur, and the process of reverting them would thus ultimately be linear.
Apologies for any incoherence and this isn't applicable to any particular language. Rather, it's a problem of memory – i.e. can a set * (e.g. which may be some store of user input)* of a finite size that's changed continuously * (e.g. at any given time for any amount of time - there's no limit with regards to how much it can be changed)* be mapped procedurally such that new - future - changes are assumed to be the consequence of prior change * (in a second, mirror store that can be used to revert the state of the set all the way to its initial state)*.

You might want to look at some functional data structures. Functional languages, like Erlang, make it easy to roll back to the earlier state, since changes are always made on new data structures instead of mutating existing ones. While this feature can be used at repeatedly internally, Erlang programming typically uses this abundantly at the top level of a "process" so that on any kind of failure, it aborts both processing as well as all the changes in their entirety simply by throwing an exception (in a non-functional language, using mutable data structures, you'd be able to throw an exception to abort, but restoring originals would be your program's job not the runtime's job). This is one reason that Erlang has a solid reputation.
Some of this functional style of programming is usefully applied to non-functional languages, in particular, use of immutable data structures, such as immutable sets, lists, or trees.
Regarding immutable sets, for example, one might design a functionally-oriented data structure where modifications always generate a new set given some changes and an existing set (a change set consisting of additions and removals). You'd leave the old set hanging around for reference (by whomever); languages with automatic garbage collection reclaim the old ones when they're no longer being used (referenced).
You can put a id or tag into your set data structure, this way you can do some introspection to see what data structure id someone has a hold of. You also can capture the id of the base off of which each new version was generated; this gives you some history or lineage.
If desired, you can also capture a reference to the entire old data structure in the new one, or, one can maintain a global list of all of the sets as they are being generated. If you do, however, you'll have to take over more responsibility for storage management, as an automatic collector will probably not find any unused (unreferenced) garbage to collect without additional some help.
Database designs do some of this in their transaction controllers. For the purposes of your question, you can think of a database as a glorified set. You might look into MVCC (Multi-version Concurrency Control) as one example that is reasonably well written up in literature. This technique keeps old snapshot versions of data structures around (temporarily), meaning that mutations always appear to be in new versions of the data. An old snapshot is maintained until no active transaction references it; then is discarded. When two concurrently running transactions both modify the database, they each get a new version based off the same current and latest data set. (The transaction controller knows exactly which version each transaction is based off of, though the transaction's client doesn't see the version information.) Assuming both concurrent transactions choose to commit their changes, the versioning control in the transaction controller recognizes that the second committer is trying to commit a change set that is not a logical successor to the first (since both changes sets as we postulated above were based on the same earlier version). If possible, the transaction controller will merge the changes as if the 2nd committer was really working off the other, newer version committed by the first committer. (There are varying definitions of when this is possible, MVCC says it is when there are no write conflicts, which is a less-than-perfect answer but fast and scalable.) But if not possible, it will abort the 2nd committers transaction and inform the 2nd committer thereof (they then have the opportunity, should they like, to retry their transaction starting from the newer base). Under the covers, various snapshot versions in flight by concurrent transactions will probably share the bulk of the data (with some transaction-specific change sets that are consulted first) in order to make the snapshots cheap. There is usually no API provided to access older versions, so in this domain, the transaction controller knows that as transactions retire, the original snapshot versions they were using can also be (reference counted and) retired.
Another area this is done is using Append-Only-Files. Logging is a way of recording changes; some databases are based 100% on log-oriented designs.
BerkeleyDB has a nice log structure. Though used mostly for recovery, it does contain all the history so you can recreate the database from the log (up to the point you purge the log in which case you should also archive the database). Again someone has to decide when they can start a new log file, and when they can purge old log files, which you'd do to conserve space.
These database techniques can be applied in memory as well. (Nothing is free, though, of course ;)
Anyway, yes, there are fields where this is done.
Immutable data structures help preserve history, by simply keeping old copies; changes always go to new copies. (And efficiency techniques can make this not as bad as it sounds.)
Id's can help understand lineage without necessarily holding onto all the old copies.
If you do want to hold onto all old the copies, you have to look at your domain design to understand when/how/if old data structures possibly can get accessed with an eye toward how to eventually reclaim them. You'll mostly likely have to help get involved in defining how they get released, if ever. Or how they get archived for posterity though at the cost of slower access later.

Related

Syncing of memory and database objects upon changes in objects in memory

I am currently implementing a web application in .net core(C#) using entity framework. While working on the project, I actually encountered quite a few challenges but I will start with the one which I think are most important. My questions are as follows:
Instead of frequent loading data from the database, I am having a set of static objects which is a mirror of the data in the database. However, it is tedious and error prone when I want to ensure any changes, i.e., adding/deleting/modifying of objects are being saved to the database at real time. Is there any good example or advice that I can refer to improve my approach to do this?
Another thing is that value of some objects' properties will be changed on the fly according to the value of some other objects' properties. Something like a spreadsheet where a cell's value will be changed automatically if the value in the cell that the formula is referring to changes. I do not have a solution to do this yet. Appreciate if anyone has any example that I can refer to. But this will add another layer of complexity to sync the changes of the objects in memory to database.
At the moment, I am unsure if there is any better approach. Appreciate if anyone can help. Thanks!
Basically, you're facing a problem that's called eventual consistency. Something changes and two or more systems need to be aware at the same time. The problem here is that both changes need to be applied in order to consider the operation successful. If either one fails, you need to know.
In your case, I would use the Azure Service Bus. You can create queues and put messages on a queue. An Azure Function would handle these queue messages. You would create two queues, one for database updates, and one for the in-memory update (I think changing this to a cache service may be something to think off). Now the advantage of these queues is that you can easily drop messages on these queues from anywhere. Because you mentioned the object is going to evolve, you may need to update these objects either in the database or in memory (cache).
Once you've done that, I'd create a topic, with two subscriptions. One forwarding messages to Queue 1, and the other to Queue 2. This will solve your primary problem. In case an object changes, just send it to the topic. Both changes (database and memory) will be executed automagically.
The only problem you have now, it that you mentioned you wanted to update the database in real-time. With this scenario, you're going to have to leave that.
Also, you need to make sure you have proper alerts in place for the queues so in case you did miss a message, or your functions didn't handle it well enough, you'll receive an alert to check & correct errors.
I'm totally agree with #nineedm's and answer, but there are also other solutions.
If you introduce cache, you will always face cache revalidation problem - you have to mark cache as invalid when data were changed. Sometimes it is easy, depending on nature of cached data and how often data are changed.
If you have just single application, MemoryCache can be enough with proper specified expiration options.
If there is a cluster - you have to look at Distributed Cache solutions, for example Redis. There is MS article about that Distributed caching in ASP.NET Core

Delphi TFDMemTable, CloneCursor and source table out of sync, unless Refresh is called

the code i'm working on makes heavy usage of TFDMemTables, and clones of those tables using CloneCursor.
Sometimes, under specific conditions which I am unable to identify, the source table and its clone become out of sync: the data between them may be different, the record count as well.
Calling Refresh on the cloned table puts things back in order.
From my understanding, CloneCursor is used to address the same underlying memory where data is stored, meaning alterations to the underlying data from any of the two pointers should reflect on the other table, yet allow the user to have separate filter / record positioning per "view". so how can it possibly go out of sync?
I built a small simulator, where I can insert / delete / filter records in either the table or its clone, and observe the impact on the other one. Changes were reflected correctly.
Another downside of Refresh is that it slows the execution tremendously, if overused.
Has anyone faced similar issues or found explanations / documentation regarding this matter?
Edit:
to clarify what I mean by "out of sync", it means reading a value from the table using FieldByName will return X prior to Refresh, and Y post-refresh. I was not able to reproduce this behavior on the simulator mentioned above.

data warehouse - non volatile vs change data capture

Some say data warehouse is non-volatile. It means no update of data is allowed.
However, sometime we have to capture changes in data. For example changes in transaction status.
Then change data capture comes as a solution.
My question is, should we rely on fundamental concept of data warehouse, to be non-volatile? If we should, then what is another alternatives to capture data changes?
Non volatile doesn't mean "no updates". An accumulating snapshot fact table usually uses updates. Non volatile pertains more to the notion that data is not discarded, it's not temporary. Even if old data is archived, there's still a way to retrieve it at some point. At least this is how I understand the recommendation.
I prefer to avoid updates entirely, mostly by inserting "correction facts". For example, you have a snapshot fact table with an account balance. On a given day the account balance is 1000; a late arriving fact changes that balance and it now should be 1100. Instead of updating the previously inserted fact, I'd rather insert a correction fact with value 100, the difference between the previously known value and the new value. However, for an accumulating snapshot fact table this may not be possible or recommended. Tracking status changes is, usually, modeled through accumulating snapshots, which will require updates.
When we say the data warehouse is volatile, that simply means data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

breeze memory management - pattern / practice?

I have an old SL4/ria app, which I am looking to replace with breeze. I have a question about memory use and caching. My app loads lists of Jobs (a typical user would have access to about 1,000 of these jobs). Additionally, there are quite a few lookup entity types. I want to make sure these are cached well client-side, but updated per session. When a user opens a job, it loads many more related entities (anywhere from 200 - 800 additional entities) which compose multiple matrix-style views for the jobs. A user can view the list of jobs, or navigate to view 1 job at a time.
I feel that I should be concerned with memory management, especially not knowing how browsers might deal with this. Originally I felt this should all be 1 EntityManager and I would detachEntities when user navigates away from a job, but I'm thinking this might benefit from multiple managers by intended lifetime. Or perhaps I should create a new dataservice & EntityManager each time the user navigates to a new hash '/#/' area, since comments on clear() seems to indicate that this would be faster? If I did this, I suppose I will be using pub/sub to notify other viewmodels of changes to entities? This seems complex and defeating some of the benefits of breeze as the context.
Any tips or thoughts about this would be greatly appreciated.
I think I understand the question. I think I would use a multi-manager approach:
Lookups Manager - holds once-per session reference (lookup) entities
JobsView Manager - "readonly" list of Jobs in support of the JobsView
JobEditor Manager - One per edit session.
The Lookups Manager maintains the canonical copy of reference entities. You can fill it once with a single call to server (see docs for how). This Lookups Manager will Breeze-export these reference entities to other managers which Breeze-import them as they are created. I am assuming that, while numerous and diverse, the total memory footprint of reference entities is pretty low ... low enough that you can afford to have more than one copy in multiple managers. There are more complicated solutions if that is NOT so. But let that be for now.
The JobsView Manager has the necessary reference entities for its display. If you only displayed a projection of the Jobs, it would not have Jobs in cache. You might have an array and key map instead. Let's keep it simple and assume that it has all the Jobs but not their related entities.
You never save changes with this manager! When editing or creating a Job, your app always fires up a "Job Editor" view with its own VM and JobEditor Manager. Again, you import the reference entities you need and, when editing an existing Job, you import the Job too.
I would take this approach anyway ... not just because of memory concerns. I like isolating my edit sessions into sandboxes. Eases cancellation. Gives me a clean way to store pending changes in browser storage so that the user won't lose his/her work if the app/browser goes down. Opens the door to editing several Jobs at the same time ... without worrying about mutually dependent entities-with-changes. It's a proven pattern that we've used forever in SL apps and should apply as well in JS apps.
When a Job edit succeeds, You have to tell the local client world about it. Lots of ways to do that. If the ONLY place that needs to know is the JobsView, you can hardcode a backchannel into the app. If you want to be more clever, you can have a central singleton service that raises events specifically about Job saving. The JobsView and each new JobEditor communicate with this service. And if you want to be hip, you use an in-process "Event Aggregator" (your pub/sub) for this purpose. I'd probably be using Durandal for this app anyway and it has an event aggregator in the box.
Honestly, it's not that complicated to use and importing/exporting entities among managers is a ... ahem ... breeze. Well worth it compared to refreshing the Jobs List every time you return to it (although you'll want a "refresh button" too because OTHER users could be adding/changing those Jobs). You retain plenty of Breeze benefits: querying, validation, change-tracking, batch saves, entity navigation (those reference lists work "for free" in Breeze).
As a refinement, I don't know that I would automatically destroy the JobEditor view/viewmodel/manager when I returned to the JobsView. In my experience, people often return to the same Job that they just left. I might hold on to a view so you could go back and forth quickly. But now I'm getting tricky.

Sharing an large array with all users on a rails app

I have inherited an app that generates a large array for every user that visit the app. I recently discovered that it is identical for nearly all the users!!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea but a few engineers at work feel as though it does not deserve to be a full model. 97% of the times users will see the same exact thing but 3% of the time users will get a slightly different array (a few elements will have changed).
Any other approaches that I should consider.??..thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex ('require thread') when retrieving/updating the cached data, in case your server setup may use multiple threads. (I don't think that Passenger does, but I have had problems similar to threading problems when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.

Resources