I'm working with a real-time editor (https://beefree.io/) in my Rails app. Multiple users can edit a document and it autosaves every 10 seconds, returning a JSON object representing the document along with a version number that increments with every change. The JSON object is being saved to a table associated with a Document model.
I'm curious if optimistic locking could be used to prevent a user's old changes from overwriting newer changes in the case that a save request doesn't complete in time. The idea would be to use the version number that the editor provides as the value of the lock_version column. Can I pass an arbitrary value to lock_version like that? Or is the database meant to increment the value itself?
Another issue I have is that I'm saving to a table that has other columns that I don't want to be locked by this lock_version attribute. Can I specifically lock the real-time data column?
I'm curious if optimistic locking could be used to prevent a user's old changes from overwriting newer changes in the case that a save request doesn't complete in time. The idea would be to use the version number that the editor provides as the value of the lock_version column. Can I pass an arbitrary value to lock_version like that? Or is the database meant to increment the value itself?
With Rails' built-in optimistic locking, Active Record (and not your code) is supposed to be responsible for incrementing lock_version. You might be able to pass in your own lock_version with an update, but Rails will still auto-increment it with any other updates to the model. But, given the second part of your question...
Another issue I have is that I'm saving to a table that has other columns that I don't want to be locked by this lock_version attribute. Can I specifically lock the real-time data column?
Locking only for updates to certain columns is not a feature Rails currently supports, though there might be a gem or monkey-patch out there that accomplishes it. However, given this as well as your need for a custom version number, it would probably be easiest to implement optimistic locking yourself for this one update:
# last_version is the version the editor thinks was most recently saved
# new_version is the new version provided by the editor
# this update only touches the row if the document still has version == last_version
Document.where(id: document.id, version: last_version)
        .update_all(version: new_version, text: new_text)
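Since update_all returns the number of rows it changed, the caller can detect a stale save and reject it. A rough sketch of how a controller might use that return value (the controller context and response handling are assumptions, not part of the code above):

rows_updated = Document.where(id: document.id, version: last_version)
                       .update_all(version: new_version, text: new_text)
if rows_updated.zero?
  # a newer version was saved first; tell the editor to reload before retrying
  head :conflict
else
  head :ok
end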
I do want to point out that, while this will prevent out-of-order changes, collaborative editing (two or more people editing the same doc in real time) is still not going to be fully functional. To get a real collaborative editor, you'll need a solution based on either OT (operational transformation) or a CRDT (conflict-free replicated data type). See also:
https://www.aha.io/engineering/articles/how-to-build-collaborative-text-editor-rails
https://github.com/benaubin/rails-collab
I have a requirement in which locally created events have to be synced with the server synchronously. To explain this briefly, consider this scenario: two events, A and B, occurred in the offline app, where A happened before B. In this case B should sync only after A has completed its sync.
To handle this I must have an extra attribute in my entity to identify which event was created earlier. This attribute can hold either the created time or an incrementing number.
This is where I need some clarification.
Solution 1: Based on created time
If I maintain the created time in that attribute, will it behave properly in the scenario below?
Let's say I create an event “A” today, then I change my device's date to the previous day, come back to my app, and create another event “B”. Which one will be considered earlier? If the app still reports “B” as the most recently inserted object then there is no issue and I can stick with this solution; otherwise I need to move to some other solution. Is there an optimized way to determine insertion order while relying on created time?
Solution 2: Based on an incrementing number
I believe Core Data does not provide an auto-incrementing id, so we need to maintain one manually. If so, what would be the better approach for keeping track of the maximum assigned value? Would it be good to store the maximum assigned value in NSUserDefaults? Whenever the app creates an event, the value would be fetched from NSUserDefaults, incremented by 1, and the result assigned to the event. Is this approach a proper one? If not, please guide me to a better solution.
There is no auto-incrementing number built into Core Data, as that is more of a business-logic concern. However, it is not difficult to implement.
You can store the last number used in the metadata of the persistent store. During your insert, simply increment that number and assign it to each entity as you go. When you are done inserting, write the updated number back to the metadata.
That is how Core Data updates its own insert numbers for the objectID.
I was reading a few guides on caching in Rails but I am missing something important that I cannot reconcile.
I understand the concept of auto-expiring cache keys, and how they are built from the model's updated_at attribute, but I cannot figure out how Rails knows what updated_at is without first doing a database look-up (which is exactly what the cache is partly designed to avoid).
For example:
cache @post
Will store the result in a cache key something like:
posts/2-20110501232725
As I understand auto-expiring cache keys in Rails, if @post is updated (and the updated_at attribute changes), then the key will change. But what I cannot figure out is how subsequent look-ups of @post will know how to build the key without doing a database look-up to GET the new updated_at value. Doesn't Rails have to KNOW what @post.updated_at is before it can access the cached version?
In other words, if the key contains the updated_at time stamp, how can you look-up the cache without first knowing what it is?
In your example, you can't avoid hitting the database. However, the intent of this kind of caching is to avoid doing additional work that is only necessary to do once every time the post changes. Looking up a single row from the database should be extremely quick, and then based on the results of that lookup, you can avoid doing extra work that is more expensive than that single lookup.
You haven't specified exactly, but I suspect you're doing this in a view. In that case, the goal is to avoid rebuilding a fragment that won't change until the post does. Iterating over the post's attributes and associations and emitting the markup to render them can be expensive, depending on the work being done, so given that you already have the post loaded, being able to skip that work is the gain achieved in this case.
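For example, wrapping the expensive part of the view in a cache block looks roughly like this (a sketch; the view path, attributes, and comment partial are assumptions):

<%# app/views/posts/show.html.erb %>
<% cache @post do %>
  <%# this block is only rendered when the fragment is missing, i.e. after @post's updated_at (and therefore the cache key) changes %>
  <h1><%= @post.title %></h1>
  <%= simple_format @post.body %>
  <%= render partial: "comments/comment", collection: @post.comments %>
<% end %>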
As I understand your question, you're trying to figure out the black magic of how caching works. Good luck.
But I think the underlying question is how do updates happen?
A cache element should have a logical key based on some part of the element, e.g. a compound key or a key name based on the id of the item. You build this key to fetch the cache fragment when you need it. The key is always the same; otherwise you can't have certainty that you're getting what you want.
One underlying assumption of caching is that the cached value is transient, i.e. if it goes away or is out of date it's not a big deal. If it is a big deal, then caching isn't the solution to your problem. Caching is meant to alleviate high load, i.e. a lot of traffic hitting the same thing in your database, similar to a weblog where 1,000,000 people might be reading a particular blog post. It's not meant to speed up your database; that is done through SQL optimization, sharding, etc.
If you use Dalli as your cache store then you can set the expiry.
https://stackoverflow.com/a/18088797/793330
http://www.ruby-doc.org/gems/docs/j/jashmenn-dalli-1.0.3/Dalli/Client.html
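For example, a default TTL can be set when configuring the Dalli store, and overridden per entry. A sketch using the dalli gem's store name from that era (host, namespace, and times are placeholders):

# config/environments/production.rb
config.cache_store = :dalli_store, 'localhost:11211', { namespace: 'myapp', expires_in: 1.hour }

# or per entry:
Rails.cache.write('posts/count', Post.count, expires_in: 10.minutes)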
Essentially a caching loop in Rails AFAIK works like this: build the key, try to read the fragment from the cache store; on a hit, return the cached fragment; on a miss, generate the fragment and write it to the cache under that key.
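A minimal sketch of that read-or-write loop using Rails.cache.fetch (the key and the block's contents are illustrative):

# read-through caching: returns the cached value on a hit; on a miss it runs
# the block, writes the result to the cache under the key, and returns it
Rails.cache.fetch([post, "expensive_stats"], expires_in: 10.minutes) do
  post.comments.count   # stand-in for whatever expensive work you're avoiding
end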
So to answer your question:
The key gets updated when you update the record, as an operation tied to the update of the post. You can also set an expiry time, which essentially accomplishes the desired result by forcing a fresh lookup and cache write. As far as the cache is concerned, it's always reading the cache element that corresponds to the key. If that element gets updated, then it will read the updated element, but it's not the cache's responsibility to check against the database.
What you might be looking for is something like a prepared statement (see Tenderlove on Prepared Statements), or a faster datastore such as a less safe Postgres (i.e. tuned like NoSQL, without ACID) or a NoSQL database.
Also do you have indexes in your database? DB requests will be slow without proper indexes. You might just need to "tune" your database.
Also, there is a wonderful gem called cells which allows you to do a lot more with your views, including faster rendering than partials, at least in my experience. It also has some caching functions.
I'm developing an application in Rails 3.2 with MongoDB as the database, using the mongoid gem.
I want to track all the changes to the database during runtime, but I have no idea how to do it.
Any clues on this?
There are several approaches you could use to track changes to your data, depending on whether you are just interested in monitoring current activity or need to create some sort of transactional history.
In rough order of least-to-most effort:
1) Enable the MongoDB Query Profiler
If you enable the query profiler at level 2, it will collect profiling data for all operations (reads as well as writes) on a specific database. You can also set this in your configuration options to change the profiling default for all databases at mongod startup. The profile data is saved in a capped collection per database and will only contain a recent snapshot of queries. The profiling level can also be changed at runtime, so you can easily enable or disable it as required.
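From Ruby, the profiling level can be flipped at runtime with the profile database command. A sketch assuming the official mongo driver (older Mongoid 3/Moped setups expose a similar command method on the session; the host and database name are placeholders):

require 'mongo'

# level 2 = profile all operations on this database
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'myapp_development')
client.database.command(profile: 2)

# recent profiled operations land in the capped system.profile collection
client['system.profile'].find.sort('$natural' => -1).limit(5).each do |op|
  puts op.inspect
end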
2) Add Mongoid callbacks to your application
Add appropriate logic to Mongoid callbacks such as after_create, after_save, or after_upsert, depending on what information you are trying to capture.
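A minimal sketch of this option: a concern you could mix into any Mongoid model to log every save and destroy (the module name and log format are assumptions, not a Mongoid API):

# app/models/concerns/change_tracking.rb
module ChangeTracking
  extend ActiveSupport::Concern

  included do
    after_save    :log_change
    after_destroy :log_removal
  end

  private

  def log_change
    Rails.logger.info "[audit] #{self.class.name} #{id} saved: #{attributes.inspect}"
  end

  def log_removal
    Rails.logger.info "[audit] #{self.class.name} #{id} destroyed"
  end
end

# in a model: include Mongoid::Document, then include ChangeTracking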
3) Create a tailable cursor on the MongoDB oplog
If you run MongoDB as part of a replica set (or with the --replSet option), it creates a capped collection called the oplog (operations log). You can use a tailable cursor to follow changes as they are committed to the oplog. The oplog details all changes to the database, and is the mechanism MongoDB uses for replication.
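A rough sketch of tailing the oplog from Ruby (assumes a replica set and the official mongo driver; the namespace filter is illustrative):

require 'mongo'

# the oplog lives in the 'local' database on each replica set member
local = Mongo::Client.new(['127.0.0.1:27017'], database: 'local')
cursor = local['oplog.rs'].find({ 'ns' => 'myapp_development.documents' },
                                cursor_type: :tailable_await)
cursor.each do |entry|
  # entry['op'] is 'i' (insert), 'u' (update) or 'd' (delete)
  puts "#{entry['op']} #{entry['ns']}: #{entry['o'].inspect}"
end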
I am not familiar with MongoDB or Mongoid, but here is my idea for someone using MySQL (I hope it gives you some clue):
First you take a snapshot of your database (using a tool like mysqldump).
Then at certain intervals, you check (audit) for records whose updated_at value is greater (later) than the time you took the snapshot, or your last audit time.
Now, you have two versions of your data, which you can check for changes.
As I said, this is just an idea, and it needs to be more refined. I hope it gives you a clue.
I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A User model has_many Comments. I'd like to store in a user cache model how many comments they have. That way, when I need to display the number of comments a user has made, it's only a simple query of the stats model. Every time a new comment is created or destroyed, it simply increments or decrements the counter.
How can I build these stats while the site is live? What I'm concerned about is that after I ask the database to count the number of Comments a User has, but before it executes the command to save that count into the stats table, the user might sneak in and add another comment somewhere. That would increment the counter, which would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with the ActiveRecord transactions blocks, but as I understand it, those are to guarantee that all or none succeed as a whole, rather than to act as mutex protection for data on the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails. It's called a counter cache. There is a RailsCast on it here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
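In case it helps, the basic setup looks like this (the comments_count column name follows the Rails convention; the backfill step addresses building the counts while the site is live):

# in a migration
add_column :users, :comments_count, :integer, default: 0, null: false

# app/models/comment.rb
class Comment < ActiveRecord::Base
  # Rails increments/decrements users.comments_count atomically
  # (UPDATE ... SET comments_count = comments_count + 1) on create/destroy
  belongs_to :user, counter_cache: true
end

# backfill existing users once, without racing live traffic:
User.find_each { |u| User.reset_counters(u.id, :comments) }

# reading the count never touches the comments table:
user.comments_count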
It's generally not a best practice to co-mingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really want real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports against the backup. That way it doesn't affect the running application, and your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
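For anyone landing here, that pattern looks roughly like this in Active Record (the model and counter column come from the question; the update itself is only an illustration):

User.transaction do
  # SELECT ... FOR UPDATE: the row stays locked until the transaction commits,
  # so no other process can update it between the read and the write
  user = User.lock.find(user_id)
  user.update_column(:comments_count, user.comments.count)
end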
I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies that arise if invoice records are updated without the corresponding cardholder record being updated), or whether it makes more sense to simply calculate the current cardholder status from the data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
The storage of calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If the fact that this data is not stored becomes an issue for you, then that is the time to refactor and store the value in the database.
Storing calculated values, particularly those that can affect multiple tables, is generally a bad idea for the reasons you have mentioned.
When/if you do refactor and store the value in the DB, then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
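A minimal sketch of the calculated approach, using hypothetical model and association names plus the thresholds from the question (invoice.paid? is assumed to exist):

class Cardholder < ActiveRecord::Base
  has_many :invoices

  # derives the status from the most recent invoice on every call
  def status
    invoice = invoices.order("due_date DESC").first
    return "in good standing" if invoice.nil? || invoice.paid?

    days_overdue = (Date.current - invoice.due_date).to_i
    if days_overdue <= 0
      "in good standing"   # invoiced but not yet due
    elsif days_overdue <= 30
      "in arrears"
    else
      "canceled"
    end
  end
end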
Recently I had a similar decision to make, and I decided to store the status as a field in the database. This is because I wanted to reduce SQL queries and it looked simpler. I chose that approach because I need to read this status very often, and calculating it is (at least in my case) a bit complicated.
A possible problem with this is that it can get out of sync, so I added some after_save and after_destroy callbacks to the child model to keep it synchronized. And of course if somebody modified the database in some other way, it would cause problems.
You can write a simple rake task that checks all the statuses and corrects them where needed. You can run it from cron so you don't have to worry about it.
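A rough sketch of such a task (the model name and the calculate_status method are assumptions; see the earlier answers for how the status itself might be derived):

# lib/tasks/cardholders.rake
namespace :cardholders do
  desc "Recalculate every cardholder's status and fix any stale values"
  task sync_statuses: :environment do
    Cardholder.find_each do |cardholder|
      correct = cardholder.calculate_status   # hypothetical business-rule method
      cardholder.update_column(:status, correct) if cardholder.status != correct
    end
  end
end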