I'm developing an application in Rails 3.2 with MongoDB as the database. I am using the Mongoid gem.
I want to track all the changes to the database during runtime, but I have no idea how to do it.
Any clues on this?
There are several approaches you could use to track changes to your data, depending on whether you are just interested in monitoring current activity or need to create some sort of transactional history.
In rough order of least-to-most effort:
1) Enable the MongoDB Query Profiler
If you enable the query profiler at level 2, it will collect profiling data for all operations (reads as well as writes) against a specific database. You can also set this in your configuration options to change the profiling default for all databases on mongod startup. The profile data is saved in a capped collection per database, so it only contains a recent window of queries. Profiling can also be toggled at runtime, so you can easily enable or disable it as required.
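For example, with Mongoid 3's Moped session (an assumption; adjust for whatever driver you use), toggling the profiler and peeking at recent entries could look roughly like this; the slowms value is illustrative:

# Sketch: turn on full profiling for the current database (assumes Mongoid 3 / Moped)
session = Mongoid.default_session
session.command(profile: 2, slowms: 100)  # level 2 = profile all operations

# Recent profile entries live in the capped system.profile collection
session[:"system.profile"].find.sort(ts: -1).limit(5).each do |entry|
  puts "#{entry['op']} #{entry['ns']} #{entry['millis']}ms"
end

session.command(profile: 0)  # turn profiling back off when done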
2) Add Mongoid callbacks to your application
Add appropriate logic to Mongoid callbacks such as after_create, after_save, after_upsert, or after_destroy, depending on what information you are trying to capture.
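For instance, a model could log every write to the Rails log or to an audit collection (a sketch; the Result model and field are illustrative, and depending on your Mongoid version you may need previous_changes instead of changes):

class Result
  include Mongoid::Document
  field :url, type: String

  after_save    :log_change
  after_destroy :log_removal

  private

  def log_change
    # `changes` comes from dirty tracking
    Rails.logger.info "[audit] saved #{self.class.name} #{id}: #{changes.inspect}"
  end

  def log_removal
    Rails.logger.info "[audit] destroyed #{self.class.name} #{id}"
  end
end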
3) Create a tailable cursor on the MongoDB oplog
If you run MongoDB as part of a replica set (or with the --replSet option), it creates a capped collection called the oplog (operations log). You can use a tailable cursor to follow changes as they are committed to the oplog. The oplog details all changes to the database, and is the mechanism MongoDB uses for replication.
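A minimal tailing loop with the classic 1.x Ruby driver (an assumption; use whatever driver you have at hand) could look like this:

# Sketch: follow the oplog of a replica set member
require 'mongo'  # classic 1.x driver assumed

oplog = Mongo::Connection.new('localhost', 27017).db('local')['oplog.rs']

cursor = Mongo::Cursor.new(oplog, :tailable => true)
loop do
  if doc = cursor.next_document
    # 'op' is the operation type (i/u/d), 'ns' the namespace, 'o' the document
    puts "#{doc['op']} #{doc['ns']} #{doc['o'].inspect}"
  else
    sleep 1  # nothing new yet; the tailable cursor stays open
  end
end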
I am not familiar with MongoDB or Mongoid, but here is my idea for someone using MySQL (I hope it gives you some clue):
First you take a snapshot of your database (using a tool like mysqldump).
Then at certain intervals, you check (audit) for those records which have an updated_at value greater (later) than the time you took the snapshot, or your last audit time.
Now, you have two versions of your data, which you can check for changes.
As I said, this is just an idea, and it needs to be more refined. I hope it gives you a clue.
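In ActiveRecord terms, the audit step could look roughly like this (a sketch; Order, last_audit_time, and snapshot_taken_at are illustrative names):

# Records touched since the snapshot or the last audit pass
last_checked = last_audit_time || snapshot_taken_at

Order.where('updated_at > ?', last_checked).find_each do |record|
  # compare `record` against its copy in the snapshot and store any differences
end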
I'm working with a real-time editor (https://beefree.io/) in my Rails app. Multiple users can edit a document and it autosaves every 10 seconds, returning a JSON object representing the document along with a version number that increments with every change. The JSON object is being saved to a table associated with a Document model.
I'm curious if optimistic locking could be used to prevent a user's old changes from overwriting newer ones when a save request doesn't complete in time. The idea would be to use the version number that the editor provides as the lock_version column value. Can I pass an arbitrary value to lock_version like that? Or is the database meant to increment the value itself?
Another issue I have is that I'm saving to a table that has other columns that I don't want to be locked by this lock_version attribute. Can I specifically lock the real-time data column?
I'm curious if optimistic locking could be used to prevent a user's old changes from overwriting newer ones when a save request doesn't complete in time. The idea would be to use the version number that the editor provides as the lock_version column value. Can I pass an arbitrary value to lock_version like that? Or is the database meant to increment the value itself?
With Rails' built-in optimistic locking, Active Record (and not your code) is responsible for incrementing lock_version. You can assign your own lock_version before an update and Rails will use that value in its staleness check, but it will still auto-increment it on any other update to the model.
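For reference, the built-in flow with a client-supplied version looks roughly like this (a sketch; the text column and editor_version parameter are assumptions):

doc = Document.find(params[:id])
doc.lock_version = params[:editor_version]  # the version this client last saw
doc.text = params[:text]

begin
  doc.save!  # UPDATE ... WHERE id = ? AND lock_version = <the value we assigned>
rescue ActiveRecord::StaleObjectError
  # a newer version was saved in the meantime; reject or merge this update
end

But, given the second part of your question...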
Another issue I have is that I'm saving to a table that has other columns that I don't want to be locked by this lock_version attribute. Can I specifically lock the real-time data column?
Locking only for updates to certain columns is not a feature Rails currently supports, though there might be a gem or monkey-patch out there you could accomplish it with. However, given this as well as your need for a custom version number, it would probably be easiest to implement optimistic locking yourself for just this update:
# last_version is what the editor thinks is the most recently saved version
# new_version is the new version number provided by the editor
# this update will only succeed if the document still has version == last_version
Document.where(id: document.id, version: last_version)
  .update_all(version: new_version, text: new_text)
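Note that update_all returns the number of rows it touched, so a zero return value tells you the save was stale (a sketch continuing the snippet above):

updated = Document.where(id: document.id, version: last_version)
  .update_all(version: new_version, text: new_text)
# if updated == 0, someone saved a newer version first; surface that to the editor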
I do want to point out that, while this will prevent out-of-order changes, collaborative editing (two or more people editing the same doc in real time) is still not going to be fully functional. To get a real collaborative editor, you'll need to implement a solution based on either OT (operational transformation) or a CRDT (conflict-free replicated data type). See also:
https://www.aha.io/engineering/articles/how-to-build-collaborative-text-editor-rails
https://github.com/benaubin/rails-collab
I'm working on a Ruby on Rails site.
In order to improve performance, I'd like to build up some caches of various stats so that in the future when displaying them, I only have to display the caches instead of pulling all database records to calculate those stats.
Example:
A User model has_many Comments. I'd like to store in a user cache model how many comments each user has. That way, when I need to display the number of comments a user has made, it's only a simple query against the stats model. Every time a new comment is created or destroyed, the counter is simply incremented or decremented.
How can I build these stats while the site is live? What I'm concerned about is that after I ask the database to count the number of Comments a User has, but before it is able to execute the command to save that count into the stats table, the user might sneak in and add another comment somewhere. That would increment the counter, but it would then be immediately overwritten by the other thread, resulting in incorrect stats being saved.
I'm familiar with ActiveRecord transaction blocks, but as I understand it, those guarantee that all statements succeed or none do, rather than acting as mutex protection for data in the database.
Is it basically necessary to take down the site for changes like these?
Your use case is already handled by Rails; it's called a counter cache. There is a RailsCast on it here: http://railscasts.com/episodes/23-counter-cache-column
Since it is so old, it might be out of date. The general idea is there though.
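In short, the pieces look like this (a sketch, assuming the User/Comment models from the question):

# Migration: add a counter column to users
class AddCommentsCountToUsers < ActiveRecord::Migration
  def change
    add_column :users, :comments_count, :integer, default: 0, null: false
  end
end

class Comment < ActiveRecord::Base
  belongs_to :user, counter_cache: true
end

# Backfill existing users once; this can run while the site is live
User.find_each { |user| User.reset_counters(user.id, :comments) }

Because the counter is incremented and decremented with a single atomic SQL UPDATE on each create/destroy, concurrent comments won't clobber each other the way a read-then-write of the count would.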
It's generally not a best practice to commingle application and reporting logic. Send your reporting data outside the application, either to another database, to log files that are read by daemons, or to some other API that handles the storage particulars.
If all that sounds like too much work, then you don't really want real-time reporting. Assuming you have a backup of some sort (hot or cold), run the aggregations and generate the reports on the backup. That way it doesn't affect the running application, and your data shouldn't be more than 24 hours stale.
FYI, I think I found the solution here:
http://guides.ruby.tw/rails3/active_record_querying.html#5
What I'm looking for is called pessimistic locking, and is addressed in 2.10.2.
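For anyone else landing here, the pessimistic version of the counter update looks roughly like this (a sketch; the column name is illustrative):

User.transaction do
  user = User.lock.find(user_id)  # SELECT ... FOR UPDATE: the row stays locked until commit
  user.comments_count = user.comments.count
  user.save!
end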
I have an application that needs to perform multiple network queries, each of which returns 100 records.
I'd like to keep all the results (several thousand or so) together in a single Memcached record named according to the user's request.
Is there a way to append data to a Memcached record, or do I need to read and write it back and forth and combine the old results with the new ones in my application?
Thanks!
P.S. I'm using Rails 3.2
There's no way to append anything to a memcached key. You'd have to read it in and out of storage every time.
Redis does allow this sort of operation, however; as rubish points out, it has a native list type that lets you push new data onto it. Check out the Redis list documentation for information on how to do that.
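With the redis gem, appending and reading back look roughly like this (the key name and variables are illustrative):

require 'redis'

redis = Redis.new

# RPUSH appends atomically to a native Redis list
redis.rpush("results:#{request_key}", batch_json)

# Later, read every accumulated batch back in one call
all_batches = redis.lrange("results:#{request_key}", 0, -1)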
You can write a class that emulates a list in memcached (which is actually what I did)... but appending to a record isn't an atomic operation, so it will generate errors that accumulate over time (at least in memcached). Besides, it will be very slow.
As pointed out, Redis has native lists, but a list can also be emulated in any NoSQL / key-value storage solution.
I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:
How often is a specific product viewed
Which search phrases typically return specific products in their result list
How often (for specific products) does a search result convert to a view
I would like to aggregate this data and break it down:
By product
By product by week
etc.
I'm wondering what are the cleanest and most efficient strategies for aggregating the data. I can think of a couple but I'm sure there are many more:
Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables.
Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
My concerns are:
I would like to limit the number of moving parts.
I would like to reduce the impact on the database; the fewer round trips and the less extraneous data stored, the better.
In certain scenarios (not listed) I would like the data to be somewhat close to real-time (accurate to the hour may be appropriate)
Does anyone have real-world experience with this, and if so, which approach would you suggest? What are the positives and negatives? If there is a better solution that I am not thinking of, I'd love to hear it...
Thanks
JP
I needed to do something similar in a recent project. We implemented a full audit system in a secondary database that tracks changes to every record in the live DB. Essentially every insert, update, and delete actually updates two records: one in the live DB and one in the audit DB.
Since we have this data in real time in the audit DB, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, and have it carry just the data you need for that report. It duplicates data, but the performance gains are worth it.
As for filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3am; ditto for the weekly and monthly reports, which normally run over weekends or late at night.
Other reports are generated on demand, using mostly the data since the last daily run, so it's not that many records, and once again everything comes from the secondary database.
I agree that you should create a separate database for your statistics; it will reduce the impact on your main database.
You can go with your idea of having staging tables and aggregate tables; that way, if you want near-real-time data you go to the staging tables, and when you want historical data, you go to the aggregates.
Finally, I would recommend you save your statistics with an asynchronous call; that way recording them will not affect your pages' response time.
I suggest that you create a separate database for this. The best way is to use BI techniques; SQL Server ships with separate services for BI.
I'm working on an application that works like a search engine, and all the time it has workers in the background searching the web and adding results to the Results table.
While everything works perfectly, lately I started getting huge response times while trying to browse, edit or delete the results. My guess is that the Results table is being constantly locked by the workers who keep adding new data, which means web requests must wait until the table is freed.
However, I can't figure out a way to lower that load on the Results table and get faster response times for my web requests. Has anyone had to deal with something like this?
The search bots are constantly reading and adding new stuff; they add new results as they find them. I was wondering whether inserting the results in bulk after the search completes would help, or whether it would make things worse since it would take longer.
Anyway, I'm at a loss here and would appreciate any help or ideas.
I'm using RoR 2.3.8 and hosting my app on Heroku with PostgreSQL
PostgreSQL doesn't lock tables for plain reads or writes; thanks to MVCC, readers and writers don't block each other. Start logging your queries and try to find out what is going on. Guessing doesn't help; you have to dig into it.
To check the current activity:
SELECT * FROM pg_stat_activity;
Try the NOWAIT option on your locking queries (e.g., SELECT ... FOR UPDATE NOWAIT) so they fail immediately instead of waiting on a lock. That said, since you're only adding new rows with your background workers, I'd assume there would be no lock conflicts when browsing/editing/deleting.
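In Rails 2.3 you can pass the locking clause through the :lock option, for example (a sketch):

Result.transaction do
  # raises immediately if another transaction holds a lock on this row
  result = Result.find(params[:id], :lock => 'FOR UPDATE NOWAIT')
  result.destroy
end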
You might want to put a cache in front of the database. On Heroku you can use memcached as a cache store very easily.
This will take some load off your database reads. You could even have your search bots update the cache when they add new results, so that you can use a very long expiration time and your frontend Rails app will rarely (if ever) hit the database directly for simple reads.
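Concretely, a minimal Rails 2.3 sketch (the cache key, expiry, and query are illustrative):

# config/environment.rb (or the production environment file)
config.cache_store = :mem_cache_store, 'localhost:11211'

# Serve reads through the cache
results = Rails.cache.fetch("results/#{query_id}", :expires_in => 10.minutes) do
  Result.find(:all, :conditions => { :query_id => query_id })
end

# Search bots can refresh the entry whenever they add new rows
Rails.cache.write("results/#{query_id}", fresh_results, :expires_in => 12.hours)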