Model with many has_one associations very slow on deleting records - ruby-on-rails

I have a Notification model. Any time certain actions like Comment, Like, Mention, or Follow happen, this table gets a single record added to it. This table is now many millions of records long. The only index I have on it is on user_id, which has been important as I show notifications to the current_user.
On each of these related models, I included the following line
has_one :notification, dependent: :destroy
The issue is that all these actions are reversible, so whenever someone, say, unlikes, I need to destroy the related Notification.
As such Notification#destroy action is very slow now, often taking >10secs! I'm certain this is because each time a destroy happens, it has to look up foreign keys like like_id, comment_id, or mention_id
I can add indexes on all these columns, but I'm concerned given how big this table is and how often it gets INSERTs and DELETEs. Should I be concerned? Is there a better way to structure this?

One possibility is to flip the relationship around: use belongs_to rather than has_one, so that the lookup relies only on the primary key of that table.
Otherwise, you should really have indices on those; it shouldn't be too much of a burden. You'll get hammered on performance if you don't; it may even be locking the database on those deletes.
Use SHOW TABLE STATUS to see index sizes to get an idea of what you might be looking at.
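To see why the missing index hurts, here is a toy model in plain Ruby rather than ActiveRecord. The Notification struct and its like_id field mirror the question; the array scan stands in for a full table scan and the hash stands in for an index on like_id. It only illustrates the lookup cost under those assumptions; it is not how the database is implemented.

```ruby
# Toy model: each notification row carries the foreign key of the like it belongs to.
Notification = Struct.new(:id, :user_id, :like_id)

rows = (1..100_000).map { |i| Notification.new(i, i % 1_000, i) }

# Without an index on like_id: rows are examined one by one until a match is found.
def find_by_scan(rows, like_id)
  rows.find { |row| row.like_id == like_id }
end

# With an index on like_id: a single hash lookup. The hash is built once here;
# a real index is maintained by the database on every INSERT and DELETE.
index_on_like_id = rows.each_with_object({}) { |row, idx| idx[row.like_id] = row }

def find_by_index(index, like_id)
  index[like_id]
end

scanned = find_by_scan(rows, 99_999)
indexed = find_by_index(index_on_like_id, 99_999)
puts scanned.id == indexed.id
```

Both lookups find the same row; the difference is that the scan touches almost every row first, which is roughly what each destroy does today on the unindexed foreign keys.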

Related

how to avoid empty fields in a database

Frustrated with the Active Record Reputation gem, which I found very buggy, I'm trying to make my own reputation system for a Rails app. It's very primitive. I created a Contribution resource with a user_id and a value field, with an association between User.rb and Contribution.rb. Every time a user contributes to the app in some way, they get some points. If they ask a question, these lines get included in the create action of the Questions controller.
@contribution = current_user.contributions.build({:value => 3})
@contribution.save
If a user edits some Tags on the site, I do the same thing to reward superusers for their administrative work
@contribution = current_user.contributions.build({:value => 2})
@contribution.save
It then becomes very easy to calculate a user's total reputation.
One problem with this is that, in an imaginary world where users care about this app and their reputation, it would be very easy to game the system. For example, a user could just keep updating the categories or tags, and every time they do so they get 2 more points. Therefore, I wanted to somehow record what type of action the user did.
Right now, all of the work users can earn points for is somehow associated with a Question.rb. However, they get points for updating Tags, updating Categories, upvoting other people's answers, etc., so merely storing the question_id in the contributions model wouldn't be sufficient.
Based on what I told you, can you give me some idea how I might build out the Contributions resource in order to accomplish what I want?
For example, I thought of one way of doing it that would have left a lot of null fields in my database, so I assumed it wasn't a good way. I could add a question_id and several boolean fields such as 'answering_question' 'updating_category' 'updating_tags' and each time an action is performed, record with a 'true' whether, for example, 'updating_category' is being performed. However, as mentioned, if I start rewarding lots of different types of contributions, there's going to be a lot of columns in each row that aren't being used.
I'm not sure if that's a 'real problem' (I've read about it but I'm not sure how necessary it is to avoid), or if there's a better way of recording what type of activity each user is engaging in to earn points.
some of the current associations
User has_many :answers
Question.rb has_many :categories
Question.rb has_many :tags
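One way to avoid the unused boolean columns described above is to store a single action name per contribution and refuse to award the same action twice for the same question. The sketch below is plain Ruby, not ActiveRecord; the POINTS table, the ContributionLedger class and the action names are all made up for illustration.

```ruby
require "set"

# Hypothetical point values per action; the names are illustrative only.
POINTS = { "ask_question" => 3, "update_tags" => 2, "update_category" => 2 }.freeze

Contribution = Struct.new(:user_id, :action, :question_id, :value)

class ContributionLedger
  def initialize
    @contributions = []
    @seen = Set.new
  end

  # Records a contribution unless this user already earned points for the
  # same action on the same question. Returns the record, or nil if refused.
  def award(user_id, action, question_id)
    key = [user_id, action, question_id]
    return nil if @seen.include?(key)

    @seen << key
    contribution = Contribution.new(user_id, action, question_id, POINTS.fetch(action))
    @contributions << contribution
    contribution
  end

  # Total reputation is the sum of all recorded values for the user.
  def reputation(user_id)
    @contributions.select { |c| c.user_id == user_id }.sum(&:value)
  end
end

ledger = ContributionLedger.new
ledger.award(1, "ask_question", 10)
ledger.award(1, "update_tags", 10)
ledger.award(1, "update_tags", 10) # a repeat edit earns nothing
puts ledger.reputation(1)
```

In a database this would presumably map to one string column for the action plus a unique index on (user_id, action, question_id), so no row carries columns it does not use.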
For my Rails application I am using the thumbs_up gem, which I find better than the Active Record Reputation System gem; it's simpler.
https://github.com/bouchard/thumbs_up

Rails delete records into separate table

In Rails, when a record is to be deleted, I want to maintain a separate table for such deleted records (that in structure would be analogous to the former).
One way to achieve this would be to obviously copy the structure, validations and associations from the first model and paste it into the deleted items model. This would, however, result in a lot of code redundancy and is not a scalable solution.
Is there a way to achieve this in Rails without much (or any) code redundancy, or a solution that might be more scalable than the one mentioned above?
I am using Ruby 1.9.3-p125 and Rails 3.2.
UPDATE
I did consider using an additional is_deleted column in the table, however, I decided against it because I didn't want this table to get too big and messy with deleted posts. I don't intend to really access these deleted posts - these are merely stored for record-keeping or archival purposes. Adding this column would also make accessing this table slower and more importantly, I am afraid that I may miss the check is_deleted == false in some SQL condition somewhere - even if I include this check in the default_scope of the model.
It is a good idea to move them to a separate table, so that your primary table has fewer records and performance does not degrade over time.
Use an ActiveRecord callback for deletion, i.e.
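before_destroy :move_to_trash
.
.
def move_to_trash
  # Copy this record's attributes (not the object itself) into the trash table.
  # Assumes Trash has the same columns and allows them to be mass-assigned.
  Trash.create!(attributes.except("id"))
end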
In this way, when a record is deleted, its copy will be created in Trash table.
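The archive-then-delete flow can be tried out without a database. The sketch below is plain Ruby: Store, POSTS and TRASH are hypothetical in-memory stand-ins for the two tables, and Post#destroy plays the role of the before_destroy callback. The deleted_at timestamp is an extra assumption, not something the answer asked for.

```ruby
# In-memory stand-in for a database table, keyed by id.
class Store
  attr_reader :rows

  def initialize
    @rows = {}
  end

  def insert(attributes)
    @rows[attributes.fetch(:id)] = attributes.dup
  end

  def delete(id)
    @rows.delete(id)
  end
end

POSTS = Store.new
TRASH = Store.new

class Post
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
    POSTS.insert(attributes)
  end

  # Mirrors a before_destroy callback: archive first, then delete.
  def destroy
    move_to_trash
    POSTS.delete(attributes[:id])
  end

  private

  # Copies the attribute hash, not the object itself, into the archive.
  def move_to_trash
    TRASH.insert(attributes.merge(deleted_at: Time.now))
  end
end

post = Post.new(id: 1, title: "Hello")
post.destroy
puts POSTS.rows.empty?
puts TRASH.rows[1][:title]
```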
Well, basically you want to keep the records and not throw them away. So you may want to just mark them "deleted" and tweak the logic in your code to not consider those records while retrieving them.
Add a 'deleted' column in your original table. Set the default scope of the model to exclude deleted records.

Dynamically creating new Active Record models and database tables

I am not sure exactly what I should name this question. I just started server-side programming and I need some help.
All the tutorials I have read so far on RoR deal with creating a pre-defined table and with pre-defined fields (id, name, email, etc etc). They use ActiveRecord as base class and saving to db is handled automatically by superclass.
What I am trying to program is something that allows user-defined table with fields. So think of this way. The web UI will have an empty table, the user will name the table, and add columns (field), and after that, add rows, and then later save it. How would I implement this? I am not asking for details, just an overview of it. As I said, all the tutorials I have read so far deal with pre-defined tables with fields where the ActiveRecord subclass is predefined.
So in a nutshell, I am asking, how to create tables in db on runtime, and add fields to the tables.
Hope I was clear; if not, please let me know and I will try to elaborate a bit more.
Thanks.
Unless you're building a DB administration tool (and even maybe then), allowing the user direct access to the database layer in the way you're suggesting is probably a bad idea. Apart from issues of stability and security, it'll get really slow if your users are creating lots of tables.
For instance, if you wanted to search for a certain value across 100 of your users' tables, you'd have to run 100 separate queries. The site would get slower with every user table that was created.
A saner way to do it might be to have a Table model like this
class Table < ActiveRecord::Base
  has_many :fields
  has_many :rows
end
Every table would have fields attached to it, and rows to store the corresponding data (which would be encoded somehow).
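As a rough illustration of "fields as data, rows encoded somehow", here is a plain Ruby sketch with no database behind it. UserTable is a made-up class, and JSON is just one possible encoding for the row data.

```ruby
require "json"

# A user-defined table: the field names are data, not database columns.
class UserTable
  attr_reader :name, :fields, :rows

  def initialize(name, fields)
    @name = name
    @fields = fields
    @rows = [] # each row is stored as a JSON string
  end

  # Rejects values for fields the user never defined, then encodes the row.
  def add_row(values)
    unknown = values.keys - fields
    raise ArgumentError, "unknown fields: #{unknown.join(', ')}" unless unknown.empty?

    @rows << JSON.generate(values)
  end

  # Decodes every stored row back into a hash.
  def records
    rows.map { |encoded| JSON.parse(encoded) }
  end
end

table = UserTable.new("contacts", %w[name email])
table.add_row("name" => "Ada", "email" => "ada@example.com")
puts table.records.first["name"]
```

The user can add or rename fields without any schema change, at the cost of the database not being able to index or validate the encoded values on its own.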
However, as @Aditya rightly points out, this is not really beginner stuff!
I agree with the previous answers, generally speaking. It's not clear from your question why you want to create a table at runtime, and it's not obvious what the advantage of doing this would be. If you are just trying to store data that seems to fit into a table with rows and columns, why not just store it as an array in a field of your user table? If your user is allowed to create many tables, then you could have something like
class User < ActiveRecord::Base
  has_many :tables
end
and then each table might have a field to store a serialized array. Or you could go with Alex's suggestion - the best choice really depends on what you are going to do with the data, how often it changes, whether you need to search it and so on ...
You can create a table, as shown in the tutorials, that stores the names of the tables and columns your users want. Then you can have a worker (which can be built using Redis and Resque; here is a simple tutorial on Resque and Redis) run a migration that creates the new table as soon as a new entry is made in that table. Write the migration with variables and use params to fill them in. Tell me if you have questions on this.

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.
This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < ActiveRecord::Base
  has_many :reports
  has_many :criteria
end
class Report < ActiveRecord::Base
  belongs_to :company
  has_many :criteria_values
  has_many :criteria, :through => :criteria_values
end
class Criteria < ActiveRecord::Base # should be Criterion but whatever
  belongs_to :company
  has_many :criteria_values
  # one attribute 'name' (or 'type' and you can mess with STI)
end
class CriteriaValue < ActiveRecord::Base # singular, so has_many :criteria_values resolves to it
  belongs_to :report
  belongs_to :criteria
  # one attribute 'value'
end
This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < ActiveRecord::Base
  has_many :reports
  serialize :criteria
end
class Report < ActiveRecord::Base
  belongs_to :company
  serialize :criteria_values
  def criteria
    company.criteria
  end
  # custom code here to validate that criteria_values correspond to criteria etc.
end
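The "custom code here to validate" part could look roughly like the following. This is a plain Ruby sketch, assuming the company's criteria are stored as an array of names and the report's criteria_values as a hash keyed by those names; criteria_errors is a made-up helper, not part of ActiveRecord.

```ruby
# Returns a list of error messages; an empty list means the report is valid.
def criteria_errors(company_criteria, criteria_values)
  missing = company_criteria - criteria_values.keys
  extra = criteria_values.keys - company_criteria

  errors = []
  errors << "missing values for: #{missing.join(', ')}" unless missing.empty?
  errors << "unknown criteria: #{extra.join(', ')}" unless extra.empty?
  errors
end

company_criteria = %w[performance speed]
puts criteria_errors(company_criteria, "performance" => 4, "speed" => 5).empty?
puts criteria_errors(company_criteria, "performance" => 4, "cost" => 2).inspect
```

Inside the Report model, a check like this would typically be called from a validation method that adds each message to the record's errors.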
Related to that is a rather clever way of serializing at least the criteria (and maybe the values, if they were all boolean): using bit fields. This gives you more or less easy migrations (hard to delete and modify, but easy to add) and searchability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
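For readers who have not met bit fields before, the idea can be sketched in plain Ruby without any plugin. The code below does not use Flag Shih Tzu's API; CRITERIA_BITS and the two helper methods are invented for illustration, with the criterion names borrowed from the question.

```ruby
# Each criterion gets one bit; new criteria are appended, never reordered.
CRITERIA_BITS = {
  performance: 1 << 0,
  speed: 1 << 1,
  cost_effectiveness: 1 << 2
}.freeze

# Packs a list of criterion names into a single integer.
def encode_criteria(names)
  names.reduce(0) { |mask, name| mask | CRITERIA_BITS.fetch(name) }
end

# Unpacks an integer back into the criterion names it contains.
def decode_criteria(mask)
  CRITERIA_BITS.select { |_name, bit| (mask & bit) != 0 }.keys
end

mask = encode_criteria(%i[performance cost_effectiveness])
puts mask
puts decode_criteria(mask).inspect
```

Because the whole set lives in one integer column, adding a criterion only means assigning the next free bit, which is why adding is easy while deleting or reordering is not.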
Variable columns (eg. crit1, crit2, etc.).
I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this: you'll have to figure out what crit1 means by yourself, and if you want to add validations on these fields, that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.
Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.
Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).

Best way to handle multiple tables to replace one big table in Rails? (e.g. 'todo_items1', 'todo_items2', etc., instead of just 'todo_items')?

Update:
Originally, this post was using Books as the example entity, with Books1, Books2, etc. being the separated tables. I think this was a bit confusing, so I've changed the example entity to be "private todo_items created by a particular user."
This kind of makes Horace and Ryan's original comments seem a bit off, and I apologize for that. Please know that their points were valid when it looked like I was dealing with books.
Hello,
I've decided to use multiple tables for an entity (e.g. todo_items1, todo_items2, todo_items3, etc.), instead of just one main table which could end up having a lot of rows (e.g. just todo_items). I'm doing this to try and to avoid a potential future performance drop that could come with having too many rows in one table.
With that, I'm looking for a good way to handle this in Rails, mainly by trying to avoid loading a bunch of unused associations for each User object. I'm guessing that others have done something similar, so there are probably a few good tips/recommendations out there.
(I know that I could use a partition for this, but, for now, I've decided to go the 'multiple tables' route.)
Each user has their todo_items placed into a specific table. The actual "todo items" table is chosen when the user is created, and all of their todo_items go into the same table. The data in their todo items collection is private, so when it comes time to process a user's todo_items, I'll only have to look at one table.
One thing I don't particularly want to have is a bunch of unused associations in the User class. Right now, it looks like I'd have to do the following:
class User < ActiveRecord::Base
  # has_many takes one association name per call
  has_many :todo_items1
  has_many :todo_items2
  has_many :todo_items3
  has_many :todo_items4
  has_many :todo_items5
end
class TodoItems1 < ActiveRecord::Base
  belongs_to :user
end
class TodoItems2 < ActiveRecord::Base
  belongs_to :user
end
class TodoItems3 < ActiveRecord::Base
  belongs_to :user
end
The thing is, for each individual user, only one of the "todo items" tables would be usable/applicable/accessible since all of a user's todo_items are stored in the same table. This means only one of the associations would be in use at any time and all of the other has_many :todo_itemsX associations that were loaded would be a waste.
For example, with a user.id of 2, I'd only need todo_items3.find_by_text('search_word'), but the way I'm thinking of setting this up, I'd still have access to todo_items1, todo_items2, todo_items4 and todo_items5.
I'm thinking that these "extra associations" add extra overhead and make each User object's size in memory much bigger than it has to be. Also, there's a bunch of stuff that Ruby/Rails is doing in the background which may cause other performance problems.
I'm also guessing that there could be some additional method call/lookup overhead for each User object, since it has to load all of those associations, which in turn creates all of those nice, dynamic model accessor methods like User.find_by_something.
I don't really know what Ruby/Rails does internally with all of those has_many associations though, so maybe it's not so bad. But right now I'm thinking that it's really wasteful, and that there may just be a better, more efficient way of doing this.
So, a few questions:
1) Is there some sort of special Ruby/Rails methodology that could be applied to this 'multiple tables to represent one entity' scheme? Are there any 'best practices' for this?
2) Is it really bad to have so many unused has_many associations for each object? Is there a better way to do this?
3) Does anyone have any advice on how to abstract the fact that there's multiple "todo items" tables behind a single todo_items model/class? For example, so I can call todo_items.find_by_text('search_phrase') instead of todo_items3.find_by_text('search_phrase').
Thank you!
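For question 3, the routing itself is small enough to sketch in plain Ruby. Everything here is an assumption for illustration: the todo_shard attribute stored on the user, the TABLE_COUNT of five, and the modulo rule used to pick a table at creation time. The post only says that the table is chosen when the user is created.

```ruby
TABLE_COUNT = 5 # assumed number of todo_items tables

# Hypothetical user record; todo_shard is chosen once, when the user is created.
User = Struct.new(:id, :todo_shard)

# One possible assignment rule: spread users evenly across tables 1..TABLE_COUNT.
def assign_shard(user_id)
  (user_id % TABLE_COUNT) + 1
end

# The single place that knows which physical table a user's items live in.
def todo_table_for(user)
  "todo_items#{user.todo_shard}"
end

user = User.new(2, assign_shard(2))
puts todo_table_for(user)
```

Keeping the table choice behind one method means the rest of the code never mentions a numbered table, whichever way the association question is settled.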
This is not the way to scale.
It would probably be better going with master-slave replication and proper indexing (besides primary key) on fields such as "title" and/or "author" if that's what you're going to be looking up books based on. Having it in n-tables, how are you going to know the best place to go looking for the book the user is after? Are you going to go looking through 4 tables?
I agree with Horace: " don't try to solve a performance issue before you have figures to prove it." I suggest, however, that you should really look into adding indexes to your table if you want lookups to be fast. If they aren't fast, then tell us how they aren't fast and we will tell you how to make it go ZOOOOOM.
