Validate FK now, or recover later? - ruby-on-rails

What's the more accepted Rails approach?
Validate that a foreign key exists on creation/update
or
"Damage control" when we want to use the non-existent foreign key?
Initial validation requires more resources on row creation/updating, and may even be redundant when I'm creating rows systematically in my code (i.e. not user generated). However, it lets me write my business logic smoothly, without fear of running into a bad foreign key.
Damage control, on the other hand, allows for quick creation and updating, but of course requires more checks and recovery in my logic.
Can you think of any other pros and cons? Perhaps there are even more alternatives than just these two doctrines.
How do experienced Rails developers handle this problem?

If by "validate ahead of time" you mean doing an additional database request just to validate the existence of a key, I would say don't do it. We use FKs all over and have almost never run into problems, especially not on a create or update. If it does fail, that's probably a good thing: unlike a validation error, it isn't something the user can do anything about, and if you just tried to add an association to a no-longer-existing object, that seems like a pretty good reason for an error to me.
If you have particularly volatile entities, such that an instance might frequently have been deleted between the time it is instantiated and the time you try to save an FK reference to it, then maybe in that particular case it is worth it, but as a general guide I would not.
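To make the tradeoff concrete, here is a minimal sketch of the two approaches (the Comment/Post model names are hypothetical, and rescuing ActiveRecord::InvalidForeignKey assumes a recent Rails version plus an actual database-level FK constraint):

class Comment < ActiveRecord::Base
  belongs_to :post
  # Approach 1: validate up front. Checking presence of the association
  # loads the Post row on every save: the extra request discussed above.
  validates :post, presence: true
end

# Approach 2: "damage control". Skip the validation, let the database-level
# FK reject the bad write, and rescue where you can actually recover.
begin
  comment.save!
rescue ActiveRecord::InvalidForeignKey
  # the post was deleted between instantiation and save; recover here
end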
I also often use FKs between tables whose rows are removed with logical deletes (à la acts_as_paranoid: set a deleted_at flag rather than actually deleting the row), which also eases the problem of FKs failing, and which I find to be a very helpful strategy, at least in my app.

Related

Ensuring consistency with relational database without foreign keys

I have seen examples like Discourse where tables in a relational database don't have foreign keys, while the other tenets of relational databases are still used (constraints, indexes, full-text search, etc.), but, as per the Rails Active Record guidelines, foreign keys are dropped.
https://meta.discourse.org/t/foreign-key-constraints-in-db/2642
Do we need to periodically check for consistency in such applications? And in that case, should it be done on each request/response cycle, verifying that there is no invalid foreign key and correcting it at the same time in the application layer?
OK, so the first thing to understand is why we generally put such constraints in the database. The second is why some people don't like this. The third is what the ramifications of not doing so are.
Why we put RI checks in the database
A relational database is basically a big math engine performing set operations (well, actually bag operations, due to concessions to real-world data integrity problems) on large sets of data. As the sets grow, the ability to verify the integrity of the data diminishes, until at some point one has trouble verifying the entire validity of the data according to the set model one follows. I have worked with PostgreSQL databases where constraints were not possible, so in some areas we had to accept that there would be referential integrity violations.
The problem of managing referential integrity where one software project owns the database can be formidable, but it becomes far worse when many different programs can read or write the same data, because normalization and encapsulation concerns increase with the number of pathways for reading (and worse, writing) the data.
Being able to ensure that referential integrity is not violated on each write is thus an important tool in data management.
Why some people avoid RI checks in the database
Referential integrity constraints, however, are not free. There are two important drawbacks that sometimes cause developers to decide against them:
1. Referential integrity checks impact database performance, and the database is often understood to be the least scalable part of a system.
2. They divide logic, placing it in different locations and segregating data model logic from application logic. While this separation of concerns is usually desirable, where a single application owns the database it is sometimes (but not always!) considered a less desirable tradeoff.
It is worth noting, further, that the Rails guidelines don't offer solid guidance on this tradeoff. Like many ORMs, Active Record offers tools for addressing this in the application, but I found plenty of examples of people using foreign keys in the database and nobody saying "don't use them."
Concerns from avoiding RI checks in the database
The concerns, and any further mitigating measures, of course depend on the importance and further use of the data. A lower-impact data set which is just the private data store of an application (the normal Rails way) doesn't have the same implications as a higher-impact data store that is intended to be used for decision support later. So repeated read-use is an important question in deciding whether you need to periodically re-scan.
The second concern is alternate sources of writes. In general, in this model, the most important measure is to prevent alternate sources of writes outside of these specific ActiveRecord-using classes.
So in answer to your question: you may or may not need to, but you should probably do a risk assessment and decide what to do. Such a risk assessment will guide this decision not only now but also in the future.
As a side note
You can use foreign keys to insist on consistency while using hooks and so forth to ensure that the logic is properly handled in the ActiveRecord component, i.e. instead of using ON DELETE CASCADE, have that handled by a hook.
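For example, a minimal sketch of moving the cascade into a hook (model names hypothetical):

class Post < ActiveRecord::Base
  # Instead of ON DELETE CASCADE in the schema, let ActiveRecord callbacks
  # destroy the children, so application-level logic still runs for each
  # destroyed comment, while the FK itself remains in the database as a guard.
  has_many :comments, dependent: :destroy
end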

Are there tools for Rails to validate referential integrity of the database?

Applications have bugs, or acquire bugs when updated, some so well hidden that they are only detected months or years later, producing orphaned records, keys pointing nowhere, etc., even with proper test suites.
Although Rails doesn't enforce referential integrity on the database level (and for some good reasons discussed elsewhere it will stay like that), it would still be nice to have tools that can check whether the database is in a consistent state.
Since the models describe what "should be", wouldn't it be possible for an offline tool to validate the integrity of all the data? It could be run regularly, before backing up data, or just for the sake of the developers' good sleep.
Is there anything like this?
I don't know of such a tool. But at least you are aware of the dangers of referential integrity hazards, so why make yourself suffer? Just use foreign key references in the first place, as dportas suggested.
To use it in a migration, add something like this:
execute('ALTER TABLE users ADD FOREIGN KEY (language_id) REFERENCES languages(id)')
to make the language_id column of users reference a valid row in the languages table.
Depending on your DBMS, this will even automatically create an index for you. There are also plugins for Rails (check out pg_on_rails) which define easy-to-use alias functions for those tasks.
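As a side note, on newer Rails versions (4.2 and later) the same constraint can be declared with the built-in migration helper instead of raw SQL:

# Assumes the conventional column mapping: users.language_id -> languages.id
add_foreign_key :users, :languages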
Checking the integrity only on the backup file is pointless, as the error has already occurred by then and your data may already be messed up. (I've been there.)
On the other hand, when using foreign key constraints as stated above, every operation which would mess up the integrity will fail.
Think of it as going to the dentist when you feel pain (=having surgery) vs. brushing your teeth ONCE with a magical tooth paste that guarantees that your teeth will be fine for the rest of your life.
Another thing to consider: an error in your application will be much easier to locate, because an exception will be raised at the code which tries to insert the corrupt data.
So please, use foreign key constraints. You can easily add those statements to your existing database.
How about using the DBMS to enforce RI? You don't need Rails to do that for you.

Are there advantages to using foreign key constraints when working in an active record framework like ruby-on-rails?

I'm moving back into full-time web development after a 5-year hiatus. My previous experience (no Active Record or MVC) tells me to be very thorough with my database schema. Foreign key constraints, unique indexes, etc. can really help out when you're writing spaghetti code.
Does the community still find these useful when working in an Active Record / MVC framework?
EDIT
My main concern is managing the constraints in two places: the model code and the db. This means duplicate work, and it could lead to bugs, i.e. you have a unique constraint on some field in the database but the model does not know about it. I guess the reverse is true as well: you could simply forget to put the constraint in the database, and then you would have duplicate data when you don't want it.
If you don't use constraints, your database will accumulate cases of broken referential integrity and duplicates where there should be unique values, etc.
More to the point, if you do use constraints (and don't get in the habit of disabling them from time to time as some people do), you'll always have assurance that all your data conforms to your intended data model.
That's the value of database-enforced constraints: it's the only way you can be sure of your data, and you won't have to double-check that your framework (e.g. ActiveRecord) has worked correctly. You won't have to write SQL cleanup scripts to find spurious orphans and duplicates.
It will work fine; however, you have to be careful of double-click race-condition bugs (validates_uniqueness_of suffers from race conditions).
As far as the model is concerned, it doesn't care: the logic in the database is separate from the logic in the Rails app.
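A common belt-and-braces pattern is to keep both: the model validation for friendly error messages, a unique index as the real guarantee, and a rescue for the rare race (a sketch; model and column names hypothetical):

# In a migration: the unique index is what actually holds under concurrent writes.
add_index :users, :email, unique: true

class User < ActiveRecord::Base
  # Gives friendly error messages, but can race between its SELECT and the INSERT.
  validates :email, uniqueness: true
end

# At the call site: a double-click race slips past the validation and hits the index.
begin
  user.save!
rescue ActiveRecord::RecordNotUnique
  user.errors.add(:email, "has already been taken")
end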

The Ruby community values simplicity...what's your argument for simplifying a db schema in a new project?

I'm working on a project with developers who have not worked with Ruby OR Rails before.
They have created a schema that is too complicated, in my opinion. The schema has 117 tables, and obtaining the simplest piece of information would require traversing/joining 7 tables... and of course, there's no "main" table that serves as a sort of key between them. The schema renders many of the Rails tools, like the find method and many of the has_many/belongs_to relationships, almost useless. And coding for all of these relationships will likely be more time-consuming than we have the money for.
THE QUESTION:
Assuming you are VERY convinced (IMHO...hehe) that the schema is not ideal, and there are multiple ways to represent the domain, how would you argue FOR simplifying the schema (aside from what I've already said)?
I'll stand up in two roles here:
DBA: Database admin/designer.
Dev: Application developer.
I assume the DBA is a person who really knows all the database tricks. Reaallyy knows.
DBA:
The database is the key part of the application and should have a predefined structure in order to serve its purpose well and with the best performance.
If you cannot use an arbitrary schema (one which is reasonably normalised and good), then the tools are wrong.
Dev:
The database is just a data store, so we need to keep it simple and concentrate on the application.
DBA:
The database is not a store, it is the core of the application. There is no application without the database.
Dev:
No. The application is the core. There is no application without the front-end and the business logic applied to it.
And the war begins...
Both points are valid, and it is always a trade-off.
If the database will ONLY be used by RoR, then you can use it more like a simple store.
If the DB can be used by other applications, or it will be used with large amounts of data and high traffic, it must enforce some best practices.
Generally there is no way you can disagree with the DBA.
But they can understand your situation and might allow you to loosen the standards a bit so you can be more productive.
So you need to work closely, together.
And you need to talk to each other, to explain and prove the point of why the database should be like this or that.
Otherwise the team is broken, and the project will fail with high probability.
ActiveRecord is a very handy tool, but it cannot do everything for you. It does not, by default, provide exactly the database structure you expect, so it needs to be tuned.
On one side, if the DBA can accept that all PKs are auto-incremented integers, that would make the developers' lives easier (ActiveRecord does this by default).
On the other side, if the developers would accept some of the DBA's constraints, it would make the DBA's life easier.
Now to answer your question:
how would you argue FOR simplifying the schema
Do not argue. Meet the team, deliver the message, and point out WHY it should be done.
Maybe it really shouldn't be done and you don't know all the facts; or maybe they are not aware of something.
You could agree on the general structure of the database AND try to describe it using RoR migrations as a meta-language.
This way they would see the general picture, and you would get to use your great ActiveRecord models.
And also everybody would be on the same page.
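For instance, a proposed table sketched as a migration reads almost like a schema document (table and column names hypothetical):

class CreateOrders < ActiveRecord::Migration
  def change
    create_table :orders do |t|           # auto-incrementing integer PK by default
      t.references :customer, null: false # adds a customer_id column
      t.decimal :total, precision: 10, scale: 2
      t.timestamps                        # created_at / updated_at
    end
    add_index :orders, :customer_id
  end
end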
Your DB schema should reflect the domain and its relationships.
De-normalisation should only be done when you have measured that there is a performance problem.
7 joins is not excessive or bad, provided you have good indexes in place.
The general way to make this argument up the chain is based on cost. If you do things simply, there will be less code and fewer bugs. The system will be able to be built more quickly, or with more features, and thus will create more ROI. If you can get the money manager on board with that approach, he or she may let you dictate terms to the team. There is the counterargument that extreme over-normalization prevents bad data, but I have found that this is not the case, as the complexity it engenders tends to lead to more errors and more database code in general.
The architectural and technical argument here is simple. You have decided to use Ruby on Rails. Therefore you have decided to use the ActiveRecord pattern. The ActiveRecord pattern is driven by having the database tables match the object model. That's the pattern in use here, and in many other places, so the best practices they are trying to apply for extreme data normalization simply do not apply. Buy a copy of Patterns of Enterprise Application Architecture and put the little red bookmark at page 160 so they can understand how the pattern works from the architecture perspective.
What the DBA types tend to be unaware of is how much work ActiveRecord does for you, from query generation, cascading deletes, optimistic locking, auto populated columns, versioning (with acts_as_versioned), soft deletes (with acts_as_paranoid), etc. There is a strong argument to use well tested, community supported library functions to perform these operations versus custom code that must be maintained by a DBA.
The real issue with DBAs is then that they need some work to do. Let them focus on monitoring performance, finding slow queries in the code, creating indexes and doing backups.
If you end up losing the political battle for a sane schema, you may want to consider switching to DataMapper. It's the next pattern in PoEAA. The other thing you may be able to get them to do is to create views in the database that correspond to the object model. This way, you could use many of the finding capabilities in the ActiveRecord model based on the views, but have custom insert, update, and delete methods.
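A minimal sketch of that view-backed idea (names hypothetical):

class CustomerSummary < ActiveRecord::Base
  # Backed by a database VIEW that joins the normalized tables.
  self.table_name = "customer_summaries"

  # Reads go through ActiveRecord's finders against the view;
  # writes are handled by custom methods against the base tables.
  def readonly?
    true
  end
end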

Should is_paranoid be built into Rails?

Or, put differently, is there any reason not to use it on all of my models?
Some background: is_paranoid is a gem that makes calls to destroy set a deleted_at timestamp rather than deleting the row (and calls to find exclude rows with non-null deleted_ats).
I've found this feature so useful that I'm starting to include it in every model -- hard deleting records is just too scary. Is there any reason this is a bad thing? Should this feature be turned on by default in Rails?
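Mechanically, the gem boils down to something like this sketch (simplified; the real gem also handles restores, callbacks, and other edge cases):

class Document < ActiveRecord::Base
  # Finders skip "deleted" rows automatically...
  default_scope { where(deleted_at: nil) }

  # ...and destroy stamps the row instead of removing it.
  def destroy
    update_column(:deleted_at, Time.current)
  end
end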
Ruby is not for cowards who are scared of their own code!
In most cases you really do want to delete the record completely. Consider a table that contains relationships between two other models; this is an obvious case where you would not want to use deleted_at.
Another thing is that your approach to database design is kinda rubyish. You will suffer from having to handle all this deleted_at stuff once you have to write more complex queries against your tables than mere finds. And you surely will, when your application's DB takes up lots of space and you have to replace nice and shiny Ruby code with hacky SQL queries. You may then want to discard this column, but, oops, you have already utilized deleted_at logic somewhere, and you'll have to rewrite large pieces of your app. Gotcha.
And lastly, it actually seems natural for things to disappear upon deletion. The whole point of modelling is that the models try to express, in machine-readable terms, what's going on. By default you delete a record and it is gone forever. The only cases where deleted_at may be natural are when a record is to be restored later, or when it should prevent a similar record from being confused with the original one (a Users table is the most likely place you would want it). But in most models it's just paranoia.
What I'm trying to say is that the possibility of restoring deleted records should be an explicitly expressed intent, because it's not what people normally expect, and because there are cases where implicit use of it is error-prone, not just a small overhead (unlike maintaining a created_at column).
Of course, there are a number of cases where you would like to revert the deletion of records (especially when accidental deletion of valuable data leads to unnecessary expenditure). But to make use of that you'll have to modify your application, add forms and so on, so it won't be a problem to add just one more line to your model class. And there certainly are other ways you could implement storing deleted data.
So, IMHO, it's an unnecessary feature for every model and should only be turned on when needed, and when this way of adding safety to models is applicable to the particular model. And that means not by default.
(This post was influenced by railsninja's valuable remarks.)
#Pavel Shved
I'm sorry, but what? Ruby is not for cowards scared of code? That could be one of the most ridiculous things I have ever heard. Sure, in a join table you want to delete records, but what about the join model of a has_many :through? Maybe not.
In business applications it often makes good sense not to hard delete things; users make mistakes, A LOT.
A lot of your response, Pavel, is kind of drivel. There is no shame in using SQL where you need to, and how does using deleted_at cause this massive refactor? I'm at a loss about that.
#Horace I don't think is_paranoid should be in core; not everyone needs it. It's a fantastic gem, though; I use it in my work apps and it works great. I am also fairly certain it hasn't forced me to resort to SQL when I wouldn't otherwise need to, and I can't see a big refactor in my future due to its presence. Use it when you need it, but it should be a gem.
