ActiveRecord Views vs Tables (in Rails 4) - ruby-on-rails

I'm only just starting to learn about views in ActiveRecord from reading a few blog posts and some tutorials on how to set them up in Rails.
What I would like to know is what are some of the pros and cons of using a View instead of a query on existing ActiveRecord tables? Are there real, measurable performance benefits of using a view?
For example, I have a standard merchant application with orders, line items, and products. For an admin dashboard, I have various queries, many of which ping a query that I reuse a lot in the code - namely one that returns user_id, order_id and total_revenue from that order. From a business perspective, many good stats are based off that core query. At what point does it make sense to switch to to a view instead?
The ActiveRecord docs on Views are also a bit sparse so any references to some good resources on both the why and how would be greatly appreciated.
Clarification: I am not talking about HTML views but SQL database views. Essentially, are the performance wins maintained in an ActiveRecord implementation? Since the docs are sparse, are there any potential obstacles to those performance wins that could be lost if you implement them incorrectly (i.e., any non-obvious gotchas)?

I got this information from another developer off-line, so I am answering my own question here in case other people stumble upon it.
Basically, ActiveRecord starting in Rails 3.1 began implementing prepared statements which preprocess and cache SQL statement patterns ahead of time which later allow faster query responses. You can read more about it in this blog post.
This actually might result in not much benefit in switching to views in PostgreSQL since views may not perform much better than prepared statements in PG.
The PostgreSQL documentation on prepared statements seems clear and well-written. More thoughts on PostgreSQL views performance can be found in this stackoverflow post.
Additionally, it's probably much more likely that your Rails app has performance issues due to N+1 queries - this is a great post that explains the problem and one of the easiest ways to prevent it with eager loading.

This question is not directly related to ActiveRecord and it seems more a database related question to me. The answer is "it depends". Many times people use view because they want to:
represent a subset of the data contained in a table
simplify your query by join multiple tables into a virtual table represented by the view
do aggregation in the view
hide complexity of your data
for some security reasons
etc.
But most of the aforementioned features can be implemented by using raw tables. It's just a little complicated than using views. Another place you may considering using a view is for performance reason. That's materialized view or indexed view in SQL Server. Basically it saved a copy of your data in the view in the form you want and it can greatly boost performance.

Related

How to Organize an out of control table?

Hello and good morning.
I am working on a side project where I am adding an analytic board to an already existing app. The problem is that now the users table has over 400 columns. My question is that what's a better way of organizing this table such as splintering the table off into separate tables. How do you do that and how do you communicate the tables between the new tables?
Another concern is that If I separate the table will I still be able to save into it through the user model? I have code right now that says:
user.wallet += 100
user.save
If I separate wallet from user and link the two tables will I have to change this code. The reason I'm asking this is that there is a ton of code like this in the app.
Thank you so much if you can help me understanding how to organize a database. As a bonus if there is a book that talks about database organization can you recommend it to me (preferably one that is in rails).
Edit: Is there also a way to do all of this without loosing any data. For example transfer the data to a new column on the new table then destroying the old column.
Please read about:
Database Normalization
You'll get loads of hits when searching for that string and there are many books about database design covering that subject.
It is most likely, that this table of yours lacks normalization, but you have to see yourself!
Just to give an orientation - I would get a little anxious when dealing with a tenth of that number of columns. That saying, I clearly have to stress that there might be well normalized tables with 400 columns as well as sloppily created examples with just 10 columns.
Generally speaking, the probability of dealing with bad designed tables and hence facing trouble simply rises with the number of columns.
So take your time and if you find out, that users table needs normalization next step would indeed be to spread data over several tables. Because that clearly (and most likely even heavily) affects the coding of your application here is where you thoroughly have to balance pros and cons - simply impossible to judge that from far away.
Say, you have substantial problems (e.g. fierce performance problems - you wouldn't post it) that could be eased by normalization there are different approaches of how to split data. Here please read about:
Cardinalities
Usually the new tables are linked by
Foreign Keys
, identical data (like a user id) that appear in multiple tables and that are used to join them, that is.
And finally, yes, you can do that without losing data as the overall amount of information never changes when normalizing.
In case your last question was meant to be technical: There is no problem in reading data from one column and inserting them into a new one (of a new table). That has to happen in a certain order as foreign keys have to be filled before you can use them. See
Referential Integrity
However, quite obvious: Deleting data and dropping columns interferes with the operability of your application. Good planning is due.

Multiple models for similar type data

I'm designing a small app as a way to learn Ruby on Rails which will (hopefully) display network traffic trends. Since the data I'm basing off is quite large (100s of millions of rows), we have three tables which sums up data by hour, by day and by month. I'm curious if the proper approach is to create three models, one for each type of data despite their similarities. It seems to be the way I'm leaning with the understanding I will eventually have to do some refactoring to help eliminate duplicate coding. Just curious if there were any other approaches that could be suggested?
You can consider using STI(single table inheritance). Read How (and When) to Use Single Table Inheritance in Rails for more details.
from the article:
STI should be considered when dealing with model classes that share much of the same functionality and data fields, but you as the developer may want more granular control over extending or adding to each class individually. Rather than duplicate the code over and over for multiple tables (and not being DRY) or forego the flexibility of adding idiosyncratic functionality or methods, STI permits you to use keep your data in a single table while writing specialized functionality.

Best practice for multiple resources and one database table in ruby on rails?

I'm working on a Ruby on Rails project. The project has a main database table, called 'Post'.
This table is used for storing questions, answers, comments and announcements as different post types. Currently we use only one model to access the data. But while working on the systems it feels cluttered. Because every post type has some differences.
What is a best practice:
Split the post resource into different resources, but using only the one Post table for data access?
Split the database table into comments, questions, answers and announcements tables; and use resources for each table (model+controller+views)?
Short answer: it depends.
What you're doing now is pretty close to Single Table Inheritance (STI), though it sounds like you're using a single Post class which contains all your different behaviors. STI is a valid approach to storing your model data, but like everything, it has advantages and disadvantages.
My recommendation would be to use separate classes to encapsulate the behavior of separate domain models. This is the fundamental paradigm in OOP, so I would say it's probably a best practice.
Assuming you split your behavior across classes, the question of whether you use a single table to store them or multiple tables really depends on the complexity of your queries and how many common fields there are across your models. If they share a lot of common fields, STI may work nicely. If not, you're probably not going to enjoy the overhead of all the extra reads/writes you'll pay. But all that is really secondary and can be figured out as your app grows and you learn more about the usage patterns.
Try it using STI and see how it goes. Splitting things up into multiple tables is easier than the opposite, so a migration path is not necessarily too hard.
I think the choise should be based on every certain situation. Just enumerate advantages and disadvantages of each option for you. Points to investigate and make decision could be like: number of differences between each type of entities, the way to access entities, frequency of usage of each type of entities, maintability of code, forecast of changes in the future, and many others.

Linq to Sql structure standard

I was wondering what the "correct"-way is when using a Linq to Sql-model in Visual Studio.
Should I create a new model for each of my components, say Blog, Users, News and so on and have all different xxxDataContext's with tables and SPROCs added in each of these.
Or should I create one MyDbDataContext and always work against that?
What's the pro's/con's? My gut tells me to divide it up in smaller context's, but it also feels like that could bring problems as the project expands?
What's the deal? Help me Stackoverflow :)
There will always be overhead when creating the data context as the model needs to be built. Depending on the number of tables in your database this might not be much of a big deal though. If it's only 10 tables or so, the overhead will not be much more than that for a context with say 1 table (sorry, I don't have actual stress testing to show the overhead, but, hey, maybe that gives me something to blog on this weekend).. When looking at large databases the overhead might be a enough to consider using seperate contexts.
The main advantage I would see with using a single data context is that you gain the ability to use JOINs in your LINQ query and that will be translated to T-SQL. Where as if you do the join after the arrays of objects are pulled, the performance might be a bit slower. Additionally, keeping track of multiple data contexts might be confusing and good naming conventions would be needed. So building your own data model w/ business logic which encapsulates the contexts would be a bit harder. I've done this and it's not fun :)
However, if you still feel you want to go that route, then I would recommend putting similar tables (that you might need to join) in the same context. Also, there are some tuts online that recommend using a shared MappingSource when using multiple contexts that use the same source. Information on this can be found here: http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
Sorry, I know that's not really a black and white answer, but hopefully it helps :)
Addition:
Just wanted to add that I did a small test and ran 20,000 SELECT statements against a small sized table using 2 different data contexts:
DataClasses1DataContext contained mappings to all tables in the db (4 total)
DataClasses2DataContext contained a single mapping for just the one table
Results:
Time to execute 20000 SELECTs using DataClasses1DataContext: 00:00:10.4843750
Time to execute 20000 SELECTs using DataClasses2DataContext: 00:00:10.4218750
As you can see, it's not much of a difference.

Why all the Active Record hate? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
As I learn more and more about OOP, and start to implement various design patterns, I keep coming back to cases where people are hating on Active Record.
Often, people say that it doesn't scale well (citing Twitter as their prime example) -- but nobody actually explains why it doesn't scale well; and / or how to achieve the pros of AR without the cons (via a similar but different pattern?)
Hopefully this won't turn into a holy war about design patterns -- all I want to know is ****specifically**** what's wrong with Active Record.
If it doesn't scale well, why not?
What other problems does it have?
There's ActiveRecord the Design Pattern and ActiveRecord the Rails ORM Library, and there's also a ton of knock-offs for .NET, and other languages.
These are all different things. They mostly follow that design pattern, but extend and modify it in many different ways, so before anyone says "ActiveRecord Sucks" it needs to be qualified by saying "which ActiveRecord, there's heaps?"
I'm only familiar with Rails' ActiveRecord, I'll try address all the complaints which have been raised in context of using it.
#BlaM
The problem that I see with Active Records is, that it's always just about one table
Code:
class Person
belongs_to :company
end
people = Person.find(:all, :include => :company )
This generates SQL with LEFT JOIN companies on companies.id = person.company_id, and automatically generates associated Company objects so you can do people.first.company and it doesn't need to hit the database because the data is already present.
#pix0r
The inherent problem with Active Record is that database queries are automatically generated and executed to populate objects and modify database records
Code:
person = Person.find_by_sql("giant complicated sql query")
This is discouraged as it's ugly, but for the cases where you just plain and simply need to write raw SQL, it's easily done.
#Tim Sullivan
...and you select several instances of the model, you're basically doing a "select * from ..."
Code:
people = Person.find(:all, :select=>'name, id')
This will only select the name and ID columns from the database, all the other 'attributes' in the mapped objects will just be nil, unless you manually reload that object, and so on.
I have always found that ActiveRecord is good for quick CRUD-based applications where the Model is relatively flat (as in, not a lot of class hierarchies). However, for applications with complex OO hierarchies, a DataMapper is probably a better solution. While ActiveRecord assumes a 1:1 ratio between your tables and your data objects, that kind of relationship gets unwieldy with more complex domains. In his book on patterns, Martin Fowler points out that ActiveRecord tends to break down under conditions where your Model is fairly complex, and suggests a DataMapper as the alternative.
I have found this to be true in practice. In cases, where you have a lot inheritance in your domain, it is harder to map inheritance to your RDBMS than it is to map associations or composition.
The way I do it is to have "domain" objects that are accessed by your controllers via these DataMapper (or "service layer") classes. These do not directly mirror the database, but act as your OO representation for some real-world object. Say you have a User class in your domain, and need to have references to, or collections of other objects, already loaded when you retrieve that User object. The data may be coming from many different tables, and an ActiveRecord pattern can make it really hard.
Instead of loading the User object directly and accessing data using an ActiveRecord style API, your controller code retrieves a User object by calling the API of the UserMapper.getUser() method, for instance. It is that mapper that is responsible for loading any associated objects from their respective tables and returning the completed User "domain" object to the caller.
Essentially, you are just adding another layer of abstraction to make the code more managable. Whether your DataMapper classes contain raw custom SQL, or calls to a data abstraction layer API, or even access an ActiveRecord pattern themselves, doesn't really matter to the controller code that is receiving a nice, populated User object.
Anyway, that's how I do it.
I think there is a likely a very different set of reasons between why people are "hating" on ActiveRecord and what is "wrong" with it.
On the hating issue, there is a lot of venom towards anything Rails related. As far as what is wrong with it, it is likely that it is like all technology and there are situations where it is a good choice and situations where there are better choices. The situation where you don't get to take advantage of most of the features of Rails ActiveRecord, in my experience, is where the database is badly structured. If you are accessing data without primary keys, with things that violate first normal form, where there are lots of stored procedures required to access the data, you are better off using something that is more of just a SQL wrapper. If your database is relatively well structured, ActiveRecord lets you take advantage of that.
To add to the theme of replying to commenters who say things are hard in ActiveRecord with a code snippet rejoinder
#Sam McAfee Say you have a User class in your domain, and need to have references to, or collections of other objects, already loaded when you retrieve that User object. The data may be coming from many different tables, and an ActiveRecord pattern can make it really hard.
user = User.find(id, :include => ["posts", "comments"])
first_post = user.posts.first
first_comment = user.comments.first
By using the include option, ActiveRecord lets you override the default lazy-loading behavior.
My long and late answer, not even complete, but a good explanation WHY I hate this pattern, opinions and even some emotions:
1) short version: Active Record creates a "thin layer" of "strong binding" between the database and the application code. Which solves no logical, no whatever-problems, no problems at all. IMHO it does not provide ANY VALUE, except some syntactic sugar for the programmer (which may then use an "object syntax" to access some data, that exists in a relational database). The effort to create some comfort for the programmers should (IMHO...) better be invested in low level database access tools, e.g. some variations of simple, easy, plain hash_map get_record( string id_value, string table_name, string id_column_name="id" ) and similar methods (of course, the concepts and elegance greatly varies with the language used).
2) long version: In any database-driven projects where I had the "conceptual control" of things, I avoided AR, and it was good. I usually build a layered architecture (you sooner or later do divide your software in layers, at least in medium- to large-sized projects):
A1) the database itself, tables, relations, even some logic if the DBMS allows it (MySQL is also grown-up now)
A2) very often, there is more than a data store: file system (blobs in database are not always a good decision...), legacy systems (imagine yourself "how" they will be accessed, many varieties possible.. but thats not the point...)
B) database access layer (at this level, tool methods, helpers to easily access the data in the database are very welcome, but AR does not provide any value here, except some syntactic sugar)
C) application objects layer: "application objects" sometimes are simple rows of a table in the database, but most times they are compound objects anyway, and have some higher logic attached, so investing time in AR objects at this level is just plainly useless, a waste of precious coders time, because the "real value", the "higher logic" of those objects needs to be implemented on top of the AR objects, anyway - with and without AR! And, for example, why would you want to have an abstraction of "Log entry objects"? App logic code writes them, but should that have the ability to update or delete them? sounds silly, and App::Log("I am a log message") is some magnitudes easier to use than le=new LogEntry(); le.time=now(); le.text="I am a log message"; le.Insert();. And for example: using a "Log entry object" in the log view in your application will work for 100, 1000 or even 10000 log lines, but sooner or later you will have to optimize - and I bet in most cases, you will just use that small beautiful SQL SELECT statement in your app logic (which totally breaks the AR idea..), instead of wrapping that small statement in rigid fixed AR idea frames with lots of code wrapping and hiding it. The time you wasted with writing and/or building AR code could have been invested in a much more clever interface for reading lists of log-entries (many, many ways, the sky is the limit). Coders should dare to invent new abstractions to realize their application logic that fit the intended application, and not stupidly re-implement silly patterns, that sound good on first sight!
D) the application logic - implements the logic of interacting objects and creating, deleting and listing(!) of application logic objects (NO, those tasks should rarely be anchored in the application logic objects itself: does the sheet of paper on your desk tell you the names and locations of all other sheets in your office? forget "static" methods for listing objects, thats silly, a bad compromise created to make the human way of thinking fit into [some-not-all-AR-framework-like-]AR thinking)
E) the user interface - well, what I will write in the following lines is very, very, very subjective, but in my experience, projects that built on AR often neglected the UI part of an application - time was wasted on creation obscure abstractions. In the end such applications wasted a lot of coders time and feel like applications from coders for coders, tech-inclined inside and outside. The coders feel good (hard work finally done, everything finished and correct, according to the concept on paper...), and the customers "just have to learn that it needs to be like that", because thats "professional".. ok, sorry, I digress ;-)
Well, admittedly, this all is subjective, but its my experience (Ruby on Rails excluded, it may be different, and I have zero practical experience with that approach).
In paid projects, I often heard the demand to start with creating some "active record" objects as a building block for the higher level application logic. In my experience, this conspicuously often was some kind of excuse for that the customer (a software dev company in most cases) did not have a good concept, a big view, an overview of what the product should finally be. Those customers think in rigid frames ("in the project ten years ago it worked well.."), they may flesh out entities, they may define entities relations, they may break down data relations and define basic application logic, but then they stop and hand it over to you, and think thats all you need... they often lack a complete concept of application logic, user interface, usability and so on and so on... they lack the big view and they lack love for the details, and they want you to follow that AR way of things, because.. well, why, it worked in that project years ago, it keeps people busy and silent? I don't know. But the "details" separate the men from the boys, or .. how was the original advertisement slogan ? ;-)
After many years (ten years of active development experience), whenever a customer mentions an "active record pattern", my alarm bell rings. I learned to try to get them back to that essential conceptional phase, let them think twice, try them to show their conceptional weaknesses or just avoid them at all if they are undiscerning (in the end, you know, a customer that does not yet know what it wants, maybe even thinks it knows but doesn't, or tries to externalize concept work to ME for free, costs me many precious hours, days, weeks and months of my time, live is too short ... ).
So, finally: THIS ALL is why I hate that silly "active record pattern", and I do and will avoid it whenever possible.
EDIT: I would even call this a No-Pattern. It does not solve any problem (patterns are not meant to create syntactic sugar). It creates many problems: the root of all its problems (mentioned in many answers here..) is, that it just hides the good old well-developed and powerful SQL behind an interface that is by the patterns definition extremely limited.
This pattern replaces flexibility with syntactic sugar!
Think about it, which problem does AR solve for you?
Some messages are getting me confused.
Some answers are going to "ORM" vs "SQL" or something like that.
The fact is that AR is just a simplification programming pattern where you take advantage of your domain objects to write there database access code.
These objects usually have business attributes (properties of the bean) and some behaviour (methods that usually work on these properties).
The AR just says "add some methods to these domain objects" to database related tasks.
And I have to say, from my opinion and experience, that I do not like the pattern.
At first sight it can sound pretty good. Some modern Java tools like Spring Roo uses this pattern.
For me, the real problem is just with OOP concern. AR pattern forces you in some way to add a dependency from your object to infraestructure objects. These infraestructure objects let the domain object to query the database through the methods suggested by AR.
I have always said that two layers are key to the success of a project. The service layer (where the bussiness logic resides or can be exported through some kind of remoting technology, as Web Services, for example) and the domain layer. In my opinion, if we add some dependencies (not really needed) to the domain layer objects for resolving the AR pattern, our domain objects will be harder to share with other layers or (rare) external applications.
Spring Roo implementation of AR is interesting, because it does not rely on the object itself, but in some AspectJ files. But if later you do not want to work with Roo and have to refactor the project, the AR methods will be implemented directly in your domain objects.
Another point of view. Imagine we do not use a Relational Database to store our objects. Imagine the application stores our domain objects in a NoSQL Database or just in XML files, for example. Would we implement the methods that do these tasks in our domain objects? I do not think so (for example, in the case of XM, we would add XML related dependencies to our domain objects...Truly sad I think). Why then do we have to implement the relational DB methods in the domain objects, as the Ar pattern says?
To sum up, the AR pattern can sound simpler and good for small and simple applications. But, when we have complex and large apps, I think the classical layered architecture is a better approach.
The question is about the Active
Record design pattern. Not an orm
Tool.
The original question is tagged with rails and refers to Twitter which is built in Ruby on Rails. The ActiveRecord framework within Rails is an implementation of Fowler's Active Record design pattern.
The main thing that I've seen with regards to complaints about Active Record is that when you create a model around a table, and you select several instances of the model, you're basically doing a "select * from ...". This is fine for editing a record or displaying a record, but if you want to, say, display a list of the cities for all the contacts in your database, you could do "select City from ..." and only get the cities. Doing this with Active Record would require that you're selecting all the columns, but only using City.
Of course, varying implementations will handle this differently. Nevertheless, it's one issue.
Now, you can get around this by creating a new model for the specific thing you're trying to do, but some people would argue that it's more effort than the benefit.
Me, I dig Active Record. :-)
HTH
Although all the other comments regarding SQL optimization are certainly valid, my main complaint with the active record pattern is that it usually leads to impedance mismatch. I like keeping my domain clean and properly encapsulated, which the active record pattern usually destroys all hope of doing.
I love the way SubSonic does the one column only thing.
Either
DataBaseTable.GetList(DataBaseTable.Columns.ColumnYouWant)
, or:
Query q = DataBaseTable.CreateQuery()
.WHERE(DataBaseTable.Columns.ColumnToFilterOn,value);
q.SelectList = DataBaseTable.Columns.ColumnYouWant;
q.Load();
But Linq is still king when it comes to lazy loading.
#BlaM:
Sometimes I justed implemented an active record for a result of a join. Doesn't always have to be the relation Table <--> Active Record. Why not "Result of a Join statement" <--> Active Record ?
I'm going to talk about Active Record as a design pattern, I haven't seen ROR.
Some developers hate Active Record, because they read smart books about writing clean and neat code, and these books states that active record violates single resposobility principle, violates DDD rule that domain object should be persistant ignorant, and many other rules from these kind of books.
The second thing domain objects in Active Record tend to be 1-to-1 with database, that may be considered a limitation in some kind of systems (n-tier mostly).
Thats just abstract things, i haven't seen ruby on rails actual implementation of this pattern.
The problem that I see with Active Records is, that it's always just about one table. That's okay, as long as you really work with just that one table, but when you work with data in most cases you'll have some kind of join somewhere.
Yes, join usually is worse than no join at all when it comes to performance, but join usually is better than "fake" join by first reading the whole table A and then using the gained information to read and filter table B.
The problem with ActiveRecord is that the queries it automatically generates for you can cause performance problems.
You end up doing some unintuitive tricks to optimize the queries that leave you wondering if it would have been more time effective to write the query by hand in the first place.
Try doing a many to many polymorphic relationship. Not so easy. Especially when you aren't using STI.

Resources