What are alternatives to find_by_sql for computationally heavy queries? - ruby-on-rails

Our company loves reports that calculate obscure metrics--metrics that cannot be calculated with ActiveRecord's finders (except find_by_sql) and where Ruport's Ruby-based capabilities are just too slow.
Is there a plugin or gem or db adapter out there that will do large calculations in the database layer? What's your solution to creating intricate reports?

Although it's not database-agnostic, our solution is to write PL/pgSQL functions wherever Ruby and ActiveRecord become really slow.

Thoughtbot's Squirrel plugin adds a lot of Ruby-ish functionality to ActiveRecord's find method, with multi-layered conditionals, ranges, and nested model associations:
www.thoughtbot.com/projects/squirrel/

Is there anything inherent about your reports that prevents the use of an SQL view or stored procedure?
A technique I have found useful, on one project in particular, is to create your SQL query (which may be quite complex) as a named view in the database, and then use
YourModel.connection.select_all(query)
to pull back the data. It's not an optimal approach; I'm keen to explore improvements to it.
Unfortunately, as you suggested, the support for computing complex database-based reports within Rails seems fairly limited.
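As a rough sketch of the named-view approach described above: the wrapper below only assumes a connection object that responds to select_all(sql) and returns row hashes, which is what YourModel.connection provides in Rails. The class names ViewReport and FakeConnection and the view name monthly_sales_summary are hypothetical, and the fake connection exists only so the sketch runs without a database.

```ruby
# Hypothetical wrapper around a named database view (not a Rails API).
class ViewReport
  def initialize(connection, view_name)
    # The view name is interpolated into SQL, so accept plain identifiers only.
    unless view_name =~ /\A[a-z_][a-z0-9_]*\z/i
      raise ArgumentError, "bad view name: #{view_name.inspect}"
    end
    @connection = connection
    @view_name = view_name
  end

  # Returns every row of the view as an array of hashes.
  def rows
    @connection.select_all("SELECT * FROM #{@view_name}").to_a
  end
end

# Stand-in for YourModel.connection so the sketch runs without a database.
class FakeConnection
  attr_reader :last_sql

  def select_all(sql)
    @last_sql = sql
    [{ "region" => "north", "total" => 42 }]
  end
end

conn = FakeConnection.new
report_rows = ViewReport.new(conn, "monthly_sales_summary").rows
report_rows.each { |row| puts "#{row['region']}: #{row['total']}" }
```

In a real application you would pass YourModel.connection instead of the fake one; the heavy SQL stays in the view definition, and the Ruby side only selects from it.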

It sounds as if your tables could be normalized. At one place I worked, the amount of normalization we did was impacting our reporting needs, so we created some shadow tables that contained a bunch of the aggregate data, and did reporting against that.
I agree with Neil N's comment that the question is a little vague, but perhaps this gets you moving in the right direction?

You might want to look at using DataMapper or Sequel for your ORM if you're finding that ActiveRecord lacks the expressiveness you need for complex queries. Switching away from ActiveRecord wouldn't be a decision to take lightly, but it might be worth investigating at least.

Related

Is it a good practice to directly write SQL query in rails?

I am making an application with very huge data and multiple joins. Is it a bad practice to right away use the full sql string in rails? What are the downsides of writing the full sql query in rails?
It's only bad practice if you do it without understanding the alternatives.
That said, there is rarely a reason to do this. The framework encapsulates it for you and the benefit is that you have to write less code. The other benefit is database independence. The more direct queries you write, the more likely you'll write something that will break when you switch database engines.
It is easy to test. If you are using the framework properly (i.e. optimizing ActiveRecord as you will find discussed in numerous articles) and still feel like your queries are too slow...you can always benchmark direct queries.
But not knowing how to do something using ActiveRecord associations is not a good reason to resort to direct SQL.
http://guides.rubyonrails.org/association_basics.html
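To make the "benchmark direct queries" advice above concrete, here is a minimal timing sketch using only core Ruby. The helper best_time and the two lambdas are hypothetical placeholders: in a real app, orm_version would wrap your ActiveRecord finder chain and sql_version the equivalent hand-written query.

```ruby
# Runs the block several times and returns the best (lowest) wall-clock
# time in seconds, which reduces noise from one-off hiccups.
def best_time(runs = 5)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end.min
end

# Placeholders only: swap in the real ORM call and the real raw-SQL call.
orm_version = -> { (1..50_000).map { |i| i * 2 }.select(&:even?).size }
sql_version = -> { 50_000 }

orm_seconds = best_time { orm_version.call }
sql_seconds = best_time { sql_version.call }
puts format("ORM: %.6fs  raw SQL: %.6fs", orm_seconds, sql_seconds)
```

Run it against production-sized data; on a small development database the difference between the two versions can be misleading.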
SQL is not a 'bad practice' per se. Database systems have plenty of native SQL ways of doing things that would be much slower to execute and more complex to write and maintain if written in Ruby. Like Oracle's Analytic Functions.
That said, ActiveRecord is pretty easy to write and you probably aren't going to get a performance boost just by using a SQL query. At least not if the query you write resembles the query ActiveRecord would have written anyway! ;)
Perhaps you should try to work with ActiveRecord and only resort to SQL if you hit problems you can't solve another way. That way you keep your code simple until you need to do it another way (i.e. don't 'optimise early').
I generally try to make things work in ActiveRecord (or DataMapper or Sequel or whatever), but I have definitely resorted to finder_sql when the job needed doing quickly and I couldn't get where I wanted to go using the ORM's 'sugar'. Other times I have based a Rails object on a single massive view in the database.
Hope this helps.
:D
If you need a more powerful syntax than the standard ActiveRecord module provides, see the meta_where gem.

The Ruby community values simplicity...what's your argument for simplifying a db schema in a new project?

I'm working on a project with developers who have not worked with Ruby OR Rails before.
They have created a schema that is too complicated, in my opinion. The schema has 117 tables, and obtaining the simplest piece of information would require traversing/joining 7 tables...and of course, there's no "main" table that serves as a sort of key between them. The schema renders many of the Rails tools, like the 'find' method and many of the has_many/belongs_to relationships, almost useless. And coding for all of these relationships will likely be more time-consuming than we have the money to code for.
THE QUESTION:
Assuming you are VERY convinced (IMHO...hehe) that the schema is not ideal, and there are multiple ways to represent the domain, how would you argue FOR simplifying the schema (aside from what I've already said)?
I'll stand up in 2 roles here
DBA: Database admin/designer.
Dev: Application developer.
I assume the DBA is a person who really knows all the database tricks. Reaallyy Knows.
DBA:
The database is the key to the application and should have a predefined structure in order to serve its purpose well and with the best performance.
If you cannot use an arbitrary schema (one that is reasonably normalised and good), then the tools are wrong.
Dev:
The database is just a data store, so we need to keep it simple and concentrate on the application.
DBA:
The database is not a store; it is the core of the application. There is no application without the database.
Dev:
No. The application is the core. There is no application without the front-end and the business logic applied to it.
And the war begins...
Both points are valid and it is always a trade-off.
If the database will ONLY be used by RoR, then you can use it more like a simple store.
If the DB can be used by other applications OR it will be used with large amounts of data and high traffic, it must enforce some best practices.
Generally there is no way you can disagree with the DBA.
But they can understand your situation and might allow you to loosen the standards a bit so you could be more productive.
So you need to work closely, together.
And you need to talk to each other to explain and prove the point why the database should be like this or that.
Otherwise, the team is broken and the project is likely to fail.
ActiveRecord is a very handy tool. But it cannot do everything for you. By default it does not give you exactly the database structure you expect, so it needs to be tuned.
On one side, if the DBA can accept that all PKs are auto-incremented integers, that would make the developer's life easier (ActiveRecord does it by default).
On the other side, if developers would accept some of the DBA's constraints, it would make the DBA's life easier.
Now to answer your question:
how would you argue FOR simplifying the schema
Do not argue. Meet the team and explain WHY it should be done.
Maybe it really shouldn't and you don't know all the things, maybe they are not aware of something.
You could agree on the general structure of the database AND try to describe it using RoR migrations as a meta language.
This way they would see the general picture, and you would use your great ActiveRecords.
And also everybody would be on the same page.
Your DB schema should reflect the domain and its relationships.
De-normalisation should only be done when you have measured that there is a performance problem.
7 joins is not excessive or bad, provided you have good indexes in place.
The general way to make this argument up the chain is based on cost. If you do things simply, there will be less code and fewer bugs. The system will be able to be built more quickly, or with more features, and thus will create more ROI. If you can get the money manager on board with that approach, he or she may let you dictate terms to the team. There is the counterargument that extreme over-normalization prevents bad data, but I have found that this is not the case, as the complexity it engenders tends to lead to more errors and more database code in general.
The architectural and technical argument here is simple. You have decided to use Ruby on Rails. Therefore you have decided to use the ActiveRecord pattern. The ActiveRecord pattern is driven by having the database tables match the object model. That's the pattern in use here, and in many other places, so the best practices they are trying to apply for extreme data normalization simply do not apply. Buy a copy of Patterns of Enterprise Application Architecture and put the little red bookmark at page 160 so they can understand how the pattern works from the architecture perspective.
What the DBA types tend to be unaware of is how much work ActiveRecord does for you, from query generation, cascading deletes, optimistic locking, auto populated columns, versioning (with acts_as_versioned), soft deletes (with acts_as_paranoid), etc. There is a strong argument to use well tested, community supported library functions to perform these operations versus custom code that must be maintained by a DBA.
The real issue with DBAs is then that they need some work to do. Let them focus on monitoring performance, finding slow queries in the code, creating indexes and doing backups.
If you end up losing the political battle for a sane schema, you may want to consider switching to DataMapper. It's the next pattern in PoEAA. The other thing you may be able to get them to do is to create views in the database that correspond to the object model. This way, you could use many of the finding capabilities in the ActiveRecord model based on the views, but have custom insert, update, and delete methods.

ORM that accepts SQL and simply maps the objects and relations?

I find that the use of ActiveRecord affects the way I design the database schema (though I wish it wouldn't). I'm thinking about the inefficiency of fetching data and how to reduce the overall number of queries. The find :include option can only get you so far. I come from writing stored procs that grab everything you need (for a particular screen or activity) in a single call.
I really don't need an API for writing my SQL. I'm entirely content to write T-SQL in a non-Ruby way. The only thing I want is to have my query results mapped to instantiated models and their associations. Are there any ORMs that take this approach? Ones that can handle multiple selects (stored procs?) and maybe even the use of temp tables...
EDIT:
I rephrased and clarified what I was really after with this question.
I find that the need for quite complex queries covers about 20% of a project, so using an ORM helps a lot.
When that 20% arises, I find myself doing something like what you ask, working exclusively with SQL. ActiveRecord and DataMapper have a find_by_sql method that helps further, but it doesn't instantiate all models (at least in ActiveRecord, if I'm not mistaken).
Have you tried using Sequel? It's an ORM too, but it lets you take an easier approach and gives you more flexibility.
Besides that, I can't think of a more focused solution in the ORM realm. Keep in mind that an ORM tries to abstract the querying interface to simplify it. If you are comfortable using raw SQL, maybe you could be more productive with just an SQL facade interface.
Look at the ActiveRecord find_by_sql() method.
http://github.com/rails/rails/blob/491f1b5f36a11427decc5d7f3558b5c3f5f243c1/activerecord/lib/active_record/base.rb#L660
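For what the question asks, mapping the rows of a hand-written query onto objects and their associations, the core step can be sketched in plain Ruby. This is only an illustration under assumptions: the Author and Post structs, the column names, and map_authors are all made up, and the rows are hard-coded where a real app would take them from a stored procedure or a JOIN executed through whatever database interface you use.

```ruby
# Hypothetical models; the names and columns are illustrative only.
Author = Struct.new(:id, :name, :posts)
Post   = Struct.new(:id, :title)

# Builds Author objects with nested posts from flat joined rows, such as a
# stored procedure or a hand-written LEFT JOIN might return.
def map_authors(rows)
  authors = {}
  rows.each do |row|
    author = authors[row["author_id"]] ||=
      Author.new(row["author_id"], row["author_name"], [])
    # A LEFT JOIN yields a nil post_id for authors without posts.
    next if row["post_id"].nil?
    author.posts << Post.new(row["post_id"], row["post_title"])
  end
  authors.values
end

rows = [
  { "author_id" => 1, "author_name" => "Ann", "post_id" => 10, "post_title" => "Views" },
  { "author_id" => 1, "author_name" => "Ann", "post_id" => 11, "post_title" => "Indexes" },
  { "author_id" => 2, "author_name" => "Bob", "post_id" => nil, "post_title" => nil }
]

authors = map_authors(rows)
authors.each { |a| puts "#{a.name}: #{a.posts.map(&:title).join(', ')}" }
```

The whole object graph comes from a single result set, so one round trip to the database is enough; the trade-off is that you maintain the mapping by hand.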

Sequel in conjunction with ActiveRecord any gotchas?

I'm considering using Sequel for some of my hairier SQL that I find too hard to craft in Active Record.
Are there any things I need to be aware of when using Sequel and ActiveRecord on the same project? (Besides the obvious ones like no AR validations in sequel etc...)
Disclaimer: I'm the Sequel maintainer.
Sequel is easy to use alongside or instead of ActiveRecord when using Rails. You do have to set up the database connection manually, but other than that, the usage is similar. Your Sequel model files go in app/models and work similarly to ActiveRecord models.
Setting up the database connections isn't tedious, it's generally one line in environment.rb to require sequel, and a line in each environment file (development.rb, test.rb, production.rb) to do something like:
DB = Sequel.connect(...)
So it's only tedious if you consider 4 lines of setup code tedious.
Using raw SQL generally isn't a problem unless you are targeting multiple databases. The main reason to avoid it is the increased verbosity. Sequel supports using raw SQL at least as easily as ActiveRecord, but the times where you need to use raw SQL are generally fairly rare in Sequel.
BTW, Sequel ships with multiple validation plugins. The validation_class_methods plugin is similar to ActiveRecord validations, using class methods. The validation_helpers plugin has a simpler implementation using instance level methods, but both can do roughly the same thing.
Finally, I'll say that if you already have working ActiveRecord code that does what you want, it's probably not worth the effort to port the code to Sequel unless you plan on adding features.
Personally, I wouldn't do it. Just managing the connection more or less by hand would be tedious, for a start. I'd be more inclined, if I felt Sequel was the stronger option, to hold off for Rails 3.0 (or perhaps start developing against Edge Rails) where it should be fairly easy to switch ORMs, if Yehuda and co are doing their stuff right. A lot more Merb-like than now, at least.
This was DHH's take on the subject (I'm not saying it should be taken as gospel truth, mind, but it is, so to speak, from the horse's mouth):
But Isn’t SQL Dirty?
Ever since programmers started to layer object-oriented systems on top of relational databases, they’ve struggled with the question of how deep to run the abstraction. Some object-relational mappers seek to eradicate the use of SQL entirely, striving for object oriented purity by forcing all queries through another OO layer.
Active Record does not. It was built upon the notion that SQL is neither dirty nor bad, just verbose in the trivial cases. The focus is on removing the need to deal with the verbosity in those trivial cases but keeping the expressiveness around for hard queries – the type SQL was created to deal with elegantly.
Therefore, you shouldn’t feel guilty when you use find_by_sql() to handle either performance bottlenecks or hard queries. Start out using the object-oriented interface for productivity and pleasure, and then dip beneath the surface for a close-to-the-metal experience when you need to.
(Quote was found here, original text is on p334 of AWDRWR, the "hammock" book).
I think that's reasonable.
Are we talking about something that find_by_sql can't handle? Or are we talking about complex non-SELECT stuff that execute can't deal with?
Any examples we could look at?

If using LINQ to SQL is there any good reason to learn SQL queries/syntax anymore?

I do understand SQL querying and syntax because of previous work using ASP.NET web forms and stored procedures, but I would not call myself an "expert" in it.
Since I have been using ASP.NET MVC and LinqToSql it seems that so much of the heavy lifting is done for me and encapsulated away at the SQL end that I'm questioning whether there is any benefit in continuing to top-up my knowledge of SQL queries or whether I'm better off focusing my "learning time" on other things.
Your thoughts?
You should absolutely know SQL and keep your knowledge up-to-date. ORM is designed to ease the pain of doing something tedious that you know how to do, much like a graphing calculator is designed to do something that you can do by hand (and should know how).
The minute you start letting your ORM do things in the database that you don't fully understand is the minute you've lost control over your model.
In my opinion, knowing SQL is more valuable than any vendor specific technology. There will always be cases when those nice prepackaged frameworks will not be able to solve a particular situation and knowledge of advanced SQL will be required.
It is still important to learn SQL queries/syntax. The reason is you need to at least understand how Linq to SQL translates to the database behind the scenes.
This will help you when you find problems, for example when something is not updating correctly, or when a query's performance needs to improve.
It is the same way that you need to understand what assembly language is and how it eventually becomes machine language. You don't have to be an expert, but you should at least be able to write it and understand it.
It is still important to know SQL and the paradigm (set-based) behind it to be able to create efficient SQL statements, even if you're using LinqToSql or any other OR/M.
There will always be situations where you will want to write the query in native SQL because it is not possible to write it in LinqToSql / HQL / whatever, or LinqToSql is just not able to generate a performant query for it.
There will always be situations where you will want to execute an ad-hoc query on a database using native sql, etc...
I think LinqToSQL (or other Linq to SQL providers) should not prevent you from knowing SQL.
When your query is not returning what you expect, or when it takes 30 minutes to run on the production database, you'd better be able to understand what LTS has generated, and why it is failing.
I know, it's a rehashed topic, and it might not be applicable to what you do ("small" database that will never hit that kind of problem etc), but it pays not to get too oblivious of abstraction layers sometimes.
The other reason is that Linq does not cover the whole range of what you can do in SQL, so you might have to resort to writing "raw" SQL, even if the result is materialised as objects.
It depends what you're working on, and from what you said it might make more sense to focus on other areas.
Having said that I find knowing SQL allows the following:
The ability to write queries to extract data from systems easily.
For adhoc queries, or for checking things.
The ability to write complex stored procedures, which allows me to group complex data processing in one place, where it should be, in the database.
The ability to fine-tune LinqToSql by adding indexes, and by understanding the SQL/query plans it produces.
Most of these are more of a help on more complex systems, so if you're not working on those it might not be as much of a help.
It may help in your situation to list the technologies which might be of use, and then prioritise them.
In other words, make a development plan for yourself, which may encompass more than just technical knowledge and allow a broader focus on design patterns, communication skills and other areas.
SQL is a tool. Linq to SQL is also a tool. Having more tools in your belt is a good thing. It'll give you more perspectives when attacking a problem.
Consider a scenario where you may want to do multiple queries or multiple updates to the db in one operation. If you can write TSQL you can potentially save yourself a lot of roundtrips to the database.
I would say you definitely need to know your SQL in depth, because you need to know what code your Linq expression generates and what effects that code will have if you want high-performing queries. Sure, you might get the job done in most cases, but sometimes a very subtle difference in Linq syntax makes a huge difference in performance.
I ran into this this morning actually, where I had done .Any(d => d.Id == (...).First().Id) instead of doing where (...).Any(i => i.Id == d.Id). This resulted in the query executing five times slower.
Sometimes you need to analyze the actual SQL query to realise the mistakes you make.
It's always a good thing to learn the underlying language for stuff like Linq To SQL. SQL is pretty much standardized and it will help you understand a new paradigm in programming.
You may not always be working in .NET.
Doesn't hurt to know the underlying concepts.
LINQ to SQL is not being maintained anymore; the Entity Framework is favored instead.
Sooner or later you will run into problems that need at least a working knowledge of SQL to solve. And sooner or later you will run into requirements that are best realised in the DB (whether in SPs or in triggers or views or whatever).
LINQ To SQL will only work with .NET. If you happen to get another job where you are not working with .NET, then you will have to go back to writing stored procs.
Knowing SQL will also give you a better understanding of how the server operates as well as possibly making you a better database designer.
