ActiveRecord .joins breaking other queries - ruby-on-rails

I'm writing a Rails API on top of a legacy database with tons of tables. The search feature gives users the ability to query around 20 separate columns spread across 13 tables. I have a number of queries that check the params to see if they need to return results. They look like this:
results << Company.where('city LIKE ?', "#{params[:city]}").select('id') unless params[:city].blank?
and they work fine. However, I just added another query that looks like this:
results << Company.joins("JOIN Contact ON Contact.company_id = Company.id").where("Contact.first_name LIKE ?", "%#{params[:first_name]}%").select('company_id') unless params[:first_name].blank?
and suddenly my first set of queries started returning null, rather than the list of IDs they had been returning. The query with the join works perfectly well whether the other queries are functional or not. When I comment the join query out, the previous queries start working again. Is there some reason the query with a join would break other queries on the page?

I can't think of a particular reason why the join would break your previous queries; however, I do have some suggestions for your query overall.
Assuming you've modelled these relationships correctly, you shouldn't need to define the join manually. On another note, you're not querying against the company at all, so you can use includes instead of joins; this lets you access the company's data without firing another query.
If you want to access company data (i.e. query.company.name), use includes like so:
Contact.includes(:company).where('first_name LIKE ?', param).select(:company_id).distinct
However, it appears all you really want is an array of IDs (which already exist on the contact model as company_id), so you can lighten things up and not include the company at all.
Contact.where('first_name LIKE ?', param).select(:company_id).distinct
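To make the shape of that result concrete, here is a plain-Ruby stand-in for the query above. It is only a sketch: the CONTACTS data and the company_ids_for_first_name helper are hypothetical, nothing touches a database, and the case-insensitive substring match only approximates a LIKE '%...%' filter.

```ruby
# Hypothetical in-memory rows standing in for the Contact table.
CONTACTS = [
  { first_name: "Alice",  company_id: 1 },
  { first_name: "Alicia", company_id: 1 },
  { first_name: "Bob",    company_id: 2 },
  { first_name: "Malik",  company_id: 3 }
].freeze

# Approximates
#   Contact.where('first_name LIKE ?', "%#{term}%").select(:company_id).distinct
# by returning the distinct company IDs of contacts whose first name contains term.
# A blank term yields no IDs, loosely mirroring the `unless params[:first_name].blank?` guard.
def company_ids_for_first_name(term, contacts = CONTACTS)
  return [] if term.nil? || term.strip.empty?

  needle = term.downcase
  contacts
    .select { |contact| contact[:first_name].downcase.include?(needle) }
    .map { |contact| contact[:company_id] }
    .uniq
end

p company_ids_for_first_name("ali") # Alice, Alicia and Malik match; company 1 is listed once
p company_ids_for_first_name("")    # blank search term, so nothing is returned
```

The point is that each company ID appears once even when several contacts at the same company match, which is what .distinct is for in the real query.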
Whenever you get stuck, never forget to check out the great resources over at http://api.rubyonrails.org/ - they are an absolute life saver sometimes!

It turned out that the queries with a join needed to be placed above the queries without a join. I'm not sure why it behaves this way, but hopefully this helps someone else down the line.

Related

How do I join multiple hive queries?

I am trying to join a simple query with a very ugly query that resolves to a single line. They have a date and a userid in common but nothing else. Alone both queries work but for the life of me I cannot get them to work together. Can someone assist me in how I would do this?
Fixed it... when you union queries in Hive, it looks like each query needs to return the same number of fields.

Propel query with nested statements and empty field value

I have quite a complex SQL query which I would like to transform into Propel but I am not sure about the best approach.
The query I need looks like this:
SELECT id_loan
FROM loan loanA
JOIN loan_funding on fk_loan = loanA.id_loan
JOIN `user` userA on loan_funding.fk_user = userA.id_user
WHERE
userA.`acc_internal_account_id` is not null
AND loanA.`state` = 'payment_origination'
AND loanA.id_loan IN (
SELECT id_loan from loan loanB
JOIN loan_funding on fk_loan = id_loan
JOIN `user` userB on loan_funding.fk_user = userB.id_user
WHERE
userB.`acc_internal_account_id` is null
AND loanB.`state` = 'payment_origination'
GROUP BY loanB.id_loan
)
GROUP BY loanA.id_loan
LIMIT 1;
What I would like to have is something completely based on the Generated Query Methods but I do not quite get how to do it.
Performance is not an issue but as for now it is unclear where and how those queries will be called from. However, it is important to get back an object as we need to use the getters and setters.
I found this website: http://propelorm.org/blog/2011/02/02/how-can-i-write-this-query-using-an-orm-.html which looks really cool and helpful, however, I am not sure what option fits best here.
I do not expect a complete solution but maybe some thoughts how to narrow down the problem...
What confuses me especially is the part where it compares id_loan and fk_loan before it goes to the user table. How would this relationship be represented by Propel? Might it be better to split the whole thing into multiple queries?
Any hints appreciated!

Performing multiple queries on the same model efficiently

I've been going round in circles for a few days trying to solve a problem which I've also struggled with in the past. Essentially it's an issue of understanding the best (or an efficient) way to perform multiple queries on a model, as I'm regularly finding my pages are very slow to load.
Consider the situation where you have a model called Everything. Initially you perform a query which finds those records in Everything which match certain criteria
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC')
I want to remember the contents of @chosenrecords as I will present them to the user as a list; however, I would also like to understand more of the attributes of @chosenrecords, for instance:
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.count
When I use the above code in my controller and inspect the command history on the local server, I am surprised to find that each of the three statements involves an SQL query on the original Everything model, rather than remembering the records returned in @chosenrecords and performing the query on that. This seems very inefficient to me, and indeed each of the three queries takes the same amount of time to process, making the page perform slowly.
I am more experienced in writing code in software like MATLAB, where once you've calculated the value of a variable it is stored locally and can be quickly interrogated, rather than being recalculated every time you want to know more about it. Please could you guide me as to whether I am just on the wrong track completely and the issues I've identified are just "how it is in Rails", or whether there is something I can do to improve it. I've looked into concepts like using a scope, defining a different variable type, and caching, but I'm not quite sure what I'm doing in each case and keep ending up in a similar hole.
Thanks for your time
You are partially on the wrong track. Rails 3 comes with Arel, which defers the query until the data is required. In your case, you have built an Arel query but then executed it separately with .first and again with .count. What I have done below is run the query once, load all the results into an array, and work on that array in the next two lines.
Perform the queries like this:
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC').all
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.size
It will solve your issue. Note that this relies on .all returning an array, as it does in Rails 3; from Rails 4 onward .all returns a lazy relation, so .to_a would be needed there instead.
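The difference between querying repeatedly and loading once can be sketched without a database. CountingRelation below is a hypothetical, deliberately simplified stand-in (it never caches anything, unlike a real ActiveRecord relation, which remembers its records once loaded); it only counts how often data would have to be fetched.

```ruby
# Hypothetical stand-in for a lazy relation: every call that needs data
# "hits the database" again, and the object counts those hits.
class CountingRelation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  def first
    run_query.first
  end

  def count
    run_query.size
  end

  def to_a
    run_query
  end

  private

  # Pretend query: returns the rows ordered by price and records that it ran.
  def run_query
    @query_count += 1
    @rows.sort_by { |row| row[:price] }
  end
end

ROWS = [
  { name: "b", price: 30 },
  { name: "a", price: 10 },
  { name: "c", price: 20 }
].freeze

# Pattern from the question: three data-needing calls on the lazy object.
def lazy_usage(rows = ROWS)
  relation = CountingRelation.new(rows)
  relation.to_a
  relation.first
  relation.count
  relation.query_count
end

# Pattern from the answer: load once into an Array, then work on it in memory.
def loaded_usage(rows = ROWS)
  relation = CountingRelation.new(rows)
  chosen_records = relation.to_a
  {
    minimum_price: chosen_records.first[:price],
    number_of_records: chosen_records.size,
    query_count: relation.query_count
  }
end

puts lazy_usage # number of pretend queries for the lazy pattern
p loaded_usage  # values derived in memory, plus the pretend query count
```

In the loaded pattern, first and size are ordinary Array methods, so they cost nothing beyond the single fetch.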

Performance of generated T-SQL from Entity Framework

I recently used Entity Framework for a project, despite my DBA's strong disapproval. So one day he came to my office complaining about generated T-SQL that reaches his database.
For instance, when I want to select a product based on the id, I write something like this:
context.Products.FirstOrDefault(p=>p.Id==id);
Which translates to
SELECT ... FROM (SELECT TOP 1 ... FROM PRODUCTS WHERE ID=@id)
So he is shouting, "Why on earth would you write a SELECT * FROM (SELECT TOP 1)"
So I changed my code to
context.Products.Where(p=>p.Id==id).ToList().FirstOrDefault()
and this produces a much cleaner T-SQL:
SELECT ... FROM PRODUCTS WHERE ID=@id
The inner query and the TOP 1 disappeared. Enough mumbling; my question is this: does the first query really add overhead for SQL Server? Is it harder to parse than the second one? The Id column has a clustered index on it. I want a good answer so I can rub it in his face (or mine).
Thanks,
Themos
Have you tried running the queries manually and comparing the execution plans?
The biggest problem here isn't that the SQL isn't perfectly formed to your DBA's standards (although I'm fairly certain that the query engine will optimize out the extra select). The second version calls ToList() before FirstOrDefault(), so every row matching the filter is pulled back and the first one is then picked in memory; with a unique Id that is at most one row, but in general choosing a single row is a task that should be performed by the DB and not the application layer.
In short, he's being a pedant; leave it the way it was.

Search a relation without a second query

My question is about how to perform varying levels of search into a database while limiting the number of queries.
Let's start simple:
@companies = Company.where("active = ?", true)
Let's say we display records from this set. Then, we need:
@clientcompanies = @companies.where("client_id = ?", @client.id)
We display something from @clientcompanies. Then, we want to drill down further.
@searchcompanies = @clientcompanies.where("name LIKE ? OR notes LIKE ?", "#{params[:search]}%", "#{params[:search]}%")
Are these three statements the most efficient way to go about this?
If indeed the database is starting with the entire Company table each time around, is there a way to limit the scope so each of the above statements would take a shorter amount of time as the size of the set diminishes?
In case it matters, I'm running Rails 3 on both MySQL and PostgreSQL.
It doesn't get much more optimized than what you're already doing. Exactly zero of those statements will execute a SQL query until you try to use the results. The query is executed when you call methods like all, first, inspect, any?, each, etc.
Each time you chain on a new where or other Arel method, it appends to the SQL query that will be executed at the end. If, somewhere in the middle, you want to see the query that will be executed, you can do puts @searchcompanies.to_sql
Note that if you run these commands in the console each statement appears to run a SQL query only because the console automatically runs .inspect on the line you entered.
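The chaining behaviour described above can be sketched in plain Ruby. LazyQuery is a hypothetical, heavily simplified stand-in for a relation: its to_sql output is only illustrative (real Rails quotes identifiers and fills in the bound values), and its load method stands in for whatever call finally needs the rows.

```ruby
# Hypothetical, simplified stand-in for a chainable relation.
class LazyQuery
  def initialize(table, conditions = [], log = [])
    @table = table
    @conditions = conditions
    @log = log # shared by all chained copies; records each executed statement
  end

  # Returns a new object carrying the extra condition; nothing is executed here.
  def where(condition)
    LazyQuery.new(@table, @conditions + [condition], @log)
  end

  # Builds the single statement that all chained conditions add up to.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    unless @conditions.empty?
      sql += " WHERE " + @conditions.map { |c| "(#{c})" }.join(" AND ")
    end
    sql
  end

  # Stands in for each/all/first: the moment the SQL would actually run.
  def load
    @log << to_sql
    self
  end

  # Copy of the statements executed so far.
  def executed
    @log.dup
  end
end

# Mirrors the three chained statements from the question (placeholders left as text).
def build_search
  companies = LazyQuery.new("companies").where("active = ?")
  client_companies = companies.where("client_id = ?")
  client_companies.where("name LIKE ? OR notes LIKE ?")
end

search = build_search
puts search.executed.size # still zero: chaining alone ran nothing
puts search.to_sql        # the one combined statement
search.load
puts search.executed.size # one statement ran, only when the rows were needed
```

Each where call narrows the same pending statement rather than re-querying a smaller set, which is why the three lines in the question cost one query at most if only the last result is used.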
Hopefully I answered your question :)
There's a great railscast here: http://railscasts.com/episodes/239-activerecord-relation-walkthrough that explains how ActiveRelation works, and what you can do with it.
EDIT:
I may have misunderstood your question. You indicated that after each where call you were displaying information from the query. What's the use case for this? Are you displaying all companies on the same page where you show the companies filtered by a search? If you display something from that very first query, then you will be pulling every single active company row from your database (which is not going to be very scalable or performant at larger quantities of company entries).
Would it not make sense to only display information from the @searchcompanies variable?
