Updating Multiple Rows in DB in Rails, each with Unique values - ruby-on-rails

I have extensively researched this, and I can't seem to find the answer I need.
I am familiar with Rails transactions, but a transaction in this case would execute several queries and I would rather not do that.
In a single query, how can I update the same column on multiple rows with unique values?
Ex:
update_hash = { 1 => 'Bandits on the High Road', 2 => 'Broccoli: The Menace' }
Books.where(id: update_hash.keys).each do |b|
  new_title = update_hash[b.id]
  # problem here because each update is a query
  b.update(title: new_title)
end
Of course, I could wrap it in a transaction, but 10k books still call 10k queries. I use Postgresql, but I don't know the correct, idiomatic way to update that field for multiple objects in a single query. The data has been pre-vetted so there will never be a need to run validations.
If anyone knows either the Rails code to execute, or more likely the Postgresql query that I need to generate, I would be very grateful.

With PostgreSQL it's possible with a query like this one:
update_hash = { 1 => 'Bandits on the High Road', 2 => 'Broccoli: The Menace' }
# Quote every title and coerce every id to an integer so no raw input reaches the SQL string
values = update_hash.map { |k, v| "(#{k.to_i}, #{ActiveRecord::Base.connection.quote(v)})" }.join(', ')
query = "
  UPDATE books T
  SET title = uv.new_title
  FROM (VALUES #{values}) AS uv (id, new_title)
  WHERE T.id = uv.id::int"
ActiveRecord::Base.connection.execute(query)
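The statement-building step can be tried out without a database. The following is a minimal sketch in plain Ruby, not the answer's exact code: `quote_literal` is a hypothetical stand-in for `ActiveRecord::Base.connection.quote` (it only doubles single quotes), and `bulk_title_update_sql` is an illustrative name.

```ruby
# Hypothetical stand-in for ActiveRecord::Base.connection.quote, for illustration only.
# In a real application, use the connection's own quoting instead.
def quote_literal(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Builds one UPDATE ... FROM (VALUES ...) statement from an { id => title } hash.
def bulk_title_update_sql(update_hash)
  raise ArgumentError, 'update_hash must not be empty' if update_hash.empty?

  # Integer() raises ArgumentError for anything that is not a plain integer id.
  values = update_hash.map { |id, title| "(#{Integer(id)}, #{quote_literal(title)})" }.join(', ')

  <<~SQL
    UPDATE books AS t
    SET title = uv.new_title
    FROM (VALUES #{values}) AS uv (id, new_title)
    WHERE t.id = uv.id
  SQL
end

sql = bulk_title_update_sql({ 1 => 'Bandits on the High Road', 2 => "Broccoli: The Menace's Return" })
puts sql
```

However many rows the hash contains, the result is a single statement, so 10k books would mean one round trip rather than 10k.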

Related

How to sanitise multiple variables into SQL query ActiveRecord Rails

In our application, the Recipe model has many ingredients (many-to-many relationship implemented using :through). There is a query to return all the recipes where at least one ingredient from the list is contained (using ILIKE or SIMILAR TO clause). I would like to pose two questions:
What is the cleanest way to write the query which will return this in Rails 6 with ActiveRecord. Here is what we ended up with
ingredients_clause = '%(' + params[:ingredients].map { |i| i.downcase }.join("|") + ')%'
recipes = recipes.where("LOWER(ingredients.name) SIMILAR TO ?", ingredients_clause)
Note that recipes is already created before this point.
However, this is a somewhat dirty solution.
I also tried to use ILIKE = any(array['ing1', 'ing2',..]) with the following:
ingredients_clause = params[:ingredients].map { |i| "'%#{i}%'" }.join(", ")
recipes = recipes.where("ingredients.name ILIKE ANY(ARRAY[?])", ingredients_clause)
This won't work since ? automatically adds single quotes so it would be
ILIKE ANY (ARRAY[''ing1', 'ing2', 'ing3'']) which is of course wrong.
Here, ? is used to sanitise parameters for SQL query, so avoid possible SQL injection attacks. That is why I don't want to write a plain query formed from params.
Is there any better way to do this?
What is the best approach to order results by the number of ingredients that are matched? For example, if I search for all recipes that contain ingredients ing1 and ing2, it should return those which contain both before those which contain only one ingredient.
Thanks in advance
For #1, a possible solution would be something like this (assuming the ingredients table is already joined):
patterns = params[:ingredients].map { |i| "%#{i}%" }
recipes = recipes.where(Ingredient.arel_table[:name].lower.matches_any(patterns))
You can find more discussion on this kind of topic here: Case-insensitive search in Rails model
You can access a lot of great SQL query features via #arel_table.
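One detail worth handling when building "contains" patterns from user input is that `%` and `_` are themselves LIKE wildcards. Below is a small plain-Ruby sketch of escaping them. `escape_like` and `contains_patterns` are illustrative names, and the backslash escape character is an assumption based on PostgreSQL's default; Rails also ships a similar helper, `sanitize_sql_like`.

```ruby
# Escapes LIKE metacharacters so user input is matched literally.
# Assumes backslash is the escape character (PostgreSQL's default for LIKE).
def escape_like(term)
  term.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

# Turns raw search terms into "contains" patterns, skipping blank entries.
def contains_patterns(terms)
  terms.map(&:strip).reject(&:empty?).map { |term| "%#{escape_like(term)}%" }
end

patterns = contains_patterns(['Tomato', ' 100%_pure ', ''])
p patterns
```

The resulting array can then be handed to `matches_any`, or bound as a single `?` parameter, so the values are still quoted by ActiveRecord rather than concatenated by hand.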
#2 If we assume all the where clauses are already applied to recipes:
recipes = recipes
  .group("recipes.id")
  # Arel.sql lets Rails know you meant to put a raw SQL expression here
  .order(Arel.sql("count(*) DESC"))
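To see what the grouping and ordering compute, here is an in-memory sketch in plain Ruby. The sample rows are made up for illustration; each one stands for a joined (recipe, matching ingredient) pair as it would exist before grouping.

```ruby
# Each row stands for one joined (recipe, matching ingredient) pair,
# i.e. what the joined and filtered query is assumed to see before grouping.
joined_rows = [
  { recipe_id: 1, ingredient: 'tomato' },
  { recipe_id: 2, ingredient: 'tomato' },
  { recipe_id: 2, ingredient: 'basil' },
  { recipe_id: 3, ingredient: 'basil' },
  { recipe_id: 2, ingredient: 'garlic' },
  { recipe_id: 3, ingredient: 'garlic' }
]

# Equivalent of GROUP BY recipe_id with count(*).
match_counts = joined_rows.group_by { |row| row[:recipe_id] }.transform_values(&:size)

# Equivalent of ORDER BY count(*) DESC (ties are broken by id in this sketch).
ranked_ids = match_counts.sort_by { |recipe_id, count| [-count, recipe_id] }.map(&:first)

p match_counts
p ranked_ids
```

Recipes that match more of the searched ingredients produce more joined rows, which is why counting rows per recipe ranks them by number of matches.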

I need advice in speeding up this rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the "date_sent" attribute is included in all of the queries, which implies that the SQL is searching through all 1M records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  def set_sub_count_hash(thip)
    {
      gmail_hooks: {opened: a = thip.gmail.send(@event).size, total_sent: b = thip.gmail.sent.size, perc_opened: find_perc(a, b)},
      hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
      yahoo_hooks: {opened: a = thip.yahoo.send(@event).size, total_sent: b = thip.yahoo.sent.size, perc_opened: find_perc(a, b)},
      other_hooks: {opened: a = thip.other.send(@event).size, total_sent: b = thip.other.sent.size, perc_opened: find_perc(a, b)},
    }
  end
  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [#List of three ip addresses]
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # Stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is very simple, actually. You gotta remember that when you use where(condition), the query is not straight executed in the DB.
Rails is smart enough to detect when you need a concrete result (a list, an object, or a count or #size like in your case) and chain your queries while you don't need one. In your code, you keep chaining conditions to the main query inside a loop (date_range). And it gets worse, you start another loop inside this one adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet, it was not yet executed and does not have results!) to the method set_sub_count_hash which goes on to call the same query many times.
Therefore you have something like:
10(date_range) * 3(ip list) * 8 # (times the query is materialized in the #set_sub_count method)
and then you have a problem.
What you want to do is to do the whole query at once and group it by date, ip and email. You should have a hash structure after that, which you would pass to the #set_sub_count method and do some ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
  .where(sending_ip: @m_list_of_ips)
Ok, now you have one query, which is nice, but I think you should split it into 4 queries (gmail, hotmail, yahoo and other). The first one, the main_query, will not be executed until you ask for materialized results, don't forget that. Still, that should be something like 100 times faster.
I think this is the result that should be grouped, mapped and passed to #set_sub_count instead of passing the raw query and calling methods on it every time and many times. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
  .group('date_sent', 'sending_ip', 'event', 'esp').count
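Once the counts arrive as one hash, the per-cell numbers can be derived in Ruby with no further queries. The sketch below assumes the grouped count returns a hash keyed by `[date_sent, sending_ip, event, esp]` arrays. The sample data is invented, `build_table` is an illustrative name, and `find_perc` is a guessed reimplementation of the helper named in the question, whose real body is not shown.

```ruby
require 'date'

# Sample of what the grouped count is assumed to return:
# { [date_sent, sending_ip, event, esp] => count }
grouped_counts = {
  [Date.new(2024, 1, 2), '10.0.0.1', 'sent', 'gmail'] => 200,
  [Date.new(2024, 1, 2), '10.0.0.1', 'unique_opened', 'gmail'] => 50,
  [Date.new(2024, 1, 2), '10.0.0.1', 'sent', 'hotmail'] => 80,
  [Date.new(2024, 1, 1), '10.0.0.2', 'sent', 'yahoo'] => 40,
  [Date.new(2024, 1, 1), '10.0.0.2', 'unique_opened', 'yahoo'] => 10
}

# Guessed version of find_perc: percentage of opened over sent, safe when sent is zero.
def find_perc(opened, sent)
  sent.zero? ? 0.0 : (opened * 100.0 / sent).round(1)
end

# Builds table[date][ip][esp] = { opened:, total_sent:, perc_opened: } from the counts hash.
def build_table(grouped_counts, opened_event)
  table = Hash.new { |dates, date| dates[date] = Hash.new { |ips, ip| ips[ip] = {} } }

  grouped_counts.each do |(date, ip, event, esp), count|
    cell = (table[date][ip][esp] ||= { opened: 0, total_sent: 0 })
    cell[:opened] += count if event == opened_event
    cell[:total_sent] += count if event == 'sent'
  end

  table.each_value do |ips|
    ips.each_value do |esps|
      esps.each_value { |cell| cell[:perc_opened] = find_perc(cell[:opened], cell[:total_sent]) }
    end
  end
  table
end

table = build_table(grouped_counts, 'unique_opened')
p table[Date.new(2024, 1, 2)]['10.0.0.1']['gmail']
```

All the looping happens over an in-memory hash, so the number of SQL statements stays at one regardless of how many dates, IPs, or providers are displayed.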

Active Record - Chain Queries with OR

Rails: 4.1.2
Database: PostgreSQL
For one of my queries, I am using methods from both the textacular gem and Active Record. How can I chain some of the following queries with an "OR" instead of an "AND":
people = People.where(status: status_approved).fuzzy_search(first_name: "Test").where("last_name LIKE ?", "Test")
I want to chain the last two scopes (fuzzy_search and the where after it) together with an "OR" instead of an "AND." So I want to retrieve all People who are approved AND (whose first name is similar to "Test" OR whose last name contains "Test"). I've been struggling with this for quite a while, so any help would be greatly appreciated!
I dug into fuzzy_search and saw that it is translated into something like:
SELECT "people".*, COALESCE(similarity("people"."first_name", 'test'), 0) AS "rankxxx"
FROM "people"
WHERE (("people"."first_name" % 'abc'))
ORDER BY "rankxxx" DESC
That means that, if you don't care about preserving the order, it simply filters the result by WHERE (("people"."first_name" % 'abc')).
Knowing that, you can write a query with similar functionality:
People.where(status: status_approved)
.where('(first_name % :key) OR (last_name LIKE :key)', key: 'Test')
If you do need an ordering, please specify how the results should be ordered once the two conditions are combined.
After a few days, I came up with the solution! Here's what I did:
This is the query I wanted to chain together with an OR:
people = People.where(status: status_approved).fuzzy_search(first_name: "Test").where("last_name LIKE ?", "Test")
As Hoang Phan suggested, when you look in the console, this produces the following SQL:
SELECT "people".*, COALESCE(similarity("people"."first_name", 'test'), 0) AS "rank69146689305952314"
FROM "people"
WHERE "people"."status" = 1 AND (("people"."first_name" % 'Test')) AND (last_name LIKE 'Test') ORDER BY "rank69146689305952314" DESC
I then dug into the textacular gem and found out how the rank is generated. I found it in the textacular.rb file and then crafted the SQL query using it. I also replaced the "AND" that connected the last two conditions with an "OR":
# Generate a random number for the ordering
rank = rand(100000000000000000).to_s
# Create the SQL query
sql_query = "SELECT people.*, COALESCE(similarity(people.first_name, :query), 0)" +
" AS rank#{rank} FROM people" +
" WHERE (people.status = :status AND" +
" ((people.first_name % :query) OR (last_name LIKE :query_like)))" +
" ORDER BY rank#{rank} DESC"
I took out all of the quotation marks around table and field names in the SQL query because they produced error messages when I kept them, even when I used single quotes.
Then, I used the find_by_sql method to retrieve the People object IDs in an array. The symbols (:status, :query, :query_like) are used to protect against SQL injections, so I set their values accordingly:
# Retrieve all the IDs of People who are approved and whose first name or last name matches the search query.
# The IDs are sorted in order of most relevant to the search query.
people_ids = People.find_by_sql([sql_query, query: "Test", query_like: "%Test%", status: 1]).map(&:id)
I get the IDs and not the People objects in an array because find_by_sql returns an Array and not an ActiveRecord::Relation, as where would normally return, so I cannot use ActiveRecord query methods such as where on this array. Using the IDs, we can execute another query to get a relation. However, there's one problem: if we were to simply run People.where(id: people_ids), the order of the IDs would not be preserved, so all the relevance ranking we did would be for nothing.
Fortunately, there's a nice gem called order_as_specified that will allow us to retrieve all People objects in the specific order of the IDs. Although the gem would work, I didn't use it and instead wrote a short line of code to craft conditions that would preserve the order.
order_by = people_ids.map { |id| "people.id='#{id}' DESC" }.join(", ")
If our people_ids array is [1, 12, 3], it would create the following ORDER statement:
"people.id='1' DESC, people.id='12' DESC, people.id='3' DESC"
I learned from this comment that writing an ORDER statement in this way would preserve the order.
Now, all that's left is to retrieve the People objects from ActiveRecord, making sure to specify the order.
people = People.where(id: people_ids).order(order_by)
And that did it! I didn't worry about removing any duplicate IDs because ActiveRecord does that automatically when you run the where command.
I understand that this code is not very portable and would require some changes if any of the people table's columns are modified, but it works perfectly and seems to execute only one query according to the console.
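The order-preserving step can be sketched in plain Ruby as follows. `order_clause_for` rebuilds the ORDER fragment described above with an integer check on each id, and `in_id_order` is an alternative that reorders already-loaded records in memory. Both names are illustrative, and `Person` is a Struct standing in for the model.

```ruby
# Builds the ORDER BY fragment; Integer() rejects anything that is not an integer id.
def order_clause_for(ids, column = 'people.id')
  ids.map { |id| "#{column}=#{Integer(id)} DESC" }.join(', ')
end

# Alternative sketch: reorder already-loaded records in Ruby instead of in SQL.
# Records only need to respond to #id; a Struct stands in for the model here.
Person = Struct.new(:id, :first_name)

def in_id_order(records, ids)
  by_id = records.to_h { |record| [record.id, record] }
  ids.filter_map { |id| by_id[id] }
end

ranked_ids = [1, 12, 3]
loaded = [Person.new(3, 'Tess'), Person.new(1, 'Test'), Person.new(12, 'Tesla')]

puts order_clause_for(ranked_ids)
p in_id_order(loaded, ranked_ids).map(&:id)
```

On newer Rails versions a raw string like this would probably need to be wrapped in `Arel.sql` before being passed to `order`. The in-memory variant avoids that at the cost of returning an Array rather than a relation.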

Modifying the returned value of find_by_sql

So I am pulling my hair over this issue / gotcha. Basically I used find_by_sql to fetch data from my database. I did this because the query has lots of columns and table joins and I think using ActiveRecord and associations will slow it down.
I managed to pull the data and now I want to modify the returned values. I did this by looping through the result, for example:
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
end
What I found out is that project['mycolumn'] was not changed at all.
So my question:
Does find_by_sql return an array of Hashes?
Is it possible to modify the value of one of the attributes of hash as stated above?
Here is the code : http://pastie.org/4213454 . If you can have a look at summarize_roles2() that's where the action is taking place.
Thank you. I'm using Rails 2.1.1 and Ruby 1.8. I can't really upgrade because of legacy code.
Just change the way you access the values in the method above: print the value of project and you can clearly see the object's properties.
The results will be returned as an array with the requested columns encapsulated as attributes of the model you call this method from. If you call Product.find_by_sql, then the results will be returned as Product objects with the attributes you specified in the SQL query.
If you call a complicated SQL query which spans multiple tables the columns specified by the SELECT will be attributes of the model, whether or not they are columns of the corresponding table.
Post.find_by_sql "SELECT p.title, c.author FROM posts p, comments c WHERE p.id = c.post_id"
# => [#<Post:0x36bff9c @attributes={"title"=>"Ruby Meetup", "first_name"=>"Quentin"}>, ...]
Source: http://api.rubyonrails.org/v2.3.8/
Have you tried
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
project.save
end
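If the goal is only to transform the values for display rather than to persist them, one option is to work on plain hashes. The sketch below is plain Ruby under that assumption: `rows` is invented data standing in for each record's attributes, and `summarize` is an illustrative name. It also relies on `map` returning the transformed values, whereas `each` returns the original array.

```ruby
# Stand-in for what each record's attributes hash is assumed to look like.
rows = [
  { 'mycolumn' => 'admin_1', 'mycolumn2' => 'x' },
  { 'mycolumn' => 'editor_2', 'mycolumn2' => 'y' },
  { 'mycolumn' => nil, 'mycolumn2' => 'z' }
]

# Returns new hashes keeping the part before the first underscore; nil values are left alone.
def summarize(rows)
  rows.map do |row|
    value = row['mycolumn']
    row.merge('mycolumn' => value && value.split('_').first)
  end
end

summary = summarize(rows)
p summary.map { |row| row['mycolumn'] }
```

The input hashes are not mutated, so the original values remain available if they are needed later.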

Ruby on Rails ActiveRecord efficiency

This code should update the entire table by applying a filter to its "name" values:
entries = select('id, name').all
entries.each do |entry|
puts entry.id
update(entry.id, { :name => sanitize(entry.name) })
end
I am pretty new to Ruby on Rails and found it interesting, that my selection query is split into the single row selections:
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 1) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 2) LIMIT 1
SELECT `entries`.* FROM `entries` WHERE (`entries`.`id` = 3) LIMIT 1
...
As I understand it, this is a kind of optimization provided by Rails: it selects a row only when it's needed (on every cycle) rather than all entries at once.
However, is it really more efficient in this case? I mean, if I have 1000 records in my database table - is it better to make 1000 queries than a single one? If not, how can I force Rails to select more than one row per query?
Another question is: not all rows are updated by this query. Does Rails ignore the update query, if the provided values are the same which already exist (in other words, if entry.name == sanitize(entry.name))?
ActiveRecord is an abstraction layer, but when doing certain operations (especially those involving large datasets) it is useful to know what is happening underneath the abstraction layer.
This is pretty much true for all abstractions. (see Joel Spolsky's classic article on leaky abstractions: http://www.joelonsoftware.com/articles/LeakyAbstractions.html )
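To deal with the case in point here, Rails provides the update_all method, which updates many rows in a single query. When each row needs its own computed value, as here, you can iterate in batches with find_each:
Entry.find_each do |entry|
  #...
end
That fetches the entries in batches (1000 per query by default) and yields each entry in turn.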
If attributes are not changed, Rails will not perform an UPDATE query.
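The same idea can be applied by hand: compute the new values first and touch only the rows that actually change. This is a plain-Ruby sketch with invented data. `sanitize_name` is a hypothetical filter standing in for the question's `sanitize`, and the batch size of 2 is only for demonstration.

```ruby
# Hypothetical filter standing in for the question's sanitize method.
def sanitize_name(name)
  name.strip.squeeze(' ')
end

# Rows as [id, name] pairs, e.g. what a two-column select is assumed to yield.
entries = [[1, ' Alice '], [2, 'Bob'], [3, 'Carol   Ann'], [4, 'Dave']]

# Keep only rows whose sanitized name differs; unchanged rows need no UPDATE at all.
changes = entries.each_with_object({}) do |(id, name), acc|
  clean = sanitize_name(name)
  acc[id] = clean unless clean == name
end

# Process the changes in fixed-size batches, similar in spirit to find_each.
batches = changes.each_slice(2).to_a

p changes
p batches
```

This mirrors why not every row in the question produced an UPDATE: rows whose name is already clean have nothing to write.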
