Splitting an Array in Ruby

I am fetching some data from the database as below in my ruby file:
@main1 = $connection.execute("SELECT * FROM builds
  WHERE platform_type LIKE 'TOTAL';")
@main2 = $connection.execute("SELECT * FROM builds
  WHERE platform_type NOT LIKE 'TOTAL';")
After doing this I am performing hashing and a bunch of other stuff on these results. To be clear, this does not return an array as such but a mysql2 result object, so I just convert both into plain arrays to be safe:
@arr1 = @main1.to_a
@arr2 = @main2.to_a
Is there a way to avoid executing two different queries and instead get both arrays from a single query? I basically want to split the results into two arrays: the first containing the rows where platform_type is 'TOTAL', and everything else in the second.

Without getting into why you're doing what you're doing, I would use Enumerable#partition:
rows = $connection.execute('SELECT * FROM builds')
like_total, not_like_total = rows.partition { |row|
  row['platform_type'] =~ /TOTAL/
}
Note that, IIRC, SQL's LIKE 'TOTAL' isn't the same as Ruby's "string" =~ /TOTAL/ (which is closer to LIKE '%TOTAL%' in SQL); I'm not sure which one you need.
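If an exact match is what you're after, comparing the strings directly works just as well with partition (assuming, as above, that each row is a hash keyed by column name):
like_total, not_like_total = rows.partition { |row| row['platform_type'] == 'TOTAL' }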

To answer your question without getting into why you're doing it like that:
Return them all in one query, with the extra criteria, then you can group them however you want with group_by:
all_results = $connection.execute("SELECT *, platform_type LIKE 'TOTAL' AS is_like_total FROM builds")
This will give each of your results an 'is_like_total' "column" that you can group_by on.
http://ruby-doc.org/core-2.0/Enumerable.html#method-i-group_by
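For example, the grouping step could look roughly like this (a sketch, assuming the mysql2 result yields rows as hashes and that MySQL returns the LIKE expression as 1 or 0):
grouped    = all_results.group_by { |row| row['is_like_total'] == 1 }
total_rows = grouped[true]  || []
other_rows = grouped[false] || []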

Related

Updating Multiple Rows in DB in Rails, each with Unique values

I have extensively researched this, and I can't seem to find the answer I need.
I am familiar with Rails transactions, but a transaction in this case would execute several queries and I would rather not do that.
In a single query, how can I update the same column on multiple rows with unique values?
Ex:
update_hash = { 1 => 'Bandits on the High Road', 2 => 'Broccoli: The Menace' }
Books.where(id: update_hash.keys).each do |b|
  new_title = update_hash[b.id]
  # problem here because each update is its own query
  b.update(title: new_title)
end
Of course, I could wrap it in a transaction, but 10k books would still mean 10k queries. I use PostgreSQL, but I don't know the correct, idiomatic way to update that field for multiple objects in a single query. The data has been pre-vetted, so there will never be a need to run validations.
If anyone knows either the Rails code to execute, or more likely the Postgresql query that I need to generate, I would be very grateful.
With PostgreSQL it's possible with a query like this one:
update_hash = { 1 => 'Bandits on the High Road', 2 => 'Broccoli: The Menace' }
values = update_hash.map { |k, v| "(#{k}, #{ActiveRecord::Base.connection.quote(v)})" }.join(', ')
query = "
  UPDATE books T
  SET title = uv.new_title
  FROM (VALUES #{values}) AS uv (id, new_title)
  WHERE T.id = uv.id::int"
ActiveRecord::Base.connection.execute(query)
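Note that the uv.id::int cast is presumably there to keep the join well-typed: if the ids ever arrive as quoted strings in the VALUES list, Postgres infers a text column for uv.id and would refuse to compare it against the integer T.id without the cast.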

I need advice in speeding up this rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the date_sent condition is included in every one of these queries, which implies that the SQL is searching through millions of records on every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  def set_sub_count_hash(thip) {
    gmail_hooks:   { opened: a = thip.gmail.send(@event).size,   total_sent: b = thip.gmail.sent.size,   perc_opened: find_perc(a, b) },
    hotmail_hooks: { opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b) },
    yahoo_hooks:   { opened: a = thip.yahoo.send(@event).size,   total_sent: b = thip.yahoo.sent.size,   perc_opened: find_perc(a, b) },
    other_hooks:   { opened: a = thip.other.send(@event).size,   total_sent: b = thip.other.sent.size,   perc_opened: find_perc(a, b) },
  }
  end

  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [ ... ] # list of three IP addresses
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []

  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = { :this_date => this_date }
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end
Well, your problem is actually very simple. Remember that when you use where(condition), the query is not immediately executed against the database.
Rails detects when you need a concrete result (a list, an object, a count or #size as in your case) and keeps chaining your conditions lazily until you do. In your code, you keep chaining conditions onto the main query inside a loop (date_range). And it gets worse: you start another loop inside that one, adding conditions to each query created in the first loop.
Then you pass the query (not concrete yet: it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to materialize that same query many times.
Therefore you end up with something like:
10 (date_range) * 3 (IP list) * 8 (times the query is materialized inside set_sub_count_hash) = 240 queries
and then you have a problem.
What you want to do is run the whole query at once and group it by date, IP and ESP. You should end up with a hash structure, which you can pass to set_sub_count_hash and do some Ruby gymnastics on to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
OK, now you have one query, which is nice, but I think you should separate it into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, main_query, will not be executed until you ask for materialized results, don't forget it). Still, something like 100 times faster.
I think this is the result that should be grouped, mapped and passed to set_sub_count_hash, instead of passing the raw query and calling methods on it many times over. It will be a little work to do the grouping, mapping and counting, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
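For reference, .group(...).count returns a hash keyed by arrays of the grouped columns, in the same order as the group call, so the per-cell numbers can be read straight out of it. A sketch (the exact key types depend on your column types):
counts = MWebhook.where('date_sent > ?', start_date.to_date)
                 .group('date_sent', 'sending_ip', 'event', 'esp').count
counts[[this_date, ip, 'sent', 'hotmail']] # => count of "sent" hotmail hooks for that date and IP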

Parsing a PostgreSQL result object in a Rails app

I am writing an app that needs to quickly process hundreds of thousands of rows of data, so I've looked into nesting raw SQL in my Ruby code using ActiveRecord::Base.connection.execute, which is working beautifully. However, whenever I run it I get the following object as a result:
#<PG::Result:0x007fe158ab18c8 status=PGRES_TUPLES_OK ntuples=0 nfields=1 cmd_tuples=0>
I've googled around and can't find a way to parse the PG::Result into something actually useful. Is there any built-in PG way to do this, or a workaround, or anything really?
Here is the query I'm using:
SELECT row_to_json(row(company_name, ccn_short_title, title))
FROM contents
WHERE contents.company_name = '#{company_name}'
AND contents.title = '#{title}';
Actually, PG::Result responds to many well-known methods from the Enumerable module. You can list them all to find the ones you need:
query = "SELECT row_to_json(row) from (select * from users) row"
result = ActiveRecord::Base.connection.execute(query)
result.methods - Object.methods
# => returns an array of methods which can be used
For example, you could iterate the results and map them to something more suitable...
result.map do |row|
  JSON.parse(row["row_to_json"])
end
# => returns familiar hashes
Get a desired result hash by its index...
result[0]
And much more.
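For instance, a few of the handier PG::Result methods (the column name here is just the one produced by the query above):
result.fields # => column names, e.g. ["row_to_json"]
result.values # => rows as arrays of raw string values
result.to_a   # => rows as an array of hashes keyed by column name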

How to speed up these ActiveRecord queries?

I have three tables in my database: values, keys, and sources. Both keys and sources have many values. sources has a datetime column called file_date. For each key, I have to select all values whose source falls within a specific date range (usually within two years). Not all dates within that range have a source, and not all sources have a value. I need to create an array that contains all values from within that range.
I at first tried to simply query the entire sources array at once, like so:
Value.where(key: 1, source: sources_array)
However, since several of the values are nil, it simply returned the records that do have a value for a given date, with nothing standing in for the missing ones.
So now I've created an array containing all of the sources in that date range, where days that don't have a source simply hold nil. Then, for each key, I iterate through the sources array, returning the value that matches that key and source. That is obviously not ideal, and it takes about 7 seconds for the page to load.
Here is that code:
sources.map do |source|
  Value.where(source: source, key: key)
end
Any ideas?
You could also generate a single SQL query if you only need to retrieve the data and getting back arrays will suffice.
output = []
sources.each do |source|
  output << "select * from values where (source like #{source} and key like #{key})"
end
result = ActiveRecord::Base.connection.execute(output.join(" union ")).to_a
The big issue with your .map is that it runs several queries instead of one query that returns all the results you want. Each additional query adds the overhead of sending a request to the database and waiting for the response. What you need to do is build the query so that it returns everything you may be looking for, and then handle any grouping or sorting on the application side.
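As a rough sketch of that idea (assuming Value is the ActiveRecord model from the question and source_id is its foreign key), a single query can fetch everything and the per-source grouping can happen in Ruby:
values_by_source_id = Value.where(key: key, source: sources.compact).group_by(&:source_id)
results = sources.map { |source| source && values_by_source_id[source.id] }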
You could try wrapping your queries in a single transaction:
ActiveRecord::Base.transaction do
  sources.map do |source|
    Value.where(source: source, key: key)
  end
end
I'll need to refactor this like crazy, but I have something that has cut it down to 3 seconds:
source_ids = sources.map do |source|
  if source
    source.id
  else
    nil
  end
end

values = Value.where(source: sources, key: key)
value_sources = values.pluck(:source_id)
value_decimals = values.pluck(:value_decimal)

source_ids.map do |id|
  index = value_sources.index(id)
  if index
    value_decimals[index]
  else
    nil
  end
end
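One way to tighten that up further (a sketch, assuming value_decimal is the column you ultimately need) is to pluck both columns in a single query and build a lookup hash, which removes the linear index search per element:
decimal_by_source_id = Value.where(source: sources, key: key)
                            .pluck(:source_id, :value_decimal)
                            .to_h
source_ids.map { |id| decimal_by_source_id[id] }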

large data in array

I am developing a Rails app. I would like to use an array to hold 2,000,000 records and then insert the data into the database like the following:
large_data = Get_data_Method() # get 2,000,000 raw rows
all_values = Array.new
large_data.each { |data|
  all_values << data[1] # e.g. data[1] has the format "(2,'john','2002-09-12')"
}
sql = "INSERT INTO cars (id,name,date) VALUES " + all_values.join(',')
ActiveRecord::Base.connection.execute(sql)
When I run the code, it takes a very long time at the large_data.each { ... } step. Actually, I am still waiting for it to finish (it has been running for an hour and still has not finished the large_data.each { ... } part).
Is it because the number of elements is too large for a Ruby array to hold 2,000,000 elements? Or can a Ruby array hold that many elements, and is it reasonable to wait this long?
Since I would like to use bulk insertion in SQL to speed up inserting this much data into the MySQL database, I want to use only one INSERT INTO statement; that's why I did the above. If this is a bad design, can you recommend a better way?
Some notes:
Don't use the pattern "empty array + each + push", use Enumerable#map.
all_values = large_data.map { |data| data[1] }
Is it possible to write get_data to return items lazily? If the answer is yes, check out enumerators and use them to do batched inserts into the database instead of building all the objects at once. Something like this:
def get_data
  Enumerator.new do |yielder|
    yielder.yield some_item
    yielder.yield another_item
    # yield all items.
  end
end

get_data.each_slice(1000) do |data|
  # insert those 1000 elements into the database
end
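For completeness, a minimal sketch of what the batched insert inside each slice could look like, reusing the pre-formatted value tuples (the data[1] strings) from the question:
get_data.each_slice(1000) do |batch|
  values_sql = batch.map { |data| data[1] }.join(',')
  ActiveRecord::Base.connection.execute("INSERT INTO cars (id,name,date) VALUES " + values_sql)
end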
That said, there are projects for doing efficient bulk insertions; check out ar-extensions and, for Rails >= 3, activerecord-import.
An array of 2M items is never going to be the easiest thing to manage. Have you taken a look at MongoDB? It is a database that can be accessed much like an array and could be the answer to your issues.
An easy fix would be to split your inserts into blocks of 1000, that would make the whole process more manageable.
