How to do this computation-heavy query on millions of rows - ruby-on-rails

I am using Idempotence to make sure the same Message doesn't get saved to the DB more than once. To ensure this, I need a combination of 3 columns. Instead of indexing on 3 columns where one might be null, I instead do a calculation and Digest and store that on a column that is indexed and unique.
I now need to apply this calculation to all previous messages, for which there are millions of rows.
Message.rb:
def set_unique_identifier
part_one = mm_id || SecureRandom.uuid
part_two = c_id
part_three = s_id
self.unique_identifier = Digest::SHA1.hexdigest("#{part_one}-#{part_two}-#{part_three}")
end
and then I have a migration like so:
Message.find_each.with_index do |message, index|
message.set_unique_identifier
message.save
puts "SETTING UNIQUE IDENTIFIER FOR #{index}" if index % 1000 == 0
end
However, this is obviously going to take a really long time to compute. Is there a faster way to do this using raw SQL?

You're going to have a certain amount of computation no matter what the solution is with millions of rows. What you can do is reduce the movement of data. PostgreSQL's pgcrypto module has support for SHA1 hashing and UUID generation.
Using those, you can keep the logic in the server and execute it as a single SQL statement, or as multiple statements if you want to do it in chunks.
UPDATE message SET unique_identifier = encode(digest(
COALESCE(mm_id::text, gen_random_uuid()::text) || '-' || COALESCE(c_id::text, '') || '-' || COALESCE(s_id::text, '')
,'sha1'),'hex');
However, what you're doing won't actually enforce uniqueness: whenever mm_id is NULL, the random component means that two messages with the same mm_id, c_id, s_id could both be allowed.
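To make the uniqueness problem concrete, here is a minimal sketch in plain Ruby. The stand-alone unique_identifier method is a hypothetical rewrite of set_unique_identifier from the question, and the sample values are made up:

```ruby
require "digest/sha1"
require "securerandom"

# Hypothetical stand-alone version of set_unique_identifier from the question.
def unique_identifier(mm_id, c_id, s_id)
  part_one = mm_id || SecureRandom.uuid # random fallback when mm_id is nil
  Digest::SHA1.hexdigest("#{part_one}-#{c_id}-#{s_id}")
end

# mm_id present: the digest is stable, so a unique index would reject the duplicate.
with_id_a = unique_identifier("m-1", 10, 20)
with_id_b = unique_identifier("m-1", 10, 20)

# mm_id missing: each call draws a fresh UUID, so the two digests differ.
without_id_a = unique_identifier(nil, 10, 20)
without_id_b = unique_identifier(nil, 10, 20)

puts with_id_a == with_id_b
puts without_id_a == without_id_b
```

So the digest only deduplicates rows that have an mm_id; rows without one always look new.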
You'd be best off using a unique database constraint. You can create a unique index on the raw columns:
CREATE UNIQUE INDEX ON message(mm_id,c_id,s_id);
and rely on postgres to handle that. This is what I'd do first and not worry about performance issues until you've tried it that way and can measure performance.
An alternative is to create an index on a function. It will operate in about the same way:
CREATE UNIQUE INDEX ON message (encode(digest(mm_id || c_id || s_id,'sha1'),'hex'));

Related

How to iterate over an ActiveRecord resultset in one line with nil check in Ruby

I have an array of Active Record result and I want to iterate over each record to get a specific attribute and add all of them in one line with a nil check. Here is what I got so far
def total_cost(cost_rec)
total= 0.0
unless cost_rec.nil?
cost_rec.each { |c| total += c.cost }
end
total
end
Is there an elegant way to do the same thing in one line?
You could combine safe-navigation (to "hide" the nil check), summation inside the database (to avoid pulling a bunch of data out of the database that you don't need), and a #to_f call to hide the final nil check:
cost_rec&.sum(:cost).to_f
If the cost is an integer, then:
cost_rec&.sum(:cost).to_i
and if cost is a numeric inside the database and you don't want to worry about precision issues:
cost_rec&.sum(:cost).to_d
If cost_rec is an array rather than a relation (i.e. you've already pulled all the data out of the database), then one of:
cost_rec&.sum(&:cost).to_f
cost_rec&.sum(&:cost).to_i
cost_rec&.sum(&:cost).to_d
depending on what type cost is.
You could also use Kernel#Array to ignore nils (since Array(nil) is []) and ignore the difference between arrays and ActiveRecord relations (since #Array calls #to_ary and relations respond to that) and say:
Array(cost_rec).sum(&:cost)
that'll even allow cost_rec to be a single model instance. This also bypasses the need for the final #to_X call since [].sum is 0. The downside of this approach is that you can't push the summation into the database when cost_rec is a relation.
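As a rough illustration of the nil handling described above, here is a sketch that runs without a database. CostRecord is a hypothetical stand-in for a model instance, and the method names are made up for the example:

```ruby
# Hypothetical stand-in for a model row; only #cost matters here.
class CostRecord
  attr_reader :cost

  def initialize(cost)
    @cost = cost
  end
end

# Kernel#Array turns nil into [], leaves arrays alone, and wraps a single object.
def total_cost(cost_rec)
  Array(cost_rec).sum(&:cost)
end

# Safe navigation returns nil for a nil receiver; nil.to_f then gives 0.0.
def total_cost_f(cost_rec)
  cost_rec&.sum(&:cost).to_f
end

records = [CostRecord.new(1.5), CostRecord.new(2.5)]

puts total_cost(nil)
puts total_cost(records)
puts total_cost(CostRecord.new(3))
puts total_cost_f(nil)
```

Both versions sum in Ruby, so for a relation the cost_rec&.sum(:cost) form is still the one that pushes the work into the database.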
anything like these?
def total_cost(cost_rec)
(cost_rec || []).inject(0) { |memo, c| memo + c.cost }
end
or
def total_cost(cost_rec)
(cost_rec || []).sum(&:cost)
end
Either one of these should work
total = cost_rec.map(&:cost).compact.sum
total = cost_rec.map{|c| c.cost }.compact.sum
total = cost_rec.pluck(:cost).compact.sum
Edit: if cost_rec is nil
total = (cost_rec || []).map{|c| c.cost }.compact.sum
When cost_rec is an ActiveRecord::Relation then this should work out of the box:
cost_rec.sum(:cost)
See ActiveRecord::Calculations#sum.

How to create serial number for ruby on rails?

I want to create ticket number by serial number, eg. T-0001, T-0002, T-0003,
for ruby on rails project. How to make this?
Admission.transaction do
cus = @admission.customer
cus.inpatient_id = cus.inpatient_id || "I-%.6d" % cus.id
cus.save
end
Most Rails servers are multi-threaded, meaning many requests are processed in parallel. Imagine two requests trying to create a new serial number at the same moment: you get duplicate ticket numbers, which is certainly not what we want.
It is better to delegate the task of creating ids to the database itself. So instead of the default auto-increment ids (1, 2, 3, 4...), we tell the database to create ids in this format (T-0001, T-0002, ...). This can be achieved using custom sequences. I am assuming a PostgreSQL database here; MySQL has no CREATE SEQUENCE, so the approach would need adapting there.
First create sequence
CREATE SEQUENCE ticket_seq;
Sequences only produce integers, so we convert the value to a string and format it:
SELECT 'T-'||to_char(nextval('ticket_seq'), 'FM0000');
This will return values like T-0001, T-0002 ...
Note: we have only created the sequence; you still need to tell the database to use it for the column.
Check: https://stackoverflow.com/a/10736871/3507206
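The reason to hand allocation to the database is that taking the next number has to be atomic. As an in-process analogy only (this is not how a database sequence is implemented), here is a sketch where a Mutex plays that role. TicketCounter is a hypothetical class, and it would not protect against several server processes the way a database sequence does:

```ruby
# Hypothetical in-memory counter; the Mutex makes "read, increment, return" one atomic step.
class TicketCounter
  def initialize
    @mutex = Mutex.new
    @last = 0
  end

  def next_ticket
    number = @mutex.synchronize { @last += 1 }
    format("T-%04d", number)
  end
end

counter = TicketCounter.new

# Eight threads each request 50 tickets at the same time.
threads = 8.times.map do
  Thread.new { 50.times.map { counter.next_ticket } }
end
tickets = threads.flat_map(&:value)

puts tickets.size
puts tickets.uniq.size
```

Every thread gets a distinct number because no two of them can be inside the synchronized block at once.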
Here is a sample that generates the required formatted series over a range:
> (0..5).map{|e| "T-#{e.to_s.rjust(4, "0")}"}
#=> ["T-0000", "T-0001", "T-0002", "T-0003", "T-0004", "T-0005"]
If you are using PG / MySQL you can use the object's id as the unique number (the id primary key is always sequential and unique).
UPDATE: as per OP's comment:
Admission.transaction do
cus = @admission.customer
cus.inpatient_id = cus.inpatient_id || "T-#{cus.id.to_s.rjust(4, "0")}"
cus.save
end

Passing params to sql query

So, in my Rails app I developed a search filter that uses sliders. For example, I want to show orders where the price is between a min value and a max value, which come from the slider in params. I have a column in my db called "price" and params[:priceMin], params[:priceMax]. So I can't write something like MyModel.where(params).... You may say that I should do something like MyModel.where('price >= ? AND price <= ?', params[:priceMin], params[:priceMax]), but there is a problem: the number of search criteria depends on what the user chooses, so I don't know the size of the params hash that is passed to the query. Are there any ways to solve this problem?
UPDATE
I've already done it this way
def query_senders
query = ""
if params.has_key?(:place_from)
query += query_and(query) + "place_from='#{params[:place_from]}'"
end
if params.has_key?(:expected_price_min) and params.has_key?(:expected_price_max)
query += query_and(query) + "price >= '#{params[:expected_price_min]}' AND price <= '#{params[:expected_price_max]}'"
end...
but according to the Rails guides (http://guides.rubyonrails.org/active_record_querying.html) this approach is bad because of the danger of SQL injection.
You can get the size of params hash by doing params.count. By the way you described it, it still seems that you will know what parameters can be passed by the user. So just check whether they're present, and split the query accordingly.
Edited:
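def query_string
  # Collect exact-match conditions in a hash ("return" is a keyword, so use another name)
  conditions = {}
  if params[:whatever].present?
    conditions[:whatever] = params[:whatever]
  end
  # ... repeat for the other params you expect
  conditions
end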
The above would form a hash of all the exact values you're searching for, which avoids SQL injection when the hash is passed to where. Then, for filters such as prices, you can check whether the values are in the correct format (numbers only) and only add them to the query if so.
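Putting both ideas together, a minimal sketch could look like the following. The parameter and column names come from the question; search_conditions is a hypothetical helper, and params is assumed to be a plain hash with symbol keys so the example runs outside Rails:

```ruby
# Hypothetical helper: builds a conditions hash from whichever filters are present.
def search_conditions(params)
  conditions = {}

  # Exact-match filter: only added when the user supplied a value.
  place = params[:place_from].to_s.strip
  conditions[:place_from] = place unless place.empty?

  # Range filter: Float(..., exception: false) returns nil for anything non-numeric.
  min = Float(params[:expected_price_min].to_s, exception: false)
  max = Float(params[:expected_price_max].to_s, exception: false)
  conditions[:price] = (min..max) if min && max

  conditions
end

p search_conditions(place_from: "Paris", expected_price_min: "10", expected_price_max: "50")
p search_conditions(expected_price_min: "10' OR 1=1 --", expected_price_max: "50")
```

The resulting hash could then be passed to something like MyModel.where(conditions). Active Record binds hash values instead of interpolating them, and a Range value becomes a BETWEEN clause.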

Ruby on Rails: how to assign a random number to users without storing it?

How can I assign a table of users a random number between 1 and 9 without needing to store it in the db (to recall it later).
Is there some way to hash their user_id into returning a number between a range (and then get the same number for that user every time that function would be called).
I know the following is not an optimal way to do this, but it works and returns the same pseudo-random number between 1 and 9 for a given user every time, so you won't need to store it in your database. (With only nine possible values, the number cannot be unique per user.)
require 'digest/md5'
def unique_number_for(user)
hash = (Digest::MD5.new << user.id.to_s).to_s
hash.split("").map(&:to_i).detect {|a| a > 0}
end
The obvious solution:
id.to_s[-1,1]
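Another option, sketched here under the assumption that any stable spread over 1 to 9 is good enough, is to hash the id and reduce it with a modulo. stable_number_for is a hypothetical name. Unlike the last digit of the id, which ranges over 0 to 9, this stays within 1 to 9:

```ruby
require "digest/md5"

# Hypothetical helper: maps an id to a number in 1..buckets, the same one on every call.
def stable_number_for(user_id, buckets = 9)
  # Interpret the MD5 hex digest as one big integer, then fold it into the range.
  Digest::MD5.hexdigest(user_id.to_s).to_i(16) % buckets + 1
end

numbers = (1..1000).map { |id| stable_number_for(id) }

puts numbers.minmax.inspect
puts stable_number_for(42) == stable_number_for(42)
```

Hashing first means consecutive ids do not get consecutive numbers, which a plain id % 9 + 1 would give you.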

How do I count items in an array that have a specific attribute value?

In my application, I have an array named @apps which is loaded by ActiveRecord with a record containing the app's name, environment, etc.
I am currently using @apps.count to get the number of apps in the array, but I am having trouble counting the number of applications in the array where the environment = 0.
I tried @apps.count(0) but that didn't work since there are multiple fields for each record.
I also tried something like @apps.count{ |environment| environment = 0} but nothing happened.
Any suggestions?
Just use select to narrow down to what you want:
@apps.select {|a| a.environment == 0}.count
However, if this is based on ActiveRecord, you'd be better off just making your initial query limit it unless of course you need all of the records and are just filtering them in different ways for different purposes.
I'll assume your model is called App since you are putting them in @apps:
App.where(environment: 0).count
You have the variable wrong. Also, you have assignment instead of comparison.
@apps.count{|app| app.environment == 0}
or
@apps.count{|app| app.environment.zero?}
I would use reduce OR each_with_object here:
reduce docs:
@apps.reduce(Hash.new(0)) do |counts, app|
counts[app.environment] += 1
counts
end
each_with_object docs:
@apps.each_with_object(Hash.new(0)) do |app, counts|
counts[app.environment] += 1
end
If you are able to query, use sql
App.group(:environment).count will return a hash with keys as environment and values as the count.
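For the in-memory case, here is a small sketch you can run without a database. The App struct and the sample data are made up for illustration, and Enumerable#tally needs Ruby 2.7 or later:

```ruby
# Hypothetical stand-in for the ActiveRecord model.
App = Struct.new(:name, :environment)

apps = [App.new("web", 0), App.new("api", 1), App.new("jobs", 0)]

# Count with a block: the block receives each app and uses ==, not =.
zero_count = apps.count { |app| app.environment.zero? }

# Counts per environment in one pass, like the reduce / each_with_object versions.
per_environment = apps.map(&:environment).tally

puts zero_count
p per_environment
```

tally gives the same hash shape as the group(:environment).count query, only computed in Ruby.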
