How do I put a Ruby function into a SQLite3 query? - ruby-on-rails

I have a function which I need to put into a SQLite3 query.
I have the following method:
def levenshtein(a, b)
  case
  when a.empty? then b.length
  when b.empty? then a.length
  else [(a[0] == b[0] ? 0 : 1) + levenshtein(a[1..-1], b[1..-1]),
        1 + levenshtein(a[1..-1], b),
        1 + levenshtein(a, b[1..-1])].min
  end
end
and I want to do a query that looks something like this:
@results = The_db.where('levenshtein("name", ?) < 3', userinput)
I want to find the values of name in The_db where the edit distance between the value of the name column and the user input is less than 3. The problem is that I don't know how to use a Ruby function in a query. Is this even possible?

Have a look at the create_function method. You can use it to create a custom function like this (where you have already defined your levenshtein Ruby method):
require 'sqlite3'
db = SQLite3::Database.new ':memory:'
db.create_function "levenshtein", 2 do |func, a, b|
  func.result = levenshtein(a, b)
end
You can then use it in your SQL:
# first set up some data
db.execute 'create table names (name varchar(30))'
%w{Sam Ian Matt John Albert}.each do |n|
  db.execute 'insert into names values (?)', n
end
# Then use the custom function
puts db.execute('select * from names where levenshtein(name, ?) < 3', 'Jan')
In this example the output is
Sam
Ian
John
I haven’t tested this in Rails, but I think it should work, since the query string is just passed through to the database. You can get the SQLite db object using ActiveRecord::Base.connection.raw_connection. (I don’t know how to configure Rails so that all ActiveRecord connections have the function defined, however. It does seem to work if you add the function in the controller, but that isn’t ideal.)
Having shown how this can be done, I’m not sure if it should be done in a web app. You probably don’t want to use Sqlite in production. Perhaps it could be useful if you’re using e.g. Postgres with its own levenshtein function in production (as suggested in the Tin Man’s answer), but want to use Sqlite in development.
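One practical caveat: the recursive levenshtein from the question takes exponential time, and SQLite would call it once per row. Below is a minimal sketch of an iterative version that could be registered with create_function instead. The name levenshtein_dp is made up for illustration and is not part of any library.

```ruby
# Hypothetical replacement for the recursive levenshtein in the question.
# Keeps only two rows of the edit-distance table at a time.
def levenshtein_dp(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] = distance between the first i-1 chars of a and the first j chars of b
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Same filter as the SQL example, done in plain Ruby
names = %w[Sam Ian Matt John Albert]
close = names.select { |n| levenshtein_dp(n, "Jan") < 3 }
```

With the sample names this keeps the same three values as the SQL query above.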

If you need to retrieve the value of a column to insert it into a Ruby function, then you have to retrieve that value first.
The DBM can't call your method in your running code; you have to have the value, pass it to your method, then use the result in a secondary query.
Or use a DBM that has a Levenshtein function built in, like PostgreSQL (via its fuzzystrmatch extension), or define it in pure SQL.

Related

convert my string to comma based elements

I am working on a legacy Rails project that relies on Ruby version 1.8
I have a string that looks like this:
my_str = "a,b,c"
I would like to convert it to
value_list = "('a','b','c')"
so that I can directly use it in my SQL statement like:
"SELECT * from my_table WHERE value IN #{value_list}"
I tried:
my_str.split(",")
but it returns "abc" :(
How to convert it to what I need?
To split the string you can just do
my_str.split(",")
=> ["a", "b", "c"]
The easiest way to use that in a query, is using where as follows:
Post.where(value: my_str.split(","))
This will just work as expected. But, I understand you want to be able to build the SQL-string yourself, so then you need to do something like
quoted_values_str = my_str.split(",").map{|x| "'#{x}'"}.join(",")
=> "'a','b','c'"
sql = "SELECT * from my_table WHERE value IN (#{quoted_values_str})"
Note that this is a naive approach: normally you should also escape any quotes contained in your strings, and skipping that leaves you vulnerable to SQL injection. Using where will handle all those edge cases correctly for you.
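To make the quoting caveat concrete, here is a rough sketch of what escaping would look like if you really had to build the list by hand. The helper names quote_sql_string and sql_in_list are invented for this example. It assumes standard SQL quoting, where a single quote is escaped by doubling it, and some databases have additional escape rules (such as backslashes), so the adapter's own quoting is still the safer choice.

```ruby
# Hypothetical helper: wraps a value in single quotes and doubles any
# embedded single quotes (standard SQL escaping; an assumption here).
def quote_sql_string(value)
  "'" + value.gsub("'", "''") + "'"
end

# Hypothetical helper: turns "a,b,c" into "('a','b','c')"
def sql_in_list(csv)
  "(" + csv.split(",").map { |v| quote_sql_string(v) }.join(",") + ")"
end

value_list = sql_in_list("a,b,c")
tricky     = sql_in_list("O'Brien,x")
```

Here value_list is the string the question asked for, and tricky shows the embedded quote being doubled instead of terminating the literal.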
Under no circumstances should you reinvent the wheel for this. Rails has built-in methods for constructing SQL strings, and you should use them. In this case, you want sanitize_sql_for_conditions (aliased to sanitize_sql):
my_str = "a,b,c"
conditions = sanitize_sql(["value IN (?)", my_str.split(",")])
# => value IN ('a','b','c')
query = "SELECT * from my_table WHERE #{conditions}"
This will give you the result you want while also protecting you from SQL injection attacks (and other errors related to badly formed SQL).
The correct usage may depend what version of Rails you're using, but this method exists as far back as Rails 2.0 so it will definitely work even with a legacy app; just consult the docs for the version of Rails you're using.
value_list = "('#{my_str.split(",").join("','")}')"
But this is a very bad way to query. You had better use:
Model.where(value: my_str.split(","))
The string can be manipulated directly; there is no need to convert it to an array, modify the array then join the elements.
str = "a,b,c"
"(%s)" % str.gsub(/([^,]+)/, "'\\1'")
#=> "('a','b','c')"
The regular expression reads, "match one or more characters other than commas and save them to capture group 1". \\1 retrieves the contents of capture group 1 when gsub forms the replacement string.
A couple of use cases:
def full_name
  [last_name, first_name].join(' ')
end
or
def address_line
  [address[:country], address[:city], address[:street], address[:zip]].join(', ')
end

How to pass array as bind variable to Rails/ActiveRecord raw SQL queries?

I need to pass an array of ids into my raw sql query like this:
select offers.* from offers where id in (1,2,3,4,5)
The real query includes a lot of joins and aggregation functions and can't be written using Arel expressions or ActiveRecord model methods like Offer.where(id: [...]). I'm looking exactly for how to use bind variables in raw queries.
Instead of interpolating ids into string I want to use bind variables like this (pseudo-code):
ActiveRecord::Base.connection.select_all("select offers.* from offers where id in (:ids)", {ids: [1,2,3,4,5]})
However, I can't find any solution to perform this. From this ticket I've got a comment with related test-case in ActiveRecord code with the following example:
sub = Arel::Nodes::BindParam.new
binds = [Relation::QueryAttribute.new("id", 1, Type::Value.new)]
sql = "select * from topics where id = #{sub.to_sql}"
@connection.exec_query(sql, "SQL", binds)
I've tried this approach, but it didn't work at all: my "?" was not replaced by actual values.
I'm using Rails 5.1.6 and MariaDB database.
You could do this in a much simpler fashion purely with arel. (Also it makes the code far more maintainable than SQL strings)
offers = Arel::Table.new('offers')
ids = [1,2,3,4,5]
query = offers.project(Arel.star).where(offers[:id].in(ids))
ActiveRecord::Base.connection.exec_query(query.to_sql)
This will result in the following SQL
SELECT
[offers].*
FROM
[offers]
WHERE
[offers].[id] IN (1,2,3,4,5)
When executed, you will receive an ActiveRecord::Result object, which is usually easiest to deal with by calling to_hash; each resulting row will be turned into a Hash of {column_name => value}.
However, if you are using Rails and Offer is a true model, then:
Offer.where(id: ids)
will result in the same query and will return an ActiveRecord::Relation collection of Offer objects, which is generally preferable.
Update
Seems like you need to enable prepared_statements in mysql2 (mariadb) in order to use the bind params, which can be done like this:
default: &default
  adapter: mysql2
  encoding: utf8
  prepared_statements: true # <- here we go!
Please note the following pieces of code:
https://github.com/rails/rails/blob/5-1-stable/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L115
https://github.com/rails/rails/blob/5-1-stable/activerecord/lib/active_record/connection_adapters/mysql2_adapter.rb#L40
https://github.com/rails/rails/blob/5-1-stable/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L630
https://github.com/rails/rails/blob/5-1-stable/activerecord/lib/active_record/connection_adapters/mysql/database_statements.rb#L30
As you can see in the last link, exec_query ignores bind_params if prepared_statements is turned off (which appears to be the default for the mysql2 adapter).
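One more detail about arrays: a positional bind parameter stands for a single value, so an IN list needs one "?" per element. The sketch below covers only the SQL-string side of that. The helper expand_ids_placeholder and the :ids marker handling are made up for illustration, and how the binds are then handed to exec_query depends on your Rails version and adapter.

```ruby
# Hypothetical helper: replaces the :ids marker with one "?" per element,
# so each id can be bound as its own parameter.
def expand_ids_placeholder(sql, ids)
  raise ArgumentError, "ids must not be empty" if ids.empty?

  placeholders = Array.new(ids.length, "?").join(", ")
  sql.sub(":ids", placeholders)
end

ids = [1, 2, 3, 4, 5]
sql = expand_ids_placeholder("select offers.* from offers where id in (:ids)", ids)
```

The resulting string has five placeholders, and the ids array supplies the matching five bind values in order.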

How to get weighted average grouped by a column

I have a model Company that has columns pbr, market_cap and category.
To get averages of pbr grouped by category, I can use group method.
Company.group(:category).average(:pbr)
But there is no method for weighted average.
To get weighted averages I need to run this SQL code.
select case when sum(market_cap) = 0 then 0 else sum(pbr * market_cap) / sum(market_cap) end as weighted_average_pbr, category AS category FROM "companies" GROUP BY "companies"."category";
In psql this query works fine. But I don't know how to use it from Rails.
sql = %q(select case when sum(market_cap) = 0 then 0 else sum(pbr * market_cap) / sum(market_cap) end as weighted_average_pbr, category AS category FROM "companies" GROUP BY "companies"."category";)
ActiveRecord::Base.connection.select_all(sql)
returns an error:
output error: #<NoMethodError: undefined method `keys' for #<Array:0x007ff441efa618>>
It would be best if I can extend Rails method so that I can use
Company.group(:category).weighted_average(:pbr)
But I heard that extending Rails query methods is a bit tricky, so for now I just want to know how to run the SQL and get the result from Rails.
Does anyone know how to do it?
Version
rails: 4.2.1
What version of Rails are you using? I don't get that error with Rails 4.2. In Rails 3.2 select_all used to return an Array, and in 4.2 it returns an ActiveRecord::Result. But in either case, it is correct that there is no keys method. Instead you need to call keys on each element of the Array or Result. It sounds like the problem isn't from running the query, but from what you're doing afterward.
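To illustrate calling keys on each element instead of on the whole result, here is a small sketch. The rows array is a hand-written stand-in for what select_all would yield row by row (each row as a Hash keyed by column name), and the values are invented sample data.

```ruby
# Stand-in for the rows of a query result; values are made up.
rows = [
  { "weighted_average_pbr" => 1.5, "category" => "tech" },
  { "weighted_average_pbr" => 0.8, "category" => "retail" }
]

# keys is available on each row, not on the collection itself
column_names = rows.first.keys

# Build a category => weighted average lookup from the rows
by_category = rows.each_with_object({}) do |row, acc|
  acc[row["category"]] = row["weighted_average_pbr"]
end
```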
In any case, to get the more fluent approach you've described, you could do this:
class Company < ActiveRecord::Base
  scope :weighted_average, lambda { |col|
    select("companies.category").
      select(<<-EOQ)
        (CASE WHEN SUM(market_cap) = 0 THEN 0
              ELSE SUM(#{col} * market_cap) / SUM(market_cap)
         END) AS weighted_average_#{col}
      EOQ
  }
end
This will let you say Company.group(:category).weighted_average(:pbr), and you will get a collection of Company instances. Each one will have an extra weighted_average_pbr attribute, so you can do this:
Company.group(:category).weighted_average(:pbr).each do |c|
  puts c.weighted_average_pbr
end
These instances will not have their normal attributes, but they will have category. That is because they do not represent individual Companies, but groups of companies with the same category. If you want to group by something else, you could parameterize the lambda to take the grouping column. In that case you might as well move the group call into the lambda too.
Now be warned that the parameter to weighted_average goes straight into your SQL query without escaping, since it is a column name. So make sure you don't pass user input to that method, or you'll have a SQL injection vulnerability. In fact I would probably put a guard inside the lambda, something like raise "NOPE" unless col =~ %r{\A[a-zA-Z0-9_]+\Z}.
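The guard mentioned above could look something like this. It is only a sketch: safe_column_name is a hypothetical helper name, and it uses \z rather than \Z so that a trailing newline is rejected as well.

```ruby
# Hypothetical guard: accepts only plain identifiers (letters, digits,
# underscores) and raises on anything else before it reaches the SQL.
def safe_column_name(col)
  name = col.to_s
  unless name =~ /\A[a-zA-Z0-9_]+\z/
    raise ArgumentError, "invalid column name: #{name.inspect}"
  end
  name
end

ok = safe_column_name(:pbr)

rejected = begin
  safe_column_name("pbr); DROP TABLE companies; --")
  false
rescue ArgumentError
  true
end
```

Inside the lambda you would call the guard on col first and interpolate only the returned name.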
The more general lesson is that you can use select to include extra SQL expressions, and have Rails magically treat those as attributes on the instances returned from the query.
Also note that unlike with select_all where you get a bunch of hashes, with this approach you get a bunch of Company instances. So again there is no keys method! :-)

update_all with a method

Let's say I have a model:
class Result < ActiveRecord::Base
attr_accessible :x, :y, :sum
end
Instead of doing
Result.all.find_each do |s|
  s.sum = compute_sum(s.x, s.y)
  s.save
end
assuming compute_sum is an available method that does some computation which cannot be translated into SQL.
def compute_sum(x,y)
sum_table[x][y]
end
Is there a way to use update_all, probably something like:
Result.all.update_all(sum: compute_sum(:x, :y))
I have more than 80,000 records to update. Each record in find_each creates its own BEGIN and COMMIT queries, and each record is updated individually.
Or is there any other faster way to do this?
If the compute_sum function can't be translated into SQL, then you cannot do update_all on all records at once. You will need to iterate over the individual instances. However, you could speed it up if there are a lot of repeated sets of values in the columns, by only doing the calculation once per set of inputs, and then doing one mass-update per calculation, e.g.
Result.all.group_by { |result| [result.x, result.y] }.each do |inputs, results|
  sum = compute_sum(*inputs)
  # one UPDATE per distinct (x, y) pair instead of one per record
  Result.where(id: results.map(&:id)).update_all(sum: sum)
end
You can replace result.x, result.y with the actual inputs to the compute_sum function.
EDIT - forgot to put the square brackets around result.x, result.y in the group_by block.
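The grouping step can be tried without a database. In this sketch Row is a Struct standing in for Result records, and the rows and sum_table contents are invented sample data; each entry of updates corresponds to one mass-update.

```ruby
# Stand-in for Result records; ids and values are made up.
Row = Struct.new(:id, :x, :y)
rows = [Row.new(1, 0, 1), Row.new(2, 0, 1), Row.new(3, 1, 1)]

# Sample lookup table, as in the question's compute_sum
sum_table = [[0, 1], [1, 2]]

# Compute once per distinct (x, y) pair and collect the ids to update
updates = rows.group_by { |r| [r.x, r.y] }.map do |(x, y), group|
  [sum_table[x][y], group.map(&:id)]
end
```

Three rows collapse into two updates here, because rows 1 and 2 share the same inputs.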
update_all makes an sql query, so any processing you do on the values needs to be in sql. So, you'll need to find the sql expression, in whichever DBMS you're using, to add two numbers together. In Postgres, for example, I believe you would pass the assignment as an SQL string (a hash value such as sum: "x + y" would be stored as a literal string rather than evaluated):
Sum.update_all("sum = x + y")
which will generate this sql:
update sums set sum = x + y;
which will calculate the x + y value for each row, and set the sum field to the result.
EDIT - for MariaDB. I've never used this, but the plain arithmetic expression should work there too. SUM() is an aggregate function and is rejected in the SET clause of an UPDATE, so the sql would again be
update sums set sum = x + y;
Try this first, in your sql console, for a single record. If it works, then you can do
Sum.update_all("sum = x + y")
in Rails.
EDIT2: there are a lot of things called sum here, which makes the example quite confusing. Here's a more generic example.
Set col_c to the result of adding col_a and col_b together, in class Foo:
Foo.update_all("col_c = col_a + col_b")
I just noticed that i'd copied the (incorrect) Sum.all.update_all from your question. It should just be Sum.update_all - i've updated my answer.
I'm a complete beginner, just wondering: why not add an instance method like below? Without adding a separate column in the db, you can still call sum on a record from outside.
def sum
  x + y
end

comparing one attribute to another with ransack

Ransack allows me to build conditions with an attribute, a predicate and a value. I haven't been able to find any documentation on how to compare one attribute to another however. For instance, how could I create a condition for:
WHERE column_a < column_b
I've been using Ransack for quite a while, but I don't see any possibility to do what you are looking for. What you want is a "case -> when" statement, which can be produced in Rails or as SQL with ActiveRecord.
Ransack gives you the ability to create a custom SQL command, by defining attribute, predicate and value, which then translates into WHERE Statement you already mentioned. I don't see any possibility to tell Ransack directly to filter for what you want. However:
What you could do is create a scope like:
scope :column_b_gt_columnb_a, -> { where('column_b > column_a') }
And then you can build your search like this:
Object({ column_b_gt_columnb_a: true })
Probably not really what you were looking for, but I think that's the best you're going to get...
And if you want to do it in Ruby, you could use a case statement to compare values, or use the where statement I used above.
Records.each do |i|
  # a case without a subject evaluates each condition as a boolean
  case
  when i.variable_a == i.variable_b
    # do something when it's the same
  when i.variable_a > i.variable_b
    # do something when it's greater
  end
end
end
For an example of an SQL statement look here
How do I compare two columns for equality in SQL Server?
Hope this helps a bit!
