Case-insensitive search in Rails model - ruby-on-rails

My product model contains some items
Product.first
=> #<Product id: 10, name: "Blue jeans" >
I'm now importing some product parameters from another dataset, but there are inconsistencies in the spelling of the names. For instance, in the other dataset, Blue jeans could be spelled Blue Jeans.
I wanted to use Product.find_or_create_by_name("Blue Jeans"), but this will create a new product, almost identical to the first. What are my options if I want to find and compare the lowercased name?
Performance is not really important here: there are only 100-200 products, and I want to run this as a migration that imports the data.
Any ideas?

You'll probably have to be more verbose here
name = "Blue Jeans"
model = Product.where('lower(name) = ?', name.downcase).first
model ||= Product.create(:name => name)
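To make the lookup semantics concrete, here is a minimal plain-Ruby sketch of the same find-or-create logic, with an in-memory array standing in for the products table. InMemoryProduct, PRODUCTS and find_or_create_ci are hypothetical names used only for illustration; they are not part of ActiveRecord.

```ruby
# Plain-Ruby stand-in for the products table (hypothetical, no database involved).
InMemoryProduct = Struct.new(:id, :name)

PRODUCTS = [InMemoryProduct.new(10, "Blue jeans")]

# Find a product whose name matches ignoring case; create one if none exists.
def find_or_create_ci(products, name)
  wanted = name.downcase
  found = products.find { |p| p.name.downcase == wanted }
  return found if found

  next_id = (products.map(&:id).max || 0) + 1
  created = InMemoryProduct.new(next_id, name)
  products << created
  created
end

existing = find_or_create_ci(PRODUCTS, "Blue Jeans") # reuses the stored "Blue jeans"
created  = find_or_create_ci(PRODUCTS, "Red shirt")  # no match, so a record is added
```

The stored spelling wins: the existing record keeps its original name, and only a genuinely new name adds a row.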

This is a complete setup in Rails, for my own reference. I'm happy if it helps you too.
the query:
Product.where("lower(name) = ?", name.downcase).first
the validator:
validates :name, presence: true, uniqueness: {case_sensitive: false}
the index (answer from Case-insensitive unique index in Rails/ActiveRecord?):
execute "CREATE UNIQUE INDEX index_products_on_lower_name ON products USING btree (lower(name));"
I wish there were a more beautiful way to do the first and the last, but then again, Rails and ActiveRecord are open source, so we shouldn't complain; we can implement it ourselves and send a pull request.

If you are using Postgres and Rails 4+, then you have the option of using column type CITEXT, which will allow case-insensitive queries without having to write out the query logic.
The migration:
def change
enable_extension :citext
change_column :products, :name, :citext
add_index :products, :name, unique: true # If you want to index the product names
end
And to test it out you should expect the following:
Product.create! name: 'jOgGers'
=> #<Product id: 1, name: "jOgGers">
Product.find_by(name: 'joggers')
=> #<Product id: 1, name: "jOgGers">
Product.find_by(name: 'JOGGERS')
=> #<Product id: 1, name: "jOgGers">

You might want to use the following:
validates_uniqueness_of :name, :case_sensitive => false
Please note that the documented default is :case_sensitive => true, so you do need to write this option to get a case-insensitive uniqueness check.
Find more at:
http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

Several comments refer to Arel, without providing an example.
Here is an Arel example of a case-insensitive search:
Product.where(Product.arel_table[:name].matches('Blue Jeans'))
The advantage of this type of solution is that it is database-agnostic - it will use the correct SQL commands for your current adapter (matches will use ILIKE for Postgres, and LIKE for everything else).
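One caveat with matches (and with LIKE/ILIKE in general) is that % and _ in the search value act as wildcards, so a product name containing them would match more than intended. The sketch below shows one way to escape them before building the pattern. escape_like is a hypothetical helper written for this example; newer Rails versions ship a similar sanitize_sql_like method, which is probably the better choice when available.

```ruby
# Hypothetical helper: escape LIKE wildcards so the value is matched literally.
# Assumes backslash is the escape character, which is the default in
# PostgreSQL and MySQL (other databases may need an explicit ESCAPE clause).
def escape_like(value)
  value.gsub(/[\\%_]/) { |ch| "\\" + ch }
end

plain   = escape_like("Blue Jeans")   # no wildcards, returned unchanged
escaped = escape_like("100%_cotton")  # % and _ each get a leading backslash
```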

In postgres:
user = User.find(:first, :conditions => ['username ~* ?', "regedarek"])

Quoting from the SQLite documentation:
Any other character matches itself or
its lower/upper case equivalent (i.e.
case-insensitive matching)
...which I didn't know. But it works:
sqlite> create table products (name string);
sqlite> insert into products values ("Blue jeans");
sqlite> select * from products where name = 'Blue Jeans';
sqlite> select * from products where name like 'Blue Jeans';
Blue jeans
So you could do something like this:
name = 'Blue jeans'
if prod = Product.find(:first, :conditions => ['name LIKE ?', name])
# update product or whatever
else
prod = Product.create(:name => name)
end
Not #find_or_create, I know, and it may not be very cross-database friendly, but worth looking at?

Similar to Andrew's answer, which is #1:
Something that worked for me is:
name = "Blue Jeans"
Product.find_by("lower(name) = ?", name.downcase)
This eliminates the need to do a #where and #first in the same query. Hope this helps!

Another approach that no one has mentioned is to add case insensitive finders into ActiveRecord::Base. Details can be found here. The advantage of this approach is that you don't have to modify every model, and you don't have to add the lower() clause to all your case insensitive queries, you just use a different finder method instead.

Upper and lower case ASCII letters differ only by a single bit. The most efficient way to compare them is to ignore this bit rather than convert to lower or upper case. In the database this is usually done with a case-insensitive collation: see the COLLATION keyword for MSSQL, NLS_SORT=BINARY_CI if using Oracle, etc.
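The single-bit observation can be checked in plain Ruby. This is only a sketch of the idea for ASCII letters (the bit in question is 0x20); it does not hold in general for non-ASCII text, and ascii_case_equal? is a hypothetical helper, not something a database exposes.

```ruby
# For ASCII letters, upper and lower case differ only in bit 0x20.
CASE_BIT = 0x20

# True when the byte is an ASCII letter (A-Z or a-z).
def ascii_letter?(byte)
  (byte | CASE_BIT).between?(97, 122)
end

# Compare two strings byte by byte, ignoring the case bit on ASCII letters only.
def ascii_case_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).all? do |x, y|
    x == y || (ascii_letter?(x) && ascii_letter?(y) && (x ^ y) == CASE_BIT)
  end
end

bit_difference = "a".ord ^ "A".ord
same = ascii_case_equal?("Blue Jeans", "blue jeans")
```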

Find_or_create is now deprecated, you should use an AR Relation instead plus first_or_create, like so:
TombolaEntry.where("lower(name) = ?", self.name.downcase).first_or_create(name: self.name)
This will return the first matched object, or create one for you if none exists.

An alternative can be
c = Product.find_by("LOWER(name)= ?", name.downcase)

Case-insensitive searching comes built-in with Rails. It accounts for differences in database implementations. Use either the built-in Arel library, or a gem like Squeel.

There are lots of great answers here, particularly @oma's. But one other thing you could try is to use custom column serialization. If you don't mind everything being stored lowercase in your db then you could create:
# lib/serializers/downcasing_string_serializer.rb
module Serializers
class DowncasingStringSerializer
def self.load(value)
value
end
def self.dump(value)
value.downcase
end
end
end
Then in your model:
# app/models/my_model.rb
serialize :name, Serializers::DowncasingStringSerializer
validates_uniqueness_of :name, :case_sensitive => false
The benefit of this approach is that you can still use all the regular finders (including find_or_create_by) without using custom scopes, functions, or having lower(name) = ? in your queries.
The downside is that you lose casing information in the database.
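The trade-off is easy to see by calling the serializer directly, outside of any model. The sketch below repeats the class so that it runs on its own; the nil guard in dump is an addition for this example and is not part of the original answer.

```ruby
module Serializers
  class DowncasingStringSerializer
    # Values come back from the database exactly as stored.
    def self.load(value)
      value
    end

    # Values are lowercased on the way in (nil guard added for this sketch).
    def self.dump(value)
      value.nil? ? nil : value.downcase
    end
  end
end

stored = Serializers::DowncasingStringSerializer.dump("Blue Jeans")
loaded = Serializers::DowncasingStringSerializer.load(stored)
```

After the round trip, loaded is "blue jeans": the original capitalization cannot be recovered from what was stored.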

You can also use scopes like this below and put them in a concern and include in models you may need them:
scope :ci_find, lambda { |column, value| where("lower(#{column}) = ?", value.downcase).first }
Then use like this:
Model.ci_find('column', 'value')

If you're using postgres (probably others), I like this solution.
Product.find_by("name ilike 'bLue JEaNS'")
I like this better for a couple reasons.
Clearer connection to database action -> you can just copy paste that into where ...
If you choose to add a wildcard %, it's straightforward.

Assuming that you use mysql, you could use fields that are not case sensitive: http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

user = Product.where(email: /^#{email}$/i).first

Some people show using LIKE or ILIKE, but those allow wildcard pattern matching (% and _), not just an exact comparison. Also, you don't need to downcase in Ruby; you can let the database do it for you, which I think may be faster. Also, first_or_create can be used after where.
# app/models/product.rb
class Product < ActiveRecord::Base
# case insensitive name
def self.ci_name(text)
where("lower(name) = lower(?)", text)
end
end
# first_or_create can be used after a where clause
Product.ci_name("Blue Jeans").first_or_create
# Product Load (1.2ms) SELECT "products".* FROM "products" WHERE (lower(name) = lower('Blue Jeans')) ORDER BY "products"."id" ASC LIMIT 1
# => #<Product id: 1, name: "Blue jeans", created_at: "2016-03-27 01:41:45", updated_at: "2016-03-27 01:41:45">

You can use like this in model
scope :matching, lambda { |search, *cols|
where cols.flatten.map{|col| User.arel_table[col].matches("%#{search}%") }.inject(:or)
}
and use wherever you like this
User.matching(params[:search], :mobile_number, :name, :email)
You can pass multiple column for search
for single column search you can use like this
User.where(User.arel_table[:column].matches("%#{search}%"))

So far, I made a solution using Ruby. Place this inside the Product model:
#return first of matching products (id only to minimize memory consumption)
def self.custom_find_by_name(product_name)
@@product_names ||= Product.all(:select=>'id, name')
@@product_names.select{|p| p.name.downcase == product_name.downcase}.first
end
#remember a way to flush finder cache in case you run this from console
def self.flush_custom_finder_cache!
@@product_names = nil
end
This will give me the first product where names match. Or nil.
>> Product.create(:name => "Blue jeans")
=> #<Product id: 303, name: "Blue jeans">
>> Product.custom_find_by_name("Blue Jeans")
=> nil
>> Product.flush_custom_finder_cache!
=> nil
>> Product.custom_find_by_name("Blue Jeans")
=> #<Product id: 303, name: "Blue jeans">
>>
>> #SUCCESS! I found you :)

Related

Rails-y way to query a model with a belongs_to association

I have two models:
class Wine
belongs_to :region
end
class Region
has_many :wines
end
I am attempting to use the #where method with a hash built from transforming certain elements from the params hash into a query hash, for example { :region => '2452' }
def index
...
@wines = Wine.where(hash)
...
end
But all I get is a column doesn't exist error when the query is executed:
ActiveRecord::StatementInvalid: PGError: ERROR: column wines.region does not exist
LINE 1: SELECT "wines".* FROM "wines" WHERE "wines"."region" =...
Of course, the table wines has region_id so if I queried for region_id instead I would not get an error.
The question is the following:
Is there a rails-y way to query the Wine object for specific regions using the id in the #where method? I've listed some options below based on what I know I can do.
Option 1:
I could change the way that I build the query hash so that each field has _id (like { :region_id => '1234', :varietal_id => '1515' } but not all of the associations from Wine are belongs_to and thus don't have an entry in wines for _id, making the logic more complicated with joins and what not.
Option 2:
Build a SQL where clause, again using some logic to determine whether to use the id or join against another table... again the logic would be somewhat more complicated, and delving in to SQL makes it feel less rails-y. Or I could be wrong on that front.
Option(s) 3..n:
Things I haven't thought about... your input goes here :)
You could set up a scope in the Wine model to make it more rails-y ...
class Wine < ActiveRecord::Base
belongs_to :region
attr_accessible :name, :region_id
scope :from_region, lambda { |region|
joins(:region).where(:region_id => region.id)
}
end
So then you can do something like:
region = Region.find_by_name('France')
wine = Wine.from_region(region)
Edit 1:
or if you want to be really fancy you could do a scope for multiple regions:
scope :from_regions, lambda { |regions|
joins(:region).where("region_id in (?)", regions.select(:id))
}
regions = Region.where("name in (?)", ['France','Spain']) # or however you want to select them
wines = Wine.from_regions(regions)
Edit 2:
You can also chain scopes and where clauses, if required:
regions = Region.where("name in (?)", ['France','Spain'])
wines = Wine.from_regions(regions).where(:varietal_id => '1515')
Thanks to all who replied. The answers I got would be great for single condition queries but I needed something that could deal with a varying number of conditions.
I ended up implementing my option #1, which was to build a condition hash by iterating through and concatenating _id to the values:
def query_conditions_hash(conditions)
conditions.inject({}) do |hash, (k,v)|
k = (k.to_s + "_id").to_sym
hash[k] = v.to_i
hash
end
end
So that the method would take a hash that was built from params like this:
{ region => '1235', varietal => '1551', product_attribute => '9' }
and drop an _id onto the end of each key and change the value to an integer:
{ region_id => 1235, varietal_id => 1551, product_attribute_id => 9 }
We'll see how sustainable this is, but this is what I went with for now.
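For reference, here is a self-contained sketch of that key transformation, written with each_with_object instead of inject. It assumes the incoming hash uses symbol keys, which the original does not state explicitly.

```ruby
# Append "_id" to every key and coerce every value to an integer.
def query_conditions_hash(conditions)
  conditions.each_with_object({}) do |(key, value), hash|
    hash[:"#{key}_id"] = value.to_i
  end
end

# Example input shaped like the params-derived hash described above
# (symbol keys are an assumption).
params_like = { :region => '1235', :varietal => '1551', :product_attribute => '9' }
query = query_conditions_hash(params_like)
```

The resulting hash could then be passed to Wine.where, as long as every key corresponds to a belongs_to association with an _id column.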

Newbie: Rails' way to query database in my case

I am using Ruby v1.8 and Rails v2.3.
I have two model objects: Cars and Customers.
Model Cars:
class Car < ActiveRecord::Base
#car has attribute :town_code
has_many :customers
end
Model Customers:
class Customer < ActiveRecord::Base
# customer has attribute :first_name, :last_name
belongs_to :car
end
Now in my controller, I got a string from VIEW, and the received string has the format firstname.lastname#town_code, for example a string like "John.smith#ac01" which can be parsed as first_name="John", last_name="smith" and town_code="ac01"
Now I would like use the Rails's way to query the database to find all the customer objects (match the above conditions) from Customers table which has :
first_name="John",
last_name="smith"
and owned a car(by car_id) with car's town_code="ac01".
what is Rails' syntax to query this?
I know it should be something like (if I wanna count the nr of matched customer):
Customer.count :conditions =>{:first_name => "John", :last_name=>"smith"...}
But, I am not sure how to refer to a customer that has a referenced car with car's town_code= "ac01" ?
------------------ My question --------------------
I want to have two queries:
-one is used to count the number of matching customers,
-the other query returns the customers objects like find_by_ query.
What is the syntax in Ruby on Rails for the two queries?
It should be something similar to
Customer.where(:first_name => "John", :last_name => "Smith").count
If you have many Customers of Car, you can do something like
Car.where(...).customers.where(...)
You should really be firing rails c to test your queries in (I might be slightly off)
You could have something like:
@customers = Car.where(:town_code => town_code).customers.where(:first_name => first_name, :last_name => last_name)
And then just count the results:
@customer_count = @customers.count
This assuming you parsed your string into the variables town_code, first_name, and last_name, like you said.
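The parsing step itself could look roughly like the sketch below. parse_customer_string is a hypothetical helper, and the pattern assumes the format is always firstname.lastname#town_code with no dot in the first name and no # in the last name.

```ruby
# Split "firstname.lastname#town_code" into its three parts.
# Returns nil when the input does not match the expected format.
def parse_customer_string(input)
  match = /\A([^.]+)\.([^\#]+)\#(.+)\z/.match(input)
  return nil unless match

  { :first_name => match[1], :last_name => match[2], :town_code => match[3] }
end

parsed = parse_customer_string("John.smith#ac01")
first_name = parsed[:first_name]
last_name  = parsed[:last_name]
town_code  = parsed[:town_code]
```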
Edit
I don't think Rails v2.3 supports these chains of Active Record queries because I believe it lacks lazy loading from DB. I'm not completely sure. Also, I realize my first suggestion wouldn't work because there could be many cars with the same town_code. I guess you could solve it using the map function like so (not tested):
@customers = Car.all(:conditions => {:town_code => town_code}).map{ |c| c.customers.where(:first_name => first_name, :last_name => last_name) }
And then count them like before:
@customer_count = @customers.count
I believe you could do something like this:
Customer.find(:all, :include => :car, :conditions => "customers.first_name = 'John' AND customers.last_name = 'Smith' AND cars.town_code = 'ac01'")
Counting all customers with a specification can be achieved by this command:
Customer.count(:all, :include => :car, :conditions => "customers.first_name = 'John' AND customers.last_name = 'Smith' AND cars.town_code = 'ac01'")
By the way, if you are in the position to choose what you work with, I would advise you to go for Rails 3. The chaining methods described by Joseph would make this kind of query a lot easier and it'll save you upgrading issues down the road. (And you tagged the question for Rails 3)

How can I get a unique :group of a virtual attribute in rails?

I have several similar models ContactEmail, ContactLetter, etcetera.
Each one belongs_to a Contact
Each contact belongs_to a Company
So, what I did was create a virtual attribute for ContactEmail:
def company_name
contact = Contact.find_by_id(self.contact_id)
return contact.company_name
end
Question: How can I get an easy list of all company_name (without duplicates) if I have a set of ContactEmails objects (from a find(:all) method, for example)?
When I try to do a search on ContactEmail.company_name using the statistics gem, for example, I get an error saying that company_name is not a column for ContactEmail.
Assuming your ContactEmail set is in @contact_emails (untested):
@contact_emails.collect { |contact_email| contact_email.company_name }.uniq
You don't need the virtual attribute for this purpose though. ActiveRecord sets up the relationship automatically based on the foreign key, so you could take the company_name method out of the ContactEmail model and do:
@contact_emails.collect { |contact_email| contact_email.contact.company_name }.uniq
Performance could be a consideration for large sets, so you might need to use a more sophisticated SQL query if that's an issue.
EDIT to answer your 2nd question
If company_name is a column, you can do:
ContactEmail.count(:all, :joins => :contact, :group => 'contacts.company_name')
On a virtual attribute I think you'd have to retrieve the whole set and use Ruby (untested):
ContactEmail.find(:all, :joins => :contact, :select => 'contacts.company_name').group_by(&:company_name).inject({}) {|hash,result_set| hash.merge(result_set.first=>result_set.last.count)}
but that's not very kind to the next person assigned to maintain your system -- so you're better off working out the query syntax for the .count version and referring to the column itself.
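The in-Ruby versions of both operations (unique names and per-company counts) can be tried without a database. In this sketch, FakeContact and FakeContactEmail are hypothetical Structs standing in for the real models.

```ruby
# Hypothetical stand-ins for the Contact and ContactEmail models.
FakeContact = Struct.new(:company_name)
FakeContactEmail = Struct.new(:contact)

emails = [
  FakeContactEmail.new(FakeContact.new("Acme")),
  FakeContactEmail.new(FakeContact.new("Globex")),
  FakeContactEmail.new(FakeContact.new("Acme"))
]

# Unique company names, in first-seen order.
unique_names = emails.map { |e| e.contact.company_name }.uniq

# Number of emails per company name.
counts = emails.group_by { |e| e.contact.company_name }
               .map { |name, group| [name, group.size] }
               .to_h
```

Both lines load every record into memory, so for large sets the SQL :group version remains the better option.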

Updating a large record set in Rails

I need to update a single field across a large set of records. Normally, I would just run a quick SQL update statement from the console and be done with it, but this is a utility that end users need to be able to run in this app.
So, here's my code:
users = User.find(:all, :select => 'id, flag')
users.each do |u|
u.flag = false
u.save
end
I'm afraid this is just going to take a while as the number of users increases (currently sitting at around 35k, adding 2-5k a week). Is there a faster way to do this?
Thanks!
If you really want to update all records, the easiest way is to use #update_all:
User.update_all(:flag => false)
This is the equivalent of:
UPDATE users SET flag = 'f'
(The exact SQL will be different depending on your adapter)
The #update_all method also accepts conditions:
User.update_all({:flag => false}, {:created_on => 3.weeks.ago .. 5.hours.ago})
Also, #update_all can be combined with named scopes:
class User < ActiveRecord::Base
named_scope :inactive, lambda {{:conditions => {:last_login_at => 2.years.ago .. 2.weeks.ago}}}
end
User.inactive.update_all(:flag => false)
You could use ActiveRecord's execute method to execute the update SQL. Something like this:
ActiveRecord::Base.connection.execute('UPDATE users SET flag=0')

How to coerce type of ActiveRecord attribute returned by :select phrase on joined table?

Having trouble with AR 2.3.5, e.g.:
users = User.all( :select => "u.id, c.user_id", :from => "users u, connections c",
:conditions => ... )
Returns, e.g.:
=> [#<User id: 1000>]
>> users.first.attributes
=> {"id"=>1000, "user_id"=>"1000"}
Note that AR returns the id of the model searched as numeric but the selected user_id of the joined model as a String, although both are int(11) in the database schema.
How could I better form this type of query to select columns of tables backing multiple models and retrieving their natural type rather than String ? Seems like AR is punting on this somewhere. How could I coerce the returned types at AR load time and not have to tack .to_i (etc.) onto every post-hoc access?
Unfortunately, it's not going to happen very easily. All of the data from the DB connection comes to Rails as strings; the conversion of types happens in each of the dynamic attribute methods that Rails creates at runtime. It knows which attributes to convert to which type from the table's column-type metadata, which it retrieves when the app starts. Each model only has column metadata for its own columns, which is why its own columns end up with the correct type. There is no easy way to auto-convert to the correct types.
You could on the other hand, create a simple conversion method that would take a Hash and automatically convert the attributes.
Something like this:
users = User.all(:select => "cl, comments.c2", ...)
users = convert_columns(users, 'c2' => :integer, 'other_column' => :date)
def convert_columns(records, columns = {})
records.each do |rec|
columns.each do |col, type|
rec[col] = case type
when :integer then rec[col].to_i
when :date then ........
....
end
end
end
end
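A runnable variant of this idea, operating on plain hashes instead of ActiveRecord objects, might look like the sketch below. The joined_on column and the choice of Date.parse for dates are assumptions made for the example; the original leaves the date branch unspecified.

```ruby
require 'date'

# Convert selected string columns in each row to the requested type.
# Unknown types and missing columns are left untouched.
def convert_columns(records, columns = {})
  records.each do |rec|
    columns.each do |col, type|
      next unless rec.key?(col)

      rec[col] = case type
                 when :integer then rec[col].to_i
                 when :date    then Date.parse(rec[col])
                 else rec[col]
                 end
    end
  end
end

# Rows shaped like users.first.attributes above; "joined_on" is hypothetical.
rows = [{ "id" => 1000, "user_id" => "1000", "joined_on" => "2010-03-01" }]
convert_columns(rows, "user_id" => :integer, "joined_on" => :date)
```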
Why are you using :from => "users" inside a User.method ?
The following will do an inner join (which is what you are doing anyways)
users = User.all(:include => :connections, :select => "users.id, connections.user_id", :conditions => {...})
This is going to be very heavy query for the database.
Faster query would be with the outer join though.
This will also return the keys as INT not STRING
A much faster alternative was
Connection.all(:include => :user, :conditions => {...}).collect {|e| [e.user_id, e.id] }
This gives you an array of arrays with the ids. If you are going to select "id, user_id" columns only, then it may not necessarily be as AR object. An array can be faster.
I hope I am not missing some point here. Suggest me, if I am.
If you want quick solution - try to use after_find callback and preset correct attributes types there:
class User < ActiveRecord::Base
after_find :preset_types
private
def preset_types user
user.user_id = user.user_id.to_i
end
end
