postgres unaccent function vs RoR transliterate - ruby-on-rails

In our RoR project we use postgres unaccent function to retrieve unaccent version of one of our models name attribute. The name attribute can contain any accented characters from various languages. We then save it as unaccent_name attribute. I don't like this solution because we need to be sure to have installed and accessible postgres extension UNACCENT (when testing, moving/cleaning database, and so on).
In RoR there is ActiveSupport::Inflector.transliterate method, which should do something very similar.
I've found that it mostly translates accented characters the same way, but there is also some difference:
same result:
SELECT unaccent('ľščťžý') AS unaccent_name;
=> "lsctzy"
ActiveSupport::Inflector.transliterate('ľščťžý')
=> "lsctzy"
different result:
SELECT unaccent('ß') AS unaccent_name;
=> "S"
ActiveSupport::Inflector.transliterate('ß')
=> "ss"
I know both of these methods can accept dictionaries with custom letter replacements, but I'm only interested in their general/default usage.
Is the main purpose of transliterate method same as postgres unaccent function? Can we use it as a replacement?

Very old post but I am working through a problem similar to the OP. We want to be able to search for a name and transliterate to give better results. However, with our versions of Postgres and rails the character transliterates the same to 'ss'.
Just wanted to share my findings in case it may be useful to others who stumble across this post.
In rails 5.2:
irb(main):001:0> ActiveSupport::Inflector.transliterate('ß')
=> "ss"
In postgres 9.6 I get:
db-test=# SELECT unaccent('ß') AS unaccent_name;
unaccent_name
---------------
ss
(1 row)

Related

RubyOnRails - Sorting members by last name

I have a lot of members in database and I want them sort in Rails by last_name, but the trick is that last name contain croatian letters like (Č,Ć,Š,Đ,Ž).
Some of last_names are for example: Antić, Čekić, Živad, Đurak, Perić...
I used ffi-locale gem but i failed to sort it. So any help and advice is good!
If your database has a correct configuration, this should work:
Member.sort(last_name: :asc).all
If you want to sort the members in your application, after fetching them from the database, you could use sort function (it works fine with utf strings):
members.sort_by!(&:last_name)

How to search for wild card in Rails

Im trying to search my User model for all Users that start with any integer, I have code for individual letters and it works, but Im having trouble getting it working with a wild card. Right now I have this code:
in my view:
<%= link_to '#', users_charlist_path(:char => '[0123456789]' %>
and in my controller I have:
def charlist
#a = User.where('goal like ?', "#{params[:char]}%").to_a
end
how ever, '[0123456789]', doesnt seem to work as it does not return anythign to me even though I have users whose names begin with an integer. how do i do this?
The where method is a part of ActiveRecord which maps the objects to the database. So how you can query the database with a regex depends on which db you are using not on ruby. You need to look up the regex functions of the database your using. For mysql you can find them here:
http://dev.mysql.com/doc/refman/5.1/de/regexp.html
An alternative is to select all objects an use the select method to filter the results that match your needs. That method is documented here:
http://www.ruby-doc.org/core-2.1.0/Array.html#method-i-select
On big amounts of data I whould suggest to use the database even if that means your application isnt 100% portable between different database systems.

Postgres accent insensitive LIKE search in Rails 3.1 on Heroku

How can I modify a where/like condition on a search query in Rails:
find(:all, :conditions => ["lower(name) LIKE ?", "%#{search.downcase}%"])
so that the results are matched irrespective of accents? (eg métro = metro). Because I'm using utf8, I can't use "to_ascii". Production is running on Heroku.
Proper solution
Since PostgreSQL 9.1 you can just:
CREATE EXTENSION unaccent;
Provides a function unaccent(), doing what you need (except for lower(), just use that additionally if needed). Read the manual about this extension.
More about unaccent and indexes:
Does PostgreSQL support "accent insensitive" collations?
Poor man's solution
If you can't install unacccent, but are able to create a function. I compiled the list starting here and added to it over time. It is comprehensive, but hardly complete:
CREATE OR REPLACE FUNCTION lower_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT lower(translate($1
, '¹²³áàâãäåāăąÀÁÂÃÄÅĀĂĄÆćčç©ĆČÇĐÐèéêёëēĕėęěÈÊËЁĒĔĖĘĚ€ğĞıìíîïìĩīĭÌÍÎÏЇÌĨĪĬłŁńňñŃŇÑòóôõöōŏőøÒÓÔÕÖŌŎŐØŒř®ŘšşșߊŞȘùúûüũūŭůÙÚÛÜŨŪŬŮýÿÝŸžżźŽŻŹ'
, '123aaaaaaaaaaaaaaaaaaacccccccddeeeeeeeeeeeeeeeeeeeeggiiiiiiiiiiiiiiiiiillnnnnnnooooooooooooooooooorrrsssssssuuuuuuuuuuuuuuuuyyyyzzzzzz'
));
$func$;
Your query should work like that:
find(:all, :conditions => ["lower_unaccent(name) LIKE ?", "%#{search.downcase}%"])
For left-anchored searches, you can use an index on the function for very fast results:
CREATE INDEX tbl_name_lower_unaccent_idx
ON fest (lower_unaccent(name) text_pattern_ops);
For queries like:
SELECT * FROM tbl WHERE (lower_unaccent(name)) LIKE 'bob%';
Or use COLLATE "C". See:
PostgreSQL LIKE query performance variations
Is there a difference between text_pattern_ops and COLLATE "C"?
For those like me who are having trouble on add the unaccent extension for PostgreSQL and get it working with the Rails application, here is the migration you need to create:
class AddUnaccentExtension < ActiveRecord::Migration
def up
execute "create extension unaccent"
end
def down
execute "drop extension unaccent"
end
end
And, of course, after rake db:migrate you will be able to use the unaccent function in your queries: unaccent(column) similar to ... or unaccent(lower(column)) ...
First of all, you install postgresql-contrib. Then you connect to your DB and execute:
CREATE EXTENSION unaccent;
to enable the extension for your DB.
Depending on your language, you might need to create a new rule file (in my case greek.rules, located in /usr/share/postgresql/9.1/tsearch_data), or just append to the existing unaccent.rules (quite straightforward).
In case you create your own .rules file, you need to make it default:
ALTER TEXT SEARCH DICTIONARY unaccent (RULES='greek');
This change is persistent, so you need not redo it.
The next step would be to add a method to a model to make use of this function.
One simple solution would be defining a function in the model. For instance:
class Model < ActiveRecord::Base
[...]
def self.unaccent(column,value)
a=self.where('unaccent(?) LIKE ?', column, "%value%")
a
end
[...]
end
Then, I can simply invoke:
Model.unaccent("name","text")
Invoking the same command without the model definition would be as plain as:
Model.where('unaccent(name) LIKE ?', "%text%"
Note: The above example has been tested and works for postgres9.1, Rails 4.0, Ruby 2.0.
UPDATE INFO
Fixed potential SQLi backdoor thanks to #Henrik N's feedback
There are 2 questions related to your search on the StackExchange:
https://serverfault.com/questions/266373/postgresql-accent-diacritic-insensitive-search
But as you are on Heroku, I doubt this is a good match (unless you have a dedicated database plan).
There is also this one on SO: Removing accents/diacritics from string while preserving other special chars.
But this assumes that your data is stored without any accent.
I hope it will point you in the right direction.
Assuming Foo is the model you are searching against and name is the column. Combining Postgres translate and ActiveSupport's transliterate. You can do something like:
Foo.where(
"translate(
LOWER(name),
'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu'
)
LIKE ?", "%#{ActiveSupport::Inflector.transliterate("%qué%").downcase}%"
)

Do "like" queries with ActiveRecord in Rails 2.x and 3.x?

I'm doing queries like this in Rails 3.x
Speaker.where("name like '%yson%'")
but I'd love to avoid the DB specific code. What's the right way to do this?
If there's a way to do this in Rails 2.x too, that would help too.
In Rails 3 or greater
Speaker.where("name LIKE ?", "%yson%")
In Rails 2
Speaker.all(:conditions => ["name LIKE ?", "%yson%"])
Avoid to directly interpolate strings because the value won't be escaped and you are vulnerable to SQL injection attacks.
You can use .matches for it.
> t[:name].matches('%lore').to_sql
=> "\"products\".\"name\" LIKE '%lore'"
Actual usage in a query would be:
Speaker.where(Speaker.arel_table[:name].matches('%lore'))
Use a search engine like solr or sphinx to create indexes for the columns you would be performing like queries on. Like queries always result in a full table scan when you look at the explain plan so you really should almost never use them in a production site.
Not by default in Rails, since there are so many DB options (MySQL, Postgresql, MongoDB, CouchDB...), but you can check out gems like MetaWhere, where you can do things like:
Article.where(:title.matches => 'Hello%', :created_at.gt => 3.days.ago)
=> SELECT "articles".* FROM "articles" WHERE ("articles"."title" LIKE 'Hello%')
AND ("articles"."created_at" > '2010-04-12 18:39:32.592087')
In general though you'll probably have to have some DB specific code, or refactor your code (i.e redefine the .matches operator on symbols in MetaWhere) to work with a different database. Hopefully you won't be changing your database that often, but if you are you should have a centralized location where you define these operators for re-use. Keep in mind that an operator or function defined in one database might not be available in another, in which case having this generalized operation is moot since you won't be able to perform the search anyways.

Regex retrieved from a MYSQL database to validate email addresses using Ruby on Rails

I am using Ruby on Rails 3 and a MYSQL database. I would like to retrieve a regex from the database and then use that value to validate email addresses.
I aim to not put the regex value in line in my RoR application code, but outside so that the value can be recalled for other usages and from other places.
In order to populate the database, I put in my 'RAILS_ROOT/db/seed.rb' the following:
Parameter.find_or_create(
:param_name => 'email_regex',
:param_value => "[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
)
Notice: in the 'seed.rb' file I edited a little bit the original regex from www.regular-expressions.info adding two \ just before $. Here it is the difference:
#original from www.regular-expressions.info
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
#edited by me
[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
After run rake db:seed in the Terminal, in MYSQL database I have this value (without \ near $):
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
Then in my RoR application I use the regex this way:
def validate(string)
email_regex = Regexp.new(Parameter.find_by_param_name('email_regex').param_value)
if email_regex.match(string)
return true
else
return false
end
end
The problem using the above regex is that I can successfully validate also email addresses with double '#' or without the final part like these:
name#surname#gmail.com # Note the double '#'
test#gmail
Of course I would like to refuse those email addresses. So, how can I adjust that? Or, how can I get what I want?
I tried also to seed these regex:
#case 1
\A[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\Z
#case 2
\\A[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\Z
#case 3
^[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$
that in the MYSQL database become respectively:
#case 1
A[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?Z
#case 2
\A[a-z0-9!#\$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#\$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\Z
#case 3
^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$
but also them don't work as expected.
UPDATE
Debugging I have
--- !ruby/regexp /[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/
that means that just before of all / characters Ruby added a \ character. Can be that my problem? In the 'seed.rb' file I tryed to escape all / adding \ statements but the debug output is always the same.
There are so many things wrong on so many levels here…
Storing application configuration in your database isn't recommended; slower performance, potential catch 22s (like how do you configure your database, from your database), etc. Try something like SettingsLogic if you don't want to have to build your own singleton configuration or use an initializer.
Rails has built in validation functionality as a mixin that's automatically part of any models inheriting from ActiveRecord::Base. You should use it, rather than define your own validation routines, especially for basic cases like this.
You can actually have an email address with multiple # signs, provided the first is escaped with a backslash or the local portion of the address is quoted.
Why are you escaping $ characters in a character class where they have no special meaning?
Regular expressions are okay for a very basic validation of an email address to make sure you didn't get complete garbage data to pass off to your mail server, but they aren't the best way to verify an email address.
I suggest you have a good look at the validations guide at RubyOnRails.org.
You shouldn't reinvent this wheel. See http://lindsaar.net/2010/1/31/validates_rails_3_awesome_is_true for a standard way to validate email addresses in Rails 3.
If you do choose to reinvent the wheel, don't use a regular expression. The gory details of why this is a bad idea are explained in http://oreilly.com/catalog/9780596528126, along with a very, very complicated regular expression that almost does it.

Resources