Ruby 1.9 doesn't support Unicode normalization yet - ruby-on-rails

I'm trying to port over some of my old rails apps to Ruby 1.9 and I keep getting warnings about how "Ruby 1.9 doesn't support Unicode normalization yet." I've tracked it down to this function, but I'm getting about 20 warning messages per request:
rails-2.3.5/activesupport/lib/active_support/inflector.rb
def transliterate(string)
  warn "Ruby 1.9 doesn't support Unicode normalization yet"
  string.dup
end
Any ideas how I should start tracking these down and resolving it?

If you are aware of the consequences, i.e. accented characters will not be transliterated in Ruby 1.9.1 + Rails 2.3.x, place this in config/initializers to silence the warning:
# http://stackoverflow.com/questions/2135247/ruby-1-9-doesnt-support-unicode-normalization-yet
module ActiveSupport
  module Inflector
    # Calling String#parameterize prints a warning under Ruby 1.9,
    # even if the data in the string doesn't need transliterating.
    if Rails.version =~ /^2\.3/
      undef_method :transliterate
      def transliterate(string)
        string.dup
      end
    end
  end
end
Rails 3 does indeed solve this issue, so a more future-proof solution would be to migrate towards that.

The StringEx Gem seems to work pretty well. It has no dependency on Iconv either.
It adds some methods to the String class, like "to_ascii" which does beautiful transliteration out of the box:
require 'stringex'
"äöüÄÖÜßë".to_ascii #=> "aouAOUsse"
Also, the Babosa Gem does a great job transliterating UTF-8 strings, even with language support:
"Jürgen Müller".to_slug.transliterate.to_s #=> "Jurgen Muller"
"Jürgen Müller".to_slug.transliterate(:german).to_s #=> "Juergen Mueller"
Enjoy.

That method definition is wrapped in an if-statement for Ruby 1.9. Right above it, you will find the regular definition, which shows a bit more of what this is doing. It's a method used to convert accented characters into their regular variants. E.g.: á => a, or ë => e
But this method is only used in parameterize, which is in turn defined right above transliterate. This is all still in ActiveSupport. I can't find anything that is directly calling parameterize.
So perhaps you're using parameterize or transliterate yourself, somewhere in your Rails application?
Common usage (according to the parameterize documentation) is for creating friendly permalinks from arbitrary strings, much like SO does, for example:
http://stackoverflow.com/questions/2135247/ruby-1-9-doesnt-support-unicode-normalization-yet
(the slug is the ruby-1-9-doesnt-support-unicode-normalization-yet part at the end)
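To make concrete what transliterate and parameterize do together, here is a minimal sketch in modern Ruby (2.2 or later, which has String#unicode_normalize). It is not ActiveSupport's implementation: slugify is a hypothetical name, and the sketch assumes that decomposing to NFD and dropping the combining marks is good enough. That handles á or ë, but not letters such as ß that have no decomposition.

```ruby
# Hypothetical helper: a rough stand-in for transliterate + parameterize.
# Requires Ruby 2.2+ for String#unicode_normalize.
def slugify(string)
  # NFD splits "ü" into "u" followed by U+0308 COMBINING DIAERESIS.
  decomposed = string.unicode_normalize(:nfd)
  # Drop the combining marks (Unicode general category Mn), keeping the base letters.
  base = decomposed.gsub(/\p{Mn}/, "")
  # Lowercase, turn each run of characters outside a-z/0-9 into one dash,
  # then trim any dashes left at either end.
  base.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts slugify("Jürgen Müller")     # jurgen-muller
puts slugify("  Héllo, Wörld!  ") # hello-world
```

Because ß survives NFD unchanged, this sketch turns it into a separator rather than "ss"; the gems mentioned above use real transliteration tables for cases like that.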

Replace the body of the method with
raise "transliterate called"
and observe a backtrace which will show you where the stuff is coming from at the first call. Your app will of course collapse as well but that will likely give you the culprit from the first try.
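If taking the whole app down is too blunt, a variation is to record the caller and carry on. The sketch below uses a stand-in module (FakeInflector and make_permalink are hypothetical; in a Rails 2.3 app you would reopen ActiveSupport::Inflector in an initializer instead, which is an untested assumption here) together with Kernel#caller to note where each call came from.

```ruby
# Hypothetical stand-in for ActiveSupport::Inflector, which also uses `extend self`.
module FakeInflector
  extend self

  def transliterate(string)
    string.dup
  end
end

# Every call site that reaches transliterate gets recorded here.
TRANSLITERATE_CALLERS = []

module FakeInflector
  # Keep the original under another name, then wrap it.
  alias_method :untraced_transliterate, :transliterate

  def transliterate(string)
    # caller.first is the "file:line:in ..." frame that made this call.
    TRANSLITERATE_CALLERS << caller.first
    untraced_transliterate(string)
  end
end

# Hypothetical application code that ends up calling transliterate.
def make_permalink(title)
  FakeInflector.transliterate(title)
end

make_permalink("Ünïcode")
puts TRANSLITERATE_CALLERS
```

Unlike the raise approach, this lets the request finish and collects every call site rather than only the first.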

I appreciate that this is a dirty way to solve the problem, but having read the error message I'm aware of the issue. So I want to get rid of the warnings. I dropped this code in environment.rb:
module ActiveSupport
  module Inflector
    # Calling String#parameterize prints a warning under Ruby 1.9,
    # even if the data in the string doesn't need transliterating.
    # Maybe Rails 3 will have fixed it...?
    if RAILS_GEM_VERSION =~ /^2\.3/
      undef_method :transliterate
      def transliterate(string)
        string.dup
      end
    end
  end
end

If you'd rather not monkey patch the Inflector module, you can also do this...
Both of the following worked for me to silence this annoying "Ruby 1.9 doesn't support Unicode normalization yet" warning:
silence_stream(STDERR) {
whatever_code_caused_transliterate_to_be_called
}
or
silence_warnings {
whatever_code_caused_transliterate_to_be_called
}
This does have the disadvantage that it requires cluttering up your calling code, but it is a technique you can use generally whenever you don't want to see warnings or other output.
activesupport provides silence_stream and silence_warnings in activesupport-2.3.11/lib/active_support/core_ext/kernel/reporting.rb
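silence_warnings is enough here because transliterate reports through Kernel#warn, which prints nothing while $VERBOSE is nil. Roughly the same idea in plain Ruby, as a sketch (quietly is a hypothetical name, not ActiveSupport's code); silence_stream, by contrast, redirects the stream itself, so it also hides output written directly to STDERR.

```ruby
# Run a block with Ruby warnings switched off, restoring the old level afterwards.
def quietly
  old_verbose = $VERBOSE
  $VERBOSE = nil # Kernel#warn is a no-op while $VERBOSE is nil
  yield
ensure
  $VERBOSE = old_verbose
end

quietly { warn "Ruby 1.9 doesn't support Unicode normalization yet" } # prints nothing
warn "this one still reaches STDERR"
```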

String#unicode_normalize, String#unicode_normalize! and String#unicode_normalized? were introduced in Ruby 2.2. Sample code and the implementation can be seen in the test case, lib/unicode_normalize.rb and lib/unicode_normalize/normalize.rb.
# U+00E1: LATIN SMALL LETTER A WITH ACUTE
# U+0301: COMBINING ACUTE ACCENT
puts "\u00E1" == "a\u0301".unicode_normalize(:nfc)
puts true == "a".unicode_normalized?(:nfc)
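A few more comparisons, as a sketch for Ruby 2.2 or later, showing why strings should be normalized before they are compared and what the compatibility form :nfkc adds on top of :nfc/:nfd:

```ruby
composed   = "\u00E1"  # "á" as a single code point
decomposed = "a\u0301" # "a" followed by U+0301 COMBINING ACUTE ACCENT

# They render the same but are different code point sequences.
puts composed == decomposed                         # false
puts composed.length                                # 1
puts decomposed.length                              # 2

# Normalizing both to the same form makes them comparable.
puts composed.unicode_normalize(:nfd) == decomposed # true
puts decomposed.unicode_normalized?(:nfc)           # false

# NFKC also folds compatibility characters, e.g. the ligature U+FB01.
puts "\uFB01".unicode_normalize(:nfkc)              # fi
```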

Related

Rubocop/Hound recommend freezing string literal class names

My project uses HoundCI as a code linter, which I believe internally uses rubocop.
Recently I started noticing this sort of warning -
It appears on every class definition (e.g. class User < ActiveRecord::Base).
I understand the concept of freezing string literals, but why would it expect me to freeze class definitions? Also more importantly, how do I disable it? It's quite annoying to have 10+ of these "errors" polluting our pull requests.
Thank you!
Edit: Looks like it also started appearing on require statements that use string literals, like with rspec tests. This is definitely new and wasn't being flagged previously
It looks like Hound/Rubocop is detecting a violation of the FrozenStringLiteralComment cop.
This cop was designed to help with the upgrade to Ruby 3.0. It adds the comment # frozen_string_literal: true to the top of files to enable frozen string literals, which at the time were expected to become the default in Ruby 3.0 (in the end, Ruby 3.0 did not make that change). The comment is added below any shebang and encoding comment. The frozen string literal comment is only valid in Ruby 2.3+.
You can either add the magic comment manually to the top of your files
# frozen_string_literal: true
Or have Rubocop do it for you
$ bundle exec rubocop --auto-correct --only FrozenStringLiteralComment
You can also disable the cop in your .rubocop.yml by setting Enabled: false under Style/FrozenStringLiteralComment.

Rails decode json with \u0022 character fail

I would like to decode a json string with a \u0022 character in it.
I am successful with:
>> ActiveSupport::JSON.decode("{\"json\":{\"difficulty\":1}}")
=> {"json"=>{"difficulty"=>1}}
But fail with:
>> ActiveSupport::JSON.decode("{\"json\":{\"difficulty\":\"test\\u0022test\"}}")
StandardError: Invalid JSON string
from /home/.../.rbenv/versions/1.8.7-p358/lib/ruby/gems/1.8/gems/activesupport-2.3.15/lib/active_support/json/backends/yaml.rb:14:in `decode'
from /home/.../.rbenv/versions/1.8.7-p358/lib/ruby/gems/1.8/gems/activesupport-2.3.15/lib/active_support/json/decoding.rb:14:in `__send__'
from /home/.../.rbenv/versions/1.8.7-p358/lib/ruby/gems/1.8/gems/activesupport-2.3.15/lib/active_support/json/decoding.rb:14:in `decode'
from (irb):11
I would love to replace the \u0022 character with another one, but I can't, because the decoding happens inside Rails' parameter parsing when the app receives a request; the only way around that would be to override the core JSON decode method, which I would prefer to avoid.
FYI : I'm on Ruby 1.8.7-p358 & Rails 2.3.15 & I can't change that.
I think this has to do with Syck, the old YAML library used in (very) old versions of Ruby. What does this have to do with YAML? In ActiveSupport 2.3.15, JSON decoding uses YAML.load to parse JSON, because JSON happens to be a subset of YAML.
A quick aside: in ActiveSupport 2.3.18, ActiveSupport::JSON::Backends::Yaml no longer uses YAML to decode JSON at all; its decode method looks like this:
def decode(json)
raise "The Yaml backend has been deprecated due to security risks, you should set ActiveSupport::JSON.backend = 'OkJson'"
end
There are very serious consequences to using outdated versions of Ruby and Rails. Upgrade to versions that are still supported by security releases, or you will be sorry. Seriously.
Anyway, let's take a closer look at your data. As I suspect you know, the Unicode character U+0022 is a double quote. All of the escaping makes things confusing, so let's see what string we actually have:
str = "{\"json\":{\"difficulty\":\"test\\u0022test\"}}"
puts str
# => {"json":{"difficulty":"test\u0022test"}}
FWIW it's more readable to use one of Ruby's alternate string syntaxes here, e.g.:
str = %Q[{"json":{"difficulty":"test\\u0022test"}}]
puts str
# => {"json":{"difficulty":"test\u0022test"}}
Either way, that's a literal \ followed by u0022. So far so good. Now let's try to parse it with Syck (I'm using Ruby 2.2, but it doesn't make a difference in this case):
require "syck"
str = %Q[{"json":{"difficulty":"test\\u0022test"}}]
Syck.load(str)
# => .../syck.rb:145:in `load': syntax error on line 0, col 38: `# Transforms the subclass name found i}}' (ArgumentError)
I don't know why Syck's error messages are so weird, but anyway we can see that Syck falls over trying to parse this JSON. Now let's try it with a modern YAML parser:
require "psych"
str = %Q[{"json":{"difficulty":"test\\u0022test"}}]
Psych.load(str)
# => {"json"=>{"difficulty"=>"test\"test"}}
Works great!
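Since JSON is, for practical purposes, a subset of YAML, you can check that a working YAML parser and a real JSON parser agree on this input. A quick sketch, assuming a Ruby recent enough to ship both the json and psych libraries; it won't help on 1.8.7 with Syck, it only shows what correct parsing of this string looks like.

```ruby
require "json"
require "psych"

str = %Q[{"json":{"difficulty":"test\\u0022test"}}]

# A dedicated JSON parser...
from_json = JSON.parse(str)
# ...and a YAML parser reading the same text as a YAML flow mapping.
from_yaml = Psych.load(str)

# Both turn the \u0022 escape into a literal double quote.
p from_json["json"]["difficulty"] # prints "test\"test"
p from_json == from_yaml          # prints true
```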
So you have a few options.
You could upgrade to a version of Ruby that's still maintained, doesn't have security holes, and has a YAML parser that isn't broken.
You could upgrade to a version of Rails that's still maintained, doesn't have security holes, and doesn't use a broken YAML parser to parse JSON.
You could install Psych, which is available as a gem for older versions of Ruby.

Ruby 1.9, Rails 3 and Unicode: code won't recognize Unicode characters

I'm trying to upgrade some tests as we move our app from Rails 2 on 1.8.7 to Rails 3 on Ruby 1.9.2. The tests basically ensure that database objects can be named with unicode characters, to provide international support.
The tests basically look like this:
#encoding: utf-8
'ä' =~ /\S/ # this passes
'ä' =~ /\w/ # this fails, apparently passed on 1.8.7
model = Model.create!(:name => '§äè®') # this causes a "Name must include at least one letter or number" validation error, which means Ruby (or Rails) is seeing the name as being blank
This is all fundamentally basic and very simplified for the purposes of posting here, but these are what fail. Is there anything else I need to be looking at here? I know Ruby doesn't play well with Unicode, but this pretty much has to be left in. Any help is appreciated.
Looks like this is working as intended:
http://redmine.ruby-lang.org/issues/show/3181
Changing it to 'ä' =~ /\p{L}/ got it to work.
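In Ruby 1.9 and later, \w in a regexp is ASCII-only, while \p{...} properties and POSIX bracket classes such as [[:alpha:]] are Unicode-aware. A small sketch of the difference; has_letter_or_number? is a hypothetical helper standing in for whatever the model's "at least one letter or number" validation does.

```ruby
a_umlaut = "\u00E4" # "ä"

p(a_umlaut =~ /\w/)          # nil: \w only matches [a-zA-Z0-9_]
p(a_umlaut =~ /\p{L}/)       # 0: any Unicode letter
p(a_umlaut =~ /[[:alpha:]]/) # 0: POSIX bracket classes are Unicode-aware

# Hypothetical validation helper: true if the name has at least one letter or digit.
def has_letter_or_number?(name)
  !!(name =~ /[\p{L}\p{N}]/)
end

p has_letter_or_number?("§äè®") # true
p has_letter_or_number?("§®")   # false
```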

Tracing dependency loading in Rails

Our team is working on a new application that we started with Rails 3.1 and Ruby 1.9.2 - and last night I took it to Ruby 1.9.3.
One of the gems we were using in the dependency chain (css_parser) ended up having a require 'iconv' in it, triggering a deprecation warning in 1.9.3 that looked like this:
.../gems/activesupport-3.1.1/lib/active_support/dependencies.rb:240:in `block in require': iconv will be deprecated in the future, use String#encode instead.
At first, lacking a better trace, I naively blamed Rails, until I couldn't find a require 'iconv' anywhere in it.
The only way I tracked this down was that I started commenting things out in my Gemfile and then I finally got the bright idea to load irb and start requiring each library in turn. I also could have just done a filesystem grep in the gems directory, but I wasn't exactly sure that "require 'iconv'" was what was triggering the error.
What a PITA. There has to be a better way - just doing a --trace in rake tasks that load rails didn't cut it. Is there some way / any way of triggering a trace on this that would have shown me which line in the relatively long list of library dependencies was triggering the deprecation?
So, it's probably a little moot because I'm not likely to ever run into the problem again (and the css_parser gem was the only iconv-requiring gems in my current Rails 3.1/Ruby 1.9.3 projects).
But it was a puzzle, so I wanted to find some way of solving it.
The problem is very iconv-specific in this case. There are other Ruby deprecations, but for the most part they seem to go through Kernel#warn (in Ruby code) or rb_warn() (in C). The warning in iconv.c is different from the others: it's a puts to rb_stderr.
So maybe I can do the following
1. Override Kernel#require to capture stderr
2. Check for an iconv message after calling the original Kernel#require
3. Raise an exception if the message is found, thereby getting a trace
4. Do this before bundler runs if at all possible.
It turns out I can't do #4 - because Bundler calls Kernel.require directly - but I can use Bundler to parse the Gemfile to give me a list of things to require myself.
So this is what I get - thanks to this stack overflow post for a pointer on capturing standard error - and the rubygems source for the idea on aliasing the original Kernel#require
# override Kernel#require to intercept stderr messages on require
# and raise a custom error if we find one mentioning 'iconv'
require "stringio"

class RequireError < StandardError
end

module Kernel
  alias :the_original_require require
  private :the_original_require

  def capture_stderr
    # The output stream must be an IO-like object. In this case we capture it in
    # an in-memory IO object so we can return the string value. You can assign any
    # IO object here.
    previous_stderr, $stderr = $stderr, StringIO.new
    yield
    $stderr.string
  ensure
    # Restore the previous value of stderr (typically equal to STDERR).
    $stderr = previous_stderr
  end

  def require(name)
    result = nil
    captured_output = capture_stderr do
      result = the_original_require(name)
    end
    if captured_output =~ /iconv/
      raise RequireError, 'iconv requirement found'
    end
    # Hand back whatever the real require returned (true or false).
    result
  end
end

require 'bundler'
# load the list of Bundler requirements from the Gemfile
required_libraries = Bundler.definition.dependencies.map(&:name)

# loop through and require each, ignoring all errors other than
# our custom error
required_libraries.each do |requirement|
  begin
    require(requirement)
  rescue RequireError
    raise
  rescue Exception
    # ignore anything else, e.g. gems whose require path differs from the gem name
  end
end
And voila! A trace message that helps track down where the iconv requirement was.
In the end, a plain search for "require 'iconv'" is probably still best (once it's clear that's what was causing the error).
But, as in life, some yaks must be shaved.
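If you do go the search route, a small Ruby sketch can walk the installed gems for you. find_requires is a hypothetical helper; it assumes the dependency is loaded by a literal require 'iconv' (or double-quoted) line, so dynamically built requires will be missed.

```ruby
# Hypothetical helper: list "path:line" for every .rb file under root
# that contains a literal require of the given library.
def find_requires(root, library)
  pattern = /^\s*require\s*\(?\s*['"]#{Regexp.escape(library)}['"]/
  Dir.glob(File.join(root, "**", "*.rb")).sort.flat_map do |path|
    hits = []
    File.foreach(path).with_index(1) do |line, lineno|
      # scrub replaces invalid bytes so the regexp match can't raise
      hits << "#{path}:#{lineno}" if line.scrub =~ pattern
    end
    hits
  end
end
```

To search every gem directory RubyGems knows about, something like Gem.path.each { |dir| puts find_requires(File.join(dir, "gems"), "iconv") } should do; expect it to take a while on a machine with many gems installed.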
You could take a look at the Gemfile.lock file, which holds all dependencies in a hierarchical tree and indicates the versions required by each gem. This might help to identify the gem that is requiring it.

Rails app template is running code and throwing fits

I'm trying to create a Rails app template, and I have this block of code in it:
file 'config/sass.rb', <<-RUBY
Sass::Engine::DEFAULT_OPTIONS[:load_paths].tap do |load_paths|
load_paths << "#{Rails.root}/app/assets/stylesheets"
load_paths << "#{Gem.loaded_specs['compass'].full_gem_path}/frameworks/compass/stylesheets"
end
RUBY
When I run 'rails new' with this template I get the following error:
undefined method `root' for Rails:Module (NoMethodError)
I'm new to app templates as well as this code block syntax. (What do you even call that <<-RUBY block? It's really hard to search for on google). It was my impression that it wouldn't be running any of the code inside the block so it shouldn't be causing errors. What gives?
UPDATE: Let me add some more context:
I'm trying to modify the app template here: https://github.com/leshill/rails3-app/blob/master/app.rb I want add the code from this blog post: http://metaskills.net/2011/05/18/use-compass-sass-framework-files-with-the-rails-3.1-asset-pipeline/ so that I can have compass support in rails3.1
To elaborate on mu's point.
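The <<-SOMESTRING syntax marks the beginning of a string (a "heredoc"). The string is terminated by SOMESTRING on a line of its own; with the <<- form the terminator may be indented, while with plain << it must start at the beginning of the line.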
For example you see this a lot
string = <<-EOF
Hey this is a really long string
with lots of new lines
EOF
string # => "Hey this is a really long string\nwith lots of new lines\n"
In this case the RUBY identifier just signals that the contents are Ruby code; the name itself has no effect. What matters is that <<-RUBY behaves like a double-quoted string, so any #{ruby_code} inside it is evaluated, and its result inserted into the string, as soon as the string is built.
So to get around this you can do something like,
irb >> s = <<-RUBY
"#{'#{Rails.root}'}/app/assets/stylesheets"
RUBY
#=> "\"\#{Rails.root}/app/assets/stylesheets\"\n"
Here we break out of the string using #{} and then use the single quotes to tell ruby that we don't want the #{Rails.root} evaluated.
EDIT: I was thinking more about this, and realized this is equivalent and a little cleaner
irb >> s = <<-RUBY
Rails.root.to_s + "/app/assets/stylesheets"
RUBY
#=> "Rails.root.to_s + \"/app/assets/stylesheets\"\n"
This way we don't have to worry about escaping at all : )
You are asking the "rails new" command to create a file and passing a block of content using a "heredoc" (signaled by the <<-SOMESTRING syntax). More about heredoc:
http://en.wikipedia.org/wiki/Here_document#Ruby
The parser will treat the content just like a Ruby string surrounded by double quotes and attempt to substitute anything enclosed in #{}. It fails because, at the point the template runs, the Rails module has no root method, so evaluating #{Rails.root} raises the NoMethodError you saw.
You can avoid the substitution behavior (have the content treated like a Ruby string surrounded by singlequotes) by using single-quote-style-heredoc. Surround the heredoc signal with singlequotes:
file 'config/sass.rb', <<-'RUBY'
Sass::Engine::DEFAULT_OPTIONS[:load_paths].tap do |load_paths|
load_paths << "#{Rails.root}/app/assets/stylesheets"
load_paths << "#{Gem.loaded_specs['compass'].full_gem_path}/frameworks/compass/stylesheets"
end
RUBY
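To see the difference between the two heredoc forms outside of a template, here is a minimal sketch. FakeRails and the root path are hypothetical stand-ins; FakeRails, like Rails at the point the template runs, has no root method.

```ruby
# A bare module standing in for Rails while the template runs (hypothetical).
module FakeRails; end

root = "/srv/myapp" # hypothetical value

# <<-RUBY behaves like a double-quoted string: #{...} runs right now.
interpolated = <<-RUBY
  load_paths << "#{root}/app/assets/stylesheets"
RUBY

# <<-'RUBY' behaves like a single-quoted string: #{...} is kept as text.
literal = <<-'RUBY'
  load_paths << "#{root}/app/assets/stylesheets"
RUBY

puts interpolated.strip # load_paths << "/srv/myapp/app/assets/stylesheets"
puts literal.strip      # load_paths << "#{root}/app/assets/stylesheets"

# Interpolating a method the module lacks fails as soon as the string is built.
begin
  snippet = <<-RUBY
    load_paths << "#{FakeRails.root}/app/assets/stylesheets"
  RUBY
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```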
Since you're creating Rails app template for a starter app, it might be helpful to look at the
Rails 3.1 Application Templates
from the Rails Apps project on GitHub.
The project provides good examples of app templates plus documentation (be sure to take a look at Thor::Actions and Rails::Generators::Actions).

Resources