I've isolated a problem with Ruby on Rails where a model with a serialized column is not properly loading data that has been saved to it.
What goes in is a Hash, and what comes out is a YAML string that can't be parsed due to formatting issues. I'd expect that a serializer can properly store and retrieve anything you give it, so something appears to have gone wrong.
The troublesome string in question is formatted something like this:
message_text = <<END

  X
X
END
yaml = message_text.to_yaml
puts yaml
# =>
# --- |
#
#   X
# X
puts YAML.load(yaml)
# => ArgumentError: syntax error on line 3, col 0: 'X'
The combination of newline, indented second line, and non-indented third line causes the parser to fail. Omitting either the blank line or the indentation appears to remedy the problem, but this does seem to be a bug in the serialization process. Since it requires a rather unique set of circumstances, I'm willing to bet this is some strange edge-case that isn't properly handled.
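A quick round-trip check makes the failure easy to reproduce or rule out on any given setup. The sketch below assumes a current Ruby, where the bundled YAML library is Psych rather than Syck. The helper name round_trips? is made up for illustration and is not part of the YAML library.

```ruby
require 'yaml'

# Returns true when a value survives YAML.dump followed by YAML.load.
# round_trips? is an illustrative helper, not a library method.
def round_trips?(value)
  YAML.load(YAML.dump(value)) == value
end

# Blank first line, indented second line, flush-left third line.
message_text = "\n  X\nX\n"

puts YAML.dump(message_text)
puts round_trips?(message_text)
```

If the second line prints false (or YAML.load raises), the emitter and parser in use disagree about the block scalar's indentation, which is the symptom described above.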
The YAML module that ships with Ruby and is used by Rails looks to delegate a large portion of the processing to Syck, yet does provide Syck with some hints as to how to encode the data it is sending.
In yaml/rubytypes.rb there's the String#to_yaml definition:
class String
  def to_yaml( opts = {} )
    YAML::quick_emit( is_complex_yaml? ? self : nil, opts ) do |out|
      if is_binary_data?
        out.scalar( "tag:yaml.org,2002:binary", [self].pack("m"), :literal )
      elsif to_yaml_properties.empty?
        out.scalar( taguri, self, self =~ /^:/ ? :quote2 : to_yaml_style )
      else
        out.map( taguri, to_yaml_style ) do |map|
          map.add( 'str', "#{self}" )
          to_yaml_properties.each do |m|
            map.add( m, instance_variable_get( m ) )
          end
        end
      end
    end
  end
end
There appears to be a check there for strings that start with ':' and could be mistaken for a Symbol when de-serializing, and the :quote2 option should tell the emitter to quote the string during encoding. Adjusting this regular expression to catch the conditions described above does not appear to have any effect on the output, so I'm hoping someone more familiar with the YAML implementation can advise.
Yep, that looks like a bug in the C syck library. I checked it out using the PHP syck bindings (v 0.9.3): http://pecl.php.net/package/syck and the same bug is present, indicating it is a bug in the library as opposed to the ruby yaml library or ruby-syck bindings:
// phptestsyck.php
<?php
$message_text = "
  X
X
";
syck_load(syck_dump($message_text));
?>
Running this on the cli gives the same SyckException:
$ php phptestsyck.php
PHP Fatal error: Uncaught exception 'SyckException' with message 'syntax error on line 5, col 0: 'X'' in /.../phptestsyck.php:8
Stack trace:
#0 /.../phptestsyck.php(8): syck_load('--- %YAML:1.0 >...')
#1 {main}
thrown in /.../phptestsyck.php on line 8
So, I suppose you could try to fix Syck itself. It appears that the library hasn't been updated since v0.55 in May of 2005 (http://rubyforge.org/projects/syck/), though.
Alternately, there is a pure-ruby yaml parser called RbYAML (http://rbyaml.rubyforge.org/) which originated with JRuby that doesn't appear to have this bug:
>> require 'rbyaml'
=> true
>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> yaml = RbYAML.dump(message_text)
=> "--- \"\\n  X\\nX\\n\"\n"
>> RbYAML.load(yaml)
=> "\n  X\nX\n"
>>
Finally, have you considered another serialization format altogether? Ruby's Marshal library doesn't have this bug either and is faster than Yaml (see http://significantbits.wordpress.com/2008/01/29/yaml-vs-marshal-performance/):
>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> marshal = Marshal.dump(message_text)
=> "\004\b\"\f\n  X\nX\n"
>> Marshal.load(marshal)
=> "\n  X\nX\n"
You have to give up the easy serialize ActiveRecord::Base method to do so, but it's not hard otherwise to use your own serializing scheme.
For example, to serialize some field called 'person_data':
class Person < ActiveRecord::Base
  def person_data
    self[:person_data] ? Marshal.load(self[:person_data]) : nil
  end

  def person_data=(x)
    self[:person_data] = Marshal.dump(x)
  end
end

# Use Person#person_data as normal and it is transparently marshalled
p = Person.find 1
p.person_data = {:color => "blue", :food => "vegetarian"}
(See this ruby forum thread for more)
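One caveat with the accessor approach: Marshal output is binary, and a text column may not store arbitrary bytes cleanly. A minimal sketch of one way around that, assuming the column is a text type, is to Base64-encode the marshalled bytes. The MarshalCoder module below is hypothetical and uses only core Ruby, so it runs without Rails.

```ruby
# Hypothetical coder: any object that responds to dump and load.
module MarshalCoder
  # Marshal the object, then Base64-encode it ("m0" = no line breaks)
  # so the stored value is plain ASCII text.
  def self.dump(object)
    [Marshal.dump(object)].pack("m0")
  end

  # Reverse of dump. A nil or empty column value comes back as nil.
  def self.load(text)
    return nil if text.nil? || text.empty?
    Marshal.load(text.unpack("m0").first)
  end
end

stored   = MarshalCoder.dump({ :color => "blue", :food => "vegetarian" })
restored = MarshalCoder.load(stored)
puts stored
puts restored.inspect
```

The custom person_data accessors could call MarshalCoder.load and MarshalCoder.dump in place of Marshal directly. As with any use of Marshal.load, this should only ever be fed data the application wrote itself.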
Testing an upgrade to Ruby 2.3.3 for our Rails 3.2.22.2 application, and getting a weird situation where we are passing an array as the first argument to Tempfile.new, but it's ending up as a hash.
I've patched tempfile.rb to output the basename argument being passed in.
In an irb session (non-Rails), everything is fine:
> require 'tempfile'
true
> Tempfile.new(['test', '.csv'])
["test", ".csv"] # output of basename argument for Tempfile.new
=> #<Tempfile:/var/blah/test###.csv>
In a rails console session:
> Tempfile.new(['test', '.csv'])
{"test"=>nil, ".csv"=>nil}
ArgumentError: unexpected prefix: {"test"=>nil, ".csv"=>nil}
from /path/to/ruby-2.3.3/lib/ruby/2.3.0/tmpdir.rb:113:in `make_tmpname'
Gotta be a gem or something, but can't figure out for the life of me why this is happening or where or what is changing the behavior.
Any ideas or suggestions on how to debug?
In your case I think that somewhere in your code you have the Array#to_hash method defined.
I had the same issue and for some reason when a method has a default param, in this case basename="", and a double splatted parameter, Ruby calls the to_hash function on the first param.
See the following example:
class Dummy
  def initialize(val = "", **options)
    puts "val = #{val}"
    # puts "Options: #{options}"
  end
end

class Array
  def to_hash
    puts "to_hash called on #{self}"
  end
end

Dummy.new(["Joe", "Bloe"])
This will output
to_hash called on ["Joe", "Bloe"]
val = ["Joe", "Bloe"]
But when there's no default value for the val param, you'll get:
val = ["Joe", "Bloe"]
Note that the Tempfile#initialize method signature changed from Ruby 2.1 to Ruby 2.2.
Here's the diff:
- def initialize(basename, *rest)
+ def initialize(basename="", tmpdir=nil, mode: 0, **options)
Notice that basename now has a default value.
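To track down where such a to_hash comes from, Ruby can report the file and line of a method definition. The sketch below is only an illustration: where_defined is a made-up helper, and PatchedList is a hypothetical stand-in for whatever gem adds to_hash to arrays (the stand-in subclasses Array so the example does not modify Array itself).

```ruby
# Illustrative helper: report where klass gets an instance method from.
# Returns [file, line], or nil when the method is missing or written in C.
def where_defined(klass, method_name)
  return nil unless klass.method_defined?(method_name)
  klass.instance_method(method_name).source_location
end

# Hypothetical stand-in for a gem that teaches arrays to_hash.
class PatchedList < Array
  def to_hash
    each_with_object({}) { |item, hash| hash[item] = nil }
  end
end

p where_defined(Array, :to_hash)        # nil unless something patched Array
p where_defined(PatchedList, :to_hash)  # [file, line] of the definition above
p PatchedList.new(['test', '.csv']).to_hash
```

Running where_defined(Array, :to_hash) in the Rails console rather than plain irb should point at the file responsible, if a Ruby-level patch is indeed the cause.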
Just tried this in my console, and got no error. Try a few things,
Make sure you are using Ruby 2.3 or higher in your Rails app, because I believe the make_tmpname method was handled differently before.
Make sure the quotes around .csv are quotes and not backticks (`).
I get your same error with ruby 2.3.1 if I do this Tempfile.new(['test', /re/])
I hope this helps. At the end of the day, what's causing your error is the try_convert method, which is returning nil for the second argument you pass to Tempfile.new.
This is how I fixed it.
class Tempfile
  def initialize(basename="", tmpdir=nil, mode: 0, **options)
    warn "Tempfile.new doesn't call the given block." if block_given?
    basename = basename.keys if basename.kind_of?(Hash)
    @unlinked = false
    @mode = mode|File::RDWR|File::CREAT|File::EXCL
    ::Dir::Tmpname.create(basename, tmpdir, options) do |tmpname, n, opts|
      opts[:perm] = 0600
      @tmpfile = File.open(tmpname, @mode, opts)
      @opts = opts.freeze
    end
    ObjectSpace.define_finalizer(self, Remover.new(@tmpfile))
    super(@tmpfile)
  end
end
I've been struggling with this for quite a while and I can't understand the issue. This only happens with a value loaded from the database.
I'm using Rails 4.0.13 and Ruby 2.0.0 by the way, using PostgreSQL
I have the following code:
class Term < ActiveRecord::Base
end

class Search < ActiveRecord::Base
  scope :filter, ->(params) {
    query = []
    unless params[:term].blank?
      for term in Array(params[:term]) do
        query << "term = '#{term.gsub('"', '""')}'"
      end
    end
    where(query.join(' or '))
  }
end
Then, there's a term called "100%Carbs". If I try to filter by it like this, I get the following error:
term = Term.find 1
Search.filter(term: term.term)
ArgumentError: invalid byte sequence in UTF-8
from /home/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.0.13/lib/active_support/core_ext/object/blank.rb:93:in `=~'
from /home/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.0.13/lib/active_support/core_ext/object/blank.rb:93:in `!~'
from /home/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.0.13/lib/active_support/core_ext/object/blank.rb:93:in `blank?'
from /home/app/releases/20160603154616/app/models/search.rb:4:in `block in <class:Search>'
But if I write the term on my own, I get a different error:
Search.filter(term: "100%Carbs")
PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xca 0x72
: SELECT "search".* FROM "search" WHERE ((term_store->'term' = '100�rbs'))
ActiveRecord::StatementInvalid: PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xca 0x72
: SELECT "search".* FROM "search" WHERE ((term_store->'term' = '100�rbs'))
I've tried a number of combinations of the methods encode and force_encoding, and I always get this error.
The only real difference I noticed between the two attempts is that they have different encoding when they go in the method:
2.0.0 :004 > "100%Carbs".encoding
=> #<Encoding:ISO-8859-1>
2.0.0 :005 > term.term.encoding
=> #<Encoding:UTF-8>
I really don't understand what's triggering this error, or why the percent character seems to be treated as something else instead of just a character. I've tried to escape it as well, with no success.
I'm really lost here and would appreciate any and all help. Thank you.
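The PostgreSQL error names the bytes 0xca 0x72, and 0x72 is the letter "r". One possible reading, offered purely as an assumption since nothing in the thread confirms it, is that something upstream treated "%Ca" as a percent-escape and turned it into the single byte 0xCA. The sketch below does not reproduce the Rails stack; it only shows how such a string behaves in plain Ruby and how to detect it before it reaches blank? or the database.

```ruby
# The byte sequence from the error message: "100", 0xCA, then "rbs".
# Assumption: "%Ca" was decoded into the byte 0xCA somewhere upstream.
broken = "100\xCArbs".dup.force_encoding(Encoding::UTF_8)

puts broken.valid_encoding?   # false: 0xCA must be followed by a continuation byte
puts broken.bytes[3, 2].map { |b| format("0x%02x", b) }.join(" ")

# Replace the invalid byte so the string is safe to match or send to SQL.
clean = broken.scrub("?")
puts clean.valid_encoding?
puts clean
```

Checking valid_encoding? on the value right before the scope is called, and again on the literal typed into the console, should show which step introduces the bad byte.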
I came across something that seems unusual and I was wondering if anyone could explain why.
1.8.7 :001 > some_str = "Hello World"
=> "Hello World"
1.8.7 :002 > some_str.try(:match, /^(\w*)/)
=> #<MatchData "Hello" 1:"Hello">
1.8.7 :003 > $1
=> nil
1.8.7 :004 > some_str.match(/^(\w*)/)
=> #<MatchData "Hello" 1:"Hello">
1.8.7 :005 > $1
=> "Hello"
I'm not sure why the global variable $1 is not being set the first time, but is set the second. Any insights?
Let me show you how try is implemented. If you want to see it yourself, then take a look at the activesupport source. It's defined in /lib/active_support/core_ext/object/try.rb
class Object
  def try(*a, &b)
    if a.empty? && block_given?
      yield self
    else
      public_send(*a, &b)
    end
  end
end
What this basically does is send the method name and all the arguments to the object. public_send is the same as send, but can only be used to call public methods.
So I rewrote this, to debug your issue:
class Object
  def try(*a)
    result = public_send(*a)
    puts $1.inspect
    result
  end
end

string = "Hello"
string.try(:match, /^(\w*)/)
puts $1.inspect
This outputs
"Hello"
nil
So the big question arises: is this a bug in the Ruby interpreter? Maybe. At least it's not documented in any official source. I found a reference that says the following (see Global variables):
[...], $_ and $~ have local scope. Their names suggest they should be global, but they are much more useful this way, and there are historical reasons for using these names.
So it seems like $1 is not a global variable as well, even though it is reported by the Kernel as a global variable:
1.9.3-p194 :001 > global_variables
=> [:$;, :$-F, :$#, :$!, :$SAFE, :$~, :$&, :$`, :$', :$+, :$=, :$KCODE, :$-K,
:$,, :$/, :$-0, :$\, :$_, :$stdin, :$stdout, :$stderr, :$>, :$<, :$.,
:$FILENAME, :$-i, :$*, :$?, :$$, :$:, :$-I, :$LOAD_PATH, :$",
:$LOADED_FEATURES, :$VERBOSE, :$-v, :$-w, :$-W, :$DEBUG, :$-d, :$0,
:$PROGRAM_NAME, :$-p, :$-l, :$-a, :$binding, :$1, :$2, :$3, :$4, :$5, :$6,
:$7, :$8, :$9]
To make sure, I forwarded this inconsistency to the Ruby Bug Tracker. See Ruby Bug #6723
try is defined as
def try(method, *args, &block)
  send(method, *args, &block)
end
except of course on nil, where it just returns nil. Why does this matter? Because the regexp globals aren't real globals: they're maintained on a per-method and per-thread basis (it's easy enough to see this by perusing the Ruby source). When you call match via try, the globals are set in the scope of try, but in the second case they are set at the top level. It's easy to verify this:
def do_match string, regexp
  string =~ regexp
  $1
end

do_match "Hello World", /^(\w*)/ #=> returns 'Hello'
$1 #=> returns nil
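Since $1 only lives in the frame where the match ran, code that goes through try (or any other wrapper method) is better off reading the capture from the returned MatchData. A small sketch of both behaviours, with method names invented for the example:

```ruby
# Reads the capture through the frame-local $1.
def first_word_via_global(string)
  string =~ /^(\w*)/
  $1                      # set in this method's frame only
end

# Reads the capture from the MatchData object instead.
def first_word_via_matchdata(string)
  match = string.match(/^(\w*)/)
  match && match[1]       # no dependence on $~ or $1
end

def demo
  inside  = first_word_via_global("Hello World")
  outside = $1            # demo itself has not run a match, so this is nil
  [inside, outside, first_word_via_matchdata("Hello World")]
end

p demo
```

With try, that means keeping the return value, for example m = some_str.try(:match, /^(\w*)/) followed by m && m[1], rather than relying on $1 afterwards.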
Have been hacking together a couple of libraries, and had an issue where a string was getting 'double escaped'.
for example:
Fixed example
> x = ['a']
=> ["a"]
> x.to_s
=> "[\"a\"]"
>
Then again to
\"\[\\\"s\\\"\]\"
This was happening while dealing with HTTP headers. I have a header which will be an array, but the HTTP library is doing its own character escaping on the array.to_s value.
The workaround I found, was to convert the array to a string myself, and then 'undo' the to_s. Like so:
formatted_value = value.to_s
if value.instance_of?(Array)
  formatted_value = formatted_value.gsub(/\\/,"") #remove backslash
  formatted_value = formatted_value.gsub(/"/,"") #remove double quote
  formatted_value = formatted_value.gsub(/\[/,"") #remove [
  formatted_value = formatted_value.gsub(/\]/,"") #remove ]
end
value = formatted_value
... There's gotta be a better way ... (without needing to monkey-patch the gems I'm using). (Yeah, this breaks if my string actually contains those characters.)
Suggestions?
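One alternative to stripping characters after the fact is to build the header string directly from the array elements, so the brackets and quotes never appear. This is only a sketch: format_header_value is a made-up helper, not part of any HTTP library, and it assumes a comma-separated list is what the header should contain.

```ruby
# Illustrative helper: join array elements instead of cleaning up the
# output of Array#to_s. Non-array values are passed through to_s unchanged.
def format_header_value(value)
  value.is_a?(Array) ? value.map(&:to_s).join(", ") : value.to_s
end

puts format_header_value(['a'])
puts format_header_value(['a', 'b'])
puts format_header_value('plain')
```

Unlike the gsub chain, this leaves backslashes, quotes and brackets inside the individual values untouched.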
** UPDATE 2 **
Okay. Still having troubles in this neighborhood, but now I think I've figured out the core issue. It's serializing my array to json after a to_s call. At least, that seems to be reproducing what I'm seeing.
['a'].to_s.to_json
I'm calling a method in a gem that is returning the results of a to_s, and then I'm calling to_json on it.
I've edited my answer due to your edited question:
I still can't duplicate your results!
>> x = ['a']
=> ["a"]
>> x.to_s
=> "a"
But when I change the last call to this:
>> x.inspect
=> "[\"a\"]"
So I'll assume that's what you're doing?
it's not necessarily escaping the values - per se. It's storing the string like this:
%{["a"]}
or rather:
'["a"]'
In any case. This should work to un-stringify it:
>> x = ['a']
=> ["a"]
>> y = x.inspect
=> "[\"a\"]"
>> z = Array.class_eval(y)
=> ["a"]
>> x == z
=> true
I'm skeptical about the safe-ness of using class_eval though, be wary of user inputs because it may produce un-intended side effects (and by that I mean code injection attacks) unless you're very sure you know where the original data came from, or what was allowed through to it.
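If the goal is just to get an array into a string and back, a data-only format avoids evaluating anything. The sketch below uses the json library that ships with Ruby and assumes the array holds only JSON-friendly values (strings, numbers, nested arrays and hashes). It also shows the double encoding described in the question's update.

```ruby
require 'json'

x = ['a']

serialized = JSON.generate(x)        # the array as JSON text
restored   = JSON.parse(serialized)  # parsed as data, never evaluated
puts serialized
puts restored == x

# Calling to_s first and then encoding yields a JSON *string* that merely
# contains the array's text, which is where the extra escaping comes from.
double_encoded = JSON.generate(x.to_s)
puts double_encoded
```

So the fix on the calling side is to hand the array itself to the JSON encoder rather than the result of to_s.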
ActiveSupport offers the nice method to_sentence. Thus,
require 'active_support'
[1,2,3].to_sentence # gives "1, 2, and 3"
[1,2,3].to_sentence(:last_word_connector => ' and ') # gives "1, 2 and 3"
It's good that you can change the last word connector, because I prefer not to have the extra comma. But it takes so much extra text: 44 characters instead of 11!
The question: what's the most Ruby-like way to change the default value of :last_word_connector to ' and '?
Well, it's localizable so you could just specify a default 'en' value of ' and ' for support.array.last_word_connector
See:
from: conversion.rb
def to_sentence(options = {})
  ...
  default_last_word_connector = I18n.translate(:'support.array.last_word_connector', :locale => options[:locale])
  ...
end
Step by step guide:
First, Create a rails project
rails i18n
Next, edit your en.yml file: vim config/locales/en.yml
en:
  support:
    array:
      last_word_connector: " and "
Finally, it works:
Loading development environment (Rails 2.3.3)
>> [1,2,3].to_sentence
=> "1, 2 and 3"
As an answer to how to override a method in general, a post here gives a nice way of doing it. It doesn't suffer from the same problems as the alias technique, as there isn't a leftover "old" method.
Here is how you could use that technique with your original problem (tested with Ruby 1.9):
class Array
  old_to_sentence = instance_method(:to_sentence)
  define_method(:to_sentence) { |options = {}|
    options[:last_word_connector] ||= " and "
    old_to_sentence.bind(self).call(options)
  }
end
You might also want to read up on UnboundMethod if the above code is confusing. Note that old_to_sentence goes out of scope after the end statement, so it isn't a problem for future uses of Array.
class Array
  alias_method :old_to_sentence, :to_sentence
  def to_sentence(args={})
    a = {:last_word_connector => ' and '}
    a.update(args) if args
    old_to_sentence(a)
  end
end
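On Ruby 2.0 and later, Module#prepend is another way to wrap a method without leaving an aliased copy behind. The sketch below is self-contained and does not load ActiveSupport: WordList and its simplified to_sentence are stand-ins invented for the example, and PlainAndConnector is a hypothetical module name. The same module could in principle be prepended to Array once ActiveSupport is loaded.

```ruby
# Stand-in for a class with a to_sentence method (a simplified imitation,
# not ActiveSupport's implementation).
class WordList
  def initialize(items)
    @items = items
  end

  def to_sentence(options = {})
    connector = options.fetch(:last_word_connector, ", and ")
    return @items.join(" and ") if @items.size < 3
    "#{@items[0...-1].join(', ')}#{connector}#{@items.last}"
  end
end

# Hypothetical override: supply a different default, let callers override it,
# and delegate to the original method through super.
module PlainAndConnector
  def to_sentence(options = {})
    super({ :last_word_connector => " and " }.merge(options))
  end
end

class WordList
  prepend PlainAndConnector
end

puts WordList.new([1, 2, 3]).to_sentence
puts WordList.new([1, 2, 3]).to_sentence(:last_word_connector => ", and ")
```

Because the module sits in front of the class in the ancestor chain, super reaches the original method and no old_to_sentence name is needed.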