"Sunspot" Gem makes distinction between UTF-8 chars - ruby-on-rails

In a Rails app I started using sunspot => https://github.com/sunspot/sunspot/blob/master/README.md
Everything went OK until I noticed this (taken from the rails-console):
1.9.3p194 :002 > MyModel.search{fulltext "leon"}.results
=> [#<MyModel id: 16, name: "Leon">]
1.9.3p194 :003 > MyModel.search{fulltext "león"}.results
=> [#<MyModel id: 18, name: "León">]
How can I tell the system not to make distinction between "leon" and "león"
(I want smth like search{fulltext "leon"} => [#MyModel id: 16 ... , #MyModel id: 18...])
I've been looking for this problem and I've found every time the same response:
With this line in Gemfile works meanwhile the next release of rsolr:
gem 'rsolr', :git => "https://github.com/mwmitchell/rsolr.git"
thx

in the schema.xml you need to add a character filter as described in AnalyzersTokenizersTokenFilters for example:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
and in the you should have mapping-ISOLatin1Accent.txt you should have entries that will map the unicode byte sequence to a asci character sequence. You can see an example here
mapping-ISOLatin1Accent.txt

Thx for the responses. At least I've solved it right last night with anohter idea I've taked from http://codeshooter.wordpress.com/2011/01/13/full-text-search-in-in-rails-with-sunspot-and-solr/
the idea is
in Restaurant.rb
text :name do
self.name.my_normalize
end
and the function
to_s.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/,'').downcase
that line works with strings like "äáàÁÄÀ" --- "aaaaaa"

You need to make changes inside the Solr (the application, not the gem) configuration files. Solr is "embedded" in the gem, but you can access its configuration as if it were installed separately. Have a look at Solr documentation.

Related

Soft signs in rails console

I want to create multiple categories via console and I want to be able add soft signs. At this moment I can't do that.
It's very important to project that I can save category names with soft signs.
Can somebody tip me where to search? I searched such tag - soft signs rails.
There wasn't any usefull resource.
Thanks
EDIT
Soft signs in my native language is like this.
Ā,Š,Ē,Ž with that symbol called soft sign abowe the character.
At this moment when I try to save new category record it shows me this kind off error
thodError: undefined methodcache_ancestry!' for #
But I am sure that I didn't change anything in models or controllers :(
What version of Ruby is this? What you're seeing there are either US-ASCII strings with UTF-8 data in them (Ruby 1.9) or byte arrays (Ruby 1.8).
If you're using Ruby 1.8, you may need to use Iconv to convert your encoding from US-ASCII to UTF-8. If you're using Ruby 1.9, then make sure you're creating UTF-8 strings and it should work just fine.
Note that those escape sequences are correct - that is the literal byte array of those characters, assuming the proper encoding is applied, so you may not need to actually change anything. If the bytes are right, everything's fine - you're just seeing ruby interpret the string as ASCII rather than UTF-8 or whatnot.
In Ruby 1.8, when you #inspect a string, you get the escaped version, but putsing it will show you the actual string:
1.8.7 :021 > s = "Komunālās mašīnas"
=> "Komun\304\201l\304\201s ma\305\241\304\253nas"
1.8.7 :022 > puts s
Komunālās mašīnas
In 1.9, you get the correct display all around, so long as your encoding is right:
1.9.3p327 :001 > s = "Komunālās mašīnas"
=> "Komunālās mašīnas"
1.9.3p327 :004 > s.force_encoding "US-ASCII"
=> "Komun\xC4\x81l\xC4\x81s ma\xC5\xA1\xC4\xABnas"
1.9.3p327 :005 > puts s
Komunālās mašīnas
Check this out Edgars:
#encoding: UTF-8
t = 'ŠšÐŽžÀÁÂÃÄAÆAÇÈÉÊËÌÎÑNÒOÓOÔOÕOÖOØOUÚUUÜUÝYÞBßSàaáaâäaaæaçcèéêëìîðñòóôõöùûýýþÿƒ'
fallback = {
'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
}
p t.encode('us-ascii', :fallback => fallback)
See Ruby 1.9.x replace sets of characters with specific cleaned up characters in a string
EDIT:
To get all the characters for your language you will need to add them as desired to the fallback hash. When I run "Komunālās mašīnas" as the variable 't' I get this:
t = "Komunālās mašīnas"
t.encode('us-ascii', :fallback => fallback)
Encoding::UndefinedConversionError: U+0101 from UTF-8 to US-ASCII
You can tell from this where the problem lies by googling U+0101 which shows
http://www.charbase.com/0101-unicode-latin-small-letter-a-with-macron
So now you know which letter is not working and you can add it to the fallback hash like so:
fallback = { OTHER DEFINITIONS , 'ā'=>'a'}
Here's a place to start:
http://www.ascii-codes.com/cp775.html

Cannot select column with accent in rails

I have the Level record in my rails app, it contains 1 field (name) that can contain chars with accents.
1.9.3p125 :008 > Level.all
Level Load (0.4ms) SELECT "levels".* FROM "levels"
=> [#<Level id: 1, name: "Débutant">, #<Level id: 2, name: "Intermédiaire">, #<Level id: 3, name: "Avancé">]
But when I query, I have:
1.9.3p125 :011 > Level.where("name = ?", "D\U+FFC3\U+FFA9butant").first
Level Load (0.3ms) SELECT "levels".* FROM "levels" WHERE (name = 'Dbutant') LIMIT 1
=> nil
I cannot type Level.where("name = ?", "Débutant").first when using rails c as é is directly replaced by \U+FFC3\U+FFA9. But in my controller, the result is the same, I cannot query accentuated string.
I currently use sqlite for my tests.
Your problem in the console is related to the readline library: You have to install Ruby with proper readline support.
If you have installed Ruby using rvm, the quickest way to resolve this problem is:
rvm pkg install readline
and then (assuming you are using 1.9.3, otherwise just replace with your Ruby version):
rvm reinstall 1.9.3 --with-readline-dir=$rvm_path/usr
After that, typing é in the Rails console should work again...
Regarding your other problem (in the controller): Could you please post a log excerpt containing the SQL query that is being made?
Place this string
#encoding: utf-8
to the first line of your Ruby file.

Rails ActiveRecord invalid byte sequence in UTF-8 issue

I am using MSSQL 2005.
I have StringIO object that contains my zip file content.
Here is how I obtain zip binary data:
stringio = Zip::ZipOutputStream::write_buffer do |zio|
Eclaim.find_by_sql("SET TEXTSIZE 67108864")
zio.put_next_entry("application.xml")
#zio.write #claim_db[:xml]
biblio = Nokogiri::XML('<?xml version="1.0" encoding="utf-8"?>' + #claim_db[:xml], &:noblanks)
zio.write biblio.to_xml
builder = Nokogiri::XML::Builder.new(:encoding => 'utf-8') do |xml|
xml.documents {
docs.where("ext not in (#{PROHIBITED_EXTS.collect{|v| "'#{v}'"}.join(', ')})").each{|doc|
zio.put_next_entry("#{doc[:materialtitle_id]}.#{doc[:ext]}")
zio.write doc[:efile]
xml.document(:id => doc[:materialtitle_id]) {
xml.title doc[:title]
xml.code doc[:code]
xml.filename "#{doc[:materialtitle_id]}.#{doc[:ext]}"
xml.extname doc[:ext]
}
}
}
end
zio.put_next_entry("docs.xml")
zio.write builder.to_xml
end
stringio
In my controller I try:
data.rewind
#claim.docs.create(
:title => 'Some file',
:ext => 'zip',
:size => data.length,
:receive_date => Time.now,
:efile => data.sysread
)
But Rails complains invalid byte sequence in UTF-8
Help me pls with it.
If the stream is configured as UTF-8 stream, you can't write compressed binary (which may contain any value).
I think, setting data as binary stream before write:
data.force_encoding "ASCII-8BIT"
might help.
What is complaining, ruby, ActiveRecord, or SQL Server? My guess is SQL Server. Make sure the data type of the efile field in the database is a binary BLOB.
You should break your problem down in constituate parts and troubleshoot smaller units. Remove complexity till you get to the source of the issue. For instance, have you tried just doing a simple Document.create with the attributes in say the console to remove the possibility that your controller code may be buggy? Something like Document.create :efile => File.read('sometiny.zip') and just go from there.
Assuming that works or break, you have a much simpler support request and less noise to issue ratio. Right now I suspect your controller code, not the SQL Server Adapter or the connection mode, as I have both tested to the hilt for simple binary data. Assuming the above does not work, you can then moving to examining smaller components.
For instance, what is the data type of the efile column? Do this in the console to find out, Document.columns_hash['efile'] and look at the #sql_type. Is it something suitable like varbinary(max)?
Moving on from there, what connection mode are you using with the SQL Server Adapter, TinyTDS? By default TinyTDS will convert everything to UTF8 as needed and is really smart about things. I have it tested with everything from binary to many different encodings. BTW, if you are using TinyTDS, did you make sure that you compiled FreeTDS with libiconv so it can do all this properly? You can easily check by doing the following in the console tsql -C assuming you have FreeTDS's binaries in your path. This should output a few lines, look for "iconv library: yes". Also make sure you are running 0.91 or better too!
Lastly, a bit of advice, that SET TEXTSIZE is so wrong there. You only want to do that once per connection. See here https://github.com/rails-sqlserver/activerecord-sqlserver-adapter#configure-connection--app-name

Fastercsv shows malformedCSVError, what am i doing wrong?

I am implementing in Ruby and i am running a project which reads a CSV file to add users.
but when i pick my file it just gives always the same error:
FasterCSV::MalformedCSVError in User importController#match
Illegal quoting on line 1.
my CSV file just exists of :
"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0
anyone who knows what can be wrong?
Try to upgrade your FasterCSV gem version. With the latest version it works:
FasterCSV.parse_line '"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0'
=> ["RubenPersoon1", "test", "Bauwens", "Ruben", "rub#gmail.com", "0"]
ruby-1.8.7-p352 :005 > FasterCSV.parse '"RubenPersoon1","test","Bauwens","Ruben","rub#gmail.com",0'
=> [["RubenPersoon1", "test", "Bauwens", "Ruben", "rub#gmail.com", "0"]]
Also, keep in mind that if you are on Ruby 1.9.2, FasterCSV is already included. Just require 'csv' and use the CSV class.

How do I create a checkmark in unicode?

I would think it would be:
"✓".encode(:unicode)
But I think this is not the correct usage of .encode. And when I say:
"✓".encode('Unicode')
it cannot do the conversion.
If you're using Ruby 1.9 (which has much better built-in encoding support), you can do this:
> checkmark = "\u2713"
# => "✓"
> checkmark.encoding
# => #<Encoding:UTF-8>

Resources