convert jruby 1.8 string to windows encoding? - ruby-on-rails

I want to export some data from my JRuby on Rails webapp to Excel, so I create a CSV string and send it as a download to the client using
send_data(text, :filename => "file.csv", :type => "text/csv; charset=CP1252", :encoding => "CP1252")
The file seems to be in UTF-8, which Excel cannot read correctly. I googled the problem and found that iconv can convert encodings. I tried to do that with:
ic = Iconv.new('CP1252', 'UTF-8')
text = ic.iconv(text)
but when I send the converted text, it does not make any difference: it is still UTF-8 and Excel cannot read the special characters. There are several solutions using iconv, so this seems to work for others. When I convert the file manually with iconv on the Linux shell, it works.
What am I doing wrong? Is there a better way?
I'm using:
- jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) Client VM 1.6.0_19) [i386-java]
- Debian Lenny
- Glassfish app server
- Iceweasel 3.0.6
Edit:
Do I have to include some gem to use iconv?
Solution:
S.Mark pointed out this solution:
You have to use UTF-16LE encoding to make Excel understand it, like this:
text = Iconv.iconv('UTF-16LE', 'UTF-8', text)
Thanks, S.Mark, for that answer.
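For reference, a minimal sketch of the full flow built on that answer. Note that Iconv.iconv returns an array, hence the .first call below; the BOM prepend and the charset in the content type are my additions, not part of the original answer:
require 'iconv'

utf16 = Iconv.iconv('UTF-16LE', 'UTF-8', text).first
# Excel detects UTF-16LE more reliably when the file starts with a byte order mark:
utf16 = "\xFF\xFE" + utf16
send_data(utf16, :filename => "file.csv", :type => "text/csv; charset=UTF-16LE")
One caveat worth knowing: Excel tends to treat UTF-16 text files as tab-delimited, so you may need to separate fields with tabs instead of commas.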

In my experience, Excel cannot handle UTF-8 CSV files properly. Try UTF-16 instead.
Note: Excel's Import Text Wizard appears to work with UTF-8 too.
Edit: A search on Stack Overflow gave me this page; please take a look at it.
According to that, adding a BOM (byte order mark) signature to the CSV will make Excel pop up the Text Import Wizard, so you could use that as a workaround.
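A sketch of that workaround, keeping the data in UTF-8 (the three bytes below are the UTF-8 BOM; this is my illustration, not code from the linked page):
# Prepend the UTF-8 byte order mark so Excel notices the encoding:
text_with_bom = "\xEF\xBB\xBF" + text
send_data(text_with_bom, :filename => "file.csv", :type => "text/csv; charset=UTF-8")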

Do you get the same result with the following?
cp1252 = Iconv.conv("CP1252", "UTF8", text)

Related

How to correctly handle character encoding when using Postgresql's copy_data function?

In my Rails app, I managed to stream large CSV files directly from Postgres, based on solutions mentioned in this SO post. My working code looks something like this:
query = <A Long SQL Query String>
response.headers["Cache-Control"] = "no-cache"
response.headers["Content-Type"] = "text/csv; charset=utf-8"
response.headers["Content-Disposition"] = %(attachment; filename="#{csv_filename}")
response.headers["Last-Modified"] = Time.now.ctime.to_s
conn = ActiveRecord::Base.connection.raw_connection
conn.copy_data("COPY (#{query}) TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');") do
  while row = conn.get_copy_data
    response.stream.write row
  end
end
response.stream.close
Some of the columns (VARCHAR) being queried have values as either English or Chinese strings. The CSV file resulting from the above code doesn’t show the Chinese characters as is. Instead, I get something like this:
大大 文文
Am I supposed to change the way I'm using the copy_data function, or is there something I could do to the CSV file to solve this? I've tried saving the file as a UTF-8 .txt file, as well as the convert_to function mentioned in the copy_data documentation, but to no avail.
This depends on the original encoding of the CSV file.
Do this on Linux:
file -i your_file
Are you sure it's not UTF-16 or GB 18030?
Also, what encoding is your database set up with?
Do a \l in psql to see this.
So it boiled down to my MS Excel not being able to render the Chinese characters correctly. On macOS, opening the same .csv file with the Numbers app (or even Atom, for that matter) resolved the issue for me.
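If Excel is the only consumer that struggles with the UTF-8 output, one workaround consistent with the BOM note in the first thread is to write a UTF-8 byte order mark into the stream before the first row. A sketch against the code above (mine, not from the original answers; COPY options trimmed for brevity):
conn.copy_data("COPY (#{query}) TO STDOUT WITH (FORMAT CSV, HEADER TRUE);") do
  response.stream.write "\xEF\xBB\xBF"  # UTF-8 BOM so Excel detects the encoding
  while row = conn.get_copy_data
    response.stream.write row
  end
end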

Rails CSV file upload character encoding issues

I know there are a lot of threads about this already, but none of the suggested solutions seem to work for me for some reason...
I am using:
Ruby 1.9.2
Rails 2.3.8
My users author CSV files in MS Excel and then need to upload these files to the web application. My web application and the database backend use UTF-8, and all special characters, such as the £ sign, get corrupted on upload.
I am reading in the file like this:
@file = params[:import_file][:uploaded_data]
Then I get the encoding of the file using:
source_encoding = "UTF-8"
if @file.external_encoding
  source_encoding = @file.external_encoding.name
end
For my test file the source encoding value is ASCII-8BIT.
Then I try to do:
@file.each { |line|
  print "#{line.force_encoding(source_encoding).encode!("UTF-8")}\n"
}
in order to see if all the text displays OK. However, this gives me an error like this:
"\xA3" from ASCII-8BIT to UTF-8
If I try to read the CSV with:
dataArray = CSV.read(@file, encoding: source_encoding)
No errors this time, but all special characters come out as ? characters.
Any pointers on where I might be going wrong, or is importing a CSV file authored with MS Excel just mission impossible?
Regards,
Olli
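For what it's worth, \xA3 is the £ sign in Windows-1252 (and Latin-1), which is the encoding Excel on Windows typically saves CSV in. A minimal sketch of transcoding on read, assuming the upload really is Windows-1252 (that assumption is mine, not established by the question):
require 'csv'

raw = @file.read
# Tag the raw bytes with their actual encoding, then transcode to UTF-8:
utf8 = raw.force_encoding("Windows-1252").encode("UTF-8")
dataArray = CSV.parse(utf8)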

Rails ActiveRecord invalid byte sequence in UTF-8 issue

I am using MSSQL 2005.
I have StringIO object that contains my zip file content.
Here is how I obtain zip binary data:
stringio = Zip::ZipOutputStream::write_buffer do |zio|
  Eclaim.find_by_sql("SET TEXTSIZE 67108864")
  zio.put_next_entry("application.xml")
  # zio.write @claim_db[:xml]
  biblio = Nokogiri::XML('<?xml version="1.0" encoding="utf-8"?>' + @claim_db[:xml], &:noblanks)
  zio.write biblio.to_xml
  builder = Nokogiri::XML::Builder.new(:encoding => 'utf-8') do |xml|
    xml.documents {
      docs.where("ext not in (#{PROHIBITED_EXTS.collect{|v| "'#{v}'"}.join(', ')})").each{ |doc|
        zio.put_next_entry("#{doc[:materialtitle_id]}.#{doc[:ext]}")
        zio.write doc[:efile]
        xml.document(:id => doc[:materialtitle_id]) {
          xml.title doc[:title]
          xml.code doc[:code]
          xml.filename "#{doc[:materialtitle_id]}.#{doc[:ext]}"
          xml.extname doc[:ext]
        }
      }
    }
  end
  zio.put_next_entry("docs.xml")
  zio.write builder.to_xml
end
stringio
In my controller I try:
data.rewind
@claim.docs.create(
  :title => 'Some file',
  :ext => 'zip',
  :size => data.length,
  :receive_date => Time.now,
  :efile => data.sysread
)
But Rails complains: invalid byte sequence in UTF-8.
Please help me with this.
If the stream is configured as a UTF-8 stream, you can't write compressed binary into it (compressed data may contain any byte value).
I think setting the data to a binary stream before writing:
data.force_encoding "ASCII-8BIT"
might help.
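A sketch of that suggestion applied to the controller code above (my arrangement; data.read in place of sysread is my substitution, and it assumes efile maps to a binary column):
data.rewind
binary = data.read.force_encoding("ASCII-8BIT")  # a.k.a. Encoding::BINARY
@claim.docs.create(
  :title => 'Some file',
  :ext => 'zip',
  :size => binary.bytesize,
  :receive_date => Time.now,
  :efile => binary
)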
What is complaining: Ruby, ActiveRecord, or SQL Server? My guess is SQL Server. Make sure the data type of the efile field in the database is a binary BLOB.
You should break your problem down into constituent parts and troubleshoot smaller units. Remove complexity till you get to the source of the issue. For instance, have you tried just doing a simple Document.create with the attributes in, say, the console, to remove the possibility that your controller code may be buggy? Something like Document.create :efile => File.read('sometiny.zip'), and just go from there.
Whether that works or breaks, you have a much simpler support request and a lower noise-to-issue ratio. Right now I suspect your controller code, not the SQL Server adapter or the connection mode, as I have tested both to the hilt with simple binary data. Assuming the above does not work, you can then move on to examining smaller components.
For instance, what is the data type of the efile column? Do this in the console to find out: Document.columns_hash['efile'], and look at the #sql_type. Is it something suitable like varbinary(max)?
Moving on from there, what connection mode are you using with the SQL Server adapter, TinyTDS? By default TinyTDS converts everything to UTF-8 as needed and is really smart about things. I have tested it with everything from binary data to many different encodings. BTW, if you are using TinyTDS, did you make sure you compiled FreeTDS with libiconv so it can do all this properly? You can easily check by running tsql -C in the console, assuming FreeTDS's binaries are in your path. This should output a few lines; look for "iconv library: yes". Also make sure you are running 0.91 or better!
Lastly, a bit of advice: that SET TEXTSIZE is wrong there. You only want to do that once per connection. See https://github.com/rails-sqlserver/activerecord-sqlserver-adapter#configure-connection--app-name

Character Encoding issue in Rails v3/Ruby 1.9.2

I sometimes get the error "invalid byte sequence in UTF-8" when I read contents from a file. Note: this only happens when there are certain special characters in the string. I have tried opening the file without "r:UTF-8", but I still get the same error.
open(file, "r:UTF-8").each_line { |line| puts line.strip } # line.strip generates the error
Contents of the file:
# encoding: UTF-8
290919,"SE","26","Sk‰l","",59.4500,17.9500,, # this errors out
290956,"CZ","45","HornÌ Bradlo","",49.8000,15.7500,, # this errors out
290958,"NO","02","Svaland","",58.4000,8.0500,, # this works
This is a CSV file I got from an outside source and am trying to import into my DB. It did not come with "# encoding: UTF-8" at the top; I added that since I read somewhere it would fix this problem, but it did not. :(
Environment:
Rails v3.0.3
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.5.0]
Ruby has a notion of an external encoding and an internal encoding for each file. This allows you to work with a file in UTF-8 in your source, even when the file is stored in a more esoteric format. If your default external encoding is UTF-8 (which it is if you're on Mac OS X), all of your file I/O is going to be in UTF-8 as well. You can check this using File.open('file').external_encoding. What you're doing when you open your file and pass "r:UTF-8" is forcing the same external encoding that Ruby is using by default.
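A quick way to inspect both in irb (the file name is a placeholder; Encoding.default_external is the Ruby 1.9 API for the default):
File.open('file').external_encoding  # => #<Encoding:UTF-8> on a stock Mac OS X setup
Encoding.default_external            # => #<Encoding:UTF-8>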
Chances are, your source document isn't in UTF-8 and those non-ASCII characters aren't mapping cleanly to UTF-8 (if they mapped cleanly, you would get the correct characters and no error; if they mapped incorrectly, you would get incorrect characters and no error). What you should do is try to determine the encoding of the source document, then have Ruby transcode the document on read, like so:
File.open(file, "r:windows-1251:utf-8").each_line { |line| puts line.strip }
If you need help determining the encoding of the source, give this Python library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).
If you want to change your file's encoding, you can use the charlock_holmes gem:
https://github.com/brianmario/charlock_holmes
require 'charlock_holmes/string'
content = File.read('test2.txt')
if !content.is_utf8?
  detection = CharlockHolmes::EncodingDetector.detect(content)
  utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
end
Then you can save the new content to a temp file and overwrite your original file.
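For example, a minimal sketch of that last step (the file name comes from the snippet above; the .tmp path is my placeholder):
File.open('test2.txt.tmp', 'w:UTF-8') { |f| f.write(utf8_encoded_content) }
File.rename('test2.txt.tmp', 'test2.txt')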
Hope this helps.

Iconv.conv in Rails application to convert from unicode to ASCII//translit

We wanted to convert a Unicode string in the Slovak language into plain ASCII (without accents/carons), that is: č->c, š->s, á->a, é->e, etc.
We tried:
cstr = Iconv.conv('us-ascii//translit', 'utf-8', a_unicode_string)
It was working on one system (Mac) and not working on the other (Ubuntu), where it produced '?' for accented characters after conversion.
Problem: iconv was using the LANG/LC_ALL variables. I do not know why, when the encodings are known, but well... You had to set the locale variables to something.utf8, for example sk_SK.utf8 or en_GB.utf8.
The next step was to try setting ENV['LANG'] and ENV['LC_ALL'] in config/application.rb. This was ignored by Iconv in Ruby.
Another try was the global system setting in /etc/default/locale; this worked on the command line, but not for the Rails application. Reason: Apache has its own environment. Therefore the final solution was to add the LANG/LC_ALL variables to /etc/apache2/envvars:
export LC_ALL="en_GB.utf8"
export LANG="en_GB.utf8"
export LANGUAGE="en_GB.utf8"
Restarted Apache and it worked.
This is more of a little how-to than a question. However, if someone has a better solution, I would like to know about it.
You can try the unaccent approach instead.
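If you would rather sidestep iconv's locale dependence entirely, ActiveSupport ships a pure-Ruby transliteration helper. A minimal sketch (my suggestion, not from the original post; its character table is less exhaustive than iconv's //translit):
require 'active_support/inflector'

# Maps common accented Latin characters to plain ASCII, ignoring LANG/LC_ALL:
ActiveSupport::Inflector.transliterate('čšáé')  # => "csae"
Because it never consults the process locale, it behaves the same under Apache as it does on the command line.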
