Rails 3, Heroku: Taps Server Error: PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xba - ruby-on-rails

I have a Rails 3.0.9 application running both locally in my dev environment and remotely on a Heroku app. I have a method that imports a CSV file into a model, and this file can contain non-English characters like °, á, é, í, etc. (it's in Spanish).
I am currently able to import the complete file (75k records) without any problems into my local dev (SQLite) database; but when uploading the db to Heroku with heroku db:push, it fails with the error in the title:
!!! Caught Server Exception
HTTP CODE: 500
Taps Server Error: PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xba
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
Apparently, Heroku has issues inserting the '°' character. (At the moment the file doesn't have any á, é, í, etc. characters, but I suspect those might fail too.)
I have set in my application.rb file the default encoding, as follows:
#.../application.rb
config.encoding = "utf-8"
What else can I do to set the 'client encoding' and solve this problem?

The masculine ordinal indicator, º, is 0xBA in ISO-8859-1, not in UTF-8. So your CSV file is encoded as Latin-1, but you're trying to store it in your database as UTF-8 without converting it first.
You can try telling your CSV library that it is dealing with Latin-1 encoded text, and it may take care of converting to UTF-8 for you. If that doesn't work, you can do the conversion yourself with Iconv (require 'iconv' first):
ruby-1.9.2 > Iconv.iconv('UTF-8', 'ISO-8859-1', "\xba")
=> ["º"]
ruby-1.9.2 > Iconv.iconv('UTF-8', 'ISO-8859-1', "\xb0")
=> ["°"]
You're not having trouble with SQLite because SQLite tends to be very forgiving and has a very loose type system. PostgreSQL, on the other hand, tends to be rather strict and properly complains if you try to feed it invalid data. I'd recommend that you stop developing on top of SQLite if you're going to deploy to Heroku and PostgreSQL; there are other differences that will cause problems (the behavior of GROUP BY and LIKE, for example).
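For instance, LIKE is case-insensitive in SQLite but case-sensitive in PostgreSQL, so the same query can match different rows on the two databases (model and column are hypothetical):
# Matches "Garcia" on SQLite, but not on PostgreSQL:
User.where("name LIKE ?", "%garcia%")
# Portable, explicitly case-insensitive version:
User.where("LOWER(name) LIKE ?", "%garcia%")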

Related

Neo4j to Gephi import : Failed to invoke procedure Invalid UTF-8

I tried to import my data from Neo4j into Gephi, but it doesn't work. Neo4j reports the following error:
Failed to invoke procedure apoc.gephi.add: Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xfb at [Source: (apoc.export.util.CountingInputStream); line: 1, column: 136]
As previously mentioned, it looks like Neo4j is not exporting its data as UTF-8, so I would check how Neo4j generates the output.
Another possibility is that something went slightly wrong while Neo4j was writing the output.
I ran into this very problem in the past when managing content in a file concurrently: a thread crashed and did not close the file cleanly. When reviewing the file, everything looked perfectly normal, but some characters had been introduced that are not valid UTF-8. A tool like Atom can help you spot them.
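If you'd rather locate the offending bytes with a script than an editor, here is a minimal Ruby sketch (the export file name is a hypothetical placeholder):
# Print every line of the file that is not valid UTF-8.
File.open("export.json", "rb") do |f|
  f.each_line.with_index(1) do |line, lineno|
    puts "line #{lineno}: #{line.inspect}" unless line.force_encoding(Encoding::UTF_8).valid_encoding?
  end
end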

Woodstox parser works fine in test run in Eclipse, but fails from command line

One of my JUnit tests uses (behind the scenes) the Woodstox parser.
When I run the test from within Eclipse, the test succeeds as expected.
But running the same test on the command line, using
mvn clean test -Dtest=com.example.MyClassTest#someParserTest
causes the test to fail with the following exception messages:
Error on line 114 column 21
SXXP0003: Error reported by XML parser: Invalid UTF-8 middle byte 0x3f (at char #4174, byte #3999)
...
at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:55)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:961)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4580)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1063)
at com.ctc.wstx.sax.WstxSAXParser.fireEvents(WstxSAXParser.java:524)
at com.ctc.wstx.sax.WstxSAXParser.parse(WstxSAXParser.java:452)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
at net.sf.saxon.event.Sender.send(Sender.java:171)
at net.sf.saxon.jaxp.IdentityTransformer.transform(IdentityTransformer.java:363)
I took a look at the to-be-parsed InputStream. The InputStreams are identical in both cases.
Also, there is no "line 114 column 21" in the InputStream. Line 114 ends on column 11.
How can I investigate what causes the different behavior?
It turned out that a library I used made wrong assumptions about the environment's default character encoding (also called the platform's default charset).
In the Eclipse environment, calling Charset.defaultCharset() returned UTF-8, while in the command line environment it returned CP1252.
Many standard and third-party Java APIs behave differently depending on the platform's default charset, among them:
String.getBytes()
ByteArrayOutputStream.toString()
XMLOutputFactory.createXMLStreamWriter(OutputStream stream)
IOUtils.toString(InputStream input)
To resolve my issue, I had to update that library to explicitly use the correct character set:
String.getBytes(StandardCharsets.UTF_8)
ByteArrayOutputStream.toString(StandardCharsets.UTF_8.name())
XMLOutputFactory.createXMLStreamWriter(OutputStream stream, StandardCharsets.UTF_8.name())
IOUtils.toString(InputStream input, StandardCharsets.UTF_8)

Rails 3 Deploying to Heroku syntax error, unexpected $end

After successfully testing my "alpha" Rails 3 app in my local dev environment, I pushed it up to Heroku (Cedar) for live testing. The push was successful, but the app crashes upon startup with the following errors:
: => Booting WEBrick
: => Ctrl-C to shutdown server
: /app/vendor/bundle/ruby/1.9.1/gems/activesupport-3.2.3/lib/active_support/dependencies.rb:251:in `require': /app/app/controllers/dives_controller.rb:50: invalid multibyte char (US-ASCII) (SyntaxError)
: /app/app/controllers/dives_controller.rb:50: syntax error, unexpected $end
: Exiting
I have checked for unexpected characters and missing end statements, but cannot seem to find any. I am not using any multilingual characters (to my knowledge).
Here are a few of my files including: Gemfile, Gemfile.lock, database.yml, dives_controller.rb
https://gist.github.com/2632041
Could this possibly have something to do with using Postgres and not specifying it correctly in my database.yml?
If you look at lines 50 and 51 of dives_controller.rb, you will notice some weird white-space-like characters before the code (they are highlighted in the GitHub output). I have a feeling those are the characters causing the problem.
They may have crept in from pressing some random keys on your keyboard by mistake. Just remove them and replace them with a space.
I'm not exactly sure why this works, but I removed the following lines in my dives_controller.rb and the app now deploys correctly:
@user = User.where(:facebook_id => current_user.identifier)
@dive = @user.dives.create(params[:dive])
@dive = Dive.new(params[:dive])
The spaces highlighted in red in the Gist file are non-breaking spaces. A nightmare for developers.
You can ask your IDE or text editor to show them in a different character.
For example, set listchars=trail:◃,nbsp:• tells Vim to display • for non-breaking spaces and ◃ for trailing spaces.
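You can also strip them in one pass with a short Ruby script; a sketch, using the controller path from the question:
path = "app/controllers/dives_controller.rb"
src = File.read(path, :encoding => "UTF-8")
# Replace non-breaking spaces (U+00A0) with ordinary spaces and save the file.
File.open(path, "w") { |f| f.write(src.gsub("\u00A0", " ")) }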

PGError: ERROR: invalid byte sequence for encoding "UTF8

I'm getting the following PGError while ingesting Rails emails from Cloudmailin:
PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xbb HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". : INSERT INTO "comments" ("content") VALUES ('Reply with blah blah ����������������������������������������������������� .....
So it seems pretty clear I have some invalid UTF-8 characters getting into the email, right? I tried to clean that up, but something is still sneaking through. Here's what I have so far:
message_all_clean = params[:message]
Iconv.conv('UTF-8//IGNORE', 'UTF-8', message_all_clean)
message_plain_clean = params[:plain]
Iconv.conv('UTF-8//IGNORE', 'UTF-8', message_plain_clean)
@incoming_mail = IncomingMail.create(:message_all => Base64.encode64(message_all_clean), :message_plain => Base64.encode64(message_plain_clean))
Any ideas, thoughts or suggestions? Thanks
When encountering this issue on Heroku, we sanitized incoming data (e.g. text pasted from Word) by converting it from US-ASCII and ignoring invalid characters:
Iconv.conv("UTF-8//IGNORE", "US-ASCII", content)
With this, we had no more issues with character encoding.
Also, double check that there's no other fields that need the same conversion, as it could affect anything that's passing a block of text to the database.
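On Ruby 1.9+, where Iconv is deprecated, String#encode can do a similar sanitizing pass; a sketch, with content standing in for the raw email body:
# Treat the input as raw bytes and drop anything that does not survive
# conversion to UTF-8.
clean = content.encode("UTF-8", "binary", :invalid => :replace, :undef => :replace, :replace => "")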

Rails + Ruby 1.9 "invalid byte sequence in US-ASCII"

After upgrading to ruby 1.9 we began to notice pages failing to render from the rails template renderer when a user used a non-ASCII character. Specifically "é". I was able to resolve this issue on one of our staging servers, but I have not been able to reproduce the fix on our production server.
The fix that seemed to work the first time:
Converted the database from latin1 to utf8 using the convert_charset tool available here: http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/ (including setting default_character_set=utf8 in my.cnf and running SET GLOBAL character_set_server=utf8)
Switched to the sam-mysql-ruby adapter (instead of the standard mysql adapter: http://gemcutter.org/gems/sam-mysql-ruby)
Restarted rails
The error is:
"invalid byte sequence in US-ASCII"
Oddly, after following the steps above the error has not changed on our production server. Setting encoding: utf8 in database.yml does not change the error either.
The error is raised on the following line of code:
<%= link_to h(question.title), question_path(question) %>
This blog seems to suggest a fix, but it mentions that this should not be a problem in 1.9: http://www.igvita.com/2007/04/11/secure-utf-8-input-in-rails/ (and it's over 2 years old).
I imagine this problem might soon affect a lot of people as more Rails developers switch to 1.9.
I found the solution:
The problem is:
Fetching data from any database (MySQL, PostgreSQL, SQLite 2 & 3), all configured to have UTF-8 as its character set, returns the data as ASCII-8BIT in Ruby 1.9.1 and Rails 2.3.2.1.
(Taken from: https://rails.lighthouseapp.com/projects/8994/tickets/2476)
My attempt to use the patched mysql adapter likely failed because my database was not configured to natively use utf8, so the patched adapter failed to work properly.
The fix ended up being to use the patch file available here: http://gnuu.org/2009/11/06/ruby19-rails-mysql-utf8/
require 'mysql'

# Monkey-patch the mysql gem so every string coming back from the database
# is tagged as UTF-8 instead of ASCII-8BIT.
class Mysql::Result
  def encode(value, encoding = "utf-8")
    String === value ? value.force_encoding(encoding) : value
  end

  # Wrap #each so each column value is re-tagged before being yielded.
  def each_utf8(&block)
    each_orig do |row|
      yield row.map { |col| encode(col) }
    end
  end
  alias each_orig each
  alias each each_utf8

  # Same treatment for #each_hash.
  def each_hash_utf8(&block)
    each_hash_orig do |row|
      row.each { |k, v| row[k] = encode(v) }
      yield(row)
    end
  end
  alias each_hash_orig each_hash
  alias each_hash each_hash_utf8
end
(Placed in lib/mysql_utf8fix.rb and required in environment.rb using require 'lib/mysql_utf8fix.rb'.)
In Rails 2.3.11 it is just require 'mysql_utf8fix.rb'.
Please use the mysql2 (gem) adapter instead of the mysql adapter in database.yml, remove the mysql patches (if any exist), and add the following lines to environment.rb:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
Then run it under Apache and Passenger and it will work fine.
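For reference, a minimal database.yml entry along these lines (database name and credentials are placeholders):
production:
  adapter: mysql2
  encoding: utf8
  database: myapp_production
  username: myapp
  password: secret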
