invalid byte sequence in UTF-8 for single quote in Ruby - ruby-on-rails

I'm using the following code to show the description in a template:
json.description resource.description if resource.description.present?
It raises an "invalid byte sequence in UTF-8" error. I dug into this a bit and found that the description contains a typographic single quote (’) instead of a plain '. What is the best way to fix this encoding issue? The description is entered by users, so I have no control over it. Another odd thing: I have multiple test environments, all on the same Ruby and Rails versions and running the same code, but only one of them shows this error.

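def to_utf8(str)
  # Tag the bytes as UTF-8 and return them unchanged if they are valid.
  str = str.force_encoding("UTF-8")
  return str if str.valid_encoding?
  # Otherwise treat the data as raw bytes; encode then replaces every non-ASCII byte.
  str = str.force_encoding("BINARY")
  str.encode("UTF-8", invalid: :replace, undef: :replace)
end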
ref: https://stackoverflow.com/a/17028706/455770
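The helper above keeps the request from failing, but it turns the ’ into a replacement character. If the offending bytes are actually Windows-1252 (an assumption; the question does not say where the text comes from), transcoding from that encoding would preserve the quote. A minimal sketch, where `cp1252_to_utf8` is a hypothetical name:

```ruby
# Hypothetical helper (not from the original answer).
# Assumption: input that is not valid UTF-8 was produced as Windows-1252.
def cp1252_to_utf8(str)
  utf8 = str.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  # Reinterpret the same bytes as Windows-1252, then transcode to UTF-8.
  str.dup.force_encoding("Windows-1252")
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

raw = "It\x92s".b # byte 0x92 is the right single quote in Windows-1252
puts cp1252_to_utf8(raw)
```

Working on a `dup` leaves the caller's string untouched, which `force_encoding` alone would not.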

Related

neo4j-admin import "Multi-line fields are illegal"

I'm getting the following error in Neo4j Community 4.1.2 using the neo4j-admin import tool.
Caused by:ERROR in input
data source: BufferedCharSeeker[source:/home/ubuntu/workspace/neo4j-community-4.1.2/bin/../import/nodes.csv, position:24455, line:359]
in field: code:string:6
for header: [id:ID, labels:LABEL, type:string, flags:string, lineno:string, code:string, childnum:string, funcid:string, classname:string, namespace:string, endlineno:string, name:string, doccomment:string]
raw field value: 402
original error: At /home/ubuntu/workspace/neo4j-community-4.1.2/bin/../import/nodes.csv # position 24455 - Multi-line fields are illegal in this context and so this might suggest that there's a field with a start quote, but a missing end quote. See /home/ubuntu/workspace/neo4j-community-4.1.2/bin/../import/nodes.csv # position 24455.
I checked every byte with hexedit: line 359, character 24455, and the neighbouring lines 358 and 360.
357,AST,string,,34,"/load.php",1,310,,"",,,
358,AST,AST_CALL,,37,,9,310,,"",,,
359,AST,AST_NAME,NAME_NOT_FQ,37,,0,310,,"",,,
360,AST,string,,37,"wp_check_php_mysql_versions",0,310,,"",,,
361,AST,AST_ARG_LIST,,37,,1,310,,"",,,
362,AST,AST_INCLUDE_OR_EVAL,EXEC_REQUIRE,40,,10,310,,"",,,
This is the absurd situation:
no multi-line fields are present
no special characters are present
no extra 0A bytes
no "start quote" without its matching "end quote"
I found some issues on GitHub, but they refer to old versions of Neo4j. What could be the reason?
I finally found the line causing the exception.
The cause reported by the exception was correct, but the line number was completely wrong.
I located the real line by adding the --multiline-fields=true flag to the neo4j-admin import command.
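Since the reported line number can be misleading, a quick pre-check of the CSV may help narrow things down before running the import. The sketch below is only a heuristic and is not part of Neo4j: `lines_with_unbalanced_quotes` is a hypothetical helper that flags lines with an odd number of double quotes, which is what a missing end quote looks like. A legitimate multi-line field would also be flagged.

```ruby
require "stringio"

# Hypothetical helper: return the 1-based numbers of lines that contain
# an odd number of double quotes (a likely unterminated quoted field).
def lines_with_unbalanced_quotes(io)
  io.each_line.with_index(1)
    .select { |line, _number| line.count('"').odd? }
    .map { |_line, number| number }
end

# Made-up sample rows; the second one is missing its end quote.
sample = StringIO.new(%(1,AST,"ok"\n2,AST,"broken\n3,AST,"a""b"\n))
p lines_with_unbalanced_quotes(sample)
```

For a real file, `File.open("nodes.csv") { |f| lines_with_unbalanced_quotes(f) }` would do the same scan.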

Delete special characters in malformed text

I am encountering some malformed text, and can't seem to find a generalized way to remove the special characters.
This is the text as seen on the website: Technological�\x00 Sciences. Calling String#force_encoding('UTF-8') gives Technological\u0000 Sciences, which still causes Nokogiri to terminate early.
I could do a quick and dirty gsub, "Technological\u0000 Sciences".gsub(/\u0000/,''), but is there a more general solution, or a setting in Nokogiri or Ruby that would also work?
You can try this:
"Technological�\x00 Sciences".gsub(/[^[:alnum:][:space:][:punct:]]/, '')
You could do:
[29] pry(main)> str
=> "Technological�\u0000 Sciences"
[30] pry(main)> str.scan(/[a-zA-Z]{2,}/).join(' ')
=> "Technological Sciences"
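Both suggestions above also discard legitimate characters (the first drops symbols, the second drops digits and accented letters). A somewhat more general sketch, assuming Ruby 2.1 or later for `String#scrub`, removes only invalid byte sequences, replacement characters, and non-whitespace control characters. `strip_unprintable` is a hypothetical name, and this does not touch any Nokogiri setting:

```ruby
# Hypothetical helper: keep the text, drop only what cannot be displayed.
def strip_unprintable(str)
  str.dup.force_encoding("UTF-8")
     .scrub("")                          # remove bytes that are not valid UTF-8
     .delete("\uFFFD")                   # remove replacement characters already present
     .gsub(/[[:cntrl:]&&[^\t\n\r]]/, "") # remove control characters except tab and newlines
end

puts strip_unprintable("Technological\uFFFD\u0000 Sciences")
```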

error string variable in delphi

Hi, I have the following code:
delete := 'testing # {}' testing ';
The problem is that using ' inside the string fails, and I don't know how to avoid this error. In other languages such as Perl this is solved by writing \', but that does not work in Delphi.
Could someone help me build the string without errors?
Assuming that your problem is trying to put a quote character inside a string literal, then try this:
delete := 'testing # {}'' testing ';

PGError: ERROR: invalid byte sequence for encoding "UTF8"

I'm getting the following PGError while ingesting Rails emails from Cloudmailin:
PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xbb HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". : INSERT INTO "comments" ("content") VALUES ('Reply with blah blah ����������������������������������������������������� .....
So it seems pretty clear that some invalid UTF-8 characters are getting into the email, right? I tried to clean that up, but something is still sneaking through. Here's what I have so far:
message_all_clean = params[:message]
Iconv.conv('UTF-8//IGNORE', 'UTF-8', message_all_clean)
message_plain_clean = params[:plain]
Iconv.conv('UTF-8//IGNORE', 'UTF-8', message_plain_clean)
#incoming_mail = IncomingMail.create(:message_all => Base64.encode64(message_all_clean), :message_plain => Base64.encode64(message_plain_clean))
Any ideas, thoughts or suggestions? Thanks
When encountering this issue on Heroku, we converted to US-ASCII to sanitize incoming data appropriately (i.e. pasted from Word):
Iconv.conv("UTF-8//IGNORE", "US-ASCII", content)
With this, we had no more issues with character encoding.
Also, double-check that there are no other fields that need the same conversion, as this could affect anything that passes a block of text to the database.
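Two things are worth noting about the snippets above. Iconv.conv returns the converted string rather than modifying its argument, so its result has to be assigned to take effect. Also, Iconv is no longer part of the standard library as of Ruby 2.0. A rough equivalent on Ruby 2.1 or later, assuming the goal is simply to drop invalid bytes, could use `String#scrub`; `sanitize_utf8` is a hypothetical helper name:

```ruby
# Hypothetical helper: return a valid UTF-8 copy of the text with
# invalid byte sequences removed (similar in spirit to //IGNORE).
def sanitize_utf8(text)
  text.to_s.dup.force_encoding("UTF-8").scrub("")
end

# Usage sketch with made-up input; the result must be assigned.
message_plain_clean = sanitize_utf8("Reply with blah \xBB blah".b)
puts message_plain_clean.valid_encoding?
```

Unlike converting from US-ASCII, this keeps valid non-ASCII characters such as accented letters.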

invalid multibyte escape after upgrade to rails 3 and ruby 1.9.2 -- dtext = '[^\\x80]'

I am upgrading my app from Rails 2 to 3, and when I require a file that contains an email address validator I get an 'invalid multibyte escape' error with:
dtext = '[^\\x80]'
pattern = /\A#{dtext}\z/
Any thoughts?
Try using:
pattern = Regexp.new("\\A#{dtext}\\z", nil, 'n')
Check out details on encodings and regexp for more.
And I use and recommend this awesome article on encodings in Ruby.
Modify the rfc822.rb file and change the addr_spec line to the following:
addr_spec = Regexp.new("#{local_part}\\x40#{domain}", nil, 'n')
That should resolve the issue. I got the solution from another gem, see https://github.com/saepia/rfc822/blob/master/lib/rfc822.rb
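The 'n' argument marks the regexp as binary (no encoding), which is why a single-byte escape such as \x80 is accepted. As far as I know, newer Ruby releases deprecate the three-argument form of Regexp.new, so a sketch that should behave the same way passes the Regexp::NOENCODING option instead. The pattern below is a simplified stand-in, not the actual rfc822.rb expression:

```ruby
# Simplified stand-in for the validator's character class.
dtext = '[^\\x80]'

# Regexp::NOENCODING is the option behind the /n flag.
pattern = Regexp.new("\\A#{dtext}\\z", Regexp::NOENCODING)

puts pattern.match?("a")       # a single ASCII character
puts pattern.match?("\x80".b)  # the excluded byte, as a binary string
```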
