I recently moved an application from Ubuntu to a Red Hat server and noticed a difference when writing a file: \r\n is being written, rather than simply \n.
I am explicitly setting the \n in the data to be written. So, for example
data = "Hello\nWorld"
File.open("#{ Rails.root }/tmp/file.txt", "wb") { |f| f.write(data) }
What is being written is actually "Hello\r\nWorld".
I know Ruby sets the line breaks according to the system it is being run on, but is there a way of forcing it to keep \n whatever the system?
In a double-quoted string Ruby interprets escape sequences, so "\n" is replaced with an actual newline character.
If you want Ruby to keep the characters exactly as typed, that is, a literal backslash followed by n, you have to use single quotes.
Example:
data = 'Hello\nWorld'
File.open("#{ Rails.root }/tmp/file.txt", "wb") { |f| f.write(data) }
The backslash and the n are then written to the file unchanged, not as a line break. :)
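To see what actually lands on disk, it can help to read the raw bytes back. The following is a minimal sketch, assuming a temporary file stands in for tmp/file.txt; the helper name write_binary_and_read_back is made up for illustration. It shows that a double-quoted "\n" written in binary mode is a single line feed byte (10) with no carriage return (13), while the single-quoted version is two separate characters.

```ruby
require "tempfile"

# Hypothetical helper: writes data in binary mode and returns the raw bytes
# that ended up in the file.
def write_binary_and_read_back(data)
  Tempfile.create("newline_demo") do |f|
    f.binmode          # same effect as opening with "wb": no newline translation
    f.write(data)
    f.flush
    File.binread(f.path)
  end
end

double_bytes = write_binary_and_read_back("Hello\nWorld")  # real newline
single_bytes = write_binary_and_read_back('Hello\nWorld')  # backslash + n

p double_bytes.bytes
p single_bytes.bytes
```

If the first byte list contains 13 followed by 10, the \r is already in the data before it reaches File.open, for example because the text came from a form or a file with Windows line endings.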
I'm working on a project with some supplied CSV files that I need to parse and do some manipulation on. One is throwing this error when I try to load it into a file using CSV.read('path/file.csv')
CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 7911).
Now when looking at the file, the last line is just blank. It's a \n character. I feel like this should not break the CSV read, but it does. I could just check the end of the CSV documents and strip any excess carriage returns/newlines, since that seems like it would work, but it doesn't seem like the correct way. Does anybody have some advice?
Edit: Using Ruby 2.0.0 and Rails 4.0.5
I've learnt that you may define a Ruby source file as UTF-8 to be able to type multi-byte characters (e.g. ¤) directly in it instead of their HTML entity (e.g. &curren;):
# encoding: UTF-8
class Price < ActiveRecord::Base
def currency_symbol
'¤'
end
end
Without the encoding statement, I would need to write '&curren;'.html_safe as the body of the method.
I don't like the latter because it assumes I'm writing HTML (I have Excel output in my app on top of HTML).
My question is: are there any problems or performance hits I should be aware of when doing this?
Note: Ruby 2.0 brings UTF-8 as the default encoding; does it mean all Ruby files will automatically support all those characters?
Character chart: http://dev.w3.org/html5/html-author/charref
This is exactly the kind of thing that should go in the locales (config/locales). These are YAML files that define words and characters that will be used in the various parts of your application, including currency symbols. It also has the benefit of allowing you to easily introduce translations for other languages.
Take a look at the Ruby on Rails guide for i18n for more.
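As a rough illustration of the locale idea, the sketch below parses a YAML snippet shaped like a config/locales/en.yml file and looks the symbol up by key. The key name currency.symbol is an assumption chosen for this example, and plain YAML parsing is used here only so the snippet runs without Rails; inside a Rails app the lookup would normally go through I18n.t("currency.symbol") instead.

```ruby
require "yaml"

# Stand-in for the contents of config/locales/en.yml.
# The key "currency.symbol" is hypothetical.
locale_yaml = <<~YAML
  en:
    currency:
      symbol: "\u00A4"
YAML

translations = YAML.safe_load(locale_yaml)

# Roughly what I18n.t("currency.symbol") would resolve for the :en locale.
currency_symbol = translations.dig("en", "currency", "symbol")
```

Keeping the character in the locale file means the model no longer needs an encoding comment or an HTML entity, and the same value can be used for both HTML and Excel output.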
I'm using the DocSplit gem for Ruby 1.9.3 to create Unicode UTF-8 versions of Word documents. To my surprise, while running a test today on a particular piece of one of these documents, I started running into character encoding inconsistencies.
I have tried a number of different methods to resolve the issue, which I will list below, but the best success I've had so far is to remove all non-ASCII characters. This is far from ideal, as I don't think the characters are really going to be all that problematic in the DB.
gsub(/[^[:ascii:]]/, "")
This is a sample of what my output looks like vs. what I'm expecting:
My CODES'S APOSTROPHE
My CODES’S APOSTROPHE
The second apostrophe should look squiggly. If you paste it into irb, you get the following: \U+FFE2
I tried Regexing specifically for this character and it appears to work in Rubular. As soon as I put it in my model however, I got a syntax error.
syntax error, unexpected $end, expecting ')'
raw_title = raw_title.gsub(/’/, "")
I also tried forcing the encoding to UTF-8, but everything is already in UTF-8 and this does not appear to have an effect. I tried forcing the output to US-ASCII, but I get a byte sequence error.
I also tried a few of the encoding options found in Ruby library. These basically did the same thing as the Regex.
This all comes down to the fact that I'm trying to match output for testing purposes. Should I even be concerned about these special characters? Is there a better way to match these characters without blindly removing them?
Try adding:
# encoding: utf-8
at the top of the failing rspec file. This should ensure things like:
raw_title = raw_title.gsub(/’/, "")
in your spec work.
I tried using the above example, but even after that it kept failing. So I used iconv to convert that specific character. This is what I used:
Iconv.conv('ASCII//IGNORE', 'UTF8', text_to_be_converted)
I tried what was given in the following link - How to get rid of non-ascii characters in ruby
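Iconv was deprecated in Ruby 1.9.3 and is no longer part of the standard library from Ruby 2.0 on, so on newer versions the same effect can be had with String#encode. The sketch below is one possible approach, using the sample title from the question: the first variant drops anything that has no ASCII equivalent, the second replaces only the curly apostrophe (U+2019) with a straight one so the text still matches in tests.

```ruby
# Sample title from the question, with a curly apostrophe (U+2019).
text = "My CODES\u2019S APOSTROPHE"

# Drop every character that cannot be represented in US-ASCII.
ascii_only = text.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")

# Alternatively, replace just the curly apostrophe and keep everything else.
normalized = text.gsub("\u2019", "'")
```

Writing the character as "\u2019" instead of pasting it literally also sidesteps the syntax error, since the source file then contains only ASCII.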
So I have a Rails application that upon user submission should generate some kind of a .tex file based on the user input, compile it into a pdf, and deliver the pdf. Through process of elimination, I am pretty positive that everything is working except for one line; the one where pdflatex is called.
Here's the vital snippet of code:
(If it matters, it's located in the Questions controller under the generate action, which is called after the form sends the relevant information. Though this may not be the best way, I'm pretty certain it's not the cause of the error.)
texHeader = 'app\assets\tex\QuestionsFront.txt'
texOut = 'app\assets\tex\Questions.tex'
#copy latex header to new file
FileUtils.cp(texHeader, texOut)
File.open(texOut, 'a+') do |fout|
fout.write("\n")
# a loop writes some more code to fout (it's quite lengthy)
fout.write("\\end{enumerate}\n")
fout.write("\\end{document}")
#The problem line:
puts `pdflatex app/assets/tex/Questions.tex --output-directory=app/assets/tex`
end
filename = 'Questions.pdf'
filelocation = "app\\assets\\tex\\" + filename
File.open(filelocation, 'r') do |file|
send_file file, :filename => filename, :type => "application/pdf", :disposition => "attachment"
end
end
Here's my reasoning: It generates the .tex file correctly, and given a pre-created Questions.pdf file it sends it just fine. When the command in the puts is copied to the terminal, it runs without a hitch (the file begins with \nonstopmode so no worries about small errors). But for some reason, when I run the above script, not even a log file is created with an error.
What am I overlooking? Any ideas? Any way of seeing what output the puts line is creating?
Thanks so much in advance!
Figured out my own problem. The error is pretty interesting. You'll see I'm calling
puts `pdflatex app/assets/tex/Questions.tex --output-directory=app/assets/tex`
within the block
File.open(texOut, 'a+') do |fout|
where just a few lines prior
texOut = 'app\assets\tex\Questions.tex'
Basically, I'm trying to get LaTeX to compile a document while the file is still open. As long as I'm inside the File.open block, the file is open; it is automatically closed at the end of the block.
Cutting and pasting the line of code down below the end of the block made it work just like I wanted. However, for the sake of clarity and the rare case where someone else has this problem, it's actually better to open a separate system shell, navigate to the directory where the LaTeX document is, and do the compiling there. So, my updated code looks like:
fout.write("\\end{document}")
end
system 'runlatex.bat'
where that batch file is as follows:
cd app/assets/tex
pdflatex Questions.tex
That way any additional files in the tex directory are found, the log file is created there, etc.
Reason why I never got a log file? The pdflatex never executed - the OS stopped it with a permissions error before it ever ran.
Hope this helps!
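The same "close first, then compile from inside the tex directory" idea can also be written without a batch file by passing chdir: to system. This is only a sketch: the helper name write_then_compile is invented, and a small Ruby one-liner stands in for pdflatex so the example runs anywhere. In the real controller the command would be something like ["pdflatex", "Questions.tex"].

```ruby
require "tmpdir"
require "rbconfig"

# Hypothetical helper: writes the document, lets the block close the file,
# and only then runs the external command from inside the same directory.
def write_then_compile(dir, filename, body, command)
  path = File.join(dir, filename)
  File.open(path, "w") do |fout|
    fout.write(body)
  end # the file is closed here, before the external program starts

  # chdir: plays the role of "cd app/assets/tex" in the batch file.
  system(*command, chdir: dir)
end

ok = Dir.mktmpdir do |dir|
  # Stand-in for pdflatex: succeeds only if it can see a non-empty Questions.tex.
  stand_in = [RbConfig.ruby, "-e", "exit(File.size('Questions.tex') > 0 ? 0 : 1)"]
  write_then_compile(dir, "Questions.tex", "\\end{document}", stand_in)
end
```

system returns true when the command exits with status 0, so the result can be checked before trying to send the PDF.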
Backticks (and %x{}) provide the same parsing context as a double quoted string. That means that the usual backslashed escape sequences are interpreted inside backticks; in particular, \t is a tab so this:
puts `pdflatex app\assets\tex\Questions.tex --output-directory=app\assets\tex`
will end up with two tabs in the command (and \a is interpreted as well), and that breaks everything. You can start escaping your backslashes (I think you'll need two or three backslashes to get one down to the shell) or switch to normal slashes (which Windows usually accepts in paths):
puts `pdflatex app/assets/tex/Questions.tex --output-directory=app/assets/tex`
Alternatively, you could switch to open3 to avoid the escaping and quoting issues and also get better error handling capabilities.
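Both points can be checked quickly in irb. The sketch below first shows what the escape sequences do to the path in a double-quoted context, then uses Open3.capture2e to run a command and capture its combined stdout and stderr together with the exit status. The helper name run_and_capture is made up, and a Ruby one-liner stands in for pdflatex so the example does not depend on a TeX installation.

```ruby
require "open3"
require "rbconfig"

# Backticks parse like double quotes: \a becomes a bell and \t becomes a tab.
double_quoted = "app\assets\tex"
single_quoted = 'app\assets\tex'                  # backslashes kept literally
portable      = File.join("app", "assets", "tex") # forward slashes

# Hypothetical helper: runs a command without going through shell quoting
# and returns its combined output plus whether it succeeded.
def run_and_capture(*command)
  output, status = Open3.capture2e(*command)
  [output, status.success?]
end

# Stand-in for: run_and_capture("pdflatex", "Questions.tex")
output, succeeded = run_and_capture(RbConfig.ruby, "-e", "puts 'compiled'")
```

Because the command is passed as separate arguments, no shell escaping is involved, and the captured output answers the original question of how to see what the pdflatex call printed.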
I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error occurs because Excel encodes the file in ISO 8859-1 (Latin-1) and not in UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file=File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open read only with the encoding ISO-8859-1.
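A small end-to-end sketch of this, under the assumption that the file really is Latin-1: it writes two Latin-1 encoded rows to a temporary file, then opens it with an external encoding of ISO-8859-1 and asks Ruby to transcode to UTF-8 on read (the "r:ISO-8859-1:UTF-8" form), so that CSV.parse receives valid UTF-8. The sample data is invented for the example.

```ruby
require "csv"
require "tempfile"

# Raw Latin-1 bytes: \xE9 is e-acute and \xE1 is a-acute in ISO-8859-1.
latin1_bytes = "name,city\nJos\xE9,M\xE1laga\n".b

rows = Tempfile.create(["latin1", ".csv"]) do |f|
  f.binmode
  f.write(latin1_bytes)
  f.flush

  # External encoding ISO-8859-1, transcoded to UTF-8 while reading.
  File.open(f.path, "r:ISO-8859-1:UTF-8") do |file|
    CSV.parse(file.read)
  end
end
```

With plain "r:ISO-8859-1" the parse also works, but the resulting strings stay tagged as ISO-8859-1; adding ":UTF-8" converts them so they mix safely with the rest of a UTF-8 application.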
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding:'iso-8859-1:utf-8') do |row|
...
end
You can supply source encoding straight in the file mode parameter:
CSV.foreach( "file.csv", "r:windows-1250" ) do |row|
<your code>
end
If you have only one file (or a few), so there is no need to detect the encoding automatically for whatever input you get, and the contents are visible as plain text (txt, csv, etc.) separated by, for example, semicolons, you can manually create a new file with a .csv extension, paste the contents into it, and then parse it as usual.
Keep in mind that this is a workaround, but when you only need to parse one big Excel file on Linux, converted to some flavour of CSV, it saves the time otherwise spent experimenting with all those fancy encodings.
Save the file in UTF-8, unless for some reason you need to save it differently, in which case you can specify the encoding when reading the file.
Add "r:ISO-8859-1" as the second argument, as in File.open("input_file", "r:ISO-8859-1").
I had this same problem and was just using google spreadsheets and then downloading as a CSV. That was the easiest solution.
Then I came across this gem
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!