How to detect and convert DOS/Windows end of line to UNIX end of line in Ruby - ruby-on-rails

I have implemented a CSV upload in Ruby (on Rails) that works fine when the file is uploaded from a browser running on a UNIX-like system.
However, I have a file, uploaded by a real customer, that contains the famous ^M at the end of each line (I guess it was uploaded from Windows).
I need to detect this situation and replace the character before the file is processed.
Here is the code that creates the file:
# create the file on the server
path = File.join(directory, name)
# write the file
File.open(path, 'wb') { |f| f.write(uploadData.read) }
Do I need to change the "wb" to "w", and would that solve the problem?

The CR character (^M, as you call it) is "\r" in Ruby (and many other languages). If you're sure your line endings also contain the LF character (Windows uses CRLF as the line ending), you can just remove every CR at the end of a line ($ matches at the end of a line, before the "\n" that terminates it):
uploadData.read.gsub(/\r$/, '')
If you're not sure you'll have the LF (e.g. Mac OS 9 used a plain CR as the line ending), replace any CR, optionally followed by an LF, with an LF:
uploadData.read.gsub(/\r\n?/, "\n")
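As a minimal sketch of the second approach, the substitution can be wrapped in a small helper and applied before the upload is written to disk. The names contains_cr? and normalize_newlines are made up for this illustration and are not part of Ruby or Rails, and the sample string is invented test data.

```ruby
# Hypothetical helpers for illustration; not part of Ruby or Rails.

# True if the text contains any carriage return (CRLF or bare CR endings).
def contains_cr?(text)
  text.include?("\r")
end

# Convert CRLF (Windows) and bare CR (old Mac OS) line endings to LF.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

windows_csv = "name,age\r\nalice,30\r\n"       # invented sample data
puts contains_cr?(windows_csv)                  # true
puts normalize_newlines(windows_csv).inspect    # "name,age\nalice,30\n"
```

In the upload code from the question this would presumably become File.open(path, 'wb') { |f| f.write(normalize_newlines(uploadData.read)) }. Keeping 'wb' means Ruby does no newline translation of its own while writing.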

Related

Using io.tmpfile() with a shell command, run via io.popen, in Lua?

I'm using Lua in SciTE on Windows, but hopefully this is a general Lua question.
Let's say I want to write some temporary string content to a temporary file in Lua, which should eventually be read by another program, and I tried using io.tmpfile():
mytmpfile = assert( io.tmpfile() )
mytmpfile:write( MYTMPTEXT )
mytmpfile:seek("set", 0) -- back to start
print("mytmpfile" .. mytmpfile .. "<<<")
mytmpfile:close()
I like io.tmpfile() because, as noted in https://www.lua.org/pil/21.3.html :
The tmpfile function returns a handle for a temporary file, open in read/write mode. That file is automatically removed (deleted) when your program ends.
However, when I try to print mytmpfile, I get:
C:\Users\ME/sciteLuaFunctions.lua:956: attempt to concatenate a FILE* value (global 'mytmpfile')
>Lua: error occurred while processing command
I found the explanation for that in "Re: path for io.tmpfile() ?":
how do I get the path used to generate the temp file created by io.tmpfile()
You can't. The whole point of tmpfile is to give you a file handle without
giving you the file name to avoid race conditions.
And indeed, on some OSes, the file has no name.
So it will not be possible for me to use the filename of the tmpfile in a command line that should be run by the OS, as in:
f = io.popen("python myprog.py " .. mytmpfile)
So my questions are:
Would it somehow be possible to pass this tmpfile file handle as the input argument for the externally run program/script, say in io.popen, instead of using the (non-existent) tmpfile filename?
If the above is not possible, what is the next best option (in terms of not having to maintain it, i.e. not having to remember to delete the file) for opening a temporary file in Lua?
You can get a temp filename with os.tmpname.
local n = os.tmpname()
local f = io.open(n, 'w+b')
f:write(....)
f:close()
os.remove(n)
If your purpose is to send some data to a Python script, you can also use 'w' mode in popen and write to the script's standard input.
--lua
local f = io.popen(prog, 'w')
f:write(....)
#python
import sys
data = sys.stdin.readline()

Can KEDIT respect per-file line endings?

By default, the KEDIT text editor (the Mansfield Software Group one) writes Windows-style CRLF line endings to all files, including Unix-style LF files. How can I configure KEDIT to respect the existing newline sequence?
You can edit your KEDIT profile in winprof.kex to include the following at the end:
set reprofile on
LOCATE 0
if lower(filestatus.3()) == 'lf' then
'SET EOLOUT lf'
ELSE
'SET EOLOUT crlf'
Reprofile ensures the profile is run for each file, LOCATE 0 forces the file to be opened, and the macro then inspects the existing line endings and sets EOLOUT to match. New files still default to Windows-style CRLF endings.

lua table string concat not correct

I have a simple function to read lines from a .txt file:
function loadData(file_name, root_path)
  -- here, file_name is './list.txt', root_path is '../data/my/'
  for line in io.lines(file_name) do
    local data = {}
    base_path = root_path .. line
    -- so, base_path is something like ../data/my/001
    data.file = base_path .. '_color.png'
    print(data)
  end
end
I expect data to be {file: "../data/my/001_color.png"}, but I get {_color.png" ../data/my/001}
Can anyone help me? Thanks!
Check the content of your ./list.txt file for its EOL (end of line) characters: it may have been produced on Windows (EOL = CR LF) and is being interpreted on Linux (EOL = LF). On Linux, io.lines leaves the CR character in the line string!
Your program does everything correctly, but your data is not what you expect.
Let's assume the first line in ./list.txt is ../data/my/001\r\n
The line variable is then ../data/my/001\r (print(#line) gives 15 instead of 14).
When printed, a carriage return (CR) moves the cursor back to the start of the line without advancing to the next line.
Your print output in this case is something similar to {file: "../data/my/001\r_color.png"} (it depends on the print implementation), so the terminal displays:
{file: "../data/my/001
_color.png"} <-- on the same line
Combined, that gives:
_color.png"}ata/my/001
To correct this, either:
provide a file without CR characters (works correctly on all systems), or
add line = line:gsub('[\r\n]','') as the first statement in the loop to remove CR and LF
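The effect is not specific to Lua. As a rough illustration in Ruby (used here only because the other examples on this page are Ruby; strip_eol is a made-up helper name), a line that keeps its trailing CR is one character longer than it looks, and removing CR and LF restores the expected path:

```ruby
# Hypothetical helper mirroring the Lua fix above: drop every CR and LF character.
def strip_eol(line)
  line.delete("\r\n")
end

raw = "../data/my/001\r"            # a line from a CRLF file with only the LF removed
puts raw.length                      # 15: the invisible CR is still counted
p raw + "_color.png"                 # p shows the hidden "\r" inside the string
puts strip_eol(raw) + "_color.png"   # ../data/my/001_color.png
```

Printing with an inspecting function (p in Ruby) rather than a plain print is a quick way to spot stray control characters like this.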

What does the File.new("temp.out", "w") line represent?

I'm learning the Ruby language and I'm having a lot of fun.
I am currently working on the Temperature converter with file output exercise.
The solution is provided below
print "Hello. Please enter a Celsius value: "
celsius = gets.to_i
fahrenheit = (celsius * 9 / 5) + 32
puts "saving result to output file 'temp.out'"
fh = File.new("temp.out", "w")
fh.puts fahrenheit
fh.close
The highlighted part confuses me.
We are calling File.new to create a file named "temp.out", and the "w" means that whatever we output is written to it until we call fh.close. Am I correct?
Thank you!
By default, puts() will send its output to what's called stdout, which is connected to your screen. File.new() creates a new file which is assigned to the variable fh. Because you created the file in write mode, you can use fh to write stuff to the file. fh.puts() sends output to the file assigned to the variable fh. In other words, a bare puts() statement sends output to your screen, but when you precede puts() with a file, the output goes to the file.
You can also write those statements like this:
File.open("temp.out", "w") do |f|
  f.puts fahrenheit
end
The neat thing about writing it like that is: when the end statement executes, Ruby will automatically close the file for you.
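To see the automatic close for yourself, here is a small sketch. The helper name save_result and the value 212 are made up for the illustration, and it writes into a throwaway temporary directory rather than the current folder.

```ruby
require "tmpdir"

# Hypothetical helper: writes the value with the block form of File.open
# and returns the File object so the caller can inspect it afterwards.
def save_result(path, value)
  handle = nil
  File.open(path, "w") do |f|
    handle = f
    f.puts value
  end
  handle
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "temp.out")
  fh = save_result(path, 212)
  puts fh.closed?        # true: the block form closed the file
  puts File.read(path)   # 212
end
```

With File.new there is no block, so the file stays open until you call fh.close yourself.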

MalformedCSVError with rails CSV (FasterCSV)

I'm having serious issues trying to parse some CSV in Rails right now.
Basically, my app gets a user to upload a CSV file. The app then converts the file to ensure it is in UTF-8 format, then attempts to parse and process it. Whenever the app attempts to parse it, however, I get a MalformedCSVError stating "Illegal quoting on line 1".
What I don't get is that if I copy the original file into a new document and save it, I can parse it in a Rails console without a problem.
If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8, hence the app converts it).
If I attempt to parse the file that the app has converted to UTF-8 and whose line endings it has changed to LF, it fails to parse.
If I do a file diff between the version the app has produced and the copy/paste version that I made (which works), there are 0 differences, so I really can't figure out why one is parsable and the other is not.
Any suggestions? My app is processing the file as follows:
def create
  @survey = Survey.new(params[:survey])
  # Now we need to try and convert this to UTF-8 if it isn't already
  encoded = File.read(@survey.survey_data.current_path)
  encoding = CharlockHolmes::EncodingDetector.detect(encoded)
  # We've got a guess at the encoding,
  # so we can try and convert it but it
  # may still fail so we need to handle
  # that
  begin
    re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
    re_encoded = re_encoded.gsub(/\r\n?/, "\n")
    # Now replace the uploaded file
    File.open(@survey.survey_data.current_path, 'w') { |f|
      f.write(re_encoded)
    }
  rescue ArgumentError
    puts "UH OH!!!!!"
  end
  puts "#{@survey.survey_data.current_path}"
  @parsed = CSV.read(@survey.survey_data.current_path)
end
The file uploading gem is CarrierWave if that makes any difference.
Please can someone help me as this is driving me insane!
Edit
The error says it's on line 1. Line 1 (assuming it doesn't index from 0) is
"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"
Well that was irritating!
Turns out the file had a BOM which was causing the CSV parser to break. Loading the file with
CSV.open("path/to/file.csv", "rb:bom|encoding")
allowed it to be parsed perfectly! I'm annoyed at how long it took to track down, but it's working now, and there's no longer any need to convert to UTF-8 either!
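A minimal sketch of the same idea, assuming the file is UTF-8 with a leading BOM (the encoding part of the mode string above is taken here to stand for the real encoding name, which is an assumption on my part). read_csv_skipping_bom is a hypothetical helper and the sample file contents are invented. The "BOM|UTF-8" open mode is standard Ruby and makes the reader drop the BOM before the parser sees the first quote character.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: read a CSV file, skipping a leading UTF-8 BOM if present.
def read_csv_skipping_bom(path)
  File.open(path, "r:BOM|UTF-8") { |f| CSV.parse(f.read) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "survey.csv")
  # Invented sample: the three BOM bytes, then two quoted CSV rows.
  File.binwrite(path, "\xEF\xBB\xBF\"Survey\",\"RD\"\n\"s1\",\"3\"\n")
  p read_csv_skipping_bom(path)   # [["Survey", "RD"], ["s1", "3"]]
end
```

Without the BOM prefix in the mode, the invisible BOM character sits in front of the opening quote of the first field, which would explain a quoting error reported on line 1 for a file that looks fine in a diff.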

Resources