I am using SmarterCSV and have encountered a CSV file that has blank lines. Is there any way to ignore them? SmarterCSV is taking the blank line as a header and not processing the file correctly. Is there any way I can bastardize the :comment_regexp option to do this?
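require 'tempfile'
require 'smarter_csv'

mail.attachments.each do |attachment|
  filename = attachment.filename
  #filedata = attachment.decoded
  puts filename
  tmp = Tempfile.new('attachment')  # assumed: a Tempfile holds the decoded attachment
  tmp.write attachment.decoded
  tmp.flush                         # make sure the bytes are on disk before reopening the file
  puts tmp.path
  f = File.open(tmp.path, "r:bom|utf-8")
  options = {
    :comment_regexp => /^#/
  }
  data = SmarterCSV.process(f, options)
  puts data
end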
Sample File:

Let's first construct your file.
str = <<~_
# Report
Date header1 header2 header3 header4
20200 jdk;df 4543 $8333 4387
20200 jdk 5004 $945876 67
_
fin_name = 'in'
File.write(fin_name, str)
#=> 223
Two problems must be addressed to read this file using the method SmarterCSV::process. The first is that comments (lines beginning with an octothorpe, '#') and blank lines must be skipped. The second is that the fields are separated by runs of spaces of varying length rather than by a fixed string.
The first of these problems can be dealt with by setting the value of process's :comment_regexp option to a regular expression that matches both kinds of line:
:comment_regexp => /\A#|\A\s*\z/
which reads, "match an octothorpe at the beginning of the string (\A being the beginning-of-string anchor), or (|) match a string consisting only of zero or more whitespace characters (\s being a whitespace character and \z being the end-of-string anchor)".
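A quick way to check the pattern outside SmarterCSV is to apply it to a few sample lines in plain Ruby. This sketch only exercises the regular expression; exactly how SmarterCSV hands each line to :comment_regexp (for example, whether the line ending is still attached) is an assumption not verified here, so the sample lines keep their "\n", which \s* absorbs either way.

```ruby
# The :comment_regexp value proposed above, tested with plain Ruby.
COMMENT_OR_BLANK = /\A#|\A\s*\z/

# Made-up sample lines: a comment, two blank lines, a header and a data row.
lines = ["# Report\n", "\n", "   \n", "Date header1 header2\n", "20200 jdk 5004\n"]

# Lines the pattern matches would be skipped; everything else is kept.
kept = lines.reject { |line| line =~ COMMENT_OR_BLANK }
p kept #=> ["Date header1 header2\n", "20200 jdk 5004\n"]
```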
Unfortunately, SmarterCSV is not capable of dealing with variable-length field separators. It does have a :col_sep option, but its value must be a string, not a regular expression.
We must therefore pre-process the file before using SmarterCSV, though that is not difficult. While we are at it, we may as well remove the dollar signs and use commas as field separators (note 1).
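fout_name = 'out.csv'
fout = File.open(fout_name, 'w')
File.foreach(fin_name) do |line|
  # skip comments and blank lines; turn each whitespace run (and a "$" after it) into a comma
  fout.puts(line.strip.gsub(/\s+\$?/, ',')) unless line =~ /\A#|\A\s*\z/
end
fout.close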
Let's look at the file produced.
Now that's what a CSV file should look like! We may now use SmarterCSV on this file with no options specified:
SmarterCSV.process(fout_name)
#=> [{:date=>20200, :header1=>"jdk;df", :header2=>4543,
# :header3=>8333, :header4=>4387},
# {:date=>20200, :header1=>"jdk", :header2=>5004,
# :header3=>945876, :header4=>67}]
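For trying the clean-up in irb without creating files, here is a sketch of the same per-line transformation applied to a String. The helper name to_csv_text and the sample text are made up for illustration; the skip pattern is the :comment_regexp value from above, and the column spacing only approximates the original file.

```ruby
# Hypothetical helper, not part of SmarterCSV: the same clean-up as the
# file loop, applied to a String so it can be tried in irb.
SKIP_LINE = /\A#|\A\s*\z/ # comment lines and blank (whitespace-only) lines

def to_csv_text(text)
  kept = text.each_line.reject { |line| line =~ SKIP_LINE }
  # Each run of whitespace, plus a "$" right after it, becomes one comma.
  kept.map { |line| line.strip.gsub(/\s+\$?/, ',') + "\n" }.join
end

sample = "# Report\n\nDate    header1 header2\n\n20200   jdk;df  $8333\n20200   jdk     $945876\n"
print to_csv_text(sample)
# Date,header1,header2
# 20200,jdk;df,8333
# 20200,jdk,945876
```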
1. I used IO::foreach to read the file line by line and then write each manipulated line that is neither a comment nor a blank line to the output file. If the file is not huge, we could instead gulp it into a string, modify the string and then write the result to the output file: File.write(fout_name, File.read(fin_name).gsub(/^#.*?\n|^[ \t]*\n|^[ \t]+|[ \t]+$|\$/, '').gsub(/[ \t]+/, ',')). The first regular expression reads, "match lines beginning with an octothorpe, or lines containing only spaces and tabs, or spaces and tabs at the beginning of a line, or spaces and tabs at the end of a line, or a dollar sign". The second gsub merely converts each remaining run of spaces and tabs to a comma.
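A small check of the one-liner in note 1, run on an in-memory string instead of fin_name. The sample text is invented; it mixes spaces, a tab, a trailing space and a whitespace-only line so that each alternative of the first regular expression gets exercised.

```ruby
# The note's slurp-and-gsub version, run on an in-memory string.
text = "# Report\n\nDate  header1\theader2 \n   \n20200 jdk;df  $8333\n"

# Remove comment lines, blank lines, leading/trailing blanks and every "$".
trimmed = text.gsub(/^#.*?\n|^[ \t]*\n|^[ \t]+|[ \t]+$|\$/, '')
# Then turn each remaining run of spaces/tabs into a comma.
csv = trimmed.gsub(/[ \t]+/, ',')

p csv #=> "Date,header1,header2\n20200,jdk;df,8333\n"
```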


Split a string on new lines, but include empty lines

Let's say I have a string with the contents
local my_str = [[
I'd like to get the following table:
In other words, I'd like the blank line 3 to be included in my result. I've tried the following:
local result = {};
for line in string.gmatch(my_str, "[^\n]+") do
  table.insert(result, line);
end
However, this produces a result which will not include the blank line 3.
How can I make sure the blank line is included? Am I just using the wrong regex?
Try this instead:
local result = {};
for line in string.gmatch(my_str .. "\n", "(.-)\n") do
  table.insert(result, line);
end
If you don't want the empty fifth element that gives you, then get rid of the blank line at the end of my_str, like this:
local my_str = [[
(Note that a newline at the beginning of a long literal is ignored, but a newline at the end is not.)
You can replace the + with *, but that won't work in all Lua versions; LuaJIT will add spurious empty strings to your result (which isn't even technically wrong).
If your string always includes a newline character at the end of the last line, as in your example, you can just use something like "([^\n]*)\n" to avoid both the spurious empty strings and the final empty string.
In Lua 5.2+ you can also just use a frontier pattern to check for either a newline or the end of the string: [^\n]*%f[\n\0], but that won't work in LuaJIT either.
If you need to support LuaJIT and don't have the trailing newline in your actual string, then you could just add it manually:
string.gmatch(my_str .. "\n", "([^\n]*)\n")

Leading and trailing spaces for each line from textarea

ruby 2.1.3
rails 4.1.7
I want to generate an unordered list from a textarea, so I have to preserve all line breaks between items and remove leading and trailing spaces.
I'm trying to remove all leading and trailing spaces from each line of the textarea, with no success.
I'm using a regex:
string_from_textarea.gsub(/^[ \t]+|[ \t]+$/, '')
I've also tried the strip and rstrip methods, with no luck (they give the same result as the regex):
Leading spaces for each line are removed perfectly.
But for trailing spaces, only those at the very end of the string are removed. I want them removed from every line.
What am I missing here? What is the deal with textarea and trailing spaces for each line?
Some example code. I'm using a callback to save the formatted data:
after_validation :format_ingredients
def format_ingredients
  self.ingredients = ingredients.gsub(/^[ \t]+|[ \t]+$/, "")
end
Form view:
= f.text_area :ingredients, class: 'fieldW-600 striped', rows: '10'
You can use String#strip
' test text with multiple spaces '.strip
#=> "test text with multiple spaces"
To apply this to each line:
str = " test \ntext with multiple \nspaces "
str.split("\n").map(&:strip).join("\n")
#=> "test\ntext with multiple\nspaces"
This isn't a good use for a regexp. Instead use standard String processing methods.
If you have text that contains embedded LF ("\n") line-ends and spaces at the beginning and ends of the lines, then try this:
foo = "
 line 1 
 line 2
line 3
"
foo # => "\n line 1 \n line 2\nline 3\n"
Here's how to clean the lines of leading/trailing white-space and re-add the line-ends:
bar = foo.split("\n").map(&:strip).join("\n")
bar # => "\nline 1\nline 2\nline 3"
If you're dealing with CRLF line-ends, as a Windows system would generate:
foo = "\r\n line 1 \r\n line 2\r\nline 3\r\n"
bar = foo.split("\r\n").map(&:strip).join("\r\n")
bar # => "\r\nline 1\r\nline 2\r\nline 3"
If the text might contain other forms of white-space, such as non-breaking spaces, then switch to a regexp that uses the POSIX [[:space:]] character class, which for Unicode strings matches white-space beyond the ASCII set. I'd do something like:
s.sub(/^[[:space:]]+/, '').sub(/[[:space:]]+$/, '')
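A sketch of why the POSIX class matters, assuming UTF-8 strings: String#strip only removes ASCII whitespace and NUL, while [[:space:]] in Ruby's regex engine also covers Unicode space separators such as NO-BREAK SPACE (U+00A0). The last part applies the same idea to every line of a multi-line string, using \A and \z so each substitution stays within one line.

```ruby
# A line padded with ordinary spaces and NO-BREAK SPACE (U+00A0) characters.
line = "\u00A0 line 1 \u00A0"

# String#strip does not treat U+00A0 as whitespace, so nothing changes here.
p line.strip == line #=> true

# The POSIX class also matches U+00A0.
cleaned = line.sub(/^[[:space:]]+/, '').sub(/[[:space:]]+$/, '')
p cleaned #=> "line 1"

# Per line: \A and \z confine each substitution to the current line string.
text = "\u00A0line 1 \n line 2\u00A0\n"
per_line = text.each_line.map { |l| l.sub(/\A[[:space:]]+/, '').sub(/[[:space:]]+\z/, '') }.join("\n")
p per_line #=> "line 1\nline 2"
```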
I think #sin probably intimated the problem in his/her first comment. Your file was probably produced on a Windows machine, which puts a carriage return/line feed pair ("\r\n") at the end of each line other than (presumably) the last, where it just writes \n. (Check line[-2] on any line other than the last.) That would account for the result you are getting:
r = /^[ \t]+|[ \t]+$/
str = " testing 123 \r\n testing again \n"
str.gsub(r, '')
#=> "testing 123 \r\ntesting again\n"
If this theory is correct the fix should be just a slight tweak to your regex:
r = /^[ \t]+|[ \t\r]+$/
str.gsub(r, '')
#=> "testing 123\ntesting again\n"
You might be able to do this with your regex by changing the value of the global variable $/, which is the input record separator, a newline by default. That could be a problem for the end of the last line, however, if that only has a newline.
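Another option, sketched here, is to normalise the line endings to \n before stripping, so the original regex can stay as it is. The \r\n? pattern turns both \r\n and a lone \r into \n; like the regex tweak above, this leaves the text with Unix line endings.

```ruby
r = /^[ \t]+|[ \t]+$/
str = " testing 123 \r\n testing again \n"

# Convert Windows (\r\n) and old Mac (\r) line endings to \n first.
normalized = str.gsub(/\r\n?/, "\n")

# Now $ sits right after the trailing blanks on every line, so the regex works.
p normalized.gsub(r, '') #=> "testing 123\ntesting again\n"
```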
I think you might be looking for String#lstrip and String#rstrip methods:
str = %Q^this is a line
and so is this
all of the lines have two spaces at the beginning
and also at the end ^
new_string = ""
str.each_line do |line|
  new_string += line.rstrip.lstrip + "\n"
end
#=> "this is a line\n and so is this \n all of the lines have two spaces at the beginning \n and also at the end "
puts new_string
this is a line
and so is this
all of the lines have two spaces at the beginning
and also at the end
new_string
#=> "this is a line\nand so is this\nall of the lines have two spaces at the beginning\nand also at the end\n"

Pattern match dropping new lines characters

How can I extract the values from a CSV-like string with a pattern, dropping the newline characters (\r\n or \n)?
A line looks like:
1.1;2.2;Example, 3
Notice there are only 3 values and the separator is ;. The problem I'm having is coming up with a pattern that reads the values while dropping the newline characters (the file comes from a Windows machine, so it has \r\n line endings; I'm reading it on Linux and would like the code to be independent of the newline convention used).
My simple example right now is:
s = "1.1;2.2;Example, 3\r\n";
p = "(.-);(.-);(.-)";
a, b, c = string.match(s, p);
print(c:byte(1, -1));
The two last characters printed by the code above are the \r\n.
The problem is that both \r and \n are matched by the %c and %s classes (control characters and space characters), as shown by this code:
s = "a\r";
So, is it possible to leave the newline characters out of the match? (It should not be assumed that the last two characters will be newline characters.)
The third value may contain spaces, punctuation and alphanumeric characters, and since \r and \n are detected as space characters, a pattern like "(.-);(.-);([%w%s%c]-).*" does not work.
Your pattern
p = "(.-);(.-);(.-)";
does not work: the third field is always empty because .- matches as little as possible. You need to anchor it at the end of the string, but then the third field will contain the trailing newline chars:
p = "(.-);(.-);(.-)$";
So, just stop at the first trailing newline char. This also anchors the last match. Try this pattern instead:
p = "(.-);(.-);(.-)[\r\n]";
If trailing newline chars are optional, try this pattern:
p = "(.-);(.-);(.-)[\r\n]*$";
Without any Lua experience, I found a naive solution:
clean_CR = s:gsub("\r","");
clean_NL = clean_CR:gsub("\n","");
With POSIX regex syntax I'd use
.. with "\n" and "\r" possibly included as "^M", "^#" (control/unicode characters) .. depending on your editor.

How do I read each line of uploaded file with mixed Windows and Unix line endings?

I am trying to read each line of an uploaded file in Rails.
file_data = params[:files]
if file_data.respond_to?(:read)
  file_data.read.gsub( /\n/, "\r\n" ).split("\r\n").each do |line|
elsif file_data.respond_to?(:path)
  File.read(file_data.path).gsub( /\n/, "\r\n" ).split("\r\n").each do |line|
If the uploaded file contains a mix of Windows and Unix line endings, presumably due to copying from multiple places, Rails doesn't properly separate each line of the file and sometimes returns two lines as one.
The application is hosted on a Linux box. Also, the file is copied from a Google docs spreadsheet column.
Are there any solutions for this problem?
The hex code for lines that don't get separated into new lines looks like:
636f 6d0d 0a4e 6968
Here's how I'd go about this. First, to test some code:
"for all good men\n"
def read_file(data)
  data.each do |li|
    [ *li.split(/[\r\n]+/) ].each do |l|
      yield l
    end
  end
end
read_file(SAMPLE_TEXT) do |li|
  puts li
end
Which outputs:
for all good men
The magic occurs in [ *li.split(/[\r\n]+/) ]. Breaking it down:
li.split(/[\r\n]+/) causes the line to be split on returns, new-lines and combinations of those. If a line has multiple consecutive line-ends, the code will gobble the empty lines between them, so if there's a chance you'll receive those you'll need a more precise pattern such as /\r\n|\r|\n/, which treats each line-end as exactly one separator.
*li.split(/[\r\n]+/) uses the "splat" operator * which says to explode the following array into its component elements. This is a convenient way to get an array when you're not sure whether you have a single element or an array being passed into a method.
[*li.split(/[\r\n]+/)] takes the components returned and turns them back into a single array.
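To see the difference between the two split patterns, here is a sketch on a made-up string that mixes \r\n, a blank line (\n\n) and a lone \r.

```ruby
mixed = "now is the time\r\nfor all good men\n\nto come to the aid\rof their country"

# One or more CR/LF characters count as a single separator: the empty line disappears.
gobbled = mixed.split(/[\r\n]+/)
p gobbled #=> ["now is the time", "for all good men", "to come to the aid", "of their country"]

# Each line ending (\r\n, \r or \n) is exactly one separator: the empty line is kept.
kept = mixed.split(/\r\n|\r|\n/)
p kept #=> ["now is the time", "for all good men", "", "to come to the aid", "of their country"]
```

Note that String#split drops trailing empty strings unless given a negative limit, so blank lines at the very end are lost either way.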
Modifying the method to handle a file instead is easy:
def read_file(fname)
  File.foreach(fname) do |li|
    [ *li.split(/[\r\n]+/) ].each do |l|
      yield l
    end
  end
end
Call it almost the same way as in the previous example:
read_file('path/to/file') do |li|
  puts li
end
The reason you want to use foreach is that it reads line by line, which is a lot more memory-efficient than slurping a file with read or readlines, both of which read the entire file into memory at once. foreach is fast too, so you don't take a speed hit by using it. As a result there's little advantage to the read-type methods, and good advantages to using foreach.
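A sketch of the foreach approach on a file with mixed line endings, assuming Linux as in the question (on Windows, text-mode I/O may translate line endings). File.foreach splits on \n, and String#chomp with no argument removes a trailing \n, \r\n or \r, so both Windows and Unix lines come out clean. The file contents are invented for the example.

```ruby
require 'tempfile'

# Write a small file mixing Windows (\r\n) and Unix (\n) line endings.
file = Tempfile.new('mixed')
file.write("first line\r\nsecond line\nthird line\r\n")
file.close

# Read it one line at a time; chomp strips "\n" or "\r\n" from each line.
lines = []
File.foreach(file.path) { |line| lines << line.chomp }
p lines #=> ["first line", "second line", "third line"]

file.unlink
```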
You are substituting \n with \r\n, which is problematic when parsing Windows files. Now \r\n becomes \r\r\n.
Better is to convert to the Unix line-ending format and then split on \n:
file_data.read.gsub( /\n/, "\r\n" ).split("\r\n").each do |line|
becomes:
file_data.read.gsub( /\r\n/, "\n" ).split("\n").each do |line|
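A quick sketch, on a made-up string, of why the original substitution misbehaves and how the suggested one fixes it:

```ruby
data = "first\r\nsecond\nthird"

# Original approach: every \n becomes \r\n, so an existing \r\n turns into \r\r\n
# and the first line keeps a stray \r.
buggy = data.gsub(/\n/, "\r\n").split("\r\n")
p buggy #=> ["first\r", "second", "third"]

# Suggested fix: normalise to \n, then split on \n.
fixed = data.gsub(/\r\n/, "\n").split("\n")
p fixed #=> ["first", "second", "third"]
```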
Try the built-in method:
File.readlines('foo').each do |line|
Or:
File.open('foo').read.gsub(/\r\n?/, "\n").each_line do |line|

Removing lines that begin with > in a rails string

I'm trying to remove any lines that begin with the character '>' in a long string (i.e. replies to an email).
In PHP I'd iterate over each line with an if statement; on Linux I'd try sed or awk.
What's the most elegant Rails approach?
You can try this:
Your question is implying that the input is one string, containing multiple lines.
Do you want the output to be just one string with multiple lines as well? I'm assuming yes.
Either using String and Array operations:
str.lines.reject{|x| x =~ /^>/}.join # this will return a new string, without those ">" lines
or using Regular Expressions:
str.gsub(/^>.+\n*/, '')
Better Solution:
Use a non-greedy match that stops at the end of the line:
str.gsub(/^>.*?$\n*/m, '') # by using gsub!() you can modify the string in place
^> matches your ">" character at the start of a line
.*?$ matches any characters after the start character until the end of the line (non-greedy)
\n* matches the newline character itself if any (you want to remove that as well)
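The "m" at the end of the regular expression is not needed here: in Ruby, ^ and $ always match at the start and end of each line, and m only makes . match newlines as well (harmless in this pattern, because .*? stops at the first end of line).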
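A sketch comparing the String/Array approach with a regex on a made-up email body. The regex is a variant of the one above, not the answer's exact pattern: it drops the m flag and uses \n? so that only each quoted line's own newline is removed, and .* (rather than .+) also catches a bare ">" line.

```ruby
email = "Thanks, will do.\n> original line 1\n>\n> original line 2\nCheers\n"

# String/Array approach: drop every line that starts with ">".
via_lines = email.lines.reject { |line| line =~ /^>/ }.join

# Regex variant: .* cannot cross a newline without the m flag,
# and \n? removes just the quoted line's own line ending.
via_regex = email.gsub(/^>.*\n?/, '')

p via_lines              #=> "Thanks, will do.\nCheers\n"
p via_regex == via_lines #=> true
```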
It should work as you expect:
your_string.lines.to_a.reject{|line| line[0] == '>'}.join
