Trying to parse this file with Ruby CSV.
https://www.sec.gov/files/data/broker-dealers/company-information-about-active-broker-dealers/bd070219.txt
However, I am getting an error.
CSV.open(file_name, "r", { :col_sep => "\t", :row_sep => "\n\r" }).each do |row|
puts row
end
CSV::MalformedCSVError: New line must be <"\n\r"> not <"\r"> in line
1.
Windows row_sep is "\r\n", not "\n\r". However this CSV is malformed. Looking at it using a hex editor it appears to be using "\r\r\n".
It's tab-delimited.
In addition it is not using proper quoting, line 247 has 600 "B" STREET STE. 2204, so you need to turn off quote characters.
quote_char: nil, col_sep: "\t", row_sep: "\r\r\n"
There's an extra tab on the end, each line ends with \t\r\r\n. You can also look at it as using a row_sep of "\r\n" with an extra \r field.
quote_char: nil, col_sep: "\t", row_sep: "\r\n"
Or you can view it as having a row_sep of \t\r\r\n and no extra field.
quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n"
Either way, it's a mess.
I used a hex editor to look at the file as text and raw data side by side. This let me see what's truly at the end of the line.
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000000: 3030 3030 3030 3139 3034 0941 4252 4148 0000001904.ABRAH
00000010: 414d 2053 4543 5552 4954 4945 5320 434f AM SECURITIES CO
00000020: 5250 4f52 4154 494f 4e09 3030 3832 3934 RPORATION.008294
00000030: 3532 0933 3732 3420 3437 5448 2053 5452 52.3724 47TH STR
00000040: 4545 5420 4354 2e20 4e57 0920 0947 4947 EET CT. NW. .GIG
00000050: 2048 4152 424f 5209 5741 0939 3833 3335 HARBOR.WA.98335
00000060: 090d 0d0a 3030 3030 3030 3233 3033 0950 ....0000002303.P
^^^^^^^^^
Hex 09 0d 0d 0a is \t\r\r\n.
Alternatively, you can print the lines with p and any invisible characters will be revealed.
f = File.open(file_name)
p f.readline
"0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"
Use :row_sep => :auto instead of :row_sep => "\n\r":
CSV.open(file_name, "r", { :col_sep => "\t", :row_sep => :auto }).each do |row|
puts row
end
Related
I have a CSV File that looks like this
"url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
"https://myurl.com","123456","This a string","https://myurl.com?source=5¶m=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
It has 13 columns.
The issue is that I get each row with a \" and I don't want that. Also, I get 16 columns back in the read.
This is what I have done
csv = CSV.new(File.open('myfile.csv'), quote_char:"\x00", force_quotes:false)
csv.read[1]
Output:
["\"https://myurl.com\"", "\"123456\"", "\"This a string\"", "\"https://myurl.com?source=5¶m=1\"", "\"31-01-2020\"", "\"1\"", "\"Location´s Place\"", "\"another_string", " my_string\"", "\"key1", " key2", " key3\"", "\"anotherString\"", "\"145129\"", "\"Place in Earth\"", "\"\""]
The file you showed is a standard CSV file. There is nothing special needed. Just delete all those unnecessary arguments:
csv = CSV.new(File.open('myfile.csv'))
csv.read[1]
#=> [
# "https://myurl.com",
# "123456",
# "This a string",
# "https://myurl.com?source=5¶m=1",
# "31-01-2020",
# "1",
# "Location´s Place",
# "another_string, my_string",
# "key1, key2, key3",
# "anotherString",
# "145129",
# "Place in Earth",
# ""
# ]
force_quotes doesn't do anything in your code, because it controls whether or not the CSV library will quote all fields when writing CSV. You are reading, not writing, so this argument is useless.
quote_char: "\x00" is clearly wrong, since the quote character in the example you posted is clearly " not NUL.
quote_char: '"' would be correct, but is not necessary, since it is the default.
I am parsing the CSV file with Ruby and am having trouble in that the delimiter is a comma my data contains commas.
In portions of the data that contain commas the data is surrounded by "" but I am not sure how to make CSV ignore commas that are contained within Quotations.
Example CSV Data (File.csv)
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
Example Code:
require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
puts x[1]
end
Current Output: " 84.07 FT OF 25
Expected Output: 84.07 FT OF 25, ALL OF 26,
Link to the gist to view the example file and code.
https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c
Try with force_quotes option:
require 'csv'
CSV.foreach("data.csv", encoding:'iso-8859-1:utf-8', quote_char: '"', force_quotes: true).each do |x|
puts x[1]
end
Result:
84.07 FT OF 25, ALL OF 26,
The illegal quoting error is when a line has quotes, but they don't wrap the entire column, so for instance if you had a CSV that looks like:
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
You could parse each line individually and change the quote character only for the lines that use bad quoting:
require 'csv'
def parse_file(file_name)
File.foreach(file_name) do |line|
parse_line(line) do |x|
puts x.inspect
end
end
end
def parse_line(line)
options = { encoding:'iso-8859-1:utf-8' }
begin
yield CSV.parse_line(line, options)
rescue CSV::MalformedCSVError
# this line is misusing quotes, change the quote character and try again
options.merge! quote_char: "\x00"
retry
end
end
parse_file('./File.csv')
and running this gives you:
["NCB 14591 BLK 13 LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592 BLK 14 LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]
but then if you have a mix of bad quoting and good quoting in a single row this falls apart again. Ideally you just want to clean up the CSV to be valid.
I am a newcomer in Ruby and I want to parse txt file(new7.txt)
The sample of input txt file is:
Revision: 37407
Author: imakarov
Date: 21 June 2013 г. 10:23:28
Message:
update specification from Jhon (it was in VTBSOATST-1219)
----
Added : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/BR-5610 Публикация клиентских данных в АБС Бисквит (CifOraSyncOffPers).docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/BR-5610 Публикация клиентских данных в АБС Бисквит.docx
Revision: 37406
Author: imakarov
Date: 21 June 2013 г. 10:22:16
Message:
delete files
----
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ с замечаниями Кочебина С..docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ-comments.docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ-comments_Орлов.docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ.docx
Revision: 37405
Author: dboytsov
Date: 21 June 2013 г. 10:21:17
Message:
add attributes in file
----
Modified : /Analitics/Документы/ЧТЗ/BR-5864 Запрос данных клиента по интернет-анкете КН/Преобразование BR-5864.docx
Modified : /Analitics/Документы/ЧТЗ/BR-5864 Запрос данных клиента по интернет-анкете КН/ЧТЗ BR-5864 Запрос данных клиента по интернет анкете.docx
The colleagues from Stackoverflow helps me with programm:
require 'csv'
data = []
File.foreach("new7.txt") do |line|
line.chomp!
if line =~ /Revision/
data.push [line]
elsif line =~ /Author/
if data.last and not data.last[1]
data.last[1] = line
else
data.push [nil, line]
end
elsif line =~ /Date/
if data.last and not data.last[2]
data.last[2] = line
else
data.push [nil, nil, line]
end
end
end
CSV.open('new1.csv', 'w') do |csv|
data.each do |record|
csv << record
end
end
But now I have the following situatuion:
And I need that:
I use an export in .csv
May be it would be a better way to export in .xls ? Is it a difficult to export in .xls file in each column inside?
I have such words in input file (new7.txt) as "Revision" "Author" "date" and so on. In input file it is not a column. And I want to parse the input file and copy to .xls by columns
Depending on your regional settings (only US seems different) Excel should use semicolon (";") as CSV separator instead of coma (",").
From Wikipedia:
"Microsoft Excel will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator."
You can use option :col_sep to specify a column separator.
This should works.
CSV.generate('new1.csv', 'w', {col_sep: ";"}) do |csv|
data.each do |record|
csv << record
end
end
I tried to import data from csv in my rails app, but something went wrong:
CSV::MalformedCSVError in ArticlesController#index
Unclosed quoted field on line 1.
My csv looks like this:
"Код";"№ по каталогу (артикул)";"Наименование товара";"Ед. изм.";"Цена опт.";"Доп.";"Остатки";"Класс";"Группа";"Бренд";"Блок."
2223;15-562-44;15-562-44 (27-B07-F) VW Polo 95-R ;шт ;37,430;;;Амортизаторы ;Амортизаторы BOGE ;;
10327;24-052-1;24-052-1(46-A27-0) LAND ROVER 84- F ;шт ;68,750;;;Амортизаторы ;Амортизаторы BOGE ;;
10328;24-053-1;24-053-1(46-A28-0) LAND ROVER 84- R ;шт ;68,750;;;Амортизаторы ;Амортизаторы BOGE ;;
Maybe this is because of the first line (which has no ;;)
My code look like this:
def csv_import
require 'csv'
file = File.open("/#{Rails.public_path}/uploads/smallcsv.csv")
#csv = CSV.parse(file)
csv = CSV.open(file, "r:ISO-8859-15:UTF-8", {:col_sep => ";", :row_sep => ";;", :headers => :first_row})
file_path = "/#{Rails.public_path}/uploads/smallcsv.csv"
##parsed_file=CSV::Reader.parse(file_path)
csv.each do |row|
ename = row[2]
eprice = row[5]
eqnt = row[7]
esupp = row[10]
logger.warn(ename)
end
end
I'm running ruby 1.9+ with fastercsv gem
I figured this out myself using "CSV - Unquoted fields do not allow \r or \n (line 2)".
The problem was with the first line, so :auto helped me.
I am facing "Illegal quoting" error when parse the content from SQL dump and the dump file is in the format of TXT with tab (\t) separator.
require 'rubygems'
require 'faster_csv'
begin
FasterCSV.foreach(excel_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
if col[3]!="" or !col[3].empty?
color_value=col[3].to_s.capitalize
#Inser Color
color=Color.find_or_create_by_name(:name=>color_value)
elsif col[3].empty?
color_id= nil
end
end
rescue Exception => e
puts e
end
The program executed and run successfully but there is an invalid data present like
below (#font-face ...) mean execution terminated with error of "Illegal quoting on line 3.
ID Name code comments
1 white 234 good
2 Black 222
3 red 343 #font-face { font-family: "Verdana"; .....}
Can any one suggest me how to skip when invalid data occurs in column ?
Thanks in advance.
I'm not sure if this will solve the error you are seeing, but you need to use double quotes around escaped characters, e.g.:
:col_sep => "\t"
FasterCSV isn't very kind to badly formatted data.
I don't know that there is a solution for this.
However - if your example file doesn't actually contain any quoting using "
then perhaps just use a different quot_char (eg ')
You can use the ASCII code for the NULL character -- \0x00 -- as such:
FasterCSV.foreach(excel_file, :quote_char => '\0x00',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
...
end
You can find a chart of some ASCII chars here: http://www.bluesock.org/~willg/dev/ascii.html