I am new to Ruby and I want to parse a text file (new7.txt).
A sample of the input file:
Revision: 37407
Author: imakarov
Date: 21 June 2013 г. 10:23:28
Message:
update specification from Jhon (it was in VTBSOATST-1219)
----
Added : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/BR-5610 Публикация клиентских данных в АБС Бисквит (CifOraSyncOffPers).docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/BR-5610 Публикация клиентских данных в АБС Бисквит.docx
Revision: 37406
Author: imakarov
Date: 21 June 2013 г. 10:22:16
Message:
delete files
----
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ с замечаниями Кочебина С..docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ-comments.docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ-comments_Орлов.docx
Deleted : /Analitics/Документы/ЧТЗ/BR-5610/2 Спецификации/ЧТЗ Принудительное обновление и публикация ФЛ.docx
Revision: 37405
Author: dboytsov
Date: 21 June 2013 г. 10:21:17
Message:
add attributes in file
----
Modified : /Analitics/Документы/ЧТЗ/BR-5864 Запрос данных клиента по интернет-анкете КН/Преобразование BR-5864.docx
Modified : /Analitics/Документы/ЧТЗ/BR-5864 Запрос данных клиента по интернет-анкете КН/ЧТЗ BR-5864 Запрос данных клиента по интернет анкете.docx
Colleagues from Stack Overflow helped me with this program:
require 'csv'

data = []
File.foreach("new7.txt") do |line|
  line.chomp!
  if line =~ /Revision/
    data.push [line]
  elsif line =~ /Author/
    if data.last and not data.last[1]
      data.last[1] = line
    else
      data.push [nil, line]
    end
  elsif line =~ /Date/
    if data.last and not data.last[2]
      data.last[2] = line
    else
      data.push [nil, nil, line]
    end
  end
end

CSV.open('new1.csv', 'w') do |csv|
  data.each do |record|
    csv << record
  end
end
But now I have the following situation:
And this is what I need:
I export to .csv.
Would it be better to export to .xls? Is it difficult to export to an .xls file with each value in its own column?
The input file (new7.txt) contains labels such as "Revision", "Author", "Date" and so on, but they are not arranged in columns. I want to parse the input file and copy the values into .xls column by column.
Depending on your regional settings (for example, in many European locales), Excel may expect a semicolon (";") as the CSV separator instead of a comma (",").
From Wikipedia:
"Microsoft Excel will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator."
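You can use the :col_sep option to specify the column separator. Note that it has to be passed to CSV.open, which writes to a file; CSV.generate builds a string and does not take a file name.
This should work:
CSV.open('new1.csv', 'w', col_sep: ";") do |csv|
  data.each do |record|
    csv << record
  end
end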
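To show how the log could end up with one value per column, here is a minimal sketch. It assumes each record starts with a "Revision:" line followed by "Author:" and "Date:" lines, as in the sample; parse_log and to_semicolon_csv are hypothetical helper names, not part of the original program.

```ruby
require 'csv'

# Hypothetical helper: collects the "Revision", "Author" and "Date" values
# from the log text into one row per revision (labels are stripped).
def parse_log(text)
  rows = []
  text.each_line do |line|
    case line.chomp
    when /\ARevision:\s*(.*)/ then rows << [$1, nil, nil]
    when /\AAuthor:\s*(.*)/   then rows.last[1] = $1 if rows.last
    when /\ADate:\s*(.*)/     then rows.last[2] = $1 if rows.last
    end
  end
  rows
end

# Hypothetical helper: returns the rows as semicolon-separated CSV text
# with a header line, so Excel in semicolon locales splits the columns.
def to_semicolon_csv(rows)
  CSV.generate(col_sep: ';') do |csv|
    csv << %w[Revision Author Date]
    rows.each { |row| csv << row }
  end
end

sample = <<~LOG
  Revision: 37407
  Author: imakarov
  Date: 21 June 2013 10:23:28
  Message:
  update specification
  ----
  Revision: 37406
  Author: imakarov
  Date: 21 June 2013 10:22:16
LOG

puts to_semicolon_csv(parse_log(sample))
```

To write a file instead of a string, the same rows could be passed to CSV.open('new1.csv', 'w', col_sep: ';').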
I have a CSV File that looks like this
"url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
"https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
It has 13 columns.
The issue is that every field comes back wrapped in \" and I don't want that. Also, I get 16 columns back when reading.
This is what I have done
csv = CSV.new(File.open('myfile.csv'), quote_char:"\x00", force_quotes:false)
csv.read[1]
Output:
["\"https://myurl.com\"", "\"123456\"", "\"This a string\"", "\"https://myurl.com?source=5&param=1\"", "\"31-01-2020\"", "\"1\"", "\"Location´s Place\"", "\"another_string", " my_string\"", "\"key1", " key2", " key3\"", "\"anotherString\"", "\"145129\"", "\"Place in Earth\"", "\"\""]
The file you showed is a standard CSV file. There is nothing special needed. Just delete all those unnecessary arguments:
csv = CSV.new(File.open('myfile.csv'))
csv.read[1]
#=> [
# "https://myurl.com",
# "123456",
# "This a string",
# "https://myurl.com?source=5&param=1",
# "31-01-2020",
# "1",
# "Location´s Place",
# "another_string, my_string",
# "key1, key2, key3",
# "anotherString",
# "145129",
# "Place in Earth",
# ""
# ]
force_quotes doesn't do anything in your code, because it controls whether or not the CSV library will quote all fields when writing CSV. You are reading, not writing, so this argument is useless.
quote_char: "\x00" is clearly wrong, since the quote character in the example you posted is clearly " not NUL.
quote_char: '"' would be correct, but is not necessary, since it is the default.
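The difference can be seen on a single line. This is only an illustrative sketch using a shortened version of the row from the question; parse_default and parse_with_nul_quote are hypothetical helper names.

```ruby
require 'csv'

# Parses one CSV line with the library defaults (quote character is ").
def parse_default(line)
  CSV.parse_line(line)
end

# Parses the same line with quoting effectively disabled via a NUL quote_char.
def parse_with_nul_quote(line)
  CSV.parse_line(line, quote_char: "\x00")
end

line = '"another_string, my_string","key1, key2, key3",""'

p parse_default(line)         # three fields, commas kept inside the quoted values
p parse_with_nul_quote(line)  # splits at every comma and keeps the literal quotes
```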
I have a CSV file with a strange format:
2783¦Larson and Sons
967¦Becker Group
333¦Rolfson LLC
I have tried to do this
CSV.foreach("#{Rails.root}/csv_files/suppliers.csv") do |supplier|
  p supplier[0]
end
but I get the whole line as a single string, "2783¦Larson and Sons".
How can I separate the values?
For example, I would like it to return:
supplier[0] #=> "2783"
supplier[1] #=> "Larson and Sons"
Why would you expect CSV to know how to handle this weird input? You should explicitly specify the encoding and the column separator.
# CSV.read returns all rows at once and does not take a block
rows = CSV.read("#{Rails.root}/csv_files/suppliers.csv",
                encoding: Encoding::ISO_8859_1,
                col_sep: "\xC2\xA6".force_encoding(Encoding::ISO_8859_1))
puts rows.inspect
#⇒ [["2783", "Larson and Sons"],
# ["967", "Becker Group"],
# ["333", "Rolfson LLC"]]
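As an alternative sketch, if the file is assumed to be UTF-8 encoded (the bytes \xC2\xA6 are the UTF-8 encoding of the broken bar "¦"), the separator can be given as that single character. BROKEN_BAR and parse_suppliers are hypothetical names used only for this illustration.

```ruby
require 'csv'

# The broken bar character (U+00A6), written as an escape so the example
# does not depend on how this source file itself is encoded.
BROKEN_BAR = "\u00A6"

# Hypothetical helper: splits each line of the text on the broken bar.
def parse_suppliers(text)
  CSV.parse(text, col_sep: BROKEN_BAR)
end

data = "2783#{BROKEN_BAR}Larson and Sons\n967#{BROKEN_BAR}Becker Group\n"
parse_suppliers(data).each do |id, name|
  puts "#{id}: #{name}"
end
```

For a file on disk, the same col_sep option could be passed to CSV.foreach together with encoding: 'UTF-8'.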
I am parsing a CSV file with Ruby and am having trouble because the delimiter is a comma and my data also contains commas.
The portions of the data that contain commas are surrounded by double quotes, but I am not sure how to make CSV ignore commas that are inside quotation marks.
Example CSV Data (File.csv)
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
Example Code:
require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
  puts x[1]
end
Current Output: " 84.07 FT OF 25
Expected Output: 84.07 FT OF 25, ALL OF 26,
Link to the gist to view the example file and code.
https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c
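Try with quote_char: '"' instead of "\x00". The force_quotes option is included below as well, but it only affects writing, so the quote character is what changes the parsing:
require 'csv'
CSV.foreach("data.csv", encoding:'iso-8859-1:utf-8', quote_char: '"', force_quotes: true).each do |x|
  puts x[1]
end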
Result:
84.07 FT OF 25, ALL OF 26,
The "Illegal quoting" error is raised when a line contains quotes that don't wrap the entire column. For instance, suppose you had a CSV that looks like this:
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
You could parse each line individually and change the quote character only for the lines that use bad quoting:
require 'csv'

def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end

def parse_line(line)
  options = { encoding: 'iso-8859-1:utf-8' }
  begin
    # options must be splatted as keywords on Ruby 3 and later
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # give up if the fallback quote character was already tried
    raise if options.key?(:quote_char)
    # this line is misusing quotes, change the quote character and try again
    options[:quote_char] = "\x00"
    retry
  end
end

parse_file('./File.csv')
and running this gives you:
["NCB 14591 BLK 13 LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592 BLK 14 LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]
However, if you have a mix of bad quoting and good quoting in a single row, this falls apart again. Ideally, you should clean up the CSV so that it is valid.
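Another option worth trying, offered here as a sketch and not as part of the answer above, is the liberal_parsing option of the CSV library (available since Ruby 2.4), which tolerates double quotes inside unquoted fields. parse_liberally is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical helper: parses one line while tolerating stray double quotes
# inside unquoted fields.
def parse_liberally(line)
  CSV.parse_line(line, liberal_parsing: true)
end

good = 'NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'
bad  = 'NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR'

# Properly quoted field: the embedded commas stay inside one column.
p parse_liberally(good)
# Quotes in the middle of an unquoted field are kept as literal characters.
p parse_liberally(bad)
```

This avoids switching the quote character per line, although badly broken rows may still need manual cleanup.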
I am trying to do the following in IRB:
file = CSV.read('branches.csv', headers:true)
file.each do |branch|
  Branch.create(attributes:branch.to_hash)
end
branches.csv contains one header entitled business_name which should map onto the attribute for Branch of the same name, but I see the error:
ActiveRecord::UnknownAttributeError: unknown attribute 'business_name' for Branch.
Strangely, doing Branch.create(business_name:'test') works just fine with no issues.
Update:
I think this has something to do with the encoding of the text in the UTF-8 CSV produced by Excel, as suggested in the comments below. I am not sure whether this IRB session gives any clues, but the header title read from the file does not equal "business_name":
2.3.3 :348 > file = CSV.read('x.csv', headers:true)
#<CSV::Table mode:col_or_row row_count:165>
2.3.3 :349 > puts file.first.to_hash.first.first
business_name
2.3.3 :350 > file = CSV.read('x.csv', headers:true)
#<CSV::Table mode:col_or_row row_count:165>
2.3.3 :351 > puts file.first.to_hash.first.first == "business_name"
false
Just skip the attributes: part. It is not needed at all, because branch.to_hash already returns exactly the format you describe in your last sentence.
file = CSV.read('branches.csv', headers:true)
file.each { |branch| Branch.create(branch.to_hash) }
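If the header still does not compare equal to "business_name", as the IRB session in the update suggests, one possible cause is a UTF-8 byte order mark (BOM) at the start of the file. This is an assumption, since the thread does not confirm it. A minimal sketch that asks Ruby to strip the BOM while reading; read_rows_skipping_bom is a hypothetical helper name and the temporary file only stands in for the Excel export.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads a CSV file with headers and strips a leading
# UTF-8 BOM (if present) so the first header name compares cleanly.
def read_rows_skipping_bom(path)
  CSV.read(path, headers: true, encoding: 'bom|utf-8').map(&:to_h)
end

# Stand-in for the Excel export: a file that starts with a BOM.
Tempfile.create(['branches', '.csv']) do |f|
  f.write("\uFEFFbusiness_name\nAcme\n")
  f.flush
  rows = read_rows_skipping_bom(f.path)
  puts rows.first.keys.first == "business_name"
end
```

Each resulting hash could then be passed to Branch.create as in the answer above.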
I am new to Ruby.
I have sample input text like this:
Message:
update attributes in file and commit version
----
Modified
I need to capture the row that follows the "Message" tag. Note that the text can also start on the same line as "Message", like this:
Message:update attributes in file and commit version
I've tried this:
if line =~ /Message/
But of course it doesn't look at the following row.
Can anyone show me how to capture the rows between the "Message" and "----" tags?
If you know of any examples, please post a link.
Update: the whole code
require 'csv'

data = []
File.foreach("new7.txt") do |line|
  line.chomp!
  if line =~ /Revision/
    data.push [line]
  elsif line =~ /Author/
    if data.last and not data.last[1]
      data.last[1] = line
    else
      data.push [nil, line]
    end
  elsif line=~/^Message:(.*)^-/m
    if data.last and not data.last[2]
      data.last[2] = line
    else
      data.push [nil, nil, line]
    end
  end
end

CSV.open('new1.csv', 'w') do |csv|
  data.each do |record|
    csv << record
  end
end
Input file:
Revision: 37407
Author: imakarov
Date: 21 июня 2013 г. 10:23:28
Message:my infomation
dmitry name
Output csv file:
You can use /^Message:(.*)^---/m as your regex. The /m allows you to match across line boundaries. See http://rubular.com/r/FhqiKx0XyI
Update #1: Here's sample output from irb:
Peters-MacBook-Air-2:bot palfvin$ irb
1.9.3p194 :001 > line = "\nMessage:first-line\nsecond-line\n---\nthird-line"
=> "\nMessage:first-line\nsecond-line\n---\nthird-line"
1.9.3p194 :002 > line =~ /^Message:(.*)^-/m
=> 1
1.9.3p194 :003 > $1
=> "first-line\nsecond-line\n"
1.9.3p194 :004 >
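Because the pattern spans several lines, it has to be applied to the whole text and not to one line at a time as File.foreach does. A minimal sketch, assuming every message is terminated by a line of at least three dashes; extract_messages is a hypothetical helper name and the lazy quantifier (.*?) is swapped in so that each match stops at the nearest dashed line.

```ruby
# Hypothetical helper: returns the text of every "Message:" block,
# whether it starts on the same line as the tag or on the next one.
def extract_messages(text)
  # /m lets "." match newlines; "^" matches at the start of each line in Ruby.
  text.scan(/^Message:(.*?)^-{3,}/m).flatten.map(&:strip)
end

log = <<~LOG
  Revision: 37407
  Author: imakarov
  Message:
  update attributes in file and commit version
  ----
  Modified : /some/file
  Revision: 37406
  Message:delete files
  ----
LOG

p extract_messages(log)
```

For a real file, the text could be loaded with File.read("new7.txt") before calling the helper.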