How to read a CSV that contains double quotes (") - ruby-on-rails

I have a CSV file that looks like this:
"url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
"https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
It has 13 columns.
The issue is that every field comes back wrapped in \" and I don't want that. Also, I get 16 columns back when I read it.
This is what I have tried:
csv = CSV.new(File.open('myfile.csv'), quote_char:"\x00", force_quotes:false)
csv.read[1]
Output:
["\"https://myurl.com\"", "\"123456\"", "\"This a string\"", "\"https://myurl.com?source=5&param=1\"", "\"31-01-2020\"", "\"1\"", "\"Location´s Place\"", "\"another_string", " my_string\"", "\"key1", " key2", " key3\"", "\"anotherString\"", "\"145129\"", "\"Place in Earth\"", "\"\""]

The file you showed is a standard CSV file. There is nothing special needed. Just delete all those unnecessary arguments:
require 'csv'
csv = CSV.new(File.open('myfile.csv'))
csv.read[1]
#=> [
# "https://myurl.com",
# "123456",
# "This a string",
# "https://myurl.com?source=5&param=1",
# "31-01-2020",
# "1",
# "Location´s Place",
# "another_string, my_string",
# "key1, key2, key3",
# "anotherString",
# "145129",
# "Place in Earth",
# ""
# ]
force_quotes doesn't do anything in your code, because it controls whether or not the CSV library will quote all fields when writing CSV. You are reading, not writing, so this argument is useless.
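quote_char: "\x00" is wrong, since the quote character in the file you posted is " (a double quote), not NUL. With NUL as the quote character, every " is treated as ordinary data, which is why each field keeps its quotes. It is also why you get 16 columns: the commas inside "another_string, my_string" and "key1, key2, key3" are no longer protected by quotes, so they become separators and add three extra fields.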
quote_char: '"' would be correct, but is not necessary, since it is the default.
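To see the difference, here is a small self-contained sketch that parses the question's two sample lines from an in-memory string (so no file is needed): once with the default options, once with headers: true, and once with the question's quote_char: "\x00". The data is copied from the question; the variable names are only illustrative.

```ruby
require 'csv'

# Header row and data row copied from the question (13 columns each).
data = <<~TEXT
  "url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
  "https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""
TEXT

# Default options: quote_char is '"', so commas inside quoted fields are kept.
rows = CSV.parse(data)
puts rows[1].length   # => 13
puts rows[1][8]       # => key1, key2, key3

# With headers: true each row can be addressed by column name.
# The header "url" appears twice; row["url"] returns the first match.
table = CSV.parse(data, headers: true)
puts table[0]["id"]   # => 123456

# The question's option: '"' is no longer special, so every field keeps its
# quotes and the embedded commas split fields (13 -> 16).
broken = CSV.parse_line(data.lines[1], quote_char: "\x00")
puts broken.length    # => 16
```

When reading from disk, CSV.read('myfile.csv', headers: true) takes the same options and also closes the file for you.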

Related

Rails parse CSV separator ¦

I have a CSV file with a strange format:
2783¦Larson and Sons
967¦Becker Group
333¦Rolfson LLC
I have tried this:
CSV.foreach("#{Rails.root}/csv_files/suppliers.csv") do |supplier|
p supplier[0]
end
but I get the whole string "2783¦Larson and Sons".
How can I separate the values?
For example, I want:
supplier[0] #=> "2783"
supplier[1] #=> "Larson and Sons"
Why would you expect CSV to know how to handle this weird input? You should explicitly specify the encoding and the column separator.
CSV.read("#{Rails.root}/csv_files/suppliers.csv",
encoding: Encoding::ISO_8859_1,
col_sep: "\xC2\xA6".force_encoding(Encoding::ISO_8859_1))
#⇒ [["2783", "Larson and Sons"],
# ["967", "Becker Group"],
# ["333", "Rolfson LLC"]]
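If the file is actually saved as UTF-8 (an assumption; the ISO-8859-1 trick above matches the two bytes of "¦" as Latin-1 characters), you can likely pass "¦" itself as col_sep. A minimal sketch using the question's sample rows as an in-memory string:

```ruby
require 'csv'

# Sample rows from the question, assumed to be UTF-8 text.
data = "2783¦Larson and Sons\n967¦Becker Group\n333¦Rolfson LLC\n"

# "¦" (U+00A6) is a single character in UTF-8, so it can be the separator.
suppliers = CSV.parse(data, col_sep: "¦")
p suppliers.first   # => ["2783", "Larson and Sons"]

# Reading the real file would look something like:
# CSV.foreach(path, col_sep: "¦", encoding: "UTF-8") { |supplier| ... }
```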

Nokogiri: Clean up data output

I am trying to scrape player information from MLS sites to create a map of where the players come from, as well as other information. I am as new to this as it gets.
So far I have used this code:
require 'httparty'
require 'nokogiri'
require 'json'
require 'pry'
require 'csv'
page = HTTParty.get('https://www.atlutd.com/players')
parse_page = Nokogiri::HTML(page)
players_array = []
parse_page.css('.player_list.list-reset').css('.row').css('.player_info').map do |a|
player_info = a.text
players_array.push(player_info)
end
#CSV.open('atlantaplayers.csv', 'w') do |csv|
# csv << players_array
#end
binding.pry
The output of the pry function is:
"Miguel Almirón10\nMidfielder\n-\nAsunción, ParaguayAge:\n23\nHT:\n5' 9\"\nWT:\n140\n"
Which when put into the csv creates this in a single cell:
"Miguel Almirón10
Midfielder
-
Asunción, ParaguayAge:
23
HT:
5' 9""
WT:
140
"
I've looked into it, and I think the newline characters (\n) are what is throwing off the formatting.
My desired outcome here is to figure out how to get the pry output into the array as follows:
Miguel, Almiron, 10, Midfielder, Asuncion, Paraguay, 23, 5'9", 140
Bonus points if you can help with the accent marks on names. Also if there is going to be an issue with height, is there a way to convert it to metric?
Thank you in advance!
I've looked into it, and I think the newline characters (\n) are what is throwing off the formatting.
Yes, that's why it shows up in this odd format. You can strip the rendered text to remove the extra whitespace and newlines around it:
player_info = a.text.strip
[1] pry(main)> "Miguel Almirón10\n".strip
=> "Miguel Almirón10"
Note that strip only removes the \n at the start and end of the text, not the ones between the fields. If you wish to store the fields in a CSV in this order
Miguel, Almiron, 10, Midfielder, Asuncion, Paraguay, 23, 5'9", 140
then you might want to split the text on its newlines (and separate the name from the jersey number), and build an array for each row, so that when you push the line to the CSV file it looks like this:
csv << ["Miguel", "Almiron", 10, "Midfielder", "Asuncion", "Paraguay", 23, "5'9\"", 140]
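A minimal sketch of that split for the single player string shown in the question. It assumes every player block has exactly this line layout (name plus number, position, "-", hometown plus "Age:", age, "HT:", height, "WT:", weight), which may not hold for every player on the page. It also covers the metric question by converting feet and inches to centimetres (1 inch = 2.54 cm); height_cm is an illustrative name.

```ruby
require 'csv'

# Sample string copied from the question's pry output.
raw = "Miguel Almirón10\nMidfielder\n-\nAsunción, ParaguayAge:\n23\nHT:\n5' 9\"\nWT:\n140\n"

# strip removes the trailing "\n"; split breaks the text into one field per line.
lines = raw.strip.split("\n")

# "Miguel Almirón10" -> first name, rest of the name, trailing jersey digits.
first, last, number = lines[0].match(/\A(\S+)\s+(.+?)(\d+)\z/).captures
position = lines[1]
# "Asunción, ParaguayAge:" -> drop the "Age:" label, then split city/country.
city, country = lines[3].sub(/Age:\z/, "").split(", ")
age = lines[4]
height = lines[6]
weight = lines[8]

# 5' 9" -> 69 inches -> 175.26 cm, rounded to whole centimetres.
feet, inches = height.match(/(\d+)'\s*(\d+)"/).captures.map(&:to_i)
height_cm = ((feet * 12 + inches) * 2.54).round
puts height_cm   # => 175

row = [first, last, number, position, city, country, age, height, weight]
# Array#to_csv (from the csv library) quotes the height because it contains ".
print row.to_csv
```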
with the accent marks on names
you can use ActiveSupport's transliterate method, which replaces accented characters with their closest ASCII equivalents:
[8] pry(main)> ActiveSupport::Inflector.transliterate("Miguel Almirón10")
=> "Miguel Almiron10"
See http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-transliterate. Outside a Rails app you need to load ActiveSupport first (for example require 'active_support' and require 'active_support/inflector').
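If you'd rather not load ActiveSupport just for this, Ruby's standard library can do a rough version. This is a swapped-in technique, not the answer's: decompose each accented letter into its base letter plus a combining mark (Unicode NFD), then delete the marks. strip_accents is a hypothetical helper name.

```ruby
# Decompose (NFD) so "ó" becomes "o" + U+0301, then remove nonspacing marks.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("Miguel Almirón")  # => Miguel Almiron
puts strip_accents("Asunción")        # => Asuncion
```

Letters that don't decompose into a base letter plus a mark (such as "ø" or "ß") are left unchanged, which is where transliterate's replacement table does better.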
Here's what I would use: the i18n and people gems.
require 'people'
require "i18n"
I18n.available_locales = [:en]
@np = People::NameParser.new
players_array = []
parse_page.css('.player_info').each do |div|
name = @np.parse I18n.transliterate(div.at('.name a').text)
players_array << [
name[:first],
name[:last],
div.at('.jersey').text,
div.at('.position').text,
]
end
# => [["Miguel", "Almiron", "10", "Midfielder"],
# ["Mikey", "Ambrose", "22", "Defender"],
# ["Yamil", "Asad", "11", "Forward"],
# ...
That should get you started.

Rails validates NOT in regex

I'm trying to create a Rails 3 validation that will ensure that people are not using one of the common free email addresses.
My thought was something like this:
validates_format_of :email, :with => /^((?!gmail).*)$|^((?!yahoo).*)$|^((?!hotmail).*)$/
or
validates_exclusion_of :email, :in => %w( gmail. GMAIL. hotmail. HOTMAIL. live. LIVE. aol. AOL. ), :message => "You must use your corporate email address."
But neither works properly. Any ideas?
Basically you've written a regex that matches almost anything. Let's break it down.
/
^( # [ beginning of string
(?!gmail) # not followed by "gmail"
.* # followed by any or no characters
)$ # followed by the end of the string
| # ] OR [
^( # beginning of the string
(?!yahoo) # not followed by "yahoo"
.* # followed by any or no characters
)$ # followed by the end of the string
| # ] OR [
^( # beginning of the string
(?!hotmail) # not followed by "hotmail"
.* # followed by any or no characters
)$ # followed by the end of the string
/ # ]
When you think about it, you'll realize that the only strings that won't match are ones that start with "gmail", "yahoo", and "hotmail" all at the same time, which is impossible. An address like foo@gmail.com matches, because it doesn't start with "gmail".
What you really want is something like this:
/
.+@ # one or more characters followed by @
(?! # not followed by...
(gmail|yahoo|hotmail) # [ one of these strings
\. # followed by a literal dot
) # ]
.+ # followed by one or more characters
$ # and the end of the string
/i # case insensitive
Put it together and you have:
expr = /.+@(?!(gmail|yahoo|hotmail)\.).+$/i
test_cases = %w[ foo@gmail.com
bar@yahoo.com
BAZ@HOTMAIL.COM
qux@example.com
quux
]
test_cases.map {|addr| expr =~ addr }
# => [nil, nil, nil, 0, nil]
# (nil means no match, 0 means there was a match starting at character 0)
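Another option is to write a pattern for what you want to reject and invert the check, instead of squeezing the negation into one regex. A minimal plain-Ruby sketch; FREE_EMAIL and corporate_email? are hypothetical names. Rails' format validator accepts such a "must not match" pattern through its :without option.

```ruby
# Matches when the part after "@" starts with a free provider and a dot.
FREE_EMAIL = /@(gmail|yahoo|hotmail)\./i

# Accept only strings that contain "@" and do NOT match the blocklist.
def corporate_email?(address)
  address.include?("@") && !FREE_EMAIL.match?(address)
end

%w[foo@gmail.com bar@yahoo.com BAZ@HOTMAIL.COM qux@example.com quux].each do |addr|
  puts "#{addr}: #{corporate_email?(addr)}"
end
# foo@gmail.com: false
# bar@yahoo.com: false
# BAZ@HOTMAIL.COM: false
# qux@example.com: true
# quux: false

# In a model, the same pattern as a validation:
#   validates_format_of :email, :without => FREE_EMAIL,
#     :message => "You must use your corporate email address."
```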

Illegal quoting on line using FasterCSV in ruby 1.8.7

I am getting an "Illegal quoting" error when parsing the content of an SQL dump. The dump is a TXT file with a tab (\t) separator.
require 'rubygems'
require 'faster_csv'
begin
FasterCSV.foreach(excel_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
if col[3]!="" or !col[3].empty?
color_value=col[3].to_s.capitalize
# Insert color
color=Color.find_or_create_by_name(:name=>color_value)
elsif col[3].empty?
color_id= nil
end
end
rescue Exception => e
puts e
end
The program runs, but when it reaches invalid data like the row below (@font-face ...), execution stops with the error "Illegal quoting on line 3".
ID Name code comments
1 white 234 good
2 Black 222
3 red 343 @font-face { font-family: "Verdana"; .....}
Can anyone suggest how to skip a row when a column contains invalid data?
Thanks in advance.
I'm not sure if this will solve the error you are seeing, but you need to use double quotes around escaped characters, e.g.:
:col_sep => "\t"
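The reason is that Ruby does not process escape sequences such as \t inside single quotes, so '\t' is a backslash followed by the letter t rather than a tab. A quick check:

```ruby
single = '\t'   # single quotes: no escape processing
double = "\t"   # double quotes: a real tab character

p single.length  # => 2
p double.length  # => 1
p double.ord     # => 9 (ASCII code of the tab character)
```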
FasterCSV isn't very kind to badly formatted data.
I don't know of a general solution for this.
However, if your file doesn't actually contain any fields quoted with ",
then perhaps just use a different quote_char (e.g. ').
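You can use the ASCII NUL character as the quote character. It must be written in double quotes, as "\x00" (or "\0"); in single quotes, '\0x00' is a literal five-character string, not NUL:
FasterCSV.foreach(excel_file, :quote_char => "\x00", :col_sep => "\t", :row_sep => :auto, :headers => :first_row) do |row|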
...
end
You can find a chart of some ASCII chars here: http://www.bluesock.org/~willg/dev/ascii.html
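To actually skip bad rows, one approach is to parse line by line and rescue the parse error for each line. FasterCSV became the standard csv library in Ruby 1.9, so this sketch uses CSV and CSV::MalformedCSVError. It assumes no quoted field spans multiple lines, and the sample dump is made up to resemble the question's.

```ruby
require 'csv'

# Made-up tab-separated sample shaped like the question's dump.
dump = "ID\tName\tcode\tcomments\n" \
       "1\twhite\t234\tgood\n" \
       "2\tBlack\t222\t\n" \
       "3\tred\t343\t@font-face { font-family: \"Verdana\"; }\n"

good_rows = []
skipped = []

dump.each_line.with_index(1) do |line, lineno|
  begin
    # A stray " inside an unquoted field raises MalformedCSVError.
    good_rows << CSV.parse_line(line, col_sep: "\t")
  rescue CSV::MalformedCSVError
    skipped << lineno
  end
end

p good_rows.length  # => 3 (header plus two data rows)
p skipped           # => [4]
```

On Ruby 2.4 and later, passing liberal_parsing: true to CSV makes it accept such stray quotes instead of raising, which may be preferable if you want to keep those rows.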

Why does YAML.dump add quotes to this key-value pair

I'm trying to write a new entry to a Rails database.yml, and for some reason I'm getting quotes around this entry
db_yml = {'new_env' => {'database' => 'database_name', '<<' => '*defaults' }}
File.open("#{RAILS_ROOT}/config/database.yml", "a") {|f| YAML.dump(db_yml, f)}
writes
---
new_env:
  database: database_name
  "<<": "*defaults"
I don't know why the "---" and the quotes around << and *defaults are written. Any thoughts on how to prevent this?
thanks!
<< and * have special meaning in YAML: << is the merge key and * marks an alias.
The quotes show that here << is not a merge and * is not an alias; both are plain strings.
The --- just marks the start of the YAML document.
The double quotes around << are there because << could otherwise be interpreted as YAML syntax, so it is escaped.
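If the goal is a real merge of the defaults anchor (which is what <<: *defaults means in database.yml), YAML.dump won't produce it from the plain string "*defaults", so one option is to append the entry as YAML text instead. A sketch under assumptions: the existing file defines a &defaults anchor, and the adapter value below is made up. The YAML.safe_load call only checks that the merge resolves; its aliases: true keyword needs Psych 3.1+ (Ruby 2.6+).

```ruby
require 'yaml'

# Hypothetical existing database.yml content with a &defaults anchor.
existing = <<~YAML
  defaults: &defaults
    adapter: postgresql
YAML

# Written as YAML text so "<<: *defaults" stays a real merge key.
entry = <<~YAML
  new_env:
    <<: *defaults
    database: database_name
YAML

# Appending to the real file would then be something like:
# File.open("#{Rails.root}/config/database.yml", "a") { |f| f.write(entry) }

# Load the combined text to confirm new_env inherits the defaults.
config = YAML.safe_load(existing + entry, aliases: true)
puts config["new_env"]["adapter"]   # => postgresql
```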
