I'll preface this by saying that I'm still learning Ruby.
I'm writing a script to parse a .csv and identify possible duplicate records in the data-set.
I have a .csv file with headers, so I'm parsing the data so that I can access each row using a header title as such:
@contact_table = CSV.parse(File.read("app/data/file.csv"), headers: true)
# Prints all last names in table
puts @contact_table['last_name']
I'm trying to iterate over each row in the table and identify if the last name I'm currently iterating over is similar to the next last name, but I'm having trouble doing this. I guess the way I'm handling it is as if it's an array, but I checked the type and it's a CSV::Row.
example (this doesn't work):
@contact_table.each_with_index do |c, i|
puts "first contact is #{c['last_name']}, second contact is #{c[i + 1]['last_name']}"
end
I realized this doesn't work like this because the table isn't an array, it's a CSV::Row like I previously mentioned. Is there any method that can achieve this? I'm really blanking right now.
My csv looks something like this:
id,first_name,last_name,company,email,address1,address2,zip,city,state_long,state,phone
1,Donalt,Canter,Gottlieb Group,dcanter0@nydailynews.com,9 Homewood Alley,,50335,Des Moines,Iowa,IA,515-601-4495
2,Daphene,McArthur,"West, Schimmel and Rath",dmcarthur1@twitter.com,43 Grover Parkway,,30311,Atlanta,Georgia,GA,770-271-7837
@contact_table should be a CSV::Table, which is a collection of CSV::Rows, so in this:
@contact_table.each_with_index do |c, i|
...
end
c is a CSV::Row. That's why c['last_name'] works. The problem is that here:
c[i + 1]['last_name']
you're looking at c (a single row) instead of @contact_table. If you said:
@contact_table[i + 1]['last_name']
then you'd get the next last name or, when c is the last row, an exception because @contact_table[i+1] will be nil.
Also, inside the iteration, c is the current (or (i+1)th) row and won't always be the first.
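One way to compare each row with the next without indexing past the end is Enumerable#each_cons, which CSV::Table supports because it includes Enumerable. This is only a sketch: the inline sample data and the local variable name contact_table are made up for illustration.

```ruby
require 'csv'

# Sample data standing in for app/data/file.csv (made up for illustration).
data = <<~DATA
  id,first_name,last_name
  1,Donalt,Canter
  2,Daphene,McArthur
  3,Dana,McArthur
DATA

contact_table = CSV.parse(data, headers: true)

# each_cons(2) yields every pair of neighbouring rows and stops at the
# last pair, so there is never a nil "next" row to guard against.
pairs = contact_table.each_cons(2).map do |current, following|
  [current['last_name'], following['last_name']]
end

pairs.each do |first, second|
  puts "first contact is #{first}, second contact is #{second}"
end
```

Whatever similarity check you settle on can then be applied to each pair inside the block.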
What is your use case for this? Seems like a school project?
I recommend CSV.foreach instead of CSV.parse (see this comparison). I would probably use a Set for this.
Create a Set outside of the scope of parsing the file (i.e., above the parsing code). Let's call it rows.
Call rows.include?(row) during each iteration while parsing the file
If true, then you know you have a duplicate
If false, then call rows.add(row) to add the new row to the set
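You could also just fill your set with an individual value from a column that must be distinct (e.g., row.field(:some_column_name)), such as email or phone number, and do the same inclusion check for that. Note that symbol keys only work if you parse with header_converters: :symbol; otherwise pass the header as a string.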
(If this is for a real app, please don't do this. Use model validations instead.)
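The Set approach described above might look roughly like this. It is a sketch under assumptions: the temporary file and its contents are made up so the example runs on its own, and email is assumed to be the column that must be distinct.

```ruby
require 'csv'
require 'set'
require 'tempfile'

# Throwaway file standing in for the real CSV (contents are made up).
file = Tempfile.new(['contacts', '.csv'])
file.write(<<~DATA)
  id,last_name,email
  1,Canter,dcanter0@nydailynews.com
  2,McArthur,dmcarthur1@twitter.com
  3,Canter,dcanter0@nydailynews.com
DATA
file.close

seen = Set.new    # values encountered so far
duplicates = []   # rows whose email was already in the set

# CSV.foreach streams the file one row at a time.
CSV.foreach(file.path, headers: true) do |row|
  email = row['email']
  if seen.include?(email)
    duplicates << row.to_h
  else
    seen.add(email)
  end
end
file.unlink

puts duplicates.inspect
```

Storing a single column in the set only catches exact matches on that column; spotting "similar" records would need a normalisation step before the inclusion check.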
I would use #read instead of #parse and do something like this:
require 'csv'
LASTNAME_INDEX = 2
data = CSV.read('data.csv')
data[1..-1].each_with_index do |row, index|
puts "Contact number #{index + 1} has the following last name : #{row[LASTNAME_INDEX]}"
end
#~> Contact number 1 has the following last name : Canter
#~> Contact number 2 has the following last name : McArthur
Our task is to create a table, read values into it using a loop, and print the values after the process is complete:
- Create a table.
- Read the number of values to be read into the table.
- Read the values into the table using a loop.
- Print the values in the table using another loop.
For this we wrote the following code:
local table = {}
for value in ipairs(table) do
io.read()
end
for value in ipairs(table) do
print(value)
end
We're not sure where we went wrong; please help us. Our result is:
Input (stdin)
3
11
22
abc
Your Output (stdout)
~ no output ~
Expected Output
11
22
abc
Correct Code is
local table1 = {}
local x = io.read()
for line in io.lines() do
table.insert(table1, line)
end
for K, value in ipairs(table1) do
print(value)
end
Let's walk through this step-by-step.
Create a table.
Though the syntax is correct, table is a pre-defined global name in Lua, and thus should not be declared as a variable name, to avoid future issues. Instead, you'll want to use a different name. If you're insistent on using the word table, you'll have to distinguish it from the global table. The easiest way to do this is to change it to Table, as Lua is a case-sensitive language. Therefore, your table creation should look something like:
local Table = {}
Read values to the table using a loop.
Though Table is now established as a table, your for loop is only iterating through an empty table. It seems your goal is to iterate through the io.read() instead. But io.read() is probably not what you want here, though you can utilize a repeat loop if you wish to use io.read() via table.insert. However, repeat requires a condition that must be met for it to terminate, such as the length of the table reaching a certain amount (in your example, it would be until (#Table == 4)). Since this is a task you are given, I will not provide an example, but allow you to research this method and use it to your advantage.
Print the values after the process is complete.
You are on the right track with your printing loop. However, it must be noted that iterating through a table with ipairs yields two values on each step, an index and a value. Your loop only captures the first of these, the index number, so your output would simply be:
1
2
3
4
If you are wanting the actual values, you'll need a placeholder for the index. Oftentimes, the placeholder for an unneeded variable in Lua is the underscore (_). Modify your for loop to account for the index, and you should be set.
Try modifying your code with the suggestions I've given and see if you can figure out how to achieve your end result.
Edited:
Thanks, Piglet, for corrections on the insight! I'd forgotten table itself wasn't a function, and wasn't reserved, but still bad form to use it as a variable name whether local or global. At least, it's how I was taught, but your comment is correct!
Using the helpful answers from other questions here, I set up an import in a controller to add CSV values to a DB in Rails. I continue to get the error below, which doesn't name the unknown attribute (although I've quadruple-checked that they match); it only gives a pair of empty quotation marks. Very frustrating. I've rebuilt this a couple of times and yet the same error continues to come up. Does anyone have a hunch as to why?
Also, is there an easier/more reliable/more straightforward way to add these CSV values to my rails DB, perhaps using different technologies/apps?
ActiveModel::UnknownAttributeError in MatchesController#index
unknown attribute '' for Match.
def index
require 'csv'
CSV.foreach(Rails.root.join('skedupdate.csv'), :headers => true) do |row|
Match.create!(row.to_hash)
end
end
Generally, that error happens when you have a column (name) in the row that does not exist in the model, or in your case the row has a key '' (an empty name) that does not exist.
You should probably print what is in row.to_hash to see the contents and to check that all the keys exist in the Match model.
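As a sketch of that inspection step: the sample below assumes the blank key comes from a trailing comma in the header line, which is one possible cause and not a confirmed diagnosis. The column names home_team and away_team are hypothetical, and no Rails model is involved.

```ruby
require 'csv'

# Made-up file contents; note the trailing comma in the header line,
# which creates a third column with no name.
data = <<~DATA
  home_team,away_team,
  Lions,Tigers,
DATA

cleaned = CSV.parse(data, headers: true).map do |row|
  attrs = row.to_hash
  p attrs  # inspect the raw keys before handing them to the model
  # Drop any key that is nil or blank, since no column can match it.
  attrs.reject { |key, _value| key.to_s.strip.empty? }
end

p cleaned
```

In the controller, the cleaned hash would be what gets passed to Match.create!, once the printed keys confirm where the blank one comes from.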
Sometimes this error occurs when the headers in the CSV are named for the developer's convenience rather than after the column names of the model's table.
In your case, if you use the exact database column names in the first row of the CSV, it will solve your issue.
For example, if there is a Holiday table with two columns, "date" and "holiday", then you have to use the same names in the first row of your CSV file too,
i.e. "date" and "holiday".
I solved my issue this way. Maybe it will work in your case too.
I have a CSV file like:
Header: 1,2,3,4
content: a,b,c,d,e
a,b,c,d
a,b
a,b,c,d,d
Is there any CSV method that I can use to easily validate the column consistency instead of
parsing the CSV line by line?
One way or another the whole file has to be read.
Here is a relatively simple way. First the file is read and converted to an array, which is then mapped to another array of lengths (the number of fields per row). This array is then checked to see whether the length is always the same.
If you'd rather not hold the whole file in memory, you could remember the length of the header and, while you parse the file, check that each record has the same number of fields, raising an exception otherwise.
require "csv"
def valid? file
a = CSV.read(file).map { |e|e.length }
a.min == a.max
end
p valid?("data.csv")
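The second idea mentioned above, comparing every row against the header's width, could be sketched like this. The method name inconsistent_rows is made up, and it collects the offending row numbers instead of raising, which is a deliberate variation; for a file on disk, CSV.foreach with a counter would avoid loading everything at once.

```ruby
require 'csv'

# Returns the 1-based numbers of the rows whose field count differs
# from that of the first (header) row.
def inconsistent_rows(csv_text)
  expected = nil
  bad = []
  CSV.parse(csv_text).each_with_index do |fields, index|
    expected ||= fields.length   # the first row sets the expected width
    bad << index + 1 if fields.length != expected
  end
  bad
end

sample = <<~DATA
  1,2,3,4
  a,b,c,d,e
  a,b,c,d
  a,b
  a,b,c,d,d
DATA

p inconsistent_rows(sample)
```

Returning the row numbers makes it easier to report which records are malformed than a plain true/false answer does.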
csv_validator gem would be helpful here.
Arrays have always been my downfall in every language I've worked with, but I'm in a situation where I really need to create a dynamic array of multiple items in Rails (note - none of these are related to a model).
Briefly, each element of the array should hold 3 values - a word, its language, and a translation into English. For example, here's what I'd like to do:
myArray = Array.new
And then I'd like to push some values to the array (note - the actual content is taken from elsewhere - although not a model - and will need to be added via a loop, rather than hard coded as it is here):
myArray[0] = [["bonjour"], ["French"], ["hello"]]
myArray[1] = [["goddag"], ["Danish"], ["good day"]]
myArray[2] = [["Stuhl"], ["German"], ["chair"]]
I would like to create a loop to list each of the items on a single line, something like this:
<ul>
<li>bonjour is French for hello</li>
<li>goddag is Danish for good day</li>
<li>Stuhl is German for chair</li>
</ul>
However, I'm struggling to (a) work out how to push multiple values to a single array element and (b) how I would loop through and display the results.
Unfortunately, I'm not getting very far at all. I can't seem to work out how to push multiple values to a single array element (what normally happens is that the [] brackets get included in the output, which I obviously don't want - so it's possibly a notation error).
Should I be using a hash instead?
At the moment, I have three separate arrays, which is what I've always done, but I don't particularly like it - that is, one array to hold the original word, one array to hold the language, and a final array to hold the translation. While it works, I'm sure there is a better approach - if I could work it out!
Thanks!
Ok, let's say you have the words you'd like in a CSV file:
# words.csv
bonjour,French,hello
goddag,Danish,good day
stuhl,German,chair
Now in our program we can do the following:
words = []
File.foreach('words.csv') do |line|
# chomp removes the newline at the end of the line
# split(',') will split the line on commas and return an array of the values
# We then push the array of values onto our words array
words.push(line.chomp.split(','))
end
After this code is executed, the words array has three items in it; each item is an array built from one line of our file.
words[0] # => ["bonjour", "French", "hello"]
words[1] # => ["goddag", "Danish", "good day"]
words[2] # => ["stuhl", "German", "chair"]
Now we want to display these items.
puts "<ul>"
words.each do |word|
# word is an array, word[0], word[1] and word[2] are available
puts "<li>#{word[0]} is #{word[1]} for #{word[2]}</li>"
end
puts "</ul>"
This gives the following output:
<ul>
<li>bonjour is French for hello</li>
<li>goddag is Danish for good day</li>
<li>stuhl is German for chair</li>
</ul>
Also, you didn't ask about it, but you can access part of a given array by using the following:
words[0][1] # => "French"
This is telling ruby that you want to look at the first (Ruby arrays are zero based) element of the words array. Ruby finds that element (["bonjour", "French", "hello"]) and sees that it's also an array. You then asked for the second item ([1]) of that array and Ruby returns the string "French".
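On the question of whether a hash would be better: an array of hashes is a common alternative to nested arrays, because each value is then looked up by name instead of by position. A minimal sketch, with the keys :text, :language and :translation chosen purely for illustration:

```ruby
# Each entry is a hash, so fields are read by name rather than by index.
words = [
  { text: 'bonjour', language: 'French', translation: 'hello' },
  { text: 'goddag',  language: 'Danish', translation: 'good day' },
  { text: 'Stuhl',   language: 'German', translation: 'chair' }
]

# Build one list item per entry.
items = words.map do |w|
  "<li>#{w[:text]} is #{w[:language]} for #{w[:translation]}</li>"
end

puts "<ul>"
puts items
puts "</ul>"
```

New entries can be appended in a loop with words << { text: ..., language: ..., translation: ... }, exactly as with the nested arrays.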
You mean something like this?
myArray.map{|s|"<li>#{[s[0],'is',s[1],'for',s[2]].join(" ")}</li>"}
Thanks for your help guys! I managed to figure a solution out based on your advice
For the benefit of anyone else who stumbles across this problem, here's my elided code. NB: I use three variables called text, language and translation, but I suppose you could replace these with a single array with three separate elements, as Jason suggests above.
In the Controller (content is being added via a loop):
#loop start
my_array.push(["#{text}", "#{language}", "#{translation}"])
#loop end
In the View:
<ul>
<% my_array.each do |item| %>
<li><%= item[0] %><%# 0 is the original text %> is
<%= item[1] %><%# 1 is the language %> for
<%= item[2] %><%# 2 is the translation %></li>
<% end %>
</ul>
Thanks again!
I'm trying to limit the number of times I do a mysql query, as this could end up being 2k+ queries just to accomplish a fairly small result.
I'm going through a CSV file, and I need to check that the format of the content in the csv matches the format the db expects, and sometimes I try to accomplish some basic clean-up (for example, I have one field that is a string, but is sometimes in the csv as jb2003-343, and I need to strip out the -343).
The first thing I do is get from the database the list of fields by name that I need to retrieve from the csv, then I get the index of those columns in the csv, then I go through each line in the csv and get each of the indexed columns
get_fields = BaseField.find_by_group(:all, :conditions=>['group IN (?)',params[:group_ids]])
csv = CSV.read(csv.path)
first_line = csv.first # CSV.read already returns each row as an array of fields
# declared outside the block so they survive from one iteration to the next
col_indexes = []
csv_data = []
csv.each_with_index do |row, i|
  if i == 0
    # header row: remember where each wanted column sits
    get_fields.each do |col|
      col_indexes << row.index(col.name)
    end
  else
    csv_row = []
    col_indexes.each do |col|
      #possibly check the value here against another mysql query but that's ugly
      csv_row << row[col]
    end
    csv_data << csv_row
  end
end
The problem is that when I'm adding the content of the csv_data for output, I no longer have any connection to the original get_fields query. Therefore, I can't seem to say 'does this match the type of data expected from the db'.
I could work my way back through the same process that got me down to that level, and make another query like this
get_cleanup = BaseField.find_by_csv_col_name(first_line[col])
if get_cleanup.format==row[col].is_a
csv_row << row[col]
else
# do some data clean-up
end
but as I mentioned, that could mean the get_cleanup is run 2000+ times.
Instead of doing this, is there a way to search within the original get_fields result for the name, and then get the associated field?
I tried searching for 'search rails object', but kept getting back results about building search, not searching within an already existing object.
I know I can do array.search, but don't see anything in the object api about search.
Note: The code above may not be perfect, because I'm not running it yet, just wrote that off the top of my head, but hopefully it gives you the idea of what I'm going for.
When you populate your col_indexes array, rather than storing a single value, you can store a hash which includes index and the datatype.
get_fields.each do |col|
col_info = {:row_index => row.index(col.name), :name => col.name, :format => col.format}
col_indexes << col_info
end
You can then access all your data in the for loop
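Put together, the idea could look something like the sketch below. Everything here is illustrative: the Field struct is a hypothetical stand-in for the BaseField records, the header and row values are made up, and the clean-up rule is only an example of using the stored format.

```ruby
# Hypothetical stand-in for the BaseField records returned by the query.
Field = Struct.new(:name, :format)
get_fields = [Field.new('ref', :string), Field.new('qty', :integer)]

header = %w[qty note ref]          # made-up first line of the CSV
row    = %w[12 hello jb2003-343]   # made-up data line

# One pass over the already-loaded fields; no further queries are needed.
col_indexes = get_fields.map do |col|
  { row_index: header.index(col.name), name: col.name, format: col.format }
end

csv_row = col_indexes.map do |info|
  value = row[info[:row_index]]
  # Example clean-up keyed on the stored format (the rule is illustrative only).
  info[:format] == :string ? value.sub(/-\d+\z/, '') : value
end

# The in-memory result can also be searched by name with Enumerable#find.
ref_field = get_fields.find { |f| f.name == 'ref' }

p csv_row
p ref_field.format
```

Because the format travels with each index, the per-cell check never has to go back to the database.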