I'm trying to allow users to upload a CSV/Excel document and parse it using Roo (the most suggested option I've seen), but I'm having some trouble figuring it out.
Current Script
require 'roo'
xlsx = Roo::Excelx.new("./TestAppXL.xlsx")
xlsx.each_row_streaming do |row|
puts row.inspect # Array of Excelx::Cell objects
end
This was the only one I was able to get to work. It returns what looks to be JSONB.
What I'm trying to do is a multi-part process:
A) User uploads a list of 'cards' to my website. (Trying to allow as many options as possible: CSV, Excel, etc.)
B) It instantly returns a list of the headers and asks 'Which header is name, quantity, etc.?'
C) I parse the data for specific headers and do 'X'.
B is what I primarily need help with. I'm struggling to figure out Roo exactly. I won't have control over the headers, so I can't use fixed column numbers.
(Adding in Rails tag since this will be in a controller in the end, maybe an easier way to do it.)
Updated Script
I've actually made a lot of progress. Still trying to get closer to my original request.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets
puts xlsx.column(CardName.to_i)
# => Returns basic info about the spreadsheet file
Need a lot more logic on the gets but currently if I put in '3' it will return all content of Column 'CardName'. Working on iterating over the rows now.
Pseudo-working script
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets.to_i
specHeader = xlsx.cell(headers,CardName)
# drop(1) skips the header cell, so only the data cells are printed
xlsx.column(CardName).drop(1).each do |item|
  puts item
end
This is actually performing as expected, and I can start feeding the file into a Rake job now. Still working on some of the iteration but this is very close.
I made you a generic way to extract data out of a Roo spreadsheet based on a few header names which would be the convention to use by your uploaders.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
first_row = xlsx.first_row
headers = ['CardName', 'Item']
headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
begin
xlsx.drop(first_row).each do |row|
p [row[CardName], row[Item]]
end
rescue
# the required headers are not all present
end
I suppose the only line that needs explaining is headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
For each header name, const_set defines a constant with that name and assigns it the index of the first cell in xlsx.row(first_row) (our header row) that matches the regular expression /#{h}/i. The #{} around h interpolates its value ('CardName' in the first case), and the i at the end makes the match case-insensitive. So the constant CardName ends up holding the index of the CardName column in the header row, or nil if no header matches.
Instead of the rather clumsy begin/rescue structure, you could check with const_get that all required constants are non-nil and act on that, rather than catching the error.
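The check mentioned above can also be sketched without defining constants at all: build a hash from header name to column index and see which entries came back nil. This is only an illustration under a few assumptions: header_indexes and REQUIRED_HEADERS are hypothetical names, and plain arrays stand in for what xlsx.row(xlsx.first_row) and the data rows would return.

```ruby
# Hypothetical list of the headers the uploader is expected to provide.
REQUIRED_HEADERS = %w[CardName Item].freeze

# Maps each wanted header name to its column index in header_row,
# or to nil when no cell matches (case-insensitive, like /#{h}/i).
def header_indexes(header_row, wanted)
  wanted.to_h do |name|
    pattern = /#{Regexp.escape(name)}/i
    [name, header_row.index { |cell| cell.to_s.match?(pattern) }]
  end
end

# A plain array stands in for xlsx.row(xlsx.first_row).
header_row = ['Qty', 'cardname', 'Item', nil]
indexes = header_indexes(header_row, REQUIRED_HEADERS)
missing = indexes.select { |_name, index| index.nil? }.keys

if missing.empty?
  # Made-up data row, standing in for one row of the spreadsheet.
  data_row = [4, 'Card A', 'single', nil]
  p [data_row[indexes['CardName']], data_row[indexes['Item']]]
else
  puts "Missing headers: #{missing.join(', ')}"
end
```

A hash keeps the lookup local to the import, whereas constants defined on Kernel stay around for the life of the process, which may matter once this runs inside a long-lived Rails server.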
EDIT
Instead of the p [row[CardName], row[Item]] you could check the values and do anything with them. Just keep in mind that if this is going to be part of a Rails app or other website, the interaction with the user is going to be trickier than your puts and gets example. E.g. something like
headers = ['CardName', 'Item', 'Condition', 'Collection']
...
xlsx.drop(first_row).each do |row|
if row[CardName].nil? || row[Item].nil?
# let the user know or skip
else
condition, collection = row[Condition], row[Collection]
# and do something with it
end
end
Related
I am rather new to Rails and web programming; apologies for possibly basic questions.
My website needs to upload a CSV file with the format:
Course_name #header
course1

Module_name #header
module1

Task_name,Task_description,Task_expected_result #header
mod1 task1 test,mod1 task2 test description,mod1 task1 test result
mod1 task2 test,mod1 task2 test description,mod1 task2 test result
mod1 task3 test,mod1 task3 test description,mod1 task3 test result

Module_name #header
module2

Task_name,Task_description,Task_expected_result
mod2 task1 test,mod2 task1 test description,mod2 task1 test result
mod2 task2 test,mod2 task2 test description,mod2 task2 test result
My database is set up that a course will have many modules, which in turn have many tasks.
Course > many modules > many tasks.
On my website I would like to upload the .csv file and then hit a button to upload the course.
I need the reading of the file, and thus the creation of the table entries, to go as follows:
Read Course_name until a blank line is hit, create a course using that name, and grab the course_id of the newly created course.
Read Module_name until a blank line is hit, create the module using that name and the course_id (which is how they are connected), and grab that module_id.
Then read Task_name, Task_description and Task_expected_result and create a task using those values and the module_id; do this until a blank line is read.
Then, if not at EOF and another Module_name is read, repeat from module creation to task creation until EOF.
I know this is a lot to ask. I've tried searching online for help but I have not had any luck. Any help with model/controller/view code would be greatly appreciated.
You will need to modify this in order to get it to find/populate your objects, but this uses strings to get the data from the CSV.
First off, I split your "CSV" based on there being a blank-line gap between sections (as I said before, this is a lot like trying to send several dictionary/database files inside a CSV, rather than using a better computer interchange format).
Using the sample view that you added in the comment (as HAML):
= form_with(multipart: true) do |f|
= f.file_field :file, accept: '.csv'
= submit_tag 'Read CSV'
Your file will now be passed into Rails as an ActionDispatch::Http::UploadedFile.
We can access the contents of this file using the #read method:
mycsv = params[:file].read
My code for splitting, formatting, etc. (assuming that the format of the file never changes):
require 'csv'
# first split the "string" into sections
csvarr = mycsv.split(/^$/).map(&:strip)
# the first section is going to be the course. so "shift" it out of the array,
# leaving us with an even number of remaining elements.
# since it's assumed that the first section will be the course, I don't bother
# checking that it is, and just store its value
course_name = csvarr.shift.lines.last
# You probably want to do a "course = Course.find_or_create_by(name: course_name)"
# now loop through the remaining elements, 2 at a time
csvarr.each_slice(2) do |mod|
# the first element in a block of 2 will be the header
modname = mod[0].lines.last
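# You probably want to do something like "mod_record = CourseModule.find_or_create_by(course: course, name: modname)"
# (CourseModule is a placeholder name: `module` is a reserved word and Module is a core Ruby class)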
# the second element will be the associated data
modcsv = CSV.parse(mod[1], headers: true)
# loop through the data to show that we have it all correctly
modcsv.each do |task|
puts "Course: #{course_name} Module: #{modname} Task: #{task["Task_name"]}\n"
end
end
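To try the same idea outside Rails, here is a self-contained sketch that runs the split on an inline copy of the sample and collects the result into a nested hash instead of creating records. It assumes that sections are separated by blank lines and that the #header annotations are not part of the real file; parse_course is a hypothetical helper name, not something from the original answer.

```ruby
require 'csv'

SAMPLE = <<~DATA
  Course_name
  course1

  Module_name
  module1

  Task_name,Task_description,Task_expected_result
  mod1 task1 test,mod1 task1 test description,mod1 task1 test result
  mod1 task2 test,mod1 task2 test description,mod1 task2 test result

  Module_name
  module2

  Task_name,Task_description,Task_expected_result
  mod2 task1 test,mod2 task1 test description,mod2 task1 test result
DATA

# Returns { course: String, modules: [{ name: String, tasks: [Hash] }] }.
def parse_course(text)
  # Sections are separated by one or more blank lines.
  sections = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  # The first section is the course; its last line is the name.
  course_name = sections.shift.lines.last.strip
  # The rest come in pairs: a module-name section, then its task CSV.
  modules = sections.each_slice(2).map do |name_section, task_section|
    tasks = CSV.parse(task_section, headers: true).map(&:to_h)
    { name: name_section.lines.last.strip, tasks: tasks }
  end
  { course: course_name, modules: modules }
end

course = parse_course(SAMPLE)
course[:modules].each do |mod|
  mod[:tasks].each do |task|
    puts "Course: #{course[:course]} Module: #{mod[:name]} Task: #{task['Task_name']}"
  end
end
```

Returning plain hashes first makes the parsing easy to check on its own; the find_or_create_by calls can then walk that structure in the controller or model.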
I'm trying to scrape the data from a table, and I'm unable to figure out why I can't iterate through the data gathered.
I'd like to iterate through each node in the table, getting the text within but the statement only works if written outside my loop.
Take a look at the terminal output I'm getting.
It's clear that puts outside of the loop works fine, but the same line fails in the loop. If I use
puts entry.children[1]
I get the proper response in the loop, but adding children.text is what causes it to fail:
# Library names are lowercase; capitalized names only load on case-insensitive file systems
require 'httparty'
require 'nokogiri'
require 'json'
require 'pry'
require 'csv'
module Guns
class Scraper
page = HTTParty.get('http://www.gunviolencearchive.org/last-72-hours')
parse_page = Nokogiri::HTML(page)
incidents = Array.new
raw_table = parse_page.css('#block-system-main').css('.sticky-enabled')
table_entries = raw_table.xpath('//tbody')[0].children
state = table_entries.children[1].children.text
puts table_entries.children[1].children.text
table_entries.each do |entry|
puts entry.children[1].children.text
end
Pry.start(binding)
end
end
I might be able to do string slicing on the client side of the final program if I can't solve this problem but I'd rather not have to.
The error means something in the chain is nil: either table_entries itself or, inside the loop, entry.children[1] for at least one entry. As a result you are calling the children method on nil, which does not define it. You might be tempted to use the try method (provided by ActiveSupport) like this:
table_entries.children[1].try(:children).try(:text)
But unless you know why these values are nil and can confirm that they are not detrimental to the design of the app, you are just allowing your code to fail silently.
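Since try comes from ActiveSupport, a plain Ruby script like the one in the question would normally use the safe navigation operator (&., Ruby 2.3+) for the same effect. The sketch below shows it next to an explicit skip of the nil cases. One plausible cause, not confirmed by the question, is that some of the children of tbody are whitespace text nodes with no children of their own. Node is a hypothetical stand-in struct here, not Nokogiri's API.

```ruby
# Hypothetical stand-in for parsed nodes; real code would use Nokogiri nodes.
Node = Struct.new(:text, :children)

# Returns the text of the second child, or nil when there is no such child.
def second_child_text(entry)
  # &. returns nil instead of raising NoMethodError on a nil receiver.
  entry.children[1]&.text
end

table_row  = Node.new('', [Node.new("\n", []), Node.new('Texas', [])])
whitespace = Node.new("\n", []) # e.g. the text between two rows

entries = [whitespace, table_row]

# compact drops the nils, so only entries that really had a second child remain.
texts = entries.map { |entry| second_child_text(entry) }.compact
puts texts
```

Dropping the nils deliberately, rather than chaining try everywhere, keeps it visible which entries were skipped and why.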
I'm getting the error invalid byte sequence in UTF-8 when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub method to compare one of the CSV columns to a field in my database.
When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client's address.
I then have a citation model (if you're familiar with local SEO you'll know this term). The citation model doesn't have an address field, but it has a nap_correct? field (NAP stands for "Name", "Address", "Phone Number"). If the name, address, and phone number for a CSV row is equivalent to what I have in the database for that client, the nap_correct? field for that citation gets set to "correct".
Here's what the import method looks like in my citation model:
def self.import(file, client_id)
  @client = Client.find(client_id)
  CSV.foreach(file.path, headers: true) do |row|
    @row = row.to_hash
    @citation = Citation.new
    if @row["Address"]
      if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
        @citation.nap_correct = true
      end
    end
    @citation.name = @row["Domain"]
    @citation.listing_url = @row["Citation Link"]
    @citation.save
  end
end
And then here's what the alt_addresses property looks like in my client model:
def alt_addresses
address = self.address.downcase.gsub(/\W+/, '')
address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
return [address, address_with_zip]
end
I'm using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub method I started getting the invalid byte-sequence error.
I'm using Ruby 2.1.3. I've noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
# your code here
end
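What the 'iso-8859-1:utf-8' option does (read as ISO-8859-1, transcode to UTF-8) can be reproduced on a single string, which may help when checking where the bad bytes come from. This is a small sketch with a made-up address; it assumes the file really is ISO-8859-1, which the question does not confirm.

```ruby
# "\xE9" is an e with an acute accent encoded as ISO-8859-1; .b keeps the raw bytes.
raw = "Caf\xE9 Street 12".b

# Labelled as UTF-8 without converting, the bytes are invalid.
# This is the state in which regex methods such as gsub raise.
mislabelled = raw.dup.force_encoding('UTF-8')
puts mislabelled.valid_encoding?

# Option 1: declare the real source encoding, then transcode to UTF-8.
converted = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts converted.valid_encoding?
puts converted.downcase.gsub(/\W+/, '')

# Option 2: when the source encoding is unknown, drop the invalid bytes.
scrubbed = mislabelled.scrub('')
puts scrubbed
```

Transcoding keeps the accented characters, while scrub discards them, so the first option is preferable whenever the source encoding is known.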
One way I've found to get around this is to "Save As" in OpenOffice or LibreOffice, click "Edit Filter Settings", make sure the character set is UTF-8, and save. Bottom line: use some external tool to convert the characters to UTF-8 before loading the file into Ruby. This issue can be a true f-ing labyrinth within Ruby alone.
A Unix tool called iconv can apparently do this sort of thing. https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
I want to add an import function for the users of my Rails app. However, the files that they will import won't have a header, and the interesting data will start at row 8. In each row I only need 2 fields.
Here is an example of a line in the xlsx file:
751,"01/17/2015","11:17:32","60","TDFSRDSK","2","10","-1","0","3","","26","3","","","1","0"
I only need the date and the number in the 4th field (60), which I'll add to an SQL table.
I have a problem with the mapping and how to do it. I've tried to do it based on the railscast tutorial and roo doc but I can't manage to make it work.
def self.import(file)
xlsx = Roo::Excelx.new(file)
xlsx.each_row do |row|
date = row[2]
value = row[4]
user_id = current_user.id
product.create(:date => date, :valeur => value, :user_id => user_id)
end
end
And the error I get:
no implicit conversion of ActionDispatch::Http::UploadedFile into String
I'm really new to rails/ruby so I'm not even sure the mapping code is supposed to be like that.
Roo expects a file path (a String) here, not the upload object itself, so pass it the path of the uploaded temp file:
xlsx = Roo::Excelx.new(file.path)
You can refer to the relevant Rails guide for details on how this works.
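Once the file opens, skipping the leading rows and picking the two fields could look roughly like the sketch below. It uses the standard CSV library on an inline copy of the sample line so that it runs without Roo; extract_readings and HEADER_ROWS are hypothetical names, and the field positions (date in the 2nd field, value in the 4th) are taken from the sample line. With Roo, cells may already arrive as Date or numeric objects, so the conversions would need adjusting.

```ruby
require 'csv'
require 'date'

# The interesting data starts at row 8, so the first 7 rows are skipped.
HEADER_ROWS = 7

# rows is an array of arrays, one inner array per spreadsheet row.
def extract_readings(rows)
  rows.drop(HEADER_ROWS).map do |row|
    {
      date: Date.strptime(row[1], '%m/%d/%Y'), # 2nd field, index 1
      value: Integer(row[3])                   # 4th field, index 3
    }
  end
end

# Seven placeholder lines followed by (part of) the sample line.
sample = ("ignored header line\n" * HEADER_ROWS) +
         %(751,"01/17/2015","11:17:32","60","TDFSRDSK","2"\n)

readings = extract_readings(CSV.parse(sample))
readings.each { |reading| puts "#{reading[:date]} #{reading[:value]}" }
```

Note that array indexes are zero-based, so the 4th field is row[3]; current_user is also not available inside a model, so the user id would have to be passed into the import method as an argument.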
I'm trying to save the results of a survey to a CSV file, so that every time the survey is completed it adds a new line to the file. I have code that exports database rows to a CSV and lets you download it, but I don't know how to incorporate saving the survey to begin with, or whether this is even possible. I have a CSV file set up with the correct headers.
When your create action is called (the controller action that the form's submit is directed to; create on REST controllers), you can add some custom logic there to convert the form data into the CSV structure you want.
Ruby has a CSV library in its standard library, which can be used to both read and write CSV files.
So you want something like the following:
require "csv"
CSV.open "output.csv", "a+" do |csv|
# example logic from another script how to populate the file
times.each do |key, value|
csv << [ key, value ]
end
end
You just need to define the structure of the rows as you want; this example writes two columns per row.
EDIT: a+ opens the file for appending (new rows are written at the end), rather than the original w+, which truncates the file.
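Applied to the survey case, the append could be wrapped in a small helper that also writes the header row when the file is new or empty. This is a sketch under assumptions: SURVEY_HEADERS, append_survey and the answer keys are made up for illustration, and the temporary directory only exists so the example can run anywhere.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical column names; the real ones come from the survey form.
SURVEY_HEADERS = %w[name rating comment].freeze

# Appends one survey as a CSV row, writing the header row first
# when the file does not exist yet or is empty.
def append_survey(path, answers)
  new_file = !File.exist?(path) || File.zero?(path)
  CSV.open(path, 'a') do |csv|
    csv << SURVEY_HEADERS if new_file
    csv << SURVEY_HEADERS.map { |header| answers[header] }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'surveys.csv')
  append_survey(path, 'name' => 'Ann', 'rating' => 5, 'comment' => 'Fine')
  append_survey(path, 'name' => 'Bob', 'rating' => 3, 'comment' => 'OK, I guess')
  puts File.read(path)
end
```

In a Rails create action this would be called after the survey has been saved. If several requests can finish at the same moment, the writes may need a file lock, which this sketch does not attempt.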
A possible solution could be to use a logger. In your application controller:
def surveys
  @@surveys_log ||= Logger.new("#{Rails.root}/log/surveys.log")
end
Anywhere where you would like to log the survey:
surveys.info @survey.to_csv # you'll need to implement "to_csv" yourself
Which will result in a surveys.log in your log/ folder.