Rails CSV import from scratch - ruby-on-rails

I am rather new to Rails and web programming, so apologies for possibly basic questions.
My website needs to upload a CSV file with the following format:
Course_name #header
course1
Module_name #header
module1
Task_name,Task_description,Task_expected_result #header
mod1 task1 test,mod1 task2 test description,mod1 task1 test result
mod1 task2 test,mod1 task2 test description,mod1 task2 test result
mod1 task3 test,mod1 task3 test description,mod1 task3 test result
Module_name #header
module2
Task_name,Task_description,Task_expected_result
mod2 task1 test,mod2 task1 test description,mod2 task1 test result
mod2 task2 test,mod2 task2 test description,mod2 task2 test result
My database is set up so that a course has many modules, which in turn have many tasks.
Course > many modules > many tasks.
On my website I would like to upload the .csv file and then hit a button to import the course.
I need the reading of the file, and thus the creation of the table entries, to go as follows:
read course_name until a blank line is hit, create a course using that name, and grab the course_id of the newly created course.
read Module_name until a blank line is hit, create the module using that name and the course_id (which is how they are connected), and grab that module_id.
then read task_name, task_description and task_expected_result and create a task using all those values and the module_id; do this until a blank line is read.
then, if not at EOF and another Module_name is read, repeat from module creation to task creation until EOF.
I know this is a lot to ask; I've tried searching online for help but I have not had any luck. Any help with model/controller/view code would be greatly appreciated.

You will need to modify this to find/populate your objects, but it uses plain string handling to get the data out of the CSV.
First off, I split your "CSV" based on there being a blank-line gap between sections (as I said before, this is a lot like trying to send several dictionary/database files inside a CSV, rather than using a better interchange format).
Using the sample view that you added in the comment (as HAML):
= form_with(multipart: true) do |f|
  = f.file_field :file, accept: '.csv'
  = submit_tag 'Read CSV'
Your file will now be passed into Rails as an ActionDispatch::Http::UploadedFile.
We can access the contents of this file using the `#read` method:
mycsv = params[:file].read
My code for splitting, formatting, etc. (assuming that the format of the file never changes):
require 'csv'

# first split the "string" into sections on the blank lines
csvarr = mycsv.split(/^$/).map(&:strip)
# the first section is going to be the course, so "shift" it out of the array,
# leaving us with an even number of remaining elements.
# since it's assumed that the first section will be the course, I don't bother
# checking that it is, and just store its value
course_name = csvarr.shift.lines.last
# You probably want to do a "course = Course.find_or_create_by(name: course_name)"

# now loop through the remaining elements, 2 at a time
csvarr.each_slice(2) do |mod|
  # the first element in a block of 2 will be the header
  modname = mod[0].lines.last
  # You probably want a "course_module = Module.find_or_create_by(course: course, name: modname)"
  # (note that "module" is a reserved word in Ruby, so it can't be used as a variable name)
  # the second element will be the associated data
  modcsv = CSV.parse(mod[1], headers: true)
  # loop through the data to show that we have it all correctly
  modcsv.each do |task|
    puts "Course: #{course_name} Module: #{modname} Task: #{task["Task_name"]}\n"
  end
end
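To tie this together with the models from the question, here is a minimal sketch of a controller action that creates the records as it parses. Course and Task are the question's own names; course_modules/CourseModule and the task attribute names are assumptions on my part (a model literally named Module would clash with Ruby's built-in Module class), so adjust them to your schema:
require 'csv'

class CoursesController < ApplicationController
  def import
    sections = params[:file].read.split(/^$/).map(&:strip)
    # run in a transaction so a malformed file doesn't leave a half-imported course
    ActiveRecord::Base.transaction do
      @course = Course.create!(name: sections.shift.lines.last.strip)
      sections.each_slice(2) do |header_section, task_section|
        course_module = @course.course_modules.create!(name: header_section.lines.last.strip)
        CSV.parse(task_section, headers: true).each do |row|
          course_module.tasks.create!(
            name: row["Task_name"],
            description: row["Task_description"],
            expected_result: row["Task_expected_result"]
          )
        end
      end
    end
    redirect_to @course
  end
end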

Related

Using Roo with Ruby(Rails) to parse Excel

I'm trying to allow users to upload a CSV/Excel document and parse it using Roo (the most suggested gem I've seen), but I'm having a bit of an issue figuring it out.
Current Script
require 'roo'

xlsx = Roo::Excelx.new("./TestAppXL.xlsx")
xlsx.each_row_streaming do |row|
  puts row.inspect # Array of Excelx::Cell objects
end
This was the only one I was able to get to work. It returns what looks to be JSONB.
What I'm trying to do is a several-part process:
A) User uploads a list of 'cards' to my website (trying to allow as many options as possible: CSV, Excel, etc.).
B) It instantly returns a list of the headers and asks 'Which header is name, quantity, etc.?'
C) I parse the data for specific headers and do 'X'.
B is what I primarily need help with. I'm struggling to figure out Roo exactly. I won't have control over the headers, so I can't use fixed numerical column numbers.
(Adding the Rails tag since this will be in a controller in the end; maybe there's an easier way to do it.)
Updated Script
I've actually made a lot of progress. Still trying to get closer to my original request.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets
puts xlsx.column(CardName.to_i)
# => Returns basic info about the spreadsheet file
I need a lot more logic around the gets, but currently if I put in '3' it returns all the content of the 'CardName' column. Working on iterating over the rows now.
Pseudo-working script
require 'roo'
require 'roo-xls'

xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets.to_i
specHeader = xlsx.cell(headers, CardName)
xlsx.column(CardName).each_with_index do |item, index|
  next if index == 0 # skip the header cell
  puts item
end
This is actually performing as expected, and I can start feeding the file into a Rake job now. Still working on some of the iteration but this is very close.
I made you a generic way to extract data out of a Roo spreadsheet based on a few header names, which would be the convention your uploaders have to follow.
require 'roo'
require 'roo-xls'

xlsx = Roo::Spreadsheet.open('Demo.xls')
first_row = xlsx.first_row
headers = ['CardName', 'Item']
headers.each { |h| Kernel.const_set(h, xlsx.row(first_row).index { |e| e =~ /#{h}/i }) }
begin
  xlsx.drop(first_row).each do |row|
    p [row[CardName], row[Item]]
  end
rescue
  # the required headers are not all present
end
I suppose the only line that needs explaining is:
headers.each { |h| Kernel.const_set(h, xlsx.row(first_row).index { |e| e =~ /#{h}/i }) }
For each header name, it assigns to a constant of that name, via const_set, the index of that name in xlsx.row(first_row) (our header row). The #{} around h expands h into its value ('CardName' in the first case), and the i at the end of /#{h}/i means the match is case-insensitive. So the constant CardName is assigned the index of the string 'CardName' in the header row.
Instead of the rather clumsy begin rescue structure you could check if all required constants are present with const_get and act upon that instead of catching the error.
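For instance, a minimal sketch of that check (my own variant: since const_set stores nil for any header it could not find, the test is for nil values rather than for missing constants):
missing = headers.select { |h| Kernel.const_get(h).nil? }
if missing.empty?
  xlsx.drop(first_row).each do |row|
    p [row[CardName], row[Item]]
  end
else
  puts "Missing header(s): #{missing.join(', ')}"
end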
EDIT
Instead of the p [row[CardName], row[Item]] you could check and do anything; only keep in mind that if this is going to be part of a Rails or other website, the interaction with the user is going to be trickier than your puts and gets example. E.g. something like:
headers = ['CardName', 'Item', 'Condition', 'Collection']
...
xlsx.drop(first_row).each do |row|
  if row[CardName].nil? || row[Item].nil?
    # let the user know or skip
  else
    condition, collection = row[Condition], row[Collection]
    # and do something with it
  end
end

How to create a hash for all records created by a user?

In our Rails app, the user (or we on his behalf) loads some data, or even inserts it manually using a CRUD interface.
After this step the user must validate all the configuration (the data) and "accept and agree" that it's all correct.
On a given day, the application will execute some tasks according to that configuration.
Today we already have a "freeze" flag with which we can prevent changes to the data, so the user cannot mess things up...
But we would also like to do something like hash the data and say "your config is frozen and the hash is 34FE00...".
This would give the user certainty that the system is running with the configuration he approved.
How can we do that? There are 7 or 8 tables, and the total number of records created would be around 2k or 3k.
How can we hash the data to detect changes after the approval? How would you do that?
I'm thinking about doing a find_by_user on each table, looping over all the records, and using some fields (or all of them) to build a string that gets hashed at the end of each loop.
After looping over all the tables, I would have 8 hash strings, which I would concatenate and hash into a final hash.
How does that look? Any ideas?
Here's a possible implementation. Just define object as an Array of all the stuff you'd like to hash:
require 'digest/md5'

def validation_hash(object, len = 16)
  Digest::MD5.hexdigest(object.to_json)[0, len]
end

puts validation_hash([Actor.first, Movie.first(5)])
# => 94eba93c0a8e92f8
# After changing a single character in the first Actor's biography:
# => 35f342d915d6be4e
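To extend this to the question's 7 or 8 tables, a sketch along the lines the asker described (Setting, Schedule and Rule are hypothetical model names; note the explicit order(:id), since a reproducible hash needs the records in a stable order):
require 'digest/md5'

def user_config_hash(user)
  tables = [Setting, Schedule, Rule] # hypothetical models; substitute your own
  # hash each table's records for this user, in a deterministic order
  partial_hashes = tables.map do |model|
    validation_hash(model.where(user_id: user.id).order(:id).to_a, 32)
  end
  # concatenate the per-table hashes and hash them into one final digest
  validation_hash(partial_hashes.join, 32)
end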

RoR - Testing POST of file chunk

My Rails app contains code to handle large file uploads, which basically consists of splitting up the file in JavaScript and making a number of POSTs, one per chunk, to a route where the chunks are reconstructed into the original file.
I'm trying to figure out how to write tests for this logic, as up until now I've simply used fixture_file_upload for posting files.
I basically need to split a given file up into ranges of bytes and post each range in a way that my route handles just as though it had been posted by my JavaScript.
Anyone know of a way to accomplish this in a Rails test?
You could just create multiple fixture files (e.g. file.part1.txt, file.part2.txt, etc.), upload all the parts, and then check that they get concatenated together.
For example, if there are 10 fixture files:
(1..10).each do |part_no|
  fixture_name = "file.part#{part_no}.txt"
  fixture_file = fixture_file_upload("/files/#{fixture_name}", "text/plain")
  post :part_upload, :part => fixture_file
end
# code to check result here
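If maintaining ten fixture files becomes a pain, here is a sketch of an alternative that slices a single fixture into byte ranges at test time (the :part_upload route and :part parameter are carried over from the example above; the fixture path is an assumption):
require 'tempfile'

data = File.binread(Rails.root.join("test/fixtures/files/file.txt"))
chunk_size = 1024
data.bytes.each_slice(chunk_size).with_index(1) do |bytes, part_no|
  # write each byte range to a temp file and post it like a normal upload
  Tempfile.create(["file.part#{part_no}", ".txt"]) do |f|
    f.binmode
    f.write(bytes.pack("C*"))
    f.rewind
    post :part_upload, :part => Rack::Test::UploadedFile.new(f.path, "text/plain")
  end
end
# code to check the reassembled file here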

Saving form results to a CSV - Rails

I'm trying to save the results of a survey to a CSV file, so that every time the survey is completed it adds a new line to the file. I have code that exports database rows to a CSV and lets you download it, but I don't know how to incorporate saving the survey in the first place, or whether this is even possible. I have a CSV file set up with the correct headers.
When your create action is called (the action in the controller that the form's submit is directed to; create on REST controllers), you can just add some custom logic there to convert the data from the form into the CSV structure you want.
Ruby has a built-in CSV module, which can be used to both read and write CSV files.
So you want something like the following:
require "csv"
CSV.open "output.csv", "a+" do |csv|
# example logic from another script how to populate the file
times.each do |key, value|
csv << [ key, value ]
end
end
You just need to define the row structure you want; this example writes two columns per row.
EDIT: a+ opens the file for appending, so new rows are written at the end, rather than the original w+, which truncates the file.
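Put together, a minimal sketch of what the create action could look like (Survey and its name/score attributes are placeholders for illustration; substitute your own model and columns):
require "csv"

def create
  @survey = Survey.new(survey_params)
  if @survey.save
    # append one row per completed survey; the headers are already in the file
    CSV.open(Rails.root.join("surveys.csv"), "a+") do |csv|
      csv << [@survey.name, @survey.score, Time.current]
    end
    redirect_to @survey
  else
    render :new
  end
end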
A possible solution could be to use a logger. In your application controller:
def surveys
  @@surveys_log ||= Logger.new("#{Rails.root}/log/surveys.log")
end
Wherever you would like to log the survey:
surveys.info @survey.to_csv # you'll need to implement "to_csv" yourself
Which will result in a surveys.log in your log/ folder.
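A sketch of what that to_csv method might look like on the model (again assuming name and score columns):
require "csv"

class Survey < ApplicationRecord
  def to_csv
    # one CSV row per survey; chomp the trailing newline since the logger adds its own
    CSV.generate_line([name, score, created_at]).chomp
  end
end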

How to write Rake task to import data to Rails app?

Goal: Use a CRON task (or other scheduled event) to update the database with a nightly export of data from an existing system.
All data is created/updated/deleted in that existing system. The website does not directly integrate with it, so the Rails app simply needs to reflect the updates that appear in the data export.
I have a .txt file of ~5,000 products that looks like this:
"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
"A134":"another product":"attr 1":"attr 2":"Foobar World":"2447"
...
All values are strings enclosed in double quotes (") that are separated by colons (:)
Fields are:
id: unique id; alphanumeric
name: product name; any character
attribute columns: strings; any character (e.g., size, weight, color, dimension)
vendor_name: string; any character
vendor_id: unique vendor id; numeric
Vendor information is not normalized in the current system.
What are best practices here? Is it okay to delete the products and vendors tables and rewrite them with the new data on every cycle? Or is it better to only add new rows and update existing ones?
Notes:
This data will be used to generate Orders that will persist through nightly database imports. OrderItems will need to be connected to the product ids that are specified in the data file, so we can't rely on an auto-incrementing primary key to be the same for each import; the unique alphanumeric id will need to be used to join products to order_items.
Ideally, I'd like the importer to normalize the Vendor data
I cannot use vanilla SQL statements, so I imagine I'll need to write a rake task in order to use Product.create(...) and Vendor.create(...) style syntax.
This will be implemented on EngineYard
I wouldn't delete the products and vendors tables on every cycle. Is this a Rails app? If so, there are some really nice ActiveRecord helpers that would come in handy for you.
If you have a Product ActiveRecord model, you can do:
p = Product.find_or_initialize_by(identifier: <id you get from file>)
p.name = <name from file>
p.size = <size from file>
# etc...
p.save!
find_or_initialize_by will look up the product in the database by the identifier you specify, and if it can't find one, it will build a new record. The really handy thing about doing it this way is that ActiveRecord will only write to the database if any of the data has changed, and it will automatically update any timestamp fields you have in the table (updated_at) accordingly. One more thing: since you will be looking up records by the identifier (the id from the file), I would make sure to add an index on that column.
To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.
Inside data.rake, it would look something like this:
namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    file = File.open(<file to import>)
    file.each do |line|
      # strip the enclosing double quotes from each field
      attrs = line.strip.split(":").map { |attr| attr.delete('"') }
      p = Product.find_or_initialize_by(identifier: attrs[0])
      p.name = attrs[1]
      # etc...
      p.save!
    end
  end
end
Then to call the rake task, use "rake data:import" from the command line.
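The question also asked about normalizing the vendor data; a sketch of how that could slot into the loop above (assuming a Vendor model with external_id and name columns, and that the last two fields on each line are vendor_name and vendor_id as described):
# inside the file.each loop, after parsing attrs:
vendor = Vendor.find_or_initialize_by(external_id: attrs[5])
vendor.name = attrs[4]
vendor.save!
p.vendor = vendor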
Since products don't really change that often, the best approach I can see is to update only the records that change:
Get all the deltas
Mass update using a single SQL statement
If you have your normalization code in the models, you could use Product.create and Vendor.create; anything more would just be overkill. Also, look into inserting multiple records in a single SQL transaction; it's much faster, as sketched below.
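For example, a sketch of wrapping the whole import in one transaction, so it commits as a unit instead of issuing thousands of individual commits:
ActiveRecord::Base.transaction do
  file.each do |line|
    # create/update products and vendors here, as above
  end
end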
Create an importer rake task that is cronned
Parse the file line by line using FasterCSV or vanilla Ruby like:
file.each do |line|
  products_array = line.split(":")
end
Split each line on the ":" and push it into a hash
Use find_or_initialize to populate your db, such as:
Product.find_or_initialize_by(name: "foo", vendor_id: 111)
