When reading CSV files the output is strange

I have a CSV file which looks like this:
1\t\tAAA\t\tAnaa\t\tFrench Polynesia\t\tPF\t\tAustralia and Oceania\t\t-17.352606\t\t-145.509956
2\t\tAAE\t\tAnnaba\t\tAlgeria\t\tDZ\t\tAfrica\t\t36.822225\t\t7.809167
3\t\tAAF\t\tApalachicola\t\tUnited States\t\tUS\t\tNorth America\t\t29.7276066\t\t-85.0274416
4\t\tAAG\t\t\N\t\tBrazil\t\tBR\t\tSouth America\t\t\N\t\t\N
I use this gem to fetch the data: https://github.com/tilo/smarter_csv
This is the code I use to show the data in the terminal:
filename = 'db/csv/airports_codes.csv'
options = {
  :col_sep => '\t\t',
}
records = SmarterCSV.process(filename, options)
puts records
I put this code in seeds.rb because I will later modify it to seed my database with the data. The last line is only there so I can see what the data looks like. Then I run rake db:seed.
The output is obviously huge because there are ~5k lines. The first problem is that I can't see all of the data in my terminal. When I scroll to the top, this is the first item (note that the ID is 4674, which means only the last ~250 items are still visible):
{:"1"=>4674, :aaa=>"YPJ", :anaa=>"Aupaluk", :french_polynesia=>"Canada", :pf=>"CA", :australia_and_oceania=>"North America", :"_17.352606"=>59.2967, :"_145.509956"=>-69.5997}
How do I see the other items?
The second problem is that the key names are really weird. How do I rename them, or even better, how do I use arrays instead of hashes?

SmarterCSV treats the first line of the file as a header row by default. That is why the keys are built from your first record (:"1", :aaa, :anaa, :french_polynesia, ...) and why that record does not show up as data. If you set the option
:headers_in_file => false
in options, that should sort the problem out, i.e.:
filename = 'db/csv/airports_codes.csv'
options = {
  :col_sep => '\t\t',
  :headers_in_file => false
}
records = SmarterCSV.process(filename, options)
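If switching away from SmarterCSV is an option, Ruby's standard csv library already returns plain arrays when no headers option is given, so the first record is never turned into keys. The sketch below assumes the fields are separated by two real tab characters ("\t\t"); if the file literally contains backslash-t text, the single-quoted '\t\t' would be the separator instead. The column names in KEYS are made up for illustration, and \N is treated as a missing value.

```ruby
require 'csv'

# Two rows in the question's layout, assuming fields are separated by two
# real tab characters and \N marks a missing value.
SAMPLE = "1\t\tAAA\t\tAnaa\t\tFrench Polynesia\t\tPF\t\tAustralia and Oceania\t\t-17.352606\t\t-145.509956\n" \
         "4\t\tAAG\t\t\\N\t\tBrazil\t\tBR\t\tSouth America\t\t\\N\t\t\\N\n"

# Hypothetical column names chosen for illustration; they are not in the file.
KEYS = %i[id code city country country_code continent latitude longitude].freeze

# Without a headers option, CSV.parse returns an Array of Arrays (one per line),
# so the first data row is kept as data instead of becoming the header.
rows = CSV.parse(SAMPLE, col_sep: "\t\t")

# Replace the \N placeholder with nil.
rows = rows.map { |row| row.map { |field| field == '\N' ? nil : field } }

# Optional: build hashes with readable keys instead of keys taken from row 1.
records = rows.map { |row| KEYS.zip(row).to_h }

# Print only a few records instead of all ~5k.
records.first(2).each { |record| p record }
```

For the real file, the same option works with CSV.foreach(filename, col_sep: "\t\t") { |row| ... }. Printing only a handful of records (or writing them to a file) avoids the terminal scrollback limit that hides the first records.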

Related

Read multiple concatenated json objects in Ruby

I have a file that contains multiple JSON objects that are not separated by commas:
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
Each of the objects on its own is valid JSON.
Is there a way I can process this file easily?
I know this is NOT valid JSON, but unfortunately the file is generated by a third-party tool, and I have no way of changing what the output looks like.
I can't open a text editor and smart-insert commas / square brackets before the run, since this is an automated process (I also really don't want to write code that opens the file and manipulates it).
In .NET there's a library that has this exact feature:
https://stackoverflow.com/a/29480032/2970729
https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm
Is there anything equivalent in Ruby?

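As long as your file is that simple (no nested objects, and no } or { characters inside string values, since the regex below would split on those too), you might want to do something like this: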
# content = File.read(filename)
content = <<-EOF
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
EOF
require 'json'
JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
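A slightly more robust variant of the same in-memory idea, still using only the standard json library, is to track brace depth while skipping braces inside string literals, so nested objects or a "}{" inside a value don't break the split. This is a sketch rather than a full tokenizer: parse_concatenated_json is a hypothetical helper, it assumes every top-level value is an object, and it ignores anything between objects (such as the // no comma note).

```ruby
require 'json'

# Hypothetical helper: splits text holding several top-level JSON objects
# (not separated by commas) and parses each one with the standard json library.
def parse_concatenated_json(text)
  objects = []
  depth = 0          # how many { are currently open
  start = nil        # index of the { that opened the current top-level object
  in_string = false  # inside a "..." literal, where braces do not count
  escaped = false    # previous character was a backslash inside a string

  text.each_char.with_index do |ch, i|
    if in_string
      if escaped
        escaped = false
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
      next
    end

    case ch
    when '"'
      in_string = true if depth > 0
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero? # stray } outside any object: ignore it
      depth -= 1
      objects << JSON.parse(text[start..i]) if depth.zero?
    end
  end

  objects
end

# Text between objects (like the "// no comma" note) is skipped.
input = %({"field" : "value",\n"another_field": "another_value"}\n// no comma\n{"field" : "value"})
p parse_concatenated_json(input)
```

For a file, this would be called as parse_concatenated_json(File.read('file.json')). It still reads the whole file into memory; for very large inputs the streaming Yajl approach is the better fit.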

The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.
require 'yajl'
File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with object
  end
end
See the documentation for other options (buffer size, symbolized keys, etc).

Rails Roo gem .xlsx output contains the object not the output of method

I am using the Roo gem to output a spreadsheet from a Rails app. One of my columns is a hash (Postgres DB). I would like to format the cell contents into something more readable, so I am using a method to return a human-readable cell.
The column data looks like this:
Inspection.first.results
=> {"soiled"=>"oil on back",
"assigned_to"=>"Warehouse#firedatasolutions.com",
"contaminated"=>"blood on left cuff",
"inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
In my Inspections model I defined the following method:
def print_results
  self.results.each do |k, v|
    puts "#{k.titleize}:#{v.humanize}\r\n"
  end
end
So in the console I get this:
Inspection.first.print_results
Soiled:Oil on back
Assigned To:Warehouse
Contaminated:Blood on left cuff
Inspection Date:01/01/2017
Physical Damage Seam Integrity:
Physical Damage Thermal Damage:
Physical Damage Reflective Trim:
Physical Damage Rips Tears Cuts:Small tear on right sleeve
Correct Assembly Size Compatibility Of Shell Liner And Drd:
Physical Damage Damaged Or Missing Hardware Or Closure Systems:
=> {"soiled"=>"oil on back",
"assigned_to"=>"Warehouse",
"contaminated"=>"blood on left cuff",
"inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
But when I put this in the index.xlsx.axlsx file:
wb = xlsx_package.workbook
wb.add_worksheet(name: "Inspections") do |sheet|
  sheet.add_row ['Serial Number', 'Category', 'Inspection Type', 'Date',
                 'Pass/Fail', 'Assigned To', 'Inspected By', 'Inspection Details']
  @inspections.each do |inspection|
    sheet.add_row [inspection.ppe.serial, inspection.ppe.category,
                   inspection.advanced? ? 'Advanced' : 'Routine',
                   inspection.results['inspection_date'],
                   inspection.passed? ? 'Pass' : 'Fail',
                   inspection.ppe.user.last_first_name,
                   inspection.user.last_first_name,
                   inspection.print_results]
  end
end
The output in the spreadsheet is the original hash, not the results of the print statement.
{"soiled"=>"oil on back",
"assigned_to"=>"Warehouse",
"contaminated"=>"blood on left cuff", "inspection_date"=>"01/01/2017",
"physical_damage_seam_integrity"=>"",
"physical_damage_thermal_damage"=>"",
"physical_damage_reflective_trim"=>"",
"physical_damage_rips_tears_cuts"=>"small tear on right sleeve",
"correct_assembly_size_compatibility_of_shell_liner_and_drd"=>"",
"physical_damage_damaged_or_missing_hardware_or_closure_systems"=>""}
Is it possible to get the output of the method into the cell rather than the hash object?

The problem is that your print_results method prints what you want to stdout (that is, the console) but still returns the original hash, because each runs the block for its side effects and then returns the receiver. The return value of the method is all that ends up in the cell.
What you want to do is rewrite print_results to return the formatted string:
def print_results
  self.results.map do |k, v|
    "#{k.titleize}:#{v.humanize}\r\n"
  end.join
end
This will return a string (note the use of .join to combine the array of strings returned by .map) that you can pass to sheet.add_row to get your desired output.
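The difference comes down to what each and map return. The plain-Ruby sketch below uses a trimmed-down results hash and leaves out titleize/humanize (those come from ActiveSupport) so it runs outside Rails.

```ruby
results = { "soiled" => "oil on back", "contaminated" => "blood on left cuff" }

# each runs the block for its side effects and then returns the receiver,
# which is why the original print_results handed the whole Hash to add_row.
returned_by_each = results.each { |k, v| "#{k}:#{v}" }

# map collects the block values into an Array; join turns that into one String.
returned_by_map = results.map { |k, v| "#{k}:#{v}" }.join("\n")

p returned_by_each.equal?(results) # => true
puts returned_by_map
```

Using join("\n") instead of appending "\r\n" to every entry also avoids a trailing line break at the end of the cell.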

Use gem 'postgres-copy' to import csv file

Currently, I want to import more than 55,000 records into my database from a CSV file. This is the code that I am using:
CSV.foreach(Rails.root.join('db/seeds/locations.csv'), headers: true) do |row|
  val = Location.find_or_initialize_by(code: row[0])
  val.name = row[1]
  val.ecc = row[2] || 'MISSING'
  val.created_by = User.find_by(name: 'anh')
  val.updated_by = User.find_by(name: 'anh')
  val.save!
end
However, it is too slow, so I have just installed the gem 'postgres-copy'. I read the official documentation and I believe I can use the class method copy_from to do the job, but as you can see in my current code, I am referring to data in another table (an association), and the documentation doesn't mention anything about associations or validations. Therefore, I am wondering if there is any way to solve this. This is the first time I have used this gem. Thanks for reading.

I don't know that gem, but I would be very surprised if it could support a multi-table copy, since PostgreSQL's COPY works on a single table. 50K rows isn't all that many. You might try wrapping your insertions in transactions to avoid one commit per row. I doubt you want to wrap all 50K in a single transaction, but something like this:
User.connection.begin_transaction
i = 0
CSV.foreach(...) do |row|
  ... # your original code here
  i += 1
  if i % 500 == 0
    User.connection.commit_transaction
    User.connection.begin_transaction
  end
end
User.connection.commit_transaction
This will commit your rows 500 records at a time, and you should see a noticeable speedup. Play around with the value of 500 to find the sweet spot.
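The manual i % 500 counter can also be expressed with each_slice, because CSV includes Enumerable. The sketch below only demonstrates the batching; the per-batch transaction is left as a comment because it needs ActiveRecord, and each_csv_batch plus the generated rows are made up for illustration.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: yields CSV rows in groups of batch_size.
# CSV includes Enumerable, so each_slice does the grouping for us.
def each_csv_batch(io, batch_size: 500)
  CSV.new(io, headers: true).each_slice(batch_size) { |batch| yield batch }
end

# Made-up input: a header line plus 1,200 data rows.
data = "code,name,ecc\n" + (1..1200).map { |n| "C#{n},Name #{n},E#{n}\n" }.join

batch_sizes = []
each_csv_batch(StringIO.new(data)) do |batch|
  # In the Rails seed, one transaction per batch would go here, e.g.
  #   Location.transaction { batch.each { |row| ... save! ... } }
  batch_sizes << batch.size
end

p batch_sizes # => [500, 500, 200]
```

Independently of batching, looking up User.find_by(name: 'anh') once before the loop, instead of twice per row, removes two queries for every imported record.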

So now I understand that I cannot take advantage of PostgreSQL's COPY command, since it can't copy into multiple tables. Therefore, I switched to the gem activerecord-import. Compared with the method Philip Hallstrom mentioned above, activerecord-import gave a faster result: 1m20s vs 1m54s to import more than 8,000 records.
This is my code after installing activerecord-import. Hopefully it can help other people.
locations = []
columns = [:code, :name, :ecc]
# Look the user up once instead of twice per row.
user = User.find_by(name: 'anh')
CSV.foreach(Rails.root.join('db/seeds/locations.csv'), headers: true) do |row|
  val = Location.find_or_initialize_by(code: row[0])
  val.name = row[1]
  val.ecc = row[2] || 'MISSING'
  val.created_by = user
  val.updated_by = user
  locations << val
end
# Only the attributes listed in `columns` are imported from the models, so
# created_by/updated_by are not written unless their columns are added here.
Location.import columns, locations, validate: false

Ruby on Rails statement executed on an array check if subelement string is unique

I have an array of objects; for this example let's call them Diff. These diffs have multiple fields, which are not all the same (old_image, new_image, url, etc.). new_image and old_image in this case have fields of their own, most importantly a field called image_file_name.
I want to get an array of all the diffs with a unique old_image.image_file_name, i.e. no two diffs should have an old_image with the same file name.
I believe the logic should look something like this:
unique_diffs = Array.new
@diff.build.diffs.each { |diff|
  if diff.old_image.image_file_name != @diff.old_image.image_file_name
    unique_diffs.push(diff)
  end
}
Or something like this:
@unique_diffs = @diff.build.diffs.map { |diff| diff.old_image.image_file_name }.uniq
Any help would be much appreciated.

Try something like this:
Diff = Struct.new(:old_image)
Image = Struct.new(:image_file_name)
diffs = [
  Diff.new(Image.new('name1')),
  Diff.new(Image.new('name2')),
  Diff.new(nil),
  Diff.new(Image.new('name1')),
]
uniqs = diffs.select { |diff| diff.old_image }.uniq { |diff| diff.old_image.image_file_name }
p uniqs # prints the Diff with name1 and the Diff with name2
The only important line is the one that calls select and uniq.
You need select to keep only the diffs that have an old image, and then uniq (with a block) to drop those with duplicate image file names.

I ended up using the loop. I was hoping to make this cleaner with the uniq function, but it didn't seem to work: it gave me back all the diffs instead of only the ones with a unique old image filename.
@diff.build.diffs.each { |diff|
  if diff.old_image.image_file_name == @diff.old_image.image_file_name
    # Logic went here
  end
}
Still open to improving this but for now this will have to do.
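One likely explanation for uniq "giving back all the diffs" is calling it on the diffs themselves without a block. uniq then compares whole objects (via hash/eql?), and two distinct ActiveRecord records with different ids are never equal, so nothing is removed. The sketch below mimics that with plain classes instead of Struct (Struct instances with equal members would compare equal); Diff and Image here are stand-ins, not the real models.

```ruby
# Plain classes, so two instances are only equal if they are the same object,
# much like two ActiveRecord records with different ids.
class Image
  attr_reader :image_file_name

  def initialize(image_file_name)
    @image_file_name = image_file_name
  end
end

class Diff
  attr_reader :old_image

  def initialize(old_image)
    @old_image = old_image
  end
end

diffs = [
  Diff.new(Image.new('name1')),
  Diff.new(Image.new('name2')),
  Diff.new(nil),
  Diff.new(Image.new('name1'))
]

# Without a block, uniq compares whole objects, so every Diff survives.
p diffs.uniq.size # => 4

# With a block, uniq compares the block's return value instead.
unique = diffs.select(&:old_image).uniq { |diff| diff.old_image.image_file_name }
p unique.map { |diff| diff.old_image.image_file_name } # => ["name1", "name2"]
```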

Rails Microsoft Word, XML databinding, repeat rows

Those who want to jump straight to my questions can go to the paragraph "Please help with". There you will find the beginning of my implementation, along with short XML samples.
The story
This is the famous problem of inserting repeating content, like table rows, into a Word template using the Rails framework.
I decided to implement a 'cleaner' solution for replacing some variables in a Word document with Rails, using XML data binding. This solution works very well for non-repetitive content, but for repetitive content a little extra dirty work must be done, and I need help with it.
No C#, No Visual, just plain olde ruby on rails & XML
The data-bound document
I have a Word document with some content controls, tagged with "human-readable" text, so my users know what should go inside.
I have used Word 2007 Content Control Toolkit to add some custom XML to a .docx file. As a result, each .docx contains a customXml/itemsx.xml with my custom XML.
I have manually bound this XML to the text content controls in my Word template, using drag & drop with Word 2007 Content Control Toolkit.
The replacing process with nokogiri
Basically, I already have some code that replaces every XML node with the corresponding value from a hash. For example, if I provide this hash to my function:
variables = {
  "some_xml-node" => "some_value"
}
It will properly replace the XML in customXml/itemsx.xml of the .docx file:
<root> <some> <xml-node>some_value</xml-node></some> </root>
So this is taken care of!
The repetitive content
Now, as I said, this works perfectly for non-repetitive content. For repetitive content (in my case I want to repeat some <w:tr> rows in a document), the solution I'd like to go with is:
1. Manually insert some tags (XML comments) in word/document.xml of the .docx file before every <tr> that needs to be duplicated (this is dirty, but hell, I can't think of anything else)
2. In Rails, parse the XML and locate the <tr> that needs duplicating using Nokogiri
3. Copy the <tr> as many times as I need
4. Look at the text inside each copied <tr> and find the data binding (which looks like <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"/>)
5. Replace movie[1] with movie[index]
6. Repeat for every table that needs <tr> duplication
With this solution I ensure 100% compatibility with my existing system! It's a kind of preprocessing...
Please help with
1. Finding an XML comment containing a custom string, and selecting the node just below it (using Nokogiri)
2. Changing attributes in many sub-nodes of the node found in 1.
XML/Hash samples that could be used (the beginning of my implementation follows):
Sample of .docx word/document.xml
<w:document>
  <!-- My_Custom_Tag_ID -->
  <w:tr someparam="something">
    <w:td></w:td>
    <w:td><w:sthelse></w:sthelse><w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"/><w:sth>Value</w:sth></w:td>
    <w:td></w:td>
  </w:tr>
</w:document>
Sample of the repeat_tags input parameter (an array of hashes)
repeat_tags_sample = [
  {
    "tag" => "My_Custom_Tag_ID",
    "repeatable-content" => "movie"
  },
  {
    "tag" => "My_Custom_Tag_ID_2",
    "repeatable-content" => "cartoons"
  }
]
Sample of the contents input parameter (a hash)
contents_sample = {
  "movies" => [
    { "name" => "X-Men", "year" => 1998, "property-xxx" => 42 },
    { "name" => "X-Men-4", "year" => 2007, "property-xxx" => 42 }
  ],
  "cartoons" => [
    { "name" => "Tom_Jerry", "year" => 1995, "property-yyy" => "cat" },
    { "name" => "Random_name", "year" => 2008, "property-yyy" => 42 }
  ]
}
The beginning of my implementation:
def dynamic_table_content(zip, repeat_tags, contents)
  doc = zip.find_entry("word/document.xml")
  xml = Nokogiri::XML.parse(doc.get_input_stream)
  # repeat_tags_sample = [ {
  #   "tag" => "My_Custom_Tag_ID",
  #   "repeatable-content" => "movie"},
  #   ...]
  repeat_tags.each do |rpt|
    content = contents[rpt["repeatable-content"]]
    # content now looks like [
    #   {"name" => "X-Men",
    #    "year" => 1998,
    #    "property-xxx" => 42, ...},
    #   ...]
    content_name = rpt["repeatable-content"].to_s
    # the 'movie' of '/root[1]/movies[1]/movie[1]/name[1]' (see below)
    puts "Processing #{rpt["tag"]}, adding #{content_name}s"
    # Word document.xml sample code looks like this:
    # <!-- My_Custom_Tag_ID_inserted_manually -->
    # <w:tr ...>
    #   ...
    #   <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"/>
    #   ...
    # </w:tr>

    # 1. Find a comment containing a custom string, and select the node just below it
    # Find the starting <w:tr> tag located after <!-- rpt["tag"] -->
    base_tr_node = nil # TODO (question 1): find the node after the comment
    # Duplicate it as many times as we want.
    content.each_with_index do |item, index|
      puts "Adding #{content_name} : #{item}"
      # dup copies the row; passing base_tr_node itself would only move it
      new_tr_node = base_tr_node.add_next_sibling(base_tr_node.dup)
      # inside this new node there are many
      # <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"/>
      # <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/year[1]"/>
      # ..../movie[1]/property-xxx[1]
      # GOAL: replace every movie[1] with movie[index]

      # 2. Change attributes in many sub-nodes of the node found in 1.
      # TODO (question 2): change the attributes of new_tr_node as described in GOAL
      # Maybe it would be something like
      # new_tr_node.gsub("(#{content_name})\[([1-9]+)\]", "\1\[#{index}\]")
      # ... but new_tr_node is a Nokogiri element, so .gsub doesn't exist
    end
  end
  @replace["word/document.xml"] = xml.serialize :save_zip_with => 0
end
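For step 5 (replacing movie[1] with movie[index]), the commented gsub idea in the implementation would not work even on a String: String#gsub matches a String pattern literally, not as a regex, so "(movie)\[([1-9]+)\]" never matches. A sketch with a Regexp, where reindex_xpath is a hypothetical helper name:

```ruby
# Hypothetical helper: points the `element_name` step of a Word dataBinding
# XPath at another 1-based position, e.g. movie[1] -> movie[3].
def reindex_xpath(xpath, element_name, index)
  # Requiring the leading "/" keeps "movies[1]" from matching "movie".
  pattern = %r{/#{Regexp.escape(element_name)}\[\d+\]}
  xpath.gsub(pattern, "/#{element_name}[#{index}]")
end

xpath = '/root[1]/movies[1]/movie[1]/name[1]'
puts reindex_xpath(xpath, 'movie', 3) # prints /root[1]/movies[1]/movie[3]/name[1]
```

XPath positions start at 1 while each_with_index starts at 0, so inside that loop the call would be reindex_xpath(xpath, content_name, index + 1).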

I have looked at the DoPE extension for Word documents. It looks great! But alas, I had already done a lot of work, and I have just (almost) finished building my own preprocessor.
What I needed was more complicated than what I originally asked, but nevertheless, the answers would be:
EDIT: fixed bad regex/XPath
# 1. Find a comment containing a custom string, and select the node just below it
comment_nodes = doc.xpath("//comment()")
# Loop like comment_nodes.each do |comment|
base_tr_node = comment.next_sibling.next_sibling
# next_sibling is most likely needed twice because the first sibling is the
# whitespace text node between the comment and <w:tr>; next_element skips it
# 2. Change attributes in many sub-nodes of the node found in 1.
matches = base_tr_node.search(".//*[name()='w:dataBinding']")
matches.each do |databinding_node|
  # replace '.*phase[1].*' by '.*phase[index].*' and store the result back
  name = comment.text.strip
  databinding_node['w:xpath'] = databinding_node['w:xpath'].gsub("#{name}[1]", "#{name}[#{index}]")
end