How to avoid importing nil object when reading spreadsheet with roo on Rails 5.2?

My application manages hierarchical classifications based on lists of values (dictionaries). At some point, I need to import the parent-child relationships from an Excel sheet, and create persisted ValuesToValues objects.
Based on Ryan Bates' RailsCast 396, I created the import model in which the main loop is:
(2..spreadsheet.last_row).map do |i|
# Read columns indexes
parent = header.index("Parent") +1
level = header.index("Level") +1
code = header.index("Code") +1
# Skip if parent is blank
next if spreadsheet.cell(i, parent).blank?
# Count links
#links_counter += 1
parent_values_list_id = values_lists[((spreadsheet.cell(i, level).to_i) -1)]
child_values_list_id = values_lists[spreadsheet.cell(i, level).to_i]
parent_value_id = Value.find_by(values_list_id: parent_values_list_id, code: spreadsheet.cell(i, parent).to_s).id
child_value_id = Value.find_by(values_list_id: child_values_list_id, code: spreadsheet.cell(i, code).to_s).id
link_code = "#{parent_values_list_id}/#{spreadsheet.cell(i, parent)} - #{child_values_list_id}/#{spreadsheet.cell(i, code)}"
link_name = "#{spreadsheet.cell(i, parent)} #{spreadsheet.cell(i, code)}"
link = ValuesToValues.new( playground_id: playground_id,
classification_id: @classification.id,
parent_values_list_id: parent_values_list_id,
child_values_list_id: child_values_list_id,
parent_value_id: parent_value_id,
child_value_id: child_value_id,
code: link_code,
name: link_name
)
end
The issue is that, when encountering a root value (one without a parent value), the loop creates a nil object, which does not pass the later validation.
How can I build the loop in order to consider only rows where the Parent cell is not empty?

I finally decided to manage my own array of imported values instead of using the array based on the filtered sheet rows.
I added the following code around the main loop:
# Create array of links
linked_values = Array.new
# start loading links for each values list
(2..spreadsheet.last_row).map do |i|
...
and
...
linked_values << link
end
linked_values
The linked_values array is then returned; it contains only valid link records.
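The same idea can be sketched in plain Ruby without roo. This is only an illustration under assumptions: ROWS is hypothetical sample data standing in for the spreadsheet rows, and build_links is a made-up helper name. Rows with a blank parent are skipped before anything is appended, so the returned array never contains nil.

```ruby
# Hypothetical stand-in for spreadsheet rows: [parent, level, code].
ROWS = [
  [nil, 0, "A"],   # root value, no parent
  ["A", 1, "A1"],
  ["",  0, "B"],   # blank parent cell
  ["B", 1, "B1"]
].freeze

# Builds one hash per row that has a parent. Rows with a blank parent are
# skipped, so the result never contains nil.
def build_links(rows)
  rows.each_with_object([]) do |(parent, level, code), links|
    next if parent.nil? || parent.to_s.strip.empty?

    links << { level: level, code: "#{parent} - #{code}", name: "#{parent} #{code}" }
  end
end

links = build_links(ROWS)
puts links.map { |link| link[:code] }.inspect
```

With map, a next inside the block yields nil for that row, which is where the nil entries in the original result come from. Calling compact on the array returned by map would be an equivalent fix.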

Related

Store returned values in variable

I've got a code which renames the names of files to randomly chosen numbers. The code works, but I can't seem to figure out how to store the original filenames and the respective renamed filename (random number).
When I run the code, I only get the values of the last iteration using 'return'. But how do I store the original filenames and the respective renamed filenames?
So, I want to have a list of 'file_name' (which contains all the original filenames) and 'rand_keynumber' (which are the generated random numbers)
Thank you.
import os
import random
numbers = range(1,1025)
numbers_list = list(map(str,numbers))
def keynumber():
    # Generate a random index
    rand_index = random.randint(0, len(numbers_list)-1)
    # Get the keynumber
    global rand_keynumber
    rand_keynumber = numbers_list[rand_index]
    # Remove the used-up keynumber from the list to
    # prevent randomly selecting it again when renaming
    numbers_list.remove(rand_keynumber)
    return rand_keynumber
def renam_name():
    os.chdir(r"C:\Users\samwi\OneDrive\Bureaublad\videos_anonimisatie\video_to_rename")
    file_list = os.listdir(r"C:\Users\samwi\OneDrive\Bureaublad\videos_anonimisatie\video_to_rename")
    global file_name
    for f in file_list:
        # get the file extension
        file_name, img_type = os.path.splitext(f)
        os.rename(f, keynumber() + img_type)
    return file_name
renam_name()
It looks like you are assigning a new value to the global file_name variable on each iteration. You need to append each filename and the selected key number to a list before moving to the next file in file_list.
import os
import random
numbers = range(1,1025)
numbers_list = list(map(str,numbers))
file_names = []
def keynumber():
    # Generate a random index
    rand_index = random.randint(0, len(numbers_list)-1)
    # Get the keynumber
    rand_keynumber = numbers_list[rand_index]
    # Remove the used-up keynumber from the list to
    # prevent randomly selecting it again when renaming
    numbers_list.remove(rand_keynumber)
    return rand_keynumber
def renam_name():
    os.chdir(r"C:\Users\samwi\OneDrive\Bureaublad\videos_anonimisatie\video_to_rename")
    file_list = os.listdir(r"C:\Users\samwi\OneDrive\Bureaublad\videos_anonimisatie\video_to_rename")
    for f in file_list:
        # get the file extension
        next_file_name, img_type = os.path.splitext(f)
        next_keynumber = keynumber()
        # remember the original name together with its new key number
        file_names.append([next_file_name, next_keynumber])
        os.rename(f, next_keynumber + img_type)
renam_name()
You could just create an empty list below import random and, in your functions, append what you want to it with .append just before returning rand_keynumber and file_name.

Use gem 'postgres-copy' to import csv file

Currently, I want to import more than 55,000 records into my database from a CSV file. This is the code that I am using:
CSV.foreach(Rails.root.join('db/seeds/locations.csv'), headers: true) do |row|
val = Location.find_or_initialize_by(code: row[0])
val.name = row[1]
val.ecc = row[2] || 'MISSING'
val.created_by = User.find_by(name: 'anh')
val.updated_by = User.find_by(name: 'anh')
val.save!
end
However, this is too slow, so I have just installed the gem 'postgres-copy'. I read the official documentation, and I believe I can use the class method copy_from to do the job. But as you can see in my current code, I also reference data from another table (an association), and the documentation doesn't mention anything about associations or validations. I am therefore wondering whether there is a way to handle this. This is the first time I have used this gem. Thanks for reading.
I don't know that gem, but I would be very surprised if it supported multi-table copy, since PostgreSQL's COPY works on a single table. 50K rows isn't all that many. You might try wrapping your insertions in transactions to avoid one commit per row. I doubt you want to wrap all 50K rows in a single transaction, but you could try something like this:
User.connection.begin_transaction
i = 0
CSV.foreach(...) do |row|
... # your original code here
i += 1
if i % 500 == 0
User.connection.commit_transaction
User.connection.begin_transaction
end
end
User.connection.commit_transaction
This will insert your rows 500 records at a time and you should see a noticeable speed up. Play around with the value of 500 to find the sweet spot.
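The batching itself can also be expressed with each_slice. The sketch below is plain Ruby and makes no database calls: FakeDb and import_in_batches are hypothetical names, and FakeDb#transaction only counts how often a commit would happen. In a Rails app, the block form Location.transaction do ... end would take its place.

```ruby
# Hypothetical stand-in for a database connection. Its transaction method
# only counts how many commits would happen.
class FakeDb
  attr_reader :commits, :saved

  def initialize
    @commits = 0
    @saved = []
  end

  def transaction
    yield
    @commits += 1
  end
end

# Saves rows in groups of batch_size, one transaction (commit) per group.
def import_in_batches(rows, db, batch_size: 500)
  rows.each_slice(batch_size) do |batch|
    db.transaction do
      batch.each { |row| db.saved << row } # stands in for val.save!
    end
  end
end

db = FakeDb.new
import_in_batches((1..1234).to_a, db)
puts "#{db.saved.size} rows, #{db.commits} commits"
```

Because each_slice hands over the final partial group as well, no separate commit is needed after the loop.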
So now I understand that I cannot take advantage of PostgreSQL's COPY command, since it works on a single table only. I therefore switched to the gem activerecord-import. Compared with the method that Philip Hallstrom describes above, activerecord-import gives a faster result: 1m20s vs 1m54s to import just over 8,000 records.
This is my code after installing the gem activerecord-import. Hopefully, it can help other people.
locations = []
columns = [:code, :name, :ecc]
CSV.foreach(Rails.root.join('db/seeds/locations.csv'), headers: true) do |row|
val = Location.find_or_initialize_by(code: row[0])
val.name = row[1]
val.ecc = row[2] || 'MISSING'
val.created_by = User.find_by(name: 'anh')
val.updated_by = User.find_by(name: 'anh')
locations << val
end
Location.import columns, locations, validate: false

Using index value in method

In my Rails application, in a model, I am trying to use the loop index x in the following method, and I can't figure out how to get the value:
def set_winners ## loops over 4 quarters
1.upto(4) do |x|
qtr_[x]_winner.winner = 1
qtr_[x]_winner.save
end
end
I'm going to keep searching but any help would be greatly appreciated!
edit: So I guess I can't do that! Here is the original method I was trying to refactor in full by looping four times:
def set_winners
## set all 4 quarter's winning squares
home_qtr_1 = game.home_q1_score.to_s.split('').last.to_i
away_qtr_1 = game.away_q1_score.to_s.split('').last.to_i
qtr_1_winner = squares.where(xvalue:home_qtr_1, yvalue:away_qtr_1).first
qtr_1_winner.winner = 1
qtr_1_winner.save
home_qtr_2 = game.home_q2_score.to_s.split('').last.to_i
away_qtr_2 = game.away_q2_score.to_s.split('').last.to_i
qtr_2_winner = squares.where(xvalue:home_qtr_2, yvalue:away_qtr_2).first
qtr_2_winner.winner = 1
qtr_2_winner.save
home_qtr_3 = game.home_q3_score.to_s.split('').last.to_i
away_qtr_3 = game.away_q3_score.to_s.split('').last.to_i
qtr_3_winner = squares.where(xvalue:home_qtr_3, yvalue:away_qtr_3).first
qtr_3_winner.winner = 1
qtr_3_winner.save
home_qtr_4 = game.home_q4_score.to_s.split('').last.to_i
away_qtr_4 = game.away_q4_score.to_s.split('').last.to_i
qtr_4_winner = squares.where(xvalue:home_qtr_4, yvalue:away_qtr_4).first
qtr_4_winner.winner = 1
qtr_4_winner.save
end
Is there a better way to do this if it's bad practice to dynamically change attribute names?
It looks like you are trying to do a PHP-like trick in a language that doesn't support it, and where we recommend NOT doing it because it results in code that is very difficult to debug due to the dynamically named variables.
It looks like you want to generate a variable name using:
qtr_[x]_winner
to create something like:
qtr_1_winner
Instead, consider creating an array named qtr_winner containing your objects and access the elements like:
qtr_winner[1]
or
qtr_winner[2]
etc.
You could create a hash to do a similar thing:
qtr_winner = {}
qtr_winner[1] = 5
then later access it using qtr_winner[1] and get 5 back or
qtr_winner[1].winner = 1
The determination of whether to use a hash or an array is whether you need to walk the container, or need random access. If you are always indexing into it using a value, then it's probably a wash about which is faster.
Based on your edit, you don't need dynamic variables. The only thing that changes in your loop is game.home_qN_score, so that's what the focus of your refactoring should be. Given that, here's a viable solution:
1.upto(4) do |i|
  # call game.home_q1_score, game.home_q2_score, ... by name
  home_qtr = game.send("home_q#{i}_score").to_s.split('').last.to_i
  away_qtr = game.send("away_q#{i}_score").to_s.split('').last.to_i
  winner = squares.where(xvalue:home_qtr, yvalue:away_qtr).first
  winner.winner = 1
  winner.save
end
Original answer:
If qtr_1_winner, etc. are instance methods, you can use Object#send to achieve what you want:
def set_winners ## loops over 4 quarters
1.upto(4) do |x|
send("qtr_#{x}_winner").winner = 1
send("qtr_#{x}_winner").save
end
end
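To see the dynamic method call in isolation, here is a small self-contained sketch. It assumes a hypothetical Game struct in place of the ActiveRecord model and non-negative integer scores, so the last digit can be taken with % 10. The helper name winning_digits is also made up for illustration.

```ruby
# Hypothetical stand-in for the game record from the question.
Game = Struct.new(:home_q1_score, :away_q1_score,
                  :home_q2_score, :away_q2_score,
                  :home_q3_score, :away_q3_score,
                  :home_q4_score, :away_q4_score)

# Returns [home_last_digit, away_last_digit] for each of the four quarters.
# Assumes the scores are non-negative integers.
def winning_digits(game)
  (1..4).map do |quarter|
    home = game.public_send("home_q#{quarter}_score") % 10
    away = game.public_send("away_q#{quarter}_score") % 10
    [home, away]
  end
end

game = Game.new(7, 3, 14, 10, 21, 17, 28, 24)
digits = winning_digits(game)
puts digits.inspect
```

public_send is used here instead of send so that only public methods can be reached through the interpolated name.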

How do I iterate on a collection when I don't know what the upper limit of iterations is?

I have an API that I am pulling data from, and I want to collect all the tags from this API...but I don't know the number of tags in advance, and the API throttles access via the max number of results returned in any 1 call (100). It has an unlimited number of pages though.
So a call may look like this: Tag.update_tags(100, 5), where 100 is the max number of objects returned in 1 call and 5 is the page to begin at (i.e. if you assume that the tags are stored sequentially, this says to return the tag records with IDs in the range 401 - 500).
The issue is, I don't want to manually have to enter 5 (i.e. I don't know what the upper limit is). There is no way for me to ping the total number of tags (if there were, I would simply divide it and put this call in a loop up to that number).
All I do know is that once it reaches a page that doesn't have any results, it will return an empty array [].
So, how do I loop through all the tags and stop when the result returned is an empty array (which would be the final result returned and therefore not evaluated)?
What does that loop look like?
Use an unconditional loop with a break statement when the result returns the empty array.
i = 1
loop do
  result = call_to_api(i)
  # stop before processing the empty page
  break if result.empty?
  do_something_with(result)
  i += 1
end
Of course in a production scenario you want something a little more robust, including exception handlers, some progress log reporting, and some kind of concrete iteration limit to ensure that the loop does not become infinite.
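A minimal sketch of such an iteration limit follows. It assumes a hypothetical in-memory PAGES hash and a fetch_page lambda in place of the real API call; collect_all and max_pages are made-up names. The loop stops at the first empty page or after max_pages pages, whichever comes first.

```ruby
# Hypothetical paged source: pages 1..3 hold data, every later page is empty.
PAGES = { 1 => %w[ruby rails], 2 => %w[roo csv], 3 => %w[nokogiri] }.freeze
fetch_page = ->(page) { PAGES.fetch(page, []) }

# Collects results page by page until an empty page is returned,
# never requesting more than max_pages pages.
def collect_all(fetch_page, max_pages: 1000)
  tags = []
  (1..max_pages).each do |page|
    result = fetch_page.call(page)
    break if result.empty?

    tags.concat(result)
  end
  tags
end

all_tags = collect_all(fetch_page)
puts all_tags.inspect
```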
Update
Here's an example using a class to wrap up the logic.
require 'net/http'
require 'json'
class Api
  DEFAULT_OPTIONS = {:start_position => 1, :max_iterations => 1000}
  def initialize(base_uri, config = {})
    @base_uri = base_uri
    @config = DEFAULT_OPTIONS.merge(config)
    @position = @config[:start_position]
    @results_count = 0
  end
  def each(&block)
    advance(&block) while can_advance?
    log("Processed #{@results_count} results")
  end
  def advance(&block)
    yield result
    @results_count += result.count
    @position += 1
    @current_result = nil
  end
  def result
    @current_result ||= begin
      response = Net::HTTP.get_response(current_uri)
      JSON.parse(response.body)
    rescue
      # provide some exception handling/logging
      [] # treat a failed request as an empty page so iteration stops
    end
  end
  def can_advance?
    @position < (@config[:start_position] + @config[:max_iterations]) && result.any?
  end
  def current_uri
    URI.parse("#{@base_uri}?page=#{@position}")
  end
end
api = Api.new('http://somesite.com/api/v1/resource')
api.each do |result|
  do_something_with(result)
end
There's also an angle here that allows for concurrency: setting the start position and iteration count for each thread would definitely speed this up, since the HTTP requests would run concurrently.
Hmmm. You can get 100 items at a time, and start at a particular page. How to implement the iteration depends on what you want to do. Let's suppose that you want to collect all the unique tags. Establish a map (for example, a HashMap), then retrieve one page at a time and process it. When you hit a page that's empty, you're done.
// Implements a map and methods to update it
MyHashMap uniqueTags;
// Stores a page of tags
Page page;
do {
    // get a page of tags
    page = readTags();
    if (page != null) {
        uniqueTags.getUniqueTags(page);
    }
} while (page != null);

Nokogiri/Ruby array question

I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:
fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
e = e.to_s()
g.write(e + "\n")
end
This spits out the following text:
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D008715</fullId>
I want just the text between the tags saved, without the <fullId> and </fullId> markup. What am I missing?
Bobby
I think you want to use the text() accessor (which returns the child text values), rather than to_s() (which serializes the entire node, as you see here).
I'm not sure what the g object you're calling write on is, but the following code should give you an array containing all of the text in the fullId nodes:
doc.xpath(your_xpath).map {|e| e.text}
