Break out of Ruby Find.find loop when condition met


I'm iterating through a large directory of files with:
Find.find('/some/path/to/directory/root') do |path|
  # ...do stuff with files…
end
What's a good way to limit the size of this enumerable group to 500 (or break out in some other way) if, say, Rails.env.development??
I could use a counter but what's "the Ruby way"?

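This is one way. Called without a block, Find.find returns an Enumerator, so first(500) stops the directory walk after 500 entries:
require 'find'
Find.find('/some/path/to/directory/root').first(500).each do |path|
  # do something
end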
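To make the limit conditional (for example, only in development), one option is to decide the limit up front and apply first only when a limit is set. This is a minimal sketch: each_path and its limit: keyword are hypothetical names, not part of Find, and the temporary directory exists only so the example runs on its own.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of Find): yields at most `limit` paths,
# or every path when `limit` is nil.
def each_path(root, limit: nil, &block)
  paths = Find.find(root)               # no block given, so this is an Enumerator
  paths = paths.first(limit) if limit   # Enumerable#first stops the walk early
  paths.each(&block)
end

# Demo on a throwaway directory containing five files.
demo_root = Dir.mktmpdir
5.times { |n| FileUtils.touch(File.join(demo_root, "file#{n}.txt")) }

limited = []
each_path(demo_root, limit: 3) { |path| limited << path }

everything = []
each_path(demo_root) { |path| everything << path }

FileUtils.remove_entry(demo_root)
```

In a Rails app the call might look like each_path(root, limit: Rails.env.development? ? 500 : nil). Note that Find.find yields the root directory itself as the first path, so it counts toward the limit.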

Find.find enumerates the files; in other words, it wants to loop over all of them. You can tack on an index value using:
Find.find(ENV['HOME']).with_index(1) do |path, i|
  puts path
  break if i >= 10 # stop once 10 paths have been printed
end
puts 'done'
and use that to keep track of how many you've processed.
If you want the option to "do 'em all":
limit = 500 # example value; 0 means "no limit"
do_em_all = limit.zero?
Find.find(ENV['HOME']).with_index(1) do |path, i|
  puts path
  break unless do_em_all || i < limit # stop after `limit` paths
end
puts 'done'
If limit is 0 you'll process everything. If it's > 0 you'll loop limit times.
You can also use Dir.entries or Dir.glob which will return an array of entries that you can slice apart as you want.
Enumerable's each_slice could be useful at that point.
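As a rough illustration of the Dir.glob route, the sketch below caps the entries with first and then processes them in batches with each_slice. The temporary directory and the variable names are assumptions made only so the example is self-contained.

```ruby
require 'tmpdir'
require 'fileutils'

# Throwaway directory with seven files so the example is self-contained.
glob_root = Dir.mktmpdir
7.times { |n| FileUtils.touch(File.join(glob_root, "file#{n}.txt")) }

# Dir.glob returns a plain Array, so ordinary Array methods apply.
entries = Dir.glob(File.join(glob_root, '*')).sort
limited_entries = entries.first(5) # cap the work at five entries

# each_slice hands the entries over in batches of two.
batch_sizes = []
limited_entries.each_slice(2) { |batch| batch_sizes << batch.size }

FileUtils.remove_entry(glob_root)
```

Unlike the Find.find enumerator, Dir.glob builds the whole array before you slice it, so the cap limits the processing but not the directory scan itself.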

Related

Counter for parallel processes in Ruby

I have this code that uses the parallel gem to split work among different processes.
Parallel.map(list, :in_processes=>4) do |item|
  if item.name == "A"
    puts "A"
  else
    puts "B"
  end
end
What would be the best way to have a counter shared between the processes in order to get the exact number of times I got A and B?

Capture the return value of Parallel.map (an array holding the value of each block call, in the same order as the input) and use the Array#count method.
list = ["A","B","A","B","A","B","A","B","A","B"]
result = Parallel.map(list, :in_processes=>4) do |item|
  if item == "A"
    puts "Got A"
    "A" # the value of the branch becomes the block's value
  else
    puts "Got B"
    "B"
  end
end
result # ["A","B","A","B","A","B","A","B","A","B"]
# Now you can do whatever you want with the result array.
count_a = result.count("A") # 5
count_b = result.count("B") # 5
(The block's value is its last evaluated expression. Don't use an explicit return here: inside a block, return tries to return from the enclosing method, and typically raises LocalJumpError when there is none. Use next "A" if you want to be explicit.)

Trying to get a total - Ruby

I'm trying to get the total cost in one of my field called "upgrade_cost" and store that in a variable called $tuc
def totalUpgradeCost
  $e = Experience.all
  $tuc = 0
  (e.emf_assets).each do |i|
    i.upgrade_cost += $tuc
  end
  return $tuc
end
I'm getting the error undefined local variable or method `e'. I'm new to Ruby; can anyone help?

I am assuming that emf_assets are associated (via has_many) with an experience. That said, I think the following could work for you:
def total_upgrade_cost
  total = 0 # use more descriptive variable names
  all_experiences = Experience.all
  all_experiences.each do |experience| # iterate over each `experience`
    experience.emf_assets.each do |asset| # load `emf_assets` for each `experience`
      # add the `upgrade_cost` (which might be `nil`) to `total`
      total += asset.upgrade_cost.to_i
    end
  end
  total # no need for an explicit `return`
end
Please note that this might work for small numbers of experiences and emf_assets, but as a next step performance will benefit from some optimization. I think that optimization is outside the scope of this question at the moment, but you will need to avoid the N+1 query problem, and it might make sense to do the whole calculation in your database.

What is the e in e.emf_assets? If you mean $e, you aren't allowed to drop the $. In Ruby, a $ at the start of a variable name indicates a global variable. If you aren't using $e outside of this function anyway, it would be better to call it simply e, so that it wouldn't be visible outside of the function. Regardless, you're getting an error because $e refers to a global, and e refers to a separate (undefined) local variable.

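This is not PHP; the $ sign isn't a prefix for every variable. You used $e in one place and plain e in another, and those are two different variables. That's why you get the error.
Note also that Experience.all is a collection, so emf_assets has to be called on each experience rather than on the collection. Assuming emf_assets is an association on Experience, this code should work:
def totalUpgradeCost
  tuc = 0
  Experience.all.each do |e|
    e.emf_assets.each do |i|
      tuc += i.upgrade_cost
    end
  end
  tuc
end
This is doable in a shorter way:
def totalUpgradeCost
  Experience.all.flat_map(&:emf_assets).inject(0) { |sum, i| sum + i.upgrade_cost }
end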
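The answers above all come down to flattening the nested collections and summing one field. Here is a minimal plain-Ruby sketch of that idea; AssetStub and ExperienceStub are hypothetical Struct stand-ins for the ActiveRecord models, used only so the example runs without Rails or a database.

```ruby
# Hypothetical stand-ins for the real models (assumed shapes, no database).
AssetStub = Struct.new(:upgrade_cost)
ExperienceStub = Struct.new(:emf_assets)

experiences = [
  ExperienceStub.new([AssetStub.new(10), AssetStub.new(nil)]),
  ExperienceStub.new([AssetStub.new(5)])
]

# flat_map collects every asset into one array; to_i turns a nil cost into 0.
total_upgrade_cost = experiences
  .flat_map(&:emf_assets)
  .sum { |asset| asset.upgrade_cost.to_i }

puts total_upgrade_cost
```

Local variables are enough here; nothing in the calculation needs a $ global.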

Discriminate first and last element in each?

@example.each do |e|
  # do something here
end
Here I want to do something different with the first and last element in each; how should I achieve this? Certainly I can use a loop variable i and keep track of whether i == 0 or i == @example.size - 1, but isn't that too dumb?

One of the nicer approaches is:
@example.tap do |head, *body, tail|
  head.do_head_specific_task!
  tail.do_tail_specific_task!
  body.each { |segment| segment.do_body_segment_specific_task! }
end

You can use each_with_index and then use the index to identify the first and last items. For example:
@data.each_with_index do |item, index|
  if index == 0
    # this is the first item
  elsif index == @data.size - 1
    # this is the last item
  else
    # all other items
  end
end
Alternatively, if you prefer, you could separate the 'middle' of the array like so:
# This is the first item
do_something(@data.first)
@data[1..-2].each do |item|
  # These are the middle items
  do_something_else(item)
end
# This is the last item
do_something(@data.last)
With both these methods you have to be careful about the desired behaviour when there are only one or two items in the list.
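One way to make the one-item and two-item cases explicit is to label each element's position before acting on it. The helper below is a sketch under that assumption; each_with_position and the :only label are hypothetical names, and treating a single element as its own case is a design choice, not something the answers above prescribe.

```ruby
# Hypothetical helper: yields each item with :only, :first, :middle or :last.
def each_with_position(items)
  last_index = items.size - 1
  items.each_with_index do |item, index|
    position =
      if last_index.zero?
        :only            # a single element is both first and last
      elsif index.zero?
        :first
      elsif index == last_index
        :last
      else
        :middle
      end
    yield item, position
  end
end

# Collects the position labels for a given array.
positions_for = lambda do |items|
  seen = []
  each_with_position(items) { |_item, position| seen << position }
  seen
end

one_item    = positions_for.call(%w[a])
two_items   = positions_for.call(%w[a b])
three_items = positions_for.call(%w[a b c])
```

An empty array simply yields nothing, so callers do not need a separate guard for it.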

A fairly common approach is the following (when there are certainly no duplicates in the array).
@example.each do |e|
  if e == @example.first
    # Things
  elsif e == @example.last
    # Stuff
  end
end
If you suspect the array may contain duplicates (or if you just prefer this method), then grab the first and last items out of the array and handle them outside of the block.
When using this method you should also extract the code that acts on each item into a method so that you don't have to repeat it:
def do_the_function(item)
  # act on item
end
first = @example.shift
last = @example.pop
# @example no longer contains those two items
do_the_function(first)
@example.each do |e|
  do_the_function(e)
end
do_the_function(last)

What is an elegant 'ruby' way (within rails if it matters) to alter a loop's function every Nth iteration while using list.each?

What is an elegant 'ruby' way to alter a loop's function every Nth iteration? I would prefer not to use (1..50).each do |i| because I want to iterate over every object in a list objects.
objects.each do |object|
  # Do this with object information
  # Do not do this if this is the third time through the loop
end

objects.each_with_index do |object, idx|
  if idx == 2 # third time
  # or
  # if idx % 3 == 2 # every third time
    # do special thing
  else
    # do normal thing
  end
end
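The "every third time" variant can be wrapped in a small helper so the modulo arithmetic lives in one place. This is only a sketch; each_marking_every_nth is a hypothetical name, and upcasing stands in for whatever the special case really is.

```ruby
# Hypothetical helper: yields each item plus a flag that is true on every
# nth iteration (counting from 1).
def each_marking_every_nth(items, n)
  items.each_with_index do |item, idx|
    yield item, ((idx + 1) % n).zero?
  end
end

labels = []
each_marking_every_nth(%w[a b c d e f g], 3) do |item, special|
  labels << (special ? item.upcase : item) # upcase marks the special case
end

puts labels.join(' ')
```

Counting from idx + 1 keeps the condition readable as "every nth item" instead of the equivalent idx % n == n - 1.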

How can I speed up this Rails code?

It's a vague question, I know, but the performance of this block of code is horrible. It takes about 15 seconds from the original POST to the action to rendering the page.
The purpose of this action is to retrieve all Occupations from a CV, all the skills from that CV and the occupations. They need to be organized in 2 arrays:
the first array contains all the Occupations (no duplicates), ordered according to their score. For each duplicate entry found, the score is increased by 1
the second array contains ALL the skills from both the occupation array and the CV. Again no duplicates are allowed, but for every duplicate encountered the score of the existing entry is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but I hope it's understandable. I know working with the arrays like I do is confusing, but here is what each array location means:
position 0 : the actual skill/occupation object
position 1 : the score of the entry
position 2 : the location found in the db
position 3 : the location found in the cv
def categorize
  @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills])
  @menu = :second
  @language = Language.resolve(:code => :en, :name => :en)
  @occupation_hashes = []
  @skill_hashes = []
  (@cv.desired_occupations + @cv.past_occupations).each do |occupation|
    section = []
    section << 'Desired occupation' if @cv.desired_occupations.include? occupation
    section << 'Work experience' if @cv.past_occupations.include? occupation
    unless (array = @occupation_hashes.assoc(occupation)).blank?
      array[1] += 1
      array[2] = (array[2] & section).uniq
    else
      @occupation_hashes << [occupation, 1, section]
    end
    occupation.skills.each do |skill|
      unless (array = @skill_hashes.assoc skill).blank?
        label = occupation.concept.label(@language).value
        array[1] += 1
        array[3] << label unless array[3].include? label
      else
        @skill_hashes << [skill, 1, [], [occupation.concept.label(@language).value]]
      end
    end
  end
  @cv.educational_skills.each do |skill|
    unless (array = @skill_hashes.assoc skill).blank?
      array[1] += 1
      array[3] << 'Education skills' unless array[3].include? 'Education skills'
    else
      @skill_hashes << [skill, 1, ['Education skills'], []]
    end
  end
  # Sort the hashes
  @occupation_hashes.sort! { |x,y| y[1] <=> x[1]}
  @skill_hashes.sort! { |x,y| y[1] <=> x[1]}
  @max = @skill_hashes.first[1]
  @min = @skill_hashes.last[1]
end
I can post the additional models and migrations to make it clear what each class does, but I think the first few lines of the above script should be clear on the associations. I'm looking for a way to optimize the each-loops...

That's quite the block of code there. Generally, if you're writing methods that large, you're going to have trouble maintaining them in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
@categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save the results into instance variables made accessible by being declared with attr_reader.
Using a utility class means you can break up the initialization into steps that are clearer:
def initialize(cv_id)
  # Call a wrapper method that loads the CV
  @cv = self.load_cv(cv_id)
  # Perform discrete steps to re-order the imported data
  self.organize_occupations
  self.organize_skills
end
It's really hard to say why this is slow by just looking at it, though I would pay very close attention to log/development.log to see what's going on in there. It could be the initial load is painfully slow but the rest of the method is fine.
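To make the helper-class suggestion more concrete, here is a rough sketch of how such a class could be laid out. Everything beyond the names CvCategorizer, load_cv, organize_occupations and organize_skills is an assumption: the loader: argument, the hash-shaped CV, and the simplified scoring exist only so the example runs without Rails, and they do not reproduce the full section-tracking logic of the original action.

```ruby
# Hypothetical sketch of the helper class. The `loader:` argument and the
# hash-shaped CV are assumptions made so the example runs without Rails.
class CvCategorizer
  attr_reader :cv, :occupation_scores, :skill_scores

  def initialize(cv_id, loader:)
    @cv = load_cv(cv_id, loader)
    organize_occupations
    organize_skills
  end

  private

  # In a Rails app this is where the Cv.find call would live.
  def load_cv(cv_id, loader)
    loader.call(cv_id)
  end

  def organize_occupations
    @occupation_scores = score(cv[:desired_occupations] + cv[:past_occupations])
  end

  def organize_skills
    @skill_scores = score(cv[:educational_skills])
  end

  # Counts duplicates and returns [item, score] pairs, highest score first.
  def score(items)
    counts = items.each_with_object(Hash.new(0)) { |item, tally| tally[item] += 1 }
    counts.sort_by { |_item, count| -count }
  end
end

# Fake data standing in for the database.
fake_cvs = {
  1 => {
    desired_occupations: %w[developer tester],
    past_occupations: %w[developer],
    educational_skills: %w[ruby sql ruby]
  }
}
categorizer = CvCategorizer.new(1, loader: ->(id) { fake_cvs.fetch(id) })
```

Each stage is now a small method that can be timed or tested separately. Counting with a Hash also avoids repeated Array#assoc scans, though whether those scans are the real bottleneck is something only profiling can confirm.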

You should do a bit of profiling in your code to see what is taking a large chunk of time. You can figure out how to work one of the profilers, or just sprinkle some simple puts or logger.info statements with a timestamp throughout your code. It's probably easiest to do this with Benchmark. Note: you may need to require 'benchmark'... not sure if it is auto required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code:
# Assign the result first: in "logger.info Benchmark.measure do ... end" the
# do...end block would bind to logger.info instead of Benchmark.measure.
timing = Benchmark.measure do
  (@cv.desired_occupations + @cv.past_occupations).each do |occupation|
    section = []
    section << 'Desired occupation' if @cv.desired_occupations.include? occupation
    section << 'Work experience' if @cv.past_occupations.include? occupation
    unless (array = @occupation_hashes.assoc(occupation)).blank?
      array[1] += 1
      array[2] = (array[2] & section).uniq
    else
      @occupation_hashes << [occupation, 1, section]
    end
  end
end
logger.info timing
I'd just start with large blocks and then narrow it down. Not knowing how large of a dataset you are dealing with, it is hard to say what the problem zone is.
I'll also concur with others that you will be way better off to break this thing into smaller methods. This will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
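For reference, a self-contained sketch of the two Benchmark calls that are most useful here. The slow_step method is a hypothetical stand-in for whatever code you suspect; puts is used instead of logger.info so the example runs outside Rails.

```ruby
require 'benchmark'

# Stand-in for the code under suspicion; replace with the real call.
def slow_step
  (1..50_000).map { |n| n * n }.sum
end

# Benchmark.realtime returns elapsed wall-clock seconds as a Float.
elapsed = Benchmark.realtime { slow_step }

# Benchmark.measure returns a Benchmark::Tms with user, system and real time.
timing = Benchmark.measure { 10.times { slow_step } }

puts format('single run: %.4fs', elapsed)
puts format('10 runs:    %.4fs real', timing.real)
```

Benchmark.realtime is handy for a quick log line, while the Benchmark::Tms from Benchmark.measure separates CPU time from wall-clock time, which helps tell slow Ruby code apart from time spent waiting on the database.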
