How do I simplify pushing multiple values into an array in Ruby? - ruby-on-rails

How would you improve this:
time = Time.now
@time = []
@time.push(
  (time-1.week).strftime("%m-%d"),
  (time-6.days).strftime("%m-%d"),
  (time-5.days).strftime("%m-%d"),
  (time-4.days).strftime("%m-%d"),
  (time-3.days).strftime("%m-%d"),
  (time-2.days).strftime("%m-%d"),
  (time-1.day).strftime("%m-%d"),
  (time).strftime("%m-%d")
)
I'm trying out some of the suggestions below:
time = Time.now
iterations = 1000
Benchmark.bm do |bm|
  bm.report do
    iterations.times do
      @time = 7.downto(0).map { |v| (time - v.days).strftime("%m-%d") }
    end
  end
  bm.report do
    iterations.times do
      @time = []
      @time.push(
        (time-1.week).strftime("%m-%d"),
        (time-6.days).strftime("%m-%d"),
        (time-5.days).strftime("%m-%d"),
        (time-4.days).strftime("%m-%d"),
        (time-3.days).strftime("%m-%d"),
        (time-2.days).strftime("%m-%d"),
        (time-1.day).strftime("%m-%d"),
        (time).strftime("%m-%d")
      )
    end
  end
end
user system total real
0.350000 0.960000 1.310000 ( 1.310054)
0.310000 0.840000 1.150000 ( 1.156484)
downto is demonstrably slower than my method.
The next test used the method:
@time = (0..7).map { |x| (time - x.days).strftime("%m-%d") }.reverse
1000 iterations
user system total real
0.340000 0.980000 1.320000 ( 1.321518)
0.300000 0.840000 1.140000 ( 1.149759)
5000 iterations
user system total real
1.720000 4.800000 6.520000 ( 6.545335)
1.530000 4.180000 5.710000 ( 5.712035)
I'm having a hard time wrapping my head around this without looking at both downto and map in the Ruby core, but in both cases my more elongated way of writing this runs faster than the more simplified methods (which I like much more from a readability standpoint). Please shed some light on my tests if I'm doing something wrong. I expected map to blow my method out of the water.
UPDATED FOR STEFAN'S ANSWER
So I see Stefan's answer below and throw it in the tester:
user system total real
0.040000 0.000000 0.040000 ( 0.035976)
1.520000 4.180000 5.700000 ( 5.704401)
Holy crap! 5000 iterations and it absolutely destroys my method.
Because he accurately points out that I'm only interested in Dates, I decide to change my own method from Time.now to Date.today and test it:
user system total real
0.090000 0.000000 0.090000 ( 0.085940)
0.390000 0.000000 0.390000 ( 0.398143)
It's somewhat weird that in the first test Stefan's method clocks in at 0.0359 and in the second clocks at 0.0859, more than double, but it's still hundredths of a second over 5000 iterations - so I think I'm deep into splitting hairs territory here.
Nevertheless - Stefan's way obliterates my own method - so I'm giving him the check mark.

You can do it as below, using #downto:
time = Time.now
@time = 7.downto(0).map { |v| (time - v.days).strftime("%m-%d") }

Since you're only interested in dates, you could use a Range of Date instances (dates increment in 1-day steps):
today = Date.today
(today-7..today).map { |date| date.strftime("%m-%d") }
#=> ["04-24", "04-25", "04-26", "04-27", "04-28", "04-29", "04-30", "05-01"]
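A sketch of the same idea using ActiveSupport's duration helpers (this is a Rails question, so they should be available); it produces the identical eight values:
(7.days.ago.to_date..Date.today).map { |date| date.strftime("%m-%d") }
#=> the same eight "%m-%d" strings, oldest first
It also hints at why this benchmarks so much faster: stepping a Date range is plain day-number arithmetic, while each time - n.days in the Time version goes through ActiveSupport's duration arithmetic and allocates a new Time object.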

Instead of pushing each value yourself, you can build the collection directly, i.e. combine a Range from zero to seven with #map:
time = Time.now
@time = (0..7).map { |v| (time - v.days).strftime("%m-%d") }.reverse
# => ["04-24", "04-25", "04-26", "04-27", "04-28", "04-29", "04-30", "05-01"]

I would write it using the Array#map method:
time = Time.now
@time = (0..7).map { |x| (time - x.days).strftime("%m-%d") }.reverse

Related

ActiveRecord - How fast are Calculate() methods in PostgreSQL?

I have a rather noobish question about ActiveRecord in Ruby on Rails.
I'm working on an app backed by a PostgreSQL database that will need to handle large amounts of data from multiple platforms as quickly as possible. I'm going through the process of trying to optimize for speed.
I have two functions and I'm wondering which one would be faster theoretically.
Example #1
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    platform.where(date: date_range).each do |platform_performance|
      total_branded_spend += platform_performance.spend["branded"].to_f
    end
  end
  total_branded_spend
end
VS.
Example #2
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    total_branded_spend += (platform.where(date: date_range).sum(:branded_spend)).to_f
  end
  total_branded_spend
end
As you can see, in the first example a selection of records is retrieved via the .where() method and then iterated over, with the desired field summed manually. In the second example, I make use of the .sum() method to do the summing at the database level.
I'm wondering if anyone knows which method is faster in general. I suspect the second method is faster, but is it faster by orders of magnitude?
Thanks so much for taking the time to read this question.
EDIT:
As @lacostenycoder pointed out, I should have clarified what platform_list is. It references an array of 1 to 3 ActiveRecord collections, each containing one record per day in the date_range.
Upon benchmarking with the method provided in his answer, I found the 2nd method to be slightly faster.
user system total real
spend_branded 0.000000 0.000000 0.000000 ( 0.003632)
spend_branded_sum 0.000000 0.000000 0.000000 ( 0.002612)
(102 records processed)
Here's how you can benchmark your methods for comparison. Open a rails console rails c, then paste this into your console.
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    platform.where(date: date_range).each do |platform_performance|
      total_branded_spend += platform_performance.spend["branded"].to_f
    end
  end
  total_branded_spend
end

def spend_branded_sum(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    total_branded_spend += (platform.where(date: date_range).sum(:branded_spend)).to_f
  end
  total_branded_spend
end
require 'benchmark'
Benchmark.bm do |x|
  x.report(:spend_branded) { spend_branded(date_range) }
  x.report(:spend_branded_sum) { spend_branded_sum(date_range) }
end
Of course we would expect the 2nd way to be faster. We can probably offer more help if you showed more about the model relations and how platform_list is defined.
Also you might want to check out the PgHero gem, which can help identify slow queries and spots where adding indices would improve performance. In general, when done correctly, calculations at the database level will be orders of magnitude faster than iterating over large sets of Ruby objects.
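For intuition about why, here is a sketch of what the #sum version asks PostgreSQL to do (the table name platform_performances is a hypothetical stand-in for your schema):
platform.where(date: date_range).sum(:branded_spend)
# issues a single aggregate query, roughly:
#   SELECT SUM("platform_performances"."branded_spend")
#   FROM "platform_performances"
#   WHERE "platform_performances"."date" BETWEEN $1 AND $2
# so one number crosses the wire per platform, instead of one row per record
# that Ruby then has to instantiate and add up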
Also you might try to refactor your first version to this:
def spend_branded(date_range)
  platform_list.map do |platform|
    platform.where(date: date_range)
            .pluck(:spend).map { |h| h['branded'].to_f }.sum
  end.sum
end
And 2nd version to
def spend_branded_sum(date_range)
  platform_list.map do |platform|
    platform.where(date: date_range).sum(:branded_spend).to_f
  end.sum
end
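As a design note, pluck(:spend) in the first refactor fetches just that column instead of instantiating a full ActiveRecord object per row, and both refactors compute each platform's subtotal as an expression rather than mutating an outer accumulator.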
lacostenycoder is correct to recommend that you benchmark your code.
If the values you are trying to sum are directly available in the database, ActiveRecord's Calculations methods are very likely going to be faster. I do not know how much faster.
If platform_list is a collection of models, something like this might work and might outperform your iteration:
Platform.
  where(date: date_range).
  where(id: platform_list.map(&:id)).
  sum(:branded_spend)
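Assuming platform_list holds Platform records, this collapses the per-platform loop into a single aggregate query, with the id filter keeping the SUM scoped to the same subset of platforms.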

Ruby Multi threading, what am I doing wrong?

So, in order to improve the speed of our app, I'm experimenting with multithreading in our Rails app.
Here is the code:
require 'thwait'
require 'benchmark'
city = Location.find_by_slug("orange-county", :select => "city, state, lat, lng", :limit => 1)
filters = ContractorSearchConditions.new()
image_filter = ImageSearchConditions.new()
filters.lat = city.lat
filters.lon = city.lng
filters.mile_radius = 20
filters.page_size = 15
filters.page = 1
image_filter.page_size = 5
sponsored_filter = filters.dup
sponsored_filter.has_advertised = true
sponsored_filter.page_size = 50
Benchmark.bm do |b|
  b.report('with') do
    1.times do
      cities = Thread.new {
        Location.where("lat between ? and ? and lng between ? and ?", city.lat-0.5, city.lat+0.5, city.lng-0.5, city.lng+0.5)
      }
      images = Thread.new {
        Image.search(image_filter)[:hits]
      }
      sponsored_results_extended = Thread.new {
        sponsored_filter.mile_radius = 50
        @sponsored_results = Contractor.search( sponsored_filter )
      }
      results = Thread.new {
        Contractor.search( filters )
      }
      ThreadsWait.all_waits(cities, images, sponsored_results_extended, results)
      @cities = cities.value
      @images = images.value
      @sponsored_results = sponsored_results_extended.value
      @results = results.value
    end
  end
  b.report('without') do
    1.times do
      @cities = Location.where("lat between ? and ? and lng between ? and ?", city.lat-0.5, city.lat+0.5, city.lng-0.5, city.lng+0.5)
      @image = Image.search(image_filter)[:hits]
      @sponsored_results = Contractor.search( sponsored_filter )
      @results = Contractor.search( filters )
    end
  end
end
Class.search runs a search on our Elasticsearch servers (3 servers behind a load balancer), while the ActiveRecord queries run against our RDS instance.
(Everything is in the same datacenter.)
Here is the output on our dev server:
Bob@dev-web01:/usr/local/dev/buildzoom/rails$ script/rails runner script/thread_bm.rb -e development
user system total real
with 0.100000 0.010000 0.110000 ( 0.342238)
without 0.020000 0.000000 0.020000 ( 0.164624)
Note: I have very limited knowledge, if any, about threads, mutexes, the GIL, etc.
There is a lot more overhead in the "with" block than in the "without" block due to the Thread creation and management. Using threads helps the most when the code is IO-bound, and it appears that is NOT the case. Four searches complete in 20ms (the "without" block), which implies that in parallel those searches should take less than that amount of time. The "with" block takes 100ms to execute, so we can deduce that at least 80ms of that time is not spent in searches. Try benchmarking with longer queries to see how the results differ.
Note that I've made the assumption that all searches have the same latency, which may or may not be true, and always perform the same. It may be possible that the "without" block benefits from some sort of query caching since it runs after the "with" block. Do results differ when you swap the order of the benchmarks? Also, I'm ignoring overhead from the iteration (1.times). You should remove that unless you change the number of iterations to something greater than 1.
Even though you are using threads, and hence performing query IO in parallel, you still need to deserialize whatever results are coming back from your queries. This uses the CPU. MRI Ruby 2.0.0 has a global interpreter lock. This means Ruby code can only run one line at a time, not in parallel, and only on one CPU core. In order to deserialize all your results, the CPU has to context switch many times between the different threads. This is a lot more overhead than deserializing each result set sequentially.
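A minimal sketch of that GIL effect with purely CPU-bound work (a standalone example, nothing from the app above); under MRI the threaded report will not meaningfully beat the serial one:
require 'benchmark'

work = -> { 2_000_000.times { |i| i * i } } # CPU-bound, never releases the GIL
Benchmark.bm(8) do |b|
  b.report('threads') { 4.times.map { Thread.new(&work) }.each(&:join) }
  b.report('serial')  { 4.times { work.call } }
end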
If your wall time is dominated by waiting for a response from your queries, and they don't all come back at the same time, then there might be an advantage to parallelizing with threads. But it's hard to predict that.
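As a contrasting sketch, when the work is pure waiting, threads do overlap it, because blocking IO (and sleep, standing in here for network latency) releases the GIL:
require 'benchmark'

Benchmark.bm(8) do |b|
  b.report('threads') { 4.times.map { Thread.new { sleep 0.05 } }.each(&:join) }
  b.report('serial')  { 4.times { sleep 0.05 } }
end
# 'threads' should finish in roughly one 0.05s wait instead of four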
You could try using JRuby or Rubinius. These will both utilize multiple cores, and hence can actually speed up your code as expected.

Is there a way to check the performance of a command in the console in Ruby on Rails?

I couldn't find any information on whether this is possible, but it would be useful if I could call a method on a command in the Rails console and measure its performance, by whatever metric, though I was mostly thinking of time.
For example, I'm trying to figure out which of these is faster:
[val2,val3,val4,val5,val6].find{|x| x != val1}
[val2,val3,val4,val5,val6].all?{|x| x == val1}
Is there something like this?
[val2,val3,val4,val5,val6].find{|x| x != val1}.performance
There is! And you don't even need Rails. Look into benchmark from the standard library.
As a sample:
require 'benchmark'
puts Benchmark.measure { [val2,val3,val4,val5,val6].find{|x| x != val1} }
puts Benchmark.measure { [val2,val3,val4,val5,val6].all?{|x| x == val1} }
The report that is output will show (in seconds):
User CPU time.
System CPU time.
Sum of the User and System CPU times.
The elapsed real time.
Something that looks like this:
0.350000 0.010000 0.360000 ( 0.436450)
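To rank your two specific snippets, Benchmark.bmbm is handy: it runs a rehearsal pass first so GC and warm-up noise don't skew the comparison. A runnable sketch (val1 through val6 are hypothetical placeholders):
require 'benchmark'

val1, val2, val3, val4, val5, val6 = 1, 1, 1, 1, 2, 1
Benchmark.bmbm do |x|
  x.report('find') { 100_000.times { [val2, val3, val4, val5, val6].find { |v| v != val1 } } }
  x.report('all?') { 100_000.times { [val2, val3, val4, val5, val6].all? { |v| v == val1 } } }
end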
This gem:
https://github.com/igorkasyanchuk/benchmark_methods
No more code like this:
t = Time.now
user.calculate_report
puts Time.now - t
Now you can do:
benchmark :calculate_report # in class
And just call your method
user.calculate_report

Measure and Benchmark Time for Ruby Methods

How can I measure the time taken by a method, and by the individual statements in that method, in Ruby? Looking at the method below, I want to measure the total time taken by the method as well as the time taken by the database access and the redis access. I do not want to write Benchmark.measure before every statement. Does the Ruby interpreter give us any hooks for doing this?
def foo
# code to access database
# code to access redis.
end
The simplest way:
require 'benchmark'
def foo
  time = Benchmark.measure {
    # code to test
  }
  puts time.real # or save it to logs
end
Sample output:
2.2.3 :001 > foo
5.230000 0.020000 5.250000 ( 5.274806)
The values are: user CPU time, system CPU time, the total of both, and real elapsed time.
Source: ruby docs.
You could use the Time object. (Time Docs)
For example,
start = Time.now
# => 2022-02-07 13:55:06.82975 +0100
# code to time
finish = Time.now
# => 2022-02-07 13:55:09.163182 +0100
diff = finish - start
# => 2.333432
diff would be in seconds, as a floating point number.
Use Benchmark's Report
require 'benchmark' # Might be necessary.
def foo
  Benchmark.bm( 20 ) do |bm| # The 20 is the width of the first column in the output.
    bm.report( "Access Database:" ) do
      # Code to access database.
    end
    bm.report( "Access Redis:" ) do
      # Code to access redis.
    end
  end
end
This will output something like the following:
                          user     system      total        real
Access Database:      0.020000   0.000000   0.020000 (  0.475375)
Access Redis:         0.000000   0.000000   0.000000 (  0.000037)
<------ 20 -------> # This is where the 20 comes in. NOTE: This is not shown in output.
More information can be found here.
Many of the answers suggest the use of Time.now. But it is worth being aware that Time.now can change. System clocks can drift and might get corrected by the system's administrator or via NTP. It is therefore possible for Time.now to jump forward or back and give your benchmarking inaccurate results.
A better solution is to use the operating system's monotonic clock, which is always moving forward. Ruby 2.1 and above give access to this via:
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# code to time
finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)
diff = finish - start # gets the time in seconds as a float
You can read more details here. You can also see that Sidekiq, a popular Ruby project, made the switch to the monotonic clock.
On a second thought, defining a measure() function that takes a Ruby block as an argument can help simplify the time-measuring code:
def measure(&block)
  start = Time.now
  block.call
  Time.now - start
end
# t1 and t2 are the execution times of the code blocks.
t1 = measure { sleep(1) }
t2 = measure do
  sleep(2)
end
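Combining this helper with the monotonic-clock advice above, a variant that is immune to system clock adjustments (Ruby 2.1+) might look like:
def measure_monotonic
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

measure_monotonic { sleep(1) } #=> roughly 1.0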
In the spirit of wquist's answer, but a little simpler, you could also do it like below:
start = Time.now
# code to time
Time.now - start

Ruby's range step method causes very slow execution?

I've got this block of code:
date_counter = Time.mktime(2011,01,01,00,00,00,"+05:00")
@weeks = Array.new
(date_counter..Time.now).step(1.week) do |week|
  logger.debug "WEEK: " + week.inspect
  @weeks << week
end
Technically, the code works, outputting:
Sat Jan 01 00:00:00 -0500 2011
Sat Jan 08 00:00:00 -0500 2011
Sat Jan 15 00:00:00 -0500 2011
etc.
But the execution time is complete rubbish! It takes approximately four seconds to compute each week.
Is there some grotesque inefficiency that I'm missing in this code? It seems straight-forward enough.
I'm running Ruby 1.8.7 with Rails 3.0.3.
Assuming MRI and Rubinius use similar methods to generate the range, the basic algorithm used, with all the extraneous checks and a few Fixnum optimisations etc. removed, is:
class Range
  def each(&block)
    current = @first
    while current <= @last
      yield current
      current = current.succ
    end
  end

  def step(step_size, &block)
    counter = 0
    each do |o|
      yield o if counter % step_size == 0
      counter += 1
    end
  end
end
(See the Rubinius source code)
For a Time object, #succ returns the time one second later. So even though you are asking it for only one value per week, it has to step through every second between the two times anyway.
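You can see the one-second stepping directly (Time#succ exists on this 1.8.7 setup; later Rubies deprecate it):
t = Time.mktime(2011, 1, 1)
t.succ - t #=> 1.0, i.e. one second
# so (t..Time.now) has to enumerate tens of millions of Time values, second by second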
Edit: Solution
Build a range of Fixnum's since they have an optimised Range#step implementation.
Something like:
date_counter = Time.mktime(2011,01,01,00,00,00,"+05:00")
@weeks = Array.new
(date_counter.to_i..Time.now.to_i).step(1.week).map do |time|
  Time.at(time)
end.each do |week|
  logger.debug "WEEK: " + week.inspect
  @weeks << week
end
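The to_i conversion turns the endpoints into Integers, whose step implementation jumps arithmetically by 1.week (604800 seconds) instead of calling succ once per second; Time.at then converts only one value per week back into a Time.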
Yes, you are missing a gross inefficiency. Try this in irb to see what you're doing:
(Time.mktime(2011,01,01,00,00,00,"+05:00") .. Time.now).each { |x| puts x }
The range operator is going from January 1 to now in increments of one second and that's a huge list. Unfortunately, Ruby isn't clever enough to combine the range generation and the one-week chunking into a single operation so it has to build the entire ~6million entry list.
BTW, "straight forward" and "gross inefficiency" are not mutually exclusive, in fact they're often concurrent conditions.
UPDATE: If you do this:
(0 .. 6000000).step(7*24*3600) { |x| puts x }
Then the output is produced almost instantaneously. So, it appears that the problem is that Range doesn't know how to optimize the chunking when faced with a range of Time objects but it can figure things out quite nicely with Fixnum ranges.
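A related sketch: if day resolution is enough, Date#step jumps a whole number of days at a time, so covering the same span takes only a few hundred iterations rather than millions:
require 'date'

Date.new(2011, 1, 1).step(Date.today, 7) { |d| puts d }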
