Improving Rails.cache.write by setting key-value pairs asynchronously - ruby-on-rails

I am currently thinking about improving the performance of Rails.cache.write when using dalli to write items to the memcachier cloud.
The stack, as it relates to caching, is currently:
heroku, memcachier heroku addon, dalli 2.6.4, rails 3.0.19
I am using newrelic for performance monitoring.
I am fetching "active students" for a given logged-in user, represented by a BusinessUser instance, whenever its active_students method is called from a controller handling a request that needs that list:
class BusinessUser < ActiveRecord::Base
  ...
  def active_students
    Rails.cache.fetch("/studio/#{self.id}/students") do
      customer_users.active_by_name
    end
  end
  ...
end
After looking at newrelic, I've basically narrowed down one big performance hit for the app in setting key values on memcachier. It takes an average of 225ms every time. Further, it looks like setting memcache key values blocks the main thread and eventually disrupts the request queue. Obviously this is undesirable, especially when the whole point of the caching strategy is to reduce performance bottlenecks.
In addition, I've benchmarked the cache storage with plain dalli, and Rails.cache.write for 1000 cache sets of the same value:
heroku run console -a {app-name-redacted}
irb(main):001:0> require 'dalli'
=> false
irb(main):002:0> cache = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"].split(","),
irb(main):003:1* {:username => ENV["MEMCACHIER_USERNAME"],
irb(main):004:2* :password => ENV["MEMCACHIER_PASSWORD"],
irb(main):005:2* :failover => true,
irb(main):006:2* :socket_timeout => 1.5,
irb(main):007:2* :socket_failure_delay => 0.2
irb(main):008:2> })
=> #<Dalli::Client:0x00000006686ce8 @servers=["server-redacted:11211"], @options={:username=>"username-redacted", :password=>"password-redacted", :failover=>true, :socket_timeout=>1.5, :socket_failure_delay=>0.2}, @ring=nil>
irb(main):009:0> require 'benchmark'
=> false
irb(main):010:0> n = 1000
=> 1000
irb(main):011:0> Benchmark.bm do |x|
irb(main):012:1* x.report { n.times do ; cache.set("foo", "bar") ; end }
irb(main):013:1> x.report { n.times do ; Rails.cache.write("foo", "bar") ; end }
irb(main):014:1> end
user system total real
Dalli::Server#connect server-redacted:11211
Dalli/SASL authenticating as username-redacted
Dalli/SASL: username-redacted
0.090000 0.050000 0.140000 ( 2.066113)
Dalli::Server#connect server-redacted:11211
Dalli/SASL authenticating as username-redacted
Dalli/SASL: username-redacted
0.100000 0.070000 0.170000 ( 2.108364)
With plain dalli cache.set, we are using 2.066113s to write 1000 entries into the cache, for an average cache.set time of 2.06ms.
With Rails.cache.write, we are using 2.108364s to write 1000 entries into the cache, for an average Rails.cache.write time of 2.11ms.
⇒ It seems like the problem is not with memcachier, but simply with the amount of data that we are attempting to store.
According to the docs for the #fetch method, it is not the way to go if I want to move cache sets into a separate thread or a worker, because I can't split the write out from the read - and self-evidently, I don't want to be reading asynchronously.
Is it possible to reduce the bottleneck by throwing Rails.cache.write into a worker, when setting key values? Or, more generally, is there a better pattern to do this, so that I am not blocking the main thread every time I want to perform a Rails.cache.write?

There are two factors that would contribute to overall latency under normal circumstances: client side marshalling/compression and network bandwidth.
Dalli marshals and optionally compresses the data, which can be quite expensive. Here are some benchmarks of marshalling and compressing a list of random characters (an artificial list of user ids or something like that). In both cases the resulting value is around 200KB. Both benchmarks were run on a Heroku dyno - performance will obviously depend on the CPU and load of the machine:
irb> val = (1..50000).to_a.map! {rand(255).chr}; nil
# a list of 50000 single character strings
irb> Marshal.dump(val).size
275832
# OK, so roughly 200K. How long does it take to perform this operation
# before even starting to talk to MemCachier?
irb> Benchmark.measure { Marshal.dump(val) }
=> 0.040000 0.000000 0.040000 ( 0.044568)
# so about 45ms, and this scales roughly linearly with the length of the list.
irb> val = (1..100000).to_a; nil # a list of 100000 integers
irb> Zlib::Deflate.deflate(Marshal.dump(val)).size
177535
# OK, so roughly 200K. How long does it take to perform this operation
irb> Benchmark.measure { Zlib::Deflate.deflate(Marshal.dump(val)) }
=> 0.140000 0.000000 0.140000 ( 0.145672)
So we're basically seeing anywhere from a 40ms to 150ms performance hit just for marshalling and/or zipping the data. Marshalling a String will be much cheaper, while marshalling something like a complex object will be more expensive. Zipping depends on the size of the data, but also on its redundancy. For example, zipping a 1MB string of all "a" characters takes only about 10ms.
Network bandwidth will play some role here, but not a very significant one. MemCachier has a 1MB limit on values, and transferring 1MB to/from MemCachier takes approximately 20ms:
irb(main):036:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
=> 0.250000 11.620000 11.870000 ( 21.284664)
This amounts to about 400Mbps (1MB * 8Mb/MB * (1000ms/s / 20ms)), which makes sense. However, for a still relatively large, but smaller, value of 200KB, we'd expect a 5x speedup:
irb(main):039:0> val = "a" * (1024 * 200); val.size
=> 204800
irb(main):040:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
=> 0.160000 2.890000 3.050000 ( 5.954258)
So, there are several things you might be able to do to get some speedup:
Use a faster marshalling mechanism. For example, using Array#pack("L*") to encode a list of 50,000 32-bit unsigned integers (like in the very first benchmark) into a string of length 200,000 (4 bytes for each integer), takes only 2ms rather than 40ms. Using compression with the same marshalling scheme, to get a similar sized value is also very fast (about 2ms as well), but the compression doesn't do anything useful on random data anymore (Ruby's Marshal produces a fairly redundant String even on a list of random integers).
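As a rough sketch of that first suggestion (assuming the values really are 32-bit unsigned integers - pack is not a general-purpose serializer):

require 'benchmark'

# Hypothetical list of 50,000 user ids, as in the benchmark above
ids = Array.new(50_000) { rand(2**32) }

Benchmark.bm(10) do |x|
  x.report("Marshal:") { 100.times { Marshal.dump(ids) } }
  x.report("pack:")    { 100.times { ids.pack("L*") } }
end

# Reading the packed value back is just as cheap:
# ids = packed_string.unpack("L*")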
Use smaller values. This would probably require deep application changes, but if you don't really need the whole list, you shouldn't be setting the whole list. For example, the memcache protocol has append and prepend operations. If you are only ever adding new things to a long list, you could use those operations instead, as sketched below.
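A hedged sketch of the append idea using Dalli (the key and values here are made up; append and prepend only work on raw, unmarshalled values, so the list is kept as a plain delimited string):

client = Dalli::Client.new
client.set("studio/42/students", "101,102", 0, :raw => true) # ttl 0 = no expiry
client.append("studio/42/students", ",103")
client.get("studio/42/students", :raw => true) # => "101,102,103"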
Finally, as suggested, removing the sets/gets from the critical path would prevent any delays from affecting HTTP request latency. You still have to get the data to the worker, so if you're using something like a work queue, it's important that the message you send to the worker contains only instructions on which data to construct, rather than the data itself (or you're in the same hole again, just with a different system). A very lightweight approach (in terms of coding effort) would be to simply fork a process:
mylist = Student.where(...).all.map!(&:id)
# ... need to update memcache with the new list of students ...
fork do
  # Have to create a new Dalli client; connections shouldn't be shared across fork
  client = Dalli::Client.new
  client.set("mylistkey", mylist)
  # this will block for the same time as before, but in a separate process
end
I haven't benchmarked a full example, but since you're not execing, and Linux fork is copy-on-write, the overhead of the fork call itself should be minimal. On my machine, it's about 500us (that's micro-seconds not milliseconds).

Using Rails.cache.write to prefetch and store data in the cache with workers (e.g. Sidekiq) is what I've seen at high volumes; a minimal worker sketch follows the list below. Of course there is a trade-off between speed and the money you want to spend. Think about:
the most used paths in your app (is active_students accessed often?);
what to store (just IDs or the entire objects or further down the chain);
if you can optimize that query (n+1?).
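For illustration only - the worker class and the enqueue point here are made up, and it assumes Sidekiq plus the BusinessUser model from the question:

class ActiveStudentsCacheWorker
  include Sidekiq::Worker

  def perform(business_user_id)
    # Only the id crosses the queue; the worker rebuilds the cache entry
    user = BusinessUser.find(business_user_id)
    Rails.cache.write("/studio/#{user.id}/students", user.customer_users.active_by_name)
  end
end

# Enqueue from the request path instead of writing inline:
# ActiveStudentsCacheWorker.perform_async(current_user.id)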
Also, if you really need speed, consider using a dedicated memcache service, instead of a Heroku add-on.

Related

Read data from csv file with foreach function

I have been reading data from a CSV file. For a large file, to avoid the Rack 12-second timeout, I read only 25 rows per request; after 25 rows the method returns, the client makes another request, and this continues until all the rows are read.
require 'csv'

def read_csv(offset)
  r_count = 1
  CSV.foreach(file.tempfile, options) do |row|
    if r_count > offset.to_i
      # process
    end
    r_count += 1
  end
end
But this creates a new issue. Say the first request reads 25 rows; when the next request comes with an offset of 25, CSV.foreach still iterates over the first 25 rows before it starts processing from row 26. How can I skip the rows that have already been read? I tried using next to skip iterations, but that fails. Is there a more efficient way to do this?
Code
def read_csv(fileName)
  lines = (`wc -l #{fileName}`).to_i + 1
  lines_processed = 0
  open(fileName) do |csv|
    csv.each_line do |line|
      # process
      lines_processed += 1
    end
  end
end
Pure Ruby - SLOWER
def read_csv(fileName)
  lines = open(fileName).count
  lines_processed = 0
  open(fileName) do |csv|
    csv.each_line do |line|
      # process
      lines_processed += 1
    end
  end
end
Benchmarks
I ran a new benchmark comparing your original method and my own. I also included the test file information.
"File Information"
Lines: 1172319
Size: 126M
"django's original method"
Time: 18.58 secs
Memory: 0.45 MB
"OneNeptune's method"
Time: 0.58 secs
Memory: 2.18 MB
"Pure Ruby method"
Time: 0.96 secs
Memory: 2.06 MB
Explanation
NOTE: I added a pure ruby method, since using wc is sort of cheating, and not portable. In most cases it's important to use pure language solutions.
You can use this method to process a very large CSV file.
~2MB of memory feels pretty optimal considering the file size. It's a bit of an increase in memory usage, but the time savings seem a fair trade, and this will prevent timeouts.
I did modify the method to take a fileName, but this was just because I was testing many different CSV files to make sure they all worked correctly. You can remove this if you'd like, but it'll likely be helpful.
I also removed the concept of an offset, since you stated you originally included it to try to optimize the parsing yourself, but this is no longer necessary.
Also, I keep track of how many lines are in the file and how many were processed, since you needed that information. Note that lines only works on Unix-based systems; it's a trick to avoid loading the entire file into memory. It counts the newlines, and I add 1 to account for the last line. If you're not going to count the header as a line, though, you could remove the +1 and rename lines to "rows" to be more accurate.
Another logistical problem you may run into is figuring out how to handle a CSV file that has headers; one hedged option is sketched below.
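Assuming the stdlib CSV parser, passing headers: true consumes the header line automatically and lets you access fields by name:

require 'csv'

CSV.foreach(file_name, :headers => true) do |row|
  # row is a CSV::Row; access fields by header name, e.g. row["email"]
end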
You could use lazy reading to speed this up: the whole file wouldn't be read, just the portion from the beginning of the file up to the chunk you use.
See http://engineering.continuity.net/csv-tricks/ and https://reinteractive.com/posts/154-improving-csv-processing-code-with-laziness for examples.
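A rough sketch of the idea, assuming the 25-rows-per-request scheme from the question (CSV.foreach without a block returns an enumerator):

require 'csv'

def read_csv_chunk(file_name, offset, chunk_size = 25)
  # Parses only up to offset + chunk_size lines, then stops
  CSV.foreach(file_name).lazy.drop(offset).first(chunk_size)
end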
You could also use SmarterCSV to work in chunks like this.
SmarterCSV.process(file_path, { :chunk_size => 1000 }) do |chunk|
  chunk.each do |row|
    # Do your processing
  end
  do_something_else
end
The way I did this was by streaming the result to the user; when you can see what is happening, the wait doesn't bother you as much. The timeout you mention won't happen here.
I'm not a Rails user, so I'll give an example in Sinatra; this can be done with Rails too. See e.g. http://api.rubyonrails.org/classes/ActionController/Streaming.html
require 'sinatra'

get '/' do
  stream :keep_open do |out|
    1.upto(100) do |line| # this would be your CSV file opened
      out << "processing line #{line}<br>"
      # process line
      sleep 1 # for simulating the delay
    end
  end
end
A still better but somewhat more complicated solution would be to use websockets: the browser would receive the results from the server once the processing is finished. You will also need some JavaScript on the client to handle this. See https://github.com/websocket-rails/websocket-rails

Why is this RegExp taking 16 minutes to process on Rails?

I've written a function to remove email addresses from my data using gsub. The code is below. The problem is that it takes a total of 27 minutes to execute the function on a set of 10,000 records. (16 minutes for the first pattern, 11 minutes for the second). Elsewhere in the code I process about 20 other RegExp's using a similar flow (iterating through data.each) and they all finish in less than a second. (BTW, I recognize that my RegExp's aren't perfect and may catch some strings that aren't email addresses.)
Is there something about these two RegExp's that is causing the processing time to be so high? I've tried it on seven different data sources all with the same result, so the problem isn't peculiar to my data set.
def remove_email_addresses!(data)
  email_patterns = [
    /[[:graph:]]+@[[:graph:]]+/i,
    /[[:graph:]]+ +at +[^ ][ [[:graph:]]]{0,40} +dot +com/i
  ]
  data.each do |row|
    email_patterns.each do |pattern|
      row[:title].gsub!(pattern, "") unless row[:title].blank?
      row[:description].gsub!(pattern, "") unless row[:description].blank?
    end
  end
end
Check that your faster code isn't just doing var =~ /blah/ matching, rather than replacement: that is several orders of magnitude faster.
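A quick way to sanity-check that on your own patterns (synthetic text here; the exact numbers will vary):

require 'benchmark'

text = "contact me at someone@example.com please " * 100
pattern = /[[:graph:]]+@[[:graph:]]+/i

Benchmark.bm(8) do |x|
  x.report("match:") { 10_000.times { text =~ pattern } }
  x.report("gsub!:") { 10_000.times { text.dup.gsub!(pattern, "") } }
end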
In addition to reducing backtracking and replacing + and * with ranges for safety, as follows...
email_patterns = [
  /\b[-_.\w]{1,128}@[-_.\w]{1,128}/i,
  /\b[-_.\w]{1,128} {1,10}at {1,10}[^ ][-_.\w ]{0,40} {1,10}dot {1,10}com/i
]
... you could also try "unrolling your loop", though this is unlikely to make much difference unless there is some kind of interaction between the iterators (which there shouldn't be, but...). That is:
data.each do |row|
  row[:title].gsub!(email_patterns[0], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[0], "") unless row[:description].blank?
  row[:title].gsub!(email_patterns[1], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[1], "") unless row[:description].blank?
end
Finally, if this causes little to no speedup, consider profiling with something like ruby-prof to find out whether the regexes themselves are the issue, or whether there's a problem in the do iterator or the unless clauses instead.
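A minimal ruby-prof sketch, assuming the gem is installed:

require 'ruby-prof'

result = RubyProf.profile do
  remove_email_addresses!(data)
end

# Print a flat report of where the time was spent
RubyProf::FlatPrinter.new(result).print(STDOUT)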
Could it be that the data is large enough that it causes issues with paging once read in? If so, might it be faster to read the data in and parse it in chunks of N entries, rather than process the whole lot at once?
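If the rows come from ActiveRecord, here is a hedged sketch of chunked processing (the Record model name is made up, and it assumes rows respond to [:title]/[:description] as in the question):

# Process 1,000 records at a time instead of loading all 10,000 at once
Record.find_in_batches(:batch_size => 1000) do |batch|
  remove_email_addresses!(batch)
end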

Ruby Multi threading, what am I doing wrong?

So, in order to improve the speed of our app, I'm experimenting with multithreading in our Rails app.
Here is the code:
require 'thwait'
require 'benchmark'

city = Location.find_by_slug("orange-county", :select => "city, state, lat, lng", :limit => 1)
filters = ContractorSearchConditions.new()
image_filter = ImageSearchConditions.new()

filters.lat = city.lat
filters.lon = city.lng
filters.mile_radius = 20
filters.page_size = 15
filters.page = 1

image_filter.page_size = 5

sponsored_filter = filters.dup
sponsored_filter.has_advertised = true
sponsored_filter.page_size = 50

Benchmark.bm do |b|
  b.report('with') do
    1.times do
      cities = Thread.new {
        Location.where("lat between ? and ? and lng between ? and ?", city.lat - 0.5, city.lat + 0.5, city.lng - 0.5, city.lng + 0.5)
      }
      images = Thread.new {
        Image.search(image_filter)[:hits]
      }
      sponsored_results_extended = Thread.new {
        sponsored_filter.mile_radius = 50
        @sponsored_results = Contractor.search(sponsored_filter)
      }
      results = Thread.new {
        Contractor.search(filters)
      }
      ThreadsWait.all_waits(cities, images, sponsored_results_extended, results)
      @cities = cities.value
      @images = images.value
      @sponsored_results = sponsored_results_extended.value
      @results = results.value
    end
  end
  b.report('without') do
    1.times do
      @cities = Location.where("lat between ? and ? and lng between ? and ?", city.lat - 0.5, city.lat + 0.5, city.lng - 0.5, city.lng + 0.5)
      @image = Image.search(image_filter)[:hits]
      @sponsored_results = Contractor.search(sponsored_filter)
      @results = Contractor.search(filters)
    end
  end
end
Class.search runs a search on our ElasticSearch servers (3 servers behind a load balancer), while the ActiveRecord queries run against our RDS instance. (Everything is in the same datacenter.)
Here is the output on our dev server:
Bob#dev-web01:/usr/local/dev/buildzoom/rails$ script/rails runner script/thread_bm.rb -e development
user system total real
with 0.100000 0.010000 0.110000 ( 0.342238)
without 0.020000 0.000000 0.020000 ( 0.164624)
Note: I have very limited knowledge, if any, about threads, mutexes, the GIL, etc.
There is a lot more overhead in the "with" block than the "without" block due to the Thread creation and management. Using threads will help the most when the code is IO-bound, and it appears that is NOT the case. Four searches complete in 20ms (without block), which implies that in parallel those searches should take less than that amount of time. The "with" block takes 100ms to execute, so we can deduce that at least 80ms of that time is not spent in searches. Try benchmarking with longer queries to see how the results differ.
Note that I've made the assumption that all searches have the same latency, which may or may not be true, and always perform the same. It may be possible that the "without" block benefits from some sort of query caching since it runs after the "with" block. Do results differ when you swap the order of the benchmarks? Also, I'm ignoring overhead from the iteration (1.times). You should remove that unless you change the number of iterations to something greater than 1.
Even though you are using threads, and hence performing query IO in parallel, you still need to deserialize whatever results come back from your queries. This uses the CPU. MRI Ruby 2.0.0 has a global interpreter lock (GIL), which means only one thread can execute Ruby code at a time, on a single CPU core. In order to deserialize all your results, the CPU has to context-switch many times between the different threads. This is a lot more overhead than deserializing each result set sequentially.
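You can see this effect with a purely CPU-bound workload, where threads buy nothing under MRI (a synthetic sketch; timings will vary by machine):

require 'benchmark'

work = lambda { 2_000_000.times { Math.sqrt(rand) } }

Benchmark.bm(12) do |x|
  x.report("sequential:") { 4.times { work.call } }
  x.report("threaded:") do
    Array.new(4) { Thread.new { work.call } }.each(&:join)
  end
end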
If your wall time is dominated by waiting for a response from your queries, and they don't all come back at the same time, then there might be an advantage to parallelizing with threads. But it's hard to predict that.
You could try using JRuby or Rubinius. These will both utilize multiple cores, and hence can actually speed up your code as expected.

ruby: how to iterate elements in a hash efficiently

I have a very big hash and I want to iterate it. Hash.each seems to be too slow.
Is there any efficient way to do this?
How about converting this hash to an array?
In each loop I'm doing very simple string stuff:
name_hash.each { |name, str|
  record += name.to_s + "\|" + str + "\n"
}
and the hash uses people's names as the key, some related content as the value:
name_hash = {:"jose garcia" => "ca:tw#2#1,2#:th#1#3#;ar:tw#1#4#:fi#1#5#;ny:tw#1#6#;"}
Consider the following example, which uses a hash of 1 million elements:
#! /usr/bin/env ruby
require 'benchmark'

h = {}
1_000_000.times do |n|
  h[n] = rand
end

puts Benchmark.measure { h.each { |k, v| } }

a = nil
puts Benchmark.measure { a = h.to_a }
puts Benchmark.measure { a.each { |k, v| } }
I ran this on my system at work (running Ruby 1.8.5) and got:
0.350000 0.020000 0.370000 ( 0.380571)
0.300000 0.020000 0.320000 ( 0.307207)
0.160000 0.040000 0.200000 ( 0.198388)
So iterating over the array is indeed faster (0.16 seconds versus 0.35 seconds for the hash). But it took 0.3 seconds to generate the array, so the net process is slower: 0.46 seconds versus 0.35 seconds.
So it seems it's best just to iterate over the hash, at least in this test case.
String#+ is slow. This should improve it:
record = name_hash.map{|line| line.join("|")}.join("\n")
If you are using this to output to somewhere, you should not create a huge string but rather write line by line to the output.
A more idiomatic way to do that in ruby:
record = name_hash.map{|k,v| "#{k}|#{v}"}.join("\n")
I don't know how that will compare with speed, but part of the problem might be because you keep adding a little bit onto a string and creating new (ever longer) string objects with each iteration. The join is done in C and might perform better.
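A quick sketch to compare the two approaches on a synthetic hash (sizes and timings are illustrative only):

require 'benchmark'

name_hash = {}
100_000.times { |i| name_hash[:"name#{i}"] = "value#{i}" }

Benchmark.bm(8) do |x|
  x.report("+=:") do
    record = ""
    name_hash.each { |name, str| record += "#{name}|#{str}\n" }
  end
  x.report("join:") do
    record = name_hash.map { |k, v| "#{k}|#{v}" }.join("\n")
  end
end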
Iterating over large collections is slow; the each method is not what's throttling it. What in your loop are you doing that's so slow? If you need to convert to an array, you can do that by calling some_hash.to_a
I had thought ruby 1.9.x had made hash iteration faster but could have been wrong. If it's simple structures you could try a different hash, like https://github.com/rdp/google_hash which is one I hacked up to make #each more reliable...
Probably "by making a single db query"
Converting a large Hash to an Array will require creating a large object and will require two iterations, albeit with one of them being internal to the interpreter and probably very fast.
This is unlikely to be faster than just iterating over the Hash, but it might be for large objects.
Check out the Standard Library Benchmark package for an easy way to measure runtime.
I would also venture a guess that the real problem here is that you have a Hash-like ActiveRecord object that imposes a round-trip to your db server in each cycle of the enumeration. It's possible that what you really want is to bypass AR and run a native query to retrieve everything at once in a single round-trip.

Measure and Benchmark Time for Ruby Methods

How can I measure the time taken by a method, and by the individual statements in that method, in Ruby? If you see the method below, I want to measure the total time taken by the method, as well as the time taken by the database access and the redis access. I do not want to write Benchmark.measure before every statement. Does the Ruby interpreter give us any hooks for doing this?
def foo
  # code to access database
  # code to access redis
end
The simplest way:
require 'benchmark'

def foo
  time = Benchmark.measure {
    # code to test
  }
  puts time.real # or save it to logs
end
Sample output:
2.2.3 :001 > foo
5.230000 0.020000 5.250000 ( 5.274806)
Values are: cpu time, system time, total and real elapsed time.
Source: ruby docs.
You could use the Time object. (Time Docs)
For example,
start = Time.now
# => 2022-02-07 13:55:06.82975 +0100
# code to time
finish = Time.now
# => 2022-02-07 13:55:09.163182 +0100
diff = finish - start
# => 2.333432
diff would be in seconds, as a floating point number.
Use Benchmark's Report
require 'benchmark' # Might be necessary.

def foo
  Benchmark.bm(20) do |bm| # The 20 is the width of the first column in the output.
    bm.report("Access Database:") do
      # Code to access database.
    end
    bm.report("Access Redis:") do
      # Code to access redis.
    end
  end
end
This will output something like the following:
user system total real
Access Database: 0.020000 0.000000 0.020000 ( 0.475375)
Access Redis: 0.000000 0.000000 0.000000 ( 0.000037)
<------ 20 -------> # This is where the 20 comes in. NOTE: This is not shown in output.
More information can be found here.
Many of the answers suggest the use of Time.now. But it is worth being aware that Time.now can change. System clocks can drift and might get corrected by the system's administrator or via NTP. It is therefore possible for Time.now to jump forward or back and give your benchmarking inaccurate results.
A better solution is to use the operating system's monotonic clock, which is always moving forward. Ruby 2.1 and above give access to this via:
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# code to time
finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)
diff = finish - start # gets the time in seconds as a float
You can read more details here. You can also see that Sidekiq, a popular Ruby project, made the switch to the monotonic clock.
As a second thought, defining a measure() function that takes a Ruby block can help simplify the timing code:
def measure(&block)
  start = Time.now
  block.call
  Time.now - start
end

# t1 and t2 are the execution times for the code blocks.
t1 = measure { sleep(1) }
t2 = measure do
  sleep(2)
end
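Given the earlier caveat about Time.now, the same helper can use the monotonic clock instead (assuming Ruby 2.1+):

def measure_monotonic
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

measure_monotonic { sleep(1) } # => ~1.0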
In the spirit of wquist's answer, but a little simpler, you could also do it like below:
start = Time.now
# code to time
Time.now - start
