Precompile / preload cached fragments in Rails

Using Rails 3.1.1 and Heroku.
I have 1,000 products in my app. They all share a very slow controller action whose cost is effectively solved by fragment caching. Although the data doesn't change very often, the cache still needs to expire periodically (which I do by sweeping), in my case once a week.
Now, after sweeping the cached views I don't want my users to be the ones who regenerate the fragments by hitting the products one after another (the first load takes about 6-8 seconds; a cached load takes 2-3 seconds). I assume I could avoid that with some sort of script that loads each product page one by one, making the server create those fragments.
I can imagine this can be handled in three ways:
Run a script on my local machine that accesses each URL with some sort of GET request. Downside: not very pretty, and it will skew my visitor stats in a way I would rather avoid.
Run the same type of script on the server after the sweeper, loading each product. How would I do that?
Use a smart Rails command to do this automatically. Is there such an elegant command?

I made this script and it works. The product.slug is there because I have friendly_id installed; it produces URLs such as www.mydomain.com/productabc-123/, which are then fetched with Nokogiri (the Nokogiri gem is needed for this solution).
PLEASE NOTE THAT I SWITCHED FROM FRAGMENT CACHING TO ACTION CACHING IN THIS SOLUTION (as opposed to the question, where I use fragment caching). The important difference is the key used in the cache check: Rails.cache.exist?('views/www.mydomain.com/' + product.slug). For fragment caching, the fragment name should go there instead.
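For the fragment-caching variant, the check would look something like the line below; the fragment key shown is hypothetical and depends on how your cache blocks are keyed:
Rails.cache.exist?('views/products/' + product.id.to_s) # hypothetical fragment key
The full warm-up script: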
require 'nokogiri'
require 'open-uri'

Product.all.each do |product|
  url = 'http://www.mydomain.com/' + product.slug
  begin
    if Rails.cache.exist?('views/www.mydomain.com/' + product.slug)
      puts url + " is already in cache"
    else
      doc = Nokogiri::HTML(open(url)) # fetching the page warms the cache
      puts "Reads " + url
      # Verify that the caching worked. Only for troubleshooting.
      if Rails.cache.exist?('views/www.mydomain.com/' + product.slug)
        puts "--->" + url + " is NOW in the cache"
      else
        puts "--->" + url + " is still not in the cache!"
      end
      sleep 1
    end
  rescue Timeout::Error
    # On Ruby 1.9+, Timeout::Error descends from StandardError, so this
    # clause must come before the bare rescue or it would never be reached.
    puts 'Timeout rescue of ' + url
    puts 'Sleep for 5 sec'
    sleep 5
    retry
  rescue
    puts 'Normal rescue of ' + url
  end
end
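To cover the question's second option (running this on the server right after the sweeper), one way is to wrap the script in a rake task and trigger it from your weekly schedule. A minimal sketch, assuming the script above is saved as script/warm_cache.rb (both names are mine, not anything Rails provides):

# lib/tasks/cache_warmup.rake (hypothetical file name)
desc "Warm the product page cache after sweeping"
task warm_product_cache: :environment do
  load Rails.root.join('script', 'warm_cache.rb').to_s
end

It can then be run with rake warm_product_cache from cron or a Heroku Scheduler job, right after the sweeper.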

Create a script that runs as a rake task, or better yet a worker, and curls each page. There is no need to pull in a gem when you can just call curl:
`curl -A "CacheRefresher" #{ENV['HOSTNAME']}/api/v1/#{klass.name.underscore.pluralize}/#{id} >/dev/null 2>&1`
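Looped over every product, a warm-up task built around that one-liner could look something like this; it is a sketch, and the HOSTNAME variable and the slug-based route are assumptions carried over from the snippets above:

desc "Re-warm the cache by curling every product page"
task warm_cache: :environment do
  Product.find_each do |product|
    # The -A flag sets a user agent so the warm-up traffic is easy to
    # filter out of logs and visitor stats.
    `curl -A "CacheRefresher" #{ENV['HOSTNAME']}/#{product.slug} >/dev/null 2>&1`
  end
end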

Related

How to "reload" a cloudflare 520 request with ruby?

I wrote a Ruby script to download an image from a URL:
require 'open-uri'

imageAddress = ARGV[0]
targetPath = ARGV[1]
fullFileNamePath = "#{targetPath}test.jpg"

begin
  File.open(fullFileNamePath, 'wb') do |fo|
    fo.write open(imageAddress).read
  end
rescue OpenURI::HTTPError => ex
  puts ex
  File.delete(fullFileNamePath)
end
Example Usage:
ruby download_image.rb "https://images.genius.com/b015b15e476c92d10a834d523575d3c9.1000x1000x1.jpg" "/Users/Me/Downloads/"
The problem is, sometimes I run across this output error:
520 Origin Error
Then, when I try the same URL in my browser, I get a Cloudflare error page.
If I reload that page or click its 'Retry for a live version' button, the page loads.
Then if I run the script again, it downloads the image just fine.
So how can I replicate this reload / 'Retry for a live version' behavior using Ruby, without switching to my browser? Running the script again doesn't do the job.
It sounds like you are looking for a delayed retry: if the script fails (or encounters '520 Origin Error'), wait and try again.
Here is a quickly built recursive function; you may want to add a check for how many times you have looped, breaking after a limit (see the sketch after the code). It is not tested and is meant as an example.
def getFile(params_you_need)
  begin
    File.open(fullFileNamePath, 'wb') do |fo|
      fo.write open(imageAddress).read
    end
  rescue OpenURI::HTTPError => ex
    puts ex
    File.delete(fullFileNamePath)
    # Compare the exception's message, not the exception object itself;
    # `ex == '520 Origin Error'` would never be true.
    if ex.message.start_with?('520')
      sleep(30) # generally a good time to pause
      getFile(params_you_need)
    end
  end
end
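As mentioned above, it is worth capping the recursion so a persistently failing URL doesn't loop forever. A minimal sketch of the same function with an attempt counter (fullFileNamePath and imageAddress are carried over from the script in the question):

def getFile(params_you_need, attempts_left = 5)
  File.open(fullFileNamePath, 'wb') do |fo|
    fo.write open(imageAddress).read
  end
rescue OpenURI::HTTPError => ex
  puts ex
  File.delete(fullFileNamePath)
  # Give up once the allotted attempts are used up instead of recursing forever.
  if ex.message.start_with?('520') && attempts_left > 0
    sleep(30)
    getFile(params_you_need, attempts_left - 1)
  end
end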

Calling a ruby script using a shell command fails at 'load_rakefile'

I have run into a problem with using a shell command that calls a Ruby script, which then invokes Rake.
I have built a test automation framework that does the following when run from the command line (I'm on OS X Yosemite):
Calls a Ruby script which sets a bunch of environment variables
It then invokes Rake:
rake = Rake.application
rake.init
rake.load_rakefile
rake['execute_tests'].invoke
The Rakefile runs a Cucumber task, and the test framework then happily launches a browser and starts executing tests:
Cucumber::Rake::Task.new(:execute_tests) do |task|
  # => need to populate these so that the cucumber.yml parses
  ENV['TEST_WEB_PARALLEL_OS'] = "null"
  ENV['TEST_WEB_PARALLEL_OS_VERSION'] = "null"
  ENV['TEST_WEB_PARALLEL_BROWSER'] = "null"
  ENV['TEST_WEB_PARALLEL_BROWSER_VERSION'] = "null"

  #------------------------------------------------
  # Specify rake profile
  #------------------------------------------------
  runProfile = ENV['TEST_PLATFORM'].downcase + "_" + ENV['TEST_INTERFACE'].downcase + "_" +
               ENV['TEST_ENVIRONMENT'].downcase + "_" + ENV['TEST_TYPE'].downcase

  if ENV['TEST_HEADLESS'] == "TRUE"
    # => running headless
    # => need to truncate poltergeist - using the #poltergeist tag in cucumber
    #    was causing issues with other drivers
    ENV['TEST_BROWSER'] = "POLTER"
    # reportProfile shares runProfile's prefix, so reuse it
    reportProfile = runProfile + "_" + osHelper.getOperatingSystem.to_s + "_" +
                    ENV['TEST_BROWSER'].downcase + "_hlst"
  else
    # => running headed
    reportProfile = runProfile + "_" + osHelper.getOperatingSystem.to_s + "_" +
                    ENV['TEST_BROWSER'].downcase + "_lst"
  end

  #------------------------------------------------
  # Set the env var then run profile
  #------------------------------------------------
  ENV['REPORT_PROFILE'] = reportProfile
  ENV['RUN_PROFILE'] = runProfile
  task.profile = runProfile
end
This all works perfectly well when I execute from the command line. The problem is that I want to put a basic GUI on the front of the test framework. I am using Shoes 3 to do this.
When I call my initial Ruby script from the GUI...
`ruby ./exe/execute_web_tests_local_singlethread.rb salesforce integration regression headed chrome false false`
...then the code executes until the point where the following line tries to execute:
rake.load_rakefile
At this point the code falls over. I don't see any output or stack trace from the sub-process, so I am unable to debug beyond knowing that the process dies at the line above.
[Screenshot of the GUI failing]
Unfortunately that's as specific as I can be. If anybody could provide any pointers for how I might go about investigating and/or resolving this issue it would be much appreciated.
I have tried using Open3 as well; it results in the same issue.
Cheers
When you shell out to Ruby with ruby ./exe/execute_web_tests_local_singlethread.rb, the child process will run in the current working directory of the parent process (i.e. the one your GUI application is running from).
When rake.load_rakefile is called, it will be looking for the Rakefile relative to the current working directory, not the directory relative to the script you're calling out to.
There are a couple of ways you can fix this. One is to set the RAKEOPT environment variable in the parent process (your GUI) before you execute the command. It will be inherited by the child process:
ENV['RAKEOPT'] = "--rakefile ./exe/Rakefile"
Alternatively, you can change the working directory in the parent process:
Dir.chdir("./exe") do
`ruby execute_web_tests_local_singlethread.rb salesforce integration regression headed chrome false false`
end
This might not be advisable. If your application is threaded and relying on the current directory (and Shoes may be), you might have some unexpected consequences modifying the current directory.
One last thing: you may not be able to start up a child process that runs a browser in this way. Both full browsers and headless browsers, to the best of my knowledge, need information about the graphical environment you're running in. This is fine when you're running a process attached to a GUI terminal session, but you might run into other issues trying to spin up another graphical process from inside Shoes.
Hope that helps!

Use sidekiq with a running dynamic counter in Rails

I built a website crawler that (later on) uses the collected links to read out information.
The current rake task goes through all the possible pages one by one and checks whether the request goes through (valid response) or returns a 404/503 error (invalid page). If it's valid, the page's URL gets saved into my database.
As you can see below, the task requests 50,000 pages in total and thus takes some time.
I have read about Sidekiq and how it can perform these tasks asynchronously, making this a lot faster.
My question: as you can see, my task increments a counter after each loop. I guess this won't work with Sidekiq, as it would just run the script several times independently, am I right?
How would I go around the problem of each instance needing its own counter then?
Hopefully my question makes sense - Thank you very much!
desc "Validate Pages"
task validate_url: :environment do
require 'rubygems'
require 'open-uri'
require 'nokogiri'
counter = 1
base_url = "http://example.net/file"
until counter > 50000 do
begin
url = "#{base_url}_#{counter}/"
open(url)
page = Page.new
page.url = url
page.save!
puts "Saved #{url} !"
counter += 1
rescue OpenURI::HTTPError => ex
logger ||= Logger.new("validations.log")
if ex.io.status[0] == "503"
logger.info "#{ex} # #{counter}"
end
puts "#{ex} # #{counter}"
counter += 1
rescue SocketError => ex
logger ||= Logger.new("validations.log")
logger.info "#{ex} # #{counter}"
puts "#{ex} # #{counter}"
counter += 1
end
end
end
A simple Redis INCR operation will create and/or increment a global counter for your jobs to use. You can use Sidekiq's redis connection to implement a counter trivially:
Sidekiq.redis do |conn|
  conn.incr("my-counter")
end
If you want to run this asynchronously, you will have many instances of the same job. The fastest approach is to use something like Redis, which gives you a simple and fast way to check and update the counter. But also make sure you take care of the counter: if one of your jobs is using it, lock it against the other jobs so you don't get wrong results.
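Putting the two answers together, the per-page job could look something like the sketch below; the worker class and the enqueueing loop are my own illustration, not code from either answer:

require 'open-uri'

class ValidatePageWorker
  include Sidekiq::Worker

  def perform
    # Atomically claim the next page number from the shared Redis counter,
    # so no two jobs ever validate the same page.
    counter = Sidekiq.redis { |conn| conn.incr("page-counter") }
    return if counter > 50000
    url = "http://example.net/file_#{counter}/"
    open(url) # raises OpenURI::HTTPError on 404/503
    Page.create!(url: url)
  rescue OpenURI::HTTPError
    # Invalid page; nothing to save.
  end
end

# Enqueue one job per candidate page:
50000.times { ValidatePageWorker.perform_async }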

Simple rails app on Puma throws segfault, cannot handle concurrency

I have a fairly simple Rails app. It listens for requests in the form
example.com/items?access_key=my_secret_key
My application controller looks at the secret key to determine which user is making the call, looks up their database credentials, and connects to the appropriate database to get that person's items.
However, we need this to support multiple requests at a time, and Puma seems like everyone's favorite / the fastest server for us to use. We started running into problems when benchmarking it with ApacheBench. FYI, Puma is configured with 3 workers and min=1, max=16 threads.
If I were to run
ab -n 100 -c 10 127.0.0.1:3000/items?access_key=my_key
then this error is thrown with a whole lot of stack trace after it:
/home/user/.gem/ruby/2.0.0/gems/mysql2-0.3.16/lib/mysql2/client.rb:70: [BUG] Segmentation fault
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
Edit: This line also appears in the enormous amount of info that the error contains:
*** glibc detected *** puma: cluster worker 1: 17088: corrupted double-linked list: 0x00007fb671ddbd60
And it looks to me like that's tripping multiple times. I have been unable to determine exactly when (on which requests) it trips.
The benchmarking seems to still finish, but it seems quite slow (from ab):
Concurrency Level: 10
Time taken for tests: 21.085 seconds
Complete requests: 100
Total transferred: 3620724 bytes
21 seconds for 3 megabytes? Even if MySQL were being slow, that's... bad. But I think it's worse than that: the amount of data isn't high enough. There are no segfaults when I run at concurrency 1, and the amount of data for -n 10 -c 1 is 17 megabytes. So Puma is responding with some error page that I can't see; running 'curl address' gives me the expected data, and I can't manually reproduce the concurrency.
It gets worse when I run more requests or higher concurrency.
ab -n 1000 -c 10 127.0.0.1:3000/items?access_key=my_key
yields
apr_socket_recv: Connection reset by peer (104)
Total of 199 requests completed
and
ab -n 100 -c 50 127.0.0.1:3000/items?access_key=my_key
yields
apr_socket_recv: Connection reset by peer (104)
Total of 6 requests completed
Running top in another PuTTY window shows me that very often (most times I try to benchmark) only one of the three workers Puma created is performing any work. Rarely, all three do.
Because it seems like the error might be somewhere in here, I'll show you my application_controller. It's short, but it is the bulk of the application (which, like I said, is fairly simple).
class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception

  def get_yaml_params
    YAML.load(File.read("#{APP_ROOT}/config/ecommerce_config.yml"))
  end

  def access_key_login
    access_key = params[:access_key]
    unless access_key
      show_error("missing access_key parameter")
      return false
    end
    access_info = get_yaml_params
    unless client_login = access_info[access_key]
      show_error("invalid access_key")
      return false
    end
    status = ActiveRecord::Base.establish_connection(
      :adapter  => "mysql2",
      :host     => client_login["host"],
      :username => client_login["username"],
      :password => client_login["password"],
      :database => client_login["database"]
    )
  end

  def generate_json(columns, footer)
    # config/application.rb includes the line
    # require 'json'
    query = "select"
    columns.each do |column, name|
      query += " #{column}"
      query += " AS #{name}" unless column == name
      query += ","
    end
    query = query[0..-2] # trim ','
    query += " #{footer}"
    dbh = ActiveRecord::Base.connection
    results = dbh.select_all(query).to_hash
    data = results.map do |result|
      columns.map { |column, name| result[name] }
    end
    ({"fields" => columns.values, "values" => data}).to_json
  end

  def show_error(msg)
    render(:text => "Error: #{msg}\n")
    nil
  end
end
And an example of a controller that uses it
class CategoriesController < ApplicationController
  def index
    access_key_login or return
    columns = {
      "prd_type" => "prd_type",
      "prd_type_description" => "description"
    }
    footer = "from cin_desc;"
    json = generate_json(columns, footer)
    render(:json => json)
  end
end
That's pretty much it as far as custom code goes. I can't find anything making this not threadsafe, so I don't know what the cause of the segfaults is. I don't know why not all of the workers spin up when requests are made. I don't know what error is getting returned to ApacheBench. Thanks for helping, I can post more information as you need it.
It appears that the stable version of the mysql2 library, 0.3.17, is NOT threadsafe. Until it is updated to be threadsafe, using it with multithreaded Puma will be impossible. An alternative would be to use Unicorn.
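If you want to stay on Puma in the meantime, one stopgap (my suggestion, not part of the answer above) is to keep the worker processes but cap each at a single thread, so the mysql2 client is never shared between threads:

# config/puma.rb - a minimal processes-only sketch
workers 3
threads 1, 1 # one thread per worker avoids concurrent mysql2 access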

Timeout::Error in Rails application using Watir

I am using Watir to browse pages and take screenshots of some pages in my application.
However, getting a page from my server takes a while, and I get Timeout::Error.
To fix this, I added a wait to my Watir browser instance that checks for a div with id 'content' and waits until it exists. The page does eventually load in the Watir browser, but after it loads I still get the Timeout::Error in my main browser window.
Here's my code:
@pages = Pages.all
browser = Watir::Browser.new
@pages.each do |page|
  page_url = app_root_url + 'pages/' + page.id.to_s
  browser.goto page_url
  Watir::Waiter::wait_until { browser.div(:id, 'content').exists? }
  file_save_path = pages_screenshot_path.to_s + page.id.to_s + '.png'
  browser.driver.save_screenshot(file_save_path)
end
browser.close
Each page contains a div with id 'content'. Still, it's not waiting, I guess.
The default wait time for Watir::Waiter.wait_until is 60 seconds (it checks every half second, for up to 60 seconds). You can specify a higher value like so:
Watir::Waiter.wait_until(120) { code code code }
You can find more specifics here: http://wiki.openqa.org/display/WTR/How+to+wait+with+Watir
For watir-webdriver, you can use the Watir::Wait methods:
Watir::Wait.until(120) { code code code }
I moved this process to run in the background using the delayed_job gem, and it works fine!
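For reference, a minimal sketch of what moving the loop into delayed_job can look like; the job class is my own illustration of the approach (app_root_url and pages_screenshot_path are carried over from the question's script):

class ScreenshotJob
  # delayed_job calls #perform when the job runs in the background, so the
  # slow Watir session no longer blocks the web request.
  def perform
    browser = Watir::Browser.new
    Pages.all.each do |page|
      browser.goto app_root_url + 'pages/' + page.id.to_s
      Watir::Wait.until { browser.div(:id, 'content').exists? }
      browser.driver.save_screenshot(pages_screenshot_path.to_s + page.id.to_s + '.png')
    end
    browser.close
  end
end

Delayed::Job.enqueue(ScreenshotJob.new)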
