A/B Test with Rails resets data after version change - ruby-on-rails

I'm using the split gem on Rails and my experiment keeps incrementing its version; after each increment all the data is lost. I don't know what triggers the change, so I don't know how to prevent it. For example:
Experiment: landing v19 Goal:view-wheels
ALTERNATIVE NAME  PARTICIPANTS  NON-FINISHED  COMPLETED  CONVERSION RATE
A                 70            40            30         42.86%
B                 70            27            43         61.43% +43.33%
After some event that I can't identify, the version changes to v20 and all the data is lost. I want to prevent this: I need to keep counting until I actively say to stop.
My redis configuration:
On redis.rb:
uri = URI.parse(ENV['REDISTOGO_URL'])
REDIS = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password, :username => uri.user)
On split.rb:
require "split/dashboard"
include Split::Helper
Split.redis = REDIS
Split.configure do |config|
  config.experiments = {
    "landing" => {
      :alternatives => ["A", "B"],
      :resettable => false,
      :goals => ["opt-in", "view-wheels"]
    }
  }
  config.db_failover = true # handle redis errors gracefully
  config.db_failover_on_db_error = proc{|error| Rails.logger.error(error.message) }
end
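The question does not say what triggers the version bump, and the sketch below does not answer that. It only illustrates, as an assumption about how versioned experiments tend to store data, why a version change can look like data loss: if counters live under a key that embeds the version, a new version starts with empty counters. `ToyExperiment` and its methods are hypothetical and are not Split's storage code; a plain Hash stands in for Redis.

```ruby
# Toy model only: NOT Split's real implementation.
class ToyExperiment
  attr_reader :name, :version

  def initialize(name, store)
    @name = name
    @store = store # plain Hash standing in for Redis
    @version = 0
  end

  # Counters are stored under a key that embeds the current version.
  def key
    "#{name}:#{version}"
  end

  def record_participant(alternative)
    counters = (@store[key] ||= Hash.new(0))
    counters[alternative] += 1
  end

  def participants(alternative)
    @store.fetch(key, {}).fetch(alternative, 0)
  end

  # Bumping the version changes the key, so reads start from zero again.
  def reset
    @version += 1
  end
end

store = {}
landing = ToyExperiment.new("landing", store)
70.times { landing.record_participant("A") }
puts landing.participants("A") # counted under "landing:0"
landing.reset
puts landing.participants("A") # now read from "landing:1"
```

If this model matches what is happening, the thing to look for is whatever calls the equivalent of `reset` in the real setup, for example a difference between the configured alternatives or goals and what is already stored.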

Related

Rails cached value lost/nil despite expires_in 24.hours

I am using ruby 2.3.3 and Rails 4.2.8 with Puma (1 worker, 5 threads) and on my admin (i.e. not critical) page I want to show some stats (integer values) from my database. Some requests take quite a long time to perform so I decided to cache these values and use a rake task to re-write them every day.
Admin#index controller
require 'timeout'
begin
  Timeout.timeout(8) do
    @products_a = Rails.cache.fetch('products_a', :expires_in => 24.hours) { Product.where(heavy_condition_a).size }
    @products_b = Rails.cache.fetch('products_b', :expires_in => 24.hours) { Product.where(heavy_condition_b).size }
    @products_c = Rails.cache.fetch('products_c', :expires_in => 24.hours) { Product.where(heavy_condition_c).size }
    @products_d = Rails.cache.fetch('products_d', :expires_in => 24.hours) { Product.where(heavy_condition_d).size }
  end
rescue Timeout::Error
  # Fall back to a sentinel value if the queries take longer than 8 seconds
  @products_a = 999
  @products_b = 999
  @products_c = 999
  @products_d = 999
end
Admin#index view
<li>Products A: <%= @products_a %></li>
<li>Products B: <%= @products_b %></li>
<li>Products C: <%= @products_c %></li>
<li>Products D: <%= @products_d %></li>
Rake task
task :set_admin_index_stats => :environment do
  Rails.cache.write('products_a', Product.where(heavy_condition_a).size, :expires_in => 24.hours)
  Rails.cache.write('products_b', Product.where(heavy_condition_b).size, :expires_in => 24.hours)
  Rails.cache.write('products_c', Product.where(heavy_condition_c).size, :expires_in => 24.hours)
  Rails.cache.write('products_d', Product.where(heavy_condition_d).size, :expires_in => 24.hours)
end
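To make the intended behavior concrete, here is a minimal in-memory sketch of the write/fetch contract the controller and rake task rely on: once a key has been written, `fetch` should return the stored value and skip its block until the entry expires. `TinyCache` is a hypothetical stand-in written for illustration; it is not `Rails.cache` or Dalli, and it says nothing about why Memcachier drops values.

```ruby
# Hypothetical in-memory cache, for illustration only.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  # clock is injectable so expiry can be exercised without sleeping
  def initialize(clock: -> { Time.now })
    @data = {}
    @clock = clock
  end

  def write(key, value, expires_in:)
    @data[key] = Entry.new(value, @clock.call + expires_in)
    value
  end

  def read(key)
    entry = @data[key]
    return nil if entry.nil? || @clock.call >= entry.expires_at
    entry.value
  end

  # Returns the cached value on a hit; runs the block and stores its
  # result only on a miss.
  def fetch(key, expires_in:)
    cached = read(key)
    return cached unless cached.nil?
    write(key, yield, expires_in: expires_in)
  end
end

one_day = 24 * 60 * 60
cache = TinyCache.new
heavy_queries = 0

cache.write("products_a", 42, expires_in: one_day) # what the rake task does
value = cache.fetch("products_a", expires_in: one_day) do
  heavy_queries += 1 # would be the slow Product query
  7
end
puts "value=#{value} heavy_queries=#{heavy_queries}"
```

Counting block executions the same way against the real cache store is one crude way to tell whether a value was actually missing when the page loaded.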
I am using this in production and use Memcachier (on Heroku) as a cache store. I also use it for page caching on the website and it works fine there. I have:
production.rb
config.cache_store = :dalli_store
The problem I am experiencing is that the cached values disappear almost instantly, and quite intermittently, from the cache. In the console I have tried:
If I write one value with Rails.cache.write (e.g. products_a) and check it a minute later, it is still there. Although crude, I can see the "Set cmds" counter increment by one in the Memcachier admin tool.
However, when I add the next value (e.g. products_b), the first one disappears (becomes nil)! Sometimes if I add all 4 values, 2 seem to stick. These are not always the same values. It is like whack-a-mole!
If I run the rake task to write the values and then try to read them, typically only two values are left and the others are lost.
I have seen a similar question related to this where the reason explained was the use of a multithread server. The cached value was saved in one thread and could not be reached in another, the solution was to use a memcache, which I do.
It is not only the console. If I just reload admin#index view to store the values or run the rake task, I experience the same problem. The values do not stick.
My suspicion is that I am either not using the Rails.cache-commands properly or that these commands do not in fact use Memcachier. I have not been able to determine whether my values are in fact stored in Memcachier but when I use my first command in the console I do get the following:
Rails.cache.read('products_a')
Dalli::Server#connect mc1.dev.eu.ec2.memcachier.com:11211
Dalli/SASL authenticating as abc123
Dalli/SASL authenticating as abc123
Dalli/SASL: abc123
Dalli/SASL: abc123
=> 0
but I do not get that for subsequent writes (which I assume is a matter of readability in the console and not proof that Memcachier is not being used).
What am I missing here? Why won't the values stick in my cache?
The Heroku Dev Center shows a slightly different cache config and gives some advice about using the connection_pool gem with threaded Rails app servers like Puma:
config.cache_store = :mem_cache_store,
  (ENV["MEMCACHIER_SERVERS"] || "").split(","),
  { :username => ENV["MEMCACHIER_USERNAME"],
    :password => ENV["MEMCACHIER_PASSWORD"],
    :failover => true,
    :socket_timeout => 1.5,
    :socket_failure_delay => 0.2,
    :down_retry_delay => 60,
    :pool_size => 5
  }
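The `:pool_size => 5` option matches the 5 Puma threads in the question. As a rough sketch of what a connection pool does (not how Dalli or the connection_pool gem implement it), the hypothetical `TinyPool` below hands out a fixed number of connections through Ruby's thread-safe `Queue`, so no two threads ever use the same connection at once.

```ruby
# Hypothetical fixed-size pool, for illustration only.
class TinyPool
  # The block builds one connection; it is called `size` times.
  def initialize(size)
    @slots = Queue.new
    size.times { @slots << yield }
  end

  # Checks a connection out, yields it, and always returns it to the pool.
  def with
    conn = @slots.pop # blocks while every connection is in use
    yield conn
  ensure
    @slots << conn if conn
  end
end

pool = TinyPool.new(2) { Object.new } # Object.new stands in for a client
used = Queue.new

threads = 5.times.map do
  Thread.new { pool.with { |conn| used << conn.object_id } }
end
threads.each(&:join)

ids = []
ids << used.pop until used.empty?
puts "#{ids.size} checkouts served by #{ids.uniq.size} connection(s)"
```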

Persisting thread in delayed_job

So I have a Rails app where I want my delayed_job process to communicate with an SMPP server, but the problem occurs when I try to send the messages. I create the thread in an initializer (delayed_job.rb):
if $0.ends_with?('/delayed_job')
  require_relative '../../lib/gateway'
  config = {
    :host => 'SERVER.COM',
    :port => 2345,
    :system_id => 'USERNAME',
    :password => 'PASSWORD',
    :system_type => '', # default given according to SMPP 3.4 Spec
    :interface_version => 52,
    :source_ton => 0,
    :source_npi => 1,
    :destination_ton => 1,
    :destination_npi => 1,
    :source_address_range => '',
    :destination_address_range => '',
    :enquire_link_delay_secs => 60
  }
  Thread.new {
    gw = Gateway.new
    gw.start(config)
  }
end
But checking my log file for the smpp server, it seems that the thread dies right after it starts. So I guess my question is how to persist the thread while the delayed_job daemon is running?
If I start my Rails app in production and try to send messages individually, it works without a problem, but because delayed_job is a separate process, I can't communicate with the SMPP thread in the Rails app from my workers in the delayed_job queues.
Any ideas?
Sorted: I decided to separate everything into its own daemon, each communicating with the database independently, as opposed to trying to work with pipes and signals.
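One possible explanation for the thread dying, offered as an assumption and not confirmed in the thread: if the delayed_job daemon forks after the initializers have run, a thread started in an initializer does not exist in the forked worker, because only the thread that calls `fork` is carried over into the child process. The sketch below demonstrates that Ruby behavior in isolation on a platform that supports `fork`; the method name is hypothetical and the sleeping thread merely stands in for the gateway thread.

```ruby
# Returns [alive_in_parent, alive_in_child] for a thread started before fork.
# Requires a platform with fork (not available on Windows).
def thread_survival_across_fork
  worker = Thread.new { sleep } # stands in for the long-running gateway thread
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.puts(worker.alive?) # the child only has the forking thread
    writer.close
    exit!(0)                   # leave the child without running at_exit hooks
  end

  writer.close
  alive_in_child = reader.read.strip == "true"
  Process.wait(pid)
  [worker.alive?, alive_in_child]
ensure
  worker&.kill
end

in_parent, in_child = thread_survival_across_fork
puts "alive in parent: #{in_parent}, alive in child: #{in_child}"
```

If that is what happens here, starting the thread inside the forked process, or running the gateway as its own daemon as described above, avoids the problem.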

Sidekiq worker not able to write to database or log file

I'm building an app on Heroku and Redis that sends an SMS message for every row in an input CSV file, each of which contains a mobile phone number. The message is sent using Twilio in the Sidekiq worker shown below. The problem is that even though the SMS is sent for all the rows in the CSV, the database write (TextMessage.create) and the log write (puts statement) only execute for one row. There is one Sidekiq worker spawned for each row in the CSV file. It seems like only one Sidekiq worker has I/O (DB, file) access and locks the other Sidekiq workers out. Any help would be appreciated.
sidekiq worker:
require 'sidekiq'
require 'twilio-rb'
class TextMessage < ActiveRecord::Base
  include Sidekiq::Extensions
  def self.send_message(number, body, row_index, column_index, table_id)
    puts "TextMessage#send_message: ROW INDEX: #{row_index} COLUMN INDEX: #{column_index} TABLEID: #{table_id} BODY: #{body} PHONE: #{number}"
    Twilio::Config.setup :account_sid => 'obfuscated', :auth_token => '<obfuscated>'
    sms = Twilio::SMS.create :to => number, :from => '+17085555555', :body => body + ' | Sent: ' + Time.now.in_time_zone('Central Time (US & Canada)').strftime("%m/%d/%Y %I:%M%p Central")
    TextMessage.create :to => number, :from => '+17085555555'
    ImportCell.add_new_column(table_id, row_index, column_index, "Time Sent", Time.now.in_time_zone('Central Time (US & Canada)').strftime("%m/%d/%Y %I:%M%p Central"))
  end
end
call to sidekiq worker:
TextMessage.delay_until(time_to_send, :retry => 3).send_message(phone, 'Scheduled: ' + time_to_send.in_time_zone('Central Time (US & Canada)').strftime("%m/%d/%Y %I:%M%p Central"), row_index, column_index, table.id)
column_index += 1
Heroku Procfile
worker: bundle exec sidekiq -C config/sidekiq.yml
sidekiq.yml
:verbose: false
:concurrency: 3
:queues:
- [default, 5]
config/initializers/redis.rb:
uri = URI.parse(ENV["REDISTOGO_URL"])
REDIS = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password)
Sidekiq.configure_server do |config|
  database_url = ENV['DATABASE_URL']
  if(database_url)
    ENV['DATABASE_URL'] = "#{database_url}?pool=25"
    ActiveRecord::Base.establish_connection
  end
end
I am one of the people who commented on your question, and I just fixed it!
You are using .create, which Sidekiq seemed not to like, so I tried using .new and then .save, which made it work! I think it has to do with .create not being thread safe or something of the sort, but I honestly have no idea.
Non Working code:
class HardWorker
  include Sidekiq::Worker
  def perform(name, count)
    puts 'Doing some hard work!'
    UserInfo.create(
      :user => "someone",
      :misc1 => 0,
      :misc2 => 0,
      :misc3 => 0,
      :comment => "Made from HardWorker",
      :time_changed => Time.now
    )
    puts 'Done with hard work!'
  end
end
Working code:
class HardWorker
  include Sidekiq::Worker
  def perform(name, count)
    puts 'Doing some hard work!'
    a_row = UserInfo.new(
      :user => "someone",
      :misc1 => 0,
      :misc2 => 0,
      :misc3 => 0,
      :comment => "Made from HardWorker",
      :time_changed => Time.now
    )
    a_row.save
    puts 'Done with hard work!'
  end
end

How to "crawl" only the root URL with Anemone?

In the example below I would like Anemone to execute only on the root URL (example.com). I am unsure whether I should use the on_pages_like method and, if so, what pattern I would need.
require 'anemone'
Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_pages_like(???) do |page|
    # some code to execute
  end
end
require 'anemone'
Anemone.crawl("http://www.example.com/", :depth_limit => 1) do |anemone|
  # some code to execute
end
You can also specify the following in the options hash, below are the defaults:
# run 4 Tentacle threads to fetch pages
:threads => 4,
# disable verbose output
:verbose => false,
# don't throw away the page response body after scanning it for links
:discard_page_bodies => false,
# identify self as Anemone/VERSION
:user_agent => "Anemone/#{Anemone::VERSION}",
# no delay between requests
:delay => 0,
# don't obey the robots exclusion protocol
:obey_robots_txt => false,
# by default, don't limit the depth of the crawl
:depth_limit => false,
# number of times HTTP redirects will be followed
:redirect_limit => 5,
# storage engine defaults to Hash in +process_options+ if none specified
:storage => nil,
# Hash of cookie name => value to send with HTTP requests
:cookies => nil,
# accept cookies from the server and send them back?
:accept_cookies => false,
# skip any link with a query string? e.g. http://foo.com/?u=user
:skip_query_strings => false,
# proxy server hostname
:proxy_host => nil,
# proxy server port number
:proxy_port => false,
# HTTP read timeout in seconds
:read_timeout => nil
My personal experience is that Anemone was not very fast and had a lot of corner cases. The docs are lacking (as you have experienced) and the author doesn't seem to be maintaining the project. YMMV. I tried Nutch briefly and didn't play around with it as much, but it seemed faster. No benchmarks, sorry.
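For the pattern the question asks about, here is a small sketch of a regular expression that matches only the site root. It assumes the pattern is compared against the full page URL as a string, which is how on_pages_like is understood to work here but is not verified against Anemone's source. `root_only_pattern` is a hypothetical helper, and it ignores ports and query strings.

```ruby
require 'uri'

# Hypothetical helper: builds a pattern matching only scheme://host
# with an optional trailing slash, and nothing after it.
def root_only_pattern(start_url)
  uri = URI.parse(start_url)
  base = Regexp.escape("#{uri.scheme}://#{uri.host}")
  /\A#{base}\/?\z/
end

pattern = root_only_pattern("http://www.example.com/")

[
  "http://www.example.com/",
  "http://www.example.com",
  "http://www.example.com/about",
  "http://wwwXexample.com/"
].each do |url|
  puts "#{url} => #{pattern.match?(url)}"
end
```

Escaping the host matters: without `Regexp.escape`, the dots would match any character.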

How often do initializers run in Rails?

Do the initializers in a Rails app run each time someone visits the site?
For example, if my server is started in Texas at 10 a.m., and someone visits my site from New York at 1 p.m. and someone else visits from Los Angeles at 10 p.m., do the initializers in a Rails application run when the people from New York and Los Angeles visit, or do the initializers only run once, when I start the server in Texas?
The reason I'm asking is because I was using a case expression in an initializer file to change email settings depending on the time of day that app is visited. This would only make sense of course if the initializers ran when someone visited the site. If they ran only when the server was started then it would only be one case...
If that's not the right place to do it, or if the initializers only run once the server is started in Texas (for example) then where would you put this code?
case
when Time.now.hour == 0
  ActionMailer::Base.smtp_settings = {
    :user_name => "blahblah@example.com",
    :password => "blahblah",
    :address => "smtp.example.com",
    :port => 25,
    :tls => true
  }
when Time.now.hour == 1
  ActionMailer::Base.smtp_settings = {
    :user_name => "blah@example.com",
    :password => "eatshit",
    :address => "smtp.example.com",
    :port => 25,
    :tls => true
  }
end
A very simple and straight answer is: Just once, when your server kicks up.
You may be interested in this article: The Rails Initialization Process
Initializers get loaded whenever you start up passenger / mongrel or whatever you are using.
To set these settings at runtime take a look at Rails: Runtime configuration of ActionMailer?
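Since initializers run once at boot, a time-dependent choice has to be made when the mail is sent. A minimal sketch of that idea is to move the case expression into a method that takes the current time and call it at send time. `smtp_settings_for`, the constants, and the placeholder passwords are hypothetical; the fallback for other hours is an assumption, since the question only covers hours 0 and 1; and where to hook this into ActionMailer is left to the linked question.

```ruby
# Settings shared by every account (values taken from the question).
DEFAULT_SMTP = { address: "smtp.example.com", port: 25, tls: true }.freeze

# Hypothetical lookup table: hour of day => account credentials.
ACCOUNTS_BY_HOUR = {
  0 => { user_name: "blahblah@example.com", password: "placeholder-0" },
  1 => { user_name: "blah@example.com",     password: "placeholder-1" }
}.freeze

# Picks settings for the given moment; evaluated on every call,
# unlike code in an initializer, which is evaluated once at boot.
def smtp_settings_for(time = Time.now)
  account = ACCOUNTS_BY_HOUR[time.hour]
  account ? DEFAULT_SMTP.merge(account) : DEFAULT_SMTP
end

settings = smtp_settings_for(Time.new(2020, 1, 1, 1, 30))
puts settings[:user_name]
```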
