Ruby Object mutations within sidekiq workers - ruby-on-rails

I have a wrapper that sends user updates to an external service on a regular basis, running inside a Sidekiq worker. Each User has its own Sidekiq job. Sidekiq is set up to use 20 threads. It's a Rails 5.0.1 app on MRI Ruby 2.3.0. The web server is Passenger 5 Community.
If I oversimplify, the code looks like this:
class ProviderUserUpdateJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    user = User.find(user_id)
    Provider::User.new(user).push_update
  end
end

class Provider::User
  def initialize(user)
    @user = user
  end

  def push_update
    SomeApiWrapper.call(
      user_id: @user.id,
      status: @user.status
    )
  end
  ....
end
Now, the BIG problem, which I only have in production and finally caught by looking at the logs, can be summarized like this:
class Provider::User
  def initialize(user)
    @user = user
  end

  def push_update
    SomeApiWrapper.call(
      user_id: @user.id, # Some user
      status: @user.status # NOT THE SAME USER !!! (and I have no idea where he is coming from)
    )
  end
  ....
end
2 Questions:
How is this even possible? Does it come from Provider::User being, in essence, a globally accessible object, so that from thread to thread everything gets mixed up in a mutating soup?!
If I only use a "functional" style, without any instances, passing parameters and outputs from static methods to static methods, would that solve my problem, or am I completely wrong? How can I fix this?
Ultimately, is there any way to really battle-test this kind of code so I can be sure not to mix up users' data?

OK... it turns out it was a dummy data issue. I just lost 2 hours trying to figure it out with complicated explanations, but the answer was simply in my DB. Well done :-/
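On the "battle test" part of the question, a minimal sketch of a stress test is shown below. It assumes hypothetical stand-ins (FakeUser for the ActiveRecord model, ProviderUser for Provider::User, and a payload method in place of the real SomeApiWrapper call) and runs 20 threads, as in the Sidekiq setup described. Because each job builds its own wrapper instance, the id and status in every payload should come from the same user.

```ruby
# Hypothetical stand-in for the ActiveRecord User model.
FakeUser = Struct.new(:id, :status)

# Hypothetical stand-in for Provider::User; it returns the payload instead of
# calling an external API.
class ProviderUser
  def initialize(user)
    @user = user # instance state, owned by this object only
  end

  def payload
    id = @user.id
    Thread.pass # invite a context switch between the two reads
    { user_id: id, status: @user.status }
  end
end

users = (1..200).map { |i| FakeUser.new(i, "status-#{i}") }
work = Queue.new
users.each { |u| work << u }

results = Queue.new
threads = 20.times.map do
  Thread.new do
    loop do
      user = begin
        work.pop(true) # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break
      end
      results << ProviderUser.new(user).payload
    end
  end
end
threads.each(&:join)

payloads = Array.new(results.size) { results.pop }
mismatches = payloads.reject { |p| p[:status] == "status-#{p[:user_id]}" }
puts "payloads: #{payloads.size}, mismatches: #{mismatches.size}"
```

A test like this only exercises the wrapper object itself. It says nothing about shared state elsewhere (class-level variables, memoized globals) or about the data in the database, which is where the problem turned out to be here.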

Related

How to require manual 'Admin' approval of an OAuth application when using Doorkeeper?

Let's say we have Users (who can create Doorkeeper::Applications). On the other hand our app has Admins who ideally need to check each Application that is created (and maybe do a background check on the creating User and what not) as well as its scopes. They would #approve! or #reject! the Application and only once it is approved, can the Application make calls to the API.
NOTE: #approve!, #reject!, and approved do not come with Doorkeeper, from what I know. They are hypothetical so my question is clearer.
Is this a behavior that can be achieved with Doorkeeper (or an extension)? I don't think something like this is described in the config file. If not, do you have any general steps on how this could be done?
I'm thinking that something like this could work
class Api::V1::TransactionalBaseController < Api::V1::AuthableController
  before_action :doorkeeper_authorize!
  before_action :check_application_status!

  private

  def check_application_status!
    application = doorkeeper_token.application
    unless application.approved?
      raise Doorkeeper::Errors::ApplicationForbidden.new
    end
  end
end
If this is something that may help other users of the gem, I'm open to possibly opening a PR or developing an extension to achieve this.
rails g migration AddApprovedAtRejectedAtToOauthApplications approved_at:datetime rejected_at:datetime
Edit the file to reflect the correct table.
Keeping in mind that Ruby lets you modify a class from anywhere... In an initializer (or similar); from https://github.com/doorkeeper-gem/doorkeeper/issues/153:
doorkeeper_extend_dir = File.join(Rails.root, "app", "models", "doorkeeper")
Dir.glob("#{doorkeeper_extend_dir}/*.rb").each { |f| require(f) }

# app/models/doorkeeper/application.rb
module Doorkeeper
  class Application < ActiveRecord::Base
    def approved?
      approved_at?
    end

    def rejected?
      rejected_at?
    end

    def approve!
      update_column(:approved_at, DateTime.now)
    end

    def reject!
      update_column(:rejected_at, DateTime.now)
    end
  end
end

Cancelling a scheduled Sidekiq job in Rails

Some Sidekiq jobs in my app are scheduled to change the state of a resource to cancelled unless a user responds within a certain timeframe. There is a lot of information about how to best accomplish this task, but none of it actually cancels the job.
To cancel a job, the code in the wiki says:
class MyWorker
  include Sidekiq::Worker

  def perform(thing_id)
    return if cancelled?
    thing = Thing.find thing_id
    thing.renege!
  end

  def cancelled?
    Sidekiq.redis {|c| c.exists("cancelled-#{jid}") }
  end

  def self.cancel!(jid)
    Sidekiq.redis {|c| c.setex("cancelled-#{jid}", 86400, 1) }
  end
end
Yet here it's suggested that I do something like
def perform(thing_id)
  thing = Thing.find thing_id
  while !cancel?(thing)
    thing.ignore!
  end
end

def cancel?(thing_id)
  thing = Thing.find thing_id
  thing.matched? || thing.passed?
end
What's confusing about this and similar code on the wiki is that none of it actually cancels the job. The above example just performs an update on thing if cancelled? returns false (as it should), but doesn't cancel if and when it returns true in the future. It just fails with an aasm transition error message and gets sent to the RetrySet. Calling MyWorker.cancel! jid in model code throws an undefined variable error. How can I access that jid in the model? How can I actually cancel or delete that specific job? Thanks!
# The wiki code
class MyWorker
  include Sidekiq::Worker

  def perform(thing_id)
    return if cancelled?
    # do actual work
  end

  def cancelled?
    Sidekiq.redis {|c| c.exists("cancelled-#{jid}") }
  end

  def self.cancel!(jid)
    Sidekiq.redis {|c| c.setex("cancelled-#{jid}", 86400, 1) }
  end
end

# create job
jid = MyWorker.perform_async("foo")

# cancel job
MyWorker.cancel!(jid)
You can do this, but it won't be efficient: finding a scheduled job by JID is a linear scan.
require 'sidekiq/api'
Sidekiq::ScheduledSet.new.find_job(jid).try(:delete)
Alternatively your job can look to see if it's still relevant when it runs.
OK, so it turns out one of my questions was already answered: one of the code samples I included was functionally similar to the code from the wiki. The solution to the other question ("how can I access that jid in the model?") seems really obvious once you're no longer new to programming: store the jid in a database column, then retrieve or update it whenever it's needed. Duh!
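The two ideas from the answers (keep the jid with the record, and let the job check whether it is still relevant when it runs) can be sketched without Sidekiq or Redis. Everything here is a hypothetical in-memory stand-in: Thing plays the model with a job_id column, SCHEDULED plays the scheduled set, and RenegeWorker.schedule plays perform_in, which in real Sidekiq returns the jid.

```ruby
require "securerandom"

# Hypothetical stand-ins: Thing is the model with a job_id column,
# SCHEDULED is an in-memory substitute for Sidekiq's scheduled set.
Thing = Struct.new(:id, :state, :job_id)
SCHEDULED = {}

class RenegeWorker
  # Stand-in for perform_in: records the job and returns a jid string.
  def self.schedule(thing)
    jid = SecureRandom.hex(12)
    SCHEDULED[jid] = thing
    jid
  end

  # The job re-checks the record when it runs and does nothing if it is stale.
  def perform(thing, jid)
    return :skipped unless thing.job_id == jid && thing.state == :pending

    thing.state = :cancelled
    :reneged
  end
end

thing = Thing.new(1, :pending, nil)
thing.job_id = RenegeWorker.schedule(thing) # persist the jid with the record

# The user responds in time: change the state and forget the stored jid.
thing.state = :matched
stale_jid = thing.job_id
thing.job_id = nil

# The scheduled job still fires later, but finds nothing to do.
outcome = RenegeWorker.new.perform(thing, stale_jid)
puts outcome
```

With the jid stored on the record, model code could also pass it to MyWorker.cancel! or to the ScheduledSet lookup from the answer above; the in-job check simply means a job that slips through does no harm.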

Rails methods not initialized in time for worker

Earlier, I had posted this question – and thought it was resolved:
Rails background worker always fails first time, works second
However, after continuing with tests and development, the error is back again, but in a slightly different way.
I'm using Sidekiq (with Rails 3.2.8, Ruby 1.9.3) to run background processes, after_save. Below is the code for my model, worker, and controller.
Model:
class Post < ActiveRecord::Base
  attr_accessible :description,
                  :name,
                  :key

  after_save :process

  def process
    ProcessWorker.perform_async(id, key) if key.present?
    true
  end

  def secure_url
    key.match(/(.*\/)+(.*$)/)[1]
  end

  def nonsecure_url
    key.gsub('https', 'http')
  end
end
Worker:
class ProcessWorker
  include Sidekiq::Worker

  def perform(id, key)
    post = Post.find(id)
    puts post.nonsecure_url
  end
end
(Updated) Controller:
def create
  @user = current_user
  @post = @user.posts.create(params[:post])
  render nothing: true
end
Whenever jobs are first dispatched, no matter the method, they fail initially:
undefined method `gsub' for nil:NilClass
Then, they always succeed on the first retry.
I've come across the following github issue, that appears to be resolved – relating to this same issue:
https://github.com/mperham/sidekiq/issues/331
Here, people are saying that if they create initializers to initialize the ActiveRecord methods on the model, that it resolves their issue.
To accomplish this, I've tried creating an initializer in lib/initializers called sidekiq.rb, with the following, simply to initialize the methods on the Post model:
Post.first
Now, the first job created completes successfully the first time. This is good. However, a second job created fails the first time – and completes upon retry... putting me right back to where I started.
This is really blowing my mind – has anyone had the same issue? Any help is appreciated.
Change your model callback from after_save to after_commit for the create action. Sidekiq can pick up the job before the transaction that saves the model has committed, so the worker may not yet see the saved data.
after_commit :process, :on => :create
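The timing problem this answer describes can be illustrated with a toy model of a transaction. FakeDatabase below is a hypothetical in-memory stand-in, not ActiveRecord: writes stay invisible to other readers until commit, which is roughly what a worker on a separate database connection experiences.

```ruby
# Hypothetical in-memory model of a transaction: rows written inside the
# transaction are invisible to other connections until commit.
class FakeDatabase
  def initialize
    @committed = {}
    @pending = {}
  end

  # Write inside the open transaction.
  def write(id, row)
    @pending[id] = row
  end

  # Make all pending writes visible.
  def commit
    @committed.merge!(@pending)
    @pending.clear
  end

  # What a worker on another connection can see.
  def find(id)
    @committed[id]
  end
end

db = FakeDatabase.new
seen = {}

db.write(1, { key: "https://example.com/a.png" })
seen[:after_save] = db.find(1)   # job picked up before the commit
db.commit
seen[:after_commit] = db.find(1) # job picked up after the commit

puts seen[:after_save].inspect
puts seen[:after_commit][:key]
```

A job enqueued from after_save corresponds to the first read, and one enqueued from after_commit corresponds to the second. This would also be consistent with the retry succeeding, since by then the transaction has committed.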

Strange behavior with a resque scheduler job

So, some context: I got some advice here:
Scheduling events in Ruby on Rails
and have been trying to implement it today. I can't seem to make it work, though. This is my scheduler job, which is used to move my questions between a delayed queue and a ready-to-send queue (I've since decided to use email instead of SMS):
require 'Assignment'
require 'QuestionMailer'

module SchedulerJob
  @delayed_queue = :delayed_queue
  @ready_queue

  def self.perform()
    @delayed_queue.each do |a|
      if(Time.now >= a.question.schedule)
        @ready_queue << a
        @delayed_queue.delete(a)
      end
    end
    push_questions
  end

  def self.gather()
    assignments = Assignment.find :all
    assignments.each do |a|
      @delayed_queue << a unless @delayed_queue.include? a
    end
  end

  private

  def self.push_questions
    @ready_queue.each do |a|
      QuestionMailer.question(a)
    end
  end
end
I use an after_create callback to call the gather method every time an assignment is created, and the perform action then actually sends the emails when Resque runs.
I'm getting a strange error from the callback, though:
undefined method `include?' for :delayed_queue:Symbol
Here is the code from the Assignment model:
class Assignment < ActiveRecord::Base
  belongs_to :user
  belongs_to :question
  attr_accessible :title, :body, :user_id, :question_id, :response, :correct

  after_create :queue_assignments

  def grade
    self.correct = (response == self.question.solution) unless response == nil
  end

  def queue_assignments
    SchedulerJob.gather
  end
end
Any ideas what's going on? I think this is a problem with my understanding of how these queues work with resque-scheduler. I assumed that if the queues were list-like objects I could operate on them, but it appears that the queue is a Symbol rather than something with methods like include?. I assume the << notation for adding something to it is also invalid.
Also, please advise if this isn't the right way to handle this kind of job scheduling.
It appears you may have not restarted your Rails app after adding the new method gather to the SchedulerJob module. Try restarting your app to resolve this.
You may also be able to add the directory containing your Resque worker to Rails' watchable_dirs array so that changes you make to Resque worker modules in development don't require restarting your app. See this blog post for details:
http://wondible.com/2012/01/13/rails-3-2-autoloading-in-theory/
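Separately from the reloading advice, the error text in the question can be reproduced in isolation: a module-level instance variable assigned a Symbol behaves like a Symbol, and Symbol has no include? method. The sketch below uses two hypothetical modules, one mirroring the question's assignment and one using a plain in-process Array purely for contrast (an Array is not a Resque queue and is not shared between processes).

```ruby
# Mirrors the question: the "queue" is just a Symbol.
module BrokenScheduler
  @delayed_queue = :delayed_queue

  def self.gather(item)
    @delayed_queue << item unless @delayed_queue.include?(item)
  end
end

# Hypothetical contrast: a list-like object responds to include? and <<.
module ArrayScheduler
  @delayed_queue = []

  def self.gather(item)
    @delayed_queue << item unless @delayed_queue.include?(item)
    @delayed_queue
  end
end

error = begin
  BrokenScheduler.gather("a1")
  nil
rescue NoMethodError => e
  e
end

puts error.class
puts ArrayScheduler.gather("a1").inspect
```

Note also that @ready_queue in the question is never assigned, so it is nil when push_questions iterates over it.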

Thread-safe Rails controller actions - setting instance variables?

I have to write a threaded Rails app because I am running it atop of Neo4j.rb, which embeds a Neo4j graph database inside the Rails process, and thus I have to serve multiple requests from the same process. Yeah, it'd be cool if connecting to a Neo4j database worked like SQL databases, but it doesn't, so I'll quit complaining and just use it.
I'm quite worried about the implications of writing concurrent code (as I should be), and just need some advice on how to handle a common scenario: a controller sets an instance variable or a variable in the session hash, then some stuff happens. Consider the following crude code to demonstrate what I mean:
# THIS IS NOT REAL PRODUCTION CODE
# I don't do this in real life, it is just to help me ask my question, I
# know about one-way hashing, etc.!
class SessionsController
  def create
    user = User.find_by_email_and_password(params[:email], params[:password])
    raise 'auth error' unless user
    session[:current_user_id] = user.id
    redirect_to :controller => 'current_user', :action => 'show'
  end
end

class CurrentUserController
  def show
    @current_user = User.find(session[:current_user_id])
    render :action => :show # .html.erb file that uses @current_user
  end
end
The question: Are there any race conditions in this code?
In SessionsController, are the session hash and the params hash thread-local? Say the same browser session makes multiple requests to /sessions#create (to borrow Rails route syntax) with different credentials: should the user that ends up logged in be the one from the request that hit the line session[:current_user_id] = user.id last? Or should I wrap a mutex lock around the controller action?
In the CurrentUserController, if the show action is hit simultaneously by two requests with different sessions, will the same @current_user variable be set by both? I.e. will the first request, as it is processing the .html.erb file, find that its @current_user instance variable has suddenly been changed by the second thread?
Thanks
Each request gets a new instance of your controller. As a consequence controller instance variables are thread safe. params and session are also backed by controller instance variables (or the request object itself) and so are also safe.
It's important to know what is shared between threads and what isn't.
Now back to your specific example. Two requests hit CurrentUserController#show simultaneously, hence they are handled by two concurrent threads. The key here is that each thread has its own instance of CurrentUserController, so there are two @current_user variables which don't interfere. So there's no race condition around @current_user.
An example of race condition would be this:
class ApplicationController < ActionController::Base
  before_filter :set_current_user # before_action in Rails 4 and later

  # Class-level attribute: a single slot shared by every thread
  cattr_accessor :current_user

  def set_current_user
    self.class.current_user = User.find_by_id(session[:current_user_id])
  end
end

# model
class LogMessage < ActiveRecord::Base
  belongs_to :user

  def self.log_action(attrs)
    log_message = new(attrs)
    # May be a user set by another request's thread in the meantime
    log_message.user = ApplicationController.current_user
    log_message.save
  end
end
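The difference between per-instance and class-level state can be made concrete in plain Ruby. This is a minimal sketch with hypothetical classes (SharedState for the class-level accessor, PerRequest for a per-request controller instance); the two Queue objects only force a fixed interleaving so the outcome is repeatable.

```ruby
# Hypothetical stand-in for a class-level accessor such as cattr_accessor.
class SharedState
  class << self
    attr_accessor :current_user # one slot shared by every thread
  end
end

# Hypothetical stand-in for a controller instance created per request.
class PerRequest
  attr_reader :current_user

  def initialize(user)
    @current_user = user # one slot per instance
  end
end

a_set = Queue.new
b_set = Queue.new

thread_a = Thread.new do
  request = PerRequest.new("alice")
  SharedState.current_user = "alice"
  a_set << true
  b_set.pop # wait until thread B has overwritten the shared slot
  [SharedState.current_user, request.current_user]
end

thread_b = Thread.new do
  a_set.pop # wait until thread A has set the shared slot
  request = PerRequest.new("bob")
  SharedState.current_user = "bob"
  b_set << true
  [SharedState.current_user, request.current_user]
end

shared_a, own_a = thread_a.value
shared_b, own_b = thread_b.value
puts "thread A sees shared=#{shared_a} own=#{own_a}"
puts "thread B sees shared=#{shared_b} own=#{own_b}"
```

Thread A's own instance variable is unaffected, while the class-level value it reads back belongs to thread B's request. In a real application the interleaving is not forced, so the same mix-up would show up only intermittently.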
On a more general note, because of the GIL (Global Interpreter Lock), the benefits of using threads in MRI Ruby are rather limited. There are implementations that are free of the GIL, such as JRuby.
