ActiveJob: how to do simple operations without a full blown job class?

With delayed_job, I was able to do simple operations like this:
@foo.delay.increment!(:myfield)
Is it possible to do the same with Rails' new ActiveJob? (without creating a whole bunch of job classes that do these small operations)

ActiveJob is merely an abstraction on top of various background job processors, so many capabilities depend on which provider you're actually using. But I'll try not to depend on any particular backend here.
Typically, a job provider consists of a persistence mechanism and runners. When offloading a job, you write it into the persistence mechanism in some way, and later one of the runners retrieves it and runs it. So the question is: can you express your job data in a format compatible with any action you need?
That will be tricky.
Let's define what a job definition is, then. For instance, it could be a single method call. Assuming this syntax:
Model.find(42).delay.foo(1, 2)
We can use the following format:
{
  class: 'Model',
  id: '42', # whatever
  method: 'foo',
  args: [1, 2]
}
Now how do we build such a hash from a given call and enqueue it to a job queue?
First of all, as it appears, we'll need to define a class that has a method_missing to catch the called method name:
class JobMacro
  attr_accessor :data

  def initialize(record = nil)
    self.data = {}
    if record.present?
      self.data[:class] = record.class.to_s
      self.data[:id] = record.id
    end
  end

  def method_missing(action, *args)
    self.data[:method] = action.to_s
    self.data[:args] = args
    GenericJob.perform_later(data)
  end
end
The job itself will have to reconstruct that expression like so:
data[:class].constantize.find(data[:id]).public_send(data[:method], *data[:args])
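The GenericJob referenced by the macro isn't shown above; a minimal sketch of it (class name, queue and key handling are assumptions on my part) could look like this:
class GenericJob < ActiveJob::Base
  queue_as :default

  def perform(data)
    # Depending on the backend, hash keys may come back as strings.
    data = data.symbolize_keys
    record = data[:class].constantize.find(data[:id])
    record.public_send(data[:method], *data[:args])
  end
end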
Of course, you'll have to define the delay macro on your model. It may be best to factor it out into a module, since the definition is quite generic:
def delay
  JobMacro.new(self)
end
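Factored into a module, that could be as simple as the sketch below (the module name and include point are my assumptions):
module Delayable
  def delay
    JobMacro.new(self)
  end
end

# Include it in a common base class, e.g. ApplicationRecord
# (or ActiveRecord::Base directly on older Rails):
class ApplicationRecord < ActiveRecord::Base
  include Delayable
end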
It does have some limitations:
It only supports running jobs on persisted ActiveRecord models: the job needs a way to reconstruct the callee before calling the method, and I've picked the most probable one. You could also use marshalling if you want, but I consider that unreliable: the unmarshalled object may be invalid by the time the job gets to execute. The same goes for GlobalID.
It uses Ruby's reflection. That's a tempting solution to many problems, but it isn't fast and is a bit risky security-wise, so use this approach cautiously.
It supports only one method call and no procs (you could probably do that with the ruby2ruby gem). It relies on the job provider to serialize arguments properly; if it fails to, help it with your own code. For instance, que uses JSON internally, so whatever works in JSON works in que. Symbols, for instance, don't.
Things will break in spectacular ways at first, so make sure to set up your debugging tools before starting off.
An example of this is Sidekiq's backward (Delayed::Job) compatibility extension for ActiveRecord.

As far as I know, this is currently not supported out of the box. You can easily simulate the feature with a custom proxy job that accepts a model or instance, the method to perform, and a list of arguments.
However, for the sake of testability and maintainability, this shortcut is not a good approach. It's more effective (even if you need to write a little more code) to have a specific job for everything you want to enqueue. It forces you to think more about the design of your app.

I wrote a gem that can help you with that: https://github.com/cristianbica/activejob-perform_later. But be aware that having methods all around your code that might be executed in workers is, I believe, the perfect recipe for disaster if not handled carefully :)

Related

Passing Complex Hashes to Sidekiq Jobs

From the Best Practices Guide to using Sidekiq, I understand it's best to pass "string, integer, float, boolean, null(nil), array and hash" as arguments to the job.
I often just pass the id of a persisted object to my jobs, but due to latency constraints I need to save the object after running the job.
The non-persisted object I'm working with contains a mixture of data types:
#<MyObject:0x000...
  id: nil,
  start_time: Fri, 11 Dec 2020 08:45:00 PST -08:00,  # a TimeWithZone object
  rate: 18.0,                                         # a BigDecimal object
  ...
>
I plan to pass this object to my job by converting it to a hash first:
MyJob.perform_async(my_object.attributes)
and then later persist the object like so:
MyObject.new(my_object_hash).save
My question is, is this safe? Even though I am passing a 'simple' datatype to Sidekiq, it actually contains complex objects. Am I going to lose precision?
Thank you!
This sounds like a "potayto, potahto" solution: you are not using Sidekiq's serialisation, but serialising the object yourself instead.
Let's have a look at why Sidekiq has this rule:
Even if they did serialize correctly, what happens if your queue backs up and that quote object changes in the meantime? [...]
Don't pass symbols, named parameters, keyword arguments or complex Ruby objects (like Date or Time!) as those will not survive the dump/load round trip correctly.
I like to add a third:
Serializing state makes it impossible to distinguish between persisted and ethereal (in-memory, memoized, lazy-loaded, etc.) data. E.g. a def sent_mails; @sent_mails ||= Mail.for(user_id: id); end now gets serialized: do you want that?
The solution is also provided by sidekiq:
Don't save state to Sidekiq, save simple identifiers. Look up the objects once you actually need them in your perform method.
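In code, that pattern looks roughly like the sketch below (the worker and method names are placeholders):
class EnrichObjectWorker
  include Sidekiq::Worker

  def perform(my_object_id)
    # Look the object up fresh, so the job always sees current state.
    my_object = MyObject.find(my_object_id)
    my_object.do_expensive_work
  end
end

# Enqueue with just the id, once the record is persisted:
EnrichObjectWorker.perform_async(my_object.id)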
The XY problem here
Your real problem is not where or how to serialize state, because Sidekiq warns against serializing state regardless of where and how you do it.
The problem you need to solve is either how to store state somewhere it can be stored properly, or how to avoid storing the state at all: not in redis/sidekiq, nor in the storage that is giving you problems.
Latency
Is your storage slow? Is it not a validation, a serialisation, or some side effect of storage that is slow?
Can you improve this by making it a two-step process: insert the state, then update/enrich/validate it asynchronously later? If you are using Rails, it won't help you here, and might even work against you, but a common model is to store objects in a special "queue" table or an events queue; kafka is famous for this.
When storage happens over a slow network to a slow API, this is probably unsolvable, but when storage happens in a local database, there are decades of solutions for improving write performance that you can use, both inside your database and with some specialised queue for state storage (sidekiq is not such a specialised storage queue), depending on the tech used to store it. E.g. Linux will allow you to write through memory, making writes to disk really quick, but removing the guarantee that the data was actually written to disk.
E.g. in a bookkeeping API, we would store the validated object in PostgreSQL and then have async jobs add expensive attributes to it later (e.g. state that had to be retrieved from legacy APIs or through complex calculations).
E.g. in a write-heavy GIS system, we would store objects into a "to_process_places" table that was monitored by tooling which processes the Places. It all really depends on your domain and requirements.
Not using state.
A common solution is not to build objects at all, but to use the actual payload from the customer: just send the HTTP payload (in Rails, the params) along and leave it at that. Maybe merge in a header (like the request date) or filter out some data (header tokens or cookies).
If your controller can operate with this data, so can a delayed job. Instead of building objects in the controller, leave that to the delayed job. This can even result in really neat and lean controllers: all they do is (some authentication and authorization and then) call the proper job and pass it the sanitized params.
Obviously this requires trade-offs like not being able to validate in-sync, but to give such info over email, push-notification, or delayed response instead, depending on your requirements (e.g. a large CSV import could just email any validation issues, but a login request might need to get immediate response if the login is invalid).
It also requires some thought: you probably don't want to send the Base64 encoded CSV along to sidekiq, but instead write the file to a (temp) storage and pass the filename/url along instead. This might sound obvious, because it is: file uploads are essentially an implementation of the earlier mentioned "temporary state storage": you don't pass the entire PDF/high-res-header-image/CSV along to sidekiq, but store it somewhere so sidekiq can pick it up later to process it. Why should the other attributes not employ the same pattern if passing them along to sidekiq is problematic?
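A rough sketch of that controller shape (all names are illustrative only):
class ImportsController < ApplicationController
  def create
    # (authentication/authorization would go here)
    ImportWorker.perform_async(import_params.to_h)  # build objects in the job, not here
    head :accepted
  end

  private

  def import_params
    params.require(:import).permit(:file_url, :requested_at)
  end
end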
The most important part from the best practices you linked is
Complex Ruby objects do not convert to JSON
Therefore you're not supposed to pass instances of a model to a worker.
If you're using Sidekiq workers, you should comply with this statement, and the hash you're passing should be just fine. I am not exactly sure about the TimeWithZone object, but you could try converting it to a JSON-friendly string, as they do in the best practices guide.
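For example, a sketch of that string round trip (assuming start_time is the only attribute that needs special handling):
# Serialize the time as an ISO 8601 string before enqueueing...
attrs = my_object.attributes.merge('start_time' => my_object.start_time.iso8601(9))
MyJob.perform_async(attrs)

# ...and parse it back inside the worker before persisting:
attrs['start_time'] = Time.zone.parse(attrs['start_time'])
MyObject.new(attrs).save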
However, if you're using ActiveJob instead of Sidekiq workers (does your job inherit from ApplicationJob, or does it include Sidekiq::Worker?), then you don't have that problem, because ActiveJob uses Global ID to convert objects into a String and deserializes the object again before performing the job. Meaning you can pass an object to your job.
my_object = MyObject.find(1)
my_object.to_global_id #=> #<GlobalID:0x000045432da2344 gid://your_app_name/MyObject/1>
serialized_my_object = my_object.to_global_id.to_s
my_object = GlobalID.find(serialized_my_object)
You can find more information here
https://github.com/toptal/active-job-style-guide#active-record-models-as-arguments
After doing some experimentation on the Time objects in my job, I found that I am losing sub-millisecond precision at the other end of the job.
my_object.start_time
=> Mon, 21 Dec 2020 11:35:50 PST -08:00
my_object.start_time.strftime('%Y-%m-%d %H:%M:%S.%N')
=> "2020-12-21 11:35:50.151893000"
You can see here that we have microsecond precision: six significant digits after the decimal point.
(see this answer for more about 'strftime')
Once we call JSON methods on the object:
generated = JSON.generate(my_object.attributes)
# => "{ ... \"start_time\":\"2020-12-21T11:35:50.151-08:00\" ... }"
You can see here we are down to 3 digits of precision after the decimal. The remaining 3 digits are lost at this point.
parsed = JSON.parse(generated)
parsed['start_time'] # => "2020-12-21T11:35:50.151-08:00"
It appears at the most basic level, the JSON library recursively calls as_json on each of the key-value pairs in the hash. So really it depends on how your particular object implements as_json.
This issue caused test failures that involved querying our db for persisted objects (initialized with something like start_time = Time.zone.now) that are meant to overlap in time exactly with our MyObject instances. Once the half-baked my_object blueprints made it through Sidekiq, they lost a sliver of precision, causing a slight misalignment.
One way to hack away at this issue is by monkey patching the Time class.
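One such hack, if you are on ActiveSupport, is to raise the fractional-second precision Rails uses when encoding times as JSON rather than reopening Time itself; I believe this is the relevant knob, but treat it as a sketch:
# config/initializers/time_precision.rb
# Keep more fractional digits when ActiveSupport encodes times as JSON
# (the default is 3, i.e. milliseconds). This affects as_json on Time/TimeWithZone.
ActiveSupport::JSON::Encoding.time_precision = 9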
In our case, a better solution was to go in the opposite direction and to not use so much precision in our tests. The my_object in the example is something that a human user will have on their calendar; in production we never receive so much precision from clients. So instead we fixed our tests by instructing some of our test objects to use something like Time.zone.now.beginning_of_minute, rather than Time.zone.now. We intentionally removed precision to fix the issue, as well as more closely mirror reality.

Put class instance to class constant in initializers

In one of my old apps, I'm using several API connectors, like AWS or Mandrill for example.
For some reason (maybe I saw it somewhere, I don't remember), I use class constants to initialize these objects at the init stage of the application.
As an example, in config/initializers/mandrill.rb:
require 'mandrill'
MANDRILL = Mandrill::API.new ENV['MANDRILL_APIKEY']
Now I can access the MANDRILL constant in any method of my application and use it (full path MyApplication::Application::MANDRILL, or just MANDRILL). All working fine; for example:
def update_mandrill
  result = MANDRILL.inbound.update_route id, pattern, url
end
The question is: is it good practice to use such class constants? Or is it better to create a new class instance in every method that uses it, as in this example:
def update_mandrill
  require 'mandrill'
  mandrill = Mandrill::API.new ENV['MANDRILL_APIKEY']
  result = mandrill.inbound.update_route id, pattern, url
end
Interesting question.
It's a very handy approach, but it may have cons in some scenarios.
Imagine you have a constant that either takes a long time to initialize or loads a lot of data into memory. When its initialization takes long, you essentially degrade app boot time (which may or may not be a problem; it usually will be in development).
If it loads a lot of data into memory, it may turn out to be a problem when running rake tasks, for example, which load the entire environment. You may hit memory limits in use cases where you essentially do not need this data at all.
I know one application which loads a lot of data during boot, and it's done very deliberately. Sure, the use case is a bit uncommon, but still.
Another thing to consider: imagine you're trying to establish a connection to an external service like Mongo or anything else. If this service is unavailable (which happens), your application won't be able to boot. Maybe the service is essential for the app to work, and without it the app would be "useless" anyway, but it's also possible that you essentially stop everything because the storage in which you keep logs doesn't work.
I'm not saying you shouldn't use it as you suggested (I do it in my apps too), but you should be aware of the potential drawbacks.
Yes, pre-creating a pseudo-constant object (like that API client) is usually a good idea. However, there are approximately a thousand ways to go about it, and the constant is not at the top of my personal list.
These days I usually go with setting it in the env files.
# config/environments/production.rb
config.email_client = Mandrill::API.new ENV['MANDRILL_APIKEY'] # the real thing
# config/environments/test.rb
config.email_client = a_null_object # something that conforms to the same api, but does absolutely nothing
# config/environments/development.rb
config.email_client = a_dev_object # post to local smtp, or something
Then you refer to the client like this:
Rails.application.configuration.email_client
And the correct behaviour will be picked up in each env.
If I don't need this per-env variation, then I either use some kind of singleton object (EmailClient.get) or a global variable in the initializer ($email_client). It can be argued that a constant is better than a global variable, both semantically and because it raises a warning when you try to reassign it. But I like that a global variable stands out more: you see right away that it's something special. (And then again, it's only #3 on my list, so I don't do it very often.)
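The singleton flavour mentioned above can be as small as this sketch (the class and method names just follow the EmailClient.get example in the text):
require 'mandrill'

class EmailClient
  def self.get
    # Memoized on first use; add a Mutex around this if thread safety on
    # first access matters in your setup.
    @client ||= Mandrill::API.new(ENV['MANDRILL_APIKEY'])
  end
end

# Usage:
EmailClient.get.inbound.update_route id, pattern, url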

What's the "right" way to test functions that call methods on new instances?

I seem to have encountered literature alluding to it being bad practice to use RSpec's any_instance_of methods (e.g. expect_any_instance_of). The relish docs even list these methods under the "Working with legacy code" section (http://www.relishapp.com/rspec/rspec-mocks/v/3-4/docs/working-with-legacy-code/any-instance) which implies I shouldn't be writing new code leveraging this.
I feel that I am routinely writing new specs that rely on this capability. A great example is any method that creates a new instance and then calls a method on it. (In Rails where MyModel is an ActiveRecord) I routinely write methods that do something like the following:
def my_method
  my_active_record_model = MyModel.create(my_param: my_val)
  my_active_record_model.do_something_productive
end
I generally write my specs looking for do_something_productive being called, using expect_any_instance_of, e.g.:
expect_any_instance_of(MyModel).to receive(:do_something_productive)
subject.my_method
The only other way I can see to spec this would be with a bunch of stubs like this:
my_double = double('my_model')
expect(MyModel).to receive(:create).and_return(my_double)
expect(my_double).to receive(:do_something_productive)
subject.my_method
However, I consider this worse because (a) it's longer and slower to write, and (b) it's much more brittle and white-box than the first way. To illustrate the second point, if I change my_method to the following:
def my_method
  my_active_record_model = MyModel.new(my_param: my_val)
  my_active_record_model.save
  my_active_record_model.do_something_productive
end
then the double version of the spec breaks, but the any_instance_of version works just fine.
So my questions are, how are other developers doing this? Is my approach of using any_instance_of frowned upon? And if so, why?
This is kind of a rant, but here are my thoughts:
The relish docs even list these methods under the "Working with legacy code" section (http://www.relishapp.com/rspec/rspec-mocks/v/3-4/docs/working-with-legacy-code/any-instance) which implies I shouldn't be writing new code leveraging this.
I don't agree with this. Mocking/stubbing is a valuable tool when used effectively and should be used in tandem with assertion style testing. The reason for this is that mocking/stubbing enables an "outside-in" testing approach where you can minimize coupling and test high level functionality without needing to call every little db transaction, API call, or helper method in your stack.
The question really is do you want to test state or behavior? Obviously, your app involves both so it doesn't make sense to tether yourself to a single paradigm of testing. Traditional testing via assertions/expectations is effective for testing state and is seldom concerned with how state is changed. On the other hand, mocking forces you to think about interfaces and interactions between objects, with less burden on the mutation of state itself since you can stub and shim return values, etc.
I would, however, urge caution when using *_any_instance_of and avoid it if possible. It's a very blunt instrument and can be easy to abuse, especially when a project is small, only to become a liability when the project is larger. I usually take *_any_instance_of as a smell that either my code or my tests can be improved, but there are times when it's necessary to use it.
That being said, between the two approaches you propose, I prefer this one:
my_double = double('my_model')
expect(MyModel).to receive(:create).and_return(my_double)
expect(my_double).to receive(:do_something_productive)
subject.my_method
It's explicit, well-isolated, and doesn't incur overhead with database calls. It will likely require a rewrite if the implementation of my_method changes, but that's OK. Since it's well-isolated it probably won't need to be rewritten if any code outside of my_method changes. Contrast this with assertions where dropping a column in a database can break almost the entire test suite.
I don't have a better solution to testing code like that than either of the two you gave. In the stubbing/mocking solution I'd use allow rather than expect for the create call, since the create call isn't the point of the spec, but that's a side issue. I agree that the stubbing and mocking is painful, but that's usually what I do.
However, that code has just a bit of Feature Envy. Extracting a method onto MyModel clears up the smell and eliminates the testing issue:
class MyModel < ActiveRecord::Base
  def self.create_productively(attrs)
    create(attrs).do_something_productive
  end
end

def my_method
  MyModel.create_productively(attrs)
end
# in the spec
expect(MyModel).to receive(:create_productively)
subject.my_method
create_productively is a model method, so it can and should be tested with real instances and there's no need to stub or mock.
I often notice that the need to use less-commonly-used features of RSpec means that my code could use a little refactoring.
def self.my_method(attrs)
  create(attrs).tap { |m| m.do_something_productive }
end

# Spec
let(:attrs) { { my_param: my_val } } # a valid attributes hash

describe "when calling my_method with valid attributes" do
  it "does something productive" do
    expect(MyModel.my_method(attrs)).to have_done_something_productive
  end
end
Naturally, you will have other tests for #do_something_productive itself.
The trade-off is always the same: mocks and stubs are fast, but brittle. Real objects are slower but less brittle, and generally require less test maintenance.
I tend to reserve mocks/stubs for external dependencies (e.g. API calls), or when using interfaces that have been defined but not implemented.
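For instance, something along these lines (the gateway and notifier are made-up names, just to show the shape):
describe OrderNotifier do
  it "notifies the payment gateway about the order" do
    gateway = double('payment_gateway')
    expect(gateway).to receive(:notify).with(order_id: 42)

    # The external dependency is injected, so no real API call is made.
    OrderNotifier.new(gateway: gateway).notify_about(order_id: 42)
  end
end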

Rspec - combine expect_any_instance_of and a receive counts

I need to verify that any instance of my class receives a certain method, but I don't care if many instances receive it (they're supposed to).
I tried like this:
expect_any_instance_of(MyClass).to receive(:my_method).at_least(:once)
But apparently, it only allows for a single instance to receive the method multiple times, but not for different instances.
Is there a way to achieve that?
If you need to live with the code smell, this rspec-mocks Github issue suggests a solution along these lines:
receive_count = 0
allow_any_instance_of(MyClass).to receive(:my_method) { receive_count += 1 }
# Code to test here.
expect(receive_count).to be > 0
This is a known issue in rspec-mocks. From the v3.4 documentation on Any instance:
The rspec-mocks API is designed for individual object instances, but this feature operates on entire classes of objects. As a result there are some semantically confusing edge cases. For example, in expect_any_instance_of(Widget).to receive(:name).twice it isn't clear whether a specific instance is expected to receive name twice, or if two receives total are expected. (It's the former.)
Furthermore
Using this feature is often a design smell. It may be that your test is trying to do too much or that the object under test is too complex.
Do you have any way to refactor your test or app code to avoid the "confusing edge case"? Perhaps by constructing a test double and expecting it to receive messages?
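For example, if the code under test can take its collaborator from the outside (the names below are hypothetical), an ordinary spy does the job without any_instance_of:
it "calls my_method at least once" do
  collaborator = instance_spy(MyClass)

  # Hypothetical: the code under test accepts its collaborator explicitly.
  DoTheWork.new(collaborator).call

  expect(collaborator).to have_received(:my_method).at_least(:once)
end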

How would I use Procs in Ruby on Rails?

There are a few helpers I am using in my project, which I just thought that I could maybe treat as Procs, as they do very specific tasks and can be used by very different components.
I've used Procs in small Ruby projects, mainly when learning the language, and I thought that this would be a good occasion to put them to use.
My question is: where would I put the Procs in the Rails folder structure? Are there any guidelines or recommendations for this? Is it considered good practice?
I am a bit puzzled about what the advantage of using Procs over simple methods would be, so it would be nice if you could give some examples.
Anyway: since Procs can be stored in a variable, I would declare a module inside the lib folder and define the procs as variables, constants, or methods returning a proc. Something like this:
module ProcContainer
  def self.proc_1(factor)
    Proc.new { |n| n * factor }
  end

  PROC_2 = Proc.new { |n| 2 * n }
end
which would be used as
gen_proc = ProcContainer.proc_1(6)
result = gen_proc.call(3)                     # => 18
other_result = ProcContainer::PROC_2.call(4)  # => 8
The advantage of the method over the constant is obvious, I guess: it returns a new Proc object every time it is called, while the constant is only evaluated once.
(of course you should change the naming to something more appropriate)
Ruby has amazing syntax for blocks, so we tend to favor them over explicitly making procs. The downside of blocks is that they need to be executed immediately when the called method yields to them (procs don't have that limitation). That is in place for performance reasons, but you can easily package up a block as a proc, and store it somewhere else for later, or pass it down to another method. So even though you are probably using procs every day, you don't really realize it, because your interface to them is through the block syntax.
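Concretely, the &block capture is that packaging of a block into a Proc. A small standalone sketch (the class and method names are only for illustration):
class Callbacks
  def initialize
    @stored = []
  end

  # The &block parameter arrives here as a Proc object, so it can be
  # stored and called later instead of being yielded to immediately.
  def on_save(&block)
    @stored << block
  end

  def run(record)
    @stored.each { |callback| callback.call(record) }
  end
end

callbacks = Callbacks.new
callbacks.on_save { |record| puts "saved #{record}" }
callbacks.run("a record") # prints "saved a record"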
