How to send a job to the Deadset?
I've been searching around and I see some other developers who want to notify their exception-handling systems when a job fails, for example if the job connects to an unreliable third-party API.
In my case I would like to send the job to the DeadSet right away instead of just discarding it or waiting around 20 days. I want a human to have to click the Retry or Delete button in the DeadSet UI, depending on the failure reason.
So if I have this Russian Roulette job that retries if it's feeling lucky, or goes to the graveyard (aka the DeadSet) if not, what would be a proper way to do it?
It has come to my mind to set retry: 0 in the sidekiq_options, but that isn't what I want in every case. One way might be to retry the job while somehow passing sidekiq_options retry: 0, so that the next time it retries itself it goes to the DeadSet.
class RussianRoulette < ActiveJob::Base
  sidekiq_options retry: 10

  discard_on SomeException do |job, exception|
    puts "You are dead but somebody will check your corpse!"
    # How to send to the DeadSet?
  end

  def perform(args)
    if feeling_lucky? # If this method fails it retries up to 10 times
      puts "You got another chance!"
      retry_job
    else
      puts "You are dead!"
      # What to do?:
      # throw :abort # No, because we want to send it to the DeadSet so somebody can erase or retry it
      # raise SomeException # Could be, but then how to send it to the DeadSet?
      # self.class.perform_later(args, { sidekiq_options: { retry: 0 } }) # Is this even possible?
    end
  end
end
The closest I've gotten is pushing directly into the DeadSet using the Sidekiq API. From the source I see that it expects a 'message'; I've tried pushing the job as JSON, but in the Web UI it looks like the message payload is not correct.
Sidekiq::DeadSet.new.kill(job.to_json) # Also tried job.arguments.to_json
Update:
I have followed the logic of the Kill button in the Web UI and I see there is a kill method that is called on the job instance (or not?), but it's not available when I call it manually in the console.
(byebug) job.kill
*** NoMethodError Exception: undefined method `kill' for #<RussianRoulette:0x00007faf520805a8>
One suggestion I can think of is to split the work into two different queues:
A queue with retries enabled.
An instant-dead queue with retry set to 0.
Now the RussianRoulette class can enqueue to the retry queue or the dead queue as needed. You can also add logic to the dead queue's worker to re-enqueue the job back to RussianRoulette or the retry queue based on a retry argument.
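A minimal sketch of that idea, assuming plain Sidekiq workers (the class names, queue names, do_the_work and PermanentFailure are all made up for illustration):

class RussianRouletteWorker
  include Sidekiq::Worker
  # Transient failures are retried; once retries are exhausted the job lands in the Dead set anyway.
  sidekiq_options queue: :retryable, retry: 10

  def perform(args)
    do_the_work(args)
  rescue PermanentFailure => e
    # Known-permanent problem: hand the job over to the "instant dead" worker.
    InstantDeadWorker.perform_async(args, e.message)
  end
end

class InstantDeadWorker
  include Sidekiq::Worker
  # retry: 0 means no retries, but a failed job still goes to the Dead set,
  # where a human can inspect, retry or delete it in the UI.
  sidekiq_options queue: :dead_on_arrival, retry: 0

  def perform(args, reason)
    raise "Sent to the morgue: #{reason}"
  end
end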
I ended up injecting the record directly into the DeadSet, using just the params of my original job. I capture the exception (a custom exception) with the rescue_from method.
class MyJob < ActiveJob::Base
  rescue_from(CustomException) do |exception|
    params = self.arguments
    # ActiveJob records which argument hash keys were symbols under this key
    params[0]['_aj_symbol_keys'] = params[0].keys.map(&:to_s)

    # Build a payload that mimics what the Sidekiq adapter would have enqueued
    job = {
      "error_message": exception.message,
      "error_class": exception.class.name,
      "jid": SecureRandom.hex(12),
      "wrapped": self.class.name,
      "created_at": Time.now.to_f,
      "enqueued_at": Time.now.to_f,
      "failed_at": Time.now.to_f,
      "retry_count": 0,
      "args": [
        {
          "job_class": self.class.name,
          "job_id": SecureRandom.hex(12),
          "provider_job_id": nil,
          "priority": nil,
          "arguments": params,
          "executions": 0,
          "queue_name": "default",
          "exception_executions": {},
          "locale": "en",
          "timezone": "UTC",
          "enqueued_at": Time.now.to_f
        }
      ],
      "retry": 0,
      "queue": "default",
      "class": "ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper"
    }

    Sidekiq::DeadSet.new.kill(job.to_json)
  end

  def perform(params)
    raise CustomException.new("Here is a custom exception") if whatever
  end
end
So when I need to send a job to the morgue directly from inside my job logic, I just raise the exception and that's it: the rescue_from block captures it and creates the entry in the morgue.
After this, the job shows up in my Morgue, and some human can analyze it and delete or retry it. In my case this was a web scraper, so if the HTML of the target site changes and becomes invalid, a developer has to update the logic, and we retry with the original arguments once the updated scraping code is pushed.
I am currently working on Stripe webhooks for my Rails application and am encountering a problem. All events except for checkout.session.completed are working.
My main goal is to set the booking's payment status (booking.paid) to true when the checkout.session.completed event happens. The Stripe webhook logs show a 500 Internal Server Error for the checkout.session.completed event. I think the problem is in my webhook controller, but I just can't figure out what's wrong. Any help would be amazing!
This is my WebhooksController:
class WebhooksController < ApplicationController
  skip_before_action :authenticate_user!
  skip_before_action :verify_authenticity_token

  def create
    payload = request.body.read
    sig_header = request.env['HTTP_STRIPE_SIGNATURE']
    event = nil

    begin
      event = Stripe::Webhook.construct_event(
        payload, sig_header, Rails.application.credentials[:stripe][:webhook]
      )
    rescue JSON::ParserError => e
      status 400
      return
    rescue Stripe::SignatureVerificationError => e
      # Invalid signature
      puts "Signature error"
      p e
      return
    end

    # Handle the event
    case event.type
    when 'checkout.session.completed'
      # session = event.data.object
      # @booking.session.client_reference_id.paid = true
      booking = Booking.find_by(checkout_session_id: event.data.object.id)
      booking.update(paid: true)
    end

    render json: { message: 'success' }
  end
end
I just happen to be writing the exact same feature as you so I'm glad this popped up in my queue.
From taking a quick look at the code nothing stands out much. If we know that the only event that doesn't work is checkout.session.completed, and that's the only one we're even processing, that narrows the problem down a bit... so here's what I did:
I copied your implementation into a controller in my Rails API project, then used the Stripe CLI to listen for, and forward Stripe events to the new endpoint:
$ stripe listen --forward-to http://localhost:3000/webhook_events
I commented out the actual handling of the event so it was only processing the event.
I then used the Stripe CLI in a new terminal to trigger a checkout.session.completed event:
$ stripe trigger checkout.session.completed
Once I did this, my API responded with a 201 and Stripe was happy.
So after all of that, as the previous answer suggests, I think the issue lies with your updating the Booking model, so I have a few suggestions to make working with webhooks in general easier:
Ideally, your controller should respond with a 2xx to Stripe as soon as you've verified the authenticity of the event with the Stripe gem.
Once you've completed that, I would immediately move the processing of the event to a background job using ActiveJob.
In the background job, you know that your event is valid and that the session completed successfully, so now you can start to update your Booking model. The arguments to the job could be as simple as just the Stripe checkout session ID (see the sketch after these suggestions).
Finally, splitting the responsibilities like this will make writing tests much easier (and will catch what the actual problem is!).
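For example, the handoff could look roughly like this (a sketch only; StripeEventJob and the checkout_session_id lookup are my own names, not something taken from your code):

# In the controller, inside the case statement: acknowledge fast, enqueue the work.
when 'checkout.session.completed'
  StripeEventJob.perform_later(event.data.object.id)

# app/jobs/stripe_event_job.rb
class StripeEventJob < ApplicationJob
  queue_as :default

  def perform(checkout_session_id)
    booking = Booking.find_by(checkout_session_id: checkout_session_id)
    # Raise if the booking is missing so the failure shows up in the job backend
    # instead of surfacing as a 500 on the webhook endpoint.
    raise "No booking for checkout session #{checkout_session_id}" if booking.nil?

    booking.update!(paid: true)
  end
end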
I hope this helps, good luck!
I think the issue might lie in the Booking.find_by method. Try adding a line to inspect the value of booking prior to updating its status.
Under when 'checkout.session.completed', assign session = event.data.object and print(session). The console output will show what is causing the 500 error in the checkout session handler.
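Concretely, the kind of inspection both of these answers are pointing at might look like this (purely illustrative logging, not from the original code):

when 'checkout.session.completed'
  session = event.data.object
  Rails.logger.info("checkout.session.completed session: #{session.inspect}")

  booking = Booking.find_by(checkout_session_id: session.id)
  Rails.logger.info("matched booking: #{booking.inspect}")

  # If booking is nil here, booking.update(paid: true) raises NoMethodError,
  # which is exactly the kind of thing Stripe sees as a 500.
  booking.update(paid: true)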
I am working with the Slack API. My script does a bunch of external processing, and in some cases it can take around 3-6 seconds. What is happening is that the Slack API expects a 200 response within 3 seconds, and because my function is not finished within 3 seconds, it retries and ends up posting the same automated responses 2-3 times.
I confirmed this by commenting out all the functions: with them removed I had no issue, and the responses posted to Slack fine. I then added sleep 10 and it posted the same responses 3 times, so the only thing different was that it took longer.
From what I read, I need to have threaded responses: first respond to the Slack API in one thread, and then go about processing my functions.
Here is what I tried:
def events
  Thread.new do
    json = {
      "text": "Here is your 200 response immediately slack",
    }
    render(json: json)
  end

  puts "--------------------------------Json response started----------------------"
  sleep 30
  puts "--------------------------------Json response completed----------------------"
  puts "this is a successful response"
end
When I tested it, the same issue happened, so I tried an online API tester: it hits the page, waits 30 seconds, and then returns the 200 response. But I need it to respond immediately with the 200, THEN process the rest, otherwise I will get duplicates.
Am I using threads properly, or is there another way to get around this Slack API 3-second response limit? I am new to both Rails and the Slack API, so I'm a bit lost here.
Appreciate the eyes :)
I would recommend using ActiveJob to run the code in the background if you don't need to use the result of the code in the response. First, create an ActiveJob job by running:
bin/rails generate job do_stuff
And then open up the file created in app/jobs/do_stuff_job.rb and edit the #perform method to include your code (so the puts statements and sleep 30 in your example). Finally, from the controller action you can call DoStuffJob.perform_later and your job will run in the background! Your final controller action will look something like this:
def events
  DoStuffJob.perform_later # this schedules DoStuffJob to be done later, in
                           # the background, so it will return immediately
                           # and continue to the next line.
  json = {
    "text": "Here is your 200 response immediately slack",
  }
  render(json: json)
end
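For completeness, the generated job might end up looking roughly like this once your example code is dropped into #perform (a sketch, assuming a standard Rails ApplicationJob base class):

# app/jobs/do_stuff_job.rb
class DoStuffJob < ApplicationJob
  queue_as :default

  def perform
    puts "--------------------------------Json response started----------------------"
    sleep 30
    puts "--------------------------------Json response completed----------------------"
    puts "this is a successful response"
  end
end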
As an aside, I'd highly recommend never using Thread.new in Rails. It can create some really confusing behavior, especially in test scripts, for a number of reasons, but usually because of how it interacts with open connections and specifically ActiveRecord.
I followed this tutorial to create an action cable broadcast but it's not quite working as expected. The channel streams and the web app subscribes successfully, but messages broadcasted from the sidekiq background job are only displayed after refreshing the page. Using the same command on the console does result in an immediate update to the page.
When looking at the frames in chrome's developer mode, I cannot see the broadcasted messages from the background job but can immediately see the ones sent by the console. However, I can confirm that the sidekiq background job is broadcasting those messages somewhere since they do show up upon refresh; however, I don't know where they are being queued.
Are there any additional configuration changes needed to keep the messages from the background job from being queued somewhere? Are there any typos or errors in my code that could be causing this?
Action Cable Broadcast message:
ActionCable.server.broadcast "worker_channel", {html:
  "<div class='alert alert-success alert-block text-center'>
    Market data retrieval complete.
  </div>"
}
smart_worker.rb -- this is invoked with perform_async from the controller's action:
class SmartWorker
  include Sidekiq::Worker
  include ApplicationHelper

  sidekiq_options retry: false

  def perform
    ActionCable.server.broadcast "worker_channel", {html:
      "<div class='alert alert-success alert-block text-center'>
        Market data retrieval complete.
      </div>"
    }
  end
end
connection.rb:
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = current_user # find_verified_user ignored until method implemented correctly and does not always return unauthorized
    end

    private

    def find_verified_user
      if current_user = User.find_by(id: cookies.signed[:user_id])
        current_user
      else
        reject_unauthorized_connection
      end
    end
  end
end
worker_channel:
class WorkerChannel < ApplicationCable::Channel
  def subscribed
    stream_from "worker_channel"
  end

  def unsubscribed
  end
end
worker.js:
App.notifications = App.cable.subscriptions.create('WorkerChannel', {
  connected: function() {
    console.log('message connected');
  },
  disconnected: function() {},
  received: function(data) {
    console.log('message received');
    $('#notifications').html(data.html);
  }
});
cable.yml
development:
  adapter: redis
  url: redis://localhost:6379/1

test:
  adapter: async

production:
  adapter: redis
  url: <%= ENV.fetch("REDIS_URL") { "redis://localhost:6379/1" } %>
  channel_prefix: smarthost_production
Also added
to the view but that didn't make a difference.
I'm not sure this is the entire explanation but this is what I have observed through further testing:
After multiple server restarts, the broadcast started working and would log as expected in the development logger. Console messages were still hit or miss, so I added some additional identifiers to the broadcast messages and found that they were being broadcast before the loading of the next page was completed. This caused two things:
1) A quick flashing of flash messages triggered by the broadcast (in what was perceived to be the old page, i.e. it only works after a refresh).
2) A lack of, or inconsistent, behavior in the browser console: because the Sidekiq worker job finishes so quickly, sometimes even before the browser starts rendering the new page, I believe the console messages are being reset by the page-loading actions and are therefore not visible when you check the logs (or even if you stare at it for a while).
It seems as though this is working as expected, and is simply working too quickly in the local environment, which makes it seem as though it's not working as intended.
Action Cable normally does not queue messages; those broadcast when there is no subscriber are simply lost. The behaviour you observed can happen if the notification actually arrives later than you expect.
I'd check:
Run the entire job in the console, not just the notification, and see if it's running slowly.
Check Sidekiq queue latency.
Add logging before/after the notification in the job and check the logs to confirm the job actually runs successfully (sketched below).
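For the last point, the logging could be as simple as this (a sketch based on the SmartWorker from the question):

class SmartWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform
    Sidekiq.logger.info("SmartWorker: about to broadcast to worker_channel")
    ActionCable.server.broadcast "worker_channel", {html:
      "<div class='alert alert-success alert-block text-center'>
        Market data retrieval complete.
      </div>"
    }
    Sidekiq.logger.info("SmartWorker: broadcast sent")
  end
end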
I've used the Heroku tutorial to implement websockets.
It works properly with Thin, but does not work with Unicorn or Puma.
There's also an echo message implemented, which responds to the client's message. It works properly on each server, so there are no problems with the websockets implementation.
Redis setup is also correct (it catches all messages, and executes the code inside subscribe block).
How does it work now:
On server start, an empty @clients array is initialized. Then a new Thread is started, which listens to Redis and is intended to send each message to the corresponding user from the @clients array.
On page load, a new websocket connection is created and stored in the @clients array.
If we receive a message from the browser, we send it back to all clients connected as the same user (that part works properly on both Thin and Puma).
If we receive a message from Redis, we also look up all of the user's connections stored in the @clients array.
This is where the weird thing happens:
If running with Thin, it finds the connections in the @clients array and sends the message to them.
If running with Puma/Unicorn, the @clients array is always empty, even if we try it in this order (without page reloads or anything):
Send message from browser -> @clients.length is 1, message is delivered
Send message via Redis -> @clients.length is 0, message is lost
Send message from browser -> @clients.length is still 1, message is delivered
Could someone please clarify what I am missing?
Related config of Puma server:
workers 1
threads_count = 1
threads threads_count, threads_count
Related middleware code:
require 'faye/websocket'

class NotificationsBackend
  def initialize(app)
    @app = app
    @clients = []
    Thread.new do
      redis_sub = Redis.new
      redis_sub.subscribe(CHANNEL) do |on|
        on.message do |channel, msg|
          # logging @clients.length from here will always return 0
          # [..] parse msg, retrieve user
          send_message(user.id, { message: msg })
        end
      end
    end
  end

  def call(env)
    if Faye::WebSocket.websocket?(env)
      ws = Faye::WebSocket.new(env, nil, { ping: KEEPALIVE_TIME })

      ws.on :open do |event|
        # [..] retrieve current user
        if user
          # add ws connection to @clients array
        else
          # close ws
        end
      end

      ws.on :message do |event|
        # [..] retrieve current user
        Redis.current.publish(CHANNEL, { user_id: user.id, message: "ECHO: #{event.data}" }.to_json)
      end

      ws.rack_response
    else
      @app.call(env)
    end
  end

  def send_message(user_id, message)
    # logging @clients.length here will always return the correct result
    # cs = all connections which belong to that client
    cs.each { |c| c.send(message.to_json) }
  end
end
Unicorn (and apparently Puma) both start up a master process and then fork one or more workers. fork copies your entire process (or at least presents the illusion of copying; an actual copy usually only happens as you write to pages), but only the thread that called fork exists in the new process.
Clearly your app is being initialised before being forked. This is normally done so that workers can start quickly and benefit from copy-on-write memory savings. As a consequence, your Redis-checking thread is only running in the master process, whereas @clients is being modified in the child process.
You can probably work around this by either deferring the creation of your Redis thread or disabling app preloading; however, you should be aware that your setup will prevent you from scaling beyond a single worker process (which, with Puma and a thread-friendly Ruby implementation like JRuby, would be less of a constraint).
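For example, with Puma one way to defer the thread creation is a worker-boot hook in config/puma.rb. This is only a sketch: start_redis_listener is a made-up class-level helper you would have to extract from the middleware's initialize, not an existing method:

# config/puma.rb
workers 1
threads 1, 1
preload_app!

on_worker_boot do
  # Runs inside each forked worker, so the Redis-listening thread
  # (and the @clients it talks to) lives in the process that actually
  # serves the websocket connections.
  NotificationsBackend.start_redis_listener
end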
Just in case somebody faces the same problem, here are two solutions I have come up with:
1. Disable app preloading (this was the first solution I came up with)
Simply remove preload_app! from the puma.rb file. Each worker will then have its own @clients variable, and it will be accessible from the other middleware methods (like call, etc.).
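For reference, that just means a config/puma.rb along these lines (no preload_app! line, so each worker boots the app and the middleware's initialize runs inside the worker itself):

# config/puma.rb
workers 1
threads_count = 1
threads threads_count, threads_count
# preload_app!   <- intentionally omitted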
Drawback: you will lose all benefits of app preloading. It is OK if you have only 1 or 2 workers with a couple of threads, but if you need a lot of them, then it's better to have app preloading. So I continued my research, and here is another solution:
2. Move thread initialization out of initialize method (this is what I use now)
For example, I moved it to the call method, so this is how the middleware class code looks:
attr_accessor :subscriber

def call(env)
  @subscriber ||= Thread.new do # if no subscriber is present, start a new one
    redis_sub = Redis.new(url: ENV['REDISCLOUD_URL'])
    redis_sub.subscribe(CHANNEL) do |on|
      on.message do |_, msg|
        # parsing message code here, retrieve user
        send_message(user.id, { message: msg })
      end
    end
  end
  # other code from the method
end
Both solutions solve the same problem: the Redis-listening thread is initialized once per Puma worker/thread, not in the main process (which does not actually serve requests).
In my Rails app controller I am posting to the API of the app on the same machine. I have built this to handle posting the data to the URL:
url = "http://172.16.155.165:3000/api/jobs"
params = {
  :input  => "original/video.h264",
  :output => "new/video.mp4",
  :preset => 'h264'
}
jobResults = Net::HTTP.post_form(URI.parse(url), params)
This works great when I run the code in the Rails console, but when I use it in my controller it gives me this error after loading for a minute or so:
Timeout::Error in SeminarsController#create
Timeout::Error
Once the timeout happens, the data is actually posted and the API does what it should. It is as if it hangs until it times out and then posts the data. The controller never goes beyond this step, though. It should write the response body to a file with jobResults.body, which would work fine if it didn't time out. If I run this in the Rails console it outputs the response immediately; the API will never take a whole minute to respond.
Am I doing something to cause this to happen? How can I make it work right?
edit:
This is the code for create in app/controllers/api/jobs_controller.rb:
def create
  job = Job.from_api(params, :callback_url => lambda { |job| api_job_url(job) })

  if job.valid?
    response.headers["X-State-Changes-Location"] = api_state_changes_url(job)
    response.headers["X-Notifications-Location"] = api_notifications_url(job)
    respond_with job, :location => api_job_url(job) do |format|
      format.html { redirect_to jobs_path }
    end
  else
    respond_with job do |format|
      format.html { @job = job; render "/jobs/new" }
    end
  end
end
Yes. Ideally you should move the long-running process (and yes, this is a long-running process) into a background job. Remember that when many users start updating videos, this process will slow down for many reasons (bandwidth, API acceptance rate, etc.). Rack::Timeout kicks in whenever a request passes the threshold; it is designed to abort requests that take too long to respond, and it is not raised in the console.
How can I make it work right?
Move it to a background job (see the sketch at the end of this answer). Or you can explicitly increase the Rack::Timeout interval by doing something like this:
# config/initializers/timeout.rb
Rack::Timeout.timeout = 30 # seconds
But I suggest not doing this. rack-timeout helps with debugging; it is mainly used on Heroku together with New Relic.
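If you go the background-job route, the general shape is sketched below (the job name and arguments are hypothetical, and it assumes a Rails version with ActiveJob; with older Rails the same idea applies using Sidekiq or delayed_job directly):

# app/jobs/transcode_job.rb (hypothetical name)
require 'net/http'

class TranscodeJob < ActiveJob::Base
  queue_as :default

  def perform(input, output, preset)
    url = "http://172.16.155.165:3000/api/jobs"
    params = { :input => input, :output => output, :preset => preset }
    job_results = Net::HTTP.post_form(URI.parse(url), params)
    # Persist or log the response here instead of blocking the controller on it.
    Rails.logger.info("API responded with #{job_results.code}")
  end
end

# In the controller, enqueue instead of calling the API inline:
# TranscodeJob.perform_later("original/video.h264", "new/video.mp4", "h264")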