Erlang: check on which scheduler a process is running?

I could use erlang:trace/3 to keep track of which scheduler each process is running on at any given time and then use the timestamp+pid to get the scheduler ID, but is there a simpler/more efficient way?
Perhaps something like self/0, but returning the scheduler ID instead of the process ID.

You may be looking for erlang:system_info(scheduler_id).
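A minimal sketch of calling it from a few processes (run from the Erlang shell; the output format is illustrative, and note the reported scheduler can change if the VM migrates the process between calls):

```erlang
%% Each spawned process prints its pid and the scheduler it is
%% currently running on.
[spawn(fun() ->
           io:format("~p is on scheduler ~p~n",
                     [self(), erlang:system_info(scheduler_id)])
       end) || _ <- lists:seq(1, 4)].
```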

Related

Is it possible to handle task shutdown from inside an Airflow task?

We use Neo4j and call its queries from Airflow tasks. The issue is that these queries often don't stop when the task is marked as "Failed" or "Completed" in the Airflow GUI. So I would like to find a way to issue a kill query from inside the currently running task when the task is marked as "Failed" or "Completed".
In Airflow, the query is executed using the session.run(query) method from GraphDatabase.driver, where GraphDatabase is part of the neo4j Python library.
Is there any straightforward way to do this?
BaseOperator has an "on_kill" method that you can override: https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/baseoperator/index.html#airflow.models.baseoperator.BaseOperator.on_kill
It's likely that the operator you use (Neo4j) does not implement it properly, but you can always create a custom operator with a proper on_kill implementation and possibly contribute it back as a PR.
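A hedged sketch of the on_kill pattern. In a real DAG you would subclass airflow.models.baseoperator.BaseOperator and open the session via neo4j.GraphDatabase.driver(...).session(); the stand-in classes below exist only to keep the snippet self-contained, and the operator name is illustrative:

```python
class BaseOperator:                  # stand-in for Airflow's BaseOperator
    def __init__(self, **kwargs):
        pass

class FakeSession:                   # stand-in for a neo4j driver session
    def __init__(self):
        self.closed = False
    def run(self, query):
        return f"ran: {query}"
    def close(self):
        self.closed = True

class Neo4jQueryOperator(BaseOperator):
    """Runs a Cypher query; on_kill closes the session so the
    server-side query is aborted when Airflow stops the task."""
    def __init__(self, query, **kwargs):
        super().__init__(**kwargs)
        self.query = query
        self._session = None

    def execute(self, context):
        # real code: self._session = driver.session()
        self._session = FakeSession()
        return self._session.run(self.query)

    def on_kill(self):
        # Airflow calls this when the task is externally marked
        # failed/stopped; closing the session cancels the query.
        if self._session is not None:
            self._session.close()

op = Neo4jQueryOperator(query="MATCH (n) RETURN count(n)")
op.execute(context={})
op.on_kill()
print(op._session.closed)  # => True
```

The key design point is keeping a handle to the live session on the operator instance so on_kill, which runs in response to the external state change, can reach it.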
If you know the query ID, you could try running the following query:
CALL dbms.killQuery(queryId)
See https://neo4j.com/docs/operations-manual/current/monitoring/query-management/ . This page also shows you how to list running queries and transactions.

A way to turn off Dask

I'm a new user of Dask. I have some code in which I use Dask for parallelization. Is there some easy way, like a flag, to run the code with Dask off, that is, serially?
See the docs
You can set the scheduler to be the single-threaded, in-process one, known as "sync" (or "single-threaded"), within a context block:
with dask.config.set(scheduler='sync'):
    # do stuff
or globally, until changed again:
dask.config.set(scheduler='sync')
# do stuff
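A small sketch of the same computation run with the default scheduler and then forced serial for debugging (assumes dask is installed; the inc/sum graph is just an illustration):

```python
import dask

@dask.delayed
def inc(x):
    return x + 1

# Build a tiny task graph: sum of inc(0)..inc(4).
total = dask.delayed(sum)([inc(i) for i in range(5)])

# Default scheduler (parallel).
result_parallel = total.compute()

# Single-threaded, in-process: easier to step through with a debugger.
with dask.config.set(scheduler='sync'):
    result_serial = total.compute()

assert result_parallel == result_serial == 15
```

The sync scheduler is particularly useful because exceptions and pdb breakpoints fire in the main thread instead of inside worker threads or processes.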

How can I get notified when Sidekiq Enqueues jobs or has stopped processing jobs?

I am on Heroku and I got an error because my Redis DB got too full. Then my Sidekiq processes stopped working. It was like that for a day until I realized it. Now I have 600+ jobs that I have tried to process, but they are just breaking everything now. How can I sound the alarm when Sidekiq can't process jobs or when the queue starts to fill up?
You could set a rake task on a schedule to check Sidekiq stats, and then take the appropriate action (like sending an email).
I've created my own module with helper methods for Sidekiq that serves a number of purposes, e.g. deleting jobs, checking queues, retrieving jobs by certain criteria, etc.: https://gist.github.com/blotto/10324119
For your purpose, grab the Sidekiq stats as such:
def sidekiq_stats
  stats = Sidekiq::Stats.new
  { processed: stats.processed,
    failed:    stats.failed,
    enqueued:  stats.enqueued,
    queues:    stats.queues }
end
And then evaluate the enqueued value, set a tolerance on what you think is too high, and then let loose the hounds.
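A sketch of the threshold check that rake task might run. The limit and method names are illustrative; in the real task the hash would come from Sidekiq::Stats.new (as above), and it is passed in here so the alarm logic itself is plain Ruby:

```ruby
# Tolerance for enqueued jobs: tune this to your workload.
ENQUEUED_LIMIT = 500

# Returns true when the backlog is past the tolerance.
def queue_alarm?(stats, limit = ENQUEUED_LIMIT)
  stats[:enqueued] > limit
end

puts queue_alarm?({ enqueued: 600 })  # => true
puts queue_alarm?({ enqueued: 10 })   # => false
```

When queue_alarm? returns true, the scheduled task would send the email (or page, Slack message, etc.).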
If you're using Zabbix for monitoring, you could use the sidekiq_queue_zabbix template at https://github.com/hungntit/sidekiq_queue_zabbix. This template supports showing graphs and sending an alert when the Sidekiq queue size exceeds a specified limit.

How to see the stats about workers in Sidekiq?

We can get some information about Sidekiq's status via its API, but there's no API method for workers.
I want to know how many workers are running a certain class. For example, I have a class named FooStreamer.rb and it's performed with the perform_async method. I want to know how many workers are running it at the current time.
Any ideas?
Solved by Mike (creator of sidekiq) with the following commit:
https://github.com/mperham/sidekiq/commit/c606dd4fde8cdc795d2c750d211a74bf1b380217
Sidekiq comes with a Sinatra web interface which you can access through mydomain.com/sidekiq. You just need to mount it as per these instructions (it differs depending on whether you use Passenger or Unicorn):
https://github.com/mperham/sidekiq/wiki/Monitoring
There's no API that I know of, but you can easily iterate through the Redis keys that store the Sidekiq information to count the number of workers working on a particular queue:
workers = redis.smembers("workers")
workers.each do |worker|
  tokens = worker.split(":")
  machine = tokens[0]
  pid = tokens[1].split("-")[0]
  key = "worker:" + pid
  obj = redis.get(key)
  # obj will contain information on what queue this worker is processing
end
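A sketch of the counting step once you have the per-worker data. In real use the entries would come from iterating Sidekiq's worker records (or, after the commit linked above, from Sidekiq's workers API), where each work record carries a "payload" hash with a "class" field; here the entries are passed in so the logic is plain Ruby:

```ruby
# Count how many busy workers are running the given job class.
def busy_count_for(klass, works)
  works.count { |work| work.dig("payload", "class") == klass }
end

# Illustrative data in the shape of Sidekiq work payloads.
works = [
  { "payload" => { "class" => "FooStreamer" } },
  { "payload" => { "class" => "OtherJob" } },
  { "payload" => { "class" => "FooStreamer" } },
]
puts busy_count_for("FooStreamer", works)  # => 2
```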
You can use https://github.com/phstc/sidekiq-statsd and check the stats using Graphite.

Default queue for new delayed jobs using delayed_job_3?

I am using multiple heroku servers that share the same DB. I would like to have each server only process delayed jobs for the server that created the delayed job entry.
For example:
Server A only processes queue "server_a"
Server B only processes queue "server_b"
etc...
This is possible with Delayed Job 3 (https://github.com/collectiveidea/delayed_job).
However, for this to work I would need to manually assign a queue name to each delayed job created, which can be a pain (for example: object.delay(:queue => 'tracking').method).
Instead, I would like to be able to assign a "default queue" for all new jobs. Ideally, I'd put something like this in a delayed_job_config.rb and it would work:
DEFAULT_QUEUE_NAME = ENV['APP_NAME']
... the idea being that I do nothing to existing delayed jobs & they automatically get assigned a queue with the same name as the app server.
I am looking for suggestions on how to accomplish this -- or if you want to give it a stab, throw some code my way.
Thanks in advance!
In config/initializers/delayed_job.rb:
Delayed::Worker.default_queue_name = `hostname`.chomp
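A variant closer to what the question asked for, preferring an environment variable (APP_NAME is the questioner's own name for it, not a Delayed Job setting) and falling back to the hostname:

```ruby
# config/initializers/delayed_job.rb
# Queue new jobs under the app's name when set, otherwise the hostname.
require 'socket'

Delayed::Worker.default_queue_name = ENV['APP_NAME'] || Socket.gethostname
```

Each server's worker then only works its own queue, e.g. started with QUEUES=$APP_NAME rake jobs:work.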
