I have a rake task, and when I run it in the console it is killed. The rake task operates on a table of circa 40,000 rows, so I suspect the problem may be running out of memory.
I also believe that the query used here is optimized for dealing with large tables:
MyModel.where(:processed => false).pluck(:attribute_for_analysis).find_each(:batch_size => 100) do |a|
# deal with 40000 rows and only attribute `attribute_for_analysis`.
end
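Note that pluck returns a plain Array, so find_each cannot actually be chained onto it as written. A minimal sketch of a batched variant that loads only the needed column (analyze here is a placeholder for the real per-row work):
MyModel.where(:processed => false)
       .select([:id, :attribute_for_analysis])
       .find_each(:batch_size => 100) do |row|
  analyze(row.attribute_for_analysis)  # placeholder for the real analysis
end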
This task will not be run on a regular basis in the future, so I want to avoid process-monitoring solutions like God, but I am considering background jobs, e.g. a Resque job.
I work with Ubuntu, Ruby 2.0 and Rails 3.2.14.
My free memory is as follows:
             total       used       free     shared    buffers     cached
Mem:       3891076    1901532    1989544          0       1240     368128
-/+ buffers/cache:    1532164    2358912
Swap:      4035580     507108    3528472
QUESTIONS:
How can I investigate why the rake task is always killed? (answered)
How can I make this rake task run to completion? (not answered - still killed)
What is the difference between total-vm, anon-rss and file-rss? (not answered)
UPDATE 1
Can someone explain the difference between the following?
total-vm
anon-rss
file-rss
$ grep "Killed process" /var/log/syslog
Dec 25 13:31:14 Lenovo-G580 kernel: [15692.810010] Killed process 10017 (ruby) total-vm:5605064kB, anon-rss:3126296kB, file-rss:988kB
Dec 25 13:56:44 Lenovo-G580 kernel: [17221.484357] Killed process 10308 (ruby) total-vm:5832176kB, anon-rss:3190528kB, file-rss:1092kB
Dec 25 13:56:44 Lenovo-G580 kernel: [17221.498432] Killed process 10334 (ruby-timer-thr) total-vm:5832176kB, anon-rss:3190536kB, file-rss:1092kB
Dec 25 15:03:50 Lenovo-G580 kernel: [21243.138675] Killed process 11586 (ruby) total-vm:5547856kB, anon-rss:3085052kB, file-rss:1008kB
UPDATE 2
I modified the query like this, and the rake task is still killed:
MyModel.where(:processed => false).find_in_batches do |group|
  p system("free -k")
  group.each do |row|
    # process
  end
end
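For what it's worth, a small instrumentation sketch (a debugging addition, not part of the original task) that prints Ruby's own heap totals per batch, to tell Ruby-level retention apart from growth elsewhere in the process:
MyModel.where(:processed => false).find_in_batches do |group|
  group.each do |row|
    # process the row as before
  end
  GC.start                             # force a full collection after each batch
  p ObjectSpace.count_objects[:TOTAL]  # total heap slots; should plateau if nothing is retained
end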
Related
Using Apache Airflow:
We have created a DAG that runs every day at 07:00 AM: schedule_interval='0 7 * * *'
The task searches for a new row in a certain table. If it sees a new row, it continues to execute more tasks, and so on.
We want the task to run for 19 hours. If it does not find a new row in that table, it will skip the rest of the tasks. The task's timeout is timeout=60 * 60 * 19.
Recently we have found that after 12 hours of running we get an error that causes the task to fail. Because we have a retry configured, the task retries and then runs for the full 19 hours.
So instead of 19 hours, we get a run of 31 hours.
Here is the error:
INFO - Dependencies not met for <TaskInstance: DAG_NAME.check_for_new_file 2021-06-28T07:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
Has anyone experienced this error? If I understand correctly, the task is trying to run again after 12 hours, so it is being changed to a 'failed' state instead of 'running'?
Thanks!
I am running rake test in a Docker container where I need to limit the container's maximum memory. This has been achieved by passing the -m 1800m flag to the docker run command. Unfortunately, the rake process's memory usage seems to keep growing, and it is finally killed mid-way by some sort of OOM killer. I've tried putting the following in my test_helper.rb file...
class ActiveSupport::TestCase
  teardown :perform_gc

  def perform_gc
    puts "==> Before GC: #{ObjectSpace.count_objects}"
    GC.start(full_mark: true)
    puts "==> After GC: #{ObjectSpace.count_objects}"
  end
end
...and am getting this output during my test run.
==> Before GC: {:TOTAL=>1067513, :FREE=>798, :T_OBJECT=>146231, :T_CLASS=>13450, :T_MODULE=>3350, :T_FLOAT=>10, :T_STRING=>448040, :T_REGEXP =>3951, :T_ARRAY=>156744, :T_HASH=>32722, :T_STRUCT=>2162, :T_BIGNUM=>8, :T_FILE=>266, :T_DATA=>113834, :T_MATCH=>20339, :T_COMPLEX=>1, :T_RATIONAL=>59, :T_NODE=>115807, :T_ICLASS=>9741}
==> After GC: {:TOTAL=>1067920, :FREE=>304019, :T_OBJECT=>92774, :T_CLASS=>13431, :T_MODULE=>3350, :T_FLOAT=>10, :T_STRING=>328707, :T_REGEXP=>3751, :T_ARRAY=>107523, :T_HASH=>25206, :T_STRUCT=>2023, :T_BIGNUM=>7, :T_FILE=>11, :T_DATA=>112605, :T_MATCH=>11, :T_COMPLEX=>1, :T_RATIONAL=>59, :T_NODE=>64713, :T_ICLASS=>9719}
... test result of test #1 ....
==> Before GC: {:TOTAL=>1598233, :FREE=>338182, :T_OBJECT=>111209, :T_CLASS=>15057, :T_MODULE=>3481, :T_FLOAT=>10, :T_STRING=>570289, :T_REGEXP=>4836, :T_ARRAY=>219746, :T_HASH=>54358, :T_STRUCT=>12047, :T_BIGNUM=>8, :T_FILE=>12, :T_DATA=>138031, :T_MATCH=>2600, :T_COMPLEX=>1, :T_RATIONAL=>389, :T_NODE=>117993, :T_ICLASS=>9984}
==> After GC: {:TOTAL=>1598233, :FREE=>653201, :T_OBJECT=>103708, :T_CLASS=>14275, :T_MODULE=>3426, :T_FLOAT=>10, :T_STRING=>418825, :T_REGEXP=>3773, :T_ARRAY=>137405, :T_HASH=>39734, :T_STRUCT=>7444, :T_BIGNUM=>7, :T_FILE=>12, :T_DATA=>128923, :T_MATCH=>12, :T_COMPLEX=>1, :T_RATIONAL=>59, :T_NODE=>77590, :T_ICLASS=>9828}
... test result of test #2 ....
==> Before GC: {:TOTAL=>1598233, :FREE=>269630, :T_OBJECT=>114406, :T_CLASS=>14815, :T_MODULE=>3470, :T_FLOAT=>10, :T_STRING=>611637, :T_REGEXP=>4352, :T_ARRAY=>248693, :T_HASH=>58757, :T_STRUCT=>12208, :T_BIGNUM=>8, :T_FILE=>25, :T_DATA=>139671, :T_MATCH=>2288, :T_COMPLEX=>1, :T_RATIONAL=>83, :T_NODE=>108278, :T_ICLASS=>9901}
==> After GC: {:TOTAL=>1598233, :FREE=>635044, :T_OBJECT=>105028, :T_CLASS=>14358, :T_MODULE=>3427, :T_FLOAT=>10, :T_STRING=>429137, :T_REGEXP=>3775, :T_ARRAY=>140654, :T_HASH=>41626, :T_STRUCT=>8085, :T_BIGNUM=>7, :T_FILE=>12, :T_DATA=>129507, :T_MATCH=>15, :T_COMPLEX=>1, :T_RATIONAL=>59, :T_NODE=>77631, :T_ICLASS=>9857}
... test result of test #3 ....
... and so on ....
The value of ObjectSpace.count_objects[:TOTAL] is constantly growing after each test!
1067920 (after GC is run at the end of the 1st test)
5250321 (after GC is run at the end of the 18th test)
8631313 (after GC is run at the end of the last-but-ten test)
8631313 (this number remains the same for the next 10 tests)
8631313 (after GC is run at the end of the last-but-three test)
8631313 (after GC is run at the end of the last-but-two test)
8631313 (after GC is run at the end of the last-but-one test)
8631721 (after GC is run at the end of the last test, after which rake aborts)
I'm also monitoring the process's memory usage via docker stats and ps aux --sort -rss. The memory consumption stabilises around the 1.77 GB mark, which is dangerously close to the 1800 MB limit set for the container. This is also consistent with the value of ObjectSpace.count_objects[:TOTAL] not changing for the last 10-15 tests before rake is killed/aborted.
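As a complement to docker stats and ps aux, a small in-process check (a sketch; it assumes a Linux environment where /proc is available, and report_rss is a hypothetical helper one could call from the perform_gc teardown) is to read the resident set size the kernel reports for the test process:
# Hypothetical helper (not in the original test_helper.rb): read the resident
# set size the kernel reports for this process, in kB.
def report_rss
  rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s*(\d+)/, 1]
  puts "==> RSS: #{rss_kb} kB"
end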
The error message in docker logs at the time of crash/abort/kill is:
PANIC: could not write to log file 00000001000000000000000A at offset 14819328, length 16384: Cannot allocate memory
LOG: unexpected EOF on client connection with an open transaction
LOG: WAL writer process (PID 262) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2018-02-26 06:35:38 UTC
How do I get to the bottom of this and ensure that my tests run in constant memory?
I've been trying to limit the number of workers per queue using the sidekiq-limit_fetch gem. Sidekiq seems to "see" the imposed limits in the log, but when I watch the workers the limits are ignored.
Here's the part from the log where Sidekiq sees the limits:
2013-04-02T05:47:19Z 748 TID-11ilcw DEBUG: {:queues=>
["recommendvariations",
"recommendvariations",
"recommendvariations",
"recommendphenotypes",
"recommendphenotypes",
"recommendphenotypes",
"preparse",
"preparse",
"preparse",
"parse",
"parse",
"parse",
"zipgenotyping",
"zipgenotyping",
"zipfulldata",
"deletegenotype",
"fitbit",
"frequency",
"genomegov",
"mailnewgenotype",
"mendeley_details",
"mendeley",
"pgp",
"plos_details",
"plos",
"snpedia",
"fixphenotypes"],
:concurrency=>5,
:require=>".",
:environment=>"production",
:timeout=>8,
:profile=>false,
:verbose=>true,
:pidfile=>"/tmp/sidekiq.pid",
:logfile=>"./log/sidekiq.log",
:limits=>
{"recommendvariations"=>1,
"recommendphenotypes"=>1,
"preparse"=>2,
"parse"=>2,
"zipgenotyping"=>1,
"zipfulldata"=>1,
"fitbit"=>3,
"frequency"=>10,
"genomegov"=>1,
"mailnewgenotype"=>1,
"mendeley_details"=>1,
"mendeley"=>1,
"pgp"=>1,
"plos_details"=>1,
"plos"=>1,
"snpedia"=>1,
"fixphenotypes"=>1},
:strict=>false,
:config_file=>"config/sidekiq.yml",
:tag=>"snpr"}
and here's the sidekiq.yml. Judging from the Sidekiq web interface, the limits are ignored: right now I have 2 workers on the "recommendvariations" queue, but that should be 1.
I start the workers with bundle exec sidekiq -e production -C config/sidekiq.yml.
Has anyone else ever encountered this?
Did you try to set the limit in a sidekiq.rb initializer file?
Like this:
Sidekiq::Queue['recommend'].limit = 1
It worked for me.
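For the queues in the question, a config/initializers/sidekiq.rb along these lines (a sketch mirroring the limits already shown in the log, and assuming the sidekiq-limit_fetch gem is in the Gemfile) would set the same limits from Ruby:
# config/initializers/sidekiq.rb (sketch; limits copied from the question's config)
Sidekiq::Queue['recommendvariations'].limit = 1
Sidekiq::Queue['recommendphenotypes'].limit = 1
Sidekiq::Queue['preparse'].limit = 2
Sidekiq::Queue['parse'].limit = 2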
I'm used to delayed_job's approach of going into the console to see what's in the queue, and the ease of clearing the queue when needed. Are there similar commands in Sidekiq for this? Thanks!
There is an ergonomic API for viewing and managing queues.
It is not loaded by default, so require it first:
require 'sidekiq/api'
Here's an excerpt:
# get a handle to the default queue
default_queue = Sidekiq::Queue.new
# get a handle to the mailer queue
mailer_queue = Sidekiq::Queue.new("mailer")
# How many jobs are in the default queue?
default_queue.size # => 1001
# How many jobs are in the mailer queue?
mailer_queue.size # => 50
#Deletes all Jobs in a Queue, by removing the queue.
default_queue.clear
You can also get some summary statistics.
stats = Sidekiq::Stats.new
# Get the number of jobs that have been processed.
stats.processed # => 100
# Get the number of jobs that have failed.
stats.failed # => 3
# Get the queues with name and number enqueued.
stats.queues # => { "default" => 1001, "email" => 50 }
#Gets the number of jobs enqueued in all queues (does NOT include retries and scheduled jobs).
stats.enqueued # => 1051
I haven't ever used Sidekiq, so it's possible that there are methods just for viewing the queued jobs, but they would really just be wrappers around Redis commands, since that's basically all Sidekiq (and Resque) is:
# See workers
Sidekiq::Client.registered_workers
# See queues
Sidekiq::Client.registered_queues
# See all jobs for one queue
Sidekiq.redis { |r| r.lrange "queue:app_queue", 0, -1 }
# See all jobs in all queues
Sidekiq::Client.registered_queues.each do |q|
  Sidekiq.redis { |r| r.lrange "queue:#{q}", 0, -1 }
end
# Remove a queue and all of its jobs
Sidekiq.redis do |r|
  r.srem "queues", "app_queue"
  r.del "queue:app_queue"
end
Unfortunately, removing a specific job is a little more difficult as you'd have to copy its exact value:
# Remove a specific job from a queue
Sidekiq.redis { |r| r.lrem "queue:app_queue", -1, "the payload string stored in Redis" }
You could do all of this even more easily via redis-cli:
$ redis-cli
> select 0 # (or whichever namespace Sidekiq is using)
> keys * # (just to get an idea of what you're working with)
> smembers queues
> lrange queue:app_queue 0 -1
> lrem queue:app_queue -1 "payload"
If there are any scheduled jobs, you can delete them all using the following command:
Sidekiq::ScheduledSet.new.clear
If you want to delete all jobs in a queue, you can use the following command (with no argument it clears the default queue):
Sidekiq::Queue.new.clear
Jobs in the retry set can also be removed with the following command:
Sidekiq::RetrySet.new.clear
There is more information at the following link, which you can check out:
https://github.com/mperham/sidekiq/wiki/API
There is an API for accessing real-time information about workers, queues and jobs.
See https://github.com/mperham/sidekiq/wiki/API
A workaround is to use the testing module (require 'sidekiq/testing') and to drain the worker (MyWorker.drain).
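For context, a minimal sketch of that workaround (MyWorker stands in for a real worker class):
require 'sidekiq/testing'
Sidekiq::Testing.fake!       # jobs are pushed to an in-memory array instead of Redis
MyWorker.perform_async(42)   # enqueue as usual
MyWorker.drain               # run and clear every queued MyWorker job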
There were hung 'workers' in the default queue, and I was able to see them through the web interface, but they were not visible from the console when I used Sidekiq::Queue.new.size:
irb(main):002:0> Sidekiq::Queue.new.size
2014-03-04T14:37:43Z 17256 TID-oujb9c974 INFO: Sidekiq client with redis options {:namespace=>"sidekiq_staging"}
=> 0
Using redis-cli I was able to find them
redis 127.0.0.1:6379> keys *
1) "sidekiq_staging:worker:ip-xxx-xxx-xxx-xxx:7635c39a29d7b255b564970bea51c026-69853672483440:default"
2) "sidekiq_staging:worker:ip-xxx-xxx-xxx-xxx:0cf585f5e93e1850eee1ae4613a08e45-70328697677500:default:started"
3) "sidekiq_staging:worker:ip-xxx-xxx-xxx-xxx:7635c39a29d7b255b564970bea51c026-69853672320140:default:started"
...
The solution was:
irb(main):003:0> Sidekiq.redis { |r| r.del "workers", 0, -1 }
=> 1
Also, in Sidekiq v3 there is a command:
Sidekiq::Workers.new.prune
But for some reason it didn't work for me that day
And if you want to clear the sidekiq retry queue, it's this: Sidekiq::RetrySet.new.clear
$ redis-cli
> select 0 # (or whichever namespace Sidekiq is using)
> keys * # (just to get an idea of what you're working with)
> smembers queues
> lrange queue:queue_name 0 -1 # (queue_name must be your relevant queue)
> lrem queue:queue_name -1 "payload"
A rake task to clear all Sidekiq queues:
namespace :sidekiq do
  desc 'Clear sidekiq queue'
  task clear: :environment do
    require 'sidekiq/api'
    Sidekiq::Queue.all.each(&:clear)
  end
end
Usage:
rake sidekiq:clear
This is not a direct solution for the Rails console, but for quick monitoring of the Sidekiq task count and queue sizes you can use the sidekiqmon binary that ships with Sidekiq 6+:
$ sidekiqmon
Sidekiq 6.4.2
2022-07-25 11:05:56 UTC
---- Overview ----
Processed: 20,313,347
Failed: 57,120
Busy: 9
Enqueued: 17
Retries: 0
Scheduled: 37
Dead: 2,382
---- Processes (1) ----
36f993209f93:15:a498f85c6a12 [server]
Started: 2022-07-25 10:49:43 +0000 (16 minutes ago)
Threads: 10 (9 busy)
Queues: default, elasticsearch, statistics
---- Queues (3) ----
NAME SIZE LATENCY
default 0 0.00
elasticsearch 17 0.74
statistics 0 0.00
Using: ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]
I have a Ruby batch job that has hung at about the same time two nights in a row.
The weird thing is that when I do a kill -QUIT on the process, it frees up and continues processing.
Here is the stack when I send the SIGQUIT:
Wed Mar 23 2011 11:07:55 SignalException: SIGQUIT: SELECT * FROM `influencers` WHERE (`influencers`.`external_id` = 199884972) LIMIT 1
Wed Mar 23 2011 11:07:55
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract_adapter.rb:219:in `log'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/mysql_adapter.rb:323:in `execute'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/mysql_adapter.rb:608:in `select'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all_without_query_cache'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/query_cache.rb:62:in `select_all'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:661:in `find_by_sql'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:1548:in `find_every'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:1505:in `find_initial'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:613:in `find'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:1900:in `find_by_twitter_id'
/u/apps/myapp/releases/20110323011051/app/models/influencer.rb:148:in `add_follower'
/u/apps/myapp/releases/20110323011051/app/models/influencer.rb:93:in `sync_follower_list'
/u/apps/myapp/releases/20110323011051/app/models/influencer.rb:91:in `each'
/u/apps/myapp/releases/20110323011051/app/models/influencer.rb:91:in `sync_follower_list'
/u/apps/myapp/releases/20110323011051/lib/twitter_helper.rb:379:in `retrieve_followers_of_competitors'
/u/apps/myapp/releases/20110323011051/lib/twitter_helper.rb:372:in `each'
/u/apps/myapp/releases/20110323011051/lib/twitter_helper.rb:372:in `retrieve_followers_of_competitors'
/u/apps/myapp/releases/20110323011051/vendor/gems/will_paginate-2.3.11/lib/will_paginate/finder.rb:168:in `method_missing'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_collection.rb:369:in `method_missing_without_paginate'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:215:in `method_missing'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:215:in `each'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:215:in `send'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:215:in `method_missing'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_collection.rb:369:in `method_missing_without_paginate'
/u/apps/myapp/releases/20110323011051/vendor/gems/will_paginate-2.3.11/lib/will_paginate/finder.rb:168:in `method_missing'
/u/apps/myapp/releases/20110323011051/lib/twitter_helper.rb:370:in `retrieve_followers_of_competitors'
/u/apps/myapp/releases/20110323011051/lib/twitter_helper.rb:42:in `retrieve_twitter_data'
/u/apps/myapp/releases/20110323011051/lib/tasks/fetch_data.rake:19:in `fetch_data'
/u/apps/myapp/releases/20110323011051/lib/tasks/fetch_data.rake:11:in `each'
/u/apps/myapp/releases/20110323011051/lib/tasks/fetch_data.rake:11:in `fetch_data'
/u/apps/myapp/releases/20110323011051/lib/tasks/fetch_data.rake:128
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `call'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `execute'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `each'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `execute'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:597:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:590:in `invoke_with_call_chain'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:583:in `invoke'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2051:in `invoke_task'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `each'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2023:in `top_level'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2001:in `run'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:1998:in `run'
/usr/lib/ruby/gems/1.8/gems/rake-0.8.7/bin/rake:31
/usr/bin/rake:19:in `load'
/usr/bin/rake:19
Looking at the code, I suspect I'm getting some kind of deadlock on the Rails logger. Any suggestions on how to troubleshoot? Maybe it has something to do with Rails logger rolling. I'm not sure why this would start in the last couple of days...
I'm not sure if you use logrotate on your logs, but it could be that the logs get rotated and Rails.logger can't write anything any more. Doing a kill -QUIT would re-open the log file and continue processing.
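If logrotate does turn out to be the culprit, one common mitigation (a sketch; the path is hypothetical, loosely based on the deploy layout in the backtrace) is to rotate with copytruncate so the running process keeps writing to the same file:
# Hypothetical /etc/logrotate.d entry; the log path is an assumption.
/u/apps/myapp/shared/log/*.log {
  daily
  rotate 14
  compress
  # copytruncate truncates the original file in place instead of moving it,
  # so the long-running process keeps a valid handle to the same file
  copytruncate
}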