I am struggling to get Heroku to use Sidekiq as the job handler in production.
Locally, Sidekiq appears to be taking the email jobs, as the following excerpt from its stdout shows:
2018-09-15T05:10:05.164Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-156e566ba5bc0136ef18ee45 INFO: start
2018-09-15T05:10:06.838Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-156e566ba5bc0136ef18ee45 INFO: done: 1.674 sec
2018-09-15T05:10:06.839Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-c97af1298abad5b8caebfcd4 INFO: start
2018-09-15T05:10:06.935Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-c97af1298abad5b8caebfcd4 INFO: done: 0.096 sec
2018-09-15T05:10:06.937Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-0cca4280f85cd9f41325a2af INFO: start
2018-09-15T05:10:07.035Z 5256 TID-ovryxfcwc ActionMailer::DeliveryJob JID-0cca4280f85cd9f41325a2af INFO: done: 0.098 sec
However, in production on Heroku, the logs show that all mail tasks are performed by the web dyno. I would expect the worker dyno to process the ActionMailer jobs, but they never appear to reach Sidekiq.
Under config/initializers/active_job.rb I have:
Rails.application.config.active_job.queue_adapter = :sidekiq
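With that adapter configured, any mail sent with deliver_later should be wrapped in an ActionMailer::DeliveryJob and pushed to Redis for Sidekiq to pick up. A minimal sketch (the mailer and method names are placeholders, not from my app):

# Hypothetical mailer call: deliver_later enqueues via Active Job instead of sending inline
UserMailer.welcome_email(user).deliver_later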
Where am I going wrong in getting my application to work in production with Sidekiq the way it does locally?
It was me being out of practice: of course the worker is running Sidekiq, and it has to call the web worker to actually do the job.
I confirmed this by enabling the Sidekiq route and checking the stats to verify the jobs were run.
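For anyone wanting to run the same check, the dashboard route can be mounted roughly like this (a minimal sketch; in a real production app you would restrict access to it):

# config/routes.rb
require 'sidekiq/web'

Rails.application.routes.draw do
  mount Sidekiq::Web => '/sidekiq'  # exposes the queue/stats dashboard used above
end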
Our Environment:
Jenkins version: Jenkins 2.319.1
Jenkins master image: jenkins/jenkins:2.319.1-lts-alpine
Jenkins worker image: jenkins/inbound-agent:4.11-1-alpine
Installed plugins:
Kubernetes - 1.30.6
Kubernetes Client API - 5.4.1
Kubernetes Credentials Plugin - 0.9.0
Java version on master: openjdk 11.0.13
Java version on agent/worker: openjdk 11.0.14
Hi team,
We are facing an issue in Jenkins where an agent disconnects (or goes offline) from the master while a job is still running on it. We are getting the error below, and have tried the steps listed further down, but the issue is still not fully resolved. Jenkins is deployed on EKS.
Error:
2022-11-02 14:07:54.573+0000 [id=140290] INFO hudson.slaves.NodeProvisioner#update: worker-7j4x4 provisioning successfully completed. We have now 2 computer(s)
2022-11-02 14:07:54.675+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes done-jenkins/worker-7j4x4
2022-11-02 14:07:56.619+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes done-jenkins/worker-7j4x4
2022-11-02 14:07:58.650+0000 [id=140309] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #97 from /100.122.254.111:42648
2022-11-02 14:09:19.733+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
2022-11-02 14:09:19.733+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2608, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 1 nodes assigned to this Jenkins instance, which we will check
2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
2022-11-02 14:09:19.734+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
groovy.lang.MissingPropertyException: No such property: envVar for class: groovy.lang.Binding
    at groovy.lang.Binding.getVariable(Binding.java:63)
    at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:271)
...
2022-11-02 15:09:19.733+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2620, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 3 nodes assigned to this Jenkins instance, which we will check
2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
2022-11-02 15:09:19.734+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
2022-11-02 15:11:59.502+0000 [id=140320] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-254-111.eu-central-1.compute.internal/100.122.254.111:42648.
java.util.concurrent.TimeoutException: Ping started at 1667401679501 hasn't completed by 1667401919502
    at hudson.remoting.PingThread.ping(PingThread.java:134)
    at hudson.remoting.PingThread.run(PingThread.java:90)
2022-11-02 15:11:59.503+0000 [id=141914] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5049 for worker-7j4x4 terminated: java.nio.channels.ClosedChannelException
2022-11-02 15:12:35.579+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started Periodic background build discarder
2022-11-02 15:12:36.257+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished Periodic background build discarder. 678 ms
2022-11-02 15:14:15.582+0000 [id=141422] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-237-38.eu-central-1.compute.internal/100.122.237.38:55038.
java.util.concurrent.TimeoutException: Ping started at 1667401815582 hasn't completed by 1667402055582
    at hudson.remoting.PingThread.ping(PingThread.java:134)
    at hudson.remoting.PingThread.run(PingThread.java:90)
2022-11-02 15:14:15.584+0000 [id=141915] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5050 for worker-fjf1p terminated: java.nio.channels.ClosedChannelException
2022-11-02 15:14:19.733+0000 [id=141950] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
2022-11-02 15:14:19.733+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
2022-11-02 15:14:19.734+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2621, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
Any suggestions or resolutions, please?
Things we have tried:
Increased idleMinutes to 180 from the default
Verified that resources are sufficient, per the Grafana dashboard
Changed podRetention from Never to onFailure
Changed podRetention from Never to Always
Increased readTimeout
Increased connectTimeout
Increased slaveConnectTimeoutStr
Disabled the ping thread from the UI by unchecking the "Response Time" checkbox under preventive node monitoring
Increased activeDeadlineSeconds
Verified that master and agent run the same Java version
Updated the Kubernetes and Kubernetes Client API plugins
The expectation is that the worker/agent should disconnect once a job has run successfully, and terminate after the configured idleMinutes; but a few times it is terminating while a job is still running on the agent.
As you can see in the stack trace below, Reminders::FindStaleJobsJob is causing a problem because of the uninitialized constant Reminders. What I don't get is that I don't call Reminders::FindStaleJobsJob anywhere; rather, I call Recaps::FindStaleJobsJob.
I have flushed out the Sidekiq queue and still get this error repeatedly.
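(In case it matters: clearing can be done from a Rails console roughly like this; a sketch assuming only the default queue is in use.)

# Sketch: wipe everything Sidekiq is holding (default queue assumed)
Sidekiq::Queue.new('default').clear  # pending jobs
Sidekiq::RetrySet.new.clear          # jobs scheduled for retry
Sidekiq::ScheduledSet.new.clear      # future-scheduled jobs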
2018-09-25T17:45:14.539Z 12784 TID-oxxicof3s INFO: Running in ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin17]
2018-09-25T17:45:14.539Z 12784 TID-oxxicof3s INFO: See LICENSE and the LGPL-3.0 for licensing details.
2018-09-25T17:45:14.539Z 12784 TID-oxxicof3s INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
2018-09-25T17:45:14.541Z 12784 TID-oxxicof3s INFO: Starting processing, hit Ctrl-C to stop
2018-09-25T18:00:05.107Z 12784 TID-oxxi975os Recaps::FindStaleJobsJob JID-ec113586e3f8fe72eb3ca479 INFO: start
2018-09-25T18:00:05.135Z 12784 TID-oxxim1crg ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-4bc5f87567ca3f019b2015e4 INFO: start
2018-09-25T18:00:05.136Z 12784 TID-oxxi970ss Recaps::FindStaleJobsJob JID-3125783fd5da7604b95bb813 INFO: start
2018-09-25T18:00:05.155Z 12784 TID-oxxim1crg ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-4bc5f87567ca3f019b2015e4 INFO: fail: 0.02 sec
2018-09-25T18:00:05.155Z 12784 TID-oxxim1crg WARN: {"context":"Job raised exception","job":{"class":"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper","queue":"default","description":"","args":[{"job_class":"Reminders::FindStaleJobsJob","job_id":"d6161fcf-2abd-4e2b-8946-73668a78282f","queue_name":"default","arguments":[]}],"retry":true,"jid":"4bc5f87567ca3f019b2015e4","created_at":1537898405.1336598,"enqueued_at":1537898405.133705},"jobstr":"{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"queue\":\"default\",\"description\":\"\",\"args\":[{\"job_class\":\"Reminders::FindStaleJobsJob\",\"job_id\":\"d6161fcf-2abd-4e2b-8946-73668a78282f\",\"queue_name\":\"default\",\"arguments\":[]}],\"retry\":true,\"jid\":\"4bc5f87567ca3f019b2015e4\",\"created_at\":1537898405.1336598,\"enqueued_at\":1537898405.133705}"}
2018-09-25T18:00:05.155Z 12784 TID-oxxim1crg WARN: NameError: uninitialized constant Reminders
My Sidekiq cron initializer:
# /config/initializers/sidekiq_cron_scheduler.rb
jobs_hash = {
  'recap' => {
    'class' => 'Recaps::FindStaleJobsJob',
    'cron' => '0, 15, 30, 45 * * * *',
    'active_job' => true
  }
}
Sidekiq::Cron::Job.load_from_hash jobs_hash
Am I doing something silly and obvious?
Something was hung up somewhere. I removed the Recaps module from the sidekiq-cron initializer and let it fail on that. Then I reintroduced the module name and, with a few redis-cli flushall commands sprinkled here and there, everything seems to be working fine.
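For anyone hitting the same thing: sidekiq-cron persists its schedule in Redis, so a stale entry pointing at the old class name can outlive the code change, which is presumably why the flushes helped. A more surgical cleanup might look like this (a sketch; 'recap' is the job name from the initializer above):

# Run in a Rails console
Sidekiq::Cron::Job.all.each { |job| puts "#{job.name}: #{job.klass}" }  # list registered cron jobs
Sidekiq::Cron::Job.destroy('recap')  # remove a single stale entry by name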
I set up the release plugin on my Grails project and successfully ran it on my localhost.
When I try to set up the same build in Jenkins, the build hangs indefinitely. The last thing in the output before it hangs is the checkCommitNeeded step.
Anything I can do to figure out what's going wrong?
I have set -Prelease.useAutomaticVersion=true and the two version params in switches, as mentioned in the plugin docs.
Update
On the researchgate Gitter, Christian Gonzalez mentioned that Jenkins is detecting another commit caused by the release plugin, and getting itself stuck in a loop. For Git, an additional behavior can be added to ignore changes committed by the plugin. However, my project is using SVN.
Update
Below is a snippet of the output from adding -d
11:12:48.907 [DEBUG] [org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter] Executing actions for task ':checkCommitNeeded'.
11:12:48.908 [INFO] [org.gradle.api.Project] Running [svn, status] in [/var/lib/jenkins/jobs/MyTeam/jobs/MyProject/jobs/MyProject-release/workspace]
11:12:48.924 [INFO] [org.gradle.api.Project] Running [svn, status] produced output: []
11:12:48.926 [DEBUG] [org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter] Finished executing task ':checkCommitNeeded'
11:12:48.926 [INFO] [org.gradle.execution.taskgraph.AbstractTaskPlanExecutor] :checkCommitNeeded (Thread[Daemon worker,5,main]) completed. Took 0.02 secs.
11:12:48.926 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry] Worker root.3 completed (0 in use)
11:12:48.926 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationWorkerRegistry] Worker root.4 started (1 in use).
11:12:48.926 [INFO] [org.gradle.execution.taskgraph.AbstractTaskPlanExecutor] :checkUpdateNeeded (Thread[Daemon worker,5,main]) started.
11:12:48.927 [LIFECYCLE] [class org.gradle.internal.buildevents.TaskExecutionLogger] :myproject:checkUpdateNeeded
11:12:48.927 [DEBUG] [org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter] Starting to execute task ':checkUpdateNeeded'
11:12:48.927 [DEBUG] [org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter] Determining if task ':checkUpdateNeeded' is up-to-date
11:12:48.927 [INFO] [org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter] Executing task ':checkUpdateNeeded' (up-to-date check took 0.0 secs) due to:
Task has not declared any outputs.
11:12:48.927 [DEBUG] [org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter] Executing actions for task ':checkUpdateNeeded'.
11:12:48.928 [INFO] [org.gradle.api.Project] Running [svn, status, -q, -u] in [/var/lib/jenkins/jobs/MyTeam/jobs/MyProject/jobs/MyProject-release/workspace]
11:12:51.477 [DEBUG] [org.gradle.launcher.daemon.server.Daemon] DaemonExpirationPeriodicCheck running
11:12:51.479 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
11:12:51.480 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired.
11:12:51.481 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
11:13:01.477 [DEBUG] [org.gradle.launcher.daemon.server.Daemon] DaemonExpirationPeriodicCheck running
11:13:01.477 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
11:13:01.478 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired.
11:13:01.480 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
11:13:11.477 [DEBUG] [org.gradle.launcher.daemon.server.Daemon] DaemonExpirationPeriodicCheck running
11:13:11.477 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
11:13:11.477 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired.
11:13:11.479 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
...
The last 4 lines are repeated over and over.
I faced the same issue. For me, the reason was a wrong setup configuration for the project: for example, a wrong GitHub URL (missing the .git extension), an incorrect Poll SCM config, etc.
The fix for me was to restart the Jenkins server, correct those settings under 'Manage' for the project, and build again.
When I look at /sidekiq, I can see in the section called Retries that there's a job that failed and has not been processed.
Queue: default
Job: Order.order_report_for_manufacturers
Arguments: (none)
JID: d25956cdd486335ecaf8a186
Created At: about 5 hours ago
Enqueued: about an hour ago
Retry Count: 10
Last Retry: about an hour ago
Next Retry: about 2 hours from now
It's confusing to me, because when I search my project for the method order_report_for_manufacturers, the editor (TextMate) doesn't find anything. I then thought that the method might be used in a cron job, so I logged in to my Ubuntu server and ran crontab -l, but there's nothing (no rake task) that would use a method with such a name.
I also tried restarting the server (and Sidekiq), but after a while this job (with this failing method) is there again.
How do I find where the method is called from? (It's not in the Order model.)
Thank you.
EDIT:
Sidekiq log:
2016-05-15T14:23:40.910Z 25380 TID-163go0 Sidekiq::Extensions::DelayedClass JID-d25956cdd486335ecaf8a186 INFO: fail: 0.006 sec
2016-05-15T14:23:40.910Z 25380 TID-163go0 WARN: {"class"=>"Sidekiq::Extensions::DelayedClass", "args"=>["---\n- !ruby/class 'Order'\n- :order_report_for_manufacturers\n- []\n"], "retry"=>true, "queue"=>"default", "jid"=>"d25956cdd486335ecaf8a186", "created_at"=>1463316907.739199, "enqueued_at"=>1463322220.9033682, "error_message"=>"undefined method `order_report_for_manufacturers' for #<Class:0x000000036eb740>", "error_class"=>"NoMethodError", "failed_at"=>1463316955.7728505, "retry_count"=>8, "retried_at"=>1463322220.9097943}
2016-05-15T14:23:40.911Z 25380 TID-163go0 WARN: NoMethodError: undefined method `order_report_for_manufacturers' for #<Class:0x000000036eb740>
2016-05-15T14:23:40.911Z 25380 TID-163go0 WARN: /home/deployer/apps/myapp-production/shared/bundle/ruby/2.2.0/gems/activerecord-4.2.6/lib/active_record/dynamic_matchers.rb:26:in `method_missing'
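One clue: the failing job's class is Sidekiq::Extensions::DelayedClass, and its YAML args decode to [Order, :order_report_for_manufacturers, []]. That is the shape Sidekiq's delayed extensions enqueue for a call like the following (a reconstruction, not code found in the app):

Order.delay.order_report_for_manufacturers

The stuck entry can also be deleted from a Rails console instead of waiting for its retries to exhaust (a sketch using the JID shown above):

Sidekiq::RetrySet.new.each { |entry| entry.delete if entry.jid == 'd25956cdd486335ecaf8a186' }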
EDIT2:
I am using Sidekiq as an Upstart service (Ubuntu server). When I run the command ps aux | grep '[s]idekiq', I get this output:
deployer 6647 0.4 15.8 1162920 611080 ? Ssl May03 73:59 sidekiq 4.1.1 myapp-production [0 of 25 busy]
deployer 6659 0.4 18.2 1132336 702620 ? Ssl May03 75:06 sidekiq 4.1.1 myapp-production [0 of 25 busy]
deployer 25380 0.4 8.5 936444 329260 ? Ssl May14 5:14 sidekiq 4.1.1 myapp-production [0 of 25 busy]
I don't know why Sidekiq is there three times. I know that I restarted Sidekiq yesterday (sudo stop sidekiq index=0 and sudo start sidekiq index=0); that's the one on the third line.
Where the first two rows came from, I have no idea. Redis is running on a separate server and communicates with this one (where the application and Sidekiq live).
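To cross-check the ps output, Sidekiq's own API can list the processes Redis knows about (a sketch, run from a Rails console; the fields come from Sidekiq's process registry):

Sidekiq::ProcessSet.new.each do |process|
  puts "#{process['hostname']} pid=#{process['pid']} busy=#{process['busy']} started=#{Time.at(process['started_at'])}"
end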
I'm using the latest version of Sidekiq (2.15.2). I'm not sure how this happened but for some reason, my Sidekiq dashboard always has 2 busy processes.
When I click on busy, I just get the message, "Internal Server Error."
I tried stopping and restarting Sidekiq, but I still get the 2 busy processes. Does anyone have any suggestions? Thanks!
Here is my log after resetting Sidekiq.
2013-11-13T05:09:02Z 2508 TID-osg9yldog INFO: Received USR1, no longer accepting new work
2013-11-13T05:09:02Z 2508 TID-osgbbuwe4 INFO: Shutting down 25 quiet workers
2013-11-13T05:09:04Z 2593 TID-ow9gk2yd0 INFO: Received USR1, no longer accepting new work
2013-11-13T05:09:04Z 2593 TID-ow9hyfa04 INFO: Shutting down 25 quiet workers
2013-11-13T05:09:32Z 2508 TID-osg9yldog INFO: Shutting down
2013-11-13T05:09:32Z 2508 TID-osgbbuwe4 INFO: Shutting down 0 quiet workers
2013-11-13T05:09:36Z 2593 TID-ow9gk2yd0 INFO: Shutting down
2013-11-13T05:09:36Z 2593 TID-ow9hyfa04 INFO: Shutting down 0 quiet workers
2013-11-13T05:10:33Z 15613 TID-osg19rhj4 INFO: Booting Sidekiq 2.15.2 using redis://localhost:6379/0 with options {}
2013-11-13T05:10:33Z 15613 TID-osg19rhj4 INFO: Running in ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
2013-11-13T05:10:33Z 15613 TID-osg19rhj4 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2013-11-13T05:10:33Z 15613 TID-osg19rhj4 INFO: Starting processing, hit Ctrl-C to stop
2013-11-13T05:10:33Z 15698 TID-ox8r4hzlg INFO: Booting Sidekiq 2.15.2 using redis://localhost:6379/0 with options {}
2013-11-13T05:10:33Z 15698 TID-ox8r4hzlg INFO: Running in ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]
2013-11-13T05:10:33Z 15698 TID-ox8r4hzlg INFO: See LICENSE and the LGPL-3.0 for licensing details.
2013-11-13T05:10:33Z 15698 TID-ox8r4hzlg INFO: Starting processing, hit Ctrl-C to stop
If you are using a default redis.conf you'll find the following lines:
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb
dir ./
This means Redis will write a snapshot of its current state at those intervals.
When you are developing, you'll probably want to delete this file, or comment out all the save lines (as described in the redis.conf documentation), before restarting the server; otherwise it will pick up the previous activity, including those stale busy processes.
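If you'd rather not edit redis.conf, the same can be done at runtime; a minimal sketch using the redis gem, assuming the redis://localhost:6379/0 URL from the logs above:

require 'redis'

redis = Redis.new(url: 'redis://localhost:6379/0')
redis.config(:set, 'save', '')  # stop RDB snapshots (equivalent to commenting out the save lines)
redis.flushall                  # optional and destructive: wipe the persisted state entirely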