I've created a job (weather forecasting) that puts a heavy load on the machine, mostly CPU and memory, for a fairly long time. When I run the job from the CLI I can still use my browser without stuttering, but when I move the same job to a cron job there are stutters all over the place.
I think this has to do with the way the kernel's CFS scheduler groups processes (by tty). See e.g. here for documentation.
That link does provide some pointers on how this might be fixed, but I was wondering if anyone has already done such a thing and what the results were.
Linux xyz 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
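One way to check whether autogrouping is involved is to compare what the kernel reports for the interactive shell and for the cron job. The following is a minimal sketch, assuming a kernel built with CONFIG_SCHED_AUTOGROUP; the function name autogroup_info is made up for illustration, and the /proc paths are the ones described in sched(7).

```shell
# Print the autogroup state for a given PID (defaults to the current shell).
# Assumes a kernel with CONFIG_SCHED_AUTOGROUP; prints "unavailable" otherwise.
autogroup_info() {
    pid="${1:-$$}"
    if [ -r /proc/sys/kernel/sched_autogroup_enabled ]; then
        echo "enabled: $(cat /proc/sys/kernel/sched_autogroup_enabled)"
    else
        echo "enabled: unavailable"
    fi
    if [ -r "/proc/$pid/autogroup" ]; then
        # Typically looks like "/autogroup-123 nice 0"
        echo "group: $(cat "/proc/$pid/autogroup")"
    else
        echo "group: unavailable"
    fi
}

autogroup_info
```

Running it once from the terminal and once from inside the cron job shows whether the two land in different groups. According to sched(7), writing a nice value to /proc/PID/autogroup (for example echo 10 > /proc/$$/autogroup at the top of the cron script) lowers the priority of the whole group; whether that removes the stutter in this case is untested.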
This Jenkins installation has been running on a GCE VM since the beginning of 2020; pipelines, plugins and other configurations were working without issue.
After spending a whole day experimenting with some additional OS-level features that should have taken 5 minutes to introduce, I realized I was hitting a wall and decided to wipe the slate clean by deleting the VM and creating a new one from a snapshot taken back in November, the last time the VM was working properly without modification. This particular Jenkins installation is used to build staging versions of our internal applications, so I wasn't that concerned about downtime.
After creating a new VM with the same specifications as the previous one, running Debian 10, and assigning the snapshot as the source for the disk, I went to reload the dashboard and was surprised with this:
Logging in to the VM itself, I find that everything is there directory- and file-wise, but running sudo systemctl status jenkins returns this:
● jenkins.service - LSB: Start Jenkins at boot time
Loaded: loaded (/etc/init.d/jenkins; generated; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2021-01-05 17:54:55 UTC; 5s ago
Docs: man:systemd-sysv-generator(8)
Process: 653 ExecStart=/etc/init.d/jenkins start (code=exited, status=7)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/jenkins.service
Jan 05 17:52:34 jenkins-1-vm systemd[1]: Starting LSB: Start Jenkins at boot time...
Jan 05 17:52:40 jenkins-1-vm jenkins[653]: Correct java version found
Jan 05 17:52:41 jenkins-1-vm su[767]: Successful su for jenkins by root
Jan 05 17:52:41 jenkins-1-vm su[767]: + ??? root:jenkins
Jan 05 17:52:41 jenkins-1-vm su[767]: pam_unix(su:session): session opened for user jenkins by (uid=0)
Jan 05 17:54:55 jenkins-1-vm jenkins[653]: Starting Jenkins Automation Server: jenkins failed!
Jan 05 17:54:55 jenkins-1-vm systemd[1]: jenkins.service: Control process exited, code=exited status=7
Jan 05 17:54:55 jenkins-1-vm systemd[1]: Failed to start LSB: Start Jenkins at boot time.
Jan 05 17:54:55 jenkins-1-vm systemd[1]: jenkins.service: Unit entered failed state.
Jan 05 17:54:55 jenkins-1-vm systemd[1]: jenkins.service: Failed with result 'exit-code'.
I started searching on Google, basically spending the last 2 hours on this, and found nothing relevant apart from a lot of articles about using Java 8, which cannot apply to this case: Java is there, and the log itself says Correct java version found.
As a last attempt I tried to apt purge jenkins and reinstall it. After that everything works but, of course, everything is also wiped out. So I created another VM and, before attempting anything else, decided to ask here for help.
Is there something in Jenkins that might not be brought over in a snapshot of the disk and would cause this terrible Failed to start LSB: Start Jenkins at boot time. message? What can I try to fix this and restore it?
Adding more information: launching Jenkins via the .war file (java -jar /usr/share/jenkins/jenkins.war) works, but it starts as if it were a new installation, asking for an admin password and so on, and ignoring the existing config.xml and everything else already present in /var/lib/jenkins.
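A possible explanation for the "new installation" behaviour, offered as an assumption rather than a diagnosis: when the WAR is started by hand, Jenkins takes its home directory from the JENKINS_HOME environment variable and otherwise falls back to a directory under the invoking user's home (~/.jenkins), so it may never look at /var/lib/jenkins. The sketch below mimics that lookup in simplified form; effective_jenkins_home is a hypothetical helper, not part of Jenkins.

```shell
# Simplified model of how a hand-started jenkins.war chooses its home directory:
# use $JENKINS_HOME when set, otherwise fall back to ~/.jenkins.
# (Assumption: system properties and the legacy ~/.hudson fallback are ignored here.)
effective_jenkins_home() {
    echo "${JENKINS_HOME:-$HOME/.jenkins}"
}

echo "Jenkins would use: $(effective_jenkins_home)"

# To point a manual start at the existing data (run as the jenkins user):
#   JENKINS_HOME=/var/lib/jenkins java -jar /usr/share/jenkins/jenkins.war
```

If the printed directory is not /var/lib/jenkins, the setup wizard appearing on a manual start says nothing about the state of the existing configuration.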
I have had a similar experience with Jenkins running on a GCE VM. I have not finished solving the problem, but I have managed to get Jenkins running without reconfiguring everything again.
After stepping through the start-up script over a few hours I found a spot where it disappeared into a hole and came back as a failure. By looking at the steps after the failure I was able to get the system going again from first principles.
These are the commands I ended up running (I should really put them in a script, because my Jenkins instance will not start after a system reboot, with the same symptoms you are getting). This is a Debian 10 system running in GCE.
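# Load the settings used below (NAME, JENKINS_HOME, JENKINS_LOG, PIDFILE, JENKINS_USER, ...)
. /etc/default/jenkins
DAEMON_ARGS="--name=$NAME --inherit --env=JENKINS_HOME=$JENKINS_HOME --output=$JENKINS_LOG --pidfile=$PIDFILE"
DAEMON=/usr/bin/daemon
SU=/bin/su
# type -p is a bash builtin that prints the full path of the java binary
JAVA=$(type -p java)
# Start Jenkins as $JENKINS_USER under daemon(1)
$SU -l $JENKINS_USER --shell=/bin/bash -c "$DAEMON $DAEMON_ARGS -- $JAVA $JAVA_ARGS -jar $JENKINS_WAR $JENKINS_ARGS"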
At this point Jenkins is running and responds to requests from my web browser.
I've been using rails server with Rails 6.0.1 on macOS Catalina. I've noticed that if I start the server (whether using Puma or Unicorn), shut it down, and then try to shut down the computer, it just hangs until Apple's watchdog forcefully shuts down the system. On the next boot, I always get the same crash report.
panic(cpu 2 caller 0xffffff7f8ef9daae): watchdog timeout: no checkins from watchdogd in 187 seconds (21 totalcheckins since monitoring last enabled), shutdown in progress
Backtrace (CPU 2), Frame : Return Address
0xffffff83b7473c40 : 0xffffff800e539a3b
0xffffff83b7473c90 : 0xffffff800e670fe5
0xffffff83b7473cd0 : 0xffffff800e662a5e
0xffffff83b7473d20 : 0xffffff800e4e0a40
0xffffff83b7473d40 : 0xffffff800e539127
0xffffff83b7473e40 : 0xffffff800e53950b
0xffffff83b7473e90 : 0xffffff800ecd1875
0xffffff83b7473f00 : 0xffffff7f8ef9daae
0xffffff83b7473f10 : 0xffffff7f8ef9d472
0xffffff83b7473f50 : 0xffffff7f8efb2e76
0xffffff83b7473fa0 : 0xffffff800e4e013e
Kernel Extensions in backtrace:
com.apple.driver.watchdog(1.0)[AA44EEB8-57FA-3CAC-9105-C7AB21900B9A]#0xffffff7f8ef9c000->0xffffff7f8efa4fff
com.apple.driver.AppleSMC(3.1.9)[6DA4BDC6-9C64-34B3-A60E-D345D2DC2D5F]#0xffffff7f8efa5000->0xffffff7f8efc3fff
dependency: com.apple.driver.watchdog(1)[AA44EEB8-57FA-3CAC-9105-C7AB21900B9A]#0xffffff7f8ef9c000
dependency: com.apple.iokit.IOACPIFamily(1.4)[4A40B298-87E0-373E-84A9-9A2227924F8F]#0xffffff7f8ef07000
dependency: com.apple.iokit.IOPCIFamily(2.9)[AA7C7A4F-9F5D-3533-9E78-177C3B6A72BF]#0xffffff7f8ef10000
BSD process name corresponding to current thread: kernel_task
Boot args: chunklist-security-epoch=0 -chunklist-no-rev2-dev
Mac OS version:
19B88
Kernel version:
Darwin Kernel Version 19.0.0: Thu Oct 17 16:17:15 PDT 2019; root:xnu-6153.41.3~29/RELEASE_X86_64
Has anyone else seen this problem, and how did you go about fixing it? My guess is that the rails server leaves some processes running even after it's shut down via Ctrl-C, and that they are preventing the OS from shutting down correctly.
Not really a professional method, but...
open a terminal (privileged if at all possible)
take a snapshot using "ps" ("ps awxu" I seem to remember) of the running processes
start the Rails server
tinker a bit
now stop the server
take another snapshot
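The snapshot steps above can be sketched as two small shell functions. This is only an illustration: the names snapshot and new_processes and the file names are made up here, and the ps options are the portable -A/-o forms rather than the "ps awxu" mentioned above.

```shell
# Save a sorted "PID COMMAND" listing of every process to the file named in $1.
snapshot() {
    ps -A -o pid= -o args= | sed 's/^ *//' | sort > "$1"
}

# Print the lines that appear only in the second snapshot ($2),
# i.e. processes that were not running when the first one ($1) was taken.
new_processes() {
    comm -13 "$1" "$2"
}

# Typical use:
#   snapshot before.txt
#   ... start the Rails server, tinker, stop it ...
#   snapshot after.txt
#   new_processes before.txt after.txt
```

Expect some noise in the result: the ps and sed processes of the second snapshot and anything unrelated that started in the meantime will also be listed.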
I fully expect that some low-level background process has been left running and is not listening to shutdown signals. The macOS shutdown process is maybe too well-behaved and polite for its own good.
Should this be the case, get the PID or name of the process(es) and try pkilling it with HUP, TERM, and finally KILL signals. You can get a good idea of where those processes started from by checking their image path (be careful not to kill innocent processes).
Wait some time to be sure that pkilling the process didn't leave the system in an unstable state, then try shutting down the machine and see how it went.
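The HUP, TERM, KILL escalation could look roughly like this. It is a sketch that works on a single PID with kill rather than by name with pkill; stop_gently is a made-up name and the 2-second pause between signals is an arbitrary choice.

```shell
# Send HUP, then TERM, then KILL to a PID, pausing between signals and
# stopping as soon as the process is gone.
stop_gently() {
    pid="$1"
    for sig in HUP TERM KILL; do
        # kill -0 sends no signal; it only tests whether the process still exists
        kill -0 "$pid" 2>/dev/null || return 0
        echo "sending SIG$sig to $pid"
        kill -s "$sig" "$pid" 2>/dev/null
        sleep 2
    done
    # Succeed only if the process is really gone after KILL
    ! kill -0 "$pid" 2>/dev/null
}

# Usage, with a PID taken from the snapshot comparison:
#   stop_gently 12345
```

Going through the signals in this order gives the process a chance to clean up before it is killed outright, and the output shows which signal was needed.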
This is a common Catalina issue, and Apple clearly doesn't care about it. See this Stack Exchange thread and this Apple forum discussion.
So far there's no fix available, but resetting the SMC/NVRAM can give you a few proper shutdowns.
One way to avoid being impacted by this issue is to use Docker and docker-compose.
I know it doesn't solve your root issue, but with Docker you're OS-agnostic: you can work on any OS and your project still works as long as you can install Docker. You'll also survive OS upgrades.
Docker is now very common and popular, so you can get a lot of help from the community, and there are plenty of blog articles explaining how to containerise a Rails application.
I also ran into this problem.
It is related to Catalina's support for the graphics processing unit (GPU).
If you are using an NVIDIA graphics chip, Catalina is fine,
but there will be problems if you are using an AMD chip under Catalina,
especially on the 2015 MacBook Pro.
Many app vendors have already worked around this issue,
but Apple has not responded.
For users, this is an issue for Apple to fix.
In Ubuntu 18 (64 bit), the start/load address of a running process seems to be randomized each time the same application is run; it no longer starts at 0x400000. Is this caused by ASLR being enabled? In Ubuntu 18, I need to set ASLR to 0 in order for the start address to be fixed each time the same application is executed, but in Ubuntu 16 and below this is not necessary.
What has changed in Ubuntu 18?
As you know, side-channel attacks due to CPU architecture issues were all over the news recently. To mitigate these types of attacks, the Kernel Page Table Isolation (KPTI, previously called KAISER) patch set was developed and merged into Linux kernel 4.15-rc6.
Ubuntu 18.04 used kernel 4.15 on initial release, which explains why ASLR is enabled by default in Ubuntu 18.04 and later.
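The setting the question refers to lives in /proc/sys/kernel/randomize_va_space, and comparing its value on both releases is a quick way to test this explanation. Below is a minimal sketch; describe_aslr is a hypothetical helper name and the descriptions paraphrase the kernel's sysctl documentation. Note that with values 1 and 2 the code start address is only randomized for position-independent (PIE) executables, so it may also be worth checking with file(1) how the application was linked on each release.

```shell
# Translate a randomize_va_space value into a short description.
describe_aslr() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "stack, mmap and VDSO randomized" ;;
        2) echo "stack, mmap, VDSO and heap (brk) randomized" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the current system-wide setting when the file is readable (Linux only).
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    describe_aslr "$(cat /proc/sys/kernel/randomize_va_space)"
fi

# Disable randomization for a single run instead of system-wide
# (./app is a placeholder for the application):
#   setarch "$(uname -m)" -R ./app
```

Using setarch -R for individual runs keeps the system-wide setting untouched, which is usually preferable to setting the sysctl to 0.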
I am having some performance problems when starting Jenkins inside a Kubernetes cluster.
One step that sometimes takes a long time is the following operation:
INFO: Finished Download metadata. 1,397 ms
In this case it took just over 1 second, but sometimes it takes around 40 seconds. I have tried to find this log message in Jenkins core but have not found it, so I suspect it comes from some plugin. My question is: where is this happening, what is it doing, and why is it required?
Thanks.
Feb 10, 2018 2:04:22 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Feb 10, 2018 2:04:22 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 4 ms
I believe you are referring to logs like the ones above. If so, these are the log rotation strategy logs; the strategy is executed through the AsyncPeriodicWork class and is configured in Jenkins specifically for discarding old builds.
The following image shows the configuration in the Jenkins UI.
You can configure this according to your project requirements if you feel it is impacting your startup time.
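To see how often the slow case actually occurs, the durations can be pulled out of the Jenkins log. A small sketch, assuming the log lines have exactly the format quoted in this thread; download_metadata_ms is a made-up function name.

```shell
# Read a Jenkins log on stdin and print the duration, in milliseconds,
# of every "Finished Download metadata" entry (thousands separators removed).
download_metadata_ms() {
    sed -n 's/^INFO: Finished Download metadata\. \([0-9,]*\) ms$/\1/p' | tr -d ','
}

# Example with the two log lines quoted in this thread:
printf '%s\n' \
    'INFO: Finished Download metadata. 1,397 ms' \
    'INFO: Finished Download metadata. 4 ms' | download_metadata_ms
# prints 1397 and 4, one per line

# Against a running pod (jenkins-0 is a placeholder pod name):
#   kubectl logs jenkins-0 | download_metadata_ms | sort -n | tail -1
```

Collecting the values over several restarts makes it easier to tell whether the 40-second case is an outlier or the norm.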
On Ubuntu 16.04 Server (kernel 4.4.0-22) it takes 2-5 minutes to initialize the "random: nonblocking pool", according to /var/log/syslog:
May 28 18:10:42 foo kernel: [ 277.447574] random: nonblocking pool is initialized
This happened a lot faster on Ubuntu 14.04 (Kernel 3.13.0-79):
May 27 06:28:56 foo kernel: [ 14.859194] random: nonblocking pool is initialized
I observed this on DigitalOcean VMs. It's causing trouble for Rails applications because the unicorn server seems to wait for this pool to become available before starting up.
What is a reasonable time for this initialization step?
Why would it take so much longer on Ubuntu 16.04?
Is it reasonable for an application to wait for this pool to become available or might the dependency on the pool be a bug on the application side?
Running "apt-get install rng-tools", which makes Ubuntu use the available hardware random number generators, fixes this issue: the pool is then ready in about 10 seconds instead of minutes.
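To compare boots before and after installing rng-tools, the initialization time can be read straight from the syslog line quoted in the question. A sketch only: pool_init_seconds is a made-up name, and the pattern matches just the message text shown above.

```shell
# Read syslog on stdin and print the kernel timestamp (seconds since boot)
# of each "nonblocking pool is initialized" message.
pool_init_seconds() {
    sed -n 's/.*kernel: \[ *\([0-9.]*\)\] random: nonblocking pool is initialized.*/\1/p'
}

# Example with the Ubuntu 16.04 line from the question:
printf '%s\n' 'May 28 18:10:42 foo kernel: [ 277.447574] random: nonblocking pool is initialized' \
    | pool_init_seconds
# prints 277.447574

# On the affected machine:
#   pool_init_seconds < /var/log/syslog
#   cat /proc/sys/kernel/random/entropy_avail   # current entropy estimate
```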