We have a Rails app that runs on Apache with Passenger. At least once a week, our alerts that monitor server CPU and RAM start getting triggered on one or more of our app servers, and the root cause is that one or more of the Passenger processes are taking up a large chunk of the server's CPU and RAM without actually serving any requests.
For example, when I run "passenger-status" on the server that triggers these alerts, I see this:
Version : 5.3.1
Date : 2022-06-03 22:00:13 +0000
Instance: (Apache/2.4.51 (Amazon) OpenSSL/1.0.2k-fips Phusion_Passenger/5.3.1)
----------- General information -----------
Max pool size : 12
App groups : 1
Processes : 9
Requests in top-level queue : 0
----------- Application groups -----------
Requests in queue: 0
* PID: 16915 Sessions: 1 Processed: 3636 Uptime: 3h 2m 30s
CPU: 5% Memory : 1764M Last used: 0s ago
* PID: 11275 Sessions: 0 Processed: 34 Uptime: 55m 24s
CPU: 45% Memory : 5720M Last used: 35m 43s ago
...
See how the second process hasn't served a request in more than 35 minutes but is still taking up a large share of the server's resources?
So far the only solution has been to manually kill the PID, which resolves the issue, but is there a way to automate this check?
I also realize that the Passenger version is old and can be upgraded (which I will get done soon), but I have seen this issue in multiple earlier versions as well, so I wasn't sure whether an upgrade by itself is guaranteed to resolve this.
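For illustration, this is roughly the kind of automated check I have in mind: a cron job that parses passenger-status and kills any application process holding more than some memory threshold. It is only a sketch; the 3000 MB threshold is an arbitrary example, and the awk parsing assumes the two-line per-process layout shown above, which should be verified against the output of the installed Passenger version.
#!/bin/sh
# Kill Passenger app processes that exceed MAX_MB of memory.
# Intended to be run from cron every few minutes.
MAX_MB=3000

passenger-status 2>/dev/null | awk -v max="$MAX_MB" '
  /\* PID:/ { pid = $3 }                   # first line of each process entry
  /Memory/  {                              # second line carries CPU / Memory
      for (i = 1; i <= NF; i++)
          if ($i ~ /^[0-9]+M$/) { mem = $i; sub(/M$/, "", mem) }
      if (pid != "" && mem + 0 > max + 0) print pid
      pid = ""
  }' | while read -r pid; do
      logger "passenger-watchdog: killing bloated Passenger process $pid"
      kill "$pid"                          # Passenger spawns a replacement on demand
  done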
I'm trying to deploy the entire Spring Cloud Data Flow platform to a MicroK8s cluster running on one of our servers, a VM with Ubuntu 20.04. Before performing any actions on the target server, I tried to deploy it on my local computer (same OS), and I even succeeded and created/ran one stream. Nevertheless, I am currently experiencing an error both on my local computer and on the VM, and I can't manage to pinpoint the root cause.
My current situation:
I'm following the official guide for deploying SCDF using kubectl, the only difference being that I'm using tag v2.9.4, the latest at the time of writing, instead of v2.9.1. I also skipped the configuration of the monitoring frameworks, and therefore commented out the relevant lines in the SCDF server configuration, as suggested in the docs. The Kafka message broker and the MySQL database are deployed without issues.
However, after executing the kubectl commands that create the config map, service and deployment for Skipper, the Skipper pod goes into "CrashLoopBackOff" status. Checking the pod's logs, the only thing I see is that the application is terminated right after it appears to have started:
[...]
2022-04-11 15:00:11.713 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:00:11.907 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 78.901 seconds (JVM running for 82.435)
2022-04-11 15:00:12.531 INFO 1 --- [ionShutdownHook] o.s.s.s.DefaultStateMachineService : Entering stop sequence, stopping all managed machines
2022-04-11 15:00:12.617 INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-04-11 15:00:12.703 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-04-11 15:00:12.799 INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Native Memory Tracking:
Total: reserved=961864767, committed=325411903
- Java Heap (reserved=356515840, committed=138334208)
(mmap: reserved=356515840, committed=138334208)
- Class (reserved=269444100, committed=94409732)
(classes #17623)
( instance classes #16455, array classes #1168)
(malloc=3355652 #45645)
(mmap: reserved=266088448, committed=91054080)
( Metadata: )
( reserved=79691776, committed=78340096)
( used=76414680)
( free=1925416)
( waste=0 =0.00%)
( Class space:)
( reserved=186396672, committed=12713984)
( used=11544696)
( free=1169288)
( waste=0 =0.00%)
- Thread (reserved=14794856, committed=1323112)
(thread #14)
(stack: reserved=14729216, committed=1257472)
(malloc=51792 #86)
(arena=13848 #25)
- Code (reserved=255686068, committed=26629556)
(malloc=2053556 #8654)
(mmap: reserved=253632512, committed=24576000)
- GC (reserved=1728178, committed=1019570)
(malloc=560818 #2163)
(mmap: reserved=1167360, committed=458752)
- Compiler (reserved=35543622, committed=35543622)
(malloc=71174 #1162)
(arena=35472448 #19)
- Internal (reserved=432627, committed=432627)
(malloc=399859 #1104)
(mmap: reserved=32768, committed=32768)
- Other (reserved=10248, committed=10248)
(malloc=10248 #3)
- Symbol (reserved=22101496, committed=22101496)
(malloc=19867360 #240000)
(arena=2234136 #1)
- Native Memory Tracking (reserved=4899928, committed=4899928)
(malloc=9656 #122)
(tracking overhead=4890272)
- Arena Chunk (reserved=81808, committed=81808)
(malloc=81808)
- Tracing (reserved=1, committed=1)
(malloc=1 #1)
- Logging (reserved=4572, committed=4572)
(malloc=4572 #192)
- Arguments (reserved=19063, committed=19063)
(malloc=19063 #495)
- Module (reserved=310496, committed=310496)
(malloc=310496 #2710)
- Synchronizer (reserved=283672, committed=283672)
(malloc=283672 #2348)
- Safepoint (reserved=8192, committed=8192)
(mmap: reserved=8192, committed=8192)
No matter how many times the pod is restarted, it always exits at this point. This is the output of kubectl get all:
NAME READY STATUS RESTARTS AGE
pod/kafka-zk-6b6f4976cf-9hjzn 1/1 Running 0 69m
pod/kafka-broker-0 1/1 Running 0 58m
pod/mysql-7c57b4cfdf-njb97 1/1 Running 0 39m
pod/skipper-b46bfd5fd-wrnqv 0/1 CrashLoopBackOff 13 (57s ago) 38m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 148m
service/kafka-zk ClusterIP 10.152.183.62 <none> 2181/TCP,2888/TCP,3888/TCP 69m
service/kafka-broker ClusterIP None <none> 9092/TCP 69m
service/mysql ClusterIP 10.152.183.139 <none> 3306/TCP 40m
service/skipper LoadBalancer 10.152.183.250 <pending> 80:31955/TCP 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kafka-zk 1/1 1 1 69m
deployment.apps/mysql 1/1 1 1 39m
deployment.apps/skipper 0/1 1 0 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kafka-zk-6b6f4976cf 1 1 1 69m
replicaset.apps/mysql-7c57b4cfdf 1 1 1 39m
replicaset.apps/skipper-b46bfd5fd 1 1 0 38m
NAME READY AGE
statefulset.apps/kafka-broker 1/1 69m
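(Side note, in case it helps anyone reproduce this: the restart reason and exit code of the Skipper container can be inspected with the standard commands below; the pod name is the one from the listing above.)
# Show the container's last state, exit code and the kubelet events
kubectl describe pod skipper-b46bfd5fd-wrnqv

# Logs of the previous (crashed) container instance
kubectl logs --previous skipper-b46bfd5fd-wrnqv

# Recent cluster events, oldest first
kubectl get events --sort-by=.lastTimestamp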
What I tried:
Changing the Skipper service type from LoadBalancer to NodePort (I have not enabled MetalLB, so load balancing is not provided), but this didn't help;
Changing the port exposed by the container: in the default configuration it is 80, and I changed it to 7577 (also in the service configuration), but the error still occurs;
Downgrading Skipper to version 2.8.2, the one used in the documentation above, but the behaviour was exactly the same;
Increasing the logging level by setting logging.level.org.springframework to DEBUG and then to TRACE, which didn't surface anything useful in the logs, except one cryptic line that I could not find anywhere on Google (a note on setting this property follows the log excerpt):
[...]
2022-04-11 15:22:38.818 DEBUG 1 --- [ main] o.s.c.c.CompositeCompatibilityVerifier : All conditions are passing
2022-04-11 15:22:39.098 DEBUG 1 --- [ main] ocalVariableTableParameterNameDiscoverer : Cannot find '.class' file for class [class org.springframework.statemachine.boot.autoconfigure.StateMachineAutoConfiguration$StateMachineMonitoringConfiguration$$EnhancerBySpringCGLIB$$b266f314] - unable to determine constructor/method parameter names
2022-04-11 15:22:39.925 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 7577 (http) with context path ''
2022-04-11 15:22:40.244 INFO 1 --- [ main] o.s.c.s.s.app.SkipperServerApplication : Started SkipperServerApplication in 76.267 seconds (JVM running for 79.716)
[...]
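(As mentioned in the last item above, one quick way to set that logging property without editing the config files is Spring Boot's relaxed environment binding; a sketch, assuming the deployment is named skipper as in the listing, and keeping in mind that changing the environment triggers a new rollout of the pod.)
# Equivalent to setting logging.level.org.springframework=TRACE in application.yaml
kubectl set env deployment/skipper LOGGING_LEVEL_ORG_SPRINGFRAMEWORK=TRACE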
Can anyone suggest what to try next, or point me to a way to further diagnose this issue?
A Bazel binary that I am building fails during the analysis phase. What flags and tools can I use to debug why it fails during analysis?
Currently, clean builds return the following output:
ERROR: build interrupted
INFO: Elapsed time: 57.819 s
FAILED: Build did NOT complete successfully (133 packages loaded)
If I retry the build after a failed run, I receive the following output:
ERROR: build interrupted
INFO: Elapsed time: 55.514 s
FAILED: Build did NOT complete successfully (68 packages loaded)
What flags can I use to identify:
what packages are being loaded
what package the build is being interrupted on
whether the interruption is coming from a timeout or an external process
Essentially, I want something similar to --verbose_failures but for the analysis phase rather than the execution phase.
So far I have run my build through the build profiler and have not been able to glean any insight. Here is the profiler output for my build:
WARNING: This information is intended for consumption by Blaze developers only, and may change at any time. Script against it at your own risk
INFO: Loading /<>/result
INFO: bazel profile for <> at Mon Jun 04 00:10:11 GMT 2018, build ID: <>, 49405 record(s)
INFO: Aggregating task statistics
=== PHASE SUMMARY INFORMATION ===
Total launch phase time 9.00 ms 0.02%
Total init phase time 91.0 ms 0.16%
Total loading phase time 1.345 s 2.30%
Total analysis phase time 57.063 s 97.53%
Total run time 58.508 s 100.00%
=== INIT PHASE INFORMATION ===
Total init phase time 91.0 ms
Total time (across all threads) spent on:
Type Total Count Average
=== LOADING PHASE INFORMATION ===
Total loading phase time 1.345 s
Total time (across all threads) spent on:
Type Total Count Average
CREATE_PACKAGE 0.67% 9 3.55 ms
VFS_STAT 0.69% 605 0.05 ms
VFS_DIR 0.96% 255 0.18 ms
VFS_OPEN 2.02% 8 12.1 ms
VFS_READ 0.00% 5 0.01 ms
VFS_GLOB 23.74% 1220 0.93 ms
SKYFRAME_EVAL 24.44% 3 389 ms
SKYFUNCTION 36.95% 8443 0.21 ms
SKYLARK_LEXER 0.19% 31 0.29 ms
SKYLARK_PARSER 0.68% 31 1.04 ms
SKYLARK_USER_FN 0.03% 5 0.27 ms
SKYLARK_BUILTIN_FN 5.91% 349 0.81 ms
=== ANALYSIS PHASE INFORMATION ===
Total analysis phase time 57.063 s
Total time (across all threads) spent on:
Type Total Count Average
CREATE_PACKAGE 0.30% 138 3.96 ms
VFS_STAT 0.05% 2381 0.03 ms
VFS_DIR 0.19% 1020 0.35 ms
VFS_OPEN 0.04% 128 0.61 ms
VFS_READ 0.00% 128 0.01 ms
VFS_GLOB 0.92% 3763 0.45 ms
SKYFRAME_EVAL 31.13% 1 57.037 s
SKYFUNCTION 65.21% 32328 3.70 ms
SKYLARK_LEXER 0.01% 147 0.10 ms
SKYLARK_PARSER 0.03% 147 0.39 ms
SKYLARK_USER_FN 0.20% 343 1.08 ms
As for the command itself, I am running:
bazel build src:MY_TARGET --embed_label MY_LABEL --stamp --show_loading_progress
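For reference, the profiler output above was produced with the usual profile/analyze-profile pair, roughly like this (the profile path is just a placeholder):
bazel build src:MY_TARGET --embed_label MY_LABEL --stamp --profile=/tmp/build.profile
bazel analyze-profile /tmp/build.profile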
Use the --host_jvm_debug startup flag to debug Bazel itself during a build.
From https://bazel.build/contributing.html:
Debugging Bazel
Start creating a debug configuration for both C++ and
Java in your .bazelrc with the following:
build:debug -c dbg
build:debug --javacopt="-g"
build:debug --copt="-g"
build:debug --strip="never"
Then you can rebuild Bazel with bazel build --config debug //src:bazel and use your favorite debugger to start debugging.
For debugging the C++ client you can just run it from gdb or lldb as
you normally would. But if you want to debug the Java code, you must
attach to the server using the following:
Run Bazel with debugging option --host_jvm_debug before the command (e.g., bazel --batch --host_jvm_debug build //src:bazel).
Attach a debugger to the port 5005. With jdb for instance, run jdb -attach localhost:5005. From within Eclipse, use the remote
Java application launch configuration.
Our IntelliJ plugin has built-in debugging support
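Applied to the build from the question, the workflow from the quoted docs would look roughly like this (a sketch; the target and label are the ones the asker already uses):
# Terminal 1: start the Bazel server with the JVM debug agent listening on port 5005.
# --host_jvm_debug is a startup option, so it goes before the command.
bazel --host_jvm_debug build src:MY_TARGET --embed_label MY_LABEL --stamp

# Terminal 2: attach a debugger and step through what the analysis phase is doing.
jdb -attach localhost:5005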
I am working with siege 4.0.2 on Ubuntu 16.04. I get failed transactions when I simulate more than 1100 users, and I suspect the failures mean there is a problem on the server, possibly that it is running out of memory. How should I interpret the failed transactions, and how can I fix whatever is causing them?
siege -c1190 -t1m http://192.168.1.11:8080/
HTTP/1.1 200 7.02 secs: 57 bytes ==> GET /kiosk/start
HTTP/1.1 200 7.01 secs: 57 bytes ==> GET /kiosk/start
siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc
Transactions: 3263 hits
Availability: 76.11 %
Elapsed time: 9.34 secs
Data transferred: 0.18 MB
Response time: 1.98 secs
Transaction rate: 349.36 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 691.94
Successful transactions: 3263
Failed transactions: 1024
Longest transaction: 7.75
Shortest transaction: 0.03
When I simulated 1100 users, I got the error "descriptor table full sock.c:119: Too many open files". After running ulimit -n 10000, that error went away.
Then I simulated 1100 users again and got a new error:
[error] socket: read error Connection reset by peer sock.c:539: Connection reset by peer
I have not been able to get past this error. How can I resolve it? Any help would be appreciated.
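For context, my understanding is that both limits hit so far are on the client side, so they can be raised before re-running the test. A sketch (the failures directive name should be checked against the comments in the generated $HOME/.siegerc, since it may vary between siege versions):
# Raise the open-file limit in the same shell that will run siege
ulimit -n 10000

# Generate $HOME/.siegerc if it does not exist yet, then raise the
# failure threshold in that file, for example:
#   failures = 10240
siege.config

# Re-run the test
siege -c1190 -t1m http://192.168.1.11:8080/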
I am having trouble trying to set PassengerTempDir.
I run Redmine on Apache + mod_passenger on CentOS. In Redmine I get a 500 Internal Error when uploading files. While investigating, I found that the cause is:
wrong permissions set in the webserver_private directory of passenger
and that the fix is to change PassengerTempDir to a custom folder (see section "6.6 PassengerTempDir" of the Passenger documentation).
As a test, I created the folder /home/tmp_passenger and set its permissions to 777.
Then I ran export PASSENGER_TMPDIR=/tmp_passenger
The result is:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
For comparison, this is what I saw before running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
Version : 4.0.53
Date : 2014-12-29 12:43:36 +0100
Instance: 1416
----------- General information -----------
Max pool size : 20
Processes : 1
Requests in top-level queue : 0
----------- Application groups -----------
/home/admin/web/MYDOMAIN/public_html/redmine#default:
App root: /home/admin/web/MYDOMAIN/public_html/redmine
Requests in queue: 0
* PID: 6440 Sessions: 0 Processed: 8 Uptime: 22m 1s
CPU: 0% Memory : 52M Last used: 10m 29s ago
And after running export PASSENGER_TMPDIR=/tmp_passenger:
passenger-status
ERROR: Phusion Passenger doesn't seem to be running.
Please help me resolve this issue. What should I do now?
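For reference, the documentation section mentioned above also describes an Apache directive as an alternative to the environment variable; a sketch of how it would look in my setup (the path is the test folder created above, and Apache needs a restart afterwards):
# In the Apache/Passenger configuration (e.g. the vhost or passenger.conf):
PassengerTempDir /home/tmp_passenger
As far as I understand, after that the command-line tools also need export PASSENGER_TMPDIR=/home/tmp_passenger in the shell, so that passenger-status looks in the same directory.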