I have a ktor app. It works fine when I run it in development mode. I package it in a Docker image by copying over what the Gradle application plugin produces. That also works fine on my local machine (8 cores). But now the strange part: when I do exactly the same thing on a rented V-Server, which runs Ubuntu 20.04 like my local system, ktor is incredibly slow.
docker-compose logs server:
server | 2021-08-24 08:00:23.337 [main] INFO ktor.application - Autoreload is disabled because the development mode is off.
server | 2021-08-24 08:25:35.048 [main] INFO ktor.application - Autoreload is disabled because the development mode is off.
server | 2021-08-24 09:18:48.246 [main] INFO c.e.e.s.TemplateStore - Starting to parse Sentences
server | 2021-08-24 09:18:48.345 [main] INFO c.e.e.s.TemplateStore - Finished parsing sentences
server | 2021-08-24 09:18:48.346 [main] INFO ktor.application - Responding at http://0.0.0.0:8080
server | 2021-08-24 09:18:48.347 [main] INFO ktor.application - Application started in 3193.32 seconds.
Note the last line: the application took 3193.32 seconds to start.
The source code can be found here https://github.com/1-alex98/whatisthat . It has a docker-compose.yml defining the whole docker container being started.
Local system: 32 GB RAM + 8 cores. V-Server: 4 GB RAM + 2 cores (htop shows plenty of free resources).
I am looking for ideas on what in the world could cause this behavior. Or ways to debug it.
Update:
Seems to read a file forever:
"main" #1 prio=5 os_prio=0 cpu=652.14ms elapsed=173.92s tid=0x00007f01d4016000 nid=0xe runnable [0x00007f01dace6000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base#11.0.12/Native Method)
at java.io.FileInputStream.read(java.base#11.0.12/FileInputStream.java:279)
at java.io.FilterInputStream.read(java.base#11.0.12/FilterInputStream.java:133)
at sun.security.provider.NativePRNG$RandomIO.readFully(java.base#11.0.12/NativePRNG.java:424)
at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(java.base#11.0.12/NativePRNG.java:526)
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(java.base#11.0.12/NativePRNG.java:545)
- locked <0x00000000c7571158> (a java.lang.Object)
at sun.security.provider.NativePRNG$Blocking.engineNextBytes(java.base#11.0.12/NativePRNG.java:268)
at java.security.SecureRandom.nextBytes(java.base#11.0.12/SecureRandom.java:751)
at kotlin.random.AbstractPlatformRandom.nextBytes(PlatformRandom.kt:47)
at kotlin.random.Random.nextBytes(Random.kt:260)
at com.example.routes.websocket.WebsocketRoutingKt.<clinit>(WebsocketRouting.kt:40)
at com.example.plugins.RoutingKt$routing$1.invoke(Routing.kt:13)
at com.example.plugins.RoutingKt$routing$1.invoke(Routing.kt:11)
at io.ktor.routing.Routing$Feature.install(Routing.kt:106)
at io.ktor.routing.Routing$Feature.install(Routing.kt:88)
at io.ktor.application.ApplicationFeatureKt.install(ApplicationFeature.kt:68)
at io.ktor.routing.RoutingKt.routing(Routing.kt:129)
at com.example.plugins.RoutingKt.routing(Routing.kt:11)
at com.example.ApplicationKt$main$1.invoke(Application.kt:18)
at com.example.ApplicationKt$main$1.invoke(Application.kt:14)
at io.ktor.server.engine.internal.CallableUtilsKt.executeModuleFunction(CallableUtils.kt:50)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$launchModuleByName$1.invoke(ApplicationEngineEnvironmentReloading.kt:317)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$launchModuleByName$1.invoke(ApplicationEngineEnvironmentReloading.kt:316)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.avoidingDoubleStartupFor(ApplicationEngineEnvironmentReloading.kt:341)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.launchModuleByName(ApplicationEngineEnvironmentReloading.kt:316)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.access$launchModuleByName(ApplicationEngineEnvironmentReloading.kt:30)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$instantiateAndConfigureApplication$1.invoke(ApplicationEngineEnvironmentReloading.kt:304)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading$instantiateAndConfigureApplication$1.invoke(ApplicationEngineEnvironmentReloading.kt:295)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.avoidingDoubleStartup(ApplicationEngineEnvironmentReloading.kt:323)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.instantiateAndConfigureApplication(ApplicationEngineEnvironmentReloading.kt:295)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.createApplication(ApplicationEngineEnvironmentReloading.kt:136)
at io.ktor.server.engine.ApplicationEngineEnvironmentReloading.start(ApplicationEngineEnvironmentReloading.kt:268)
at io.ktor.server.netty.NettyApplicationEngine.start(NettyApplicationEngine.kt:174)
at com.example.ApplicationKt.main(Application.kt:21)
at com.example.ApplicationKt.main(Application.kt)
It is a freshly rented server, but I guess something is wrong with it.
docker-compose being slow and my program not starting both seemed to be caused by insufficient entropy for the kernel's random device (/dev/urandom): the main thread in the dump above is stuck reading random bytes in NativePRNG. Installing https://github.com/smuellerDD/jitterentropy-rngd resolved the problem.
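For anyone who wants to check this before installing anything, here is a minimal sketch that reads the kernel's entropy estimate on a Linux host. The procfs path is the standard one; the threshold of 200 bits is an arbitrary cut-off I picked for illustration, not a documented limit.

```python
from pathlib import Path

# Standard Linux procfs location of the kernel's entropy estimate (in bits).
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy_avail(text: str) -> int:
    """Parse the single integer that the procfs file contains."""
    return int(text.strip())


def looks_starved(bits: int, threshold: int = 200) -> bool:
    """True when the estimate is below `threshold` (arbitrary value for this sketch)."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_AVAIL.exists():
        bits = parse_entropy_avail(ENTROPY_AVAIL.read_text())
        print(f"entropy_avail = {bits} bits, starved: {looks_starved(bits)}")
    else:
        print("entropy_avail not found (probably not a Linux host)")
```

Running it on both the local machine and the V-Server should show whether the two hosts differ noticeably.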
I'm running iotedge on kubernetes.
The K8S cluster is a local cluster setup largely using the "Kubernetes the hard way" method, with some modifications.
I did manage to get things working on one installation, but I'm now getting the error below on another. The initial installation works fine, but after I shut down a machine to simulate a hardware failure, the pod gets recreated and then starts to show this error again. This happens even if the node that was shut down is not the one iotedged is running on.
Environment
3 Nodes running Ubuntu 20.04 LTS
Two networks on each node, one for the internet and one for an internal network. K8S is set up using the internal, static IP addresses
HAProxy/Keepalived for HA without a load balancer, running on a Virtual IP address
Multus CNI for attaching pods to additional networks
CoreDNS
Troubleshooting
Confirmed that CoreDNS seems to be functioning fine, and is able to resolve internal and external addresses
Remaining nodes are able to ping pods on other nodes
Deleting the iotedged pod and allowing k8s to recreate it works, but then edgeAgent and edgeHub have errors until I delete/recreate them as well
Re-ran the entire k8s installation. The initial installation works fine, but simulating a machine failure continues to be problematic.
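Since the daemon fails with a connection timeout, one more check that might help is testing plain TCP reachability of the API server from inside a pod. The sketch below assumes a pod image that has Python available; KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are the variables Kubernetes injects into pods, and the fallback values are placeholders only.

```python
import os
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Injected by Kubernetes inside a pod; the defaults are placeholders for local runs.
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "127.0.0.1")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "6443"))
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this prints False on the node where iotedged was rescheduled but True elsewhere, the problem is probably in pod-to-service networking rather than in iotedged itself.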
Kubernetes Versions:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
iotedged error:
<6>2021-07-09T22:00:02Z [INFO] - Starting Azure IoT Edge Security Daemon - Kubernetes mode
<6>2021-07-09T22:00:02Z [INFO] - Version - 1.1.3
<6>2021-07-09T22:00:02Z [INFO] - Using config file: /etc/iotedged/config.yaml
<6>2021-07-09T22:00:02Z [INFO] - Configuring /var/lib/iotedge as the home directory.
<6>2021-07-09T22:00:02Z [INFO] - Configuring certificates...
<6>2021-07-09T22:00:02Z [INFO] - Transparent gateway certificates not found, operating in quick start mode...
<6>2021-07-09T22:00:02Z [INFO] - Finished configuring provisioning environment variables and certificates.
<6>2021-07-09T22:00:02Z [INFO] - Initializing hsm...
<6>2021-07-09T22:00:02Z [INFO] - Finished initializing hsm.
<6>2021-07-09T22:00:02Z [INFO] - Provisioning edge device...
<6>2021-07-09T22:00:02Z [INFO] - Starting provisioning edge device via manual mode using a device connection string...
<6>2021-07-09T22:00:02Z [INFO] - Manually provisioning device "********" in hub "********.azure-devices.net"
<6>2021-07-09T22:00:02Z [INFO] - Finished provisioning edge device.
<6>2021-07-09T22:00:02Z [INFO] - Initializing the module runtime...
<6>2021-07-09T22:00:02Z [INFO] - Attempting to use config from /home/edgeletuser/.kube/config file.
<6>2021-07-09T22:00:02Z [INFO] - Using in-cluster config
<3>2021-07-09T22:00:34Z [ERR!] - The daemon could not start up successfully: Could not initialize module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Could not initialize kubernetes module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: HTTP response error: SelfSubjectAccessReviewCreate
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Hyper HTTP error
<3>2021-07-09T22:00:34Z [ERR!] - caused by: error trying to connect: Connection timed out (os error 110)
<6>2021-07-09T22:00:02Z [INFO] (/project/hsm-sys/azure-iot-hsm-c/src/hsm_log.c:log_init:41) Initialized logging
edgeHub Logs after recreating iotedged:
2021-08-18 19:05:40 Starting Edge Hub
2021-08-18 19:05:40.481 +00:00 Edge Hub Main()
<7> 2021-08-18 19:05:40.609 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to http://localhost:35001/ to CreateServerCertificateAsync
<7> 2021-08-18 19:05:40.912 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Error when getting an Http response from http://localhost:35001/ for CreateServerCertificateAsync
HTTP Response:
{"message":"Module not found"}
Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.IoTEdgedException`1[Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.ErrorResponse]: Not Found
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 624
at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 126
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 59
Unhandled exception. System.AggregateException: One or more errors occurred. (Error calling CreateServerCertificateAsync: Module not found)
---> Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadCommunicationException- Message:Error calling CreateServerCertificateAsync: Module not found, StatusCode:404, at: at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.HandleException(Exception ex, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 109
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 77
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 35
at Microsoft.Azure.Devices.Edge.Util.CertificateHelper.GetServerCertificatesFromEdgelet(Uri workloadUri, String workloadApiVersion, String workloadClientApiVersion, String moduleId, String moduleGenerationId, String edgeHubHostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/CertificateHelper.cs:line 260
at Microsoft.Azure.Devices.Edge.Hub.Service.EdgeHubCertificates.LoadAsync(IConfigurationRoot configuration, ILogger logger) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/EdgeHubCertificates.cs:line 54
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 54
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 33
Are you still blocked? What troubleshooting steps have you tried so far? Did you check the Common issues and resolutions for Azure IoT Edge? Judging by the error messages "Transparent gateway certificates not found, operating in quick start mode" and "The daemon could not start up successfully: Could not initialize module runtime", it looks like the setup is not configured properly. Try restarting the server and check the transparent gateway setup. Please refer to the transparent gateway setup and check whether you have missed anything.
I have been trying to understand an issue I've had when running the roribio16/alpine-sqs docker image on one of my machines. Whenever I try to run the image without specifying any other settings, elasticmq inside the container keeps crashing:
[xxxx#yyyy ~]$ docker run roribio16/alpine-sqs
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/elasticmq.conf" during parsing
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/insight.conf" during parsing
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/sqs-init.conf" during parsing
2021-05-29 15:48:41,216 INFO Set uid to user 0 succeeded
2021-05-29 15:48:41,222 INFO RPC interface 'supervisor' initialized
2021-05-29 15:48:41,222 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2021-05-29 15:48:41,222 INFO supervisord started with pid 1
2021-05-29 15:48:42,225 INFO spawned: 'sqs-init' with pid 9
2021-05-29 15:48:42,229 INFO spawned: 'elasticmq' with pid 10
2021-05-29 15:48:42,230 INFO spawned: 'insight' with pid 11
cp: can't stat '/opt/custom/*.conf': No such file or directory
> sqs-insight#0.3.0 start /opt/sqs-insight
> node index.js
15:48:42.605 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
Loading config file from "/opt/sqs-insight/lib/../config/config_local.json"
15:48:42.929 [elasticmq-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
Unable to load queues for undefined
Config contains 0 queues.
library initialization failed - unable to allocate file descriptor table - out of memorylistening on port 9325
2021-05-29 15:48:43,233 INFO success: sqs-init entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,233 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,234 INFO success: insight entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,234 INFO exited: sqs-init (exit status 0; expected)
2021-05-29 15:48:44,318 INFO exited: elasticmq (terminated by SIGABRT (core dumped); not expected)
2021-05-29 15:48:45,322 INFO spawned: 'elasticmq' with pid 67
15:48:45.743 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
15:48:46.044 [elasticmq-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
library initialization failed - unable to allocate file descriptor table - out of memory2021-05-29 15:48:47,223 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:47,389 INFO exited: elasticmq (terminated by SIGABRT (core dumped); not expected)
2021-05-29 15:48:48,393 INFO spawned: 'elasticmq' with pid 89
15:48:48.766 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
15:48:49.066 [elasticmq-akka.actor.default-dispatcher-3] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
library initialization failed - unable to allocate file descriptor table - out of memory^C2021-05-29 15:48:49,559 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:49,559 WARN received SIGINT indicating exit request
2021-05-29 15:48:49,559 INFO waiting for insight, elasticmq to die
2021-05-29 15:48:49,566 INFO stopped: insight (terminated by SIGTERM)
2021-05-29 15:48:50,431 INFO stopped: elasticmq (terminated by SIGABRT (core dumped))
With a bit of googling I found this post where somebody had the same issue when running some other random image, and then posted that they managed to get the image running by setting some ulimits when running the image, which also worked for me (docker run --ulimit nofile=122880:122880 roribio16/alpine-sqs).
I checked the ulimits set inside the container when I didn't use this configuration:
docker exec -it ca bash
$ ulimit -a
and found that the nofile setting was ridiculously high, which I assume is what causes the container to run out of memory if too many files are opened simultaneously. I don't have a particularly good understanding of how this works though, so I would appreciate any clarification somebody could offer on that topic as well.
Anyway the point of that ramble is that I want to try and find where the default docker container ulimits are set as I don't understand why they are so high on the machine I am using. I have another machine that does not have this problem.
I can find lots of ways to change the default limits but there does not seem to be much information about where these limits get set in the first place. I understand according to the docker documentation that if custom values are not set then the ulimits should be inherited from my system but as far as I can tell my system nofile settings are much lower than what I'm seeing in the container.
(Both machines run Manjaro Linux; however, the one that doesn't have this issue uses XFCE and the one that does uses KDE.)
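To compare what the host and the container actually see, a small sketch like the following can be run in both places (it assumes Python is available on the host and in the image, and it only reports the limits of the process it runs in; it does not say where they were configured).

```python
import resource


def describe_limit(value: int) -> str:
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"nofile soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

Comparing the output from a shell on the host with the output inside the container shows whether the container really inherited the limit of your login session or got it from somewhere else.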
I have Jenkins 2.164.3 on a CentOS 7 server.
I have a Windows Server 2003 slave with Java version 1.8.0.
I have 3 x linux slaves working successfully.
The windows service on the slave is installed and running.
The Windows slave is set up with the launch method "Let Jenkins control this Windows slave as a Windows service".
This Jenkins server is a new server that replaced an older jenkins server (debian wheezy from turnkey linux ~3 years ago). This windows slave used to connect to that old server. To remove the connection on this slave to the old server, I did the following:
1. sc delete
2. deleted the files in folder c:\jenkins
3. rebooted server
4. from new jenkins server, launched slave which copied files to c:\jenkins folder and installed service.
On my new jenkins server, I setup the windows slave and when I connect, the log has the following:
[2019-05-27 12:24:07] [windows-slaves] Connecting to 192.168.1.152
Checking if Java exists
java -version returned 1.8.0.
[2019-05-27 12:24:16] [windows-slaves] Copying jenkins-slave.xml
[2019-05-27 12:24:16] [windows-slaves] Copying slave.jar
[2019-05-27 12:24:16] [windows-slaves] Starting the service
[2019-05-27 12:24:16] [windows-slaves] Waiting for the service to become ready
ERROR: [2019-05-27 12:24:52] [windows-slaves] The service did not respond. Perhaps it failed to launch?
[2019-05-27 12:36:00] [windows-slaves] Connecting to 192.168.1.152
Checking if Java exists
java -version returned 1.8.0.
[2019-05-27 12:36:08] [windows-slaves] Copying jenkins-slave.xml
[2019-05-27 12:36:08] [windows-slaves] Copying slave.jar
[2019-05-27 12:36:08] [windows-slaves] Starting the service
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
org.jinterop.dcom.common.JIException: Service Already Running
at org.jvnet.hudson.wmi.Win32Service$Implementation.start(Win32Service.java:149)
Caused: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.kohsuke.jinterop.JInteropInvocationHandler.invoke(JInteropInvocationHandler.java:140)
Caused: java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy90.start(Unknown Source)
at hudson.os.windows.ManagedWindowsServiceLauncher.launch(ManagedWindowsServiceLauncher.java:342)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The windows slave is Windows Server 2003, the service is installed and running.
In the log file C:\Jenkins\jenkins-slave.wrapper.log, it has the following:
2019-05-27 12:19:32,644 INFO - Starting ServiceWrapper in the service mode
2019-05-27 12:19:32,659 INFO - Starting javaw.exe -Xrs -jar "C:\Jenkins\slave.jar" -tcp "C:\Jenkins\port.txt"
2019-05-27 12:19:32,675 INFO - Extension loaded: killOnStartup
2019-05-27 12:19:32,675 DEBUG - Checking the potentially runaway process with PID=1408
2019-05-27 12:19:32,675 DEBUG - No runaway process with PID=1408. The process has been already stopped.
2019-05-27 12:19:32,675 INFO - Starting javaw.exe -Xrs -jar "C:\Jenkins\slave.jar" -tcp "C:\Jenkins\port.txt"
2019-05-27 12:19:32,691 INFO - Started process 4084
2019-05-27 12:19:32,691 DEBUG - Forwarding logs of the process System.Diagnostics.Process (javaw) to winsw.SizeBasedRollingLogAppender
2019-05-27 12:19:32,691 INFO - Recording PID of the started process:4084. PID file destination is C:\Jenkins\jenkins_agent.pid
2019-05-27 12:23:56,529 INFO - Stopping jenkinsslave-C__Jenkins
2019-05-27 12:23:56,529 DEBUG - ProcessKill 4084
2019-05-27 12:23:56,561 INFO - Stopping process 4084
2019-05-27 12:23:56,561 INFO - Send SIGINT 4084
2019-05-27 12:23:56,561 WARN - SIGINT to 4084 failed - Killing as fallback
2019-05-27 12:23:56,561 INFO - Finished jenkinsslave-C__Jenkins
2019-05-27 12:23:56,561 DEBUG - Completed. Exit code is 0
2019-05-27 12:24:16,374 INFO - Starting ServiceWrapper in the service mode
2019-05-27 12:24:16,390 INFO - Starting javaw.exe -Xrs -jar "C:\Jenkins\slave.jar" -tcp "C:\Jenkins\port.txt"
2019-05-27 12:24:16,405 INFO - Extension loaded: killOnStartup
2019-05-27 12:24:16,405 DEBUG - Checking the potentially runaway process with PID=4084
2019-05-27 12:24:16,405 DEBUG - No runaway process with PID=4084. The process has been already stopped.
2019-05-27 12:24:16,405 INFO - Starting javaw.exe -Xrs -jar "C:\Jenkins\slave.jar" -tcp "C:\Jenkins\port.txt"
2019-05-27 12:24:16,421 INFO - Started process 364
2019-05-27 12:24:16,421 DEBUG - Forwarding logs of the process System.Diagnostics.Process (javaw) to winsw.SizeBasedRollingLogAppender
2019-05-27 12:24:16,421 INFO - Recording PID of the started process:364. PID file destination is C:\Jenkins\jenkins_agent.pid
The error on the Jenkins server says the service is not running, but on the Windows slave machine the service is running. What is the problem and how do I fix it?
Thanks.
Very old question, but if you end up here because you are getting this error, find the jenkins_agent.pid file and delete it. It should be in the same folder as the rest of your Jenkins slave files. The service should start again normally after that.
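If you want to script that step, here is a minimal sketch. The path is the one shown in the wrapper log in the question; the helper name is my own, and I am assuming the service is stopped before the file is removed.

```python
from pathlib import Path


def remove_pid_file(path) -> bool:
    """Delete the PID file if it exists; return True only when something was removed."""
    pid_file = Path(path)
    if not pid_file.is_file():
        return False
    pid_file.unlink()
    return True


if __name__ == "__main__":
    # Path taken from the wrapper log in the question; adjust it to your agent folder.
    removed = remove_pid_file(r"C:\Jenkins\jenkins_agent.pid")
    print("removed" if removed else "no PID file found")
```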
I know this question is old and you've long moved on but maybe this will help someone. I ran into a similar problem with a Windows slave, specifically I was seeing it go through a cycle of restarts much like you were:
2019-05-27 12:24:16,405 DEBUG - Checking the potentially runaway process with PID=4084
2019-05-27 12:24:16,405 DEBUG - No runaway process with PID=4084. The process has been already stopped.
2019-05-27 12:24:16,405 INFO - Starting javaw.exe -Xrs -jar "C:\Jenkins\slave.jar" -tcp "C:\Jenkins\port.txt"
2019-05-27 12:24:16,421 INFO - Started process 364
To solve the problem I checked the following:
See if the Windows service is running, cycle it
In addition to checking the C:\<path to jenkins>\jenkins-slave.wrapper.log also have a look at C:\<path to jenkins>\jenkins-slave.err.log
The err log is where I found my problem: I had a certificate issue, "unable to find valid certificate"
Edit the C:\<path to jenkins>\jenkins-slave.xml file and fix whatever startup parameter is causing you a problem. Make sure to check the java path and version.
In my certificate error case, I needed to add a -noCertificateCheck to my arguments so I could move on
Another potential pitfall may be that the executable setting in the jenkins-slave.xml config file no longer points to a valid java.exe.
This may happen after a Java update.
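A quick way to verify this is to read the executable entry from the config and check that it resolves. The sketch below assumes the usual layout of the wrapper config, with an executable element directly under the root service element; it does not expand environment variables such as %JAVA_HOME%, so treat a negative result as a hint rather than proof.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def read_executable(xml_text: str):
    """Return the text of the top-level <executable> element, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("executable")
    return node.text.strip() if node is not None and node.text else None


def executable_resolves(executable: str) -> bool:
    """True if the value is an existing file or can be found on PATH."""
    return Path(executable).is_file() or shutil.which(executable) is not None


if __name__ == "__main__":
    # Location taken from the question; adjust it to your agent folder.
    config = Path(r"C:\Jenkins\jenkins-slave.xml")
    exe = read_executable(config.read_text(encoding="utf-8"))
    print(exe, "->", "ok" if exe and executable_resolves(exe) else "NOT FOUND")
```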
I am trying to start Apache NiFi version 1.2.0 on a Windows 8 machine. It used to start properly. After I restarted the system, NiFi does not start at all. When I check the status, I keep getting "Apache NiFi not running".
Below are the logs from the nifi.bootstrap.log file:
2017-07-05 15:41:57,105 WARN [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Failed to set permissions so that only the
owner can read pid file E:\softwares\nifi-1.2.0\bin\..\run\nifi.pid; this
may allows others to have access to the key needed to communicate with NiFi.
Permissions should be changed so that only the owner can read this file
2017-07-05 15:41:57,142 WARN [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Failed to set permissions so that only the
owner can read status file E:\softwares\nifi-1.2.0\bin\..\run\nifi.status;
this may allows others to have access to the key needed to communicate with
NiFi. Permissions should be changed so that only the owner can read this
file
2017-07-05 15:41:57,168 INFO [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
Bootstrap requests on port 50765
2017-07-05 15:43:12,077 ERROR [NiFi logging handler] org.apache.nifi.StdErr
Failed to start web server: Unable to start Flow Controller.
2017-07-05 15:43:12,078 ERROR [NiFi logging handler] org.apache.nifi.StdErr
Shutting down...
2017-07-05 15:43:14,501 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi
never started. Will not restart NiFi
Stack trace from nifi.app.log:
2017-07-05 15:43:12,077 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
at org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextInitialized(ApplicationStartupContextListener.java:88)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:876)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:532)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:839)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:344)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1480)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1442)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:799)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:540)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.doStart(GzipHandler.java:290)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:452)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:419)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:695)
at org.apache.nifi.NiFi.<init>(NiFi.java:160)
at org.apache.nifi.NiFi.main(NiFi.java:267)
Caused by: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '0' instead
at org.apache.nifi.repository.schema.SchemaRecordReader.readRecord(SchemaRecordReader.java:65)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeRecord(SchemaRepositoryRecordSerde.java:115)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:109)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:46)
at org.wali.MinimalLockingWriteAheadLog$Partition.recoverNextTransaction(MinimalLockingWriteAheadLog.java:1096)
at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:459)
at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:381)
at org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:712)
at org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:953)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:534)
at org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextInitialized(ApplicationStartupContextListener.java:72)
... 28 common frames omitted
Thanks in advance
After Googling on this error "Caused by: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '0' instead" I found that this error indicates a partial write to the repos.
Here are a couple of things you can check or try to bring your dataflow back online:
Check whether your disks are full.
Did you launch NiFi with the same user? Did you run it with administrator privileges?
You can back up or move your repositories and try to start NiFi with empty repositories. You will still have your dataflows, but any file that was being processed when you shut down will be gone.
Could you please try that?
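As a rough sketch of the first and third suggestions: the helper names are my own, the NiFi home is the path from the bootstrap log in the question, and flowfile_repository is assumed to be the repository directory name (check nifi.properties for the actual location). NiFi should be stopped before anything is moved.

```python
import shutil
from pathlib import Path


def free_fraction(path) -> float:
    """Fraction of the disk holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def set_aside(directory, suffix: str = ".bak"):
    """Rename `directory` to `directory + suffix`; return the new path, or None if it is missing."""
    src = Path(directory)
    if not src.is_dir():
        return None
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists, refusing to overwrite a backup")
    src.rename(dst)
    return dst


if __name__ == "__main__":
    nifi_home = Path(r"E:\softwares\nifi-1.2.0")  # from the log in the question
    print(f"free disk space: {free_fraction(nifi_home):.0%}")
    # Assumed directory name; the stack trace fails while loading the FlowFile repository.
    print("moved to:", set_aside(nifi_home / "flowfile_repository"))
```

On the next start NiFi should create an empty repository, and the renamed directory stays around in case you need it.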
I think the issue is an incompatible Java version; use Java 8.
If you haven't set JAVA_HOME, set it in your environment variables to a path like "C:/program files/jdk1.8".
There is a Jira issue about running NiFi with Java 9, and it is not resolved yet:
https://issues.apache.org/jira/browse/NIFI-4419
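To see which Java version is actually picked up, something like the sketch below can help. It assumes `java` is on PATH and relies on `java -version` writing a line containing `version "..."` to stderr; the helper names are my own.

```python
import re
import subprocess


def parse_java_major(version: str) -> int:
    """Map '1.8.0_131' to 8 and '9.0.1' or '11.0.12' to 9 or 11."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Java version string: {version!r}")
    first = int(match.group(1))
    # Versions up to Java 8 are reported in the legacy form 1.x.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_version():
    """Return the quoted version string printed by `java -version`, or None."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    found = re.search(r'version "([^"]+)"', result.stderr)
    return found.group(1) if found else None


if __name__ == "__main__":
    version = installed_java_version()
    if version is None:
        print("could not determine the Java version")
    else:
        print(f"java {version} (major {parse_java_major(version)})")
```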
We have installed WSO2 Message Broker v2.2.0 on a 64-bit SUSE system with a single core. We have configured master-datasources.xml to point to an Oracle database. Starting the MB takes minutes, especially at this point:
TID: [0] [MB] [2014-06-11 15:57:53,039] INFO {org.apache.cassandra.thrift.ThriftServer} - Listening for thrift clients... {org.apache.cassandra.thrift.ThriftServer}
TID: [0] [MB] [2014-06-11 15:57:53,219] INFO {org.apache.cassandra.service.GCInspector} - GC for MarkSweepCompact: 407 ms for 1 collections, 60663688 used; max is 1037959168 {org.apache.cassandra.service.GCInspector}
TID: [0] [MB] [2014-06-11 15:58:39,137] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 15:59:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 16:00:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
Is there a reason for this?
With WSO2 MB 2.2.0 we get this kind of error when the ZooKeeper/Cassandra server does not start properly. Ideally, if clustering is enabled, the ZooKeeper server (internal or external) should be started properly before MB starts.
Further, if you are trying to run an MB cluster on a single machine and want to run two ZooKeeper nodes there, you will most probably end up with these OSGi-level errors. Please follow the blog post at http://indikasampath.blogspot.com/2014/05/wso2-message-broker-cluster-setup-in.html for configuration details on setting up a WSO2 Message Broker cluster on a single machine.