I'm trying to migrate a 3-node ZooKeeper ensemble from VMs to a Kubernetes cluster without downtime.
I know there are many blog posts and articles on migrating ZooKeeper without downtime (VMs to VMs, bare metal to VMs, etc.), but I couldn't find one that covers migrating to k8s without downtime.
This is the config on all zk nodes (zoo.cfg):
autopurge.purgeInterval=1
initLimit=10
syncLimit=5
autopurge.snapRetainCount=5
snapCount=5000
4lw.commands.whitelist=*
tickTime=2000
dataDir=/var/opt/zookeeper/data/data
admin.serverPort=8080
reconfigEnabled=true
admin.enableServer=True
standaloneEnabled=false
dynamicConfigFile=/opt/zookeeper/apache-zookeeper-3.7.1-bin/conf/zoo.cfg.dynamic
and this is /opt/zookeeper/current/conf/zoo.cfg.dynamic:
server.1=inzzk01:2888:3888;2181
server.2=inzzk02:2888:3888;2181
server.3=inzzk03:2888:3888;2181
Up to this point all is good: the cluster forms.
I run ZooKeeper in k8s as a StatefulSet, based on this answer (by itself, a 3-pod cluster works as expected). I scrapped everything on k8s to start from a clean cluster,
then added the lines below to the config on the VMs and restarted each node:
server.4=10.100.102.106:30888:31888;30181
server.5=10.100.102.232:30889:31889;30182
The two IP addresses above are the correct k8s node IP addresses (the ports are correct as well).
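For reference, each dynamic-config entry has the shape server.<id>=<host>:<quorum port>:<election port>[:role];[<client host>:]<client port>. The following is a minimal Python sketch for splitting such lines into their parts, for example to list which election endpoints the other nodes will try to reach. The names `ServerSpec` and `parse_server_line` are hypothetical helpers, not part of ZooKeeper, and the sketch assumes host names or IPv4 addresses only (no IPv6 literals).

```python
import re
from typing import NamedTuple, Optional


class ServerSpec(NamedTuple):
    sid: int
    host: str
    quorum_port: int
    election_port: int
    role: Optional[str]         # "participant", "observer", or None if omitted
    client_host: Optional[str]  # None if only a client port is given
    client_port: Optional[int]  # None if the ";..." part is absent


# Assumption: hosts are names or IPv4 addresses (no ':' or ';' inside them).
_SERVER_LINE = re.compile(
    r"^server\.(?P<sid>\d+)\s*=\s*"
    r"(?P<host>[^:;]+):(?P<qport>\d+):(?P<eport>\d+)"
    r"(?::(?P<role>participant|observer))?"
    r"(?:;(?:(?P<chost>[^:;]+):)?(?P<cport>\d+))?$"
)


def parse_server_line(line: str) -> ServerSpec:
    """Split one server.N entry of a dynamic config file into its fields."""
    m = _SERVER_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a server.N entry: {line!r}")
    cport = m.group("cport")
    return ServerSpec(
        sid=int(m.group("sid")),
        host=m.group("host"),
        quorum_port=int(m.group("qport")),
        election_port=int(m.group("eport")),
        role=m.group("role"),
        client_host=m.group("chost"),
        client_port=int(cport) if cport is not None else None,
    )
```

For example, `parse_server_line("server.4=10.100.102.106:30888:31888;30181")` yields server id 4 with quorum port 30888, election port 31888, and client port 30181.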
In the logs all is normal:
2023-02-17 13:45:30,107 [myid:1] - WARN [QuorumConnectionThread-[myid=1]-3:QuorumCnxManager#401] - Cannot open channel to 4 at election address /10.100.102.106:31888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
2023-02-17 13:45:30,113 [myid:1] - WARN [QuorumConnectionThread-[myid=1]-4:QuorumCnxManager#401] - Cannot open channel to 5 at election address /10.100.102.232:31889
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
I tried all types of Services (ClusterIP, headless and not, LoadBalancer, and NodePort). In the end I figured the simplest way to go is no Service plus hostNetwork: true on the StatefulSet. This way the ports are mapped directly to the k8s nodes, with no proxy/SNAT/DNAT/xNAT :), so I can target them directly. Again, this is not recommended, but it will do for the sake of this example.
kubectl -n infraservices get all
NAME READY STATUS RESTARTS AGE
pod/zk-0 0/1 Running 1 (65s ago) 2m45s
In the logs of the pod:
2023-02-17 14:00:48,308 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver#390] - Notification: my state:LOOKING; n.sid:4, n.state:LOOKING, n.leader:4, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2023-02-17 14:00:48,315 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver#308] - 4 Received version: 1600000000 my version: 0
2023-02-17 14:00:48,315 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver#316] - restarting leader election
2023-02-17 14:00:48,315 [myid:4] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker#1408] - Interrupting SendWorker thread from RecvWorker. sid: 2. myId: 4
2023-02-17 14:00:48,393 [myid:4] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker#1408] - Interrupting SendWorker thread from RecvWorker. sid: 3. myId: 4
2023-02-17 14:00:48,394 [myid:4] - INFO [QuorumPeerListener:QuorumCnxManager$Listener#985] - Leaving listener
2023-02-17 14:00:48,395 [myid:4] - WARN [SendWorker:2:QuorumCnxManager$SendWorker#1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:00:48,395 [myid:4] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#1402] - Connection broken for id 1, my id = 4
java.net.SocketException: Socket closed
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/java.net.SocketInputStream.read(Unknown Source)
at java.base/java.io.BufferedInputStream.fill(Unknown Source)
at java.base/java.io.BufferedInputStream.read(Unknown Source)
at java.base/java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
2023-02-17 14:00:48,395 [myid:4] - WARN [SendWorker:3:QuorumCnxManager$SendWorker#1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:00:48,395 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver#472] - WorkerReceiver is down
2023-02-17 14:00:48,395 [myid:4] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
On the VM the logs look like this:
2023-02-17 14:03:42,165 [myid:1] - INFO [ListenerHandler-inzzk01/10.100.100.128:3888:QuorumCnxManager$Listener$ListenerHandler#1076] - Received connection request from /10.100.102.106:51674
2023-02-17 14:03:42,167 [myid:1] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1402] - Connection broken for id 4, my id = 1
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
2023-02-17 14:03:42,167 [myid:1] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1408] - Interrupting SendWorker thread from RecvWorker. sid: 4. myId: 1
2023-02-17 14:03:42,167 [myid:1] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:03:42,167 [myid:1] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#1300] - Send worker leaving thread id 4 my id = 1
I am able to connect from the pod to the cluster with zkCli.sh
root#inccl02az12-23rpb-mvnnw:/apache-zookeeper-3.7.1-bin# bin/zkCli.sh -timeout 3000 -server inzzk01:2181
[zk: inzzk01:2181(CONNECTED) 6] get /zookeeper/config
server.1=inzzk01:2888:3888:participant;0.0.0.0:2181
server.2=inzzk02:2888:3888:participant;0.0.0.0:2181
server.3=inzzk03:2888:3888:participant;0.0.0.0:2181
version=1600000000
[zk: inzzk01:2181(CONNECTED) 7]
So how can I connect at least one ZooKeeper node running as a pod in k8s to an existing cluster outside k8s?
I am trying to run this GitHub repo on Docker as specified by the instructions:
Aidbox with Timescale DB
However, when running the containers, one of them throws this error message:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
WARNING: read already refers to: #'clojure.core/read in namespace: aidbox.sdk.crud, being replaced by: #'aidbox.sdk.crud/read
WARNING: update already refers to: #'clojure.core/update in namespace: aidbox.sdk.crud, being replaced by: #'aidbox.sdk.crud/update
Exception in thread "main" com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Connection to localhost:5488 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at com.zaxxer.hikari.pool.HikariPool.throwPoolInitializationException(HikariPool.java:597)
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:576)
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
at app.db$create_pool.invokeStatic(db.clj:45)
at app.db$create_pool.invoke(db.clj:42)
at app.db$datasource.invokeStatic(db.clj:70)
at app.db$datasource.invoke(db.clj:61)
at app.core$mk_connection.invokeStatic(core.clj:234)
at app.core$mk_connection.invoke(core.clj:231)
at app.core$_main.invokeStatic(core.clj:241)
at app.core$_main.invoke(core.clj:240)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.AFn.applyTo(AFn.java:144)
at app.core.main(Unknown Source)
Caused by: org.postgresql.util.PSQLException: Connection to localhost:5488 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.Driver$ConnectThread.getResult(Driver.java:409)
at org.postgresql.Driver.connect(Driver.java:267)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:98)
at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:83)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:353)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:201)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:473)
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:562)
... 13 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.postgresql.core.PGStream.<init>(PGStream.java:75)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.access$100(Driver.java:57)
at org.postgresql.Driver$ConnectThread.run(Driver.java:368)
at java.lang.Thread.run(Thread.java:748)
I have tried changing the database port (which I think is what it's trying to connect to) from 5432 to 5488, but I get the same error. I have read other Stack Overflow posts on this error message, but they haven't helped either.
I have also tried to speak with the company that made this project, but understandably, since this is an open-source project from which they make no money, fixing this issue is not among their priorities.
So hopefully one of the brilliant minds roaming this site can help me out.
If you want more info on what the repo is for
Bringing data from wearables and medical IoT devices to FHIR solutions
I see container_name: timescaledb in the docker-compose file. Hence, if you are connecting from another container on the same network, the host will be timescaledb and the port will be the default, 5432.
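Inside a container, localhost refers to that container itself, which is why a connection to localhost:5488 is refused when the database runs in a different container. A minimal Python sketch for checking what a given host and port actually do from inside a container follows; the function name `probe` is hypothetical, and the sketch only tests TCP reachability, not whether PostgreSQL accepts the login.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all, typically a firewall or an unroutable address.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. the host name could not be resolved.
        return f"error: {exc}"
```

Run from the application container, `probe("timescaledb", 5432)` versus `probe("localhost", 5488)` would show which of the two addresses is actually reachable. "refused" and "timeout" correspond to the Connection refused and Connection timed out exceptions seen in the Java stack traces.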
With Ignite 2.7.6, when trying to bring up an embedded Ignite server node (in a Spring Boot app) on a Docker bridge network with a simple configuration, the server startup fails with the error below:
[10:16:16] Ignite node started OK (id=e7276b83)
[10:16:16] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
[10:16:16] Topology snapshot [ver=1, locNode=e7276b83, servers=1, clients=0, state=INACTIVE, CPUs=1, offheap=0.1GB, heap=0.4GB]
mediation-service - [INFO ] 10:16:16.981 [main] com.**.**.perfmon.common.spring.EmbeddedIgnite - ====>>> Activating Ignite Cluster
mediation-service - [WARN ] 10:16:17.383 [exchange-worker-#49] org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager - Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
[10:16:17] Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
mediation-service - [ERROR] 10:16:21.982 [tcp-disco-srvr-#3] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5845)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:5763)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-19887] sun.rmi.transport.tcp - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-0] sun.rmi.transport.tcp - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33254] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 10:16:21.984 [tcp-disco-srvr-#3] - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]
Below is the relevant config.
Ignite config xml snippet:
....
....
<property name="discoverySpi">
<bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"/>
</property>
</bean>
</property>
....
....
docker-compose snippet:
services:
***-mediation-service:
image: ***/mediation-service:latest
build: .
environment:
- PERCENTAGE_OF_RAM_FOR_HEAP=80.0
- SERVICE_NAME=mediation-service
- SERVICE_PORT=9887
- IGNITE_TCP_DISCOVERY_ADDRESSES=localhost
- JAVA_TOOL_OPTIONS=-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.rmi.port=19887
-Dcom.sun.management.jmxremote.port=19887
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=$HOST_IP
-Djava.net.preferIPv4Stack=true
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=29887
...
...
networks:
- something-mediation-network
networks:
something-mediation-network:
driver: bridge
ipam:
driver: default
config:
- subnet: 186.30.240.0/24
Does anyone know what's going on here?
Thanks
Muthu
UPDATE (11/13/2020): I tried the same with 2.9.0 as suggested by @alamar, but with the same result. Please see below:
mediation-service - [ERROR] 01:03:16.871 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-19887] sun.rmi.transport.tcp - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-0] sun.rmi.transport.tcp - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33351] throws
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 01:03:16.876 [tcp-disco-srvr-[:47500]-#3-#50] - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:17.271 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.internal.processors.cache.CacheDiagnosticManager - Page locks dump:
UPDATE (11/18/2020):
I have another update: if I use Java 8 instead of Java 11, I don't see this issue during cluster activation and things work.
So I suspect this has something to do with the underlying Java library or its dependencies.
The error means that the socket has a timeout set, and no incoming message was received during the timeout.
The funny thing is that the socket that Ignite creates has no timeout! Which suggests a bug somewhere...
... and this time it's in Java: JDK-8237858. The bug description says that the accept can be interrupted by a signal (which is expected), and that causes Java to throw the error (which is the bug).
According to the OpenJDK Jira, this doesn't affect Java 8. It is fixed in Java 16, and it also doesn't affect Java 13 with default settings.
I don't see mentions of fixes in Java 11 maintenance releases, though.
UPDATE: There is a fix for this in Ignite 2.12. Basically, Ignite had to embed a workaround for the bug in its own code.
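One plausible shape for such a workaround is to treat an unexpected timeout from accept as a signal to retry rather than as a fatal error. The Python sketch below only illustrates that pattern. It is not Ignite's actual code, Python does not have this JDK bug, and the function name `accept_retrying` and the `max_spurious` limit are assumptions made for the illustration.

```python
import socket


def accept_retrying(server: socket.socket, max_spurious: int = 100):
    """Accept one connection, retrying when accept() reports a timeout.

    Gives up and re-raises after more than max_spurious consecutive
    timeouts, so a genuinely dead listener does not spin forever.
    """
    spurious = 0
    while True:
        try:
            return server.accept()
        except socket.timeout:
            spurious += 1
            if spurious > max_spurious:
                raise
```

In a real accept loop the retry would apply only when the listening socket was created without a timeout, since in that case a timeout cannot be legitimate.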
I'm trying to publish MQTT messages to iot.eclipse.org:1883. I installed the heater bundle in Kura 3.0.0 on my Raspberry Pi B (running Raspbian), set the 'broker-url' in the MqttDataTransport to the Eclipse sandbox broker, and set 'connect.auto-on-startup' to true.
My status did not change after that. I then displayed kura.log and realized that the DataPublisherService did not connect and the CloudService stayed disconnected:
o.e.k.d.h.Heater - Getting CloudClient for heater...
2017-07-21 12:15:20,836 [Component Resolve Thread (Bundle 7)] INFO o.e.k.d.h.Heater - Activating Heater... Done.
2017-07-21 12:15:21,034 [Component Resolve Thread (Bundle 7)] INFO o.e.k.c.c.ConfigurationServiceImpl - Registering ConfigurableComponent - org.eclipse.kura.demo.heater.Heater....
2017-07-21 12:15:21,040 [Component Resolve Thread (Bundle 7)] INFO o.e.k.c.c.ConfigurationServiceImpl - Registering ConfigurableComponent - org.eclipse.kura.demo.heater.Heater....Done
2017-07-21 12:15:22,974 [] INFO o.e.k.d.h.Heater - Published to data message: org.eclipse.kura.message.KuraPayload#248966
2017-07-21 12:15:22,986 [DataServiceImpl:Submit] INFO o.e.k.c.d.DataServiceImpl - DataPublisherService not connected
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:70)
at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:650)
... 1 more
MqttDataTransport - xxxxx Connect failed. Forcing disconnect. xxxxx {}
o.e.k.c.d.t.m.MqttDataTransport - Closing client...
o.e.k.c.d.t.m.MqttDataTransport - Closed
What must I do? Thanks in advance.
I have an Ubuntu master machine and a Windows 10 slave machine.
I need to connect to the slave machine (Windows) from the master machine (Ubuntu) over SSH.
I followed the link below:
https://devopscube.com/setup-slaves-on-jenkins-2/
I attached the node details and global credential configuration.
I ran into the issue below while launching. Please help me with this.
[05/08/17 06:26:10] [SSH] Opening SSH connection to 172.16.108.233:22.
Connection timed out (Connection timed out)
SSH Connection failed with IOException: "Connection timed out (Connection timed out)".
java.io.IOException: There was a problem while connecting to 172.16.108.233:22
at com.trilead.ssh2.Connection.connect(Connection.java:818)
at com.trilead.ssh2.Connection.connect(Connection.java:687)
at com.trilead.ssh2.Connection.connect(Connection.java:587)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1198)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:724)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:719)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.trilead.ssh2.transport.TransportManager.establishConnection(TransportManager.java:354)
at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:467)
at com.trilead.ssh2.Connection.connect(Connection.java:758)
... 9 more
[05/08/17 06:28:21] Launch failed - cleaning up connection
[05/08/17 06:28:21] [SSH] Connection closed.
Note: only Git is installed on the slave machine (Jenkins is not installed).
In the key configuration you should put the private key, not the public key.
Also, you don't need to install Jenkins on the slave machine.
How can I detect this exception when I start my application?
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out: connect
The project is configured the Spring way (http://projects.spring.io/spring-amqp/) and I can see the errors in the logs, but I can't detect them in my Java classes.
Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out: connect
Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
Failed to check/redeclare auto-delete queue(s).org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out: connect
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:54)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:217)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:444)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:80)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:130)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:67)
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1035)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1028)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1004)
at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueProperties(RabbitAdmin.java:254)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.redeclareElementsIfNecessary(SimpleMessageListenerContainer.java:947)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$300(SimpleMessageListenerContainer.java:82)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1065)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:615)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:651)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:208)
... 12 more
This can happen for example when you have no access to the AMQP port.
Thank you!
There is currently no event emitted when this error occurs (I have created a JIRA issue to add one).
In the meantime, since the connection is shared, you could call createConnection() on the connection factory from your own code from time to time to check the connection state.
Or you could hook into the logging subsystem to capture the WARN log, as I described in this answer.
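The polling approach (calling createConnection() periodically) can be sketched generically as follows. This is a minimal Python illustration of the pattern, not Spring AMQP code: `ConnectionWatcher`, `connect`, and `on_change` are hypothetical names. In the Java application, `connect` would correspond to a call to createConnection() on the connection factory, and the failure case to catching AmqpConnectException.

```python
import threading
from typing import Callable, Optional


class ConnectionWatcher:
    """Periodically try to connect and report state changes to a callback."""

    def __init__(self, connect: Callable[[], None],
                 on_change: Callable[[bool], None]) -> None:
        self._connect = connect      # raises an exception when the broker is unreachable
        self._on_change = on_change  # called with True (up) or False (down)
        self._last: Optional[bool] = None

    def check_once(self) -> bool:
        """Run one check; invoke the callback only if the state changed."""
        try:
            self._connect()
            ok = True
        except Exception:
            ok = False
        if ok != self._last:
            self._last = ok
            self._on_change(ok)
        return ok

    def run(self, interval: float, stop: threading.Event) -> None:
        """Check every `interval` seconds until `stop` is set."""
        while not stop.is_set():
            self.check_once()
            stop.wait(interval)
```

Reporting only state changes keeps the callback quiet while the broker stays down, so the application sees one "down" notification and one "up" notification instead of one per poll.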