The scenario consists of 2 VMs, each one running Docker with Hazelcast members as containers.
Following this guide https://hazelcast.com/blog/configuring-hazelcast-in-non-orchestrated-docker-environments/ I can get Scenario 3 (public IP address, port mapping, and TCP discovery method) working with one member per node.
But if I add a member to one of the nodes, it takes the place of the other member in the cluster or logs connection problems. So I'm not able to get the cluster working with more than one member per node.
The configuration on both nodes is:
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.132.0.2:5701
          - 10.128.0.3:5701
          - 10.128.0.3:5702
The container on node 10.132.0.2 is run with:
docker run -v `pwd`:/mnt --rm --name member1 -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml" -p 5701:5701 hazelcast/hazelcast:4.0.1
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml com.hazelcast.core.server.HazelcastMemberStarter
Sep 29, 2020 6:35:23 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:35:23 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.2]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.132.0.2]:5701
Sep 29, 2020 6:35:24 AM com.hazelcast.system
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.132.0.2]:5701
Sep 29, 2020 6:35:24 AM com.hazelcast.system
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:35:24 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:35:25 AM com.hazelcast.instance.impl.Node
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:35:25 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.132.0.2]:5701 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:35:26 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:35:26 AM com.hazelcast.core.LifecycleService
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5701 is STARTING
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5702 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.128.0.3]:5702 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5703 is added to the blacklist.
Sep 29, 2020 6:35:37 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.132.0.2]:5701 [dev] [4.0.1]
Members {size:1, ver:1} [
Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed this
]
Sep 29, 2020 6:35:37 AM com.hazelcast.core.LifecycleService
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5701 is STARTED
The containers on node 10.128.0.3 are run with:
docker run -v `pwd`:/mnt --rm --name member2 -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.128.0.3:5701 -Dhazelcast.config=/mnt/hazelcast.yml" -p 5701:5701 hazelcast/hazelcast:4.0.1
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5701 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
Sep 29, 2020 6:36:54 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:36:54 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.2]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.128.0.3]:5701
Sep 29, 2020 6:36:55 AM com.hazelcast.system
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.128.0.3]:5701
Sep 29, 2020 6:36:55 AM com.hazelcast.system
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:36:56 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:36:56 AM com.hazelcast.instance.impl.Node
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:36:56 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.128.0.3]:5701 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:36:58 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:36:58 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is STARTING
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnection
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Initialized new cluster connection between /172.17.0.2:56429 and /10.132.0.2:5701
Sep 29, 2020 6:37:05 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.128.0.3]:5701 [dev] [4.0.1]
Members {size:2, ver:2} [
Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed
Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e this
]
Sep 29, 2020 6:37:06 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is STARTED
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]
So far everything is OK, but when I start member3:
docker run -v `pwd`:/mnt --rm --name member3 -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml" -p 5702:5701 hazelcast/hazelcast:4.0.1
########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml com.hazelcast.core.server.HazelcastMemberStarter
Sep 29, 2020 6:38:26 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:38:26 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.3]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.128.0.3]:5702
Sep 29, 2020 6:38:26 AM com.hazelcast.system
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.128.0.3]:5702
Sep 29, 2020 6:38:26 AM com.hazelcast.system
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:38:27 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:38:27 AM com.hazelcast.instance.impl.Node
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:38:27 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:38:27 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:38:27 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:38:27 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5702 [dev] [4.0.1] [10.128.0.3]:5702 is STARTING
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnection
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Initialized new cluster connection between /172.17.0.3:52951 and /10.132.0.2:5701
Sep 29, 2020 6:38:35 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.128.0.3]:5702 [dev] [4.0.1]
Members {size:3, ver:3} [
Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed
Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Member [10.128.0.3]:5702 - 0dd31ea2-db2e-4e43-941a-98592e222817 this
]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 5
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.cluster.impl.MembershipManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e is suspected to be dead for reason: No connection
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:27 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 6
Sep 29, 2020 6:39:32 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:37 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:42 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:47 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 7
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:52 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:57 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 8
It looks like there is a communication problem between the members on the same node.
In another test, member3 took member2's place in the cluster and marked the connection attempts from node 2 as suspect.
The VMs are created fresh on GCP and are on the same network. I used this image:
Google, Container-Optimized OS, 85-13310.1041.9 stable, Kernel: ChromiumOS-5.4.49 Kubernetes: 1.18.9 Docker: 19.03.9 Family: cos-stable, supports Shielded VM features, supports Confidential VM features on N2D
The problem is caused by member3 thinking it is on port 5701 instead of 5702.
The solution is to specify in the configuration the port the member listens on, matching the port published on the Docker host.
The configuration for member3 is:
hazelcast:
  network:
    port:
      port: 5702
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.132.0.2:5701
          - 10.128.0.3:5701
          - 10.128.0.3:5702
With this configuration the cluster works and every member can communicate with the others.
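For completeness, a sketch of the member3 run command that goes with this configuration, assuming the host port mapping is also changed so the published port matches the new in-container listen port:
docker run -v `pwd`:/mnt --rm --name member3 -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml" -p 5702:5702 hazelcast/hazelcast:4.0.1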
Your scenario should work, so I guess it is some environmental issue. Could you try the following 2 things?
Explicitly set the port also on member1.
Use :5701 also in the member1 configuration - both in the member-list (hazelcast.yml) and in the hazelcast.local.publicAddress property (command line).
This step probably changes nothing for your problem, but it should at least avoid unrelated warnings in the logs.
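A minimal sketch of what that could look like for member1, derived from the configuration above (the explicit port block and the :5701 suffix are the only changes):
hazelcast:
  network:
    port:
      port: 5701
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.132.0.2:5701
          - 10.128.0.3:5701
          - 10.128.0.3:5702
and on the command line: -Dhazelcast.local.publicAddress=10.132.0.2:5701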
Check whether member2 is reachable from member3.
After you start your nodes, execute an interactive shell in member3 and try to send protocol challenge (HZC) to member2's socket address. If you see the protocol response (HZC) in the terminal, then the communication between containers works properly. If you don't see the response, then check your Docker and firewall configuration to see what can cause the issue.
docker exec -it member3 /bin/bash
# Following command runs in container.
# The first HZC line is the one you type (followed by Enter).
# The second is the reply from member2.
bash-5.0# nc 10.128.0.3 5701
HZC
HZC
If you see the correct protocol response in the terminal we will need to investigate the Hazelcast behavior more. I was not able to reproduce the problem in my environment.
If you want to take this a little further, Hazelcast supports lifecycle calls into containers and from orchestration managers (e.g., Kubernetes and OpenShift); for this you use the Hazelcast network discovery module for that particular platform. This frees you from hard-coding addresses and ports and allows those assignments to be made dynamically at runtime.
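As an illustration, a minimal sketch of the Hazelcast 4.x Kubernetes discovery join configuration; service-name is a placeholder for whatever Kubernetes Service exposes your members:
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      kubernetes:
        enabled: true
        service-name: hazelcast-service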
I am running the Jenkins master & the K8s master on the same server. Jenkins runs through Tomcat/Apache (not on the K8s cluster). I have another server for the K8s worker node; CentOS 8 is installed on both servers. I have configured the Jenkins Kubernetes plugin, version 1.26.4, but while running a pipeline job I always get an error. Below is the K8s cluster Jenkins agent pod log.
[root@K8s-Master /]# kubectl logs -f pipeline-test-33-sj6tl-r0clh-g559d -c jnlp
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: pipeline-test-33-sj6tl-r0clh-g559d
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 08, 2020 8:37:21 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Aug 08, 2020 8:37:21 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Aug 08, 2020 8:37:21 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Aug 08, 2020 8:37:21 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-serverjenkins/]
Aug 08, 2020 8:37:41 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins-server/jenkins/tcpSlaveAgentListener/: jenkins-server
java.io.IOException: Failed to connect to http://jenkins-serverjenkins/tcpSlaveAgentListener/: jenkins-server
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
at hudson.remoting.Engine.innerRun(Engine.java:693)
at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins-server
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
... 2 more
The below settings are already enabled:
Manage Jenkins --> Configure Global Security --> Agents: Random [Enabled]
I am successfully able to communicate from my Jenkins to the K8s master cluster (verified in the Jenkins Cloud section).
In the K8s master, all the namespace pods are running and the weave-net CNI is installed. I don't know what is causing the problem during agent provisioning through Jenkins.
My Jenkins/K8s master & K8s worker node /etc/hosts is as follows:
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
75.76.77.5 jenkins-server jenkins-server.company.domain.com
75.76.77.6 k8s-node-1 k8s-node-1.company.domain.com
I get the below output on the K8s worker node. It looks like there is no problem connecting to the jenkins-master from the K8s worker node.
# curl -I http://jenkins-server/jenkins/tcpSlaveAgentListener/
HTTP/1.1 200
Server: nginx/1.14.1
Date: Fri, 28 Aug 2020 06:13:34 GMT
Content-Type: text/plain;charset=UTF-8
Connection: keep-alive
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Content-Type-Options: nosniff
X-Hudson-JNLP-Port: 40021
X-Jenkins-JNLP-Port: 40021
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnkgz8Av2x8R9R2KZDzWm1K11O01r7VDikW48rCNQlgw/pUeNSPJu9pv7kH884tOE65GkMepNdtJcOFQFtY1qZ0sr5y4GF5TOc7+U/TqfwULt60r7OQlKcrsQx/jJkF0xLjR+xaJ64WKnbsl0AiZhd8/ynk02UxFXKcgwkEP2PGpGyQ1ps5t/yj6ueFiPAHX2ssK8aI7ynVbf3YyVrtFOlqhnTy11mJFoLAZnpjYRCJsrX5z/xciVq5c2XmEikLzMpjFl0YBAsDo7JL4eBUwiBr64HPcSKrsBBB9oPE4oI6GkYUCAni8uOLfzoNr9B1eImaETYSdVPdSKW/ez/OeHjQIDAQAB
X-Jenkins-Agent-Protocols: JNLP4-connect, Ping
X-Remoting-Minimum-Version: 3.14
# curl http://jenkins-server:40021/
Jenkins-Agent-Protocols: JNLP4-connect, Ping
Jenkins-Version: 2.235.3
Jenkins-Session: 4455fd45
Client: 75.76.77.6
Server: 75.76.77.5
Remoting-Minimum-Version: 3.14
It looks like Kubernetes DNS is not resolving the name, so any pointers to resolve this problem would help. Thanks.
It was a Kubernetes DNS resolution issue. With the help of the following link - https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution - I created the dnsutils.yaml pod and found that my K8s cluster pods were returning the error "connection timed out; no servers could be reached" for the below command.
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
So I uninstalled and re-installed Kubernetes, version v1.19.0. Now everything is working fine. Thanks!
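For reference, the dnsutils pod was created from essentially the manifest in the linked page (the image tag is the one published there at the time; adjust if needed):
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
It is applied with kubectl apply -f dnsutils.yaml before running the nslookup command above.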
I have been facing this issue for the past 8 months (since Dec 2019) and decided to post it here. I am connecting Jenkins agents - inbound TCP (JNLP) - across Azure VNets and subscriptions.
The agents within the same VNet/subscription as the master connect without any issue, and no ping timeouts occur. However, Jenkins agents residing in other VNets and subscriptions very often get disconnected due to a ping timeout. These agents are in AKS clusters, built upon the openjdk:8-jdk-alpine image, and run as pods managed by Deployments. We use port 50000 as a static port for all JNLP connections.
The logs for the agents in the other VNets state:
Jul 25, 2020 12:03:33 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /opt/workspace/remoting as a remoting work directory
Jul 25, 2020 12:03:33 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /opt/workspace/remoting
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-jmeter
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jul 25, 2020 12:03:34 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.33
Jul 25, 2020 12:03:34 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /opt/workspace/remoting as a remoting work directory
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://jenkins.internaldomain.com/]
Jul 25, 2020 12:03:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: jenkins.internaldomain.com
Agent port: 50000
Identity: 70:35:e7:e9:31:ed:a3:1f:xx:xx:xx:xx:xx:xx:xx:xx
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.internaldomain.com:50000
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Jul 25, 2020 12:03:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 70:35:e7:e9:31:ed:a3:1f:xx:xx:xx:xx:xx:xx:xx:xx
Jul 25, 2020 12:03:35 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Jul 25, 2020 12:27:37 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel JNLP4-connect connection to jenkins.internaldomain.com/10.177.xxx.xxx:50000.
java.util.concurrent.TimeoutException: Ping started at 1595679817121 hasn't completed by 1595680057121
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Jul 25, 2020 12:32:37 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel JNLP4-connect connection to jenkins.internaldomain.com/10.177.xxx.xxx:50000.
java.util.concurrent.TimeoutException: Ping started at 1595680117120 hasn't completed by 1595680357121
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Jul 25, 2020 12:37:37 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel JNLP4-connect connection to jenkins.internaldomain.com/10.177.xxx.xxx:50000.
java.util.concurrent.TimeoutException: Ping started at 1595680417120 hasn't completed by 1595680657122
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Jul 25, 2020 12:39:20 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jul 25, 2020 12:39:30 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Performing onReconnect operation.
Jul 25, 2020 12:39:30 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect
INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter#3ac9ecc9
Jul 25, 2020 12:39:32 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /opt/workspace/remoting as a remoting work directory
Jul 25, 2020 12:39:32 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /opt/workspace/remoting
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-jmeter
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jul 25, 2020 12:39:32 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.33
Jul 25, 2020 12:39:32 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /opt/workspace/remoting as a remoting work directory
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://jenkins.internaldomain.com/]
Jul 25, 2020 12:39:32 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: jenkins.internaldomain.com
Agent port: 50000
Identity: 70:35:e7:e9:31:ed:a3:1f:xx:xx:xx:xx:xx:xx:xx:xx
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.internaldomain.com:50000
Jul 25, 2020 12:39:32 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Jul 25, 2020 12:39:33 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 70:35:e7:e9:31:ed:a3:1f:xx:xx:xx:xx:xx:xx:xx:xx
Jul 25, 2020 12:39:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
ICMP ping (echo) is disabled per org policies, and this blog here states that the Remoting ping is different from ICMP ping, so ICMP being disabled shouldn't be one of the worries (that's what I think).
We tried disabling the ping on the Jenkins master and agents, but that didn't work.
Initially we ran the master on port 80 and it was blocked, hence we suspected an issue with the non-secure port. We configured the master with SSL on port 443, but the issue was still present.
We found that the Azure idle timeout is 4 minutes while the Jenkins default ping interval is 5 minutes (300 seconds), so I have tried setting these configuration properties:
hudson.slaves.ChannelPinger.pingIntervalSeconds:
  Default: 300
  Custom: 108
  Description: Frequency of pings between the master and agents, in seconds
hudson.slaves.ChannelPinger.pingTimeoutSeconds:
  Default: 240
  Custom: no changes
  Description: Timeout for each ping between the master and agents, in seconds
Still, no luck.
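For reference, we applied those properties as JVM options when starting the master, along these lines (the exact mechanism depends on how your master is launched):
JAVA_OPTS="$JAVA_OPTS -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=108 -Dhudson.slaves.ChannelPinger.pingTimeoutSeconds=240"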
Has anyone faced an issue like this before? All I can find is a similar issue that happened with Windows agents.
(The Jenkins agent names, identity, master DNS, and IP have been changed to generic values.)
The Jenkins master and agent can keep a connection within the same VNet but cannot connect across VNets; that would mean the specific port might be blocked across the VNets. You will need to enable the ping port's network traffic across the VNets using Network Security Groups (NSGs). You can read about it from the link.
Once you have enabled that, you can create a VM in the VNet of the Jenkins master and try connecting to the Jenkins agent's ping port (using telnet or a similar tool). If you are able to connect, the NSG is not blocking the traffic; otherwise, the NSG is blocking it.
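As a sketch, an NSG rule opening the static JNLP port could look like this with the Azure CLI; my-rg and my-agents-nsg are placeholder names, and the direction and scope must match your topology:
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name my-agents-nsg \
  --name AllowJnlp50000 \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 50000
From the test VM you can then check reachability with, for example, telnet jenkins.internaldomain.com 50000.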
I can't run a Jenkins image with Docker.
It gets stuck while running:
afik@ubuntu:~$ docker run --name myjenkins -p 8080:8080 -p 50000:50000 -v /var/jenkins_home jenkins > out1
Aug 23, 2018 10:34:41 AM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Aug 23, 2018 10:34:41 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @422ms
Aug 23, 2018 10:34:41 AM winstone.Logger logInternal
INFO: Beginning extraction from war file
Aug 23, 2018 10:34:42 AM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Empty contextPath
Aug 23, 2018 10:34:42 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-9.2.z-SNAPSHOT
Aug 23, 2018 10:34:42 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Aug 23, 2018 10:34:43 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started w.@47404bea{/,file:/var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Aug 23, 2018 10:34:43 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started ServerConnector@1252b961{HTTP/1.1}{0.0.0.0:8080}
Aug 23, 2018 10:34:43 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started @2442ms
Aug 23, 2018 10:34:43 AM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Aug 23, 2018 10:34:43 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Aug 23, 2018 10:34:43 AM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Aug 23, 2018 10:34:44 AM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Aug 23, 2018 10:34:44 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Aug 23, 2018 10:34:44 AM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Aug 23, 2018 10:34:44 AM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Aug 23, 2018 10:34:45 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Aug 23, 2018 10:34:45 AM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Aug 23, 2018 10:34:45 AM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Aug 23, 2018 10:34:45 AM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7147f78f: display name [Root WebApplicationContext]; startup date [Thu Aug 23 10:34:45 UTC 2018]; root of context hierarchy
Aug 23, 2018 10:34:45 AM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7147f78f]: org.springframework.beans.factory.support.DefaultListableBeanFactory@6dbf29b1
Aug 23, 2018 10:34:45 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@6dbf29b1: defining beans [authenticationManager]; root of factory hierarchy
Aug 23, 2018 10:34:45 AM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@74caac54: display name [Root WebApplicationContext]; startup date [Thu Aug 23 10:34:45 UTC 2018]; root of context hierarchy
Aug 23, 2018 10:34:45 AM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@74caac54]: org.springframework.beans.factory.support.DefaultListableBeanFactory@4e956477
Aug 23, 2018 10:34:45 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4e956477: defining beans [filter,legacy]; root of factory hierarchy
Aug 23, 2018 10:34:45 AM jenkins.install.SetupWizard init
INFO:
*************************************************************
*************************************************************
*************************************************************
Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:
ac6f4e5afd7c4a7f8ee2d360b3c5649d
This may also be found at: /var/jenkins_home/secrets/initialAdminPassword
*************************************************************
*************************************************************
*************************************************************
Aug 23, 2018 10:34:50 AM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Aug 23, 2018 10:34:51 AM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
Aug 23, 2018 10:34:51 AM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Aug 23, 2018 10:34:53 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
Aug 23, 2018 10:34:54 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tools.JDKInstaller
Aug 23, 2018 10:34:54 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 9,318 ms
^CAug 23, 2018 10:36:00 AM winstone.Logger logInternal
INFO: JVM is terminating. Shutting down Winstone
afik@ubuntu:~$
You actually are running Jenkins... you're just doing it "interactively." That is, you're attached to the process that the docker run command kicked off. If you look at the Dockerfile for the Jenkins image, you'll see:
ENTRYPOINT ["/bin/tini", "--", "/usr/local/bin/jenkins.sh"]
This basically says that, when you run a Jenkins container, it will kick off that shell script. When you get to the point of seeing this:
INFO: Finished Download metadata. 9,318 ms
Your Jenkins container is up and running. You should be able to open a browser and point it to http://localhost:8080/, where you should see the Jenkins UI.
In order to not have your terminal window sit on the command like that, you need to instruct docker to detach from the container (sort of like running a process in the background). If you run docker run --help, you'll see the following entry:
-d, --detach Run container in background and print container ID
So you just need to use the -d flag in your command:
docker run --name myjenkins -p 8080:8080 -p 50000:50000 -v /var/jenkins_home -d jenkins
Instead of showing you the logs, it will display a container ID (a long alphanumeric string). Now, if you want to see the logs (for instance, to get the Jenkins-generated password), you can use:
docker logs myjenkins
(that's the name you assigned to the container with the --name option) or
docker logs [container ID]
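For example, once the container runs detached, the generated admin password mentioned in the startup log can be read straight out of the container:
docker exec myjenkins cat /var/jenkins_home/secrets/initialAdminPassword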
We started to monitor Docker events in our K8s cluster and noticed there are a lot of Kill/Die/Stop/Destroy events for various containers in a short time period.
Is that normal? (I assume it's not.)
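For context, we watch the stream with filters roughly like this (just the event types in question):
docker events --filter 'type=container' --filter 'event=kill' --filter 'event=die' --filter 'event=stop' --filter 'event=destroy'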
Apparently it is not a capacity problem:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Thu, 16 Aug 2018 11:19:30 -0300 Tue, 14 Aug 2018 14:02:37 -0300 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 16 Aug 2018 11:19:30 -0300 Tue, 14 Aug 2018 14:02:37 -0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 16 Aug 2018 11:19:30 -0300 Tue, 14 Aug 2018 14:02:37 -0300 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 16 Aug 2018 11:19:30 -0300 Fri, 11 May 2018 16:37:48 -0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 16 Aug 2018 11:19:30 -0300 Tue, 14 Aug 2018 14:02:37 -0300 KubeletReady kubelet is posting ready status
All pods show status "Running".
Any tips on how to debug this further?
You can inspect the Docker container status with the following command on the node hosts where the pods run:
docker inspect <container id>
More options are described here.
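For example, to see at a glance whether a container exited on its own or was OOM-killed, a Go-template query like this works:
docker inspect --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' <container id>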
Event logs and journal logs are also helpful for debugging:
kubectl get events
journalctl --no-pager
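For example, time-sorted cluster events plus the kubelet's own log (assuming the kubelet runs as a systemd unit) often show who issued the kills:
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
journalctl -u kubelet --no-pager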