Spring amqp stop endless retrying loop, if rabbitmq service is not working - spring-amqp

I'm having trouble with stopping infinite retrying when rabbitmq server is down. I tried using this code snippet
#Bean(name = "rabbitListenerContainerFactory")
public SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory(
SimpleRabbitListenerContainerFactoryConfigurer configurer,
ConnectionFactory connectionFactory) {
SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
configurer.configure(factory, connectionFactory);
BackOff recoveryBackOff = new FixedBackOff(5000, 1);
factory.setRecoveryBackOff(recoveryBackOff);
return factory;
}
But I am still getting endless loop of retrying
17:49:47,417 DEBUG o.s.a.r.l.BlockingQueueConsumer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Starting consumer Consumer#4f35699: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 17:49:49,431 WARN o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] stopping container - restart recovery attempts exhausted
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Shutting down Rabbit listener container
2020-08-31 17:49:49,431 INFO o.s.a.r.c.CachingConnectionFactory [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:49,431 INFO o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Waiting for workers to finish.
2020-08-31 17:49:49,431 INFO o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Successfully waited for workers to finish.
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Cancelling Consumer#e56de36: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.BlockingQueueConsumer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Closing Rabbit Channel: null
2020-08-31 17:49:51,434 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Recovering consumer in 5000 ms.
2020-08-31 17:49:51,434 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [main] Starting Rabbit listener container.
2020-08-31 17:49:51,434 INFO o.s.a.r.c.CachingConnectionFactory [main] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:53,485 INFO o.s.a.r.l.SimpleMessageListenerContainer [main] Broker not available; cannot force queue declarations during start: java.net.ConnectException: Connection refused: connect
2020-08-31 17:49:53,485 INFO o.s.a.r.c.CachingConnectionFactory [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#35-1] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:55,518 ERROR o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#35-1] Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused: connect
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:62) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
What I am trying to achieve, is that after one attempt of connecting, stop trying
Edit 1
I have my whole consumer side config here, I can't seem to find where another configurable container factory could be. The thing is, if I have fixed back off at for example 3000 ms, the message changes to Recovering consumer in 3000 ms.
#Configuration
#EnableRabbit
#AllArgsConstructor
public class RabbitConfig {
#Bean
public MessageConverter jsonMessageConverter() {
ObjectMapper jsonObjectMapper = new ObjectMapper();
jsonObjectMapper.registerModule(new JavaTimeModule());
return new Jackson2JsonMessageConverter(jsonObjectMapper);
}
#Bean
public RabbitErrorHandler rabbitExceptionHandler() {
return new RabbitErrorHandler();
}
#Bean(name = "rabbitListenerContainerFactory")
public SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory(
SimpleRabbitListenerContainerFactoryConfigurer configurer,
ConnectionFactory connectionFactory) {
SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
configurer.configure(factory, connectionFactory);
BackOff recoveryBackOff = new FixedBackOff(5000, 1);
factory.setRecoveryBackOff(recoveryBackOff);
return factory;
}
}
I'm using version 2.1.7

I just copied your code and it worked as expected:
2020-08-31 11:58:08.997 INFO 94891 --- [ main] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:09.002 INFO 94891 --- [ main] o.s.a.r.l.SimpleMessageListenerContainer : Broker not available; cannot force queue declarations during start: java.net.ConnectException: Connection refused (Connection refused)
2020-08-31 11:58:09.006 INFO 94891 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:09.014 INFO 94891 --- [ main] com.example.demo.So63673274Application : Started So63673274Application in 0.929 seconds (JVM running for 1.39)
2020-08-31 11:58:14.091 WARN 94891 --- [ntContainer#0-1] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused (Connection refused)
2020-08-31 11:58:14.093 INFO 94891 --- [ntContainer#0-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer#6f2cb653: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 11:58:14.094 INFO 94891 --- [ntContainer#0-2] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:14.095 WARN 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : stopping container - restart recovery attempts exhausted
2020-08-31 11:58:14.095 INFO 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2020-08-31 11:58:14.096 INFO 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
I see you have
stopping container - restart recovery attempts exhausted
too.
Perhaps the listener that keeps trying is from a different container factory?
Recovering consumer in 5000 ms.

Related

Configuration for Quarkus and Keycloak on Docker

I have the following setup:
Keycloak running in docker, public interface mapped to 127.0.0.1:8180, internal keycloak-n:8080
Quarkus running in docker, public interface mapped to 127.0.0.1:8080
Both run in the same docker network and can communicate
An external AutzClient (not in docker) that uses the token communicate with quarkus
Everything works if client and quarkus are outside of Docker and communicate with keycloak via the same interface. As soon as quarkus is in docker, I can't get it to work.
I've tried many changes so far. On keycloak I set the frontendUrl with /subsystem=keycloak-server/spi=hostname/provider=default:write-attribute(name=properties.frontendUrl="http://127.0.0.1:8180/auth"
My current quarkus config (oidc part) looks like:
# OIDC Configuration
quarkus.oidc.auth-server-url=http://keycloak-n:8080/auth/realms/quarkus
quarkus.oidc.client-id=backend-service
quarkus.oidc.credentials.secret=85174256-b231-4385-9fa9-257dd0d27bf0
quarkus.oidc.token.lifespan-grace=20
quarkus.oidc.introspection-path=.well-known/openid-configuration
quarkus.oidc.jwks-path=.well-known/jwks.json
quarkus.oidc.token.issuer=http://127.0.0.1:8180/auth/realms/quarkus
# Enable Policy Enforcement
quarkus.keycloak.policy-enforcer.enable=true
If I remove the token issuer, I get from vertx a issuer validation failed. With the current configuration the initial auth works, but than I get a Connection refused (Connection refused) from PolicyEnforcer, because it tries to communicate with 127.0.0.1. Stacktrace is:
2020-08-03 05:43:27,933 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Releasing connection [{}->http://keycloak-n:8080][null]
2020-08-03 05:43:27,933 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Pooling connection [{}->http://keycloak-n:8080][null]; keep alive indefinitely
2020-08-03 05:43:27,933 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Notifying no-one, there are no waiting threads
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.tsc.ThreadSafeClientConnManager] (executor-thread-1) Get connection: {}->http://127.0.0.1:8180, timeout = 0
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) [{}->http://127.0.0.1:8180] total kept alive: 1, total issued: 0, total allocated: 1 out of 20
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) No free connections [{}->http://127.0.0.1:8180][null]
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Available capacity: 20 out of 20 [{}->http://127.0.0.1:8180][null]
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Creating new connection [{}->http://127.0.0.1:8180]
2020-08-03 05:43:27,944 DEBUG [org.apa.htt.imp.con.DefaultClientConnectionOperator] (executor-thread-1) Connecting to 127.0.0.1:8180
2020-08-03 05:43:27,945 DEBUG [org.apa.htt.imp.con.DefaultClientConnection] (executor-thread-1) Connection org.apache.http.impl.conn.DefaultClientConnection#6ba49b73 closed
2020-08-03 05:43:27,946 DEBUG [org.apa.htt.imp.con.DefaultClientConnection] (executor-thread-1) Connection org.apache.http.impl.conn.DefaultClientConnection#6ba49b73 shut down
2020-08-03 05:43:27,946 DEBUG [org.apa.htt.imp.con.tsc.ThreadSafeClientConnManager] (executor-thread-1) Released connection is not reusable.
2020-08-03 05:43:27,946 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Releasing connection [{}->http://127.0.0.1:8180][null]
2020-08-03 05:43:27,946 DEBUG [org.apa.htt.imp.con.DefaultClientConnection] (executor-thread-1) Connection org.apache.http.impl.conn.DefaultClientConnection#6ba49b73 closed
2020-08-03 05:43:27,946 DEBUG [org.apa.htt.imp.con.tsc.ConnPoolByRoute] (executor-thread-1) Notifying no-one, there are no waiting threads
2020-08-03 05:43:27,947 ERROR [org.key.ada.aut.PolicyEnforcer] (executor-thread-1) Could not lazy load resource with path [/hello/find/1] from server: java.lang.RuntimeException: Could not find resource
at org.keycloak.authorization.client.util.Throwables.retryAndWrapExceptionIfNecessary(Throwables.java:91)
at org.keycloak.authorization.client.resource.ProtectedResource.find(ProtectedResource.java:232)
at org.keycloak.authorization.client.resource.ProtectedResource.findByMatchingUri(ProtectedResource.java:291)
at org.keycloak.adapters.authorization.PolicyEnforcer$PathConfigMatcher.matches(PolicyEnforcer.java:268)
at org.keycloak.adapters.authorization.AbstractPolicyEnforcer.getPathConfig(AbstractPolicyEnforcer.java:351)
at org.keycloak.adapters.authorization.AbstractPolicyEnforcer.authorize(AbstractPolicyEnforcer.java:72)
at io.quarkus.keycloak.pep.runtime.KeycloakPolicyEnforcerAuthorizer.apply(KeycloakPolicyEnforcerAuthorizer.java:45)
at io.quarkus.keycloak.pep.runtime.KeycloakPolicyEnforcerAuthorizer.apply(KeycloakPolicyEnforcerAuthorizer.java:29)
at io.quarkus.vertx.http.runtime.security.HttpAuthorizer$1$1$1.run(HttpAuthorizer.java:68)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:2046)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1578)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1452)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at java.base/java.lang.Thread.run(Thread.java:834)
at org.jboss.threads.JBossThread.run(JBossThread.java:479)
Caused by: java.lang.RuntimeException: Error executing http method [GET]. Response : null
at org.keycloak.authorization.client.util.HttpMethod.execute(HttpMethod.java:106)
at org.keycloak.authorization.client.util.HttpMethodResponse$3.execute(HttpMethodResponse.java:68)
at org.keycloak.authorization.client.resource.ProtectedResource$5.call(ProtectedResource.java:226)
at org.keycloak.authorization.client.resource.ProtectedResource$5.call(ProtectedResource.java:222)
at org.keycloak.authorization.client.resource.ProtectedResource.find(ProtectedResource.java:230)
... 15 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:121)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:144)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:134)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:605)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:440)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.keycloak.authorization.client.util.HttpMethod.execute(HttpMethod.java:84)
... 19 more
2020-08-03 05:43:27,951 DEBUG [org.key.ada.aut.AbstractPolicyEnforcer] (executor-thread-1) Checking permissions for path [http://127.0.0.1:8080/hello/find/1] with config [null].
2020-08-03 05:43:27,951 DEBUG [org.key.ada.aut.AbstractPolicyEnforcer] (executor-thread-1) Could not find a configuration for path [/hello/find/1]
Is there any real example on how to configure such a scenario? I already tried to set the frontendUrl to the internal address, that actually works for the runtime, but the web frontend is no longer accessible.
UPDATE:
From front end code (abbreviated):
java.io.InputStream stream = Thread.currentThread().getContextClassLoader()
.getResourceAsStream("META-INF/keycloak.json");
auth=AuthzClient.create(stream);
response = auth.obtainAccessToken(user, password);
final String accessToken = response.getToken();
...
requestContext.getHeaders().add(HttpHeaders.AUTHORIZATION, AUTH_HEADER_PREFIX + accessToken);
...
and config in keycloak.json is
{
"realm": "quarkus",
"auth-server-url": "http://localhost:8180/auth/",
"ssl-required": "external",
"resource": "backend-service",
"verify-token-audience": true,
"credentials": {
"secret": "85174256-b231-4385-9fa9-257dd0d27bf0"
},
"confidential-port": 0,
"policy-enforcer": {}
}
Many thanks
So the following setup works for me:
frontendUrl: external-docker-ip --> NOT localhost!
set in jboss cli by e.g.:
/subsystem=keycloak-server/spi=hostname/provider=default:write-attribute(name=properties.frontendUrl,value="http://172.20.48.1:8180/auth")
##quarkus config
quarkus.oidc.auth-server-url=http://internal_keycloak_docker_IP:8080/auth/realms/quarkus
quarkus.oidc.token.issuer=http://external-docker-ip:8180/auth/realms/quarkus
##client json file
"auth-server-url": "http://external-docker-ip:8180/auth/"

Kylin start fail java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata#hbase

I install kylin by https://github.com/cas-packone/ambari-kylin-service/
2020-01-13 01:52:16,673 INFO [main] utils.Compatibility:41 : Running in ZooKeeper 3.4.x compatibility mode
2020-01-13 01:52:16,710 INFO [main] imps.CuratorFrameworkImpl:284 : Starting
2020-01-13 01:52:16,715 INFO [main] zookeeper.ZooKeeper:438 : Initiating client connection, connectString=node1:2181 sessionTimeout=120000 watcher=org.apache.curator.ConnectionState#6c000e0c
2020-01-13 01:52:16,717 INFO [main-SendThread(node1:2181)] zookeeper.ClientCnxn:1013 : Opening socket connection to server node1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2020-01-13 01:52:16,719 INFO [main-SendThread(node1:2181)] zookeeper.ClientCnxn:856 : Socket connection established, initiating session, client: /127.0.0.1:49768, server: node1/127.0.0.1:2181
2020-01-13 01:52:16,721 INFO [main-SendThread(node1:2181)] zookeeper.ClientCnxn:1273 : Session establishment complete on server node1/127.0.0.1:2181, sessionid = 0x16f798087cc03af, negotiated timeout = 60000
2020-01-13 01:52:16,722 INFO [main] imps.CuratorFrameworkImpl:326 : Default schema
2020-01-13 01:52:16,725 DEBUG [main] util.ZookeeperDistributedLock:142 : 6521#node1 trying to lock /kylin/kylin_metadata/create_htable/kylin_metadata/lock
2020-01-13 01:52:16,730 INFO [main-EventThread] state.ConnectionStateManager:237 : State change: CONNECTED
Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata#hbase
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:89)
at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:101)
at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:94)
at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:83)
... 3 more
Caused by: java.lang.NoSuchMethodError: org.apache.curator.framework.api.CreateBuilder.creatingParentsIfNeeded()Lorg/apache/curator/framework/api/ProtectACLCreateModePathAndBytesable;
at org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:145)
at org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:166)
at org.apache.kylin.storage.hbase.HBaseConnection.createHTableIfNeeded(HBaseConnection.java:305)
at org.apache.kylin.storage.hbase.HBaseResourceStore.createHTableIfNeeded(HBaseResourceStore.java:110)
at org.apache.kylin.storage.hbase.HBaseResourceStore.<init>(HBaseResourceStore.java:91)
... 8 more
2020-01-13 01:52:16,772 INFO [Curator-Framework-0] imps.CuratorFrameworkImpl:924 : backgroundOperationsLoop exiting
2020-01-13 01:52:16,773 INFO [Thread-1] zookeeper.ReadOnlyZKClient:344 : Close zookeeper connection 0x7b94089b to node1:2181
2020-01-13 01:52:16,775 INFO [ReadOnlyZKClient-node1:2181#0x7b94089b] zookeeper.ZooKeeper:692 : Session: 0x16f798087cc03ae closed
2020-01-13 01:52:16,775 INFO [ReadOnlyZKClient-node1:2181#0x7b94089b-EventThread] zookeeper.ClientCnxn:517 : EventThread shut down
2020-01-13 01:52:16,777 INFO [Thread-4] zookeeper.ZooKeeper:692 : Session: 0x16f798087cc03af closed
2020-01-13 01:52:16,777 INFO [main-EventThread] zookeeper.ClientCnxn:517 : EventThread shut down
ERROR: Unknown error. Please check full log.

Timeout exception Flink

I have a question regarding Flink. I am running an application in a local cluster, with 1 TaskManager and 4 Taskslots.
After some time of running the application, I got an Timeout error:
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.
I have seen some posts with this topic but any answer to it. Could you help me to see the root cause, or a posible troubleshooting?
I am using flink version 1.5.3
It seems that the docker container of taskmanagers and JobManager are stopped when this happens.
Let me add the error trace from the JobManager container logs:
2019-06-09 13:31:06,300 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2019-06-09 13:31:06,331 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink#16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcActor - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Disconnect job manager 00000000000000000000000000000000#akka.tcp://flink#jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink#16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink#16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink#16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#16363182f31f:36715]] Caused by: [16363182f31f]
Thanks in advance!

Re-creating the queue and re-connecting to rabbitMQ

Components Involved: Spring Config-server, Spring AMQP (RabbitMQ), Spring Config-client
Goal: Use push notification to inform config-client to refresh config.
RabbitMQ instance: From docker hub, I pulled rabbitmq:3-management image and ran.
Config-client AMQP version pom.xml:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-bus-amqp</artifactId>
<version>1.3.1.RELEASE</version>
</dependency>
Config-server pom.xml:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-config-monitor</artifactId>
<version>1.3.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-stream-rabbit</artifactId>
<version>1.2.1.RELEASE</version>
</dependency>
Fault Tolerance Scenario:
- Bring down RabbitMQ service/cluster/instance.
- All config client looses connectivity. Queues are deleted since they were created as auto-delete.
- Bring back up RabbitMQ service.
Expectation: All config client should reconnect successfully.
Reality: This is not working. Please see below error.
2018-03-27 09:07:12.850 WARN 21251 --- [AO2Q06fYCALSA-6] o.s.a.r.listener.BlockingQueueConsumer : Failed to declare queue:springCloudBus.anonymous.FGZPCPqzTAO2Q06fYCALSA
2018-03-27 09:07:12.851 ERROR 21251 --- [AO2Q06fYCALSA-6] o.s.a.r.l.SimpleMessageListenerContainer : Consumer received fatal exception on startup
org.springframework.amqp.rabbit.listener.QueuesNotAvailableException: Cannot prepare queue for listener. Either the queue doesn't exist or the broker will not allow us to use it.
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:548)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1335)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[springCloudBus.anonymous.FGZPCPqzTAO2Q06fYCALSA]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:621)
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:520)
[common frames omitted]
Caused by: java.io.IOException: null
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method(reply-code=404, reply-text=NOT_FOUND - no queue 'springCloudBus.anonymous.FGZPCPqzTAO2Q06fYCALSA' in vhost '/', class-id=50, method-id=10)
[common frames omitted]
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method(reply-code=404, reply-text=NOT_FOUND - no queue 'springCloudBus.anonymous.FGZPCPqzTAO2Q06fYCALSA' in vhost '/', class-id=50, method-id=10)
at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:505)
[common frames omitted]
2018-03-27 09:07:12.852 ERROR 21251 --- [AO2Q06fYCALSA-6] o.s.a.r.l.SimpleMessageListenerContainer : Stopping container from aborted consumer
2018-03-27 09:07:12.853 INFO 21251 --- [AO2Q06fYCALSA-6] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2018-03-27 09:07:12.853 INFO 21251 --- [AO2Q06fYCALSA-6] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
Description of error from my understanding
config client using the existing broker, listener tried to reconnect but queue is missing. Makes 3 retries by default. This is expected as we are going through a scenario when a Rabbit MQ service is down and restarted without persistent data. Issue is reconnection fails. I know from many articles that mentions we cannot redeclare queue without using admin. For that we create a XML config file that creates property beans declaring admin and other stuff.
What is the ask?
- Will it be ideal if all this is taken care as by default scenario.
** Also I still don't have the working solution. NEED HELP"
I just tested it with Boot 2.0 and Finchley.M9 (bus 2.0.0.M7) with no problems...
2018-03-27 13:25:06.125 INFO 36716 --- [ main] c.s.b.r.p.RabbitExchangeQueueProvisioner : declaring queue for inbound: springCloudBus.anonymous.tySvAS8BSpS7OtQ_VCeiVQ, bound to: springCloudBus
...
2018-03-27 13:26:38.220 ERROR 36716 --- [ 127.0.0.1:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error; protocol method: #method(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
2018-03-27 13:26:58.757 INFO 36716 --- [pS7OtQ_VCeiVQ-6] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2018-03-27 13:26:58.761 INFO 36716 --- [pS7OtQ_VCeiVQ-6] o.s.a.r.c.CachingConnectionFactory : Created new connection: rabbitConnectionFactory#52c8295b:5/SimpleConnection#74846ead [delegate=amqp://guest#127.0.0.1:5672/, localPort= 49746]
2018-03-27 13:26:58.762 INFO 36716 --- [pS7OtQ_VCeiVQ-6] o.s.amqp.rabbit.core.RabbitAdmin : Auto-declaring a non-durable, auto-delete, or exclusive Queue (springCloudBus.anonymous.tySvAS8BSpS7OtQ_VCeiVQ) durable:false, auto-delete:true, exclusive:true. It will be redeclared if the broker stops and is restarted while the connection factory is alive, but all messages will be lost.
The RabbitExchangeQueueProvisioner explicitly sets up a RabbitAdmin to re-declare the queue after the connection is re-established.
I'll try with older versions now...
EDIT
Same result with boot 1.5.10 and Edgware.SR3 (bus 1.3.3.RELEASE).
EDIT2
Same result with the 1.3.1 bus starter (brings in 1.2.1 stream rabbit). Works fine.

Flume agent throws java.net.ConnectException: Connection refused

I have been using flume for a while now, I have got agent and collector running on same machine.
Configuration
agent: exec("/usr/bin/tail -n +0 -F /path/to/file") | agentE2ESink("hostname", 35855)
collector: collectorSource(35855) | collector(10000) { collectorSink("/hdfs/path/to/sink","name") }
Facing issues in the agent node:
2012-06-04 19:13:33,625 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 0 failed, backoff (1000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
2012-06-04 19:13:34,625 [logicalNode hostname-19] ERROR connector.DirectDriver: Expected ACTIVE but timed out in state OPENING
2012-06-04 19:13:34,632 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 1 failed, backoff (2000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
2012-06-04 19:13:36,635 [naive file wal consumer-27] INFO debug.InsistentOpenDecorator: open attempt 2 failed, backoff (4000ms): Failed to open thrift event sink to hostname:35855 : java.net.ConnectException: Connection refused
and then empty ACKs will be sent continuously
2012-06-04 19:19:56,960 [Roll-TriggerThread-0] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20120604-191956958+0530.881565921235084.00000026
2012-06-04 19:20:07,043 [Roll-TriggerThread-0] INFO hdfs.SeqfileEventSink: closed /tmp/flume-user1/agent/hostname/writing/20120604-191956958+0530.881565921235084.00000026
I dont understand why the connection is refused. Are there any system level changes that needs to be done ?
Note: the collector is listening to the port but agent is unable to send data through the 35855 port.
Can anyone help me with this problem.
Thanks
If you are running both the agent and the collector on the same box, you should be using localhost as the address.
agentE2ESink("localhost", 35855)

Resources