OpsCenter does not show my keyspaces - datastax-enterprise

On my OpsCenter web page, in the Schema tab I am unable to see any of my keyspaces (0 Keyspaces | 0 Column Families), and the logs keep repeating:
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 42937 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 42938 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 42939 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,373 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42940 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42941 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42942 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42943 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42944 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42945 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42946 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,374 42947 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42948 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42949 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42950 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42951 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42952 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42953 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 42954 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,375 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,376 42955 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,376 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,376 42956 operations dropped so far.
WARN [rollup-snapshot] 2013-11-18 20:02:47,376 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-18 20:02:47,376 42957 operations dropped so far.
I restarted the datastax-agent, but I still could not find any errors in the log file. Below is the agent.log file:
Startup log:
Starting DataStax agent monitor datastax_agent_monitor[ OK ]
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
INFO [main] 2013-11-27 01:37:45,191 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
INFO [main] 2013-11-27 01:37:45,260 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
INFO [main] 2013-11-27 01:37:45,261 Waiting for the config from OpsCenter
INFO [main] 2013-11-27 01:37:45,262 Attempting to determine Cassandra's broadcast address through JMX
INFO [main] 2013-11-27 01:37:45,264 Starting Stomp
INFO [main] 2013-11-27 01:37:45,264 SSL communication is disabled
INFO [main] 2013-11-27 01:37:45,264 Creating stomp connection to x.x.x.x:61620
INFO [Initialization] 2013-11-27 01:37:45,266 New JMX connection (127.0.0.1:7199)
INFO [StompConnection receiver] 2013-11-27 01:37:45,274 Reconnecting in 0s.
INFO [StompConnection receiver] 2013-11-27 01:37:45,280 Connected to x.x.x.x:61620
INFO [main] 2013-11-27 01:37:45,313 Starting Jetty server: {:port 61621, :host nil, :ssl? false, :join? false}
INFO [Jetty] 2013-11-27 01:37:45,511 Jetty server started
INFO [StompConnection receiver] 2013-11-27 01:37:45,566 Got new config from OpsCenter: {:kerberos_use_keytab true, :rollups300_ttl 2419200, :kerberos_use_ticket_cache true, :rollups60_ttl 604800, :thrift_port 9160, :ec2_metadata_api_host "x.x.x.x", :metrics_enabled 1, :rollups7200_ttl 31536000, :thrift_ssl_truststore nil, :metrics_ignored_column_families "", :cassandra_log_location "/var/log/cassandra/system.log", :thrift_rpc_interface "x.x.x.x", :thrift_ssl_truststore_password nil, :jmx_port 7199, :provisioning 0, :use_ssl 0, :kerberos_debug false, :rollups86400_ttl -1, :api_port "61621", :storage_keyspace "OpsCenter", :kerberos_renew_tgt true, :metrics_ignored_solr_cores "", :thrift_ssl_truststore_type "JKS", :metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter", :rollup_subscriptions [], :cassandra_install_location ""}
INFO [StompConnection receiver] 2013-11-27 01:37:45,567 New JMX connection (127.0.0.1:7199)
INFO [Initialization] 2013-11-27 01:37:45,633 Using x.x.x.x as the cassandra broadcast address
INFO [StompConnection receiver] 2013-11-27 01:37:45,662 Starting up agent collection.
INFO [Initialization] 2013-11-27 01:37:45,714 agent RPC address is x.x.x.x
INFO [Initialization] 2013-11-27 01:37:45,715 agent RPC broadcast address is x.x.x.x
INFO [StompConnection receiver] 2013-11-27 01:37:45,721 Starting OS metric collectors (Linux)
INFO [Initialization] 2013-11-27 01:37:45,723 Clearing ssl.truststore
INFO [Initialization] 2013-11-27 01:37:45,723 Clearing ssl.truststore.password
INFO [Initialization] 2013-11-27 01:37:45,723 Setting ssl.store.type to JKS
INFO [Initialization] 2013-11-27 01:37:45,728 Clearing kerberos.service.principal.name
INFO [Initialization] 2013-11-27 01:37:45,728 Clearing kerberos.principal
INFO [Initialization] 2013-11-27 01:37:45,728 Setting kerberos.useTicketCache to true
INFO [Initialization] 2013-11-27 01:37:45,728 Clearing kerberos.ticketCache
INFO [Initialization] 2013-11-27 01:37:45,729 Setting kerberos.useKeyTab to true
INFO [Initialization] 2013-11-27 01:37:45,729 Clearing kerberos.keyTab
INFO [Initialization] 2013-11-27 01:37:45,729 Setting kerberos.renewTGT to true
INFO [Initialization] 2013-11-27 01:37:45,729 Setting kerberos.debug to false
INFO [thrift-init] 2013-11-27 01:37:45,733 Connecting to Cassandra cluster: x.x.x.x (port 9160)
INFO [StompConnection receiver] 2013-11-27 01:37:45,737 Starting Cassandra JMX metric collectors
INFO [thrift-init] 2013-11-27 01:37:45,749 Downed Host Retry service started with queue size -1 and retry delay 10s
INFO [StompConnection receiver] 2013-11-27 01:37:45,755 New JMX connection (127.0.0.1:7199)
INFO [thrift-init] 2013-11-27 01:37:45,757 Registering JMX me.prettyprint.cassandra.service_Agent Cluster:ServiceType=hector,MonitorType=hector
INFO [pdp-loader] 2013-11-27 01:37:45,834 in execute with client org.apache.cassandra.thrift.Cassandra$Client@67cf1438
INFO [thrift-init] 2013-11-27 01:37:45,836 Connected to Cassandra cluster: /Test
INFO [pdp-loader] 2013-11-27 01:37:45,844 Attempting to load stored metric values.
INFO [thrift-init] 2013-11-27 01:37:45,841 in execute with client org.apache.cassandra.thrift.Cassandra$Client@67cf1438
INFO [thrift-init] 2013-11-27 01:37:45,845 Using partitioner: org.apache.cassandra.dht.Murmur3Partitioner
INFO [jmx-metrics-1] 2013-11-27 01:37:50,748 New JMX connection (127.0.0.1:7199)
INFO [qtp131393312-25] 2013-11-27 01:38:59,902 HTTP: :get /os-metric/disk-space {} - 200
INFO [qtp131393312-24] 2013-11-27 01:39:04,468 HTTP: :get /os-metric/disk-space {} - 200
WARN [rollup-snapshot] 2013-11-27 01:42:45,841 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,842 1 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,842 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,842 2 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 3 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 4 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,843 5 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,844 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,844 6 operations dropped so far.
WARN [rollup-snapshot] 2013-11-27 01:42:45,844 Thrift operation queue is full, discarding thrift operation
WARN [rollup-snapshot] 2013-11-27 01:42:45,844 7 operations dropped so far.
Thrift is running:
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN 498 21333533 15520/java
tcp 0 0 0.0.0.0:9160 0.0.0.0:* LISTEN 498 21334831 15520/java
Cassandra nodes are up and running.

The issue in this case was related to the number of column families created in the cluster. A large number of column families can slow down fetching the list of keyspaces and column families, and can cause metric insertion to back up. You can configure which column families have metrics collected. See:
http://www.datastax.com/documentation/opscenter/4.0/webhelp/index.html#opsc/configure/../../opsc/configure/../../opsc/configure/opscExcludingKeyspaces_c.html
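As a hedged sketch of that exclusion (option names follow the OpsCenter documentation linked above; the file path and the keyspace/column family names are illustrative assumptions), the settings go in the per-cluster configuration file on the opscenterd host, followed by an opscenterd restart:
# Illustrative path for a package install: /etc/opscenter/clusters/<cluster_name>.conf
[cassandra_metrics]
# Keyspaces to skip entirely (comma-separated)
ignored_keyspaces = system, system_traces, system_auth
# Individual column families to skip, as keyspace.columnfamily pairs
ignored_column_families = mykeyspace.events_raw, mykeyspace.clickstream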
If you don't want to disable monitoring on clusters with a large number of column families, there are a few settings you can tweak in the agent config.
thrift_max_conns - the maximum number of concurrent connections the agent makes to the local node
async_pool_size - the size of the thread pool that pulls from the queue of inserts and writes them into Cassandra
async_queue_size - the size of the queue of inserts to send to Cassandra; if the queue fills up, additional operations are dropped
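For example, these could be raised in each node's /var/lib/datastax-agent/conf/address.yaml (the file the startup log above shows being loaded), after which the agent needs a restart. The values below are only a hedged illustration, not tuned recommendations:
thrift_max_conns: 10
async_pool_size: 10
async_queue_size: 20000
Raising async_queue_size trades agent memory for fewer dropped rollup operations, so it is worth increasing it gradually while watching for the "Thrift operation queue is full" warnings above.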

Related

Spring AMQP: stop endless retry loop if the RabbitMQ service is not working

I'm having trouble stopping infinite retrying when the RabbitMQ server is down. I tried using this code snippet:
@Bean(name = "rabbitListenerContainerFactory")
public SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory(
        SimpleRabbitListenerContainerFactoryConfigurer configurer,
        ConnectionFactory connectionFactory) {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    configurer.configure(factory, connectionFactory);
    BackOff recoveryBackOff = new FixedBackOff(5000, 1);
    factory.setRecoveryBackOff(recoveryBackOff);
    return factory;
}
But I am still getting an endless retry loop:
17:49:47,417 DEBUG o.s.a.r.l.BlockingQueueConsumer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Starting consumer Consumer#4f35699: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 17:49:49,431 WARN o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] stopping container - restart recovery attempts exhausted
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Shutting down Rabbit listener container
2020-08-31 17:49:49,431 INFO o.s.a.r.c.CachingConnectionFactory [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:49,431 INFO o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Waiting for workers to finish.
2020-08-31 17:49:49,431 INFO o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Successfully waited for workers to finish.
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Cancelling Consumer#e56de36: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 17:49:49,431 DEBUG o.s.a.r.l.BlockingQueueConsumer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#34-2] Closing Rabbit Channel: null
2020-08-31 17:49:51,434 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#13-1] Recovering consumer in 5000 ms.
2020-08-31 17:49:51,434 DEBUG o.s.a.r.l.SimpleMessageListenerContainer [main] Starting Rabbit listener container.
2020-08-31 17:49:51,434 INFO o.s.a.r.c.CachingConnectionFactory [main] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:53,485 INFO o.s.a.r.l.SimpleMessageListenerContainer [main] Broker not available; cannot force queue declarations during start: java.net.ConnectException: Connection refused: connect
2020-08-31 17:49:53,485 INFO o.s.a.r.c.CachingConnectionFactory [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#35-1] Attempting to connect to: [localhost:5672]
2020-08-31 17:49:55,518 ERROR o.s.a.r.l.SimpleMessageListenerContainer [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#35-1] Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused: connect
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:62) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
What I am trying to achieve is that, after one connection attempt, it stops retrying.
Edit 1
My whole consumer-side config is below; I can't see where another container factory could be configured. The thing is, if I set the fixed back-off to, for example, 3000 ms, the message changes to "Recovering consumer in 3000 ms."
@Configuration
@EnableRabbit
@AllArgsConstructor
public class RabbitConfig {

    @Bean
    public MessageConverter jsonMessageConverter() {
        ObjectMapper jsonObjectMapper = new ObjectMapper();
        jsonObjectMapper.registerModule(new JavaTimeModule());
        return new Jackson2JsonMessageConverter(jsonObjectMapper);
    }

    @Bean
    public RabbitErrorHandler rabbitExceptionHandler() {
        return new RabbitErrorHandler();
    }

    @Bean(name = "rabbitListenerContainerFactory")
    public SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory(
            SimpleRabbitListenerContainerFactoryConfigurer configurer,
            ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        configurer.configure(factory, connectionFactory);
        BackOff recoveryBackOff = new FixedBackOff(5000, 1);
        factory.setRecoveryBackOff(recoveryBackOff);
        return factory;
    }
}
I'm using version 2.1.7
I just copied your code and it worked as expected:
2020-08-31 11:58:08.997 INFO 94891 --- [ main] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:09.002 INFO 94891 --- [ main] o.s.a.r.l.SimpleMessageListenerContainer : Broker not available; cannot force queue declarations during start: java.net.ConnectException: Connection refused (Connection refused)
2020-08-31 11:58:09.006 INFO 94891 --- [ntContainer#0-1] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:09.014 INFO 94891 --- [ main] com.example.demo.So63673274Application : Started So63673274Application in 0.929 seconds (JVM running for 1.39)
2020-08-31 11:58:14.091 WARN 94891 --- [ntContainer#0-1] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused (Connection refused)
2020-08-31 11:58:14.093 INFO 94891 --- [ntContainer#0-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer#6f2cb653: tags=[[]], channel=null, acknowledgeMode=AUTO local queue size=0
2020-08-31 11:58:14.094 INFO 94891 --- [ntContainer#0-2] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [localhost:5672]
2020-08-31 11:58:14.095 WARN 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : stopping container - restart recovery attempts exhausted
2020-08-31 11:58:14.095 INFO 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2020-08-31 11:58:14.096 INFO 94891 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
I see you have
stopping container - restart recovery attempts exhausted
too.
Perhaps the listener that keeps trying is from a different container factory?
Recovering consumer in 5000 ms.
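If there is such a listener, one way to rule this out (a minimal sketch; the queue name and class name here are hypothetical) is to bind every @RabbitListener explicitly to the factory that carries the FixedBackOff:
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class OrderListener {

    // Explicitly reference the factory configured with FixedBackOff(5000, 1);
    // a listener created from a different factory keeps the default endless recovery.
    @RabbitListener(queues = "orders", containerFactory = "rabbitListenerContainerFactory")
    public void onMessage(String payload) {
        // handle the message
    }
}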

Timeout exception Flink

I have a question regarding Flink. I am running an application in a local cluster, with 1 TaskManager and 4 task slots.
After the application has been running for some time, I get a timeout error:
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.
I have seen some posts on this topic but no answers to it. Could you help me find the root cause, or possible troubleshooting steps?
I am using Flink version 1.5.3.
It seems that the Docker containers of the TaskManagers and the JobManager are stopped when this happens.
Let me add the error trace from the JobManager container logs:
2019-06-09 13:31:06,300 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2019-06-09 13:31:06,331 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcActor - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
Thanks in advance!

Hazelcast memory is continuously increasing

I have a Hazelcast cluster with two machines.
The only object in the cluster is a map. Analysing the log files, I noticed that the health monitor starts to report a slow increase in memory consumption even though no new entries are being added to the map (see the sample log entries below).
Any ideas what may be causing the memory increase?
2015-09-16 10:45:49 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=97.6M, memory.free=30.4M, memory.total=128.0M, memory.max=128.0M, memory.used/total=76.27%, memory.used/max=76.27%, load.process=0.00%, load.system=1.00%, load.systemAverage=3.00%, thread.count=96, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=1, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2
2015-09-16 10:46:02 INFO InternalPartitionService:? - [10.11.173.129]:5903 [dev] [3.2.1] Remaining migration tasks in queue = 51
2015-09-16 10:46:12 DEBUG TeleavisoIvrLoader:71 - Checking for new files...
2015-09-16 10:46:13 INFO InternalPartitionService:? - [10.11.173.129]:5903 [dev] [3.2.1] All migration tasks has been completed, queues are empty.
2015-09-16 10:46:19 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=103.9M, memory.free=24.1M, memory.total=128.0M, memory.max=128.0M, memory.used/total=81.21%, memory.used/max=81.21%, load.process=0.00%, load.system=1.00%, load.systemAverage=2.00%, thread.count=73, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2
2015-09-16 10:46:49 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=105.1M, memory.free=22.9M, memory.total=128.0M, memory.max=128.0M, memory.used/total=82.11%, memory.used/max=82.11%, load.process=0.00%, load.system=1.00%, load.systemAverage=1.00%, thread.count=73, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2

Neo4j randomly shutting down

I am running Neo4j on an EC2 instance, but for some reason it randomly shuts down from time to time. Is there a way to check the shutdown logs? And is there a way to automatically restart the server? I couldn't locate the log folder, but here is what my messages.log file looks like. This section covers the time frame when the server went down (before 2015-04-13 05:39:59.084+0000) and when I manually restarted the server (at 2015-04-13 05:39:59.084+0000). You can see that there is no record of a server issue or shutdown. The time frame before 2015-03-05 08:18:47.084+0000 contains info from the previous server restart.
2015-03-05 08:18:44.180+0000 INFO [o.n.s.m.Neo4jBrowserModule]: Mounted Neo4j Browser at [/browser]
2015-03-05 08:18:44.253+0000 INFO [o.n.s.w.Jetty9WebServer]: Mounting static content at [/webadmin] from [webadmin-html]
2015-03-05 08:18:44.311+0000 INFO [o.n.s.w.Jetty9WebServer]: Mounting static content at [/browser] from [browser]
2015-03-05 08:18:47.084+0000 INFO [o.n.s.CommunityNeoServer]: Server started on: http://0.0.0.0:7474/
2015-03-05 08:18:47.084+0000 INFO [o.n.s.CommunityNeoServer]: Remote interface ready and available at [http://0.0.0.0:7474/]
2015-03-05 08:18:47.084+0000 INFO [o.n.k.i.DiagnosticsManager]: --- SERVER STARTED END ---
2015-04-13 05:39:59.084+0000 INFO [o.n.s.CommunityNeoServer]: Setting startup timeout to: 120000ms based on -1
2015-04-13 05:39:59.265+0000 INFO [o.n.k.InternalAbstractGraphDatabase]: No locking implementation specified, defaulting to 'community'
2015-04-13 05:39:59.383+0000 INFO [o.n.k.i.DiagnosticsManager]: --- INITIALIZED diagnostics START ---
2015-04-13 05:39:59.384+0000 INFO [o.n.k.i.DiagnosticsManager]: Neo4j Kernel properties:
2015-04-13 05:39:59.389+0000 INFO [o.n.k.i.DiagnosticsManager]: neostore.propertystore.db.mapped_memory=78M
2015-04-13 05:39:59.389+0000 INFO [o.n.k.i.DiagnosticsManager]: neostore.nodestore.db.mapped_memory=21M

What is wrong with this Tomcat database connection pooling configuration

For the last two years I have been using this configuration for database connection pooling in Tomcat:
<Resource auth="Container"
driverClassName="com.mysql.jdbc.Driver"
logAbandoned="true"
maxActive="100"
maxIdle="30"
maxWait="10000"
name="jdbc/maindb"
password="xxxxx"
removeAbandoned="true"
removeAbandonedTimeout="60"
type="javax.sql.DataSource"
url="jdbc:mysql://localhost:3306/maindb?zeroDateTimeBehavior=convertToNull"
connectionProperties="useEncoding=true;"
username="sqladmin" validationQuery="select 1"/>
On the production server, for the last month, Tomcat with this configuration suddenly stops responding to any requests, and requests time out. There are no errors in the logs, but as soon as I shut down Tomcat a huge number of errors appear, which seem to show some kind of deadlock in the database connections.
To rectify it, I used the database connection pooling configuration from http://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html . With this configuration I now face two problems on production: either a table lock occurs even though the tables use the InnoDB engine, or some queries start returning an empty result set even when the query is perfectly fine.
<Resource name="jdbc/maindb"
auth="Container"
type="javax.sql.DataSource"
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
testWhileIdle="true"
testOnBorrow="true"
testOnReturn="false"
validationQuery="SELECT 1"
validationInterval="30000"
timeBetweenEvictionRunsMillis="30000"
maxActive="100"
minIdle="10"
maxWait="10000"
initialSize="10"
removeAbandonedTimeout="60"
removeAbandoned="true"
logAbandoned="true"
minEvictableIdleTimeMillis="30000"
jmxEnabled="true"
jdbcInterceptors="org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;
org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer"
username="sqladmin"
password="xxxxx"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/maindb"/>
With the first configuration, the following errors start appearing after shutting down Tomcat:
04-Feb-2015 20:44:46.048 INFO [main] org.apache.catalina.core.StandardServer.await A valid shutdown command was received via the shutdown port. Stopping the Server instance.
04-Feb-2015 20:44:46.049 INFO [main] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-apr-8080"]
04-Feb-2015 20:44:46.100 INFO [main] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["ajp-apr-8009"]
04-Feb-2015 20:44:46.151 INFO [main] org.apache.catalina.core.StandardService.stopInternal Stopping service Catalina
04-Feb-2015 20:44:46.157 INFO [localhost-startStop-2] org.apache.catalina.core.StandardWrapper.unload Waiting for 81 instance(s) to be deallocated for Servlet [dispatcher]
04-Feb-2015 20:44:47.158 INFO [localhost-startStop-2] org.apache.catalina.core.StandardWrapper.unload Waiting for 81 instance(s) to be deallocated for Servlet [dispatcher]
04-Feb-2015 20:44:48.160 INFO [localhost-startStop-2] org.apache.catalina.core.StandardWrapper.unload Waiting for 81 instance(s) to be deallocated for Servlet [dispatcher]
04-Feb-2015 20:44:48.260 INFO [localhost-startStop-2] org.springframework.context.support.AbstractApplicationContext.doClose Closing WebApplicationContext for namespace 'dispatcher-servlet': startup date [Tue Feb 03 18:26:26 UTC 2015]; parent: Root WebApplicationContext
04-Feb-2015 20:44:48.307 INFO [localhost-startStop-2] org.springframework.context.support.AbstractApplicationContext.doClose Closing Root WebApplicationContext: startup date [Tue Feb 03 18:26:24 UTC 2015]; root of context hierarchy
04-Feb-2015 20:44:48.310 INFO [localhost-startStop-2] org.springframework.scheduling.concurrent.ExecutorConfigurationSupport.shutdown Shutting down ExecutorService 'taskExecutor'
04-Feb-2015 20:44:48.329 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [ROOT] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
org.apache.tomcat.dbcp.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:582)
org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:439)
org.apache.tomcat.dbcp.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
org.apache.tomcat.dbcp.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:118)
org.apache.tomcat.dbcp.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1412)
com.myproj.dao.ConnectionPool.getConnection(ConnectionPool.java:41)
And with the second configuration, the following errors appear while performing certain operations:
22-Jan-2015 16:36:04.077 SEVERE [http-apr-8080-exec-2] com.myproj.dao.cart.impl.VisitorCartDaoImpl.addCartItem Lock wait timeout exceeded; try restarting transaction
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2141)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2077)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2062)
at com.myproj.dao.cart.impl.VisitorCartDaoImpl.addCartItem(VisitorCartDaoImpl.java:96)
Is there anything wrong with the configuration or with the database? I am using MySQL 5.6 as the database; on production, MySQL runs on Amazon RDS. I am using Tomcat 8.0.15.
