JTA transaction timeout exception - WebLogic 10.x

I changed the JTA transaction timeout from the admin console and set it to 300, but even after the change it fails with "JTA transaction unexpectedly rolled back (maybe due to a timeout)" and:
weblogic.transaction.RollbackException: Transaction timed out after 181 seconds
To make sure my change (timeout value 300) was reflected for the domain, I checked the domain's config.xml; it does show 300.
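The relevant fragment in config.xml looks something like this (trimmed; the exact nesting may vary by WebLogic version):
<jta>
  <timeout-seconds>300</timeout-seconds>
</jta>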
My question is: is there any other place where I need to update the transaction timeout value, and do I need to restart the server?
Full stack trace from the server below:
Caused by: org.springframework.transaction.UnexpectedRollbackException: JTA transaction unexpectedly rolled back (maybe due to a timeout); nested exception is weblogic.transaction.RollbackException: Transaction
timed out after 180 seconds
BEA1-160A800A149091F72E5E
at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1031)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:709)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:678)
at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionAfterThrowing(TransactionAspectSupport.java:359)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy103.saveRegistryData(Unknown Source)
at gov.cms.pqri.arch.submission.registry.bean.RegDataAccessManager.persistRegistry(RegDataAccessManager.java:54)
... 14 more
Caused by: weblogic.transaction.RollbackException: Transaction timed out after 180 seconds
BEA1-160A800A149091F72E5E
at weblogic.transaction.internal.TransactionImpl.throwRollbackException(TransactionImpl.java:1818)
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:333)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:227)
at weblogic.transaction.internal.TransactionManagerImpl.commit(TransactionManagerImpl.java:281)
at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1028)
... 22 more

After changing Stuck Thread Max Time to 300 under Servers -> Configuration -> Tuning (tab) in the admin console, the value is picked up and everything works fine.

I have also come across this issue and resolved it. Since this is a JTA transaction, the JTA timeout needs to be increased along with the Stuck Thread Max Time. Click JTA on the WebLogic console home page and increase the JTA timeout from its default of 30 to 300.

We hit the same issue on WebLogic 12.1.2 (JTA transaction unexpectedly rolled back (maybe due to a timeout)). After investigation we found the root cause of the problem. In my opinion it occurs when a huge dataset is processed transactionally and an exception is thrown near the end of the process; JTA then rolls the data back as expected, but it does not report the details of the underlying error. In our case it was mostly caused by a database integrity problem (e.g. trying to insert data into a column smaller than the data).
In summary, the best approach is to investigate the database logs instead of increasing Stuck Thread Max Time. Raising the thread max time can be a workaround, but it is not a proper solution for real enterprise systems.
This issue is also discussed in another Stack Overflow question and a Hibernate JIRA issue, where the suggested solution is:
This is the default behaviour of the WebLogic JTA implementation. To obtain the root exception you should set the system property weblogic.transaction.allowOverrideSetRollbackReason to true.
One solution is to add this line to /bin/setDomainEnv.cmd:
set JAVA_OPTIONS=%JAVA_OPTIONS% -Dweblogic.transaction.allowOverrideSetRollbackReason=true
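On Linux/UNIX the equivalent lines would go into setDomainEnv.sh (assuming the standard domain script layout):
JAVA_OPTIONS="${JAVA_OPTIONS} -Dweblogic.transaction.allowOverrideSetRollbackReason=true"
export JAVA_OPTIONS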

I got my JTA timeouts increased by adding a jta.properties file to the config folder of my app, with these lines:
com.atomikos.icatch.default_jta_timeout=600000
com.atomikos.icatch.max_timeout=600000
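(Both Atomikos values are in milliseconds, so 600000 corresponds to a 10-minute timeout.)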

Related

Transactions not expiring after timeout

We're using Neo4j (3.1.5-enterprise) for one of our services, over HTTP.
We set dbms.transaction.timeout=150s in our Neo4j config file.
We have a scenario which may take more than 150 seconds, but we would like the transaction to be expired after 150 seconds anyway.
For some reason that's not happening: the transaction is not stopped after 150 seconds and continues until it is fully executed. Any guess why?
In our application logs I can see the following exception (more stacktrace details below):
neo.db.NeoHttpDriver - Errors in response:
[NeoResponseError{
code='Neo.DatabaseError.Statement.ExecutionFailed',
message='Transaction timeout. (Overtime: 23793 ms).',
stackTrace='org.neo4j.kernel.guard.GuardTimeoutException: Transaction timeout. (Overtime: 23793 ms).
...
Also, in the specific scenario that may take a long time, our service generally opens a transaction, locks some common entity, and proceeds. Since the transaction is not expired and released after 150 seconds (and the common entity therefore stays locked), other threads may also be blocked for a long time.
Thanks!
Orel
Exception stacktrace:
15:00:59.627 [DefaultThreadPool-7] DEBUG c.e.e.m.neo.db.NeoHttpDriver - Errors in response: [NeoResponseError{code='Neo.DatabaseError.Statement.ExecutionFailed', message='Transaction timeout. (Overtime: 23793 ms).', stackTrace='org.neo4j.kernel.guard.GuardTimeoutException: Transaction timeout. (Overtime: 23793 ms).
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:71)
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:57)
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:49)
at org.neo4j.kernel.impl.api.GuardingStatementOperations.nodeCursorById(GuardingStatementOperations.java:300)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeHasProperty(OperationsFacade.java:343)
at org.neo4j.cypher.internal.spi.v3_1.TransactionBoundQueryContext$NodeOperations.hasProperty(TransactionBoundQueryContext.scala:319)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply$mcZ$sp(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.spi.v3_1.ExceptionTranslationSupport$class.translateException(ExceptionTranslationSupport.scala:32)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1.translateException(ExceptionTranslatingQueryContextFor3_1.scala:34)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations.hasProperty(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compiler.v3_1.spi.DelegatingOperations.hasProperty(DelegatingQueryContext.scala:221)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.AbstractSetPropertyOperation.setProperty(SetOperation.scala:98)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetEntityPropertyOperation.set(SetOperation.scala:117)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetPipe$$anonfun$internalCreateResults$1.apply(SetPipe.scala:31)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetPipe$$anonfun$internalCreateResults$1.apply(SetPipe.scala:30)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:71)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:94)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.decoratedCypherException(ResultIterator.scala:103)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.failIfThrows(ResultIterator.scala:92)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.next(ResultIterator.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.next(ResultIterator.scala:49)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.foreach(ResultIterator.scala:49)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.to(ResultIterator.scala:49)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.toList(ResultIterator.scala:49)
at org.neo4j.cypher.internal.compiler.v3_1.EagerResultIterator.<init>(ResultIterator.scala:35)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.toEager(ResultIterator.scala:53)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.buildResultIterator(DefaultExecutionResultBuilderFactory.scala:109)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.createResults(DefaultExecutionResultBuilderFactory.scala:99)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.build(DefaultExecutionResultBuilderFactory.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anonfun$getExecutionPlanFunction$1.apply(ExecutionPlanBuilder.scala:164)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anonfun$getExecutionPlanFunction$1.apply(ExecutionPlanBuilder.scala:148)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anon$1.run(ExecutionPlanBuilder.scala:123)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper$$anonfun$run$1.apply(CompatibilityFor3_1.scala:275)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper$$anonfun$run$1.apply(CompatibilityFor3_1.scala:273)
at org.neo4j.cypher.internal.compatibility.exceptionHandlerFor3_1$runSafely$.apply(CompatibilityFor3_1.scala:190)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper.run(CompatibilityFor3_1.scala:273)
at org.neo4j.cypher.internal.PreparedPlanExecution.execute(PreparedPlanExecution.scala:26)
at org.neo4j.cypher.internal.ExecutionEngine.execute(ExecutionEngine.scala:107)
at org.neo4j.cypher.internal.javacompat.ExecutionEngine.executeQuery(ExecutionEngine.java:59)
at org.neo4j.server.rest.transactional.TransactionHandle.safelyExecute(TransactionHandle.java:371)
at org.neo4j.server.rest.transactional.TransactionHandle.executeStatements(TransactionHandle.java:323)
at org.neo4j.server.rest.transactional.TransactionHandle.execute(TransactionHandle.java:230)
at org.neo4j.server.rest.transactional.TransactionHandle.execute(TransactionHandle.java:119)
at org.neo4j.server.rest.web.TransactionalService.lambda$executeStatements$0(TransactionalService.java:203)
Most likely the problem is that the tx is waiting on a lock. Prior to Neo4j 3.2, dbms.transaction.timeout cannot cover the case of terminating a transaction that's waiting on a lock (or rather, it will mark it for termination, but the actual termination won't happen until the lock is acquired).
In Neo4j 3.2, dbms.lock.acquisition.timeout was introduced, which interrupts waiting on a lock and allows the thread to check if the tx has been terminated and take appropriate action.
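On 3.2+, the two settings would then sit together in neo4j.conf, something like this (the 30s lock timeout is an illustrative value):
dbms.transaction.timeout=150s
dbms.lock.acquisition.timeout=30s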
The following is based on an answer provided by Neo4j Support:
dbms.lock.acquisition.timeout
As a starting point, dbms.lock.acquisition.timeout was only added in Neo4j 3.2; it does not exist in 3.1. Without a lock acquisition timeout, waits on locks can overrun the configured limit, and things like GC pauses can extend the time further. Since you're currently on 3.1.5, dbms.lock.acquisition.timeout is not enforced.
dbms.transaction.timeout
dbms.transaction.timeout marks a transaction for termination, but the actual check-and-terminate logic happens on a running thread, not one waiting on locks; it does not cause a waiting thread to wake up and check. Presumably the logic is that some other thread periodically checks each transaction's execution time and, if it has exceeded the transaction timeout, sets a flag on the transaction marking it for termination. The actual termination then likely happens in the transaction's event loop, which checks that flag and, if it is set, terminates and rolls back. A thread that attempts to acquire a lock already held by another thread enters a waiting state; while waiting, the event loop is not being processed, so the thread never reaches the point where it can see that it has been marked for termination and act on it.
Bottom line:
dbms.transaction.timeout does not cause a hard timeout; it only marks the transaction as timed out, which will cause it to roll back once the flag is checked.

What causes dask job failure with CancelledError exception

I have been seeing the error message below for quite some time now but could not figure out what leads to the failure.
Error:
concurrent.futures._base.CancelledError: ('sort_index-f23b0553686b95f2d91d4a3fda85f229', 7)
On restart of the Dask cluster it runs successfully.
If running a dask-cloudprovider ECSCluster or FargateCluster the concurrent.futures._base.CancelledError can result from a long-running step in computation where there is no output (logging or otherwise) to the Client. In these cases, due to the lack of interaction with the client, the scheduler regards itself as "idle" and times out after the configured cloudprovider.ecs.scheduler_timeout period, which defaults to 5 minutes. The CancelledError error message is misleading, but if you look in the logs for the scheduler task itself it will record the idle timeout.
The solution is to set scheduler_timeout to a higher value, either via config or by passing directly to the ECSCluster/FargateCluster constructor.
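For the config route, the key named in the timeout path can be set in your Dask configuration, for example (a sketch; the file location and the 15-minute value are illustrative):
# e.g. in ~/.config/dask/cloudprovider.yaml
cloudprovider:
  ecs:
    scheduler_timeout: "15 minutes"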

Dataflow concurrency error with ValueState

The Beam 2.1 pipeline uses ValueState in a stateful DoFn. It runs fine with a single worker, but when scaling is enabled it fails with "Unable to read value from state" and the root exception below. Any ideas what could cause this?
Caused by: java.util.concurrent.ExecutionException: com.google.cloud.dataflow.worker.KeyTokenInvalidException: Unable to fetch data due to token mismatch for key ��
at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:459)
at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
at com.google.cloud.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:309)
at com.google.cloud.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:384)
... 16 more
Caused by: com.google.cloud.dataflow.worker.KeyTokenInvalidException: Unable to fetch data due to token mismatch for key ��
at com.google.cloud.dataflow.worker.WindmillStateReader.consumeResponse(WindmillStateReader.java:469)
at com.google.cloud.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:411)
at com.google.cloud.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:306)
... 17 more
I believe that exception should just be rethrown. It is thrown by the state mechanism to indicate that additional work on that key should not be performed, and will be automatically retried by the Dataflow runner.
These typically indicate that the particular work item should be performed on a different worker, so proceeding on the current worker wouldn't be helpful.
It may be possible that misusing state -- storing the state object from one key and attempting to use it on a different key -- could also lead to these errors. If that is the case, you may be able to see more diagnostic messages in either the worker or shuffler logs in Stackdriver logging.
If neither retrying nor looking at logging and how you use the state objects help, please provide a job ID demonstrating the problem.
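To illustrate correct per-key usage: state should only be accessed through the parameter Beam injects for the current element, never cached in a field and reused across keys. A minimal sketch with the Beam 2.x stateful DoFn API (class and state names are illustrative):
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class CountPerKeyFn extends DoFn<KV<String, Long>, Long> {

  // State is scoped to the current key (and window); Beam hands you the
  // right instance per element, so never stash it in a field.
  @StateId("count")
  private final StateSpec<ValueState<Long>> countSpec =
      StateSpecs.value(VarLongCoder.of());

  @ProcessElement
  public void processElement(ProcessContext c,
                             @StateId("count") ValueState<Long> count) {
    Long stored = count.read();  // null on the first element for this key
    long updated = (stored == null ? 0L : stored) + 1;
    count.write(updated);
    c.output(updated);
  }
}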

What causes "Exhausted 2 retries" in sidekiq.log?

In my sidekiq.log I have a warn message WARN: {:message=>"Exhausted 2 retries"}. I would like to know how to fix this issue.
complete gist at line 16.
The problem is here:
EXECABORT Transaction discarded because of previous errors.
Your Redis instance is broken somehow, most likely out of memory.
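If you suspect memory, standard redis-cli commands can confirm it:
redis-cli info memory          # compare used_memory with maxmemory
redis-cli config get maxmemory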

I want to choose a connection pool for a high-throughput application

I have used the C3P0 connection pool until now but get unstable behavior. I have tested in various environments and tried improving database options. Today I found that the Tomcat 7 JDBC connection pool has been released and got it. Does anyone use it and get better performance than C3P0?
(I have also tested the BoneCP connection pool.)
My application is under very high load. My problems are:
After an hour, the connection pool throws a "Can't Open Connection" exception.
Sometimes I get the exception "Attempted to use a closed or broken resource pool", and when I restart my connection pool (via its MBean) the problem is fixed.
My C3P0 parameters are:
initialPoolSize = 1
minPoolSize=1
maxPoolSize = 50
maxIdleTime = 20000
debugUnreturnedConnectionStackTraces = true
propertyCycle =60
acquireRetryDelay =1000
maxConnectionAge =0
checkoutTimeout =5000
acquireIncrement =1
numHelperThreads =5
acquireRetryAttempts =1
unreturnedConnectionTimeout =90
breakAfterAcquireFailure =false
I have also tested these parameters with several values but don't see any perceptible change.
I haven't tried the Tomcat pool yet, but will look into it soon. What you can probably do is tweak your c3p0 pool for optimization. This will vary according to the actual load on your application, but compared to other pooling technologies, I've found c3p0 to be flexible.
It would be nice if you could elaborate on your problem here and mention the pooling parameters you are using.
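One tweak worth trying for the hourly "Can't Open Connection" failures is connection testing, so stale connections are detected and replaced instead of being handed out. A sketch using standard c3p0 properties (the values are illustrative):
idleConnectionTestPeriod=300
testConnectionOnCheckin=true
preferredTestQuery=SELECT 1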
