Should a CAN Bus-Off condition immediately reset the connection flags despite of Nodeguarding? - connection

I am wondering whether a detected CAN Bus-Off condition should have a direct impact on the connection status flags in my CANopen Master. It is connected to CAN Nodes that handle analog and digital I/O.
For each Node there is a flag in the CANopen Master stating its connection status to that CAN node (TRUE/FALSE), it is controlled by the Nodeguarding response handler.
Whenever our Nodeguarding RTR frame gets responded by the Node, it is set to TRUE, otherwise, after reaching the Life Time with no answer, it is set to FALSE. The default Life Time is 3x 500 ms = 1,5 seconds.
So, when I detect a Bus-Off condition before the Life Time of that Node expires: Should I immediately reset the connection status flag to FALSE, or should I just wait and see for the Nodeguarding to expire?
I think it could well happen that the Bus-Off condition goes away before the Nodeguarding would expire, and I don't know if it is a good idea to be that intolerant.

Related

How to calculate progress rate of DRBD?

WinDRBD's progress is only visible when syncing. But I'd like to know how far has gone if the out-of-sync remains.
drbdsetup status foo --v --s
Through the detail view command, the following contents were obtained.
foo node-id:2 role:Primary suspended:no
write-ordering:flush
volume:1 minor:1 disk:UpToDate backing_dev:\DosDevices\G: quorum:yes
size:524253532 read:7238338 written:5209825 al-writes:589 bm-writes:198 upper-pending:0 lower-pending:10
al-suspended:no blocked:no
Node1 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:7168
volume:1 replication:SyncSource peer-disk:Inconsistent done:85.32 resync-suspended:no
received:0 sent:1226764 out-of-sync:210484 pending:6 unacked:10 dbdt1:14.99 eta:14
done:85.32
This part is known as the progress rate.
How was this calculated?
When a resource becomes disconnected, the out-of-sync counter will begin to increment on the peer that is currently Primary. When the resource reconnects, the bitmaps (stored in DRBD's metadata) are compared to determine which blocks became out-of-sync while disconnected, and proceeds to resync those blocks in the background. Any writes that occur during a background resync are immediately replicated, and if that write happens to touch a block that's a part of the background resync, it's removed from the resync queue (since it was updated by the foreground replication).

Transactions not expired after timeout expired

We're using neo4j (3.1.5-enterprise) for one of our services. (Over HTTP)
We set dbms.transaction.timeout=150s in our neo4j config file .
We have a scenario which may take more time than 150 seconds, but what we would like is for the transaction to be expired after 150 seconds anyway.
For some reason its not happening and the transaction continue until it fully executed but its not being stopped after 150 seconds, any guess why?
In our application logs I can see the following exception (more stacktrace details below):
neo.db.NeoHttpDriver - Errors in response:
[NeoResponseError{
code='Neo.DatabaseError.Statement.ExecutionFailed',
message='Transaction timeout. (Overtime: 23793 ms).',
stackTrace='org.neo4j.kernel.guard.GuardTimeoutException: Transaction timeout. (Overtime: 23793 ms).
...
Also, our service steps(in the specific scenario that may take long time) in general is open a transaction, lock some common entity and proceed. Since the transaction is not expired and released(and therefor the common entity continue to be locked) after 150 seconds, then other threads may also be locked for a long time.
Thanks!
Orel
Exception stacktrace:
15:00:59.627 [DefaultThreadPool-7] DEBUG c.e.e.m.neo.db.NeoHttpDriver - Errors in response: [NeoResponseError{code='Neo.DatabaseError.Statement.ExecutionFailed', message='Transaction timeout. (Overtime: 23793 ms).', stackTrace='org.neo4j.kernel.guard.GuardTimeoutException: Transaction timeout. (Overtime: 23793 ms).
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:71)
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:57)
at org.neo4j.kernel.guard.TimeoutGuard.check(TimeoutGuard.java:49)
at org.neo4j.kernel.impl.api.GuardingStatementOperations.nodeCursorById(GuardingStatementOperations.java:300)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeHasProperty(OperationsFacade.java:343)
at org.neo4j.cypher.internal.spi.v3_1.TransactionBoundQueryContext$NodeOperations.hasProperty(TransactionBoundQueryContext.scala:319)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply$mcZ$sp(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations$$anonfun$hasProperty$1.apply(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.spi.v3_1.ExceptionTranslationSupport$class.translateException(ExceptionTranslationSupport.scala:32)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1.translateException(ExceptionTranslatingQueryContextFor3_1.scala:34)
at org.neo4j.cypher.internal.compatibility.ExceptionTranslatingQueryContextFor3_1$ExceptionTranslatingOperations.hasProperty(ExceptionTranslatingQueryContextFor3_1.scala:245)
at org.neo4j.cypher.internal.compiler.v3_1.spi.DelegatingOperations.hasProperty(DelegatingQueryContext.scala:221)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.AbstractSetPropertyOperation.setProperty(SetOperation.scala:98)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetEntityPropertyOperation.set(SetOperation.scala:117)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetPipe$$anonfun$internalCreateResults$1.apply(SetPipe.scala:31)
at org.neo4j.cypher.internal.compiler.v3_1.pipes.SetPipe$$anonfun$internalCreateResults$1.apply(SetPipe.scala:30)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:71)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$next$1.apply(ResultIterator.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:94)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.decoratedCypherException(ResultIterator.scala:103)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.failIfThrows(ResultIterator.scala:92)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.next(ResultIterator.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.next(ResultIterator.scala:49)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.foreach(ResultIterator.scala:49)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.to(ResultIterator.scala:49)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.toList(ResultIterator.scala:49)
at org.neo4j.cypher.internal.compiler.v3_1.EagerResultIterator.<init>(ResultIterator.scala:35)
at org.neo4j.cypher.internal.compiler.v3_1.ClosingIterator.toEager(ResultIterator.scala:53)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.buildResultIterator(DefaultExecutionResultBuilderFactory.scala:109)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.createResults(DefaultExecutionResultBuilderFactory.scala:99)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.build(DefaultExecutionResultBuilderFactory.scala:68)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anonfun$getExecutionPlanFunction$1.apply(ExecutionPlanBuilder.scala:164)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anonfun$getExecutionPlanFunction$1.apply(ExecutionPlanBuilder.scala:148)
at org.neo4j.cypher.internal.compiler.v3_1.executionplan.InterpretedExecutionPlanBuilder$$anon$1.run(ExecutionPlanBuilder.scala:123)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper$$anonfun$run$1.apply(CompatibilityFor3_1.scala:275)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper$$anonfun$run$1.apply(CompatibilityFor3_1.scala:273)
at org.neo4j.cypher.internal.compatibility.exceptionHandlerFor3_1$runSafely$.apply(CompatibilityFor3_1.scala:190)
at org.neo4j.cypher.internal.compatibility.CompatibilityFor3_1$ExecutionPlanWrapper.run(CompatibilityFor3_1.scala:273)
at org.neo4j.cypher.internal.PreparedPlanExecution.execute(PreparedPlanExecution.scala:26)
at org.neo4j.cypher.internal.ExecutionEngine.execute(ExecutionEngine.scala:107)
at org.neo4j.cypher.internal.javacompat.ExecutionEngine.executeQuery(ExecutionEngine.java:59)
at org.neo4j.server.rest.transactional.TransactionHandle.safelyExecute(TransactionHandle.java:371)
at org.neo4j.server.rest.transactional.TransactionHandle.executeStatements(TransactionHandle.java:323)
at org.neo4j.server.rest.transactional.TransactionHandle.execute(TransactionHandle.java:230)
at org.neo4j.server.rest.transactional.TransactionHandle.execute(TransactionHandle.java:119)
at org.neo4j.server.rest.web.TransactionalService.lambda$executeStatements$0(TransactionalService.java:203)
Most likely the problem is that the tx is waiting on a lock. Prior to Neo4j 3.2, dbms.transaction.timeout cannot cover the case of terminating a transaction that's waiting on a lock (or rather, it will mark it for termination, but the actual termination won't happen until the lock is acquired).
In Neo4j 3.2, dbms.lock.acquisition.timeout was introduced, which interrupts waiting on a lock and allows the thread to check if the tx has been terminated and take appropriate action.
The following is based on an answer provided by Neo4j Support:
dbms.lock.acquisition.timeout
As a starting point, dbms.lock.acquisition.timeout was only added in Neo4j 3.2, it does not exist for 3.1. where we don't yet have lock acquisition timeout, hence wait times on locks can over-runs past the set limit. Things like GC can also extend the time. However, as you're currently on 3.1.5, dbms.lock.acquisition.timeout would not yet be enforced.
dbms.transaction.timeout
dbms.transaction.timeout marks a transaction for termination, but the actual logic of checking this and performing the termination happens on a running thread, not one waiting on locks, and doesn't cause the thread to wake up and check. Presumably the logic for terminating a thread upon timeout is that some other thread periodically checks execution time for a transaction, and if it has exceeded the transaction timeout, sets a boolean variable on the transaction to indicate that it is marked for termination. The actual termination of the thread likely happens in an event loop for the transaction, where it checks that variable to see if it's marked for termination, then terminates and rolls back. A thread that attempts to acquire a lock enters a waiting state when the lock is already held by another thread. During this waiting state, the event loop is not being processed, so the thread never reaches the point in the event loop where it can check if it's been marked as terminated and take care of it.
Bottom line:
dbms.transaction.timeout does not cause a hard timeout, it only marks the transaction as timed-out, which will cause it to rollback once the flag is checked.

QTP Recovery scenario used to "skip" consecutive FAILED steps with 0 timeout -- how can I restore original timeout value?

Suppose I use QTPs recovery scenario manager to set the playback synchronization timeout to 0. The handler would return with "continue with next statement".
I'd do that to make sure that any following playback statements don't waste their time waiting for the next non-existing/non-matching step before failing:
I have a lot of GUI tests that kind of get stuck because let's say if 10 controls are missing, their (consecutive) playback steps produce 10 timeout waits before failing. If the playback timeout is 30 seconds, I loose 10x30 seconds=5 minutes execution time while it really would be sufficient to wait for 30 seconds ONCE (because the app does not change anymore -- we waited a full timeout period already).
Now if I have 100 test cases (=action iterations), this possibly happens 100 times, wasting 500 minutes of my test exec time window.
That's why I come up with the idea of a recovery scenario function setting the timeout to 0 after/upon the first failed playback step. This would accelerate the speed while skipping the rightly-FAILED step, yet would not compromise the precision/reliability of identifying the next matching GUI context (which creates a PASSED step).
Then of course upon the next passed playback step, I would want to restore the original timeout value. How could I do that? This is my question.
One cannot define a recovery scenario function that is called for PASSED steps.
I am currently thinking about setting a method function for Reporter.ReportEvent, and "sniffing" for PASSED log entries there. I'd install that method function in the scenario recovery function which sets timeout to 0. Then, when the "sniffer" function senses a ReportEvent call with PASSED status during one of the following playback steps, I'd reset everything (i.e. restore the original timeout, and uninstall the method function). (I am 99% sure, however, that .Click and .Set methods do not call ReportEvent to write their result status...so this option might probably not work.)
Better ideas? This really bugs me.
It sounds to me like you tests aren't designed correctly, if you fail to find an object why do you continue?
One possible (non recovery scenario) solution would be to use RegisterUserFunc to override the methods you are using in order to do an obj.Exist(0) before running the required method.
Function MyClick(obj)
If obj.Exist(1) Then
obj.Click
Else
Reporter.ReportEvent micFail, "Click failed, no object", "Object does not exist"
End If
End Function
RegisterUserFunc "Link", "Click", "MyClick"
RegisterUserFunc "WebButton", "Click", "MyClick"
''# etc
If you have many controls of which some may be missing and you know that after 10 seconds you mentioned (when the first timeout occurs), nothing more will show up, then you can use the exists method with a timeout parameter.
Something like this:
timeout = 10
For Each control in controls
If control.exists(timeout) Then
do something with the control
Else
timeout = 0
End If
Next
Now only the first timeout will be 10 seconds. Each and every subsequent timeout in your collection of controls will have the timeout set to 0 which will save your time.

MQ Connection - 2009 error

am connectting the MQ with below code. I am able connected to MQ successfully. My case is i place the messages to MQ every 1 min once. After disconnecting the cable i get a ResonCode error but IsConnected property still show true. Is this is the right way to check if the connection is still connected ? Or there any best pratcices around that.
I would like to open the connection when applicaiton is started keep it open for ever.
public static MQQueueManager ConnectMQ()
{
if ((queueManager == null) || (!queueManager.IsConnected)||(queueManager.ReasonCode == 2009))
{
queueManager = new MQQueueManager();
}
return queueManager;
}
The behavior of the WMQ client connection is that when idle it will appear to be connected until an API call fails or the connection times out. So isConnected() will likely report true until a get, put or inquire call is attempted and fails, at which point QMgr will then report disconnected.
The other thing to consider here is that 2009 is not the only code you might get. It happens to be the one you get when the connection is severed but there are connection codes for QMgr shutting down, channel shutting down, and a variety of resource and other errors.
Typically for a requirement to maintain a constant connection you would want to wrap the connect and message processing loop inside a try/catch block nested inside a while statement. When you catch an exception other than an intentional exit, close the objects and QMgr, sleep at least 5 seconds, then loop around to the top of the while. The sleep is crucial because if you get caught in a tight reconnect loop and throw hundreds of connection attempts at the QMgr, you can bring even a mainframe QMgr to its knees.
An alternative is to use a v7 WMQ client and QMgr. With this combination, automatic reconnection is configurable as a channel configuration.

Resetting comm event mask

I have been doing overlapped serial port communication in Delphi lately and there is one problem I'm not sure how to solve.
I communicate with a modem. I write a request frame (an AT command) to the modem's COM port and then wait for the modem to respond. The event mask of the port is set to EV_RXCHAR, so when I write a request, I call WaitCommEvent() and start waiting for data to appear in the input queue. When overlapped waiting for event finishes, I immediately start reading data from the queue and read all that the device sends at once:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) do something and then goto 1
Waiting for event finishes after first byte appears in the input queue. During my read operation, however, more bytes appear in the queue and each of them causes an internal event flag to be set. This means that when I read all the data from the queue and then call WaitCommEvent() for the second time, it will immediately return with EV_RXCHAR mask, even though there is no data to be read.
How should I handle reading and waiting for event to be sure that the event mask returned by WaitCommEvent() is always valid? Is it possible to reset the flags of the serial port so that when I read all data from the queue and call WaitCommEvent() after then, it will not return immediately with a mask that was valid before I read the data?
The only solution that comes to my mind is this:
1) write a request
2) call WaitCommEvent() and wait until waiting finishes
3) read all the data that the device sends (not only the data being in the input queue at that moment)
4) call WaitCommEvent() which should return true immediately at the same time resetting the event flag set internally
5) do something and goto 1
Is it a good idea or is it stupid? Of course I know that the modem almost always finishes its answers with CRLF characters so I could set the comm mask to EV_RXFLAG and wait for the #10 character to appear, but there are many other devices with which I communicate and they do not always send frame end characters.
Your help will be appreciated. Thanks in advance!
Mariusz.
Your solution does sound workable. I just use a state machine to handle the transitions.
(psuedocode)
ioState := ioIdle;
while (ioState <> ioFinished) and (not aborted) do
Case ioState of
ioIdle : if there is data to read then set state to ioMidFrame
ioMidframe : if data to read then read, if end of frame set to ioEndFrame
ioEndFrame : process the data and set to ioFinished
ioFinished : // don't do anything, for doc purposes only.
end;

Resources