Umbraco 7: Write.lock file

We've recently migrated our Umbraco site to a Windows 2012 server (it was previously on 2008 R2). All was fine for the first week or so, but for the past few days we've started to receive write.lock file errors every hour or so. I can rebuild the indexes and stop/restart the app pool, but the error always returns. Using Umbraco 7.4.2. Any ideas?
Thanks
Some further info from the Umbraco log:
System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: ABCCorporateCyIndexSet
2016-11-09 16:15:07,513 [P2952/D88/T139] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=ABCCorporateEnIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: ABCCorporateEnIndexSet
2016-11-09 16:15:07,560 [P2952/D88/T139] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=InternalIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: InternalIndexSet
2016-11-09 16:15:07,560 [P2952/D87/T47] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister - Cannot write now because we are going down, changes may be lost.
2016-11-09 16:15:07,576 [P2952/D87/T47] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=InternalIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: InternalIndexSet
2016-11-09 16:15:07,576 [P2952/D87/T47] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=ExternalIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: ExternalIndexSet
2016-11-09 16:15:07,576 [P2952/D87/T47] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=ABCCorporateCyIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: ABCCorporateCyIndexSet
2016-11-09 16:15:07,576 [P2952/D87/T47] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=ABCCorporateEnIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: ABCCorporateEnIndexSet
2016-11-09 16:15:07,576 [P2952/D79/T40] WARN Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister - Cannot write now because we are going down, changes may be lost.
2016-11-09 16:15:07,576 [P2952/D79/T40] ERROR UmbracoExamine.DataServices.UmbracoLogService - Provider=InternalIndexer, NodeId=-1 System.Exception: App is shutting down so index batch operation is ignored,, IndexSet: InternalIndexSet
2016-11-09 16:15:07,591 [P2952/D91/T34] ERROR Umbraco.Core.Sync.DatabaseServerMessenger - DISTRIBUTED CACHE IS NOT UPDATED. Failed to execute instructions (24232: "[{"RefreshType":5,"RefresherId":"27ab3022-3dfa-47b6-9119-5945bc88fd66","GuidId":"00000000-0000-0000-0000-000000000000","IntId":6351,"JsonIds":null,"JsonPayload":null},{"RefreshType":3,"RefresherId":"55698352-dfc5-4dbe-96bd-a4a0f6f77145","GuidId":"00000000-0000-0000-0000-000000000000","IntId":0,"JsonIds":"[6351]","JsonPayload":null}]"). Instruction is being skipped/ignored
Lucene.Net.Store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@C:\inetpub\Intranet2\App_Data\TEMP\ExamineIndexes\INTRANET01\External\Index\write.lock: System.IO.IOException: The process cannot access the file 'C:\inetpub\Intranet2\App_Data\TEMP\ExamineIndexes\INTRANET01\External\Index\write.lock' because it is being used by another process.

Your problem might be related to this issue: http://issues.umbraco.org/issue/U4-6338
If you have Windows Server 2012 R2 + IIS and either KB3000850 or KB3007507 installed, YOU ARE AFFECTED
Microsoft have created a hotfix: https://support.microsoft.com/en-us/kb/3052480
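To confirm whether either of those updates is present before applying the fix, a quick check from an elevated command prompt on the web server might look like this (illustrative only, not an official diagnostic):
wmic qfe get HotFixID | findstr /i "KB3000850 KB3007507"
If either KB is listed on a Windows Server 2012 R2 + IIS box, the hotfix above is what the linked issue points to for the repeated app shutdowns visible in your log.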

Related

Where can I find the default docker ulimit settings?

I have been trying to understand an issue I've had when running the roribio16/alpine-sqs Docker image on one of my machines. Whenever I try to run the image without specifying any other settings, I get the following output:
[xxxx@yyyy ~]$ docker run roribio16/alpine-sqs
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/elasticmq.conf" during parsing
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/insight.conf" during parsing
2021-05-29 15:48:41,216 INFO Included extra file "/etc/supervisor/conf.d/sqs-init.conf" during parsing
2021-05-29 15:48:41,216 INFO Set uid to user 0 succeeded
2021-05-29 15:48:41,222 INFO RPC interface 'supervisor' initialized
2021-05-29 15:48:41,222 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2021-05-29 15:48:41,222 INFO supervisord started with pid 1
2021-05-29 15:48:42,225 INFO spawned: 'sqs-init' with pid 9
2021-05-29 15:48:42,229 INFO spawned: 'elasticmq' with pid 10
2021-05-29 15:48:42,230 INFO spawned: 'insight' with pid 11
cp: can't stat '/opt/custom/*.conf': No such file or directory
> sqs-insight@0.3.0 start /opt/sqs-insight
> node index.js
15:48:42.605 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
Loading config file from "/opt/sqs-insight/lib/../config/config_local.json"
15:48:42.929 [elasticmq-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
Unable to load queues for undefined
Config contains 0 queues.
library initialization failed - unable to allocate file descriptor table - out of memorylistening on port 9325
2021-05-29 15:48:43,233 INFO success: sqs-init entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,233 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,234 INFO success: insight entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:43,234 INFO exited: sqs-init (exit status 0; expected)
2021-05-29 15:48:44,318 INFO exited: elasticmq (terminated by SIGABRT (core dumped); not expected)
2021-05-29 15:48:45,322 INFO spawned: 'elasticmq' with pid 67
15:48:45.743 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
15:48:46.044 [elasticmq-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
library initialization failed - unable to allocate file descriptor table - out of memory2021-05-29 15:48:47,223 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:47,389 INFO exited: elasticmq (terminated by SIGABRT (core dumped); not expected)
2021-05-29 15:48:48,393 INFO spawned: 'elasticmq' with pid 89
15:48:48.766 [main] INFO org.elasticmq.server.Main$ - Starting ElasticMQ server (0.15.0) ...
15:48:49.066 [elasticmq-akka.actor.default-dispatcher-3] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
library initialization failed - unable to allocate file descriptor table - out of memory^C2021-05-29 15:48:49,559 INFO success: elasticmq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-29 15:48:49,559 WARN received SIGINT indicating exit request
2021-05-29 15:48:49,559 INFO waiting for insight, elasticmq to die
2021-05-29 15:48:49,566 INFO stopped: insight (terminated by SIGTERM)
2021-05-29 15:48:50,431 INFO stopped: elasticmq (terminated by SIGABRT (core dumped))
With a bit of googling I found a post where somebody had the same issue when running some other random image, and they managed to get the image running by setting ulimits on the container, which also worked for me (docker run --ulimit nofile=122880:122880 roribio16/alpine-sqs).
I checked the ulimits set inside the container when I didn't use this flag:
docker exec -it ca bash
$ ulimit -a
and found that the nofile setting was ridiculously high, which I assume is what causes the container to run out of memory if too many file descriptors are allocated. I don't have a particularly good understanding of how this works, though, so I would appreciate any clarification on that topic as well.
Anyway, the point of that ramble is that I want to find where the default Docker container ulimits are set, as I don't understand why they are so high on the machine I am using. I have another machine that does not have this problem.
I can find lots of ways to change the default limits, but there does not seem to be much information about where these limits get set in the first place. According to the Docker documentation, if custom values are not set the ulimits should be inherited from my system, but as far as I can tell my system's nofile settings are much lower than what I'm seeing in the container.
(Both machines run Manjaro Linux; the one that doesn't have this issue runs XFCE and the one that does runs KDE.)
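For what it's worth, on a systemd-based install containers inherit their default limits from the dockerd process itself unless --ulimit or a daemon-wide default overrides them, so a sketch like the following may help trace where the high nofile value comes from (paths and values are illustrative):
# limits of the running Docker daemon; containers inherit these by default
cat /proc/$(pgrep -x dockerd)/limits | grep "open files"
# the systemd unit usually sets them (often LimitNOFILE=infinity)
systemctl cat docker.service | grep -i limitnofile
# an explicit default for all containers can be declared in /etc/docker/daemon.json:
# { "default-ulimits": { "nofile": { "Name": "nofile", "Soft": 1024, "Hard": 4096 } } }
Restarting the Docker service after editing daemon.json makes the new default apply to newly started containers.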

Timeout exception Flink

I have a question regarding Flink. I am running an application in a local cluster, with 1 TaskManager and 4 task slots.
After the application has been running for some time, I get a timeout error:
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.
I have seen some posts on this topic but no answers to it. Could you help me find the root cause, or suggest possible troubleshooting steps?
I am using Flink version 1.5.3.
It seems that the Docker containers of the TaskManager and the JobManager are stopped when this happens.
Let me add the error trace from the JobManager container logs:
2019-06-09 13:31:06,300 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2019-06-09 13:31:06,331 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcActor - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
Thanks in advance!
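One thing worth checking when a TaskManager heartbeat times out is the heartbeat configuration: the JobMaster declares a TaskManager dead if no heartbeat arrives within heartbeat.timeout (default 50000 ms, with heartbeat.interval at 10000 ms). Raising the timeout in conf/flink-conf.yaml, as sketched below with illustrative values, can help distinguish a genuinely crashed TaskManager from one that is merely stalled, e.g. by long GC pauses:
# append to conf/flink-conf.yaml and restart the cluster (values are examples)
echo "heartbeat.interval: 10000" >> conf/flink-conf.yaml
echo "heartbeat.timeout: 120000" >> conf/flink-conf.yaml
If the containers themselves are stopping, the timeout is a symptom rather than the cause, so the TaskManager container logs (OOM kills, GC pauses) are the next place to look.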

Visual Studio Test Run: Run Coded UI automated tests from test plans in the Test hub

Build Definition
I am using the same machine for both the build and the release configuration. I have created the build successfully, and using that build I am able to execute Coded UI scripts via the Visual Studio Test task, and they are working fine. My configuration for the build definition is shown below.
Release Destination
After the successful build definition and execution of the test scripts, my next step is to run automated tests from test plans in the Test hub. I have also associated my test scripts with test cases. Please have a look at the image of my release definition, where I have selected tests using "Test run".
The notification I receive after the failed execution of the automated test from the test plan in the Test hub is:
Deployment of release Release-11 Rejected in Deploy Test Scripts.
Log
2018-02-21T14:24:20.8978238Z AgentName: EVSRV017-DEVSRV017-4
2018-02-21T14:24:20.8978238Z AgentId: 29
2018-02-21T14:24:20.9038250Z ServiceUrl: https://mytfsserver/tfs/DefaultCollection/
2018-02-21T14:24:20.9038250Z TestPlatformVersion:
2018-02-21T14:24:20.9038250Z EnvironmentUri: dta://env/Calculator/_apis/release/16/20/1
2018-02-21T14:24:20.9038250Z QueryForTaskIntervalInMilliseconds: 3000
2018-02-21T14:24:20.9038250Z MaxQueryForTaskIntervalInMilliseconds: 10000
2018-02-21T14:24:20.9048252Z QueueNotFoundDelayTimeInMilliseconds: 3000
2018-02-21T14:24:20.9058254Z MaxQueueNotFoundDelayTimeInMilliseconds: 50000
2018-02-21T14:24:20.9058254Z RetryCountWhileConnectingToTfs: 3
2018-02-21T14:24:20.9058254Z ===========================================
2018-02-21T14:24:21.3909224Z Initializing the Test Execution Engine
Warning
2018-02-21T14:25:02.1240674Z ##[warning]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
2018-02-21T14:25:02.1240674Z at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
2018-02-21T14:25:02.1240674Z at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
2018-02-21T14:25:02.1240674Z --- End of inner exception stack trace ---
ERROR:
2018-02-22T10:10:42.0007605Z ##[error]System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
I added the debug variable, and it seems that the System.IO exception is thrown while the test settings file is being created.
Enabled DEBUG LOG
2018-02-22T20:17:53.8151287Z Initializing the Test Execution Engine
2018-02-22T20:17:53.8161287Z ##[debug]Creating test settings. test settings name : 44de4d5b-f134-4ba2-b0de-ebd8d30b4d22
2018-02-22T20:18:35.3911287Z ##[warning]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
2018-02-22T20:18:35.3931287Z ##[debug]Processed: ##vso[task.logissue type=warning;]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
TFS Agent Log
[INFO VstsAgentWebProxy] No proxy setting found.
[INFO ConfigurationStore] IsServiceConfigured: False
[INFO ConfigurationManager] Is service configured: False
Worker Log
[2018-02-23 19:32:17Z INFO VstsAgentWebProxy] No proxy setting found.
[2018-02-23 19:32:52Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:32:52Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:32:53Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:33:34Z INFO JobServerQueue] Catch exception during update timeline records, try to update these timeline records next time.
[2018-02-23 19:33:34Z INFO ProcessInvoker] Finished process with exit code 0, and elapsed time 00:00:49.0055812.
[2018-02-23 19:33:34Z INFO StepsRunner] Step result: Failed
[2018-02-23 19:33:34Z INFO StepsRunner] Update job result with current step result 'Failed'.
[2018-02-23 19:33:34Z INFO StepsRunner] Current state: job state = 'Failed'
[2018-02-23 19:33:34Z INFO JobRunner] Job result after all job steps finish: Failed
[2018-02-23 19:33:34Z INFO JobRunner] Run all post-job steps.
[2018-02-23 19:33:34Z INFO JobRunner] Job result after all post-job steps finish: Failed
[2018-02-23 19:33:34Z INFO JobRunner] Completing the job execution context.
[2018-02-23 19:33:34Z INFO JobServerQueue] Try to append 2 batches web console lines, success rate: 2/2.
[2018-02-23 19:33:34Z INFO JobRunner] Shutting down the job server queue.
[2018-02-23 19:33:34Z ERR JobServerQueue] Microsoft.VisualStudio.Services.Common.VssServiceException: String or binary data would be truncated.
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.HandleResponse(HttpResponseMessage response)
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__48.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__45`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__27`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__26`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.VisualStudio.Services.Agent.JobServerQueue.<ProcessTimelinesUpdateQueueAsync>d__32.MoveNext()
[2018-02-23 19:33:34Z INFO JobServerQueue] Fire signal to shutdown all queues.
[2018-02-23 19:33:35Z INFO JobServerQueue] All queue process task stopped.
[2018-02-23 19:33:35Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:33:35Z INFO JobServerQueue] Web console line queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] Try to upload 2 log files or attachments, success rate: 2/2.
[2018-02-23 19:33:35Z INFO JobServerQueue] File upload queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] Timeline update queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] All queue process tasks have been stopped, and all queues are drained.
[2018-02-23 19:33:35Z INFO JobRunner] Raising job completed event.
[2018-02-23 19:33:35Z INFO Worker] Job completed.
I would be thankful if anyone could identify what I am missing or what I need to fix so that I can execute automated tests from test plans in the Test hub.
Regards
Based on the error message "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host." and your clarification, this should be related to a known issue on Windows Server 2008 R2. Please refer to the article below for details:
Team Foundation Server: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
However, the bug has been fixed by the Windows team and they have released a QFE for it. You can find the QFE here. You will need to install it on all of your ATs (application tiers).
So, try installing the hotfix, restart the computer after you apply it, and then try again.
You can also try the initial workaround listed in the blog:
Open the IIS Manager.
In the Connections pane, make sure the name of your AT is selected.
In the middle pane (titled “ Home”), make sure you are in the “Features View” (bottom) and scroll down to the Management section.
Double-click the “Configuration Editor” icon.
The middle pane should now have the title “Configuration Editor”.
In the Section pull-down near the top, expand system.applicationHost and select “webLimits”.
You should now see a bunch of property/value pairs, one of which is named “minBytesPerSecond”. Its value is most likely 240. You will want to lower this value for the workaround; the same change can also be scripted, as sketched below.
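A sketch of the equivalent change with appcmd, assuming the default IIS install path (setting the value to 0 removes the limit entirely and is only an example; pick whatever lower value you are comfortable with):
%windir%\system32\inetsrv\appcmd.exe set config -section:system.applicationHost/webLimits /minBytesPerSecond:"0" /commit:apphost
Run it from an elevated command prompt on the AT.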
Besides, another possibility is that it's caused by the proxy server; try bypassing the proxy server, then check again.

sos berlin scheduler -- job chain - how to trigger other job after job timeout

I'm using SOS Berlin JobScheduler (version linux-x64 1.10.5).
Normally, when a job in a job chain times out, the scheduler kills the job process and sends an email.
Based on this, I want to trigger another job.
I have tried two ways, but neither works.
Way 1:
Add a “spooler_task_after()” function to the job.
I guess this fails because the job creates a process on the Linux system, and when the job times out the scheduler kills that process, which also kills the “spooler_task_after()” function.
Code:
<job timeout="00:00:09">
    <script language="shell"><![CDATA[
        echo aa
        sleep 10s
        echo bb
    ]]></script>
    <monitor name="exit_code" ordering="0">
        <script language="java:javascript"><![CDATA[
            function spooler_task_after() {
                var exitCode = spooler_task.exit_code;
                spooler_log.info("Exit Code is: " + exitCode);
                /*
                call other job
                */
                result = true;
                return result;
            }
        ]]></script>
    </monitor>
    <run_time/>
</job>
Result:
2017-07-27 21:22:21.251+0800 [info]
2017-07-27 21:22:21.251+0800 [info] Task sample_errorhandling/job1:23026 - Protocol starts in /httx/opt/sos-scheduler/ldw-scheduler-test1/logs/task.sample_errorhandling,job1.log
2017-07-27 21:22:21.250+0800 [info] SCHEDULER-842 Task is going to process Order sample_errorhandling/job_chain3:12, state=aaa, on JobScheduler 'http://xxxx:4444', Order's Process_class
2017-07-27 21:22:21.268+0800 [info] SCHEDULER-726 Task runs on this JobScheduler 'http://jt-host-kvm-72:4444'
2017-07-27 21:22:21.268+0800 [info] SCHEDULER-918 state=starting (at=never)
2017-07-27 21:22:22.466+0800 [info] SCHEDULER-987 Starting process: '/bin/sh' '-c' '"/tmp/admin/sos.gBdCm8"'
2017-07-27 21:22:23.520+0800 [info] [stdout] aa
2017-07-27 21:22:30.326+0800 [ERROR] SCHEDULER-272 Terminating task after reaching deadline <job timeout="9">
2017-07-27 21:22:30.359+0800 [ERROR] SCHEDULER-202 Connection to task has been lost, state=running_remote_process: Z-REMOTE-101 Separate process: pid=0: Connection lost / zschimmer::com::object_server::Connection::pop_operation
2017-07-27 21:22:30.359+0800 [ERROR] SCHEDULER-202 Connection to task has been lost, state=release: Z-REMOTE-122 Separate process pid=0: Caller has killed process
2017-07-27 21:22:30.384+0800 [ERROR] SCHEDULER-280 Process terminated with exit code 1 (0x63)
2017-07-27 21:22:30.384+0800 [WARN] SCHEDULER-845 Task ended without processing the order. The order remains in job's order queue in the same state
2017-07-27 21:22:30.384+0800 [info] SCHEDULER-843 Task has ended processing of Order sample_errorhandling/job_chain3:12, state=aaa, on JobScheduler 'http:/xxxx:4444'
Way 2:
Add a return-code handler on the job chain node.
This approach works when the job finishes successfully or with an error, but it fails when the job is killed because of a timeout.
Code in job chain:
<job_chain>
    <job_chain_node state="aaa" job="job1" next_state="success" error_state="error">
        <on_return_codes>
            <on_return_code return_code="1">
                <add_order xmlns="https://jobscheduler-plugins.sos-berlin.com/NodeOrderPlugin" job_chain="/error_handling/sendmail"/>
            </on_return_code>
        </on_return_codes>
    </job_chain_node>
    <job_chain_node state="success"/>
    <job_chain_node state="error"/>
</job_chain>
You can use the error_state attribute.
When JobScheduler kills the task because of a timeout, this is handled as an error situation.
Please note that the next_state of the errorHandling node is error, to indicate in JOC that this was an error, and that the errorHandling node has its own error_state to indicate whether the error handler itself fails.
<job_chain>
    <job_chain_node state="100" job="job1" next_state="200" error_state="errorHandling"/>
    <job_chain_node state="200" job="job2" next_state="success" error_state="errorHandling"/>
    <job_chain_node state="errorHandling" job="errorHandlerJob" next_state="error" error_state="errorInErrorHandling"/>
    <job_chain_node state="success"/>
    <job_chain_node state="errorInErrorHandling"/>
    <job_chain_node state="error"/>
</job_chain>

Dataflow job error: "The resource 'projects/<removed>/zones/us-central1-a/disks/<removed>-harness-0' is not ready"

One of our pipelines failed this morning with an error we've never seen before. In addition, we had to manually delete the one VM that was spun up, in order to cancel/stop the job.
Has anything changed in the Dataflow service that could cause this error?
0 [main] INFO com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner - PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 49 files. Enable logging at DEBUG level to see which files will be staged.
2243 [main] INFO com.<removed>.cdf.dfp.DFPDenormalizationCloudDataFlowJob - Successfully created cloud dataflow service pipeline
2282 [main] INFO com.<removed>.cdf.dfp.DFPDenormalizationCloudDataFlowJob - Last loaded table was found. It will be processed for denormalization: Clicks_06_2015
2282 [main] INFO com.<removed>.cdf.dfp.DFPDenormalizationCloudDataFlowJob - Last loaded table was found. It will be processed for denormalization: ActiveViews_06_2015
2282 [main] INFO com.<removed>.cdf.dfp.DFPDenormalizationCloudDataFlowJob - Last loaded table was found. It will be processed for denormalization: Impressions_06_2015
2435 [main] WARN com.google.cloud.dataflow.sdk.Pipeline - Transform <removed>:<removed>.advertisers2 does not have a stable unique name. In the future, this will prevent reloading streaming pipelines
2615 [main] WARN com.google.cloud.dataflow.sdk.Pipeline - Transform <removed>:<removed>.lineitems2 does not have a stable unique name. In the future, this will prevent reloading streaming pipelines
2616 [main] WARN com.google.cloud.dataflow.sdk.Pipeline - Transform <removed>:<removed>.creative2name2 does not have a stable unique name. In the future, this will prevent reloading streaming pipelines
2616 [main] WARN com.google.cloud.dataflow.sdk.Pipeline - Transform <removed>:<removed>.adunit2site2 does not have a stable unique name. In the future, this will prevent reloading streaming pipelines
3236 [main] INFO com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner - Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
3241 [main] INFO com.google.cloud.dataflow.sdk.util.PackageUtil - Uploading 49 files from PipelineOptions.filesToStage to staging location to prepare for execution.
41834 [main] INFO com.google.cloud.dataflow.sdk.util.PackageUtil - Uploading PipelineOptions.filesToStage complete: 10 files newly uploaded, 39 files cached
Dataflow SDK version: 0.4.150602
51003 [main] INFO com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner - To access the Dataflow monitoring console, please navigate to https://console.developers.google.com/project/<removed>/dataflow/job/2015-06-11_16_39_02-17130055143605818331
Submitted job: 2015-06-11_16_39_02-17130055143605818331
51004 [main] INFO com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner - To cancel the job using the 'gcloud' tool, run:
> gcloud alpha dataflow jobs --project=<removed> cancel 2015-06-11_16_39_02-17130055143605818331
2015-06-11T23:39:02.506Z: Detail: (b056559940543e6a): Expanding GroupByKey operations into optimizable parts.
2015-06-11T23:39:02.509Z: Detail: (b056559940543d60): Annotating graph with Autotuner information.
2015-06-11T23:39:02.759Z: Detail: (b0565599405437a9): Fusing adjacent ParDo, Read, Write, and Flatten operations
2015-06-11T23:39:02.762Z: Detail: (b05655994054369f): Fusing consumer Impressions_06_2015-ParDoDFP-transform into Impressions_06_2015-BQ-Read
2015-06-11T23:39:02.764Z: Detail: (b056559940543595): Fusing consumer Impressions_06_2015-BQ-Write into Impressions_06_2015-ParDoDFP-transform
2015-06-11T23:39:02.766Z: Detail: (b05655994054348b): Fusing consumer ActiveViews_06_2015-ParDoDFP-transform into ActiveViews_06_2015-BQ-Read
2015-06-11T23:39:02.767Z: Detail: (b056559940543381): Fusing consumer ActiveViews_06_2015-BQ-Write into ActiveViews_06_2015-ParDoDFP-transform
2015-06-11T23:39:02.769Z: Detail: (b056559940543277): Fusing consumer Clicks_06_2015-ParDoDFP-transform into Clicks_06_2015-BQ-Read
2015-06-11T23:39:02.771Z: Detail: (b05655994054316d): Fusing consumer Clicks_06_2015-BQ-Write into Clicks_06_2015-ParDoDFP-transform
2015-06-11T23:39:02.818Z: Detail: (b056559940543987): Adding StepResource setup and teardown to workflow graph.
2015-06-11T23:39:18.614Z: Error: (5494fb7a460f58a8): Workflow failed. Causes: (20fbc2bb0e7cb0b1): One or more operations had an error: 'operation-1434065943092-518467f1f5b21-8d000d8a-d5cd5762': 'The resource 'projects/<removed>/zones/us-central1-a/disks/dfp-denormalization-job-1-06111639-3db5-harness-0' is not ready'.
2015-06-11T23:39:18.651Z: Detail: (4fb958a4957733a5): Cleaning up.
2015-06-11T23:40:36.126Z: Error: (d41cf136c17a5e79): Workflow failed. Causes: (20fbc2bb0e7cb0b1): One or more operations had an error: 'operation-1434065943092-518467f1f5b21-8d000d8a-d5cd5762': 'The resource 'projects/<removed>/zones/us-central1-a/disks/dfp-denormalization-job-1-06111639-3db5-harness-0' is not ready'.
2015-06-11T23:43:05.998Z: Warning: (c5964e114f42988b): Job 2015-06-11_16_39_02-17130055143605818331 is already finishing. Ignoring cancel request.
2015-06-11T23:48:04.715Z: Warning: (cf462c726cde3704): Job 2015-06-11_16_39_02-17130055143605818331 is already finishing. Ignoring cancel request.
2015-06-11T23:50:35.529Z: Warning: Internal Issue (4fb958a495773599): 65177287:8503
748739 [main] INFO com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner - Job finished with status FAILED
748740 [main] ERROR com.<removed>.cdf.dfp.DFPDenormalizationCloudDataFlowJob - Job "dfp-denormalization-job-1434066640362" failed. Job may be retried.
This was a temporary issue with the Google Compute Engine API that has since been resolved. When calling GCE on behalf of the user, Dataflow will attempt to work around any transient errors.
