I'm using Fastlane from a Mac Mini to re-sign a mobile application via a TFS build.
Everything worked pretty well until builds started hanging after Fastlane declared the job completed successfully.
The logs from the agent look like this:
[2018-07-10 09:44:49Z INFO ProcessInvoker] Process started with process id 33245, waiting for process exit.
[2018-07-10 09:44:49Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:49Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:50Z INFO JobServerQueue] Try to upload 1 log files or attachments, success rate: 1/1.
[2018-07-10 09:44:55Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:56Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:56Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:58Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:44:59Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:01Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:16Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:17Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:22Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:22Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:22Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:23Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:24Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:39Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:41Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:41Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:42Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-07-10 09:45:57Z INFO JobServerQueue] Stop aggressive process web console line queue.
Every time this line appears:
Stop aggressive process web console line queue.
the job hangs, and the build fails once the 60-minute time limit is reached.
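One thing worth checking in the agent log: ProcessInvoker logs "Process started with process id 33245, waiting for process exit." and normally follows it with a "Finished process with exit code ..." line. If the finish line never appears, the agent is still waiting on the Fastlane process (or a child it left running). The sketch below is a hypothetical helper (the name unfinished_processes is ours) that scans an agent log for such unmatched starts; it assumes the "[date timeZ LEVEL Component] message" line format shown in the excerpt above.

```python
import re

# Matches agent log lines such as:
# [2018-07-10 09:44:49Z INFO ProcessInvoker] Process started with process id 33245, ...
LINE_RE = re.compile(r"^\[(\S+ \S+)Z (\w+) (\w+)\] (.*)$")
STARTED_RE = re.compile(r"Process started with process id (\d+)")


def unfinished_processes(log_lines):
    """Return PIDs that ProcessInvoker started but never reported as finished."""
    pending = []
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(3) != "ProcessInvoker":
            continue
        message = match.group(4)
        started = STARTED_RE.search(message)
        if started:
            pending.append(int(started.group(1)))
        elif message.startswith("Finished process") and pending:
            # The finish line carries no PID, so assume it closes the most recent start.
            pending.pop()
    return pending
```

A non-empty result for a hung build suggests looking at what is still running on the Mac Mini (for example, a process that keeps the step's output stream open) rather than at Fastlane's own exit status.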
The status of the Fastlane step doesn't seem to matter here, since most of the time Fastlane reports success through TFS:
2018-07-23T07:00:09.4879740Z
2018-07-23T07:00:09.4890680Z [09:00:09]: fastlane.tools just saved you 14 minutes! 🎉
2018-07-23T07:16:16.5905860Z ##[error]The operation was canceled.
2018-07-23T07:16:16.5987280Z ##[section]Finishing: Run Fastlane
If anyone has an idea, I'm all ears.
We've tried restarting the Fastlane machine and running a gem cleanup. Apart from that aggressive console alert, nothing seems out of order.
Thank you in advance!
Related
I'm running a SonarQube analysis in a Jenkins job. The analysis stage ends successfully, but afterwards the job gets stuck: nothing appears in the log, and after a few minutes I get an out-of-memory error and the job fails.
My Sonar properties file:
sonar.language=javascript
# sources
sonar.sources=src
sonar.exclusions=**/node_modules/**
# tests
sonar.tests=src
sonar.test.inclusions=**/*.test.js
# tests reports
sonar.testExecutionReportPaths=reports/test-reporter.xml
sonar.javascript.lcov.reportPaths=coverage/lcov.info
sonar.verbose=true
The log:
13:31:10 13:31:10.683 INFO: Analysis report generated in 522ms, dir size=4 MB
13:31:12 13:31:12.797 INFO: Analysis report compressed in 2114ms, zip size=2 MB
13:31:12 13:31:12.797 INFO: Analysis report generated in /my_reports_loc
13:31:12 13:31:12.797 DEBUG: Upload report
13:31:12 13:31:12.955 DEBUG: POST 200 http://my-sonar/api/ce/submit?projectKeymyProjt&projectName=projectNamet | time=157ms
13:31:12 13:31:12.958 INFO: Analysis report uploaded in 161ms
13:31:12 13:31:12.959 DEBUG: Report metadata written to /my_reports_loc
13:31:12 13:31:12.959 INFO: ANALYSIS SUCCESSFUL, you can browse http://my-sonar/dashboard?id=my-project
13:31:12 13:31:12.959 INFO: Note that you will be able to access the updated dashboard once the server has processed the submitted analysis report
13:31:12 13:31:12.959 INFO: More about the report processing at http://my-sonar/api/ce/task?id=my_id
13:31:12 13:31:12.964 DEBUG: eslint-bridge server will shutdown
13:31:13 13:31:13.208 DEBUG: stylelint-bridge server will shutdown
13:31:13 13:31:13.209 INFO: Analysis total time: 42.940 s
13:31:13 13:31:13.230 INFO: ------------------------------------------------------------------------
13:31:13 13:31:13.230 INFO: EXECUTION SUCCESS
13:31:13 13:31:13.230 INFO: ------------------------------------------------------------------------
13:31:13 13:31:13.230 INFO: Total time: 44.469s
13:31:13 13:31:13.369 INFO: Final Memory: 43M/1106M
13:31:13 13:31:13.369 INFO: ------------------------------------------------------------------------
[Pipeline] }
[Pipeline] //
[Pipeline] }
[Pipeline] //
13:33:47 java.lang.OutOfMemoryError: GC overhead limit exceeded
Finished: FAILURE
You need to increase the heap size.
The GC overhead error means that Jenkins is thrashing in garbage collection: it is probably spending more time collecting garbage than doing useful work. This usually happens when the heap is too small for the application.
This post explains how to increase the heap size for Jenkins.
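For a rough feel of when "GC overhead limit exceeded" fires: Oracle's GC tuning guide describes the default HotSpot rule as more than 98% of total time spent in garbage collection while less than 2% of the heap is recovered (tunable via -XX:GCTimeLimit and -XX:GCHeapFreeLimit). The function below is a simplified, hypothetical model of that rule only; the real JVM applies it over consecutive collections, not a single measurement.

```python
def gc_overhead_exceeded(gc_seconds, total_seconds, freed_bytes, heap_bytes,
                         time_limit_pct=98.0, free_limit_pct=2.0):
    """Simplified model of the HotSpot GC overhead check (defaults 98% / 2%)."""
    gc_time_pct = 100.0 * gc_seconds / total_seconds
    freed_pct = 100.0 * freed_bytes / heap_bytes
    # Both conditions must hold: almost all time in GC AND almost nothing reclaimed.
    return gc_time_pct > time_limit_pct and freed_pct < free_limit_pct
```

A larger heap helps because each collection then reclaims a bigger share of memory, so the "less than 2% recovered" condition stops being met.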
After SonarQube Scanner was executed by a Jenkins job, the analysis result is not being uploaded to the SonarQube server:
00:15:23.616 INFO: ANALYSIS SUCCESSFUL
00:15:23.621 DEBUG: Post-jobs : GitHub Pull Request Issue Publisher (wrapped)
00:15:23.621 INFO: Executing post-job GitHub Pull Request Issue Publisher (wrapped)
In another of my Jenkins jobs, the result is uploaded to SonarQube, and its log contains these lines, which are missing in the case above:
INFO: Sensor CPD Block Indexer (done) | time=0ms
INFO: 20 files had no CPD blocks
INFO: Calculating CPD for 46 files
INFO: CPD calculation finished
INFO: Analysis report generated in 312ms, dir size=1 MB
INFO: Analysis reports compressed in 227ms, zip size=561 KB
INFO: Analysis report uploaded in 256ms
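To compare the two jobs systematically, a small check can report which upload-stage lines are missing from a scanner log. This is a hypothetical helper; the marker strings are copied from the successful log above and may differ between SonarQube Scanner versions.

```python
# Marker strings taken from the successful job's log above (assumption: stable across versions).
UPLOAD_MARKERS = ("Analysis report generated", "Analysis report uploaded")


def missing_upload_markers(log_text):
    """Return the upload-stage markers that do not appear in the scanner log."""
    return [marker for marker in UPLOAD_MARKERS if marker not in log_text]
```

For the failing job, both markers are missing, which indicates the scanner never reached the report generation and upload stage after the post-job step started.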
Is there anything I can do to fix this?
I am running a PySpark job (Python 3.5, Spark 2.1, Java 8) in yarn-client mode from an edge node with spark2-submit. The job succeeds, and the resulting DataFrame is written to HDFS and looks correct (we haven't yet found any errors in its data).
The issue is that I see a lot of ERROR messages (about 6,000), and I would like to understand what is wrong and whether this affects the final DataFrame.
All the ERROR messages look like this one:
18/06/01 14:08:36 INFO codegen.CodeGenerator: Code generated in 45.712788 ms
18/06/01 14:08:37 INFO executor.Executor: Finished task 33.0 in stage 34.0 (TID 2312). 4600 bytes result sent to driver
18/06/01 14:08:37 INFO executor.Executor: Finished task 117.0 in stage 34.0 (TID 2316). 3801 bytes result sent to driver
18/06/01 14:08:40 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 2512
18/06/01 14:08:40 INFO executor.Executor: Running task 190.1 in stage 34.0 (TID 2512)
18/06/01 14:08:40 INFO storage.ShuffleBlockFetcherIterator: Getting 28 non-empty blocks out of 193 blocks
18/06/01 14:08:40 INFO storage.ShuffleBlockFetcherIterator: Started 5 remote fetches in 1 ms
18/06/01 14:08:40 INFO executor.Executor: Executor is trying to kill task 190.1 in stage 34.0 (TID 2512)
18/06/01 14:08:40 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /...../yarn/nm/usercache/../appcache/application_xxxx/blockmgr-xxxx/temp_shuffle_xxxxx
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:212)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:238)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The ERRORs start after a fair amount of feature engineering (select, groupBy, ...), and they appear when I add these lines:
from pyspark.sql import functions as func

df = (df.groupby('x', 'y')
      .agg(func.sum('x').alias('x_sum'))
      .groupby('y')
      .agg(func.mean('y').alias('py_sum_avg')))
So I guess the data shuffle is triggered by groupBy.
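To see why groupBy forces a shuffle, here is a pure-Python toy model (no Spark involved; hash_partition and grouped_sum are hypothetical names): rows are routed to partitions by hashing the grouping key, so every row for a given key lands in the same partition and can be summed there. In Spark this routing step moves data between executors and writes temporary shuffle files, which is where DiskBlockObjectWriter shows up in the trace above.

```python
def hash_partition(rows, key_cols, num_partitions):
    """Route each row to a partition by hashing its grouping key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        partitions[hash(key) % num_partitions].append(row)
    return partitions


def grouped_sum(rows, key_cols, value_col, num_partitions=4):
    """Toy equivalent of df.groupby(*key_cols).agg(sum(value_col))."""
    totals = {}
    for partition in hash_partition(rows, key_cols, num_partitions):
        # All rows for a key are in this partition, so the local sum is final.
        for row in partition:
            key = tuple(row[c] for c in key_cols)
            totals[key] = totals.get(key, 0) + row[value_col]
    return totals
```

Each chained groupby in the snippet above repartitions by a different key, so it adds another shuffle stage.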
I first thought it was a memory issue, so I added much more memory and memory overhead for the driver and executors (the fix suggested in some other threads), without real success. The code has other groupBy calls, and it seems the issue appears at this stage.
I also read that it could be related to too many open files or a full disk, but the ERROR messages are somewhat different in those two cases.
I am quite new to PySpark, so I am looking for advice on how to debug this kind of issue.
How can I find out why java.nio.channels.ClosedByInterruptException is thrown? I guess this is what triggers ERROR storage.DiskBlockObjectWriter. Is that correct? Is it triggered by "Executor: Executor is trying to kill task 190"? If having some tasks killed is a standard process, why does it trigger ERRORs? Can I get some hints from the Spark UI (I see that some tasks were killed)? Can I get more information from the traceback?
How can I fix these issues? Any suggestions on how to debug this kind of problem? I am not sure how to proceed or where to look (memory, an issue in the PySpark code, an issue with the cluster setup, or my Spark parameters).
I am working on a Hadoop data lake with Cloudera CDH 5.8.
There is an issue with spark.speculation in Spark 2.1, which is the version I am using.
The related upstream bug is SPARK-19293, although the stack trace in my case is slightly different from the one in that ticket. Adding
--conf spark.speculation=false
makes the ERRORs go away in my test.
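Why does disabling speculation help? With spark.speculation enabled, Spark launches an extra copy of a task that runs much slower than its siblings; whichever copy finishes first wins and the other is killed. That fits the log above, where task 190.1 (likely a second attempt) is killed while reverting its shuffle file, raising ClosedByInterruptException. The sketch below is a simplified, hypothetical model of the decision, using the documented spark.speculation.quantile (default 0.75) and spark.speculation.multiplier (default 1.5) settings; it is not Spark's actual scheduler code.

```python
import math
import statistics


def should_speculate(running_time, finished_durations, total_tasks,
                     quantile=0.75, multiplier=1.5):
    """Simplified model: should a still-running task get a speculative copy?"""
    # Speculation only starts once a quantile fraction of the stage's tasks have finished.
    min_finished = math.floor(quantile * total_tasks)
    if not finished_durations or len(finished_durations) < min_finished:
        return False
    # A task counts as slow if it has run longer than multiplier x the median finished duration.
    threshold = multiplier * statistics.median(finished_durations)
    return running_time > threshold
```

If you prefer to set the property in code rather than on the spark2-submit command line, SparkSession.builder.config("spark.speculation", "false") sets the same configuration key.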
Build Definition
I am using the same machine for both the build and release configurations. I have created the build successfully, and with it I can run Coded UI scripts in the Visual Studio Test task; they work fine. My build definition configuration is shown below.
Release Definition
After the build definition succeeded and the test scripts ran, my next step is to run automated tests from test plans in the Test Hub. I have also associated my test scripts with test cases. Please have a look at the image of my release definition, where I have chosen to select tests using the Test run option.
The notification I receive after the automated tests from the test plan fail in the Test Hub is:
Deployment of release Release-11 Rejected in Deploy Test Scripts.
Log
2018-02-21T14:24:20.8978238Z AgentName: EVSRV017-DEVSRV017-4
2018-02-21T14:24:20.8978238Z AgentId: 29
2018-02-21T14:24:20.9038250Z ServiceUrl: https://mytfsserver/tfs/DefaultCollection/
2018-02-21T14:24:20.9038250Z TestPlatformVersion:
2018-02-21T14:24:20.9038250Z EnvironmentUri: dta://env/Calculator/_apis/release/16/20/1
2018-02-21T14:24:20.9038250Z QueryForTaskIntervalInMilliseconds: 3000
2018-02-21T14:24:20.9038250Z MaxQueryForTaskIntervalInMilliseconds: 10000
2018-02-21T14:24:20.9048252Z QueueNotFoundDelayTimeInMilliseconds: 3000
2018-02-21T14:24:20.9058254Z MaxQueueNotFoundDelayTimeInMilliseconds: 50000
2018-02-21T14:24:20.9058254Z RetryCountWhileConnectingToTfs: 3
2018-02-21T14:24:20.9058254Z ===========================================
2018-02-21T14:24:21.3909224Z Initializing the Test Execution Engine
Warning
2018-02-21T14:25:02.1240674Z ##[warning]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
2018-02-21T14:25:02.1240674Z at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
2018-02-21T14:25:02.1240674Z at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
2018-02-21T14:25:02.1240674Z --- End of inner exception stack trace ---
ERROR:
2018-02-22T10:10:42.0007605Z ##[error]System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
I added the debug variable, and it seems that the failure happens while creating the test settings file, with what looks like a System.IO exception.
Enabled DEBUG LOG
2018-02-22T20:17:53.8151287Z Initializing the Test Execution Engine
2018-02-22T20:17:53.8161287Z ##[debug]Creating test settings. test settings name : 44de4d5b-f134-4ba2-b0de-ebd8d30b4d22
2018-02-22T20:18:35.3911287Z ##[warning]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
2018-02-22T20:18:35.3931287Z ##[debug]Processed: ##vso[task.logissue type=warning;]Failure attempting to call the restapis. Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
TFS Agent Log
[INFO VstsAgentWebProxy] No proxy setting found.
[INFO ConfigurationStore] IsServiceConfigured: False
[INFO ConfigurationManager] Is service configured: False
Worker Log
[2018-02-23 19:32:17Z INFO VstsAgentWebProxy] No proxy setting found.
[2018-02-23 19:32:52Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:32:52Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:32:53Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:33:34Z INFO JobServerQueue] Catch exception during update timeline records, try to update these timeline records next time.
[2018-02-23 19:33:34Z INFO ProcessInvoker] Finished process with exit code 0, and elapsed time 00:00:49.0055812.
[2018-02-23 19:33:34Z INFO StepsRunner] Step result: Failed
[2018-02-23 19:33:34Z INFO StepsRunner] Update job result with current step result 'Failed'.
[2018-02-23 19:33:34Z INFO StepsRunner] Current state: job state = 'Failed'
[2018-02-23 19:33:34Z INFO JobRunner] Job result after all job steps finish: Failed
[2018-02-23 19:33:34Z INFO JobRunner] Run all post-job steps.
[2018-02-23 19:33:34Z INFO JobRunner] Job result after all post-job steps finish: Failed
[2018-02-23 19:33:34Z INFO JobRunner] Completing the job execution context.
[2018-02-23 19:33:34Z INFO JobServerQueue] Try to append 2 batches web console lines, success rate: 2/2.
[2018-02-23 19:33:34Z INFO JobRunner] Shutting down the job server queue.
[2018-02-23 19:33:34Z ERR JobServerQueue] Microsoft.VisualStudio.Services.Common.VssServiceException: String or binary data would be truncated.
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.HandleResponse(HttpResponseMessage response)
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__48.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__45`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__27`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.<SendAsync>d__26`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.VisualStudio.Services.Agent.JobServerQueue.<ProcessTimelinesUpdateQueueAsync>d__32.MoveNext()
[2018-02-23 19:33:34Z INFO JobServerQueue] Fire signal to shutdown all queues.
[2018-02-23 19:33:35Z INFO JobServerQueue] All queue process task stopped.
[2018-02-23 19:33:35Z INFO JobServerQueue] Try to append 1 batches web console lines, success rate: 1/1.
[2018-02-23 19:33:35Z INFO JobServerQueue] Web console line queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] Try to upload 2 log files or attachments, success rate: 2/2.
[2018-02-23 19:33:35Z INFO JobServerQueue] File upload queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] Timeline update queue drained.
[2018-02-23 19:33:35Z INFO JobServerQueue] All queue process tasks have been stopped, and all queues are drained.
[2018-02-23 19:33:35Z INFO JobRunner] Raising job completed event.
[2018-02-23 19:33:35Z INFO Worker] Job completed.
I would be grateful if anyone can identify what I am missing or what I need to do to fix this issue, so that I can run automated tests from test plans in the Test Hub.
Regards
Based on the error message "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host." and your clarification, this is likely related to a known issue on Windows Server 2008 R2. Please refer to the article below for details:
Team Foundation Server: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
However, the bug has been fixed by the Windows team and they have released a QFE for it. You can find the QFE here. You will need to install it on all of your ATs.
So, try installing the hotfix, restart the computer after applying it, and then try again.
You can also try the initial workaround listed in the blog:
Open the IIS Manager
In the Connections pane, make sure the name of your AT is selected.
In the middle pane (titled “Home”), make sure you are in the “Features View” (bottom) and scroll down to the Management section.
Double-click the “Configuration Editor” icon.
The middle pane should now have the title “Configuration Editor”.
In the Section pull down near the top, expand the system.applicationHost and select “webLimits”.
You should now see a bunch of property value pairs, one of which is named “minBytesPerSecond”. Its value is most likely 240. You will want to lower this value for the workaround.
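Why lowering minBytesPerSecond helps: IIS closes a connection whose transfer rate drops below this minimum, which the client sees as "forcibly closed by the remote host". The function below is a hypothetical, simplified model of that check; it assumes the rate is measured as total bytes over elapsed time, and that a value of 0 disables the check (our understanding of the IIS setting, not verified here).

```python
def connection_dropped(bytes_sent, elapsed_seconds, min_bytes_per_second=240):
    """Simplified model of the IIS minBytesPerSecond throughput check."""
    if min_bytes_per_second == 0 or elapsed_seconds <= 0:
        # Assumption: 0 disables the check; no elapsed time means nothing to measure yet.
        return False
    return bytes_sent / elapsed_seconds < min_bytes_per_second
```

A slow test-agent connection sending 1,000 bytes over 10 seconds (100 bytes/s) would be dropped at the default of 240 but kept at a lowered value such as 50.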
Another possibility is that the proxy server is causing this; try bypassing the proxy server and check again.
Is there an easy way to see the elapsed time of a Sidekiq job?
Check the log output; the "done" line shows the elapsed time:
ScheduledWorker JID-072f6c37f240a92ca3c07914 INFO: start
ScheduledWorker JID-072f6c37f240a92ca3c07914 INFO: done: 0.003 sec
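If you need these durations in bulk, a small parser can pull them out of the log. This is a hypothetical helper that assumes the exact line format shown above; real Sidekiq log lines often carry a timestamp and PID prefix, so the regular expression may need adjusting.

```python
import re

# Matches lines like: ScheduledWorker JID-072f6c37f240a92ca3c07914 INFO: done: 0.003 sec
DONE_RE = re.compile(r"^(\S+) JID-(\w+) INFO: done: ([\d.]+) sec$")


def job_durations(log_lines):
    """Return (worker, jid, seconds) for every 'done' line in the log excerpt."""
    results = []
    for line in log_lines:
        match = DONE_RE.match(line.strip())
        if match:
            worker, jid, seconds = match.groups()
            results.append((worker, jid, float(seconds)))
    return results
```

The "start" line is skipped because only the "done" line carries the elapsed time.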