Slave lost error in pyspark - memory

I'm using Spark1.6
I'm running a simple df.show(2) method and got errors like
An error occurred while calling o143.showString.
: org.apache.spark.SparkException: Job aborted due to stage
failure: Task 6 in stage 6.0 failed 4 times, most recent failure:
Lost task 6.3 in stage 6.0
ExecutorLostFailure (executor 2 exited caused by one of the
running tasks) Reason: Slave lost
When I did persist, through spark UI I saw the shuffleWrite memory is very high and took a long time and still returned errors.
Through some search, I found these might be the out of memory problem.
Following this link out of memory error Java
I did a repartition up to 1000, still not so helpful.
I set up the SparkConf as
conf = (SparkConf().set("spark.driver.maxResultSize", "150g").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
My server side memory could be up to 200GB
Do yo have any good idea to do this or point me to related links. Pyspark will be most helpful
Here is the error log from YARN:
Application application_1477088172315_0118 failed 2 times due to
AM Container for appattempt_1477088172315_0118_000006 exited
with exitCode: 10
For more detailed output, check application tracking page: Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1477088172315_0118_06_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
Here is the error info from notebook:
Py4JJavaError: An error occurred while calling o71.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (): ExecutorLostFailure (executor 26 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Thank you

Related

Dataflow read GCS error

Does the following exception look familiar to anyone? The exactly same pipeline and data worked last week but got failed couple of times for the same exception today. I didn't see any footprint of my code from the stack trace. Wondering what it might be related to... for example, GCS read quota?
And, since it went fine on my direct runner, for these type of Dataflow exceptions, how can I debug on Dataflow?
{
insertId: "7289985381136617647:828219:0:906922"
jsonPayload: {
exception: "java.io.IOException: Failed to advance reader of source: gs://fiona_dataflow/tmp/BigQueryExtractTemp/5c813875537d4c1a89b74a800bb37c50/000000000864.avro range [0, 808559590)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:605)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:398)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:193)
at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at com.geotab.bigdata.streaming.mapserver.backfill.MapServerBatchBeamApplication.lambda$main$fd9fc9ef$1(MapServerBatchBeamApplication.java:82)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:211)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:205)
at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:473)
at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:267)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:602)
... 14 more
"
job: "2018-04-23_07_30_32-17662367668739576363"
logger: "com.google.cloud.dataflow.worker.WorkItemStatusClient"
message: "Uncaught exception occurred during work unit execution. This will be retried."
stage: "s19"
thread: "27"
work: "1213589185295287945"
worker: "mapserverbatchbeamapplica-04230730-s20x-harness-713d"

Dataflow Batch Job fails with "Failed to close some writers"

I am running a batch pipeline with the Apache Beam 2.2 SDK via the Cloud Dataflow service. There are 751 text files that I parse using TextIO.readAll() transform, deserialize and write to a date partitioned table in BigQuery.
First thing I noticed is that autoscaling was not really kicking in and left the pipeline at 15 workers, even though I was able to push throughput a lot higher when for example manually setting the number of workers to 250.
My pipeline fails with the following stack trace:
(abed94a6f5139e21): java.io.IOException: Failed to close some writers
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:248)
Suppressed: java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowWriter.close(TableRowWriter.java:81)
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:242)
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeFinishBundle(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
at com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:407)
at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:60)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
at com.google.cloud.dataflow.worker.DataflowWorker.executeWork(DataflowWorker.java:330)
at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:302)
at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:251)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
... 4 more
Should I try with even more workers or split the work across several pipelines?
Thanks to the comment by jkff it worked flawlessly - after setting --maxNumWorkers=250 (15 seems to be the standard maximum).
The error was a transient error that Dataflow would retry several times and in the end, the pipeline ran successfully.

SonarQube RTC jazz plugin failing

I am working in a .NET environment. I am currently using Jenkins to kick off my automated builds, SCM is Rational Team Concert. I am using SonarQube with the MSBuild Scanner, and the "SonarQube Jazz RTC SCM Plugin" to retrieve blame info (annotate in Rational). Most of the time this works perfectly. Sporadically I was seeing failures with no specific details of why the call was failing. If I resubmited the same jenkins job the run would succeed and I would get my scans through, including all my blame info. Recently in one of my jobs the failures started happening every time, and on the exact same file every time. This is the stack that I get out of the jenkins console log.
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
INFO: Total time: 3:16.401s
ERROR: Error during SonarQube Scanner execution
INFO: Final Memory: 47M/138M
INFO: ------------------------------------------------------------------------
java.lang.IllegalStateException: The jazz annotate command [cmd /C call lscm annotate -u _serviceAccount -P somePassword Areas/SomeArea/Scripts/SomeFile.js] failed:
at org.sonar.plugins.scm.jazzrtc.JazzRtcBlameCommand.blame(JazzRtcBlameCommand.java:74)
at org.sonar.plugins.scm.jazzrtc.JazzRtcBlameCommand.blame(JazzRtcBlameCommand.java:58)
at org.sonar.scanner.scm.ScmSensor.execute(ScmSensor.java:86)
at org.sonar.scanner.sensor.SensorWrapper.analyse(SensorWrapper.java:53)
at org.sonar.scanner.phases.SensorsExecutor.executeSensor(SensorsExecutor.java:57)
at org.sonar.scanner.phases.SensorsExecutor.execute(SensorsExecutor.java:49)
at org.sonar.scanner.phases.AbstractPhaseExecutor.execute(AbstractPhaseExecutor.java:78)
at org.sonar.scanner.scan.ModuleScanContainer.doAfterStart(ModuleScanContainer.java:184)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:142)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:127)
at org.sonar.scanner.scan.ProjectScanContainer.scan(ProjectScanContainer.java:241)
at org.sonar.scanner.scan.ProjectScanContainer.scanRecursively(ProjectScanContainer.java:236)
at org.sonar.scanner.scan.ProjectScanContainer.scanRecursively(ProjectScanContainer.java:234)
at org.sonar.scanner.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:226)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:142)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:127)
at org.sonar.scanner.task.ScanTask.execute(ScanTask.java:47)
at org.sonar.scanner.task.TaskContainer.doAfterStart(TaskContainer.java:86)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:142)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:127)
at org.sonar.scanner.bootstrap.GlobalContainer.executeTask(GlobalContainer.java:115)
at org.sonar.batch.bootstrapper.Batch.executeTask(Batch.java:118)
at org.sonarsource.scanner.api.internal.batch.BatchIsolatedLauncher.execute(BatchIsolatedLauncher.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.sonarsource.scanner.api.internal.IsolatedLauncherProxy.invoke(IsolatedLauncherProxy.java:60)
at com.sun.proxy.$Proxy0.execute(Unknown Source)
at org.sonarsource.scanner.api.EmbeddedScanner.doExecute(EmbeddedScanner.java:233)
at org.sonarsource.scanner.api.EmbeddedScanner.runAnalysis(EmbeddedScanner.java:151)
at org.sonarsource.scanner.cli.Main.runAnalysis(Main.java:110)
at org.sonarsource.scanner.cli.Main.execute(Main.java:74)
at org.sonarsource.scanner.cli.Main.main(Main.java:61)
ERROR:
ERROR: Re-run SonarQube Scanner using the -X switch to enable full debug logging.
The SonarQube Scanner did not complete successfully
I have set the -X switch like its asking for and I do not get any more detailed info. I have also tried setting the trace setting in the sonar.log.level to trace in the SonarQube.Analysis.xml file and I can not seem to get any more details about my failure. There is nothing special that I can see about the file that it is failing on. Its JavaScript, its ~1700 lines long. It was getting the annotate/blame calls successfully up until a couple days ago.
Anyone have this issue? Anyone know what I can do to get more details about my failure? Any help is appreciated.
Version info: SonarQube 6.0 SonarQube Jazz Plugin 1.1 Rational
6.0.3 iFix005
Update 1: 8/29
After 11 failed runs, I finally got it to go through without any errors. This is how it looked.
- Run1 failed on FileA
- Run2 failed on FileA
- Run3 failed on FileA
- Run4 failed on FileA
- Run5 failed on FileA
- Run6 failed on FileA
- Run7 failed on FileA
- Run8 aborted
- Run9 failed on FileB
- Run10 failed on FileB
- Run11 failed on FileB
- Run12 successful on all files.
This problem is not gone, but just back to random failures.

Sonar + Jenkins job execution error

It's been a week since I tried to deploy this solution locally on my machine and then deploy it to a production server.
However, I faced many difficulties and incomprehension, that is what I did:
Sonar installation via apt-get and launch of it on port 9000 of localhost.
Installation of Jenkins via apt-get and launch of it on port 8080 of localhost.
Download the Sonar plugin for Jenkins.
After trying to running a job, it failed because :
ERROR: Error during SonarQube Scanner execution org.sonarqube.ws.client.HttpException:
The full error stack is:
ERROR: Error during SonarQube Scanner execution
org.sonarqube.ws.client.HttpException: Error 500 on http:/localhost:9000/api/ce/submit?projectKey=sonar.org:projectname&projectName=devops : {"errors":[{"msg":"An error has occurred. Please contact your administrator"}]}
at org.sonarqube.ws.client.BaseResponse.failIfNotSuccessful(BaseResponse.java:36)
at org.sonar.scanner.bootstrap.ScannerWsClient.failIfUnauthorized(ScannerWsClient.java:106)
at org.sonar.scanner.bootstrap.ScannerWsClient.call(ScannerWsClient.java:75)
at org.sonar.scanner.report.ReportPublisher.upload(ReportPublisher.java:177)
at org.sonar.scanner.report.ReportPublisher.execute(ReportPublisher.java:131)
at org.sonar.scanner.phases.PublishPhaseExecutor.publishReportJob(PublishPhaseExecutor.java:72)
at org.sonar.scanner.phases.PublishPhaseExecutor.executeOnRoot(PublishPhaseExecutor.java:54)
at org.sonar.scanner.phases.AbstractPhaseExecutor.execute(AbstractPhaseExecutor.java:83)
at org.sonar.scanner.scan.ModuleScanContainer.doAfterStart(ModuleScanContainer.java:175)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:143)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:128)
at org.sonar.scanner.scan.ProjectScanContainer.scan(ProjectScanContainer.java:262)
at org.sonar.scanner.scan.ProjectScanContainer.scanRecursively(ProjectScanContainer.java:257)
at org.sonar.scanner.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:247)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:143)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:128)
at org.sonar.scanner.task.ScanTask.execute(ScanTask.java:47)
at org.sonar.scanner.task.TaskContainer.doAfterStart(TaskContainer.java:86)
at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:143)
at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:128)
at org.sonar.scanner.bootstrap.GlobalContainer.executeTask(GlobalContainer.java:118)
at org.sonar.batch.bootstrapper.Batch.executeTask(Batch.java:117)
at org.sonarsource.scanner.api.internal.batch.BatchIsolatedLauncher.execute(BatchIsolatedLauncher.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.sonarsource.scanner.api.internal.IsolatedLauncherProxy.invoke(IsolatedLauncherProxy.java:60)
at com.sun.proxy.$Proxy0.execute(Unknown Source)
at org.sonarsource.scanner.api.EmbeddedScanner.doExecute(EmbeddedScanner.java:233)
at org.sonarsource.scanner.api.EmbeddedScanner.runAnalysis(EmbeddedScanner.java:151)
at org.sonarsource.scanner.cli.Main.runAnalysis(Main.java:123)
at org.sonarsource.scanner.cli.Main.execute(Main.java:77)
at org.sonarsource.scanner.cli.Main.main(Main.java:61)
http://localhost:9000/api/ce/submit?projectKey=sonar.org:projectname&projectName=devops
: {"errors":[{"msg":"An error has occurred. Please contact your administrator"}]}
When I searched on the SonarQube logs I saw that :
2017.05.02 23:35:25 ERROR web[AVvKZ3JA+DB4lMWaAACV][o.s.s.w.WebServiceEngine] Fail to process
request http:/localhost:9000/api/qualityprofiles/restore
java.lang.IllegalStateException: Can't read file part at
org.sonar.server.ws.ServletRequest.readPart(ServletRequest.java:102)
...
Caused by: java.io.IOException: The temporary upload location [/root/Documents/sonarqube-6.3.1/temp/tc/work/Tomcat/localhost/ROOT] is not valid
As seen in "The temporary upload location is not valid", make sure that the folder /root/Documents/sonarqube-6.3.1/temp/tc/work/Tomcat/localhost/ROOT
does exist
is 775 by the user running Sonar.
Or (as seen here) you can define the temporary folder to another location
JVM_OPTIONS="-Xrs -Xms256m -Xmx512m -Djava.io.tmpdir=/opt/another/tmp"
See this thread and this thread as examples.

Restarting a failed/stalled stream during bootstrap of new node

We are trying to add a new Solr node to our cluster:
DC Cassandra
Cassandra node 1
DC Solr
Solr node 1 <-- new node (actually, a replacement for an old node)
Solr node 2
Solr node 3
Solr node 4
Solr node 5
During the bootstrap process:
The stream from node 3 to node 1 failed with an exception:
ERROR [STREAM-OUT-/IP_OF_NODE1] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/IP_OF_NODE1,5,main]
java.lang.NullPointerException
at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375)
at java.lang.Thread.run(Thread.java:744)
The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is:
Received streaming plan for Bootstrap.
It should have been followed by:
Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)
It seems that the bootstrap process is now stalled because the data file sizes are not changing anymore. How can I force those streams to be retried?
EDIT:
I restarted all nodes today in an attempt to force new node to retry the bootstrap process. Unfortunately, it encountered some stream failures again. This time, the exception in node 1 is as follows:
WARN [STREAM-IN-/IP_OF_NODE3] 2014-04-06 20:48:17,963 StreamSession.java (line 532) [Stream #c84effb0-bda9-11e3-a07d-89325af2f6bf] Retrying for following error
java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-tmp-jb-1209-Data.db (Too many open files)
at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:75)
at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:71)
at org.apache.cassandra.io.compress.CompressedSequentialWriter.open(CompressedSequentialWriter.java:42)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:107)
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:60)
at org.apache.cassandra.streaming.StreamReader.createWriter(StreamReader.java:111)
at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:65)
at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:47)
at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:37)
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:283)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-tmp-jb-1209-Data.db (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:71)
ERROR [STREAM-IN-/78.46.63.218] 2014-04-06 20:48:17,964 StreamSession.java (line 418) [Stream #c84effb0-bda9-11e3-a07d-89325af2f6bf] Streaming error occurred
java.lang.IllegalArgumentException: Unknown type 0
at org.apache.cassandra.streaming.messages.StreamMessage$Type.get(StreamMessage.java:89)
at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:283)
at java.lang.Thread.run(Thread.java:724)
There are tons of similar errors in the log. e.g.:
ERROR [CompactionExecutor:129] 2014-04-06 20:50:06,401 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:129,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:154)
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:137)
at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:400)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:833)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:47)
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1195)
at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:57)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1550)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:77)
at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:84)
at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:33)
at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:148)
... 10 more
Caused by: java.io.FileNotFoundException: /home/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-51-Data.db (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43)
... 28 more
This appears to be very similar to a Cassandra bug/issue:
https://issues.apache.org/jira/browse/CASSANDRA-6965
I'll follow up on that.
Meanwhile, you could run rebuild/repair on that new node.
EDIT: Another Cassandra issue that appears to be related:
CASSANDRA-6984 - "NullPointerException in Streaming During Repair"
https://issues.apache.org/jira/browse/CASSANDRA-6984
That issue is labeled as a Blocker, so it should get some prompt attention. I've inquired as to whether there is a workaround.
Stay tuned.
(Too many open files)
Looks like you need to increase your ulimit.

Resources