Py4JJavaError: An error occurred while calling o767.fit

I tried to fit a random forest classifier in PySpark but I'm getting this error:
Py4JJavaError: An error occurred while calling o767.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 30.0 failed 1 times, most recent failure: Lost task 0.0 in stage 30.0 (TID 853, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
Can anyone help me please?
My code:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

rf = RandomForestClassifier(labelCol="label", featuresCol="features")

paramGrid = (ParamGridBuilder()
             .addGrid(rf.numTrees, [100])
             .build())

crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=10)

cvModel = crossval.fit(trainingData)
predictions = cvModel.transform(testData)  # transform with the fitted model, not the CrossValidator
predictions.printSchema()
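Since the failed task ran on "executor driver", the heap that overflowed is the driver JVM's own heap (local mode runs tasks inside the driver). A common first step, sketched below under the assumption that you create the SparkSession yourself and can restart it (the app name and memory sizes are illustrative, not taken from the question), is to give the driver more heap and/or reduce the work per fit (fewer folds, smaller numTrees, lower maxDepth):

from pyspark.sql import SparkSession

# Hypothetical session setup: the sizes are examples, adjust them to the machine's RAM.
spark = (SparkSession.builder
         .appName("rf-crossval")                  # illustrative name
         .config("spark.driver.memory", "8g")     # heap used by local-mode tasks
         .config("spark.driver.maxResultSize", "4g")
         .getOrCreate())

Note that spark.driver.memory is only picked up when the JVM is launched, so in a notebook it typically has to be set in spark-defaults.conf or via PYSPARK_SUBMIT_ARGS before the first SparkSession is created.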

Related

FATAL ERROR: v8::ArrayBuffer::NewBackingStore Allocation failed - process out of memory

While running the application, it first compiles successfully, then recompiles and gives a fatal error.
I have tried increasing the memory with "node --max_old_space_size=8048", but the process still runs out of memory.
ERROR:
√ Compiled successfully.
<--- Last few GCs --->
[9120:043932C8] 79996 ms: Mark-sweep (reduce) 905.6 (986.8) -> 905.6 (975.6) MB, 831.5 / 0.1 ms (average mu = 0.147, current mu = 0.000) external memory pressure GC in old space requested
[9120:043932C8] 80890 ms: Mark-sweep (reduce) 905.6 (975.6) -> 905.6 (961.1) MB, 893.1 / 0.1 ms (average mu = 0.074, current mu = 0.000) external memory pressure GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: v8::ArrayBuffer::NewBackingStore Allocation failed - process out of memory

SynapseML OpenCV errors

I am trying to run the SynapseML OpenCV transformer for preprocessing and I get this error. It's a simple use case of reading images and transforming them.
from synapse.ml.opencv import *

image_transformer = (ImageTransformer()
                     .setOutputCol("transformed")
                     .resize(1024, True))

images_transformed = image_transformer.transform(images_df)
images_transformed.show()
Error
Py4JJavaError: An error occurred while calling o666.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in stage 3.0 (TID 8) (vm-94657109 executor 2): org.apache.spark.SparkException: Failed to execute user defined function(functions$$$Lambda$4740/1915249168: (struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>) => struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:383)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: CvException [org.opencv.core.CvException: cv::Exception: /home/jason/opencv/opencv/opencv-3.2.0/modules/core/src/matrix.cpp:307: error: (-215) s >= 0 in function setSize
]
at org.opencv.core.Mat.n_Mat(Native Method)
at org.opencv.core.Mat.<init>(Mat.java:37)
at com.microsoft.azure.synapse.ml.opencv.ImageTransformer$.row2mat(ImageTransformer.scala:294)
at com.microsoft.azure.synapse.ml.opencv.ImageTransformer$.$anonfun$decodeImage$4(ImageTransformer.scala:326)
at scala.Option.map(Option.scala:230)
at com.microsoft.azure.synapse.ml.opencv.ImageTransformer$.decodeImage(ImageTransformer.scala:326)
at com.microsoft.azure.synapse.ml.opencv.ImageTransformer.$anonfun$transform$9(ImageTransformer.scala:632)
at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:4854)
... 19 more
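The "(-215) s >= 0 in function setSize" assertion means OpenCV was asked to build a Mat with a non-positive size, which usually indicates rows whose image bytes could not be decoded. One hedged workaround, assuming the DataFrame comes from Spark's built-in image data source and the image column is named "image" (the path below is a placeholder), is to drop undecodable rows before handing them to the ImageTransformer:

# Skip files the image reader cannot decode instead of passing them to OpenCV.
images_df = (spark.read.format("image")
             .option("dropInvalid", True)
             .load("/path/to/images"))          # placeholder path

# Extra guard: keep only rows whose decoded dimensions look sane.
images_df = images_df.filter("image.height > 0 AND image.width > 0")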

Why is "java.nio.channels.ClosedByInterruptExceptio" called when caling multiple groupBy with pyspark?

I am running a PySpark job (Python 3.5, Spark 2.1, Java 8) in yarn-client mode from an edge node with spark2-submit. The job succeeds, and the resulting DataFrame is written to HDFS and seems correct (we haven't found any errors in the data yet).
The issue is that I see a lot (around 6,000) of ERROR messages, and I would like to understand what is wrong and whether this impacts the final DataFrame.
All the ERROR messages look like this one:
18/06/01 14:08:36 INFO codegen.CodeGenerator: Code generated in 45.712788 ms
18/06/01 14:08:37 INFO executor.Executor: Finished task 33.0 in stage 34.0 (TID 2312). 4600 bytes result sent to driver
18/06/01 14:08:37 INFO executor.Executor: Finished task 117.0 in stage 34.0 (TID 2316). 3801 bytes result sent to driver
18/06/01 14:08:40 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 2512
18/06/01 14:08:40 INFO executor.Executor: Running task 190.1 in stage 34.0 (TID 2512)
18/06/01 14:08:40 INFO storage.ShuffleBlockFetcherIterator: Getting 28 non-empty blocks out of 193 blocks
18/06/01 14:08:40 INFO storage.ShuffleBlockFetcherIterator: Started 5 remote fetches in 1 ms
18/06/01 14:08:40 INFO executor.Executor: Executor is trying to kill task 190.1 in stage 34.0 (TID 2512)
18/06/01 14:08:40 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /...../yarn/nm/usercache/../appcache/application_xxxx/blockmgr-xxxx/temp_shuffle_xxxxx
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:212)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:238)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The ERRORs start after quite a lot of feature engineering (select, groupBy, ...) and I see them when adding these lines:
df = (df.groupby('x', 'y')
        .agg(func.sum('x').alias('x_sum'))
        .groupby('y')
        .agg(func.mean('y').alias('py_sum_avg')))
So I guess the data shuffle is triggered by the groupBy.
I first thought it was a memory issue, so I added much more memory and memory overhead for the driver and executors, without real success (this is what you can find in some other threads). In the code I have other groupBy calls, and it seems to be causing some issue at this stage.
I also saw that it could be related to too many open files or to the disk being full, but the ERROR messages are a bit different in those two cases.
I am quite new to PySpark, so I am looking for advice on how to debug such an issue.
How can I find the reason why java.nio.channels.ClosedByInterruptException is thrown? I guess this is what triggers the ERROR in storage.DiskBlockObjectWriter. Is that correct? Is it triggered by "Executor: Executor is trying to kill task 190"? If having some tasks killed is a standard process, why does it trigger ERRORs? Can I get some hints by looking at the Spark UI (I see that some tasks were killed)? Can I get more info from the traceback?
How can I fix these issues? Any suggestion on how to proceed to debug such things? I am not sure where to look (memory, an issue in the PySpark code, an issue with the cluster setup, or with my Spark parameters).
I am working on a Hadoop Data Lake with Cloudera CDH 5.8.
There is an issue with using spark.speculation in Spark 2.1, which I am using.
The related upstream bug is SPARK-19293. The exception stack trace in my situation is slightly different from the one in SPARK-19293. Adding
--conf spark.speculation=false
made the ERRORs go away in my tests.
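For jobs that are not launched through spark2-submit, the same setting can be applied when building the session; a minimal sketch (the app name is illustrative):

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (SparkConf()
        .setAppName("groupby-job")              # illustrative name
        .set("spark.speculation", "false"))     # disable speculative re-launch of slow tasks

spark = SparkSession.builder.config(conf=conf).getOrCreate()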

Slave lost error in pyspark

I'm using Spark 1.6.
I'm running a simple df.show(2) call and get errors like:
An error occurred while calling o143.showString.
: org.apache.spark.SparkException: Job aborted due to stage
failure: Task 6 in stage 6.0 failed 4 times, most recent failure:
Lost task 6.3 in stage 6.0
ExecutorLostFailure (executor 2 exited caused by one of the
running tasks) Reason: Slave lost
When I persist the DataFrame, the Spark UI shows that the shuffle write is very large and takes a long time, and it still returns errors.
Through some searching, I found this might be an out-of-memory problem.
Following this link: out of memory error Java,
I repartitioned up to 1000 partitions, but it still did not help much.
I set up the SparkConf as:
conf = (SparkConf()
        .set("spark.driver.maxResultSize", "150g")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
My server-side memory can be up to 200 GB.
Do you have any good ideas on how to handle this, or can you point me to related links? PySpark answers would be most helpful.
Here is the error log from YARN:
Application application_1477088172315_0118 failed 2 times due to
AM Container for appattempt_1477088172315_0118_000006 exited
with exitCode: 10
For more detailed output, check application tracking page: Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1477088172315_0118_06_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
Here is the error info from the notebook:
Py4JJavaError: An error occurred while calling o71.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (): ExecutorLostFailure (executor 26 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Thank you
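For what it's worth, "Slave lost" / ExecutorLostFailure usually means the executor's YARN container died, often because it exceeded its memory limit, and spark.driver.maxResultSize does not address that. A hedged sketch of the settings that are usually tuned instead on Spark 1.6 on YARN (the sizes are placeholders to adapt to the cluster):

conf = (SparkConf()
        .set("spark.executor.memory", "16g")                 # JVM heap per executor
        .set("spark.yarn.executor.memoryOverhead", "4096")   # extra MB YARN grants on top of the heap
        .set("spark.executor.cores", "4")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))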

Failed to install ZAP_2.4.2_Core.tar.gz

It's my first post here, so please excuse any mistakes.
I am trying to install the ZAProxy plugin in Jenkins.
I installed the Custom Tools plugin and entered the values according to https://wiki.jenkins-ci.org/display/JENKINS/ZAProxy+Plugin.
After that I execute ZAProxy in my Jenkins job:
After performing the build I see the following error:
Perform ZAProxy
Unpacking https://github.com/zaproxy/zaproxy/wiki/Downloads/ZAP_2.4.3_Core.tar.gz to /var/lib/jenkins/tools/com.cloudbees.jenkins.plugins.customtools.CustomTool/ZAProxy_2.4.3 on Jenkins
ERROR: java.io.IOException: Failed to install https://github.com/zaproxy/zaproxy/wiki/Downloads/ZAP_2.4.3_Core.tar.gz to /var/lib/jenkins/tools/com.cloudbees.jenkins.plugins.customtools.CustomTool/ZAProxy_2.4.3
at hudson.FilePath.installIfNecessaryFrom(FilePath.java:832)
at hudson.tools.ZipExtractionInstaller.performInstallation(ZipExtractionInstaller.java:79)
at hudson.tools.InstallerTranslator.getToolHome(InstallerTranslator.java:68)
at hudson.tools.ToolLocationNodeProperty.getToolHome(ToolLocationNodeProperty.java:108)
at hudson.tools.ToolInstallation.translateFor(ToolInstallation.java:206)
at com.cloudbees.jenkins.plugins.customtools.CustomTool.forNode(CustomTool.java:154)
at com.cloudbees.jenkins.plugins.customtools.CustomTool.forNode(CustomTool.java:59)
at fr.novia.zaproxyplugin.ZAProxy.retrieveZapHomeWithToolInstall(ZAProxy.java:486)
at fr.novia.zaproxyplugin.ZAProxy.checkParams(ZAProxy.java:574)
at fr.novia.zaproxyplugin.ZAProxy.startZAP(ZAProxy.java:613)
at fr.novia.zaproxyplugin.ZAProxyBuilder.perform(ZAProxyBuilder.java:159)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.build(MavenModuleSetBuild.java:919)
at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:870)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
at hudson.model.Run.execute(Run.java:1738)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:531)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.IOException: Failed to unpack https://github.com/zaproxy/zaproxy/wiki/Downloads/ZAP_2.4.3_Core.tar.gz (26 bytes read of total -1)
at hudson.FilePath.installIfNecessaryFrom(FilePath.java:826)
... 19 more
Caused by: java.io.IOException: Failed to extract input stream
at hudson.FilePath.readFromTar(FilePath.java:2300)
at hudson.FilePath.access$400(FilePath.java:190)
at hudson.FilePath$10.invoke(FilePath.java:720)
at hudson.FilePath$10.invoke(FilePath.java:718)
at hudson.FilePath.act(FilePath.java:990)
at hudson.FilePath.act(FilePath.java:968)
at hudson.FilePath.untarFrom(FilePath.java:718)
at hudson.FilePath.installIfNecessaryFrom(FilePath.java:824)
... 19 more
Caused by: java.io.IOException: incorrect header check
at com.jcraft.jzlib.InflaterInputStream.read(InflaterInputStream.java:112)
at org.apache.commons.compress.utils.IOUtils.readFully(IOUtils.java:160)
at org.apache.commons.compress.utils.IOUtils.readFully(IOUtils.java:134)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.readRecord(TarArchiveInputStream.java:419)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getRecord(TarArchiveInputStream.java:388)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:269)
at hudson.FilePath.readFromTar(FilePath.java:2278)
... 26 more
Build step 'Execute ZAProxy' marked build as failure
[DependencyCheck] Collecting Dependency-Check analysis files...
[DependencyCheck] Finding all files that match the pattern dependency-check- report.xml
[DependencyCheck] Parsing 1 file in /var/lib/jenkins/jobs/<project name>_continuous/workspace
[DependencyCheck] Successfully parsed file /var/lib/jenkins/jobs/<project name>_continuous/workspace/dependency-check-report.xml of module <company name> HELP with 0 unique warnings and 0 duplicates.
[DependencyCheck] Computing warning deltas based on reference build #75
[DependencyCheck] Ignore new warnings since this is the first valid build
[DependencyCheck] Plug-in Result: Success - no threshold has been exceeded
Finished: FAILURE
I just couldn't find any information about these errors, and I'm sure an answer could help other newcomers who have to use this plugin in the future. Please help me. It's my company's Jenkins, so I have to conceal internal information; I have covered it with red in the screenshots.
Where did you get that URL from? That wiki URL does not serve the actual tarball (note the "26 bytes read of total -1" and the "incorrect header check"), which is why Jenkins fails to unpack it.
The correct one for ZAP 2.4.3 core is: https://github.com/zaproxy/zaproxy/releases/download/2.4.3/ZAP_2.4.3_Core.tar.gz
