Jenkins Out of Memory error while running multiple JMeter scripts

I have a JMeter/Jenkins/Maven/Perforce framework for API testing. Each JMX file contains around 200 test cases, and each test case is written inside its own thread group with a single user. Each test case has preconditions that fetch data from multiple databases, then hits the request, then applies multiple assertions (some using Beanshell). We also use a lot of custom JARs so that we can access the server, read its logs, and edit the properties files there.
If we run the scripts from JMeter directly they run smoothly, but if we run them from Jenkins, say 20 JMX files sequentially, then after some time (it may be 1, 2, or 17 hours) the run fails with an out-of-memory error.
My current Jenkins server config is like this:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        3.1G       12.9G         16M         15G         24G
I have already tweaked the heap settings, e.g. 6/8 and 12/12 (GB).
Log at the time of failure:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:811)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.xerces.xni.XMLString.toString(Unknown Source)
at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.handleCharacter(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
at <unknown class>.<unknown method>(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at utils.APIReportProcessing.fetchAPIReportDetail(APIReportProcessing.java:84)
at jmeterRun.RunProcess.prepareFinalResults(RunProcess.java:179)
at jmeterRun.RunProcess.executeJMeterAndWriteResults(RunProcess.java:158)
at jmeterRun.ControllerJMeter.main(ControllerJMeter.java:115)
... 6 more
Here is the code from the APIReportProcessing class where it fails:
public static void fetchAPIReportDetail(String rawXMLReportFile) {
    File rawXMLReport = null;
    try {
        rawXMLReport = new File(rawXMLReportFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        // The whole XML result file is loaded into a DOM tree here (line 84 in the stack trace)
        Document doc = dBuilder.parse(rawXMLReport);
        doc.getDocumentElement().normalize();
        // ... counting of passCount / totalTestCount from the DOM omitted in this snippet ...
        individualModuleCount.add(passCount + "," + totalTestCount);
    } catch (Exception e) {
        e.printStackTrace();
        Logging.log("info", "Error in fetching up data from XML file. Exception:" + e.getMessage());
    } finally {
        try {
            rawXMLReport.delete();
        } catch (Exception deleteEx) {
            deleteEx.printStackTrace();
            Logging.log("error", "Error in deleting XML data file. Exception:" + deleteEx.getMessage());
        }
    }
}
Thanks,
Bibek

You need to increase the heap not for JMeter but for Jenkins; check out the How to add Java arguments to Jenkins? article for instructions.
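For example, assuming a standard Linux package installation (the exact file depends on how Jenkins was installed), the Jenkins JVM heap can be raised through its startup arguments and a restart:
# Debian/Ubuntu: /etc/default/jenkins
JAVA_ARGS="-Xms2g -Xmx8g"
# RHEL/CentOS: /etc/sysconfig/jenkins
JENKINS_JAVA_OPTIONS="-Xms2g -Xmx8g"
# then restart the service
sudo service jenkins restart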
You might want to switch to XMLSlurper or XMLParser; both are SAX-based, so the memory footprint should be smaller.
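Since the failing code shown above is plain Java rather than Groovy, here is a minimal SAX-based sketch along the same lines; it streams the result file instead of building a DOM tree (the sample/httpSample element names and the "s" success attribute are assumptions about JMeter's XML result format):
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public static int[] countResultsWithSax(String rawXMLReportFile) throws Exception {
    final int[] counts = new int[2]; // counts[0] = passed, counts[1] = total
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new File(rawXMLReportFile), new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Each sample element is visited as it is read; nothing is retained in memory
            if ("sample".equals(qName) || "httpSample".equals(qName)) {
                counts[1]++;
                if ("true".equals(attrs.getValue("s"))) {
                    counts[0]++;
                }
            }
        }
    });
    return counts;
}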
According to JMeter Best Practices you should rather use the CSV output format for JMeter result files; if you need the count of passed requests you could go for the JMeterPluginsCMD Command Line Tool.
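For instance, a results file can be summarized from the command line roughly like this (flags as documented for the jmeter-plugins CMD tool; file names are placeholders):
JMeterPluginsCMD.bat --generate-csv aggregate.csv --input-jtl results.csv --plugin-type AggregateReport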

Sorry for the delayed response.
In my case the issue was resolved by switching to the correct Java version. The previous JVM was an IBM JDK, which is where the issue occurred; I changed it to OpenJDK version "1.8.0_191".
Thanks,
Bibek

Related

Glue job failed with "No space left on device" or "ArrayIndexOutOfBoundsException" when writing a huge data frame

I have a Glue job that:
creates dynamic frames from several data catalogs,
converts them to Spark DataFrames,
joins 4 DataFrames and performs aggregation,
writes to S3 in CSV/Parquet format.
It had no problem with a medium-sized data source (about 20 GB in total) on G.1X DPUs with 20 workers and an execution time of 40 minutes.
But when the data volume increased to 60 GB in total, even with G.2X DPUs and 50 workers, the run time increased to 4-6 hours and the job failed with this error:
21/08/07 16:41:27 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
File "/tmp/test-deal-stepfun.py", line 213, in <module>
df.coalesce(1).write.partitionBy("log_date_syd").mode("overwrite").csv(args['DEST_FOLDER'])
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 927, in csv
self._jwrite.csv(path)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o253.csv.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:664)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task creation failed: java.lang.ArrayIndexOutOfBoundsException: 0
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.spark.scheduler.CompressedMapStatus.getSizeForBlock(MapStatus.scala:119)
at org.apache.spark.MapOutputTrackerMaster$$anonfun$getLocationsWithLargestOutputs$1.apply(MapOutputTracker.scala:612)
at org.apache.spark.MapOutputTrackerMaster$$anonfun$getLocationsWithLargestOutputs$1.apply(MapOutputTracker.scala:599)
at org.apache.spark.ShuffleStatus.withMapStatuses(MapOutputTracker.scala:192)
at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:599)
at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:568)
at org.apache.spark.sql.execution.ShuffledRowRDD.getPreferredLocations(ShuffledRowRDD.scala:152)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
...
BTW, I have job parameters set to optimize memory and disk usage:
"--conf: spark.executor.memory=20g --conf: spark.driver.memory=20g --conf: spark.driver.memoryOverhead=10g --conf: spark.executor.memoryOverhead=10g" to add more memory to the Spark driver and executors.
"--write-shuffle-files-to-s3: true" to redirect intermediate shuffle files to S3 and give the worker nodes more local space.
In the job script, set S3 retries:
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.hadoop.fs.s3.maxRetries", "20").set("spark.hadoop.fs.s3.sleepTimeSeconds", "30")
In the job script, add options when creating the dynamic frames:
"useS3ListImplementation": True,
"groupFiles": "InPartition",
"groupSize": "10485760"
Optimized the Spark job code: dropped unused columns before the joins and applied distinct before the left join.
The errors are related to "no space left on device" or "ArrayIndexOutOfBoundsException" when writing.
The metrics pattern: (screenshot not included)
How can I avoid these failures when writing huge data in a Glue job? Thanks a lot!
I recently encountered this same issue while running an AWS Glue job configured to use S3 for shuffle. In my case the problem was that I had set the spark.shuffle.glue.s3ShuffleBucket configuration incorrectly. Once I fixed my job parameters to --conf spark.shuffle.glue.s3ShuffleBucket=s3://mybucket/mypath, with the key being --conf and the value being spark.shuffle.glue.s3ShuffleBucket=s3://mybucket/mypath, it worked.

Cannot create project when using cloud dataflow plugin for Eclipse

I want to set up the development environment on Windows with Eclipse, but it failed at the final step after I successfully set the project ID and GCS bucket. The errors seem to be network-related (I have set a proxy in Eclipse); I guess Maven also needs the proxy configured, but how? Can someone confirm? Thanks.
"Error encountered when trying to create project"
java.lang.reflect.InvocationTargetException
at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:423)
at org.eclipse.jface.wizard.WizardDialog.run(WizardDialog.java:1059)
at com.google.cloud.dataflow.eclipse.ui.wizard.NewDataflowProjectWizard.performFinish(NewDataflowProjectWizard.java:50)
at org.eclipse.jface.wizard.WizardDialog.finishPressed(WizardDialog.java:853)
at org.eclipse.jface.wizard.WizardDialog.buttonPressed(WizardDialog.java:438)
at org.eclipse.jface.dialogs.Dialog$2.widgetSelected(Dialog.java:619)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:248)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Display.sendEvent(Display.java:4353)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1061)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:4172)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3761)
at org.eclipse.jface.window.Window.runEventLoop(Window.java:832)
at org.eclipse.jface.window.Window.open(Window.java:808)
at org.eclipse.ui.actions.NewProjectAction.run(NewProjectAction.java:117)
at org.eclipse.jface.action.Action.runWithEvent(Action.java:519)
at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:595)
at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:511)
at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:420)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Display.sendEvent(Display.java:4353)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1061)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:4172)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3761)
at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$9.run(PartRenderingEngine.java:1151)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1032)
at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:148)
at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:636)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:579)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150)
at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:135)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:380)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:235)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:648)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:603)
at org.eclipse.equinox.launcher.Main.run(Main.java:1465)
Caused by: java.lang.IllegalStateException: org.eclipse.core.runtime.CoreException: Could not resolve artifact com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples:pom:LATEST
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.run(DataflowProjectCreator.java:207)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:122)
Caused by: org.eclipse.core.runtime.CoreException: Could not resolve artifact com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples:pom:LATEST
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:776)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:1)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:151)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:99)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:743)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:720)
at com.google.cloud.dataflow.eclipse.core.project.DataflowArtifactRetriever.archetypePom(DataflowArtifactRetriever.java:54)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.defaultArchetypeVersions(DataflowProjectCreator.java:250)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.run(DataflowProjectCreator.java:204)
... 1 more
"Couldn't get archetype versions to use as default"
org.eclipse.core.runtime.CoreException: Could not resolve artifact com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples:pom:LATEST
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:776)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:1)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:151)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:99)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:743)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:720)
at com.google.cloud.dataflow.eclipse.core.project.DataflowArtifactRetriever.archetypePom(DataflowArtifactRetriever.java:54)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.defaultArchetypeVersions(DataflowProjectCreator.java:250)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.run(DataflowProjectCreator.java:204)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:122)
Contains: Failed to resolve version for com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples:pom:LATEST: Could not find metadata com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples/maven-metadata.xml in local (C:\Users\ztang16\.m2\repository)
org.eclipse.aether.resolution.VersionResolutionException: Failed to resolve version for com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples:pom:LATEST: Could not find metadata com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples/maven-metadata.xml in local (C:\Users\ztang16\.m2\repository)
at org.apache.maven.repository.internal.DefaultVersionResolver.resolveVersion(DefaultVersionResolver.java:313)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:302)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:246)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:223)
at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveArtifact(DefaultRepositorySystem.java:294)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:753)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$5.call(MavenImpl.java:1)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:151)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:99)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:743)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.resolve(MavenImpl.java:720)
at com.google.cloud.dataflow.eclipse.core.project.DataflowArtifactRetriever.archetypePom(DataflowArtifactRetriever.java:54)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.defaultArchetypeVersions(DataflowProjectCreator.java:250)
at com.google.cloud.dataflow.eclipse.core.project.DataflowProjectCreator.run(DataflowProjectCreator.java:204)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:122)
Caused by: org.eclipse.aether.transfer.MetadataNotFoundException: Could not find metadata com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples/maven-metadata.xml in local (C:\Users\ztang16\.m2\repository)
at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolve(DefaultMetadataResolver.java:247)
at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolveMetadata(DefaultMetadataResolver.java:205)
at org.apache.maven.repository.internal.DefaultVersionResolver.resolveVersion(DefaultVersionResolver.java:250)
... 15 more
You seem to be hitting an issue when resolving the latest version of our Maven archetype for the examples project. The most inner exception seems to be:
org.eclipse.aether.transfer.MetadataNotFoundException:
Could not find metadata com.google.cloud.dataflow:google-cloud-dataflow-java-archetypes-examples/maven-metadata.xml
in local (C:\Users\YOUR-USERNAME\.m2\repository)
This indicates a problem in your local repository.
We have not seen this specific error before, but tangentially related problems have been resolved by manually deleting the following directory on your local machine and retrying.
C:\Users\YOUR-USERNAME\.m2\repository\com\google\cloud\dataflow\google-cloud-dataflow-java-archetypes-examples
This would force Maven to reinitialize that part of the local repository, and perhaps recreate the missing metadata file.
That said, we have slightly changed how our Eclipse plugin interacts with the m2e plugin. The new code doesn't take this path through m2e at all, so it is quite likely that this would be a non-issue with the updated code. We tentatively plan to release a new version next week, which will contain this change.
Now, there could also be an issue with an HTTP proxy configuration in your environment. I found this StackOverflow question, which explains how to configure advanced options of the M2Eclipse plugin, including the proxy settings. The Dataflow Eclipse plugin is built on top of M2Eclipse, and I would expect those settings to apply automatically.
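For reference, a minimal sketch of a proxy entry in Maven's settings.xml (host, port, and location are placeholders; M2Eclipse can be pointed at this file under Window > Preferences > Maven > User Settings):
<!-- %USERPROFILE%\.m2\settings.xml -->
<settings>
  <proxies>
    <proxy>
      <id>corporate-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.com</host>
      <port>8080</port>
      <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
    </proxy>
  </proxies>
</settings>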

SeqFilesFromDirectory() error on amazon EMR

I am trying to run a simple program on Amazon EMR which converts text files in a directory into sequence files. The program runs fine on my local machine but gives me the following error on Amazon EMR. Could someone please tell me how to get rid of this error?
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.text.SequenceFilesFromDirectory;

Configuration conf = new Configuration();
System.out.println("fs.default.name : - " + conf.get("fs.default.name"));
Path input = new Path(URI.create(args[0]));
Path output = new Path(URI.create(args[1]));
ToolRunner.run(new SequenceFilesFromDirectory(), new String[]{
        "--input", input.toString(),
        "--output", output.toString(),
        "--overwrite",
        "--method", "mapreduce"});
Thank you.
Exception in thread "main" java.lang.IllegalArgumentException: This file system object (hdfs://172.31.4.175:9000) does not support access to the request path ..
You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:384)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.gifts.text.SeqFileDirectory.main(SeqFileDirectory.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
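For context, the hint in the exception message is about the difference between these two lookups; a minimal sketch (not from the original post, paths are placeholders):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// Uses the cluster's default filesystem (hdfs://... on EMR), regardless of the path's scheme
FileSystem defaultFs = FileSystem.get(conf);
// Resolves the filesystem that actually owns the path, e.g. an s3:// input directory
FileSystem inputFs = FileSystem.get(URI.create(args[0]), conf);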

Galago 3.5 Indexing

I downloaded the Galago 3.5 binary version and tried to index wiki-small.corpus following this guide. Strangely, I get a FileNotFoundException for the .index file when trying to run the build-index command. This error goes away when I explicitly pass inputPath and indexPath, but now I get this exception instead:
Created executor: org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor#69107c05
Running without server!
Use --server=true to enable web-based status page.
Stage inputSplit completed with 0 errors.
Mar 14, 2014 3:26:01 PM org.lemurproject.galago.core.parse.UniversalParser process
INFO: Processing split: /Users/nanz/Downloads/wiki-small.corpus
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:137)
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:52)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$TupleUnshredder.processTuple(DocumentSplit.java:2033)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$DuplicateEliminator.processTuple(DocumentSplit.java:1989)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyTuples(DocumentSplit.java:1705)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntilFileId(DocumentSplit.java:1732)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntil(DocumentSplit.java:1740)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedReader.run(DocumentSplit.java:1940)
at org.lemurproject.galago.tupleflow.FileOrderedReader.run(FileOrderedReader.java:76)
at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:96)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.lemurproject.galago.core.parse.UniversalParser.constructParserWithSplit(UniversalParser.java:213)
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:132)
... 10 more
Caused by: java.lang.NullPointerException
at org.lemurproject.galago.core.index.KeyValueReader.getManifest(KeyValueReader.java:35)
at org.lemurproject.galago.core.index.corpus.CorpusReader.init(CorpusReader.java:41)
at org.lemurproject.galago.core.index.corpus.CorpusReader.<init>(CorpusReader.java:32)
at org.lemurproject.galago.core.parse.CorpusSplitParser.<init>(CorpusSplitParser.java:33)
... 16 more
Stage parsePostings completed with 1 errors.
java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1062)
at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:971)
at org.lemurproject.galago.tupleflow.execution.JobExecutor.runWithoutServer(JobExecutor.java:1122)
at org.lemurproject.galago.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1177)
at org.lemurproject.galago.core.tools.AppFunction.runTupleFlowJob(AppFunction.java:101)
at org.lemurproject.galago.core.tools.apps.BuildIndex.run(BuildIndex.java:789)
at org.lemurproject.galago.core.tools.AppFunction.run(AppFunction.java:55)
at org.lemurproject.galago.core.tools.App.run(App.java:82)
at org.lemurproject.galago.core.tools.App.run(App.java:73)
at org.lemurproject.galago.core.tools.App.main(App.java:69)
Caused by: java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:99)
at java.lang.Thread.run(Thread.java:695)
I also tried building from source and got the same results. Can somebody point out where I am going wrong? Hardly anybody seems to have faced this issue, so there's not much I can find via a simple Google search.
Solved. Just in case someone else faces this issue: one of my friends figured out that Galago will not work directly on the wiki-small.corpus file, as it looks for corpus.keys, which does not exist for this file. Just replace the .corpus file with the directory of documents and everything will work fine. Do specify the indexPath and inputPath parameters explicitly; use "galago build help" to view the exact syntax. Cheers.
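For reference, such an invocation might look roughly like this (paths are placeholders; check "galago build help" for the exact options in your version):
galago build --indexPath=/path/to/wiki-index --inputPath=/path/to/wiki-small-documents/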
I know this is late, but the wiki-small.corpus file from the textbook's website was built with an old version of Galago, namely the 1.0 series, which is preserved in this Google Code repository: https://code.google.com/p/galagosearch/
The newer releases of Galago (2.0, ..., 3.5, ..., 3.7) are part of newer development under the Lemur Project on SourceForge, and the corpus format has changed since then. If you had a corpus file built with Galago 3.5, your commands would have worked.

Loading beans from Grails Script

I'm running a Grails script to load a bean from the Grails application; however, it seems that I have a dependency problem. Here is my code:
import grails.spring.BeanBuilder
import org.springframework.context.ApplicationContext
target(main: "Script to load location information into Solr") {
println "Hello script"
def bb = new BeanBuilder()
ApplicationContext appContext = bb.createApplicationContext()
def service = appContext.getBean("solrjService")
}
setDefaultTarget(main)
When I execute the script I get the following stacktrace:
main:
Hello script
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:108)
at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory
at org.slf4j.LoggerFactory.staticInitialize(LoggerFactory.java:83)
at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:73)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:131)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:272)
at grails.spring.BeanBuilder.<clinit>(BeanBuilder.java:84)
Any ideas?
Thanks for your time
If you read the call stack, it's obviously a problem with SLF4J.
See http://slf4j.org/faq.html#IllegalAccessError
It looks like you may be mixing versions of the SLF4J jars, and getting a conflict.
But, of course, Burt is correct: once you get past this, you will find that you've initialized your BeanBuilder's ApplicationContext with no beans.
I had to include the _GrailsBootstrap target to be able to load my beans: http://grails.org/doc/latest/guide/commandLine.html#creatingGantScripts
includeTargets << grailsScript("_GrailsBootstrap")

target('default': "Load Location Information to Solr Server") {
    depends(configureProxy, packageApp, classpath, loadApp, configureApp)
    def service = appCtx.getBean('solrjService')
    println service.getLocationSuggestion("Barcelona")
}
I used to run the script this way (that's why I had a classpath problem):
grails run-script scripts/Myscript.groovy
Now I run it this way:
grails Myscript.groovy
and I don't have any classpath problems :D
Thanks for your help.
