I think I followed every step in the document, but I still ran into this exception. (The only difference is that I'm running it from Eclipse J2EE, but I wouldn't expect that to really matter, would it?)
Code (I didn't write this; it's straight from the Beam project example): I think you'd have to specify a Google Cloud Platform project and provide the right credentials to access it. However, I didn't find anywhere in this example project where that setup is done.
public static void main(String[] args) {
    // Create a PipelineOptions object. This object lets us set various execution
    // options for our pipeline, such as the runner you wish to use. This example
    // will run with the DirectRunner by default, based on the class path configured
    // in its dependencies.
    PipelineOptions options = PipelineOptionsFactory.create();

    // Create the Pipeline object with the options we defined above.
    Pipeline p = Pipeline.create(options);

    // Apply the pipeline's transforms.
    // Concept #1: Apply a root transform to the pipeline; in this case, TextIO.Read to read a set
    // of input text files. TextIO.Read returns a PCollection where each element is one line from
    // the input text (a set of Shakespeare's texts).
    // This example reads a public data set consisting of the complete works of Shakespeare.
    p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
        ...
}
Exception:
Exception in thread "main" java.lang.IllegalStateException: Failed to validate gs://apache-beam-samples/shakespeare/*
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:309)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:205)
at org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
at org.apache.beam.runners.direct.DirectRunner.apply(DirectRunner.java:296)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:388)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:302)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:152)
at google.dataflow.beam.example.MinimalWordCount.main(MinimalWordCount.java:77)
Caused by: java.io.IOException: Unable to match files in bucket apache-beam-samples, prefix shakespeare/ against pattern shakespeare/[^/]*
at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:234)
at org.apache.beam.sdk.util.GcsIOChannelFactory.match(GcsIOChannelFactory.java:53)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:304)
... 8 more
Caused by: com.google.api.client.http.HttpResponseException: 400 Bad Request
{
"error" : "invalid_grant"
}
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:207)
at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:149)
at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:135)
at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.ResilientOperation$AbstractGoogleClientRequestExecutor.call(ResilientOperation.java:166)
at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:66)
at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:103)
at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:227)
... 10 more
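For what it's worth, my understanding is that the invalid_grant response usually means the application-default credentials on the machine are missing or stale (re-running gcloud auth application-default login is the commonly suggested remedy). If the project and credentials were wired up explicitly instead, I'd expect it to look roughly like the sketch below; note this targets the GcpOptions interface as it exists in recent Beam versions (older versions keep it in a different package), and the project ID and key-file path are placeholders:
import java.io.FileInputStream;

import com.google.auth.oauth2.GoogleCredentials;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public static void main(String[] args) throws Exception {
    // Sketch: configure the GCP project and credentials explicitly
    // before creating the pipeline. Both values below are placeholders.
    GcpOptions options = PipelineOptionsFactory.as(GcpOptions.class);
    options.setProject("my-gcp-project");
    options.setGcpCredential(
        GoogleCredentials.fromStream(
            new FileInputStream("/path/to/service-account.json")));
    Pipeline p = Pipeline.create(options);
    // ... apply transforms as above and run ...
}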
Try running it from the command prompt if you're using Windows.
Go to the folder containing the pom.xml file and open cmd there,
then issue the command with the respective arguments:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args=" --output=counts" -Pdirect-runner
If you want to run it with your own input file, make a txt file with any name and put it in the folder containing the pom, and then fire the following command:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=YOURFILENAME.txt --output=counts" -Pdirect-runner
Hope this will do. As for the rest, I am looking into your issue.
I'm trying to get Jenkins set up, with configuration, within a Docker environment. Per a variety of sources, it appears the suggested method is to insert scripts into JENKINS_HOME/init.groovy.d. I've taken scripts from places like the Jenkins wiki and made slight changes. They're only partially working. Here is one of them:
import java.util.logging.ConsoleHandler
import java.util.logging.FileHandler
import java.util.logging.SimpleFormatter
import java.util.logging.LogManager
import jenkins.model.Jenkins
// Log into a file
println("extralogging.groovy")
def RunLogger = LogManager.getLogManager().getLogger("hudson.model.Run")
def logsDir = new File("/var/log/jenkins")
if (!logsDir.exists()) { logsDir.mkdirs() }
FileHandler handler = new FileHandler(logsDir.absolutePath+"/jenkins-%g.log", 1024 * 1024, 10, true);
handler.setFormatter(new SimpleFormatter());
RunLogger.addHandler(handler)
This script fails on the last line, RunLogger.addHandler(handler).
2019-12-20 19:25:18.231+0000 [id=30] WARNING j.util.groovy.GroovyHookScript#execute: Failed to run script file:/var/lib/jenkins/init.groovy.d/02-extralogging.groovy
java.lang.NullPointerException: Cannot invoke method addHandler() on null object
I've had a number of other scripts return null objects from various getters similar to this one:
def RunLogger = LogManager.getLogManager().getLogger("hudson.model.Run")
My goal is to be able to develop a Jenkins setup locally and then hand it to our sysops guys. Later, as I add pipelines and whatnot, I'd like to be able to work on them in a local Jenkins configuration as well and then hand something over for import into production Jenkins.
I'm not sure where to find API documentation so I can chase this myself. Maybe I should stop doing it this way, make the changes via the GUI instead, grab the files that get modified, and just stuff those files into the right place.
Suggestions?
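One lead I'm chasing, in case it helps anyone else: LogManager.getLogManager().getLogger(name) apparently only returns loggers that have already been registered, and returns null otherwise, while java.util.logging.Logger.getLogger(name) creates the logger on demand. If that's right, this variant of the script should avoid the NPE (same paths and sizes as above):
import java.util.logging.FileHandler
import java.util.logging.Logger
import java.util.logging.SimpleFormatter

// Logger.getLogger() creates the logger on demand, unlike
// LogManager.getLogManager().getLogger(), which returns null
// when nothing has registered that logger yet.
def runLogger = Logger.getLogger("hudson.model.Run")

def logsDir = new File("/var/log/jenkins")
if (!logsDir.exists()) { logsDir.mkdirs() }

FileHandler handler = new FileHandler(logsDir.absolutePath + "/jenkins-%g.log", 1024 * 1024, 10, true)
handler.setFormatter(new SimpleFormatter())
runLogger.addHandler(handler)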
We were building our solution without any "Pipeline" in Jenkins until recently, so I'm currently in the process of moving our build to multibranch pipelines.
The issue that I'm running into is that we have a lot of structure in our solution (lots of subfolders, and sometimes some long names).
Currently, the Jenkins pipeline extracts everything into a folder that looks like:
D:\ws\ght-build_feature_pipelines-TMQ33LB5OQIQ5VXVMFKFDG2HWCD4MUOGEGUWJUOMZ5D2GI42BIQA
which is very long, and now we are reaching the 260-character limit of MSBuild:
C:\Program Files (x86)\Microsoft Visual
Studio\2017\Professional\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2991,5):
error MSB3553: Resource file
"obj\Release\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.dddddddddd.Resources.resources"
has an invalid name. The item metadata "%(FullPath)" cannot be applied
to the path
"obj\Release\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.dddddddddd.Resources.resources".
The specified path, file name, or both are too long. The fully
qualified file name must be less than 260 characters, and the
directory name must be less than 248 characters.
[D:\ws\ght-build_feature_pipelines-TMQ33LB5OQIQ5VXVMFKFDG2HWCD4MUOGEGUWJUOMZ5D2GI42BIQA\Src\bbbbbb\dddddd\dddddddddddddd\yyyyyyy\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.csproj]
We have so many cases where the length is an issue that refactoring everything would be a really big job, so I'm looking for a way to tell Jenkins to use a shorter workspace path.
What I finally did:
pipeline {
    agent {
        node {
            label 'windows-node'
            customWorkspace "D:\\ws\\${env.BRANCH_NAME}"
        }
    }
    options {
        skipDefaultCheckout()
    }
    ...
}
And I have a step that does the checkout; a sketch of it follows below. It was easier for me to have a per-job behaviour, without touching the Jenkins global settings.
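Since skipDefaultCheckout() suppresses the implicit checkout, the explicit stage can be as small as this (a sketch; the stage name is arbitrary):
stages {
    stage('Checkout') {
        steps {
            // 'checkout scm' fetches the same revision the multibranch
            // job resolved for this branch.
            checkout scm
        }
    }
}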
Update (for any recent Jenkins instances)
Turns out that with recent Jenkins versions, PATH_MAX seems to be ignored.
The only thing it still does is trigger a warning in the Jenkins log when it is set smaller than a certain value, which doesn't really matter, as the setting itself is ignored anyway (as seen on Jenkins 2.249.3). See also: JENKINS-2111
As far as I can tell, the replacement setting was introduced in jenkins-branch-api 2.0.21:
There's a new property: MAX_LENGTH.
It defaults to 32 characters.
You can set it the same way as PATH_MAX:
As a Java property, to ensure that Jenkins starts up with the right setting, e.g.:
-Djenkins.branch.WorkspaceLocatorImpl.MAX_LENGTH=40
or at run time, using the script console:
jenkins.branch.WorkspaceLocatorImpl.MAX_LENGTH=40
For older Jenkins instances
Actually, there's a Java property you can set to specify the length of the directory name, e.g.:
-Djenkins.branch.WorkspaceLocatorImpl.PATH_MAX=20
To make it permanent, you have to specify this property in the Jenkins Java startup configuration file (for example, the JAVA_ARGS line in /etc/default/jenkins on Debian-based installs).
You may also read and write this property via the Jenkins script console, for temporary changes or just to give it a try, as it takes effect immediately, e.g.:
println jenkins.branch.WorkspaceLocatorImpl.PATH_MAX
jenkins.branch.WorkspaceLocatorImpl.PATH_MAX = 20
println jenkins.branch.WorkspaceLocatorImpl.PATH_MAX
Setting this value to 0 changes the path generation behavior.
For details please check:
https://issues.jenkins-ci.org/browse/JENKINS-34564
https://issues.jenkins-ci.org/browse/JENKINS-38706
Is it possible to define/specify a runner when starting tests from Cucumber's command line (cucumber.api.cli.Main)?
My reason for this is so I can generate XML reports in Jenkins and push the results to ALM Octane.
I kind of inherited this project, and it's using Gradle to do a javaexec and call cucumber.api.cli.Main.
I know it's possible to do this with @RunWith(OctaneCucumber.class) when using the JUnit runner + Maven (or just the JUnit runner); otherwise that annotation is ignored. I have the custom runner with that annotation, but when I run from cucumber.api.cli.Main I can't find a way to use it, and my annotation just gets ignored.
What @Grasshopper suggested didn't exactly work, but it made me look in the right direction.
Instead of adding the code as a plugin, I managed to "hack/load" the Octane reporter by creating a copy of cucumber.api.cli.Main, using it as a base to run the CLI commands, changing the run method a bit, and adding the plugin at runtime. I needed to do this because the plugin requires quite a few parameters in its constructor. It might not be the perfect solution, but it allowed me to keep the Gradle build process I initially had.
public static byte run(String[] argv, ClassLoader classLoader) throws IOException {
    RuntimeOptions runtimeOptions = new RuntimeOptions(new ArrayList<String>(asList(argv)));
    ResourceLoader resourceLoader = new MultiLoader(classLoader);
    ClassFinder classFinder = new ResourceLoaderClassFinder(resourceLoader, classLoader);
    Runtime runtime = new Runtime(resourceLoader, classFinder, classLoader, runtimeOptions);
    // ==================== Added the following lines ====================
    // Hardcoded runner class: if it changes, it needs to be changed here as well.
    OutputFile outputFile = new OutputFile(Main.class);
    runtimeOptions.addPlugin(new HPEAlmOctaneGherkinFormatter(resourceLoader, runtimeOptions.getFeaturePaths(), outputFile));
    // ===================================================================
    runtime.run();
    return runtime.exitStatus();
}
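Calling the copied entry point then mirrors what the stock cucumber.api.cli.Main.main does; a minimal sketch (put it in whatever class holds the modified run method above):
public static void main(String[] argv) throws IOException {
    // Delegate to the modified run() above, then propagate its exit status.
    byte exitStatus = run(argv, Thread.currentThread().getContextClassLoader());
    System.exit(exitStatus);
}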
I'm getting the following exception when running the pipeline locally. There is no exception when submitting for cloud execution.
Thanks,
Genady
INFO: Executing pipeline using the DirectPipelineRunner.
Exception in thread "main" java.lang.IllegalStateException: no evaluator registered for GroupedValues [GroupedValues]
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:606)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:583)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
at app.Main.main(Main.java:124)
The code outline is basically this:
PCollection<KV<MyKey, Iterable<MyValue>>> groupedByMyKey = ...
PCollection<KV<MyKey, MyAggregated>> aggregated = groupedByMyKey.apply(
Combine.<MyKey, MyValue, MyAggregated>groupedValues(new Aggregator()));
The Aggregator class extends CombineFn<MyValue, List<MyValue>, MyAggregated>.
Can you share a code snippet that triggers this? GroupedValues is a PTransform that is often used within various combining transforms, so it might be from using something like Min, Max, etc.
The error means that the DirectPipelineRunner doesn't know how to evaluate a GroupedValues. However, that's unexpected, since it should have been expanded into a ParDo before execution.
I found the reason for this behaviour.
I was using a command-line argument to run it in remote mode (--runner=BlockingDataflowPipelineRunner) and then forced it to run locally with:
PipelineRunner<?> runner = DirectPipelineRunner.fromOptions(options);
runner.run(p);
After removing these lines and just using the --runner=DirectPipelineRunner argument, it worked as expected.
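In other words, let the options decide which runner to use instead of constructing one by hand. A minimal sketch of that pattern against the old Dataflow SDK (the same classes that appear in the stack trace above):
// The --runner flag (e.g. --runner=DirectPipelineRunner or
// --runner=BlockingDataflowPipelineRunner) now picks the runner.
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline p = Pipeline.create(options);
// ... build the pipeline as outlined above ...
p.run();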
I'm using GroovyConsole to evaluate scripts I get from external sources, so the code to evaluate is dynamic and I don't have control over it. It's actually written into a database, and I have to read it as a String. Not perfect, but that's how it is.
What I'm doing right now:
private GroovyShell shell

def processScript(def script) {
    if (script) {
        try {
            shell.evaluate(script, 'some_random_name')
        } catch (e) {
            log.warn "Could not process script: $e"
        }
    }
}
This usually works. But now we got a large script (~3000 LOC), and it throws java.lang.RuntimeException: Method code too large! because the compiled script exceeds the JVM's 64 KB bytecode limit for a single method.
I tried dumping the script into a file and using a BufferedReader, but it throws the same exception.
So is there a better way to evaluate dynamic Groovy code from within a Groovy method?
The problem is that your script reaches the JVM's 64 KB bytecode limit for a single method. I think the only solution is to split your script into several smaller scripts in some way.
See this answer
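If the script happens to be a flat sequence of independent top-level statements, one naive way to split it is to break it on blank lines and evaluate the chunks one by one against a shared shell, so each chunk compiles into its own, smaller method. This is only a sketch under that assumption (reusing the script string from processScript above); it breaks if a statement spans a blank line or if chunks try to share variables declared with def:
import groovy.lang.Binding;
import groovy.lang.GroovyShell;

// Naive split: each blank-line-separated chunk compiles to its own
// script class, so no single generated method exceeds the 64 KB limit.
// State must flow through the Binding (undeclared variables), not
// through def-declared locals.
Binding binding = new Binding();
GroovyShell shell = new GroovyShell(binding);
for (String chunk : script.split("\\n{2,}")) {
    shell.evaluate(chunk);
}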