Google Dataflow batch job reading from Pub Sub - google-cloud-dataflow

I'm having trouble running a dataflow pipeline in batch mode using PubsubIO.Read as the source. I have tried using both .maxNumRecords and .maxReadTime in order to make it a bounded collection. However, my job hangs on the read step and I get Socket Timeout Exceptions. Only when I publish a message to the topic while the job is running does it process queued messages. I was under the impression that if I specify a subscription for the pipeline that it would process any messages that it missed on startup. I'm using the Dataflow SDK 1.9 for Java. Below is a snippet of my code.
AuditPipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
.as(AuditPipelineOptions.class);
options.setRunner(TemplatingDataflowPipelineRunner.class);
Pipeline p = Pipeline.create(options);
p.apply(PubsubIO.Read.subscription(options.getPubSubSubscription()).withCoder(StringUtf8Coder.of()).maxReadTime(new Duration(1000 * 60)))
.apply(ParDo.of(new CreateEntitiesFn(options.getNamespace(),options.getIgnoreKeys())))
.apply(DatastoreIO.v1().write().withProjectId(options.getProject()));
p.run();
Here is the full exception:
exception thrown while executing request
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.dataflow.sdk.util.PubsubJsonClient.pull(PubsubJsonClient.java:171)
at com.google.cloud.dataflow.sdk.io.PubsubIO$Read$Bound$PubsubReader.processElement(PubsubIO.java:886)
at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:285)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:171)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Sample Job ID: 2017-05-17_12_07_37-2442852182886580802

Related

Error when binding two instances Janusgraph on same graph

So I'm trying to deploy 2 instances of Janusgraph (v 0.3.1) to insert data in the same keyspace of ScyllaDb backend. To do that i deploy 2 janusgraph containers with docker. The first one is starting without errors and creates the keyspace in my ScyllaDb but the second one displays some errors.
So my Janusgraph containers works on a cluster while my backend works on another cluster. Actually, I tried with only one container to restart it when my keyspace Scylla is already created and I have the same problem.
I also found a script clean.groovy that allowed to force shutdown the graph instances opened. But nothing works...
5316 [main] INFO org.janusgraph.diskstorage.Backend - Initiated backend operations thread pool of size 8
5648 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [/janusgraph-config/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:80)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:120)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:84)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:343)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
... 13 more
Caused by: org.janusgraph.core.JanusGraphException: A JanusGraph graph with the same instance id [0a0a01657-node11] is already open. Might required forced shutdown.
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:165)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:111)
... 18 more
5651 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
5800 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
7837 [gremlin-server-exec-1] ERROR org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager - Could not create GremlinScriptEngine for gremlin-groovy
java.lang.IllegalStateException: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:464)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.createGremlinScriptEngine(DefaultGremlinScriptEngineManager.java:450)
at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.getEngineByName(DefaultGremlinScriptEngineManager.java:219)
at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.lambda$getEngineByName$0(CachedGremlinScriptEngineManager.java:57)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.getEngineByName(CachedGremlinScriptEngineManager.java:57)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:263)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:397)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:460)
... 24 more
Caused by: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:713)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:395)
... 26 more
Caused by: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:66)
at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:310)
at Script1.run(Script1.groovy:16)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:690)
... 27 more
7841 [main] WARN org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Could not initialize gremlin-groovy GremlinScriptEngine as init script could not be evaluated
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: gremlin-groovy is not an available GremlinScriptEngine
at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.lambda$new$4(ServerGremlinExecutor.java:141)
at java.util.LinkedHashMap$LinkedKeySet.forEach(LinkedHashMap.java:559)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:136)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:120)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:84)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:343)
Caused by: java.lang.IllegalArgumentException: gremlin-groovy is not an available GremlinScriptEngine
at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.registerLookUpInfo(CachedGremlinScriptEngineManager.java:95)
at org.apache.tinkerpop.gremlin.jsr223.CachedGremlinScriptEngineManager.getEngineByName(CachedGremlinScriptEngineManager.java:58)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:263)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I think all those errors are created by this :
Caused by: org.janusgraph.core.JanusGraphException: A JanusGraph graph with the same instance id [0a0a01657-node11] is already open. Might required forced shutdown.
And this is the script clean.groovy
graph = JanusGraphFactory.open('janusgraph.properties')
mgmt = graph.openManagement()
instances = mgmt.getOpenInstances()
instances.iterator().findAll {
!it.contains('current')
}.each {
mgmt.forceCloseInstance(it)
}
mgmt.commit()
So did anybody solved this problem of graph ?
Thanks in advance.
I finally got answer ! So you can use ConfiguredGraphFactory but it failed for me. So for the first container i started it normally but the second one i added another configuration in the janusgraph.properties file :
graph.replace-instance-if-exists=true`
And it works perfectly !

com.google.api.client.http.HttpRequest error when staging Dataflow to GCP

I am trying to run my Java Dataflow with the DataflowRunner on GCP.
Dataflow works fine with DirectRunner but when changing to DataflowRunner and running it, the console starts staging the files and I suddenly I get the following error several times:
mars 19, 2018 8:43:56 PM com.google.api.client.http.HttpRequest execute
AVERTISSEMENT: exception thrown while executing request
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
But the staging continues (as the console still outputs some staging info) and at some point the errors pops again (may be for the 5th time) and then the following:
mars 19, 2018 8:48:13 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFOS: Still staging 127 files
As the console says that it is still staging and don't understand whether this is staging issue or related to the execution of my code...
Any idea about it and how to solve this?
** EDIT **
There is another error that I didn't notice before:
mars 19, 2018 9:41:46 PM com.google.api.client.http.HttpRequest execute
AVERTISSEMENT: exception thrown while executing request
java.io.IOException: insufficient data written
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3540)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This was actually a networking issue. Workaround solution is to deploy the pipeline through a Compute Engine instance. This will assure you a good network bandwidth.

Dataflow Batch Job fails with "Failed to close some writers"

I am running a batch pipeline with the Apache Beam 2.2 SDK via the Cloud Dataflow service. There are 751 text files that I parse using TextIO.readAll() transform, deserialize and write to a date partitioned table in BigQuery.
First thing I noticed is that autoscaling was not really kicking in and left the pipeline at 15 workers, even though I was able to push throughput a lot higher when for example manually setting the number of workers to 250.
My pipeline fails with the following stack trace:
(abed94a6f5139e21): java.io.IOException: Failed to close some writers
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:248)
Suppressed: java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at org.apache.beam.sdk.io.gcp.bigquery.TableRowWriter.close(TableRowWriter.java:81)
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.finishBundle(WriteBundlesToFiles.java:242)
at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeFinishBundle(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
at com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:407)
at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:60)
at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
at com.google.cloud.dataflow.worker.DataflowWorker.executeWork(DataflowWorker.java:330)
at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:302)
at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:251)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
... 4 more
Should I try with even more workers or split the work across several pipelines?
Thanks to the comment by jkff it worked flawlessly - after setting --maxNumWorkers=250 (15 seems to be the standard maximum).
The error was a transient error that Dataflow would retry several times and in the end, the pipeline ran successfully.

How to handle "an unhandled error caused by the Dataflow SDK" (corrupted gz as input)

Is there a way to deal with "an unhandled error caused by the Dataflow SDK"?
Specifically, we have a Dataflow job that takes a list of gz files (in GCS) as input, and produces some output.
Once in a while one of the gz files may be corrupted, and the job fails because of it.
We are wondering if there is a way to handle this -- specifically, we want to make it so that the job would ignore such corrupted file(s) and proceed.
It is not clear if we can catch an exception thrown due to the corrupted gz file or not (because it appears that it is handled in Dataflow SDK itself, causing it to fail).
(For Google Dataflow team: Here is a specific dataflow job id: 2017-04-02_05_08_20-5491890758767473661.)
Update: Here's the stack trace we got from the logging UI.
(778029c78ed61ff2): java.io.IOException: Failed to advance reader of source: StaticValueProvider{value=gs://aaa.gz} range [0, 9223372036854775807)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:544)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:425)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:217)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:278)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at com.google.cloud.dataflow.sdk.io.TextIO$TextSource$TextBasedReader.tryToEnsureNumberOfBytesInBuffer(TextIO.java:1077)
at com.google.cloud.dataflow.sdk.io.TextIO$TextSource$TextBasedReader.findSeparatorBounds(TextIO.java:1011)
at com.google.cloud.dataflow.sdk.io.TextIO$TextSource$TextBasedReader.readNextRecord(TextIO.java:1043)
at com.google.cloud.dataflow.sdk.io.CompressedSource$CompressedReader.readNextRecord(CompressedSource.java:482)
at com.google.cloud.dataflow.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:536)
at com.google.cloud.dataflow.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:287)
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:541)
... 14 more

Jenkins slave job fails after successful job

I have 2 jenkins servers because my 2 builds have some incompatible system requirements.
I setup a new node for one of the servers and migrated the jobs from the other server and set them up to run on the node.
The node runs the job just fine and even archives the artifacts (they are linked from the job) but the job throws and exception and gets marked as a failure.
** Below is the output from the jobs **
Completed build, now archiving <-- I print this out at the end of my last build step
FATAL: Remote call on ops-1-jenkins-android-10-186.fam.io failed
java.io.IOException: Remote call on ops-1-jenkins-android-10-186.fam.io failed
at hudson.remoting.Channel.call(Channel.java:748)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:940)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:556)
at hudson.model.Run.execute(Run.java:1745)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:89)
at hudson.model.Executor.run(Executor.java:240)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
at hudson.util.ProcessTree.getKillers(ProcessTree.java:151)
at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:212)
at hudson.util.ProcessTree$UnixProcess.kill(ProcessTree.java:557)
at hudson.util.ProcessTree$UnixProcess.killRecursively(ProcessTree.java:564)
at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:488)
at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:952)
at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:943)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:328)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at hudson.remoting.Engine$1$1.run(Engine.java:63)
at java.lang.Thread.run(Thread.java:701)
Thanks Lovato!
After I updated all the plugins and jenkins things worked just fine.
We got this error: FATAL: java.io.IOException: Remote call on pm_jenprodslave_1 failed. We found the slave server was unresponsive and had to reboot it. Once it came back the build works again.

Resources