Dataflow stock WordCount example with DataflowPipelineRunner fails when run outside of Maven - google-cloud-dataflow

I am able to successfully run the WordCount example using DataflowPipelineRunner with the maven exec:java command shown in the docs.
However, when I attempt to run it in my own 1.8 VM, it doesn't work. I am using these args (on Windows):
--project=highfive-metrics-service \
--stagingLocation=gs://highfive-dataflow-test/staging \
--runner=BlockingDataflowPipelineRunner \
--gCloudPath=C:/Progra~1/Google/CloudS~1/google-cloud-sdk/bin/gcloud.cmd
I get the following error:
2014-12-24T04:53:34.849Z: (5eada047929dcead): Workflow failed. Causes: (5eada047929dce2e): There was a problem creating the GCE VMs or starting Dataflow on the VMs so no data was processed. Possible causes:
1. A failure in user code on in the worker.
2. A failure in the Dataflow code.
Next Steps:
1. Check the GCE serial console for possible errors in the logs.
2. Look for similar issues on http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
Prior to the subsequent cleanup, I observed three harness instances on GCE as expected. Looking at the serial console for the first one, wordcount-jroy-1224043800-12232038-8cfa-harness-0, I see "normal" (comparing to what I see when running with Maven) looking output that ends with:
Dec 24 04:38:45 [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.438005] IPv6: ADDRCONF(NETDEV_CHANGE): veth30b3796: link becomes ready
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.439395] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.440262] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 12898 100 12898 0 0 2009k 0 --:--:-- --:--:-- --:--:-- 3148k
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: {"attributes":{"config":"{\"alsologtostderr\":true,\"base_task_dir\":\"/tmp/tasks/\",\"commandlines_file_name\":\"commandlines.txt\",\"continue_on_exception\":true,\"dataflow_api_endpoint\":\"https://www.googleapis.com/\",\"dataflow_api_version\":\"v1beta1\",\"log_dir\":\"/dataflow/logs/taskrunner/harness\",\"log_to_gcs\":true,\"log_to_serialconsole\":true,\"parallel_worker_flags\":{\"job_id\":\"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588\",\"project_id\":\"highfive-metrics-service\",\"reporting_enabled\":true,\"root_url\":\"https://www.googleapis.com/\",\"service_path\":\"dataflow/v1b3/projects/\",\"temp_gcs_directory\":\"gs://highfive-dataflow-test/staging\",\"worker_id\":\"wordcount-jroy-1224043800-12232038-8cfa-harness-0\"},\"project_id\":\"highfive-metrics-service\",\"python_harness_cmd\":\"python_harness_main\",\"scopes\":[\"https://www.googleapis.com/auth/devstorage.full_control\",\"https://www.googleapis.com/auth/cloud-platform\"],\"task_group\":\"nogroup\",\"task_user\":\"nobody\",\"temp_g
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 goo[ 16.494163] device veth29b6136 entered promiscuous mode
gle: cs_directory\":\"gs://highfive-dataflow-test/staging\",\"vm_id\":\"wordcoun[ 16.505311] IPv6: ADDRCONF(NETDEV_UP): veth29b6136: link is not ready
[ 16.507623] docker0: port 2(veth29b6136) entered forwarding state
t-jroy-122404380[ 16.507633] docker0: port 2(veth29b6136) entered forwarding state
0-12232038-8cfa-harness-0\"}","google-container-manifest":"\ncontainers:\n-\n env:\n -\n name: GCS_BUCKET\n value: dataflow-docker-images\n image: google/docker-registry\n imagePullPolicy: PullNever\n name: repository\n ports:\n -\n containerPort: 5000\n hostPort: 5000\n name: registry\n-\n image: localhost:5000/dataflow/taskrunner:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: taskrunner\n volumeMounts:\n -\n mountPath: /dataflow/logs/taskrunner/harness\n name: dataflowlogs-harness\n-\n env:\n -\n name: LOG_DIR\n value: /dataflow/logs\n image: localhost:5000/dataflow/shuffle:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: shuffle\n ports:\n -\n containerPort: 12345\n hostPort: 12345\n name: shuffle1\n -\n containerPort: 22349\n hostPort: 22349\n name: shuffle2\n volumeMounts:\n -\n mountPath: /var/shuffle\n name: dataflow-shuffle\n -\n mountPath: /dataflow/logs\n name: dataflow-logs\nversion: v1
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: beta2\nvolumes:\n-\n name: dataflowlogs-harness\n source:\n hostDir:\n path: /var/log/dataflow/taskrunner/harness\n-\n name: dataflow-shuffle\n source:\n hostDir:\n path: /dataflow/shuffle\n-\n name: dataflow-logs\n source:\n hostDir:\n path: /var/log/dataflow/shuffle\n","job_id":"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588","packages":"gs://dataflow-releases-prod/worker_packages/NOTICES.shuffle|NOTICES.shuffler|gs://highfive-dataflow-test/staging/access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|gs://highfive-dataflow-test/staging/avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|gs://highfive-dataflow-test/staging/charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|gs://highfive-dataflow-test/staging/cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|gs://highfive-dataflow-test/staging/commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|gs://highfive-dataflow-test/staging/commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|gs://highfive-dataflow-test/staging/dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|gs://highfive-dataflow-test/staging/deploy-eLnif2izXW_mrleXudK0Eg.jar|deploy-eLnif2izXW_mrleXudK0Eg.jar|gs://highfive-dataflow-test/staging/dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|gs://highfive-dataflow-test/staging/google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|gs://highfive-dataflow-test/staging/google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|gs://highfive-dataflow-test/staging/google-api-services-compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|google-api-services
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|gs://highfive-dataflow-test/staging/google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|gs://highfive-dataflow-test/staging/google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|gs://highfive-dataflow-test/staging/google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|gs://highfive-dataflow-test/staging/google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|gs://highfive-dataflow-test/staging/google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|gs://highfive-dataflow-test/staging/google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|gs://highfive-dataflow-test/st
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: aging/google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|gs://highfive-dataflow-test/staging/google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|gs://highfive-dataflow-test/staging/guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|gs://highfive-dataflow-test/staging/hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|gs://highfive-dataflow-test/staging/hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|gs://highfive-dataflow-test/staging/httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|gs://highfive-dataflow-test/staging/httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|gs://highfive-dataflow-test/staging/idea_rt-6II88e1BKUeCOQqcrZht-w.jar|idea_rt-6II88e1BKUeCOQqcrZht-w.jar|gs://highfive-dataflow-test/staging/jacce
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: ss-laKenN34W6jKKivkBUzVcA.jar|jaccess-laKenN34W6jKKivkBUzVcA.jar|gs://highfive-dataflow-test/staging/jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|gs://highfive-dataflow-test/staging/jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|gs://highfive-dataflow-test/staging/jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|gs://highfive-dataflow-test/staging/jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|gs://highfive-dataflow-test/staging/jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|gs://highfive-dataflow-test/staging/javaws-O8JqID6BpsXsCSRRkhii3w.jar|javaws-O8JqID6BpsXsCSRRkhii3w.jar|gs://highfive-dataflow-test/staging/jce-eMjjWzdqQh30yNZ9HMuXMA.jar|jce-eMjjWzdqQh30yNZ9HMuXMA.jar|gs://highfive-dataflow-test/staging/jfr-xDzacRGMQeIR4SdPe69o1A.jar|jfr
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -xDzacRGMQeIR4SdPe69o1A.jar|gs://highfive-dataflow-test/staging/jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|gs://highfive-dataflow-test/staging/jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|gs://highfive-dataflow-test/staging/joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|gs://highfive-dataflow-test/staging/jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|gs://highfive-dataflow-test/staging/jsse-HOItnWzBlT4hG5HPmlF56w.jar|jsse-HOItnWzBlT4hG5HPmlF56w.jar|gs://highfive-dataflow-test/staging/junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|gs://highfive-dataflow-test/staging/localedata-R9ei3T8qar8cibFNN0X7Qg.jar|localedata-R9ei3T8qar8cibFNN0X7Qg.jar|gs://highfive-dataflow-test/staging/management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|gs://highfive-dataflow-test/staging/mockito-all-1.9.5-_T4jPTp05rc7PhcOO34Saw.jar|mockito-all-1.9.5-_T4jPTp0
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: 5rc7PhcOO34Saw.jar|gs://highfive-dataflow-test/staging/nashorn-x8si6abt-U04QaVUHvl_bg.jar|nashorn-x8si6abt-U04QaVUHvl_bg.jar|gs://highfive-dataflow-test/staging/paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|gs://highfive-dataflow-test/staging/plugin-TG6U30mOzKi8yMGKYd7ong.jar|plugin-TG6U30mOzKi8yMGKYd7ong.jar|gs://highfive-dataflow-test/staging/protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|gs://highfive-dataflow-test/staging/resources-RavNZwakZf55HEtrC9KyCw.jar|resources-RavNZwakZf55HEtrC9KyCw.jar|gs://highfive-dataflow-test/staging/rt-Z2kDZdIt-eG8CCtFIinW1g.jar|rt-Z2kDZdIt-eG8CCtFIinW1g.jar|gs://highfive-dataflow-test/staging/slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|gs://highfive-dataflow-test/staging/slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|gs://highfive-dataflow-test/staging/snappy-java-1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|snappy-java
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|gs://highfive-dataflow-test/staging/sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|gs://highfive-dataflow-test/staging/sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|gs://highfive-dataflow-test/staging/sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|gs://highfive-dataflow-test/staging/sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|gs://highfive-dataflow-test/staging/xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|gs://highfive-dataflow-test/staging/zipfs-SIKQJJIhpGOgSa4tT6nStA.jar|zipfs-SIKQJJIhpGOgSa4tT6nStA.jar"},"description":"GCE Instance created for Dataflow","disks":[{"deviceName":"persistent-disk-0","index":0,"mode":"READ_WRITE","type":"PERSISTENT"}],"hostname":"wordcount-jroy-1224043800-12232038-8cfa-harness-0.c.highfive-metrics-service.internal","id":8960015560553137779,"image":"","machineType":"projects/537312487774/machineTypes/n1-stan
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: dard-4","maintenanceEvent":"NONE","networkInterfaces":[{"accessConfigs":[{"externalIp":"130.211.184.44","type":"ONE_TO_ONE_NAT"}],"forwardedIps":[],"ip":"10.240.173.213","network":"projects/537312487774/networks/default"}],"scheduling":{"automaticRestart":"TRUE","onHostMaintenance":"MIGRATE"},"serviceAccounts":{"537312487774#developer.gserviceaccount.com":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.googleapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]},"default":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.goog
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: leapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]}},"tags":["dataflow"],"zone":"projects/537312487774/zones/us-central1-a"}
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: No startup script found in metadata.
Not sure what I should be looking for, but this seems to reliably fail for me in this manner. I see the same problem when I try to run a custom pipeline of my own (i.e. not WordCount), and also when I run the WordCount example on Linux.
I saved off a file where I recorded:
The complete output from the WordCount main class
The metadata field values set on the GCE instance
The complete serial console output
It is available here.
Things I've tried so far, without success:
Forcing the language level of the compiled classes to 1.7 (am using 1.8 JRE)
Modifying DataflowPipelineRunner::detectClassPathResourcesToStage to not emit JRE jar files (this is a difference I noticed in the log compared to Maven; when running under Maven the JRE jars are not staged).
EDIT: Attempting to set the classpath to EXACTLY the same as what Maven ends up using (removing all of our projects' dependencies). This seemed to change the behavior a bit and I got to a java.lang.ClassNotFoundException: com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn in the worker output.
Strongly suspicious that the problem lies with the staged classpath, but without more specific error messages, I'm shooting in the dark. Would appreciate ideas of where to look next or other things to try.

When running pipelines using [Blocking]DataflowPipelineRunner from the Cloud Dataflow Java SDK, the runner automatically copies everything from your local Java class path to a staging location in Google Cloud Storage, which is being accessed by workers on-demand.
ClassNotFoundException in the Cloud Dataflow worker environment is an indication that required dependencies for your pipeline are not properly staged in a Google Cloud Storage bucket. This likely root cause can be confirmed by looking at the contents of your staging bucket in Google Developers Console and the console output of BlockingDataflowPipelineRunner.
Now, the problem can be fixed by bundling all dependencies into a single, monolithic jar. In Maven, the following command can be used to create such a jar as long as the bundle plugin is properly configured to embed all transitive dependencies:
mvn bundle:bundle
Then, the bundled jar can be executed normally, such as:
java -cp <bundled jar> <main class> --project=<project> ...
Alternatively, the problem can be fixed by manually adding dependencies to your local class path. For example, the following command may be helpful when running an unbundled jar:
java -cp <unbundled jar>:<dep1>:<dep2>:...:<depN> <main class> --project=<project> ...
where dep1 to depN are all the dependencies needed for execution of the program. This is clearly error prone, and we don't endorse it. Our documentation recommends using mvn exec:java because that sets the execution class path automatically from the dependencies listed in the POM file. Specifically, to run WordCount example, use:
mvn exec:java -pl examples \
-Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
-Dexec.args="--project=<YOUR GCP PROJECT NAME> --stagingLocation=<YOUR GCS LOCATION> --runner=BlockingDataflowPipelineRunner"
The main difference between bundled and unbundled version is in the upload activity before pipeline submission. Unbundled version has an advantage that it can automatically use unchanged dependencies that may have been uploaded in previous submissions.
To summarize, use mvn exec:java when running an unbundled jar, or bundle the dependencies into a monolithic jar. We'll try to clarify this in the documentation.

There's a very high likelihood that this is an issue with staging dependencies.
There's a high probability if you create a bundled jar it will just work. You can create a bundled jar by running the command
mvn bundle:bundle
This will create a single jar that should pull in all dependencies transitively. You then just need to add that jar to your class path and Dataflow should automatically stage it; Thereby ensuring your code as well as any dependencies are available on the worker.
Most likely the job worked with mvn exec, because maven automatically generates a class path with all dependencies from the POM. When running manually, that doesn't happen. i.e if you invoke java directly e.g.
java -cp <JAR FILES> your.main.class --project=<YOUR PROJECT> ....
then you must add all dependencies to the class path so that they get staged. Creating a bundled jar as suggested above is usually the easiest way to do that.

My suggestion would be to look at the worker logs to see if we can find additional information about what's going on in the workers.
There are three ways to get this information. The first is via the Dataflow UI. Go to the Google Cloud Console and then select the Dataflow option in the left hand frame. You should see a list of your jobs. You can click on the job in question. This should show you a graph of your job. On the right side you should see a button "view logs". Please click that. You should then see a UI for navigating the logs and you can look for errors.
The second option is to look for the logs on GCS. The location to look for is:
gs://PATH TO YOUR STAGING DIRECTORY/logs/JOB-ID/VM-ID/LOG-FILE
You might see multiple log files. The one we are most interested in is the one that starts with "start_java_worker". If that log file doesn't exist then the worker didn't make enough progress to actually upload the file; or else there might have been a permission problem uploading the log file.
In that case the best thing to do is to try to ssh into one of the VMs before it gets torn down. You should have about 15 minutes before the job fails and the VMs are deleted.
Once you login to the VM you can find all the logs in
/var/log/dataflow/...
The log we care most about at this point is:
/var/log/dataflow/taskrunner/harness/start_java_worker-SOME ID.log
If there is a problem starting the code that runs on the VM that log should tell us. That log and the other logs should also tell us if there is a permission problem that prevents the code running on the worker from being able to access Dataflow.
Please take a look and let us know if you find anything.

Related

Custom Runtime Won't Use Dockerfile

I have an App Engine service I deploy a custom runtime in a flexible environment. Deployments functioned normally on 11/20. On 11/21 gcloud app deploy stopped using the Dockerfile and began treating it as a non-custom runtime. Neither the app.yaml nor the Dockerfile have changed.
Below is a sample log from 11/20 and 11/21 respectively. You will note Using Dockerfile found in... of the first log is not present in the second log.
First log, 11/20:
2020-11-20 11:12:02,202 DEBUG root Loaded Command Group: ['gcloud', 'app']
2020-11-20 11:12:02,547 DEBUG root Loaded Command Group: ['gcloud', 'app', 'deploy']
2020-11-20 11:12:02,551 DEBUG root Running [gcloud.app.deploy] with arguments: [--project: "distributed-computing-qa", --version: "9-2-0rc9"]
2020-11-20 11:12:02,621 INFO oauth2client.client Refreshing access_token
2020-11-20 11:12:03,043 DEBUG root Loading runtimes experiment config from [gs://runtime-builders/experiments.yaml]
2020-11-20 11:12:03,076 INFO root Reading [<googlecloudsdk.api_lib.storage.storage_util.ObjectReference object at 0x0000021920ECA548>]
2020-11-20 11:12:03,526 DEBUG root API endpoint: [https://appengine.googleapis.com/], API version: [v1]
2020-11-20 11:12:04,419 INFO ___FILE_ONLY___ Services to deploy:
2020-11-20 11:12:04,420 INFO ___FILE_ONLY___ descriptor: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci\app.yaml]
source: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]
target project: [distributed-computing-qa]
target service: [default]
target version: [9-2-0rc9]
target url: [https://distributed-computing-qa.uc.r.appspot.com]
2020-11-20 11:12:05,272 DEBUG root No bucket specified, retrieving default bucket.
2020-11-20 11:12:05,274 DEBUG root Using bucket [gs://staging.distributed-computing-qa.appspot.com].
2020-11-20 11:12:05,941 DEBUG root Service [appengineflex.googleapis.com] is already enabled for project [distributed-computing-qa]
2020-11-20 11:12:06,109 INFO ___FILE_ONLY___ Beginning deployment of service [default]...
2020-11-20 11:12:06,123 INFO root Ignoring directory [node_modules]: Directory matches ignore regex.
2020-11-20 11:12:09,085 INFO root Ignoring directory [server\node_modules]: Directory matches ignore regex.
2020-11-20 11:12:09,679 INFO root Using Dockerfile found in C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci
2020-11-20 11:12:09,679 INFO ___FILE_ONLY___ Building and pushing image for service [default]
2020-11-20 11:12:10,305 DEBUG root Could not call git with args ('config', '--get-regexp', 'remote\\.(.*)\\.url'): Command '['git', 'config', '--get-regexp', 'remote\\.(.*)\\.url']' returned non-zero exit status 1.
2020-11-20 11:12:10,305 INFO root Could not generate [source-context.json]: Could not list remote URLs from source directory: C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci
2020-11-20 11:12:37,592 INFO root Uploading [C:\Users\BENJAM~1\AppData\Local\Temp\tmpwbdhi28f\src.tgz] to [staging.distributed-computing-qa.appspot.com/us.gcr.io/distributed-computing-qa/appengine/default.9-2-0rc9:latest]
2020-11-20 11:13:03,413 DEBUG root Using builder image: [gcr.io/cloud-builders/docker]
Second log, 11/21:
2020-11-21 05:10:39,041 DEBUG root Loaded Command Group: ['gcloud', 'app']
2020-11-21 05:10:39,177 DEBUG root Loaded Command Group: ['gcloud', 'app', 'deploy']
2020-11-21 05:10:39,181 DEBUG root Running [gcloud.app.deploy] with arguments: [--project: "distributed-computing-qa", --version: "9-2-0rc10"]
2020-11-21 05:10:39,203 DEBUG root Loading runtimes experiment config from [gs://runtime-builders/experiments.yaml]
2020-11-21 05:10:39,231 INFO root Reading [<googlecloudsdk.api_lib.storage.storage_util.ObjectReference object at 0x000001E60B3ED208>]
2020-11-21 05:10:39,522 DEBUG root API endpoint: [https://appengine.googleapis.com/], API version: [v1]
2020-11-21 05:10:40,196 INFO ___FILE_ONLY___ Services to deploy:
2020-11-21 05:10:40,198 INFO ___FILE_ONLY___ descriptor: [C:\Users\Benjamin
Filkins\Documents\Projects\Deployment\QA\dci\app.yaml]
source: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]
target project: [distributed-computing-qa]
target service: [default]
target version: [9-2-0rc10]
target url: [https://distributed-computing-qa.uc.r.appspot.com]
2020-11-21 05:10:44,749 DEBUG root No bucket specified, retrieving default bucket.
2020-11-21 05:10:44,758 DEBUG root Using bucket [gs://staging.distributed-computing-qa.appspot.com].
2020-11-21 05:10:45,460 DEBUG root Service [appengineflex.googleapis.com] is already enabled for project [distributed-computing-qa]
2020-11-21 05:10:45,645 INFO ___FILE_ONLY___ Beginning deployment of service [default]...
2020-11-21 05:10:45,658 INFO root Ignoring directory [node_modules]: Directory matches ignore regex.
2020-11-21 05:10:48,255 INFO root Ignoring directory [server\node_modules]: Directory matches ignore regex.
2020-11-21 05:10:57,261 DEBUG root Could not call git with args ('config', '--get-regexp', 'remote\\.(.*)\\.url'): Command '['git', 'config', '--get-regexp', 'remote\\.(.*)\\.url']' returned non-zero exit status 1.
2020-11-21 05:10:57,261 INFO root Could not find any remote repositories associated with [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]. Cloud diagnostic tools may not be able to display the correct source code for this deployment.
2020-11-21 05:11:19,099 DEBUG root Skipping upload of [.env]
2020-11-21 05:11:19,099 INFO root Incremental upload skipped 100.0% of data
There are four separate projects this is now occurring on. A co-worker can also confirm the same behavior. What I have tried and can confirm:
Updated Google Cloud SDK to latest version (319.0.0)
Confirmed Cloud Build API is active
Confirmed the Cloud Build service account has the App Engine Admin, Cloud Build Service Account and Service Account User roles
App.yaml and Dockerfile present in root and unchanged between attempts
App.yaml contains runtime: custom and env: flex
What I cannot confirm with certainty or prove did not have an impact:
Changes in OS (Windows 10), though no update had occurred during this time period
Changes in my GCP service account roles/permissions, though given the spread across four distinct projects and impacting multiple users seems incredibly unlikely
Any additional insight into this issue or additional items I may have missed would be greatly appreciated.
I have solved the issue by downgrading to SDK version 271.0.0. My machine has both Python 2.7 and 3 and I noted 274 and above began support for using Python 3.
Upgrading to 274 or above results in the reported issue. 273 and below (I only went as far as 267) does not have the reported issues. While I am currently unable to provide concrete evidence, my suspicion would be down to the SDK's ability to determine which version of Python to prefer. As noted here support of Python 2 was deprecated on 09/30/2020.

pod with app is getting evicted from gke as out of memory even if there is only 1/3rd memory used

I am newbie to gke.
I have python app running inside a gke pod. Pod gets evicted as out of memory after 30minutes. Total vm memory is 13GB, and as i ssh into the pod, the peak used memory before eviction is only about 3GB...
I have tried running some dummy code as defined in Dockerfile "CMD tail -f /dev/null", then connect to the pod and running scraper script manually, with success - being able to finish with peak mem usage of 9 GB.
docker file:
CMD python3 scraper.py
> Managed pods Revision Name Status Restarts Created on 1
> scraper-df68b65bf-gbhms Running 0 Sep 2, 2019, 2:59:59 PM 1
> scraper-df68b65bf-gktqw Running 0 Sep 2, 2019, 2:59:59 PM 1
> scraper-df68b65bf-z4kjb Running 0 Sep 2, 2019, 2:59:59 PM 1
> scraper-df68b65bf-wk6td Running 0 Sep 2, 2019, 3:00:45 PM 1
> scraper-df68b65bf-xqm7h Running 0 Sep 2, 2019, 3:00:45 PM
My guess is there are many instances of my app running inside of space of 13 GB in many parallel pods? How do I run single instance of my app on gke so I have all memory from vm available to it?
Do you have replica count set to one in your deployment.yaml file?
spec:
replicas: 1
In case it is HorizontalPodAutoscaler you can edit it by:
Get the HorizontalPodAutoscaler
kubectl get HorizontalPodAutoscaler
Edit it by using the edit command
kubectl edit HorizontalPodAutoscaler <pod scaler name>
And the end result of HorizontalPodAutoscaler looks like this
spec:
maxReplicas: 1
minReplicas: 1
Awesome reply #Bismal.
#Wotjas, just to add my 2 cents; you can use the Cloud Console to set the min and max values, you just need to go to:
Cloud Menu -> GKE -> Workloads -> Actions -> Scale
Set the desired values, then save.
More detailed information can be found in this document [1].
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/scaling-apps

ChromeHeadless giving timeout when running GitLab CI pipeline with Docker Centos 7.5 image

So I am trying to run Karma tests for an Angular 6 application on a docker image with Centos 7.5 using a pipeline for GitLab CI.
The problem is
30 08 2018 07:09:55.222:WARN [launcher]: ChromeHeadless have not
captured in 60000 ms, killing.
30 08 2018 07:09:55.244:INFO [launcher]: Trying to start ChromeHeadless again (1/2).
30 08 2018 07:10:55.264:WARN [launcher]: ChromeHeadless have not captured in 60000 ms, killing.
30 08 2018 07:10:55.277:INFO [launcher]: Trying to start ChromeHeadless again (2/2).
30 08 2018 07:11:55.339:WARN [launcher]: ChromeHeadless have not captured in 60000 ms, killing.
30 08 2018 07:11:55.355:ERROR [launcher]: ChromeHeadless failed 2 times (timeout). Giving up.
ERROR: Job failed: exit code 1
I run the tests with ng test --browsers ChromeHeadlessNoSandbox --watch=false --code-coverage
Karma conf :
browsers: ['Chrome', 'ChromeHeadlessNoSandbox'],
customLaunchers: {
ChromeHeadlessNoSandbox: {
base: 'ChromeHeadless',
flags: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-gpu',
'--remote-debugging-port=9222',
],
},
},
Also on the Image the docker file I install the latest chrome stable:
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
RUN yum -y localinstall google-chrome-stable_current_x86_64.rpm && yum clean all
Do you have any idea about why its giving timeout? In the local environment, it runs perfectly.
I have resolved the same issue. My test suits runs perfectly in my local machine but when running these tests in a docker container, it fails due to connection timeout. (i'm using Gitlab runner also, and My docker image is based on circleci/node:8.9.2-browsers)
After investigating this issue, i found that the chrome bin variable path was missed in the docker file. so i fixed the issue by adding:
export CHROME_BIN=/usr/bin/google-chrome
to my .gitlab-ci.yml in before_script
# TESTING
unit_test_client:
stage: test
before_script:
- export CHROME_BIN=/usr/bin/google-chrome
script:
- npm run test:client
You can also fix your issue by setting the CHROME_BIN by
process.env.CHROME_BIN='/usr/bin/google-chrome' in the top of your karma config file.
In this case, you need to handle the case that you are running the test in your local machine, probably it should match the same chrome path
We had the same issue and resolved it by adding the following flag in the karma.config.js
headlessChrome: {
base: "ChromeHeadless",
flags: [
"--no-sandbox",
"--no-proxy-server",
"--disable-web-security",
"--disable-gpu",
"--js-flags=--max-old-space-size=8196", // THIS LINE FIXED IT!!!
],
You might want to check the console output before Karma attempts to launch the browser. I got stuck for hours on constant timeouts when there were invalid import paths in my Angular application. Karma prints such errors in the console but continues by launching the browsers as if the errors had no importance. This is a bit misleading, but worth checking before blaming the browsers.
Second, machine performance: Every once in a while you might get a timeout at the first launch attempt, but the next attempt will then liekly succeed. Dockerization might decrease your performance, but not much. A headless Chrome should run fast even with minimal setup.
If your browser would not be installed or addressable, then there should be some output regarding that as well.
Hi i solved this issue this way:
In my case the client had a proxy-blocker to manage the networking configurations. So i provided the proxy as server in the customLauncher flag and works perfectly, but only in the pipeline, locally the tests stopped. but it's just take off the proxy that runs locally.
Before: This way the tests runs locally, but do not works in the jenkins pipeline (npm run test)
browsers: ['MyChromeHeadless'],
customLaunchers: {
MyChromeHeadless: {
base: 'ChromeHeadless',
flags: [
'--no-sandbox',
'--proxy-auto-detect'
]
}
}
After: This way the tests runs in the pipeline, but do not works locally cuz i'm not under the client networking with access to the proxy provied, if you are, should work.
browsers: ['MyChromeHeadless'],
customLaunchers: {
MyChromeHeadless: {
base: 'ChromeHeadless',
flags: [
'--no-sandbox',
'--proxy-bypass-list=*',
'--proxy-server=http://proxy.your.company'
]
}
}

CouchDB 1.3.1 on Centos 6.4

I compiled CouchDB and installed. It seems to work great except when I use views on the database, then it just spins the wheel and nothing happens and the cpu load spikes to 100% and slowly it eats away all memory and starts to swap a lot which in return increases the cpu load.
I have tried both with the js-1.70-12 that comes with centos 6.4, as well as build and install my own js-1.85-1. All erlang packages are installed from epel :
erlang-crypto-R14B-04.2.el6.x86_64
erlang-syntax_tools-R14B-04.2.el6.x86_64
erlang-mnesia-R14B-04.2.el6.x86_64
erlang-ssl-R14B-04.2.el6.x86_64
erlang-cosProperty-R14B-04.2.el6.x86_64
erlang-asn1-R14B-04.2.el6.x86_64
erlang-cosEventDomain-R14B-04.2.el6.x86_64
erlang-eunit-R14B-04.2.el6.x86_64
erlang-erl_docgen-R14B-04.2.el6.x86_64
erlang-toolbar-R14B-04.2.el6.x86_64
erlang-debugger-R14B-04.2.el6.x86_64
erlang-tools-R14B-04.2.el6.x86_64
erlang-typer-R14B-04.2.el6.x86_64
erlang-megaco-R14B-04.2.el6.x86_64
erlang-oauth-1.1.1-1.el6.x86_64
erlang-stdlib-R14B-04.2.el6.x86_64
erlang-hipe-R14B-04.2.el6.x86_64
erlang-kernel-R14B-04.2.el6.x86_64
erlang-runtime_tools-R14B-04.2.el6.x86_64
erlang-snmp-R14B-04.2.el6.x86_64
erlang-public_key-R14B-04.2.el6.x86_64
erlang-inets-R14B-04.2.el6.x86_64
erlang-ibrowse-2.2.0-4.el6.x86_64
erlang-cosEvent-R14B-04.2.el6.x86_64
erlang-cosNotification-R14B-04.2.el6.x86_64
erlang-edoc-R14B-04.2.el6.x86_64
erlang-otp_mibs-R14B-04.2.el6.x86_64
erlang-cosFileTransfer-R14B-04.2.el6.x86_64
erlang-cosTransactions-R14B-04.2.el6.x86_64
erlang-inviso-R14B-04.2.el6.x86_64
erlang-jinterface-R14B-04.2.el6.x86_64
erlang-erl_interface-R14B-04.2.el6.x86_64
erlang-diameter-R14B-04.2.el6.x86_64
erlang-gs-R14B-04.2.el6.x86_64
erlang-tv-R14B-04.2.el6.x86_64
erlang-appmon-R14B-04.2.el6.x86_64
erlang-odbc-R14B-04.2.el6.x86_64
erlang-wx-R14B-04.2.el6.x86_64
erlang-et-R14B-04.2.el6.x86_64
erlang-observer-R14B-04.2.el6.x86_64
erlang-sasl-R14B-04.2.el6.x86_64
erlang-dialyzer-R14B-04.2.el6.x86_64
erlang-common_test-R14B-04.2.el6.x86_64
erlang-os_mon-R14B-04.2.el6.x86_64
erlang-examples-R14B-04.2.el6.x86_64
erlang-compiler-R14B-04.2.el6.x86_64
erlang-erts-R14B-04.2.el6.x86_64
erlang-xmerl-R14B-04.2.el6.x86_64
erlang-orber-R14B-04.2.el6.x86_64
erlang-cosTime-R14B-04.2.el6.x86_64
erlang-ssh-R14B-04.2.el6.x86_64
erlang-docbuilder-R14B-04.2.el6.x86_64
erlang-percept-R14B-04.2.el6.x86_64
erlang-parsetools-R14B-04.2.el6.x86_64
erlang-ic-R14B-04.2.el6.x86_64
erlang-pman-R14B-04.2.el6.x86_64
erlang-webtool-R14B-04.2.el6.x86_64
erlang-test_server-R14B-04.2.el6.x86_64
erlang-reltool-R14B-04.2.el6.x86_64
erlang-R14B-04.2.el6.x86_64
erlang-mochiweb-1.4.1-5.el6.x86_64
Every thing configures and makes and installs as expected. You can dump data into the database, you can create documents and all that. But I can not run any view, temporary or not.
The only error I see in the logs is like this one, and it is a lot of these errors :
[Sun, 18 Aug 2013 23:10:38 GMT] [error] [<0.124.0>] {error_report,<0.30.0>,
{<0.124.0>,crash_report,
[[{initial_call,
{mochiweb_socket_server,init,['Argument__1']}},
{pid,<0.124.0>},
{registered_name,[]},
{error_info,
{exit,eaddrinuse,
[{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}},
{ancestors,
[couch_secondary_services,couch_server_sup,
<0.31.0>]},
{messages,[]},
{links,[<0.93.0>]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,987},
{stack_size,24},
{reductions,459}],
[]]}}
But I have no idea what they mean.
Do I need to compile and install erlang as well ? All the above packages or just erlang ?
Your compilation and installation looks fine. At least your error (note: eaddrinuse in traceback) is about that there is some process that listens same address and port as your CouchDB try to. Check other listening processes with netstat -anp command or change CouchDB's listen port to different.

hadoop only launch local job by default why?

I have written my own hadoop program and I can run using pseudo distribute mode in my own laptop, however, when I put the program in the cluster which can run example jar of hadoop, it by default launches the local job though I indicate the hdfs file path, below is the output, give suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen as your configuration most probably has the mapred.job.tracker property set to local or has not been set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not, a file called mapped-site.xml.template was there). In that file (or create it if it doesn't exist) make sure it has the following property:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.
Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have setup mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml
Checkout the Cloudera Hadoop 2 operator migration blog for more information.
I had the same problem that every mapreduce v2 (mrv2) or yarn task only ran with the mapred.LocalJobRunner
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The Resourcemanager and Nodemanagers were accessible and the mapreduce.framework.name was set to yarn.
Setting the HADOOP_MAPRED_HOME before executing the job fixed the problem for me.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
cheers
dan

Resources