I'm having a problem when trying to sync beta => alpha using mutagen on OSX
When a file is created on beta (Docker container) it won't get synchronized into alpha (host) resulting in error:
public/asd.txt: unable to create file: unable to set staged file permissions: unable to set ownership information: chown /Users/xxx/.mutagen/staging/sync_ORDrmPtp9gAdxdF0NC6EkEp3Eqaf9MMUgaXGn2CJx8H-alpha/da/3a404e69fec4666cf122d6588790e042a663d9d5_da39a3ee5e6b4b0d3255bfef95601890afd80709: operation not permitted
My mutagen.yml file says:
defaults:
permissions:
defaultOwner: "id:10"
defaultGroup: "id:3"
ignore:
paths:
- .DS_Store
code:
alpha: "./"
beta: "docker://php/app"
mode: "two-way-resolved"
permissions:
defaultFileMode: 666
defaultDirectoryMode: 777
ignore:
vcs: true
paths:
- "/vendor/"
Directory /Users/xxx/.mutagen/staging is owned by same user starting mutagen project start
Also tried starting mutagen sync as superuser with same result.
Any answers appreciated!
The problem is that mutagen tries to apply the defaultOwner on the host system also, but chown is not permitted on Mac (see this question for further information).
The solution is to set the defaultOwner and defaultGroup for the beta system only like this:
code:
alpha: "./"
beta: "docker://php/app"
mode: "two-way-resolved"
ignore:
vcs: true
paths:
- "/vendor/"
configurationBeta:
permissions:
defaultOwner: "id:10"
defaultGroup: "id:3"
This way the owner and group settings remain "default" for the host system, which is the user running mutagen.
I found it really helpful to confirm my settings for a running sync with mutagen sync monitor -l. This helped me to resolve this issue.
If I run sitespeed within a docker and obtain the following output:
Google Chrome 63.0.3239.84
Mozilla Firefox 54.0.1
[2017-12-27 18:10:01] INFO: Versions OS: linux 4.9.49-moby nodejs: v8.9.1 sitespeed.io: 6.2.2 browsertime: 2.1.2 coach: 1.1.1
[2017-12-27 18:10:02] INFO: Starting chrome for analysing https://www.google.com/ 3 time(s)
[2017-12-27 18:10:02] INFO: Testing url https://www.google.com/ run 1
[2017-12-27 18:10:18] INFO: Testing url https://www.google.com/ run 2
[2017-12-27 18:10:29] INFO: Testing url https://www.google.com/ run 3
[2017-12-27 18:10:40] INFO: 18 requests, 584.40 kb, backEndTime: 158ms (±6.42ms), firstPaint: 321ms (±3.32ms), firstVisualChange: 389ms (±7.78ms), DOMContentLoaded: 376ms (±3.63ms), Load: 529ms (±91.22ms), speedIndex: 477 (±9.23), visualComplete85: 422ms (±7.90ms), lastVisualChange: 2.65s (±137.82ms), rumSpeedIndex: 321 (±3.32) (3 runs)
[2017-12-27 18:10:43] INFO: HTML stored in /sitespeed.io/reports
[2017-12-27 18:10:43] INFO: Finished analysing https://www.google.com/
Where are the HTML logs stored? '/sitespeed.io/reports', I'm not sure where to go to access this.
The example from the Docker hub page says:
docker run --shm-size=1g --rm -v "$(pwd)":/sitespeed.io sitespeedio/sitespeed.io http://www.sitespeed.io/ -b chrome
the --rm part of the command means that the container is removed after it finishes. So you will not be able to get "inside" the container and access the results, but...
the -v "$(pwd)":/sitespeed.io part means that
-v: a volume is created
"$(pwd)": at your working directory
pointing to the /sitespeed.io folder of the container
Practically this means that if you run the above command at a directory named /my-docker-tests, (despite of the fact that the container is removed) you will be able to see the result files on your host's file system at /my-docker-tests/sitespeed-result/
I'm trying to add local files via the Beeline client, however I keep running into an issue where it tells me the file does not exist.
[test#test-001 tmp]$ touch /tmp/m.py
[test#test-001 tmp]$ stat /tmp/m.py
File: ‘/tmp/m.py’
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 801h/2049d Inode: 34091464 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1036/ test) Gid: ( 1037/ test)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2017-02-27 22:04:06.527970709 +0000
Modify: 2017-02-27 22:04:06.527970709 +0000
Change: 2017-02-27 22:04:06.527970709 +0000
Birth: -
[test#test-001 tmp]$ beeline -u jdbc:hive2://hs2-test:10000/default -n r-zubis
Connecting to jdbc:hive2://hs2-test:10000/default
Connected to: Apache Hive (version 1.2.1.2.3.0.0-2557)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1 by Apache Hive
0: jdbc:hive2://hs2-test:10000/def> ADD FILE '/tmp/m.py';
Error: Error while processing statement: '/tmp/m.py' does not exist (state=,code=1)
0: jdbc:hive2://hs2-test:10000/def>
What's the issue?
You can only add files on the box HiveServer2 is running on. (and I needed to remove the quotes) I found it via a blog comment on Cloudera. Not sure why this isn't in the Beeline docs.
If, like me you are stuck in the position where HiveServer2 is running remotely, beeline will let you load the files from HDFS,
hdfs fs -put /tmp/m.py
then
beeline> add file hdfs:/user/homedir/m.py;
I am having issues deploying an app of mine to Google AppEngine. It is currently live on Heroku with no issues, so I dont think the issue is with the app. I emailed back and forth with Google support for a while, and one of the things they had me try was to deploy one of their starter apps. For simplicity sake, I am asking this question from the point of view that I am simply trying to deploy their test application, not my real app.
I have downloaded the hello world app from Github, per Google's instructions.
When I go to deploy, it sticks on a certain step and will not go any further.
I have done a fresh install of the google cloud tools and have made sure to run gcloud init.
I downloaded the app from that URL and am using the included app.yaml
Here is my entire CLI output. It will sit at the same status for HOURS.
Thanks for the help.
--Tim
Timothys-MBP:2-hello-world tgwright$ gcloud auth login
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?redirect_uri=xxxxxx
Saved Application Default Credentials.
You are now logged in as [timothy.xxxxx#gmail.com].
Your current project is [xxx-gae-02]. You can change this setting by running:
$ gcloud config set project PROJECT_ID
Timothys-MBP:2-hello-world tgwright$ gcloud init
Welcome! This command will take you through the configuration of gcloud.
Settings from your current configuration [default] are:
Your active configuration is: [default]
[app]
suppress_change_warning = true
[core]
account = timothy.xxxxx#gmail.com
disable_usage_reporting = False
project = xxx-gae-02
Pick configuration to use:
[1] Re-initialize this configuration [default] with new settings
[2] Create a new configuration
Please enter your numeric choice: 1
Your current configuration has been set to: [default]
Pick credentials to use:
[1] timothy.xxxxx#gmail.com
[2] Log in with new credentials
Please enter your numeric choice: 1
You are now logged in as: [timothy.g.wright#gmail.com]
Pick cloud project to use:
[1] [xxx-gae-01]
[2] [xxxx-cloud01]
Please enter your numeric choice: 1
Your current project has been set to: [xxx-gae-01].
Which compute zone would you like to use as project default?
[1] [asia-east1-b]
[2] [asia-east1-a]
[3] [asia-east1-c]
[4] [europe-west1-c]
[5] [europe-west1-d]
[6] [europe-west1-b]
[7] [us-central1-c]
[8] [us-central1-b]
[9] [us-central1-f]
[10] [us-central1-a]
[11] [us-east1-d]
[12] [us-east1-b]
[13] [us-east1-c]
[14] Do not set default zone
Please enter your numeric choice: 9
Your project default compute zone has been set to [us-central1-f].
You can change it by running [gcloud config set compute/zone NAME].
Your project default compute region has been set to [us-central1].
You can change it by running [gcloud config set compute/region NAME].
Do you want to use Google's source hosting (see
"https://cloud.google.com/source-repositories/docs/") (Y/n)? n
gcloud has now been configured!
You can use [gcloud config] to change more gcloud settings.
Your active configuration is: [default]
[app]
suppress_change_warning = true
[compute]
region = us-central1
zone = us-central1-f
[core]
account = timothy.xxxx#gmail.com
disable_usage_reporting = False
project = xxx-gae-01
Timothys-MBP:2-hello-world tgwright$ gcloud auth list
Credentialed accounts:
- timothy.xxx#gmail.com (active)
To set the active account, run:
$ gcloud config set account ``ACCOUNT''
Timothys-MBP:2-hello-world tgwright$ gcloud config list
Your active configuration is: [default]
[app]
suppress_change_warning = true
[compute]
region = us-central1
zone = us-central1-f
[core]
account = timothy.xxxx#gmail.com
disable_usage_reporting = False
project = xxx-gae-01
Timothys-MBP:2-hello-world tgwright$ gcloud preview app deploy --verbosity debug
DEBUG: Running gcloud.preview.app.deploy with Namespace(__calliope_internal_deepest_parser=ArgumentParser(prog='gcloud.preview.app.deploy', usage=None, description="*(BETA)* This command is used to deploy both code and configuration to the App Engine\nserver. As an input it takes one or more ``DEPLOYABLES'' that should be\nuploaded. A ``DEPLOYABLE'' can be a module's .yaml file or a configuration's\n.yaml file.", version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False), account=None, authority_selector=None, authorization_token_file=None, bucket=None, cmd_func=<bound method Command.Run of <googlecloudsdk.calliope.backend.Command object at 0x10e936590>>, command_path=['gcloud', 'preview', 'app', 'deploy'], configuration=None, credential_file_override=None, deployables=[], docker_build=None, document=None, force=False, format=None, h=None, help=None, http_timeout=None, ignore_bad_certs=False, image_url=None, log_http=None, project=None, promote=None, quiet=None, repo_info_file=None, server=None, stop_previous_version=None, trace_email=None, trace_log=False, trace_token=None, user_output_enabled=None, verbosity='debug', version=None).
DEBUG: API endpoint: [https://appengine.googleapis.com/], API version: [v1beta4]
You are about to deploy the following modules:
- xxx-gae-01/default (from [/Users/tgwright/Google Drive/Sites/2-hello-world/app.yaml])
Deployed URL: [https://xxx-gae-01.appspot.com]
Do you want to continue (Y/n)? y
Beginning deployment...
DEBUG: Host: appengine.google.com
DEBUG: _Authenticate configuring auth; needs_auth=False
DEBUG: Sending request to https://appengine.google.com/api/vms/prepare?app_id=ebt-gae-01 headers={'X-appcfg-api-version': '1', 'content-length': '0', 'Content-Type': 'application/octet-stream'} body=
INFO: Attempting refresh to obtain initial access_token
INFO: Refreshing access_token
If this is your first deployment, this may take a while...\DEBUG: Got response: {bucket: vm-containers.xxx-gae-01.appspot.com, path: /containers}
If this is your first deployment, this may take a while...done.
INFO: Not checking for [go] because runtime is [ruby]
I am able to successfully run the WordCount example using DataflowPipelineRunner with the maven exec:java command shown in the docs.
However, when I attempt to run it in my own 1.8 VM, it doesn't work. I am using these args (on Windows):
--project=highfive-metrics-service \
--stagingLocation=gs://highfive-dataflow-test/staging \
--runner=BlockingDataflowPipelineRunner \
--gCloudPath=C:/Progra~1/Google/CloudS~1/google-cloud-sdk/bin/gcloud.cmd
I get the following error:
2014-12-24T04:53:34.849Z: (5eada047929dcead): Workflow failed. Causes: (5eada047929dce2e): There was a problem creating the GCE VMs or starting Dataflow on the VMs so no data was processed. Possible causes:
1. A failure in user code on in the worker.
2. A failure in the Dataflow code.
Next Steps:
1. Check the GCE serial console for possible errors in the logs.
2. Look for similar issues on http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
Prior to the subsequent cleanup, I observed three harness instances on GCE as expected. Looking at the serial console for the first one, wordcount-jroy-1224043800-12232038-8cfa-harness-0, I see "normal" (comparing to what I see when running with Maven) looking output that ends with:
Dec 24 04:38:45 [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.438005] IPv6: ADDRCONF(NETDEV_CHANGE): veth30b3796: link becomes ready
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.439395] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.440262] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 12898 100 12898 0 0 2009k 0 --:--:-- --:--:-- --:--:-- 3148k
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: {"attributes":{"config":"{\"alsologtostderr\":true,\"base_task_dir\":\"/tmp/tasks/\",\"commandlines_file_name\":\"commandlines.txt\",\"continue_on_exception\":true,\"dataflow_api_endpoint\":\"https://www.googleapis.com/\",\"dataflow_api_version\":\"v1beta1\",\"log_dir\":\"/dataflow/logs/taskrunner/harness\",\"log_to_gcs\":true,\"log_to_serialconsole\":true,\"parallel_worker_flags\":{\"job_id\":\"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588\",\"project_id\":\"highfive-metrics-service\",\"reporting_enabled\":true,\"root_url\":\"https://www.googleapis.com/\",\"service_path\":\"dataflow/v1b3/projects/\",\"temp_gcs_directory\":\"gs://highfive-dataflow-test/staging\",\"worker_id\":\"wordcount-jroy-1224043800-12232038-8cfa-harness-0\"},\"project_id\":\"highfive-metrics-service\",\"python_harness_cmd\":\"python_harness_main\",\"scopes\":[\"https://www.googleapis.com/auth/devstorage.full_control\",\"https://www.googleapis.com/auth/cloud-platform\"],\"task_group\":\"nogroup\",\"task_user\":\"nobody\",\"temp_g
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 goo[ 16.494163] device veth29b6136 entered promiscuous mode
gle: cs_directory\":\"gs://highfive-dataflow-test/staging\",\"vm_id\":\"wordcoun[ 16.505311] IPv6: ADDRCONF(NETDEV_UP): veth29b6136: link is not ready
[ 16.507623] docker0: port 2(veth29b6136) entered forwarding state
t-jroy-122404380[ 16.507633] docker0: port 2(veth29b6136) entered forwarding state
0-12232038-8cfa-harness-0\"}","google-container-manifest":"\ncontainers:\n-\n env:\n -\n name: GCS_BUCKET\n value: dataflow-docker-images\n image: google/docker-registry\n imagePullPolicy: PullNever\n name: repository\n ports:\n -\n containerPort: 5000\n hostPort: 5000\n name: registry\n-\n image: localhost:5000/dataflow/taskrunner:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: taskrunner\n volumeMounts:\n -\n mountPath: /dataflow/logs/taskrunner/harness\n name: dataflowlogs-harness\n-\n env:\n -\n name: LOG_DIR\n value: /dataflow/logs\n image: localhost:5000/dataflow/shuffle:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: shuffle\n ports:\n -\n containerPort: 12345\n hostPort: 12345\n name: shuffle1\n -\n containerPort: 22349\n hostPort: 22349\n name: shuffle2\n volumeMounts:\n -\n mountPath: /var/shuffle\n name: dataflow-shuffle\n -\n mountPath: /dataflow/logs\n name: dataflow-logs\nversion: v1
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: beta2\nvolumes:\n-\n name: dataflowlogs-harness\n source:\n hostDir:\n path: /var/log/dataflow/taskrunner/harness\n-\n name: dataflow-shuffle\n source:\n hostDir:\n path: /dataflow/shuffle\n-\n name: dataflow-logs\n source:\n hostDir:\n path: /var/log/dataflow/shuffle\n","job_id":"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588","packages":"gs://dataflow-releases-prod/worker_packages/NOTICES.shuffle|NOTICES.shuffler|gs://highfive-dataflow-test/staging/access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|gs://highfive-dataflow-test/staging/avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|gs://highfive-dataflow-test/staging/charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|gs://highfive-dataflow-test/staging/cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|gs://highfive-dataflow-test/staging/commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|gs://highfive-dataflow-test/staging/commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|gs://highfive-dataflow-test/staging/dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|gs://highfive-dataflow-test/staging/deploy-eLnif2izXW_mrleXudK0Eg.jar|deploy-eLnif2izXW_mrleXudK0Eg.jar|gs://highfive-dataflow-test/staging/dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|gs://highfive-dataflow-test/staging/google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|gs://highfive-dataflow-test/staging/google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|gs://highfive-dataflow-test/staging/google-api-services-compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|google-api-services
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|gs://highfive-dataflow-test/staging/google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|gs://highfive-dataflow-test/staging/google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|gs://highfive-dataflow-test/staging/google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|gs://highfive-dataflow-test/staging/google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|gs://highfive-dataflow-test/staging/google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|gs://highfive-dataflow-test/staging/google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|gs://highfive-dataflow-test/st
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: aging/google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|gs://highfive-dataflow-test/staging/google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|gs://highfive-dataflow-test/staging/guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|gs://highfive-dataflow-test/staging/hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|gs://highfive-dataflow-test/staging/hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|gs://highfive-dataflow-test/staging/httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|gs://highfive-dataflow-test/staging/httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|gs://highfive-dataflow-test/staging/idea_rt-6II88e1BKUeCOQqcrZht-w.jar|idea_rt-6II88e1BKUeCOQqcrZht-w.jar|gs://highfive-dataflow-test/staging/jacce
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: ss-laKenN34W6jKKivkBUzVcA.jar|jaccess-laKenN34W6jKKivkBUzVcA.jar|gs://highfive-dataflow-test/staging/jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|gs://highfive-dataflow-test/staging/jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|gs://highfive-dataflow-test/staging/jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|gs://highfive-dataflow-test/staging/jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|gs://highfive-dataflow-test/staging/jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|gs://highfive-dataflow-test/staging/javaws-O8JqID6BpsXsCSRRkhii3w.jar|javaws-O8JqID6BpsXsCSRRkhii3w.jar|gs://highfive-dataflow-test/staging/jce-eMjjWzdqQh30yNZ9HMuXMA.jar|jce-eMjjWzdqQh30yNZ9HMuXMA.jar|gs://highfive-dataflow-test/staging/jfr-xDzacRGMQeIR4SdPe69o1A.jar|jfr
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -xDzacRGMQeIR4SdPe69o1A.jar|gs://highfive-dataflow-test/staging/jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|gs://highfive-dataflow-test/staging/jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|gs://highfive-dataflow-test/staging/joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|gs://highfive-dataflow-test/staging/jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|gs://highfive-dataflow-test/staging/jsse-HOItnWzBlT4hG5HPmlF56w.jar|jsse-HOItnWzBlT4hG5HPmlF56w.jar|gs://highfive-dataflow-test/staging/junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|gs://highfive-dataflow-test/staging/localedata-R9ei3T8qar8cibFNN0X7Qg.jar|localedata-R9ei3T8qar8cibFNN0X7Qg.jar|gs://highfive-dataflow-test/staging/management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|gs://highfive-dataflow-test/staging/mockito-all-1.9.5-_T4jPTp05rc7PhcOO34Saw.jar|mockito-all-1.9.5-_T4jPTp0
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: 5rc7PhcOO34Saw.jar|gs://highfive-dataflow-test/staging/nashorn-x8si6abt-U04QaVUHvl_bg.jar|nashorn-x8si6abt-U04QaVUHvl_bg.jar|gs://highfive-dataflow-test/staging/paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|gs://highfive-dataflow-test/staging/plugin-TG6U30mOzKi8yMGKYd7ong.jar|plugin-TG6U30mOzKi8yMGKYd7ong.jar|gs://highfive-dataflow-test/staging/protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|gs://highfive-dataflow-test/staging/resources-RavNZwakZf55HEtrC9KyCw.jar|resources-RavNZwakZf55HEtrC9KyCw.jar|gs://highfive-dataflow-test/staging/rt-Z2kDZdIt-eG8CCtFIinW1g.jar|rt-Z2kDZdIt-eG8CCtFIinW1g.jar|gs://highfive-dataflow-test/staging/slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|gs://highfive-dataflow-test/staging/slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|gs://highfive-dataflow-test/staging/snappy-java-1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|snappy-java
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|gs://highfive-dataflow-test/staging/sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|gs://highfive-dataflow-test/staging/sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|gs://highfive-dataflow-test/staging/sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|gs://highfive-dataflow-test/staging/sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|gs://highfive-dataflow-test/staging/xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|gs://highfive-dataflow-test/staging/zipfs-SIKQJJIhpGOgSa4tT6nStA.jar|zipfs-SIKQJJIhpGOgSa4tT6nStA.jar"},"description":"GCE Instance created for Dataflow","disks":[{"deviceName":"persistent-disk-0","index":0,"mode":"READ_WRITE","type":"PERSISTENT"}],"hostname":"wordcount-jroy-1224043800-12232038-8cfa-harness-0.c.highfive-metrics-service.internal","id":8960015560553137779,"image":"","machineType":"projects/537312487774/machineTypes/n1-stan
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: dard-4","maintenanceEvent":"NONE","networkInterfaces":[{"accessConfigs":[{"externalIp":"130.211.184.44","type":"ONE_TO_ONE_NAT"}],"forwardedIps":[],"ip":"10.240.173.213","network":"projects/537312487774/networks/default"}],"scheduling":{"automaticRestart":"TRUE","onHostMaintenance":"MIGRATE"},"serviceAccounts":{"537312487774#developer.gserviceaccount.com":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.googleapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]},"default":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.goog
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: leapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]}},"tags":["dataflow"],"zone":"projects/537312487774/zones/us-central1-a"}
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: No startup script found in metadata.
Not sure what I should be looking for, but this seems to reliably fail for me in this manner. I see the same problem when I try to run a custom pipeline of my own (i.e. not WordCount), and also when I run the WordCount example on Linux.
I saved off a file where I recorded:
The complete output from the WordCount main class
The metadata field values set on the GCE instance
The complete serial console output
It is available here.
Things I've tried so far, without success:
Forcing the language level of the compiled classes to 1.7 (am using 1.8 JRE)
Modifying DataflowPipelineRunner::detectClassPathResourcesToStage to not emit JRE jar files (this is a difference I noticed in the log compared to Maven; when running under Maven the JRE jars are not staged).
EDIT: Attempting to set the classpath to EXACTLY the same as what Maven ends up using (removing all of our projects' dependencies). This seemed to change the behavior a bit and I got to a java.lang.ClassNotFoundException: com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn in the worker output.
Strongly suspicious that the problem lies with the staged classpath, but without more specific error messages, I'm shooting in the dark. Would appreciate ideas of where to look next or other things to try.
When running pipelines using [Blocking]DataflowPipelineRunner from the Cloud Dataflow Java SDK, the runner automatically copies everything from your local Java class path to a staging location in Google Cloud Storage, which is being accessed by workers on-demand.
ClassNotFoundException in the Cloud Dataflow worker environment is an indication that required dependencies for your pipeline are not properly staged in a Google Cloud Storage bucket. This likely root cause can be confirmed by looking at the contents of your staging bucket in Google Developers Console and the console output of BlockingDataflowPipelineRunner.
Now, the problem can be fixed by bundling all dependencies into a single, monolithic jar. In Maven, the following command can be used to create such a jar as long as the bundle plugin is properly configured to embed all transitive dependencies:
mvn bundle:bundle
Then, the bundled jar can be executed normally, such as:
java -cp <bundled jar> <main class> --project=<project> ...
Alternatively, the problem can be fixed by manually adding dependencies to your local class path. For example, the following command may be helpful when running an unbundled jar:
java -cp <unbundled jar>:<dep1>:<dep2>:...:<depN> <main class> --project=<project> ...
where dep1 to depN are all the dependencies needed for execution of the program. This is clearly error prone, and we don't endorse it. Our documentation recommends using mvn exec:java because that sets the execution class path automatically from the dependencies listed in the POM file. Specifically, to run WordCount example, use:
mvn exec:java -pl examples \
-Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
-Dexec.args="--project=<YOUR GCP PROJECT NAME> --stagingLocation=<YOUR GCS LOCATION> --runner=BlockingDataflowPipelineRunner"
The main difference between bundled and unbundled version is in the upload activity before pipeline submission. Unbundled version has an advantage that it can automatically use unchanged dependencies that may have been uploaded in previous submissions.
To summarize, use mvn exec:java when running an unbundled jar, or bundle the dependencies into a monolithic jar. We'll try to clarify this in the documentation.
There's a very high likelihood that this is an issue with staging dependencies.
There's a high probability if you create a bundled jar it will just work. You can create a bundled jar by running the command
mvn bundle:bundle
This will create a single jar that should pull in all dependencies transitively. You then just need to add that jar to your class path and Dataflow should automatically stage it; Thereby ensuring your code as well as any dependencies are available on the worker.
Most likely the job worked with mvn exec, because maven automatically generates a class path with all dependencies from the POM. When running manually, that doesn't happen. i.e if you invoke java directly e.g.
java -cp <JAR FILES> your.main.class --project=<YOUR PROJECT> ....
then you must add all dependencies to the class path so that they get staged. Creating a bundled jar as suggested above is usually the easiest way to do that.
My suggestion would be to look at the worker logs to see if we can find additional information about what's going on in the workers.
There are three ways to get this information. The first is via the Dataflow UI. Go to the Google Cloud Console and then select the Dataflow option in the left hand frame. You should see a list of your jobs. You can click on the job in question. This should show you a graph of your job. On the right side you should see a button "view logs". Please click that. You should then see a UI for navigating the logs and you can look for errors.
The second option is to look for the logs on GCS. The location to look for is:
gs://PATH TO YOUR STAGING DIRECTORY/logs/JOB-ID/VM-ID/LOG-FILE
You might see multiple log files. The one we are most interested in is the one that starts with "start_java_worker". If that log file doesn't exist then the worker didn't make enough progress to actually upload the file; or else there might have been a permission problem uploading the log file.
In that case the best thing to do is to try to ssh into one of the VMs before it gets torn down. You should have about 15 minutes before the job fails and the VMs are deleted.
Once you login to the VM you can find all the logs in
/var/log/dataflow/...
The log we care most about at this point is:
/var/log/dataflow/taskrunner/harness/start_java_worker-SOME ID.log
If there is a problem starting the code that runs on the VM that log should tell us. That log and the other logs should also tell us if there is a permission problem that prevents the code running on the worker from being able to access Dataflow.
Please take a look and let us know if you find anything.