I need to import a set of syslog files into Elasticsearch. I am using a Filebeat agent.
The data import succeeded, but the data in Elasticsearch is not parsed.
This is the input file:
Feb 14 03:43:40 my_host_name run-parts(/etc/cron.daily)[1544] finished rhsmd
Feb 14 03:43:40 my_host_name anacron[240673]: Job `cron.daily' terminated (produced output)
Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)
Feb 14 03:43:41 my_host_name postfix/pickup[241860]: 7E8CFC00BB50: uid=0 from=<root>
I am using version 7.15.2 of Filebeat and Elasticsearch. The resulting index has an unparsed message field that contains the whole line, for example "Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)".
In version 8.0 there is a processor option that can be added to the configuration file to parse this field:
processors:
  - syslog:
      field: message
However, this option is not available in version 7.15.2.
How can I parse this field in the Filebeat configuration?
Thank you for your help.
What you could do is use either the dissect or the script processor to parse the values according to your needs. I'm not saying this is the best option, but it is an option.
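As a rough illustration of the dissect route, the fragment below is a minimal sketch for filebeat.yml, assuming every line follows the "month day time host rest" layout of the sample in the question. The target prefix syslog and the key names (month, day, time, hostname, text) are placeholders chosen for this example, not names Filebeat requires. The tokenizer deliberately stops after the host name, because the sample lines do not all share the same "program[pid]:" form (the run-parts line has no colon).

```yaml
processors:
  - dissect:
      # Field that holds the raw syslog line.
      field: "message"
      # Extracted keys are written under this prefix (example name).
      target_prefix: "syslog"
      # Split on single spaces; everything after the host goes to "text".
      tokenizer: "%{month} %{day} %{time} %{hostname} %{text}"
```

With the sample line above, this should yield syslog.hostname = my_host_name and syslog.text = "anacron[240673]: Normal exit (1 job run)". Lines whose day of month is padded with an extra space (single-digit days) may not split correctly with this tokenizer and would need separate handling.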
Using Apache Airflow:
We have created a DAG that runs every day at 07:00 AM: schedule_interval='0 7 * * *'
The task is searching for a new row in a certain table. If it sees a new row, it continues to execute more tasks and so on.
We want the task to run for 19 hours. If it does not find a new row in that table within that time, it skips the rest of the tasks. The task's timeout is: timeout=60 * 60 * 19
Recently we found that after 12 hours of running, we get an error that causes the task to fail. Because we have a retry configured, the task retries and then runs for the full 19 hours.
So instead of 19 hours, we get a run of 31 hours.
Here is the error:
INFO - Dependencies not met for <TaskInstance: DAG_NAME.check_for_new_file 2021-06-28T07:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
Has anyone experienced this error? If I understand correctly, the task is trying to run again after 12 hours, so is the task's state being changed to 'failed' instead of 'running'?
Thanks!
I am a newbie to GKE.
I have a Python app running inside a GKE pod. The pod gets evicted for being out of memory after 30 minutes. The total VM memory is 13 GB, and when I SSH into the pod, the peak memory used before eviction is only about 3 GB...
I tried keeping the container alive with a dummy command in the Dockerfile, "CMD tail -f /dev/null", then connected to the pod and ran the scraper script manually. That succeeded: the script finished with a peak memory usage of 9 GB.
Dockerfile:
CMD python3 scraper.py
> Managed pods
> Revision  Name                     Status   Restarts  Created on
> 1         scraper-df68b65bf-gbhms  Running  0         Sep 2, 2019, 2:59:59 PM
> 1         scraper-df68b65bf-gktqw  Running  0         Sep 2, 2019, 2:59:59 PM
> 1         scraper-df68b65bf-z4kjb  Running  0         Sep 2, 2019, 2:59:59 PM
> 1         scraper-df68b65bf-wk6td  Running  0         Sep 2, 2019, 3:00:45 PM
> 1         scraper-df68b65bf-xqm7h  Running  0         Sep 2, 2019, 3:00:45 PM
My guess is that many instances of my app are running in parallel pods and sharing the 13 GB. How do I run a single instance of my app on GKE so that all of the VM's memory is available to it?
Do you have the replica count set to one in your deployment.yaml file?
spec:
  replicas: 1
If a HorizontalPodAutoscaler is in use, you can edit it as follows.
Get the HorizontalPodAutoscaler:
kubectl get HorizontalPodAutoscaler
Edit it using the edit command:
kubectl edit HorizontalPodAutoscaler <pod scaler name>
The end result of the HorizontalPodAutoscaler should look like this:
spec:
  maxReplicas: 1
  minReplicas: 1
Awesome reply @Bismal.
@Wotjas, just to add my 2 cents: you can use the Cloud Console to set the min and max values. You just need to go to:
Cloud Menu -> GKE -> Workloads -> Actions -> Scale
Set the desired values, then save.
More detailed information can be found in this document [1].
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/scaling-apps
From Nagios' Plugin Development Guidelines:
Plugins have a very limited runtime - typically 10 sec. As a result, it is very important for plugins to maintain internal code to exit if runtime exceeds a threshold.
All plugins should timeout gracefully, not just networking plugins.
How can I implement a timeout mechanism into my custom plugin? Basically I want my plugin to return a status code 3 - UNKNOWN instead of the default 1 - CRITICAL when the plugin times out, to reduce the number of false positives generated.
EDIT: My plugin is written in Bash.
You can use timeout. Here is an example:
timeout 15 ping google.com
if [ $? -eq 124 ]; then
    echo "UNKNOWN - Time limit exceeded."
    exit 3
fi
timeout returns exit status 124 when your command does not finish within the defined time (15 seconds here).
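The same idea can be wrapped in a reusable function, which may be handy if the plugin runs more than one command. This is only a sketch: run_check is a hypothetical helper name, it relies on the exit status 124 that GNU coreutils timeout returns when the limit is hit, and it assumes the wrapped command already exits with Nagios-style status codes, since any other status is passed through unchanged.

```shell
#!/usr/bin/env bash

# run_check LIMIT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under a time limit of LIMIT seconds.
# A timeout is reported as Nagios UNKNOWN (3); any other exit status of
# COMMAND is returned unchanged.
run_check() {
    local limit="$1"
    shift
    local output status
    # Capture output so nothing is printed if the command times out.
    output=$(timeout "$limit" "$@" 2>&1)
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "UNKNOWN - Time limit of ${limit}s exceeded."
        return 3
    fi
    if [ -n "$output" ]; then
        echo "$output"
    fi
    return "$status"
}
```

A plugin could then end with something like "run_check 10 ping -c 4 google.com; exit $?", where the 10 second limit and the ping arguments are just example values.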
I am able to successfully run the WordCount example using DataflowPipelineRunner with the maven exec:java command shown in the docs.
However, when I attempt to run it directly in my own Java 1.8 JVM, it doesn't work. I am using these args (on Windows):
--project=highfive-metrics-service \
--stagingLocation=gs://highfive-dataflow-test/staging \
--runner=BlockingDataflowPipelineRunner \
--gCloudPath=C:/Progra~1/Google/CloudS~1/google-cloud-sdk/bin/gcloud.cmd
I get the following error:
2014-12-24T04:53:34.849Z: (5eada047929dcead): Workflow failed. Causes: (5eada047929dce2e): There was a problem creating the GCE VMs or starting Dataflow on the VMs so no data was processed. Possible causes:
1. A failure in user code on in the worker.
2. A failure in the Dataflow code.
Next Steps:
1. Check the GCE serial console for possible errors in the logs.
2. Look for similar issues on http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
Prior to the subsequent cleanup, I observed three harness instances on GCE as expected. Looking at the serial console for the first one, wordcount-jroy-1224043800-12232038-8cfa-harness-0, I see "normal" (comparing to what I see when running with Maven) looking output that ends with:
Dec 24 04:38:45 [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.438005] IPv6: ADDRCONF(NETDEV_CHANGE): veth30b3796: link becomes ready
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.439395] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.440262] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 12898 100 12898 0 0 2009k 0 --:--:-- --:--:-- --:--:-- 3148k
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: {"attributes":{"config":"{\"alsologtostderr\":true,\"base_task_dir\":\"/tmp/tasks/\",\"commandlines_file_name\":\"commandlines.txt\",\"continue_on_exception\":true,\"dataflow_api_endpoint\":\"https://www.googleapis.com/\",\"dataflow_api_version\":\"v1beta1\",\"log_dir\":\"/dataflow/logs/taskrunner/harness\",\"log_to_gcs\":true,\"log_to_serialconsole\":true,\"parallel_worker_flags\":{\"job_id\":\"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588\",\"project_id\":\"highfive-metrics-service\",\"reporting_enabled\":true,\"root_url\":\"https://www.googleapis.com/\",\"service_path\":\"dataflow/v1b3/projects/\",\"temp_gcs_directory\":\"gs://highfive-dataflow-test/staging\",\"worker_id\":\"wordcount-jroy-1224043800-12232038-8cfa-harness-0\"},\"project_id\":\"highfive-metrics-service\",\"python_harness_cmd\":\"python_harness_main\",\"scopes\":[\"https://www.googleapis.com/auth/devstorage.full_control\",\"https://www.googleapis.com/auth/cloud-platform\"],\"task_group\":\"nogroup\",\"task_user\":\"nobody\",\"temp_g
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 goo[ 16.494163] device veth29b6136 entered promiscuous mode
gle: cs_directory\":\"gs://highfive-dataflow-test/staging\",\"vm_id\":\"wordcoun[ 16.505311] IPv6: ADDRCONF(NETDEV_UP): veth29b6136: link is not ready
[ 16.507623] docker0: port 2(veth29b6136) entered forwarding state
t-jroy-122404380[ 16.507633] docker0: port 2(veth29b6136) entered forwarding state
0-12232038-8cfa-harness-0\"}","google-container-manifest":"\ncontainers:\n-\n env:\n -\n name: GCS_BUCKET\n value: dataflow-docker-images\n image: google/docker-registry\n imagePullPolicy: PullNever\n name: repository\n ports:\n -\n containerPort: 5000\n hostPort: 5000\n name: registry\n-\n image: localhost:5000/dataflow/taskrunner:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: taskrunner\n volumeMounts:\n -\n mountPath: /dataflow/logs/taskrunner/harness\n name: dataflowlogs-harness\n-\n env:\n -\n name: LOG_DIR\n value: /dataflow/logs\n image: localhost:5000/dataflow/shuffle:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: shuffle\n ports:\n -\n containerPort: 12345\n hostPort: 12345\n name: shuffle1\n -\n containerPort: 22349\n hostPort: 22349\n name: shuffle2\n volumeMounts:\n -\n mountPath: /var/shuffle\n name: dataflow-shuffle\n -\n mountPath: /dataflow/logs\n name: dataflow-logs\nversion: v1
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: beta2\nvolumes:\n-\n name: dataflowlogs-harness\n source:\n hostDir:\n path: /var/log/dataflow/taskrunner/harness\n-\n name: dataflow-shuffle\n source:\n hostDir:\n path: /dataflow/shuffle\n-\n name: dataflow-logs\n source:\n hostDir:\n path: /var/log/dataflow/shuffle\n","job_id":"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588","packages":"gs://dataflow-releases-prod/worker_packages/NOTICES.shuffle|NOTICES.shuffler|gs://highfive-dataflow-test/staging/access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|gs://highfive-dataflow-test/staging/avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|gs://highfive-dataflow-test/staging/charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|gs://highfive-dataflow-test/staging/cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|gs://highfive-dataflow-test/staging/commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|gs://highfive-dataflow-test/staging/commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|gs://highfive-dataflow-test/staging/dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|gs://highfive-dataflow-test/staging/deploy-eLnif2izXW_mrleXudK0Eg.jar|deploy-eLnif2izXW_mrleXudK0Eg.jar|gs://highfive-dataflow-test/staging/dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|gs://highfive-dataflow-test/staging/google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|gs://highfive-dataflow-test/staging/google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|gs://highfive-dataflow-test/staging/google-api-services-compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|google-api-services
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|gs://highfive-dataflow-test/staging/google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|gs://highfive-dataflow-test/staging/google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|gs://highfive-dataflow-test/staging/google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|gs://highfive-dataflow-test/staging/google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|gs://highfive-dataflow-test/staging/google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|gs://highfive-dataflow-test/staging/google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|gs://highfive-dataflow-test/st
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: aging/google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|gs://highfive-dataflow-test/staging/google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|gs://highfive-dataflow-test/staging/guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|gs://highfive-dataflow-test/staging/hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|gs://highfive-dataflow-test/staging/hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|gs://highfive-dataflow-test/staging/httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|gs://highfive-dataflow-test/staging/httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|gs://highfive-dataflow-test/staging/idea_rt-6II88e1BKUeCOQqcrZht-w.jar|idea_rt-6II88e1BKUeCOQqcrZht-w.jar|gs://highfive-dataflow-test/staging/jacce
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: ss-laKenN34W6jKKivkBUzVcA.jar|jaccess-laKenN34W6jKKivkBUzVcA.jar|gs://highfive-dataflow-test/staging/jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|gs://highfive-dataflow-test/staging/jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|gs://highfive-dataflow-test/staging/jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|gs://highfive-dataflow-test/staging/jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|gs://highfive-dataflow-test/staging/jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|gs://highfive-dataflow-test/staging/javaws-O8JqID6BpsXsCSRRkhii3w.jar|javaws-O8JqID6BpsXsCSRRkhii3w.jar|gs://highfive-dataflow-test/staging/jce-eMjjWzdqQh30yNZ9HMuXMA.jar|jce-eMjjWzdqQh30yNZ9HMuXMA.jar|gs://highfive-dataflow-test/staging/jfr-xDzacRGMQeIR4SdPe69o1A.jar|jfr
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -xDzacRGMQeIR4SdPe69o1A.jar|gs://highfive-dataflow-test/staging/jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|gs://highfive-dataflow-test/staging/jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|gs://highfive-dataflow-test/staging/joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|gs://highfive-dataflow-test/staging/jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|gs://highfive-dataflow-test/staging/jsse-HOItnWzBlT4hG5HPmlF56w.jar|jsse-HOItnWzBlT4hG5HPmlF56w.jar|gs://highfive-dataflow-test/staging/junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|gs://highfive-dataflow-test/staging/localedata-R9ei3T8qar8cibFNN0X7Qg.jar|localedata-R9ei3T8qar8cibFNN0X7Qg.jar|gs://highfive-dataflow-test/staging/management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|gs://highfive-dataflow-test/staging/mockito-all-1.9.5-_T4jPTp05rc7PhcOO34Saw.jar|mockito-all-1.9.5-_T4jPTp0
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: 5rc7PhcOO34Saw.jar|gs://highfive-dataflow-test/staging/nashorn-x8si6abt-U04QaVUHvl_bg.jar|nashorn-x8si6abt-U04QaVUHvl_bg.jar|gs://highfive-dataflow-test/staging/paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|gs://highfive-dataflow-test/staging/plugin-TG6U30mOzKi8yMGKYd7ong.jar|plugin-TG6U30mOzKi8yMGKYd7ong.jar|gs://highfive-dataflow-test/staging/protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|gs://highfive-dataflow-test/staging/resources-RavNZwakZf55HEtrC9KyCw.jar|resources-RavNZwakZf55HEtrC9KyCw.jar|gs://highfive-dataflow-test/staging/rt-Z2kDZdIt-eG8CCtFIinW1g.jar|rt-Z2kDZdIt-eG8CCtFIinW1g.jar|gs://highfive-dataflow-test/staging/slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|gs://highfive-dataflow-test/staging/slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|gs://highfive-dataflow-test/staging/snappy-java-1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|snappy-java
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|gs://highfive-dataflow-test/staging/sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|gs://highfive-dataflow-test/staging/sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|gs://highfive-dataflow-test/staging/sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|gs://highfive-dataflow-test/staging/sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|gs://highfive-dataflow-test/staging/xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|gs://highfive-dataflow-test/staging/zipfs-SIKQJJIhpGOgSa4tT6nStA.jar|zipfs-SIKQJJIhpGOgSa4tT6nStA.jar"},"description":"GCE Instance created for Dataflow","disks":[{"deviceName":"persistent-disk-0","index":0,"mode":"READ_WRITE","type":"PERSISTENT"}],"hostname":"wordcount-jroy-1224043800-12232038-8cfa-harness-0.c.highfive-metrics-service.internal","id":8960015560553137779,"image":"","machineType":"projects/537312487774/machineTypes/n1-stan
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: dard-4","maintenanceEvent":"NONE","networkInterfaces":[{"accessConfigs":[{"externalIp":"130.211.184.44","type":"ONE_TO_ONE_NAT"}],"forwardedIps":[],"ip":"10.240.173.213","network":"projects/537312487774/networks/default"}],"scheduling":{"automaticRestart":"TRUE","onHostMaintenance":"MIGRATE"},"serviceAccounts":{"537312487774#developer.gserviceaccount.com":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.googleapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]},"default":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.goog
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: leapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]}},"tags":["dataflow"],"zone":"projects/537312487774/zones/us-central1-a"}
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: No startup script found in metadata.
Not sure what I should be looking for, but this seems to reliably fail for me in this manner. I see the same problem when I try to run a custom pipeline of my own (i.e. not WordCount), and also when I run the WordCount example on Linux.
I saved off a file where I recorded:
The complete output from the WordCount main class
The metadata field values set on the GCE instance
The complete serial console output
It is available here.
Things I've tried so far, without success:
Forcing the language level of the compiled classes to 1.7 (am using 1.8 JRE)
Modifying DataflowPipelineRunner::detectClassPathResourcesToStage to not emit JRE jar files (this is a difference I noticed in the log compared to Maven; when running under Maven the JRE jars are not staged).
EDIT: Attempting to set the classpath to EXACTLY the same as what Maven ends up using (removing all of our projects' dependencies). This seemed to change the behavior a bit and I got to a java.lang.ClassNotFoundException: com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn in the worker output.
I strongly suspect that the problem lies with the staged classpath, but without more specific error messages, I'm shooting in the dark. I would appreciate ideas on where to look next or other things to try.
When running pipelines using [Blocking]DataflowPipelineRunner from the Cloud Dataflow Java SDK, the runner automatically copies everything from your local Java class path to a staging location in Google Cloud Storage, where the workers access it on demand.
ClassNotFoundException in the Cloud Dataflow worker environment is an indication that required dependencies for your pipeline are not properly staged in a Google Cloud Storage bucket. This likely root cause can be confirmed by looking at the contents of your staging bucket in Google Developers Console and the console output of BlockingDataflowPipelineRunner.
Now, the problem can be fixed by bundling all dependencies into a single, monolithic jar. In Maven, the following command can be used to create such a jar as long as the bundle plugin is properly configured to embed all transitive dependencies:
mvn bundle:bundle
Then, the bundled jar can be executed normally, such as:
java -cp <bundled jar> <main class> --project=<project> ...
Alternatively, the problem can be fixed by manually adding dependencies to your local class path. For example, the following command may be helpful when running an unbundled jar:
java -cp <unbundled jar>:<dep1>:<dep2>:...:<depN> <main class> --project=<project> ...
where dep1 to depN are all the dependencies needed to execute the program. This is clearly error-prone, and we don't endorse it. Our documentation recommends using mvn exec:java because that sets the execution class path automatically from the dependencies listed in the POM file. Specifically, to run the WordCount example, use:
mvn exec:java -pl examples \
-Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
-Dexec.args="--project=<YOUR GCP PROJECT NAME> --stagingLocation=<YOUR GCS LOCATION> --runner=BlockingDataflowPipelineRunner"
The main difference between the bundled and unbundled versions is in the upload activity before pipeline submission. The unbundled version has the advantage that it can automatically reuse unchanged dependencies that were uploaded in previous submissions.
To summarize, use mvn exec:java when running an unbundled jar, or bundle the dependencies into a monolithic jar. We'll try to clarify this in the documentation.
There's a very high likelihood that this is an issue with staging dependencies.
There's a high probability if you create a bundled jar it will just work. You can create a bundled jar by running the command
mvn bundle:bundle
This will create a single jar that should pull in all dependencies transitively. You then just need to add that jar to your class path and Dataflow should automatically stage it, thereby ensuring your code as well as any dependencies are available on the worker.
Most likely the job worked with mvn exec because Maven automatically generates a class path with all dependencies from the POM. When running manually, that doesn't happen. That is, if you invoke java directly, e.g.
java -cp <JAR FILES> your.main.class --project=<YOUR PROJECT> ....
then you must add all dependencies to the class path so that they get staged. Creating a bundled jar as suggested above is usually the easiest way to do that.
My suggestion would be to look at the worker logs to see if we can find additional information about what's going on in the workers.
There are three ways to get this information. The first is via the Dataflow UI. Go to the Google Cloud Console and then select the Dataflow option in the left hand frame. You should see a list of your jobs. You can click on the job in question. This should show you a graph of your job. On the right side you should see a button "view logs". Please click that. You should then see a UI for navigating the logs and you can look for errors.
The second option is to look for the logs on GCS. The location to look for is:
gs://PATH TO YOUR STAGING DIRECTORY/logs/JOB-ID/VM-ID/LOG-FILE
You might see multiple log files. The one we are most interested in is the one that starts with "start_java_worker". If that log file doesn't exist then the worker didn't make enough progress to actually upload the file; or else there might have been a permission problem uploading the log file.
In that case the best thing to do is to try to ssh into one of the VMs before it gets torn down. You should have about 15 minutes before the job fails and the VMs are deleted.
Once you log in to the VM you can find all the logs in
/var/log/dataflow/...
The log we care most about at this point is:
/var/log/dataflow/taskrunner/harness/start_java_worker-SOME ID.log
If there is a problem starting the code that runs on the VM that log should tell us. That log and the other logs should also tell us if there is a permission problem that prevents the code running on the worker from being able to access Dataflow.
Please take a look and let us know if you find anything.