Rsyslog collect logs from different timezones - timezone

Im using rsyslog on server to collect logs from remote hosts.
Collect server config:
# timedatectl
Local time: Wed 2022-04-27 16:02:43 MSK
Universal time: Wed 2022-04-27 13:02:43 UTC
RTC time: n/a
Time zone: Europe/Moscow (MSK, +0300)
System clock synchronized: yes
NTP service: inactive
RTC in local TZ: no
# cat /etc/rsyslog.d/20_external.conf
$CreateDirs on
$PreserveFQDN on
# provides UDP syslog reception
module(load="imudp")
input(type="imudp" port="514")
# provides TCP syslog reception
module(load="imtcp")
input(type="imtcp" port="514")
template(
name="external"
type="string"
string="/var/log/external/%HOSTNAME%/%syslogfacility-text%.%programname%.%syslogseverity-text%.log"
)
action(
type="omfile"
dirCreateMode="0775"
FileCreateMode="0644"
dynaFile="external"
)
On remote host
# timedatectl
Local time: Wed 2022-04-27 13:04:03 UTC
Universal time: Wed 2022-04-27 13:04:03 UTC
RTC time: n/a
Time zone: UTC (UTC, +0000)
System clock synchronized: yes
NTP service: inactive
RTC in local TZ: no
# cat /etc/rsyslog.d/10-external.conf
*.* #rserver
# logger "hello, local time $(date)"
And get on rsyslogserver:
cat /var/log/external/ruser.home.xmu/user.root.notice.log
2022-04-27T13:07:06+03:00 ruser.home.xmu root: hello, local time 2022-04-27T13:07:06 UTC
# date
2022-04-27T16:08:56 MSK
What i can do for change time zone settings for some remote hosts on collect-server?
When i reserch incedents from all servers the time does not match in logs. I want the time on the collector in the logs to be in his time zone.
2022-04-27T16:07:06+03:00 ruser.home.xmu root: hello, local time 2022-04-27T13:07:06 UTC

You can define the timezone in rsyslog on the client - which in my opinion is the cleaner solution.
In /etc/rsyslog.conf do the following:
Comment/remove the current template
# Use default timestamp format
#$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
Then add the timezone, as well as a custom log template:
timezone(id="MSK" offset="+03:00")
# Custom time format
$template myTemplate,"%timegenerated% %HOSTNAME% %syslogtag%%msg%\n"
$ActionFileDefaultTemplate myTemplate
However, if you can't access the remote client which is sending the logs, it's possible to use the timestamp when the log was received on the server.
$template myTemplate,"%timegenerated% %HOSTNAME% %syslogtag%%msg%\n"
ruleset(name="myRuleset"){
$ActionFileDefaultTemplate myTemplate
# Do some other stuff
}
module(load="imtcp")
input(type="imtcp" port="5000" ruleset="myRuleset")
module(load="imudp")
input(type="imudp" port="5000" ruleset="myRuleset")
NOTE: Don't forget to restart the rsyslog service after applying the changes.
sudo service rsyslog restart
EDIT:
Creating a template using the advanced syntax would look like the following:
template (name="myTemplate" type="string"
string="%timegenerated% %HOSTNAME% %syslogtag%%msg%\n")
The string is the actual template of the messages that should be logged, not the destination to which the messages should be logged.

I ran into a similar problem recently. I think the simplest way to synchronize data coming from multiple sources is to use UTC timestamps everywhere, this way you avoid any timezone and summer time/winter time problems.
I think the problem here (and the problem I had) is that the syslog ouput uses the rfc3164 format which does not include the timezone information:
<6>Feb 28 12:00:00 192.168.0.1 fluentd[11111]: [error] Hello!
There's no way of knowing at what time that message actually happened without the timezone information. Since rsyslog version 8.18.0 what you can do though is define custom rsyslog templates that print the timestamps in UTC. If you use UTC everywhere you don't need that timestamp information anymore.
I'm sending my logs to fluentd, I use the following configuration in my rsyslog.conf file on my hosts:
template(
name="UTCTraditionalForwardFormat"
type="string"
string="<%PRI%>%TIMESTAMP:::date-utc% %HOSTNAME% %syslogtag:1:32%%msg:::sp-if-no-1st-sp%%msg%"
)
*.* action(
type="omfwd"
Target="10.5.0.9"
Port="5145"
Protocol="udp"
Template="UTCTraditionalForwardFormat"
)
The UTCTraditionalForwardFormat here is the same as the default template RSYSLOG_TraditionalForwardFormat (rsyslog templates) but with an added :::date-utc to the TIMESTAMP property.
On your rsyslog that collects logs, looking at the input module's doc there an experimental parameter DefaultTZ which should let you define the source timezone, something like this should work (I haven't tested):
input(
type="imudp"
port="514"
DefaultTZ="+00:00"
)
Assuming this DefaultTZ parameter works, this should work regardless of your hosts timezone.
Little word about fluentd, the fluentd syslog module also accepts the rfc5424 message format which does encode timezone information. I decided not to go this way though because rsyslog default templates does not seem to have a real rfc5424 version (the RSYSLOG_SyslogProtocol23Format template is very close to the rfc5424, but I don't know how close).

Related

how to log to journal with mqtt mosquitto

Mosquitto has an extensive logging mechanism. But it seems it only can log against syslog.
Is there a way to get logs captured in the systemd's jounal without running it as a service?
Note: mosquito runs inside a docker container
mosquitto.conf logging section is:
# =================================================================
# Logging
# =================================================================
# Places to log to. Use multiple log_dest lines for multiple
# logging destinations.
# Possible destinations are: stdout stderr syslog topic file dlt
#
# stdout and stderr log to the console on the named output.
#
# syslog uses the userspace syslog facility which usually ends up
# in /var/log/messages or similar.
#
# topic logs to the broker topic '$SYS/broker/log/<severity>',
# where severity is one of D, E, W, N, I, M which are debug, error,
# warning, notice, information and message. Message type severity is used by
# the subscribe/unsubscribe log_types and publishes log messages to
# $SYS/broker/log/M/susbcribe or $SYS/broker/log/M/unsubscribe.
#
# The file destination requires an additional parameter which is the file to be
# logged to, e.g. "log_dest file /var/log/mosquitto.log". The file will be
# closed and reopened when the broker receives a HUP signal. Only a single file
# destination may be configured.
#
# The dlt destination is for the automotive `Diagnostic Log and Trace` tool.
# This requires that Mosquitto has been compiled with DLT support.
#
# Note that if the broker is running as a Windows service it will default to
# "log_dest none" and neither stdout nor stderr logging is available.
# Use "log_dest none" if you wish to disable logging.
log_dest syslog
log_dest file /var/log/mosquitto/mosquitto.log
# Types of messages to log. Use multiple log_type lines for logging
# multiple types of messages.
# Possible types are: debug, error, warning, notice, information,
# none, subscribe, unsubscribe, websockets, all.
# Note that debug type messages are for decoding the incoming/outgoing
# network packets. They are not logged in "topics".
#log_type error
#log_type warning
#log_type notice
log_type all
# If set to true, client connection and disconnection messages will be included
# in the log.
#connection_messages true
# If using syslog logging (not on Windows), messages will be logged to the
# "daemon" facility by default. Use the log_facility option to choose which of
# local0 to local7 to log to instead. The option value should be an integer
# value, e.g. "log_facility 5" to use local5.
log_facility 5
# If set to true, add a timestamp value to each log message.
#log_timestamp true
# Set the format of the log timestamp. If left unset, this is the number of
# seconds since the Unix epoch.
# This is a free text string which will be passed to the strftime function. To
# get an ISO 8601 datetime, for example:
# log_timestamp_format %Y-%m-%dT%H:%M:%S
#log_timestamp_format
# Change the websockets logging level. This is a global option, it is not
# possible to set per listener. This is an integer that is interpreted by
# libwebsockets as a bit mask for its lws_log_levels enum. See the
# libwebsockets documentation for more details. "log_type websockets" must also
# be enabled.
#websockets_log_level 0
Because of log_dest syslog is specified, logging to syslog should be enabled, shouldn't it?
There are logs of mosquitto in /var/log/syslog:
Nov 17 23:08:29 raspberrypi mosquitto[40]: 1668726509: Opening ipv6 listen socket on port 1883.
Nov 17 23:08:29 raspberrypi mosquitto[40]: 1668726509: Opening ipv4 listen socket on port 1883.
Nov 17 23:08:29 raspberrypi mosquitto[40]: 1668726509: Opening websockets listen socket on port 9001.
Nov 17 23:08:29 raspberrypi mosquitto[40]: 1668726509: mosquitto version 2.0.15 running
...
Nov 30 12:12:18 raspberrypi mosquitto[39]: 1669810338: Sending PUBLISH to shell (d0, q0, r0, m0, '$SYS/broker/load/bytes/sent/1min', ... (4 bytes))
Nov 30 12:12:29 raspberrypi mosquitto[39]: 1669810349: Sending PUBLISH to shell (d0, q0, r0, m0, '$SYS/broker/load/bytes/sent/1min', ... (4 bytes))
Nov 30 12:12:40 raspberrypi mosquitto[39]: 1669810360: Sending PUBLISH to shell (d0, q0, r0, m0, '$SYS/broker/load/bytes/sent/1min', ... (4 bytes))
But they do not end up in the journal. I suspect it is because no systemd init is running. Can this be worked around?
Just configure mosquitto to log to syslog and it will end up in the journal.
Settings log_dest syslog should just work.
Edit for clarity.
This works on my Fedora 36 machine where journald appears to be consuming syslogd messages as can be seen by running journalctl -f -t mosquitto which output the usual mosquitto startup log info.

import and Parse syslog file to elastisearch

I need to import a set of SYSLOG files to elasticsearch. I'am using a filebeat agent.
I succeeded the data importation, however the data in elasticsearch is not parsed.
This is the input file:
Feb 14 03:43:40 my_host_name run-parts(/etc/cron.daily)[1544] finished rhsmd
Feb 14 03:43:40 my_host_name anacron[240673]: Job `cron.daily' terminated (produced output)
Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)
Feb 14 03:43:41 my_host_name postfix/pickup[241860]: 7E8CFC00BB50: uid=0 from=<root>
I work on the 7.15.2 version of Filebeat and Elasticsearch. I get an index output with the field message not parsed. That contain for example the hole line " Feb 14 03:43:41 my_host_name anacron[240673]: Normal exit (1 job run)".
On the versions 8.0 there is a processor option to add to the configuration file that parse this field:
processors:
- syslog:
field: message
However in the version 7.15.2 this option is not available.
How can I parse this Field in the Filebeat configuration ?
Thank you for your help.
What you could do is either use the dissect or script processors to parse the values according to your needs. Not saying this is the best option, but it is an option

fluentd TimeParser Error - Invalid Time Format

I'm trying to get some Cisco Meraki MX firewalls logs pointed to our Kubernetes cluster using fluentd pods. I'm using the #syslog source plugin, and able to get the logs generated, but I keep getting this error
2022-06-30 16:30:39 -0700 [error]: #0 invalid input data="<134>1 1656631840.701989724 838071_MT_DFRT urls src=10.202.11.05:39802 dst=138.128.172.11:443 mac=90:YE:F6:23:EB:T0 request: UNKNOWN https://f3wlpabvmdfgjhufgm1xfd6l2rdxr.b3-4-eu-w01.u5ftrg.com/..." error_class=Fluent::TimeParser::TimeParseError error="invalid time format: value = 1 1656631840.701989724 838071_ME_98766, error_class = ArgumentError, error = string doesn't match"
Everything seems to be fine, but it seems as though the Meraki is sending it's logs in Epoch time, and the fluentd #syslog plugin is not liking it.
I have a vanilla config:
<source>
#type syslog
port 5140
tag meraki
</source>
Is there a way to possibly transform the time strings to something fluentd will like? Or what am I missing here.

Docker container Application logs to ELK stack without filebeat

I'm using the Elasti Cloud as it appears to be the most suitable for quickly setting up application logging. I have 24 docker container running in different nodes, and some containers have no of replicas also. i want to export inside docker container logs to elk stack.. I don't want to install Filebeat on each of my containers because that seems like it goes directly against Docker's separation of duties mantra.
.... how do I get logs from my application containers to log stash server
You can send your syslog to Logstash by configuring rsyslogd like this
# /etc/rsyslog.d/99-ship-syslog.conf
*.*;syslog;auth,authpriv.none action(
type="omfwd"
Target="myremote.elk-server.net"
Port="5001"
Protocol="udp"
)
If you don't have rsyslog running yet, you can add it like so (alpine linux example):
# Dockerfile
FROM alpine:3.7
RUN apk update \
&& apk add rsyslog
COPY rsyslog.conf /etc/rsyslog.conf
EXPOSE 514 514/udp
VOLUME [ "/var/log", "/etc/rsyslog.d" ]
ENTRYPOINT [ "rsyslogd", "-n" ]
--
# rsyslogd.conf
#
# if you experience problems, check:
# http://www.rsyslog.com/troubleshoot
#### MODULES ####
module(load="imuxsock") # local system logging support (e.g. via logger command)
#module(load="imklog") # kernel logging support (previously done by rklogd)
module(load="immark") # --MARK-- message support
module(load="imudp") # UDP listener support
input(type="imudp" port="514")
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* action(type="omfile" file="/dev/console")
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none action(type="omfile" file="/var/log/messages")
# The authpriv file has restricted access.
authpriv.* action(type="omfile" file="/var/log/secure")
# Log all the mail messages in one place.
mail.* action(type="omfile" file="/var/log/maillog")
# Log cron stuff
cron.* action(type="omfile" file="/var/log/cron")
# Everybody gets emergency messages
*.emerg action(type="omusrmsg" users="*")
# Save news errors of level crit and higher in a special file.
uucp,news.crit action(type="omfile" file="/var/log/spooler")
# Save boot messages also to boot.log
local7.* action(type="omfile" file="/var/log/boot.log")
# log every host in its own directory
if $fromhost-ip then /var/log/$fromhost-ip/messages
# Include all .conf files in /etc/rsyslog.d
$IncludeConfig /etc/rsyslog.d/*.conf
$template GRAYLOGRFC5424,"<%PRI%>%PROTOCOL-VERSION% %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%\n"
*.info;mail.none;authpriv.none;cron.none;*.* ##graylog:514;GRAYLOGRFC5424 # forward everything to remote server
As you're running within a java-application, you can even send you logs directly to syslog. Here's a small configuration example with log4j
log4j.rootLogger=INFO, SYSLOG
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=myremote.elk-server.net
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.conversionPattern=%d{ISO8601} %-5p [%t] %c{2} %x - %m%n
log4j.appender.SYSLOG.Facility=LOCAL1

Dataflow stock WordCount example with DataflowPipelineRunner fails when run outside of Maven

I am able to successfully run the WordCount example using DataflowPipelineRunner with the maven exec:java command shown in the docs.
However, when I attempt to run it in my own 1.8 VM, it doesn't work. I am using these args (on Windows):
--project=highfive-metrics-service \
--stagingLocation=gs://highfive-dataflow-test/staging \
--runner=BlockingDataflowPipelineRunner \
--gCloudPath=C:/Progra~1/Google/CloudS~1/google-cloud-sdk/bin/gcloud.cmd
I get the following error:
2014-12-24T04:53:34.849Z: (5eada047929dcead): Workflow failed. Causes: (5eada047929dce2e): There was a problem creating the GCE VMs or starting Dataflow on the VMs so no data was processed. Possible causes:
1. A failure in user code on in the worker.
2. A failure in the Dataflow code.
Next Steps:
1. Check the GCE serial console for possible errors in the logs.
2. Look for similar issues on http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
Prior to the subsequent cleanup, I observed three harness instances on GCE as expected. Looking at the serial console for the first one, wordcount-jroy-1224043800-12232038-8cfa-harness-0, I see "normal" (comparing to what I see when running with Maven) looking output that ends with:
Dec 24 04:38:45 [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.438005] IPv6: ADDRCONF(NETDEV_CHANGE): veth30b3796: link becomes ready
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.439395] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.440262] docker0: port 1(veth30b3796) entered forwarding state
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 kernel: [ 16.443484] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 12898 100 12898 0 0 2009k 0 --:--:-- --:--:-- --:--:-- 3148k
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: {"attributes":{"config":"{\"alsologtostderr\":true,\"base_task_dir\":\"/tmp/tasks/\",\"commandlines_file_name\":\"commandlines.txt\",\"continue_on_exception\":true,\"dataflow_api_endpoint\":\"https://www.googleapis.com/\",\"dataflow_api_version\":\"v1beta1\",\"log_dir\":\"/dataflow/logs/taskrunner/harness\",\"log_to_gcs\":true,\"log_to_serialconsole\":true,\"parallel_worker_flags\":{\"job_id\":\"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588\",\"project_id\":\"highfive-metrics-service\",\"reporting_enabled\":true,\"root_url\":\"https://www.googleapis.com/\",\"service_path\":\"dataflow/v1b3/projects/\",\"temp_gcs_directory\":\"gs://highfive-dataflow-test/staging\",\"worker_id\":\"wordcount-jroy-1224043800-12232038-8cfa-harness-0\"},\"project_id\":\"highfive-metrics-service\",\"python_harness_cmd\":\"python_harness_main\",\"scopes\":[\"https://www.googleapis.com/auth/devstorage.full_control\",\"https://www.googleapis.com/auth/cloud-platform\"],\"task_group\":\"nogroup\",\"task_user\":\"nobody\",\"temp_g
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 goo[ 16.494163] device veth29b6136 entered promiscuous mode
gle: cs_directory\":\"gs://highfive-dataflow-test/staging\",\"vm_id\":\"wordcoun[ 16.505311] IPv6: ADDRCONF(NETDEV_UP): veth29b6136: link is not ready
[ 16.507623] docker0: port 2(veth29b6136) entered forwarding state
t-jroy-122404380[ 16.507633] docker0: port 2(veth29b6136) entered forwarding state
0-12232038-8cfa-harness-0\"}","google-container-manifest":"\ncontainers:\n-\n env:\n -\n name: GCS_BUCKET\n value: dataflow-docker-images\n image: google/docker-registry\n imagePullPolicy: PullNever\n name: repository\n ports:\n -\n containerPort: 5000\n hostPort: 5000\n name: registry\n-\n image: localhost:5000/dataflow/taskrunner:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: taskrunner\n volumeMounts:\n -\n mountPath: /dataflow/logs/taskrunner/harness\n name: dataflowlogs-harness\n-\n env:\n -\n name: LOG_DIR\n value: /dataflow/logs\n image: localhost:5000/dataflow/shuffle:20141217-rc00 \n imagePullPolicy: PullIfNotPresent\n name: shuffle\n ports:\n -\n containerPort: 12345\n hostPort: 12345\n name: shuffle1\n -\n containerPort: 22349\n hostPort: 22349\n name: shuffle2\n volumeMounts:\n -\n mountPath: /var/shuffle\n name: dataflow-shuffle\n -\n mountPath: /dataflow/logs\n name: dataflow-logs\nversion: v1
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: beta2\nvolumes:\n-\n name: dataflowlogs-harness\n source:\n hostDir:\n path: /var/log/dataflow/taskrunner/harness\n-\n name: dataflow-shuffle\n source:\n hostDir:\n path: /dataflow/shuffle\n-\n name: dataflow-logs\n source:\n hostDir:\n path: /var/log/dataflow/shuffle\n","job_id":"2014-12-23_20_38_16.593375-08_10.48.106.68_-469744588","packages":"gs://dataflow-releases-prod/worker_packages/NOTICES.shuffle|NOTICES.shuffler|gs://highfive-dataflow-test/staging/access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|access-bridge-64-fE-vq3Wgxy5FvnwmA5YdzQ.jar|gs://highfive-dataflow-test/staging/avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|avro-1.7.7-dTlef6huetK-4IFERNhcqA.jar|gs://highfive-dataflow-test/staging/charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|charsets-7HC8Y2_U4k8yfkY6e4lxnw.jar|gs://highfive-dataflow-test/staging/cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|cldrdata-A4PVsm4mesLVUWOTKV5dhQ.jar|gs://highfive-dataflow-test/staging/commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|commons-codec-1.3-2I5AW2KkklMQs3emwoFU5Q.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|commons-compress-1.4.1-uyvcB16Wfp4wnt8X1Uqi4w.jar|gs://highfive-dataflow-test/staging/commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|commons-logging-1.1.1-blBISC6STJhwBOT8Ksr3NQ.jar|gs://highfive-dataflow-test/staging/dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|dataflow-test-YIJKUxARCp14MLdWzNdBdQ.zip|gs://highfive-dataflow-test/staging/deploy-eLnif2izXW_mrleXudK0Eg.jar|deploy-eLnif2izXW_mrleXudK0Eg.jar|gs://highfive-dataflow-test/staging/dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|dnsns-hmxeUSrhtJou0Wo-UoCjTw.jar|gs://highfive-dataflow-test/staging/google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|google-api-client-1.19.0-YgeHY_Y9dPd2PwGBWwvmmw.jar|gs://highfive-dataflow-test/staging/google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|google-api-services-bigquery-v2-rev167-1.19.0-mNojB6wqlFqAd2G9Zo7o5w.jar|gs://highfive-dataflow-test/staging/google-api-services-compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|google-api-services
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -compute-v1-rev34-1.19.0-yR5ItN9uOowLPyMiTckyCA.jar|gs://highfive-dataflow-test/staging/google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|google-api-services-dataflow-v1beta3-rev1-1.19.0-Cg8Pyd4F0t7yqSE4E7v7Rg.jar|gs://highfive-dataflow-test/staging/google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|google-api-services-datastore-protobuf-v1beta2-rev1-2.1.0-UxLefoYWxF5K1EpQjKMJ4w.jar|gs://highfive-dataflow-test/staging/google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|google-api-services-pubsub-v1beta1-rev9-1.19.0-7E1jg5ZyfaqZBCHY18fPkQ.jar|gs://highfive-dataflow-test/staging/google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|google-api-services-storage-v1-rev11-1.19.0-8roIrNilTlO2ZqfGfOaqkg.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|google-cloud-dataflow-java-examples-all-manual_build-A9j6W_hzOlq6PBrg1oSIAQ.jar|gs://highfive-dataf
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: low-test/staging/google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|google-cloud-dataflow-java-examples-all-manual_build-tests-iIdI-AhKWiVKTuJzU5JxcQ.jar|gs://highfive-dataflow-test/staging/google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|google-cloud-dataflow-java-sdk-all-alpha-PqdZNVZwhs6ixh6de6vM7A.jar|gs://highfive-dataflow-test/staging/google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|google-http-client-1.19.0-1Vc3U5mogjNLbpTK7NVwDg.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|google-http-client-jackson-1.15.0-rc-oW6nFU6Gme53SYGJ9KlNbA.jar|gs://highfive-dataflow-test/staging/google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|google-http-client-jackson2-1.19.0-AOUP2FfuHtACTs_0sul54A.jar|gs://highfive-dataflow-test/staging/google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|google-http-client-protobuf-1.15.0-rc-xYoprQdNcvzuQGZXvJ3ZaQ.jar|gs://highfive-dataflow-test/st
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: aging/google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|google-oauth-client-1.19.0-b3S5WqgD7iWrwg38pfg3Xg.jar|gs://highfive-dataflow-test/staging/google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|google-oauth-client-java6-1.19.0-cP8xzICJnsNlhTfaS0egcg.jar|gs://highfive-dataflow-test/staging/guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|guava-18.0-HtxcCcuUqPt4QL79yZSvag.jar|gs://highfive-dataflow-test/staging/hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|hamcrest-all-1.3-n3_QBeS4s5a8ffbBPQIpFQ.jar|gs://highfive-dataflow-test/staging/hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|hamcrest-core-1.3-DvCZoZPq_3EWA4TcZlVL6g.jar|gs://highfive-dataflow-test/staging/httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|httpclient-4.0.1-sfocsPjEBE7ppkUpSIJZkA.jar|gs://highfive-dataflow-test/staging/httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|httpcore-4.0.1-_SGEPUOMREqA8u_h7qy9_w.jar|gs://highfive-dataflow-test/staging/idea_rt-6II88e1BKUeCOQqcrZht-w.jar|idea_rt-6II88e1BKUeCOQqcrZht-w.jar|gs://highfive-dataflow-test/staging/jacce
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: ss-laKenN34W6jKKivkBUzVcA.jar|jaccess-laKenN34W6jKKivkBUzVcA.jar|gs://highfive-dataflow-test/staging/jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|jackson-annotations-2.4.2-7cAfM1zz0nmoSOC_NlRIcw.jar|gs://highfive-dataflow-test/staging/jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|jackson-core-2.4.2-3CV4j5-qI7Y-1EADAiakmw.jar|gs://highfive-dataflow-test/staging/jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|jackson-core-asl-1.9.13-Ht2i1DaJ57v29KlMROpA4Q.jar|gs://highfive-dataflow-test/staging/jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|jackson-databind-2.4.2-M7rkZKQCfOO3vWkOyf9BKg.jar|gs://highfive-dataflow-test/staging/jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|jackson-mapper-asl-1.9.13-eoeZFbovPzo033HQKy6x_Q.jar|gs://highfive-dataflow-test/staging/javaws-O8JqID6BpsXsCSRRkhii3w.jar|javaws-O8JqID6BpsXsCSRRkhii3w.jar|gs://highfive-dataflow-test/staging/jce-eMjjWzdqQh30yNZ9HMuXMA.jar|jce-eMjjWzdqQh30yNZ9HMuXMA.jar|gs://highfive-dataflow-test/staging/jfr-xDzacRGMQeIR4SdPe69o1A.jar|jfr
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -xDzacRGMQeIR4SdPe69o1A.jar|gs://highfive-dataflow-test/staging/jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|jfxrt-5aSYnU7M458Xy_hx5zXF8w.jar|gs://highfive-dataflow-test/staging/jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|jfxswt-X8I_DFy9gs_6LMLp6_LFPA.jar|gs://highfive-dataflow-test/staging/joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|joda-time-2.4-EIO48_0LMn2_imYqUT5jxA.jar|gs://highfive-dataflow-test/staging/jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|jsr305-1.3.9-ntb9Wy3-_ccJ7t2jV2Tb3g.jar|gs://highfive-dataflow-test/staging/jsse-HOItnWzBlT4hG5HPmlF56w.jar|jsse-HOItnWzBlT4hG5HPmlF56w.jar|gs://highfive-dataflow-test/staging/junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|junit-4.11-lCgz3FeSwzD13Q_KNW4MuQ.jar|gs://highfive-dataflow-test/staging/localedata-R9ei3T8qar8cibFNN0X7Qg.jar|localedata-R9ei3T8qar8cibFNN0X7Qg.jar|gs://highfive-dataflow-test/staging/management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|management-agent-kiuGeHiVpYKGCDNexcQPIg.jar|gs://highfive-dataflow-test/staging/mockito-all-1.9.5-_T4jPTp05rc7PhcOO34Saw.jar|mockito-all-1.9.5-_T4jPTp0
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: 5rc7PhcOO34Saw.jar|gs://highfive-dataflow-test/staging/nashorn-x8si6abt-U04QaVUHvl_bg.jar|nashorn-x8si6abt-U04QaVUHvl_bg.jar|gs://highfive-dataflow-test/staging/paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|paranamer-2.3-rdmhSrp7GRPVm0JexWjzzg.jar|gs://highfive-dataflow-test/staging/plugin-TG6U30mOzKi8yMGKYd7ong.jar|plugin-TG6U30mOzKi8yMGKYd7ong.jar|gs://highfive-dataflow-test/staging/protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|protobuf-java-2.5.0-g0LcHblB4cg-bZEbNj3log.jar|gs://highfive-dataflow-test/staging/resources-RavNZwakZf55HEtrC9KyCw.jar|resources-RavNZwakZf55HEtrC9KyCw.jar|gs://highfive-dataflow-test/staging/rt-Z2kDZdIt-eG8CCtFIinW1g.jar|rt-Z2kDZdIt-eG8CCtFIinW1g.jar|gs://highfive-dataflow-test/staging/slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|slf4j-api-1.7.7-M8fOZEWF4TcHiUbfZmJY7A.jar|gs://highfive-dataflow-test/staging/slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|slf4j-jdk14-1.7.7-hDm19oG8Vzi6jVY9pLtr_g.jar|gs://highfive-dataflow-test/staging/snappy-java-1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|snappy-java
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: -1.0.5-WxwEQNTeXiDmEGBuY9O3Og.jar|gs://highfive-dataflow-test/staging/sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|sunec-ffsdkJzKsC8XbuZa-XHp3Q.jar|gs://highfive-dataflow-test/staging/sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|sunjce_provider-4x9-ynTri_pg6Hhk2Zj9Ow.jar|gs://highfive-dataflow-test/staging/sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|sunmscapi-5TwnMDAci3Hf47yMZYmN1g.jar|gs://highfive-dataflow-test/staging/sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|sunpkcs11-vCiFLLKN99XBpHW2JTkOBw.jar|gs://highfive-dataflow-test/staging/xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|xz-1.0-6m1HjeacPsPpniZtMte8kw.jar|gs://highfive-dataflow-test/staging/zipfs-SIKQJJIhpGOgSa4tT6nStA.jar|zipfs-SIKQJJIhpGOgSa4tT6nStA.jar"},"description":"GCE Instance created for Dataflow","disks":[{"deviceName":"persistent-disk-0","index":0,"mode":"READ_WRITE","type":"PERSISTENT"}],"hostname":"wordcount-jroy-1224043800-12232038-8cfa-harness-0.c.highfive-metrics-service.internal","id":8960015560553137779,"image":"","machineType":"projects/537312487774/machineTypes/n1-stan
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: dard-4","maintenanceEvent":"NONE","networkInterfaces":[{"accessConfigs":[{"externalIp":"130.211.184.44","type":"ONE_TO_ONE_NAT"}],"forwardedIps":[],"ip":"10.240.173.213","network":"projects/537312487774/networks/default"}],"scheduling":{"automaticRestart":"TRUE","onHostMaintenance":"MIGRATE"},"serviceAccounts":{"537312487774#developer.gserviceaccount.com":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.googleapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]},"default":{"aliases":["default"],"email":"537312487774#developer.gserviceaccount.com","scopes":["https://www.goog
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: leapis.com/auth/any-api","https://www.googleapis.com/auth/bigquery","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/datastore","https://www.googleapis.com/auth/devstorage.full_control","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/ndev.cloudman","https://www.googleapis.com/auth/pubsub","https://www.googleapis.com/auth/userinfo.email"]}},"tags":["dataflow"],"zone":"projects/537312487774/zones/us-central1-a"}
Dec 24 04:38:45 wordcount-jroy-1224043800-12232038-8cfa-harness-0 google: No startup script found in metadata.
Not sure what I should be looking for, but this seems to reliably fail for me in this manner. I see the same problem when I try to run a custom pipeline of my own (i.e. not WordCount), and also when I run the WordCount example on Linux.
I saved off a file where I recorded:
The complete output from the WordCount main class
The metadata field values set on the GCE instance
The complete serial console output
It is available here.
Things I've tried so far, without success:
Forcing the language level of the compiled classes to 1.7 (am using 1.8 JRE)
Modifying DataflowPipelineRunner::detectClassPathResourcesToStage to not emit JRE jar files (this is a difference I noticed in the log compared to Maven; when running under Maven the JRE jars are not staged).
EDIT: Attempting to set the classpath to EXACTLY the same as what Maven ends up using (removing all of our projects' dependencies). This seemed to change the behavior a bit and I got to a java.lang.ClassNotFoundException: com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn in the worker output.
Strongly suspicious that the problem lies with the staged classpath, but without more specific error messages, I'm shooting in the dark. Would appreciate ideas of where to look next or other things to try.
When running pipelines using [Blocking]DataflowPipelineRunner from the Cloud Dataflow Java SDK, the runner automatically copies everything from your local Java class path to a staging location in Google Cloud Storage, which is being accessed by workers on-demand.
ClassNotFoundException in the Cloud Dataflow worker environment is an indication that required dependencies for your pipeline are not properly staged in a Google Cloud Storage bucket. This likely root cause can be confirmed by looking at the contents of your staging bucket in Google Developers Console and the console output of BlockingDataflowPipelineRunner.
Now, the problem can be fixed by bundling all dependencies into a single, monolithic jar. In Maven, the following command can be used to create such a jar as long as the bundle plugin is properly configured to embed all transitive dependencies:
mvn bundle:bundle
Then, the bundled jar can be executed normally, such as:
java -cp <bundled jar> <main class> --project=<project> ...
Alternatively, the problem can be fixed by manually adding dependencies to your local class path. For example, the following command may be helpful when running an unbundled jar:
java -cp <unbundled jar>:<dep1>:<dep2>:...:<depN> <main class> --project=<project> ...
where dep1 to depN are all the dependencies needed for execution of the program. This is clearly error prone, and we don't endorse it. Our documentation recommends using mvn exec:java because that sets the execution class path automatically from the dependencies listed in the POM file. Specifically, to run WordCount example, use:
mvn exec:java -pl examples \
-Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
-Dexec.args="--project=<YOUR GCP PROJECT NAME> --stagingLocation=<YOUR GCS LOCATION> --runner=BlockingDataflowPipelineRunner"
The main difference between bundled and unbundled version is in the upload activity before pipeline submission. Unbundled version has an advantage that it can automatically use unchanged dependencies that may have been uploaded in previous submissions.
To summarize, use mvn exec:java when running an unbundled jar, or bundle the dependencies into a monolithic jar. We'll try to clarify this in the documentation.
There's a very high likelihood that this is an issue with staging dependencies.
There's a high probability if you create a bundled jar it will just work. You can create a bundled jar by running the command
mvn bundle:bundle
This will create a single jar that should pull in all dependencies transitively. You then just need to add that jar to your class path and Dataflow should automatically stage it; Thereby ensuring your code as well as any dependencies are available on the worker.
Most likely the job worked with mvn exec, because maven automatically generates a class path with all dependencies from the POM. When running manually, that doesn't happen. i.e if you invoke java directly e.g.
java -cp <JAR FILES> your.main.class --project=<YOUR PROJECT> ....
then you must add all dependencies to the class path so that they get staged. Creating a bundled jar as suggested above is usually the easiest way to do that.
My suggestion would be to look at the worker logs to see if we can find additional information about what's going on in the workers.
There are three ways to get this information. The first is via the Dataflow UI. Go to the Google Cloud Console and then select the Dataflow option in the left hand frame. You should see a list of your jobs. You can click on the job in question. This should show you a graph of your job. On the right side you should see a button "view logs". Please click that. You should then see a UI for navigating the logs and you can look for errors.
The second option is to look for the logs on GCS. The location to look for is:
gs://PATH TO YOUR STAGING DIRECTORY/logs/JOB-ID/VM-ID/LOG-FILE
You might see multiple log files. The one we are most interested in is the one that starts with "start_java_worker". If that log file doesn't exist then the worker didn't make enough progress to actually upload the file; or else there might have been a permission problem uploading the log file.
In that case the best thing to do is to try to ssh into one of the VMs before it gets torn down. You should have about 15 minutes before the job fails and the VMs are deleted.
Once you login to the VM you can find all the logs in
/var/log/dataflow/...
The log we care most about at this point is:
/var/log/dataflow/taskrunner/harness/start_java_worker-SOME ID.log
If there is a problem starting the code that runs on the VM that log should tell us. That log and the other logs should also tell us if there is a permission problem that prevents the code running on the worker from being able to access Dataflow.
Please take a look and let us know if you find anything.

Resources