Dataflow with Go SDK failing with 'InvalidProtocolBufferException'

We were previously running Beam 2.11.0, which started failing due to an apparent change in URN format. When I attempted to update to the latest release (2.13.0), the pipeline started timing out, and the only seemingly relevant error I could identify from the logs during testing was:
org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.
To test this further, I tried the wordcount example as provided in the Beam docs/repo here - https://beam.apache.org/get-started/wordcount-example/ - but got the same result. I'm not sure whether this is a generic error or something specific to Dataflow. The pipelines seem to work when run via the direct runner.
Note: I have rebuilt our worker_harness_container_image with the latest version.
I understand the Go SDK is not officially supported by Dataflow, but could anybody tell me whether this error is related to Dataflow or to some other issue?
PS: I've asked the question in Dataflow and Beam Slack channels but haven't had any response.
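For reference, this is roughly how our pipeline is built and run. It is a trimmed sketch along the lines of the Go wordcount example; the project, bucket, and container image names below are placeholders, not our real values:

package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"strings"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
	"github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

// extractFn splits each line into individual words.
func extractFn(line string, emit func(string)) {
	for _, word := range strings.Fields(line) {
		emit(word)
	}
}

// formatFn renders a word/count pair as a single output line.
func formatFn(w string, c int) string {
	return fmt.Sprintf("%s: %d", w, c)
}

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	lines := textio.Read(s, "gs://apache-beam-samples/shakespeare/kinglear.txt")
	words := beam.ParDo(s, extractFn, lines)
	counted := stats.Count(s, words)
	formatted := beam.ParDo(s, formatFn, counted)
	textio.Write(s, "gs://my-bucket/wordcount/output.txt", formatted)

	// Runner selection and the custom harness image come from flags, e.g.:
	//   --runner=dataflow --project=my-project \
	//   --staging_location=gs://my-bucket/staging \
	//   --worker_harness_container_image=gcr.io/my-project/beam-go-harness:2.13.0
	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("Failed to execute job: %v", err)
	}
}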

Related

How to catch the output of a process or a command

I'm trying to write a rule whose condition depends on the output of a script (script.sh). I have tried several approaches, but without success.
I searched your documentation but didn't find anything that helps. I tried several evt and proc fields, but none of them gave me the information I need.
This is the rule I'm using while trying to find a workaround:
- rule: FIM Custom rule
  desc: Testing rule
  condition: access_log_files and (evt.type=close)
  output: Test result (proc_name=%proc.name command=%proc.cmdline evt_type=%evt.type evt.args =%evt.args syslog_.facility_str=%syslog.facility.str syslog_message=%syslog.message)
  priority: WARNING
Please note that I'm running Falco on Docker with the latest image.
When I execute the command logger test on the Ubuntu host, I receive this message on the stdout of the Falco Docker container:
{"hostname":"dc95654c63c3","output":"01:21:29.759239580: Warning Test result (proc_name=python3 command=python3 /usr/lib/ubuntu-advantage/timer.py evt_type=close evt.args =res=0 syslog_.facility_str= syslog_message=)","priority":"Warning","rule":"FIM Custom rule","source":"syscall","tags":[],"time":"2022-12-17T01:21:29.759239580Z", "output_fields": {"evt.args":"res=0 ","evt.time":1671240089759239580,"evt.type":"close","proc.cmdline":"python3 /usr/lib/ubuntu-advantage/timer.py","proc.name":"python3","syslog.facility.str":null,"syslog.message":null}}
So please tell me what I can do.
Thanks
In order to feed Falco with external sources of events (those that are not kernel syscalls), you need to use a Falco plugin. There are plugins to obtain events from Kubernetes, AWS CloudTrail, or even from GitHub. However, there is no plugin that I know of to obtain information from the standard output of a program or from syslog.
Given the open nature of the Falco project, anyone in the community can contribute such a plugin, so I invite you to join the Falco Slack channel and ask around, or even write your own plugin.

Why is observedLargestContentfulPaint always GREATER than largestContentfulPaint when using LHCI?

I'm currently using LHCI integrated with a GitLab CI/CD pipeline and I need to do the following:
Add an assertion for the observedDomContentLoaded value generated in the Lighthouse JSON report; is that possible?
I'm using the simulated throttling type ("simulate"). I have noticed that observedLargestContentfulPaint is always larger than largestContentfulPaint, although I expected it to be smaller. Can anyone explain this, and which one should I rely on during my page assessment?
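In case it matters, the fallback I'm considering is a small helper in the CI job that reads the Lighthouse JSON report and fails the build when observedDomContentLoaded exceeds a budget. A rough sketch follows; the report path and budget are made up, and it assumes the observed values live under the metrics audit's details.items[0], as they do in the reports I've seen:

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// report mirrors only the part of the Lighthouse JSON we need: the
// "metrics" audit exposes observed values in details.items[0].
type report struct {
	Audits struct {
		Metrics struct {
			Details struct {
				Items []map[string]interface{} `json:"items"`
			} `json:"details"`
		} `json:"metrics"`
	} `json:"audits"`
}

func main() {
	// Hypothetical report path and budget; adjust to your CI setup.
	data, err := os.ReadFile("lighthouse-report.json")
	if err != nil {
		log.Fatalf("could not read report: %v", err)
	}

	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		log.Fatalf("could not parse report: %v", err)
	}
	if len(r.Audits.Metrics.Details.Items) == 0 {
		log.Fatal("metrics audit has no detail items")
	}

	observedDCL, ok := r.Audits.Metrics.Details.Items[0]["observedDomContentLoaded"].(float64)
	if !ok {
		log.Fatal("observedDomContentLoaded not found in metrics details")
	}

	const budgetMs = 1500.0
	fmt.Printf("observedDomContentLoaded = %.0f ms (budget %.0f ms)\n", observedDCL, budgetMs)
	if observedDCL > budgetMs {
		os.Exit(1) // non-zero exit fails the CI job
	}
}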

Pass endpoint details in the REST Web Service command in Automation Anywhere

In the REST Web Service command, I don't see any option to pass a variable in the URI.
We do not want to hard-code the endpoint in the script.
For example, I want the script to use different endpoints for dev/stage and prod.
Is there a workaround for this?
When building a URI with variables like:
https://$v_hostname$/test-rs-v1/employee/data
Send Request works fine, but when the bot runs we get an error stating:
Hostname could not be parsed.
Update: That was a bug and it was fixed in version 11.3.1. You can only achieve this on version 11.3.1 or later.
Reference: https://docs.automationanywhere.com/bundle/enterprise-v11.3/page/topics/release-notes/release-notes-11-3-1.html
Workaround for older versions (if you have experience with C#): build and test DLLs.
The following applies only to version 11.3.1 and later.
Make sure that $v_hostname$ contains a value at run time, using the debugging option or a Message Box command.
I reproduced the same error by entering a variable that doesn't exist or doesn't have a value; I found no other scenario that would reproduce "Hostname could not be parsed".
If the hostname/URL is invalid, you will get "The remote name could not be resolved:".
I've tested the REST Web Service command on both the Community and Enterprise editions, and it works very well.

The Dataflow appears to be stuck

Got the following message:
The Dataflow appears to be stuck. Please reach out to the Dataflow team at http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
I realized there were other questions regarding the same error message, but the context seemed different for each and the message rather generic, so I'm posting again.
Job ID: 2017-09-25_09_27_25-5047889078463721675
Please assist. Thanks.
EDIT: Problem seems to have disappeared (at least for now) after updating to Apache Beam SDK for Python 2.1.1 from 2.0.0.
A common cause of stuck Dataflow pipelines is an inability to start the workers. If you look at the Stackdriver logs (view Logs in the UI and click the link to go to Stackdriver), you should be able to view the worker_startup logs. Any problems there can indicate failures to start workers, which would cause the job to be stuck.
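If it is easier than navigating the UI, the same worker-startup entries can also be pulled programmatically. Here is a rough sketch using the Cloud Logging Go client; the project ID and job ID are placeholders, and the exact log-name filter is an assumption you may need to adjust:

package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/logging/logadmin"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()

	// Placeholder project ID; substitute your own.
	client, err := logadmin.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatalf("failed to create logging client: %v", err)
	}
	defer client.Close()

	// Filter for worker-startup entries of a single Dataflow job. The
	// "worker-startup" log name is an assumption based on the Dataflow log types.
	filter := `resource.type="dataflow_step" AND resource.labels.job_id="YOUR_JOB_ID" AND logName:"worker-startup"`

	it := client.Entries(ctx, logadmin.Filter(filter))
	for {
		entry, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatalf("error reading entries: %v", err)
		}
		fmt.Printf("%v  %v\n", entry.Timestamp, entry.Payload)
	}
}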

Why is my Dataflow pipeline not showing steps?

When I run the examples, I get a pretty picture showing the flow, and I can monitor it as it executes. With my application, it doesn't show the diagram, and if I click on "Step" it displays nothing.
Adding a screenshot of the job log. No warnings or errors. BTW, I assumed the icon with an "i" on the log entry stands for the Info level, but when I change the level from BASIC to ALL, many more entries are added and they all have the same icon. That is confusing; icons should be clearer and should have hover tips, IMO.
I'm on the Dataflow team. I'm sorry that you are encountering this issue.
I believe this is occurring because of the custom step names your code is using.
From your screenshot of the job logs, it appears that some of these steps have been given names that represent a GCS storage path.
I noticed this from this message in the logs:
Executing operation "gs://datalake/landing/...."
This fails to render in the monitoring UI and likely hits an assertion because slashes are disallowed characters.
To work around this issue, would you please try removing the custom step names used in your code, which seem to be set to gs://-style paths. You could also try specifying names for each step without using special characters.
Please try running the job again after that change and see if the graph renders properly in the Dataflow UI.
I have created a GitHub issue to track this bug and to prevent these slash characters from being sent in the future in the Dataflow SDK code.
Please let me know if you encounter any more issues.
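For illustration only, here is roughly what plain step names look like in the Beam Go SDK (the paths below are hypothetical); the other SDKs expose equivalent per-step naming, and the important part is that the names given to the scopes contain no slashes while the gs:// paths stay where they belong, as transform parameters:

package main

import (
	"context"
	"flag"
	"log"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	root := p.Root()

	// Name the steps with plain identifiers, not the GCS paths themselves.
	readScope := root.Scope("ReadLandingData")
	lines := textio.Read(readScope, "gs://my-datalake/landing/part-*")

	writeScope := root.Scope("WriteProcessedData")
	textio.Write(writeScope, "gs://my-datalake/processed/output.txt", lines)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("Failed to execute job: %v", err)
	}
}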
