Enabling Debug|Trace worker logs for google cloud dataflow - google-cloud-dataflow

I am not able to enable debug|trace level logging on the Dataflow workers.
The documentation at https://cloud.google.com/dataflow/docs/guides/logging#SettingLevels
indicates that DataflowWorkerLoggingOptions can be used to programmatically override the default log level on the worker and enable debug|trace level logging; however, the interface is deprecated and apparently no longer present in Beam SDK 2.27.0.
Has anyone been able to enable worker-level debug logging in Cloud Dataflow in any way?

The documentation is still up to date and the interface is still present and will work.
The interface is deprecated because the Java-based Dataflow worker is not used when running a pipeline using Beam's portability framework. Quoting the deprecation message:
@deprecated This interface will no longer be the source of truth for worker logging configuration once jobs are executed using a dedicated SDK harness instead of user code being co-located alongside Dataflow worker code. Please set the option below and also the corresponding option within org.apache.beam.sdk.options.SdkHarnessOptions to ensure forward compatibility.
So what you should do is follow the instructions that you linked and also set up logging in SdkHarnessOptions.
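As a minimal sketch of "set both options", the helper below appends the two log-level flags to a pipeline's command-line arguments. The class and method names (WorkerLogArgs, withLogLevel) are hypothetical and not part of Beam; the flag names assume Beam's convention of deriving flags from the getters on DataflowWorkerLoggingOptions and SdkHarnessOptions, so check them against the SDK version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam: appends log-level flags to pipeline arguments.
final class WorkerLogArgs {
    private WorkerLogArgs() {}

    // Returns a copy of args with both the legacy worker flag and the SDK harness flag added.
    static String[] withLogLevel(String[] args, String level) {
        List<String> out = new ArrayList<>(Arrays.asList(args));
        // Assumed flag for DataflowWorkerLoggingOptions (deprecated, used by the Java-based worker).
        out.add("--defaultWorkerLogLevel=" + level);
        // Assumed flag for SdkHarnessOptions (forward compatibility with the portability framework).
        out.add("--defaultSdkHarnessLogLevel=" + level);
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints the augmented argument list, one flag per line.
        for (String arg : withLogLevel(args, "DEBUG")) {
            System.out.println(arg);
        }
    }
}
```

The resulting array can then be handed to PipelineOptionsFactory.fromArgs(...) in the pipeline's main method, which keeps the log level out of the pipeline code itself.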

Related

Spring Cloud Data Flow - Task Properties

I'm using SCDF and I was wondering whether there is any way to configure default properties for one application.
I have a task application registered in SCDF, and this application needs some JDBC properties to access a business database:
app.foo.export.datasource.url=jdbc:db2://blablabla
app.foo.export.datasource.username=testuser
app.foo.export.datasource.password=**************
app.foo.export.datasource.driverClassName=com.ibm.db2.jcc.DB2Driver
Do I really need to put these properties in a properties file like this? (It's a bit weird to define them at launch.)
task launch fooTask --propertiesFile aaa.properties
Also, we cannot use the REST API, because the credentials would appear in the URL.
Or is there another way or place to define default business properties for an application? These properties will only be used by this task.
The purpose is to have one place where OPS team can configure url and credentials without playing with the launch command.
Thank you.
Yeah, SCDF feels a bit weird in the configuration area.
As you wrote, you can register an application and create tasks, but all the configuration is passed at the first launch of the task. Put the other way round, you can't fully install and configure a task without running it.
As soon as a task has run once, you can relaunch it without any configuration and it uses the configuration from before. The whole config is saved in the SCDF database.
However, if you try to overwrite an existing configuration property with a new value, SCDF seems to ignore the new value and continue to use the old one. No idea if this is intended by design or a bug or if we are doing something wrong.
Because we run SCDF tasks on Kubernetes and are used to configuring all infrastructure in YAML files, the best option we found was to write our own Operator for SCDF.
This operator works against the REST interface of SCDF and also compensates for the weird configuration issues mentioned above.
For example, the overwrite issue is solved by first deleting the configuration and then recreating it with the new values.
With this operator we have reached what you are looking for: all our SCDF configuration is in a git repository and all changes are done through merge requests. Thanks to CI/CD, on the next launch, the new configuration is used.
However, a Kubernetes operator should be part of the product. Without it, SCDF on Kubernetes feels quite "alien".

How do you reuse the same openapi.yaml file for production and development

We are using a GitOps model for deploying our software. Everything in the dev branch goes to the dev environment and everything in main gets deployed to production. All good and fine, except that we use Google Cloud Endpoints, which rely on the host parameter of the openapi.yaml. There is only room for a single value, so we have to remember to change it for each deployment, which prevents a fully automated deploy.
How do you manage the same openapi.yaml definition when using Google Cloud Endpoints?
There is one example given in the official documentation, see if it helps your use-case.
Basic structure of an OpenAPI document, notice how the "host" is parameterized with "YOUR-PROJECT-ID.appspot.com"
Deploying the Endpoints configuration, using the provided script "./deploy_api.sh"
Source code for deploy_api.sh
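The placeholder approach above can be sketched as a small build step that renders one openapi.yaml template per branch. This is only an illustration under stated assumptions: the class OpenApiHost, the branch-to-project mapping, and the project IDs my-dev-project and my-prod-project are all hypothetical; only the YOUR-PROJECT-ID placeholder comes from the documentation example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: fills the host placeholder of an OpenAPI template per Git branch.
final class OpenApiHost {
    // Placeholder used in the documentation example ("YOUR-PROJECT-ID.appspot.com").
    static final String PLACEHOLDER = "YOUR-PROJECT-ID";

    private OpenApiHost() {}

    // Replaces every occurrence of the placeholder with the project mapped to the branch.
    static String render(String template, String branch, Map<String, String> projectByBranch) {
        String projectId = projectByBranch.get(branch);
        if (projectId == null) {
            throw new IllegalArgumentException("No project configured for branch: " + branch);
        }
        return template.replace(PLACEHOLDER, projectId);
    }

    public static void main(String[] args) {
        // Assumed mapping; the project IDs are made up for illustration.
        Map<String, String> projects = new HashMap<>();
        projects.put("dev", "my-dev-project");
        projects.put("main", "my-prod-project");

        String template = "swagger: \"2.0\"\nhost: \"YOUR-PROJECT-ID.appspot.com\"\n";
        // Prints the template rendered for the dev branch.
        System.out.print(render(template, "dev", projects));
    }
}
```

In a GitOps setup, the CI job for each branch would run a step like this (or an equivalent sed call in the deploy script) before deploying the Endpoints configuration, so the committed openapi.yaml never has to be edited by hand.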
One common solution for managing environment-specific properties is to create different build profiles with environment-specific files such as openapi_dev.yaml, openapi_qa.yaml, and openapi_prod.yaml, and to supply the one that matches the profile (dev/qa/prod) being used. Refer here for more details.
Another way is documented in GitOps-style continuous delivery with Cloud Build, where a multi-branch, multi-repository approach is suggested.
The FAQ section of the Swagger OpenAPI guide states that multiple hosts (e.g. development, test, and production) can be specified, but only in OpenAPI 3.0. OpenAPI 2.0 supports only one host per API specification (or two if you count HTTP and HTTPS as different hosts). A possible way to target multiple hosts is to omit the host and schemes from your specification and serve it from each host. In this case, each copy of the specification will target the corresponding host.
As per the Google documentation, Cloud Endpoints currently supports OpenAPI version 2.0. A feature request has been filed for support of version 3.0, but there have been no releases. You can follow the updates here.

How to set mode and time in Dynamic Agents?

I am referring to this page:
https://www.instana.com/docs/setup_and_manage/host_agent/updates/#update-interval
Is there a way to pass mode and time from outside, as environment variables or in any other way, besides logging into the pod and manually changing the etc/instana/com.instana.agent.main.config.UpdateManager.cfg file?
To whoever removed his/her answer: It was a correct answer. I don't know why you deleted it. Anyhow, I am posting again in case someone stumbles here.
You can control frequency and time by using INSTANA_AGENT_UPDATES_FREQUENCY and INSTANA_AGENT_UPDATES_TIME environment variables.
Whether the mode can be set via an environment variable is still unknown at this point.
Look at this page for more info: https://www.instana.com/docs/setup_and_manage/host_agent/on/docker/#updates-and-version-pinning
Most agent settings that one may want to change quickly are available as environment variables, see https://www.instana.com/docs/setup_and_manage/host_agent/on/docker. For example, setting the mode via environment variable is supported as well with INSTANA_AGENT_MODE, see e.g., https://hub.docker.com/r/instana/agent. The valid values are:
APM: the default, the agent monitors everything
INFRASTRUCTURE: the agent will collect metrics and entities but not traces
OFF: agent runs but collects no telemetry
AWS: agent will collect data about AWS managed services in a region and an account, supported on EC2 and Fargate, and with some extra configurations, on hosts outside AWS
On Kubernetes, it is also of course possible to use a ConfigMap to override files in the agent container.

Logging linenumbers with SLF4J and Google Dataflow

When running Apache Beam jobs on Google Dataflow and using the SLF4J logger, we don't get anything beyond the log message in Stackdriver.
Examples of additional information would be the function, line number, etc.
Is there any way to configure the logger, for example with a log4j.xml or a Java logging properties file?
There is no way to customize log messages in Dataflow other than what is shown in this logging pipeline messages guide.
Have you looked at Cloud Logging? It has several features such as Custom logs / Ingestion API. In case you haven't, take a look at this guide to set up the SLF4J logging facade through the Logback appender and Cloud Logging. Once you have configured Logback to use Cloud Logging, you can use the SLF4J logging API. Another option is to use the Cloud Logging API with a default Java Logging API handler, which can be added programmatically or by using a configuration file; here is an example using logger.
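If only the message text reaches the log viewer, one workaround is to put the call site into the message itself. The sketch below is not a Dataflow or SLF4J feature: CallSite and tag are hypothetical names, and it simply reads the caller's class, method, and line number from a stack trace using the standard library.

```java
// Hypothetical helper: prefixes a log message with the caller's class, method, and line.
final class CallSite {
    private CallSite() {}

    static String tag(String message) {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is tag() itself; stack[1] is the method that called tag().
        StackTraceElement caller = stack.length > 1 ? stack[1] : stack[0];
        // The line number is negative when the class was compiled without debug information.
        return caller.getClassName() + "." + caller.getMethodName()
                + ":" + caller.getLineNumber() + " " + message;
    }

    public static void main(String[] args) {
        // Prints the message prefixed with this call's class, method, and line number.
        System.out.println(tag("processing element"));
    }
}
```

With SLF4J this would be used as LOG.info(CallSite.tag("processing element")). Capturing a stack trace on every call has a cost, so it is probably best reserved for warning and error paths rather than per-element logging.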
Isaac Miliani, I tried the same Cloud Logging option as described in the Google Cloud docs:
Added logback.xml to src/main/resources (classpath).
Created a LoggingEventEnhancer and an enhancer class to add new labels.
Added markers to logger.error calls, to find the type of error in Stackdriver.
But the logs in Stackdriver don't have the new labels added via the logging appender. I think the logback.xml is not picked up by the Maven compile command that deploys the job to Dataflow.
Can you explain what is going wrong here?

How are WorkerHarnessThreads managed in Cloud Dataflow?

Is the option numberOfWorkerHarnessThreads used by cloud-dataflow runner now?
Earlier the PipelineOptions property numberOfWorkerHarnessThreads was specified in the doc and was displayed in Dataflow Job Monitoring UI under Pipeline options. Both are missing now.
If this is not used, how are the worker threads managed now?
The option is still there. You can find it in DataflowPipelineDebugOptions.
