Beam/Dataflow: No session file found: /var/opt/google/dataflow/pickled_main_session - google-cloud-dataflow

When using Apache Beam (GCP Dataflow) I see the following warning in worker logs:
No session file found: /var/opt/google/dataflow/pickled_main_session.
Functions defined in __main__ (interactive session) may fail.
My Dataflow job seems to be fine regardless, but I'm wondering what this warning is all about.
I have seen the following in some sample code (which I am NOT currently doing):
pipeline_options.view_as(SetupOptions).save_main_session = True
where pipeline_options is the main way of specifying options for the Beam/Dataflow pipeline, as in the following later in the code:
with beam.Pipeline(options=pipeline_options) as p:
# actual pipeline code here
I am curious whether the two are related. Does the presence of the warning mean I should always be saving the main session?

You should be able to safely ignore this warning. No need to set save_main_session if it's not required for your pipeline.
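For context, save_main_session only matters when code that runs on the workers refers to names defined in __main__, which is exactly the case the warning hints at. A minimal sketch of that situation, with made-up names:
# double() lives in __main__, so remote workers can only resolve it
# if the main session is pickled and shipped with the job.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

def double(x):
    return x * 2

pipeline_options = PipelineOptions()
# Needed only because a __main__-level function (double) is used below.
pipeline_options.view_as(SetupOptions).save_main_session = True

with beam.Pipeline(options=pipeline_options) as p:
    (p
     | beam.Create([1, 2, 3])
     | beam.Map(double)
     | beam.Map(print))
If nothing in your pipeline lives in __main__ (for example, all DoFns and helpers are imported from installed modules), the warning is harmless and the flag can stay off.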

Related

Import python functions into serverless

I have an issue with the following serverless config.
This is my handler and the files/folders structure.
The issue is that after uploading my project to AWS, when I test my Lambda the execution fails with the following error:
{
"errorMessage": "Unable to import module 'src/app_monitor': No module named 'monitoring'",
"errorType": "Runtime.ImportModuleError",
"requestId": "bca3f67d-815f-452b-a2a6-c713ad2c6baa",
"stackTrace": []
}
Do you have any clue how I can add this to the serverless config?
First, a quick tip on troubleshooting: When I ran into such issues it was helpful to go to the AWS console, look at the lambda function, and see what the uploaded file structure looks like on that end. Is the monitoring folder there?
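If you prefer the command line, the AWS CLI can fetch a download link for the exact package that was deployed (this assumes the CLI is configured; the function name below is a placeholder):
aws lambda get-function --function-name app_monitoring --query 'Code.Location' --output text
# the printed URL downloads the deployment zip; list its contents with: unzip -l <zipfile>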
Moreover, in order to specify how a specific function is packaged, you have to explicitly state that you want it to be individually packaged and not follow the general rules of the project as a whole.
You should try to add:
app_monitoring:
  package:
    individually: true
    patterns:
      - 'src/**'
There is more documentation on packaging configuration in the Serverless Framework packaging docs.
You may also have better luck with explicitly stating the patterns you need; I know I've had issues with globs in the past. So, for example, you can try:
patterns:
  - 'src/app_monitoring.py'
  - 'src/monitoring/get_lb.py'
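Putting the two snippets together, a hedged sketch of how the whole function block might look; the handler path below is a guess based on the error message, so adjust the names to your project:
functions:
  app_monitoring:
    handler: src/app_monitor.handler   # guessed entry point
    package:
      individually: true
      patterns:
        - 'src/**'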

Does a BEAM file remember whether it was built with -Werror?

I am working on a tool that deals with BEAM files, and we want to be able to assume the code was compiled with -Werror, so we don't have to repeat validations that are already done by the erl_lint compiler pass.
Is there a way to figure out if the BEAM was built with -Werror?
I'd expect beam_lib:chunks/2 to help here, but unfortunately it doesn't seem to have what I'm looking for:
beam_lib:chunks("sample.beam", [debug_info, attributes, compile_info]).
% the stuff returned says nothing about -Werror, even if I compile with -Werror
It seems that this information is always stripped.
However, if you are in control of the compilation process, you can put additional info into BEAM files, which will be accessible through M:module_info(compile) and via BEAM chunks as well.
For example in rebar:
{erl_opts, [debug_info, {compile_info, [{my_key, my_value}]}]}.
And then:
1> my_module:module_info(compile).
[{version,"7.6.6"},
{options,[debug_info, ...
{my_key,my_value}]
The same is true for "discoverability" of this key directly from beam chunks:
2> beam_lib:chunks("my_beam.beam", [compile_info]).
{ok, ... {my_key,my_value}]}]}}
This means you can easily "stamp" your BEAM files with some meta-information. So a workaround may be to stamp the BEAM files you compile with -Werror and have your tool check for that mark.
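For instance, a minimal sketch of how the tool could check for such a stamp, assuming the pair lands in the compile_info proplist as in the rebar example above; the key name built_with_werror is invented for illustration:
%% stamp_check.erl - checks a BEAM file for a custom compile_info stamp.
-module(stamp_check).
-export([has_werror_stamp/1]).

has_werror_stamp(BeamFile) ->
    case beam_lib:chunks(BeamFile, [compile_info]) of
        {ok, {_Module, [{compile_info, Info}]}} ->
            %% Look for the stamp both at the top level and under options,
            %% since the exact placement depends on the toolchain.
            Options = proplists:get_value(options, Info, []),
            proplists:is_defined(built_with_werror, Info)
                orelse proplists:is_defined(built_with_werror, Options);
        {error, beam_lib, _Reason} ->
            false
    end.
With the rebar config above, the corresponding stamp would be {compile_info, [{built_with_werror, true}]} in erl_opts.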

Get console Logger (or TaskListener) from Pipeline script method

If I have a Pipeline script method in a Pipeline script (Jenkinsfile), in my Global Pipeline Library's vars/, or in a src/ class, how can I obtain the OutputStream for the console log? I want to write directly to the console log.
I know I can echo or println, but for this purpose I need to write without the extra output that yields. I also need to be able to pass the OutputStream to something else.
I know I can call TaskListener.getLogger() if I can get the TaskListener (really hudson.util.StreamTaskListener) instance, but how?
What I've tried: I've looked into manager.listener.logger (from the Groovy Postbuild plugin), but in the early-build context I'm calling from it doesn't yield an OutputStream that writes to the job's Console Log.
echo "listener is a ${manager.listener} - ${manager.listener.getClass().getName()} from ${manager} and has a ${manager.listener.logger} of class ${manager.listener.logger.getClass().getName()}"
prints
listener is a hudson.util.LogTaskListener@420c55c4 - hudson.util.LogTaskListener from org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder$BadgeManager@58ac0c55 and has a java.io.PrintStream@715b9f99 of class java.io.PrintStream
I know you can get it from a StepContext via context.get(TaskListener.class) but I'm not in a Step, I'm in a CpsScript (i.e. WorkflowScript i.e. Jenkinsfile).
I also tried finding it from a CpsFlowExecution obtained from the DSL instance registered as the steps script-property, but I couldn't work out how to discover the TaskListener that's passed to it when it's created.
How is it this hard? What am I missing? There's so much indirect magic I find it incredibly hard to navigate the system.
BTW, I'm aware direct access is blocked by Script Security, but I can create @Whitelisted methods, and anything in a global library's vars/ is always whitelisted anyway.
You can access the build object from the Jenkins root object:
def listener = Jenkins.get()
.getItemByFullName(env.JOB_NAME)
.getBuildByNumber(Integer.parseInt(env.BUILD_NUMBER))
.getListener()
def logger = listener.getLogger() as PrintStream
logger.println("Listener: ${listener} Logger: ${logger}")
Result:
Listener: CloseableTaskListener[org.jenkinsci.plugins.workflow.log.BufferedBuildListener@6e9e6a16 / org.jenkinsci.plugins.workflow.log.BufferedBuildListener@6e9e6a16] Logger: java.io.PrintStream@423efc01
After banging my head against this problem for a couple of days, I think I have a solution:
CpsThreadGroup.current().execution.owner.listener
It's ugly, and I don't know if it's correct or if there's a better way, but it seems to work.
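If that chain keeps working for you, it can be hidden behind a small global-library step so pipeline code stays readable. This is only a sketch relying on the same unsupported internals, and the step name withConsoleLog is made up:
// vars/withConsoleLog.groovy (hypothetical helper in a trusted global library)
import org.jenkinsci.plugins.workflow.cps.CpsThreadGroup

def call(Closure body) {
    // Same chain as above: current CPS execution -> owner -> TaskListener
    def listener = CpsThreadGroup.current().execution.owner.listener
    body(listener.getLogger())   // hand the raw PrintStream to the caller
}
A pipeline could then call withConsoleLog { out -> out.println('written straight to the console log') }.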

Write to the system's standard error in Progress

I am writing a small program in Progress that needs to write an error message to the system's standard error. What ways, ideally simple ones, can I use to print to standard error?
I am using OpenEdge 11.3.
When on Windows (10.2B+) you can use .NET:
System.Console:Error:WriteLine ("This is an error message") .
together with
prowin32 2> stderr.out
Progress doesn't provide a way to write to stderr - the easiest way I can think of is to output-through an external program that takes stdin and echoes it to stderr.
You could look into LOG-MANAGER:WRITE-MESSAGE. It won't log to standard output or standard error, but to a client-specific log. This log should be monitored in any case (specifically if the client is an application server).
From the documentation:
For an interactive or batch client, the WRITE-MESSAGE( ) method writes the log entries to the log file specified by the LOGFILE-NAME attribute or the Client Logging (-clientlog) startup parameter. For WebSpeed agents and AppServer servers, the WRITE-MESSAGE() method writes the log entries to the server log file. For DataServers, the WRITE-MESSAGE() method writes the log entries to the log file specified by the DataServer Logging (-dslog) startup parameter.
LOG-MANAGER:WRITE-MESSAGE("Got here, x=" + STRING(x), "DEBUG1").
Will write this in the log:
[04/12/05#13:19:19.742-0500] P-003616 T-001984 1 4GL DEBUG1 Got here, x=5
There are quite a lot of options regarding the LOG-MANAGER system, what messages to display, where the file is placed, etc.
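As an illustration only (the file name and logging level below are arbitrary), you can point the client log at a known file and set the level before writing:
/* Sketch: direct the client log to a specific file, then write to it. */
DEFINE VARIABLE x AS INTEGER NO-UNDO INITIAL 5.
LOG-MANAGER:LOGFILE-NAME  = "client-errors.log".
LOG-MANAGER:LOGGING-LEVEL = 2.
LOG-MANAGER:WRITE-MESSAGE("Got here, x=" + STRING(x), "DEBUG1").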
There is no easy way, but in Unixen you can always do something like this using OUTPUT THROUGH (untested):
output through "cat >&2" no-echo unbuffered.
Alternatively -- and this is tested -- if you just want error messages from a batch-mode program to go to standard out then
output through "tee" ...
...definitely works.

Cloud Dataflow: java.lang.IllegalStateException: no evaluator registered for GroupedValues

I'm getting the following exception when running the pipeline locally. There is no exception when submitting for cloud execution.
INFO: Executing pipeline using the DirectPipelineRunner.
Exception in thread "main" java.lang.IllegalStateException: no evaluator registered for GroupedValues [GroupedValues]
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:606)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:583)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
at app.Main.main(Main.java:124)
The code outline is basically this:
PCollection<KV<MyKey, Iterable<MyValue>>> groupedByMyKey = ...
PCollection<KV<MyKey, MyAggregated>> aggregated = groupedByMyKey.apply(
    Combine.<MyKey, MyValue, MyAggregated>groupedValues(new Aggregator()));
Aggregator class extends CombineFn<MyValue, List<MyValue>, MyAggregated>
Can you share a code snippet that triggers this? GroupedValues is a PTransform that is often used within various combining transforms, so it might be from using something like Min, Max, etc.
The error means that the DirectPipelineRunner doesn't know how to evaluate a GroupedValues. However, that's unexpected, since that should have been expanded into a ParDo before execution.
I found the reason for this behaviour.
I was using a command-line argument to run it in remote mode (--runner=BlockingDataflowPipelineRunner) and then forced it to run locally with:
PipelineRunner<?> runner = DirectPipelineRunner.fromOptions(options);
runner.run(p);
After removing these lines and just using the --runner=DirectPipelineRunner argument it worked as expected.
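In other words, create the pipeline straight from the parsed options and let the --runner argument choose the runner, rather than constructing one by hand. A rough sketch using the same (old) Dataflow SDK classes that appear in the stack trace:
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

// Inside main(String[] args): --runner=DirectPipelineRunner or
// --runner=BlockingDataflowPipelineRunner now decides where the pipeline runs.
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);
// ... apply transforms, including Combine.groupedValues(new Aggregator()) ...
p.run();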
