Use different Logs in waf

I added a custom logger to the waf build command, as shown in the wscript below; the output it produces follows.
I want the normal waf output to be printed on the terminal, while the normal output plus the custom logger output is written to the log file. Is this possible? The problem is that the normal output is usually sufficient, and the extra verbosity from the custom logger slows the build down considerably.
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
top = '.'
out = 'build'
VERSION = '0.0.0'
APPNAME = 'app'
from waflib import Configure, Logs
import logging
Configure.autoconfig = True
def options(opt):
    opt.load('compiler_c')

def configure(conf):
    conf.load('compiler_c')
    conf.path.make_node('main.c').write(
        '#include <stdio.h>\n\nint main(int argc, char* argv[]) {\n return 0;\n}\n')
def build(bld):
    import sys
    import os
    log_file = os.path.join(out, 'build.log')
    bld.logger = Logs.make_logger(log_file, out)
    hdlr = logging.StreamHandler(sys.stdout)
    formatter = logging.Formatter('%(message)s')
    hdlr.setFormatter(formatter)
    bld.logger.addHandler(hdlr)
    bld.program(target='app', source='main.c')
The produced output is like this:
$ python waf
Configuring the project
Setting top to : /cygdrive/d/log
Setting out to : /cygdrive/d/log/build
Checking for 'gcc' (C compiler) : /usr/bin/gcc
Waf: Entering directory `/cygdrive/d/log/build'
[1/2] Compiling main.c
['/usr/bin/gcc', '../main.c', '-c', '-o/cygdrive/d/log/build/main.c.1.o']
[2/2] Linking build/app.exe
['/usr/bin/gcc', '-Wl,--enable-auto-import', 'main.c.1.o', '-o/cygdrive/d/log/build/app.exe', '-Wl,-Bstatic', '-Wl,-Bdynamic']
Waf: Leaving directory `/cygdrive/d/log/build'
'build' finished successfully (0.473s)
What I need is a terminal output like this:
[1/2] Compiling main.c
[2/2] Linking build/app.exe
and a log file output like this:
[1/2] Compiling [32mmain.c[0m
['/usr/bin/gcc', '../main.c', '-c', '-o/cygdrive/d/log/build/main.c.1.o']
[2/2] Linking [33mbuild/app.exe[0m
['/usr/bin/gcc', '-Wl,--enable-auto-import', 'main.c.1.o', '-o/cygdrive/d/log/build/app.exe', '-Wl,-Bstatic', '-Wl,-Bdynamic']
Bonus: How can the non-printable characters be removed from the log file output only?
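One possible direction, sketched below with the plain logging module rather than waf's helpers: give each handler its own level and attach an ANSI-stripping formatter only to the file handler. The StripAnsiFormatter class, the ANSI_RE pattern and the logger/file names are illustrative, and whether waf actually emits the extra command lines at a level below the normal progress output would need to be verified.

# Sketch only, not waf-specific: per-handler levels, plus a formatter that
# strips ANSI colour codes before writing to the log file.
import logging
import re
import sys

ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')

class StripAnsiFormatter(logging.Formatter):
    """Remove ANSI escape sequences from the formatted message."""
    def format(self, record):
        msg = logging.Formatter.format(self, record)
        return ANSI_RE.sub('', msg)

logger = logging.getLogger('build')
logger.setLevel(logging.DEBUG)

# Terminal: only the normal progress messages.
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)
console.setFormatter(logging.Formatter('%(message)s'))

# Log file: everything, with the colour codes stripped.
logfile = logging.FileHandler('build.log')
logfile.setLevel(logging.DEBUG)
logfile.setFormatter(StripAnsiFormatter('%(message)s'))

logger.addHandler(console)
logger.addHandler(logfile)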

Related

How to Compile Agda Hello World on Nixos?

On NixOS, I am trying to compile the hello-world example listed in the Agda documentation.
In my working directory, I have the following:
The hello-world agda program, hello-world.agda:
module hello-world where
open import IO
main = run (putStrLn "Hello, World!")
A nix shell file, shell.nix:
{ pkgs ? import <nixpkgs> { } }:
with pkgs;
mkShell {
  buildInputs = [
    (agda.withPackages (ps: [
      ps.standard-library
    ]))
  ];
}
To enter a shell with the standard-library dependency available, I ran $ nix-shell shell.nix.
Then, trying to compile the program, I ran $ agda --compile hello-world.agda, as advised by the linked agda hello world documentation.
But that gave me the following error:
$ agda --compile hello-world.agda
Checking hello-world (/home/matthew/backup/projects/agda-math/hello-world.agda).
/home/matthew/backup/projects/agda-math/hello-world.agda:3,1-15
Failed to find source of module IO in any of the following
locations:
/home/matthew/backup/projects/agda-math/IO.agda
/home/matthew/backup/projects/agda-math/IO.lagda
/nix/store/7pg293b76ppv2rw2saf5lcbckn6kdy7z-Agda-2.6.2.2-data/share/ghc-9.0.2/x86_64-linux-ghc-9.0.2/Agda-2.6.2.2/lib/prim/IO.agda
/nix/store/7pg293b76ppv2rw2saf5lcbckn6kdy7z-Agda-2.6.2.2-data/share/ghc-9.0.2/x86_64-linux-ghc-9.0.2/Agda-2.6.2.2/lib/prim/IO.lagda
when scope checking the declaration
open import IO
It seems it should be finding the standard library, since I'm running from the nix-shell with agda's standard-library specified, but that error on open import IO looks like the standard library is somehow still not found.
Any idea what the problem is likely to be?
Or what else I can do to get agda working on nixos?
You might need to create a defaults file in AGDA_DIR (which typically refers to ~/.agda/ unless overridden through an environment variable) listing the libraries you want to make available to your program:
echo standard-library >> ~/.agda/defaults
which should make the compiler automatically load the standard library.
Alternatively, you can pass them in on the command-line:
agda -l standard-library
or use a project-local .agda-lib file as follows:
name: my-library
depend: standard-library
include: -- Optionally specify include paths
You should not need to specify the include paths with Nix, but in case you do, you can use the -i command line flag or add the path to the standard-library.agda-lib file in ~/.agda/libraries.

How to refer to custom derivation in shell.nix?

I'm very new to Nix. I'd like to refer to project scripts in my shell.nix file, so that when I cd into my project directory I can refer to them by name, and I can keep them up-to-date whenever the sources change.
To learn how to do this, I created a very simple derivation for a shell script. Eventually I'd like to use other languages, but I'm starting simple. It looks like this:
project
  nix
    myScript
      default.nix
      builder.sh
  shell.nix
# default.nix
{ pkgs ? import <nixpkgs> {} }:
pkgs.stdenv.mkDerivation rec {
  name = "myScript";
  echo = pkgs.coreutils + "/bin/echo";
  builder = "${pkgs.coreutils}/bin/bash";
  args = [ ./builder.sh ];
}
# builder.sh
$echo "$echo Hello world" > $out
When I run nix-build myScript.nix it creates a symlinked result file that looks like this:
/nix/store/3mfkgajns47hfv0diihzi2scwl4hm2fl-coreutils-9.1/bin/echo Hello world
I tried referencing this in my shell.nix file like this:
{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/bf972dc380f36a3bf83db052380e55f0eaa7dcb6.tar.gz") {} }:
let
  myScript = import ./myScript {};
in
pkgs.mkShell {
  buildInputs = [
    myScript
  ];
  shellHook = ''
    echo Loading shell.nix
  '';
}
But whenever I enter the project directory and run the command myScript, I get an error:
zsh: command not found: myScript
I already have direnv configured correctly, which I can confirm by adding other shell tools and checking their versions. So it's something wrong with my nix files.
I'm almost certainly doing something wrong here. I know I can simplify this with pkgs.writeShellScriptBin, but the shell script is just a minimal example of what I want to get working. Eventually I'd use more complex derivations.
What I think is wrong
I think the myScript derivation or builder is doing something wrong. It does create the expected output file (i.e. I can chmod +x and run it, and it works) but I suspect I need to tell nix how to run it? I'm not sure. And also I might be importing the derivation incorrectly.
This is a problem with your default.nix, not your shell.nix.
For mkShell to work with buildInputs as you intend, you need $out to be a directory with an $out/bin/myScript, not a file on its own. nixpkgs has a helper that will do this for you, in https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/trivial-builders.nix --
# default.nix; no builder.sh needed
{ pkgs ? import <nixpkgs> {} }:
pkgs.writeShellScriptBin "myScript" ''
  echo "Hello world" # use the bash-builtin echo, not the external coreutils one
''

Can't pass in Requirements.txt for Dataflow

I've been trying to deploy a pipeline on Google Cloud Dataflow. It's been quite a challenge so far.
I'm facing an import issue: I realised that ParDo functions require a requirements.txt to be present, otherwise the workers report that they can't find the required modules. https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
So I tried fixing the problem by passing in the requirements.txt file, only to be met with a very incomprehensible error message.
import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from apache_beam.runners import DataflowRunner
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.options import pipeline_options
from apache_beam.options.pipeline_options import GoogleCloudOptions
import google.auth
from google.cloud.bigtable.row import DirectRow
import datetime
# Setting up the Apache Beam pipeline options.
options = pipeline_options.PipelineOptions(flags=[])
# Sets the project to the default project in your current Google Cloud environment.
_, options.view_as(GoogleCloudOptions).project = google.auth.default()
# Sets the Google Cloud Region in which Cloud Dataflow runs.
options.view_as(GoogleCloudOptions).region = 'us-central1'
# IMPORTANT! Adjust the following to choose a Cloud Storage location.
dataflow_gcs_location = 'gs://tunnel-insight-2-0-dev-291100/dataflow'
# Dataflow Staging Location. This location is used to stage the Dataflow Pipeline and SDK binary.
options.view_as(GoogleCloudOptions).staging_location = '%s/staging' % dataflow_gcs_location
# Sets the pipeline mode to streaming, so we can stream the data from PubSub.
options.view_as(pipeline_options.StandardOptions).streaming = True
# Sets the requirements.txt file
options.view_as(pipeline_options.SetupOptions).requirements_file = "requirements.txt"
# Dataflow Temp Location. This location is used to store temporary files or intermediate results before finally outputting to the sink.
options.view_as(GoogleCloudOptions).temp_location = '%s/temp' % dataflow_gcs_location
# The directory to store the output files of the job.
output_gcs_location = '%s/output' % dataflow_gcs_location
ib.options.recording_duration = '1m'
...
...
pipeline_result = DataflowRunner().run_pipeline(p, options=options)
I tried to pass in the requirements using options.view_as(pipeline_options.SetupOptions).requirements_file = "requirements.txt" and I get this error:
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/utils/processes.py in check_output(*args, **kwargs)
90 try:
---> 91 out = subprocess.check_output(*args, **kwargs)
92 except OSError:
/opt/conda/lib/python3.7/subprocess.py in check_output(timeout, *popenargs, **kwargs)
410 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 411 **kwargs).stdout
412
/opt/conda/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
511 raise CalledProcessError(retcode, process.args,
--> 512 output=stdout, stderr=stderr)
513 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['/root/apache-beam-custom/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', 'requirements.txt', '--exists-action', 'i', '--no-binary', ':all:']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-12-f018e5c84d08> in <module>
----> 1 pipeline_result = DataflowRunner().run_pipeline(p, options=options)
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py in run_pipeline(self, pipeline, options)
491 environments.DockerEnvironment.from_container_image(
492 apiclient.get_container_image_from_options(options),
--> 493 artifacts=environments.python_sdk_dependencies(options)))
494
495 # This has to be performed before pipeline proto is constructed to make sure
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/transforms/environments.py in python_sdk_dependencies(options, tmp_dir)
624 options,
625 tmp_dir,
--> 626 skip_prestaged_dependencies=skip_prestaged_dependencies))
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/runners/portability/stager.py in create_job_resources(options, temp_dir, build_setup_args, populate_requirements_cache, skip_prestaged_dependencies)
178 populate_requirements_cache if populate_requirements_cache else
179 Stager._populate_requirements_cache)(
--> 180 setup_options.requirements_file, requirements_cache_path)
181 for pkg in glob.glob(os.path.join(requirements_cache_path, '*')):
182 resources.append((pkg, os.path.basename(pkg)))
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/utils/retry.py in wrapper(*args, **kwargs)
234 while True:
235 try:
--> 236 return fun(*args, **kwargs)
237 except Exception as exn: # pylint: disable=broad-except
238 if not retry_filter(exn):
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/runners/portability/stager.py in _populate_requirements_cache(requirements_file, cache_dir)
569 ]
570 _LOGGER.info('Executing command: %s', cmd_args)
--> 571 processes.check_output(cmd_args, stderr=processes.STDOUT)
572
573 #staticmethod
~/apache-beam-custom/packages/beam/sdks/python/apache_beam/utils/processes.py in check_output(*args, **kwargs)
97 "Full traceback: {} \n Pip install failed for package: {} \
98 \n Output from execution of subprocess: {}" \
---> 99 .format(traceback.format_exc(), args[0][6], error.output))
100 else:
101 raise RuntimeError("Full trace: {}, \
RuntimeError: Full traceback: Traceback (most recent call last):
File "/root/apache-beam-custom/packages/beam/sdks/python/apache_beam/utils/processes.py", line 91, in check_output
out = subprocess.check_output(*args, **kwargs)
File "/opt/conda/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/opt/conda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/root/apache-beam-custom/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', 'requirements.txt', '--exists-action', 'i', '--no-binary', ':all:']' returned non-zero exit status 1.
Pip install failed for package: -r
Output from execution of subprocess: b'Obtaining file:///root/apache-beam-custom/packages/beam/sdks/python (from -r requirements.txt (line 3))\n Saved /tmp/dataflow-requirements-cache/apache-beam-2.25.0.zip\nCollecting absl-py==0.11.0\n Downloading absl-py-0.11.0.tar.gz (110 kB)\n Saved /tmp/dataflow-requirements-cache/absl-py-0.11.0.tar.gz\nCollecting argon2-cffi==20.1.0\n Downloading argon2-cffi-20.1.0.tar.gz (1.8 MB)\n Installing build dependencies: started\n Installing build dependencies: finished with status \'error\'\n ERROR: Command errored out with exit status 1:\n command: /root/apache-beam-custom/bin/python /root/apache-beam-custom/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-3iuiaex9/overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- \'setuptools>=40.6.0\' wheel \'cffi>=1.0\'\n cwd: None\n Complete output (85 lines):\n Collecting setuptools>=40.6.0\n Downloading setuptools-51.1.1.tar.gz (2.1 MB)\n Collecting wheel\n Downloading wheel-0.36.2.tar.gz (65 kB)\n Collecting cffi>=1.0\n Downloading cffi-1.14.4.tar.gz (471 kB)\n Collecting pycparser\n Downloading pycparser-2.20.tar.gz (161 kB)\n Skipping wheel build for setuptools, due to binaries being disabled for it.\n Skipping wheel build for wheel, due to binaries being disabled for it.\n Skipping wheel build for cffi, due to binaries being disabled for it.\n Skipping wheel build for pycparser, due to binaries being disabled for it.\n Installing collected packages: setuptools, wheel, pycparser, cffi\n Running setup.py install for setuptools: started\n Running setup.py install for setuptools: finished with status \'done\'\n Running setup.py install for wheel: started\n Running setup.py install for wheel: finished with status \'done\'\n Running setup.py install for pycparser: started\n Running setup.py install for pycparser: finished with status \'done\'\n Running setup.py install for cffi: started\n Running setup.py install for cffi: finished with status \'error\'\n ERROR: Command errored out with exit status 1:\n command: /root/apache-beam-custom/bin/python -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'"\'"\'/tmp/pip-install-6zs5jguv/cffi/setup.py\'"\'"\'; __file__=\'"\'"\'/tmp/pip-install-6zs5jguv/cffi/setup.py\'"\'"\';f=getattr(tokenize, \'"\'"\'open\'"\'"\', open)(__file__);code=f.read().replace(\'"\'"\'\\r\\n\'"\'"\', \'"\'"\'\\n\'"\'"\');f.close();exec(compile(code, __file__, \'"\'"\'exec\'"\'"\'))\' install --record /tmp/pip-record-z8o69lka/install-record.txt --single-version-externally-managed --prefix /tmp/pip-build-env-3iuiaex9/overlay --compile --install-headers /root/apache-beam-custom/include/site/python3.7/cffi\n cwd: /tmp/pip-install-6zs5jguv/cffi/\n Complete output (56 lines):\n Package libffi was not found in the pkg-config search path.\n Perhaps you should add the directory containing `libffi.pc\'\n to the PKG_CONFIG_PATH environment variable\n No package \'libffi\' found\n Package libffi was not found in the pkg-config search path.\n Perhaps you should add the directory containing `libffi.pc\'\n to the PKG_CONFIG_PATH environment variable\n No package \'libffi\' found\n Package libffi was not found in the pkg-config search path.\n Perhaps you should add the directory containing `libffi.pc\'\n to the PKG_CONFIG_PATH environment variable\n No package \'libffi\' found\n Package libffi was not found in the pkg-config search path.\n Perhaps you should add the directory containing `libffi.pc\'\n to the 
PKG_CONFIG_PATH environment variable\n No package \'libffi\' found\n Package libffi was not found in the pkg-config search path.\n Perhaps you should add the directory containing `libffi.pc\'\n to the PKG_CONFIG_PATH environment variable\n No package \'libffi\' found\n running install\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/cffi\n copying cffi/setuptools_ext.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/pkgconfig.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/verifier.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/vengine_gen.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/backend_ctypes.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/__init__.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/cffi_opcode.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/error.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/api.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/commontypes.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/ffiplatform.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/lock.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/cparser.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/recompiler.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/vengine_cpy.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/model.py -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/_cffi_include.h -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/parse_c_type.h -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/_embedding.h -> build/lib.linux-x86_64-3.7/cffi\n copying cffi/_cffi_errors.h -> build/lib.linux-x86_64-3.7/cffi\n running build_ext\n building \'_cffi_backend\' extension\n creating build/temp.linux-x86_64-3.7\n creating build/temp.linux-x86_64-3.7/c\n gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DUSE__THREAD -DHAVE_SYNC_SYNCHRONIZE -I/usr/include/ffi -I/usr/include/libffi -I/root/apache-beam-custom/include -I/opt/conda/include/python3.7m -c c/_cffi_backend.c -o build/temp.linux-x86_64-3.7/c/_cffi_backend.o\n c/_cffi_backend.c:15:10: fatal error: ffi.h: No such file or directory\n #include <ffi.h>\n ^~~~~~~\n compilation terminated.\n error: command \'gcc\' failed with exit status 1\n ----------------------------------------\n ERROR: Command errored out with exit status 1: /root/apache-beam-custom/bin/python -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'"\'"\'/tmp/pip-install-6zs5jguv/cffi/setup.py\'"\'"\'; __file__=\'"\'"\'/tmp/pip-install-6zs5jguv/cffi/setup.py\'"\'"\';f=getattr(tokenize, \'"\'"\'open\'"\'"\', open)(__file__);code=f.read().replace(\'"\'"\'\\r\\n\'"\'"\', \'"\'"\'\\n\'"\'"\');f.close();exec(compile(code, __file__, \'"\'"\'exec\'"\'"\'))\' install --record /tmp/pip-record-z8o69lka/install-record.txt --single-version-externally-managed --prefix /tmp/pip-build-env-3iuiaex9/overlay --compile --install-headers /root/apache-beam-custom/include/site/python3.7/cffi Check the logs for full command output.\n WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.\n You should consider upgrading via the \'/root/apache-beam-custom/bin/python -m pip install --upgrade pip\' command.\n ----------------------------------------\nERROR: Command errored out with exit status 1: /root/apache-beam-custom/bin/python /root/apache-beam-custom/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix 
/tmp/pip-build-env-3iuiaex9/overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- \'setuptools>=40.6.0\' wheel \'cffi>=1.0\' Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.\nYou should consider upgrading via the \'/root/apache-beam-custom/bin/python -m pip install --upgrade pip\' command.\n'
Did I do something wrong?
-------------- EDIT---------------------------------------
Ok, I've got my pipeline to work, but I'm still having a problem with my requirements.txt file which I believe I'm passing in correctly.
My pipeline code:
import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from apache_beam.runners import DataflowRunner
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.options import pipeline_options
from apache_beam.options.pipeline_options import GoogleCloudOptions
import google.auth
from google.cloud.bigtable.row import DirectRow
import datetime
# Setting up the Apache Beam pipeline options.
options = pipeline_options.PipelineOptions(flags=[])
# Sets the project to the default project in your current Google Cloud environment.
_, options.view_as(GoogleCloudOptions).project = google.auth.default()
# Sets the Google Cloud Region in which Cloud Dataflow runs.
options.view_as(GoogleCloudOptions).region = 'us-central1'
# IMPORTANT! Adjust the following to choose a Cloud Storage location.
dataflow_gcs_location = ''
# Dataflow Staging Location. This location is used to stage the Dataflow Pipeline and SDK binary.
options.view_as(GoogleCloudOptions).staging_location = '%s/staging' % dataflow_gcs_location
# Sets the pipeline mode to streaming, so we can stream the data from PubSub.
options.view_as(pipeline_options.StandardOptions).streaming = True
# Sets the requirements.txt file
options.view_as(pipeline_options.SetupOptions).requirements_file = "requirements.txt"
# Dataflow Temp Location. This location is used to store temporary files or intermediate results before finally outputting to the sink.
options.view_as(GoogleCloudOptions).temp_location = '%s/temp' % dataflow_gcs_location
# The directory to store the output files of the job.
output_gcs_location = '%s/output' % dataflow_gcs_location
ib.options.recording_duration = '1m'
# The Google Cloud PubSub topic for this example.
topic = ""
subscription = ""
output_topic = ""
# Info
project_id = ""
bigtable_instance = ""
bigtable_table_id = ""
class CreateRowFn(beam.DoFn):
    def process(self, words):
        from google.cloud.bigtable.row import DirectRow
        import datetime
        direct_row = DirectRow(row_key="phone#4c410523#20190501")
        direct_row.set_cell(
            "stats_summary",
            b"os_build",
            b"android",
            datetime.datetime.now())
        return [direct_row]
p = beam.Pipeline(InteractiveRunner(), options=options)
words = p | "read" >> beam.io.ReadFromPubSub(subscription=subscription)
windowed_words = (words | "window" >> beam.WindowInto(beam.window.FixedWindows(10)))
# Writing to BigTable
test = words | beam.ParDo(CreateRowFn()) | WriteToBigTable(
    project_id=project_id,
    instance_id=bigtable_instance,
    table_id=bigtable_table_id)
pipeline_result = DataflowRunner().run_pipeline(p, options=options)
As you can see in CreateRowFn, I need to import
from google.cloud.bigtable.row import DirectRow
import datetime
inside the function; only then does this work.
I've passed in requirements.txt as options.view_as(pipeline_options.SetupOptions).requirements_file = "requirements.txt" and I can see it in the Dataflow console.
If I remove the import statements, I get "in process NameError: name 'DirectRow' is not defined".
Is there any way to overcome this?
I've found the answer in the FAQ. My mistake was not in how to pass in requirements.txt, but in how to handle NameErrors:
https://cloud.google.com/dataflow/docs/resources/faq
How do I handle NameErrors?
If you're getting a NameError when you execute your pipeline using the Dataflow service but not when you execute locally (i.e. using the DirectRunner), your DoFns may be using values in the global namespace that are not available on the Dataflow worker.
By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Dataflow job. If, for example, your DoFns are defined in the main file and reference imports and functions in the global namespace, you can set the --save_main_session pipeline option to True. This will cause the state of the global namespace to be pickled and loaded on the Dataflow worker.
Notice that if you have objects in your global namespace that cannot be pickled, you will get a pickling error. If the error is regarding a module that should be available in the Python distribution, you can solve this by importing the module locally, where it is used.
For example, instead of:
import re
…
def myfunc():
    # use re module
use:
def myfunc():
    import re
    # use re module
Alternatively, if your DoFns span multiple files, you should use a different approach to packaging your workflow and managing dependencies.
So the conclusion is:
It is OK to use import statements inside the functions.
Google Dataflow workers already have these packages installed: https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies.
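For reference, here is a minimal sketch of the save_main_session alternative described in the FAQ quote above; save_main_session is a real Beam pipeline option, while the surrounding lines are only illustrative.

# Sketch: keep the imports at module level and pickle the main session so the
# Dataflow workers can see them (alternative to importing inside the DoFn).
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True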
If you are running it from Cloud Composer, you need to add the new packages to PYPI PACKAGES in the Composer environment.
You can also pass --requirements_file path://requirements.txt as a flag in the command while running it.
I prefer to use the --setup_file path://setup.py flag instead. The format of the setup file is as follows:
import setuptools

REQUIRED_PACKAGES = [
    'joblib==0.15.1',
    'numpy==1.18.5',
    'google',
    'google-cloud',
    'google-cloud-storage',
    'cassandra-driver==3.22.0'
]

PACKAGE_NAME = 'my_package'
PACKAGE_VERSION = '0.0.1'

setuptools.setup(
    name=PACKAGE_NAME,
    version=PACKAGE_VERSION,
    description='Search Rank project',
    install_requires=REQUIRED_PACKAGES,
    author="Mohd Faisal",
    packages=setuptools.find_packages()
)
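As a side note, a minimal sketch of wiring this setup.py in programmatically, equivalent to passing the --setup_file flag; the relative path is illustrative:

# Sketch: point the pipeline options at the setup.py above instead of passing
# --setup_file on the command line.
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()
options.view_as(SetupOptions).setup_file = './setup.py'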
Use the format below for the Dataflow script:
from __future__ import absolute_import

import argparse
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import (GoogleCloudOptions,
                                                  PipelineOptions,
                                                  SetupOptions,
                                                  StandardOptions,
                                                  WorkerOptions)
from datetime import date


class Userprocess(beam.DoFn):
    def process(self, msg):
        yield "OK"


def run(argv=None):
    logging.info("Parsing dataflow flags... ")
    pipeline_options = PipelineOptions()
    pipeline_options.view_as(SetupOptions).save_main_session = True

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--project',
        required=True,
        help=('project id staging or production '))
    parser.add_argument(
        '--temp_location',
        required=True,
        help=('temp location'))
    parser.add_argument(
        '--job_name',
        required=True,
        help=('job name'))
    known_args, pipeline_args = parser.parse_known_args(argv)

    today = date.today()
    logging.info("Processing Date is " + str(today))

    google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    google_cloud_options.project = known_args.project
    google_cloud_options.job_name = known_args.job_name
    google_cloud_options.temp_location = known_args.temp_location
    # pipeline_options.view_as(StandardOptions).runner = known_args.runner

    with beam.Pipeline(argv=pipeline_args, options=pipeline_options) as p:
        beam.ParDo(Userprocess())


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    logging.info("Starting dataflow daily pipeline ")
    try:
        run()
    except:
        pass
Try running the script locally first to catch errors.

LLVM cannot find clang binary

I have just built and installed LLVM Clang 3.5.0 with compiler-rt. The clang binary seems to work, but it cannot build a simple test program:
$ cat hello.c
#include <stdio.h>
int main(int argc, char **argv) {
    printf("Hello World\n");
    return 0;
}
Building it borks with error: unable to execute command: Executable "" doesn't exist!
$ clang hello.c -o hello
error: unable to execute command: Executable "" doesn't exist!
Executable "" ? Interesting...
Debugging further reveals that clang tries to call itself to build the .o object file and then ld to link it, but apparently it does not know where its own executable lives.
$ clang -### hello.c -o hello
clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-alpine-linux-musl
Thread model: posix
"" "-cc1" "-triple" "x86_64-alpine-linux-musl" "-emit-obj" "-mrelax-all" "-disable-free" "-main-file-name" "hello.c" "-mrelocation-model" "static" "-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "x86-64" "-target-linker-version" "2.24" "-dwarf-column-info" "-resource-dir" "../lib/clang/3.5.0" "-internal-isystem" "/usr/local/include" "-internal-isystem" "../lib/clang/3.5.0/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdebug-compilation-dir" "/" "-ferror-limit" "19" "-fmessage-length" "158" "-mstackrealign" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-o" "/tmp/hello-37746e.o" "-x" "c" "hello.c"
"/usr/bin/ld" "-z" "relro" "--eh-frame-hdr" "-m" "elf_x86_64" "-dynamic-linker" "/lib/ld-musl-x86_64.so.1" "-o" "hello" "/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/../../../crt1.o" "/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/../../../crti.o" "/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/crtbegin.o" "-L/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3" "-L/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/../../../../x86_64-alpine-linux-musl/lib" "-L/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/../../.." "-L/../lib" "-L/lib" "-L/usr/lib" "/tmp/hello-37746e.o" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "-lc" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/crtend.o" "/usr/bin/../lib/gcc/x86_64-alpine-linux-musl/4.8.3/../../../crtn.o"
When I run the first command line myself, with /usr/bin/clang substituted as the first item, it builds just fine:
$ /usr/bin/clang "-cc1" "-triple" "x86_64-alpine-linux-musl" "-emit-obj" "-mrelax-all" "-disable-free" "-main-file-name" "hello.c" "-mrelocation-model" "static" "-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-target-linker-version" "2.24" "-dwarf-column-info" "-resource-dir" "../lib/clang/3.5.0" "-internal-isystem" "/usr/local/include" "-internal-isystem" "../lib/clang/3.5.0/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdebug-compilation-dir" "/" "-ferror-limit" "19" "-fmessage-length" "158" "-mstackrealign" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-o" "/tmp/hello-4f64bb.o" "-x" "c" "hello.c"
$
And the following /usr/bin/ld invocation links it just fine, resulting in:
$ ./hello
Hello World
Any suggestions as to what I screwed up during configure/build?
Analysis of the clang source shows that on Linux the clang program uses /proc/self/exe to find the real path of its own binary. I was running in a chroot without /proc mounted, which is why it failed.
mount -t proc proc /proc
solves the issue.
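As a quick, clang-independent illustration of that mechanism, any process can resolve its own binary through /proc/self/exe, and the lookup only works when /proc is mounted:

# Illustration only: /proc/self/exe resolves to the real path of the running
# binary, but the resolution fails in a chroot without /proc mounted.
import os
print(os.path.realpath('/proc/self/exe'))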

Which dart2js command is launched by DartEditor?

After reading the article at https://www.dartlang.org/articles/web-ui/tools.html, I tried to compile my application by following it.
My application, stored in web/app.html, can be successfully compiled to JavaScript under DartEditor using the "Run as JavaScript" command.
When I try to use the following command lines to perform a compilation for deployment in production, I run into an issue with a package part file that is not copied.
$ dart --package-root=packages/ packages/web_ui/dwc.dart --out /tmp/dart/ --no-rewrite-urls web/app.html
$ ls lib/app/
model_browser.dart model_server.dart
$ ls lib/app/src/model/
model_browser.dart model_server.dart model_shared.dart
$ cd /tmp/dart
$ dart2js -v app.html_bootstrap.dart --package-root=packages/ -oapp.html_bootstrap.dart.js
...
info: scanning library file:///private/tmp/dart/_from_packages/bm/model_browser.dart
_from_packages/app/model_browser.dart:12:1: Error: Cannot read "_from_packages/app/src/model/model_shared.dart" (OS Error: No such file or directory, errno = 2).
part 'src/model/model_shared.dart';
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
info: Error: compiler cancelled: Error: Cannot read "_from_packages/app/src/model/model_shared.dart" (OS Error: No such file or directory, errno = 2).
...
$ ls _from_packages/app/
model_browser.dart model_browser.dart.map
$ ls _from_packages/app/src/model/
model_browser.dart model_browser.dart.map
In fact, the model_shared.dart file isn't copied to /tmp/dart/_from_packages/app/src/model/model_shared.dart.
The content of model_browser.dart is the following:
library model;
import 'dart:json' as json;
import 'package:bm/i18n.dart' as i18n;
import 'package:logging/logging.dart';
import 'package:web_ui/web_ui.dart';
part 'src/model/model_shared.dart';
part 'src/model/model_browser.dart';
DartEditor can launch my application as JavaScript. What is the right command line for launching dart2js so that it takes the "part" statements of my "model" library into account?
The issue is linked to the --no-rewrite-urls option and the absolute path given to the --out option.
If I remove the --no-rewrite-urls option and use a relative path for --out, the dart2js compilation completes successfully.
