The following pipeline works with DirectRunner but raises the exception below with DataflowRunner.
How do I go about debugging such errors? They seem pretty opaque to me.
import apache_beam as beam

# project, staging_location, temp_location, output_gcs and query are defined elsewhere.
p = beam.Pipeline("DataflowRunner", argv=[
    '--project', project,
    '--staging_location', staging_location,
    '--temp_location', temp_location,
    '--output', output_gcs
])
(p
 | 'read events' >> beam.io.Read(beam.io.BigQuerySource(query=query, use_standard_sql=True))
 | 'write' >> beam.io.WriteToText(output_gcs)
)
p.run().wait_until_finish()
raises
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 578, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 165, in execute
op.start()
File "dataflow_worker/operations.py", line 350, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:13064)
def start(self):
File "dataflow_worker/operations.py", line 351, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:12958)
with self.scoped_start_state:
File "dataflow_worker/operations.py", line 356, in dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:12159)
pickler.loads(self.spec.serialized_fn))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 212, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
return load(file)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
obj = pik.load()
File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
klass = self.find_class(module, name)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
return StockUnpickler.find_class(self, module, name)
File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
__import__(module)
ImportError: No module named options.value_provider
value_provider is a module that was recently introduced to handle templates in the Python SDK. However, I don't see any template in your snippet, so this is probably a package mismatch. Are you using matching versions of the SDK and the worker? You can check the worker-startup logs to see which package versions are installed.
Same issue here. As Maria pointed out, it's a version mismatch between the apache_beam and google-cloud-dataflow packages.
To be clear, the following command solves it:
pip2 install --upgrade apache_beam google-cloud-dataflow
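To confirm which versions are installed locally before comparing them with the worker-startup logs, something like the following can help. It is a minimal sketch that assumes Python 3.8+ (for importlib.metadata); installed_versions is a hypothetical helper name, and the distribution names are the ones from the pip command above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["apache-beam", "google-cloud-dataflow"]
    ).items():
        print(name, version)
```

If the two versions reported locally differ from what the worker logs show, that points to the mismatch described above.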
I am trying to deploy a custom PyTorch-based web app on Streamlit. Everything works locally, but when it is deployed I find the following error in the logs:
Downloading: "https://github.com/ultralytics/yolov5/archive/master.zip" to /home/appuser/.cache/torch/hub/master.zip
2022-08-14 12:23:17.584 Uncaught app exception
Traceback (most recent call last):
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/scriptrunner/script_runner.py", line 557, in _run_script
exec(code, module.__dict__)
File "app.py", line 28, in <module>
model = torch.hub.load('ultralytics/yolov5', 'custom', path=run_model_path)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/hub.py", line 339, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/hub.py", line 368, in _load_local
model = entry(*args, **kwargs)
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 74, in custom
return _create(path, autoshape=autoshape, verbose=_verbose, device=device)
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 31, in _create
from models.common import AutoShape, DetectMultiBackend
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 14, in <module>
import cv2
File "/home/appuser/venv/lib/python3.9/site-packages/cv2/__init__.py", line 5, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Here's what the requirements.txt contains:
pillow<=9.2.0
numpy<=1.21.0
streamlit==1.11.0
torch<=1.8.2+cu111
opencv<=4.5.1
Would love your assistance.
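If you're deploying the app, opencv-python-headless would be the appropriate package rather than opencv. The headless build does not depend on GUI libraries such as libGL, which is the library the traceback reports as missing.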
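To check whether the deployment environment has libGL at all, a quick probe like the one below can be run before importing cv2. It is only a rough sketch: libgl_available is a hypothetical helper name, and ctypes.util.find_library does not search exactly the way the runtime loader does, so treat the result as a hint.

```python
import ctypes.util


def libgl_available():
    """Return True if ctypes can locate a libGL shared library on this machine."""
    return ctypes.util.find_library("GL") is not None


if __name__ == "__main__":
    if libgl_available():
        print("libGL found; the regular OpenCV build should import")
    else:
        print("libGL not found; consider opencv-python-headless")
```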
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
from tkinter import *
from pytube import *
##just a title
root =Tk()
root.title('Youtube Downloader')
##label at the top of
ytdLabel= Label(root,text='Enter URL of the video',font=('jost',15))
ytdLabel.pack()
##entry bar
enterURL=Entry(root,width=30)
enterURL.pack()
##
def URLDownloader():
    myvid=(str(enterURL.get()))
    video=YouTube(myvid)
    video=video.streams.get_highest_resolution()
    video.download()
dwnloadBtn=Button(root,text='Download',command=URLDownloader)
dwnloadBtn.pack()
root.mainloop()
Exception in Tkinter callback
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tkinter/__init__.py", line 1892, in __call__
return self.func(*args)
File "/Users/jordanshodeinde/Desktop/Youtube downloader progression/youtube dowloader.py", line 25, in URLDownloader
video=YouTube(myvid)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytube/__main__.py", line 91, in __init__
self.prefetch()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytube/__main__.py", line 181, in prefetch
self.vid_info_raw = request.get(self.vid_info_url)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytube/request.py", line 36, in get
return _execute_request(url).read().decode("utf-8")
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytube/request.py", line 24, in _execute_request
return urlopen(request) # nosec
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 555, in error
result = self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 747, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
This error has nothing to do with Tkinter; it is a known bug in the pytube package.
I'm not sure which version of pytube you're using, but this issue tends to be fixed by a new pytube release and then resurface some time later.
You can try pip install pytube==10.9.2, which is the latest version at the time of writing, or python -m pip install --upgrade pytube. Hopefully that resolves the issue you're facing.
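Whichever pytube version is installed, the button callback can catch the HTTP error so that a failed request produces a readable message instead of a traceback. A minimal sketch, assuming the download step is passed in as a callable; try_download and its parameters are hypothetical names, not part of pytube.

```python
import urllib.error


def try_download(url, download):
    """Run download(url); return None on success or a message on an HTTP error."""
    try:
        download(url)
    except urllib.error.HTTPError as exc:
        # exc.code is the HTTP status (e.g. 404), exc.reason the status text.
        return "HTTP error {}: {}".format(exc.code, exc.reason)
    return None
```

In the script from the question, the callable could wrap the existing calls, for example lambda u: YouTube(u).streams.get_highest_resolution().download(), and the returned message could be shown in a Label.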
I wrote a Python Dataflow job that successfully processed 300 files. Unfortunately, when I run it on 400 files, it gets stuck in the Write phase forever.
The logs aren't really helpful, but I think the issue comes from the writing logic. Initially I wanted only one output file, so I wrote:
| 'Write' >> beam.io.WriteToText(
known_args.output,
file_name_suffix=".json",
num_shards=1,
shard_name_template=""
))
Then I removed num_shards=1 and shard_name_template="" and was able to process more files, but the job would still get stuck.
Extra information:
The files to process are small, less than 1 MB each.
Also, after removing the num_shards and shard_name_template fields, I noticed that the data was written to a temporary folder in the target path, but the job never finishes.
I get the following DEADLINE_EXCEEDED exception. I tried to solve it by increasing --num_workers to 6 and --disk_size_gb to 30, but that didn't help.
Error message from worker: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 638, in do_work work_executor.execute() File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute op.start() File "dataflow_worker/shuffle_operations.py", line 63, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start File "dataflow_worker/shuffle_operations.py", line 64, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start File "dataflow_worker/shuffle_operations.py", line 79, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start File "dataflow_worker/shuffle_operations.py", line 80, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start File "dataflow_worker/shuffle_operations.py", line 82, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 441, in __iter__ for entry in entries_iterator: File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 282, in __next__ return next(self.iterator) File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 240, in __iter__ chunk, next_position = self.reader.Read(start_position, end_position) File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 133, in shuffle_client.PyShuffleReader.Read OSError: Shuffle read failed: b'DEADLINE_EXCEEDED: (g)RPC timed out when extract-fields-three-mont-10090801-dlaj-harness-fj4v talking to extract-fields-three-mont-10090801-dlaj-harness-1f7r:12346. Server unresponsive (ping error: Deadline Exceeded, {"created":"#1602260204.931126454","description":"Deadline Exceeded","file":"third_party/grpc/src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}). 
Typically one can self manage this issue, please read: https://cloud.google.com/dataflow/docs/guides/common-errors#tsg-rpc-timeout'
Can you please recommend ways to troubleshoot this type of issue?
After trying to add more resources without success, I got it working by enabling the Dataflow Shuffle service. See the Dataflow Shuffle documentation for details.
Just add --experiments=shuffle_mode=service to your PipelineOptions.
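As a rough illustration, the flag can be appended to the argument list that is handed to PipelineOptions. This is only a sketch: dataflow_args and its parameters are hypothetical names, and the project and bucket values in the usage note are placeholders.

```python
def dataflow_args(project, temp_location, use_shuffle_service=True):
    """Build a list of pipeline flags, optionally enabling the shuffle service."""
    args = [
        "--runner=DataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
    ]
    if use_shuffle_service:
        # The flag from the answer above.
        args.append("--experiments=shuffle_mode=service")
    return args


# With apache_beam installed, the list can be passed on like this:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(dataflow_args("my-project", "gs://my-bucket/tmp"))
```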
I am trying to configure OAUTH_PROVIDERS for Superset using Ambari, and the service restart fails with:
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 184, in <module>
Superset().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 971, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 133, in stop
self.configure(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 90, in configure
user=params.superset_user)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /usr/hdp/current/superset/conf/superset-env.sh ; /usr/hdp/current/superset/bin/superset db upgrade' returned 1. Loaded your LOCAL configuration at [/usr/hdp/current/superset/conf/superset_config.py]
Traceback (most recent call last):
File "/usr/hdp/current/superset/bin/superset", line 12, in <module>
from superset.cli import manager
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/superset/__init__.py", line 180, in <module>
update_perms=utils.get_update_perms_flag(),
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/base.py", line 135, in __init__
self.init_app(app, session)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/base.py", line 156, in init_app
self.sm = self.security_manager_class(self)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/security/sqla/manager.py", line 39, in __init__
super(SecurityManager, self).__init__(appbuilder)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/security/manager.py", line 199, in __init__
provider_name = _provider['name']
TypeError: string indices must be integers
I am able to set up Superset OAuth without Ambari, but I am struggling to configure it through Ambari: even if I change superset_config.py by hand, Ambari overwrites the file when the service is restarted.
I have never used Superset with Ambari, but I am currently struggling with it in standalone mode because of the lack of proper documentation and practical use cases.
To my understanding, for Superset to read superset_config.py you need to export PYTHONPATH and point it to the folder where the config is placed.
For example: export PYTHONPATH=/<folder where the config is placed>/:$PYTHONPATH
If you get this right, you should see something like this in Superset's logs:
Loaded your LOCAL configuration at [/<folder where the config is>/superset_config.py]
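To verify that the PYTHONPATH change took effect, you can ask Python where it would import superset_config from. A small sketch using only the standard library; locate_module is a hypothetical helper name, and it should be run with the same environment as the Superset process.

```python
import importlib.util


def locate_module(name="superset_config"):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    path = locate_module()
    if path is None:
        print("superset_config is not on sys.path; check PYTHONPATH")
    else:
        print("superset_config would be loaded from", path)
```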
ubuntu@ubuntu-02:/reddit/r2$ paster serve --reload example.ini http_port=8080
Starting subprocess with file monitor
/usr/local/lib/python2.6/dist-packages/Pylons-0.9.6.2-py2.6.egg/pylons/middleware.py:11: DeprecationWarning: The webhelpers.rails package is deprecated.
- Please begin migrating to the new helpers in webhelpers.html,
webhelpers.text, webhelpers.number, etc.
- Import url_for() directly from routes, and redirect_to() from
pylons.controllers.util (if using Pylons) or from routes.
- All Javascript support has been deprecated. You can write link_to_remote()
yourself or use one of the third-party Javascript libraries.
from webhelpers.rails.asset_tag import javascript_path
/reddit/r2/r2/lib/manager/tp_manager.py:22: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
import pylons, sha
Traceback (most recent call last):
File "/usr/local/bin/paster", line 8, in <module>
load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')()
File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 84, in run
invoke(command, command_name, options, args[1:])
File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 123, in invoke
exit_code = runner.run(args)
File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/command.py", line 218, in run
result = self.command()
File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/serve.py", line 276, in command
relative_to=base, global_conf=vars)
File "/usr/local/lib/python2.6/dist-packages/PasteScript-1.7.3-py2.6.egg/paste/script/serve.py", line 313, in loadapp
**kw)
File "/usr/local/lib/python2.6/dist-packages/PasteDeploy-1.3.4-py2.6.egg/paste/deploy/loadwsgi.py", line 203, in loadapp
return loadobj(APP, uri, name=name, **kw)
File "/usr/local/lib/python2.6/dist-packages/PasteDeploy-1.3.4-py2.6.egg/paste/deploy/loadwsgi.py", line 224, in loadobj
return context.create()
File "/usr/local/lib/python2.6/dist-packages/PasteDeploy-1.3.4-py2.6.egg/paste/deploy/loadwsgi.py", line 617, in create
return self.object_type.invoke(self)
File "/usr/local/lib/python2.6/dist-packages/PasteDeploy-1.3.4-py2.6.egg/paste/deploy/loadwsgi.py", line 109, in invoke
return fix_call(context.object, context.global_conf, **context.local_conf)
File "/usr/local/lib/python2.6/dist-packages/PasteDeploy-1.3.4-py2.6.egg/paste/deploy/util/fixtypeerror.py", line 57, in fix_call
val = callable(*args, **kw)
File "/reddit/r2/r2/config/middleware.py", line 558, in make_app
load_environment(global_conf, app_conf)
File "/reddit/r2/r2/config/environment.py", line 54, in load_environment
config['pylons.g'] = app_globals.Globals(global_conf, app_conf, paths)
File "/reddit/r2/r2/lib/app_globals.py", line 173, in __init__
self.memcache = CMemcache(self.memcaches, num_clients = num_mc_clients)
File "/reddit/r2/r2/lib/cache.py", line 108, in __init__
client.behaviors.update(behaviors)
File "build/bdist.linux-x86_64/egg/pylibmc.py", line 105, in update
File "build/bdist.linux-x86_64/egg/pylibmc.py", line 172, in set_behaviors
_pylibmc.MemcachedError: memcached_behavior_set returned 45
I'm an absolute noob when it comes to running web services. I've just started learning Linux, and I only know T-SQL and ActionScript 2, so suffice it to say that I'm a bit out of my depth here.
I know there are issues with various versions of python-webhelpers, and at least libmemcached, and at this point, I'm pretty stuck. I'm not great at Linux, so I'm never sure what version of a program I've got installed, and neither am I sure which versions are the working versions for what's in the git repository at the moment. What I'd like to do would be to uninstall libmemcached and webhelpers, and reinstall to the correct version. I get the feeling that doing this would require me to re-do much of the process, which is fine, provided it works.
Any help on how to resolve this error would be MUCH appreciated. I've gotten a lot of help previously from answered questions on this site, and I'm hoping someone much smarter than me has the answer to this one!
I encountered this exact same problem, and downgrading to libmemcached-0.48 fixed it.
Make sure you rebuild pylibmc after you do this.
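After rebuilding, it may help to confirm which versions Python actually picks up. The sketch below is an assumption-laden check: pylibmc_build_info is a hypothetical helper, and the attribute names __version__ and libmemcached_version may not exist in older pylibmc releases, so they are read defensively.

```python
def pylibmc_build_info():
    """Return version info for pylibmc, or None if it cannot be imported."""
    try:
        import pylibmc
    except ImportError:
        return None
    return {
        # Both attributes are read with a fallback in case this build lacks them.
        "pylibmc": getattr(pylibmc, "__version__", "unknown"),
        "libmemcached": getattr(pylibmc, "libmemcached_version", "unknown"),
    }


if __name__ == "__main__":
    print(pylibmc_build_info())
```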