Superset OAuth integration config using Ambari error - oauth

I am trying to configure OAUTH_PROVIDERS using Ambari, and the Superset service restart fails with the following traceback:
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 184, in <module>
Superset().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 971, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 133, in stop
self.configure(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SUPERSET/package/scripts/superset.py", line 90, in configure
user=params.superset_user)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'source /usr/hdp/current/superset/conf/superset-env.sh ; /usr/hdp/current/superset/bin/superset db upgrade' returned 1. Loaded your LOCAL configuration at [/usr/hdp/current/superset/conf/superset_config.py]
Traceback (most recent call last):
File "/usr/hdp/current/superset/bin/superset", line 12, in <module>
from superset.cli import manager
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/superset/__init__.py", line 180, in <module>
update_perms=utils.get_update_perms_flag(),
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/base.py", line 135, in __init__
self.init_app(app, session)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/base.py", line 156, in init_app
self.sm = self.security_manager_class(self)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/security/sqla/manager.py", line 39, in __init__
super(SecurityManager, self).__init__(appbuilder)
File "/usr/hdp/3.1.4.0-315/superset/lib/python3.6/site-packages/flask_appbuilder/security/manager.py", line 199, in __init__
provider_name = _provider['name']
TypeError: string indices must be integers
I am able to set up Superset OAuth without Ambari, but I am struggling to make the config work in Ambari, because even if I make the change in superset_config.py by hand, Ambari overwrites superset_config.py when the service is restarted.
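For reference, the failing Flask-AppBuilder line (provider_name = _provider['name']) iterates over OAUTH_PROVIDERS and indexes each entry as a dict; "string indices must be integers" is exactly what you get when that value comes through as a plain string, which is what happens when Ambari renders the property as text. A minimal sketch of the list-of-dicts shape expected in superset_config.py for the flask-oauthlib-era Flask-AppBuilder that ships with this Superset build (Google is just an example provider; the client id and secret are placeholders):

from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH

# OAUTH_PROVIDERS must be a list of dicts, not a string.
OAUTH_PROVIDERS = [{
    'name': 'google',
    'icon': 'fa-google',
    'token_key': 'access_token',
    'remote_app': {
        'consumer_key': 'GOOGLE_CLIENT_ID',         # placeholder
        'consumer_secret': 'GOOGLE_CLIENT_SECRET',  # placeholder
        'base_url': 'https://www.googleapis.com/oauth2/v2/',
        'request_token_params': {'scope': 'email profile'},
        'request_token_url': None,
        'access_token_url': 'https://accounts.google.com/o/oauth2/token',
        'authorize_url': 'https://accounts.google.com/o/oauth2/auth'
    }
}]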

I have never used Superset with Ambari, but I am currently struggling with it in standalone usage due to the lack of proper documentation and practical use cases.
To my understanding, in order for Superset to read superset_config.py, you need to export PYTHONPATH and point it to the folder where the config file is placed.
For example: export PYTHONPATH=/<folder where the config is placed>/:$PYTHONPATH
If you get this right, you should see something like this in Superset's logs:
Loaded your LOCAL configuration at [/<folder where the config is>/superset_config.py]

Related

ImportError: Deploying custom PyTorch model on Streamlit

I am trying to deploy a custom PyTorch-based web app on Streamlit. Everything works locally; however, when deployed, I find the following error in the logs:
Downloading: "https://github.com/ultralytics/yolov5/archive/master.zip" to /home/appuser/.cache/torch/hub/master.zip
2022-08-14 12:23:17.584 Uncaught app exception
Traceback (most recent call last):
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/scriptrunner/script_runner.py", line 557, in _run_script
exec(code, module.__dict__)
File "app.py", line 28, in <module>
model = torch.hub.load('ultralytics/yolov5', 'custom', path=run_model_path)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/hub.py", line 339, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/torch/hub.py", line 368, in _load_local
model = entry(*args, **kwargs)
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 74, in custom
return _create(path, autoshape=autoshape, verbose=_verbose, device=device)
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/hubconf.py", line 31, in _create
from models.common import AutoShape, DetectMultiBackend
File "/home/appuser/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 14, in <module>
import cv2
File "/home/appuser/venv/lib/python3.9/site-packages/cv2/__init__.py", line 5, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Here's what the requirements.txt contains:
pillow<=9.2.0
numpy<=1.21.0
streamlit==1.11.0
torch<=1.8.2+cu111
opencv<=4.5.1
Would love your assistance.
If you're deploying the app, opencv-python-headless would be the appropriate package rather than opencv. The headless wheel skips the GUI bindings, so importing cv2 does not try to load libGL.so.1, which is precisely the shared library missing from your deployment environment.
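For example, a corrected requirements.txt might look like this, keeping the same pins as in the question (note that the usual PyPI names are opencv-python and opencv-python-headless, not plain opencv):

pillow<=9.2.0
numpy<=1.21.0
streamlit==1.11.0
torch<=1.8.2+cu111
opencv-python-headless<=4.5.1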

Cannot use sqlite with the LocalExecutor [Airflow]

I am trying to restart the Airflow scheduler using the following command:
airflow scheduler
I am using Docker. I went inside my Airflow container and opened a CLI for the Airflow image; that is where I ran this command.
It throws the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 25, in <module>
from airflow.configuration import conf
File "/usr/local/lib/python3.6/site-packages/airflow/__init__.py", line 31, in <module>
from airflow.utils.log.logging_mixin import LoggingMixin
File "/usr/local/lib/python3.6/site-packages/airflow/utils/__init__.py", line 24, in <module>
from .decorators import apply_defaults as _apply_defaults
File "/usr/local/lib/python3.6/site-packages/airflow/utils/decorators.py", line 36, in <module>
from airflow import settings
File "/usr/local/lib/python3.6/site-packages/airflow/settings.py", line 37, in <module>
from airflow.configuration import conf, AIRFLOW_HOME, WEBSERVER_CONFIG # NOQA F401
File "/usr/local/lib/python3.6/site-packages/airflow/configuration.py", line 731, in <module>
conf.read(AIRFLOW_CONFIG)
File "/usr/local/lib/python3.6/site-packages/airflow/configuration.py", line 421, in read
self._validate()
File "/usr/local/lib/python3.6/site-packages/airflow/configuration.py", line 213, in _validate
self._validate_config_dependencies()
File "/usr/local/lib/python3.6/site-packages/airflow/configuration.py", line 247, in _validate_config_dependencies
self.get('core', 'executor')))
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the LocalExecutor
I am looking for any way to restart the Airflow scheduler.
This is expected.
Since SQLite doesn't support multiple connections, it can only be used with the SequentialExecutor. This is also explained in the docs.
If you want to use the LocalExecutor, please set MySQL or PostgreSQL as the backend.
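For example, a minimal sketch of the relevant airflow.cfg settings (the connection string below is a placeholder for your own database):

[core]
executor = LocalExecutor
# SQLite cannot take concurrent connections; point Airflow at a real database.
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

The same values can also be set via the AIRFLOW__CORE__EXECUTOR and AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variables, which is often more convenient in Docker.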

Why does my Python Dataflow job get stuck at the Write phase?

I wrote a Python Dataflow job which managed to process 300 files; unfortunately, when I try to run it on 400 files it gets stuck in the Write phase forever.
The logs aren't really helpful, but I think the issue comes from the writing logic of the code. Initially, I only wanted one output file, so I wrote:
| 'Write' >> beam.io.WriteToText(
    known_args.output,
    file_name_suffix=".json",
    num_shards=1,
    shard_name_template=""
))
Then I removed num_shards=1 and shard_name_template="", and I was able to process more files, but the job would still get stuck.
Extra information:
The files to process are small, less than 1 MB each.
Also, when removing the num_shards and shard_name_template fields, I noticed that the data got written to a temporary folder in the target path, but the job never finishes.
I get the following DEADLINE_EXCEEDED exception, and I tried solving it by increasing --num_workers to 6 and --disk_size_gb to 30, but it doesn't work.
Error message from worker: Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 638, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
op.start()
File "dataflow_worker/shuffle_operations.py", line 63, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 64, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 79, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 80, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 82, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 441, in __iter__
for entry in entries_iterator:
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 282, in __next__
return next(self.iterator)
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/shuffle.py", line 240, in __iter__
chunk, next_position = self.reader.Read(start_position, end_position)
File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 133, in shuffle_client.PyShuffleReader.Read
OSError: Shuffle read failed: b'DEADLINE_EXCEEDED: (g)RPC timed out when extract-fields-three-mont-10090801-dlaj-harness-fj4v talking to extract-fields-three-mont-10090801-dlaj-harness-1f7r:12346. Server unresponsive (ping error: Deadline Exceeded, {"created":"#1602260204.931126454","description":"Deadline Exceeded","file":"third_party/grpc/src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}). Typically one can self manage this issue, please read: https://cloud.google.com/dataflow/docs/guides/common-errors#tsg-rpc-timeout'
Can you please recommend ways to troubleshoot this type of issue?
After trying to throw more resources at it, what finally fixed the situation was enabling the Dataflow Shuffle service; see the Dataflow Shuffle documentation for details.
Just add --experiments=shuffle_mode=service to your PipelineOptions.
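For illustration, a minimal sketch of passing that flag when building the options in Python (project, region, and bucket are placeholders, and this assumes a Beam/Dataflow setup where the shuffle service is not already the default):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Dataflow Shuffle moves the shuffle out of the worker VMs into a managed
# service, avoiding the worker-to-worker (g)RPC reads that were timing out.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',                # placeholder
    region='us-central1',                # placeholder
    temp_location='gs://my-bucket/tmp',  # placeholder
    experiments=['shuffle_mode=service'],
)

with beam.Pipeline(options=options) as pipeline:
    ...  # your existing transforms, ending in the WriteToText step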

Pipeline keeps running because of Dataflow runner not closing file system writer. NotImplementedError

I'm running a Dataflow job (Apache Beam 2.12.0) using Python on Google Cloud Platform. The pipeline is not terminating and keeps running.
The issue is the same as
https://issues.apache.org/jira/browse/BEAM-7266
It wasn't resolved, and the ticket says "open when met again". It also notes that the file writer is not being closed.
There's only one error log:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 178, in execute
op.finish()
File "dataflow_worker/native_operations.py", line 93, in dataflow_worker.native_operations.NativeWriteOperation.finish
def finish(self):
File "dataflow_worker/native_operations.py", line 94, in dataflow_worker.native_operations.NativeWriteOperation.finish
with self.scoped_finish_state:
File "dataflow_worker/native_operations.py", line 95, in dataflow_worker.native_operations.NativeWriteOperation.finish
self.writer.__exit__(None, None, None)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 465, in __exit__
self.file.close()
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line 202, in close
self._uploader.finish()
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", line 606, in finish
raise self._upload_thread.last_error # pylint: disable=raising-bad-type
NotImplementedError: offset: 0, whence: 0, position: 51518, last: 0

Google Cloud Storage auth failure when using boto and gcs-oauth2-boto-plugin

I'm following this tutorial on using boto to access Google Cloud Storage. I created a service account, downloaded the .p12 file, and generated the .boto config file using the gsutil config -e command. However, when I try to list the buckets in my Google Cloud Storage, I get the following error:
>>> import boto
>>> import gcs_oauth2_boto_plugin
>>> uri = boto.storage_uri('', 'gs')
>>> for b in uri.get_all_buckets():
...     print b.name
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/storage_uri.py", line 571, in get_all_buckets
return conn.get_all_buckets(headers)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/s3/connection.py", line 436, in get_all_buckets
response = self.make_request('GET', headers=headers)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/s3/connection.py", line 664, in make_request
retry_handler=retry_handler
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/connection.py", line 1053, in make_request
retry_handler=retry_handler)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/connection.py", line 911, in _mexe
request.authorize(connection=self)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/boto/connection.py", line 375, in authorize
connection._auth_handler.add_auth(self, **kwargs)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/gcs_oauth2_boto_plugin/oauth2_plugin.py", line 70, in add_auth
self.oauth2_client.GetAuthorizationHeader()
File "/home/deployer/ENV/local/lib/python2.7/site-packages/gcs_oauth2_boto_plugin/oauth2_client.py", line 347, in GetAuthorizationHeader
return 'Bearer %s' % self.GetAccessToken().token
File "/home/deployer/ENV/local/lib/python2.7/site-packages/gcs_oauth2_boto_plugin/oauth2_client.py", line 318, in GetAccessToken
access_token = self.FetchAccessToken()
File "/home/deployer/ENV/local/lib/python2.7/site-packages/gcs_oauth2_boto_plugin/oauth2_client.py", line 395, in FetchAccessToken
credentials.refresh(http)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/oauth2client/client.py", line 516, in refresh
self._refresh(http.request)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/oauth2client/client.py", line 653, in _refresh
self._do_refresh_request(http_request)
File "/home/deployer/ENV/local/lib/python2.7/site-packages/oauth2client/client.py", line 710, in _do_refresh_request
raise AccessTokenRefreshError(error_msg)
oauth2client.client.AccessTokenRefreshError: invalid_grant
