Why am I getting a JSONDecodeError while downloading data from Yahoo finance module

I am new to Python and wrote a simple piece of code to download historical data for a given ticker symbol with the yfinance module. The code ran fine for several weeks, but recently it started throwing the following exception. Here is the code:
import yfinance as yf
data = yf.download("ibm", start="1/1/2017", end="1/6/2021")
Here is the error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python\Python39\lib\threading.py", line 954, in _bootstrap_inner
self.run()
File "C:\Python\Python39\lib\threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "C:\Python\PythonLearning\venv\lib\site-packages\multitasking\__init__.py", line 102, in _run_via_pool
return callee(*args, **kwargs)
File "C:\Python\PythonLearning\venv\lib\site-packages\yfinance\multi.py", line 167, in _download_one_threaded
data = _download_one(ticker, start, end, auto_adjust, back_adjust,
File "C:\Python\PythonLearning\venv\lib\site-packages\yfinance\multi.py", line 179, in _download_one
return Ticker(ticker).history(period=period, interval=interval,
File "C:\Python\PythonLearning\venv\lib\site-packages\yfinance\base.py", line 157, in history
data = data.json()
File "C:\Python\PythonLearning\venv\lib\site-packages\requests\models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "C:\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Python\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This is a known issue that is fixed in the latest version of the yfinance module (0.1.63), per the GitHub repository's Issue #777.
Update your yfinance module to 0.1.63 or later.
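To check whether the installed copy is already new enough, something like the following minimal sketch could help. The helper names parse_version and needs_upgrade are made up for illustration, and the comparison only looks at the leading numeric parts of the version string, so treat it as a rough check rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version named in the answer above.
MIN_VERSION = "0.1.63"

def parse_version(text):
    """Turn '0.1.63' into (0, 1, 63); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def needs_upgrade(installed, minimum=MIN_VERSION):
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)

try:
    installed = version("yfinance")
except PackageNotFoundError:
    print("yfinance is not installed")
else:
    if needs_upgrade(installed):
        print(f"yfinance {installed} is older than {MIN_VERSION}; "
              "run: pip install --upgrade yfinance")
    else:
        print(f"yfinance {installed} is new enough")
```

If the check reports an older version, running pip install --upgrade yfinance inside the same virtual environment should pull in the fixed release.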

Related

Apache Beam stateful ParDo Work token invalid

I have a stateful DoFn that batches incoming elements; when the buffer reaches a certain size, the buffer is cleared and the elements are inserted into BigQuery. I've noticed that from time to time the pipeline raises an exception, although the exception does not stop the job. Below is the stack trace:
Error message from worker: generic::unknown: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 742, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 867, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 140, in process
self._flush_buffer(buffer_state, count_state, buffer_size_state)
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 162, in _flush_buffer
rows = self._extract_rows(buffer_state)
File "/usr/local/lib/python3.7/site-packages/gp/pipelines/common/writer_transforms.py", line 197, in _extract_rows
for row in buffer.read():
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 510, in __iter__
for elem in self.first:
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 1039, in _lazy_iterator
self._underlying.get_raw(state_key, continuation_token))
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 846, in get_raw
continuation_token=continuation_token)))
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 886, in _blocking_request
raise RuntimeError(response.error)
RuntimeError: INTERNAL: Work token invalid
This is raised when the process method is called and it tries to extract the elements from the buffer, see rows = self._extract_rows(buffer_state)
The DoFn is implemented exactly like in the example https://beam.apache.org/blog/timely-processing/#example-batched-rpc
I've confirmed this error is expected during work reassignments, e.g. when autoscaling. The work item will be retried on the new machine and the pipeline will continue processing correctly. (I agree the error message could be improved.)

ValueError: file descriptor out of range in select() in py2neo

After one of the recent py2neo updates, we are seeing a lot of randomly appearing errors saying
ValueError: file descriptor out of range in select()
We are using py2neo to connect to a remote neo4j instance.
client_identifier = request.args.get('tribes_client_id')
graph_obj = generic_helpers.get_graph_object(client_identifier) # <- returns py2neo instance
graph_transaction = graph_obj.begin() # <- this is the line causing the exception
Below is stack trace of the exception being raised
Traceback (most recent call last):
File "/env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/srv/neo4j_maintenance_routes.py", line 116, in get_words_list_for_lookup
graph_transaction = graph_obj.begin()
File "/env/lib/python3.7/site-packages/py2neo/database.py", line 353, in begin
return Transaction(self, autocommit)
File "/env/lib/python3.7/site-packages/py2neo/database.py", line 781, in __init__
self.transaction = self.connector.begin()
File "/env/lib/python3.7/site-packages/py2neo/internal/connectors.py", line 297, in begin
tx = self.pool.acquire()
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "/env/lib/python3.7/site-packages/py2neo/internal/connectors.py", line 227, in connector
encrypted=cx_data["secure"], **kwargs)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 972, in connect
raise last_error
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 964, in connect
connection = _handshake(s, address, der_encoded_server_certificate, **config)
File "/env/lib/python3.7/site-packages/neobolt/direct.py", line 898, in _handshake
ready_to_read, _, _ = select((s,), (), (), 1)
ValueError: filedescriptor out of range in select()
py2neo -> version 4.3.0
python -> version 3.7.3
neo4j -> version (Enterprise 3.5.3)
Note that we don't get these errors every time we create a new transaction; they appear at random times and go away on their own after a while.
I also come across this issue occasionally. The difference is that I am using neo4j-python-driver (1.7.6).
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 444, in run
self._connect()
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 383, in _connect
self._connection = self._acquirer(access_mode)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neo4j/__init__.py", line 232, in connector
return connect(address, **dict(config, **kwargs))
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 972, in connect
raise last_error
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 964, in connect
connection = _handshake(s, address, der_encoded_server_certificate, **config)
File "/home/my/anaconda3/envs/python37/lib/python3.7/site-packages/neobolt/direct.py", line 898, in _handshake
ready_to_read, _, _ = select((s,), (), (), 1)
ValueError: filedescriptor out of range in select()
After I reran the same code, the error did not recur and the program worked well. I think this issue is probably caused by multiple logins by the same user: when I am still accessing the neo4j database in the browser and use the same account to fetch data with neo4j-python-driver at the same time, this error can happen. However, sometimes it seems to happen even when I do nothing but run the code.
Extra Info:
Neo4j Graph Database: 3.5 community version
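One possible cause worth ruling out (an assumption, not something confirmed in this thread): select() can only watch descriptors numbered below FD_SETSIZE, which is commonly 1024 on Linux and macOS, and Python raises this ValueError when a socket's descriptor number is at or above that limit. That tends to happen in a long-running process that holds many files or sockets open, which would also fit the error coming and going. The sketch below uses a hypothetical helper, usable_with_select, with the 1024 limit hard-coded as an assumption, to show how a descriptor number can be checked.

```python
import socket

# Assumption: FD_SETSIZE is 1024, the usual value on Linux and macOS.
# Python does not expose the constant, so it is hard-coded here.
ASSUMED_FD_SETSIZE = 1024

def usable_with_select(sock, limit=ASSUMED_FD_SETSIZE):
    """Return True if select() could accept this socket's descriptor number."""
    fd = sock.fileno()  # -1 once the socket has been closed
    return 0 <= fd < limit

if __name__ == "__main__":
    sock = socket.socket()
    try:
        print("fd:", sock.fileno(), "ok for select():", usable_with_select(sock))
    finally:
        sock.close()
```

If descriptor numbers in the failing process creep toward the limit, the likely next step is to look for connections or files that are opened per request and never closed.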

Dask not starting workers

I am trying to use Dask to perform a groupby operation on a Dataframe.
The code below does not work but it seems that if I initialize the Client from another console the code works, even though I can't see anything on the dashboard ( http://localhost:8787/status ): I mean, there is a dashboard, but all the figures look empty. I am on macOS.
Code:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
client = Client()
# open http://localhost:8787/status
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
df = dd.read_csv(csv_path,
    dtype = {
        'timestamp': str,
        'node_id': str,
        'subsystem': str,
        'sensor': str,
        'parameter': str,
        'value_raw': str,
        'value_hrf': str,
    },
    parse_dates=['timestamp'],
    date_parser=lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
)
#%%
if not os.path.exists(dir_destination):
    os.makedirs(dir_destination)
def create_node_csv(df_node):
    # test function
    return len(df_node)
res = df.groupby('node_id').apply(create_node_csv, meta=int)
The CSV file simply consists of string columns. My goal is to group all the rows that contain a certain value in a column and then save each group as a separate file using create_node_csv(df_node) (even though right now it is a dummy function). Any other way to do it is appreciated, but I would like to understand what's going on here.
When I run it, the console prints multiple times the following errors:
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 208, in _start_worker
yield w._start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 157, in _start
response = yield self.instantiate()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 226, in instantiate
self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 370, in start
yield self.process.start()
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 35, in _call_and_set_future
res = func(*args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 184, in _start
process.start()
File "/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/anaconda3/lib/python3.7/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/anaconda3/lib/python3.7/multiprocessing/popen_forkserver.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
And:
distributed.nanny - WARNING - Worker process 1844 exited with status 1
distributed.nanny - WARNING - Restarting worker
And:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
send_bytes(obj)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 883, in callback
result_list.append(f.result())
File "/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/anaconda3/lib/python3.7/site-packages/distributed/deploy/local.py", line 217, in _start_worker
EDIT:
Based on the answer:
- How do I prevent the creation of a new Client if I run the program again?
- How can I do the following?
def create_node_csv(df_node):
    return len(df_node)
It returns me the following error, is it related to the meta parameter?
ValueError: cannot reindex from a duplicate axis
When you run the script, Client() causes new Dask workers to be spawned, which also get copies of variables from the original main process. In some cases, this involves re-importing the script in each worker, each of which, of course, then tries to create a Client and a new set of processes.
The best answer, as in general with anything running in processes, is to use functions, and protect the main execution. The following would be a way to do this, without changing your one-script structure:
from datetime import datetime
import numpy as np
import os
from dask import dataframe as dd
from dask.distributed import Client
import pandas as pd
csv_path = 'chicago-complete.monthly.2018-07-01-to-2018-07-31/data.csv'
dir_destination = 'data'
def run():
    client = Client()
    df = dd.read_csv(csv_path, ...)
    if not os.path.exists(dir_destination):
        os.makedirs(dir_destination)
    def create_node_csv(df_node):
        # test function
        return len(df_node)
    res = df.groupby('node_id').apply(create_node_csv, meta=int)
    print(res.compute())
if __name__ == "__main__":
    run()
How do I prevent the creation of a new Client if I run the program again?
In the call to Client() you can include the address of an existing cluster, if you know what that would be. Also, some specific types of deployments (and there are a few) may have a concept of the "current cluster".
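As a rough sketch of the "address of an existing cluster" idea: the helpers below are hypothetical, and the address is an assumption. A scheduler started separately with the dask-scheduler command listens on port 8786 by default, whereas a cluster created implicitly by Client() may pick a different port, so the address has to be one you actually know. The port probe uses only the standard library; dask is imported lazily inside get_client.

```python
import socket

def scheduler_is_listening(host="127.0.0.1", port=8786, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def get_client(host="127.0.0.1", port=8786):
    """Connect to an already running scheduler, else start a local cluster.

    Assumption: whatever listens on host:port is a Dask scheduler.
    """
    from dask.distributed import Client  # imported lazily; needs dask installed

    if scheduler_is_listening(host, port):
        return Client(f"tcp://{host}:{port}")
    return Client()
```

Calling get_client() inside run() in place of Client() would then reuse a scheduler that is already up instead of spawning a fresh set of workers on every run.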

Unable to Replace a Dask Series Partition

I'm trying to replace a partition of a Dask Series with my own partition.
I've used the code snippet given by @MRocklin in this post.
list_of_delayed = dask_df.to_delayed()
new_partition = dask.delayed(pd.read_csv)(filename)
list_of_delayed[i] = new_partition
new_dask_df = dd.from_delayed(list_of_delayed, meta=dask_df._meta)
I've done exactly the same except dask_df is a series in my case. I'm getting the following error:
Traceback (most recent call last):
File "sdfr_dhruvkmr.py", line 465, in <module>
pts = task[(task.task_date <= dtm.Time.iloc[i]) & (task.T_Date == dtm.Date.iloc[i])]
File "/usr/lib/python2.7/site-packages/edask/dataframe.py", line 130, in __getitem__
new_dask_df = dd.from_delayed(list_of_delayed)
File "/usr/lib/python2.7/site-packages/edask/edask/dask/dataframe/io/io.py", line 493, in from_delayed
type(df).__name__)
TypeError: Expected Delayed object, got Delayed

How do you timeout a twisted test that uses pytest?

I'm working on converting some tests from using Nose and twisted, to using Pytest and twisted, as Nose is no longer in development. The easiest way to convert the tests is by editing the custom decorator that each test has. This decorator is on every test, and defines a timeout for the individual test.
I've tried using @pytest.mark.timeout, but the only method that has worked is 'thread', and that stops the entire test run instead of continuing to the next test. The 'signal' method fails to stop the test, although I can see an error in the junitxml file.
from functools import wraps
import pytest

def inlineCallbacksTest ( timeout = None ):
    def decorator ( method ):
        @wraps ( method )
        @pytest.mark.timeout(timeout = timeout, method = 'signal' )
        @pytest.inlineCallbacks
        def testMethod ( *args, **kwargs ):
            return method(*args, **kwargs)
        return testMethod
    return decorator
The tests themselves use twisted to start up and send messages to the software. I don't need the tests to cancel any twisted processes or locks. I would just like pytest to mark the test as a failure after the timeout, and then move onto the next test.
Below is the error I see in the xml file when using signal method of timeout.
</system-out><system-err>
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
~~~~~~~ Stack of PoolThread-twisted.internet.reactor-2 (139997693642496) ~~~~~~~
File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/site-packages/twisted/python/threadpool.py", line 190, in _worker
o = self.q.get()
File "/usr/lib64/python2.7/Queue.py", line 168, in get
self.not_empty.wait()
File "/usr/lib64/python2.7/threading.py", line 339, in wait
waiter.acquire()
~~~~~~~ Stack of PoolThread-twisted.internet.reactor-1 (139997702035200) ~~~~~~~
File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/site-packages/twisted/python/threadpool.py", line 190, in _worker
o = self.q.get()
File "/usr/lib64/python2.7/Queue.py", line 168, in get
self.not_empty.wait()
File "/usr/lib64/python2.7/threading.py", line 339, in wait
waiter.acquire()
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
self.mainLoop()
--- <exception caught here> ---
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1181, in mainLoop
self.doIteration(t)
File "/usr/lib64/python2.7/site-packages/twisted/internet/epollreactor.py", line 362, in doPoll
l = self._poller.poll(timeout, len(self._selectables))
File "/usr/lib/python2.7/site-packages/pytest_timeout.py", line 110, in handler
timeout_sigalrm(item, timeout)
File "/usr/lib/python2.7/site-packages/pytest_timeout.py", line 243, in timeout_sigalrm
pytest.fail(&apos;Timeout >%ss&apos; % timeout)
File "/usr/lib/python2.7/site-packages/_pytest/outcomes.py", line 85, in fail
raise Failed(msg=msg, pytrace=pytrace)
builtins.Failed: Timeout >5.0s
</system-err>
I have looked around for a similar solution, and the closest I could find was this question. Any help or suggestions would be appreciated.
Coming back to this after 4+ years with an answer. The problem seems to be that the exception from the test gets caught by the twisted reactor. I was able to resolve this by updating the version of twisted. Twisted versions since 16.5 have a new Deferred method called addTimeout. Using that, I was able to modify the original decorator to the following. Now whenever a test times out, it simply raises an exception and moves on to the next one. It may not be the most elegant solution, but I hope this helps someone else out!
import twisted.internet.defer as defer
import pytest_twisted as pt
from functools import wraps
# the original snippet used reactor without importing it
from twisted.internet import reactor

def inlineCallbacksTest ( timeout = None ):
    def testDecorator ( testFunc ):
        # called by addTimeout when the Deferred is cancelled by the timeout
        def timeoutError ( value, timeout ):
            raise Exception ( "Test Timeout: {} secs have expired".format ( timeout ) )
        @wraps ( testFunc )
        def wrapper ( *args, **kwargs ):
            testDefer = pt.inlineCallbacks ( testFunc )( *args, **kwargs )
            testDefer.addTimeout ( timeout, reactor, timeoutError )
            return testDefer
        return wrapper
    return testDecorator
