I'm looking for an open-source library in Perl, Python, or even Lisp for handling time-series data. The data will be read in from CSV files; a typical run is one sample every 10 minutes for two years. I'd like a library that lets me load the data into an object and, for example, 'exclude all Sundays between 13:00 and 19:00' from the dataset, or conveniently build an object describing all the periods I want excluded and AND it against the original dataset. It must be able to handle more than one value per time sample.
I've seen pandas for Python and it looks promising; do any others come to mind?
Pandas is certainly one good way to go. The R language also has good support for time series.
from pandas import Series, date_range
from numpy.random import randn

rng = date_range('1/1/2011', periods=10000, freq='10min')
ts = Series(randn(len(rng)), index=rng)

# Keep every timestamp that is not a Sunday (dayofweek 6) afternoon between 13:00 and 18:59
filtered_index = rng[(rng.dayofweek != 6) | (rng.hour < 13) | (rng.hour >= 19)]
no_sunday_afternoons = ts[filtered_index]

# 2011-01-02 was a Sunday; note the missing samples from 13:00 through 18:50
print(no_sunday_afternoons['2011-01-02 12:30:00':'2011-01-02 19:30:00'])
2011-01-02 12:30:00 -1.395918
2011-01-02 12:40:00 0.382604
2011-01-02 12:50:00 -0.422495
2011-01-02 19:00:00 -0.341497
2011-01-02 19:10:00 0.982950
2011-01-02 19:20:00 -0.909796
2011-01-02 19:30:00 0.842446
dtype: float64
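Equivalently, you can build a boolean mask over the Series itself and combine conditions with & and ~, which is close to the 'create an object with all the periods I want excluded and AND it against the original dataset' idea from the question. A minimal sketch reusing the ts Series from above:

# True for the samples to drop: Sundays between 13:00 and 18:59
exclude = (ts.index.dayofweek == 6) & (ts.index.hour >= 13) & (ts.index.hour < 19)
no_sunday_afternoons = ts[~exclude]  # invert the mask to keep everything else

The same masking works unchanged on a DataFrame with a DatetimeIndex, which covers the 'more than one value per time sample' requirement.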
Related
How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?
The easiest way is to use to_datetime:
df['col'] = pd.to_datetime(df['col'])
It also offers a dayfirst argument for European-style dates (but beware this isn't strict).
Here it is in action:
In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005-05-23 00:00:00
dtype: datetime64[ns]
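To illustrate the dayfirst flag mentioned above, here is a small sketch with an ambiguous string (without dayfirst, the first field is taken as the month):

import pandas as pd
pd.to_datetime(pd.Series(['05/06/2005']))                 # parsed month-first: 2005-05-06
pd.to_datetime(pd.Series(['05/06/2005']), dayfirst=True)  # parsed day-first:   2005-06-05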
You can pass a specific format:
In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005-05-23
dtype: datetime64[ns]
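If some values may not match the format, to_datetime also accepts an errors argument; errors='coerce' turns unparseable entries into NaT instead of raising. A small sketch:

pd.to_datetime(pd.Series(['05/23/2005', 'oops']), format="%m/%d/%Y", errors='coerce')
# 0   2005-05-23
# 1          NaT
# dtype: datetime64[ns]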
If your date column is a string in the format '2017-01-01', you can use pandas astype to convert it to datetime:
df['date'] = df['date'].astype('datetime64[ns]')
or use 'datetime64[D]' if you want day precision rather than nanoseconds.
print(type(df_launath['date'].iloc[0]))
yields
<class 'pandas._libs.tslib.Timestamp'>
the same as when you use pandas.to_datetime
You can try it with formats other than '%Y-%m-%d', but at least this one works.
You can use the following if you want to specify tricky formats:
df['date_col'] = pd.to_datetime(df['date_col'], format='%d/%m/%Y')
More details on format here:
Python 2 https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
Python 3 https://docs.python.org/3.7/library/datetime.html#strftime-strptime-behavior
If you have a mixture of formats in your date, don't forget to set infer_datetime_format=True to make life easier.
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
Source: pd.to_datetime
or if you want a customized approach:
from datetime import datetime

def autoconvert_datetime(value):
    formats = ['%m/%d/%Y', '%m-%d-%y']  # formats to try
    result_format = '%d-%m-%Y'  # output format
    for dt_format in formats:
        try:
            dt_obj = datetime.strptime(value, dt_format)
            return dt_obj.strftime(result_format)
        except ValueError:  # raised when the format doesn't match
            continue
    return value  # leave the value as-is if nothing matched

df['date'] = df['date'].apply(autoconvert_datetime)
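For instance, a quick check of the helper above; the second value matches neither listed format and passes through unchanged:

print(autoconvert_datetime('05/23/2005'))  # 23-05-2005
print(autoconvert_datetime('2005-05-23'))  # 2005-05-23 (unchanged)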
Try this solution:
Change '2022-12-31 00:00:00' to '2022-12-31 00:00:01'
Then run this code: pandas.to_datetime(pandas.Series(['2022-12-31 00:00:01']))
Output: 2022-12-31 00:00:01
Multiple datetime columns
If you want to convert multiple string columns to datetime, then using apply() would be useful.
df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)
You can pass parameters to to_datetime as kwargs.
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")
Use format= to speed up
If the column contains a time component and you know the format of the datetime/time, passing the format explicitly speeds up the conversion significantly. There's barely any difference if the column holds dates only, though. In my project, for a column with 5 million rows, the difference was huge: ~2.5 min vs 6 s.
It turns out that explicitly specifying the format is about 25x faster. A runtime plot (reproducible with the code below) shows a huge gap in performance depending on whether you passed format or not.
The code used to produce the plot:
import random

import pandas as pd
import perfplot

# random month/day/year/hour/minute components to build datetime strings from
mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)

perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
    labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
    n_range=[2**k for k in range(19)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}"
                               for m, d, Y, H, M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
I need to create a tool that reads in an options spread order in string format and spits it out in human readable format.
Examples:
Input:
BUY +6 VERTICAL LUV 100 (Weeklys) 28 AUG 20 37.5/36.5 PUT #.49 LMT
Output:
VERTICAL
BUY +6 LUV 28 AUG 20 (Weeklys) 37.5 PUT
SELL -6 LUV 28 AUG 20 (Weeklys) 36.5 PUT
.49 DEBIT LMT
Input:
BUY +1 DIAGONAL AMGN 100 (Weeklys) 4 SEP 20/28 AUG 20 245/240 CALL #.07 LMT
Output:
DIAGONAL
BUY +1 AMGN 4 SEP 20 (Weeklys) 245 CALL
SELL +1 AMGN 28 AUG 20 (Weeklys) 240 CALL
-.07 CREDIT LMT
On the surface, a context-free grammar appears to be a good way to express the various syntaxes (diagonal spreads are more complicated). But having almost no experience with context-free grammars, I am not sure how I would carry the numbers over, or how I would, for instance, add the SELL orders that are not explicitly mentioned in the original order string. The SELL leg is implied because it is a vertical spread, for example.
Hope this makes sense even if you are not an option trader ;-) The basic idea here is that translating the original string requires a bit of intelligence and is not just a matter of generating different text.
Any insights and pointers would be welcome.
It's a little hard to tell from only 2 examples, but my guess is, using a context-free grammar (especially if you have almost no experience with them) is probably overkill. The grammar itself would probably be simple enough, but you would need to either add 'actions' to transform the recognized input into the desired output, or have the parser build a syntax-tree and then write code to extract the data from the tree and generate the desired output.
It would be simpler to use regular expressions with capturing. For instance, here's some python3 code that pretty much handles your 2 examples:
import sys, re

for line in sys.stdin:
    # one capture group per field: quantity, spread type, symbol, date(s), strikes, option type, limit price
    mo = re.fullmatch(r'BUY \+(\d+) (VERTICAL|DIAGONAL) (\S+) 100 \(Weeklys\) (\d+ \w+ \d+)(?:/(\d+ \w+ \d+))? ([\d.]+)/([\d.]+) (PUT|CALL) #(.\d+) LMT\n', line)
    (n_units, vert_or_diag, name, date1, date2, price1, price2, put_or_call, limit) = mo.groups()
    if vert_or_diag == 'VERTICAL':
        # a vertical spread has a single expiration date, shared by both legs
        assert date2 is None
        date2 = date1
    print()
    print(vert_or_diag)
    print(f"BUY +{n_units} {name} {date1} (Weeklys) {price1} {put_or_call}")
    print(f"SELL -{n_units} {name} {date2} (Weeklys) {price2} {put_or_call}")
    print(f"{limit} DEBIT LMT")
It's not perfect, because the problem isn't perfectly specified (e.g., it's unclear what causes the human readable format to have a positive DEBIT vs a negative CREDIT). And the space of inputs is no doubt larger than the regex currently handles.
The point is just to show that, based on the examples given, regular expressions could be a compact solution to the general problem.
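To sanity-check that claim, here is the same pattern run directly against the two sample orders from the question (a newline is appended because the loop above matches full lines read from stdin):

import re
PATTERN = (r'BUY \+(\d+) (VERTICAL|DIAGONAL) (\S+) 100 \(Weeklys\) '
           r'(\d+ \w+ \d+)(?:/(\d+ \w+ \d+))? ([\d.]+)/([\d.]+) (PUT|CALL) #(.\d+) LMT\n')
orders = [
    'BUY +6 VERTICAL LUV 100 (Weeklys) 28 AUG 20 37.5/36.5 PUT #.49 LMT',
    'BUY +1 DIAGONAL AMGN 100 (Weeklys) 4 SEP 20/28 AUG 20 245/240 CALL #.07 LMT',
]
for order in orders:
    print(re.fullmatch(PATTERN, order + '\n').groups())
# ('6', 'VERTICAL', 'LUV', '28 AUG 20', None, '37.5', '36.5', 'PUT', '.49')
# ('1', 'DIAGONAL', 'AMGN', '4 SEP 20', '28 AUG 20', '245', '240', 'CALL', '.07')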
How do I return the time in RFC 3339 format (2014-06-01T12:00:00Z)? I read the docs for the calendar module, but there was no explanation of how to generate a timestamp in this format. My program should work in different time zones, so please advise.
The Erlang Central page Converting Between struct:time and ISO8601 Format has this example:
Unfortunately, no Erlang libraries provide this functionality. Luckily, the native Erlang date and time formats are very easy to format for display or transmission, even in ISO-8601 format:
-module(iso_fmt).
-export([iso_8601_fmt/1, format_iso8601/0]).

iso_8601_fmt(DateTime) ->
    {{Year, Month, Day}, {Hour, Min, Sec}} = DateTime,
    io_lib:format("~4.10.0B-~2.10.0B-~2.10.0B ~2.10.0B:~2.10.0B:~2.10.0B",
                  [Year, Month, Day, Hour, Min, Sec]).

%% Variant that returns the current UTC time directly in RFC 3339 form ('T' separator, 'Z' suffix):
format_iso8601() ->
    {{Year, Month, Day}, {Hour, Min, Sec}} =
        calendar:universal_time(),
    iolist_to_binary(
        io_lib:format(
            "~.4.0w-~.2.0w-~.2.0wT~.2.0w:~.2.0w:~.2.0wZ",
            [Year, Month, Day, Hour, Min, Sec])).
Using the above module:
1> {{Year,Month,Day},{Hour,Min,Sec}} = erlang:localtime().
{{2004,8,28},{1,19,37}}
2> io:fwrite("~s\n",[iso_fmt:iso_8601_fmt(erlang:localtime())]).
2004-08-28 01:48:48
To make it output time in UTC, just pass it the return value of erlang:universaltime() instead of erlang:localtime().
Two computers both have CentOS 6.5 installed (kernel 3.10.44) but give different results.
One result is [u'Asia/Shanghai', u'Asia/Urumqi'], and the other is ['Asia/Shanghai', 'Asia/Harbin', 'Asia/Chongqing', 'Asia/Urumqi', 'Asia/Kashgar'].
Is there any configuration that would make the first result the same as the second?
I have the following Python code:

import time
from datetime import datetime

import pytz

def get_date():
    date = datetime.utcnow()
    from_zone = pytz.timezone("UTC")
    to_zone = pytz.timezone("Asia/Urumqi")
    date = from_zone.localize(date)
    date = date.astimezone(to_zone)
    return date

def get_curr_time_stamp():
    date = get_date()
    stamp = time.mktime(date.timetuple())
    return stamp

cur_time = get_curr_time_stamp()
print("1", time.strftime("%Y %m %d %H:%M:%S", time.localtime(time.time())))
print("2", time.strftime("%Y %m %d %H:%M:%S", time.localtime(cur_time)))
When I run this code, the computer that returns two time zones prints:
1 2016 04 20 08:53:18
2 2016 04 20 06:53:18
and the one that returns five prints:
1 2016 04 20 08:53:18
2 2016 04 20 08:53:18
I don't understand why.
You probably just have an outdated version of pytz on the system that returns five time zones (or perhaps on both systems). You can find the latest releases on PyPI. It's important to stay on top of time zone updates, as the various governments of the world change their time zones often.
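As a quick diagnostic, you can check which tz database release your installed pytz carries (a minimal sketch; OLSON_VERSION and country_timezones are part of pytz's public interface):

import pytz
print(pytz.__version__)              # pytz release, e.g. 2016.3
print(pytz.OLSON_VERSION)            # tz database release it was built from
print(pytz.country_timezones('cn'))  # two zones on data from 2014f or later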
Like most systems, pytz gets its data from the tz database. The five time zones for China were reduced to two in version 2014f (corresponding to pytz 2014.6). From the release notes:
China's five zones have been simplified to two, since the post-1970
differences in the other three seem to have been imaginary. The
zones Asia/Harbin, Asia/Chongqing, and Asia/Kashgar have been
removed; backwards-compatibility links still work, albeit with
different behaviors for time stamps before May 1980. Asia/Urumqi's
1980 transition to UTC+8 has been removed, so that it is now at
UTC+6 and not UTC+8. (Thanks to Luther Ma and to Alois Treindl;
Treindl sent helpful translations of two papers by Guo Qingsheng.)
Also, you may wish to read Wikipedia's Time in China article, which explains that the Asia/Urumqi entry is for "Ürümqi Time", which is used unofficially in some parts of the Xinjiang region. This zone is not recognized by the Chinese government and is considered a politically charged issue. As such, many systems choose to omit the Urumqi time zone, despite its being listed in the tz database.
What algorithms or formulas are available for computing the equinoxes and solstices? I found one of these a few years ago and implemented it, but the precision was not great: the time of day seemed to be assumed at 00:00, 06:00, 12:00, and 18:00 UTC depending on which equinox or solstice was computed. Wikipedia gives these computed out to the minute, so something more exact must be possible. Libraries for my favorite programming language also come out to those hardcoded times, so I assume they are using the same or a similar algorithm as the one I implemented.
I also once tried using a library that gave me the solar longitude and implementing a search routine to zero in on the exact moments of 0, 90, 180, and 270 degrees; this worked down to the second but did not agree with the times in Wikipedia, so I assume there was something wrong with this approach. I am, however, pleasantly surprised to discover that Maimonides (the medieval Jewish scholar) proposed an algorithm using the exact same idea a millennium ago.
A great source for the (complex!) underlying formulas and algorithms is Astronomical Algorithms by Jean Meeus.
Using the PyMeeus implementation of those algorithms, and the code below, you can get the following values for the 2018 winter solstice (where "winter" refers to the northern hemisphere).
winter solstice for 2018 in Terrestrial Time is at:
(2018, 12, 21, 22, 23, 52.493725419044495)
winter solstice for 2018 in UTC, if last leap second was (2016, 12):
(2018, 12, 21, 22, 22, 43.30972542127711)
winter solstice for 2018 in local time, if last leap second was (2016, 12)
and local time offset is -7.00 hours:
(2018, 12, 21, 15, 22, 43.30973883232218)
i.e. 2018-12-21T15:22:43.309725-07:00
Of course, the answer is not accurate down to microseconds, but I also wanted to show how to do high-precision conversions with arrow.
Code:
from pymeeus.Sun import Sun
from pymeeus.Epoch import Epoch

year = 2018  # datetime.datetime.now().year
target = "winter"

# Get Terrestrial Time of the given solstice for the given year
solstice_epoch = Sun.get_equinox_solstice(year, target=target)

print("%s solstice for %d in Terrestrial Time is at:\n %s" %
      (target, year, solstice_epoch.get_full_date()))
print("%s solstice for %d in UTC, if last leap second was %s:\n %s" %
      (target, year, Epoch.get_last_leap_second()[:2], solstice_epoch.get_full_date(utc=True)))

# Shift the epoch by the local UTC offset (utc2local() returns seconds; Epoch arithmetic is in days)
solstice_local = (solstice_epoch + Epoch.utc2local() / (24 * 60 * 60))
print("%s solstice for %d in local time, if last leap second was %s\n"
      " and local time offset is %.2f hours:\n %s" %
      (target, year, Epoch.get_last_leap_second()[:2],
       Epoch.utc2local() / 3600., solstice_local.get_full_date(utc=True)))
Using the very cool and more ISO/TZ-aware module Arrow: better dates and times for Python, this can be printed more nicely:
import arrow
import math

# split the seconds field into whole seconds and a fraction for Arrow's microsecond argument
slutc = solstice_epoch.get_full_date(utc=True)
frac, whole = math.modf(slutc[5])
print("i.e. %s" % arrow.get(*slutc[:5], int(whole), round(frac * 1e6)).to('local'))
I'm not sure if this is an accurate enough solution for you, but I found a NASA website that has some code snippets for calculating the vernal equinox as well as some other astronomical-type information. I've also found some references to a book called Astronomical Algorithms which may have the answers you need if the info somehow isn't available online.
I know you're looking for something that'll paste into an answer here, but I have to mention SPICE, a toolkit produced by NAIF at JPL, funded by NASA. It might be overkill for Farmer's Almanac stuff, but you mentioned interest in precision and this toolkit is routinely used in planetary science.
I have implemented Jean Meeus' (the author of the Astronomical Algorithms referenced above) equinox and solstice algorithm in C and Java, if you're interested.