What is the time unit for "datadelay" attribute on GoogleFinance? - google-sheets

I'm using the GoogleFinance() function in a Google spreadsheet to keep track of my stocks. With the "datadelay" attribute I can check how long ago the data was last updated. But it only returns a raw number, such as "54000" for one ticker and "15" for another. What time unit is that supposed to be: minutes, seconds, or milliseconds?

When I checked the Google Finance documentation, I found a page explaining that data may be delayed by up to 20 minutes. It also mentions that market data is retrieved from different exchanges, and each exchange may have a different data delay. This could explain the differences in the "datadelay" column.
As for the unit of this column, my assumption is that it is smaller than a second, since 54000 seconds = 900 minutes, which is far higher than the maximum delay stated on the help page. However, I am not sure what value this column returns when you query it on non-trading days.
The page shows delays for each exchange.
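The unit argument above can be laid out as a quick check: convert a raw "datadelay" value under each candidate unit and compare the result with the documented 20-minute maximum. This is a minimal sketch; it does not establish the real unit, it only shows which interpretations are compatible with that maximum. The helper name `delay_in_minutes` is hypothetical.

```python
# Candidate units for the raw "datadelay" value (assumption: the unit is
# unknown, so every possibility is converted and compared).
UNITS_TO_SECONDS = {
    "milliseconds": 0.001,
    "seconds": 1.0,
    "minutes": 60.0,
}

MAX_DOCUMENTED_DELAY_MIN = 20  # "may be delayed up to 20 minutes"


def delay_in_minutes(raw_value: float) -> dict[str, float]:
    """Return the delay in minutes for each candidate unit."""
    return {unit: raw_value * factor / 60.0 for unit, factor in UNITS_TO_SECONDS.items()}


for raw in (54000, 15):
    for unit, minutes in delay_in_minutes(raw).items():
        within = minutes <= MAX_DOCUMENTED_DELAY_MIN
        print(f"{raw} {unit}: {minutes:g} min, within 20 min: {within}")
```

Read as seconds, 54000 is 900 minutes (15 hours), well past the documented maximum, which is the reasoning behind the guess above; a value of 15 fits the 20-minute limit under any of the three units.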

The GoogleFinance() function updates every minute (if the spreadsheet's recalculation setting is "On change and every minute"), but keep in mind that results may be delayed by up to 20 minutes. So the answer is somewhere between 1 and 20 minutes.

Related

Time format in Google sheets

I have 2 columns: one with the planned departure time and another with the actual departure time. Until a few days ago, if there was a delay, the difference read as a negative number. For example, if the planned time was 8:00 AM and the actual time was 8:21 AM, the 3rd column showed the delay as "-00:21:00". Now, however, the delay is displayed as 24:00:00 minus the number of minutes of the delay (23:39:00 in this example).
Fortunately, any/all formulas still read and process the values, so this is really just a cosmetic issue for anyone who wants to glance at the raw data.
Sounds like the formatting got reset. Select your Delayed column and change its format to Duration (Format > Number > Duration).
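Why the format matters: Google Sheets stores times as fractions of a day, so a 21-minute delay computed as planned minus actual is a small negative number. A clock (time-of-day) format wraps that value around midnight, while a duration format keeps the sign. The sketch below imitates the two displays in plain Python; the helper names are hypothetical and this is not Sheets' own formatting code.

```python
def as_day_fraction(hours: int, minutes: int) -> float:
    """Spreadsheet-style time value: fraction of a 24-hour day."""
    return (hours * 60 + minutes) / (24 * 60)


def format_time_of_day(fraction: float) -> str:
    """Clock-style display: wraps into 00:00:00-23:59:59."""
    total_seconds = round((fraction % 1.0) * 86400)
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def format_duration(fraction: float) -> str:
    """Duration-style display: keeps the sign of the elapsed time."""
    total_seconds = round(fraction * 86400)
    sign = "-" if total_seconds < 0 else ""
    h, rem = divmod(abs(total_seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{sign}{h:02d}:{m:02d}:{s:02d}"


planned = as_day_fraction(8, 0)
actual = as_day_fraction(8, 21)
diff = planned - actual  # negative when the departure was late

print(format_time_of_day(diff))  # 23:39:00
print(format_duration(diff))     # -00:21:00
```

The stored value is identical in both cases, which matches the observation that formulas keep working and only the display changed.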

Change the delay from 20 to 1 minute

Quotes are not sourced from all markets and may be delayed up to 20 minutes. Information is provided 'as is' and solely for informational purposes, not for trading purposes or advice.
This notice appears when I use the GOOGLEFINANCE() function in my spreadsheet. Unfortunately, the data may be delayed by up to 20 minutes.
What is the best way to get real-time stock prices? Suppose my budget is around $50 per month.
Be aware that I trade only US equities, i.e. no bonds, no cryptocurrencies, and so on.
UPDATE
Here is a sample version of my portfolio spreadsheet: https://docs.google.com/spreadsheets/d/1hIfCuupmc_OZ6514DXFe_NrDCX1Ix6tcvySP_VolppI/edit#gid=42667785. It is important for me to get prices in real time, not delayed by up to 20 minutes.
Is there a way to fix that?
The GOOGLEFINANCE formula does not apply a consistent delay: different stocks can be delayed by different amounts of time. You can get an estimate of the delay by using GOOGLEFINANCE("TICKER","DATADELAY").
This is at least somewhat helpful, but not ideal, because you'll have a price on your sheet without knowing exactly when it was from, only an estimate of how old it might be. And forget about pre-market or after-hours: once the market closes, all bets are off, and you'll have no idea when the price is from (i.e. an after-hours quote or the regular-session close).
If you want accurate real-time quotes, you're going to need an add-on. You said your budget is $50 per month. That doesn't leave you a lot of options. For $9 per month, you can use the Market Data add-on and get real-time stock prices along with historical intraday prices. There is also a free tier that gives you 100 free daily prices.
Market Data's STOCKDATA formula is a drop-in replacement for GOOGLEFINANCE, so it follows the same syntax. It will accomplish what you need. For example, STOCKDATA("SPY","ALL") will produce an output like this:
Date               Bid     Bid Size  Mid     Ask     Ask Size  Last    Volume
5/19/2022 9:09:48  388.36  1400      388.38  388.41  1400      388.37  2715229
Note that the date and time of the quote are included in the output, so you know exactly when the quote was fetched. There is no doubt as to whether the quote comes from the previous day or from the pre-market session (as in this example). If you compare it to the current time using NOW(), you'll find the Market Data quotes are delayed by about 1-2 seconds, which is due to network latency between your Google Sheet and the servers.
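Comparing a quote's timestamp with the current time, as suggested above, is plain date arithmetic. A minimal Python sketch, assuming the timestamp text looks like the sample output ("5/19/2022 9:09:48") and that both times are in the same time zone; the helper name and the fixed `now` value are hypothetical.

```python
from datetime import datetime

QUOTE_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed from the sample "5/19/2022 9:09:48"


def quote_age_seconds(quote_timestamp: str, now: datetime) -> float:
    """Seconds elapsed between the quote's timestamp and `now` (naive, same time zone)."""
    quoted_at = datetime.strptime(quote_timestamp, QUOTE_FORMAT)
    return (now - quoted_at).total_seconds()


now = datetime(2022, 5, 19, 9, 9, 50)  # stand-in for the sheet's NOW()
print(quote_age_seconds("5/19/2022 9:09:48", now))  # 2.0
```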
It's important to notice the word "may" in the first sentence:
...and may be delayed up to 20 minutes...
Usually the delay is well under 20 minutes (around 1 to 1.5 minutes), but at times longer delays can occur.
And to answer your question: no, it's not possible to force the delay under 1 minute.
If you want to go full pro mode with Google Sheets, try =CRYPTOFINANCE().
The documentation links from player0 indicate that ONLY crypto exchanges are supported. Data is NOT available from stock exchanges (NASDAQ, NYSE, etc).

Prepping Data For Usage Clustering

Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries must be double values (e.g. 200.0 minutes used) for the clustering model. I've considered setting all cells/days after the last day of data collection to either -1.0 or NULL. Is either of these a valid approach? If not, what would you suggest?
For the problem where you want both users (one who used the product heavily every day for a year, and another who used it heavily for one month) to look alike, create a new feature whose value is:
avg_usage per time_bin
where time_bin can be a month, a day, or whatever bin best fits your needs.
This way, a user who uses the product, say, 200 minutes per day for one year gets:
200 * 30 * 12 / 12 = 6000 minutes per month
and another user who joined only last month, with exactly the same daily usage, also gets:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when a user started using the product; the only thing that matters is the usage rate.
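A minimal Python sketch of the average-per-bin idea, using the numbers from the example (a 360-day year of 30-day months). It also shows one possible answer to the padding question: days after data collection ended are marked as NaN rather than -1.0 and are excluded, so padding never counts as zero usage. The NaN convention, the 30-day bin, and the function name are assumptions, not part of the original answer.

```python
import math

DAYS_PER_BIN = 30  # one "month" bin, as in the example above


def avg_usage_per_bin(daily_minutes: list[float]) -> float:
    """Average minutes per 30-day bin over the days actually observed.

    Days after data collection ended are NaN and are skipped, so users
    observed for different lengths of time are directly comparable.
    """
    observed = [m for m in daily_minutes if not math.isnan(m)]
    if not observed:
        return math.nan
    return sum(observed) / len(observed) * DAYS_PER_BIN


year_user = [200.0] * 360
month_user = [200.0] * 30 + [math.nan] * 330  # padded to the same length

print(avg_usage_per_bin(year_user))   # 6000.0
print(avg_usage_per_bin(month_user))  # 6000.0
```

Both users end up at 6000 minutes per month, matching the hand calculation; a sentinel like -1.0 would instead drag the month-long user's average down unless it is filtered out the same way.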
An important thing to take into consideration is that a product may go unused for a while. For example, if I'm away on vacation, the days I didn't use my computer may not reflect my general usage of it. So, depending on your data, product, and intuition, you might remove gaps like this and leave them out of the calculation.
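One way to act on the gap advice is to drop long runs of zero-usage days before averaging. This sketch treats any run of at least `min_gap_days` consecutive zero days as an absence (such as a vacation) and removes it, while shorter runs stay in as part of the normal pattern. The function name and the 7-day threshold are hypothetical; the right cut-off depends on your product and data.

```python
def drop_long_gaps(daily_minutes: list[float], min_gap_days: int = 7) -> list[float]:
    """Remove runs of >= min_gap_days consecutive zero-usage days.

    min_gap_days is a hypothetical threshold; shorter runs of zeros are
    kept because they are part of the user's normal usage pattern.
    """
    kept: list[float] = []
    run: list[float] = []  # current run of zero days
    for minutes in daily_minutes:
        if minutes == 0.0:
            run.append(minutes)
            continue
        if len(run) < min_gap_days:
            kept.extend(run)  # short pause: keep the zeros
        run = []
        kept.append(minutes)
    if len(run) < min_gap_days:  # trailing run of zeros
        kept.extend(run)
    return kept


usage = [200.0] * 20 + [0.0] * 10 + [200.0] * 10 + [0.0] * 2
cleaned = drop_long_gaps(usage)
print(len(cleaned), sum(cleaned) / len(cleaned))  # 32 187.5
```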
The total time a user has spent with your product could itself be a meaningful signal, but if the user only started recently and is still using it today, you need to account for that, and this average-binning technique may help.

get minute bar historical data from Google finance?

I can easily get one day of 1-minute data using this link:
https://www.google.com/finance/getprices?q=LHA&x=ETR&i=60&p=1d&f=d,c,h,l,o,v
But when I change "1d" to "1y", I still get only 1 day's data.
I am trying to get 2 years' worth.
Is there a way to do this? Yahoo or Bing Finance would be fine too.
You need to use '1Y', not '1y', in your query to get a period stretching back 1 year. However, you will also need to change the granularity of your query (the i interval, in seconds), as minute data is only available for the previous 5 days.
This query will provide you with minute data for the previous 5 days.
https://www.google.com/finance/getprices?q=LHA&x=ETR&i=60&p=5d&f=d,c,h,l,o,v
The following query will provide you with the last two years of prices at the close.
https://www.google.com/finance/getprices?q=LHA&x=ETR&i=86400&p=2Y&f=d,c,h,l,o,v
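For reference, a sketch of how a getprices response was typically turned into timestamped rows. The layout here is an assumption recalled from the retired endpoint: KEY=VALUE header lines (INTERVAL, COLUMNS, and others), then CSV rows whose first field is either `a<unix timestamp>` (a new base time) or an integer offset counted in INTERVAL seconds from that base. The sample text is made up, the parser name is hypothetical, and the TIMEZONE_OFFSET header is ignored, so dates come out in UTC. As the answers below note, the endpoint no longer serves intraday data.

```python
import re
from datetime import datetime, timezone

DATA_ROW = re.compile(r"^(a?)(\d+),")  # 'a1453276800,...' or '1,...'


def parse_getprices(text: str) -> list[dict]:
    """Parse a getprices-style response (assumed layout) into rows."""
    interval = 60
    columns: list[str] = []
    base = 0
    rows: list[dict] = []
    for line in text.strip().splitlines():
        match = DATA_ROW.match(line)
        if match is None:
            if line.startswith("INTERVAL="):
                interval = int(line.split("=", 1)[1])
            elif line.startswith("COLUMNS="):
                columns = line.split("=", 1)[1].split(",")
            continue  # other header lines (EXCHANGE, DATA=, TIMEZONE_OFFSET=...) are ignored
        number = int(match.group(2))
        if match.group(1) == "a":
            base = number  # absolute Unix timestamp starts a new base
            ts = number
        else:
            ts = base + number * interval  # offset in intervals from the base
        fields = line.split(",")[1:]
        row = {"DATE": datetime.fromtimestamp(ts, tz=timezone.utc)}
        row.update({name: float(v) for name, v in zip(columns[1:], fields)})
        rows.append(row)
    return rows


SAMPLE = """\
EXCHANGE%3DETR
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=1050
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=60
a1453276800,13.5,13.6,13.4,13.45,1000
1,13.55,13.6,13.5,13.5,800
"""

rows = parse_getprices(SAMPLE)
print(len(rows), rows[1]["DATE"] - rows[0]["DATE"])  # 2 0:01:00
```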
Google's "getprices" API no longer returns intraday data at any interval (e.g. 1, 5, or 60 minutes). It now returns data at a 1-day interval only, regardless of the interval set.
When I tried getting intraday data before, the furthest back I could go was 15 days, i.e. 15d.
Using 2w or 1y did not work for me.
references:
http://www.mathworks.com/matlabcentral/fileexchange/32745-get-intraday-stock-price/content/getHistoricalIntraDayStockPrice.m
http://www.codeproject.com/Articles/221952/Simple-Csharp-DLL-to-download-data-from-Google-Fin
Update [2020]
Google and Yahoo have deprecated their APIs, so only direct website access works. However, the maximum resolution you can get is 1 day (end-of-day).
For historical high-resolution data Tickdata.com is a good source for tick-level data.
If 1-minute bars are sufficient, FirstRateData.com has 1-minute data for most stocks and FX going back 20 years.

Time and date dimension in data warehouse

I'm building a data warehouse. Each fact has its own timestamp. I need to create reports by day, month, and quarter, but also by hour. Looking at examples, I see that dates tend to be stored in dimension tables.
(source: etl-tools.info)
But I think this makes no sense for time: the dimension table would just keep growing. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL.
What are your opinions/solutions?
(I'm using Infobright)
Kimball recommends having separate time- and date dimensions:
design-tip-51-latest-thinking-on-time-dimension-tables
In previous Toolkit books, we have recommended building such a dimension with the minutes or seconds component of time as an offset from midnight of each day, but we have come to realize that the resulting end user applications became too difficult, especially when trying to compute time spans. Also, unlike the calendar day dimension, there are very few descriptive attributes for the specific minute or second within a day. If the enterprise has well defined attributes for time slices within a day, such as shift names, or advertising time slots, an additional time-of-day dimension can be added to the design where this dimension is defined as the number of minutes (or even seconds) past midnight. Thus this time-of-day dimension would either have 1440 records if the grain were minutes or 86,400 records if the grain were seconds.
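A minimal Python sketch of the time-of-day dimension described in the quote: one row per minute (or second) past midnight, keyed by that offset, carrying a descriptive attribute such as a shift name. The shift boundaries, column names, and function names are hypothetical examples, not Kimball's.

```python
def shift_name(hour: int) -> str:
    """Hypothetical shift boundaries, for illustration only."""
    if 6 <= hour < 14:
        return "early"
    if 14 <= hour < 22:
        return "late"
    return "night"


def build_time_of_day_dimension(grain_seconds: int = 60) -> list[dict]:
    """One row per time slice past midnight: 1440 rows at minute grain, 86,400 at second grain."""
    rows = []
    for offset in range(0, 24 * 3600, grain_seconds):
        hour, rem = divmod(offset, 3600)
        minute, second = divmod(rem, 60)
        rows.append({
            "time_key": offset // grain_seconds,  # minutes (or seconds) past midnight
            "hour": hour,
            "minute": minute,
            "second": second,
            "label": f"{hour:02d}:{minute:02d}:{second:02d}",
            "shift": shift_name(hour),  # descriptive attribute (hypothetical)
        })
    return rows


minute_grain = build_time_of_day_dimension(60)
print(len(minute_grain), minute_grain[-1]["label"])  # 1440 23:59:00
```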
My guess is that it depends on your reporting requirement.
If you need something like
WHERE "Hour" = 10
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
WHERE date_part('hour', TimeStamp) = 10
because the date_part() function will be evaluated for every row.
You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15'
which gets awkward when using dimension fields.
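The three filters from this answer can be tried side by side with Python's built-in sqlite3 module. In this sketch the fact table keeps the raw timestamp plus a time_key (minutes past midnight) pointing at a 1440-row time dimension. SQLite has no date_part(), so strftime('%H', ...) stands in for it. The table and column names are hypothetical, and the example shows what each query returns, not which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, hour INTEGER, minute INTEGER);
CREATE TABLE fact_sales (ts TEXT, time_key INTEGER REFERENCES dim_time(time_key), amount REAL);
""")
cur.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                [(m, m // 60, m % 60) for m in range(1440)])


def time_key(ts: str) -> int:
    """Minutes past midnight for a 'YYYY-MM-DD HH:MM:SS' string."""
    return int(ts[11:13]) * 60 + int(ts[14:16])


facts = [("2010-03-22 10:05:00", 5.0), ("2010-03-22 23:45:00", 7.0),
         ("2010-03-23 10:30:00", 11.0), ("2010-03-23 12:00:00", 13.0)]
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(ts, time_key(ts), amount) for ts, amount in facts])

# "Every day between 10:00:00 and 10:59:59" via the time dimension.
by_dim = cur.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN dim_time t ON t.time_key = f.time_key
    WHERE t.hour = 10""").fetchone()[0]

# The same filter computed from the timestamp on every row.
by_func = cur.execute("""
    SELECT SUM(amount) FROM fact_sales
    WHERE CAST(strftime('%H', ts) AS INTEGER) = 10""").fetchone()[0]

# A range crossing midnight is simplest on the raw timestamp.
span = cur.execute("""
    SELECT SUM(amount) FROM fact_sales
    WHERE ts BETWEEN '2010-03-22 23:30' AND '2010-03-23 11:15'""").fetchone()[0]

print(by_dim, by_func, span)  # 16.0 16.0 18.0
```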
Usually a time dimension has minute resolution, so 1440 rows (24 × 60).
Time should be a dimension in a data warehouse, since you will frequently want to aggregate over it. You could use a snowflake schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you whether that is good design.
I would recommend having separate dimensions for date and time. The date dimension would have 1 record for each date within an identified valid range of dates, for example 01/01/1980 to 12/31/2025.
A separate time dimension would have 86400 records, one for each second of the day, identified by the time key.
In fact records where you need both date and time, add both keys, referencing these conformed dimensions.

Resources