Quotes are not sourced from all markets and may be delayed up to 20
minutes. Information is provided 'as is' and solely for informational
purposes, not for trading purposes or advice.
This notice appears when I use the GOOGLEFINANCE() function in my spreadsheet. Unfortunately, the data can be delayed by up to 20 minutes.
What is the best way to get real-time stock prices? Suppose my budget is around $50 per month.
Be aware that I trade only US equities, i.e. no bonds, no cryptocurrencies, and so on.
UPDATE
Here is a sample version of my portfolio spreadsheet: https://docs.google.com/spreadsheets/d/1hIfCuupmc_OZ6514DXFe_NrDCX1Ix6tcvySP_VolppI/edit#gid=42667785. It is important for me to get the price in real time, not delayed by up to 20 minutes.
Is there a way to fix that?
The GOOGLEFINANCE formula is not consistent with the delays. Different stocks can be delayed by different times. You can get an estimate of the delay by using GOOGLEFINANCE("TICKER","DATADELAY").
This is at least somewhat helpful, but not ideal, because you'll have a price on your sheet without knowing exactly when that price was from, only an estimate of how old it might be. And forget about pre-market or after-hours. Once the market closes, all bets are off: you'll have no idea when the price is from (i.e. an after-hours quote or the regular session close).
If you want accurate real-time quotes, you're going to need an add-on. You said your budget is $50. That doesn't leave you a lot of options. For $9 per month you can use the Market Data Add-on and get real-time stock prices along with historical intraday prices. There is also a free tier that gives you 100 free daily prices.
Market Data's STOCKDATA formula is a drop-in replacement for GOOGLEFINANCE, so it follows the same syntax. It will accomplish what you need. For example, STOCKDATA("SPY","ALL") will produce an output like this:
Date              | Bid    | Bid Size | Mid    | Ask    | Ask Size | Last   | Volume
5/19/2022 9:09:48 | 388.36 | 1400     | 388.38 | 388.41 | 1400     | 388.37 | 2715229
Note that the date and time of the quote are included in the output, so you know exactly when the quote was fetched. There is no doubt as to whether the quote comes from the previous day or from the pre-market session (which is the case in this example). If you compare the timestamp to the current time using NOW(), you'll find the Market Data quotes are delayed by about 1-2 seconds, which is due to network latency between your Google Sheet and the servers.
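The NOW() comparison described above can be sketched outside the sheet as well. This is only an illustration in Python: the function name quote_age_seconds is hypothetical, the timestamp format is inferred from the sample output, and both times are assumed to be in the same time zone.

```python
from datetime import datetime

# Format inferred from the sample output "5/19/2022 9:09:48" (an assumption).
QUOTE_FORMAT = "%m/%d/%Y %H:%M:%S"


def quote_age_seconds(quote_time: str, now: datetime) -> float:
    """Return how many seconds old a quote is relative to `now`.

    Both values are assumed to be in the same time zone.
    """
    quoted_at = datetime.strptime(quote_time, QUOTE_FORMAT)
    return (now - quoted_at).total_seconds()


# Hypothetical "current time" two seconds after the sample quote.
age = quote_age_seconds("5/19/2022 9:09:48", datetime(2022, 5, 19, 9, 9, 50))
print(age)  # 2.0
```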
It's important to notice the word "may" in the first sentence:
...and may be delayed up to 20 minutes...
Usually the delay is well under 20 minutes (around 1 to 1:30 minutes), but longer delays can occur at times.
And to answer your question: no, it's not possible to force it under 1 minute.
If you want to go full pro mode with Google Sheets, then try: =CRYPTOFINANCE()
The documentation links from player0 indicate that ONLY crypto exchanges are supported. Data is NOT available from stock exchanges (NASDAQ, NYSE, etc).
Related
I'm using the GoogleFinance() function in a Google spreadsheet to keep track of my stocks. With the "datadelay" attribute I can check how long ago the data was last updated. But it only returns a raw number, like "54000" for one ticker and "15" for another. What time unit is that supposed to be? Minutes? Seconds? Milliseconds?
When I checked the Google Finance documentation, I found a page explaining that there might be a delay of up to 20 minutes. It also mentions that data is retrieved from different exchanges, and each exchange may have a different data delay. That could explain the differences in the "datadelay" column.
As for the unit of this column, my assumption is that it is something shorter than seconds, since 54000 seconds = 900 minutes, which is far higher than the maximum delay stated on the help page. But I am not sure what the value of this column would be when you query on non-trading days.
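The unit reasoning above can be written out as a small check. This Python sketch is illustrative only: it tests which candidate units keep a raw "datadelay" value within the 20-minute maximum from the notice, and it does not establish what unit GOOGLEFINANCE actually uses. The function names are hypothetical.

```python
MAX_DELAY_MINUTES = 20  # upper bound stated in the Google Finance notice

# How many of each unit make up one minute.
UNITS_PER_MINUTE = {"milliseconds": 60_000, "seconds": 60, "minutes": 1}


def delay_in_minutes(raw_value: float, unit: str) -> float:
    """Convert a raw datadelay value to minutes under an assumed unit."""
    return raw_value / UNITS_PER_MINUTE[unit]


def plausible_units(raw_value: float) -> list[str]:
    """List the candidate units for which the delay stays within the stated maximum."""
    return [
        unit
        for unit in UNITS_PER_MINUTE
        if delay_in_minutes(raw_value, unit) <= MAX_DELAY_MINUTES
    ]


print(delay_in_minutes(54000, "seconds"))  # 900.0, far above the 20-minute maximum
print(plausible_units(54000))              # ['milliseconds']
```

A small value such as 15 passes the check under every candidate unit, so this test only narrows things down for large values.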
The page shows delays for each exchange.
The GoogleFinance() function updates every minute (if it is set to do so), but keep in mind that results may be delayed up to 20 minutes. So the answer is somewhere between 1 and 20 minutes.
https://docs.google.com/spreadsheets/d/1SiUfqrJNHPAYjibeNBdzWQEcuzka5srf7mSHAv_bn5k/edit?usp=sharing
What would a formula look like to calculate the average tons per hour by driver in this example spreadsheet, correcting for long gaps or even days between loads?
We're being charged on an hourly basis for freight so I'd like to figure out which drivers are the most efficient.
It's been tricky because the only concrete source of information we have is the scale tickets. So if they only do a single load in a day or go several hours between loads then the data would be skewed if you use a simple metric like time elapsed.
Also, I'll need the time elapsed between rows (not just the difference between Time In and Time Out) unless that time is > 1.5 hours. So something like:
=(TIMEVALUE(E3)-TIMEVALUE(D2))*24
...With some added logic to not include anything over 1.5 hours.
If a pivot table would be better than a lengthy formula, that's fine with me.
Here's an example for some added context: Driver Cody goes to Farm Nic to receive a load of hay, then comes back to the weigh station (Ticket, Time In, Gross are then determined), dumps the load, comes back to weigh again empty (Tare, Net, and Time Out are determined here), and goes back to Farm Nic until all the hay is harvested. Then it's on to Farm Zach and Farm Williams to repeat the process. There are several Drivers going at a time, which can be seen if the spreadsheet is sorted by Ticket. My goal is to figure out how many Tons each driver delivers per hour. The time elapsed would include the time between Tickets, because Time In and Time Out just show the time elapsed between coming in with a load of hay and leaving to go back to the field. To get a true measure of tons delivered per hour, you'd need to include the time between tickets, but also remove any instance where that time is greater than 1.5 hours. That will account for circumstances where the Driver isn't working and we aren't being billed, such as during equipment breakdowns.
I'm not much of a formulas guy, so I hope this meets your needs.
First I added a column to your sheet to calculate how many hours each row takes. To do that I used the TIMEVALUE function:
=(TIMEVALUE(E2)-TIMEVALUE(D2))*24
Now you just need to total each driver's hours and tons and compute the quotient total_tons / total_hours. There may be other functions that would do the job; I used QUERY:
=QUERY(Sheet1!A:M, "select C, sum(I), sum(M), sum(I) / sum(M) group by C", 1)
I think it's a pretty straightforward query: group all the data by C (driver's name), then sum column I (tons) and column M (hours).
With the following result:
The format may be a little off, but you can change it as much as you want. You can copy or play with the sheet.
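For readers who prefer to see the grouping outside of QUERY, here is a rough Python equivalent of the "sum tons, sum hours, divide" step. It is a sketch rather than a replacement for the sheet formula; the function name tons_per_hour and the sample rows are made up for illustration.

```python
from collections import defaultdict


def tons_per_hour(rows):
    """Group (driver, tons, hours) rows by driver and return total tons / total hours."""
    totals = defaultdict(lambda: [0.0, 0.0])  # driver -> [total_tons, total_hours]
    for driver, tons, hours in rows:
        totals[driver][0] += tons
        totals[driver][1] += hours
    # Skip drivers with zero recorded hours to avoid dividing by zero.
    return {
        driver: total_tons / total_hours
        for driver, (total_tons, total_hours) in totals.items()
        if total_hours > 0
    }


# Hypothetical sample rows, not taken from the real spreadsheet.
sample_rows = [
    ("Cody", 10.0, 0.5),
    ("Cody", 12.0, 0.5),
    ("Nic", 8.0, 1.0),
]
print(tons_per_hour(sample_rows))  # {'Cody': 22.0, 'Nic': 8.0}
```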
EDIT
After you changed your requirements, I changed my formula for calculating the hours worked:
=IF(
  AND(
    C3=C2,
    A3=A2,
    IFERROR((TIMEVALUE(E3)-TIMEVALUE(D2))*24) <= 1.5,
    TRUE
  ),
  (TIMEVALUE(E3)-TIMEVALUE(D2))*24,
  (TIMEVALUE(E2)-TIMEVALUE(D2))*24
)
Let me explain. The earlier formula was much simpler, but now that we need to check multiple rows, the formula is lengthier.
First, with the IF and AND statements we check whether the next row has:
The same day (A3=A2)
The same Driver (C3=C2)
Less than an hour and a half of difference ((TIMEVALUE(E3)-TIMEVALUE(D2))*24 <= 1.5)
Also, because the last row throws an error when TIMEVALUE is applied to an empty cell, I had to add the IFERROR.
After that, the TRUE condition (same day, same driver, under 1.5 hours of difference) will calculate from the current Time In (D2) to the next Time In (D3):
(TIMEVALUE(D3)-TIMEVALUE(D2))*24
And in the FALSE case we do the same as before:
(TIMEVALUE(E2)-TIMEVALUE(D2))*24
The QUERY function stays the same. And the results have decreased drastically:
If you have any doubts you can go ahead and see the sheet
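The row-by-row rule can also be sketched in Python to make the logic easier to follow. This is an illustration under stated assumptions, not the sheet's formula: it measures the gap from the current Time In to the next Time In (as in the explanation above), it assumes each driver's tickets are listed consecutively in time order, and the names billable_hours and MAX_GAP_HOURS, along with the sample tickets, are hypothetical.

```python
from datetime import datetime

MAX_GAP_HOURS = 1.5  # gaps longer than this are treated as non-working time


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600


def billable_hours(tickets):
    """Return one elapsed-hours value per ticket.

    `tickets` is a list of dicts with 'driver', 'time_in' and 'time_out',
    assumed sorted so each driver's tickets are consecutive and chronological.
    """
    hours = []
    for i, current in enumerate(tickets):
        nxt = tickets[i + 1] if i + 1 < len(tickets) else None
        same_driver_and_day = (
            nxt is not None
            and nxt["driver"] == current["driver"]
            and nxt["time_in"].date() == current["time_in"].date()
        )
        if same_driver_and_day:
            gap = hours_between(current["time_in"], nxt["time_in"])
            if gap <= MAX_GAP_HOURS:
                hours.append(gap)  # count the whole trip cycle
                continue
        # Otherwise fall back to the time spent at the scale only.
        hours.append(hours_between(current["time_in"], current["time_out"]))
    return hours


# Hypothetical tickets for one driver on one day.
day = (2022, 8, 1)
tickets = [
    {"driver": "Cody", "time_in": datetime(*day, 8, 0), "time_out": datetime(*day, 8, 10)},
    {"driver": "Cody", "time_in": datetime(*day, 8, 40), "time_out": datetime(*day, 8, 50)},
    {"driver": "Cody", "time_in": datetime(*day, 11, 0), "time_out": datetime(*day, 11, 15)},
]
print([round(h, 2) for h in billable_hours(tickets)])  # [0.67, 0.17, 0.25]
```

The second ticket falls back to Time Out minus Time In because the wait until the next load (2 hours 20 minutes) exceeds the 1.5-hour limit.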
Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.g. 200.0 minutes used) for the clustering model. I've considered setting all cells/days after the last day of data collection to either -1.0 or NULL. Is either of these a valid approach? If not, what would you suggest?
To make both users look alike (one who used the product heavily every day for a year, and one who used it heavily for one month), create a new entry whose values are:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user who uses the product, say, 200 minutes per day for one year will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, who joined just last month and has the exact same usage, will also get:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when you started using the product; the only thing that matters is the usage rate.
One important thing to take into consideration is that a product may go unused for some time. For example, if I'm away on vacation, I don't use my computer. Those days probably say nothing about my general usage of the product. So, based on your data, product and intuition, you might consider removing gaps like this one and leaving them out of the calculation.
The length of time a user has been using your product could itself be a signal of something. But if the user only started recently and is still using the product today, you need to account for that, and this average binning technique may help.
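As a rough illustration of the averaging idea, the sketch below computes a per-day usage rate over the days that were actually observed. It assumes days after the end of data collection are stored as None (rather than -1.0), and the function name average_daily_usage and the skip_idle_days option are hypothetical.

```python
def average_daily_usage(minutes_per_day, skip_idle_days=False):
    """Average minutes per observed day.

    Days marked None are treated as unobserved (e.g. after data collection
    ended) and are left out of the average entirely. With skip_idle_days=True,
    observed days with zero usage (such as a vacation gap) are dropped too.
    """
    observed = [m for m in minutes_per_day if m is not None]
    if skip_idle_days:
        observed = [m for m in observed if m > 0]
    if not observed:
        return 0.0
    return sum(observed) / len(observed)


# A power user observed for a full year, and one observed for only 30 days.
year_user = [200.0] * 365
month_user = [200.0] * 30 + [None] * 335

print(average_daily_usage(year_user))        # 200.0
print(average_daily_usage(month_user))       # 200.0
print(average_daily_usage(month_user) * 30)  # 6000.0 minutes per 30-day month
```

Both users end up with the same rate, so a clustering model fed this feature would not separate them by observation length alone.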
The question
How do you handle a change in grain (from weekly measurement to daily measurement) for a snapshot fact table?
Background info
For a star-schema design I want to incorporate the results of a survey as a fact (e.g. in week 2 of 2015 80% of the respondents have responded 'yes', in week 3 76% etc.)
This survey is conducted each week, and I only have access to the result of the survey (% of people saying yes this week) and not to the individual responses.
Based on (my interpretation of) Christopher Adamson's "Star Schema: The Complete Reference", I believe I should use a snapshot fact table for this kind of measurement.
The date dimension for this fact should be on the week-level, and be a conformed rollup of a more fine-grained date dimension for other facts in other stars that take place on a daily basis.
Here comes trouble
Now someone decides they want to conduct these surveys daily instead of weekly. What is the best way to handle this? Some of the options I'm currently considering:
change the week dimension to a daily one, and fake the old facts as if they happened on the last day of the week.
change the week dimension to a daily one, and add 7 facts for each weekly one.
create a new star, with the daily fact and dimension and treat the old one as an aggregate.
I'd appreciate any input. Please tell me if my logic is off, or my question is not clear :)
I'm not convinced that this is a snapshot. Each survey response represents a "transaction".
With an appropriate date dimension you can calculate the Yes/No percentages, rolled up by week.
Further, this would enable you to show results like "Surveys issued on a Sunday night get more responses", or "People who respond on Friday are more likely to answer 'Yes'". (contrived examples)
Following clarification, this does look like a periodic snapshot. The example of a bank account balance is often used to describe a similar scenario.
A key feature of a periodic snapshot is that every combination of every dimension should be present. If your grain is monthly, then every month you record the fact, even if it has not changed from the previous month.
I think that is the key to your problem. Knowing that your grain may change from weekly to daily, make your grain daily. It does mean you'll be repeating the weekly value on every day of the week, but that is a true representation of your knowledge of the fact; on Wednesday you only knew that its value was the same as Monday.
If you design your ETL right, you won't need to make any changes when the daily updates begin.
Your second option is the one I'd choose in your place.
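A minimal sketch of that second option, repeating one weekly value on each day of its week, might look like the following in Python. The function name expand_week_to_days and the row layout (date, percentage) are assumptions for illustration; a real ETL would write to the fact table with a date-dimension key instead.

```python
from datetime import date, timedelta


def expand_week_to_days(week_start: date, pct_yes: float):
    """Turn one weekly snapshot into seven daily rows carrying the same value."""
    return [(week_start + timedelta(days=offset), pct_yes) for offset in range(7)]


# ISO week 2 of 2015 starts on Monday 2015-01-05; 80% said 'yes' that week.
daily_rows = expand_week_to_days(date(2015, 1, 5), 80.0)
for day, pct in daily_rows:
    print(day.isoformat(), pct)
```

Once the survey becomes daily, each day simply carries its own measured value and the fact table's grain stays unchanged.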
Is there an example of a time-varying numerical quantity that might be in a data warehouse that cannot be meaningfully aggregated over time? If so why?
Stock levels cannot, because they represent a value that is already an aggregation at a particular moment in time.
If you have ten items in stock today and ten yesterday, and ten in stock every day this week, you cannot add them up to "70" meaningfully for the whole week, unless you are measuring something like space utilisation efficiency.
Other examples: bank balance, or speed of flywheel, or time since overhaul.
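The stock-level example can be made concrete with a few lines of Python. This is only a sketch of the arithmetic; the helper names are hypothetical, and the average and closing value are shown simply as common alternatives to summing such a measure over time.

```python
def naive_sum(levels):
    """Add up daily levels; for a stock level this total has no real meaning."""
    return sum(levels)


def average_level(levels):
    """Average level over the period, a meaningful summary of a level over time."""
    return sum(levels) / len(levels)


def closing_level(levels):
    """Level on the last day of the period."""
    return levels[-1]


# Ten items in stock on each of the seven days of the week.
daily_stock = [10] * 7

print(naive_sum(daily_stock))      # 70, not "70 items in stock this week"
print(average_level(daily_stock))  # 10.0
print(closing_level(daily_stock))  # 10
```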
Many subatomic processes can be observed using our notion of "time" but probably wouldn't make much sense when aggregated. This is because our notion of "time" doesn't make much sense at the quantum level.