What is the range of the pacf function in Python? I'd assumed it would be [-1, 1], like Pearson's correlation and autocorrelation, but on trying it on my data I see values like -5.
Can anyone tell me why pacf has a different range, and what exactly that range is?
Edit:
The functions I'm using are:
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
from statsmodels.tsa.stattools import pacf, acf
I'm using the data from this hackathon. It's a time series of weekly sales across different stores and departments. I have checked the pacf values for the data of 1 store, 1 dept.
Here's the code for getting the pacf values:
# getting the data for just one store & dept
s1d1 = data[(data.Store==1)&(data.Dept==1)].sort_values('Date').reset_index(drop=True)
# differencing the series by 1 to make it stationary
s1d1['Weekly_Sales_shifted'] = s1d1.Weekly_Sales.shift(1)
s1d1['Weekly_Sales_differenced'] = s1d1.Weekly_Sales - s1d1.Weekly_Sales_shifted
# dropping the first record since it will have a nan in differenced column
s1d1 = s1d1.dropna(axis=0, subset=['Weekly_Sales_differenced'], how='any')
# getting the pacf values
pacf_values = pacf(s1d1['Weekly_Sales_differenced'], nlags=53)
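Incidentally, the shift-and-subtract differencing above can be collapsed into a single `Series.diff()` call. Here is a minimal sketch on made-up sales figures (the numbers are placeholders, not from the hackathon data):

```python
import pandas as pd

# four made-up weekly sales values standing in for one store/dept slice
s1d1 = pd.DataFrame({"Weekly_Sales": [24924.50, 46039.49, 41595.55, 19403.54]})

# diff(1) computes value - value.shift(1); the first row becomes NaN
s1d1["Weekly_Sales_differenced"] = s1d1["Weekly_Sales"].diff(1)

# drop the leading NaN row, exactly as in the original code
s1d1 = s1d1.dropna(subset=["Weekly_Sales_differenced"]).reset_index(drop=True)
```

The differenced series can then be passed to pacf exactly as before.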
Since Tableau does not have a function for p-values (correct me if I'm wrong here), I created a spreadsheet with all possible sample sizes under two different alphas/significance levels, and I need to connect the appropriate p-value to a calculated field from the main database source (an aggregate count of people). I assumed I could easily match numbers with a condition to bring back the p-value in a calculated field, yet I'm hitting a brick wall. The biggest issue seems to be that the field I want to join the p-value reference table to is an aggregated integer. Also, I do not have any extensions, and my end result needs to be a number, not a graph.
Any secret tricks here?
Seems I cannot blend the reference table in nor join it to an aggregate?
Thanks!
I found a workaround for calculating the critical value for a two-tailed t-test in Tableau. However, I didn't figure out how to join based on an aggregated calculated field. Workaround: I used a conditional statement, copying and pasting about 100 critical values based on (sample size - 2), aka degrees of freedom, into a calculated field. To save time, use Excel to pull the conditions down to 120. Worked like a charm!
Here is the conditional logic for alpha = .2 (80%) in a two-tailed t-test (replace the ## line with about 117 rows):
IF [degrees of freedom] = 1 THEN 3.08
ELSEIF [degrees of freedom] = 2 THEN 1.89
ELSEIF [degrees of freedom] = 3 THEN 1.64
##ELSEIF [...calculate down to 120] = ... then ...
ELSEIF [degrees of freedom] > 120 THEN 1.28
END
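To save the copy-and-paste, the calculated-field text can also be generated with a short script. This is a hypothetical Python sketch: the critical values are hard-coded from a standard two-tailed t table for alpha = .2, and only the first three rows are filled in (in practice you would extend the dict to df = 120):

```python
# two-tailed t critical values for alpha = .2, keyed by degrees of freedom
# (first three rows only; extend the dict to df = 120)
critical_values = {1: 3.08, 2: 1.89, 3: 1.64}

lines = []
for df, t in sorted(critical_values.items()):
    keyword = "IF" if not lines else "ELSEIF"
    lines.append(f"{keyword} [degrees of freedom] = {df} THEN {t}")
lines.append("ELSEIF [degrees of freedom] > 120 THEN 1.28")
lines.append("END")

calculated_field = "\n".join(lines)
print(calculated_field)
```

The printed text can be pasted straight into the Tableau calculated field.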
I've been trying to create a dataset for a chart all day and it's way beyond me, so hopefully someone can put me out of my misery!
I'm trying to create a line chart of the average closing price for a list of NYSE stocks over the previous 90 days. To build the chart, I believe I need a column of average prices and another column of dates. I should be able to create the chart from there, but I've completely failed at creating the dataset. All I've managed to do is cook my processor.
I have a column of NYSE tickers (starting at A2), and I've tried to build a matrix of prices per ticker, per date, which has taken 2 separate ARRAYFORMULA() functions.
I'm hoping there's a way to process all the data within 1 ARRAYFORMULA() and output the average price per date into each cell, but anything is better than what I've been trying.
Here's some sample data:
NYSE Tickers
HZON
VGAC
BSN
DMYD
SNPR
THCB
(They won't all have data going as far back as 90 days)
Ideally, the output would be:
Avg. Price   Date
$10.11       27/02/2021
$10.08       26/02/2021
$10.02       25/02/2021
(Average price of all NYSE Tickers in my A2 column for that date)
I hope this is possible and someone can help!
Thanks
UPDATE
The dataset is just input manually by me. Pretty much everything about my trial-and-error attempt is useless, but I was using this to return historical close prices:
=IFERROR(INDEX(GOOGLEFINANCE(CONCATENATE("NYSE:",$A2), "close", DATEVALUE(B$2)), 2, 2), "")
I'll try to demo what I was trying to accomplish (best read from the inside out):

For each day X, going back 90 days from today:
    For each ticker in column A:
        get the close price of that ticker on date X
    average those close prices
    output the (average price, X) pair as a new row
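The logic boils down to "average the available prices per date". Here is a plain-Python sketch of that step with invented prices (in the real sheet, the per-ticker close prices would come from the GOOGLEFINANCE formula):

```python
# dates -> per-ticker close prices; None marks a ticker with no history yet
prices = {
    "27/02/2021": {"HZON": 10.05, "VGAC": 10.20, "BSN": 10.08},
    "26/02/2021": {"HZON": 10.01, "VGAC": 10.15, "BSN": None},  # BSN not listed yet
}

averages = {}
for date, by_ticker in prices.items():
    # skip tickers with no price on this date, mirroring IFERROR(..., "")
    available = [p for p in by_ticker.values() if p is not None]
    averages[date] = sum(available) / len(available)
```

With these made-up inputs the averages come out to 10.11 and 10.08, matching the shape of the desired output table above.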
I am trying to explore the forecasting function provided by Kusto. I tried the sample, which generated the forecasting trend shown in the docs. However, I then tried the forecasting function with similar parameters on our production data, and for some reason the forecasted values are all null.
Our Kusto raw data looks like the following:
I would like to forecast the values of a0. Here is my query:
...
| distinct ['name']))
| summarize a0=avg(todouble(['temp'])) by d0=bin(['timestamp'], 1s), d1=['name']
| summarize timeAsList = make_list(d0), dataAsList0 = make_list(a0)
| extend forecast = series_decompose_forecast(dataAsList0, 60*60*24*3) // 3 day forecast
| render timechart
This is what the query renders:
This line is just our production data, not a forecast. The actual forecast array is just an array of nulls, as you can see.
What is wrong with the query?
The second parameter of series_decompose_forecast defines the number of points to leave out of training from the original time series. In your case the length of your original time series is ~1:39 hours (judging by the screenshot), so leaving out 3 days' worth of points leaves no data at all for training. You need to extend the time series by the forecast period prior to calling series_decompose_forecast.

I also recommend using make-series to create the time series, filling empty gaps, instead of summarize by bin and make_list. So the final query should look like the one below. I cannot test it as I have no access to the data; if you need, please share a sample datatable and I can craft the full working query.
thanks
Adi
let start_time=datetime(...);
let end_time=datetime(...);
let dt=1s;
let forecast_points=60*60*24*3;
tbl
| make-series a0=avg(todouble(temp)) on timestamp from start_time to (end_time+forecast_points*dt) step dt
| extend forecast = series_decompose_forecast(a0, forecast_points) // 3 day forecast
| render timechart
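To illustrate what the second parameter does, here is a conceptual sketch in plain Python (not Kusto), where a simple linear trend stands in for the real decomposition model: the series is extended by `forecast_points` empty slots, the model trains on everything before those slots, and then fills them in. If the points to leave out exceed the series length, there is nothing left to train on, hence the nulls.

```python
n, forecast_points = 8, 3

# observed series extended by empty slots for the forecast horizon
series = [2.0 * i + 1.0 for i in range(n)] + [None] * forecast_points

# train on the first n points only (least-squares line, closed form)
train, xs = series[:n], list(range(n))
x_mean, y_mean = sum(xs) / n, sum(train) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, train)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean

# fill the held-out slots with the fitted trend
for x in range(n, n + forecast_points):
    series[x] = slope * x + intercept
```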
I am currently calculating the average for a single dimension in a Druid data source using a timeseries query via pydruid. This is based on an example in the documentation (https://github.com/druid-io/pydruid):
from pydruid.client import PyDruid
from pydruid.utils.aggregators import count, doublesum
from pydruid.utils.postaggregator import Field  # needed for the post-aggregation below

# PyDruid needs the connection details: druid_url is your Broker/Router URL
client = PyDruid(druid_url, 'druid/v2')
client.timeseries(
    datasource='test_datasource',
    granularity='hour',
    intervals='2019-05-13T11:00:00.000/2019-05-23T17:00:00.000',
    aggregations={
        'sum': doublesum('dimension_name'),
        'count': count('rows')
    },
    post_aggregations={
        'average': (
            Field('sum') / Field('count')
        )
    }
)
My problem is that I don't know what count('rows') is doing. It seems to give the total row count for the datasource, not a count filtered on the dimension, so I don't know whether the average will be incorrect if a row has a null value in the dimension in question.
Does anyone know how to calculate the average correctly?
Many thanks
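To make the concern concrete, here is a toy Python calculation (the numbers are invented). If the sum aggregator contributes nothing for a row whose metric is null while the count aggregator still counts that row, then dividing the sum by the full row count pulls the average down:

```python
# one row has a null in the dimension being averaged
values = [10.0, 20.0, None, 30.0]

total = sum(v for v in values if v is not None)         # what the sum yields
row_count = len(values)                                 # what count('rows') yields
valid_count = sum(1 for v in values if v is not None)   # rows with an actual value

naive_average = total / row_count      # 60 / 4 = 15.0, skewed by the null row
correct_average = total / valid_count  # 60 / 3 = 20.0
```

Whether the naive average is actually wrong depends on how the Druid deployment handles nulls, which is exactly the asker's question.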
I have a spreadsheet with some data: column A holds dates, column B holds keyword usage on a specific date in 2012, and column C holds keyword usage on a specific date in 2013, as shown in the picture:
What I would like is a function like FORECAST which would "predict" the value for a future date, in this case February 22 (C5), based on this data.
Can you please help me with the formula?
You can use the =FORECAST(value, data_Y, data_X) formula.
value is the known value for which you want to predict the corresponding forecast (in your case B5);
data_Y is the series of data points for which you want to predict the future value (C2:C4);
data_X is the series of corresponding data points which form the basis of the forecast (B2:B4).
So in your example the formula you would put in C5 is =FORECAST(B5,C2:C4,B2:B4), which will return 28.7.
You can find all Google Sheets formulas and their explanations at https://support.google.com/drive/bin/static.py?hl=en&topic=25273&page=table.cs
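Under the hood, FORECAST fits a least-squares line through (data_X, data_Y) and evaluates it at value. A plain-Python sketch with made-up numbers (not the asker's spreadsheet data):

```python
def forecast(value, data_y, data_x):
    # least-squares line fit through (data_x, data_y), evaluated at `value`
    n = len(data_x)
    x_mean = sum(data_x) / n
    y_mean = sum(data_y) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(data_x, data_y)) / sum(
        (x - x_mean) ** 2 for x in data_x
    )
    intercept = y_mean - slope * x_mean
    return slope * value + intercept

print(forecast(5, [2.0, 4.0, 6.0], [1, 2, 3]))  # → 10.0 (the data lie on y = 2x)
```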