Analyzing final value given changing x values through time - time-series

I am analyzing data that contain the Y variable (the final chemical content of a plant at harvest, at about 8 weeks) and the explanatory x variables: light quality measured each week for 8 weeks. I understand that this is not a typical time series analysis, because my y value is not measured at each of these intervals but only at week 8. Still, I want to see how the changes throughout the 8 weeks influence the final chemical concentration. One possibility would be a nested regression in which the treatments (categorical) have the light measurements (numerical) nested within them for each week. However, I'm not sure whether this is the best approach. Any suggestions would be helpful.
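One way to make the question concrete is to reduce each plant's 8 weekly light readings to a few summary features and regress the final concentration on those. The sketch below is only an illustration of that idea, not a recommended model: the function name weekly_features, the plant names and all readings are hypothetical, and the choice of mean and linear slope as summaries is an assumption.

```python
from statistics import mean

def weekly_features(light):
    """Summarise one plant's weekly light readings as (mean, slope per week)."""
    weeks = range(1, len(light) + 1)
    w_bar, l_bar = mean(weeks), mean(light)
    # Ordinary least-squares slope of light on week number.
    sxx = sum((w - w_bar) ** 2 for w in weeks)
    sxy = sum((w - w_bar) * (x - l_bar) for w, x in zip(weeks, light))
    return l_bar, sxy / sxx

# Hypothetical readings for two plants (8 weekly measurements each).
plants = {
    "plant_a": [10, 11, 12, 13, 14, 15, 16, 17],
    "plant_b": [14, 14, 13, 13, 12, 12, 11, 11],
}
features = {name: weekly_features(vals) for name, vals in plants.items()}
```

Each plant then contributes one row (mean light, light trend, treatment) to an ordinary regression on the week-8 concentration, which sidesteps the fact that y is observed only once.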

Related

Time series with multiple independent variables

It's been a while since I worked with time series data.
I have to build a model with data for the past 8 years. The dataset contains one dependent variable, price, and a few independent variables (let's assume there are 2). Each independent variable has its own problems: trend, seasonality or both.
date        price (y)  x1      x2
01-01-2022  8          34.674  1.3333
02-01-2022  6          68.542  2.0
03-01-2022  5          44.523  4.0001
How should I approach this task? Should I apply transformations to each independent variable? What model options do I have? Which are suitable for time series with multiple independent variables?
As I understand it, vector autoregression (VAR) would be incorrect, as I want to predict only one variable (price).
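As an illustration of what "applying a transformation to each independent variable" can mean, here is a minimal sketch of differencing, one common way to remove trend and seasonality. It is not necessarily the right choice for this data. The helper difference, the series x1 and its period of 4 are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the first `lag` points are dropped."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical regressor: linear trend plus a pattern that repeats every 4 steps.
season = [0.0, 2.0, 0.0, -2.0]
x1 = [0.5 * t + season[t % 4] for t in range(12)]

# Lag-1 differencing turns the linear trend into a constant; the seasonal swing remains.
detrended = difference(x1, lag=1)
# Lag-4 differencing cancels the period-4 pattern; only the trend step (0.5 * 4) is left.
deseasonalised = difference(x1, lag=4)
```

Whatever transformation is applied to a regressor should be chosen with the dependent variable in mind, since the model then relates changes in x to price rather than levels.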

generalized linear mixed model output spss

I am writing my master's thesis and I ran a generalized linear mixed regression model in SPSS (version 28) using count data.
Research question: What effect does population mobility have on Covid-19 incidence at the federal state level in Germany during the period from February 2020 to November 2021?
To test the effect of population mobility (independent variable) on Covid-19 incidence (dependent variable), hierarchical models were used, with fixed factors:
mobility variables in 6 places (scale)
cumulative vaccination rate (second dose only) (scale)
season (summer as the reference category) (nominal)
and random effects:
first model with the days variable (time level) (scale)
second model with the federal states variable (each state has a number from 1 to 16) (place level) (nominal)
third model with both days and federal states (time and place level)
First, I built an intercept-only model to check which type of regression is more suitable for the count data (Poisson or negative binomial) and to choose the better of two candidate offset variables. Based on the BIC and AIC, negative binomial regression fits this data best.
Secondly, I checked the collinearity between the original 6 mobility variables and excluded the mobility variables that were highly correlated, based on the VIF. (Only one variable was excluded.)
Thirdly, I built 7 generalized linear models by gradually adding the fixed effects to the intercept-only model: the 5 mobility variables, the cumulative vaccination rate (dose 2) and the season (with summer as the reference category). From these 7 models, the one with the best model fit was selected as the final model.
Finally, I built generalized linear mixed models from this final model by adding a classic random effect: first the days variable only (random-intercept component for time; TIME level), then the federal states variable only (random-intercept component for place; PLACE level), and finally both together.
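The AIC/BIC comparison used in the first step can be sketched with the textbook definitions below. The log-likelihoods, parameter counts and sample size are hypothetical placeholders, not values from this analysis, and SPSS may report its information criteria in a different form.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical intercept-only fits: (log-likelihood, number of parameters).
candidates = {"poisson": (-5200.0, 1), "negative_binomial": (-4100.0, 2)}
n_obs = 1000  # hypothetical number of observations

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Both criteria penalise the extra dispersion parameter of the negative binomial model, so it is preferred only when the gain in log-likelihood outweighs that penalty.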
I am not sure whether I ran the last step, the generalized linear mixed models, correctly.
These are my steps:
Analyze -> Mixed Models -> Generalized Linear Mixed Model -> Fields and Effects:
1. Target -> case
Target distribution and relationship (link) with the linear model -> Custom:
Distribution -> negative binomial
Link function -> log
2. Fixed effects -> include intercept & 5 mobility variables & cumulative vaccination rate & season
3. Random effects -> no intercept & days variable (TIME LEVEL)
Random effect covariance type: variance component
4. Weight and offset -> use offset field -> log expected cases adjusted wave variable
Build options such as General and Estimation remain unchanged (as suggested by SPSS)
Model options such as Estimated Means remain unchanged (as suggested by SPSS)
I followed the same steps for the other 2 models, changing only the random effects:
3. Random effects -> no intercept & federal state variable (PLACE LEVEL)
3. Random effects -> no intercept & days variable & federal state variable (TIME & PLACE LEVEL)
Output:
1. The variance of the random effect of the days variable (time level) was very small, 5.565E-6, indicating only a marginal effect in the model. (MODEL 1)
2. The covariance of the random effect of the federal states was zero and the variance was 0.079 (place level). (MODEL 2)
3. The variance of the random effect of the days variable was very small, 4.126E-6, and the covariance of the random effect of the federal states was zero and the variance was 0.060 (time and place level). (MODEL 3)
Can someone please check my steps, tell me which of the models from the last step is best for presenting the results, and also explain the last point of the output in the picture?
Thanks in advance to all of you.

Interpreting Seasonality in Time Series

I have a discrete time series covering 49 quarters between January 2007 and March 2019, which I am trying to analyse. Before undertaking various forms of analysis I wanted to check for seasonality, and I have tried two methods for this in R. In the first, I used the wo function (the Webel and Ollech test) from the seastests package, which told me that the data do not display seasonality.
library(seastests)
summary(wo(tt))
> summary(wo(tt))
Test used: WO
Test statistic: 0
P-value: 0.8174965 0.5785041 0.2495668
The WO - test does not identify seasonality
However, I wanted to check this again and used the decompose function, from which I got the output below, which appears to suggest a seasonal component. Can anyone advise:
I am reading the decomposed data correctly?
AND
Why there is such disagreement between decompose and the seastest results?
The decompose() function is a simple function that basically estimates the (moving) period average. The volatility of your time series increases strongly in the last few years, so the averages may pick up some random increases. Also, the seasonal component that you obtain from decompose() will basically always look seasonal, even for pure noise:
set.seed(1234)
x <- ts(rnorm(80), frequency=4)
seastests::wo(x)
plot(decompose(x))
Therefore, seasonality tests are preferable to decompose() for assessing whether a time series really is seasonal.
Still, if you have information that the data generating process has changed, you may want to use the test on the last few years of observations.

SPSS 26: How to calculate the absolute differences in scores from repeated measurements in order to create cumulative frequency tables

I am working with SPSS 26 and I have some trouble finding out which functions to use.
I have scores from repeated measurements (9 setups, each with 3 stimulus types, with 10 scores each) and need to calculate the absolute differences in scores in order to create cumulative frequency tables. The whole thing is about the test-retest variability of the scores obtained with the instrument. The main goal is to be able to say that, e.g., XX % of the scores for setup X and stimulus type X were within X points. Sorry, I hope that is somehow understandable :) I APPRECIATE ANY HELP I CAN GET, I AM TERRIBLE AT THIS!
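The calculation itself, independent of SPSS, can be sketched as follows. This assumes the repeated measurements can be arranged as test/retest pairs for one setup and stimulus type. The function name cumulative_within and all scores are hypothetical.

```python
from collections import Counter

def cumulative_within(test, retest):
    """Map each absolute difference d to the percentage of score pairs within d points."""
    diffs = [abs(a - b) for a, b in zip(test, retest)]
    counts = Counter(diffs)
    table, running = {}, 0
    for d in sorted(counts):
        running += counts[d]
        table[d] = 100.0 * running / len(diffs)
    return table

# Hypothetical scores for one setup and stimulus type (10 test/retest pairs).
test = [20, 22, 19, 25, 21, 23, 20, 24, 22, 21]
retest = [21, 22, 17, 24, 21, 26, 20, 23, 24, 21]
table = cumulative_within(test, retest)
```

With these made-up scores, table[2] is 90.0, which reads as "90 % of the scores were within 2 points". Repeating the call per setup and stimulus type gives one cumulative frequency table for each combination.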

LSTM and labels

Let's start off with "I know ML cannot predict stock markets better than monkeys."
But I just want to go through with it.
My question is a theoretical one.
Say I have date, open, high, low, close as columns. So I guess I have 4 features: open, high, low, close.
'my_close' is going to be my label (answer), and I will use the 'close' 7 days from the current row. Basically I shift the 'close' column up 7 rows and make it a new column called 'my_close'.
LSTMs work on sequences. So say the sequence I set is 20 days.
Hence my shape will be (1000 days of data, 20 days per sequence, 3 features).
The problem that is bothering me is: should these 20 days, or rows of data, have the exact same label? Or can they have individual labels?
Or have I misunderstood the whole theory?
Thanks guys.
In your case, you want to predict the current day's stock price using the previous 7 days of stock values. The way you're building your inputs and outputs requires some modification before you feed them into the model.
You're misunderstanding the timesteps in your sequences.
In layman's terms, the number of timesteps (the sequence length) is the number of inputs we consider when predicting one output. In your case it will be 7 (not 20), as we will be using the previous 7 days of data to predict the current day's output.
Your input should be the previous 7 days of information:
[F11, F12, F13], [F21, F22, F23], ..., [F71, F72, F73]
In Fij, F stands for feature, i is the timestep and j is the feature number.
The output will be the stock price of the 8th day.
Your model will then analyze the previous 7 days of inputs and predict the output.
So, to answer your question: you will have one common label for the previous 7 days of input.
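The windowing described above can be sketched in plain Python, before any LSTM library is involved. The helper make_windows and the data are hypothetical. The horizon parameter is an assumption added so the same code covers both "next day" (horizon=1) and the 7-days-ahead label from the question (horizon=7).

```python
def make_windows(features, target, timesteps, horizon):
    """Pair each block of `timesteps` consecutive rows with one future target value."""
    X, y = [], []
    last_start = len(features) - timesteps - horizon
    for start in range(last_start + 1):
        end = start + timesteps                 # index of the first row after the window
        X.append(features[start:end])           # one sample: timesteps rows of n_features
        y.append(target[end - 1 + horizon])     # a single label for the whole window
    return X, y

# Hypothetical data: 30 days, 3 features per day, close price as the target.
rows = [[float(d), d + 0.5, d - 0.5] for d in range(30)]
close = [100.0 + d for d in range(30)]
X, y = make_windows(rows, close, timesteps=7, horizon=1)
```

Here X has 23 samples of 7 timesteps with 3 features each, and y holds exactly one label per sample, so the rows inside a window never carry individual labels.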
I strongly recommend studying LSTMs a bit more.

Resources