What machine learning technique can be used for multivariate time series? - time-series

I'm trying to predict the price of tomatoes. I've collected a data set that contains the historical tomato price, to which I've added features that might affect the change in price, for example agricultural wages per month, the inflation rate per month, and rainfall per month. Does this qualify as a multivariate time series? What machine learning technique can be used to solve this problem? The constraint is that there are only 48 data points (4 years × 12 months). Also, can the train and test sets be pulled using cross-validation (see the sketch after the column list)?
Columns in my dataset:
Year
Month
Tomato price
Wage
Inflation
Rainfall
Number of festivals in the month
Thanks in advance !!
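On the cross-validation point, a minimal sketch using scikit-learn's TimeSeriesSplit, which always trains on the past and validates on the block that follows; the file name, the shortened column names, and the choice of random forest are placeholders, not part of the original question:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import TimeSeriesSplit

    # Hypothetical file and shortened column names for the data set described above.
    df = pd.read_csv("tomato_prices.csv").sort_values(["Year", "Month"])

    X = df[["Wage", "Inflation", "Rainfall", "Festivals"]]
    y = df["Tomato price"]

    # Each fold trains on an initial stretch of months and validates on the
    # months that follow, so the temporal order is never shuffled.
    tscv = TimeSeriesSplit(n_splits=4)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = model.predict(X.iloc[test_idx])
        print(f"fold {fold}: MAE = {mean_absolute_error(y.iloc[test_idx], preds):.2f}")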

Related

Taking data till same day and predicting for the same day in multivariate LSTM stock market forecasting

In the GitHub notebook below, the author predicts the next day with a multivariate LSTM model. However, I think he is using data up to and including the same day that he is predicting for, although I am not sure.
Is it okay to include the same day's data when predicting that day? What can I do in this code to predict the next day's price without using that day's own data?
https://github.com/flo7up/relataly-public-python-tutorials/blob/master/007%20Time%20Series%20Forecasting%20-%20Multivariate%20Time%20Series%20Models.ipynb
Thanks in advance...
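This is not the notebook author's code, but the usual way to avoid the leak is to shift the target one step so that the features of day t are paired with the price of day t+1. A minimal pandas sketch with placeholder file and column names:

    import pandas as pd

    # Placeholder file and column names; "Close" stands for the value to forecast.
    df = pd.read_csv("prices.csv", parse_dates=["Date"]).set_index("Date")

    # Pair the features of day t with the price of day t+1, so the model never
    # sees the value of the day it is asked to predict.
    df["target_next_day"] = df["Close"].shift(-1)
    df = df.dropna(subset=["target_next_day"])  # the last row has no next-day label

    X = df.drop(columns=["target_next_day"])
    y = df["target_next_day"]
    # X and y can now be windowed into LSTM sequences without same-day leakage.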

Overlapping dependent time series, ML problem approach [closed]

Below is a simplified description of the problem:
Three weeks before delivery of a product, the buyer gives an estimate of the quantity that will be delivered on a certain demand date.
This quantity might change as the time of delivery comes closer (illustrated in the image below). This seems quite straightforward, but there is a high correlation between the demand weeks: e.g. if the quantity is lowered for one week, it is likely that a surrounding week will increase.
Is there an approach that will get the model to acknowledge the surrounding demand weeks?
I'm currently using random forest regression with the attributes shown in the image, and the results are OK, but I thought asking here for inspiration might be a good idea.
From your description, I understand that you are currently using only the buyer's forecasts as input, and what you would like to do is also consider the actual quantity of the last week(s) as an input for the next estimation. To achieve this, you could create another column in your table that is the actual quantity shifted by one week. That way you get a new column, "Actual Qty previous week". Then you can train your model to predict using both the buyer forecast and the actual quantity from last week. Of course, you can do the same thing once more and shift by two weeks to also make the week before that available.
In addition, you can come up with more elaborate calculated features. One idea would be the average deviation of the buyer's forecast from the final demand (averaged over, e.g., the last 10 weeks). That way you would be able to detect that some buyers tend to overestimate and others tend to underestimate.
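A minimal pandas sketch of both ideas (the shifted actual quantity and the rolling forecast deviation), assuming one row per demand week and hypothetical column names:

    import pandas as pd

    # Hypothetical columns: "buyer", "week", "buyer_forecast", "actual_qty".
    df = pd.read_csv("demand.csv").sort_values(["buyer", "week"])

    # Actual quantity of the previous one and two weeks as new input columns.
    df["actual_qty_prev_week"] = df.groupby("buyer")["actual_qty"].shift(1)
    df["actual_qty_prev_2_weeks"] = df.groupby("buyer")["actual_qty"].shift(2)

    # Average deviation of the buyer's forecast from the final demand over the
    # last 10 weeks, to catch buyers who systematically over- or underestimate.
    df["forecast_error"] = df["buyer_forecast"] - df["actual_qty"]
    df["avg_error_10w"] = (
        df.groupby("buyer")["forecast_error"]
          .transform(lambda s: s.shift(1).rolling(10).mean())
    )

    df = df.dropna()  # rows at the start of each history lack the lagged values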
Since you mentioned that variations in quantity influence the subsequent weeks, I propose to do just that: create a new feature that shows the variation.
This implies running the predictive algorithm iteratively, one week after the other, each time adding a new feature to the dataset: the variation of the predicted total quantity for the previous weeks.
The method would go like this:
run prediction model for week1
add a feature to the dataset: variation of predicted qty for week 1
run prediction model for week2
add a feature to the dataset: variation of predicted qty for week 1 + week 2
run prediction model for week3
etc ...
This is of course only the idea. It is possible to add different kinds of features (the variation of the last week only, a moving average over recent weeks, whatever would make sense).
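A rough, toy sketch of that iterative loop, using synthetic data and hypothetical column names; the model is fitted and applied on the same block purely to show where the new variation feature enters:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # Toy data: one row per (delivery, demand week); all names are hypothetical.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "delivery_id": np.repeat(np.arange(200), 3),
        "demand_week": np.tile([1, 2, 3], 200),
        "buyer_forecast": rng.normal(100, 10, 600),
    })
    df["actual_qty"] = df["buyer_forecast"] + rng.normal(0, 5, 600)

    # Running variation (predicted qty minus buyer forecast) per delivery,
    # accumulated over the weeks that have already been predicted.
    variation = {d: 0.0 for d in df["delivery_id"].unique()}

    for week in [1, 2, 3]:
        block = df[df["demand_week"] == week].copy()
        block["prev_variation"] = block["delivery_id"].map(variation)

        X = block[["buyer_forecast", "prev_variation"]]
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X, block["actual_qty"])  # in practice, fit on history only
        pred = model.predict(X)

        # Feed the accumulated variation into the next week's model.
        for d, p, f in zip(block["delivery_id"], pred, block["buyer_forecast"]):
            variation[d] += p - f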

Python - How to predict future sales value by feeding in other parameters? Using LSTM or something else?

I am new to this regression world, and I have what you might call a nerd question.
I am trying to solve the problem of predicting future sales in my organization.
I have collected all the data for last year. My data includes (for each day):
1. Total Sales (count)
2. Temperature
3. Wind Direction
4. Precipitation
5. Day of week (i.e. 1, 2, 3, ..., or 7)
6. Whether a working day or not
7. etc.
My goal:
I want to train a model so that if I give it the values of items 2 to 7 (i.e. the data for the day I want to predict, which is in neither the training nor the test data), it will give me the predicted value of item 1 (i.e. Total Sales).
What I tried:
1. First I tried a univariate LSTM model (i.e. using only the total sales from the past year of data to predict the next value), but I couldn't feed the other data in as input.
2. Then I tried a multivariate LSTM model, but this would predict all of the variables for the next step, not just sales.
3. Then I searched many tutorials on the problem, such as a video tutorial that uses an LSTM for electricity bill consumption, but it only shows the model building and not how to apply it.
4. I also came across another question on Stack Overflow, but there the user seems to be moving towards reinforcement learning.
Conclusion: What should I do to solve such problems? How can I predict a future day's sales count by feeding in the data for that day?
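One way to frame this, not tied to LSTMs at all, is as ordinary tabular regression: each day becomes one row of features and one sales target, so an unseen day can be predicted from its own feature values. A minimal sketch with hypothetical file and column names (a random forest here, but any regressor would do):

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical file and column names, one row per day of last year.
    df = pd.read_csv("daily_sales.csv")

    features = ["Temperature", "WindDirection", "Precipitation",
                "DayOfWeek", "IsWorkingDay"]
    X, y = df[features], df["TotalSales"]

    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X, y)

    # Predict a brand-new day from its weather and calendar features alone.
    tomorrow = pd.DataFrame([{
        "Temperature": 21.5, "WindDirection": 180, "Precipitation": 0.0,
        "DayOfWeek": 3, "IsWorkingDay": 1,
    }])
    print(model.predict(tomorrow))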

Time Series Forecasting: weekly vs daily predictions

I have some daily time series data. I am trying to predict the next 3 days from the historical daily set of data.
The historical data shows a definite pattern based upon the day of the week: Monday and Tuesday are high, Wednesday is typically the highest, and volume then decreases over the remaining part of the week.
If I group the data monthly or weekly, I can definitely see an upward trend over time that appears to be additive.
My goal is to predict the next 3 days only. My intuition is telling me to take one approach and I am hoping for some feedback on pros/cons versus other approaches.
My intuition tells me it might be better to group the data by week or month and then predict the next week or month. Suppose I predict next week's total by loading historical weekly data into ARIMA, training, testing, and predicting the next week. Within a week, each day of the week typically contributes x percent of that weekly total. So, if Wednesday has historically contributed 50% of the weekly volume on average, and for the next week the prediction is 1000, then I would predict Wednesday to be 500. Is this a common approach?
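That top-down allocation can be sketched in a few lines of pandas, with placeholder file and column names:

    import pandas as pd

    # Placeholder file and column names; one historical volume value per day.
    daily = pd.read_csv("daily.csv", parse_dates=["date"], index_col="date")["volume"]

    # Historical share of the weekly total contributed by each day of the week
    # (0 = Monday, ..., 6 = Sunday).
    share = daily.groupby(daily.index.dayofweek).sum()
    share = share / share.sum()

    weekly_forecast = 1000            # whatever the weekly ARIMA model predicts
    print(weekly_forecast * share)    # e.g. Wednesday's allocation is index 2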
Alternatively, I could load the historical daily values into ARIMA, train, test and let ARIMA predict the next 3 days. The big difference here is the whole "predict weekly" versus "predict daily".
In the time series forecasting space, is this a common debate, and if so, can someone suggest some keywords I can Google to educate myself on the pros and cons?
Also, perhaps there is a suggested algorithm to use when day of week is a factor?
Thanks in advance for any responses.
Dan
This is a standard daily time series problem where there is a day-of-week seasonality. If you are using R, you could make the time series a ts object with frequency = 7 and then use auto.arima() from the forecast package to forecast it. Any other seasonal forecasting method is also potentially applicable.
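For readers working in Python rather than R, a rough equivalent of that frequency-7 idea using statsmodels' SARIMAX; this is a sketch, not the answerer's code, and unlike auto.arima() the orders here are fixed by hand:

    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Placeholder file and column names for the daily series.
    y = pd.read_csv("daily.csv", parse_dates=["date"], index_col="date")["volume"]

    # A seasonal period of 7 mirrors ts(..., frequency = 7) in R.
    model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))
    result = model.fit(disp=False)

    print(result.forecast(steps=3))  # the next 3 days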

Is it necessary to make time series data stationary before applying tree-based ML methods, e.g. Random Forest or XGBoost?

As in the case of ARIMA models, where we have to make our data stationary: is it necessary to make time series data stationary before applying tree-based ML methods?
I have a dataset of customers with monthly electricity consumption for the past 2 to 10 years, and I am supposed to predict each customer's consumption for the next 5 to 6 months. In the dataset, some customers show strange behavior: for a particular month, their consumption differs considerably from what they consumed in the same month of the previous year, or of the last 3 to 4 years, and this change is not because of temperature. Since we don't know the reason behind this change, the model is unable to predict that consumption correctly.
So would making each customer's time series stationary help in this case or not?
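For concreteness, one common way to handle this with tree models is to feed differenced or lagged versions of each customer's series as features, since trees cannot extrapolate a raw trend; a minimal sketch with hypothetical column names (not a claim that it will fix the anomalous months):

    import pandas as pd

    # Hypothetical layout: one row per (customer, month) with a "consumption" column.
    df = pd.read_csv("consumption.csv").sort_values(["customer_id", "month"])

    # Month-over-month and year-over-year differences per customer; differenced
    # or lagged versions of the series are often supplied as features alongside
    # (or instead of) the raw level.
    g = df.groupby("customer_id")["consumption"]
    df["diff_1m"] = g.diff(1)
    df["diff_12m"] = g.diff(12)
    df["lag_12m"] = g.shift(12)

    df = df.dropna()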
