Can I use transfer learning in Facebook Prophet? - time-series

I want my prophet model to predict values for every 10 minute interval over the next 24h (e.g. 24*6=144 values).
Let's say I've trained a model on a huge (over 900k of rows) .csv file where sample row is
...
ds=2018-04-24 16:10, y=10
ds=2018-04-24 16:20, y=14
ds=2018-04-24 16:30, y=12
...
So I call model.fit(huge_df), wait 1-2 seconds, and receive the 144 values.
Then an hour passes and I want to tune my prediction for the remaining (144 - 6 = 138) values, given the new data (6 rows).
How can I tune my existing Prophet model without having to call model.fit(huge_df + live_df) and wait several seconds again? I'd like to be able to call model.tune(live_df) and get an instant prediction.

As far as I'm aware this is not really a possibility. I think Prophet uses a variant of the (L-)BFGS optimization algorithm to maximize the posterior probability of the model, so as I see it the only way to train the model is on the whole dataset you want to use. The reason transfer learning works with neural networks is that it is just a weight (parameter) initialization, and backpropagation is then run iteratively in the standard SGD training scheme. Theoretically you could initialize the parameters to those of the previous model in the case of Prophet, which might or might not work as expected. I'm not aware that anything of the sort is currently implemented (but since it's open source you could give it a shot, hopefully reducing convergence times quite a bit).
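If you want to experiment with that parameter-initialization idea, recent versions of the prophet package let you pass an init argument through to the Stan optimizer when fitting. A rough sketch, assuming the parameter layout of the current prophet package; huge_df and live_df are the data frames from the question:

```python
import pandas as pd
from prophet import Prophet

def stan_init(m):
    """Pull point estimates out of a fitted Prophet model to warm-start a new fit."""
    res = {}
    for pname in ['k', 'm', 'sigma_obs']:      # scalar trend / noise parameters
        res[pname] = m.params[pname][0][0]
    for pname in ['delta', 'beta']:            # changepoint and seasonality vectors
        res[pname] = m.params[pname][0]
    return res

# huge_df / live_df are the data frames from the question
m1 = Prophet().fit(huge_df)                               # the slow, full fit
m2 = Prophet().fit(pd.concat([huge_df, live_df]),
                   init=stan_init(m1))                    # warm-started refit on old + new rows
```

Note this only helps when the model structure (the number of changepoints in particular) stays the same between fits, so it is most useful when you are appending a small amount of new data.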
Now, as far as practical advice goes: you probably don't need all the data, just tail it to what you really need for the problem at hand. For instance, it does not make sense to keep 10 years of data if you only have monthly seasonality. Also, depending on how strongly your data is autocorrelated, you may be able to downsample a bit without losing any predictive power. Another idea would be to try an algorithm that is suited to online (or batch) learning - you could for instance try a CNN with dilated convolutions.
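For instance, tailing and downsampling the 10-minute data could look like this (pandas assumed; the 90-day cutoff and hourly rate are arbitrary placeholders):

```python
import pandas as pd

df = pd.read_csv("huge.csv", parse_dates=["ds"])          # columns ds, y as in the question

recent = df[df["ds"] >= df["ds"].max() - pd.Timedelta(days=90)]   # keep only recent history
hourly = (recent.set_index("ds")["y"]
                .resample("1H").mean()                    # downsample 10-minute data to hourly
                .reset_index())
```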

Time series problems are quite different from usual machine learning problems. When we train a cat/dog classifier, the features of cats and dogs are not going to change suddenly (evolution is slow). But when it comes to time series problems, training should happen every time prior to forecasting. This becomes even more important when you are doing univariate forecasting (as in your case), since the only features we provide to the model are the past values, and these values change at every step. Because of these concerns, I don't think something like transfer learning will work for time series.
Instead, what you can do is try converting your time series problem into a regression problem using a rolling-window approach. Then you can save that model and get your predictions. But make sure to retrain it at short intervals, say once a day, depending on how frequently you need a forecast.
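A minimal sketch of that rolling-window reformulation, assuming pandas and scikit-learn; the file name huge.csv, the window of 6 lags, and the choice of GradientBoostingRegressor are placeholders, and the ds/y columns follow the question above:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

window = 6  # use the last 6 observations (one hour of 10-minute data) as features

df = pd.read_csv("huge.csv", parse_dates=["ds"])   # hypothetical file with ds, y columns
for k in range(1, window + 1):
    df[f"lag_{k}"] = df["y"].shift(k)              # lagged values become the feature columns
df = df.dropna()

X = df[[f"lag_{k}" for k in range(1, window + 1)]]
y = df["y"]
model = GradientBoostingRegressor().fit(X, y)

# To forecast 24h ahead, feed the latest 6 observations and predict recursively,
# appending each prediction to the feature window until 144 steps are produced.
```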

Related

Predicting time series based on previous events using neural networks

I want to see if the following problem can be solved using neural networks: I have a database containing over 1000 basketball events, where the total score has been recorded every second from minute 5 to minute 20, and where the games are all from the same league. This means the events occurred over different time periods. The data is then interpolated so that consecutive timesteps are equally spaced, giving exactly 300 points between minute 5 and minute 20. This can be seen here:
[figure: interpolated time series of the total score]
The final goal is to have a model that can predict the y values between t=15 and t=20, using as input the y values between t=5 and t=15. I want to train the model using the database containing the 1000 events. For this I tried using the following network:
[figure: input data vs. output data, and the neural network architecture]
The input data used to train the neural network would have shape (1000, 200), and the output data would have shape (1000, 100).
Can someone guide me in the right direction and give some feedback on whether this is a correct approach for such a problem? I have found some previous time series problems, but all of them were based on one large time series, while in this situation I have 1000 different time series.
There are a couple of different ways to approach this problem. Based on the comments, this sounds like univariate, multi-step time series forecasting, albeit across many different events.
First, to clarify: most deep learning models/frameworks for time series take data in the format (batch_size, n_historical_steps, n_feature_time_series) and output results in the format (batch_size, n_forecasted_steps, n_targets).
Since this is a univariate forecasting problem, n_feature_time_series would be one (unless I'm missing something). n_historical_steps is a hyperparameter we often optimize, as the entire temporal history is frequently not relevant to forecasting the next n steps, so you might want to tune that as well. However, let's say you choose to use the full temporal history: the input would then look like (batch_size, 200, 1) and the output shape would be (batch_size, 100, 1). You could then use a batch_size of 1000 to feed in all the different events at once (assuming, of course, you keep a separate validation/test set). This would give you an input shape of (1000, 200, 1). This is how you would likely do it if you were using models like DA-RNN, an LSTM, or a vanilla Transformer.
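For illustration, a minimal sketch of those shapes with Keras (assumed here; the random arrays stand in for the 1000 real events, and the single LSTM layer is just one possible choice):

```python
import numpy as np
import tensorflow as tf

n_events, n_in, n_out = 1000, 200, 100
X = np.random.rand(n_events, n_in, 1).astype("float32")   # placeholder for the real events
y = np.random.rand(n_events, n_out).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_in, 1)),   # (n_historical_steps, n_feature_time_series)
    tf.keras.layers.LSTM(64),                 # encodes the 200 historical steps
    tf.keras.layers.Dense(n_out),             # emits the 100 forecasted steps at once
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, validation_split=0.2)
```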
There are some other models, though, that create a learnable series embedding ID, such as those in the Convolutional Transformer paper or DeepAR. This is essentially a unique series identifier associated with each event, and the model learns to forecast all events in the same pass.
I have models of both varieties implemented that you could use in Flow Forecast, though I don't have any detailed tutorials on this type of problem at the moment. I will also say, in all honesty, that given you only have 1000 basketball events (each with only 300 univariate time steps) and the many variables in play in basketball, I doubt you will be able to accomplish this task with any real degree of accuracy. I would guess you need at least 20k+ basketball events to forecast this type of problem well, with deep learning at least.

What is 'Refresh Rate' in the context of machine learning algorithms?

I've recently been using an AI/ML platform called Monument (Monument.ai) to project time series. The platform contains various ML algorithms, with parameters to tune the projections. When using algorithms such as LightGBM and LSTM, there is an integer-valued parameter called 'Refresh Rate'. The platform describes refresh rate as
How frequently windows are constructed. Every window is used to validate this number of data points
where windows in this context are 'sub-windows' within the main training period. My question is: what is the underlying purpose of the refresh rate, and how does changing it between 1, 10, and 50 impact the projections?
Monument worker here. I think we should set up an FAQ platform somewhere, as these questions could be confusing to others without context :-)
Back to your question: the refresh rate affects only the "validation" part of a time series analysis. It is interpreted as a frequency, so 1 = high refresh rate and 50 = low refresh rate. A higher refresh rate gives you better validation effectiveness but is slower than a lower refresh rate; hence you usually choose a moderate one (10 is a good choice).
====== More technical explanations below. ======
On Monument, you choose an algorithm to make a future "prediction" on your time series data, and look at the "validation" results to see how suitable the algorithm is for your problem. The prediction task is specified by two "window" parameters: lookback and lookahead. Selecting lookback=10 and lookahead=5 means you are trying to "predict 5 data points into the future by using the last 10 data points".
Validation needs to reflect the result of that exact same prediction task. In particular, for each historical data point you want to train a new model on the 10 points in the past to predict the 5 points ahead. This is refresh rate=1, i.e., refresh for every data point: for each historical data point you create a "sub-window" of length 15 (10+5). That is a lot of new models to train and can be very slow.
If time and memory are not a concern, then refresh rate=1 is a good choice, but usually we want to be more efficient. Here we exploit a "local reusability" assumption: a model trained on one sub-window is still useful for adjacent sub-windows. Then we can train a model on one sub-window and use it for 10 historical points, that is, refresh rate=10. This way much less computation is required and validation is still accurate to a certain extent. Note that you probably do not want to set refresh rate=200, because it is not very convincing that a model is still useful for data 200 points away. As you can see, there is a tradeoff between speed and accuracy.
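This is not Monument's code, just a toy sketch in Python of that speed/accuracy tradeoff; fit and predict are placeholders for whatever algorithm is being validated:

```python
def backtest(series, fit, predict, lookback=10, lookahead=5, refresh_rate=10):
    """Validate over historical sub-windows, retraining only every `refresh_rate` points."""
    errors, model = [], None
    for t in range(lookback, len(series) - lookahead + 1):
        if model is None or (t - lookback) % refresh_rate == 0:
            model = fit(series[t - lookback:t])            # new model for this sub-window
        forecast = predict(model, lookahead)               # predict the next 5 points
        actual = series[t:t + lookahead]
        errors.append(sum(abs(f - a) for f, a in zip(forecast, actual)) / lookahead)
    return sum(errors) / len(errors)                       # mean absolute validation error
```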

Validating accuracy on time-series data with an atypical ending

I'm working on a project to predict demand for a product based on past historical data for multiple stores. I have data from multiple stores over a 5-year period. I split the 5-year time series into overlapping subsequences, use the last 18 months to predict the next 3, and am able to make predictions. However, I've run into a problem choosing a cross-validation method.
I want to have a holdout test split and use some sort of cross-validation for training my model and tuning parameters. However, the last year of the data was a recession in which almost all demand suffered. When I use the last 20% (time-wise) of the data as a holdout set, my test score is very low compared to my out-of-fold cross-validation scores, even though I am using a TimeSeriesSplit CV. This is very likely caused by the recession being new behaviour: the model can't predict these strong downswings because it has never seen them before.
The solution I'm thinking of is using a random 20% of the data as a holdout, and a shuffled KFold for cross-validation. Since I am not feeding any information about when a sequence started into the model except its starting month (1 to 12, to help the model explain seasonality), my theory is that the model should not overfit the data based on that. If all types of economy are present in the data, the results of the model should extrapolate to new data too.
I would like a second opinion on this: do you think my assumptions are correct? Is there a different way to solve this problem?
Your overall assumption is correct in that you can probably take random chunks of time to form your training and testing sets. However, when doing it this way you need to be careful. Rather than predicting the raw values of the next 3 months from the prior 18 months, I would predict the relative increase/decrease of sales in the next 3 months vs. the mean of the past 18 months.
(see here: http://people.stern.nyu.edu/churvich/Forecasting/Handouts/CourantTalk2.pdf)
Otherwise, the correlation between the next 3 months and your prior 18 months of data might give you a misleading impression of the accuracy of your model.
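A small sketch of that relative target with pandas (the file and column names are hypothetical; the 18-month and 3-month windows follow the question):

```python
import pandas as pd

# hypothetical monthly demand series indexed by month
sales = pd.read_csv("store_sales.csv", parse_dates=["month"], index_col="month")["demand"]

past_mean = sales.rolling(18).mean()                       # mean of months t-17 .. t
future_mean = pd.concat([sales.shift(-k) for k in (1, 2, 3)], axis=1).mean(axis=1)  # months t+1 .. t+3
target = (future_mean / past_mean - 1).dropna()            # relative increase/decrease to predict
```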

Applying machine learning to training data parameters

I'm new to machine learning. I understand there are parameters and choices that apply to the model you attach to a certain set of inputs, which can be tuned/optimised, but those inputs tie back to fields you generated by slicing and dicing whatever source data you had in a way that made sense to you. But what if the way you decided to model and cut up your source data, and therefore your training data, isn't optimal? Are there ways or tools that extend the power of machine learning not only into the model, but into the way the training data was created in the first place?
Say you're analysing the accelerometer, GPS, heart rate and surrounding topography data of someone moving. You want to determine where this person is likely to become exhausted and stop, assuming they'll continue moving in a straight line based on their trajectory, and that going up any hill increases heart rate to some point where they must stop. Whether they're running or walking obviously modifies these things.
So you cut up your data (feel free to correct how you'd do this, but it's less relevant to the main question):
Slice up the raw accelerometer data along the X, Y, Z axes for the past A seconds into B slices to profile it, probably applying a CNN, to determine whether the person is running or walking
Cut up the most recent C seconds of raw GPS data into a sequence of D (lat, long) pairs, each pair representing the average of E seconds of raw data
Based on the previous sequence, determine speed and trajectory, and determine the upcoming slope by slicing the next F units of distance (or seconds, another option to determine, G) into H slices, profiling each, etc.
You get the idea. How do you effectively determine A through H, some of which would completely change the number and behaviour of the model inputs? I want to take out any bias I may have about what's right and let the system determine it end to end. Are there practical solutions to this? Each time the data-creation parameters change, go back, regenerate the training data, feed it into the model, train it, tune it, over and over again until you get the best result.
What you call your bias is actually the greatest strength you have: you can include your knowledge of the system. Machine learning, including glorious deep learning, is, to put it bluntly, stupid. Although it can figure out features for you, interpreting them will be difficult.
Deep learning in particular has a great capacity to memorise (not learn!) patterns, making it easy to overfit to the training data. Building machine learning models that generalise well in the real world is tough.
In most successful approaches (check against Master Kagglers), people create features. In your case I'd probably want to calculate the magnitude and direction of the force. Depending on the scenario, I might transform (lat, long) into distance from a specific point (say, the point of origin/activation, or one re-established every minute), or maybe use a different coordinate system.
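A rough sketch of those hand-crafted features with NumPy (the equirectangular distance approximation and the choice of reference point are illustrative, not the only options):

```python
import numpy as np

def force_magnitude(ax, ay, az):
    """Magnitude of the acceleration vector for each sample."""
    return np.sqrt(ax ** 2 + ay ** 2 + az ** 2)

def distance_from_point(lat, lon, lat0, lon0):
    """Approximate distance in metres from a reference point (equirectangular approximation)."""
    r = 6_371_000.0                                          # Earth radius in metres
    dlat = np.radians(lat - lat0)
    dlon = np.radians(lon - lon0) * np.cos(np.radians(lat0))
    return r * np.sqrt(dlat ** 2 + dlon ** 2)
```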
Since your data is a time series, I'd probably use something well suited to time series modelling that you can understand and troubleshoot. CNNs and the like are typically your last resort in the majority of cases.
If you really would like to automate it, check e.g. AutoKeras or Ludwig. When it comes to learning which features matter most, I'd recommend going with gradient boosting (GBDT).
I'd also recommend reading this article from Airbnb that takes a deeper dive into the journey of building such systems and into feature engineering.

Using Random Forest for time series dataset

For a time series dataset, I would like to do some analysis and create a prediction model. Usually, we would split the data (by random sampling throughout the entire dataset) into a training set and a testing set, use the training set with the randomForest function, and keep the testing part to check the behaviour of the model.
However, I have been told that it is not possible to split data by random sampling for time series data.
I would appreciate it if someone could explain how to split data into training and testing sets for time series data, or whether there is an alternative way to do time series random forests.
Regards
We live in a world where "future-to-past causality" only occurs in cool sci-fi movies. Thus, when modelling time series, we like to avoid explaining past events with future events. We also like to verify that our models, trained strictly on past events, can explain future events.
To model a time series T with RF, rolling is used. For day t, the value T[t] is the target, and the values T[t-k], with k = {1, 2, ..., h} where h is the past horizon, are used to form the features. For nonstationary time series, T is converted to e.g. the relative change Trel[t] = (T[t+1] - T[t]) / T[t].
To evaluate performance, I advise checking the out-of-bag cross-validation measure of RF. Be aware that there are some pitfalls which can render this measure over-optimistic:
Unknown future-to-past contamination: the rolling is somehow faulty and the model uses future events to explain that same future within the training set.
Non-independent sampling: if the time interval you want to forecast ahead is shorter than the time interval the relative change is computed over, your samples are not independent.
Possibly other mistakes I don't know of yet.
In the end, anyone can make the above mistakes in some latent way. To check that this is not happening, you need to validate your model with backtesting, where each day is forecast by a model trained strictly on past events only.
When OOB-CV and backtesting wildly disagree, this may be a hint that there is a bug in the code.
To backtest, roll over T[t-traindays .. t-1], model this training data, and forecast T[t]. Then increase t by one and repeat.
To speed things up, you may train your model only once, or retrain only at every n-th increment of t.
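The question uses R's randomForest, but the rolling/backtesting scheme is easy to sketch in Python with scikit-learn (h, traindays and retrain_every are illustrative values):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def backtest(T, h=30, traindays=365, retrain_every=7):
    """Forecast each T[t] with a forest trained only on data before t, retraining every few steps."""
    preds, model = {}, None
    for i, t in enumerate(range(traindays, len(T))):
        if model is None or i % retrain_every == 0:
            past = T[t - traindays:t]                                   # strictly past data
            X = np.array([past[j - h:j] for j in range(h, len(past))])  # rows of h lagged values
            y = past[h:]
            model = RandomForestRegressor(n_estimators=200).fit(X, y)
        preds[t] = model.predict(T[t - h:t].reshape(1, -1))[0]          # forecast day t
    return preds
```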
Reading the sales file (the slice() calls below come from the dplyr package)
library(dplyr)
Sales<-read.csv("Sales.csv")
Finding the length of the training set
train_len=round(nrow(Sales)*0.8)
test_len=nrow(Sales)
Splitting your data into training and testing sets; here I have used an 80-20 split, which you can change. Make sure your data is sorted in ascending order by date.
Training set
training<-slice(Sales,1:train_len)
Testing set
testing<-slice(Sales,(train_len+1):test_len)
