How to identify whether a dataset is time series data? - machine-learning

I am working on an automated tool that has to support all kinds of data. Without looking at the data, are there any methods to identify whether a dataset is a time series? Are there any statistical tests?

Plot the dataset: if the data points are equally spaced in time, your data is time series data.
Examples: continuous monitoring of a person’s heart rate, hourly readings of air temperature, the daily closing price of a company's stock, monthly rainfall data, and yearly sales figures.
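A minimal sketch of how an automated tool might apply this spacing heuristic without anyone looking at a plot: check whether a column parses as dates/times and whether most consecutive gaps are equal. This is an illustrative heuristic, not a statistical test; the function name and the `min_share` threshold are hypothetical, and irregularly sampled time series would be rejected by it.

```python
import pandas as pd


def looks_like_regular_time_series(values, min_share=0.9):
    """Heuristic: True if `values` parse as datetimes and most gaps are equal.

    `min_share` (a hypothetical threshold) is the fraction of consecutive
    gaps that must equal the most common gap.
    """
    series = pd.Series(values)
    # Plain numbers would otherwise be parsed as nanoseconds since the epoch.
    if pd.api.types.is_numeric_dtype(series):
        return False
    try:
        stamps = pd.to_datetime(series, errors="raise")
    except (ValueError, TypeError):
        return False  # not parseable as dates/times at all
    gaps = stamps.sort_values().diff().dropna()
    if gaps.empty:
        return False
    most_common_gap = gaps.mode().iloc[0]
    if most_common_gap == pd.Timedelta(0):
        return False  # duplicated timestamps only
    return bool((gaps == most_common_gap).mean() >= min_share)


daily = pd.date_range("2024-01-01", periods=10, freq="D").strftime("%Y-%m-%d").tolist()
print(looks_like_regular_time_series(daily))
```

A column of daily date strings passes, while a column of fruit names or of plain integers does not.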

Related

Best algorithm for time series prediction?

I would like to ask for some suggestions about a time series prediction problem. In particular, I have to predict on a daily basis the total water demand in a certain area, creating a model based on 4 CSV files containing:
water demand in aggregated form (time series with daily granularity, 2 years data)
amount of water entering the area's cistern (time series with daily granularity, 2 years data)
amount of water leaving the area's cistern (time series with daily granularity, 2 years data)
water requests from 4,000 measurement points across the area (time series with daily granularity, 2 years data).
In your opinion, what is the best model for getting a good prediction of the water demand in the area, using the available data and features? I can only think of LSTMs or MLPs; I don't know if something like ARIMA (or SARIMA) could be useful in this case, given that I have many features but not many days.
Thank you in advance for your help :)
Forecasting is inevitably a domain-specific problem because you can often make better decisions about model and methods when you know something about the system or process you are trying to forecast.
There are quite a few academic papers on forecasting domestic water demand which you could look at if you have access:
E.g.
Demand Forecasting for Water Distribution Systems by Chen and Boccelli (2014)
Urban Water Demand Forecasting: Review of Methods and Models by Donkor et al. (2014)
Predicting water demand: a review of the methods employed and future possibilities by de Souza Groppo et al. (2019)
I'm not an expert in this domain, so you should probably wait for someone who is to answer the question, but I think using an autoregressive model (e.g. ARIMA), as you suggested, is a good start, because demand is essentially the result of aggregate human activity, which is inherently driven by daily/weekly routines and seasonal effects.
There are various routines for fitting such models to data. Jason Brownlee has a nice tutorial on this using Python's statsmodels.tsa package.
You could also look at what people have used for residential energy consumption forecasting, as that problem is probably very similar to water demand forecasting.

Storing any number series data in a time-series database

I would like to use the time-series database InfluxDB to store data points indexed by a number other than time (which every data point is normally stored against), so that I can take advantage of all its features for a series of data points indexed by this number.
For example, I have a rocket that makes multiple launches, with several sensors recording temperature, air pressure, fuel level, etc., and I want to graph these data points against elevation, not time.
I realise I could store elevation itself against time and then, from the time of, say, a temperature reading, work out the elevation and project the results, but that calculation would lose the performance benefit of simply querying data points indexed by elevation. Also, third-party tools that use the time-series database, e.g. Grafana, wouldn't be able to simply fetch these data points against elevation instead of time to graph them without me putting something in between to marry the data up.
One idea I had was to use a fake time where meters = seconds and store data against it. I would then need to combine that with something else to differentiate rocket launches, e.g. incrementing the year by 1 starting at year 0, so that every launch doesn't start at the same elevation and the "number series" can be separated from each other. I guess I would have that problem anyway, and the proper way to handle it would be through tags.
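A minimal sketch of the fake-time idea, assuming one fixed 365-day "year" of offset per launch and elevation rounded to whole meters (both hypothetical choices). A plain dict stands in for a single series that keeps one value per timestamp; it shows that a rocket passing the same elevation on the way up and on the way down produces the same pseudo timestamp.

```python
SECONDS_PER_LAUNCH = 365 * 24 * 3600  # hypothetical: one fake "year" per launch


def pseudo_timestamp(launch: int, elevation_m: float) -> int:
    """Encode (launch, elevation) as fake epoch seconds, with 1 m == 1 s."""
    return launch * SECONDS_PER_LAUNCH + round(elevation_m)


def store(readings):
    """Keep one value per pseudo timestamp, like a single series in a TSDB."""
    series, collisions = {}, []
    for launch, elevation_m, temperature in readings:
        ts = pseudo_timestamp(launch, elevation_m)
        if ts in series:
            collisions.append(ts)  # the earlier reading is silently replaced
        series[ts] = temperature
    return series, collisions


# Made-up readings (launch, elevation in m, temperature): launch 0 climbs to 300 m and comes back down.
readings = [(0, 0.0, 15.0), (0, 150.0, 12.1), (0, 300.0, 9.8),
            (0, 150.0, 11.7), (0, 0.0, 14.6)]
series, collisions = store(readings)
print(len(readings), len(series), collisions)  # 5 3 [150, 0]
```

Five readings collapse to three stored points, which is the kind of data loss the answer warns about.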
What makes you believe that this approach would be more efficient than storing the elevation jointly with your other sensor data? Fetching data is pretty cheap, so the performance gain might be very small compared to the added complexity of your keys. Not to mention that you would still need to make time part of your elevation timestamp; otherwise you will end up with duplicate pseudo-timestamps and therefore incomplete data, as most time series databases do not allow multiple values at the same timestamp for a given series.
I would encourage you to also have a look at other time series databases which include elevation as part of their standard data model. Check out Warp 10 for that matter (standard disclaimer: I am the co-founder of SenX, maker of Warp 10).
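A minimal sketch of the alternative the answer suggests: keep real time as the timestamp, store elevation as an ordinary field next to the other sensor values, and separate launches with a tag, written here as InfluxDB line protocol (`measurement,tags fields timestamp`, timestamp in nanoseconds). The measurement, tag, and field names are hypothetical, and the helper does no escaping, so names and tag values must not contain spaces, commas, or equals signs.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol (no escaping; float fields only)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tags else measurement
    return f"{head} {field_part} {timestamp_ns}"


# Hypothetical reading from launch 1: elevation is just another field.
line = to_line_protocol(
    "telemetry",
    {"launch": "1"},
    {"temperature": 12.1, "elevation": 150.0, "pressure": 101.2},
    1574726400000000000,
)
print(line)
# telemetry,launch=1 elevation=150.0,pressure=101.2,temperature=12.1 1574726400000000000
```

Graphing against elevation then means the plotting side uses the `elevation` field as the x-axis instead of time; no duplicate timestamps arise because real time stays in the key.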

Using features with different time granularities in an ANN model

I'm trying to build an ANN model to predict oil prices, and I have different economic features, each one a time series.
I intend to predict monthly prices, but some features are only available annually. Is it possible to use these features anyway to predict the monthly prices, or can I only use the monthly features?
Thank you and regards,
Gonzalma.
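The thread has no answer here; one common option (not from the original question) is to align the annual features to the monthly index by carrying each annual value forward, so every monthly row gets the most recent annual figure. A minimal pandas sketch, where the column names and all values are made up:

```python
import pandas as pd

# Made-up monthly target and annual feature.
monthly = pd.DataFrame(
    {"oil_price": [60.0, 61.5, 59.8, 62.3]},
    index=pd.date_range("2019-11-01", periods=4, freq="MS"),
)
annual = pd.DataFrame(
    {"gdp_growth": [2.3, 2.1]},
    index=pd.to_datetime(["2019-01-01", "2020-01-01"]),
)

# Carry each annual value forward onto every month until the next one.
annual_on_monthly = annual.reindex(monthly.index, method="ffill")
features = monthly.join(annual_on_monthly)
print(features)
```

One caveat: annual figures are usually published only after the year ends, so in a real model you would likely lag them (e.g. use the previous year's value) to avoid feeding the network information it would not have had at prediction time.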

Augmenting forecasts with knowledge of some future events

When using AWS Forecast, is there some way to augment our model with "partial future information" in order to improve forecasts?
I have been getting quite solid-looking predictions from AWS Forecast so far, but I suspect that I could improve the predictions substantially if I could provide some information about known future events.
I'm very new to forecasting and machine learning, and by "partial future information", I mean:
I am trying to predict how the time-series of variable X will behave in the future
I am training a model with past time-series information for many different variables, including X
I would like to also provide known future time-series information for a subset of these variables because 1) they should have a significant impact on predictions and 2) this would give me the ability to perform "what-if" analysis
To be more concrete:
I am trying to predict future revenue from past revenue, web traffic volume, advertising spending, and promotional discounts
AWS Forecast has been providing me with good forecasts so far (I hold back a number of months of known data from the model, and its predictions about the "future" match the known data quite well)
However, I would really like to also tell AWS Forecast about, for example, a significant advertising campaign that is planned for the near future
I would also really like to be able to vary some future variable or variables and see how they affect the outcome ("what if I spend $Z on advertising next month?")
Currently, I am providing all of our past revenue, web traffic volume, advertising spending, and promotional discount information to AWS Forecast as a "Target Time Series" in the form of a single CSV file with 3 columns (metric name, timestamp, metric value); approximately 15 distinct values of metric name; and roughly 11,000 total rows of data (about two years of daily values for 15 variables: ~2 * 365 * 15 ≈ 11,000 rows). Every metric is provided over the same time interval (for instance, all of the metrics are provided between 2017-10-01 and 2019-11-25).
I'd like to provide some additional, partial data that highlights known future significant events (spending on advertising, promotional discounts) to improve our predictions even further.
For example:
Revenue from 2017-10-01 to 2019-11-25
Web traffic from 2017-10-01 to 2019-11-25
Ad spend from 2017-10-01 to 2019-11-25
Promotional discounts from 2017-10-01 to 2019-11-25
plus planned ad spend for 2019-11-26 to 2020-02-01
Can someone please help me with some of the terminology and the "how-to" mechanics of this?
In general, to use a variable in your historical data, you need a forecast of it in the future as well. It would be like trying to forecast electrical usage and then putting historical temperatures in the data set. If you don't have a forecast of future temperatures, that information hasn't done you any good in improving your forecast. I now know the effect of "an extra degree of temperature on electrical usage", but what do I do with that if I have no idea what the temperature will be tomorrow?
In your case you have 1 metric you want to forecast (revenue) and three supporting pieces of data: traffic, ad spend, discount. It's great that you have future ad spend, but without the other two, you're a bit out of luck (per the prior paragraph).
However, you can still do something here; you'll just have to make some assumptions. What I would do is choose a fixed value for each of the missing variables and use it for all future dates. Perhaps appropriate values would be a discount of zero (full-price item) and web traffic of (I'm making this up) 1K per day. Now you have full data sets for past and future.
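A minimal pandas sketch of this fixed-value assumption: the planned ad spend is known for the future window, while discounts are set to zero and web traffic to a flat 1,000 per day. All numbers and column names are made up, the windows are shortened to one week for brevity, and this only prepares the data; it does not show how to load it into AWS Forecast.

```python
import pandas as pd

history = pd.date_range("2019-11-19", "2019-11-25", freq="D")
future = pd.date_range("2019-11-26", "2019-12-02", freq="D")

# Made-up history for the supporting variables (revenue omitted for brevity).
past = pd.DataFrame(
    {"web_traffic": [980, 1010, 995, 1200, 1150, 900, 870],
     "ad_spend": [50.0] * 7,
     "discount": [0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.0]},
    index=history,
)

# Known future: planned ad spend only (hypothetical numbers).
planned_ad_spend = pd.Series([80.0] * 7, index=future)

# Assumed future: no discounts and a flat 1K visitors per day.
future_covariates = pd.DataFrame(
    {"web_traffic": 1000, "ad_spend": planned_ad_spend, "discount": 0.0},
    index=future,
)

# Full past + future data set with no gaps.
covariates = pd.concat([past, future_covariates])
print(covariates.tail(3))
```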
With that set up you could now answer the question, albeit with a caveat. The forecast you get out is now saying...
Here's how much revenue we can expect given our planned ad spend, if we offer no discounts and we get 1K people to the website every day.
Perhaps you could improve that by inputting future traffic values that are the same as a year prior. In which case, you could now say ...
Here's how much revenue we can expect given our planned ad spend, if we offer no discounts and the website gets the same traffic as this time last year.
You can take that to variations such as "traffic goes up 10%" or you can take a guess at what the discounts will be or, like before, you could replicate your discounts and traffic from a year prior and say...
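A minimal sketch of the "same as last year" forecast and the "traffic goes up 10%" variation for the 2019-11-26 to 2020-02-01 window. It looks up values 364 days earlier (52 whole weeks) rather than exactly one calendar year, so weekdays stay aligned; that offset is a design choice of this sketch, and the traffic history is randomly generated stand-in data.

```python
import numpy as np
import pandas as pd

# Made-up daily web traffic for the year before the forecast window.
rng = np.random.default_rng(1)
past_days = pd.date_range("2018-11-26", "2019-11-25", freq="D")
traffic = pd.Series(rng.integers(800, 1200, len(past_days)), index=past_days)

future_days = pd.date_range("2019-11-26", "2020-02-01", freq="D")

# "Same as last year": take the value 364 days earlier (keeps weekdays aligned).
same_as_last_year = traffic.reindex(future_days - pd.Timedelta(days=364))
same_as_last_year.index = future_days

# Variation: "traffic goes up 10%".
plus_ten_percent = same_as_last_year * 1.10
print(plus_ten_percent.head())
```

The same pattern works for discounts, giving the "discounts just like last year" scenario.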
Here's how much revenue we can expect given our planned ad spend, if we offer discounts just like last year and see website traffic just like last year.
I suspect you get the idea, so I'll stop with the variations. These are, of course, really just future forecasts of those data; however, it's worth noting that "creating a forecast" of discounts or web traffic doesn't have to be complicated or fancy. "The same as last year" is a perfectly valid "forecast" of what's to come.

Types of noises when predicting a time series like electricity loads and exchange prices

In another question, I presented the methodology I developed to add noise to time series of electricity loads and exchange prices. With these noisy time series, I want to test how well my system copes with predicted time series (Function for testing system stability, which receives predicted time series as input).
RobertBaron pointed out that my methodology only adds "white noise" (i.e. independent, normally distributed noise) to a time series.
Hence the question:
What other types of noise are typically added when predicting electricity loads and exchange prices?
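The thread has no answer here. As one illustration (not from the original question), a frequently considered alternative to white noise is autocorrelated noise such as an AR(1) process, where each error carries over part of the previous one, so deviations persist for several steps, plausibly closer to how multi-step forecast errors behave. The sketch contrasts the two; the sinusoidal load, `sigma`, and `phi` are made up.

```python
import numpy as np


def white_noise(n, sigma, rng):
    """Independent Gaussian noise: no correlation between time steps."""
    return rng.normal(0.0, sigma, n)


def ar1_noise(n, sigma, phi, rng):
    """AR(1) ("red") noise: e[t] = phi * e[t-1] + innovation, so errors persist."""
    e = np.zeros(n)
    innovations = rng.normal(0.0, sigma, n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + innovations[t]
    return e


def lag1_autocorr(x):
    """Correlation between consecutive values."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(42)
n = 5000
load = 500 + 50 * np.sin(2 * np.pi * np.arange(n) / 24)  # made-up hourly load
white = white_noise(n, 10.0, rng)
red = ar1_noise(n, 10.0, 0.9, rng)
noisy_white = load + white
noisy_ar1 = load + red
print(round(lag1_autocorr(white), 2), round(lag1_autocorr(red), 2))
```

The white noise shows near-zero lag-1 autocorrelation, while the AR(1) noise shows a value close to `phi`.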
