Time Series Forecasting - Part 1
This is Part 1 in a series on time series forecasting - The full series is Part 1, Part 2, and Part 3.
Time series forecasting is supported in the Oracle Database by the Oracle OLAP FORECAST command and by Oracle Data Mining (ODM). The FORECAST command can be used to forecast data by one of three methods: straight-line trend, exponential growth, or Holt-Winters extrapolation. FORECAST performs the calculation according to the selected method and optionally stores the result in a variable in your analytic workspace. The first two methods are simple extrapolation techniques. The Holt-Winters forecasting method is more sophisticated. It is a type of exponential smoothing, or moving average, technique. The Holt-Winters method constructs three statistically related series, which are used to make the actual forecast. These series are:
- The smoothed data series, which is the original data with seasonal effects and random error removed
- The seasonal index series, which is the seasonal effect for each period. A value greater than one represents a seasonal increase in the data for that period, and a value less than one is a seasonal decrease in the data
- The trend series, which is the change in the data for each period with the seasonal effects and random error removed
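To make the three series concrete, here is a minimal Python sketch of the textbook multiplicative Holt-Winters recursion. It is not the FORECAST command's implementation: the function name, the smoothing parameters alpha, beta, and gamma, and the simple initialization from the first two seasons are all assumptions made for illustration. It also keeps only the latest value of the smoothed level and trend rather than the full series, and it assumes strictly positive data.

```python
def holt_winters_multiplicative(y, period, alpha=0.5, beta=0.1, gamma=0.1, horizon=1):
    """Textbook multiplicative Holt-Winters (illustrative sketch only).

    Returns (forecasts, level, trend, seasonal), where seasonal holds one
    index per position in the season (values > 1 mean a seasonal increase).
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Assumed initialization: level and trend from the first two season means.
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    seasonal = [y[i] / first for i in range(period)]

    for t in range(period, len(y)):
        s = seasonal[t % period]
        prev_level = level
        # Smoothed series: data with the seasonal effect divided out.
        level = alpha * (y[t] / s) + (1 - alpha) * (prev_level + trend)
        # Trend series: smoothed change in the level per period.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal index series: ratio of the data to the smoothed level.
        seasonal[t % period] = gamma * (y[t] / level) + (1 - gamma) * s

    n = len(y)
    forecasts = [
        (level + h * trend) * seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]
    return forecasts, level, trend, seasonal


# Made-up series with a pure two-period seasonal pattern and no trend.
forecasts, level, trend, seasonal = holt_winters_multiplicative(
    [10, 20, 10, 20, 10, 20], period=2, horizon=2
)
```

For this made-up series the level stays near 15, the trend stays near 0, and the seasonal indexes settle near 2/3 and 4/3, so the two forecasts come out close to 10 and 20.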
ODM, through its Support Vector Machine (SVM) regression functionality, provides a powerful non-linear technique for time series forecasting that can include other variables besides the series itself and can capture complex relationships. The rest of this post covers the data mining approach to time series modeling. The next post gives an example of time series forecasting using ODM and the approach described below.
Data Mining Approach
ODM SVM regression supports modeling of time series via a time delay or lag space approach. This approach is also called "state-space reconstruction" in the physics community and "tapped delay line" in the engineering community. In its simplest form, past values of the target (the time series we want to forecast) are used as inputs (predictors) to the model. These inputs are called lagged variables and can be easily computed using the SQL LAG analytic function. Other attributes that are also relevant for forecasting the series can be added in the same fashion. Suppose we are trying to forecast the maximum daily electrical load based on electrical load values and average daily temperatures. Following the above approach, for a given date, we could use, for example, the load values and the average daily temperatures for the previous two days as inputs. This is illustrated in the table below where the lagged values are computed using the SQL LAG analytic function. The data shows maximum load values (Y) and average daily temperatures (X) for 10 days.
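The construction of the lagged inputs can be sketched in a few lines of Python. In the database the same columns would come from expressions such as LAG(Y, 1) OVER (ORDER BY day). The function name, the dictionary layout, and the numbers below are made up for illustration and are not the values from the original table.

```python
def build_lagged_rows(y, x, n_lags=2):
    """Build one row per time step with the current and lagged values.

    Mirrors SQL LAG: lags that reach before the start of the series are None
    (the SQL function would return NULL there).
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows = []
    for t in range(len(y)):
        row = {"Y": y[t], "X": x[t]}
        for k in range(1, n_lags + 1):
            row[f"LAG(Y,{k})"] = y[t - k] if t >= k else None
            row[f"LAG(X,{k})"] = x[t - k] if t >= k else None
        rows.append(row)
    return rows


# Hypothetical load (Y) and temperature (X) values for four days.
rows = build_lagged_rows([50, 52, 55, 53], [20, 22, 25, 23])
for row in rows:
    print(row)
```

The first two rows contain None for some lags, so they would normally be dropped before training.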
In some cases, the values for auxiliary attributes (X), like the average temperature in the example above, would not be known at the time we are trying to forecast the target (Y) and would therefore not be included among the inputs. However, we would still be able to use the lagged values of X. Once the attributes have been selected, we can train the SVM model with the target (Y) and the predictor attributes. In the example above the predictors are: LAG(Y,1), LAG(Y,2), X, LAG(X,1), LAG(X,2). The data would be split into training and test data sets. Usually one would train on earlier dates and test on later ones. Alternatively, for one-step-ahead forecast testing (more on this below), the training and test data sets can be randomly selected from all available data. An SVM regression model whose only inputs are lagged target (Y) values is called an "autoregressive model." The input space that includes all of the lagged variables is called the "embedding space."
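The "train on earlier dates, test on later ones" split might look like the following Python sketch. It assumes the rows are dictionaries already sorted by date, with None marking lags that fall before the start of the series. The function name and the train_fraction parameter are hypothetical.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split date-ordered rows into an earlier training set and a later test set.

    Rows with a missing (None) lagged value are dropped first, since they
    cannot be used as complete training examples.
    """
    complete = [r for r in rows if all(v is not None for v in r.values())]
    cut = int(len(complete) * train_fraction)
    return complete[:cut], complete[cut:]


# Hypothetical autoregressive rows: the first row has no LAG(Y,1) value.
rows = [{"Y": i, "LAG(Y,1)": (i - 1 if i >= 1 else None)} for i in range(9)]
train, test = chronological_split(rows)
print(len(train), len(test))
```

Keeping the order intact means every test row is later in time than every training row, which is closer to how the model would be used in practice than a random split.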
Things get a bit more complicated if the data rows in the time series are not equally spaced, that is, the time interval between observations is not the same. One approach is to use a smoothing technique to compute values for the attributes at equally spaced time intervals, and then use the interpolated values for training instead of the original data.
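As a rough illustration of resampling onto an equally spaced grid, the Python sketch below uses plain linear interpolation between neighboring observations. This is a deliberately simple stand-in for whatever smoothing technique is actually chosen. The function name and the numeric time axis are assumptions.

```python
def resample_linear(times, values, step):
    """Linearly interpolate (times, values) onto an equally spaced grid.

    times must be strictly increasing numbers (for example, days since start).
    Returns (grid_times, grid_values).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching observations")
    grid, out = [], []
    i = 0  # index of the observation at or before the current grid point
    k = 0
    while times[0] + k * step <= times[-1]:
        t = times[0] + k * step
        # Advance to the pair of observations that brackets t.
        while times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[i] + w * (values[i + 1] - values[i]))
        grid.append(t)
        k += 1
    return grid, out


# Made-up series with a missing observation at time 2.
grid, vals = resample_linear([0, 1, 3, 4], [0.0, 10.0, 30.0, 40.0], step=1)
print(grid, vals)
```

The lagged variables would then be computed from the resampled values rather than from the original rows.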
When modeling time series, following the above approach, it is necessary to make decisions regarding:
- Trend removal
- Target transformation
- Lagged attribute selection
Trend Removal
For the above time delay approach to be effective, the time series should be stationary. This means that the statistical distribution of the time series values is the same across time intervals. In particular, a stationary time series does not have a trend. In practice, many time series exhibit trend, that is, the series values tend to go up, or tend to go down, over time. For example, many financial indicators, such as stock prices, usually go up over time. A trend should therefore be removed before modeling. The simplest method for doing so is called differencing, which is the standard statistical method for handling nondeterministic (stochastic) trends. In this case, instead of using Y (the time series value) as a target, we use the difference D = Y-LAG(Y,1) for the target. The same applies to the target lagged values. For example, instead of using LAG(Y,1) as a predictor we would use LAG(Y,1)-LAG(Y,2). Sometimes it is necessary to compute differences of differences. At apply time the differencing of the target can be reversed to obtain forecasts for the original series.
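Differencing and its reversal at apply time can be sketched as follows in Python. The function names are hypothetical and the values are made up.

```python
def difference(y):
    """Return D_t = Y_t - Y_(t-1); the result is one element shorter than y."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def undifference(last_actual, predicted_diffs):
    """Turn forecast differences back into forecasts of the original series.

    last_actual is the last known value of Y before the forecast period.
    """
    forecasts = []
    level = last_actual
    for d in predicted_diffs:
        level += d
        forecasts.append(level)
    return forecasts


y = [100, 103, 107, 112]           # made-up trending series
d = difference(y)                  # [3, 4, 5]
restored = undifference(y[0], d)   # [103, 107, 112]
```

Applying difference twice gives the differences of differences, and two passes of undifference reverse it.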
Target Transformation
It is usually useful to normalize the target for SVM regression. This helps speed up the algorithm convergence. For time series problems, the target should be normalized prior to the creation of the lagged variables.
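The post does not say which normalization to use. The Python sketch below assumes z-score normalization and hypothetical function names. The point it illustrates is the ordering: because the target is normalized first, the lagged variables built from it are automatically on the same scale, and forecasts are mapped back with the same two parameters.

```python
from statistics import mean, pstdev


def fit_zscore(y):
    """Compute the normalization parameters (mean, standard deviation)."""
    mu, sigma = mean(y), pstdev(y)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return mu, sigma


def normalize(y, mu, sigma):
    return [(v - mu) / sigma for v in y]


def denormalize(z, mu, sigma):
    """Map model outputs back to the original units of the target."""
    return [v * sigma + mu for v in z]


y = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up target values
mu, sigma = fit_zscore(y)        # mean 5, standard deviation 2
z = normalize(y, mu, sigma)      # normalize first ...
lag1 = [None] + z[:-1]           # ... then build the lagged variable from z
```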
Lagged Attribute Selection
You can select the lags either by analyzing the data (computing the correlogram and cross-correlations) or by choosing a window size. For example, if we use a window of size 2 we would include LAG(Y,1) and LAG(Y,2) as predictors, where Y is the target attribute. Some care is needed in choosing the window size. The window size directly affects the pattern recognition capability of the SVM algorithm. It limits the size of the patterns that can be recognized. If the window is too small we might not have enough information to capture the dynamics of the system underlying the time series data. Different patterns may look the same as only a small fraction of the pattern is revealed by the lagged attributes. If the window is too large, the extra lagged attributes will add noise and make the problem harder to solve.
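A correlogram is simply the autocorrelation of the series at each lag. The Python sketch below computes the usual sample estimate. The function name is hypothetical, and how large a correlation has to be before a lag is kept is left as a judgment call.

```python
def autocorrelation(y, max_lag):
    """Sample autocorrelation of y at lags 1..max_lag (the correlogram)."""
    n = len(y)
    mu = sum(y) / n
    denom = sum((v - mu) ** 2 for v in y)
    if denom == 0:
        raise ValueError("series is constant")
    return [
        sum((y[t] - mu) * (y[t - k] - mu) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Made-up alternating series: strongly negative at lag 1, positive at lag 2.
acf = autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
print(acf)  # [-0.875, 0.75]
```

Lags with correlations close to zero are candidates for leaving out of the window. Cross-correlations between Y and a lagged auxiliary attribute X can be examined in the same way.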
There are several different ways to compute forecasts. The two most commonly used strategies are: one-step-ahead (open-loop) forecasting and multi-step (closed-loop) forecasting.
Single-Step or Open-Loop Forecasting
This strategy requires all the input values to the model to be available. If the previous value of the target is included in the model we can only make forecasts for the next time interval, hence the single-step name. For the load forecast example above, we would only be able to forecast the load for one day in the future (Day 11). In order to compute the forecast for Day 12 (Y_12), we would need to wait until the actual values for Day 11 were available. In other words, let's say that we have trained an SVM regression model with target Y and inputs LAG(Y,1) and LAG(Y,2). Let the output (predicted value) computed by the model be designated by P(.,.). Then:
- forecast Y_11 as P(Y_10,Y_9)
- forecast Y_12 as P(Y_11,Y_10)
- and so on
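In Python, the open-loop scheme above might be sketched like this. The predict argument stands for the trained model P(.,.). Here it is replaced by a toy linear extrapolation, which is purely a placeholder and not an SVM. The function name is hypothetical.

```python
def one_step_forecasts(predict, actuals, n_lags=2):
    """Open-loop forecasts: every input is an actual, already observed value.

    Returns forecasts for positions n_lags .. len(actuals); the last one is
    the forecast for the first time step that has not been observed yet.
    """
    forecasts = []
    for t in range(n_lags, len(actuals) + 1):
        # Most recent value first, matching P(LAG(Y,1), LAG(Y,2)).
        inputs = [actuals[t - k] for k in range(1, n_lags + 1)]
        forecasts.append(predict(*inputs))
    return forecasts


def toy_model(y1, y2):
    """Placeholder for a trained model: continue the last observed change."""
    return 2 * y1 - y2


print(one_step_forecasts(toy_model, [1, 2, 3, 5]))  # [3, 4, 7]
```

Note that forecast errors do not feed into later forecasts: the forecast of 4 for the fourth value was wrong (the actual value was 5), yet the next forecast still uses the actual 5 as input.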
Multi-step or Closed-Loop Forecasting
This strategy uses actual values for the inputs when available and estimated or predicted values when the actual values are not available. Let's say that we have trained an SVM regression model with target Y and inputs LAG(Y,1) and LAG(Y,2). Again, let the output (predicted value) computed by the model be designated by P(.,.). Then:
- forecast Y_11 as P_11 = P(Y_10,Y_9)
- forecast Y_12 as P_12 = P(P_11,Y_10)
- and so on
Multi-step forecasts can be computed using a simple PL/SQL procedure. This is illustrated in Part 3 of this series.
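Before getting to the PL/SQL version, the closed-loop idea can be sketched in Python as a sliding window that is fed its own predictions. As before, the function names are hypothetical and the toy model is only a placeholder for a trained SVM regression model.

```python
def multi_step_forecasts(predict, history, horizon, n_lags=2):
    """Closed-loop forecasts: predictions replace actuals once actuals run out."""
    if len(history) < n_lags:
        raise ValueError("history is shorter than the lag window")
    window = list(history[-n_lags:])  # oldest value first
    forecasts = []
    for _ in range(horizon):
        # Most recent value first, matching P(LAG(Y,1), LAG(Y,2)).
        p = predict(*reversed(window))
        forecasts.append(p)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [p]
    return forecasts


def toy_model(y1, y2):
    """Placeholder for a trained model: continue the last observed change."""
    return 2 * y1 - y2


print(multi_step_forecasts(toy_model, [1, 2, 3], horizon=3))  # [4, 5, 6]
```

Because each prediction becomes an input to the next one, errors can accumulate as the horizon grows, which is the main trade-off compared with single-step forecasting.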
Comparison with Traditional Time Series Techniques
SVM regression offers the same benefits for time series forecasting as feedforward neural networks, but with simpler training. The advantages of using such models include:
- The ability to model very complex functions
- The ability to use a large number of variables in the model and to include other data (e.g., fundamental and technical factors) in addition to lagged time series data