Most organizational decisions are based upon a forecast. This is especially true in inventory control and scheduling, where forecasts are frequently generated for hundreds of items on a regular basis. Much of this process relies on computerized forecasting, which offers computational efficiency and ease of use. Forecast accuracy is typically judged by the standard forecast-error measures provided by most computer packages. The challenge for managers is to make good decisions based upon these forecasts and to be able to evaluate forecast performance expediently.
Unfortunately, there is little consensus among forecasters as to the best and most reliable forecast-error measures [1]. Complicating the issue is that different error measures often provide conflicting results [5]. Knowing which forecast-error measure to rely on can be difficult, yet it is extremely important. Each forecast-error measure provides unique information to the manager, and each has its shortcomings. Knowing when to rely on which measure can be highly beneficial.
UNDERSTANDING FORECAST ERROR MEASURES
Common Forecast Error Measures
Most forecast-error measures can be divided into two groups: standard and relative error measures [4]. Listed below are some of the more common forecast-error measures in these categories. Specific suggestions with regard to their use follow.
If Xt is the actual value for time period t and Ft is the forecast for period t, the forecast error for that period is the difference between the actual and the forecast:

et = Xt - Ft

When evaluating performance over multiple observations, say n, there will be n error terms, and we can define the standard forecast-error measures discussed below.

Standard versus Relative Forecast-Error Measures
Standard error measures, such as the mean error (ME) or the mean square error (MSE), express the error in the same units as the data. As such, the true magnitude of the error can be difficult for managers to comprehend. For example, a forecast error of 50 units carries a completely different level of gravity if the actual value for that period was 500 rather than 100 units. Also, a forecast error of $50 is vastly different from a forecast error of 50 cartons. In addition, having the error in the actual units of measurement makes it difficult to compare accuracy across time series or across different periods of time. In inventory control, for example, units of measure typically vary greatly between series: some series might be measured in dollars, others in pallets or boxes. When comparing accuracy between series, the results are either not meaningful or the series with large numbers dominate the comparison.

Relative error measures, which are unit-free, do not have these problems. They can be significantly better to use and make the quality of the forecast easier to grasp. Because managers readily understand percentages, communication is also made easier, and these measures make comparisons across different time series or different time intervals easy and meaningful. Relative errors include all percentage measures, including simply the error as a percentage of actual sales. These error measures are not without shortcomings, however. Because they are defined as ratios, problems arise in the computation when actual values are zero or close to zero. Simple ways of handling this problem are addressed further in this article. The mean absolute percentage error (MAPE) is one of the most popular of the relative error measures.

Error Measures Based on Absolute Values
Error measures that use absolute values, such as the mean absolute deviation (MAD), provide valuable information. Because absolute values are used, these measures do not have the problem of errors of opposite signs cancelling each other out. For example, a low mean error (ME) may mislead the manager into thinking that forecasts are doing well when, in fact, high forecasts and low forecasts are cancelling each other out. This problem is avoided with absolute error measures. Their shortcoming is that they assume a symmetrical loss function; that is, the organizational cost of overforecasting is assumed to be the same as that of underforecasting, and the two are summed together. The manager is given the total magnitude of the error but does not know its true bias or direction. When using MAD, it is therefore beneficial to also compute a measure of bias, such as the mean error (ME). ME, in contrast to MAD, indicates the direction of the error, that is, a tendency to over- or underforecast. It is very common for managers and sales personnel to produce forecasts biased in line with the organizational incentive system, such as when they are evaluated against a sales quota. This information is useful, as the forecast can then be adjusted for the bias. The two pieces of information, MAD and bias, complement each other and provide a more complete picture for the manager.
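As a concrete illustration, the short Python sketch below computes the measures just discussed (ME, MAD, MSE, and MAPE) for a small demand series; the data values are hypothetical and are not from the original article.

# Illustrative computation of common forecast-error measures
# on a small hypothetical demand series.
actuals   = [100, 120, 90, 110, 105, 95]
forecasts = [110, 100, 95, 120, 100, 90]

errors = [a - f for a, f in zip(actuals, forecasts)]   # et = Xt - Ft
n = len(errors)

me   = sum(errors) / n                                  # mean error (bias)
mad  = sum(abs(e) for e in errors) / n                  # mean absolute deviation
mse  = sum(e ** 2 for e in errors) / n                  # mean square error
mape = sum(abs(e) / a for e, a in zip(errors, actuals)) / n * 100   # percent

print(f"ME = {me:.2f}, MAD = {mad:.2f}, MSE = {mse:.2f}, MAPE = {mape:.2f}%")

Here the ME is close to zero while the MAD is sizable, which is exactly the cancelling-out pattern described above: the forecasts miss substantially in both directions, but the misses largely offset.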
Outliers
A frequent problem that can distort interpretation is outliers, that is, unusually high or low data points. Outliers can arise from mistakes in recording the data, from promotional events, or simply from random occurrences. They can make it particularly difficult to compare performance across series. Some error measures, such as the mean square error (MSE), are especially susceptible because squaring the error term can make overall error appear unusually high and so distort true forecast accuracy. One method that is relatively immune to outliers is to compare forecast performance against a comparison model that serves as a baseline. Another option is to base the error measure on the median, which favors the middle value over the highest and lowest values. Finally, extreme values can be replaced by certain limits; for example, all values less than 0.01 can be replaced by 0.01 [1]. The extreme values are thereby tempered, while some of the original information is retained. In any case, outliers should not be ignored, as traditional error measures may otherwise provide misleading results.
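To make these remedies concrete, the Python fragment below (again with hypothetical data) computes the ordinary MAPE alongside a median-based alternative, and applies a small floor of 0.01 to the denominator as one reading of the replacement rule cited above [1]. A single outlying period dominates the mean but not the median.

from statistics import mean, median

# Hypothetical series in which the final period is an outlier
# (for example, a recording error or a promotional event).
actuals   = [100, 105, 98, 102, 100, 10]
forecasts = [102, 103, 100, 100, 99, 100]

# Absolute percentage error per period, with actuals floored at 0.01
# so the ratio stays defined when a value is zero or near zero.
apes = [abs(a - f) / max(a, 0.01) * 100 for a, f in zip(actuals, forecasts)]

print(f"MAPE  (mean of APEs)   = {mean(apes):.1f}%")   # inflated by the outlier
print(f"MdAPE (median of APEs) = {median(apes):.1f}%") # robust to the outlier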
Zero Values
As noted earlier, relative error measures are computed as ratios, so they run into trouble when actual values are zero or close to zero: the percentage error is either undefined or becomes enormous. A simple remedy, as with outliers, is to replace values below a small limit, such as 0.01, with that limit [1].

USING SOME COMMON ERROR MEASURES
Mean Square Error
Mean square error (MSE) is an error measure that has particular benefits under certain circumstances. Squaring the error can be advantageous because errors are weighted by their magnitude: larger errors are given greater weight than smaller errors. This is quite beneficial when the cost function increases with the square of the error. In inventory control or production planning, for example, larger errors can create costly problems, and overforecasting can lead to higher production and inventory levels. In inventory control, MSE is popular because it can be directly tied to the variability of the forecast errors, which is important for calculating the safety stock needed to cover the variability of demand during the lead time. In general, MSE is a good error measure to use when large errors are costly and decision making is very conservative [2].
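To make the squaring effect concrete, the brief Python sketch below (with made-up error figures) compares two hypothetical forecasting methods that have the same MAD but very different MSEs because one of them makes a single large miss.

# Two hypothetical sets of forecast errors with the same total absolute error.
errors_steady = [5, -5, 5, -5, 5, -5]    # consistent small misses
errors_spiky  = [0, 0, 0, 0, 0, 30]      # one large miss

def mad(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e ** 2 for e in errors) / len(errors)

for name, errs in [("steady", errors_steady), ("spiky", errors_spiky)]:
    print(f"{name}: MAD = {mad(errs):.1f}, MSE = {mse(errs):.1f}")

# Both methods have a MAD of 5.0, but the MSEs are 25.0 and 150.0:
# squaring penalizes the single large error far more heavily, which is the
# behavior desired when large errors are disproportionately costly.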
The disadvantage of MSE is that it is inherently difficult to interpret. Using the root mean square error (RMSE), which is simply the square root of MSE, may therefore be preferred at times, as the error is then expressed in the same units as the data. Like MSE, RMSE penalizes errors according to their magnitude. Because neither MSE nor RMSE is unit-free, however, comparisons across series remain difficult.

Mean Absolute Deviation
The mean absolute deviation (MAD) is a simple error measure that reports the average magnitude of the error in the same units as the data. As discussed earlier, it assumes a symmetrical loss function and is therefore particularly useful when coupled with a measure of bias, such as the mean error (ME), so that the manager sees both the size and the direction of the error.

Mean Absolute Percentage Error
The mean absolute percentage error (MAPE) is one of the most popular relative error measures. Because it is unit-free and easily understood as a percentage, it allows convenient comparisons across time series, although, as noted earlier, care is needed when actual values are zero or close to zero.

Other Useful Error Measures
One useful way of evaluating forecast performance is to compare accuracy against a baseline forecast. A comparison between forecasts is quite simple to do. A forecasting technique that commonly serves as a baseline is the Naive model, which is nothing more than last period's actual serving as next period's forecast. The idea is that a chosen forecasting model must perform better than Naive in order to justify its use. The accuracy of multiple forecasting procedures can be compared with this baseline.
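As a small illustration (the demand figures are hypothetical), the Python snippet below constructs the Naive forecasts from a demand history: each period's forecast is simply the previous period's actual.

# Hypothetical demand history; the first period has no Naive forecast.
actuals = [100, 120, 90, 110, 105, 95]

# Naive model: last period's actual becomes next period's forecast.
naive_forecasts = actuals[:-1]    # forecasts for periods 2..n
naive_actuals   = actuals[1:]     # the actuals those forecasts are judged against

for t, (a, f) in enumerate(zip(naive_actuals, naive_forecasts), start=2):
    print(f"period {t}: actual = {a}, naive forecast = {f}, error = {a - f}")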
A particularly useful way to measure forecast performance is to compute the MAPEs for the forecasts being evaluated and for the Naive model. The difference between these MAPEs tells managers the improvement in forecast accuracy gained by using their forecasting method rather than last period's actual as the forecast. For example, if the MAPE for your forecasting model was 14% and the MAPE for Naive was 23%, using your model provided a nine-percentage-point improvement over simply using last period's actual.
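Continuing the earlier sketch (all figures hypothetical), the fragment below computes the MAPE of a candidate model and of the Naive model over the same periods and reports the improvement in percentage points.

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    terms = [abs(a - f) / a for a, f in zip(actuals, forecasts)]
    return sum(terms) / len(terms) * 100

# Hypothetical data for periods 2..n, so the Naive forecast is defined everywhere.
actuals         = [120, 90, 110, 105, 95]
model_forecasts = [115, 95, 105, 100, 98]    # the model being evaluated
naive_forecasts = [100, 120, 90, 110, 105]   # previous period's actual

model_mape = mape(actuals, model_forecasts)
naive_mape = mape(actuals, naive_forecasts)
print(f"model MAPE = {model_mape:.1f}%, Naive MAPE = {naive_mape:.1f}%")
print(f"improvement over Naive = {naive_mape - model_mape:.1f} percentage points")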
One statistic that performs an automatic comparison against the Naive model, in a slightly more complex way, is Theil's U statistic [4]. This statistic compares the forecast accuracy of your model with what forecast accuracy would have been had the Naive model been used instead, providing an automatic and convenient comparison of forecasting methods. The actual computation of Theil's U statistic can be difficult to explain, and this has hindered its widespread use. Regardless of this complexity, however, managers can use the statistic effectively by bearing a few things in mind. First, a precise understanding of how Theil's U statistic is computed is not essential. What matters is knowing that the statistic provides an automatic comparison with the Naive model, with results that fall into easily interpreted ranges. Simply put, if the Naive method is as good as the forecasting model being evaluated, Theil's U statistic will be equal to 1. If Theil's U statistic is less than 1, your forecasting technique is better than Naive. If Theil's U statistic is greater than 1, there is no point in using your model, as Naive is better. Most statistical and forecasting software packages provide Theil's U statistic, and managers would be wise at least to take a look at it.
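As an illustrative sketch, the fragment below computes Theil's U for the hypothetical series used earlier, using one common formulation of the statistic (the relative-change form given in Makridakis, Wheelwright, and McGee [4]); if a software package uses a different variant, the exact value may differ, but the interpretation around 1 is the same.

from math import sqrt

# Hypothetical series: actuals for periods 1..n and model forecasts for periods 2..n.
actuals         = [100, 120, 90, 110, 105, 95]
model_forecasts = [None, 115, 95, 105, 100, 98]   # no model forecast for period 1

# Theil's U (one common formulation [4]): the model's squared relative errors
# divided by those of the Naive "no change" forecast, then square-rooted.
num = 0.0   # squared relative errors of the model
den = 0.0   # squared relative errors of the Naive model
for t in range(len(actuals) - 1):
    num += ((model_forecasts[t + 1] - actuals[t + 1]) / actuals[t]) ** 2
    den += ((actuals[t + 1] - actuals[t]) / actuals[t]) ** 2

theils_u = sqrt(num / den)
print(f"Theil's U = {theils_u:.2f}")   # less than 1: the model beats Naive

Here the value of roughly 0.25 indicates that the model's errors are only about a quarter of what the Naive model's would have been, consistent with the MAPE comparison above.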
CONCLUSION
Evaluating forecast model performance is not a simple task. It can be made easier if managers keep in mind the differences between standard and relative error measures, as well as the distortions that can occur due to outliers and zero values. MSE can be an excellent error measure in environments where large errors are much costlier than small errors, though it can be difficult to interpret and does not allow comparisons across series. MAD is a simple error measure that is particularly useful when coupled with a measure of bias. Like MSE, it is a good choice when variability is important, such as in inventory control and safety stock calculations. MAPE is frequently the preferred choice among forecasters because of its ease of understanding and because it allows convenient comparisons across time series. Finally, a frequently overlooked method is to compare forecast performance against a baseline forecast, such as the Naive model. This can provide an effective yet simple indication of forecast accuracy.
REFERENCES
1. Armstrong, J. S., and F. Collopy. "Error Measures for Generalizing about Forecasting Methods: Empirical Comparisons with Discussion." International Journal of Forecasting 8 (1992): 69-80.
2. Flores, B. E. "A Pragmatic View of Accuracy Measurement in Forecasting." OMEGA 14, no. 2 (1986): 93-98.
3. Makridakis, S. "Accuracy Measures: Theoretical and Practical Concerns." International Journal of Forecasting 9, no. 4 (1993): 527-529.
4. Makridakis, S., S. C. Wheelwright, and V. E. McGee. Forecasting Methods and Applications. New York: John Wiley & Sons, 1983.
5. Sanders, N. R. "The Dollar Considerations of Forecasting With Technique Combinations." Production and Inventory Management Journal 33, no. 2 (1992): 47-49.
6. Smith, B. Focus Forecasting: Computer Techniques for Inventory Control. Boston: CBI Publishing, 1984.
About the Author
NADA R. SANDERS, PhD, received her PhD from the Ohio State University in operations management and is currently associate professor of operations management at Wright State University. Her publications have appeared in Decision Sciences, Journal of Operations Management, OMEGA, Journal of Behavioral Decision Making and OM Review. She is a member of APICS, Decision Sciences, and the International Institute of Forecasters.
Copyright American Production & Inventory Control Society, Inc. First Quarter 1997