Abstract
To meet the foreign tourist arrival target set by the Indonesian government for 2017, the number of foreign tourist arrivals needs to be forecast. As a major tourist destination in Indonesia, Bali plays an important role in reaching that target. The characteristics of the tourist arrivals data call for a more flexible forecasting technique, and we propose the Support Vector Machine (SVM) for this purpose. In addition, the effects of noise components have to be filtered out, a task for which Singular Spectrum Analysis (SSA) is well suited. We therefore combine the two methods (SSA-SVM) to predict the number of foreign tourist arrivals to Bali in 2017. The performance of SSA-SVM is evaluated through simulation studies and then applied to tourist arrivals data for Bali. The results show that SSA-SVM performs better than the other methods considered.
Keywords: Foreign tourist, Singular spectrum analysis, Support vector machine
(ProQuest: ... denotes formulae omitted.)
1. Introduction
Tourism is a leading sector in Indonesia's economic growth. In 2014 the tourism sector contributed 3.2 percent of national GDP and provided 3,326,000 jobs, or 2.9 percent of total employment [1]. For this reason, the National Medium Term Development Plan (RPJM) 2015-2019 treats tourism as a priority sector [2]. Bali is a major tourist destination in Indonesia. Its natural beauty and cultural richness are the main attractions for both domestic and foreign tourists. Since 2015 in particular, many policies have supported tourism in Bali, such as international events, visa-free entry, and the opening of new flight routes from the countries that send the most tourists to Bali, such as Australia and China. According to data released by Statistics Indonesia (BPS), foreign tourist arrivals to Bali in 2016 accounted for 44.88 percent of the total [3]. For 2017, the Indonesian government set a target of 15 million foreign tourists [4]. Achieving this target requires proper planning, and preparing the plan requires an estimate of the number of foreign tourists.
A common method for forecasting the number of foreign tourist arrivals to Bali is the unweighted moving average (MA) [5]. It is a simple, conventional time series forecasting technique, and it performs poorly on data containing a trend, especially a nonlinear trend [6,7]. One forecasting method that can overcome the limitations of MA is the Support Vector Machine (SVM), a machine learning method introduced by Vapnik [8]. SVM can be applied to nonstationary and nonlinear data [9] because it applies the principle of Structural Risk Minimization (SRM) and uses the ε-insensitive loss function.
Incidental events or changes in government policy disturb the data in ways that can be categorized as noise. Noise tends to be irregular and unpredictable, and it often makes the forecasts of the fitted model less accurate. Reducing the influence of noise components when building the model can therefore be expected to improve forecasting accuracy. The first step is to decompose the time series into several components, which can be done with Singular Spectrum Analysis (SSA) [10]. SSA has been widely applied by other authors [11-13].
In this research, we combine SSA with SVM (SSA-SVM) to predict the number of foreign tourist arrivals to Bali in 2017. To assess how much SSA-SVM improves forecasting accuracy, its performance is compared with that of MA, SSA, and SVM.
2. Research Method
2.1. The Foreign Tourist
A foreign tourist is any person who visits a country other than his or her country of residence, for one or several purposes, and stays for no more than twelve (12) months [3]. This definition covers two categories of foreign guests, namely tourists and travelers. A tourist is a visitor as defined above who stays at least twenty-four (24) hours and no more than twelve (12) months in the place visited, for personal, business, or professional purposes. A traveler stays less than twenty-four (24) hours (this includes cruise passengers, i.e. visitors arriving in a country by ship or train who do not stay in the country concerned). In this study, the number of foreign tourist arrivals refers to the number of foreign tourists arriving through Ngurah Rai Airport, Bali.
2.2. MA
The MA method is a simple forecasting method. It predicts the value for a given period by averaging the κ most recent observed values [6]. The accuracy of the method therefore depends on choosing an appropriate value of κ. Mathematically, the method can be expressed as follows:
... (1)
where y is the actual value, ŷ is the forecast value, t is time, and κ is the number of periods averaged.
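The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the forecast for the next period is the plain mean of the last κ observations; the function name and the sample numbers are hypothetical and do not come from the study.

```python
def moving_average_forecast(series, k):
    """Forecast the next value as the unweighted mean of the last k observations."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    window = series[-k:]          # the k most recent observations
    return sum(window) / k


# Hypothetical arrivals series (arbitrary units), forecast with k = 3.
history = [10.0, 12.0, 14.0, 16.0]
next_value = moving_average_forecast(history, k=3)
```

A larger κ smooths more strongly but reacts more slowly to a trend, which is one reason the choice of κ matters.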
2.3. SVM
Consider (x_1, y_1), ..., (x_l, y_l), where x_i ∈ R^n is the input vector, y_i ∈ R is the corresponding output value, and l is the number of observations. In the regression context, we consider the following model:
...
where ε is a tolerated error and f(x) is an unknown function, which can be formulated as follows:
... (2)
Φ(x) is a nonlinear function that maps x into a high-dimensional space, h is a weight vector, and b is a bias. The estimator of f(x) is obtained by minimizing the risk R(f) using Structural Risk Minimization (SRM), defined by:
... (3)
In equation 3, the first term ... is the empirical error and the second term ... is the regularization term. To obtain sparse solutions, Vapnik [8] introduced the ε-insensitive loss function, as follows:
... (4)
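As an illustration, and assuming the usual form of the ε-insensitive loss, residuals smaller than ε cost nothing and larger residuals are penalized linearly by the excess. The Python sketch below uses a hypothetical function name and made-up numbers.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    residual = abs(y_true - y_pred)
    return max(0.0, residual - epsilon)


# A prediction inside the epsilon tube is not penalized ...
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)
# ... while one outside the tube is penalized by the excess only.
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

Because points inside the tube contribute no loss, they do not become support vectors, which is the source of the sparsity mentioned above.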
The best estimator of f(x) can then be obtained by minimizing the following objective function:
... (5)
... (6)
C is a pre-specified value that controls the balance between the empirical error and the regularization term. α_i and α_i* are the Lagrange multipliers associated with the support vector x_i, and K(x_i, x) is the kernel function. The kernel function employed in this study is the Gaussian Radial Basis Function (RBF), because it has been reported to perform better than other kernel functions [14-15]. The RBF is formulated as:
... (7)
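Since the formula is not reproduced in this copy, the parameterization in the sketch below is an assumption: it follows the form exp(-σ‖x - z‖²) used by the `rbfdot` kernel in kernlab, whereas another common convention is exp(-‖x - z‖²/(2σ²)). The function name is hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel, assuming the form exp(-sigma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sigma * squared_distance)


# Identical points give the maximum similarity of 1; similarity decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)
apart = rbf_kernel([0.0], [1.0], sigma=1.0)
```

Under this convention a larger σ makes the kernel narrower, so the fitted function can vary more sharply between neighbouring points.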
There are three parameters (C, ε, σ) to be optimized, and we optimize them with a grid search, a technique that has been reported to improve accuracy significantly [16]. Detailed information about SVM can be found in Vapnik [8]. For our time series data, the SVM input x consists of the lagged values x_t, x_{t-1}, x_{t-2}, ... and the output y is replaced by x_{t+1}.
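Turning a series into input/output pairs of this kind can be sketched as follows. The number of lags is not stated in the text, so `n_lags` is a free, hypothetical parameter here, as is the function name.

```python
def make_lagged_pairs(series, n_lags):
    """Build (x_t, x_{t-1}, ..., x_{t-n_lags+1}) -> x_{t+1} training pairs."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - 1):
        window = series[t - n_lags + 1 : t + 1]
        inputs.append(window[::-1])      # most recent value first
        targets.append(series[t + 1])    # one-step-ahead target
    return inputs, targets


# Toy series with two lags: ([2, 1] -> 3), ([3, 2] -> 4), ([4, 3] -> 5).
X, y = make_lagged_pairs([1, 2, 3, 4, 5], n_lags=2)
```

The resulting pairs can be passed to any regression learner, including a support vector regression routine.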
2.4. SSA
SSA consists of four steps, namely embedding, singular value decomposition (SVD), grouping, and diagonal averaging. Embedding and SVD form the decomposition stage, while grouping and diagonal averaging form the reconstruction stage [10]. In the embedding step, a one-dimensional time series x_1, ..., x_N is replaced by a new multidimensional series, ... where ... The value of L lies between ... and L is commonly known as the window length. An appropriate value of L is obtained from an optimization process, and K = N - L + 1. The new series can be arranged into a matrix known as the trajectory matrix, as follows:
... (8)
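The embedding step can be illustrated with a short Python sketch. It assumes the standard Hankel layout in which column j holds the lagged vector (x_j, ..., x_{j+L-1}), giving an L × K matrix with K = N - L + 1; the function name is hypothetical.

```python
def trajectory_matrix(series, window_length):
    """Return the L x K trajectory (Hankel) matrix as a list of rows."""
    n = len(series)
    if not 1 < window_length < n:
        raise ValueError("window_length must satisfy 1 < L < N")
    k = n - window_length + 1            # number of lagged vectors (columns)
    return [[series[i + j] for j in range(k)] for i in range(window_length)]


# N = 5 and L = 3 give K = 3; every anti-diagonal holds a single repeated value.
X = trajectory_matrix([1, 2, 3, 4, 5], window_length=3)
```

The repeated values along the anti-diagonals are what diagonal averaging later exploits to map a matrix back to a series.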
The SVD is then applied to the trajectory matrix X. Let S = XX^T. From S we obtain the eigenvalues (λ_1, ..., λ_L) in decreasing order of magnitude, λ_1 ≥ ... ≥ λ_L ≥ 0, and the corresponding eigenvectors (u_1, ..., u_L). Let d denote the rank of X, d = max{i : λ_i > 0} (for real-life series we usually have d = L* with L* = min{L, K}). We also obtain v_i = X^T u_i / √λ_i for i = 1, ..., d, so that X can be written as follows:
... (9)
X_i is called an elementary matrix. The elementary matrices of the decomposed X are then grouped into m disjoint subsets (I_1, ..., I_m). If I = {i_1, ..., i_p}, the resultant matrix X_I corresponding to the group I is defined as X_I = X_{i_1} + ... + X_{i_p}. Computing X_I for each group I = I_1, ..., I_m, the trajectory matrix X in equation 8 can be expressed as:
... (10)
The grouping is based on the plots of the eigenvectors (u) and the cumulative ratio of the eigenvalues (λ) [10]. The eigenvector plots show the characteristics of each elementary matrix, and the cumulative ratio of the eigenvalues shows how much of the trajectory matrix is explained by the elementary matrices included in the grouping. Each grouped matrix in (10) is then transformed into a new time series of length N, so that each component obtained from the grouping is represented by a single series. Let Y be an L × K matrix obtained from equation 10, with elements ... otherwise. Diagonal averaging then converts Y into the series y_1, ..., y_N, formulated as follows:
... (11)
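Diagonal averaging can be sketched in Python as below, assuming the standard definition in which the k-th value of the series is the mean of all matrix entries whose row and column indices sum to the same constant. The function name is hypothetical.

```python
def diagonal_averaging(matrix):
    """Average each anti-diagonal of an L x K matrix to get a series of length L + K - 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    length = n_rows + n_cols - 1
    sums = [0.0] * length
    counts = [0] * length
    for i in range(n_rows):
        for j in range(n_cols):
            sums[i + j] += matrix[i][j]   # entries with equal i + j share a time index
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


# Applied to an exact Hankel matrix, the original series is recovered.
series = diagonal_averaging([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For a grouped matrix that is not exactly Hankel, the same averaging yields the closest series representation of that component.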
2.5. SSA-SVM
Irregular and unpredictable noise components often lead to overfitting or underfitting. Therefore, following Wang et al. [17], the noise component is first filtered out with SSA before the time series is analyzed with SVM, and the series is then described by its trend and oscillation components to improve forecasting accuracy. The data are thus separated into three groups: trend, oscillation, and noise. The SSA-SVM procedure is illustrated in Figure 1.
2.6. Grid Search
Grid search is a common method for obtaining the optimal SVM parameters (C, ε, σ) and the SSA window length (L). We build a grid of parameter points over a chosen range [18] and select the optimal parameters using cross-validation for time series data [19]. The best method is the one with the smallest Mean Absolute Percentage Error (MAPE):
... (12)
where y_t and ŷ_t are the actual and forecast values, respectively, and n is the number of observations.
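A minimal Python sketch of the MAPE is given below, assuming the usual definition expressed in percent (the mean of |actual - forecast| / |actual|, times 100). The function name and the sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Two forecasts that are each off by 10 percent of the actual value.
error = mape([100.0, 200.0], [110.0, 180.0])
```

Note that the measure is undefined when an actual value is zero, which is not an issue for monthly arrival counts.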
3. Results and Analysis
We analyze the data using functions available in the kernlab and Rssa R packages. The performance of the proposed technique is evaluated through simulation studies. We also apply the proposed technique to real data, namely tourist arrivals data for Bali.
3.1. Simulation Study
We simulate the generated model 200 times, with n = 74 observations in each replication. The generated data are divided into training data (62 observations) and testing data (12 observations). The parameter grids are C = {10, 20, 30}, ε = {0, 0.1, ..., 0.7}, σ = {1, 2, 3, 4}, and L = {2, 3, ..., 37}. Matrix X_1 is assigned to the trend group, matrices X_2, ..., X_5 to the oscillation group, and the rest to the noise group. Predictions are made for several horizons (3, 6, 9, and 12 observations) to show how forecasting accuracy changes as the prediction horizon lengthens.
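To give a sense of the size of this search, the grids can be enumerated in Python as sketched below. The grid values are taken from the text, reading the second and third SVM parameters as ε and σ; the helper names and the toy scoring function are hypothetical, and in practice the score would be a validation MAPE.

```python
from itertools import product

C_GRID = [10, 20, 30]
EPSILON_GRID = [i / 10 for i in range(8)]   # 0, 0.1, ..., 0.7
SIGMA_GRID = [1, 2, 3, 4]
WINDOW_GRID = list(range(2, 38))            # 2, 3, ..., 37


def svm_grid():
    """All (C, epsilon, sigma) combinations of the SVM grid."""
    return list(product(C_GRID, EPSILON_GRID, SIGMA_GRID))


def select_best(candidates, score):
    """Return the candidate with the smallest score (e.g. a validation MAPE)."""
    return min(candidates, key=score)


# Toy score standing in for a validation error; it is smallest at (20, 0.7, 1).
toy_score = lambda p: abs(p[0] - 20) + abs(p[1] - 0.7) + abs(p[2] - 1)
best = select_best(svm_grid(), toy_score)
```

The SVM grid alone has 3 × 8 × 4 = 96 points, and the SSA window length adds 36 candidate values.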
The nonlinear data are generated from y_t = 0.3 + y_{t-1} + 2 sin(2πt/16) + 2 cos(2πt/16) + u_t, with t = 1, 2, ..., n and u_t ~ N(0,1). Figure 2 shows that SSA-SVM performs better, in that both the median and the spread of its MAPE are smaller than those of the other methods. This remains true as the forecasting horizon lengthens.
Based on the simulation, SSA-SVM performs better than the MA and SVM methods.
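One replication of the simulated series can be generated as in the Python sketch below. It assumes that the sine and cosine arguments are 2πt/16, that the starting value y_0 is 0, and that the noise is drawn with a fixed seed for reproducibility; none of these details is confirmed by the text, and the function names are hypothetical.

```python
import math
import random


def simulate_series(n=74, seed=0, y0=0.0):
    """Simulate y_t = 0.3 + y_{t-1} + 2 sin(2*pi*t/16) + 2 cos(2*pi*t/16) + u_t."""
    rng = random.Random(seed)
    series = []
    previous = y0                        # assumed starting value
    for t in range(1, n + 1):
        angle = 2 * math.pi * t / 16
        seasonal = 2 * math.sin(angle) + 2 * math.cos(angle)
        current = 0.3 + previous + seasonal + rng.gauss(0.0, 1.0)
        series.append(current)
        previous = current
    return series


def split_train_test(series, n_test=12):
    """Hold out the last n_test observations for testing."""
    return series[:-n_test], series[-n_test:]


train, test = split_train_test(simulate_series())
```

Repeating this with 200 different seeds would reproduce the structure of the simulation study, with 62 training and 12 testing observations per replication.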
3.2. Real-Data Application
The data used in this study are foreign tourist arrivals through Ngurah Rai (Bali) airport from January 2007 to December 2016. Figure 3 shows that the data have a positive trend and that the right tail of the curve follows a nonlinear pattern. This initial identification suggests that the MA method may be less accurate for these data.
Before processing, the data are divided into training data and testing data. The training data are the numbers of foreign tourist arrivals from January 2007 to December 2015 (108 observations), and the testing data used to validate the model cover January 2016 to December 2016 (12 observations).
The grid search uses the same grid ranges as in the simulation study. For the SVM method we obtain C = 20, ε = 0.7, and σ = 1. For the SSA-SVM method, the window length obtained is 52. The grouping is then based on the eigenvector plots. We show only the first 12 eigenvector plots rather than all of them, because these already represent the condition of the trajectory matrix. Based on Figure 4, matrix X_1 has a trend pattern, so it is assigned to the trend group. Matrices X_2, ..., X_5 show an oscillation pattern, so they are assigned to the oscillation group. The rest are assigned to the noise group. The cumulative ratio of the first 5 eigenvalues reaches 99.67 percent, which indicates that 5 elementary matrices explain 99.67 percent of the trajectory matrix. This supports the grouping chosen.
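The cumulative ratio quoted above is the share of the eigenvalue total accounted for by the leading eigenvalues. A minimal Python sketch follows, with a hypothetical function name and made-up eigenvalues that are not those of this study.

```python
def cumulative_eigenvalue_ratio(eigenvalues, r):
    """Percentage of the eigenvalue total explained by the r largest eigenvalues."""
    if not 1 <= r <= len(eigenvalues):
        raise ValueError("r must be between 1 and the number of eigenvalues")
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:r]) / sum(ordered)


# Made-up eigenvalues: the two largest account for 90 percent of the total.
share = cumulative_eigenvalue_ratio([6.0, 3.0, 1.0], r=2)
```

A share close to 100 percent for the retained components suggests that what remains in the noise group carries little of the structure of the trajectory matrix.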
The optimal SVM parameters of the combined method are C = 30, ε = 0.1, and σ = 1 for the trend group and C = 20, ε = 0.4, and σ = 3 for the oscillation group. The MAPE of each forecasting method is given in Table 2.
Table 2 shows that SSA-SVM has the lowest MAPE for all prediction horizons. Its forecasts are highly accurate for horizons of 3 and 6 months and good for horizons of 9 and 12 months [20]. On this basis, SSA-SVM is the best of the compared methods for predicting the number of foreign tourist arrivals to Bali. The SSA-SVM forecasts of the number of foreign tourists for January 2017 to December 2017 are given in Table 3.
Based on Table 3, the number of foreign tourists coming to Bali in 2017 is estimated to reach 5.39 million. The forecasts fluctuate from month to month, with the highest number of arrivals in August, but overall they show a positive trend.
4. Conclusion
Among the methods compared, SSA-SVM is the best for forecasting the number of foreign tourist arrivals to Bali. It combines the ability of SSA to decompose a time series and filter out the noise component with the ability of SVM to handle nonlinear data with large variation, and thereby improves forecasting accuracy.
Acknowledgment
This research is supported by Statistics Indonesia (BPS) and master program of Applied Statistics, Padjadjaran University.
Copyright © 2018 Universitas Ahmad Dahlan. All rights reserved.
Received September 14, 2017; Revised January 13, 2018; Accepted July 2, 2018
References
[1] World Travel & Tourism Council (WTTC). Travel & Tourism Economic Impact 2015 Indonesia. WTTC. 2015.
[2] Bappenas. Rencana Pembangunan Jangka Panjang Menengah Nasional 2015-2019. Jakarta: Bappenas. 2014.
[3] BPS. Statistik Kunjungan Wisatawan Mancanegara 2016. Jakarta: BPS. 2017.
[4] Republik Indonesia. Peraturan Presiden No.45 Tahun 2016 Tentang Rencana Kerja Pemerintah (RKP) Tahun 2017. Jakarta: Sekretariat Negara. 2016.
[5] Kemenpar. Analisis Kunjungan Wisatawan Mancanegara pada Kawasan 3 Great Triwulan 1 2015. Jakarta: Kemenpar. 2015.
[6] Lind DA, Marchal WG, and Wathen SA. Statistical Techniques in Business and Economics; Sixteenth Edition. New York: McGraw-Hill Education. 2002.
[7] Talluri KT, Ryzin GJ. The Theory and Practice of Revenue Management. New York: Springer. 2005.
[8] Vapnik VN. The Nature of Statistical Learning Theory. New York: Springer. 1995.
[9] Gavrishchaka V and Banerjee S. Support Vector Machine as an Efficient Framework for Stock Market Volatility Forecasting. Computational Management Science. 2006; 39(2): 147-160.
[10] Golyandina N and Zhigljavsky A. Singular Spectrum Analysis for Time Series. New York: Springer. 2013.
[11] Sitohang YO and Darmawan G. The Accuracy Comparison between ARFIMA and Singular Spectrum Analysis for Forecasting the Sales Volume of Motorcycle in Indonesia. Proceedings of The 4th International Conference on Research, Implementation, and Education of Mathematics and Science (4th ICRIEM). Yogyakarta: AIP Conference Proceedings. 2017; 1868: 040011-1-040011-8.
[12] Hassani H et al. Forecasting U.S. Tourist Arrivals Using Optimal Singular Spectrum Analysis. Journal of Tourism Management. 2015; 46: 322-335.
[13] Unnikrishnan P and Jothiprakash V. Extraction of Nonlinear Rainfall Trends Using Singular Spectrum Analysis. Journal of Hydrological Engineering. 2015; 05015007(15): 1-15.
[14] Kim KJ. Financial Time Series Forecasting Using Support Vector Machines. Neurocomputing. 2003; 55: 307-319.
[15] Sotomayor A et al. Forecast Urban Air Pollution in Mexico City by Using Support Vector Machines: A Kernel Performance Approach. International Journal of Intelligence Science. 2013; 3(3): 126-135.
[16] Syarif I, Bennett AP and Wills G. SVM Parameter Optimization Using Grid Search and Genetic Algorithm to Improve Classification Performance. TELKOMNIKA (Telecommunication, Computing, Electronics and Control). 2016; 14(4): 1502-1509.
[17] Wang Y et al. Comparative Study of Monthly Inflow Prediction Methods for the Three Gorges Reservoir. Journal of Stochastic Environmental Research and Risk Assessment. 2013; 28(3): 555-570.
[18] Rao SS. Engineering Optimization: Theory and Practice. Fourth Edition. New Jersey (US): J Wiley. 2009.
[19] Hyndman RJ. Forecasting: Principles & Practice. Australia: University of Western Australia. 2014.
[20] Lewis CD. Industrial and Business Forecasting Methods. London: Butterworths. 1982.
© 2018. This work is published under https://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.





