1. Introduction
Water-related soil erosion is an environmental concern and a considerable source of sediment transfer into rivers; it is a threatening land degradation phenomenon affecting around one billion hectares worldwide [1]. The consequences of water erosion include on-site impacts such as loss of soil resources, decrease in soil fertility, reduction of vegetation growth, filling of valleys and reservoirs, desertification and destruction of human infrastructure, and off-site impacts consisting of sedimentation of water courses, decreases in water quality and economic and ecological damage to societies [2,3]. Water erosion occurs in different forms on hill slopes, distinguished by their morphometric characteristics: rain splash, sheet (interrill) erosion, rill erosion, bank erosion and gully (badland) erosion [4]. Among these, gully erosion is a complex erosion problem [5] that can be triggered or accelerated by land use change and heavy rainfall [6]. Gullies contribute 50–80% of overall sediment production in semi-arid and arid regions worldwide [7]. Reported soil loss rates by gully erosion range from a minimum of 10% up to 94% of the total sediment yield from water erosion [5].
Gully erosion is defined as an erosion process in which runoff water accumulates and often recurs in narrow channels and, over a short time, removes the soil from these channels to a considerable depth [8]. Three types of gullies have been reported: (i) permanent gullies, which are often related to agricultural lands and are characterized by channels too deep to be obliterated by ordinary tillage; their depth ranges from 0.5 to as much as 25–30 m [9,10,11,12]; (ii) ephemeral gullies (rill form), which are small channels eroded by overland flow that are easily filled by normal tillage [9]; they are characterized by a critical cross-sectional area of about 929 cm² [13] and a minimum width and depth of 0.3 and 0.6 m, respectively [14]; and (iii) bank gullies, which form wherever a morphological bank is cut by concentrated runoff. Where the local slope of the soil surface becomes subvertical or vertical, they develop quickly through erosion, piping and, consequently, mass movements at or below the soil surface [15,16].
Iran covers about 1.1% of the world’s land area, and its annual soil loss is 2–2.5 billion tons, ranking second in the world in terms of the amount of soil erosion [17]. Reports indicate that about 88 million hectares (more than half of the country's area) are in critical soil erosion condition [17]. Because gullies degrade large amounts of soil and transfer huge volumes of sediment into streams, agricultural lands, residential areas and even infrastructure suffer [18]. Therefore, recognizing the areas that are most prone to gully erosion is critical for better land management and for prevention of gully erosion in land allocation studies.
Over the last decades, some investigations and numerous computer-aided techniques have been developed for gully erosion modelling including expert knowledge methods such as analytical hierarchy process (AHP) [19,20], bivariate statistical methods (BSMs) such as frequency ratio (FR) [21,22], certainty factors (CF) [23], weight of evidence (WoE) [22,24], information value (InVal) and evidential belief function (EBF) [25], conditional probability (CP) [26], index of entropy (IOE) [27], multivariate statistical methods (MSMs) such as linear regression (LiR) [28] and logistic regression (LR) [29,30], and machine learning methods such as support vector machine (SVM) [31,32,33], and random forest (RF) [34,35], classification and regression trees (CART) [33] and artificial neural networks (ANN) [33].
Recently, ensemble machine learning methods have been used increasingly for spatial prediction of natural hazards such as groundwater and flood [36,37,38,39,40,41,42,43,44], landslides [45,46,47,48,49,50,51,52,53,54,55,56,57,58], wildfire [59], sinkhole [60], drought [61] and land/ground subsidence [62]. However, few studies using ensemble machine learning models have been reported on gully erosion, such as [31]. An advantage of ensemble algorithms is that they can achieve higher goodness-of-fit and prediction accuracy than individual (single) methods by compensating for their weaknesses. For instance, Pourghasemi et al. [31] used artificial neural network (ANN), SVM, maximum entropy (ME) and their ensembles for gully susceptibility mapping. They reported that the ANN-SVM ensemble detected gully erosion better than the individual and other ensemble methods. Although a number of methods have been developed for susceptibility assessment, the results of the modeling process differ from one region to another, and even from one model to another, indicating that the results obtained by a model are specific to its case study. Overall, the available studies suggest that ensemble machine learning can improve the prediction capability of gully erosion models.
The aim of this work is to extend the body of gully erosion modelling approaches and to verify a new ensemble artificial intelligence approach based on rotation forest (RF) and the ADTree algorithm, named RFADT, for gully erosion mapping in a semi-arid watershed, Klocheh, Bijar, in Kurdistan province, Iran. Hereafter, RF denotes rotation forest rather than random forest. RF is a relatively new and powerful ensemble framework that has proven its efficiency in various real world problems [63,64,65,66,67], whereas the ADTree is a robust and efficient algorithm [41]. To the best of our knowledge, RFADT has not been considered for gully erosion susceptibility mapping. Finally, the effectiveness of the proposed RFADT model is assessed by comparing its performance with benchmark models, namely ADTree, SVM with RBF and polynomial kernel functions, LR and Naïve Bayes Multinomial Updatable (NBMU), and conclusions are given.
2. Description of Study Area
The study area is the Klocheh watershed, located between Kurdistan province and Hamadan province in the west of Iran, between longitudes 47°50′24″ E and 48°8′35″ E, and latitudes 35°14′24″ N and 35°40′5″ N, covering an area of about 498.49 km² (Figure 1). Elevation in the study area varies from 1612 to 2331 m above sea level, with an average elevation of 1925.47 m. The terrain is relatively steep, with slopes ranging from 0° to 67.06° and a mean slope of 6.56°. Statistical analysis of rainfall for the period 1987–2010 shows that the annual average rainfall is about 338 mm. The mean daily minimum and maximum temperatures are 4.4 and 13.4 °C, respectively. The number of frost days is 104 and the number of snow days is 35 (http://kurdistanmet.telepol.ir).
The Klocheh watershed is a tributary basin of the Sefid Rood River, which itself drains into the Caspian Sea. Six geomorphologic units can be identified in the Klocheh watershed: old plain unit (40.43%), new plain unit (29.98%), hill slope unit (21.38%), fluvial sediment unit (3.72%), valley unit (3.20%) and mountain unit (1.28%). Five types of land use were also identified: barren lands, dry farming lands, poor pasture lands, semi-dense pasture lands and woodlands. Dry farming lands occupy the largest area (73.95%), followed by semi-dense pasture (11.97%), poor pasture (9.58%), woodlands (3.38%) and barren lands. Geologically, the Klocheh watershed is located in the Sanandaj-Sirjan zone, whose effects are seen as magmatism in the basin (geological map with scale of 1:100,000). The lithology of the basin includes metamorphic-sedimentary rocks of the Jurassic period and Tertiary sediments, namely Js (schist, sandstone, quartzite), JL (intracellular limestone layers), Plt (trachyte, trachyandesite, dacite) and Plb (basalt, basanite), which are covered by deposits of Mm (light green and red marls), Pcg (loose conglomerate deposits), Plm (clay limestone, marl, sand marl, limestone sandstone), Qt1 (high alluvial sediments) and Qal (river bed sediments) (geological map with scale of 1:100,000). Stone units cover about 4.93% and sediment units cover 95.07% of the basin surface. In terms of erodibility, the sedimentary and rocky units of the basin have been classified into low erodibility units (Js, Plt and JL), moderate erodibility units (Qc, Qt1, Pcg, Mm and Plb), high erodibility units (Qtr and Qt2) and very high erodibility units (Qal and Plm), covering about 4.061%, 43.826%, 20.49% and 31.62% of the basin, respectively. Gullies in the study area have formed mainly on susceptible lithological units such as marl and alluvial deposits along the rivers.
In this study, we selected the gully heads on the tributaries of the streams. The gullies on the main river of the study area were large in depth (>10 m) and width (>7 m), while the gully heads on the tributaries were smaller (depth < 2–3 m and width < 1 m). The main aim of this study was to recognize the locations that are prone to gully development in the future.
3. Data Acquisition
3.1. Gully Inventory Map
In this study, the locations of some gullies had been recorded earlier by the Natural Resources and Watershed Management General Office of Kurdistan province; other locations were recorded during comprehensive field surveys, and all locations were then checked against Google Earth images (dated 22 May 2017) in order to prepare an accurate gully erosion inventory map. A total of 915 gully erosion lines were ultimately detected in the study area, mainly on or near the river networks (Figure 1). These gully lines were converted to points using the “feature to point” tool in ArcGIS 10.2, with more focus on the gully heads. The points were then randomly divided into 70% (640 gullies) and 30% (275 gullies) for the modeling and validation processes, respectively. Most of the gullies in the current study are classified as permanent and bank gullies (stream gullies). For modeling with machine learning algorithms, the dataset should contain both presence and absence events of the gully erosion process. Therefore, in addition to the gully erosion locations, a total of 915 non-gully erosion locations were selected and divided in the same 70%/30% ratio. These locations were selected randomly over the watershed using the “create random point” tool in ArcGIS 10.2. Figure 2 shows some typical examples of gullies in the study area. As can be seen in these figures, gullies in the study area have developed from rills of small depth and width, and some of them now have a depth of more than 4 m and a width of more than 10 m. The primary field surveys, based on expert knowledge, revealed that the shear stress of flow and the geology were the most important factors causing gullies. Indeed, the gully locations coincide with the loose and erodible Quaternary deposits, including marl with interlayers of limestone.
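The 70%/30% random split described above can be illustrated with a minimal sketch. The study performed this step in ArcGIS; the Python version below is only an assumed equivalent, and the point identifiers, the seeds and the function name `split_points` are hypothetical.

```python
import random

def split_points(point_ids, train_fraction=0.7, seed=42):
    """Randomly split point identifiers into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 915 gully and 915 non-gully points.
gully_ids = [f"gully_{i}" for i in range(915)]
non_gully_ids = [f"non_gully_{i}" for i in range(915)]

gully_train, gully_valid = split_points(gully_ids, seed=42)
non_gully_train, non_gully_valid = split_points(non_gully_ids, seed=43)

# Label 1 marks gully presence and label 0 marks absence.
training_set = [(p, 1) for p in gully_train] + [(p, 0) for p in non_gully_train]
validation_set = [(p, 1) for p in gully_valid] + [(p, 0) for p in non_gully_valid]
```

With 915 points per class, this yields 640 training and 275 validation points for each class, matching the counts reported in the text.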
The Natural Resources and Watershed Management General Office of Kurdistan province, Iran, has done many control practices including construction of check dams to prevent and even to control these gullies; however, as observed in Figure 2, all of them were unsuccessful since the check dams were destroyed and overturned.
3.2. Gully Erosion Conditioning Factors
A large set of geo-environmental factors are usually used in scientific literature to analyze gully erosion hazard. However, there are no universal guidelines for selecting gully conditioning factors. Previous researchers have considered different factors as independent variables. According to the literature, we selected 22 gully-erosion susceptibility predictor variables, which can be divided into six categories (Table 1): (1) Topographic factors; (2) Hydrological factor; (3) Lithological factors; (4) Land cover factors; (5) Anthropogenic factors; and (6) Geomorphological factors. Topographic factors include slope, aspect, elevation, plan curvature, profile curvature, sediment transport index (STI) and valley depth (VD). Hydrological factors include rainfall, stream power index (SPI), topographic wetness index (TWI), hydrological group (HG), flow accumulation, permeability, distance to river and river density. Lithological factors refer to lithology, distance to fault and fault density. Land cover factors include land use while distance to road and road density factors are anthropogenic factors considered in the analysis; and geomorphological factors enclose landforms. Table 1 shows gully conditioning factors and their classes for gully erosion modeling.
A digital elevation model (DEM) with 12.5 m resolution was extracted from ALOS PALSAR data obtained from the Alaska Satellite Facility (https://vertex.daac.asf.alaska.edu/#). Slope, aspect, elevation, plan curvature, STI, VD, SPI, TWI, HG, flow accumulation, permeability, distance to river and river density were constructed from the DEM using ArcGIS 10.2 and SAGA 6.0.0 software.
Slope inclination is an important factor in gully formation and development. Gentle slopes are assumed to have higher infiltration in comparison to steeper slopes, and therefore gentle slopes are considered to be susceptible to gully initiation [21,68]. The slope factor was classified into six classes of (1) 0–2; (2) 2–5; (3) 5–10; (4) 10–15; (5) 15–20; (6) >20 (Table 1). Aspect is another conditioning factor that plays an important role in gully development [26]. Aspect can control evapotranspiration, vegetation cover and incoming solar radiation [69]. The aspect factor of the study area was created using the DEM and categorized later into nine classes of (1) Flat; (2) North; (3) Northeast; (4) East; (5) Southeast; (6) South; (7) Southwest; (8) West; (9) Northwest (Table 1). Elevation influences microclimate and vegetation community [70]. Therefore, several researchers have taken it into account in geohazards, especially for predicting gully-erosion susceptibility [30,71]. According to the previous research, most of occurred gullies were concentrated in low-altitude areas [26]. The elevation factor was divided to eight classes: (1) 1612–1700; (2) 1700–1800; (3) 1800–1900; (4) 1900–2000; (5) 2000–2100; (6) 2100–2200; (7) 2200–2300; (8) 2300–2400 m (Table 1). Plan curvature can be an important predictor of gully erosion by representing the spatial variability in diverging and converging overland flow of water [21,72]. The plan curvature factor of the study area was reclassified into five categories: (1) [(−5.67)–(−0.736)]; (2) [(−0.736)–(−0.188)]; (3) [(−0.188)–0.149]; (4) [0.149–0.697]; (5) [0.6974–5.08] (m−1) (Table 1). Profile curvature can reflect the geometric features of slopes, which in turn can influence stress distribution of slopes in the development of gully [73]. 
The profile curvature of the study area was classified into five classes: (1) [(−6.357)–(−0.972)]; (2) [(−0.972)–(−0.187)]; (3) [(−0.187)–0.317]; (4) [0.317–1.1]; and (5) [1.1–7.94] (m−1) (Table 1). The sediment transport index (STI) as another effective factor in gully erosion has an important role in characterizing the process of erosion and deposition. In the present study, the STI was divided into five classes: (1) 0–1.286; (2) 1.286–2.894; (3) 2.894–5.145; (4) 5.145–8.468; (5) 8.468–27.33 (Table 1). Valley depth (VD) is computed based on the elevation using SAGA 6.0.0 software. It was divided into five categories including (1) 0–48.231; (2) 48.231–108.520; (3) 108.520–176.340; (4) 176.340–254.720; and (5) 254.720–384.340 (Table 1).
Rainfall is a triggering factor: by penetrating into soil cracks, it leads to gully occurrence and to gully development in different directions [6]. The annual average rainfall map of the Klocheh watershed was interpolated from rain gauge stations inside and outside the study area using the Inverse Distance Weighted (IDW) method. The rainfall factor was divided into five classes: (1) 261–286; (2) 286–298; (3) 298–306; (4) 306–312; and (5) 312–322 mm.
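As an illustration of the IDW interpolation mentioned above, the following minimal sketch estimates rainfall at an unsampled location from surrounding stations. The station coordinates and values are hypothetical, and the power parameter of 2 is an assumption, since the text does not state the exponent used.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (x_station, y_station, value) tuples.
    power: distance exponent (2.0 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xs, ys, value in stations:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical rain gauges: (easting in km, northing in km, annual rainfall in mm).
gauges = [(0.0, 0.0, 280.0), (10.0, 0.0, 320.0)]
rain_midpoint = idw(5.0, 0.0, gauges)   # equally distant from both gauges
rain_near_first = idw(1.0, 0.0, gauges)  # dominated by the first gauge
```

Closer stations receive larger weights, so the estimate near a gauge approaches that gauge's value.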
Stream Power Index (SPI), as a hydrological factor, indicates the erosive power of streams, which can affect gully occurrence [21]. It is calculated as follows [74]:
$$\mathrm{SPI} = A_s \tan\beta,$$
where $A_s$ (m² m⁻¹) is the specific catchment area and $\beta$ is the slope gradient (in degrees). The SPI factor of the current study was divided into five classes: (1) 0–112.4; (2) 112.4–224.8; (3) 224.8–401.5; (4) 401.5–722.7; (5) 722.7–4095.
Topographic wetness index (TWI) is considered an important factor in gully development. Therefore, some researchers have applied TWI as a secondary topographic factor for modeling gully occurrence [26,30]. TWI is calculated as follows:
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right),$$
where $A_s$ and $\beta$ are the cumulative upslope area and slope gradient (in degrees), respectively. In this study, the TWI value was produced in SAGA-GIS 6.0.0 software using a 12.5 m DEM and then reclassified into five groups: (1) 1–3; (2) 3–4; (3) 4–5; (4) 5–6; (5) 6–9.059.
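The two indices can be evaluated per cell as in the short sketch below, which assumes the usual forms SPI = As·tan β and TWI = ln(As / tan β). The function names and the small minimum slope used to avoid division by zero on flat cells are assumptions for illustration, not part of the study's SAGA workflow.

```python
import math

def stream_power_index(specific_catchment_area, slope_deg):
    """SPI = As * tan(beta), with beta given in degrees."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

def topographic_wetness_index(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(As / tan(beta)), with beta given in degrees.

    min_slope_deg is an assumed floor that keeps tan(beta) above zero
    on perfectly flat cells.
    """
    slope = max(slope_deg, min_slope_deg)
    return math.log(specific_catchment_area / math.tan(math.radians(slope)))

# Hypothetical cell: As = 100 m^2/m on a 45 degree slope, where tan(beta) = 1.
spi_value = stream_power_index(100.0, 45.0)
twi_value = topographic_wetness_index(100.0, 45.0)
```

For the same catchment area, a gentler slope lowers SPI and raises TWI, which is why the two indices capture complementary aspects of the terrain.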
Hydrological soil group (HSG) is another conditioning factor in gully erosion studies. It reflects the soil potential for runoff generation based on the amount of infiltration [75]. The HSG factor was classified in four groups including (1) A; (2) B; (3) C; (4) D.
The distance to road map was constructed from the road network built by Iran National Cartographic Center (INCC) in DGN format with 1:25,000 scale. Flow accumulation, distance to river (m) and river density (km/km2) are prominent hydrological factors that have an important role in gully erosion. The possible effect of river networks on gully erosion was analyzed by calculating the distance to the nearest perennial or major upstream ephemeral rivers in the region in every raster cell. The values of three factors were constructed from the DEM 12.5 m using ArcGIS 10.2 and SAGA 6.0.0 software. Their classes are shown in Table 1.
Permeability or degree of porosity in soil indicates the ability of water to percolate and disintegrate the structure of soils [76]. It is expected that soils with low permeability and high pore spaces are more prone to gully occurrence. In this study, the permeability map was prepared by the constant-head test (ASTM D 2434). It was then classified into three categories including low permeability, moderate permeability and high permeability.
The different lithology and weathering properties of geologic parent materials influence land surface processes and the development of erosional landforms such as gullies [21,68,77]. The lithology factor was obtained from a geological map with the scale of 1:100,000. Lithology units of the study area include (1) layered limestone (JL); (2) schist, quartzite, and dark gray metamorphosed sandstones (JS); (3) an alternation of light green and red marls (Mm); (4) basalt and basanite (PLb); (5) conglomerate with a matrix of marl and sandstone (Pcg); (6) clay limestone, marl, sand marl, sandstone (Plm); (7) trachyte, trachy-andesite, dacite (Plt); (8) fluvial sediment (Qal); (9) terraces land (Qc); (10) travertine stone (Qtr); (11) high alluvial terraces (Qt1); and (12) low alluvial terraces (Qt2) (Table 1).
Distances to fault (m) (proximity to the fault) and fault density (km/km2) (cumulative length of faults per unit area) are important lithological factors in gully erosion. The rills which are closer to faults or have higher cumulative length of faults in the area have higher probability of becoming gullies [22]. The distance to fault and fault density factors are extracted from a geological map with the scale of 1:100,000. They are classified into five classes that are shown in Table 1.
Land use is also a key element in land degradation in general and in gully formation in particular [68]. The land use map of the present study was produced by interpretation of Landsat 7 ETM+ satellite images acquired on 25 August 2017. The land use factor was divided into five categories: (1) Wood lands; (2) Dry-farming and cultivated lands; (3) Poor pastures; (4) Semi-dense pastures; and (5) Destroyed pastures.
Distance to road (m) and road density (km/km2) as anthropogenic/man-made factors show a remarkable influence on gully erosion [78]. These two man-made factors were generated from a topographic map with the scale of 1:150,000. Then, they were divided into five categories shown in Table 1.
Geomorphologic units have different roles in gully erosion occurrence. For example, gullies generally form on low slope angles and loose sediments (Quaternary deposits). They are triggered by changes in overland flow, decreases in runoff lag time and increases in runoff volume [79]. In this study, the geomorphological map was categorized into six classes: (1) Valley plain unit; (2) Hilly unit; (3) Mountain unit; (4) New plain unit; (5) Old plain unit; and (6) Fluvial sediment unit (Table 1).
4. Background of Machine Learning Methods
4.1. Support Vector Machine Classifier
Support Vector Machine (SVM), which was introduced by Vapnik [80], is a well-known machine learning classifier applied to many real world problems including landslide prediction [81,82], flood prediction [83,84] and forest fire prediction [85,86]. It is based on the principle of structural risk minimization from statistical learning theory, which reduces the test error and the computational complexity. Using the SVM, an optimal hyper-plane is constructed to separate two classes, where one class, labeled “1”, lies above the hyper-plane and the other, labeled “0”, lies below it. A number of support vectors define the optimal hyper-plane, which can be obtained by maximizing the dual objective function below:
$$\max_{\varphi} \; \sum_{i=1}^{n} \varphi_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \varphi_i \varphi_j y_i y_j K(x_i, x_j),$$
subject to
$$\sum_{i=1}^{n} \varphi_i y_i = 0 \quad \text{and} \quad 0 \le \varphi_i \le D,$$
where $x = (x_i),\ i = 1, 2, \ldots, n$, is the vector of input variables, $y = (y_i),\ i = 1, 2, \ldots, n$, is the vector of output variables, $\varphi_i$ are the Lagrange multipliers and $D$ is the upper bound (penalty parameter) on the multipliers.
The decision function used for the classification can then be expressed as below:
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} y_i \varphi_i K(x_i, x) + a\right),$$
where $a$ is the bias (the offset of the hyper-plane from the origin) and $K(x_i, x_j)$ is the kernel function, here either the polynomial (POL) or the radial basis function (RBF) kernel, which can be expressed as below [87]:
$$K_{POL}(x_i, x_j) = \left((x_i \cdot x_j) + 1\right)^{d},$$
$$K_{RBF}(x_i, x_j) = e^{-\gamma \lVert x_i - x_j \rVert^{2}},$$
where $d$ is the degree of the polynomial and $\gamma$ is the width parameter of the RBF kernel.
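To make the kernels and the decision function concrete, the sketch below evaluates both kernels and applies the decision rule to a toy set of support vectors. The support vectors, multipliers, bias and parameter values are hypothetical and are not taken from the fitted models. Class labels are coded as +1 and −1, as is usual in the dual formulation.

```python
import math

def polynomial_kernel(x, z, degree=2):
    """K_POL(x, z) = ((x . z) + 1)^d."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K_RBF(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def svm_decision(x, support_vectors, labels, multipliers, bias, kernel):
    """Sign of sum_i y_i * phi_i * K(x_i, x) + a, returned as +1 or -1."""
    score = bias
    for sv, y, phi in zip(support_vectors, labels, multipliers):
        score += y * phi * kernel(sv, x)
    return 1 if score >= 0 else -1

# Hypothetical two-support-vector model (values chosen for illustration only).
toy_support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
toy_labels = [1, -1]
toy_multipliers = [1.0, 1.0]
toy_bias = 0.0

class_a = svm_decision([0.9, 1.1], toy_support_vectors, toy_labels,
                       toy_multipliers, toy_bias, rbf_kernel)
class_b = svm_decision([-1.2, -0.8], toy_support_vectors, toy_labels,
                       toy_multipliers, toy_bias, rbf_kernel)
```

Swapping `rbf_kernel` for `polynomial_kernel` in the call changes only the similarity measure; the decision rule itself is unchanged.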
4.2. Logistic Regression Classifier
Known as the most popular multivariate statistical analysis, logistic regression (LR) has been applied in many scientific fields such as medical science [88,89], computer science [90] and natural hazard assessment [91,92]. It can be used for prediction and assessment of gully erosion at the regional scale [30]. The main principle of LR is that it uses the logistic function to analyze the relationship between a binary dependent variable (here, gully presence or absence) and one or more independent variables (the conditioning factors). The logistic function used in LR can be expressed by the following equations:
$$Q = \frac{1}{1 + \exp(-t)},$$
$$t = \operatorname{logit}(Q) = \ln\left(\frac{Q}{1 - Q}\right) = e_0 + e_1 x_1 + \ldots + e_n x_n,$$
where $Q$ is the probability of a gully erosion occurrence, $x_i$ (i = 1, 2, 3, …, n) are the conditioning factors, $t$ is the linear logistic factor, which varies from −∞ to +∞, $e_0$ is the constant modeling coefficient, $e_i$ (i = 1, 2, 3, …, n) are the modeling coefficients, and $n$ is the number of independent variables.
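The relationship between the linear factor t and the probability Q can be checked numerically with the sketch below. The coefficient and factor values are hypothetical and serve only to show how the two equations fit together.

```python
import math

def linear_factor(intercept, coefficients, factors):
    """t = e0 + e1*x1 + ... + en*xn."""
    return intercept + sum(e * x for e, x in zip(coefficients, factors))

def logistic(t):
    """Q = 1 / (1 + exp(-t)), mapping any real t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(q):
    """Inverse of the logistic function: t = ln(Q / (1 - Q))."""
    return math.log(q / (1.0 - q))

# Hypothetical coefficients for two conditioning factors.
t_value = linear_factor(-1.0, [0.5, 2.0], [2.0, 0.0])  # -1 + 1 + 0 = 0
probability = logistic(t_value)
```

A linear factor of zero corresponds to a probability of 0.5, and applying `logit` to a probability recovers the linear factor.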
4.3. Naïve Bayes Multinomial Updatable Classifier
Known as one of the effective Bayesian classifiers, Naïve Bayes has been applied in many studies such as text classification [93,94], heart disease prediction [95], classification of agricultural land soils [96], facies identification [97] and natural hazard prediction [98]. The main principle of Naïve Bayes (NB) is to use probabilities estimated from past observations to find the state of a query among the other variables in the dataset. It is a simple and fast learning method for classification. Training NB can be implemented through several steps: (i) collection of data, (ii) estimation of the probability and mean for each class, and (iii) creation of the variance and covariance matrix and building of the discriminant function for each class. The decision function of NB can be expressed by the following equation:
$$y_{NB} = \underset{y_j \in \{\text{gully},\ \text{non-gully}\}}{\arg\max} \; P(y_j) \prod_{i=1}^{n} P(x_i \mid y_j),$$
where $x = (x_1, x_2, \ldots, x_n)$ is the vector of the influencing factors, $y = (y_1, y_2)$ is the vector of the output variables (gully, non-gully), $P(y_j)$ is the prior probability of $y_j$, and $P(x_i \mid y_j)$ is the conditional probability expressed as below:
$$P(x_i \mid y_j) = \frac{1}{\sqrt{2\pi}\,\beta} \exp\left(-\frac{(x_i - \alpha)^2}{2\beta^2}\right),$$
where $\alpha$ and $\beta$ are the mean and standard deviation, respectively.
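A minimal sketch of this decision rule with Gaussian conditional probabilities is given below. It is not the NBMU implementation used in the study; the sample values, the function names and the small floor on the standard deviation are assumptions for illustration. Logarithms are used so that the product of many small probabilities does not underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels, min_std=1e-6):
    """Estimate the prior, and the mean and standard deviation of each factor, per class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(math.sqrt(variance), min_std)))  # assumed floor
        model[y] = (len(rows) / len(samples), stats)
    return model

def log_gaussian(x, mean, std):
    """Logarithm of the Gaussian density with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - (x - mean) ** 2 / (2.0 * std ** 2)

def predict_nb(model, x):
    """Return the class maximising log P(y) + sum_i log P(x_i | y)."""
    def score(item):
        prior, stats = item[1]
        return math.log(prior) + sum(log_gaussian(v, m, s) for v, (m, s) in zip(x, stats))
    return max(model.items(), key=score)[0]

# Hypothetical samples with two factors (e.g. slope in degrees, distance to river in m).
toy_samples = [[2.0, 10.0], [3.0, 12.0], [2.5, 11.0],
               [15.0, 40.0], [14.0, 42.0], [16.0, 41.0]]
toy_classes = ["gully", "gully", "gully", "non-gully", "non-gully", "non-gully"]
nb_model = fit_gaussian_nb(toy_samples, toy_classes)
```

Calling `predict_nb(nb_model, [2.2, 11.0])` returns the class whose Gaussian profile best matches the query, weighted by the class prior.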
4.4. Alternating Decision Tree Classifier
Alternating Decision Tree (ADT), which was introduced by Freund and Mason [99], is an effective decision tree classifier based on the boosting algorithm. This classifier constructs a classification tree in which each decision node is replaced by two nodes: a prediction node and a splitter node [100]. A prediction node is associated with a real value, and a splitter node is associated with a test [101]. In the ADT, the decision rules are easy to interpret; therefore, its decision-tree structures are simpler than those of other decision classifiers such as Classification and Regression Tree (CART) [102] and Random forest [103]. A base rule maps instances to real numbers and consists of a precondition $t_1$, a base condition $t_2$ and two real numbers $u$ and $v$: the prediction is $u$ if $t_1 \cap t_2$ holds and $v$ if $t_1 \cap \bar{t}_2$ holds ($\bar{t}$ is the negation of $t$). The values of $u$ and $v$ can be calculated by the following equations:
$$u = \frac{1}{2} \ln \frac{W_{+}(t_1 \cap t_2)}{W_{-}(t_1 \cap t_2)},$$
$$v = \frac{1}{2} \ln \frac{W_{+}(t_1 \cap \bar{t}_2)}{W_{-}(t_1 \cap \bar{t}_2)},$$
where $W(pr)$ is the total weight of the training instances that satisfy the predicate $pr$, and $W_{+}$ and $W_{-}$ are the corresponding total weights of the positive and negative instances. The best precondition $t_1$ and base condition $t_2$ are chosen by minimizing $Z(t_1, t_2)$, which is expressed as follows:
$$Z(t_1, t_2) = 2\left(\sqrt{W_{+}(t_1 \cap t_2)\, W_{-}(t_1 \cap t_2)} + \sqrt{W_{+}(t_1 \cap \bar{t}_2)\, W_{-}(t_1 \cap \bar{t}_2)}\right) + W(\bar{t}_1).$$
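The prediction values and the selection criterion can be computed directly from the instance weights, as in the sketch below. It assumes the Freund and Mason form of the criterion, in which the last term is the weight of the instances that do not satisfy the precondition. The weights in the example are hypothetical.

```python
import math

def prediction_values(w_pos_true, w_neg_true, w_pos_false, w_neg_false):
    """Return (u, v) for a candidate rule.

    w_pos_true / w_neg_true: weights of positive / negative instances
        satisfying both the precondition and the base condition.
    w_pos_false / w_neg_false: weights of positive / negative instances
        satisfying the precondition but not the base condition.
    """
    u = 0.5 * math.log(w_pos_true / w_neg_true)
    v = 0.5 * math.log(w_pos_false / w_neg_false)
    return u, v

def z_criterion(w_pos_true, w_neg_true, w_pos_false, w_neg_false, w_outside):
    """Z value of a candidate rule; the rule with the smallest Z is selected.

    w_outside: total weight of instances that do not satisfy the precondition.
    """
    return (2.0 * (math.sqrt(w_pos_true * w_neg_true)
                   + math.sqrt(w_pos_false * w_neg_false))
            + w_outside)

# Hypothetical weights for two candidate rules.
u_value, v_value = prediction_values(3.0, 1.0, 1.0, 3.0)
z_mixed = z_criterion(4.0, 1.0, 1.0, 4.0, 2.0)
z_purer = z_criterion(4.9, 0.1, 0.1, 4.9, 2.0)
```

A rule that separates positive from negative weight more cleanly yields a smaller Z, which is why the purer candidate above would be preferred.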
4.5. Rotation Forest Ensemble Classifier
Proposed by Rodriguez [64], Rotation Forest (RF) is known as one of the most effective ensemble techniques and has been used to improve the predictive capability of many single classifiers such as naïve Bayes tree [45], Random forest [65] and support vector machines [104]. Training the RF model is carried out in several main steps: (i) several subsets are generated by dividing the attribute set, (ii) sample subsets are obtained by resampling and transforming the features of the generated subsets, (iii) the rotation matrix is rearranged according to the sequence of the original attribute set, (iv) base classifiers are trained using the rotated sample subsets, and (v) the final outcome is obtained by integrating the results of the base classifiers on the different sample subsets. In the RF, the rotation matrix is expressed as follows [64]:
$$R_i = \begin{bmatrix} e_{i,1}^{(1)}, e_{i,1}^{(2)}, \ldots, e_{i,1}^{(M_1)} & 0 & \cdots & 0 \\ 0 & e_{i,2}^{(1)}, e_{i,2}^{(2)}, \ldots, e_{i,2}^{(M_2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e_{i,K}^{(1)}, e_{i,K}^{(2)}, \ldots, e_{i,K}^{(M_K)} \end{bmatrix},$$
where $e_{i,j}^{(1)}, e_{i,j}^{(2)}, \ldots, e_{i,j}^{(M_j)}$ are the coefficients of the rotation matrix and $M = n/K$, where $n$ is the number of input factors and $K$ is the number of subsets. The confidence for each class of a given test sample is obtained using the average combination method expressed as below [64]:
$$\eta_j(X) = \frac{1}{N} \sum_{i=1}^{N} c_{ij}\left(x R_i^{a}\right), \quad j = 1, \ldots, d,$$
where $\eta_j(X)$ is the confidence for output class $j$ (the class with the largest confidence is assigned), $c_{ij}(x R_i^{a})$ is the probability assigned by base classifier $i$ that the sample belongs to class $j$, $R_i^{a}$ is the rearranged rotation matrix, $N$ is the number of base classifiers and $d$ is the number of output classes. The flowchart of this study is shown in Figure 3.
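Two pieces of the procedure above lend themselves to a small illustration: assembling the block-diagonal rotation matrix from per-subset coefficient blocks, and averaging the class probabilities of the base classifiers. The sketch below shows only these two pieces. It does not derive the coefficients (which rotation forest obtains by transforming each feature subset), and the coefficient blocks and probabilities are hypothetical.

```python
def block_diagonal(blocks):
    """Assemble a block-diagonal matrix from a list of coefficient blocks.

    Each block is a list of rows; entries outside the blocks are zero.
    """
    total_columns = sum(len(block[0]) for block in blocks)
    matrix = []
    column_offset = 0
    for block in blocks:
        width = len(block[0])
        for row in block:
            padding_right = total_columns - column_offset - width
            matrix.append([0.0] * column_offset + list(row) + [0.0] * padding_right)
        column_offset += width
    return matrix

def rotate(sample, rotation):
    """Multiply a row vector (one sample) by the rotation matrix."""
    return [sum(sample[r] * rotation[r][c] for r in range(len(sample)))
            for c in range(len(rotation[0]))]

def average_confidence(class_probabilities):
    """Average the per-class probabilities over all base classifiers."""
    n_classifiers = len(class_probabilities)
    n_classes = len(class_probabilities[0])
    return [sum(p[j] for p in class_probabilities) / n_classifiers
            for j in range(n_classes)]

# Hypothetical coefficient blocks for K = 2 feature subsets (sizes 2 and 1).
rotation_matrix = block_diagonal([[[1.0, 2.0], [3.0, 4.0]], [[5.0]]])
rotated_sample = rotate([1.0, 1.0, 1.0], rotation_matrix)

# Hypothetical outputs of N = 2 base classifiers for classes (gully, non-gully).
confidences = average_confidence([[0.8, 0.2], [0.6, 0.4]])
predicted_class = confidences.index(max(confidences))
```

Each base classifier in the ensemble would use its own rotation matrix, so the averaging step combines predictions made in differently rotated feature spaces.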
4.6. Factor Selection Using Information Gain Ratio (IGR)
Selecting the most important factors in the modeling process has a decisive influence on the results. In this stage, factors that introduce noise or cause over-fitting are detected and eliminated from the final modeling process to achieve an accurate model [36,48]. Several factor selection techniques are available in the literature, including Relief, Least Square Support Vector Machine (LSSVM), Fuzzy-Rough Sets (FRS), Information Gain, and Information Gain Ratio (IGR) [105]. Among these, the IGR technique [106] was used with the training dataset to select the most significant factors for gully erosion modeling. In this method, IGR uses entropy (En) to assign each factor a weight, termed the “average merit” (AM), and the factors are ranked by it. The higher the AM value, the more important the conditioning factor. The AM is the average information gain ratio over 10-fold cross-validation and ranges between 0 and 1 [107]. Consider T as a training dataset with n input samples and class labels $G_i$ (gully erosion, non-gully erosion). The IGR for a given conditioning factor such as slope angle (SA) is computed as follows [108]:
$$IGR(T, SA) = \frac{En(T) - En(T, SA)}{SplitEn(T, SA)},$$
$$En(T) = -\sum_{i=1}^{2} \frac{n(G_i, T)}{|T|} \log_2 \frac{n(G_i, T)}{|T|},$$
$$En(T, SA) = \sum_{j=1}^{m} \frac{|T_j|}{|T|} En(T_j),$$
$$SplitEn(T, SA) = -\sum_{j=1}^{m} \frac{|T_j|}{|T|} \log_2 \frac{|T_j|}{|T|},$$
where $n(G_i, T)$ is the number of samples in T with class label $G_i$, and $T_1, \ldots, T_m$ are the subsets of T obtained by splitting T on the m classes of SA.
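The information gain ratio of a single classified factor can be computed as in the sketch below, which follows the entropy, conditional entropy and split entropy terms described above. The tiny datasets are hypothetical, and the 10-fold averaging that produces the average merit is omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain_ratio(factor_classes, labels):
    """IGR of one conditioning factor given its class per sample and the labels."""
    total = len(labels)
    subsets = {}
    for factor_class, label in zip(factor_classes, labels):
        subsets.setdefault(factor_class, []).append(label)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    split = -sum(len(s) / total * math.log2(len(s) / total) for s in subsets.values())
    if split == 0:
        return 0.0  # the factor has a single class and carries no information
    return (entropy(labels) - conditional) / split

# Hypothetical labels (1 = gully, 0 = non-gully) and two candidate factors.
toy_targets = [1, 1, 0, 0]
informative_factor = ["near", "near", "far", "far"]      # separates the classes
uninformative_factor = ["north", "south", "north", "south"]

igr_informative = information_gain_ratio(informative_factor, toy_targets)
igr_uninformative = information_gain_ratio(uninformative_factor, toy_targets)
```

A factor that separates gully from non-gully samples perfectly reaches the maximum ratio, while a factor unrelated to the labels scores zero and would be a candidate for removal.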
4.7. Development of Gully Erosion Maps
To construct the gully erosion maps, each machine learning algorithm was applied according to its probability distribution function (PDF). Then, the gully erosion susceptibility index (GESI) was computed for every pixel of the study area. These values were converted to raster format using the “point to raster” tool in ArcGIS 10.2, and all gully erosion maps were prepared. These maps were then classified into five zones, namely very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS) and very high susceptibility (VHS), using different classification methods: equal interval, natural breaks, quantile and geometrical interval. In order to select the best classification method, the proportion of all cells of the watershed and the proportion of all observed gullies falling in each susceptibility class were calculated for each classification method and each developed model.
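Two of the classification methods named above, equal interval and quantile, are simple enough to sketch. The code below is an assumed stand-in for the ArcGIS classification tools, the index values are hypothetical, and the abbreviation MS for the moderate class is introduced here only for labeling.

```python
import bisect

ZONES = ("VLS", "LS", "MS", "HS", "VHS")  # MS is an assumed label for moderate

def equal_interval_breaks(values, n_classes=5):
    """Class boundaries that divide the value range into equal-width intervals."""
    low, high = min(values), max(values)
    step = (high - low) / n_classes
    return [low + step * i for i in range(1, n_classes)]

def quantile_breaks(values, n_classes=5):
    """Class boundaries that put roughly the same number of cells in each class."""
    ordered = sorted(values)
    count = len(ordered)
    return [ordered[min(count - 1, (count * i) // n_classes)]
            for i in range(1, n_classes)]

def classify(value, breaks, zones=ZONES):
    """Return the zone of a value; a value equal to a break falls in the lower zone."""
    return zones[bisect.bisect_left(breaks, value)]

# Hypothetical susceptibility index values between 0 and 1.
index_values = [i / 100 for i in range(101)]
interval_breaks = equal_interval_breaks(index_values)
rank_breaks = quantile_breaks(index_values)
zone_low = classify(0.05, interval_breaks)
zone_mid = classify(0.50, interval_breaks)
zone_high = classify(0.95, interval_breaks)
```

For uniformly distributed values the two methods give nearly identical breaks; they diverge when the index values are skewed, which is why the resulting class proportions are compared before a method is chosen.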
4.8. Evaluation and Comparison Methods
4.8.1. Statistical Index-Based Measures
In this study, four statistical measures, namely sensitivity (SST), specificity (SPF), accuracy (ACC) and root mean square error (RMSE), were used to evaluate the newly proposed model and the other soft computing benchmark models. SST, SPF and ACC were computed from the four possible outcomes: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) [109,110,111]. TP is the number of gully cells correctly classified as gully, and FP is the number of non-gully cells incorrectly classified as gully. TN is the number of non-gully cells correctly classified as non-gully, and FN is the number of gully cells incorrectly classified as non-gully. Accordingly, SST is the proportion of gully cells that are correctly classified as gully, SPF is the proportion of non-gully cells that are correctly classified as non-gully, and ACC is the proportion of all gully and non-gully cells that are correctly classified. RMSE measures the difference between the observed and estimated data. A gully model performs better when its SST, SPF and ACC values are high and its RMSE value is low. These statistical measures are calculated as follows:
$$SST = \frac{TP}{TP + FN},$$
$$SPF = \frac{TN}{TN + FP},$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN},$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_P - X_A\right)^2},$$
where $X_P$ and $X_A$ are the predicted and actual (output) values in the training dataset or the validation dataset from the gully susceptibility models, and $n$ is the total number of samples in the training dataset or the validation dataset.
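The four measures can be computed from actual labels and predicted probabilities as in the sketch below. The 0.5 threshold used to turn probabilities into classes is an assumption, and the sample values are hypothetical.

```python
import math

def evaluate(actual, predicted_probability, threshold=0.5):
    """Return sensitivity, specificity, accuracy and RMSE.

    actual: list of 1 (gully) and 0 (non-gully).
    predicted_probability: list of model outputs in [0, 1].
    threshold: assumed cut-off for assigning the gully class.
    """
    predicted = [1 if p >= threshold else 0 for p in predicted_probability]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted_probability)]
    return {
        "SST": tp / (tp + fn),
        "SPF": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "RMSE": math.sqrt(sum(squared_errors) / len(squared_errors)),
    }

# Hypothetical validation samples.
observed = [1, 1, 1, 0, 0, 0]
model_output = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1]
scores = evaluate(observed, model_output)
```

In this toy case two of the three gully samples and two of the three non-gully samples are classified correctly, so sensitivity, specificity and accuracy all equal 2/3.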
4.8.2. Receiver Operating Characteristic (ROC)
The Receiver Operating Characteristic (ROC) is a standard tool for evaluation the performance of the models that it is plotted based on the sensitivity and 100-specificity on the x- and y-axis, respectively [108]. The area under the ROC curve, AUC, generally has been used to evaluate model performance. The AUC for an ideal and inaccurate model have the values of 1 and 0.5, respectively [112]. The AUC is calculated as follows:
$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N},$$
where P and N are the total number of gullies and non-gullies, respectively.
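In practice, the area under an empirical ROC curve is commonly obtained by sweeping a threshold over the predicted scores and integrating with the trapezoidal rule. The sketch below illustrates that generic procedure; it is an assumption about a typical workflow, not necessarily the computation used by the software in this study, and the scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """Return (ROC points, AUC) for binary labels (1 = gully, 0 = non-gully)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process all cells sharing the same score as one threshold step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    # Trapezoidal integration of sensitivity over (1 - specificity).
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc

# Hypothetical susceptibility scores and observed labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points, auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.889
```

Handling tied scores as a single step keeps the curve independent of the order in which equal scores happen to be sorted.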
4.8.3. Friedman and Wilcoxon Signed-Rank Tests
In addition to the above-mentioned measures, two non-parametric statistical tests, the Friedman and Wilcoxon signed-rank tests, were used to further evaluate the efficiency of the new proposed gully model. These tests compare the performance of two or more gully models. The null hypothesis states that there is no difference between the performance of the gully models at the significance level of α = 0.05. Whether the null hypothesis is rejected is judged from the probability value (p-value): when the p-value is less than 0.05, the null hypothesis is rejected, indicating a significant difference between the models, and vice versa [108]. The Friedman test evaluates the performance of the models jointly, without pairwise comparison [113]. Consequently, a p-value below 0.05 shows that at least one model differs from the others but not which one, so the test alone is not a reliable basis for comparing individual pairs [48]. The Wilcoxon signed-rank test is therefore used to check the statistical significance of systematic pairwise differences between the gully models. Its results are judged from the p-value and z-value: if the p-value is less than 0.05 and the z-value lies outside the critical range (−1.96 to +1.96), the null hypothesis is rejected and the performance of the two susceptibility models is significantly different [45,108].
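To make the pairwise decision rule concrete, the following Python sketch computes the Wilcoxon signed-rank z-value and a two-sided p-value using the large-sample normal approximation. It is a simplified illustration under stated assumptions (zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied); the function names and the input values are hypothetical and not taken from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(a, b):
    """Return (z, two-sided p) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Hypothetical susceptibility indices of two models on the same ten cells.
model_a = [float(i) for i in range(1, 11)]
model_b = [0.0] * 10
z, p = wilcoxon_signed_rank(model_a, model_b)
significant = p < 0.05 and abs(z) > 1.96
print(round(z, 3), significant)
```

The sign of z depends only on the order in which the two models are passed, so swapping them flips the sign without changing the conclusion.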
4.8.4. Gully Density
Gully density for a gully erosion susceptibility map is defined, for each susceptibility class, as the ratio of the number of gully erosion cells in that class to the total number of cells in that class. It was computed for the maps produced by all machine learning algorithms, and the results were then analyzed and assessed.
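The ratio can be sketched in a few lines of Python. The cell counts below are hypothetical and only illustrate the calculation; the function name is likewise an assumption.

```python
def gully_density(gully_cells, class_cells):
    """Gully density per class = gully cells in the class / all cells in the class."""
    return {name: gully_cells[name] / class_cells[name] for name in class_cells}

# Hypothetical cell counts per susceptibility class (not from the study).
class_cells = {"VLS": 40000, "LS": 30000, "MS": 15000, "HS": 10000, "VHS": 5000}
gully_cells = {"VLS": 80, "LS": 300, "MS": 600, "HS": 1000, "VHS": 1000}

density = gully_density(gully_cells, class_cells)
for name, value in density.items():
    print(name, round(value, 3))
```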
5. Results and Analysis
5.1. The Most Important Factors in Gully Modelling by IGR
The average merit (AM) of the factors affecting gully erosion, obtained by the IGR method, is shown in Figure 4. The factor selection results showed that 19 out of 22 conditioning factors had predictive capability for gully erosion modeling (AM > 0). Distance to river had the highest average merit (AM = 0.283) because most gullies in the study area are located beside the river network. It is followed by geomorphology (AM = 0.147), land use (AM = 0.134), HG (AM = 0.134), lithology (AM = 0.076), slope (AM = 0.053), STI (AM = 0.052), SPI (AM = 0.051), river density (AM = 0.046), rainfall (AM = 0.045), elevation (AM = 0.036), road density (AM = 0.032), TWI (AM = 0.03), permeability (AM = 0.026), aspect (AM = 0.024), distance to road (AM = 0.019), profile curvature (AM = 0.008) and flow accumulation (AM = 0.007).
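For readers unfamiliar with the measure, the sketch below shows the standard information gain ratio of one classified factor with respect to a binary gully/non-gully label, that is, the information gain divided by the entropy of the factor itself. It is a generic illustration on toy data; how the average merit was aggregated in this study (for example over cross-validation folds) is not reproduced here, and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(factor, labels):
    """Information gain of `factor` about `labels`, divided by the factor's entropy."""
    n = len(labels)
    groups = {}
    for x, y in zip(factor, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(factor)
    if split_info == 0:
        return 0.0  # a constant factor carries no information
    return (entropy(labels) - conditional) / split_info

# Toy data: factor class per cell and observed label (1 = gully, 0 = non-gully).
informative = gain_ratio([1, 1, 2, 2], [1, 1, 0, 0])
uninformative = gain_ratio([1, 1, 2, 2], [1, 0, 1, 0])
print(informative, uninformative)
```

A factor whose classes separate gully from non-gully cells perfectly scores 1, while a factor that is independent of the label scores 0, which mirrors the AM > 0 criterion used above.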
5.2. Gully Modeling Procedure or Optimization
In the modeling process, determining the optimum parameter values of each algorithm is critical for achieving the highest goodness-of-fit and performance. The optimum parameters of the investigated models are shown in Table 2. The new hybrid RF-ADTree model and the soft computing benchmark models (NBMU, SVM-Polynomial, SVM-RBF, and LR) were built using the 19 conditioning factors and the training dataset for the spatial prediction of gullies. The optimum number of seeds (from 1 to 10) and iterations (from 10 to 20) was determined by a trial-and-error procedure in which the AUC and RMSE of the training and validation datasets were evaluated for different numbers of seeds and iterations.
The results of optimum value selection for the number of seeds are shown in Figure 5a–d. The highest AUC value of the RF-ADTree model for the training and validation datasets (AUC = 0.906) was obtained with 5 seeds and 10 iterations (Figure 5a,b). Additionally, the lowest RMSE value (0.379) was obtained with the same settings of 5 seeds and 10 iterations (Figure 5c,d).
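The trial-and-error search over seeds and iterations can be sketched as an exhaustive loop. In the Python sketch below, `evaluate` is a hypothetical stand-in for training the model with a given seed and iteration count and returning its (AUC, RMSE); the toy scoring function and the rule of breaking AUC ties by lower RMSE are assumptions made for illustration only.

```python
def select_best(evaluate, seeds=range(1, 11), iterations=range(10, 21)):
    """Return the (seed, iterations) pair with the highest AUC, then the lowest RMSE."""
    best_key, best_params = None, None
    for seed in seeds:
        for n_iter in iterations:
            auc, rmse = evaluate(seed, n_iter)
            key = (auc, -rmse)  # higher AUC first, lower RMSE as tie-breaker
            if best_key is None or key > best_key:
                best_key, best_params = key, (seed, n_iter)
    return best_params

def toy_evaluate(seed, n_iter):
    """Hypothetical response surface peaking at seed 5 and 10 iterations."""
    auc = 0.9 - 0.01 * abs(seed - 5) - 0.001 * (n_iter - 10)
    rmse = 0.38 + 0.001 * abs(seed - 5)
    return auc, rmse

best_seed, best_iter = select_best(toy_evaluate)
print(best_seed, best_iter)  # 5 10
```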
The statistical performance of the models on the training dataset is shown in Table 3. These results indicate that all of the models performed well for gully erosion in the training stage. In terms of sensitivity, the new proposed model, RF-ADTree, has the highest value (0.877), indicating that 87.7% of gully erosion cells are correctly classified as gully erosion. It is followed by the SVM-Polynomial kernel (0.875), SVM-RBF kernel (0.858), ADTree (0.842), LR (0.839) and NBMU (0.786) models. Similarly, the new proposed model showed the highest specificity (0.804), followed by the SVM-RBF kernel (0.793), ADTree (0.771), LR (0.764), SVM-Polynomial kernel (0.762) and NBMU (0.747) models. The RF-ADTree model also has the highest accuracy (0.837), followed by the SVM-RBF kernel (0.822), SVM-Polynomial kernel (0.809), ADTree (0.802), LR (0.797) and NBMU (0.765) models. Additionally, the RF-ADTree hybrid model obtained the lowest RMSE (0.373) on the training dataset, followed by the SVM-RBF kernel (0.375), LR (0.376), SVM-Polynomial kernel (0.378), ADTree (0.379) and NBMU (0.398) models. Moreover, the RF-ADTree model has the highest AUC (0.909), followed by the SVM-RBF kernel (0.895), ADTree (0.885), LR (0.876), SVM-Polynomial kernel (0.871) and NBMU (0.844) models.
The performance of the gully erosion models was also analyzed using the validation dataset (Table 4). The results showed that all models performed well in predicting gully erosion. As in the training stage, the RF-ADTree model has the highest predictive capability (sensitivity = 0.859; specificity = 0.795; accuracy = 0.824; RMSE = 0.378 and AUC = 0.906) and the NBMU model has the lowest performance (sensitivity = 0.756; specificity = 0.739; accuracy = 0.747; RMSE = 0.403 and AUC = 0.843). The other values of the statistical indices of model performance are shown in Table 4. Overall, the RF-ADTree model has the best performance for the spatial prediction of gullies on both the training and validation datasets. In other words, the RF model can improve the performance of ADTree as a base classifier for the spatial prediction of gully erosion by detecting and eliminating the weaknesses of ADTree.
5.3. Development of Gully Erosion Maps
As mentioned above, the GESI of each cell was converted into raster format, and the resulting gully erosion susceptibility maps were then classified. In general, the histograms of all models for the different classification methods indicated that the majority of the observed gullies are located in the VHS class (Figure 6). According to the susceptibility map of the RF-ADTree model, the very high susceptibility class determined by the equal interval, natural breaks, quantile and geometrical interval methods covers 26.9%, 26.4%, 20.2% and 19.5% of the watershed cells and 71.4%, 70.6%, 56.1% and 53.4% of the observed gully cells, respectively. Therefore, the equal interval method was selected as the most appropriate classification method for the RF-ADTree model. Likewise, the geometrical interval method was selected for the SVM-Polynomial kernel and SVM-RBF kernel susceptibility maps, and the natural breaks method for the LR, NBMU and ADTree susceptibility maps. The gully erosion susceptibility maps generated by the developed models are shown in Figure 7.
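Two of the four classification methods are simple enough to sketch directly. The Python example below derives class breaks for the equal interval and quantile methods and measures the share of observed gully cells that fall in the top class; natural breaks and geometrical interval are not shown. All names and data are illustrative assumptions, and the convention that a value equal to a break belongs to the lower class is a choice made for this sketch.

```python
import bisect

def equal_interval_breaks(values, k=5):
    """k - 1 breaks that split the value range into k equally wide classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def quantile_breaks(values, k=5):
    """k - 1 breaks that split the sorted values into k classes of about equal count."""
    s = sorted(values)
    n = len(s)
    # Upper value of each of the first k - 1 equal-count groups.
    return [s[(n * i + k - 1) // k - 1] for i in range(1, k)]

def classify(value, breaks):
    """Class index from 0 (very low) to len(breaks) (very high)."""
    return bisect.bisect_left(breaks, value)

def share_in_top_class(gully_values, breaks):
    """Fraction of observed gully cells whose index falls in the highest class."""
    top = len(breaks)
    hits = sum(1 for v in gully_values if classify(v, breaks) == top)
    return hits / len(gully_values)

# Hypothetical susceptibility indices for ten cells; four of them are observed gullies.
gesi = list(range(1, 11))
gullies = [9, 10, 3, 8]
for name, breaks in (("equal interval", equal_interval_breaks(gesi)),
                     ("quantile", quantile_breaks(gesi))):
    print(name, breaks, share_in_top_class(gullies, breaks))
```

Comparing the share of gully cells captured by the top class against the share of the watershed that class occupies is the same kind of comparison reported above for the RF-ADTree map.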
5.4. The Contribution of the Six Most Important Factors Using GESMs
In this study, we overlaid the six most important factors obtained by the IGR technique with the gully erosion susceptibility maps developed by the models. The results are shown in Figure 8. The first class of the distance to river factor (<20 m) occupied the most cells of the VHS class of the gully erosion susceptibility map prepared by the ADTree (35.56%) model, followed by the RF-ADTree (37.39%), the NBMU (36.31%), the SVM-RBF kernel (36.26%), the SVM-Polynomial kernel (36.19%) and the LR (35.84%) models. This implies that the areas closest to the rivers had the highest potential for gully erosion occurrence. Additionally, the third class of geomorphology (fluvial sediment) occupied the most cells of the VHS class in the LR (49.45%) model, followed by the SVM-RBF kernel (43.97%), the ADTree (43.25%), the RF-ADTree (42.11%), the NBMU (41.89%) and the SVM-Polynomial kernel (23.46%) models. This indicates that the fluvial sediments were more prone to gully occurrence than the other geomorphologic classes. In terms of land use, dry-farming and cultivated lands covered the most cells of the VHS class in the LR (81.21%) model, while the lowest value was obtained for the NBMU (74.86%) model. The ADTree, RF-ADTree, SVM-Polynomial kernel and SVM-RBF kernel models had values of 77.05%, 75.52%, 78.77% and 75.47%, respectively. These results indicate that land use change in the study area was one of the principal reasons for gully erosion, since most of the very high susceptibility class of the gully susceptibility maps occurred on this land use unit.
Among the four classes of soil hydrological groups (SHG), type D was the most effective for gully erosion incidence. Overlaying the VHS class of the susceptibility maps with the SHG showed that the largest share of VHS cells in type D was obtained for the LR model (77.82%), followed by the SVM-Polynomial kernel (73.89%), the ADTree (73.24%), the RF-ADTree (73.06%), the SVM-RBF kernel (72.45%) and the NBMU (72.06%) models. In terms of lithology (Plm), the ADTree (31.35%) and the RF-ADTree (30.81%) models assigned the most cells of the VHS class, followed by the NBMU (30.29%), the SVM-RBF kernel (28.63%), the LR (27.59%) and the SVM-Polynomial kernel (26.72%) models. This also implies that the Plm lithological unit was more responsible for gully erosion in the study area than the other units. In the case of slope angle (10–15°), the NBMU model (30.36%) had the highest share of the VHS class of the gully susceptibility map. It was followed by the ADTree (30.36%), the SVM-RBF kernel (29.20%), the RF-ADTree (28.20%), the LR (27.12%) and the SVM-Polynomial kernel (26.26%) models. Overall, the findings indicate that the first class of distance to river, fluvial sediment, dry-farming and cultivated land, soil hydrological group D, the Plm lithological unit and slopes between 10° and 15° deserve the most attention in management and prevention practices for land allocation in the study area.
5.5. Evaluation and Comparison of Gully Erosion Maps
The performance of the new ensemble RF-ADTree model in predicting gully erosion susceptibility was compared with that of the SVM-Polynomial kernel, SVM-RBF kernel, LR, NBMU and ADTree benchmark models using the ROC curve, the gully density method, and the Friedman and Wilcoxon signed-rank tests. The model accuracy was evaluated using the area under the ROC curve (AUC) for both the training and validation datasets. In the training stage, the ensemble RF-ADTree model had the highest value (AUC = 0.961), followed by the SVM-RBF kernel (AUC = 0.953), the LR (AUC = 0.952), the SVM-Polynomial kernel (AUC = 0.949), the ADTree (AUC = 0.935) and the NBMU model (AUC = 0.901) (Figure 9a).
Additionally, in the validation stage, the RF-ADTree model achieved excellent predictive performance, with an AUC of 0.913 indicating an accuracy of 91.3%. It is followed by the SVM-Polynomial kernel (AUC = 0.879), the LR (AUC = 0.875), the SVM-RBF kernel (AUC = 0.867), the ADTree (AUC = 0.861) and the NBMU model (AUC = 0.811) (Figure 9b). These results indicate that, like the RF-ADTree model, the other models had an acceptable accuracy in both the training and validation stages.
The gully erosion density (GED) is another index for evaluating the reliability of gully erosion susceptibility maps. For a perfect gully erosion susceptibility map, the gully erosion density increases from the very low to the very high susceptibility class. For the RF-ADTree model, the GED values were 0.003, 0.011, 0.046, 0.105 and 0.212 for the VLS, LS, MS, HS and VHS gully erosion classes, respectively. Thus, it can be concluded that the RF-ADTree model generated an ideal gully erosion susceptibility map. For the SVM-Polynomial kernel model, the VHS class showed the highest GED (0.211), followed by the HS (0.082), MS (0.060), LS (0.021) and VLS (0.005) susceptibility classes. In the case of the SVM-RBF kernel, the GED values were 0.003, 0.009, 0.048, 0.098 and 0.163 for the VLS, LS, MS, HS and VHS classes, respectively. For the LR model, these values were 0.003, 0.011, 0.046, 0.105 and 0.212, respectively. In the case of the ADTree model, the gully densities for the VLS, LS, MS, HS and VHS classes were 0.002, 0.011, 0.043, 0.082 and 0.180, respectively. For the NBMU model, values of 0.006, 0.029, 0.084, 0.123 and 0.133 were obtained for the VLS, LS, MS, HS and VHS classes, respectively. Overall, all of these models showed an increasing trend in GED from the VLS to the VHS class.
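The increasing trend can be verified mechanically. The Python sketch below checks the GED values quoted in this paragraph, assuming each set of five values is ordered from the very low to the very high susceptibility class; the helper name is an assumption.

```python
def is_strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# GED values reported in the text, assumed to be ordered VLS, LS, MS, HS, VHS.
ged = {
    "RF-ADTree": [0.003, 0.011, 0.046, 0.105, 0.212],
    "SVM-Polynomial": [0.005, 0.021, 0.060, 0.082, 0.211],
    "SVM-RBF": [0.003, 0.009, 0.048, 0.098, 0.163],
    "ADTree": [0.002, 0.011, 0.043, 0.082, 0.180],
    "NBMU": [0.006, 0.029, 0.084, 0.123, 0.133],
}

for model, values in ged.items():
    contrast = values[-1] / values[0]  # VHS density relative to VLS density
    print(model, is_strictly_increasing(values), round(contrast, 1))
```

The ratio of the VHS to the VLS density gives a rough sense of how sharply each map separates the most and least susceptible cells; it is an illustrative summary, not a measure used in the study.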
5.6. Statistical Tests
The Friedman and Wilcoxon signed-rank tests were applied to evaluate whether the predictions of the gully erosion susceptibility models differ significantly. Based on the Friedman test, the average rankings in the study area were 4.80, 4.65, 3.49, 3.06, 2.71 and 2.29 for the RF-ADTree, ADTree, NBMU, LR, SVM-RBF kernel and SVM-Polynomial kernel models, respectively. Additionally, the chi-square statistic was 2040, which is significant at the 0.01 level and indicates a significant difference between the models (Table 5). Since the Friedman test cannot identify which models differ when a significant difference exists, the Wilcoxon signed-rank test was used for pairwise comparison of the models. According to this test, for all pairs of gully erosion susceptibility models the p-values were less than 5% and the z-values lay outside the critical range (−1.96 to +1.96), except for the ADTree and RF-ADTree pair (p-value = 0.538 and z-value = −0.616). Among the pairwise comparisons with a significant difference, the LR and NBMU models differed significantly at the 5% level, while the other pairs differed significantly at the 1% level (Table 6). Accordingly, it can be concluded that the efficiency of all gully erosion susceptibility models differed statistically from that of the others, except for the ADTree and RF-ADTree models, which had a similar efficiency.
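The mean ranks and chi-square statistic of the Friedman test can be sketched as follows. This is the textbook form of the statistic without a tie correction, applied to made-up data; the function names are assumptions, and statistical software may report a tie-corrected value that differs slightly.

```python
def average_ranks(values):
    """Rank values from 1..k within one row, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in range(i, j + 1):
            ranks[order[idx]] = avg
        i = j + 1
    return ranks

def friedman(scores):
    """scores[i][j] = output of model j on sample i. Return (mean ranks, chi-square)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return mean_ranks, chi2

# Hypothetical outputs of three models on four cells (not from the study).
scores = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [0.3, 0.6, 0.7], [0.1, 0.2, 0.3]]
mean_ranks, chi2 = friedman(scores)
print(mean_ranks, chi2)  # [1.0, 2.0, 3.0] 8.0
```

Because each row's ranks sum to k(k + 1)/2, the mean ranks of k models always add up to the same total; the six mean ranks in Table 5 add up to 21.00, as expected for k = 6.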
6. Discussion
Since gully erosion is considered one of the main sources of sediment [6] and has various on-site and off-site effects [114], detecting the areas that are most prone to gully erosion is an important strategy for preventing land degradation and soil transport to rivers. In this study, the main streams of the watershed and their tributaries were recognized and mapped using a new, state-of-the-art ensemble algorithm, namely the RF-ADTree model. Although several conditioning factors can affect the development of gullies, selecting the most important ones through feature selection is essential for enhancing the performance of the modeling process [115]. Based on the IGR technique, 19 of the 22 conditioning factors in this study were found to be effective, with distance to river (the most important), geomorphology, land use, SHG, geology and slope angle being the six most significant factors. Indeed, water shear stress in areas where low-permeability lithology is more susceptible to erosion, mainly Quaternary deposits, is the main factor in the occurrence and development of gullies in the study area. Wijdenes et al. [116] reported that land use changes and lithology were responsible for the development of gullies in the Guadalentin catchment, southeast Spain. Moreover, Arabameri et al. [117] identified land use/land cover, lithology and distance to roads as the most important factors for gully occurrence in their study area. Rahmati et al. [22], using learning vector quantization (LVQ), found that distance to river, drainage density and land use are the most effective factors for the development of gullies. Most of the gullies in the study area occurred along the rivers, while other factors such as geomorphology and land use acted as triggering factors. Chaplot et al. [28] also reported that land use is a triggering factor for gully occurrence.
The modeling process and the evaluation of the gully susceptibility maps produced by the new proposed model and the soft computing benchmark models (NBMU, SVM-Polynomial kernel, SVM-RBF kernel, LR and ADTree) indicated that the RF meta classifier combined with the ADTree algorithm achieved the best goodness-of-fit and performance on the training and validation datasets. Nevertheless, the statistical measures indicated that all of these machine learning algorithms were successful in detecting the areas prone to gully erosion, with the new proposed RF-ADTree model being the most successful. A review of the literature showed no study applying RF as a meta classifier to gully erosion modeling; however, RF has been used more often for landslide events, as one of the forms of soil erosion. Those results indicated that it performed well: for example, Pham et al. [118] for RF-Naïve Bayes (RFNB), Chen et al. [45] for the RF-Naïve Bayes Tree (RFNBT) and Pham et al. [119] for the RF-based Functional Tree (RFFT) reported that these new and promising techniques were more powerful than, and outperformed, the other meta classifiers for landslide susceptibility modeling. Hong et al. [63] applied several meta classifiers to the J48 Decision Tree (JDT) as a base classifier and concluded that the new proposed RFJDT model had the highest performance in comparison to the other models. Our findings can be explained by the fact that the RF meta classifier uses feature extraction by principal component analysis (PCA) to optimize the learning of the base classifier on the training dataset. This feature of the RF ensemble classifier enhances the goodness-of-fit and predictive ability of the base classifier [64]. In other words, the RF model, as a robust algorithm, can efficiently reduce both the variance and the bias of the base classifier, which is ADTree in this study.
The results also showed that the ensemble models outperformed the individual base classifiers, in agreement with Jebur et al. [120], Bui et al. [121], Bui et al. [107] and Shirzadi et al. [112]. Gully erosion susceptibility maps were prepared with all machine learning models used in this study and then classified using four well-known classification methods: natural breaks, quantile, geometrical interval and equal interval [122]. The results indicated that three of these methods, namely natural breaks, geometrical interval and equal interval, gave a visually less logical prediction. In other words, these classifications underestimated the prediction, so that fewer gully locations fell in the high and very high susceptibility classes of gully erosion. In contrast, the histogram of gully distribution in this study revealed that the quantile method can be selected as the most appropriate method because it conforms better to the real ground conditions than the other methods. Additionally, the quantile method assigned more gully erosion locations to the high and very high susceptibility classes of the gully susceptibility maps for all machine learning algorithms. Some researchers, such as Umar et al. [123], have used the quantile classification method to divide natural hazard susceptibility indices, while Farncis et al. [122] used the natural breaks method and Pham et al. [124] used the geometrical interval method in their studies.
The gully susceptibility maps showed that the areas closest to the rivers had the highest susceptibility to gully erosion. Overlaying the results of the new proposed RF-ADTree model with the first six conditioning factors showed that, in terms of distance to river, the first class (<20 m) contained the largest share of the cells in the high (37.21%) and very high (37.39%) susceptibility classes. In terms of geomorphology, the fluvial sediment unit (Quaternary deposits) covered the largest share of the high (40.11%) and very high (42.11%) susceptibility classes of the gully erosion susceptibility map. In terms of land use, according to the RF-ADTree model, dry-farming and cultivated lands were the most susceptible to gully occurrence, occupying 68.45% and 75.52% of the high and very high susceptibility classes, respectively. Additionally, the high (67.22%) and very high (73.06%) susceptibility classes of the gully erosion map correspond to the soil hydrologic group (SHG) D unit. Soil hydrologic group D mainly comprises soils that have a very low permeability and infiltration rate when thoroughly wet, resulting in a high runoff potential [125]. Therefore, soil hydrologic group D provided conditions for greater gully occurrence and development over the study area. Geologic analysis indicated that among the 12 lithological units, the Plm unit had the highest susceptibility to gully erosion, containing the largest percentages of the high (33.83%) and very high (30.81%) susceptibility classes. The Plm unit is a low-permeability unit consisting of Pliocene marls including clay limestone, marl, sandstone, silty tuff, conglomerate, sandstone and travertine. This unit is mainly covered by hilly slopes in the study area, with an average elevation of 1800–2000 m. It generally has a low slope, and its color is often white to worm, pepper and sometimes red, dark gray and yellow.
Slope angle was another important factor: slope angles between 2° and 15° were the most significant for gully occurrence. However, slope angles between 15° and 20° covered the high (3.65) and the very high susceptibility classes in the study area. This slope class was mainly associated with soil hydrologic groups C and D, the Plm lithological unit, and the hilly mountain and fluvial sediment geomorphology classes under dry-farming and cultivated land use. In addition to the AUC, the validity of the gully erosion susceptibility map prepared by the new proposed model was checked statistically, and the results confirmed the applicability of this model and its susceptibility map for gully management purposes.
7. Conclusions
Gully erosion, as one of the hazards threatening soil, damages and destroys infrastructure such as check dams in the Klocheh Watershed, Kurdistan Province, Iran, as shown in Figure 1. Identification, prediction, prevention and management of gullies have therefore always been top priorities for soil scientists, natural resources authorities and land managers. An accurate spatial prediction of gully erosion locations is thus essential for conserving natural resources such as soil and reducing potential risks. For this purpose, we developed a new intelligence-based ensemble model named RF-ADTree, which successfully mapped the spatial prediction of gully erosion development in the Klocheh Watershed, Kurdistan Province, Iran. Additionally, we used five soft computing benchmark models to check the goodness-of-fit and prediction accuracy of the new proposed model. The validation results indicated that although all machine learning algorithms had high prediction accuracy, the new ensemble model generated the most accurate gully susceptibility map of the study area compared with the benchmark models. We recommend this model for gully modeling in other similar areas with caution, since other conditioning factors might be responsible for gully erosion elsewhere, whereas distance to river was the most important conditioning factor in the Klocheh watershed. The gully erosion map obtained from the new model can therefore help planners, decision makers and engineers to manage existing and future gullies more sustainably and reduce the associated damage and losses, and to manage the high and very high susceptibility zones through appropriate preventive measures and mitigation procedures.
Factor Group | No. | Factors | Classes | Classification Method |
---|---|---|---|---|
Topographic | 1 | Slope (°) | (1) 0–2; (2) 2–5; (3) 5–10; (4) 10–15; (5) 15–20; (6) >20 | Manual |
Topographic | 2 | Aspect | (1) Flat; (2) North; (3) Northeast; (4) East; (5) Southeast; (6) South; (7) Southwest; (8) West; (9) Northwest | Azimuth |
Topographic | 3 | Elevation (m) | (1) 1612–1700; (2) 1700–1800; (3) 1800–1900; (4) 1900–2000; (5) 2000–2100; (6) 2100–2200; (7) 2200–2300; (8) 2300–2400 | Manual |
Topographic | 4 | Plan curvature (m⁻¹) | (1) [(−5.67)–(−0.736)]; (2) [(−0.736)–(−0.188)]; (3) [(−0.188)–0.149]; (4) [0.149–0.697]; (5) [0.6974–5.08] | Natural break |
Topographic | 5 | Profile curvature (m⁻¹) | (1) [(−6.357)–(−0.972)]; (2) [(−0.972)–(−0.187)]; (3) [(−0.187)–0.317]; (4) [0.317–1.1]; (5) [1.1–7.94] | Natural break |
Topographic | 6 | STI | (1) 0–1.286; (2) 1.286–2.894; (3) 2.894–5.145; (4) 5.145–8.468; (5) 8.468–27.33 | Natural break |
Topographic | 7 | VD | (1) 0–48.231; (2) 48.231–108.520; (3) 108.520–176.340; (4) 176.340–254.720; (5) 254.720–384.340 | Natural break |
Hydrological | 8 | Rainfall (mm) | (1) 261–286; (2) 286–298; (3) 298–306; (4) 306–312; (5) 312–322 | Natural break |
Hydrological | 9 | SPI | (1) 0–112.4; (2) 112.4–224.8; (3) 224.8–401.5; (4) 401.5–722.7; (5) 722.7–4095 | Natural break |
Hydrological | 10 | TWI | (1) 1–3; (2) 3–4; (3) 4–5; (4) 5–6; (5) 6–9.059 | Natural break |
Hydrological | 11 | HG | (1) A; (2) B; (3) C; (4) D | HG type |
Hydrological | 12 | Flow accumulation | (1) 0–5; (2) 5–10; (3) 10–20; (4) 20–30; (5) >30 | Manual |
Hydrological | 13 | Permeability | (1) Low; (2) Moderate; (3) High | Permeability type |
Hydrological | 14 | Distance to river (m) | (1) 0–20; (2) 20–40; (3) 40–60; (4) 60–80; (5) >80 | Manual |
Hydrological | 15 | River density (km/km²) | (1) 0–2.775; (2) 2.775–4.810; (3) 4.810–6.598; (4) 6.598–8.694; (5) 8.694–15.72 | Natural break |
Lithological | 16 | Lithology | (1) JL; (2) JS; (3) Mm; (4) PLb; (5) Pcg; (6) Plm; (7) Plt; (8) Qal; (9) Qc; (10) Qtr; (11) Qt1; (12) Qt2 | Lithology type |
Lithological | 17 | Distance to fault (m) | (1) 0–100; (2) 100–200; (3) 200–500; (4) 500–1000; (5) >1000 | Manual |
Lithological | 18 | Fault density (km/km²) | (1) 0–0.287; (2) 0.287–0.823; (3) 0.823–1.270; (4) 1.270–1.820; (5) 1.820–2.440 | Natural break |
Land cover | 19 | Land use | (1) Wood land; (2) Dry-farming and cultivated lands; (3) Poor pastures; (4) Semi-dense pastures; (5) Destroyed pastures | Land use type |
Anthropogenic | 20 | Distance to road (m) | (1) 0–100; (2) 100–200; (3) 200–300; (4) 300–500; (5) >500 | Manual |
Anthropogenic | 21 | Road density (km/km²) | (1) 0–0.684; (2) 0.684–1.750; (3) 1.750–2.570; (4) 2.570–3.690; (5) 3.690–6.980 | Natural break |
Geomorphology | 22 | Geomorphology | (1) The valley plain unit; (2) Hilly unit; (3) Mountain unit; (4) New plain unit; (5) Old plain unit; (6) Fluvial sediment unit | Geomorphology type |
Model Name | Description of Parameters |
---|---|
RF-ADTree | Classifier: ADTree; MaxGroup: 3; MinGroup: 3; Number of iterations: 10; Number of Groups: False; Projection Filter: PCA; Removed Percentage: 50; Number of seeds: 5 |
ADTree | Number of Boosting Iterations: 10; Random Seed: 0; Save Instance Data: false; Search Path: Expand all Paths |
LR | Maximum Its: −1; Ridge: 1.0 × 10⁻⁸ |
SVM-PolyKernel | Build Logistic Models: True; C: 1; Check turned Off: False; Epsilon: 1.0 × 10⁻¹²; Filter Type: Not normalization/standardization; Kernel: PolyKernel; Number of folds: −1; Tolerance Parameter: 0.001 |
SVM-RBF | Build Logistic Models: True; C: 1; Check turned Off: False; Epsilon: 1.0 × 10⁻¹²; Filter Type: Not normalization/standardization; Kernel: RBF; Number of folds: −1; Tolerance Parameter: 0.001 |
NBMU | - |
Measures | NBMU | SVM-Polynomial | SVM-RBF | LR | ADTree | RF-ADTree |
---|---|---|---|---|---|---|
True positive | 466 | 461 | 494 | 470 | 476 | 501 |
True negative | 513 | 574 | 558 | 550 | 551 | 570 |
False positive | 174 | 179 | 146 | 170 | 164 | 139 |
False negative | 127 | 66 | 82 | 90 | 89 | 70 |
Sensitivity | 0.786 | 0.875 | 0.858 | 0.839 | 0.842 | 0.877 |
Specificity | 0.747 | 0.762 | 0.793 | 0.764 | 0.771 | 0.804 |
Accuracy | 0.765 | 0.809 | 0.822 | 0.797 | 0.802 | 0.837 |
RMSE | 0.398 | 0.378 | 0.375 | 0.376 | 0.379 | 0.373 |
AUC | 0.844 | 0.871 | 0.895 | 0.876 | 0.885 | 0.909 |
Measures | NBMU | SVM-Polynomial | SVM-RBF | LR | ADTree | RF-ADTree |
---|---|---|---|---|---|---|
True positive | 201 | 195 | 198 | 201 | 204 | 213 |
True negative | 210 | 244 | 227 | 236 | 240 | 240 |
False positive | 74 | 80 | 77 | 47 | 71 | 62 |
False negative | 65 | 31 | 48 | 39 | 35 | 35 |
Sensitivity | 0.756 | 0.863 | 0.805 | 0.838 | 0.854 | 0.859 |
Specificity | 0.739 | 0.753 | 0.747 | 0.834 | 0.772 | 0.795 |
Accuracy | 0.747 | 0.798 | 0.773 | 0.836 | 0.807 | 0.824 |
RMSE | 0.403 | 0.380 | 0.381 | 0.380 | 0.384 | 0.378 |
AUC | 0.843 | 0.863 | 0.873 | 0.869 | 0.882 | 0.906 |
No. | Gully Models | Mean Ranks | χ² | Sig. |
---|---|---|---|---|
1 | SVM-Polynomial | 2.29 | 2040 | 0.000 |
2 | SVM-RBF | 2.71 | | |
3 | LR | 3.06 | | |
4 | NBMU | 3.49 | | |
5 | ADTree | 4.65 | | |
6 | RF-ADTree | 4.80 | | |
No. | Pairwise Comparison | NPD | NND | z-value | p-value | Significance |
---|---|---|---|---|---|---|
1 | SVM-Polynomial vs. SVM-RBF | 303 | 540 | −9.755 | 0.000 | Yes |
2 | SVM-Polynomial vs. LR | 245 | 700 | −13.424 | 0.000 | Yes |
3 | SVM-Polynomial vs. NBMU | 349 | 905 | −9.343 | 0.000 | Yes |
4 | SVM-Polynomial vs. ADTree | 196 | 1057 | −23.838 | 0.000 | Yes |
5 | SVM-Polynomial vs. RF-ADTree | 129 | 1126 | −26.125 | 0.000 | Yes |
6 | SVM-RBF vs. LR | 325 | 568 | −4.621 | 0.000 | Yes |
7 | SVM-RBF vs. NBMU | 434 | 813 | −3.536 | 0.000 | Yes |
8 | SVM-RBF vs. ADTree | 234 | 1009 | −21.050 | 0.000 | Yes |
9 | SVM-RBF vs. RF-ADTree | 194 | 1049 | −23.189 | 0.000 | Yes |
10 | LR vs. NBMU | 448 | 780 | −2.020 | 0.043 | Yes |
11 | LR vs. ADTree | 273 | 978 | −19.344 | 0.000 | Yes |
12 | LR vs. RF-ADTree | 222 | 1019 | −21.772 | 0.000 | Yes |
13 | NBMU vs. ADTree | 291 | 916 | −19.038 | 0.000 | Yes |
14 | NBMU vs. RF-ADTree | 249 | 919 | −19.714 | 0.000 | Yes |
15 | ADTree vs. RF-ADTree | 578 | 591 | −0.616 | 0.538 | No |
NPD: Number of positive differences; NND: Number of negative differences.
Author Contributions
D.T.B., A.S., H.S., K.C., E.O., B.T.P., D.T.A., H.K., B.P., M.P., B.B.A., H.R., G.G. and S.L. contributed equally to the work. A.S., H.S., K.C., E.O. and D.T.A. collected field data and conducted the gully erosion mapping and analysis. D.T.B., A.S., H.S., K.C., E.O., D.T.A. and M.P. wrote the manuscript. D.T.B., B.T.P., H.K., B.P., B.B.A., H.R., G.G. and S.L. provided critical comments in planning this paper and edited the manuscript. All the authors discussed the results and edited the manuscript.
Funding
This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) funded by the Ministry of Science and ICT, and by Universiti Teknologi Malaysia (UTM) based on Research University Grant (Q.J130000.2527.17H84).
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Lal, R. Offsetting global CO2 emissions by restoration of degraded soils and intensification of world agriculture and forestry. Land Degrad. Dev. 2003, 14, 309–322.
2. Ayele, G.K.; Gessess, A.A.; Addisie, M.B.; Tilahun, S.A.; Tebebu, T.Y.; Tenessa, D.B.; Langendoen, E.J.; Nicholson, C.F.; Steenhuis, T.S. A biophysical and economic assessment of a community-based rehabilitated gully in the Ethiopian highlands. Land Degrad. Dev. 2016, 27, 270–280.
3. Kosmas, C.; Danalatos, N.; Cammeraat, L.H.; Chabart, M.; Diamantopoulos, J.; Farand, R.; Gutierrez, L.; Jacob, A.; Marques, H.; Martinez-Fernandez, J. The effect of land use on runoff and soil erosion rates under Mediterranean conditions. Catena 1997, 29, 45–59.
4. Bryan, R.B. Soil erodibility and processes of water erosion on hillslope. Geomorphology 2000, 32, 385–415.
5. Poesen, J.; Nachtergaele, J.; Verstraeten, G.; Valentin, C. Gully erosion and environmental change: Importance and research needs. Catena 2003, 50, 91–133.
6. Valentin, C.; Poesen, J.; Li, Y. Gully erosion: Impacts, factors and control. Catena 2005, 63, 132–153.
7. Poesen, J.; Vandekerckhove, L.; Nachtergaele, J.; Oostwoud Wijdenes, D.; Verstraeten, G.; van Wesemael, B. Gully erosion in dryland environments. In Dryland Rivers: Hydrology and Geomorphology of Semi-Arid Channels; Bull, L.J., Kirkby, M.J., Eds.; Wiley: Chichester, UK, 2002; pp. 229–262.
8. Poesen, J.; Vandaele, K.; Van Wesemael, B. Contribution of gully erosion to sediment production on cultivated lands and rangelands. IAHS Publ. Ser. Proc. Rep. Int. Assoc. Hydrol. Sci. 1996, 236, 251–266.
9. Soil Science Society of America. Glossary of Soil Science Terms; Soil Science Society of America: Madison, WI, USA, 2001.
10. Kociuba, W.; Janicki, G.; Rodzik, J.; Stępniewski, K. Comparison of volumetric and remote sensing methods (TLS) for assessing the development of a permanent forested loess gully. Nat. Hazards 2015, 79, 139–158.
11. Gawrysiak, L.; Harasimiuk, M. Spatial Diversity of Gully Density of the Lublin Upland and Roztocze Hills (SE Poland); Annales Universitatis Mariae Curie-Sklodowska; De Gruyter Open Sp. z oo: Lublin, Poland, 2012; p. 27.
12. Janicki, G.; Rejman, J.; Zgłobicki, W.; Poesen, J.; Starkel, L.; Agnesi, V.; Angileri, S.; Cappadonia, C.; Conoscenti, C.; Rotigliano, E. Human impact on gully erosion. Landf. Anal. 2011, 17, 1–229.
13. Poesen, J. Gully typology and gully control measures in the European loess belt. In Farm Land Erosion; Elsevier: Amsterdam, The Netherlands, 1993; pp. 221–239.
14. Brice, J.C. Erosion and Deposition in the Loess-Mantled Great Plains, Medicine Creek Drainage Basin, Nebraska; US Government Printing Office: Washington, DC, USA, 1966.
15. Poesen, J.; Govers, G. Gully Erosion in the Loam Belt of Belgium: Typology and Control Measures; In Soil Erosion on Agricultural Land, Proceedings of a workshop sponsored by the British Geomorphological Research Group, Coventry, UK, January 1989; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 1990; pp. 513–530.
16. Mazaeva, O.; Pelinen, V.; Janicki, G. Development of Bank Gullies on the Shore Zone of the Bratsk Reservoir (Russia). In Annales Universitatis Mariae Curie-Sklodowska; De Gruyter Open Sp. z oo: Lublin, Poland, 2014; p. 117.
17. Najafi. Land and agricultural lands in Iran. Mon. Dehati Mag. 2005, 24, 17–24. (In Persian)
18. Pulley, S.; Ellery, W.N.; Lagesse, J.V.; Schlegel, P.K.; McNamara, S.J. Gully erosion as a mechanism for wetland formation: An examination of two contrasting landscapes. Land Degrad. Dev. 2018, 29, 1756–1767.
19. Zakerinejad, R.; Märker, M. Prediction of gully erosion susceptibilities using detailed terrain analysis and maximum entropy modeling: A case study in the Mazayejan plain, southwest Iran. Geogr. Fis. Din. Quat. 2014, 37, 67–76.
20. Svoray, T.; Michailov, E.; Cohen, A.; Rokah, L.; Sturm, A. Predicting gully initiation: Comparing data mining techniques, analytical hierarchy processes and the topographic threshold. Earth Surf. Process. Landf. 2012, 37, 607–619.
21. Conforti, M.; Aucelli, P.P.; Robustelli, G.; Scarciglia, F. Geomorphology and GIS analysis for mapping gully erosion susceptibility in the Turbolo stream catchment (northern Calabria, Italy). Nat. Hazards 2011, 56, 881–898.
22. Rahmati, O.; Haghizadeh, A.; Pourghasemi, H.R.; Noormohamadi, F. Gully erosion susceptibility mapping: The role of GIS-based bivariate statistical models and their comparison. Nat. Hazards 2016, 82, 1231–1258.
23. Azareh, A.; Rahmati, O.; Rafiei-Sardooi, E.; Sankey, J.B.; Lee, S.; Shahabi, H.; Ahmad, B.B. Modelling gully-erosion susceptibility in a semi-arid region, Iran: Investigation of applicability of certainty factor and maximum entropy models. Sci. Total Environ. 2018, 655, 684–696.
24. Dube, F.; Nhapi, I.; Murwira, A.; Gumindoga, W.; Goldin, J.; Mashauri, D. Potential of weight of evidence modelling for gully erosion hazard assessment in Mbire District–Zimbabwe. Phys. Chem. Earth Parts A/B/C 2014, 67, 145–152.
25. Al-Abadi, A.M.; Al-Ali, A.K. Susceptibility mapping of gully erosion using GIS-based statistical bivariate models: A case study from Ali Al-Gharbi district, Maysan governorate, southern Iraq. Environ. Earth Sci. 2018, 77, 249.
26. Rahmati, O.; Tahmasebipour, N.; Haghizadeh, A.; Pourghasemi, H.R.; Feizizadeh, B. Evaluating the influence of geo-environmental factors on gully erosion in a semi-arid region of Iran: An integrated framework. Sci. Total Environ. 2017, 579, 913–927.
27. Arabameri, A.; Rezaei, K.; Pourghasemi, H.R.; Lee, S.; Yamani, M. GIS-based gully erosion susceptibility mapping: A comparison among three data-driven models and AHP knowledge-based technique. Environ. Earth Sci. 2018, 77, 628.
28. Chaplot, V.; Le Brozec, E.C.; Silvera, N.; Valentin, C. Spatial and temporal assessment of linear erosion in catchments under sloping lands of northern Laos. Catena 2005, 63, 167–184.
29. Vanwalleghem, T.; Van Den Eeckhaut, M.; Poesen, J.; Govers, G.; Deckers, J. Spatial analysis of factors controlling the presence of closed depressions and gullies under forest: Application of rare event logistic regression. Geomorphology 2008, 95, 504–517.
30. Conoscenti, C.; Angileri, S.; Cappadonia, C.; Rotigliano, E.; Agnesi, V.; Märker, M. Gully erosion susceptibility assessment by means of GIS-based logistic regression: A case of Sicily (Italy). Geomorphology 2014, 204, 399–411.
31. Pourghasemi, H.R.; Yousefi, S.; Kornejady, A.; Cerdà, A. Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling. Sci. Total Environ. 2017, 609, 764–775.
32. Kheir, R.B.; Wilson, J.; Deng, Y. Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surf. Process. Landf. J. Br. Geomorphol. Res. Group 2007, 32, 1770–1782.
33. Märker, M.; Pelacani, S.; Schröder, B. A functional entity approach to predict soil erosion processes in a small Plio-Pleistocene Mediterranean catchment in northern Chianti, Italy. Geomorphology 2011, 125, 530–540.
34. Kuhnert, P.M.; Henderson, A.K.; Bartley, R.; Herr, A. Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics 2010, 21, 493–509.
35. Arabameri, A.; Pradhan, B.; Rezaei, K. Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS. J. Environ. Manag. 2019, 232, 928–942.
36. Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245.
37. Hong, H.; Panahi, M.; Shirzadi, A.; Ma, T.; Liu, J.; Zhu, A.-X.; Chen, W.; Kougias, I.; Kazakis, N. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 2018, 621, 1124–1141.
38. Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755.
39. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
40. Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and bat algorithms (BA). Geocarto Int. 2018, 1–21.
41. Bui, D.T.; Panahi, M.; Shahabi, H.; Singh, V.P.; Shirzadi, A.; Chapi, K.; Khosravi, K.; Chen, W.; Panahi, S.; Li, S. Novel hybrid evolutionary algorithms for spatial prediction of floods. Sci. Rep. 2018, 8, 15364.
42. Miraki, S.; Zanganeh, S.H.; Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Pham, B.T. Mapping groundwater potential using a novel hybrid intelligence approach. Water Resour. Manag. 2018, 33, 281–302.
43. Rahmati, O.; Naghibi, S.A.; Shahabi, H.; Bui, D.T.; Pradhan, B.; Azareh, A.; Rafiei-Sardooi, E.; Samani, A.N.; Melesse, A.M. Groundwater spring potential modelling: Comprising the capability and robustness of three different modeling approaches. J. Hydrol. 2018, 565, 248–261.
44. Tien Bui, D.; Khosravi, K.; Li, S.; Shahabi, H.; Panahi, M.; Singh, V.; Chapi, K.; Shirzadi, A.; Panahi, S.; Chen, W. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water 2018, 10, 1210.
45. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
46. Chen, W.; Shahabi, H.; Shirzadi, A.; Li, T.; Guo, C.; Hong, H.; Li, W.; Pan, D.; Hui, J.; Ma, M. A novel ensemble approach of bivariate statistical-based logistic model tree classifier for landslide susceptibility assessment. Geocarto Int. 2018, 33, 1398–1420.
47. Pham, B.T.; Shirzadi, A.; Bui, D.T.; Prakash, I.; Dholakia, M. A hybrid machine learning ensemble approach based on a radial basis function neural network and rotation forest for landslide susceptibility modeling: A case study in the Himalayan area, India. Int. J. Sediment Res. 2018, 33, 157–170.
48. Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of Bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018.
49. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hoang, N.-D.; Pham, B.; Bui, Q.-T.; Tran, C.-T.; Panahi, M.; Bin Ahmad, B. A novel integrated approach of relevance vector machine optimized by imperialist competitive algorithm for spatial modeling of shallow landslides. Remote Sens. 2018, 10, 1538.
50. Jaafari, A.; Panahi, M.; Pham, B.T.; Shahabi, H.; Bui, D.T.; Rezaie, F.; Lee, S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 2019, 175, 430–445.
51. Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M. Hybrid machine learning approaches for landslide susceptibility modeling. Forests 2019, 10, 157.
52. Tien Bui, D.; Shahabi, H.; Omidvar, E.; Shirzadi, A.; Geertsema, M.; Clague, J.J.; Khosravi, K.; Pradhan, B.; Pham, B.T.; Chapi, K. Shallow landslide prediction using a novel hybrid functional machine learning algorithm. Remote Sens. 2019, 11, 931.
53. Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid integration approach of entropy with logistic regression and support vector machine for landslide susceptibility modeling. Entropy 2018, 20, 884.
54. Chen, W.; Shahabi, H.; Zhang, S.; Khosravi, K.; Shirzadi, A.; Chapi, K.; Pham, B.; Zhang, T.; Zhang, L.; Chai, H. Landslide susceptibility modeling based on GIS and novel bagging-based kernel logistic regression. Appl. Sci. 2018, 8, 2540.
55. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Alizadeh, M.; Chen, W.; Mohammadi, A.; Ahmad, B.; Panahi, M.; Hong, H. Landslide detection and susceptibility mapping by AIRSAR data using support vector machine and index of entropy models in Cameron Highlands, Malaysia. Remote Sens. 2018, 10, 1527.
56. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135.
57. Chen, W.; Shahabi, H.; Shirzadi, A.; Hong, H.; Akgun, A.; Tian, Y.; Liu, J.; Zhu, A.-X.; Li, S. Novel hybrid artificial intelligence approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018.
58. Shadman Roodposhti, M.; Aryal, J.; Shahabi, H.; Safarrad, T. Fuzzy Shannon entropy: A hybrid GIS-based landslide susceptibility mapping method. Entropy 2016, 18, 343.
59. Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. Forest Meteorol. 2019, 266, 198–207.
60. Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole susceptibility mapping: A comparison between Bayes-based machine learning algorithms. Land Degrad. Dev. 2019, 30, 730–745.
61. Roodposhti, M.S.; Safarrad, T.; Shahabi, H. Drought sensitivity mapping using two one-class support vector machine algorithms. Atmos. Res. 2017, 193, 73–82.
62. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Pradhan, B.; Chen, W.; Khosravi, K.; Panahi, M.; Bin Ahmad, B.; Saro, L. Land subsidence susceptibility mapping in South Korea using machine learning algorithms. Sensors 2018, 18, 2464.
63. Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.-X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using J48 decision tree with AdaBoost, bagging and rotation forest ensembles in the Guangchang area (China). Catena 2018, 163, 399–413.
64. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
65. Du, P.; Samat, A.; Waske, B.; Liu, S.; Li, Z. Random forest and rotation forest for fully polarized SAR image classification using polarimetric and spatial features. ISPRS J. Photogram. Remote Sens. 2015, 105, 38–53.
66. Nguyen, Q.-K.; Tien Bui, D.; Hoang, N.-D.; Trinh, P.; Nguyen, V.-H.; Yilmaz, I. A novel hybrid approach based on instance based learning classifier and rotation forest ensemble for spatial prediction of rainfall-induced shallow landslides using GIS. Sustainability 2017, 9, 813.
67. Zhu, H.-J.; You, Z.-H.; Zhu, Z.-X.; Shi, W.-L.; Chen, X.; Cheng, L. DroidDet: Effective and robust detection of Android malware using static analysis along with rotation forest model. Neurocomputing 2018, 272, 638–646.
68. Agnesi, V.; Angileri, S.; Cappadonia, C.; Conoscenti, C.; Rotigliano, E. Multi parametric GIS analysis to assess gully erosion susceptibility: A test in southern Sicily, Italy. Landf. Anal. 2011, 17, 15–20.
69. Wang, L.; Wei, S.; Horton, R.; Shao, M.A. Effects of vegetation and slope aspect on water budget in the hill and gully region of the Loess Plateau of China. Catena 2011, 87, 90–100.
70. Pauchard, A.; Alaback, P.B. Influence of elevation, land use, and landscape context on patterns of alien plant invasions along roadsides in protected areas of south-central Chile. Conserv. Biol. 2004, 18, 238–248.
71. Zhu, H.; Tang, G.; Qian, K.; Liu, H. Extraction and analysis of gully head of Loess Plateau in China based on digital elevation model. Chin. Geogr. Sci. 2014, 24, 328–338.
72. Gómez-Gutiérrez, Á.; Conoscenti, C.; Angileri, S.E.; Rotigliano, E.; Schnabel, S. Using topographical attributes to evaluate gully erosion proneness (susceptibility) in two Mediterranean basins: Advantages and limitations. Nat. Hazards 2015, 79, 291–314.
73. Cerdan, O.; Le Bissonnais, Y.; Couturier, A.; Bourennane, H.; Souchère, V. Rill erosion on cultivated hillslopes during two extreme rainfall events in Normandy, France. Soil Tillage Res. 2002, 67, 99–108.
74. Moore, I.D.; Wilson, J.P. Length-slope factors for the revised universal soil loss equation: Simplified method of estimation. J. Soil Water Conserv. 1992, 47, 423–428.
75. Woltemade, C.J. Impact of residential soil disturbance on infiltration rate and stormwater runoff. J. Am. Water Resour. Assoc. 2010, 46, 700–711.
76. Danladi, A.; Ray, H. An analysis of some soil properties along gully erosion sites under different land use areas of Gombe metropolis, Gombe State, Nigeria. J. Geogr. Reg. Plan. 2014, 7, 86–96.
77. El Maaoui, M.A.; Felfoul, M.S.; Boussema, M.R.; Snane, M.H. Sediment yield from irregularly shaped gullies located on the Fortuna lithologic formation in semi-arid area of Tunisia. Catena 2012, 93, 97–104.
78. Lesschen, J.; Kok, K.; Verburg, P.; Cammeraat, L. Identification of vulnerable areas for gully erosion under different scenarios of land abandonment in southeast Spain. Catena 2007, 71, 110–121.
79. Billi, P.; Dramis, F. Geomorphological investigation on gully erosion in the Rift Valley and the northern highlands of Ethiopia. Catena 2003, 50, 353–368.
80. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
81. Hong, H.; Liu, J.; Zhu, A.-X.; Shahabi, H.; Pham, B.T.; Chen, W.; Pradhan, B.; Bui, D.T. A novel hybrid integration model using support vector machines and random subspace for weather-triggered landslide susceptibility assessment in the Wuning area (China). Environ. Earth Sci. 2017, 76, 652.
82. Pham, B.T.; Jaafari, A.; Prakash, I.; Bui, D.T. A novel hybrid intelligent model of support vector machines and the MultiBoost ensemble for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018, 78, 2865–2886.
83. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 2014, 512, 332–343.
84. Tehrany, M.S.; Pradhan, B.; Mansor, S.; Ahmad, N. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 2015, 125, 91–101.
85. Cortez, P.; Morais, A.D.J.R. A Data Mining Approach to Predict Forest Fires Using Meteorological Data; Associação Portuguesa para a Inteligência Artificial (APPIA): Guimarães, Portugal, 2007.
86. Tien Bui, D.; Le, K.-T.T.; Nguyen, V.C.; Le, H.D.; Revhaug, I. Tropical forest fire susceptibility mapping at the Cat Ba National Park area, Hai Phong City, Vietnam, using GIS-based kernel logistic regression. Remote Sens. 2016, 8, 347.
87. Kavzoglu, T.; Colkesen, I. A kernel functions analysis for support vector machines for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 352–359.
88. Tu, J.V. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J. Clin. Epidemiol. 1996, 49, 1225–1231.
89. Bagley, S.C.; White, H.; Golomb, B.A. Logistic regression in the medical literature: Standards for use and reporting, with particular attention to one medical domain. J. Clin. Epidemiol. 2001, 54, 979–985.
90. Hosmer, D.W.; Lemeshow, S. Goodness of fit tests for the multiple logistic regression model. Commun. Stat. Theory Methods 1980, 9, 1043–1069.
91. Pham, B.T.; Prakash, I. Evaluation and comparison of LogitBoost ensemble, Fisher’s linear discriminant analysis, logistic regression, and support vector machines methods for landslide susceptibility mapping. Geocarto Int. 2017, 34, 316–333.
92. Pradhan, B. Flood susceptible mapping and risk area delineation using logistic regression, GIS and remote sensing. J. Spat. Hydrol. 2010, 9, 33–49.
93. Chen, J.; Huang, H.; Tian, S.; Qu, Y. Feature selection for text classification with naïve Bayes. Expert Syst. Appl. 2009, 36, 5432–5435.
94. Kim, S.-B.; Han, K.-S.; Rim, H.-C.; Myaeng, S.H. Some effective techniques for naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 2006, 18, 1457–1466.
95. Subbalakshmi, G.; Ramesh, K.; Rao, M.C. Decision support in heart disease prediction system using naive Bayes. Indian J. Comput. Sci. Eng. (IJCSE) 2011, 2, 170–176.
96. Bhargavi, P.; Jyothi, S. Applying naive Bayes data mining technique for classification of agricultural land soils. Int. J. Comput. Sci. Netw. Secur. 2009, 9, 117–122.
97. Li, Y.; Anderson-Sprecher, R. Facies identification from well logs: A comparison of discriminant analysis and naïve Bayes classifier. J. Petrol. Sci. Eng. 2006, 53, 149–157.
98. Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
99. Freund, Y.; Mason, L. The Alternating Decision Tree Learning Algorithm; ICML: New York, NY, USA, 1999; pp. 124–133.
100. Chen, W.; Xie, X.; Peng, J.; Wang, J.; Duan, Z.; Hong, H. GIS-based landslide susceptibility modelling: A comparative assessment of kernel logistic regression, naïve-Bayes tree, and alternating decision tree models. Geomat. Nat. Hazards Risk 2017, 8, 950–973.
101. De Comité, F.; Gilleron, R.; Tommasi, M. Learning multi-label alternating decision trees from texts and data. In Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2003; pp. 35–49.
102. Berk, R.A. Classification and regression trees (CART). In Statistical Learning from a Regression Perspective; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1–65.
103. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
104. Ozcift, A. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. J. Med. Syst. 2012, 36, 2141–2147.
105. Shirzadi, A.; Solaimani, K.; Roshan, M.H.; Kavian, A.; Chapi, K.; Shahabi, H.; Keesstra, S.; Ahmad, B.B.; Bui, D.T. Uncertainties of prediction accuracy in shallow landslide modeling: Sample size and raster resolution. Catena 2019, 178, 172–188.
106. Quinlan, J.R. C4.5: Programs for Empirical Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993.
107. Bui, D.T.; Ho, T.-C.; Pradhan, B.; Pham, B.-T.; Nhu, V.-H.; Revhaug, I. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, bagging, and MultiBoost ensemble frameworks. Environ. Earth Sci. 2016, 75, 1101.
108. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
109. He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y. Landslide spatial modelling using novel bivariate statistical based naïve Bayes, RBF classifier, and RBF network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15.
110. Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Bui, D.T. Landslide susceptibility modeling using reduced error pruning trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
111. Chen, W.; Panahi, M.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Panahi, S.; Li, S.; Jaafari, A.; Ahmad, B.B. Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. Catena 2019, 172, 212–231.
112. Shirzadi, A.; Soliamani, K.; Habibnejhad, M.; Kavian, A.; Chapi, K.; Shahabi, H.; Chen, W.; Khosravi, K.; Thai Pham, B.; Pradhan, B. Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping. Sensors 2018, 18, 3777.
113. Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018.
114. Vanmaercke, M.; Poesen, J.; Van Mele, B.; Demuzere, M.; Bruynseels, A.; Golosov, V.; Bezerra, J.F.R.; Bolysov, S.; Dvinskih, A.; Frankl, A. How fast do gully headcuts retreat? Earth-Sci. Rev. 2016, 154, 336–355.
115. Ballabio, C.; Sterlacchini, S. Support vector machines for landslide susceptibility mapping: The Staffora River basin case study, Italy. Math. Geosci. 2012, 44, 47–70.
116. Wijdenes, D.J.O.; Poesen, J.; Vandekerckhove, L.; Ghesquiere, M. Spatial distribution of gully head activity and sediment supply along an ephemeral channel in a Mediterranean environment. Catena 2000, 39, 147–167.
117. Arabameri, A.; Pradhan, B.; Rezaei, K.; Yamani, M.; Pourghasemi, H.R.; Lombardo, L. Spatial modelling of gully erosion using evidential belief function, logistic regression, and a new ensemble of evidential belief function–logistic regression algorithm. Land Degrad. Dev. 2018, 29, 4035–4049.
118. Pham, B.T.; Bui, D.T.; Dholakia, M.; Prakash, I.; Pham, H.V.; Mehmood, K.; Le, H.Q. A novel ensemble classifier of rotation forest and naïve Bayes for landslide susceptibility assessment at the Luc Yen district, Yen Bai province (Viet Nam) using GIS. Geomat. Nat. Hazards Risk 2017, 8, 649–671.
119. Pham, B.T.; Nguyen, V.-T.; Ngo, V.-L.; Trinh, P.T.; Ngo, H.T.T.; Bui, D.T. A novel hybrid model of rotation forest based functional trees for landslide susceptibility mapping: A case study at Kon Tum province, Vietnam. In Proceedings of the International Conference on Geo-Spatial Technologies and Earth Resources, Hanoi, Vietnam, 5–6 October 2017; pp. 186–201.
120. Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Optimization of landslide conditioning factors using very high-resolution airborne laser scanning (LiDAR) data at catchment scale. Remote Sens. Environ. 2014, 152, 150–165.
121. Bui, D.T.; Pradhan, B.; Revhaug, I.; Tran, C.T. A comparative assessment between the application of fuzzy unordered rules induction algorithm and J48 decision tree models in spatial prediction of shallow landslides at Lang Son city, Vietnam. In Remote Sensing Applications in Environmental Research; Springer: Berlin/Heidelberg, Germany, 2014; pp. 87–111.
122. Francis, J.; Tontisirin, N.; Anantsuksomsri, S.; Vink, J.; Zhong, V. Alternative strategies for mapping ACS estimates and error of estimation. In Emerging Techniques in Applied Demography; Springer: Berlin/Heidelberg, Germany, 2015; pp. 247–273.
123. Umar, Z.; Pradhan, B.; Ahmad, A.; Jebur, M.N.; Tehrany, M.S. Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. Catena 2014, 118, 124–135.
124. Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 2016, 83, 97–127.
125. Neilsen, R.D.; Hjelmfelt, A.T. Hydrologic soil group assignment. Proc. Water Resour. Eng. 1998, 10, 1297–1302.
1 Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
4 Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
5 Department of Rangeland and Watershed Management, Faculty of Natural Resources and Earth Sciences, University of Kashan, Kashan 87317-53153, Iran
6 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
7 Kurdistan Agriculture and Natural Resources Research and Education Center, AREEO, Sanandaj 66169-36311, Iran
8 Center for Advanced Modeling and Geospatial System (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, CB11.06.106, Building 11, 81 Broadway, Ultimo NSW 2007, Australia
9 Department of Energy and Mineral Resources Engineering, Choongmu-gwan, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
10 Department of Geophysics, Young Researchers and Elites Club, North Tehran Branch, Islamic Azad University, Tehran P.O. Box 19585/466, Iran
11 Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia
12 Department of Computer Science and Engineering, and IT, School of Electrical and Computer Engineering, Shiraz University, Shiraz 84334-71964, Iran
13 Department of Energy Engineering, Budapest University of Technology and Economics, Budapest 1111, Hungary
14 Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124 Gwahak-ro Yuseong-gu, Daejeon 34132, Korea
15 Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro Yuseong-gu, Daejeon 34113, Korea
*Authors to whom correspondence should be addressed.
© 2019. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”).
Abstract
In this study, we introduce a novel hybrid artificial intelligence approach, RF-ADTree, which uses rotation forest (RF) as a meta/ensemble classifier and an alternating decision tree (ADTree) as the base classifier, to spatially predict gully erosion in the Klocheh watershed of Kurdistan province, Iran. A database was constructed from 915 gully erosion locations and 22 gully conditioning factors. Several soft computing benchmark models (SCBM) were used for comparison with the proposed model: the ADTree, the Support Vector Machine with two kernel functions, polynomial and radial basis function (SVM-Polynomial and SVM-RBF), Logistic Regression (LR), and Naïve Bayes Multinomial Updatable (NBMU). The results indicated that 19 conditioning factors were effective, among which distance to river, geomorphology, land use, hydrological group, lithology and slope angle were the most important for the gully modeling process. Additionally, the RF-ADTree ensemble model (area under the curve (AUC) = 0.906) significantly improved on the prediction accuracy of the ADTree model (AUC = 0.882). The proposed model also had the highest performance (AUC = 0.913) in comparison with the SVM-Polynomial model (AUC = 0.879), the SVM-RBF model (AUC = 0.867), the LR model (AUC = 0.75), the ADTree model (AUC = 0.861) and the NBMU model (AUC = 0.811).