Abstract
Reducing the complexity of the network topology while making the learned joint probability distribution fit the data are two important but conflicting goals in learning Bayesian network classifiers (BNCs). By transforming a single high-order topology into a set of low-order ones, ensemble learning algorithms can incorporate more of the hypotheses implied by the training data and help achieve a tradeoff between bias and variance. Resampling from the training data can diversify the member classifiers of the ensemble, but the information potentially lost in resampling may bias the estimate of the conditional probability distribution and thereby introduce insignificant rather than significant dependency relationships into the network topology of the BNC. In this paper, we propose to learn from the training data as a whole and to apply a heuristic search strategy that flexibly identifies the significant conditional dependencies; the attribute order is then determined implicitly. Random sampling is introduced to make each member of the ensemble "unstable" while still fully representing the conditional dependencies. Experimental evaluation on 40 UCI datasets shows that the proposed algorithm, called random Bayesian forest (RBF), achieves remarkable classification performance compared to extended versions of state-of-the-art out-of-core BNCs (e.g., SKDB, WATAN, WAODE, SA2DE, SASA2DE and IWAODE).
Details
1 Jilin University, College of Software, Changchun, China (GRID:grid.64924.3d) (ISNI:0000 0004 1760 5735)
2 Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, China (GRID:grid.64924.3d) (ISNI:0000 0004 1760 5735)
3 Jilin University, College of Computer Science and Technology, Changchun, China (GRID:grid.64924.3d) (ISNI:0000 0004 1760 5735)
