Abstract

The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of possible solutions, small populations tend to incur a sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial GP populations. First, we present a probabilistic model of the expected number of subtrees for GP populations initialized with the full, grow, or ramped half-and-half method. Second, based on this frequency model, we present a model that estimates the sampling error for a given GP population size. We validate our models empirically and show that, compared to smaller population sizes, our recommended population sizes substantially reduce the sampling error of measured fitness values. Increasing the population size further, however, does not considerably reduce the sampling error of fitness values. Last, we recommend population sizes that result in a low sampling error for some widely used benchmark problem instances. A low sampling error at initialization is necessary (but not sufficient) for a reliable search, since lowering the sampling error reduces the overall random variation in a random sample. Our results indicate that sampling error is a severe problem for GP, making large initial population sizes necessary to obtain a low sampling error. Our model allows GP practitioners to determine a minimum initial population size such that the sampling error stays below a threshold at a given confidence level.

Details

Title
On sampling error in genetic programming
Author
Schweim, Dirk 1; Wittenberg, David 1; Rothlauf, Franz 1

1 Johannes Gutenberg University, Mainz, Germany (GRID:grid.5802.f) (ISNI:0000 0001 1941 7111)
Publication title
Volume
21
Issue
2
Pages
173-186
Publication year
2022
Publication date
Jun 2022
Publisher
Springer Nature B.V.
Place of publication
Dordrecht
Country of publication
Netherlands
Publication subject
ISSN
15677818
e-ISSN
15729796
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2021-01-31
Milestone dates
2020-11-25 (Registration); 2020-11-25 (Accepted)
First posting date
31 Jan 2021
ProQuest document ID
2673710960
Document URL
https://www.proquest.com/scholarly-journals/on-sampling-error-genetic-programming/docview/2673710960/se-2?accountid=208611
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2023-11-30
Database
ProQuest One Academic