The performance of coevolutionary topologies in developing competitive tree manipulation strategies for symbolic regression

Ombura, Martin

The performance of coevolutionary topologies in developing competitive tree manipulation strategies for symbolic regression

Master Thesis

2020

Abstract

Computer bugs and tests are antagonistic elements of the software development process, with the former attempting to corrupt a program and the latter aiming to identify and fix the introduced faults. The automation of bug identification and repair schemes through automated software testing is an area of research that has only seen success in niche areas of software development but has failed to progress into general areas of computing due to the complexity and diversity of programming languages, codebases and developer coding practices. Unlike traditional engineering fields such as mechanical or civil where project specifications are carefully outlined and built towards, software engineering suffers from a lack of global standardization required to “build from a spec”. In this study we investigate a coevolutionary spec-based approach to dynamically damage and repair programs mathematical programs (functions). We opt for mathematical functions instead of software due to their functional similarities and simpler syntax and semantics. We utilize symbolic regression (SR) as a framework to analyze the error maximized by bugs and minimized by test. We adopt a hybrid evolutionary algorithm (EA) that implements the tree based phenotypic structure of genetic programming (GP) and the list-based chromosome of genetic algorithm (GA) that permits embedding of mathematical tree manipulation (MTM) strategies, as well as adequate selection mechanisms for search. Bugs utilize the MTM strategies in their chromosome to manipulate the input program (IP) with the aim of maximizing the error while tests adopt a set of their own MTM strategies to repair the damaged program using a spec generated from the IP to guide the repair process. Both adversarial agents are investigated in four common coevolutionary topologies, Hall of Fame (HoF), K-Random Tournaments (KRT), Round Robin (RR) and Single Elimination Tournament (SET). We ran 1556 simulations each generating a random polynomial that the bugs and tests would have to contend over in all 4 topologies. We observed that KRT with a low k value of 5 performs best from a computational and fitness standpoint for all bugs and tests. Bugs were dominant in nearly all topologies for all polynomial complexities, whereas tests struggled in the HoF, RR and SET topologies as the input programs became more complex. The competitive landscape however was quite chaotic with the best individuals lasting a maximum of 14 generations out of 300, with the average top individuals lasting only 1 generation. This made predictions on when the best individuals would be born nearly impossible as the coevolutionary landscape changed quite rapidly and non-deterministically. The kinds of MTM strategies selected by both bugs and tests depended on the level of complexity of the input programs. For input programs that had negative polynomials, the best bugs opted to delete the program entirely and build a completely new tree, whereas the best tests were unable to select viable specialized strategies to repair such programs. For programs that had large polynomial degrees, bugs opted for strategies that added nodes their underlying GP tree, in the hopes of damaging the input program more. Tests on the other hand implemented strategies to carefully reduce the complexity of the polynomial. Tests however, frequently overcompensated when attempting to fix the fit bugs, leading to mediocre solutions.

Keywords

computer science

Reference:

Collections

Masters

Full item page