Evolutionary algorithms for optimising reinforcement learning policy approximation

dc.contributor.advisor: Bassett, Bruce
dc.contributor.author: Cuningham, Blake
dc.date.accessioned: 2020-02-19T12:18:14Z
dc.date.available: 2020-02-19T12:18:14Z
dc.date.issued: 2019
dc.date.updated: 2020-02-19T12:17:41Z
dc.description.abstract: Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor-critic) approach demonstrated in Mnih et al. (2016) halved the training time of the existing state-of-the-art approaches. However, these methods still require relatively large amounts of training resources due to the fundamentally exploratory nature of reinforcement learning. Other machine learning approaches can improve the training of reinforcement learning agents by better processing input information to help map states to actions: convolutional and recurrent neural networks are helpful when the input is image data that does not satisfy the Markov property. The required architecture of these convolutional and recurrent neural network models is not obvious given the infinite possible permutations. There is very limited research giving clear guidance on neural network structure in an RL (reinforcement learning) context, and grid search-like approaches require too many resources and do not always find good optima. To address these and other challenges associated with traditional parameter optimisation methods, an evolutionary approach similar to that taken by Dufourq and Bassett (2017) for image classification tasks was used to find an optimal model architecture when training an agent that learns to play Atari Pong. The approach found models that trained reinforcement learning agents faster, and with fewer parameters, than OpenAI's model in Blackwell et al. (2018), while still reaching a superhuman level of performance.
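The evolutionary architecture search described in the abstract can be sketched as a simple genetic loop: genomes encode layer sizes, the fittest half of each generation survives, and mutated copies refill the population. This is a minimal illustrative sketch only; the genome encoding, the `LAYER_CHOICES` set, and the stand-in `fitness` function below are all hypothetical (a real run, as in the thesis, would score each genome by training an RL agent on Atari Pong rather than by the toy objective used here).

```python
import random

# Hypothetical search space: each genome is a list of layer widths.
LAYER_CHOICES = [8, 16, 32, 64, 128]


def random_genome(n_layers=3):
    """Sample a random architecture genome."""
    return [random.choice(LAYER_CHOICES) for _ in range(n_layers)]


def fitness(genome):
    """Stand-in objective: reward capacity near an arbitrary 112-unit budget.
    In the real setting this would be the trained agent's game score,
    possibly penalised by parameter count."""
    return -abs(sum(genome) - 112)


def mutate(genome, rate=0.3):
    """Resample each gene with probability `rate`."""
    return [random.choice(LAYER_CHOICES) if random.random() < rate else g
            for g in genome]


def evolve(pop_size=20, generations=30, seed=0):
    """Truncation-selection genetic algorithm over architecture genomes."""
    random.seed(seed)
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # keep the fittest half
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
```

The same loop structure accommodates richer genomes (layer types, kernel sizes, recurrent units) by extending the encoding and mutation operator; only `fitness` needs to change to evaluate a genome by actual RL training.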
dc.identifier.citation: Cuningham, B. 2019. Evolutionary algorithms for optimising reinforcement learning policy approximation.
dc.identifier.uri: http://hdl.handle.net/11427/31170
dc.language.rfc3066: eng
dc.publisher.department: Department of Statistical Sciences
dc.publisher.faculty: Faculty of Science
dc.subject: statistical sciences
dc.title: Evolutionary algorithms for optimising reinforcement learning policy approximation
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc
Files

Original bundle (1 of 1)
Name: thesis_sci_2019_cuningham_blake.pdf
Size: 7.19 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission