Evaluating transformers as memory systems in reinforcement learning

dc.contributor.advisor: Shock, Jonathan
dc.contributor.advisor: Pretorius, Arnu
dc.contributor.author: Makkink, Thomas
dc.date.accessioned: 2022-02-23T15:40:59Z
dc.date.available: 2022-02-23T15:40:59Z
dc.date.issued: 2021
dc.date.updated: 2022-02-23T15:34:07Z
dc.description.abstract: Memory is an important component of effective learning systems and is crucial in non-Markovian and partially observable environments. In recent years, Long Short-Term Memory (LSTM) networks have been the dominant mechanism for providing memory in reinforcement learning; however, the success of transformers in natural language processing tasks has highlighted a promising and viable alternative. Memory in reinforcement learning is particularly difficult because rewards are often sparse and distributed over many time steps. Early research into transformers as memory mechanisms for reinforcement learning indicated that the canonical model is not suitable, and that additional gated recurrent units and architectural modifications are necessary to stabilize these models. Several further improvements to the canonical model have extended its capabilities, such as increasing the attention span, dynamically selecting the number of per-symbol processing steps, and accelerating convergence. It remains unclear, however, whether combining these improvements could provide meaningful performance gains overall. This dissertation examines several extensions to the canonical Transformer as memory mechanisms in reinforcement learning and empirically studies their combination, which we term the Integrated Transformer. Our findings support prior work suggesting that gated variants of the Transformer architecture may outperform LSTMs as memory networks in reinforcement learning. However, our results indicate that while gated variants of the Transformer architecture may be able to model dependencies over a longer temporal horizon, they do not necessarily outperform LSTMs when tasked with retaining increasing quantities of information.
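The gated recurrent units mentioned in the abstract replace the plain residual connections around each transformer sublayer with a GRU-style gate (as in the GTrXL architecture of Parisotto et al.). A minimal NumPy sketch of one such gate is shown below; the weight names and dimensions are illustrative assumptions, not the dissertation's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_gate(x, y, params, b_g=2.0):
    """GRU-style gating layer: blend the sublayer output y back into the
    residual stream x, instead of the canonical residual update x + y.
    b_g biases the update gate so the layer is near-identity at init."""
    Wr, Ur, Wz, Uz, Wg, Ug = params
    r = sigmoid(y @ Wr + x @ Ur)        # reset gate
    z = sigmoid(y @ Wz + x @ Uz - b_g)  # update gate, biased toward 0
    h = np.tanh(y @ Wg + (r * x) @ Ug)  # candidate state
    return (1.0 - z) * x + z * h        # gated residual

d = 8                                   # hypothetical model width
rng = np.random.default_rng(0)
params = [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]
x = rng.normal(size=(4, d))             # residual stream (4 time steps)
y = rng.normal(size=(4, d))             # self-attention sublayer output
out = gru_gate(x, y, params)
print(out.shape)  # → (4, 8)
```

With a positive gate bias `b_g`, the update gate starts near zero, so the block initially behaves like an identity map; this near-identity initialization is what the GTrXL line of work credits with stabilizing transformer training under sparse reinforcement-learning rewards.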
dc.identifier.apacitation: Makkink, T. (2021). <i>Evaluating transformers as memory systems in reinforcement learning</i>. Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/35840
dc.identifier.chicagocitation: Makkink, Thomas. <i>"Evaluating transformers as memory systems in reinforcement learning."</i> Faculty of Science, Department of Mathematics and Applied Mathematics, 2021. http://hdl.handle.net/11427/35840
dc.identifier.citation: Makkink, T. 2021. Evaluating transformers as memory systems in reinforcement learning. Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/35840
dc.identifier.ris:
TY  - Master Thesis
AU  - Makkink, Thomas
PY  - 2021
DA  - 2021
DB  - OpenUCT
DP  - University of Cape Town
KW  - Mathematics and Applied Mathematics
LK  - https://open.uct.ac.za
T1  - Evaluating transformers as memory systems in reinforcement learning
TI  - Evaluating transformers as memory systems in reinforcement learning
UR  - http://hdl.handle.net/11427/35840
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/35840
dc.identifier.vancouvercitation: Makkink T. Evaluating transformers as memory systems in reinforcement learning. Faculty of Science, Department of Mathematics and Applied Mathematics; 2021 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/35840
dc.language.rfc3066: eng
dc.publisher.department: Department of Mathematics and Applied Mathematics
dc.publisher.faculty: Faculty of Science
dc.subject: Mathematics and Applied Mathematics
dc.title: Evaluating transformers as memory systems in reinforcement learning
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2021_makkink thomas.pdf
Size: 4.92 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed to upon submission
Collections