Self-attention policy architectures for reinforcement learning under partial observability
| dc.contributor.advisor | Shock, Jonathan | |
| dc.contributor.author | Du Plessis, Jeremy | |
| dc.date.accessioned | 2025-08-13T12:59:08Z | |
| dc.date.available | 2025-08-13T12:59:08Z | |
| dc.date.issued | 2025 | |
| dc.date.updated | 2025-08-07T09:06:32Z | |
| dc.description.abstract | Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. | |
| dc.identifier.apacitation | Du Plessis, J. (2025). Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/41574 | en_ZA |
| dc.identifier.chicagocitation | Du Plessis, Jeremy. "Self-attention policy architectures for reinforcement learning under partial observability." University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025. http://hdl.handle.net/11427/41574 | en_ZA |
| dc.identifier.citation | Du Plessis, J. 2025. Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/41574 | en_ZA |
| dc.identifier.ris | TY - Thesis / Dissertation AU - Du Plessis, Jeremy AB - Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. DA - 2025 DB - OpenUCT DP - University of Cape Town KW - Self-attention LK - https://open.uct.ac.za PB - University of Cape Town PY - 2025 T1 - Self-attention policy architectures for reinforcement learning under partial observability TI - Self-attention policy architectures for reinforcement learning under partial observability UR - http://hdl.handle.net/11427/41574 ER - | en_ZA |
| dc.identifier.uri | http://hdl.handle.net/11427/41574 | |
| dc.identifier.vancouvercitation | Du Plessis J. Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41574 | en_ZA |
| dc.language.iso | en | |
| dc.language.rfc3066 | eng | |
| dc.publisher.department | Department of Mathematics and Applied Mathematics | |
| dc.publisher.faculty | Faculty of Science | |
| dc.publisher.institution | University of Cape Town | |
| dc.subject | Self-attention | |
| dc.title | Self-attention policy architectures for reinforcement learning under partial observability | |
| dc.type | Thesis / Dissertation | |
| dc.type.qualificationlevel | Masters | |
| dc.type.qualificationlevel | MSc | |