Self-attention policy architectures for reinforcement learning under partial observability

dc.contributor.advisor: Shock, Jonathan
dc.contributor.author: Du Plessis, Jeremy
dc.date.accessioned: 2025-08-13T12:59:08Z
dc.date.available: 2025-08-13T12:59:08Z
dc.date.issued: 2025
dc.date.updated: 2025-08-07T09:06:32Z
dc.description.abstract: Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making.
dc.identifier.apacitation: Du Plessis, J. (2025). Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/41574
dc.identifier.chicagocitation: Du Plessis, Jeremy. "Self-attention policy architectures for reinforcement learning under partial observability." University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025. http://hdl.handle.net/11427/41574
dc.identifier.citation: Du Plessis, J. 2025. Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/41574
dc.identifier.ris: TY - Thesis / Dissertation AU - Du Plessis, Jeremy AB - Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. DA - 2025 DB - OpenUCT DP - University of Cape Town KW - Self-attention LK - https://open.uct.ac.za PB - University of Cape Town PY - 2025 T1 - Self-attention policy architectures for reinforcement learning under partial observability TI - Self-attention policy architectures for reinforcement learning under partial observability UR - http://hdl.handle.net/11427/41574 ER -
dc.identifier.uri: http://hdl.handle.net/11427/41574
dc.identifier.vancouvercitation: Du Plessis J. Self-attention policy architectures for reinforcement learning under partial observability. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41574
dc.language.iso: en
dc.language.rfc3066: eng
dc.publisher.department: Department of Mathematics and Applied Mathematics
dc.publisher.faculty: Faculty of Science
dc.publisher.institution: University of Cape Town
dc.subject: Self-attention
dc.title: Self-attention policy architectures for reinforcement learning under partial observability
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2025_du plessis jeremy.pdf
Size: 16.34 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission