Self-attention policy architectures for reinforcement learning under partial observability

Du Plessis, Jeremy

dc.contributor.advisor	Shock, Jonathan
dc.contributor.author	Du Plessis, Jeremy
dc.date.accessioned	2025-08-13T12:59:08Z
dc.date.available	2025-08-13T12:59:08Z
dc.date.issued	2025
dc.date.updated	2025-08-07T09:06:32Z
dc.description.abstract	Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as large manufacturing plants. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this setting is their inability to handle variable-sized inputs composed of only the available sensory signals, which forces the imputation of unavailable sensory signals with data that necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and we examine the advantages and disadvantages of our solution relative to conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, which enables the agent to attend to the most salient sensory signals and allows for greater interpretability of the agent's decision-making.
dc.identifier.apacitation	Du Plessis, J. (2025). <i>Self-attention policy architectures for reinforcement learning under partial observability</i>. (). University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/41574	en_ZA
dc.identifier.chicagocitation	Du Plessis, Jeremy. <i>"Self-attention policy architectures for reinforcement learning under partial observability."</i> ., University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025. http://hdl.handle.net/11427/41574	en_ZA
dc.identifier.citation	Du Plessis, J. 2025. Self-attention policy architectures for reinforcement learning under partial observability. . University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/41574	en_ZA
dc.identifier.ris	TY - Thesis / Dissertation AU - Du Plessis, Jeremy AB - Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as large manufacturing plants. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this setting is their inability to handle variable-sized inputs composed of only the available sensory signals, which forces the imputation of unavailable sensory signals with data that necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and we examine the advantages and disadvantages of our solution relative to conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, which enables the agent to attend to the most salient sensory signals and allows for greater interpretability of the agent's decision-making. DA - 2025 DB - OpenUCT DP - University of Cape Town KW - Self-attention LK - https://open.uct.ac.za PB - University of Cape Town PY - 2025 T1 - Self-attention policy architectures for reinforcement learning under partial observability TI - Self-attention policy architectures for reinforcement learning under partial observability UR - http://hdl.handle.net/11427/41574 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/41574
dc.identifier.vancouvercitation	Du Plessis J. Self-attention policy architectures for reinforcement learning under partial observability. []. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41574	en_ZA
dc.language.iso	en
dc.language.rfc3066	eng
dc.publisher.department	Department of Mathematics and Applied Mathematics
dc.publisher.faculty	Faculty of Science
dc.publisher.institution	University of Cape Town
dc.subject	Self-attention
dc.title	Self-attention policy architectures for reinforcement learning under partial observability
dc.type	Thesis / Dissertation
dc.type.qualificationlevel	Masters
dc.type.qualificationlevel	MSc

Files

Original bundle

Name: thesis_sci_2025_du plessis jeremy.pdf
Size: 16.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission

Collections

Masters