Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning

Ziki, Batsirayi Mupamhi

dc.contributor.advisor	Shock, Jonathan
dc.contributor.advisor	Smit, Andries
dc.contributor.author	Ziki, Batsirayi Mupamhi
dc.date.accessioned	2026-07-01T08:29:54Z
dc.date.available	2026-07-01T08:29:54Z
dc.date.issued	2026
dc.date.updated	2026-07-01T08:27:30Z
dc.description.abstract	For both organisms and artificial agents, exploration is essential to continue learning and avoid becoming trapped in suboptimal behaviours. Reinforcement learning (RL) agents can also face exploration challenges in environments with sparse feedback. Curiosity-driven exploration algorithms can help address these challenges by providing intrinsic rewards based on the novelty of situations an agent encounters. These intrinsic rewards are typically combined with extrinsic rewards using a weighted sum with the parameter λ. However, fine-tuning λ for each task across multiple environments can become computationally expensive. We propose a meta-learning approach for automatic tuning of λ using a recurrent neural network (RNN) that dynamically outputs λ values. We call this RNN the reward combiner. The reward combiner was trained using evolutionary strategies on XLand-MiniGrid environments, where feedback is sparse. The fitness function was the total extrinsic reward obtained during the training phase of an agent. We used BYOL-Explore, a curiosity-driven exploration algorithm, for intrinsic reward generation. The reward combiner takes normalised extrinsic and intrinsic rewards as input, along with actions that provide task-specific context for λ selection. Trained on Unlock and Empty-16x16 environments, the reward combiner generalises across different grid sizes of the same task, outperforming baselines when tested on DoorKey environments. It also generalises across different tasks when tested on UnlockPickUp, where the objective differs from the training environments. Our approach achieves higher extrinsic returns at the end of training than curiosity-driven baselines across all test environments. Despite being tested only within XLand-MiniGrid environments, our results indicate this approach has potential to eliminate costly hyperparameter sweeps when switching to new tasks with similar mechanics.
dc.identifier.apacitation	Ziki, B. M. (2026). Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/43437	en_ZA
dc.identifier.chicagocitation	Ziki, Batsirayi Mupamhi. "Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning." University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2026. http://hdl.handle.net/11427/43437	en_ZA
dc.identifier.citation	Ziki, B.M. 2026. Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/43437	en_ZA
dc.identifier.ris	TY - Thesis / Dissertation AU - Ziki, Batsirayi Mupamhi AB - For both organisms and artificial agents, exploration is essential to continue learning and avoid becoming trapped in suboptimal behaviours. Reinforcement learning (RL) agents can also face exploration challenges in environments with sparse feedback. Curiosity-driven exploration algorithms can help address these challenges by providing intrinsic rewards based on the novelty of situations an agent encounters. These intrinsic rewards are typically combined with extrinsic rewards using a weighted sum with the parameter λ. However, fine-tuning λ for each task across multiple environments can become computationally expensive. We propose a meta-learning approach for automatic tuning of λ using a recurrent neural network (RNN) that dynamically outputs λ values. We call this RNN the reward combiner. The reward combiner was trained using evolutionary strategies on XLand-MiniGrid environments, where feedback is sparse. The fitness function was the total extrinsic reward obtained during the training phase of an agent. We used BYOL-Explore, a curiosity-driven exploration algorithm, for intrinsic reward generation. The reward combiner takes normalised extrinsic and intrinsic rewards as input, along with actions that provide task-specific context for λ selection. Trained on Unlock and Empty-16x16 environments, the reward combiner generalises across different grid sizes of the same task, outperforming baselines when tested on DoorKey environments. It also generalises across different tasks when tested on UnlockPickUp, where the objective differs from the training environments. Our approach achieves higher extrinsic returns at the end of training than curiosity-driven baselines across all test environments. Despite being tested only within XLand-MiniGrid environments, our results indicate this approach has potential to eliminate costly hyperparameter sweeps when switching to new tasks with similar mechanics. DA - 2026 DB - OpenUCT DP - University of Cape Town KW - reinforcement learning KW - exploration algorithms LK - https://open.uct.ac.za PB - University of Cape Town PY - 2026 T1 - Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning TI - Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning UR - http://hdl.handle.net/11427/43437 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/43437
dc.identifier.vancouvercitation	Ziki BM. Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2026 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/43437	en_ZA
dc.language.iso	en
dc.language.rfc3066	eng
dc.publisher.department	Department of Mathematics and Applied Mathematics
dc.publisher.faculty	Faculty of Science
dc.publisher.institution	University of Cape Town
dc.subject	reinforcement learning
dc.subject	exploration algorithms
dc.title	Meta-learning adaptive intrinsic reward weighting for curiosity-driven reinforcement learning
dc.type	Thesis / Dissertation
dc.type.qualificationlevel	Masters
dc.type.qualificationlevel	MSc
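
Illustrative note on the abstract: the weighted-sum combination of rewards and the idea of a recurrent "reward combiner" that outputs λ at each step can be sketched as below. This is a minimal sketch, not the thesis implementation. It assumes the combined reward has the form r_ext + λ · r_int, that λ is squashed into (0, 1), and that the action enters as a one-hot vector; the names `combine_rewards` and `TinyRewardCombiner`, the hidden size, and the random (untrained) weights are all hypothetical. In the thesis the combiner's weights are trained with evolutionary strategies, which is not shown here.

```python
import math
import random


def combine_rewards(r_ext, r_int, lam):
    """Weighted sum of rewards (assumed form: r_ext + lam * r_int)."""
    return r_ext + lam * r_int


class TinyRewardCombiner:
    """Hypothetical stand-in for the reward combiner: one tanh recurrent
    cell whose scalar output is passed through a sigmoid and used as lambda."""

    def __init__(self, n_actions, hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_in = 2 + n_actions  # two rewards plus a one-hot action

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Untrained random weights, for illustration only.
        self.w_in = mat(hidden, self.n_in)
        self.w_h = mat(hidden, hidden)
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.h = [0.0] * hidden  # recurrent state carried across steps

    def step(self, r_ext, r_int, action):
        """Consume normalised rewards and the action index, return lambda."""
        x = [r_ext, r_int] + [0.0] * (self.n_in - 2)
        x[2 + action] = 1.0
        # New hidden state from the current input and the previous state.
        self.h = [
            math.tanh(
                sum(w * v for w, v in zip(row_in, x))
                + sum(w * v for w, v in zip(row_h, self.h))
            )
            for row_in, row_h in zip(self.w_in, self.w_h)
        ]
        z = sum(w * v for w, v in zip(self.w_out, self.h))
        return 1.0 / (1.0 + math.exp(-z))  # assumed range: lambda in (0, 1)


if __name__ == "__main__":
    combiner = TinyRewardCombiner(n_actions=3)
    # Made-up (r_ext, r_int, action) triples standing in for a sparse-reward rollout.
    for r_ext, r_int, action in [(0.0, 0.9, 0), (0.0, 0.4, 2), (1.0, 0.1, 1)]:
        lam = combiner.step(r_ext, r_int, action)
        print(lam, combine_rewards(r_ext, r_int, lam))
```

With a fixed λ, only `combine_rewards` is needed and λ must be swept per task; the recurrent variant replaces that constant with a value conditioned on the reward and action history.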

Files

Original bundle

Name: thesis_sci_2026_ziki batsirayi mupamhi.pdf
Size: 3.93 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed to upon submission

Collections

Masters