Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies

dc.contributor.advisor: Shock, Jonathan
dc.contributor.advisor: Pretorius, Arnu
dc.contributor.author: Danisa, Siphelele
dc.date.accessioned: 2023-03-02T08:14:43Z
dc.date.available: 2023-03-02T08:14:43Z
dc.date.issued: 2022
dc.date.updated: 2023-02-20T12:31:33Z
dc.description.abstract: In this work, we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this pathology occurs more often in multiagent independent-learner problems, it is also present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers, and of modern strategies for training and evaluating energy-based models, on learning, to establish whether the pitfall is due to sampling inefficiencies or to underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to a mechanism that draws samples from a given (target) distribution. Having understood this pitfall better, we develop opponent-modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one might hope, the latter (opponent modelling with mutual information regularisation) offers new insight into the mechanism required to solve our problem. The domain in which we work is the Max of Two Quadratics differential game, in which two agents need to coordinate in a non-convex landscape and learning is impacted by the aforementioned pathology, relative overgeneralisation. We close this investigation by offering a principled prescription for extending single-agent energy-based approaches to multiple agents, which is a novel direction.
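The Max of Two Quadratics game mentioned in the abstract is commonly parameterised as the pointwise maximum of two quadratic bowls, one wide but low and one narrow but high. The following minimal Python sketch (with illustrative constants that are not necessarily those used in the thesis) shows how relative overgeneralisation can arise: when an agent evaluates its action by averaging returns over an exploratory opponent, the wide suboptimal peak can look better than the narrow joint optimum.

import numpy as np

# Illustrative Max of Two Quadratics reward for two agents with scalar
# actions in [-10, 10]. Constants are for illustration only: f1 is a wide
# but lower peak at (-5, -5); f2 is a narrow but higher peak at (5, 5).
def reward(a1, a2, h1=0.8, h2=1.0, s1=3.0, s2=1.0):
    f1 = h1 * (-((a1 + 5.0) / s1) ** 2 - ((a2 + 5.0) / s1) ** 2)
    f2 = h2 * (-((a1 - 5.0) / s2) ** 2 - ((a2 - 5.0) / s2) ** 2) + 1.0
    return max(f1, f2)

# Relative overgeneralisation: if agent 1 scores its action by averaging
# reward over agent 2's actions (as an independent learner implicitly does
# against an exploratory opponent), the wide suboptimal peak near a1 = -5
# can dominate the true joint optimum near a1 = 5.
actions = np.linspace(-10.0, 10.0, 201)
avg_return = {a1: np.mean([reward(a1, a2) for a2 in actions]) for a1 in (-5.0, 5.0)}
print(avg_return)            # the average for a1 = -5.0 is the larger one here
print(reward(5.0, 5.0))      # higher, narrower joint optimum
print(reward(-5.0, -5.0))    # lower, wider suboptimal peak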
dc.identifier.apacitation: Danisa, S. (2022). Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies (Master's thesis). University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/37110
dc.identifier.chicagocitation: Danisa, Siphelele. "Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies." Master's thesis, University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2022. http://hdl.handle.net/11427/37110
dc.identifier.citation: Danisa, S. 2022. Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies. Master's thesis. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/37110
dc.identifier.ris:
TY  - THES
AU  - Danisa, Siphelele
DA  - 2022
DB  - OpenUCT
DP  - University of Cape Town
KW  - Pure and Applied Mathematics
LK  - https://open.uct.ac.za
PY  - 2022
TI  - Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
UR  - http://hdl.handle.net/11427/37110
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/37110
dc.identifier.vancouvercitation: Danisa S. Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies [master's thesis]. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics; 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/37110
dc.language.rfc3066: eng
dc.publisher.department: Department of Mathematics and Applied Mathematics
dc.publisher.faculty: Faculty of Science
dc.subject: Pure and Applied Mathematics
dc.title: Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc
Files
Original bundle
Name: thesis_sci_2022_danisa siphelele.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission
Collections