
Browsing by Author "Pretorius, Arnu"

Now showing 1 - 2 of 2
  • Item (Open Access)
    Evaluating transformers as memory systems in reinforcement learning
    (2021) Makkink, Thomas; Shock, Jonathan; Pretorius, Arnu
    Memory is an important component of effective learning systems and is crucial in non-Markovian as well as partially observable environments. In recent years, Long Short-Term Memory (LSTM) networks have been the dominant mechanism for providing memory in reinforcement learning; however, the success of transformers in natural language processing tasks has highlighted a promising and viable alternative. Memory in reinforcement learning is particularly difficult as rewards are often sparse and distributed over many time steps. Early research into transformers as memory mechanisms for reinforcement learning indicated that the canonical model is not suitable, and that additional gated recurrent units and architectural modifications are necessary to stabilize these models. Several additional improvements to the canonical model have further extended its capabilities, such as increasing the attention span, dynamically selecting the number of per-symbol processing steps and accelerating convergence. It remains unclear, however, whether combining these improvements could provide meaningful performance gains overall. This dissertation examines several extensions to the canonical Transformer as memory mechanisms in reinforcement learning and empirically studies their combination, which we term the Integrated Transformer. Our findings support prior work that suggests gating variants of the Transformer architecture may outperform LSTMs as memory networks in reinforcement learning. However, our results indicate that while gated variants of the Transformer architecture may be able to model dependencies over a longer temporal horizon, these models do not necessarily outperform LSTMs when tasked with retaining increasing quantities of information.
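    (A minimal illustrative sketch of the kind of gating layer described in this abstract is given after this results list.)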
  • Item (Open Access)
    Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
    (2022) Danisa, Siphelele; Shock, Jonathan; Pretorius, Arnu
    In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers and modern strategies of training and evaluating energy-based models on learning to get a sense of whether the pitfall is due to sampling inefficiencies or underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to mechanisms that allow one to get samples from a given (target) distribution. After having understood this pitfall better, we develop opponent modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one would wish, the latter (opponent modelling with mutual information regularisation) offers new insights into the required mechanism to solve our problem. The domain in which we work is called the Max of Two Quadratics differential game where two agents need to coordinate in a non-convex landscape, and where learning is impacted by the mentioned pathology, relative overgeneralisation. We close this research investigation by offering a principled prescription on how to best extend single-agent energy-based approaches to multiple agents, which is a novel direction.
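The first abstract above refers to gated variants of the Transformer architecture used to stabilise memory in reinforcement learning. As a hedged illustration only (not the dissertation's own code), the sketch below shows a GRU-style gating layer of the kind popularised by GTrXL, in which gating replaces the plain residual connection around each attention or feed-forward sub-block; the class name, dimensions and bias initialisation are illustrative assumptions.

```python
# Illustrative sketch only: a GRU-style gating layer that can replace the residual
# connection in a Transformer sub-block (in the spirit of gated variants such as GTrXL).
import torch
import torch.nn as nn


class GRUGate(nn.Module):
    def __init__(self, d_model: int, gate_bias: float = 2.0):
        super().__init__()
        self.w_r = nn.Linear(d_model, d_model, bias=False)
        self.u_r = nn.Linear(d_model, d_model, bias=False)
        self.w_z = nn.Linear(d_model, d_model, bias=False)
        self.u_z = nn.Linear(d_model, d_model, bias=False)
        self.w_g = nn.Linear(d_model, d_model, bias=False)
        self.u_g = nn.Linear(d_model, d_model, bias=False)
        # A positive bias on the update gate keeps the layer close to the identity
        # (skip) path at initialisation, which helps stabilise early RL training.
        self.gate_bias = nn.Parameter(torch.full((d_model,), gate_bias))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: skip-connection input, y: output of the attention / feed-forward sub-block
        r = torch.sigmoid(self.w_r(y) + self.u_r(x))                    # reset gate
        z = torch.sigmoid(self.w_z(y) + self.u_z(x) - self.gate_bias)   # update gate
        h = torch.tanh(self.w_g(y) + self.u_g(r * x))                   # candidate state
        return (1.0 - z) * x + z * h                                    # gated residual


# Example usage with assumed shapes (batch, time, features).
gate = GRUGate(d_model=64)
x = torch.randn(8, 10, 64)   # residual-stream input
y = torch.randn(8, 10, 64)   # sub-block output, e.g. from multi-head attention
out = gate(x, y)             # same shape as x
```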
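The second abstract centres on the Max of Two Quadratics differential game, in which relative overgeneralisation draws independently learning agents towards a broad, low-reward peak rather than the narrow jointly optimal one. The following is a minimal sketch of such a payoff surface; the centres, widths and offsets are illustrative assumptions rather than the constants used in the dissertation.

```python
# Illustrative sketch only: a Max of Two Quadratics style payoff for two agents
# choosing continuous actions a1 and a2 (constants are assumptions, not the
# dissertation's exact parameterisation).
def max_of_two_quadratics(a1: float, a2: float) -> float:
    # Broad, low hill centred at (-5, -5): easy for independent learners to climb.
    broad = 0.8 * (-((a1 + 5.0) / 3.0) ** 2 - ((a2 + 5.0) / 3.0) ** 2)
    # Narrow, high hill centred at (+5, +5): the coordinated optimum with a small basin.
    narrow = -(a1 - 5.0) ** 2 - (a2 - 5.0) ** 2 + 10.0
    return max(broad, narrow)


# The coordinated optimum pays more, but its basin of attraction is small, so
# gradient-based or independent learners tend to settle on the broad hill instead
# (the relative overgeneralisation pathology discussed in the abstract).
print(max_of_two_quadratics(5.0, 5.0))    # 10.0 at the narrow, optimal peak
print(max_of_two_quadratics(-5.0, -5.0))  # ~0.0 at the broad, sub-optimal peak
```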