Addressing deep reinforcement learning: empirical algorithm performance evaluations∗

dc.contributor.advisor: Shock, Jonathan
dc.contributor.author: Dubb, Roland
dc.date.accessioned: 2025-09-01T19:02:53Z
dc.date.available: 2025-09-01T19:02:53Z
dc.date.issued: 2025
dc.date.updated: 2025-09-01T18:59:05Z
dc.description.abstract: Due to the rapid pace at which deep reinforcement learning (RL) research papers are produced, some recent publications have begun to critique the manner in which RL algorithm performance is evaluated. Building on this scrutiny, our work attempts to identify the precise aspects of empirical deep RL algorithm performance evaluation that need improvement. This dissertation begins by briefly introducing the RL problem. Thereafter, we review the literature and discuss recent scrutiny of various aspects of deep RL algorithm performance evaluation, specifically: (i) the choice of RL environment, (ii) the measurement of uncertainty, (iii) the collection of data, and (iv) the aggregation of that data. From this discussion, we identify two particular problems with RL evaluations: the non-linear scaling of an algorithm's performance scores with the level of skill that algorithm achieves, and the (potentially) biased weighting of scores across RL environments in the data aggregation process. As multi-agent RL (MARL) is a recently popular research paradigm whose evaluation procedures have not yet been carefully scrutinised in the literature, we analyse a dataset by Gorsane et al. [1] that documents the evaluation methodologies of many recent deep cooperative MARL publications. This analysis, which reveals several flaws in MARL evaluation, together with the RL evaluation issues reviewed from the literature, motivates an attempt to construct an improved guideline for empirical RL algorithm performance evaluation. Multi-criteria decision analysis (MCDA) is discussed as a potential framework offering a data aggregation procedure that resolves the two aforementioned problems with RL evaluations. Combining MCDA with our insights from the literature, we propose an improved guideline for deep RL empirical algorithm performance evaluations. This guideline is contrasted with another proposed by Gorsane et al. [1] before a proof-of-concept test is conducted. Overall, we aim to move toward the better evaluation of RL algorithms and to contribute toward an increased sensitivity to the lack of scientific rigour [2, 3] in the field of machine learning.
dc.identifier.apacitation: Dubb, R. (2025). <i>Addressing deep reinforcement learning: empirical algorithm performance evaluations∗</i>. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/41669
dc.identifier.chicagocitation: Dubb, Roland. <i>"Addressing deep reinforcement learning: empirical algorithm performance evaluations∗."</i> University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025. http://hdl.handle.net/11427/41669
dc.identifier.citation: Dubb, R. 2025. Addressing deep reinforcement learning: empirical algorithm performance evaluations∗. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/41669
dc.identifier.ris:
TY  - Thesis / Dissertation
AU  - Dubb, Roland
AB  - (identical to dc.description.abstract above)
DA  - 2025
DB  - OpenUCT
DP  - University of Cape Town
KW  - Applied Mathematics
LK  - https://open.uct.ac.za
PB  - University of Cape Town
PY  - 2025
T1  - Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
TI  - Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
UR  - http://hdl.handle.net/11427/41669
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/41669
dc.identifier.vancouvercitation: Dubb R. Addressing deep reinforcement learning: empirical algorithm performance evaluations∗ []. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics; 2025 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41669
dc.language.iso: en
dc.language.rfc3066: eng
dc.publisher.department: Department of Mathematics and Applied Mathematics
dc.publisher.faculty: Faculty of Science
dc.publisher.institution: University of Cape Town
dc.subject: Applied Mathematics
dc.title: Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files

Original bundle (1 of 1)
Name: thesis_sci_2025_dubb roland.pdf
Size: 6.47 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission