Addressing deep reinforcement learning: empirical algorithm performance evaluations∗

dc.contributor.advisor: Shock, Jonathan
dc.contributor.author: Dubb, Roland
dc.date.accessioned: 2025-09-01T19:02:53Z
dc.date.available: 2025-09-01T19:02:53Z
dc.date.issued: 2025
dc.date.updated: 2025-09-01T18:59:05Z
dc.description.abstract: Due to the rapid pace at which deep reinforcement learning (RL) research papers are produced, some recent publications have begun to critique the manner in which RL algorithm performance is evaluated. Building on this scrutiny, our work attempts to identify the precise aspects of empirical deep RL algorithm performance evaluation that need improvement. This dissertation begins by briefly introducing the RL problem. Thereafter, we review the literature and discuss recent scrutiny of various aspects of deep RL algorithm performance evaluation, specifically: (i) the choice of RL environment, (ii) the measurement of uncertainty, (iii) the collection of data, and (iv) the aggregation of that data. From this discussion, we identify two particular problems with RL evaluations: the non-linear scaling of an algorithm's performance scores with the level of skill that algorithm achieves, and the (potentially) biased weighting of scores across RL environments in the data aggregation process. As multi-agent RL (MARL) is a recently popular research paradigm whose evaluation procedures have not yet been carefully scrutinised in the literature, we analyse a dataset by Gorsane et al. [1] that documents the evaluation methodologies of many recent deep cooperative MARL publications. This analysis, which reveals several flaws in MARL evaluation, together with the RL evaluation issues reviewed from the literature, motivates an attempt to construct an improved guideline for empirical RL algorithm performance evaluation. Multi-criteria decision analysis (MCDA) is discussed as a potential framework offering a data aggregation procedure that resolves the two aforementioned problems with RL evaluations. Combining MCDA with our insights from the literature, we propose an improved guideline for deep RL empirical algorithm performance evaluations. This guideline is contrasted with another proposed by Gorsane et al. [1] before a proof-of-concept test is conducted. Overall, we aim to move toward the better evaluation of RL algorithms and to contribute toward an increased sensitivity to the lack of scientific rigour [2, 3] in the field of machine learning.
dc.identifier.apacitation: Dubb, R. (2025). <i>Addressing deep reinforcement learning: empirical algorithm performance evaluations∗</i>. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. Retrieved from http://hdl.handle.net/11427/41669
dc.identifier.chicagocitation: Dubb, Roland. <i>"Addressing deep reinforcement learning: empirical algorithm performance evaluations∗."</i> University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics, 2025. http://hdl.handle.net/11427/41669
dc.identifier.citation: Dubb, R. 2025. Addressing deep reinforcement learning: empirical algorithm performance evaluations∗. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics. http://hdl.handle.net/11427/41669
dc.identifier.ris:
TY  - Thesis / Dissertation
AU  - Dubb, Roland
AB  - (identical to dc.description.abstract above)
DA  - 2025
DB  - OpenUCT
DP  - University of Cape Town
KW  - Applied Mathematics
LK  - https://open.uct.ac.za
PB  - University of Cape Town
PY  - 2025
T1  - Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
TI  - Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
UR  - http://hdl.handle.net/11427/41669
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/41669
dc.identifier.vancouvercitation: Dubb R. Addressing deep reinforcement learning: empirical algorithm performance evaluations∗ []. University of Cape Town, Faculty of Science, Department of Mathematics and Applied Mathematics; 2025 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/41669
dc.language.iso: en
dc.language.rfc3066: eng
dc.publisher.department: Department of Mathematics and Applied Mathematics
dc.publisher.faculty: Faculty of Science
dc.publisher.institution: University of Cape Town
dc.subject: Applied Mathematics
dc.title: Addressing deep reinforcement learning: empirical algorithm performance evaluations∗
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files

Original bundle (1 of 1)
Name: thesis_sci_2025_dubb roland.pdf
Size: 6.47 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission