From GNNs to sparse transformers: graph-based architectures for multi-hop question answering
dc.contributor.advisor | Buys, Jan | |
dc.contributor.author | Acton, Shane | |
dc.date.accessioned | 2024-03-05T07:43:02Z | |
dc.date.available | 2024-03-05T07:43:02Z | |
dc.date.issued | 2023 | |
dc.date.updated | 2024-03-05T07:41:33Z | |
dc.description.abstract | Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] used by the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. | |
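The following is a minimal, illustrative sketch (not taken from the thesis; the PyTorch tensors, dimensions, and variable names are assumptions) contrasting the two attention scoring functions named in the abstract, computed for a single query node and its graph neighbours:

    import torch
    import torch.nn.functional as F

    d = 64                                   # hidden size (illustrative)
    q = torch.randn(d)                       # query node representation
    K = torch.randn(5, d)                    # representations of 5 neighbouring nodes

    # Scaled Dot Product (Transformer-style) score per neighbour: q . k_j / sqrt(d)
    sdp_scores = (K @ q) / d ** 0.5

    # Additive (GAT-style) score: LeakyReLU(a^T [q || k_j]) with a learnable vector a
    a = torch.randn(2 * d)
    add_scores = F.leaky_relu(torch.cat([q.expand(5, d), K], dim=-1) @ a)

    # Either score is softmax-normalised over the neighbourhood, then used to
    # aggregate neighbour messages into an updated node representation.
    sdp_out = F.softmax(sdp_scores, dim=0) @ K
    gat_out = F.softmax(add_scores, dim=0) @ K

In both cases the scores are normalised over the node's neighbourhood before messages are aggregated; the comparison in the abstract concerns which scoring rule performs better when embedded in otherwise similar GNN layers.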
dc.identifier.apacitation | Acton, S. (2023). <i>From GNNs to sparse transformers: graph-based architectures for multi-hop question answering</i>. (). Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.chicagocitation | Acton, Shane. <i>"From GNNs to sparse transformers: graph-based architectures for multi-hop question answering."</i> Faculty of Science, Department of Computer Science, 2023. http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.citation | Acton, S. 2023. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.ris | TY - Thesis / Dissertation AU - Acton, Shane AB - Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] used by the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. DA - 2023 DB - OpenUCT DP - University of Cape Town KW - Computer Science LK - https://open.uct.ac.za PY - 2023 T1 - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering TI - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering UR - http://hdl.handle.net/11427/39180 ER - | en_ZA |
dc.identifier.uri | http://hdl.handle.net/11427/39180 | |
dc.identifier.vancouvercitation | Acton S. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. []. Faculty of Science, Department of Computer Science, 2023 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/39180 | en_ZA |
dc.language.rfc3066 | eng | |
dc.publisher.department | Department of Computer Science | |
dc.publisher.faculty | Faculty of Science | |
dc.subject | Computer Science | |
dc.title | From GNNs to sparse transformers: graph-based architectures for multi-hop question answering | |
dc.type | Thesis / Dissertation | |
dc.type.qualificationlevel | Masters | |
dc.type.qualificationname | MSc