From GNNs to sparse transformers: graph-based architectures for multi-hop question answering
dc.contributor.advisor | Buys, Jan | |
dc.contributor.author | Acton, Shane | |
dc.date.accessioned | 2024-03-05T07:43:02Z | |
dc.date.available | 2024-03-05T07:43:02Z | |
dc.date.issued | 2023 | |
dc.date.updated | 2024-03-05T07:41:33Z | |
dc.description.abstract | Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] used by the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. | |
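The following is a minimal, illustrative sketch (not taken from the thesis; the PyTorch tensors, dimensions, and variable names are assumptions) contrasting the two attention scoring functions named in the abstract, computed for a single query node and its graph neighbours:

    import torch
    import torch.nn.functional as F

    d = 64                                   # hidden size (illustrative)
    q = torch.randn(d)                       # query node representation
    K = torch.randn(5, d)                    # representations of 5 neighbouring nodes

    # Scaled Dot Product (Transformer-style) score per neighbour: q . k_j / sqrt(d)
    sdp_scores = (K @ q) / d ** 0.5

    # Additive (GAT-style) score: LeakyReLU(a^T [q || k_j]) with a learnable vector a
    a = torch.randn(2 * d)
    add_scores = F.leaky_relu(torch.cat([q.expand(5, d), K], dim=-1) @ a)

    # Either score is softmax-normalised over the neighbourhood, then used to
    # aggregate neighbour messages into an updated node representation.
    sdp_out = F.softmax(sdp_scores, dim=0) @ K
    gat_out = F.softmax(add_scores, dim=0) @ K

In both cases the scores are normalised over the node's neighbourhood before messages are aggregated; the comparison in the abstract concerns which scoring rule performs better when embedded in otherwise similar GNN layers.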
dc.identifier.apacitation | Acton, S. (2023). <i>From GNNs to sparse transformers: graph-based architectures for multi-hop question answering</i>. (). Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.chicagocitation | Acton, Shane. <i>"From GNNs to sparse transformers: graph-based architectures for multi-hop question answering."</i> Faculty of Science, Department of Computer Science, 2023. http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.citation | Acton, S. 2023. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/39180 | en_ZA |
dc.identifier.ris | TY - Thesis / Dissertation AU - Acton, Shane AB - Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] used by the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. DA - 2023 DB - OpenUCT DP - University of Cape Town KW - Computer Science LK - https://open.uct.ac.za PY - 2023 T1 - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering TI - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering UR - http://hdl.handle.net/11427/39180 ER - | en_ZA |
dc.identifier.uri | http://hdl.handle.net/11427/39180 | |
dc.identifier.vancouvercitation | Acton S. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. []. Faculty of Science, Department of Computer Science, 2023 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/39180 | en_ZA |
dc.language.rfc3066 | eng | |
dc.publisher.department | Department of Computer Science | |
dc.publisher.faculty | Faculty of Science | |
dc.subject | Computer Science | |
dc.title | From GNNs to sparse transformers: graph-based architectures for multi-hop question answering | |
dc.type | Thesis / Dissertation | |
dc.type.qualificationlevel | Masters | |
dc.type.qualificationname | MSc