From GNNs to sparse transformers: graph-based architectures for multi-hop question answering

dc.contributor.advisor: Buys, Jan
dc.contributor.author: Acton, Shane
dc.date.accessioned: 2024-03-05T07:43:02Z
dc.date.available: 2024-03-05T07:43:02Z
dc.date.issued: 2023
dc.date.updated: 2024-03-05T07:41:33Z
dc.description.abstract: Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] of the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity.
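For reference, the two attention mechanisms contrasted in the abstract are conventionally formulated as given below, following the cited literature ([4] for Scaled Dot Product attention, [5] and [2] for the GAT's Additive Attention). This is a generic sketch rather than an excerpt from the thesis; the node features h_i, projection matrices W_Q, W_K, W, head dimension d_k, and attention vector a are assumed notation.

\[
\alpha_{ij}^{\mathrm{SDP}} = \operatorname{softmax}_j\!\left( \frac{(W_Q h_i)^{\top} (W_K h_j)}{\sqrt{d_k}} \right),
\qquad
\alpha_{ij}^{\mathrm{Additive}} = \operatorname{softmax}_j\!\left( \operatorname{LeakyReLU}\!\left( a^{\top} \left[ W h_i \,\Vert\, W h_j \right] \right) \right),
\]

where \Vert denotes concatenation and the softmax is taken over the neighbours j of node i; in both cases the updated representation of node i is the attention-weighted sum of its (projected) neighbours' features.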
dc.identifier.apacitation: Acton, S. (2023). <i>From GNNs to sparse transformers: graph-based architectures for multi-hop question answering</i>. Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/39180
dc.identifier.chicagocitation: Acton, Shane. <i>"From GNNs to sparse transformers: graph-based architectures for multi-hop question answering."</i> Faculty of Science, Department of Computer Science, 2023. http://hdl.handle.net/11427/39180
dc.identifier.citation: Acton, S. 2023. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/39180
dc.identifier.ris: TY - Thesis / Dissertation AU - Acton, Shane AB - Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention- and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] of the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower-compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward Neural Networks, and show that all tested GNNs benefit from these too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. DA - 2023 DB - OpenUCT DP - University of Cape Town KW - Computer Science LK - https://open.uct.ac.za PY - 2023 T1 - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering TI - From GNNs to sparse transformers: graph-based architectures for multi-hop question answering UR - http://hdl.handle.net/11427/39180 ER -
dc.identifier.uri: http://hdl.handle.net/11427/39180
dc.identifier.vancouvercitation: Acton S. From GNNs to sparse transformers: graph-based architectures for multi-hop question answering. Faculty of Science, Department of Computer Science, 2023 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/39180
dc.language.rfc3066: eng
dc.publisher.department: Department of Computer Science
dc.publisher.faculty: Faculty of Science
dc.subject: Computer Science
dc.title: From GNNs to sparse transformers: graph-based architectures for multi-hop question answering
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2023_acton shane.pdf
Size: 3.38 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission