Hospital readmission prediction with long clinical notes

dc.contributor.advisor: Buys, Jan
dc.contributor.author: Nurmahomed, Yassin
dc.date.accessioned: 2023-04-13T10:49:24Z
dc.date.available: 2023-04-13T10:49:24Z
dc.date.issued: 2022
dc.date.updated: 2023-04-12T11:29:48Z
dc.description.abstract: Electronic health record (EHR) data are captured across many healthcare institutions, resulting in large amounts of diverse information that can be analysed for the diagnosis, prognosis, treatment and prevention of disease. One type of data captured by EHRs is clinical notes: unstructured text written in natural language. Natural Language Processing (NLP) can be used to build machine learning (ML) models that extract meaning from clinical notes and enable the prediction of clinical outcomes. ClinicalBERT is a Transformer-based model pre-trained on clinical text that can predict 30-day hospital readmission from clinical notes. Although it performs well, it is limited by the maximum length of the text sequence it accepts as input. Models that use longer sequences have been shown to perform better on a range of ML tasks, including on clinical text. In this work, Longformer, an ML model that is pre-trained and then fine-tuned on clinical text and can learn from longer sequences than previous models, is evaluated. Performance is compared against Deep Averaging Network (DAN) and Long Short-Term Memory (LSTM) baselines and previous state-of-the-art models in terms of the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and recall at a precision of 70% (RP70). Longformer beats ClinicalBERT on two of the metrics, but does not surpass one of the baselines on any of them. Training the model on early notes did not yield a substantial difference compared to training on discharge summaries. Our analysis shows that the model suffers from out-of-vocabulary words, as many biomedical concepts are missing from the original pre-training corpus.
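As a concrete illustration of the three evaluation metrics named in the abstract (AUROC, AUPRC and RP70), they can be computed with scikit-learn roughly as follows. This is a sketch on hypothetical toy labels and scores, not the thesis's own code or data; `readmission_metrics` is an assumed helper name.

```python
# Sketch: computing AUROC, AUPRC, and recall at 70% precision (RP70)
# with scikit-learn on toy predictions (hypothetical, not thesis data).
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
)

def readmission_metrics(y_true, y_score, precision_target=0.70):
    """Return (AUROC, AUPRC, recall at the given precision target)."""
    auroc = roc_auc_score(y_true, y_score)
    auprc = average_precision_score(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # Highest recall achievable while precision stays at or above the target.
    feasible = recall[precision >= precision_target]
    rp = feasible.max() if feasible.size else 0.0
    return auroc, auprc, rp

# Toy example: 1 = readmitted within 30 days, scores are model probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])
auroc, auprc, rp70 = readmission_metrics(y_true, y_score)
```

RP70 is read off the precision-recall curve: among all operating points whose precision is at least 0.70, it is the largest recall, so it rewards models that can flag many readmissions while keeping false alarms bounded.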
dc.identifier.apacitation: Nurmahomed, Y. (2022). <i>Hospital readmission prediction with long clinical notes</i>. Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/37712
dc.identifier.chicagocitation: Nurmahomed, Yassin. <i>"Hospital readmission prediction with long clinical notes."</i> Faculty of Science, Department of Computer Science, 2022. http://hdl.handle.net/11427/37712
dc.identifier.citation: Nurmahomed, Y. 2022. Hospital readmission prediction with long clinical notes. Faculty of Science, Department of Computer Science. http://hdl.handle.net/11427/37712
dc.identifier.ris:
TY  - Master Thesis
AU  - Nurmahomed, Yassin
DA  - 2022
DB  - OpenUCT
DP  - University of Cape Town
KW  - Computer Science
LK  - https://open.uct.ac.za
PY  - 2022
T1  - Hospital readmission prediction with long clinical notes
TI  - Hospital readmission prediction with long clinical notes
UR  - http://hdl.handle.net/11427/37712
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/37712
dc.identifier.vancouvercitation: Nurmahomed Y. Hospital readmission prediction with long clinical notes. Faculty of Science, Department of Computer Science; 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/37712
dc.language.rfc3066: eng
dc.publisher.department: Department of Computer Science
dc.publisher.faculty: Faculty of Science
dc.subject: Computer Science
dc.title: Hospital readmission prediction with long clinical notes
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_sci_2022_nurmahomed yassin.pdf
Size: 6.27 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission
Collections