Exploring the application of Natural Language Processing to scientific medical cannabis publications

dc.contributor.advisorNyirenda, Juwa
dc.contributor.authorde Beer, James Charles
dc.date.accessioned2023-03-03T08:54:09Z
dc.date.available2023-03-03T08:54:09Z
dc.date.issued2022
dc.date.updated2023-02-20T12:31:55Z
dc.description.abstractCannabis has become recognised internationally as a powerful medicinal plant. The explosion of clinical research on cannabis has made it difficult for researchers and medical professionals to keep up to date with new findings. Analysing the large quantities of available text data using natural language processing and machine learning algorithms could improve the speed and accuracy with which cannabis research is processed, as well as expose hitherto unknown connections between cannabis compounds and the treatment of health conditions. In turn, this would help direct future research and clinical trials. This thesis aims to develop an appropriate method to extract the key connections between cannabis compounds, human physiology and disease from the existing medical literature. First, natural language processing techniques (such as document clustering and topic modelling, global vector word embeddings and supervised document classifiers) are used to group 500 journal articles from the general literature on cannabis according to broad research topics; analyse the interaction between cannabis compounds, human physiology and diseases; and train a classifier to classify unseen documents. Second, the connections generated through this quantitative process are assessed qualitatively against those in a manual dataset of research findings from more than 500 studies collated over a number of years and provided by a medical company specialising in cannabis research. The results indicate that the methods developed were able to effectively and accurately demonstrate connections between cannabis plant compounds and diseases. Hence, the working code accurately reproduced the results of manual analysis. This was shown by the close similarity of the ranked key words for diseases.
The unsupervised methods were able to effectively cluster and model topic distributions across the data to group documents by topic, while the supervised learning methods were able to accurately train models based on these suggestions, thereby solving a real-world practical problem in data management and analysis.
dc.identifier.apacitationde Beer, J. C. (2022). Exploring the application of Natural Language Processing to scientific medical cannabis publications. Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/37173
dc.identifier.chicagocitationde Beer, James Charles. "Exploring the application of Natural Language Processing to scientific medical cannabis publications." Faculty of Science, Department of Statistical Sciences, 2022. http://hdl.handle.net/11427/37173
dc.identifier.citationde Beer, J.C. 2022. Exploring the application of Natural Language Processing to scientific medical cannabis publications. Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/37173
dc.identifier.ris TY - Master Thesis AU - de Beer, James Charles DA - 2022_ DB - OpenUCT DP - University of Cape Town KW - Statistical Sciences LK - https://open.uct.ac.za PY - 2022 T1 - Exploring the application of Natural Language Processing to scientific medical cannabis publications TI - Exploring the application of Natural Language Processing to scientific medical cannabis publications UR - http://hdl.handle.net/11427/37173 ER -
dc.identifier.urihttp://hdl.handle.net/11427/37173
dc.identifier.vancouvercitationde Beer JC. Exploring the application of Natural Language Processing to scientific medical cannabis publications. Faculty of Science, Department of Statistical Sciences, 2022 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/37173
dc.language.rfc3066eng
dc.publisher.departmentDepartment of Statistical Sciences
dc.publisher.facultyFaculty of Science
dc.subjectStatistical Sciences
dc.titleExploring the application of Natural Language Processing to scientific medical cannabis publications
dc.typeMaster Thesis
dc.type.qualificationlevelMasters
dc.type.qualificationlevelMSc
Files
Original bundle
Name:
thesis_sci_2022_de beer james charles.pdf
Size:
2.72 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
0 B
Format:
Item-specific license agreed upon to submission