An unsupervised approach to COVID-19 fake tweet detection

Jarana, Bulungisa

dc.contributor.advisor	Ngwenya, Mzabalazo
dc.contributor.author	Jarana, Bulungisa
dc.date.accessioned	2024-07-04T13:37:19Z
dc.date.available	2024-07-04T13:37:19Z
dc.date.issued	2024
dc.date.updated	2024-07-03T13:39:11Z
dc.description.abstract	Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, whether intentional or unintentional, can cause panic among readers and further exacerbate the effects of the pandemic. Objectives: This research project explores the potential of unsupervised machine learning algorithms to differentiate between genuine and fake COVID-19 news shared on Twitter. The methodology includes a literature review, experimental analysis, and the use of a Twitter dataset. Methods: The study used the K-means and Mini-Batch K-means clustering algorithms to group the Twitter data into two clusters. Word embedding techniques such as TF-IDF, Word2Vec, and BERT were employed because machine learning models cannot process raw text directly; embeddings convert the text into numerical vectors. Results: On the test data, K-means was the best-performing algorithm for detecting fake COVID-19 tweets, achieving 76% accuracy. K-means with BERT word embeddings was the best-performing model, followed by Mini-Batch K-means with TF-IDF word embeddings (69% accuracy). Conclusions: The study demonstrates that clustering Twitter COVID-19 news as genuine or fake using the K-means and Mini-Batch K-means algorithms is feasible. Keywords: Clustering, Machine Learning, Unsupervised Learning, K-Means, Mini-Batch K-Means, TF-IDF, Word2Vec, BERT, Confusion Matrix, Truncated SVD (Singular Value Decomposition), t-distributed stochastic neighbour embedding (t-SNE)
dc.identifier.apacitation	Jarana, B. (2024). <i>An unsupervised approach to COVID-19 fake tweet detection</i>. Faculty of Science, Department of Statistical Sciences. Retrieved from http://hdl.handle.net/11427/40266	en_ZA
dc.identifier.chicagocitation	Jarana, Bulungisa. <i>"An unsupervised approach to COVID-19 fake tweet detection."</i> Faculty of Science, Department of Statistical Sciences, 2024. http://hdl.handle.net/11427/40266	en_ZA
dc.identifier.citation	Jarana, B. 2024. An unsupervised approach to COVID-19 fake tweet detection. Faculty of Science, Department of Statistical Sciences. http://hdl.handle.net/11427/40266	en_ZA
dc.identifier.ris	TY - Thesis / Dissertation AU - Jarana, Bulungisa AB - Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, whether intentional or unintentional, can cause panic among readers and further exacerbate the effects of the pandemic. Objectives: This research project explores the potential of unsupervised machine learning algorithms to differentiate between genuine and fake COVID-19 news shared on Twitter. The methodology includes a literature review, experimental analysis, and the use of a Twitter dataset. Methods: The study used the K-means and Mini-Batch K-means clustering algorithms to group the Twitter data into two clusters. Word embedding techniques such as TF-IDF, Word2Vec, and BERT were employed because machine learning models cannot process raw text directly; embeddings convert the text into numerical vectors. Results: On the test data, K-means was the best-performing algorithm for detecting fake COVID-19 tweets, achieving 76% accuracy. K-means with BERT word embeddings was the best-performing model, followed by Mini-Batch K-means with TF-IDF word embeddings (69% accuracy). Conclusions: The study demonstrates that clustering Twitter COVID-19 news as genuine or fake using the K-means and Mini-Batch K-means algorithms is feasible. Keywords: Clustering, Machine Learning, Unsupervised Learning, K-Means, Mini-Batch K-Means, TF-IDF, Word2Vec, BERT, Confusion Matrix, Truncated SVD (Singular Value Decomposition), t-distributed stochastic neighbour embedding (t-SNE) DA - 2024 DB - OpenUCT DP - University of Cape Town KW - Statistical Sciences LK - https://open.uct.ac.za PY - 2024 T1 - An unsupervised approach to COVID-19 fake tweet detection TI - An unsupervised approach to COVID-19 fake tweet detection UR - http://hdl.handle.net/11427/40266 ER -	en_ZA
dc.identifier.uri	http://hdl.handle.net/11427/40266
dc.identifier.vancouvercitation	Jarana B. An unsupervised approach to COVID-19 fake tweet detection. Faculty of Science, Department of Statistical Sciences, 2024 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/40266	en_ZA
dc.language.rfc3066	Eng
dc.publisher.department	Department of Statistical Sciences
dc.publisher.faculty	Faculty of Science
dc.subject	Statistical Sciences
dc.title	An unsupervised approach to COVID-19 fake tweet detection
dc.type	Thesis / Dissertation
dc.type.qualificationlevel	Masters
dc.type.qualificationlevel	MSc

Files

Original bundle

Name: thesis_sci_2024_jarana bulungisa.pdf
Size: 5.25 MB
Format: Adobe Portable Document Format

Collections

Masters