Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems

dc.contributor.advisor: Ramotsoela, Daniel
dc.contributor.author: Mboweni, Ignitious
dc.date.accessioned: 2024-07-04T14:04:34Z
dc.date.available: 2024-07-04T14:04:34Z
dc.date.issued: 2024
dc.date.updated: 2024-07-04T13:28:16Z
dc.description.abstract: The confidentiality, integrity and availability of critical infrastructure are crucial for any economy to operate efficiently. Critical water systems infrastructure is a target of many attackers who aim to penetrate the system for malicious reasons. The use of cyber-physical systems (CPSs) in Water Treatment Systems (WTSs) exposes many vulnerabilities that attackers can exploit. Although preventative security mechanisms are put in place, they too can be defeated, in which case a second layer of security is essential. Intrusion detection mechanisms are important reactive security mechanisms that limit the damage done by a successful attack on the system. The ability to uncover data patterns and gather knowledge from data is a significant benefit of machine learning (ML). However, factors such as noise, missing values, excessive features, and inconsistent and redundant data negatively affect the performance of the model. Hence there is a need for data preprocessing, which improves the speed and accuracy of an ML process by improving the veracity, and therefore the value, of the data. Although many ML techniques for intrusion detection have been studied, comprehensive data preprocessing is scarcely documented. This creates a need for an adoptable data preprocessing workflow, specifically for critical water systems infrastructure sensor and actuator data, that researchers working on advancing cyber security in CPSs can use. The work in this dissertation explores data preprocessing techniques on Secure Water Treatment (SWaT) testbed data and provides data preprocessing techniques specific to critical water systems infrastructure, producing an informative dataset that yields high performance when applied to ML classification models. The SWaT dataset was chosen because it was designed for cyber security research with a WTS use case.
The techniques in this study can be applied to similar datasets collected from similar environments and are not limited to water treatment. Experiments were set up to evaluate the effect of the preprocessing measures, and the results showed a good improvement in model performance, which indicates the impact of the data preprocessing. The best performance was achieved when the preprocessed dataset was randomly split into training and testing sets, yielding a significant improvement in accuracy, F1 score and time to detection for both algorithms used in the study, namely Fine Tree and Boosted Trees Ensemble.
dc.identifier.apacitation: Mboweni, I. (2024). Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems. Faculty of Engineering and the Built Environment, Department of Electrical Engineering. Retrieved from http://hdl.handle.net/11427/40335
dc.identifier.chicagocitation: Mboweni, Ignitious. "Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems." Faculty of Engineering and the Built Environment, Department of Electrical Engineering, 2024. http://hdl.handle.net/11427/40335
dc.identifier.citation: Mboweni, I. 2024. Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems. Faculty of Engineering and the Built Environment, Department of Electrical Engineering. http://hdl.handle.net/11427/40335
dc.identifier.ris: TY - Thesis / Dissertation AU - Mboweni, Ignitious DA - 2024 DB - OpenUCT DP - University of Cape Town KW - Engineering LK - https://open.uct.ac.za PY - 2024 T1 - Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems TI - Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems UR - http://hdl.handle.net/11427/40335 ER -
dc.identifier.uri: http://hdl.handle.net/11427/40335
dc.identifier.vancouvercitation: Mboweni I. Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems. Faculty of Engineering and the Built Environment, Department of Electrical Engineering, 2024 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/40335
dc.language.rfc3066: Eng
dc.publisher.department: Department of Electrical Engineering
dc.publisher.faculty: Faculty of Engineering and the Built Environment
dc.subject: Engineering
dc.title: Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems
dc.type: Thesis / Dissertation
dc.type.qualificationlevel: Masters
dc.type.qualificationlevel: MSc
Files
Original bundle
Name: thesis_ebe_2024_mboweni ignitious.pdf
Size: 5.43 MB
Format: Adobe Portable Document Format