Hydraulic Data Preprocessing for Anomaly Based Intrusion Detection on SCADA Level of Water Treatment Systems

Mboweni, Ignitious

Thesis / Dissertation

2024

Abstract

The confidentiality, integrity and availability of critical infrastructure are crucial for any economy to operate efficiently. Critical water systems infrastructure is a target for many attackers who aim to penetrate the system for malicious reasons. The use of cyber-physical systems (CPSs) in Water Treatment Systems (WTSs) exposes many vulnerabilities that attackers can exploit. Although preventative security mechanisms are put in place, they too can be defeated, and when that happens a second layer of security is essential. Intrusion detection mechanisms are important reactive security mechanisms that limit the damage done by a successful attack on the system. The ability to uncover patterns and gather knowledge from data is a significant benefit of machine learning (ML). However, factors such as noise, missing values, excessive features, and inconsistent and redundant data negatively affect the performance of a model. Data preprocessing is therefore needed: by improving the veracity of the data, it makes the data more valuable and makes it possible for an ML process to achieve both speed and accuracy. Although many ML techniques for intrusion detection have been studied, comprehensive data preprocessing is scarcely documented. This creates a need for an adoptable data preprocessing workflow, specific to sensor and actuator data from critical water systems infrastructure, that researchers working to advance cyber security in CPSs can use. The work in this dissertation explores data preprocessing techniques on Secure Water Treatment (SWaT) testbed data and identifies preprocessing techniques suited to critical water systems infrastructure, producing an informative dataset that yields strong results when applied to ML classification models. The SWaT dataset was chosen because it was designed for cyber security research with a WTS use case.
The techniques in this study can be applied to similar datasets collected from similar environments and are not limited to water treatment. Experiments were set up to evaluate the effect of the preprocessing measures, and the results showed improved model performance, which indicates the impact of the data preprocessing. The best performance was achieved when the preprocessed dataset was randomly split into training and testing sets, yielding a significant improvement in accuracy, F1 score and time to detection for both algorithms used in the study, namely Fine Tree and Boosted Trees Ensemble.

Keywords

Engineering

Collections

Masters
