A scalable database model of RFI data for the MeerKAT/SKA radio telescope

Thesis / Dissertation

2024

Publisher

University of Cape Town

Abstract
In radio astronomy, radio frequency interference (RFI) refers to any signal captured by a radio telescope that did not originate from the observed target in the sky. RFI from terrestrial and other sources is a recognized problem: it contaminates the desired signal and must be tracked and ultimately removed. RFI corrupts observed data and may even damage radio telescope equipment. Astronomers therefore seek to store data on RFI to mitigate or prevent future interference events. At the MeerKAT radio telescope (a precursor to the Square Kilometre Array, and one of the largest and most sensitive radio telescopes in the world to date), RFI is captured in different formats using a variety of devices, including telescopes, sensors and scanners. Combining data from these multiple sources, however, raises not only storage problems but also data integration challenges. In this work, we present two designs for a scalable database model. In the first design, RFI data is stored in multiple databases (PSQL, SciDB, and Accumulo). Our findings indicate that PSQL outperforms both SciDB and Accumulo. Consequently, in the second (alternative) design, all RFI data is stored exclusively in PSQL. However, we observed that the performance of the alternative model is adversely affected by the transformation of SciDB array data and Accumulo key-value data into PSQL relational data. Our results therefore support storing RFI data in its appropriate database rather than transforming it into another format, as this approach boosts the model's performance. Our model demonstrates a response time of less than 12 seconds for a bulk request of 1 MB of RFI data, with latency below 0.14 seconds, well within the acceptable maximum latency of 1 second in scalable databases. We found that the native database API is slightly faster (5%) than a third-party API, with no significant impact on the model's performance.
In addition, this work points to directions for improving join queries across disparate databases, which remain a limitation in heterogeneous environments. Our model facilitates fast queries across various databases, underscoring the importance of storing each data type in the appropriate database system. Lastly, our RFI database model offers good performance and scales effectively with increasing data volumes, multiple users, and varying workloads, making it suitable for the MeerKAT and SKA radio telescopes.