Browsing by Author "Winberg, Simon"
- Item (Open Access): A GPU based X-Engine for the MeerKAT Radio Telescope (University of Cape Town, 2020). Callanan, Gareth Mitchell; Winberg, Simon. The correlator is a key component of the digital backend of a modern radio telescope array. The 64-antenna MeerKAT telescope has an FX-architecture correlator consisting of 64 F-Engines and 256 X-Engines. These F- and X-Engines are all hosted on 128 custom-designed FPGA processing boards. This custom board is known as a SKARAB. One SKARAB X-Engine board hosts four logical X-Engines. This SKARAB ingests data at 27.2 Gbps over a 40 GbE connection and correlates it in real time. GPU technology has improved significantly since SKARAB was designed, and GPUs are now becoming viable alternatives to FPGAs in high-performance streaming applications. The objective of this dissertation is to investigate how to build a drop-in GPU replacement X-Engine for MeerKAT and to compare this implementation to a SKARAB X-Engine. This includes the construction and analysis of a prototype GPU X-Engine. The 40 GbE ingest, the GPU correlation algorithm and the software pipeline framework that links these two together were identified as the three main sub-systems to focus on in this dissertation. A number of different tools implementing these sub-systems were examined, with the most suitable ones being chosen for the prototype. A prototype dual-socket system was built that could process the equivalent of two SKARABs' worth of X-Engine data. This prototype has two 40 GbE Mellanox NICs running the SPEAD2 library and a single Nvidia GeForce GTX 1080 Ti GPU running the xGPU library. A custom pipeline framework built on top of the Intel Threading Building Blocks (TBB) library was designed to facilitate the flow of data between these sub-systems. The prototype system was compared to two SKARABs. For an equivalent amount of processing, the GPU X-Engine cost R143 000 while the two SKARABs cost R490 000. The power consumption of the GPU X-Engine was more than twice that of the SKARABs (400 W compared to 180 W), while it required only half as much rack space. GPUs as X-Engines were found to be more suitable than FPGAs when cost and density are the main priorities; when power consumption is the priority, FPGAs should be used. When running eight logical X-Engines, 85% of the prototype's CPU cores were used while only 75% of the GPU's compute capacity was utilised. The main bottleneck on the GPU X-Engine was thus on the CPU side of the server. This report suggests that the next iteration of the system should offload some CPU-side processing to the GPU and double the number of 40 GbE ports, which could potentially double the system throughput. When considering methods to improve this system, an FPGA/GPU hybrid X-Engine concept was developed that would combine the power-saving advantage of FPGAs with the low cost-to-compute ratio of GPUs.
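For orientation, the core of an X-Engine is a per-frequency-channel conjugate cross-multiply-accumulate over all antenna pairs. The sketch below shows that computation in NumPy under assumed array shapes; it illustrates the operation that xGPU accelerates, not the dissertation's pipeline code.

```python
import numpy as np

def x_engine_cmac(samples):
    """Correlate one integration's worth of channelised voltages.

    samples: complex array of shape (n_time, n_ant, n_chan) -- assumed layout.
    Returns visibilities of shape (n_chan, n_ant, n_ant), where entry
    [c, i, j] is sum_t v_i(t, c) * conj(v_j(t, c)).
    """
    # einsum expresses the cross-multiply-accumulate for every baseline at once
    return np.einsum("tic,tjc->cij", samples, np.conj(samples))

# Toy example: 2 antennas, 4 channels, 128 time samples
rng = np.random.default_rng(0)
v = rng.normal(size=(128, 2, 4)) + 1j * rng.normal(size=(128, 2, 4))
vis = x_engine_cmac(v)
print(vis.shape)  # (4, 2, 2); the diagonal holds the autocorrelations
```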
- Item (Open Access): Accelerator-based look-up table for coarse-grained molecular dynamics computations (2018). Gangopadhyay, Ananya; Naidoo, Kevin J.; Winberg, Simon. Molecular Dynamics (MD) is a simulation technique widely used by computational chemists and biologists to simulate and observe the physical properties of a system of particles or molecules. The method provides invaluable three-dimensional structural and transport property data for macromolecules that can be used in applications such as the study of protein folding and drug design. The most time-consuming and inefficient routines in MD packages, particularly for large systems, are the ones involving the computation of intermolecular energy and forces for each molecule. Many fully atomistic packages such as CHARMM and NAMD have been refined over the years to improve their efficiency, but simulating complex long-time events such as protein folding remains out of reach for atomistic simulations. The consensus view amongst computational chemists and biologists is that the development of a coarse-grained (CG) MD package will make the long timescales required for protein folding simulations possible. The shortcoming of this method remains an inability to produce accurate dynamics and results that are comparable with atomistic simulations. It is the objective of this dissertation to develop a coarse-grained method that is computationally faster than atomistic simulations, while being dynamically accurate enough to produce structural and transport property data comparable to results from the latter. Firstly, the accuracy of the Gay-Berne potential in modelling liquid benzene in comparison to fully atomistic simulations was investigated. Following this, the speed of a coarse-grained condensed phase benzene simulation employing a Gay-Berne potential was compared with that of a fully atomistic simulation. While coarse-graining algorithmically reduces the total number of particles in consideration, the execution time and efficiency scale poorly for large systems. Both fully atomistic and coarse-grained developers have accelerated packages using high-performance parallel computing platforms such as multi-core CPU clusters, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). GPUs have especially gained popularity in recent years due to their massively parallel architecture on a single chip, making them a cheaper alternative to a CPU cluster. Their relatively shorter development time also gives them an advantage over FPGAs. NAMD is perhaps the most popular MD package that makes efficient use of a single GPU or a multi-GPU cluster to conduct simulations. The Scientific Computing Research Unit's in-house generalised CG code, the Free Energy Force Induced (FEFI) coarse-grained MD package, was accelerated using a GPU to investigate the achievable speed-up in comparison to the CPU algorithm. To achieve this, a parallel version of the sequential force routine, i.e. the computation of the energy, force and torque per molecule, was developed and implemented on a GPU. The GPU-accelerated FEFI package was then used to simulate benzene, which is almost exclusively governed by van der Waals forces (i.e. dispersion effects), using the parameters for the Gay-Berne potential from a study by Golubkov and Ren in their work "Generalized coarse-grained model based on point multipole and Gay-Berne potentials".
The coarse-grained condensed phase structural properties, such as the radial and orientational distribution functions, proved to be inaccurate. Further, the transport properties such as diffusion were significantly more unsatisfactory compared to a CHARMM simulation. From this, a conclusion was reached that the Gay-Berne potential was not able to model the subtle effects of dispersion as observed in liquid benzene. In place of the analytic Gay-Berne potential, a more accurate approach would be to use a multidimensional free energy-based potential. Using the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) method, a four-dimensional Free Energy Volume (FEV) for two interacting benzene molecules was computed for liquid benzene. The focal point of this dissertation was to use this FEV as the coarse-grained interaction potential in FEFI to conduct CG simulations of condensed phase liquid benzene. The FEV can act as a numerical potential or Look-Up Table (LUT) from which the interaction energy and the four partial derivatives required to compute the forces and torques can be obtained via numerical methods at each step of the CG MD simulation. A significant component of this dissertation was the development and implementation of four-dimensional LUT routines to use the FEV for accurate condensed phase coarse-grained simulations. To compute the energy and partial derivatives between the grid points of the surface, an interpolation algorithm was required. A four-dimensional cubic B-spline interpolation was developed because of the method's superior accuracy and resistance to oscillations compared with other polynomial interpolation methods. The algorithm's introduction into the FEFI CG MD package for CPUs exhausted the single-core CPU architecture with the large number of interpolations required for each MD step, making it impractical for the high-throughput interpolation required for MD simulations. The 4D cubic B-spline algorithm and the LUT routine were therefore developed and implemented on a GPU. Following evaluation, the LUT was integrated into the FEFI MD simulation package. A FEFI CG simulation of liquid benzene was run using the 4D FEV for a benzene molecular pair as the numerical potential. The structural and transport properties outperformed the analytical Gay-Berne CG potential, more closely approximating the atomistically predicted properties. The work done in this dissertation demonstrates the feasibility of a coarse-grained simulation using a free energy volume as a numerical potential to accurately simulate dispersion effects, a key feature needed for protein folding.
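As an illustration of the LUT idea, the sketch below evaluates a tabulated 4D potential with cubic-spline interpolation and recovers the four partial derivatives by central differences. It is a minimal CPU stand-in using SciPy, with an invented random grid, not the dissertation's GPU 4D cubic B-spline implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(1)
fev = rng.normal(size=(32, 32, 32, 32))  # hypothetical 4D free energy volume

def lut_energy(coords, order=3):
    """Cubic-spline interpolation of the 4D grid at fractional grid coords.

    coords: array of shape (4, n_points), in grid-index units.
    """
    return map_coordinates(fev, coords, order=order, mode="nearest")

def lut_gradient(coords, h=1e-3):
    """Approximate dE/dq_k by central differences on the interpolated surface."""
    grads = []
    for k in range(4):
        plus, minus = coords.copy(), coords.copy()
        plus[k] += h
        minus[k] -= h
        grads.append((lut_energy(plus) - lut_energy(minus)) / (2 * h))
    return np.stack(grads)  # shape (4, n_points); forces/torques come from -grads

q = np.array([[10.3], [5.7], [20.1], [8.8]])  # one query point
print(lut_energy(q), lut_gradient(q).ravel())
```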
- Item (Open Access): Action Tracer with Android API (2024). Chiramba, Humphrey; Winberg, Simon. This dissertation presents the design and testing of a low-cost motion capture device with feedback capabilities. In addition, an API to communicate with this system is developed that allows devices to connect to the motion capture system and execute commands on it. This system, Action Tracer, would be useful in cases where a user needs to train or learn a repetitive motion found in an activity or sport. Such a task needs to be done under supervision, yet may need to be done away from a coach or therapist. In either situation, a system is needed that can accommodate both cases while providing insights that are not visible to the eye. In addition, the system would need to be configurable for the various sports that it may be used with, as well as be operated from an Android device. Most of the work presented in this report relates to the development of marker-based motion capture systems, with a particular interest in the use of Inertial Measurement Units (IMUs). The subject of biofeedback is also covered, as well as how it relates to motion capture and how it can be applied in this field.
- Item (Open Access): Analysis and Development of an Online Knowledge Management Support System for a Community of Practice (2019). Mafereka, Moeketsi; Winberg, Simon. The purpose of this study was to investigate how particular business practices, focusing on those occurring in multi-site non-governmental organizations (NGOs), could be enhanced by use of a knowledge management system (KMS). The main objective of this KMS is to enhance business processes and save costs for a multi-site NGO by streamlining the organizational practices of knowledge creation, storage, sharing and application. The methodology uses a multiple-perspective approach, which covers exploration of the problem space and the solution space. Under exploration of the problem space, interviews with employees of the NGO were conducted to identify the core problems that the organization faced, and the organization's knowledge management maturity was assessed through an online questionnaire. The methodology then moved on to exploration of the solution space, in which the requirements gathering and definition process was carried out through a combination of interviews with company employees and a systematic literature review of best practices. The requirements were used to design the system architecture and use-case models. A prototype for a Community of Practice (COP) support website was developed and investigated in test cases. The tests showed that the prototype system was able to facilitate asynchronous communication through the creation and management of events, creation and management of collaboration groups, creation of discussion topics and creation of basic pages. Furthermore, security capabilities were tested in terms of login functionality. Lastly, page load times were tested for eight different scenarios. The system performance was found to be satisfactory because the scenarios covering crucial system requirements had a response time of below 11 seconds; an exception was the landing page, which took 26 seconds to load after login. It is believed that the creation of a platform that enables and records user interaction, eases online discussions, and manages groups, topics and events is a major contributor to a successful knowledge management approach.
- Item (Open Access): Architectural Level Computational Hardware Abstraction: A New Programming Language for FPGA Projects (2022). Taylor, John-Philip; Winberg, Simon. Recent years have seen vast improvements to the capability of programmable processing platforms, especially field programmable gate arrays, or FPGAs. Modern software languages have been developed, adding features such as duck-typing, dynamic interpretation, built-in high level data structures, etc. Yet FPGA development still mostly uses traditional hardware description languages such as VHDL and Verilog, and the industry is resorting to third-party tools and scripting-based automation in order to increase developer efficiency. This dissertation presents ALCHA: a new object-oriented language aimed at low-level FPGA development. Main language objectives include increasing the architectural abstraction capabilities, introducing structured programming to FPGA development, automating fixed-point related design, integrating design constraints and increasing the generalisation capability. In short, the ALCHA language is designed to allow the user to increase abstraction and reduce maintenance effort. After ensuring that the language grammar is parsable, the resulting language design is evaluated by means of a radar-based case study. Language complexity measurement is based on the number of lines of code, and language power is based on the cost of maintenance. ALCHA is shown to support code that is about half as complex and twice as powerful as traditional HDL-based design, based on these metrics. In future, ALCHA could evolve into a hardware description language in its own right, allowing developers to leverage the strengths of FPGAs.
- Item (Open Access): Automated troubleshooting for RTWP in 3G/4G RAN nodes (2018). Mohammed, Hisham; Winberg, Simon. Nowadays, mobile network operators are confronted with many challenges in operating and maintaining their networks. Subscribers expect stable and perpetual services; repeated interruptions of services will result in the dissatisfaction of users and may lead to losing the end user. One of the major issues facing a Radio Access Network (RAN) mobile operator is coping with uplink interference in the RAN, such as the Receive Total Wideband Power (RTWP) in the Universal Mobile Telecommunications System (UMTS) band. A frequently occurring issue in such networks is the RTWP alarm. This alarm is reported in the Network Operation Centre (NOC) and contributes to poor quality in the network. Such an alarm may occur daily, thus impacting the network's Key Performance Indicators (KPIs). The mobile network operator always tries to resolve the RTWP issue quickly by means of several processes and strategies to diagnose and troubleshoot it, all within a target Service Level Agreement (SLA). There are many different causes that can lead to an RTWP alarm in a mobile 3G RAN, and each of these cases has different diagnosis and troubleshooting methods. The main idea of this project is to design a Graphical User Interface (GUI) tool to help the Front Office (FO) or Back Office (BO) engineer at a mobile network operator to check and troubleshoot the RTWP issue in the network in a timely manner. The tool is designed to check the configuration of the radio, based on the Huawei NodeB 3900 and statistical performance counters, and to provide the correct decision for the engineer, improving efficiency and minimizing the time taken to troubleshoot the RTWP alarm in the network. The GUI tool is thus designed to support the engineers in Oman Telecommunication Company's NOC when dealing with the RTWP alarm on the Huawei NodeB 3900. The major finding of this study is the design of a GUI tool that minimizes the time taken to resolve the RTWP issue on the Huawei NodeB 3900, both on a single site and on multiple sites, conducts consistency checks on the software parameters, and identifies the root cause of the RTWP alarm. The GUI tool shows an operation log, which can be used by the administrator for maintenance records, and it also contains a help guide that gives the user more information about the functionality of each button.
- Item (Open Access): Automatic generation of a floor plan from a 3D scanned model: Making the Analogue World Digital (2018). Wilson, Bradlee Kenneth; Winberg, Simon; O'Hagan, Daniel. The processing of three-dimensional (3D) room models is an area of research undertaken by many academics and hobbyists due to the multiple uses derived from the information obtained, such as the generation of a floor plan; an example of bridging the real and digital worlds. A floor plan is required when an existing room, floor, or building requires alteration. Having the floor plan in the digital domain allows the user to alter the room via simulation and render the environment in a life-like manner to determine if the alterations will suffice. This is done using Computer Aided Design (CAD) software. Designing a new room or building would likewise be done using CAD software; however, not all buildings' digital files are readily available, or even exist, making the creation of a floor plan necessary. The floor plan can be drawn up by a person with pen and paper, or created using software tools and sensors. Commercial systems exist for this task, but there are no automated, open-source systems that can do the same. Current research tends to focus on the processing algorithms and not on the sensors or methods for capturing the environment. This dissertation deals with testing and evaluating off-the-shelf (OTS) sensors and the processing of 3D modelled rooms captured with one of these sensors. The tests performed on the OTS sensors determine the overall accuracy of the sensors for 3D room modelling. The rationale for designing and conducting these tests is to provide the community with suggested practical tests to assist in selecting an OTS sensor for 3D room modelling. The 3D room models are captured using an open-source application and are imported into custom software, where they undergo pre-processing algorithms producing 2D results, which are further processed to determine the walls of rooms. The dimension information about these features is used to create a 2D floor plan. 3D modelled environments are inherently noisy, requiring efficient pre-processing to remove the noise without hampering the processing performance of the 3D model. One of the largest contributors to noise and accuracy is the sensor; selecting the appropriate sensor can mitigate the need for complex pre-processing algorithms and will improve overall processing time. The project was able to extract dimension information within an acceptable error. The tests that were designed and used for sensor testing were able to determine which sensor was the better choice for 3D room modelling; the optimal sensor was found to be Microsoft's Kinect. Tests were performed in which the Microsoft Kinect was required to map a room. The results show that dimensional information about the given scene could be successfully extracted with an average error of 4.60%.
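To make the wall-extraction step concrete, here is a minimal sketch, assuming an unstructured XYZ point cloud and hypothetical height-band bounds, that projects points onto a 2D occupancy grid, the usual precursor to line fitting for walls. It illustrates the general approach rather than the dissertation's exact processing chain.

```python
import numpy as np

def occupancy_grid(points, z_min=0.5, z_max=2.0, cell=0.05):
    """Project a point cloud (n, 3) onto a 2D grid with `cell`-metre cells.

    Points between z_min and z_max (an assumed wall-height band) are counted
    per cell; dense columns of hits indicate candidate walls.
    """
    band = points[(points[:, 2] >= z_min) & (points[:, 2] <= z_max)]
    xy = band[:, :2]
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / cell).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=int)
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1)  # accumulate hit counts per cell
    return grid, origin

# Toy cloud: two walls of a rectangular room, sampled with random points
rng = np.random.default_rng(2)
wall = np.vstack([
    np.column_stack([np.zeros(500), rng.uniform(0, 3, 500), rng.uniform(0, 2.4, 500)]),
    np.column_stack([rng.uniform(0, 4, 500), np.zeros(500), rng.uniform(0, 2.4, 500)]),
])
grid, origin = occupancy_grid(wall)
print(grid.shape, grid.max())
```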
- Item (Open Access): A binaural sound sources localisation application for smart phones (2015). Mugagga, Pius Kavuma Basajjabaka; Winberg, Simon. The ability to estimate the positions of sound sources gives animals a 360° awareness of their acoustic environment. This helps complement the visual scene, which is restricted to 180° in humans. Unfortunately, deaf people are left out of this ability. Smart phones are rapidly becoming a common tool amongst mobile users in developed and emerging markets, and their processing ability has more than doubled since their introduction to mass consumer markets by Apple in 2007. Top-end smart phones, such as the Samsung Galaxy series (S3, S4 and S5), have two microphones with which one can acquire stereo recordings. The purpose of this research project was to establish a feasible sound source localisation algorithm for current top-end smart phones, and to recommend hardware improvements for future smart phones, to pave the way for the use of smart phones as advanced auditory sensory devices capable of acting as avatars for intelligent remote systems to learn about different acoustic scenes with the help of human users. The GCC-PHAT algorithm was chosen as the underlying core DOA algorithm due to its suitability for pair-wise localisation as highlighted in the literature. A stochastic power accumulation algorithm was designed and implemented to improve the estimation outcomes of GCC-PHAT. This algorithm was inspired by the W-disjoint orthogonality assumption in the literature and was extended to perform sound source counting and time-domain source separation. The system yielded satisfactory azimuth estimates of sound source directions in real time, with a pin-point DOA estimation accuracy of 64%, rising to 90.67% when a tolerance of ±1 correlation sample is allowed. An effort to resolve front-back ambiguity using phone orientation data from the MEMS sensors yielded unsatisfactory results, prompting a recommendation that an extra microphone would be needed to achieve 360° localisation in a more user-friendly way. The dissertation concludes with plans for further work on the topic and the provision of a further refined API and optimised libraries to facilitate the development of customised solutions using this system.
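The core DOA step is easy to state compactly: GCC-PHAT whitens the cross-spectrum of the two microphone signals so that the peak of the inverse transform gives the time difference of arrival, from which azimuth follows. Below is a minimal NumPy sketch; the function names and the microphone spacing are illustrative assumptions, not the dissertation's Android implementation.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau):
    """Return the TDOA (seconds) between signals x and y via GCC-PHAT."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n)
    shift = min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-shift:], cc[:shift + 1]))  # centre the lag axis
    return (np.argmax(np.abs(cc)) - shift) / fs

# Azimuth from TDOA for a two-microphone array (spacing d, speed of sound c)
d, c, fs = 0.14, 343.0, 48000       # 14 cm spacing is an assumed example value
rng = np.random.default_rng(3)
s = rng.normal(size=4096)
x, y = s, np.roll(s, 8)             # simulate an 8-sample inter-mic delay
tau = gcc_phat(x, y, fs, d / c)
azimuth = np.degrees(np.arcsin(np.clip(tau * c / d, -1, 1)))
print(f"tau={tau * 1e6:.1f} us, azimuth={azimuth:.1f} deg")
```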
- Item (Open Access): Comparative study of tool-flows for rapid prototyping of software-defined radio digital signal processing (2019). Setetemela, Khobatha; Winberg, Simon. This dissertation is a comparative study of tool-flows for rapid prototyping of SDR DSP operations on programmable hardware platforms. The study is divided into two parts, focusing on high-level tool-flows for implementing SDR DSP operations on FPGA and GPU platforms respectively. In this dissertation, the term 'tool-flow' refers to a tool or a chain of tools that facilitate the mapping of an application description specified in a programming language onto one or more programmable hardware platforms. High-level tool-flows use different techniques, such as high-level synthesis, to allow the designer to specify the application from a high level of abstraction and achieve improved productivity without significant degradation in the design's performance. SDR is an emerging communications technology that is driven by, among other factors, increasing demands for high-speed, interoperable and versatile communications systems. The key idea in SDR is the need to implement as many as possible of the radio functions that were traditionally defined in fixed hardware in software on programmable hardware processors instead. The most commonly used processors are based on complex parallel computing architectures in order to support the high-speed processing demands of SDR applications; they include FPGAs, GPUs, and multicore general-purpose processors (GPPs) and DSPs. The architectural complexity of these processors results in a corresponding increase in programming complexity, which impedes their wider adoption in suitable application domains, including SDR DSP. In an effort to address this, a plethora of different high-level tool-flows have been developed. Several comparative studies of these tool-flows have been done to help, among other benefits, designers in choosing high-level tools to use. However, there are few studies that focus on SDR DSP operations, and most existing comparative studies are not based on well-defined comparison criteria. The approach implemented in this dissertation is to use a systems engineering design process, firstly, to define the qualitative comparison criteria in the form of a specification for an ideal high-level SDR DSP tool-flow and, secondly, to implement a FIR filter case study in each of the tool-flows to enable a quantitative comparison in terms of programming effort and performance. The study considers Migen- and MyHDL-based open-source tool-flows for FPGA targets, and CUDA and Open Computing Language (OpenCL) for GPU targets. The ideal high-level SDR DSP tool-flow specification was defined and used to conduct a comparative study of the tools across three main design categories: high-level modelling, verification and implementation. For tool-flows targeting GPU platforms, the FIR case study was implemented using each of the tools; it was compiled, executed on a GPU server consisting of two GTX Titan X GPUs and an Intel Core i7 GPP, and lastly profiled. The tools were moreover compared in terms of programming effort, memory transfer cost and overall operation time. With regard to tool-flows with FPGA targets, the FIR case study was developed using each tool, then implemented on a Xilinx 7-series FPGA and compared in terms of programming effort, logic utilisation and timing performance.
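As a flavour of the FPGA-side case study, the sketch below is a direct-form FIR filter in Migen, closely following the style of Migen's own FIR tutorial example. The coefficient values, scaling and bit widths are illustrative assumptions; this is not the dissertation's benchmarked code.

```python
from functools import reduce
from operator import add

from migen import Module, Signal

class FIR(Module):
    """Direct-form FIR: a shift register of samples times fixed-point taps."""
    def __init__(self, coef, wsize=16):
        self.i = Signal((wsize, True))   # signed input sample
        self.o = Signal((wsize, True))   # signed output sample
        muls = []
        src = self.i
        for c in coef:
            sreg = Signal((wsize, True))
            self.sync += sreg.eq(src)              # one-cycle delay line
            src = sreg
            c_fp = int(c * 2 ** (wsize - 1))       # quantise tap to fixed point
            muls.append(c_fp * sreg)
        sum_full = Signal((2 * wsize - 1, True))
        self.sync += sum_full.eq(reduce(add, muls))
        self.comb += self.o.eq(sum_full >> (wsize - 1))  # rescale the output

# Example: a short low-pass tap set (placeholder values)
fir = FIR([0.1, 0.2, 0.4, 0.2, 0.1])
```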
- Item (Open Access): The design and development of a pulsed radar block for the Rhino platform (2012). Raw, Bruce; Winberg, Simon; Inggs, Michael. The Reconfigurable Hardware Interface for computiNg and radiO (Rhino) platform is an FPGA-based computing platform designed at the University of Cape Town to provide an FPGA resource that is both affordable and easy to learn and use in research and skills development in the areas of Software Defined Radio, Radio Astronomy and Cognitive Radio. A framework comprising reusable radar processing modules (referred to in this text as "radar blocks") has been implemented on the Rhino and allows users to control a simple pulse radar. The pulse radar application is implemented on the FPGA using the radar blocks framework, which allows each block to be configured from the ARM processor to adapt settings during experiments. This project developed blocks for the communications bus, Gigabit Ethernet and a simple pulse radar.
- Item (Open Access): Design and Implementation of a RISC-V Based LoRa Module (2023). Njoroge, Mark; Winberg, Simon. The proliferation of the Internet of Things (IoT) in both scale and complexity, alongside advances in optimised edge and fog system architectures, is driving an increasing need for low-power end nodes with greater computational capabilities. These distributed higher-capacity nodes allow IoT infrastructures to minimise the power cost of data movement and to increase real-time responsiveness through increased edge data analytics. This dissertation presents the design of a prototype softcore RISC-V based LoRa end node on a custom Printed Circuit Board (PCB). By combining the reconfigurability and optimisation potential of an FPGA- and RISC-V-based architecture with a LoRa interface, the design contributes a novel option for use in solutions to the above. The design utilises the open-source Python framework LiteX to generate an open, low-cost and flexible System on a Chip (SoC) that contains the necessary core and peripherals to facilitate integration with a LoRa transceiver. The SoC is implemented on an ultra-low-power FPGA (Lattice iCE40UP5K), providing access to both reconfigurable logic and a CPU for data analytics, and standard interfaces for third-party sensors, such as UART, I2C and SPI. The whole design is integrated on a custom PCB in a USB dongle form factor. The resulting prototype can therefore be used as a peripheral for existing systems that may require additional compute power and IoT connectivity. The performance of the prototype was evaluated in various applicable outdoor and indoor scenarios and was observed to be comparable with industry-standard modules.
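For context, LiteX SoC generation is driven from a short Python script. A minimal sketch along the following lines instantiates a soft RISC-V core; the board import path, clock frequency, CPU variant and memory sizes are all assumptions that vary between LiteX releases and certainly differ from the dissertation's actual design.

```python
# Minimal LiteX SoC sketch -- illustrative only; exact import paths and
# options differ between LiteX releases and the dissertation's design.
from litex_boards.platforms import icebreaker  # assumed iCE40UP5K board file
from litex.soc.integration.soc_core import SoCCore
from litex.soc.integration.builder import Builder

platform = icebreaker.Platform()

soc = SoCCore(
    platform,
    clk_freq=int(12e6),            # assumed system clock
    cpu_type="vexriscv",           # soft RISC-V core
    cpu_variant="minimal",         # small variant to fit the UP5K's LUT budget
    integrated_rom_size=0x4000,    # assumed memory sizes
    integrated_sram_size=0x2000,
)
# A clock/reset generator (CRG) for the board clock would be declared in a
# real target, and the SPI glue logic for the LoRa transceiver added here.

Builder(soc, output_dir="build").build()
```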
- Item (Open Access): Design, Implementation and Assessment of BPSK and QPSK PAPR over OFDM signals using LimeSDR (2023). Kagande, Tinashe; Winberg, Simon. Modern-day communications require efficient and affordable wireless communication techniques. Traditional wireless communication systems involve dedicated hardware for each component within the system. Software Defined Radios (SDRs) became popular and a major area of research because they provide a highly adaptive software alternative to the analog hardware solutions commonly implemented for communication systems. The benefits of using SDRs outweigh those of traditional analog hardware by a significant margin, hence the extensive research and development pursued in this area: SDRs offer upgradability and adaptation, often without needing to change the hardware, giving them the edge over traditional hardware approaches that need replacement for upgrades. With Orthogonal Frequency Division Multiplexing (OFDM) being one of the most popular modulation techniques for Next Generation Networks (NGN), it is important to understand how best it can be delivered using low-cost SDRs. The biggest challenge of OFDM is its high Peak-to-Average Power Ratio (PAPR), which can necessitate expensive circuitry for the ADC/DAC components of an SDR solution. OFDM signals are commonly modulated with either Binary Phase Shift Keying (BPSK) or Quadrature Phase Shift Keying (QPSK), which raises the question of each scheme's contribution to PAPR in an OFDM system. Consequently, this project compared BPSK and QPSK in terms of PAPR in OFDM signals. A personal computer was used to host GNU Radio-based prototypes, on which an OFDM system was developed using the OFDM blocks built into the software. LimeSDR hardware was used to sample radio waves, and a LimeSDR block was used in GNU Radio to interconnect with the LimeSDR module. An OFDM transceiver was designed in GNU Radio, and the code developed for this project was made open source. GNU Radio was selected specifically for its open-source flexibility, which allowed adaptability and the prospect of experimenting with code, expected to be of benefit for future work. For this project, pre-selected data stored on the host PC was transmitted from the OFDM transmitter through the LimeSDR antennas, received by the LimeSDR antennas, then demodulated and saved in a different folder on the host PC. Once this was achieved, user-interface facilities were added to ease use and testing. Results from the testing demonstrated the compatibility of LimeSDR and GNU Radio and showed significant differences between BPSK- and QPSK-modulated signals in terms of PAPR. This project aimed to contribute to the radio and wireless communication field as well as to support other ongoing projects in the UCT Electrical Engineering Department connected to pertinent considerations for 5G and IoT wireless remote-sensing solutions.
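To illustrate the quantity being compared, the sketch below estimates the PAPR distribution of baseband OFDM symbols for BPSK and QPSK subcarrier mappings in NumPy. The subcarrier and symbol counts are arbitrary choices, and this measurement setup is far simpler than the project's LimeSDR chain.

```python
import numpy as np

N_SC = 64       # subcarriers per OFDM symbol (assumed)
N_SYM = 20000   # Monte Carlo symbols

def papr_db(x):
    """PAPR of one complex baseband symbol, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def ofdm_papr(mod, rng):
    if mod == "bpsk":
        syms = rng.choice([-1.0, 1.0], size=(N_SYM, N_SC)).astype(complex)
    else:  # qpsk, unit average power
        syms = (rng.choice([-1.0, 1.0], size=(N_SYM, N_SC))
                + 1j * rng.choice([-1.0, 1.0], size=(N_SYM, N_SC))) / np.sqrt(2)
    x = np.fft.ifft(syms, axis=1)          # OFDM modulation (no cyclic prefix)
    return np.apply_along_axis(papr_db, 1, x)

rng = np.random.default_rng(4)
for mod in ("bpsk", "qpsk"):
    p = ofdm_papr(mod, rng)
    print(f"{mod}: mean {p.mean():.2f} dB, "
          f"99th percentile {np.percentile(p, 99):.2f} dB")
```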
- Item (Open Access): Designing and developing a robust automated log file analysis framework for debugging complex system failure (2022). Van Balla, Tyrone Jade; Winberg, Simon. As engineering and computer systems become larger and more complex, additional challenges around the development, management and maintenance of these systems materialize. While these systems afford greater flexibility and capability, debugging failures that occur during the operation of these systems has become more challenging. One such system is the MeerKAT Radio Telescope's Correlator Beamformer (CBF), the signal processing powerhouse of the radio telescope. The majority of software and hardware systems generate log files detailing system operation during runtime, and these log files have long been the go-to source of information for engineers when debugging system failures. As these systems become increasingly complex, the log files generated have exploded in both volume and complexity as log messages are recorded for all interacting parts of a system. Manually using log files for debugging system failures is no longer feasible. Recent studies have explored data-driven, automated log file analysis techniques that aim to address this challenge and have focused on two major aspects: log parsing, in which unstructured, free-form text log files are transformed into a structured dataset by extracting a set of event templates that describe the various log messages; and log file analysis, in which data-driven techniques are applied to this structured dataset to model the system behaviour and identify failures. Previous work is yet to address the combination of these two aspects to realize an end-to-end framework for performing automated log file analysis. The objective of this dissertation is to design and develop a robust, end-to-end Automated Log File Analysis Framework capable of analysing log files generated by the MeerKAT CBF to assist in system debugging. The Data Miner, the Inference Engine and the complete framework are the major subsystems developed in this dissertation. State-of-the-art, data-driven approaches to log parsing were considered and the best-performing approaches were incorporated into the Data Miner. The Inference Engine implements an LSTM-based multi-class classifier that models the system behaviour and uses this to perform anomaly detection to identify failures from log files. The complete framework links these two components together in a software pipeline capable of ingesting unstructured log files and outputting assistive system debugging information. The performance and operation of the framework and its subcomponents are evaluated for correctness on a publicly available, labelled dataset consisting of log files from the Hadoop Distributed File System (HDFS). Given the absence of a labelled dataset for the MeerKAT CBF, the applicability and usefulness of the framework in that context is subjectively evaluated through a case study. The framework is able to correctly model system behaviour from log files, but anomaly detection performance is greatly impacted by the nature and quality of the log files available for tuning and training the framework. When analysing log files, the framework is able to identify anomalous events quickly, even when large log files are considered. While the design of the framework primarily considered the MeerKAT CBF, a robust and generalisable end-to-end framework for automated log file analysis was ultimately developed.
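The anomaly-detection idea can be summarised as next-event prediction: an LSTM is trained on sequences of parsed event-template IDs, and a log line is flagged when the observed next event is not among the model's most likely predictions. The PyTorch sketch below shows this pattern in miniature; the layer sizes, window length and top-k threshold are illustrative assumptions, not the dissertation's Inference Engine.

```python
import torch
import torch.nn as nn

N_TEMPLATES, WINDOW, TOPK = 50, 10, 5   # assumed values

class NextEventLSTM(nn.Module):
    """Multi-class classifier over event templates, given a window of events."""
    def __init__(self, n_templates, embed=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_templates, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_templates)

    def forward(self, windows):             # windows: (batch, WINDOW) int64
        h, _ = self.lstm(self.embed(windows))
        return self.head(h[:, -1])          # logits for the next event

model = NextEventLSTM(N_TEMPLATES)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy training batch of (window, next-event) pairs from a parsed log
x = torch.randint(0, N_TEMPLATES, (256, WINDOW))
y = torch.randint(0, N_TEMPLATES, (256,))
loss = loss_fn(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()

# Anomaly check: is the observed event in the top-k most likely continuations?
with torch.no_grad():
    topk = model(x[:1]).topk(TOPK, dim=-1).indices
    print("anomalous" if y[0] not in topk[0] else "normal")
```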
- Item (Open Access): Development and testing of the RHINO host streamed data acquisition framework (2017). Boleme, Mpati; Winberg, Simon; Mohapi, Lerato. This project focuses on developing a supporting framework for integrating the Reconfigurable Hardware INterface for computing and radiO (RHINO) with a Personal Computer (PC) host in order to facilitate the development of Software Defined Radio (SDR) applications built using a hybrid RHINO/multicore PC system. The supporting framework that is the focus of this dissertation is designed around two main parts: a) resources for integrating the GNU Radio framework with the RHINO platform to allow data streams sent from RHINO to be processed by GNU Radio, and b) a concise and highly efficient C code module with an accompanying Application Program Interface (API) that receives streamed data from RHINO and provides data marshalling facilities to gather and dispatch blocks of data for further processing using C/C++ routines. The methodology followed in this research project involves investigating real-time streaming techniques using User Datagram Protocol (UDP) packets, and investigating how the GNU Radio high-level SDR development framework can be integrated into real-time data acquisition systems such as, in the case of this project, RHINO. The literature on real-time processing requirements for the streamer framework was reviewed; guidelines for implementing high-performance, low-latency, maximum-throughput streaming are consequently presented and the proposed design motivated. The results achieved demonstrate an efficient data streaming system, and the objectives of implementing a RHINO data acquisition system through integration with standard C/C++ code and GNU Radio were satisfactorily met. The system was tested with real-time Radio Frequency (RF) demodulation. The system captures one In-phase/Quadrature (I/Q) sample pair at a time, which is one packet. The results show that data can be streamed from the RHINO board to GNU Radio over GbE with a minimum capturing latency of 10.2 μs for a 2⁰ packet size and an average data capturing throughput of 0.54 Megabytes per second (MBps). The capturing latency, in this case, is the time taken from the request to receiving the data. The FM receiver case study successfully demonstrated a demodulated FM signal from a 94.5 Megahertz (MHz) radio station. Further recommendations include making use of the 10GbE port on RHINO for data streaming purposes; the 10GbE port can be used together with GNU Radio to improve the speed of the RHINO streamer.
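The host-side ingest boils down to reading fixed-size UDP datagrams and unpacking them into I/Q pairs for downstream blocks. The sketch below shows a minimal Python receiver; the port number and the 16-bit big-endian sample format are assumptions, not the RHINO streamer's actual wire format.

```python
import socket
import struct

PORT = 5005                  # assumed port for the FPGA's UDP stream
SAMPLES_PER_PACKET = 1       # one I/Q pair per packet, as in the case study

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))
sock.settimeout(5.0)

def recv_iq():
    """Block until one datagram arrives; return a list of complex samples."""
    data, _ = sock.recvfrom(4 * SAMPLES_PER_PACKET)
    # Assumed format: big-endian signed 16-bit I then Q, per sample pair
    vals = struct.unpack(f">{2 * SAMPLES_PER_PACKET}h", data)
    return [complex(i, q) for i, q in zip(vals[0::2], vals[1::2])]

buffer = []
try:
    while len(buffer) < 1024:    # gather a block for FM demodulation etc.
        buffer.extend(recv_iq())
except socket.timeout:
    pass
print(f"captured {len(buffer)} I/Q samples")
```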
- Item (Open Access): A domain specific language for facilitating automatic parallelization and placement of SDR patterns into heterogeneous computing architectures (2017). Mohapi, Lerato Jerfree; Winberg, Simon; Inggs, Michael R. This thesis presents a domain-specific language (DSL) for software defined radio (SDR) which is referred to as OptiSDR. The main objective of OptiSDR is to facilitate the development and deployment of SDR applications into heterogeneous computing architectures (HCAs). As HCAs are becoming mainstream in SDR applications such as radar, radio astronomy, and telecommunications, parallel programming and optimization processes are also becoming cumbersome, complex, and time-consuming for SDR experts. Therefore, the OptiSDR DSL and its compiler framework were developed to alleviate these parallelization and optimization processes together with developing execution models for DSP and dataflow models of computation suitable for SDR-specific computations. The OptiSDR target HCAs are composed of graphics processing units (GPUs), multi-core central processing units (MCPUs), and field programmable gate arrays (FPGAs). The methodology used to implement the OptiSDR DSL involved an extensive review process of existing SDR tools and the extent to which they address the complexities associated with parallel programming and optimizing SDR applications for execution in HCAs. From this review process, it was discovered that, while HCAs are used to accelerate many SDR computations, there is a shortage of intuitive parallel programming frameworks that efficiently utilize the HCAs' computing resources for achieving adequate performance for SDR applications. There were, however, some very good general-purpose parallel programming frameworks identified in the literature review, including Python based tools such as NumbaPro and Copperhead, as well as the prevailing Delite embedded DSL compiler framework for heterogeneous targets. The Delite embedded DSL compiler framework motivated and powered the OptiSDR compiler development in that it provides four main compiler development capabilities that are desired in OptiSDR: 1) Generic data parallel executable patterns; 2) Execution semantics for heterogeneous MCPU-GPU run-time; 3) Abstract syntax creation using intermediate representation (IR) nodes; and 4) Extensibility for defining new syntax for other domains. The OptiSDR DSL design processes using this Delite framework involved designing the new structured parallel patterns for DSP algorithms (e.g. FIR, FFT, convolution, correlation, etc.), dataflow models of computation (MoC), parallel loop optimizations (tiling and space splitting), and optimal memory access patterns. Advanced task and data parallel patterns were applied in the OptiSDR dataflow MoCs, which are especially suitable for SDR computations where FPGA-based real-time data acquisition systems feed data into multi-GPUs for implementation of parallel DSP algorithms. Furthermore, the research methodology involved an evaluation process that was used to determine the OptiSDR language's expressive power, efficiency, performance, accuracy, and ease of use in SDR applications, such as radar pulse compression and radio frequency sweeping algorithms. The results include measurements of performance and accuracy, productivity versus performance, and real-time processing speeds and accuracy.
The performance of some of the regularly used modules, such as the FFT-based Hilbert transform and cross-correlation, was found to be very high, with computation speeds ranging from 70.0 GFLOPS to 72.6 GFLOPS, and speedups of up to 80× compared to sequential C/C++ programs and 50× compared to Matlab's parallel loops. Accuracy was favourable in most cases. For instance, OptiSDR Octave-like DSP instantiations were found to be accurate, with L2-norm forward errors ranging from 10⁻¹³ to 10⁻¹⁶ for smaller and larger SDR programs respectively. It can therefore be concluded from the analysis in this thesis that the objectives, which include alleviating the complexities in parallel programming and optimizing SDR applications for execution in HCAs, were met. Moreover, the following hypothesis was validated, namely: "It is possible to design a DSL to facilitate the development of SDR applications and their deployment on HCAs without significant degradation of software performance, and with possible improvement in the automatically emitted low-level source code quality." It was validated by: 1) Defining the OptiSDR attributes such as parallel DSP patterns and dataflow MoCs; 2) Providing parameterizable SDR modules with automatic parallelization and optimization for performance and accuracy; and 3) Presenting a set of intuitive validation constructs for accuracy testing using root-mean-square error, and functional verification of DSP using two-dimensional graphics plotting for radar and real-time spectral analysis plots.
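Two of the benchmarked kernels, the FFT-based Hilbert transform and cross-correlation, are standard frequency-domain operations. The sketch below gives a plain NumPy/SciPy reference for FFT-based cross-correlation as used in radar pulse compression; it is a sequential baseline of the kind OptiSDR's generated GPU code is compared against, not OptiSDR code itself, and the chirp parameters are invented.

```python
import numpy as np
from scipy.signal import hilbert  # FFT-based analytic signal

def xcorr_fft(a, b):
    """Full linear cross-correlation of a with b via the FFT."""
    n = len(a) + len(b) - 1
    A = np.fft.rfft(a, n)
    B = np.fft.rfft(b, n)
    return np.fft.irfft(A * np.conj(B), n)

# Pulse compression: correlate a received trace against the transmitted chirp
fs, T = 1e6, 1e-3
t = np.arange(int(fs * T)) / fs
chirp = np.sin(2 * np.pi * (50e3 + 100e6 * t) * t)   # linear FM pulse
rx = np.zeros(4096)
rx[1200:1200 + len(chirp)] = chirp                   # echo at sample 1200
rx += 0.1 * np.random.default_rng(5).normal(size=rx.size)

analytic = hilbert(xcorr_fft(rx, chirp))             # envelope via Hilbert
print("echo detected at sample", np.argmax(np.abs(analytic)))
```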
- Item (Metadata only): EEE4084F Digital Systems (2013). Winberg, Simon. The objective of this course is for students to develop an understanding of the concepts involved in the design and development of high performance and special-purpose digital computing systems. The course involves lectures in a standard lecture venue. Projects and pracs are done using computers and other hardware in a laboratory. Presentation slides and the assignments are available on the publicly accessible website for this course. Correspondence and assistance with assignments are provided by the lecturer, tutors and students via a Google Group. Some recorded lectures and tutorials are available on the website for the course as open access resources to assist in students' learning and completion of the pracs.
- Item (Open Access): Enhanced mobile computing using cloud resources (2011). Paverd, Andrew James; Inggs, Michael; Winberg, Simon. The purpose of this research is to investigate, review and analyse the use of cloud resources for the enhancement of mobile computing. Mobile cloud computing refers to a distributed computing relationship between a resource-constrained mobile device and a remote high-capacity cloud resource. Investigation of prevailing trends has shown that this will be a key technology in the development of future mobile computing systems. This research presents a theoretical analysis framework for mobile cloud computing. This analysis framework is a structured consolidation of the salient considerations identified in recent scientific literature and commercial endeavours. The use of this framework in the analysis of various mobile application domains has elucidated several significant benefits of mobile cloud computing, including increases in system performance and efficiency. Based on recent scientific literature and commercial endeavours, various implementation approaches for mobile cloud computing have been identified, categorized and analysed according to their architectural characteristics. This has resulted in a set of advantages and disadvantages for each category of system architecture. Overall, through the development and application of the new analysis framework, this work provides a consolidated review and structured critical analysis of the current research and developments in the field of mobile cloud computing.
- Item (Open Access): Enhancement of the Fynbos Leaf Optical Recognition Application (FLORA-E) (2021). Makumborenga, Roy; Winberg, Simon. Object perception, classification and similarity discernment are relatively effortless tasks in humans, yet the exact method by which the brain achieves these is not fully understood. Identification, classification and similarity inference are currently nontrivial tasks for machine-learning-enabled platforms, even more so for ones operating in real-time applications. This dissertation conducted research on the use of machine learning algorithms in object identification and classification by designing and developing an artificially intelligent Fynbos Leaf Optical Recognition Application (FLORA) platform. Previous versions of FLORA (versions A through D) were designed to recognise Proteaceae fynbos leaves by extracting six digital morphological features and then classifying with the k-nearest neighbour (k-NN) algorithm, yielding an 86.6% accuracy. The methods utilised in FLORA-A to -D are ineffective when attempting to classify irregularly structured objects with high variability, such as stems and leafy stems. A redesign of the classification algorithms in the latest version, FLORA-E, was therefore necessary to cater for irregular fynbos stems. Numerous algorithms and techniques are available that could achieve this objective; keypoint matching, moments analysis and image hashing are the three techniques investigated in this thesis for suitability in achieving fynbos stem and leaf classification. These techniques form active areas of research within the field of image processing and were chosen because of their affine transformation invariance and low computational complexity, making them suitable for real-time classification applications. The resulting classification solution, designed from experimentation on the three techniques under investigation, is a keypoint-matching/Hu-moment hybrid algorithm whose output is a similarity index (SI) score that is used to return a ranked list of potential matches. The algorithm showed a relatively high degree of match accuracy when run on both regular objects (leaves) and irregular objects (stems), achieving a top-5 match rate of 76% for stems, 86% for leaves and 81% overall when tested using a database of 24 fynbos species (predominantly from the Proteaceae family), where each species had approximately 50 sample images. Experimental results show that Hu moment and keypoint classifiers are ideal for real-time applications because of their fast matching capabilities; this allowed the resulting hybrid algorithm to achieve a nominal computation time of ~0.78 s per sample on the test apparatus set up for this thesis. The scientific objective of this thesis was to build an artificially intelligent platform capable of correctly classifying fynbos flora by conducting research on object identification and classification algorithms. However, the core driving factor is rooted in the need to promote conservation in the Cape Floristic Region (CFR). The FLORA project is an example of how science and technology can be used as effective tools in aiding conservation and environmental awareness efforts, and the FLORA platform can also be a useful tool for professional botanists, conservationists and fynbos enthusiasts by giving them access to an indexed and readily available digital catalogue of fynbos species across the CFR.
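A minimal version of the hybrid similarity idea can be expressed with OpenCV primitives: ORB keypoint matching supplies a local-feature score, Hu moments a global-shape score, and the two are blended into a single similarity index. The sketch below assumes binary leaf/stem silhouette images and an invented weighting; the dissertation's actual SI formulation may differ.

```python
import cv2
import numpy as np

def similarity_index(img_a, img_b, w_kp=0.5):
    """Blend an ORB keypoint score and a Hu-moment shape score into one SI.

    img_a, img_b: 8-bit grayscale silhouettes. Higher SI means more similar.
    w_kp is an assumed weighting between the two cues.
    """
    # Keypoint cue: fraction of ORB descriptors that cross-check match
    orb = cv2.ORB_create(nfeatures=500)
    k1, d1 = orb.detectAndCompute(img_a, None)
    k2, d2 = orb.detectAndCompute(img_b, None)
    if d1 is None or d2 is None:
        kp_score = 0.0
    else:
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        kp_score = len(matches) / max(len(k1), len(k2), 1)

    # Shape cue: log-scaled Hu moment distance mapped into (0, 1]
    def hu(img):
        m = cv2.HuMoments(cv2.moments(img, binaryImage=True)).ravel()
        return np.sign(m) * np.log10(np.abs(m) + 1e-30)
    shape_score = 1.0 / (1.0 + np.sum(np.abs(hu(img_a) - hu(img_b))))

    return w_kp * kp_score + (1 - w_kp) * shape_score

# Ranking sketch: score a query against a database and return the top 5, e.g.
# ranked = sorted(db, key=lambda k: similarity_index(query, db[k]),
#                 reverse=True)[:5]
```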
- Item (Open Access): An FPGA-based digital triggering system with model-integrated configuration environment for the control of NIM electronics (2012). Mohapi, Lerato Jerfree; Winberg, Simon; Murray, Sean. This dissertation presents a project to develop a real-time trigger that is hardware-reconfigurable, triggers on user-specified events, and captures data for permanent storage and later processing. This triggering platform is planned to replace previous analogue triggering systems that involve the time-consuming manual task of connecting analogue electronics with NIM components; these manual tasks involve a multitude of wiring connections whose timings are error-prone. Multiple ongoing experiments could not time-share the expensive NIM electronics, implying lengthy waits between experiments and inefficient resource usage. The new triggering platform provides significant time savings for physicists setting up experiments, together with a model-based system that speeds up the design and setting-up of experiments.
- Item (Open Access): Galaxy evolution, cosmology and HPC: clustering studies applied to astronomy (2016). Tshililo, Israel R.; Cress, Catherine; Winberg, Simon. Tools to measure clustering are essential for the analysis of astronomical datasets and can potentially be used in other fields for data mining. The Two-Point Correlation Function (TPCF), in particular, is used to characterise the distribution of matter and of objects such as galaxies in the Universe. However, its computation time will be prohibitively slow given the significant increase in the size of datasets expected from future surveys; thus, new computational techniques are necessary in order to measure clustering efficiently. The objective of this research was to investigate methods to accelerate the computation of the TPCF and to use the TPCF to probe an interesting scientific question dealing with the masses of galaxy clusters measured using data from the Planck satellite. An investigation was conducted to explore different techniques and architectures that can be used to accelerate the computation of the TPCF. The code CUTE in particular was selected to test shared-memory systems using OpenMP and GPU acceleration using CUDA. Modifications were then made to the code to improve the nearest-neighbour boxing technique. The results show that the modified code offers significantly improved performance. Additionally, a particularly effective implementation was used to measure the clustering of galaxy clusters detected by the Planck satellite: our results indicated that the clusters were more massive than had been inferred in previous work, providing an explanation for apparent inconsistencies in the Planck data.
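For reference, the standard Landy-Szalay estimator of the TPCF, ξ(r) = (DD − 2DR + RR)/RR, can be written in a few lines with a k-d tree for pair counting. The sketch below is a plain CPU illustration of what the accelerated CUTE kernels compute, not the modified CUTE code; the catalogue sizes and bin edges are arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree

def pair_counts(a, b, edges):
    """Counts of pairs between point sets a and b in each radial bin."""
    cum = cKDTree(a).count_neighbors(cKDTree(b), edges)
    return np.diff(cum)

def landy_szalay(data, rand, edges):
    """xi(r) = (DD - 2DR + RR) / RR, with counts normalised per pair."""
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, data, edges) / (nd * (nd - 1))
    rr = pair_counts(rand, rand, edges) / (nr * (nr - 1))
    dr = pair_counts(data, rand, edges) / (nd * nr)
    return (dd - 2 * dr + rr) / rr

rng = np.random.default_rng(6)
data = rng.uniform(0, 100, size=(2000, 3))   # toy "galaxy" positions
rand = rng.uniform(0, 100, size=(8000, 3))   # unclustered random catalogue
edges = np.linspace(1, 20, 11)               # radial bin edges
print(landy_szalay(data, rand, edges))       # ~0 for an unclustered field
```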