Browsing by Subject "Cloud Computing"
- [Open Access] Automated feature synthesis on big data using cloud computing resources (University of Cape Town, 2020) Saker, Vanessa; Berman, Sonia
The data analytics process has many time-consuming steps. Combining data that sits in a relational database warehouse into a single relation, while aggregating important information in a meaningful way and preserving relationships across relations, is complex and time-consuming. This step is exceptionally important because many machine learning algorithms require a single file as input (e.g. supervised and unsupervised learning, feature representation and feature learning). An analyst is required to manually combine relations while generating new, more impactful information points from the data during the feature synthesis phase of the feature engineering process that precedes machine learning. Furthermore, the entire process is complicated by Big Data factors such as processing power and distributed data storage. There is an open-source package, Featuretools, that uses an innovative algorithm called Deep Feature Synthesis to accelerate the feature engineering step. However, when working with Big Data, it has two major limitations. The first is the curse of modularity: Featuretools stores data in memory to process it, so large data requires a processing unit with a large memory. The second is that the package depends on data being stored in a Pandas DataFrame, which makes using Featuretools with Big Data tools such as Apache Spark a challenge. This dissertation aims to examine the viability and effectiveness of using Featuretools for feature synthesis with Big Data on the cloud computing platform AWS. Exploring the impact of generated features is a critical first step in solving any data analytics problem. If this can be automated in a distributed Big Data environment with a reasonable investment of time and funds, data analytics exercises will benefit considerably.
In this dissertation, a framework for automated feature synthesis with Big Data is proposed and an experiment is conducted to examine its viability. Using this framework, an infrastructure was built to support the process of feature synthesis on AWS, making use of S3 storage buckets, Elastic Compute Cloud (EC2) services, and an Elastic MapReduce cluster. A dataset of 95 million customers, 34 thousand fraud cases and 5.5 million transactions across three different relations was then loaded into the distributed relational database on the platform. The infrastructure was used to show how the dataset could be prepared to represent a business problem, and how Featuretools could be used to generate a single feature matrix suitable for inclusion in a machine learning pipeline. The results show that the approach was viable. The feature matrix contained 75 features generated from 12 input variables, and the process was time-efficient, with a total end-to-end run time of 3.5 hours and a cost of approximately R 814 (approximately $52). The framework can be applied to a different set of data and allows analysts to experiment on a small section of the data until a final feature set is decided, after which they can easily scale the feature matrix to the full dataset. This ability to automate feature synthesis, iterate and scale up will save time in the analytics process while providing a richer feature set for better machine learning results.
- [Open Access] Cloud Computing Benefit Realisation in a South African Public Sector: A postadoption study (2022) Breda, Leigh N; Kyobe, Michael
Background: Cloud Computing is a globally evolving trend that is changing the landscape of Information Technology as we know it. The perceived benefits of Cloud adoption are spurring IT leaders to move to Cloud Computing to maintain a competitive edge, regardless of some of the challenges associated with Cloud adoption. Currently, the predominant reason for organisations to adopt Cloud Computing is the reduction of costs. However, some organisations report that they are not receiving the benefits they expected before adoption. Despite the known fact that cost reduction is not guaranteed, organisations are expected to increase their IT spending on Cloud Computing in the future. Because organisations report that they are not receiving a tangible and easily measurable benefit such as cost reduction, it is imperative for them to measure and confirm that intangible benefits, which are difficult to quantify, are being received. This measured approach is essential to help organisations understand the actualised benefits of Cloud Computing.
Objective: Current literature predominantly focuses on the adoption of Cloud Computing, with the private sector as its consumers. Minimal research has explored Cloud Computing post-adoption with an explicit focus on the South African public sector context. Little is known about whether these organisations have actualised the benefits perceived during the adoption phase, or about how they have measured the degree to which they have benefited from the adoption. The purpose of this research is to contribute to knowledge regarding organisations in the public sector and the factors that influence the actualisation of perceived adoption benefits post-implementation.
Method: The researcher adopted a constructivist ontological stance, an interpretivist epistemology, and an inductive approach to conduct this research. Qualitative data was collected in the form of 20 semi-structured interviews conducted over a period of 12 months. These interviews were conducted in a public sector organisation that has already implemented Cloud solutions and can provide a retrospective view of its adoption. Thematic analysis was used to sort the responses into categories and themes. These themes were further filtered using a research model based on the TOE framework as the lens to structure the data.
Findings: This research revealed a discrepancy between the benefits perceived pre-adoption and the benefits actualised post-adoption across the organisation. This is primarily because IT management did not predefine metrics to determine the degree to which the adoption has benefited the organisation. Secondly, benefits can vary depending on the type of Cloud service and the user role, leaving one part of the organisation very satisfied and another dissatisfied. Lastly, the factor that motivated adoption is not necessarily a factor that influences the continued use of Cloud Computing. External factors such as the COVID-19 pandemic have shifted perceptions and organisational requirements because of the increased pressure to deliver services and work remotely. This increased dependency on Cloud Computing reduced the importance of cost reduction so significantly that, even if the Cloud were to cost more, the organisation would continue to use it because of the additional benefits that Cloud Computing provides.
- [Open Access] Data storage security for cloud computing using elliptic curve cryptography (University of Cape Town, 2020) Buop, George Onyango; Murgu, Alexandru
Institutions and enterprises are moving towards greater service availability and managed risk while, at the same time, aiming to reduce cost. Cloud Computing is a growing technology, thriving in the fields of information communication and data storage. With the proliferation of online activity, more and more information is saved as data every day, which means that more data is being stored in the cloud than ever before. Data that is stored online often holds private information, such as addresses, payment details and medical documentation, and this makes it a target for cyber criminals. There is therefore a growing need to protect this data from threats such as data breach and leakage, data loss, and account takeover or hijacking, among others. Cryptography refers to techniques for securing information and communication, based on mathematical concepts and algorithms that transform messages in ways that are hard to decipher. Cryptography is one of the techniques that can be used to protect data stored in the cloud, as it provides the security properties of data confidentiality and integrity. This research investigates the security issues that affect the storage of data in the cloud. The thesis also discusses previous research work and the currently available technologies and techniques used for securing data in the cloud. It then presents a novel scheme for securing data stored in the cloud by using the Elliptic Curve Integrated Encryption Scheme (ECIES), which provides confidentiality and integrity. The scheme also uses Identity Based Cryptography (IBC) for more efficient key management.
The proposed scheme combines the security of Identity-Based Cryptography (IBC), Trusted Cloud (TC), and Elliptic Curve Cryptography (ECC) to reduce system complexity and provide more security for cloud computing applications. The research shows that it is possible to securely store confidential user data on a Public Cloud such as Amazon S3 or Windows Azure Storage without the need to trust the Cloud Provider and with minimal overhead in processing time. The results of implementing the proposed scheme show faster and more efficient operation in key generation as well as in encryption and decryption. The difference in the time taken for these operations results from the use of the ECC algorithm, which has a small key size and is hence highly efficient compared with other types of asymmetric cryptography. The results obtained show that the scheme is more efficient when compared with other classification techniques in the literature.
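
The feature synthesis step described in the first item (Saker, 2020) can be pictured with a small sketch. This is not Featuretools and not the dissertation's pipeline: it is a plain-Python illustration of the general idea of aggregating a child relation into its parent to produce one feature matrix. The relation contents, column names, and the function `synthesize_features` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy relations; the column names are illustrative only.
customers = [
    {"customer_id": 1, "region": "WC"},
    {"customer_id": 2, "region": "GP"},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]

# Aggregation primitives applied to each parent's child values.
AGGREGATIONS = {"COUNT": len, "SUM": sum, "MEAN": mean, "MAX": max}


def synthesize_features(parents, children, key, value, child_name):
    """Return one row per parent, extended with aggregates of its children."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    matrix = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)
        for name, fn in AGGREGATIONS.items():
            column = f"{name}({child_name}.{value})"
            # COUNT of no children is 0; other aggregates are undefined (None).
            features[column] = fn(values) if values or name == "COUNT" else None
        matrix.append(features)
    return matrix


feature_matrix = synthesize_features(
    customers, transactions, "customer_id", "amount", "transactions"
)
```

Each customer row gains one column per aggregation, so two input columns grow into six. Stacking such aggregations across several related tables is what makes the number of generated features grow well beyond the number of input variables.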
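
The ECIES construction named in the third item (Buop, 2020) can be sketched as follows, to show how one scheme yields both confidentiality and integrity. The abstract does not state the curve, key derivation function, or cipher used, so everything here is an assumption: the curve is secp256k1, the KDF is a counter-mode SHA-256, encryption is an XOR keystream, and integrity comes from HMAC-SHA256. The identity-based key management part of the thesis is not shown. This is a teaching sketch only and is not suitable for protecting real data.

```python
import hashlib
import hmac
import secrets

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over the prime field P).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def ec_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def kdf(shared_x, length):
    """Stretch the shared x-coordinate into `length` bytes of key material."""
    seed = shared_x.to_bytes(32, "big")
    out, counter = b"", 1
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(recipient_pub, plaintext):
    """Return (ephemeral public key, ciphertext, integrity tag)."""
    eph_priv = secrets.randbelow(N - 1) + 1
    eph_pub = ec_mul(eph_priv, G)
    shared_x = ec_mul(eph_priv, recipient_pub)[0]
    keys = kdf(shared_x, len(plaintext) + 32)
    stream, mac_key = keys[: len(plaintext)], keys[len(plaintext):]
    ciphertext = bytes(m ^ k for m, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return eph_pub, ciphertext, tag


def decrypt(recipient_priv, eph_pub, ciphertext, tag):
    """Verify the tag, then recover the plaintext; raise ValueError if tampered."""
    shared_x = ec_mul(recipient_priv, eph_pub)[0]
    keys = kdf(shared_x, len(ciphertext) + 32)
    stream, mac_key = keys[: len(ciphertext)], keys[len(ciphertext):]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return bytes(c ^ k for c, k in zip(ciphertext, stream))
```

In use, the data owner would generate a private scalar, publish `ec_mul(private, G)` as the public key, and upload only the triple returned by `encrypt`. The storage provider never sees the private scalar, which matches the abstract's claim that the provider does not need to be trusted.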