Joining and aggregating datasets using CouchDB
| dc.contributor.advisor | Berman, Sonia | |
| dc.contributor.author | Smith, Zach | |
| dc.date.accessioned | 2019-02-14T13:15:50Z | |
| dc.date.available | 2019-02-14T13:15:50Z | |
| dc.date.issued | 2018 | |
| dc.date.updated | 2019-02-14T11:26:06Z | |
| dc.description.abstract | Data mining typically requires operations that cut across entity boundaries, which are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents with highly isolated boundaries, on which joins cannot be performed directly. This project shows how joins and aggregation can be achieved across entity boundaries in such systems, as encountered, for example, in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means of achieving this: first, datasets are processed via ETL operations; then MapReduce is used to create indices of ordered and aggregated data. Finally, a CouchDB list function is used to iterate through these indices and perform joins, and to compute aggregated values on the joined datasets, such as variance and correlations. The case study shows that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was found that, for the 2014 to 2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics. | |
| dc.identifier.apacitation | Smith, Z. (2018). Joining and aggregating datasets using CouchDB. University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/29530 | en_ZA |
| dc.identifier.chicagocitation | Smith, Zach. "Joining and aggregating datasets using CouchDB." University of Cape Town, Faculty of Science, Department of Computer Science, 2018. http://hdl.handle.net/11427/29530 | en_ZA |
| dc.identifier.citation | Smith, Z. 2018. Joining and aggregating datasets using CouchDB. University of Cape Town. | en_ZA |
| dc.identifier.ris | TY - Thesis / Dissertation AU - Smith, Zach AB - Data mining typically requires operations that cut across entity boundaries, which are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents with highly isolated boundaries, on which joins cannot be performed directly. This project shows how joins and aggregation can be achieved across entity boundaries in such systems, as encountered, for example, in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means of achieving this: first, datasets are processed via ETL operations; then MapReduce is used to create indices of ordered and aggregated data. Finally, a CouchDB list function is used to iterate through these indices and perform joins, and to compute aggregated values on the joined datasets, such as variance and correlations. The case study shows that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was found that, for the 2014 to 2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics. DA - 2018 DB - OpenUCT DP - University of Cape Town LK - https://open.uct.ac.za PB - University of Cape Town PY - 2018 T1 - Joining and aggregating datasets using CouchDB TI - Joining and aggregating datasets using CouchDB UR - http://hdl.handle.net/11427/29530 ER - | en_ZA |
| dc.identifier.uri | http://hdl.handle.net/11427/29530 | |
| dc.identifier.vancouvercitation | Smith Z. Joining and aggregating datasets using CouchDB. University of Cape Town, Faculty of Science, Department of Computer Science, 2018 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/29530 | en_ZA |
| dc.language.iso | eng | |
| dc.publisher.department | Department of Computer Science | |
| dc.publisher.faculty | Faculty of Science | |
| dc.publisher.institution | University of Cape Town | |
| dc.subject.other | Computer Science | |
| dc.title | Joining and aggregating datasets using CouchDB | |
| dc.type | Master Thesis | |
| dc.type.qualificationlevel | Masters | |
| dc.type.qualificationname | MSc |
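
The abstract describes joining datasets by iterating through key-ordered indices and then computing aggregates such as correlation on the joined rows. The following is a minimal sketch of that general idea in Python, not the thesis's implementation (which uses CouchDB MapReduce views and a list function). The function names `merge_join` and `pearson`, and the assumption that each index is a list of `(key, value)` pairs sorted by key with unique keys, are hypothetical and introduced only for illustration.

```python
import math


def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs.

    Assumption: both lists are sorted by key and keys are unique within
    each list, much as rows emitted from an ordered index would be.
    """
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            # Matching keys: emit one joined row and advance both cursors.
            joined.append((left_key, left_value, right_value))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # left row has no partner; skip it
        else:
            j += 1  # right row has no partner; skip it
    return joined


def pearson(pairs):
    """Pearson correlation of (x, y) pairs, computed from running sums.

    Only sums are needed, so the same quantities could be accumulated
    in a single pass while streaming over joined rows.
    """
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    covariance = n * sum_xy - sum_x * sum_y
    variance_x = n * sum_xx - sum_x * sum_x
    variance_y = n * sum_yy - sum_y * sum_y
    return covariance / math.sqrt(variance_x * variance_y)
```

As a usage example with made-up values, `merge_join([("s1", 50), ("s3", 70)], [("s1", 55), ("s2", 60), ("s3", 75)])` keeps only the keys present in both inputs, and the value pairs of the joined rows can then be passed to `pearson`. The merge makes a single pass over each input, which is why the ordering provided by the indices matters.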