Joining and aggregating datasets using CouchDB

dc.contributor.advisor: Berman, Sonia
dc.contributor.author: Smith, Zach
dc.date.accessioned: 2019-02-14T13:15:50Z
dc.date.available: 2019-02-14T13:15:50Z
dc.date.issued: 2018
dc.date.updated: 2019-02-14T11:26:06Z
dc.description.abstract: Data mining typically requires operations that cut across entity boundaries, and these are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents with highly isolated boundaries, on which joins cannot be performed directly. This project shows how joins and aggregation can be achieved across entity boundaries in such systems, as required for example in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means of achieving this: first, datasets are processed via ETL operations; then MapReduce is used to create indices of ordered and aggregated data; finally, a CouchDB list function iterates through these indices to perform joins and to compute aggregated values, such as variance and correlations, on the joined datasets. The case study shows that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was discovered that for the 2014 to 2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics.
dc.identifier.apacitation: Smith, Z. (2018). Joining and aggregating datasets using CouchDB. University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/29530
dc.identifier.chicagocitation: Smith, Zach. "Joining and aggregating datasets using CouchDB." University of Cape Town, Faculty of Science, Department of Computer Science, 2018. http://hdl.handle.net/11427/29530
dc.identifier.citation: Smith, Z. 2018. Joining and aggregating datasets using CouchDB. University of Cape Town.
dc.identifier.uri: http://hdl.handle.net/11427/29530
dc.identifier.vancouvercitation: Smith Z. Joining and aggregating datasets using CouchDB. University of Cape Town, Faculty of Science, Department of Computer Science, 2018 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/29530
dc.language.iso: eng
dc.publisher.department: Department of Computer Science
dc.publisher.faculty: Faculty of Science
dc.publisher.institution: University of Cape Town
dc.subject.other: Computer Science
dc.title: Joining and aggregating datasets using CouchDB
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc
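
The abstract describes joining datasets by iterating through key-ordered indices and computing aggregates such as correlation over the joined rows. The following is only a minimal Python sketch of that general idea (a sort-merge join followed by a single-pass Pearson correlation). It is not the thesis's implementation and does not use any CouchDB API; the function names, the `(key, value)` row layout, the assumption of unique keys per index, and the toy scores are all hypothetical and chosen purely for illustration.

```python
import math


def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both inputs.

    Assumes both inputs are lists of (key, value) rows sorted by key with
    unique keys, standing in for two key-ordered indices.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            yield left_key, left_value, right_value
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # left row has no partner; skip it
        else:
            j += 1  # right row has no partner; skip it


def pearson(pairs):
    """Pearson correlation from running sums, computed in one pass.

    Returns None when fewer than two pairs are given or when either
    variable has zero variance.
    """
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    if n < 2:
        return None
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return None
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)


# Toy, made-up rows keyed by a hypothetical student id (not thesis data).
entry_scores = [("s1", 60), ("s2", 70), ("s3", 80), ("s4", 90)]
final_grades = [("s1", 55), ("s3", 75), ("s4", 85), ("s5", 40)]

joined = list(merge_join(entry_scores, final_grades))
r = pearson((x, y) for _, x, y in joined)
```

Because both inputs are already ordered by key, the join needs only one forward pass over each, and the running sums mean the correlation can be produced without holding the joined rows in memory.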
Files
Original bundle
Name: thesis_sci_2018_smith_zach.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format