A lossy, dictionary-based method for short message service (SMS) text compression

dc.contributor.advisor: Marsden, Gary [en_ZA]
dc.contributor.author: Martin, Wickus [en_ZA]
dc.date.accessioned: 2014-08-13T19:31:24Z
dc.date.available: 2014-08-13T19:31:24Z
dc.date.issued: 2009 [en_ZA]
dc.description.abstract: Short message service (SMS) message compression allows either more content to be fitted into a single message or fewer individual messages to be sent as part of a concatenated (or long) message. While essentially only dealing with plain text, many of the more popular compression methods do not bring about a massive reduction in size for short messages. The Global System for Mobile communications (GSM) specification suggests that untrained Huffman encoding is the only required compression scheme for SMS messaging, yet support for SMS compression is still not widely available on current handsets. This research shows that Huffman encoding might actually increase the size of very short messages and only modestly reduce the size of longer messages. While Huffman encoding yields better results for larger text sizes, handset users do not usually write very large messages consisting of thousands of characters. Instead, an alternative compression method called lossy dictionary-based (LD-based) compression is proposed here. In terms of this method, the coder uses a dictionary tuned to the most frequently used English words and economically encodes white space. The encoding is lossy in that the original case is not preserved; instead, the resulting output is all lower case, a loss that might be acceptable to most users. The LD-based method has been shown to outperform Huffman encoding for the text sizes typically used when writing SMS messages, reducing the size of even very short messages and even, for instance, cutting a long message down from five to two parts. Keywords: SMS, text compression, lossy compression, dictionary compression [en_ZA]
dc.identifier.apacitation: Martin, W. (2009). A lossy, dictionary-based method for short message service (SMS) text compression. (Thesis). University of Cape Town, Faculty of Science, Department of Computer Science. Retrieved from http://hdl.handle.net/11427/6415 [en_ZA]
dc.identifier.chicagocitation: Martin, Wickus. "A lossy, dictionary-based method for short message service (SMS) text compression." Thesis, University of Cape Town, Faculty of Science, Department of Computer Science, 2009. http://hdl.handle.net/11427/6415 [en_ZA]
dc.identifier.citation: Martin, W. 2009. A lossy, dictionary-based method for short message service (SMS) text compression. University of Cape Town. [en_ZA]
dc.identifier.ris [en_ZA]:
TY  - Thesis / Dissertation
AU  - Martin, Wickus
DA  - 2009
DB  - OpenUCT
DP  - University of Cape Town
LK  - https://open.uct.ac.za
PB  - University of Cape Town
PY  - 2009
T1  - A lossy, dictionary-based method for short message service (SMS) text compression
TI  - A lossy, dictionary-based method for short message service (SMS) text compression
UR  - http://hdl.handle.net/11427/6415
ER  -
dc.identifier.uri: http://hdl.handle.net/11427/6415
dc.identifier.vancouvercitation: Martin W. A lossy, dictionary-based method for short message service (SMS) text compression. [Thesis]. University of Cape Town, Faculty of Science, Department of Computer Science, 2009 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/6415 [en_ZA]
dc.language.iso: eng
dc.publisher.department: Department of Computer Science [en_ZA]
dc.publisher.faculty: Faculty of Science [en_ZA]
dc.publisher.institution: University of Cape Town
dc.subject.other: Information Technology [en_ZA]
dc.title: A lossy, dictionary-based method for short message service (SMS) text compression [en_ZA]
dc.type: Master Thesis
dc.type.qualificationlevel: Masters
dc.type.qualificationname: MSc [en_ZA]
uct.type.filetype: Text
uct.type.publication: Research [en_ZA]
uct.type.resource: Thesis [en_ZA]
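
The abstract above describes a coder that lower-cases the text, replaces frequent English words with dictionary codes, and encodes white space economically. The following is only a minimal sketch of that general idea, not the thesis's actual scheme: the word list, the one-byte code layout, and the names DICTIONARY, compress and decompress are all assumptions made for illustration. It also works on 8-bit bytes and ASCII input for simplicity, rather than on the GSM 7-bit alphabet.

```python
# Hypothetical word list; the thesis's real dictionary is not reproduced here.
DICTIONARY = ["the", "to", "you", "and", "a", "i", "in", "is", "it", "me",
              "at", "see", "for", "are", "will", "on", "be", "that", "have", "of"]
INDEX = {word: i for i, word in enumerate(DICTIONARY)}

# Assumed one-byte code layout (illustrative only):
WORD_FLAG = 0x80   # high bit marks a dictionary code; plain ASCII never sets it
SPACE_FLAG = 0x40  # set when a single space follows the word
INDEX_MASK = 0x3F  # leaves room for up to 64 dictionary entries


def compress(text: str) -> bytes:
    """Lossy step: case is discarded. Raises UnicodeEncodeError on non-ASCII."""
    out = bytearray()
    words = text.lower().split(" ")
    for pos, word in enumerate(words):
        has_space = pos < len(words) - 1
        if word in INDEX:
            # Word and its trailing space share one byte.
            out.append(WORD_FLAG | (SPACE_FLAG if has_space else 0) | INDEX[word])
        else:
            # Unknown words are copied literally, with a literal space if needed.
            out += word.encode("ascii")
            if has_space:
                out.append(0x20)
    return bytes(out)


def decompress(data: bytes) -> str:
    parts = []
    for byte in data:
        if byte & WORD_FLAG:
            parts.append(DICTIONARY[byte & INDEX_MASK])
            if byte & SPACE_FLAG:
                parts.append(" ")
        else:
            parts.append(chr(byte))
    return "".join(parts)


message = "See you at the station in 5 MIN"
packed = compress(message)
restored = decompress(packed)  # equals message.lower(); the original case is lost
```

In this toy layout a dictionary word and the space after it cost one byte together, which is where the saving on very short messages comes from; words outside the dictionary cost exactly what they did before.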
Files
Original bundle
Name: thesis_sci_2009_martin_wickus.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format