Exploring the application of Word2Vec to basket transaction data in the grocery retail industry

Master Thesis


In this thesis, we explore the application of Word2vec to basket transaction data provided by a large grocery retailer in South Africa. Word2vec is an algorithm based on representation learning. The objective is to establish whether applying Word2vec to basket transaction data generates product embeddings that capture useful relationships between products. Furthermore, we compare Word2vec's outputs and performance with those of traditional methods for studying product relationships, namely Association Rules Mining (ARM) and Recommendation Systems. The results of the experiments showed that product embeddings created by Word2vec from transaction data are indeed meaningful and useful. Feeding transactions to the neural network in place of sentences produced results analogous to those of a natural language task. Word2vec clearly demonstrated its ability to cluster products that are homogeneous or fulfill similar needs. Furthermore, this kind of product relationship was not provided by the traditional methods, as was clear when comparing its outputs with those of ARM and Recommendation Systems. We also show that Word2vec could potentially provide insight into truly complementary products, which ARM may fail to do. Word2vec also proved to be highly scalable, handling input data 20 times the size of what the traditional methods could handle on a local computer. We end by describing a potential application of the ideas learnt during this study to a real business problem, which we believe could lead to an enhanced customer shopping experience and, in turn, increased revenue and profits for the retailer.
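The idea of using transactions in place of sentences can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say which Word2vec architecture or context window was used, so the sketch assumes a skip-gram style setup in which the whole basket is the context. The product names and the `skipgram_pairs` helper are hypothetical.

```python
from itertools import permutations

# Hypothetical toy baskets; product names are illustrative only
# and are not taken from the retailer's data.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["pasta", "tomato_sauce"],
]


def skipgram_pairs(basket):
    """Return every ordered (target, context) pair of distinct products.

    Items in a basket have no natural order, so this sketch assumes the
    context window spans the whole basket.
    """
    unique_items = sorted(set(basket))
    return list(permutations(unique_items, 2))


# Each basket plays the role of a sentence and each product the role of a word.
pairs = [pair for basket in baskets for pair in skipgram_pairs(basket)]
```

Pairs such as `("bread", "butter")` are the kind of training examples a skip-gram model would learn from. Products that share contexts across many baskets would be expected to end up with similar embeddings, while products that never appear together, such as `bread` and `pasta` here, produce no pairs at all.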