Statistics for Automated feature synthesis on big data using cloud computing resources