Linear regression techniques for identifying influential data and applications in commercial data analysis
dc.contributor.advisor | Troskie, Cas | |
dc.contributor.author | Jacobs, Michael Kalman | |
dc.date.accessioned | 2023-09-27T13:58:18Z | |
dc.date.available | 2023-09-27T13:58:18Z | |
dc.date.issued | 1983 | |
dc.date.updated | 2023-09-27T13:53:04Z | |
dc.description.abstract | Recent literature contains many publications on techniques for identifying extreme data points (outliers) and influential observations or groups in sample data sets. This thesis begins by reviewing the statistics and distributional properties of the standard techniques, viz. the standardized residual as a test for outliers, and Cook's distance as a measure of influence. An outlier test which is distributionally neater than the standardized residual is proposed. In practical applications, ordinary least squares regression is often inappropriate, and the use of biased estimators may be preferable. In this thesis, the existing theory is extended to several alternative regression techniques. Ridge regression and generalized inverse regression are suitable techniques when the cross-product matrix is ill-conditioned. Restricted least squares regression, with exact or stochastic prior information, is used in many econometric applications. Models with selected variables are used to eliminate design faults or to reduce computational effort. New statistics are developed for all these techniques, the distributional results are proved, and computational formulae are developed. Computational problems may arise in the actual use of the various techniques, and these are investigated. Computer programs written in BASIC and suitable for microcomputer use are presented, making the techniques accessible to virtually any commercial environment. The performance of the various techniques is examined, using a controlled simulation study and a number of practical data sets drawn from several areas of South African commerce. This is, as far as can be ascertained, the first extensive practical South African study on the effects of influential data. It is shown that the presence of outliers or influential data can bias the results of any study significantly.
It is recommended that no data analysis should be attempted without a preliminary scan for outliers and influential observations. The techniques presented can be used advantageously even in data sets where the ultimate analysis does not involve linear regression. It is shown that influential data are not merely of nuisance value in the analysis but may contain valuable information in their own right. | |
dc.identifier.apacitation | Jacobs, M. K. (1983). <i>Linear regression techniques for identifying influential data and applications in commercial data analysis</i>. (). Faculty of Commerce, School of Economics. Retrieved from http://hdl.handle.net/11427/38914 | en_ZA |
dc.identifier.chicagocitation | Jacobs, Michael Kalman. <i>"Linear regression techniques for identifying influential data and applications in commercial data analysis."</i> Faculty of Commerce, School of Economics, 1983. http://hdl.handle.net/11427/38914 | en_ZA |
dc.identifier.citation | Jacobs, M.K. 1983. Linear regression techniques for identifying influential data and applications in commercial data analysis. Faculty of Commerce, School of Economics. http://hdl.handle.net/11427/38914 | en_ZA |
dc.identifier.ris | TY - Doctoral Thesis AU - Jacobs, Michael Kalman AB - Recent literature contains many publications on techniques for identifying extreme data points (outliers) and influential observations or groups in sample data sets. This thesis begins by reviewing the statistics and distributional properties of the standard techniques, viz. the standardized residual as a test for outliers, and Cook's distance as a measure of influence. An outlier test which is distributionally neater than the standardized residual is proposed. In practical applications, ordinary least squares regression is often inappropriate, and the use of biased estimators may be preferable. In this thesis, the existing theory is extended to several alternative regression techniques. Ridge regression and generalized inverse regression are suitable techniques when the cross-product matrix is ill-conditioned. Restricted least squares regression, with exact or stochastic prior information, is used in many econometric applications. Models with selected variables are used to eliminate design faults or to reduce computational effort. New statistics are developed for all these techniques, the distributional results are proved, and computational formulae are developed. Computational problems may arise in the actual use of the various techniques, and these are investigated. Computer programs written in BASIC and suitable for microcomputer use are presented, making the techniques accessible to virtually any commercial environment. The performance of the various techniques is examined, using a controlled simulation study and a number of practical data sets drawn from several areas of South African commerce. This is, as far as can be ascertained, the first extensive practical South African study on the effects of influential data. It is shown that the presence of outliers or influential data can bias the results of any study significantly.
It is recommended that no data analysis should be attempted without a preliminary scan for outliers and influential observations. The techniques presented can be used advantageously even in data sets where the ultimate analysis does not involve linear regression. It is shown that influential data are not merely of nuisance value in the analysis but may contain valuable information in their own right. DA - 1983 DB - OpenUCT DP - University of Cape Town KW - Influential data LK - https://open.uct.ac.za PY - 1983 T1 - ETD: Linear regression techniques for identifying influential data and applications in commercial data analysis TI - ETD: Linear regression techniques for identifying influential data and applications in commercial data analysis UR - http://hdl.handle.net/11427/38914 ER - | en_ZA |
dc.identifier.uri | http://hdl.handle.net/11427/38914 | |
dc.identifier.vancouvercitation | Jacobs MK. Linear regression techniques for identifying influential data and applications in commercial data analysis. []. Faculty of Commerce, School of Economics, 1983 [cited yyyy month dd]. Available from: http://hdl.handle.net/11427/38914 | en_ZA |
dc.language.rfc3066 | eng | |
dc.publisher.department | School of Economics | |
dc.publisher.faculty | Faculty of Commerce | |
dc.subject | Influential data | |
dc.title | Linear regression techniques for identifying influential data and applications in commercial data analysis | |
dc.type | Doctoral Thesis | |
dc.type.qualificationlevel | Doctoral | |
dc.type.qualificationlevel | PhD |