eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Predicting the Selling Prices of Used Cars in Pakistan Using Various Statistical Learning Models

Abstract

This thesis explores the relationship between the features of used cars in Pakistan and their selling prices, with the aim of providing insights that help stakeholders in the used-car business make informed decisions. Four statistical learning methods for regression, namely Linear Regression, Elastic Net, Random Forest, and XGBoost, are used to model this relationship predictively and to explain it quantitatively. Starting from a Kaggle dataset of 77,878 rows and 14 columns (including the response variable “price”), the data were preprocessed. Preprocessing included removing outliers, empty values, and missing entries, and regrouping, renaming, and converting certain variables, among other steps. Exploratory data analysis then addressed non-normal or highly correlated numerical variables and overly homogeneous categorical variables. The processed dataset (70,500 rows and 10 columns) was split into an 80% training set and a 20% testing set, and Root Mean Squared Error (RMSE) and the Coefficient of Determination (R²) were chosen as the evaluation criteria. The four models were then fit on the training set, using the natural log of “price” as the response and a baseline design matrix made up of the 9 main predictors and all 36 of their pairwise interactions. After each model was fit with its corresponding dimension-reduction or hyperparameter-tuning procedure, testing RMSE and R² were computed from predictions on the testing set. XGBoost was selected as the final model because it performed best on both criteria. All four models agreed that body type, engine volume, age, and transmission type, together with the interactions between age and each of the other three features, are important predictors of used-car prices in Pakistan.
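The abstract does not include code. The following is a minimal, standard-library-only Python sketch of the bookkeeping around the modeling described above. It is not the author's implementation. It assumes numeric predictors (in practice, categorical predictors such as body type or transmission would first be dummy-encoded, which adds columns). It also assumes a seeded random shuffle for the 80/20 split and metrics computed on the log-price scale. The helper names `design_row`, `train_test_split`, `rmse`, and `r_squared` are hypothetical. `train_test_split` here is written from scratch and is not scikit-learn's function of the same name.

```python
import math
import random
from itertools import combinations


def design_row(x):
    """Return main effects x_1..x_p followed by every pairwise product x_i * x_j (i < j).

    Assumption: predictors are already numeric; categorical ones would be dummy-encoded first.
    """
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return list(x) + interactions


def train_test_split(rows, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle indices reproducibly and hold out `test_frac` for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Nine hypothetical numeric predictors give 9 main effects + 36 interactions = 45 columns.
    print(len(design_row([float(k) for k in range(1, 10)])))  # 45

    # 80/20 split of the 70,500 processed rows.
    train, test = train_test_split(list(range(70_500)))
    print(len(train), len(test))  # 56400 14100

    # The response is modeled on the natural-log scale; exp() maps predictions back to price.
    prices = [1_500_000.0, 2_750_000.0, 4_000_000.0]  # hypothetical prices
    log_prices = [math.log(p) for p in prices]
    print([round(math.exp(v)) for v in log_prices])  # recovers the original prices

    print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]), 3))  # 0.577
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

When RMSE is computed on the log scale, errors behave roughly like relative (proportional) errors in price. The abstract does not say whether the reported RMSE is on the log scale or the price scale, so the choice here is an assumption.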
