eScholarship
Open Access Publications from the University of California

Model selection for probabilistic clustering using cross-validated likelihood

Abstract

Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of model-based probabilistic clustering. The cross-validation approach to model selection is conceptually direct: each candidate model is judged on its out-of-sample predictive performance, that is, on the likelihood it assigns to data held out from fitting. The method is applied to a well-known clustering problem in the atmospheric science literature, using historical records of upper-atmosphere geopotential height in the Northern Hemisphere. Cross-validated likelihood provides strong evidence for three clusters in this data set, giving objective confirmation of earlier results obtained with non-probabilistic clustering techniques.
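The selection rule described above can be illustrated with a minimal sketch. This is not the paper's procedure. It assumes one-dimensional data, Gaussian mixture components fitted by EM, plain v-fold cross-validation, and a deterministic quantile-based initialization chosen only for reproducibility. All function names (`fit_gmm_1d`, `cv_log_likelihood`, `select_k`) are hypothetical, and the data are synthetic rather than the geopotential-height records. For each candidate number of components k, the mixture is fitted on all folds but one, the held-out fold is scored by its log-likelihood, and the k with the highest average held-out score is selected.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def log_weighted_densities(x, weights, means, variances):
    """log(w_j) + log N(x_i | mu_j, var_j) for every point i and component j; shape (n, k)."""
    return (np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)


def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (quantile initialization is an assumption)."""
    n = x.size
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(component j | x_i)
        log_p = log_weighted_densities(x, weights, means, variances)
        r = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = r.sum(axis=0) + 1e-10
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    return weights, means, variances


def log_likelihood(x, params):
    """Total log-likelihood of the points x under a fitted mixture."""
    return logsumexp(log_weighted_densities(x, *params), axis=1).sum()


def cv_log_likelihood(x, k, n_folds=5, seed=0):
    """v-fold cross-validated log-likelihood, averaged per held-out point."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), n_folds)
    total = 0.0
    for test_idx in folds:
        params = fit_gmm_1d(np.delete(x, test_idx), k)  # fit on the remaining folds
        total += log_likelihood(x[test_idx], params)     # score on the held-out fold
    return total / x.size


def select_k(x, k_max=6, n_folds=5, seed=0):
    """Return the k with the highest cross-validated likelihood, plus all scores."""
    scores = {k: cv_log_likelihood(x, k, n_folds, seed) for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated 1-D clusters (not the geopotential data).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-6, 1, 150), rng.normal(0, 1, 150), rng.normal(6, 1, 150)])
best_k, scores = select_k(x)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

The design choice that matters is scoring on held-out data. The maximized in-sample likelihood can only grow as k increases, so it cannot choose k by itself, whereas the held-out score stops rewarding components that merely fit noise in the training folds.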
