Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations bannerUC San Diego

On the theory and application of pattern maximum likelihood

Abstract

Pattern Maximum Likelihood (PML) is a method of probability estimation that works well for large alphabets. It does not assume that all elements from the unknown alphabet have been observed. PML outperforms the traditional Maximum Likelihood for sequences, and it is particularly useful when the sample size is small. In this dissertation we study both the theory and application of PML. For the theory part, we extend the previous results on the properties of PML, and also show how to find the PML distributions analytically for patterns of simple forms. For general patterns, PML probabilities can be approximated using a previously developed EM algorithm, which we will prove to be equivalent to a generalized Gradient Ascend Method. We also use the algorithm to conduct experiments on different distributions and evaluate the performance of PML. In addition, we investigate the calculation of pattern probability. We show that the pattern probability is closely related to symmetric polynomials, and it can be written as a summation over graphs using power sums. Along the way we reveal a relation between pattern probability and the enumeration of certain connected graphs as well as inversion-free trees. For applications, we show how PML can be used to predict the number of new symbols that would appear in a future sample. We conduct experiments on various distributions and compare PML to the method of Good & Toulmin and the method of Efron & Thisted. We demonstrate that PML outperforms the other methods even if the future sample size is large. Finally we apply PML to authenticating the authorship of the Taylor poem, attributed to Shakespeare, and conclude that it is consistent with Efron and Thisted's models. PML deals with samples from a single distribution. In the last part of this dissertation we extend PML to set-patterns where multiple samples are observed from concurrent Bernoulli processes. Analogous to the single-process patterns, we show that for certain forms of set-patterns we can find the exact Set-pattern Maximum Likelihood (SPML) probabilities analytically. Furthermore, for general set- patterns we extend the previous EM algorithm to approximate the SPML probabilities. We also show that for samples taken from Poisson distributions the set-pattern is reduced to the single-process pattern problem

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View