eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding

Abstract

Place name disambiguation, i.e., toponym disambiguation or toponym resolution, is the task of correctly identifying a place from a set of places sharing a common name. It contributes to a variety of tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, which makes the task harder for short texts, where such clues are scarce. Here I propose a novel approach to the disambiguation of place names from short texts that integrates three models: entity co-occurrence, topic modeling, and word embedding. The first model uses Linked Data to identify related entities and thereby improve disambiguation quality. The second model uses topic modeling to differentiate places based on the terms used to describe them. The third model uses word embeddings to uncover the semantic relatedness between places and contexts. I evaluate this approach using a corpus of short texts collected through web scraping, determine suitable weights for the models, and demonstrate that the combined model, i.e., the Things and Strings Model, outperforms benchmark systems such as DBpedia Spotlight, TextRazor, and Open Calais by up to 85% in F-score and 46% in Precision at 1. A web service demonstrates the proposed method and can serve as a building block for applications that need place name recognition and disambiguation.
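
The abstract says that three models are integrated and that suitable weights are determined for them, but it does not state how the scores are combined. The Python sketch below illustrates one plausible reading, a weighted linear sum of per-model scores followed by picking the top-ranked candidate. The function names, the candidate places, the scores, and the weights are all hypothetical and are not taken from the dissertation.

```python
from typing import Dict, Tuple

# Hypothetical per-candidate scores, one per model, in this order:
# (entity co-occurrence, topic modeling, word embedding).
ModelScores = Dict[str, Tuple[float, float, float]]


def combine(scores: ModelScores,
            weights: Tuple[float, float, float]) -> Dict[str, float]:
    """Return a weighted sum of the three model scores for each candidate.

    A linear combination is an assumption made for illustration only.
    """
    return {
        candidate: sum(w * s for w, s in zip(weights, triple))
        for candidate, triple in scores.items()
    }


def disambiguate(scores: ModelScores,
                 weights: Tuple[float, float, float]) -> str:
    """Pick the candidate place with the highest combined score."""
    combined = combine(scores, weights)
    return max(combined, key=combined.get)


# Invented scores for three places that share the name "Washington".
candidates: ModelScores = {
    "Washington, D.C.": (0.6, 0.5, 0.7),
    "Washington (state)": (0.3, 0.4, 0.2),
    "Washington, Pennsylvania": (0.1, 0.1, 0.1),
}
weights = (0.4, 0.3, 0.3)  # illustrative weights, not the tuned values

best = disambiguate(candidates, weights)
print(best)
```

Under this reading, Precision at 1 would be the fraction of test texts for which the top-ranked candidate returned by `disambiguate` matches the correct place.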
