UCLA Electronic Theses and Dissertations
Feature Representation in Mining and Language Processing

Abstract

Feature representation has been one of the most important factors in the success of machine learning algorithms. Since 2006, deep learning has been widely applied to problems across many disciplines and has, in most cases, reset state-of-the-art results, thanks to its ability to learn highly abstract representations of data. I focus on extracting additional structural features in network analysis and natural language processing (NLP) by learning novel vector-based representations, commonly known as embeddings.

For network analysis, I propose learning representations for nodes (node embeddings) for social network applications. The embeddings are computed from the attributes and links of the nodes in the network. Experimental studies on community detection and mining tasks suggest that node embeddings can reveal deeper structure in the network.
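
The abstract does not spell out the embedding algorithm, so the following is a minimal, hypothetical sketch of one way links and attributes could both feed a node embedding: a DeepWalk-style skip-gram trained over random walks, concatenated with a one-hot attribute vector. The example graph, hyperparameters, and the concatenation scheme are all illustrative assumptions, not the dissertation's method.

```python
# Hypothetical sketch: node embeddings from links (random walks) plus attributes.
import random
import numpy as np
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # stand-in social network with a 'club' node attribute

def random_walks(graph, num_walks=10, walk_len=20):
    """Generate uniform random walks, one batch starting from every node."""
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes():
            walk = [node]
            while len(walk) < walk_len:
                nbrs = list(graph.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

# Link structure: skip-gram over random walks (DeepWalk-style).
model = Word2Vec(random_walks(G), vector_size=32, window=5,
                 min_count=1, sg=1, epochs=5)

# Attributes: append a toy one-hot encoding of the 'club' label so the
# final embedding reflects both the links and the attributes of a node.
clubs = sorted({G.nodes[n]["club"] for n in G.nodes()})

def embed(node):
    attr = np.eye(len(clubs))[clubs.index(G.nodes[node]["club"])]
    return np.concatenate([model.wv[str(node)], attr])

print(embed(0).shape)  # (32 + 2,) per node
```

Concatenation is the simplest way to make the final vector reflect both sources; a learned fusion of the two would be a natural refinement.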

For NLP, I address representation learning at three levels: words, word relations, and linguistic expressions. First, I propose extending the standard word-embedding training process into two phases, treating context as second order in nature. This strategy effectively computes embeddings for the polysemous concepts of words, adding an extra conceptual layer on top of standard word embeddings. Second, I introduce representations of "semantic binders" for words. These representations are learned using categorial grammar and are shown to handle disambiguation effectively, especially when the meaning of a word depends heavily on its context. Finally, I present a three-layer framework that learns representations for linguistic expressions, addressing the semantic compositionality problem with recurrent neural networks driven by combinatory rules from categorial grammar. This strategy specifically targets a limitation of recurrent neural network approaches: deciding how, and when, to include individual information in the compositional embedding. The framework is flexible and can be integrated with the proposed representations. I evaluate the proposed representations on several NLP applications: word analogies, subject-verb-object agreement, paraphrasing, and sentiment analysis.
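
To make the two-phase idea concrete, here is a hypothetical sketch in which phase one trains standard word embeddings and phase two clusters the per-occurrence context vectors of a polysemous word into concept-level embeddings. The toy corpus, window size, and the KMeans clustering step are illustrative assumptions standing in for the dissertation's second-order treatment of context, not its actual procedure.

```python
# Hypothetical two-phase sketch: standard embeddings, then concept embeddings.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

corpus = [
    ["deposit", "money", "at", "the", "bank"],
    ["the", "bank", "approved", "the", "loan"],
    ["we", "sat", "on", "the", "river", "bank"],
    ["fish", "near", "the", "bank", "of", "the", "river"],
]

# Phase 1: standard word embeddings.
w2v = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

# Phase 2: represent each occurrence of the target word by the average of
# its context word vectors, then cluster occurrences into concepts.
def context_vectors(word, window=2):
    vecs = []
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == word:
                ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
                vecs.append(np.mean([w2v.wv[c] for c in ctx], axis=0))
    return np.array(vecs)

occurrences = context_vectors("bank")
senses = KMeans(n_clusters=2, n_init=10).fit(occurrences)

# Each cluster centroid acts as one concept embedding for "bank"
# (financial vs. riverside), layered on top of the single word vector.
print(senses.cluster_centers_.shape)  # (2, 16)
```

The centroids add the extra conceptual layer described above: the word keeps its standard embedding, while each induced cluster supplies a vector for one of its senses.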
