eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Towards Extracting Protein-Compound Interactions from BioChemical Patents

Abstract

In this work, we present a protein entity tagging and normalization process focused on data extraction from biochemical patents. The project acts as a single stage in the pipeline of general chemical interaction extraction. Novel to this work is the character-embedding approach to mention identification and normalization. Additionally, this is the first work to use a Siamese network and a prototypical network to augment protein database normalization. Our results show that character embeddings provide a reasonable approach to protein entity extraction, achieving up to 6% better results than previous work, and that normalization tasks can be improved significantly with a learned embedding space.
