<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <atom:link rel="self" type="application/rss+xml" href="https://escholarship.org/uc/ucsflibrary_asc/rss"/>
    <ttl>720</ttl>
    <title>Recent ucsflibrary_asc items</title>
    <link>https://escholarship.org/uc/ucsflibrary_asc/rss</link>
    <description>Recent eScholarship items from Archives &amp; Special Collections Projects</description>
    <pubDate>Fri, 15 May 2026 09:06:23 +0000</pubDate>
    <item>
      <title>2024 Industry Documents Undergraduate Summer Fellowship - JUUL Labs Collection Final Report</title>
      <link>https://escholarship.org/uc/item/3cf2389w</link>
      <description>This report, developed as part of the 2024 UCSF Industry Documents Library Undergraduate Summer Fellowship, examines four distinct projects that leverage natural language processing and data science within the context of the JUUL Labs Collection and the broader IDL. Project One investigates the optical character recognition (OCR) accuracy of low-quality and handwritten documents in the absence of ground truth data. Project Two explores the implementation of embedding search algorithms and visualizations aimed at enhancing the relevance of document recommendations for users. Project Three employs txt-ferret to conduct a thorough scan of a substantial corpus of industry documents to identify sensitive information, including credit card numbers. Finally, Project Four assesses the biases present in large language model (LLM) summarization through the lens of sentiment analysis.</description>
      <guid isPermaLink="true">https://escholarship.org/uc/item/3cf2389w</guid>
      <pubDate>Mon, 21 Oct 2024 00:00:00 +0000</pubDate>
      <author>
        <name>Lichtstein, Gordon</name>
      </author>
    </item>
    <item>
      <title>Silence in OCR: What Could Handwritten Documents Tell Us?</title>
      <link>https://escholarship.org/uc/item/6z8709hd</link>
      <description>&lt;p&gt;This report, produced as part of the UCSF Archives and Special Collections Summer Fellowship program, explores the efficacy of Optical Character Recognition (OCR) technology in processing archival documents. OCR technology, which automates the extraction of text from images, has advanced significantly in recent years, providing substantial benefits for archival organizations by making vast amounts of previously “hidden” data more accessible. This study specifically examines the disparities in OCR quality between handwritten and typewritten documents, highlighting that OCR’s effectiveness is considerably lower for handwritten texts. This discrepancy results in biases and underrepresentation in datasets, particularly affecting the accessibility and utility of handwritten documents from historical archives.&lt;/p&gt;&lt;p&gt;Utilizing a dataset comprising documents related to AIDS/HIV activism from the 1980s and 1990s, this project evaluates the performance of three OCR tools: Tesseract, Google Cloud...</description>
      <guid isPermaLink="true">https://escholarship.org/uc/item/6z8709hd</guid>
      <pubDate>Wed, 2 Oct 2024 00:00:00 +0000</pubDate>
      <author>
        <name>Zhang, Theo</name>
      </author>
    </item>
  </channel>
</rss>
