Docuvela Blog

Sharing our knowledge and experiences with the content services community

Getting Started with Generative AI – Document Summarization

Sep 28, 2023 | AI | 0 comments

As previously shared on the Docuvela blog, we are exploring Generative AI within the Content Services space – specifically looking at how this groundbreaking technology can be utilized quickly and efficiently without a large investment in time or resources.

With its ability to process information swiftly and cost-effectively, generative AI is emerging as a game-changer for document summarization. In this blog series installment, we’ll delve into how generative AI can be harnessed to efficiently distill lengthy documents into concise, coherent summaries.  These AI-generated summaries provide organizations with a powerful tool to enhance productivity, streamline information retrieval, and make informed decisions, all without requiring a significant investment of time or money.

Why Summarize Documents?

In many content services applications, information overload can be a common challenge. To help overcome this challenge, document summarization plays a pivotal role in optimizing content retrieval. With document summaries readily available, users can quickly grasp the essence of a document without having to sift through pages of text. This convenience proves invaluable in search result listings since users can efficiently gauge the relevance of documents at a glance. By reducing the time and effort required to locate pertinent information, document summarization not only boosts productivity but also enhances the user experience. It empowers users to swiftly pinpoint the content that aligns with their objectives, improving user efficiency. This, in turn, results in faster decision-making and improved overall workflow within the organization.

ChatGPT Summarization Results

As part of the process of founding Docuvela, we wrote a number of policy documents for items such as password management, data backup, disaster recovery, customer data protection, and more.  To test how well ChatGPT could summarize documents, we decided to run a few of our policy documents through the ChatGPT API.  Here are the results:

POL-101 Password Management Policy

This document is a password management policy for Docuvela, LLC. The policy describes the requirements for acceptable password selection and maintenance, including guidelines for creating strong passwords and best practices for password management. It applies to all employees accessing systems that hold or transmit Docuvela data and emphasizes the importance of protecting passwords by not sharing them, not writing them down or storing them online, and not reusing them across multiple systems or applications. The policy also recommends the use of multi-factor authentication, role-based access controls, and regular password changes. Access should be removed promptly when an employee is terminated, changes roles, or is deemed a security risk.

POL-102 Customer Content and Data Policy

Docuvela’s Customer Content and Data Policy is aimed at protecting customer electronic confidential information by ensuring its integrity, availability, and confidentiality. The policy outlines access management practices, user authentication procedures, and data retention guidelines to ensure authorized personnel with a genuine need can access customer data. It emphasizes the importance of maintaining records of access permissions, promptly revoking access when no longer necessary, and implementing secure data purging practices. The policy applies to all employees involved in Docuvela’s operations and strives to maintain a secure environment for customer information. Through these practices, Docuvela is committed to upholding the highest standards of data protection and confidentiality.

At first glance, we think these summaries are pretty accurate!  They are a tad bit long though, so through prompt engineering, we had ChatGPT create shorter summaries.  Here are the results:

POL-101 Password Management Policy

This document provides guidance on password management for employees of Docuvela, LLC. It includes requirements for password strength, password management practices, and the removal of access for terminated or role-changed employees.

POL-102 Customer Content and Data Policy

This policy outlines Docuvela’s requirements for protecting customer data. It includes guidelines on access management, user authentication, data retention, and secure data purging. Docuvela is committed to safeguarding customer information and maintaining a secure environment.

Based on the output above, we would think that the shorter summaries would work best for most use cases, but depending on the content being summarized, the longer formats may work as well.

How to use AI-Generated Summaries

There are a number of ways to integrate content summaries  into a content services application:

  • Search – when searching for content, allow the Search API to search against content summaries as well as the metadata and full-text content.
  • Search Results – when displaying search results, display summaries to improve user efficiency when locating their desired content.  Summaries could be displayed inline or via a tooltip that appears upon hovering or tapping for mobile devices.
  • Document View – when viewing a document, display the summary content prominently so that that user can quickly get the gist of what the document contains.

While working through this R&D investigation effort, there were a few other items that came to mind for implementing AI-generated document summaries into a content services application.

  • Prompt engineering is important.  How the API is called with specific prompts prior to asking ChatGPT for the summary can drastically affect the output.  Controlling factors like length, tone, and overall consistency would be a good use case for fine tuning the model.
  • Performance of ChatGPT is good, but it’s not instantaneous.  For our relatively small documents, summaries generated in approximately 2-3 seconds for small requests and 7-10 seconds for larger requests.  Care would need to be taken when integrating this properly into a content services application so that functions such as document migration and fetching are not burdened.
  • We would recommend targeting AI-generated summaries to a subset of documents, based on type or aspect, to limit unnecessary summarization of documents that do not require it.  API costs and overall performance will benefit from proper targeting of the documents that need summarization.

Content summarization is just one way that we see for customers to get started with generative AI in content services applications without requiring a large investment of time or money. It also allows the IT organization and the business to familiarize, learn and test the AI technology in a simple, straightforward way before undertaking more complex AI implementations.  Contact us if you’d like to discuss your content services application further and be sure to follow the Docuvela blog, LinkedIn, and Twitter/X as we continue to explore the AI possibilities.

0 Comments

Leave a Reply

Discover more from Docuvela

Subscribe now to keep reading and get access to the full archive.

Continue reading