Prompt-Based Table Extraction

Docsumo's Table Extraction feature lets users supply table-level and column-level prompts to the LLM (Large Language Model). Defining specific column identifiers allows precise extraction of data from complex tabular structures. This targeted approach improves accuracy and efficiency, making it easier to handle intricate tables with multiple formats and varying data types.

Feature Overview

Table Identifier

The Table Identifier functionality enables users to define specific keys that serve as identifiers for extracting information across all documents or certain document types. These key values are configured in the identifier settings, guiding the LLM to perform targeted extractions with high accuracy by focusing on the specified key.

Table Field Settings

Table Field Settings lets users write detailed prompts for individual table columns, enabling precise data extraction from the document's tabular structures. Users can also enter a prompt for the entire table so that the extraction as a whole follows their requirements.

Both functionalities are housed within a single settings block, which users can easily rename, add, or delete as needed, providing flexibility and control over the extraction process.

How to set up prompt-based extraction for tables

Here's a step-by-step guide to setting up prompt-based extraction for table data:

Step 1: Access the Settings:

  • Open the Edit Fields (Field Settings) page for the document type you want to change.
  • Click the settings option of the line item.

Step 2: Open the Line Item Settings:

  • Navigate to the Advanced tab in the Line Item settings.

Step 3: Set Up the Table Identifier:

  • In the Settings block, find the Table Identifier section.

  • You will see two options:

    • Apply block settings to all documents: This applies the block's settings to every document of the same type.
    • Apply block settings to specific documents: This applies the block's settings only to documents that match certain criteria.

Step 4: Setting Document-Specific Identifiers:

  • If you select the "Apply block settings to specific documents" option:

    • Under Field Name, choose the field that should serve as the identifier (e.g., "Account Number").
    • Under Value, input the corresponding value for the identifier.
    • Adjust the Match Score if needed (default is 80) to determine how closely the values must match.
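
The effect of the Match Score can be illustrated with a small sketch. How Docsumo actually computes the score is not described here, so the example assumes a simple character-similarity ratio scaled to 0-100, using Python's standard difflib module. The function names are hypothetical; only the default threshold of 80 comes from the step above.

```python
from difflib import SequenceMatcher


def match_score(extracted: str, expected: str) -> float:
    """Return a 0-100 similarity score (assumed metric, not Docsumo's actual one)."""
    a = extracted.strip().lower()
    b = expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio() * 100


def identifier_matches(extracted: str, expected: str, threshold: float = 80) -> bool:
    """True when the extracted field value is close enough to the configured value."""
    return match_score(extracted, expected) >= threshold


# A one-character slip still clears the default threshold of 80,
# while an unrelated value does not.
print(identifier_matches("ACC-12345", "ACC-12346"))  # True
print(identifier_matches("ACC-12345", "INV-0007"))   # False
```

Under this assumed metric, lowering the threshold lets noisier values count as a match, while raising it toward 100 requires a near-exact match.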


Step 5: Add Additional Identifiers:

  • If necessary, click Add Identifier to set multiple key-value pairs to refine the scope of documents further.

Step 6: Set Up Prompts in Table Field Settings:

  • After setting the table identifier, scroll down to the Table Field Settings section within the same block.
  • Here, you can define specific prompts for each column in the table (e.g., "Extract the date of each transaction" for the Date column, or "Identify the debit amount" for the Debit column).
  • You can also write prompts for the entire table to guide the LLM in extracting data accurately from complex tabular structures.
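
To make the relationship between the table-level prompt and the column-level prompts concrete, the sketch below models them as plain Python data and joins them into a single instruction. This is only an illustration: the class name, field names, and the way the text is assembled are assumptions, not Docsumo's internal format. The two column prompts are the examples from the step above; the table-level prompt is an invented example.

```python
from dataclasses import dataclass, field


@dataclass
class TableFieldSettings:
    """Hypothetical container for the prompts entered in Table Field Settings."""
    table_prompt: str = ""
    column_prompts: dict[str, str] = field(default_factory=dict)


def build_instruction(settings: TableFieldSettings) -> str:
    """Combine the table prompt and per-column prompts into one block of text."""
    lines = []
    if settings.table_prompt:
        lines.append(f"Table: {settings.table_prompt}")
    for column, prompt in settings.column_prompts.items():
        lines.append(f"Column '{column}': {prompt}")
    return "\n".join(lines)


settings = TableFieldSettings(
    table_prompt="Extract every row of the transactions table.",  # invented example
    column_prompts={
        "Date": "Extract the date of each transaction",
        "Debit": "Identify the debit amount",
    },
)
print(build_instruction(settings))
```

Keeping one short prompt per column, plus one table-wide prompt for rules that span columns, mirrors the layout of the Table Field Settings section.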

Step 7: Modify and Manage Settings Blocks:

  • You can rename the Settings Block by clicking the pencil icon next to the block's name.
  • To delete an existing block, click the trash icon.
  • If additional blocks are needed, click Add Settings Block to create new ones with separate table identifiers and prompts.
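
One way to picture several settings blocks side by side is as an ordered list in which each block either applies to all documents or carries its own identifiers. The sketch below picks the first block whose identifiers all match a document's extracted fields. The selection order, the rule that every identifier must match, and all names in the code are assumptions made for illustration; the similarity measure is a stand-in based on Python's difflib, not Docsumo's actual scoring.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Identifier:
    field_name: str
    value: str
    match_score: float = 80  # default value mentioned in the guide


@dataclass
class SettingsBlock:
    name: str
    apply_to_all: bool = True
    identifiers: list[Identifier] = field(default_factory=list)


def similarity(a: str, b: str) -> float:
    """Assumed 0-100 similarity; the real scoring method is not documented here."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio() * 100


def block_applies(block: SettingsBlock, doc_fields: dict[str, str]) -> bool:
    if block.apply_to_all:
        return True
    # Assumption: a document-specific block needs at least one identifier,
    # and every identifier must reach its match score.
    return bool(block.identifiers) and all(
        similarity(doc_fields.get(i.field_name, ""), i.value) >= i.match_score
        for i in block.identifiers
    )


def select_block(blocks, doc_fields) -> Optional[SettingsBlock]:
    """Return the first applicable block, or None (assumed precedence)."""
    return next((b for b in blocks if block_applies(b, doc_fields)), None)


blocks = [
    SettingsBlock("Bank A", apply_to_all=False,
                  identifiers=[Identifier("Account Number", "ACC-12345")]),
    SettingsBlock("Default"),
]
print(select_block(blocks, {"Account Number": "ACC-12345"}).name)  # Bank A
print(select_block(blocks, {"Account Number": "ZZ-999"}).name)     # Default
```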


Should you have any questions or encounter any issues during the process, feel free to reach out to us at [email protected], and we'll be more than happy to help you.