Extraction

Efficient data extraction is pivotal for streamlined document processing. Docsumo provides diverse data extraction options, enabling you to define how you want to extract information from documents. This support document explains the data extraction options available, as well as how to choose and implement the most suitable model for your specific needs.

Data Extraction Options

Base Models
Base models are pre-built extraction solutions that seamlessly extract data from documents without manual training. These models are ready to use and cover a wide range of data extraction needs.

  1. Baseline Model: A foundational, system-default model that extracts reliable data from various document types without requiring specific training. It is available only for pre-trained document types.
  2. AI Assist: This model offers generic learning capabilities and suggests values during annotation, reducing manual effort.
  3. Few-Shot Learning: This model uses your historical actions and interactions to predict results for similar documents. It improves efficiency without explicit training.

Advanced Models
Advanced models offer customised data extraction solutions for complex and specialized use cases. These models can be trained to meet specific requirements.

  1. Key-Value Model: Designed to extract key-value information (e.g., names, dates, amounts) from documents. It can be trained to accurately identify and extract essential data.
  2. Table Model: Ideal for extracting tabular data from documents like invoices, receipts, and financial reports. Annotation of table structures helps train the model for effective extraction.

Selecting the Right Model
Choosing the appropriate data extraction model depends on the complexity and uniqueness of your document processing needs.

  1. Consider Document Type: If your documents are common and generic, base models like the Baseline Model or AI Assist might be sufficient for accurate extraction.
  2. Custom Requirements: For specific fields or complex layouts, advanced models like the Key-Value Model or Table Model can be trained to align with your unique extraction needs.
  3. Training Data: If you have sufficient training data and specific requirements, advanced models can be fine-tuned to optimise extraction accuracy.
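The selection criteria above can be sketched as a small decision helper. This is only an illustration of the reasoning, not part of Docsumo or its API: the DocumentProfile class, its fields, and the suggest_models function are hypothetical names, and the rules assume that advanced models are worth training only when you have both a specific need and enough training data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentProfile:
    # All fields are hypothetical inputs used for illustration only.
    is_pretrained_type: bool   # the document type is a pre-trained type
    has_tables: bool           # documents contain tabular data to extract
    has_custom_fields: bool    # specific fields or complex layouts are needed
    has_training_data: bool    # enough annotated samples exist for training


def suggest_models(profile: DocumentProfile) -> List[str]:
    """Return model names from this guide that fit the given profile."""
    suggestions: List[str] = []

    # Advanced models: assumed worthwhile only when training data exists.
    if profile.has_custom_fields and profile.has_training_data:
        suggestions.append("Key-Value Model")
    if profile.has_tables and profile.has_training_data:
        suggestions.append("Table Model")

    # Otherwise fall back to base models, which need no training.
    if not suggestions:
        if profile.is_pretrained_type:
            # The Baseline Model is limited to pre-trained document types.
            suggestions.append("Baseline Model")
        suggestions.append("AI Assist")

    return suggestions


if __name__ == "__main__":
    generic_invoice = DocumentProfile(
        is_pretrained_type=True,
        has_tables=False,
        has_custom_fields=False,
        has_training_data=False,
    )
    print(suggest_models(generic_invoice))
```

In this sketch, a common document with no custom requirements is pointed to the base models, while a document with custom fields or tables and enough annotated samples is pointed to the matching advanced model. The actual choice is still made in the document type settings described below.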

How to Use Different Extraction Methods

Follow these steps to set up extraction:

Step 1. Log in to your Docsumo account.

Step 2. Navigate to the Document Type Settings.

  • On the card for the document type you want to set up, open the document type settings.

Step 3. Go to "Extraction".

  • Locate "Extraction" in the left-side navigation and click it.

Step 4. Select the Model.

  • Enable or select the model that best fits your document processing requirements.

If the model you selected requires training, train it first and then link it to the document type.

Data extraction flexibility is essential in document processing. By offering both base and advanced models, Docsumo empowers you to tailor data extraction to your specific needs. Whether you require out-of-the-box solutions or customised approaches, the right model selection enhances accuracy, efficiency, and overall document processing success.