Training a Model

Training a model enables you to customise data extraction and streamline document processing. This support document provides a step-by-step guide on how to train a model using Docsumo's intuitive interface. By following these instructions, you can create a custom model tailored to your specific document type and improve the accuracy and efficiency of data extraction.

Step 1. Access the Models & Training Page

  • Log in to your Docsumo account using your credentials.
  • In the left-hand navigation bar, click the "Models & Training" option. This will direct you to the Models & Training page.

Step 2. Initiate Model Training

  • On the Models & Training page, find and click the "New model" button. This will open a popup window to begin the model training process.
  • Inside the "New model" popup, select the Document type you wish to train the model for. This choice ensures that the model is optimized for accurate data extraction from the selected document type.
  • Choose the type of model you wish to train. Docsumo offers multiple model options based on the requirements of your chosen document type.
    • Type of models
      • Key-Value model: Trains a custom model only on the key-value fields of the document.
      • Table Model: Trains a custom model on the table data of the document.
      • COA classification: This option is available only if the selected document type is Profit & loss or balance sheet. You can train your own custom COA after approving 20 documents of this type.
        * We understand that the key-value and table models should be combined. Trust us, our engineers are working hard to integrate these into a single option, so you don't have to keep track of two models.
  • Once you've chosen the document type and model type, it's time to select the Training Dataset. You have two options:
    • All Approved Files: Choose this option to use all the documents previously approved in your account for the selected document type as the training dataset. This helps the model learn from existing data.
    • New Approved Files: If you want to focus on recent data for training, select this option and specify the date after which files must have been approved to be included in the new training.
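The difference between the two dataset options can be pictured with a small sketch. This is only an illustration of the selection logic, not Docsumo's implementation or API: the `ApprovedFile` class, the `select_training_files` function, and the mode names "all" and "new" are hypothetical, and treating the cut-off as strictly "after" the chosen date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ApprovedFile:
    """Hypothetical record for an approved document."""
    name: str
    approved_on: date


def select_training_files(
    files: List[ApprovedFile],
    mode: str,
    after: Optional[date] = None,
) -> List[ApprovedFile]:
    """Pick the training dataset.

    mode="all" mirrors "All Approved Files"; mode="new" mirrors
    "New Approved Files" and needs a cut-off date.
    """
    if mode == "all":
        return list(files)
    if mode == "new":
        if after is None:
            raise ValueError("a cut-off date is required for mode 'new'")
        # Assumption: only files approved strictly after the date count.
        return [f for f in files if f.approved_on > after]
    raise ValueError(f"unknown mode: {mode!r}")


files = [
    ApprovedFile("january.pdf", date(2024, 1, 5)),
    ApprovedFile("march.pdf", date(2024, 3, 1)),
]
recent = select_training_files(files, "new", after=date(2024, 2, 1))
print([f.name for f in recent])
```

Keep in mind that a narrow date range also shrinks the dataset, so the selected files still need to be numerous enough for training.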


  1. You will need a minimum of 20 annotated/confirmed documents to train a model.
  2. Mandatory Parameters for training a model:
    1. Train From
    2. Select Model
    3. Sample Dataset
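
The two requirements above work like a checklist to run through before clicking "Train". The sketch below expresses that checklist in Python. It is a hypothetical helper for illustration only, not part of Docsumo: the function name `training_blockers`, the dictionary layout, and the example values are all assumptions, while the 20-document minimum and the three parameter names come from the notes above.

```python
from typing import Dict, List

MIN_APPROVED_DOCUMENTS = 20
MANDATORY_PARAMETERS = ("Train From", "Select Model", "Sample Dataset")


def training_blockers(params: Dict[str, str], approved_count: int) -> List[str]:
    """Return the reasons training cannot start yet (empty list if ready)."""
    # A parameter counts as missing if it is absent or left empty.
    problems = [
        f"missing parameter: {name}"
        for name in MANDATORY_PARAMETERS
        if not params.get(name)
    ]
    if approved_count < MIN_APPROVED_DOCUMENTS:
        problems.append(
            f"need at least {MIN_APPROVED_DOCUMENTS} approved documents, "
            f"found {approved_count}"
        )
    return problems


# Placeholder values; the real choices are made in the "New model" popup.
params = {"Train From": "Invoice", "Select Model": "Table Model"}
for problem in training_blockers(params, approved_count=12):
    print(problem)
```

An empty result means both conditions are met and the training can be started.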

Step 3. Commence Model Training

  • After configuring the model parameters and dataset, click the "Train" button to initiate the model training process.
  • The model training will now begin, and a progress indicator on the Models & Training page will show the training status.

Step 4. Monitor Model Training Progress

  • While the model is being trained, you can monitor its progress on the Models & Training page.
  • Depending on the size of the training dataset and model complexity, the training process may take some time. Please be patient and let the process complete.

Step 5. Link the Model to the Document Type

  • Once the model training is successfully completed, you will see the trained model listed on the Models & Training page along with its training report. For a more detailed report, click on the model name.
  • If the model accuracy meets your expectations, you can use this model for data extraction by linking it to the specific Document type you want to use it for. If the accuracy does not meet your expectations, you can train a new model with a larger dataset.
  • Click on the "Linked to" dropdown for the model under consideration and select the Document type from the dropdown menu to link the model.

By following the step-by-step instructions outlined in this support document, you can optimise your document processing accuracy. Should you have any questions or encounter any issues during the process, feel free to reach out to us at [email protected], and we'll be more than happy to help you.