Document Lifecycle Stages

In Docsumo, documents go through a structured lifecycle, and each document is assigned a status that reflects the stage it's currently in. These statuses are crucial for efficiently managing your document processing workflow, enabling you to consume outputs more effectively and take necessary actions.

Here's an overview of the document lifecycle stages and their significance:

  • Processing: A document in the "Processing" stage is undergoing data extraction. The system is working on the document to generate output.

  • Review: After data extraction is complete, the document enters the "Review" stage. Here, you have the opportunity to verify the accuracy of the extracted information and make any necessary corrections.

  • Skipped: A document marked as "Skipped" is one for which the user has chosen to bypass the review process. No further action is taken, and the document is separated from the documents awaiting review. To review it again, open the document and click the "Review Again" button in the bottom-left corner.

  • Processed: The "Processed" status signifies either that the user has reviewed the document and confirmed the accuracy of the extracted information, or that the document was processed automatically through straight-through processing. At this stage, the document is considered complete and accurate for further actions.

  • Erred: A document labeled as "Erred" has encountered an error during processing. A file can end up in the erred status for the following reasons:

    • No data to extract: The document goes to erred status if there is no data to extract from it. This generally happens when the file is empty or, in a spreadsheet workflow, when there is no tabular region that can be extracted.

    • No credits: You may see this error when you have no credits left or your credits are insufficient.

    • Timeout: If a document takes more than 10 minutes to process, it times out. In such cases, the user can manually retry the document processing.

    • Password protected or corrupt files: If a file is password protected or corrupt, it cannot be processed further and goes to erred status.

    • Extraction Error: Sometimes an issue occurs on the server side while processing a document. This might be due to an error in extraction or an internal server error.

[Diagram: the lifecycle of a document, showing how it moves through the different statuses]
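
The transitions described in the stages above can be sketched as a small lookup table. This is only an illustration inferred from the stage descriptions, not an official Docsumo state machine: the names `ALLOWED_TRANSITIONS` and `can_move` are hypothetical, and "Processed" is treated as final only because the text describes no transition out of it.

```python
# Hypothetical sketch: transitions inferred from the stage descriptions,
# using the UI status names. Not part of any Docsumo API.
ALLOWED_TRANSITIONS = {
    # Extraction finishes (Review), straight-through processing (Processed),
    # or an error occurs (Erred).
    "Processing": {"Review", "Processed", "Erred"},
    # The user either skips the review or confirms the extracted data.
    "Review": {"Skipped", "Processed"},
    # "Review Again" brings a skipped document back into review.
    "Skipped": {"Review"},
    # A manual retry (for example after a timeout) restarts processing.
    "Erred": {"Processing"},
    # Assumption: the text describes no transition out of "Processed".
    "Processed": set(),
}


def can_move(current: str, target: str) -> bool:
    """Return True if the sketch above allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

For example, `can_move("Skipped", "Review")` is true because of the "Review Again" button, while `can_move("Processed", "Review")` is false under the assumption stated above.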

In the application, you can see these statuses on the "My Documents" page, where you can also filter documents by status.

If you are using the APIs, the status is included in the JSON generated for the output.
The status value in the JSON may differ from what you see in the application; check the table below for the corresponding values:

UI Status     JSON/Webhook Status
Processing    new
Review        reviewing
Skipped       review_skipped
Processed     processed
Erred         erred
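
If you consume statuses programmatically, a small lookup can translate the JSON/webhook values into the names shown in the application. The following is a minimal sketch based only on the table above. The helper names `ui_status` and `is_ready_for_consumption` are hypothetical, and how you obtain the status string from the JSON is left out because the exact field name is not specified here.

```python
# Mapping taken from the status table: JSON/webhook value -> UI status name.
JSON_TO_UI_STATUS = {
    "new": "Processing",
    "reviewing": "Review",
    "review_skipped": "Skipped",
    "processed": "Processed",
    "erred": "Erred",
}


def ui_status(json_status: str) -> str:
    """Translate a JSON/webhook status value into its UI status name."""
    try:
        return JSON_TO_UI_STATUS[json_status]
    except KeyError:
        # Fail loudly on values that are not in the table.
        raise ValueError(f"Unknown status value: {json_status!r}") from None


def is_ready_for_consumption(json_status: str) -> bool:
    """Only "processed" documents are considered complete and accurate."""
    return json_status == "processed"
```

For instance, `ui_status("review_skipped")` returns `"Skipped"`, and `is_ready_for_consumption("reviewing")` returns `False` because the document is still awaiting review.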

Why Document Status Matters:

Document statuses play a vital role in managing your document processing workflow:

  1. They provide a clear understanding of where each document stands in its processing journey.
  2. They help you prioritise documents that require review or correction.
  3. They facilitate efficient data consumption and decision-making.
  4. They assist in keeping your document database organised and clutter-free.

By understanding and utilising document statuses effectively, you can streamline your document processing, improve data accuracy, and ensure a more efficient workflow in Docsumo.