Document Processing

Document processing is a crucial aspect of an IDP (Intelligent Document Processing) platform, ensuring that data is accurately extracted and managed from uploaded documents. This support document provides an overview of how data is extracted from your documents.

Data extraction is the process of automatically collecting structured information from unstructured documents. This step is essential for converting raw data into a usable format for analysis, reporting, and decision-making. Here, we'll explore the data extraction methods used in Docsumo: OCR (Optical Character Recognition), ML (Machine Learning) models, AI Assist, and AI Table Assist.

OCR (Optical Character Recognition):

  • OCR converts printed or handwritten text from documents, images, or scanned pages into machine-readable text.

ML Models to Extract Data:

  • Machine Learning models are employed to extract specific data fields from documents by learning patterns and structures.
  • ML models are trained on labeled datasets containing examples of the data to be extracted.
  • The model learns to recognise patterns, keywords, and structures that indicate the presence of the target data.
  • When presented with new documents, the ML model applies its learned knowledge to locate and extract the required data fields.
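To make the train-then-apply workflow above concrete, here is a toy sketch in plain Python. It is not Docsumo's model. It assumes a deliberately simple "pattern": for each field, it learns which label text (for example "Invoice No:") appears just before the value in the labelled examples, then looks for those labels in a new document. The function names `train` and `extract`, the field names, and the sample invoices are all hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Learn anchor keywords from labelled examples (toy illustration only).

    `examples` is a list of (document_text, {field_name: value}) pairs.
    For each field, count which text precedes the labelled value on its line.
    """
    anchors = defaultdict(Counter)
    for text, labels in examples:
        for field, value in labels.items():
            for line in text.splitlines():
                idx = line.find(value)
                if idx > 0:
                    # Text before the value on the same line is the learned anchor.
                    anchor = line[:idx].strip()
                    if anchor:
                        anchors[field][anchor] += 1
                    break
    return anchors


def extract(model, text):
    """Apply the learned anchors to a new, unlabelled document."""
    results = {}
    lines = [line.strip() for line in text.splitlines()]
    for field, counts in model.items():
        # Try anchors from most to least frequently seen during training.
        for anchor, _ in counts.most_common():
            for line in lines:
                if line.startswith(anchor):
                    value = line[len(anchor):].strip()
                    if value:
                        results[field] = value
                        break
            if field in results:
                break
    return results


training = [
    ("ACME Corp\nInvoice No: INV-001\nTotal: 120.00",
     {"invoice_number": "INV-001", "total": "120.00"}),
    ("Globex Ltd\nInvoice No: INV-042\nTotal: 75.50",
     {"invoice_number": "INV-042", "total": "75.50"}),
]
model = train(training)
print(extract(model, "Initech\nInvoice No: INV-777\nTotal: 9.99"))
# → {'invoice_number': 'INV-777', 'total': '9.99'}
```

Production ML models learn far richer signals (layout, position, context) than a single preceding keyword, but the shape is the same: learn from labelled examples, then apply that knowledge to documents the model has not seen.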

AI Assist for Faster Annotation:

  • AI Assist speeds up the data annotation process by providing suggestions and automating repetitive tasks.

AI Table Assist:

  • AI Table Assist is a specialised feature for identifying tables in your document. It lets you select and extract tables from a document without having to draw the table manually.
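To illustrate what "identifying a table" means once a page has been converted to text, the sketch below uses a simple heuristic: consecutive lines that split into the same number of columns (two or more) on runs of whitespace are grouped into a table. This is an assumption made for illustration only and is not how AI Table Assist works internally. The function names `split_cells` and `find_tables` and the sample page are hypothetical.

```python
import re

# Runs of two or more whitespace characters are treated as a column break.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_cells(line):
    """Split one text line into cells, dropping empty pieces."""
    return [cell for cell in COLUMN_GAP.split(line.strip()) if cell]


def find_tables(text, min_rows=2):
    """Group consecutive lines with the same column count (>= 2) into tables."""
    tables, current = [], []
    for line in text.splitlines():
        cells = split_cells(line)
        if len(cells) >= 2 and (not current or len(cells) == len(current[0])):
            current.append(cells)  # this row continues the current table
            continue
        if len(current) >= min_rows:
            tables.append(current)  # close the table we were building
        current = [cells] if len(cells) >= 2 else []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


page = """Invoice INV-001

Item        Qty    Price
Widget      2      10.00
Gadget      1      25.00

Thank you for your business."""

print(find_tables(page))
# → [[['Item', 'Qty', 'Price'], ['Widget', '2', '10.00'], ['Gadget', '1', '25.00']]]
```

A fixed whitespace rule like this breaks on merged cells, wrapped text, or irregular spacing, which is why a dedicated table feature is useful when you would otherwise have to draw the table yourself.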