Document Data
=============

Prerequisites
-------------

Processing document data depends on the optical character recognition (OCR) package ``tesseract``.

For Ubuntu users, install Tesseract by running:

    ``sudo apt install tesseract-ocr``

For macOS users, install Tesseract with MacPorts:

    ``sudo port install tesseract``

or with Homebrew:

    ``brew install tesseract``

For Windows users, an installer is available from UB-Mannheim_.
To run Tesseract from any location, you may need to add the directory that contains the Tesseract binaries to the ``PATH`` environment variable.

For other platforms or further help, refer to the official installation instructions for tesseract_.

.. _UB-Mannheim: https://github.com/UB-Mannheim/tesseract/wiki
.. _tesseract: https://tesseract-ocr.github.io/tessdoc/Installation.html

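After installing, you can confirm that Tesseract is reachable on your ``PATH`` by looking it up from Python. The snippet below is a minimal sketch that uses only the standard library (``shutil.which`` and ``subprocess.run``); the helper names ``find_tesseract`` and ``tesseract_version`` are hypothetical and are not part of AutoGluon.

.. code-block:: python

   import shutil
   import subprocess
   from typing import Optional


   def find_tesseract() -> Optional[str]:
       """Return the full path to the ``tesseract`` executable, or None if it is not on PATH."""
       return shutil.which("tesseract")


   def tesseract_version() -> Optional[str]:
       """Return the first line of ``tesseract --version``, or None if Tesseract is unavailable."""
       path = find_tesseract()
       if path is None:
           return None
       result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
       # Some older Tesseract releases write the version banner to stderr instead of stdout.
       output = (result.stdout or result.stderr).strip()
       return output.splitlines()[0] if output else None


   if __name__ == "__main__":
       version = tesseract_version()
       if version is None:
           print("tesseract was not found on PATH; revisit the installation steps above.")
       else:
           print(f"Found {version} at {find_tesseract()}")

On Windows, ``shutil.which`` also resolves ``tesseract.exe``, so a "not found" message there usually means the Tesseract directory has not been added to ``PATH`` yet.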

Quick Start
-----------

.. container:: cards

   .. card::
      :title: AutoMM for Scanned Document Classification - Quick Start
      :link: document_classification.html

      How to use MultiModalPredictor to build a scanned document classifier.


.. toctree::
   :maxdepth: 1
   :hidden:

   document_classification