# Document Prediction ## Pre-requisite Processing document data depends on the optical character recognition (OCR) package `tesseract`. For Ubuntu users, you can install Tesseract and its developer tools by simply running: ```bash sudo apt install tesseract-ocr ``` For macOS users, run: ```bash sudo port install tesseract ``` or run: ```bash brew install tesseract ``` For Windows users, installer is available from Tesseract at [UB-Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). To access tesseract-OCR from any location you may have to add the directory where the tesseract-OCR binaries are located to the Path variables. For additional support, please refer to official instructions for [tesseract](https://tesseract-ocr.github.io/tessdoc/Installation.html) ## Quick Start ::::{grid} 2 :gutter: 3 :::{grid-item-card} AutoMM for Scanned Document Classification :link: document_classification.html How to use AutoMM to build a scanned document classifier. ::: :::{grid-item-card} Classifying PDF Documents with AutoMM :link: pdf_classification.html How to use AutoMM to build a PDF document classifier. ::: :::: ```{toctree} --- maxdepth: 1 hidden: true --- document_classification pdf_classification ```