Document Data
Prerequisite
Processing document data depends on the optical character recognition (OCR) package Tesseract.
On Ubuntu, you can install Tesseract by running:
sudo apt install tesseract-ocr
On macOS, if you use MacPorts, run:
sudo port install tesseract
or, if you use Homebrew, run:
brew install tesseract
On Windows, an installer is available from Tesseract at UB-Mannheim. To run Tesseract from any location, you may have to add the directory that contains the Tesseract binaries to the PATH environment variable.
For additional support, please refer to the official installation instructions for Tesseract.
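After installing on any of the platforms above, it can help to confirm that the Tesseract executable is reachable from the PATH. The following is a minimal sketch of such a check, not part of the official instructions. The helper names `find_tesseract` and `tesseract_version` are hypothetical and introduced only for illustration; the sketch assumes the executable is named `tesseract` and relies on the `tesseract --version` command.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper: assumes the executable is called `tesseract`.
    """
    return shutil.which(binary_name)


def tesseract_version(binary_path):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [binary_path, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some builds print the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH; check the installation steps.")
    else:
        print(f"Found Tesseract at {path}")
        print(tesseract_version(path))
```

If the check reports that Tesseract was not found even though it is installed, the usual cause is that the directory containing the binaries has not been added to the PATH.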