OCR

Turn paper documents into full-text searchable digital files and manage them in a paperless document management system that incorporates advanced OCR software. Quickly and easily apply all the tools and functions of electronic document management to hard-copy documents and previously scanned files. By leveraging the best of breed OCR technologies, LogicalDOC is able to extract texts from images and raster PDFs acquired from massive scans from your multi-funtion device.

Performance drawbacks

OCR processing usually takes a long time and high CPU load to index a single document, so if you activate OCR, expect to have much higher time to index your repository

OCR of a scan

You don't have to explicitly ask for OCRing your files, just store them in LogicalDOC and the OCR will be used automatically at indexing time to extract the texts from your images or raster PDFs.

Remember that this is not a zonal OCR, it just extracts all the texts in order to allow you to perform full-text searches.

Configuration of the OCR

You can set how the OCR works by changing the configurations in this panel.

Supported OCR engines

You can choose one of the supported OCR engines

OCR Engine Description Configuration
Tesseract The famous open source OCR engine handled by Google

path: absolute installation path of the tesseract executable (make sure to put this path in the allowed commands)

OCR Web Service A lightweight online OCR engine

username: your own OCR Web Service username

licenseCode: your own license code associated to your OCR Web Service account

Power PDF Advanced An OCR engine developed by Nuance

path: absolute installation path of Power PDF Advanced (make sure to put this path in the allowed commands)

History

In the History tab you see the list of recorded events related to the OCR extractions:

OCR History