ReportMiner’s Optical Character Recognition (OCR) capabilities allow users to extract data from scanned documents. For best results, the recognition engine needs to be trained and its settings adjusted. Users should also take a few precautions when scanning so that each character is recognized correctly and little or no editing is needed after the scan.
We’ve listed a few important considerations to ensure optimal OCR:
Image and Font Size
- Documents with text of 10pt or larger should be scanned at a resolution of at least 300 dpi.
- For 9pt font size or smaller, 400-600 dpi is recommended.
- Ideally, the vertical and horizontal resolutions should be the same.
- The type of script is also a factor when determining the ideal text size, because character sizes differ:
  - Simple script – 1 byte (English, Russian, Arabic, etc.)
    Recommended text size = 14pt
  - Complex script – 2 bytes (Chinese, Japanese, etc.)
    Recommended text size = 18pt
    Minimum text size = 15pt
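The resolution guidance above can be sketched as a small lookup. This is an illustrative helper, not part of ReportMiner: the function names are hypothetical, and treating sizes between 9pt and 10pt like 9pt text is our assumption, since the guidance does not cover that range. The point-to-pixel conversion relies only on the standard definition of a point as 1/72 inch.

```python
from typing import Optional, Tuple


def recommended_dpi(font_size_pt: float) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) scan resolution in dpi for a text size.

    The maximum is None when the guidance only gives a lower bound.
    """
    if font_size_pt >= 10:
        return (300, None)  # 10pt or larger: at least 300 dpi
    # Assumption: sizes between 9pt and 10pt are treated like 9pt text.
    return (400, 600)  # 9pt or smaller: 400-600 dpi


def text_size_px(font_size_pt: float, dpi: int) -> float:
    """Convert a point size to pixels at a given resolution (1pt = 1/72 inch)."""
    return font_size_pt * dpi / 72
```

For example, `text_size_px(9, 400)` gives 50 pixels, which shows why smaller text needs a higher scan resolution to give the engine a comparable number of pixels per character.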
Documents with complex layouts that contain colored images or backgrounds should be converted to grayscale for best results. Grayscale images reveal more information, optimizing OCR.
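As a rough illustration of the grayscale conversion, the sketch below maps each RGB pixel to a single gray level using the common ITU-R BT.601 luma weights. It assumes the image is already available as rows of (R, G, B) tuples; the function name is hypothetical and this is not how ReportMiner performs the conversion internally.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255).

    Uses integer arithmetic with the BT.601 weights 0.299, 0.587 and 0.114,
    rounding to the nearest gray level.
    """
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]
```

In practice an imaging library or the scanner software would normally do this step; the sketch only shows what the conversion computes.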
High-Quality Source Images
Ensure that the source documents are not damaged, wrinkled, or discolored. For better results, use only high-quality source images. Lower-quality images may require pre-processing techniques such as binarization, contrast adjustment, and de-skewing.
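Two of the pre-processing techniques mentioned above, contrast adjustment and binarization, can be sketched in a few lines. This is a minimal example under stated assumptions: the image is a list of rows of gray levels (0-255), the function names are hypothetical, and the fixed threshold of 128 is an arbitrary default rather than a ReportMiner setting.

```python
def stretch_contrast(gray_rows):
    """Linearly rescale gray levels so the darkest pixel becomes 0 and the brightest 255."""
    flat = [p for row in gray_rows for p in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in gray_rows]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray_rows]


def binarize(gray_rows, threshold=128):
    """Map each pixel to 0 (ink) or 255 (paper) using one global threshold."""
    return [[0 if p < threshold else 255 for p in row] for row in gray_rows]


# A faded scan: all values sit in a narrow band between 100 and 150.
faded = [[100, 150], [120, 140]]
clean = binarize(stretch_contrast(faded))
```

Stretching first matters for faded scans: applied directly to `faded`, a threshold of 128 would classify 140 as paper only by a narrow margin, whereas after stretching the ink and paper values are clearly separated.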
Dark borders in source documents can be detected as extra characters by the OCR engine. We recommend removing them before running OCR.
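One simple way to strip dark borders is to trim rows and columns from each edge while they are almost entirely dark. The sketch below assumes a grayscale image given as rows of gray levels (0-255); the function name and both thresholds are illustrative assumptions, not ReportMiner options.

```python
def crop_dark_borders(gray_rows, dark_below=64, min_dark_fraction=0.9):
    """Trim edge rows and columns in which at least min_dark_fraction of pixels are dark."""

    def mostly_dark(pixels):
        dark = sum(1 for p in pixels if p < dark_below)
        return dark >= min_dark_fraction * len(pixels)

    # Trim from the top and bottom first.
    top, bottom = 0, len(gray_rows)
    while top < bottom and mostly_dark(gray_rows[top]):
        top += 1
    while bottom > top and mostly_dark(gray_rows[bottom - 1]):
        bottom -= 1
    rows = gray_rows[top:bottom]
    if not rows:
        return []  # the whole image was dark

    # Then trim from the left and right using the remaining rows.
    def column(x):
        return [row[x] for row in rows]

    left, right = 0, len(rows[0])
    while left < right and mostly_dark(column(left)):
        left += 1
    while right > left and mostly_dark(column(right - 1)):
        right -= 1
    return [row[left:right] for row in rows]
```

Because trimming stops at the first row or column that is not mostly dark, text in the page body is left untouched even when it contains dark pixels.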
The quality of OCR is reduced significantly if an image or page is skewed. De-skew or rotate the image so that it is horizontally aligned for accurate OCR.
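To decide how far to rotate a skewed page, one option is to fit a straight line through points sampled along a single text baseline and take its angle. The sketch below is only one possible approach and assumes the baseline points have already been located by some other means; the function name is hypothetical.

```python
import math


def estimate_skew_degrees(baseline_points):
    """Angle in degrees of the least-squares line through (x, y) baseline points."""
    n = len(baseline_points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    if sxx == 0:
        raise ValueError("points must span more than one x position")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    # The fitted slope is sxy / sxx; atan2 converts it to an angle.
    return math.degrees(math.atan2(sxy, sxx))


# A baseline that drifts 5 pixels for every 100 pixels of width.
angle = estimate_skew_degrees([(0, 0), (100, 5), (200, 10)])
```

Rotating the image by the opposite of this angle levels the baseline. Whether a positive angle means clockwise or counter-clockwise depends on the direction of the image's y axis, so check the sign convention of whichever rotation tool is used.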
- The largest font size that ReportMiner’s OCR can handle is 140pt.
- All image sizes can be processed.