Addendum IV

Revision as of 12:15, 3 March 2024 by Drutalj

Recommendations on the use of Optical Character Recognition Software in Digitization

  1. Using the Right Software: The software should be highly accurate and reliable, and it should support multiple languages.
  2. Using the Right Scan Parameters: Before scanning, configure your scanner settings carefully. The most important factor is orientation: make sure each document is fed into the scanner straight, because a skewed scan can seriously reduce the accuracy of Optical Character Recognition (OCR) software. Test and adjust the settings until you achieve the desired result.
  3. Resolution Setting: A resolution of 300 dpi is generally recommended for accurate OCR. Compared with 150 dpi, it doubles the number of pixels per inch in each direction, so the OCR engine has four times as many pixels (reference points) per unit of page area to work with.
  4. Color Mode Selection: For discolored or aged documents, RGB (color) mode is recommended so that the scanner captures the full content of the physical document. For most other documents, grayscale mode gives the best OCR accuracy. Black-and-white mode scans faster, but it can reduce the quality of text recognition.
  5. Brightness and Contrast Adjustments: Brightness that is either too high or too low can reduce OCR quality and accuracy, so a brightness setting of 50% is recommended. The best value also depends on the scanner itself, however, so expect an initial phase of trial and error. For contrast, the highest setting is usually preferred.
  6. Image Correction and Decontamination: Both steps have a large impact on OCR quality. Image correction includes increasing the resolution, applying color corrections, and experimenting with different contrast settings. Decontamination is the removal of non-text elements such as icons, images, and unusual characters. Both are important because they help the OCR engine “read” the document more accurately.
  7. Careful Manual Proofreading: Depending on how accurate the final result needs to be, manual proofreading may be required; if accuracy is paramount, it is indispensable. Proofreading means having a person check a sample of the processed files to confirm that the characters were recognized correctly. The process is tedious and painstaking, but it is essential in many cases.
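Item 2 stresses feeding documents in straight. When some skew still creeps in, one widely used way to detect it in software is the projection-profile method, sketched below. This technique is an illustration chosen here, not something the recommendations above prescribe. It assumes the dark (ink) pixels have already been extracted as a list of (x, y) coordinates, and the search range of ±5 degrees in 0.5-degree steps is an arbitrary, hypothetical choice.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Score how sharply the points fall onto horizontal rows after undoing a skew of angle_deg.

    Each (x, y) point is rotated by -angle_deg and its y coordinate is binned to the
    nearest integer row. Straight text lines put many points into few rows, which
    maximizes the sum of squared row counts.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = Counter(round(-x * sin_t + y * cos_t) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=5.0, step=0.5):
    """Return the candidate angle (degrees) whose correction gives the sharpest row profile.

    Angles follow the coordinate convention of the points: with y pointing up, a
    positive angle means counterclockwise skew. With image coordinates (y pointing
    down) the sign is reversed.
    """
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))
```

Once an angle is found, the page can be rotated back by that amount before OCR. A finer step gives a more precise estimate at the cost of more candidate angles to score.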
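The resolution comparison in item 3 comes down to simple arithmetic: doubling the dpi doubles the pixel count along each edge of the page, so the total pixel count quadruples. The sketch below works this out for a US Letter page (8.5 × 11 inches), which is used only as an example size; the helper name `page_pixels` is hypothetical.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int, int]:
    """Return (width_px, height_px, total_px) for a page of the given size scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px


if __name__ == "__main__":
    # Example page size only: US Letter, 8.5 x 11 inches.
    for dpi in (150, 300):
        w, h, total = page_pixels(8.5, 11, dpi)
        print(f"{dpi} dpi: {w} x {h} = {total:,} pixels")
    # prints:
    # 150 dpi: 1275 x 1650 = 2,103,750 pixels
    # 300 dpi: 2550 x 3300 = 8,415,000 pixels
```

The 300 dpi scan holds exactly four times as many pixels as the 150 dpi scan of the same page.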
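Item 4 recommends grayscale for most documents. When a page was captured in RGB (for example, an aged or discolored original), it can still be converted to grayscale in software before OCR. The sketch below uses the ITU-R BT.601 luma weights, one common convention among several; the image is assumed to be a plain list of rows of (r, g, b) tuples with 0-255 channels, and the function names are hypothetical.

```python
def rgb_to_gray(pixel):
    """Convert one (r, g, b) pixel with 0-255 channels to a 0-255 gray value (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(image):
    """Convert an image stored as rows of (r, g, b) tuples into rows of gray values."""
    return [[rgb_to_gray(p) for p in row] for row in image]
```

Green contributes most to perceived brightness under these weights, so pure green maps to a lighter gray than pure red or pure blue.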
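The trial-and-error phase described in item 5 can be supported by a quick numerical check. The sketch below applies a simple linear brightness/contrast adjustment to gray values and reports how many pixels end up clipped to pure black or pure white, which is one symptom of an extreme setting. Scanner brightness percentages are not standardized, so the mapping used here (50% means no change, 0% and 100% shift by ∓128 levels) is a hypothetical assumption, as are the function names.

```python
def adjust(value, brightness_pct=50.0, contrast=1.0):
    """Apply a linear brightness/contrast adjustment to one 0-255 gray value.

    brightness_pct: hypothetical 0-100 scale; 50 means no change.
    contrast: gain around mid-gray (128); 1.0 means no change, larger values stretch contrast.
    """
    offset = (brightness_pct - 50.0) / 50.0 * 128.0
    out = (value - 128.0) * contrast + 128.0 + offset
    return max(0, min(255, round(out)))  # clamp to the valid 0-255 range


def clipped_fraction(values):
    """Fraction of pixels pinned at pure black (0) or pure white (255)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v in (0, 255)) / len(values)
```

For example, comparing `clipped_fraction([adjust(v, pct, c) for v in sample])` across a few candidate settings on a sample page shows when brightness or contrast starts erasing faint strokes or filling in dark ones.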
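Part of the decontamination described in item 6 can be automated by filtering connected components of ink by size: tiny components are often specks of dust or noise, and very large ones are often pictures or icons rather than characters. The sketch below assumes the page has already been binarized into a list of rows of 0/1 values (1 = ink). The size limits are hypothetical and must be tuned per collection, since a threshold that is too high will also delete legitimate marks such as dots on "i" or punctuation.

```python
from collections import deque


def filter_components(binary, min_area=1, max_area=None):
    """Keep only 4-connected ink components whose pixel count lies in [min_area, max_area].

    `binary` is a list of rows of 0/1 values (1 = ink). Returns a new image;
    the input is not modified.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    result = [[0] * width for _ in range(height)]
    for sy in range(height):
        for sx in range(width):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first search collects every pixel of this component.
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Copy the component to the output only if its size is within the limits.
            area = len(component)
            if area >= min_area and (max_area is None or area <= max_area):
                for y, x in component:
                    result[y][x] = 1
    return result
```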
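Item 7 describes human verification of a sample of processed files. The sketch below shows two pieces that can make this repeatable: drawing a reproducible random sample of files for review, and scoring each reviewed file with a character error rate (edit distance between the OCR output and the proofread text, divided by the proofread text's length). The 5% sample fraction, the fixed seed, and all function names are hypothetical choices for illustration.

```python
import random


def sample_for_review(file_names, fraction=0.05, seed=0):
    """Pick a reproducible random sample of files for human proofreading (at least one file)."""
    if not file_names:
        return []
    k = max(1, round(len(file_names) * fraction))
    return random.Random(seed).sample(sorted(file_names), k)


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, proofread_text):
    """Edit distance divided by the length of the human-verified reference text."""
    if not proofread_text:
        raise ValueError("proofread reference text is empty")
    return levenshtein(ocr_text, proofread_text) / len(proofread_text)
```

Sorting the file names before sampling keeps the sample identical for a given seed regardless of the order in which files were listed, so a second reviewer can check exactly the same files.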