Class OCR

java.lang.Object
com.logicaldoc.ocr.OCR
Direct Known Subclasses:
OCRWebService, PowerPDF, Tesseract

public abstract class OCR extends Object
This OCR engine is capable of accurately recognizing characters (letters and numbers).
Author:
Alessandro Gasparini
  • Method Details

    • loadParameters

      public void loadParameters()
    • getParameters

      public Map<String,String> getParameters()
    • getParameter

      public String getParameter(String name)
    • getParameterNames

      public List<String> getParameterNames()
    • isAvailable

      public boolean isAvailable()
    • extractPDFText

      public void extractPDFText(File pdffile, String lang, String tenant, StringBuilder buffer, OCRHistory transaction) throws IOException
      Extracts the text from a PDF file.
      Parameters:
      pdffile - the PDF file to run OCR on
      lang - the language in which the document is written
      tenant - name of the tenant
      buffer - the buffer to store the extracted text
      transaction - information about the indexing transaction
      Throws:
      IOException - in case of an OCR error
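
The calling convention described above (the caller supplies the buffer and the engine fills it) could be sketched as follows. This is a minimal illustration, not LogicalDOC code: `OcrLike`, `HistoryStub`, `FakeOcr` and `ExtractPdfTextExample` are hypothetical stand-ins that mirror only the documented signatures so the example compiles on its own. Checking `isAvailable()` first and the engine appending to the buffer are assumptions, not documented behaviour.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for OCRHistory; the real type is not reproduced here.
class HistoryStub {}

// Hypothetical stand-in mirroring two of the documented OCR signatures.
abstract class OcrLike {
    abstract boolean isAvailable();

    abstract void extractPDFText(File pdffile, String lang, String tenant,
                                 StringBuilder buffer, HistoryStub transaction)
            throws IOException;
}

// Fake engine that performs no OCR; it only makes the sketch runnable.
class FakeOcr extends OcrLike {
    @Override
    boolean isAvailable() {
        return true;
    }

    @Override
    void extractPDFText(File pdffile, String lang, String tenant,
                        StringBuilder buffer, HistoryStub transaction)
            throws IOException {
        if (pdffile == null) {
            throw new IOException("no file given");
        }
        // Assumption: an engine appends the recognized text to the buffer.
        buffer.append("text of ").append(pdffile.getName());
    }
}

class ExtractPdfTextExample {
    // Returns the extracted text, or an empty string if the engine is unavailable.
    static String extract(OcrLike ocr, File pdf, String lang, String tenant)
            throws IOException {
        if (!ocr.isAvailable()) {
            return "";
        }
        StringBuilder buffer = new StringBuilder(); // caller-owned output buffer
        ocr.extractPDFText(pdf, lang, tenant, buffer, new HistoryStub());
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = extract(new FakeOcr(), new File("scan.pdf"), "en", "default");
        System.out.println(text);
    }
}
```

With a real engine, the same pattern would apply to one of the known subclasses (for example Tesseract), passing an actual OCRHistory instance as the transaction.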
    • extractText

      public void extractText(File imgfile, String lang, String tenant, StringBuilder sb, OCRHistory transaction) throws IOException
      Throws:
      IOException
    • getResolutionThreshold

      public int getResolutionThreshold(String tenant)
    • isWindows

      public static boolean isWindows()
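
This page does not say how isWindows() makes its decision. A common way to write such a check in Java is to inspect the standard `os.name` system property, sketched below. `OsCheckSketch` and `isWindowsName` are hypothetical names introduced for illustration only, and the actual implementation may differ.

```java
import java.util.Locale;

class OsCheckSketch {
    // Hypothetical helper: true when an os.name value denotes a Windows system.
    static boolean isWindowsName(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Applies the check to the JVM's own os.name property.
    static boolean isWindows() {
        return isWindowsName(System.getProperty("os.name"));
    }

    public static void main(String[] args) {
        System.out.println("Running on Windows: " + isWindows());
    }
}
```

Matching on the `windows` prefix rather than a shorter substring such as `win` avoids false positives from unrelated names that happen to contain those letters.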