Class PDFImageExtractor

java.lang.Object
com.logicaldoc.core.util.PDFImageExtractor
All Implemented Interfaces:
AutoCloseable

public class PDFImageExtractor extends Object implements AutoCloseable
Utility class for extracting raster images from a PDF document. Because it implements AutoCloseable, an instance can be managed with a try-with-resources statement.
Since:
1.0.0
Author:
Marco Meschieri - LogicalDOC
  • Constructor Details

    • PDFImageExtractor

      public PDFImageExtractor(File pdfFile)
      Creates a new PDF reader for the given PDF file
      Parameters:
      pdfFile - the PDF file to read
  • Method Details

    • close

      public void close() throws IOException
      Closes the PDF and releases the resources it uses
      Specified by:
      close in interface AutoCloseable
      Throws:
      IOException - if the PDF file cannot be read
    • getNumberOfPages

      public int getNumberOfPages()
      Returns the total number of pages in the PDF
      Returns:
      the total number of pages
    • getPageAsImage

      public BufferedImage getPageAsImage(int pageIndex) throws IOException
      Renders the specified page as a buffered image
      Parameters:
      pageIndex - zero-based page index, i.e., the first page is page 0
      Returns:
      the rendered page as a BufferedImage
      Throws:
      IOException - if the PDF file cannot be read
    • extractImage

      public BufferedImage extractImage(int pageIndex, org.apache.pdfbox.cos.COSName imageKey) throws IOException
      Extracts the image identified by imageKey from the given page
      Parameters:
      pageIndex - zero-based page index, i.e., the first page is page 0
      imageKey - identifier of the image
      Returns:
      the extracted image as a BufferedImage
      Throws:
      IOException - if the PDF file cannot be read
    • rotate90SX

      public BufferedImage rotate90SX(BufferedImage bi)
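
      The source gives no description for rotate90SX. If, as the name hints, it turns the image 90 degrees to the left (an assumption, not confirmed here), such a rotation could be sketched with plain java.awt.image as follows. The class RotateLeftSketch and the method rotate90Left are hypothetical names and are not part of the documented API.

```java
import java.awt.image.BufferedImage;

public class RotateLeftSketch {

    /**
     * Rotates an image 90 degrees counter-clockwise.
     * Hypothetical stand-in for illustration; not the library's implementation.
     */
    public static BufferedImage rotate90Left(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // The rotated image swaps width and height.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Under a counter-clockwise turn, source pixel (x, y)
                // lands at (y, w - 1 - x) in the destination.
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

      The per-pixel copy keeps the sketch easy to verify; for large images an AffineTransform-based approach would likely be faster.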
    • getImageKeys

      public Set<org.apache.pdfbox.cos.COSName> getImageKeys(int pageIndex) throws IOException
      Gets the set of image identifiers found in the given page
      Parameters:
      pageIndex - zero-based page index, i.e., the first page is page 0
      Returns:
      set of image identifiers
      Throws:
      IOException - if the PDF file cannot be read
    • extractImages

      public List<BufferedImage> extractImages() throws IOException
      Extracts all images of the entire document
      Returns:
      the list of images
      Throws:
      IOException - if the PDF file cannot be read
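
      A list such as the one returned by extractImages() is often written to disk. The following is a minimal sketch of that step using only javax.imageio; the class ImageSaverSketch, the method saveAll, and the file naming scheme are assumptions made for illustration and are not part of this API.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageSaverSketch {

    /**
     * Writes each image as a PNG named prefix-N.png inside targetDir.
     * Hypothetical helper; returns the files that were written.
     */
    public static List<File> saveAll(List<BufferedImage> images, File targetDir, String prefix)
            throws IOException {
        if (!targetDir.isDirectory() && !targetDir.mkdirs()) {
            throw new IOException("Cannot create directory " + targetDir);
        }
        List<File> written = new ArrayList<>();
        for (int i = 0; i < images.size(); i++) {
            File out = new File(targetDir, prefix + "-" + i + ".png");
            // ImageIO.write returns false when no suitable writer is found.
            if (!ImageIO.write(images.get(i), "png", out)) {
                throw new IOException("No PNG writer available for image " + i);
            }
            written.add(out);
        }
        return written;
    }
}
```

      With the class documented here, a call might look like: try (PDFImageExtractor extractor = new PDFImageExtractor(pdfFile)) { ImageSaverSketch.saveAll(extractor.extractImages(), targetDir, "image"); }. The try-with-resources form ensures close() runs even when extraction fails.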