Class PDFImageExtractor


  • public class PDFImageExtractor
    extends Object
    This utility class allows the extraction of raster images from a PDF document
    Since:
    1.0.0
    Author:
    Marco Meschieri - LogicalDOC
    • Constructor Detail

      • PDFImageExtractor

        public PDFImageExtractor(File pdfFile)
        Creates a new PDF reader for the given PDF file
        Parameters:
        pdfFile - the PDF file to read
    • Method Detail

      • close

        public void close()
                   throws IOException
        Closes the PDF and releases resources used
        Throws:
        IOException - if the PDF file cannot be read
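        The class declaration above lists no AutoCloseable interface, so try-with-resources may not be available; a try/finally block is one way to make sure close() always runs. The following is a minimal sketch under that assumption. Extractor, Work and withExtractor are hypothetical names introduced so the example compiles without the library; only close() mirrors the documented signature.

```java
import java.io.IOException;

public class CloseSketch {

    /** Hypothetical stand-in that mirrors only the documented close() signature. */
    interface Extractor {
        void close() throws IOException;
    }

    /** Hypothetical callback holding the work done while the PDF is open. */
    interface Work<T> {
        T apply(Extractor extractor) throws IOException;
    }

    /** Runs the work, then closes the extractor even if the work throws. */
    static <T> T withExtractor(Extractor extractor, Work<T> work) throws IOException {
        try {
            return work.apply(extractor);
        } finally {
            extractor.close();
        }
    }
}
```

        With the real class, the same shape would be a try block around the extraction calls and a finally block that calls close() on the PDFImageExtractor instance.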
      • getPageAsImage

        public BufferedImage getPageAsImage(int pageIndex)
                                     throws IOException
        Renders the specified page as a buffered image
        Parameters:
        pageIndex - zero based page index, i.e., the first page is page 0
        Returns:
        the rendered page as a BufferedImage
        Throws:
        IOException - if the PDF file cannot be read
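        Because pageIndex is zero based, a page number taken from a user interface (where the first page is usually 1) has to be shifted by one before the call. The sketch below illustrates that conversion and writes the result as a PNG with the standard javax.imageio API. PageRenderer and savePageAsPng are hypothetical names used so the example compiles without the library; only the getPageAsImage(int) signature comes from the documentation above.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PageRenderSketch {

    /** Hypothetical stand-in that mirrors the documented getPageAsImage(int) signature. */
    interface PageRenderer {
        BufferedImage getPageAsImage(int pageIndex) throws IOException;
    }

    /**
     * Renders a page given its one-based page number and stores it as a PNG.
     * The one-based numbering is an assumption of this sketch, not of the API.
     */
    static File savePageAsPng(PageRenderer renderer, int pageNumber, File target) throws IOException {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber starts at 1");
        }
        // The documented index is zero based: page 1 is index 0.
        BufferedImage image = renderer.getPageAsImage(pageNumber - 1);
        // ImageIO.write returns false when no writer exists for the format.
        if (!ImageIO.write(image, "png", target)) {
            throw new IOException("No PNG writer available");
        }
        return target;
    }
}
```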
      • extractImage

        public BufferedImage extractImage(int pageIndex,
                                          org.apache.pdfbox.cos.COSName imageKey)
                                   throws IOException
        Extracts the image identified by imageKey from the given page
        Parameters:
        pageIndex - zero based page index, i.e., the first page is page 0
        imageKey - identifier of the image
        Returns:
        the extracted image as a BufferedImage
        Throws:
        IOException - if the PDF file cannot be read
      • getImageKeys

        public Set<org.apache.pdfbox.cos.COSName> getImageKeys(int pageIndex)
                                                        throws IOException
        Gets the set of image identifiers inside the given page
        Parameters:
        pageIndex - zero based page index, i.e., the first page is page 0
        Returns:
        set of image identifiers
        Throws:
        IOException - if the PDF file cannot be read
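        getImageKeys and extractImage are designed to be used together: the first lists the identifiers on a page, the second resolves one identifier to an image. The sketch below shows one possible way to combine them for a single page. ImageSource and imagesOnPage are hypothetical names, and the type parameter K stands in for org.apache.pdfbox.cos.COSName so the example compiles without PDFBox on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PageImagesSketch {

    /**
     * Hypothetical stand-in for the two documented methods.
     * K plays the role of org.apache.pdfbox.cos.COSName.
     */
    interface ImageSource<K> {
        Set<K> getImageKeys(int pageIndex) throws IOException;

        BufferedImage extractImage(int pageIndex, K imageKey) throws IOException;
    }

    /** Collects every image on one page, keyed by its identifier. */
    static <K> Map<K, BufferedImage> imagesOnPage(ImageSource<K> source, int pageIndex) throws IOException {
        // LinkedHashMap keeps the iteration order of the key set.
        Map<K, BufferedImage> images = new LinkedHashMap<>();
        for (K key : source.getImageKeys(pageIndex)) {
            images.put(key, source.extractImage(pageIndex, key));
        }
        return images;
    }
}
```

        Extracting page by page like this keeps only one page's images in memory at a time, which may matter for large documents.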
      • extractImages

        public List<BufferedImage> extractImages()
                                          throws IOException
        Extracts all images from the entire document
        Returns:
        The list of images
        Throws:
        IOException - if the PDF file cannot be read