Package com.logicaldoc.ocr
Class PDFImageExtractor
- java.lang.Object
  - com.logicaldoc.ocr.PDFImageExtractor
 
 
public class PDFImageExtractor extends Object
This utility class allows the extraction of raster images from a PDF document.
- Since:
  1.0.0
- Author:
  Marco Meschieri - LogicalDOC
 
 
Constructor Summary
| Constructor | Description |
| --- | --- |
| PDFImageExtractor(File pdfFile) | Creates a new PDF reader for the given PDF file |

Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Closes the PDF and releases resources used |
| BufferedImage | extractImage(int pageIndex, org.apache.pdfbox.cos.COSName imageKey) | Extracts the imageKey image from the given page |
| List<BufferedImage> | extractImages() | Extracts all images of the entire document |
| Set<org.apache.pdfbox.cos.COSName> | getImageKeys(int pageIndex) | Gets the set of image identifiers inside the given page |
| int | getNumberOfPages() | Returns the total number of pages in the PDF |
| BufferedImage | getPageAsImage(int pageIndex) | Renders the specified page as a buffered image |
| BufferedImage | rotate90SX(BufferedImage bi) | |
 
Constructor Detail

PDFImageExtractor
public PDFImageExtractor(File pdfFile)
Creates a new PDF reader for the given PDF file.
- Parameters:
  pdfFile - the PDF file
 
Method Detail

close
public void close() throws IOException
Closes the PDF and releases resources used.
- Throws:
  IOException - if the PDF file cannot be read
 
getNumberOfPages
public int getNumberOfPages()
Returns the total number of pages in the PDF.
- Returns:
  the total number of pages
 
 
getPageAsImage
public BufferedImage getPageAsImage(int pageIndex) throws IOException
Renders the specified page as a buffered image.
- Parameters:
  pageIndex - zero-based page index, i.e., the first page is page 0
- Returns:
  object representation of the image
- Throws:
  IOException - if the PDF file cannot be read
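
Because pageIndex is zero-based, a loop over all pages runs from 0 to getNumberOfPages() - 1. The following is a minimal sketch of that loop. It is written against a hypothetical PageSource interface that only mirrors the two documented signatures, so that it compiles without the LogicalDOC jar; PageSource, PageRenderSketch, and renderAllPages are illustrative names, not part of the library.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class PageRenderSketch {

    /** Hypothetical stand-in mirroring getNumberOfPages() and getPageAsImage(int). */
    interface PageSource {
        int getNumberOfPages();

        BufferedImage getPageAsImage(int pageIndex) throws IOException;
    }

    /** Renders every page, visiting the indexes 0 .. getNumberOfPages() - 1 in order. */
    static List<BufferedImage> renderAllPages(PageSource source) throws IOException {
        List<BufferedImage> pages = new ArrayList<>();
        for (int pageIndex = 0; pageIndex < source.getNumberOfPages(); pageIndex++) {
            pages.add(source.getPageAsImage(pageIndex));
        }
        return pages;
    }
}
```

With the real class, the same loop would presumably call getNumberOfPages() and getPageAsImage(pageIndex) on a PDFImageExtractor instance directly, followed by close() once rendering is done.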
 
extractImage
public BufferedImage extractImage(int pageIndex, org.apache.pdfbox.cos.COSName imageKey) throws IOException
Extracts the image identified by imageKey from the given page.
- Parameters:
  pageIndex - zero-based page index, i.e., the first page is page 0
  imageKey - identifier of the image
- Returns:
  object representation of the image
- Throws:
  IOException - if the PDF file cannot be read
 
rotate90SX
public BufferedImage rotate90SX(BufferedImage bi)
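
The source gives no description for rotate90SX. Assuming that "SX" abbreviates the Italian "sinistra" (left), which is only a guess, the method presumably rotates the image by 90 degrees counterclockwise. The JDK-only sketch below shows what such a rotation could look like; RotateLeftSketch and rotate90Left are hypothetical names, and this is not the library's implementation.

```java
import java.awt.image.BufferedImage;

final class RotateLeftSketch {

    /**
     * Returns a copy of src rotated by 90 degrees counterclockwise.
     * Assumption: this is the behavior suggested by the name rotate90SX.
     */
    static BufferedImage rotate90Left(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap in the rotated image.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The source pixel (x, y) lands at (y, w - 1 - x), so the
                // top-right corner of the source becomes the top-left corner.
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

A pixel-by-pixel copy keeps the example easy to follow; for large page renderings, drawing through a Graphics2D transform would likely be faster.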
 
getImageKeys
public Set<org.apache.pdfbox.cos.COSName> getImageKeys(int pageIndex) throws IOException
Gets the set of image identifiers inside the given page.
- Parameters:
  pageIndex - zero-based page index, i.e., the first page is page 0
- Returns:
  set of image identifiers
- Throws:
  IOException - if the PDF file cannot be read
 
extractImages
public List<BufferedImage> extractImages() throws IOException
Extracts all images of the entire document.
- Returns:
  the list of images
- Throws:
  IOException - if the PDF file cannot be read
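
The images returned by extractImages() are ordinary BufferedImage objects, so they can be written to disk with the JDK's ImageIO. The sketch below shows one possible way to do that; the class name SaveImagesSketch, the method saveAsPng, and the image-N.png naming scheme are assumptions made for illustration only.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

final class SaveImagesSketch {

    /**
     * Writes each image to outDir as image-0.png, image-1.png, ...
     * and returns the files that were written, in list order.
     */
    static List<File> saveAsPng(List<BufferedImage> images, File outDir) throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < images.size(); i++) {
            File target = new File(outDir, "image-" + i + ".png");
            // ImageIO.write returns false when no suitable writer is found.
            if (!ImageIO.write(images.get(i), "png", target)) {
                throw new IOException("No PNG writer available for " + target);
            }
            written.add(target);
        }
        return written;
    }
}
```

In real use, the list passed to saveAsPng would be the result of extractImages(), and the extractor would be closed with close() afterwards.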
 