Class AbstractParser
- All Implemented Interfaces:
- Parser
- Direct Known Subclasses:
- AbiWordParser, CatchAllParser, DummyParser, EpubParser, HTMLParser, KOfficeParser, OpenOfficeParser, PDFParser, PPTParser, RarParser, RTFParser, SevenZipParser, TarParser, TXTParser, XLSParser, XMLParser, ZABWParser, ZipParser
- Since:
- 3.5
- Author:
- Marco Meschieri - LogicalDOC

Constructor Summary
- AbstractParser()

Method Summary
- int countPages(File file, String filename): Same as Parser.countPages(InputStream, String), but use this when you have a file rather than a stream.
- int countPages(InputStream input, String filename): Counts the number of pages of the given binary document.
- String parse(File file, String filename, String encoding, Locale locale, String tenant): Same as Parser.parse(InputStream, String, String, Locale, String), but use this when you have a file rather than a stream.
- String parse(File file, String filename, String encoding, Locale locale, String tenant, Document document, String fileVersion): Same as Parser.parse(InputStream, ParseParameters), but use this when you have a file rather than a stream.
- parse(InputStream input, ParseParameters parameters): Extracts the text content of the given binary document.
- String parse(InputStream input, String filename, String encoding, Locale locale, String tenant): Extracts the text content of the given binary document.

Constructor Details
AbstractParser
public AbstractParser()

Method Details
parse
public String parse(File file, String filename, String encoding, Locale locale, String tenant) throws ParsingException
Description copied from interface: Parser
Same as Parser.parse(InputStream, String, String, Locale, String), but use this when you have a file rather than a stream.
- Specified by:
- parse in interface Parser
- Parameters:
- file - the file
- filename - name of the file
- encoding - character encoding
- locale - the locale
- tenant - name of the tenant
- Returns:
- the extracted text
- Throws:
- ParsingException - error in the parsing
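
The relationship between a file-based overload and its stream-based counterpart can be illustrated with a small delegation sketch. This is not the LogicalDOC implementation: the class `FileOverloadSketch`, its two simplified `parse` methods, and the `demo` helper are hypothetical stand-ins that only assume the file variant opens the file and hands the stream to the stream variant.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in, NOT the LogicalDOC API: it only illustrates
// a file-based overload delegating to a stream-based one.
class FileOverloadSketch {

    /** Stream-based variant: reads the whole stream and decodes it as text. */
    static String parse(InputStream input, String encoding) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = input.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), Charset.forName(encoding));
    }

    /** File-based variant: opens the file and delegates to the stream-based one. */
    static String parse(File file, String encoding) throws IOException {
        // try-with-resources closes the stream even if parsing fails
        try (InputStream input = new FileInputStream(file)) {
            return parse(input, encoding);
        }
    }

    /** Writes a temporary file, parses it, and returns the extracted text. */
    static String demo() {
        try {
            File tmp = File.createTempFile("sketch", ".txt");
            try {
                Files.write(tmp.toPath(), "hello from a file".getBytes(StandardCharsets.UTF_8));
                return parse(tmp, "UTF-8");
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape the caller of the file variant never manages the stream, which is the convenience the file-based overloads appear to offer.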
 

parse
public String parse(File file, String filename, String encoding, Locale locale, String tenant, Document document, String fileVersion) throws ParsingException
Description copied from interface: Parser
Same as Parser.parse(InputStream, ParseParameters), but use this when you have a file rather than a stream.
- Specified by:
- parse in interface Parser
- Parameters:
- file - the file
- filename - name of the file
- encoding - character encoding
- locale - the locale
- tenant - name of the tenant
- document - the document the file belongs to (optional)
- fileVersion - the file version being processed (optional)
- Returns:
- the extracted text
- Throws:
- ParsingException - error in the parsing
 

parse
public String parse(InputStream input, String filename, String encoding, Locale locale, String tenant) throws ParsingException
Description copied from interface: Parser
Extracts the text content of the given binary document. The content type and character encoding (if available and applicable) are given as arguments.

The implementation can either read and parse the given document immediately or return a reader that does so incrementally. The only constraint is that the implementation must close the given stream, at the latest, when the returned reader is closed. The caller, in turn, is responsible for closing the returned reader.

The implementation should throw an exception only on transient errors, i.e. when it can expect to extract the text content of the same binary successfully at another time. An effort should be made to recover from syntax errors and similar problems.

This method should be thread-safe, i.e. it may be invoked simultaneously by different threads to extract the text content of different documents. The returned reader, however, does not need to be thread-safe.

Parsing must complete within the number of seconds specified by the parser.timeout configuration property. Depending on the value of the parser.timeout.retain configuration property, the text already extracted is either retained or discarded if a timeout occurs.
- Specified by:
- parse in interface Parser
- Parameters:
- input - binary content from which to extract the text
- filename - name of the file
- encoding - character encoding
- locale - the locale
- tenant - name of the tenant
- Returns:
- the extracted text
- Throws:
- ParsingException - error in the parsing
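
The timeout behaviour described above (a time limit, plus a switch that decides whether partial text survives a timeout) could be approximated as follows. This is a minimal sketch and not the library's code: `TimeoutSketch` and `parseWithTimeout` are hypothetical names, `timeoutMillis` stands in for parser.timeout (which the documentation expresses in seconds), and `retain` stands in for parser.timeout.retain.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of "parse under a time limit, optionally keep partial text".
class TimeoutSketch {

    /**
     * Simulates a slow extraction that appends one chunk at a time.
     * timeoutMillis plays the role of parser.timeout, retain that of parser.timeout.retain.
     */
    static String parseWithTimeout(String[] chunks, long delayPerChunkMillis,
                                   long timeoutMillis, boolean retain) {
        // StringBuffer is synchronized, so worker and caller can both touch it safely
        StringBuffer extracted = new StringBuffer();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> task = executor.submit(() -> {
            for (String chunk : chunks) {
                try {
                    Thread.sleep(delayPerChunkMillis); // pretend this chunk is expensive
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // stop extracting once the task is cancelled
                }
                extracted.append(chunk);
            }
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return extracted.toString(); // finished in time: full text
        } catch (TimeoutException e) {
            task.cancel(true); // interrupt the worker
            return retain ? extracted.toString() : ""; // keep or drop the partial text
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] chunks = {"a", "b", "c"};
        System.out.println(parseWithTimeout(chunks, 1, 5000, false));
    }
}
```

Collecting text into a shared buffer is what makes the "retain" option possible here: when the limit expires, whatever has been appended so far is still readable by the caller.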
 

parse
parse(InputStream input, ParseParameters parameters) throws ParsingException
Description copied from interface: Parser
Extracts the text content of the given binary document. The content type and character encoding (if available and applicable) are given as arguments.

The implementation can either read and parse the given document immediately or return a reader that does so incrementally. The only constraint is that the implementation must close the given stream, at the latest, when the returned reader is closed. The caller, in turn, is responsible for closing the returned reader.

The implementation should throw an exception only on transient errors, i.e. when it can expect to extract the text content of the same binary successfully at another time. An effort should be made to recover from syntax errors and similar problems.

This method should be thread-safe, i.e. it may be invoked simultaneously by different threads to extract the text content of different documents. The returned reader, however, does not need to be thread-safe.

Parsing must complete within the number of seconds specified by the parser.timeout configuration property. Depending on the value of the parser.timeout.retain configuration property, the text already extracted is either retained or discarded if a timeout occurs.
- Specified by:
- parse in interface Parser
- Parameters:
- input - binary content from which to extract the text
- parameters - the parameters
- Returns:
- the extracted text
- Throws:
- ParsingException - error in the parsing
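
One common way to satisfy the thread-safety requirement is to keep all working state in local variables so that concurrent calls never share anything mutable. The sketch below assumes that approach; `ThreadSafeParseSketch`, its simplified `parse` method, and `parseAll` are hypothetical and do not reflect how any LogicalDOC parser is actually written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a stateless, and therefore thread-safe, parse method.
class ThreadSafeParseSketch {

    /** Uses only local variables, so simultaneous calls cannot interfere. */
    static String parse(InputStream input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int read;
        while ((read = input.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Parses several documents at once and returns the texts in input order. */
    static List<String> parseAll(List<String> documents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<String> job = () ->
                        parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for each job in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("one", "two", "three")));
    }
}
```

Each job gets its own stream and its own buffer, so no synchronization is needed inside `parse` itself.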
 

countPages
int countPages(InputStream input, String filename)
Description copied from interface: Parser
Counts the number of pages of the given binary document.
- Specified by:
- countPages in interface Parser
- Parameters:
- input - binary content from which to extract the text
- filename - name of the file
- Returns:
- the number of pages
 

countPages
int countPages(File file, String filename)
Description copied from interface: Parser
Same as Parser.countPages(InputStream, String), but use this when you have a file rather than a stream.
- Specified by:
- countPages in interface Parser
- Parameters:
- file - the file
- filename - name of the file
- Returns:
- the number of pages
 
 