Class DOCParser
- All Implemented Interfaces:
Parser
- Since:
3.5
- Author:
Michael Scholz, Sebastian Stein, Alessandro Gasparini - LogicalDOC
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
int countPages(InputStream input, String filename)
Counts the number of pages of the given binary document.
parse(InputStream input, ParseParameters parameterObject)
Extracts the text content of the given binary document.
Methods inherited from class com.logicaldoc.core.parser.RTFParser
countPages, internalParse
Methods inherited from class com.logicaldoc.core.parser.AbstractParser
parse, parse, parse
-
Constructor Details
-
DOCParser
public DOCParser()
-
-
Method Details
-
parse
Description copied from interface: Parser
Extracts the text content of the given binary document. The content type and character encoding (if available and applicable) are given as arguments.
The implementation can choose either to read and parse the given document immediately or to return a reader that does it incrementally. The only constraint is that the implementation must close the given stream, at the latest, when the returned reader is closed. The caller, on the other hand, is responsible for closing the returned reader.
The implementation should only throw an exception on transient errors, i.e. when it can expect to be able to successfully extract the text content of the same binary at another time. An effort should be made to recover from syntax errors and other similar problems.
This method should be thread-safe, i.e. it is possible that this method is invoked simultaneously by different threads to extract the text content of different documents. On the other hand the returned reader does not need to be thread-safe.
Parsing must complete within the number of seconds specified by the parser.timeout configuration property.
Depending on the value of the parser.timeout.retain configuration property, the text already extracted is either retained or discarded when a timeout occurs.
- Specified by:
parse in interface Parser
- Overrides:
parse in class AbstractParser
- Parameters:
input - binary content from which to extract the text
parameterObject - the parameters
- Returns:
the extracted text
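The timeout behaviour described for parse can be illustrated with a minimal, self-contained sketch. It is not LogicalDOC's implementation: the Extractor interface, the parseWithTimeout helper, and the retainOnTimeout flag are hypothetical names that stand in for the parser, the parser.timeout property, and the parser.timeout.retain property. The timeout is given in milliseconds here only to keep the example quick to run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class TimeoutParseSketch {

    /** Hypothetical stand-in for an extractor that emits text piece by piece. */
    interface Extractor {
        void extract(Consumer<String> sink) throws Exception;
    }

    /**
     * Runs the extractor on a worker thread and waits at most timeoutMillis.
     * On timeout, returns the partial text if retainOnTimeout is true,
     * otherwise an empty string (assumed semantics, for illustration only).
     */
    static String parseWithTimeout(Extractor extractor, long timeoutMillis, boolean retainOnTimeout)
            throws Exception {
        // StringBuffer is synchronized, so the worker and the caller can share it.
        StringBuffer output = new StringBuffer();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> task = executor.submit(() -> {
            extractor.extract(output::append);
            return null;
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return output.toString();
        } catch (TimeoutException e) {
            // Interrupt the worker, then decide what to do with the partial text.
            task.cancel(true);
            return retainOnTimeout ? output.toString() : "";
        } finally {
            executor.shutdownNow();
        }
    }
}
```

An extractor that finishes in time returns its full text. One that stalls returns either the text produced so far or nothing, depending on the flag.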
-
countPages
Description copied from interface: Parser
Counts the number of pages of the given binary document.
- Specified by:
countPages in interface Parser
- Overrides:
countPages in class RTFParser
- Parameters:
input - binary content from which to extract the text
filename - name of the file
- Returns:
the number of pages
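As a rough illustration of the calling pattern for a method shaped like countPages(InputStream, String), the sketch below uses a hypothetical PageCounter interface modeled on the documented signature. The form-feed counting rule is a toy assumption and is not how DOCParser counts pages. The declared IOException is also an assumption, since the exceptions thrown by the real method are not listed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CountPagesSketch {

    /** Hypothetical stand-in modeled on the documented countPages signature. */
    interface PageCounter {
        int countPages(InputStream input, String filename) throws IOException;
    }

    /**
     * Toy counter: starts at one page and adds a page for each form feed (0x0C).
     * This is an illustrative rule only, not DOCParser's logic.
     */
    static final PageCounter FORM_FEED_COUNTER = (input, filename) -> {
        int pages = 1;
        int b;
        while ((b = input.read()) != -1) {
            if (b == '\f') {
                pages++;
            }
        }
        return pages;
    };

    /** Opens a stream over the content, counts pages, and always closes the stream. */
    static int countPagesOf(byte[] content, String filename) throws IOException {
        try (InputStream input = new ByteArrayInputStream(content)) {
            return FORM_FEED_COUNTER.countPages(input, filename);
        }
    }
}
```

The try-with-resources block keeps stream cleanup on the caller's side, whatever the counter does internally.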
-