Package com.logicaldoc.core.parser
Machinery for parsing different file formats. Implementations of the Parser interface are designed to read the content of a specific file type. The parsers are used by the full-text engine to extract the contents for indexing your documents and calculating the number of pages.
Since: 1.0

Interface Summary
Interface           Description
Parser              A Parser is capable of parsing content in order to extract the text and other metadata within it.

Class Summary
Class               Description
AbiWordParser       Text extractor for AbiWord documents.
AbstractParser      Abstract implementation of a Parser.
CatchAllParser      Parser that tries to convert the document into PDF and then tries to parse it.
DOCParser           Parses a MS Word (*.doc, *.dot) file to extract the text contained in the file.
DummyParser         Parser that doesn't parse anything.
EpubParser          A specialized parser to extract text from the .epub (e-book) format.
HTMLParser          Text extractor for HyperText Markup Language (HTML).
HTMLSAXParser       Helper class for HTML parsing.
KOfficeParser       Text extractor for KOffice 1.6 documents.
OpenOfficeParser    Text extractor for OpenOffice/OpenDocument documents.
ParserFactory       This is a factory, returning a parser instance for the given file.
PDFParser           Text extractor for Portable Document Format (PDF).
PPTParser           Parser for Office 2003 presentations.
PSParser
RarParser           Class for parsing rar files.
RTFParser
SevenZipParser      Class for parsing 7z files.
TarParser           Class for parsing tar files.
TXTParser           Class for parsing text (*.txt) files.
XLSParser           Parser for Office 2003 worksheets.
XMLParser           Text extractor for XML documents.
ZABWParser          Text extractor for AbiWord compressed documents.
ZipParser           Class for parsing zip files.

Exception Summary
Exception           Description
ParseException      Thrown when an error happens during parsing.
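
The relationship between a per-format parser and the factory that selects one for a given file can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual signatures of this package: every type and method name below (SketchParser, SketchTxtParser, SketchDummyParser, SketchParserFactory, parse, getParser) is hypothetical, and choosing the parser by file name extension with a do-nothing fallback is an assumption of the sketch, not documented behaviour of ParserFactory.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Small demo entry point for the sketch.
class ParserFactorySketch {
    public static void main(String[] args) {
        byte[] data = "hello index".getBytes(StandardCharsets.UTF_8);
        // A .txt file is routed to the text parser, whatever the case of the extension.
        System.out.println(SketchParserFactory.getParser("notes.TXT").parse(data));
        // An unregistered extension is routed to the do-nothing parser.
        System.out.println(SketchParserFactory.getParser("archive.bin").parse(data).isEmpty());
    }
}

// Hypothetical stand-in for the Parser interface; the real method names may differ.
interface SketchParser {
    String parse(byte[] content);
}

// Plays the role of a format-specific parser such as TXTParser: decodes the bytes as UTF-8 text.
class SketchTxtParser implements SketchParser {
    @Override
    public String parse(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Plays the role of DummyParser: extracts nothing.
class SketchDummyParser implements SketchParser {
    @Override
    public String parse(byte[] content) {
        return "";
    }
}

// Plays the role of ParserFactory: returns a parser instance for the given file name.
class SketchParserFactory {
    private static final Map<String, SketchParser> PARSERS = new HashMap<>();

    static {
        PARSERS.put("txt", new SketchTxtParser());
    }

    static SketchParser getParser(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption of this sketch: unknown extensions fall back to the do-nothing parser.
        return PARSERS.getOrDefault(ext, new SketchDummyParser());
    }
}
```

Keeping the extension-to-parser mapping in one place means the full-text engine only needs to ask the factory for a parser and call it, without knowing which formats are supported.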