com.logicaldoc.core.parser (logicaldoc 8.8.0 API)

Machinery for parsing different file formats. Each implementation of the Parser interface is designed to read the content of a specific file type.

The parsers are used by the full-text engine to extract the contents for indexing your documents and calculating the number of pages.
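The two jobs described above, extracting text for indexing and counting pages, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SketchParser` and `PlainTextSketchParser` are hypothetical names, not the LogicalDOC `Parser` interface, and treating a form feed character as a page break is a simplification chosen for plain text only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified parser contract; NOT the LogicalDOC Parser interface. */
interface SketchParser {
    /** Returns the plain text found in the raw content, ready for indexing. */
    String extractText(byte[] content);

    /** Returns the number of pages detected in the raw content. */
    int countPages(byte[] content);
}

/** Assumes UTF-8 plain text in which a form feed ('\f') marks a page break. */
class PlainTextSketchParser implements SketchParser {
    @Override
    public String extractText(byte[] content) {
        // Replace page-break control characters with newlines before indexing.
        return new String(content, StandardCharsets.UTF_8).replace('\f', '\n').trim();
    }

    @Override
    public int countPages(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        int pages = 1; // even empty content is reported as one page in this sketch
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\f') {
                pages++;
            }
        }
        return pages;
    }
}
```

A real parser for a binary format would delegate both steps to a format-specific library; the sketch only shows the shape of the contract.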

Since: 1.0

Interface Summary
Interface	Description
Parser	A Parser parses content in order to extract the text and other metadata within it.

Class Summary
Class	Description
AbiWordParser	Text extractor for AbiWord documents.
AbstractParser	Abstract implementation of a Parser
CatchAllParser	Parser that tries to convert the document into PDF and then tries to parse it
DOCParser	Parses a MS Word (.doc, .dot) file to extract the text contained in the file.
DummyParser	Parser that doesn't parse anything
EpubParser	A specialized parser to extract text from the .epub (e-book) format.
HTMLParser	Text extractor for HyperText Markup Language (HTML).
HTMLSAXParser	Helper class for HTML parsing
KOfficeParser	Text extractor for KOffice 1.6 documents.
OpenOfficeParser	Text extractor for OpenOffice/OpenDocument documents.
ParserFactory	This is a factory, returning a parser instance for the given file.
PDFParser	Text extractor for Portable Document Format (PDF).
PPTParser	Parser for Office 2003 presentations
PSParser
RarParser	Class for parsing rar files.
RTFParser
SevenZipParser	Class for parsing 7z files.
TarParser	Class for parsing tar files.
TXTParser	Class for parsing text (*.txt) files.
XLSParser	Parser for Office 2003 worksheets
XMLParser	Text extractor for XML documents.
ZABWParser	Text extractor for AbiWord compressed documents.
ZipParser	Class for parsing zip files.
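
The table above describes ParserFactory as returning a parser instance for a given file. One common way to build such a factory is to dispatch on the file extension, as in the sketch below. It is not the LogicalDOC implementation: `FormatParser`, `FormatParserFactory`, `parserFor`, and `extensionOf` are hypothetical names, and the fallback for unknown extensions is an assumption made in the spirit of a catch-all parser.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser contract used only in this sketch. */
interface FormatParser {
    String describe();
}

class TextFormatParser implements FormatParser {
    public String describe() { return "text"; }
}

class ArchiveFormatParser implements FormatParser {
    public String describe() { return "archive"; }
}

class FallbackFormatParser implements FormatParser {
    public String describe() { return "fallback"; }
}

/** Hypothetical factory: maps a lower-cased file extension to a parser. */
class FormatParserFactory {
    private static final Map<String, FormatParser> PARSERS = new HashMap<>();
    private static final FormatParser FALLBACK = new FallbackFormatParser();

    static {
        PARSERS.put("txt", new TextFormatParser());
        PARSERS.put("zip", new ArchiveFormatParser());
    }

    /** Returns the text after the last dot, lower-cased, or "" if there is none. */
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return "";
        }
        return filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the registered parser, or the fallback for unknown extensions. */
    static FormatParser parserFor(String filename) {
        return PARSERS.getOrDefault(extensionOf(filename), FALLBACK);
    }
}
```

With this sketch, `FormatParserFactory.parserFor("notes.TXT")` resolves to the text parser, while a name without an extension resolves to the fallback.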

Exception Summary
Exception	Description
ParseException	Thrown when an error happens during parsing.
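
A caller that indexes many documents usually wants one parsing failure to be reported without aborting the whole run. The sketch below shows that pattern with a checked exception that wraps the underlying cause. All names are hypothetical: `SketchParseException` stands in for a parse exception and is not the LogicalDOC ParseException class, and returning an empty string on failure is only one possible policy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical checked exception for this sketch; NOT the LogicalDOC class. */
class SketchParseException extends Exception {
    SketchParseException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SafeExtraction {
    /** Hypothetical low-level reader that rejects missing or empty content. */
    static String readText(byte[] content) throws IOException {
        if (content == null || content.length == 0) {
            throw new IOException("empty content");
        }
        return new String(content, StandardCharsets.UTF_8);
    }

    /** Wraps low-level failures in a single parse-level exception. */
    static String parse(String filename, byte[] content) throws SketchParseException {
        try {
            return readText(content);
        } catch (IOException e) {
            throw new SketchParseException("Cannot parse " + filename, e);
        }
    }

    /** Caller policy (an assumption): index empty text instead of failing. */
    static String textForIndexing(String filename, byte[] content) {
        try {
            return parse(filename, content);
        } catch (SketchParseException e) {
            return "";
        }
    }
}
```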
