com.logicaldoc.core.parser (logicaldoc 8.6.1 API)

Machinery for parsing different file formats. Each implementation of the Parser interface reads the content of one specific file type.

The full-text engine uses these parsers to extract the content of your documents for indexing.
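To illustrate the hand-off between a parser and the full-text engine, here is a minimal sketch in which the extracted text is tokenized and recorded in an inverted index (term to document ids). The `ToyIndex` class and its `add`/`search` methods are hypothetical and only show the data flow. They are not LogicalDOC's indexing API, and the real engine is a separate component.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index showing what a full-text engine might do with
// the text a parser returns; NOT LogicalDOC's indexing API.
class ToyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Record every word of the parser's output under the given document id.
    void add(long docId, String extractedText) {
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("\\W+")) {
            // split() can yield a leading empty string; skip it
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of the documents containing the term (empty set if none).
    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

Calling `add(1L, text)` with whatever a parser extracted from document 1 makes that document findable by any of its words.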

Since: 1.0

Interface Summary
Interface	Description
Parser	A Parser extracts the text contained in a piece of content.
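The contract described above can be sketched as a small interface plus one implementation for plain text, where "parsing" reduces to decoding bytes. The `TextParser` interface, its `parse(InputStream)` method, and `PlainTextParser` are hypothetical names chosen for illustration. They do not reproduce the signatures of LogicalDOC's `Parser` or `TXTParser`, and UTF-8 input is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical contract, NOT LogicalDOC's actual Parser interface:
// turn the raw content of one file type into plain text.
interface TextParser {
    String parse(InputStream input) throws IOException;
}

// Hypothetical plain-text implementation, similar in spirit to TXTParser:
// the content already is text, so it only needs decoding (UTF-8 assumed).
class PlainTextParser implements TextParser {
    @Override
    public String parse(InputStream input) throws IOException {
        // InputStream.readAllBytes() requires Java 9 or later
        return new String(input.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```

A parser for a binary format such as PDF would implement the same method but decode the format's internal structure before returning text.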

Class Summary
Class	Description
AbiWordParser	Text extractor for AbiWord documents.
AbstractParser	Abstract implementation of a Parser.
CatchAllParser	Parser that tries to convert the document into PDF and then parse the result.
DOCParser	Parses a MS Word (.doc, .dot) file to extract the text contained in the file.
DummyParser	Parser that doesn't parse anything.
EpubParser	Specialized parser that extracts text from .epub (e-book) files.
HTMLParser	Text extractor for HyperText Markup Language (HTML).
HTMLSAXParser	Helper class for HTML parsing.
KOfficeParser	Text extractor for KOffice 1.6 documents.
OpenOfficeParser	Text extractor for OpenOffice/OpenDocument documents.
ParserFactory	Factory that returns a parser instance for the given file.
PDFParser	Text extractor for Portable Document Format (PDF).
PPTParser	Parser for Office 2003 presentations.
PSParser
RTFParser
TXTParser	Class for parsing text (*.txt) files.
XLSParser	Parser for Office 2003 worksheets.
XMLParser	Text extractor for XML documents.
ZABWParser	Text extractor for AbiWord compressed documents.
ZipParser	Class for parsing ZIP archives.
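The class list suggests a dispatch pattern: a factory selects a parser by file extension and falls back to a default for unknown types. The sketch below assumes that design. The `SimpleParserFactory` name, its `register`/`getParser` methods, and the choice of a DummyParser-style fallback that extracts nothing are hypothetical and do not reproduce `ParserFactory`'s actual registration logic (the real library may instead route unknown types to something like `CatchAllParser`).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical extension-based dispatcher, NOT LogicalDOC's ParserFactory.
// A "parser" here is simply a function from raw bytes to extracted text.
class SimpleParserFactory {
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    // Fallback for unknown types, in the spirit of DummyParser: extract nothing.
    private final Function<byte[], String> fallback = bytes -> "";

    // Associate an extension (case-insensitive, without the dot) with a parser.
    void register(String extension, Function<byte[], String> parser) {
        parsers.put(extension.toLowerCase(Locale.ROOT), parser);
    }

    // Pick a parser from the file name's extension, e.g. "Report.TXT" -> "txt".
    Function<byte[], String> getParser(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return parsers.getOrDefault(ext, fallback);
    }
}
```

For example, after `register("txt", b -> new String(b, StandardCharsets.UTF_8))`, both `getParser("notes.txt")` and `getParser("Report.TXT")` return the text decoder, while `getParser("image.xyz")` returns the empty fallback. Keying on the lower-cased extension keeps lookup a single map access.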
