Package com.logicaldoc.core.parser


Machinery for parsing different file formats. Each implementation of Parser is designed to read the content of one specific file type.

The full-text engine uses the parsers to extract document content for indexing and to calculate the number of pages.
Since:
1.0
  • Class
    Description
    Text extractor for AbiWord documents.
    Abstract implementation of a Parser.
    Parser that tries to convert the document to PDF and then parse the result.
    Parses an MS Word (*.doc, *.dot) file to extract the text it contains.
    Parser that doesn't parse anything.
    A specialized parser that extracts text from the .epub (e-book) format.
    Text extractor for HyperText Markup Language (HTML).
    Text extractor for KOffice 1.6 documents.
    Text extractor for the Markdown language.
    Text extractor for OpenOffice/OpenDocument documents.
    Parameters used when parsing documents.
    A Parser parses content in order to extract the text and other metadata within it.
    A factory that returns a parser instance for the given file.
    An error that happens during parsing.
    A parsing error caused by a timeout.
    Text extractor for Portable Document Format (PDF).
    Parser for Office 2003 presentations.
    Class for parsing rar files.
    A parser for the Rich Text Format
    Class for parsing 7z files.
    Class for parsing tar files.
    Class for parsing text (*.txt) files.
    Parser for Office 2003 worksheets.
    Text extractor for XML documents.
    Text extractor for AbiWord compressed documents.
    Class for parsing zip files.
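
The listing above describes a common arrangement: one parser per file type, a factory that picks a parser for a given file, a parser that extracts nothing, and a dedicated error for timeouts. The following is a minimal, self-contained sketch of that arrangement only. Every name in it (ParserSketch, TextParser, ParseTimeout, parserFor, parseWithTimeout) is hypothetical and chosen for illustration; none of it is the actual API of this package, whose class names and signatures should be taken from the class pages themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: hypothetical names, not the com.logicaldoc.core.parser API.
public class ParserSketch {

    /** Hypothetical stand-in for a parser: turns raw bytes into indexable text. */
    public interface TextParser {
        String parse(byte[] content) throws Exception;
    }

    /** Hypothetical error type for a parse that exceeds its time budget. */
    public static class ParseTimeout extends Exception {
        public ParseTimeout(String message) {
            super(message);
        }
    }

    // Plain text: decode the bytes as UTF-8.
    static final TextParser TXT = content -> new String(content, StandardCharsets.UTF_8);

    // Markup: crude tag stripping, just enough to show a format-specific parser.
    static final TextParser MARKUP = content -> new String(content, StandardCharsets.UTF_8)
            .replaceAll("<[^>]*>", " ")
            .replaceAll("\\s+", " ")
            .trim();

    // Fallback that extracts nothing, so unknown formats do not fail.
    static final TextParser NOOP = content -> "";

    static final Map<String, TextParser> BY_EXTENSION =
            Map.of("txt", TXT, "xml", MARKUP, "html", MARKUP);

    /** Factory step: choose a parser from the file extension. */
    public static TextParser parserFor(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, NOOP);
    }

    /** Runs the chosen parser on a background thread and gives up after timeoutMillis. */
    public static String parseWithTimeout(String filename, byte[] content, long timeoutMillis)
            throws Exception {
        TextParser parser = parserFor(filename);
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            try {
                return parser.parse(content);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; it does not interrupt the worker thread.
            future.cancel(true);
            throw new ParseTimeout("Parsing " + filename + " exceeded " + timeoutMillis + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<doc><title>Hello</title><body>world</body></doc>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseWithTimeout("sample.xml", xml, 1000));            // prints "Hello world"
        System.out.println(parseWithTimeout("archive.bin", xml, 1000).isEmpty()); // prints "true"
    }
}
```

Keeping the extension lookup in one factory method means callers never name a concrete parser, and the no-op fallback lets unsupported formats pass through with empty text rather than an error.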