Package com.logicaldoc.core.parser

Machinery for parsing different file formats. Each implementation of Parser is designed to read the content of a specific file type.

The full-text engine uses the parsers to extract document contents for indexing and to calculate the number of pages.
  • Class
    Text extractor for AbiWord documents.
    Abstract implementation of a Parser.
    Parser that tries to convert the document to PDF and then parse the result.
    Parses an MS Word (*.doc, *.dot) file to extract the text it contains.
    Parser that doesn't parse anything.
    A specialized parser that extracts text from the .epub (e-book) format.
    Text extractor for HyperText Markup Language (HTML).
    Text extractor for KOffice 1.6 documents.
    Text extractor for OpenOffice/OpenDocument documents.
    Signals that an error happened during parsing.
    Parameters that control how documents are parsed.
    A Parser is capable of parsing content in order to extract the text and other metadata within it.
    This is a factory, returning a parser instance for the given file.
    Text extractor for Portable Document Format (PDF).
    Parser for Office 2003 presentations.
    Class for parsing rar files.
    Class for parsing 7z files.
    Class for parsing tar files.
    Class for parsing text (*.txt) files.
    Parser for Office 2003 worksheets.
    Text extractor for XML documents.
    Text extractor for AbiWord compressed documents.
    Class for parsing zip files.
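
The list above mentions a factory that returns a parser instance for a given file, and a parser that doesn't parse anything. The following is a minimal sketch of how those two ideas can fit together, assuming the parser is selected by file extension. Every name in it (ParserFactorySketch, TextParser, PlainTextParser, NoOpParser, forFile) is hypothetical and chosen for illustration only; none of them is taken from this package, and the real classes and signatures may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical and are NOT
// the classes of com.logicaldoc.core.parser.
class ParserFactorySketch {

    /** Hypothetical stand-in for a parser: turns raw bytes into indexable text. */
    interface TextParser {
        String parse(byte[] content);
    }

    /** Treats the bytes as UTF-8 plain text. */
    static final class PlainTextParser implements TextParser {
        @Override
        public String parse(byte[] content) {
            return new String(content, StandardCharsets.UTF_8);
        }
    }

    /** Fallback for unsupported formats: extracts nothing instead of failing. */
    static final class NoOpParser implements TextParser {
        @Override
        public String parse(byte[] content) {
            return "";
        }
    }

    // Assumption: parsers are looked up by lower-case file extension.
    private static final Map<String, TextParser> BY_EXTENSION =
            Map.of("txt", new PlainTextParser());

    /** Returns the parser registered for the file's extension, or the no-op fallback. */
    static TextParser forFile(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, new NoOpParser());
    }

    public static void main(String[] args) {
        byte[] data = "hello index".getBytes(StandardCharsets.UTF_8);
        System.out.println(forFile("notes.TXT").parse(data));              // prints "hello index"
        System.out.println(forFile("archive.bin").parse(data).isEmpty());  // prints "true"
    }
}
```

Returning a do-nothing parser for unknown extensions means the caller never has to check for null: an unsupported file simply contributes no text to the index.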