com.logicaldoc.core.parser (logicaldoc 8.9.0 API)

package com.logicaldoc.core.parser

Machinery for parsing different file formats. Each implementation of Parser is designed to read the content of one specific file type.
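The following minimal Java sketch makes the idea of one parser per file type concrete with extension-based dispatch, in the spirit of what the ParserFactory listed below is described as doing. All names (`ExtensionParserFactory`, `SimpleParser`, `UTF8_TEXT`) are hypothetical and are not LogicalDoc's API. The fallback parser plays a role loosely comparable to DummyParser.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not LogicalDoc classes.
class ExtensionParserFactory {

    // Hypothetical parser contract: raw bytes in, extracted text out.
    interface SimpleParser {
        String parse(byte[] content);
    }

    // Example implementation that decodes bytes as UTF-8 text.
    static final SimpleParser UTF8_TEXT = content -> new String(content, StandardCharsets.UTF_8);

    private final Map<String, SimpleParser> byExtension = new HashMap<>();
    private final SimpleParser fallback;

    ExtensionParserFactory(SimpleParser fallback) {
        this.fallback = fallback;
    }

    // Registers a parser for one extension; matching is case-insensitive.
    void register(String extension, SimpleParser parser) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), parser);
    }

    // Returns the lower-cased text after the last dot, or "" when there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Picks the registered parser, or the fallback for unknown types.
    SimpleParser getParser(String fileName) {
        return byExtension.getOrDefault(extensionOf(fileName), fallback);
    }
}
```

Returning a do-nothing fallback instead of throwing lets indexing continue for unknown formats. Whether LogicalDoc's factory behaves this way is not stated on this page.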

The full-text engine uses these parsers to extract the content of your documents for indexing and to calculate their number of pages.
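The sketch below illustrates the general extract-then-index flow. It assumes a hypothetical `TextExtractor` interface (not LogicalDoc's `Parser`). Each document's bytes are turned into text, and the text is split into lower-cased words that feed an inverted index mapping each word to the ids of the documents containing it. A real full-text engine typically adds analysis such as stemming and stop-word removal, and this sketch does not count pages.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical contract, not LogicalDoc's Parser API: one implementation per file type.
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

// Hypothetical plain-text extractor that decodes the whole stream as UTF-8.
class Utf8TextExtractor implements TextExtractor {
    @Override
    public String extract(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}

class IndexSketch {
    // Builds an inverted index: lower-cased word -> ids of the documents containing it.
    static Map<String, Set<String>> buildIndex(Map<String, byte[]> docs, TextExtractor extractor) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, byte[]> doc : docs.entrySet()) {
            String text;
            try (InputStream in = new ByteArrayInputStream(doc.getValue())) {
                text = extractor.extract(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Split on runs of non-word characters; skip an empty leading token, if any.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }
}
```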

Since: 1.0

Related Packages

Package

Description

com.logicaldoc.core

Core plug-in that contains the most important domain objects.

Class

Description

AbiWordParser

Text extractor for AbiWord documents.

AbstractParser

Abstract implementation of a Parser.

CatchAllParser

Parser that tries to convert the document into PDF and then parse the result.

DOCParser

Parses a MS Word (*.doc, *.dot) file to extract the text contained in the file.

DummyParser

Parser that does not parse anything.

EpubParser

A specialized parser that extracts text from the .epub (e-book) format.

HTMLParser

Text extractor for HyperText Markup Language (HTML).

KOfficeParser

Text extractor for KOffice 1.6 documents.

OpenOfficeParser

Text extractor for OpenOffice/OpenDocument documents.

ParseException

Thrown when an error occurs during parsing.

ParseParameters

Parameters used when parsing documents.

Parser

A Parser is capable of parsing content in order to extract the text and other metadata it contains.

ParserFactory

Factory that returns a parser instance for the given file.

PDFParser

Text extractor for Portable Document Format (PDF).

PPTParser

Parser for Office 2003 presentations.

RarParser

Class for parsing rar files.

RTFParser

SevenZipParser

Class for parsing 7z files.

TarParser

Class for parsing tar files.

TXTParser

Class for parsing text (*.txt) files.

XLSParser

Parser for Office 2003 worksheets.

XMLParser

Text extractor for XML documents.

ZABWParser

Text extractor for AbiWord compressed documents.

ZipParser

Class for parsing zip files.
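For the HTMLParser entry, one way to pull plain text out of HTML using only the JDK is the Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`). This is an assumed approach for illustration: the listing does not say which library LogicalDoc's HTMLParser uses, and `HtmlTextSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HtmlTextSketch {
    // Collects the text runs reported by the JDK's built-in HTML parser,
    // separated by single spaces; tags and attributes are dropped.
    static String extract(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

The JDK parser targets HTML 3.2, so modern markup may need a dedicated HTML library in practice.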
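For TXTParser, plain-text parsing mostly comes down to choosing a character encoding. The sketch below, using the hypothetical name `TextDecodingSketch`, honours a UTF-8 or UTF-16 byte order mark (BOM) and otherwise assumes UTF-8. How TXTParser actually detects encodings is not described on this page.

```java
import java.nio.charset.StandardCharsets;

class TextDecodingSketch {
    // Decodes raw bytes, honouring a leading UTF-8 or UTF-16 BOM.
    // Without a BOM, UTF-8 is assumed; UTF-32 and BOM-less UTF-16 are not detected.
    static String decode(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return new String(b, 3, b.length - 3, StandardCharsets.UTF_8);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16LE);
        }
        return new String(b, StandardCharsets.UTF_8);
    }
}
```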
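For XMLParser, the JDK's DOM parser can collect all element text in one call through `Node.getTextContent()`. `XmlTextSketch` is a hypothetical helper, not XMLParser's implementation. It also rejects DOCTYPE declarations, a common hardening step against XML external entity (XXE) attacks when parsing untrusted uploads.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlTextSketch {
    // Returns the concatenated text of all elements, with whitespace collapsed.
    static String extract(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so entity expansion cannot be abused.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent concatenates every descendant text node (attributes are excluded).
            return doc.getDocumentElement().getTextContent().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```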
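Archive parsers such as ZipParser need to walk the entries of the archive. Below is a minimal sketch with `java.util.zip.ZipInputStream`, assuming for simplicity that only `*.txt` entries are extracted, as UTF-8. A fuller design would probably hand each entry to the parser for its own type. `ZipTextSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipTextSketch {
    // Concatenates the text of every *.txt entry in a ZIP archive, each followed by a newline.
    static String extract(byte[] zipBytes) {
        StringBuilder sb = new StringBuilder();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().toLowerCase(Locale.ROOT).endsWith(".txt")) {
                    continue; // skip folders and non-text entries
                }
                // Reading stops at the end of the current entry, not the whole archive.
                sb.append(new String(zin.readAllBytes(), StandardCharsets.UTF_8)).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```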
