Machinery for parsing different file formats.
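One common way to organise machinery like this is to dispatch on the file extension. The sketch below does that and falls back to a parser that extracts nothing. It is a minimal illustration written in Python, since the listing does not show the implementation language. Every name in it (`PARSERS`, `extract_text`, `parse_text`, `parse_nothing`) is hypothetical and is not this library's API.

```python
import os
from typing import Callable, Dict

# Hypothetical parser signature: raw file bytes in, extracted text out.
Parser = Callable[[bytes], str]

def parse_text(data: bytes) -> str:
    # Stand-in for a plain-text parser.
    return data.decode("utf-8", errors="replace")

def parse_nothing(data: bytes) -> str:
    # Stand-in for a parser that does not extract anything.
    return ""

# Hypothetical registry mapping lowercase extensions to parsers.
PARSERS: Dict[str, Parser] = {
    ".txt": parse_text,
}

def extract_text(filename: str, data: bytes) -> str:
    """Pick a parser by file extension, falling back to the no-op parser."""
    ext = os.path.splitext(filename)[1].lower()
    parser = PARSERS.get(ext, parse_nothing)
    return parser(data)

print(extract_text("notes.TXT", b"hello"))  # hello
```

With a table like this, supporting a new format means registering one more extension. An unknown type degrades to empty text rather than raising an error.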
class
Text extractor for AbiWord documents.
class
Parser that tries to convert the document to PDF and then parse the resulting PDF.
class
Parses an MS Word (*.doc, *.dot) file to extract the text it contains.
class
Parser that does not parse anything.
class
A specialized parser that extracts text from e-books in .epub format.
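An EPUB file is a ZIP archive whose content documents are XHTML. The minimal sketch below uses that fact with Python's standard library. It is an illustration, not this library's implementation. As a simplifying assumption, it reads the XHTML files in archive order instead of following the reading order declared in the package's OPF `<spine>`. The names `extract_epub_text` and `_TextCollector` are hypothetical.

```python
import io
import zipfile
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    # Gathers non-blank text nodes from (X)HTML markup.
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

def extract_epub_text(data: bytes) -> str:
    """Join text from the (X)HTML documents inside an EPUB archive.

    Simplification: archive order is used, not the OPF spine order.
    """
    collector = _TextCollector()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector.feed(zf.read(name).decode("utf-8", errors="replace"))
    collector.close()
    return " ".join(collector.parts)

# Usage: build a tiny in-memory EPUB-like archive and extract its text.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("OEBPS/ch1.xhtml", "<html><body><p>Call me Ishmael.</p></body></html>")
print(extract_epub_text(buf.getvalue()))  # Call me Ishmael.
```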
class
Text extractor for HyperText Markup Language (HTML).
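A basic HTML text extractor can be built on Python's `html.parser.HTMLParser`. The sketch below keeps visible text and skips `<script>` and `<style>` contents. It is a hypothetical illustration (`HTMLTextExtractor` and `html_to_text` are not names from this library). Character references such as `&amp;` are decoded because `convert_charrefs` defaults to `True`.

```python
from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; and friends
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-blank text outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(markup: str) -> str:
    parser = HTMLTextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)

print(html_to_text("<p>Fish &amp; chips</p><script>var x = 1;</script>"))  # Fish & chips
```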
class
Text extractor for KOffice 1.6 documents.
class
Text extractor for OpenOffice/OpenDocument documents.
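OpenDocument files are ZIP packages whose body is stored in `content.xml`. Paragraphs and headings in that file are `text:p` and `text:h` elements in the OpenDocument text namespace. The sketch below relies only on that structure and uses Python's standard library. It is a simplified illustration, not this library's code, and `extract_odf_text` is a hypothetical name. It ignores edge cases such as nested paragraphs (e.g. inside notes) and space or tab elements like `text:s`, which contribute no characters here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument text namespace (paragraphs are text:p, headings text:h).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}

def extract_odf_text(data: bytes) -> str:
    """Return one line per heading or paragraph found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    lines = []
    for elem in root.iter():
        if elem.tag in BLOCK_TAGS:
            # itertext() also picks up text inside nested spans.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)
```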
class
Text extractor for Portable Document Format (PDF).
class
Parser for Office 2003 presentations
class
Class for parsing rar files.
class
class
Class for parsing 7z files.
class
Class for parsing tar files.
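Parsing a tar archive for text usually means walking its members and decoding the regular files. The sketch below does this with Python's `tarfile` module. It is a hypothetical illustration (`extract_tar_text` is not from this library) and assumes members are UTF-8 text. Mode `"r:*"` lets `tarfile` detect gzip, bzip2 or xz compression transparently.

```python
import io
import tarfile

def extract_tar_text(data: bytes) -> dict:
    """Map each regular-file member name to its UTF-8-decoded contents."""
    texts = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        for member in tf.getmembers():
            if member.isfile():  # skip directories, links, devices
                fh = tf.extractfile(member)
                texts[member.name] = fh.read().decode("utf-8", errors="replace")
    return texts
```

Reading members through `extractfile` keeps everything in memory and never writes archive paths to disk, so path-traversal concerns around extraction do not arise here.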
class
Class for parsing text (*.txt) files.
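The main difficulty with plain-text files is guessing the encoding. The sketch below tries a list of encodings in order and ends with Latin-1, which can decode any byte sequence. The encoding order (UTF-8 with optional BOM, then Windows-1252) is an assumption chosen for illustration, and `read_text_file` is a hypothetical name, not part of this library.

```python
def read_text_file(data: bytes, encodings=("utf-8-sig", "cp1252")) -> str:
    """Decode bytes with the first encoding that succeeds.

    Falls back to Latin-1, which maps every byte to a character,
    so this function never raises UnicodeDecodeError.
    """
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")
```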
class
Parser for Office 2003 worksheets
class
Text extractor for XML documents.
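For generic XML, a simple approach collects every text node in document order. The sketch below does that with Python's `xml.etree.ElementTree`. It is a hypothetical illustration (`xml_to_text` is not from this library). The Python documentation warns that this module is not secure against maliciously constructed XML, so untrusted input needs extra care.

```python
import xml.etree.ElementTree as ET

def xml_to_text(data: bytes) -> str:
    """Join all non-blank text nodes of an XML document with spaces."""
    root = ET.fromstring(data)
    return " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())

print(xml_to_text(b"<doc><title>T</title><body>Hello <b>XML</b></body></doc>"))  # T Hello XML
```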
class
Text extractor for AbiWord compressed documents.
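A sketch of this case, assuming that a compressed AbiWord document (`.zabw`) is an ordinary `.abw` XML document compressed with gzip. It decompresses the data and then collects the text of every `<p>` element. Tags are compared by local name, so the code works whether or not the document declares an XML namespace. `extract_zabw_text` is a hypothetical name, and this is not this library's implementation.

```python
import gzip
import xml.etree.ElementTree as ET

def extract_zabw_text(data: bytes) -> str:
    """Decompress gzip data, then return one line per <p> element."""
    root = ET.fromstring(gzip.decompress(data))
    paragraphs = []
    for elem in root.iter():
        # Strip any "{namespace}" prefix before comparing the tag name.
        if elem.tag.rsplit("}", 1)[-1] == "p":
            paragraphs.append("".join(elem.itertext()))
    return "\n".join(paragraphs)
```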
class
Class for parsing zip files.
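A zip parser can follow the same pattern as the tar case: walk the archive entries and decode the ones that look like text. The sketch below uses Python's `zipfile` module. It is a hypothetical illustration (`extract_zip_text` and its `suffixes` parameter are not from this library) and assumes text members are UTF-8.

```python
import io
import zipfile

def extract_zip_text(data: bytes, suffixes=(".txt",)) -> dict:
    """Map each matching file entry in a zip archive to its decoded text."""
    texts = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if not info.is_dir() and info.filename.lower().endswith(suffixes):
                texts[info.filename] = zf.read(info).decode("utf-8", errors="replace")
    return texts
```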